Active filters: W4A16

ModelCloud/QwQ-32B-Preview-gptqmodel-4bit-vortex-v2 • Text Generation • 33B • 20 downloads • 16 likes
ModelCloud/QwQ-32B-Preview-gptqmodel-4bit-vortex-v3 • Text Generation • 33B • 18 downloads • 14 likes
ModelCloud/Falcon3-10B-Instruct-gptqmodel-4bit-vortex-v1 • Text Generation • 10B • 7 downloads • 3 likes
ModelCloud/Qwen2.5-0.5B-Instruct-gptqmodel-w4a16 • Text Generation • 0.5B • 10 downloads • 1 like
ModelCloud/DeepSeek-R1-Distill-Qwen-7B-gptqmodel-4bit-vortex-v1 • Text Generation • 8B • 24 downloads • 5 likes
ModelCloud/DeepSeek-R1-Distill-Qwen-7B-gptqmodel-4bit-vortex-v2 • Text Generation • 8B • 233 downloads • 7 likes
RedHatAI/phi-4-quantized.w4a16 • Text Generation • 3B • 277 downloads • 4 likes
RedHatAI/Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16 • Image-Text-to-Text • 5B • 5.1k downloads • 10 likes
RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 • Image-Text-to-Text • 20B • 148k downloads • 12 likes
pyrymikko/nomic-embed-code-W4A16-AWQ • 1B • 213k downloads
tcclaviger/Minimax-M2-Thrift-GPTQ-W4A16-AMD • Text Generation • 24B • 7 downloads • 1 like
TevunahAi/granite-34b-code-instruct-8k-Ultra-Hybrid • Text Generation • 11B • 8 downloads
TevunahAi/Llama-3.1-70B-Instruct-Ultra-Hybrid • Text Generation • 22B • 26 downloads
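
All of the checkpoints above store their weights in 4-bit precision while running activations in 16-bit (the "W4A16" in the names), so they load through the usual transformers path as long as a matching quantization backend is installed (for example, GPTQModel for the ModelCloud GPTQ repos). Below is a minimal sketch, not a definitive recipe: it assumes transformers with a GPTQ-capable backend and an available GPU, and the choice of the smallest model in the list plus the prompt text are purely illustrative.

    # Minimal sketch: loading a W4A16 checkpoint from this list with transformers.
    # Assumes a GPTQ-capable backend is installed (e.g. `pip install gptqmodel`)
    # and a GPU is available; model choice and prompt are illustrative only.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ModelCloud/Qwen2.5-0.5B-Instruct-gptqmodel-w4a16"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # The quantization config shipped in the repo tells transformers how to
    # handle the 4-bit weights; activations stay in 16-bit at runtime.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", torch_dtype="auto"
    )

    inputs = tokenizer(
        "What does W4A16 quantization mean?", return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))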