Active filters: int4
2imi9/Qwen3-1.7b-gptq-int4 • Text Generation • 0.9B • 2
Dhruvil03/Perception-LM-1B-Int4bit • Image-Text-to-Text • 2B • 7 • 1
RiverkanIT/Ling-mini-2.0-Quantized • Text Generation • 2
ForeseeLab/foreseeai-qwen3-4b-iot-int4 • Text Generation • 4B • 1
ISTA-DASLab/Llama-3.1-8B-Instruct-MR-GPTQ-nvfp • Image-Text-to-Text • 5B • 62
ISTA-DASLab/Llama-3.1-8B-Instruct-MR-GPTQ-mxfp • Image-Text-to-Text • 5B • 12
huawei-csl/Qwen3-1.7B-4bit-SINQ • Text Generation • 1B • 5 • 5
huawei-csl/Qwen3-1.7B-4bit-ASINQ • Text Generation • 1B • 3 • 5
huawei-csl/Qwen3-32B-4bit-SINQ • Text Generation • 18B • 11 • 7
huawei-csl/Qwen3-14B-4bit-SINQ • Text Generation • 9B • 7 • 5
huawei-csl/Qwen3-14B-4bit-ASINQ • Text Generation • 9B • 3 • 6
huawei-csl/Qwen3-32B-4bit-ASINQ • Text Generation • 18B • 4 • 8
ModelCloud/GLM-4.6-GPTQMODEL-W4A16-v1 • Text Generation • 357B • 1
ModelCloud/GLM-4.6-GPTQMODEL-W4A16-v2 • Text Generation • 357B • 3 • 1
PangaiaSoftware/YanoljaNEXT-Rosetta-4B-onnx • Translation • 1 • 2
RedHatAI/NVIDIA-Nemotron-Nano-9B-v2-quantized.w4a16 • Text Generation • 2B • 245 • 5
ModelCloud/GLM-4.6-REAP-268B-A32B-GPTQMODEL-W4A16 • Text Generation • 269B • 3 • 2
AhtnaGlen/phi-4-mini-instruct-int4-sym-npu-ov • Text Generation • 3
tencent/DeepSeek-V3.1-Terminus-W4AFP8 • Text Generation • 349B • 1.66k • 15
ModelCloud/MiniMax-M2-GPTQMODEL-W4A16 • Text Generation • 229B • 59 • 3
ModelCloud/Marin-32B-Base-GPTQMODEL-W4A16 • Text Generation • 33B • 1 • 1
ModelCloud/Marin-32B-Base-GPTQMODEL-AWQ-W4A16 • Text Generation • 33B • 10 • 2
huawei-csl/Apertus-8B-2509-4bit-SINQ • Text Generation • 5B • 2
huawei-csl/Apertus-8B-2509-4bit-ASINQ • Text Generation • 5B • 5 • 2
ModelCloud/Granite-4.0-H-1B-GPTQMODEL-W4A16 • Text Generation • 1B • 10 • 1
ModelCloud/Granite-4.0-H-350M-GPTQMODEL-W4A16 • Text Generation • 0.3B • 25 • 1
ModelCloud/Brumby-14B-Base-GPTQMODEL-W4A16 • Text Generation • 15B • 5 • 1
ModelCloud/Brumby-14B-Base-GPTQMODEL-W4A16-v2 • Text Generation • 15B • 43 • 1
SherlockID365/Qwen3-VL-8B-Instruct-quantized.w4a16 • Image-Text-to-Text • 3B • 46 • 1
Ishant86/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-compressed-tensors-int4 • 6B