Qwen3.5-35B-A3B-NVFP4

This is a quantized version of Qwen/Qwen3.5-35B-A3B using the NVFP4 quantization scheme.

Use a nightly build of vLLM to run this model.
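
As a minimal sketch of offline inference, assuming a nightly vLLM build is installed and a GPU that supports NVFP4 is available, the model could be loaded as follows. The helper name `generate`, the prompt, and the sampling values are illustrative assumptions, not settings recommended by this card; the repository id is the one listed on the model page.

```python
# Repository id as listed on the model page.
MODEL_ID = "Sehyo/Qwen3.5-35B-A3B-NVFP4"


def generate(prompt: str, max_tokens: int = 64) -> str:
    """Hypothetical helper: load the model with vLLM and complete one prompt."""
    # Imported lazily so this file can be imported without vLLM installed.
    from vllm import LLM, SamplingParams

    # vLLM reads the quantization scheme from the checkpoint's config.
    llm = LLM(model=MODEL_ID)

    # Illustrative sampling values (assumption, not tuned for this model).
    params = SamplingParams(temperature=0.7, max_tokens=max_tokens)

    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text
```

Calling `generate("Explain NVFP4 in one sentence.")` downloads the weights on first use and returns the generated text.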

Changelog

  • 2026-03-02: Re-quantized with MTP (multi-token prediction) weights preserved, enabling speculative decoding with vLLM.
  • 2026-02-25: Initial upload.

Calibration

Creation

This model was created using vLLM's LLM Compressor, with Qwen3.5 MoE support added via PR #2383. The PR adds a custom CalibrationQwen3MoeSparseMoeBlock that routes calibration data to all experts during quantization. Without it, experts that the router rarely selects would see little or no calibration data; with it, every expert is calibrated, which gives more accurate NVFP4 quantization.
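
The effect of routing calibration data to all experts can be illustrated with a toy example. This is a sketch of the general idea only, not the code from the PR: the function names, the sizes, and the deliberately skewed router scores are all assumptions made for illustration.

```python
import random


def top_k_indices(scores, k):
    """Return the indices of the k largest router scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def tokens_seen_per_expert(router_scores, k, calibrate_all_experts):
    """Count how many tokens each expert processes.

    router_scores holds one list of per-expert scores for each token.
    """
    num_experts = len(router_scores[0])
    seen = [0] * num_experts
    for scores in router_scores:
        if calibrate_all_experts:
            # Calibration mode: every expert receives every token.
            chosen = range(num_experts)
        else:
            # Normal top-k routing: only the selected experts receive it.
            chosen = top_k_indices(scores, k)
        for expert in chosen:
            seen[expert] += 1
    return seen


random.seed(0)
NUM_TOKENS, NUM_EXPERTS, TOP_K = 32, 8, 2

# Toy router that always prefers experts 0 and 1 (bias of 2.0 on their scores).
scores = [
    [random.random() + (2.0 if e < 2 else 0.0) for e in range(NUM_EXPERTS)]
    for _ in range(NUM_TOKENS)
]

normal = tokens_seen_per_expert(scores, TOP_K, calibrate_all_experts=False)
calib = tokens_seen_per_expert(scores, TOP_K, calibrate_all_experts=True)

print(normal)  # [32, 32, 0, 0, 0, 0, 0, 0]
print(calib)   # [32, 32, 32, 32, 32, 32, 32, 32]
```

With ordinary top-k routing, six of the eight toy experts never see a token, so no activation statistics would be available for quantizing them. Sending every token to every expert during calibration removes that gap.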

