---
license: apache-2.0
base_model:
- microsoft/Phi-4-reasoning
pipeline_tag: text-generation
library_name: transformers
---

## Phi-4 Reasoning Quantized

---

### **🚀 Model Description**

This is an **int8 quantized version** of **Phi-4 Reasoning**, optimized using **torchao** for a reduced memory footprint and accelerated inference. The quantization applies **int8 weights with dynamic int8 activations**, maintaining strong task performance while enabling efficient deployment on consumer and edge hardware.

---

### **Quantization Details**

* **Method:** torchao quantization
* **Weight Precision:** int8
* **Activation Precision:** int8, quantized dynamically at runtime
* **Technique:** Symmetric mapping
* **Impact:** Roughly half the model size of the bfloat16 checkpoint (int8 stores one byte per weight versus two), with minimal loss in reasoning, coding, and general instruction-following capability.

---

### **🎯 Intended Use**

* Fast inference in **production environments with limited VRAM**
* Research on the **deployment performance of int8-quantized models**
* Tasks: general reasoning, chain-of-thought, code generation, and long-context work

---

### **⚠️ Limitations**

* Slight performance degradation compared to the full-precision (bfloat16) model
* English-centric training data; the model may underperform in other languages or on nuanced tasks
* Further finetuning or quantization-aware calibration may improve task-specific performance

---
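### **💻 How to Use**

A minimal loading sketch with 🤗 Transformers. The repository id below is a placeholder (this card does not state the final upload path), so substitute the actual id of this quantized checkpoint; `torchao` must be installed for the int8 weights to deserialize correctly.

```python
# pip install torch torchao transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- replace with the actual id of this quantized checkpoint.
model_id = "your-org/Phi-4-reasoning-int8"

# torchao-serialized checkpoints rebuild their quantized layers on load,
# so no extra quantization_config is needed here.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "What is the derivative of x^2 * sin(x)?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

---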
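### **🔧 Reproducing the Quantization**

A sketch of how an equivalent checkpoint can be produced from the base model, assuming the string-based `TorchAoConfig` API available in recent Transformers releases; the exact recipe used for this upload is not documented beyond the details listed above.

```python
import torch
from transformers import AutoModelForCausalLM, TorchAoConfig

# int8 weights + dynamic int8 activations, matching the recipe described
# in "Quantization Details" above.
quant_config = TorchAoConfig("int8_dynamic_activation_int8_weight")

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-4-reasoning",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quant_config,
)

# torchao tensor subclasses are not supported by safetensors,
# so serialization falls back to torch.save.
model.save_pretrained("Phi-4-reasoning-int8", safe_serialization=False)
```

Because the activations are quantized on the fly at each forward pass, this scheme needs no calibration dataset; only the weights are converted ahead of time.

---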