BioMistral-7B-FP8-Dynamic
Overview
BioMistral-7B-FP8-Dynamic is an FP8 Dynamic–quantized version of the BioMistral-7B model, designed for high-performance inference while maintaining strong quality on biomedical and medical NLP tasks.
This model is intended primarily for deployment with vLLM on NVIDIA GPUs with native FP8 support (Hopper / Ada Lovelace architectures and newer).
Base Model
- Base model: BioMistral-7B
- Architecture: Mistral-style decoder-only Transformer
- Domain: Biomedical / Medical Natural Language Processing
Quantization
- Method: FP8 Dynamic (FP8 weights; activations quantized dynamically at runtime)
- Scope: Linear layers
- Objective: Reduce VRAM usage and improve inference throughput
Notes
- The weights are already quantized.
- Do not apply additional runtime quantization.
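For reference, FP8-Dynamic checkpoints of this kind are commonly produced offline with vLLM's llm-compressor library. The sketch below illustrates that workflow under that assumption; the exact tooling used for this checkpoint is not stated, and the base repository ID, ignore list, and output directory are illustrative.

from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Hypothetical source and output paths, shown only to illustrate the recipe.
BASE_MODEL = "BioMistral/BioMistral-7B"
OUTPUT_DIR = "BioMistral-7B-FP8-Dynamic"

model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# FP8_DYNAMIC: FP8 weights, activations quantized dynamically at runtime.
# Linear layers are targeted; excluding lm_head is a common choice (an assumption here).
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

# FP8 dynamic quantization requires no calibration data.
oneshot(model=model, recipe=recipe)

model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)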
Intended Use
- Biomedical and medical text generation
- Medical writing assistance
- Summarization and analysis of scientific literature
- Medical RAG pipelines (clinical notes, research papers)
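As an illustration of these use cases, the sketch below runs offline inference with vLLM's Python API to summarize a biomedical abstract. The prompt, placeholder abstract, and sampling settings are assumptions, not part of this model card.

from vllm import LLM, SamplingParams

# Load the pre-quantized checkpoint; vLLM reads the FP8 configuration
# from the model files, so no extra quantization flag is needed.
llm = LLM(model="ig1/BioMistral-7B-FP8-Dynamic")

sampling = SamplingParams(temperature=0.2, max_tokens=256)

abstract = "..."  # a biomedical abstract or retrieved RAG context goes here
prompt = f"Summarize the following abstract for a clinician:\n\n{abstract}\n\nSummary:"

outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text)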
Deployment (vLLM)
Recommended
vllm serve ig1/BioMistral-7B-FP8-Dynamic \
--served-model-name biomistral-7b-fp8 \
--dtype auto
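Once the server is running, it exposes an OpenAI-compatible API (by default at http://localhost:8000/v1). A minimal client-side sketch, assuming the default host and port and the served model name from the command above:

from openai import OpenAI

# vLLM's OpenAI-compatible endpoint; the API key is unused but required by the client.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.completions.create(
    model="biomistral-7b-fp8",
    prompt="Briefly explain the mechanism of action of metformin.",
    max_tokens=200,
    temperature=0.2,
)
print(response.choices[0].text)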