Add Artificial Analysis evaluations for ministral-8b

#4
Files changed (1)
  1. README.md +299 -249
README.md CHANGED
@@ -1,250 +1,300 @@
1
- ---
2
- library_name: vllm
3
- language:
4
- - en
5
- - fr
6
- - es
7
- - de
8
- - it
9
- - pt
10
- - nl
11
- - zh
12
- - ja
13
- - ko
14
- - ar
15
- license: apache-2.0
16
- inference: false
17
- extra_gated_description: >-
18
- If you want to learn more about how we process your personal data, please read
19
- our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
20
- tags:
21
- - mistral-common
22
- ---
23
-
24
- # Ministral 3 8B Base 2512
25
- A balanced model in the Ministral 3 family, **Ministral 3 8B** is a powerful, efficient tiny language model with vision capabilities.
26
-
27
- This model is the base pre-trained version, not fine-tuned for instruction or reasoning tasks, making it ideal for custom post-training processes.
28
- For instruction and chat based use cases, we recommend using [Ministral 3 8B Instruct 2512](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512).
29
-
30
- The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 8B can even be deployed locally, capable of fitting in 24GB of VRAM in BF16, and less than 12GB of RAM/VRAM when quantized.
31
-
32
- ## Key Features
33
- Ministral 3 8B consists of two main architectural components:
34
- - **8.4B Language Model**
35
- - **0.4B Vision Encoder**
36
-
37
- The Ministral 3 8B Base model offers the following capabilities:
38
- - **Vision**: Enables the model to analyze images and provide insights based on visual content, in addition to text.
39
- - **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
40
- - **Edge-Optimized**: Delivers best-in-class performance at a small scale, deployable anywhere.
41
- - **Apache 2.0 License**: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
42
- - **Large Context Window**: Supports a 256k context window.
43
-
44
- ### Use Cases
45
- Perfect for balanced performance in local or embedded systems, combining versatility with efficiency.
46
- - Chat interfaces in constrained environments
47
- - Local daily-driver AI assistant
48
- - Image/document description and understanding
49
- - Translation and content generation
50
- - Specialized agentic use cases
51
- - Fine-tuning and specialization
52
- - And more...
53
-
54
- Bringing advanced AI capabilities to resource-constrained environments.
55
-
56
- ## Ministral 3 Family
57
-
58
- | Model Name | Type | Precision | Link |
59
- |--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|
60
- | Ministral 3 3B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) |
61
- | Ministral 3 3B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) |
62
- | Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) |
63
- | **Ministral 3 8B Base 2512** | **Base pre-trained** | **BF16** | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) |
64
- | Ministral 3 8B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) |
65
- | Ministral 3 8B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512) |
66
- | Ministral 3 14B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512) |
67
- | Ministral 3 14B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) |
68
- | Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) |
69
-
70
- Other formats available [here](https://huggingface.co/collections/mistralai/ministral-3-additional-checkpoints).
71
-
72
- ## Benchmark Results
73
-
74
- We compare Ministral 3 to similar sized models.
75
-
76
- ### Reasoning
77
-
78
- | Model | AIME25 | AIME24 | GPQA Diamond | LiveCodeBench |
79
- |---------------------------|-------------|-------------|--------------|---------------|
80
- | **Ministral 3 14B** | <u>0.850</u>| <u>0.898</u>| <u>0.712</u> | <u>0.646</u> |
81
- | Qwen3-14B (Thinking) | 0.737 | 0.837 | 0.663 | 0.593 |
82
- | | | | | |
83
- | **Ministral 3 8B** | 0.787 | <u>0.860</u>| 0.668 | <u>0.616</u> |
84
- | Qwen3-VL-8B-Thinking | <u>0.798</u>| <u>0.860</u>| <u>0.671</u> | 0.580 |
85
- | | | | | |
86
- | **Ministral 3 3B** | <u>0.721</u>| <u>0.775</u>| 0.534 | <u>0.548</u> |
87
- | Qwen3-VL-4B-Thinking | 0.697 | 0.729 | <u>0.601</u> | 0.513 |
88
-
89
- ### Instruct
90
-
91
- | Model | Arena Hard | WildBench | MATH Maj@1 | MM MTBench |
92
- |---------------------------|-------------|------------|-------------|------------------|
93
- | **Ministral 3 14B** | <u>0.551</u>| <u>68.5</u>| <u>0.904</u>| <u>8.49</u> |
94
- | Qwen3 14B (Non-Thinking) | 0.427 | 65.1 | 0.870 | NOT MULTIMODAL |
95
- | Gemma3-12B-Instruct | 0.436 | 63.2 | 0.854 | 6.70 |
96
- | | | | | |
97
- | **Ministral 3 8B** | 0.509 | <u>66.8</u>| 0.876 | <u>8.08</u> |
98
- | Qwen3-VL-8B-Instruct | <u>0.528</u>| 66.3 | <u>0.946</u>| 8.00 |
99
- | | | | | |
100
- | **Ministral 3 3B** | 0.305 | <u>56.8</u>| 0.830 | 7.83 |
101
- | Qwen3-VL-4B-Instruct | <u>0.438</u>| <u>56.8</u>| <u>0.900</u>| <u>8.01</u> |
102
- | Qwen3-VL-2B-Instruct | 0.163 | 42.2 | 0.786 | 6.36 |
103
- | Gemma3-4B-Instruct | 0.318 | 49.1 | 0.759 | 5.23 |
104
-
105
- ### Base
106
-
107
- | Model | Multilingual MMLU | MATH CoT 2-Shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |
108
- |---------------------|-------------------|-----------------|----------------|-------------------|-------------|-----------------|
109
- | **Ministral 3 14B** | 0.742 | <u>0.676</u> | 0.648 | 0.820 | 0.794 | 0.749 |
110
- | Qwen3 14B Base | <u>0.754</u> | 0.620 | <u>0.661</u> | <u>0.837</u> | <u>0.804</u>| 0.703 |
111
- | Gemma 3 12B Base | 0.690 | 0.487 | 0.587 | 0.766 | 0.745 | <u>0.788</u> |
112
- | | | | | | | |
113
- | **Ministral 3 8B** | <u>0.706</u> | <u>0.626</u> | 0.591 | 0.793 | <u>0.761</u>| <u>0.681</u> |
114
- | Qwen 3 8B Base | 0.700 | 0.576 | <u>0.596</u> | <u>0.794</u> | 0.760 | 0.639 |
115
- | | | | | | | |
116
- | **Ministral 3 3B** | 0.652 | <u>0.601</u> | 0.511 | 0.735 | 0.707 | 0.592 |
117
- | Qwen 3 4B Base | <u>0.677</u> | 0.405 | <u>0.570</u> | <u>0.759</u> | <u>0.713</u>| 0.530 |
118
- | Gemma 3 4B Base | 0.516 | 0.294 | 0.430 | 0.626 | 0.589 | <u>0.640</u> |
119
-
120
- ## Usage
121
-
122
- The model can be used with the following frameworks;
123
- - [`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)
124
- - [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)
125
-
126
- ### vLLM
127
-
128
- We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).
129
-
130
- #### Installation
131
-
132
- Make sure to install **vllm >= 1.12.0**:
133
-
134
- ```
135
- pip install vllm --upgrade
136
- ```
137
-
138
- Doing so should automatically install [`mistral_common >= 1.8.6`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.6).
139
-
140
- To check:
141
- ```
142
- python -c "import mistral_common; print(mistral_common.__version__)"
143
- ```
144
-
145
- You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile) or on the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest).
146
-
147
- #### Serve
148
-
149
- Due to their size and the BF16 format of their weights `Ministral-3-3B-Base-2512` and `Ministral-3-8B-Base-2512` can run on a single 1xH200 GPU.
150
-
151
- A simple launch command is:
152
-
153
- ```bash
154
- vllm serve mistralai/Ministral-3-8B-Instruct-2512 \
155
- --tokenizer_mode mistral --config_format mistral --load_format mistral
156
- ```
157
-
158
- Additional flags:
159
-
160
- * You can set `--max-model-len` to preserve memory. By default it is set to `262144` which is quite large but not necessary for most scenarios.
161
- * You can set `--max-num-batched-tokens` to balance throughput and latency, higher means higher throughput but higher latency.
162
-
163
- #### Usage of the model
164
-
165
- Here we asumme that the model `mistralai/Ministral-3-8B-Base-2512` is served and you can ping it to the domain `localhost` with the port `8000` which is the default for vLLM.
166
-
167
- <details>
168
- <summary>Test Base</summary>
169
-
170
- Quick test with the base model.
171
-
172
- ```python
173
- from openai import OpenAI
174
-
175
- # Modify OpenAI's API key and API base to use vLLM's API server.
176
- openai_api_key = "EMPTY"
177
- openai_api_base = "http://localhost:8000/v1"
178
-
179
- TEMP = 0.15
180
- MAX_TOK = 256
181
-
182
- client = OpenAI(
183
- api_key=openai_api_key,
184
- base_url=openai_api_base,
185
- )
186
-
187
- models = client.models.list()
188
- model = models.data[0].id
189
-
190
- response = client.completions.create(
191
- model=model,
192
- prompt="What is the best thing in the universe ?",
193
- temperature=TEMP,
194
- max_tokens=MAX_TOK,
195
- )
196
-
197
- print(response.choices[0].text)
198
- ```
199
-
200
- </details>
201
-
202
- ### Transformers
203
-
204
- You can also use Ministral 3 8B Base 2512 with `Transformers` !
205
- Make sure to install `Transformers` from its first v5 release candidate or from "main":
206
-
207
- ```
208
- pip install transformers==5.0.0rc0
209
- ```
210
-
211
- To make the best use of our model with `Transformers` make sure to have [installed](https://github.com/mistralai/mistral-common) `mistral-common >= 1.8.6` to use our tokenizer.
212
-
213
- ```bash
214
- pip install mistral-common --upgrade
215
- ```
216
-
217
- Then load our tokenizer along with the model and generate:
218
-
219
- <details>
220
- <summary>Python snippet</summary>
221
-
222
- ```python
223
- from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend, FineGrainedFP8Config
224
-
225
- model_id = "mistralai/Ministral-3-8B-Base-2512"
226
- model = Mistral3ForConditionalGeneration.from_pretrained(
227
- model_id,
228
- device_map="auto",
229
- )
230
- tokenizer = MistralCommonBackend.from_pretrained(model_id)
231
-
232
- input_ids = tokenizer.encode("Once about a time, France was a", return_tensors="pt")
233
- input_ids = input_ids.to("cuda")
234
-
235
- output = model.generate(
236
- input_ids,
237
- max_new_tokens=30,
238
- )[0]
239
-
240
- decoded_output = tokenizer.decode(output[len(input_ids[0]):])
241
- print(decoded_output)
242
- ```
243
-
244
- </details>
245
-
246
- ## License
247
-
248
- This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
249
-
250
  *You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.*
 
1
+ ---
2
+ library_name: vllm
3
+ language:
4
+ - en
5
+ - fr
6
+ - es
7
+ - de
8
+ - it
9
+ - pt
10
+ - nl
11
+ - zh
12
+ - ja
13
+ - ko
14
+ - ar
15
+ license: apache-2.0
16
+ inference: false
17
+ extra_gated_description: If you want to learn more about how we process your personal
18
+ data, please read our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
19
+ tags:
20
+ - mistral-common
21
+ model-index:
22
+ - name: Ministral-3-8B-Base-2512
23
+ results:
24
+ - task:
25
+ type: evaluation
26
+ dataset:
27
+ name: Artificial Analysis Benchmarks
28
+ type: artificial_analysis
29
+ metrics:
30
+ - name: Artificial Analysis Intelligence Index
31
+ type: artificial_analysis_intelligence_index
32
+ value: 28.2
33
+ - name: Artificial Analysis Coding Index
34
+ type: artificial_analysis_coding_index
35
+ value: 18.4
36
+ - name: Artificial Analysis Math Index
37
+ type: artificial_analysis_math_index
38
+ value: 31.7
39
+ - name: Mmlu Pro
40
+ type: mmlu_pro
41
+ value: 0.642
42
+ - name: Gpqa
43
+ type: gpqa
44
+ value: 0.471
45
+ - name: Hle
46
+ type: hle
47
+ value: 0.043
48
+ - name: Livecodebench
49
+ type: livecodebench
50
+ value: 0.303
51
+ - name: Scicode
52
+ type: scicode
53
+ value: 0.208
54
+ - name: Aime 25
55
+ type: aime_25
56
+ value: 0.317
57
+ - name: Ifbench
58
+ type: ifbench
59
+ value: 0.291
60
+ - name: Lcr
61
+ type: lcr
62
+ value: 0.24
63
+ - name: Terminalbench Hard
64
+ type: terminalbench_hard
65
+ value: 0.043
66
+ - name: Tau2
67
+ type: tau2
68
+ value: 0.266
69
+ source:
70
+ name: Artificial Analysis API
71
+ url: https://artificialanalysis.ai
72
+ ---
73
+
74
+ # Ministral 3 8B Base 2512
75
+ A balanced model in the Ministral 3 family, **Ministral 3 8B** is a powerful, efficient tiny language model with vision capabilities.
76
+
77
+ This model is the base pre-trained version, not fine-tuned for instruction or reasoning tasks, making it ideal for custom post-training processes.
78
+ For instruction and chat based use cases, we recommend using [Ministral 3 8B Instruct 2512](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512).
79
+
80
+ The Ministral 3 family is designed for edge deployment and runs on a wide range of hardware. Ministral 3 8B can even be deployed locally: it fits in 24GB of VRAM in BF16, and in less than 12GB of RAM/VRAM when quantized.
81
+
82
+ ## Key Features
83
+ Ministral 3 8B consists of two main architectural components:
84
+ - **8.4B Language Model**
85
+ - **0.4B Vision Encoder**
86
+
87
+ The Ministral 3 8B Base model offers the following capabilities:
88
+ - **Vision**: Enables the model to analyze images and provide insights based on visual content, in addition to text.
89
+ - **Multilingual**: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
90
+ - **Edge-Optimized**: Delivers best-in-class performance at a small scale, deployable anywhere.
91
+ - **Apache 2.0 License**: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
92
+ - **Large Context Window**: Supports a 256k context window.
93
+
94
+ ### Use Cases
95
+ Perfect for balanced performance in local or embedded systems, combining versatility with efficiency.
96
+ - Chat interfaces in constrained environments
97
+ - Local daily-driver AI assistant
98
+ - Image/document description and understanding
99
+ - Translation and content generation
100
+ - Specialized agentic use cases
101
+ - Fine-tuning and specialization
102
+ - And more...
103
+
104
+ Bringing advanced AI capabilities to resource-constrained environments.
105
+
106
+ ## Ministral 3 Family
107
+
108
+ | Model Name | Type | Precision | Link |
109
+ |--------------------------------|--------------------|-----------|------------------------------------------------------------------------------------------|
110
+ | Ministral 3 3B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Base-2512) |
111
+ | Ministral 3 3B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512) |
112
+ | Ministral 3 3B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-3B-Reasoning-2512) |
113
+ | **Ministral 3 8B Base 2512** | **Base pre-trained** | **BF16** | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Base-2512) |
114
+ | Ministral 3 8B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Instruct-2512) |
115
+ | Ministral 3 8B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-8B-Reasoning-2512) |
116
+ | Ministral 3 14B Base 2512 | Base pre-trained | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Base-2512) |
117
+ | Ministral 3 14B Instruct 2512 | Instruct post-trained | FP8 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Instruct-2512) |
118
+ | Ministral 3 14B Reasoning 2512 | Reasoning capable | BF16 | [Hugging Face](https://huggingface.co/mistralai/Ministral-3-14B-Reasoning-2512) |
119
+
120
+ Other formats available [here](https://huggingface.co/collections/mistralai/ministral-3-additional-checkpoints).
121
+
122
+ ## Benchmark Results
123
+
124
+ We compare Ministral 3 to similarly sized models.
125
+
126
+ ### Reasoning
127
+
128
+ | Model | AIME25 | AIME24 | GPQA Diamond | LiveCodeBench |
129
+ |---------------------------|-------------|-------------|--------------|---------------|
130
+ | **Ministral 3 14B** | <u>0.850</u>| <u>0.898</u>| <u>0.712</u> | <u>0.646</u> |
131
+ | Qwen3-14B (Thinking) | 0.737 | 0.837 | 0.663 | 0.593 |
132
+ | | | | | |
133
+ | **Ministral 3 8B** | 0.787 | <u>0.860</u>| 0.668 | <u>0.616</u> |
134
+ | Qwen3-VL-8B-Thinking | <u>0.798</u>| <u>0.860</u>| <u>0.671</u> | 0.580 |
135
+ | | | | | |
136
+ | **Ministral 3 3B** | <u>0.721</u>| <u>0.775</u>| 0.534 | <u>0.548</u> |
137
+ | Qwen3-VL-4B-Thinking | 0.697 | 0.729 | <u>0.601</u> | 0.513 |
138
+
139
+ ### Instruct
140
+
141
+ | Model | Arena Hard | WildBench | MATH Maj@1 | MM MTBench |
142
+ |---------------------------|-------------|------------|-------------|------------------|
143
+ | **Ministral 3 14B** | <u>0.551</u>| <u>68.5</u>| <u>0.904</u>| <u>8.49</u> |
144
+ | Qwen3 14B (Non-Thinking) | 0.427 | 65.1 | 0.870 | NOT MULTIMODAL |
145
+ | Gemma3-12B-Instruct | 0.436 | 63.2 | 0.854 | 6.70 |
146
+ | | | | | |
147
+ | **Ministral 3 8B** | 0.509 | <u>66.8</u>| 0.876 | <u>8.08</u> |
148
+ | Qwen3-VL-8B-Instruct | <u>0.528</u>| 66.3 | <u>0.946</u>| 8.00 |
149
+ | | | | | |
150
+ | **Ministral 3 3B** | 0.305 | <u>56.8</u>| 0.830 | 7.83 |
151
+ | Qwen3-VL-4B-Instruct | <u>0.438</u>| <u>56.8</u>| <u>0.900</u>| <u>8.01</u> |
152
+ | Qwen3-VL-2B-Instruct | 0.163 | 42.2 | 0.786 | 6.36 |
153
+ | Gemma3-4B-Instruct | 0.318 | 49.1 | 0.759 | 5.23 |
154
+
155
+ ### Base
156
+
157
+ | Model | Multilingual MMLU | MATH CoT 2-Shot | AGIEval 5-shot | MMLU Redux 5-shot | MMLU 5-shot | TriviaQA 5-shot |
158
+ |---------------------|-------------------|-----------------|----------------|-------------------|-------------|-----------------|
159
+ | **Ministral 3 14B** | 0.742 | <u>0.676</u> | 0.648 | 0.820 | 0.794 | 0.749 |
160
+ | Qwen3 14B Base | <u>0.754</u> | 0.620 | <u>0.661</u> | <u>0.837</u> | <u>0.804</u>| 0.703 |
161
+ | Gemma 3 12B Base | 0.690 | 0.487 | 0.587 | 0.766 | 0.745 | <u>0.788</u> |
162
+ | | | | | | | |
163
+ | **Ministral 3 8B** | <u>0.706</u> | <u>0.626</u> | 0.591 | 0.793 | <u>0.761</u>| <u>0.681</u> |
164
+ | Qwen 3 8B Base | 0.700 | 0.576 | <u>0.596</u> | <u>0.794</u> | 0.760 | 0.639 |
165
+ | | | | | | | |
166
+ | **Ministral 3 3B** | 0.652 | <u>0.601</u> | 0.511 | 0.735 | 0.707 | 0.592 |
167
+ | Qwen 3 4B Base | <u>0.677</u> | 0.405 | <u>0.570</u> | <u>0.759</u> | <u>0.713</u>| 0.530 |
168
+ | Gemma 3 4B Base | 0.516 | 0.294 | 0.430 | 0.626 | 0.589 | <u>0.640</u> |
169
+
170
+ ## Usage
171
+
172
+ The model can be used with the following frameworks:
173
+ - [`vllm`](https://github.com/vllm-project/vllm): See [here](#vllm)
174
+ - [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)
175
+
176
+ ### vLLM
177
+
178
+ We recommend using this model with [vLLM](https://github.com/vllm-project/vllm).
179
+
180
+ #### Installation
181
+
182
+ Make sure to install **vllm >= 1.12.0**:
183
+
184
+ ```
185
+ pip install vllm --upgrade
186
+ ```
187
+
188
+ Doing so should automatically install [`mistral_common >= 1.8.6`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.6).
189
+
190
+ To check:
191
+ ```
192
+ python -c "import mistral_common; print(mistral_common.__version__)"
193
+ ```
194
+
195
+ You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile) or on the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest).
196
+
197
+ #### Serve
198
+
199
+ Due to their size and the BF16 format of their weights, `Ministral-3-3B-Base-2512` and `Ministral-3-8B-Base-2512` can each run on a single H200 GPU.
200
+
201
+ A simple launch command is:
202
+
203
+ ```bash
204
+ vllm serve mistralai/Ministral-3-8B-Base-2512 \
205
+ --tokenizer_mode mistral --config_format mistral --load_format mistral
206
+ ```
207
+
208
+ Additional flags:
209
+
210
+ * You can set `--max-model-len` to reduce memory usage. The default is `262144`, which is larger than most scenarios need.
211
+ * You can set `--max-num-batched-tokens` to balance throughput and latency: higher values give higher throughput but also higher latency.
212
+
213
+ #### Usage of the model
214
+
215
+ Here we assume that the model `mistralai/Ministral-3-8B-Base-2512` is served and reachable at `localhost` on port `8000`, the vLLM default.
216
+
217
+ <details>
218
+ <summary>Test Base</summary>
219
+
220
+ Quick test with the base model.
221
+
222
+ ```python
223
+ from openai import OpenAI
224
+
225
+ # Modify OpenAI's API key and API base to use vLLM's API server.
226
+ openai_api_key = "EMPTY"
227
+ openai_api_base = "http://localhost:8000/v1"
228
+
229
+ TEMP = 0.15
230
+ MAX_TOK = 256
231
+
232
+ client = OpenAI(
233
+ api_key=openai_api_key,
234
+ base_url=openai_api_base,
235
+ )
236
+
237
+ models = client.models.list()
238
+ model = models.data[0].id
239
+
240
+ response = client.completions.create(
241
+ model=model,
242
+ prompt="What is the best thing in the universe ?",
243
+ temperature=TEMP,
244
+ max_tokens=MAX_TOK,
245
+ )
246
+
247
+ print(response.choices[0].text)
248
+ ```
249
+
250
+ </details>
251
+
252
+ ### Transformers
253
+
254
+ You can also use Ministral 3 8B Base 2512 with `Transformers`!
255
+ Make sure to install `Transformers` from its first v5 release candidate or from "main":
256
+
257
+ ```
258
+ pip install transformers==5.0.0rc0
259
+ ```
260
+
261
+ To make the best use of our model with `Transformers` make sure to have [installed](https://github.com/mistralai/mistral-common) `mistral-common >= 1.8.6` to use our tokenizer.
262
+
263
+ ```bash
264
+ pip install mistral-common --upgrade
265
+ ```
266
+
267
+ Then load our tokenizer along with the model and generate:
268
+
269
+ <details>
270
+ <summary>Python snippet</summary>
271
+
272
+ ```python
273
+ from transformers import Mistral3ForConditionalGeneration, MistralCommonBackend
274
+
275
+ model_id = "mistralai/Ministral-3-8B-Base-2512"
276
+ model = Mistral3ForConditionalGeneration.from_pretrained(
277
+ model_id,
278
+ device_map="auto",
279
+ )
280
+ tokenizer = MistralCommonBackend.from_pretrained(model_id)
281
+
282
+ input_ids = tokenizer.encode("Once upon a time, France was a", return_tensors="pt")
283
+ input_ids = input_ids.to(model.device)
284
+
285
+ output = model.generate(
286
+ input_ids,
287
+ max_new_tokens=30,
288
+ )[0]
289
+
290
+ decoded_output = tokenizer.decode(output[len(input_ids[0]):])
291
+ print(decoded_output)
292
+ ```
293
+
294
+ </details>
295
+
296
+ ## License
297
+
298
+ This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
299
+
300
  *You must not use this model in a manner that infringes, misappropriates, or otherwise violates any third party’s rights, including intellectual property rights.*
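
The substantive change in this diff is the `model-index` block added to the README front matter. As a rough illustration of how those entries could be read back, the sketch below pulls `name` / `type` / `value` triples out of an excerpt of that block using only the standard library. The excerpt's indentation is reconstructed (the diff view above has lost it), and `FRONT_MATTER`, `METRIC_RE`, and `parse_metrics` are hypothetical names chosen for this example, not part of any Hugging Face tooling. A full YAML parser such as PyYAML's `yaml.safe_load` would be the more robust choice on a real model card.

```python
import re

# Excerpt of the `model-index` metrics added by this PR. Values are copied
# from the diff; the indentation is an assumption, since the diff view lost it.
FRONT_MATTER = """\
metrics:
- name: Artificial Analysis Intelligence Index
  type: artificial_analysis_intelligence_index
  value: 28.2
- name: Mmlu Pro
  type: mmlu_pro
  value: 0.642
- name: Gpqa
  type: gpqa
  value: 0.471
- name: Livecodebench
  type: livecodebench
  value: 0.303
- name: Aime 25
  type: aime_25
  value: 0.317
"""

# Hypothetical pattern: one `name` / `type` / `value` triple per metric entry.
METRIC_RE = re.compile(
    r"-\s+name:\s*(?P<name>.+?)\s*\n"
    r"\s*type:\s*(?P<type>\S+)\s*\n"
    r"\s*value:\s*(?P<value>[-+]?\d+(?:\.\d+)?)"
)


def parse_metrics(text: str) -> dict[str, float]:
    """Return a mapping from each metric's `type` to its numeric `value`."""
    return {m["type"]: float(m["value"]) for m in METRIC_RE.finditer(text)}


metrics = parse_metrics(FRONT_MATTER)
for metric_type, value in metrics.items():
    print(f"{metric_type}: {value}")
```

Note that the three index entries (for example `28.2`) appear to be on a different scale from the per-benchmark scores (for example `0.642`), so a consumer of this metadata would probably want to treat the two groups separately rather than compare them directly.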