flyingfishinwater committed (verified) · Commit 6920b32 · 1 Parent(s): 3c873ef

Update README.md

Files changed (1): README.md +257 -40

README.md CHANGED
@@ -10,7 +10,6 @@ on the App Store.
10
  More information is available on the [Privacy AI Official Site](https://privacyai.acmeup.com).
11
 
12
  ## Qwen3 4B Instruct 2507
13
-
14
  Qwen3-4B-Instruct-2507 is the latest 4B parameter model in the Qwen3 series, featuring significant improvements in reasoning, mathematics, science, coding, and tool usage. With 262K context length and strong multilingual support, it excels at instruction following, logical reasoning, and complex problem-solving tasks.
15
 
16
  **Model Intention:** Latest Qwen3-4B Instruct model with enhanced reasoning, logical thinking, mathematics, science, coding, and tool usage capabilities
@@ -30,7 +29,6 @@ Qwen3-4B-Instruct-2507 is the latest 4B parameter model in the Qwen3 series, fea
30
  **Context Length:** 2048 tokens
31
 
32
  **Prompt Format:**
33
-
34
  ```
35
 
36
  ```
@@ -47,7 +45,6 @@ Qwen3-4B-Instruct-2507 is the latest 4B parameter model in the Qwen3 series, fea
47
  ---
48
 
49
  ## Qwen3 4B Thinking 2507
50
-
51
  Qwen3-4B-Thinking-2507 is a specialized variant of the Qwen3-4B series with enhanced reasoning capabilities. It features thinking mode enabled by default, providing significantly improved performance on complex reasoning tasks including logical reasoning, mathematics, science, coding, and academic benchmarks with 262K context length.
52
 
53
  **Model Intention:** Advanced reasoning model with thinking mode enabled for complex logical reasoning, mathematics, science, and coding tasks
@@ -67,7 +64,6 @@ Qwen3-4B-Thinking-2507 is a specialized variant of the Qwen3-4B series with enha
67
  **Context Length:** 2048 tokens
68
 
69
  **Prompt Format:**
70
-
71
  ```
72
 
73
  ```
@@ -84,12 +80,11 @@ Qwen3-4B-Thinking-2507 is a specialized variant of the Qwen3-4B series with enha
84
  ---
85
 
86
  ## GLM Edge 4B Chat
87
-
88
  GLM-4 is the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. In evaluations covering semantics, mathematics, reasoning, code, and knowledge, GLM-4 outperforms Llama-3. In addition to multi-round conversations, GLM-4-Chat also offers advanced features such as web browsing, code execution, custom tool calls (Function Call), and long-text reasoning (supporting up to 128K context). This generation adds multilingual support for 26 languages, including Japanese, Korean, and German.
89
 
90
  **Model Intention:** It is the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI
91
 
92
- **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/glm-edge-4b-chat.Q4_K_M.gguf?download=true](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/glm-edge-4b-chat.Q4_K_M.gguf?download=true)
93
 
94
  **Model Info URL:** [https://huggingface.co/THUDM](https://huggingface.co/THUDM)
95
 
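Each **Model URL** above is a Hugging Face `resolve` link. The repo id, revision, and filename embedded in such a link can be recovered with a small helper (a sketch; the function name is illustrative), e.g. to pass to `huggingface_hub.hf_hub_download` instead of fetching the raw URL:

```python
from urllib.parse import urlparse

def split_hf_resolve_url(url: str):
    """Split a huggingface.co .../resolve/... URL into (repo_id, revision, filename)."""
    path = urlparse(url).path.lstrip("/")        # urlparse drops the ?download=true query
    repo_id, rest = path.split("/resolve/", 1)
    revision, filename = rest.split("/", 1)
    return repo_id, revision, filename

url = ("https://huggingface.co/flyingfishinwater/good_and_small_models"
       "/resolve/main/glm-edge-4b-chat.Q4_K_M.gguf?download=true")
print(split_hf_resolve_url(url))
# ('flyingfishinwater/good_and_small_models', 'main', 'glm-edge-4b-chat.Q4_K_M.gguf')
```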
@@ -104,7 +99,6 @@ GLM-4 is the latest generation of pre-trained models in the GLM-4 series launche
104
  **Context Length:** 1024 tokens
105
 
106
  **Prompt Format:**
107
-
108
  ```
109
  {% for item in messages %}{% if item['role'] == 'system' %}<|system|>
110
  {{ item['content'] }}{% elif item['role'] == 'user' %}<|user|>
@@ -125,12 +119,11 @@ GLM-4 is the latest generation of pre-trained models in the GLM-4 series launche
125
  ---
126
 
127
  ## Gemma 3n E2B it
128
-
129
  Gemma 3n models are designed for efficient execution on low-resource devices. They accept multimodal input (text, image, video, and audio) and generate text output, with open weights for pre-trained and instruction-tuned variants. These models were trained on data in over 140 spoken languages.
130
 
131
  **Model Intention:** Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input
132
 
133
- **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/gemma-3n-E2B-it-Q4_0.gguf?download=true](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/gemma-3n-E2B-it-Q4_0.gguf?download=true)
134
 
135
  **Model Info URL:** [https://huggingface.co/google/gemma-3n-E2B-it](https://huggingface.co/google/gemma-3n-E2B-it)
136
 
@@ -144,10 +137,9 @@ Gemma 3n models are designed for efficient execution on low-resource devices. Th
144
 
145
  **File Size:** 2720 MB
146
 
147
- **Context Length:** 8000 tokens
148
 
149
  **Prompt Format:**
150
-
151
  ```
152
 
153
  ```
@@ -164,12 +156,11 @@ Gemma 3n models are designed for efficient execution on low-resource devices. Th
164
  ---
165
 
166
  ## SmolLM3 3B
167
-
168
  SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale. A decoder-only transformer using GQA and NoPE (with a 3:1 ratio), it was pretrained on 11.2T tokens with a staged curriculum of web, code, math, and reasoning data. Post-training included midtraining on 140B reasoning tokens.
169
 
170
  **Model Intention:** SmolLM3 is a 3B parameter language model designed to push the boundaries of small models. It supports 6 languages (English, French, Spanish, German, Italian, and Portuguese), advanced reasoning and long context.
171
 
172
- **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/SmolLM3-Q4_K_M.gguf?download=true](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/SmolLM3-Q4_K_M.gguf?download=true)
173
 
174
  **Model Info URL:** [https://huggingface.co/HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B)
175
 
@@ -184,7 +175,6 @@ SmolLM3 is a fully open model that offers strong performance at the 3B–4B scal
184
  **Context Length:** 2048 tokens
185
 
186
  **Prompt Format:**
187
-
188
  ```
189
 
190
  ```
@@ -201,12 +191,11 @@ SmolLM3 is a fully open model that offers strong performance at the 3B–4B scal
201
  ---
202
 
203
  ## Phi4 mini 4B
204
-
205
  Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model is intended for broad multilingual commercial and research use. It suits general-purpose AI systems and applications that require: (1) memory/compute-constrained environments; (2) latency-bound scenarios; (3) strong reasoning (especially math and logic). The model is designed to accelerate research on language and multimodal models, for use as a building block for generative-AI-powered features.
206
 
207
  **Model Intention:** Phi-4-mini-instruct is a lightweight model focused on high-quality, reasoning dense data. It supports 128K token context length
208
 
209
- **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Phi-4-mini-instruct-Q4_K_M.gguf?download=true](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Phi-4-mini-instruct-Q4_K_M.gguf?download=true)
210
 
211
  **Model Info URL:** [https://huggingface.co/microsoft/Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct)
212
 
@@ -221,7 +210,6 @@ Phi-4-mini-instruct is a lightweight open model built upon synthetic data and fi
221
  **Context Length:** 2048 tokens
222
 
223
  **Prompt Format:**
224
-
225
  ```
226
  {% for message in messages %}{% if message['role'] == 'system' and 'tools' in message and message['tools'] is not none %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|tool|>' + message['tools'] + '<|/tool|>' + '<|end|>' }}{% else %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|end|>' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>' }}{% else %}{{ eos_token }}{% endif %}
227
  ```
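The Jinja template above can be mirrored in plain Python to make the rendered prompt concrete (a minimal sketch; `build_phi4_prompt` and the sample messages are illustrative, not part of the model card):

```python
def build_phi4_prompt(messages, add_generation_prompt=True, eos_token="<|endoftext|>"):
    """Render messages the way the Phi-4-mini Jinja template above does."""
    parts = []
    for m in messages:
        if m["role"] == "system" and m.get("tools") is not None:
            # A system turn carrying tool definitions gets an extra <|tool|> block.
            parts.append(f"<|{m['role']}|>{m['content']}<|tool|>{m['tools']}<|/tool|><|end|>")
        else:
            parts.append(f"<|{m['role']}|>{m['content']}<|end|>")
    parts.append("<|assistant|>" if add_generation_prompt else eos_token)
    return "".join(parts)

msgs = [{"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hi"}]
print(build_phi4_prompt(msgs))
# <|system|>You are helpful.<|end|><|user|>Hi<|end|><|assistant|>
```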
@@ -238,12 +226,11 @@ Phi-4-mini-instruct is a lightweight open model built upon synthetic data and fi
238
  ---
239
 
240
  ## Qwen3 1.7B
241
-
242
  Qwen3 1.7B is one of the small models in the Qwen series, designed for efficiency and speed. It can run seamlessly on edge devices, enabling rapid inference and real-time applications. This compact model is ideal for testing scenarios, prototyping, or deployment in resource-constrained environments.
243
 
244
  **Model Intention:** The 1.7B model in the Qwen3 series is a small model designed for fast predictions and function calls.
245
 
246
- **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-1.7B-Q4_K_M.gguf?download=true](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-1.7B-Q4_K_M.gguf?download=true)
247
 
248
  **Model Info URL:** [https://huggingface.co/Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
249
 
@@ -258,7 +245,6 @@ Qwen3 1.7B is one of the small models in the Qwen series, designed for efficienc
258
  **Context Length:** 2048 tokens
259
 
260
  **Prompt Format:**
261
-
262
  ```
263
 
264
  ```
@@ -275,12 +261,11 @@ Qwen3 1.7B is one of the small models in the Qwen series, designed for efficienc
275
  ---
276
 
277
  ## ERNIE-4.5 0.3B
278
-
279
  ERNIE 4.5 is a series of open-source models created by Baidu. The advanced capabilities of the ERNIE 4.5 models, particularly the MoE-based A47B and A3B series, are underpinned by several key technical innovations: 1. Multimodal Heterogeneous MoE Pre-Training; 2. Scaling-Efficient Infrastructure; 3. Modality-Specific Post-Training
280
 
281
  **Model Intention:** ERNIE-4.5-0.3B-Base is a text dense Base model for testing the model's architecture.
282
 
283
- **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/ERNIE-4.5-0.3B-PT-Q4_0.gguf?download=true](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/ERNIE-4.5-0.3B-PT-Q4_0.gguf?download=true)
284
 
285
  **Model Info URL:** [https://huggingface.co/baidu/ERNIE-4.5-0.3B-Base-PT](https://huggingface.co/baidu/ERNIE-4.5-0.3B-Base-PT)
286
 
@@ -295,7 +280,6 @@ ERNIE 4.5 is a series of open source models created by Baidu. The advanced capab
295
  **Context Length:** 2048 tokens
296
 
297
  **Prompt Format:**
298
-
299
  ```
300
 
301
  ```
@@ -312,12 +296,11 @@ ERNIE 4.5 is a series of open source models created by Baidu. The advanced capab
312
  ---
313
 
314
  ## LFM2 1.2B
315
-
316
  LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency. LFM2 is a hybrid Liquid model with multiplicative gates and short convolutions. Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.
317
 
318
  **Model Intention:** LFM2 1.2B is particularly suited for agentic tasks, data extraction, RAG, creative writing, and multi-turn conversations
319
 
320
- **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2-1.2B-Q4_0.gguf?download=true](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2-1.2B-Q4_0.gguf?download=true)
321
 
322
  **Model Info URL:** [https://huggingface.co/LiquidAI/LFM2-1.2B](https://huggingface.co/LiquidAI/LFM2-1.2B)
323
 
@@ -329,10 +312,9 @@ LFM2 is a new generation of hybrid models developed by Liquid AI, specifically d
329
 
330
  **File Size:** 696 MB
331
 
332
- **Context Length:** 1024 tokens
333
 
334
  **Prompt Format:**
335
-
336
  ```
337
 
338
  ```
@@ -349,12 +331,11 @@ LFM2 is a new generation of hybrid models developed by Liquid AI, specifically d
349
  ---
350
 
351
  ## Jan v1 4B
352
-
353
  Jan-v1-4B is an advanced agentic language model with 4.02 billion parameters, built on Qwen3-4B-Thinking. It is specifically designed for agentic reasoning and problem-solving, optimized for integration with Jan App. The model achieves strong performance on chat and question-answering benchmarks with improved reasoning capabilities, making it ideal for complex task automation and intelligent agent applications.
354
 
355
  **Model Intention:** Advanced agentic language model optimized for reasoning and problem-solving with 91.1% accuracy on question answering
356
 
357
- **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Jan-v1-4B-Q4_0.gguf?download=true](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Jan-v1-4B-Q4_0.gguf?download=true)
358
 
359
  **Model Info URL:** [https://huggingface.co/janhq/Jan-v1-4B](https://huggingface.co/janhq/Jan-v1-4B)
360
 
@@ -369,7 +350,6 @@ Jan-v1-4B is an advanced agentic language model with 4.02 billion parameters, bu
369
  **Context Length:** 2048 tokens
370
 
371
  **Prompt Format:**
372
-
373
  ```
374
 
375
  ```
@@ -386,7 +366,6 @@ Jan-v1-4B is an advanced agentic language model with 4.02 billion parameters, bu
386
  ---
387
 
388
  ## Menlo Lucy 1.7B
389
-
390
  Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing. It is built on Qwen3-1.7B and optimized to run efficiently on mobile devices, even with CPU-only configurations. It was developed by Alan Dao, Bach Vu Dinh, Alex Nguyen, and Norapat Buppodom.
391
 
392
  **Model Intention:** Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing.
@@ -406,7 +385,6 @@ Lucy is a compact but capable 1.7B model focused on agentic web search and light
406
  **Context Length:** 2048 tokens
407
 
408
  **Prompt Format:**
409
-
410
  ```
411
 
412
  ```
@@ -423,7 +401,6 @@ Lucy is a compact but capable 1.7B model focused on agentic web search and light
423
  ---
424
 
425
  ## Nemotron 1.5B
426
-
427
  OpenReasoning-Nemotron-1.5B is a large language model (LLM) derived from Qwen2.5-1.5B-Instruct. It is a reasoning model post-trained for reasoning in math, code, and science solution generation. This model is ready for commercial/non-commercial research use.
428
 
429
  **Model Intention:** It is a reasoning model that is post-trained for reasoning about math, code and science solution generation.
@@ -443,7 +420,6 @@ OpenReasoning-Nemotron-1.5B is a large language model (LLM) which is a derivativ
443
  **Context Length:** 2048 tokens
444
 
445
  **Prompt Format:**
446
-
447
  ```
448
 
449
  ```
@@ -460,7 +436,6 @@ OpenReasoning-Nemotron-1.5B is a large language model (LLM) which is a derivativ
460
  ---
461
 
462
  ## Qwen3 1.7B Uncensored
463
-
464
  Qwen3 1.7B Uncensored is an unrestricted variant designed for creative writing and storytelling without content limitations. It excels at generating fiction stories, horror narratives, plot development, scene continuation, and roleplaying scenarios. This model provides unfiltered responses and can produce intense or graphic content, making it suitable for users seeking unrestricted AI interactions for creative purposes.
465
 
466
  **Model Intention:** An uncensored 1.7B model optimized for creative writing, fiction stories, horror narratives, and unrestricted conversational scenarios.
@@ -480,7 +455,6 @@ Qwen3 1.7B Uncensored is an unrestricted variant designed for creative writing a
480
  **Context Length:** 2048 tokens
481
 
482
  **Prompt Format:**
483
-
484
  ```
485
 
486
  ```
@@ -497,7 +471,6 @@ Qwen3 1.7B Uncensored is an unrestricted variant designed for creative writing a
497
  ---
498
 
499
  ## Gemma 3 270M
500
-
501
  Gemma 3 270M is an ultra-compact transformer model with 268M parameters, designed for efficient deployment on mobile and edge devices. Part of Google's Gemma family, it offers strong performance for its size with 32K context length, multilingual support, and responsible AI design. Ideal for applications requiring fast inference with minimal computational resources while maintaining quality text generation capabilities.
502
 
503
  **Model Intention:** Ultra-compact 270M parameter model optimized for resource-constrained environments with 32K context length
@@ -512,17 +485,86 @@ Gemma 3 270M is an ultra-compact transformer model with 268M parameters, designe
512
 
513
  **Developer:** [https://huggingface.co/google](https://huggingface.co/google)
514
 
515
- **File Size:** 160 MB
516
 
517
  **Context Length:** 2048 tokens
518
 
519
  **Prompt Format:**
 
520
 
521
  ```
522
 
523
  ```
524
 
525
- **Template Name:** gemma
526
 
527
  **Add BOS Token:** Yes
528
 
@@ -531,4 +573,179 @@ Gemma 3 270M is an ultra-compact transformer model with 268M parameters, designe
531
  **Parse Special Tokens:** Yes
532
 
533
 
534
- ---
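The **Template Name:** gemma above refers to Gemma's turn-based chat format, where each turn is wrapped in `<start_of_turn>`/`<end_of_turn>` and the assistant role is called `model`. A minimal sketch (the helper name and sample messages are illustrative; the leading `<bos>` token is handled by the runtime, per **Add BOS Token:** Yes):

```python
def build_gemma_prompt(messages, add_generation_prompt=True):
    """Assemble a Gemma-style chat prompt from role/content messages."""
    out = ""
    for m in messages:
        # Gemma uses the role name "model" instead of "assistant".
        role = "model" if m["role"] == "assistant" else m["role"]
        out += f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n"
    if add_generation_prompt:
        out += "<start_of_turn>model\n"
    return out

print(build_gemma_prompt([{"role": "user", "content": "Hi"}]))
# <start_of_turn>user
# Hi<end_of_turn>
# <start_of_turn>model
```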
10
  More information is available on the [Privacy AI Official Site](https://privacyai.acmeup.com).
11
 
12
  ## Qwen3 4B Instruct 2507
 
13
  Qwen3-4B-Instruct-2507 is the latest 4B parameter model in the Qwen3 series, featuring significant improvements in reasoning, mathematics, science, coding, and tool usage. With 262K context length and strong multilingual support, it excels at instruction following, logical reasoning, and complex problem-solving tasks.
14
 
15
  **Model Intention:** Latest Qwen3-4B Instruct model with enhanced reasoning, logical thinking, mathematics, science, coding, and tool usage capabilities
 
29
  **Context Length:** 2048 tokens
30
 
31
  **Prompt Format:**
 
32
  ```
33
 
34
  ```
 
45
  ---
46
 
47
  ## Qwen3 4B Thinking 2507
 
48
  Qwen3-4B-Thinking-2507 is a specialized variant of the Qwen3-4B series with enhanced reasoning capabilities. It features thinking mode enabled by default, providing significantly improved performance on complex reasoning tasks including logical reasoning, mathematics, science, coding, and academic benchmarks with 262K context length.
49
 
50
  **Model Intention:** Advanced reasoning model with thinking mode enabled for complex logical reasoning, mathematics, science, and coding tasks
 
64
  **Context Length:** 2048 tokens
65
 
66
  **Prompt Format:**
 
67
  ```
68
 
69
  ```
 
80
  ---
81
 
82
  ## GLM Edge 4B Chat
 
83
  GLM-4 is the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. In evaluations covering semantics, mathematics, reasoning, code, and knowledge, GLM-4 outperforms Llama-3. In addition to multi-round conversations, GLM-4-Chat also offers advanced features such as web browsing, code execution, custom tool calls (Function Call), and long-text reasoning (supporting up to 128K context). This generation adds multilingual support for 26 languages, including Japanese, Korean, and German.
84
 
85
  **Model Intention:** It is the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI
86
 
87
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/glm-edge-4b-chat.Q4_K_M.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/glm-edge-4b-chat.Q4_K_M.gguf)
88
 
89
  **Model Info URL:** [https://huggingface.co/THUDM](https://huggingface.co/THUDM)
90
 
 
99
  **Context Length:** 1024 tokens
100
 
101
  **Prompt Format:**
 
102
  ```
103
  {% for item in messages %}{% if item['role'] == 'system' %}<|system|>
104
  {{ item['content'] }}{% elif item['role'] == 'user' %}<|user|>
 
119
  ---
120
 
121
  ## Gemma 3n E2B it
 
122
  Gemma 3n models are designed for efficient execution on low-resource devices. They accept multimodal input (text, image, video, and audio) and generate text output, with open weights for pre-trained and instruction-tuned variants. These models were trained on data in over 140 spoken languages.
123
 
124
  **Model Intention:** Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input
125
 
126
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/gemma-3n-E2B-it-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/gemma-3n-E2B-it-Q4_0.gguf)
127
 
128
  **Model Info URL:** [https://huggingface.co/google/gemma-3n-E2B-it](https://huggingface.co/google/gemma-3n-E2B-it)
129
 
 
137
 
138
  **File Size:** 2720 MB
139
 
140
+ **Context Length:** 4096 tokens
141
 
142
  **Prompt Format:**
 
143
  ```
144
 
145
  ```
 
156
  ---
157
 
158
  ## SmolLM3 3B
 
159
  SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale. A decoder-only transformer using GQA and NoPE (with a 3:1 ratio), it was pretrained on 11.2T tokens with a staged curriculum of web, code, math, and reasoning data. Post-training included midtraining on 140B reasoning tokens.
160
 
161
  **Model Intention:** SmolLM3 is a 3B parameter language model designed to push the boundaries of small models. It supports 6 languages (English, French, Spanish, German, Italian, and Portuguese), advanced reasoning and long context.
162
 
163
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/SmolLM3-Q4_K_M.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/SmolLM3-Q4_K_M.gguf)
164
 
165
  **Model Info URL:** [https://huggingface.co/HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B)
166
 
 
175
  **Context Length:** 2048 tokens
176
 
177
  **Prompt Format:**
 
178
  ```
179
 
180
  ```
 
191
  ---
192
 
193
  ## Phi4 mini 4B
 
194
  Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model is intended for broad multilingual commercial and research use. It suits general-purpose AI systems and applications that require: (1) memory/compute-constrained environments; (2) latency-bound scenarios; (3) strong reasoning (especially math and logic). The model is designed to accelerate research on language and multimodal models, for use as a building block for generative-AI-powered features.
195
 
196
  **Model Intention:** Phi-4-mini-instruct is a lightweight model focused on high-quality, reasoning dense data. It supports 128K token context length
197
 
198
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Phi-4-mini-instruct-Q4_K_M.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Phi-4-mini-instruct-Q4_K_M.gguf)
199
 
200
  **Model Info URL:** [https://huggingface.co/microsoft/Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct)
201
 
 
210
  **Context Length:** 2048 tokens
211
 
212
  **Prompt Format:**
 
213
  ```
214
  {% for message in messages %}{% if message['role'] == 'system' and 'tools' in message and message['tools'] is not none %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|tool|>' + message['tools'] + '<|/tool|>' + '<|end|>' }}{% else %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|end|>' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>' }}{% else %}{{ eos_token }}{% endif %}
215
  ```
 
226
  ---
227
 
228
  ## Qwen3 1.7B
 
229
  Qwen3 1.7B is one of the small models in the Qwen series, designed for efficiency and speed. It can run seamlessly on edge devices, enabling rapid inference and real-time applications. This compact model is ideal for testing scenarios, prototyping, or deployment in resource-constrained environments.
230
 
231
  **Model Intention:** The 1.7B model in the Qwen3 series is a small model designed for fast predictions and function calls.
232
 
233
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-1.7B-Q4_K_M.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-1.7B-Q4_K_M.gguf)
234
 
235
  **Model Info URL:** [https://huggingface.co/Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
236
 
 
245
  **Context Length:** 2048 tokens
246
 
247
  **Prompt Format:**
 
248
  ```
249
 
250
  ```
 
261
  ---
262
 
263
  ## ERNIE-4.5 0.3B
 
264
  ERNIE 4.5 is a series of open-source models created by Baidu. The advanced capabilities of the ERNIE 4.5 models, particularly the MoE-based A47B and A3B series, are underpinned by several key technical innovations: 1. Multimodal Heterogeneous MoE Pre-Training; 2. Scaling-Efficient Infrastructure; 3. Modality-Specific Post-Training
265
 
266
  **Model Intention:** ERNIE-4.5-0.3B-Base is a text dense Base model for testing the model's architecture.
267
 
268
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/ERNIE-4.5-0.3B-PT-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/ERNIE-4.5-0.3B-PT-Q4_0.gguf)
269
 
270
  **Model Info URL:** [https://huggingface.co/baidu/ERNIE-4.5-0.3B-Base-PT](https://huggingface.co/baidu/ERNIE-4.5-0.3B-Base-PT)
271
 
 
280
  **Context Length:** 2048 tokens
281
 
282
  **Prompt Format:**
 
283
  ```
284
 
285
  ```
 
296
  ---
297
 
298
  ## LFM2 1.2B
 
299
  LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency. LFM2 is a hybrid Liquid model with multiplicative gates and short convolutions. Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.
300
 
301
  **Model Intention:** LFM2 1.2B is particularly suited for agentic tasks, data extraction, RAG, creative writing, and multi-turn conversations
302
 
303
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2-1.2B-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2-1.2B-Q4_0.gguf)
304
 
305
  **Model Info URL:** [https://huggingface.co/LiquidAI/LFM2-1.2B](https://huggingface.co/LiquidAI/LFM2-1.2B)
306
 
 
312
 
313
  **File Size:** 696 MB
314
 
315
+ **Context Length:** 4096 tokens
316
 
317
  **Prompt Format:**
 
318
  ```
319
 
320
  ```
 
331
  ---
332
 
333
  ## Jan v1 4B
 
334
  Jan-v1-4B is an advanced agentic language model with 4.02 billion parameters, built on Qwen3-4B-Thinking. It is specifically designed for agentic reasoning and problem-solving, optimized for integration with Jan App. The model achieves strong performance on chat and question-answering benchmarks with improved reasoning capabilities, making it ideal for complex task automation and intelligent agent applications.
335
 
336
  **Model Intention:** Advanced agentic language model optimized for reasoning and problem-solving with 91.1% accuracy on question answering
337
 
338
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Jan-v1-4B-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Jan-v1-4B-Q4_0.gguf)
339
 
340
  **Model Info URL:** [https://huggingface.co/janhq/Jan-v1-4B](https://huggingface.co/janhq/Jan-v1-4B)
341
 
 
350
  **Context Length:** 2048 tokens
351
 
352
  **Prompt Format:**
 
353
  ```
354
 
355
  ```
 
366
  ---
367
 
368
  ## Menlo Lucy 1.7B
 
369
  Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing. It is built on Qwen3-1.7B and optimized to run efficiently on mobile devices, even with CPU-only configurations. It was developed by Alan Dao, Bach Vu Dinh, Alex Nguyen, and Norapat Buppodom.
370
 
371
  **Model Intention:** Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing.
 
385
  **Context Length:** 2048 tokens
386
 
387
  **Prompt Format:**
 
388
  ```
389
 
390
  ```
 
401
  ---
402
 
403
  ## Nemotron 1.5B
 
404
  OpenReasoning-Nemotron-1.5B is a large language model (LLM) derived from Qwen2.5-1.5B-Instruct. It is a reasoning model post-trained for reasoning in math, code, and science solution generation. This model is ready for commercial/non-commercial research use.
405
 
406
  **Model Intention:** It is a reasoning model that is post-trained for reasoning about math, code and science solution generation.
 
420
  **Context Length:** 2048 tokens
421
 
422
  **Prompt Format:**
 
423
  ```
424
 
425
  ```
 
436
  ---
437
 
438
  ## Qwen3 1.7B Uncensored
 
439
  Qwen3 1.7B Uncensored is an unrestricted variant designed for creative writing and storytelling without content limitations. It excels at generating fiction stories, horror narratives, plot development, scene continuation, and roleplaying scenarios. This model provides unfiltered responses and can produce intense or graphic content, making it suitable for users seeking unrestricted AI interactions for creative purposes.
440
 
441
  **Model Intention:** An uncensored 1.7B model optimized for creative writing, fiction stories, horror narratives, and unrestricted conversational scenarios.
 
455
  **Context Length:** 2048 tokens
456
 
457
  **Prompt Format:**
 
458
  ```
459
 
460
  ```
 
471
  ---
472
 
473
  ## Gemma 3 270M
 
474
  Gemma 3 270M is an ultra-compact transformer model with 268M parameters, designed for efficient deployment on mobile and edge devices. Part of Google's Gemma family, it offers strong performance for its size with 32K context length, multilingual support, and responsible AI design. Ideal for applications requiring fast inference with minimal computational resources while maintaining quality text generation capabilities.
475
 
476
  **Model Intention:** Ultra-compact 270M parameter model optimized for resource-constrained environments with 32K context length
 
485
 
486
  **Developer:** [https://huggingface.co/google](https://huggingface.co/google)
487
 
488
+ **File Size:** 245 MB
489
+
490
+ **Context Length:** 4096 tokens
491
+
492
+ **Prompt Format:**
493
+ ```
494
+
495
+ ```
496
+
497
+ **Template Name:** gemma
498
+
499
+ **Add BOS Token:** Yes
500
+
501
+ **Add EOS Token:** No
502
+
503
+ **Parse Special Tokens:** Yes
504
+
505
+
506
+ ---
507
+
508
+ ## LFM2 2.6B
+ LFM2-2.6B is a next-generation hybrid model by Liquid AI with 2.6B parameters, designed for edge AI and on-device deployment. It features multiplicative gates and short convolutions, offering 3x faster training and 2x faster decode/prefill speed on CPU. The model excels at agentic tasks, data extraction, RAG, creative writing, and multi-turn conversations. It supports 8 languages (English, Arabic, Chinese, French, German, Japanese, Korean, Spanish) with 32,768 context length and runs efficiently on CPU, GPU, and NPU hardware.
+
+ **Model Intention:** Advanced hybrid model with 3x faster training and 2x faster inference, optimized for agentic tasks, RAG, and multi-turn conversations
+
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2-2.6B-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2-2.6B-Q4_0.gguf)
+
+ **Model Info URL:** [https://huggingface.co/LiquidAI/LFM2-2.6B](https://huggingface.co/LiquidAI/LFM2-2.6B)
+
+ **Model License:** [License Info](https://huggingface.co/LiquidAI/LFM2-2.6B/raw/main/LICENSE)
+
+ **Model Description:** LFM2-2.6B is a next-generation hybrid model by Liquid AI with 2.6B parameters, designed for edge AI and on-device deployment. It features multiplicative gates and short convolutions, offering 3x faster training and 2x faster decode/prefill speed on CPU. The model excels at agentic tasks, data extraction, RAG, creative writing, and multi-turn conversations. It supports 8 languages (English, Arabic, Chinese, French, German, Japanese, Korean, Spanish) with 32,768 context length and runs efficiently on CPU, GPU, and NPU hardware.
+
+ **Developer:** [https://huggingface.co/LiquidAI](https://huggingface.co/LiquidAI)
+
+ **File Size:** 1500 MB
 
 **Context Length:** 2048 tokens
 
 **Prompt Format:**
+ ```
 
 ```
 
+ **Template Name:** chatml
+
+ **Add BOS Token:** Yes
+
+ **Add EOS Token:** No
+
+ **Parse Special Tokens:** Yes
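Since the Prompt Format block above is empty, here is a minimal sketch of the ChatML layout the `chatml` template name refers to (assuming the standard `<|im_start|>` / `<|im_end|>` markers; the `chatml_prompt` helper is illustrative, not part of the model card):

```python
def chatml_prompt(messages):
    """Render [{"role": ..., "content": ...}] turns in ChatML form."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # open the assistant turn for generation
    return "".join(parts)
```

With **Parse Special Tokens** set to Yes, the `<|im_start|>` / `<|im_end|>` markers map to dedicated special tokens instead of being tokenized as plain text.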
+
+
+ ---
+
+ ## LFM2-VL 1.6B
+ LFM2-VL-1.6B is an advanced multimodal vision-language model by Liquid AI featuring a 1.3B language model with 297M vision encoder. It processes images up to 512×512 pixels with variable resolutions, offers fast inference speed with superior performance compared to the 450M version, and supports 32,768 context length. Optimized for edge AI deployment with hybrid conv+attention architecture and SigLIP2 NaFlex vision encoder, providing enhanced reasoning and understanding capabilities.
+
+ **Model Intention:** Enhanced multimodal vision-language model with improved reasoning capabilities, optimized for edge AI and low-latency applications
+
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2-VL-1.6B-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2-VL-1.6B-Q4_0.gguf)
+
+ **Model Info URL:** [https://huggingface.co/LiquidAI/LFM2-VL-1.6B](https://huggingface.co/LiquidAI/LFM2-VL-1.6B)
+
+ **Model License:** [License Info](https://huggingface.co/LiquidAI/LFM2-VL-1.6B/raw/main/LICENSE)
+
+ **Model Description:** LFM2-VL-1.6B is an advanced multimodal vision-language model by Liquid AI featuring a 1.3B language model with 297M vision encoder. It processes images up to 512×512 pixels with variable resolutions, offers fast inference speed with superior performance compared to the 450M version, and supports 32,768 context length. Optimized for edge AI deployment with hybrid conv+attention architecture and SigLIP2 NaFlex vision encoder, providing enhanced reasoning and understanding capabilities.
+
+ **Developer:** [https://huggingface.co/LiquidAI](https://huggingface.co/LiquidAI)
+
+ **File Size:** 900 MB
+
+ **Context Length:** 4096 tokens
+
+ **Prompt Format:**
 ```
 
+ ```
+
+ **Template Name:** chatml
 
 **Add BOS Token:** Yes
 
 **Add EOS Token:** No
 
 **Parse Special Tokens:** Yes
 
 
+ ---
+
+ ## Qwen2.5-VL 3B Instruct
+ Qwen2.5-VL-3B-Instruct is a multimodal vision-language model with 3.09B parameters, featuring enhanced capabilities in coding, mathematics, and instruction following. It supports 29+ languages with up to 128K context length and 8K generation tokens. The model uses transformer architecture with RoPE, SwiGLU, and RMSNorm, offering improved resilience to diverse system prompts and specialized structured data understanding.
+
+ **Model Intention:** Multimodal vision-language model with enhanced instruction following, coding, mathematics, and multilingual capabilities up to 128K context
+
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf)
+
+ **Model Info URL:** [https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF)
+
+ **Model License:** [License Info](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct/raw/main/LICENSE)
+
+ **Model Description:** Qwen2.5-VL-3B-Instruct is a multimodal vision-language model with 3.09B parameters, featuring enhanced capabilities in coding, mathematics, and instruction following. It supports 29+ languages with up to 128K context length and 8K generation tokens. The model uses transformer architecture with RoPE, SwiGLU, and RMSNorm, offering improved resilience to diverse system prompts and specialized structured data understanding.
+
+ **Developer:** [https://huggingface.co/Qwen](https://huggingface.co/Qwen)
+
+ **File Size:** 1930 MB
+
+ **Context Length:** 2048 tokens
+
+ **Prompt Format:**
+ ```
+
+ ```
+
+ **Template Name:** qwen
+
+ **Add BOS Token:** Yes
+
+ **Add EOS Token:** No
+
+ **Parse Special Tokens:** Yes
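The `qwen` template is ChatML-based. As an illustrative sketch of how it differs from plain ChatML (the default system prompt shown is an assumption drawn from common Qwen chat templates, and the `qwen_prompt` helper name is ours):

```python
def qwen_prompt(messages, system="You are a helpful assistant."):
    """ChatML-style prompt with a leading system turn, as Qwen chat templates use.

    The system prompt default is an assumption, not taken from this model card.
    """
    turns = [{"role": "system", "content": system}] + list(messages)
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in turns]
    parts.append("<|im_start|>assistant\n")  # open the assistant turn for generation
    return "".join(parts)
```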
+
+
+ ---
+
+ ## Qwen3-VL 4B Instruct
+ Qwen3-VL-4B-Instruct is a multimodal vision-language model with 4B parameters, featuring enhanced capabilities in instruction following, coding, mathematics, and multilingual understanding. It supports both image and text processing with strong reasoning capabilities, making it ideal for applications requiring visual understanding and text generation.
+
+ **Model Intention:** Multimodal vision-language model with enhanced instruction following, coding, mathematics, and multilingual capabilities
+
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-VL-4B-Instruct-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-VL-4B-Instruct-Q4_0.gguf)
+
+ **Model Info URL:** [https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct)
+
+ **Model License:** [License Info](https://www.apache.org/licenses/LICENSE-2.0.txt)
+
+ **Model Description:** Qwen3-VL-4B-Instruct is a multimodal vision-language model with 4B parameters, featuring enhanced capabilities in instruction following, coding, mathematics, and multilingual understanding. It supports both image and text processing with strong reasoning capabilities, making it ideal for applications requiring visual understanding and text generation.
+
+ **Developer:** [https://huggingface.co/Qwen](https://huggingface.co/Qwen)
+
+ **File Size:** 2400 MB
+
+ **Context Length:** 2048 tokens
+
+ **Prompt Format:**
+ ```
+
+ ```
+
+ **Template Name:** qwen
+
+ **Add BOS Token:** Yes
+
+ **Add EOS Token:** No
+
+ **Parse Special Tokens:** Yes
+
+
+ ---
+
+ ## Qwen3-VL 4B Thinking
+ Qwen3-VL-4B-Thinking is a specialized multimodal vision-language model with enhanced reasoning capabilities and thinking mode. It excels at complex visual reasoning tasks including mathematical problem solving, scientific analysis, coding with visual inputs, and intricate logical reasoning. The thinking mode enables step-by-step problem solving with both images and text, making it ideal for applications requiring deep analytical capabilities and visual understanding.
+
+ **Model Intention:** Advanced multimodal reasoning model with thinking mode for complex visual reasoning, mathematics, and scientific tasks
+
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-VL-4B-Thinking-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-VL-4B-Thinking-Q4_0.gguf)
+
+ **Model Info URL:** [https://huggingface.co/Qwen/Qwen3-VL-4B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-4B-Thinking)
+
+ **Model License:** [License Info](https://www.apache.org/licenses/LICENSE-2.0.txt)
+
+ **Model Description:** Qwen3-VL-4B-Thinking is a specialized multimodal vision-language model with enhanced reasoning capabilities and thinking mode. It excels at complex visual reasoning tasks including mathematical problem solving, scientific analysis, coding with visual inputs, and intricate logical reasoning. The thinking mode enables step-by-step problem solving with both images and text, making it ideal for applications requiring deep analytical capabilities and visual understanding.
+
+ **Developer:** [https://huggingface.co/Qwen](https://huggingface.co/Qwen)
+
+ **File Size:** 2100 MB
+
+ **Context Length:** 2048 tokens
+
+ **Prompt Format:**
+ ```
+
+ ```
+
+ **Template Name:** qwen
+
+ **Add BOS Token:** Yes
+
+ **Add EOS Token:** No
+
+ **Parse Special Tokens:** Yes
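Thinking-mode models interleave their step-by-step reasoning with the final answer. A minimal sketch of separating the two, assuming the reasoning is wrapped in `<think>...</think>` tags as in other Qwen3 thinking variants (the tag name and the `split_thinking` helper are assumptions, not taken from this model card):

```python
import re

def split_thinking(text):
    """Split model output into (reasoning, answer).

    Returns (None, text) when no <think> block is present. The tag name is
    an assumption based on common Qwen thinking-model output.
    """
    m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not m:
        return None, text.strip()
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()  # everything after the closing tag
    return reasoning, answer
```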
+
+
+ ---
+
+ ## Qwen3-VL 2B Instruct
+ Qwen3-VL-2B-Instruct is a compact multimodal vision-language model with 2B parameters, designed for efficient deployment while maintaining strong performance in visual understanding and text generation. It supports both image and text processing with enhanced instruction following capabilities, making it ideal for applications requiring visual understanding with resource constraints. The model offers multilingual support and robust reasoning capabilities.
+
+ **Model Intention:** Compact multimodal vision-language model with enhanced instruction following, optimized for efficient deployment
+
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-VL-2B-Instruct-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-VL-2B-Instruct-Q4_0.gguf)
+
+ **Model Info URL:** [https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct)
+
+ **Model License:** [License Info](https://www.apache.org/licenses/LICENSE-2.0.txt)
+
+ **Model Description:** Qwen3-VL-2B-Instruct is a compact multimodal vision-language model with 2B parameters, designed for efficient deployment while maintaining strong performance in visual understanding and text generation. It supports both image and text processing with enhanced instruction following capabilities, making it ideal for applications requiring visual understanding with resource constraints. The model offers multilingual support and robust reasoning capabilities.
+
+ **Developer:** [https://huggingface.co/Qwen](https://huggingface.co/Qwen)
+
+ **File Size:** 1300 MB
+
+ **Context Length:** 2048 tokens
+
+ **Prompt Format:**
+ ```
+
+ ```
+
+ **Template Name:** qwen
+
+ **Add BOS Token:** Yes
+
+ **Add EOS Token:** No
+
+ **Parse Special Tokens:** Yes
+
+
+ ---
+
+ ## Ministral 3 3B Instruct 2512
+ Ministral-3-3B-Instruct-2512 is a multimodal vision-language model with 3B parameters, designed for efficient deployment while maintaining strong performance in visual understanding and text generation. It supports both image and text processing with enhanced instruction following capabilities, making it ideal for applications requiring visual understanding with resource constraints. The model offers multilingual support and robust reasoning capabilities.
+
+ **Model Intention:** Multimodal vision-language model with enhanced instruction following, optimized for efficient deployment and visual understanding
+
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Ministral-3-3B-Instruct-2512-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Ministral-3-3B-Instruct-2512-Q4_0.gguf)
+
+ **Model Info URL:** [https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512)
+
+ **Model License:** [License Info](https://www.apache.org/licenses/LICENSE-2.0.txt)
+
+ **Model Description:** Ministral-3-3B-Instruct-2512 is a multimodal vision-language model with 3B parameters, designed for efficient deployment while maintaining strong performance in visual understanding and text generation. It supports both image and text processing with enhanced instruction following capabilities, making it ideal for applications requiring visual understanding with resource constraints. The model offers multilingual support and robust reasoning capabilities.
+
+ **Developer:** [https://huggingface.co/mistralai](https://huggingface.co/mistralai)
+
+ **File Size:** 1900 MB
+
+ **Context Length:** 4096 tokens
+
+ **Prompt Format:**
+ ```
+
+ ```
+
+ **Template Name:** chatml
+
+ **Add BOS Token:** Yes
+
+ **Add EOS Token:** No
+
+ **Parse Special Tokens:** Yes
+
+
+ ---