flyingfishinwater committed (verified) · Commit 6920b32 · 1 Parent(s): 3c873ef

Update README.md

Files changed (1): README.md +257 -40

README.md CHANGED
@@ -10,7 +10,6 @@ on the App Store.
10
  More information is available on the [Privacy AI Official Site](https://privacyai.acmeup.com).
11
 
12
  ## Qwen3 4B Instruct 2507
13
-
14
  Qwen3-4B-Instruct-2507 is the latest 4B parameter model in the Qwen3 series, featuring significant improvements in reasoning, mathematics, science, coding, and tool usage. With 262K context length and strong multilingual support, it excels at instruction following, logical reasoning, and complex problem-solving tasks.
15
 
16
  **Model Intention:** Latest Qwen3-4B Instruct model with enhanced reasoning, logical thinking, mathematics, science, coding, and tool usage capabilities
@@ -30,7 +29,6 @@ Qwen3-4B-Instruct-2507 is the latest 4B parameter model in the Qwen3 series, fea
30
  **Context Length:** 2048 tokens
31
 
32
  **Prompt Format:**
33
-
34
  ```
35
 
36
  ```
@@ -47,7 +45,6 @@ Qwen3-4B-Instruct-2507 is the latest 4B parameter model in the Qwen3 series, fea
47
  ---
48
 
49
  ## Qwen3 4B Thinking 2507
50
-
51
  Qwen3-4B-Thinking-2507 is a specialized variant of the Qwen3-4B series with enhanced reasoning capabilities. It features thinking mode enabled by default, providing significantly improved performance on complex reasoning tasks including logical reasoning, mathematics, science, coding, and academic benchmarks with 262K context length.
52
 
53
  **Model Intention:** Advanced reasoning model with thinking mode enabled for complex logical reasoning, mathematics, science, and coding tasks
@@ -67,7 +64,6 @@ Qwen3-4B-Thinking-2507 is a specialized variant of the Qwen3-4B series with enha
67
  **Context Length:** 2048 tokens
68
 
69
  **Prompt Format:**
70
-
71
  ```
72
 
73
  ```
@@ -84,12 +80,11 @@ Qwen3-4B-Thinking-2507 is a specialized variant of the Qwen3-4B series with enha
84
  ---
85
 
86
  ## GLM Edge 4B Chat
87
-
88
  GLM-4 is the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. In evaluations covering semantics, mathematics, reasoning, code, and knowledge, GLM-4 outperforms Llama-3. In addition to multi-round conversations, GLM-4-Chat also offers advanced features such as web browsing, code execution, custom tool calls (Function Call), and long-text reasoning (supporting up to 128K context). This generation adds multilingual support for 26 languages, including Japanese, Korean, and German.
89
 
90
  **Model Intention:** It is the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI
91
 
92
- **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/glm-edge-4b-chat.Q4_K_M.gguf?download=true](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/glm-edge-4b-chat.Q4_K_M.gguf?download=true)
93
 
94
  **Model Info URL:** [https://huggingface.co/THUDM](https://huggingface.co/THUDM)
95
 
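Each **Model URL** above is a Hugging Face `resolve` link. The repo id, revision, and filename embedded in such a link can be recovered with a small helper (a sketch; the function name is illustrative), e.g. to pass to `huggingface_hub.hf_hub_download` instead of fetching the raw URL:

```python
from urllib.parse import urlparse

def split_hf_resolve_url(url: str):
    """Split a huggingface.co .../resolve/... URL into (repo_id, revision, filename)."""
    path = urlparse(url).path.lstrip("/")        # urlparse drops the ?download=true query
    repo_id, rest = path.split("/resolve/", 1)
    revision, filename = rest.split("/", 1)
    return repo_id, revision, filename

url = ("https://huggingface.co/flyingfishinwater/good_and_small_models"
       "/resolve/main/glm-edge-4b-chat.Q4_K_M.gguf?download=true")
print(split_hf_resolve_url(url))
# ('flyingfishinwater/good_and_small_models', 'main', 'glm-edge-4b-chat.Q4_K_M.gguf')
```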
@@ -104,7 +99,6 @@ GLM-4 is the latest generation of pre-trained models in the GLM-4 series launche
104
  **Context Length:** 1024 tokens
105
 
106
  **Prompt Format:**
107
-
108
  ```
109
  {% for item in messages %}{% if item['role'] == 'system' %}<|system|>
110
  {{ item['content'] }}{% elif item['role'] == 'user' %}<|user|>
@@ -125,12 +119,11 @@ GLM-4 is the latest generation of pre-trained models in the GLM-4 series launche
125
  ---
126
 
127
  ## Gemma 3n E2B it
128
-
129
  Gemma 3n models are designed for efficient execution on low-resource devices. They accept multimodal input (text, image, video, and audio) and generate text output, with open weights for pre-trained and instruction-tuned variants. These models were trained on data in over 140 spoken languages.
130
 
131
  **Model Intention:** Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input
132
 
133
- **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/gemma-3n-E2B-it-Q4_0.gguf?download=true](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/gemma-3n-E2B-it-Q4_0.gguf?download=true)
134
 
135
  **Model Info URL:** [https://huggingface.co/google/gemma-3n-E2B-it](https://huggingface.co/google/gemma-3n-E2B-it)
136
 
@@ -144,10 +137,9 @@ Gemma 3n models are designed for efficient execution on low-resource devices. Th
144
 
145
  **File Size:** 2720 MB
146
 
147
- **Context Length:** 8000 tokens
148
 
149
  **Prompt Format:**
150
-
151
  ```
152
 
153
  ```
@@ -164,12 +156,11 @@ Gemma 3n models are designed for efficient execution on low-resource devices. Th
164
  ---
165
 
166
  ## SmolLM3 3B
167
-
168
  SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale. A decoder-only transformer using GQA and NoPE (with a 3:1 ratio), it was pretrained on 11.2T tokens with a staged curriculum of web, code, math, and reasoning data. Post-training included midtraining on 140B reasoning tokens.
169
 
170
  **Model Intention:** SmolLM3 is a 3B parameter language model designed to push the boundaries of small models. It supports 6 languages (English, French, Spanish, German, Italian, and Portuguese), advanced reasoning and long context.
171
 
172
- **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/SmolLM3-Q4_K_M.gguf?download=true](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/SmolLM3-Q4_K_M.gguf?download=true)
173
 
174
  **Model Info URL:** [https://huggingface.co/HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B)
175
 
@@ -184,7 +175,6 @@ SmolLM3 is a fully open model that offers strong performance at the 3B–4B scal
184
  **Context Length:** 2048 tokens
185
 
186
  **Prompt Format:**
187
-
188
  ```
189
 
190
  ```
@@ -201,12 +191,11 @@ SmolLM3 is a fully open model that offers strong performance at the 3B–4B scal
201
  ---
202
 
203
  ## Phi4 mini 4B
204
-
205
  Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model is intended for broad multilingual commercial and research use. It suits general-purpose AI systems and applications that require: (1) memory/compute-constrained environments; (2) latency-bound scenarios; (3) strong reasoning (especially math and logic). The model is designed to accelerate research on language and multimodal models, for use as a building block for generative-AI-powered features.
206
 
207
  **Model Intention:** Phi-4-mini-instruct is a lightweight model focused on high-quality, reasoning dense data. It supports 128K token context length
208
 
209
- **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Phi-4-mini-instruct-Q4_K_M.gguf?download=true](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Phi-4-mini-instruct-Q4_K_M.gguf?download=true)
210
 
211
  **Model Info URL:** [https://huggingface.co/microsoft/Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct)
212
 
@@ -221,7 +210,6 @@ Phi-4-mini-instruct is a lightweight open model built upon synthetic data and fi
221
  **Context Length:** 2048 tokens
222
 
223
  **Prompt Format:**
224
-
225
  ```
226
  {% for message in messages %}{% if message['role'] == 'system' and 'tools' in message and message['tools'] is not none %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|tool|>' + message['tools'] + '<|/tool|>' + '<|end|>' }}{% else %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|end|>' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>' }}{% else %}{{ eos_token }}{% endif %}
227
  ```
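The Jinja template above can be mirrored in plain Python to make the rendered prompt concrete (a minimal sketch; `build_phi4_prompt` and the sample messages are illustrative, not part of the model card):

```python
def build_phi4_prompt(messages, add_generation_prompt=True, eos_token="<|endoftext|>"):
    """Render messages the way the Phi-4-mini Jinja template above does."""
    parts = []
    for m in messages:
        if m["role"] == "system" and m.get("tools") is not None:
            # A system turn carrying tool definitions gets an extra <|tool|> block.
            parts.append(f"<|{m['role']}|>{m['content']}<|tool|>{m['tools']}<|/tool|><|end|>")
        else:
            parts.append(f"<|{m['role']}|>{m['content']}<|end|>")
    parts.append("<|assistant|>" if add_generation_prompt else eos_token)
    return "".join(parts)

msgs = [{"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hi"}]
print(build_phi4_prompt(msgs))
# <|system|>You are helpful.<|end|><|user|>Hi<|end|><|assistant|>
```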
@@ -238,12 +226,11 @@ Phi-4-mini-instruct is a lightweight open model built upon synthetic data and fi
238
  ---
239
 
240
  ## Qwen3 1.7B
241
-
242
  Qwen3 1.7B is one of the small models in the Qwen series, designed for efficiency and speed. It can run seamlessly on edge devices, enabling rapid inference and real-time applications. This compact model is ideal for testing scenarios, prototyping, or deployment in resource-constrained environments.
243
 
244
  **Model Intention:** The 1.7B model in the Qwen3 series is a small model designed for fast predictions and function calls.
245
 
246
- **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-1.7B-Q4_K_M.gguf?download=true](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-1.7B-Q4_K_M.gguf?download=true)
247
 
248
  **Model Info URL:** [https://huggingface.co/Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
249
 
@@ -258,7 +245,6 @@ Qwen3 1.7B is one of the small models in the Qwen series, designed for efficienc
258
  **Context Length:** 2048 tokens
259
 
260
  **Prompt Format:**
261
-
262
  ```
263
 
264
  ```
@@ -275,12 +261,11 @@ Qwen3 1.7B is one of the small models in the Qwen series, designed for efficienc
275
  ---
276
 
277
  ## ERNIE-4.5 0.3B
278
-
279
  ERNIE 4.5 is a series of open-source models created by Baidu. The advanced capabilities of the ERNIE 4.5 models, particularly the MoE-based A47B and A3B series, are underpinned by several key technical innovations: 1. Multimodal Heterogeneous MoE Pre-Training; 2. Scaling-Efficient Infrastructure; 3. Modality-Specific Post-Training
280
 
281
  **Model Intention:** ERNIE-4.5-0.3B-Base is a text dense Base model for testing the model's architecture.
282
 
283
- **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/ERNIE-4.5-0.3B-PT-Q4_0.gguf?download=true](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/ERNIE-4.5-0.3B-PT-Q4_0.gguf?download=true)
284
 
285
  **Model Info URL:** [https://huggingface.co/baidu/ERNIE-4.5-0.3B-Base-PT](https://huggingface.co/baidu/ERNIE-4.5-0.3B-Base-PT)
286
 
@@ -295,7 +280,6 @@ ERNIE 4.5 is a series of open source models created by Baidu. The advanced capab
295
  **Context Length:** 2048 tokens
296
 
297
  **Prompt Format:**
298
-
299
  ```
300
 
301
  ```
@@ -312,12 +296,11 @@ ERNIE 4.5 is a series of open source models created by Baidu. The advanced capab
312
  ---
313
 
314
  ## LFM2 1.2B
315
-
316
  LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency. LFM2 is a hybrid Liquid model with multiplicative gates and short convolutions. Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.
317
 
318
  **Model Intention:** LFM2 1.2B is particularly suited for agentic tasks, data extraction, RAG, creative writing, and multi-turn conversations
319
 
320
- **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2-1.2B-Q4_0.gguf?download=true](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2-1.2B-Q4_0.gguf?download=true)
321
 
322
  **Model Info URL:** [https://huggingface.co/LiquidAI/LFM2-1.2B](https://huggingface.co/LiquidAI/LFM2-1.2B)
323
 
@@ -329,10 +312,9 @@ LFM2 is a new generation of hybrid models developed by Liquid AI, specifically d
329
 
330
  **File Size:** 696 MB
331
 
332
- **Context Length:** 1024 tokens
333
 
334
  **Prompt Format:**
335
-
336
  ```
337
 
338
  ```
@@ -349,12 +331,11 @@ LFM2 is a new generation of hybrid models developed by Liquid AI, specifically d
349
  ---
350
 
351
  ## Jan v1 4B
352
-
353
  Jan-v1-4B is an advanced agentic language model with 4.02 billion parameters, built on Qwen3-4B-Thinking. It is specifically designed for agentic reasoning and problem-solving, optimized for integration with Jan App. The model achieves strong performance on chat and question-answering benchmarks with improved reasoning capabilities, making it ideal for complex task automation and intelligent agent applications.
354
 
355
  **Model Intention:** Advanced agentic language model optimized for reasoning and problem-solving with 91.1% accuracy on question answering
356
 
357
- **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Jan-v1-4B-Q4_0.gguf?download=true](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Jan-v1-4B-Q4_0.gguf?download=true)
358
 
359
  **Model Info URL:** [https://huggingface.co/janhq/Jan-v1-4B](https://huggingface.co/janhq/Jan-v1-4B)
360
 
@@ -369,7 +350,6 @@ Jan-v1-4B is an advanced agentic language model with 4.02 billion parameters, bu
369
  **Context Length:** 2048 tokens
370
 
371
  **Prompt Format:**
372
-
373
  ```
374
 
375
  ```
@@ -386,7 +366,6 @@ Jan-v1-4B is an advanced agentic language model with 4.02 billion parameters, bu
386
  ---
387
 
388
  ## Menlo Lucy 1.7B
389
-
390
  Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing. It is built on Qwen3-1.7B and optimized to run efficiently on mobile devices, even with CPU-only configurations. It was developed by Alan Dao, Bach Vu Dinh, Alex Nguyen, and Norapat Buppodom.
391
 
392
  **Model Intention:** Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing.
@@ -406,7 +385,6 @@ Lucy is a compact but capable 1.7B model focused on agentic web search and light
406
  **Context Length:** 2048 tokens
407
 
408
  **Prompt Format:**
409
-
410
  ```
411
 
412
  ```
@@ -423,7 +401,6 @@ Lucy is a compact but capable 1.7B model focused on agentic web search and light
423
  ---
424
 
425
  ## Nemotron 1.5B
426
-
427
  OpenReasoning-Nemotron-1.5B is a large language model (LLM) derived from Qwen2.5-1.5B-Instruct. It is a reasoning model post-trained for reasoning in math, code, and science solution generation. This model is ready for commercial/non-commercial research use.
428
 
429
  **Model Intention:** It is a reasoning model that is post-trained for reasoning about math, code and science solution generation.
@@ -443,7 +420,6 @@ OpenReasoning-Nemotron-1.5B is a large language model (LLM) which is a derivativ
443
  **Context Length:** 2048 tokens
444
 
445
  **Prompt Format:**
446
-
447
  ```
448
 
449
  ```
@@ -460,7 +436,6 @@ OpenReasoning-Nemotron-1.5B is a large language model (LLM) which is a derivativ
460
  ---
461
 
462
  ## Qwen3 1.7B Uncensored
463
-
464
  Qwen3 1.7B Uncensored is an unrestricted variant designed for creative writing and storytelling without content limitations. It excels at generating fiction stories, horror narratives, plot development, scene continuation, and roleplaying scenarios. This model provides unfiltered responses and can produce intense or graphic content, making it suitable for users seeking unrestricted AI interactions for creative purposes.
465
 
466
  **Model Intention:** An uncensored 1.7B model optimized for creative writing, fiction stories, horror narratives, and unrestricted conversational scenarios.
@@ -480,7 +455,6 @@ Qwen3 1.7B Uncensored is an unrestricted variant designed for creative writing a
480
  **Context Length:** 2048 tokens
481
 
482
  **Prompt Format:**
483
-
484
  ```
485
 
486
  ```
@@ -497,7 +471,6 @@ Qwen3 1.7B Uncensored is an unrestricted variant designed for creative writing a
497
  ---
498
 
499
  ## Gemma 3 270M
500
-
501
  Gemma 3 270M is an ultra-compact transformer model with 268M parameters, designed for efficient deployment on mobile and edge devices. Part of Google's Gemma family, it offers strong performance for its size with 32K context length, multilingual support, and responsible AI design. Ideal for applications requiring fast inference with minimal computational resources while maintaining quality text generation capabilities.
502
 
503
  **Model Intention:** Ultra-compact 270M parameter model optimized for resource-constrained environments with 32K context length
@@ -512,17 +485,86 @@ Gemma 3 270M is an ultra-compact transformer model with 268M parameters, designe
512
 
513
  **Developer:** [https://huggingface.co/google](https://huggingface.co/google)
514
 
515
- **File Size:** 160 MB
516
 
517
  **Context Length:** 2048 tokens
518
 
519
  **Prompt Format:**
 
520
 
521
  ```
522
 
523
  ```
524
 
525
- **Template Name:** gemma
526
 
527
  **Add BOS Token:** Yes
528
 
@@ -531,4 +573,179 @@ Gemma 3 270M is an ultra-compact transformer model with 268M parameters, designe
531
  **Parse Special Tokens:** Yes
532
 
533
 
534
- ---
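The **Template Name:** gemma above refers to Gemma's turn-based chat format, where each turn is wrapped in `<start_of_turn>`/`<end_of_turn>` and the assistant role is called `model`. A minimal sketch (the helper name and sample messages are illustrative; the leading `<bos>` token is handled by the runtime, per **Add BOS Token:** Yes):

```python
def build_gemma_prompt(messages, add_generation_prompt=True):
    """Assemble a Gemma-style chat prompt from role/content messages."""
    out = ""
    for m in messages:
        # Gemma uses the role name "model" instead of "assistant".
        role = "model" if m["role"] == "assistant" else m["role"]
        out += f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n"
    if add_generation_prompt:
        out += "<start_of_turn>model\n"
    return out

print(build_gemma_prompt([{"role": "user", "content": "Hi"}]))
# <start_of_turn>user
# Hi<end_of_turn>
# <start_of_turn>model
```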
10
  More information is available on the [Privacy AI Official Site](https://privacyai.acmeup.com).
11
 
12
  ## Qwen3 4B Instruct 2507
 
13
  Qwen3-4B-Instruct-2507 is the latest 4B parameter model in the Qwen3 series, featuring significant improvements in reasoning, mathematics, science, coding, and tool usage. With 262K context length and strong multilingual support, it excels at instruction following, logical reasoning, and complex problem-solving tasks.
14
 
15
  **Model Intention:** Latest Qwen3-4B Instruct model with enhanced reasoning, logical thinking, mathematics, science, coding, and tool usage capabilities
 
29
  **Context Length:** 2048 tokens
30
 
31
  **Prompt Format:**
 
32
  ```
33
 
34
  ```
 
45
  ---
46
 
47
  ## Qwen3 4B Thinking 2507
 
48
  Qwen3-4B-Thinking-2507 is a specialized variant of the Qwen3-4B series with enhanced reasoning capabilities. It features thinking mode enabled by default, providing significantly improved performance on complex reasoning tasks including logical reasoning, mathematics, science, coding, and academic benchmarks with 262K context length.
49
 
50
  **Model Intention:** Advanced reasoning model with thinking mode enabled for complex logical reasoning, mathematics, science, and coding tasks
 
64
  **Context Length:** 2048 tokens
65
 
66
  **Prompt Format:**
 
67
  ```
68
 
69
  ```
 
80
  ---
81
 
82
  ## GLM Edge 4B Chat
 
83
  GLM-4 is the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. In evaluations covering semantics, mathematics, reasoning, code, and knowledge, GLM-4 outperforms Llama-3. In addition to multi-round conversations, GLM-4-Chat also offers advanced features such as web browsing, code execution, custom tool calls (Function Call), and long-text reasoning (supporting up to 128K context). This generation adds multilingual support for 26 languages, including Japanese, Korean, and German.
84
 
85
  **Model Intention:** It is the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI
86
 
87
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/glm-edge-4b-chat.Q4_K_M.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/glm-edge-4b-chat.Q4_K_M.gguf)
88
 
89
  **Model Info URL:** [https://huggingface.co/THUDM](https://huggingface.co/THUDM)
90
 
 
99
  **Context Length:** 1024 tokens
100
 
101
  **Prompt Format:**
 
102
  ```
103
  {% for item in messages %}{% if item['role'] == 'system' %}<|system|>
104
  {{ item['content'] }}{% elif item['role'] == 'user' %}<|user|>
 
119
  ---
120
 
121
  ## Gemma 3n E2B it
 
122
  Gemma 3n models are designed for efficient execution on low-resource devices. They accept multimodal input (text, image, video, and audio) and generate text output, with open weights for pre-trained and instruction-tuned variants. These models were trained on data in over 140 spoken languages.
123
 
124
  **Model Intention:** Gemma 3n models are designed for efficient execution on low-resource devices. They are capable of multimodal input
125
 
126
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/gemma-3n-E2B-it-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/gemma-3n-E2B-it-Q4_0.gguf)
127
 
128
  **Model Info URL:** [https://huggingface.co/google/gemma-3n-E2B-it](https://huggingface.co/google/gemma-3n-E2B-it)
129
 
 
137
 
138
  **File Size:** 2720 MB
139
 
140
+ **Context Length:** 4096 tokens
141
 
142
  **Prompt Format:**
 
143
  ```
144
 
145
  ```
 
156
  ---
157
 
158
  ## SmolLM3 3B
 
159
  SmolLM3 is a fully open model that offers strong performance at the 3B–4B scale. A decoder-only transformer using GQA and NoPE (with a 3:1 ratio), it was pretrained on 11.2T tokens with a staged curriculum of web, code, math, and reasoning data. Post-training included midtraining on 140B reasoning tokens.
160
 
161
  **Model Intention:** SmolLM3 is a 3B parameter language model designed to push the boundaries of small models. It supports 6 languages (English, French, Spanish, German, Italian, and Portuguese), advanced reasoning and long context.
162
 
163
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/SmolLM3-Q4_K_M.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/SmolLM3-Q4_K_M.gguf)
164
 
165
  **Model Info URL:** [https://huggingface.co/HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B)
166
 
 
175
  **Context Length:** 2048 tokens
176
 
177
  **Prompt Format:**
 
178
  ```
179
 
180
  ```
 
191
  ---
192
 
193
  ## Phi4 mini 4B
 
194
  Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites, with a focus on high-quality, reasoning-dense data. The model is intended for broad multilingual commercial and research use. It suits general-purpose AI systems and applications that require: (1) memory/compute-constrained environments; (2) latency-bound scenarios; (3) strong reasoning (especially math and logic). The model is designed to accelerate research on language and multimodal models, for use as a building block for generative-AI-powered features.
195
 
196
  **Model Intention:** Phi-4-mini-instruct is a lightweight model focused on high-quality, reasoning dense data. It supports 128K token context length
197
 
198
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Phi-4-mini-instruct-Q4_K_M.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Phi-4-mini-instruct-Q4_K_M.gguf)
199
 
200
  **Model Info URL:** [https://huggingface.co/microsoft/Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct)
201
 
 
210
  **Context Length:** 2048 tokens
211
 
212
  **Prompt Format:**
 
213
  ```
214
  {% for message in messages %}{% if message['role'] == 'system' and 'tools' in message and message['tools'] is not none %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|tool|>' + message['tools'] + '<|/tool|>' + '<|end|>' }}{% else %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|end|>' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>' }}{% else %}{{ eos_token }}{% endif %}
215
  ```
 
226
  ---
227
 
228
  ## Qwen3 1.7B
 
229
  Qwen3 1.7B is one of the small models in the Qwen series, designed for efficiency and speed. It can run seamlessly on edge devices, enabling rapid inference and real-time applications. This compact model is ideal for testing scenarios, prototyping, or deployment in resource-constrained environments.
230
 
231
  **Model Intention:** The 1.7B model in the Qwen3 series is a small model designed for fast predictions and function calls.
232
 
233
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-1.7B-Q4_K_M.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-1.7B-Q4_K_M.gguf)
234
 
235
  **Model Info URL:** [https://huggingface.co/Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)
236
 
 
245
  **Context Length:** 2048 tokens
246
 
247
  **Prompt Format:**
 
248
  ```
249
 
250
  ```
 
261
  ---
262
 
263
  ## ERNIE-4.5 0.3B
 
264
  ERNIE 4.5 is a series of open-source models created by Baidu. The advanced capabilities of the ERNIE 4.5 models, particularly the MoE-based A47B and A3B series, are underpinned by several key technical innovations: 1. Multimodal Heterogeneous MoE Pre-Training; 2. Scaling-Efficient Infrastructure; 3. Modality-Specific Post-Training
265
 
266
  **Model Intention:** ERNIE-4.5-0.3B-Base is a text dense Base model for testing the model's architecture.
267
 
268
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/ERNIE-4.5-0.3B-PT-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/ERNIE-4.5-0.3B-PT-Q4_0.gguf)
269
 
270
  **Model Info URL:** [https://huggingface.co/baidu/ERNIE-4.5-0.3B-Base-PT](https://huggingface.co/baidu/ERNIE-4.5-0.3B-Base-PT)
271
 
 
280
  **Context Length:** 2048 tokens
281
 
282
  **Prompt Format:**
 
283
  ```
284
 
285
  ```
 
296
  ---
297
 
298
  ## LFM2 1.2B
 
299
  LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency. LFM2 is a hybrid Liquid model with multiplicative gates and short convolutions. Supported languages: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.
300
 
301
  **Model Intention:** LFM2 1.2B is particularly suited for agentic tasks, data extraction, RAG, creative writing, and multi-turn conversations
302
 
303
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2-1.2B-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2-1.2B-Q4_0.gguf)
304
 
305
  **Model Info URL:** [https://huggingface.co/LiquidAI/LFM2-1.2B](https://huggingface.co/LiquidAI/LFM2-1.2B)
306
 
 
312
 
313
  **File Size:** 696 MB
314
 
315
+ **Context Length:** 4096 tokens
316
 
317
  **Prompt Format:**
 
318
  ```
319
 
320
  ```
 
331
  ---
332
 
333
  ## Jan v1 4B
 
334
  Jan-v1-4B is an advanced agentic language model with 4.02 billion parameters, built on Qwen3-4B-Thinking. It is specifically designed for agentic reasoning and problem-solving, optimized for integration with Jan App. The model achieves strong performance on chat and question-answering benchmarks with improved reasoning capabilities, making it ideal for complex task automation and intelligent agent applications.
335
 
336
  **Model Intention:** Advanced agentic language model optimized for reasoning and problem-solving with 91.1% accuracy on question answering
337
 
338
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Jan-v1-4B-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Jan-v1-4B-Q4_0.gguf)
339
 
340
  **Model Info URL:** [https://huggingface.co/janhq/Jan-v1-4B](https://huggingface.co/janhq/Jan-v1-4B)
341
 
 
350
  **Context Length:** 2048 tokens
351
 
352
  **Prompt Format:**
 
353
  ```
354
 
355
  ```
 
366
  ---
367
 
368
  ## Menlo Lucy 1.7B
 
369
  Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing. It is built on Qwen3-1.7B and optimized to run efficiently on mobile devices, even with CPU-only configurations. It was developed by Alan Dao, Bach Vu Dinh, Alex Nguyen, and Norapat Buppodom.
370
 
371
  **Model Intention:** Lucy is a compact but capable 1.7B model focused on agentic web search and lightweight browsing.
 
385
  **Context Length:** 2048 tokens
386
 
387
  **Prompt Format:**
 
388
  ```
389
 
390
  ```
 
401
  ---
402
 
403
  ## Nemotron 1.5B
 
404
  OpenReasoning-Nemotron-1.5B is a large language model (LLM) derived from Qwen2.5-1.5B-Instruct. It is a reasoning model post-trained for reasoning in math, code, and science solution generation. This model is ready for commercial/non-commercial research use.
405
 
406
  **Model Intention:** It is a reasoning model that is post-trained for reasoning about math, code and science solution generation.
 
420
  **Context Length:** 2048 tokens
421
 
422
  **Prompt Format:**
 
423
  ```
424
 
425
  ```
 
436
  ---
437
 
438
  ## Qwen3 1.7B Uncensored
 
439
  Qwen3 1.7B Uncensored is an unrestricted variant designed for creative writing and storytelling without content limitations. It excels at generating fiction stories, horror narratives, plot development, scene continuation, and roleplaying scenarios. This model provides unfiltered responses and can produce intense or graphic content, making it suitable for users seeking unrestricted AI interactions for creative purposes.
440
 
441
  **Model Intention:** An uncensored 1.7B model optimized for creative writing, fiction stories, horror narratives, and unrestricted conversational scenarios.
 
455
  **Context Length:** 2048 tokens
456
 
457
  **Prompt Format:**
 
458
  ```
459
 
460
  ```
 
471
  ---
472
 
473
  ## Gemma 3 270M
 
474
  Gemma 3 270M is an ultra-compact transformer model with 268M parameters, designed for efficient deployment on mobile and edge devices. Part of Google's Gemma family, it offers strong performance for its size with 32K context length, multilingual support, and responsible AI design. Ideal for applications requiring fast inference with minimal computational resources while maintaining quality text generation capabilities.
475
 
476
  **Model Intention:** Ultra-compact 270M parameter model optimized for resource-constrained environments with 32K context length
 
485
 
486
  **Developer:** [https://huggingface.co/google](https://huggingface.co/google)
487
 
488
+ **File Size:** 245 MB
489
+
490
+ **Context Length:** 4096 tokens
491
+
492
+ **Prompt Format:**
493
+ ```
494
+
495
+ ```
496
+
497
+ **Template Name:** gemma
498
+
499
+ **Add BOS Token:** Yes
500
+
501
+ **Add EOS Token:** No
502
+
503
+ **Parse Special Tokens:** Yes
504
+
505
+
506
+ ---
507
+
508
+ ## LFM2 2.6B
+ LFM2-2.6B is a next-generation hybrid model by Liquid AI with 2.6B parameters, designed for edge AI and on-device deployment. It features multiplicative gates and short convolutions, offering 3x faster training and 2x faster decode/prefill speed on CPU. The model excels at agentic tasks, data extraction, RAG, creative writing, and multi-turn conversations. It supports 8 languages (English, Arabic, Chinese, French, German, Japanese, Korean, Spanish) with 32,768 context length and runs efficiently on CPU, GPU, and NPU hardware.
+
+ **Model Intention:** Advanced hybrid model with 3x faster training and 2x faster inference, optimized for agentic tasks, RAG, and multi-turn conversations
+
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2-2.6B-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2-2.6B-Q4_0.gguf)
+
+ **Model Info URL:** [https://huggingface.co/LiquidAI/LFM2-2.6B](https://huggingface.co/LiquidAI/LFM2-2.6B)
+
+ **Model License:** [License Info](https://huggingface.co/LiquidAI/LFM2-2.6B/raw/main/LICENSE)
+
+ **Model Description:** LFM2-2.6B is a next-generation hybrid model by Liquid AI with 2.6B parameters, designed for edge AI and on-device deployment. It features multiplicative gates and short convolutions, offering 3x faster training and 2x faster decode/prefill speed on CPU. The model excels at agentic tasks, data extraction, RAG, creative writing, and multi-turn conversations. It supports 8 languages (English, Arabic, Chinese, French, German, Japanese, Korean, Spanish) with 32,768 context length and runs efficiently on CPU, GPU, and NPU hardware.
+
+ **Developer:** [https://huggingface.co/LiquidAI](https://huggingface.co/LiquidAI)
+
+ **File Size:** 1500 MB
 
 **Context Length:** 2048 tokens
 
 **Prompt Format:**
+ ```
 
 ```
 
+ **Template Name:** chatml
+
+ **Add BOS Token:** Yes
+
+ **Add EOS Token:** No
+
+ **Parse Special Tokens:** Yes
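Since the Prompt Format block above is empty, here is a minimal sketch of the ChatML layout the `chatml` template name refers to (assuming the standard `<|im_start|>` / `<|im_end|>` markers; the `chatml_prompt` helper is illustrative, not part of the model card):

```python
def chatml_prompt(messages):
    """Render [{"role": ..., "content": ...}] turns in ChatML form."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # open the assistant turn for generation
    return "".join(parts)
```

With **Parse Special Tokens** set to Yes, the `<|im_start|>` / `<|im_end|>` markers map to dedicated special tokens instead of being tokenized as plain text.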
+
+
+ ---
+
+ ## LFM2-VL 1.6B
+ LFM2-VL-1.6B is an advanced multimodal vision-language model by Liquid AI featuring a 1.3B language model with 297M vision encoder. It processes images up to 512×512 pixels with variable resolutions, offers fast inference speed with superior performance compared to the 450M version, and supports 32,768 context length. Optimized for edge AI deployment with hybrid conv+attention architecture and SigLIP2 NaFlex vision encoder, providing enhanced reasoning and understanding capabilities.
+
+ **Model Intention:** Enhanced multimodal vision-language model with improved reasoning capabilities, optimized for edge AI and low-latency applications
+
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2-VL-1.6B-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/LFM2-VL-1.6B-Q4_0.gguf)
+
+ **Model Info URL:** [https://huggingface.co/LiquidAI/LFM2-VL-1.6B](https://huggingface.co/LiquidAI/LFM2-VL-1.6B)
+
+ **Model License:** [License Info](https://huggingface.co/LiquidAI/LFM2-VL-1.6B/raw/main/LICENSE)
+
+ **Model Description:** LFM2-VL-1.6B is an advanced multimodal vision-language model by Liquid AI featuring a 1.3B language model with 297M vision encoder. It processes images up to 512×512 pixels with variable resolutions, offers fast inference speed with superior performance compared to the 450M version, and supports 32,768 context length. Optimized for edge AI deployment with hybrid conv+attention architecture and SigLIP2 NaFlex vision encoder, providing enhanced reasoning and understanding capabilities.
+
+ **Developer:** [https://huggingface.co/LiquidAI](https://huggingface.co/LiquidAI)
+
+ **File Size:** 900 MB
+
+ **Context Length:** 4096 tokens
+
+ **Prompt Format:**
 ```
 
+ ```
+
+ **Template Name:** chatml
 
 **Add BOS Token:** Yes
 
 **Add EOS Token:** No
 
 **Parse Special Tokens:** Yes
 
 
+ ---
+
+ ## Qwen2.5-VL 3B Instruct
+ Qwen2.5-VL-3B-Instruct is a multimodal vision-language model with 3.09B parameters, featuring enhanced capabilities in coding, mathematics, and instruction following. It supports 29+ languages with up to 128K context length and 8K generation tokens. The model uses transformer architecture with RoPE, SwiGLU, and RMSNorm, offering improved resilience to diverse system prompts and specialized structured data understanding.
+
+ **Model Intention:** Multimodal vision-language model with enhanced instruction following, coding, mathematics, and multilingual capabilities up to 128K context
+
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf)
+
+ **Model Info URL:** [https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF)
+
+ **Model License:** [License Info](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct/raw/main/LICENSE)
+
+ **Model Description:** Qwen2.5-VL-3B-Instruct is a multimodal vision-language model with 3.09B parameters, featuring enhanced capabilities in coding, mathematics, and instruction following. It supports 29+ languages with up to 128K context length and 8K generation tokens. The model uses transformer architecture with RoPE, SwiGLU, and RMSNorm, offering improved resilience to diverse system prompts and specialized structured data understanding.
+
+ **Developer:** [https://huggingface.co/Qwen](https://huggingface.co/Qwen)
+
+ **File Size:** 1930 MB
+
+ **Context Length:** 2048 tokens
+
+ **Prompt Format:**
+ ```
+
+ ```
+
+ **Template Name:** qwen
+
+ **Add BOS Token:** Yes
+
+ **Add EOS Token:** No
+
+ **Parse Special Tokens:** Yes
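The `qwen` template is ChatML-based. As an illustrative sketch of how it differs from plain ChatML (the default system prompt shown is an assumption drawn from common Qwen chat templates, and the `qwen_prompt` helper name is ours):

```python
def qwen_prompt(messages, system="You are a helpful assistant."):
    """ChatML-style prompt with a leading system turn, as Qwen chat templates use.

    The system prompt default is an assumption, not taken from this model card.
    """
    turns = [{"role": "system", "content": system}] + list(messages)
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in turns]
    parts.append("<|im_start|>assistant\n")  # open the assistant turn for generation
    return "".join(parts)
```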
+
+
+ ---
+
+ ## Qwen3-VL 4B Instruct
+ Qwen3-VL-4B-Instruct is a multimodal vision-language model with 4B parameters, featuring enhanced capabilities in instruction following, coding, mathematics, and multilingual understanding. It supports both image and text processing with strong reasoning capabilities, making it ideal for applications requiring visual understanding and text generation.
+
+ **Model Intention:** Multimodal vision-language model with enhanced instruction following, coding, mathematics, and multilingual capabilities
+
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-VL-4B-Instruct-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-VL-4B-Instruct-Q4_0.gguf)
+
+ **Model Info URL:** [https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct)
+
+ **Model License:** [License Info](https://www.apache.org/licenses/LICENSE-2.0.txt)
+
+ **Model Description:** Qwen3-VL-4B-Instruct is a multimodal vision-language model with 4B parameters, featuring enhanced capabilities in instruction following, coding, mathematics, and multilingual understanding. It supports both image and text processing with strong reasoning capabilities, making it ideal for applications requiring visual understanding and text generation.
+
+ **Developer:** [https://huggingface.co/Qwen](https://huggingface.co/Qwen)
+
+ **File Size:** 2400 MB
+
+ **Context Length:** 2048 tokens
+
+ **Prompt Format:**
+ ```
+
+ ```
+
+ **Template Name:** qwen
+
+ **Add BOS Token:** Yes
+
+ **Add EOS Token:** No
+
+ **Parse Special Tokens:** Yes
+
+
+ ---
+
+ ## Qwen3-VL 4B Thinking
+ Qwen3-VL-4B-Thinking is a specialized multimodal vision-language model with enhanced reasoning capabilities and thinking mode. It excels at complex visual reasoning tasks including mathematical problem solving, scientific analysis, coding with visual inputs, and intricate logical reasoning. The thinking mode enables step-by-step problem solving with both images and text, making it ideal for applications requiring deep analytical capabilities and visual understanding.
+
+ **Model Intention:** Advanced multimodal reasoning model with thinking mode for complex visual reasoning, mathematics, and scientific tasks
+
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-VL-4B-Thinking-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-VL-4B-Thinking-Q4_0.gguf)
+
+ **Model Info URL:** [https://huggingface.co/Qwen/Qwen3-VL-4B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-4B-Thinking)
+
+ **Model License:** [License Info](https://www.apache.org/licenses/LICENSE-2.0.txt)
+
+ **Model Description:** Qwen3-VL-4B-Thinking is a specialized multimodal vision-language model with enhanced reasoning capabilities and thinking mode. It excels at complex visual reasoning tasks including mathematical problem solving, scientific analysis, coding with visual inputs, and intricate logical reasoning. The thinking mode enables step-by-step problem solving with both images and text, making it ideal for applications requiring deep analytical capabilities and visual understanding.
+
+ **Developer:** [https://huggingface.co/Qwen](https://huggingface.co/Qwen)
+
+ **File Size:** 2100 MB
+
+ **Context Length:** 2048 tokens
+
+ **Prompt Format:**
+ ```
+
+ ```
+
+ **Template Name:** qwen
+
+ **Add BOS Token:** Yes
+
+ **Add EOS Token:** No
+
+ **Parse Special Tokens:** Yes
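Thinking-mode models interleave their step-by-step reasoning with the final answer. A minimal sketch of separating the two, assuming the reasoning is wrapped in `<think>...</think>` tags as in other Qwen3 thinking variants (the tag name and the `split_thinking` helper are assumptions, not taken from this model card):

```python
import re

def split_thinking(text):
    """Split model output into (reasoning, answer).

    Returns (None, text) when no <think> block is present. The tag name is
    an assumption based on common Qwen thinking-model output.
    """
    m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not m:
        return None, text.strip()
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()  # everything after the closing tag
    return reasoning, answer
```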
+
+
+ ---
+
+ ## Qwen3-VL 2B Instruct
+ Qwen3-VL-2B-Instruct is a compact multimodal vision-language model with 2B parameters, designed for efficient deployment while maintaining strong performance in visual understanding and text generation. It supports both image and text processing with enhanced instruction following capabilities, making it ideal for applications requiring visual understanding with resource constraints. The model offers multilingual support and robust reasoning capabilities.
+
+ **Model Intention:** Compact multimodal vision-language model with enhanced instruction following, optimized for efficient deployment
+
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-VL-2B-Instruct-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Qwen3-VL-2B-Instruct-Q4_0.gguf)
+
+ **Model Info URL:** [https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct)
+
+ **Model License:** [License Info](https://www.apache.org/licenses/LICENSE-2.0.txt)
+
+ **Model Description:** Qwen3-VL-2B-Instruct is a compact multimodal vision-language model with 2B parameters, designed for efficient deployment while maintaining strong performance in visual understanding and text generation. It supports both image and text processing with enhanced instruction following capabilities, making it ideal for applications requiring visual understanding with resource constraints. The model offers multilingual support and robust reasoning capabilities.
+
+ **Developer:** [https://huggingface.co/Qwen](https://huggingface.co/Qwen)
+
+ **File Size:** 1300 MB
+
+ **Context Length:** 2048 tokens
+
+ **Prompt Format:**
+ ```
+
+ ```
+
+ **Template Name:** qwen
+
+ **Add BOS Token:** Yes
+
+ **Add EOS Token:** No
+
+ **Parse Special Tokens:** Yes
+
+
+ ---
+
+ ## Ministral 3 3B Instruct 2512
+ Ministral-3-3B-Instruct-2512 is a multimodal vision-language model with 3B parameters, designed for efficient deployment while maintaining strong performance in visual understanding and text generation. It supports both image and text processing with enhanced instruction following capabilities, making it ideal for applications requiring visual understanding with resource constraints. The model offers multilingual support and robust reasoning capabilities.
+
+ **Model Intention:** Multimodal vision-language model with enhanced instruction following, optimized for efficient deployment and visual understanding
+
+ **Model URL:** [https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Ministral-3-3B-Instruct-2512-Q4_0.gguf](https://huggingface.co/flyingfishinwater/good_and_small_models/resolve/main/Ministral-3-3B-Instruct-2512-Q4_0.gguf)
+
+ **Model Info URL:** [https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512](https://huggingface.co/mistralai/Ministral-3-3B-Instruct-2512)
+
+ **Model License:** [License Info](https://www.apache.org/licenses/LICENSE-2.0.txt)
+
+ **Model Description:** Ministral-3-3B-Instruct-2512 is a multimodal vision-language model with 3B parameters, designed for efficient deployment while maintaining strong performance in visual understanding and text generation. It supports both image and text processing with enhanced instruction following capabilities, making it ideal for applications requiring visual understanding with resource constraints. The model offers multilingual support and robust reasoning capabilities.
+
+ **Developer:** [https://huggingface.co/mistralai](https://huggingface.co/mistralai)
+
+ **File Size:** 1900 MB
+
+ **Context Length:** 4096 tokens
+
+ **Prompt Format:**
+ ```
+
+ ```
+
+ **Template Name:** chatml
+
+ **Add BOS Token:** Yes
+
+ **Add EOS Token:** No
+
+ **Parse Special Tokens:** Yes
+
+
+ ---