# Qwen3_4B_kv_cache_f16_vs_q8_vs_q4.md

```shell
$ llama-server --host 0.0.0.0 --port 8000 --no-mmap -c 32768 -ub 4096 -fa on -ng
```

| Category | F16 KV cache | Q8 KV cache | Q4 KV cache |
|----------|--------------|-------------|-------------|
| Psychology | 68.55% | 69.92% | 67.54% |
| Other | 56.71% | 57.14% | 56.28% |

*Note: All tests were performed using Qwen3-4B-Instruct-2507-UD-Q4_K_XL.gguf (4-bit quantized weights); only the KV cache precision setting differs between columns.*

*Raw results are inside the folder `Qwen3_4B_kv_cache_f16_vs_q8_vs_q4`.*
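
The `llama-server` command above is cut off, so the exact cache flags used are not visible here. As a minimal sketch, the three columns would typically be produced with llama.cpp's `-ctk`/`-ctv` (`--cache-type-k`/`--cache-type-v`) options; the `f16`/`q8_0`/`q4_0` values below are assumptions inferred from the column names, and the trailing flags of the original command are unknown. A quantized V cache requires flash attention in llama.cpp, which is consistent with `-fa on` above:

```shell
# F16 KV cache (llama.cpp default)
$ llama-server -m Qwen3-4B-Instruct-2507-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 8000 \
    --no-mmap -c 32768 -ub 4096 -fa on -ctk f16 -ctv f16

# Q8 KV cache
$ llama-server -m Qwen3-4B-Instruct-2507-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 8000 \
    --no-mmap -c 32768 -ub 4096 -fa on -ctk q8_0 -ctv q8_0

# Q4 KV cache
$ llama-server -m Qwen3-4B-Instruct-2507-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 8000 \
    --no-mmap -c 32768 -ub 4096 -fa on -ctk q4_0 -ctv q4_0
```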

### Scripts used (default settings with temperature=0.7 and top_p=0.8):
- https://github.com/chigkim/Ollama-MMLU-Pro
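
A hedged sketch of pointing the benchmark at the server above; `run_openai.py` and its `--url`/`--model` options are assumptions about the Ollama-MMLU-Pro interface, and the model name is a placeholder, so consult the repo's README for the actual usage:

```shell
# Hypothetical invocation against the OpenAI-compatible endpoint started above;
# the script name, flags, and model name are assumptions, not verified here.
$ git clone https://github.com/chigkim/Ollama-MMLU-Pro
$ cd Ollama-MMLU-Pro
$ pip install -r requirements.txt

# temperature=0.7 and top_p=0.8 are the defaults per the heading above.
$ python run_openai.py --url http://localhost:8000/v1 --model Qwen3-4B-Instruct-2507
```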