# Qwen3_4B_kv_cache_f16_vs_q8_vs_q4.md

```shell
$ llama-server --host 0.0.0.0 --port 8000 --no-mmap -c 32768 -ub 4096 -fa on -ng
```

| Category | F16 KV cache | Q8 KV cache | Q4 KV cache |
|----------|--------------|-------------|-------------|
| Psychology | 68.55% | 69.92% | 67.54% |
| Other | 56.71% | 57.14% | 56.28% |

*Note: All tests were performed using Qwen3-4B-Instruct-2507-UD-Q4_K_XL.gguf (4-bit quantized weights); only the KV cache precision setting differs between columns.*

*Raw results are inside the folder `Qwen3_4B_kv_cache_f16_vs_q8_vs_q4`.*
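
The `llama-server` command above is cut off, so the exact cache flags used are not visible here. As a minimal sketch, the three columns would typically be produced with llama.cpp's `-ctk`/`-ctv` (`--cache-type-k`/`--cache-type-v`) options; the `f16`/`q8_0`/`q4_0` values below are assumptions inferred from the column names, and the trailing flags of the original command are unknown. A quantized V cache requires flash attention in llama.cpp, which is consistent with `-fa on` above:

```shell
# F16 KV cache (llama.cpp default)
$ llama-server -m Qwen3-4B-Instruct-2507-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 8000 \
    --no-mmap -c 32768 -ub 4096 -fa on -ctk f16 -ctv f16

# Q8 KV cache
$ llama-server -m Qwen3-4B-Instruct-2507-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 8000 \
    --no-mmap -c 32768 -ub 4096 -fa on -ctk q8_0 -ctv q8_0

# Q4 KV cache
$ llama-server -m Qwen3-4B-Instruct-2507-UD-Q4_K_XL.gguf --host 0.0.0.0 --port 8000 \
    --no-mmap -c 32768 -ub 4096 -fa on -ctk q4_0 -ctv q4_0
```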

### Scripts used (default settings with temperature=0.7 and top_p=0.8):
- https://github.com/chigkim/Ollama-MMLU-Pro
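
A hedged sketch of pointing the benchmark at the server above; `run_openai.py` and its `--url`/`--model` options are assumptions about the Ollama-MMLU-Pro interface, and the model name is a placeholder, so consult the repo's README for the actual usage:

```shell
# Hypothetical invocation against the OpenAI-compatible endpoint started above;
# the script name, flags, and model name are assumptions, not verified here.
$ git clone https://github.com/chigkim/Ollama-MMLU-Pro
$ cd Ollama-MMLU-Pro
$ pip install -r requirements.txt

# temperature=0.7 and top_p=0.8 are the defaults per the heading above.
$ python run_openai.py --url http://localhost:8000/v1 --model Qwen3-4B-Instruct-2507
```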