adriabama06 committed
Commit 556d36e · verified · 1 Parent(s): dd2ad01

Update Qwen3_4B_kv_cache_f16_vs_q8_vs_q4.md

Qwen3_4B_kv_cache_f16_vs_q8_vs_q4.md CHANGED
@@ -52,8 +52,8 @@ $ llama-server --host 0.0.0.0 --port 8000 --no-mmap -c 32768 -ub 4096 -fa on -ng
 | Psychology | 68.55% | 69.92% | 67.54% |
 | Other | 56.71% | 57.14% | 56.28% |
 
-*Note: All tests performed using Qwen3-4B-Instruct-2507-UD-Q4_K_XL.gguf (4-bit quantized weights) with different KV cache precision settings*
-*Raw results inside the folder `Qwen3_4B_kv_cache_f16_vs_q8_vs_q4`*
+*Note: All tests performed using Qwen3-4B-Instruct-2507-UD-Q4_K_XL.gguf (4-bit quantized weights) with different KV cache precision settings*
+*Raw results inside the folder `Qwen3_4B_kv_cache_f16_vs_q8_vs_q4`*
 
 ### Scripts used (default settings with temperature=0.7 and top_p=0.8):
 - https://github.com/chigkim/Ollama-MMLU-Pro
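
For context on the sampling settings named in the hunk (temperature=0.7, top_p=0.8): the linked Ollama-MMLU-Pro harness drives the server through its OpenAI-compatible API, which `llama-server` exposes. A minimal sketch of an equivalent single request against the server started with the command in the hunk header; the prompt text and `model` value here are illustrative assumptions, not taken from the harness:

```python
import requests

# Query the llama-server instance from the command above
# (--host 0.0.0.0 --port 8000) via its OpenAI-compatible endpoint.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        # llama-server serves a single loaded model; this name is illustrative.
        "model": "Qwen3-4B-Instruct-2507-UD-Q4_K_XL",
        "messages": [
            {"role": "user", "content": "Answer with the option letter only: ..."}
        ],
        # Sampling settings stated in the section above.
        "temperature": 0.7,
        "top_p": 0.8,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```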