IQ2_KS passes the Moonshot K2 Vendor Verifier test
In case someone was wondering about the quality of the IQ2_KS quant, I ran the K2 Vendor Verifier test (https://github.com/MoonshotAI/K2-Vendor-Verifier) over a couple of days.
On the objective measurement of tool calling quality, IQ2_KS scores 81%, which is a pass and higher than even some commercially available API endpoints.
More details here: https://github.com/ikawrakow/ik_llama.cpp/issues/865
That's great, thank you.
I was thinking about doing the same on a larger quant, comparing llama.cpp versus ik_llama.cpp.
How long did it take?
About 48 hours on my rig...
Thanks again for running the official vendor verifier, and it's amazing that these little quants are beating some commercial APIs! Now if the RAM prices weren't skyrocketing we'd be set lol...
Could you share the full command you used to launch ik_llama.cpp including where you got the chat template from? I'm asking for another user, @justj0sh who is having some issues with their setup here: https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF/discussions/10
Thanks!
Of course. I'm using the chat template ik_llama.cpp/models/templates/Kimi-K2-Instruct.jinja from the ik_llama.cpp repository.
```
$ git rev-parse --short HEAD
87f6943e
```
Command line (from my llama-swap config):
```
${ik_llama}
-t 23
-m /home/ai/models/ubergarm/Kimi-K2-Instruct-0905-GGUF/Kimi-K2-Instruct-0905-IQ2_KS.gguf
--alias Kimi-K2
--jinja
--host 0.0.0.0
--chat-template-file /home/ai/ik_llama.cpp/models/templates/Kimi-K2-Instruct.jinja
-c 100000 --no-mmap -ngl 999
-ot "blk.(0|1|2|3|4|5|6|7).ffn.=CUDA0"
-ot "blk.(11|12|13|14|15|16|17).ffn.=CUDA1"
-ot "blk.(21|22).ffn.=CUDA2"
-ot "blk.(31|32).ffn.=CUDA3"
-ot exps=CPU
-mg 0 -ub 4096 -b 4096 -mla 3 -amb 1024
--temp 0.6
```
Where ${ik_llama} is:

```
/home/ai/ik_llama.cpp/build/bin/llama-server
--port ${PORT}
```
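To illustrate how those -ot / --override-tensor rules interact: each rule is a regex mapped to a backend, rules are checked in order and the first match wins, so the final exps=CPU rule catches every expert tensor not already claimed by one of the per-GPU rules, while non-expert tensors fall through to the GPUs via -ngl 999. A small Python sketch of that routing (the tensor names and first-match assumption are illustrative, and the regex dots are escaped here for clarity, whereas the shell patterns above rely on "." matching any character):

```python
import re

# Mirror of the -ot rules above, in the same order; first match wins
# (assumption: this is how the override list is applied).
RULES = [
    (r"blk\.(0|1|2|3|4|5|6|7)\.ffn", "CUDA0"),
    (r"blk\.(11|12|13|14|15|16|17)\.ffn", "CUDA1"),
    (r"blk\.(21|22)\.ffn", "CUDA2"),
    (r"blk\.(31|32)\.ffn", "CUDA3"),
    (r"exps", "CPU"),  # remaining expert tensors stay in system RAM
]

def placement(tensor_name: str) -> str:
    """Return the backend the first matching rule assigns to this tensor."""
    for pattern, device in RULES:
        if re.search(pattern, tensor_name):
            return device
    return "GPU (default, via -ngl 999)"

if __name__ == "__main__":
    for name in (
        "blk.3.ffn_down_exps.weight",
        "blk.14.ffn_up_exps.weight",
        "blk.40.ffn_gate_exps.weight",
    ):
        print(name, "->", placement(name))
```

This is just a sketch of the routing logic, not how ik_llama.cpp implements it internally; the point is that the handful of explicit CUDA rules pin a few blocks of FFN experts onto each GPU, and everything else expert-shaped lands on the CPU.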
If it is of any use, I'm also running the k2vv tool on Kimi-K2-Thinking-smol-IQ2_KS.gguf at the moment (might be a couple of days more before I have results :-)). This is my config:
```
${ik_llama}
-t 23
-m /home/ai/models/ubergarm/Kimi-K2-Thinking-GGUF/Kimi-K2-Thinking-smol-IQ2_KS.gguf
--alias Kimi-K2-Thinking
--jinja
--host 0.0.0.0
--chat-template-file /home/ai/ik_llama.cpp/models/templates/Kimi-K2-Thinking.jinja
-c 150000 --no-mmap -ngl 999
-ot "blk.(0|1|2|3|4|5|6|7).ffn.=CUDA0"
-ot "blk.(11|12|13|14|15|16|17).ffn.=CUDA1"
-ot "blk.(21|22).ffn.=CUDA2"
-ot "blk.(31|32).ffn.=CUDA3"
-ot exps=CPU
-mg 0 -ub 4096 -b 4096 -mla 3 -amb 1024
--temp 1.0
--min-p 0.01
```