IQ2_KS passes the Moonshot K2 Vendor Verifier test
In case someone was wondering about the quality of the IQ2_KS quant, I ran the K2 Vendor Verifier test (https://github.com/MoonshotAI/K2-Vendor-Verifier) over a couple of days.
On the objective measurement of tool calling quality, IQ2_KS scores 81%, which is a pass and higher than even some commercially available API endpoints.
More details here: https://github.com/ikawrakow/ik_llama.cpp/issues/865
That's great, thank you.
I was thinking about doing the same on a larger quant, comparing llama.cpp versus ik_llama.cpp.
How long did it take?
About 48 hours on my rig...
Thanks again for running the official vendor verifier, and it's amazing that these little quants are beating some commercial APIs! Now if the RAM prices weren't skyrocketing we'd be set lol...
Could you share the full command you used to launch ik_llama.cpp including where you got the chat template from? I'm asking for another user, @justj0sh who is having some issues with their setup here: https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF/discussions/10
Thanks!
Of course. I'm using the chat template ik_llama.cpp/models/templates/Kimi-K2-Instruct.jinja from the ik_llama.cpp repository.
```
$ git rev-parse --short HEAD
87f6943e
```
Command line (from my llama-swap config):
```
${ik_llama}
-t 23
-m /home/ai/models/ubergarm/Kimi-K2-Instruct-0905-GGUF/Kimi-K2-Instruct-0905-IQ2_KS.gguf
--alias Kimi-K2
--jinja
--host 0.0.0.0
--chat-template-file /home/ai/ik_llama.cpp/models/templates/Kimi-K2-Instruct.jinja
-c 100000 --no-mmap -ngl 999
-ot "blk.(0|1|2|3|4|5|6|7).ffn.=CUDA0"
-ot "blk.(11|12|13|14|15|16|17).ffn.=CUDA1"
-ot "blk.(21|22).ffn.=CUDA2"
-ot "blk.(31|32).ffn.=CUDA3"
-ot exps=CPU
-mg 0 -ub 4096 -b 4096 -mla 3 -amb 1024
--temp 0.6
```
Where ${ik_llama} is:

```
/home/ai/ik_llama.cpp/build/bin/llama-server
--port ${PORT}
```
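To illustrate how those -ot / --override-tensor rules interact: each rule is a regex mapped to a backend, rules are checked in order and the first match wins, so the final exps=CPU rule catches every expert tensor not already claimed by one of the per-GPU rules, while non-expert tensors fall through to the GPUs via -ngl 999. A small Python sketch of that routing (the tensor names and first-match assumption are illustrative, and the regex dots are escaped here for clarity, whereas the shell patterns above rely on "." matching any character):

```python
import re

# Mirror of the -ot rules above, in the same order; first match wins
# (assumption: this is how the override list is applied).
RULES = [
    (r"blk\.(0|1|2|3|4|5|6|7)\.ffn", "CUDA0"),
    (r"blk\.(11|12|13|14|15|16|17)\.ffn", "CUDA1"),
    (r"blk\.(21|22)\.ffn", "CUDA2"),
    (r"blk\.(31|32)\.ffn", "CUDA3"),
    (r"exps", "CPU"),  # remaining expert tensors stay in system RAM
]

def placement(tensor_name: str) -> str:
    """Return the backend the first matching rule assigns to this tensor."""
    for pattern, device in RULES:
        if re.search(pattern, tensor_name):
            return device
    return "GPU (default, via -ngl 999)"

if __name__ == "__main__":
    for name in (
        "blk.3.ffn_down_exps.weight",
        "blk.14.ffn_up_exps.weight",
        "blk.40.ffn_gate_exps.weight",
    ):
        print(name, "->", placement(name))
```

This is just a sketch of the routing logic, not how ik_llama.cpp implements it internally; the point is that the handful of explicit CUDA rules pin a few blocks of FFN experts onto each GPU, and everything else expert-shaped lands on the CPU.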
If it is of any use, I'm also running the k2vv tool on Kimi-K2-Thinking-smol-IQ2_KS.gguf at the moment (might be a couple of days more before I have results :-)). This is my config:
```
${ik_llama}
-t 23
-m /home/ai/models/ubergarm/Kimi-K2-Thinking-GGUF/Kimi-K2-Thinking-smol-IQ2_KS.gguf
--alias Kimi-K2-Thinking
--jinja
--host 0.0.0.0
--chat-template-file /home/ai/ik_llama.cpp/models/templates/Kimi-K2-Thinking.jinja
-c 150000 --no-mmap -ngl 999
-ot "blk.(0|1|2|3|4|5|6|7).ffn.=CUDA0"
-ot "blk.(11|12|13|14|15|16|17).ffn.=CUDA1"
-ot "blk.(21|22).ffn.=CUDA2"
-ot "blk.(31|32).ffn.=CUDA3"
-ot exps=CPU
-mg 0 -ub 4096 -b 4096 -mla 3 -amb 1024
--temp 1.0
--min-p 0.01
```