Instructions to use unsloth/Kimi-K2-Thinking with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use unsloth/Kimi-K2-Thinking with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="unsloth/Kimi-K2-Thinking", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("unsloth/Kimi-K2-Thinking", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("unsloth/Kimi-K2-Thinking", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
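Kimi-K2-Thinking writes its reasoning before the final answer. The model's chat template (chat_template.jinja, in the commit diff further down) renders assistant turns as `<think>reasoning</think>answer`.

The sketch below assumes the generated text follows that same layout and separates the two parts. `split_thinking` is a hypothetical helper, not a Transformers API. With `max_new_tokens=40` the generation will usually stop inside the reasoning, so the sketch treats an unclosed `<think>` as truncated reasoning.

```python
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"
END_OF_TURN = "<|im_end|>"  # turn terminator rendered by the chat template


def _clean_answer(text: str) -> str:
    # Drop surrounding whitespace and a trailing end-of-turn marker, if it was decoded.
    return text.strip().removesuffix(END_OF_TURN).strip()


def split_thinking(text: str) -> tuple[str, str]:
    """Split decoded model output into (reasoning, answer). Hypothetical helper."""
    body = text.lstrip()
    opened = body.startswith(THINK_OPEN)
    if opened:
        body = body[len(THINK_OPEN):]
    if THINK_CLOSE in body:
        reasoning, _, answer = body.partition(THINK_CLOSE)
        return reasoning.strip(), _clean_answer(answer)
    if opened:
        # Truncated generation: the reasoning never closed, so there is no answer yet.
        return body.strip(), ""
    # No thinking block at all: treat the whole text as the answer.
    return "", _clean_answer(body)


print(split_thinking("<think>The user asks who I am.</think>I am Kimi."))
# → ('The user asks who I am.', 'I am Kimi.')
```

To apply it to the snippet above, pass it the same slice that is decoded there: `split_thinking(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))`.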

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use unsloth/Kimi-K2-Thinking with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server (the model ships a custom tokenizer, so remote code must be trusted):
vllm serve "unsloth/Kimi-K2-Thinking" --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "unsloth/Kimi-K2-Thinking",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
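The same request can also be sent from Python using only the standard library. This is a minimal sketch that assumes the server started above is listening on vLLM's default port 8000. `build_chat_request` and `chat` are hypothetical helper names. Building the request needs no server, while `chat()` only works once `vllm serve` is running.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # vLLM's default port
MODEL = "unsloth/Kimi-K2-Thinking"


def build_chat_request(prompt: str, base_url: str = BASE_URL,
                       model: str = MODEL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = BASE_URL, timeout: float = 600.0) -> dict:
    """Send the request and return the decoded JSON body (needs a live server)."""
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a server running, `chat("What is the capital of France?")["choices"][0]["message"]["content"]` holds the reply. For the SGLang server below, pass `base_url="http://localhost:30000/v1"`.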

Use Docker

docker model run hf.co/unsloth/Kimi-K2-Thinking

SGLang

How to use unsloth/Kimi-K2-Thinking with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "unsloth/Kimi-K2-Thinking" \
    --trust-remote-code \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "unsloth/Kimi-K2-Thinking",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
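The deployment guidance in the commit diff below launches SGLang with `--reasoning-parser kimi_k2`. In that case the OpenAI-compatible response is expected to carry the model's reasoning in a separate `reasoning_content` field next to `content`. Without a reasoning parser, the reasoning stays inline in `content`.

The field name `reasoning_content` is an assumption based on the convention SGLang and vLLM use for reasoning parsers. `extract_reply` is a hypothetical helper, and the sample dict is hand-written rather than real server output.

```python
from typing import Any, Optional


def extract_reply(response: dict[str, Any]) -> tuple[Optional[str], str]:
    """Return (reasoning, answer) from an OpenAI-style chat completion dict.

    Assumes the parsed reasoning sits in choices[0].message.reasoning_content,
    which is absent or null when no reasoning parser is enabled.
    """
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning_content")
    answer = message.get("content") or ""  # content can be null, e.g. for pure tool calls
    return reasoning, answer


# Hand-written sample in the shape described above, not captured server output.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "reasoning_content": "France's capital is Paris.",
                "content": "The capital of France is Paris.",
            },
            "finish_reason": "stop",
        }
    ]
}
print(extract_reply(sample))
```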

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "unsloth/Kimi-K2-Thinking" \
        --trust-remote-code \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "unsloth/Kimi-K2-Thinking",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use unsloth/Kimi-K2-Thinking with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for unsloth/Kimi-K2-Thinking to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for unsloth/Kimi-K2-Thinking to start chatting

Using Hugging Face Spaces for Unsloth Studio

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for unsloth/Kimi-K2-Thinking to start chatting

Load model with FastModel

# In a shell:
pip install unsloth
# Then, in Python:
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/Kimi-K2-Thinking",
    max_seq_length=2048,
)

Docker Model Runner
How to use unsloth/Kimi-K2-Thinking with Docker Model Runner:
```
docker model run hf.co/unsloth/Kimi-K2-Thinking
```

danielhanchen committed on Jan 27

Commit 2a544c6 (verified) · 1 parent: 4951465

Upload folder using huggingface_hub

Files changed (6)

README.md +4 -0
chat_template.jinja +5 -9
config.json +1 -1
docs/deploy_guidance.md +2 -2
special_tokens_map.json +1 -1
tokenizer_config.json +2 -2

README.md CHANGED

@@ -1,4 +1,8 @@
 ---
+tags:
+- unsloth
+base_model:
+- moonshotai/Kimi-K2-Thinking
 license: other
 license_name: modified-mit
 library_name: transformers

chat_template.jinja CHANGED

@@ -1,4 +1,3 @@
-{# Unsloth template fixes #}
 {%- macro render_content(msg) -%}
     {%- set c = msg.get('content') -%}
     {%- if c is string -%}
@@ -49,16 +48,14 @@
 {%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg+1] -%}
 {%- set suffix_msgs = messages[ns.last_non_tool_call_assistant_msg+1:] -%}
-{%- if tools -%}{%- set tools_json = tools | tojson -%}{%- set tools_json = tools_json.replace(", ", ",") -%}{%- set tools_json = tools_json.replace(": ", ":") -%}
-  <|im_system|>tool_declare<|im_middle|>{{ tools_json }}<|im_end|>
+{%- if tools -%}
+  <|im_system|>tool_declare<|im_middle|>{{ tools | tojson(separators=(',', ':')) }}<|im_end|>
 {%- endif -%}
-{%- if messages and messages|length > 0 -%}
-  {%- if messages[0]['role'] != 'system' -%}
+{%- if messages|length == 0 or messages[0]['role'] != 'system' -%}
   <|im_system|>system<|im_middle|>You are Kimi, an AI assistant created by Moonshot AI.<|im_end|>
-  {%- endif -%}
 {%- endif -%}
 {%- for message in hist_msgs -%}
   {{set_roles(message)}}
   {%- if message['role'] == 'assistant' -%}
@@ -97,5 +94,4 @@
 {%- if add_generation_prompt -%}
   <|im_assistant|>assistant<|im_middle|>
-{%- endif -%}
-{# Copyright 2025-present Unsloth. Apache 2.0 License. #}
+{%- endif -%}
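Two behavioural changes in this template diff can be reproduced in plain Python.

- **Tool JSON compaction.** The old template compacted the tool JSON with string replacement, which also rewrites `", "` and `": "` inside string values such as tool descriptions. The new `tojson(separators=(',', ':'))` only changes the separators between JSON tokens.
- **Default system prompt.** The new condition also injects the default system prompt when `messages` is empty.

The sketch assumes the chat-template `tojson` filter in Transformers forwards `separators` to `json.dumps` and keeps non-ASCII characters (`ensure_ascii=False`), which is how recent Transformers versions define it. All four function names are hypothetical and only mirror the Jinja logic.

```python
import json

# Hypothetical Python mirrors of the old and new Jinja logic in chat_template.jinja.


def old_tools_json(tools: list) -> str:
    # Old: default tojson output, then blanket string replacement on the whole text.
    text = json.dumps(tools, ensure_ascii=False)
    return text.replace(", ", ",").replace(": ", ":")


def new_tools_json(tools: list) -> str:
    # New: ask the serializer for compact separators; string values are untouched.
    return json.dumps(tools, ensure_ascii=False, separators=(",", ":"))


def old_adds_default_system(messages: list) -> bool:
    # Old: outer "messages and messages|length > 0", inner "messages[0]['role'] != 'system'".
    return bool(messages) and len(messages) > 0 and messages[0]["role"] != "system"


def new_adds_default_system(messages: list) -> bool:
    # New: "messages|length == 0 or messages[0]['role'] != 'system'".
    return len(messages) == 0 or messages[0]["role"] != "system"


tools = [{"type": "function", "function": {"name": "search",
          "description": "Look up a city, then return: its country"}}]
print(old_tools_json(tools))  # description becomes "Look up a city,then return:its country"
print(new_tools_json(tools))  # description is preserved
print(old_adds_default_system([]), new_adds_default_system([]))  # False True
```

The first difference matters only for tool schemas whose strings contain `", "` or `": "`. The second only affects a call with an empty message list.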

config.json CHANGED

@@ -89,7 +89,7 @@
   "tie_word_embeddings": false,
   "topk_group": 1,
   "topk_method": "noaux_tc",
-  "transformers_version": "4.57.1",
+  "transformers_version": "4.57.3",
   "unsloth_fixed": true,
   "use_cache": true,
   "v_head_dim": 128,

docs/deploy_guidance.md CHANGED

@@ -44,12 +44,12 @@ Similarly, here are the examples using TP in SGLang for Deployment.
 Here is the simple example code to run TP8 on H200 in a sigle node:
 ``` bash
-python -m sglang.launch_server --model-path $MODEL_PATH --tp 8 --trust-remote-code  --tool-call-parser kimi_k2 --reasoning_parser kimi_k2
+python -m sglang.launch_server --model-path $MODEL_PATH --tp 8 --trust-remote-code  --tool-call-parser kimi_k2 --reasoning-parser kimi_k2
 ```
 **Key parameter notes:**
 - `--tool-call-parser kimi_k2`: Required when enabling tool usage.
-- `--reasoning_parser kimi_k2`: Required for correctly processing reasoning content.
+- `--reasoning-parser kimi_k2`: Required for correctly processing reasoning content.
 ## KTransformers Deployment
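`--tool-call-parser kimi_k2` tells SGLang to turn the model's raw tool-call tokens into structured `tool_calls`. The `render_toolcalls` macro in the chat template (tokenizer_config.json, below) shows the wire format:

- `<|tool_calls_section_begin|>` opens the section.
- Each call is `<|tool_call_begin|>{id}<|tool_call_argument_begin|>{arguments}<|tool_call_end|>`.
- `<|tool_calls_section_end|>` closes the section.

The sketch below parses that format with regular expressions. It is not SGLang's implementation, and it assumes the model emits exactly the layout the template renders. `parse_tool_calls` is a hypothetical name. Deriving the function name assumes ids of the form `functions.<name>:<index>`, which is also an assumption here.

```python
import json
import re
from typing import Any

# Token layout taken from the render_toolcalls macro in the chat template.
SECTION_RE = re.compile(
    r"<\|tool_calls_section_begin\|>(.*?)<\|tool_calls_section_end\|>", re.DOTALL
)
CALL_RE = re.compile(
    r"<\|tool_call_begin\|>\s*(?P<id>.+?)\s*<\|tool_call_argument_begin\|>"
    r"\s*(?P<args>.*?)\s*<\|tool_call_end\|>",
    re.DOTALL,
)


def parse_tool_calls(text: str) -> list[dict[str, Any]]:
    """Extract tool calls from raw model output (hypothetical helper)."""
    calls = []
    for section in SECTION_RE.findall(text):
        for match in CALL_RE.finditer(section):
            call_id = match.group("id")
            # Assumes ids like "functions.<name>:<index>": strip the prefix and the index.
            name = call_id.split(".", 1)[-1].rsplit(":", 1)[0]
            raw_args = match.group("args")
            try:
                arguments: Any = json.loads(raw_args)
            except json.JSONDecodeError:
                arguments = raw_args  # keep non-JSON arguments as the raw string
            calls.append({"id": call_id, "name": name, "arguments": arguments})
    return calls


raw = (
    "<|tool_calls_section_begin|>"
    "<|tool_call_begin|>functions.get_weather:0<|tool_call_argument_begin|>"
    '{"city": "Paris"}<|tool_call_end|>'
    "<|tool_calls_section_end|>"
)
print(parse_tool_calls(raw))
```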

special_tokens_map.json CHANGED

@@ -17,7 +17,7 @@
     "single_word": false
   },
   "eos_token": {
-    "content": "<|im_end|>",
+    "content": "[EOS]",
     "lstrip": false,
     "normalized": false,
     "rstrip": false,

tokenizer_config.json CHANGED

@@ -171,12 +171,12 @@
   },
   "bos_token": "[BOS]",
   "clean_up_tokenization_spaces": false,
-  "eos_token": "<|im_end|>",
   "extra_special_tokens": {},
   "model_max_length": 262144,
   "pad_token": "[PAD]",
   "padding_side": "left",
   "tokenizer_class": "TikTokenTokenizer",
   "unk_token": "[UNK]",
-  "chat_template": "{# Unsloth template fixes #}\n{%- macro render_content(msg) -%}\n    {%- set c = msg.get('content') -%}\n    {%- if c is string -%}\n      {{ c }}\n    {%- elif c is not none -%}\n      {% for content in c -%}\n        {% if content['type'] == 'image' or 'image' in content or 'image_url' in content -%}\n          <|media_start|>image<|media_content|><|media_pad|><|media_end|>\n        {% else -%}\n          {{ content['text'] }}\n        {%- endif -%}\n      {%- endfor -%}\n    {%- endif -%}\n{%- endmacro -%}\n\n{% macro set_roles(message) -%}\n  {%- set role_name =  message.get('name') or  message['role'] -%}\n  {%- if message['role'] == 'user' -%}\n    <|im_user|>{{role_name}}<|im_middle|>\n  {%- elif message['role'] == 'assistant' -%}\n    <|im_assistant|>{{role_name}}<|im_middle|>\n  {%- else -%}\n    <|im_system|>{{role_name}}<|im_middle|>\n  {%- endif -%}\n{%- endmacro -%}\n\n\n{%- macro render_toolcalls(message) -%}\n  <|tool_calls_section_begin|>\n  {%- for tool_call in message['tool_calls'] -%}\n    {%- set formatted_id = tool_call['id'] -%}\n    <|tool_call_begin|>{{ formatted_id }}<|tool_call_argument_begin|>{% if tool_call['function']['arguments'] is string %}{{ tool_call['function']['arguments'] }}{% else %}{{ tool_call['function']['arguments'] | tojson }}{% endif %}<|tool_call_end|>\n  {%- endfor -%}\n  <|tool_calls_section_end|>\n{%- endmacro -%}\n\n\n{# Find last non-tool-call assisitant message #}\n{%- set ns = namespace(last_non_tool_call_assistant_msg=-1) -%}\n{%- for idx in range(messages|length-1, -1, -1) -%}\n    {%- if messages[idx]['role'] == 'assistant' and not messages[idx].get('tool_calls') -%}\n        {%- set ns.last_non_tool_call_assistant_msg = idx -%}\n        {%- break -%}\n    {%- endif -%}\n{%- endfor -%}\n\n{# split all messages into history & suffix, reasoning_content in suffix should be reserved.#}\n{%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg+1] -%}\n{%- set suffix_msgs = 
messages[ns.last_non_tool_call_assistant_msg+1:] -%}\n\n{%- if tools -%}{%- set tools_json = tools | tojson -%}{%- set tools_json = tools_json.replace(\", \", \",\") -%}{%- set tools_json = tools_json.replace(\": \", \":\") -%}\n  <|im_system|>tool_declare<|im_middle|>{{ tools_json }}<|im_end|>\n{%- endif -%}\n\n{%- if messages and messages|length > 0 -%}\n  {%- if messages[0]['role'] != 'system' -%}\n  <|im_system|>system<|im_middle|>You are Kimi, an AI assistant created by Moonshot AI.<|im_end|>\n  {%- endif -%}\n{%- endif -%}\n\n{%- for message in hist_msgs -%}\n  {{set_roles(message)}}\n  {%- if message['role'] == 'assistant' -%}\n    <think></think>{{render_content(message)}}\n    {%- if message.get('tool_calls') -%}\n      {{render_toolcalls(message)}}\n    {%- endif -%}\n  {%- elif message['role'] == 'tool' -%}\n    {%- set tool_call_id = message.tool_call_id -%}\n    ## Return of {{ tool_call_id }}\n{{render_content(message)}}\n  {%- elif message['content'] is not none -%}\n    {{render_content(message)}}\n  {%- endif -%}\n  <|im_end|>\n{%- endfor -%}\n\n{%- for message in suffix_msgs -%}\n  {{set_roles(message)}}\n  {%- if message['role'] == 'assistant' -%}\n    {%- set rc = message.get('reasoning_content', '') -%}\n    <think>{{rc}}</think>{{render_content(message)}}\n    {%- if message.get('tool_calls') -%}\n     {{render_toolcalls(message)}}\n    {%- endif -%}\n  {%- elif message['role'] == 'tool' -%}\n    {%- set tool_call_id = message.tool_call_id -%}\n    ## Return of {{ tool_call_id }}\n{{render_content(message)}}\n  {%- elif message['content'] is not none -%}\n    {{render_content(message)}}\n  {%- endif -%}\n  <|im_end|>\n{%- endfor -%}\n\n\n{%- if add_generation_prompt -%}\n  <|im_assistant|>assistant<|im_middle|>\n{%- endif -%}\n{# Copyright 2025-present Unsloth. Apache 2.0 License. #}"
 }

   },
   "bos_token": "[BOS]",
   "clean_up_tokenization_spaces": false,
+  "eos_token": "[EOS]",
   "extra_special_tokens": {},
   "model_max_length": 262144,
   "pad_token": "[PAD]",
   "padding_side": "left",
   "tokenizer_class": "TikTokenTokenizer",
   "unk_token": "[UNK]",
+  "chat_template": "{%- macro render_content(msg) -%}\n    {%- set c = msg.get('content') -%}\n    {%- if c is string -%}\n      {{ c }}\n    {%- elif c is not none -%}\n      {% for content in c -%}\n        {% if content['type'] == 'image' or 'image' in content or 'image_url' in content -%}\n          <|media_start|>image<|media_content|><|media_pad|><|media_end|>\n        {% else -%}\n          {{ content['text'] }}\n        {%- endif -%}\n      {%- endfor -%}\n    {%- endif -%}\n{%- endmacro -%}\n\n{% macro set_roles(message) -%}\n  {%- set role_name =  message.get('name') or  message['role'] -%}\n  {%- if message['role'] == 'user' -%}\n    <|im_user|>{{role_name}}<|im_middle|>\n  {%- elif message['role'] == 'assistant' -%}\n    <|im_assistant|>{{role_name}}<|im_middle|>\n  {%- else -%}\n    <|im_system|>{{role_name}}<|im_middle|>\n  {%- endif -%}\n{%- endmacro -%}\n\n\n{%- macro render_toolcalls(message) -%}\n  <|tool_calls_section_begin|>\n  {%- for tool_call in message['tool_calls'] -%}\n    {%- set formatted_id = tool_call['id'] -%}\n    <|tool_call_begin|>{{ formatted_id }}<|tool_call_argument_begin|>{% if tool_call['function']['arguments'] is string %}{{ tool_call['function']['arguments'] }}{% else %}{{ tool_call['function']['arguments'] | tojson }}{% endif %}<|tool_call_end|>\n  {%- endfor -%}\n  <|tool_calls_section_end|>\n{%- endmacro -%}\n\n\n{# Find last non-tool-call assisitant message #}\n{%- set ns = namespace(last_non_tool_call_assistant_msg=-1) -%}\n{%- for idx in range(messages|length-1, -1, -1) -%}\n    {%- if messages[idx]['role'] == 'assistant' and not messages[idx].get('tool_calls') -%}\n        {%- set ns.last_non_tool_call_assistant_msg = idx -%}\n        {%- break -%}\n    {%- endif -%}\n{%- endfor -%}\n\n{# split all messages into history & suffix, reasoning_content in suffix should be reserved.#}\n{%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg+1] -%}\n{%- set suffix_msgs = 
messages[ns.last_non_tool_call_assistant_msg+1:] -%}\n\n{%- if tools -%}\n  <|im_system|>tool_declare<|im_middle|>{{ tools | tojson(separators=(',', ':')) }}<|im_end|>\n{%- endif -%}\n\n{%- if messages|length == 0 or messages[0]['role'] != 'system' -%}\n  <|im_system|>system<|im_middle|>You are Kimi, an AI assistant created by Moonshot AI.<|im_end|>\n{%- endif -%}\n  \n{%- for message in hist_msgs -%}\n  {{set_roles(message)}}\n  {%- if message['role'] == 'assistant' -%}\n    <think></think>{{render_content(message)}}\n    {%- if message.get('tool_calls') -%}\n      {{render_toolcalls(message)}}\n    {%- endif -%}\n  {%- elif message['role'] == 'tool' -%}\n    {%- set tool_call_id = message.tool_call_id -%}\n    ## Return of {{ tool_call_id }}\n{{render_content(message)}}\n  {%- elif message['content'] is not none -%}\n    {{render_content(message)}}\n  {%- endif -%}\n  <|im_end|>\n{%- endfor -%}\n\n{%- for message in suffix_msgs -%}\n  {{set_roles(message)}}\n  {%- if message['role'] == 'assistant' -%}\n    {%- set rc = message.get('reasoning_content', '') -%}\n    <think>{{rc}}</think>{{render_content(message)}}\n    {%- if message.get('tool_calls') -%}\n     {{render_toolcalls(message)}}\n    {%- endif -%}\n  {%- elif message['role'] == 'tool' -%}\n    {%- set tool_call_id = message.tool_call_id -%}\n    ## Return of {{ tool_call_id }}\n{{render_content(message)}}\n  {%- elif message['content'] is not none -%}\n    {{render_content(message)}}\n  {%- endif -%}\n  <|im_end|>\n{%- endfor -%}\n\n\n{%- if add_generation_prompt -%}\n  <|im_assistant|>assistant<|im_middle|>\n{%- endif -%}"
 }