Instructions to use tencent/Hunyuan-A13B-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use tencent/Hunyuan-A13B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="tencent/Hunyuan-A13B-Instruct", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))  # the pipeline returns a list of generations

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/Hunyuan-A13B-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("tencent/Hunyuan-A13B-Instruct", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# generate() returns the prompt tokens followed by the new tokens,
# so slice off the prompt before decoding.
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use tencent/Hunyuan-A13B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "tencent/Hunyuan-A13B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "tencent/Hunyuan-A13B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
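
The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and it assumes the server is reachable at `http://localhost:8000` and returns the usual OpenAI-style chat completion body.

```python
import json
import urllib.request


def build_chat_request(prompt, model="tencent/Hunyuan-A13B-Instruct",
                       base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions.

    Hypothetical helper: mirrors the JSON body of the curl example.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response: choices[0].message.content.
    """
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` should return the model's answer as a string. Passing `base_url="http://localhost:30000"` points the same helper at a server listening on port 30000 instead.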

Use Docker

docker model run hf.co/tencent/Hunyuan-A13B-Instruct

SGLang

How to use tencent/Hunyuan-A13B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "tencent/Hunyuan-A13B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "tencent/Hunyuan-A13B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "tencent/Hunyuan-A13B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "tencent/Hunyuan-A13B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'

Docker Model Runner

How to use tencent/Hunyuan-A13B-Instruct with Docker Model Runner:

```
docker model run hf.co/tencent/Hunyuan-A13B-Instruct
```

What's the SimpleQA score?

#13

by phil111 - opened Jun 28, 2025

Discussion

phil111

Jun 28, 2025

The tests you show are highly redundant, covering only a handful of domains: math (MATH, CMATH & GSM8K), coding (EvalPlus, MultiPL-E & MBPP), and STEM/academia (MMLU, MMLU-Pro, MMLU-Redux, GPQA, and SuperGPQA).

This is a big red flag. All models that have done this in the past have had very little broad knowledge and abilities for their size. And no publicly available tests do a better job of highlighting domain overfitting (usually math, coding, and STEM) than the English and Chinese SimpleQA tests, because they include full-recall (non-multiple-choice) questions across a broad spectrum of domains.

Plus, Chinese models tend to retain broad Chinese knowledge and abilities, and hence have high Chinese SimpleQA scores for their size, because their makers are trying to build models that the general Chinese public can actually use. They only selectively overfit English test-boosting data, resulting in high English MMLU scores but rock-bottom English SimpleQA scores.

I'm tired of testing these models, since it's just one disappointment after another. So can you do me a favor and just publish the English and Chinese SimpleQA scores, which I'm sure you ran, so people can tell at a glance whether you overfit the math, coding, and STEM tests, and by how much?
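
To make the distinction drawn above between full-recall and multiple-choice questions concrete, here is a rough sketch of the two scoring regimes. Everything in it is an illustrative assumption: the function names are hypothetical, and the substring match is a crude stand-in for a real grader (SimpleQA, as I understand it, relies on a model-based grader that labels each answer correct, incorrect, or not attempted).

```python
import re
import string


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def grade_open_ended(prediction, reference):
    """Label one free-form answer.

    Assumption: a substring match stands in for a real grader.
    """
    if not normalize(prediction):
        return "not_attempted"
    if normalize(reference) in normalize(prediction):
        return "correct"
    return "incorrect"


def accuracy(grades):
    """Fraction of answers labeled 'correct'."""
    return sum(g == "correct" for g in grades) / len(grades)


def chance_accuracy(num_options):
    """Expected accuracy from uniform guessing on a multiple-choice item."""
    return 1 / num_options
```

A four-option multiple-choice test has a guessing floor of `chance_accuracy(4)`, that is 25%, whereas a free-form answer scores only if the model actually produces the fact, which is why the two kinds of benchmark can diverge so sharply for the same model.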

Jul 5, 2025

> I'm tired of testing these models since it's just one disappointment after another

Damn, I always check to see if there's a post from you in the discussion/community section when a new model is released lol
