Instructions for using ServiceNow-AI/Apriel-1.6-15b-Thinker with libraries, inference providers, notebooks, and local apps.

Libraries

How to use ServiceNow-AI/Apriel-1.6-15b-Thinker with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="ServiceNow-AI/Apriel-1.6-15b-Thinker")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("ServiceNow-AI/Apriel-1.6-15b-Thinker")
model = AutoModelForImageTextToText.from_pretrained("ServiceNow-AI/Apriel-1.6-15b-Thinker")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
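inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# A "Thinker" model writes its reasoning before the final answer, so a very small
# budget (e.g. 40 new tokens) will usually stop generation mid-reasoning.
# 1024 is an example value; raise it for longer questions.
outputs = model.generate(**inputs, max_new_tokens=1024)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))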
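Because this is a "Thinker" model, the decoded text will typically contain a reasoning trace followed by the final answer. The sketch below separates the two, assuming the answer is wrapped in `[BEGIN FINAL RESPONSE]` / `[END FINAL RESPONSE]` markers. Those strings are placeholders, not confirmed for this model, so check the chat template (or a sample output) and adjust `FINAL_START` / `FINAL_END` accordingly.

```python
from typing import Optional, Tuple

# ASSUMPTION: hypothetical answer markers; verify against the model's actual output.
FINAL_START = "[BEGIN FINAL RESPONSE]"
FINAL_END = "[END FINAL RESPONSE]"


def split_reasoning(
    text: str, start: str = FINAL_START, end: str = FINAL_END
) -> Tuple[str, Optional[str]]:
    """Return (reasoning, final_answer).

    final_answer is None when the start marker never appears, which usually
    means generation stopped (or is still running) inside the reasoning trace.
    """
    head, sep, tail = text.partition(start)
    if not sep:
        # No final-answer marker at all: everything we have is reasoning.
        return text.strip(), None
    # Drop the end marker and anything after it (e.g. an end-of-turn token).
    answer, _, _ = tail.partition(end)
    return head.strip(), answer.strip()
```

Pass it the string you would otherwise print, for example `reasoning, answer = split_reasoning(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))`. If `answer` is `None`, the token budget was most likely exhausted before the model finished thinking. If the markers turn out to be registered as special tokens, keep them in the decoded text (i.e. do not pass `skip_special_tokens=True`) or the split will not find them.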

Local Apps

vLLM

How to use ServiceNow-AI/Apriel-1.6-15b-Thinker with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ServiceNow-AI/Apriel-1.6-15b-Thinker"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ServiceNow-AI/Apriel-1.6-15b-Thinker",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
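The same OpenAI-compatible request can be sent from Python with only the standard library. This sketch assumes the server from the `vllm serve` command above is listening on `localhost:8000` (for the SGLang server, pass `base_url="http://localhost:30000/v1"` instead). It adds an explicit `max_tokens` cap so a runaway reasoning trace cannot run indefinitely; the default of 4096 is an arbitrary example, and `temperature` is only sent when you set it, so no particular sampler value is implied.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# ASSUMPTION: local vLLM server on its default port, as in the curl example.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "ServiceNow-AI/Apriel-1.6-15b-Thinker"


def build_chat_request(
    prompt: str,
    image_url: str,
    max_tokens: int = 4096,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Build a chat-completions body with one text part and one image part."""
    body: Dict[str, Any] = {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        # Hard cap on generated tokens (reasoning + answer).
        "max_tokens": max_tokens,
    }
    if temperature is not None:
        # Only override the server-side default when explicitly requested.
        body["temperature"] = temperature
    return body


def send_chat_request(
    body: Dict[str, Any], base_url: str = BASE_URL, timeout: float = 600.0
) -> str:
    """POST the body to /chat/completions and return the first choice's text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The socket timeout keeps a stalled request from blocking forever.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With a server running, `print(send_chat_request(build_chat_request("Describe this image in one sentence.", "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg")))` reproduces the curl call above.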

Use Docker

docker model run hf.co/ServiceNow-AI/Apriel-1.6-15b-Thinker

SGLang

How to use ServiceNow-AI/Apriel-1.6-15b-Thinker with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ServiceNow-AI/Apriel-1.6-15b-Thinker" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ServiceNow-AI/Apriel-1.6-15b-Thinker",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ServiceNow-AI/Apriel-1.6-15b-Thinker" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ServiceNow-AI/Apriel-1.6-15b-Thinker",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use ServiceNow-AI/Apriel-1.6-15b-Thinker with Docker Model Runner:
```
docker model run hf.co/ServiceNow-AI/Apriel-1.6-15b-Thinker
```

Endless loop

by alcalde - opened Dec 10, 2025

Discussion

alcalde

Dec 10, 2025

I asked your model a question (comparing two press releases), and it went into an endless thinking loop for ten minutes. This was on the official Hugging Face demo page, so it's not anything I configured wrong. With the few other questions I asked, it also seemed to output more thinking than final-answer content. When I asked a question about horse-race wagering, it spent several paragraphs debating with itself whether this violated some rule against providing gambling instructions. It repeated itself a few times there too, but eventually managed to decide to produce output.

You never noticed your model doing this before you released it? It can't just be me, can it? Maybe the bigger the question, the more likely the endless loop? I would have named this model Azathoth instead of Apriel, because in Lovecraft's stories the god Azathoth lies dreaming at the center of chaos, kept asleep by the endless playing of maddening flutes. In a similar way, Apriel slumbers forever without answering, babbling madly and endlessly to itself. :-)

AutisticPancake

Dec 11, 2025


edited Dec 11, 2025

I have also encountered endless generation (though using koboldcpp/llama.cpp with a SillyTavern frontend), but it seems to be tied to the sampler settings. Perhaps there's a problematic range of parameters that makes it fall into a loop. Regrettably, I didn't record what was causing the problem, since I was switching between various sampler presets, but it did eventually go away. Perhaps the demo page chat is misconfigured too?
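Whatever the root cause (sampler settings, demo configuration, or the model itself), a client can at least stop a runaway generation. The following is a minimal client-side heuristic, not a fix for the model: it wraps any iterator of streamed text chunks and stops on a character budget or when the output ends with the same chunk repeated several times in a row. All names and thresholds are hypothetical examples, and paraphrased loops (the model restating itself in different words) will not be caught because only exact repeats are detected.

```python
from typing import Iterable, Iterator


def tail_is_looping(
    text: str, min_period: int = 20, max_period: int = 400, repeats: int = 3
) -> bool:
    """Heuristic: True if text ends with one chunk repeated `repeats` times back to back.

    Every chunk length between min_period and max_period characters is tried.
    """
    for period in range(min_period, max_period + 1):
        span = period * repeats
        if span > len(text):
            break  # longer periods cannot fit either
        tail = text[-span:]
        chunk = tail[-period:]
        if tail == chunk * repeats:
            return True
    return False


def guard_stream(
    chunks: Iterable[str], max_chars: int = 50_000, check_every: int = 256
) -> Iterator[str]:
    """Yield chunks until a character budget is hit or an exact loop is detected."""
    text = ""
    last_check = 0
    for chunk in chunks:
        text += chunk
        yield chunk
        if len(text) >= max_chars:
            return  # hard budget reached
        if len(text) - last_check >= check_every:
            # Only run the (quadratic-ish) loop check every few hundred characters.
            last_check = len(text)
            if tail_is_looping(text):
                return
```

Breaking out of the generator only stops reading the stream; to free the server, the caller should also close the underlying HTTP connection or cancel the request.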
