Endless loop
I asked your model a question (comparing two press releases) and it went into an endless thinking loop that ran for ten minutes. This was on the official Hugging Face demo page, so it's not anything I configured wrong. With the few other questions I asked, it also seemed to output more thoughts than content for the final answer. When I asked a question about horse race wagering, it spent several paragraphs debating with itself whether this violated some rule about not providing gambling instructions. It repeated itself a few times here too, but eventually managed to decide to produce output.
You never noticed your model doing this before you released it? It can't just be me, can it? Maybe the bigger the question, the more likely the endless loop? I would have named this model Azathoth instead of Apriel, because in Lovecraft's stories the god Azathoth lies dreaming at the center of chaos, kept asleep by the endless playing of maddening flutes. In a similar way, Apriel slumbers forever without answering, babbling madly and endlessly to itself. :-)
I have also encountered endless generation (though I was using koboldcpp/llama.cpp with a SillyTavern frontend); however, it seems to be tied to the sampler settings. Perhaps there's a problematic range of parameters that makes it fall into a loop. Regrettably, I didn't record what was causing the problem, as I was switching between various sampler presets, but it did eventually go away. Perhaps the demo page chat is misconfigured too?
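For anyone trying to narrow this down locally, one rough way to catch a loop early (instead of waiting ten minutes) is to check whether the end of the output keeps repeating the same block. The snippet below is only a minimal sketch of that idea: the function name `has_repeating_tail` and the thresholds are made up for illustration and are not part of any library or of the model's tooling, and splitting on whitespace is an assumption standing in for real tokenization.

```python
def has_repeating_tail(tokens, min_period=4, max_period=50, min_repeats=3):
    """Return True if `tokens` ends with one block repeated `min_repeats` times.

    `tokens` can be any sliceable sequence (a list of words or token ids).
    All thresholds are illustrative guesses, not tuned values.
    """
    n = len(tokens)
    for period in range(min_period, max_period + 1):
        span = period * min_repeats
        if span > n:
            break  # not enough output yet to hold this many repeats
        tail = tokens[n - span:]
        block = tail[:period]
        # Compare every consecutive chunk of length `period` with the first one.
        if all(tail[i * period:(i + 1) * period] == block
               for i in range(min_repeats)):
            return True
    return False


# Hypothetical outputs, split on whitespace as a stand-in for real tokens.
looping = ("Wait , is this gambling advice ? " * 4).split()
normal = "Release A stresses revenue while release B stresses hiring plans".split()

print(has_repeating_tail(looping))
print(has_repeating_tail(normal))
```

Calling something like this every few hundred tokens while streaming, and aborting when it fires, would at least make it easier to log which sampler preset was active when the loop started.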