Instructions for using McGill-NLP/A3-Qwen3.5-2B with libraries and local apps.

Libraries

How to use McGill-NLP/A3-Qwen3.5-2B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="McGill-NLP/A3-Qwen3.5-2B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("McGill-NLP/A3-Qwen3.5-2B")
model = AutoModelForImageTextToText.from_pretrained("McGill-NLP/A3-Qwen3.5-2B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use McGill-NLP/A3-Qwen3.5-2B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "McGill-NLP/A3-Qwen3.5-2B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "McGill-NLP/A3-Qwen3.5-2B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
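
The same request can be issued from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server from the step above is listening on its default port 8000. The helper names `build_chat_payload` and `chat`, and the `max_tokens` value, are illustrative choices and not part of the model card.

```python
import json
import urllib.request

# Assumed values: they mirror the curl example above (default vLLM port 8000).
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "McGill-NLP/A3-Qwen3.5-2B"


def build_chat_payload(prompt: str, image_url: str, max_tokens: int = 64) -> dict:
    """Build a chat request with the same message structure as the curl call.

    `max_tokens` is an addition for this sketch; the curl example omits it.
    """
    return {
        "model": MODEL_ID,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload: dict, base_url: str = BASE_URL, timeout: float = 60.0) -> str:
    """POST the payload to /chat/completions and return the assistant's text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(chat(build_chat_payload("Describe this image in one sentence.", "<image URL>")))` should print the model's reply. For the SGLang server described below, the same sketch should work with `base_url="http://localhost:30000/v1"`.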

Use Docker

docker model run hf.co/McGill-NLP/A3-Qwen3.5-2B

SGLang

How to use McGill-NLP/A3-Qwen3.5-2B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "McGill-NLP/A3-Qwen3.5-2B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "McGill-NLP/A3-Qwen3.5-2B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
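
Web-agent inputs are usually local screenshots, not public image URLs. The sketch below shows one common way to send a local file: encode it as a base64 data URL and place it in the `image_url` field. It assumes the server accepts data URLs there, which OpenAI-compatible vision endpoints generally do, but this should be checked against the installed server version. The helper names are hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_content_part(path: str) -> dict:
    """Wrap a local image as an `image_url` content part for a chat message."""
    return {"type": "image_url", "image_url": {"url": image_to_data_url(path)}}
```

The dictionary returned by `image_content_part("screenshot.png")` can replace the `image_url` entry in the `content` list of the request above.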

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "McGill-NLP/A3-Qwen3.5-2B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request shown in the pip-based SGLang example above (same host and port, 30000).

Docker Model Runner
How to use McGill-NLP/A3-Qwen3.5-2B with Docker Model Runner:
```
docker model run hf.co/McGill-NLP/A3-Qwen3.5-2B
```

A3-Qwen3.5-2B


Structured Distillation of Web Agent Capabilities Enables Generalization

Xing Han Lù, Siva Reddy

A3-Qwen3.5-2B is a 2B-parameter multimodal web agent fine-tuned from Qwen/Qwen3.5-2B using the Agent-as-Annotators (A3) framework. It is trained on A3-Synth, a dataset of synthetic trajectories generated through a structured teacher-student distillation process.

Model Description

A3-Qwen3.5-2B is designed to navigate complex web environments by processing screenshots and text. The A3 framework decomposes synthetic data generation into three modular roles (Task Designer, Annotator, and Supervisor), which allows small, locally deployable models to achieve competitive performance on benchmarks such as WebArena, in some cases surpassing larger closed-source models.

Quick Start: Evaluation

You can evaluate the model using the agent-as-annotators toolkit:

1. Serve the model with vLLM

vllm serve McGill-NLP/A3-Qwen3.5-2B

2. Run evaluation

a3-eval --benchmark webarena_test --model A3-qwen3.5-2b
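
Loading the model can take a while, so it may help to confirm the server is answering before step 2 starts. The sketch below polls the OpenAI-compatible `/v1/models` endpoint until it responds. It assumes the default vLLM port 8000; the function names, retry count, and delay are illustrative and not part of the agent-as-annotators toolkit.

```python
import time
import urllib.request
from typing import Callable


def server_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def wait_for_server(
    url: str = "http://localhost:8000/v1/models",  # assumed default vLLM address
    attempts: int = 60,
    delay_seconds: float = 5.0,
    probe: Callable[[str], bool] = server_is_up,
) -> bool:
    """Poll `url` until `probe` succeeds or `attempts` is exhausted."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

Calling `wait_for_server()` returns `True` once the endpoint responds and `False` if it never does within the retry budget. The `probe` parameter exists only so the loop can be exercised without a live server.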

Citation

If you find this model useful, please cite our work:

@misc{lu2025structured,
      title={Structured Distillation of Web Agent Capabilities Enables Generalization}, 
      author={Xing Han Lù and Siva Reddy},
      year={2025},
      eprint={2604.07776},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}