Instructions to use McGill-NLP/A3-Qwen3.5-2B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use McGill-NLP/A3-Qwen3.5-2B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="McGill-NLP/A3-Qwen3.5-2B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("McGill-NLP/A3-Qwen3.5-2B")
model = AutoModelForImageTextToText.from_pretrained("McGill-NLP/A3-Qwen3.5-2B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Local Apps
- vLLM
How to use McGill-NLP/A3-Qwen3.5-2B with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "McGill-NLP/A3-Qwen3.5-2B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "McGill-NLP/A3-Qwen3.5-2B",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
Use Docker
docker model run hf.co/McGill-NLP/A3-Qwen3.5-2B
- SGLang
How to use McGill-NLP/A3-Qwen3.5-2B with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "McGill-NLP/A3-Qwen3.5-2B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "McGill-NLP/A3-Qwen3.5-2B",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "McGill-NLP/A3-Qwen3.5-2B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "McGill-NLP/A3-Qwen3.5-2B",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
- Docker Model Runner
How to use McGill-NLP/A3-Qwen3.5-2B with Docker Model Runner:
docker model run hf.co/McGill-NLP/A3-Qwen3.5-2B
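The curl requests shown for the vLLM and SGLang servers can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_chat_payload` and `post_chat` are hypothetical and not part of any toolkit; the default base URL assumes the vLLM port 8000 (use `http://localhost:30000` for the SGLang commands). Nothing is sent until `post_chat` is called, so a server must already be running at that point.

```python
import json
import urllib.request

MODEL_ID = "McGill-NLP/A3-Qwen3.5-2B"


def build_chat_payload(prompt, image_url, model=MODEL_ID):
    """Build the same JSON body as the curl examples (hypothetical helper)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(payload, base_url="http://localhost:8000"):
    """POST the payload to the OpenAI-compatible endpoint and return parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_chat_payload(
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# With a server running: reply = post_chat(payload)
# In the OpenAI-compatible response format, the generated text is found at
# reply["choices"][0]["message"]["content"].
```

Keeping payload construction separate from the network call makes it possible to inspect or log the request body without a running server.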
A3-Qwen3.5-2B
Structured Distillation of Web Agent Capabilities Enables Generalization
Xing Han Lù, Siva Reddy
A3-Qwen3.5-2B is a 2B-parameter multimodal web agent fine-tuned from Qwen/Qwen3.5-2B using the Agent-as-Annotators (A3) framework. It is trained on A3-Synth, a dataset of high-quality synthetic trajectories generated through a structured teacher-student distillation process.
Model Description
A3-Qwen3.5-2B is designed to navigate complex web environments by processing visual screenshots and text. By decomposing the synthetic data generation process into three modular roles—Task Designer, Annotator, and Supervisor—the A3 framework allows small, locally deployable models to achieve competitive performance on benchmarks like WebArena, even surpassing some larger closed-source models.
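To make the three-role decomposition concrete, here is a toy sketch of how such a pipeline could be wired together. It is an illustration under assumptions, not the A3 implementation: this card does not specify how the roles interact, so the sketch assumes the Task Designer proposes tasks, the Annotator produces a trajectory for each task, and the Supervisor decides whether a trajectory is kept. All names (`Trajectory`, `collect_trajectories`, the `toy_*` functions) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Trajectory:
    """A task plus the actions taken to attempt it (hypothetical structure)."""
    task: str
    actions: List[str] = field(default_factory=list)


def collect_trajectories(
    design_tasks: Callable[[], Iterable[str]],
    annotate: Callable[[str], Trajectory],
    supervise: Callable[[Trajectory], bool],
) -> List[Trajectory]:
    """Run the assumed designer -> annotator -> supervisor loop and keep accepted trajectories."""
    kept = []
    for task in design_tasks():
        trajectory = annotate(task)
        if supervise(trajectory):
            kept.append(trajectory)
    return kept


# Toy stand-ins for the three roles; in practice each would be backed by a model.
def toy_designer():
    return ["find the cheapest laptop", "open the settings page"]


def toy_annotator(task):
    actions = ["click(search)", "type(query)"] if "laptop" in task else []
    return Trajectory(task=task, actions=actions)


def toy_supervisor(trajectory):
    # Assumed filter: reject trajectories that contain no actions.
    return len(trajectory.actions) > 0


dataset = collect_trajectories(toy_designer, toy_annotator, toy_supervisor)
```

Passing the roles as plain callables keeps them modular: any one of them can be swapped out without touching the other two.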
Quick Start: Evaluation
You can evaluate the model using the agent-as-annotators toolkit:
1. Serve the model with vLLM
vllm serve McGill-NLP/A3-Qwen3.5-2B
2. Run evaluation
a3-eval --benchmark webarena_test --model A3-qwen3.5-2b
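Outside the evaluation toolkit, the served model can also be queried with a local screenshot rather than a remote image URL. The sketch below builds a chat message that embeds the image as a base64 data URL, assuming the server accepts data URLs in `image_url` (OpenAI-compatible servers such as vLLM generally do). The helper names and the instruction text are hypothetical, and this is not the prompt format used by `a3-eval`.

```python
import base64


def screenshot_to_data_url(png_bytes):
    """Encode raw PNG bytes as a base64 data URL."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def screenshot_message(png_bytes, instruction):
    """Build a user message with a text instruction and an inline screenshot."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": instruction},
            {
                "type": "image_url",
                "image_url": {"url": screenshot_to_data_url(png_bytes)},
            },
        ],
    }


# Placeholder bytes (the PNG file signature only); a real screenshot would be
# read with open(path, "rb").read().
placeholder_png = b"\x89PNG\r\n\x1a\n"
message = screenshot_message(placeholder_png, "What should be clicked next?")
```

The resulting `message` can be placed in the `messages` list of a chat-completions request to the server started in step 1.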
Citation
If you find this model useful, please cite our work:
@misc{lu2025structured,
title={Structured Distillation of Web Agent Capabilities Enables Generalization},
author={Xing Han Lù and Siva Reddy},
year={2025},
eprint={2604.07776},
archivePrefix={arXiv},
primaryClass={cs.LG}
}