This is the vSearcher model introduced in the paper "InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search". The model is fine-tuned from Qwen2.5-VL-7B-Instruct via reinforcement learning (RL) to act as a subagent, with GPT-5-mini as the vReasoner. For more information on how to use this model, see our GitHub page.

@article{li2025insighto3,
  title={InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search},
  author={Kaican Li and Lewei Yao and Jiannan Wu and Tiezheng Yu and Jierun Chen and Haoli Bai and Lu Hou and Lanqing Hong and Wei Zhang and Nevin L. Zhang},
  journal={arXiv preprint arXiv:2512.18745},
  year={2025}
}