Point-It-Out: Benchmarking Embodied Reasoning for Vision Language Models in Multi-Stage Visual Grounding • Paper • 2509.25794 • Published Sep 30, 2025