IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video? Paper • 2509.24709 • Published Sep 29 • 6
ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning Paper • 2511.14366 • Published 19 days ago • 15
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning Paper • 2512.05111 • Published 3 days ago • 40
How Far Are We from Genuinely Useful Deep Research Agents? Paper • 2512.01948 • Published 6 days ago • 50
How Brittle is Agent Safety? Rethinking Agent Risk under Intent Concealment and Task Complexity Paper • 2511.08487 • Published 26 days ago • 2
LSVOS 2025 Challenge Report: Recent Advances in Complex Video Object Segmentation Paper • 2510.11063 • Published Oct 13 • 1
Think Visually, Reason Textually: Vision-Language Synergy in ARC Paper • 2511.15703 • Published 18 days ago • 8