OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows Paper • 2510.24411 • Published Oct 28 • 71
ARISE: An Adaptive Resolution-Aware Metric for Test-Time Scaling Evaluation in Large Reasoning Models Paper • 2510.06014 • Published Oct 7 • 10
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence Paper • 2510.23538 • Published Oct 27 • 96
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18 • 111
OS-MAP: How Far Can Computer-Using Agents Go in Breadth and Depth? Paper • 2507.19132 • Published Jul 25
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning Paper • 2508.20096 • Published Aug 27 • 36
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 208
CodeEvo: Interaction-Driven Synthesis of Code-centric Data through Hybrid and Iterative Feedback Paper • 2507.22080 • Published Jul 25 • 9
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents Paper • 2507.19478 • Published Jul 25 • 31