Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory • Paper • 2604.08995 • Published 4 days ago • 35
CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation • Paper • 2604.09201 • Published 4 days ago • 1
ELT: Elastic Looped Transformers for Visual Generation • Paper • 2604.09168 • Published 4 days ago • 15
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images • Paper • 2604.09531 • Published 4 days ago • 6
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents • Paper • 2604.07429 • Published 6 days ago • 13
PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models • Paper • 2604.08340 • Published 5 days ago • 5
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver • Paper • 2604.08377 • Published 5 days ago • 267
MolmoWeb: Open Visual Web Agent and Open Data for the Open Web • Paper • 2604.08516 • Published 5 days ago • 38
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks • Paper • 2604.08539 • Published 5 days ago • 45
INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling • Paper • 2604.07209 • Published 6 days ago • 33
VenusBench-Mobile: A Challenging and User-Centric Benchmark for Mobile GUI Agents with Capability Diagnostics • Paper • 2604.06182 • Published Feb 6 • 3
FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling • Paper • 2604.06916 • Published 6 days ago • 24
Experience Transfer for Multimodal LLM Agents in Minecraft Game • Paper • 2604.05533 • Published 7 days ago • 13
Action Images: End-to-End Policy Learning via Multiview Video Generation • Paper • 2604.06168 • Published 7 days ago • 12
Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning • Paper • 2604.06079 • Published 7 days ago • 5