-
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
Paper • 2507.01925 • Published • 39
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
Paper • 2507.16746 • Published • 35
MolmoAct: Action Reasoning Models that can Reason in Space
Paper • 2508.07917 • Published • 45
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Paper • 2508.20072 • Published • 32
Collections
Collections including paper arxiv:2605.30280
-
π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models
Paper • 2510.25889 • Published • 66
Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model
Paper • 2510.27607 • Published • 10
A Survey on Efficient Vision-Language-Action Models
Paper • 2510.24795 • Published • 6
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach
Paper • 2512.02834 • Published • 41
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 61
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 53
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 45
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 64
-
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments
Paper • 2605.30280 • Published • 137
EarlyTom: Early Token Compression Completes Fast Video Understanding
Paper • 2605.30010 • Published • 32
Why Far Looks Up: Probing Spatial Representation in Vision-Language Models
Paper • 2605.30161 • Published • 59
-
microsoft/phi-4
Text Generation • 15B • Updated • 875k • 2.25k
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
Image-Text-to-Text • 28B • Updated • 153k • 2.87k
XiaomiMiMo/MiMo-V2.5-Pro
Text Generation • 1T • Updated • 91.5k • 585
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments
Paper • 2605.30280 • Published • 137
-
Foundation Models in Robotics: Applications, Challenges, and the Future
Paper • 2312.07843 • Published • 16
Neural Fields in Robotics: A Survey
Paper • 2410.20220 • Published • 5
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset
Paper • 2410.22325 • Published • 10
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning
Paper • 2410.21845 • Published • 16