Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
- Website
- Community
- Solutions
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2605.30280

Vision Language Action models

about 9 hours ago

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

Paper • 2507.01925 • Published Jul 2, 2025 • 39
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning

Paper • 2507.16746 • Published Jul 22, 2025 • 35
MolmoAct: Action Reasoning Models that can Reason in Space

Paper • 2508.07917 • Published Aug 11, 2025 • 45
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies

Paper • 2508.20072 • Published Aug 27, 2025 • 32

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published 8 days ago • 137

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published 8 days ago • 137

π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models

Paper • 2510.25889 • Published Oct 29, 2025 • 66
Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model

Paper • 2510.27607 • Published Oct 31, 2025 • 10
A Survey on Efficient Vision-Language-Action Models

Paper • 2510.24795 • Published Oct 27, 2025 • 6
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach

Paper • 2512.02834 • Published Dec 2, 2025 • 41

about 13 hours ago

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 61
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 53
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 45
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20, 2024 • 64

Rethinking Memory as Continuously Evolving Connectivity

Paper • 2605.28773 • Published 9 days ago • 34
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published 8 days ago • 137

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published 8 days ago • 137

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published 8 days ago • 137
EarlyTom: Early Token Compression Completes Fast Video Understanding

Paper • 2605.30010 • Published 8 days ago • 32
Why Far Looks Up: Probing Spatial Representation in Vision-Language Models

Paper • 2605.30161 • Published 8 days ago • 59

microsoft/phi-4

Text Generation • 15B • Updated Nov 24, 2025 • 875k • 2.25k
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled

Image-Text-to-Text • 28B • Updated Apr 6 • 153k • • 2.87k
XiaomiMiMo/MiMo-V2.5-Pro

Text Generation • 1T • Updated 28 days ago • 91.5k • 585
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published 8 days ago • 137

Foundation Models in Robotics: Applications, Challenges, and the Future

Paper • 2312.07843 • Published Dec 13, 2023 • 16
Neural Fields in Robotics: A Survey

Paper • 2410.20220 • Published Oct 26, 2024 • 5
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset

Paper • 2410.22325 • Published Oct 29, 2024 • 10
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning

Paper • 2410.21845 • Published Oct 29, 2024 • 16

Vision Language Action models

about 9 hours ago

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

Paper • 2507.01925 • Published Jul 2, 2025 • 39
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning

Paper • 2507.16746 • Published Jul 22, 2025 • 35
MolmoAct: Action Reasoning Models that can Reason in Space

Paper • 2508.07917 • Published Aug 11, 2025 • 45
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies

Paper • 2508.20072 • Published Aug 27, 2025 • 32

Rethinking Memory as Continuously Evolving Connectivity

Paper • 2605.28773 • Published 9 days ago • 34
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published 8 days ago • 137

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published 8 days ago • 137

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published 8 days ago • 137

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published 8 days ago • 137

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published 8 days ago • 137
EarlyTom: Early Token Compression Completes Fast Video Understanding

Paper • 2605.30010 • Published 8 days ago • 32
Why Far Looks Up: Probing Spatial Representation in Vision-Language Models

Paper • 2605.30161 • Published 8 days ago • 59

π_RL: Online RL Fine-tuning for Flow-based Vision-Language-Action Models

Paper • 2510.25889 • Published Oct 29, 2025 • 66
Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model

Paper • 2510.27607 • Published Oct 31, 2025 • 10
A Survey on Efficient Vision-Language-Action Models

Paper • 2510.24795 • Published Oct 27, 2025 • 6
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach

Paper • 2512.02834 • Published Dec 2, 2025 • 41

microsoft/phi-4

Text Generation • 15B • Updated Nov 24, 2025 • 875k • 2.25k
Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled

Image-Text-to-Text • 28B • Updated Apr 6 • 153k • • 2.87k
XiaomiMiMo/MiMo-V2.5-Pro

Text Generation • 1T • Updated 28 days ago • 91.5k • 585
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Paper • 2605.30280 • Published 8 days ago • 137

about 13 hours ago

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 61
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 53
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 45
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20, 2024 • 64

Foundation Models in Robotics: Applications, Challenges, and the Future

Paper • 2312.07843 • Published Dec 13, 2023 • 16
Neural Fields in Robotics: A Survey

Paper • 2410.20220 • Published Oct 26, 2024 • 5
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset

Paper • 2410.22325 • Published Oct 29, 2024 • 10
Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning

Paper • 2410.21845 • Published Oct 29, 2024 • 16

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs