MolmoAct2: Action Reasoning Models for Real-world Deployment Paper • 2605.02881 • Published 15 days ago • 330
You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass Paper • 2604.10966 • Published Apr 13 • 11
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models Paper • 2603.24575 • Published Mar 25 • 18
Synthetic Visual Genome 2: Extracting Large-scale Spatio-Temporal Scene Graphs from Videos Paper • 2602.23543 • Published Feb 26 • 9
MolmoAct: Action Reasoning Models that can Reason in Space Paper • 2508.07917 • Published Aug 11, 2025 • 45
MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation Paper • 2602.11337 • Published Feb 11 • 9
VLS: Steering Pretrained Robot Policies via Vision-Language Models Paper • 2602.03973 • Published Feb 3 • 22
Learning Representations for New Sound Classes With Continual Self-Supervised Learning Paper • 2205.07390 • Published May 15, 2022
Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue Paper • 2409.04927 • Published Sep 7, 2024
SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models Paper • 2509.15661 • Published Sep 19, 2025 • 2
AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking Paper • 2601.17645 • Published Jan 25 • 23
Patient-Similarity Cohort Reasoning in Clinical Text-to-SQL Paper • 2601.09876 • Published Jan 14 • 7