ProEdit: Inversion-based Editing From Prompts Done Right Paper • 2512.22118 • Published 5 days ago
WeDetect: Fast Open-Vocabulary Object Detection as Retrieval Paper • 2512.12309 • Published 18 days ago
IRG-MotionLLM: Interleaving Motion Generation, Assessment and Refinement for Text-to-Motion Generation Paper • 2512.10730 • Published 20 days ago
LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning Paper • 2509.24786 • Published Sep 29
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context Paper • 2506.21277 • Published Jun 26
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs Paper • 2506.21862 • Published Jun 27
Can Vision Language Models Infer Human Gaze Direction? A Controlled Study Paper • 2506.05412 • Published Jun 4