Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 211
Describe Anything Collection Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated 12 days ago • 61