Mode Seeking meets Mean Seeking for Fast Long Video Generation • Paper • 2602.24289 • Published 5 days ago • 32
JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation • Paper • 2602.19163 • Published 10 days ago • 14
Solaris: Building a Multiplayer Video World Model in Minecraft • Paper • 2602.22208 • Published 7 days ago • 27
DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation • Paper • 2602.12160 • Published 20 days ago • 37
GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL • Paper • 2602.22190 • Published 7 days ago • 15
COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression • Paper • 2602.15200 • Published 16 days ago • 7
Revisiting the Platonic Representation Hypothesis: An Aristotelian View • Paper • 2602.14486 • Published 16 days ago • 11
Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines? • Paper • 2602.14111 • Published 17 days ago • 55
Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception • Paper • 2602.11858 • Published 20 days ago • 58
SemanticMoments: Training-Free Motion Similarity via Third Moment Features • Paper • 2602.09146 • Published 23 days ago • 21
Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs • Paper • 2602.10388 • Published 21 days ago • 237
TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions • Paper • 2602.08711 • Published 23 days ago • 28
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning • Paper • 2602.10560 • Published 21 days ago • 29
F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare • Paper • 2602.06717 • Published 26 days ago • 71
Video-As-Prompt: Unified Semantic Control for Video Generation • Paper • 2510.20888 • Published Oct 23, 2025 • 50