EgoLCD: Egocentric Video Generation with Long Context Diffusion • Paper • 2512.04515 • Published Dec 4, 2025 • 5 • 2
BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation • Paper • 2511.22973 • Published Nov 28, 2025 • 4 • 2
EvoVLA: Self-Evolving Vision-Language-Action Model • Paper • 2511.16166 • Published Nov 20, 2025 • 5 • 2
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models • Paper • 2510.01623 • Published Oct 2, 2025 • 10 • 2
VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery • Paper • 2509.17191 • Published Sep 21, 2025 • 1 • 2
StereoAdapter: Adapting Stereo Depth Estimation to Underwater Scenes • Paper • 2509.16415 • Published Sep 19, 2025 • 2 • 2
ReMoMask: Retrieval-Augmented Masked Motion Generation • Paper • 2508.02605 • Published Aug 4, 2025 • 4 • 2
3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding • Paper • 2507.23478 • Published Jul 31, 2025 • 15 • 2
PresentAgent: Multimodal Agent for Presentation Video Generation • Paper • 2507.04036 • Published Jul 5, 2025 • 10 • 1
MediAug: Exploring Visual Augmentation in Medical Imaging • Paper • 2504.18983 • Published Apr 26, 2025 • 7 • 1
DiffuMural: Restoring Dunhuang Murals with Multi-scale Diffusion • Paper • 2504.09513 • Published Apr 13, 2025 • 2
PathoHR: Breast Cancer Survival Prediction on High-Resolution Pathological Images • Paper • 2503.17970 • Published Mar 23, 2025 • 3 • 2
DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps • Paper • 2502.15885 • Published Feb 21, 2025 • 2 • 2
KMM: Key Frame Mask Mamba for Extended Motion Generation • Paper • 2411.06481 • Published Nov 10, 2024 • 5 • 2