NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation Paper • 2601.02204 • Published 1 day ago
Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing Paper • 2512.17909 • Published 18 days ago
DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction Paper • 2505.21473 • Published May 27, 2025
ByteTrack: Multi-Object Tracking by Associating Every Detection Box Paper • 2110.06864 • Published Oct 13, 2021
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion Paper • 2111.14690 • Published Nov 29, 2021
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model Paper • 2407.07577 • Published Jul 10, 2024
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals Paper • 2011.12450 • Published Nov 25, 2020
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM Paper • 2412.15156 • Published Dec 19, 2024
Language as Queries for Referring Video Object Segmentation Paper • 2201.00487 • Published Jan 3, 2022