Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation Paper • 2309.11081 • Published Sep 20, 2023
Multimodal Knowledge Alignment with Reinforcement Learning Paper • 2205.12630 • Published May 25, 2022
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates Paper • 2505.22943 • Published May 28, 2025