SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language Models Paper • 2506.12992 • Published Jun 15
Boosting Medical Visual Understanding From Multi-Granular Language Learning Paper • 2511.15943 • Published 16 days ago • 1
SuperBPE Collection SuperBPE tokenizers and models trained with them • 9 items • Updated 18 days ago • 17
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer Paper • 2509.16197 • Published Sep 19 • 56
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms Paper • 2410.18967 • Published Oct 24, 2024 • 1
VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge Paper • 2408.02865 • Published Aug 5, 2024 • 2
Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks Paper • 2504.01308 • Published Apr 2 • 14