Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models Paper • 2512.21337 • Published 7 days ago • 26
LongVideoAgent: Multi-Agent Reasoning with Long Videos Paper • 2512.20618 • Published 8 days ago • 52
Article How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day 23 days ago • 46
Article Generative AI for Recommendation Systems: A Guide to Tokenizing User Interaction Data Mar 26, 2025 • 8
ARGenSeg: Image Segmentation with Autoregressive Image Generation Model Paper • 2510.20803 • Published Oct 23, 2025 • 9
Unified Reinforcement and Imitation Learning for Vision-Language Models Paper • 2510.19307 • Published Oct 22, 2025 • 30
LazyDrag: Enabling Stable Drag-Based Editing on Multi-Modal Diffusion Transformers via Explicit Correspondence Paper • 2509.12203 • Published Sep 15, 2025 • 19
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1, 2025 • 248
Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published Aug 21, 2025 • 259
Chat with Kimi-VL-A3B-Thinking-2506 🤔 Chat with images, videos, or PDFs to generate text • 180
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents Paper • 2508.05954 • Published Aug 8, 2025 • 6
Article Welcome GPT OSS, the new open-source model family from OpenAI! Aug 5, 2025 • 508