My thing - a EL102 Collection

EL102 's Collections

My thing

updated 9 days ago

MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models

Paper • 2511.18373 • Published 13 days ago • 5
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO

Paper • 2511.13288 • Published 19 days ago • 17
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens

Paper • 2511.19418 • Published 12 days ago • 26
SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published 16 days ago • 105
Temporal Prompting Matters: Rethinking Referring Video Object Segmentation

Paper • 2510.07319 • Published Oct 8 • 2
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published 16 days ago • 91
O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents

Paper • 2511.13593 • Published 19 days ago • 24
RynnVLA-002: A Unified Vision-Language-Action and World Model

Paper • 2511.17502 • Published 15 days ago • 24
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models

Paper • 2511.11007 • Published 22 days ago • 15
Depth Anything 3: Recovering the Visual Space from Any Views

Paper • 2511.10647 • Published 23 days ago • 92
LightRAG: Simple and Fast Retrieval-Augmented Generation

Paper • 2410.05779 • Published Oct 8, 2024 • 22
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Paper • 2510.14528 • Published Oct 16 • 103
TradingAgents: Multi-Agents LLM Financial Trading Framework

Paper • 2412.20138 • Published Dec 28, 2024 • 14
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

Paper • 2410.17799 • Published Oct 23, 2024 • 5
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image

Paper • 2511.13648 • Published 19 days ago • 52
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26 • 135
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery

Paper • 2510.15869 • Published Oct 17 • 45
GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization

Paper • 2511.15705 • Published 17 days ago • 91
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning

Paper • 2510.22543 • Published Oct 26 • 10
Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning

Paper • 2511.19900 • Published 11 days ago • 46