G$^2$VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning Paper • 2511.21688 • Published 10 days ago • 8
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy Paper • 2510.13778 • Published Oct 15 • 16
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling Paper • 2509.12201 • Published Sep 15 • 104
ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians Paper • 2406.16815 • Published Jun 24, 2024 • 7
Portrait3D: 3D Head Generation from Single In-the-wild Portrait Image Paper • 2406.16710 • Published Jun 24, 2024
InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts Paper • 2509.10813 • Published Sep 13 • 30
MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning Paper • 2509.22281 • Published Sep 26 • 31
A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning Paper • 2509.15937 • Published Sep 19 • 20
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds Paper • 2508.14879 • Published Aug 20 • 68
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling Paper • 2507.05240 • Published Jul 7 • 47
ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting Paper • 2507.15454 • Published Jul 21 • 7
DREAMWALKER: Mental Planning for Continuous Vision-Language Navigation Paper • 2308.07498 • Published Aug 14, 2023
Evolving Symbolic 3D Visual Grounder with Weakly Supervised Reflection Paper • 2502.01401 • Published Feb 3 • 1
NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance Paper • 2505.08712 • Published May 13 • 6