Abstract
RigMo is a unified generative framework that simultaneously learns rig and motion from mesh sequences, encoding deformations into compact latent spaces for interpretable and physically plausible 3D animation.
Despite significant progress in 4D generation, rig and motion, the core structural and dynamic components of animation, are typically modeled as separate problems. Existing pipelines rely on ground-truth skeletons and skinning weights for motion generation and treat auto-rigging as an independent process, undermining scalability and interpretability. We present RigMo, a unified generative framework that jointly learns rig and motion directly from raw mesh sequences, without any human-provided rig annotations. RigMo encodes per-vertex deformations into two compact latent spaces: a rig latent that decodes into explicit Gaussian bones and skinning weights, and a motion latent that produces time-varying SE(3) transformations. Together, these outputs define an animatable mesh with explicit structure and coherent motion, enabling feed-forward rig and motion inference for deformable objects. Beyond unified rig-motion discovery, we introduce a Motion-DiT model operating in RigMo's latent space and demonstrate that these structure-aware latents can naturally support downstream motion generation tasks. Experiments on DeformingThings4D, Objaverse-XL, and TrueBones demonstrate that RigMo learns smooth, interpretable, and physically plausible rigs, while achieving superior reconstruction and category-level generalization compared to existing auto-rigging and deformation baselines. RigMo establishes a new paradigm for unified, structure-aware, and scalable dynamic 3D modeling.
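The rig-plus-motion decomposition the abstract describes (bones with per-frame SE(3) transforms, plus per-vertex skinning weights) is posed on top of the standard linear blend skinning model. Below is a minimal, self-contained sketch of that model in NumPy; all function and variable names are hypothetical illustrations, not RigMo's actual implementation.

```python
import numpy as np

def se3_apply(T, points):
    """Apply one 4x4 rigid (SE(3)) transform to an (N, 3) array of points."""
    R, t = T[:3, :3], T[:3, 3]
    return points @ R.T + t

def linear_blend_skinning(rest_vertices, bone_transforms, skin_weights):
    """Deform a mesh by blending per-bone SE(3) transforms.

    rest_vertices:   (V, 3) rest-pose vertex positions
    bone_transforms: (B, 4, 4) one SE(3) matrix per bone for this frame
    skin_weights:    (V, B) convex weights (each row sums to 1)
    """
    V, B = skin_weights.shape
    posed = np.zeros((V, 3))
    for b in range(B):
        # Each bone moves every vertex rigidly; weights blend the results.
        posed += skin_weights[:, [b]] * se3_apply(bone_transforms[b], rest_vertices)
    return posed

# Toy example: two bones; the second translates by +1 along x.
rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
T0 = np.eye(4)
T1 = np.eye(4)
T1[0, 3] = 1.0
W = np.array([[1.0, 0.0], [0.5, 0.5]])  # vertex 1 blends both bones equally
out = linear_blend_skinning(rest, np.stack([T0, T1]), W)
# out[1] lands halfway between its two rigid targets, at x = 1.5
```

In RigMo's terms, the rig latent would supply `skin_weights` (and bone placements), while the motion latent would supply `bone_transforms` per frame; this sketch only shows how those outputs combine into a posed mesh.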
Community
🚀 New work: RigMo — Unifying Rig & Motion Learning for Generative Animation
Rigging and motion are two hard problems—usually solved separately.
RigMo unifies them.
A feed-forward framework that jointly learns rig structure + motion
directly from raw mesh sequences
→ no rig annotations, no per-sequence optimization.
✨ Highlights
• Explicit Gaussian bones + skinning weights
• SE(3) bone motions from compact latents
• Motion-DiT for controllable motion synthesis
• Interpretable, reusable, truly animatable 4D assets
📊 Strong results on DeformingThings4D / Objaverse-XL / TrueBones
🔗 Project page: https://rigmo-page.github.io
📄 Paper (arXiv): https://arxiv.org/abs/2601.06378
📺 Video: https://youtube.com/watch?v=0H0lsM3USVM
💻 Code: coming soon
#ComputerGraphics #Animation #Rigging #4D #3D #GenerativeAI
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- Make-It-Poseable: Feed-forward Latent Posing Model for 3D Humanoid Character Animation (2025)
- Topology-Agnostic Animal Motion Generation from Text Prompt (2025)
- AnimaMimic: Imitating 3D Animation from Video Priors (2025)
- MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos (2025)
- Mesh4D: 4D Mesh Reconstruction and Tracking from Monocular Video (2026)
- DragMesh: Interactive 3D Generation Made Easy (2025)
- CAMO: Category-Agnostic 3D Motion Transfer from Monocular 2D Videos (2026)