Unitree Go2: Velocity Flat (PPO)

RL locomotion policy for the Unitree Go2 quadruped robot, trained on flat terrain using PPO.

Demo

Unitree Go2 RL Locomotion in MuJoCo

Training

  • Framework: unitree_rl_mjlab (MuJoCo Warp)
  • Task: Mjlab-Velocity-Flat-Unitree-Go2
  • Algorithm: PPO (RSL-RL)
  • Hardware: 10× NVIDIA RTX A4000, 56 CPU cores
  • Environments: 8192 parallel
  • Training time: ~18 minutes (506 iterations)

Results

| Metric | Value |
| --- | --- |
| Mean reward | 52.9 |
| Mean episode length | 1000 (max, no falls) |
| Steps/sec | 628K-738K |
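
As a rough sanity check, the reported throughput and training time can be combined into an order-of-magnitude estimate of the total number of environment steps. The sketch below assumes the logged steps/sec held for the whole run and that "~18 minutes" is wall-clock training time; neither is confirmed by the logs, so treat the result as an estimate only.

```python
# Figures taken from the Training and Results sections above.
TRAIN_SECONDS = 18 * 60  # "~18 minutes" (approximate)
ITERATIONS = 506


def estimate_steps(steps_per_sec):
    """Return (total steps, steps per iteration), assuming constant throughput."""
    total = steps_per_sec * TRAIN_SECONDS
    return total, total / ITERATIONS


# Lower and upper ends of the reported steps/sec range.
for rate in (628_000, 738_000):
    total, per_iter = estimate_steps(rate)
    print(f"{rate:,} steps/s: ~{total / 1e6:.0f}M steps, ~{per_iter / 1e6:.2f}M per iteration")
```

Under these assumptions the run covers somewhere between about 0.68 and 0.80 billion environment steps, at a little over 2 seconds per iteration.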

Files

| File | Description |
| --- | --- |
| policy.onnx + policy.onnx.data | ONNX model for deployment (go2_ctrl) |
| model_500.pt | Final PyTorch checkpoint (best for fine-tuning) |
| model_0.pt ... model_400.pt | Intermediate checkpoints, saved every 100 iterations |
| params/env.yaml | Environment configuration |
| params/agent.yaml | Agent/PPO configuration |
| events.out.tfevents.* | TensorBoard training logs |
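
Before deploying or fine-tuning, it can help to confirm that a local copy of this repository contains the files listed above. The following is a minimal sketch using only the Python standard library. The function name `missing_files` is hypothetical, and the layout (model files at the top level, configs under `params/`) is an assumption based on the file names in the table.

```python
from fnmatch import fnmatch
from pathlib import Path

# Names taken from the Files table; the last entry is a glob because the
# TensorBoard event file name carries a timestamp/host suffix.
REQUIRED_PATTERNS = [
    "policy.onnx",
    "policy.onnx.data",
    "model_500.pt",
    "params/env.yaml",
    "params/agent.yaml",
    "events.out.tfevents.*",
]


def missing_files(model_dir):
    """Return the entries of REQUIRED_PATTERNS that match no file under model_dir."""
    root = Path(model_dir)
    # Relative POSIX-style paths, so "params/env.yaml" matches on any OS.
    present = [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]
    return [
        pattern
        for pattern in REQUIRED_PATTERNS
        if not any(fnmatch(name, pattern) for name in present)
    ]
```

Calling `missing_files(".")` from the download directory returns an empty list when everything is in place, or the patterns that still need to be fetched. The intermediate checkpoints are left out of the check on purpose, since only `model_500.pt` and the ONNX pair are used in the steps below.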

Usage

Deploy in MuJoCo simulator

# Copy ONNX model
cp policy.onnx policy.onnx.data \
  unitree_rl_mjlab/deploy/robots/go2/config/policy/velocity/v0/exported/

# Run simulator + controller, each in its own terminal
cd unitree_mujoco/simulate/build && ./unitree_mujoco
cd unitree_rl_mjlab/deploy/robots/go2/build && ./go2_ctrl --network=lo

Fine-tune on rough terrain

# Place model_500.pt in logs/rsl_rl/go2_velocity/<run_name>/
python scripts/train.py Mjlab-Velocity-Rough-Unitree-Go2 \
  --agent.resume=True \
  --agent.load-run="<run_name>" \
  --agent.load-checkpoint="model_500.pt" \
  --agent.algorithm.learning-rate=1e-4
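
The first comment in the command above asks for `model_500.pt` to be placed in the run directory by hand. A small helper can do that step. This is a sketch, not part of unitree_rl_mjlab: `stage_checkpoint` is a hypothetical name, the default `log_root` simply mirrors the path in the comment, and it is an assumption that the training script needs nothing besides the checkpoint file in that directory.

```python
import shutil
from pathlib import Path


def stage_checkpoint(checkpoint, run_name, log_root="logs/rsl_rl/go2_velocity"):
    """Copy a checkpoint into <log_root>/<run_name>/ and return the new path."""
    src = Path(checkpoint)
    if not src.is_file():
        raise FileNotFoundError(f"checkpoint not found: {src}")
    run_dir = Path(log_root) / run_name
    # Create the run directory if this is the first file placed in it.
    run_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, run_dir / src.name))
```

For example, `stage_checkpoint("model_500.pt", "flat_pretrained")` run from the unitree_rl_mjlab root would create `logs/rsl_rl/go2_velocity/flat_pretrained/model_500.pt`; the same run name (here an illustrative placeholder) is then passed to `--agent.load-run`.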

Known Issues

The upstream unitree_rl_mjlab has bugs that crash multi-GPU training on rough terrain; see Issue #9 and PR #8.
