Robotics & Embodied AI · 2025.07 – 2025.12

π0 VLA Real-Robot Reproduction

Reproduced the π0 vision-language-action model on a real robot arm from scratch in 3 months — built the arm, collected data, fine-tuned, and deployed end-to-end.

π0 · VLA · LoRA · LeRobot · ROS · Isaac Sim

Background

π0 is a state-of-the-art vision-language-action model for robotic manipulation. I wanted to reproduce it on real hardware — not just run inference on a pre-trained checkpoint, but go through the full pipeline from hardware assembly to deployment.

What I Did

Built the robot arm and gripper setup from scratch, designed the scene, and collected 100+ teleoperation demonstrations. Converted the recordings into a LeRobot-format dataset, LoRA fine-tuned the π0 model, validated the policy in Isaac Sim, and then deployed it to the real arm over a TCP link. Inputs are voice-to-text commands plus camera frames; outputs are per-step joint deltas (6 axes + gripper).
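To make the deployment step concrete, here is a minimal sketch of the policy-to-robot TCP link. The wire format (7 little-endian float32 values: 6 joint deltas plus a gripper command) is my illustration, not the actual protocol used in the project; the loopback demo just shows the pack/unpack round trip.

```python
import socket
import struct

# Hypothetical wire format for the policy -> robot link:
# 7 float32 values (6 joint deltas + 1 gripper command), little-endian.
ACTION_FMT = "<7f"
ACTION_SIZE = struct.calcsize(ACTION_FMT)  # 28 bytes

def send_action(sock, delta_joints, gripper):
    """Pack a 6-DoF delta-joint action plus gripper command and send it."""
    assert len(delta_joints) == 6
    sock.sendall(struct.pack(ACTION_FMT, *delta_joints, gripper))

def recv_action(sock):
    """Read exactly one action message (handling short reads) and unpack it."""
    buf = b""
    while len(buf) < ACTION_SIZE:
        chunk = sock.recv(ACTION_SIZE - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf += chunk
    values = struct.unpack(ACTION_FMT, buf)
    return list(values[:6]), values[6]

if __name__ == "__main__":
    # Loopback demo: the "policy" side sends, the "robot" side receives.
    a, b = socket.socketpair()
    send_action(a, [0.01, -0.02, 0.0, 0.005, 0.0, -0.01], 1.0)
    joints, grip = recv_action(b)
    print(joints, grip)
    a.close()
    b.close()
```

A fixed-size binary frame like this keeps the robot-side parser trivial; the read loop matters because TCP gives no message boundaries.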

Challenges

Learning VLAs from scratch while building the hardware in parallel, all within 3 months, was the main challenge. The sim-to-real gap and data quality were the biggest technical hurdles: small errors in demonstration data compound quickly during policy rollout.

Takeaways

Proved that a single person can go from zero to a working VLA system in 3 months. The full pipeline — hardware, data, training, deployment — is now something I can iterate on quickly.

Gallery

[Gallery: 4 project images]

Video