
Kevinma0215/sim2act


sim2act — a VLA simulation data engine

CI · License: Apache-2.0 · Python 3.10+ · Isaac Sim 5.1 · LeRobot v3.0

From simulation to action. sim2act manufactures Vision-Language-Action (VLA) training data in NVIDIA Isaac Lab — and stress-tests the policies it produces.

🇹🇼 Traditional Chinese version: README.zh-Hant.md

A learned ACT policy clearing the barrier task — trained entirely from simulated oracle demos.

TL;DR

sim2act takes three contact-rich Franka manipulation tasks — pick, pick-over-barrier, and non-prehensile push — through a complete data flywheel:

scene → privileged oracle (Warp GPU state machine + a 4096-env PPO teacher) → multimodal demo collection → LeRobot v3.0 → ACT imitation learning → a closed-loop eval harness

Each stage runs with one command and is fully reproducible. Two results carry the project:

  1. a student ACT policy that beats its privileged oracle on the barrier task (90% vs 75%), with a measured out-of-distribution generalization curve; and
  2. a rigorous, simulator-free root-cause diagnosis of a 0%-success push policy, traced to a camera shortcut induced by under-randomized initial states, plus the fix that targets it (its before→after comparison is still in progress).

The diagnosis reproduces in ~5 minutes on a published dataset + checkpoint, no simulator required.

The design optimizes for what matters most in foundation-model robotics: data infrastructure at scale, empirical rigor, honest failure analysis, and reproducibility. It started as a Robotics & Sim take-home challenge whose thesis was "the challenge is not the quantity of the data, but the quality" (provenance in docs/archive/).

Headline results

| Task | Oracle | Oracle success | ACT success (in-dist.) | ACT success (OOD ×1.5) | Learned camera | Status |
|---|---|---|---|---|---|---|
| pick_place | Warp state machine | | | | overhead | demos ✅ |
| barrier | Warp state machine | 75% | 90% | 55% | overhead (robust) | |
| push | PPO teacher (4096 envs, ~98.5% train) | | 0% (pending fix run) | | wrist (fragile) → overhead | 🔬 diagnosed + fix in progress |

The student beats the teacher on barrier (90% > 75%), and the push failure is diagnosed down to a single causal lever — initial-state randomization width — with controlled camera ablations. Full numbers and methodology: docs/results.md.

Architecture

                    ┌──────────────── privileged oracle ────────────────┐
   scene            │  Warp GPU state machine  (pick / barrier)         │
  (Franka +  ─────▶ │  PPO RL teacher, 4096 envs  (push)                │
   two boxes)       └───────────────────────┬───────────────────────────┘
                                            ▼
                          multimodal demo collection
              overhead RGB-D + wrist RGB + joint pos/vel/torque
                  + fingertip contact forces   ·   8-D action
                                            │
                                            ▼
                  raw HDF5  ──▶  LeRobot v3.0  (parquet + mp4)
                                            │
                            ┌───────────────┴────────────────┐
                            ▼                                ▼
                ACT imitation learning          closed-loop eval harness
                  (chunked actions)             VLAWrapper · success-latch
                                                camera ablation · OOD sweep
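The eval harness in the diagram combines two ideas worth making concrete. ACT predicts a chunk of future actions that is executed before the policy is queried again, and a success latch (our reading of the term) counts an episode as successful once success is observed, even if a later state undoes it. The sketch below is a minimal, simulator-free illustration that assumes hypothetical `reset`/`step`/`policy`/`is_success` callables and a hypothetical `n_action_steps` parameter; it is not the repository's `VLAWrapper` or `EvalRunner` API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Obs = dict
Action = Sequence[float]


@dataclass
class RolloutResult:
    success: bool            # latched: True if success was observed at any step
    first_success_step: int  # step index of the first success, or -1
    policy_calls: int        # number of times the policy was queried


def rollout(
    reset: Callable[[], Obs],
    step: Callable[[Action], Obs],
    policy: Callable[[Obs], Sequence[Action]],
    is_success: Callable[[Obs], bool],
    max_steps: int = 500,
    n_action_steps: int = 50,
) -> RolloutResult:
    """Run one closed-loop episode with chunked actions and a success latch."""
    obs = reset()
    queue: List[Action] = []
    first_success = -1
    calls = 0
    for t in range(max_steps):
        if not queue:
            # Re-query only when the current chunk is used up, keeping the
            # first n_action_steps actions of the predicted chunk.
            queue = list(policy(obs))[:n_action_steps]
            calls += 1
            if not queue:
                raise ValueError("policy returned an empty action chunk")
        obs = step(queue.pop(0))
        # Success latch: record the first success; later states (e.g. the
        # block being knocked loose again) cannot undo it.
        if first_success < 0 and is_success(obs):
            first_success = t
    return RolloutResult(first_success >= 0, first_success, calls)
```

Executing a chunk before re-querying cuts the number of policy calls (here one per `n_action_steps` steps) at the cost of reacting more slowly to disturbances within a chunk.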

Left → right: Warp state-machine pick · PPO push teacher · learned ACT barrier policy.

  • Multimodal observation per step: overhead RGB-D (224×224) + wrist RGB (224×224) + joint position/velocity/torque + dual fingertip contact forces. Action: 8-D (a 7-D absolute end-effector pose target, solved via IK, + gripper) at ~50 Hz.
  • Two oracles, by design. A deterministic Warp state machine (docs/state-machine.md) drives the prehensile pick / barrier tasks; a learned PPO teacher (4096 parallel envs, ~98.5% training success) drives the contact-rich push, which a hand-written controller handles poorly.
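A per-step record matching the observation and action layout above could be validated as below. The joint count (9: seven arm joints plus two finger joints), the 3-D contact-force vectors, the key names, and the position-plus-quaternion split of the 7-D pose are all assumptions for illustration, not the dataset's actual schema.

```python
import numpy as np

IMG = 224
N_JOINTS = 9  # assumption: 7 Franka arm joints + 2 finger joints

# Hypothetical key names and shapes for one time step.
EXPECTED_SHAPES = {
    "overhead_rgb": (IMG, IMG, 3),
    "overhead_depth": (IMG, IMG),
    "wrist_rgb": (IMG, IMG, 3),
    "joint_pos": (N_JOINTS,),
    "joint_vel": (N_JOINTS,),
    "joint_torque": (N_JOINTS,),
    "contact_force_left": (3,),   # assumed 3-D force vector per fingertip
    "contact_force_right": (3,),
    "action": (8,),
}


def validate_step(step: dict) -> None:
    """Raise if a step record is missing a key or has an unexpected shape."""
    missing = EXPECTED_SHAPES.keys() - step.keys()
    if missing:
        raise KeyError(f"missing keys: {sorted(missing)}")
    for key, shape in EXPECTED_SHAPES.items():
        if np.shape(step[key]) != shape:
            raise ValueError(f"{key}: expected {shape}, got {np.shape(step[key])}")


def split_action(action):
    """Split an 8-D action into (position, unit quaternion, gripper).

    Assumes the 7-D pose is xyz followed by a 4-D quaternion.
    """
    a = np.asarray(action, dtype=float)
    if a.shape != (8,):
        raise ValueError(f"expected 8-D action, got shape {a.shape}")
    pos, quat, gripper = a[:3], a[3:7], float(a[7])
    norm = np.linalg.norm(quat)
    if norm == 0.0:
        raise ValueError("quaternion has zero norm")
    return pos, quat / norm, gripper
```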

Deep dive: docs/architecture.md.
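For readers unfamiliar with scripted oracles, here is a single-environment, CPU-only sketch of a waypoint pick state machine in the spirit of the Warp controller (REST → APPROACH_ABOVE → APPROACH → GRASP → LIFT). The repository's PickAndPlaceSm runs on the GPU via Warp and commands full end-effector poses; the class name, state names, thresholds, and timings below are hypothetical, orientation is omitted, and the barrier task would need an extra waypoint to clear the wall.

```python
from enum import IntEnum

import numpy as np


class PickState(IntEnum):
    REST = 0
    APPROACH_ABOVE = 1
    APPROACH = 2
    GRASP = 3
    LIFT = 4
    DONE = 5


GRIPPER_OPEN, GRIPPER_CLOSE = 1.0, -1.0


class PickStateMachine:
    """Single-env CPU sketch of a waypoint pick state machine (hypothetical)."""

    def __init__(self, dt, lift_pos, hover=0.10, pos_tol=0.01,
                 rest_time=0.2, grasp_time=0.3):
        self.dt = dt                    # control period (s)
        self.lift_pos = np.asarray(lift_pos, dtype=float)
        self.hover = hover              # pre-grasp height above the object (m)
        self.pos_tol = pos_tol          # "waypoint reached" tolerance (m)
        self.rest_time = rest_time      # settle time before moving (s)
        self.grasp_time = grasp_time    # time allowed for the gripper to close (s)
        self.state = PickState.REST
        self.timer = 0.0

    def _go(self, state):
        self.state, self.timer = state, 0.0

    def _reached(self, ee, target):
        return np.linalg.norm(ee - target) < self.pos_tol

    def compute(self, ee_pos, obj_pos):
        """Return (target_position, gripper_command), then advance the state."""
        ee = np.asarray(ee_pos, dtype=float)
        obj = np.asarray(obj_pos, dtype=float)
        above = obj + np.array([0.0, 0.0, self.hover])
        self.timer += self.dt
        s = self.state
        if s == PickState.REST:
            target, grip = ee.copy(), GRIPPER_OPEN      # hold still
            if self.timer >= self.rest_time:
                self._go(PickState.APPROACH_ABOVE)
        elif s == PickState.APPROACH_ABOVE:
            target, grip = above, GRIPPER_OPEN
            if self._reached(ee, above):
                self._go(PickState.APPROACH)
        elif s == PickState.APPROACH:
            target, grip = obj, GRIPPER_OPEN
            if self._reached(ee, obj):
                self._go(PickState.GRASP)
        elif s == PickState.GRASP:
            target, grip = obj, GRIPPER_CLOSE           # close, then wait
            if self.timer >= self.grasp_time:
                self._go(PickState.LIFT)
        elif s == PickState.LIFT:
            target, grip = self.lift_pos.copy(), GRIPPER_CLOSE
            if self._reached(ee, self.lift_pos):
                self._go(PickState.DONE)
        else:  # DONE: hold the lifted pose
            target, grip = self.lift_pos.copy(), GRIPPER_CLOSE
        return target, grip
```

In a Warp implementation, the same per-state logic would typically live inside a kernel launched with one thread per environment, so every parallel env advances its own state in a single launch.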

Signature case study — diagnosing a 0% push policy

The same pipeline that yields 90% on barrier yields 0% on push: the arm ignores the first box and drives diagonally toward the corner from the first step.

Left: the push policy failing. Right: the same scene with the wrist camera blacked out — the behavior barely changes, exposing the policy's reliance on the wrist view.

Per-camera ablation (mean action change when one camera is zeroed):

| Ablation | push | barrier |
|---|---|---|
| black overhead, Δ | 0.038 | 0.089 |
| black wrist, Δ | 0.197 | 0.027 |
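The Δ metric can be sketched as follows: run the policy on the same observations with and without one camera zeroed, then average the absolute action difference. Whether the project averages absolute differences, L2 norms, or something else per action dimension is not stated here, so the mean-absolute-difference choice and the `camera_ablation_delta` helper are assumptions.

```python
from typing import Callable, Iterable, Mapping

import numpy as np


def camera_ablation_delta(
    policy: Callable[[Mapping[str, np.ndarray]], np.ndarray],
    observations: Iterable[Mapping[str, np.ndarray]],
    camera_key: str,
) -> float:
    """Mean absolute action change when `camera_key` is replaced by zeros."""
    deltas = []
    for obs in observations:
        base = np.asarray(policy(obs), dtype=float)
        ablated_obs = dict(obs)
        ablated_obs[camera_key] = np.zeros_like(obs[camera_key])  # "black" camera
        ablated = np.asarray(policy(ablated_obs), dtype=float)
        deltas.append(np.mean(np.abs(base - ablated)))
    if not deltas:
        raise ValueError("need at least one observation")
    return float(np.mean(deltas))
```

A large Δ means the policy's output depends heavily on that camera; by this measure the push policy is about five times more sensitive to the wrist camera (0.197) than to the overhead camera (0.038).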

Push leans on the fragile, ego-motion-coupled wrist camera; barrier leans on the robust, static overhead camera. Teacher-forcing replay (EE-xy L1 = 0.011 m) shows the model did learn the demonstrations, so this is a closed-loop covariate-shift failure, not under-training.

Root cause: push initial-state randomization is only ±3 cm (vs. ±13/±7 cm for barrier). Across such a narrow range the static overhead view barely changes from episode to episode, so it carries little information, and the policy falls back on the wrist-camera shortcut. The fix widens the randomization (scripts/fix_push_widen_dr.sh, via PUSH_BOX_DR); the before→after comparison is in progress.
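To see why a ±3 cm range leaves the overhead view nearly invariant, the sketch below samples uniform box offsets. The per-axis assignment of the ±13/±7 cm barrier widths, the `DR_HALF_WIDTH` table, and the `scale` knob are hypothetical. Treating the OOD ×1.5 setting as a 1.5× widening of these half-widths is one plausible reading, not something this README confirms, and the real PUSH_BOX_DR interface may differ.

```python
import numpy as np

# Hypothetical half-widths (metres) of the uniform box-position randomization,
# taken from the figures quoted above; the x/y axis assignment is an assumption.
DR_HALF_WIDTH = {
    "push": (0.03, 0.03),
    "barrier": (0.13, 0.07),
}


def sample_box_offsets(task, n, scale=1.0, rng=None):
    """Sample n (x, y) offsets uniformly within +/- half-width * scale."""
    rng = np.random.default_rng() if rng is None else rng
    half = np.asarray(DR_HALF_WIDTH[task], dtype=float) * scale
    return rng.uniform(-half, half, size=(n, 2))


def uniform_std(half_width):
    """Standard deviation of U(-w, w), i.e. w / sqrt(3)."""
    return half_width / np.sqrt(3.0)
```

Since a uniform range of ±w has standard deviation w/√3, push boxes move only about 1.7 cm (one standard deviation) between episodes, versus about 7.5 cm along barrier's wider axis.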

Reproduce in ~5 minutes, no simulator:

conda activate lerobot
bash scripts/bootstrap_assets.sh --minimal      # pull the dataset + checkpoint from the HF Hub
bash experiments/act_push_failure/run_all.sh    # regenerates the ablation evidence

Full write-up: docs/case-study-push.md.
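The teacher-forcing check cited in the case study feeds the policy recorded demo observations (instead of its own rollouts) and compares its predicted end-effector x/y with the demo's. A low error here, combined with closed-loop failure, is the signature of covariate shift. This sketch assumes the first two action dimensions are EE x and y and that the policy returns one action per observation (for ACT, the first action of its chunk); the helper name is hypothetical.

```python
import numpy as np


def teacher_forcing_ee_xy_l1(policy, demo_obs, demo_actions):
    """Mean absolute EE x/y error (m) of policy actions on recorded demo observations.

    Assumes action[:2] is the end-effector x, y target and that `policy`
    returns one action per observation (for ACT: the first action of the chunk).
    """
    preds = np.stack([np.asarray(policy(o), dtype=float) for o in demo_obs])
    target = np.asarray(demo_actions, dtype=float)
    if preds.shape != target.shape:
        raise ValueError(f"shape mismatch: {preds.shape} vs {target.shape}")
    return float(np.mean(np.abs(preds[:, :2] - target[:, :2])))
```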

Quickstart

1 · Reproduce the failure analysis (no GPU / simulator, ~5 min)

See the case-study block above — it runs entirely in the lerobot conda env on a published dataset + checkpoint.

2 · Run the full pipeline (requires Isaac Lab)

sim2act uses two conda envs (full matrix in CLAUDE.md):

| env | used for |
|---|---|
| isaaclab (Isaac Sim 5.1) | demo collection · PPO RL · LeRobot conversion · eval |
| lerobot (lerobot 0.5.2) | the simulator-free failure analysis |

conda run -n isaaclab python -m pip install -e ".[hub]"      # one-time editable install

# collect → convert → train → eval  (barrier shown; see docs/architecture.md for all tasks)
python scripts/collect/demos.py --task barrier --num_demos 100 --headless --enable_cameras
python data/convert_to_lerobot.py --input _out/datasets/franka_barrier_official_demos/dataset.hdf5 \
       --output _out/datasets/lerobot/franka_barrier --state_keys joint_pos,joint_vel --no_depth
python scripts/train/act.py --dataset _out/datasets/lerobot/franka_barrier --steps 40000
python scripts/eval/policy.py --policy act --task barrier \
       --model_path _out/act/<run>/checkpoints/last/pretrained_model \
       --num_rollouts 20 --headless --enable_cameras

Push uses the RL-teacher chain (scripts/train/push_rl.py → scripts/rl/export_push.py → scripts/collect/push_rl_demos.py); see docs/architecture.md.
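The convert step flattens demos into per-frame rows. Below is a minimal sketch of what `--state_keys joint_pos,joint_vel` plausibly does: concatenate the listed per-step arrays into one state vector and tag every frame with its episode index, frame index, and timestamp. The column names loosely follow LeRobot's frame-level fields, but the exact v3.0 schema, the input key `actions`, and the 50 Hz rate used for timestamps are assumptions, and images are left out of this sketch entirely.

```python
import numpy as np

FPS = 50  # assumed control rate (~50 Hz per the architecture notes)


def episodes_to_frames(episodes, state_keys, fps=FPS):
    """Flatten per-episode arrays into frame-level columns.

    episodes: list of dicts mapping key -> array of shape (T, d); must contain
    "actions" plus every key in state_keys (hypothetical layout).
    """
    cols = {"observation.state": [], "action": [], "episode_index": [],
            "frame_index": [], "timestamp": [], "index": []}
    index = 0
    for ep_idx, ep in enumerate(episodes):
        # Concatenate the selected state features along the feature axis.
        state = np.concatenate(
            [np.asarray(ep[k], dtype=np.float32) for k in state_keys], axis=1)
        actions = np.asarray(ep["actions"], dtype=np.float32)
        if len(state) != len(actions):
            raise ValueError(f"episode {ep_idx}: state/action length mismatch")
        for t in range(len(actions)):
            cols["observation.state"].append(state[t])
            cols["action"].append(actions[t])
            cols["episode_index"].append(ep_idx)
            cols["frame_index"].append(t)
            cols["timestamp"].append(t / fps)  # seconds since episode start
            cols["index"].append(index)        # global frame index
            index += 1
    return cols


def parse_state_keys(arg):
    """Parse a CLI value such as "joint_pos,joint_vel" into a list of keys."""
    return [k.strip() for k in arg.split(",") if k.strip()]
```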

What's inside

| Task | Oracle | Primary camera | Status |
|---|---|---|---|
| pick_place | Warp state machine | overhead | demos ✅ |
| barrier (pick over a ⅓-arm-height wall) | Warp state machine | overhead (robust) | ACT 90% ✅ |
| push (non-prehensile, box→box→corner) | PPO teacher | wrist → overhead (after fix) | diagnosed, fix in progress 🔬 |

Sensor suite (collected for every demo): overhead RGB-D, wrist RGB, joint position/velocity/torque, and left/right fingertip contact forces — chosen for realistic sim-to-real transfer.

Repository layout

envs/          Isaac Lab env configs (base / tasks / scenes; clean inheritance chain)
controllers/   Warp GPU state machine (PickAndPlaceSm)
scripts/       pipeline entrypoints — collect / train / eval / rl / viz
eval/          VLA eval harness (VLAWrapper · EvalRunner · obs adapter · video)
data/          HDF5 → LeRobot v3.0 conversion
experiments/   act_push_failure/ — the flagship, simulator-free failure analysis
tools/         checks / smoke tests / analysis & visualization
docs/          architecture · case study · results · state machine · archive
_out/          all generated artifacts (gitignored; fetched via scripts/bootstrap_assets.sh)

Roadmap

  • Run the push DR-widening fix to completion and publish the before→after result.
  • Validate additional policy backends — OpenVLAWrapper is wired in eval/vla_wrapper.py but not yet validated; Octo / π0 are natural next wrappers.
  • Attack covariate shift directly: DAgger / action-noise collection to cover off-trajectory views.

Citation · License · Acknowledgements

If you use sim2act, please cite it (see CITATION.cff). Licensed under Apache-2.0 (LICENSE). Built on Isaac Lab, LeRobot, rsl_rl, and the ACT architecture.

About

End-to-end VLA sim data engine (Isaac Lab → LeRobot → ACT) + a reproducible policy failure analysis.
