From simulation to action. sim2act manufactures Vision-Language-Action (VLA) training data in NVIDIA Isaac Lab — and stress-tests the policies it produces.
🇹🇼 Chinese version: README.zh-Hant.md

A learned ACT policy clearing the barrier task — trained entirely from simulated oracle demos.
sim2act takes three contact-rich Franka manipulation tasks — pick, pick-over-barrier, and non-prehensile push — through a complete data flywheel:
scene → privileged oracle (Warp GPU state machine + a 4096-env PPO teacher) → multimodal demo collection → LeRobot v3.0 → ACT imitation learning → a closed-loop eval harness
one command per stage, fully reproducible. Two results carry the project:
- a student ACT policy that beats its privileged oracle on the barrier task (90% vs 75%), with a measured out-of-distribution generalization curve; and
- a rigorous, simulator-free root-cause diagnosis of a 0%-success push policy — traced to a camera-shortcut induced by under-randomized initial states — plus the before→after fix that targets it.
The diagnosis reproduces in ~5 minutes on a published dataset + checkpoint, no simulator required.
The design optimizes for what foundation-model robotics actually weighs: data infrastructure at scale, empirical rigor, honest failure analysis, and reproducibility. It started as a Robotics & Sim take-home challenge whose thesis was "the challenge is not the quantity of the data, but the quality" (provenance in docs/archive/).
| Task | Oracle | Oracle SR | ACT (in-dist) | ACT (OOD ×1.5) | Learned camera | Status |
|---|---|---|---|---|---|---|
| pick_place | Warp state machine | — | — | — | overhead | demos ✅ |
| barrier | Warp state machine | 75% | 90% | 55% | overhead (robust) | ✅ |
| push | PPO teacher (4096 env, ~98.5% train) | — | 0% → (pending fix run) | — | wrist (fragile) → overhead | 🔬 diagnosed + fix in progress |
The student beats the teacher on barrier (90% > 75%), and the push failure is diagnosed down to a single causal lever — initial-state randomization width — with controlled camera ablations. Full numbers and methodology: docs/results.md.

                     ┌──────────────── privileged oracle ────────────────┐
    scene            │  Warp GPU state machine      (pick / barrier)     │
    (Franka + ─────▶ │  PPO RL teacher, 4096 envs   (push)               │
    two boxes)       └───────────────────────┬───────────────────────────┘
                                             ▼
                                multimodal demo collection
                     overhead RGB-D + wrist RGB + joint pos/vel/torque
                          + fingertip contact forces · 8-D action
                                             │
                                             ▼
                         raw HDF5 ──▶ LeRobot v3.0 (parquet + mp4)
                                             │
                             ┌───────────────┴────────────────┐
                             ▼                                ▼
                  ACT imitation learning          closed-loop eval harness
                     (chunked actions)            VLAWrapper · success-latch
                                                  camera ablation · OOD sweep

Left → right: Warp state-machine pick · PPO push teacher · learned ACT barrier policy.
- Multimodal observation per step: overhead RGB-D (224²) + wrist RGB (224²) + joint position/velocity/torque + dual fingertip contact forces. Action: 8-D (7-D IK-absolute end-effector pose + gripper) at ~50 Hz.
- Two oracles, by design. A deterministic Warp state machine (docs/state-machine.md) drives the prehensile pick / barrier tasks; a learned PPO teacher (4096 parallel envs, ~98.5% training success) drives the contact-rich push, which a hand-written controller handles poorly.
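
The observation and action layout listed above can be pictured as a small typed record. This is a minimal sketch, not the project's actual schema: the field names, the (2, 3) shape for fingertip forces, and the position + quaternion reading of the 7-D pose are all assumptions made for illustration. The joint count is left open because the README does not state it.

```python
from dataclasses import dataclass
from typing import Tuple

import numpy as np

IMG_SIZE = 224   # image side length stated above
ACTION_DIM = 8   # 7-D end-effector pose + 1 gripper command


@dataclass
class StepObservation:
    """One time step of sensor data. Field names are hypothetical."""

    overhead_rgb: np.ndarray     # (224, 224, 3)
    overhead_depth: np.ndarray   # (224, 224)
    wrist_rgb: np.ndarray        # (224, 224, 3)
    joint_pos: np.ndarray        # (n_joints,)
    joint_vel: np.ndarray        # (n_joints,)
    joint_torque: np.ndarray     # (n_joints,)
    fingertip_force: np.ndarray  # (2, 3): left/right, assumed 3-D force vectors

    def validate(self) -> None:
        """Raise ValueError if any array has an unexpected shape."""
        expected = {
            "overhead_rgb": (IMG_SIZE, IMG_SIZE, 3),
            "overhead_depth": (IMG_SIZE, IMG_SIZE),
            "wrist_rgb": (IMG_SIZE, IMG_SIZE, 3),
            "fingertip_force": (2, 3),
        }
        for name, shape in expected.items():
            actual = getattr(self, name).shape
            if actual != shape:
                raise ValueError(f"{name}: expected {shape}, got {actual}")
        joint_shapes = {self.joint_pos.shape, self.joint_vel.shape, self.joint_torque.shape}
        if len(joint_shapes) != 1 or self.joint_pos.ndim != 1:
            raise ValueError("joint arrays must be 1-D and share one length")


def split_action(action: np.ndarray) -> Tuple[np.ndarray, np.ndarray, float]:
    """Split an 8-D action into position (3), quaternion (4), gripper (1).

    The position + quaternion layout of the 7-D pose is an assumption.
    """
    action = np.asarray(action, dtype=float)
    if action.shape != (ACTION_DIM,):
        raise ValueError(f"expected shape ({ACTION_DIM},), got {action.shape}")
    return action[:3], action[3:7], float(action[7])
```

Checking shapes at the collection boundary like this tends to catch camera or sensor misconfiguration before it reaches the HDF5 → LeRobot conversion.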
Deep dive: docs/architecture.md.
The same pipeline that yields 90% on barrier yields 0% on push: the arm ignores the first box and drives diagonally toward the corner from the first step.

Left: the push policy failing. Right: the same scene with the wrist camera blacked out —
the behavior barely changes, exposing the policy's reliance on the wrist view.
Per-camera ablation (mean action change when one camera is zeroed):
| ablation | push | barrier |
|---|---|---|
| black overhead Δ | 0.038 | 0.089 |
| black wrist Δ | 0.197 | 0.027 |
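
The ablation metric in the table above (mean action change when one camera is zeroed) can be sketched as follows. This is an illustrative reconstruction, not the code in `experiments/act_push_failure/`: the observation keys, the use of a mean absolute difference as the distance, and the toy policy are all assumptions.

```python
from typing import Callable, Dict, List

import numpy as np

Obs = Dict[str, np.ndarray]
Policy = Callable[[Obs], np.ndarray]


def camera_ablation_delta(policy: Policy, observations: List[Obs], camera_key: str) -> float:
    """Mean absolute change in the predicted action when one camera is blacked out."""
    deltas = []
    for obs in observations:
        baseline = policy(obs)
        ablated_obs = dict(obs)  # shallow copy so the original frame is untouched
        ablated_obs[camera_key] = np.zeros_like(obs[camera_key])
        ablated = policy(ablated_obs)
        deltas.append(np.mean(np.abs(ablated - baseline)))
    return float(np.mean(deltas))


def toy_policy(obs: Obs) -> np.ndarray:
    """Stand-in policy that deliberately weights the wrist view more heavily."""
    signal = 0.1 * obs["overhead"].mean() + 0.9 * obs["wrist"].mean()
    return np.full(8, signal)


rng = np.random.default_rng(0)
frames = [
    {"overhead": rng.random((224, 224, 3)), "wrist": rng.random((224, 224, 3))}
    for _ in range(4)
]
delta_overhead = camera_ablation_delta(toy_policy, frames, "overhead")
delta_wrist = camera_ablation_delta(toy_policy, frames, "wrist")
print(f"black overhead Δ = {delta_overhead:.3f}, black wrist Δ = {delta_wrist:.3f}")
```

A larger Δ for a camera means the policy's output depends more on that view, which is how the table is read in the paragraph that follows.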
Push leans on the fragile, ego-motion-coupled wrist camera; barrier leans on the robust static
overhead camera. Teacher-forcing replay (EE-xy L1 = 0.011 m) proves the model did learn the
demonstrations — so this is a closed-loop covariate-shift failure, not under-training. Root cause:
push initial-state randomization is only ±3 cm (vs barrier ±13/±7 cm), which makes the static
overhead view nearly invariant and uninformative, pushing the policy onto the wrist shortcut. The fix
widens the randomization (scripts/fix_push_widen_dr.sh, via
PUSH_BOX_DR); the before→after comparison is in progress.
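
To make the randomization widths concrete, the sketch below samples initial box positions uniformly within a half-range and compares the spread for the two settings quoted above. It is a toy illustration under stated assumptions: the function name, the nominal box centre, and the reading of ±13/±7 cm as per-axis (x/y) half-ranges are hypothetical, and it does not reproduce how `PUSH_BOX_DR` is parsed by the project.

```python
import numpy as np


def sample_box_xy(rng, center_xy, half_range_xy, n):
    """Draw n initial box positions uniformly within center ± half_range (metres)."""
    center = np.asarray(center_xy, dtype=float)
    half = np.asarray(half_range_xy, dtype=float)
    return center + rng.uniform(-half, half, size=(n, 2))


rng = np.random.default_rng(0)
center = (0.5, 0.0)  # hypothetical nominal box position

narrow = sample_box_xy(rng, center, (0.03, 0.03), 10_000)  # push: ±3 cm
wide = sample_box_xy(rng, center, (0.13, 0.07), 10_000)    # barrier: ±13 / ±7 cm

# A uniform distribution on [-a, a] has standard deviation a / sqrt(3),
# so the narrow setting gives the overhead view far less variation to learn from.
print("narrow std (m):", narrow.std(axis=0).round(4))
print("wide   std (m):", wide.std(axis=0).round(4))
```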
Reproduce in ~5 minutes, no simulator:

    conda activate lerobot
    bash scripts/bootstrap_assets.sh --minimal       # pull the dataset + checkpoint from the HF Hub
    bash experiments/act_push_failure/run_all.sh     # regenerates the ablation evidence

Full write-up: docs/case-study-push.md.
See the case-study block above — it runs entirely in the lerobot conda env on a published dataset +
checkpoint.
sim2act uses two conda envs (full matrix in CLAUDE.md):
| env | used for |
|---|---|
| `isaaclab` (Isaac Sim 5.1) | demo collection · PPO RL · LeRobot conversion · eval |
| `lerobot` (lerobot 0.5.2) | the simulator-free failure analysis |

    conda run -n isaaclab python -m pip install -e ".[hub]"   # one-time editable install

    # collect → convert → train → eval (barrier shown; see docs/architecture.md for all tasks)
    python scripts/collect/demos.py --task barrier --num_demos 100 --headless --enable_cameras
    python data/convert_to_lerobot.py --input _out/datasets/franka_barrier_official_demos/dataset.hdf5 \
        --output _out/datasets/lerobot/franka_barrier --state_keys joint_pos,joint_vel --no_depth
    python scripts/train/act.py --dataset _out/datasets/lerobot/franka_barrier --steps 40000
    python scripts/eval/policy.py --policy act --task barrier \
        --model_path _out/act/<run>/checkpoints/last/pretrained_model \
        --num_rollouts 20 --headless --enable_cameras

Push uses the RL-teacher chain (scripts/train/push_rl.py → scripts/rl/export_push.py → scripts/collect/push_rl_demos.py); see docs/architecture.md.

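
The eval step reports a success rate over rollouts, and the architecture diagram names a "success-latch". One plausible reading, sketched here as an assumption rather than as the harness's actual code, is that an episode counts as successful once the success condition has been observed at any step, even if the scene is disturbed afterwards. The function names below are hypothetical.

```python
from typing import Iterable, List, Sequence


def latched_success(step_flags: Iterable[bool]) -> bool:
    """Return True if success was observed at any step of the episode."""
    latched = False
    for flag in step_flags:
        latched = latched or bool(flag)  # once set, the latch never resets
    return latched


def success_rate(episodes: Sequence[Sequence[bool]]) -> float:
    """Fraction of episodes whose latch was set; 0.0 for an empty batch."""
    results: List[bool] = [latched_success(ep) for ep in episodes]
    return sum(results) / len(results) if results else 0.0


# Episode 1 succeeds mid-rollout and is later disturbed; episode 2 never succeeds.
rollouts = [
    [False, False, True, False],
    [False, False, False, False],
]
print(success_rate(rollouts))
```

Without the latch, a policy that completes the task and then knocks the object away before the episode ends would be scored by its final frame only.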
| Task | Oracle | Primary camera | Status |
|---|---|---|---|
| pick_place | Warp state machine | overhead | demos ✅ |
| barrier (pick over a ⅓-arm-height wall) | Warp state machine | overhead (robust) | ACT 90% ✅ |
| push (non-prehensile, box→box→corner) | PPO teacher | wrist → overhead (after fix) | diagnosed, fix in progress 🔬 |
Sensor suite (collected for every demo): overhead RGB-D, wrist RGB, joint position/velocity/torque, and left/right fingertip contact forces — chosen for realistic sim-to-real transfer.

    envs/          Isaac Lab env configs (base / tasks / scenes; clean inheritance chain)
    controllers/   Warp GPU state machine (PickAndPlaceSm)
    scripts/       pipeline entrypoints: collect / train / eval / rl / viz
    eval/          VLA eval harness (VLAWrapper · EvalRunner · obs adapter · video)
    data/          HDF5 → LeRobot v3.0 conversion
    experiments/   act_push_failure/: the flagship, simulator-free failure analysis
    tools/         checks / smoke tests / analysis & visualization
    docs/          architecture · case study · results · state machine · archive
    _out/          all generated artifacts (gitignored; fetched via scripts/bootstrap_assets.sh)

- Run the push DR-widening fix to completion and publish the before→after result.
- Validate additional policy backends: `OpenVLAWrapper` is wired in `eval/vla_wrapper.py` but not yet validated; Octo / π0 are natural next wrappers.
- Attack covariate shift directly: DAgger / action-noise collection to cover off-trajectory views.
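
As a rough illustration of the action-noise idea in the last roadmap item, the sketch below executes a perturbed action while recording the oracle's clean action as the label, so the dataset contains off-trajectory states paired with corrective targets. It is a toy under stated assumptions: the 2-D point environment, the function names, and the Gaussian noise model are hypothetical and are not part of sim2act.

```python
import numpy as np


def collect_with_action_noise(oracle, step, reset, horizon, noise_std, rng):
    """Roll out an oracle with noisy execution; store (observation, clean action) pairs."""
    obs = reset()
    demo = []
    for _ in range(horizon):
        clean = oracle(obs)
        demo.append((obs.copy(), clean.copy()))  # the label is the un-noised action
        noisy = clean + rng.normal(0.0, noise_std, size=clean.shape)
        obs = step(obs, noisy)                   # the environment sees the noisy action
    return demo


# Toy 2-D point task: the oracle moves the point halfway to the origin each step.
def toy_reset():
    return np.array([1.0, -1.0])


def toy_oracle(obs):
    return -0.5 * obs


def toy_step(obs, action):
    return obs + action


clean_demo = collect_with_action_noise(
    toy_oracle, toy_step, toy_reset, 10, 0.0, np.random.default_rng(0)
)
noisy_demo = collect_with_action_noise(
    toy_oracle, toy_step, toy_reset, 10, 0.05, np.random.default_rng(0)
)
print("state after one step, clean:", clean_demo[1][0])
print("state after one step, noisy:", noisy_demo[1][0])
```

The noisy rollout visits states the clean rollout never reaches, while every stored label remains what the oracle would have done from that state.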
If you use sim2act, please cite it (see CITATION.cff). Licensed under Apache-2.0 (LICENSE). Built on Isaac Lab, LeRobot, rsl_rl, and the ACT architecture.