sim2act — a VLA simulation data engine

From simulation to action. sim2act manufactures Vision-Language-Action (VLA) training data in NVIDIA Isaac Lab — and stress-tests the policies it produces.

🇹🇼 Traditional Chinese version: README.zh-Hant.md

A learned ACT policy clearing the barrier task — trained entirely from simulated oracle demos.

TL;DR

sim2act takes three contact-rich Franka manipulation tasks — pick, pick-over-barrier, and non-prehensile push — through a complete data flywheel:

scene → privileged oracle (Warp GPU state machine + a 4096-env PPO teacher) → multimodal demo collection → LeRobot v3.0 → ACT imitation learning → a closed-loop eval harness

One command per stage, fully reproducible. Two results carry the project:

a student ACT policy that beats its privileged oracle on the barrier task (90% vs 75%), with a measured out-of-distribution generalization curve; and
a rigorous, simulator-free root-cause diagnosis of a 0%-success push policy — traced to a camera-shortcut induced by under-randomized initial states — plus the before→after fix that targets it.

The diagnosis reproduces in ~5 minutes on a published dataset + checkpoint, no simulator required.

The design optimizes for what foundation-model robotics actually values: data infrastructure at scale, empirical rigor, honest failure analysis, and reproducibility. It started as a Robotics & Sim take-home challenge whose thesis was "the challenge is not the quantity of the data, but the quality" (provenance in docs/archive/).

Headline results

Task	Oracle	Oracle success	ACT (in-dist)	ACT (OOD ×1.5)	Learned camera	Status
`pick_place`	Warp state machine	—	—	—	overhead	demos ✅
`barrier`	Warp state machine	75%	90%	55%	overhead (robust)	✅
`push`	PPO teacher (4096 envs, ~98.5% training success)	—	0% → (pending fix run)	—	wrist (fragile) → overhead	🔬 diagnosed + fix in progress

The student beats the teacher on barrier (90% > 75%), and the push failure is diagnosed down to a single causal lever — initial-state randomization width — with controlled camera ablations. Full numbers and methodology: docs/results.md.

Architecture

                    ┌──────────────── privileged oracle ────────────────┐
   scene            │  Warp GPU state machine  (pick / barrier)         │
  (Franka +  ─────▶ │  PPO RL teacher, 4096 envs  (push)                │
   two boxes)       └───────────────────────┬───────────────────────────┘
                                            ▼
                          multimodal demo collection
              overhead RGB-D + wrist RGB + joint pos/vel/torque
                  + fingertip contact forces   ·   8-D action
                                            │
                                            ▼
                  raw HDF5  ──▶  LeRobot v3.0  (parquet + mp4)
                                            │
                            ┌───────────────┴────────────────┐
                            ▼                                ▼
                ACT imitation learning          closed-loop eval harness
                  (chunked actions)             VLAWrapper · success-latch
                                                camera ablation · OOD sweep

Left → right: Warp state-machine pick · PPO push teacher · learned ACT barrier policy.

Multimodal observation per step: overhead RGB-D (224²) + wrist RGB (224²) + joint position/velocity/torque + dual fingertip contact forces. Action: 8-D (7-D IK-absolute end-effector pose + gripper) at ~50 Hz.
Two oracles, by design. A deterministic Warp state machine (docs/state-machine.md) drives the prehensile pick / barrier tasks; a learned PPO teacher (4096 parallel envs, ~98.5% training success) drives the contact-rich push, which a hand-written controller handles poorly.
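The phase logic of a deterministic pick oracle can be illustrated with a small CPU sketch. It is not the repository's Warp kernel (`PickAndPlaceSm` in controllers/), which runs its logic for every parallel environment on the GPU; the class name `PickSm`, the phase names, the heights, the tolerance, and the grasp dwell below are all hypothetical values chosen for illustration. Each step the machine emits an absolute end-effector target plus a gripper command, and it advances to the next phase once the target is reached.

```python
import math
from dataclasses import dataclass
from enum import IntEnum


class Phase(IntEnum):
    # Hypothetical phase names; the real PickAndPlaceSm may define different ones.
    APPROACH_ABOVE = 0
    DESCEND = 1
    GRASP = 2
    LIFT = 3   # rise to a height that clears the barrier
    CARRY = 4
    LOWER = 5
    DONE = 6


@dataclass
class PickSm:
    hover: float = 0.10      # hypothetical pre-grasp height above the box (m)
    clearance: float = 0.30  # hypothetical carry height over the wall (m)
    tol: float = 0.02        # hypothetical "target reached" tolerance (m)
    grasp_steps: int = 10    # hypothetical number of steps to dwell while closing the gripper
    phase: Phase = Phase.APPROACH_ABOVE
    wait: int = 0

    def step(self, ee_pos, obj_pos, goal_pos):
        """Return (target_xyz, gripper_closed) for this step and advance the phase."""
        if self.phase == Phase.APPROACH_ABOVE:
            target, closed = (obj_pos[0], obj_pos[1], obj_pos[2] + self.hover), False
        elif self.phase == Phase.DESCEND:
            target, closed = tuple(obj_pos), False
        elif self.phase == Phase.GRASP:
            # Hold still with the gripper closed for a fixed number of steps.
            self.wait += 1
            if self.wait >= self.grasp_steps:
                self.phase = Phase.LIFT
            return tuple(ee_pos), True
        elif self.phase == Phase.LIFT:
            target, closed = (ee_pos[0], ee_pos[1], self.clearance), True
        elif self.phase == Phase.CARRY:
            target, closed = (goal_pos[0], goal_pos[1], self.clearance), True
        elif self.phase == Phase.LOWER:
            target, closed = tuple(goal_pos), True
        else:  # DONE: release the box and hold position
            return tuple(ee_pos), False
        # Advance only once the end effector is within tolerance of the target.
        if math.dist(ee_pos, target) < self.tol:
            self.phase = Phase(self.phase + 1)
        return target, closed
```

Driven by an ideal tracker (the end effector jumps straight to each target), the values above walk the machine through every phase in 20 steps; in the real pipeline an IK controller tracks these targets instead.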

Deep dive: docs/architecture.md.
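The "chunked actions" and "success-latch" boxes in the diagram can be illustrated with a simulator-free rollout loop. This is a sketch under stated assumptions, not the repository's `EvalRunner` or `VLAWrapper` API: the hypothetical `policy` callable returns a list of actions per query (ACT-style chunking, executed open-loop without temporal ensembling), the hypothetical `env_step` callable returns `(obs, success)`, and the latch counts the episode as a success once success is observed at any step.

```python
from collections import deque
from typing import Callable, Sequence

Action = Sequence[float]


def rollout(policy: Callable[[dict], list], env_step: Callable[[Action], tuple],
            obs: dict, max_steps: int) -> bool:
    """Run one episode with chunked actions and return the latched success flag."""
    queue: deque = deque()
    succeeded = False
    for _ in range(max_steps):
        if not queue:
            # Query the policy only when the current chunk is exhausted.
            queue.extend(policy(obs))
        obs, success = env_step(queue.popleft())
        # Latch: a later failure (e.g. the box being knocked over) cannot undo a success.
        succeeded = succeeded or success
    return succeeded
```

With a chunk length of 4 and 10 steps, the policy is queried at steps 0, 4, and 8, and a success seen only at step 3 still counts for the episode.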

Signature case study — diagnosing a 0% push policy

The same pipeline that yields 90% on barrier yields 0% on push: the arm ignores the first box and drives diagonally toward the corner from the first step.

Left: the push policy failing. Right: the same scene with the wrist camera blacked out — the behavior barely changes, exposing the policy's reliance on the wrist view.

Per-camera ablation (mean action change when one camera is zeroed):

ablation	push	barrier
black overhead Δ	0.038	0.089
black wrist Δ	0.197	0.027
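The Δ values can be computed in spirit with a tiny harness. This sketch assumes Δ is the mean absolute per-dimension action difference between the intact observation and a copy with one camera's pixels set to zero; the exact reduction used in experiments/act_push_failure/ may differ (for example an L2 norm), and `camera_ablation_delta` and `zero_like` are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence

Obs = Dict[str, list]


def zero_like(x):
    """Recursively replace every number in a nested list with 0.0 (a 'black' image)."""
    if isinstance(x, list):
        return [zero_like(v) for v in x]
    return 0.0


def camera_ablation_delta(policy: Callable[[Obs], Sequence[float]],
                          frames: List[Obs], camera_key: str) -> float:
    """Mean |a(obs) - a(obs with camera_key zeroed)| over all frames and action dims."""
    total, count = 0.0, 0
    for obs in frames:
        base = policy(obs)
        ablated_obs = dict(obs)  # shallow copy; only the ablated camera is replaced
        ablated_obs[camera_key] = zero_like(obs[camera_key])
        ablated = policy(ablated_obs)
        for a, b in zip(base, ablated):
            total += abs(a - b)
            count += 1
    return total / count
```

A policy that ignores a camera gets Δ = 0 for it, so a large "black wrist Δ" with a small "black overhead Δ" (the push column) indicates the actions are driven mostly by the wrist view.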

Push leans on the fragile, ego-motion-coupled wrist camera; barrier leans on the robust, static overhead camera. Teacher-forcing replay (EE-xy L1 error = 0.011 m) shows that the model did learn the demonstrations, so this is a closed-loop covariate-shift failure, not under-training.

Root cause: push initial-state randomization is only ±3 cm (vs. ±13/±7 cm for barrier). This leaves the static overhead view nearly invariant across episodes, and therefore uninformative, which pushes the policy onto the wrist-camera shortcut. The fix widens the randomization (scripts/fix_push_widen_dr.sh, via PUSH_BOX_DR); the before→after comparison is in progress.
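A back-of-the-envelope comparison makes the randomization gap concrete. Assuming the quoted ranges are uniform half-widths on the box's x and y position (push ±3 cm on both axes; barrier ±13 cm on one axis and ±7 cm on the other, an interpretation the text above does not spell out), the area of possible initial positions differs by roughly an order of magnitude. The helper names are hypothetical, and this does not show how `PUSH_BOX_DR` is parsed.

```python
import random


def dr_area_cm2(half_x_cm: float, half_y_cm: float) -> float:
    """Area (cm^2) covered by a uniform ±half_x by ±half_y position randomization."""
    return (2 * half_x_cm) * (2 * half_y_cm)


def sample_box_offset(half_x_cm: float, half_y_cm: float, rng: random.Random):
    """Draw one initial (dx, dy) box offset in cm, uniform inside the randomization window."""
    return rng.uniform(-half_x_cm, half_x_cm), rng.uniform(-half_y_cm, half_y_cm)


push_area = dr_area_cm2(3, 3)        # 6 cm x 6 cm = 36 cm^2
barrier_area = dr_area_cm2(13, 7)    # 26 cm x 14 cm = 364 cm^2
print(f"barrier/push area ratio: {barrier_area / push_area:.1f}")  # prints 10.1
```

Under this reading, push demos place the box inside a window about one tenth the size of barrier's, so the overhead image changes little from episode to episode.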

Reproduce in ~5 minutes, no simulator:

conda activate lerobot
bash scripts/bootstrap_assets.sh --minimal      # pull the dataset + checkpoint from the HF Hub
bash experiments/act_push_failure/run_all.sh    # regenerates the ablation evidence

Full write-up: docs/case-study-push.md.

Quickstart

1 · Reproduce the failure analysis (no GPU / simulator, ~5 min)

See the case-study block above — it runs entirely in the lerobot conda env on a published dataset + checkpoint.

2 · Run the full pipeline (requires Isaac Lab)

sim2act uses two conda envs (full matrix in CLAUDE.md):

env	used for
`isaaclab` (Isaac Sim 5.1)	demo collection · PPO RL · LeRobot conversion · eval
`lerobot` (lerobot 0.5.2)	the simulator-free failure analysis

conda run -n isaaclab python -m pip install -e ".[hub]"      # one-time editable install

# collect → convert → train → eval  (barrier shown; see docs/architecture.md for all tasks)
python scripts/collect/demos.py --task barrier --num_demos 100 --headless --enable_cameras
python data/convert_to_lerobot.py --input _out/datasets/franka_barrier_official_demos/dataset.hdf5 \
       --output _out/datasets/lerobot/franka_barrier --state_keys joint_pos,joint_vel --no_depth
python scripts/train/act.py --dataset _out/datasets/lerobot/franka_barrier --steps 40000
python scripts/eval/policy.py --policy act --task barrier \
       --model_path _out/act/<run>/checkpoints/last/pretrained_model \
       --num_rollouts 20 --headless --enable_cameras
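If you prefer a single entry point, the four barrier stages can be chained from Python. This is a hypothetical convenience wrapper, not part of the repository: it only shells out to the exact commands listed above, assumes it is launched from the repository root with the isaaclab env's interpreter (`sys.executable`), and stops at the first failing stage (`check=True`). The `<run>` directory is left for you to fill in because the training run name is only known after `scripts/train/act.py` has run, which is why `dry_run` defaults to printing the commands.

```python
import subprocess
import sys
from typing import List


def barrier_pipeline(run_dir: str) -> List[List[str]]:
    """Return the collect → convert → train → eval commands for the barrier task."""
    raw = "_out/datasets/franka_barrier_official_demos/dataset.hdf5"
    lerobot = "_out/datasets/lerobot/franka_barrier"
    py = sys.executable
    return [
        [py, "scripts/collect/demos.py", "--task", "barrier", "--num_demos", "100",
         "--headless", "--enable_cameras"],
        [py, "data/convert_to_lerobot.py", "--input", raw, "--output", lerobot,
         "--state_keys", "joint_pos,joint_vel", "--no_depth"],
        [py, "scripts/train/act.py", "--dataset", lerobot, "--steps", "40000"],
        [py, "scripts/eval/policy.py", "--policy", "act", "--task", "barrier",
         "--model_path", f"{run_dir}/checkpoints/last/pretrained_model",
         "--num_rollouts", "20", "--headless", "--enable_cameras"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it too unless dry_run is set."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # abort on the first failing stage


if __name__ == "__main__":
    # Replace <run> with the directory that scripts/train/act.py actually created.
    run(barrier_pipeline("_out/act/<run>"), dry_run=True)
```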

Push uses the RL-teacher chain (scripts/train/push_rl.py → scripts/rl/export_push.py → scripts/collect/push_rl_demos.py); see docs/architecture.md.

What's inside

Task	Oracle	Primary camera	Status
`pick_place`	Warp state machine	overhead	demos ✅
`barrier` (pick over a ⅓-arm-height wall)	Warp state machine	overhead (robust)	ACT 90% ✅
`push` (non-prehensile, box→box→corner)	PPO teacher	wrist → overhead (after fix)	diagnosed, fix in progress 🔬

Sensor suite (collected for every demo): overhead RGB-D, wrist RGB, joint position/velocity/torque, and left/right fingertip contact forces — chosen for realistic sim-to-real transfer.
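To make the sensor suite and action space concrete, one demo step can be pictured as the record below. It is a sketch built from the shapes stated in this README (224×224 overhead RGB-D and wrist RGB, 8-D action = 7-D absolute end-effector pose + gripper). The key names, 9 joints (7 arm + 2 finger), 3-D fingertip forces, and the xyz + quaternion (w, x, y, z) split of the 7-D pose are assumptions, not the repository's HDF5 or LeRobot keys.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Per-step shapes. Image sizes and the 8-D action come from this README;
# NUM_JOINTS = 9 (7 arm + 2 finger) and 3-D fingertip forces are assumptions.
NUM_JOINTS = 9
SCHEMA: Dict[str, Tuple[int, ...]] = {
    "overhead_rgb": (224, 224, 3),
    "overhead_depth": (224, 224, 1),
    "wrist_rgb": (224, 224, 3),
    "joint_pos": (NUM_JOINTS,),
    "joint_vel": (NUM_JOINTS,),
    "joint_torque": (NUM_JOINTS,),
    "left_fingertip_force": (3,),
    "right_fingertip_force": (3,),
    "action": (8,),
}


@dataclass
class EEAction:
    pos: Tuple[float, ...]   # absolute EE position target for IK (m), assumed xyz
    quat: Tuple[float, ...]  # absolute EE orientation, order assumed (w, x, y, z)
    gripper: float           # gripper command; sign convention assumed


def split_action(a: List[float]) -> EEAction:
    """Split an 8-D action into the 7-D EE pose (xyz + quaternion) and the gripper scalar."""
    if len(a) != SCHEMA["action"][0]:
        raise ValueError(f"expected an 8-D action, got {len(a)}-D")
    return EEAction(pos=tuple(a[0:3]), quat=tuple(a[3:7]), gripper=a[7])
```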

Repository layout

envs/          Isaac Lab env configs (base / tasks / scenes; clean inheritance chain)
controllers/   Warp GPU state machine (PickAndPlaceSm)
scripts/       pipeline entrypoints — collect / train / eval / rl / viz
eval/          VLA eval harness (VLAWrapper · EvalRunner · obs adapter · video)
data/          HDF5 → LeRobot v3.0 conversion
experiments/   act_push_failure/ — the flagship, simulator-free failure analysis
tools/         checks / smoke tests / analysis & visualization
docs/          architecture · case study · results · state machine · archive
_out/          all generated artifacts (gitignored; fetched via scripts/bootstrap_assets.sh)

Roadmap

Run the push DR-widening fix to completion and publish the before→after result.
Validate additional policy backends: OpenVLAWrapper is wired into eval/vla_wrapper.py but not yet validated; Octo / π0 are natural next wrappers.
Attack covariate shift directly: DAgger / action-noise collection to cover off-trajectory views.

Citation · License · Acknowledgements

If you use sim2act, please cite it (see CITATION.cff). Licensed under Apache-2.0 (LICENSE). Built on Isaac Lab, LeRobot, rsl_rl, and the ACT architecture.
