Kevinma0215 · Jun 20, 2026
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -0,0 +1,47 @@
+name: CI
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+  workflow_dispatch:
+
+jobs:
+  lint:
+    name: ruff (lint)
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+      - run: pip install ruff
+      - run: ruff check .
+
+  reproduce-analysis:
+    name: no-sim failure analysis (reproducibility smoke)
+    runs-on: ubuntu-latest
+    # ACT inference on a CPU runner is heavy; run on demand rather than on every push.
+    if: github.event_name == 'workflow_dispatch'
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.12"
+      - name: Install (lerobot + hub extras)
+        run: pip install -e ".[lerobot,hub]" "lerobot==0.5.2"
+      - name: Fetch minimal assets from the Hugging Face Hub
+        run: bash scripts/bootstrap_assets.sh --minimal
+      - name: Reproduce the analysis (no simulator)
+        run: bash experiments/act_push_failure/run_all.sh
+      - name: Assert the wrist-shortcut diagnostic holds
+        run: |
+          python - <<'PY'
+          import json
+          s = json.load(open("experiments/act_push_failure/results/push_summary.json"))
+          a = s["E2_camera_ablation"]
+          wrist, overhead = a["delta_black_wrist_only"], a["delta_black_overhead_only"]
+          print(f"black-wrist Δ={wrist:.3f}  black-overhead Δ={overhead:.3f}")
+          assert wrist > overhead, "expected push to rely on the wrist camera (wrist Δ > overhead Δ)"
+          print("OK: push policy is wrist-reliant, as diagnosed.")
+          PY
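You can run the wrist-shortcut assertion from the `reproduce-analysis` job locally before dispatching the workflow. The sketch below is minimal. It assumes `push_summary.json` keeps the `E2_camera_ablation` layout that the CI step reads. The helper names `load_summary` and `is_wrist_reliant` are hypothetical and not part of the repo.

```python
import json
from pathlib import Path

# Same file and keys the CI step reads; the helper names below are hypothetical.
SUMMARY = Path("experiments/act_push_failure/results/push_summary.json")


def load_summary(path: Path = SUMMARY) -> dict:
    """Read the committed analysis summary (relative path: run from the repo root)."""
    return json.loads(path.read_text(encoding="utf-8"))


def is_wrist_reliant(summary: dict) -> bool:
    """True if blacking out the wrist camera raises the reported delta more than the overhead one."""
    ablation = summary["E2_camera_ablation"]
    wrist = ablation["delta_black_wrist_only"]
    overhead = ablation["delta_black_overhead_only"]
    return wrist > overhead
```

From the repo root, `assert is_wrist_reliant(load_summary())` mirrors the CI check. With the values recorded in CLAUDE.md (0.197 vs 0.038), it should pass.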
diff --git a/CITATION.cff b/CITATION.cff
@@ -0,0 +1,28 @@
+cff-version: 1.2.0
+message: "If you use sim2act in your work, please cite it."
+title: "sim2act: a VLA simulation data engine"
+abstract: >-
+  An end-to-end NVIDIA Isaac Lab pipeline for collecting multimodal Franka manipulation
+  demonstrations (a Warp state-machine oracle and a PPO RL teacher), converting them to the
+  LeRobot v3.0 format, training Action Chunking Transformer (ACT) policies, and evaluating them
+  closed-loop — together with a reproducible, simulator-free failure-analysis case study of a
+  camera-reliance shortcut in an imitation policy.
+type: software
+authors:
+  - family-names: Ma
+    given-names: Kevin
+license: Apache-2.0
+repository-code: "https://github.com/Kevinma0215/sim2act"
+url: "https://github.com/Kevinma0215/sim2act"
+version: "0.1.0"
+date-released: "2026-06-19"
+keywords:
+  - vision-language-action
+  - imitation-learning
+  - robot-learning
+  - action-chunking-transformer
+  - isaac-lab
+  - lerobot
+  - sim-to-real
+  - domain-randomization
+  - covariate-shift
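The Citation File Format 1.2.0 schema requires four top-level keys: `cff-version`, `message`, `title` and `authors`. The CITATION.cff above sets all four. Below is a rough stdlib pre-release check, assuming the file stays flat YAML with unindented top-level keys. It is only a line scan, not a YAML parser or a full schema validator, and the function names are hypothetical.

```python
from pathlib import Path

# Top-level keys the CFF 1.2.0 schema marks as required.
REQUIRED_KEYS = {"cff-version", "message", "title", "authors"}


def top_level_keys(text: str) -> set[str]:
    """Collect unindented `key:` names; skips indented lines, list items and comments."""
    keys = set()
    for line in text.splitlines():
        if not line or line[0] in " \t-#":
            continue
        key, sep, _ = line.partition(":")
        if sep:
            keys.add(key.strip())
    return keys


def missing_required(path: Path = Path("CITATION.cff")) -> set[str]:
    """Required keys absent from the file (empty set means the basic shape is fine)."""
    return REQUIRED_KEYS - top_level_keys(path.read_text(encoding="utf-8"))
```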
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,77 @@
+# CLAUDE.md — sim2act
+
+> Working notes for AI agents on this repo. Read before editing or running anything.
+> This file is the single source of truth for commands, conventions, and verified numbers —
+> prefer it over re-deriving facts from the code.
+
+## What this is
+
+**sim2act** — a VLA (vision-language-action) **simulation data engine** on NVIDIA Isaac Lab. A Franka
+arm performs pick / barrier / push; the repo takes each task from privileged oracle → multimodal demo
+collection → LeRobot v3.0 → ACT imitation learning → a closed-loop eval harness. Positioning is
+three-in-one: **data engine** (umbrella) + **rigorous failure diagnosis** + **end-to-end closed loop**.
+Audience: foundation-model robotics engineers.
+
+Origin: built as an **R&S (Robotics & Simulation) take-home challenge**, now generalized into a public
+project. **R&S = Robotics & Simulation — NOT "Rohde & Schwarz".** The old submission codename was
+"Corvinus" and is being retired (only `docs/archive/` may still mention it).
+
+## Conda environments (CRITICAL — there are two; do not mix them)
+
+Isaac Lab is pip-installed into the `isaaclab` conda env, so run stages with **plain `python`**
+(NOT `./isaaclab.sh -p`, despite older README text). Prefer `conda run -n <env> python ...`.
+
+| env | key versions | use for |
+|---|---|---|
+| **isaaclab** (py3.11) | isaacsim 5.1.0, isaaclab 0.54.4, lerobot **0.4.4**, torch 2.7, warp 1.14 | everything touching the simulator: collect, PPO RL train, LeRobot convert, eval — and the push-fix pipeline's ACT training (`fix_push_widen_dr.sh` runs end-to-end in this env) |
+| **lerobot** (py3.12) | lerobot **0.5.2**, torch 2.11, no Isaac Sim | the **no-sim** failure analysis `experiments/act_push_failure/run_all.sh` (requires 0.5.2) |
+
+Gotcha: the two envs ship different lerobot versions (0.4.4 vs 0.5.2); existing ACT checkpoints load
+under both. Keep each pipeline inside ONE env: run the whole `fix_push_widen_dr.sh` in `isaaclab`;
+run the no-sim ablation in `lerobot`.
+
+Hardware here: 1× RTX 5060 Ti (16 GB). PPO defaults to 4096 envs, which may need lowering on 16 GB; SPEEDRUN uses 256.
+
+## Canonical commands (run from repo root)
+
+- Editable install: `conda run -n isaaclab python -m pip install -e .`
+- Collect (SM oracle): `python scripts/collect/demos.py --task pick_place|barrier --num_demos 50 --headless --enable_cameras`
+- Push RL chain: `python scripts/train/push_rl.py --headless --num_envs 4096` → `python scripts/rl/export_push.py --headless` → `python scripts/collect/push_rl_demos.py --num_demos 50 --num_envs 4 --headless --enable_cameras`
+- Convert → LeRobot: `python data/convert_to_lerobot.py --input _out/datasets/<tag>_official_demos/dataset.hdf5 --output _out/datasets/lerobot/<tag> --state_keys joint_pos,joint_vel --no_depth`
+- Train ACT: `python scripts/train/act.py --dataset _out/datasets/lerobot/<tag> --steps 40000 --batch-size 8 [--wandb]`
+- Eval: `python scripts/eval/policy.py --policy act|oracle|dummy --task <t> --model_path <ckpt>/pretrained_model --num_rollouts 20 --headless --enable_cameras` (extras: `--ablate_camera overhead|wrist`, `--n_action_steps`, `--init_scale`, `--oracle-pose gt|noisy`)
+- No-sim push failure analysis: `conda run -n lerobot bash experiments/act_push_failure/run_all.sh`
+- Push fix (whole chain): `conda run -n isaaclab bash scripts/fix_push_widen_dr.sh` (smoke-test with `SPEEDRUN=1` first). Domain randomization (DR) is set via `PUSH_BOX_DR` and shared by train/collect/eval.
+
+## Outputs: everything generated lives under `_out/` (gitignored)
+
+`_out/datasets/{<tag>_official_demos/dataset.hdf5 (raw HDF5), lerobot/<tag> (LeRobot v3.0)}`,
+`_out/rl/franka_push/<ts>/`, `_out/act/act_<tag>_run_<ts>/checkpoints/{<step>,last}/pretrained_model`,
+`_out/eval/*.json`, `_out/viz/`.
+The one generated thing that IS committed: `experiments/act_push_failure/results/` (analysis evidence —
+deliberately gitignore-excepted).
+
+## Verified numbers (cite verbatim; do not re-derive)
+
+From `experiments/act_push_failure/results/*_summary.json`:
+- push teacher-forcing EE-xy L1 = **0.011 m** (shows the model fits the demos open-loop).
+- push camera ablation: black-**wrist** Δ = **0.197** vs black-**overhead** Δ = **0.038** → wrist shortcut.
+- barrier ablation: black-overhead Δ = **0.089** vs black-wrist Δ = **0.027** → robust overhead.
+- barrier ACT **90%** in-distribution vs SM oracle **75%**; OOD at `init_scale` 1.5 → **55%**. push ACT **0%** (pre-fix).
+- Root cause: push init DR is only ±3 cm (vs barrier ±13/±7 cm) → the static overhead view carries little
+  information → policy takes the wrist shortcut → closed-loop covariate-shift spiral.
+
+## Don't
+
+- Don't commit `_out/` or large media (`.gif/.webm/.mp4`) into git history (host on HF Hub / GitHub releases).
+- Don't rename the Python modules (`envs`/`eval`/`data`/`controllers`) — only the distribution name is `sim2act`.
+- Don't call it "Rohde & Schwarz". R&S = Robotics & Simulation.
+- Don't advertise OpenVLA / Octo / π0 as done — `OpenVLAWrapper` is wired but unvalidated (an extension point).
+- Don't hardcode `/home/kevin786/...` — use the `BASH_SOURCE` repo-root pattern (see `scripts/*.sh`).
+- Don't launch the heavy push fix without a `SPEEDRUN=1` smoke first.
+
+## Layout
+
+`envs/` (base/tasks/scenes cfg) · `controllers/` (Warp GPU state machine) · `scripts/{collect,train,eval,rl,viz}`
+· `eval/` (VLA eval harness) · `data/convert_to_lerobot.py` · `tools/{checks,smoke,viz}` ·
+`experiments/act_push_failure/` (flagship failure analysis, no-sim) · `docs/` · `_out/` (generated, gitignored).
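The camera-ablation lines in "Verified numbers" reduce to one comparison: the policy relies on whichever stream's blackout raises the reported Δ more. The CI assertion encodes the same rule. Below is a minimal sketch using the verified values. `CameraAblation` is a hypothetical helper, not a module in the repo, and it treats equal deltas as inconclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraAblation:
    """Reported delta when one camera stream is blacked out (larger delta = more reliance)."""

    task: str
    delta_black_wrist: float
    delta_black_overhead: float

    def relied_on(self) -> str:
        """Name the camera whose blackout hurts more; equal deltas are inconclusive."""
        if self.delta_black_wrist > self.delta_black_overhead:
            return "wrist"
        if self.delta_black_overhead > self.delta_black_wrist:
            return "overhead"
        return "inconclusive"


# Verified values from experiments/act_push_failure/results/*_summary.json (see list above).
VERIFIED = [
    CameraAblation("push", delta_black_wrist=0.197, delta_black_overhead=0.038),
    CameraAblation("barrier", delta_black_wrist=0.027, delta_black_overhead=0.089),
]

for ablation in VERIFIED:
    print(f"{ablation.task}: relies on the {ablation.relied_on()} camera")
# push: relies on the wrist camera
# barrier: relies on the overhead camera
```

The two opposite verdicts are the contrast the root-cause line describes: a wrist shortcut for push, robust overhead use for barrier.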
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,41 @@
+# Contributing to sim2act
+
+Thanks for your interest. sim2act is a research codebase; contributions that improve
+reproducibility, add tasks/policies, or sharpen the analysis are very welcome.
+
+## Environments
+
+Two conda envs are used (full matrix in [CLAUDE.md](CLAUDE.md)):
+
+- **`isaaclab`** — anything that touches the simulator: demo collection, PPO RL training, LeRobot
+  conversion, and closed-loop eval. Isaac Lab is pip-installed into this env, so run scripts with
+  plain `python` (not `./isaaclab.sh -p`).
+- **`lerobot`** — the simulator-free failure analysis (`lerobot==0.5.2`).
+
+Install the package editable:
+
+```bash
+conda run -n isaaclab python -m pip install -e ".[hub]"
+```
+
+## Sanity check without a GPU or simulator
+
+The flagship failure analysis reproduces in ~5 minutes from a published dataset and checkpoint, with no
+Isaac Sim required (on a fresh `lerobot` env, first run `pip install -e ".[lerobot,hub]"`, as CI does):
+
+```bash
+conda activate lerobot
+bash scripts/bootstrap_assets.sh --minimal   # pulls the dataset + checkpoint from the HF Hub
+bash experiments/act_push_failure/run_all.sh
+```
+
+## Style
+
+- Python is linted with [ruff](https://docs.astral.sh/ruff/): `ruff check . && ruff format --check .`
+- Keep generated artifacts out of git — everything lands under `_out/` (gitignored).
+- Don't hardcode absolute paths; shell scripts resolve the repo root via `BASH_SOURCE`.
+
+## Pull requests
+
+Keep PRs focused and clearly described. If a change affects the pipeline, say which stage(s) and
+which conda env you validated it in. CI runs `ruff check` on PRs and pushes to `main`; the no-simulator reproducibility smoke runs only on manual `workflow_dispatch`.
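When stating which conda env you validated a change in, the Environments section above gives a single rule. Simulator-touching stages run in `isaaclab`, and the simulator-free analysis runs in `lerobot`. Below is a small sketch of that routing. The stage labels and the `conda_cmd` helper are hypothetical; the repo defines no such mapping.

```python
import shlex

# Stage -> conda env, following the Environments section; stage labels are hypothetical.
ENV_FOR_STAGE = {
    "collect": "isaaclab",
    "rl_train": "isaaclab",
    "convert": "isaaclab",
    "eval": "isaaclab",
    "failure_analysis": "lerobot",
}


def conda_cmd(stage: str, *args: str) -> list[str]:
    """Build a `conda run` argv for a stage; raises KeyError for an unknown stage."""
    return ["conda", "run", "-n", ENV_FOR_STAGE[stage], *args]


cmd = conda_cmd("failure_analysis", "bash", "experiments/act_push_failure/run_all.sh")
print(shlex.join(cmd))
# conda run -n lerobot bash experiments/act_push_failure/run_all.sh
```

The printed line matches the no-sim analysis command listed in CLAUDE.md.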