diff --git a/.gitignore b/.gitignore index 3776d03..32a95dd 100644 --- a/.gitignore +++ b/.gitignore @@ -1,3 +1,6 @@ +# AWS security files +aws/*.pem + # MacOS Security Files .DS_Store .claude diff --git a/Liberty-Notes/Nvidia-Issac-Lab/March-22.ipynb b/Liberty-Notes/Nvidia-Issac-Lab/March-22.ipynb new file mode 100644 index 0000000..0abf47f --- /dev/null +++ b/Liberty-Notes/Nvidia-Issac-Lab/March-22.ipynb @@ -0,0 +1,546 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "a0000001", + "metadata": {}, + "source": [ + "# March 22 — AIC Competition Deep Dive\n", + "## Task Parameters · Scoring System · ACT Hyperparams · Running Experiments\n", + "\n", + "> Session notes from thorough codebase analysis (2026-03-22)" + ] + }, + { + "cell_type": "markdown", + "id": "a0000002", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## 1. Submission Structure — The Most Important Thing\n", + "\n", + "**One Docker image. One policy class. Three trials.**\n", + "\n", + "The same `insert_cable()` method handles all three trials. You never submit different policies per trial. The engine sends a different `Task` message each time and your code reads it.\n", + "\n", + "```python\n", + "# The Task message fields your policy receives:\n", + "task.cable_type # e.g. \"sfp_sc\"\n", + "task.cable_name # e.g. \"cable_0\"\n", + "task.plug_type # \"sfp\" or \"sc\" <-- branch on this\n", + "task.plug_name # e.g. \"sfp_tip\" or \"sc_tip\"\n", + "task.port_type # \"sfp\" or \"sc\"\n", + "task.port_name # e.g. \"sfp_port_0\" or \"sc_port_base\"\n", + "task.target_module_name # e.g. 
\"nic_card_mount_0\" or \"sc_port_1\"\n", + "task.time_limit # seconds (180 in sample config)\n", + "```\n", + "\n", + "**Submission limits:**\n", + "- 1 submission per day (team-wide)\n", + "- ECR image tags are **immutable** — increment every push (`:v1`, `:v2`, ...)\n", + "- Portal login credentials go to team leaders by end of March 2026\n", + "- Always test locally first: `docker compose -f docker/docker-compose.yaml up`" + ] + }, + { + "cell_type": "markdown", + "id": "a0000003", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## 2. Task Parameters — What Gets Randomised\n", + "\n", + "Trials come from **fixed YAML configs** loaded by `aic_engine`. Not randomly generated at runtime. The sample config lives at `aic_engine/config/sample_config.yaml`. The actual eval config is different (not published) but follows the same structure.\n", + "\n", + "### Three trials in qualification:\n", + "\n", + "| Trial | Plug type | Target | What's randomised |\n", + "|-------|-----------|--------|---------|\n", + "| 1 | SFP module | sfp_port_0 on NIC card | Board pose (x,y,yaw), which nic_rail (0-4) the card is on, card translation on rail, card yaw |\n", + "| 2 | SFP module | sfp_port_0 on NIC card | Same as Trial 1 but different random seed |\n", + "| 3 | SC plug | sc_port_base on SC port | Board pose (x,y,yaw), SC rail translation |\n", + "\n", + "### What the YAML config controls (per trial):\n", + "\n", + "```yaml\n", + "trials:\n", + " trial_1:\n", + " scene:\n", + " task_board:\n", + " pose: {x, y, z, roll, pitch, yaw} # board position in world\n", + " nic_rail_0:\n", + " entity_present: True/False\n", + " entity_name: \"nic_card_0\" # which NIC model to spawn\n", + " entity_pose:\n", + " translation: 0.036 # along rail (m)\n", + " roll/pitch/yaw: 0.0 # orientation offset\n", + " nic_rail_1: {entity_present: False}\n", + " # ... 
nic_rail_2 through 4\n", + " sc_rail_0: {entity_present: True, entity_pose: {translation, yaw}}\n", + " lc_mount_rail_0/1: {entity_present, entity_pose}\n", + " sfp_mount_rail_0/1: {entity_present, entity_pose}\n", + " sc_mount_rail_0/1: {entity_present, entity_pose}\n", + " cables:\n", + " cable_0:\n", + " attach_cable_to_gripper: True # robot starts holding plug\n", + " cable_type: \"sfp_sc_cable\" # or sfp_sc_cable_reversed\n", + " pose:\n", + " gripper_offset: {x, y, z} # plug offset in gripper frame\n", + " roll/pitch/yaw: ... # cable orientation\n", + " tasks:\n", + " task_1:\n", + " cable_type, cable_name, plug_type, plug_name\n", + " port_type, port_name, target_module_name\n", + " time_limit: 180\n", + "```\n", + "\n", + "### Rail translation limits (from sample_config.yaml):\n", + "\n", + "```yaml\n", + "task_board_limits:\n", + " nic_rail: {min: -0.0215m, max: 0.0234m} # ~4.5cm range\n", + " sc_rail: {min: -0.06m, max: 0.055m} # ~11.5cm range\n", + " mount_rail: {min: -0.09425m, max: 0.09425m} # ~18.9cm range\n", + "```\n", + "\n", + "### Eval constraints (from qualification_phase.md):\n", + "- Roll and pitch of ALL components = **always 0.0** in official eval\n", + "- SC port yaw = **always 0.0** in official eval\n", + "- Robot starts with plug **already in hand**, close to target\n", + "- Grasp pose has small deviations (~2mm, ~0.04 rad) — policy must be robust\n", + "- Target port is **always within view** of robot cameras\n", + "\n", + "**Implication:** Your policy doesn't need to handle full workspace search — the port is always visible. The hard part is precise alignment and insertion." + ] + }, + { + "cell_type": "markdown", + "id": "a0000004", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## 3. Scoring System — Full Breakdown\n", + "\n", + "### Score breakdown (100 pts max per trial)\n", + "\n", + "**IMPORTANT: `scoring_tests.md` has OUTDATED numbers. 
Use `scoring.md` as truth.**\n", + "\n", + "| Tier | Category | Range | Condition |\n", + "|------|----------|-------|-----------|\n", + "| 1 | Model validity | 0 or 1 | Pass/fail: model loads, responds to InsertCable action |\n", + "| 2 | Trajectory smoothness | 0–6 | Savitzky-Golay jerk filter; 0 m/s³ → 6pts, ≥50 m/s³ → 0pts. **Only awarded if Tier3 > 0** |\n", + "| 2 | Task duration | 0–12 | ≤5s → 12pts, ≥60s → 0pts, linear between. **Only if Tier3 > 0** |\n", + "| 2 | Path efficiency | 0–6 | Path ≤ initial plug-port dist → 6pts, ≥1m extra → 0pts. **Only if Tier3 > 0** |\n", + "| 2 | Force penalty | 0 to −12 | Force > 20N for > 1s → −12pts |\n", + "| 2 | Off-limit contact | 0 to −24 | Any robot link touching enclosure/walls/task_board → −24pts |\n", + "| 3 | Correct insertion | 75 | Full insertion into correct port (contact sensors) |\n", + "| 3 | Wrong insertion | −12 | Inserted into wrong port |\n", + "| 3 | Partial insertion | 38–50 | Plug inside port bounding box (within 5mm x-y tolerance) |\n", + "| 3 | Proximity | 0–25 | Distance-based when not inserted (0 at max dist, 25 at port entrance) |\n", + "\n", + "**Max per trial = 100. Max total = 300.**\n", + "\n", + "### Key insight: Tier 2 is gated on Tier 3\n", + "Smoothness, duration, and efficiency are **only awarded** if the plug ends up close to or inside the port. Waving the arm smoothly scores nothing on Tier 2. 
You need proximity first.\n", + "\n", + "### Off-limit models (from WallToucher scoring example):\n", + "- `enclosure` — floor, corner posts, ceiling\n", + "- `enclosure walls` — transparent acrylic panels\n", + "- `task_board` — the board + everything mounted on it (NIC cards, SC ports, etc.)\n", + "\n", + "Only robot links trigger the penalty — the cable itself does not.\n", + "\n", + "### Scoring output format (`scoring.yaml`):\n", + "```yaml\n", + "trial_1:\n", + " tier1:\n", + " score: 1\n", + " message: \"Model validation succeeded.\"\n", + " tier2:\n", + " score: 20.8\n", + " categories:\n", + " smoothness: {score: 5.2, message: \"...\"}\n", + " duration: {score: 10.6, message: \"...\"}\n", + " efficiency: {score: 5.0, message: \"...\"}\n", + " force: {score: 0.0}\n", + " contacts: {score: 0.0}\n", + " tier3:\n", + " score: 9.9\n", + " message: \"Proximity score...\"\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "a0000005", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## 4. How the Scorer Works Internally\n", + "\n", + "The scorer is a **C++ ROS 2 node** inside `aic_scoring/`. It's not a standalone script — it runs as part of `aic_engine` during trials.\n", + "\n", + "### What it does:\n", + "1. **During trial:** `ScoringTier2::StartRecording()` subscribes to all topics and writes a rosbag\n", + "2. **After trial:** `ScoringTier2::ComputeScore()` reads the bag offline and calculates metrics\n", + "3. 
Writes `scoring.yaml` to `$AIC_RESULTS_DIR`\n", + "\n", + "### Topics recorded for scoring:\n", + "```\n", + "/joint_states # robot joint positions\n", + "/tf # gripper pose\n", + "/tf_static # static transforms\n", + "/scoring/tf # cable plug+port poses (ground truth internal)\n", + "/aic/gazebo/contacts/off_limit # collision events\n", + "/fts_broadcaster/wrench # force-torque sensor\n", + "/aic_controller/joint_commands # your joint commands\n", + "/aic_controller/pose_commands # your Cartesian commands\n", + "/scoring/insertion_event # plug-port contact event\n", + "/aic_controller/controller_state # TCP pose, velocities\n", + "```\n", + "\n", + "### Can we use the scorer locally?\n", + "\n", + "**Yes — it runs automatically inside the Docker eval container.** Every `docker run ... aic_eval:latest` run with `start_aic_engine:=true` runs the scorer and writes `scoring.yaml` to `$AIC_RESULTS_DIR`.\n", + "\n", + "You can also run from source (if you build `~/ws_aic` natively) using the three-terminal pattern from `scoring_tests.md`. Docker is simpler and already set up.\n", + "\n", + "**You CANNOT run the scorer standalone** on an arbitrary rosbag — it's integrated into the engine lifecycle. The scorer starts/stops recording based on engine state machine transitions." + ] + }, + { + "cell_type": "markdown", + "id": "a0000006", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## 5. 
How to Run All Experiments\n", + "\n", + "### Our setup (Docker-based, already working)\n", + "\n", + "```bash\n", + "# Terminal 1: Eval environment\n", + "POLICY_NAME=CheatCode GT=true # or: WaveArm/false, GentleGiant/false, etc.\n", + "RUN_DIR=~/projects/Project-Automaton/aic_results/${POLICY_NAME}_$(date +%Y%m%d_%H%M%S)\n", + "mkdir -p \"$RUN_DIR\"\n", + "\n", + "docker run -it --rm \\\n", + " --name aic_eval --network host --gpus all \\\n", + " -e DISPLAY=:0 -e WAYLAND_DISPLAY=wayland-0 \\\n", + " -e XDG_RUNTIME_DIR=/mnt/wslg/runtime-dir \\\n", + " -e PULSE_SERVER=/mnt/wslg/PulseServer \\\n", + " -e GALLIUM_DRIVER=d3d12 \\\n", + " -e LD_LIBRARY_PATH=/usr/lib/wsl/lib \\\n", + " -e AIC_RESULTS_DIR=/aic_results \\\n", + " -v /tmp/.X11-unix:/tmp/.X11-unix -v /mnt/wslg:/mnt/wslg \\\n", + " -v /usr/lib/wsl:/usr/lib/wsl -v \"$RUN_DIR\":/aic_results \\\n", + " --device /dev/dxg \\\n", + " ghcr.io/intrinsic-dev/aic/aic_eval:latest \\\n", + " ground_truth:=$GT start_aic_engine:=true\n", + "\n", + "# Terminal 2: Policy (wait for \"Retrying...\" first)\n", + "cd ~/projects/Project-Automaton/References/aic\n", + "pixi run ros2 run aic_model aic_model \\\n", + " --ros-args -p use_sim_time:=true \\\n", + " -p policy:=aic_example_policies.ros.$POLICY_NAME\n", + "```\n", + "\n", + "### All example policies and what they test:\n", + "\n", + "| Policy | GT? 
| Expected Tier2 | Expected Tier3 | Purpose |\n", + "|--------|-----|----------------|----------------|---------|\n", + "| WaveArm | No | smoothness only if near port | proximity | API skeleton baseline |\n", + "| CheatCode | **Yes** | high smoothness + duration + efficiency | 75 (full insert) | Gold standard to beat |\n", + "| GentleGiant | No | 0 (plug not near port) | 0 | Low jerk reference |\n", + "| SpeedDemon | No | −12 force penalty | 0 | High jerk reference |\n", + "| WallToucher | No | −24 contact penalty | 0 | Off-limit contact demo |\n", + "| WallPresser | No | −12 force + −24 contact | 0 | Excessive force demo |\n", + "| RunACT | No | varies | varies | Neural net baseline |\n", + "\n", + "### Alternative: Build from source (no Docker)\n", + "From `scoring_tests.md` — if you build `~/ws_aic` natively, use THREE terminals:\n", + "```bash\n", + "# Terminal 0\n", + "source ~/ws_aic/install/setup.bash\n", + "export RMW_IMPLEMENTATION=rmw_zenoh_cpp\n", + "ros2 run rmw_zenoh_cpp rmw_zenohd\n", + "\n", + "# Terminal 1\n", + "source ~/ws_aic/install/setup.bash\n", + "ros2 run aic_model aic_model --ros-args -p use_sim_time:=true -p policy:=...\n", + "\n", + "# Terminal 2\n", + "AIC_RESULTS_DIR=~/aic_results/run_name \\\n", + "ros2 launch aic_bringup aic_gz_bringup.launch.py \\\n", + " ground_truth:=true start_aic_engine:=true\n", + "```\n", + "Our Docker approach already handles Terminals 0+2 inside the container." + ] + }, + { + "cell_type": "markdown", + "id": "a0000007", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## 6. 
Teleoperation & Data Collection Pipeline\n", + "\n", + "Yes — fully supported via `aic_utils/lerobot_robot_aic/`.\n", + "\n", + "### Three teleop modes:\n", + "```bash\n", + "cd ~/projects/Project-Automaton/References/aic\n", + "\n", + "# Keyboard cartesian (recommended)\n", + "pixi run lerobot-teleoperate \\\n", + " --robot.type=aic_controller --robot.id=aic \\\n", + " --teleop.type=aic_keyboard_ee --teleop.id=aic \\\n", + " --robot.teleop_target_mode=cartesian \\\n", + " --robot.teleop_frame_id=base_link \\\n", + " --display_data=true\n", + "\n", + "# SpaceMouse cartesian\n", + "# --teleop.type=aic_spacemouse\n", + "\n", + "# Joint-space keyboard\n", + "# --teleop.type=aic_keyboard_joint --robot.teleop_target_mode=joint\n", + "```\n", + "\n", + "### Keyboard map (cartesian):\n", + "| Key | Motion | Key | Motion |\n", + "|-----|--------|-----|--------|\n", + "| w/s | ±linear y | r/f | ±linear z |\n", + "| a/d | ±linear x | q/e | ±angular z |\n", + "| Shift+w/s | ±angular x | Shift+a/d | ±angular y |\n", + "| t | toggle slow/fast | | |\n", + "\n", + "### Record demos:\n", + "```bash\n", + "pixi run lerobot-record \\\n", + " --robot.type=aic_controller --robot.id=aic \\\n", + " --teleop.type=aic_keyboard_ee --teleop.id=aic \\\n", + " --robot.teleop_target_mode=cartesian --robot.teleop_frame_id=base_link \\\n", + " --dataset.repo_id=${HF_USER}/aic_demo_v0 \\\n", + " --dataset.single_task=aic_insert \\\n", + " --dataset.push_to_hub=false --dataset.private=true \\\n", + " --display_data=true\n", + "# Right Arrow = next episode, Left Arrow = re-record, ESC = stop\n", + "```\n", + "\n", + "### Train:\n", + "```bash\n", + "pixi run lerobot-train \\\n", + " --dataset.repo_id=${HF_USER}/aic_demo_v0 \\\n", + " --policy.type=act \\\n", + " --output_dir=outputs/train/act_v0 \\\n", + " --policy.device=cuda \\\n", + " --policy.repo_id=${HF_USER}/my_act_policy\n", + "```\n", + "\n", + "### Three data sources to build:\n", + "| Source | How | Strength |\n", + 
"|--------|-----|----------|\n", + "| CheatCode + `ground_truth:=true` | Auto-runs all trials, saves rosbags | Fast GT demos at scale, perfect insertions |\n", + "| `lerobot-record` keyboard teleop | Manual in Gazebo | High-quality human demos, realistic approach strategies |\n", + "| Isaac Lab `record_demos.py` | Teleop in Isaac | Sim diversity, domain randomisation |\n", + "\n", + "**Pro tip:** Use CheatCode to collect hundreds of GT demo trajectories first. These are essentially perfect demonstrations you can use as imitation learning targets." + ] + }, + { + "cell_type": "markdown", + "id": "a0000008", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## 7. ACT (RunACT) — Hyperparameters & What to Change\n", + "\n", + "### Current RunACT.py — what it does\n", + "- Downloads `grkw/aic_act_policy` from HuggingFace\n", + "- Uses 3 wrist camera images + 26-dim robot state as input\n", + "- Outputs 7-dim velocity action (6 Cartesian DOF + gripper)\n", + "- Runs at ~4Hz for 30 seconds\n", + "- **Ignores `task.plug_type` entirely** — same behaviour for all trials\n", + "\n", + "### Input state vector (26 dims) — order must match training:\n", + "```\n", + "TCP position (3): tcp_pose.position.x/y/z\n", + "TCP orientation (4): tcp_pose.orientation.x/y/z/w (quaternion)\n", + "TCP linear vel (3): tcp_velocity.linear.x/y/z\n", + "TCP angular vel (3): tcp_velocity.angular.x/y/z\n", + "TCP error (6): controller_state.tcp_error[0:6]\n", + "Joint positions (7): joint_states.position[0:7]\n", + "```\n", + "\n", + "### Image preprocessing:\n", + "```python\n", + "image_scaling = 0.25 # resize to 25% of raw resolution\n", + "# Then per-camera normalise: (pixel/255 - mean) / std\n", + "# Stats loaded from policy_preprocessor_step_3_normalizer_processor.safetensors\n", + "```\n", + "\n", + "### Parameters to customise if retraining:\n", + "\n", + "| Parameter | Location in RunACT.py | Current value | Notes |\n", + 
"|-----------|----------------------|---------------|-------|\n", + "| `repo_id` | line 62 | `\"grkw/aic_act_policy\"` | Change to your HF repo |\n", + "| `image_scaling` | line 131 | `0.25` | **Must match training** |\n", + "| State vector order | `prepare_observations()` lines 202–227 | 26-dim as above | Must exactly match training data |\n", + "| Inference duration | line 251 | `30.0` seconds | Tune per your policy |\n", + "| Control rate sleep | line 292 | `0.25s` (~4Hz) | Interacts with ACT chunk size |\n", + "| Normalization stats | `.safetensors` file | Baked into checkpoint | Auto-loaded from checkpoint |\n", + "\n", + "### ACT training hyperparameters (LeRobot defaults for ACT):\n", + "```bash\n", + "pixi run lerobot-train \\\n", + " --policy.type=act \\\n", + " --policy.chunk_size=100 # action chunk length (key ACT hyperparam)\n", + " --policy.n_action_steps=100 # how many steps to execute per inference\n", + " --policy.hidden_dim=512 # transformer hidden dim\n", + " --policy.dim_feedforward=3200 # FFN width\n", + " --policy.n_heads=8 # attention heads\n", + " --policy.n_encoder_layers=4 # encoder depth\n", + " --policy.n_decoder_layers=7 # decoder depth\n", + " --policy.dropout=0.1 # regularisation\n", + " --policy.kl_weight=10.0 # CVAE KL loss weight\n", + " --training.batch_size=8 # keep small if memory limited\n", + " --training.num_workers=4 # dataloader workers\n", + " --training.lr=1e-5 # learning rate\n", + " --training.num_epochs=100 # train longer for better results\n", + "```\n", + "\n", + "### Critical gap to fix in RunACT:\n", + "Current RunACT **ignores the Task message**. For a real submission either:\n", + "1. Branch on `task.plug_type` and load different checkpoints (SFP vs SC)\n", + "2. Train a **single unified policy** that generalises across both from vision\n", + "\n", + "Option 2 is harder but only needs one checkpoint. 
Option 1 is easier but requires two trained models.\n", + "\n", + "### Cloud evaluator memory constraint:\n", + "Eval runs on **NVIDIA L4 (24 GB VRAM)**. Your 5090 has 32 GB. Check before submitting:\n", + "```python\n", + "total = sum(p.numel() for p in self.policy.parameters())\n", + "print(f\"Model params: {total/1e6:.1f}M\")\n", + "# ACT at default size ~80-120M params — should be fine on L4\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "a0000009", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## 8. Policy Design Notes — Per Policy\n", + "\n", + "| Policy | GT needed? | Eval-legal? | Tier 2 expected | Tier 3 expected | Key gotcha |\n", + "|--------|-----------|-------------|-----------------|-----------------|------------|\n", + "| WaveArm | No | Yes | ~21 (proximity luck) | ~10-19 (proximity) | Doesn't attempt insertion |\n", + "| CheatCode | **Yes** | **No** | High (20+) | 75 (full insert) | Uses `/scoring/tf` — blocked by Zenoh ACL in eval |\n", + "| GentleGiant | No | Yes | 0 (plug not near port) | 0 | Doesn't attempt insertion |\n", + "| SpeedDemon | No | Yes | −12 force penalty | 0 | Oscillates violently |\n", + "| WallToucher | No | Yes | −24 contact penalty | 0 | Arm extends into enclosure |\n", + "| WallPresser | No | Yes | −12 force + −24 contact | 0 | Presses arm into wall |\n", + "| RunACT | No | Yes | varies | varies | Ignores task.plug_type, 30s hardcoded |\n", + "\n", + "### CheatCode strategy (the gold standard to replicate without GT):\n", + "1. **Phase 1 — Approach (5s):** 100 steps × 50ms, interpolate to 20cm above port + orient plug via quaternion slerp\n", + "2. **Phase 2 — Descent:** Lower 0.5mm per step, PI controller corrects XY drift\n", + "3. **Stop** when z_offset < −15mm (past port surface)\n", + "4. 
**Hold 5s** to stabilise\n", + "\n", + "To replicate without GT: replace `tf_buffer.lookup_transform(\"base_link\", port_frame)` with **camera-based port detection** outputting the same (x, y, z) target.\n", + "\n", + "### Control mode trade-offs:\n", + "\n", + "| Mode | Stiffness | Damping | Character |\n", + "|------|-----------|---------|----------|\n", + "| GentleGiant | 50 | 40 | Slow, smooth, low jerk — good for Tier 2 smoothness |\n", + "| Default | 90/50 | 50/20 | Balanced |\n", + "| SpeedDemon | 500 | 5 | Fast, oscillates, triggers force penalty |\n", + "| Insertion phase | 30/20 | 20/10 | Compliant — let contact forces guide the plug |" + ] + }, + { + "cell_type": "markdown", + "id": "a0000010", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## 9. Observation API — What Your Policy Actually Receives\n", + "\n", + "```python\n", + "obs = get_observation() # up to 20Hz, time-synchronised\n", + "\n", + "# Three wrist cameras (Basler)\n", + "obs.left_image # sensor_msgs/Image\n", + "obs.center_image\n", + "obs.right_image\n", + "obs.left_camera_info # sensor_msgs/CameraInfo (intrinsics, distortion)\n", + "obs.center_camera_info\n", + "obs.right_camera_info\n", + "\n", + "# Force/Torque (Axia80 M20, 6-axis, tared at startup ~= 0N baseline)\n", + "obs.wrist_wrench.wrench.force.x/y/z # Newtons\n", + "obs.wrist_wrench.wrench.torque.x/y/z # Nm\n", + "\n", + "# Joint state (6 arm + gripper = 7)\n", + "obs.joint_states.name # ['shoulder_pan_joint', ...]\n", + "obs.joint_states.position # radians\n", + "obs.joint_states.velocity # rad/s\n", + "\n", + "# Controller state\n", + "obs.controller_state.tcp_pose # current gripper TCP Pose\n", + "obs.controller_state.tcp_velocity # Twist (linear + angular)\n", + "obs.controller_state.tcp_error # 6-element position/orientation error\n", + "```\n", + "\n", + "**What you do NOT get in eval:**\n", + "- Ground truth TF of port/plug positions (blocked by Zenoh ACL)\n", + "- Depth images (no official depth stream)\n", + "- 
Access to `/gazebo`, `/gz_server`, `/scoring` topics" + ] + }, + { + "cell_type": "markdown", + "id": "a0000011", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## 10. Next Steps Priority Order\n", + "\n", + "1. **Run CheatCode** with `ground_truth:=true` — get the 75pt baseline score + collect demo data\n", + "2. **Run remaining example policies** (GentleGiant, SpeedDemon, WallToucher, WallPresser) to understand scoring edge cases\n", + "3. **Build `collect_scores.py`** — parse all `scoring.yaml` files into one CSV for comparison\n", + "4. **Data collection sprint** — use CheatCode + lerobot-record to build a demo dataset\n", + "5. **Retrain ACT** on your own data, fix the Task message branching (SFP vs SC)\n", + "6. **Replace CheatCode's GT lookup** with camera-based perception — first eval-legal insertion attempt\n", + "\n", + "**Key milestone:** First eval-legal policy that earns Tier 3 > 0 in Gazebo (gets near/into the port without ground truth)." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.10.0" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/aws/install_dcv.sh b/aws/install_dcv.sh new file mode 100755 index 0000000..ffeed63 --- /dev/null +++ b/aws/install_dcv.sh @@ -0,0 +1,26 @@ +#!/usr/bin/env bash +# install_dcv.sh — Install NICE DCV remote desktop on Ubuntu 24.04 +set -eo pipefail + +echo "=== Installing Ubuntu Desktop ===" +sudo apt-get update -qq +sudo apt-get install -y ubuntu-desktop + +echo "=== Installing NICE DCV ===" +cd /tmp +ARCH=$(dpkg --print-architecture) +# Use ubuntu2404 for Noble; fall back to ubuntu2204 +DCV_URL="https://d1uj6qtbmh3dt5.cloudfront.net/nice-dcv-ubuntu2404-${ARCH}.tgz" +wget -q "$DCV_URL" -O nice-dcv.tgz 2>/dev/null \ + || wget -q "https://d1uj6qtbmh3dt5.cloudfront.net/nice-dcv-ubuntu2204-${ARCH}.tgz" -O nice-dcv.tgz +tar -xzf nice-dcv.tgz +cd nice-dcv-*/ +sudo 
apt-get install -y ./nice-dcv-server_*.deb ./nice-dcv-web-viewer_*.deb + +echo "=== Starting DCV ===" +sudo systemctl enable --now dcvserver + +echo "=== Set the ubuntu user password (required for DCV login) ===" +sudo passwd ubuntu + +echo "DCV installed. Connect via https://<instance-ip>:8443" diff --git a/aws/install_devtools.sh b/aws/install_devtools.sh new file mode 100755 index 0000000..63067a5 --- /dev/null +++ b/aws/install_devtools.sh @@ -0,0 +1,42 @@ +#!/usr/bin/env bash +# install_devtools.sh — Install dev tools (Node via NVM, Claude Code) + +set -eo pipefail + +echo "=== Installing prerequisites ===" +sudo apt-get update -qq +sudo apt-get install -y curl ca-certificates build-essential + +# Install NVM +if [ ! -d "$HOME/.nvm" ]; then + echo "=== Installing NVM ===" + curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash +fi + +export NVM_DIR="$HOME/.nvm" +[ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh" + +# Install Node +echo "=== Installing Node.js (LTS) ===" +nvm install --lts +nvm use --lts +nvm alias default 'lts/*' + +# Install Claude +echo "=== Installing Claude Code ===" +npm install -g @anthropic-ai/claude-code + +# Persist NVM +if ! grep -q 'NVM_DIR' "$HOME/.bashrc"; then + echo "=== Updating ~/.bashrc ===" + cat << 'EOF' >> "$HOME/.bashrc" + +# NVM setup +export NVM_DIR="$HOME/.nvm" +[ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh" +EOF +fi + +echo "=== Done ===" +echo "Run: source ~/.bashrc" +echo "Then: claude" diff --git a/aws/install_sim.sh b/aws/install_sim.sh new file mode 100755 index 0000000..39973b5 --- /dev/null +++ b/aws/install_sim.sh @@ -0,0 +1,205 @@ +#!/usr/bin/env bash +# install_sim.sh — Install all 3 AIC simulation lanes +# Lane 1: Gazebo (distrobox + pixi) +# Lane 2: MuJoCo (ROS 2 colcon + SDF→MJCF conversion) +# Lane 3: Isaac Lab (Docker) +# +# Usage: bash install_sim.sh [--skip-isaac-build] +set -eo pipefail + +REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." 
&& pwd)" +AIC="$REPO_ROOT/References/aic" +WS="$HOME/ws_aic" +ISAACLAB="$HOME/IsaacLab" +MUJOCO_WORLD="$HOME/aic_mujoco_world" +MJCF="$AIC/aic_utils/aic_mujoco/mjcf" +DISPLAY="${DISPLAY:-:1}"; export DISPLAY +SHELL_RC="$HOME/.bashrc" + +SKIP_ISAAC_BUILD=false +[[ "${1:-}" == "--skip-isaac-build" ]] && SKIP_ISAAC_BUILD=true + +die() { echo "[FAIL] $*" >&2; exit 1; } + +[[ -d "$AIC" ]] || die "AIC submodule not found at $AIC. Run: git submodule update --init --recursive" + +# ── System packages ────────────────────────────────────────────────────────── +echo "=== System packages ===" +sudo apt-get update -qq +# Remove Ubuntu vcstool if present (conflicts with ROS python3-vcstool) +dpkg -l vcstool 2>/dev/null | grep -q "^ii" && sudo dpkg --remove --force-remove-reinstreq vcstool 2>/dev/null && sudo apt --fix-broken install -y 2>/dev/null || true +sudo apt-get install -y \ + curl wget git ca-certificates gnupg lsb-release \ + build-essential python3 python3-pip g++-14 gcc-14 \ + distrobox x11-xserver-utils tmux + +# ── NVIDIA Container Toolkit ───────────────────────────────────────────────── +echo "=== NVIDIA Container Toolkit ===" +if ! dpkg -l nvidia-container-toolkit &>/dev/null; then + curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \ + | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg 2>/dev/null || true + curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \ + | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \ + | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list >/dev/null + sudo apt-get update -qq && sudo apt-get install -y nvidia-container-toolkit + sudo nvidia-ctk runtime configure --runtime=docker + sudo systemctl restart docker +fi + +# ── Pixi ───────────────────────────────────────────────────────────────────── +echo "=== Pixi ===" +if ! 
command -v pixi &>/dev/null; then + curl -fsSL https://pixi.sh/install.sh | sh +fi +export PATH="$HOME/.pixi/bin:$PATH" +grep -q '/.pixi/bin' "$SHELL_RC" 2>/dev/null \ + || echo 'export PATH="$HOME/.pixi/bin:$PATH"' >> "$SHELL_RC" + +# ── Pixi workspace ────────────────────────────────────────────────────────── +echo "=== Pixi workspace ===" +# bwrap needs setuid for rattler-build's sandbox to create user namespaces +[[ -u /usr/bin/bwrap ]] || sudo chmod u+s /usr/bin/bwrap +mkdir -p "$WS/src" +[[ -e "$WS/src/aic" ]] || ln -s "$AIC" "$WS/src/aic" +cd "$WS/src/aic" && pixi install + +# ── Lane 1: Gazebo distrobox ──────────────────────────────────────────────── +echo "=== Lane 1: Gazebo (distrobox) ===" +export DBX_CONTAINER_MANAGER=docker +docker pull ghcr.io/intrinsic-dev/aic/aic_eval:latest +distrobox list 2>/dev/null | grep -q "aic_eval" \ + || distrobox create -r --nvidia -i ghcr.io/intrinsic-dev/aic/aic_eval:latest aic_eval + +# ── Lane 2: ROS 2 Kilted + MuJoCo ────────────────────────────────────────── +echo "=== Lane 2: ROS 2 Kilted ===" +if ! dpkg -l ros-kilted-ros-base 2>/dev/null | grep -q "^ii"; then + sudo curl -sSL https://raw.githubusercontent.com/ros/rosdistro/master/ros.key \ + -o /usr/share/keyrings/ros-archive-keyring.gpg + echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/ros-archive-keyring.gpg] \ + http://packages.ros.org/ros2/ubuntu $(. /etc/os-release && echo "$UBUNTU_CODENAME") main" \ + | sudo tee /etc/apt/sources.list.d/ros2.list >/dev/null + sudo apt-get update -qq + sudo apt-get install -y ros-kilted-desktop python3-rosdep python3-colcon-common-extensions + sudo rosdep init 2>/dev/null || true + rosdep update --rosdistro kilted 2>/dev/null || true +fi + +echo "=== Lane 2: SDFormat bindings ===" +if [[ ! 
-f /usr/share/keyrings/pkgs-osrf-archive-keyring.gpg ]]; then + sudo wget -q https://packages.osrfoundation.org/gazebo.gpg \ + -O /usr/share/keyrings/pkgs-osrf-archive-keyring.gpg + echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/pkgs-osrf-archive-keyring.gpg] \ + http://packages.osrfoundation.org/gazebo/ubuntu-stable $(lsb_release -cs) main" \ + | sudo tee /etc/apt/sources.list.d/gazebo-stable.list >/dev/null + sudo apt-get update -qq +fi +sudo apt-get install -y python3-sdformat16 python3-gz-math9 libsdformat16 + +echo "=== Lane 2: MuJoCo colcon build ===" +cd "$WS" +[[ -d "$WS/src/mujoco" ]] || (cd "$WS/src" && vcs import < aic/aic_utils/aic_mujoco/mujoco.repos) +source /opt/ros/kilted/setup.bash +rosdep install --from-paths src --ignore-src --rosdistro kilted -yr \ + --skip-keys "gz-cmake3 DART libogre-dev libogre-next-2.3-dev" 2>/dev/null || true +export CC=gcc-14 CXX=g++-14 +GZ_BUILD_FROM_SOURCE=1 colcon build \ + --cmake-args -DCMAKE_BUILD_TYPE=Release \ + --merge-install --symlink-install \ + --packages-ignore lerobot_robot_aic 2>&1 | tail -5 +source "$WS/install/setup.bash" + +echo "=== Lane 2: SDF→MJCF conversion ===" +mkdir -p "$MUJOCO_WORLD" +xhost +local:docker 2>/dev/null || true + +# Start Gazebo to export expanded SDF +docker rm -f aic_sdf_export 2>/dev/null || true +docker run -d --rm --name aic_sdf_export --network host --gpus all \ + -e DISPLAY="$DISPLAY" -v /tmp/.X11-unix:/tmp/.X11-unix \ + ghcr.io/intrinsic-dev/aic/aic_eval:latest \ + spawn_task_board:=true spawn_cable:=true cable_type:=sfp_sc_cable \ + attach_cable_to_gripper:=true ground_truth:=true start_aic_engine:=false +echo " Waiting 25s for Gazebo ..." 
&& sleep 25 + +docker exec aic_sdf_export bash -c ' + source /ws_aic/install/setup.bash + gz service -s /world/aic_world/generate_world_sdf \ + --reqtype gz.msgs.SdfGeneratorConfig --reptype gz.msgs.StringMsg --timeout 10000 \ + -r "global_entity_gen_config { expand_include_tags { data: true } }" \ + | sed "s/data: \"//" | sed "s/\"$//" | sed "s/\\\\n/\n/g" | sed "s/\\\\\x27/'"'"'/g" \ + > /tmp/aic_expanded.sdf +' +docker cp aic_sdf_export:/tmp/aic_expanded.sdf /tmp/aic_expanded.sdf +docker rm -f aic_sdf_export 2>/dev/null || true +[[ -s /tmp/aic_expanded.sdf ]] || die "SDF export failed" + +# Fix SDF paths +python3 -c " +import re +with open('/tmp/aic_expanded.sdf') as f: c = f.read() +c = re.sub(r'.*?file://.*?', '', c, flags=re.DOTALL) +c = c.replace('file:///model://', 'model://') +for name in ['lc_plug_visual.glb','sc_plug_visual.glb','sfp_module_visual.glb']: + label = name.replace('_visual.glb','').replace('_',' ').title().replace('Sfp','SFP').replace('Lc','LC').replace('Sc','SC') + c = c.replace(f'file:///{name}', f'model://{label}/{name}') +with open('/tmp/aic_expanded.sdf','w') as f: f.write(c) +" + +# Extract UR5e meshes from Docker image +if [[ ! 
-d "$MUJOCO_WORLD/ur_description" ]]; then + docker create --name aic_mesh_copy ghcr.io/intrinsic-dev/aic/aic_eval:latest + docker cp aic_mesh_copy:/ws_aic/install/share/ur_description "$MUJOCO_WORLD/ur_description" + docker rm aic_mesh_copy +fi + +# Convert SDF→MJCF +source "$WS/install/setup.bash" +export GZ_SIM_RESOURCE_PATH="$AIC/aic_assets/models:$MUJOCO_WORLD" +sdf2mjcf /tmp/aic_expanded.sdf "$MUJOCO_WORLD/aic_world.xml" 2>&1 | tail -3 + +# Copy to mjcf dir and run add_cable_plugin.py +cp "$MUJOCO_WORLD"/*.xml "$MUJOCO_WORLD"/*.obj "$MUJOCO_WORLD"/*.png "$MUJOCO_WORLD"/*.stl "$MJCF/" 2>/dev/null || true +cd "$AIC" +python3 aic_utils/aic_mujoco/scripts/add_cable_plugin.py \ + --input "$MJCF/aic_world.xml" --output "$MJCF/aic_world.xml" \ + --robot_output "$MJCF/aic_robot.xml" --scene_output "$MJCF/scene.xml" 2>&1 | tail -3 + +# ── Lane 3: Isaac Lab ─────────────────────────────────────────────────────── +echo "=== Lane 3: Isaac Lab ===" +if [[ ! -d "$ISAACLAB" ]]; then + git clone https://github.com/isaac-sim/IsaacLab.git "$ISAACLAB" + cd "$ISAACLAB" && git checkout v2.3.2 2>/dev/null || true +fi +[[ -d "$ISAACLAB/aic" ]] || git clone https://github.com/intrinsic-dev/aic.git "$ISAACLAB/aic" + +ASSETS="$ISAACLAB/aic/aic_utils/aic_isaac/aic_isaaclab/source/aic_task/aic_task/tasks/manager_based/aic_task/Intrinsic_assets" +[[ -d "$ASSETS" ]] || echo "[WARN] Intrinsic_assets missing — download from NVIDIA developer portal and extract to: $ASSETS" + +if [[ "$SKIP_ISAAC_BUILD" == false ]]; then + cd "$ISAACLAB" + echo " Building IsaacLab Docker image (~20-30 min) ..." + echo "y" | python3 docker/container.py build base +fi + +# ── Shell aliases ──────────────────────────────────────────────────────────── +echo "=== Shell aliases ===" +if ! 
grep -q 'aic-policy()' "$SHELL_RC" 2>/dev/null; then +cat >> "$SHELL_RC" << 'ALIASES' + +# ── AIC sim aliases ────────────────────────────────────────────────────────── +export DBX_CONTAINER_MANAGER=docker +alias aic-eval-gt='distrobox enter -r aic_eval -- /entrypoint.sh ground_truth:=true start_aic_engine:=true' +alias aic-eval-no-gt='distrobox enter -r aic_eval -- /entrypoint.sh ground_truth:=false start_aic_engine:=true' +aic-policy() { + local policy="${1:-aic_example_policies.ros.WaveArm}" + [[ "$policy" != *.* ]] && policy="aic_example_policies.ros.$policy" + cd "$HOME/ws_aic/src/aic" + pixi run ros2 run aic_model aic_model --ros-args -p use_sim_time:=true -p policy:="$policy" +} +alias aic-zenoh='source "$HOME/ws_aic/install/setup.bash" && export RMW_IMPLEMENTATION=rmw_zenoh_cpp && export ZENOH_CONFIG_OVERRIDE="transport/shared_memory/enabled=true" && ros2 run rmw_zenoh_cpp rmw_zenohd' +alias aic-mujoco='source "$HOME/ws_aic/install/setup.bash" && export RMW_IMPLEMENTATION=rmw_zenoh_cpp && export ZENOH_CONFIG_OVERRIDE="transport/shared_memory/enabled=true" && ros2 launch aic_mujoco aic_mujoco_bringup.launch.py' +alias aic-isaac='cd "$HOME/IsaacLab" && echo y | python3 docker/container.py start base && python3 docker/container.py enter base' +ALIASES +fi + +echo "=== install_sim.sh complete ===" diff --git a/aws/setup.sh b/aws/setup.sh new file mode 100755 index 0000000..9656ade --- /dev/null +++ b/aws/setup.sh @@ -0,0 +1,48 @@ +#!/usr/bin/env bash +# setup.sh — Master script: installs everything then runs headless tests +# +# Usage: +# bash setup.sh # full install + headless tests +# bash setup.sh --skip-install # headless tests only +# bash setup.sh --skip-isaac-build # skip 30-min Isaac Lab Docker build +# bash setup.sh --test-gui # run GUI tests after headless +set -eo pipefail + +DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + +cd "$HOME/Project-Automaton" +echo "=== Submodules ===" +git submodule update --init --recursive +SKIP_INSTALL=false 
+SKIP_ISAAC_BUILD=false +TEST_GUI=false + +for arg in "$@"; do + case $arg in + --skip-install) SKIP_INSTALL=true ;; + --skip-isaac-build) SKIP_ISAAC_BUILD=true ;; + --test-gui) TEST_GUI=true ;; + esac +done + +echo "╔══════════════════════════════════════════╗" +echo "║ AIC Setup — Project Automaton ║" +echo "╚══════════════════════════════════════════╝" +echo "" + +if [[ "$SKIP_INSTALL" == false ]]; then + ISAAC_FLAG="" + [[ "$SKIP_ISAAC_BUILD" == true ]] && ISAAC_FLAG="--skip-isaac-build" + bash "$DIR/install_sim.sh" $ISAAC_FLAG +fi + +echo "" +bash "$DIR/test_headless.sh" + +if [[ "$TEST_GUI" == true ]]; then + echo "" + bash "$DIR/test_gui.sh" +fi + +echo "" +echo "Done. Source ~/.bashrc for aliases: aic-eval-gt, aic-policy, aic-mujoco, aic-isaac" diff --git a/aws/test_gui.sh b/aws/test_gui.sh new file mode 100755 index 0000000..881033c --- /dev/null +++ b/aws/test_gui.sh @@ -0,0 +1,107 @@ +#!/usr/bin/env bash +# test_gui.sh — Visual tests on DCV desktop (requires DISPLAY) +# Lane 1: Gazebo + CheatCode (Gazebo GUI) +# Lane 2: MuJoCo viewer + CheatCode via ros2_control +# Lane 3: Isaac Lab random_agent (Isaac Sim GUI) +# +# Usage: bash test_gui.sh [lane1|lane2|lane3] (default: all) +set -eo pipefail + +REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." 
&& pwd)" +AIC="$REPO_ROOT/References/aic" +WS="$HOME/ws_aic" +ISAACLAB="$HOME/IsaacLab" +DISPLAY="${DISPLAY:-:1}"; export DISPLAY +export DBX_CONTAINER_MANAGER=docker +export PATH="$HOME/.pixi/bin:$PATH" + +LANE="${1:-all}" +xhost +local:docker 2>/dev/null || true + +run_lane1() { + echo "=== Lane 1: Gazebo + CheatCode (GUI) ===" + tmux kill-session -t aic_gz_gui 2>/dev/null || true + tmux new-session -d -s aic_gz_gui -x 200 -y 40 + + tmux send-keys -t aic_gz_gui:0 \ + "export DBX_CONTAINER_MANAGER=docker DISPLAY=$DISPLAY && \ + distrobox enter -r aic_eval -- /entrypoint.sh ground_truth:=true start_aic_engine:=true" Enter + sleep 40 + + tmux split-window -v -t aic_gz_gui:0 + tmux send-keys -t aic_gz_gui:0.1 \ + "export PATH=$HOME/.pixi/bin:\$PATH && cd $WS/src/aic && \ + pixi run ros2 run aic_model aic_model --ros-args \ + -p use_sim_time:=true -p policy:=aic_example_policies.ros.CheatCode" Enter + + echo " Gazebo running in tmux session 'aic_gz_gui'" + echo " Attach: tmux attach -t aic_gz_gui" + echo " Kill: tmux kill-session -t aic_gz_gui" +} + +run_lane2() { + echo "=== Lane 2: MuJoCo + CheatCode (GUI) ===" + tmux kill-session -t aic_mj_gui 2>/dev/null || true + tmux new-session -d -s aic_mj_gui -x 200 -y 40 + + # Pane 0: Zenoh router + tmux send-keys -t aic_mj_gui:0 \ + "source $WS/install/setup.bash && \ + export RMW_IMPLEMENTATION=rmw_zenoh_cpp ZENOH_CONFIG_OVERRIDE='transport/shared_memory/enabled=true' && \ + ros2 run rmw_zenoh_cpp rmw_zenohd" Enter + sleep 5 + + # Pane 1: MuJoCo bringup (includes viewer) + tmux split-window -v -t aic_mj_gui:0 + tmux send-keys -t aic_mj_gui:0.1 \ + "export DISPLAY=$DISPLAY && source $WS/install/setup.bash && \ + export RMW_IMPLEMENTATION=rmw_zenoh_cpp ZENOH_CONFIG_OVERRIDE='transport/shared_memory/enabled=true' && \ + ros2 launch aic_mujoco aic_mujoco_bringup.launch.py" Enter + sleep 20 + + # Pane 2: CheatCode policy + tmux split-window -v -t aic_mj_gui:0 + tmux send-keys -t aic_mj_gui:0.2 \ + "export 
PATH=$HOME/.pixi/bin:\$PATH && cd $WS/src/aic && \ + pixi run ros2 run aic_model aic_model --ros-args \ + -p use_sim_time:=true -p policy:=aic_example_policies.ros.CheatCode" Enter + + echo " MuJoCo running in tmux session 'aic_mj_gui'" + echo " Attach: tmux attach -t aic_mj_gui" + echo " Kill: tmux kill-session -t aic_mj_gui" +} + +run_lane3() { + echo "=== Lane 3: Isaac Lab random_agent (GUI) ===" + docker start isaac-lab-base 2>/dev/null \ + || (cd "$ISAACLAB" && echo "y" | python3 docker/container.py start base 2>/dev/null) || true + + # Copy Intrinsic_assets if available + ASSETS="$ISAACLAB/aic/aic_utils/aic_isaac/aic_isaaclab/source/aic_task/aic_task/tasks/manager_based/aic_task/Intrinsic_assets" + ASSETS_DEST="/workspace/isaaclab/aic/aic_utils/aic_isaac/aic_isaaclab/source/aic_task/aic_task/tasks/manager_based/aic_task/Intrinsic_assets" + [[ -d "$ASSETS" ]] && docker cp "$ASSETS" "isaac-lab-base:$ASSETS_DEST" 2>/dev/null || true + + PY="/workspace/isaaclab/_isaac_sim/python.sh" + docker exec isaac-lab-base bash -c " + $PY -m pip install --no-build-isolation -q -e /workspace/isaaclab/source/isaaclab 2>/dev/null + $PY -m pip install -q -e /workspace/isaaclab/aic/aic_utils/aic_isaac/aic_isaaclab/source/aic_task 2>/dev/null + " || true + + echo " Starting random_agent with GUI ..." 
+ docker exec -d isaac-lab-base bash -c " + export DISPLAY=$DISPLAY + $PY /workspace/isaaclab/aic/aic_utils/aic_isaac/aic_isaaclab/scripts/random_agent.py \ + --task AIC-Task-v0 --num_envs 1 --enable_cameras + " + echo " Isaac Lab running inside isaac-lab-base container" + echo " Check: docker exec isaac-lab-base ps aux | grep random_agent" + echo " Kill: docker exec isaac-lab-base pkill -f random_agent" +} + +case "$LANE" in + lane1) run_lane1 ;; + lane2) run_lane2 ;; + lane3) run_lane3 ;; + all) run_lane1; run_lane2; run_lane3 ;; + *) echo "Usage: $0 [lane1|lane2|lane3|all]"; exit 1 ;; +esac diff --git a/aws/test_headless.sh b/aws/test_headless.sh new file mode 100755 index 0000000..cf589da --- /dev/null +++ b/aws/test_headless.sh @@ -0,0 +1,130 @@ +#!/usr/bin/env bash +# test_headless.sh — Headless verification of all 3 sim lanes +# Lane 1: Gazebo + CheatCode → scoring.yaml +# Lane 2: MuJoCo scene loads in Python +# Lane 3: Isaac Lab → AIC-Task-v0 registered +set -eo pipefail + +REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." 
&& pwd)" +AIC="$REPO_ROOT/References/aic" +WS="$HOME/ws_aic" +ISAACLAB="$HOME/IsaacLab" +RESULTS="$HOME/aic_results" +DISPLAY="${DISPLAY:-:1}"; export DISPLAY +export DBX_CONTAINER_MANAGER=docker +export PATH="$HOME/.pixi/bin:$PATH" + +PASS_GAZEBO=false PASS_MUJOCO=false PASS_ISAAC=false + +# ── Lane 1: Gazebo + CheatCode scoring ────────────────────────────────────── +echo "=== Test 1/3: Gazebo + CheatCode ===" +xhost +local:docker 2>/dev/null || true +distrobox list 2>/dev/null | grep -q "aic_eval" \ + || distrobox create -r --nvidia -i ghcr.io/intrinsic-dev/aic/aic_eval:latest aic_eval + +mkdir -p "$RESULTS" +rm -f "$RESULTS/scoring.yaml" +tmux kill-session -t aic_gz 2>/dev/null || true +tmux new-session -d -s aic_gz -x 200 -y 40 + +# Pane 0: Gazebo eval +tmux send-keys -t aic_gz:0 \ + "export DBX_CONTAINER_MANAGER=docker DISPLAY=$DISPLAY && \ + distrobox enter -r aic_eval -- /entrypoint.sh ground_truth:=true start_aic_engine:=true" Enter +echo " Waiting 40s for Gazebo ..." && sleep 40 + +# Pane 1: CheatCode policy +tmux split-window -v -t aic_gz:0 +tmux send-keys -t aic_gz:0.1 \ + "export PATH=$HOME/.pixi/bin:\$PATH && cd $WS/src/aic && \ + pixi run ros2 run aic_model aic_model --ros-args \ + -p use_sim_time:=true -p policy:=aic_example_policies.ros.CheatCode" Enter + +echo " Waiting up to 300s for 3 trials to complete ..." 
+for i in $(seq 1 60); do + # aic_engine creates scoring.yaml immediately with zeros; wait for trial_3 (all 3 done) + if grep -q "trial_3:" "$RESULTS/scoring.yaml" 2>/dev/null; then + PASS_GAZEBO=true; break + fi + sleep 5 +done +tmux kill-session -t aic_gz 2>/dev/null || true + +if $PASS_GAZEBO; then + SCORE=$(grep "^total:" "$RESULTS/scoring.yaml" 2>/dev/null | awk '{print $2}') + echo "[OK] Gazebo CheatCode — total score: $SCORE" +else + echo "[FAIL] Gazebo — 3 trials not completed within 300s" +fi + +# ── Lane 2: MuJoCo scene verification ─────────────────────────────────────── +echo "=== Test 2/3: MuJoCo scene ===" +SCENE="$AIC/aic_utils/aic_mujoco/mjcf/scene.xml" +if [[ -f "$SCENE" ]]; then + RESULT=$(python3 -c " +import mujoco +m = mujoco.MjModel.from_xml_path('$SCENE') +d = mujoco.MjData(m) +for _ in range(10): mujoco.mj_step(m, d) +print(f'OK: {m.nbody} bodies, {m.njnt} joints, {m.nu} actuators, 10 steps clean') +" 2>&1) + if echo "$RESULT" | grep -q "^OK:"; then + PASS_MUJOCO=true + echo "[OK] MuJoCo — $RESULT" + else + echo "[FAIL] MuJoCo — $RESULT" + fi +else + echo "[FAIL] MuJoCo — scene.xml not found" +fi + +# ── Lane 3: Isaac Lab AIC-Task-v0 ─────────────────────────────────────────── +echo "=== Test 3/3: Isaac Lab ===" +if docker image inspect isaac-lab-base &>/dev/null; then + # Use docker start directly (container.py start triggers rebuilds) + docker start isaac-lab-base 2>/dev/null \ + || (cd "$ISAACLAB" && echo "y" | python3 docker/container.py start base 2>/dev/null) || true + docker update --restart unless-stopped isaac-lab-base 2>/dev/null || true + + # Copy Intrinsic_assets if available on host + ASSETS="$ISAACLAB/aic/aic_utils/aic_isaac/aic_isaaclab/source/aic_task/aic_task/tasks/manager_based/aic_task/Intrinsic_assets" + ASSETS_DEST="/workspace/isaaclab/aic/aic_utils/aic_isaac/aic_isaaclab/source/aic_task/aic_task/tasks/manager_based/aic_task/Intrinsic_assets" + [[ -d "$ASSETS" ]] && docker cp "$ASSETS" "isaac-lab-base:$ASSETS_DEST" 
2>/dev/null || true + + PY="/workspace/isaaclab/_isaac_sim/python.sh" + docker exec isaac-lab-base bash -c " + $PY -m pip install --no-build-isolation -q -e /workspace/isaaclab/source/isaaclab 2>/dev/null + $PY -m pip install -q -e /workspace/isaaclab/aic/aic_utils/aic_isaac/aic_isaaclab/source/aic_task 2>/dev/null + " 2>&1 || true + + RESULT=$(docker exec isaac-lab-base bash -c " + $PY -u -c ' +from isaaclab.app import AppLauncher +import argparse +app = AppLauncher(argparse.Namespace(headless=True)) +sim_app = app.app +import gymnasium as gym, aic_task.tasks +found = [s.id for s in gym.registry.values() if \"AIC\" in s.id] +with open(\"/tmp/aic_envs.txt\",\"w\") as f: f.write(\"\n\".join(found)) +sim_app.close() +' 2>/dev/null + cat /tmp/aic_envs.txt + " 2>&1 | tail -3) + + if echo "$RESULT" | grep -q "AIC-Task-v0"; then + PASS_ISAAC=true + echo "[OK] Isaac Lab — AIC-Task-v0 registered" + else + echo "[FAIL] Isaac Lab — AIC-Task-v0 not found" + fi +else + echo "[FAIL] Isaac Lab — image not built" +fi + +# ── Summary ────────────────────────────────────────────────────────────────── +echo "" +echo "=== HEADLESS TEST RESULTS ===" +printf " %-28s %s\n" "Lane 1 · Gazebo CheatCode" "$($PASS_GAZEBO && echo 'PASS' || echo 'FAIL')" +printf " %-28s %s\n" "Lane 2 · MuJoCo scene" "$($PASS_MUJOCO && echo 'PASS' || echo 'FAIL')" +printf " %-28s %s\n" "Lane 3 · Isaac Lab envs" "$($PASS_ISAAC && echo 'PASS' || echo 'FAIL')" +echo "" diff --git a/wiki/aws/2026-03-22-aws-setup-explainer.md b/wiki/aws/2026-03-22-aws-setup-explainer.md new file mode 100644 index 0000000..4c6c295 --- /dev/null +++ b/wiki/aws/2026-03-22-aws-setup-explainer.md @@ -0,0 +1,452 @@ +# AWS Setup — SDF, Scripts, Tools, ROS Nodes & IPC +**Date:** 22 March 2026 +**Author:** Evan +**Topic:** AWS cloud dev environment — everything explained + +--- + +## Overall Goal + +The 4 scripts set up a **cloud Ubuntu instance** (AWS) so you can develop and test AIC policies without your local Windows machine. 
The pipeline: + +``` +AWS Ubuntu (L4 GPU) +├── Nice DCV → remote desktop (enable_dcv.sh) +├── Claude Code → AI coding assistant in terminal (set_up_dev_tools.sh) +└── Three sim lanes → (setup_sim_environments.sh + install_and_run_aic.sh) + ├── Lane 1: Gazebo/Kilted (truth, scoring) + ├── Lane 2: MuJoCo (fast controller tuning) + └── Lane 3: Isaac Lab (RL training) +``` + +--- + +## The 4 Scripts + +### `enable_dcv.sh` — Remote Desktop +Sets up **Nice DCV** (AWS's remote desktop protocol, like RDP but GPU-accelerated): +1. Installs `ubuntu-desktop` (full GNOME GUI) +2. Downloads/installs Nice DCV server + web viewer +3. Starts `dcvserver` as a systemd service +4. You connect via browser or DCV client to get a full desktop with GPU acceleration — this is how you see Gazebo's 3D viewer from your Windows machine + +### `set_up_dev_tools.sh` — Claude Code +Installs Node.js → npm → `@anthropic-ai/claude-code`. Minimal utility script. + +### `install_and_run_aic.sh` — Repo Bootstrap +```bash +git clone https://github.com/Ice-Citron/Project-Automaton +git submodule update --init --recursive +``` +Pulls down the repo including the `aic` reference submodule. Nothing else — just the entry point. + +### `setup_sim_environments.sh` — The Main Script +~570 lines. Five phases: + +| Phase | What | +|-------|------| +| 1a | APT packages (gcc-14, distrobox, tmux…) | +| 1b | NVIDIA Container Toolkit (GPU access in Docker) | +| 1c | Pixi (conda-like env manager) | +| 1d | `~/ws_aic` workspace + `pixi install` | +| 1e | `aic_eval` distrobox container (Gazebo lane) | +| 1f | ROS 2 Kilted host install + sdformat bindings | +| 1g | MuJoCo colcon build | +| 1h | IsaacLab Docker clone + build | +| 1i | Shell aliases | +| Phase 2 | **Test**: Gazebo + CheatCode (checks `scoring.yaml`) | +| Phase 3 | SDF export → MJCF conversion | +| Phase 4 | **Test**: MuJoCo + CheatCode | +| Phase 5 | **Test**: Isaac Lab `list_envs` smoke test | + +--- + +## What Is an SDF? 
+
+**SDF = Simulation Description Format** — XML file that fully describes a robot simulation world. It's the native format for Gazebo.
+
+```xml
+<sdf version="1.9">
+  <world name="aic_world">
+    <model name="ur5e">
+      <link name="base_link">
+        <visual name="base_visual">
+          <geometry><mesh><uri>model://ur5e/base.dae</uri></mesh></geometry>
+        </visual>
+        <collision name="base_collision">
+          <geometry><box><size>0.1 0.1 0.1</size></box></geometry>
+        </collision>
+        <inertial>
+          <mass>4.0</mass>
+          <inertia>...</inertia>
+        </inertial>
+      </link>
+      <joint name="shoulder_pan_joint" type="revolute">
+        <parent>base_link</parent>
+        <child>shoulder_link</child>
+        <axis>
+          <xyz>0 0 1</xyz>
+          <limit><lower>-6.28</lower><upper>6.28</upper></limit>
+        </axis>
+      </joint>
+      <plugin filename="..." name="...">...</plugin>
+      <sensor name="..." type="...">...</sensor>
+    </model>
+    <physics>
+      <max_step_size>0.001</max_step_size>
+    </physics>
+  </world>
+</sdf>
+```
+
+**Key SDF concepts:**
+- `<model>` — a robot or object (UR5e arm, task board, cable)
+- `<link>` — a rigid body with visual + collision + inertial
+- `<joint>` — connects two links (revolute = rotates, fixed = rigid, prismatic = slides)
+- `<visual>` — mesh for rendering (GLB, DAE, STL)
+- `<collision>` — simplified shape for physics
+- `<plugin>` — C++ shared library that adds behavior (ros2_control, sensors)
+- `<sensor>` — cameras, IMU, force-torque
+- `model://` URI scheme — looks up models in `GZ_SIM_RESOURCE_PATH`
+
+**In this project**, Gazebo exports the live scene as `/tmp/aic.sdf`, which is then converted to MJCF (MuJoCo's XML format) for the MuJoCo lane. The URI corruption bugs the script fixes (`file:///model://` prefixes that should be `model://`, absolute `file:///` mesh paths) happen because Gazebo's SDF exporter writes Docker-internal paths and malformed URIs that need to be rewritten for host use.
+
+---
+
+## Distrobox
+
+Distrobox is a wrapper around Docker/Podman that makes containers feel like native Linux. Key difference from raw Docker: **your home directory, X11 display, GPU, and network are all shared** with the host automatically.
+
+```
+Host Ubuntu (AWS)
+└── distrobox aic_eval
+    └── Docker: ghcr.io/intrinsic-dev/aic/aic_eval:latest
+        ├── ROS 2 Kilted (pre-installed)
+        ├── Gazebo Harmonic (pre-installed)
+        ├── aic packages (pre-built)
+        └── /entrypoint.sh ← starts everything
+```
+
+**Why distrobox instead of raw Docker?**
+- Container sees `$HOME` on host → `scoring.yaml` written inside container appears on host at `~/aic_results/`
+- `$DISPLAY` is shared → Gazebo's OpenGL window appears on your DCV desktop
+- `--nvidia` flag passes GPU through to Gazebo for rendering
+- No volume mount headaches
+
+**Why the eval image?** The organizers pre-baked the exact Kilted + Gazebo + AIC packages into `ghcr.io/intrinsic-dev/aic/aic_eval:latest`. This is the **identical** environment used on the competition scoring server. Running your policy against it gives you ground truth on your score.
+
+---
+
+## GCC 14
+
+The script does:
+```bash
+sudo apt-get install -y g++-14 gcc-14
+export CC=/usr/bin/gcc-14 CXX=/usr/bin/g++-14
+colcon build --cmake-args -DCMAKE_CXX_COMPILER=/usr/bin/g++-14 ...
+```
+
+**Why GCC 14 specifically?**
+
+`aic_adapter` uses **C++20 `<format>`** — the `std::format()` library (like Python f-strings but typed):
+```cpp
+#include <format>
+std::string msg = std::format("Joint {} position: {:.3f}", name, value);
+```
+
+- Ubuntu 22.04's default GCC is **11** — no `<format>`
+- Ubuntu 24.04 ships GCC 13 — partial support
+- GCC 14 = full C++20 + C++23 support including `<format>`
+
+Without it you get:
+```
+fatal error: format: No such file or directory
+```
+or:
+```
+error: 'std::format' was not declared in this scope
+```
+
+Ubuntu's toolchain PPAs let you install multiple GCC versions side by side. The `CC`/`CXX` env vars + CMake flags tell the build system to use 14 specifically without changing the system default.
+
+---
+
+## `ws_aic` — The Workspace
+
+`~/ws_aic` is a **colcon workspace** (ROS 2's build system workspace).
Structure: + +``` +~/ws_aic/ +├── src/ +│ ├── aic/ ← symlink to your repo's References/aic +│ │ ├── aic_model/ ← ROS package: runs policies +│ │ ├── aic_adapter/ ← ROS package: bridges sim ↔ policy +│ │ ├── aic_controller/← ROS package: ros2_control interface +│ │ ├── aic_mujoco/ ← ROS package: MuJoCo bringup +│ │ └── aic_utils/ ← MuJoCo repos, Isaac utils +│ ├── mujoco_vendor/ ← imported by vcs from mujoco.repos +│ ├── mujoco_ros2_control/ +│ └── sdformat_mjcf/ ← SDF→MJCF converter (Python package) +├── build/ ← CMake build artifacts +├── install/ ← merged install tree (--merge-install) +│ ├── setup.bash ← sources all packages into shell +│ ├── lib/ ← executables, shared libs +│ └── share/ ← URDF, meshes, launch files, params +└── log/ ← colcon build logs +``` + +`source ~/ws_aic/install/setup.bash` is the critical step that makes all ROS packages findable. Without it, `ros2 run aic_model aic_model` fails with "package not found". + +--- + +## ROS 2 Kilted + +**ROS 2** (Robot Operating System 2) is a middleware framework. It's not really an OS — it's a publish/subscribe message bus + package ecosystem + build system. + +**Kilted** is the distro codename (like Ubuntu's "Noble"). The competition requires **Kilted** specifically because that's what the eval server runs. + +**Core ROS 2 concepts used here:** + +| Concept | What | +|---------|------| +| Node | A process that communicates via ROS | +| Topic | Named pub/sub channel (like a Kafka topic) | +| Service | Request/response RPC | +| Action | Long-running goal with feedback | +| Parameter | Named config value per-node | +| Package | Unit of distribution (CMake or Python) | +| Launch file | Python script that starts multiple nodes with config | +| RMW | "ROS Middleware" — the DDS transport layer underneath | + +--- + +## CMake + colcon + +**CMake** is the build system generator. For each ROS C++ package it: +1. Reads `CMakeLists.txt` +2. Finds dependencies (`find_package(rclcpp REQUIRED)`) +3. 
Generates Makefiles / Ninja files
+4. Invokes the compiler
+
+**colcon** is a meta-build tool that orchestrates CMake across many packages:
+```bash
+# -DCMAKE_BUILD_TYPE=Release → optimization level
+# --merge-install            → all outputs go to one install/ tree
+# --symlink-install          → Python files: symlink not copy (instant changes)
+# --packages-ignore          → skip these — they only exist inside the distrobox container
+# (bash won't allow a comment after a trailing `\`, so the notes live up here)
+colcon build \
+  --cmake-args -DCMAKE_BUILD_TYPE=Release \
+               -DCMAKE_CXX_COMPILER=/usr/bin/g++-14 \
+  --merge-install \
+  --symlink-install \
+  --packages-ignore lerobot_robot_aic aic_gazebo aic_scoring aic_engine
+```
+
+**Build types:**
+- `Debug` — no optimization, full symbols, 5-10x slower
+- `Release` — `-O3`, stripped, what you use for sim
+
+**Common colcon errors:**
+
+| Error | Cause | Fix |
+|-------|-------|-----|
+| `Could not find package: rclcpp` | ROS not sourced | `source /opt/ros/kilted/setup.bash` first |
+| `fatal error: format: No such file or directory` | GCC < 14 | Set `CC/CXX` to gcc-14/g++-14 |
+| `README.md: No such file or directory` | sdformat_mjcf out-of-tree build | Script's `ln -sf` workaround |
+| `vcstool: command not found` | Ubuntu's `vcstool` conflicts | Script removes Ubuntu's, installs `python3-vcstool` |
+| `MUJOCO_PATH conflicts with mujoco_vendor` | Old env var from previous install | Remove from `.bashrc`, `source` it |
+| `gz-cmake3 not found` | Skip it — it's Gazebo-only | `--skip-keys "gz-cmake3 DART ..."` |
+
+---
+
+## Zenoh — The IPC Transport
+
+This project uses **Zenoh** instead of the default DDS (cyclonedds/fastdds):
+```bash
+export RMW_IMPLEMENTATION=rmw_zenoh_cpp
+export ZENOH_CONFIG_OVERRIDE="transport/shared_memory/enabled=true"
+ros2 run rmw_zenoh_cpp rmw_zenohd   # the router daemon
+```
+
+**Why Zenoh?**
+- Shared memory transport → zero-copy for large camera images (3x wrist cams at 20Hz = a lot of data)
+- Works cleanly across distrobox container ↔ host (DDS multicast can break across namespaces)
+- The competition's `aic_eval` container is pre-configured for Zenoh
+
+**IPC
architecture:** + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ distrobox aic_eval (Docker container) │ +│ │ +│ /entrypoint.sh starts: │ +│ ├── Gazebo Harmonic (physics sim + rendering) │ +│ │ └── publishes SDF world, clock │ +│ ├── aic_engine node │ +│ │ ├── spawns task board + cable (randomized each trial) │ +│ │ ├── subscribes to /joint_states, /wrench │ +│ │ ├── publishes /ground_truth/... (TF of plug + port) │ +│ │ └── writes scoring.yaml when 3 trials complete │ +│ ├── ros2_control (controller manager) │ +│ │ ├── loads JointTrajectoryController │ +│ │ └── loads ForceTorqueSensorBroadcaster │ +│ └── aic_adapter node │ +│ ├── bridges Gazebo topics → policy-facing topics │ +│ ├── pub: /wrist_camera_{left,right,bottom}/image_raw │ +│ ├── pub: /joint_states │ +│ ├── pub: /wrench (wrist F/T sensor) │ +│ ├── pub: /controller_state │ +│ └── sub: /aic_controller/joint_trajectory │ +│ │ +└─────────────────────────────────────────────────────────────────┘ + │ Zenoh shared-memory transport (same host) + │ (rmw_zenohd router bridges namespaces) + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Host Ubuntu (pixi environment) │ +│ │ +│ aic_model node (ros2 run aic_model aic_model) │ +│ ├── sub: /wrist_camera_*/image_raw (3x RGB @ 20Hz) │ +│ ├── sub: /joint_states (6 joint positions/vels) │ +│ ├── sub: /wrench (Fx,Fy,Fz,Tx,Ty,Tz) │ +│ ├── sub: /controller_state (mode, stiffness) │ +│ ├── [CheatCode only] sub: /ground_truth/plug_tip_pose │ +│ ├── [CheatCode only] sub: /ground_truth/port_pose │ +│ └── pub: /aic_controller/joint_trajectory → arm moves │ +│ │ +└─────────────────────────────────────────────────────────────────┘ +``` + +--- + +## The ROS Nodes In Detail + +### `aic_engine` +- Orchestrates trials (resets the scene 3 times) +- Scores each trial: did the plug approach? align? insert? 
+- Has access to ground truth TF (only if `ground_truth:=true`) +- Writes `scoring.yaml` on completion + +### `aic_adapter` +- The **translation layer** — converts Gazebo's internal topics to the clean policy API +- Handles clock sync (`use_sim_time:=true` → uses `/clock` topic so sim time not wall time) +- Published topics match exactly what `docs/policy.md` specifies (so your policy is sim-agnostic) + +### `aic_model` (the policy runner) +- Generic node that loads any Python class as a policy via the `policy` parameter +- Calls `policy.step(obs)` at 20 Hz +- `obs` = dict of all subscribed topics as numpy arrays +- Returns `joint_trajectory` command +- `CheatCode` uses ground truth TF → computes analytical IK → perfect insertion +- `RunACT` loads a trained checkpoint → runs inference → attempts insertion + +### ros2_control (controller manager) +- **Not a node you write** — it's the hardware abstraction layer +- In Gazebo: talks to Gazebo's joint actuators via the `gz_ros2_control` plugin in the SDF +- In MuJoCo: talks to `mujoco_ros2_control` which wraps the MuJoCo C API +- Exposes `/joint_trajectory_controller/follow_joint_trajectory` action +- `aic_adapter` wraps this into the simpler `/aic_controller/joint_trajectory` topic + +### `rmw_zenohd` (Zenoh router daemon) +- Not a ROS node — it's the transport router +- Must start **first** before any other node +- Bridges the distrobox network namespace and host +- Enables shared memory for zero-copy image transfer + +--- + +## MuJoCo Lane Specifics + +``` +Terminal 1: rmw_zenohd ← router (must be first) +Terminal 2: aic_mujoco_bringup ← MuJoCo physics + ros2_control +Terminal 3: aic_model ← your policy +``` + +MuJoCo lane uses the **same `aic_model` node and same topic API** as Gazebo. The difference is the physics engine underneath. This is the point: you tune gains/impedance in MuJoCo at 1000Hz, then those gains should work in Gazebo at 1000Hz physics / 20Hz policy. 
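Because both lanes expose the same topic API, a policy written against `aic_model` runs unchanged in Gazebo and MuJoCo. A minimal sketch of the policy shape described above — the class name, obs keys, and force threshold here are illustrative assumptions, not the real AIC API:

```python
import numpy as np

class HoldPose:
    """Hypothetical policy: aic_model would instantiate this and call
    step(obs) at 20 Hz, publishing the return value as a joint command."""

    def __init__(self):
        # assumed home configuration for the 6-DOF UR5e (radians)
        self.home = np.zeros(6)

    def step(self, obs: dict) -> np.ndarray:
        q = np.asarray(obs["joint_states"])   # 6 joint positions
        wrench = np.asarray(obs["wrench"])    # Fx, Fy, Fz, Tx, Ty, Tz
        # ease back toward home if contact force exceeds an assumed 10 N limit
        if np.linalg.norm(wrench[:3]) > 10.0:
            return q + 0.02 * (self.home - q)
        return self.home
```

`CheatCode` has the same shape but additionally reads the `/ground_truth/*` poses; `RunACT` replaces the hand-written logic with ACT checkpoint inference.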
+ +The SDF → MJCF conversion (`sdf2mjcf`) is what makes the scenes match. The script: +1. Launches Gazebo just long enough to export the world as SDF +2. Copies it out of the container +3. Fixes URI corruption +4. Runs `sdf2mjcf` → `aic_world.xml` +5. Runs `add_cable_plugin.py` → splits into `scene.xml` + `aic_robot.xml` + adds MuJoCo cable plugin + +--- + +## Pixi + +Pixi is a **conda-compatible package manager** that resolves the entire ROS 2 Kilted + Python dependency tree without needing to install ROS system-wide. The `pixi.toml` in the AIC repo pins exact versions. `pixi run ros2 run ...` activates the environment and runs the command inside it — no `conda activate` needed. + +**Why pixi for the policy but colcon for MuJoCo?** +- Policy (`aic_model`) = pure Python, no native compilation needed → pixi is fast +- MuJoCo lane = C++ (mujoco_vendor, aic_adapter, mujoco_ros2_control) → needs colcon + CMake to compile + +--- + +## Error Taxonomy + +### Installation errors + +| Error | Root cause | Fix | +|-------|-----------|-----| +| `vcstool` conflicts | Ubuntu ships a Python 2 `vcstool`; ROS needs `python3-vcstool` | Script removes the old one automatically | +| `docker: permission denied` | User not in `docker` group | `sudo usermod -aG docker $USER && newgrp docker` | +| `pixi not found` | PATH not updated | `export PATH="$HOME/.pixi/bin:$PATH"` | + +### Build errors + +| Error | Root cause | Fix | +|-------|-----------|-----| +| `format: No such file or directory` | GCC < 14 | Check `CC/CXX` env vars | +| `sdformat_mjcf README.md missing` | Out-of-tree build bug | Fixed by the `ln -sf` in script | +| `gz-cmake3 not found` | Gazebo CMake package only in distrobox | Use `--skip-keys` | +| `mujoco_vendor download failed` | Network issues pulling MuJoCo binary | Retry or set `MUJOCO_PATH` manually | + +### Runtime errors + +| Error | Root cause | Fix | +|-------|-----------|-----| +| `scoring.yaml not found in 300s` | Gazebo crashed on launch | Usually missing GPU 
or DISPLAY |
+| `aic_model: package not found` | Workspace not sourced | `source ~/ws_aic/install/setup.bash` |
+| `Connection refused (Zenoh)` | `rmw_zenohd` not running | Start it first |
+| `DISPLAY :1 unavailable` | X server not running | Need `Xvfb :1` or DCV session active |
+| `MUJOCO_* env var conflict` | Old MuJoCo install interfering with `mujoco_vendor` | Remove from `.bashrc` |
+| `No module named 'sdformat'` | `python3-sdformat16` not installed | Run step 1f+ of script |
+
+---
+
+## How It All Connects (End-to-End)
+
+```
+1. enable_dcv.sh
+   → GPU-accelerated remote desktop on AWS
+
+2. install_and_run_aic.sh
+   → repo + submodules on disk
+
+3. setup_sim_environments.sh Phase 1
+   → GCC 14 (C++20 <format>)
+   → NVIDIA Container Toolkit (GPU in Docker)
+   → Pixi (Python policy env)
+   → distrobox aic_eval (identical to scoring server)
+   → ROS 2 Kilted host (needed by colcon for MuJoCo)
+   → colcon build with GCC 14 (compiles aic_adapter, mujoco_vendor...)
+   → IsaacLab Docker (RL training)
+
+4. Phase 3: SDF export
+   Gazebo inside Docker → /tmp/aic.sdf
+   sed fixes (URI corruption) → sdf2mjcf → MJCF XML
+   add_cable_plugin.py → scene.xml (MuJoCo ready)
+
+5. Runtime (your development loop):
+   aic-eval-gt          → distrobox Gazebo + aic_engine + aic_adapter
+   aic-zenoh            → Zenoh router (bridges container ↔ host)
+   aic-mujoco           → MuJoCo physics server (same interface as Gazebo)
+   aic-policy CheatCode → oracle policy (needs ground_truth:=true)
+   aic-policy RunACT    → your trained ACT student policy
+
+6. Scoring:
+   aic_engine detects plug position at 20Hz
+   Tier 1 (approach) → Tier 2 (alignment) → Tier 3 (insertion) → 75pt full insert
+   writes scoring.yaml → you read it
+```
+
+The entire setup exists so that **one `aic-policy RunACT` command** running on the host (with GPU access for ACT inference) can communicate over Zenoh with Gazebo inside the distrobox container, move the UR5e arm, and get scored — all identically to how the competition server will evaluate your submission.
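Step 6's `scoring.yaml` is also easy to poll from Python — handy inside a tuning loop. A sketch assuming a flat layout of `trial_N:` and `total:` entries (only those key names are confirmed by the test script; the real file may nest them):

```python
def read_scores(text: str) -> dict:
    """Parse 'key: value' lines from scoring.yaml into floats.

    Assumes a flat file — hypothetical layout for illustration.
    """
    scores = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            try:
                scores[key.strip()] = float(value)
            except ValueError:
                pass  # skip non-numeric entries
    return scores

sample = """\
trial_1: 75
trial_2: 40
trial_3: 10
total: 125
"""
scores = read_scores(sample)
assert scores["total"] == 125.0
```

The same helper lets a test_headless-style check assert `"trial_3" in scores` (all three trials finished) instead of grepping.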
diff --git a/wiki/aws/2026-03-22-build-tools-explainer.md b/wiki/aws/2026-03-22-build-tools-explainer.md
new file mode 100644
index 0000000..c291f78
--- /dev/null
+++ b/wiki/aws/2026-03-22-build-tools-explainer.md
@@ -0,0 +1,557 @@
+# Build Tools — Pixi, CMake, colcon, ament, rosidl, rosdep, vcs
+**Date:** 22 March 2026
+**Author:** Evan
+**Topic:** Every build tool in the AIC stack — what it is, why it exists, how it fits together
+
+---
+
+## The Big Picture
+
+There are two completely separate build paths in this project:
+
+```
+Path A — Python-first (policy runtime)
+  pixi.toml → pixi install → pixi run ros2 run aic_model aic_model
+  Uses: pixi, conda-forge, robostack-kilted, PyPI
+  Builds: aic_model, aic_example_policies (Python packages via pixi-build-ros)
+
+Path B — C++-first (sim plumbing, MuJoCo lane)
+  package.xml → CMakeLists.txt → colcon build → ~/ws_aic/install/
+  Uses: colcon, CMake, ament, rosidl, rosdep, vcs, GCC 14
+  Builds: aic_adapter, aic_controller, mujoco_vendor, mujoco_ros2_control
+```
+
+They produce the same ROS 2 interface (topics/services/actions) but are built completely differently. Path A is for running policies. Path B is for compiling the C++ sim bridge and MuJoCo physics backend.
+
+---
+
+## Pixi
+
+### What it is
+
+Pixi is a **cross-platform package manager** built on top of the conda ecosystem. Think of it as `cargo` (Rust) or `poetry` (Python) but for both native binaries and Python packages at the same time. Built by prefix.dev in Rust. Very fast.
+
+Key properties:
+- **Lockfile-first** — `pixi.lock` pins every transitive dependency to an exact hash
+- **Per-project environments** — stored in `.pixi/` next to `pixi.toml`, never global
+- **Conda + PyPI in one file** — `[dependencies]` = conda, `[pypi-dependencies]` = pip
+- **No activation step** — `pixi run <cmd>` drops straight into the env without `conda activate`
+
+### Why not just pip / conda directly?
+ +| Tool | Problem | +|------|---------| +| pip alone | Can't install native libs (ROS 2, OpenCV, Qt) | +| conda alone | Old, slow solver; no PyPI support; activation ceremony | +| apt install ros-kilted-* | Installs system-wide; fights with other ROS distros | +| pixi | Installs per-project, pins everything, works on Linux + macOS | + +### The `pixi.toml` in this project + +Root: `References/aic/pixi.toml` + +```toml +[workspace] +name = "aic" +channels = ["robostack-kilted", "conda-forge"] # search order +platforms = ["linux-64", "osx-arm64"] + +[dependencies] # conda packages +ros-kilted-rclpy = "*" +ros-kilted-rmw-zenoh-cpp = "*" +ros-kilted-ros-core = "*" +ros-kilted-aic-model = { path = "aic_model" } # local pixi package +opencv = "<4.13.0" + +[pypi-dependencies] # pip packages +lerobot = "==0.4.3" +mujoco = "==3.5.0" +huggingface-hub = { version = "==0.35.3", extras = ["hf-transfer", "cli"] } +lerobot_robot_ros = { git = "https://github.com/...", rev = "b4a635f..." } + +[activation] +scripts = ["pixi_env_setup.sh"] # runs on every pixi run +``` + +### Channels explained + +**`robostack-kilted`** — a conda channel maintained by the ROS community that repackages ROS 2 Kilted packages as conda packages. Every `ros-kilted-*` dependency in `pixi.toml` comes from here. This is what makes `pixi install` give you a full working ROS 2 without touching `/opt/ros/`. + +**`conda-forge`** — the community-maintained conda channel for everything else (OpenCV, numpy, PyTorch, cmake, gcc…). More up-to-date than the default Anaconda channel. + +**Search order matters**: pixi tries `robostack-kilted` first, then `conda-forge`. ROS packages shadow conda-forge versions of the same lib. + +### `pixi.lock` + +The lockfile records the exact URL + SHA256 hash of every package for every platform. 
First 60 lines show entries like: + +```yaml +- conda: https://conda.anaconda.org/conda-forge/linux-64/cmake-4.2.3-hc85cc9f_0.conda +- conda: https://conda.anaconda.org/robostack-kilted/linux-64/ros-kilted-rclpy-...conda +- conda: https://conda.anaconda.org/conda-forge/linux-64/mujoco-3.5.0-...conda +``` + +This means everyone on the team (and the CI server) gets identical binary artifacts. No "works on my machine" from package version drift. + +### pixi-build-ros + +Each ROS package (`aic_model`, `aic_example_policies`, `aic_control_interfaces`) has its own `pixi.toml` with: + +```toml +[package.build.backend] +name = "pixi-build-ros" +version = "==0.3.3.20260113.c8b6a54" +channels = ["https://prefix.dev/pixi-build-backends", "robostack-kilted", "conda-forge"] +``` + +`pixi-build-ros` is a build backend that knows how to take a ROS package (with a `package.xml`) and make it installable as a conda package. It wraps CMake/ament internally. When the root `pixi.toml` does: + +```toml +ros-kilted-aic-model = { path = "aic_model" } +``` + +pixi calls `pixi-build-ros` on the `aic_model/` directory, which: +1. Reads `aic_model/package.xml` for dependencies +2. Runs cmake/ament to build it +3. Packages the result as a conda artifact +4. Installs it into `.pixi/envs/default/` + +This is how Python-only ROS packages get into the pixi environment without colcon. + +### Common pixi commands + +```bash +pixi install # solve lockfile + download + install everything +pixi run ros2 run aic_model aic_model # run command inside the env +pixi run python scripts/foo.py # any command +pixi shell # drop into a shell with env activated +pixi list # show installed packages + versions +pixi update # re-solve + update pixi.lock +pixi add numpy # add a dependency +``` + +--- + +## CMake + +### What it is + +CMake is a **build system generator**. It doesn't compile code directly — it generates the files that actually compile code (Makefiles, Ninja build files, Visual Studio projects). 
+ +``` +CMakeLists.txt → cmake → Makefile / build.ninja → make/ninja → .so / executable +``` + +### Why a generator instead of just Make? + +Because the same `CMakeLists.txt` can target Linux (Ninja), macOS (Xcode), or Windows (MSVC) without changes. ROS 2 uses Ninja by default for speed. + +### Anatomy of a CMakeLists.txt (from `aic_adapter`) + +```cmake +cmake_minimum_required(VERSION 3.20) # minimum CMake version +project(aic_adapter) # package name + +# Compiler warnings — -Wall -Wextra -Wpedantic +if(CMAKE_COMPILER_IS_GNUCXX ...) + add_compile_options(-Wall -Wextra -Wpedantic) +endif() + +set(CMAKE_CXX_STANDARD 20) # require C++20 + +# find_package = locate an installed library and load its CMake config +find_package(rclcpp REQUIRED) # ROS 2 C++ client library +find_package(tf2_ros REQUIRED) # transform library +find_package(aic_control_interfaces REQUIRED) # our custom messages + +# add_executable = declare what to compile +add_executable(aic_adapter src/aic_adapter.cpp) + +# target_link_libraries = what to link against +target_link_libraries(aic_adapter PUBLIC + rclcpp::rclcpp # modern CMake target (not -lrclcpp) + tf2_ros::tf2_ros + ${aic_control_interfaces_TARGETS}) # generated message targets + +# install = where to put the binary in the install tree +install(TARGETS aic_adapter + DESTINATION lib/${PROJECT_NAME}) # → install/lib/aic_adapter/aic_adapter + +ament_package() # ROS 2 ament macro (see below) +``` + +### CMake targets vs variables + +Old CMake: `target_link_libraries(foo ${RCLCPP_LIBRARIES})` — error-prone string manipulation. + +Modern CMake (what ROS 2 uses): `target_link_libraries(foo rclcpp::rclcpp)` — typed target objects that carry include paths, compile flags, transitive deps automatically. The `::` syntax is the modern target name convention. 
+ +### `find_package` mechanics + +When you call `find_package(rclcpp REQUIRED)`, CMake looks for `rclcppConfig.cmake` or `rclcpp-config.cmake` in: +- `CMAKE_PREFIX_PATH` — which includes `~/ws_aic/install/` after sourcing `setup.bash` +- `/opt/ros/kilted/` — the system ROS install + +This is why **sourcing setup.bash before colcon build is mandatory** — without it, `find_package(rclcpp)` fails because CMake can't find the config files. + +### Build types + +Set via `-DCMAKE_BUILD_TYPE=<type>`: + +| Type | Flags | Use | +|------|-------|-----| +| `Debug` | `-O0 -g` | Debugger, valgrind | +| `RelWithDebInfo` | `-O2 -g` | Profile with symbols | +| `Release` | `-O3 -DNDEBUG` | Normal sim use | +| `MinSizeRel` | `-Os` | Embedded, not used here | + +The script always uses `Release` for the MuJoCo build. Debug builds of MuJoCo run 5-10x slower. + +--- + +## colcon + +### What it is + +colcon (**col**lective **con**struction) is a **meta-build tool** that runs CMake (or Python setuptools) across a workspace of many packages in the correct dependency order. + +It answers: "given 20 packages that depend on each other, in what order do I build them, and how?" + +### Why not just run cmake manually? + +With 20+ packages you'd have to manually figure out the build order, run cmake + make in each directory, then set up the install paths so each package can find the others. colcon does all of this. + +### The workspace layout + +``` +~/ws_aic/ +├── src/ ← your source packages (colcon scans this) +│ ├── aic/ ← the AIC repo (symlinked) +│ ├── mujoco_vendor/ +│ └── sdformat_mjcf/ +├── build/ ← per-package build dirs (CMake runs here) +│ ├── aic_adapter/ +│ ├── aic_controller/ +│ └── ...
+├── install/ ← merged install tree (all outputs here) +│ ├── setup.bash ← sources everything into your shell +│ ├── lib/ ← executables + shared libs +│ └── share/ ← URDF, launch files, meshes, params +└── log/ ← build logs (check here on failure) +``` + +### Key flags + +```bash +colcon build \ + --cmake-args -DCMAKE_BUILD_TYPE=Release \ + -DCMAKE_CXX_COMPILER=/usr/bin/g++-14 \ + --merge-install \ # single install/ tree instead of per-package + --symlink-install \ # Python/launch files: symlink → edit without rebuild + --packages-ignore aic_gazebo aic_scoring aic_engine lerobot_robot_aic \ # skip these + --packages-select aic_mujoco \ # only build this one + --parallel-workers 4 # build N packages in parallel +``` + +**`--merge-install`**: by default colcon makes `install/aic_adapter/`, `install/aic_controller/` etc. `--merge-install` puts everything into one `install/` — simpler PATH/LD_LIBRARY_PATH. + +**`--symlink-install`**: for Python packages, instead of copying `.py` files to `install/`, it symlinks them. This means you can edit a Python file and the change is live immediately without rebuilding. For C++ it makes no difference (you still need to recompile). + +**`--packages-ignore`**: the four ignored packages (`aic_gazebo`, `aic_scoring`, `aic_engine`, `lerobot_robot_aic`) only exist pre-compiled inside the `aic_eval` Docker image. Their source depends on Gazebo Harmonic libraries which are not installed on the host. If you try to build them on the host, cmake fails with missing Gazebo headers. + +### Dependency resolution + +colcon reads each package's `package.xml` to find `<depend>` tags, builds a DAG, then builds leaves first. Example order for this project: + +``` +1. aic_control_interfaces (no deps on aic packages) +2. aic_model_interfaces +3. aic_task_interfaces +4. aic_adapter (depends on aic_control_interfaces, aic_model_interfaces) +5. aic_controller (depends on aic_control_interfaces, hardware_interface...) +6.
aic_mujoco (depends on aic_adapter install outputs) +``` + +### Reading build logs + +On failure: +```bash +cat ~/ws_aic/log/latest_build/aic_adapter/stdout_stderr.log +``` + +colcon shows `--- stderr: aic_adapter ---` in terminal but the full CMake/compiler output is in `log/`. + +--- + +## ament + +### What it is + +**ament** is ROS 2's CMake extension layer. Every ROS 2 C++ package ends with `ament_package()` in its `CMakeLists.txt`. ament provides: + +- `find_package(ament_cmake REQUIRED)` — the base ament CMake macros +- `ament_package()` — registers the package into the install tree (generates the `*Config.cmake` files that let other packages `find_package()` it) +- `ament_export_dependencies()` — propagates transitive deps to consumers +- `ament_export_targets()` — exports CMake targets so `target_link_libraries(foo bar::bar)` works from other packages + +Without ament, you'd have to manually write CMake config files for every package. ament generates them automatically. + +### ament_python + +`aic_model` is a Python package: + +```xml +<export> +  <build_type>ament_python</build_type> +</export> +``` + +For Python packages, colcon uses `ament_python` instead of `ament_cmake`. This wraps `pip install -e .` / `setup.py` and registers the Python package into `install/lib/python3.x/site-packages/` (or via symlinks with `--symlink-install`). + +### The two package types side by side + +| | `aic_adapter` (C++) | `aic_model` (Python) | +|---|---|---| +| Build type | `ament_cmake` | `ament_python` | +| `CMakeLists.txt` | Yes, required | No (or minimal) | +| `package.xml` | Yes | Yes | +| Output | `.so` + binary in `install/lib/` | `.py` files in site-packages | +| Rebuild needed after edit? | Yes (recompile) | No (with `--symlink-install`) | + +--- + +## rosidl — Message/Service/Action Code Generation + +### What it is + +`rosidl` is the **ROS Interface Definition Language** — the system that takes `.msg`, `.srv`, `.action` files and generates C++ headers + Python classes from them.
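Conceptually, the generator's input is easy to parse: each non-comment line of a `.msg` file is a `type name` pair. A toy sketch of that first stage (illustration only; the real rosidl pipeline is template-driven code generation, not this):

```python
# Toy version of rosidl's parsing stage: .msg lines -> (type, name) fields.
# Illustration only -- real rosidl feeds these into C/C++/Python codegen.
MSG_TEXT = """\
# ControllerState.msg (example fields)
string controller_name
float64 stiffness_translational
int32 mode
"""

def parse_msg(text: str) -> list[tuple[str, str]]:
    fields = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()    # comments start with '#'
        if line:
            field_type, field_name = line.split()
            fields.append((field_type, field_name))
    return fields

print(parse_msg(MSG_TEXT))
# → [('string', 'controller_name'), ('float64', 'stiffness_translational'), ('int32', 'mode')]
```

Everything downstream (serialization, TypeSupport `.so` files) is generated from exactly this field list.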
+ +### From `aic_control_interfaces/CMakeLists.txt` + +```cmake +set(msg_files + "msg/ControllerState.msg" + "msg/JointMotionUpdate.msg" + "msg/MotionUpdate.msg" + "msg/TargetMode.msg" + "msg/TrajectoryGenerationMode.msg" +) +set(srv_files + "srv/ChangeTargetMode.srv" +) + +rosidl_generate_interfaces(${PROJECT_NAME} + ${msg_files} + ${srv_files} + DEPENDENCIES builtin_interfaces geometry_msgs std_msgs trajectory_msgs +) +``` + +This single macro call generates: +- **C++ headers**: `install/include/aic_control_interfaces/msg/controller_state.hpp` — usable as `#include <aic_control_interfaces/msg/controller_state.hpp>` +- **Python module**: `aic_control_interfaces.msg.ControllerState` — importable directly +- **Serialization code**: DDS/Zenoh wire format (IDL → C bindings) +- **TypeSupport libraries**: `.so` files loaded at runtime by the RMW layer + +### A `.msg` file + +``` +# ControllerState.msg +string controller_name +float64 stiffness_translational +float64 stiffness_rotational +int32 mode +``` + +Simple `type name` declarations, one field per line. rosidl turns these into strongly-typed C++ structs and Python message classes. This is what flows over Zenoh between `aic_adapter` and `aic_model`. + +### Why this matters + +When `aic_adapter` publishes `ControllerState` and `aic_model` subscribes to it, they both use the same generated types. The serialization is automatic — you just set fields and publish. + +--- + +## rosdep + +### What it is + +`rosdep` is a **system dependency installer**. It maps ROS package names to OS packages (apt, brew, etc.). + +```bash +rosdep install --from-paths src --ignore-src --rosdistro kilted -yr +``` + +This scans every `package.xml` in `src/`, looks up each `<depend>` in the rosdep database, and installs any system packages not already present.
For example: + +- `eigen3` in `package.xml` → `libeigen3-dev` via apt +- `python3-numpy` → `python3-numpy` via apt +- `rclcpp` → already in ROS repo, skipped (`--ignore-src`) + +**`--skip-keys "gz-cmake3 DART libogre-dev libogre-next-2.3-dev"`** — these are Gazebo-specific deps that only exist in the `aic_eval` container. Skipping them prevents rosdep from failing on the host. + +--- + +## vcs (vcstool) + +### What it is + +`vcs` is a **multi-repo source control tool** — like `repo` (Android) but simpler. It reads a `.repos` YAML file and clones/updates each repository at a specific commit. + +### From the script + +```bash +vcs import --skip-existing < aic/aic_utils/aic_mujoco/mujoco.repos +``` + +`mujoco.repos` looks like: + +```yaml +repositories: + gazebo/gz-mujoco: + type: git + url: https://github.com/gazebosim/gz-mujoco + version: main + mujoco_ros2_control: + type: git + url: https://github.com/taDachs/mujoco_ros2_control + version: some-sha +``` + +`vcs import` clones each repo into `src/`. This is how the MuJoCo workspace gets populated — there's no single monorepo, just a manifest of what to pull. + +**`--skip-existing`** means if the directory already exists, skip it (don't overwrite local changes). + +**Why not git submodules?** vcs allows pinning to branches or tags without the overhead of submodule tracking. It also handles heterogeneous repo types (git, mercurial, svn) though in practice only git is used here. 
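The clone-or-skip behaviour of `vcs import` is simple enough to sketch. A minimal Python illustration, assuming a manifest dict that mirrors the `.repos` YAML structure above (the entry shown is illustrative, not the full `mujoco.repos`):

```python
# What `vcs import --skip-existing` does per manifest entry, sketched.
# The manifest dict mirrors the .repos YAML; the entry is illustrative.
from pathlib import Path

manifest = {
    "gz-mujoco": {"type": "git",
                  "url": "https://github.com/gazebosim/gz-mujoco",
                  "version": "main"},
}

def import_commands(repos: dict, src: Path, skip_existing: bool = True) -> list[str]:
    cmds = []
    for name, spec in repos.items():
        dest = src / name
        if skip_existing and dest.exists():
            continue        # --skip-existing: never clobber local changes
        # Works for branches/tags; for a pinned SHA, vcs clones then checks out.
        cmds.append(f"git clone --branch {spec['version']} {spec['url']} {dest}")
    return cmds

print(import_commands(manifest, Path("src")))
```

The real tool also shells out in parallel and reports per-repo status, but the clone-or-skip decision per entry is the core of it.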
+ +--- + +## How All These Tools Interact + +### Path A: pixi (policy development) + +``` +pixi.toml + ↓ pixi install + ├── conda solver resolves full dep graph + ├── downloads conda packages from robostack-kilted + conda-forge + ├── downloads PyPI packages (lerobot, mujoco, huggingface-hub) + ├── for local packages { path = "aic_model" }: + │ calls pixi-build-ros backend + │ which runs cmake + ament internally + │ packages result as conda artifact + │ installs into .pixi/envs/default/ + └── writes resolved state to pixi.lock + +pixi run ros2 run aic_model aic_model + ↓ activates .pixi/envs/default/ + ├── runs pixi_env_setup.sh (activation script) + ├── ROS 2 Kilted is available (from robostack-kilted conda packages) + ├── rmw_zenoh_cpp is available (from robostack-kilted) + └── aic_model node starts, subscribes/publishes over Zenoh +``` + +### Path B: colcon (MuJoCo C++ build) + +``` +mujoco.repos + ↓ vcs import + src/mujoco_vendor/, src/mujoco_ros2_control/, src/sdformat_mjcf/ + +package.xml (per package) + ↓ rosdep install + apt installs: libeigen3-dev, libsophus-dev, python3-vcstool, ... + +CMakeLists.txt (per package) + ↓ colcon build (runs cmake + ninja/make per package in dep order) + ├── cmake configures: find_package() locates deps via CMAKE_PREFIX_PATH + ├── rosidl_generate_interfaces() generates msg/srv C++ + Python code + ├── GCC 14 compiles .cpp → .o → links → .so / binary + └── ament_package() generates *Config.cmake for downstream find_package() + +install/setup.bash + ↓ source + sets: CMAKE_PREFIX_PATH, AMENT_PREFIX_PATH, LD_LIBRARY_PATH, PYTHONPATH + result: ros2 run aic_mujoco ... works +``` + +### The dependency graph across tools + +``` +pixi.lock (pins exact versions) + └── conda: ros-kilted-rclpy, ros-kilted-rmw-zenoh-cpp, opencv, mujoco... 
+ └── pypi: lerobot, huggingface-hub + +package.xml (declares ROS deps by name) + └── rosdep maps names → apt packages + └── colcon reads for build order + +CMakeLists.txt (CMake build logic) + └── find_package() uses CMAKE_PREFIX_PATH set by setup.bash + └── ament_package() generates config for downstream packages + └── rosidl generates C++/Python from .msg/.srv + +colcon (orchestrates CMake across all packages) + └── reads package.xml for dep ordering + └── runs cmake + compiler per package + └── writes merged install/ tree +``` + +--- + +## Error Reference + +### Pixi errors + +| Error | Cause | Fix | +|-------|-------|-----| +| `pixi: command not found` | PATH not set | `export PATH="$HOME/.pixi/bin:$PATH"` | +| `error: package not found: ros-kilted-rclpy` | robostack-kilted not in channels | Check `channels` in `pixi.toml` | +| `lock file is out of date` | `pixi.toml` changed, lock not updated | `pixi install` or `pixi update` | +| `failed to build local package aic_model` | pixi-build-ros failure | Check cmake output: `pixi run --verbose` | +| `PyPI package lerobot not compatible` | Version conflict with conda packages | Pin specific version in `[pypi-dependencies]` | + +### CMake errors + +| Error | Cause | Fix | +|-------|-------|-----| +| `Could not find package: rclcpp` | ROS not on `CMAKE_PREFIX_PATH` | `source /opt/ros/kilted/setup.bash` | +| `Could not find package: aic_control_interfaces` | ws not sourced | `source ~/ws_aic/install/setup.bash` | +| `fatal error: format: No such file or directory` | GCC < 14, no `<format>` | `export CXX=/usr/bin/g++-14` | +| `target_link_libraries called with wrong signature` | Old CMake | `cmake_minimum_required(VERSION 3.20)` | +| `undefined reference to rclcpp::...` | Linked against wrong rclcpp | Check `CMAKE_PREFIX_PATH` order | + +### colcon errors + +| Error | Cause | Fix | +|-------|-------|-----| +| `Package 'X' not found` | Build order wrong or dep not installed | Run `rosdep install` first | +| `[1/1 packages
failed]` | Check `log/latest_build/X/stdout_stderr.log` | See `cat ~/ws_aic/log/latest_build/*/stderr.log` | +| `sdformat_mjcf README.md not found` | Out-of-tree build reads README | Script's `ln -sf` fix | +| `[0.0s] Package 'X' skipped` | In `--packages-ignore` | Intentional | +| `vcstool not found` | Ubuntu's broken `vcstool` installed | `sudo apt install python3-vcstool` | + +### rosidl errors + +| Error | Cause | Fix | +|-------|-------|-----| +| `No module named 'aic_control_interfaces'` | Python path not set | `source ~/ws_aic/install/setup.bash` | +| `rosidl_generate_interfaces: DEPENDENCIES missing` | Interface deps not listed | Add to `DEPENDENCIES` in `CMakeLists.txt` | +| `TypeSupport not found for rmw_zenoh_cpp` | Zenoh typesupport not built | Ensure `ros-kilted-rmw-zenoh-cpp` installed | + +--- + +## Quick Reference: Which Tool For What + +| Task | Tool | Command | +|------|------|---------| +| Add a Python dep to policy env | pixi | `pixi add numpy` in `aic/` | +| Run the policy node | pixi | `pixi run ros2 run aic_model aic_model` | +| Compile aic_adapter (C++) | colcon | `colcon build --packages-select aic_adapter` | +| Add a system dep (apt) | rosdep | Add to `package.xml`, run `rosdep install` | +| Pull a new C++ repo into ws | vcs | Add to `mujoco.repos`, run `vcs import` | +| Add a new ROS message | rosidl | Create `.msg` file, add to `CMakeLists.txt` | +| Debug a build failure | colcon logs | `cat ~/ws_aic/log/latest_build/PKG/stdout_stderr.log` | +| Freeze all dep versions | pixi | `pixi update` (rewrites `pixi.lock`) | +| Check what's installed in pixi env | pixi | `pixi list` | +| Check what colcon built | colcon | `colcon list` (in `~/ws_aic/`) | diff --git a/wiki/aws/README.md b/wiki/aws/README.md new file mode 100644 index 0000000..ce52bc7 --- /dev/null +++ b/wiki/aws/README.md @@ -0,0 +1,152 @@ +# AWS Setup — AIC Simulation Environment + +Scripts to set up all 3 AIC simulation lanes on a fresh Ubuntu 24.04 AWS instance with NVIDIA GPU. 
+ +## Quick Start + +```bash +# From a fresh instance after cloning the repo: +cd Project-Automaton + +# 1. DCV remote desktop (run first, then connect via browser) +bash aws/install_dcv.sh + +# 2. Dev tools (Node.js, Claude Code) +bash aws/install_devtools.sh + +# 3. Everything else (all 3 sim lanes + headless tests) +bash aws/setup.sh +``` + +## Scripts + +| Script | Purpose | Time | +|--------|---------|------| +| `install_dcv.sh` | NICE DCV remote desktop + Ubuntu Desktop | ~10 min | +| `install_devtools.sh` | Node.js, npm, Claude Code CLI | ~2 min | +| `install_sim.sh` | All 3 sim lanes (Gazebo, MuJoCo, Isaac Lab) | ~45 min | +| `test_headless.sh` | Headless verification + scoring | ~6 min | +| `test_gui.sh` | Visual tests on DCV desktop | manual | +| `setup.sh` | Master: runs `install_sim.sh` then `test_headless.sh` | ~50 min | + +## Master Script (`setup.sh`) + +```bash +bash aws/setup.sh # full install + headless tests +bash aws/setup.sh --skip-install # headless tests only (already installed) +bash aws/setup.sh --skip-isaac-build # skip 30-min Isaac Lab Docker build +bash aws/setup.sh --test-gui # include GUI tests after headless +``` + +## The 3 Simulation Lanes + +### Lane 1 — Gazebo (Official Eval) + +The official AIC evaluation environment. Uses a distrobox container wrapping the `aic_eval` Docker image. Scoring happens here — train anywhere, but always validate in Gazebo. + +```bash +# Terminal 1: Start Gazebo eval +aic-eval-gt # ground_truth:=true (for CheatCode/dev) +aic-eval-no-gt # ground_truth:=false (for real policies) + +# Terminal 2: Run a policy +aic-policy CheatCode # ground-truth oracle (needs gt=true) +aic-policy WaveArm # wave arm demo +``` + +### Lane 2 — MuJoCo (Fast Physics) + +MuJoCo with `ros2_control` — same controller interface as Gazebo. Good for fast iteration on policies. Uses the same ROS topics so policy code is simulator-agnostic. 
+ +```bash +# Terminal 1: Zenoh router +aic-zenoh + +# Terminal 2: MuJoCo bringup (launches viewer + ros2_control) +aic-mujoco + +# Terminal 3: Policy +aic-policy CheatCode +``` + +**Note:** Only use the bringup (`aic-mujoco`). Don't also launch the standalone `simulate` binary — that creates two viewer windows. The bringup already includes the viewer. + +### Lane 3 — Isaac Lab (RL Training) + +Isaac Lab with Isaac Sim for RL training. Runs inside Docker. Requires `Intrinsic_assets` from NVIDIA for the full AIC task scene. + +```bash +# Enter the container +aic-isaac + +# Inside container: +isaaclab -p aic/aic_utils/aic_isaac/aic_isaaclab/scripts/random_agent.py \ + --task AIC-Task-v0 --num_envs 1 --enable_cameras + +# RL training (scale num_envs: 1 → 64 → 128 → 256) +isaaclab -p aic/aic_utils/aic_isaac/aic_isaaclab/scripts/rsl_rl/train.py \ + --task AIC-Task-v0 --num_envs 1 --enable_cameras +``` + +### Intrinsic_assets (Required for Isaac Lab) + +The NVIDIA asset pack is not included in the repo. Download manually: + +1. Go to [NVIDIA Developer Portal](https://developer.nvidia.com) (requires login) +2. Download `Intrinsic_assets.zip` +3. Extract to: `~/IsaacLab/aic/aic_utils/aic_isaac/aic_isaaclab/source/aic_task/aic_task/tasks/manager_based/aic_task/Intrinsic_assets/` + +The test scripts will `docker cp` the assets into the container automatically if they exist on the host. + +## GUI Tests (`test_gui.sh`) + +Run individual lanes or all at once. Requires DCV connection to see the GUI. 
+ +```bash +bash aws/test_gui.sh lane1 # Gazebo + CheatCode +bash aws/test_gui.sh lane2 # MuJoCo + CheatCode via ros2_control +bash aws/test_gui.sh lane3 # Isaac Lab random_agent +bash aws/test_gui.sh # all 3 +``` + +Each lane runs in a tmux session: +- `tmux attach -t aic_gz_gui` / `tmux attach -t aic_mj_gui` to monitor +- `tmux kill-session -t aic_gz_gui` / `tmux kill-session -t aic_mj_gui` to stop + +## System Requirements + +- Ubuntu 24.04 (Noble) +- NVIDIA GPU (L4 / A10G / L40S recommended) +- 32GB+ RAM +- 100GB+ disk (Isaac Lab Docker image alone is ~26GB) +- DCV or X11 for GUI tests + +## What Survives a Reboot + +| Component | Persists? | How to Restore | +|-----------|-----------|----------------| +| DCV server | Yes (systemd) | Automatic | +| Docker daemon | Yes (systemd) | Automatic | +| `aic_eval` distrobox | Yes (Docker container) | `distrobox enter -r aic_eval` | +| `isaac-lab-base` container | Yes (restart=unless-stopped) | Automatic, but re-run `pip install` inside | +| Pixi workspace | Yes (on disk) | Nothing needed | +| MuJoCo colcon workspace | Yes (on disk) | `source ~/ws_aic/install/setup.bash` | +| MJCF scene files | Yes (on disk) | Nothing needed | +| `/tmp/*.sdf` exports | No | Re-run `install_sim.sh` or the conversion section | +| Isaac Lab pip installs | No (container layer) | `test_headless.sh` re-installs automatically | +| Intrinsic_assets in container | No (docker cp) | Test scripts re-copy automatically | +| Shell aliases | Yes (~/.bashrc) | `source ~/.bashrc` | + +## Troubleshooting + +**DCV shows blank screen:** Run `xhost +local:docker` on the DCV desktop terminal. + +**Gazebo CheatCode scores 0:** CheatCode needs `ground_truth:=true`. Use `aic-eval-gt`, not `aic-eval-no-gt`. + +**MuJoCo "DOF 33 unstable" warning:** Cable physics warning — cosmetic, doesn't affect robot control. + +**Two MuJoCo windows:** You launched both `aic-mujoco` (bringup) and the standalone `simulate` binary. 
Only use one — the bringup includes its own viewer. + +**Isaac Lab "ModuleNotFoundError: isaaclab":** Run inside the container: `/workspace/isaaclab/_isaac_sim/python.sh -m pip install --no-build-isolation -e /workspace/isaaclab/source/isaaclab` + +**Isaac Lab "enable_cameras" crash:** Add `--enable_cameras` flag to any script that uses cameras.
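For quick triage, the checks above can be folded into a small pre-flight script. A hedged sketch; the checks and hint strings are this sketch's own invention, not part of any AIC tooling:

```python
# Tiny pre-flight helper mirroring the troubleshooting list above.
# ASSUMPTION: the checks and hint strings are illustrative only.
import os

def preflight(env: dict) -> list[str]:
    hints = []
    if not env.get("DISPLAY"):
        hints.append("No DISPLAY - GUI lanes need a DCV/X11 session")
    if "zenoh" not in env.get("RMW_IMPLEMENTATION", ""):
        hints.append("RMW_IMPLEMENTATION is not rmw_zenoh_cpp - "
                     "Zenoh topics may not bridge")
    return hints

print(preflight(dict(os.environ)))  # an empty list means both checks passed
```

Extend the check list as new failure modes show up in the tables above.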