action dataloader: episode-shuffle stream (fix DROID grad-norm instability)#37
fwd4 wants to merge 3 commits
f786168 to 8eec346
LGTM
| "action_modality_embed", | ||
| ], | ||
| lr=2.0e-04, # matches internal droid_lerobot_8b_policy submit (--lr 2e-4) | ||
| lr=1.0e-04, # sqrt-scaled for 2048 global batch (internal 2e-4 was for 8192 = 4x) |
Is this change intended? Our internal ablation showed that fixing lr to 2.0e-4 is key to a high policy success rate.
It was accidentally introduced in one of the resource-constrained experiments; now reverted.
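For context, the `1.0e-04` in the reverted line comes from square-root learning-rate scaling of the internal `2e-4` @ 8192 setting down to a 2048 global batch, as its inline comment states. A minimal sketch of that arithmetic (the helper name `sqrt_scaled_lr` is hypothetical and not part of this repo):

```python
import math


def sqrt_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale a learning rate by the square root of the batch-size ratio."""
    return base_lr * math.sqrt(new_batch / base_batch)


# Internal reference: lr=2e-4 at a global batch of 8192.
# The resource-constrained experiment used a 2048 global batch (4x smaller).
lr_2048 = sqrt_scaled_lr(2.0e-4, base_batch=8192, new_batch=2048)
print(f"{lr_2048:.1e}")
```

This only shows where the reverted number came from; per the review above, the recipe keeps `lr=2.0e-04`.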
…ebased on main) Rebased onto current main. main NVIDIA#34 upstreamed the DROID dataset (joint_pos, use_state, keep-ranges filter, action_space) so droid_lerobot_dataset.py now carries only the get_shuffle_blocks helper grafted onto main's version; NVIDIA#29's recipe change (dropped /cluster override) is incorporated. Remaining contribution: action_policy_droid_nano recipe (mode=policy, lr=2e-4 @ 8192 global, max_num_tokens_after_packing=-1, scrubbed comments), the episode-shuffle stream (action_sft_dataset.py), the multi-node-capable SFT launcher (NNODES/NODE_RANK/MASTER_ADDR passthrough + EXTRA_TAIL_OVERRIDES), and the post-train doc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Hao Liang <haolia@nvidia.com>
ae3db20 to f34031a
…None joint_pos uses raw (un-normalized) joint actions, so DROIDLeRobotDataset sets action_normalization=None — but _build_result called normalize_action() unconditionally, which raises 'Unknown normalization method: None'. Guard it so None means raw actions (caught by a 2-node sanity run on the rebased branch). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Hao Liang <haolia@nvidia.com>
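The guard described in this commit can be sketched as follows. This is an illustration only: the `normalize_action` below is a hypothetical stand-in whose signature and `"minmax"` method are assumptions, and `maybe_normalize` is an invented name, not the actual `_build_result` code.

```python
from typing import List, Optional, Sequence


def normalize_action(action: Sequence[float], method: Optional[str]) -> List[float]:
    """Hypothetical stand-in for the repo's normalizer (signature assumed)."""
    if method == "minmax":
        lo, hi = min(action), max(action)
        span = (hi - lo) or 1.0  # avoid division by zero for constant input
        return [2.0 * (a - lo) / span - 1.0 for a in action]
    raise ValueError(f"Unknown normalization method: {method}")


def maybe_normalize(action: Sequence[float], method: Optional[str]) -> List[float]:
    """Treat method=None as 'use raw actions' instead of calling the normalizer."""
    if method is None:
        return list(action)
    return normalize_action(action, method)


raw = [0.1, -0.4, 1.2]
print(maybe_normalize(raw, None))      # raw joint actions pass through unchanged
print(maybe_normalize(raw, "minmax"))  # any other method is still normalized
```

Calling the stand-in `normalize_action(raw, None)` directly reproduces the `Unknown normalization method: None` failure mode the commit message describes.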
…reference The bare recipe trained with the NANO default loss_scale=1.0, weighting the vision flow-matching loss 10x lower than the Cosmos3-Nano-Policy-DROID reference (which uses 10.0). Set it post-construction so the recipe reproduces without launcher overrides. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Hao Liang <haolia@nvidia.com>
| use_filter_dict=False, | ||
| filter_dict_path=None, |
Need to change use_filter_dict to True.
Nvm. I see you override it in the launch command.
| # docs/action_policy_droid_posttrain.md. | ||
| # | ||
| # Env vars (override for your filesystem): | ||
| # DATASET_PATH DROID LeRobot v3.0 success split (…/droid_lerobot/success) |
| # DATASET_PATH DROID LeRobot v3.0 success split (…/droid_lerobot/success) | |
| # DATASET_PATH Cosmos3-DROID success split (…/Cosmos3-DROID/success) |
| export DROID_ROOT="${DROID_ROOT:-$DATASET_PATH}" | ||
|
|
||
| EXTRA_DATASET_CHECK='[[ -f "$DROID_ROOT/meta/info.json" ]] || { echo "ERROR: missing $DROID_ROOT/meta/info.json (prepare DROID LeRobot v3.0 — see docs/action_policy_droid_posttraining.md)" >&2; exit 1; }' | ||
| EXTRA_DATASET_CHECK='[[ -f "$DROID_ROOT/meta/info.json" ]] || { echo "ERROR: missing $DROID_ROOT/meta/info.json (prepare DROID LeRobot v3.0 — see docs/action_policy_droid_posttrain.md)" >&2; exit 1; }' |
| EXTRA_DATASET_CHECK='[[ -f "$DROID_ROOT/meta/info.json" ]] || { echo "ERROR: missing $DROID_ROOT/meta/info.json (prepare DROID LeRobot v3.0 — see docs/action_policy_droid_posttrain.md)" >&2; exit 1; }' | |
| EXTRA_DATASET_CHECK='[[ -f "$DROID_ROOT/meta/info.json" ]] || { echo "ERROR: missing $DROID_ROOT/meta/info.json (prepare Cosmos3-DROID — see docs/action_policy_droid_posttrain.md)" >&2; exit 1; }' |
|
|
||
| ```shell | ||
| # Step 1: prepare DROID LeRobot v3.0 success split -> $DATASET_PATH (see "Inputs you provide") | ||
| # Step 1: prepare DROID LeRobot v3.0 success split -> $DATASET_PATH (see "Inputs You Provide") |
| # Step 1: prepare DROID LeRobot v3.0 success split -> $DATASET_PATH (see "Inputs You Provide") | |
| # Step 1: prepare Cosmos3-DROID success split -> $DATASET_PATH (see "Inputs You Provide") |
| @@ -28,69 +37,79 @@ be provided per environment: | |||
| filtering is run out-of-band (not yet in this repo). Point `DROID_ROOT` at the resulting | |||
| `…/droid_lerobot/success` directory (must contain `meta/info.json`). | |||
| `…/droid_lerobot/success` directory (must contain `meta/info.json`). | |
| `…/Cosmos3-DROID/success` directory (must contain `meta/info.json`). |
| bash examples/launch_sft_action_policy_droid.sh | ||
| ``` | ||
|
|
||
| The recipe TOML (`examples/toml/sft_config/action_policy_droid_repro.toml`) sets the scalar |
| The recipe TOML (`examples/toml/sft_config/action_policy_droid_repro.toml`) sets the scalar | |
| The recipe TOML ([`examples/toml/sft_config/action_policy_droid_repro.toml`](../examples/toml/sft_config/action_policy_droid_repro.toml)) sets the scalar |
| iterable_shuffle=True, # rank x worker episode-shuffle stream | ||
| episode_shuffle_seed=42, | ||
| use_image_augmentation=True, # SR boost (random crop+rescale + color jitter) | ||
| # Keep-ranges window filter (drops idle/non-task frames). Off by default; |
| # Keep-ranges window filter (drops idle/non-task frames). Off by default; | |
| # keep_ranges_1_0_1.json window filter (drops idle/non-task frames). Off by default; |
| -o $BASE_CHECKPOINT_PATH \ | ||
| --checkpoint-path Cosmos3-Nano | ||
|
|
||
| # Step 3: download the keep-ranges window filter (drops idle/non-task frames -> trains |
| # Step 3: download the keep-ranges window filter (drops idle/non-task frames -> trains | |
| # Step 3: download the keep_ranges_1_0_1.json window filter (drops idle/non-task frames -> trains |
| export BASE_CHECKPOINT_PATH=/path/to/base_checkpoint | ||
| export WAN_VAE_PATH=/path/to/Wan2.2_VAE.pth | ||
| export NPROC_PER_NODE=8 | ||
| # Enable the keep-ranges filter via EXTRA_TAIL_OVERRIDES (space-separated Hydra |
| # Enable the keep-ranges filter via EXTRA_TAIL_OVERRIDES (space-separated Hydra | |
| # Enable the keep_ranges_1_0_1.json filter via EXTRA_TAIL_OVERRIDES (space-separated Hydra |
| `--config-overrides "trainer.max_iter=10" "checkpoint.save_iter=10"` (and a small | ||
| `data_parallel_shard_degree`). Use this to validate the recipe composes and the dataset opens | ||
| before any large allocation. | ||
| The **keep-ranges filter** maps each DROID trajectory key to a list of `[start, end]` frame |
| The **keep-ranges filter** maps each DROID trajectory key to a list of `[start, end]` frame | |
| The **keep_ranges_1_0_1.json filter** maps each DROID trajectory key to a list of `[start, end]` frame |
Problem
The DROID action SFT dataloader trained with an unstable, slow-settling grad-norm (and a noisy action-loss plateau) compared with the internal reference. Root cause: the DROID action dataset is map-style and, unlike the iterable vision `SFTDataset` (which self-shuffles), does not shuffle; `RankPartitionedDataLoader` wraps it in a `DataLoader` with no `shuffle`, i.e. a `SequentialSampler`. Every rank then iterates the same consecutive, overlapping windows, so the all-reduced global batch is effectively ~1 episode, which gives high gradient variance.

(Forward + gradients were verified numerically equivalent to the internal model on identical input, so this was a data-path issue, not the model/loss/optimizer.)
Fix
- `ActionIterableShuffleDataset` (`iterable_shuffle=True`): an `IterableDataset` view of the map-style dataset that streams rank × worker-sharded, episode-order-shuffled, sequential-within-episode samples. This gives decorrelated batches with sequential reads (preserves I/O locality + copy-on-write; a plain `shuffle=True`/`RandomSampler` instead does random-access I/O, leading to ~11 min/iter and OOM from broken COW). Mirrors the internal iterable dataset's per-worker episode assignment.
- `DROIDLeRobotDataset.get_shuffle_blocks()` (per-episode/segment flat-index blocks the iterable streams).
- No `DataLoader`/sampler change needed: `IterableDataset` is handled natively (`sampler=None`).

Validation (8192 global batch)
Per-component action loss converges to ~0.0055 (matches internal ~0.005; the no-shuffle run plateaued noisily at 0.03–0.07). Builds on #24 (recipe + FusedAdam optimizer).
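The streaming order described under Fix can be illustrated with a small, dependency-free sketch. It is not the `ActionIterableShuffleDataset` implementation: the function name `episode_shuffle_stream`, its parameters, and the strided shard assignment are assumptions chosen for illustration, and `blocks` stands in for the kind of per-episode flat-index lists that `get_shuffle_blocks()` is described as returning.

```python
import random
from typing import Iterator, Sequence


def episode_shuffle_stream(
    blocks: Sequence[Sequence[int]],
    rank: int,
    world_size: int,
    worker_id: int,
    num_workers: int,
    seed: int = 42,
    epoch: int = 0,
) -> Iterator[int]:
    """Yield flat sample indices for one (rank, worker) shard."""
    # Every shard draws the same permutation, so the shards stay disjoint.
    order = list(range(len(blocks)))
    random.Random(seed + epoch).shuffle(order)

    # Strided assignment of shuffled episodes to rank x worker shards.
    shard_id = rank * num_workers + worker_id
    num_shards = world_size * num_workers
    for episode in order[shard_id::num_shards]:
        # Sequential within an episode keeps reads contiguous.
        yield from blocks[episode]


# Toy data: 6 episodes of 3 consecutive flat indices each.
blocks = [list(range(3 * e, 3 * e + 3)) for e in range(6)]
shards = [
    list(episode_shuffle_stream(blocks, r, 2, w, 2))
    for r in range(2)
    for w in range(2)
]
for shard in shards:
    print(shard)
```

In a PyTorch `IterableDataset`, the worker index would typically come from `torch.utils.data.get_worker_info()` and the rank from `torch.distributed.get_rank()`. Note that this sketch produces shards of unequal length when the episode count is not a multiple of the shard count; a training stream would need to handle that, for example by cycling over epochs.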
🤖 Generated with Claude Code
Added commits (recipe correctness)
- `mode="policy"` default: `DROIDLeRobotDataset` defaulted to `mode="joint"` (random forward_dynamics/inverse_dynamics/policy per sample), so the policy recipe was silently training multi-task. `inverse_dynamics` zeros the vision loss and `forward_dynamics` zeros the action loss, diluting each per-task loss by ~1/3 vs the policy-only internal run. Now defaults to `policy` (matching i4's `DROIDLeRobotDataset`); `mode` is also threaded through `get_action_droid_sft_dataset`.
- `max_num_tokens_after_packing=-1`: uncaps the packed-sequence length (NANO default 45056) to match the internal `droid_lerobot_8b` run, so the full vision sequence is processed per step. Does not change the per-token loss; widens the effective vision context per step.
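The ~1/3 dilution under `mode="joint"` can be checked with a short calculation. This sketch assumes the three tasks are sampled uniformly per sample, which the description above does not state explicitly; `active_fraction` and the two lookup tables are illustrative names, not repo code.

```python
from fractions import Fraction

MODES = ("policy", "forward_dynamics", "inverse_dynamics")
# Which loss each task zeroes out, per the description above.
ZEROED_LOSS = {"inverse_dynamics": "vision", "forward_dynamics": "action"}


def active_fraction(loss_name: str, modes=MODES) -> Fraction:
    """Fraction of samples on which `loss_name` is non-zero (uniform sampling assumed)."""
    active = [m for m in modes if ZEROED_LOSS.get(m) != loss_name]
    return Fraction(len(active), len(modes))


for loss in ("action", "vision"):
    joint = active_fraction(loss)
    policy_only = active_fraction(loss, modes=("policy",))
    print(f"{loss}: joint={joint}, policy-only={policy_only}")
```

Under that assumption each loss is active on 2/3 of samples in joint mode versus all samples in policy-only mode, consistent with the ~1/3 dilution quoted above.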