action dataloader: episode-shuffle stream (fix DROID grad-norm instability) by fwd4 · Pull Request #37 · NVIDIA/cosmos-framework

fwd4 · 2026-06-12T03:20:14Z

Problem

The DROID action SFT dataloader trained with an unstable, slow-settling grad-norm (and a noisy action-loss plateau) compared with the internal reference. Root cause: the DROID action dataset is map-style and does not shuffle (unlike the iterable vision SFTDataset, which self-shuffles), and RankPartitionedDataLoader wraps it in a DataLoader with no shuffle, i.e. a SequentialSampler. Every rank then iterates the same consecutive, overlapping windows, so the all-reduced global batch effectively covers ~1 episode, which produces high gradient variance.

(Forward + gradients were verified numerically equivalent to the internal model on identical input, so this was a data-path issue, not the model/loss/optimizer.)
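
To make the failure mode concrete, here is a minimal pure-Python sketch (not the repository's code). The episode lengths, rank count, and per-rank batch size are toy assumptions, and `episode_of` / `sequential_global_batch` are hypothetical helpers that only model "every rank reads the same consecutive indices".

```python
from itertools import accumulate


def episode_of(flat_index, episode_ends):
    """Return the episode id owning a flat window index (assumed flat layout)."""
    for episode_id, end in enumerate(episode_ends):
        if flat_index < end:
            return episode_id
    raise IndexError(flat_index)


def sequential_global_batch(num_ranks, per_rank_batch, step):
    """Indices each rank reads at `step` under an unsharded sequential sampler."""
    start = step * per_rank_batch
    per_rank = list(range(start, start + per_rank_batch))
    # No rank sharding: every rank sees the same consecutive indices.
    return [per_rank for _ in range(num_ranks)]


# Assumed toy dataset: 4 episodes of 100 overlapping windows each.
episode_lengths = [100, 100, 100, 100]
episode_ends = list(accumulate(episode_lengths))

batch = sequential_global_batch(num_ranks=8, per_rank_batch=16, step=0)
episodes_seen = {episode_of(i, episode_ends) for rank in batch for i in rank}
unique_samples = {i for rank in batch for i in rank}

print(len(episodes_seen))   # distinct episodes in the global batch
print(len(unique_samples))  # distinct samples across 8 * 16 = 128 slots
```

In this toy setting the 128-slot global batch contains 16 distinct windows from a single episode, which is the low-diversity situation described above.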

Fix

ActionIterableShuffleDataset (iterable_shuffle=True): an IterableDataset view of the map-style dataset. It shards episodes across rank × worker, shuffles the episode order, and streams each episode sequentially. Batches are decorrelated while reads stay sequential, which preserves I/O locality and copy-on-write (COW) page sharing. A plain shuffle=True/RandomSampler instead does random-access I/O, which gave ~11 min/iter and OOM from broken COW. The design mirrors the internal iterable dataset's per-worker episode assignment.

Adds DROIDLeRobotDataset.get_shuffle_blocks() (per-episode/segment flat-index blocks the iterable streams).
No DataLoader/sampler change needed — IterableDataset is handled natively (sampler=None).
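
The streaming order can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the PR's implementation: `episode_shuffle_stream` is a hypothetical function, the `epoch` argument and the strided shard assignment are guesses at one reasonable design, and `blocks` merely stands in for what `get_shuffle_blocks()` is described as returning (one block of flat indices per episode/segment).

```python
import random


def episode_shuffle_stream(blocks, rank, world_size, worker_id, num_workers,
                           seed, epoch=0):
    """Yield flat indices: shuffled episode order, sequential inside an episode."""
    order = list(range(len(blocks)))
    # Same seed on every rank/worker, so all shards agree on one permutation.
    random.Random(seed + epoch).shuffle(order)
    shard_id = rank * num_workers + worker_id
    num_shards = world_size * num_workers
    # Strided split: each (rank, worker) pair owns a disjoint set of episodes.
    for block_id in order[shard_id::num_shards]:
        yield from blocks[block_id]


# Assumed stand-in for get_shuffle_blocks(): one index range per episode.
blocks = [range(0, 5), range(5, 9), range(9, 15), range(15, 18)]

streams = {
    (rank, worker): list(
        episode_shuffle_stream(blocks, rank, 2, worker, 2, seed=42)
    )
    for rank in range(2)
    for worker in range(2)
}
all_indices = sorted(i for stream in streams.values() for i in stream)
```

With 2 ranks × 2 workers and 4 episodes, each shard streams exactly one episode in order, and the union of all shards covers every index once. Whether the real stream repeats across epochs or runs indefinitely is not stated in this PR.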

Validation (8192 global batch)

| iter | this fix | internal ref | no-shuffle |
|------|----------|--------------|------------|
| 100 | grad-norm 2.9 | 4.7 | 21 |
| 450 | grad-norm 1.7 | 1.9 | n/a |

Per-component action loss converges to ~0.0055 (matches internal ~0.005; the no-shuffle run plateaued noisily at 0.03–0.07). Builds on #24 (recipe + FusedAdam optimizer).

🤖 Generated with Claude Code

Added commits (recipe correctness)

mode="policy" default — DROIDLeRobotDataset defaulted to mode="joint" (random forward_dynamics/inverse_dynamics/policy per sample), so the policy recipe was silently training multi-task. inverse_dynamics zeros the vision loss and forward_dynamics zeros the action loss, diluting each per-task loss by ~1/3 vs the policy-only internal run. Now defaults to policy (matching i4's DROIDLeRobotDataset); mode is also threaded through get_action_droid_sft_dataset.
max_num_tokens_after_packing=-1 — uncaps the packed-sequence length (NANO default 45056) to match the internal droid_lerobot_8b run, so the full vision sequence is processed per step. Does not change the per-token loss; widens the effective vision context per step.
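
The ~1/3 dilution from mode="joint" can be checked with a short calculation. The sketch assumes the three modes are drawn uniformly at random (the description says "random" but does not give the probabilities) and that policy samples keep both losses; `MODE_LOSSES` and `active_fraction` are illustrative names only.

```python
from fractions import Fraction

# Which loss terms each sampled mode keeps, per the description above.
MODE_LOSSES = {
    "policy": {"vision": True, "action": True},
    "forward_dynamics": {"vision": True, "action": False},  # action loss zeroed
    "inverse_dynamics": {"vision": False, "action": True},  # vision loss zeroed
}


def active_fraction(loss_name, mode_probs):
    """Expected fraction of samples that contribute to `loss_name`."""
    return sum(p for mode, p in mode_probs.items() if MODE_LOSSES[mode][loss_name])


# Assumption: mode="joint" draws the three modes uniformly at random.
joint = {mode: Fraction(1, 3) for mode in MODE_LOSSES}
policy_only = {"policy": Fraction(1)}

for name in ("vision", "action"):
    print(name, active_fraction(name, joint), active_fraction(name, policy_only))
```

Under that assumption each loss is active on 2/3 of joint-mode samples versus all policy-mode samples, which is the one-third reduction mentioned above.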

mli0603 · 2026-06-12T05:06:14Z

LGTM

lfengad

overall LGTM

ychao-nvidia · 2026-06-15T14:38:38Z

                "action_modality_embed",
            ],
-            lr=2.0e-04,  # matches internal droid_lerobot_8b_policy submit (--lr 2e-4)
+            lr=1.0e-04,  # sqrt-scaled for 2048 global batch (internal 2e-4 was for 8192 = 4x)


Is this change intended? Our internal ablation showed that fixing lr to 2.0e-4 is key to a high policy success rate.

It was accidentally introduced in one of the resource-constrained experiments; now reverted.
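
For reference, the arithmetic behind the "sqrt-scaled" comment in the diff above can be reproduced as below. `sqrt_scaled_lr` is a hypothetical helper written for this illustration; it only shows where the 1.0e-04 value came from, not a recommended setting.

```python
import math


def sqrt_scaled_lr(reference_lr, reference_batch, target_batch):
    """Square-root scaling: lr is multiplied by sqrt(target / reference batch)."""
    return reference_lr * math.sqrt(target_batch / reference_batch)


# 8192 -> 2048 is a 4x smaller global batch, so the factor is sqrt(1/4) = 0.5.
lr_2048 = sqrt_scaled_lr(reference_lr=2.0e-4, reference_batch=8192, target_batch=2048)
print(f"{lr_2048:.1e}")
```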

…ebased on main) Rebased onto current main. main NVIDIA#34 upstreamed the DROID dataset (joint_pos, use_state, keep-ranges filter, action_space) so droid_lerobot_dataset.py now carries only the get_shuffle_blocks helper grafted onto main's version; NVIDIA#29's recipe change (dropped /cluster override) is incorporated. Remaining contribution: action_policy_droid_nano recipe (mode=policy, lr=2e-4 @ 8192 global, max_num_tokens_after_packing=-1, scrubbed comments), the episode-shuffle stream (action_sft_dataset.py), the multi-node-capable SFT launcher (NNODES/NODE_RANK/MASTER_ADDR passthrough + EXTRA_TAIL_OVERRIDES), and the post-train doc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Hao Liang <haolia@nvidia.com>

…None joint_pos uses raw (un-normalized) joint actions, so DROIDLeRobotDataset sets action_normalization=None — but _build_result called normalize_action() unconditionally, which raises 'Unknown normalization method: None'. Guard it so None means raw actions (caught by a 2-node sanity run on the rebased branch). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Hao Liang <haolia@nvidia.com>
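
A minimal sketch of the kind of guard this commit describes, assuming nothing about the real signatures: the `normalize_action` below is a toy stand-in (only a hypothetical "minmax" method), and `maybe_normalize` is an illustrative name rather than the actual change in `_build_result`.

```python
def normalize_action(action, method, stats):
    """Toy stand-in: only a hypothetical 'minmax' method is implemented."""
    if method == "minmax":
        lows, highs = stats["min"], stats["max"]
        return [
            2.0 * (value - low) / (high - low) - 1.0
            for value, low, high in zip(action, lows, highs)
        ]
    raise ValueError(f"Unknown normalization method: {method}")


def maybe_normalize(action, method, stats=None):
    """Guard: method=None means the raw actions are returned untouched."""
    if method is None:
        return action
    return normalize_action(action, method, stats)


raw = [0.1, -0.4, 1.2]
print(maybe_normalize(raw, None))
print(maybe_normalize([0.5], "minmax", {"min": [0.0], "max": [1.0]}))
```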

…reference The bare recipe trained with the NANO default loss_scale=1.0, weighting the vision flow-matching loss 10x lower than the Cosmos3-Nano-Policy-DROID reference (which uses 10.0). Set it post-construction so the recipe reproduces without launcher overrides. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Hao Liang <haolia@nvidia.com>

ychao-nvidia · 2026-06-18T22:02:48Z

                            use_filter_dict=False,
                            filter_dict_path=None,


~~Need to change use_filter_dict to True.~~
Nvm. I see you override it in the launch command.

ychao-nvidia · 2026-06-18T22:15:04Z

+# docs/action_policy_droid_posttrain.md.
 #
 # Env vars (override for your filesystem):
 #   DATASET_PATH          DROID LeRobot v3.0 success split (…/droid_lerobot/success)


Suggested change

-# DATASET_PATH DROID LeRobot v3.0 success split (…/droid_lerobot/success)
+# DATASET_PATH Cosmos3-DROID success split (…/Cosmos3-DROID/success)

ychao-nvidia · 2026-06-18T22:15:24Z

 export DROID_ROOT="${DROID_ROOT:-$DATASET_PATH}"

-EXTRA_DATASET_CHECK='[[ -f "$DROID_ROOT/meta/info.json" ]] || { echo "ERROR: missing $DROID_ROOT/meta/info.json (prepare DROID LeRobot v3.0 — see docs/action_policy_droid_posttraining.md)" >&2; exit 1; }'
+EXTRA_DATASET_CHECK='[[ -f "$DROID_ROOT/meta/info.json" ]] || { echo "ERROR: missing $DROID_ROOT/meta/info.json (prepare DROID LeRobot v3.0 — see docs/action_policy_droid_posttrain.md)" >&2; exit 1; }'


Suggested change

-EXTRA_DATASET_CHECK='[[ -f "$DROID_ROOT/meta/info.json" ]] || { echo "ERROR: missing $DROID_ROOT/meta/info.json (prepare DROID LeRobot v3.0 — see docs/action_policy_droid_posttrain.md)" >&2; exit 1; }'
+EXTRA_DATASET_CHECK='[[ -f "$DROID_ROOT/meta/info.json" ]] || { echo "ERROR: missing $DROID_ROOT/meta/info.json (prepare Cosmos3-DROID — see docs/action_policy_droid_posttrain.md)" >&2; exit 1; }'

ychao-nvidia · 2026-06-18T22:15:49Z


 ```shell
-# Step 1: prepare DROID LeRobot v3.0 success split -> $DATASET_PATH (see "Inputs you provide")
+# Step 1: prepare DROID LeRobot v3.0 success split -> $DATASET_PATH (see "Inputs You Provide")


Suggested change

-# Step 1: prepare DROID LeRobot v3.0 success split -> $DATASET_PATH (see "Inputs You Provide")
+# Step 1: prepare Cosmos3-DROID success split -> $DATASET_PATH (see "Inputs You Provide")

ychao-nvidia · 2026-06-18T22:21:20Z

@@ -28,69 +37,79 @@ be provided per environment:
   filtering is run out-of-band (not yet in this repo). Point `DROID_ROOT` at the resulting
   `…/droid_lerobot/success` directory (must contain `meta/info.json`).


Suggested change

-`…/droid_lerobot/success` directory (must contain `meta/info.json`).
+`…/Cosmos3-DROID/success` directory (must contain `meta/info.json`).

ychao-nvidia · 2026-06-18T23:30:50Z

 bash examples/launch_sft_action_policy_droid.sh
 ```

 The recipe TOML (`examples/toml/sft_config/action_policy_droid_repro.toml`) sets the scalar


Suggested change

-The recipe TOML (`examples/toml/sft_config/action_policy_droid_repro.toml`) sets the scalar
+The recipe TOML ([`examples/toml/sft_config/action_policy_droid_repro.toml`](../examples/toml/sft_config/action_policy_droid_repro.toml)) sets the scalar

ychao-nvidia · 2026-06-18T23:33:00Z

+                            iterable_shuffle=True,  # rank x worker episode-shuffle stream
+                            episode_shuffle_seed=42,
                            use_image_augmentation=True,  # SR boost (random crop+rescale + color jitter)
                            # Keep-ranges window filter (drops idle/non-task frames). Off by default;


Suggested change

-# Keep-ranges window filter (drops idle/non-task frames). Off by default;
+# keep_ranges_1_0_1.json window filter (drops idle/non-task frames). Off by default;

ychao-nvidia · 2026-06-18T23:33:27Z

+  -o $BASE_CHECKPOINT_PATH \
+  --checkpoint-path Cosmos3-Nano
+
+# Step 3: download the keep-ranges window filter (drops idle/non-task frames -> trains


Suggested change

-# Step 3: download the keep-ranges window filter (drops idle/non-task frames -> trains
+# Step 3: download the keep_ranges_1_0_1.json window filter (drops idle/non-task frames -> trains

ychao-nvidia · 2026-06-18T23:33:39Z

 export BASE_CHECKPOINT_PATH=/path/to/base_checkpoint
 export WAN_VAE_PATH=/path/to/Wan2.2_VAE.pth
 export NPROC_PER_NODE=8
+# Enable the keep-ranges filter via EXTRA_TAIL_OVERRIDES (space-separated Hydra


Suggested change

-# Enable the keep-ranges filter via EXTRA_TAIL_OVERRIDES (space-separated Hydra
+# Enable the keep_ranges_1_0_1.json filter via EXTRA_TAIL_OVERRIDES (space-separated Hydra

ychao-nvidia · 2026-06-18T23:33:59Z

-`--config-overrides "trainer.max_iter=10" "checkpoint.save_iter=10"` (and a small
-`data_parallel_shard_degree`). Use this to validate the recipe composes and the dataset opens
-before any large allocation.
+The **keep-ranges filter** maps each DROID trajectory key to a list of `[start, end]` frame


Suggested change

-The **keep-ranges filter** maps each DROID trajectory key to a list of `[start, end]` frame
+The **keep_ranges_1_0_1.json filter** maps each DROID trajectory key to a list of `[start, end]` frame
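
As a rough illustration of how such a filter could be applied to training windows, here is a small Python sketch. Only the "trajectory key to list of `[start, end]` frame ranges" shape comes from the doc line quoted above; the inclusive range ends, the rule that a window must lie fully inside one range, the key `traj_a`, and the helper `window_is_kept` are all assumptions made for this example.

```python
def window_is_kept(start, length, keep_ranges):
    """True if the window [start, start + length) lies inside one keep range."""
    end = start + length - 1  # last frame of the window
    # Assumption: [lo, hi] ranges are inclusive on both ends.
    return any(lo <= start and end <= hi for lo, hi in keep_ranges)


# Hypothetical filter content: trajectory key -> list of [start, end] ranges.
keep = {"traj_a": [[10, 49], [80, 119]]}

starts = [0, 10, 40, 45, 80, 110]
kept = [s for s in starts if window_is_kept(s, 10, keep["traj_a"])]
print(kept)  # window starts that survive the filter
```

Here the windows starting at frames 0 and 45 are dropped because they fall partly outside the listed ranges.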

fwd4 force-pushed the droid-action-shuffle branch from f786168 to 8eec346 Compare June 12, 2026 03:25

fwd4 requested review from lfengad, mli0603 and ychao-nvidia June 12, 2026 03:38

mli0603 enabled auto-merge (squash) June 12, 2026 05:06

lfengad reviewed Jun 12, 2026


Comment thread cosmos_framework/data/vfm/action/datasets/action_sft_dataset.py

lfengad previously approved these changes Jun 12, 2026


fwd4 dismissed lfengad’s stale review via 9597d5c June 15, 2026 12:45

ychao-nvidia reviewed Jun 15, 2026


fwd4 force-pushed the droid-action-shuffle branch from ae3db20 to f34031a Compare June 17, 2026 04:40

fwd4 requested review from foreverlms, lfengad and ychao-nvidia June 17, 2026 06:50

This was referenced Jun 17, 2026

Question about Cosmos3-Nano-Policy-DROID #44

Open

fix(action): handle action_normalization=None in ActionBaseDataset._b… #52

Open

ychao-nvidia reviewed Jun 18, 2026

