[fix] handle FSDP DTensor in broadcast_from_megatron_pp #113

Open
yxs wants to merge 1 commit into ISEEKYAN:main from yxs:fix/fsdp-dtensor-broadcast

yxs commented Apr 2, 2026

What does this PR do?

Megatron FSDP (ZeRO-3) stores parameters as DTensors. When export_weights broadcasts params across PP ranks, torch.distributed.broadcast() triggers DTensor dispatch and fails:

AssertionError: found no DeviceMesh from dtensor args for c10d.broadcast_.default!

Fix: call DTensor.full_tensor() to materialize the full parameter before broadcasting. The change is backward compatible and is a no-op for non-FSDP parameters.
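
A minimal sketch of the idea, for readers who want to see its shape. This is not the code in this PR: `materialize_for_broadcast` is a hypothetical helper name, and the fallback import path is an assumption about older PyTorch releases. The only DTensor call used is `full_tensor()`.

```python
try:
    # Public import path in recent PyTorch releases.
    from torch.distributed.tensor import DTensor
except ImportError:
    try:
        # Private path used by older releases (assumption about the installed version).
        from torch.distributed._tensor import DTensor
    except ImportError:
        # No DTensor support available; every input is treated as a plain tensor.
        DTensor = None


def materialize_for_broadcast(param):
    """Return a tensor that can be passed to torch.distributed.broadcast().

    Hypothetical helper for illustration only.
    """
    if DTensor is not None and isinstance(param, DTensor):
        # Gather the shards over the DTensor's own DeviceMesh so that the
        # result is a regular torch.Tensor holding the full parameter.
        return param.full_tensor()
    # Non-FSDP parameters are already plain tensors: return them unchanged.
    return param
```

The caller would then broadcast the returned tensor over the PP group instead of the original parameter. Note that `full_tensor()` is itself a collective over the DTensor's mesh, so every rank in that mesh has to reach the call.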

Megatron FSDP (ZeRO-3) stores parameters as DTensors. When
export_weights broadcasts params across PP ranks,
torch.distributed.broadcast() triggers DTensor dispatch and fails
because the PP group is not in the DTensor's DeviceMesh.

Fix: call DTensor.full_tensor() to materialize the full parameter
before broadcasting.

yxs commented Apr 3, 2026

@ISEEKYAN could you please take a look?