Run TorchTitan GraphTrainer AutoParallel CI #452

Draft
sanketpurandare wants to merge 1 commit into sanketpurandare/stack/6 from sanketpurandare/stack/8

Conversation

Contributor

sanketpurandare commented May 8, 2026

Stacked PRs:


Run TorchTitan GraphTrainer AutoParallel CI

Extend the TorchTitan integration workflow to run the GraphTrainer AutoParallel integration tests for Llama3 FSDP+TP and DeepSeek V3 EFSDP+EP.

Also run the GraphTrainer AutoParallel numerics tests for Llama3 and DeepSeek V3. The DeepSeek V3 commands disable NCCL NVLS to match the stable TorchTitan numerics setup on the four-GPU AutoParallel CI runner.
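As a rough illustration of the NVLS point above, here is a minimal Python sketch of launching a command with NVLS turned off. It assumes the switch is NCCL's `NCCL_NVLS_ENABLE` environment variable set to `0`; the description does not say how the workflow actually disables NVLS, and the helper names and the probe command are hypothetical stand-ins for the real DeepSeek V3 test commands.

```python
import os
import subprocess
import sys


def env_without_nvls(base=None):
    """Return a copy of the environment with NCCL NVLS turned off."""
    env = dict(os.environ if base is None else base)
    # Assumption: NVLS is disabled through NCCL's NCCL_NVLS_ENABLE variable.
    env["NCCL_NVLS_ENABLE"] = "0"
    return env


def run_with_nvls_disabled(cmd):
    """Run cmd in a child process that sees the NVLS-disabled environment."""
    return subprocess.run(
        cmd, env=env_without_nvls(), check=True, capture_output=True, text=True
    )


if __name__ == "__main__":
    # Hypothetical probe: a real CI step would launch the numerics test here.
    probe = [sys.executable, "-c", "import os; print(os.environ['NCCL_NVLS_ENABLE'])"]
    print(run_with_nvls_disabled(probe).stdout.strip())
```

Setting the variable per command, rather than globally for the job, would keep the Llama3 runs on their default NCCL configuration.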

sanketpurandare added a commit that referenced this pull request May 8, 2026
stack-info: PR: #452, branch: sanketpurandare/stack/8
sanketpurandare force-pushed the sanketpurandare/stack/8 branch from d32c5ae to 4b5fd26 May 8, 2026 08:11
meta-cla bot added the CLA Signed label May 8, 2026
sanketpurandare marked this pull request as draft May 8, 2026 08:11
sanketpurandare changed the base branch from sanketpurandare/stack/6 to main May 8, 2026 09:12
sanketpurandare added a commit that referenced this pull request May 8, 2026
stack-info: PR: #452, branch: sanketpurandare/stack/8
sanketpurandare force-pushed the sanketpurandare/stack/8 branch from 4b5fd26 to d6c67bb May 8, 2026 09:12
sanketpurandare changed the base branch from main to sanketpurandare/stack/6 May 8, 2026 09:13
sanketpurandare changed the base branch from sanketpurandare/stack/6 to main May 8, 2026 09:16
sanketpurandare added a commit that referenced this pull request May 8, 2026
stack-info: PR: #452, branch: sanketpurandare/stack/8
sanketpurandare force-pushed the sanketpurandare/stack/8 branch from d6c67bb to 1d7789a May 8, 2026 09:16
sanketpurandare changed the base branch from main to sanketpurandare/stack/6 May 8, 2026 09:17
sanketpurandare added a commit that referenced this pull request May 8, 2026
stack-info: PR: #452, branch: sanketpurandare/stack/8
sanketpurandare force-pushed the sanketpurandare/stack/8 branch from 1d7789a to 7b1fb64 May 8, 2026 09:28
sanketpurandare force-pushed the sanketpurandare/stack/6 branch 2 times, most recently from 749a930 to 0592da0 May 8, 2026 10:51
sanketpurandare added a commit that referenced this pull request May 8, 2026
stack-info: PR: #452, branch: sanketpurandare/stack/8
sanketpurandare force-pushed the sanketpurandare/stack/8 branch from 7b1fb64 to bcca6c2 May 8, 2026 10:51
sanketpurandare force-pushed the sanketpurandare/stack/6 branch from 0592da0 to e55d303 May 8, 2026 22:43
sanketpurandare added a commit that referenced this pull request May 8, 2026
stack-info: PR: #452, branch: sanketpurandare/stack/8
sanketpurandare force-pushed the sanketpurandare/stack/8 branch from bcca6c2 to be22e1f May 8, 2026 22:43
sanketpurandare force-pushed the sanketpurandare/stack/6 branch from e55d303 to 5d994d4 May 8, 2026 23:31
sanketpurandare added a commit that referenced this pull request May 8, 2026
stack-info: PR: #452, branch: sanketpurandare/stack/8
sanketpurandare force-pushed the sanketpurandare/stack/8 branch from be22e1f to f19b414 May 8, 2026 23:31
sanketpurandare force-pushed the sanketpurandare/stack/6 branch from 5d994d4 to f235ee8 May 8, 2026 23:51
sanketpurandare added a commit that referenced this pull request May 8, 2026
stack-info: PR: #452, branch: sanketpurandare/stack/8
sanketpurandare force-pushed the sanketpurandare/stack/8 branch from f19b414 to c4779f1 May 8, 2026 23:51
sanketpurandare force-pushed the sanketpurandare/stack/6 branch from f235ee8 to b61c192 May 11, 2026 19:39
sanketpurandare force-pushed the sanketpurandare/stack/8 branch from c4779f1 to 66c6ad6 May 11, 2026 19:39

Labels

CLA Signed (this label is managed by the Meta Open Source bot)

1 participant