[feat] Generalized Tensor Parallelism (GTP) #4967
Open: fanshiqing wants to merge 64 commits into NVIDIA:main from fanshiqing:gtp_release
+8,998 −280

Commits (64):
162217c Generalized Tensor Parallelism (GTP) init commit (fanshiqing)
1cab66a fix conflicts (fanshiqing)
ccfee04 code clean (fanshiqing)
8b4041a fix comments; defer wgrad->dgrad support in following up MR; (fanshiqing)
1903598 fix comments (fanshiqing)
9eb5007 code clean. (fanshiqing)
019536f Fix GTP broadcast_params + add partial DP-CP with GTP group (fanshiqing)
375f09c fix none-egpt sharded params's reduction in moe layer; fix comments (fanshiqing)
6872981 rename 'generalized_tensor_parallel_size' into 'generalized_tensor_pa… (fanshiqing)
6c28127 update README (fanshiqing)
02f76a7 fix comments (fanshiqing)
476aa05 Fix EGTP correctness on cudagraph bwd capture + main_param dedup (fanshiqing)
cdf5d35 fix comments (fanshiqing)
e885461 Merge remote-tracking branch 'adlr_github/main' into gtp_release (fanshiqing)
3c84291 Only batch with _foreach_add_ when finalizing multiple (routed) weight (fanshiqing)
392816a gtp+gmm-fusion: support offloading(moe-act-input) (fanshiqing)
fc570d0 GTP + full-iter CG (fanshiqing)
23ed3ba [feat]GTP: prefetch recompute-forward weight gathers via a separate c… (fanshiqing)
ecf2dd1 GTP: allocate GRAPHED buffers into CG mempool at creation; fix comments (fanshiqing)
1163b4a fix comments (fanshiqing)
5dc0423 fix onlince checks: copyright, intallation test, build. (fanshiqing)
ab869b9 fix te min version required for GTP. (fanshiqing)
45a604f fix online UTs; fix comments. (fanshiqing)
87a50cd fix UTs (fanshiqing)
107c077 fix UTs (fanshiqing)
9dca05a Fix GTP DDP bucket alignment for distributed optimizer; add correspon… (fanshiqing)
0906db0 fix formating (fanshiqing)
6cdfe5d fix regular ddp buffer bucket misalignment when GTP params are present (fanshiqing)
4d1e2eb add integration test for {mamba,attn,moe}+gtp; polish existing gtp an… (fanshiqing)
4695217 fix comments from Jimmy and Deepak (fanshiqing)
3725f51 feat: make (E)GTP a first-class orthogonal parallelism axis (fanshiqing)
84e6a7d code clean (fanshiqing)
42fd06f Merge remote-tracking branch 'adlr_github/main' into gtp_release (fanshiqing)
b8b078a code clean (fanshiqing)
83bea9f Generate the DDP param layerout for the GTP replicate group at it's s… (fanshiqing)
22fc6e6 fold the GTP intra DP groups into intra_dp_cp_group and intra_expt_dp… (fanshiqing)
0a55ed5 fix UTs (fanshiqing)
be22dce [feat] GTP+DCP (fanshiqing)
70ef35d rename gtp-exclude process group: with_gtp -> no_gtp (fanshiqing)
e430807 fix dense GTP NCCL group using stale 'ps' key (fanshiqing)
5a8c469 update README with scalability (fanshiqing)
aa40d0d fix comments (fanshiqing)
601a658 fix comments (fanshiqing)
eddb7ba Rename GTP remat knobs and add num-weight-shards user API (fanshiqing)
6806f43 Support GTP/EGTP in LayerWiseDistributedOptimizer and Muon (#3) (deepakn94)
7d71c08 GTP+DCP: simplify gtp replica_ids in MambaMixer.sharded_state_dict; a… (fanshiqing)
69dae5a GTP+Muon: fix DCP save/load; add corresponding UTs (fanshiqing)
f13a042 code clean and fix comments (fanshiqing)
0e3c3d2 Fix GTP DDP grad-ready firing before deferred wgrad accumulation (fanshiqing)
f6dca05 fix format and comments (fanshiqing)
c33667a Merge remote-tracking branch 'adlr_github/main' into gtp_release (fanshiqing)
ce02728 fix comments (fanshiqing)
ae8a571 fix linting (fanshiqing)
71a53c6 Merge remote-tracking branch 'adlr_github/main' into gtp_release (fanshiqing)
6aeecc1 Fix optional process-group fallbacks defeated by __getattr__; Log hum… (fanshiqing)
114b6fc Fix GTP grad norm inflated on CUDA-graph capture step; fix linting (fanshiqing)
3c7aa6c fix online UTs (fanshiqing)
1b066a5 Simplify GTP grad-norm fix: drop unnecessary bwd-graph backup (fanshiqing)
7d7e8c3 Move GTP from megatron.experimental into megatron.core (fanshiqing)
14464b5 GTP+CG: code clean: replace GTP bwd Phase-2 completion event with a r… (fanshiqing)
083c15f add gtp public API file (fanshiqing)
8aa2b6d GTP: clean up generalized_tensor_parallelism after the core move (fanshiqing)
dc629cc fix1: populate EGTP-excluded expert-DP groups in get_default_pg_colle… (fanshiqing)
00c9d20 Fix: defer global TP/DP group reads in _backfill_gtp_sharded_param_map (fanshiqing)
Binary files added:
docs/images/generalized_tensor_parallel/0525_gtp_mcore_te_architecture.png (+217 KB)
docs/images/generalized_tensor_parallel/0611_ddp_egtp_orthogonal_bucketing.png (+183 KB)
docs/images/generalized_tensor_parallel/0612_gtp_dcp_tp2gtp2_save_load.png (+184 KB)
docs/images/generalized_tensor_parallel/0613_gtp_dcp_save_call_workflow.png (+144 KB)
docs/images/generalized_tensor_parallel/0617_gtp64_weak_scaling_efficiency.png (+130 KB)