702 commits
439c777
Merge pull request #73 from argonne-lcf/docs-ucp-bug
saforem2 Dec 29, 2024
03da571
feat: Add `ALCF/examples/finetune_llama3/*`
saforem2 Jan 14, 2025
19bdff0
docs: Update `ALCF/examples/finetune_llama3/*`
saforem2 Jan 15, 2025
e2cb209
chore: Update `tools/hf2megads_weight_converter.py`
saforem2 Jan 15, 2025
a868788
feat: Add `ALCF/examples/finetune_llama3p2_1B/*`
saforem2 Jan 15, 2025
babef03
feat: Update `ALCF/examples/finetune_llama3/*`
saforem2 Jan 15, 2025
7727f93
Update README.md
saforem2 Jan 15, 2025
13666a1
Remove redundant `ALCF/examples/finetune_llama3p2_1B/*`
saforem2 Jan 15, 2025
6b5eed5
chore: Add `DummyOptimizer` to `tools/hf2megads_weight_converter.py`
saforem2 Jan 16, 2025
2f9e19d
fix: `NO_FLASH_ATTN` on Polaris in `ALCF/helpers.sh`
saforem2 Jan 16, 2025
0636aea
Add {sunspot, sophia} in `ALCF/examples/finetune_llama3/*`
saforem2 Jan 17, 2025
b800277
Merge branch 'main' into finetune-llama3
saforem2 Jan 18, 2025
adeca53
fix: Call `set_ccl_vars_on_aurora` only if `WORLD_SIZE > 1`
saforem2 Jan 19, 2025
4d0077c
Update README.md
saforem2 Jan 27, 2025
3af7eb4
Merge pull request #76 from argonne-lcf/saforem2-patch-2
saforem2 Jan 27, 2025
8098a70
Merge pull request #75 from argonne-lcf/fix-single-node
saforem2 Jan 28, 2025
e7990d5
added adopt optimizer
Jan 28, 2025
0948f84
adopt optimizer
Jan 28, 2025
1c04f64
fix: Resolve merge commit
saforem2 Jan 28, 2025
c1f99b9
chore: Update Llama FT
saforem2 Jan 28, 2025
3991f25
chore: Update `megatron/data/prompt_dataset.py`
saforem2 Jan 28, 2025
101d0ed
feat: Add `ALCF/examples/checkpoint_conversion/*`
saforem2 Jan 31, 2025
10903ef
docs: Update `ALCF/examples/checkpoint_conversion/README.md`
saforem2 Jan 31, 2025
19b3b74
Update README.md
saforem2 Mar 12, 2025
0dcc101
docs: Add `ALCF/notes/deprecated.md`
saforem2 Mar 12, 2025
a3424b6
docs: Update `ALCF/README.md`
saforem2 Mar 12, 2025
bf50938
docs: Update `ALCF/notes/deprecated.md`
saforem2 Mar 12, 2025
b9258f5
Merge branch 'main' into finetune-llama3
saforem2 Mar 12, 2025
cf49054
Merge pull request #74 from argonne-lcf/finetune-llama3
saforem2 Mar 12, 2025
6b8f092
Merge branch 'main' into saforem2-patch-2
saforem2 Mar 12, 2025
aedc2c2
Merge pull request #78 from argonne-lcf/saforem2-patch-2
saforem2 Mar 12, 2025
6fcf7d5
Update README.md
saforem2 Mar 12, 2025
39784a0
Merge pull request #79 from argonne-lcf/saforem2-patch-3
saforem2 Mar 12, 2025
a5fea86
added muon
Mar 20, 2025
f145195
added muon optimizer
Mar 24, 2025
8bfee6f
added muon optimizer
Mar 24, 2025
444af68
chore: formatting in `megatron/model/__init__.py`
saforem2 Mar 26, 2025
9c472b4
chore: formatting `megatron/utils.py`
saforem2 Mar 26, 2025
11f1433
Merge branch 'main' into lb-optimizers
saforem2 Mar 26, 2025
267d650
added infinite schedulers
Apr 21, 2025
6b6c63d
fix: Fix imports in `pretrain_gpt_alcf.py`
saforem2 Apr 22, 2025
7ee03d7
chore: Update `ALCF/helpers.sh`
saforem2 Apr 23, 2025
3c3eb45
feat: Add `train_alcf.sh`
saforem2 Apr 23, 2025
012800e
Merge branch 'update-ALCF-helpers' into fix/pretrain-gpt-alcf-imports
saforem2 Apr 23, 2025
68b53d9
Merge pull request #84 from argonne-lcf/fix/pretrain-gpt-alcf-imports
saforem2 Apr 23, 2025
f40b7e5
fix: Update `train_alcf.sh`
saforem2 Apr 23, 2025
61f13cf
Merge pull request #82 from argonne-lcf/fix/pretrain-gpt-alcf-imports
saforem2 Apr 30, 2025
85ac175
Merge branch 'main' into update-ALCF-helpers
saforem2 Apr 30, 2025
952940a
fix: Fix `ALCF/helpers.sh`
saforem2 Apr 30, 2025
b136aa5
fix: Replace `eval` with `bash -c` in `train_alcf.sh`
saforem2 Apr 30, 2025
9a3f6bd
chore: Fix unset `CFLAGS` in `ALCF/helpers.sh`
saforem2 May 1, 2025
61ce1c5
chore: Update `train_alcf.sh`
saforem2 May 5, 2025
ba01f41
added lr finder logic
May 6, 2025
d7e12df
fix: Remove call to `set_ccl_vars_on_aurora` in `ALCF/helpers.sh`
saforem2 May 6, 2025
e8efc70
chore: Clean up `train_alcf.sh`
saforem2 May 6, 2025
4669e65
Merge pull request #83 from argonne-lcf/update-ALCF-helpers
saforem2 May 6, 2025
fa73d59
feat: Resolve conflicts in `train_alcf.sh`
saforem2 May 19, 2025
ac0df1d
chore: Update `train_alcf.sh`
saforem2 Jun 16, 2025
48f300f
feat: Add `ALCF/notes/AuroraGPT-70B.md`
saforem2 Jun 16, 2025
5cd5da9
chore: Update `pretrain_gpt_alcf.py`
saforem2 Jun 16, 2025
be64b02
chore: Update `megatron/training.py`
saforem2 Jun 16, 2025
08ab7ee
chore: Update `train_alcf.sh`
saforem2 Jun 16, 2025
8c13fef
chore: Remove tensoboard tracking in `megatron/training_log.py`
saforem2 Jun 16, 2025
3e174cd
chore: Update `ALCF/helpers.sh`
saforem2 Jun 16, 2025
cb939d7
Merge pull request #86 from argonne-lcf/saforem2/dev
saforem2 Jun 16, 2025
ea99a52
Merge branch 'main' into saforem2/training
saforem2 Jun 16, 2025
0ddd003
feat: Remove `--use-mics` flag wen using `ZeRO` 3
saforem2 Jun 17, 2025
43794c1
docs: Update `ALCF/notes/AuroraGPT-70B.md`
saforem2 Jun 17, 2025
351cfeb
docs: Update `ALCF/notes/AuroraGPT-70B.md`
saforem2 Jun 17, 2025
e7157e4
docs: Update `ALCF/notes/AuroraGPT-70B.md`
saforem2 Jun 17, 2025
c9e5879
chore: Update `pretrain_gpt_alcf.py`
saforem2 Jun 18, 2025
b85d33b
chore: Update `megatron/training_log.py`
saforem2 Jun 18, 2025
82e0b2e
Merge pull request #87 from argonne-lcf/saforem2/training
saforem2 Jun 18, 2025
a9620b9
chore: Format `megatron/data/*`
saforem2 Jun 18, 2025
7af703b
chore: Format `megatron/text_generation/*`
saforem2 Jun 18, 2025
103467d
chore: Format `megatron/optimizer/*`
saforem2 Jun 18, 2025
e37ab7d
chore: Format `megatron/mpu/*`
saforem2 Jun 18, 2025
b22a2e1
chore: Format `megatron/tokenizer/*`
saforem2 Jun 18, 2025
ed02e5f
chore: Format `megatron/model/*`
saforem2 Jun 18, 2025
40e942a
chore: Format `megatron/core/*`
saforem2 Jun 18, 2025
f4cd2e1
chore: Format `megatron/*.py`
saforem2 Jun 18, 2025
3a6ffec
chore: Format `megatron/fused_kernels/*.py`
saforem2 Jun 18, 2025
c030485
infinite schedulers and learning rate finder
Jul 3, 2025
fdb0965
micromamba
Jul 11, 2025
a6b7bd2
emb_init branch changes added
Jul 11, 2025
8800bcc
cleaned up
Jul 11, 2025
5464ebf
Merge branch 'lb-optimizers' into saforem2/fix-formatting
saforem2 Jul 12, 2025
ffaafd7
fix: missing changes from `lb-optimizers` <- `saforem2/fix-formatting`
saforem2 Jul 12, 2025
c18e3ad
fix: Fix missing comma in `core/models/gpt/gpt_embedding.py`
saforem2 Jul 12, 2025
e9ca6d0
chore: Update `pretrain_gpt_alcf.py`
saforem2 Jul 12, 2025
a1d5588
chore: Update `ALCF/helpers.sh`
saforem2 Jul 12, 2025
3f668a1
feat: Add `ALCF/notes/debugging.md`
saforem2 Jul 14, 2025
3dc22e9
Update debugging.md
saforem2 Jul 14, 2025
386863b
chore: Update `train_alcf.sh`
saforem2 Jul 15, 2025
82b491f
chore: Update `megatron/training_log.py`
saforem2 Jul 15, 2025
9ffebe1
chore: Update `megatron/training.py`
saforem2 Jul 15, 2025
b4849ae
chore: Update `megatron/optimizer_param_scheduler.py`
saforem2 Jul 15, 2025
683386a
chore: Update `megatron/optimizer/muon.py`
saforem2 Jul 15, 2025
6165561
chore: Update `megatron/optimizer/adopt.py`
saforem2 Jul 15, 2025
558322d
chore: Update `megatron/optimizer/__init__.py`
saforem2 Jul 15, 2025
1656e98
chore: Update `megatron/core/transformer/transformer_config.py`
saforem2 Jul 15, 2025
102d797
chore: Update `megatron/checkpointing.py`
saforem2 Jul 15, 2025
f7f397b
chore: Update `megatron/arguments.py`
saforem2 Jul 15, 2025
c369de6
chore: Update `ALCF/helpers.sh`
saforem2 Jul 15, 2025
99dacb5
chore: Update `ALCF/helpers.sh`
saforem2 Jul 15, 2025
6b08a2d
docs: Update `ALCF/notes/debugging.md`
saforem2 Jul 15, 2025
984cbb1
chore: Update `megatron/training_log.py`
saforem2 Jul 15, 2025
ebb1899
chore: Update `megatron/timers.py`
saforem2 Jul 15, 2025
e3b0398
fixed infinite schedulers bugs and dshampoo name in arguments
Jul 22, 2025
1508ff5
chore: Update `ALCF/README.md`
saforem2 Aug 7, 2025
1224815
feat: Create `train.sh`
saforem2 Aug 7, 2025
6925291
chore: Update `megatron/training_log_alcf.py`
saforem2 Aug 7, 2025
dacd3d2
chore: Update `ALCF/helpers.sh`
saforem2 Aug 15, 2025
b691201
cache indices support
zhenghh04 Aug 16, 2025
d64abca
feat: Add `ALCF/data-lists/aurora/olmo-mix-1124.txt`
saforem2 Aug 21, 2025
7012ebc
chore: Update `train_alcf.sh`
Aug 21, 2025
b11e4f0
Merge branch 'saforem2/fix-formatting' of https://github.com/argonne-…
Aug 21, 2025
90aeb82
chore: Update `ALCF/helpers.sh`
saforem2 Aug 21, 2025
f41b3ab
chore: Update `ALCF/helpers.sh`
saforem2 Aug 21, 2025
df0c30a
chore: Update `megatron/training_log_alcf.py`
saforem2 Aug 21, 2025
eb10947
docs: Add `ALCF/notes/AuroraGPT-small.md`
saforem2 Aug 21, 2025
50050fd
docs: Update `ALCF/notes/AuroraGPT-small.md`
saforem2 Aug 21, 2025
f12d970
feat: Update `ALCF/data-lists/sunspot/books.txt`
Aug 21, 2025
7ab7e35
chore: Update `ALCF/helpers.sh`
saforem2 Aug 22, 2025
456abc6
Added muonclip and fixed lr_finder logic
Aug 24, 2025
1a3653d
Merge branch 'saforem2/fix-formatting' into feature/cache_indices
saforem2 Aug 25, 2025
e9467fa
Updated muonclip lr adjuster
Aug 25, 2025
99b2592
chore: Update `ALCF/helpers.sh`
saforem2 Aug 26, 2025
4eab242
feat: Add `train_aGPT_2B_large_batch.sh`
saforem2 Aug 26, 2025
3848f6f
docs: Update `ALCF/notes/*.md`
saforem2 Aug 26, 2025
96c5a10
chore: Add `train_aGPT_7B_chain.sh`
saforem2 Aug 26, 2025
fc4d167
added cooldown phase option to constant LR decay
Aug 27, 2025
9b46590
feat: Add `train_aGPT_2B_large_batch.sh`
saforem2 Aug 27, 2025
3d83690
Merge branch 'saforem2/fix-formatting' into feature/cache_indices
saforem2 Aug 27, 2025
ec58e99
Merge pull request #93 from argonne-lcf/feature/cache_indices
saforem2 Sep 3, 2025
994f2a1
Merge pull request #88 from argonne-lcf/saforem2/fix-formatting
saforem2 Sep 8, 2025
6b32c95
feat: Initial logic to prevent `NaN`s from crashing training
saforem2 Sep 11, 2025
e966d29
chore: Prevent checkpoint timers from bringing down training
saforem2 Sep 11, 2025
b4a0e7a
chore: Prevent checkpoint timers from bringing down training
saforem2 Sep 11, 2025
c72e2eb
chore: Update `ALCF/helpers.sh`
saforem2 Sep 11, 2025
da80b20
chore: Update `train_aGPT_2B_large_batch.sh`
saforem2 Sep 11, 2025
d2c6684
Merge pull request #14 from argonne-lcf/main
saforem2 Sep 17, 2025
f59df1d
custom optimizers, schedulers, hp tuning and CPT
mngom2 Sep 22, 2025
74b4cf3
Update lb_optimizers_settings_and_cpt.md
mngom2 Sep 22, 2025
d17ccad
details about doing cpt
mngom2 Sep 22, 2025
55f7963
Update and rename lb_optimizers_settings_and_cpt.md to lb_optimizers_…
mngom2 Sep 22, 2025
f077ce6
Update cpt.md
mngom2 Sep 22, 2025
9a9e69c
Update cpt.md
mngom2 Sep 22, 2025
a7825f1
Merge branch 'main' into saforem2/resilient-training
saforem2 Sep 23, 2025
56d0d80
chore: Update `ALCF/helpers.sh`
saforem2 Sep 23, 2025
7d79965
chore: Format `pretrain_gpt_alcf.py`
saforem2 Sep 23, 2025
9fed3b5
chore: Update `train_aGPT_2B_large_batch.sh`
saforem2 Sep 24, 2025
527fe81
Update cpt.md
mngom2 Sep 26, 2025
7d90dfe
Update cpt.md
mngom2 Sep 30, 2025
a513caf
Update cpt.md
mngom2 Sep 30, 2025
1385e26
Merge branch 'main' into saforem2/resilient-training
saforem2 Oct 7, 2025
653a679
Merge pull request #94 from argonne-lcf/saforem2/resilient-training
saforem2 Oct 7, 2025
41d1a41
Merge branch 'main' into main
saforem2 Oct 10, 2025
76bcc88
Merge pull request #15 from argonne-lcf/main
saforem2 Oct 10, 2025
07a7bbd
fix: Update `ALCF/helpers.sh`
saforem2 Oct 12, 2025
889ff8d
docs: Add `ALCF/notes/cooldown.md`
saforem2 Nov 10, 2025
d16272e
docs: Add `ALCF/notes/assets/`
saforem2 Nov 10, 2025
710246a
Add files via upload
saforem2 Nov 10, 2025
0eb4e65
Rename ScreenShot-2025-11-10-125411@2x.png to cooldownHD.png
saforem2 Nov 10, 2025
41d5c07
Replace cooldown image with high-definition version
saforem2 Nov 10, 2025
8da9b0d
Enhance cooldown.md with new title and example details
saforem2 Nov 10, 2025
b654baa
Fix markdown formatting in cooldown.md
saforem2 Nov 10, 2025
8db699a
initial commit
nscottnichols Nov 10, 2025
d68874d
Merge branch 'main' into auto_cooldown_script
saforem2 Nov 11, 2025
76f5f9d
docs: Update `ALCF/notes/cooldown.md`
saforem2 Nov 11, 2025
6c20809
docs: Update `ALCF/notes/cooldown.md`
saforem2 Nov 11, 2025
6b36b3e
fix: Update default `ROPE_THETA` in `ALCF/helpers.sh`
saforem2 Nov 18, 2025
366ce9c
Merge pull request #18 from argonne-lcf/main
saforem2 Dec 10, 2025
c39584d
Update cpt.md
mngom2 Jan 26, 2026
f74d237
Update cpt.md
mngom2 Jan 26, 2026
1f76b5a
Update cpt.md
mngom2 Jan 26, 2026
0f23c94
Update cpt.md
mngom2 Jan 26, 2026
04f5b1e
Update cpt.md
mngom2 Jan 26, 2026
c10a6f2
Update cpt.md
mngom2 Jan 26, 2026
1e55c76
Update cpt.md
mngom2 Jan 26, 2026
7abf24d
Update cpt.md
mngom2 Jan 26, 2026
b73f02d
cpt data mixing image
mngom2 Jan 26, 2026
c5ba8d8
Update cpt.md
mngom2 Jan 26, 2026
0a5ac67
Update cpt.md
mngom2 Jan 26, 2026
e7ad382
Update cpt.md
mngom2 Jan 26, 2026
307f78f
Update cpt.md
mngom2 Jan 26, 2026
749c5d3
Update cpt.md
mngom2 Jan 26, 2026
16fdd88
Update cpt.md
mngom2 Jan 27, 2026
790e1f0
Update cpt.md
mngom2 Jan 27, 2026
87731ff
Update cpt.md
mngom2 Jan 27, 2026
71d3ec9
Update cpt.md
mngom2 Jan 27, 2026
b03749e
Update cpt.md
mngom2 Jan 27, 2026
86d8ac6
chore: Update `tools/gen_cooldown/gen_cooldown_sweep.sh`
saforem2 Jan 27, 2026
248cf22
chore: Update `tools/cooldown_generator/make_cooldown_cmds.py`
saforem2 Jan 27, 2026
fa696cc
Merge branch 'main' into auto_cooldown_script
saforem2 Jan 27, 2026
ab22dc1
Update cpt.md
mngom2 Jan 27, 2026
747b7fe
Create readme.md
mngom2 Jan 27, 2026
c832c04
Add files via upload
mngom2 Jan 27, 2026
5b930b8
Add files via upload
mngom2 Jan 27, 2026
992fb43
Delete ALCF/notes/assets/CPT_data_mixing.png
mngom2 Jan 27, 2026
46f8399
Update cpt.md
mngom2 Jan 27, 2026
a73ccef
Update cpt.md
mngom2 Jan 28, 2026
0e259a2
Update cpt.md
mngom2 Jan 28, 2026
13e0bda
Update cpt.md
mngom2 Jan 28, 2026
dfa754e
Update cpt.md
mngom2 Jan 28, 2026
8f19fe2
Update cpt.md
mngom2 Jan 28, 2026
f02c699
Update cpt.md
mngom2 Jan 28, 2026
89ca4d4
Update cpt.md
mngom2 Jan 28, 2026
76bf567
Update cpt.md
mngom2 Jan 28, 2026
72a6437
Update cpt.md
mngom2 Jan 28, 2026
a037b7f
Update cpt.md
mngom2 Jan 28, 2026
41439c4
Update cpt.md
mngom2 Jan 28, 2026
0a595fb
Update cpt.md
mngom2 Jan 28, 2026
1ebddf5
Update cpt.md
mngom2 Jan 28, 2026
48892e3
Update cpt.md
mngom2 Jan 28, 2026
fc44d13
Update cpt.md
mngom2 Jan 28, 2026
8a28d36
Update cpt.md
mngom2 Jan 28, 2026
07170ac
Update cpt.md
mngom2 Jan 28, 2026
9cb3e8e
Update cpt.md
mngom2 Jan 28, 2026
4092a6f
Update cpt.md
mngom2 Jan 28, 2026
2a0517d
Update cpt.md
mngom2 Jan 28, 2026
d3ca3a4
Update lb_optimizers_settings.md
mngom2 Jan 28, 2026
915cada
Create readme.md
mngom2 Jan 28, 2026
b663188
Update readme.md
mngom2 Jan 28, 2026
bf3c5ce
Rename lb_optimizers_settings.md to large_batch_optimizers_settings.md
mngom2 Jan 28, 2026
ad92695
Update cpt.md
mngom2 Jan 28, 2026
2292e2c
Update large_batch_optimizers_settings.md
mngom2 Jan 28, 2026
5551ac8
Add files via upload
mngom2 Jan 28, 2026
3c26dc7
Update large_batch_optimizers_settings.md
mngom2 Jan 28, 2026
06f86a9
Update large_batch_optimizers_settings.md
mngom2 Jan 28, 2026
c2d9f8b
Update large_batch_optimizers_settings.md
mngom2 Jan 28, 2026
2808ff4
Update large_batch_optimizers_settings.md
mngom2 Jan 29, 2026
3d0fd17
Update large_batch_optimizers_settings.md
mngom2 Jan 30, 2026
63749b9
Merge branch 'main' of https://github.com/argonne-lcf/Megatron-DeepSp…
saforem2 Jan 31, 2026
22b5b24
chore: Update `ALCF/helpers.sh`
saforem2 Feb 9, 2026
88dbdc4
chore: Replace `print` with `logger.info` in `megatron/timers.py`
saforem2 Feb 9, 2026
fc0dad5
chore: Update `megatron/training.py`
saforem2 Feb 9, 2026
08e8728
chore: Update `pretrain_gpt_alcf.py`
saforem2 Feb 9, 2026
6027ea6
chore: Update `train_alcf.sh`
saforem2 Feb 9, 2026
3ea917b
chore: Update `tools/cooldown_generator/make_cooldown_cmds.py`
saforem2 Feb 9, 2026
b1bbe01
chore: Update `ALCF/helpers.sh`
saforem2 Feb 10, 2026
5f11d7b
chore: Update `tools/cooldown_generator/make_cooldown_cmds.py`
saforem2 Feb 10, 2026
b5617ba
chore: Fix https://github.com/argonne-lcf/Megatron-DeepSpeed/pull/96#…
saforem2 Feb 10, 2026
1d600aa
Merge pull request #96 from argonne-lcf/auto_cooldown_script
saforem2 Feb 10, 2026
65d0a41
feat: Add `train_aGPT_2B_sophiag_stage3.sh`
saforem2 Feb 24, 2026
81b1d1a
feat: Add `ALCF/data-lists/aurora/nvidia-math1-code2.txt`
saforem2 Feb 24, 2026
81ac9c5
feat: Add `train_aGPT_2B_sophiag_stage2.sh`
saforem2 Feb 24, 2026
b8629a6
chore: Update `train_aGPT_2B_large_batch.sh`
saforem2 Feb 24, 2026
ff45fa8
feat: Add stage2 data list
saforem2 Feb 24, 2026
ffa3456
Merge branch 'main' into main
saforem2 Mar 11, 2026
35 changes: 35 additions & 0 deletions .github/workflows/python.yml
@@ -0,0 +1,35 @@
name: python

on:
  workflow_dispatch:
  pull_request:
    branches:
      '**'
  schedule:
    - cron: "0 0 * * *"

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  unit-tests:
    strategy:
      matrix:
        pyVersion: ["3.10"]
      fail-fast: false

    runs-on: ubuntu-22.04
    container:
      image: deepspeed/gh-builder:py${{ matrix.pyVersion }}

    steps:
      - uses: actions/checkout@v4

      - name: environment
        run: |
          which python
          python --version
      - name: Install Megatron-DeepSpeed
        run: |
          pip3 install .
44 changes: 44 additions & 0 deletions .gitignore
@@ -1,3 +1,47 @@
# User Added
.jobenv
**.e[0-9]**
**.o[0-9]**
**.e6**
**.o6**
**.e9**
**.o9**
**.e1**
**.o1**
*.o17*
*.e17*
*.o1
*.e1
deps/*
OUTPUTS/*
ALCF/OUTPUTS/*
*tmp*
*core.*
*old*
!tools/cooldown_generator/
!tools/cooldown_generator/**
*.bak
**index-cache**
**pbslogs**
ezpz
*hostfile*
.deepspeed_env
*.DS_Store
old/*
**venv**
*.json
outputs/
venvs/
wandb/
llama-logs/
checkpoints/
*.gz
*.txt
*.idx
*.bin
*.log
__pycache__

.deepspeed_env
*.bak
.cache/*
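
Reviewer note (an interpretation, not stated in the PR): the two `!tools/cooldown_generator/` lines look like a guard against the broad `*old*` pattern above them, since "cooldown" contains the substring "old". The sketch below uses Python's `fnmatch` as a rough stand-in for gitignore's basename globbing. It is not a full gitignore matcher, and the sample names other than `cooldown_generator` and `make_cooldown_cmds.py` are illustrative only.

```python
from fnmatch import fnmatch

# Approximation: a gitignore pattern without a slash is matched against
# path components at any depth; fnmatch mimics only that simple glob step.
ignore_glob = "*old*"

# Names taken from the diff and commit list, plus two illustrative ones.
names = ["cooldown_generator", "make_cooldown_cmds.py", "old", "helpers.sh"]

# Record which names the broad pattern would catch.
matches = {name: fnmatch(name, ignore_glob) for name in names}

for name, hit in matches.items():
    status = "matched" if hit else "not matched"
    print(f"{name}: {status} by {ignore_glob}")
```

If that reading is right, the negated entries need to stay below `*old*` in the file, because in gitignore the last matching pattern decides the outcome.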