Set scaling-study configs to exactly one epoch #71

Open
ksd3 wants to merge 1 commit into main from feat/one-epoch-budget

ksd3 (Collaborator) commented Jun 13, 2026

max_iters = 30000 in the config/pythia-like configs was an inherited nanoGPT placeholder, not a value derived from the dataset. At an effective batch size of 640, that is 19.2M example presentations, or ~2.27 passes over the 8,474,566-galaxy Smith42/galaxies train split.

This sets max_iters = 13241 = floor(8,474,566 / 640), which is one pass over the train set (~2.17B image-patch tokens, roughly Chinchilla-optimal for the 100M run at about 20 tokens per parameter), and shrinks lr_decay_iters to match so that the cosine LR decays fully to min_lr by the end of the run rather than stopping mid-decay.
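
To illustrate why lr_decay_iters has to shrink along with max_iters, here is a minimal sketch of a nanoGPT-style warmup-plus-cosine schedule. It assumes these configs use that schedule shape; the `learning_rate`, `min_lr`, and `warmup_iters` values are placeholders chosen for illustration, not values read from the configs in this PR.

```python
import math

def get_lr(it, learning_rate, min_lr, warmup_iters, lr_decay_iters):
    """Warmup-plus-cosine schedule in the style of nanoGPT's train.py (sketch)."""
    # Linear warmup from ~0 up to the peak learning rate.
    if it < warmup_iters:
        return learning_rate * (it + 1) / (warmup_iters + 1)
    # Past the decay horizon, hold at the floor.
    if it > lr_decay_iters:
        return min_lr
    # Cosine decay from learning_rate down to min_lr over the decay window.
    decay_ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))  # goes 1 -> 0
    return min_lr + coeff * (learning_rate - min_lr)

# Placeholder hyperparameters (assumptions, not taken from this PR).
LEARNING_RATE = 6e-4
MIN_LR = 6e-5
WARMUP_ITERS = 2000

MAX_ITERS = 13241  # new run length from this PR

# LR at the final step if the decay horizon stays at the old 30000.
lr_end_old_horizon = get_lr(MAX_ITERS, LEARNING_RATE, MIN_LR, WARMUP_ITERS, 30000)
# LR at the final step once lr_decay_iters matches max_iters.
lr_end_matched = get_lr(MAX_ITERS, LEARNING_RATE, MIN_LR, WARMUP_ITERS, MAX_ITERS)

print(f"old horizon: {lr_end_old_horizon:.2e}, matched horizon: {lr_end_matched:.2e}")
```

With a 30000-step horizon the run would stop while the LR is still well above min_lr; with the horizon matched to max_iters the last step lands on min_lr.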

(Note: 13241 is one pass minus a 326-example remainder, since 8,474,566 isn't divisible by the effective batch; truly exact one-pass coverage under DDP needs len-aware sampling, which is a separate change.)
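
The step-count arithmetic above can be checked with a few lines of Python. This is only a sanity check on the figures quoted in this description; the constant and function names are made up for illustration and do not correspond to anything in the repo.

```python
# Figures quoted in this PR description.
TRAIN_EXAMPLES = 8_474_566   # galaxies in the train split
EFFECTIVE_BATCH = 640        # examples consumed per optimizer step
OLD_MAX_ITERS = 30_000       # inherited placeholder value

def one_epoch_budget(n_examples, batch):
    """Return (full steps in one pass, examples left over)."""
    return divmod(n_examples, batch)

max_iters, remainder = one_epoch_budget(TRAIN_EXAMPLES, EFFECTIVE_BATCH)

# How many passes the old placeholder actually made over the train set.
old_presentations = OLD_MAX_ITERS * EFFECTIVE_BATCH
old_passes = old_presentations / TRAIN_EXAMPLES

print(max_iters, remainder)                      # 13241 326
print(old_presentations, round(old_passes, 2))   # 19200000 2.27
```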

max_iters was an inherited placeholder (30000), which at effective batch
640 is 19.2M presentations = ~2.27 passes over the 8,474,566-galaxy train
set. Set max_iters = 13241 = floor(8,474,566 / 640) so each run does one
epoch, less a 326-example remainder (~2.17B tokens, ~Chinchilla-optimal
for the 100M model), and shrink lr_decay_iters to match so the cosine LR
fully decays to min_lr by the end of the run rather than stopping
mid-decay.
ksd3 force-pushed the feat/one-epoch-budget branch 2 times, most recently from 8abd9ec to 8f8ad66 on June 13, 2026 21:50
ksd3 requested a review from Smith42 on June 13, 2026 21:51