Pythia configs: effective batch 320 + clean one-epoch stop by Smith42 · Pull Request #73 · Smith42/astroPT

Smith42 · 2026-06-14T03:04:06Z

The config/pythia-like scaling-study runs target world size 1 and one pass over the Smith42/galaxies train split (N=8,474,566).

Configs (all 12):

gradient_accumulation_steps 5*8 -> 5*4, so the effective batch is 16*20 = 320. One epoch is then floor(N/320) = 26483 steps.
lr_decay_iters 27000*1.1 -> 26500, so the cosine schedule reaches min_lr right as the single epoch ends (max_iters=30000 stays as an unreached upper cap).
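The arithmetic behind both config changes can be checked with a minimal sketch. It assumes the "16" in 16*20 is the per-step micro-batch size, and it uses a nanoGPT-style warmup-plus-cosine schedule as a stand-in; `get_lr`, `LEARNING_RATE`, `MIN_LR` and `WARMUP_ITERS` are hypothetical names and values chosen for illustration, not taken from the configs or from scripts/train.py.

```python
import math

N_TRAIN = 8_474_566        # train-split size quoted in this description
BATCH_SIZE = 16            # assumption: the "16" in 16*20 is the micro-batch size
GRAD_ACCUM_STEPS = 5 * 4   # new value in the configs
WORLD_SIZE = 1             # these runs target a single process

effective_batch = BATCH_SIZE * GRAD_ACCUM_STEPS * WORLD_SIZE
steps_per_epoch = N_TRAIN // effective_batch
print(effective_batch, steps_per_epoch)  # 320 26483

# Hypothetical hyperparameters, for illustration only.
LEARNING_RATE = 6e-4
MIN_LR = 6e-5
WARMUP_ITERS = 2000


def get_lr(it, lr_decay_iters):
    """Linear warmup followed by cosine decay down to MIN_LR."""
    if it < WARMUP_ITERS:
        return LEARNING_RATE * (it + 1) / WARMUP_ITERS
    if it >= lr_decay_iters:
        return MIN_LR
    decay_ratio = (it - WARMUP_ITERS) / (lr_decay_iters - WARMUP_ITERS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))
    return MIN_LR + coeff * (LEARNING_RATE - MIN_LR)


# Learning rate at the last step of the epoch under the new and old settings.
lr_new = get_lr(steps_per_epoch, 26500)
lr_old = get_lr(steps_per_epoch, 27000 * 1.1)
print(f"new: {lr_new:.3e}  old: {lr_old:.3e}  min_lr: {MIN_LR:.3e}")
```

Under these assumed hyperparameters the new setting lands essentially on min_lr at the end of the epoch, while the old setting would still be partway down the cosine curve when the data runs out.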

scripts/train.py:

Graceful end-of-stream handling: the streamed epoch is exhausted near step ~26.2-26.5k, before max_iters. The bare next(tdl) calls in the training loop, estimate_loss and validate now catch StopIteration and stop cleanly (saving a final checkpoint) instead of crashing.
Data shuffle uses a fixed, rank-independent seed (1337 rather than 1337 + seed_offset). At world size 1 this is a no-op, but it keeps the data order reproducible if these configs are ever run under DDP.
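Both train.py changes can be illustrated with a small stand-alone sketch. This is not the code in scripts/train.py: `train`, `step_fn`, `save_checkpoint` and `shuffled_order` are hypothetical helpers, and the standard-library `random` module stands in for whatever shuffle the data pipeline actually uses.

```python
import random

BASE_SEED = 1337


def train(batches, max_iters, step_fn, save_checkpoint):
    """Run up to max_iters steps, stopping cleanly if the stream ends first."""
    stream = iter(batches)
    step = 0
    while step < max_iters:
        try:
            batch = next(stream)
        except StopIteration:
            # End of the single epoch: leave the loop instead of crashing.
            break
        step_fn(batch)
        step += 1
    # A final checkpoint is written whether we hit max_iters or ran out of data.
    save_checkpoint(step)
    return step


def shuffled_order(n_items, seed_offset=0, rank_independent=True):
    """Return a shuffled index order; seed_offset is ignored when rank_independent."""
    seed = BASE_SEED if rank_independent else BASE_SEED + seed_offset
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    return order


# Usage: a 5-batch stream with a cap of 10 steps stops after 5 and checkpoints once.
seen, saved = [], []
final_step = train(range(5), max_iters=10, step_fn=seen.append,
                   save_checkpoint=saved.append)
print(final_step, saved)

# With a rank-independent seed every rank sees the same order.
print(shuffled_order(100, seed_offset=0) == shuffled_order(100, seed_offset=3))
```

Catching StopIteration at the `next` call keeps max_iters as a pure upper cap, which matches the role it plays in the configs above.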

Supersedes #70 (data order is already deterministic at world size 1), #71 (one epoch via batch size + graceful stop rather than max_iters) and #72 (no local sampler needed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Smith42 · 2026-06-14T03:04:29Z

Solves the data-ordering problem by pinning world size to 1.

ksd3

This throttles training to a single GPU, which is fine for the moment.

Smith42 requested a review from ksd3 June 14, 2026 03:04

ksd3 approved these changes Jun 14, 2026


ksd3 merged commit 520410e into main Jun 14, 2026
1 check passed

ksd3 merged 1 commit into main from fix/pythia-batch320-graceful-stop
