Add Colab training notebooks for all 3 model variants by guirguispierre · Pull Request #2 · guirguispierre/Atomic-1Bit

guirguispierre · 2026-03-11T20:30:02Z

Atomic_1Bit_Train_Stories.ipynb: Stories Base (~1.3M params, TinyStories)
Atomic_1Bit_Train_Pocket.ipynb: Pocket (~10M params, Alpaca)
Atomic_1Bit_Train_Instruct.ipynb: Flagship (~12.5M params, Alpaca)

All notebooks are self-contained with inlined model code, Google Drive persistence, AMP support, and tqdm progress bars.

Copilot

Pull request overview

Adds three self-contained Google Colab notebooks to train the Atomic-1Bit models (Stories Base, Pocket, Flagship Instruct), including inlined model code, Drive checkpointing, AMP support, and basic progress/visualization.

Changes:

Added a TinyStories training notebook for the ~1.3M “Stories Base” variant.
Added an Alpaca Cleaned training notebook for the ~10M “Pocket” variant (with cosine annealing).
Added an Alpaca Cleaned training notebook for the ~12.5M “Flagship Instruct” variant (warmup+cosine, grad accumulation, logging, checkpoint download).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 11 comments.

File	Description
notebooks/Atomic_1Bit_Train_Stories.ipynb	Colab notebook to train Stories Base with vocab filtering + Drive checkpointing.
notebooks/Atomic_1Bit_Train_Pocket.ipynb	Colab notebook to train Pocket on Alpaca with cosine annealing + Drive checkpointing.
notebooks/Atomic_1Bit_Train_Instruct.ipynb	Colab notebook to train Flagship Instruct with grad accumulation, warmup+cosine schedule, logging, and checkpoint download.

Copilot · 2026-03-11T20:35:13Z

+    "        print(f'Loading TinyStories ({split})...')\n",
+    "        self.dataset = load_dataset('roneneldan/TinyStories', split=f'{split}[:10%]')\n",
+    "        self.enc = tiktoken.get_encoding('gpt2')\n",


The notebook intro says it trains on TinyStories, but the dataset loader uses split=f'{split}[:10%]', which silently limits training to 10% of the split. Consider making the percentage an explicit hyperparameter (or defaulting to the full split) and documenting it in the markdown so users aren’t surprised by lower data volume.
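A minimal sketch of how the subset size could be surfaced as a hyperparameter. `DATA_FRACTION_PCT` and `build_split` are hypothetical names that do not appear in the notebook; the slice syntax is the same one the notebook already passes to `load_dataset`.

```python
# Hypothetical hyperparameter: percentage of the split to load (100 = full split).
DATA_FRACTION_PCT = 10


def build_split(split: str, pct: int = DATA_FRACTION_PCT) -> str:
    """Return a split string such as 'train[:10%]', or plain 'train' for the full split."""
    if not 1 <= pct <= 100:
        raise ValueError(f"pct must be in 1..100, got {pct}")
    return split if pct == 100 else f"{split}[:{pct}%]"


# The notebook's loader call could then read (not executed here):
# load_dataset('roneneldan/TinyStories', split=build_split(split))
```

Keeping the value next to the other hyperparameters (such as `LR`) makes the reduced data volume visible at the top of the notebook.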

Copilot · 2026-03-11T20:35:13Z

+    "LR           = 1e-3\n",
+    "# ----------------------------------------------\n",
+    "\n",
+    "class PocketStoriesDataset:\n",


In the Stories notebook, the dataset wrapper is named PocketStoriesDataset, which is easy to confuse with the Pocket model notebook. Renaming it to something specific like TinyStoriesDataset/StoriesDataset would make the notebook clearer and reduce copy/paste confusion.

Suggested change

-    "class PocketStoriesDataset:\n",
+    "class TinyStoriesDataset:\n",

Copilot · 2026-03-11T20:35:14Z

+    "    checkpoint = torch.load(ckpt_path, map_location=device)\n",
+    "    model.load_state_dict(checkpoint.get('model_state_dict', checkpoint))\n",
+    "    if 'step' in checkpoint:\n",
+    "        start_step = checkpoint['step']\n",


When resuming, start_step = checkpoint['step'] will cause the training loop range(start_step, total_steps) to repeat the last saved step (since checkpoints are written after completing step). Consider storing/loading the next step to run (e.g., save step + 1 in the checkpoint, or set start_step = checkpoint['step'] + 1 on load) to avoid duplicate updates.
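The duplicate update described above can be reproduced with a toy loop. This is a sketch only: `train` and `repeated_steps` are hypothetical helpers, and it assumes, as the comment states, that the checkpoint records the index of the last completed step.

```python
from collections import Counter


def train(start_step, end_step, log):
    """Toy loop: record each executed step index and return the last completed one."""
    step = start_step
    for step in range(start_step, end_step):
        log.append(step)  # stands in for one optimizer update
    return step


def repeated_steps(resume_offset):
    """Run 5 steps, 'checkpoint', resume up to step 8, and list steps executed twice."""
    log = []
    saved = train(0, 5, log)              # plays the role of checkpoint['step']
    train(saved + resume_offset, 8, log)  # resumed session
    return [s for s, n in Counter(log).items() if n > 1]


naive = repeated_steps(0)  # start_step = checkpoint['step']
fixed = repeated_steps(1)  # start_step = checkpoint['step'] + 1
```

With an offset of 0 the last saved step runs a second time; with an offset of 1 every step runs exactly once.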

Suggested change

-    "        start_step = checkpoint['step']\n",
+    "        start_step = checkpoint['step'] + 1\n",

Copilot · 2026-03-11T20:35:14Z

+    "\n",
+    "# Final save\n",
+    "save_dict = {\n",
+    "    'step': total_steps,\n",


The final checkpoint stores 'step': total_steps, but the loop runs for step in range(start_step, total_steps) so the last completed step is total_steps - 1. This off-by-one makes resume logic ambiguous; consider persisting the next step to run (or persist the last completed step consistently) and aligning the final save accordingly.

Suggested change

-    "    'step': total_steps,\n",
+    "    'step': step,\n",

Copilot · 2026-03-11T20:35:14Z

+                "# Final save\n",
+                "save_dict = {\n",
+                "    'step': total_steps,\n",
+                "    'model_state_dict': model.state_dict(),\n",
+                "    'optimizer_state_dict': optimizer.state_dict(),\n",
+                "    'scheduler_state_dict': scheduler.state_dict(),\n",
+                "}\n",


Final checkpoint stores 'step': total_steps, but the loop runs range(start_step, total_steps) so the last completed step index is total_steps - 1. Align the persisted step semantics (either save the next step to run, or save the last completed step consistently) to avoid confusion and off-by-one behavior on resume.
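One way to make the stored value unambiguous is to let `'step'` always mean "next step to run" in both the periodic and the final save. The sketch below assumes that convention; `make_checkpoint`, `training_run`, and `save_every` are hypothetical names, and the model, optimizer, and scheduler state are omitted.

```python
def make_checkpoint(next_step):
    """Build a checkpoint dict whose 'step' always means 'next step to run'."""
    return {'step': next_step}


def training_run(start_step, total_steps, save_every):
    """Toy loop returning every checkpoint it would have written, in order."""
    saved = []
    for step in range(start_step, total_steps):
        # ... one optimizer update would happen here ...
        if (step + 1) % save_every == 0:
            saved.append(make_checkpoint(step + 1))  # periodic save
    saved.append(make_checkpoint(total_steps))       # final save
    return saved


checkpoints = training_run(0, 10, save_every=4)
resume_from = checkpoints[-1]['step']
steps_left = list(range(resume_from, 10))
```

Under this convention the final `'step': total_steps` is already correct, resume uses `start_step = checkpoint['step']` unchanged, and only the periodic save needs to store `step + 1`.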

Copilot · 2026-03-11T20:35:15Z

+                "    if 'model_state_dict' in checkpoint:\n",
+                "        model.load_state_dict(checkpoint['model_state_dict'])\n",
+                "        if 'step' in checkpoint:\n",
+                "            start_step = checkpoint['step']\n",


On resume, start_step = checkpoint['step'] will cause the loop range(start_step, total_steps) to repeat the last saved step (since checkpoints are written after completing step). Consider persisting/loading the next step to run (e.g., save step + 1, or set start_step = checkpoint['step'] + 1 after loading).

Suggested change

-                "            start_step = checkpoint['step']\n",
+                "            start_step = checkpoint['step'] + 1\n",

Copilot · 2026-03-11T20:35:15Z

+                "scheduler = get_cosine_schedule_with_warmup(optimizer, WARMUP_STEPS, total_steps)\n",
+                "\n",
+                "if start_step > 0:\n",
+                "    for _ in range(start_step):\n",
+                "        scheduler.step()\n",
+                "    if checkpoint and 'scheduler_state_dict' in checkpoint:\n",
+                "        try:\n",
+                "            scheduler.load_state_dict(checkpoint['scheduler_state_dict'])\n",
+                "            print('   Scheduler state restored.')\n",
+                "        except:\n",
+                "            print('   Warning: Could not restore scheduler, using re-computed state.')\n",


The scheduler restore logic does for _ in range(start_step): scheduler.step() and then potentially loads scheduler_state_dict. This does unnecessary work (and can be very slow for large start_step), and also triggers the common "lr_scheduler.step() before optimizer.step()" warning. Prefer constructing the scheduler with last_epoch=start_step-1 (or calling scheduler.step(start_step) once) when no state dict is available, and when a scheduler_state_dict exists, load it directly without pre-stepping.
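Because a warmup-plus-cosine schedule depends only on the step index, the learning rate at the resume point can be evaluated directly instead of stepping the scheduler `start_step` times. The sketch below is plain Python with a hypothetical `lr_factor` helper; the formula is an assumption about what `get_cosine_schedule_with_warmup` computes with its default half cycle, and the numeric values are illustrative.

```python
import math


def lr_factor(step, warmup_steps, total_steps):
    """Multiplier on the base LR: linear warmup, then cosine decay to zero."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Illustrative values, not taken from the notebook.
WARMUP_STEPS, TOTAL_STEPS, BASE_LR = 100, 1000, 1e-3

# Learning rate at a resume point, computed in O(1) rather than O(start_step).
resume_lr = BASE_LR * lr_factor(550, WARMUP_STEPS, TOTAL_STEPS)
```

A caveat on the `last_epoch=start_step-1` route: PyTorch schedulers constructed with `last_epoch` other than -1 expect each optimizer param group to carry `initial_lr`, so the optimizer state probably needs to be restored first (or `initial_lr` set by hand).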

Suggested change

-                "scheduler = get_cosine_schedule_with_warmup(optimizer, WARMUP_STEPS, total_steps)\n",
-                "\n",
-                "if start_step > 0:\n",
-                "    for _ in range(start_step):\n",
-                "        scheduler.step()\n",
-                "    if checkpoint and 'scheduler_state_dict' in checkpoint:\n",
-                "        try:\n",
-                "            scheduler.load_state_dict(checkpoint['scheduler_state_dict'])\n",
-                "            print('   Scheduler state restored.')\n",
-                "        except:\n",
-                "            print('   Warning: Could not restore scheduler, using re-computed state.')\n",
+                "scheduler = get_cosine_schedule_with_warmup(optimizer, WARMUP_STEPS, total_steps, last_epoch=start_step-1)\n",
+                "\n",
+                "if start_step > 0 and checkpoint and 'scheduler_state_dict' in checkpoint:\n",
+                "    try:\n",
+                "        scheduler.load_state_dict(checkpoint['scheduler_state_dict'])\n",
+                "        print('   Scheduler state restored.')\n",
+                "    except:\n",
+                "        print('   Warning: Could not restore scheduler, using re-computed state.')\n",

Copilot · 2026-03-11T20:35:15Z

+                "# Final save\n",
+                "save_dict = {\n",
+                "    'step': total_steps,\n",
+                "    'model_state_dict': model.state_dict(),\n",
+                "    'optimizer_state_dict': optimizer.state_dict(),\n",
+                "    'scheduler_state_dict': scheduler.state_dict(),\n",
+                "    'rng_state': torch.get_rng_state(),\n",
+                "    'np_rng_state': np.random.get_state(),\n",
+                "    'config': {\n",
+                "        'vocab_size': VOCAB_SIZE, 'dim': DIM, 'depth': DEPTH,\n",
+                "        'heads': HEADS, 'context_length': CONTEXT_LEN,\n",
+                "    },\n",
+                "}\n",
+                "torch.save(save_dict, ckpt_path)\n",
+                "print(f'\\n✅ Training complete! Checkpoint saved to {ckpt_path}')"


Final checkpoint stores 'step': total_steps, but the loop runs range(start_step, total_steps), meaning the last completed step index is total_steps - 1. Align the meaning of the stored step (last completed vs next-to-run) so resuming doesn't introduce off-by-one behavior.

Copilot · 2026-03-11T20:35:16Z

+                "    if 'model_state_dict' in checkpoint:\n",
+                "        model.load_state_dict(checkpoint['model_state_dict'])\n",
+                "        if 'step' in checkpoint:\n",
+                "            start_step = checkpoint['step']\n",


Similar to the Stories notebook, resuming sets start_step = checkpoint['step'], but the checkpoint is saved after completing step, so range(start_step, total_steps) will repeat that step on resume. Consider saving/loading the next step to run (e.g., persist step + 1 or increment start_step after loading).

Suggested change

-                "            start_step = checkpoint['step']\n",
+                "            start_step = checkpoint['step'] + 1\n",

Copilot · 2026-03-11T20:35:16Z

+                "# Restore scheduler if available\n",
+                "if os.path.exists(ckpt_path) and 'scheduler_state_dict' in checkpoint:\n",
+                "    try:\n",
+                "        scheduler.load_state_dict(checkpoint['scheduler_state_dict'])\n",
+                "        print('   Scheduler state restored.')\n",
+                "    except:\n",
+                "        pass\n",
+                "\n",


The LR scheduler configuration is inconsistent with the resume/extension logic: total_steps = start_step + ADDITIONAL_STEPS extends training beyond the original run, but the scheduler uses T_max=ADDITIONAL_STEPS and then (optionally) restores scheduler_state_dict. If you resume and keep training past T_max, CosineAnnealingLR will start increasing LR again. Consider either (a) setting T_max to the intended total training horizon and advancing it to start_step, or (b) treating each run as a fresh schedule and not restoring scheduler_state_dict when total_steps is extended.
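The rebound past `T_max` follows from the closed-form cosine annealing expression given in the PyTorch documentation. The sketch below evaluates that expression in plain Python; `cosine_annealing_lr` is a hypothetical helper and the numeric values are illustrative, not read from the notebook.

```python
import math


def cosine_annealing_lr(t, base_lr, t_max, eta_min=0.0):
    """Closed-form cosine annealing: base_lr at t=0, eta_min at t=t_max."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t / t_max)) / 2


# Illustrative values.
BASE_LR, ADDITIONAL_STEPS = 1e-3, 1000

at_t_max = cosine_annealing_lr(ADDITIONAL_STEPS, BASE_LR, ADDITIONAL_STEPS)
past_t_max = cosine_annealing_lr(1500, BASE_LR, ADDITIONAL_STEPS)
two_t_max = cosine_annealing_lr(2000, BASE_LR, ADDITIONAL_STEPS)
```

The rate reaches its minimum at `t = T_max`, is back to half the base rate at `1.5 * T_max`, and returns to the full base rate at `2 * T_max`, which is the behaviour the comment warns about when a restored scheduler keeps running beyond its original horizon.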

Suggested change

-                "# Restore scheduler if available\n",
-                "if os.path.exists(ckpt_path) and 'scheduler_state_dict' in checkpoint:\n",
-                "    try:\n",
-                "        scheduler.load_state_dict(checkpoint['scheduler_state_dict'])\n",
-                "        print('   Scheduler state restored.')\n",
-                "    except:\n",
-                "        pass\n",
-                "\n",

Copilot AI review requested due to automatic review settings March 11, 2026 20:30

Copilot started reviewing on behalf of guirguispierre March 11, 2026 20:30

guirguispierre closed this Mar 11, 2026

guirguispierre deleted the colab-flat branch March 11, 2026 20:30

guirguispierre wants to merge 1 commit into master from colab-flat

2 participants