NanoSeek

NanoSeek is a compact, from-scratch DeepSeek-style language model project focused on:

MLA (Multi-head Latent Attention)
MoE (Mixture of Experts)
MTP (Multi-Token Prediction)
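To make the MTP objective concrete, here is a minimal sketch of how training targets could be laid out. It assumes DeepSeek-V3-style sequential MTP, where the k-th extra module predicts the token k+1 positions beyond the current one, and a total loss that adds the mean of the extra-depth losses scaled by a weight lam. The names mtp_targets, combined_loss, num_mtp and lam are hypothetical and are not NanoSeek's API.

```python
from typing import List, Sequence


def mtp_targets(tokens: Sequence[int], num_mtp: int) -> List[List[int]]:
    """Target sequences for the main head (k = 0) and each MTP depth k >= 1.

    At position i, depth k predicts tokens[i + k + 1], so target list k is
    the token stream shifted left by k + 1 (and k tokens shorter than the
    main head's targets). Hypothetical helper, not NanoSeek's code.
    """
    return [list(tokens[k + 1:]) for k in range(num_mtp + 1)]


def combined_loss(main_loss: float, mtp_losses: Sequence[float], lam: float) -> float:
    """Main next-token loss plus lam times the mean of the MTP depth losses."""
    if not mtp_losses:  # MTP switched off: plain next-token loss
        return main_loss
    return main_loss + lam * sum(mtp_losses) / len(mtp_losses)


tokens = [10, 11, 12, 13, 14]
for k, targets in enumerate(mtp_targets(tokens, num_mtp=2)):
    print(k, targets)
# 0 [11, 12, 13, 14]
# 1 [12, 13, 14]
# 2 [13, 14]
```

With num_mtp=0 only the ordinary next-token targets remain, which is what disabling MTP reduces to.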

Inspired by nanochat: minimal, hackable, and research-friendly training code.

The codebase is built for fast iteration: readable modules, strong tests, and one main training entrypoint.

Current Scope

This repository is centered on pre-training experiments and architecture validation.

Main package: nanoseek/
Main trainer: scripts/pre_train.py
Main model: nanoseek/model.py
Main config system: nanoseek/config.py

Project Layout

nanoseek/: core model, config, optimizer, tokenizer, dataloader, checkpoint manager
scripts/: training and utility scripts
eval/: evaluation metrics and diagnostics
tests/: unit and integration tests
runs/: run helper scripts

Quick Start

1) Install

cd nanoseek
pip install -e .

Optional training dependencies:

pip install -e ".[training]"

Optional dev/test dependencies:

pip install -e ".[dev]"

2) Run tests

pytest tests/ -v

3) Smoke train (CPU/Mac friendly)

python -m nanoseek.scripts.pre_train \
  --scale ablation \
  --num-iterations 20 \
  --device-batch-size 1 \
  --eval-tokens 512 \
  --eval-every -1

4) Standard training run

python -m nanoseek.scripts.pre_train \
  --run gate1-smoke \
  --scale ablation \
  --seed 42 \
  --num-iterations 100 \
  --eval-every 50 \
  --save-every 100 \
  --device-batch-size 4

Key CLI Flags

--scale: model size preset (anchor, ablation, 1b, or d<N>)
--depth: override the --scale preset with a depth-based config
--target-flops: size the run to a target FLOPs budget
--target-param-data-ratio: set the token budget as a multiple of the model's parameter count
--no-mtp: disable MTP (ablation)
--aux-loss-type {bias,classic}: MoE load-balancing mode
--resume-from-step: resume training from the checkpoint saved at the given step
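The --aux-loss-type flag chooses between two MoE load-balancing strategies. The sketch below assumes that bias refers to DeepSeek-V3-style auxiliary-loss-free balancing (a per-expert bias added to router scores only for top-k selection, nudged up or down after each step) and that classic refers to a Switch-Transformer-style auxiliary loss. Both mappings are assumptions, and every function name plus the gamma and alpha hyperparameters are hypothetical.

```python
from typing import List, Sequence


def topk_route(scores: Sequence[Sequence[float]], bias: Sequence[float], k: int) -> List[List[int]]:
    """Pick the top-k experts per token using biased scores.

    The bias only changes *which* experts are selected; gate weights would
    still come from the unbiased scores (not computed here).
    """
    routes = []
    for row in scores:
        biased = [s + b for s, b in zip(row, bias)]
        ranked = sorted(range(len(row)), key=lambda e: biased[e], reverse=True)
        routes.append(ranked[:k])
    return routes


def expert_load(routes: Sequence[Sequence[int]], num_experts: int) -> List[int]:
    """Count how many token slots were routed to each expert."""
    load = [0] * num_experts
    for top in routes:
        for e in top:
            load[e] += 1
    return load


def update_bias(bias: Sequence[float], load: Sequence[int], gamma: float) -> List[float]:
    """'bias' mode sketch: lower overloaded experts, raise underloaded ones."""
    mean = sum(load) / len(load)
    return [b - gamma if l > mean else b + gamma if l < mean else b
            for b, l in zip(bias, load)]


def classic_aux_loss(probs: Sequence[Sequence[float]], routes: Sequence[Sequence[int]],
                     k: int, alpha: float) -> float:
    """'classic' mode sketch: alpha * E * sum_i f_i * P_i.

    f_i: fraction of routed slots sent to expert i.
    P_i: mean router probability assigned to expert i.
    """
    num_tokens, num_experts = len(probs), len(probs[0])
    load = expert_load(routes, num_experts)
    f = [l / (num_tokens * k) for l in load]
    p = [sum(row[e] for row in probs) / num_tokens for e in range(num_experts)]
    return alpha * num_experts * sum(fi * pi for fi, pi in zip(f, p))


scores = [[0.9, 0.1, 0.0, 0.0]] * 4
bias = [0.0] * 4
load = expert_load(topk_route(scores, bias, k=1), num_experts=4)  # all tokens pick expert 0
bias = update_bias(bias, load, gamma=0.1)
print(load, bias)
# [4, 0, 0, 0] [-0.1, 0.1, 0.1, 0.1]
```

The bias approach steers routing without adding a gradient term to the loss, while the classic loss is minimized when routing is balanced; in this sketch a perfectly balanced batch scores 1.0 * alpha.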

For full ablation controls, refer to scripts/pre_train.py and nanoseek/config.py.
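The budget flags reduce to simple arithmetic. The sketch below assumes nanochat-style conventions: --target-param-data-ratio sets training tokens to ratio × parameters, and --target-flops converts a compute budget into tokens with the common approximation C ≈ 6·N·D. Function and argument names are hypothetical, and whether N should count total or only activated MoE parameters is a modeling choice this sketch leaves open.

```python
import math


def tokens_from_ratio(num_params: int, ratio: float) -> int:
    """Token budget D = ratio * N."""
    return round(num_params * ratio)


def tokens_from_flops(target_flops: float, num_params: int) -> int:
    """Token budget from the approximation C ~= 6 * N * D."""
    return round(target_flops / (6 * num_params))


def num_iterations(total_tokens: int, device_batch_size: int, seq_len: int,
                   world_size: int = 1, grad_accum_steps: int = 1) -> int:
    """Optimizer steps needed to consume total_tokens (rounded up)."""
    tokens_per_step = device_batch_size * seq_len * world_size * grad_accum_steps
    return math.ceil(total_tokens / tokens_per_step)


n_params = 100_000_000  # illustrative model size, not a NanoSeek preset
budget = tokens_from_ratio(n_params, ratio=20)
print(budget, num_iterations(budget, device_batch_size=32, seq_len=2048, world_size=8))
# 2000000000 3815
```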

Notes

Checkpoints are written under checkpoints/.
The training script supports both single-GPU and distributed launches.
An EMA (exponential moving average) copy of the weights is tracked for more stable evaluation.
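An EMA copy keeps a shadow of the weights that moves slowly toward the live parameters after each optimizer step, smoothing out step-to-step noise. A minimal sketch, assuming the standard update ema = decay * ema + (1 - decay) * param; the decay value and the ema_update name are illustrative, not NanoSeek's settings.

```python
from typing import Dict, List


def ema_update(ema: Dict[str, List[float]], params: Dict[str, List[float]],
               decay: float) -> None:
    """In-place update: ema <- decay * ema + (1 - decay) * params."""
    for name, values in params.items():
        shadow = ema[name]
        for i, value in enumerate(values):
            shadow[i] = decay * shadow[i] + (1.0 - decay) * value


params = {"w": [1.0, -2.0]}
ema = {name: list(vals) for name, vals in params.items()}  # start from a copy
params["w"] = [3.0, 0.0]                                    # pretend a training step changed w
ema_update(ema, params, decay=0.9)
print(ema["w"])  # shadow has moved about 10% of the way toward the new values
```

In a PyTorch trainer the same update would typically be applied to parameter tensors without tracking gradients, with the shadow weights swapped in for evaluation.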

License

Internal/research use unless specified otherwise by the repository owner.
