# Syntrix‑Base

Train and run modern small models fast on everyday CPUs — simple, transparent, and reproducible.
- Why Syntrix‑Base?
- Highlights
- Requirements
- Quickstart (pip and from source)
- Usage (Train, Sample, Eval, Config)
- Configuration overrides
- Reproducibility & Determinism
- Benchmarks
- Troubleshooting
- Contributing & Governance
- License
## Why Syntrix‑Base?

Syntrix‑Base is a CPU‑first, deterministic learning toolkit for tiny but modern models. It emphasizes clarity over complexity: clean PyTorch code, reproducible logs, and practical CLIs that work well on everyday hardware.
## Highlights

- CPU‑first ergonomics: pinned threads, deterministic seeds, dtype control
- Tiny but modern models: GPT‑mini, SSM‑mini, RNN‑mini
- Reproducible logging: JSONL logs with tokens/sec and environment
- Optional `torch.compile` with CLI toggle and auto validation
## Requirements

- Python >= 3.9
- Linux/macOS (Windows may work via WSL)
- PyTorch (installed automatically via `pip install -e .`)
## Quickstart

### Install with pip

```bash
pip install syntrix
```

### Install from source

```bash
git clone https://github.com/paredezadrian/syntrix-base.git
cd syntrix-base
python3 -m venv venv && source venv/bin/activate
pip install --upgrade pip
pip install -e .
```

### Get a dataset

```bash
mkdir -p data
wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt -O data/tinyshakespeare.txt
```

Or download a mini text8 sample via the CLI when training: add `--download.text8_mini`.
## Usage

### Train

```bash
syntrix.train \
  --config configs/gpt-mini.yaml \
  --data.file data/tinyshakespeare.txt \
  --train_steps 300 --eval_every 100 --save_every 150 \
  --threads 4 \
  --out_dir runs/gpt-mini_base
```

Enable `torch.compile` and auto‑validate throughput (accept only if >= 5% faster):

```bash
syntrix.train \
  --config configs/gpt-mini.yaml \
  --data.file data/tinyshakespeare.txt \
  --threads 4 \
  --train_steps 300 --eval_every 100 --save_every 150 \
  --compile --compile.validate --compile.auto --compile.min_improvement 1.05 \
  --out_dir runs/gpt-mini_compile_auto
```

Outputs:

- Checkpoints: `runs/<name>/ckpt.pt`
- Logs (JSONL): `runs/<name>/log.jsonl` with `step`, `val_bpc`, `tokens_per_s`, `lr`, and an initial `env` record (Python, PyTorch, threads, dtype, compiled flag).
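As a quick sanity check, the JSONL log can be summarized with a few lines of Python. The field names below follow the log schema described above; the sample records themselves are invented for illustration.

```python
import io
import json

# Invented sample records mimicking the documented log schema:
# one initial env record, then per-step records.
sample_log = """\
{"env": {"python": "3.11", "threads": 4, "dtype": "float32", "compiled": false}}
{"step": 100, "val_bpc": 1.92, "tokens_per_s": 15000.0, "lr": 3e-4}
{"step": 200, "val_bpc": 1.74, "tokens_per_s": 15200.0, "lr": 3e-4}
"""

def summarize(lines):
    """Return the step with the best (lowest) validation BPC."""
    records = [json.loads(line) for line in lines if line.strip()]
    steps = [r for r in records if "step" in r]  # skip the env record
    best = min(steps, key=lambda r: r["val_bpc"])
    return {"best_step": best["step"], "best_val_bpc": best["val_bpc"]}

print(summarize(io.StringIO(sample_log)))  # {'best_step': 200, 'best_val_bpc': 1.74}
```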
### Sample

```bash
syntrix.sample \
  --ckpt runs/gpt-mini_base/ckpt.pt \
  --data.file data/tinyshakespeare.txt \
  --max_new_tokens 200 --temp 0.9
```

### Eval

```bash
# Evaluate a checkpoint (reports validation BPC)
syntrix.eval --data.file data/tinyshakespeare.txt --ckpt runs/gpt-mini_base/ckpt.pt
```

### Config

```bash
# Validate and inspect a YAML config
syntrix.config --config configs/gpt-mini.yaml
```

## Configuration overrides

- Base configs live in `configs/` (e.g., `configs/gpt-mini.yaml`).
- You can override most settings via CLI flags. Some use dot notation (e.g., `--data.file`, `--download.text8_mini`).
- Examples:

```bash
# Increase layers and reduce batch using dot-notation overrides
syntrix.train --config configs/gpt-mini.yaml --data.file data/tinyshakespeare.txt \
  --model.n_layer 6 --train.batch_size 16
```

- Precision: switch the default dtype with `--dtype float32|float64`. Numeric tests use dtype‑aware tolerances.
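Dot-notation overrides like the ones above can be pictured as folding `key.path value` pairs into a nested config dict. This is a generic sketch of the pattern, not Syntrix's actual parser:

```python
def apply_override(config: dict, key: str, value):
    """Fold a dot-notation key (e.g. 'model.n_layer') into a nested dict."""
    node = config
    parts = key.split(".")
    for part in parts[:-1]:
        node = node.setdefault(part, {})  # create intermediate dicts as needed
    node[parts[-1]] = value
    return config

# Hypothetical base config mirroring the example flags above.
cfg = {"model": {"n_layer": 4}, "train": {"batch_size": 32}}
apply_override(cfg, "model.n_layer", 6)
apply_override(cfg, "train.batch_size", 16)
print(cfg)  # {'model': {'n_layer': 6}, 'train': {'batch_size': 16}}
```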
List options:

```bash
python -m syntrix.cli_train --help
python -m syntrix.cli_sample --help
```

Notable flags:

- `--threads <int>`: sets `torch.set_num_threads` and pins MKL/OMP threads
- `--compile`: enable `torch.compile` if available
- `--compile.validate --compile.auto --compile.min_improvement 1.05`: benchmark forward throughput and auto‑enable compile only if faster
- `--tokenizer <char|bpe>` and `--bpe_vocab_size <int>`
- `--use_mmap`: use a memory‑mapped data loader for large files
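The `--compile.validate` / `--compile.auto` behavior boils down to a throughput comparison: keep the compiled variant only if it beats the eager baseline by the required ratio. The sketch below uses stand-in step functions instead of real models; `tokens_per_second` and `choose_step_fn` are hypothetical helpers for illustration, not Syntrix APIs:

```python
import time

def tokens_per_second(step_fn, tokens_per_step=1024, iters=50):
    """Benchmark a training-step callable and report tokens/sec."""
    start = time.perf_counter()
    for _ in range(iters):
        step_fn()
    elapsed = time.perf_counter() - start
    return iters * tokens_per_step / elapsed

def choose_step_fn(eager_fn, candidate_fn, min_improvement=1.05):
    """Accept the candidate only if it is >= min_improvement times faster."""
    baseline = tokens_per_second(eager_fn)
    candidate = tokens_per_second(candidate_fn)
    return candidate_fn if candidate >= min_improvement * baseline else eager_fn

# Stand-ins: 'cheap' is far faster than 'costly', so the comparison is stable.
cheap = lambda: None
costly = lambda: sum(range(10000))
assert choose_step_fn(costly, cheap) is cheap   # large speedup: accept
assert choose_step_fn(cheap, costly) is cheap   # slower candidate: reject
```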
## Reproducibility & Determinism

- Seeds and threads are initialized consistently via `syntrix.utils.seed`
- Logs record environment details (threads, Python, PyTorch, dtype)
- Tests cover determinism and tolerance‑aware numeric checks
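A minimal sketch of the seeding-and-threading pattern described above (the real `syntrix.utils.seed` presumably also seeds PyTorch via `torch.manual_seed` and calls `torch.set_num_threads`; that part is omitted here to keep the example dependency-free):

```python
import os
import random

def set_determinism(seed: int, threads: int) -> dict:
    """Seed the stdlib RNG and pin OMP/MKL thread counts via env vars."""
    random.seed(seed)
    os.environ["OMP_NUM_THREADS"] = str(threads)
    os.environ["MKL_NUM_THREADS"] = str(threads)
    return {"seed": seed, "threads": threads}

# Same seed, same sequence: the core determinism property.
set_determinism(1234, 4)
a = [random.random() for _ in range(3)]
set_determinism(1234, 4)
b = [random.random() for _ in range(3)]
assert a == b
```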
## Benchmarks

For reproducible benchmark commands and example results tables, see docs/benchmarks.md; architecture notes and an FAQ live in docs/architecture.md.
## Troubleshooting

- Non‑deterministic results:
  - Ensure `--seed` and `--threads` are set; check `OMP_NUM_THREADS` and `MKL_NUM_THREADS`.
- Slow throughput:
  - Use a smaller `--block_size`, a small `--microbatch` with a higher `--grad_accum`, and try `--compile --compile.validate --compile.auto`.
- Memory constraints:
  - Use `--data.use_mmap` for memory‑mapped random block sampling on large files.
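Memory-mapped random block sampling of the kind `--data.use_mmap` suggests can be sketched with the standard-library `mmap` module; this illustrates the idea (read fixed-size blocks without loading the whole file) and is not Syntrix's actual loader:

```python
import mmap
import os
import random
import tempfile

def sample_block(path: str, block_size: int, rng: random.Random) -> bytes:
    """Return a random fixed-size block from a file via a memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = rng.randrange(0, len(mm) - block_size + 1)
            return mm[start:start + block_size]

# Demo on a small temporary file standing in for a large corpus.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdefghij" * 100)
block = sample_block(tmp.name, 16, random.Random(0))
os.unlink(tmp.name)
print(len(block))  # 16
```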
## Contributing & Governance

We welcome contributions of all kinds: bug fixes, features, docs, and benchmarks.

- Please read `CONTRIBUTING.md` for our contribution process, standards, and PR guidelines.
- All participants are expected to follow our `CODE_OF_CONDUCT.md`.
- Issues: use GitHub Issues for bug reports and feature requests. Include OS, Python, and PyTorch versions, steps to reproduce, and expected vs. actual behavior.
- CI: pull requests must pass GitHub Actions (pytest on Python 3.10/3.11/3.12).
## License

MIT with Commons Clause (non‑commercial). See LICENSE for details.