A learning-focused SystemVerilog NPU prototype with Verilator-based regression tests.

Validated by CI/tests in sim/verilator:
- RTL modules for MAC, systolic array, controller/memory/engines scaffolding
- Verilator regression binaries: test_mac_unit, test_systolic_array, test_npu_smoke, test_integration, test_gpt2_block
- Deterministic output harness (make benchmark-deterministic)

Not yet validated (roadmap):
- Full end-to-end LLM inference fidelity/performance
- Complete microcode/engine feature parity with the architecture spec
- FPGA timing/resource closure and hardware bring-up
- Expanded lint/warning cleanup across all RTL modules

See docs/ARCHITECTURE.md and the roadmap issues for details.
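The regression tests above compare RTL outputs against a software golden model. As a sketch of what such a reference looks like for the MAC and systolic-array modules, here is a toy output-stationary model in Python; the function and signal names are illustrative, not taken from the actual testbenches:

```python
# Hypothetical golden model: each processing element (PE) performs one
# multiply-accumulate per step, and the array as a whole computes C = A @ B.

def mac(acc: int, a: int, b: int) -> int:
    """Single PE step: multiply-accumulate."""
    return acc + a * b

def systolic_matmul(A, B):
    """Reference C = A @ B built only from the PE-level mac() primitive."""
    n, k = len(A), len(A[0])
    m = len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for t in range(k):  # t models the systolic time step
                C[i][j] = mac(C[i][j], A[i][t], B[t][j])
    return C

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

A testbench would drive the same operands through the RTL and assert that the accumulated outputs match this model cycle-for-cycle or at the final result.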
```shell
# Build simulation
cmake -S sim/verilator -B sim/verilator/build
cmake --build sim/verilator/build -j$(nproc)

# Run all tests
ctest --test-dir sim/verilator/build --output-on-failure
```
```shell
# Deterministic baseline harness (defaults to RUNS=3)
make benchmark-deterministic

# Override repeat count (N runs + hash compare)
RUNS=5 make benchmark-deterministic
```

GitHub Actions runs three checks: stable-regression, full-ctest, lint.
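The N-runs-plus-hash-compare idea behind the deterministic harness can be sketched in a few lines of Python. The `run()` stand-in below is hypothetical; the real harness invokes the Verilator benchmark binary instead:

```python
# Minimal sketch of an N-run determinism check: run the workload N times
# and require that every run produces a byte-identical output (one digest).
import hashlib

def run() -> bytes:
    # Stand-in for one benchmark run; deterministic by construction here.
    return b"token-stream: 1 2 3"

def deterministic(runs: int = 3) -> bool:
    """Run `runs` times and compare SHA-256 digests of the outputs."""
    digests = {hashlib.sha256(run()).hexdigest() for _ in range(runs)}
    return len(digests) == 1

print(deterministic(runs=5))  # True
```

Hashing the full output rather than diffing it keeps the baseline artifact small and makes regressions a simple string comparison.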
Branch protection hardening guide: docs/CI_BRANCH_PROTECTION.md
```
tiny-npu/
├── rtl/                 # SystemVerilog RTL
├── sim/verilator/       # Verilator testbenches + CMake
├── python/              # Model/data helper tooling
├── docs/                # Architecture + process docs
├── benchmarks/          # Deterministic benchmark harness + baseline
└── .github/workflows/   # CI
```
This repository includes a minimal end-to-end path that uses real HuggingFace GPT-2-family weights (default: sshleifer/tiny-gpt2) and reports:
- reference generation from the full HF model
- a simulated token from an INT8 projection-only path (real hidden state + real lm_head, first-token only)
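As a rough sketch of what "INT8 projection-only" means here: quantize the lm_head weights and the hidden state to int8, do the matmul in integers, dequantize with the product of the two scales, and argmax the first token. Shapes and values below are toy stand-ins, not the actual GPT-2 tensors or the project's packing code:

```python
# Symmetric per-tensor INT8 quantization plus an integer projection,
# emulating a first-token lm_head matmul. All names are illustrative.
import numpy as np

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
hidden = rng.standard_normal(8).astype(np.float32)          # "real" hidden state
lm_head = rng.standard_normal((50, 8)).astype(np.float32)   # vocab x hidden

q_w, w_scale = quantize_int8(lm_head)
q_h, h_scale = quantize_int8(hidden)

# Integer matmul, then dequantize with the product of the two scales.
logits_int = q_w.astype(np.int32) @ q_h.astype(np.int32)
logits = logits_int.astype(np.float32) * (w_scale * h_scale)

token = int(np.argmax(logits))  # simulated first-token id
```

The real flow additionally records the scale/packing assumptions (per the quant manifest) so the RTL side can interpret the packed buffers consistently.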
```shell
python -m python.run_tiny_llm_sim --prepare --prompt "hello"
```

This runs export + quantization/packing. Pack assumptions are recorded in demo_data/quant_manifest.json.
```shell
python -m python.run_tiny_llm_sim \
  --prompt "Hello tiny NPU" \
  --max-new-tokens 16 \
  --temperature 0.9 --top-k 40 --top-p 0.95 --seed 42
```

Optional smoke-check integration (if a Verilator build exists):

```shell
python -m python.run_tiny_llm_sim --prompt "Hello tiny NPU" --run-verilator-smoke
```

Interactive mode:

```shell
python -m python.run_tiny_llm_sim --interactive --max-new-tokens 16 --temperature 0.9 --top-k 40 --top-p 0.95 --seed 42
```

Python smoke test:

```shell
python -m unittest python/tests/test_tiny_llm_smoke.py
```

Note: this smoke test auto-skips when model dependencies/downloads are unavailable in the environment.
```shell
python3 -m python.eval_first_token --prepare
# writes:
#   benchmarks/results/first_token_eval/first_token_eval.csv
#   benchmarks/results/first_token_eval/summary.json
```

This gives you a prompt-set match rate so improvements can be measured over time.
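The match-rate metric itself is simple: the fraction of prompts whose simulated first token equals the HF reference token. A toy version, with made-up rows (the actual CSV schema and field names may differ):

```python
# Illustrative match-rate computation over a small prompt set.
rows = [
    {"prompt": "hello",    "reference_token": 31373, "simulated_token": 31373},
    {"prompt": "tiny npu", "reference_token": 256,   "simulated_token": 11},
    {"prompt": "weights",  "reference_token": 732,   "simulated_token": 732},
]

matches = sum(r["reference_token"] == r["simulated_token"] for r in rows)
match_rate = matches / len(rows)
print(f"{matches}/{len(rows)} matched, rate={match_rate:.2f}")  # 2/3 matched, rate=0.67
```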
```shell
python3 -m python.eval_prompt_variation
# writes:
#   benchmarks/results/prompt_variation/prompt_variation.csv
#   benchmarks/results/prompt_variation/summary.json
```

This reports unique first-token count and variation ratio across a prompt set.
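The variation metrics are likewise straightforward: count distinct simulated first tokens across the prompt set, and divide by the number of prompts. A toy version with made-up token ids:

```python
# Illustrative variation metrics: unique first-token count and the
# variation ratio (unique tokens / prompts).
first_tokens = [31373, 11, 31373, 256, 11]  # one simulated token per prompt

unique_count = len(set(first_tokens))
variation_ratio = unique_count / len(first_tokens)
print(unique_count, round(variation_ratio, 2))  # 3 0.6
```

A ratio near zero means the path collapses to the same token regardless of prompt; a ratio near one means each prompt yields a distinct first token.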
- The RTL path is not yet wired for full GPT-2 token generation.
- The "simulated" token remains first-token, projection-only INT8 emulation, not full block-by-block RTL execution.
- Multi-token autoregressive decode and KV-cache handling are not yet implemented in the hardware flow.
Read CONTRIBUTING.md before opening a PR.
MIT License - See LICENSE