A trace-driven branch predictor simulator in modern C++17. It implements six classic predictors (always-taken, bimodal, GShare, Yeh-Patt two-level with PAg/GAp/GAg variants, tournament, perceptron) behind a single virtual Predictor interface, ships a synthetic trace generator so it runs out of the box without any external workload files, and includes a CSV-driven sweep + matplotlib pipeline for comparing configurations on equal storage budgets.
cmake -S . -B build && cmake --build build -j
ctest --test-dir build --output-on-failure # 36 unit tests
# generate a 200K-branch mixed workload
./build/branchsim --gen mixed --out traces/mixed.trace --count 200000 --seed 1
# run a predictor on it
./build/branchsim --predictor gshare --pht-bits 12 --hist-bits 12 \
--trace traces/mixed.trace --csv results.csv
# regenerate the sample plot end to end
./scripts/run_sweep.sh # writes results/*.csv and docs/sample_results.pngCLI surface:
branchsim --gen <pattern> --out <file> --count N [--seed S] [--loop-period N] [--bias P]
patterns: always_taken | alternating | loop | biased | mixed
branchsim --predictor <kind> --trace <file> [--csv out.csv] [--append]
kinds: always_taken | bimodal | gshare | two_level | tournament | perceptron
knobs: --pht-bits B --hist-bits B --two-level GAg|GAp|PAg
--bht-bits B --local-hist B --chooser-bits B
--num-perceptrons N --perceptron-hist B
--accuracy-log path.csv --accuracy-window N (per-window CSV)
Trace format (one branch per line):
# comment line
<hex_pc> <T|N> [<hex_target>]
- Always-Taken — baseline. Returns
takenfor every branch. Useful as a floor and for sanity-checking the framework. - Bimodal — one 2-bit saturating counter per PC-hash bucket. Captures per-branch bias but can't see history, so anything alternating destroys it.
- GShare — bimodal's PHT, but indexed by
(pc XOR global_history). Lets correlated branches land in different table entries; learns alternation with one history bit and learns loop-N oncehistory_bits ≥ N. - Two-Level (Yeh-Patt) — three variants behind one class.
- GAg: single global history register, single global PHT.
- GAp: global history but a separate PHT per PC-set.
- PAg: per-PC history register feeding a single global PHT (each PC effectively gets its own learned recent-pattern).
- Tournament — bimodal and GShare in parallel, plus a per-PC chooser table of 2-bit counters that swings toward whichever was right when they disagree.
- Perceptron (Jiménez & Lin, 2001) — each PC indexes a row of small signed weights; prediction is the sign of the dot product with global history (bits encoded as ±1). Trains only when wrong or when
|y| ≤ θ.
scripts/run_sweep.sh runs every predictor on a 200K-branch mixed trace and renders three panels:
On the mixed trace at typical 12-bit configs:
| Predictor | Misprediction rate | Storage |
|---|---|---|
| perceptron (n=256, hist=24) | 17.4% | ~52 Kb |
| two_level PAg (bht=10, lh=12) | 18.4% | ~12 Kb |
| tournament (12/12/12) | 18.4% | ~16 Kb |
| gshare (pht=12, hist=12) | 18.6% | ~8 Kb |
| two_level GAg (lh=12) | 19.9% | ~8 Kb |
| always_taken | 29.1% | 0 |
| bimodal (pht=12) | 30.7% | ~8 Kb |
Bimodal does worse than always-taken here because the mixed workload includes an alternating lane that pulls its 2-bit counter into the wrong middle state — exactly the failure mode GShare and the perceptron exist to fix.
Interface. Predictor::predict(pc) is const and side-effect-free; update(pc, taken) takes the ground truth separately. The simulator loop is predict → grade → update, which keeps unit tests trivial (assert what predict returns, then drive update with known outcomes) and means a tournament predictor can poll its sub-predictors twice — once to pick, once to grade — without paying for cached state. Cost is one virtual call per branch; the bench shows ~315M branches/s on bimodal which is plenty for sweeps.
Memory layout. Every predictor sizes its tables exactly at construction (std::vector<uint8_t> for 2-bit counters, std::vector<int8_t> for perceptron weights). No reallocations, no per-branch allocations, predictable cache behavior. Each predictor reports storage_bits() so the bar chart compares like-with-like instead of comparing apples to TAGE.
Synthetic generator. Five patterns: always-taken, alternating, loop-N (N-1 takens then a fallthrough — mirrors a backward branch closing a loop), biased-random (Bernoulli with seeded mt19937_64 so traces reproduce), and a mixed workload that round-robins four PCs each running its own sub-pattern. The mixed trace is the interesting one: it forces predictors to actually use per-PC information and history, which a single-pattern stream wouldn't.
Two-level test gotcha. For PAg/GAp the BHR shifts on every update, so the test pattern predict A, predict B, update A, update B predicts B at one BHR but trains B's PHT entry at a different BHR. The interface's predict and update must be paired per-branch (predict A → update A → predict B → update B) for correctness — that's encoded as a comment in the tests so the next person doesn't relearn it.
What's deliberately omitted. No TAGE / TAGE-SC-L (the modern frontier), no loop predictor, no return-address stack, no indirect-target prediction, no BTB. All of those would matter for an apples-to-apples comparison with a real CPU front-end; this project is scoped at the "classic taken/not-taken predictors" level.
- TAGE-SC-L for a modern reference point.
- Real workload ingestion (SPEC2017, ChampSim trace format).
- BTB modeling and indirect-target prediction.
- Hardware-budget-aware sweeps that hold storage constant and search over the rest of the config.
- Per-PC misprediction breakdown — knowing which branches the predictor whiffs on is more informative than the aggregate rate.
include/branchsim/ public headers (predictor.hpp, config.hpp, factory.hpp, trace.hpp)
src/predictors/ one .cpp per predictor
src/sim.cpp CLI entry
src/trace.cpp reader + synthetic generator
tests/ GoogleTest, one file per predictor + trace
bench/ google-benchmark throughput suite (off by default)
scripts/ plot.py and run_sweep.sh
traces/ small checked-in demo traces (sweep regenerates larger ones)
docs/ sample_results.png
Build options: -DBRANCHSIM_BUILD_TESTS=ON (default) and -DBRANCHSIM_BUILD_BENCH=ON (off by default; pulls google-benchmark via FetchContent).
