branchsim

A trace-driven branch predictor simulator in modern C++17. It implements six classic predictors (always-taken, bimodal, GShare, Yeh-Patt two-level with PAg/GAp/GAg variants, tournament, perceptron) behind a single virtual Predictor interface, ships a synthetic trace generator so it runs out of the box without any external workload files, and includes a CSV-driven sweep + matplotlib pipeline for comparing configurations on equal storage budgets.

Quickstart

cmake -S . -B build && cmake --build build -j
ctest --test-dir build --output-on-failure        # 36 unit tests

# generate a 200K-branch mixed workload
./build/branchsim --gen mixed --out traces/mixed.trace --count 200000 --seed 1

# run a predictor on it
./build/branchsim --predictor gshare --pht-bits 12 --hist-bits 12 \
                  --trace traces/mixed.trace --csv results.csv

# regenerate the sample plot end to end
./scripts/run_sweep.sh          # writes results/*.csv and docs/sample_results.png

CLI surface:

branchsim --gen <pattern> --out <file> --count N [--seed S] [--loop-period N] [--bias P]
   patterns: always_taken | alternating | loop | biased | mixed

branchsim --predictor <kind> --trace <file> [--csv out.csv] [--append]
   kinds: always_taken | bimodal | gshare | two_level | tournament | perceptron
   knobs: --pht-bits B  --hist-bits B  --two-level GAg|GAp|PAg
          --bht-bits B  --local-hist B  --chooser-bits B
          --num-perceptrons N  --perceptron-hist B
          --accuracy-log path.csv  --accuracy-window N  (per-window CSV)

Trace format (one branch per line):

# comment line
<hex_pc> <T|N> [<hex_target>]

Predictors

Always-Taken — baseline. Returns taken for every branch. Useful as a floor and for sanity-checking the framework.
Bimodal — one 2-bit saturating counter per PC-hash bucket. Captures per-branch bias but can't see history, so anything alternating destroys it.
GShare — bimodal's PHT, but indexed by (pc XOR global_history). Lets correlated branches land in different table entries; learns alternation with one history bit and learns loop-N once history_bits ≥ N.
Two-Level (Yeh-Patt) — three variants behind one class.
- GAg: single global history register, single global PHT.
- GAp: global history but a separate PHT per PC-set.
- PAg: per-PC history register feeding a single global PHT (each PC effectively gets its own learned recent-pattern).
Tournament — bimodal and GShare in parallel, plus a per-PC chooser table of 2-bit counters that swings toward whichever was right when they disagree.
Perceptron (Jiménez & Lin, 2001) — each PC indexes a row of small signed weights; prediction is the sign of the dot product with global history (bits encoded as ±1). Trains only when wrong or when |y| ≤ θ.

Sample results

scripts/run_sweep.sh runs every predictor on a 200K-branch mixed trace and renders three panels:

On the mixed trace at typical 12-bit configs:

Predictor	Misprediction rate	Storage
perceptron (n=256, hist=24)	17.4%	~52 Kb
two_level PAg (bht=10, lh=12)	18.4%	~12 Kb
tournament (12/12/12)	18.4%	~16 Kb
gshare (pht=12, hist=12)	18.6%	~8 Kb
two_level GAg (lh=12)	19.9%	~8 Kb
always_taken	29.1%	0
bimodal (pht=12)	30.7%	~8 Kb

Bimodal does worse than always-taken here because the mixed workload includes an alternating lane that pulls its 2-bit counter into the wrong middle state — exactly the failure mode GShare and the perceptron exist to fix.

Design notes

Interface. Predictor::predict(pc) is const and side-effect-free; update(pc, taken) takes the ground truth separately. The simulator loop is predict → grade → update, which keeps unit tests trivial (assert what predict returns, then drive update with known outcomes) and means a tournament predictor can poll its sub-predictors twice — once to pick, once to grade — without paying for cached state. Cost is one virtual call per branch; the bench shows ~315M branches/s on bimodal which is plenty for sweeps.

Memory layout. Every predictor sizes its tables exactly at construction (std::vector<uint8_t> for 2-bit counters, std::vector<int8_t> for perceptron weights). No reallocations, no per-branch allocations, predictable cache behavior. Each predictor reports storage_bits() so the bar chart compares like-with-like instead of comparing apples to TAGE.

Synthetic generator. Five patterns: always-taken, alternating, loop-N (N-1 takens then a fallthrough — mirrors a backward branch closing a loop), biased-random (Bernoulli with seeded mt19937_64 so traces reproduce), and a mixed workload that round-robins four PCs each running its own sub-pattern. The mixed trace is the interesting one: it forces predictors to actually use per-PC information and history, which a single-pattern stream wouldn't.

Two-level test gotcha. For PAg/GAp the BHR shifts on every update, so the test pattern predict A, predict B, update A, update B predicts B at one BHR but trains B's PHT entry at a different BHR. The interface's predict and update must be paired per-branch (predict A → update A → predict B → update B) for correctness — that's encoded as a comment in the tests so the next person doesn't relearn it.

What's deliberately omitted. No TAGE / TAGE-SC-L (the modern frontier), no loop predictor, no return-address stack, no indirect-target prediction, no BTB. All of those would matter for an apples-to-apples comparison with a real CPU front-end; this project is scoped at the "classic taken/not-taken predictors" level.

Future work

TAGE-SC-L for a modern reference point.
Real workload ingestion (SPEC2017, ChampSim trace format).
BTB modeling and indirect-target prediction.
Hardware-budget-aware sweeps that hold storage constant and search over the rest of the config.
Per-PC misprediction breakdown — knowing which branches the predictor whiffs on is more informative than the aggregate rate.

Repo layout

include/branchsim/   public headers (predictor.hpp, config.hpp, factory.hpp, trace.hpp)
src/predictors/      one .cpp per predictor
src/sim.cpp          CLI entry
src/trace.cpp        reader + synthetic generator
tests/               GoogleTest, one file per predictor + trace
bench/               google-benchmark throughput suite (off by default)
scripts/             plot.py and run_sweep.sh
traces/              small checked-in demo traces (sweep regenerates larger ones)
docs/                sample_results.png

Build options: -DBRANCHSIM_BUILD_TESTS=ON (default) and -DBRANCHSIM_BUILD_BENCH=ON (off by default; pulls google-benchmark via FetchContent).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

branchsim

Quickstart

Predictors

Sample results

Design notes

Future work

Repo layout

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
bench		bench
docs		docs
include/branchsim		include/branchsim
scripts		scripts
src		src
tests		tests
traces		traces
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

branchsim

Quickstart

Predictors

Sample results

Design notes

Future work

Repo layout

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages