Skip to content

jkthom/branchsim

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

branchsim

A trace-driven branch predictor simulator in modern C++17. It implements six classic predictors (always-taken, bimodal, GShare, Yeh-Patt two-level with PAg/GAp/GAg variants, tournament, perceptron) behind a single virtual Predictor interface, ships a synthetic trace generator so it runs out of the box without any external workload files, and includes a CSV-driven sweep + matplotlib pipeline for comparing configurations on equal storage budgets.

Quickstart

cmake -S . -B build && cmake --build build -j
ctest --test-dir build --output-on-failure        # 36 unit tests

# generate a 200K-branch mixed workload
./build/branchsim --gen mixed --out traces/mixed.trace --count 200000 --seed 1

# run a predictor on it
./build/branchsim --predictor gshare --pht-bits 12 --hist-bits 12 \
                  --trace traces/mixed.trace --csv results.csv

# regenerate the sample plot end to end
./scripts/run_sweep.sh          # writes results/*.csv and docs/sample_results.png

CLI surface:

branchsim --gen <pattern> --out <file> --count N [--seed S] [--loop-period N] [--bias P]
   patterns: always_taken | alternating | loop | biased | mixed

branchsim --predictor <kind> --trace <file> [--csv out.csv] [--append]
   kinds: always_taken | bimodal | gshare | two_level | tournament | perceptron
   knobs: --pht-bits B  --hist-bits B  --two-level GAg|GAp|PAg
          --bht-bits B  --local-hist B  --chooser-bits B
          --num-perceptrons N  --perceptron-hist B
          --accuracy-log path.csv  --accuracy-window N  (per-window CSV)

Trace format (one branch per line):

# comment line
<hex_pc> <T|N> [<hex_target>]

Predictors

  • Always-Taken — baseline. Returns taken for every branch. Useful as a floor and for sanity-checking the framework.
  • Bimodal — one 2-bit saturating counter per PC-hash bucket. Captures per-branch bias but can't see history, so anything alternating destroys it.
  • GShare — bimodal's PHT, but indexed by (pc XOR global_history). Lets correlated branches land in different table entries; learns alternation with one history bit and learns loop-N once history_bits ≥ N.
  • Two-Level (Yeh-Patt) — three variants behind one class.
    • GAg: single global history register, single global PHT.
    • GAp: global history but a separate PHT per PC-set.
    • PAg: per-PC history register feeding a single global PHT (each PC effectively gets its own learned recent-pattern).
  • Tournament — bimodal and GShare in parallel, plus a per-PC chooser table of 2-bit counters that swings toward whichever was right when they disagree.
  • Perceptron (Jiménez & Lin, 2001) — each PC indexes a row of small signed weights; prediction is the sign of the dot product with global history (bits encoded as ±1). Trains only when wrong or when |y| ≤ θ.

Sample results

scripts/run_sweep.sh runs every predictor on a 200K-branch mixed trace and renders three panels:

sample results

On the mixed trace at typical 12-bit configs:

Predictor Misprediction rate Storage
perceptron (n=256, hist=24) 17.4% ~52 Kb
two_level PAg (bht=10, lh=12) 18.4% ~12 Kb
tournament (12/12/12) 18.4% ~16 Kb
gshare (pht=12, hist=12) 18.6% ~8 Kb
two_level GAg (lh=12) 19.9% ~8 Kb
always_taken 29.1% 0
bimodal (pht=12) 30.7% ~8 Kb

Bimodal does worse than always-taken here because the mixed workload includes an alternating lane that pulls its 2-bit counter into the wrong middle state — exactly the failure mode GShare and the perceptron exist to fix.

Design notes

Interface. Predictor::predict(pc) is const and side-effect-free; update(pc, taken) takes the ground truth separately. The simulator loop is predict → grade → update, which keeps unit tests trivial (assert what predict returns, then drive update with known outcomes) and means a tournament predictor can poll its sub-predictors twice — once to pick, once to grade — without paying for cached state. Cost is one virtual call per branch; the bench shows ~315M branches/s on bimodal which is plenty for sweeps.

Memory layout. Every predictor sizes its tables exactly at construction (std::vector<uint8_t> for 2-bit counters, std::vector<int8_t> for perceptron weights). No reallocations, no per-branch allocations, predictable cache behavior. Each predictor reports storage_bits() so the bar chart compares like-with-like instead of comparing apples to TAGE.

Synthetic generator. Five patterns: always-taken, alternating, loop-N (N-1 takens then a fallthrough — mirrors a backward branch closing a loop), biased-random (Bernoulli with seeded mt19937_64 so traces reproduce), and a mixed workload that round-robins four PCs each running its own sub-pattern. The mixed trace is the interesting one: it forces predictors to actually use per-PC information and history, which a single-pattern stream wouldn't.

Two-level test gotcha. For PAg/GAp the BHR shifts on every update, so the test pattern predict A, predict B, update A, update B predicts B at one BHR but trains B's PHT entry at a different BHR. The interface's predict and update must be paired per-branch (predict A → update A → predict B → update B) for correctness — that's encoded as a comment in the tests so the next person doesn't relearn it.

What's deliberately omitted. No TAGE / TAGE-SC-L (the modern frontier), no loop predictor, no return-address stack, no indirect-target prediction, no BTB. All of those would matter for an apples-to-apples comparison with a real CPU front-end; this project is scoped at the "classic taken/not-taken predictors" level.

Future work

  • TAGE-SC-L for a modern reference point.
  • Real workload ingestion (SPEC2017, ChampSim trace format).
  • BTB modeling and indirect-target prediction.
  • Hardware-budget-aware sweeps that hold storage constant and search over the rest of the config.
  • Per-PC misprediction breakdown — knowing which branches the predictor whiffs on is more informative than the aggregate rate.

Repo layout

include/branchsim/   public headers (predictor.hpp, config.hpp, factory.hpp, trace.hpp)
src/predictors/      one .cpp per predictor
src/sim.cpp          CLI entry
src/trace.cpp        reader + synthetic generator
tests/               GoogleTest, one file per predictor + trace
bench/               google-benchmark throughput suite (off by default)
scripts/             plot.py and run_sweep.sh
traces/              small checked-in demo traces (sweep regenerates larger ones)
docs/                sample_results.png

Build options: -DBRANCHSIM_BUILD_TESTS=ON (default) and -DBRANCHSIM_BUILD_BENCH=ON (off by default; pulls google-benchmark via FetchContent).

About

Trace-driven branch predictor simulator in modern C++17 — six classic predictors behind one virtual interface, synthetic trace generator, sweep + matplotlib pipeline, Google Benchmark throughput suite.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors