Part of the hash-bench cross-platform suite — the same hash-algorithm sources timed natively on seven Nintendo consoles (NES · GB/GBC · GBA · NDS · DSi · 3DS · N64).
A hashing-algorithm benchmark cartridge for the Game Boy and Game Boy Color. Hashes a 1024-byte buffer with 18 algorithms — 15 non-cryptographic (CRC-8, CRC-16, CRC-32, Adler-32, Fletcher-16, Fletcher-32, Pearson-8, Knuth multiplicative, Bob Jenkins one-at-a-time, PJW/ELF, SDBM, DJB2, FNV-1a, Murmur3-32, xxHash32) and 3 light cryptographic (MD4, MD5, SHA-1) — and prints a paginated table of milliseconds-per-iteration and KB/s throughput on the LCD. Press LEFT/RIGHT to flip pages (3 pages of ~6), A to cycle sort mode, B to re-run.
Sibling projects with progressively more powerful CPUs:
| Project | CPU | Algos | Notes |
|---|---|---|---|
hash-bench-gb (this) |
SM83 @ 4 MHz | 18 | DMG / GBC ROM, 32 KB ROM-only |
hash-bench-gba |
ARM7TDMI @ 16.78 MHz | 32 | adds the 14 algos that don't fit in SM83's footprint |
hash-bench-nds |
ARM946E @ 33.5 MHz | 32 | dual-screen display |
hash-bench-dsi |
ARM946E @ 134 MHz | 32 | DSi-mode setCpuClock(true), runs on DSi/2DS/3DS-family |
hash-bench-3ds |
ARM11 @ 268-804 MHz | 32 | native libctru, real FPU + L2 cache |
We wanted 23 on GB (matching what the GBA build runs) and tried hard to make it fit. The blocker turned out to be a combination of SDCC SM83 code generation and the GBDK banked-call ABI:
- uint64_t bloat (4 algos). SHA-512 / SHA-3-256 / SHA-3-512 /
xxHash64 / SipHash-2-4 / CRC-64 / Fletcher-64 / MurmurHash3-128 /
PBKDF2-HMAC-SHA256 all need 64-bit arithmetic. SDCC SM83 supports
long longvia synthesised pair-of-uint32_t ops, but at tens of seconds per iteration on a 4 MHz SM83 — would ruin sweep UX. - uint32_t hash bloat (5 algos). SHA-256, RIPEMD-160, BLAKE2s, HMAC-SHA256, AES-CBC-MAC each compile to ~34 KB standalone on SM83 (vs SHA-1 at ~3 KB and CRC-32 at ~0.8 KB). The reason: SDCC's uint32_t arithmetic runtime ballons dramatically when a function has ~10 simultaneous 32-bit locals across an unrolled 64-round compress loop — most of the 34 KB is the runtime helpers, not the algorithm proper.
- Banked-call ABI fragility. The natural fix is to bank: keep main
- small algos in bank 0/1, push heavy algos into banks 2/3/4. The runtime helpers get amortised once (~36 KB across all heavy algos) and the incremental cost per algo is small (~1-2 KB), so the full 23-algo build came in at ~52 KB total which fits MBC1 fine. However, SDCC's GB banked-call trampoline reliably crashed at runtime on the first cross-bank entry in this configuration — observed across banks 2, 3, and 4, with both function-pointer and direct-call dispatch. The hello-world CGB test cart proved the GBDK pipeline itself was fine; the bug is specific to the combination of SDCC's banked-call trampoline + MBC1 + how SDCC places the runtime helpers across banks.
- Per-algo single-cart split (also doesn't fit). With ~34 KB per algo and a 32 KB ROM-only ceiling, we can't even put one heavyweight algo per cart.
So those five live on the GBA / NDS / DSi / 3DS variants, where flat 32 MB / 4 MB address spaces sidestep the banking issue entirely.
Companion to the TOTP authenticator pair totp-gb /
totp-gba whose SHA-1 implementation seeded this one.
On boot, fills a 1024-byte buffer with i*31+7 (mod 256), then for each
algorithm:
- Reads
sys_time(GBDK's VBlank counter at ~59.73 Hz) - Hashes the buffer N times (per-algorithm count tuned to ~1 second total wall time)
- Reads
sys_timeagain - Prints
NAME HASH ms/iter KB/sto the screen
The first two bytes of the digest are shown so you can sanity-check the result against the reference digests without needing a debugger.
Press B to re-run the whole sweep — handy for comparing DMG vs GBC double-speed mode in the same emulator.
Pre-built ROMs are checked into artifacts/:
| File | Target | Notes |
|---|---|---|
hash-bench-gb.gb |
DMG / Pocket / Light / SGB / GBA | 32 KB ROM-only, no MBC |
hash-bench-gbc.gbc |
GBC, GBA compat mode | 32 KB ROM-only; calls cgb_compatibility() so the DMG-style monochrome palette stays readable in CGB mode |
Why 32 KB ROM-only (no MBC): earlier iterations of this build used MBC1 banking to fit heavier crypto algos in additional banks. SDCC's GB banked-call trampoline turned out to crash reliably on the first cross-bank entry across multiple bank choices and dispatch strategies — see Why GB is limited to 18 algos. We removed banking entirely, dropped the five heaviest crypto algos (RIPEMD-160, SHA-256, BLAKE2s, HMAC-SHA256, AES-CBC-MAC), and ship a flat 32 KB ROM-only cart. Those five run on the bigger-platform variants where flat 32 MB+ address spaces make banking moot.
Load either in mGBA, BGB, or SameBoy. On real hardware, any flashcart works — no SRAM / RTC / battery needed.
Requires GBDK-2020 (https://github.com/gbdk-2020/gbdk-2020/releases).
By default the build script looks at ../totp-gb/gbdk/ (where the
sibling TOTP project vendors its toolchain). Override with GBDK_DIR
to point elsewhere:
build.bat # both DMG and CGB variants
build.bat dmg # DMG only
build.bat gbc # GBC only
Linux/macOS / CI: just make / make dmg / make gbc.
All algorithm sources are pure C with no GBDK / devkitARM dependencies,
and the exact same files compile on both the GB (Z80-like SM83) and
the GBA (ARM7TDMI). Digest size and table-vs-tableless choice are
documented at the top of each src/*.c.
| Algorithm | Type | Digest | Notes |
|---|---|---|---|
| CRC-8/SMBUS | checksum | 8 bit | Poly 0x07, bit-by-bit, no table |
| CRC-16/XMODEM | checksum | 16 bit | Poly 0x1021, bit-by-bit, no table |
| CRC-32/IEEE | checksum | 32 bit | Poly 0xEDB88320, bit-by-bit, reflected |
| Adler-32 | checksum | 32 bit | Two 16-bit accumulators mod 65521 |
| DJB2 | non-crypto | 32 bit | h = ((h<<5)+h) + c, init 5381 |
| FNV-1a 32 | non-crypto | 32 bit | One 32×32 multiply per byte |
| Pearson-8 | non-crypto | 8 bit | 256-byte permutation table; one xor + one load per byte |
| Murmur3-32 | non-crypto | 32 bit | 4-byte block ARX with fmix32 finalizer (seed 0) |
| xxHash32 | non-crypto | 32 bit | Four-lane ARX (multiply-heavy) |
| Knuth multiplicative | non-crypto | 32 bit | h = (h ^ c) * 0x9E3779B1 per byte |
| Bob Jenkins one-at-a-time | non-crypto | 32 bit | Three-shift mix per byte, three-shift finaliser |
| PJW / ELF | non-crypto | 32 bit | ELF gABI symbol-table hash, high-nibble rotor |
| SDBM | non-crypto | 32 bit | h = h * 65599 + c, shift-form |
| Fletcher-16 | checksum | 16 bit | Byte-wise mod 255, two running sums |
| Fletcher-32 | checksum | 32 bit | 16-bit-word mod 65535 |
| MD4 | cryptographic (broken) | 128 bit | RFC 1320, 64-byte block, 3 rounds × 16 ops |
| MD5 | cryptographic | 128 bit | 64-byte block, 64 rounds |
| SHA-1 | cryptographic | 160 bit | 64-byte block, 80 rounds |
The SM83 has no native multiply, so FNV-1a, Murmur3, xxHash32 — which
look "trivial" on modern hardware — are dramatically slower on the GB
than the bit-shifty CRC and Pearson families. On the GBA they're
cheaper than the crypto hashes, as expected. Pearson is by far the
fastest algorithm on both platforms (it's just T[h ^ c] per byte) at
the cost of being 8-bit-output-only; CRC-8 is the next fastest in the
same single-byte tier.
Workload buffer: buf[i] = (i * 31 + 7) & 0xFF for i in [0, 1024).
Only the 18 GB-runnable algorithms are listed below — for the full
table including RIPEMD-160 / SHA-256 / BLAKE2s / HMAC-SHA256 /
AES-CBC-MAC / SHA-512 / SHA-3 / etc., see the
hash-bench-nds README.
| Algorithm | Digest |
|---|---|
| CRC-8 | DD |
| CRC-16 | F009 |
| CRC-32 | 7C321B5D |
| CRC-64 | (omitted — 64-bit state) |
| Adler-32 | 13D3FE10 |
| Fletcher-16 | D400 |
| Fletcher-32 | F5F3FF00 |
| Pearson-8 | A0 |
| Knuth-mul | 169FA000 |
| Jenkins OAT | D88778E4 |
| PJW/ELF | 0FC634A8 |
| SDBM | 8FAF7E00 |
| DJB2 | 358B5305 |
| FNV-1a 32 | 2C6D0DC5 |
| Murmur3-32 | 56530DF1 |
| xxHash32 | (stored in the ROM but no fixed-bytes reference here — matches xxhash.xxh32(buf, seed=0)) |
| MD4 | A2909A641975A5CB590984B5323BB03B |
| MD5 | 63B2177A7AF739B5CC52AB1D1C714702 |
| SHA-1 | B66786CDE756750241D1F0EAB86CE6F81855B017 |
The ROM displays the first two bytes of each digest on-screen, which
is enough to distinguish the eighteen algorithms at a glance — if
those first-two-byte fingerprints don't match the table above for your
run, something is wrong with the build. (For 1-byte digests like CRC-8
and Pearson-8 the second byte shows as 00 because the digest buffer
is zeroed before each call — the meaningful byte is the first.)
sys_time is GBDK's auto-incrementing 16-bit frame counter (bumped
once per VBlank from the default interrupt handler). One vblank is
≈ 16.74 ms on DMG/GBP/CGB single-speed and ≈ 8.37 ms on CGB
double-speed. The harness:
- Reads
sys_timeimmediately before the inner loop - Runs N iterations (per-algorithm count)
- Reads
sys_timeimmediately after - Subtracts (wrap-safe in 16-bit), multiplies by 16742, divides by 1000 × iters for ms-per-iteration
Quantisation error per measurement is ±0.5 frames ≈ ±8 ms. To swamp
this, iteration counts are tuned so each algorithm runs at least
~250 ms wall time — see the ALGOS[] table in src/main.c
for the per-algorithm counts.
KB/s is computed as 1024 / ms_per_iter (binary KB), rounded.
src/ Pure-C sources
hashes.h Function table for all 16 algorithms (declares
HBENCH_BANKED so calls work on banked GB ROM)
crc8.c crc16.c crc32.c adler32.c non-crypto checksums
djb2.c fnv1a.c pearson.c murmur3.c xxhash32.c non-crypto hashes
md4.c md5.c sha1.c sha256.c cryptographic
ripemd160.c cryptographic, bank 3
blake2s.c cryptographic, bank 2
hmac_sha256.c crypto MAC, bank 3
main.c Boot, sweep, paginated render, input dispatch
artifacts/ Prebuilt ROMs (committed)
build.bat Windows build (DMG + GBC parallel)
Makefile Linux / CI build
The non-banked *.c files in src/ are byte-identical to the ones
in ../hash-bench-gba/source/ — both
projects pull from the same pool so a fix in one platform's algorithm
is immediately a fix in the other's. blake2s.c, ripemd160.c and
hmac_sha256.c are also identical except for their #pragma bank N
lines, which are gated on #ifdef __PORT_sm83 and ignored by
devkitARM and host gcc.