Skip to content

dmang-dev/hash-bench-gb

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

hash-bench-gb

Part of the hash-bench cross-platform suite — the same hash-algorithm sources timed natively on seven Nintendo consoles (NES · GB/GBC · GBA · NDS · DSi · 3DS · N64).

A hashing-algorithm benchmark cartridge for the Game Boy and Game Boy Color. Hashes a 1024-byte buffer with 18 algorithms — 15 non-cryptographic (CRC-8, CRC-16, CRC-32, Adler-32, Fletcher-16, Fletcher-32, Pearson-8, Knuth multiplicative, Bob Jenkins one-at-a-time, PJW/ELF, SDBM, DJB2, FNV-1a, Murmur3-32, xxHash32) and 3 light cryptographic (MD4, MD5, SHA-1) — and prints a paginated table of milliseconds-per-iteration and KB/s throughput on the LCD. Press LEFT/RIGHT to flip pages (3 pages of ~6), A to cycle sort mode, B to re-run.

Sibling projects with progressively more powerful CPUs:

Project CPU Algos Notes
hash-bench-gb (this) SM83 @ 4 MHz 18 DMG / GBC ROM, 32 KB ROM-only
hash-bench-gba ARM7TDMI @ 16.78 MHz 32 adds the 14 algos that don't fit in SM83's footprint
hash-bench-nds ARM946E @ 33.5 MHz 32 dual-screen display
hash-bench-dsi ARM946E @ 134 MHz 32 DSi-mode setCpuClock(true), runs on DSi/2DS/3DS-family
hash-bench-3ds ARM11 @ 268-804 MHz 32 native libctru, real FPU + L2 cache

Why GB is limited to 18 algos

We wanted 23 on GB (matching what the GBA build runs) and tried hard to make it fit. The blocker turned out to be a combination of SDCC SM83 code generation and the GBDK banked-call ABI:

  • uint64_t bloat (4 algos). SHA-512 / SHA-3-256 / SHA-3-512 / xxHash64 / SipHash-2-4 / CRC-64 / Fletcher-64 / MurmurHash3-128 / PBKDF2-HMAC-SHA256 all need 64-bit arithmetic. SDCC SM83 supports long long via synthesised pair-of-uint32_t ops, but at tens of seconds per iteration on a 4 MHz SM83 — would ruin sweep UX.
  • uint32_t hash bloat (5 algos). SHA-256, RIPEMD-160, BLAKE2s, HMAC-SHA256, AES-CBC-MAC each compile to ~34 KB standalone on SM83 (vs SHA-1 at ~3 KB and CRC-32 at ~0.8 KB). The reason: SDCC's uint32_t arithmetic runtime ballons dramatically when a function has ~10 simultaneous 32-bit locals across an unrolled 64-round compress loop — most of the 34 KB is the runtime helpers, not the algorithm proper.
  • Banked-call ABI fragility. The natural fix is to bank: keep main
    • small algos in bank 0/1, push heavy algos into banks 2/3/4. The runtime helpers get amortised once (~36 KB across all heavy algos) and the incremental cost per algo is small (~1-2 KB), so the full 23-algo build came in at ~52 KB total which fits MBC1 fine. However, SDCC's GB banked-call trampoline reliably crashed at runtime on the first cross-bank entry in this configuration — observed across banks 2, 3, and 4, with both function-pointer and direct-call dispatch. The hello-world CGB test cart proved the GBDK pipeline itself was fine; the bug is specific to the combination of SDCC's banked-call trampoline + MBC1 + how SDCC places the runtime helpers across banks.
  • Per-algo single-cart split (also doesn't fit). With ~34 KB per algo and a 32 KB ROM-only ceiling, we can't even put one heavyweight algo per cart.

So those five live on the GBA / NDS / DSi / 3DS variants, where flat 32 MB / 4 MB address spaces sidestep the banking issue entirely.

Companion to the TOTP authenticator pair totp-gb / totp-gba whose SHA-1 implementation seeded this one.

ROMs Built with GBDK-2020


What it does

On boot, fills a 1024-byte buffer with i*31+7 (mod 256), then for each algorithm:

  1. Reads sys_time (GBDK's VBlank counter at ~59.73 Hz)
  2. Hashes the buffer N times (per-algorithm count tuned to ~1 second total wall time)
  3. Reads sys_time again
  4. Prints NAME HASH ms/iter KB/s to the screen

The first two bytes of the digest are shown so you can sanity-check the result against the reference digests without needing a debugger.

Press B to re-run the whole sweep — handy for comparing DMG vs GBC double-speed mode in the same emulator.


Try it

Pre-built ROMs are checked into artifacts/:

File Target Notes
hash-bench-gb.gb DMG / Pocket / Light / SGB / GBA 32 KB ROM-only, no MBC
hash-bench-gbc.gbc GBC, GBA compat mode 32 KB ROM-only; calls cgb_compatibility() so the DMG-style monochrome palette stays readable in CGB mode

Why 32 KB ROM-only (no MBC): earlier iterations of this build used MBC1 banking to fit heavier crypto algos in additional banks. SDCC's GB banked-call trampoline turned out to crash reliably on the first cross-bank entry across multiple bank choices and dispatch strategies — see Why GB is limited to 18 algos. We removed banking entirely, dropped the five heaviest crypto algos (RIPEMD-160, SHA-256, BLAKE2s, HMAC-SHA256, AES-CBC-MAC), and ship a flat 32 KB ROM-only cart. Those five run on the bigger-platform variants where flat 32 MB+ address spaces make banking moot.

Load either in mGBA, BGB, or SameBoy. On real hardware, any flashcart works — no SRAM / RTC / battery needed.


Build from source

Requires GBDK-2020 (https://github.com/gbdk-2020/gbdk-2020/releases). By default the build script looks at ../totp-gb/gbdk/ (where the sibling TOTP project vendors its toolchain). Override with GBDK_DIR to point elsewhere:

build.bat            # both DMG and CGB variants
build.bat dmg        # DMG only
build.bat gbc        # GBC only

Linux/macOS / CI: just make / make dmg / make gbc.


Algorithms

All algorithm sources are pure C with no GBDK / devkitARM dependencies, and the exact same files compile on both the GB (Z80-like SM83) and the GBA (ARM7TDMI). Digest size and table-vs-tableless choice are documented at the top of each src/*.c.

Algorithm Type Digest Notes
CRC-8/SMBUS checksum 8 bit Poly 0x07, bit-by-bit, no table
CRC-16/XMODEM checksum 16 bit Poly 0x1021, bit-by-bit, no table
CRC-32/IEEE checksum 32 bit Poly 0xEDB88320, bit-by-bit, reflected
Adler-32 checksum 32 bit Two 16-bit accumulators mod 65521
DJB2 non-crypto 32 bit h = ((h<<5)+h) + c, init 5381
FNV-1a 32 non-crypto 32 bit One 32×32 multiply per byte
Pearson-8 non-crypto 8 bit 256-byte permutation table; one xor + one load per byte
Murmur3-32 non-crypto 32 bit 4-byte block ARX with fmix32 finalizer (seed 0)
xxHash32 non-crypto 32 bit Four-lane ARX (multiply-heavy)
Knuth multiplicative non-crypto 32 bit h = (h ^ c) * 0x9E3779B1 per byte
Bob Jenkins one-at-a-time non-crypto 32 bit Three-shift mix per byte, three-shift finaliser
PJW / ELF non-crypto 32 bit ELF gABI symbol-table hash, high-nibble rotor
SDBM non-crypto 32 bit h = h * 65599 + c, shift-form
Fletcher-16 checksum 16 bit Byte-wise mod 255, two running sums
Fletcher-32 checksum 32 bit 16-bit-word mod 65535
MD4 cryptographic (broken) 128 bit RFC 1320, 64-byte block, 3 rounds × 16 ops
MD5 cryptographic 128 bit 64-byte block, 64 rounds
SHA-1 cryptographic 160 bit 64-byte block, 80 rounds

The SM83 has no native multiply, so FNV-1a, Murmur3, xxHash32 — which look "trivial" on modern hardware — are dramatically slower on the GB than the bit-shifty CRC and Pearson families. On the GBA they're cheaper than the crypto hashes, as expected. Pearson is by far the fastest algorithm on both platforms (it's just T[h ^ c] per byte) at the cost of being 8-bit-output-only; CRC-8 is the next fastest in the same single-byte tier.

Reference digests

Workload buffer: buf[i] = (i * 31 + 7) & 0xFF for i in [0, 1024). Only the 18 GB-runnable algorithms are listed below — for the full table including RIPEMD-160 / SHA-256 / BLAKE2s / HMAC-SHA256 / AES-CBC-MAC / SHA-512 / SHA-3 / etc., see the hash-bench-nds README.

Algorithm Digest
CRC-8 DD
CRC-16 F009
CRC-32 7C321B5D
CRC-64 (omitted — 64-bit state)
Adler-32 13D3FE10
Fletcher-16 D400
Fletcher-32 F5F3FF00
Pearson-8 A0
Knuth-mul 169FA000
Jenkins OAT D88778E4
PJW/ELF 0FC634A8
SDBM 8FAF7E00
DJB2 358B5305
FNV-1a 32 2C6D0DC5
Murmur3-32 56530DF1
xxHash32 (stored in the ROM but no fixed-bytes reference here — matches xxhash.xxh32(buf, seed=0))
MD4 A2909A641975A5CB590984B5323BB03B
MD5 63B2177A7AF739B5CC52AB1D1C714702
SHA-1 B66786CDE756750241D1F0EAB86CE6F81855B017

The ROM displays the first two bytes of each digest on-screen, which is enough to distinguish the eighteen algorithms at a glance — if those first-two-byte fingerprints don't match the table above for your run, something is wrong with the build. (For 1-byte digests like CRC-8 and Pearson-8 the second byte shows as 00 because the digest buffer is zeroed before each call — the meaningful byte is the first.)


Timing methodology

sys_time is GBDK's auto-incrementing 16-bit frame counter (bumped once per VBlank from the default interrupt handler). One vblank is ≈ 16.74 ms on DMG/GBP/CGB single-speed and ≈ 8.37 ms on CGB double-speed. The harness:

  • Reads sys_time immediately before the inner loop
  • Runs N iterations (per-algorithm count)
  • Reads sys_time immediately after
  • Subtracts (wrap-safe in 16-bit), multiplies by 16742, divides by 1000 × iters for ms-per-iteration

Quantisation error per measurement is ±0.5 frames ≈ ±8 ms. To swamp this, iteration counts are tuned so each algorithm runs at least ~250 ms wall time — see the ALGOS[] table in src/main.c for the per-algorithm counts.

KB/s is computed as 1024 / ms_per_iter (binary KB), rounded.


Layout

src/                Pure-C sources
  hashes.h          Function table for all 16 algorithms (declares
                    HBENCH_BANKED so calls work on banked GB ROM)
  crc8.c crc16.c crc32.c adler32.c                     non-crypto checksums
  djb2.c fnv1a.c pearson.c murmur3.c xxhash32.c        non-crypto hashes
  md4.c md5.c sha1.c sha256.c                          cryptographic
  ripemd160.c                                          cryptographic, bank 3
  blake2s.c                                            cryptographic, bank 2
  hmac_sha256.c                                        crypto MAC, bank 3
  main.c            Boot, sweep, paginated render, input dispatch
artifacts/          Prebuilt ROMs (committed)
build.bat           Windows build (DMG + GBC parallel)
Makefile            Linux / CI build

The non-banked *.c files in src/ are byte-identical to the ones in ../hash-bench-gba/source/ — both projects pull from the same pool so a fix in one platform's algorithm is immediately a fix in the other's. blake2s.c, ripemd160.c and hmac_sha256.c are also identical except for their #pragma bank N lines, which are gated on #ifdef __PORT_sm83 and ignored by devkitARM and host gcc.


Acknowledgments

  • GBDK-2020 — SM83 toolchain
    • libgb / stdio
  • totp-gb / totp-gba — donor SHA-1 implementation and project layout
  • mGBA — accurate emulator