[NetKAT] Shrink node storage pages from 64 MiB to 16 KiB by smolkaj · Pull Request #105 · google/netkat

smolkaj · 2026-06-10T16:00:07Z

What

The page size of the managers' node vectors goes from 64 MiB to 12 KiB (packet sets, 512 nodes) / 16 KiB (transformers, 256 nodes), defined directly as power-of-two node counts.

Why

At 64 MiB, every page allocation exceeds malloc's mmap threshold (typically 128 KiB), so every manager pays an mmap/munmap syscall pair — diagnosed with strace -c, which shows one mmap+munmap per benchmark iteration on head. This is significant for short-lived managers (compile a policy, answer a query, discard), the dominant pattern in tests and the analysis engine today. At 12–16 KiB, pages stay below the mmap/trim thresholds and are recycled through the allocator's freelists, while still amortizing allocation over hundreds of nodes.
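
The threshold argument comes down to simple arithmetic, sketched below. This is an illustration only: the helper names are hypothetical, the 24-byte and 64-byte node sizes are inferred from the page sizes quoted in this description (12 KiB / 512 nodes and 16 KiB / 256 nodes), and the 128 KiB threshold is the typical default mentioned above, not a value read from the allocator at run time.

```cpp
#include <cstddef>

// Assumption: the "typical" malloc mmap threshold quoted in this PR (128 KiB).
constexpr std::size_t kMmapThresholdBytes = 128 * 1024;

// Assumption: node sizes inferred from the PR's figures, not read from the code.
constexpr std::size_t kPacketSetNodeBytes = 24;    // 12 KiB / 512 nodes
constexpr std::size_t kTransformerNodeBytes = 64;  // 16 KiB / 256 nodes

// Bytes requested from the allocator for one page (hypothetical helper).
constexpr std::size_t PageBytes(std::size_t nodes_per_page,
                                std::size_t node_bytes) {
  return nodes_per_page * node_bytes;
}

// True if a request of this size would be served by mmap rather than from
// the allocator's freelists, under the assumed threshold.
constexpr bool ExceedsMmapThreshold(std::size_t page_bytes) {
  return page_bytes >= kMmapThresholdBytes;
}

static_assert(PageBytes(512, kPacketSetNodeBytes) == 12 * 1024,
              "packet-set pages are 12 KiB");
static_assert(PageBytes(256, kTransformerNodeBytes) == 16 * 1024,
              "transformer pages are 16 KiB");
static_assert(!ExceedsMmapThreshold(PageBytes(512, kPacketSetNodeBytes)),
              "new packet-set pages stay below the threshold");
static_assert(!ExceedsMmapThreshold(PageBytes(256, kTransformerNodeBytes)),
              "new transformer pages stay below the threshold");
static_assert(ExceedsMmapThreshold(std::size_t{64} * 1024 * 1024),
              "old 64 MiB pages exceed the threshold");
```

Under these assumptions the old pages are 512 times larger than the threshold, while the new ones sit at roughly a tenth of it.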

The page sizes are expressed as power-of-two node counts rather than derived from a byte budget: a byte-budget division yields a non-power-of-two count for packet sets (16 KiB / 24 B = 682), which would force the index arithmetic in PagedStableVector::operator[] — the hot path of nearly every operation — to compile to multiply sequences instead of single shift/mask instructions. (#101, designed to stack on this PR, enforces the power-of-two property at compile time and adds benchmark infrastructure.)
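
To illustrate the indexing point, here is a minimal sketch of how a paged vector can split an index into a page number and an in-page offset. It is not the actual `PagedStableVector` code: `PageIndexer`, `IsPowerOfTwo`, and `Log2` are hypothetical names introduced for this example, and the real implementation may be structured differently.

```cpp
#include <cstddef>

// True iff n is a power of two (hypothetical helper).
constexpr bool IsPowerOfTwo(std::size_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}

// Floor of log2(n) for n >= 1 (hypothetical helper).
constexpr unsigned Log2(std::size_t n) {
  return n <= 1 ? 0 : 1 + Log2(n >> 1);
}

// Splits a flat node index into (page, offset). With a power-of-two page
// size, both parts are a single shift or mask.
template <std::size_t kNodesPerPage>
struct PageIndexer {
  static_assert(IsPowerOfTwo(kNodesPerPage),
                "page size must be a power-of-two node count");
  static constexpr std::size_t Page(std::size_t index) {
    return index >> Log2(kNodesPerPage);
  }
  static constexpr std::size_t Offset(std::size_t index) {
    return index & (kNodesPerPage - 1);
  }
};

// A byte-budget division gives 16 KiB / 24 B = 682 nodes, which is not a
// power of two, so the shift/mask form above would not apply to it.
static_assert(16 * 1024 / 24 == 682, "byte-budget node count");
static_assert(!IsPowerOfTwo(682), "682 is not a power of two");
static_assert(IsPowerOfTwo(512) && IsPowerOfTwo(256),
              "both page sizes chosen in this PR are powers of two");
```

For example, with 512-node pages, index 1000 maps to page 1, offset 488. With a 682-node page the same split would need a general `index / 682` and `index % 682`.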

Measured results (CPU-time medians, 5 reps)

Small-policy compilation, the workload this PR targets, vs head:

| Benchmark | Before | After | Speedup |
| --- | --- | --- | --- |
| `BM_FirstTimeCompileOverlappingPredicate` | 11.2 µs | 3.8 µs | 3.0× |
| `BM_FirstTimeCompileNonOverlappingPredicate` | 45.0 µs | 36.1 µs | 1.25× |
| `BM_FirstTimeCompileNonOverlappingPolicy` (transformer) | 585 µs | 266 µs | 2.2× |
| Recompile benchmarks (no allocation) | ~640 ns | ~635 ns | unchanged |

Large workloads are not just unaffected but slightly improved, verified with the large-scale benchmarks from #101 (random packet sets of ~10^5–10^6 BDD nodes): compiling a 32k-member set goes from 264.6 ms to 243.1 ms (−8%), Xor of two 32k-member sets from 3.19 ms to 3.13 ms, Not unchanged.

Testing

All 17 bazel test targets pass.

🤖 Generated with Claude Code

The managers' node vectors allocate memory in pages. At 64 MiB, every page allocation exceeds malloc's mmap threshold (typically 128 KiB), so each manager pays an mmap/munmap syscall pair - significant for short-lived managers, which compile a policy and are discarded. At 16 KiB, pages are recycled through the allocator's freelists, while still amortizing allocation over hundreds of nodes. In benchmarks, this speeds up first-time compilation of small policies by up to 3x (e.g. BM_FirstTimeCompileOverlappingPredicate: 10.8us -> 3.7us); the syscall cost was diagnosed with strace -c.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

smolkaj · 2026-06-10T16:00:52Z

Standalone benchmark run confirmed (medians of 5, -c opt, vs main measured back-to-back on the same machine):

| Benchmark | main | this PR |
| --- | --- | --- |
| `BM_FirstTimeCompileNonOverlappingPredicate` | 45.0 µs | 37.1 µs (−18%) |
| `BM_ReCompileNonOverlappingPredicate` | 641 ns | 641 ns |
| `BM_FirstTimeCompileOverlappingPredicate` | 11.1 µs | 3.8 µs (2.9×) |
| `BM_ReCompileOverlappingPredicate` | 649 ns | 636 ns |
| `BM_FirstTimeCompileNonOverlappingPolicy` | 310.5 µs | 268.1 µs (−14%) |
| `BM_ReCompileNonOverlappingPolicy` | 4.23 µs | 4.10 µs |
| `BM_FirstTimeCompileOverlappingPolicy` | 46.5 µs | 31.9 µs (−31%) |
| `BM_ReCompileOverlappingPolicy` | 4.12 µs | 4.05 µs |

Notably, these match the numbers previously measured for the full #102 stack almost exactly — this two-line change accounts for essentially all of the first-time-compilation speedup observed there. Recompilation is unchanged, as expected (no allocation on that path).

google-cla · 2026-06-10T16:01:06Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Deriving the page size from a byte budget yields a non-power-of-two node count for packet sets (16 KiB / 24 B = 682), which forces the index arithmetic in PagedStableVector::operator[] -- on the hot path of nearly every operation -- to compile to multiply sequences instead of single shift/mask instructions. Round to 512 nodes (12 KiB) instead; transformer pages become an explicit 256 nodes (16 KiB), numerically unchanged. Both stay far below the malloc mmap/trim thresholds, which is what this PR is about. This also unblocks stacking google#101, which enforces power-of-two page sizes at compile time.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The managers' pages shrank from 2^21 to 2^9 nodes (see google#105); keep the microbenchmark representative of what production uses.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

smolkaj · 2026-06-11T21:47:31Z

This seems to be overfitting to small workflows, and it will make large workflows slower. Reference: https://claude.ai/share/6c75784f-9b5a-4124-ad54-14811c88ed94

Closing.

smolkaj mentioned this pull request Jun 10, 2026

[NetKAT] Store decision nodes by field: per-field unique tables + level-packed handles #102

Closed

smolkaj mentioned this pull request Jun 10, 2026

[NetKAT] Speed up PagedStableVector indexing with power-of-two page sizes #101

Draft

smolkaj closed this Jun 11, 2026

smolkaj wants to merge 2 commits into google:main from smolkaj:small-arena-pages
