[NetKAT] Store each interned decision node once: pointer-keyed unique tables#99
Draft
smolkaj wants to merge 4 commits into
Draft
[NetKAT] Store each interned decision node once: pointer-keyed unique tables#99smolkaj wants to merge 4 commits into
smolkaj wants to merge 4 commits into
Conversation
…ors. Union, Sequence, and Difference copied entire DecisionNodes — including their nested btree maps — at every recursion step, re-interned expanded nodes through the unique table just to recover handles the caller already had, and copied the per-value modification map (GetMapAtValue) inside per-value loops. Several TODOs flagged this as a likely performance problem. Restructure the combinators around cheap, non-owning views instead: * DecisionNodeView views an operand "at" a field: either the node's own maps, or the trivial "fall through to this handle" expansion when the node branches on a larger field (or is Accept). This removes both the by-value node copies and the materialize-and-re-intern expansion step. * ModifyBranchesView replaces GetMapAtValue's map copies with a view of the base map plus at most one extra entry, iterated in sorted order. * CombineModifyBranches becomes a template over the combiner, removing AnyInvocable type-erasure and double lookups, and inserts with an end hint since keys arrive sorted. * The value-collection loops now iterate in sorted order (previously nondeterministic flat_hash_set order), making combination order — and thus node numbering — deterministic. Compilation benchmarks improve by roughly 1.5-2.2x (FirstTimeCompile*), with larger wins expected on policies with wider modification maps. Behavior is unchanged: all tests pass, and the golden-file diff test output is byte-identical, including node numbering. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
…, stable goldens. Three follow-ups to the copy-elimination refactor: * Sequence accumulated its result maps by rebuilding them from scratch for every left-branch entry, making accumulation quadratic in the number of branches. It now unions each sequenced branch into the result map in place, which also removes the remaining intermediate maps entirely. * Union and Difference were near-identical after the refactor; their shared body (pointwise combination of two operands viewed at a common field) moves into a PointwiseCombine template parameterized on the operation. * The golden diff test asserted raw node-interning indices, so any reordering inside the manager — including the change above, and planned work like operation memoization — churned the golden file without any semantic change. The test runner now renumbers nodes and fields canonically before printing, by copying each compiled transformer into a fresh manager in deterministic traversal order, so the golden output depends only on transformer structure. The golden file is regenerated accordingly (verified: the previous diff was pure renumbering, with structure, fields, and labels identical). Compilation benchmarks improve a further ~9% on top of the previous commit (~1.8x total vs the original code on FirstTimeCompile NonOverlappingPolicy). All tests pass. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Findings from a four-angle cleanup review (reuse, simplification, efficiency, altitude): * Replace ModifyBranchesView's hand-rolled merging iterator (~45 lines, the subtlest code in the refactor) with a simple ForEach that visits base entries then the extra entry. No caller needed globally sorted iteration: map contents are order-independent and handle-level results are canonical regardless of combination order. * Deduplicate the operand-view prologue shared by Sequence and PointwiseCombine into ViewOperandsAtSmallestField. * Use absl::NoDestructor for the empty-map singletons (the repo's established idiom) and name the default-handle-is-Deny contract in one place (IsDenyHandle) instead of re-deriving it inline. * Hint the per-value btree insertions in Sequence/PointwiseCombine with end(), since SortedUniqueKeys emits values in increasing order. * Test runner: collect fields into a btree_set instead of sort+unique, and translate field handles through a precomputed map instead of a per-node string round-trip. * Drop a stale TODO referencing the deleted GetMapAtValue, and the now-unused any_invocable dependency. No behavior change: all tests pass and the golden file is unchanged. Benchmarks are neutral. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
39e0fd9 to
7fcbb37
Compare
… tables. The unique tables of both managers stored every node twice — once in the node arena (nodes_) and once as the hash map key — doubling node memory and copying each new node (nested btree maps and all, for transformer nodes) on insertion. The tables are now keyed by pointers into the arena, which PagedStableVector keeps stable across both growth and manager moves; transparent hash/equality functors allow lookup by node value before interning. Benchmarks improve ~2-9% across the board on top of google#98, mostly from no longer copying freshly-created nodes into the key slot. Also: * Adds BM_PushAndPullFullSetThroughPolicy, covering the read-path operations (Push/Pull) that the analysis engine is built on; the existing benchmarks only measured compilation. Sizing note: without memoization, Push/Pull cost grows exponentially in policy size (8 unioned sub-policies ran for 20+ minutes), so the benchmark deliberately stays small — and the planned operation-memoization work is where that changes. * Includes ProtoHashKey.policy_case in the proto-cache hash; it was part of equality but omitted from the hash, causing avoidable collisions across policy kinds. * CheckInternalInvariants verifies pointer-table integrity and exercises both lookup paths in both managers. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
7fcbb37 to
6224871
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Note
Stacked on #98: this branch contains #98's commits plus one commit on top. Review only the last commit here; review the rest in #98. After #98 lands, this PR will be rebased onto
main.What
The unique tables of both
PacketSetManagerandPacketTransformerManagerstored every interned decision node twice — once in the node arena (nodes_) and once as the hash-map key — doubling node memory and copying each newly created node (nested btree maps and all, for transformer nodes) into the key slot on insertion.The tables are now keyed by stable pointers into the arena —
PagedStableVectorguarantees pointers survive both arena growth and manager moves, which also keeps the hash/equality functors stateless. Transparent functors allow lookup by node value before interning, so the lookup-then-insert flow needs no extra copies anywhere.Performance
Min-of-3 interleaved optimized runs on top of #98:
FirstTimeCompileNonOverlapping−8.4%,FirstTimeCompileOverlapping−9.0%,ReCompile*−3.8%/−4.5%,PushAndPull−1.9% — mostly from no longer deep-copying fresh nodes into the key slot. Node memory halves (modulo the 16-byte pointer-map entry).Also in this PR
BM_PushAndPullFullSetThroughPolicy: the existing benchmarks only measured compilation; this covers the read-path operations (Push/Pull) the analysis engine is built on. Sizing it surfaced how brutally exponential un-memoizedPush/Pullis (eight unioned sub-policies ran 20+ minutes), so it deliberately stays small — the planned operation-memoization work is where that changes.ProtoHashKeyhash fix:policy_casewas part of equality but omitted from the hash, causing avoidable collisions across policy kinds.CheckInternalInvariantsvalidates pointer-table integrity and exercises both lookup paths, in both managers.Verification
All 17 test targets pass, including the fuzz-property tests, the differential tests against the evaluator, and #98's canonicalizing golden diff test (golden file unchanged).
Relation to the flat-node experiment
An earlier iteration of this PR additionally flattened the transformer's interned node representation into contiguous arrays. That work is parked in a follow-up draft PR (see the PR stacked on this one) with an honest cost/benefit assessment: the cheap, clearly-worthwhile parts — this PR — capture most of the measured win without introducing a second node representation.
🤖 Generated with Claude Code