Skip to content

[NetKAT] [PARKED] Flat array representation for interned PacketTransformer DecisionNodes#104

Draft
smolkaj wants to merge 5 commits into
google:mainfrom
smolkaj:flat-decision-nodes
Draft

[NetKAT] [PARKED] Flat array representation for interned PacketTransformer DecisionNodes#104
smolkaj wants to merge 5 commits into
google:mainfrom
smolkaj:flat-decision-nodes

Conversation

@smolkaj

@smolkaj smolkaj commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Warning

Parked deliberately — not seeking review. This PR is complete, green, and verified, but shelved on cost/benefit grounds until there is evidence to justify it (see "Why this is parked" below). It is stacked on #98 and #99; review only the last commit.

What

Interned PacketTransformerManager::DecisionNodes become flat — two contiguous sorted arrays (a (match_value, end_offset) header array plus a single (modify_value, branch) entry array) instead of nested absl::btree_maps: 40 vs 64 bytes per node, 2 heap allocations vs one per inner btree map. #98's combinator views are retargeted from btree maps to flat spans (binary search instead of btree::find); the map-based representation survives only as DecisionNodeBuilder for result accumulation, canonicalized and flattened once at interning. A single canonical-sequence visitor (ForEachFlatEntry) defines the flat element order for Flatten, NodeHash, and NodeEq, enabling allocation-free transparent unique-table lookup by builder.

Verification

Why this is parked (honest assessment)

Judged purely on CPU numbers, this change is not worth it: a few percent does not pay for what it costs in complexity.

What it costs. The expensive part is not the flat arrays — it is the dual representation. Every future change to node semantics touches two encodings plus the consistency contract between them (ForEachFlatEntry, the transparent functors, Flatten, the invariant checks). That is a permanent tax on everyone who touches the file. The offset-encoded arrays are also simply harder to read than a map of maps.

What it buys. Three things of very different solidity:

  1. CPU: ~±3% incremental over [NetKAT] Eliminate node and map copies in packet transformer combinators #98 + [NetKAT] Store each interned decision node once: pointer-keyed unique tables #99. Minor.
  2. Memory: the only potentially large win — back-of-envelope, a transformer node with M match values costs roughly (M+2) btree node allocations (~256 bytes each), vs ~56 + 8·entries bytes flat, i.e. potentially ~10× per node. But this is unmeasured: transformer node counts may be modest in practice, packet-set nodes (usually far more numerous) are already flat, and no memory-bound workload has been demonstrated. The strongest argument for this PR is currently faith-based.
  3. Endgame alignment: flat immutable nodes are where every mature BDD package landed. But "canonical eventually" is not "needed now" — nothing on the roadmap is blocked by this; in particular the apply caches (the actual big win, see the memoization TODOs b/382379263) key on handles and are representation-agnostic.

When to revive. Either of:

  • Profiling after operation memoization lands shows unique-table hashing/equality or node traversal as hot; or
  • A realistic workload (e.g. a production-scale table pipeline) demonstrates that transformer-node memory matters.

At that point this PR lands with evidence instead of vibes, and the work is already done: rebase, re-run the (byte-identical-golden) verification, done.

🤖 Generated with Claude Code

smolkaj and others added 5 commits June 10, 2026 03:13
…ors.

Union, Sequence, and Difference copied entire DecisionNodes — including
their nested btree maps — at every recursion step, re-interned expanded
nodes through the unique table just to recover handles the caller
already had, and copied the per-value modification map (GetMapAtValue)
inside per-value loops. Several TODOs flagged this as a likely
performance problem.

Restructure the combinators around cheap, non-owning views instead:

* DecisionNodeView views an operand "at" a field: either the node's own
  maps, or the trivial "fall through to this handle" expansion when the
  node branches on a larger field (or is Accept). This removes both the
  by-value node copies and the materialize-and-re-intern expansion step.
* ModifyBranchesView replaces GetMapAtValue's map copies with a view of
  the base map plus at most one extra entry, iterated in sorted order.
* CombineModifyBranches becomes a template over the combiner, removing
  AnyInvocable type-erasure and double lookups, and inserts with an end
  hint since keys arrive sorted.
* The value-collection loops now iterate in sorted order (previously
  nondeterministic flat_hash_set order), making combination order — and
  thus node numbering — deterministic.

Compilation benchmarks improve by roughly 1.5-2.2x (FirstTimeCompile*),
with larger wins expected on policies with wider modification maps.
Behavior is unchanged: all tests pass, and the golden-file diff test
output is byte-identical, including node numbering.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…, stable goldens.

Three follow-ups to the copy-elimination refactor:

* Sequence accumulated its result maps by rebuilding them from scratch
  for every left-branch entry, making accumulation quadratic in the
  number of branches. It now unions each sequenced branch into the
  result map in place, which also removes the remaining intermediate
  maps entirely.
* Union and Difference were near-identical after the refactor; their
  shared body (pointwise combination of two operands viewed at a common
  field) moves into a PointwiseCombine template parameterized on the
  operation.
* The golden diff test asserted raw node-interning indices, so any
  reordering inside the manager — including the change above, and
  planned work like operation memoization — churned the golden file
  without any semantic change. The test runner now renumbers nodes and
  fields canonically before printing, by copying each compiled
  transformer into a fresh manager in deterministic traversal order, so
  the golden output depends only on transformer structure. The golden
  file is regenerated accordingly (verified: the previous diff was pure
  renumbering, with structure, fields, and labels identical).

Compilation benchmarks improve a further ~9% on top of the previous
commit (~1.8x total vs the original code on FirstTimeCompile
NonOverlappingPolicy). All tests pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Findings from a four-angle cleanup review (reuse, simplification,
efficiency, altitude):

* Replace ModifyBranchesView's hand-rolled merging iterator (~45 lines,
  the subtlest code in the refactor) with a simple ForEach that visits
  base entries then the extra entry. No caller needed globally sorted
  iteration: map contents are order-independent and handle-level results
  are canonical regardless of combination order.
* Deduplicate the operand-view prologue shared by Sequence and
  PointwiseCombine into ViewOperandsAtSmallestField.
* Use absl::NoDestructor for the empty-map singletons (the repo's
  established idiom) and name the default-handle-is-Deny contract in one
  place (IsDenyHandle) instead of re-deriving it inline.
* Hint the per-value btree insertions in Sequence/PointwiseCombine with
  end(), since SortedUniqueKeys emits values in increasing order.
* Test runner: collect fields into a btree_set instead of sort+unique,
  and translate field handles through a precomputed map instead of a
  per-node string round-trip.
* Drop a stale TODO referencing the deleted GetMapAtValue, and the
  now-unused any_invocable dependency.

No behavior change: all tests pass and the golden file is unchanged.
Benchmarks are neutral.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… tables.

The unique tables of both managers stored every node twice — once in
the node arena (nodes_) and once as the hash map key — doubling node
memory and copying each new node (nested btree maps and all, for
transformer nodes) on insertion. The tables are now keyed by pointers
into the arena, which PagedStableVector keeps stable across both
growth and manager moves; transparent hash/equality functors allow
lookup by node value before interning. Benchmarks improve ~2-9%
across the board on top of google#98, mostly from no longer copying
freshly-created nodes into the key slot.

Also:
* Adds BM_PushAndPullFullSetThroughPolicy, covering the read-path
  operations (Push/Pull) that the analysis engine is built on; the
  existing benchmarks only measured compilation. Sizing note: without
  memoization, Push/Pull cost grows exponentially in policy size (8
  unioned sub-policies ran for 20+ minutes), so the benchmark
  deliberately stays small — and the planned operation-memoization
  work is where that changes.
* Includes ProtoHashKey.policy_case in the proto-cache hash; it was
  part of equality but omitted from the hash, causing avoidable
  collisions across policy kinds.
* CheckInternalInvariants verifies pointer-table integrity and
  exercises both lookup paths in both managers.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… combinators.

Reworks the flat-node representation change to stack on top of the
view-based combinator refactor (google#98), composing the two designs into
the intended end state:

* Interned nodes are flat (two contiguous sorted arrays, 40 bytes, two
  allocations) and stored exactly once, in a unique table keyed by
  stable pointers into the node arena with transparent
  lookup-by-builder (and the same table design ported to
  PacketSetManager, where nodes are even more numerous).

* The combinators keep google#98's structure but their operand views now
  read flat spans: DecisionNodeView wraps a flat node (or the trivial
  fall-through), ModifyBranchesView is a span plus at most one extra
  entry, and per-value lookups are binary searches. Only result
  accumulation still uses btree maps (DecisionNodeBuilder), which
  NodeToTransformer canonicalizes and flattens on interning.

* The builder-expansion machinery from the previous iteration of this
  branch (ToBuilder on every combinator call, handle-level field
  alignment) is gone — operands are never expanded, completing the
  deletion that google#98's views started.

Also carries over from the previous iteration: the canonical
flat-sequence visitor (ForEachFlatEntry) shared by Flatten/NodeHash/
NodeEq, streaming hashing, the Matches() iteration API, strengthened
internal invariants (flat-encoding structure, hash/eq consistency,
pointer-table integrity), the Push/Pull read-path benchmark, and the
ProtoHashKey.policy_case hash fix.

Verified structure-preserving: the golden file of the canonicalizing
diff test is byte-identical to google#98's. All 17 test targets pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@google-cla

google-cla Bot commented Jun 10, 2026

Copy link
Copy Markdown

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant