feat(routing): upstream (northbound) declaration aggregation #2631
Open
BOURBONCASK wants to merge 1 commit into
Extend config-driven aggregation to a router's northbound forwarding. When a northbound
router HAT forwards a downstream subscriber/queryable whose key-expression is included by a
configured `aggregation.upstream.{subscribers,queryables}` prefix, the per-key children are
folded into a single `${prefix}` declaration toward the upstream and suppressed there, while
staying registered in the source region so downward routing is unchanged. An upstream router
then holds one routing Resource per configured prefix instead of one per forwarded key.
The stock `aggregation.subscribers`/`publishers` only collapses a session's own declarations at
the session boundary; this collapses what a router forwards upstream on behalf of the sessions
below it -- the cost that grows as O(N*K) (N downstream branches, K keys each) on every upstream
router in a mesh.
Design:
- No new wire type: the aggregate is an ordinary DeclareSubscriber/DeclareQueryable, so mesh
propagation, matching and admin are unchanged.
- Opt-in: an empty `aggregation.upstream` takes the existing propagation path (no-op fast path).
- Aggregate Resources are pre-created at gateway build; the fold path only looks them up (a
Resource needs the full Tables, unreachable from inside a single HAT), so each aggregate's
match-set is wired once and reconnect/churn cannot accumulate duplicate cross-links.
- The aggregate queryable is advertised complete=false, so BestMatching falls through to the
real per-key queryable: a genuinely-complete source is never shadowed and no completeness
state can go stale across owner churn.
- target=AllComplete reaches complete children behind the aggregate via a transparent-forwarder
flag on the in-process query route entry (set only at the router-net fold site, no wire
change); BestMatching is untouched.
- Liveliness tokens are intentionally not folded (a liveliness sample's key is the token's own
key, so a folded wildcard token could neither enumerate the live set nor signal a per-key
removal).
- A startup check warns on suspicious prefixes (a bare `**` root, the `@/` admin-space,
duplicates, mutually-including prefixes).
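As a rough illustration of the startup check in the last bullet, here is a minimal sketch. `check_prefixes` and `includes` are hypothetical names, not the functions in this PR. Inclusion is modelled as a plain string check on prefixes written as `<stem>/**`, which is far simpler than real key-expression inclusion, and "mutually-including" is read here as "one prefix includes the other".

```rust
/// Simplified inclusion test for prefixes written as `<stem>/**`.
/// Assumption: real key-expression inclusion is richer than this string check.
fn includes(outer: &str, inner: &str) -> bool {
    if outer == "**" {
        return true;
    }
    match outer.strip_suffix("/**") {
        Some(stem) => inner.starts_with(format!("{stem}/").as_str()),
        None => outer == inner,
    }
}

/// Returns one warning per suspicious prefix or prefix pair (hypothetical helper).
fn check_prefixes(prefixes: &[&str]) -> Vec<String> {
    let mut warnings = Vec::new();
    let mut kept: Vec<&str> = Vec::new(); // distinct prefixes seen so far
    for &p in prefixes {
        if p == "**" {
            warnings.push(format!("`{p}` aggregates the whole key space"));
        }
        if p.starts_with("@/") {
            warnings.push(format!("`{p}` targets the admin space"));
        }
        if kept.contains(&p) {
            warnings.push(format!("`{p}` is listed more than once"));
            continue;
        }
        for &q in &kept {
            if includes(p, q) || includes(q, p) {
                warnings.push(format!("`{p}` and `{q}` overlap by inclusion"));
            }
        }
        kept.push(p);
    }
    warnings
}
```

The sketch only collects warnings and never rejects a config, matching the "warns" wording above.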
Trade-offs (documented in DEFAULT_CONFIG.json5 and the config schema): at the upstream node,
per-key ACL/QoS-overwrite/interceptors and admin-space enumeration see only the `${prefix}`
aggregate, and the wildcard aggregate can forward data toward the forwarding router for
unsubscribed keys.
Tests: a deterministic in-process (MockFace) fold/teardown test, plus real loopback-TCP
integration tests covering subscriber/queryable collapse, delivery and teardown, wildcard get
fan-out, missing-key empty (no hang), target=AllComplete reaching a complete child through the
aggregate with a non-complete negative control, no cross-branch shadowing, and cross-mesh
propagation. MSRV 1.75; clippy --deny warnings clean; no behaviour change when the config is unset.
Signed-off-by: yifei.ma <yifeima98@gmail.com>
This follows up on the discussion in #2630.
Up front, so there's no confusion about what this is: it's an experiment I built to deal with a real
routing-table scaling problem at edge-to-cloud scale, and I'm sharing it mostly as a concrete
reference / conversation starter rather than something I expect to land as-is. I understand a broader
Zenoh 2.0 redesign is only just starting to be discussed, with the scope still taking shape — so rather
than assume where this area lands, please read this as "here's one way it could look, and here's what I
learned doing it," offered as possible input to that conversation. I'm very happy to reshape it, cut it
down, rebase it onto whatever makes sense, or just leave it as something others can borrow from.
What it's for
When many downstream sessions each declare K subscribers/queryables under a shared key-expression prefix
and a router forwards them up into a router mesh, every upstream router ends up holding ~N×K routing-table
`Resource`s (N branches × K keys). The routing table tends to be the first limit you hit. Zenoh's existing
`aggregation.subscribers`/`publishers` only collapses a session's own declarations, not what a router
forwards upstream for the sessions below it, so this extends config aggregation to a router's northbound
forwarding, letting an upstream router keep one `Resource` per configured prefix instead of one per
forwarded key.
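To make the N×K arithmetic concrete, here is a toy model rather than a measurement. `upstream_resources` is a hypothetical helper, and the key layout `{prefix}/branch{b}/key{k}` is an assumption chosen so that every forwarded key is distinct.

```rust
use std::collections::HashSet;

/// Counts the distinct key expressions an upstream router would hold when
/// `branches` forwarding routers each forward `keys_per_branch` keys under
/// `prefix`. With `fold`, every key collapses to `{prefix}/**`.
fn upstream_resources(prefix: &str, branches: usize, keys_per_branch: usize, fold: bool) -> usize {
    let mut held: HashSet<String> = HashSet::new();
    for b in 0..branches {
        for k in 0..keys_per_branch {
            let held_key = if fold {
                format!("{prefix}/**")
            } else {
                format!("{prefix}/branch{b}/key{k}")
            };
            held.insert(held_key);
        }
    }
    held.len()
}
```

In this model, 100 branches with 50 keys each give 5000 entries unfolded and 1 entry folded under a single shared prefix.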
What it does
When a northbound router HAT forwards a downstream subscriber/queryable whose key-expression is included
by a configured `aggregation.upstream.{subscribers,queryables}` prefix, it folds the per-key children
into a single `${prefix}` declaration upstream and suppresses the children there, keeping them
registered in the source region so downward routing is unchanged. There's no new wire type (the aggregate
is an ordinary `DeclareSubscriber`/`DeclareQueryable`), and it's opt-in (an empty `aggregation.upstream`
takes the existing propagation path).
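The fold/suppress/teardown behaviour can be sketched with a small refcounted ledger. This is a simplified model, not the code in this PR. `FoldLedger` and `Upstream` are hypothetical names, and prefix matching is a plain string check on prefixes written as `<stem>/**` instead of real key-expression inclusion.

```rust
use std::collections::HashMap;

/// What the forwarding router sends upstream for one downstream (un)declaration.
#[derive(Debug, PartialEq)]
enum Upstream {
    Declare(String),   // forward this declaration upstream
    Undeclare(String), // withdraw this declaration upstream
    Suppressed,        // nothing is sent: an aggregate already covers the key
}

struct FoldLedger {
    prefixes: Vec<String>,        // configured prefixes, written as `<stem>/**`
    live: HashMap<String, usize>, // aggregate -> number of folded children
}

impl FoldLedger {
    fn new(prefixes: &[&str]) -> Self {
        Self { prefixes: prefixes.iter().map(|p| p.to_string()).collect(), live: HashMap::new() }
    }

    /// First configured prefix covering `key`, if any (simplified matching).
    fn aggregate_for(&self, key: &str) -> Option<&String> {
        self.prefixes.iter().find(|p| {
            p.strip_suffix("/**")
                .is_some_and(|stem| key.starts_with(format!("{stem}/").as_str()))
        })
    }

    fn declare(&mut self, key: &str) -> Upstream {
        match self.aggregate_for(key).cloned() {
            // No prefix matches (or the config is empty): existing per-key path.
            None => Upstream::Declare(key.to_string()),
            Some(agg) => {
                let count = self.live.entry(agg.clone()).or_insert(0);
                *count += 1;
                // Only the first child under a prefix declares the aggregate.
                if *count == 1 { Upstream::Declare(agg) } else { Upstream::Suppressed }
            }
        }
    }

    fn undeclare(&mut self, key: &str) -> Upstream {
        match self.aggregate_for(key).cloned() {
            None => Upstream::Undeclare(key.to_string()),
            Some(agg) => {
                let remaining = match self.live.get_mut(&agg) {
                    Some(count) => {
                        *count -= 1;
                        *count
                    }
                    // Unbalanced undeclare: nothing to withdraw.
                    None => return Upstream::Suppressed,
                };
                if remaining == 0 {
                    // The last child is gone: withdraw exactly one aggregate.
                    self.live.remove(&agg);
                    Upstream::Undeclare(agg)
                } else {
                    Upstream::Suppressed
                }
            }
        }
    }
}
```

With `FoldLedger::new(&["site/**"])`, declaring `site/a` and then `site/b` sends one `site/**` declaration upstream. Undeclaring both withdraws it once. An empty prefix list leaves every key on the per-key path.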
A few design notes (in case they're useful for the broader discussion)
The aggregate `Resource`s are created once when the gateway is built and the fold path just looks them
up: creating a `Resource` needs the whole `Tables`, which isn't reachable from inside a single HAT, and
doing it once means each aggregate's match-set is wired exactly once, so reconnect/churn can't pile up
duplicate cross-links.
The aggregate queryable is advertised `complete=false` (presence, not authority), so `BestMatching` falls
through to the real per-key queryable: a genuinely-complete source is never shadowed and no
`complete` state can go stale across owner churn.
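Why `complete=false` avoids shadowing can be shown with a toy selection rule. This is my simplified reading of `BestMatching` (prefer one complete candidate, otherwise query every match), not the router's actual code. `Candidate` and `best_matching` are hypothetical names.

```rust
/// A matching queryable as seen by the router (hypothetical, simplified).
#[derive(Debug)]
struct Candidate {
    name: &'static str,
    complete: bool,
}

/// Simplified BestMatching: pick the first complete candidate if there is one,
/// otherwise fall back to querying every matching candidate.
fn best_matching(candidates: &[Candidate]) -> Vec<&'static str> {
    match candidates.iter().find(|c| c.complete) {
        Some(c) => vec![c.name],
        None => candidates.iter().map(|c| c.name).collect(),
    }
}
```

In this model, a `complete=false` aggregate listed ahead of a complete per-key queryable is skipped in favour of the real one. An aggregate wrongly advertised as complete would be picked first and shadow it.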
For `target=AllComplete`, a route entry that is non-complete, whose matched resource covers the query, and
points at a router, is treated as a transparent forwarder (a small in-process flag on the query route
entry, set only at the router-net fold site, with no wire change). `AllComplete` passes through it so the
next router re-applies the filter against its real children; `BestMatching` is untouched.
It's northbound-only, the fold goes through a refcounted ledger, and route caches are invalidated on fold
and teardown. Suspicious prefixes (a bare `**` root, the `@/` admin-space, duplicates, mutual inclusion)
get a startup warning.
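For the `target=AllComplete` note above, the filter can be sketched as follows. The struct, its field names and both functions are hypothetical stand-ins for the query route entry and flag described in the text, not the types in this PR.

```rust
/// One entry of a computed query route (hypothetical, simplified).
#[derive(Debug)]
struct RouteEntry {
    dst: &'static str,
    complete: bool,
    transparent_forwarder: bool,
}

/// The three conditions from the design note: the entry is non-complete, its
/// matched resource covers the query, and it points at a router.
fn is_transparent_forwarder(complete: bool, covers_query: bool, dst_is_router: bool) -> bool {
    !complete && covers_query && dst_is_router
}

/// Simplified AllComplete filter: keep complete entries, and let the query
/// pass through transparent forwarders so the next router can re-apply the
/// filter against its real children.
fn all_complete_targets(route: &[RouteEntry]) -> Vec<&'static str> {
    route
        .iter()
        .filter(|e| e.complete || e.transparent_forwarder)
        .map(|e| e.dst)
        .collect()
}
```

A non-complete entry without the flag is still dropped, which is what the non-complete negative control in the tests exercises.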
Trade-offs
At the upstream node, per-key ACL / QoS / interceptors and admin-space enumeration see only the
`${prefix}` aggregate (so per-key policy belongs on the forwarding router); the wildcard aggregate can
forward data toward the forwarding router for unsubscribed keys; and liveliness tokens are intentionally
not folded (a liveliness sample's key is the token's key, so a folded `${prefix}/**` token couldn't
enumerate per-key presence or signal a per-key loss). These are noted alongside the config in
`DEFAULT_CONFIG.json5` and the schema docs.
Numbers
From an in-process loopback-TCP benchmark:
This is exact by construction (the fold produces one aggregate per branch), independent of RAM.
secondary signal; the A/B delta is dominated by the cardinality collapse, which is the load-bearing
result.
What changed
It's roughly 480 lines of routing code plus config and docs, with a similar amount of tests, all behind
the empty-config fast path. The bulk lives in the router HAT (the per-prefix fold/suppress/teardown for
subscribers and queryables). Supporting pieces add the config and its docs, the pre-created aggregates and
the prefix validation in the dispatcher tables, the transparent-forwarder flag on the query route entry
(threaded through the other HATs' query paths), and the tests.
Tests
aggregate on the upstream face, and undeclaring all of them withdraws exactly one aggregate.
`get` fans out to all children; a missing-key `get` returns empty without hanging; `target=AllComplete`
reaches a complete child through the aggregate while a non-complete child stays empty (with a negative
control); two branches under the same prefix don't shadow each other; cross-mesh propagation; plus an
ignored scale bench.
`regions/queryable/matching/acl/adminspace/qos` suites stay green,
`clippy --deny warnings` is clean, and there's no behaviour change when the config is unset.
(Also built + run on aarch64 with the same results.)
Compatibility
No protocol/wire change; additive opt-in config (defaults empty); MSRV (1.75) clean; built on the existing
regions/gateway routing model.