perf(nodes): batch relay stats to fix O(N×M) /api/nodes regression#1164
efiten wants to merge 3 commits into Kpa-clawbot:master
Conversation
handleNodes looped over repeater/room nodes calling GetRepeaterRelayInfo + GetRepeaterUsefulnessScore per node. GetRepeaterUsefulnessScore iterates ALL byPayloadType entries (denominator) under a separate RLock on every call, giving O(N × totalTx) complexity — ~220M iterations per request on a node with 1500 repeaters and 145K transmissions. Add GetRepeaterNodeStatsBatch: takes ONE RLock, computes the non-advert denominator once, snapshots byPathHop slice headers for all requested pubkeys, then processes everything outside the lock. Update handleNodes to collect repeater/room pubkeys first, call the batch method once, and apply results — reducing /api/nodes relay enrichment from O(N × M) to O(M + N). GetRepeaterRelayInfo and GetRepeaterUsefulnessScore are unchanged for the single-node detail endpoint. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
handleNodes looped over repeater/room nodes calling GetRepeaterRelayInfo + GetRepeaterUsefulnessScore per node. GetRepeaterUsefulnessScore iterates ALL byPayloadType entries under a separate RLock on every call — O(N × totalTx) per request (~220M iterations with 1500 nodes and 145K transmissions). Add GetRepeaterNodeStatsBatch: ONE RLock, denominator computed once, byPathHop headers snapshotted for all nodes, processing outside the lock. Reduces /api/nodes relay enrichment from O(N × M) to O(M + N). Upstream PR: Kpa-clawbot#1164 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Kpa-clawbot
left a comment
Independent review — PR #1164 (perf: batch relay stats)
Thanks for the perf work — the diagnosis is right, the denominator-once + single-RLock shape is the correct fix, and the EnrichNodes-style call site change in routes.go is clean. I'd ship this with the items below addressed.
Must-fix
1. Data race: slice headers snapshotted under RLock, elements read after RUnlock.
In GetRepeaterNodeStatsBatch:
s.mu.RLock()
// ...
snaps[pk] = nodeSnap{txList: s.byPathHop[key]}
// ...
s.mu.RUnlock()
// ... outside lock ...
for _, tx := range snap.txList { // racy
    if tx != nil { uniq[tx.ID] = struct{}{} }
}
Only the slice header is copied under the lock; the backing array is shared with writers. Writers DO mutate the backing array in place:
- removeTxFromSlice (store.go:2886): idx[key] = append(list[:i], list[i+1:]...) — this shifts elements left in the same backing array. A reader iterating the snapshot will see torn / shifted / duplicated *StoreTx entries.
- append(s.byPathHop[key], tx) (store.go:619, 1630, 1996, 2117) — an under-capacity append writes into spare capacity in place, which is invisible to the snapshot's length and doesn't rewrite existing slots, so it is safe by itself; combined with the remove path above, it is not.
Compare with the existing GetRepeaterRelayInfo (repeater_liveness.go), which does collect(txList) / collect(prefixList) before s.mu.RUnlock() — i.e. it reads every tx.ID / tx.PayloadType / tx.FirstSeen it needs while still holding the read lock and only the resulting scratch slice escapes. Please mirror that: extract the per-node (ts, pt, isInTxList) tuples into a per-pubkey []entry (or a single flat slice keyed by pubkey index) inside the RLock, then release and process. Keeps the perf win — the denominator is still computed once and the lock is still held only once across the whole node set — but stops crossing the lock boundary with mutable state.
This will also be flagged by go test -race on a contention-heavy fixture, which is part of why a real test (item 2) matters.
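A minimal sketch of the copy-under-lock shape being requested; the types here are stand-ins, and the real PacketStore, StoreTx, and entry layouts may differ:

```go
package main

import (
	"fmt"
	"sync"
)

// Stand-ins for the real store types (assumed shapes, not the actual API).
type StoreTx struct {
	ID          int
	FirstSeen   string
	PayloadType int
}

type entry struct {
	ts string
	pt int
}

type PacketStore struct {
	mu        sync.RWMutex
	byPathHop map[string][]*StoreTx
}

// snapshotEntries copies per-node (timestamp, payload-type) tuples while
// holding the read lock, so no mutable state escapes the lock boundary.
func (s *PacketStore) snapshotEntries(pubkeys []string) map[string][]entry {
	out := make(map[string][]entry, len(pubkeys))
	s.mu.RLock()
	for _, pk := range pubkeys {
		seen := make(map[int]bool)
		var es []entry
		for _, tx := range s.byPathHop[pk] {
			if tx != nil && !seen[tx.ID] {
				seen[tx.ID] = true
				es = append(es, entry{ts: tx.FirstSeen, pt: tx.PayloadType})
			}
		}
		out[pk] = es // value copies only; safe to read after RUnlock
	}
	s.mu.RUnlock()
	return out
}

func main() {
	s := &PacketStore{byPathHop: map[string][]*StoreTx{
		"ab": {{ID: 1, FirstSeen: "t1", PayloadType: 2}, {ID: 1, FirstSeen: "t1", PayloadType: 2}},
	}}
	fmt.Println(len(s.snapshotEntries([]string{"ab"})["ab"])) // 1 after dedup
}
```

Only value-typed tuples escape the lock, so later in-place shifts of the byPathHop backing arrays cannot be observed by the batch path; the denominator-once and single-lock properties are unaffected.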
2. No behavior tests for the new public API.
Per AGENTS.md TDD rule, this is a behavior change — GetRepeaterNodeStatsBatch is a new public method on PacketStore. The "Test plan" in the PR body is go build + "CI passes", which are not behavior tests. Required, in cmd/server/repeater_usefulness_test.go (or a new repeater_batch_test.go):
- Parity test: build a PacketStore with a representative mix (one node with full-key hops only, one with prefix-only hops, one with both, one with no hops, one repeater that sees only adverts). Call GetRepeaterNodeStatsBatch(pubkeys, w). For each pubkey, assert result[pk].Info equals GetRepeaterRelayInfo(pk, w) field-by-field and result[pk].Score equals GetRepeaterUsefulnessScore(pk).
- Empty input: GetRepeaterNodeStatsBatch(nil, 24) and (...[]string{}, 24) return a non-nil empty map.
- Nodes with no byPathHop entries appear in the result map with zero Info and zero Score (this is what routes.go's `if stats, ok := relayStats[pk]; ok` relies on, and it currently always succeeds because the loop unconditionally writes result[pk] — that's the right behavior; please pin it with a test so a future refactor can't regress it).
- -race test that hammers GetRepeaterNodeStatsBatch from one goroutine while another calls a mutating store method (e.g. an upsert that triggers addTxToPathHopIndex / removeTxFromPathHopIndex). With the current snapshot-then-release shape this should fail; with the fix from item 1 it should pass.
The red→green sequence on the parity test should also be visible in the commit history per AGENTS.md "TDD IS MANDATORY".
3. uniq map is dead work.
uniq := make(map[int]struct{}, len(snap.txList)+len(snap.prefixList))
for _, tx := range snap.txList { if tx != nil { uniq[tx.ID] = struct{}{} } }
for _, tx := range snap.prefixList { if tx != nil { uniq[tx.ID] = struct{}{} } }
entries := make([]entry, 0, len(uniq))
seen := make(map[int]bool, len(uniq))
collect := func(list []*StoreTx) { /* ... uses `seen`, not `uniq` ... */ }
collect(snap.txList); collect(snap.prefixList)
uniq is built only to size entries / seen, then seen re-does the exact same dedup. Either drop uniq and oversize entries / seen with len(snap.txList)+len(snap.prefixList) (the existing GetRepeaterRelayInfo accepts that ~2× over-allocation in exchange for one fewer pass), or drop seen and key off uniq. Two dedup passes per node, in a hot path you're explicitly trying to deflate, is the wrong direction.
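A sketch of the single-pass variant suggested above, with a stand-in StoreTx and the oversized capacity in place of the uniq pre-count:

```go
package main

import "fmt"

type StoreTx struct{ ID int }

// collectDedup dedups both lists in one pass; capacity is oversized to
// len(a)+len(b), trading ~2x allocation for dropping the second pass.
func collectDedup(a, b []*StoreTx) []*StoreTx {
	seen := make(map[int]bool, len(a)+len(b))
	out := make([]*StoreTx, 0, len(a)+len(b))
	collect := func(list []*StoreTx) {
		for _, tx := range list {
			if tx != nil && !seen[tx.ID] {
				seen[tx.ID] = true
				out = append(out, tx)
			}
		}
	}
	collect(a)
	collect(b)
	return out
}

func main() {
	a := []*StoreTx{{ID: 1}, {ID: 2}, nil}
	b := []*StoreTx{{ID: 2}, {ID: 3}}
	fmt.Println(len(collectDedup(a, b))) // 3 unique IDs
}
```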
4. Type and helper duplication with GetRepeaterRelayInfo.
type entry struct { ts string; pt int }, the collect closure, the cutoff math (cutoff1h, cutoff24h, windowCutoff), and the latest-tracking + RelayCount1h/24h accumulation are copy-pasted verbatim from repeater_liveness.go. After fix #1 you'll have to keep two copies of the snapshot-read shape in lockstep. Please extract a small helper, e.g.
func computeRelayInfoFromEntries(entries []entry, now time.Time, windowHours float64) RepeaterRelayInfo
and call it from both GetRepeaterRelayInfo and the batch method. The same applies to the prefix-lookup logic (if len(key) >= 2 { prefix := key[:2]; if prefix != key { ... } }) — that's now in two places; one helper.
5. Doc comment claim doesn't match implementation.
The comment says "Snapshots byPathHop slice headers for all requested pubkeys under that single lock — Processes timestamps and counts outside the lock." That description IS the bug from item 1. After the fix, please rewrite to: "Copies the per-node (timestamp, payload-type) tuples under the single read lock; lock is released before parsing/scoring." (And the doc-comment paragraph in the PR body should match.)
Out-of-scope (not in this diff, file separately if you agree)
- GetRepeaterUsefulnessScore recomputes totalNonAdvert on every call — even single-node detail pages eat O(byPayloadType) per call. The TODO already mentions memoization off store invalidation; worth its own issue.
- byPathHop keying by 1-byte raw prefix is acknowledged-lossy in repeater_liveness.go's comment, and the same caveat now applies to the batch method's prefix lookup. Disambiguation via the path-listing endpoint's resolved-path SQL post-filter is referenced but not done here. Track separately.
Verdict
Right diagnosis, right shape, but the lock-release ordering as written is unsafe and there are no behavior tests gating it. Once items 1–3 land (and ideally 4–5), happy to re-review.
GetRepeaterNodeStatsBatch over ~1900 repeaters takes 20-30s on large datasets. Add GetRepeaterNodeStatsBatchCached with a 30s TTL so repeated /api/nodes requests (map view, live view) serve from cache after the first cold computation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GetRepeaterNodeStatsBatchCached: TTL was 30s but cold compute takes ~25s on live data (~1900 repeaters, 142K packets) → ~82% CPU duty cycle. 300s TTL drops it to ~8% (one cold pass per 5 minutes). asyncPersistResolvedPathsAndEdges: obs resolved_path updates now written in pages of 200 rows per transaction (was one big transaction) with a 5ms yield between pages, limiting SQLite write-lock hold time. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
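The paging described for asyncPersistResolvedPathsAndEdges can be sketched as a page iterator; the SQLite BEGIN/COMMIT and the 5ms yield are shown only as comments, and the function name is illustrative:

```go
package main

import "fmt"

// pageRows splits n rows into [start, end) pages of at most pageSize, so
// each page can be committed in its own transaction (the commit message's
// 200-row pages), bounding SQLite write-lock hold time per transaction.
func pageRows(n, pageSize int) [][2]int {
	var pages [][2]int
	for start := 0; start < n; start += pageSize {
		end := start + pageSize
		if end > n {
			end = n
		}
		pages = append(pages, [2]int{start, end})
		// real code: BEGIN; UPDATE rows[start:end]; COMMIT;
		// then time.Sleep(5 * time.Millisecond) to yield the write lock
	}
	return pages
}

func main() {
	fmt.Println(len(pageRows(450, 200))) // 3 pages: 200 + 200 + 50 rows
}
```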
Problem
handleNodes enriches each repeater/room node by calling GetRepeaterRelayInfo and GetRepeaterUsefulnessScore per node inside a loop. GetRepeaterUsefulnessScore acquires s.mu.RLock() and then iterates all byPayloadType entries to compute the non-advert denominator — once per node.
On a deployment with ~1500 repeater/room nodes and ~145K transmissions in memory, this is ~220M iterations per /api/nodes request, plus ~3000 separate lock acquisitions. Response times of 18–44 seconds have been observed in production, especially during startup backfill when write-lock contention compounds the issue.
Fix
Add GetRepeaterNodeStatsBatch(pubkeys []string, windowHours float64) map[string]RepeaterNodeStats to repeater_usefulness.go:
- one s.mu.RLock() for the entire node list
- non-advert denominator computed once
- byPathHop slice headers snapshotted for all requested pubkeys under that single lock
Update handleNodes to collect repeater/room pubkeys first, call the batch method once, and apply results.
Complexity: O(M + N) instead of O(N × M) per request (M = total transmissions, N = repeater nodes).
GetRepeaterRelayInfo and GetRepeaterUsefulnessScore are unchanged — they are still correct for single-node calls (e.g. handleNodeDetail).
Test plan
- go build ./cmd/server passes
- /api/nodes response is correct (relay_active, relay_count_1h/24h, usefulness_score fields present for repeaters)
- /api/nodes/{pubkey} is unchanged (uses existing single-node methods)
🤖 Generated with Claude Code