
perf: incremental DOM diff in renderVisibleRows (#414) #3

Closed

efiten wants to merge 31 commits into master from fix/rendervisiblerows-incremental-diff


Conversation


@efiten efiten commented Apr 4, 2026

Summary

  • Replace full `tbody` teardown+rebuild on every scroll frame with a range-diff that only adds/removes the delta rows at the edges of the visible window
  • `buildFlatRowHtml` / `buildGroupRowHtml` now accept an `entryIdx` parameter and emit `data-entry-idx` on every `<tr>` so the diff can target rows precisely (including expanded group children)
  • Full rebuild is still used for initial render and large scroll jumps past the buffer (no overlap with previous range)
  • Also loads `packet-helpers.js` in the test sandbox, fixing 7 pre-existing test failures for the builder functions; adds 4 new tests covering `data-entry-idx` output

Fixes Kpa-clawbot#414

Test plan

  • Open packets page with 500+ packets, scroll rapidly — DOM inspector should show incremental `<tr>` adds/removes rather than full `tbody` teardown
  • Expand a grouped packet, scroll away and back — expanded children re-render correctly with correct `data-entry-idx`
  • Large scroll jump (e.g. jump to bottom via scrollbar) — full rebuild fires, no visual glitch
  • `node test-packets.js` — 72 passed, 0 failed

🤖 Generated with Claude Code

efiten and others added 30 commits April 4, 2026 08:41
## Problem

Closes Kpa-clawbot#563. Addresses the *Packet store estimated memory* item in Kpa-clawbot#559.

`estimatedMemoryMB()` used a hardcoded formula:

```go
return float64(len(s.packets)*5120+s.totalObs*500) / 1048576.0
```

This ignored three data structures that grow continuously with every
ingest cycle:

| Structure | Production size | Uncounted heap (approx.) |
|---|---|---|
| `distHops []distHopRecord` | 1,556,833 records | ~300 MB |
| `distPaths []distPathRecord` | 93,090 records | ~25 MB |
| `spIndex map[string]int` | 4,113,234 entries | ~400 MB |

Result: formula reported ~1.2 GB while actual heap was ~5 GB. With
`maxMemoryMB: 1024`, eviction calculated it only needed to shed ~200 MB,
removed a handful of packets, and stopped. Memory kept growing until the
OOM killer fired.

## Fix

Replace `estimatedMemoryMB()` with `runtime.ReadMemStats` so all data
structures are automatically counted:

```go
func (s *PacketStore) estimatedMemoryMB() float64 {
    if s.memoryEstimator != nil {
        return s.memoryEstimator()
    }
    var ms runtime.MemStats
    runtime.ReadMemStats(&ms)
    return float64(ms.HeapAlloc) / 1048576.0
}
```

Replace the eviction simulation loop (which re-used the same wrong
formula) with a proportional calculation: if heap is N× over budget,
evict enough packets to keep `(1/N) × 0.9` of the current count. The 0.9
factor adds a 10% buffer so the next ingest cycle doesn't immediately
re-trigger. All major data structures (distHops, distPaths, spIndex)
scale with packet count, so removing a fraction of packets frees roughly
the same fraction of total heap.
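
For concreteness, a minimal sketch of that proportional rule as a standalone helper (names are illustrative — the real code works on the store's fields):

```go
// keepCount: if the heap is N× over the limit, keep (1/N) × 0.9 of the
// current packet count; the 0.9 leaves the 10% buffer described above.
func keepCount(heapMB, limitMB float64, packetCount int) int {
	if limitMB <= 0 || heapMB <= limitMB {
		return packetCount // under budget: evict nothing
	}
	over := heapMB / limitMB // N: how far over budget we are
	return int(float64(packetCount) * (1.0 / over) * 0.9)
}
```

For example, a 5× overage with 1,000 packets keeps 180 and evicts 820 — consistent with the test output above.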

## Testing

- Updated `TestEvictStale_MemoryBasedEviction` to inject a deterministic
estimator via the new `memoryEstimator` field.
- Added `TestEvictStale_MemoryBasedEviction_UnderestimatedHeap`:
verifies that when actual heap is 5× over limit (the production failure
scenario), eviction correctly removes ~80%+ of packets.

```
=== RUN   TestEvictStale_MemoryBasedEviction
[store] Evicted 538 packets (1076 obs)
--- PASS

=== RUN   TestEvictStale_MemoryBasedEviction_UnderestimatedHeap
[store] Evicted 820 packets (1640 obs)
--- PASS
```

Full suite: `go test ./...` — ok (10.3s)

## Perf note

`runtime.ReadMemStats` runs once per eviction tick (every 60 s) and once
per `/api/perf/store` call. Cost is negligible.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…#568)

## Summary

Adds a byte-size filter to the map page, allowing users to filter
repeater markers by their hash prefix size (1-byte, 2-byte, or 3-byte).

## What changed

**`public/map.js`** — single file change:

1. **New filter state**: Added `byteSize` to the `filters` object
(default: `'all'`), persisted in `localStorage`
2. **New UI section**: Added a "Byte Size" fieldset with button group
(`All | 1-byte | 2-byte | 3-byte`) in the map controls panel, between
"Node Types" and "Display"
3. **Filter logic**: In `_renderMarkersInner`, when `byteSize !==
'all'`, repeater nodes are filtered by their `hash_size` field.
Non-repeater nodes (companions, rooms, sensors) are unaffected — they
pass through regardless of the byte-size filter setting
4. **Event binding**: Button click handlers update the filter, persist
to localStorage, and re-render markers

## Design decisions

- **Client-side only** — no backend changes needed. The `hash_size`
field is already included in the `/api/nodes` response
- **Repeaters only** — byte size is a repeater configuration concept;
other node roles don't have configurable path prefix sizes
- **Matches existing pattern** — uses the same button-group UI as the
Status filter (All/Active/Stale)
- **`hash_size` defaults to 1** — consistent with how the rest of the
codebase treats missing `hash_size` (`node.hash_size || 1`)

## Performance

No new API calls. Filter is a simple string comparison inside the
existing `nodes.filter()` loop in `_renderMarkersInner` — O(1) per node,
negligible overhead.

Fixes Kpa-clawbot#565

Co-authored-by: you <you@example.com>
…ot#569)

## Summary

Replace O(n) map iteration in `MaxTransmissionID()` and
`MaxObservationID()` with O(1) field lookups.

## What Changed

- Added `maxTxID` and `maxObsID` fields to `PacketStore`
- Updated `Load()`, `IngestNewFromDB()`, and `IngestNewObservations()`
to track max IDs incrementally as entries are added
- `MaxTransmissionID()` and `MaxObservationID()` now return the tracked
field directly instead of iterating the entire map

## Performance

Before: O(n) iteration over 30K+ map entries under a read lock
After: O(1) field return
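
A minimal sketch of the incremental tracking (the `maxTxID`/`maxObsID` field names follow the description; the `int64` type and lock handling are assumptions):

```go
// trackMaxIDs is called from the ingest paths as each entry is added.
func (s *PacketStore) trackMaxIDs(txID, obsID int64) {
	if txID > s.maxTxID {
		s.maxTxID = txID
	}
	if obsID > s.maxObsID {
		s.maxObsID = obsID
	}
}

// MaxTransmissionID is now an O(1) field read instead of a full map scan.
func (s *PacketStore) MaxTransmissionID() int64 {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.maxTxID
}
```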

## Tests

- Added `TestMaxTransmissionIDIncremental` verifying the incremental
field matches brute-force iteration over the maps
- All existing tests pass (`cmd/server` and `cmd/ingestor`)

Fixes Kpa-clawbot#356

Co-authored-by: you <you@example.com>
…Kpa-clawbot#567)

## Summary

Fixes Kpa-clawbot#566 — The "Inconsistent Hash Sizes" list on the Analytics page
included all node types and had no time window, causing false positives.

## Changes

### 1. Role filter on inconsistent nodes (`cmd/server/store.go`)
Added role filter to the `inconsistentNodes` loop in
`computeHashCollisions()` so only repeaters and room servers are
included. Companions are excluded since they were never affected by the
firmware bug. This matches the existing role filter on collision
bucketing from Kpa-clawbot#441.

```go
// Before:
if cn.HashSizeInconsistent {

// After:
if cn.HashSizeInconsistent && (cn.Role == "repeater" || cn.Role == "room_server") {
```

### 2. 7-day time window on hash size computation
(`cmd/server/store.go`)
Added a 7-day recency cutoff to `computeNodeHashSizeInfo()`. Adverts
older than 7 days are now skipped, preventing legitimate historical
config changes (e.g., testing different byte sizes) from creating
permanent false positives.

### 3. Frontend description text (`public/analytics.js`)
Updated the description to reflect the filtered scope: now says
"Repeaters and room servers" instead of "Nodes", mentions the 7-day
window, and notes that companions are excluded.

## Tests

- `TestInconsistentNodesExcludesCompanions` — verifies companions are
excluded while repeaters and room servers are included
- `TestHashSizeInfoTimeWindow` — verifies adverts older than 7 days are
excluded from hash size computation
- Updated existing hash size tests to use recent timestamps (compatible
with the new time window)
- All existing tests pass: `cmd/server` ✅, `cmd/ingestor` ✅

## Perf justification
The time window filter adds a single string comparison per advert in the
scan loop — O(n) with a tiny constant. No impact on hot paths.

---------

Co-authored-by: you <you@example.com>
…awbot#572)

## Summary

Index node path lookups in `handleNodePaths()` instead of scanning all
packets on every request.

## Problem

`handleNodePaths()` iterated ALL packets in the store (`O(total_packets
× avg_hops)`) with prefix string matching on every hop. This caused
user-facing latency on every node detail page load with 30K+ packets.

## Fix

Added a `byPathHop` index (`map[string][]*StoreTx`) that maps lowercase
hop prefixes and resolved full pubkeys to their transmissions. The
handler now does direct map lookups instead of a full scan.

### Index lifecycle
- **Built** during `Load()` via `buildPathHopIndex()`
- **Incrementally updated** during `IngestNewFromDB()` (new packets) and
`IngestNewObservations()` (path changes)
- **Cleaned up** during `EvictStale()` (packet removal)

### Query strategy
The handler looks up candidates from the index using:
1. Full pubkey (matches resolved hops from `resolved_path`)
2. 2-char prefix (matches short raw hops)
3. 4-char prefix (matches medium raw hops)
4. Any longer raw hops starting with the 4-char prefix

This reduces complexity from `O(total_packets × avg_hops)` to
`O(matching_txs + unique_hop_keys)`.
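
A rough sketch of that lookup sequence against the `byPathHop` index (only `byPathHop` comes from the description; the dedup helper is illustrative, and the longer-raw-hop case is omitted):

```go
func (s *PacketStore) pathCandidates(pubkey string) []*StoreTx {
	key := strings.ToLower(pubkey)
	seen := make(map[*StoreTx]bool)
	var out []*StoreTx
	add := func(txs []*StoreTx) { // dedup as candidates are gathered
		for _, tx := range txs {
			if !seen[tx] {
				seen[tx] = true
				out = append(out, tx)
			}
		}
	}
	add(s.byPathHop[key]) // 1. resolved full-pubkey hops
	if len(key) >= 2 {
		add(s.byPathHop[key[:2]]) // 2. short raw hops
	}
	if len(key) >= 4 {
		add(s.byPathHop[key[:4]]) // 3. medium raw hops
	}
	// 4. longer raw hops sharing the 4-char prefix require a pass over index keys
	return out
}
```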

## Tests

- `TestNodePathsEndpointUsesIndex` — verifies the endpoint returns
correct results using the index
- `TestPathHopIndexIncrementalUpdate` — verifies add/remove operations
on the index

All existing tests pass.

Fixes Kpa-clawbot#359

Co-authored-by: you <you@example.com>
## Summary

`buildPrefixMap()` was generating map entries for every prefix length
from 2 to `len(pubkey)` (up to 64 chars), creating ~31 entries per node.
With 500 nodes that's ~15K map entries; with 1K+ nodes it balloons to
31K+.

## Changes

**`cmd/server/store.go`:**
- Added `maxPrefixLen = 8` constant — MeshCore path hops use 2–6 char
prefixes, 8 gives headroom
- Capped the prefix generation loop at `maxPrefixLen` instead of
`len(pk)`
- Added full pubkey as a separate map entry when key is longer than
`maxPrefixLen`, ensuring exact-match lookups (used by
`resolveWithContext`) still work

**`cmd/server/coverage_test.go`:**
- Added `TestPrefixMapCap` with subtests for:
  - Short prefix resolution still works
  - Full pubkey exact-match resolution still works
  - Intermediate prefixes beyond the cap correctly return nil
  - Short keys (≤8 chars) have all prefix entries
  - Map size is bounded

## Impact

- Map entries per node: ~31 → ~8 (one per prefix length 2–8, plus one
full-key entry)
- Total map size for 500 nodes: ~15K entries → ~4K entries (~75%
reduction)
- No behavioral change for path hop resolution (2–6 char prefixes)
- No behavioral change for exact pubkey lookups

## Tests

All existing tests pass:
- `cmd/server`: ✅
- `cmd/ingestor`: ✅

Fixes Kpa-clawbot#364

---------

Co-authored-by: you <you@example.com>
…pa-clawbot#571)

## Summary

`GetSubpathDetail()` iterated ALL packets to find those containing a
specific subpath — `O(packets × hops × subpath_length)`. With 30K+
packets this caused user-visible latency on every subpath detail click.

## Changes

### `cmd/server/store.go`
- Added `spTxIndex map[string][]*StoreTx` alongside existing `spIndex` —
tracks which transmissions contain each subpath key
- Extended `addTxToSubpathIndexFull()` and
`removeTxFromSubpathIndexFull()` to maintain both indexes simultaneously
- Original `addTxToSubpathIndex()`/`removeTxFromSubpathIndex()` wrappers
preserved for backward compatibility
- `buildSubpathIndex()` now populates both `spIndex` and `spTxIndex`
during `Load()`
- All incremental update sites (ingest, path change, eviction) use the
`Full` variants
- `GetSubpathDetail()` rewritten: direct `O(1)` map lookup on
`spTxIndex[key]` instead of scanning all packets

### `cmd/server/coverage_test.go`
- Added `TestSubpathTxIndexPopulated`: verifies `spTxIndex` is
populated, counts match `spIndex`, and `GetSubpathDetail` returns
correct results for both existing and non-existent subpaths

## Complexity

- **Before:** `O(total_packets × avg_hops × subpath_length)` per request
- **After:** `O(matched_txs)` per request (direct map lookup)
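
In sketch form, the dual-index update and the new lookup (the `spIndex` count increment and the field types are assumptions based on the description):

```go
func (s *PacketStore) addTxToSubpathIndexFull(key string, tx *StoreTx) {
	s.spIndex[key]++                                // existing count index
	s.spTxIndex[key] = append(s.spTxIndex[key], tx) // new: which txs contain the subpath
}

// GetSubpathDetail's core becomes a direct lookup instead of a packet scan.
func (s *PacketStore) subpathTxs(key string) []*StoreTx {
	return s.spTxIndex[key]
}
```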

## Tests

All tests pass: `cmd/server` (4.6s), `cmd/ingestor` (25.6s)

Fixes Kpa-clawbot#358

---------

Co-authored-by: you <you@example.com>
Kpa-clawbot#573)

## Summary

Extracts a shared `fetchNodeDetail(pubkey)` helper in `nodes.js` that
fetches both `/nodes/{pubkey}` and `/nodes/{pubkey}/health` in parallel.
Both `selectNode()` (side panel) and `loadFullNode()` (full-screen view)
now call this single function instead of duplicating the fetch logic.

## What Changed

- **New:** `fetchNodeDetail(pubkey)` — shared async function that
returns node data with `.healthData` attached
- **Modified:** `loadFullNode()` — uses `fetchNodeDetail()` instead of
inline `Promise.all`
- **Modified:** `selectNode()` — uses `fetchNodeDetail()` instead of
inline `Promise.all`

## Why

The duplicate `api()` calls weren't a major perf issue (TTL caching
mitigates most cases), but the duplicated logic was unnecessary tech
debt. On mobile, `selectNode()` redirects to `loadFullNode()` via hash
change, so the two code paths could fire sequentially with expired
cache.

## Testing

- All frontend helper tests pass (445/445)
- All packet filter tests pass (62/62)
- All aging tests pass (29/29)
- No behavioral change — only code structure improvement

Fixes Kpa-clawbot#391

Co-authored-by: you <you@example.com>
…quential (Kpa-clawbot#574)

## Summary

`GetStoreStats()` ran 5 sequential DB queries on every call. This
combines them into **2 concurrent queries**:

1. **Node/observer counts** — single query using subqueries: `SELECT
(SELECT COUNT(*) FROM nodes WHERE ...), (SELECT COUNT(*) FROM nodes),
(SELECT COUNT(*) FROM observers)`
2. **Observation counts** — single query using conditional aggregation:
`SUM(CASE WHEN timestamp > ? THEN 1 ELSE 0 END)` scoped to the 24h
window, avoiding a full table scan for the 1h count

Both queries run concurrently via goroutines + `sync.WaitGroup`.
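
A compact sketch of that pattern (query text abbreviated, the active-node condition `last_seen > ?` is an assumption, and error handling is reduced to one error per query):

```go
func storeStats(db *sql.DB, activeCutoff, cutoff1h, cutoff24h string) error {
	var wg sync.WaitGroup
	var nodeErr, obsErr error
	var activeNodes, totalNodes, observers, obs24h, obs1h int

	wg.Add(2)
	go func() { // node/observer counts in one round-trip
		defer wg.Done()
		nodeErr = db.QueryRow(`SELECT
			(SELECT COUNT(*) FROM nodes WHERE last_seen > ?),
			(SELECT COUNT(*) FROM nodes),
			(SELECT COUNT(*) FROM observers)`, activeCutoff).
			Scan(&activeNodes, &totalNodes, &observers)
	}()
	go func() { // observation counts: scan the 24h window once
		defer wg.Done()
		obsErr = db.QueryRow(`SELECT COUNT(*),
			COALESCE(SUM(CASE WHEN timestamp > ? THEN 1 ELSE 0 END), 0)
			FROM observations WHERE timestamp > ?`, cutoff1h, cutoff24h).
			Scan(&obs24h, &obs1h)
	}()
	wg.Wait()

	if nodeErr != nil {
		return nodeErr // propagate instead of silently ignoring
	}
	return obsErr
}
```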

## What changed

- `cmd/server/store.go`: Rewrote `GetStoreStats()` — 5 sequential
`QueryRow` calls → 2 concurrent combined queries
- Error handling now propagates query errors instead of silently
ignoring them

## Performance justification

- **Before:** 5 sequential round-trips to SQLite, with 2 potentially
expensive `COUNT(*)` scans on the `observations` table
- **After:** 2 concurrent round-trips; the observation query scans the
24h window once instead of separately scanning for 1h and 24h
- The 10s cache (`statsTTL`) remains, so this fires at most once per 10s
— but when it does fire, it's ~2.5x fewer round-trips and the
observation scan is halved

## Tests

- `go test ./...` passes for both `cmd/server` and `cmd/ingestor`

Fixes Kpa-clawbot#363

---------

Co-authored-by: you <you@example.com>
## Summary

Cache `resolveRegionObservers()` results with a 30-second TTL to
eliminate repeated database queries for region→observer ID mappings.

## Problem

`resolveRegionObservers()` queried the database on every call despite
the observers table changing infrequently (~20 rows). It's called from
10+ hot paths including `filterPackets()`, `GetChannels()`, and multiple
analytics compute functions. When analytics caches are cold, parallel
requests each hit the DB independently.

## Solution

- Added a dedicated `regionObsMu` mutex + `regionObsCache` map with 30s
TTL
- Uses a separate mutex (not `s.mu`) to avoid deadlocks — callers
already hold `s.mu.RLock()`
- Cache is lazily populated per-region and fully invalidated after TTL
expires
- Follows the same pattern as `getCachedNodesAndPM()` (30s TTL,
on-demand rebuild)
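
A minimal sketch of the cache-check path (field names `regionObsMu`, `regionObsCache`, `regionObsCacheTime` and the helper `fetchAndCacheRegionObs` follow the description; the value type and locking details are illustrative):

```go
func (s *PacketStore) resolveRegionObservers(region string) []string {
	s.regionObsMu.Lock()
	if time.Since(s.regionObsCacheTime) > 30*time.Second {
		s.regionObsCache = make(map[string][]string) // TTL expired: full invalidation
		s.regionObsCacheTime = time.Now()
	}
	if ids, ok := s.regionObsCache[region]; ok {
		s.regionObsMu.Unlock()
		return ids // cache hit (may be nil for unknown regions)
	}
	s.regionObsMu.Unlock()
	// Miss: query the DB and store the result under regionObsMu.
	return s.fetchAndCacheRegionObs(region)
}
```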

## Changes

- **`cmd/server/store.go`**: Added `regionObsMu`, `regionObsCache`,
`regionObsCacheTime` fields; rewrote `resolveRegionObservers()` to check
cache first; added `fetchAndCacheRegionObs()` helper
- **`cmd/server/coverage_test.go`**: Added
`TestResolveRegionObserversCaching` — verifies cache population, cache
hits, and nil handling for unknown regions

## Testing

- All existing Go tests pass (`go test ./...`)
- New test verifies caching behavior (population, hits, nil for unknown
regions)

Fixes Kpa-clawbot#362

---------

Co-authored-by: you <you@example.com>
)

## Summary

Replace full `buildDistanceIndex()` rebuild with incremental
`removeTxFromDistanceIndex`/`addTxToDistanceIndex` for only the
transmissions whose paths actually changed during
`IngestNewObservations`.

## Problem

When any transmission's best path changed during observation ingestion,
the **entire distance index was rebuilt** — iterating all 30K+ packets,
resolving all hops, and computing haversine distances. This
`O(total_packets × avg_hops)` operation ran under a write lock, blocking
all API readers.

A 30-second debounce (`distRebuildInterval`) was added in Kpa-clawbot#557 to
mitigate this, but it only delayed the pain — the full rebuild still
happened, just less frequently.

## Fix

- Added `removeTxFromDistanceIndex(tx)` — filters out all
`distHopRecord` and `distPathRecord` entries for a specific transmission
- Added `addTxToDistanceIndex(tx)` — computes and appends new distance
records for a single transmission
- In `IngestNewObservations`, changed path-change handling to call
remove+add for each affected tx instead of marking dirty and waiting for
a full rebuild
- Removed `distDirty`, `distLast`, and `distRebuildInterval` since
incremental updates are cheap enough to apply immediately

## Complexity

- **Before:** `O(total_packets × avg_hops)` per rebuild (30K+ packets)
- **After:** `O(changed_txs × avg_hops + total_dist_records)` — the
remove is a linear scan of the distance slices, but only for affected
txs; the add is `O(hops)` per changed tx

The remove scan over `distHops`/`distPaths` slices is linear in slice
length, but this is still far cheaper than the full rebuild which also
does JSON parsing, hop resolution, and haversine math for every packet.
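
The remove half in sketch form — an in-place filter over the two distance slices (the `TxID`/`ID` field names are assumptions):

```go
func (s *PacketStore) removeTxFromDistanceIndex(tx *StoreTx) {
	hops := s.distHops[:0] // filter in place, no extra allocation
	for _, r := range s.distHops {
		if r.TxID != tx.ID {
			hops = append(hops, r)
		}
	}
	s.distHops = hops

	paths := s.distPaths[:0]
	for _, r := range s.distPaths {
		if r.TxID != tx.ID {
			paths = append(paths, r)
		}
	}
	s.distPaths = paths
}
```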

## Tests

- Updated `TestDistanceRebuildDebounce` →
`TestDistanceIncrementalUpdate` to verify incremental behavior and check
for duplicate path records
- All existing tests pass (`go test ./...` in both `cmd/server` and
`cmd/ingestor`)

Fixes Kpa-clawbot#365

---------

Co-authored-by: you <you@example.com>
Kpa-clawbot#577)

## Summary

Replace N+1 per-hop DB queries in `handleResolveHops` with O(1) lookups
against the in-memory prefix map that already exists in the packet
store.

## Problem

Each hop in the `resolve-hops` API triggered a separate `SELECT ... LIKE
?` query against the nodes table. With 10 hops, that's 10 DB round-trips
— unnecessary when `getCachedNodesAndPM()` already maintains an
in-memory prefix map that can resolve hops instantly.

## Changes

- **routes.go**: Replace the per-hop DB query loop with `pm.m[hopLower]`
lookups from the prefix map. Convert `nodeInfo` → `HopCandidate` inline.
Remove unused `rows`/`sql.Scan` code.
- **store.go**: Add `InvalidateNodeCache()` method to force prefix map
rebuild (needed by tests that insert nodes after store initialization).
- **routes_test.go**: Give `TestResolveHopsAmbiguous` a proper store so
hops resolve via the prefix map.
- **resolve_context_test.go**: Call `InvalidateNodeCache()` after
inserting test nodes. Fix confidence assertion — with GPS candidates and
no affinity context, `resolveWithContext` correctly returns
`gps_preference` (previously masked because the prefix map didn't have
the test nodes).

## Complexity

O(1) per hop lookup via hash map vs O(n) DB scan per hop. No hot-path
impact — this endpoint is called on-demand, not in a render loop.

Fixes Kpa-clawbot#369

---------

Co-authored-by: you <you@example.com>
## Summary

Skip `updateTimeline()` canvas redraws in `bufferPacket()` when the
browser tab is hidden (`_tabHidden === true`). Instead, batch-update the
timeline once when the tab becomes visible again via the
`visibilitychange` handler.

Fixes Kpa-clawbot#385

## What Changed

**`public/live.js`** — two surgical edits:

1. **`bufferPacket()`**: Removed `updateTimeline()` call from the
`_tabHidden` early-return path. When the tab is backgrounded, packets
are still buffered (for VCR) but no canvas work is done.

2. **`visibilitychange` handler**: Added `updateTimeline()` call when
the tab is restored, so the timeline catches up in a single repaint
instead of N repaints (one per buffered packet).

## Performance Impact

At 5+ packets/sec with a backgrounded tab, this eliminates continuous
canvas redraws (`updateTimeline()` calls `ctx.clearRect` + full canvas
redraw + `updateTimelinePlayhead()`) that are invisible to the user. CPU
usage drops to near-zero for timeline rendering while backgrounded.

## Tests

All existing tests pass:
- `test-packet-filter.js` — 62 passed
- `test-aging.js` — 29 passed  
- `test-frontend-helpers.js` — 445 passed

Co-authored-by: you <you@example.com>
…bot#579)

## Summary

`handleObservers()` in `routes.go` was calling `GetNodeLocations()`
which fetches ALL nodes from the DB just to match ~10 observer IDs
against node public keys. With 500+ nodes this is wasteful.

## Changes

- **`db.go`**: Added `GetNodeLocationsByKeys(keys []string)` — queries
only the rows matching the given public keys using a parameterized
`WHERE LOWER(public_key) IN (?, ?, ...)` clause.
- **`routes.go`**: `handleObservers` now collects observer IDs and calls
the targeted method instead of the full-table scan.
- **`coverage_test.go`**: Added `TestGetNodeLocationsByKeys` covering
known key, empty keys, and unknown key cases.

## Performance

With ~10 observers and 500+ nodes, the query goes from scanning all 500
rows to fetching only ~10. The original `GetNodeLocations()` is
preserved for any other callers.
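
A sketch of how that parameterized query can be built (the location columns and the `DB` wrapper are assumptions; empty input short-circuits, matching the empty-keys test case):

```go
func (d *DB) GetNodeLocationsByKeys(keys []string) (*sql.Rows, error) {
	if len(keys) == 0 {
		return nil, nil // nothing to look up
	}
	placeholders := strings.TrimSuffix(strings.Repeat("?,", len(keys)), ",")
	args := make([]interface{}, len(keys))
	for i, k := range keys {
		args[i] = strings.ToLower(k)
	}
	query := `SELECT public_key, latitude, longitude FROM nodes
		WHERE LOWER(public_key) IN (` + placeholders + `)`
	return d.db.Query(query, args...)
}
```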

Fixes Kpa-clawbot#378

Co-authored-by: you <you@example.com>
…query (Kpa-clawbot#580)

## Summary

Eliminates the N+1 API call storm when toggling off "Group by Hash" in
the packets table.

## Problem

When ungrouped mode was active, `loadPackets()` fired individual
`/api/packets/{hash}` requests for every multi-observation packet. With
200+ multi-obs packets, this created 200+ parallel HTTP requests —
overwhelming both browser connection limits and the server.

## Fix

The server already supports `expand=observations` on the `/api/packets`
endpoint, which returns observations inline. Instead of:

1. Always fetching grouped (`groupByHash=true`)
2. Then N+1 fetching each packet's children individually

We now:

1. Fetch grouped when grouped mode is active (`groupByHash=true`)
2. Fetch with `expand=observations` when ungrouped — **single API call**
3. Flatten observations client-side

**Result: 200+ API calls → 1 API call.**

## Changes

- `public/packets.js`: Replaced N+1 observation fetching loop with
single `expand=observations` query parameter, flatten inline
observations client-side.

## Testing

- All frontend tests pass (packet-filter: 62/62, frontend-helpers:
445/445)
- All Go backend tests pass

Fixes Kpa-clawbot#382

Co-authored-by: you <you@example.com>
…ot#581)

## Summary

`replayRecent()` in `live.js` fetched observation details for 8 packet
groups **sequentially** — each `await fetch()` waited for the previous
to complete before starting the next.

## Change

Replaced the sequential `for` loop with `Promise.all()` to fetch all 8
detail API calls **concurrently**. The mapping from results to live
packets is unchanged.

**Before:** 8 sequential fetches (total time ≈ sum of all request
durations)
**After:** 8 parallel fetches (total time ≈ max of all request
durations)

## Notes

- `replayRecent()` is currently disabled (commented out at line 856), so
this is dormant code — no runtime risk
- No behavioral change: same data mapping, same rendering, same VCR
buffer population
- All existing tests pass

Fixes Kpa-clawbot#394

---------

Co-authored-by: you <you@example.com>
…pa-clawbot#582)

## Summary

Eliminates visible marker flicker on zoom/resize events in the map page
when displaying 500+ nodes.

## Problem

`renderMarkers()` was called on every `zoomend` and `resize` event,
which did `markerLayer.clearLayers()` followed by a full rebuild of all
markers. With many nodes, this caused a visible flash where all markers
disappeared briefly before being re-added.

## Solution

Instead of rebuilding all markers from scratch on zoom/resize:

1. **Store Leaflet layer references** on marker data objects
(`_leafletMarker`, `_leafletLine`, `_leafletDot`) during the initial
full render
2. **Add `_repositionMarkers()`** — re-runs `deconflictLabels()` at the
new zoom level and updates existing marker positions via
`setLatLng()`/`setLatLngs()` without clearing the layer group
3. **Debounce zoom/resize handlers** (150ms) to coalesce rapid events
during animated zooms
4. **Dynamically manage offset indicators** — adds/removes deconfliction
offset lines and dots as positions change at different zoom levels

Full `renderMarkers()` is still called for filter changes, data updates,
and theme changes — only zoom/resize uses the lightweight repositioning
path.

## Complexity

- `_repositionMarkers()`: O(n) — single pass over stored marker data
- `deconflictLabels()`: O(n × k) where k is max spiral offsets (48) —
unchanged
- No new API calls, no DOM rebuilds

Fixes Kpa-clawbot#393

---------

Co-authored-by: you <you@example.com>
…ant API call (Kpa-clawbot#583)

## Summary

Reduces the analytics nodes tab from 3 parallel API calls to 2 by
computing network status (active/degraded/silent counts) client-side
instead of fetching from `/nodes/network-status`.

## What Changed

**`public/analytics.js` — `renderNodesTab()`:**
- Removed the `/nodes/network-status` API call from the `Promise.all`
batch
- Added client-side computation of active/degraded/silent counts using
the shared `getHealthThresholds()` function from `roles.js`
- Uses `nodesResp.total` and `nodesResp.counts` (already returned by
`/nodes` endpoint) for total node count and role breakdown

## Why This Works

The `/nodes` response already includes:
- `total` — count of all matching nodes (server-computed across full DB)
- `counts` — role counts across all nodes (from `GetAllRoleCounts()`)
- Per-node `last_seen`/`last_heard` timestamps

The `getHealthThresholds()` function in `roles.js` provides the same
degraded/silent thresholds used server-side, so client-side status
computation produces equivalent results for the loaded node set.

## Performance

- **Before:** 3 parallel API calls (`/nodes`, `/nodes/bulk-health`,
`/nodes/network-status`)
- **After:** 2 parallel API calls (`/nodes`, `/nodes/bulk-health`)
- Network status computation is O(n) over the 200 loaded nodes —
negligible client-side cost
- The `/nodes/network-status` endpoint scanned ALL nodes in the DB on
every call; this eliminates that server-side work entirely

## Testing

- All frontend helper tests pass (445/445)
- All packet filter tests pass (62/62)  
- All aging tests pass (29/29)
- All Go backend tests pass

Fixes Kpa-clawbot#392

---------

Co-authored-by: you <you@example.com>
…t#584)

## Summary

Compress `public/og-image.png` from **1,159,050 bytes (1.1MB)** to
**234,899 bytes (235KB)** — an **80% reduction**.

## What Changed

- Applied lossy PNG quantization via `pngquant` (quality 45-65, speed 1)
- Image dimensions unchanged: 1200×630px (standard OG image size)
- Visual quality remains suitable for social media previews

## Why

A 1.1MB OpenGraph image is excessive. Typical OG images are 50-200KB.
This reduces deployment size and Git repo bloat without affecting
functionality (browsers don't preload OG images).

## Testing

- Unit tests pass (`npm run test:unit`)
- No code changes — image-only commit
- `index.html` reference unchanged (`<meta property="og:image"
content="/og-image.png">`)

Fixes Kpa-clawbot#397

Co-authored-by: you <you@example.com>
…me (Kpa-clawbot#585)

## Summary

Coalesce WS-triggered `renderTableRows()` calls using
`requestAnimationFrame` instead of `setTimeout` debouncing.

Fixes Kpa-clawbot#396

## Problem

During high WebSocket throughput, multiple WS batches could each trigger
a `renderTableRows()` call via `setTimeout(..., 200)`. With rapid
batches, the 50K-row table was fully rebuilt every few hundred
milliseconds, causing UI jank.

## Solution

Replace the `setTimeout`-based debounce with a `requestAnimationFrame`
coalescing pattern:

1. **`scheduleWSRender()`** — sets a dirty flag and schedules a single
rAF callback
2. **Dirty flag** — multiple WS batches within the same frame just set
the flag; only one render fires
3. **Cleanup** — `destroy()` cancels any pending rAF and resets the
dirty flag

This ensures at most **one `renderTableRows()` per animation frame**
(~16ms), regardless of how many WS batches arrive.

## Performance justification

- **Before:** Each WS batch → `setTimeout(renderTableRows, 200)` — N
batches in <200ms = N renders
- **After:** N batches in one frame → 1 render on next rAF (~16ms)
- Worst case goes from O(N) renders per second to O(60) renders per
second (frame-capped)

## Changes

- `public/packets.js`: Add `scheduleWSRender()` with rAF + dirty flag;
replace setTimeout in WS handler; clean up in `destroy()`
- `test-frontend-helpers.js`: Update tests to verify rAF coalescing
pattern instead of setTimeout debounce

## Testing

- All existing tests pass (`npm test` — 0 failures)
- Updated 2 test cases to verify new rAF coalescing behavior

Co-authored-by: you <you@example.com>
…hange (Kpa-clawbot#586)

## Summary

Fixes the N+1 API call pattern when changing observation sort mode on
the packets page. Previously, switching sort to Path or Time fired
individual `/api/packets/{hash}` requests for **every**
multi-observation group without cached children — potentially 100+
concurrent requests.

## Changes

### Backend: Batch observations endpoint
- **New endpoint:** `POST /api/packets/observations` accepts `{"hashes":
["h1", "h2", ...]}` and returns all observations keyed by hash in a
single response
- Capped at 200 hashes per request to prevent abuse
- 4 test cases covering empty input, invalid JSON, too-many-hashes, and
valid requests

### Frontend: Use batch endpoint
- `packets.js` sort change handler now collects all hashes needing
observation data and sends a single POST request instead of N individual
GETs
- Same behavior, single round-trip

## Performance

- **Before:** Changing sort with 100 visible groups → 100 concurrent API
requests, browser connection queueing (6 per host), several seconds of
lag
- **After:** Single POST request regardless of group count, response
time proportional to store lookup (sub-millisecond per hash in memory)

Fixes Kpa-clawbot#389

---------

Co-authored-by: you <you@example.com>
…bot#587)

## Summary

Consolidates the 4 parallel `/api/analytics/subpaths` calls in the Route
Patterns tab into a single `/api/analytics/subpaths-bulk` endpoint,
eliminating 3 redundant server-side scans of the subpath index on cache
miss.

## Changes

### Backend (`cmd/server/routes.go`, `cmd/server/store.go`)
- New `GET
/api/analytics/subpaths-bulk?groups=2-2:50,3-3:30,4-4:20,5-8:15`
endpoint
- Groups format: `minLen-maxLen:limit` comma-separated
- `GetAnalyticsSubpathsBulk()` iterates `spIndex` once, bucketing
entries into per-group accumulators by hop length
- Hop name resolution is done once per raw hop and shared across groups
- Results are cached per-group for compatibility with existing
single-key cache lookups
- Region-filtered queries fall back to individual
`GetAnalyticsSubpaths()` calls (region filtering requires
per-transmission observer checks)
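
Parsing the `groups` parameter is the only new wire format here; a sketch (the struct name is illustrative):

```go
// parseSubpathGroups turns "2-2:50,3-3:30,4-4:20,5-8:15" into
// (minLen, maxLen, limit) triples.
type subpathGroup struct {
	MinLen, MaxLen, Limit int
}

func parseSubpathGroups(raw string) ([]subpathGroup, error) {
	var groups []subpathGroup
	for _, part := range strings.Split(raw, ",") {
		var g subpathGroup
		if _, err := fmt.Sscanf(part, "%d-%d:%d", &g.MinLen, &g.MaxLen, &g.Limit); err != nil {
			return nil, fmt.Errorf("invalid group %q: %w", part, err)
		}
		groups = append(groups, g)
	}
	return groups, nil
}
```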

### Frontend (`public/analytics.js`)
- `renderSubpaths()` now makes 1 API call instead of 4
- Response shape: `{ results: [{ subpaths, totalPaths }, ...] }` —
destructured into the same `[d2, d3, d4, d5]` variables

### Tests (`cmd/server/routes_test.go`)
- `TestAnalyticsSubpathsBulk`: validates 3-group response shape, missing
params error, invalid format error

## Performance

- **Before:** 4 API calls → 4 scans of `spIndex` + 4× hop resolution on
cache miss
- **After:** 1 API call → 1 scan of `spIndex` + 1× hop resolution
(shared cache)
- Cache miss cost reduced by ~75% for this tab
- No change on cache hit (individual group caching still works)

Fixes Kpa-clawbot#398

Co-authored-by: you <you@example.com>
…t scan (Kpa-clawbot#588)

## Problem

`GetNodeAnalytics()` in `store.go` scans ALL 30K+ packets doing
`strings.Contains` on every JSON blob when the node has a name, then
filters by time range *after* the full scan. This is `O(packets ×
json_length)` on every `/api/nodes/{pubkey}/analytics` request.

## Fix

Move the `fromISO` time check inside the scan loop so old packets are
skipped **before** the expensive `strings.Contains` matching. For the
non-name path (indexed-only), the time filter is also applied inline,
eliminating the separate `allPkts` intermediate slice.

### Before
1. Scan all packets → collect matches (including old ones) → `allPkts`
2. Filter `allPkts` by time → `packets`

### After
1. Scan packets, skip `tx.FirstSeen <= fromISO` immediately → `packets`

This avoids `strings.Contains` calls on packets outside the requested
time window (typically 7 days out of months of data).
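
In sketch form, the reordered loop (`FirstSeen` and `fromISO` come from the description; the `RawJSON` field and the surrounding function are illustrative):

```go
func (s *PacketStore) packetsMatchingName(nodeName, fromISO string) []*StoreTx {
	var packets []*StoreTx
	for _, tx := range s.packets {
		if tx.FirstSeen <= fromISO {
			continue // cheap time check first: skip old packets immediately
		}
		if !strings.Contains(tx.RawJSON, nodeName) {
			continue // expensive substring match only runs on recent packets
		}
		packets = append(packets, tx)
	}
	return packets
}
```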

## Complexity
- **Before:** `O(total_packets × avg_json_length)` for name matching
- **After:** `O(recent_packets × avg_json_length)` — only packets within
the time window are string-matched

## Testing
- `cd cmd/server && go test ./...` — all tests pass

Fixes Kpa-clawbot#367

Co-authored-by: you <you@example.com>
…pa-clawbot#589)

## Summary

`QueryMultiNodePackets()` was scanning ALL packets with
`strings.Contains` on JSON blobs — O(packets × pubkeys × json_length).
With 30K+ packets and multiple pubkeys, this caused noticeable latency
on `/api/packets?nodes=...`.

## Fix

Replace the full scan with lookups into the existing `byNode` index,
which already maps pubkeys to their transmissions. Merge results with
hash-based deduplication, then apply time filters.

**Before:** O(N × P × J) where N = all packets, P = pubkeys, J = avg JSON
length
**After:** O(M × P) where M = packets per pubkey (typically small), plus
an O(R log R) sort of the R merged results for pagination correctness

Results are sorted by `FirstSeen` after merging to maintain the
oldest-first ordering expected by the pagination logic.

Fixes Kpa-clawbot#357

Co-authored-by: you <you@example.com>
)

## Summary

`EvictStale()` was doing O(n) linear scans per evicted item to remove
from secondary indexes (`byObserver`, `byPayloadType`, `byNode`).
Evicting 1000 packets from an observer with 50K observations meant 1000
× 50K = 50M comparisons — all under a write lock.

## Fix

Replace per-item removal with batch single-pass filtering:

1. **Collect phase**: Walk evicted packets once, building sets of
evicted tx IDs, observation IDs, and affected index keys
2. **Filter phase**: For each affected index slice, do a single pass
keeping only non-evicted entries

**Before**: O(evicted_count × index_slice_size) per index — quadratic in
practice
**After**: O(evicted_count + index_slice_size) per affected key — linear
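
A sketch of the collect/filter shape for one index (`byObserver`); the set types and the per-tx observer field are illustrative:

```go
func (s *PacketStore) batchRemoveFromObserverIndex(evictedTxs []*StoreTx) {
	evicted := make(map[int64]bool, len(evictedTxs)) // collect phase
	affected := make(map[string]bool)
	for _, tx := range evictedTxs {
		evicted[tx.ID] = true
		affected[tx.Observer] = true
	}
	for key := range affected { // filter phase: one pass per affected slice
		kept := s.byObserver[key][:0]
		for _, tx := range s.byObserver[key] {
			if !evicted[tx.ID] {
				kept = append(kept, tx)
			}
		}
		s.byObserver[key] = kept
	}
}
```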

## Changes

- `cmd/server/store.go`: Restructured `EvictStale()` eviction loop into
collect + batch-filter pattern

## Testing

- All existing tests pass (`cd cmd/server && go test ./...`)

Fixes Kpa-clawbot#368

Co-authored-by: you <you@example.com>
## Summary

Sort `snrVals` and `rssiVals` once upfront in `computeAnalyticsRF()` and
read min/max/median directly from the sorted slices, instead of copying
and sorting per stat call.

## Changes

- Sort both slices once before computing stats (2 sorts total instead of
4+ copy+sorts)
- Read `min` from `sorted[0]`, `max` from `sorted[len-1]`, `median` from
`sorted[len/2]`
- Remove the now-unused `sortedF64` and `medianF64` helper closures

## Performance impact

With 100K+ observations, this eliminates multiple O(n log n) copy+sort
operations. Previously each call to `medianF64` did a full copy + sort,
and `minF64`/`maxF64` did O(n) scans on the unsorted array. Now: 2
in-place sorts total, O(1) lookups for min/max/median.
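
The pattern in miniature (helper name illustrative):

```go
// minMaxMedian sorts vals in place once, then min, max, and median are
// direct index reads — no per-stat copy+sort.
func minMaxMedian(vals []float64) (min, max, median float64) {
	if len(vals) == 0 {
		return 0, 0, 0
	}
	sort.Float64s(vals)
	return vals[0], vals[len(vals)-1], vals[len(vals)/2]
}
```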

Fixes Kpa-clawbot#366

Co-authored-by: you <you@example.com>
…bot#592)

## Summary

Combines the chained `filterTxSlice` calls in `filterPackets()` into a
single pass over the packet slice.

## Problem

When multiple filter parameters are specified (e.g.,
`type=4&route=1&since=...&until=...`), each filter created a new
intermediate `[]*StoreTx` slice. With N filters, this meant N separate
scans and N-1 unnecessary allocations.

## Fix

All filter predicates (type, route, observer, hash, since, until,
region, node) are pre-computed before the loop, then evaluated in a
single `filterTxSlice` call. This eliminates all intermediate
allocations.

**Preserved behavior:**
- Fast-path index lookups for hash-only and observer-only queries remain
unchanged
- Node-only fast-path via `byNode` index preserved
- All existing filter semantics maintained (same comparison operators,
same null checks)

**Complexity:** Single `O(n)` pass regardless of how many filters are
active, vs previous `O(n * k)` where k = number of active filters (each
pass is O(n) but allocates).
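
In sketch form, the pre-computed-predicates shape (the `PacketQuery` fields and predicate bodies are illustrative):

```go
func filterSinglePass(txs []*StoreTx, q PacketQuery) []*StoreTx {
	var preds []func(*StoreTx) bool // one closure per active filter
	if q.Type != "" {
		preds = append(preds, func(tx *StoreTx) bool { return tx.PayloadType == q.Type })
	}
	if q.Since != "" {
		preds = append(preds, func(tx *StoreTx) bool { return tx.FirstSeen >= q.Since })
	}
	out := make([]*StoreTx, 0, len(txs))
	for _, tx := range txs { // single pass, no intermediate slices
		keep := true
		for _, p := range preds {
			if !p(tx) {
				keep = false
				break
			}
		}
		if keep {
			out = append(out, tx)
		}
	}
	return out
}
```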

## Testing

All existing tests pass (`cd cmd/server && go test ./...`).

Fixes Kpa-clawbot#373

Co-authored-by: you <you@example.com>
… shutdown (Kpa-clawbot#593)

## Summary

Replace `time.Tick()` with `time.NewTicker()` in the auto-prune
goroutine so it stops cleanly during graceful shutdown.

## Problem

`time.Tick` creates a ticker that can never be garbage collected or
stopped. While the prune goroutine runs for the process lifetime, it
won't stop during graceful shutdown — the goroutine leaks past the
shutdown sequence.

## Fix

- Create a `time.NewTicker` and a done channel
- Use `select` to listen on both the ticker and done channel
- Stop the ticker and close the done channel in the shutdown path (after
`poller.Stop()`)
- Pattern matches the existing `StartEvictionTicker()` approach
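
The pattern, roughly (function and parameter names are illustrative):

```go
func startAutoPrune(interval time.Duration, prune func()) (stop func()) {
	ticker := time.NewTicker(interval)
	done := make(chan struct{})
	go func() {
		for {
			select {
			case <-ticker.C:
				prune()
			case <-done:
				return // graceful shutdown: the goroutine exits cleanly
			}
		}
	}()
	// Returned closure is called from the shutdown path, after poller.Stop().
	return func() {
		ticker.Stop()
		close(done)
	}
}
```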

## Testing

- `go build ./...` — compiles cleanly
- `go test ./...` — all tests pass

Fixes Kpa-clawbot#377

Co-authored-by: you <you@example.com>
…construction (Kpa-clawbot#594)

## Summary

Optimizes `QueryGroupedPackets()` in `store.go` to eliminate two major
inefficiencies on every grouped packet list request:

### Changes

1. **Cache `UniqueObserverCount` on `StoreTx`** — Instead of iterating
all observations to count unique observers on every query
(O(total_observations) per request), we now track unique observers at
ingest time via an `observerSet` map and pre-computed
`UniqueObserverCount` field. This is updated incrementally as
observations arrive.

2. **Defer map construction until after pagination** — Previously,
`map[string]interface{}` was built for ALL 30K+ filtered results before
sorting and paginating. Now the grouped cache stores sorted `[]*StoreTx`
pointers (lightweight), and `groupedTxsToPage()` builds maps only for
the requested page (typically 50 items). This eliminates ~30K map
allocations per cache miss.

3. **Lighter cache footprint** — The grouped cache now stores
`[]*StoreTx` instead of `*PacketResult` with pre-built maps, reducing
memory pressure and GC work.

### Complexity

- Observer counting: O(1) per query (was O(total_observations))
- Map construction: O(page_size) per query (was O(n) where n = all
filtered results)
- Sort remains O(n log n) on cache miss, but the cache (3s TTL) absorbs
repeated requests
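
The incremental counting in sketch form (`observerSet` and `UniqueObserverCount` follow the description; the method itself is illustrative):

```go
func (tx *StoreTx) noteObserver(observerID string) {
	if tx.observerSet == nil {
		tx.observerSet = make(map[string]struct{})
	}
	if _, seen := tx.observerSet[observerID]; !seen {
		tx.observerSet[observerID] = struct{}{}
		tx.UniqueObserverCount++ // maintained at ingest time, read in O(1) at query time
	}
}
```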

### Testing

- `cd cmd/server && go test ./...` — all tests pass
- `cd cmd/ingestor && go build ./...` — builds clean

Fixes Kpa-clawbot#370

---------

Co-authored-by: you <you@example.com>
…-clawbot#595)

## Summary

`txToMap()` previously always allocated observation sub-maps for every
packet, even though the `/api/packets` handler immediately stripped them
via `delete(p, "observations")` unless `expand=observations` was
requested. A typical page of 50 packets with ~5 observations each caused
300+ unnecessary map allocations per request.

## Changes

- **`txToMap`**: Add variadic `includeObservations bool` parameter.
Observations are only built when `true` is passed, eliminating
allocations when they'd just be discarded.
- **`PacketQuery`**: Add `ExpandObservations bool` field to thread the
caller's intent through the query pipeline.
- **`routes.go`**: Set `ExpandObservations` based on
`expand=observations` query param. Removed the post-hoc `delete(p,
"observations")` loop — observations are simply never created when not
requested.
- **Single-packet lookups** (`GetPacketByID`, `GetPacketByHash`): Always
pass `true` since detail views need observations.
- **Multi-node/analytics queries**: Default (no flag) = no observations,
matching prior behavior.
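
A sketch of the variadic flag (map keys and observation fields are illustrative; only `txToMap` and the parameter shape come from the description):

```go
func (s *PacketStore) txToMap(tx *StoreTx, includeObservations ...bool) map[string]interface{} {
	m := map[string]interface{}{
		"hash":       tx.Hash,
		"first_seen": tx.FirstSeen,
	}
	// Omitting the argument keeps the old no-observations behavior;
	// passing true builds the observation sub-maps.
	if len(includeObservations) > 0 && includeObservations[0] {
		obs := make([]map[string]interface{}, 0, len(tx.Observations))
		for _, o := range tx.Observations {
			obs = append(obs, map[string]interface{}{"observer": o.Observer, "snr": o.SNR})
		}
		m["observations"] = obs
	}
	return m
}
```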

## Testing

- Added `TestTxToMapLazyObservations` covering all three cases: no flag,
`false`, and `true`.
- All existing tests pass (`go test ./...`).

## Perf Impact

Eliminates ~250 observation map allocations per /api/packets request (at
default page size of 50 with ~5 observations each). This is a
constant-factor improvement per request — no algorithmic complexity
change.

Fixes Kpa-clawbot#374

Co-authored-by: you <you@example.com>
Replace full tbody rebuild on every scroll frame with a range-diff
that only adds/removes the entry rows at the edges of the visible
window. Rows emitted by buildFlatRowHtml/buildGroupRowHtml now carry
data-entry-idx so the diff can target them precisely.

Full rebuild is still used for initial render and large scroll jumps
past the buffer. Also loads packet-helpers.js in the test sandbox,
fixing 7 pre-existing test failures for buildFlatRowHtml /
buildGroupRowHtml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@efiten efiten closed this Apr 4, 2026
Kpa-clawbot pushed a commit that referenced this pull request May 16, 2026
…pi/perf when every chunk errors

Currently loadBackgroundChunks unconditionally sets backgroundLoadDone=true
even when every chunk errored, so /api/perf reports
'backgroundLoadComplete: true' while data is silently missing. Operators
have no signal of failure. This test fails because backgroundLoadFailed
key does not yet exist in the perf payload (Munger r2 #3).
Kpa-clawbot pushed a commit that referenced this pull request May 16, 2026
… — failure observability, lock granularity, single cutoff, atomic contract

Five-in-one because they all touch the same hot-startup data flow:

#3 backgroundLoadFailed flag: surfaced in /api/perf map and the typed
   PerfPacketStoreStats struct. Set true iff at least one chunk errored
   during loadBackgroundChunks. backgroundLoadComplete now only signals
   'the loader returned', backgroundLoadFailed signals data integrity.

#4 atomic ordering contract: doc comment on hashMigrationComplete /
   backgroundLoadDone / backgroundLoadFailed explaining what each gates
   and the read-order invariant (Done must be read before Failed is
   meaningful).

Kpa-clawbot#6 loadChunk merge: the previous implementation held s.mu.Lock across
   the entire merge of a chunk (hundreds of ms on large chunks);
   runtime.Gosched between chunks didn't help readers/ingest that
   arrived mid-chunk. Now the prepend is one short critical section
   and the per-index merge runs in 500-packet batches with the lock
   released+reacquired between batches plus a Gosched. Counter update
   is its own tiny final section.

Kpa-clawbot#7 single cutoff in Load(): replaced the second time.Now().UTC() call
   used to seed s.oldestLoaded with the same hotCutoffStr that fed the
   load SQL. Microsecond skew at chunk borders is gone.

#3 covered by TestHotStartup_BackgroundLoadFailureSurfacesInPerf
   (RED in the previous commit, GREEN here).
Kpa-clawbot#6/Kpa-clawbot#7 covered by existing TestHotStartup_BackgroundFillsToRetention
   and TestHotStartup_ConcurrentQueryDuringBackgroundLoad — both still
   pass. The bounded_load_test fixture had to switch observations.timestamp
   from an RFC3339 string to a unix int (production schema) so the restored
   item-1 RFC3339 subquery path can match.
2 participants