perf(hub): strip large payload bodies in digest fold loaders (#118 §3) #297

physercoe merged 1 commit into main from perf/118-fold-strip-payload-bodies on Jun 27, 2026

@physercoe

Owner

Third Tier-A perf PR for #118. Cuts the dominant cost of the digest backfill — loading + unmarshaling the large text bodies of a 10k-event session.

Problem (#118 bottleneck #1)

backfillAgentDigest and the fold worker load the agent's full event log via loadFoldEvents{,Before,After}, pulling each row's complete payload_json and json.Unmarshal-ing it into a map[string]any. That includes the large display bodies — a single text event can carry the run's whole accumulated transcript. For a 10k-event session that text I/O + unmarshal dominates the backfill that still runs on a schema bump or live read-repair (after siblings #295/#296).

Fix

The fold only ever reads small structured keys (input_tokens, output_tokens, cost_usd, by_model, tool ids/names, status, is_error, type, turn_id), never a body. So shrink the payload server-side: a shared foldEventCols projection applies json_remove to text/content/message/delta/output/thinking/thought/reasoning before the row crosses into Go.

  • SQLite silently ignores json_remove paths that don't exist → listing a body an event lacks is a no-op.
  • A json_valid guard passes a NULL/malformed payload through untouched (scanFoldEvents already tolerates a non-JSON blob).
  • Removing a field the fold doesn't read cannot change the digest.

Why a dedicated test (not the brute==incremental ones)

The brute and incremental folds share this projection, so a wrongly-dropped field would break both identically and the equivalence tests would stay green. TestFoldStripsBodiesWithoutChangingDigest instead folds the canonical vector with 50 KB bodies injected and asserts the stripped-from-DB digest is byte-identical to the full-payload reference. Verified that it fails if a read field (e.g. $.name) is added to the strip list: the tools map collapses to unknown and the assertion trips.

Verification

go build, go vet, gofmt clean. Digest/fold/insights/turns/canonical-vector suites pass, including -race (97s, clean).

Completes #118 Tier A (the read-cache §2 was dropped — #1 + #4 absorbed its benefit; this directly attacks the residual backfill cost instead). Siblings: #295, #296.

🤖 Generated with Claude Code

backfillAgentDigest / the fold worker load the agent's full event log via
loadFoldEvents{,Before,After}, pulling each row's complete payload_json and
json.Unmarshal-ing it — including the large display bodies (a single `text`
event can carry the run's whole accumulated transcript). For a 10k-event
session that text I/O + unmarshal dominates the backfill (#118 bottleneck #1).

The fold only ever reads small structured keys (input/output_tokens, cost_usd,
by_model, tool ids/names, status, is_error, type, turn_id) — never a body. So
shrink the payload server-side: a shared `foldEventCols` projection applies
json_remove to text/content/message/delta/output/thinking/thought/reasoning
before the row crosses into Go. SQLite ignores absent paths (listing a body an
event lacks is a no-op) and a json_valid guard passes a NULL/malformed payload
through untouched.

Removing a field the fold doesn't read cannot change the digest — and the
brute==incremental tests CAN'T catch a wrongly-dropped field (both paths share
this projection, so they'd break identically). So
TestFoldStripsBodiesWithoutChangingDigest folds the canonical vector with 50KB
bodies injected and asserts the stripped-from-DB digest is byte-identical to the
full-payload reference (verified to fail if a read field like $.name is added to
the strip list).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@physercoe physercoe merged commit 88e37af into main Jun 27, 2026
4 checks passed
@physercoe physercoe deleted the perf/118-fold-strip-payload-bodies branch June 27, 2026 13:01