
fix(server): cap held SSE/long-poll connections + cache referenced asset ids #153

Merged
benvinegar merged 1 commit into main from fix/hold-connection-cap on Jun 26, 2026

@benvinegar

Closes #146.

What

The issue has two halves, both addressed in server/app.ts plus the two stores (SqlStore and JsonFileStore):

1. Connection caps on held SSE + long-poll

/api/events (SSE) and /api/comments?wait=N (long-poll) now share a per-instance holdConnections counter, gated by maxHoldConnections (default 32, configurable via AppOptions). Over-cap requests get a 503. Each slot is released exactly once through a guarded release() wired to stream abort, request abort, and normal return, so slots cannot leak. Instant ?wait=0 reads don't count: they answer immediately and never hold a socket open.
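A minimal sketch of the shared hold budget, assuming a single in-process counter per instance. `HoldGate`, `tryAcquire` and `holdsSocket` are hypothetical names for illustration, not the actual server/app.ts code; the `AbortController` stands in for whatever signal the runtime fires when a client disconnects.

```typescript
// Hypothetical sketch: HoldGate / tryAcquire / holdsSocket are illustrative
// names, not the real server/app.ts implementation.
class HoldGate {
  readonly max: number;
  private held = 0;

  // 32 mirrors the PR's default for maxHoldConnections.
  constructor(max = 32) {
    this.max = max;
  }

  get inUse(): number {
    return this.held;
  }

  // Returns a release callback, or null when at capacity (caller answers 503).
  tryAcquire(): (() => void) | null {
    if (this.held >= this.max) return null;
    this.held++;
    let released = false;
    // Guarded: the same callback can be wired to stream abort, request abort
    // and normal return; only the first invocation frees the slot.
    return () => {
      if (released) return;
      released = true;
      this.held--;
    };
  }
}

// ?wait=0 answers immediately and never holds the socket open, so it is exempt.
function holdsSocket(waitSeconds: number): boolean {
  return waitSeconds > 0;
}

// Usage: SSE and long-poll draw from the same gate (cap of 2 for brevity).
const gate = new HoldGate(2);
const client = new AbortController();

const sseRelease = gate.tryAcquire(); // SSE stream opens
if (sseRelease) client.signal.addEventListener("abort", sseRelease, { once: true });

const pollRelease = holdsSocket(30) ? gate.tryAcquire() : null; // ?wait=30 holds a slot
const third = gate.tryAcquire(); // over cap: null, so the handler answers 503

client.abort(); // client disconnects: abort listener frees the SSE slot
sseRelease?.(); // the handler's normal return fires too: no double release
const afterAbort = gate.inUse;
pollRelease?.(); // long-poll answered
```

The design point is that release is idempotent: wiring it to every exit path is what makes "no leak" hold without needing to know which path fires first.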

2. Index the referenced-asset set

referencedAssetIds() (used by /a/:id's optimistic-read wait and putAsset's eviction scan) was re-parsing every post's surfaces + history JSON on each call, i.e. a full-table scan on every /a/:id miss. It is now built lazily and maintained incrementally: createPost/updatePost fold new refs in, while removePost/removeSession/importBoard invalidate it. This is correct because post history is append-only: a ref stays referenced until its whole post is deleted, at which point we recompute from scratch.
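A sketch of the lazy, incrementally maintained index under simplifying assumptions: each post exposes its referenced asset ids as a flat `assetRefs` list (standing in for parsing surfaces + history JSON), and because history is append-only, an updated post's list is a superset of its previous one. `Post`, `ReferencedAssetIndex`, `noteRefs` and `invalidate` are hypothetical names, not the stores' real API.

```typescript
// Hypothetical sketch of a lazily built, incrementally maintained ref index.
interface Post {
  id: string;
  // Stand-in for asset ids parsed from the post's surfaces + history JSON.
  assetRefs: string[];
}

class ReferencedAssetIndex {
  private cache: Set<string> | null = null; // null = cold or invalidated
  private readonly loadAllPosts: () => Post[];
  fullScans = 0;

  constructor(loadAllPosts: () => Post[]) {
    this.loadAllPosts = loadAllPosts;
  }

  // Built lazily with one full scan; later reads hit the cached set.
  ids(): ReadonlySet<string> {
    if (this.cache === null) {
      this.fullScans++;
      const set = new Set<string>();
      for (const post of this.loadAllPosts()) {
        for (const id of post.assetRefs) set.add(id);
      }
      this.cache = set;
    }
    return this.cache;
  }

  // createPost / updatePost: refs only ever grow, so fold them into a warm
  // cache. A cold cache will pick them up on its next rebuild.
  noteRefs(post: Post): void {
    if (this.cache === null) return;
    for (const id of post.assetRefs) this.cache.add(id);
  }

  // removePost / removeSession / importBoard: refs may vanish, so drop the
  // cache and recompute from scratch on the next read.
  invalidate(): void {
    this.cache = null;
  }
}

// Usage against an in-memory "table".
const posts = new Map<string, Post>();
const index = new ReferencedAssetIndex(() => Array.from(posts.values()));

posts.set("p1", { id: "p1", assetRefs: ["a1"] });
index.noteRefs(posts.get("p1")!); // cold cache: nothing to fold yet
index.ids(); // first read triggers the one full scan

// An update appends a ref; the old one stays in history, so both remain referenced.
posts.set("p1", { id: "p1", assetRefs: ["a1", "a2"] });
index.noteRefs(posts.get("p1")!);
const afterUpdate = index.ids().has("a1") && index.ids().has("a2");

posts.delete("p1");
index.invalidate();
const afterRemove = index.ids().has("a1");
```

Skipping the fold on a cold cache is deliberate: the next read rebuilds from the table anyway, so only a warm cache needs updating.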

Why not a Cloudflare platform feature

Researched this — there's no platform equivalent for concurrent-held-connection capping:

  • Durable Object (DO) overload protection is request-rate / queue-depth based (~1000 RPS soft cap), not concurrent streams. A slow SSE flood (500 streams over 30s) stays under it yet pins 500 sockets open indefinitely, which is exactly the issue scenario.
  • WAF Rate Limiting is edge request-rate per IP, complementary (worth adding for a publicRead deploy as a first wave), but it's dashboard config on the zone, not something shipped in an npm package, and does nothing for the Node self-host path. server/app.ts is runtime-agnostic by repo invariant.
  • WebSocket Hibernation is WS-only; sideshow uses SSE + long-poll.
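To make the rate-versus-concurrency gap concrete, here is an illustrative model of the issue's slow flood. It assumes a per-second rate limiter at the ~1000 RPS soft cap cited above, a hold cap of 32, and flood streams that never close; `simulateSlowFlood` is a hypothetical helper, not part of the codebase or of any Cloudflare API.

```typescript
// Illustrative model only: a per-second rate limiter vs. a concurrent-hold cap.
// Assumes every flood stream stays open for the whole window.
interface FloodResult {
  peakRequestsPerSecond: number;
  rateLimited: number; // rejected by the rate limiter
  openWithoutHoldCap: number; // sockets pinned if only the rate limiter exists
  openWithHoldCap: number; // sockets pinned once the hold cap also applies
  holdRejected: number; // 503s from the hold cap
}

function simulateSlowFlood(
  streams: number,
  seconds: number,
  rateCap: number,
  holdCap: number,
): FloodResult {
  const perSecond = Math.ceil(streams / seconds);
  let sent = 0;
  let peak = 0;
  let rateLimited = 0;
  let openNoCap = 0;
  let openCapped = 0;
  let holdRejected = 0;
  for (let t = 0; t < seconds && sent < streams; t++) {
    const arrivals = Math.min(perSecond, streams - sent);
    sent += arrivals;
    peak = Math.max(peak, arrivals);
    const admitted = Math.min(arrivals, rateCap); // 1-second rate window
    rateLimited += arrivals - admitted;
    openNoCap += admitted; // streams never close
    const freeSlots = Math.max(0, holdCap - openCapped);
    const held = Math.min(admitted, freeSlots);
    openCapped += held;
    holdRejected += admitted - held;
  }
  return {
    peakRequestsPerSecond: peak,
    rateLimited,
    openWithoutHoldCap: openNoCap,
    openWithHoldCap: openCapped,
    holdRejected,
  };
}

const result = simulateSlowFlood(500, 30, 1000, 32);
```

With 500 streams over 30 s the arrival rate peaks at 17 requests per second, so the rate limiter rejects nothing and all 500 sockets stay pinned; the hold cap admits 32 and answers the remaining 468 with 503.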

The in-process counter is what Cloudflare's own in-memory-state docs imply you do for custom per-object coordination. Layering the edge WAF rate limiter on top for a deploy is a good follow-up note, but it complements rather than replaces this.

Why 32

One workspace = one user (per AGENTS.md). Real concurrent holds: one SSE per open viewer tab (a user keeps a few) + one long-poll per active agent. A multi-agent session with 5 agents + a few tabs legitimately reaches ~15. 32 clears that with headroom; a real flood is orders of magnitude bigger, so rejecting at 32 vs 16 makes no difference to flood protection — only to legitimate use. Configurable for anyone who needs more.

Tests

  • 3 API tests: SSE cap + slot release, long-poll cap (instant reads exempt), SSE/long-poll shared budget.
  • 3 store-contract tests (run against both SqlStore and JsonFileStore): removed-post invalidation, update-keeps-history-referenced, cold-cache unreferenced.

Validation

npm run typecheck (3 programs) · npm run lint · npm run format:check · npm test — all green (270/270 on this branch; +9 from the new tests).

…set ids (#146)

Bound concurrently-held SSE (/api/events) and long-poll
(/api/comments?wait=N) connections per workspace via maxHoldConnections
(default 32, configurable through AppOptions). Over-cap returns 503; slots
release exactly once on stream/request abort or normal return. Instant
?wait=0 reads don't count. The platform has no equivalent knob — the DO
overload protection and WAF rate limiting are request-rate/queue-depth, not
concurrent-held-connection, so a slow SSE flood stays under both yet pins
sockets open indefinitely. One workspace is one user, so 32 clears real
concurrency (a few viewer tabs + several agents in a multi-agent session)
with headroom; a real flood is orders of magnitude bigger.

Also index the referenced-asset set used by /a/:id's optimistic-read wait
and putAsset's eviction scan: it was re-parsing every post's surfaces+history
JSON on each call (full-table scan on every /a/:id miss), now built lazily
and maintained incrementally on post create/update, invalidated on remove.
Correct because post history is append-only — a ref stays referenced until
its whole post is deleted.
benvinegar merged commit 5dfcb82 into main on Jun 26, 2026
9 checks passed

Linked issue: No concurrency/rate caps on SSE + long-poll (DoS on publicRead boards)