Skip to content

fix(cache): revalidate route hardening & ordering (#5 review)#15

Merged
DevHDI merged 4 commits into
mainfrom
fix/revalidate-route-hardening
May 13, 2026
Merged

fix(cache): revalidate route hardening & ordering (#5 review)#15
DevHDI merged 4 commits into
mainfrom
fix/revalidate-route-hardening

Conversation

@DevHDI

@DevHDI DevHDI commented May 13, 2026

Copy link
Copy Markdown
Owner

Summary

Addresses 3 cache/revalidation findings surfaced by the Raven's Verdict review on bmad-method-ui#5.

Each commit is atomic and references its finding.

Findings addressed

# Severity Issue Fix
#8 🔴 Critical request.json() not caught — malformed body returned 500 instead of 400 Wrap in try/catch returning 400 "Invalid JSON"
#6 🔴 Critical revalidateTag called before the GitHub fetch — a failed fetch left the cache invalidated with no recovery Move revalidateTag after both fetch and prisma.repo.update succeed
#13 🟡 Moderate /api/revalidate had no rate limit; if the shared secret leaked, an attacker could hammer it Add per-IP rate limit (30/min) via the existing checkRateLimit helper

Commits

  • 15bfa26 fix(api): handle malformed JSON on /api/revalidate as 400
  • ff0b729 fix(actions): revalidate cache only after successful refresh
  • 93023cf fix(api): rate-limit /api/revalidate per client IP

Verification

  • pnpm tsc --noEmit — clean (after clearing stale .next/types cache)
  • pnpm lint — 0 errors (2 pre-existing warnings unrelated)
  • pnpm test — 151/151 passing

Test plan

  • curl -X POST /api/revalidate -H "x-revalidate-secret: …" -d 'not-json' → expect HTTP 400 (was 500)
  • Simulate a GitHub getTree failure in refreshGitHubRepo → cache tag is NOT invalidated; existing cached entry remains
  • 40 rapid POST requests to /api/revalidate from one IP → expect 429 after request 30 within 60s

Context

PR 2 of 4 addressing bmad-method-ui#5 review. Follow-ups:

DevHDI added 4 commits May 13, 2026 22:34
await request.json() throws SyntaxError on empty bodies, wrong
Content-Type, or truncated payloads. The error was not caught, so the
endpoint returned an unstructured 500 instead of a 400, and bypassed
any error-monitoring middleware that expects structured 4xx
responses.

Wrap the parse in try/catch and return a clear 400 "Invalid JSON".

Addresses finding #8 from bmad-method-ui#5 review.
refreshGitHubRepo called revalidateTag *before* the GitHub fetch.
When the fetch threw (network error, rate limit, 404), the cache was
already invalidated: subsequent renders fetched stale or empty data
with no recovery until a future successful refresh.

Move revalidateTag to the end of the function, after the GitHub fetch
and the prisma.repo.update both succeed. If either step fails, the
existing cached entry remains valid.

Addresses finding #6 from bmad-method-ui#5 review.
The endpoint was protected by a shared secret with no rate limit of
its own. If the secret were ever obtained (logs, accidental commit,
compromised CI), an attacker could issue unlimited revalidateTag
calls to drive continuous cache invalidation and elevate upstream
load.

Add a per-IP limit using the existing in-memory checkRateLimit
helper: 30 requests per minute, which comfortably exceeds any
legitimate build/deploy cadence. The IP is read from x-forwarded-for
(first hop) with x-real-ip and a literal "unknown" fallback so a
missing header cannot bypass the bucket.

Defense in depth — the secret remains the primary protection.

Addresses finding #13 from bmad-method-ui#5 review.
The per-IP pre-auth limit added in 93023cf trusted the
x-forwarded-for header, which is client-controlled. In the very
threat the limit was meant to mitigate ("secret leaked"), the
attacker can rotate x-forwarded-for on every request and obtain a
fresh bucket each time, neutralising the cap.

Restructure the handler:

1. Validate the shared secret first. timingSafeEqual on SHA-256
   digests is computationally resistant to brute force, so no
   pre-auth quota is needed.
2. Apply a *post-auth* rate limit with a fixed key
   `revalidate:authenticated`. This caps total invalidations across
   all valid-secret holders, so even with a leaked secret an
   attacker is bounded to 30/min regardless of source IP.

Drop the now-unused getClientIp helper.

Addresses follow-up review on 93023cf.
@DevHDI

DevHDI commented May 13, 2026

Copy link
Copy Markdown
Owner Author

Follow-up fix: rate-limit was bypassable via x-forwarded-for spoofing

Reviewer's finding was correct. The per-IP pre-auth limit added in 93023cf trusted a client-controlled header, so the very scenario it was meant to mitigate (secret leaked) could be bypassed by rotating x-forwarded-for on every request.

Fix in 8dfe59d

Restructured src/app/api/revalidate/route.ts:

  1. Validate the shared secret first. timingSafeEqual on SHA-256 digests makes brute force computationally infeasible (2^256 search space), so no pre-auth quota is meaningful.
  2. Apply a post-auth rate limit with a fixed global key (revalidate:authenticated). This caps total invalidations across all valid-secret holders — even with a leaked secret, an attacker is bounded to 30/min regardless of source IP.
  3. Dropped the now-unused getClientIp helper.

Verification

  • pnpm tsc --noEmit — clean
  • pnpm lint — 0 errors (2 pre-existing warnings)
  • pnpm test — 151/151

Note

Considered keeping a layered approach (pre-auth IP-or-global + post-auth global) but rejected it: pre-auth limits without trusted IP source are theatre, and the secret check itself is fast enough that pathological 401-spam is bounded by the runtime's request handling capacity. Single post-auth global bucket is the honest fix.

@DevHDI DevHDI merged commit 2c6c85d into main May 13, 2026
2 checks passed
@DevHDI DevHDI deleted the fix/revalidate-route-hardening branch May 13, 2026 22:00
DevHDI added a commit that referenced this pull request May 13, 2026
fetchFileContent wraps provider.extendBmadDirs in `try { ... } catch {}`
with a comment relying on the downstream getFileContent →
assertSafePath check to enforce the whitelist invariant. That
reasoning is still correct (assertSafePath refuses any path whose
top-level segment is not in the whitelist, so an extension that
failed to apply cannot be exploited), but discarding the error gives
zero visibility when the validation path is exercised in production.

Replace the silent catch with a console.warn that captures the
owner / name / segment / error message. No behavior change — the
authoritative refusal still happens in assertSafePath if the segment
was not added.

Addresses finding #15 from bmad-method-ui#5 review.
DevHDI added a commit that referenced this pull request May 13, 2026
* fix(admin): show parsingErrorRate as N/A until tracking exists

The admin dashboard displayed `0%` for "Parsing Errors" with the
description "Target KPI: < 1%", suggesting the metric was working when
in fact no parsing-error log table exists yet — the value was a
hardcoded zero. Admins reading "0%" would conclude the system has no
parsing errors when the reality is the data is not collected.

Change `parsingErrorRate` from `number` to `number | null` in the
UsageMetrics interface, return `null` from getUsageMetrics(), and have
the stats card render "N/A" with the description "Not yet tracked"
when the value is null. The existing display path (number + "%") is
preserved for when a real metric is wired up.

Addresses finding #17 from bmad-method-ui#5 review.

* fix(actions): represent detectBmadRepos batch failures as null

When a GraphQL batch fails (transient network error, GitHub hiccup),
the catch handler set every repo in that chunk to `false`, which is
the same value used for "confirmed no BMAD". A caller asking the UI
"does this repo have BMAD?" got "no" — when the truthful answer was
"we don't know, the API call failed".

Change the result map type from `Record<string, boolean>` to
`Record<string, boolean | null>`:
  - true  → BMAD detected
  - false → confirmed absent
  - null  → detection failed for this repo (unknown)

The single in-repo consumer (add-repo-dialog) already coerces
`bmadResult.data[r.fullName] ?? false` at the boundary so existing
behavior is preserved; surfacing the unknown count in the UI is a
follow-up that can build on this contract.

Addresses finding #16 from bmad-method-ui#5 review.

* chore(actions): log extendBmadDirs failures instead of silently catching

fetchFileContent wraps provider.extendBmadDirs in `try { ... } catch {}`
with a comment relying on the downstream getFileContent →
assertSafePath check to enforce the whitelist invariant. That
reasoning is still correct (assertSafePath refuses any path whose
top-level segment is not in the whitelist, so an extension that
failed to apply cannot be exploited), but discarding the error gives
zero visibility when the validation path is exercised in production.

Replace the silent catch with a console.warn that captures the
owner / name / segment / error message. No behavior change — the
authoritative refusal still happens in assertSafePath if the segment
was not added.

Addresses finding #15 from bmad-method-ui#5 review.

* fix(dialog): surface unknown-detection count to the user

Commit 8090e03 had the server return `null` for repos in a failed
batch, but add-repo-dialog still applied `?? false` at the boundary
and rendered the repo identically to a confirmed "no BMAD". The data
contract was honest but the user saw the same lie as before.

Count the null values returned by detectBmadRepos and surface that
count via a new `unknownDetectCount` state. When > 0 the dialog
shows an amber warning ("BMAD detection failed for N
repositor(y/ies)…") in the same area as the existing detection
status. The repo list itself stays usable; only the misleading badge
gets the disclaimer.

State name is intentionally distinct from existing/planned warning
state in PR #14 so the two surfaces can coexist when
both PRs merge.

Addresses the follow-up review on 8090e03.

* fix(dialog): use complete words for repository pluralization

The previous JSX split the word as `repositor` + `{count === 1 ? "y"
: "ies"}`. React preserves whitespace between text and expressions,
so the rendered output had a space in the middle of the word
("repositor y" / "repositor ies").

Move the whole word into the conditional ("repository" /
"repositories") and use an explicit `{" "}` to control the space
between the count and the word. The dash and trailing text are also
on their own lines, which JSX collapses to single spaces correctly.

Addresses the follow-up review on 56567db.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant