Skip to content

fix(infra): docker fail-fast & env hygiene (#5 review)#16

Merged
DevHDI merged 3 commits into
mainfrom
fix/infra-hardening
May 13, 2026
Merged

fix(infra): docker fail-fast & env hygiene (#5 review)#16
DevHDI merged 3 commits into
mainfrom
fix/infra-hardening

Conversation

@DevHDI

@DevHDI DevHDI commented May 13, 2026

Copy link
Copy Markdown
Owner

Summary

Addresses 3 infrastructure findings from the Raven's Verdict review on bmad-method-ui#5.

Findings addressed

# Severity Issue Fix
#3 🔴 Critical Dockerfile CMD used || then ;node server.js started even when prisma migrate deploy failed Replace ; with && so the container exits non-zero on migration failure
#11 🟡 Moderate .env.example shipped bmad_dev_password — a realistic value that gets copy-pasted into prod Replace with <your-db-password> placeholder + point the reader at scripts/setup.sh / docker-compose.yml
#10 🟡 Moderate src/lib/rate-limit.ts is process-local — known MVP tradeoff but the constraint was invisible in the code Add a top-of-file block documenting the in-memory scope, restart/multi-instance consequences, and migration path

Commits

  • a195878 fix(docker): fail-fast if prisma migrate deploy fails
  • aee66fe chore(env): replace realistic dev password with a placeholder in .env.example
  • f3a25b4 docs(rate-limit): document MVP in-memory limitations

Verification

  • pnpm tsc --noEmit — clean
  • pnpm lint — 0 errors (2 pre-existing warnings unrelated)
  • pnpm test — 151/151 passing

Test plan

  • Build the Docker image with a deliberately broken migration → container exits 1, app does NOT start (was: warning logged, app started against mismatched schema)
  • Run bash scripts/setup.sh on a fresh clone → .env is filled with a working dev DATABASE_URL (setup.sh injects the password explicitly; .env.example placeholder is intentional)
  • Read src/lib/rate-limit.ts cold → constraint is obvious from the top-of-file doc

Context

PR 3 of 4 addressing bmad-method-ui#5 review.

DevHDI added 3 commits May 13, 2026 23:08
The previous CMD was:
  prisma migrate deploy || echo 'Warning: migration skipped'; node server.js

The shell parses this as:
  (prisma migrate deploy || echo 'Warning: migration skipped') ; node server.js

The semicolon binds at the statement level, so `node server.js` ran
regardless of whether migrations succeeded. If the database schema was
out of sync, the app started against a mismatched schema and risked
runtime errors or data corruption on the first query.

Replace `;` with `&&` so the container exits non-zero when migrations
fail, which surfaces the failure to the orchestrator (Docker, Compose,
Kubernetes) and prevents the app from running on a broken schema.

Addresses finding #3 from bmad-method-ui#5 review.
….example

The DATABASE_URL line shipped `bmad_dev_password` — a value that looks
plausible enough to be copy-pasted into a production deployment.
BETTER_AUTH_SECRET in the same file already follows the safer pattern
(empty value plus generation instructions); apply the same hygiene to
the database URL.

Replace the literal with `<your-db-password>` so a copy-paste into
prod fails loudly. The comment now points readers at scripts/setup.sh
(which writes a working dev URL) and at docker-compose.yml (whose
POSTGRES_PASSWORD must match).

Addresses finding #11 from bmad-method-ui#5 review.
checkRateLimit is process-local and resets on every restart, deploy,
or new container instance — a known trade-off that wasn't visible in
the code. The next person to add a rate limit (or to scale the app
horizontally) needs to see the constraint up front rather than
discovering it via a bypass.

Add a top-of-file block that:
  - States the in-memory / per-process scope explicitly
  - Lists the practical consequences (restart loss, multi-instance
    bypass via load-balancer rotation)
  - Points at the migration path (shared store such as Redis) and
    notes that the public API is intentionally swap-friendly

No behavior change.

Addresses finding #10 from bmad-method-ui#5 review.
@DevHDI DevHDI merged commit 2323d41 into main May 13, 2026
2 checks passed
@DevHDI DevHDI deleted the fix/infra-hardening branch May 13, 2026 22:00
DevHDI added a commit that referenced this pull request May 13, 2026
When a GraphQL batch fails (transient network error, GitHub hiccup),
the catch handler set every repo in that chunk to `false`, which is
the same value used for "confirmed no BMAD". A caller asking the UI
"does this repo have BMAD?" got "no" — when the truthful answer was
"we don't know, the API call failed".

Change the result map type from `Record<string, boolean>` to
`Record<string, boolean | null>`:
  - true  → BMAD detected
  - false → confirmed absent
  - null  → detection failed for this repo (unknown)

The single in-repo consumer (add-repo-dialog) already coerces
`bmadResult.data[r.fullName] ?? false` at the boundary so existing
behavior is preserved; surfacing the unknown count in the UI is a
follow-up that can build on this contract.

Addresses finding #16 from bmad-method-ui#5 review.
DevHDI added a commit that referenced this pull request May 13, 2026
* fix(admin): show parsingErrorRate as N/A until tracking exists

The admin dashboard displayed `0%` for "Parsing Errors" with the
description "Target KPI: < 1%", suggesting the metric was working when
in fact no parsing-error log table exists yet — the value was a
hardcoded zero. Admins reading "0%" would conclude the system has no
parsing errors when the reality is the data is not collected.

Change `parsingErrorRate` from `number` to `number | null` in the
UsageMetrics interface, return `null` from getUsageMetrics(), and have
the stats card render "N/A" with the description "Not yet tracked"
when the value is null. The existing display path (number + "%") is
preserved for when a real metric is wired up.

Addresses finding #17 from bmad-method-ui#5 review.

* fix(actions): represent detectBmadRepos batch failures as null

When a GraphQL batch fails (transient network error, GitHub hiccup),
the catch handler set every repo in that chunk to `false`, which is
the same value used for "confirmed no BMAD". A caller asking the UI
"does this repo have BMAD?" got "no" — when the truthful answer was
"we don't know, the API call failed".

Change the result map type from `Record<string, boolean>` to
`Record<string, boolean | null>`:
  - true  → BMAD detected
  - false → confirmed absent
  - null  → detection failed for this repo (unknown)

The single in-repo consumer (add-repo-dialog) already coerces
`bmadResult.data[r.fullName] ?? false` at the boundary so existing
behavior is preserved; surfacing the unknown count in the UI is a
follow-up that can build on this contract.

Addresses finding #16 from bmad-method-ui#5 review.

* chore(actions): log extendBmadDirs failures instead of silently catching

fetchFileContent wraps provider.extendBmadDirs in `try { ... } catch {}`
with a comment relying on the downstream getFileContent →
assertSafePath check to enforce the whitelist invariant. That
reasoning is still correct (assertSafePath refuses any path whose
top-level segment is not in the whitelist, so an extension that
failed to apply cannot be exploited), but discarding the error gives
zero visibility when the validation path is exercised in production.

Replace the silent catch with a console.warn that captures the
owner / name / segment / error message. No behavior change — the
authoritative refusal still happens in assertSafePath if the segment
was not added.

Addresses finding #15 from bmad-method-ui#5 review.

* fix(dialog): surface unknown-detection count to the user

Commit 8090e03 had the server return `null` for repos in a failed
batch, but add-repo-dialog still applied `?? false` at the boundary
and rendered the repo identically to a confirmed "no BMAD". The data
contract was honest but the user saw the same lie as before.

Count the null values returned by detectBmadRepos and surface that
count via a new `unknownDetectCount` state. When > 0 the dialog
shows an amber warning ("BMAD detection failed for N
repositor(y/ies)…") in the same area as the existing detection
status. The repo list itself stays usable; only the misleading badge
gets the disclaimer.

State name is intentionally distinct from existing/planned warning
state in PR #14 so the two surfaces can coexist when
both PRs merge.

Addresses the follow-up review on 8090e03.

* fix(dialog): use complete words for repository pluralization

The previous JSX split the word as `repositor` + `{count === 1 ? "y"
: "ies"}`. React preserves whitespace between text and expressions,
so the rendered output had a space in the middle of the word
("repositor y" / "repositor ies").

Move the whole word into the conditional ("repository" /
"repositories") and use an explicit `{" "}` to control the space
between the count and the word. The dash and trailing text are also
on their own lines, which JSX collapses to single spaces correctly.

Addresses the follow-up review on 56567db.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant