feat: BoringStack greenfield scaffolder + overnight harness findings #48
…27b evals

Validated harness fixes surfaced by running local-model evals (see memory tsforge-eval-findings-2026-06 / autonomous-overnight-run). Each tightens or de-frictions the gate without weakening strictness:

- detect-gate/web-gate: force-exclude *.test.ts from type-aware lint + tsc overlay + buildWebTscCheck (cwd-threaded) so a test sibling can't send the model into rewriting tsconfig.json; thread cwd through buildWebTypeGate/buildWebGate.
- loop: net-progress guard + high turn ceiling in turn.ts/settleGate; same-persist + gate-stuck guards; readonly-spin guard; loop.constants tuning.
- files/loop tools: edit-hashline + file-ops + write-guard fuzzy-edit + self-import (join-not-resolve relative-path) fixes; salvage-toolcall hardening.
- inference: transport/stream/wire robustness; openai-compatible.
- rule-packs: no-jsx-computation relaxed on idiomatic inline code (warn, not error); test-sibling-required + typescript-core/react-component-architecture refinements.
- policy: patterns + evaluation refinements.

All with accompanying tests; suite green (1469 pass, 2 pre-existing flakes: transport model-endpoint-unreachable + boot-oracle pollUntilReady).
Manifest-driven greenfield scaffolder that stands up boringstack (or its Astro
static site) by driving boringstack's OWN scripts — tsforge holds no stack
knowledge, it reads .tsforge/scaffold-manifest.json from the clone.
Pure core (unit-tested): plan (topology / conditional-required secrets with
requiresSecretsWhen / cross-rules / env edits routed per-file via envFileByGroup +
${STACK}), manifest parser (unknown-narrowing, no casts), completeness alarm
(fails on any unmodelled WITH_*/*_ENABLED toggle; validated against a snapshot of
boringstack's real .env.example), wizard step generation, .env apply with secret
redaction.
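The completeness alarm described above can be pictured roughly as follows. This is a minimal sketch, not tsforge's implementation: `findUnmodelledToggles`, the flat set of modelled keys, and the sample keys `WITH_REDIS` / `WITH_MINIO` are assumptions made for illustration, and commented-out lines are simply skipped here.

```typescript
// Hypothetical sketch: report every WITH_* / *_ENABLED key found in an
// .env.example that the manifest does not model.
const TOGGLE_PATTERN = /^(?:WITH_[A-Z0-9_]+|[A-Z0-9_]+_ENABLED)$/;

function findUnmodelledToggles(
  envExample: string,
  modelledKeys: ReadonlySet<string>,
): string[] {
  const unmodelled: string[] = [];
  for (const rawLine of envExample.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blanks and comments (assumption: commented toggles are ignored).
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    if (TOGGLE_PATTERN.test(key) && !modelledKeys.has(key)) {
      unmodelled.push(key);
    }
  }
  return unmodelled;
}

// Made-up sample: WITH_MINIO is a toggle the (made-up) manifest forgot.
const sampleEnv = [
  "# infra",
  "WITH_REDIS=true",
  "AI_ENABLED=false",
  "PORT=3000",
  "WITH_MINIO=false",
].join("\n");

const missingToggles = findUnmodelledToggles(
  sampleEnv,
  new Set(["WITH_REDIS", "AI_ENABLED"]),
);
```

A test built on this shape would fail whenever `missingToggles` is non-empty, which is the property the snapshot validation relies on.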
I/O layer (injected runner/fs/poller, fully faked in tests): clone at ref + record
replay sha, configure (rename-project.sh → setup.sh → per-file env writes →
generated secrets), boot (setup.sh --up + health poll), run-scaffold orchestrator
+ gate-command composition. 57 scaffold tests, lint + types clean.
…checkout

Opt-in (TSFORGE_SCAFFOLD_E2E=1) + only when a local clone exists. Proves the I/O layer end-to-end against the real repo: shallow clone, sha resolve, BoringStack's own setup.sh bootstrap, per-file env writes seeded from .example. Skips in CI.
The shared seam for the headless eval driver + (later) the interactive CLI: --archetype/--stack/--dest/--set KEY=VALUE/--multi KEY=a,b/--ref/--no-boot → IScaffoldAnswers, validated. 9 tests.
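As an illustration of the `--set KEY=VALUE` / `--multi KEY=a,b` vocabulary, here is a small sketch of how such flags could be folded into an answers object. It is an assumption-laden example, not the real `parseScaffoldArgs`: `SketchAnswers`, `parseSetAndMulti`, and the `OAUTH_PROVIDERS` key are hypothetical, and the other flags are deliberately ignored.

```typescript
// Hypothetical answers shape; the real IScaffoldAnswers is not shown in this PR text.
interface SketchAnswers {
  values: Record<string, string>;
  multi: Record<string, string[]>;
}

function parseSetAndMulti(argv: readonly string[]): SketchAnswers {
  const answers: SketchAnswers = { values: {}, multi: {} };
  for (let i = 0; i < argv.length; i += 1) {
    const flag = argv[i];
    if (flag !== "--set" && flag !== "--multi") continue;
    const pair = argv[i + 1];
    if (pair === undefined) throw new Error(`${flag} expects KEY=VALUE`);
    const eq = pair.indexOf("=");
    if (eq <= 0) throw new Error(`${flag} expects KEY=VALUE, got "${pair}"`);
    const key = pair.slice(0, eq);
    // Split on the FIRST "=" only, so values may themselves contain "=".
    const value = pair.slice(eq + 1);
    if (flag === "--set") {
      answers.values[key] = value;
    } else {
      answers.multi[key] = value.split(",").filter((item) => item !== "");
    }
    i += 1; // consume the KEY=VALUE token
  }
  return answers;
}

const parsedAnswers = parseSetAndMulti([
  "--set", "AI_ENABLED=true",
  "--multi", "OAUTH_PROVIDERS=github,google",
]);
```

Rejecting malformed pairs up front keeps the validation at the argument boundary, so later stages can assume well-formed answers.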
- loadBundledManifest(): the bootstrap manifest shipped in src/scaffold (mirrors boringstack's .tsforge source-of-truth; drift-guarded against the test fixture). defaultRef aligned to 'main' (tsforge records the resolved sha per scaffold).
- headless-scaffold-build.ts: non-interactive entry (parseScaffoldArgs → runScaffold) that clones, configures via boringstack's own scripts, optionally boots, and prints the handoff (gateCwd + composed gateCommand). BORINGSTACK_REPO env override for dev/E2E.

Validated end-to-end against a local clone (clone→setup.sh→env writes→replay record). Model-driven build loop is the documented next step (CLI). 68 scaffold tests + 1 gated E2E, lint + types clean.
…olations)

scaffoldPreview(manifest, answers) → the wizard overview / headless dry-run text: N-services topology (the 5-vs-20 cost), the conditional-required secret keys the user must supply (never their values), and any blocking cross-rule violation. Pure, derived from answersToPlan. 5 tests.
runScaffoldCommand: archetype/stack/dest from flags, config toggles via the interactive wizard (live topology/secrets/violations preview on the overview), --set/--multi flags override; off-TTY or Astro skips the wizard. Routed as a 'tsforge scaffold' subcommand so its arg vocabulary doesn't collide with the harness flags. Prints the handoff: the exact 'tsforge --dir … --accept …' command to build. Validated end-to-end against a local clone. mergeAnswerValues + preview unit-tested; TTY glue degrades gracefully off-TTY.
Data-driven coverage of every infra toggle's exact service delta (both directions + sole-ownership), WUD prod-only behavior, each email provider's exact secret set (and exclusion of the others'), each OAuth provider's own credential pair, and that tracing/cache choices don't false-alarm. 20 tests.
The F2–F21 findings landed with lint debt (they were committed from an uncommitted working tree). Behavior-preserving fixes:

- edit.ts: exotic-space char class → \u escapes (was literal irregular whitespace)
- syntax-check.ts: parseDiagnostics read via isRecord/isArray guards (drop the as-unknown-as double cast; house rule: no casts)
- transport.ts: heartbeat console.error → process.stderr.write (no-console)
- wire.ts: drop dead '?? 0' on a matchAll .index (always defined)
- prettier/formatting + no-lonely-if/prefer-includes auto-fixes across the findings src + test files.

Only remaining lint failure is pre-existing web-fetch.ts (lazy-imports jsdom/readability/turndown, which are undeclared optional deps not installed here). Suite green (1564 pass).
… fully green)

Two long-standing env-dependent flakes, now resolved:

- boot-oracle 'times out when nothing answers': injected a fetchFn into pollUntilReady (matching the now/sleep injection) so the timeout path runs on the fake clock with NO real network. A real fetch to a dead URL could hang up to its 2s abort per poll (×4 > the 5s budget) when the host drops the connection.
- proptest-oracle 'FAILS/PASSES' subprocess tests: gate on fast-check being resolvable (it's an opt-in dep the oracle skips when absent, 'install it to enable'), the same way browser-oracle gates on chromium. Without it the oracle no-ops (exit 0) and the assertions passed/failed for the wrong reason.

Suite: 1564 pass, 0 fail (was 2 failing).
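To make the boot-oracle fix concrete, the sketch below shows a poller whose fetch, clock, and sleep are all injected, so a timeout test never touches the network or a real timer. The names `PollDeps` and `pollUntilReadySketch`, the parameter order, and the placeholder URL are assumptions for illustration; the real `pollUntilReady` signature is not shown in this PR text.

```typescript
// Hypothetical dependency bundle: everything time- or network-related is injected.
interface PollDeps {
  fetchFn: (url: string) => Promise<{ ok: boolean }>;
  now: () => number;
  sleep: (ms: number) => Promise<void>;
}

async function pollUntilReadySketch(
  url: string,
  budgetMs: number,
  intervalMs: number,
  deps: PollDeps,
): Promise<boolean> {
  const deadline = deps.now() + budgetMs;
  while (deps.now() < deadline) {
    try {
      const response = await deps.fetchFn(url);
      if (response.ok) return true;
    } catch {
      // Connection refused or dropped: treat as "not ready yet" and keep polling.
    }
    await deps.sleep(intervalMs);
  }
  return false;
}

// Fake clock: sleeping advances time instantly; the fake fetch always fails.
let fakeNow = 0;
let fetchAttempts = 0;
const fakeDeps: PollDeps = {
  fetchFn: async () => {
    fetchAttempts += 1;
    throw new Error("connection refused");
  },
  now: () => fakeNow,
  sleep: async (ms) => {
    fakeNow += ms;
  },
};

// Placeholder URL; it is never actually requested because fetchFn is faked.
const pollResult = pollUntilReadySketch("http://127.0.0.1/health", 5000, 1000, fakeDeps);
```

With a 5000 ms budget and a 1000 ms interval, the fake run makes five attempts and resolves to `false` without any wall-clock delay, which is what makes the timeout path deterministic.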
… rule
Live eval finding (saas-crm build, aeon-qwen3.6-27b): with consistent-type-assertions
'never' banning all `as`, the model reaches for branded/nominal IDs
(`string & { _brand }`), which CANNOT be constructed without a cast — so it thrashes
for many turns (observed ~25 turns churning on branded IDs + the downstream
noUncheckedIndexedAccess 'T | undefined' cascade), bouncing between rule violations
and TS2322s, sometimes abandoning to plain strings (losing safety).
The harness's philosophy is runtime validation at boundaries, not compile-time casts
(see no-unsafe-boundary-cast), so branded IDs are off-pattern anyway. Enriched the
consistent-type-assertions rule doc (what/bad/good) to say so explicitly: use a plain
`type UserId = string` + validate at the boundary, don't reach for branded types.
Pure friction reduction — no strictness change. 14 feedback tests green.
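The steer toward a plain alias plus boundary validation might look like the following sketch. The `parseUserId` helper and the ID format it accepts are assumptions chosen for the example; the point is only that the `typeof` guard narrows the value, so no `as` cast is needed.

```typescript
// Plain alias: no brand, so no cast is required to construct one.
type UserId = string;

// Assumed ID format for illustration only.
const USER_ID_PATTERN = /^[a-z0-9-]{1,64}$/;

// Boundary validator: unknown input comes in, a checked UserId goes out.
function parseUserId(input: unknown): UserId {
  if (typeof input !== "string" || !USER_ID_PATTERN.test(input)) {
    throw new Error("invalid user id");
  }
  return input; // narrowed to string by the typeof guard
}

const exampleUserId: UserId = parseUserId("user-42");
```

The trade-off is that `UserId` and any other string alias are interchangeable at compile time; safety comes from validating once where untrusted data enters.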
New 'Greenfield scaffolding' page (scaffold/boringstack.mdx): the two archetypes, when it runs (tsforge scaffold, greenfield only), the clone→configure→boot→handoff flow, all flags, the manifest-as-source-of-truth model + completeness alarm, and how to test it (unit / gated real-clone E2E / headless end-to-end). Added to the sidebar and the commands reference. Marked the legacy web scaffolder page (--web) as superseded/being-retired.
Code Review
This pull request introduces a greenfield scaffolding wizard (tsforge scaffold) to stand up new projects from the BoringStack template or an Astro static site, driven by a declarative manifest. It also adds robust features like a syntax-regression guard for line edits, a net-progress convergence guard, a no-self-import ESLint rule, and improved connection retry budgets. Review feedback identifies several critical issues to address, including Windows path and CRLF line-ending compatibility, a potential process hang from a non-abortable backoff sleep, regex parsing vulnerability to comments in module-exports.ts, and missing exit code checks for the scaffolding scripts.
…tleaks

- RULES.md + rule-docs.generated.json regenerated: the F2–F21 findings added 4 rules (no-self-import, max-hooks-per-file, no-anonymous-useEffect, no-derived-state-in-effect) and tonight's F22 rule-doc edit, but the generated catalog wasn't rebuilt, so the CI 'rules:build + git diff RULES.md' drift check (and the docs linkcheck that depends on RULES.md) failed. Now idempotent.
- .gitleaks.toml: allowlist the scaffold manifest paths. gitleaks' generic-api-key heuristic flagged `requiresSecretsWhen: "AI_ENABLED=true"` because the field name contains 'Secret'. The manifest names required-secret KEYS + gating tokens, never real values, so this is a false positive.
The explicit `ReturnType<typeof readdirSync>` annotation resolved to the Buffer-named Dirent overload under CI's @types/node (Dirent<NonSharedBuffer>), failing typecheck with TS2322/TS2345 on entry.name — though it passed on the older local @types/node. Pin name type to string via `encoding: "utf8"` and let the return type infer (never naming the version-variant generic `Dirent`), matching the inferred-type pattern the repo's other readdir calls already use. 8 tests green.
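The readdir fix can be sketched as below: passing `encoding: "utf8"` selects the string-named overload and the return type is left to inference, so the `Dirent` generic is never written out. `listFileNames` and the temp-directory setup are illustrative assumptions, not code from the repo.

```typescript
import { mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: list regular files in a directory by name.
function listFileNames(dir: string): string[] {
  // encoding "utf8" pins entry.name to string; no explicit Dirent annotation.
  const entries = readdirSync(dir, { withFileTypes: true, encoding: "utf8" });
  return entries
    .filter((entry) => entry.isFile())
    .map((entry) => entry.name)
    .sort();
}

// Throwaway directory so the example runs anywhere.
const sketchDir = mkdtempSync(join(tmpdir(), "readdir-sketch-"));
writeFileSync(join(sketchDir, "b.txt"), "");
writeFileSync(join(sketchDir, "a.txt"), "");
const listedNames = listFileNames(sketchDir);
```

Because nothing here names the version-variant type, the same source should typecheck under both the older and newer `@types/node` shapes described above.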
- no-self-import: barrel self-import check uses join(resolved, 'index') not a hardcoded '/index' (Windows separator safety).
- configure/applyEnvEdits: preserve the file's existing line ending (CRLF vs LF) instead of forcing LF into a CRLF file (mixed-ending corruption).
- configure/applyScaffold: check exit codes of rename-project.sh + setup.sh and throw on failure, so no more silent half-configured scaffold.
- transport: abortable backoff sleep: a caller abort during a multi-second backoff rejects immediately instead of hanging until the timer fires.
- module-exports/readExportedNames: strip comments before parsing the export list, so a comment inside an export block doesn't drop valid exports.
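Of the fixes above, the line-ending one is easy to show in isolation. The sketch below detects the file's existing ending and reuses it when rewriting; `applyEnvEditsSketch` and its edit shape are assumptions for illustration and not the real `applyEnvEdits`.

```typescript
// Hypothetical sketch: apply KEY=VALUE edits while preserving CRLF vs LF.
function applyEnvEditsSketch(
  content: string,
  edits: Readonly<Record<string, string>>,
): string {
  // Reuse whatever ending the file already has instead of forcing LF.
  const eol = content.includes("\r\n") ? "\r\n" : "\n";
  const pending = new Map(Object.entries(edits));

  const lines = content.split(/\r?\n/).map((line) => {
    const eq = line.indexOf("=");
    if (eq <= 0 || line.trimStart().startsWith("#")) return line;
    const key = line.slice(0, eq).trim();
    const next = pending.get(key);
    if (next === undefined) return line;
    pending.delete(key);
    return `${key}=${next}`;
  });

  // Keep a trailing newline if the file had one, and append unseen keys before it.
  const hadTrailingNewline = content.endsWith("\n");
  if (hadTrailingNewline) lines.pop();
  for (const [key, value] of pending) lines.push(`${key}=${value}`);
  return lines.join(eol) + (hadTrailingNewline ? eol : "");
}

const crlfEnv = "STACK=dev\r\nAI_ENABLED=false\r\n";
const updatedCrlfEnv = applyEnvEditsSketch(crlfEnv, { AI_ENABLED: "true", NEW_KEY: "1" });
```

Splitting on `\r?\n` and rejoining with the detected ending is what avoids the mixed-ending corruption: every line of the output uses the same terminator as the input.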
- P1: --stack prod/smoke now drives the STACK env value (plan.ts: STACK resolves to answers.stack, not the dev-only field default; wizard.ts: STACK step defaults to the chosen stack). Was writing STACK= for prod and STACK=dev for smoke.
- P1: the cloned BoringStack manifest is now the source of truth: runScaffold clones using the bundled bootstrap (repo+ref), then reads <clone>/.tsforge/scaffold-manifest.json and plans/configures/records from THAT (falls back to bundled if absent). Honors --ref + manifest drift.
- P1: boot health-check failure now sets an error, so a timed-out poll exits non-zero instead of printing 'booted false' and exiting 0.
- P2: user-supplied required secrets are written: --set RESEND_API_KEY=… (and OAuth creds, etc.) now produce a secret env edit routed to the requiring field's env file, instead of being silently dropped (they're not manifest fields).
- P3: the shell-write guard skips option tokens after tee, so 'tee -a src/foo.ts' is caught instead of capturing the -a flag.

Tests added for each. Full suite green (1588 pass, 0 fail).
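The P3 guard fix can be illustrated with a small sketch that extracts the files a `tee` invocation writes to, skipping option tokens. `teeTargets` is a hypothetical helper, and the whitespace tokenisation is a deliberate simplification (no quoting or escaping), so treat it as an outline of the idea rather than the guard itself.

```typescript
// Hypothetical sketch: find the paths written by `tee` in a shell command.
function teeTargets(command: string): string[] {
  const tokens = command.trim().split(/\s+/);
  const targets: string[] = [];
  let afterTee = false;
  for (const token of tokens) {
    if (token === "tee") {
      afterTee = true;
      continue;
    }
    if (!afterTee) continue;
    // A pipe or command separator ends the tee argument list.
    if (token === "|" || token === ";" || token === "&&" || token === "||") {
      afterTee = false;
      continue;
    }
    // Skip option tokens such as -a so they are not mistaken for the target.
    if (token.startsWith("-")) continue;
    targets.push(token);
  }
  return targets;
}

const appendTargets = teeTargets("echo hi | tee -a src/foo.ts");
```

Here `appendTargets` contains only `src/foo.ts`; a guard that took the first token after `tee` would have captured `-a` and missed the write.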
… wire.ts (Codex round 2)

- P1: drop the pre-clone validation against the BUNDLED manifest: a config valid under a newer --ref was being rejected by stale bundled cross-rules. Validation now happens only post-clone against the repo's own manifest. And a cloned manifest that's present-but-malformed now THROWS (with cause) instead of silently falling back to the stale bundle. Tests: relaxed clone rules accept a bundled-invalid config; malformed clone manifest throws.
- P3: wire.ts dedupe key used a literal NUL byte, so git flagged the .ts as binary (diff/review blind). Replaced with the \u0000 escape, which is identical at runtime and textual in source. (salvage-toolcall tests unchanged, green.)
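For the P3 item, a short sketch shows why the escape is a safe substitute: `"\u0000"` produces the same one-character separator at runtime while keeping the source file plain text. The `dedupeKey` helper and its two fields are hypothetical; what wire.ts actually joins is not shown in this PR text.

```typescript
// NUL written as an escape: same runtime character, no raw NUL byte in the source.
const KEY_SEPARATOR = "\u0000";

// Hypothetical composite key: a separator that cannot appear in ordinary
// text keeps ("ab", "c") distinct from ("a", "bc").
function dedupeKey(name: string, args: string): string {
  return `${name}${KEY_SEPARATOR}${args}`;
}

const firstKey = dedupeKey("ab", "c");
const secondKey = dedupeKey("a", "bc");
```

Plain concatenation would map both pairs to `abc`; with the separator the two keys differ, so distinct entries are not collapsed.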
… run away

Root cause of the observed 190-turn / 2-gate-run loop: the Session loop runs the FULL gate (with the no-progress guards) only when the model YIELDS. A model stuck editing one file (read→edit_lines→read→edit_lines) never yields, so the gate never runs, the build error is never surfaced (the mid-work incremental check is TS-only, blind to CSS/build), and samePersist/gateNoProgress/net-progress never tick.

Fix: track edits since the last full gate; after FULL_GATE_EVERY (9) edits without a yield, force a full gate (gateAfterChurn). That surfaces the real error into the conversation AND advances the no-progress guards, so a genuine churn loop is stopped instead of running to the turn backstop. Extracted handleWorkingTurn to keep drive() lean.

Test: a never-yielding overwrite loop now forces a gate and goes stuck well before maxTurns (was: churns to the backstop, gate never runs).
…HORED files

F24 (from the saas-crm churn log): a model painted its own service file into a corner (stale oldString anchors + too-large edits) and thrashed ~20 turns because the edit:not-found message said 'Do NOT use create'. But a file the model authored this session CAN be fully rewritten via create (the overwrite escape hatch). Make the message touched-aware: for an authored file, offer 'create it again to fully rewrite'; for a scaffold-owned file, keep steering to edit. Test added.
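A touched-aware hint of this kind could be sketched as follows. The function name, the set of authored paths, and the exact message wording are assumptions for illustration; only the two-way branching reflects the description above.

```typescript
// Hypothetical sketch: choose the edit:not-found hint based on who owns the file.
function editNotFoundHint(
  path: string,
  authoredThisSession: ReadonlySet<string>,
): string {
  if (authoredThisSession.has(path)) {
    // The model wrote this file itself, so a full rewrite is a safe escape hatch.
    return `oldString not found in ${path}. You authored this file: create it again to fully rewrite it.`;
  }
  // Scaffold-owned file: keep steering toward a targeted edit.
  return `oldString not found in ${path}. Re-read the file and retry the edit. Do NOT use create.`;
}

const authoredPaths = new Set(["src/crm/crm.service.ts"]);
const authoredHint = editNotFoundHint("src/crm/crm.service.ts", authoredPaths);
const scaffoldHint = editNotFoundHint("package.json", authoredPaths);
```

Keeping the stricter wording for scaffold-owned files preserves the original protection while removing the dead end for files the model created.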
Follow-up to F22: the rule-doc steer only fires AFTER the no-`as` rule triggers, so a model whose branded-ID construction didn't trip it kept choosing branded/nominal IDs and got stuck on them (live: 'Module @/shared/shared.types has no exported member BrandId'). Promote the steer to the always-on strict-rules line (gateRulesSentence, in the build system prompt): for IDs use a plain alias (`type UserId = string`) validated at the boundary, NOT branded types (they need a forbidden `as` cast). Prevents the choice rather than only aiding recovery. Test added.
The F23 force-gate fired after FULL_GATE_EVERY edits even in a no-gate session,
where the empty gate trivially passes → send() returned {status:done} at turn 9,
before the model yielded its real response. Guard it on this.hasGate: with no gate
there's nothing to force (the churn guard exists to surface gate failures), so it's
a no-op. Regression test: a no-gate session creating 10 files runs to its yield
(responded, turns>9), not done at 9.
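The churn guard and its no-gate exemption can be summarised in a few lines. This is a sketch under assumptions: `createChurnTracker` and its method names are hypothetical, and only the constant `FULL_GATE_EVERY = 9` and the `hasGate` condition come from the text above.

```typescript
const FULL_GATE_EVERY = 9;

// Hypothetical tracker: counts edits since the last full gate run.
function createChurnTracker(hasGate: boolean) {
  let editsSinceGate = 0;
  return {
    // Returns true when a full gate should be forced after this edit.
    recordEdit(): boolean {
      // With no gate configured there is nothing to force, so never trigger.
      if (!hasGate) return false;
      editsSinceGate += 1;
      if (editsSinceGate < FULL_GATE_EVERY) return false;
      editsSinceGate = 0;
      return true;
    },
    // A yield runs the full gate anyway, so the counter starts over.
    recordYield(): void {
      editsSinceGate = 0;
    },
  };
}

// Simulate ten consecutive edits with and without a gate.
function forcedGateTurns(hasGate: boolean): number[] {
  const tracker = createChurnTracker(hasGate);
  const forcedAt: number[] = [];
  for (let edit = 1; edit <= 10; edit += 1) {
    if (tracker.recordEdit()) forcedAt.push(edit);
  }
  return forcedAt;
}

const gatedRun = forcedGateTurns(true);
const ungatedRun = forcedGateTurns(false);
```

In the gated run the force fires on the ninth edit; in the no-gate run it never fires, which matches the regression described above where a ten-file session must run on to its real yield.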
Two changesets (12 commits). The headline is the BoringStack greenfield scaffolder; bundled with it are the validated overnight harness findings (kept together per request).
BoringStack scaffolder (the new work)
Replaces the synthetic web scaffolder (`web-templates.ts`) with a wizard that stands up BoringStack by driving its own scripts. tsforge holds no stack knowledge; the entire config surface lives in a manifest committed in the BoringStack repo (PRs boringstack#224 merged, #225 open).

- `answersToPlan` (container topology, conditional-required secrets incl. `requiresSecretsWhen`, cross-rules, per-file env routing via `envFileByGroup` + `${STACK}`), `unknown`-narrowing manifest parser, completeness alarm (fails on any unmodelled `WITH_*`/`*_ENABLED` toggle; validated against a snapshot of BoringStack's real `.env.example`), wizard step generation, `.env` apply with secret redaction.
- `cloneRepo` (shallow clone + replay-sha record), `applyScaffold` (rename-project.sh → setup.sh → per-file env writes → generated secrets), `bootStack` (`setup.sh --up` + health poll), `runScaffold` orchestrator + gate-command composition.
- `parseScaffoldArgs`, headless `headless-scaffold-build.ts`, bundled-manifest loader (drift-guarded), consequence `scaffoldPreview`, and the interactive `tsforge scaffold` subcommand.
- `--web` page marked superseded.

The old web scaffolder is not removed yet (Phase C): it still backs the eval paths; retirement follows once those migrate.
Overnight harness findings
Validated fixes from local-model evals (`aeon-qwen3.6-27b`), each tightening or de-frictioning the gate without weakening strictness:

- … (757903d): test-exclude in the web type-aware lint/tsc overlay (no tsconfig rabbit-hole), net-progress + same-persist loop guards, fuzzy-edit/self-import path fixes, inference robustness, rule-pack refinements.
- … (5c03e16): steer the model off branded/nominal IDs under `consistent-type-assertions: never` (they can't be constructed without the forbidden cast → ~25-turn thrash). Pure friction reduction.
- … (f56f6a3) + made the suite fully green by fixing two env-dependent oracle-test flakes (7cfcc2c).

Status

Suite green (1564 pass, 0 fail). Types + lint clean except pre-existing `web-fetch.ts` (undeclared optional jsdom/readability/turndown; a dependency decision left to you).