feat: BoringStack greenfield scaffolder + overnight harness findings #48

Merged
agjs merged 21 commits into main from feat/scaffold-io-layer on Jun 24, 2026

Conversation

@agjs agjs commented Jun 24, 2026

Owner

Two changesets (12 commits). The headline is the BoringStack greenfield scaffolder; bundled with it are the validated overnight harness findings (kept together per request).

BoringStack scaffolder (the new work)

Replaces the synthetic web scaffolder (web-templates.ts) with a wizard that stands up BoringStack by driving its own scripts — tsforge holds no stack knowledge; the entire config surface lives in a manifest committed in the BoringStack repo (PRs boringstack#224 merged, #225 open).

  • Pure core: answersToPlan (container topology, conditional-required secrets incl. requiresSecretsWhen, cross-rules, per-file env routing via envFileByGroup + ${STACK}), unknown-narrowing manifest parser, completeness alarm (fails on any unmodelled WITH_*/*_ENABLED toggle — validated against a snapshot of BoringStack's real .env.example), wizard step generation, .env apply with secret redaction.
  • I/O layer (injected runner/fs/poller, fully faked in tests): cloneRepo (shallow clone + replay-sha record), applyScaffold (rename-project.sh → setup.sh → per-file env writes → generated secrets), bootStack (setup.sh --up + health poll), runScaffold orchestrator + gate-command composition.
  • Entry points: parseScaffoldArgs, the headless entry headless-scaffold-build.ts, the bundled-manifest loader (drift-guarded), the consequence preview scaffoldPreview, and the interactive tsforge scaffold subcommand.
  • Validated end-to-end against a real BoringStack clone (gated E2E + manual). ~97 scaffold tests.
  • Docs: new "Greenfield scaffolding" page + commands reference; legacy --web page marked superseded.

The old web scaffolder is not removed yet (Phase C) — it still backs the eval paths; retirement follows once those migrate.

Overnight harness findings

Validated fixes from local-model evals (aeon-qwen3.6-27b), each tightening or de-frictioning the gate without weakening strictness:

  • F2–F21 (757903d): test-exclude in the web type-aware lint/tsc overlay (no tsconfig rabbit-hole), net-progress + same-persist loop guards, fuzzy-edit/self-import path fixes, inference robustness, rule-pack refinements.
  • F22 (5c03e16): steer the model off branded/nominal IDs under consistent-type-assertions: never (they can't be constructed without the forbidden cast → ~25-turn thrash). Pure friction reduction.
  • Lint cleanup (f56f6a3) + made the suite fully green by fixing two env-dependent oracle-test flakes (7cfcc2c).

Status

Suite green (1564 pass, 0 fail). Types + lint clean except pre-existing web-fetch.ts (undeclared optional jsdom/readability/turndown — a dependency decision left to you).

agjs added 11 commits June 24, 2026 10:46
…27b evals

Validated harness fixes surfaced by running local-model evals (see memory
tsforge-eval-findings-2026-06 / autonomous-overnight-run). Each tightens or
de-frictions the gate without weakening strictness:

- detect-gate/web-gate: force-exclude *.test.ts from type-aware lint + tsc overlay
  + buildWebTscCheck (cwd-threaded) so a test sibling can't send the model into
  rewriting tsconfig.json; thread cwd through buildWebTypeGate/buildWebGate.
- loop: net-progress guard + high turn ceiling in turn.ts/settleGate; same-persist
  + gate-stuck guards; readonly-spin guard; loop.constants tuning.
- files/loop tools: edit-hashline + file-ops + write-guard fuzzy-edit + self-import
  (join-not-resolve relative-path) fixes; salvage-toolcall hardening.
- inference: transport/stream/wire robustness; openai-compatible.
- rule-packs: no-jsx-computation relaxed on idiomatic inline code (warn, not error);
  test-sibling-required + typescript-core/react-component-architecture refinements.
- policy: patterns + evaluation refinements.

All with accompanying tests; suite green (1469 pass, 2 pre-existing flakes:
transport model-endpoint-unreachable + boot-oracle pollUntilReady).

Manifest-driven greenfield scaffolder that stands up boringstack (or its Astro
static site) by driving boringstack's OWN scripts — tsforge holds no stack
knowledge, it reads .tsforge/scaffold-manifest.json from the clone.

Pure core (unit-tested): plan (topology / conditional-required secrets with
requiresSecretsWhen / cross-rules / env edits routed per-file via envFileByGroup +
${STACK}), manifest parser (unknown-narrowing, no casts), completeness alarm
(fails on any unmodelled WITH_*/*_ENABLED toggle; validated against a snapshot of
boringstack's real .env.example), wizard step generation, .env apply with secret
redaction.

I/O layer (injected runner/fs/poller, fully faked in tests): clone at ref + record
replay sha, configure (rename-project.sh → setup.sh → per-file env writes →
generated secrets), boot (setup.sh --up + health poll), run-scaffold orchestrator
+ gate-command composition. 57 scaffold tests, lint + types clean.
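The completeness alarm described above can be read as a set difference: every WITH_*/*_ENABLED key found in .env.example must also be modelled by the manifest. A minimal sketch of that idea, assuming the modelled keys are available as a flat list; `findUnmodelledToggles` and its parameters are hypothetical names, not the functions in this PR.

```typescript
// Hypothetical sketch: the real manifest shape and function names are not shown in this PR.
const TOGGLE_KEY = /^(WITH_[A-Z0-9_]+|[A-Z0-9_]+_ENABLED)$/;

/** Returns toggle keys present in the .env.example text but absent from the modelled set. */
function findUnmodelledToggles(
  envExample: string,
  modelledKeys: readonly string[],
): string[] {
  const modelled = new Set(modelledKeys);
  const unmodelled: string[] = [];
  for (const rawLine of envExample.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Assumption: blank lines and commented-out entries are ignored.
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    if (TOGGLE_KEY.test(key) && !modelled.has(key) && !unmodelled.includes(key)) {
      unmodelled.push(key);
    }
  }
  return unmodelled;
}
```

A test over a snapshot of the real .env.example could then assert that the returned list is empty, so a toggle added upstream without a manifest entry fails loudly.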
…checkout

Opt-in (TSFORGE_SCAFFOLD_E2E=1) + only when a local clone exists. Proves the I/O
layer end-to-end against the real repo: shallow clone, sha resolve, BoringStack's
own setup.sh bootstrap, per-file env writes seeded from .example. Skips in CI.

The shared seam for the headless eval driver + (later) the interactive CLI:
--archetype/--stack/--dest/--set KEY=VALUE/--multi KEY=a,b/--ref/--no-boot →
IScaffoldAnswers, validated. 9 tests.
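The flag vocabulary listed above could be parsed roughly as follows. This is a sketch only: the real IScaffoldAnswers shape is not shown in this PR, so `ScaffoldAnswersSketch` and the helper names are hypothetical, and the manifest-based validation step is omitted.

```typescript
// Hypothetical answer shape; the real IScaffoldAnswers may differ.
interface ScaffoldAnswersSketch {
  archetype?: string;
  stack?: string;
  dest?: string;
  ref?: string;
  boot: boolean;
  values: Record<string, string>;
  multi: Record<string, string[]>;
}

function need(flag: string, value: string | undefined): string {
  if (value === undefined || value.startsWith("--")) {
    throw new Error(`${flag} expects a value`);
  }
  return value;
}

function splitPair(flag: string, value: string | undefined): [string, string] {
  const raw = need(flag, value);
  const eq = raw.indexOf("=");
  if (eq <= 0) throw new Error(`${flag} expects KEY=VALUE, got "${raw}"`);
  return [raw.slice(0, eq), raw.slice(eq + 1)];
}

function parseScaffoldArgsSketch(argv: readonly string[]): ScaffoldAnswersSketch {
  const answers: ScaffoldAnswersSketch = { boot: true, values: {}, multi: {} };
  for (let i = 0; i < argv.length; i++) {
    const flag = argv[i];
    switch (flag) {
      case "--archetype": answers.archetype = need(flag, argv[++i]); break;
      case "--stack": answers.stack = need(flag, argv[++i]); break;
      case "--dest": answers.dest = need(flag, argv[++i]); break;
      case "--ref": answers.ref = need(flag, argv[++i]); break;
      case "--no-boot": answers.boot = false; break;
      case "--set": {
        const [key, value] = splitPair(flag, argv[++i]);
        answers.values[key] = value;
        break;
      }
      case "--multi": {
        // --multi KEY=a,b becomes a list of selected options.
        const [key, value] = splitPair(flag, argv[++i]);
        answers.multi[key] = value.split(",").filter((v) => v !== "");
        break;
      }
      default:
        throw new Error(`unknown argument: ${String(flag)}`);
    }
  }
  return answers;
}
```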
- loadBundledManifest(): the bootstrap manifest shipped in src/scaffold (mirrors
  boringstack's .tsforge source-of-truth; drift-guarded against the test fixture).
  defaultRef aligned to 'main' (tsforge records the resolved sha per scaffold).
- headless-scaffold-build.ts: non-interactive entry (parseScaffoldArgs → runScaffold)
  that clones, configures via boringstack's own scripts, optionally boots, and prints
  the handoff (gateCwd + composed gateCommand). BORINGSTACK_REPO env override for
  dev/E2E. Validated end-to-end against a local clone (clone→setup.sh→env writes→
  replay record). Model-driven build loop is the documented next step (CLI).

68 scaffold tests + 1 gated E2E, lint + types clean.
…olations)

scaffoldPreview(manifest, answers) → the wizard overview / headless dry-run text:
N-services topology (the 5-vs-20 cost), the conditional-required secret keys the
user must supply (never their values), and any blocking cross-rule violation.
Pure, derived from answersToPlan. 5 tests.

runScaffoldCommand: archetype/stack/dest from flags, config toggles via the
interactive wizard (live topology/secrets/violations preview on the overview),
--set/--multi flags override; off-TTY or Astro skips the wizard. Routed as a
'tsforge scaffold' subcommand so its arg vocabulary doesn't collide with the
harness flags. Prints the handoff: the exact 'tsforge --dir … --accept …' command
to build. Validated end-to-end against a local clone. mergeAnswerValues + preview
unit-tested; TTY glue degrades gracefully off-TTY.

Data-driven coverage of every infra toggle's exact service delta (both directions
+ sole-ownership), WUD prod-only behavior, each email provider's exact secret set
(and exclusion of the others'), each OAuth provider's own credential pair, and that
tracing/cache choices don't false-alarm. 20 tests.

The F2–F21 findings landed with lint debt (they were committed from an uncommitted
working tree). Behavior-preserving fixes:
- edit.ts: exotic-space char class → \u escapes (was literal irregular whitespace)
- syntax-check.ts: parseDiagnostics read via isRecord/isArray guards (drop the
  as-unknown-as double cast — house rule: no casts)
- transport.ts: heartbeat console.error → process.stderr.write (no-console)
- wire.ts: drop dead '?? 0' on a matchAll .index (always defined)
- prettier/formatting + no-lonely-if/prefer-includes auto-fixes across the findings
  src + test files.

Only remaining lint failure is pre-existing web-fetch.ts (lazy-imports jsdom/
readability/turndown — undeclared optional deps, not installed here). Suite green
(1564 pass).
… fully green)

Two long-standing env-dependent flakes, now resolved:
- boot-oracle 'times out when nothing answers': injected a fetchFn into
  pollUntilReady (matching the now/sleep injection) so the timeout path runs on
  the fake clock with NO real network. A real fetch to a dead URL could hang up to
  its 2s abort per poll (×4 > the 5s budget) when the host drops the connection.
- proptest-oracle 'FAILS/PASSES' subprocess tests: gate on fast-check being
  resolvable (it's an opt-in dep the oracle skips when absent — 'install it to
  enable'), the same way browser-oracle gates on chromium. Without it the oracle
  no-ops (exit 0) and the assertions passed/failed for the wrong reason.

Suite: 1564 pass, 0 fail (was 2 failing).
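The boot-oracle fix above relies on injecting every source of time and I/O so the timeout path runs without a real network. A minimal sketch of that pattern, assuming a simplified signature; `pollUntilReadySketch`, `PollDeps` and `makeFakeClock` are illustrative names, not the code in this PR.

```typescript
// Hypothetical dependency bundle: clock, sleep and fetch are all injected.
interface PollDeps {
  now: () => number;
  sleep: (ms: number) => Promise<void>;
  fetchFn: (url: string) => Promise<{ ok: boolean }>;
}

/** Polls url until it answers ok or the budget elapses; resolves to whether it became ready. */
async function pollUntilReadySketch(
  url: string,
  budgetMs: number,
  intervalMs: number,
  deps: PollDeps,
): Promise<boolean> {
  const deadline = deps.now() + budgetMs;
  while (deps.now() < deadline) {
    try {
      const res = await deps.fetchFn(url);
      if (res.ok) return true;
    } catch {
      // Connection refused or dropped: treat as "not ready yet" and keep polling.
    }
    await deps.sleep(intervalMs);
  }
  return false;
}

/** Fake clock for tests: sleeping advances virtual time instantly. */
function makeFakeClock(): Pick<PollDeps, "now" | "sleep"> {
  let t = 0;
  return {
    now: () => t,
    sleep: async (ms: number) => {
      t += ms;
    },
  };
}
```

With a fake `fetchFn` that always rejects, a 5000 ms budget at a 1000 ms interval completes immediately in real time, because only the virtual clock advances.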
… rule

Live eval finding (saas-crm build, aeon-qwen3.6-27b): with consistent-type-assertions
'never' banning all `as`, the model reaches for branded/nominal IDs
(`string & { _brand }`), which CANNOT be constructed without a cast — so it thrashes
for many turns (observed ~25 turns churning on branded IDs + the downstream
noUncheckedIndexedAccess 'T | undefined' cascade), bouncing between rule violations
and TS2322s, sometimes abandoning to plain strings (losing safety).

The harness's philosophy is runtime validation at boundaries, not compile-time casts
(see no-unsafe-boundary-cast), so branded IDs are off-pattern anyway. Enriched the
consistent-type-assertions rule doc (what/bad/good) to say so explicitly: use a plain
`type UserId = string` + validate at the boundary, don't reach for branded types.
Pure friction reduction — no strictness change. 14 feedback tests green.
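The pattern the rule doc steers toward can be illustrated like this. A minimal sketch: the non-empty-string check is an assumed validation rule and `parseUserId` is a hypothetical helper name.

```typescript
// Branded form, shown only as a comment: constructing a value needs the forbidden cast.
//   type UserId = string & { _brand: "UserId" };
//   const id = raw as UserId; // rejected under consistent-type-assertions: never

// Plain alias: no cast is needed anywhere.
type UserId = string;

/** Validates untrusted input at the boundary and returns it as a UserId. */
function parseUserId(input: unknown): UserId {
  // Assumption: a valid id is any non-empty string; real rules would be stricter.
  if (typeof input !== "string" || input.trim() === "") {
    throw new Error("UserId must be a non-empty string");
  }
  return input; // narrowed to string by the typeof check, so no `as` is required
}
```

The trade-off is that a `UserId` is interchangeable with any other string at compile time; the safety comes from validating once where data enters the system.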
Comment thread packages/core/src/scaffold/scaffold-manifest.json Fixed
Comment thread packages/core/tests/fixtures/scaffold/scaffold-manifest.json Fixed

New 'Greenfield scaffolding' page (scaffold/boringstack.mdx): the two archetypes,
when it runs (tsforge scaffold, greenfield only), the clone→configure→boot→handoff
flow, all flags, the manifest-as-source-of-truth model + completeness alarm, and how
to test it (unit / gated real-clone E2E / headless end-to-end). Added to the sidebar
and the commands reference. Marked the legacy web scaffolder page (--web) as
superseded/being-retired.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a greenfield scaffolding wizard (tsforge scaffold) to stand up new projects from the BoringStack template or an Astro static site, driven by a declarative manifest. It also adds robust features like a syntax-regression guard for line edits, a net-progress convergence guard, a no-self-import ESLint rule, and improved connection retry budgets. Review feedback identifies several critical issues to address, including Windows path and CRLF line-ending compatibility, a potential process hang from a non-abortable backoff sleep, regex parsing vulnerability to comments in module-exports.ts, and missing exit code checks for the scaffolding scripts.
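The non-abortable backoff sleep flagged in this review is a common hazard: a plain timer-based sleep ignores caller cancellation until the timer fires. A minimal sketch of an abortable variant using the standard AbortSignal API; `abortableSleep` is a hypothetical name and this is not the transport.ts implementation.

```typescript
/** Sleeps for ms, but rejects as soon as the optional signal is aborted. */
function abortableSleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error("backoff aborted"));
      return;
    }
    const timer = setTimeout(() => {
      // Timer won: detach the listener so it cannot leak.
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    function onAbort(): void {
      // Abort won: cancel the timer so it cannot keep the process alive.
      clearTimeout(timer);
      reject(new Error("backoff aborted"));
    }
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}
```

Clearing the timer on abort matters as much as rejecting: an uncleared multi-second timer would still delay process exit even after the caller has given up.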

Comment thread packages/core/src/rule-packs/typescript-core/rules/no-self-import.ts Outdated
Comment thread packages/core/src/scaffold/configure.ts
Comment thread packages/core/src/inference/transport.ts Outdated
Comment thread packages/core/src/files/module-exports.ts
Comment thread packages/core/src/scaffold/configure.ts Outdated
agjs added 9 commits June 24, 2026 12:08
…tleaks

- RULES.md + rule-docs.generated.json regenerated: the F2–F21 findings added 4 rules
  (no-self-import, max-hooks-per-file, no-anonymous-useEffect, no-derived-state-in-effect)
  and tonight's F22 rule-doc edit, but the generated catalog wasn't rebuilt — the CI
  'rules:build + git diff RULES.md' drift check (and the docs linkcheck that depends on
  RULES.md) failed. Now idempotent.
- .gitleaks.toml: allowlist the scaffold manifest paths. gitleaks' generic-api-key
  heuristic flagged `requiresSecretsWhen: "AI_ENABLED=true"` because the field name
  contains 'Secret'. The manifest names required-secret KEYS + gating tokens, never
  real values — false positive.
The explicit `ReturnType<typeof readdirSync>` annotation resolved to the
Buffer-named Dirent overload under CI's @types/node (Dirent<NonSharedBuffer>),
failing typecheck with TS2322/TS2345 on entry.name — though it passed on the older
local @types/node. Pin name type to string via `encoding: "utf8"` and let the
return type infer (never naming the version-variant generic `Dirent`), matching the
inferred-type pattern the repo's other readdir calls already use. 8 tests green.
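The inferred-type pattern described above looks roughly like this. A sketch assuming a Node.js environment; `listFileNames` is a hypothetical helper, not the function touched by this commit.

```typescript
import { readdirSync } from "node:fs";

/** Lists the names of regular files directly inside dir. */
function listFileNames(dir: string): string[] {
  // No explicit annotation on entries: passing encoding "utf8" selects the
  // string-named overload, and the element type is left to inference.
  const entries = readdirSync(dir, { withFileTypes: true, encoding: "utf8" });
  return entries.filter((entry) => entry.isFile()).map((entry) => entry.name);
}
```

Because the `Dirent` generic is never named, the code does not depend on which @types/node version defines it.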
- no-self-import: barrel self-import check uses join(resolved, 'index') not a
  hardcoded '/index' (Windows separator safety).
- configure/applyEnvEdits: preserve the file's existing line ending (CRLF vs LF)
  instead of forcing LF into a CRLF file (mixed-ending corruption).
- configure/applyScaffold: check exit codes of rename-project.sh + setup.sh and
  throw on failure — no more silent half-configured scaffold.
- transport: abortable backoff sleep — a caller abort during a multi-second backoff
  rejects immediately instead of hanging until the timer fires.
- module-exports/readExportedNames: strip comments before parsing the export list,
  so a comment inside an export block doesn't drop valid exports.
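The line-ending fix in the list above amounts to detecting the file's existing convention before rewriting it. A minimal sketch, assuming edits arrive as a flat key/value record; `applyEnvEditsSketch` is a hypothetical name, and treating any file that contains CRLF as a CRLF file is an assumption of this sketch.

```typescript
/** Assumption: a file containing any CRLF is treated as a CRLF file. */
function detectEol(text: string): "\r\n" | "\n" {
  return text.includes("\r\n") ? "\r\n" : "\n";
}

/** Rewrites KEY=value lines in place, appends missing keys, keeps the original EOL. */
function applyEnvEditsSketch(
  text: string,
  edits: Readonly<Record<string, string>>,
): string {
  const eol = detectEol(text);
  const pending = new Map(Object.entries(edits));
  const lines = text.split(/\r?\n/).map((line) => {
    const eq = line.indexOf("=");
    const key = eq > 0 ? line.slice(0, eq).trim() : "";
    const next = pending.get(key);
    if (next === undefined) return line; // untouched line, comment, or blank
    pending.delete(key);
    return `${key}=${next}`;
  });
  // A trailing newline shows up as a final empty element after split.
  const hadTrailingEol = lines.length > 1 && lines[lines.length - 1] === "";
  if (hadTrailingEol) lines.pop();
  for (const [key, value] of pending) lines.push(`${key}=${value}`);
  return lines.join(eol) + (hadTrailingEol ? eol : "");
}
```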
P1 — --stack prod/smoke now drives the STACK env value (plan.ts: STACK resolves to
  answers.stack, not the dev-only field default; wizard.ts: STACK step defaults to
  the chosen stack). Was writing STACK= for prod and STACK=dev for smoke.
P1 — the cloned BoringStack manifest is now the source of truth: runScaffold clones
  using the bundled bootstrap (repo+ref), then reads <clone>/.tsforge/
  scaffold-manifest.json and plans/configures/records from THAT (falls back to
  bundled if absent). Honors --ref + manifest drift.
P1 — boot health-check failure now sets an error, so a timed-out poll exits non-zero
  instead of printing 'booted false' and exiting 0.
P2 — user-supplied required secrets are written: --set RESEND_API_KEY=… (and OAuth
  creds, etc.) now produce a secret env edit routed to the requiring field's env
  file, instead of being silently dropped (they're not manifest fields).
P3 — the shell-write guard skips option tokens after tee, so 'tee -a src/foo.ts'
  is caught instead of capturing the -a flag.

Tests added for each. Full suite green (1588 pass, 0 fail).
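The P3 tee fix above can be pictured as skipping option tokens before collecting file operands. A sketch under the assumption that the command is already split into whitespace tokens; `teeWriteTargets` is a hypothetical name and the real guard's tokenizer is not shown in this PR.

```typescript
/** Returns the file operands of the first `tee` in a tokenized shell command. */
function teeWriteTargets(tokens: readonly string[]): string[] {
  const targets: string[] = [];
  const start = tokens.indexOf("tee");
  if (start === -1) return targets;
  let optionsEnded = false;
  for (const token of tokens.slice(start + 1)) {
    // Stop at the next shell operator: later tokens belong to another command.
    if (token === "|" || token === ";" || token === "&&" || token === "||") break;
    if (!optionsEnded && token === "--") {
      optionsEnded = true; // everything after `--` is an operand
      continue;
    }
    if (!optionsEnded && token.startsWith("-")) continue; // option such as -a or --append
    targets.push(token);
  }
  return targets;
}
```

For the command in the commit message, the `-a` flag is skipped and `src/foo.ts` is reported as the write target.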
… wire.ts (Codex round 2)

P1 — drop the pre-clone validation against the BUNDLED manifest: a config valid
  under a newer --ref was being rejected by stale bundled cross-rules. Validation
  now happens only post-clone against the repo's own manifest. And a cloned manifest
  that's present-but-malformed now THROWS (with cause) instead of silently falling
  back to the stale bundle. Tests: relaxed clone rules accept a bundled-invalid
  config; malformed clone manifest throws.
P3 — wire.ts dedupe key used a literal NUL byte, so git flagged the .ts as binary
  (diff/review blind). Replaced with the \u0000 escape — identical at runtime,
  textual in source. (salvage-toolcall tests unchanged, green.)
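The two manifest rules from these review rounds (fall back to the bundle only when the cloned manifest is absent, throw with a cause when it is present but malformed) can be sketched as below. `ManifestSketch`, `ManifestFs` and `resolveManifestSketch` are hypothetical; the real manifest has many more fields than the single `defaultRef` used here.

```typescript
import { join } from "node:path";

// Hypothetical minimal manifest; only defaultRef is taken from the PR text.
interface ManifestSketch {
  defaultRef: string;
}

// Injected file access so the logic can be tested without touching disk.
interface ManifestFs {
  exists: (path: string) => boolean;
  readText: (path: string) => string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

/** Narrows unknown JSON to the manifest shape without casts. */
function parseManifestSketch(raw: unknown): ManifestSketch {
  if (!isRecord(raw)) throw new Error("manifest must be an object");
  const ref = raw["defaultRef"];
  if (typeof ref !== "string") throw new Error("manifest.defaultRef must be a string");
  return { defaultRef: ref };
}

function resolveManifestSketch(
  cloneDir: string,
  bundled: ManifestSketch,
  fs: ManifestFs,
): ManifestSketch {
  const path = join(cloneDir, ".tsforge", "scaffold-manifest.json");
  if (!fs.exists(path)) return bundled; // absent: the bundled bootstrap is the fallback
  try {
    return parseManifestSketch(JSON.parse(fs.readText(path)));
  } catch (cause) {
    // Present but malformed: fail loudly instead of using the stale bundle.
    throw new Error(`malformed scaffold manifest at ${path}`, { cause });
  }
}
```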
… run away

Root cause of the observed 190-turn / 2-gate-run loop: the Session loop runs the
FULL gate (with the no-progress guards) only when the model YIELDS. A model stuck
editing one file (read→edit_lines→read→edit_lines) never yields, so the gate never
runs, the build error is never surfaced (the mid-work incremental check is TS-only,
blind to CSS/build), and samePersist/gateNoProgress/net-progress never tick.

Fix: track edits since the last full gate; after FULL_GATE_EVERY (9) edits without
a yield, force a full gate (gateAfterChurn). That surfaces the real error into the
conversation AND advances the no-progress guards, so a genuine churn loop is
stopped instead of running to the turn backstop. Extracted handleWorkingTurn to
keep drive() lean. Test: a never-yielding overwrite loop now forces a gate and goes
stuck well before maxTurns (was: churns to the backstop, gate never runs).
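The forced-gate mechanism described above is essentially an edit counter that resets whenever a full gate runs. A minimal sketch: FULL_GATE_EVERY and its value 9 come from the commit message, while `ChurnTrackerSketch` and its method names are hypothetical and not the Session loop's actual structure.

```typescript
const FULL_GATE_EVERY = 9; // value quoted in the commit message

/** Counts edits since the last full gate and signals when one should be forced. */
class ChurnTrackerSketch {
  private editsSinceGate = 0;

  /** Records one edit; returns true when a full gate should be forced now. */
  recordEdit(): boolean {
    this.editsSinceGate += 1;
    if (this.editsSinceGate < FULL_GATE_EVERY) return false;
    this.editsSinceGate = 0;
    return true;
  }

  /** Called when the model yields and the full gate runs on its own. */
  recordGateRun(): void {
    this.editsSinceGate = 0;
  }
}
```

A model that yields regularly never reaches the threshold, so the forced gate only affects sessions that keep editing without yielding.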
…HORED files

F24 (from the saas-crm churn log): a model painted its own service file into a
corner — stale oldString anchors + too-large edits — and thrashed ~20 turns because
the edit:not-found message said 'Do NOT use create'. But a file the model authored
this session CAN be fully rewritten via create (the overwrite escape hatch). Make
the message touched-aware: for an authored file, offer 'create it again to fully
rewrite'; for a scaffold-owned file, keep steering to edit. Test added.

Follow-up to F22: the rule-doc steer only fires AFTER the no-`as` rule triggers,
so a model whose branded-ID construction didn't trip it kept choosing branded/
nominal IDs and stuck on them (live: 'Module @/shared/shared.types has no exported
member BrandId'). Promote the steer to the always-on strict-rules line
(gateRulesSentence, in the build system prompt): for IDs use a plain alias
(`type UserId = string`) validated at the boundary, NOT branded types (they need a
forbidden `as` cast). Prevents the choice rather than only aiding recovery. Test added.

The F23 force-gate fired after FULL_GATE_EVERY edits even in a no-gate session,
where the empty gate trivially passes → send() returned {status:done} at turn 9,
before the model yielded its real response. Guard it on this.hasGate: with no gate
there's nothing to force (the churn guard exists to surface gate failures), so it's
a no-op. Regression test: a no-gate session creating 10 files runs to its yield
(responded, turns>9), not done at 9.
@agjs agjs merged commit 144ad3a into main Jun 24, 2026
8 checks passed
@agjs agjs deleted the feat/scaffold-io-layer branch June 24, 2026 12:22
2 participants