[codex] Gate docs changes per task #1

Draft
jahooma wants to merge 8 commits into main from fix-docs-writer-agent-pr

jahooma commented Apr 9, 2026

What changed

This refactors evalbuff's docs-selection loop to gate each candidate docs change against the originating carving task instead of accepting all docs updates.
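A minimal sketch of what a per-task gate decision could look like, assuming each candidate is scored by rerunning or rejudging its originating task with and without that one docs change, and is accepted only if the score delta clears a threshold. `GateDecision`, `gateDocsCandidate`, and `minDelta` are hypothetical names for illustration, not evalbuff's actual API.

```typescript
// Hypothetical shape of the gating decision for one candidate docs change.
interface GateDecision {
  candidateId: string;
  baselineScore: number; // originating task's score without the docs change
  candidateScore: number; // same task's score with only this docs change applied
  delta: number;
  accepted: boolean;
}

// Accept a candidate only if it raises the originating task's score by more
// than `minDelta` (an assumed, tunable threshold; 0 means "any improvement").
function gateDocsCandidate(
  candidateId: string,
  baselineScore: number,
  candidateScore: number,
  minDelta = 0,
): GateDecision {
  const delta = candidateScore - baselineScore;
  return {
    candidateId,
    baselineScore,
    candidateScore,
    delta,
    accepted: delta > minDelta,
  };
}
```

Keeping the baseline and candidate scores in the decision record, rather than just a boolean, makes it easy to log why a candidate was kept or dropped.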

It also moves docs planning to a single per-task docs-writer pass that:

  • reads the docs corpus once
  • rejects overfit or low-value suggestions up front
  • creates one independent docs-only commit per accepted candidate
  • records accepted/rejected decisions in a manifest
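The single planning pass and its manifest could be modeled roughly as below. This is a sketch under assumptions: the caller loads the docs corpus once as a path-to-text map, the overfit/low-value judgment is stood in for by a caller-supplied `rejectReason` function, and `DocsSuggestion`, `ManifestEntry`, and `planDocsChanges` are hypothetical names rather than evalbuff's real types.

```typescript
// Hypothetical input: one docs suggestion produced while analyzing a task.
interface DocsSuggestion {
  id: string;
  path: string; // docs file the suggestion would edit
  summary: string;
}

// Hypothetical manifest entry recording the writer's decision for one suggestion.
type ManifestEntry =
  | { id: string; status: "accepted"; path: string; commitMessage: string }
  | { id: string; status: "rejected"; reason: string };

// Returns a rejection reason (e.g. overfit or low-value), or undefined to accept.
type RejectFn = (
  s: DocsSuggestion,
  corpus: ReadonlyMap<string, string>,
) => string | undefined;

// Decide on every suggestion for one task in a single pass. The docs corpus
// is loaded once by the caller and shared across all decisions.
function planDocsChanges(
  taskId: string,
  corpus: ReadonlyMap<string, string>,
  suggestions: DocsSuggestion[],
  rejectReason: RejectFn,
): ManifestEntry[] {
  return suggestions.map((s): ManifestEntry => {
    const reason = rejectReason(s, corpus);
    if (reason !== undefined) {
      return { id: s.id, status: "rejected", reason };
    }
    // One docs-only commit per accepted candidate keeps each one independently
    // applicable; only the planned commit message is recorded here.
    return {
      id: s.id,
      status: "accepted",
      path: s.path,
      commitMessage: `docs(${taskId}): ${s.summary}`,
    };
  });
}
```

Recording rejections alongside acceptances, with a reason, is what lets the manifest explain afterwards why a suggestion never became a candidate.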

The branch also adds Bun env preloading so that live tests pick up .env.local automatically, plus new unit tests and real-agent E2E coverage for the docs writer.
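One way to implement this kind of preload is a small module that Bun loads before tests, for example via a `preload` entry under `[test]` in `bunfig.toml`. The sketch below is an assumption, not the branch's actual code: `parseEnv` and `loadEnvFiles` are hypothetical names, only simple `KEY=value` lines are handled, `.env.local` takes precedence over `.env`, and variables already set in the environment are never overwritten.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Parse simple KEY=value lines; blank lines and # comments are skipped, and
// one pair of matching surrounding quotes is stripped from the value.
export function parseEnv(text: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no "=" or an empty key
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    if (
      value.length >= 2 &&
      (value[0] === '"' || value[0] === "'") &&
      value[value.length - 1] === value[0]
    ) {
      value = value.slice(1, -1);
    }
    vars[key] = value;
  }
  return vars;
}

// Load files in precedence order (earlier files win) without clobbering
// values that are already present in `env`. Missing files are skipped.
export function loadEnvFiles(
  files: string[],
  env: Record<string, string | undefined> = process.env,
): void {
  for (const file of files) {
    if (!existsSync(file)) continue;
    const vars = parseEnv(readFileSync(file, "utf8"));
    for (const [key, value] of Object.entries(vars)) {
      if (env[key] === undefined) env[key] = value;
    }
  }
}

// Hypothetical preload entry point: .env.local first so it overrides .env.
loadEnvFiles([".env.local", ".env"]);
```

Refusing to overwrite existing variables means a value exported in the shell or CI still wins over anything in the files.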

Why

The previous flow accepted docs changes too eagerly and did not separate reusable improvements from task-specific overfitting. It also paid a high token cost by rereading docs for each suggestion.

This change makes docs acceptance measurable per task, pushes overfitting decisions into the docs writer, and reduces docs-writer context cost by planning all candidate changes in one pass.

Impact

  • evalbuff now processes feature loops sequentially, gating docs changes on rejudge/rerun score deltas
  • overfit docs suggestions are rejected before they become candidates
  • accepted docs candidates are independently replayable from committed patches
  • live Bun test runs automatically load .env.local / .env
  • the docs-writer E2E test can run against the real provider path and, when it fails, preserves a failure bundle with provider error details
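The sequential gating loop described in this list can be sketched as follows, assuming each task exposes a way to score a run with a given set of docs patches applied. `FeatureTask`, `scoreWith`, and `runDocsGate` are hypothetical. The point illustrated is that tasks run one at a time and each candidate is scored on its own against the same baseline, which is one way to keep every accepted patch independently replayable.

```typescript
// Hypothetical description of one feature task and its candidate docs patches.
interface FeatureTask {
  id: string;
  candidates: { id: string; patch: string }[];
  // Rerun or rejudge the task with exactly the given patches applied; returns a score.
  scoreWith: (patches: string[]) => Promise<number>;
}

interface TaskResult {
  taskId: string;
  baseline: number;
  accepted: string[];
  rejected: string[];
}

// Process tasks strictly one after another. Within a task, score the baseline
// once, then score each candidate alone so its effect is measured in isolation.
async function runDocsGate(
  tasks: FeatureTask[],
  minDelta = 0,
): Promise<TaskResult[]> {
  const results: TaskResult[] = [];
  for (const task of tasks) {
    const baseline = await task.scoreWith([]);
    const accepted: string[] = [];
    const rejected: string[] = [];
    for (const candidate of task.candidates) {
      const score = await task.scoreWith([candidate.patch]);
      if (score - baseline > minDelta) {
        accepted.push(candidate.id);
      } else {
        rejected.push(candidate.id);
      }
    }
    results.push({ taskId: task.id, baseline, accepted, rejected });
  }
  return results;
}
```

Scoring candidates in isolation costs one extra run per candidate, but it avoids attributing one patch's gain to another that happened to be applied alongside it.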

Validation

  • bun test src/__tests__/docs-writer.test.ts
  • bun run typecheck
  • bun run test
  • bun test src/__tests__/docs-writer.e2e.test.ts

Root cause

Docs updates were being selected without a task-local gate and without a reusable/overfit decision point. The docs writer also incurred unnecessary token cost by re-reading the repository docs for every individual suggestion.
