Feat/contributor agent hardening#2

Open
gaurav0107 wants to merge 7 commits into main from feat/contributor-agent-hardening

@gaurav0107

Owner

No description provided.

gaurav0107 and others added 7 commits May 1, 2026 02:51
…un_telemetry

Adds the contract rows and JSON schemas other agent changes depend on. Also
pins the writers/readers of flake_signatures.md to {builder, scorer} (was
"any agent", which let the file drift into disuse).

No behavior change in this commit — downstream agents consume these in
subsequent commits on this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walks tracked files, greps the first 20 lines for generator markers
(AUTO-GENERATED, @generated, openapi-generator, protoc, prisma, etc.),
and writes the path + marker + best-effort regenerate_cmd.

Motivation: in the airflow #65685 post-mortem, the builder hand-edited
an openapi-generated YAML, got a review comment, "fixed" it by hand, and
had the edit clobbered the next time pre-commit regenerated the file.
Builder will consume this catalog to block edits to generated paths in a
follow-up commit.
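
A minimal Python sketch of the scan described above, not the agent's actual code. Assumptions: the marker list is only the five named in this message (the real list is longer), the function names and the `path` / `marker` / `regenerate_cmd` keys are hypothetical, and the one regenerate hint is an illustrative placeholder.

```python
import subprocess
from pathlib import Path

# Only the markers named in the commit message; the real list continues ("etc.").
MARKERS = ["AUTO-GENERATED", "@generated", "openapi-generator", "protoc", "prisma"]

# Hypothetical best-effort hints. Markers without a hint get regenerate_cmd = None.
REGENERATE_HINTS = {"openapi-generator": "pre-commit run --all-files"}


def find_marker(path, max_lines=20):
    """Return the first marker found in the first max_lines lines, else None."""
    try:
        with open(path, "r", encoding="utf-8", errors="ignore") as fh:
            for i, line in enumerate(fh):
                if i >= max_lines:
                    break
                for marker in MARKERS:
                    if marker in line:
                        return marker
    except OSError:
        return None
    return None


def build_catalog(paths):
    """Build catalog entries for every path that carries a generator marker."""
    catalog = []
    for path in paths:
        marker = find_marker(path)
        if marker is not None:
            catalog.append({
                "path": str(path),
                "marker": marker,
                "regenerate_cmd": REGENERATE_HINTS.get(marker),
            })
    return catalog


def tracked_files(repo_root="."):
    """List tracked files via `git ls-files` (requires a git checkout)."""
    out = subprocess.run(
        ["git", "ls-files"], cwd=repo_root, check=True,
        capture_output=True, text=True,
    ).stdout
    return [str(Path(repo_root) / line) for line in out.splitlines() if line]
```

Plain substring matching is deliberately crude: `protoc` also matches the word "protocol", so a real scan would likely want anchored patterns per marker.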

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the plan was only passed as an in-memory shell variable. On
retry or phase-boundary hops, the plan was lost and the builder had to
infer from the issue body alone. Now persisted to plan.md with the H2
sections the builder's parser expects (Goal, Files to edit, Approach,
Test strategy, Risks, Metadata).

Return value unchanged; the prose is still emitted for immediate
consumers.
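
As a rough illustration of the persistence contract, the Python sketch below writes the six H2 sections to plan.md and parses them back. The function names and the dict-of-sections input are assumptions; the commit only specifies the file name and the section headings.

```python
import re
from pathlib import Path

# Section names the builder's parser expects, per the commit message.
REQUIRED_SECTIONS = [
    "Goal", "Files to edit", "Approach", "Test strategy", "Risks", "Metadata",
]


def render_plan(sections):
    """Render a dict of section name -> text as markdown with H2 headings."""
    missing = [name for name in REQUIRED_SECTIONS if name not in sections]
    if missing:
        raise ValueError(f"plan is missing sections: {missing}")
    parts = [f"## {name}\n\n{sections[name].strip()}\n" for name in REQUIRED_SECTIONS]
    return "\n".join(parts)


def parse_plan(text):
    """Split markdown into {H2 heading: body}; text before the first H2 is ignored."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = re.match(r"^## (.+?)\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def persist_plan(sections, path="plan.md"):
    """Write the plan to disk and still return the prose for immediate consumers."""
    prose = render_plan(sections)
    Path(path).write_text(prose, encoding="utf-8")
    return prose
```

Because the file is the source of truth, a retry can call `parse_plan(Path("plan.md").read_text())` instead of depending on a shell variable that did not survive the hop.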

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-drift linter, flake classifier

Four changes, each 1:1 mapped to a failure mode from the airflow #65685
post-mortem:

1. Step 0 fact-forcing preamble. Builder must now print plan source,
   caller-graph count, public-API impact, generated-file check, and last
   3 reviewer-intent entries before any edit. Prevents plan drift.

2. Step 1.5 generated-file guard. Reads generated_files.json; blocks
   hand-edits to cataloged paths and regenerates via recorded
   regenerate_cmd where known. Emits GENERATED_FILE_BLOCKED.

3. Step 5 review-drift linter. Replaces vague prose with three concrete
   grep checks: newsfragment filename matches issue, removed-symbol
   echo in commit body, provider-name leak into core files. Warnings
   only; do not block push.

4. CI-gate flake classifier. On CI failure, grep the log tail against
   _global/flake_signatures.md patterns; matches are recorded as flake
   hits (non-blocking) rather than mistakes. Sets
   BUILDER_LAST_CI_FLAKE=true so the scorer can skip the CI cap.
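
The generated-file guard in item 2 could look roughly like the Python sketch below. It assumes generated_files.json is a JSON list of objects with `path` and `regenerate_cmd` keys, which this PR text does not confirm. All function names are hypothetical, and the sketch only reports the regenerate command rather than running it.

```python
import json


def load_catalog(path="generated_files.json"):
    """Load the catalog; assumed here to be a JSON list of entry objects."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def check_planned_edits(planned_paths, catalog):
    """Split planned edits into allowed paths and blocked catalog entries."""
    by_path = {entry["path"]: entry for entry in catalog}
    allowed, blocked = [], []
    for path in planned_paths:
        entry = by_path.get(path)
        if entry is None:
            allowed.append(path)
        else:
            blocked.append(entry)
    return allowed, blocked


def guard_messages(blocked):
    """Format one GENERATED_FILE_BLOCKED line per blocked entry."""
    messages = []
    for entry in blocked:
        cmd = entry.get("regenerate_cmd")
        hint = f"run: {cmd}" if cmd else "no regenerate_cmd recorded"
        messages.append(f"GENERATED_FILE_BLOCKED {entry['path']} ({hint})")
    return messages
```

Exact-path matching keeps the guard predictable; it assumes the catalog and the planned edits use the same repo-relative path form.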

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ures

Before applying the 40% CI-health cap, the scorer now greps each failing
log tail against patterns in _global/flake_signatures.md. If every
failing command matches a known flake (pip mirror timeouts, redis
connection refused, etc.), the cap is skipped and caps_applied[] gets
"ci_health_flake_skipped" with matched signatures recorded in notes.

A single real failure still caps — we don't hand-wave around regressions.

Pairs with the builder's flake classifier (prior commit): builder records
flake hits; scorer consults the catalog.
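
The all-or-nothing rule above can be sketched in Python as follows. Assumptions: the patterns have already been extracted from _global/flake_signatures.md as regular expressions (the file's format is not shown here), the 40% cap means clamping a 0-100 score to 40, and the cap name `ci_health` is a placeholder. Only `ci_health_flake_skipped` comes from the commit message.

```python
import re


def matched_signatures(log_tail, patterns):
    """Return the flake patterns that match this log tail."""
    return [p for p in patterns if re.search(p, log_tail, flags=re.IGNORECASE)]


def apply_ci_health_cap(score, failing_log_tails, patterns, cap=40.0):
    """Return (score, caps_applied, notes) after the CI-health check.

    The cap is skipped only if every failing log tail matches a known flake.
    """
    caps_applied, notes = [], []
    if not failing_log_tails:
        return score, caps_applied, notes

    per_failure = [matched_signatures(tail, patterns) for tail in failing_log_tails]
    if all(per_failure):
        # Every failure is a known flake: skip the cap and record what matched.
        caps_applied.append("ci_health_flake_skipped")
        notes.extend(sorted({sig for hits in per_failure for sig in hits}))
        return score, caps_applied, notes

    # At least one failure matched nothing, so treat it as real and cap.
    caps_applied.append("ci_health")
    return min(score, cap), caps_applied, notes
```

A mixed run (two flakes plus one unmatched failure) is still capped, which is the "single real failure still caps" behavior.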

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before drafting replies, load maintainer_tone.json and adjust style per
reviewer preference (short_replies / detailed_rationale / quotes_code).
After handling each maintainer comment, infer tone signals from the body
and merge back into the file (atomic). Entries older than 180 days are
pruned on read.

Observational only — tone preferences never override suspicious-
classification or other safety gates. Stranger comments don't update the
file; only top-50 contributors do.
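
A sketch of the prune-on-read and atomic merge, in Python. The on-disk schema (a JSON object keyed by reviewer login, each with `signals` and an ISO-8601 `updated_at`) and the function names are assumptions; the commit states only the file name, the three preference names, the 180-day window, and that the write is atomic. Checking that the commenter is a top-50 contributor is left to the caller.

```python
import json
import os
import tempfile
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=180)


def load_tone(path, now=None):
    """Load tone entries, dropping any older than 180 days."""
    now = now or datetime.now(timezone.utc)
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except FileNotFoundError:
        return {}
    return {
        login: entry for login, entry in data.items()
        if now - datetime.fromisoformat(entry["updated_at"]) <= MAX_AGE
    }


def merge_tone(path, login, signals, now=None):
    """Merge inferred signals for one reviewer and rewrite the file atomically."""
    now = now or datetime.now(timezone.utc)
    data = load_tone(path, now)
    entry = data.get(login, {"signals": {}})
    entry["signals"].update(signals)
    entry["updated_at"] = now.isoformat()
    data[login] = entry

    # Write to a temp file in the same directory, then rename over the target,
    # so readers never observe a half-written file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    os.replace(tmp_path, path)
    return data
```

The rename is atomic but does not serialize concurrent writers; two agents merging at once would need a lock to avoid lost updates.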

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three changes:

1. Phase 4 reads plan.md from disk rather than relying on the planner's
   return value stored in a shell variable. Retries across phase
   boundaries now see a consistent plan.

2. New Phase 0.5 housekeeping prunes mistakes.md entries older than 90
   days. SHARED_STATE.md has long claimed the orchestrator does this
   pruning, but the implementation was missing; mistakes.md was growing
   unbounded and poisoning the planner's "known mistakes" prompt.

3. run_phase helper wraps every specialist dispatch and appends a
   {ts, iteration, phase, duration_s, outcome} JSONL line to
   run_telemetry.jsonl. /contribution-dashboard can now show where time
   went per phase.
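
The run_phase wrapper in item 3 might be sketched in Python like this. The record keys match the ones listed above; the callable-based signature and the `ok` / `error` outcome values are assumptions, since the real helper may be a shell function with different outcome labels.

```python
import json
import time
from datetime import datetime, timezone


def run_phase(phase, iteration, dispatch, telemetry_path="run_telemetry.jsonl"):
    """Run one specialist dispatch and append a telemetry line, even on failure."""
    start = time.monotonic()
    outcome = "ok"
    try:
        return dispatch()
    except Exception:
        outcome = "error"
        raise
    finally:
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "iteration": iteration,
            "phase": phase,
            "duration_s": round(time.monotonic() - start, 3),
            "outcome": outcome,
        }
        # One JSON object per line, appended so earlier runs are preserved.
        with open(telemetry_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")
```

Writing the record in `finally` means a phase that raises is still timed and logged, so a dashboard can attribute time to failed phases as well as successful ones.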

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>