Problem
The fix loop's responsibility ends when a PR is opened. If the PR is merged and introduces a regression — a test that now fails on a different code path, a type error that CI catches in a downstream build, a behavior change that breaks a dependent service — nothing in this system detects it. The only safety nets are the target project's CI and human reviewers.
This matters because:
- The fix loop is calibrated for minimal, targeted changes, but it can still be wrong
- A regression introduced by an agent fix may not be immediately visible (e.g. flaky tests, integration failures)
- There's no audit trail connecting a regression back to the agent fix that caused it
What's needed
A lightweight post-merge check that runs some time after a PR is merged and records whether the fix held:
- CI status check: After merge, poll the target repo's CI for the merge commit. If CI fails, file a follow-up issue or comment on the original issue linking the failure.
- Groom integration: The groom loop already checks whether issues are resolved. Extend it to also check whether any merged agent-fix PRs left CI in a failing state.
- Failure attribution: If a CI failure is detected post-merge, label the original issue needs-human-review and comment with the failing check URL.
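The sketch below shows one possible shape for the CI status check and failure attribution steps. It is not a definitive implementation, and it rests on several assumptions:

- The merge commit's CI results are available as a list of check-run records shaped like GitHub's check-runs REST response, with status, conclusion and html_url fields.
- The set of conclusions counted as failures is a guess and should be tuned per project.
- CIResult, summarize_check_runs and attribution_comment are hypothetical names, not existing parts of this system.

```python
from dataclasses import dataclass, field

# Label name taken from the proposal above.
REVIEW_LABEL = "needs-human-review"

# Assumption: which check-run conclusions count as a CI failure.
FAILING_CONCLUSIONS = {"failure", "timed_out", "cancelled", "action_required"}


@dataclass
class CIResult:
    state: str  # "passing", "failing", or "pending"
    failing_urls: list[str] = field(default_factory=list)


def summarize_check_runs(check_runs: list[dict]) -> CIResult:
    """Collapse the check runs reported for one merge commit into one state."""
    if not check_runs:
        # No checks have reported yet; treat as pending rather than passing.
        return CIResult("pending")
    failing = [
        run.get("html_url", "")
        for run in check_runs
        if run.get("status") == "completed"
        and run.get("conclusion") in FAILING_CONCLUSIONS
    ]
    if failing:
        return CIResult("failing", failing)
    if any(run.get("status") != "completed" for run in check_runs):
        return CIResult("pending")
    return CIResult("passing")


def attribution_comment(pr_number: int, merge_sha: str, result: CIResult) -> str:
    """Build the follow-up comment to post on the original issue."""
    lines = [
        f"Post-merge CI is failing for agent fix PR #{pr_number} "
        f"(merge commit {merge_sha[:7]}). Adding the {REVIEW_LABEL} label.",
        "",
        "Failing checks:",
    ]
    lines += [f"- {url}" for url in result.failing_urls]
    return "\n".join(lines)
```

A failure short-circuits before the pending check, so one red check is reported even while other checks are still running.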
Minimal version
A command run.py check-merged <project-id> that:
- Finds agent-opened PRs merged in the last N days
- Checks CI status of each merge commit
- Comments on the original issue if CI is failing
This doesn't require loop integration — it can be run as a separate cron step.
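The three steps of the minimal command could be wired together roughly as follows. This sketch also rests on assumptions:

- PR records look like GitHub's pull-request REST objects, with number, user.login, merged_at, merge_commit_sha and body fields.
- Agent-opened PRs are recognized by a hypothetical bot login.
- The original issue is found through a closing keyword such as "Fixes #3" in the PR body.
- CI lookup and issue notification are passed in as callables, so the logic can run without network access.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

# Hypothetical: agent-opened PRs are recognized by their author login.
AGENT_LOGIN = "fix-agent[bot]"

# Hypothetical: the original issue is referenced with a closing keyword.
ISSUE_REF = re.compile(
    r"\b(?:fix(?:es|ed)?|close[sd]?|resolve[sd]?)\s+#(\d+)", re.IGNORECASE
)


def parse_github_time(value: str) -> datetime:
    # GitHub timestamps end in "Z"; older fromisoformat versions need "+00:00".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_merged(
    pulls: list[dict],
    days: int,
    ci_state: Callable[[str], str],
    notify: Callable[[int, dict], None],
    now: Optional[datetime] = None,
) -> list[dict]:
    """Find agent PRs merged in the last `days` days, check CI, notify on failure."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    rows = []
    for pr in pulls:
        if (pr.get("user") or {}).get("login") != AGENT_LOGIN:
            continue
        merged_at = pr.get("merged_at")
        if not merged_at or parse_github_time(merged_at) < cutoff:
            continue  # closed without merging, or merged outside the window
        match = ISSUE_REF.search(pr.get("body") or "")
        row = {
            "pr": pr["number"],
            "issue": int(match.group(1)) if match else None,
            "sha": pr["merge_commit_sha"],
            "ci": ci_state(pr["merge_commit_sha"]),
        }
        if row["ci"] == "failing" and row["issue"] is not None:
            notify(row["issue"], row)  # comment + needs-human-review label
        rows.append(row)
    return rows
```

Injecting ci_state and notify keeps the selection logic testable and leaves the choice of API client open. That fits running the command as a standalone cron step.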
Definition of Done
- run.py check-merged <project-id> outputs a table of merged agent PRs and their CI status
- PRs with failing CI trigger a comment on the original issue and add the needs-human-review label
- The command is documented in docs/playbooks/
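The command-line surface and table output from the Definition of Done might be sketched with argparse as below. Some of it goes beyond the source:

- The --days flag, its default of 7 and the table columns are hypothetical.
- The rows are assumed to be dicts with pr, issue, sha and ci keys.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="run.py")
    sub = parser.add_subparsers(dest="command", required=True)
    check = sub.add_parser("check-merged", help="Report CI status of merged agent PRs")
    check.add_argument("project_id", metavar="project-id")
    # Hypothetical flag for the "last N days" window; the default is arbitrary.
    check.add_argument("--days", type=int, default=7)
    return parser


def format_table(rows: list[dict]) -> str:
    """Render rows as a left-aligned plain-text table."""
    headers = ["PR", "Issue", "Merge SHA", "CI"]
    cells = [
        [
            f"#{r['pr']}",
            f"#{r['issue']}" if r["issue"] is not None else "-",
            r["sha"][:7],
            r["ci"],
        ]
        for r in rows
    ]
    # Column width is the longest of the header and every cell in that column.
    widths = [max([len(h)] + [len(c[i]) for c in cells]) for i, h in enumerate(headers)]

    def line(values: list[str]) -> str:
        return "  ".join(v.ljust(w) for v, w in zip(values, widths)).rstrip()

    out = [line(headers), line(["-" * w for w in widths])]
    out += [line(c) for c in cells]
    return "\n".join(out)
```

Under these assumptions, an invocation would look like python run.py check-merged my-project --days 3, where my-project is a placeholder project id.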
Out of Scope
- Automated rollback (too risky to automate)
- Monitoring for behavioral regressions not caught by CI