Review agent misses security-critical findings on large architectural PRs

## What happened

PR #792 implemented 3 ADRs across 52 files (~4K lines). The review bot ran 9 formal review rounds over 4 days (May 9-13), producing a 60K+ character sticky comment, and submitted CHANGES_REQUESTED each time. Despite this, when the human reviewer (ralphbean) reviewed on May 13, they found 4 important issues the bot had never flagged:

1. **Fail-open security bug** in `internal/mint/main.go`: `ALLOWED_WORKFLOW_FILES` failed open when unset -- any workflow in `fullsend-ai/fullsend` could mint agent tokens for any enrolled org. This was a trust surface expansion introduced by the PR's OIDC changes.
2. **Role escalation** in `reusable-retro.yml`: retro agent role changed from `retro` (read-only) to `fullsend` (write access to contents, PRs, actions, workflows) with no justification.
3. **ADR inconsistency**: `env/` listed as "Org-only" in ADR 0035 but treated as a layered directory everywhere in code.
4. **Missing test assertion**: scaffold test verified 6 of 7 `.gitkeep` files, missing `customized/env/.gitkeep`.

The bot's review focused on build verification, routing completeness, dependency pinning, and documentation -- valid but lower-severity concerns. The human found the security and correctness issues.

## What could go better

The review agent appears to lack specific guidance for evaluating security-sensitive changes like OIDC token validation, role/permission assignments, and trust boundary modifications. On a PR this large, the bot may be spreading attention too thin across all 52 files rather than prioritizing security-critical paths.

I'm fairly confident this is a systematic gap rather than a one-off, because: (a) the bot had 9 chances to catch these issues and didn't, (b) issues #870 and #580 document similar patterns where the bot misses important findings or doesn't calibrate severity well, and (c) the fail-open bug is exactly the kind of issue that requires understanding the security implications of a code change, not just its syntactic correctness.

I'm less certain about the fix -- it could be a prompt/skill issue (the review agent lacks security-focused review dimensions) or a context window issue (52 files may exceed the agent's effective attention span).

## Proposed change

Add a security-focused review dimension to the review agent's skill or prompt. Specifically:

1. In the `code-review` skill (or the review agent definition in `agents/review.md`), add an explicit review dimension for **permission and trust boundary changes**: flag any modifications to role assignments, OIDC validation logic, token scoping, `secrets:` blocks, or `permissions:` blocks in workflow files.

2. Add a heuristic: when a PR modifies files matching patterns like `**/mint/**`, `**/auth/**`, `**/oidc/**`, or workflow files with `permissions:` blocks, the review agent should prioritize those files and explicitly check for fail-open vs fail-closed behavior.

3. Consider a two-pass strategy for large PRs (>30 files): first pass identifies security-critical files, second pass deep-reviews those files specifically.

## Validation criteria

On the next 5 PRs that modify mint, auth, or workflow permission logic, the review agent should flag at least one security-relevant observation per PR. Specifically, if a similar fail-open pattern is introduced, the bot should catch it before human review. Measure over 3 months.

---
_Generated by retro agent from https://github.com/fullsend-ai/fullsend/pull/792_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Review agent misses security-critical findings on large architectural PRs #898

What happened

What could go better

Proposed change

Validation criteria

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Review agent misses security-critical findings on large architectural PRs #898

Description

What happened

What could go better

Proposed change

Validation criteria

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions