Skip to content

feat(review agent): anchor severity across re-reviews#729

Open
ben-alkov wants to merge 1 commit into
fullsend-ai:mainfrom
ben-alkov:worktree-feat+review-severity-anchoring
Open

feat(review agent): anchor severity across re-reviews#729
ben-alkov wants to merge 1 commit into
fullsend-ai:mainfrom
ben-alkov:worktree-feat+review-severity-anchoring

Conversation

@ben-alkov
Copy link
Copy Markdown
Contributor

The Review Agent was stateless across runs — on re-review it could escalate or de-escalate severity for findings in unchanged code. Inject prior review context using the same pattern the Fix Agent uses for its pre-fetched review body.

Three-layer fix:

  • Plumbing: workflow pre-fetch step extracts the prior sticky comment and SHA, harness copies it into the sandbox, env file exports the SHA
  • Skill instructions: pr-review step 2a computes changed-file set, code-review step 4 anchors severity for unchanged-file findings
  • Agent identity: documents the two new inputs

First reviews are unaffected — prior-review.txt is absent and PRIOR_REVIEW_SHA is empty, so the agent assesses everything fresh.

Closes: #681

Assisted-by: Claude Code (Opus 4.6)

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

Site preview

Preview: https://d7e81e3f-site.fullsend-ai.workers.dev

Commit: 78507422c652d19f9e34ebccdaeade070d544f7b

@fullsend-ai-review
Copy link
Copy Markdown

fullsend-ai-review Bot commented May 7, 2026

Review: #729

Head SHA: 17c58ca
Timestamp: 2026-05-13T00:00:00Z
Outcome: comment-only

Summary

This PR implements severity anchoring for re-reviews via a well-structured three-layer approach: workflow plumbing (pre-fetch script + env propagation), skill instructions (anchoring rules + finding matching), and agent identity (zero-trust exception documentation). The design is sound and the implementation is clean, with one medium finding: the skill instructions reference an env var (GITHUB_REPOSITORY) that isn’t available in the agent sandbox, which would cause the compare API to fail and fall back to treating all files as changed — defeating the anchoring feature on every re-review. The safe fallback prevents incorrect behavior, but the feature would be inert until corrected.

Findings

Medium

  • [correctness] skills/pr-review/SKILL.md:step 2a — The compare API code snippet uses GITHUB_REPOSITORY (a GitHub Actions runner variable) but the review agent runs inside a sandbox where only REPO_FULL_NAME is available (exported via env/review.env). When the agent executes this snippet, GITHUB_REPOSITORY will be empty/unset, the gh api call will fail, and the fallback logic will treat all files as changed — effectively disabling severity anchoring on every re-review. The comment in the snippet (# GITHUB_REPOSITORY is a standard GH Actions var) reinforces the incorrect variable name.
    Remediation: Replace GITHUB_REPOSITORY with REPO_FULL_NAME in the compare API snippet and update the comment. For example: COMPARE=$(gh api "repos/${REPO_FULL_NAME}/compare/${PRIOR_REVIEW_SHA}...${head_SHA}").

Low

  • [correctness] scripts/pre-fetch-prior-review.sh:82 — The SHA extraction regex grep -oP uses Perl-compatible regex (-P flag). This works on Ubuntu (GitHub Actions runners) but is a PCRE dependency worth noting for portability. No action needed for current use.

Footer

Outcome: comment-only
This review applies to SHA 17c58ca17d17fb549fa62b8e49629a320cf46380. Any push to the PR head clears this review and requires a new evaluation.

Previous run

Review: #729

Head SHA: b0ba168
Timestamp: 2026-05-12T00:00:00Z
Outcome: request-changes

Summary

The design intent is sound — anchoring severity across re-reviews addresses a real pain point documented in #681. However, pre-fetch-prior-review.sh has three bugs that prevent it from functioning at all: (1) it references TRIGGER_SOURCE which is never defined in the review workflow, crashing the script under set -u; (2) it references a non-existent workflow step sandbox-token for GH_TOKEN, so the API call runs unauthenticated; and (3) COMMENT_JSON is always empty because stdout is redirected to a file inside the $(...) capture, making the entire provenance-validation section dead code. The skill instruction updates and agent identity changes are well-written and correctly scoped, but the plumbing layer that feeds them data is broken.

Findings

Critical

  • [undefined-variable] scripts/pre-fetch-prior-review.sh:30TRIGGER_SOURCE is referenced but never defined for the review workflow. The review workflow has no trigger_source input, and the Pre-fetch prior review context step does not set it in env:. Under set -euo pipefail, this crashes the script with unbound variable on every run. This line was copied from the fix workflow pattern (fix.yml:239) where TRIGGER_SOURCE is available via ${{ inputs.trigger_source }}, but that input doesn't exist for the review workflow.
    Remediation: Remove the TRIGGER_SOURCE check entirely — it is a fix-agent concept ("bot-triggered run with no review body to fix") that has no meaning for the review agent, which fetches prior reviews for severity anchoring, not for remediation. If a first-time review finds no prior review, that's the normal path, not an error.

  • [wrong-step-reference] .github/workflows/review.yml:58GH_TOKEN: ${{ steps.sandbox-token.outputs.token }} references step ID sandbox-token, which does not exist in this workflow. The only token-minting step is app-token (line 43). In GitHub Actions, referencing a non-existent step output silently resolves to empty string, so GH_TOKEN will be empty and the gh api call in the script will run unauthenticated, failing or returning unexpected results.
    Remediation: Change to ${{ steps.app-token.outputs.token }}.

High

  • [dead-code-logic-error] scripts/pre-fetch-prior-review.sh:18-22COMMENT_JSON is always empty. The command COMMENT_JSON=$(gh api ... | jq ... > "${PRIOR_FILE}" ...) redirects jq's stdout to PRIOR_FILE inside the $(...) capture, so $(...) captures nothing. Consequently, the check on line 35 (if [[ -z "${COMMENT_JSON}" ...]]) always evaluates to true, the script always takes the "no prior review found" early-exit path, and the entire provenance-validation block (lines 37–74) and SHA-extraction block (lines 76–93) are unreachable dead code. The provenance validation — which is a security-critical feature of this PR — never executes.
    Remediation: Separate the capture from the file write. First capture the full comment JSON object (not just .body) into a variable, then validate provenance, then extract .body to the file. The fix workflow's inline approach (fix.yml:229-231) works because it doesn't try to capture into a variable simultaneously. Example:
    COMMENT_JSON=$(gh api "repos/${SOURCE_REPO}/issues/${PR_NUM}/comments" \
      --paginate --jq '.[]' \
      | jq -s '[.[] | select(.user.login == "'"${REVIEW_BOT}"'" and (.body | contains("<!-- fullsend:review-agent -->")))] | last // empty')
    Then after provenance checks: echo "${COMMENT_JSON}" | jq -r '.body // ""' > "${PRIOR_FILE}"

Medium

  • [incorrect-instruction-code] skills/pr-review/SKILL.md:95-109 — The bash example in step 2a is incorrect. prior_SHA=$(gh api repos/.../compare/"${PRIOR_REVIEW_SHA}") stores the full JSON response (not a SHA) and calls the compare endpoint with only one SHA (the API requires base...head format). Then COMPARE="$prior_SHA$...$head_SHA" attempts variable expansion of $... (undefined). The agent following these instructions verbatim would fail. Should be a single call: COMPARE=$(gh api repos/"${GITHUB_REPOSITORY}"/compare/"${PRIOR_REVIEW_SHA}...${head_SHA}").
    Remediation: Fix the bash example to use a single gh api call with the correct base...head compare syntax.

  • [jq-string-interpolation] scripts/pre-fetch-prior-review.sh:19-21REVIEW_BOT is interpolated directly into a jq filter string via bash string interpolation. While GitHub org names are restricted to alphanumeric/hyphens (making injection impractical), this pattern is fragile and contrary to jq best practices. Use --arg to pass variables safely.
    Remediation: Use jq --arg bot "${REVIEW_BOT}" '[.[] | select(.user.login == $bot and ...)]'.

Low

  • [missing-header-doc] scripts/pre-fetch-prior-review.sh:8 — The required-variables header lists REVIEW_APP_CLIENT_ID and SOURCE_REPO but omits GH_TOKEN, which is required for gh api to authenticate.
    Remediation: Add GH_TOKEN to the required-variables comment block.

Info

  • [intent-alignment] The three-layer approach (workflow plumbing, skill instructions, agent identity) matches the issue's request well. The skill-level severity anchoring rules in code-review/SKILL.md are clearly written and correctly scoped — unchanged files anchor severity, changed files allow re-evaluation, new findings assess normally. The PRIOR_REVIEW_PROVENANCE signal and the acknowledgment that comment edit history is not available via the REST API demonstrate good security thinking, even though the implementation doesn't currently reach the provenance code path.

  • [scope-appropriate] The PR is appropriately scoped as a feature that addresses the specific problem in Review agent: findings must not escalate severity across re-reviews unless code changes justify it #681. No unauthorized capability is being added.

Footer

Outcome: request-changes
This review applies to SHA b0ba168d1818d62bcfc16d928c8eb7d99805b823. Any push to the PR head clears this review and requires a new evaluation.

Previous run (2)

Review: #729

Head SHA: b68e77d
Timestamp: 2026-05-08T00:00:00Z
Outcome: approve

Summary

This PR implements severity anchoring for re-reviews, directly addressing issue #681 where findings escalated across re-reviews without code changes. The implementation is a well-structured three-layer fix: workflow pre-fetch with provenance validation, harness file plumbing, and skill-level instructions for severity anchoring. All edge cases are handled gracefully — first reviews, provenance failures, oversized bodies, compare API failures, and force-pushes all degrade safely. No critical, high, or medium findings.

Findings

Critical

None.

High

None.

Medium

None.

Low

None.

Info

  • [correctness] skills/code-review/SKILL.md — The anchoring instruction uses "substantially same code area/function" as the matching criterion for prior findings. This is necessarily fuzzy for a prompt-based approach. In practice, LLM interpretation may vary across runs for borderline cases (e.g., refactored functions, renamed variables). This is an inherent limitation of the approach rather than a defect — deterministic matching would require structured finding IDs, which would be a larger change.

  • [platform-security] .github/workflows/review.yml:100-105 — The provenance check validates original authorship via performed_via_github_app.client_id but cannot detect post-creation edits to the comment body (GitHub REST API limitation). The code explicitly documents this gap and tracks HMAC-based content integrity as a follow-up. The practical impact is limited: exploiting this requires repo write access, and the worst case is influencing severity anchoring (not bypassing review).

  • [correctness] skills/pr-review/SKILL.md — Step 2a checks PRIOR_REVIEW_SHA and the prior review file but does not explicitly reference PRIOR_REVIEW_PROVENANCE. The agent definition documents this env var and instructs the agent to note it in output when provenance fails. Since the agent definition is authoritative over skills, the agent should still check it — but adding an explicit mention in step 2a would make the expectation clearer.

Footer

Outcome: approve
This review applies to SHA b68e77d8a0ecb94c457d1beb0468c4d5d66ca5e4. Any push to the PR head clears this review and requires a new evaluation.

Previous run (3)

Review: #729

Head SHA: eee3134
Timestamp: 2026-05-07T00:00:00Z
Outcome: approve

Summary

This PR correctly implements severity anchoring for re-reviews by following the established pre-fetch pattern from the fix agent workflow. The three-layer approach — workflow plumbing, skill instructions, and agent documentation — is well-structured and handles the first-review case gracefully via optional file handling. One low-severity robustness improvement is noted below but does not block approval.

Findings

Critical

None.

High

None.

Medium

None.

Low

  • [correctness] internal/scaffold/fullsend-repo/.github/workflows/review.yml:86 — The gh api "repos/${SOURCE_REPO}/issues/${PR_NUM}/comments" call does not use --paginate. The issues comments API returns 30 items per page by default. On PRs with 30+ comments, the sticky review comment could be missed, causing anchoring to silently skip. Consider adding --paginate for robustness.

Info

None.

Footer

Outcome: approve
This review applies to SHA eee3134a6417ab8082f509f2a798d8e16c1d6fa1. Any push to the PR head clears this review and requires a new evaluation.

Previous run (4)

Review: #729

Head SHA: f247ed0
Timestamp: 2026-05-07T00:00:00Z
Outcome: approve

Summary

This PR implements severity anchoring for the review agent across re-reviews, addressing the observed severity escalation on unchanged code (issue #681). The three-layer approach — workflow plumbing to pre-fetch prior review context, skill instructions for anchoring rules, and agent identity documentation — is well-structured and follows existing patterns throughout. First reviews are unaffected, and safe defaults apply when the compare API fails or the prior review is absent. No security concerns: the prior review is fetched with the existing read-only token, filtered by bot identity, and capped at 1 MB.

Findings

Low

  • [error-observability] review.yml:88 — The 2>/dev/null on the gh api call suppresses all stderr, including auth failures or rate-limiting. The fallback behavior (no anchoring) is correct, but production debugging would be easier if stderr were captured rather than discarded. Consider redirecting to $GITHUB_STEP_SUMMARY or a log artifact.

Footer

Outcome: approve
This review applies to SHA f247ed081d33dc7a67de549d0a4918fa691be195. Any push to the PR head clears this review and requires a new evaluation.

@ben-alkov ben-alkov requested review from ascerra and ralphbean May 7, 2026 17:25
Copy link
Copy Markdown
Contributor

@ralphbean ralphbean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good change — severity anchoring across re-reviews directly addresses the trust problem documented in #681, and the three-layer approach (workflow plumbing → skill instructions → agent docs) is clean and follows the fix agent's established pattern.

One change requested, plus a few notes for future consideration.

Strategic assessment: This is well-scoped, directly closes a filed issue with concrete evidence, and the implementation is architecturally sound. The safe defaults (first reviews unaffected, compare API failures degrade to no anchoring) are exactly right.

Comment thread internal/scaffold/fullsend-repo/.github/workflows/review.yml Outdated
Comment thread internal/scaffold/fullsend-repo/.github/workflows/review.yml Outdated
Comment thread internal/scaffold/fullsend-repo/skills/code-review/SKILL.md Outdated
Comment thread internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 7, 2026

fullsend review is working on this — view logs

Copy link
Copy Markdown
Contributor

@ralphbean ralphbean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good change — severity anchoring directly addresses the trust problem in #681, and the three-layer approach is clean. The follow-up commit addresses all four prior inline comments (thumbs-upped each). One change requested around provenance validation, plus two notes for future work.

All four prior review findings from the first pass have been addressed by the follow-up commit:

  • prior_review_file now set in both branches
  • --paginate dropped to avoid per-page jq issue
  • Bidirectional anchoring now explicitly documented
  • total_commits guard added for compare API truncation

Comment thread internal/scaffold/fullsend-repo/agents/review.md
Comment thread internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md Outdated
Comment thread internal/scaffold/fullsend-repo/skills/code-review/SKILL.md
Copy link
Copy Markdown
Contributor

@ralphbean ralphbean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good change — severity anchoring directly addresses the trust problem in #681 and the three-layer approach is clean. The follow-up commit addressed all four round-1 findings. Two new issues surfaced plus an additional angle on the existing provenance discussion.

Strategic assessment: Well-scoped, closes a filed issue with concrete evidence, and the implementation follows established patterns. The safe defaults (first reviews unaffected, compare failures degrade to no anchoring) are exactly right. The outstanding items are about robustness, not direction.

Comment thread internal/scaffold/fullsend-repo/.github/workflows/review.yml Outdated
Comment thread internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md Outdated
@ben-alkov ben-alkov closed this May 8, 2026
@ben-alkov ben-alkov deleted the worktree-feat+review-severity-anchoring branch May 8, 2026 19:47
@ben-alkov ben-alkov restored the worktree-feat+review-severity-anchoring branch May 8, 2026 19:53
@ben-alkov ben-alkov reopened this May 8, 2026
@ben-alkov ben-alkov force-pushed the worktree-feat+review-severity-anchoring branch from eee3134 to b68e77d Compare May 8, 2026 19:58
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

fullsend review is working on this — view logs

Copy link
Copy Markdown
Contributor

@ralphbean ralphbean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good change — severity anchoring directly addresses the trust problem in #681. The three-layer approach is clean and follows established patterns. One change requested (jq argument injection hardening), plus three notes for future consideration.

Three existing findings from prior reviews were thumbs-upped (+1'd) as they remain relevant and unaddressed:

  • Pagination gap after --paginate removal (comment by ralphbean)
  • Compare API inner subcommand failure handling (comment by ralphbean)
  • Multi-file findings anchoring gap (comment by ralphbean, deferred)

Comment thread internal/scaffold/fullsend-repo/.github/workflows/review.yml Outdated
Comment thread internal/scaffold/fullsend-repo/.github/workflows/review.yml Outdated
Comment thread internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md Outdated
Comment thread internal/scaffold/fullsend-repo/.github/workflows/review.yml Outdated
Comment thread internal/scaffold/fullsend-repo/.github/workflows/review.yml Outdated
Copy link
Copy Markdown
Contributor

@ralphbean ralphbean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good change — severity anchoring directly addresses the trust problem in #681 (documented severity flip-flopping on unchanged code). The three-layer approach (workflow plumbing → skill instructions → agent docs) follows the fix agent's established pattern and is architecturally sound.

Strategic assessment: Well-scoped, closes a filed issue with concrete evidence, and the implementation direction is right. The GitHub App provenance validation added in this commit is exactly the right move.

However, this is the third review round, and while each iteration has addressed prior findings, new issues continue to surface:

  • 6 new issues requiring changes (2 important workflow bugs, 4 high-severity instruction gaps)
  • 8 new issues flagged for future work (portability, conventions, ambiguities)
  • 6 existing issues remain unresolved from prior review rounds (per the thumbs-upped comments)

The core idea is solid, but the implementation needs hardening — particularly around shell script safety (set -euo pipefail, API error detection), instruction correctness (sentinel parsing, undefined variables), and resolving the conflict between severity anchoring and the zero-trust principle.


Breakdown by category:

Workflow shell script (review.yml):

  • Missing set -euo pipefail — errors silently swallowed (IMPORTANT, request changes)
  • API error masking via 2>/dev/null || echo "" — auth failures indistinguishable from "no prior review" (IMPORTANT, request changes)
  • grep -oP non-portable to macOS (MODERATE, note for future)
  • File written to $GITHUB_WORKSPACE not $RUNNER_TEMP (MODERATE, note for future)
  • wc -c whitespace handling deviates from repo convention (MODERATE, note for future)
  • prior_sha written to GITHUB_OUTPUT without heredoc delimiter (MODERATE, note for future)
  • Sentinel-on-first-line edge case (MINOR, note for future)
  • Hardcoded dest path collision potential (MINOR, note for future)

Skills and agent docs:

  • Sentinel parsing instruction wrong relative to sticky.go behavior (HIGH, request changes)
  • REPO_FULL_NAME variable used but never defined (HIGH, request changes)
  • PRIOR_REVIEW_PROVENANCE=unverifiable-* glob pattern never enumerated or referenced in skills (HIGH, request changes)
  • Severity anchoring "MUST match" conflicts with agents/review.md zero-trust principle (MEDIUM, request changes)
  • "Substantially same code area/function" has no operational definition (MEDIUM, note for future)
  • Provenance failure output location unspecified (MEDIUM, note for future)

Summary: Six changes requested (shell safety + instruction correctness), eight notes for future consideration (conventions + ambiguities). The direction is excellent — once these gaps are addressed, this will be a solid addition to the review agent's capabilities.

Comment thread internal/scaffold/fullsend-repo/.github/workflows/review.yml Outdated
Comment thread internal/scaffold/fullsend-repo/.github/workflows/review.yml Outdated
Comment thread internal/scaffold/fullsend-repo/.github/workflows/review.yml Outdated
Comment thread internal/scaffold/fullsend-repo/.github/workflows/review.yml Outdated
Comment thread internal/scaffold/fullsend-repo/.github/workflows/review.yml Outdated
Comment thread internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md Outdated
Comment thread internal/scaffold/fullsend-repo/agents/review.md
Comment thread internal/scaffold/fullsend-repo/skills/code-review/SKILL.md Outdated
Comment thread internal/scaffold/fullsend-repo/skills/code-review/SKILL.md
Comment thread internal/scaffold/fullsend-repo/agents/review.md Outdated
@ben-alkov ben-alkov force-pushed the worktree-feat+review-severity-anchoring branch from b68e77d to b0ba168 Compare May 12, 2026 18:33
@github-actions
Copy link
Copy Markdown

fullsend review is working on this — view logs

Copy link
Copy Markdown

@fullsend-ai-review fullsend-ai-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the review comment for full details.

@ben-alkov ben-alkov force-pushed the worktree-feat+review-severity-anchoring branch from b0ba168 to 1d8e55b Compare May 13, 2026 19:28
@github-actions
Copy link
Copy Markdown

fullsend review is working on this — view logs

@ben-alkov ben-alkov force-pushed the worktree-feat+review-severity-anchoring branch from 1d8e55b to 17c58ca Compare May 13, 2026 19:40
@github-actions
Copy link
Copy Markdown

fullsend review is working on this — view logs

Copy link
Copy Markdown
Contributor

@ralphbean ralphbean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good change — severity anchoring directly addresses the trust problem in #681, and the three-layer approach is clean and well-executed. The squashed commit addresses the majority of prior review findings (jq injection, set -euo pipefail, standalone script extraction, pagination, provenance validation, multi-file findings).

One change requested — and I owe an apology on this one.

Strategic assessment: Well-scoped, closes a filed issue with concrete evidence, follows established patterns. The provenance validation via GitHub App client_id is a strong addition. The safe defaults (first reviews unaffected, compare failures degrade to no anchoring) are exactly right.

Comment thread internal/scaffold/fullsend-repo/skills/pr-review/SKILL.md Outdated
@ralphbean
Copy link
Copy Markdown
Contributor

Also, there's a merge conflict now. needs rebase and resolution.

@ben-alkov ben-alkov force-pushed the worktree-feat+review-severity-anchoring branch from 17c58ca to 55e32e2 Compare May 14, 2026 13:54
@github-actions
Copy link
Copy Markdown

fullsend review is working on this — view logs

The review agent was stateless across runs — on re-review it could
escalate or de-escalate severity for findings in unchanged code.

Inject prior review context using the same pattern the fix agent
uses for its pre-fetched review body.

Closes: fullsend-ai#681

Assisted-by: Claude Code (Opus 4.6)
Signed-off-by: Ben Alkov <ben.alkov@redhat.com>
@ben-alkov ben-alkov force-pushed the worktree-feat+review-severity-anchoring branch from 55e32e2 to 7850742 Compare May 14, 2026 14:08
@github-actions
Copy link
Copy Markdown

fullsend review is working on this — view logs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Review agent: findings must not escalate severity across re-reviews unless code changes justify it

2 participants