Skip to content

feat(review-agent): add outcome labels to post-review.sh#874

Merged
ben-alkov merged 1 commit into
mainfrom
agent/552-add-review-outcome-labels
May 14, 2026
Merged

feat(review-agent): add outcome labels to post-review.sh#874
ben-alkov merged 1 commit into
mainfrom
agent/552-add-review-outcome-labels

Conversation

@fullsend-ai-coder
Copy link
Copy Markdown

After posting the review verdict, post-review.sh now applies outcome labels based on the review action, completing the label state machine documented in the bugfix workflow guide:

  • approve (no protected-path downgrade) -> ready-for-merge
  • approve (downgraded to comment) -> requires-manual-review
  • comment (split/conflicting review) -> requires-manual-review
  • request_changes -> no label (fix agent triggers on the event)

Labels are created if missing, matching the needs-human pattern in post-fix.sh. A DOWNGRADED flag tracks whether the protected-path check changed the action so the correct label is applied based on the final posted action rather than the original.

Adds post-review-test.sh with unit tests for the label-selection logic and registers it in the Makefile script-test target.

Note: pre-commit and make lint could not run in sandbox due to network restrictions (gitleaks hook install blocked). The post-script runs authoritative pre-commit on the runner.


Closes #552

Post-script verification

  • Branch is not main/master (agent/552-add-review-outcome-labels)
  • Secret scan passed (gitleaks — 7f3e5cce7dbb6d8aab3616b4b084ab3375fc84f8..HEAD)
  • Pre-commit hooks passed (authoritative run on runner)
  • Tests ran inside sandbox

@github-actions
Copy link
Copy Markdown

fullsend review is working on this — view logs

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 13, 2026

Site preview

Preview: https://bfa15481-site.fullsend-ai.workers.dev

Commit: e903bf9014460e0a0c44e54146e08628ffb23e94

@fullsend-ai-review
Copy link
Copy Markdown

fullsend-ai-review Bot commented May 13, 2026

Review: #874

Head SHA: e903bf9
Timestamp: 2026-05-14T00:00:00Z
Outcome: approve

Summary

This PR correctly implements outcome labels in post-review.sh, completing the label state machine documented in the bugfix workflow guide. The implementation is well-scoped to issue #552, follows existing patterns (label creation from post-fix.sh), correctly tracks protected-path downgrades via a DOWNGRADED flag, removes stale labels before applying new ones for idempotency, and includes a comprehensive test file. No critical or high findings.

Findings

Critical

None.

High

None.

Medium

  • [Correctness / Test integrity] post-review-test.sh — The test reimplements the label-selection logic as a standalone determine_outcome_label() function rather than sourcing or extracting it from post-review.sh. If someone updates the production script's label logic but not the test mirror, the tests will pass with stale expectations. The inline comment "Label logic is mirrored in post-review-test.sh — update both" mitigates this but does not prevent drift. Consider extracting a shared function sourced by both files, or adding a structural check (e.g., a hash or marker comment) to detect divergence.
    Remediation: Extract the label-selection logic into a shared shell function file sourced by both post-review.sh and post-review-test.sh, or add a sentinel comment/hash that CI can verify stays in sync.

Low

None.

Info

  • [Correctness] post-review.sh — The failure action is not explicitly handled in the label if/elif chain. Stale labels are still removed (the removal block runs unconditionally), so the PR ends up with no outcome label, which is reasonable behavior. Just noting for completeness.
  • [Style] post-review.sh — The reject case now creates the label before applying it (gh label create "rejected" + --add-label), which is an improvement over the original code that only called --add-label without ensuring the label existed. Good consistency with the other branches.

Footer

Outcome: approve
This review applies to SHA e903bf9014460e0a0c44e54146e08628ffb23e94. Any push to the PR head clears this review and requires a new evaluation.

Previous run

Review: #874

Head SHA: c1dea91
Timestamp: 2026-05-13T12:30:00Z
Outcome: approve

Summary

This PR adds outcome-label application to post-review.sh, completing the label state machine documented in the bugfix workflow guide. The implementation is correct: it introduces a DOWNGRADED flag to distinguish genuine approvals from protected-path downgrades, applies ready-for-merge or requires-manual-review labels accordingly, and creates labels if they don't exist (matching the needs-human pattern in post-fix.sh). The test file covers all five action×downgrade combinations. No critical or high findings.

Findings

Info

  • [correctness] post-review-test.sh — The test reimplements the label-selection logic in a standalone determine_outcome_label() function rather than sourcing it from post-review.sh. This is a reasonable pragmatic choice (the real script requires network access and env vars), but it means the test won't catch regressions if someone modifies the label logic in post-review.sh without updating the test function. Consider adding a comment in post-review.sh near the label block pointing to the test, so future editors know to update both.

  • [style] post-review.sh:125-133 — The pre-existing reject block doesn't create the rejected label before applying it (unlike the new code and the post-fix.sh pattern). Not introduced by this PR, but the inconsistency is worth noting for a future cleanup.

  • [correctness] post-review.sh — The failure action correctly falls through all label conditions with no label applied. Verified that no label is erroneously applied for agent failures.

Dimensions evaluated

  1. Correctness — Logic is sound. ACTION captures the original action (line 55); DOWNGRADED tracks whether the protected-path check changed it. The label conditions are complete, mutually exclusive, and handle all documented outcomes. Edge cases (failure action, empty PR files) are handled by existing guards.
  2. Intent alignment — Matches issue post-review.sh does not apply outcome labels (ready-for-merge, requires-manual-review) #552 exactly. Scope is appropriate: label application after review posting, with tests.
  3. Platform security — No new privilege surface. Label operations use the existing REVIEW_TOKEN. Failures are swallowed with || true to avoid blocking the post-script, consistent with existing patterns.
  4. Content security — No user-content handling changes.
  5. Injection defense — No new injection vectors. PR body contains no suspicious directives or non-rendering Unicode. Code changes handle only internal state (ACTION, DOWNGRADED variables) with no user-controlled interpolation.
  6. Style/conventions — Follows existing patterns: gh label create ... 2>/dev/null || true, gh pr edit ... --add-label, descriptive echo before each block. Well-commented.

Footer

Outcome: approve
This review applies to SHA c1dea91980baa6359f6cb2d6c7ae87edbbecaabf. Any push to the PR head clears this review and requires a new evaluation.

Previous run (2)

Review: #874

Head SHA: 3fa1d00
Timestamp: 2026-05-13T12:30:00Z
Outcome: approve

Summary

This PR adds outcome-label logic to post-review.sh, completing the label state machine documented in the bugfix workflow guide. After fullsend post-review posts the verdict, the script now applies ready-for-merge (clean approve), requires-manual-review (downgraded approve or comment disposition), or no label (request_changes). The implementation correctly tracks the protected-path downgrade via a DOWNGRADED flag so the label reflects the final posted action, not the original agent output. A new test file validates the label-selection logic. No critical, high, or medium findings.

Findings

Critical

None.

High

None.

Medium

None.

Low

  • [style] post-review.sh:127-155 — The requires-manual-review label creation + application block is duplicated verbatim for the downgraded-approve and comment branches. Consider extracting a helper function (e.g., apply_label) to reduce duplication and ensure future label changes only need one edit.
    Remediation: Extract gh label create ... && gh pr edit ... --add-label into a apply_outcome_label() function called from each branch.

Info

  • [correctness] post-review-test.sh — The test reimplements the label-selection logic in a standalone determine_outcome_label function rather than sourcing it from post-review.sh. This is consistent with existing test patterns in the repo (post-code-test.sh, post-triage-test.sh), but the logic could diverge if the production script is updated without updating the test. Acceptable trade-off given bash sourcing constraints.

  • [intent-alignment] The PR body note about pre-commit/lint not running in sandbox is expected — the post-script runs authoritative pre-commit on the runner. No concern.

Footer

Outcome: approve
This review applies to SHA 3fa1d00ce0436ea1fa7614e248774b28ae9be261. Any push to the PR head clears this review and requires a new evaluation.

@ben-alkov ben-alkov changed the title feat(#552): add outcome labels to post-review.sh feat(review-agent): add outcome labels to post-review.sh May 13, 2026
@waynesun09
Copy link
Copy Markdown
Contributor

Review Squad Report — #874

Agents dispatched: 10 (3x claude-coder, 3x claude-researcher, 2x gemini-code-review, 2x cursor-code-review)
Total findings: 10 (after dedup and verification)
False positives removed: 3


High (1)

1. Stale labels accumulate across review cycles — no label cleanup

File: internal/scaffold/fullsend-repo/scripts/post-review.sh:127-153
Finding: The script only adds labels via --add-label but never removes conflicting outcome labels from prior review runs. If round 1 applies ready-for-merge and round 2 results in comment, the PR ends up with both ready-for-merge AND requires-manual-review — contradictory state. Issue #552 says "Labels are mutually exclusive where it matters — the pipeline enforces this," but this enforcement is missing.
Suggestion: Before applying the new label, remove all other outcome labels:

for stale_label in "ready-for-merge" "requires-manual-review" "rejected"; do
  gh pr edit "${PR_NUMBER}" --repo "${REPO_FULL_NAME}" \
    --remove-label "${stale_label}" 2>/dev/null || true
done

Consensus: 8/10 agents


Medium (4)

2. Test reimplements logic instead of testing production code — drift risk

File: internal/scaffold/fullsend-repo/scripts/post-review-test.sh:24-41
Finding: determine_outcome_label() in the test file manually reimplements the label-selection logic from post-review.sh. If someone modifies the production logic but forgets to update the test, all tests still pass against the stale copy. The test also includes reject as a unified branch, while production handles it in a separate if block.
Suggestion: Add a cross-reference comment in post-review.sh near the label block: # NOTE: label logic is mirrored in post-review-test.sh — update both. Alternatively, adopt the mock-binary approach from post-triage-test.sh.
Consensus: 7/10 agents

3. reject block lacks gh label create and is structurally separate from outcome label chain

File: internal/scaffold/fullsend-repo/scripts/post-review.sh:155-162
Finding: Two issues: (a) the pre-existing reject block does gh pr edit --add-label "rejected" without first calling gh label create, so in newly onboarded repos the label may not exist and application fails silently; (b) the reject block is a separate if statement, not part of the if/elif chain, so it doesn't participate in any future mutual-exclusivity cleanup.
Suggestion: Fold reject into the elif chain and add gh label create "rejected" before the --add-label call.
Consensus: 6/10 agents

4. 2>/dev/null || true on gh pr edit --add-label masks real failures

File: internal/scaffold/fullsend-repo/scripts/post-review.sh:137-138,143-145,150-152
Finding: The 2>/dev/null || true on gh label create is defensible (expected "already exists" error). However, the same pattern on gh pr edit --add-label suppresses permission errors, rate limits, and network failures. The existing reject block uses bare || true (no stderr redirect), which is better since stderr still reaches the Actions log.
Suggestion: Keep 2>/dev/null || true on gh label create; use bare || true on gh pr edit --add-label to match the existing reject block pattern.
Consensus: 5/10 agents

5. Missing test edge cases

File: internal/scaffold/fullsend-repo/scripts/post-review-test.sh:68-88
Finding: Tests cover the 5 expected actions but miss: empty string, unknown values (e.g. "failure", "APPROVE"), and comment with downgraded=true. The failure action is a real value synthesized by the no-output fallback at line 44 of post-review.sh.
Suggestion: Add boundary tests:

run_test "comment-with-downgrade-flag" "comment" "true" "requires-manual-review"
run_test "empty-action-no-label" "" "false" "none"
run_test "failure-action-no-label" "failure" "false" "none"
run_test "unknown-action-no-label" "banana" "false" "none"

Consensus: 5/10 agents


Low (3)

6. Duplicated requires-manual-review label blocks

File: internal/scaffold/fullsend-repo/scripts/post-review.sh:139-152
Finding: The approve+downgraded and comment branches have identical label creation and application code. If label name, color, or description changes, both must be updated in lockstep.
Suggestion: Combine: elif { [ "${ACTION}" = "approve" ] && [ "${DOWNGRADED}" = "true" ]; } || [ "${ACTION}" = "comment" ]; then
Consensus: 6/10 agents

7. request_changes falls through silently with no log message

File: internal/scaffold/fullsend-repo/scripts/post-review.sh:127-153
Finding: Every action path has an echo statement documenting the disposition except request_changes. Add an explicit branch: elif [ "${ACTION}" = "request_changes" ]; then echo "Request-changes disposition — no outcome label (fix agent triggers on event)"
Consensus: 3/10 agents

8. ACTION variable semantics correct but undocumented

File: internal/scaffold/fullsend-repo/scripts/post-review.sh:55
Finding: ACTION holds the original value from agent-result.json and is intentionally never reassigned after the downgrade. This invariant is undocumented — if a future editor re-reads ACTION from the modified RESULT_FILE, the downgraded-approve branch would silently break.
Suggestion: Add a comment: # ACTION holds the original action for the entire script — not re-read after protected-path downgrade.
Consensus: 4/10 agents


Info (2)

9. Unquoted ${FAILURES} in test

File: internal/scaffold/fullsend-repo/scripts/post-review-test.sh:90 — safe but quoting is best practice.

10. request_changes/ready-to-code design question

Issue #552's table says "Unanimous rework → ready-to-code" but its "What should happen" section says "no label needed (fix agent triggers on event; see #551)". Implementation matches the latter. Not a bug — requirements ambiguity.


Agent Agreement

Finding Agents Strength
Stale label cleanup 8/10 Strong consensus
Test drift risk 7/10 Strong consensus
Duplicated label blocks 6/10 Strong consensus
Reject block issues 6/10 Strong consensus
Error suppression 5/10 Consensus
Missing test cases 5/10 Consensus
ACTION docs 4/10 Moderate
Silent request_changes 3/10 Moderate

False Positives Removed

Finding Reason
ACTION variable is stale/buggy after downgrade Confirmed correct — ACTION retains original value, DOWNGRADED compensates
gh label create on every run is wasteful Pre-existing convention matching post-fix.sh, not introduced by this PR
Missing integration tests for gh CLI Out of scope; unit test pattern matches existing repo convention

Summary

The core logic is correct — the DOWNGRADED flag elegantly tracks protected-path downgrades and the label selection works as designed. The main issue is stale label accumulation (HIGH): re-runs will leave contradictory labels on PRs, violating the documented mutual exclusivity. The structural issues (reject block separation, test drift, error suppression inconsistency) are maintainability concerns worth addressing since this PR establishes the label state machine pattern.

Copy link
Copy Markdown
Contributor

@ralphbean ralphbean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean implementation that completes the label state machine from #552. Follows established patterns from post-fix.sh, tests match the existing conventions, and the DOWNGRADED flag is a nice minimal solution for tracking the protected-path downgrade.

One minor note inline (deferred, not blocking).

Comment thread internal/scaffold/fullsend-repo/scripts/post-review.sh
@ben-alkov
Copy link
Copy Markdown
Contributor

/fix

@github-actions
Copy link
Copy Markdown

fullsend fix is working on this — view logs

fullsend-ai-coder Bot pushed a commit that referenced this pull request May 13, 2026
Combine the downgraded-approve and comment branches into a single
condition since both apply the same requires-manual-review label.
This addresses the review feedback about verbatim duplication that
could drift if one branch is updated without the other.

Addresses review feedback on #874

Signed-off-by: fullsend-fix <fullsend-code@users.noreply.github.com>
@fullsend-ai-coder
Copy link
Copy Markdown
Author

🔧 Fix agent — iteration 1 (human-triggered)

Addressed 1 review finding: deduplicated the requires-manual-review label block by combining two elif branches into one condition.

Fixed (1):

  1. duplicated requires-manual-review label block (internal/scaffold/fullsend-repo/scripts/post-review.sh): Combined the downgraded-approve branch (lines 139-145) and comment branch (lines 146-152) into a single condition using || since both apply the identical requires-manual-review label-create + label-apply block. This eliminates the risk of one branch being updated without the other.

Tests: passed

Decision points
  • Chose to merge conditions with || rather than extracting a helper function (alternatives: Extract apply_requires_manual_review helper function, Use a variable to hold the label name and apply after the if/elif block; rationale: The || condition is the simplest change that eliminates duplication while keeping the script's linear structure intact. A helper function would be over-engineering for a two-command block.)

Updated by fullsend fix agent

@github-actions
Copy link
Copy Markdown

fullsend review is working on this — view logs

After posting the review verdict, post-review.sh now applies outcome
labels based on the review action, completing the label state machine
documented in the bugfix workflow guide:

- approve (no protected-path downgrade) -> ready-for-merge
- approve (downgraded to comment) -> requires-manual-review
- comment (split/conflicting review) -> requires-manual-review
- request_changes -> no label (fix agent triggers on the event)

Labels are created if missing, matching the needs-human pattern in
post-fix.sh. A DOWNGRADED flag tracks whether the protected-path
check changed the action so the correct label is applied based on
the final posted action rather than the original.

Closes #552

Assisted-by: Claude Code (Opus 4.6)
Signed-off-by: Ben Alkov <ben.alkov@redhat.com>
@ben-alkov ben-alkov force-pushed the agent/552-add-review-outcome-labels branch from c1dea91 to e903bf9 Compare May 14, 2026 12:44
@github-actions
Copy link
Copy Markdown

fullsend review is working on this — view logs

@ben-alkov ben-alkov added this pull request to the merge queue May 14, 2026
Merged via the queue into main with commit bd4aea0 May 14, 2026
25 checks passed
@ben-alkov ben-alkov deleted the agent/552-add-review-outcome-labels branch May 14, 2026 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

post-review.sh does not apply outcome labels (ready-for-merge, requires-manual-review)

3 participants