feat: retry on timeout + post fallbacks as PR reviews by derekmisler · Pull Request #19 · docker/docker-agent-action

derekmisler · 2026-06-25T12:12:46Z

Summary

Two improvements to timeout handling in PR reviews.

Change 1: retry-on-timeout

Add retry-on-timeout input (default "0") to root action.yml
exec.ts: exit code 124 now only skips retry when the retryOnTimeout budget is exhausted; otherwise falls through to the normal retry path
index.ts: reads and passes retryOnTimeout to runAgent()
review-pr/action.yml: sets retry-on-timeout: "1" on the Run PR Review step — one automatic retry when the 45-min timeout fires

Change 2: fallback comments → PR reviews

All three fallback comment paths in Post clean summary now post via POST /pulls/{pr}/reviews with event: COMMENT instead of POST /issues/{pr}/comments:

Timeout fallback
Failure fallback
Defense-in-depth LGTM fallback (dedup guard updated to query /pulls/{pr}/reviews accordingly)

This enables GitHub's re-request review button on all fallback scenarios, not just the happy path.

No permission changes needed — GH_TOKEN already has pull-requests: write.

Testing

472 unit tests pass (2 new tests for retry-on-timeout behavior)
TypeScript typecheck clean
Biome lint clean

Change 1 — retry-on-timeout: - Add retry-on-timeout input (default 0) to root action.yml - exec.ts: when exit 124, break only if retryOnTimeout budget exhausted; otherwise fall through to retry loop - index.ts: read and pass retryOnTimeout to runAgent() - review-pr/action.yml: set retry-on-timeout: "1" on Run PR Review step so a timed-out review gets one automatic retry - Add two new unit tests: retry succeeds on second attempt; stops after budget Change 2 — fallback comments → PR reviews: - Post all three fallback paths (timeout / failure / defense-in-depth LGTM) via POST /pulls/{pr}/reviews with event:COMMENT instead of POST /issues/{pr}/comments — enables GitHub's re-request review button - Update dedup guard to query /pulls/{pr}/reviews instead of /issues/comments

docker-agent

Assessment: 🟡 NEEDS ATTENTION

docker-agent · 2026-06-25T12:17:39Z

-    }
-
-    if (attempt > opts.maxRetries) {
+      if (opts.retryOnTimeout <= 0 || attempt > opts.retryOnTimeout) {


[MEDIUM] Shared attempt counter conflates timeout and non-timeout retry budgets

The timeout retry budget (retryOnTimeout) and the normal failure retry budget (maxRetries) both increment the same attempt counter. This causes under-retrying in mixed-failure sequences:

Attempt 1: normal (non-timeout) failure → attempt(1) > maxRetries(1) is false → retries (correct so far)

Attempt 2: timeout → attempt(2) > retryOnTimeout(1) is true → breaks with no timeout retry used

The intent is that retryOnTimeout=1 allows one extra retry when a timeout fires. But if a normal failure has already advanced attempt to 2, the timeout-retry budget is silently exhausted without ever being used.

The two budgets should be tracked with independent counters (e.g. timeoutRetryCount and failureRetryCount) rather than sharing a single attempt offset against different limits.

Fixed — replaced the shared attempt counter with two independent counters: failureRetryCount (checked against maxRetries) and timeoutRetryCount (checked against retryOnTimeout). A mixed failure→timeout→success sequence now works correctly and is covered by a new test.

Excellent fix! The independent counters now correctly track each budget separately. I verified that:

failureRetryCount increments only for non-timeout failures (checked against maxRetries)

timeoutRetryCount increments only for timeout failures (checked against retryOnTimeout)

A mixed-failure sequence like failure → timeout → success now correctly uses 1 from each budget instead of exhausting the shared counter prematurely

The implementation is clean and the log messages also accurately reflect which budget is being used. Great work covering this with a test case!

docker-agent · 2026-06-25T12:17:39Z

-            # Dedup guard: skip posting if an identical fallback comment already exists.
-            EXISTING=$(gh api "repos/$REPOSITORY/issues/$PR_NUMBER/comments" \
+            # Dedup guard: skip posting if an identical fallback PR review already exists.
+            EXISTING=$(gh api "repos/$REPOSITORY/pulls/$PR_NUMBER/reviews" \


[MEDIUM] Dedup guard for fallback LGTM is blind to old issue-comment style posts

The dedup guard now queries GET /pulls/{pr}/reviews for existing LGTM reviews, but the previous code posted fallbacks to POST /issues/{pr}/comments. These are separate GitHub API endpoints returning entirely different data.

On the first run of this new code against a PR that previously received an LGTM issue comment (from the old path), the guard will find zero matching reviews and post a duplicate LGTM PR review — leaving the PR with both an issue comment and a PR review saying the same thing.

Consider adding a one-time cross-check against the issue comments endpoint as well:

EXISTING_OLD=$(gh api "repos/$REPOSITORY/issues/$PR_NUMBER/comments" \ --jq '[.[] | select(.body | startswith("🟢 **No issues found**"))] | length' \ 2>/dev/null || echo "0")

Or simply accept the one-time duplicate as a known migration artefact (subsequent runs will dedup correctly via the reviews endpoint).

Fixed with the cross-check approach — the guard now queries both /pulls/{pr}/reviews (new path) and /issues/{pr}/comments (old path), and skips posting if either returns a match. This avoids the one-time duplicate on the migration run without any manual cleanup.

Perfect — the cross-check implementation looks solid. The guard now queries both /pulls/{pr}/reviews and /issues/{pr}/comments, and skips posting if either returns a match. This cleanly handles the migration without manual intervention.

The || echo \"0\" fallback on both queries is a nice touch — ensures the skip logic degrades gracefully if either API call fails (e.g., rate limiting).

exec.ts: replace shared attempt counter with separate failureRetryCount and timeoutRetryCount so mixed-failure sequences (e.g. a non-timeout failure followed by a timeout) consume only from the correct budget. The previous code used attempt > retryOnTimeout which silently exhausted the timeout budget when a prior failure had already advanced attempt. Add a test covering the mixed failure→timeout→success sequence. review-pr/action.yml: extend the LGTM dedup guard to also check the old issues/comments endpoint alongside the new pulls/reviews endpoint, so the first run of this code against a PR that previously received an LGTM issue comment does not post a duplicate review.

exec.ts: retry log now shows 'Timeout retry N of M' vs 'Retry N of M' so ops can distinguish which budget fired without reading code. review-pr/action.yml: guard jq selects with (.body // "") to avoid a type error when .body is null (draft reviews, deleted comments), which previously caused the || echo 0 fallback to fire and post a duplicate. action.yml: retry-on-timeout description notes that the budget is independent of max-retries so both can be consumed in the same run.

derekmisler self-assigned this Jun 25, 2026

derekmisler marked this pull request as ready for review June 25, 2026 12:13

derekmisler requested a review from docker-agent June 25, 2026 12:14

docker-agent reviewed Jun 25, 2026

View reviewed changes

derekmisler added 3 commits June 25, 2026 12:19

style: fix biome formatting (trailing alignment spaces)

e1e29b4

derekmisler enabled auto-merge (squash) June 25, 2026 14:00

aheritier approved these changes Jun 25, 2026

View reviewed changes

derekmisler merged commit e44355e into docker:main Jun 25, 2026
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: retry on timeout + post fallbacks as PR reviews#19

feat: retry on timeout + post fallbacks as PR reviews#19
derekmisler merged 4 commits into
docker:mainfrom
derekmisler:improve-timeout-handling

derekmisler commented Jun 25, 2026

Uh oh!

docker-agent left a comment

Uh oh!

docker-agent Jun 25, 2026

Uh oh!

derekmisler Jun 25, 2026

Uh oh!

docker-agent Jun 25, 2026

Uh oh!

docker-agent Jun 25, 2026

Uh oh!

derekmisler Jun 25, 2026

Uh oh!

docker-agent Jun 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

derekmisler commented Jun 25, 2026

Summary

Change 1: retry-on-timeout

Change 2: fallback comments → PR reviews

Testing

Uh oh!

docker-agent left a comment

Choose a reason for hiding this comment

Assessment: 🟡 NEEDS ATTENTION

Uh oh!

docker-agent Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

derekmisler Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

derekmisler Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent Jun 25, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants