ai: contain Kiro ACP stream failures and empty responses#217
Open
anders-heimer wants to merge 5 commits into
Open
ai: contain Kiro ACP stream failures and empty responses#217anders-heimer wants to merge 5 commits into
anders-heimer wants to merge 5 commits into
Conversation
Capture malformed ACP stdout separately from stderr so errors can report both diagnostic streams without leaking secrets. Include redacted JSON-RPC error data in Kiro ACP failures to expose nested provider error details. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Anders Heimer <anders.heimer@est.tech>
Kiro ACP wraps response-stream and provider failures in JSON-RPC -32603 errors with details in error.data. Classify known transient, rate-limit, provider, and permanent markers so the retry layer can back off appropriately. Treat kiro-cli timeouts as transient typed errors. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Anders Heimer <anders.heimer@est.tech>
Track whether ACP session updates indicate possible tool, command, or file mutations before an error is returned. Retryable-looking failures after such updates are kept fatal so the caller does not replay potentially non-idempotent work. Assisted-by: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Anders Heimer <anders.heimer@est.tech>
Bound Kiro ACP stream failures with Kiro-only per-turn retry budgets, a process-local circuit breaker, and ACP-line idle telemetry. Send session/cancel on prompt failures, preserve terminal Kiro failures across review-bin stdio, and keep retries blocked after side-effect-looking ACP updates. Rate-limit markers intentionally take precedence over generic stream failure markers so throttling receives the slower quota backoff instead of the short stream retry delay. Signed-off-by: Anders Heimer <anders.heimer@est.tech>
Treat Kiro ACP session/prompt successes with no model content as provider errors before downstream JSON validation sees them. Retry side-effect-free empty responses under a small Kiro-only per-turn budget, and fail terminal when the budget is exhausted or side effects were observed. Signed-off-by: Anders Heimer <anders.heimer@est.tech>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR adds Kiro-specific containment and diagnostics for ACP
session/promptfailures.
The Kiro backend can fail or stall while Sashiko is waiting for the long-lived
ACP prompt response stream. In earlier reviews this showed up as repeated
JSON-RPC
-32603 Internal errorresponses wrapping Kiro/upstream response-streamfailures such as
reqwest::Error { kind: Body, source: TimedOut }andUnexpectedEof. Those errors were retried as generic transient provider errors,which allowed a single expensive Kiro turn to monopolize a review worker through
many retries.
This series keeps the retry containment scoped to the Kiro ACP backend. Other AI
providers keep the existing generic retry behavior.
Changes
Improve Kiro ACP diagnostics around:
initializesession/newsession/prompterror.dataClassify known Kiro ACP stream/provider failures as transient, including
response-stream errors, response body errors, stalled-stream/minimum-throughput
failures,
TimedOut, andUnexpectedEof.Preserve rate-limit precedence over generic stream-transient classification, so
throttling wrapped in a stream-looking error still follows the rate-limit path.
Add a side-effect-aware retry gate for ACP prompt turns. If Kiro emits
tool/action-like updates before an otherwise retryable failure, the error is
treated as fatal rather than replaying a possibly non-idempotent turn.
Add Kiro-only per-turn retry containment:
Add a Kiro ACP idle watchdog based on valid ACP JSON-RPC stdout activity,
rather than arbitrary stdout bytes. This avoids treating verbose/non-ACP Kiro
diagnostics as protocol progress.
Send ACP
session/cancelwhensession/promptread handling fails before theprovider returns the error.
Treat protocol-level successful Kiro turns with no model content as a distinct
Kiro empty-response failure. Empty responses are retried only while the
Kiro-specific empty-response budget allows it, and remain fatal after
side-effect-looking ACP updates.
Configuration
The Kiro containment settings are configurable through environment variables: