fix: stop triage loop at completion-summary node + i18n object-key crashes (#1863)#1866
Conversation
…crashes (#1863) Two distinct v0.52.0 regressions reported in issue #1863. 1. Triage loop (engine): the best-effort completion-summary graph node is wired into every built-in workflow with a success-only edge. A thrown handler exception or a failed summary projection write bypassed the advisory `!blocking -> success` coercion, terminated the graph at 'completion-summary', and routeGraphFailureToExecutionResume bounced the in-review task back to todo forever (token usage 0, execution NOT STARTED). The graph executor now degrades a completion-summary node failure to success (ensureWorkflowCompletionSummary still backfills task.summary), with a routeGraphFailureToExecutionResume backstop. Shared isCompletionSummaryNode predicate exported from @fusion/core. 2. i18n object-key crashes (dashboard): three views called t() with keys that resolve to nested objects (taskDetail.executionMode, routing.source, nodes.dockerHost), so i18next returned "returned an object instead of string" and crashed the render. Added leaf label keys across all locales and switched the callers. Tests: engine non-fatal completion-summary regression (fails without the fix), dashboard invariant guard scanning t("literal") callers against real en/app.json, and a Stats-panel reproduction against the real bundle. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Warning Review limit reached
Next review available in: 7 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR makes completion-summary node failures non-fatal, adds leaf i18n label keys and updates dashboard usage to avoid object-valued translation lookups, and switches pi-claude-cli to the builtin Anthropic model catalog API with matching dependency and test updates. ChangesCompletion-Summary Non-Fatal Execution
i18n Leaf-Key Fixes and Guard Test
pi-claude-cli builtin model catalog
Estimated code review effort: 4 (Complex) | ~45 minutes Sequence Diagram(s)sequenceDiagram
participant workflow-graph-executor as workflow-graph-executor
participant completion-summary node as completion-summary node
participant publishTaskProjection as publishTaskProjection
participant executor router as executor router
workflow-graph-executor->>completion-summary node: executeNodeWithRetries
completion-summary node--xworkflow-graph-executor: handler throws
workflow-graph-executor->>workflow-graph-executor: degrade to success
workflow-graph-executor->>publishTaskProjection: publishTaskProjectionFromResult
publishTaskProjection--xworkflow-graph-executor: projection write fails
workflow-graph-executor->>workflow-graph-executor: keep original result
executor router->>executor router: failedNode === COMPLETION_SUMMARY_NODE_ID
executor router-->>executor router: return false
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR fixes two distinct v0.52.0 regressions reported in issue #1863.
Confidence Score: 5/5Safe to merge — both failure paths in the graph executor are correctly degraded for the completion-summary node, the i18n leaf-key additions are complete across all six locales, and the regression tests cover the invariant end-to-end. Both bug-fix paths (handler throw → isCompletionSummaryNode degrade; projection-write failure → swallowed and kept as success) are exercised by independent tests. The built-in workflow surface enumeration test locks the no-failure-edge assumption. The i18n invariant test will catch any future object-key regression. The pi-claude-cli migration is a targeted API rename with matching mock updates. No regressions introduced. No files require special attention. The one P2 observation (the i18n static scanner does not cover template-literal t() calls) is a documentation gap, not a functional defect. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[executeNodeWithRetries] --> B{All retries exhausted?}
B -- No --> C[Handler succeeds - return result]
B -- Yes --> D{isCompletionSummaryNode?}
D -- Yes --> E[Degrade to success value: summary-unavailable]
D -- No --> F[Return failure value: exception]
G[publishTaskProjectionFromResult] --> H{publishTaskProjection throws?}
H -- No --> I[Return result unchanged]
H -- Yes --> J{isCompletionSummaryNode?}
J -- Yes --> K[Return result with projectionError in contextPatch keep outcome]
J -- No --> L[Return failure value: projection-error]
E --> M[Graph continues to next node]
K --> M
F --> N[Graph terminates]
L --> N
N --> O[routeGraphFailureToExecutionResume]
O --> P{failedNode === COMPLETION_SUMMARY_NODE_ID?}
P -- Yes --> Q[Park task as failed backstop]
P -- No --> R[Normal resume logic: move to todo]
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A[executeNodeWithRetries] --> B{All retries exhausted?}
B -- No --> C[Handler succeeds - return result]
B -- Yes --> D{isCompletionSummaryNode?}
D -- Yes --> E[Degrade to success value: summary-unavailable]
D -- No --> F[Return failure value: exception]
G[publishTaskProjectionFromResult] --> H{publishTaskProjection throws?}
H -- No --> I[Return result unchanged]
H -- Yes --> J{isCompletionSummaryNode?}
J -- Yes --> K[Return result with projectionError in contextPatch keep outcome]
J -- No --> L[Return failure value: projection-error]
E --> M[Graph continues to next node]
K --> M
F --> N[Graph terminates]
L --> N
N --> O[routeGraphFailureToExecutionResume]
O --> P{failedNode === COMPLETION_SUMMARY_NODE_ID?}
P -- Yes --> Q[Park task as failed backstop]
P -- No --> R[Normal resume logic: move to todo]
Reviews (2): Last reviewed commit: "Merge origin/main into feature/fix-triag..." | Re-trigger Greptile |
| * let the caller park the task `failed` (a visible terminal state) instead of | ||
| * looping it forever. | ||
| */ | ||
| if (failedNode === COMPLETION_SUMMARY_NODE_ID) return false; |
There was a problem hiding this comment.
The backstop checks
failedNode === COMPLETION_SUMMARY_NODE_ID (ID-only), but isCompletionSummaryNode — used by the graph executor for the actual degradation — also matches nodes whose config.summaryTarget === "task" with a different ID. If a plugin workflow wires a summaryTarget:"task" node under a custom ID and somehow its failure reaches this router, the backstop would not fire and the task would still loop. Aligning the check with the predicate closes the gap cleanly, even though the PR correctly notes this path should be unreachable after the graph-executor fix.
| if (failedNode === COMPLETION_SUMMARY_NODE_ID) return false; | |
| if (failedNode === COMPLETION_SUMMARY_NODE_ID || isCompletionSummaryNode({ id: failedNode })) return false; |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/engine/src/workflow-graph-executor.ts (1)
1210-1233: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winDegrade logic is sound but leaves no operator-visible trace of the underlying failure.
Both branches swallow the real error into an in-memory
contextpatch only. Sincecompletion-summaryhas noskillName,shouldRecordNodeProgressnever records it intotask.workflowStepResults, and neither branch callsthis.deps.logTaskEntry?.(...)(unlike the post-merge traversal-error path a few lines above, which does log). A recurring failure (e.g. a persistent model/provider outage) would silently degrade to "success" on every run with zero signal in the task log.Consider adding a
logTaskEntrycall in each branch so operators can still notice a systemically failing summary node.♻️ Proposed logging addition
if (isCompletionSummaryNode(node)) { + this.deps.logTaskEntry?.( + `[completion-summary] node handler failed after retries — degrading to success: ${lastError instanceof Error ? lastError.message : String(lastError)}`, + ); const degraded: WorkflowNodeResult = {if (isCompletionSummaryNode(node)) { + this.deps.logTaskEntry?.( + `[completion-summary] projection write failed — preserving node outcome: ${error instanceof Error ? error.message : String(error)}`, + ); return {Also applies to: 1354-1372
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/engine/src/workflow-graph-executor.ts` around lines 1210 - 1233, The completion-summary degrade path in workflow-graph-executor swallows the underlying exception into contextPatch without any operator-visible log. In the isCompletionSummaryNode(node) branch, add a this.deps.logTaskEntry?.(...) call before returning the degraded WorkflowNodeResult, and do the same in the other completion-summary fallback branch referenced by the review so persistent failures are visible in task logs. Use the existing node.id, lastError, and degraded result context to keep the message consistent with nearby traversal-error logging.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/engine/src/workflow-graph-executor.ts`:
- Around line 1210-1233: The completion-summary degrade path in
workflow-graph-executor swallows the underlying exception into contextPatch
without any operator-visible log. In the isCompletionSummaryNode(node) branch,
add a this.deps.logTaskEntry?.(...) call before returning the degraded
WorkflowNodeResult, and do the same in the other completion-summary fallback
branch referenced by the review so persistent failures are visible in task logs.
Use the existing node.id, lastError, and degraded result context to keep the
message consistent with nearby traversal-error logging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: d8aba147-8c6d-4942-a0a1-f39a81213451
📒 Files selected for processing (19)
.changeset/fix-stats-execution-mode-object-key.mdpackages/core/src/builtin-completion-summary-node.tspackages/core/src/index.tspackages/core/src/workflow-lifecycle-validation.tspackages/dashboard/app/__tests__/i18n-string-keys-not-objects.test.tspackages/dashboard/app/components/NodeDetailModal.tsxpackages/dashboard/app/components/RoutingTab.tsxpackages/dashboard/app/components/TaskTokenStatsPanel.tsxpackages/dashboard/app/components/__tests__/TaskTokenStatsPanel.test.tsxpackages/engine/src/__tests__/workflow-graph-completion-summary-nonfatal.test.tspackages/engine/src/executor.tspackages/engine/src/workflow-graph-executor.tspackages/i18n/locales/en/app.jsonpackages/i18n/locales/es/app.jsonpackages/i18n/locales/fr/app.jsonpackages/i18n/locales/ko/app.jsonpackages/i18n/locales/zh-CN/app.jsonpackages/i18n/locales/zh-TW/app.jsonpackages/i18n/src/resources.d.ts
…to getBuiltinModels
The Typecheck CI gate was failing because pi-claude-cli declared pi-ai and
pi-coding-agent as unpinned "*" peers. pi-ai 0.80 was published and, via
hoisting/non-frozen resolution, the bare `@earendil-works/pi-ai` import floated
to 0.80.3 — which moved the top-level `getModels` export to the deprecated
`/compat` shim ("has no exported member named 'getModels'").
Migrate forward to the latest, consistent with cli/engine which already pin
^0.80.3:
- Pin pi-ai and pi-coding-agent to ^0.80.3 (peer + dev) so the whole extension
resolves one pi-ai version; pinning pi-coding-agent too avoids the
AssistantMessageEventStream type skew that a pi-ai-only bump reintroduced.
- Import the canonical `getBuiltinModels` from
`@earendil-works/pi-ai/providers/all` (identical signature to the old
`getModels`; the top-level export is now the deprecated compat alias).
- Update the provider test mock to the new subpath.
Behavior-preserving: getBuiltinModels === getModels. Typecheck, the full
recursive typecheck, and all 347 pi-claude-cli tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Resolves the pnpm-lock.yaml conflict. Main's #1865 (review-checkout routing) auto-merged cleanly with the completion-summary backstop in executor.ts. Main independently pinned pi-claude-cli's pi-ai/pi-coding-agent to ^0.80.3 (e154892) but kept the top-level `getModels` import, which 0.80.3 removed — this branch's migration to `getBuiltinModels` from `/providers/all` is retained as the working fix. Lockfile regenerated against the merged package.json. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixes #1863.
The issue conflated two distinct v0.52.0 regressions (the reporter's i18n root-cause guess was only half of it).
1. The real triage loop — engine (primary fix)
The
completion-summarygraph node was added to every built-in workflow in v0.52.0 (FN-7228/FN-7233), wired with a success-only edge — no failure edge. It's a best-effort node (ensureWorkflowCompletionSummaryalready backfillstask.summary), but two failure paths bypass its advisory!blocking → successcoercion:executeNodeWithRetriesreturnsvalue:"exception"publishTaskProjectionFromResultreturnsvalue:"projection-error"With no failure edge, either terminates the graph at
completion-summary.routeGraphFailureToExecutionResumethen treats the in-review failure as recoverable and moves the task back totodo, which replays the already-passed nodes (0 new tokens) and fails again — the infinitetriage → in-progress → todoloop in the report (token usage 0, execution NOT STARTED).Fix: the graph executor now degrades a
completion-summarynode failure tosuccessso the graph always advances, with arouteGraphFailureToExecutionResumebackstop that parksfailedinstead of looping. SharedisCompletionSummaryNodepredicate exported from@fusion/core.2. The i18n crash the reporter saw — dashboard
t("taskDetail.executionMode")resolves to a nested object (the inline mode-toggle copy), so i18next returns "key 'taskDetail.executionMode (en)' returned an object instead of string" and crashes the Stats tab — the exact error in the report. Surface enumeration found two more callers of the same class:routing.sourceandnodes.dockerHost. Added leaf label keys across all 6 locales and switched the callers.Tests
workflow-graph-completion-summary-nonfatal.test.ts— a failing summary node (both modes) never terminates/loops the graph, a non-summary node still fails, and all 5 built-ins have no failure edge. Fails without the engine fix.i18n-string-keys-not-objects.test.ts— invariant guard scanning everyt("literal")caller against the realen/app.json. Catches a reverted caller.TaskTokenStatsPanel.test.tsx— reproduces the exact i18next error against the real bundle.Verification
🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
New Features
Tests