feat(openworkflow, server, dashboard): add resume workflow run functionality and UI components#521
vudc wants to merge 2 commits into
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a "resume failed workflow run" capability across the backend (SQLite + Postgres), core API, client, and dashboard. Resuming flips a failed run back to pending, deletes failed step attempts so retry budgets are fresh, and preserves completed step cache.
Changes:
- New `resumeWorkflowRun` method on the `Backend` interface, implemented in SQLite and Postgres backends; new client helper and conflict resolver.
- Dashboard gains a "Resume Run" button (component, server function, status helper, route integration).
- Example workflow + runner demonstrating the resume flow, plus tests for success and invalid-status cases.
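The resume semantics described in the overview (flip `failed` to `pending`, drop failed step attempts, keep completed ones) can be illustrated with a small in-memory model. This is only a sketch: the `Store`, `Run`, and `StepAttempt` types, the status list, and the `resumeRun` function are hypothetical stand-ins, not the project's schema or its SQLite/Postgres implementation.

```typescript
// Hypothetical in-memory model of the resume semantics; all names here are
// illustrative and do not come from the openworkflow codebase.
type RunStatus = "pending" | "running" | "completed" | "failed" | "canceled";
type AttemptStatus = "completed" | "failed";

interface Run {
  id: string;
  status: RunStatus;
  error: string | null;
  workerId: string | null;
}

interface StepAttempt {
  runId: string;
  stepName: string;
  status: AttemptStatus;
}

interface Store {
  runs: Map<string, Run>;
  attempts: StepAttempt[];
}

function resumeRun(store: Store, runId: string): Run {
  const run = store.runs.get(runId);
  if (!run) {
    throw new Error(`Run ${runId} not found`);
  }
  // Only failed runs may be resumed; everything else is rejected.
  if (run.status !== "failed") {
    throw new Error(`Run ${runId} is ${run.status}, not failed`);
  }

  // Flip the run back to pending and clear failure bookkeeping.
  const resumed: Run = { ...run, status: "pending", error: null, workerId: null };
  store.runs.set(runId, resumed);

  // Remove failed attempts for this run (fresh retry budget) and keep
  // completed attempts so replay can skip re-execution.
  store.attempts = store.attempts.filter(
    (attempt) => !(attempt.runId === runId && attempt.status === "failed"),
  );
  return resumed;
}
```

In the real backends both mutations happen inside one transaction, so a crash between them cannot leave a pending run with stale failed attempts; the in-memory model does not capture that guarantee.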
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| packages/openworkflow/core/backend.ts | Adds resumeWorkflowRun to Backend and ResumeWorkflowRunParams type. |
| packages/openworkflow/core/workflow-run.ts | Adds resolveResumeWorkflowRunConflict error helper. |
| packages/openworkflow/sqlite/backend.ts | Implements resumeWorkflowRun with transaction-guarded UPDATE + DELETE. |
| packages/openworkflow/postgres/backend.ts | Implements resumeWorkflowRun via pg.begin transaction. |
| packages/openworkflow/client/client.ts | Exposes resumeWorkflowRun on OpenWorkflow client. |
| packages/openworkflow/worker/execution.test.ts | Tests for resume success + invalid-status rejection. |
| apps/dashboard/src/lib/api.ts | Adds resumeWorkflowRunServerFn. |
| apps/dashboard/src/lib/status.ts | Adds isRunResumableStatus helper. |
| apps/dashboard/src/components/run-resume-action.tsx | New AlertDialog-based Resume Run button. |
| apps/dashboard/src/routes/runs/$runId.tsx | Wires Resume Run button into run detail page. |
| openworkflow/flaky-payment.ts | Demo workflow that fails then succeeds after resume. |
| openworkflow/flaky-payment.run.ts | Runner script for the demo workflow. |
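As a rough illustration of two helpers named in the table, the sketch below pairs a status predicate with a conflict resolver. The "failed only" rule and the error wording for non-failed runs come from this PR; the `WorkflowRunLike` type, the exact signatures, and the message used when the run does not exist are assumptions made for the example.

```typescript
// Minimal shape needed for the example; the real WorkflowRun type has more fields.
interface WorkflowRunLike {
  status: string;
}

// Per the PR description, only "failed" is resumable at the moment.
const RESUMABLE_STATUSES: ReadonlySet<string> = new Set(["failed"]);

function isRunResumableStatus(status: string): boolean {
  return RESUMABLE_STATUSES.has(status);
}

// Called after the guarded UPDATE touched zero rows: work out why and throw.
function resolveResumeWorkflowRunConflict(
  workflowRunId: string,
  existing: WorkflowRunLike | null,
): never {
  if (!existing) {
    // Assumed wording; the source does not show the missing-run branch.
    throw new Error(`Workflow run ${workflowRunId} does not exist`);
  }
  throw new Error(
    `Cannot resume workflow run ${workflowRunId} with status ${existing.status}; only failed runs can be resumed`,
  );
}
```

Keeping the resumable set in one place means the dashboard button and the backend guard can be widened together if more statuses become resumable later.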
Comments suppressed due to low confidence (1)
apps/dashboard/src/components/run-resume-action.tsx:1
`AlertDialogAction` closes the dialog on click by default. Because `resumeRun` is async and only calls `setIsOpen(false)` after success, the dialog will close immediately on click regardless of the in-flight request, and any error set in state won't be visible (the dialog is gone). Use `event.preventDefault()` in the `onClick` (Radix supports this on `AlertDialogAction`) to keep the dialog open while resuming.
    if (updateResult.changes === 0) {
      this.db.exec("ROLLBACK");
      const existing = await this.getWorkflowRun({
        workflowRunId: params.workflowRunId,
      });
      resolveResumeWorkflowRunConflict(params.workflowRunId, existing);
    }

    async resumeWorkflowRun(
      params: ResumeWorkflowRunParams,
    ): Promise<WorkflowRun> {
      const currentTime = now();

    const [updated] = await tx<WorkflowRun[]>`
      UPDATE ${workflowRunsTable}

      RETURNING *
    `;

    DELETE FROM "step_attempts"
    WHERE "namespace_id" = ?
      AND "workflow_run_id" = ?
      AND "status" = 'failed'

    >
      <Button
        type="button"
        variant="default"

      disabled={isResuming}
    >
      Resume Run
    </Button>

     * you restart the worker between the failure and the resume, the counter
     * resets, so resume will fail again and you'll need to resume once more.
     */
    export const flakyPayment = defineWorkflow<FlakyPaymentInput, FlakyPaymentOutput>(

    // eslint-disable-next-line functional/no-throw-statements
    throw new Error(
      `Cannot resume workflow run ${workflowRunId} with status ${existing.status}; only failed runs can be resumed`,

    SET
      "status" = 'pending',
      "worker_id" = NULL,
      "error" = NULL,
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 12 comments.
Comments suppressed due to low confidence (3)
apps/dashboard/src/components/run-resume-action.tsx:1
`AlertDialogAction` from Radix closes the dialog by default on click. Because the click immediately closes the dialog, `onOpenChange(false)` fires, which sets `error` to null, so the error message rendered at line 94 will never be visible if the resume call rejects: the dialog is already gone. Either prevent the default close (`event.preventDefault()`) before invoking `resumeRun`, or use a regular `<Button>` and manage close on success only.
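The fix suggested in this comment can be sketched without React. The example below models the dialog's state as a plain mutable object and the click event as anything with a `preventDefault()` method; `DialogState`, `ClickLike`, and `confirmResume` are hypothetical names, and in the real component the assignments would be `useState` setters.

```typescript
// Anything with preventDefault(), e.g. a React mouse event.
interface ClickLike {
  preventDefault(): void;
}

// Stand-in for the component's useState values (assumed shape).
interface DialogState {
  isOpen: boolean;
  isResuming: boolean;
  error: string | null;
}

async function confirmResume(
  event: ClickLike,
  state: DialogState,
  resumeRun: () => Promise<void>,
): Promise<void> {
  // Stop the action button from closing the dialog on click.
  event.preventDefault();
  state.isResuming = true;
  state.error = null;
  try {
    await resumeRun();
    // Close only after the resume call has succeeded.
    state.isOpen = false;
  } catch (err) {
    // The dialog is still open, so this message can actually be rendered.
    state.error = err instanceof Error ? err.message : String(err);
  } finally {
    state.isResuming = false;
  }
}
```

With this shape the dialog closes on success only, and a rejected resume leaves it open with the error visible.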
openworkflow/flaky-payment.ts:1
- With `maximumAttempts: 3` and the condition `reserveAttempt <= RESERVE_MAX_ATTEMPTS` (i.e. fails on attempts 1, 2, 3), the first run throws on all 3 attempts and the workflow fails as intended. On resume, `reserveAttempt` continues from 3, increments to 4, and 4 ≤ 3 is false → succeeds. That works only because the counter persists. If the worker restarts, `reserveAttempt` resets to 0 and three resumes are needed (each consuming a full retry budget). The comment acknowledges this, but consider also: each resume gives the step a fresh retry budget, but the counter exceeds 3 only after 4 increments, so a worker restart between resumes means the second resume also fails outright. This makes the demo brittle in practice.
import { defineWorkflow } from "openworkflow";
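The counter arithmetic discussed in this comment can be checked with a small simulation. It assumes the step increments a process-local counter on every attempt and fails while the counter is at most `RESERVE_MAX_ATTEMPTS`; `runPass` and `PassResult` are hypothetical names, and the real workflow's retry mechanics may differ.

```typescript
const RESERVE_MAX_ATTEMPTS = 3; // step fails while the counter is <= this value
const MAXIMUM_ATTEMPTS = 3; // retry budget for one run pass (initial run or resume)

interface PassResult {
  succeeded: boolean;
  counter: number; // counter value after the pass
  attemptsUsed: number;
}

// Simulates one worker pass over the flaky step, starting from `counter`.
function runPass(counter: number): PassResult {
  for (let attempt = 1; attempt <= MAXIMUM_ATTEMPTS; attempt++) {
    counter += 1;
    if (counter > RESERVE_MAX_ATTEMPTS) {
      return { succeeded: true, counter, attemptsUsed: attempt };
    }
  }
  return { succeeded: false, counter, attemptsUsed: MAXIMUM_ATTEMPTS };
}

// Initial run: counter goes 1, 2, 3 and every attempt fails.
const firstRun = runPass(0);
// Resume in the same process: counter continues at 4, so the first attempt succeeds.
const resumeSameProcess = runPass(firstRun.counter);
// Resume after a worker restart: the counter is back at 0 and the pass fails again.
const resumeAfterRestart = runPass(0);

console.log(firstRun.succeeded, resumeSameProcess.succeeded, resumeAfterRestart.succeeded);
```

Under these assumptions a restart between failure and resume costs one extra resume, which is the behaviour the workflow's own doc comment describes.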
apps/dashboard/src/lib/api.ts:1
- Like the sibling `cancelWorkflowRunServerFn`, there is no authentication/authorization check on this endpoint. Anyone able to call the server function can resume any workflow run. If this matches the project's existing authn/z model (e.g. dashboard is internal-only) it's fine; otherwise consider adding an auth guard. Flag for consistency with the rest of the dashboard.
import { getBackend } from "./backend";
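If an auth guard were wanted, one possible shape is a wrapper around the handler, sketched below. Everything here is hypothetical: the `Session` type, the `canManageRuns` flag, the `getSession` callback, and `withAuth` itself are illustrative and are not part of the dashboard or of any server-function API.

```typescript
// Hypothetical session shape; the dashboard's real auth model is not shown in this PR.
interface Session {
  userId: string;
  canManageRuns: boolean;
}

type Handler<I, O> = (input: I) => Promise<O>;

// Wraps a handler so it runs only for an authenticated, authorized caller.
function withAuth<I, O>(
  getSession: () => Promise<Session | null>,
  handler: Handler<I, O>,
): Handler<I, O> {
  return async (input: I): Promise<O> => {
    const session = await getSession();
    if (!session) {
      throw new Error("Unauthorized");
    }
    if (!session.canManageRuns) {
      throw new Error("Forbidden");
    }
    return handler(input);
  };
}
```

Applying the same wrapper to both the cancel and the resume handlers would keep the two endpoints consistent, which is the concern the comment raises.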
    if (updateResult.changes === 0) {
      this.db.exec("ROLLBACK");
      const existing = await this.getWorkflowRun({
        workflowRunId: params.workflowRunId,
      });
      resolveResumeWorkflowRunConflict(params.workflowRunId, existing);
    }

    DELETE FROM "step_attempts"
    WHERE "namespace_id" = ?
      AND "workflow_run_id" = ?
      AND "status" = 'failed'

    // Drop the prior failed attempts so the next worker pass starts the
    // failed step with a fresh retry budget and the existing completed
    // attempts remain in cache (replay skips re-execution).
    await tx`
      DELETE FROM ${stepAttemptsTable}
      WHERE "namespace_id" = ${this.namespaceId}
        AND "workflow_run_id" = ${params.workflowRunId}
        AND "status" = 'failed'

    const existing = await this.getWorkflowRun({
      workflowRunId: params.workflowRunId,
    });
    resolveResumeWorkflowRunConflict(params.workflowRunId, existing);

    try {
      this.db.exec("BEGIN IMMEDIATE");
Closes #520
Summary
Adds a new client + backend method, resumeWorkflowRun(workflowRunId), that flips a failed workflow run back to pending so the worker picks it up on the next tick. Completed steps are served from the existing step-attempt cache (no re-execution); the failing step starts with a fresh retry budget. A "Resume Run" button is wired up on the run-detail page of the dashboard.
Changes
Backend
- `Backend.resumeWorkflowRun({ workflowRunId })` added to the interface, implemented for Postgres and SQLite. Atomically (in a single transaction) flips `status` from `failed` to `pending`, clears `error`/`worker_id`/`finished_at`, sets `available_at = NOW()`, and `DELETE`s the failed `step_attempts` rows for that run. Non-failed runs throw a clear error.
- `ow.resumeWorkflowRun(workflowRunId)` client method.
- `resolveResumeWorkflowRunConflict` in `core/workflow-run.ts`.

Dashboard
- `RunResumeAction` component (`apps/dashboard/src/components/run-resume-action.tsx`) with a confirmation dialog.
- `resumeWorkflowRunServerFn` in `lib/api.ts`.
- `isRunResumableStatus` helper in `lib/status.ts` (currently `"failed"` only).

Tests
- `packages/openworkflow/worker/execution.test.ts`: the run reaches `failed`, is resumed, then completes; asserts that the completed step was not re-executed and that failed attempts are removed.