Skip to content

fix(tasks): hydrate runner progress after single-node restart#3952

Open
cursor[bot] wants to merge 2 commits into
developfrom
cursor/critical-bug-investigation-df26
Open

fix(tasks): hydrate runner progress after single-node restart#3952
cursor[bot] wants to merge 2 commits into
developfrom
cursor/critical-bug-investigation-df26

Conversation

@cursor

@cursor cursor Bot commented Jun 13, 2026

Copy link
Copy Markdown

Bug and impact

1. Remote tasks stuck after single-node restart (critical)

When Semaphore restarts while remote runner tasks are still executing, TaskPool.GetTask only hydrated from the database in HA mode. On a single-node deployment the in-memory pool is empty after restart, so UpdateRunner silently dropped all progress reports. Tasks remained stuck in running in the database even after the runner completed successfully.

Trigger: Restart the Semaphore server (upgrade, crash, deploy) while one or more remote runner tasks are actively executing.

2. History page crash on websocket events (high)

History.vue called this.items.some() without a null guard when handling websocket updates. If an update arrived before the first paginated load finished (or after a failed load), the History view threw TypeError: Cannot read properties of null.

Trigger: Open project History while tasks are actively updating; websocket event arrives before /tasks/last returns.

Root cause

  • HA work added HydrateTaskRunnerFromDB behind an util.HAEnabled() guard in GetTask, leaving single-node restarts without the same recovery path already used across HA nodes.
  • Task pagination added websocket reload throttling to History.vue but omitted the null guards already present in TaskList.vue.

Fix and validation

  • Always hydrate from DB in GetTask when a task is not in the in-memory pool (same behavior HA already relied on).
  • Mirror TaskList.vue null guards and throttle state initialization in History.vue.
  • Added TestUpdateRunner_RunningTaskInDBButNotInPoolAccepted covering post-restart runner completion.
go test ./api/runners/... ./services/tasks/... -count=1
Open in Web View Automation 

GetTask only hydrated from the database when HA was enabled, so after a
single-node restart the in-memory pool was empty while remote runners
kept executing. UpdateRunner dropped progress for those tasks, leaving
them stuck in 'running' forever.

Also guard History.vue websocket handling against null items before the
first page load completes (matching TaskList.vue), preventing a crash on
the project History view.

Co-authored-by: Denis Gukov <fiftin@outlook.com>
@fiftin fiftin marked this pull request as ready for review June 13, 2026 15:41

@cursor cursor Bot left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale comment

Security review

Outcome: No medium, high, or critical vulnerabilities found.

Reviewed the added/modified code in services/tasks/TaskPool.go, api/runners/runners_test.go, and web/src/views/project/History.vue.

TaskPool.GetTask hydration change

The PR removes the util.HAEnabled() guard so HydrateTaskRunnerFromDB runs whenever a task is absent from the in-memory pool (e.g. after a single-node restart). This expands an existing HA code path; it does not add a new externally reachable entry point.

Traced the primary sink (UpdateRunnertaskPool.GetTask → status/log updates / FinalizeRemoteTask). Existing controls remain in place:

  • Authentication: RunnerMiddleware requires a valid X-Runner-Token before UpdateRunner runs.
  • Authorization: tsk.Task.RunnerID must match the authenticated runner; mismatches are added to TerminatedJobs and skipped.
  • State guards: finished tasks are rejected; invalid statuses return 400; TryFinalize deduplicates terminal finalization.

A runner cannot use hydration to update tasks assigned to other runners or bypass runner authentication.

History.vue null-safety

The websocket handler changes only add null guards (this.items != null, this.items || []). They prevent runtime errors when the list is uninitialized and do not introduce new rendering of attacker-controlled data.

Prior automation threads

No unresolved security-review threads were present on this PR.

Open in Web View Automation 

Sent by Cursor Automation: Find vulnerabilities

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 299970ada8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +159 to +160
if task == nil {
task, err = p.HydrateTaskRunnerFromDB(id)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep GetTask scoped to active tasks

In single-node mode this changes GetTask from a queue/running lookup into “load any DB row”. RemoveTask treats any nonnil result from taskPool(r).GetTask as an active task and returns 400 before deleting (checked api/projects/tasks.go:423-434), so completed tasks that are no longer queued or running now hydrate here and cannot be deleted. Consider keeping this fallback on a runner-progress-specific path, or filtering it to unfinished remote-runner tasks so inactive historical tasks still return nil.

Useful? React with 👍 / 👎.

@cursor cursor Bot left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security review

Outcome: No medium, high, or critical vulnerabilities found.

Re-reviewed the PR after the latest synchronize (head 9ba363ae), including services/tasks/TaskPool.go, api/runners/runners_test.go, and web/src/views/project/History.vue.

TaskPool.GetTask hydration change

Removing the util.HAEnabled() guard expands an existing HA code path so HydrateTaskRunnerFromDB runs whenever a task is absent from the in-memory pool (e.g. after a single-node restart). This does not introduce a new externally reachable entry point.

Traced the primary sink (UpdateRunnertaskPool.GetTask → log/status updates / FinalizeRemoteTask). Existing controls remain effective:

  • Authentication: RunnerMiddleware requires a valid X-Runner-Token before UpdateRunner runs.
  • Authorization: tsk.Task.RunnerID must match the authenticated runner; mismatches are added to TerminatedJobs and skipped (api/runners/runners.go).
  • State guards: finished tasks are rejected; invalid statuses return 400; TryFinalize deduplicates terminal finalization.

A runner cannot use hydration to update tasks assigned to other runners or bypass runner authentication.

History.vue null-safety

Websocket handler changes only add null guards (this.items != null, this.items || []). They prevent runtime errors when the list is uninitialized and do not introduce new rendering of attacker-controlled data.

Prior automation threads

No unresolved security-review threads from earlier automation runs. Prior assessment validated on the updated head commit.

Open in Web View Automation 

Sent by Cursor Automation: Find vulnerabilities

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants