Skip to content

fix(automations): scheduled runs must not inherit ambient request context#380

Merged
mgoldsborough merged 2 commits into
mainfrom
fix/scheduled-run-context-bleed
Jun 3, 2026
Merged

fix(automations): scheduled runs must not inherit ambient request context#380
mgoldsborough merged 2 commits into
mainfrom
fix/scheduled-run-context-bleed

Conversation

@mgoldsborough
Copy link
Copy Markdown
Contributor

Symptom

A scheduled automation owned by one workspace executed focused on a different workspace — discovering only the other workspace's tools, with zero matches for its own. A cross-tenant isolation bleed in the scheduled-execution path.

Root cause

getExecutorContext (src/tools/platform/automations.ts) resolved who a run acts as by preferring the ambient getRequestContext() (AsyncLocalStorage) over the automation's own ownerId/workspaceId. Its comment assumed "a scheduled run has no request context." That assumption is violated:

  • The scheduler arms its timer with setTimeout(() => this.onTimer()) (scheduler.ts). AsyncLocalStorage snapshots propagate through setTimeout.
  • reload() / runNow() re-arm the timer and are invoked from LLM tool handlers, which run inside runWithRequestContext(<that chat's workspace>).
  • So creating/editing an automation from a chat in workspace B captures B's context in the timer closure. onTimer then fires inside B's context, and getExecutorContext reads it — overriding the automation's real provenance. It's self-perpetuating (onTimer re-arms from inside the captured context), so once poisoned, every scheduled tick for any owner runs focused on B until the process restarts.

runtime.executeTask derives identity/workspace solely from the passed request (runtime.tsresolveRequestOwnerId(request.identity), request.workspaceId) and establishes its own request context for the engine. So once the resolver is told the correct owner/workspace, the entire run is correctly scoped — fixing the resolver is sufficient; no ALS-clearing gymnastics needed.

Fix

Make the run trigger explicit instead of inferring "scheduled" from the absence of an ambient context (an inference a leaked timer context silently defeats):

  • Thread a trigger: "scheduled" | "manual" through Scheduler.dispatchRunExecutorcreateDirectExecutor's getContext. onTimer dispatches "scheduled"; runNow (test button) dispatches "manual". executeDirect defaults to "scheduled" (the isolation-safe direction).
  • Extract the decision into a pure, exported resolveExecutorContext(automation, trigger, reqCtx). Scheduled → automation owner + provenance workspace, ambient context ignored. Manual → dispatched synchronously inside the user's genuine request context, so it keeps using it (falling back to provenance).

Tests

  • automations-executor-context.test.ts (new): scheduled runs ignore a leaked workspace-B context; manual runs use it; identity-scope and missing-owner edge cases.
  • scheduler.test.ts "run trigger" block: timer dispatch is "scheduled", runNow is "manual".
  • automation-e2e.test.ts: updated the mock executor to the 3-arg signature and assert the run tool dispatches "manual" end to end.

bun run verify:static and the full automations suite (208 tests) pass.

Notes / follow-ups

  • Identity shape: scheduled runs act as { id: ownerId } (no orgRole). This is the pre-existing scheduled-run contract — the no-ambient-context path always produced it; this change only removes the wrong-tenant bleed that previously substituted a foreign full identity. If scheduled runs should carry the owner's full role (e.g. for admin-only tools), hydrating the owner's UserIdentity by id is a separate, orthogonal improvement.
  • Defense in depth (not in this PR): the deeper footgun is that the scheduler's timer captures a request-scoped ALS context at all. Severing it at the dispatch boundary (running scheduled dispatch in a cleared context) would harden any future ambient read on the scheduled path. The explicit trigger fully addresses the reported bug; this would be additional hardening.

…text

A scheduled automation could execute focused on the wrong workspace. The
scheduler arms its timer with setTimeout, and AsyncLocalStorage propagates the
request context active when armTimer ran. reload()/runNow() are called from LLM
tool handlers that run inside runWithRequestContext(<workspace>), so creating or
editing an automation from a chat in workspace B captured B's context in the
timer. getExecutorContext then preferred that ambient context over the
automation's own ownerId/workspaceId, so every scheduled tick (any owner) ran
focused on B — a cross-tenant isolation bleed. Observed as an automation owned
by one workspace discovering only another workspace's tools.

Make the run trigger explicit instead of inferring 'scheduled' from the absence
of an ambient context (which a leaked timer context defeats): thread a
'scheduled' | 'manual' trigger through the scheduler -> executor -> context
resolver. Scheduled runs resolve from the automation's owner/provenance and
ignore the ambient context entirely; manual (test-button) runs, dispatched
synchronously inside the user's genuine request context, keep using it.

executeTask already derives identity/workspace solely from the request object
and establishes its own request context for the engine, so correcting the
resolver is sufficient — no run reads the bled context once it is told who to
act as.

Extract the decision into a pure, exported resolveExecutorContext for direct
unit testing. Add regression coverage: scheduled runs ignore a leaked workspace
context; the scheduler dispatches timer runs as 'scheduled' and runNow as
'manual'; the run tool dispatches 'manual' end to end.
Invert resolveExecutorContext so only an explicit 'manual' trigger reads the
ambient request context; every other value — including undefined from an
untyped/test caller or a future trigger — falls through to the isolated
owner/provenance path. Reading another tenant's ambient context now requires an
explicit opt-in, so a new dispatch path that forgets to opt in cannot leak.

Previously the safe (scheduled) branch was the explicit one and anything
unrecognized fell into the ambient branch; since CI does not typecheck tests, an
untyped caller passing undefined would have hit the ambient path. Fail-closed
removes that edge.
@mgoldsborough
Copy link
Copy Markdown
Contributor Author

Follow-up from QA review (commit e152969): made resolveExecutorContext fail-closed.

The resolver now reads the ambient request context only on an explicit "manual" trigger; every other value — including undefined from an untyped/test caller or any future trigger — falls through to the isolated owner/provenance path. Previously the safe (scheduled) branch was the explicit one and anything unrecognized fell into the ambient branch — and since CI doesn't typecheck tests, an untyped caller passing undefined would have hit the ambient (isolation-unsafe) path. This inverts that so leaking another tenant's context requires an explicit opt-in.

Added a regression test asserting an unknown/undefined trigger does not read ambient context. verify:static + full automations suite (209 tests) pass.

On the other QA suggestion (identity hydration / orgRole): deferring as discussed — it's the pre-existing no-ambient contract, safer than the pre-fix bleed, and orthogonal. Worth a separate change only if a scheduled automation ever needs admin-gated tools.

@mgoldsborough mgoldsborough added the qa-reviewed QA review completed with no critical issues label Jun 3, 2026
@mgoldsborough mgoldsborough merged commit e45404c into main Jun 3, 2026
5 checks passed
@mgoldsborough mgoldsborough deleted the fix/scheduled-run-context-bleed branch June 3, 2026 08:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

qa-reviewed QA review completed with no critical issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant