fix(automations): scheduled runs must not inherit ambient request context#380
Conversation
…text A scheduled automation could execute focused on the wrong workspace. The scheduler arms its timer with setTimeout, and AsyncLocalStorage propagates the request context active when armTimer ran. reload()/runNow() are called from LLM tool handlers that run inside runWithRequestContext(<workspace>), so creating or editing an automation from a chat in workspace B captured B's context in the timer. getExecutorContext then preferred that ambient context over the automation's own ownerId/workspaceId, so every scheduled tick (any owner) ran focused on B — a cross-tenant isolation bleed. Observed as an automation owned by one workspace discovering only another workspace's tools. Make the run trigger explicit instead of inferring 'scheduled' from the absence of an ambient context (which a leaked timer context defeats): thread a 'scheduled' | 'manual' trigger through the scheduler -> executor -> context resolver. Scheduled runs resolve from the automation's owner/provenance and ignore the ambient context entirely; manual (test-button) runs, dispatched synchronously inside the user's genuine request context, keep using it. executeTask already derives identity/workspace solely from the request object and establishes its own request context for the engine, so correcting the resolver is sufficient — no run reads the bled context once it is told who to act as. Extract the decision into a pure, exported resolveExecutorContext for direct unit testing. Add regression coverage: scheduled runs ignore a leaked workspace context; the scheduler dispatches timer runs as 'scheduled' and runNow as 'manual'; the run tool dispatches 'manual' end to end.
Invert resolveExecutorContext so only an explicit 'manual' trigger reads the ambient request context; every other value — including undefined from an untyped/test caller or a future trigger — falls through to the isolated owner/provenance path. Reading another tenant's ambient context now requires an explicit opt-in, so a new dispatch path that forgets to opt in cannot leak. Previously the safe (scheduled) branch was the explicit one and anything unrecognized fell into the ambient branch; since CI does not typecheck tests, an untyped caller passing undefined would have hit the ambient path. Fail-closed removes that edge.
|
Follow-up from QA review (commit e152969): made The resolver now reads the ambient request context only on an explicit Added a regression test asserting an unknown/undefined trigger does not read ambient context. On the other QA suggestion (identity hydration / |
Symptom
A scheduled automation owned by one workspace executed focused on a different workspace — discovering only the other workspace's tools, with zero matches for its own. A cross-tenant isolation bleed in the scheduled-execution path.
Root cause
getExecutorContext(src/tools/platform/automations.ts) resolved who a run acts as by preferring the ambientgetRequestContext()(AsyncLocalStorage) over the automation's ownownerId/workspaceId. Its comment assumed "a scheduled run has no request context." That assumption is violated:setTimeout(() => this.onTimer())(scheduler.ts).AsyncLocalStoragesnapshots propagate throughsetTimeout.reload()/runNow()re-arm the timer and are invoked from LLM tool handlers, which run insiderunWithRequestContext(<that chat's workspace>).onTimerthen fires inside B's context, andgetExecutorContextreads it — overriding the automation's real provenance. It's self-perpetuating (onTimer re-arms from inside the captured context), so once poisoned, every scheduled tick for any owner runs focused on B until the process restarts.runtime.executeTaskderives identity/workspace solely from the passedrequest(runtime.ts—resolveRequestOwnerId(request.identity),request.workspaceId) and establishes its own request context for the engine. So once the resolver is told the correct owner/workspace, the entire run is correctly scoped — fixing the resolver is sufficient; no ALS-clearing gymnastics needed.Fix
Make the run trigger explicit instead of inferring "scheduled" from the absence of an ambient context (an inference a leaked timer context silently defeats):
trigger: "scheduled" | "manual"throughScheduler.dispatchRun→Executor→createDirectExecutor'sgetContext.onTimerdispatches"scheduled";runNow(test button) dispatches"manual".executeDirectdefaults to"scheduled"(the isolation-safe direction).resolveExecutorContext(automation, trigger, reqCtx). Scheduled → automation owner + provenance workspace, ambient context ignored. Manual → dispatched synchronously inside the user's genuine request context, so it keeps using it (falling back to provenance).Tests
automations-executor-context.test.ts(new): scheduled runs ignore a leaked workspace-B context; manual runs use it; identity-scope and missing-owner edge cases.scheduler.test.ts"run trigger" block: timer dispatch is"scheduled",runNowis"manual".automation-e2e.test.ts: updated the mock executor to the 3-arg signature and assert theruntool dispatches"manual"end to end.bun run verify:staticand the full automations suite (208 tests) pass.Notes / follow-ups
{ id: ownerId }(noorgRole). This is the pre-existing scheduled-run contract — the no-ambient-context path always produced it; this change only removes the wrong-tenant bleed that previously substituted a foreign full identity. If scheduled runs should carry the owner's full role (e.g. for admin-only tools), hydrating the owner'sUserIdentityby id is a separate, orthogonal improvement.