diff --git a/packages/docs/docs.json b/packages/docs/docs.json
index 00c1fa8d..05b972f9 100644
--- a/packages/docs/docs.json
+++ b/packages/docs/docs.json
@@ -40,6 +40,7 @@
         "pages": [
           "docs/sleeping",
           "docs/parallel-steps",
+          "docs/dynamic-steps",
           "docs/retries",
           "docs/type-safety",
           "docs/versioning",
diff --git a/packages/docs/docs/dynamic-steps.mdx b/packages/docs/docs/dynamic-steps.mdx
new file mode 100644
index 00000000..0165e462
--- /dev/null
+++ b/packages/docs/docs/dynamic-steps.mdx
@@ -0,0 +1,64 @@
+---
+title: Dynamic Steps
+description: Run a variable number of steps based on runtime data
+---
+
+Sometimes you don't know how many steps a workflow needs until it runs. You
+might need to fetch data for each item in a list, process rows from a query,
+or fan out across a set of IDs from an API response. OpenWorkflow handles
+this — you can create steps inside loops and maps, as long as each step has a
+deterministic name.
+
+## Basic Pattern
+
+Map over your data to create one step per item, then await them with `Promise.all`:
+
+```ts
+const results = await Promise.all(
+  input.items.map((item) =>
+    step.run({ name: `fetch-data:${item.id}` }, async () => {
+      return await thirdPartyApi.fetch(item.id);
+    }),
+  ),
+);
+```
+
+Each step is individually memoized. If the workflow restarts, completed steps
+return their cached results and only the remaining steps re-execute.
+
+The most important rule: **step names must be deterministic across replays**.
+Use a stable identifier from the data itself — like a database ID, a slug, or
+a unique key:
+
+```ts
+// Good — stable ID from the data
+step.run({ name: `process-order:${order.id}` }, ...)
+step.run({ name: `send-email:${user.email}` }, ...)
+
+// Bad — non-deterministic, different on every run
+step.run({ name: `task-${Date.now()}` }, ...)
+step.run({ name: `task-${crypto.randomUUID()}` }, ...)
+```
+
+
+  Non-deterministic names (timestamps, random values, request IDs) break replay.
+  Completed steps won't be found in history, causing them to re-execute.
+
+
+### Falling Back to Array Indexes
+
+When no stable ID exists, you can use the array index:
+
+```ts
+const results = await Promise.all(
+  input.items.map((item, index) =>
+    step.run({ name: `fetch-data:${index}` }, async () => {
+      return await thirdPartyApi.fetch(item.lookupKey);
+    }),
+  ),
+);
+```
+
+This is safe only if the array order is identical between the original run and
+any replay. If the order changes, cached results get returned for the wrong
+items.
diff --git a/packages/docs/docs/steps.mdx b/packages/docs/docs/steps.mdx
index f012b14b..896b8584 100644
--- a/packages/docs/docs/steps.mdx
+++ b/packages/docs/docs/steps.mdx
@@ -79,6 +79,9 @@ await step.run({ name: "step-1" }, ...);
 await step.run({ name: "step-2" }, ...);
 ```
 
+If you need to create a dynamic number of steps from runtime data (like
+mapping over an array), see [Dynamic Steps](/docs/dynamic-steps).
+
 
   Changing step names after workflows are in-flight can cause replay errors.
   Completed steps won't be found in the history, causing them to re-execute. To
@@ -184,3 +187,24 @@ await step.run({ name: "do-everything" }, async () => {
 
 If an operation has no side effects and is fast to compute, consider whether
 it really needs to be a step. Pure computations can happen outside of steps.
+
+## Large Payloads
+
+Every step result is persisted to the database. If a step returns a large
+payload, your workflow history can grow quickly, especially when you have many
+steps.
+
+A good pattern is to offload large data to external storage and return only a
+reference:
+
+```ts
+const reportRef = await step.run({ name: "fetch-report" }, async () => {
+  const report = await analyticsApi.generate(input.reportId);
+
+  const objectKey = `reports/${input.reportId}.json`;
+  await objectStore.put(objectKey, JSON.stringify(report));
+
+  // Store only the reference, not the full report
+  return { reportId: input.reportId, objectKey };
+});
+```
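The array-index fallback in `dynamic-steps.mdx` depends on replay finding completed steps by name, and a toy model makes the failure mode concrete. This is a minimal sketch, not OpenWorkflow's implementation: it assumes replay looks up completed steps in a history keyed by step name, and `StepHistory`, `runStep`, `runWorkflow`, and `processItem` are hypothetical helpers. It is also synchronous for brevity, whereas `step.run` is async.

```typescript
// Toy model of name-keyed step memoization. This is NOT OpenWorkflow's
// implementation; every helper here is hypothetical and exists only to show
// why step names must be stable across replays.
type StepHistory = Map<string, unknown>;

// Return the recorded result if a step with this name already completed;
// otherwise run `fn` and record its result under `name`.
function runStep<T>(history: StepHistory, name: string, fn: () => T): T {
  if (history.has(name)) {
    return history.get(name) as T;
  }
  const result = fn();
  history.set(name, result);
  return result;
}

interface Item {
  id: string;
  value: number;
}

// Hypothetical per-item work: label the item with its doubled value.
function processItem(item: Item): string {
  return `${item.id}=${item.value * 2}`;
}

// Run one step per item, naming each step by a stable ID or by array index.
function runWorkflow(
  history: StepHistory,
  items: Item[],
  naming: "id" | "index",
): string[] {
  return items.map((item, index) =>
    runStep(
      history,
      naming === "id" ? `process:${item.id}` : `process:${index}`,
      () => processItem(item),
    ),
  );
}

const firstRun: Item[] = [
  { id: "a", value: 1 },
  { id: "b", value: 2 },
];
// On replay, the same items arrive in a different order.
const replayRun: Item[] = [firstRun[1], firstRun[0]];

const idHistory: StepHistory = new Map();
runWorkflow(idHistory, firstRun, "id");
const replayById = runWorkflow(idHistory, replayRun, "id");

const indexHistory: StepHistory = new Map();
runWorkflow(indexHistory, firstRun, "index");
const replayByIndex = runWorkflow(indexHistory, replayRun, "index");

// replayById is ["b=4", "a=2"]: each item gets its own recorded result.
console.log(replayById);
// replayByIndex is ["a=2", "b=4"]: item "b" receives the result recorded for "a".
console.log(replayByIndex);
```

Keying by `item.id` gives each item its own cached result after the items are reordered; keying by index hands item `b` the result recorded for item `a`, which is exactly the "wrong items" case the page describes.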