Skip to content

Emit OTEL Traces for workflow executions #501

@joeruello

Description

@joeruello

Hi there! As discussed on discord, this is a proposal/design to add OTel traces to the SDK so workflow runs and step attempts show up alongside the rest of a user's traces, with trace context + baggage flowing across replays and child workflows.

The proposal covers workflow traces only, does not include OTEL for metrics, logging, backend spans, DB migrations etc.

Traces

Example trace

A process-order workflow that calls a child workflow, waits up to an hour for a confirmation signal, cools down for 30 seconds, and then notifies the customer (with a retry). The parent runs across five executions; the child finishes in one.

defineWorkflow({ name: "process-order" }, async ({ step, input }) => {
  await step.run({ name: "validate-order" }, () => { ... });
  await step.runWorkflow(fulfillShipment, { orderId: input.orderId });
  await step.waitForSignal({
    signal: `confirmed:${input.orderId}`,
    timeout: "1h",
  });
  await step.sleep("cooldown", "30s");
  await step.run({ name: "notify-customer" }, () => { ... });
});

Everything lives under the trace_id of the caller that invoked the workflow.

app.orders.handler                                  caller's HTTP span
└── workflow.start process-order                    creates run-A
    │
    ├── workflow.execute process-order              [run-A execution 1]
    │   ├── step.run validate-order
    │   └── step.run_workflow fulfill-shipment      creates run-B, parks
    │       └── workflow.execute fulfill-shipment   [run-B execution 1]
    │           ├── step.run reserve-inventory
    │           └── step.run dispatch-courier
    │
    ├── workflow.execute process-order              [run-A execution 2, replay=true]
    │   ├── step.run_workflow fulfill-shipment      child done, OK
    │   └── step.wait_for_signal                 parks until signal or timeout
    │
    ├── workflow.execute process-order              [run-A execution 3, replay=true]
    │   ├── step.wait_for_signal                 signal received, OK
    │   └── step.sleep cooldown                     parks until t+30s
    │
    ├── workflow.execute process-order              [run-A execution 4, replay=true]   ERROR
    │   └── step.run notify-customer                fails, schedules retry
    │
    └── workflow.execute process-order              [run-A execution 5, replay=true]
        └── step.run notify-customer                prior_failures=1, OK

Things to note:

  • All executions of a run are siblings under the original workflow.start span, not nested in each other. The carrier captured at run creation is what every execution extracts.
  • Replayed steps emit no spans. Execution 2 doesn't re-emit validate-order, and execution 5 only re-emits notify-customer because that's the one that failed and needs another attempt.
  • step.wait_for_signal emits twice — once when set up (execution 2) and once when the signal is observed (execution 3). step.sleep only emits when set up (execution 3); the resume is harness-driven and silent.

Decisions

I've made the following initial decisions, happy to be challenged on any of them :)

Caller is the remote parent of every execution.
All executions share the caller's trace ID, People would expect "this request kicked off this workflow" to render as parent/child in their trace viewer.

Long sleeps produce late spans.
A workflow that sleeps for a week produces spans a week after the caller's span ended. Backends handle this gracefully but the trace view may split. workflow_run.id is on every span for cross-trace queries.

Carrier lives in workflow_runs.context.telemetry.
We use propagation.inject/extract, so traceparent + tracestate + baggage all ride along, dispatched to whatever propagators the user has registered. (I think this is the column you mentioned in discord?). Context is captured at the time when the workflow is invoked, in the above example:

// run-A.context.telemetry
{
  "traceparent": "00-<trace_id>-<caller_span_id>-01",
  "baggage": "tenant.id=acme,user.id=123"
}

// run-B.context.telemetry
{
  "traceparent": "00-<trace_id>-<run_A_step_run_workflow_span_id>-01",
  "baggage": "tenant.id=acme,user.id=123"
}

Replayed steps don't emit spans.
Cached history hits would just duplicate the work that already showed up on an earlier execution.

Spans

Span names are <operation> <name>, where <name> is the workflow or step
name.

Span When
workflow.start <workflow.name> client.runWorkflow() is called — wraps carrier injection + run creation
workflow.execute <workflow.name> Once per execution
step.run <step.name> A step.run actually runs (not a cache hit)
step.sleep <step.name> First encounter — creates the attempt, ends OK
step.run_workflow <step.name> First encounter (creates child) and on the execution that observes the child finish
step.wait_for_signal (or step.wait_for_signal <step.name> when user-supplied) First encounter (creates the attempt) and on the execution that observes delivery or timeout .

Attributes

Common: workflow.name, workflow.version, workflow_run.id, workflow_run.namespace_id,
workflow_run.attempt, workflow_run.replay, worker.id.

On step spans: step.name, step.kind (function / sleep / workflow /
signal-wait), step_attempt.id, step_attempt.prior_failures.

Kind-specific:

  • sleep adds step.resume_at
  • run_workflow adds child_workflow.name, child_workflow_run.id, step.timeout_at
  • wait_for_signal adds signal.name, step.timeout_at.

Inputs, outputs, signal data etc are not on spans as these any may PII that we don't
want to ship to user's vendors without their knowledge.

Status

  • workflow.executeOK on completion or park; ERROR on fail or cancel.
  • step.*OK on success or first-encounter park; ERROR +recordException on failure.
  • A run_workflow resolution is ERROR if the child failed, canceled, or timed out.

Questions

  • Do we want to have a hard dependency on @opentelemetry/api in the core, or do we want to ship a @openworkflow/otel sidecar package that wires all this up with some kind of middleware/interceptor pattern? Guidance here would be appreciated.

I've been looking for something like openworkflow for a while now, so you thank you for actually making it happen. Any feedback is much appreciated. Disclaimer that I went back and forth with Opus 4.7 for a couple of hours to generate above, I've tried to verify any assumptions etc it has made but apologies if I missed anything 🙏

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions