Skip to content

feat(closes OPEN-10635): Claude Agent SDK TypeScript integration#208

Open
viniciusdsmello wants to merge 21 commits into
mainfrom
vini/open-10633-integration-add-claude-agent-sdk-support
Open

feat(closes OPEN-10635): Claude Agent SDK TypeScript integration#208
viniciusdsmello wants to merge 21 commits into
mainfrom
vini/open-10633-integration-add-claude-agent-sdk-support

Conversation

@viniciusdsmello
Copy link
Copy Markdown
Contributor

@viniciusdsmello viniciusdsmello commented May 12, 2026

Summary

First-party Openlayer tracing for the Claude Agent SDK (@anthropic-ai/claude-agent-sdk on npm).

  • Drop-in: import { query } from "openlayer/lib/integrations/claudeAgentSdk" — same signature as the SDK's query, auto-traced.
  • traceClaudeAgentSdk() runtime-patch init for codebases that can't change imports; patches query and ClaudeSDKClient.prototype idempotently.
  • Hooks compose with user-provided hooks (PreToolUse / PostToolUse / PostToolUseFailure); we never replace.
  • Captures: root AGENT step per query(), nested CHAT_COMPLETION per assistant turn, TOOL per tool call. Subagents nest via parent_tool_use_id. MCP tools parsed (mcp__server__tool). Cost/tokens/duration/session_id/agent_config on root. Per-query state isolated via AsyncLocalStorage. Stream is a pure observer.
  • New subpath export ./integrations/claude-agent-sdk and @anthropic-ai/claude-agent-sdk declared as optional peerDependency (^0.2.111).

Companion Python PR at openlayer-ai/openlayer-python#641 (OPEN-10634).

Test plan

  • 15 unit tests pass: yarn jest tests/integrations/claudeAgentSdk.test.ts
  • Live integration test runs end-to-end against the real SDK (gated on ANTHROPIC_API_KEY)
  • Example script examples/tracing/claude-agent-sdk/claudeAgentSdkTracing.ts runs end-to-end via npx tsx
  • yarn build succeeds and produces correct ESM/CJS/types for the new subpath export
  • yarn lint clean

Plan deviations (documented in individual commits)

  1. @anthropic-ai/claude-agent-sdk is ESM-only; loader uses require() with a dynamic-import() fallback so Jest workers and older Node both work.
  2. Cost/tokens stored directly on the AgentStep instance because the base Step.log() filters unknown keys.
  3. Added tsx as a devDependency so the example script can be smoke-tested locally.

Closes OPEN-10635.

🤖 Generated with Claude Code

Sets up the test directory at tests/integrations/, adds claudeAgentSdkMocks.ts
with shaped fake messages mirroring @anthropic-ai/claude-agent-sdk's tagged
union message stream, and a placeholder test that fails because the
integration module does not exist yet (drives Task B3).
The claude-agent-sdk integration needs to open a step in one hook callback
(PreToolUse) and close it from another (PostToolUse / PostToolUseFailure),
which createStep's [step, endStep] pair already supports. Re-export it
under the underscore-prefixed _internalCreateStep / _internalGetCurrentStep
names to keep it out of the supported public API while still being importable
across our own integrations.
…okens

Implements the minimal happy path of the integration:

- ``tracedQuery({prompt, options, inferencePipelineId})`` opens a root
  ``AGENT`` step before delegating to the SDK's ``query()``, forwards
  every message yielded by the underlying stream unchanged, and closes
  the step when the iterator drains.
- ``SystemMessage(subtype="init")`` populates ``metadata.session_id``
  and ``metadata.agent_config`` (model, tools, MCP servers with env
  stripped, skills, plugins, permission mode, cwd, version).
- ``ResultMessage`` finalizes cost, prompt/completion/total tokens,
  duration, stop_reason, subtype, num_turns, and per-model usage.
- Observation failures are caught and logged so a tracing bug never
  breaks the user's stream (the wrapper is a pure observer invariant).

Tests cover the root-step shape end-to-end and assert the
passthrough invariant: identical messages, identical order, same
object references.
Each ``AssistantMessage`` becomes a nested ``CHAT_COMPLETION`` step under
whichever step is currently top-of-stack: the root ``AGENT`` for
top-level turns, or the spawning Agent ``ToolStep`` for subagent turns
once Task B5's hook-driven tool steps keep that step open across the
subagent's stream. Captures concatenated text (output), thinking blocks
(metadata.thinking when ``captureThinking`` is on), ToolUseBlock IDs
(metadata.tool_calls), stop_reason, parent_tool_use_id, and per-turn
token usage.
…se hooks

Adds an internal trio of hook callbacks (PreToolUse / PostToolUse /
PostToolUseFailure) that bracket each tool invocation with an Openlayer
TOOL step. Hooks are merged into ``options.hooks`` via ``injectHooks``
without replacing any user matchers — both run alongside each other.

Concurrent ``tracedQuery()`` calls each get their own state via
``AsyncLocalStorage``, so a hook callback fired during one query's stream
sees only that query's pending-tool map.

The TOOL step carries the parsed ``mcp_server`` / ``mcp_tool_name`` for
``mcp__<server>__<tool>``-namespaced tools, the raw ``tool_input`` from
PreToolUse, the truncated ``tool_response`` from PostToolUse,
``is_error``, and a wall-clock ``latency_ms``.

Because PreToolUse opens the step before yielding the next message and
PostToolUse closes it after the tool body completes, the step stack
correctly contains the Agent ToolStep while subagent assistant messages
stream — so subagent turns nest under the spawning Agent step
automatically.
Adds three coverage tests:

- ``mcp__<server>__<tool>`` tool names get parsed into ``mcp_server`` and
  ``mcp_tool_name`` metadata on the TOOL step (the underscores in the
  tool portion of the name are preserved).
- Subagent assistant turns (messages with ``parent_tool_use_id`` set)
  nest under their spawning Agent ToolStep, not under the root AGENT —
  verified by walking the trace and asserting the agent-tool's nested
  steps list contains the subagent chats.
- ``ResultMessage(subtype="error_max_turns", is_error=true)`` propagates
  to ``root.metadata.subtype`` / ``root.metadata.is_error`` so the
  Openlayer dashboard can surface failed runs.
When a user passes their own ``options.hooks.PreToolUse``, our internal
PreToolUse matcher is appended (not substituted) so both fire on every
tool call. The user retains full control of the permission flow — they
can still return ``permissionDecision: "deny"`` or any other SDK hook
output — while Openlayer captures the tool call in parallel.
When ``redactMcpEnv`` (default true) is on, MCP server configs in
``metadata.agent_config.mcp_servers`` have ``env``, ``headers``, and
``authorization`` fields stripped. Non-sensitive fields (name, status,
transport, url, command) are preserved.
…port

Adds the two public entry points beyond ``tracedQuery``:

- ``query`` is exported as a drop-in for codebases that just want to
  swap ``import { query } from "@anthropic-ai/claude-agent-sdk"`` for
  ``import { query } from "@openlayer/sdk/integrations/claude-agent-sdk"``.
- ``traceClaudeAgentSdk(config?)`` mutates the SDK module's ``query`` (and
  ``ClaudeSDKClient.prototype.query`` / ``.receive_response`` if the
  class is present) at runtime so existing imports get auto-traced
  without code changes. Idempotent — calling it more than once only
  patches once but always refreshes the tunable config.

Both flag the patched function with ``_openlayerPatched = true`` and
preserve a reference to the original via ``_openlayerOriginal`` so
double-patching is detectable and the patch is in principle reversible.
- ``./integrations/claude-agent-sdk`` is exposed as a package subpath
  export so users can ``import { query, traceClaudeAgentSdk } from
  "@openlayer/sdk/integrations/claude-agent-sdk"``. The export entry
  covers ``import`` (ESM), ``require`` (CJS), and ``types`` to match
  the dual-emit shape of the rest of the package.
- ``@anthropic-ai/claude-agent-sdk`` becomes an optional peer
  dependency at ``^0.2.111`` so users who don't use this integration
  don't have to install it, but those who do get version compat
  warnings on mismatched majors. It's also installed as a
  devDependency so live tests and ``yarn build`` find type
  definitions during development.
- ``src/lib/integrations/index.ts`` re-exports the new module to
  match the existing pattern.
- Minor TS fixes flagged by ``tsc --exactOptionalPropertyTypes``:
  drop the explicit ``undefined`` initializer on a removed-when-unused
  field and cast ChatCompletionStep through ``any`` when calling
  ``.log()`` so subclass fields are accepted.
The wrapper is a pure observer — adds a stronger invariant test that
constructs a full system/assistant/tool_use/user/result stream
(including hook callbacks) and asserts every yielded message is the
same object reference as what the SDK produced, in the same order.
…eAgentSdk()

Adds a test that fakes a ``ClaudeSDKClient`` class on the (virtually
mocked) SDK module, calls ``traceClaudeAgentSdk()``, and exercises
``client.query()`` + ``client.receive_response()`` to confirm the
prototype patch installed in B9 emits a root AGENT trace with the
session metadata threaded through.
…API_KEY

The test skips unless ``ANTHROPIC_API_KEY`` is set. It boots the real
SDK (which spawns the bundled Claude Code subprocess), runs a tiny
one-turn ``claude-haiku-4-5`` query, and asserts the wrapper saw a
``result`` message with the expected content.

Implementation notes:

- ``@anthropic-ai/claude-agent-sdk`` ships ESM-only (``"type": "module"``)
  and uses ``import.meta.url``, so the @swc/jest CJS transform cannot
  load it from a ``require()``-style test. The live test file therefore
  uses an ESM ``import`` and must be run via
  ``NODE_OPTIONS=--experimental-vm-modules ./node_modules/.bin/jest``.
- The integration's internal SDK loader now tries ``require()`` first
  (fast path for the virtual jest mock + CJS consumers + Node 22.12+
  native ESM-require) and falls back to dynamic ``import()`` so callers
  on older Node or pure-CJS Jest setups still get a clear path. Unit
  tests stay on the synchronous require path via their virtual mock.

Verified locally that the wrapper publishes a trace to Openlayer
end-to-end; the supplied ANTHROPIC_API_KEY was rejected by Anthropic
(``invalid x-api-key``), so the test's content assertion fails on
that environment — not an integration issue. With a valid key, the
test passes.
Adds ``examples/tracing/claude-agent-sdk/claudeAgentSdkTracing.ts`` —
a concise runnable example showing both ways to use the integration:

- Option A: drop-in ``import { query } from
  "openlayer/lib/integrations/claudeAgentSdk"`` (recommended).
- Option B: explicit ``traceClaudeAgentSdk()`` runtime patch for
  codebases that can't change their imports.

Covers two scenarios: a code-search query with ``Read``/``Glob``/``Grep``
tools, and a subagent dispatch via the SDK's ``Agent`` tool to
illustrate parent_tool_use_id-based nesting.

Also adds ``tsx`` as a dev dependency so the example can be verified
locally with ``npx tsx``; the script's import paths resolve via the
package's existing ``./*`` subpath fallback.
…able directives

The project's ESLint config no longer flags ``require()`` in test
contexts, so all the per-call ``no-require-imports`` opt-outs are
unnecessary. Also runs prettier on the integration source, tests,
and example script to bring them in line with the project's style.

No behavior changes.
viniciusdsmello and others added 6 commits May 12, 2026 13:35
… title

The root AGENT step title was being built from the prompt content
("claude-agent-sdk: Say the word 'banana'..."), making the trace
sidebar in Openlayer noisy and inconsistent across runs. Use the
stable name "Claude Agent SDK query" instead. The prompt content is
still captured in root_step.inputs.prompt where it belongs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tearing down

The Openlayer TS tracer publishes traces via a fire-and-forget `.then()`
after `endRootStep()` returns. When the live test finished and Jest
started tearing down, the publish callback's `console.debug('Trace
uploaded successfully to Openlayer')` tripped Jest's 'Cannot log after
tests are done' guard, exiting non-zero even though the trace had been
uploaded successfully.

Add a 3s flush wait at the end of the test so the publish callback
completes inside the test's lifetime. Test-only; production code stays
fire-and-forget as designed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ts on root metadata

The root AGENT step was missing the user-provided system prompt and
subagent definitions that drove the run. Spec called for both, but the
initial implementation only captured the SDK's runtime-resolved
agent_config from SystemMessage(init), not the user's input options.

Capture, on the root step's metadata:
  - system_prompt (truncated to 4096 chars; supports string and
    preset object shapes)
  - agents_defined: { name -> { description, prompt, tools, model } }
  - options: { model, fallbackModel, maxTurns, maxBudgetUsd,
    permissionMode, cwd, allowedTools, disallowedTools, continue,
    resume, forkSession }

For ClaudeSDKClient, stash a shallow clone of the user's options on
the first patched query() call so they're available to receive_response()
before our hook injection mutates them in place.

New test: 'captures options.systemPrompt and options.agents on the
root step metadata'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… to exercise options-metadata capture

The live test now passes systemPrompt and maxTurns so the published
Openlayer trace surfaces the new metadata captured on the root step
(system_prompt, options.maxTurns), proving end-to-end that the wrapper
captures the user's configuration in addition to the SDK's
runtime-resolved agent_config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ase token aliases on every step

Previously the trace published to Openlayer was missing key visibility:

- Assistant turns with empty text content (e.g. thinking-only turns or
  pure tool-call turns) appeared blank in the UI. Fall back to a tool
  call summary, thinking text, or '[no content]' marker so reviewers
  always see something useful.
- Top-level assistant turns had no 'inputs' set, so the UI couldn't
  show what prompt triggered them. Surface the user's prompt as the
  step input for top-level turns (subagent turns are driven by their
  parent's Agent tool call, not a user prompt).
- The raw assistant message content was never serialized. Stash it in
  metadata.rawOutput (TS ChatCompletionStep has no first-class
  rawOutput field) so users can inspect the full block array.
- ToolUseBlocks were captured by id only; widen to { id, name, input }
  so tool calls are inspectable from the assistant turn.
- Root step had no rawOutput surface; stash a JSON-serialized
  ResultMessage in metadata.rawOutput.
- Capture state.model from SystemMessage(init) and surface it in root
  metadata so reviewers see the resolved model.

Adjusted test_claude_agent_sdk.test.ts to match the richer tool_calls
shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a richer example that exercises every step type the Openlayer
wrapper captures, so users can see a full trace tree:

- Root AGENT step with systemPrompt, agent_config, agents_defined,
  options, rawOutput.
- Per-turn CHAT_COMPLETION steps with prompt/completion tokens,
  thinking blocks, tool_calls, rawOutput.
- TOOL steps for:
    - mcp__file-stats__count_files (custom in-process MCP tool, built
      with createSdkMcpServer + tool() and a zod schema).
    - Glob and Read (built-in).
    - Agent (twice: code-reviewer and summary-writer subagents).
- Nested subagent steps correlated via parent_tool_use_id.

The 'codebase analyzer' scenario walks the agent through three steps:
count files by extension, dispatch a code-reviewer subagent to review
one file, then dispatch a summary-writer subagent to wrap up. Result
is a 4-line markdown report.

Verified end-to-end against the live API:
  - 5 turns, 10 assistant turn steps captured
  - Both subagents dispatched, real $0.14 cost
  - Trace uploaded to Openlayer with full step tree

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant