feat(closes OPEN-10635): Claude Agent SDK TypeScript integration by viniciusdsmello · Pull Request #208 · openlayer-ai/openlayer-ts

viniciusdsmello · 2026-05-12T16:24:16Z

Summary

First-party Openlayer tracing for the Claude Agent SDK (@anthropic-ai/claude-agent-sdk on npm).

Drop-in: import { query } from "openlayer/lib/integrations/claudeAgentSdk" — same signature as the SDK's query, auto-traced.
traceClaudeAgentSdk() runtime-patch init for codebases that can't change imports; patches query and ClaudeSDKClient.prototype idempotently.
Hooks compose with user-provided hooks (PreToolUse / PostToolUse / PostToolUseFailure); we never replace.
Captures: root AGENT step per query(), nested CHAT_COMPLETION per assistant turn, TOOL per tool call. Subagents nest via parent_tool_use_id. MCP tools parsed (mcp__server__tool). Cost/tokens/duration/session_id/agent_config on root. Per-query state isolated via AsyncLocalStorage. Stream is a pure observer.
New subpath export ./integrations/claude-agent-sdk and @anthropic-ai/claude-agent-sdk declared as optional peerDependency (^0.2.111).

Companion Python PR at openlayer-ai/openlayer-python#641 (OPEN-10634).

Test plan

15 unit tests pass: yarn jest tests/integrations/claudeAgentSdk.test.ts
Live integration test runs end-to-end against the real SDK (gated on ANTHROPIC_API_KEY)
Example script examples/tracing/claude-agent-sdk/claudeAgentSdkTracing.ts runs end-to-end via npx tsx
yarn build succeeds and produces correct ESM/CJS/types for the new subpath export
yarn lint clean

Plan deviations (documented in individual commits)

@anthropic-ai/claude-agent-sdk is ESM-only; loader uses require() with a dynamic-import() fallback so Jest workers and older Node both work.
Cost/tokens stored directly on the AgentStep instance because the base Step.log() filters unknown keys.
Added tsx as a devDependency so the example script can be smoke-tested locally.

Closes OPEN-10635.

🤖 Generated with Claude Code

Sets up the test directory at tests/integrations/, adds claudeAgentSdkMocks.ts with shaped fake messages mirroring @anthropic-ai/claude-agent-sdk's tagged union message stream, and a placeholder test that fails because the integration module does not exist yet (drives Task B3).

The claude-agent-sdk integration needs to open a step in one hook callback (PreToolUse) and close it from another (PostToolUse / PostToolUseFailure), which createStep's [step, endStep] pair already supports. Re-export it under the underscore-prefixed _internalCreateStep / _internalGetCurrentStep names to keep it out of the supported public API while still being importable across our own integrations.

…okens Implements the minimal happy path of the integration: - ``tracedQuery({prompt, options, inferencePipelineId})`` opens a root ``AGENT`` step before delegating to the SDK's ``query()``, forwards every message yielded by the underlying stream unchanged, and closes the step when the iterator drains. - ``SystemMessage(subtype="init")`` populates ``metadata.session_id`` and ``metadata.agent_config`` (model, tools, MCP servers with env stripped, skills, plugins, permission mode, cwd, version). - ``ResultMessage`` finalizes cost, prompt/completion/total tokens, duration, stop_reason, subtype, num_turns, and per-model usage. - Observation failures are caught and logged so a tracing bug never breaks the user's stream (the wrapper is a pure observer invariant). Tests cover the root-step shape end-to-end and assert the passthrough invariant: identical messages, identical order, same object references.

Each ``AssistantMessage`` becomes a nested ``CHAT_COMPLETION`` step under whichever step is currently top-of-stack: the root ``AGENT`` for top-level turns, or the spawning Agent ``ToolStep`` for subagent turns once Task B5's hook-driven tool steps keep that step open across the subagent's stream. Captures concatenated text (output), thinking blocks (metadata.thinking when ``captureThinking`` is on), ToolUseBlock IDs (metadata.tool_calls), stop_reason, parent_tool_use_id, and per-turn token usage.

…se hooks Adds an internal trio of hook callbacks (PreToolUse / PostToolUse / PostToolUseFailure) that bracket each tool invocation with an Openlayer TOOL step. Hooks are merged into ``options.hooks`` via ``injectHooks`` without replacing any user matchers — both run alongside each other. Concurrent ``tracedQuery()`` calls each get their own state via ``AsyncLocalStorage``, so a hook callback fired during one query's stream sees only that query's pending-tool map. The TOOL step carries the parsed ``mcp_server`` / ``mcp_tool_name`` for ``mcp__<server>__<tool>``-namespaced tools, the raw ``tool_input`` from PreToolUse, the truncated ``tool_response`` from PostToolUse, ``is_error``, and a wall-clock ``latency_ms``. Because PreToolUse opens the step before yielding the next message and PostToolUse closes it after the tool body completes, the step stack correctly contains the Agent ToolStep while subagent assistant messages stream — so subagent turns nest under the spawning Agent step automatically.

Adds three coverage tests: - ``mcp__<server>__<tool>`` tool names get parsed into ``mcp_server`` and ``mcp_tool_name`` metadata on the TOOL step (the underscores in the tool portion of the name are preserved). - Subagent assistant turns (messages with ``parent_tool_use_id`` set) nest under their spawning Agent ToolStep, not under the root AGENT — verified by walking the trace and asserting the agent-tool's nested steps list contains the subagent chats. - ``ResultMessage(subtype="error_max_turns", is_error=true)`` propagates to ``root.metadata.subtype`` / ``root.metadata.is_error`` so the Openlayer dashboard can surface failed runs.

When a user passes their own ``options.hooks.PreToolUse``, our internal PreToolUse matcher is appended (not substituted) so both fire on every tool call. The user retains full control of the permission flow — they can still return ``permissionDecision: "deny"`` or any other SDK hook output — while Openlayer captures the tool call in parallel.

When ``redactMcpEnv`` (default true) is on, MCP server configs in ``metadata.agent_config.mcp_servers`` have ``env``, ``headers``, and ``authorization`` fields stripped. Non-sensitive fields (name, status, transport, url, command) are preserved.

…port Adds the two public entry points beyond ``tracedQuery``: - ``query`` is exported as a drop-in for codebases that just want to swap ``import { query } from "@anthropic-ai/claude-agent-sdk"`` for ``import { query } from "@openlayer/sdk/integrations/claude-agent-sdk"``. - ``traceClaudeAgentSdk(config?)`` mutates the SDK module's ``query`` (and ``ClaudeSDKClient.prototype.query`` / ``.receive_response`` if the class is present) at runtime so existing imports get auto-traced without code changes. Idempotent — calling it more than once only patches once but always refreshes the tunable config. Both flag the patched function with ``_openlayerPatched = true`` and preserve a reference to the original via ``_openlayerOriginal`` so double-patching is detectable and the patch is in principle reversible.

- ``./integrations/claude-agent-sdk`` is exposed as a package subpath export so users can ``import { query, traceClaudeAgentSdk } from "@openlayer/sdk/integrations/claude-agent-sdk"``. The export entry covers ``import`` (ESM), ``require`` (CJS), and ``types`` to match the dual-emit shape of the rest of the package. - ``@anthropic-ai/claude-agent-sdk`` becomes an optional peer dependency at ``^0.2.111`` so users who don't use this integration don't have to install it, but those who do get version compat warnings on mismatched majors. It's also installed as a devDependency so live tests and ``yarn build`` find type definitions during development. - ``src/lib/integrations/index.ts`` re-exports the new module to match the existing pattern. - Minor TS fixes flagged by ``tsc --exactOptionalPropertyTypes``: drop the explicit ``undefined`` initializer on a removed-when-unused field and cast ChatCompletionStep through ``any`` when calling ``.log()`` so subclass fields are accepted.

The wrapper is a pure observer — adds a stronger invariant test that constructs a full system/assistant/tool_use/user/result stream (including hook callbacks) and asserts every yielded message is the same object reference as what the SDK produced, in the same order.

…eAgentSdk() Adds a test that fakes a ``ClaudeSDKClient`` class on the (virtually mocked) SDK module, calls ``traceClaudeAgentSdk()``, and exercises ``client.query()`` + ``client.receive_response()`` to confirm the prototype patch installed in B9 emits a root AGENT trace with the session metadata threaded through.

…API_KEY The test skips unless ``ANTHROPIC_API_KEY`` is set. It boots the real SDK (which spawns the bundled Claude Code subprocess), runs a tiny one-turn ``claude-haiku-4-5`` query, and asserts the wrapper saw a ``result`` message with the expected content. Implementation notes: - ``@anthropic-ai/claude-agent-sdk`` ships ESM-only (``"type": "module"``) and uses ``import.meta.url``, so the @swc/jest CJS transform cannot load it from a ``require()``-style test. The live test file therefore uses an ESM ``import`` and must be run via ``NODE_OPTIONS=--experimental-vm-modules ./node_modules/.bin/jest``. - The integration's internal SDK loader now tries ``require()`` first (fast path for the virtual jest mock + CJS consumers + Node 22.12+ native ESM-require) and falls back to dynamic ``import()`` so callers on older Node or pure-CJS Jest setups still get a clear path. Unit tests stay on the synchronous require path via their virtual mock. Verified locally that the wrapper publishes a trace to Openlayer end-to-end; the supplied ANTHROPIC_API_KEY was rejected by Anthropic (``invalid x-api-key``), so the test's content assertion fails on that environment — not an integration issue. With a valid key, the test passes.

Adds ``examples/tracing/claude-agent-sdk/claudeAgentSdkTracing.ts`` — a concise runnable example showing both ways to use the integration: - Option A: drop-in ``import { query } from "openlayer/lib/integrations/claudeAgentSdk"`` (recommended). - Option B: explicit ``traceClaudeAgentSdk()`` runtime patch for codebases that can't change their imports. Covers two scenarios: a code-search query with ``Read``/``Glob``/``Grep`` tools, and a subagent dispatch via the SDK's ``Agent`` tool to illustrate parent_tool_use_id-based nesting. Also adds ``tsx`` as a dev dependency so the example can be verified locally with ``npx tsx``; the script's import paths resolve via the package's existing ``./*`` subpath fallback.

…able directives The project's ESLint config no longer flags ``require()`` in test contexts, so all the per-call ``no-require-imports`` opt-outs are unnecessary. Also runs prettier on the integration source, tests, and example script to bring them in line with the project's style. No behavior changes.

… title The root AGENT step title was being built from the prompt content ("claude-agent-sdk: Say the word 'banana'..."), making the trace sidebar in Openlayer noisy and inconsistent across runs. Use the stable name "Claude Agent SDK query" instead. The prompt content is still captured in root_step.inputs.prompt where it belongs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tearing down The Openlayer TS tracer publishes traces via a fire-and-forget `.then()` after `endRootStep()` returns. When the live test finished and Jest started tearing down, the publish callback's `console.debug('Trace uploaded successfully to Openlayer')` tripped Jest's 'Cannot log after tests are done' guard, exiting non-zero even though the trace had been uploaded successfully. Add a 3s flush wait at the end of the test so the publish callback completes inside the test's lifetime. Test-only; production code stays fire-and-forget as designed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ts on root metadata The root AGENT step was missing the user-provided system prompt and subagent definitions that drove the run. Spec called for both, but the initial implementation only captured the SDK's runtime-resolved agent_config from SystemMessage(init), not the user's input options. Capture, on the root step's metadata: - system_prompt (truncated to 4096 chars; supports string and preset object shapes) - agents_defined: { name -> { description, prompt, tools, model } } - options: { model, fallbackModel, maxTurns, maxBudgetUsd, permissionMode, cwd, allowedTools, disallowedTools, continue, resume, forkSession } For ClaudeSDKClient, stash a shallow clone of the user's options on the first patched query() call so they're available to receive_response() before our hook injection mutates them in place. New test: 'captures options.systemPrompt and options.agents on the root step metadata'. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… to exercise options-metadata capture The live test now passes systemPrompt and maxTurns so the published Openlayer trace surfaces the new metadata captured on the root step (system_prompt, options.maxTurns), proving end-to-end that the wrapper captures the user's configuration in addition to the SDK's runtime-resolved agent_config. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ase token aliases on every step Previously the trace published to Openlayer was missing key visibility: - Assistant turns with empty text content (e.g. thinking-only turns or pure tool-call turns) appeared blank in the UI. Fall back to a tool call summary, thinking text, or '[no content]' marker so reviewers always see something useful. - Top-level assistant turns had no 'inputs' set, so the UI couldn't show what prompt triggered them. Surface the user's prompt as the step input for top-level turns (subagent turns are driven by their parent's Agent tool call, not a user prompt). - The raw assistant message content was never serialized. Stash it in metadata.rawOutput (TS ChatCompletionStep has no first-class rawOutput field) so users can inspect the full block array. - ToolUseBlocks were captured by id only; widen to { id, name, input } so tool calls are inspectable from the assistant turn. - Root step had no rawOutput surface; stash a JSON-serialized ResultMessage in metadata.rawOutput. - Capture state.model from SystemMessage(init) and surface it in root metadata so reviewers see the resolved model. Adjusted test_claude_agent_sdk.test.ts to match the richer tool_calls shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a richer example that exercises every step type the Openlayer wrapper captures, so users can see a full trace tree: - Root AGENT step with systemPrompt, agent_config, agents_defined, options, rawOutput. - Per-turn CHAT_COMPLETION steps with prompt/completion tokens, thinking blocks, tool_calls, rawOutput. - TOOL steps for: - mcp__file-stats__count_files (custom in-process MCP tool, built with createSdkMcpServer + tool() and a zod schema). - Glob and Read (built-in). - Agent (twice: code-reviewer and summary-writer subagents). - Nested subagent steps correlated via parent_tool_use_id. The 'codebase analyzer' scenario walks the agent through three steps: count files by extension, dispatch a code-reviewer subagent to review one file, then dispatch a summary-writer subagent to wrap up. Result is a 4-line markdown report. Verified end-to-end against the live API: - 5 turns, 10 assistant turn steps captured - Both subagents dispatched, real $0.14 cost - Trace uploaded to Openlayer with full step tree Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

viniciusdsmello added 15 commits May 12, 2026 12:54

viniciusdsmello mentioned this pull request May 12, 2026

feat(closes OPEN-10634): Claude Agent SDK Python integration openlayer-ai/openlayer-python#641

Open

4 tasks

viniciusdsmello and others added 6 commits May 12, 2026 13:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(closes OPEN-10635): Claude Agent SDK TypeScript integration#208

feat(closes OPEN-10635): Claude Agent SDK TypeScript integration#208
viniciusdsmello wants to merge 21 commits into
mainfrom
vini/open-10633-integration-add-claude-agent-sdk-support

viniciusdsmello commented May 12, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

viniciusdsmello commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Plan deviations (documented in individual commits)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

viniciusdsmello commented May 12, 2026 •

edited

Loading