feat(closes OPEN-10635): Claude Agent SDK TypeScript integration#208
Open
viniciusdsmello wants to merge 21 commits into
Open
feat(closes OPEN-10635): Claude Agent SDK TypeScript integration#208viniciusdsmello wants to merge 21 commits into
viniciusdsmello wants to merge 21 commits into
Conversation
Sets up the test directory at tests/integrations/, adds claudeAgentSdkMocks.ts with shaped fake messages mirroring @anthropic-ai/claude-agent-sdk's tagged union message stream, and a placeholder test that fails because the integration module does not exist yet (drives Task B3).
The claude-agent-sdk integration needs to open a step in one hook callback (PreToolUse) and close it from another (PostToolUse / PostToolUseFailure), which createStep's [step, endStep] pair already supports. Re-export it under the underscore-prefixed _internalCreateStep / _internalGetCurrentStep names to keep it out of the supported public API while still being importable across our own integrations.
…okens
Implements the minimal happy path of the integration:
- ``tracedQuery({prompt, options, inferencePipelineId})`` opens a root
``AGENT`` step before delegating to the SDK's ``query()``, forwards
every message yielded by the underlying stream unchanged, and closes
the step when the iterator drains.
- ``SystemMessage(subtype="init")`` populates ``metadata.session_id``
and ``metadata.agent_config`` (model, tools, MCP servers with env
stripped, skills, plugins, permission mode, cwd, version).
- ``ResultMessage`` finalizes cost, prompt/completion/total tokens,
duration, stop_reason, subtype, num_turns, and per-model usage.
- Observation failures are caught and logged so a tracing bug never
breaks the user's stream (the wrapper is a pure observer invariant).
Tests cover the root-step shape end-to-end and assert the
passthrough invariant: identical messages, identical order, same
object references.
Each ``AssistantMessage`` becomes a nested ``CHAT_COMPLETION`` step under whichever step is currently top-of-stack: the root ``AGENT`` for top-level turns, or the spawning Agent ``ToolStep`` for subagent turns once Task B5's hook-driven tool steps keep that step open across the subagent's stream. Captures concatenated text (output), thinking blocks (metadata.thinking when ``captureThinking`` is on), ToolUseBlock IDs (metadata.tool_calls), stop_reason, parent_tool_use_id, and per-turn token usage.
…se hooks Adds an internal trio of hook callbacks (PreToolUse / PostToolUse / PostToolUseFailure) that bracket each tool invocation with an Openlayer TOOL step. Hooks are merged into ``options.hooks`` via ``injectHooks`` without replacing any user matchers — both run alongside each other. Concurrent ``tracedQuery()`` calls each get their own state via ``AsyncLocalStorage``, so a hook callback fired during one query's stream sees only that query's pending-tool map. The TOOL step carries the parsed ``mcp_server`` / ``mcp_tool_name`` for ``mcp__<server>__<tool>``-namespaced tools, the raw ``tool_input`` from PreToolUse, the truncated ``tool_response`` from PostToolUse, ``is_error``, and a wall-clock ``latency_ms``. Because PreToolUse opens the step before yielding the next message and PostToolUse closes it after the tool body completes, the step stack correctly contains the Agent ToolStep while subagent assistant messages stream — so subagent turns nest under the spawning Agent step automatically.
Adds three coverage tests: - ``mcp__<server>__<tool>`` tool names get parsed into ``mcp_server`` and ``mcp_tool_name`` metadata on the TOOL step (the underscores in the tool portion of the name are preserved). - Subagent assistant turns (messages with ``parent_tool_use_id`` set) nest under their spawning Agent ToolStep, not under the root AGENT — verified by walking the trace and asserting the agent-tool's nested steps list contains the subagent chats. - ``ResultMessage(subtype="error_max_turns", is_error=true)`` propagates to ``root.metadata.subtype`` / ``root.metadata.is_error`` so the Openlayer dashboard can surface failed runs.
When a user passes their own ``options.hooks.PreToolUse``, our internal PreToolUse matcher is appended (not substituted) so both fire on every tool call. The user retains full control of the permission flow — they can still return ``permissionDecision: "deny"`` or any other SDK hook output — while Openlayer captures the tool call in parallel.
When ``redactMcpEnv`` (default true) is on, MCP server configs in ``metadata.agent_config.mcp_servers`` have ``env``, ``headers``, and ``authorization`` fields stripped. Non-sensitive fields (name, status, transport, url, command) are preserved.
…port
Adds the two public entry points beyond ``tracedQuery``:
- ``query`` is exported as a drop-in for codebases that just want to
swap ``import { query } from "@anthropic-ai/claude-agent-sdk"`` for
``import { query } from "@openlayer/sdk/integrations/claude-agent-sdk"``.
- ``traceClaudeAgentSdk(config?)`` mutates the SDK module's ``query`` (and
``ClaudeSDKClient.prototype.query`` / ``.receive_response`` if the
class is present) at runtime so existing imports get auto-traced
without code changes. Idempotent — calling it more than once only
patches once but always refreshes the tunable config.
Both flag the patched function with ``_openlayerPatched = true`` and
preserve a reference to the original via ``_openlayerOriginal`` so
double-patching is detectable and the patch is in principle reversible.
- ``./integrations/claude-agent-sdk`` is exposed as a package subpath
export so users can ``import { query, traceClaudeAgentSdk } from
"@openlayer/sdk/integrations/claude-agent-sdk"``. The export entry
covers ``import`` (ESM), ``require`` (CJS), and ``types`` to match
the dual-emit shape of the rest of the package.
- ``@anthropic-ai/claude-agent-sdk`` becomes an optional peer
dependency at ``^0.2.111`` so users who don't use this integration
don't have to install it, but those who do get version compat
warnings on mismatched majors. It's also installed as a
devDependency so live tests and ``yarn build`` find type
definitions during development.
- ``src/lib/integrations/index.ts`` re-exports the new module to
match the existing pattern.
- Minor TS fixes flagged by ``tsc --exactOptionalPropertyTypes``:
drop the explicit ``undefined`` initializer on a removed-when-unused
field and cast ChatCompletionStep through ``any`` when calling
``.log()`` so subclass fields are accepted.
The wrapper is a pure observer — adds a stronger invariant test that constructs a full system/assistant/tool_use/user/result stream (including hook callbacks) and asserts every yielded message is the same object reference as what the SDK produced, in the same order.
…eAgentSdk() Adds a test that fakes a ``ClaudeSDKClient`` class on the (virtually mocked) SDK module, calls ``traceClaudeAgentSdk()``, and exercises ``client.query()`` + ``client.receive_response()`` to confirm the prototype patch installed in B9 emits a root AGENT trace with the session metadata threaded through.
…API_KEY The test skips unless ``ANTHROPIC_API_KEY`` is set. It boots the real SDK (which spawns the bundled Claude Code subprocess), runs a tiny one-turn ``claude-haiku-4-5`` query, and asserts the wrapper saw a ``result`` message with the expected content. Implementation notes: - ``@anthropic-ai/claude-agent-sdk`` ships ESM-only (``"type": "module"``) and uses ``import.meta.url``, so the @swc/jest CJS transform cannot load it from a ``require()``-style test. The live test file therefore uses an ESM ``import`` and must be run via ``NODE_OPTIONS=--experimental-vm-modules ./node_modules/.bin/jest``. - The integration's internal SDK loader now tries ``require()`` first (fast path for the virtual jest mock + CJS consumers + Node 22.12+ native ESM-require) and falls back to dynamic ``import()`` so callers on older Node or pure-CJS Jest setups still get a clear path. Unit tests stay on the synchronous require path via their virtual mock. Verified locally that the wrapper publishes a trace to Openlayer end-to-end; the supplied ANTHROPIC_API_KEY was rejected by Anthropic (``invalid x-api-key``), so the test's content assertion fails on that environment — not an integration issue. With a valid key, the test passes.
Adds ``examples/tracing/claude-agent-sdk/claudeAgentSdkTracing.ts`` —
a concise runnable example showing both ways to use the integration:
- Option A: drop-in ``import { query } from
"openlayer/lib/integrations/claudeAgentSdk"`` (recommended).
- Option B: explicit ``traceClaudeAgentSdk()`` runtime patch for
codebases that can't change their imports.
Covers two scenarios: a code-search query with ``Read``/``Glob``/``Grep``
tools, and a subagent dispatch via the SDK's ``Agent`` tool to
illustrate parent_tool_use_id-based nesting.
Also adds ``tsx`` as a dev dependency so the example can be verified
locally with ``npx tsx``; the script's import paths resolve via the
package's existing ``./*`` subpath fallback.
…able directives The project's ESLint config no longer flags ``require()`` in test contexts, so all the per-call ``no-require-imports`` opt-outs are unnecessary. Also runs prettier on the integration source, tests, and example script to bring them in line with the project's style. No behavior changes.
4 tasks
… title
The root AGENT step title was being built from the prompt content
("claude-agent-sdk: Say the word 'banana'..."), making the trace
sidebar in Openlayer noisy and inconsistent across runs. Use the
stable name "Claude Agent SDK query" instead. The prompt content is
still captured in root_step.inputs.prompt where it belongs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tearing down
The Openlayer TS tracer publishes traces via a fire-and-forget `.then()`
after `endRootStep()` returns. When the live test finished and Jest
started tearing down, the publish callback's `console.debug('Trace
uploaded successfully to Openlayer')` tripped Jest's 'Cannot log after
tests are done' guard, exiting non-zero even though the trace had been
uploaded successfully.
Add a 3s flush wait at the end of the test so the publish callback
completes inside the test's lifetime. Test-only; production code stays
fire-and-forget as designed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ts on root metadata
The root AGENT step was missing the user-provided system prompt and
subagent definitions that drove the run. Spec called for both, but the
initial implementation only captured the SDK's runtime-resolved
agent_config from SystemMessage(init), not the user's input options.
Capture, on the root step's metadata:
- system_prompt (truncated to 4096 chars; supports string and
preset object shapes)
- agents_defined: { name -> { description, prompt, tools, model } }
- options: { model, fallbackModel, maxTurns, maxBudgetUsd,
permissionMode, cwd, allowedTools, disallowedTools, continue,
resume, forkSession }
For ClaudeSDKClient, stash a shallow clone of the user's options on
the first patched query() call so they're available to receive_response()
before our hook injection mutates them in place.
New test: 'captures options.systemPrompt and options.agents on the
root step metadata'.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… to exercise options-metadata capture The live test now passes systemPrompt and maxTurns so the published Openlayer trace surfaces the new metadata captured on the root step (system_prompt, options.maxTurns), proving end-to-end that the wrapper captures the user's configuration in addition to the SDK's runtime-resolved agent_config. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ase token aliases on every step
Previously the trace published to Openlayer was missing key visibility:
- Assistant turns with empty text content (e.g. thinking-only turns or
pure tool-call turns) appeared blank in the UI. Fall back to a tool
call summary, thinking text, or '[no content]' marker so reviewers
always see something useful.
- Top-level assistant turns had no 'inputs' set, so the UI couldn't
show what prompt triggered them. Surface the user's prompt as the
step input for top-level turns (subagent turns are driven by their
parent's Agent tool call, not a user prompt).
- The raw assistant message content was never serialized. Stash it in
metadata.rawOutput (TS ChatCompletionStep has no first-class
rawOutput field) so users can inspect the full block array.
- ToolUseBlocks were captured by id only; widen to { id, name, input }
so tool calls are inspectable from the assistant turn.
- Root step had no rawOutput surface; stash a JSON-serialized
ResultMessage in metadata.rawOutput.
- Capture state.model from SystemMessage(init) and surface it in root
metadata so reviewers see the resolved model.
Adjusted test_claude_agent_sdk.test.ts to match the richer tool_calls
shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a richer example that exercises every step type the Openlayer
wrapper captures, so users can see a full trace tree:
- Root AGENT step with systemPrompt, agent_config, agents_defined,
options, rawOutput.
- Per-turn CHAT_COMPLETION steps with prompt/completion tokens,
thinking blocks, tool_calls, rawOutput.
- TOOL steps for:
- mcp__file-stats__count_files (custom in-process MCP tool, built
with createSdkMcpServer + tool() and a zod schema).
- Glob and Read (built-in).
- Agent (twice: code-reviewer and summary-writer subagents).
- Nested subagent steps correlated via parent_tool_use_id.
The 'codebase analyzer' scenario walks the agent through three steps:
count files by extension, dispatch a code-reviewer subagent to review
one file, then dispatch a summary-writer subagent to wrap up. Result
is a 4-line markdown report.
Verified end-to-end against the live API:
- 5 turns, 10 assistant turn steps captured
- Both subagents dispatched, real $0.14 cost
- Trace uploaded to Openlayer with full step tree
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
First-party Openlayer tracing for the Claude Agent SDK (
@anthropic-ai/claude-agent-sdkon npm).import { query } from "openlayer/lib/integrations/claudeAgentSdk"— same signature as the SDK'squery, auto-traced.traceClaudeAgentSdk()runtime-patch init for codebases that can't change imports; patchesqueryandClaudeSDKClient.prototypeidempotently.PreToolUse/PostToolUse/PostToolUseFailure); we never replace.AGENTstep perquery(), nestedCHAT_COMPLETIONper assistant turn,TOOLper tool call. Subagents nest viaparent_tool_use_id. MCP tools parsed (mcp__server__tool). Cost/tokens/duration/session_id/agent_configon root. Per-query state isolated viaAsyncLocalStorage. Stream is a pure observer../integrations/claude-agent-sdkand@anthropic-ai/claude-agent-sdkdeclared as optional peerDependency (^0.2.111).Companion Python PR at openlayer-ai/openlayer-python#641 (OPEN-10634).
Test plan
yarn jest tests/integrations/claudeAgentSdk.test.tsANTHROPIC_API_KEY)examples/tracing/claude-agent-sdk/claudeAgentSdkTracing.tsruns end-to-end vianpx tsxyarn buildsucceeds and produces correct ESM/CJS/types for the new subpath exportyarn lintcleanPlan deviations (documented in individual commits)
@anthropic-ai/claude-agent-sdkis ESM-only; loader usesrequire()with a dynamic-import()fallback so Jest workers and older Node both work.AgentStepinstance because the baseStep.log()filters unknown keys.tsxas a devDependency so the example script can be smoke-tested locally.Closes OPEN-10635.
🤖 Generated with Claude Code