
feat: Add auto and wrapper instrumentation for @github/copilot-sdk #1932

Open
Stephen Belanger (Qard) wants to merge 2 commits into main from feat/github-copilot-sdk-instrumentation

@Qard (Contributor)

Summary

  • Adds auto-instrumentation (via --import braintrust/hook.mjs and bundler plugins) and a wrapCopilotClient(client) manual wrapper for @github/copilot-sdk v0.3+
  • Span tree: Copilot Session (TASK) → Copilot Turn (TASK) → github.copilot.llm (LLM) / tool: <name> (TOOL) / Agent: <name> (TASK for sub-agents)
  • Token metrics on LLM spans use the shared Anthropic cache-token helpers: prompt_tokens, completion_tokens, prompt_cached_tokens, prompt_cache_creation_tokens, completion_reasoning_tokens, reasoning_tokens, tokens
  • Copilot-specific billing fields (cost, quotaSnapshots, copilotUsage) flow into namespaced github_copilot.* metadata, not metrics
  • Orchestrion configs target CopilotClient.{createSession,resumeSession} in dist/client.js and CopilotSession.sendAndWait in dist/session.js (both ESM and CJS)
  • Also fixes imports.test.ts to skip node_modules directories and .d.ts files during the dynamic-import scan (node_modules can be populated by test fixture installs)
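
A minimal sketch of the metrics/metadata split described above. The usage-event field names (`inputTokens`, `outputTokens`, `cacheReadTokens`, `cacheWriteTokens`, `reasoningTokens`, `cost`, `quotaSnapshots`, `copilotUsage`) are assumptions, not the SDK's confirmed schema, and `splitUsage` is a hypothetical stand-in, not the PR's `extractMetricsFromUsage`. It also assumes the Anthropic convention in which cache reads and cache writes are reported separately from uncached input and are folded back into `prompt_tokens`; the metadata key names are illustrative.

```typescript
// Hypothetical shape of a Copilot usage event; field names are assumptions.
interface CopilotUsageEvent {
  inputTokens?: number;
  outputTokens?: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
  reasoningTokens?: number;
  cost?: number;
  quotaSnapshots?: Record<string, unknown>;
  copilotUsage?: Record<string, unknown>;
}

type Metrics = Record<string, number>;
type Metadata = Record<string, unknown>;

function splitUsage(u: CopilotUsageEvent): { metrics: Metrics; metadata: Metadata } {
  const cached = u.cacheReadTokens ?? 0;
  const created = u.cacheWriteTokens ?? 0;
  // Anthropic-style accounting (assumption): uncached input + cache reads + cache writes.
  const prompt = (u.inputTokens ?? 0) + cached + created;
  const completion = u.outputTokens ?? 0;

  const metrics: Metrics = {
    prompt_tokens: prompt,
    completion_tokens: completion,
    prompt_cached_tokens: cached,
    prompt_cache_creation_tokens: created,
    tokens: prompt + completion,
  };
  if (u.reasoningTokens !== undefined) {
    // Both reasoning metrics are filled from the same count in this sketch.
    metrics.completion_reasoning_tokens = u.reasoningTokens;
    metrics.reasoning_tokens = u.reasoningTokens;
  }

  // Billing data is not a token count, so it goes into namespaced metadata.
  const metadata: Metadata = {};
  if (u.cost !== undefined) metadata["github_copilot.cost"] = u.cost;
  if (u.quotaSnapshots !== undefined) metadata["github_copilot.quota_snapshots"] = u.quotaSnapshots;
  if (u.copilotUsage !== undefined) metadata["github_copilot.copilot_usage"] = u.copilotUsage;
  return { metrics, metadata };
}
```

Keeping billing fields out of `metrics` means token aggregations never mix counts with cost multipliers or quota values.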

Architecture notes

The Copilot SDK is an agent-runtime client (JSON-RPC over a CLI subprocess), not a chat-completions wrapper, so instrumentation follows the lifecycle/processor pattern used for @anthropic-ai/claude-agent-sdk rather than the per-method pattern used for Groq/OpenAI.

Span lifecycle is driven by the assistant.turn_start/turn_end event stream, with assistant.usage events supplying per-LLM-call metrics. Tool and sub-agent spans are driven by the tool.execution_start/complete and subagent.started/completed/failed events. Session close is triggered by the injected onSessionEnd hook.
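
The event-to-span mapping can be sketched as a small processor. Only the event names and span names come from this PR; the payload fields (`toolCallId`, `toolName`, `agentId`, `agentName`) and the `FakeSpan` class are hypothetical stand-ins, not the SDK's event types or the Braintrust span API. In this sketch, children attach to the open turn (or to the session if no turn is open), and `onSessionEnd` closes anything still open.

```typescript
// Hypothetical event shapes; real @github/copilot-sdk payloads may differ.
type CopilotEvent =
  | { type: "assistant.turn_start" }
  | { type: "assistant.turn_end" }
  | { type: "assistant.usage" }
  | { type: "tool.execution_start"; toolCallId: string; toolName: string }
  | { type: "tool.execution_complete"; toolCallId: string }
  | { type: "subagent.started"; agentId: string; agentName: string }
  | { type: "subagent.completed" | "subagent.failed"; agentId: string };

type SpanKind = "TASK" | "LLM" | "TOOL";

// Minimal recording span (not the Braintrust Span API).
class FakeSpan {
  readonly name: string;
  readonly kind: SpanKind;
  readonly children: FakeSpan[] = [];
  ended = false;

  constructor(name: string, kind: SpanKind) {
    this.name = name;
    this.kind = kind;
  }

  child(name: string, kind: SpanKind): FakeSpan {
    const span = new FakeSpan(name, kind);
    this.children.push(span);
    return span;
  }

  end(): void {
    this.ended = true;
  }
}

class SessionProcessor {
  readonly session = new FakeSpan("Copilot Session", "TASK");
  private turn: FakeSpan | undefined;
  // Open tool-call and sub-agent spans, keyed by their (hypothetical) IDs.
  private readonly open = new Map<string, FakeSpan>();

  handle(e: CopilotEvent): void {
    const parent = this.turn ?? this.session;
    switch (e.type) {
      case "assistant.turn_start":
        this.turn = this.session.child("Copilot Turn", "TASK");
        break;
      case "assistant.turn_end":
        this.turn?.end();
        this.turn = undefined;
        break;
      case "assistant.usage":
        // Simplification: one already-finished LLM span per usage event.
        parent.child("github.copilot.llm", "LLM").end();
        break;
      case "tool.execution_start":
        this.open.set(e.toolCallId, parent.child(`tool: ${e.toolName}`, "TOOL"));
        break;
      case "tool.execution_complete":
        this.open.get(e.toolCallId)?.end();
        this.open.delete(e.toolCallId);
        break;
      case "subagent.started":
        this.open.set(e.agentId, parent.child(`Agent: ${e.agentName}`, "TASK"));
        break;
      case "subagent.completed":
      case "subagent.failed":
        this.open.get(e.agentId)?.end();
        this.open.delete(e.agentId);
        break;
    }
  }

  // Stand-in for the injected onSessionEnd hook.
  onSessionEnd(): void {
    this.open.forEach((span) => span.end());
    this.open.clear();
    this.turn?.end();
    this.session.end();
  }
}
```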

Parent span IDs are resolved by awaiting span.export() (same pattern as claude-agent-sdk-plugin.ts). Async event handlers are chained through state.processing: Promise<void> to preserve ordering.
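
A sketch of the ordering technique, assuming a hypothetical `SessionState` that carries an `errors` array alongside `processing`, and a minimal `ExportableSpan` interface standing in for a span whose `export()` resolves asynchronously. `enqueue` and `onChildStart` are illustrative helpers, not functions from this PR; `onChildStart` appends to a string log where real code would create the child span.

```typescript
// Stand-in for a span whose serialized identity is only available asynchronously.
interface ExportableSpan {
  export(): Promise<string>;
}

// Hypothetical per-session state; only `processing` is named in the PR.
interface SessionState {
  processing: Promise<void>;
  errors: unknown[];
}

function newState(): SessionState {
  return { processing: Promise.resolve(), errors: [] };
}

// Append an async handler to the chain so handlers run strictly in arrival order.
function enqueue(state: SessionState, handler: () => Promise<void>): Promise<void> {
  state.processing = state.processing.then(handler).catch((err: unknown) => {
    // Record and continue so one failing handler does not stall later events.
    state.errors.push(err);
  });
  return state.processing;
}

// Opening a child needs the parent's exported ID, which may resolve slowly.
function onChildStart(state: SessionState, parent: ExportableSpan, log: string[], name: string): Promise<void> {
  return enqueue(state, async () => {
    const parentId = await parent.export();
    log.push(`${name} under ${parentId}`);
  });
}
```

Catching inside the chain is a design choice in this sketch: without it, one rejected handler would leave `processing` rejected and every later handler would be skipped.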

Test plan

  • tsc --noEmit clean (no type errors)
  • pnpm run build succeeds
  • All 961 unit tests pass
  • pnpm run lint: warnings only (same pattern as the rest of the codebase)
  • pnpm run formatting: clean
  • Unit tests cover: extractMetricsFromUsage (all token/cache/reasoning/billing fields), plugin lifecycle, orchestrion config shape, BraintrustPlugin wiring
  • E2E scenario at e2e/scenarios/github-copilot-instrumentation/ requires a GitHub Copilot subscription or a BYOK provider endpoint (BRAINTRUST_E2E_MODEL_BASE_URL)

🤖 Generated with Claude Code

Stephen Belanger (Qard) and others added 2 commits May 2, 2026 13:23
Adds full Braintrust tracing support for the GitHub Copilot SDK
(`@github/copilot-sdk` v0.3+). Both auto-instrumentation (via
`--import braintrust/hook.mjs` / bundler plugins) and a manual
`wrapCopilotClient(client)` wrapper are provided.

Span tree produced per session:
  Copilot Session (TASK) → Copilot Turn (TASK) → github.copilot.llm (LLM)
                                                 → tool: <name> (TOOL)
                                                 → Agent: <name> (TASK, sub-agents)

Token metrics on LLM spans use the Anthropic cache-token helpers:
prompt_tokens, completion_tokens, prompt_cached_tokens,
prompt_cache_creation_tokens, completion_reasoning_tokens, reasoning_tokens,
tokens. Copilot-specific billing data (cost multiplier, quota snapshots,
copilot_usage) flows into namespaced metadata rather than metrics.

Also fixes imports.test.ts to skip node_modules and .d.ts files when
scanning for dynamic import violations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n to 3.10.0

- Remove unused `ChannelMessage` import in github-copilot-plugin.ts (lint error)
- Fix `anthropic.test.ts`: the system prompt says "Just the poem", so asserting
  `output.contains("shakespeare")` is wrong; the model correctly returns only
  the poem content without mentioning the author
- Apply all pending changesets via `changeset version` to bump braintrust to
  3.10.0, so the API compatibility test detects a `minor` version bump rather
  than `none` and does not flag optional interface additions as breaking

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>