5 changes: 3 additions & 2 deletions .agents/skills/sdk-integrations/SKILL.md
@@ -43,6 +43,7 @@ Read these when relevant:

- `py/src/braintrust/auto.py` for `auto_instrument()` changes
- `py/src/braintrust/conftest.py` for VCR behavior
- `py/src/braintrust/integrations/conftest.py` for per-version cassette directory resolution
- `py/src/braintrust/integrations/auto_test_scripts/` for subprocess auto-instrument coverage
- `py/src/braintrust/integrations/test_utils.py` when touching shared attachment materialization or multimodal payload shaping
- `py/src/braintrust/integrations/adk/test_adk.py`, `py/src/braintrust/integrations/anthropic/test_anthropic.py`, and `py/src/braintrust/integrations/google_genai/test_google_genai.py` for attachment-focused test layout patterns
@@ -98,7 +99,7 @@ Do not start by wiring wrappers and only later decide what the span should conta
- `patchers.py`
- `tracing.py`
- `test_<provider>.py`
- `cassettes/` when the provider uses HTTP
- `cassettes/<version>/` when the provider uses HTTP (one subdirectory per version in the nox matrix, plus `latest/`)
3. Export the integration from `py/src/braintrust/integrations/__init__.py`.
4. Add or update the provider session in `py/noxfile.py`.
5. Update `py/src/braintrust/auto.py` only if the integration should participate in `auto_instrument()`.
@@ -283,7 +284,7 @@ Also verify, when relevant:
- the `metadata` contains finish reasons, ids, or annotations in the expected place
- binary payloads are represented as `Attachment` objects where applicable, while remote URLs and non-attachment values remain unchanged and unmaterialized file inputs are preserved rather than dropped

Keep VCR cassettes in `py/src/braintrust/integrations/<provider>/cassettes/`. Re-record only when behavior intentionally changes.
Keep VCR cassettes in `py/src/braintrust/integrations/<provider>/cassettes/<version>/` (e.g. `cassettes/latest/`, `cassettes/0.48.0/`). Nox sessions set `BRAINTRUST_TEST_PACKAGE_VERSION` automatically so cassettes land in the correct version subdirectory. Do not add per-test `vcr_cassette_dir` or `cassette_library_dir` fixtures; the shared `py/src/braintrust/integrations/conftest.py` handles version routing. Re-record only when behavior intentionally changes.

When the provider returns binary HTTP responses or generated media, sanitize cassettes as needed so fixtures do not store raw file bytes.

28 changes: 25 additions & 3 deletions .agents/skills/sdk-vcr-workflows/SKILL.md
@@ -16,12 +16,13 @@ Always read:
- `AGENTS.md`
- `py/noxfile.py`
- `py/src/braintrust/conftest.py`
- `py/src/braintrust/integrations/conftest.py` (shared version-aware cassette directory resolution)
- the target provider test file under `py/src/braintrust/integrations/<provider>/` or `py/src/braintrust/wrappers/`

Read when relevant:

- the provider integration package under `py/src/braintrust/integrations/<provider>/`
- the provider cassette directory under `py/src/braintrust/integrations/<provider>/cassettes/`
- the provider cassette directory under `py/src/braintrust/integrations/<provider>/cassettes/<version>/`
- `py/src/braintrust/cassettes/`
- `py/src/braintrust/wrappers/cassettes/`
- `py/src/braintrust/devserver/cassettes/`
@@ -54,11 +55,26 @@ Current defaults:
- wheel mode skips VCR-marked tests
- fixtures inject dummy API keys and reset global state

### Per-version cassette directories

Integration cassettes are stored in version-specific subdirectories:

- `py/src/braintrust/integrations/<provider>/cassettes/<version>/` (e.g. `cassettes/latest/`, `cassettes/1.71.0/`)

The mechanism:

- Nox sessions pass the version under test via the `BRAINTRUST_TEST_PACKAGE_VERSION` env var.
- The shared `py/src/braintrust/integrations/conftest.py` provides a `vcr_cassette_dir` fixture that resolves `<test_dir>/cassettes/<version>/`.
- When the env var is absent (e.g. running a test file directly outside nox), cassettes fall back to the base `cassettes/` directory.
- Individual test files do not define their own `vcr_cassette_dir` or `cassette_library_dir` fixtures; the shared conftest handles it.
- Claude Agent SDK `_test_transport.py` also reads `BRAINTRUST_TEST_PACKAGE_VERSION` for version-specific cassette resolution.
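
The resolution rule described above can be sketched as a small pure function. This is a minimal illustration, not the actual contents of `py/src/braintrust/integrations/conftest.py`: the function name `resolve_cassette_dir` and the `environ` parameter are hypothetical, and treating an empty value the same as an absent one is an assumption.

```python
import os
from pathlib import Path

ENV_VAR = "BRAINTRUST_TEST_PACKAGE_VERSION"


def resolve_cassette_dir(test_dir, environ=None):
    """Return <test_dir>/cassettes/<version>/, or <test_dir>/cassettes/ as a fallback.

    Hypothetical helper: `environ` defaults to os.environ and exists only so the
    behavior can be exercised without mutating the real environment.
    """
    environ = os.environ if environ is None else environ
    base = Path(test_dir) / "cassettes"
    # Assumption: an empty or whitespace-only value is treated like a missing one.
    version = environ.get(ENV_VAR, "").strip()
    return base / version if version else base
```

A shared `vcr_cassette_dir` fixture could then return `str(resolve_cassette_dir(...))` for the directory of the requesting test module, which is why individual test files would not need their own override.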

Implications:

- A test that passes locally by silently recording new traffic may still fail in CI if the cassette is missing or stale.
- CI will not save you by recording fresh traffic. If the cassette is wrong, CI should fail.
- Reproducing a CI VCR failure locally usually means running the exact nox session named in `py/noxfile.py`, not raw pytest in whatever environment happens to exist.
- When re-recording, always use a nox session so cassettes land in the correct version subdirectory.

## Standard Workflow

@@ -108,8 +124,10 @@ Common locations in this repo:
- `py/src/braintrust/cassettes/`
- `py/src/braintrust/wrappers/cassettes/`
- `py/src/braintrust/devserver/cassettes/`
- `py/src/braintrust/integrations/<provider>/cassettes/`
- `py/src/braintrust/wrappers/claude_agent_sdk/cassettes/` for Claude Agent SDK subprocess transport recordings
- `py/src/braintrust/integrations/<provider>/cassettes/<version>/` for per-version integration cassettes
- `py/src/braintrust/integrations/claude_agent_sdk/cassettes/<version>/` for Claude Agent SDK subprocess transport recordings

Each version tested in a nox session gets its own cassette subdirectory (e.g. `cassettes/latest/`, `cassettes/0.48.0/`). The `latest` subdirectory holds cassettes for the unpinned latest version.

Keep cassettes next to the tests they support. When migrating or moving tests, move the cassettes with them.

@@ -241,6 +259,8 @@ Important differences:

- it talks to the bundled `claude` subprocess over stdin/stdout
- it uses transport-level cassette helpers instead of HTTP request recording
- cassettes are stored per-version under `integrations/claude_agent_sdk/cassettes/<version>/`
- `_test_transport.py` reads `BRAINTRUST_TEST_PACKAGE_VERSION` (set by nox) to resolve the version subdirectory
- use `BRAINTRUST_CLAUDE_AGENT_SDK_RECORD_MODE=all` when re-recording

Do not try to force ordinary HTTP VCR patterns onto Claude Agent SDK subprocess tests.
@@ -278,3 +298,5 @@ Avoid these failures:
- forgetting that Claude Agent SDK uses subprocess transport recordings, not HTTP VCR
- leaving duplicate stale cassettes behind after moving tests or renaming scenarios
- broad re-records that create unnecessary review noise
- running raw pytest outside nox when re-recording, causing cassettes to land in the base `cassettes/` directory instead of the correct version subdirectory
- adding per-test `vcr_cassette_dir` or `cassette_library_dir` fixtures — the shared `integrations/conftest.py` handles version routing
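
One way to catch the "cassettes landed in the base directory" failure before review is a small check over the integrations tree. The sketch below is hypothetical and not part of the repository: the function name `find_unversioned_cassettes` is invented for illustration, and it assumes that any regular file sitting directly in `<provider>/cassettes/` is a stray recording.

```python
from pathlib import Path


def find_unversioned_cassettes(integrations_root):
    """List files that sit directly in <provider>/cassettes/ instead of a version subdirectory."""
    root = Path(integrations_root)
    stray = []
    for cassette_dir in sorted(root.glob("*/cassettes")):
        if not cassette_dir.is_dir():
            continue
        for entry in sorted(cassette_dir.iterdir()):
            # Version subdirectories (e.g. latest/, 0.48.0/) are expected; loose files are not.
            if entry.is_file():
                stray.append(entry)
    return stray
```

Calling it with `py/src/braintrust/integrations` after a re-record and failing on a non-empty result would flag recordings made with raw pytest outside nox.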
13 changes: 12 additions & 1 deletion AGENTS.md
@@ -190,7 +190,15 @@ Cassette locations:
- `py/src/braintrust/cassettes/`
- `py/src/braintrust/wrappers/cassettes/`
- `py/src/braintrust/devserver/cassettes/`
- `py/src/braintrust/wrappers/claude_agent_sdk/cassettes/` for Claude Agent SDK subprocess transport recordings
- `py/src/braintrust/integrations/<provider>/cassettes/<version>/` for per-version integration cassettes
- `py/src/braintrust/integrations/claude_agent_sdk/cassettes/<version>/` for Claude Agent SDK subprocess transport recordings

Per-version cassette directories:

- Integration and Claude Agent SDK cassettes are stored in version-specific subdirectories (e.g. `cassettes/latest/`, `cassettes/1.71.0/`).
- Nox sessions set the `BRAINTRUST_TEST_PACKAGE_VERSION` env var, which `py/src/braintrust/integrations/conftest.py` uses to resolve the correct subdirectory.
- When running a test file directly (outside nox), the env var is absent and cassettes resolve to the base `cassettes/` directory for backward compatibility.
- Individual test files do not define their own `vcr_cassette_dir` fixtures; the shared `integrations/conftest.py` handles it.
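
For the nox side of this contract, the following sketch shows how a runner might expose the version under test to a child process. It is an illustration under assumptions, not the code in `py/noxfile.py`: the helper name `run_with_package_version` is hypothetical, and a short Python child process stands in for the real pytest invocation.

```python
import os
import subprocess
import sys

ENV_VAR = "BRAINTRUST_TEST_PACKAGE_VERSION"


def run_with_package_version(args, version):
    """Run a command with the version under test exposed through ENV_VAR."""
    env = dict(os.environ)  # copy so the parent environment is left untouched
    env[ENV_VAR] = version
    return subprocess.run(args, env=env, check=True, capture_output=True, text=True)


# Stand-in for the test run: the child process only echoes the variable it received.
result = run_with_package_version(
    [sys.executable, "-c", f"import os; print(os.environ['{ENV_VAR}'])"],
    "latest",
)
print(result.stdout.strip())  # prints "latest"
```

Because the variable is set only for the child process, running a test file directly leaves it absent, which matches the fallback to the base `cassettes/` directory described above.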

Behavior from `py/src/braintrust/conftest.py`:

@@ -208,11 +216,14 @@ nox -s "test_openai(latest)" -- --disable-vcr
nox -s "test_openai(latest)" -- --vcr-record=all -k "test_openai_chat_metrics"
```

When re-recording, the nox session sets `BRAINTRUST_TEST_PACKAGE_VERSION` automatically, so cassettes land in the correct version subdirectory.

Claude Agent SDK note:

- it does not use HTTP VCR
- it talks to the bundled `claude` subprocess over stdin/stdout
- it uses transport-level cassette helpers instead
- cassettes are also stored per-version under `integrations/claude_agent_sdk/cassettes/<version>/`

Common Claude Agent SDK commands:

6 changes: 5 additions & 1 deletion CONTRIBUTING.md
@@ -105,6 +105,8 @@ Many wrapper and devserver tests use VCR cassettes.
- In CI, missing cassettes fail because `record_mode="none"` is used.
- If your change intentionally changes HTTP behavior, re-record the affected cassettes and commit them.

Integration cassettes are stored in **per-version subdirectories** (e.g. `cassettes/latest/`, `cassettes/1.71.0/`). Nox sessions set `BRAINTRUST_TEST_PACKAGE_VERSION` automatically so cassettes land in the correct directory when recording. The shared `py/src/braintrust/integrations/conftest.py` resolves the version-specific path; individual test files do not need their own `vcr_cassette_dir` fixtures.

Useful example:

```bash
@@ -116,7 +118,9 @@ nox -s "test_openai(latest)" -- --vcr-record=all -k "test_openai_chat_metrics"

`claude_agent_sdk` tests use the real SDK and bundled `claude` CLI, but they do not use VCR. Instead they record and replay the SDK/CLI JSON transport under:

- `py/src/braintrust/wrappers/claude_agent_sdk/cassettes/`
- `py/src/braintrust/integrations/claude_agent_sdk/cassettes/<version>/`

Like HTTP VCR cassettes, Claude Agent SDK cassettes are stored in per-version subdirectories. The `BRAINTRUST_TEST_PACKAGE_VERSION` env var (set by nox) selects the correct directory.

Behavior:

2 changes: 0 additions & 2 deletions docs/vcr-testing.md
@@ -32,7 +32,6 @@ The key difference between local and CI is the cassette record mode.

- **Record mode:** `once` -- records a new cassette if one doesn't exist, replays if it does.
- **API keys:** You need real API keys set in your environment to record new cassettes.
- **`test_latest_wrappers_novcr` session:** Runs normally, making real API calls (no VCR).

```bash
# Run tests (replays cassettes, records missing ones with real keys)
@@ -65,7 +64,6 @@ BRAINTRUST_CLAUDE_AGENT_SDK_RECORD_MODE=all \
- `ANTHROPIC_API_KEY` -> `sk-ant-test-dummy-api-key-for-vcr-tests`
- `GOOGLE_API_KEY` -> `your_google_api_key_here`
- **Claude Agent SDK tests:** Replay committed subprocess cassettes from `py/src/braintrust/wrappers/claude_agent_sdk/cassettes/`.
- **`test_latest_wrappers_novcr` session:** Automatically skipped in CI since it disables VCR and would need real keys.
- **No secrets needed:** CI workflows do not pass `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `GEMINI_API_KEY` as secrets. This means forks and external contributors can run CI without configuring any API key secrets.

## Recording Modes