
[bot] Add Ollama Python SDK integration for chat, generate, and embed instrumentation #389

@braintrust-bot


Summary

The Ollama Python SDK (ollama) is the official Python client for Ollama, the leading platform for running LLMs locally. It provides execution APIs for chat completions (chat()), text generation (generate()), and embeddings (embed()), with an API surface distinct from OpenAI's format. This repository has no instrumentation for any Ollama SDK surface: no integration, no wrapper, no patcher, no auto_instrument() support.

The ollama package has 9.9k GitHub stars, is used by ~34.8k downstream projects, and is actively maintained (latest: v0.6.2, April 29, 2026). It is one of the most widely used interfaces for local LLM inference in the Python ecosystem.

While Ollama also exposes an OpenAI-compatible HTTP endpoint, most Python users interact through the native ollama SDK, which has its own request/response schemas. wrap_openai() cannot be used with the native ollama.Client or with the module-level functions. The AgentScope integration in this repo patches agentscope.model.OllamaChatModel.__call__ (an AgentScope wrapper), not the Ollama SDK itself, so direct ollama.chat() calls produce no Braintrust spans.

What needs to be instrumented

The ollama package (v0.6.2) exposes these execution surfaces via module-level functions, Client, and AsyncClient, none of which are instrumented:

Chat (highest priority)

| SDK Method | Description | Streaming | Return type |
| --- | --- | --- | --- |
| `ollama.chat(model, messages, ...)` | Chat completions with conversation history and tool use | `stream=True` returns an iterator of dicts | dict with `message`, `model`, `eval_count`, `prompt_eval_count` |

Response shape: Returns a dict with message (role + content), model, eval_count (completion tokens), prompt_eval_count (prompt tokens), total_duration, load_duration, prompt_eval_duration, eval_duration. Token usage and latency metrics are directly available.

Tool calling: Supports tools parameter for function calling. Tool calls appear in message.tool_calls.
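The response shape and tool-call fields described above can be sketched against a simulated response dict. The dict below mirrors the documented fields; the metric names on the output side (`prompt_tokens`, `completion_tokens`, `tokens`) are assumptions about the integration's conventions, not part of the ollama SDK.

```python
# A simulated ollama.chat() response, mirroring the fields described above.
response = {
    "model": "llama3.2",
    "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {"function": {"name": "get_weather", "arguments": {"city": "Paris"}}}
        ],
    },
    "prompt_eval_count": 26,          # prompt tokens
    "eval_count": 298,                # completion tokens
    "total_duration": 5_191_566_416,  # nanoseconds
}

def extract_metrics(resp: dict) -> dict:
    """Derive token metrics from an Ollama chat/generate response dict."""
    prompt = resp.get("prompt_eval_count", 0)
    completion = resp.get("eval_count", 0)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "tokens": prompt + completion,
    }

metrics = extract_metrics(response)
tool_calls = response["message"].get("tool_calls", [])
print(metrics)                            # {'prompt_tokens': 26, 'completion_tokens': 298, 'tokens': 324}
print(tool_calls[0]["function"]["name"])  # get_weather
```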

Generate

| SDK Method | Description | Streaming | Return type |
| --- | --- | --- | --- |
| `ollama.generate(model, prompt, ...)` | Text generation from a prompt | `stream=True` returns an iterator of dicts | dict with `response`, `model`, `eval_count`, `prompt_eval_count` |

Embed

| SDK Method | Description | Return type |
| --- | --- | --- |
| `ollama.embed(model, input)` | Generate embeddings from text (single or batch) | dict with `embeddings`, `model` |

All methods have corresponding Client instance methods and AsyncClient async variants with identical signatures.

Implementation notes

Module-level and client-level API: The ollama package exposes both module-level convenience functions (ollama.chat(...)) and class-based clients (Client().chat(...), AsyncClient().chat(...)). Both need instrumentation.

Patching strategy: The module-level functions delegate to a default Client instance. Patching Client.chat, Client.generate, Client.embed and corresponding AsyncClient methods should cover both usage patterns.

Streaming: Both chat and generate support stream=True, returning iterators of partial response dicts. The integration must accumulate chunks and finalize the span when the stream is exhausted.
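A minimal sketch of the streaming case: a generator wraps the `stream=True` iterator, accumulates partial content, and finalizes when the stream is exhausted. The chunk shape mirrors the partial dicts described above; `finalize` is a hypothetical span-closing callback, not an existing API.

```python
def traced_stream(chunks, finalize):
    """Yield chunks unchanged while accumulating them for the span."""
    parts = []
    final = {}
    try:
        for chunk in chunks:
            parts.append(chunk.get("message", {}).get("content", ""))
            final = chunk  # the last chunk carries done=True and the counters
            yield chunk
    finally:
        # Runs when the stream is exhausted (or the consumer abandons it).
        finalize(
            output="".join(parts),
            prompt_tokens=final.get("prompt_eval_count"),
            completion_tokens=final.get("eval_count"),
        )

captured = {}
chunks = [
    {"message": {"content": "Hel"}, "done": False},
    {"message": {"content": "lo"}, "done": True,
     "prompt_eval_count": 5, "eval_count": 2},
]
for _ in traced_stream(iter(chunks), lambda **kw: captured.update(kw)):
    pass
print(captured["output"])  # Hello
```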

Rich timing metrics: Ollama responses include total_duration, load_duration, prompt_eval_duration, and eval_duration in nanoseconds, providing fine-grained latency data beyond what most cloud providers expose.
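Converting those nanosecond fields into span metrics could look like the following; the source field names match the response shape above, while the `_seconds` metric names are an assumption about how the integration would record them.

```python
NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds

def duration_metrics(resp: dict) -> dict:
    """Convert Ollama's nanosecond duration fields to seconds."""
    out = {}
    for field in ("total_duration", "load_duration",
                  "prompt_eval_duration", "eval_duration"):
        if field in resp:
            out[field.replace("_duration", "_seconds")] = resp[field] / NS_PER_S
    return out

print(duration_metrics({"total_duration": 5_191_566_416,
                        "eval_duration": 4_799_921_000}))
```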

Parameters relevant for span metadata: model, options (contains temperature, top_p, top_k, num_predict, stop, seed), format (structured output), tools, keep_alive.

No API key: Ollama runs locally, so there's no API key to sanitize in VCR cassettes. However, testing requires a running Ollama server with models pulled.

Proposed span shape

chat() / generate()

| Span field | Content |
| --- | --- |
| input | `messages` (chat) or `prompt` (generate), `system`, `tools` |
| output | `message` (chat) or `response` (generate) |
| metadata | `provider: "ollama"`, `model`, `options` (temperature, etc.) |
| metrics | `tokens`, `prompt_tokens`, `completion_tokens`, `time_to_first_token` (streaming), Ollama-specific durations |
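Assembling that span shape from a chat() request/response pair could be sketched as follows; the field names follow the table above and are a proposal, not an existing Braintrust API.

```python
def build_chat_span(request: dict, resp: dict) -> dict:
    """Map a (simulated) chat() call onto the proposed span shape."""
    return {
        "input": {
            "messages": request.get("messages"),
            "tools": request.get("tools"),
        },
        "output": resp.get("message"),
        "metadata": {
            "provider": "ollama",
            "model": resp.get("model"),
            "options": request.get("options", {}),
        },
        "metrics": {
            "prompt_tokens": resp.get("prompt_eval_count"),
            "completion_tokens": resp.get("eval_count"),
        },
    }

span = build_chat_span(
    {"messages": [{"role": "user", "content": "hi"}],
     "options": {"temperature": 0}},
    {"model": "llama3.2",
     "message": {"role": "assistant", "content": "hey"},
     "prompt_eval_count": 3, "eval_count": 2},
)
print(span["metadata"]["model"])  # llama3.2
```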

embed()

| Span field | Content |
| --- | --- |
| input | input text(s) |
| output | embedding dimensions/count |
| metadata | `provider: "ollama"`, `model` |
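For embed(), recording dimensions and count rather than the raw vectors keeps spans small. A sketch against a simulated embed() response (the response fields mirror the embed table above; the span field names are a proposal):

```python
def build_embed_span(inputs, resp: dict) -> dict:
    """Summarize an embed() response as dimensions/count, not raw vectors."""
    embeddings = resp.get("embeddings", [])
    return {
        "input": inputs,
        "output": {
            "count": len(embeddings),
            "dimensions": len(embeddings[0]) if embeddings else 0,
        },
        "metadata": {"provider": "ollama", "model": resp.get("model")},
    }

span = build_embed_span(
    ["hello", "world"],
    {"model": "nomic-embed-text",
     "embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]},
)
print(span["output"])  # {'count': 2, 'dimensions': 3}
```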

No coverage in any instrumentation layer

  • No integration directory (py/src/braintrust/integrations/ollama/)
  • No wrapper function (e.g. wrap_ollama())
  • No patcher in any existing integration (the AgentScope _OllamaChatModelPatcher patches AgentScope's model wrapper, not the ollama SDK)
  • No nox test session (test_ollama)
  • No version entry in py/src/braintrust/integrations/versioning.py
  • No mention in py/src/braintrust/integrations/__init__.py

A grep for ollama across py/src/braintrust/integrations/ returns only agentscope/patchers.py, which patches AgentScope's own OllamaChatModel class, not the ollama SDK.

Braintrust docs status

not_found — Ollama is not listed on the Braintrust tracing guide or the integrations directory. The custom providers page documents using Ollama's OpenAI-compatible endpoint via the proxy, but this does not cover native ollama SDK calls.

Upstream references

Local repo files inspected

  • py/src/braintrust/integrations/ — no ollama/ directory exists on main
  • py/src/braintrust/wrappers/ — no Ollama wrapper
  • py/noxfile.py — no test_ollama session
  • py/src/braintrust/integrations/__init__.py — Ollama not listed in integration registry
  • py/src/braintrust/integrations/versioning.py — no Ollama version matrix
  • py/src/braintrust/integrations/agentscope/patchers.py — patches agentscope.model.OllamaChatModel, not the native ollama SDK
