From 5845dab051ac108e30f1b509b46361ddb0ac79b6 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Mon, 15 Jun 2026 20:44:16 -0700 Subject: [PATCH 1/5] Add skill to migrate, reiew migration and add conformance tests --- .../skills/port-from-openinference/SKILL.md | 428 ++++++++++++++++++ .github/skills/review-ported/SKILL.md | 251 ++++++++++ .../skills/write-conformance-tests/SKILL.md | 262 +++++++++++ .gitignore | 3 + 4 files changed, 944 insertions(+) create mode 100644 .github/skills/port-from-openinference/SKILL.md create mode 100644 .github/skills/review-ported/SKILL.md create mode 100644 .github/skills/write-conformance-tests/SKILL.md diff --git a/.github/skills/port-from-openinference/SKILL.md b/.github/skills/port-from-openinference/SKILL.md new file mode 100644 index 00000000..71bc577d --- /dev/null +++ b/.github/skills/port-from-openinference/SKILL.md @@ -0,0 +1,428 @@ +--- +name: port-from-openinference +description: Port an openinference-instrumentation-* package from https://github.com/open-telemetry/donation-openinference into this repo as a new package. Use when a user asks to migrate a package from OpenInference. +--- + +# Port an OpenInference `instrumentation-*` package + +Migrate an `openinference-instrumentation-` package from +https://github.com/open-telemetry/donation-openinference into this +repo as a new package under `instrumentation/`. This is a **new implementation** +that emits OTel GenAI semantic conventions through `opentelemetry-util-genai`. + +The major +work items are: rewriting the patcher to method-level (step 5), mapping every +request/response shape into OTel `InputMessage`/`OutputMessage` parts +(step 6), and migrating the unit-test corpus while filtering openinference-framework +plumbing tests out (step 7). + +## Inputs + +User specifies the source, e.g. `openinference-instrumentation-crewai`. + +- **Source**: `https://github.com/open-telemetry/donation-openinference/tree/main/python/instrumentation//`. + Fetch a fresh shallow clone if you don't already have one locally: + ```sh + git clone --depth=1 https://github.com/open-telemetry/donation-openinference.git /tmp/openinference + ``` + and use `/tmp/openinference/python/instrumentation//` as the + source path in step 1. + +User may also provide ***target package name**. If not provided: derive it from the source name: +- drop the leading `openinference-instrumentation-`. Remaining part should match the instrumented library name as it appears on PyPI. If it's not the case, flag it. +- The target package name should be `opentelemetry-instrumentation-genai-` where `` is the instrumented library name (e.g. `openai`, `anthropic`, `bedrock`). For example: + - `openinference-instrumentation-openai` → `opentelemetry-instrumentation-genai-openai` + - `openinference-instrumentation-anthropic` → `opentelemetry-instrumentation-genai-anthropic` + Confirm the chosen name with the user. + +## Reference material + +- **OTel GenAI spans**: — authoritative attribute names and operation enum. +- **OpenInference → OTel attribute mapping** (Arize-maintained): . Use as a quick lookup for what an OpenInference attribute *roughly* corresponds to in OTel; when the mapping disagrees with the official semconv, **the official semconv wins**. +- **Message JSON schemas**: + - input messages: + - output messages: + - system instructions: + - tool definitions: + - retrieval documents: + +- **Code for above models**: . + +## Non-negotiable rules + +The repo-wide rules in [AGENTS.md](../../../AGENTS.md) already apply +(telemetry through `opentelemetry-util-genai` public surface only, no +`type: ignore`, semconv enums over string literals, re-raise caught +exceptions). The rules below are the ones the port is most likely to +violate: + +1. **Zero OpenInference dependencies.** No `openinference-instrumentation`, + no `openinference-semantic-conventions`, no `openinference-*` anywhere + in the port's `src/` or `tests/`. Verify with + `rg openinference instrumentation/` — output must be empty. +2. **Public util-genai surface only.** Beyond the AGENTS.md rule, the port + must not import any `opentelemetry.util.genai._*` module — the allowed + modules are enumerated in step 4. +3. **Ignore all other OpenInference instrumentations during the port.** The only + instrumentation code to read is the OpenInference package being migrated + plus `opentelemetry-util-genai`. Build + from first principles: original OpenInference code + util-genai public API + + official semconv spec. +4. **Never work around gaps.** If util-genai or the GenAI semconv is + missing something, flag it and fail the test intentionally +5. **Do not make OTel API calls.** **Exception:** + semconv attributes that exist in the registry but have no named property + on `InferenceInvocation` (e.g. `gen_ai.usage.cache_creation.input_tokens`, + `gen_ai.usage.cache_read.input_tokens`) may be set via + `invocation.attributes[KEY] = …` — that's still going through the + util-genai extension point. Import the key from + `opentelemetry.semconv._incubating.attributes.gen_ai_attributes`. Do + not invent attribute names that aren't in the semconv. +6. **Reuse VCR cassettes.** Reuse cassettes from the OpenInference + tests when possible. +7. **Conformance tests must never be silently skipped.** + When instrumentation can't be made conformant due to missing + information, gap in semantic conventions, or in util-genai, still + write the scenario and let it fail. + + Ask user to decide if they want to mark the scenario as skipped with a reason, or + add an `expected_violation` in the scenario that covers the missing piece. + All these must be documented in `MIGRATION_REPORT.md` as well, with links to the skipped scenario and the expected violation. + +8. **Do not modify weaver policies.** + +## Migration flow + +### 1. Create the target package + +```sh +cp -R / instrumentation// +cd instrumentation/ +rm -rf .pytest_cache .tox .venv venv .vscode .DS_Store .claude .ruff_cache +find . -name __pycache__ -type d -exec rm -rf {} + +rm -f CHANGELOG.md # OpenInference's per-package history doesn't apply here +``` + +Keep `LICENSE` (Apache-2.0). Don't carry over `examples/` directories or +OpenInference's `README.md` — both are rewritten below. Per-package +`CHANGELOG.md` is towncrier-generated at release time; don't carry one over, +and add a fragment for the new package per the **Changelog** section of +[AGENTS.md](../../../AGENTS.md). + +### 2. Rename the Python module + +OpenInference packages live at `src/openinference/instrumentation//`. +Move that tree under the OTel GenAI namespace, update path according to the target package name: + +```sh +mkdir -p src/opentelemetry/instrumentation/genai +mv src/openinference/instrumentation/ src/opentelemetry/instrumentation/genai/ +rm -rf src/openinference +``` + +Update every import. Verify zero `openinference` references remain in +`src/`, `tests/`, README. The instrumentor class typically renames from +`Instrumentor` (kept as-is — same name is fine). + +### 3. Update `pyproject.toml`, `version.py`, and `README.rst` + +- `[project] name` → new package name. +- `[project.entry-points.opentelemetry_instrumentor]` → un-prefixed lib name + pointing at the new module path + (` = "opentelemetry.instrumentation.genai.:Instrumentor"`). +- Hatch version path, project URLs, classifiers → new repo paths. +- **Strip every `openinference-*` dependency.** OpenInference packages typically depend on + `openinference-instrumentation`, `openinference-semantic-conventions`, and + sometimes `opentelemetry-instrumentation` for the `BaseInstrumentor` mixin + — keep only the last one. Replace with `opentelemetry-instrumentation` (for + `BaseInstrumentor`) and the underlying SDK (`openai`, `anthropic`, …) at + the same range OpenInference was using. +- `__version__` in `version.py` should equal the value in + `opentelemetry-util-genai/src/opentelemetry/util/genai/version.py` — all + workspace packages share one version. Verify: + + ```sh + diff <(grep ^__version__ instrumentation//src/opentelemetry/instrumentation/genai//version.py) \ + <(grep ^__version__ opentelemetry-util-genai/src/opentelemetry/util/genai/version.py) + ``` + +- Hatchling builds **require a `README.md` or `README.rst`**. Rewrite it to + point at the new repo URLs and module path; drop OpenInference links, OpenInference badges, + `using_attributes(...)` examples, `OpenInferenceTracer` / `TraceConfig` + configuration, and any "OpenInference semconv" links. Include a usage snippet + importing from `opentelemetry.instrumentation.genai.`, a pointer to + `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT`, and a pointer to + `tests/conformance/` (no `examples/`). + +### 4. Drop OpenInference plumbing + +OpenInference ships a framework that's incompatible with this repo's util-genai model +— excise it before touching the patcher. + +```sh +rg 'openinference|OpenInferenceTracer|TraceConfig|using_attributes|using_session|using_user|using_metadata|using_tags|SpanAttributes\.|OpenInferenceMimeTypeValues|OpenInferenceSpanKindValues|safe_json_dumps' src/ tests/ +``` + +Drop every match. The mappings: + +- **`OpenInferenceTracer` / `TraceConfig`** — replaced by `TelemetryHandler` from + `opentelemetry.util.genai.handler`. Instrumentation code does not + construct or pass tracers. +- **`using_attributes(session_id=…, user_id=…, …)` / `using_session` / + `using_metadata` / `using_tags`** — there is **no OTel GenAI equivalent** + context-propagation API. Drop the calls and the tests that exercise them + (those go in the test "skip with reason" bucket — see step 7). +- **`OpenInferenceSpanKindValues` / `OpenInferenceMimeTypeValues`** — drop; + span kind is set by util-genai based on the invocation type. +- **`SpanAttributes.LLM_*` / `SpanAttributes.INPUT_*`** — flat-string + OpenInference-semconv attributes. Replaced by typed `InputMessage` / `OutputMessage` + payloads serialized into `gen_ai.input.messages` / `gen_ai.output.messages` + by util-genai. The conversion is in step 6. +- **`safe_json_dumps`** — drop; util-genai serializes message payloads. + +**Any import of `opentelemetry.util.genai._` from instrumentation +`src/` is a violation.** Public surface only: + +- `opentelemetry.util.genai.handler` — `TelemetryHandler` +- `opentelemetry.util.genai.invocation` — `InferenceInvocation`, + `EmbeddingInvocation`, `ToolInvocation`, `WorkflowInvocation`, + `AgentInvocation`, `Error`, `GenAIInvocation` +- `opentelemetry.util.genai.types` — `InputMessage`, `OutputMessage`, + `Text`, `ToolCallRequest`, `ToolCallResponse`, `Reasoning`, + `ServerToolCall`, `ServerToolCallResponse`, `GenericPart`, `Blob`, + `File`, `Uri`, `Modality` +- `opentelemetry.util.genai.completion_hook` +- `opentelemetry.util.genai.environment_variables` + +```sh +rg 'from opentelemetry\.util\.genai\._' instrumentation//src/ +``` + +Output must be empty. + +### 5. Rewrite patching: transport → API method level + +This is the largest behavioral change. OpenInference typically patches at +the HTTP-transport layer (`OpenAI.request`, `AsyncOpenAI.request`, +`HTTPClient._send_request`, etc.) and dispatches by `cast_to` response type. +**That pattern does not survive the port.** util-genai's +`InferenceInvocation` model needs the request kwargs (`model`, `messages`, +`tools`, `stream`, …) at call time, which only the API methods see. + +Replace every transport-level wrapper with method-level +`wrap_function_wrapper` calls — one per public API endpoint — using +**positional args only** (newer wrapt versions reject keyword args): + +```python +from wrapt import wrap_function_wrapper +from opentelemetry.instrumentation.utils import unwrap + +wrap_function_wrapper( + "openai.resources.chat.completions", # module (positional) + "Completions.create", # name (positional) + chat_completions_wrapper, # wrapper (positional) +) +``` + +For uninstrument, use `opentelemetry.instrumentation.utils.unwrap` (matching +positional module + name). + +**Patch ALL API methods OpenInference instruments.** Walk OpenInference's +`_instrumentor.py` / `instrumentor.py` plus the dispatch table in the +transport accumulator and enumerate every endpoint OpenInference emits a span for — +including ones with only generic attribute extraction (assistants, threads, +files, fine-tuning, vector stores, batches, uploads, moderations, …). Each +becomes one `wrap_function_wrapper` call. Dropping coverage for an endpoint +because it's "legacy" or "rarely used" is a regression — OTel GenAI semconv +applies to every inference and embedding API regardless of vintage. If a +specific API has no util-genai invocation type yet, that's a gap for the +review report (see step 11), not a reason to drop the patch. + +### 6. Map every request and response shape into OTel GenAI types + +For each wrapped method, walk OpenInference's input-parsing branch by branch and +ensure each branch has a corresponding mapping in the new wrapper. Same +for output parsing. A wrapper that handles `str` input but not `list` +input (when the original SDK accepts both) is incomplete and must not +ship. + +Mapping cheat sheet — OpenInference source shape on the left, OTel construct on the +right (all types from `opentelemetry.util.genai.types` unless noted): + +| Source request item | OTel construct | +|---|---| +| User / assistant / system text message | `Input/OutputMessage(role=…, parts=[Text(content=…)])` | +| Assistant message containing a tool/function call | `Message(role="assistant", parts=[ToolCallRequest(name=…, id=…, arguments=…)])` | +| Tool/function result message | `Message(role="tool", parts=[ToolCallResponse(id=…, response=…)])` | +| Reasoning / thinking item | `Message(role="assistant", parts=[Reasoning(content=…)])` | +| Server-side tool call (web_search, file_search, code_interpreter, …) | `Message(parts=[ServerToolCall(name=…, server_tool_call=…, id=…)])` | +| Server-side tool call result | `Message(parts=[ServerToolCallResponse(server_tool_call_response=…, id=…)])` | +| Inline image / audio / video bytes | `Blob(mime_type=…, modality="image"\|"audio"\|"video", content=b"…")` | +| External media URL | `Uri(mime_type=…, modality=…, uri="https://…")` | +| File reference (e.g. OpenAI `file_id`) | `File(mime_type=…, modality=…, file_id="file-…")` | +| Provider-specific item with no semconv mapping | `GenericPart(value=…)` — never silently drop. Flag those in the review report. | + +Output messages mirror the input mapping — `OutputMessage` serializes with +a `parts` array (not `content`); each part has a `type` field. When +asserting on `gen_ai.output.messages`, parse the JSON and check +`msg["parts"]`. + +`Modality` is `Literal["image", "video", "audio"]`. `error.type` and span +status come from `invocation.fail(exc)` — do not emit a separate span +exception event. + +### 7. Restructure tests + +```text +tests/ + cassettes/.yaml + conformance/ + inference.py / embedding.py / ... # see step 8 + conftest.py + test_.py # unit tests; refactor onto helpers + requirements.{oldest,latest}.txt +``` + +**Categorize OpenInference tests before migrating.** List every test function in the +OpenInference package and bucket each one: + +- ✅ **Migrate** — exercises a patched API method. Rewrite assertions + from flat OpenInference semconv attributes (`SpanAttributes.LLM_INPUT_MESSAGES_…`) + to OTel constructs: assert on `span.attributes[GenAIAttributes.GEN_AI_…]` + (semconv constants), and parse `gen_ai.input.messages` / `gen_ai.output.messages` + JSON to check the `parts` arrays. +- ✅ **Migrate (rewrite)** — unit test for an OpenInference-internal helper + (attribute extractor, message parser, etc.) where the helper is gone but + the **parsing scenario** still applies. Rewrite as an integration test + that feeds the same response shape through VCR and asserts the + resulting OTel telemetry. This is the bucket most likely to be + mis-categorized — anything covering tool-call objects, refusals, + reasoning items, multi-content messages, token-usage breakdowns, image + inputs, raw-response variants (e.g. `with_raw_response`) is parsing + domain logic and **must** be migrated. The wrapper has to do the + parsing; the test has to verify it. +- ❌ **Skip with reason** — exercises OpenInference framework plumbing with no OTel + equivalent: `using_attributes()` context propagation, `TraceConfig` + masking, `OpenInferenceTracer` behavior, OpenInference flat-attribute naming format, + `OpenInferenceSpanKindValues` checks. Document the reason in a comment + in the test file (or in `MIGRATION_REPORT.md` later — see step 11). + Do **not** skip a test because the API it exercises is "legacy" or + "new" — semconv applies to all inference APIs. + +Decision rule for the ✅ rewrite vs ❌ skip split: ask whether the test +covers a *response shape* from the instrumented library, or *OpenInference framework +behavior*. A test that constructs a tool-call object with `arguments` / +`call_id` / `name` and verifies it's extracted into attributes covers a +response shape — migrate it. A test that checks +`using_attributes(session_id=…)` propagates into span attributes covers OpenInference +framework behavior — skip it. + +**Sanity check before committing step 7.** Count source vs ported tests: + +```sh +rg -c '^\s*(async )?def test_' /tests/ +rg -c '^\s*(async )?def test_' instrumentation//tests/ +``` + +A port that drops from 80 tests to 5 is a regression — go back to the +"migrate" and "migrate (rewrite)" buckets and finish them. + +**Replace conftest boilerplate.** OpenInference conftests duplicate the +exporter/provider/VCR plumbing that lives here in +`opentelemetry-test-util-genai`. Don't copy OpenInference's — mirror an +existing package's conftest (e.g. +`instrumentation/opentelemetry-instrumentation-genai-openai/tests/conftest.py`), +which registers the shared fixtures as plugins: + +```python +pytest_plugins = [ + "opentelemetry.test_util_genai.fixtures", + "opentelemetry.test_util_genai.vcr", +] +``` + +The lib-specific conftest then adds only: `vcr_config` (per-package +`filter_headers` and `before_record_response`), an `environment` autouse +for the lib's API-key env var, library-client fixtures (e.g. +`openai_client` / `async_openai_client`), and the `instrument_*` fixtures +(`instrument_no_content`, `instrument_with_content`, `instrument_event_only`) +built on the shared `instrument` context manager from +`opentelemetry.test_util_genai.instrumentor` (see [AGENTS.md](../../../AGENTS.md) Tests). + +**Assertions — no shared helper module.** There is no +`opentelemetry.test_util_genai.assertions`. Assert directly on +`span.attributes[GenAIAttributes.GEN_AI_…]` using the semconv constants +from `opentelemetry.semconv._incubating.attributes.gen_ai_attributes`, and +on metric/log records from the in-memory exporters. Factor repeated checks +into a per-package `tests/test_utils.py` (existing packages have helpers +like `assert_all_attributes`, `assert_completion_attributes`, +`assert_messages_attribute`, plus weather-tool fixtures). OpenInference +helpers (`_check_llm_attributes`, etc.) map onto these — rewrite them on +top of OTel semconv constants and parsed `gen_ai.*.messages` JSON, and keep +tiny constants (`DEFAULT_MODEL`, sample prompts) inline or in +`tests/test_utils.py`. + +**Required unit-test coverage per wrapped method.** Apply the repo test +matrix (sync/async × happy/error, plus streaming × happy/error where the +method streams — see [AGENTS.md](../../../AGENTS.md) Tests section) +to **every** method patched in step 5. For the port these are blockers for +the migration PR, not follow-up. The error variants must verify the +original exception is re-raised, `error.type` is recorded, and span status +is ERROR. + +**`tests/requirements.{latest,oldest}.txt`** — OpenInference typically has its own +pin file; keep only the third-party version pins (`openai==`, +`anthropic==`, `pydantic==`, `pytest==`, …). Drop every `-e ` line +— this repo's uv workspace already installs all members editable via +`uv sync --all-packages`. Drop `requirements.pydantic1.txt`-style +side-channel files entirely. + +### 8. Conformance scenarios + +Author conformance scenarios using the **`write-conformance-tests`** skill — +it's the generic procedure (scenario modules, the `test_conformance.py` +runner, declared gaps, lib-specific assertions, weaver policies) and applies +to any instrumentation. Port-specific notes on top of that skill: + +- Drop OpenInference's `examples/` tree — its end-to-end demos are replaced + by conformance scenarios, not ported. +- For an operation blocked by a util-genai/semconv gap, point the + `expected_violations` / `xfail` `reason=` at the gap row in + `MIGRATION_REPORT.md`. + +### 9. Cassettes + +- Copy cassettes from OpenInference's `tests/cassettes/` (or wherever the OpenInference package + parks them) into the port's `tests/cassettes/`. Reuse names so existing + unit tests keep loading them. +- Reuse existing cassettes for conformance scenarios when they are applicable. + +### 10. Workspace integration + +Wire the new package into the workspace, `tox.ini`, and pyright per the +**Adding a package to the workspace** section of [AGENTS.md](../../../AGENTS.md) +— it applies to any new package, not just ports. Port-specific note on top: + +- **Leave the package out of `[tool.pyright] include`.** A port over untyped + `wrapt` boundaries (`wrapped, instance, args, kwargs`) and vendor SDK members + produces hundreds of strict-mode errors, so don't add it to `include` until + typing lands — track that as a follow-up. + +### 11. Local checks, review, and PR + +Run the pre-PR checks from the **Commands** section of +[AGENTS.md](../../../AGENTS.md) — `tox -e precommit`, `tox -e typecheck`, and +the package's `-{oldest,latest}` (and `-conformance`) test envs. + +Open the PR with the `migration:openinference` label. Run the +`review-ported` skill locally to generate `MIGRATION_REPORT.md`; iterate +until §4 (test coverage) is clean. The review skill compares the port +against OpenInference (or any upstreams you name), so coverage gaps +surface in one report. + +## See also + +- [AGENTS.md](../../../AGENTS.md) — general repo rules that already apply to the port. +- `util/opentelemetry-util-genai/AGENTS.md` — util-genai usage rules. +- `.github/skills/write-conformance-tests/SKILL.md` — generic conformance-scenario authoring (step 8). +- `.github/skills/review-ported/SKILL.md` — sister review skill (writes `MIGRATION_REPORT.md`). diff --git a/.github/skills/review-ported/SKILL.md b/.github/skills/review-ported/SKILL.md new file mode 100644 index 00000000..64457d71 --- /dev/null +++ b/.github/skills/review-ported/SKILL.md @@ -0,0 +1,251 @@ +--- +name: review-ported +description: Review a ported instrumentation-genai package by comparing it against any known external upstream implementations of the same instrumentation (OpenInference, vendor-specific). Writes MIGRATION_REPORT.md in the migrated package root. +--- + +# Review a ported instrumentation-genai package + +Compare the migrated package in `instrumentation//` against +every known upstream implementation of the same instrumentation, and +write a `MIGRATION_REPORT.md` in the migrated package root listing only +what's missing or wrong. + +## Inputs and upstreams + +User supplies the **migrated package name**, e.g. +`opentelemetry-instrumentation-genai-openai`. Strip +`opentelemetry-instrumentation-genai-` to get ``; that drives the +upstream lookup. + +**Upstreams come from the user, with a default fallback:** + +- **If the user names one or more upstreams** (a repo, a directory path, a + vendor's own SDK-side instrumentation, a Logfire / Pydantic AI package, + …), compare against exactly those — one column per upstream. Use them + verbatim; don't second-guess or substitute. +- **If the user names none**, default to OpenInference: search + `https://github.com/open-telemetry/donation-openinference` under + `python/instrumentation/` for a package matching `` (typically + `openinference-instrumentation-`, but confirm by listing the + directory — the name may differ). Fetch a shallow clone if you don't + already have one locally: + + ```sh + git clone --depth=1 https://github.com/open-telemetry/donation-openinference.git /tmp/openinference + ls /tmp/openinference/python/instrumentation/ | rg + ``` + +If there are no upstreams to compare against (no user-named upstream resolves and no +OpenInference match), this isn't a port — bail with a one-line note. + +## Rules + +- **Deliverable is `instrumentation//MIGRATION_REPORT.md`.** Write + there, not stdout. The file is gitignored — regenerate freely. +- **Report problems, not work.** No "✅ matches", no "all tests migrated", + no recap of structure. The PR diff already shows what's there; the + report is for what's missing or wrong. +- **Don't justify empty findings.** When a section/table has no problems, + emit `_none_` (or skip the section, where the rule says so) and stop. + Do not list which greps you ran, which env vars you searched for, or + why you concluded clean. Clean means clean. +- **Tables only when there are ≥1 problem rows.** Empty section: `_none_`. +- **Read, don't guess.** Every claim from actual code or command output. + Don't infer from package names or READMEs. Before listing something as + missing, grep for it in the port's `src/` and `tests/`. +- **Snapshot of the PR head**, not aspirational state. + +## Report structure + +First line: + +```markdown +# Migration review: + +Compared against: +- OpenInference: `openinference-instrumentation-` (add link) +- : ``, `` (omit the line if no other upstream + is in scope) +``` + +Sections render in order. Each is bounded to its problems. + +### 1. Instrumented API surface + +Table of every API method any source patches, marking methods the +migrated package does **not** patch. **Be exhaustive.** Enumerate every +endpoint upstream emits a span for, even if attribute extraction +is generic. Do not collapse rows. + +For each upstream, walk the actual instrumentation code, not docs: + +- **Method-level patching** (the shape of this port, and of any + method-level upstream): read every `wrap_function_wrapper(...)` call in + `_instrument()`. Each call = one row. +- **Transport-level patching** (typical of OpenInference, e.g. + `openai.OpenAI.request` / `AsyncOpenAI.request`): enumerate every + `cast_to` response type the accumulator/dispatch table handles **and** + every other endpoint that flows through the wrapped method (assistants, + threads, files, fine-tuning, images, audio, vector_stores, batches, + uploads, moderations, …). Spans with only generic attribute extraction + still count — list them, and note "generic span only" in the Notes. +- **Anything else** (vendor SDK hooks, monkey-patched class methods, + decorator-based instrumentation): walk the actual entry points the + upstream registers and list one row per emitted span site. + +One column per upstream that exists, plus `This port`: + +| API method | OpenInference | This port | Notes | +|---|---|---|---| +| `openai.resources.chat.completions.Completions.create` | ✅ | ✅ | | +| `openai.resources.responses.Responses.create` | ✅ | ❌ | | +| `openai.resources.beta.assistants.Assistants.create` | ✅ (generic span only) | ❌ | | + +- ✅ = patched. ❌ = at least one upstream patches it, this port doesn't. + — = not patched (and that's expected — upstream doesn't patch it either). +- Sort `❌` rows to the top. +- For `❌` rows, the **Notes** cell must name the GenAI semconv operation + the method maps to (`chat`, `embeddings`, `text_completion`, + `invoke_agent`, `invoke_workflow`, `execute_tool`, `realtime`, …) — or + "no semconv operation defined yet" if the spec hasn't caught up. + +Do not render a separate `**Gaps:**` bullet list below the table — the +table itself is the gap list. + +### 2. Gaps and open issues + +Genuine **tooling/util gaps** — things the port couldn't do because +`opentelemetry-util-genai` and/or the GenAI semconv doesn't yet support them +(missing util-genai factory, attribute not in the registry yet). +Reference failing/skipped tests if any. + +Do NOT list "OTel semconv doesn't cover X" as a gap when X is just an +inference API variant — semconv applies to all inference APIs. + +| Gap | File / test | Upstream issue | Notes | +|---|---|---|---| + +### 3. Significant behavioral changes + +Material differences vs upstream — equivalences (different code shape, +same emitted telemetry) are not listed. Look for: patching strategy +(transport-level vs method-level); error recording (exception event vs +`error.type` + status); token counting (cache details folded vs +separate); message format (flat OpenInference attributes vs JSON `parts` array); +streaming wrapper shape; attributes set unconditionally vs gated on +`is_recording()` / `should_capture_content()`; anything emitted by +upstream that this port deliberately drops without a one-line rationale. + +| Aspect | Upstream | This port | Notes | +|---|---|---|---| + +### 4. Test coverage + +Three checklists. Render only **missing** cells; if every checklist is +clean, render `_Test coverage complete._` and skip the subsections. + +Gaps in this section are **blockers for the migration PR**, not +follow-up work — address them before merge. Do not list §4 items as +follow-up issues in §5. + +Also flag here: **🟡 missing-cassette** — any scenario or unit test that +references a cassette not committed under `tests/cassettes/`. And: +**unreferenced cassettes** — one-line count of `tests/cassettes/*.yaml` +files no test/scenario opens. Skip if 0. + +For each unreferenced cassette, walk git history before naming a cause: + +1. Check whether the same cassette is also unreferenced in the upstream + it came from (for each unreferenced file, search the upstream's + `tests/` for any reference). If yes, it's an **inherited upstream + orphan** — say so plainly; the port is not responsible. +2. If the cassette IS referenced upstream but not in the port, the port + dropped a test. That is **not** a §4b cosmetic note — it's a §4a + missing variant (or a missing scenario, depending on what the test + covered). Add it to §4a's table with the upstream test name in Notes, + and do not rationalize ("tests were renamed/removed during the port" + without naming the commit is speculation, not analysis). + +Never write "appear to be" / "likely" / "safe to delete" without +evidence. Either you walked the history and know, or you didn't — in +which case say "history not checked" so the reviewer does it. + +#### 4a. Unit-test matrix per wrapped method + +For each method in §1 row where `This port` = ✅: + +| Variant | Required when | +|---|---| +| **sync × happy** | always | +| **sync × error** | always — verify the original exception is re-raised unmodified, `error.type` is recorded, span status is ERROR | +| **async × happy** | the lib exposes an async counterpart | +| **async × error** | the lib exposes an async counterpart | +| **streaming × happy** | the method accepts `stream=True` or returns a stream wrapper | +| **streaming × error** | flag if streaming is supported but no error path is exercised at all or error is not recorded on telemetry | +| **async streaming × happy** | the method accepts `stream=True` or returns an async stream wrapper | +| **async streaming × error** | flag if streaming is supported but no error path is exercised at all or error is not recorded on telemetry | + +Identify variants by reading the port's `src/` (`is_streaming(kwargs)`, +async `def`, `Stream` / `AsyncStream` wrappers). + +| Wrapped method | Missing variants | Notes | +|---|---|---| + +#### 4b. Conformance scenarios + +For each distinct GenAI semconv operation the port emits (`chat`, +`embeddings`, `execute_tool`, `invoke_agent`, `invoke_workflow`, +`create_agent`, …) there should be at least one happy-path scenario file under +`tests/conformance/.py` driven by `run_conformance(...)`. More +scenarios per operation are fine but never required. + +| Operation | Scenario file | Status | +|---|---|---| + +Mention if conformance scenatio is skipped, there are expected_violations, +or `uv run tox -e py312-test-instrumentation-genai-` fails. + +#### 4c. Docstring / README coverage + +| Asset | Required content | Status | +|---|---|---| +| `README.rst` | install snippet, usage snippet importing from `opentelemetry.instrumentation.genai.`, pointer to `OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` and the util-genai README, pointer to `tests/conformance/` (no `examples/`) | | +| `src/opentelemetry/instrumentation/genai//__init__.py` module docstring | usage example using the new module path; configuration section listing the env vars users actually need | | + +Render only rows that are missing/wrong/stale (e.g. README still imports +from `openinference.instrumentation.`, install command points +at the `openinference-instrumentation-` package name, links to an +OpenInference URL). + +**Refactor misses.** One bullet *only if* unit tests re-implement generic +semconv shape checks inline instead of factoring them into the port's +`tests/test_utils.py` helpers (e.g. `assert_all_attributes`, +`assert_completion_attributes`, `assert_messages_attribute`). There is no +shared `opentelemetry.test_util_genai.assertions` module — assert directly +on `span.attributes[GenAIAttributes.GEN_AI_…]` using the semconv constants. + +### 5. Follow-up work + +What goes in this PR vs. a follow-up: + +- **In this PR**: §4 test coverage gaps. These block merge. +- **Follow-up**: §1 ❌ rows (new instrumented methods), §2 util-genai / + semconv gaps, §3 behavioral parity items — each a **separate** PR, one + logical change apiece. + +Suggest an issue title per follow-up item, grouped by type (API surface / +util-genai gaps / behavioral parity); name the upstream that covers it and +the semconv operation where relevant. **Do not file the issues** — listing +them is the deliverable; filing AI-generated issues is against the +contributor policy, so the human author decides which to open. If there +are none, render `_No follow-up issues recommended._` + +## See also + +- `.github/skills/port-from-openinference/SKILL.md` — the port skill; it + runs this review at its final step to produce `MIGRATION_REPORT.md`. +- `.github/skills/write-conformance-tests/SKILL.md` — authoring the + conformance scenarios this report checks in §4b. +- `.github/instructions/instrumentation.instructions.md` — the copilot + PR-review rules for `instrumentation/**`; generic instrumentation + violations are flagged there and not repeated in this report. \ No newline at end of file diff --git a/.github/skills/write-conformance-tests/SKILL.md b/.github/skills/write-conformance-tests/SKILL.md new file mode 100644 index 00000000..0958f990 --- /dev/null +++ b/.github/skills/write-conformance-tests/SKILL.md @@ -0,0 +1,262 @@ +--- +name: write-conformance-tests +description: Author GenAI conformance-test scenarios for an OpenTelemetry instrumentation package — Scenario subclasses under tests/conformance/, the test_conformance.py runner, declared gaps, lib-specific assertions, and weaver live-check policies. Use when adding or updating conformance tests for any instrumentation, whether greenfield or ported. +--- + +# Write GenAI conformance tests + +Conformance tests validate that an instrumentation package emits telemetry +matching the [GenAI semantic conventions](https://github.com/open-telemetry/semantic-conventions-genai) +via Weaver live-check. They apply to **any** instrumentation package — +greenfield or ported — and don't depend on how the package was built. + +This skill covers authoring the `tests/conformance/.py` modules and +the `tests/test_conformance.py` runner. For the always-on rules that hold even +without this skill loaded, see the **Conformance tests** section of +[AGENTS.md](../../../AGENTS.md). + +## One scenario per operation + +Put one scenario per emitted semconv operation under `tests/conformance/`. +Write a scenario for **every** semconv operation the library emits, even one +currently blocked by a util-genai or semconv gap. Skipping the scenario hides +the gap; writing it records the gap (see [Declared gaps](#declared-gaps)). +**Never** drop a scenario file because it would fail today. + +## Recommended scenarios + +Cover the scenarios below that apply to the library. Skip a row only when the +library genuinely can't perform that operation (e.g. an inference-only +client has no `embeddings` scenario). + +**LLM client instrumentations:** + +| Scenario | File | Covers | +|---|---|---| +| Inference | `inference.py` | A `chat` operation. | +| Tool calling | `tool_calling.py` | A `chat` turn where the model returns tool calls and a follow-up turn feeds tool results back. Asserts tool calls and tool results are present on input/output **messages**. weaver will validate the format. Do **not** expect `execute_tool` spans unless the client library itself instruments tool execution — most don't; tool execution is the caller's code. | +| Embeddings | `embedding.py` | An `embeddings` operation. | + +**Agent / orchestration instrumentations:** + +| Scenario | File | Covers | +|---|---|---| +| Agent invocation with tooling | `invoke_agent.py` | An `invoke_agent` run that calls at least one tool — expects `invoke_agent` plus the nested `execute_tool` / `chat` spans the framework emits. | +| Multi-agent orchestration | `multi_agent.py` | One agent handing off to / invoking another — expects nested `invoke_agent` spans under the orchestrator. | +| Workflows | `invoke_workflow.py` | An `invoke_workflow` run wrapping the agent/tool spans it drives. | + +## Scenario modules + +Each scenario module defines a subclass of `Scenario` from +`opentelemetry.test_util_genai.conformance`. It sets the `expected_spans` / +`expected_metrics` ClassVars and implements +`run(self, *, tracer_provider, meter_provider, logger_provider, vcr)`. +Drive instrumentation through the shared `instrument` context manager (not +`instr.instrument()` / `trace.set_tracer_provider`); the runner injects the +already-configured `vcr`, so scenarios just call `vcr.use_cassette(...)`: + +```python +# tests/conformance/inference.py +from typing import Any + +from opentelemetry.instrumentation.genai. import Instrumentor +from opentelemetry.sdk._logs import LoggerProvider +from opentelemetry.sdk.metrics import MeterProvider +from opentelemetry.sdk.trace import TracerProvider +from opentelemetry.test_util_genai.conformance import Scenario +from opentelemetry.test_util_genai.instrumentor import instrument + + +class InferenceScenario(Scenario): + expected_spans = ("chat",) + expected_metrics = ( + "gen_ai.client.operation.duration", + "gen_ai.client.token.usage", + ) + + def run( + self, + *, + tracer_provider: TracerProvider, + meter_provider: MeterProvider, + logger_provider: LoggerProvider, + vcr: Any, + ) -> None: + with instrument( + Instrumentor(), + tracer_provider=tracer_provider, + logger_provider=logger_provider, + meter_provider=meter_provider, + content_capture="SPAN_ONLY", + ): + with vcr.use_cassette("inference.yaml"): + ... # call the patched API +``` + +One operation per scenario. No env vars, no logging config. + +## Declared gaps + +**Declared gaps** go in the `expected_violations` ClassVar (a tuple of +`ExpectedViolation`), not `xfail`. `run_conformance` fails on *undeclared* +weaver violations and on declared violations weaver no longer reports — so +a known util-genai/semconv gap is recorded as an `expected_violation` that +fails loudly the moment it's fixed. + +When the gap is too big to capture as individual `expected_violations` — the +whole operation can't run yet — skip the entire scenario instead, via +`pytest.mark.xfail` / `skip` on the parametrize entry in +`test_conformance.py`. Don't invent a one-off `reason=` string: **ask the +user to file a tracking bug first**, then use that issue as the skip +`reason=` (e.g. `reason="blocked by open-telemetry/...#1234"`) so the skip +is traceable and gets revisited when the bug is fixed. **Never** drop the +scenario file itself — a skipped scenario still records that the operation +exists; a deleted one hides it. + +## Lib-specific assertions + +**Lib-specific assertions** go in a `validate(self, report)` override on the +scenario (call `super().validate(report)` first) — there is no +`_local_assertions.py` / `LocalAssertions` hook. Common lib-specific shapes: + +- **Vendor server-tool payloads** (`code_interpreter`, `web_search`, …). +- **Vendor-specific finish reasons** outside semconv's enum (`stop`, + `length`, `content_filter`, `tool_call`, `error`). +- **Provider-specific `error.type`** — exception class names from the + underlying SDK. + +`validate` receives the weaver `LiveCheckReport`. Read span samples from +`report["samples"]` (each `entry["span"]["attributes"]` is a list of +`{"name", "value"}`) and seen metrics from +`report["statistics"]["seen_registry_metrics"]`. Always call +`super().validate(report)` first so the `expected_spans` / `expected_metrics` +checks still run: + +```python +# tests/conformance/tool_calling.py +from __future__ import annotations + +import json +from typing import Any + +from opentelemetry.test.weaver_live_check import LiveCheckReport +from opentelemetry.test_util_genai.conformance import Scenario + + +class ToolCallingScenario(Scenario): + expected_spans = ("chat",) + expected_metrics = ("gen_ai.client.operation.duration",) + + def run(self, *, tracer_provider, meter_provider, logger_provider, vcr) -> None: + ... # drive a tool-calling turn — see "Scenario modules" above + + def validate(self, report: LiveCheckReport) -> None: + super().validate(report) # keep the expected_spans / _metrics checks + + # Lib-specific: weaver validates the *shape* of each message part, but + # not that a tool call actually round-tripped. Assert the model's + # tool_call landed on an output message and the tool result was fed + # back on an input message (across the two chat turns). + chat_spans = [ + entry["span"] + for entry in report["samples"] + if "span" in entry + and _attr(entry["span"], "gen_ai.operation.name") == "chat" + ] + assert chat_spans, "no chat span emitted" + + output_part_types = { + t for span in chat_spans + for t in _part_types(_attr(span, "gen_ai.output.messages")) + } + input_part_types = { + t for span in chat_spans + for t in _part_types(_attr(span, "gen_ai.input.messages")) + } + assert "tool_call" in output_part_types, ( + f"expected a tool_call part on an output message, saw {output_part_types}" + ) + assert "tool_call_response" in input_part_types, ( + f"expected a tool_call_response part on an input message, saw {input_part_types}" + ) + + +def _attr(span: dict[str, Any], name: str) -> Any: + for attr in span["attributes"]: + if attr["name"] == name: + return attr["value"] + return None + + +def _part_types(messages_json: str | None) -> list[str]: + # gen_ai.{input,output}.messages is a JSON string of + # [{"role": ..., "parts": [{"type": ..., ...}]}]. + messages = json.loads(messages_json) if messages_json else [] + return [part["type"] for message in messages for part in message["parts"]] +``` + +## The test_conformance.py runner + +`tests/test_conformance.py` guards collection with +`pytest.importorskip("opentelemetry.test.weaver_live_check")`, parametrizes +the scenario *instances*, and calls `run_conformance(scenario, vcr=vcr, +weaver=weaver_live_check)` — it builds its own providers from the weaver +OTLP endpoint, so don't pass providers/exporters: + +```python +import pytest + +pytest.importorskip("opentelemetry.test.weaver_live_check") + +from opentelemetry.test.weaver_live_check import WeaverLiveCheck # noqa: E402 +from opentelemetry.test_util_genai.conformance import ( # noqa: E402 + Scenario, + run_conformance, +) + +from .conformance.embedding import EmbeddingScenario +from .conformance.inference import InferenceScenario + + +@pytest.mark.parametrize( + "scenario", + [InferenceScenario(), EmbeddingScenario()], + ids=lambda s: type(s).__name__, +) +def test_conformance( + scenario: Scenario, vcr, weaver_live_check: WeaverLiveCheck +) -> None: + run_conformance(scenario, vcr=vcr, weaver=weaver_live_check) +``` + +Do not write a separate `test_weaver_live.py`; weaver is already wired +through `run_conformance`. The `weaver_live_check` fixture skips the test +(rather than passing without the gate) only when it can't start (unsupported +platform / network) — never wrap it in a try/except, which would silently +disable the gate. + +## Weaver policies + +`weaver_live_check` enforces `policies/genai_content_validation.rego` +(content-attribute JSON shape) and `policies/genai_span_validation.rego` +(span name format, per-op expected attributes) — read these policy files +when authoring scenarios; they're authoritative. + +`weaver_live_check` downloads the pinned weaver binary on first use (cached +under `~/.cache/otel-conformance/weaver/`). + +## Cassettes + +A scenario loads one cassette per operation via +`vcr.use_cassette(".yaml")`. + +## Running + +```sh +uv run tox -e py312-test-instrumentation-genai--conformance +``` + +The `*-conformance` tox envs target `tests/test_conformance.py` directly; the +regular `*-{oldest,latest}` envs `--ignore` it so they don't need the +OTLP/gRPC exporter or `weaver_live_check`. + diff --git a/.gitignore b/.gitignore index fa7bfddc..631834e7 100644 --- a/.gitignore +++ b/.gitignore @@ -73,3 +73,6 @@ policies/_schemas.rego # Conformance test weaver live-check reports weaver_reports/ + +# Migration review reports generated by the review-ported skill +MIGRATION_REPORT.md From 05353cc61728f321814961cad22669c5178f87db Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Mon, 15 Jun 2026 23:40:24 -0700 Subject: [PATCH 2/5] up --- .claude/skills | 1 + .../skills/port-from-openinference/SKILL.md | 218 ++++++++++++++++-- .github/skills/review-ported/SKILL.md | 73 +++++- .../skills/write-conformance-tests/SKILL.md | 110 ++++++++- 4 files changed, 369 insertions(+), 33 deletions(-) create mode 120000 .claude/skills diff --git a/.claude/skills b/.claude/skills new file mode 120000 index 00000000..3e73f3a3 --- /dev/null +++ b/.claude/skills @@ -0,0 +1 @@ +../.github/skills \ No newline at end of file diff --git a/.github/skills/port-from-openinference/SKILL.md b/.github/skills/port-from-openinference/SKILL.md index 71bc577d..b9144c52 100644 --- a/.github/skills/port-from-openinference/SKILL.md +++ b/.github/skills/port-from-openinference/SKILL.md @@ -1,20 +1,29 @@ --- name: port-from-openinference -description: Port an openinference-instrumentation-* package from https://github.com/open-telemetry/donation-openinference into this repo as a new package. Use when a user asks to migrate a package from OpenInference. +description: Port an openinference-instrumentation-* package from https://github.com/open-telemetry/donation-openinference into this repo. Creates a new package, or — when a package for the library already exists in the repo — augments it with the coverage OpenInference adds on top. Use when a user asks to migrate or port a package from OpenInference. --- # Port an OpenInference `instrumentation-*` package Migrate an `openinference-instrumentation-` package from https://github.com/open-telemetry/donation-openinference into this -repo as a new package under `instrumentation/`. This is a **new implementation** -that emits OTel GenAI semantic conventions through `opentelemetry-util-genai`. +repo. The result emits OTel GenAI semantic conventions through +`opentelemetry-util-genai`. -The major -work items are: rewriting the patcher to method-level (step 5), mapping every -request/response shape into OTel `InputMessage`/`OutputMessage` parts -(step 6), and migrating the unit-test corpus while filtering openinference-framework -plumbing tests out (step 7). +Two modes, decided by the "Before you start" gate below: + +- **Greenfield port** — no package for the library exists yet. Create a + **new implementation** under `instrumentation/`. The default, and what the + bulk of this skill describes. +- **Augment an existing package** — the repo already ships + `opentelemetry-instrumentation-genai-`. Don't re-create it; inventory + what it covers, diff against OpenInference, and add **only the missing + parts**. See [Augment mode](#augment-mode-the-package-already-exists). + +For a greenfield port the major work items are: rewriting the patcher to +method-level (step 5), mapping every request/response shape into OTel +`InputMessage`/`OutputMessage` parts (step 6), and migrating the unit-test +corpus while filtering openinference-framework plumbing tests out (step 7). ## Inputs @@ -35,9 +44,75 @@ User may also provide ***target package name**. If not provided: derive it from - `openinference-instrumentation-anthropic` → `opentelemetry-instrumentation-genai-anthropic` Confirm the chosen name with the user. +## Before you start: is there already a package for this library? + +Once the target name is settled, check whether the repo already ships it: + +```sh +ls instrumentation/opentelemetry-instrumentation-genai- 2>/dev/null +``` + +- **Exists →** **augment mode**: don't scaffold a new package. OpenInference + is now a *second* reference to mine for coverage the existing package + lacks. Jump to [Augment mode](#augment-mode-the-package-already-exists). +- **Doesn't exist →** greenfield port; continue below. + +If a near-name sibling (a `-agents` / `-client` suffix) might instrument a +*different* surface of the same vendor, confirm the target name with the user +before deciding. + +## Before you start: check for native OTel instrumentation + +AI SDKs increasingly ship their **own** OpenTelemetry GenAI instrumentation. +When they do, porting the OpenInference package is redundant. So +before writing any code, determine whether the instrumented library is +self-instrumenting. + +```sh +# 1. Does the SDK depend on the OTel API / semconv? +pip show | grep -i requires # or read its pyproject / METADATA +# a dependency on opentelemetry-api or opentelemetry-semantic-conventions +# is the tell. +# 2. Does its source actually emit GenAI spans? +python -c "import , os; print(os.path.dirname(.__file__))" +rg -l "opentelemetry|semconv\._incubating\.attributes\.gen_ai" / +``` + +A dependency on `opentelemetry-api` (or `-semantic-conventions`) **plus** +`gen_ai_attributes` usage in the SDK source means the library is +self-instrumented. Confirm empirically: set a **global** `TracerProvider` +(native hooks often activate only when a real, non-proxy provider is +configured), make one call, and inspect the emitted spans' instrumentation +scope. + +**If the library is self-instrumented, do NOT port the OpenInference +package.** Pivot the work: + +1. **Ignore the OpenInference instrumentation entirely** — the vendor owns + the spans; there is nothing to re-implement, and no `src/` instrumentor / + patcher to write. +2. **Write conformance tests against the native instrumentation.** Follow + step 8 / the `write-conformance-tests` skill, but each scenario's `run()` + configures providers and enables the **native** tracer (e.g. sets a global + `TracerProvider`) instead of calling a `*Instrumentor` — then runs the + emitted telemetry through weaver live-check. +3. **Identify gaps / inconsistencies** between the native output and the + GenAI semconv: missing operations, wrong operation name, legacy/duplicate + attributes, no metrics, no content-capture controls, no util-genai + content modes / completion-hook / upload support, etc. Record each as an + `expected_violation` or a documented skip, same as a normal port. +4. **Write `MIGRATION_REPORT.md`** stating the library is self-instrumented, + the conformance results, and the gap list — that report is the + deliverable. **Stop and surface the finding to the user.** Do not build a + competing package unless they explicitly decide to (e.g. to suppress + native instrumentation and layer util-genai features on top). + +Only when the library has **no** native OTel instrumentation do you continue +with the migration flow below. + ## Reference material -- **OTel GenAI spans**: — authoritative attribute names and operation enum. +- **OTel GenAI spans**: — authoritative attribute names, spans, logs, and metrics definitions. - **OpenInference → OTel attribute mapping** (Arize-maintained): . Use as a quick lookup for what an OpenInference attribute *roughly* corresponds to in OTel; when the mapping disagrees with the official semconv, **the official semconv wins**. - **Message JSON schemas**: - input messages: @@ -91,23 +166,96 @@ violate: 8. **Do not modify weaver policies.** +## Augment mode: the package already exists + +The package already has a working, conformant implementation; OpenInference +is just another reference to mine for missing coverage. The job is a tight, +delta-only PR that closes specific gaps — **not** a rewrite. + +All [Non-negotiable rules](#non-negotiable-rules) apply to every line you +add, plus two specific to this mode: + +- **Don't rewrite or "improve" existing code.** Leave the existing patcher, + wrappers, and tests alone unless OpenInference reveals a concrete bug — and + then it's a *separate* PR. No opportunistic refactors. +- **Match the existing package's conventions.** New wrappers, helpers, and + scenarios follow the patterns already there (helper names, wrapper + structure, conftest/fixture wiring, scenario shape). Don't add a second way + to do something the package already does. + +### A. Inventory what's already there + +Map the existing package before reading OpenInference (read, don't guess): +patched methods (`wrap_function_wrapper` / `unwrap` in `_instrument`), +request/response shapes its wrappers map, the unit-test matrix per method +(`rg -c '^\s*(async )?def test_' instrumentation//tests/`), +conformance scenarios under `tests/conformance/`, and cassettes. + +### B. Diff against OpenInference + +Run the OpenInference analysis as a greenfield port would (the reading behind +steps 5–6): every method it patches, every shape it parses. Subtract +inventory A. The remainder is the work-list: + +- Methods OpenInference patches that the package doesn't → new wrappers (step 5). +- Shape branches OpenInference handles that the wrappers drop → extend the + mapping (step 6). +- Scenarios OpenInference covers that the package lacks → new tests / + conformance (steps 7–8). + +Coverage both already have → skip. Coverage the package has but OpenInference +lacks → leave it; not a regression. + +### C. Add the delta + +Apply steps **5–10 to the new parts only**: + +- **Steps 1–3 (scaffold/rename/pyproject) skipped** — touch `pyproject.toml` + only for a genuinely new entry point or dependency range, `README.rst` only + for a new user-visible capability. +- **Step 4** applies to anything you copy in; nothing to excise from existing + code. +- **Steps 5–9** extend the existing wrappers / test utils / conformance + runner in place, not parallel ones. +- **Step 10** is done; revisit `tox.ini` / pyright only for a new test factor + or requirements file. + +### D. Report and review + +Write `MIGRATION_REPORT.md` via the `review-ported` skill as usual — it +detects augment mode. + ## Migration flow +> The numbered steps below are written for a **greenfield port**. In +> [augment mode](#augment-mode-the-package-already-exists) skip steps 1–3, +> and scope steps 5–10 to the delta from the inventory/diff (sections A–C +> above). + ### 1. Create the target package +Because the patcher, wrappers, and tests are all rewritten (steps 5–7), a +`cp -R` of the OpenInference tree mostly creates files you immediately +delete or overwrite. **Prefer scaffolding fresh** from the nearest existing +package (e.g. `opentelemetry-instrumentation-genai-anthropic`) and copy over from +OpenInference **only** what you actually reuse: + +- `LICENSE` (Apache-2.0). +- Reusable cassettes (step 9), if any. + ```sh -cp -R / instrumentation// -cd instrumentation/ -rm -rf .pytest_cache .tox .venv venv .vscode .DS_Store .claude .ruff_cache -find . -name __pycache__ -type d -exec rm -rf {} + -rm -f CHANGELOG.md # OpenInference's per-package history doesn't apply here +mkdir -p instrumentation//src/opentelemetry/instrumentation/genai/ +mkdir -p instrumentation//tests/conformance instrumentation//tests/cassettes +cp /LICENSE instrumentation//LICENSE ``` -Keep `LICENSE` (Apache-2.0). Don't carry over `examples/` directories or -OpenInference's `README.md` — both are rewritten below. Per-package -`CHANGELOG.md` is towncrier-generated at release time; don't carry one over, -and add a fragment for the new package per the **Changelog** section of -[AGENTS.md](../../../AGENTS.md). +Do **not** carry over `examples/`, OpenInference's `README.md`, or its +`CHANGELOG.md` (per-package changelogs are towncrier-generated at release +time). + +(If you do `cp -R` instead, clean it up afterwards: +`rm -rf .pytest_cache .tox .venv venv .vscode .DS_Store .claude .ruff_cache CHANGELOG.md` +and `find . -name __pycache__ -type d -exec rm -rf {} +`.) ### 2. Rename the Python module @@ -372,10 +520,8 @@ is ERROR. **`tests/requirements.{latest,oldest}.txt`** — OpenInference typically has its own pin file; keep only the third-party version pins (`openai==`, -`anthropic==`, `pydantic==`, `pytest==`, …). Drop every `-e ` line -— this repo's uv workspace already installs all members editable via -`uv sync --all-packages`. Drop `requirements.pydantic1.txt`-style -side-channel files entirely. +`anthropic==`, …). +Add current `util/opentelemetry-util-genai` and `instrumentation/opentelemetry-instrumentation-genai-` to the latest; for the oldest, use the oldest version of `opentelemetry-util-genai` that works (there is none released yet, but check). ### 8. Conformance scenarios @@ -390,12 +536,36 @@ to any instrumentation. Port-specific notes on top of that skill: `expected_violations` / `xfail` `reason=` at the gap row in `MIGRATION_REPORT.md`. -### 9. Cassettes +### 9. Cassettes (or a transport proxy) - Copy cassettes from OpenInference's `tests/cassettes/` (or wherever the OpenInference package parks them) into the port's `tests/cassettes/`. Reuse names so existing unit tests keep loading them. - Reuse existing cassettes for conformance scenarios when they are applicable. +- **AI-generated cassettes.** For a cassette OpenInference lacks and you + can't record (no provider access), you may synthesize one from the + provider's API reference via AI. Start it with a + `# TODO: this is generated by AI, re-record` comment, mention it in the PR, + and open a follow-up issue to re-record it against the real provider in CI. + +**Transport proxy instead of cassettes.** If the OpenInference unit tests mock +HTTP (e.g. `respx`, `httpx.MockTransport`) rather than replay recorded +cassettes, you may do the same in the port's **unit** tests — build the SDK +client with an `httpx.MockTransport` (or equivalent) returning canned +responses instead of `@pytest.mark.vcr`. When you go this route: + +- Reuse the OpenInference cassettes' recorded **response bodies** (incl. the + streaming SSE payloads) as the canned responses, so fidelity is preserved. +- Still register the shared VCR plugins in `conftest.py` (the shared + `fixture_vcr` is autouse) and keep `vcr_config`. +- **Use the same mechanism in conformance scenarios.** Conformance does not + require VCR — a scenario can build its client with the same transport mock + and ignore the injected `vcr` (see the `write-conformance-tests` skill). + Pick one mechanism (cassettes *or* transport mock) and use it consistently + across the whole package. +- **Mention the choice in `MIGRATION_REPORT.md`** (the `review-ported` skill + flags missing cassettes; note that the package mocks the transport by + design so the absence is not a gap). ### 10. Workspace integration diff --git a/.github/skills/review-ported/SKILL.md b/.github/skills/review-ported/SKILL.md index 64457d71..8444d39c 100644 --- a/.github/skills/review-ported/SKILL.md +++ b/.github/skills/review-ported/SKILL.md @@ -1,6 +1,6 @@ --- name: review-ported -description: Review a ported instrumentation-genai package by comparing it against any known external upstream implementations of the same instrumentation (OpenInference, vendor-specific). Writes MIGRATION_REPORT.md in the migrated package root. +description: Review a ported or augmented instrumentation-genai package by comparing it against any known external upstream implementations of the same instrumentation (OpenInference, vendor-specific). Handles both greenfield ports and augment-mode PRs that add coverage to a pre-existing package (checking the added parts, their consistency with existing code, and old-vs-new coexistence). Writes MIGRATION_REPORT.md in the migrated package root. --- # Review a ported instrumentation-genai package @@ -38,6 +38,25 @@ upstream lookup. If there are no upstreams to compare against (no user-named upstream resolves and no OpenInference match), this isn't a port — bail with a one-line note. +## Greenfield vs augment mode + +Detect the mode before reviewing — it changes what counts as a problem: + +- **Greenfield port** — the migration *created* the package; everything in + `instrumentation//` is the port. +- **Augment mode** — the package **already existed** and the migration only + *adds* coverage mined from the upstream. The diff modifies a pre-existing + package. + +Tell them apart from the PR diff / git history / unstaged changes: if the +package's `src/` and tests predate the migration commits, it's augment mode. + +In augment mode the deliverable shifts to: the **added** coverage's +completeness vs the upstream, its **consistency** with the pre-existing code, +and **old-vs-new coexistence** concerns. Do **not** flag pre-existing +coverage the upstream also has as missing. Per-section adjustments are inline +below. + ## Rules - **Deliverable is `instrumentation//MIGRATION_REPORT.md`.** Write @@ -62,12 +81,17 @@ First line: ```markdown # Migration review: +Mode: greenfield port | augment existing package + Compared against: - OpenInference: `openinference-instrumentation-` (add link) - : ``, `` (omit the line if no other upstream is in scope) ``` +State the mode on the second line. In augment mode, one sentence after it +naming what the PR adds (the gaps it closes) orients the reviewer. + Sections render in order. Each is bounded to its problems. ### 1. Instrumented API surface @@ -112,6 +136,17 @@ One column per upstream that exists, plus `This port`: Do not render a separate `**Gaps:**` bullet list below the table — the table itself is the gap list. +**Augment mode.** Split `This port` into `Existed` and `Added this PR`: + +| API method | OpenInference | Existed | Added this PR | Notes | +|---|---|---|---|---| +| `…chat.completions.Completions.create` | ✅ | ✅ | — | already covered | +| `…responses.Responses.create` | ✅ | ❌ | ✅ | added by this PR | +| `…batches.Batches.create` | ✅ | ❌ | ❌ | still missing — `chat` | + +`❌` in both columns (upstream patches it, the package still doesn't) is the +gap; sort it to the top. + ### 2. Gaps and open issues Genuine **tooling/util gaps** — things the port couldn't do because @@ -139,6 +174,30 @@ upstream that this port deliberately drops without a one-line rationale. | Aspect | Upstream | This port | Notes | |---|---|---|---| +### 3b. Consistency and old-vs-new coexistence (augment mode only) + +**Greenfield port: skip.** Here the risk isn't "is it conformant" but "do +the additions fit the package they landed in." Compare the **added** code +against the **pre-existing** code (not the upstream); flag only real +problems: + +- **Divergent patterns** — added wrappers/helpers/fixtures re-implementing + something the package already has a convention for. +- **Duplicated scaffolding** — added tests not reusing the package's + `tests/test_utils.py` helpers or conformance runner. +- **Telemetry contradictions** — an added method emitting an operation name, + attribute shape, span-kind, or content-capture gating that disagrees with + what pre-existing methods emit for the analogous case. +- **Coexistence hazards** — added code that alters pre-existing paths (shared + module-level state, `_instrument()` ordering, a dependency range the old + code wasn't tested against), or a `pyproject`/`tox` edit affecting existing + tests. + +| Concern | Pre-existing | Added | Notes | +|---|---|---|---| + +`_none_` if the additions are consistent and isolated. + ### 4. Test coverage Three checklists. Render only **missing** cells; if every checklist is @@ -148,6 +207,10 @@ Gaps in this section are **blockers for the migration PR**, not follow-up work — address them before merge. Do not list §4 items as follow-up issues in §5. +**Augment mode.** The matrix applies to methods **added this PR** (§1 +`Added this PR` = ✅) — those block merge. A pre-existing method missing a +variant is a §5 follow-up, not a §4 blocker. + Also flag here: **🟡 missing-cassette** — any scenario or unit test that references a cassette not committed under `tests/cassettes/`. And: **unreferenced cassettes** — one-line count of `tests/cassettes/*.yaml` @@ -228,10 +291,12 @@ on `span.attributes[GenAIAttributes.GEN_AI_…]` using the semconv constants. What goes in this PR vs. a follow-up: -- **In this PR**: §4 test coverage gaps. These block merge. +- **In this PR**: §4 test coverage gaps. These block merge. In augment mode, + scoped to the methods added this PR, plus any §3b consistency problem in the + added code (it ships here). - **Follow-up**: §1 ❌ rows (new instrumented methods), §2 util-genai / - semconv gaps, §3 behavioral parity items — each a **separate** PR, one - logical change apiece. + semconv gaps, §3 behavioral parity items, and — augment mode — pre-existing + coverage gaps from §4 — each a **separate** PR, one logical change apiece. Suggest an issue title per follow-up item, grouped by type (API surface / util-genai gaps / behavioral parity); name the upstream that covers it and diff --git a/.github/skills/write-conformance-tests/SKILL.md b/.github/skills/write-conformance-tests/SKILL.md index 0958f990..c1c8a6cd 100644 --- a/.github/skills/write-conformance-tests/SKILL.md +++ b/.github/skills/write-conformance-tests/SKILL.md @@ -35,6 +35,8 @@ client has no `embeddings` scenario). |---|---|---| | Inference | `inference.py` | A `chat` operation. | | Tool calling | `tool_calling.py` | A `chat` turn where the model returns tool calls and a follow-up turn feeds tool results back. Asserts tool calls and tool results are present on input/output **messages**. weaver will validate the format. Do **not** expect `execute_tool` spans unless the client library itself instruments tool execution — most don't; tool execution is the caller's code. | +| Multimodal content | `multimodal.py` | A `chat` turn carrying the **non-text parts** the client accepts (inline image/audio bytes, media URLs, file refs, …), asserting each round-trips onto the messages. Cover only the part types the library emits — see [Message-part coverage](#message-part-coverage). | +| Reasoning | `reasoning.py` | A `chat` turn against a reasoning model where the response carries reasoning/thinking content, asserting a `reasoning` part lands on an output message (and `gen_ai.usage.output_tokens` / reasoning-token attributes if the library records them). Only when the client surfaces reasoning content — see [Message-part coverage](#message-part-coverage). | | Embeddings | `embedding.py` | An `embeddings` operation. | **Agent / orchestration instrumentations:** @@ -45,6 +47,56 @@ client has no `embeddings` scenario). | Multi-agent orchestration | `multi_agent.py` | One agent handing off to / invoking another — expects nested `invoke_agent` spans under the orchestrator. | | Workflows | `invoke_workflow.py` | An `invoke_workflow` run wrapping the agent/tool spans it drives. | +## Message-part coverage + +Weaver validates a part's *shape*, not *which* part types a scenario +exercised — a text-only scenario leaves the package's image/audio/file/tool +mapping unverified. So exercise **every non-text part type the library can +emit** and assert it landed on a message. Cover only what the package +instruments: walk its wrappers (the step-6 mapping for a port) for which +`opentelemetry.util.genai.types` parts they produce. + +| Part `type` | util-genai type | Emitted when the library accepts… | +|---|---|---| +| `text` | `Text` | plain text (always) | +| `tool_call` / `tool_call_response` | `ToolCallRequest` / `ToolCallResponse` | function/tool calling — covered by `tool_calling.py` | +| `server_tool_call` / `server_tool_call_response` | `ServerToolCall` / `ServerToolCallResponse` | vendor server-side tools (web_search, code_interpreter, …) | +| `reasoning` | `Reasoning` | reasoning / thinking items | +| `blob` | `Blob` | inline image/audio/video **bytes** (`modality` distinguishes them) | +| `uri` | `Uri` | an external media **URL** (`modality`) | +| `file` | `File` | a **file reference** / id (`modality`) | +| `generic` | `GenericPart` | a provider item with no semconv mapping — flag, don't drop | + +Group by shared turn/cassette — typically one `multimodal.py` for the +image/audio/file/url inputs the client accepts, `tool_calling.py` for tool +parts, and `reasoning.py` for `reasoning` parts (a reasoning model emits +those on output messages, not input). `type` alone gives `blob` / `uri` / +`file`; to tell image from +audio from video, read the part's `modality` with a `_part_fields` helper +returning `(type, modality)` tuples (defined alongside `_part_types` in +[Lib-specific assertions](#lib-specific-assertions)): + +```python + def validate(self, report: LiveCheckReport) -> None: + super().validate(report) + chat_spans = [ + entry["span"] for entry in report["samples"] + if "span" in entry + and _attr(entry["span"], "gen_ai.operation.name") == "chat" + ] + input_parts = { + (t, m) for span in chat_spans + for t, m in _part_fields(_attr(span, "gen_ai.input.messages")) + } + # e.g. an inline image + an audio URL were sent + assert ("blob", "image") in input_parts, f"no image blob, saw {input_parts}" + assert ("uri", "audio") in input_parts, f"no audio uri, saw {input_parts}" +``` + +If a part type the library accepts can't round-trip yet (a util-genai/semconv +gap), still write the scenario and record it as a +[declared gap](#declared-gaps) — never silently omit the part. + ## Scenario modules Each scenario module defines a subclass of `Scenario` from @@ -52,8 +104,9 @@ Each scenario module defines a subclass of `Scenario` from `expected_metrics` ClassVars and implements `run(self, *, tracer_provider, meter_provider, logger_provider, vcr)`. Drive instrumentation through the shared `instrument` context manager (not -`instr.instrument()` / `trace.set_tracer_provider`); the runner injects the -already-configured `vcr`, so scenarios just call `vcr.use_cassette(...)`: +`instr.instrument()` / `trace.set_tracer_provider`). The runner injects an +already-configured `vcr`, so a cassette-based scenario just calls +`vcr.use_cassette(...)`: ```python # tests/conformance/inference.py @@ -95,6 +148,28 @@ class InferenceScenario(Scenario): One operation per scenario. No env vars, no logging config. +**VCR cassettes are not required — a transport mock works too.** Mock HTTP +however the package's **unit** tests already do, and use the **same pattern +across every scenario in that package** (don't mix cassettes and transport +mocks within one lib). If the package mocks the transport (e.g. +`httpx.MockTransport`, `respx`) instead of replaying cassettes, build the +client with that transport inside `run()` and ignore the injected `vcr`: + +```python + def run(self, *, tracer_provider, meter_provider, logger_provider, vcr) -> None: + with instrument( + Instrumentor(), + tracer_provider=tracer_provider, + logger_provider=logger_provider, + meter_provider=meter_provider, + content_capture="SPAN_ONLY", + ): + client = (transport=httpx.MockTransport(_handler)) # canned response + client.(...) # call the patched API +``` + +`vcr` stays in the signature either way (the runner always passes it). + ## Declared gaps **Declared gaps** go in the `expected_violations` ClassVar (a tuple of @@ -193,6 +268,17 @@ def _part_types(messages_json: str | None) -> list[str]: # [{"role": ..., "parts": [{"type": ..., ...}]}]. messages = json.loads(messages_json) if messages_json else [] return [part["type"] for message in messages for part in message["parts"]] + + +def _part_fields(messages_json: str | None) -> list[tuple[str, str | None]]: + # Like _part_types, but keeps modality so image/audio/video are + # distinguishable on blob/uri/file parts (None for parts without one). + messages = json.loads(messages_json) if messages_json else [] + return [ + (part["type"], part.get("modality")) + for message in messages + for part in message["parts"] + ] ``` ## The test_conformance.py runner @@ -245,10 +331,24 @@ when authoring scenarios; they're authoritative. `weaver_live_check` downloads the pinned weaver binary on first use (cached under `~/.cache/otel-conformance/weaver/`). -## Cassettes +## Recorded HTTP (cassettes or transport mock) + +A scenario replays one HTTP interaction per operation. Use whichever +mechanism the package's unit tests use, consistently across all of its +scenarios: + +- **VCR cassette** — `vcr.use_cassette(".yaml")`, one committed + cassette per operation under `tests/cassettes/`. +- **Transport mock** — build the SDK client with an `httpx.MockTransport` + (or `respx`) returning a canned response; no cassette file needed. + +Pick the one the lib already follows and don't mix the +two within a package. -A scenario loads one cassette per operation via -`vcr.use_cassette(".yaml")`. +**AI-generated cassettes.** Lacking provider access, you may synthesize a +cassette from the provider's API reference via AI. Start the cassette with a +`# TODO: this is generated by AI, re-record` comment, mention it in the PR, +and open a follow-up issue to re-record it against the real provider in CI. ## Running From 26d0894056fd5cd280fda950f75417bead09b5ff Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Tue, 16 Jun 2026 00:13:56 -0700 Subject: [PATCH 3/5] rename skills, update contrib --- .../SKILL.md | 62 +++++++++---------- .../SKILL.md | 50 +++++++-------- .gitignore | 2 +- CONTRIBUTING.md | 14 +++++ 4 files changed, 71 insertions(+), 57 deletions(-) rename .github/skills/{port-from-openinference => migrate-from-openinference}/SKILL.md (92%) rename .github/skills/{review-ported => review-migration}/SKILL.md (86%) diff --git a/.github/skills/port-from-openinference/SKILL.md b/.github/skills/migrate-from-openinference/SKILL.md similarity index 92% rename from .github/skills/port-from-openinference/SKILL.md rename to .github/skills/migrate-from-openinference/SKILL.md index b9144c52..1a018461 100644 --- a/.github/skills/port-from-openinference/SKILL.md +++ b/.github/skills/migrate-from-openinference/SKILL.md @@ -1,9 +1,9 @@ --- -name: port-from-openinference -description: Port an openinference-instrumentation-* package from https://github.com/open-telemetry/donation-openinference into this repo. Creates a new package, or — when a package for the library already exists in the repo — augments it with the coverage OpenInference adds on top. Use when a user asks to migrate or port a package from OpenInference. +name: migrate-from-openinference +description: Migrate an openinference-instrumentation-* package from https://github.com/open-telemetry/donation-openinference into this repo. Creates a new package, or — when a package for the library already exists in the repo — augments it with the coverage OpenInference adds on top. Use when a user asks to migrate or port a package from OpenInference. --- -# Port an OpenInference `instrumentation-*` package +# Migrate an OpenInference `instrumentation-*` package Migrate an `openinference-instrumentation-` package from https://github.com/open-telemetry/donation-openinference into this @@ -12,7 +12,7 @@ repo. The result emits OTel GenAI semantic conventions through Two modes, decided by the "Before you start" gate below: -- **Greenfield port** — no package for the library exists yet. Create a +- **Greenfield migration** — no package for the library exists yet. Create a **new implementation** under `instrumentation/`. The default, and what the bulk of this skill describes. - **Augment an existing package** — the repo already ships @@ -20,7 +20,7 @@ Two modes, decided by the "Before you start" gate below: what it covers, diff against OpenInference, and add **only the missing parts**. See [Augment mode](#augment-mode-the-package-already-exists). -For a greenfield port the major work items are: rewriting the patcher to +For a greenfield migration the major work items are: rewriting the patcher to method-level (step 5), mapping every request/response shape into OTel `InputMessage`/`OutputMessage` parts (step 6), and migrating the unit-test corpus while filtering openinference-framework plumbing tests out (step 7). @@ -55,7 +55,7 @@ ls instrumentation/opentelemetry-instrumentation-genai- 2>/dev/null - **Exists →** **augment mode**: don't scaffold a new package. OpenInference is now a *second* reference to mine for coverage the existing package lacks. Jump to [Augment mode](#augment-mode-the-package-already-exists). -- **Doesn't exist →** greenfield port; continue below. +- **Doesn't exist →** greenfield migration; continue below. If a near-name sibling (a `-agents` / `-client` suffix) might instrument a *different* surface of the same vendor, confirm the target name with the user @@ -64,7 +64,7 @@ before deciding. ## Before you start: check for native OTel instrumentation AI SDKs increasingly ship their **own** OpenTelemetry GenAI instrumentation. -When they do, porting the OpenInference package is redundant. So +When they do, migrating the OpenInference package is redundant. So before writing any code, determine whether the instrumented library is self-instrumenting. @@ -85,7 +85,7 @@ self-instrumented. Confirm empirically: set a **global** `TracerProvider` configured), make one call, and inspect the emitted spans' instrumentation scope. -**If the library is self-instrumented, do NOT port the OpenInference +**If the library is self-instrumented, do NOT migrate the OpenInference package.** Pivot the work: 1. **Ignore the OpenInference instrumentation entirely** — the vendor owns @@ -100,7 +100,7 @@ package.** Pivot the work: GenAI semconv: missing operations, wrong operation name, legacy/duplicate attributes, no metrics, no content-capture controls, no util-genai content modes / completion-hook / upload support, etc. Record each as an - `expected_violation` or a documented skip, same as a normal port. + `expected_violation` or a documented skip, same as a normal migration. 4. **Write `MIGRATION_REPORT.md`** stating the library is self-instrumented, the conformance results, and the gap list — that report is the deliverable. **Stop and surface the finding to the user.** Do not build a @@ -128,17 +128,17 @@ with the migration flow below. The repo-wide rules in [AGENTS.md](../../../AGENTS.md) already apply (telemetry through `opentelemetry-util-genai` public surface only, no `type: ignore`, semconv enums over string literals, re-raise caught -exceptions). The rules below are the ones the port is most likely to +exceptions). The rules below are the ones the migration is most likely to violate: 1. **Zero OpenInference dependencies.** No `openinference-instrumentation`, no `openinference-semantic-conventions`, no `openinference-*` anywhere - in the port's `src/` or `tests/`. Verify with + in the migrated package's `src/` or `tests/`. Verify with `rg openinference instrumentation/` — output must be empty. -2. **Public util-genai surface only.** Beyond the AGENTS.md rule, the port +2. **Public util-genai surface only.** Beyond the AGENTS.md rule, the migrated package must not import any `opentelemetry.util.genai._*` module — the allowed modules are enumerated in step 4. -3. **Ignore all other OpenInference instrumentations during the port.** The only +3. **Ignore all other OpenInference instrumentations during the migration.** The only instrumentation code to read is the OpenInference package being migrated plus `opentelemetry-util-genai`. Build from first principles: original OpenInference code + util-genai public API + @@ -193,7 +193,7 @@ conformance scenarios under `tests/conformance/`, and cassettes. ### B. Diff against OpenInference -Run the OpenInference analysis as a greenfield port would (the reading behind +Run the OpenInference analysis as a greenfield migration would (the reading behind steps 5–6): every method it patches, every shape it parses. Subtract inventory A. The remainder is the work-list: @@ -222,12 +222,12 @@ Apply steps **5–10 to the new parts only**: ### D. Report and review -Write `MIGRATION_REPORT.md` via the `review-ported` skill as usual — it +Write `MIGRATION_REPORT.md` via the `review-migration` skill as usual — it detects augment mode. ## Migration flow -> The numbered steps below are written for a **greenfield port**. In +> The numbered steps below are written for a **greenfield migration**. In > [augment mode](#augment-mode-the-package-already-exists) skip steps 1–3, > and scope steps 5–10 to the delta from the inventory/diff (sections A–C > above). @@ -353,7 +353,7 @@ Output must be empty. This is the largest behavioral change. OpenInference typically patches at the HTTP-transport layer (`OpenAI.request`, `AsyncOpenAI.request`, `HTTPClient._send_request`, etc.) and dispatches by `cast_to` response type. -**That pattern does not survive the port.** util-genai's +**That pattern does not survive the migration.** util-genai's `InferenceInvocation` model needs the request kwargs (`model`, `messages`, `tools`, `stream`, …) at call time, which only the API methods see. @@ -465,14 +465,14 @@ response shape — migrate it. A test that checks `using_attributes(session_id=…)` propagates into span attributes covers OpenInference framework behavior — skip it. -**Sanity check before committing step 7.** Count source vs ported tests: +**Sanity check before committing step 7.** Count source vs migrated tests: ```sh rg -c '^\s*(async )?def test_' /tests/ rg -c '^\s*(async )?def test_' instrumentation//tests/ ``` -A port that drops from 80 tests to 5 is a regression — go back to the +A migration that drops from 80 tests to 5 is a regression — go back to the "migrate" and "migrate (rewrite)" buckets and finish them. **Replace conftest boilerplate.** OpenInference conftests duplicate the @@ -513,7 +513,7 @@ tiny constants (`DEFAULT_MODEL`, sample prompts) inline or in **Required unit-test coverage per wrapped method.** Apply the repo test matrix (sync/async × happy/error, plus streaming × happy/error where the method streams — see [AGENTS.md](../../../AGENTS.md) Tests section) -to **every** method patched in step 5. For the port these are blockers for +to **every** method patched in step 5. For the migration these are blockers for the migration PR, not follow-up. The error variants must verify the original exception is re-raised, `error.type` is recorded, and span status is ERROR. @@ -528,10 +528,10 @@ Add current `util/opentelemetry-util-genai` and `instrumentation/opentelemetry-i Author conformance scenarios using the **`write-conformance-tests`** skill — it's the generic procedure (scenario modules, the `test_conformance.py` runner, declared gaps, lib-specific assertions, weaver policies) and applies -to any instrumentation. Port-specific notes on top of that skill: +to any instrumentation. Migration-specific notes on top of that skill: - Drop OpenInference's `examples/` tree — its end-to-end demos are replaced - by conformance scenarios, not ported. + by conformance scenarios, not migrated. - For an operation blocked by a util-genai/semconv gap, point the `expected_violations` / `xfail` `reason=` at the gap row in `MIGRATION_REPORT.md`. @@ -539,7 +539,7 @@ to any instrumentation. Port-specific notes on top of that skill: ### 9. Cassettes (or a transport proxy) - Copy cassettes from OpenInference's `tests/cassettes/` (or wherever the OpenInference package - parks them) into the port's `tests/cassettes/`. Reuse names so existing + parks them) into the migrated package's `tests/cassettes/`. Reuse names so existing unit tests keep loading them. - Reuse existing cassettes for conformance scenarios when they are applicable. - **AI-generated cassettes.** For a cassette OpenInference lacks and you @@ -550,7 +550,7 @@ to any instrumentation. Port-specific notes on top of that skill: **Transport proxy instead of cassettes.** If the OpenInference unit tests mock HTTP (e.g. `respx`, `httpx.MockTransport`) rather than replay recorded -cassettes, you may do the same in the port's **unit** tests — build the SDK +cassettes, you may do the same in the migrated package's **unit** tests — build the SDK client with an `httpx.MockTransport` (or equivalent) returning canned responses instead of `@pytest.mark.vcr`. When you go this route: @@ -563,7 +563,7 @@ responses instead of `@pytest.mark.vcr`. When you go this route: and ignore the injected `vcr` (see the `write-conformance-tests` skill). Pick one mechanism (cassettes *or* transport mock) and use it consistently across the whole package. -- **Mention the choice in `MIGRATION_REPORT.md`** (the `review-ported` skill +- **Mention the choice in `MIGRATION_REPORT.md`** (the `review-migration` skill flags missing cassettes; note that the package mocks the transport by design so the absence is not a gap). @@ -571,9 +571,9 @@ responses instead of `@pytest.mark.vcr`. When you go this route: Wire the new package into the workspace, `tox.ini`, and pyright per the **Adding a package to the workspace** section of [AGENTS.md](../../../AGENTS.md) -— it applies to any new package, not just ports. Port-specific note on top: +— it applies to any new package, not just migrations. Migration-specific note on top: -- **Leave the package out of `[tool.pyright] include`.** A port over untyped +- **Leave the package out of `[tool.pyright] include`.** A migration over untyped `wrapt` boundaries (`wrapped, instance, args, kwargs`) and vendor SDK members produces hundreds of strict-mode errors, so don't add it to `include` until typing lands — track that as a follow-up. @@ -585,14 +585,14 @@ Run the pre-PR checks from the **Commands** section of the package's `-{oldest,latest}` (and `-conformance`) test envs. Open the PR with the `migration:openinference` label. Run the -`review-ported` skill locally to generate `MIGRATION_REPORT.md`; iterate -until §4 (test coverage) is clean. The review skill compares the port +`review-migration` skill locally to generate `MIGRATION_REPORT.md`; iterate +until §4 (test coverage) is clean. The review skill compares the migrated package against OpenInference (or any upstreams you name), so coverage gaps surface in one report. ## See also -- [AGENTS.md](../../../AGENTS.md) — general repo rules that already apply to the port. +- [AGENTS.md](../../../AGENTS.md) — general repo rules that already apply to the migration. - `util/opentelemetry-util-genai/AGENTS.md` — util-genai usage rules. - `.github/skills/write-conformance-tests/SKILL.md` — generic conformance-scenario authoring (step 8). -- `.github/skills/review-ported/SKILL.md` — sister review skill (writes `MIGRATION_REPORT.md`). +- `.github/skills/review-migration/SKILL.md` — sister review skill (writes `MIGRATION_REPORT.md`). diff --git a/.github/skills/review-ported/SKILL.md b/.github/skills/review-migration/SKILL.md similarity index 86% rename from .github/skills/review-ported/SKILL.md rename to .github/skills/review-migration/SKILL.md index 8444d39c..5a41ac76 100644 --- a/.github/skills/review-ported/SKILL.md +++ b/.github/skills/review-migration/SKILL.md @@ -1,9 +1,9 @@ --- -name: review-ported -description: Review a ported or augmented instrumentation-genai package by comparing it against any known external upstream implementations of the same instrumentation (OpenInference, vendor-specific). Handles both greenfield ports and augment-mode PRs that add coverage to a pre-existing package (checking the added parts, their consistency with existing code, and old-vs-new coexistence). Writes MIGRATION_REPORT.md in the migrated package root. +name: review-migration +description: Review a migrated or augmented instrumentation-genai package by comparing it against any known external upstream implementations of the same instrumentation (OpenInference, vendor-specific). Handles both greenfield migrations and augment-mode PRs that add coverage to a pre-existing package (checking the added parts, their consistency with existing code, and old-vs-new coexistence). Writes MIGRATION_REPORT.md in the migrated package root. --- -# Review a ported instrumentation-genai package +# Review a migrated instrumentation-genai package Compare the migrated package in `instrumentation//` against every known upstream implementation of the same instrumentation, and @@ -36,14 +36,14 @@ upstream lookup. ``` If there are no upstreams to compare against (no user-named upstream resolves and no -OpenInference match), this isn't a port — bail with a one-line note. +OpenInference match), this isn't a migration — bail with a one-line note. ## Greenfield vs augment mode Detect the mode before reviewing — it changes what counts as a problem: -- **Greenfield port** — the migration *created* the package; everything in - `instrumentation//` is the port. +- **Greenfield migration** — the migration *created* the package; everything in + `instrumentation//` is the migrated package. - **Augment mode** — the package **already existed** and the migration only *adds* coverage mined from the upstream. The diff modifies a pre-existing package. @@ -71,7 +71,7 @@ below. - **Tables only when there are ≥1 problem rows.** Empty section: `_none_`. - **Read, don't guess.** Every claim from actual code or command output. Don't infer from package names or READMEs. Before listing something as - missing, grep for it in the port's `src/` and `tests/`. + missing, grep for it in the migrated package's `src/` and `tests/`. - **Snapshot of the PR head**, not aspirational state. ## Report structure @@ -81,7 +81,7 @@ First line: ```markdown # Migration review: -Mode: greenfield port | augment existing package +Mode: greenfield migration | augment existing package Compared against: - OpenInference: `openinference-instrumentation-` (add link) @@ -103,7 +103,7 @@ is generic. Do not collapse rows. For each upstream, walk the actual instrumentation code, not docs: -- **Method-level patching** (the shape of this port, and of any +- **Method-level patching** (the shape of this migration, and of any method-level upstream): read every `wrap_function_wrapper(...)` call in `_instrument()`. Each call = one row. - **Transport-level patching** (typical of OpenInference, e.g. @@ -117,15 +117,15 @@ For each upstream, walk the actual instrumentation code, not docs: decorator-based instrumentation): walk the actual entry points the upstream registers and list one row per emitted span site. -One column per upstream that exists, plus `This port`: +One column per upstream that exists, plus `This package`: -| API method | OpenInference | This port | Notes | +| API method | OpenInference | This package | Notes | |---|---|---|---| | `openai.resources.chat.completions.Completions.create` | ✅ | ✅ | | | `openai.resources.responses.Responses.create` | ✅ | ❌ | | | `openai.resources.beta.assistants.Assistants.create` | ✅ (generic span only) | ❌ | | -- ✅ = patched. ❌ = at least one upstream patches it, this port doesn't. +- ✅ = patched. ❌ = at least one upstream patches it, this migration doesn't. — = not patched (and that's expected — upstream doesn't patch it either). - Sort `❌` rows to the top. - For `❌` rows, the **Notes** cell must name the GenAI semconv operation @@ -136,7 +136,7 @@ One column per upstream that exists, plus `This port`: Do not render a separate `**Gaps:**` bullet list below the table — the table itself is the gap list. -**Augment mode.** Split `This port` into `Existed` and `Added this PR`: +**Augment mode.** Split `This package` into `Existed` and `Added this PR`: | API method | OpenInference | Existed | Added this PR | Notes | |---|---|---|---|---| @@ -149,7 +149,7 @@ gap; sort it to the top. ### 2. Gaps and open issues -Genuine **tooling/util gaps** — things the port couldn't do because +Genuine **tooling/util gaps** — things the migration couldn't do because `opentelemetry-util-genai` and/or the GenAI semconv doesn't yet support them (missing util-genai factory, attribute not in the registry yet). Reference failing/skipped tests if any. @@ -169,14 +169,14 @@ same emitted telemetry) are not listed. Look for: patching strategy separate); message format (flat OpenInference attributes vs JSON `parts` array); streaming wrapper shape; attributes set unconditionally vs gated on `is_recording()` / `should_capture_content()`; anything emitted by -upstream that this port deliberately drops without a one-line rationale. +upstream that this migration deliberately drops without a one-line rationale. -| Aspect | Upstream | This port | Notes | +| Aspect | Upstream | This package | Notes | |---|---|---|---| ### 3b. Consistency and old-vs-new coexistence (augment mode only) -**Greenfield port: skip.** Here the risk isn't "is it conformant" but "do +**Greenfield migration: skip.** Here the risk isn't "is it conformant" but "do the additions fit the package they landed in." Compare the **added** code against the **pre-existing** code (not the upstream); flag only real problems: @@ -221,12 +221,12 @@ For each unreferenced cassette, walk git history before naming a cause: 1. Check whether the same cassette is also unreferenced in the upstream it came from (for each unreferenced file, search the upstream's `tests/` for any reference). If yes, it's an **inherited upstream - orphan** — say so plainly; the port is not responsible. -2. If the cassette IS referenced upstream but not in the port, the port + orphan** — say so plainly; the migration is not responsible. +2. If the cassette IS referenced upstream but not in the migrated package, the migration dropped a test. That is **not** a §4b cosmetic note — it's a §4a missing variant (or a missing scenario, depending on what the test covered). Add it to §4a's table with the upstream test name in Notes, - and do not rationalize ("tests were renamed/removed during the port" + and do not rationalize ("tests were renamed/removed during the migration" without naming the commit is speculation, not analysis). Never write "appear to be" / "likely" / "safe to delete" without @@ -235,7 +235,7 @@ which case say "history not checked" so the reviewer does it. #### 4a. Unit-test matrix per wrapped method -For each method in §1 row where `This port` = ✅: +For each method in §1 row where `This package` = ✅: | Variant | Required when | |---|---| @@ -248,7 +248,7 @@ For each method in §1 row where `This port` = ✅: | **async streaming × happy** | the method accepts `stream=True` or returns an async stream wrapper | | **async streaming × error** | flag if streaming is supported but no error path is exercised at all or error is not recorded on telemetry | -Identify variants by reading the port's `src/` (`is_streaming(kwargs)`, +Identify variants by reading the migrated package's `src/` (`is_streaming(kwargs)`, async `def`, `Stream` / `AsyncStream` wrappers). | Wrapped method | Missing variants | Notes | @@ -256,7 +256,7 @@ async `def`, `Stream` / `AsyncStream` wrappers). #### 4b. Conformance scenarios -For each distinct GenAI semconv operation the port emits (`chat`, +For each distinct GenAI semconv operation the migrated package emits (`chat`, `embeddings`, `execute_tool`, `invoke_agent`, `invoke_workflow`, `create_agent`, …) there should be at least one happy-path scenario file under `tests/conformance/.py` driven by `run_conformance(...)`. More @@ -281,7 +281,7 @@ at the `openinference-instrumentation-` package name, links to an OpenInference URL). **Refactor misses.** One bullet *only if* unit tests re-implement generic -semconv shape checks inline instead of factoring them into the port's +semconv shape checks inline instead of factoring them into the migrated package's `tests/test_utils.py` helpers (e.g. `assert_all_attributes`, `assert_completion_attributes`, `assert_messages_attribute`). There is no shared `opentelemetry.test_util_genai.assertions` module — assert directly @@ -307,7 +307,7 @@ are none, render `_No follow-up issues recommended._` ## See also -- `.github/skills/port-from-openinference/SKILL.md` — the port skill; it +- `.github/skills/migrate-from-openinference/SKILL.md` — the migration skill; it runs this review at its final step to produce `MIGRATION_REPORT.md`. - `.github/skills/write-conformance-tests/SKILL.md` — authoring the conformance scenarios this report checks in §4b. diff --git a/.gitignore b/.gitignore index 631834e7..67ff44c6 100644 --- a/.gitignore +++ b/.gitignore @@ -74,5 +74,5 @@ policies/_schemas.rego # Conformance test weaver live-check reports weaver_reports/ -# Migration review reports generated by the review-ported skill +# Migration review reports generated by the review-migration skill MIGRATION_REPORT.md diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 05ce838e..f8857b2f 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -127,6 +127,20 @@ uv run tox -e changelog-preview If your change doesn't need an entry (pure docs/tooling), add the `Skip Changelog` label to the PR. +## Skills + +This repo ships skills (under `.github/skills/`) that automate the heavy, +repeatable contribution flows. Trigger them deliberately when +you start one of these tasks: + +- **`migrate-from-openinference`** — migrate an `openinference-instrumentation-*` + package into this repo as an OTel GenAI package, or augment an existing + package with the coverage OpenInference adds on top. +- **`review-migration`** — review a ported or augmented package against its + upstream implementation and write `MIGRATION_REPORT.md`. +- **`write-conformance-tests`** — author conformance scenarios and the + `test_conformance.py` runner for an instrumentation package. + ## Keep PRs small One logical change per PR. Don't bundle unrelated fixes, refactors, or From c92aad97a174902b59235b15cd2f4d7cf814aa88 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Tue, 16 Jun 2026 17:27:33 -0700 Subject: [PATCH 4/5] Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: Trask Stalnaker --- .github/skills/migrate-from-openinference/SKILL.md | 6 +++--- .github/skills/review-migration/SKILL.md | 2 +- CONTRIBUTING.md | 2 ++ 3 files changed, 6 insertions(+), 4 deletions(-) diff --git a/.github/skills/migrate-from-openinference/SKILL.md b/.github/skills/migrate-from-openinference/SKILL.md index 1a018461..9155c9d1 100644 --- a/.github/skills/migrate-from-openinference/SKILL.md +++ b/.github/skills/migrate-from-openinference/SKILL.md @@ -37,7 +37,7 @@ User specifies the source, e.g. `openinference-instrumentation-crewai`. and use `/tmp/openinference/python/instrumentation//` as the source path in step 1. -User may also provide ***target package name**. If not provided: derive it from the source name: +User may also provide the **target package name**. If not provided: derive it from the source name: - drop the leading `openinference-instrumentation-`. Remaining part should match the instrumented library name as it appears on PyPI. If it's not the case, flag it. - The target package name should be `opentelemetry-instrumentation-genai-` where `` is the instrumented library name (e.g. `openai`, `anthropic`, `bedrock`). For example: - `openinference-instrumentation-openai` → `opentelemetry-instrumentation-genai-openai` @@ -286,12 +286,12 @@ Update every import. Verify zero `openinference` references remain in `BaseInstrumentor`) and the underlying SDK (`openai`, `anthropic`, …) at the same range OpenInference was using. - `__version__` in `version.py` should equal the value in - `opentelemetry-util-genai/src/opentelemetry/util/genai/version.py` — all + `util/opentelemetry-util-genai/src/opentelemetry/util/genai/version.py` — all workspace packages share one version. Verify: ```sh diff <(grep ^__version__ instrumentation//src/opentelemetry/instrumentation/genai//version.py) \ - <(grep ^__version__ opentelemetry-util-genai/src/opentelemetry/util/genai/version.py) + <(grep ^__version__ util/opentelemetry-util-genai/src/opentelemetry/util/genai/version.py) ``` - Hatchling builds **require a `README.md` or `README.rst`**. Rewrite it to diff --git a/.github/skills/review-migration/SKILL.md b/.github/skills/review-migration/SKILL.md index 5a41ac76..fcac4d7a 100644 --- a/.github/skills/review-migration/SKILL.md +++ b/.github/skills/review-migration/SKILL.md @@ -265,7 +265,7 @@ scenarios per operation are fine but never required. | Operation | Scenario file | Status | |---|---|---| -Mention if conformance scenatio is skipped, there are expected_violations, +Mention if conformance scenario is skipped, there are expected_violations, or `uv run tox -e py312-test-instrumentation-genai-` fails. #### 4c. Docstring / README coverage diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index f8857b2f..24dfc10f 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -141,6 +141,8 @@ you start one of these tasks: - **`write-conformance-tests`** — author conformance scenarios and the `test_conformance.py` runner for an instrumentation package. +Please contribute back anything you learn while using the skills that could help improve them! + ## Keep PRs small One logical change per PR. Don't bundle unrelated fixes, refactors, or From 9a035d7ec5a8b4a722a53c24726bd1e61e4ff545 Mon Sep 17 00:00:00 2001 From: Liudmila Molkova Date: Wed, 17 Jun 2026 00:39:45 +0000 Subject: [PATCH 5/5] update links --- .github/skills/migrate-from-openinference/SKILL.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/.github/skills/migrate-from-openinference/SKILL.md b/.github/skills/migrate-from-openinference/SKILL.md index 9155c9d1..f409ef7d 100644 --- a/.github/skills/migrate-from-openinference/SKILL.md +++ b/.github/skills/migrate-from-openinference/SKILL.md @@ -29,7 +29,8 @@ corpus while filtering openinference-framework plumbing tests out (step 7). User specifies the source, e.g. `openinference-instrumentation-crewai`. -- **Source**: `https://github.com/open-telemetry/donation-openinference/tree/main/python/instrumentation//`. +- **Source**: It should point to one of the folders in + `https://github.com/open-telemetry/donation-openinference/tree/main/python/instrumentation/`. Fetch a fresh shallow clone if you don't already have one locally: ```sh git clone --depth=1 https://github.com/open-telemetry/donation-openinference.git /tmp/openinference @@ -113,15 +114,15 @@ with the migration flow below. ## Reference material - **OTel GenAI spans**: — authoritative attribute names, spans, logs, and metrics definitions. -- **OpenInference → OTel attribute mapping** (Arize-maintained): . Use as a quick lookup for what an OpenInference attribute *roughly* corresponds to in OTel; when the mapping disagrees with the official semconv, **the official semconv wins**. +- **OpenInference → OTel attribute mapping** (Arize-maintained): . Use as a quick lookup for what an OpenInference attribute *roughly* corresponds to in OTel; when the mapping disagrees with the official semconv, **the official semconv wins**. - **Message JSON schemas**: - input messages: - output messages: - system instructions: - - tool definitions: - - retrieval documents: + - tool definitions: + - retrieval documents: -- **Code for above models**: . +- **Code for above models**: . ## Non-negotiable rules @@ -406,7 +407,7 @@ right (all types from `opentelemetry.util.genai.types` unless noted): | Server-side tool call (web_search, file_search, code_interpreter, …) | `Message(parts=[ServerToolCall(name=…, server_tool_call=…, id=…)])` | | Server-side tool call result | `Message(parts=[ServerToolCallResponse(server_tool_call_response=…, id=…)])` | | Inline image / audio / video bytes | `Blob(mime_type=…, modality="image"\|"audio"\|"video", content=b"…")` | -| External media URL | `Uri(mime_type=…, modality=…, uri="https://…")` | +| External media URL | `Uri(mime_type=…, modality=…, uri="…")` | | File reference (e.g. OpenAI `file_id`) | `File(mime_type=…, modality=…, file_id="file-…")` | | Provider-specific item with no semconv mapping | `GenericPart(value=…)` — never silently drop. Flag those in the review report. |