feat(datadog): support per-request ml_app override via metadata #25684

Open
liranddd wants to merge 1 commit into BerriAI:main from liranddd:feat/per-request-ml-app-override

Conversation


@liranddd liranddd commented Apr 14, 2026

Summary

  • Allow callers to pass ml_app in request metadata to control the Application column in Datadog LLM Observability
  • Spans are grouped by ml_app at flush time and sent as separate batches (Datadog intake API requires ml_app at batch level)
  • Spans without an override fall back to DD_SERVICE — fully backwards compatible
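The flush-time grouping described above can be sketched roughly as follows (a minimal illustration; the helper name `group_spans_by_ml_app` is hypothetical, though the `_dd_ml_app` hint field is the one the PR adds):

```python
from collections import defaultdict


def group_spans_by_ml_app(log_queue, default_ml_app):
    """Group queued span payloads by their per-request ml_app override.

    Spans carry an internal "_dd_ml_app" hint; spans without one fall
    into the default group. The hint is stripped from the copies that
    are sent, so queue entries are never mutated.
    """
    groups = defaultdict(list)
    for payload in log_queue:
        ml_app = payload.get("_dd_ml_app") or default_ml_app
        # Build a clean copy without the internal routing hint.
        clean = {k: v for k, v in payload.items() if k != "_dd_ml_app"}
        groups[ml_app].append(clean)
    return dict(groups)
```

Each resulting group would then be POSTed as its own batch, since the Datadog intake API takes ml_app at the batch level.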

Closes #20701

Motivation

When multiple services share a single LiteLLM proxy, all LLM traces appear under the same application in Datadog LLM Observability. There is currently no way to distinguish which service made the call. This PR lets callers tag their requests so they appear as distinct applications.

Usage

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    metadata={"ml_app": "my-service-name"}
)

Result in Datadog LLM Obs

Service          Application
litellm-server   my-service-name
litellm-server   other-service
litellm-server   litellm-server (default, no override)

Changes

  • litellm/integrations/datadog/datadog_llm_obs.py — read ml_app from metadata, group batches by it
  • litellm/types/integrations/datadog_llm_obs.py — add internal _dd_ml_app field (stripped before send)
  • tests/test_litellm/integrations/datadog/test_per_request_ml_app.py — 5 new tests

@vercel
vercel bot commented Apr 14, 2026

litellm preview deployment: Ready (Apr 14, 2026 9:30am UTC)


CLAassistant commented Apr 14, 2026

CLA assistant check
All committers have signed the CLA.


codspeed-hq bot commented Apr 14, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing liranddd:feat/per-request-ml-app-override (4a55027) with main (e64d98f)



greptile-apps bot commented Apr 14, 2026

Greptile Summary

This PR adds per-request ml_app override support for Datadog LLM Observability, allowing multiple services sharing one LiteLLM proxy to appear as distinct applications. Spans are grouped by ml_app at flush time and sent as separate batches; spans without an override fall back to DD_LLMOBS_ML_APP / DD_SERVICE.

  • Two imports (get_datadog_ml_app, safe_dumps) are placed inline inside async_send_batch instead of at module level — a minor style violation per the project's CLAUDE.md guidelines.
  • With multi-group flushing, a partial-success scenario is now reachable: if group A's POST succeeds but group B's raises, log_queue.clear() is skipped and group A's spans will be re-sent on the next flush, producing duplicates in Datadog.
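One way to close that partial-success gap (a sketch of a possible fix, not what the PR implements) is to retain only the groups whose POST failed, instead of keeping or clearing the whole queue:

```python
import asyncio


async def flush_groups(groups, post_batch):
    """Send one batch per ml_app; keep only failed groups for retry.

    `post_batch` is an async callable (ml_app, spans) -> None that
    raises on HTTP errors (an illustrative signature, not LiteLLM's
    actual internal API).
    """
    failed = {}
    for ml_app, spans in groups.items():
        try:
            await post_batch(ml_app, spans)
        except Exception:
            # Retain this group for the next flush; successfully
            # sent groups are dropped, so no duplicates are re-sent.
            failed[ml_app] = spans
    return failed
```

With this shape, a transient error on group B no longer causes group A's already-accepted spans to be re-sent on the next flush.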

Confidence Score: 5/5

  • Safe to merge; all findings are P2 style/quality suggestions that do not block correct operation.
  • The core grouping logic is correct, the previous reviewer's mutation concern is addressed with clean copies, tests are thorough and mock-only, and the change is fully backwards-compatible. Remaining findings are a style violation (inline imports) and a non-critical partial-success edge case that only produces duplicate data under transient errors with ≥2 ml_app values.
  • litellm/integrations/datadog/datadog_llm_obs.py — async_send_batch multi-group partial-success and inline imports.

Important Files Changed

  • litellm/integrations/datadog/datadog_llm_obs.py — Core change: groups log_queue by _dd_ml_app and sends a separate batch per ml_app. The clean-copy approach correctly avoids mutating queue entries. Two inline imports violate project style; multi-group partial success can cause duplicate spans on retry.
  • litellm/integrations/datadog/datadog_handler.py — Adds a get_datadog_ml_app() helper that reads DD_LLMOBS_ML_APP and falls back to get_datadog_service(). Clean, backwards-compatible addition.
  • litellm/types/integrations/datadog_llm_obs.py — Adds an optional _dd_ml_app field to the LLMObsPayload TypedDict as an internal routing hint, documented as stripped before sending. Clean change.
  • tests/test_litellm/integrations/datadog/test_per_request_ml_app.py — 5 new mock-only tests covering the payload field, absence without an override, the env-var default, multi-app grouping, failure-keeps-queue, and success-clears-queue. All mock the HTTP client; no real network calls. Good coverage.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant LiteLLM
    participant DDLLMObsLogger
    participant DatadogAPI

    Caller->>LiteLLM: "completion(model, messages, metadata={ml_app: 'svc-a'})"
    LiteLLM->>DDLLMObsLogger: async_log_success_event(kwargs)
    DDLLMObsLogger->>DDLLMObsLogger: "create_llm_obs_payload()<br/>reads metadata.ml_app → stores _dd_ml_app='svc-a'"
    DDLLMObsLogger->>DDLLMObsLogger: log_queue.append(payload)

    Note over DDLLMObsLogger: On flush (batch_size or periodic)

    DDLLMObsLogger->>DDLLMObsLogger: "async_send_batch()<br/>group spans by _dd_ml_app"

    DDLLMObsLogger->>DatadogAPI: "POST /api/intake/llm-obs/v1/trace/spans<br/>ml_app='svc-a', spans=[…stripped of _dd_ml_app]"
    DatadogAPI-->>DDLLMObsLogger: 202 Accepted

    DDLLMObsLogger->>DatadogAPI: "POST /api/intake/llm-obs/v1/trace/spans<br/>ml_app='svc-b', spans=[…]"
    DatadogAPI-->>DDLLMObsLogger: 202 Accepted

    DDLLMObsLogger->>DatadogAPI: "POST /api/intake/llm-obs/v1/trace/spans<br/>ml_app=DD_LLMOBS_ML_APP (default), spans=[…]"
    DatadogAPI-->>DDLLMObsLogger: 202 Accepted

    DDLLMObsLogger->>DDLLMObsLogger: log_queue.clear()



codecov bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 92.30769% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
litellm/integrations/datadog/datadog_llm_obs.py 91.30% 2 Missing ⚠️


Commit message (4a55027):
Allow callers to pass ml_app in request metadata to control the
Application column in Datadog LLM Observability. Also adds support
for the DD_LLMOBS_ML_APP env var.

Fallback chain: metadata.ml_app → DD_LLMOBS_ML_APP → DD_SERVICE.

Closes BerriAI#20701
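The fallback chain in the commit message can be expressed as a small resolver (a sketch under the assumption that the env vars are read directly; the function name `resolve_ml_app` is illustrative — the PR's actual helper is `get_datadog_ml_app()` in datadog_handler.py):

```python
import os


def resolve_ml_app(metadata, default="litellm-server"):
    """Resolve the ml_app for a span.

    Order: request metadata.ml_app, then the DD_LLMOBS_ML_APP env
    var, then DD_SERVICE, then a hardcoded default (illustrative).
    """
    return (
        (metadata or {}).get("ml_app")
        or os.environ.get("DD_LLMOBS_ML_APP")
        or os.environ.get("DD_SERVICE")
        or default
    )
```

A request carrying `metadata={"ml_app": "svc-a"}` wins over both env vars, which matches the "fully backwards compatible" claim: deployments that set neither metadata nor DD_LLMOBS_ML_APP keep their current DD_SERVICE-based grouping.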


Development

Successfully merging this pull request may close these issues.

[Feature]: Support DD_LLMOBS_ML_APP env var for Datadog LLM Observability

3 participants