[bot] Streaming aggregator and output types cannot represent or capture logprobs data #51

@braintrust-bot

Summary

The OutputChoice type has a logprobs field typed as Option<()> (always None), and the streaming aggregator's StreamChoice/StreamDelta structs do not parse the logprobs field from streaming chunks. When a Chat Completions request uses logprobs: true (with optional top_logprobs), the per-token log probability data is silently dropped from the aggregated span output.

What is missing

OpenAI Chat Completions streaming includes a logprobs field on each choice containing per-token log probabilities:

{"choices": [{"index": 0, "delta": {"content": "Hello"}, "logprobs": {"content": [{"token": "Hello", "logprob": -0.31725305, "bytes": [72,101,108,108,111], "top_logprobs": [{"token": "Hello", "logprob": -0.31725305, "bytes": [72,101,108,108,111]}, {"token": "Hi", "logprob": -1.3862944, "bytes": [72,105]}]}]}}]}

Currently in the SDK:

  1. OutputChoice (src/stream.rs:356-363) defines logprobs: Option<()> — the type () cannot hold any data; it is always serialized as null and always set to None in both the builder (line 434) and constructor (line 376)
  2. StreamChoice (src/stream.rs:658-664) does not have a logprobs field, so logprobs data from streaming chunks is discarded during deserialization
  3. aggregate() (src/stream.rs:727-807) has no logic to collect or merge per-token logprobs across chunks
  4. There are no logprobs types defined anywhere in the codebase (no LogprobContent, TopLogprob, etc.)
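The four gaps above could be closed along the following lines. This is a minimal sketch, not the SDK's actual design: the names `TopLogprob`, `TokenLogprob`, `ChoiceLogprobs`, and `merge_logprobs` are hypothetical, and the fields simply mirror the streaming chunk shown earlier. It uses only the standard library; in the SDK these structs would presumably also need `serde` derives so that `StreamChoice` can deserialize them, which is omitted here.

```rust
/// One alternative token considered at a position (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct TopLogprob {
    pub token: String,
    pub logprob: f64,
    /// UTF-8 bytes of the token; treated as optional in this sketch.
    pub bytes: Option<Vec<u8>>,
}

/// Log probability data for one generated token (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct TokenLogprob {
    pub token: String,
    pub logprob: f64,
    pub bytes: Option<Vec<u8>>,
    pub top_logprobs: Vec<TopLogprob>,
}

/// The `logprobs` object attached to a choice (hypothetical type).
/// Only the `content` array from the example chunk is modeled.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ChoiceLogprobs {
    pub content: Vec<TokenLogprob>,
}

/// Append the logprobs carried by one streaming chunk to the accumulator
/// for a choice. Chunks without logprobs leave the accumulator untouched,
/// so the result stays `None` when logprobs were never requested.
pub fn merge_logprobs(acc: &mut Option<ChoiceLogprobs>, chunk: Option<ChoiceLogprobs>) {
    if let Some(chunk) = chunk {
        acc.get_or_insert_with(ChoiceLogprobs::default)
            .content
            .extend(chunk.content);
    }
}
```

With something like this in place, `OutputChoice.logprobs` could be typed as `Option<ChoiceLogprobs>` instead of `Option<()>`, and `aggregate()` could call the merge step once per chunk for each choice index.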

This means when users request logprobs for:

  • Confidence scoring — measuring model certainty on classifications or extractions
  • Calibration — evaluating whether model-reported probabilities match actual correctness rates
  • Content filtering confidence — assessing how confidently the model produced specific tokens
  • Token-level analysis — debugging model behavior at the token level

…the logprobs data is present in the stream but lost from the Braintrust span.
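To illustrate what is lost, here is a rough sketch of the kind of confidence score a user might compute from per-token logprobs if the span retained them. The function name `mean_token_probability` is hypothetical, and using the geometric mean of token probabilities is only one possible choice of score.

```rust
/// Geometric mean of per-token probabilities, computed from log probabilities.
/// Returns `None` for an empty slice, since no score is defined in that case.
pub fn mean_token_probability(logprobs: &[f64]) -> Option<f64> {
    if logprobs.is_empty() {
        return None;
    }
    // Averaging in log space and exponentiating gives the geometric mean.
    let mean_logprob = logprobs.iter().sum::<f64>() / logprobs.len() as f64;
    Some(mean_logprob.exp())
}
```

A score near 1.0 means the model assigned high probability to every token it emitted; a low score flags outputs worth reviewing.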

Braintrust docs status

Unclear. Braintrust's OpenAI integration page does not mention logprobs specifically. Other Braintrust SDKs (TypeScript, Python) capture logprobs as part of the full response object via their wrapOpenAI/wrap_openai wrappers, but there is no explicit documentation about logprobs tracing support.

Upstream sources

Relationship to existing issues

Local files inspected

  • src/stream.rs:356-363: OutputChoice struct has logprobs: Option<()> (cannot hold data)
  • src/stream.rs:376: OutputChoice::new() hardcodes logprobs: None
  • src/stream.rs:434: OutputChoiceBuilder::build() hardcodes logprobs: None
  • src/stream.rs:658-664: StreamChoice struct has no logprobs field
  • src/stream.rs:727-807: aggregate() has no logprobs collection logic
  • Full codebase grep for logprobs — only hits are the Option<()> placeholder in OutputChoice
