[BOT ISSUE] Missing OpenAI Audio API instrumentation (transcription, speech, translation) #146

@braintrust-bot

Summary

The SDK instruments OpenAI chat completions, Responses API, and moderations for both the openai (official) and ruby-openai gems, but does not instrument any of the Audio APIs — transcription (speech-to-text), speech synthesis (text-to-speech), or translation. These are stable generative execution APIs that use AI models (Whisper for transcription/translation, TTS models for speech) and return structured results with usage metrics.

What is missing

openai gem (official)

Three resource classes under OpenAI::Resources::Audio:

  • client.audio.transcriptions.create — Speech-to-text using Whisper. Accepts audio file + model, returns text transcription with optional token/segment detail.
  • client.audio.speech.create — Text-to-speech using TTS models (tts-1, tts-1-hd). Accepts text input + voice + model, returns audio binary.
  • client.audio.translations.create — Audio translation to English using Whisper. Accepts audio file + model, returns translated text.

Source: lib/openai/resources/audio/ in openai/openai-ruby contains transcriptions.rb, speech.rb, translations.rb.
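
For reference, a usage sketch of the three surfaces (file names are illustrative; argument shapes follow the gem's documentation, but exact return types may vary by version):

```ruby
require "openai"
require "pathname"

client = OpenAI::Client.new(api_key: ENV["OPENAI_API_KEY"])

# Speech-to-text: audio file in, transcription object with .text out.
transcription = client.audio.transcriptions.create(
  file: Pathname("meeting.mp3"),
  model: "whisper-1"
)
puts transcription.text

# Text-to-speech: text in, binary audio out (nothing textual to log).
audio = client.audio.speech.create(
  model: "tts-1",
  input: "Hello from Braintrust!",
  voice: "alloy"
)

# Audio translation: non-English audio in, English text out.
translation = client.audio.translations.create(
  file: Pathname("interview_fr.mp3"),
  model: "whisper-1"
)
puts translation.text
```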

ruby-openai gem

Equivalent methods:

  • client.audio.transcribe(parameters: {...})
  • client.audio.speech(parameters: {...})
  • client.audio.translate(parameters: {...})

Source: Documented under the "Whisper" section in the ruby-openai README.
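
Usage, per the README (file names illustrative):

```ruby
require "openai"

client = OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"])

# Speech-to-text.
response = client.audio.transcribe(
  parameters: { model: "whisper-1", file: File.open("meeting.mp3", "rb") }
)
puts response["text"]

# Text-to-speech; the response is the raw audio bytes.
audio = client.audio.speech(
  parameters: { model: "tts-1", input: "Hello!", voice: "alloy" }
)
File.binwrite("hello.mp3", audio)

# Audio translation to English.
response = client.audio.translate(
  parameters: { model: "whisper-1", file: File.open("interview_fr.mp3", "rb") }
)
puts response["text"]
```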

Expected instrumentation

New patchers (similar to the existing ModerationsPatcher) for each audio surface:

Transcription spans should capture:

  • Input: audio file reference, language hint
  • Metadata: model (e.g., whisper-1), response_format, language, provider, endpoint
  • Metrics: duration (audio length), tokens if available
  • Output: transcription text

Speech spans should capture:

  • Input: text to synthesize
  • Metadata: model (e.g., tts-1), voice, speed, response_format, provider, endpoint
  • Output: audio format/size metadata (not the binary audio itself)

Translation spans should capture the same fields as transcription.
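
To make the shape concrete, here is a hypothetical sketch of a transcription patcher for the official gem, assuming the module-prepend pattern the existing patchers appear to use. The span helpers (`Braintrust.start_span`, `span.log`) and field names are assumptions for illustration, not the SDK's actual internals:

```ruby
module Braintrust
  module Contrib
    module OpenAI
      # Hypothetical instrumentation for OpenAI::Resources::Audio::Transcriptions.
      module AudioTranscriptionsInstrumentation
        def create(*args, **kwargs)
          # Assumed helper; the real SDK's span API may differ.
          span = Braintrust.start_span(
            name: "openai.audio.transcriptions.create",
            type: "llm"
          )
          result = super
          span.log(
            input: { file: kwargs[:file].to_s, language: kwargs[:language] },
            output: result.text,
            metadata: {
              model: kwargs[:model],
              response_format: kwargs[:response_format],
              provider: "openai",
              endpoint: "audio.transcriptions"
            }
          )
          result
        ensure
          span&.finish
        end
      end
    end
  end
end

# A TranscriptionsPatcher would then prepend the module, mirroring ModerationsPatcher:
# OpenAI::Resources::Audio::Transcriptions.prepend(
#   Braintrust::Contrib::OpenAI::AudioTranscriptionsInstrumentation
# )
```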

Braintrust docs status

not_found — Braintrust docs at https://www.braintrust.dev/docs/instrument/trace-llm-calls list both openai and ruby-openai as supported Ruby libraries but do not mention audio API instrumentation. All examples focus on chat completions.

Upstream sources

Local repo files inspected

  • lib/braintrust/contrib/openai/integration.rb — registers ChatPatcher, ResponsesPatcher, ModerationsPatcher only; no audio patchers
  • lib/braintrust/contrib/openai/instrumentation/ — contains chat.rb, responses.rb, moderations.rb, common.rb; no audio files
  • lib/braintrust/contrib/ruby_openai/integration.rb — registers ChatPatcher, ResponsesPatcher, ModerationsPatcher only; no audio patchers
  • lib/braintrust/contrib/ruby_openai/instrumentation/ — contains chat.rb, responses.rb, moderations.rb, common.rb; no audio files
  • A grep for audio, transcri, speech, or whisper across lib/braintrust/ returns zero matches
