
feat(speechsdk): add speech-sdk multi-provider TTS plugin #1754

Open
btpod wants to merge 1 commit into livekit:main from btpod:btpod/speechsdk-tts-plugin

btpod commented Jun 11, 2026


Disclosure up front: I work on speech-sdk (Apache 2.0). The integration is fully BYOK (bring your own key): it calls provider APIs with your users' own keys, and no account with us is needed.

Proposed in #1753 per the CONTRIBUTING "discuss first" guideline; opening the PR alongside so the diff is concrete. Happy to close either if this isn't a fit.

Summary

  • New optional @livekit/agents-plugin-speechsdk package: non-streaming TTS across 15 providers, where the model is selected with a single provider/model string.
  • Adds providers that agents-js currently has no dedicated TTS plugin for: Murf, Smallest.ai, fal.ai-hosted open-weight models (Kokoro, Orpheus, F5), and xAI TTS.
  • Makes provider evaluation a one-string change (for example, switching from elevenlabs/eleven_flash_v2_5 to cartesia/sonic-3) with no new dependency per vendor. For production streaming, the dedicated provider plugins remain the better choice, and the README says so.
  • Defaults to openai/gpt-4o-mini-tts, so anyone with OPENAI_API_KEY already set can use it immediately.
  • Existing plugins and defaults are untouched; the only shared-file changes are a README table row, a CLAUDE.md plugin-list entry, one turbo.json globalEnv entry, and the changeset.
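The default-model and key behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the plugin's code: the helper names and the KEY_SOURCES table are hypothetical (only OPENAI_API_KEY and MURF_API_KEY are named in this PR), and an explicit apiKey is assumed to take precedence over the env var. speech-sdk's real key lookup may differ.

```typescript
// Sketch only: DEFAULT_MODEL matches the documented default; the helpers are hypothetical.
const DEFAULT_MODEL = 'openai/gpt-4o-mini-tts';

// Hypothetical subset of provider key sources. Only the two env vars named in
// the PR are listed; speech-sdk's real table is not reproduced here.
const KEY_SOURCES: Record<string, { label: string; envVar: string }> = {
  openai: { label: 'OpenAI', envVar: 'OPENAI_API_KEY' },
  murf: { label: 'Murf', envVar: 'MURF_API_KEY' },
};

type Env = Record<string, string | undefined>;

// Fall back to the documented default when no model string is supplied.
function resolveModel(model?: string): string {
  return model ?? DEFAULT_MODEL;
}

// Assumption: an explicit apiKey option wins over the provider's env var.
function resolveApiKey(provider: string, env: Env, apiKey?: string): string {
  if (apiKey) return apiKey;
  const source = KEY_SOURCES[provider];
  if (!source) {
    throw new Error(`No key source configured for provider "${provider}"`);
  }
  const fromEnv = env[source.envVar];
  if (fromEnv) return fromEnv;
  // Fails locally, before any network call, with env-var guidance.
  throw new Error(
    `${source.label} API key is required. Pass it via apiKey option or set the ${source.envVar} environment variable`,
  );
}
```

In an agent, env would typically be process.env, so a deployment that already exports OPENAI_API_KEY resolves a key for the default model without extra configuration.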

Implementation notes

  • Mirrors the OpenAI TTS plugin's shape: a tts.TTS subclass with { streaming: false } and a ChunkedStream; stream() throws, as openai.TTS does (AgentSession automatically wraps non-streaming TTS in the sentence-level StreamAdapter).
  • Audio is requested from speech-sdk as raw PCM with no rate hint; the actual rate is read from the returned mediaType, and @livekit/rtc-node's AudioResampler resamples whenever a provider's native rate differs from the configured frame rate (default 24 kHz). This avoids per-provider sample-rate matrices: some providers are fixed at 48 kHz (Hume), some have no 24 kHz option (Resemble), and fal has no rate selection at all.
  • speech-sdk's internal retry is disabled (maxRetries: 0) so that the framework's ChunkedStream retry policy owns retries. speech-sdk errors map either to APIStatusError for HTTP errors (retryable on 408/429/5xx) or to a non-retryable APIError (for example, a missing key produces "OpenAI API key is required. Pass it via apiKey option or set the OPENAI_API_KEY environment variable").
  • Key resolution is delegated to speech-sdk: each provider's standard env var (OPENAI_API_KEY, MURF_API_KEY, ...) or an explicit apiKey option.
  • speech-sdk is ESM-only, so the plugin loads it via dynamic import(), which tsup preserves in the CJS build (the same pattern as the @huggingface/transformers import in the livekit plugin). Types come from a static import type, which is erased at compile time.
  • Optionally, setting SPEECHBASE_API_KEY routes the same provider/model strings through speechbase.ai, the hosted gateway we run, so one key covers every provider; without it, calls go directly to the provider. Direct is the default.
  • Model strings split on the first slash only, because fal model ids are path-style (fal-ai/kokoro/american-english). Unknown provider prefixes are rejected at construction with the supported list.
  • Dependency footprint: @speech-sdk/core plus four transitive runtime deps (mediabunny, its mp3 encoder, p-retry, and zod; zod was already in the lockfile). The pnpm-lock.yaml diff is purely additive (81 lines); the existing Renovate-pinned vitest resolutions are deliberately left untouched.
  • New files carry SPDX headers; config files are covered by the existing REUSE.toml annotations. A changeset is included per CLAUDE.md.
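The retry split in the implementation notes comes down to one status check. A minimal sketch, assuming a simplified error input shape: the classes below are local stand-ins named SketchAPIError and SketchAPIStatusError, not the real APIError/APIStatusError from @livekit/agents, and speech-sdk's actual error type is not modelled.

```typescript
// Local stand-ins for illustration; not @livekit/agents' error classes.
class SketchAPIError extends Error {
  readonly retryable: boolean;
  constructor(message: string, retryable: boolean) {
    super(message);
    this.name = 'SketchAPIError';
    this.retryable = retryable;
  }
}

class SketchAPIStatusError extends SketchAPIError {
  readonly statusCode: number;
  constructor(message: string, statusCode: number) {
    super(message, isRetryableStatus(statusCode));
    this.name = 'SketchAPIStatusError';
    this.statusCode = statusCode;
  }
}

// 408 (request timeout), 429 (rate limited) and any 5xx are worth retrying;
// other statuses such as 401 are not.
function isRetryableStatus(status: number): boolean {
  return status === 408 || status === 429 || (status >= 500 && status <= 599);
}

// Hypothetical input shape: an optional HTTP status on the underlying error.
function mapError(err: { message: string; status?: number }): SketchAPIError {
  if (err.status !== undefined) {
    return new SketchAPIStatusError(err.message, err.status);
  }
  // No HTTP status (e.g. a missing key detected locally): never retryable.
  return new SketchAPIError(err.message, false);
}
```

Keeping speech-sdk at maxRetries: 0 means this retryable flag is the only signal the framework's retry loop acts on, so a failing request is not retried by two layers at once.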

Test plan

  • pnpm build (all 37 turbo tasks), pnpm lint, pnpm format:check, pnpm throws:check, and reuse lint pass locally; pnpm install --frozen-lockfile --ignore-scripts (the CI install command) accepts the spliced lockfile. Re-verified on a clean checkout of current main before opening this PR.
  • Unit tests for model-string handling pass without keys: missing prefix rejected, unknown provider rejected with the supported list, path-style fal ids split on the first slash only.
  • Error paths verified against the built package: missing provider key produces a non-retryable APIError with env-var guidance before any network call; an invalid gateway key produces a real HTTP 401 mapped to APIStatusError (statusCode 401, retryable false).
  • Live round-trip via the shared harness passes (OPENAI_API_KEY=... npx vitest run plugins/speechsdk, 4/4): streams text through StreamAdapter, synthesizes openai/gpt-4o-mini-tts, and validates the transcript with OpenAI STT. Note: the test constructs the STT as new STT({ useRealtime: false, model: 'whisper-1' }); bare new STT() (as the sibling plugin tests use) now defaults to the realtime model and throws without a VAD when run with keys.
  • Resample path exercised against real audio: a 3.5 s utterance synthesized at the default 24 kHz emits frames at 24000 Hz, and with sampleRate: 16000 the AudioResampler branch emits frames at 16000 Hz with matching duration.
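The model-string rules covered by the unit tests (prefix required, unknown providers rejected with the supported list, split on the first slash only) can be sketched like this. Assumptions: parseModelString is a hypothetical name, SUPPORTED_PROVIDERS is an illustrative subset rather than the plugin's 15-provider list, and the "fal" prefix in front of the path-style fal id is assumed for illustration.

```typescript
// Hypothetical subset of providers; the "fal" prefix is an assumption.
const SUPPORTED_PROVIDERS: readonly string[] = ['openai', 'elevenlabs', 'cartesia', 'murf', 'fal'];

interface ParsedModel {
  provider: string;
  model: string;
}

function parseModelString(spec: string): ParsedModel {
  const slash = spec.indexOf('/');
  // Reject a missing prefix ("gpt-4o-mini-tts"), an empty prefix, or an empty model.
  if (slash <= 0 || slash === spec.length - 1) {
    throw new Error(`Model must be "provider/model", got "${spec}"`);
  }
  const provider = spec.slice(0, slash);
  // Everything after the FIRST slash is the model id, so path-style ids stay intact.
  const model = spec.slice(slash + 1);
  if (!SUPPORTED_PROVIDERS.includes(provider)) {
    throw new Error(`Unknown provider "${provider}". Supported: ${SUPPORTED_PROVIDERS.join(', ')}`);
  }
  return { provider, model };
}
```

Validating in the constructor path like this surfaces a typo in the provider prefix before any audio is requested.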

I'll maintain this integration and take responsibility for breakage in it. If this isn't a direction you want, totally fine, close both with no hard feelings.

🤖 Generated with Claude Code

New @livekit/agents-plugin-speechsdk package: non-streaming TTS across 15
providers through one provider/model string, including providers without a
dedicated plugin (Murf, Smallest.ai, fal.ai-hosted open-weight models).
Synthesis requests raw PCM and resamples to the configured frame rate with
AudioResampler when a provider's native rate differs. speech-sdk's internal
retry is disabled so the framework's ChunkedStream retry policy owns retries.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
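The resample branch described in the commit message is mostly bookkeeping: compare the provider's native rate with the configured frame rate, and expect the sample count to scale by the rate ratio while the duration stays fixed. A minimal sketch with hypothetical helper names; it does not call AudioResampler and only reproduces the arithmetic behind the 3.5 s check in the test plan (84,000 samples at 24 kHz become 56,000 at 16 kHz).

```typescript
// Hypothetical helpers; AudioResampler itself is not used here.
const DEFAULT_FRAME_RATE = 24_000; // Hz, the plugin's documented default

// Resample only when the provider's native rate differs from the target.
function needsResample(nativeRate: number, targetRate: number = DEFAULT_FRAME_RATE): boolean {
  return nativeRate !== targetRate;
}

// Per-channel sample count for a duration at a given rate.
function samplesFor(durationSec: number, rate: number): number {
  return Math.round(durationSec * rate);
}

function durationOf(samples: number, rate: number): number {
  return samples / rate;
}

// Ideal output length; a streaming resampler may differ by a few samples at the edges.
function resampledLength(inputSamples: number, fromRate: number, toRate: number): number {
  return Math.round((inputSamples * toRate) / fromRate);
}
```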

changeset-bot Bot commented Jun 11, 2026


🦋 Changeset detected

Latest commit: 96ee459

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 36 packages
Name Type
@livekit/agents-plugin-speechsdk Patch
@livekit/agents Patch
@livekit/agents-plugin-anam Patch
@livekit/agents-plugin-assemblyai Patch
@livekit/agents-plugin-baseten Patch
@livekit/agents-plugin-bey Patch
@livekit/agents-plugin-cartesia Patch
@livekit/agents-plugin-cerebras Patch
@livekit/agents-plugin-deepgram Patch
@livekit/agents-plugin-did Patch
@livekit/agents-plugin-elevenlabs Patch
@livekit/agents-plugin-fishaudio Patch
@livekit/agents-plugin-google Patch
@livekit/agents-plugin-hedra Patch
@livekit/agents-plugin-hume Patch
@livekit/agents-plugin-inworld Patch
@livekit/agents-plugin-lemonslice Patch
@livekit/agents-plugin-liveavatar Patch
@livekit/agents-plugin-livekit Patch
@livekit/agents-plugin-minimax Patch
@livekit/agents-plugin-mistral Patch
@livekit/agents-plugin-mistralai Patch
@livekit/agents-plugin-neuphonic Patch
@livekit/agents-plugin-openai Patch
@livekit/agents-plugin-perplexity Patch
@livekit/agents-plugin-phonic Patch
@livekit/agents-plugin-resemble Patch
@livekit/agents-plugin-rime Patch
@livekit/agents-plugin-runway Patch
@livekit/agents-plugin-sarvam Patch
@livekit/agents-plugin-silero Patch
@livekit/agents-plugin-soniox Patch
@livekit/agents-plugin-tavus Patch
@livekit/agents-plugin-trugen Patch
@livekit/agents-plugin-xai Patch
@livekit/agents-plugins-test Patch


@CLAassistant


CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

devin-ai-integration Bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

