feat(speechsdk): add speech-sdk multi-provider TTS plugin by btpod · Pull Request #1754 · livekit/agents-js

btpod · 2026-06-11T03:43:47Z

Disclosure up front: I work on speech-sdk (Apache 2.0). The integration runs fully BYOK against provider APIs with your users' own keys; no account with us is needed.

Proposed in #1753 per the CONTRIBUTING "discuss first" guideline; opening the PR alongside so the diff is concrete. Happy to close either if this isn't a fit.

Summary

New optional @livekit/agents-plugin-speechsdk package: non-streaming TTS where the model is one provider/model string across 15 providers.
Adds providers agents-js has no dedicated TTS plugin for today: Murf, Smallest.ai, fal.ai-hosted open-weight models (Kokoro, Orpheus, F5), and xAI TTS.
Makes provider evaluation a one-string change (elevenlabs/eleven_flash_v2_5 to cartesia/sonic-3) with no new dependency per vendor; for production streaming, the dedicated provider plugins remain the better choice and the README says so.
Defaults to openai/gpt-4o-mini-tts, so anyone with OPENAI_API_KEY already set can use it immediately.
Existing plugins and defaults are untouched; the only shared-file changes are a README table row, a CLAUDE.md plugin-list entry, one turbo.json globalEnv entry, and the changeset.

Implementation notes

Mirrors the OpenAI TTS plugin's shape: a tts.TTS subclass with { streaming: false }, a ChunkedStream, and a stream() that throws, as in openai.TTS (AgentSession wraps non-streaming TTS in the sentence-level StreamAdapter automatically).
Audio is requested from speech-sdk as raw PCM with no rate hint. The actual rate is read from the returned mediaType, and @livekit/rtc-node's AudioResampler resamples when a provider's native rate differs from the configured frame rate (default 24 kHz). This avoids per-provider sample-rate matrices: some providers are fixed at 48 kHz (Hume), some have no 24 kHz option (Resemble), and fal has no rate selection at all.
speech-sdk's internal retry is disabled (maxRetries: 0) so the framework's ChunkedStream retry policy owns retries. speech-sdk errors map to APIStatusError (HTTP errors, retryable on 408/429/5xx) or non-retryable APIError (for example a missing key produces "OpenAI API key is required. Pass it via apiKey option or set the OPENAI_API_KEY environment variable").
Key resolution is delegated to speech-sdk: each provider's standard env var (OPENAI_API_KEY, MURF_API_KEY, ...) or an explicit apiKey option.
speech-sdk is ESM-only; the plugin imports it via dynamic import(), which tsup preserves in the CJS build (same pattern as the @huggingface/transformers import in the livekit plugin). Types come from static import type, which is erased.
Optionally, setting SPEECHBASE_API_KEY routes the same provider/model strings through speechbase.ai, the hosted gateway we run, so one key covers every provider; without it, calls go directly to the provider. Direct is the default.
Model strings split on the first slash only, because fal model ids are path-style (fal-ai/kokoro/american-english). Unknown provider prefixes are rejected at construction with the supported list.
Dependency footprint: @speech-sdk/core plus four transitive runtime deps (mediabunny, its MP3 encoder, p-retry, and zod; zod was already in the lockfile). The pnpm-lock.yaml diff is purely additive (81 lines); the existing Renovate-pinned vitest resolutions are deliberately left untouched.
New files carry SPDX headers; config files are covered by the existing REUSE.toml annotations. A changeset is included per CLAUDE.md.
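For reviewers who want the model-string rule in concrete form, here is a minimal sketch of first-slash splitting with provider validation. It is not the plugin's code: parseModelString, ParsedModel, and the SUPPORTED_PROVIDERS list are hypothetical names, the list is a partial illustrative subset, and the `fal` prefix in the example is an assumption about how fal ids are prefixed.

```typescript
// Hypothetical, partial provider list for illustration only.
// The `fal` prefix is an assumption; the real plugin supports 15 providers.
const SUPPORTED_PROVIDERS = ['openai', 'elevenlabs', 'cartesia', 'fal'];

interface ParsedModel {
  provider: string;
  model: string;
}

// Split on the FIRST slash only, so path-style model ids keep their
// remaining slashes (e.g. "fal-ai/kokoro/american-english").
function parseModelString(input: string): ParsedModel {
  const slash = input.indexOf('/');
  if (slash <= 0 || slash === input.length - 1) {
    throw new Error(`Expected "provider/model", got "${input}"`);
  }
  const provider = input.slice(0, slash);
  const model = input.slice(slash + 1);
  // Reject unknown prefixes up front and list what is supported.
  if (!SUPPORTED_PROVIDERS.some((p) => p === provider)) {
    throw new Error(
      `Unknown provider "${provider}". Supported: ${SUPPORTED_PROVIDERS.join(', ')}`,
    );
  }
  return { provider, model };
}
```

Under these assumptions, parseModelString('openai/gpt-4o-mini-tts') yields provider `openai` and model `gpt-4o-mini-tts`, while a string with no slash or an unlisted prefix throws before any network call.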

Test plan

pnpm build (all 37 turbo tasks), pnpm lint, pnpm format:check, pnpm throws:check, and reuse lint pass locally; pnpm install --frozen-lockfile --ignore-scripts (the CI install command) accepts the spliced lockfile. Re-verified on a clean checkout of current main before opening this PR.
Unit tests for model-string handling pass without keys: missing prefix rejected, unknown provider rejected with the supported list, path-style fal ids split on the first slash only.
Error paths verified against the built package: missing provider key produces a non-retryable APIError with env-var guidance before any network call; an invalid gateway key produces a real HTTP 401 mapped to APIStatusError (statusCode 401, retryable false).
Live round-trip via the shared harness passes (OPENAI_API_KEY=... npx vitest run plugins/speechsdk, 4/4): streams text through StreamAdapter, synthesizes openai/gpt-4o-mini-tts, and validates the transcript with OpenAI STT. Note: the test constructs the STT as new STT({ useRealtime: false, model: 'whisper-1' }); bare new STT() (as the sibling plugin tests use) now defaults to the realtime model and throws without a VAD when run with keys.
Resample path exercised against real audio: a 3.5 s utterance synthesized at the default 24 kHz emits frames at 24000 Hz, and with sampleRate: 16000 the AudioResampler branch emits frames at 16000 Hz with matching duration.
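The error-path results above follow the retry rule stated in the implementation notes (retryable on 408/429/5xx). As a rough sketch of that classification, assuming nothing about the framework's real classes: MappedError, isRetryableStatus, and mapError are hypothetical names, and the `kind` strings only label which framework error the PR says each case maps to.

```typescript
// Hypothetical result shape; the real plugin throws the framework's
// APIStatusError / APIError classes rather than returning plain objects.
interface MappedError {
  kind: 'APIStatusError' | 'APIError';
  retryable: boolean;
  statusCode?: number;
  message: string;
}

// Retryable HTTP statuses as described in the PR: 408, 429, and any 5xx.
function isRetryableStatus(status: number): boolean {
  return status === 408 || status === 429 || (status >= 500 && status <= 599);
}

// Errors without an HTTP status (for example a missing API key detected
// before any network call) are treated as non-retryable.
function mapError(message: string, statusCode?: number): MappedError {
  if (statusCode === undefined) {
    return { kind: 'APIError', retryable: false, message };
  }
  return {
    kind: 'APIStatusError',
    retryable: isRetryableStatus(statusCode),
    statusCode,
    message,
  };
}
```

With this rule, a 401 from an invalid gateway key maps to a status error with retryable false, matching the result reported in the test plan, while a 429 or 503 would be left to the framework's ChunkedStream retry policy.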

I'll maintain this integration and take responsibility for breakage in it. If this isn't a direction you want, totally fine, close both with no hard feelings.

🤖 Generated with Claude Code

New @livekit/agents-plugin-speechsdk package: non-streaming TTS across 15 providers through one provider/model string, including providers without a dedicated plugin (Murf, Smallest.ai, fal.ai-hosted open-weight models). Synthesis requests raw PCM and resamples to the configured frame rate with AudioResampler when a provider's native rate differs. speech-sdk's internal retry is disabled so the framework's ChunkedStream retry policy owns retries.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

changeset-bot · 2026-06-11T03:43:52Z

🦋 Changeset detected

Latest commit: 96ee459

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 36 packages

Name	Type
@livekit/agents-plugin-speechsdk	Patch
@livekit/agents	Patch
@livekit/agents-plugin-anam	Patch
@livekit/agents-plugin-assemblyai	Patch
@livekit/agents-plugin-baseten	Patch
@livekit/agents-plugin-bey	Patch
@livekit/agents-plugin-cartesia	Patch
@livekit/agents-plugin-cerebras	Patch
@livekit/agents-plugin-deepgram	Patch
@livekit/agents-plugin-did	Patch
@livekit/agents-plugin-elevenlabs	Patch
@livekit/agents-plugin-fishaudio	Patch
@livekit/agents-plugin-google	Patch
@livekit/agents-plugin-hedra	Patch
@livekit/agents-plugin-hume	Patch
@livekit/agents-plugin-inworld	Patch
@livekit/agents-plugin-lemonslice	Patch
@livekit/agents-plugin-liveavatar	Patch
@livekit/agents-plugin-livekit	Patch
@livekit/agents-plugin-minimax	Patch
@livekit/agents-plugin-mistral	Patch
@livekit/agents-plugin-mistralai	Patch
@livekit/agents-plugin-neuphonic	Patch
@livekit/agents-plugin-openai	Patch
@livekit/agents-plugin-perplexity	Patch
@livekit/agents-plugin-phonic	Patch
@livekit/agents-plugin-resemble	Patch
@livekit/agents-plugin-rime	Patch
@livekit/agents-plugin-runway	Patch
@livekit/agents-plugin-sarvam	Patch
@livekit/agents-plugin-silero	Patch
@livekit/agents-plugin-soniox	Patch
@livekit/agents-plugin-tavus	Patch
@livekit/agents-plugin-trugen	Patch
@livekit/agents-plugin-xai	Patch
@livekit/agents-plugins-test	Patch


CLAassistant · 2026-06-11T03:43:56Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

btpod mentioned this pull request Jun 11, 2026

Proposal: speech-sdk TTS plugin (Murf, Smallest.ai, fal.ai-hosted open-weight models, one-string provider switching) #1753

Open

devin-ai-integration Bot reviewed Jun 11, 2026

btpod wants to merge 1 commit into livekit:main from btpod:btpod/speechsdk-tts-plugin

2 participants
