
Add Speechmatics as alternative speech provider#78

Closed
claudio-pi wants to merge 2 commits into main from feature/speechmatics-provider

Conversation

@claudio-pi
Collaborator

Summary

  • Adds Speechmatics as a configurable alternative to ElevenLabs for TTS and STT
  • New SPEECH_PROVIDER config option in service.env (default: elevenlabs, alternative: speechmatics)
  • Speechmatics TTS uses the preview endpoint (preview.tts.speechmatics.com) with WAV output
  • Speechmatics STT uses the batch API (asr.api.speechmatics.com/v2) with async job submission, polling, and plain-text transcript retrieval
  • Provider dispatch in handlers.py selects the appropriate TTS/STT functions based on config

New config variables (in service.env)

| Variable | Default | Description |
| --- | --- | --- |
| `SPEECH_PROVIDER` | `elevenlabs` | Speech provider: `elevenlabs` or `speechmatics` |
| `SPEECHMATICS_API_KEY` | (empty) | Speechmatics API key |
| `SPEECHMATICS_VOICE_ID` | `sarah` | Voice: `sarah`, `theo`, `megan`, `jack` |
| `SPEECHMATICS_STT_REGION` | `eu1` | STT API region: `eu1`, `us1`, `au1` |
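For example, switching a deployment to Speechmatics would look something like this in service.env (the API key value is a placeholder):

```
# service.env
SPEECH_PROVIDER=speechmatics
SPEECHMATICS_API_KEY=<your-api-key>
SPEECHMATICS_VOICE_ID=sarah
SPEECHMATICS_STT_REGION=eu1
```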

Files changed

  • New: lib/speechmatics.py — TTS and STT module (stdlib only)
  • New: tests/test_speechmatics.py — 33 tests
  • Modified: lib/config.py — Added Speechmatics config fields
  • Modified: lib/handlers.py — Provider dispatch functions
  • Modified: tests/test_handlers.py — Updated mocks for dispatch layer
  • Modified: CLAUDE.md — Documentation updates

Test plan

  • All 673 tests pass (640 existing + 33 new)
  • Speechmatics TTS API tested with real API key — returns valid WAV audio
  • Speechmatics STT batch API tested end-to-end — submit job, poll, get transcript
  • Verify ElevenLabs path still works unchanged (default config)
  • Test switching SPEECH_PROVIDER=speechmatics in production

🤖 Generated with Claude Code

Adds configurable speech provider selection via SPEECH_PROVIDER env var
(default: "elevenlabs"). When set to "speechmatics", voice transcription
uses the batch STT API and TTS uses the streaming WAV endpoint.

New files:
- lib/speechmatics.py: TTS (WAV output) and STT (async batch job API)
- tests/test_speechmatics.py: 33 tests covering TTS, STT, polling, validation

Modified:
- lib/config.py: SPEECH_PROVIDER, SPEECHMATICS_API_KEY, SPEECHMATICS_VOICE_ID,
  SPEECHMATICS_STT_REGION in BotConfig and ClaudioConfig
- lib/handlers.py: provider dispatch via _stt_transcribe(), _tts_convert(),
  _get_speech_api_key() — selects provider based on config
- tests/test_handlers.py: updated mocks for new dispatch functions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gemini-code-assist

Summary of Changes

Hello @claudio-pi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly!

This pull request enhances the system's voice processing by integrating Speechmatics as a configurable alternative to ElevenLabs for both text-to-speech and speech-to-text, letting users choose a speech provider to suit their preferences or requirements. The changes involve core logic updates, new API integrations, and configuration management to ensure seamless operation.

Highlights

  • Speechmatics Integration: Added Speechmatics as an alternative provider for both Text-to-Speech (TTS) and Speech-to-Text (STT) functionalities.
  • Configurable Speech Provider: Introduced a new SPEECH_PROVIDER configuration option, allowing users to switch between elevenlabs (default) and speechmatics.
  • Speechmatics TTS Implementation: Implemented Speechmatics TTS using its preview endpoint, generating WAV audio output, and including markdown stripping and text truncation.
  • Speechmatics STT Implementation: Implemented Speechmatics STT using its asynchronous batch API, which involves submitting transcription jobs, polling for completion, and retrieving plain-text transcripts.
  • Centralized Speech Dispatch: Refactored lib/handlers.py to include dispatch functions (_stt_transcribe, _tts_convert) that dynamically select the appropriate speech provider based on the configuration.
  • New Configuration Variables: Added SPEECHMATICS_API_KEY, SPEECHMATICS_VOICE_ID, and SPEECHMATICS_STT_REGION to the system's configuration.
  • Comprehensive Testing: Included a new dedicated test file (tests/test_speechmatics.py) with 33 new tests for Speechmatics functionalities and updated existing handler tests to reflect the new dispatch layer.
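The submit/poll/fetch STT workflow highlighted above can be sketched as a small polling loop. This is an illustration only: the status fetcher is injected so the loop can run offline, and the status strings and timing defaults are assumptions, not taken from the PR.

```python
import time

def wait_for_job(fetch_status, deadline_s=60.0, poll_interval_s=2.0):
    """Poll a batch STT job until it finishes or the deadline passes.

    fetch_status() stands in for a GET against the batch API
    (asr.api.speechmatics.com/v2) that returns the job's status string.
    """
    deadline = time.monotonic() + deadline_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status == "done":
            return True              # transcript can now be fetched
        if status == "rejected":
            return False             # terminal failure, stop polling
        time.sleep(poll_interval_s)  # still running: wait and retry
    return False                     # deadline exceeded
```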
Changelog
  • CLAUDE.md
    • Updated the description of lib/handlers.py to reflect the new speech provider dispatch logic.
    • Added lib/speechmatics.py to the list of core modules.
    • Adjusted the total test count from 640 to 673 and included speechmatics in the list of covered test modules.
  • lib/config.py
    • Added speech_provider to the BotConfig class attributes.
    • Included speechmatics_api_key, speechmatics_voice_id, and speechmatics_stt_region in BotConfig attributes and their initialization.
    • Integrated new Speechmatics-related environment variables (SPEECH_PROVIDER, SPEECHMATICS_API_KEY, SPEECHMATICS_VOICE_ID, SPEECHMATICS_STT_REGION) into ClaudioConfig's managed keys and default service environment.
  • lib/handlers.py
    • Renamed imports for ElevenLabs TTS/STT functions and added imports for Speechmatics TTS/STT functions, aliasing them to prevent naming conflicts.
    • Introduced _get_speech_api_key, _stt_transcribe, and _tts_convert functions to dispatch speech processing requests to the configured provider.
    • Modified _process_message to use the new speech dispatch functions for voice transcription and API key validation.
    • Updated _deliver_voice_response to utilize the new _tts_convert dispatch function and dynamically set the temporary audio file extension based on the selected speech provider.
  • lib/speechmatics.py
    • Added a new module implementing Speechmatics Text-to-Speech (tts_convert) and Speech-to-Text (stt_transcribe) functionalities.
    • Implemented tts_convert to interact with the Speechmatics preview TTS endpoint, handling text processing, API requests, and WAV output validation.
    • Implemented stt_transcribe to manage the Speechmatics batch STT API workflow, including job submission, status polling, and transcript retrieval.
    • Included helper functions for WAV magic byte validation, job submission, job status polling, transcript fetching, and safe file deletion.
  • tests/test_handlers.py
    • Updated mock paths for stt_transcribe and tts_convert to target the new dispatch functions (_stt_transcribe, _tts_convert) within lib.handlers.
  • tests/test_speechmatics.py
    • Added a new test file containing extensive unit tests for the lib/speechmatics.py module.
    • Included tests for tts_convert covering success, API errors, URL errors, text truncation, invalid inputs, and non-audio responses.
    • Provided tests for stt_transcribe covering success, submission errors, file size validation, missing API keys, invalid regions, empty transcriptions, polling logic, and multipart request verification.
    • Added tests for _validate_wav_magic to ensure correct WAV file identification.
    • Included tests for _wait_for_job and _get_transcript to verify job polling and transcript retrieval mechanisms.
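The WAV magic byte validation mentioned in the changelog amounts to checking the RIFF container header. A sketch (the real `_validate_wav_magic` in lib/speechmatics.py may differ in detail):

```python
def validate_wav_magic(data: bytes) -> bool:
    """True if data starts with a RIFF/WAVE header.

    A WAV file begins with b'RIFF', a 4-byte little-endian chunk size,
    then b'WAVE' at offset 8.
    """
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"
```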


@cubic-dev-ai cubic-dev-ai bot left a comment


3 issues found across 6 files

Confidence score: 3/5

  • Telegram voice replies may fail because send_voice doesn’t accept WAV files, so Speechmatics responses may not reach Telegram users until a supported format is used.
  • Unbounded TTS response reads in lib/speechmatics.py could allow excessive memory use on malformed/large responses, which is a moderate stability risk.
  • Score reflects a couple of medium‑severity, user‑impacting issues but no critical blockers reported.
  • Pay close attention to lib/handlers.py and lib/speechmatics.py - media format compatibility and response/file handling safeguards.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="lib/speechmatics.py">

<violation number="1" location="lib/speechmatics.py:102">
P2: Cap the TTS response read to a maximum size so a malformed/large response can’t consume unbounded memory.

(Based on your team's feedback about capping HTTP response reads for downloaded media.) [FEEDBACK_USED]</violation>

<violation number="2" location="lib/speechmatics.py:123">
P2: Create the TTS output file with restrictive permissions (0o600) to avoid leaking voice content to other local users.

(Based on your team's feedback about creating downloaded media files with restrictive permissions.) [FEEDBACK_USED]</violation>
</file>

<file name="lib/handlers.py">

<violation number="1" location="lib/handlers.py:831">
P2: Telegram `send_voice` does not accept WAV files, so Speechmatics voice replies will fail for Telegram. Use a supported format (e.g., request MP3/OGG from Speechmatics or transcode before calling send_voice).</violation>
</file>
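Both lib/speechmatics.py findings above can be addressed with a pattern along these lines. The cap value and function names are illustrative, not from the PR:

```python
import os

MAX_TTS_BYTES = 10 * 1024 * 1024  # illustrative cap, not from the PR

def read_capped(resp, limit=MAX_TTS_BYTES):
    """Read at most limit bytes from a response; reject anything larger."""
    data = resp.read(limit + 1)
    if len(data) > limit:
        raise ValueError("TTS response exceeds size cap")
    return data

def write_private(path, data):
    """Create the output file with mode 0o600 so only the owner can read it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
```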

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


```python
# Write output file
try:
    with open(output_path, 'wb') as f:
```


P2: Create the TTS output file with restrictive permissions (0o600) to avoid leaking voice content to other local users.

(Based on your team's feedback about creating downloaded media files with restrictive permissions.)


Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lib/speechmatics.py, line 123:

<comment>Create the TTS output file with restrictive permissions (0o600) to avoid leaking voice content to other local users.

(Based on your team's feedback about creating downloaded media files with restrictive permissions.) </comment>

<file context>
@@ -0,0 +1,324 @@
+
+    # Write output file
+    try:
+        with open(output_path, 'wb') as f:
+            f.write(data)
+    except OSError as e:
</file context>


```python
try:
    with urllib.request.urlopen(req, timeout=120) as resp:
        data = resp.read()
```


P2: Cap the TTS response read to a maximum size so a malformed/large response can’t consume unbounded memory.

(Based on your team's feedback about capping HTTP response reads for downloaded media.)


Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lib/speechmatics.py, line 102:

<comment>Cap the TTS response read to a maximum size so a malformed/large response can’t consume unbounded memory.

(Based on your team's feedback about capping HTTP response reads for downloaded media.) </comment>

<file context>
@@ -0,0 +1,324 @@
+
+    try:
+        with urllib.request.urlopen(req, timeout=120) as resp:
+            data = resp.read()
+    except urllib.error.HTTPError as e:
+        error_detail = f"HTTP {e.code}"
</file context>

```python
def _deliver_voice_response(response, config, client, msg, platform,
                            tmp_dir, tmp_files, bot_id):
    """Convert response to voice/audio and send, falling back to text."""
    tts_ext = '.wav' if config.speech_provider == 'speechmatics' else '.mp3'
```


P2: Telegram send_voice does not accept WAV files, so Speechmatics voice replies will fail for Telegram. Use a supported format (e.g., request MP3/OGG from Speechmatics or transcode before calling send_voice).

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lib/handlers.py, line 831:

<comment>Telegram `send_voice` does not accept WAV files, so Speechmatics voice replies will fail for Telegram. Use a supported format (e.g., request MP3/OGG from Speechmatics or transcode before calling send_voice).</comment>

<file context>
@@ -787,15 +828,15 @@ def _typing_loop():
 def _deliver_voice_response(response, config, client, msg, platform,
                             tmp_dir, tmp_files, bot_id):
     """Convert response to voice/audio and send, falling back to text."""
+    tts_ext = '.wav' if config.speech_provider == 'speechmatics' else '.mp3'
     fd, tts_file = tempfile.mkstemp(
-        prefix='claudio-tts-', suffix='.mp3', dir=tmp_dir,
</file context>
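If WAV output is kept on the Speechmatics side, one option consistent with this comment is transcoding before sending: Telegram's sendVoice accepts OGG/Opus (and MP3/M4A). A sketch with ffmpeg, with illustrative file names:

```
ffmpeg -i claudio-tts.wav -c:a libopus -b:a 32k claudio-tts.ogg
```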


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds Speechmatics as an alternative speech provider, which is a great enhancement, and the implementation is generally clean and well structured. However, a high-severity prompt injection vulnerability was found in lib/handlers.py: the output of the new speech-to-text functionality is not sanitized before being used in a language model prompt. Two further improvements were identified in lib/speechmatics.py: the speech-to-text language is currently hardcoded to English and could be made configurable, and the polling logic for transcription jobs could be made more resilient to transient network errors, in line with best practices for handling non-fatal exceptions.

```diff
-        config.elevenlabs_api_key,
-        model=config.elevenlabs_stt_model,
-    )
+    transcription = _stt_transcribe(voice_file, config)
```


Severity: high (security)

The introduction of the new speech-to-text provider creates a prompt injection vector. The transcription variable, which holds the output from the _stt_transcribe function, should be treated as untrusted input. This variable is directly concatenated into the prompt text on line 662 and passed to the language model in run_claude without sanitization. A malicious user could craft an audio file that transcribes into malicious instructions, allowing them to manipulate the language model's behavior. This could lead to the model ignoring its system prompt, leaking sensitive data from the conversation history, or performing other unintended actions.

Suggested change:

```diff
-transcription = _stt_transcribe(voice_file, config)
+transcription = sanitize_for_prompt(_stt_transcribe(voice_file, config))
```

```python
    return True


def stt_transcribe(audio_path, api_key, region='eu1', language='en'):
```


Severity: high

The language parameter is hardcoded to 'en'. This limits the speech-to-text functionality to English only. The other speech provider, ElevenLabs, appears to support multilingual transcription, so this could be a feature regression for users switching to Speechmatics.

To better support multilingual users, this should be made configurable. I recommend the following changes:

  1. In lib/config.py, add a new configuration variable SPEECHMATICS_STT_LANGUAGE to ClaudioConfig and BotConfig (with a default of 'en' to maintain current behavior).
  2. In lib/handlers.py, update _stt_transcribe to pass this new config value to speechmatics_stt.
  3. Here in stt_transcribe, you could then remove the default value for the language parameter to ensure it's always explicitly provided by the caller.

This will make the Speechmatics integration more flexible and consistent with the multilingual capabilities of the project.

Comment on lines +275 to +281:

```python
try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        data = json.loads(resp.read())
except (urllib.error.HTTPError, urllib.error.URLError, OSError,
        json.JSONDecodeError, ValueError) as e:
    log_error("stt", f"Error polling job {job_id}: {e}")
    return False
```


Severity: medium

The current error handling in the polling loop is not resilient to transient network errors. Any URLError or OSError will cause the entire transcription to fail immediately. It would be more robust to handle these potentially transient errors by logging them and continuing to poll until the deadline. Permanent errors like HTTPError (especially 4xx client errors) or JSONDecodeError should still cause a failure.

Suggested change (note that HTTPError subclasses URLError, so the HTTPError clause must come first or it would never be reached):

```diff
 try:
     with urllib.request.urlopen(req, timeout=30) as resp:
         data = json.loads(resp.read())
-except (urllib.error.HTTPError, urllib.error.URLError, OSError,
-        json.JSONDecodeError, ValueError) as e:
-    log_error("stt", f"Error polling job {job_id}: {e}")
-    return False
+except (urllib.error.HTTPError, json.JSONDecodeError, ValueError) as e:
+    log_error("stt", f"API/parsing error polling job {job_id}: {e}")
+    return False
+except (urllib.error.URLError, OSError) as e:
+    log("stt", f"Network error polling job {job_id}, will retry: {e}")
+    time.sleep(STT_POLL_INTERVAL)
+    continue
```
References
  1. In non-fatal contexts like hooks, catch broad exceptions but log them to stderr for debuggability instead of silently swallowing them.

@edgarjs edgarjs closed this Feb 12, 2026
@edgarjs edgarjs deleted the feature/speechmatics-provider branch February 12, 2026 16:09