Conversation
Auto Pull Request Review from LlamaPReview
🎯 TL;DR & Recommendation
Recommendation: Request Changes
This PR adds the UpliftAI TTS engine but introduces two critical risks: fragile WAV header handling that could silently break audio decoding, and test validation too shallow to catch audio corruption.
🌟 Strengths
- Adds UpliftAI engine with streaming support and preset voices for Urdu, Sindhi, and Balochi languages.
- Well-integrated into existing test scaffolding with example usage.
🚨 Critical Issues (P1)
- tts_wrapper/engines/upliftai/client.py: Fragile WAV header stripping logic risks breaking audio decoding silently if the API changes header size.
- tests/test_tts_engines.py: Test for WAV header stripping doesn't validate audio integrity, potentially allowing corrupted audio to pass undetected.
💡 Suggestions (P2)
- tests/test_tts_engines.py: Adding a branch to `check_credentials` exacerbates its complexity, making it harder to maintain and scale for new engines.
- tts_wrapper/engines/upliftai/client.py: Hardcoded voice metadata creates a maintenance burden and synchronization risk with the external API.
- tts_wrapper/engines/upliftai/client.py: Credential check performs expensive synthesis, which could slow down frequent calls in test setups.
💡 LlamaPReview Community
Have feedback on this AI Code review tool? Join our GitHub Discussions to share your thoughts and help shape the future of LlamaPReview.
In `tts_wrapper/engines/upliftai/client.py`:
```python
        audio_bytes = self._strip_wav_header(audio_bytes)
        return audio_bytes

    def synth_to_bytestream(
```
P1 | Confidence: High
The WAV header stripping logic in the streaming method assumes a fixed 44-byte header. This is fragile because WAV headers can vary in size depending on format details. The related test (`test_upliftai_streaming_strips_wav_header`) only checks for the absence of "RIFF" but doesn't validate the actual audio format. If the API changes its header structure, this could break audio decoding or cause silent corruption. The non-streaming method uses a more robust RIFF signature check (a sketch of such a helper follows the suggestion below).
Code Suggestion:
```python
def synth_to_bytestream(...):
    # ... same setup ...
    with requests.post(...) as response:
        response.raise_for_status()
        header_skipped = False
        header_buffer = b""
        for chunk in response.iter_content(chunk_size=4096):
            if not chunk:
                continue
            if header_skipped:
                yield chunk
                continue
            header_buffer += chunk
            # Locate the RIFF signature instead of assuming a 44-byte header.
            riff_pos = header_buffer.find(b"RIFF")
            if riff_pos == -1:
                continue  # signature not buffered yet
            # Walk the RIFF chunks until the "data" chunk. Note that the
            # uint32 at riff_pos+4 is the total file size minus 8, not the
            # header size, so it cannot be used to skip the header directly.
            pos = riff_pos + 12  # skip "RIFF", the size field, and "WAVE"
            data_start = None
            while pos + 8 <= len(header_buffer):
                chunk_id = header_buffer[pos:pos + 4]
                chunk_size = int.from_bytes(header_buffer[pos + 4:pos + 8], "little")
                if chunk_id == b"data":
                    data_start = pos + 8  # audio payload begins here
                    break
                pos += 8 + chunk_size + (chunk_size % 2)  # chunks are word-aligned
            if data_start is None:
                continue  # header not fully buffered yet
            header_skipped = True
            remainder = header_buffer[data_start:]
            if remainder:
                yield remainder
```
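For comparison, here is a minimal sketch of what a chunk-aware non-streaming `_strip_wav_header` helper could look like. This assumes the whole WAV payload is already in memory and is illustrative only, not the PR's actual implementation:

```python
def _strip_wav_header(self, audio_bytes: bytes) -> bytes:
    """Return raw PCM by removing a leading RIFF/WAVE header, if present."""
    if not audio_bytes.startswith(b"RIFF") or audio_bytes[8:12] != b"WAVE":
        return audio_bytes  # not a WAV container; pass through unchanged
    pos = 12  # skip "RIFF", the size field, and "WAVE"
    while pos + 8 <= len(audio_bytes):
        chunk_id = audio_bytes[pos:pos + 4]
        chunk_size = int.from_bytes(audio_bytes[pos + 4:pos + 8], "little")
        if chunk_id == b"data":
            return audio_bytes[pos + 8:pos + 8 + chunk_size]
        pos += 8 + chunk_size + (chunk_size % 2)  # chunks are word-aligned
    return audio_bytes  # no data chunk found; return input untouched
```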
| pytest.fail(f"SSML synthesis failed with error: {e}") | ||
|
|
||
|
|
||
| @pytest.mark.synthetic |
P1 | Confidence: High
This test validates the WAV header stripping but doesn't verify that the audio content remains valid after stripping. The related context shows that similar tests for other engines focus on basic audio properties (byte length, format) but don't validate audio integrity. This could allow broken audio to pass tests if the header stripping corrupts the data.
Code Suggestion:
```python
# Add to the test after the header check
import wave
from io import BytesIO

# ... after the first_chunk assertion ...
# The stream yields header-stripped raw PCM, so wave.open() cannot parse it
# directly; re-wrap it with the expected parameters and check it round-trips.
raw_pcm = first_chunk + b"".join(stream)  # include the chunk already read
assert raw_pcm, "Stream produced no audio data"
assert len(raw_pcm) % 2 == 0, "16-bit PCM must have an even byte count"
try:
    buffer = BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # expected: mono
        wav.setsampwidth(2)      # expected: 16-bit samples
        wav.setframerate(22050)  # expected: 22.05 kHz
        wav.writeframes(raw_pcm)
    buffer.seek(0)
    with wave.open(buffer, "rb") as wav:
        assert wav.getnframes() == len(raw_pcm) // 2  # validate round-trip
except wave.Error as e:
    pytest.fail(f"Stripped audio is invalid: {e}")
```
pytest.fail(f"Stripped audio is invalid: {e}")| f"ElevenLabs API key: {elevenlabs_api_key[:5]}...{elevenlabs_api_key[-5:] if elevenlabs_api_key else ''}" | ||
| ) | ||
| client = ElevenLabsClient(credentials=elevenlabs_api_key) | ||
| elif service == "upliftai": |
P2 | Confidence: High
The `check_credentials` function is already flagged by Ruff as overly complex (C901). Adding another conditional branch for UpliftAI exacerbates this maintainability issue. The function now has 25+ branches, making it difficult to modify and test. This pattern also doesn't scale well for new engines. The related context shows this function is critical for test setup across all engines.
Code Suggestion:
```python
import os

# Map each service to a client factory so new engines register with a single
# dict entry instead of another elif branch.
_CREDENTIAL_CHECKERS = {
    "upliftai": lambda: UpliftAIClient(api_key=os.getenv("UPLIFTAI_KEY")),
    # ... other services ...
}

def check_credentials(service: str) -> bool:
    if service not in _CREDENTIAL_CHECKERS:
        return False
    try:
        client = _CREDENTIAL_CHECKERS[service]()
        return client.check_credentials()
    except Exception:
        return False
```
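With the dispatch table, adding an engine becomes a one-line registration rather than another branch in the `if`/`elif` chain, which should keep the C901 complexity score flat as engines are added.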
In `tts_wrapper/engines/upliftai/client.py`:
```python
        with Path(output_file).open("wb") as f:
            f.write(audio_bytes)

    def _get_voices(self) -> list[dict[str, Any]]:
```
P2 | Confidence: Medium
Hardcoding voice metadata creates a maintenance burden and synchronization risk with the external API. If UpliftAI adds/removes voices, this will require code changes. The related context doesn't show how other engines handle voice metadata, but this static approach is less flexible than dynamic discovery.
Code Suggestion:
```python
def _get_voices(self) -> list[dict[str, Any]]:
    try:
        # Attempt to fetch voices dynamically; the /voices endpoint is
        # speculative and only applies if UpliftAI adds one to its API.
        response = requests.get(
            f"{self.BASE_URL}/voices", headers=self.headers, timeout=10
        )
        response.raise_for_status()
        return response.json()
    except Exception:
        logger.warning("Falling back to hardcoded voice list")
        return [ ... ]  # hardcoded fallback
```
In `tts_wrapper/engines/upliftai/client.py`:
```python
            },
        ]

    def check_credentials(self) -> bool:  # pragma: no cover - network call
```
P2 | Confidence: Medium
The credential check performs a full synthesis request ("ping"), which is expensive and slow compared to a dedicated auth endpoint or lighter-weight request. The related context doesn't show how other engines implement credential checks, but this could become problematic if called frequently (e.g., in test setups).
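A cheaper alternative is sketched below. It assumes the API rejects a bad key with a 401/403 before doing any synthesis work; the endpoint path shown is hypothetical and would need to match UpliftAI's actual API:

```python
import requests

def check_credentials(self) -> bool:  # pragma: no cover - network call
    """Validate the API key without paying for a full synthesis round-trip."""
    try:
        # Send a deliberately empty request: a valid key should get a 4xx
        # validation error back, while a bad key should get 401/403.
        response = requests.post(
            f"{self.BASE_URL}/synthesis/text-to-speech",  # hypothetical path
            headers=self.headers,
            json={},
            timeout=5,
        )
        return response.status_code not in (401, 403)
    except requests.RequestException:
        return False
```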
Summary
Testing
- `uv run pytest -q -m "not synthetic"` (fails: Missing required environment variables: POLLY_REGION, POLLY_AWS_KEY_ID, POLLY_AWS_ACCESS_KEY, GOOGLE_SA_PATH, GOOGLE_SA_FILE_B64, MICROSOFT_TOKEN, MICROSOFT_REGION, WATSON_API_KEY, WATSON_REGION, WATSON_INSTANCE_ID, ELEVENLABS_API_KEY, WITAI_TOKEN, PLAYHT_API_KEY, PLAYHT_USER_ID)
- `uv run ruff check tests/test_tts_engines.py` (fails: C901 `check_credentials` is too complex (24 > 10), PLR0911, PLR0912, PLR0915, PTH110, PTH100, PTH118, PTH109, E501, C901, PLR0911, PLR0912, LOG015, G004)
- `uv run ruff check tts_wrapper/engines/upliftai/client.py` (fails: C901, PLR0912, PLR0915, LOG015, G004)

https://chatgpt.com/codex/tasks/task_e_68ab6808c4bc83278cd968e4f809eef5