Skip to content

Add UpliftAI TTS engine#62

Closed
willwade wants to merge 2 commits intomainfrom
codex/add-new-voice-engines-and-streaming-support
Closed

Add UpliftAI TTS engine#62
willwade wants to merge 2 commits intomainfrom
codex/add-new-voice-engines-and-streaming-support

Conversation

@willwade
Copy link
Owner

Summary

  • add UpliftAI engine with streaming HTTP support and preset Urdu, Sindhi, and Balochi voices
  • wire UpliftAI into test scaffolding and add example usage

Testing

  • uv run pytest -q -m "not synthetic"
  • uv run ruff check (fails: LOG015 and other pre-existing warnings)

https://chatgpt.com/codex/tasks/task_e_68ab6808c4bc83278cd968e4f809eef5

Copy link

@llamapreview llamapreview bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto Pull Request Review from LlamaPReview

🎯 TL;DR & Recommendation

Recommendation: Request Changes.
This PR adds UpliftAI TTS support but contains a critical runtime error in the example code and security vulnerabilities in the client implementation that could lead to failures or data exposure.

🚨 Critical Issues (P0)

  • examples/test-upliftai.py: Calls a non-existent method (speak_streamed), which will cause an AttributeError at runtime and mislead users about available functionality.

⚡ Key Risks & Improvements (P1)

  • tts_wrapper/engines/upliftai/client.py: Lacks Content-Type verification in streaming, risking exposure of sensitive error messages or data corruption if API returns non-audio responses.
  • tts_wrapper/engines/upliftai/client.py: Authorization header missing 'Bearer' scheme, potentially causing authentication failures with the API.

💡 Suggestions (P2)

  • tts_wrapper/engines/upliftai/client.py: Hardcoded voice list creates a maintenance burden and risk of desynchronization with API changes.
  • tts_wrapper/engines/upliftai/client.py: Credential check uses a full synthesis request, which is inefficient and could lead to unnecessary API costs and slower performance.
  • tts_wrapper/engines/upliftai/client.py: Redundant 'lang' parameter in set_voice method may confuse developers and lead to inconsistent state.

💡 LlamaPReview Community
Have feedback on this AI Code review tool? Join our GitHub Discussions to share your thoughts and help shape the future of LlamaPReview.

from tts_wrapper import UpliftAIClient


def main() -> None:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 | Confidence: High

The example calls speak_streamed() which is not implemented in the UpliftAIClient. This will cause an AttributeError at runtime. The base class likely provides this method, but it depends on proper implementation of synth_to_bytestream(). This creates a false expectation about available functionality.

Code Suggestion:

# Replace with actual streaming usage pattern
for chunk in client.synth_to_bytestream(text):
    # Process audio chunks
    pass

response.raise_for_status()
return response.content

def synth_to_bytestream(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 | Confidence: High

The streaming implementation does not verify the response Content-Type before yielding chunks. If the API returns an error (e.g., 200 with JSON error message due to API gateway misconfiguration), the client will treat JSON as audio data, potentially exposing sensitive error information or causing downstream processing errors. This violates the principle of defensive programming and could lead to security incidents if error messages contain internal API details.

Suggested change
def synth_to_bytestream(
with requests.post(...) as response:
response.raise_for_status()
# Verify we're receiving audio data
content_type = response.headers.get('Content-Type', '')
if not content_type.startswith('audio/'):
raise ValueError(f"Unexpected Content-Type: {content_type}")
for chunk in response.iter_content(chunk_size=4096):
if chunk:
yield chunk

STREAM_URL = f"{BASE_URL}/stream"
DEFAULT_VOICE = "v_8eelc901" # Info/Education Urdu

def __init__(self, api_key: str | None = None) -> None:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 | Confidence: High

The API key is used directly as the Authorization header value without a standard scheme prefix (e.g., "Bearer"). This may not comply with the UpliftAI API requirements (documentation not provided in PR) and could cause authentication failures. Most modern APIs expect Bearer token format. Additionally, storing the raw key in instance state increases exposure risk if the object is serialized or logged.

Suggested change
def __init__(self, api_key: str | None = None) -> None:
def __init__(self, api_key: str | None = None) -> None:
super().__init__()
self.api_key = api_key or os.getenv("UPLIFTAI_KEY")
if not self.api_key:
msg = "UpliftAI API key is required. Set UPLIFTAI_KEY or pass api_key."
raise ValueError(msg)
self.headers = {
"Authorization": f"Bearer {self.api_key}", # Add scheme prefix
"Content-Type": "application/json"
}

with Path(output_file).open("wb") as f:
f.write(audio_bytes)

def _get_voices(self) -> list[dict[str, Any]]:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 | Confidence: High

Hardcoded voice list creates a maintenance burden and synchronization risk with upstream API changes. The method is private (_get_voices) but there's no public interface to access voices. This limits discoverability of available voices and may lead to voice ID typos that aren't caught until API call time. Consider implementing a public get_voices() method with caching/refresh mechanism.

},
]

def check_credentials(self) -> bool: # pragma: no cover - network call
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 | Confidence: Medium

The credential check performs a full synthesis request ("ping" text), which is resource-intensive for both client and server. A lighter-weight endpoint (e.g., dedicated health check or simple GET request) would be more efficient. This could lead to unnecessary API costs and slower startup times if called frequently.

self.audio_rate = 22050
self.voice_id = self.DEFAULT_VOICE

def set_voice(self, voice_id: str, lang: str | None = None) -> None:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 | Confidence: Medium

The method sets a lang attribute that isn't used in synthesis methods, creating inconsistent state. The voice ID already encapsulates language information (as seen in hardcoded voices). This redundant parameter may confuse developers and could become out-of-sync with actual voice capabilities.

Suggested change
def set_voice(self, voice_id: str, lang: str | None = None) -> None:
def set_voice(self, voice_id: str) -> None:
"""Set the voice for synthesis."""
self.voice_id = voice_id

@willwade willwade closed this Aug 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant