A unified FastAPI-based text-to-speech service that supports multiple TTS engines and providers, allowing you to generate audio from text using various synthesis methods.
- Multiple TTS Providers: Support for cloud-based APIs and local TTS engines
- Voice Selection: Different voices available per provider
- Audio Caching: Generated audio files are cached to avoid regeneration
- RESTful API: Simple HTTP endpoints for easy integration
- Streaming Audio: Direct audio file streaming for web playback
- Provider Auto-Detection: Automatically detects which TTS engines are available
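
The caching feature above can be pictured as mapping each request to a deterministic filename, so a repeated request finds the existing file instead of calling the provider again. The following is only a minimal sketch of that idea, not the service's actual code: the `cache_filename` helper, the `mp3` extension, and the naming scheme are all assumptions made for illustration.

```python
import hashlib


def cache_filename(text: str, provider: str, voice: str, ext: str = "mp3") -> str:
    """Return a deterministic filename for one (provider, voice, text) request.

    Hypothetical helper: the real service may name its cached files differently.
    """
    # Join the fields with a NUL separator so that ("ab", "c") and ("a", "bc")
    # cannot collapse into the same key.
    key = "\x00".join((provider, voice, text)).encode("utf-8")
    # A truncated SHA-256 digest keeps filenames short while staying stable
    # across runs (unlike Python's built-in hash()).
    digest = hashlib.sha256(key).hexdigest()[:16]
    return f"{provider}_{digest}.{ext}"
```

With a scheme like this, the server could check whether the file already exists before synthesizing, and any change to the text, provider, or voice produces a different filename.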
The API supports various TTS providers including:
- Cloud-based services (like Pollinations)
- Local TTS engines (eSpeak, Festival, Flite, DECtalk, SAM)
- AI models (Coqui TTS)
- Custom effect engines (LPC-style processing, currently broken)
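
For the local engines in this list, availability can be checked by looking for each engine's executable on the `PATH`. The sketch below shows one way to do that with the standard library. It is an illustration under stated assumptions, not the detection logic in `app.py`: the `detect_local_engines` helper and the executable names in `LOCAL_ENGINES` are hypothetical and may need adjusting for your installation.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Assumed mapping of provider name -> executable name (adjust as needed).
LOCAL_ENGINES: Dict[str, str] = {
    "espeak": "espeak",
    "festival": "festival",
    "flite": "flite",
}


def detect_local_engines(
    engines: Dict[str, str] = LOCAL_ENGINES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the provider names whose executable is found on PATH."""
    # shutil.which returns the full path of the executable, or None if absent.
    return [name for name, exe in engines.items() if which(exe) is not None]


if __name__ == "__main__":
    print("Available local engines:", detect_local_engines())
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is enough.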
- Install dependencies: pip install fastapi requests uvicorn
- Run the server: python app.py
- Generate speech: GET /tts?text=Hello+World&provider=pollinations&voice=alloy
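
The request shown above can also be assembled programmatically so that the text is URL-encoded correctly. This is a small sketch: the `build_tts_url` helper is hypothetical, and the base URL `http://localhost:8000` is an assumption (it is uvicorn's default port, but `app.py` may bind to a different host or port).

```python
from urllib.parse import urlencode

# Assumed address of the running server; change it to match your setup.
BASE_URL = "http://localhost:8000"


def build_tts_url(text: str, provider: str, voice: str, base_url: str = BASE_URL) -> str:
    """Build the /tts request URL with properly encoded query parameters."""
    # urlencode escapes reserved characters and encodes spaces as "+".
    query = urlencode({"text": text, "provider": provider, "voice": voice})
    return f"{base_url}/tts?{query}"


if __name__ == "__main__":
    print(build_tts_url("Hello World", "pollinations", "alloy"))
    # prints http://localhost:8000/tts?text=Hello+World&provider=pollinations&voice=alloy
```

The resulting URL can be passed to `requests.get(url)` or opened in a browser once the server is running.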
- GET /: API information and available providers
- GET /tts: Generate audio from text
- GET /play/{filename}: Stream audio files
- GET /providers: List all available providers and voices
- GET /files: List generated audio files
- GET /health: Health check