This document provides curl examples for using the Text-to-Speech (TTS) endpoints in the AI Content Processing API.
- GET /tts-voices - Get available voices and capabilities
- POST /text-to-speech - Convert text to speech with a single voice
- POST /text-to-speech-podcast - Convert multiple text segments with different voices (perfect for podcasts)

- Make sure the API server is running:
  python api_server.py
- Ensure you have an OpenAI API key configured in your .env file:
  OPENAI_API_KEY=your_openai_api_key_here

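A quick preflight check can catch a missing key before any request is sent. The sketch below assumes the .env file uses plain KEY=value lines, as in the example above; `has_api_key` is a hypothetical helper name and not part of the API.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the API): succeeds when the given env file
# contains a non-empty OPENAI_API_KEY=... line. Assumes plain KEY=value syntax.
has_api_key() {
  env_file="${1:-.env}"
  [ -f "$env_file" ] && grep -Eq '^OPENAI_API_KEY=.+' "$env_file"
}

# Example use before starting the server:
#   has_api_key .env || echo "OPENAI_API_KEY is not set in .env" >&2
```

The check only confirms that a value is present; it does not verify that the key is valid.
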
Get information about available voices, formats, and capabilities:
curl http://localhost:8000/tts-voices

Response:
{
"voices": {
"alloy": "A balanced voice, suitable for most content",
"echo": "A warm, friendly voice",
"fable": "A storytelling voice with character",
"onyx": "A deep, authoritative voice",
"nova": "A bright, energetic voice",
"shimmer": "A soft, gentle voice"
},
"supported_formats": ["mp3", "opus", "aac", "flac"],
"models": ["tts-1", "tts-1-hd"],
"speed_range": {"min": 0.25, "max": 4.0},
"max_text_length": 4096
}

Convert text to speech using the default settings:
curl -X POST http://localhost:8000/text-to-speech \
-H "Content-Type: application/json" \
-d '{
"text": "Hello! Welcome to our AI Content Processing API. This is a demonstration of our text-to-speech capabilities."
}'

Set the voice, model, format, and speed explicitly:
curl -X POST http://localhost:8000/text-to-speech \
-H "Content-Type: application/json" \
-d '{
"text": "This is a test of the high-quality text-to-speech system using the Onyx voice at a slower speed.",
"voice": "onyx",
"model": "tts-1-hd",
"format": "mp3",
"speed": 0.8
}'

Alloy (Balanced):
curl -X POST http://localhost:8000/text-to-speech \
-H "Content-Type: application/json" \
-d '{
"text": "This is the Alloy voice - balanced and suitable for most content.",
"voice": "alloy"
}'

Echo (Warm & Friendly):
curl -X POST http://localhost:8000/text-to-speech \
-H "Content-Type: application/json" \
-d '{
"text": "Hi there! This is the Echo voice - warm and friendly for conversational content.",
"voice": "echo"
}'

Fable (Storytelling):
curl -X POST http://localhost:8000/text-to-speech \
-H "Content-Type: application/json" \
-d '{
"text": "Once upon a time, in a land far away... This is the Fable voice, perfect for storytelling!",
"voice": "fable"
}'

Onyx (Deep & Authoritative):
curl -X POST http://localhost:8000/text-to-speech \
-H "Content-Type: application/json" \
-d '{
"text": "This is an important announcement. The Onyx voice provides deep, authoritative speech.",
"voice": "onyx"
}'

Nova (Bright & Energetic):
curl -X POST http://localhost:8000/text-to-speech \
-H "Content-Type: application/json" \
-d '{
"text": "Hey everyone! Welcome to our show! This is the Nova voice - bright and energetic!",
"voice": "nova"
}'

Shimmer (Soft & Gentle):
curl -X POST http://localhost:8000/text-to-speech \
-H "Content-Type: application/json" \
-d '{
"text": "Good evening. This is the Shimmer voice - soft and gentle, perfect for calm content.",
"voice": "shimmer"
}'

Create a podcast with two speakers having a conversation:
curl -X POST http://localhost:8000/text-to-speech-podcast \
-H "Content-Type: application/json" \
-d '{
"segments": [
{
"text": "Welcome to Tech Talk! I am your host Sarah, and today we are discussing artificial intelligence.",
"voice": "nova",
"speaker_name": "Sarah"
},
{
"text": "Thanks Sarah! Hi everyone, I am Mark, and I am excited to share some insights about AI development.",
"voice": "onyx",
"speaker_name": "Mark"
},
{
"text": "Great! So Mark, what do you think is the most exciting development in AI this year?",
"voice": "nova",
"speaker_name": "Sarah"
},
{
"text": "Well Sarah, I think the advancement in multimodal AI systems has been remarkable. They can now process text, images, and audio simultaneously.",
"voice": "onyx",
"speaker_name": "Mark"
}
]
}'

Create a more complex podcast with multiple speakers and a narrator:
curl -X POST http://localhost:8000/text-to-speech-podcast \
-H "Content-Type: application/json" \
-d '{
"segments": [
{
"text": "Welcome to Story Time, where we bring classic tales to life with multiple voices.",
"voice": "shimmer",
"speaker_name": "Narrator"
},
{
"text": "Once upon a time, there was a brave knight who lived in a castle.",
"voice": "fable",
"speaker_name": "Storyteller"
},
{
"text": "I must find the dragon and save the kingdom!",
"voice": "onyx",
"speaker_name": "Knight"
},
{
"text": "But first, you must solve my riddle, brave knight.",
"voice": "echo",
"speaker_name": "Wizard"
},
{
"text": "The knight thought carefully about the wizards words.",
"voice": "fable",
"speaker_name": "Storyteller"
},
{
"text": "I accept your challenge, wise wizard!",
"voice": "onyx",
"speaker_name": "Knight"
}
],
"model": "tts-1-hd",
"format": "mp3",
"speed": 1.0
}'

Create an interview-style podcast with FLAC output at a slightly slower speed:
curl -X POST http://localhost:8000/text-to-speech-podcast \
-H "Content-Type: application/json" \
-d '{
"segments": [
{
"text": "Good morning, and welcome to Business Insights. I am Jennifer, your host.",
"voice": "nova",
"speaker_name": "Jennifer_Host"
},
{
"text": "Today we have with us Dr. Robert Chen, economist and market analyst.",
"voice": "nova",
"speaker_name": "Jennifer_Host"
},
{
"text": "Thank you for having me, Jennifer. It is great to be here.",
"voice": "onyx",
"speaker_name": "Dr_Chen"
},
{
"text": "Dr. Chen, what are your thoughts on the current market trends?",
"voice": "nova",
"speaker_name": "Jennifer_Host"
},
{
"text": "Well, Jennifer, we are seeing some interesting patterns emerge in the technology sector.",
"voice": "onyx",
"speaker_name": "Dr_Chen"
}
],
"model": "tts-1-hd",
"format": "flac",
"speed": 0.9
}'

Example response from /text-to-speech:
{
"request_id": "123e4567-e89b-12d3-a456-426614174000",
"success": true,
"error": null,
"audio_file": "/tmp/ai_content_process/tts/tts_alloy_20250107_143022.mp3",
"file_size_mb": 0.15,
"voice": "alloy",
"model": "tts-1",
"format": "mp3",
"speed": 1.0,
"text_length": 87,
"processing_time": 2.34,
"timestamp": "2025-01-07T14:30:22.123456"
}

Example response from /text-to-speech-podcast:
{
"request_id": "123e4567-e89b-12d3-a456-426614174001",
"success": true,
"segments": [
{
"segment_index": 0,
"success": true,
"error": null,
"audio_file": "/tmp/ai_content_process/tts/podcast_20250107_143025/001_Sarah_nova.mp3",
"file_size_mb": 0.12,
"voice": "nova",
"speaker_name": "Sarah",
"original_text": "Welcome to Tech Talk! I am your host Sarah...",
"processing_time": 1.45
},
{
"segment_index": 1,
"success": true,
"error": null,
"audio_file": "/tmp/ai_content_process/tts/podcast_20250107_143025/002_Mark_onyx.mp3",
"file_size_mb": 0.18,
"voice": "onyx",
"speaker_name": "Mark",
"original_text": "Thanks Sarah! Hi everyone, I am Mark...",
"processing_time": 1.67
}
],
"total_segments": 2,
"successful_segments": 2,
"failed_segments": 0,
"total_characters": 145,
"output_directory": "/tmp/ai_content_process/tts/podcast_20250107_143025",
"processing_time": 3.45,
"timestamp": "2025-01-07T14:30:25.123456"
}

Save this as test_tts.sh and run it to test all endpoints:
#!/bin/bash
echo "Testing TTS API endpoints..."
echo "1. Getting available voices..."
curl -s http://localhost:8000/tts-voices | jq
echo -e "\n2. Testing single voice TTS..."
curl -X POST http://localhost:8000/text-to-speech \
-H "Content-Type: application/json" \
-d '{
"text": "This is a test of the text-to-speech API!",
"voice": "alloy"
}' | jq
echo -e "\n3. Testing podcast mode..."
curl -X POST http://localhost:8000/text-to-speech-podcast \
-H "Content-Type: application/json" \
-d '{
"segments": [
{
"text": "Hello, I am the first speaker.",
"voice": "nova",
"speaker_name": "Speaker1"
},
{
"text": "And I am the second speaker.",
"voice": "onyx",
"speaker_name": "Speaker2"
}
]
}' | jq
echo -e "\n✅ TTS API testing complete!"

Make it executable and run:
chmod +x test_tts.sh
./test_tts.sh

Request with an invalid voice:
curl -X POST http://localhost:8000/text-to-speech \
-H "Content-Type: application/json" \
-d '{
"text": "Testing invalid voice",
"voice": "invalid_voice"
}'

Response:
{
"request_id": "...",
"success": false,
"error": "Invalid voice 'invalid_voice'. Available: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']",
"audio_file": null,
"processing_time": 0.0,
"timestamp": "..."
}

Request with text longer than the 4096-character limit:
curl -X POST http://localhost:8000/text-to-speech \
-H "Content-Type: application/json" \
-d '{
"text": "'"$(python -c "print('x' * 5000)")"'",
"voice": "alloy"
}'
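
The two error cases above can also be caught on the client before a request is sent. The following is a minimal sketch: the voice list, speed range, and length limit are copied from the /tts-voices response shown earlier, while the function names are hypothetical and the server remains the authority on what it accepts.

```shell
#!/usr/bin/env bash
# Values copied from the /tts-voices response; refresh them if the server changes.
VALID_VOICES="alloy echo fable onyx nova shimmer"
MAX_TEXT_LENGTH=4096

# Succeeds if $1 is one of the known voice names.
is_valid_voice() {
  for v in $VALID_VOICES; do
    if [ "$v" = "$1" ]; then return 0; fi
  done
  return 1
}

# Succeeds if $1 is a plain decimal number between 0.25 and 4.0 inclusive.
# awk does the floating-point comparison that shell arithmetic cannot.
is_valid_speed() {
  awk -v s="$1" 'BEGIN {
    ok = (s ~ /^[0-9]+(\.[0-9]+)?$/ && s + 0 >= 0.25 && s + 0 <= 4.0)
    exit !ok
  }'
}

# Succeeds if $1 is no longer than MAX_TEXT_LENGTH characters.
# Assumption: the shell counts characters the same way the server does.
is_valid_length() {
  [ "${#1}" -le "$MAX_TEXT_LENGTH" ]
}
```

A wrapper script could call these three checks and skip the curl request when any of them fails, which avoids a round trip for input that is known to be rejected.
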

- Voice Selection:
  - Use nova for energetic hosts
  - Use onyx for authoritative speakers
  - Use shimmer for calm, gentle content
  - Use fable for storytelling
  - Use echo for friendly conversations
  - Use alloy for general-purpose content
- Speed Adjustment:
  - 0.25-0.75: Slow, clear speech
  - 1.0: Normal speed
  - 1.5-4.0: Fast speech for energetic content
- Format Selection:
  - mp3: Good balance of quality and size
  - flac: Highest quality, larger files
  - opus: Good for web streaming
  - aac: Good for mobile apps
- Text Length:
  - Maximum 4096 characters per segment
  - Use podcast mode for longer content
  - Break text at natural points (sentences, paragraphs)
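
One way to follow the text-length advice is to split long text at sentence boundaries and send each chunk as a podcast segment. The sketch below is an illustration under stated assumptions: `chunk_text` is a hypothetical helper, sentences are detected naively as `.`, `!`, or `?` followed by a space, and a single sentence longer than the limit is passed through unsplit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read text on stdin and print chunks of at most
# $1 characters (default 4096), one chunk per line, breaking only
# between sentences. Note that length() in awk may count bytes in some locales.
chunk_text() {
  awk -v max="${1:-4096}" '
    {
      line = $0
      gsub(/[.!?] +/, "&\n", line)      # mark sentence ends with a newline
      n = split(line, sentences, "\n")
      for (i = 1; i <= n; i++) {
        s = sentences[i]
        sub(/ +$/, "", s)               # trim trailing spaces
        if (s == "") continue
        if (buf == "") buf = s
        else if (length(buf) + 1 + length(s) <= max) buf = buf " " s
        else { print buf; buf = s }
      }
    }
    END { if (buf != "") print buf }
  '
}
```

Each output line can then become one entry in the segments array, for example with `chunk_text < long.txt | jq -R '{text: ., voice: "alloy", speaker_name: "Narrator"}' | jq -s '{segments: .}'`. This assumes jq is installed, as the test script does, and that long.txt is a file you supply.
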

Typical use cases:
- Podcasts: Multiple speakers with different voices
- Audiobooks: Single narrator with consistent voice
- Voice-overs: Professional quality audio
- Educational Content: Clear, authoritative speech
- Interactive Applications: Dynamic speech generation
- Accessibility: Text-to-speech for visually impaired users