Text-to-Speech API - Curl Examples

This document provides curl examples for using the Text-to-Speech (TTS) endpoints in the AI Content Processing API.

🎤 Available Endpoints

  1. GET /tts-voices - Get available voices and capabilities
  2. POST /text-to-speech - Convert text to speech with single voice
  3. POST /text-to-speech-podcast - Convert multiple text segments with different voices (perfect for podcasts)

🔧 Prerequisites

  1. Make sure the API server is running:

    python api_server.py
  2. Ensure your OpenAI API key is configured in your .env file:

    OPENAI_API_KEY=your_openai_api_key_here
    

📋 Get Available Voices

Get information about available voices, formats, and capabilities:

curl http://localhost:8000/tts-voices

Response:

{
  "voices": {
    "alloy": "A balanced voice, suitable for most content",
    "echo": "A warm, friendly voice",
    "fable": "A storytelling voice with character",
    "onyx": "A deep, authoritative voice",
    "nova": "A bright, energetic voice",
    "shimmer": "A soft, gentle voice"
  },
  "supported_formats": ["mp3", "opus", "aac", "flac"],
  "models": ["tts-1", "tts-1-hd"],
  "speed_range": {"min": 0.25, "max": 4.0},
  "max_text_length": 4096
}
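
The limits in this response can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `CAPABILITIES` copies the relevant fields of the sample response above (in practice you would fetch them from /tts-voices), and `validation_errors` is a hypothetical helper name. It assumes the server counts text length the same way Python's `len()` does.

```python
# Sketch: validate request parameters against the /tts-voices response.
# CAPABILITIES mirrors the sample response above; the voice descriptions
# are omitted because only the voice names are needed here.
CAPABILITIES = {
    "voices": ["alloy", "echo", "fable", "onyx", "nova", "shimmer"],
    "supported_formats": ["mp3", "opus", "aac", "flac"],
    "models": ["tts-1", "tts-1-hd"],
    "speed_range": {"min": 0.25, "max": 4.0},
    "max_text_length": 4096,
}


def validation_errors(text, voice, fmt, model, speed, caps=CAPABILITIES):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    if voice not in caps["voices"]:
        errors.append(f"unknown voice: {voice}")
    if fmt not in caps["supported_formats"]:
        errors.append(f"unsupported format: {fmt}")
    if model not in caps["models"]:
        errors.append(f"unknown model: {model}")
    low, high = caps["speed_range"]["min"], caps["speed_range"]["max"]
    if not low <= speed <= high:
        errors.append(f"speed {speed} is outside {low}-{high}")
    if len(text) > caps["max_text_length"]:
        errors.append(
            f"text is {len(text)} characters, limit is {caps['max_text_length']}"
        )
    return errors
```

An empty list means the parameters fall within the advertised limits; the server may still apply checks of its own.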

🎵 Single Voice Text-to-Speech

Basic Example

Convert text to speech using the default settings:

curl -X POST http://localhost:8000/text-to-speech \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello! Welcome to our AI Content Processing API. This is a demonstration of our text-to-speech capabilities."
  }'

Advanced Example with Custom Settings

curl -X POST http://localhost:8000/text-to-speech \
  -H "Content-Type: application/json" \
  -d '{
    "text": "This is a test of the high-quality text-to-speech system using the Onyx voice at a slower speed.",
    "voice": "onyx",
    "model": "tts-1-hd",
    "format": "mp3",
    "speed": 0.8
  }'
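
The same request can be issued from Python using only the standard library. This is a sketch under assumptions: `build_tts_request` and `send` are hypothetical helper names, and the sketch assumes that omitted optional fields fall back to server defaults, as the basic example suggests.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_tts_request(text, voice=None, model=None, fmt=None, speed=None):
    """Build (but do not send) a POST request for /text-to-speech.

    Optional fields left as None are omitted from the JSON body so that
    the server can apply its own defaults.
    """
    payload = {"text": text}
    optional = {"voice": voice, "model": model, "format": fmt, "speed": speed}
    payload.update({key: value for key, value in optional.items() if value is not None})
    return urllib.request.Request(
        f"{BASE_URL}/text-to-speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_tts_request("Hello", voice="onyx", speed=0.8))` would return the parsed response. Note that `urlopen` raises `urllib.error.HTTPError` on 4xx/5xx statuses; whether this API reports errors that way or with a 200 status and `"success": false` is not stated here.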

Different Voices Examples

Alloy (Balanced):

curl -X POST http://localhost:8000/text-to-speech \
  -H "Content-Type: application/json" \
  -d '{
    "text": "This is the Alloy voice - balanced and suitable for most content.",
    "voice": "alloy"
  }'

Echo (Warm & Friendly):

curl -X POST http://localhost:8000/text-to-speech \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hi there! This is the Echo voice - warm and friendly for conversational content.",
    "voice": "echo"
  }'

Fable (Storytelling):

curl -X POST http://localhost:8000/text-to-speech \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Once upon a time, in a land far away... This is the Fable voice, perfect for storytelling!",
    "voice": "fable"
  }'

Onyx (Deep & Authoritative):

curl -X POST http://localhost:8000/text-to-speech \
  -H "Content-Type: application/json" \
  -d '{
    "text": "This is an important announcement. The Onyx voice provides deep, authoritative speech.",
    "voice": "onyx"
  }'

Nova (Bright & Energetic):

curl -X POST http://localhost:8000/text-to-speech \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hey everyone! Welcome to our show! This is the Nova voice - bright and energetic!",
    "voice": "nova"
  }'

Shimmer (Soft & Gentle):

curl -X POST http://localhost:8000/text-to-speech \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Good evening. This is the Shimmer voice - soft and gentle, perfect for calm content.",
    "voice": "shimmer"
  }'

🎙️ Podcast Mode - Multiple Voices

Simple Podcast Example

Create a podcast with two speakers having a conversation:

curl -X POST http://localhost:8000/text-to-speech-podcast \
  -H "Content-Type: application/json" \
  -d '{
    "segments": [
      {
        "text": "Welcome to Tech Talk! I am your host Sarah, and today we are discussing artificial intelligence.",
        "voice": "nova",
        "speaker_name": "Sarah"
      },
      {
        "text": "Thanks Sarah! Hi everyone, I am Mark, and I am excited to share some insights about AI development.",
        "voice": "onyx",
        "speaker_name": "Mark"
      },
      {
        "text": "Great! So Mark, what do you think is the most exciting development in AI this year?",
        "voice": "nova",
        "speaker_name": "Sarah"
      },
      {
        "text": "Well Sarah, I think the advancement in multimodal AI systems has been remarkable. They can now process text, images, and audio simultaneously.",
        "voice": "onyx",
        "speaker_name": "Mark"
      }
    ]
  }'

Advanced Podcast Example

Create a more complex podcast with multiple speakers and a narrator:

curl -X POST http://localhost:8000/text-to-speech-podcast \
  -H "Content-Type: application/json" \
  -d '{
    "segments": [
      {
        "text": "Welcome to Story Time, where we bring classic tales to life with multiple voices.",
        "voice": "shimmer",
        "speaker_name": "Narrator"
      },
      {
        "text": "Once upon a time, there was a brave knight who lived in a castle.",
        "voice": "fable",
        "speaker_name": "Storyteller"
      },
      {
        "text": "I must find the dragon and save the kingdom!",
        "voice": "onyx",
        "speaker_name": "Knight"
      },
      {
        "text": "But first, you must solve my riddle, brave knight.",
        "voice": "echo",
        "speaker_name": "Wizard"
      },
      {
        "text": "The knight thought carefully about the words of the wizard.",
        "voice": "fable",
        "speaker_name": "Storyteller"
      },
      {
        "text": "I accept your challenge, wise wizard!",
        "voice": "onyx",
        "speaker_name": "Knight"
      }
    ],
    "model": "tts-1-hd",
    "format": "mp3",
    "speed": 1.0
  }'

High-Quality Podcast with Custom Settings

curl -X POST http://localhost:8000/text-to-speech-podcast \
  -H "Content-Type: application/json" \
  -d '{
    "segments": [
      {
        "text": "Good morning, and welcome to Business Insights. I am Jennifer, your host.",
        "voice": "nova",
        "speaker_name": "Jennifer_Host"
      },
      {
        "text": "Today we have with us Dr. Robert Chen, economist and market analyst.",
        "voice": "nova",
        "speaker_name": "Jennifer_Host"
      },
      {
        "text": "Thank you for having me, Jennifer. It is great to be here.",
        "voice": "onyx",
        "speaker_name": "Dr_Chen"
      },
      {
        "text": "Dr. Chen, what are your thoughts on the current market trends?",
        "voice": "nova",
        "speaker_name": "Jennifer_Host"
      },
      {
        "text": "Well, Jennifer, we are seeing some interesting patterns emerge in the technology sector.",
        "voice": "onyx",
        "speaker_name": "Dr_Chen"
      }
    ],
    "model": "tts-1-hd",
    "format": "flac",
    "speed": 0.9
  }'
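
The request bodies in this section repeat the same voice for every line a speaker says. One way to avoid writing that by hand is to keep a speaker-to-voice mapping and generate the segments from a list of (speaker, text) pairs. The sketch below illustrates this; `CAST`, `script`, and `build_podcast_payload` are hypothetical names, and only the request fields shown in the examples above are used.

```python
import json

# Hypothetical cast for illustration: speaker name -> voice.
CAST = {"Sarah": "nova", "Mark": "onyx"}


def build_podcast_payload(lines, cast, **options):
    """Turn (speaker, text) pairs into a /text-to-speech-podcast request body.

    Extra keyword arguments (model, format, speed) become top-level
    options, matching the layout of the examples in this section.
    """
    segments = []
    for speaker, text in lines:
        if speaker not in cast:
            raise KeyError(f"no voice assigned to speaker {speaker!r}")
        segments.append(
            {"text": text, "voice": cast[speaker], "speaker_name": speaker}
        )
    return {"segments": segments, **options}


script = [
    ("Sarah", "Welcome to Tech Talk!"),
    ("Mark", "Thanks Sarah! Hi everyone, I am Mark."),
]

# JSON text suitable for the -d argument of curl or any HTTP client.
body = json.dumps(build_podcast_payload(script, CAST, model="tts-1-hd"), indent=2)
```

Keeping the cast in one place means a voice change for a speaker is a one-line edit rather than a change to every segment.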

📝 Response Format

Single Voice Response Example

{
  "request_id": "123e4567-e89b-12d3-a456-426614174000",
  "success": true,
  "error": null,
  "audio_file": "/tmp/ai_content_process/tts/tts_alloy_20250107_143022.mp3",
  "file_size_mb": 0.15,
  "voice": "alloy",
  "model": "tts-1",
  "format": "mp3",
  "speed": 1.0,
  "text_length": 87,
  "processing_time": 2.34,
  "timestamp": "2025-01-07T14:30:22.123456"
}

Podcast Mode Response Example

{
  "request_id": "123e4567-e89b-12d3-a456-426614174001",
  "success": true,
  "segments": [
    {
      "segment_index": 0,
      "success": true,
      "error": null,
      "audio_file": "/tmp/ai_content_process/tts/podcast_20250107_143025/001_Sarah_nova.mp3",
      "file_size_mb": 0.12,
      "voice": "nova",
      "speaker_name": "Sarah",
      "original_text": "Welcome to Tech Talk! I am your host Sarah...",
      "processing_time": 1.45
    },
    {
      "segment_index": 1,
      "success": true,
      "error": null,
      "audio_file": "/tmp/ai_content_process/tts/podcast_20250107_143025/002_Mark_onyx.mp3",
      "file_size_mb": 0.18,
      "voice": "onyx",
      "speaker_name": "Mark",
      "original_text": "Thanks Sarah! Hi everyone, I am Mark...",
      "processing_time": 1.67
    }
  ],
  "total_segments": 2,
  "successful_segments": 2,
  "failed_segments": 0,
  "total_characters": 145,
  "output_directory": "/tmp/ai_content_process/tts/podcast_20250107_143025",
  "processing_time": 3.45,
  "timestamp": "2025-01-07T14:30:25.123456"
}
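
A client will usually want the segment files in playback order plus a list of anything that failed. The following sketch works on a parsed podcast response; the function names are hypothetical, and `sample_response` is a trimmed copy of the response above with the segments deliberately out of order. The `audio_file` values appear to be paths on the machine running the server, so a remote client would need some other way to fetch the files.

```python
def ordered_audio_files(response):
    """Return audio paths of successful segments, sorted by segment_index."""
    segments = sorted(response["segments"], key=lambda s: s["segment_index"])
    return [s["audio_file"] for s in segments if s["success"]]


def failed_segment_errors(response):
    """Return (segment_index, error) pairs for segments that failed."""
    return [
        (s["segment_index"], s["error"])
        for s in response["segments"]
        if not s["success"]
    ]


# Trimmed copy of the sample response, reordered to show the sorting step.
sample_response = {
    "segments": [
        {
            "segment_index": 1,
            "success": True,
            "error": None,
            "audio_file": "/tmp/ai_content_process/tts/podcast_20250107_143025/002_Mark_onyx.mp3",
        },
        {
            "segment_index": 0,
            "success": True,
            "error": None,
            "audio_file": "/tmp/ai_content_process/tts/podcast_20250107_143025/001_Sarah_nova.mp3",
        },
    ],
    "total_segments": 2,
    "successful_segments": 2,
    "failed_segments": 0,
}

files = ordered_audio_files(sample_response)
```

Comparing `len(files)` with `successful_segments` is a cheap consistency check before concatenating or publishing the audio.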

🚀 Quick Start Script

Save this as test_tts.sh and run it to test all endpoints. The script pipes each response through jq, so jq must be installed:

#!/bin/bash

echo "🎤 Testing TTS API endpoints..."

echo "1. Getting available voices..."
curl -s http://localhost:8000/tts-voices | jq .

# -s suppresses curl's progress meter so only the JSON reaches jq
echo -e "\n2. Testing single voice TTS..."
curl -s -X POST http://localhost:8000/text-to-speech \
  -H "Content-Type: application/json" \
  -d '{
    "text": "This is a test of the text-to-speech API!",
    "voice": "alloy"
  }' | jq .

echo -e "\n3. Testing podcast mode..."
curl -s -X POST http://localhost:8000/text-to-speech-podcast \
  -H "Content-Type: application/json" \
  -d '{
    "segments": [
      {
        "text": "Hello, I am the first speaker.",
        "voice": "nova",
        "speaker_name": "Speaker1"
      },
      {
        "text": "And I am the second speaker.",
        "voice": "onyx",
        "speaker_name": "Speaker2"
      }
    ]
  }' | jq .

echo -e "\n✅ TTS API testing complete!"

Make it executable and run:

chmod +x test_tts.sh
./test_tts.sh

🔧 Error Handling

Invalid Voice

curl -X POST http://localhost:8000/text-to-speech \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Testing invalid voice",
    "voice": "invalid_voice"
  }'

Response:

{
  "request_id": "...",
  "success": false,
  "error": "Invalid voice 'invalid_voice'. Available: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer']",
  "audio_file": null,
  "processing_time": 0.0,
  "timestamp": "..."
}
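
Because failures are reported in the body with `"success": false`, client code should check that flag before using `audio_file`. A minimal sketch, assuming the response has already been parsed into a dict; `TTSError` and `audio_file_or_raise` are hypothetical names, not part of the API.

```python
class TTSError(Exception):
    """Raised when the API response reports success = false."""


def audio_file_or_raise(response):
    """Return the audio_file path, or raise TTSError with the API's message."""
    if not response.get("success"):
        # Fall back to a generic message if the error field is missing or null.
        raise TTSError(response.get("error") or "unknown error")
    return response["audio_file"]
```

Raising an exception keeps a `null` `audio_file` from leaking into later steps such as file copying or playback.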

Text Too Long

curl -X POST http://localhost:8000/text-to-speech \
  -H "Content-Type: application/json" \
  -d '{
    "text": "'"$(python -c "print('x' * 5000)")"'",
    "voice": "alloy"
  }'

💡 Tips for Best Results

  1. Voice Selection:

    • Use nova for energetic hosts
    • Use onyx for authoritative speakers
    • Use shimmer for calm, gentle content
    • Use fable for storytelling
    • Use echo for friendly conversations
    • Use alloy for general-purpose content
  2. Speed Adjustment:

    • 0.25-0.75: Slow, clear speech
    • 1.0: Normal speed
    • 1.5-4.0: Fast speech for energetic content
  3. Format Selection:

    • mp3: Good balance of quality and size
    • flac: Highest quality, larger files
    • opus: Good for web streaming
    • aac: Good for mobile apps
  4. Text Length:

    • Maximum 4096 characters per segment
    • Use podcast mode for longer content
    • Break text at natural points (sentences, paragraphs)
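
The text-length tips can be sketched as a small splitter that breaks long text at sentence boundaries and turns each chunk into a podcast segment. This is an illustration, not the server's behavior: `split_text` and `to_segments` are hypothetical helpers, the sentence detection is a simple punctuation rule, and the sketch assumes the 4096 limit is counted in characters as Python's `len()` counts them.

```python
import re

MAX_CHARS = 4096  # max_text_length reported by /tts-voices


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters.

    Chunks end after sentence punctuation (. ! ?) where possible. A single
    sentence longer than the limit is cut at the limit as a last resort.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        while len(sentence) > limit:  # oversized sentence: hard cut
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def to_segments(text, voice, speaker_name, limit=MAX_CHARS):
    """Build podcast segments (one per chunk) that all use the same voice."""
    return [
        {"text": chunk, "voice": voice, "speaker_name": speaker_name}
        for chunk in split_text(text, limit)
    ]
```

The list returned by `to_segments` can be used as the `segments` array of a /text-to-speech-podcast request, which produces one audio file per chunk.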

🎯 Use Cases

  • Podcasts: Multiple speakers with different voices
  • Audiobooks: Single narrator with consistent voice
  • Voice-overs: Professional quality audio
  • Educational Content: Clear, authoritative speech
  • Interactive Applications: Dynamic speech generation
  • Accessibility: Text-to-speech for visually impaired users