jaibhasin/AI-Caller-Review-Collector

🎙️ AI Caller Review Collector

An intelligent AI voice agent that conducts natural phone conversations to collect customer feedback

Customers are often reluctant to leave online feedback about a recent purchase. How often, for example, do people actually review a product on Amazon after using it?

With recent advances in LLMs, text-to-speech (TTS) services, and RAG systems, we can build an AI caller that collects customer reviews and feedback in a short phone conversation, capturing suggestions and sentiment along the way.

✨ What Makes This Special

🧠 Intelligent Conversation Pipeline

  • 3-Step LLM Processing: Analyze → Plan → Generate for structured, natural responses
  • Smart Context Awareness: Remembers conversation history and adapts accordingly
  • Emotional Intelligence: Detects sentiment and responds with appropriate empathy
  • Role Consistency: Advanced role confusion detection prevents AI identity mix-ups

🎯 Natural Conversation Flow

  • Sarah Persona: Warm, professional customer service representative
  • Structured Responses: Always follows Acknowledge → Empathize → Ask pattern
  • Natural Pacing: Automatic pause insertion for human-like speech rhythm
  • Topic Tracking: Avoids repetitive questions by remembering what's been discussed
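The pacing and topic-tracking behaviour above can be illustrated with a small sketch. It is not the code in agent_voice.py. The helper names add_natural_pauses and next_topic are assumptions, as is the rule of turning a full stop between sentences into a "..." pause (modelled on the style of the example conversation). The candidate topic list is also illustrative.

```python
import re
from typing import Optional


def add_natural_pauses(text: str) -> str:
    """Turn a full stop between sentences into a '...' pause (hypothetical rule)."""
    # Match a word character, a single '.', whitespace, and more text after it.
    # Existing '...' pauses, '!' and '?' are left untouched.
    return re.sub(r"(\w)\.(\s+)(?=\S)", r"\1...\2", text.strip())


def next_topic(candidates: list[str], topics_covered: list[str]) -> Optional[str]:
    """Return the first candidate topic not yet discussed, or None if all are covered."""
    covered = {topic.lower() for topic in topics_covered}
    for topic in candidates:
        if topic.lower() not in covered:
            return topic
    return None


if __name__ == "__main__":
    print(add_natural_pauses("Thanks for that. How is the grip?"))
    print(next_topic(["grip", "durability", "price"], ["grip", "durability"]))
```

The topics_covered list here plays the same role as the field of that name in the conversation state described under Technical Features.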

🔊 High-Quality Audio Pipeline

  • Real-time Processing: WebSocket-based bidirectional audio streaming
  • Smart Format Handling: Automatic browser audio format detection and optimization
  • Natural Voice: ElevenLabs Rachel voice with optimized settings for phone conversations
  • Reliable STT: AssemblyAI with direct WebM support (no conversion needed)

🏗️ Architecture

┌─────────────────┐    WebSocket   ┌──────────────────┐
│   Frontend      │◄──────────────►│   FastAPI        │
│   (Browser)     │                │   Backend        │
└─────────────────┘                └──────────────────┘
         │                                   │
         │ MediaRecorder                     │
         │ (WebM/Opus)                       │
         ▼                                   ▼
┌─────────────────┐                ┌──────────────────┐
│ Web Audio API   │                │ 3-Step Pipeline  │
│ Audio Playback  │                │ ┌──────────────┐ │
└─────────────────┘                │ │1. Analyze    │ │
                                   │ │2. Plan       │ │
                                   │ │3. Generate   │ │
                                   │ └──────────────┘ │
                                   └──────────────────┘
                                             │
                       ┌─────────────────────┼─────────────────────┐
                       ▼                     ▼                     ▼
               ┌──────────────┐    ┌──────────────────┐    ┌──────────────┐
               │ AssemblyAI   │    │ Google Gemini    │    │ ElevenLabs   │
               │ (STT)        │    │ 2.0 Flash        │    │ (TTS)        │
               └──────────────┘    │ (LLM)            │    └──────────────┘
                                   └──────────────────┘

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Modern web browser (Chrome/Edge recommended)
  • API Keys: Google AI, ElevenLabs, AssemblyAI

Installation

  1. Clone and Setup

    git clone https://github.com/jaibhasin/AI-Caller-Review-Collector.git
    cd AI-Caller-Review-Collector
    python3 -m venv venv
    source venv/bin/activate  # macOS/Linux
    pip install -r requirements.txt
  2. Configure Environment

    # Create .env file
    echo "SECRET_KEY_GOOGLE_AI=your_google_ai_key" > .env
    echo "ELEVEN_LABS_API_KEY=your_elevenlabs_key" >> .env
    echo "ASSEMBLYAI_API_KEY=your_assemblyai_key" >> .env
  3. Launch

    # Start backend
    uvicorn app.main:app --reload
    
    # Open frontend/index.html in browser
    # Click "Start Call" and begin conversation!

🎭 Meet Sarah - Your AI Agent

Sarah is designed to be the perfect customer service representative:

  • Personality: Warm, professional, genuinely interested
  • Communication Style: Natural, conversational, never rushed
  • Intelligence: Understands context, emotions, and conversation flow
  • Consistency: Always stays in character as the company representative

Example Conversation

Sarah: "Hi there! This is Sarah calling from Lifelong. I hope you're 
       having a good day. I wanted to give you a quick call about the 
       pickleball set you got from us recently. Is this an okay time 
       to chat for just a minute?"

Customer: "Oh hi! Yeah, sure, I have a few minutes."

Sarah: "Wonderful! I'm so glad I caught you at a good time... How has 
       your experience been with the pickleball set so far?"

Customer: "It's been really great actually! The grip is so comfortable."

Sarah: "Oh that's fantastic to hear that you love the grip comfort!... 
       What specifically makes it feel so good to use?"

🔧 Technical Features

Intelligent Response Pipeline

# Every customer response goes through:
1. ANALYZE      → Extract sentiment, topic, keywords, emotion level
2. PLAN         → Decide acknowledgment style, empathy approach, follow-up
3. GENERATE     → Create natural Sarah response following structure
4. POST-PROCESS → Fix role confusion, add natural pacing
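The four stages can be sketched as plain functions chained together. This is an illustration under stated assumptions, not the prompts used in agent_voice.py. It assumes the LLM is reachable through a single llm(prompt) -> str callable and that the ANALYZE and PLAN steps return JSON. The prompt wording, the JSON field names and the fake_llm stub are hypothetical.

```python
import json
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, text out


def analyze(llm: LLM, customer_text: str) -> dict:
    """Step 1: sentiment, topic, keywords and emotion level as JSON."""
    prompt = ("Return JSON with keys sentiment, topic, keywords, emotion_level "
              f"for this customer reply: {customer_text!r}")
    return json.loads(llm(prompt))


def plan(llm: LLM, analysis: dict) -> dict:
    """Step 2: acknowledgment style, empathy approach and follow-up question."""
    prompt = ("Return JSON with keys acknowledgment, empathy, follow_up "
              f"given this analysis: {json.dumps(analysis)}")
    return json.loads(llm(prompt))


def generate(llm: LLM, plan_: dict) -> str:
    """Step 3: Sarah's reply in the Acknowledge -> Empathize -> Ask order."""
    prompt = ("You are Sarah, the company representative. Reply in 2-3 sentences: "
              f"acknowledge, empathize, then ask. Plan: {json.dumps(plan_)}")
    return llm(prompt).strip()


def post_process(reply: str) -> str:
    """Step 4 (simplified): drop a leaked speaker label such as 'Sarah:'."""
    for label in ("Sarah:", "Agent:"):
        if reply.startswith(label):
            reply = reply[len(label):].lstrip()
    return reply


def respond(llm: LLM, customer_text: str) -> str:
    return post_process(generate(llm, plan(llm, analyze(llm, customer_text))))


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for Gemini so the sketch runs offline."""
    if prompt.startswith("Return JSON with keys sentiment"):
        return json.dumps({"sentiment": "positive", "topic": "grip",
                           "keywords": ["grip", "comfortable"], "emotion_level": "high"})
    if prompt.startswith("Return JSON with keys acknowledgment"):
        return json.dumps({"acknowledgment": "enthusiastic", "empathy": "share joy",
                           "follow_up": "what makes the grip comfortable"})
    return "Sarah: That's wonderful to hear! What makes the grip feel so good?"


if __name__ == "__main__":
    print(respond(fake_llm, "The grip is so comfortable."))
```

Splitting the work into separate calls costs extra LLM round-trips, but it makes each intermediate result (analysis, plan) inspectable and storable in the conversation state.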

Advanced Conversation State

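conversation_state = {
    "topics_covered": ["grip", "durability"],  # already discussed, so not asked again
    "customer_sentiment": "positive",          # latest sentiment from the ANALYZE step
    "turn_count": 3,                           # completed customer/agent exchanges
    "last_analysis": {...},                    # most recent ANALYZE output (elided)
    "last_plan": {...},                        # most recent PLAN output (elided)
    "conversation_history": [...],             # earlier turns, kept as context (elided)
}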
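The same state could also live in a small dataclass, so that each turn updates it in one place. This is a hypothetical refactoring. The field names mirror the dictionary above, while the record_turn method, its update rules and the "neutral" default sentiment are assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class ConversationState:
    topics_covered: list[str] = field(default_factory=list)
    customer_sentiment: str = "neutral"  # assumed starting value
    turn_count: int = 0
    last_analysis: dict[str, Any] = field(default_factory=dict)
    last_plan: dict[str, Any] = field(default_factory=dict)
    conversation_history: list[dict[str, str]] = field(default_factory=list)

    def record_turn(self, customer_text: str, agent_text: str,
                    analysis: dict[str, Any], plan: dict[str, Any]) -> None:
        """Append one exchange and refresh the derived fields."""
        self.turn_count += 1
        self.conversation_history.append({"role": "customer", "text": customer_text})
        self.conversation_history.append({"role": "agent", "text": agent_text})
        self.last_analysis, self.last_plan = analysis, plan
        # Keep the previous sentiment if the analysis did not report one.
        self.customer_sentiment = analysis.get("sentiment", self.customer_sentiment)
        topic = analysis.get("topic")
        if topic and topic not in self.topics_covered:
            self.topics_covered.append(topic)


if __name__ == "__main__":
    state = ConversationState()
    state.record_turn("The grip is so comfortable.", "That's fantastic!",
                      {"sentiment": "positive", "topic": "grip"}, {"follow_up": "durability"})
    print(asdict(state))
```

asdict() converts the dataclass back into the plain dictionary shape shown above, which is convenient for logging.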

Audio Optimization

  • Browser Compatibility: Automatic format detection (WebM → MP4 → fallback)
  • Natural Speech: Optimized ElevenLabs settings with pauses and pacing
  • Reliable Processing: Direct WebM support, no ffmpeg conversion needed
  • Quality Control: Phone-optimized voice settings for clear communication
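The WebM, then MP4, then fallback order is decided in the browser, where MediaRecorder.isTypeSupported() is the standard way to probe for a format. Here the selection logic is mirrored in Python purely as an illustration. The MIME strings and the pick_recording_format helper are assumptions and are not copied from script.js.

```python
from typing import Callable

# Assumed preference order: WebM/Opus first, then plain WebM, then MP4.
PREFERRED_MIME_TYPES = [
    "audio/webm;codecs=opus",
    "audio/webm",
    "audio/mp4",
]


def pick_recording_format(is_supported: Callable[[str], bool]) -> str:
    """Return the first supported MIME type, or '' to let the browser choose.

    is_supported plays the role of MediaRecorder.isTypeSupported in the browser.
    """
    for mime in PREFERRED_MIME_TYPES:
        if is_supported(mime):
            return mime
    return ""  # fallback: record without a mimeType option (browser default)


if __name__ == "__main__":
    mp4_only = {"audio/mp4"}
    print(pick_recording_format(lambda mime: mime in mp4_only))
```

Putting WebM/Opus first matches the note above that AssemblyAI accepts WebM directly, so the common path needs no conversion.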

📁 Project Structure

AI-Caller-Review-Collector/
├── 🎯 Core Application
│   ├── app/
│   │   ├── main.py                    # FastAPI entry point
│   │   ├── api/agent_voice.py         # 3-step pipeline WebSocket handler
│   │   └── services/
│   │       └── simple_stt_service.py  # Optimized AssemblyAI integration
│   └── frontend/
│       ├── index.html                 # Modern UI with audio visualization
│       ├── script.js                  # WebSocket + Web Audio API
│       └── styles.css                 # Responsive design
├── 📚 Documentation
│   ├── structured_response_pipeline.md
│   ├── conversation_improvements.md
│   └── role_confusion_fix.md
├── 🧪 Testing
│   ├── test_audio_formats.html
│   └── conversation_example.md
└── ⚙️ Configuration
    ├── requirements.txt
    ├── .env
    └── README.md

🔐 Environment Configuration

Variable               Purpose                         Example
SECRET_KEY_GOOGLE_AI   Gemini 2.0 Flash API access     AIzaSy...
ELEVEN_LABS_API_KEY    Rachel voice synthesis          sk_...
ASSEMBLYAI_API_KEY     Real-time speech recognition    a13c86...
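Before starting uvicorn, it can help to fail fast when a key is missing. The following is a minimal stdlib-only check, assuming the three variable names in the table above. The backend itself may load .env differently, for example with a library such as python-dotenv. The helper names parse_env_text and missing_keys are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping

REQUIRED_KEYS = ("SECRET_KEY_GOOGLE_AI", "ELEVEN_LABS_API_KEY", "ASSEMBLYAI_API_KEY")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Mapping[str, str]) -> list[str]:
    """Return the required keys that are absent or empty in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    merged = parse_env_text(env_path.read_text(encoding="utf-8")) if env_path.exists() else {}
    merged.update(os.environ)  # real environment variables take precedence
    missing = missing_keys(merged)
    print("Missing keys: " + ", ".join(missing) if missing else "All API keys found.")
```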

🎨 API Endpoints

REST API

  • GET / - Health check and system status
  • GET /docs - Interactive API documentation (Swagger UI)

WebSocket API

  • WS /api/agent/voice - Real-time voice conversation endpoint
    • Accepts: WebM/Opus audio chunks
    • Returns: JSON conversation data + MP3 audio chunks
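With MediaRecorder, only the first chunk of a recording carries the container header; later chunks continue the same stream. A server could therefore sanity-check the format once, at the start of a call. The following minimal sketch assumes the first binary frame received is the recorder's first chunk. The sniff_audio_container helper is hypothetical and not part of the repository. The magic bytes are the standard ones for each container.

```python
EBML_MAGIC = b"\x1a\x45\xdf\xa3"  # first four bytes of every WebM/Matroska file


def sniff_audio_container(first_chunk: bytes) -> str:
    """Guess the container of a recording from its first bytes."""
    if first_chunk.startswith(EBML_MAGIC):
        return "webm"
    if len(first_chunk) >= 8 and first_chunk[4:8] == b"ftyp":
        return "mp4"  # ISO BMFF: a 4-byte box size, then the 'ftyp' box type
    if first_chunk.startswith(b"OggS"):
        return "ogg"
    return "unknown"


if __name__ == "__main__":
    print(sniff_audio_container(b"\x1a\x45\xdf\xa3\x01\x00"))
```

An "unknown" result on the first frame is a cheap signal to log or reject the stream before any audio is sent to the STT service.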

🐛 Troubleshooting

Common Issues

🎀 Microphone Not Working

# Check browser permissions
# Ensure HTTPS or localhost
# Verify getUserMedia and MediaRecorder support (needed for microphone capture)

🔌 WebSocket Connection Failed

# Verify backend is running: http://localhost:8000
# Check firewall settings
# Confirm CORS configuration

🤖 AI Role Confusion

# Automatic detection and fixing implemented
# Check console for "[DEBUG] Fixed role confusion" messages
# Review conversation_state in logs
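The repository's exact detection rules live in agent_voice.py and are described in role_confusion_fix.md. As a rough illustration only, one simple approach is to strip a leaked speaker label and cut the reply off where the model starts writing the customer's lines. The regular expressions and the fix_role_confusion name are assumptions.

```python
import re

# Hypothetical patterns: a reply that begins with the agent's own label, or that
# runs on into a "Customer:" line, means the model has lost track of its role.
LEADING_LABEL = re.compile(r"^\s*(Sarah|Agent|Assistant)\s*:\s*", re.IGNORECASE)
CUSTOMER_TURN = re.compile(r"\n?\s*(Customer|User)\s*:", re.IGNORECASE)


def fix_role_confusion(reply: str) -> tuple[str, bool]:
    """Return (cleaned_reply, was_fixed)."""
    cleaned = LEADING_LABEL.sub("", reply, count=1)
    match = CUSTOMER_TURN.search(cleaned)
    if match:
        cleaned = cleaned[:match.start()]  # drop everything written as the customer
    cleaned = cleaned.strip()
    return cleaned, cleaned != reply.strip()


if __name__ == "__main__":
    print(fix_role_confusion("Sarah: Great to hear! What do you like most?\nCustomer: The grip."))
```

Returning a was_fixed flag lets the caller decide whether to log the correction, which is the kind of event the debug messages above report.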

🔊 Audio Quality Issues

# Test browser audio format support: open test_audio_formats.html
# Check ElevenLabs API quota
# Verify voice settings in agent_voice.py

🚀 Advanced Usage

Custom Voice Configuration

# In agent_voice.py, modify:
VOICE_ID = "21m00Tcm4TlvDq8ikWAM"  # Rachel (default)
# Or try: "pNInz6obpgDQGcFmaJgB"  # Adam
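For reference, ElevenLabs' public text-to-speech endpoint takes the voice ID in the URL path and the API key in an xi-api-key header, and it returns audio (MP3 by default). The following sketch only builds the request and does not send it. It assumes the eleven_multilingual_v2 model. The voice_settings values are placeholders, not the tuned values in agent_voice.py.

```python
import json
import urllib.request

VOICE_ID = "21m00Tcm4TlvDq8ikWAM"  # Rachel (default in this README)


def build_tts_request(text: str, api_key: str,
                      voice_id: str = VOICE_ID) -> urllib.request.Request:
    """Build (but do not send) an ElevenLabs text-to-speech request."""
    body = {
        "text": text,
        "model_id": "eleven_multilingual_v2",  # assumption: any available TTS model works
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},  # placeholders
    }
    return urllib.request.Request(
        url=f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json",
                 "Accept": "audio/mpeg"},
        method="POST",
    )


# Sending it (requires a real key) returns the audio bytes:
# with urllib.request.urlopen(build_tts_request("Hi there!", key)) as resp:
#     mp3_bytes = resp.read()
```

Swapping voices only changes the URL path, which is why trying Adam is a one-line change to VOICE_ID.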

Conversation Customization

# Modify BASE_PROMPT in agent_voice.py for different:
# - Product types
# - Company personas  
# - Conversation styles
# - Response structures
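One way to make these customizations without hand-editing prose is to keep the prompt as a template with named slots. The template text, the slot names and build_base_prompt are hypothetical; the actual BASE_PROMPT in agent_voice.py is not reproduced here. The company and product values in the example come from the example conversation.

```python
from string import Template

# Hypothetical template; the real BASE_PROMPT in agent_voice.py may differ.
PROMPT_TEMPLATE = Template(
    "You are $agent_name, a $tone customer service representative calling from "
    "$company about the $product the customer bought recently. "
    "Each reply must acknowledge, empathize, then ask one question. "
    "Never speak as the customer."
)


def build_base_prompt(company: str, product: str,
                      agent_name: str = "Sarah", tone: str = "warm, professional") -> str:
    """Fill the template; Template.substitute raises KeyError if a slot is missing."""
    return PROMPT_TEMPLATE.substitute(agent_name=agent_name, tone=tone,
                                      company=company, product=product)


if __name__ == "__main__":
    print(build_base_prompt("Lifelong", "pickleball set"))
```

string.Template is used instead of str.format so that literal braces in a longer prompt (for example, JSON examples) do not need escaping.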

Pipeline Tuning

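# Adjust LLM settings:
llm_settings = dict(
    temperature=0.8,  # Creativity (0.1-1.0): higher = more varied wording
    max_tokens=150,   # Upper bound on response length, in tokens
    top_p=0.9,        # Nucleus sampling: lower = more focused word choice
)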
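To see what the two sampling settings do, here is a toy illustration over made-up next-token scores; it does not reproduce Gemini's internals. Temperature divides the logits before the softmax, so lower values sharpen the distribution. top_p then keeps only the smallest set of most-likely tokens whose cumulative probability reaches p, and renormalizes them.

```python
import math


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Lower temperature sharpens the distribution; higher flattens it."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def nucleus(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(kept.values())
    return {tok: p / norm for tok, p in kept.items()}  # renormalize the survivors


# Made-up scores for the word after "That's ..."
LOGITS = {"wonderful": 2.0, "great": 1.5, "fantastic": 1.0, "okay": -1.0}

if __name__ == "__main__":
    print(nucleus(softmax_with_temperature(LOGITS, 0.8), 0.9))
```

With temperature=0.8 and top_p=0.9, the unlikely "okay" is cut entirely. The three enthusiastic words remain possible, which gives varied but on-tone replies.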

📊 Performance Metrics

  • Response Time: ~2-3 seconds (STT + LLM Pipeline + TTS)
  • Audio Quality: 16kHz, optimized for voice clarity
  • Conversation Length: Automatically managed (5-7 turns typical)
  • Success Rate: 95%+ natural conversation flow
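The response time above is the sum of three sequential stages, so it helps to time each stage separately when tuning. The following is a minimal stdlib timer. The stage names and the time.sleep calls are stand-ins for the real STT, LLM and TTS calls, used here for illustration only.

```python
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def timed(stage: str, timings: dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the with-block under stage, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


def run_fake_turn() -> dict[str, float]:
    timings: dict[str, float] = {}
    with timed("stt", timings):
        time.sleep(0.01)  # stand-in for AssemblyAI transcription
    with timed("llm_pipeline", timings):
        time.sleep(0.02)  # stand-in for Analyze + Plan + Generate
    with timed("tts", timings):
        time.sleep(0.01)  # stand-in for ElevenLabs synthesis
    timings["total"] = sum(timings.values())
    return timings


if __name__ == "__main__":
    for stage, seconds in run_fake_turn().items():
        print(f"{stage:>13}: {seconds * 1000:.0f} ms")
```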

🀝 Contributing

This is a portfolio project, but suggestions and improvements are welcome!

  1. Fork the repository
  2. Create a feature branch
  3. Make your improvements
  4. Submit a pull request

📄 License

MIT License - Feel free to use this project for learning and development.

👤 Author

Jai Bhasin

🙏 Acknowledgments

  • Google AI - Gemini 2.0 Flash LLM
  • ElevenLabs - Natural voice synthesis
  • AssemblyAI - Real-time speech recognition
  • FastAPI - Modern Python web framework
  • LangChain - LLM application framework

Built with ❤️ for natural human-AI conversation

About

AI caller for collecting reviews. Businesses can use it to gather feedback on the products and services they have recently sold.
