
SOTA Voice AI Receptionist - FamilyHealth Clinic

A state-of-the-art AI-powered voice assistant and receptionist designed for dental clinics. Built with LangGraph, Groq, Pinecone, and FastAPI, this system provides human-like conversational experiences with full RAG-enabled knowledge recall and automated appointment scheduling.

🚀 Key Features

  • Ultra-Low Latency Conversational Flow: Optimized with Groq (Llama-3.1-8b and GPT-OSS-20B) to deliver response times under ~1 second.
  • RAG-Enabled Knowledge Base: Context-aware retrieval from Pinecone allows the bot to answer complex FAQs about clinic procedures, hours, and pricing.
  • Automated Scheduling: Direct integration with Google Calendar API to check availability and book appointments in real-time.
  • Persistent State Management: Uses LangGraph's checkpointer to maintain "long-term memory" across multiple turns.
  • Proactive Latency Masking: Implements filler messages and early RAG pre-fetching to eliminate silence during heavy API processing.
  • Multi-Channel Delivery: Native support for high-stakes voice (Vapi) and visual web chat.

📞 Vapi Integration (Custom LLM)

This backend is designed as a Custom LLM Provider for Vapi. It implements the Vapi Custom LLM protocol via SSE (Server-Sent Events).

Connection Steps:

  1. Deploy the Backend: Use ngrok or another publicly reachable host (e.g. a VPS) to expose the FastAPI server.
  2. Vapi Configuration:
    • Set Model: custom-llm
    • Set URL: https://your-domain.com/chat/completions
  3. Internal Logic:
    • The /chat/completions endpoint maps incoming Vapi payloads to LangGraph thread_ids.
    • It supports the model parameter passed by Vapi.
    • It extracts the call.id for persistent checkpointing.

💻 Web Chat Interface

The project includes a separate, modern Frontend Web Chat UI.

Features:

  • Streaming Response: Real-time SSE streaming for a "typing" effect.
  • Status Indicators: Displays [SIGNAL_PROCESSING] when the AI is performing RAG or Calendar lookups.
  • Thread Persistence: Each web session maintains its own threadId for consistent memory.

Setup:

  1. Navigate to the frontend directory.
  2. Run npm install followed by npm run build.
  3. The FastAPI server automatically serves the build at the /chat endpoint.

🏗 Architecture & Logic Flow

The core of the assistant is a directed acyclic graph (DAG) managed by LangGraph.

```mermaid
graph TD
    Entry([User Message]) --> Handshake[Early Handshake]
    Handshake --> Classify[Intent Classification]
    
    Classify --> Router{Intent Router}
    
    %% Standard Flows
    Router -- Greeting --> HandleGreeting[Handle Greeting]
    Router -- FAQ Query --> RAG[RAG Retrieval]
    Router -- Cancellation --> HandleCancel[Handle Cancellation]
    
    %% High-Latency Voice Flows (Masked by Filler)
    Router -- Appointment Check --> StatusFiller[Provide Filler]
    Router -- Search Availability --> AvailFiller[Provide Filler]
    Router -- Confirm Booking --> ConfirmFiller[Provide Filler]
    
    %% Booking Logic
    Router -- Booking Request --> Capture[Capture Details]
    Capture -- Missing Info --> END
    Capture -- Has Info --> AvailFiller
    
    %% Transitions from Filler
    StatusFiller --> CheckStatus[Check Appointment Status]
    AvailFiller --> CheckAvail[Check Availability]
    ConfirmFiller --> FinalConfirm[Confirm Appointment]
    
    %% Exit Points
    HandleGreeting --> END
    RAG --> END
    HandleCancel --> END
    CheckStatus --> END
    CheckAvail --> END
    FinalConfirm --> END
    
    style Handshake fill:#f96,stroke:#333
    style AvailFiller fill:#bbf,stroke:#333
    style StatusFiller fill:#bbf,stroke:#333
    style ConfirmFiller fill:#bbf,stroke:#333
```

Advanced Parallelization

In the classify_intent node, the system pre-fetches RAG results in a parallel asyncio.create_task. While the Intent Classifier (Llama 8B) is determining the turn's goal, the Vector Database (Pinecone) is already retrieving relevant clinic data. This cuts response latency by ~30-40%.
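
The pattern is plain `asyncio` task overlap. A self-contained sketch, with `sleep` calls standing in for the real Pinecone and Groq round trips:

```python
# Sketch of the pre-fetch pattern: start the RAG lookup immediately,
# classify the intent concurrently, then await the already-running lookup.
import asyncio

async def fetch_rag(query: str) -> list[str]:
    await asyncio.sleep(0.2)  # simulated Pinecone round trip
    return [f"doc about {query}"]

async def classify(query: str) -> str:
    await asyncio.sleep(0.2)  # simulated Llama-8B classification
    return "faq"

async def handle_turn(query: str) -> tuple[str, list[str]]:
    rag_task = asyncio.create_task(fetch_rag(query))  # kicked off early
    intent = await classify(query)                    # runs concurrently
    docs = await rag_task                             # usually already done
    return intent, docs

intent, docs = asyncio.run(handle_turn("opening hours"))
```

Sequentially the two 0.2s calls would take ~0.4s; overlapped, the turn finishes in ~0.2s, which is where the ~30-40% latency saving comes from.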


🛠 Tech Stack

  • Core: Python 3.10+, FastAPI
  • Orchestration: LangGraph (LangChain)
  • Large Language Models:
    • Groq/Llama-3.1-8b-instant (Intent Classification)
    • Groq/GPT-OSS-20B (Primary Brain/Reasoning)
  • Vector Database: Pinecone
  • Embeddings: FastEmbed (BAAI/bge-small-en-v1.5)
  • External APIs: Google Calendar, Twilio
  • State Persistence: SQLite (Checkpointer)

⚙️ Environment Variables

Copy .env.example to .env and fill in:

  • GROQ_API_KEY: For ultra-fast inference.
  • OPENAI_API_KEY: Fallback or primary brain.
  • PINECONE_API_KEY & PINECONE_INDEX_NAME: Knowledge base.
  • GOOGLE_CALENDAR_TOKEN_JSON: Encoded credentials for scheduling.
  • TWILIO_ACCOUNT_SID & TWILIO_AUTH_TOKEN: For SMS notifications.
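
A small stdlib-only startup check can catch missing variables before the first request. The `REQUIRED` list below is an illustrative subset, and the sketch assumes `.env` has already been exported into the process environment (e.g. by `python-dotenv` or your shell):

```python
# Fail fast at startup if required configuration is absent.
import os

REQUIRED = ["GROQ_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX_NAME"]

def check_env() -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED if not os.getenv(name)]

missing = check_env()
if missing:
    print("Missing env vars:", ", ".join(missing))
```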

📜 Design Philosophy

Built for FamilyHealth Clinic, this receptionist is engineered for functional purity. By leveraging LangGraph's update-based state-return pattern, where each node returns only the keys it changed, the assistant avoids memory wipes and maintains a robust, append-only history of the conversation, ensuring it follows the persona rules of "Daniel," our helpful clinic receptionist.
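
The update-based return pattern looks like this in node code: a node returns a partial update, and channels annotated with a reducer such as `operator.add` accumulate rather than overwrite. A simplified sketch, not the project's actual node:

```python
# LangGraph merges each node's returned dict into the state; the `add`
# reducer makes `history` append-only, so prior turns are never wiped.
from operator import add
from typing import Annotated, TypedDict

class State(TypedDict):
    history: Annotated[list[str], add]  # accumulated across turns
    reply: str                          # overwritten each turn

def respond(state: State) -> dict:
    reply = "This is Daniel, how can I help?"
    # Return only the updates, never a mutated copy of the whole state.
    return {"history": [f"assistant: {reply}"], "reply": reply}
```

Because nodes never mutate the incoming state, replaying a checkpointed thread always reconstructs the same conversation.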


📚 License

This project is licensed under the MIT License - see the LICENSE file for details.


📧 Contact

Ben Onwurah - @onwurahben

Project Link: https://github.com/onwurahben/voice-ai-bot


Built with ❤️ using Groq, LangGraph, Pinecone, and FastAPI
