A state-of-the-art AI-powered voice assistant and receptionist designed for dental clinics. Built with LangGraph, Groq, Pinecone, and FastAPI, this system provides human-like conversational experiences with full RAG-enabled knowledge recall and automated appointment scheduling.
- Ultra-Low Latency Conversational Flow: Optimized with Groq (Llama-3.1-8b and GPT-OSS-20B) to deliver response times under ~1 second.
- RAG-Enabled Knowledge Base: Context-aware retrieval from Pinecone allows the bot to answer complex FAQs about clinic procedures, hours, and pricing.
- Automated Scheduling: Direct integration with Google Calendar API to check availability and book appointments in real-time.
- Persistent State Management: Uses LangGraph's checkpointer to maintain "long-term memory" across multiple turns.
- Proactive Latency Masking: Implements filler messages and early RAG pre-fetching to eliminate silence during heavy API processing.
- Multi-Channel Delivery: Native support for high-stakes voice (Vapi) and visual web chat.
This backend is designed as a Custom LLM Provider for Vapi. It implements the Vapi Custom LLM protocol via SSE (Server-Sent Events).
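The streamed reply follows the OpenAI chat-completion chunk format that Vapi expects from a custom LLM endpoint. A minimal sketch of the SSE framing (the `sse_chunks` helper is illustrative, not the project's actual code):

```python
import json

def sse_chunks(deltas, model="custom-llm"):
    """Yield OpenAI-style chat.completion.chunk events as SSE frames."""
    for text in deltas:
        chunk = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [
                {"index": 0, "delta": {"content": text}, "finish_reason": None}
            ],
        }
        # Each SSE event is a "data: ..." line terminated by a blank line
        yield f"data: {json.dumps(chunk)}\n\n"
    # Vapi (like OpenAI clients) treats [DONE] as end-of-stream
    yield "data: [DONE]\n\n"

frames = list(sse_chunks(["Hello", " there"]))
```

In FastAPI, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.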
- Deploy the Backend: Use ngrok or a publicly reachable host to expose the FastAPI server.
- Vapi Configuration:
  - Set Model: `custom-llm`
  - Set URL: `https://your-domain.com/chat/completions`
- Internal Logic:
  - The `/chat/completions` endpoint maps incoming Vapi payloads to LangGraph `thread_id`s.
  - It supports the `model` parameter passed by Vapi.
  - It extracts the `call.id` for persistent checkpointing.
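The payload-to-thread mapping can be sketched as plain Python; the field names (`call.id`, `model`) follow the Vapi Custom LLM request shape, while `extract_thread_config` and the `web-default` fallback are illustrative, not the project's actual code:

```python
def extract_thread_config(payload: dict) -> dict:
    """Derive the LangGraph run config from an incoming Vapi payload.

    Vapi sends an OpenAI-style body plus a `call` object; reusing the
    stable `call.id` as the checkpointer thread id means every turn of
    the same phone call resumes the same conversation state.
    """
    call_id = payload.get("call", {}).get("id", "web-default")
    return {
        "model": payload.get("model", "custom-llm"),
        "configurable": {"thread_id": call_id},
    }

config = extract_thread_config({
    "model": "custom-llm",
    "messages": [{"role": "user", "content": "Hi"}],
    "call": {"id": "call_abc123"},
})
```

The `configurable.thread_id` dict is the shape LangGraph expects when invoking a graph compiled with a checkpointer.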
The project includes a separate, modern Frontend Web Chat UI.
- Streaming Response: Real-time SSE streaming for a "typing" effect.
- Status Indicators: Displays `[SIGNAL_PROCESSING]` when the AI is performing RAG or Calendar lookups.
- Thread Persistence: Each web session maintains its own `threadId` for consistent memory.

- Navigate to the `frontend` directory.
- Run `npm install` followed by `npm run build`.
- The FastAPI server automatically serves the build at the `/chat` endpoint.
The core of the assistant is a directed acyclic graph (DAG) managed by LangGraph.
```mermaid
graph TD
    Entry([User Message]) --> Handshake[Early Handshake]
    Handshake --> Classify[Intent Classification]
    Classify --> Router{Intent Router}

    %% Standard Flows
    Router -- Greeting --> HandleGreeting[Handle Greeting]
    Router -- FAQ Query --> RAG[RAG Retrieval]
    Router -- Cancellation --> HandleCancel[Handle Cancellation]

    %% High-Latency Voice Flows (Masked by Filler)
    Router -- Appointment Check --> StatusFiller[Provide Filler]
    Router -- Search Availability --> AvailFiller[Provide Filler]
    Router -- Confirm Booking --> ConfirmFiller[Provide Filler]

    %% Booking Logic
    Router -- Booking Request --> Capture[Capture Details]
    Capture -- Missing Info --> END
    Capture -- Has Info --> AvailFiller

    %% Transitions from Filler
    StatusFiller --> CheckStatus[Check Appointment Status]
    AvailFiller --> CheckAvail[Check Availability]
    ConfirmFiller --> FinalConfirm[Confirm Appointment]

    %% Exit Points
    HandleGreeting --> END
    RAG --> END
    HandleCancel --> END
    CheckStatus --> END
    CheckAvail --> END
    FinalConfirm --> END

    style Handshake fill:#f96,stroke:#333
    style AvailFiller fill:#bbf,stroke:#333
    style StatusFiller fill:#bbf,stroke:#333
    style ConfirmFiller fill:#bbf,stroke:#333
```
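The router's branching can be sketched as a plain dispatch table; the intent labels and node names mirror the diagram above, though the project's actual identifiers may differ:

```python
# Maps a classified intent to the next graph node (illustrative names).
ROUTES = {
    "greeting": "handle_greeting",
    "faq": "rag_retrieval",
    "cancellation": "handle_cancellation",
    # High-latency intents detour through a filler node first (latency masking)
    "appointment_check": "status_filler",
    "search_availability": "avail_filler",
    "confirm_booking": "confirm_filler",
    "booking_request": "capture_details",
}

def route_intent(state: dict) -> str:
    """Pick the next node from the classified intent; unknown intents fall back to RAG."""
    return ROUTES.get(state.get("intent"), "rag_retrieval")
```

In LangGraph terms, a function of this shape would be registered as the conditional-edge selector on the router node.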
In the `classify_intent` node, the system pre-fetches RAG results in a parallel `asyncio.create_task`. While the intent classifier (Llama 8B) is determining the turn's goal, the vector database (Pinecone) is already retrieving relevant clinic data. This cuts response latency by ~30-40%.
- Core: Python 3.10+, FastAPI
- Orchestration: LangGraph (LangChain)
- Large Language Models:
- Groq/Llama-3.1-8b-instant (Intent Classification)
- Groq/GPT-OSS-20B (Primary Brain/Reasoning)
- Vector Database: Pinecone
- Embeddings: FastEmbed (BAAI/bge-small-en-v1.5)
- External APIs: Google Calendar, Twilio
- State Persistence: SQLite (Checkpointer)
Copy `.env.example` to `.env` and fill in:
- `GROQ_API_KEY`: For ultra-fast inference.
- `OPENAI_API_KEY`: Fallback or primary brain.
- `PINECONE_API_KEY` & `PINECONE_INDEX_NAME`: Knowledge base.
- `GOOGLE_CALENDAR_TOKEN_JSON`: Encoded credentials for scheduling.
- `TWILIO_ACCOUNT_SID` & `TWILIO_AUTH_TOKEN`: For SMS notifications.
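A minimal `.env` sketch; the variable names come from the list above, all values are placeholders:

```
GROQ_API_KEY=gsk_xxx
OPENAI_API_KEY=sk-xxx
PINECONE_API_KEY=pc-xxx
PINECONE_INDEX_NAME=dental-clinic-kb
GOOGLE_CALENDAR_TOKEN_JSON=base64-encoded-token
TWILIO_ACCOUNT_SID=ACxxxxxxxx
TWILIO_AUTH_TOKEN=xxxx
```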
Built for FamilyHealth Clinic, this receptionist is engineered for Functional Purity. By leveraging LangGraph's update-based state return pattern, the assistant avoids memory wipes and maintains a robust, immutable history of the conversation, ensuring it follows the persona rules of "Daniel," our helpful clinic receptionist.
This project is licensed under the MIT License - see the LICENSE file for details.
Ben Onwurah - @onwurahben
Project Link: https://github.com/onwurahben/voice-ai-bot
Built with ❤️ using Groq, LangGraph, Pinecone, and FastAPI