A modular, voice-enabled AI chatbot built with Python, OpenAI, and Gradio. Supports multiple bot personas (travel agent, financial advisor, call center rep, interview coach) with real-time speech-to-text input and text-to-speech responses. Deployed on AWS EC2 via Docker.
My Contributions: Built on an open-source GenAI voice framework. I extended it with an interview coach persona and handled end-to-end deployment to AWS EC2 using Docker — including server setup, swap configuration, containerization, and making the app publicly accessible.
- 🎙️ Speaks and listens — uses your microphone for input and plays responses through your speakers
- 🤖 Multiple bot personas — switch between travel, financial, call center, and interview coach bots via context files
- 🌐 Web UI — interactive Gradio interface accessible in the browser
- ☁️ Cloud deployed — runs on AWS EC2 (free tier) via Docker
- 🔊 Flexible audio — supports OpenAI Whisper API for transcription and gTTS/pyttsx3 for speech output
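The listen → think → speak loop behind these features can be pictured as a composition of three stages. This is a simplified illustration, not the project's actual code: the stage names (`transcribe`, `generate_reply`, `synthesize`) are placeholders for the Whisper, GPT-3.5 Turbo, and gTTS/pyttsx3 calls.

```python
from typing import Callable

def run_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],    # STT stage, e.g. OpenAI Whisper API
    generate_reply: Callable[[str], str],  # LLM stage, e.g. GPT-3.5 Turbo
    synthesize: Callable[[str], bytes],    # TTS stage, e.g. gTTS
) -> bytes:
    """One conversational turn: mic audio in, spoken reply audio out."""
    user_text = transcribe(audio_in)
    reply_text = generate_reply(user_text)
    return synthesize(reply_text)

# Stub stages to show the data flow without calling any external API.
reply_audio = run_turn(
    b"fake-mic-bytes",
    transcribe=lambda audio: "What's the weather in Paris?",
    generate_reply=lambda text: f"You asked: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)
print(reply_audio)
```

Keeping the stages as plain callables like this is what makes it easy to swap, say, a local Whisper model for the hosted API without touching the rest of the loop.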
🌐 Live Demo: http://50.18.70.71/
| Layer | Technology |
|---|---|
| Language | Python 3.10+ |
| LLM | OpenAI GPT-3.5 Turbo |
| Speech-to-Text | OpenAI Whisper API, SpeechRecognition |
| Text-to-Speech | gTTS, pyttsx3 |
| Audio Processing | PyDub, ffmpeg, PyAudio |
| Web UI | Gradio |
| Dependency Management | Poetry |
| Containerization | Docker + Docker Compose |
| Cloud Deployment | AWS EC2 (t2.micro, Free Tier) |
Each persona is powered by a dedicated context file in the data/ directory:
| Bot | Context File | Description |
|---|---|---|
| Travel Agent | travel_bot_context.txt | Answers travel and itinerary questions |
| Financial Advisor | financial_bot_context.txt | Answers financial questions |
| Call Center Rep | call_center_prompt_with_intents_categories_context.txt | Handles customer service queries with intent classification |
| Interview Coach | interview_coach_context.txt | Helps candidates prep for job interviews |
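Switching personas amounts to loading a different context file as the system prompt. A minimal sketch of how such a loader might look — the helper name `build_messages` is an assumption for illustration, not the repo's actual API:

```python
from pathlib import Path

def build_messages(persona_file: str, user_text: str, data_dir: str = "data") -> list[dict]:
    """Load a persona's context file and build an OpenAI-style message list."""
    context = Path(data_dir, persona_file).read_text(encoding="utf-8")
    return [
        {"role": "system", "content": context},
        {"role": "user", "content": user_text},
    ]

# Example with a temporary directory standing in for data/.
import tempfile
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "interview_coach_context.txt").write_text(
        "You are an interview coach. Help candidates prepare.", encoding="utf-8"
    )
    messages = build_messages("interview_coach_context.txt", "Mock-interview me.", data_dir=tmp)
    print(messages[0]["role"])  # system
```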
```
audioBot/
├── app/
│   ├── chatbot_gradio_runner.py       # Main Gradio app entry point
│   └── chatbot_gradio_runner.ipynb    # Jupyter notebook version
├── genai_voice/
│   ├── bots/
│   │   └── chatbot.py                 # Core ChatBot class (speak, listen, respond)
│   ├── config/
│   │   └── defaults.py                # API keys, model config
│   ├── data_utils/
│   │   └── extract_web_data.py        # Web scraping utilities for context data
│   ├── defintions/
│   │   ├── prompts.py                 # System prompts for each bot persona
│   │   └── model_response_formats.py
│   ├── logger/
│   │   └── log_utils.py               # Custom logging utility
│   ├── models/
│   │   ├── open_ai.py                 # OpenAI API wrapper (chat + streaming)
│   │   └── claude_sonnet.py           # Claude model config
│   ├── moderation/
│   │   └── responses.py               # Response filtering
│   └── processing/
│       └── audio.py                   # Audio I/O: mic input, STT, TTS, playback
├── data/                              # Context files for each bot persona
├── playground/                        # Streamlit experiments and audio tests
├── Dockerfile
├── docker-compose.yml
├── aws_deployment_guide.md
└── pyproject.toml
```
- Python 3.10 or 3.12
- Poetry
- ffmpeg
- An OpenAI API key
```bash
git clone https://github.com/kamilj62/audioBot.git
cd audioBot
```

Mac:

```bash
brew install ffmpeg
```

Linux:

```bash
apt install ffmpeg libportaudio2 portaudio19-dev
```

Windows: ffmpeg binaries are included in the libs/ directory.
```bash
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
poetry lock
poetry install
playwright install
```

Create a .env file in the root:

```
OPENAI_API_KEY=your_key_here
USE_LOCAL_WHISPER=False
```

Option A — Jupyter Notebook:

```bash
ipython kernel install --user --name=venv
jupyter notebook app/chatbot_gradio_runner.ipynb
```

Option B — Python script:

```bash
poetry run RunChatBotScript
```

This app is deployed on AWS EC2 (t2.micro free tier) using Docker. See the full step-by-step guide in aws_deployment_guide.md.
Quick summary:
- Launch a t2.micro EC2 instance (Ubuntu)
- Install Docker on the instance
- Upload your `.env` file with your OpenAI key
- Run `docker-compose up -d --build`
- Access the Gradio UI at `http://your-ec2-public-ip`
Live instance: http://50.18.70.71/
The app uses `USE_LOCAL_WHISPER=False` to offload transcription to OpenAI's API, keeping memory usage low enough to run on the 1GB t2.micro instance.
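One way the `USE_LOCAL_WHISPER` flag could gate the transcription path — a sketch only; the real routing lives in genai_voice/processing/audio.py and may differ:

```python
import os

def use_local_whisper() -> bool:
    """Read the USE_LOCAL_WHISPER flag from the environment (as set in .env)."""
    return os.environ.get("USE_LOCAL_WHISPER", "False").strip().lower() in ("1", "true", "yes")

def transcribe(audio_path: str) -> str:
    if use_local_whisper():
        # Local model: no API cost, but needs far more RAM than a t2.micro has.
        raise NotImplementedError("Local Whisper disabled on small instances")
    # API path: offload the heavy lifting to OpenAI's hosted Whisper.
    # (Placeholder return -- the real call sends the audio file to the API.)
    return f"[transcript of {audio_path} via Whisper API]"

os.environ["USE_LOCAL_WHISPER"] = "False"
print(use_local_whisper())  # False
```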
Deploying this to AWS was not straightforward — here's what I ran into and how I solved it:
- Account restrictions — My AWS account was temporarily restricted during setup, which blocked instance creation. Had to work through AWS support to get it resolved before deployment could continue.
- Out of memory crashes — The t2.micro only has 1GB of RAM. The Docker build kept failing mid-way. Fixed it by configuring 2GB of swap space on the instance, which gave the build enough headroom to complete.
- Docker permission errors — Had to add the `ubuntu` user to the `docker` group and log back in for the change to take effect. Simple fix, but not obvious from the error message.
- ffmpeg path issues — Audio processing dependencies weren't resolving correctly inside the container on Linux. Traced it back to PATH configuration differences between my local machine and the EC2 environment.
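Two of the fixes above boil down to a handful of shell commands on the instance. These are standard Ubuntu steps written from memory, not copied from aws_deployment_guide.md:

```shell
# Fix: out-of-memory during docker build -- add 2GB of swap
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab  # persist across reboots

# Fix: docker permission errors -- add the ubuntu user to the docker group
sudo usermod -aG docker ubuntu
# Log out and back in (or run `newgrp docker`) for the group change to apply
```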
The main takeaway: cloud deployment failures are rarely about one thing. You have to read the error, isolate the layer (AWS, Docker, OS, app), fix it, and move on to the next one. It took longer than expected but got there.
- Use a headset microphone for cleaner audio capture
- Record in a quiet environment
- Make sure microphone permissions are granted in your browser/OS
- If the build fails on EC2 due to RAM, build the Docker image locally, push to Docker Hub, and pull it on the server
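The last tip — building locally and pulling on the server — looks roughly like this. The image name `yourname/audiobot` is a placeholder, and the port mapping assumes Gradio's default port 7860 inside the container; adjust both to match the Dockerfile:

```shell
# On your local machine (plenty of RAM for the build)
docker build -t yourname/audiobot:latest .
docker push yourname/audiobot:latest

# On the EC2 instance (pull only -- no heavy build step)
docker pull yourname/audiobot:latest
docker run -d -p 80:7860 --env-file .env yourname/audiobot:latest
```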
Joseph Kamil — AI/ML Engineer based in Los Angeles, CA
- GitHub: @kamilj62
- Email: kamilj@umich.edu
MIT
