.env.example: 32 additions, 0 deletions
# TAP Voice Agent — Environment Variables
# Author: Dashpreet Singh <dashpreetsinghhanda@gmail.com>
# Copy this to .env and fill in your values.

# ── VAPI (Voice Orchestration) ──────────────────────────────────────────────
VAPI_API_KEY=your_vapi_api_key_here
VAPI_PHONE_NUMBER_ID=your_vapi_phone_number_id
VAPI_WEBHOOK_SECRET=your_webhook_secret_here

# ── Sarvam AI (Indic STT + TTS) ────────────────────────────────────────────
# Get key at: https://dashboard.sarvam.ai
SARVAM_API_KEY=your_sarvam_api_key_here

# ── WhatsApp Business API (Meta) ────────────────────────────────────────────
WHATSAPP_TOKEN=your_whatsapp_business_token
WHATSAPP_PHONE_NUMBER_ID=your_whatsapp_phone_number_id

# ── TAP LMS (Frappe) ────────────────────────────────────────────────────────
FRAPPE_BASE_URL=https://lms.theapprenticeproject.org
FRAPPE_API_KEY=your_frappe_api_key
FRAPPE_API_SECRET=your_frappe_api_secret

# ── OpenAI (LLM for conversation) ───────────────────────────────────────────
OPENAI_API_KEY=your_openai_api_key_here

# ── Redis (session state + rate limiting) ────────────────────────────────────
REDIS_HOST=localhost
REDIS_PORT=6379

# ── Server ───────────────────────────────────────────────────────────────────
PORT=8000
DEBUG=false
.gitignore: 3 additions, 0 deletions
.env
__pycache__/
*.pyc
README.md: 162 additions, 1 deletion (the previous single-line README, `# C4GT_2026`, is replaced by the content below)
# TAP Multilingual Voice Agent

- **Author:** Dashpreet Singh
- **Email:** dashpreetsinghhanda@gmail.com | 2024ucs0087@iitjammu.ac.in
- **Institution:** IIT Jammu, B.Tech CSE 2024–2028
- **Project:** [DMP 2026] Building Multilingual Voice Agents (The Apprentice Project)

---

## What This Is

An AI-powered multilingual voice agent that proactively calls students and parents in Hindi, Marathi, and Punjabi to re-engage them with TAP's learning platform. When a student goes inactive, Didi (the agent persona) calls them, speaks in their language, knows exactly where they left off, and gently guides them back.

Unlike an SMS, a voice call can answer "what do I do next?", which should make it more effective for government school families with low digital literacy.

---

## Architecture

```
User (student/parent phone)
      │
      ▼
VAPI Platform ──── Sarvam AI STT (Indic ASR)
      │       ──── GPT-4o-mini (Didi persona)
      │       ──── Sarvam AI TTS (Indic voices)
      ▼
FastAPI Server (this repo)
├── LanguageDetector     - script-aware Hindi/Marathi/Punjabi detection
├── DropoutRiskModel     - logistic regression on 6 LMS signals
├── NudgeOrchestrator    - decides who, how, when to call
├── ConversationFlow     - structured dialogue trees + LLM prompts
├── ExperimentFramework  - A/B testing: voice vs WhatsApp vs control
├── VAPI Client          - outbound call orchestration
├── WhatsApp Client      - voice note fallback (Sarvam TTS → Meta API)
└── Frappe Client        - TAP LMS REST API integration
```

---

## Key Innovations

### 1. Sarvam AI over Whisper
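Whisper is trained mostly on audio collected from the internet, which under-represents Indian phone-call conditions. Sarvam AI's models are built specifically for Indian languages and target what these calls actually sound like: noisy telephone lines, regional accents, and code-switching (for example, Hindi mixed with English). For calls to government school families, that fit matters more than general-purpose accuracy.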

### 2. Dropout Risk Model
A lightweight logistic regression over 6 LMS signals (inactivity days, completion %, engagement decay rate, session duration, streak, total sessions). The model learns online: every call outcome is fed back as a labelled example and updates the weights, so risk scores should improve as the system makes more calls.
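A minimal sketch of an online logistic regression over the six signals, assuming the label is 1 when the student does not return after a nudge. The feature names, starting weights, and learning rate are hypothetical placeholders, not the values in `nudge/dropout_risk.py`. Each `update` call is one stochastic gradient step on log-loss, `w += lr * (y - p) * x`.

```python
import math
from dataclasses import dataclass, field

# Hypothetical names for the six LMS signals; the order fixes the weight layout.
FEATURES = [
    "inactivity_days",
    "completion_pct",        # 0.0 to 1.0
    "engagement_decay_rate",
    "avg_session_minutes",
    "streak_days",
    "total_sessions",
]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class OnlineDropoutRisk:
    # Hypothetical starting weights: inactivity and decay raise risk,
    # progress and habit signals lower it. Real weights would be fitted on TAP data.
    weights: list = field(default_factory=lambda: [0.25, -2.0, 1.5, -0.05, -0.2, -0.02])
    bias: float = 0.0
    lr: float = 0.05  # step size for each online update

    def risk(self, x: list) -> float:
        """Estimated probability that the student drops out."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return sigmoid(z)

    def update(self, x: list, dropped_out: int) -> None:
        """One SGD step on log-loss after a call outcome (1 = did not return)."""
        error = dropped_out - self.risk(x)
        self.weights = [w + self.lr * error * xi for w, xi in zip(self.weights, x)]
        self.bias += self.lr * error


if __name__ == "__main__":
    model = OnlineDropoutRisk()
    x = [12, 0.35, 0.6, 8.0, 0, 14]  # 12 days inactive, 35% complete, ...
    before = model.risk(x)
    model.update(x, dropped_out=0)   # the student came back after the call
    print(f"risk before={before:.3f} after={model.risk(x):.3f}")
```

In practice the raw features would be standardised first; unscaled counts like `total_sessions` make a fixed learning rate jumpy.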

### 3. Conversational Memory
Redis stores the last 10 call outcomes per student. If math nudges haven't worked for Riya in 2 calls, the system switches to a different topic she responded to before.
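A sketch of the memory-plus-topic-switch rule, using an in-memory `deque(maxlen=10)` as a stand-in for a per-student Redis list so it runs without a server. The key format `outcomes:{student_id}`, the `(topic, returned)` record shape, and the helper names are hypothetical; with redis-py, the equivalent capped list is `lpush` followed by `ltrim(key, 0, 9)`.

```python
from collections import defaultdict, deque

MAX_OUTCOMES = 10  # "last 10 call outcomes per student"


class CallMemory:
    """In-memory stand-in for a capped per-student Redis list.

    With redis-py the same behaviour would be:
        r.lpush(f"outcomes:{student_id}", payload)
        r.ltrim(f"outcomes:{student_id}", 0, MAX_OUTCOMES - 1)
    """

    def __init__(self):
        self._store = defaultdict(lambda: deque(maxlen=MAX_OUTCOMES))

    def record(self, student_id: str, topic: str, returned: bool) -> None:
        # appendleft mirrors LPUSH (newest first); maxlen drops the oldest entry.
        self._store[student_id].appendleft((topic, returned))

    def recent(self, student_id: str) -> list:
        return list(self._store[student_id])


def pick_topic(memory: CallMemory, student_id: str, default: str, misses: int = 2) -> str:
    """Move off `default` if it failed on each of the last `misses` calls."""
    history = memory.recent(student_id)
    last = history[:misses]
    stuck = len(last) == misses and all(t == default and not ok for t, ok in last)
    if not stuck:
        return default
    # Prefer the most recent other topic the student did respond to.
    for topic, ok in history:
        if ok and topic != default:
            return topic
    return default


if __name__ == "__main__":
    mem = CallMemory()
    mem.record("riya", "science", True)
    mem.record("riya", "math", False)
    mem.record("riya", "math", False)
    print(pick_topic(mem, "riya", "math"))  # science
```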

### 4. Parent vs Student Routing
Grade ≤ 6 or call hour ≥ 18 → call the parent, using a separate parent script. The agent knows it is talking to a parent and adjusts its vocabulary, tone, and request accordingly.
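The routing rule can be expressed as a small pure function. This sketch assumes the hour is evaluated in IST (UTC+05:30, no daylight saving) and that callers pass timezone-aware datetimes; `choose_callee` and the constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # fixed offset; India has no DST

PARENT_MAX_GRADE = 6   # grade <= 6 -> parent
PARENT_FROM_HOUR = 18  # calls at or after 18:00 IST -> parent


def choose_callee(grade: int, when: datetime) -> str:
    """Return "parent" or "student" according to the routing rule."""
    hour = when.astimezone(IST).hour
    if grade <= PARENT_MAX_GRADE or hour >= PARENT_FROM_HOUR:
        return "parent"
    return "student"


if __name__ == "__main__":
    print(choose_callee(8, datetime(2026, 1, 10, 11, 0, tzinfo=IST)))   # student
    print(choose_callee(8, datetime(2026, 1, 10, 18, 30, tzinfo=IST)))  # parent
```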

### 5. Channel Fallback Chain
`voice_call → WhatsApp voice note → WhatsApp text`
Voice notes are expected to get 3–5× higher open rates than text messages for this demographic.
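A sketch of the fallback chain as an ordered list of `(name, sender)` pairs, where each sender returns `True` on success. The sender functions below are hypothetical stubs standing in for the VAPI and WhatsApp clients; exceptions are treated as a failed attempt so one broken channel does not stop the chain.

```python
from typing import Callable, Optional, Sequence, Tuple

# (channel name, sender); a sender returns True when the nudge got through.
Channel = Tuple[str, Callable[[str], bool]]


def nudge_with_fallback(student_id: str, channels: Sequence[Channel]) -> Optional[str]:
    """Try each channel in order; return the first one that succeeds."""
    for name, send in channels:
        try:
            if send(student_id):
                return name
        except Exception:
            # Timeouts or API errors count as a failed attempt; try the next channel.
            continue
    return None  # every channel failed; the caller can log and retry later


# Hypothetical stubs for illustration only.
def voice_call(student_id: str) -> bool:
    return False  # e.g. the call went unanswered


def whatsapp_voice_note(student_id: str) -> bool:
    return True


def whatsapp_text(student_id: str) -> bool:
    return True


CHAIN = [
    ("voice_call", voice_call),
    ("whatsapp_voice_note", whatsapp_voice_note),
    ("whatsapp_text", whatsapp_text),
]

if __name__ == "__main__":
    print(nudge_with_fallback("stu-42", CHAIN))  # whatsapp_voice_note
```

A real voice-call outcome usually arrives later through the `/webhook/vapi` event rather than as an immediate return value, so in production the next step in the chain would likely be triggered from that webhook handler.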

### 6. A/B Experimentation Built In
Every nudge is assigned to an experiment arm (control / voice / WhatsApp voice / WhatsApp text) via deterministic hashing. Return-to-platform rate is the primary metric, tracked via a Frappe webhook when the student opens the app.
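A sketch of deterministic arm assignment and the per-arm return rate. It hashes `experiment:student_id` with SHA-256 so a student always lands in the same arm across restarts; Python's built-in `hash()` is unsuitable because string hashing is randomised per process. The experiment name and arm labels are hypothetical.

```python
import hashlib

ARMS = ("control", "voice", "whatsapp_voice", "whatsapp_text")


def assign_arm(student_id: str, experiment: str = "reengagement-v1") -> str:
    """Stable mapping from (experiment, student) to one of ARMS."""
    digest = hashlib.sha256(f"{experiment}:{student_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(ARMS)
    return ARMS[bucket]


def return_rates(outcomes) -> dict:
    """outcomes: iterable of (arm, returned) pairs -> {arm: return-to-platform rate}."""
    totals, returns = {}, {}
    for arm, returned in outcomes:
        totals[arm] = totals.get(arm, 0) + 1
        returns[arm] = returns.get(arm, 0) + int(returned)
    return {arm: returns[arm] / totals[arm] for arm in totals}


if __name__ == "__main__":
    print(assign_arm("stu-42"))
    print(return_rates([("voice", True), ("voice", False), ("control", False)]))
```

Salting the hash with the experiment name means a new experiment reshuffles students instead of reusing the previous split.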

---

## Setup

```bash
git clone https://github.com/DZDasherKTB/tap-voice-agent
cd tap-voice-agent

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

cp .env.example .env
# Fill in your API keys in .env

# Start Redis
docker run -d -p 6379:6379 redis:alpine

# Run the server
python main.py
```
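The contents of `config.py` are not shown here, so this is only a sketch of how the `.env` values might be loaded with the standard library. The `Settings` fields and defaults mirror `.env.example`; `parse_env`, `load_env_file`, and `get_settings` are hypothetical names. The `python-dotenv` package (`load_dotenv()`) is a common alternative to the hand-written loader.

```python
import os
from dataclasses import dataclass


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and '#' comment lines.

    Deliberately minimal: no quoting, escapes, or inline comments.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_env_file(path: str = ".env") -> None:
    """Copy .env values into os.environ without overriding variables already set."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as fh:
        for key, value in parse_env(fh.read()).items():
            os.environ.setdefault(key, value)


@dataclass(frozen=True)
class Settings:
    redis_host: str
    redis_port: int
    port: int
    debug: bool
    sarvam_api_key: str


def get_settings() -> Settings:
    # Defaults mirror .env.example; a missing API key comes back as "".
    return Settings(
        redis_host=os.getenv("REDIS_HOST", "localhost"),
        redis_port=int(os.getenv("REDIS_PORT", "6379")),
        port=int(os.getenv("PORT", "8000")),
        debug=os.getenv("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
        sarvam_api_key=os.getenv("SARVAM_API_KEY", ""),
    )


if __name__ == "__main__":
    load_env_file()
    s = get_settings()
    print(s.redis_host, s.redis_port, s.port, s.debug)  # never print the keys
```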

---

## Run Tests

```bash
python -m pytest tests/test_suite.py -v
# Expected: all tests pass, no external services required
```

---

## API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/nudge/student` | Trigger nudge for one student |
| `POST` | `/api/nudge/batch` | Trigger batch nudge for all at-risk students |
| `GET` | `/api/experiments/report` | A/B experiment metrics |
| `POST` | `/api/detect-language` | Debug: detect language from text |
| `GET` | `/api/health` | Health check |
| `POST` | `/webhook/vapi` | VAPI call events (internal) |
| `POST` | `/webhook/lms/login` | Student login after nudge (internal) |
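The table does not specify request bodies. As a hedged example, the helper below builds a standard-library request to trigger a nudge for one student, assuming a JSON body of the form `{"student_id": ...}` and the default `PORT=8000`; check `main.py` for the real schema. `build_nudge_request` and `send` are hypothetical names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # PORT from .env.example


def build_nudge_request(student_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/nudge/student.

    The {"student_id": ...} body is an assumed schema.
    """
    body = json.dumps({"student_id": student_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/nudge/student",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_nudge_request("stu-42")
    print(req.get_method(), req.full_url)
    # print(send(req))  # requires the server from the Setup section to be running
```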

---

## Scheduler

Batch nudges run automatically:
- **10:00 AM IST** — morning session (students before school)
- **6:00 PM IST** — evening session (after school, parents home)
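The README does not say which scheduler `main.py` uses. As a sketch, the helper below (a hypothetical name) computes the next 10:00 or 18:00 IST slot strictly after a given instant; because IST is a fixed UTC+05:30 offset with no daylight saving, a fixed-offset `timezone` is sufficient.

```python
from datetime import datetime, time, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))
RUN_TIMES = (time(10, 0), time(18, 0))  # 10:00 AM and 6:00 PM IST


def next_batch_run(now: datetime) -> datetime:
    """Return the next batch-nudge time (in IST) strictly after `now` (timezone-aware)."""
    local = now.astimezone(IST)
    for day_offset in (0, 1):
        day = local.date() + timedelta(days=day_offset)
        for t in RUN_TIMES:
            candidate = datetime.combine(day, t, tzinfo=IST)
            if candidate > local:
                return candidate
    raise RuntimeError("unreachable: a run time always exists within two days")


if __name__ == "__main__":
    print(next_batch_run(datetime(2026, 3, 2, 19, 0, tzinfo=IST)))  # 2026-03-03 10:00:00+05:30
```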

---

## Supported Languages

| Language | Script | Language code | Sarvam STT | Sarvam TTS Voice |
|---|---|---|---|---|
| Hindi | Devanagari | `hi-IN` | ✅ | meera (F), arjun (M) |
| Marathi | Devanagari | `mr-IN` | ✅ | anushka (F), rajan (M) |
| Punjabi | Gurmukhi | `pa-IN` | ✅ | preet (F), gurpreet (M) |
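Since Hindi and Marathi share Devanagari while Punjabi uses Gurmukhi, a script-aware detector can split on Unicode blocks first and then use lexical cues for Hindi vs Marathi. This is an illustrative sketch, not the logic in `agent/language_detector.py`: the cue-word sets are small hypothetical samples, and the tie-break and default (Hindi) are assumptions. The letter ळ (U+0933) is used as an extra Marathi signal because it is common in Marathi and rare in standard Hindi.

```python
import re

DEVANAGARI = (0x0900, 0x097F)  # Hindi and Marathi
GURMUKHI = (0x0A00, 0x0A7F)    # Punjabi

# Illustrative cue words only; a real detector would need far larger lexicons.
MARATHI_CUES = {"आहे", "आहेत", "आणि", "नाही", "मला", "काय", "तुम्ही"}
HINDI_CUES = {"है", "हैं", "और", "नहीं", "मुझे", "क्या", "आप"}

# Codes from the table above.
LANG_CODES = {"hindi": "hi-IN", "marathi": "mr-IN", "punjabi": "pa-IN"}


def _count_in_block(text: str, block: tuple) -> int:
    lo, hi = block
    return sum(lo <= ord(ch) <= hi for ch in text)


def detect_language(text: str, default: str = "hindi") -> str:
    """Script first (Gurmukhi vs Devanagari), then word cues for Hindi vs Marathi."""
    gurmukhi = _count_in_block(text, GURMUKHI)
    devanagari = _count_in_block(text, DEVANAGARI)
    if gurmukhi == 0 and devanagari == 0:
        return default  # empty or romanised text carries no script signal
    if gurmukhi > devanagari:
        return "punjabi"
    words = re.findall(r"[^\s.,!?।]+", text)
    # 'ळ' (U+0933) is common in Marathi and rare in standard Hindi.
    marathi_score = sum(w in MARATHI_CUES for w in words) + text.count("\u0933")
    hindi_score = sum(w in HINDI_CUES for w in words)
    return "marathi" if marathi_score > hindi_score else "hindi"


def language_code(text: str) -> str:
    return LANG_CODES[detect_language(text)]


if __name__ == "__main__":
    print(language_code("मला शाळेत जायचे आहे"))  # mr-IN
```

Romanised or code-switched input (common on phone calls) gives no script signal at all, which is one reason a production detector would likely also use the ASR's own language identification.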

---

## File Structure

```
tap-voice-agent/
├── main.py # FastAPI app, webhooks, scheduler
├── config.py # Central configuration
├── requirements.txt
├── .env.example
├── agent/
│ ├── language_detector.py # Script-aware language detection
│ └── conversation_flow.py # Dialogue trees, system prompts, closings
├── lms/
│ └── frappe_client.py # TAP Frappe LMS REST client
├── nudge/
│ ├── dropout_risk.py # Logistic regression risk scorer
│ └── orchestrator.py # Master pipeline: score → call → log
├── telephony/
│ ├── vapi_client.py # VAPI outbound call client
│ └── whatsapp_client.py # WhatsApp voice note + text fallback
├── experiments/
│ └── framework.py # A/B experiment assignment + metrics
└── tests/
└── test_suite.py # 35+ unit tests, no external services needed
```

---

## Contact

**Dashpreet Singh**, IIT Jammu, B.Tech CSE 2024–2028

- dashpreetsinghhanda@gmail.com
- 2024ucs0087@iitjammu.ac.in