An agentic AI platform that prevents avoidable Medicaid coverage loss for FQHC (Federally Qualified Health Center) patients through AI-powered eligibility determination, risk-scored renewal workflows, TCPA-compliant outreach, and a caseworker dashboard.
Every year, millions of patients lose Medicaid coverage due to paperwork failures — not ineligibility. During the 2023-2024 unwinding, 69% of disenrollments were procedural. MediAssist AI addresses this with two phases:
- Phase 1 — Eligibility Copilot: A ReAct-style AI agent that determines Medicaid eligibility across all 50 states with a five-layer defense architecture ensuring healthcare-grade reliability.
- Phase 2 — Recertification Engine: A multi-agent renewal workflow with risk scoring, automated outreach, document processing, and a caseworker dashboard to prevent procedural coverage loss.
This is not a simple chatbot — it's a multi-agent system with 10+ specialized agents:
- Receives a goal (e.g., "Check eligibility for patient #3")
- Recalls prior determinations from Mem0 memory scoped per patient
- Reasons about what tools to use and in what order
- Executes tools autonomously in a loop until the task is complete
- Validates its own output against a deterministic engine and QA agent
- Saves results to memory and filesystem for future recall
- Scores renewal risk and triggers TCPA-compliant outreach sequences
- Manages renewal workflows through an 11-state state machine
- Processes documents with LLM classification and deterministic validation
- Powers a caseworker dashboard with portfolio views, alerts, and overrides
LLMs are unreliable calculators. A single LLM call is insufficient for high-stakes Medicaid determinations. This system uses five independent layers — each catches a different failure mode:
┌─────────────────────────────────────────────────────────┐
│ Layer 1: System Prompt │
│ FPL tables + 50-state rules embedded in context │
│ Catches: Basic reasoning errors │
├─────────────────────────────────────────────────────────┤
│ Layer 2: Deterministic Engine (eligibility.py) │
│ Pure Python — zero LLM involvement │
│ Catches: All math errors │
├─────────────────────────────────────────────────────────┤
│ Layer 3: Structured Output │
│ JSON schema for tool calls │
│ Catches: Format/parsing errors │
├─────────────────────────────────────────────────────────┤
│ Layer 4: Post-Hoc Guardrail │
│ Compares LLM output vs engine in real-time │
│ Catches: Hallucinated determinations │
├─────────────────────────────────────────────────────────┤
│ Layer 5: QA Agent │
│ Second LLM reviewing first with ground truth │
│ Catches: Reasoning errors the engine can't catch │
└─────────────────────────────────────────────────────────┘
Layer 4 proved its value on a real boundary case: Patient #10 (Kevin Park) has income of $21,598 — exactly $1 over the $21,597 threshold. The LLM got this wrong; the guardrail caught it and corrected the response in real time.
Layer 5 (QA Agent) checks five things independently: category, FPL table, math, citizenship, and expansion status. It receives ground truth from the deterministic engine — it's a reasoning auditor, not a coin flip.
All five layers apply to both streaming and non-streaming paths.
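The Layer 4 comparison can be sketched as follows. This is a minimal illustration of the pattern, not the repo's actual code — function and field names here are assumptions:

```python
# Sketch of the Layer 4 post-hoc guardrail: the LLM's parsed determination
# is compared against the deterministic engine's, and the engine wins on
# any mismatch. Names (apply_guardrail, corrected_by_guardrail) are
# illustrative, not the repo's actual identifiers.
def apply_guardrail(llm_result: dict, engine_result: dict) -> dict:
    match = llm_result["eligible"] == engine_result["eligible"]
    if not match:
        # Hallucinated determination: overwrite with the engine's answer
        # and flag the correction so it surfaces in the metrics payload.
        llm_result = {**llm_result,
                      "eligible": engine_result["eligible"],
                      "corrected_by_guardrail": True}
    return {**llm_result, "guardrail_match": match}
```

In the Kevin Park case above, this is the path that fires: the engine says ineligible, the LLM says eligible, and the engine's answer replaces the LLM's before the response leaves the server.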
eligibility.py is the single source of truth for eligibility math. It's a pure Python function with zero LLM involvement:
- Looks up correct FPL threshold for household size (standard + Alaska + Hawaii tables)
- Determines Medicaid expansion status for the patient's state
- Applies correct threshold percentage (138% expansion, 100% non-expansion, higher for pregnant/children)
- Returns: `eligible`, `category`, `threshold_used`, `fpl_amount`, `reasoning`
Used by three consumers:
- Layer 4 guardrail — compares LLM determination against engine in real-time
- Layer 5 QA agent — provides ground truth for reasoning audit
- Eval suite — computes expected results for all 16 seed patients
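A minimal sketch of the engine's contract, under illustrative assumptions (the 2025 single-person standard FPL base of $15,650 reproduces the $21,597 threshold from the Kevin Park example; real table values, categories, and function names live in `eligibility.py`):

```python
from dataclasses import dataclass

# Illustrative FPL tables: (first-person base, per-additional-person增). The
# actual 2025 tables embedded in the repo are the source of truth.
FPL_TABLES = {
    "standard": (15_650, 5_500),
    "alaska":   (19_550, 6_880),
    "hawaii":   (17_990, 6_325),
}

@dataclass
class Determination:
    eligible: bool
    category: str
    threshold_used: int   # percent of FPL applied
    fpl_amount: int       # dollar limit for this household
    reasoning: str

def determine_eligibility(income: int, household_size: int,
                          state: str, is_expansion: bool) -> Determination:
    # Pure Python, zero LLM involvement: table lookup + arithmetic only.
    table = "alaska" if state == "AK" else "hawaii" if state == "HI" else "standard"
    base, per_person = FPL_TABLES[table]
    fpl = base + per_person * (household_size - 1)
    threshold_pct = 138 if is_expansion else 100
    limit = fpl * threshold_pct // 100
    eligible = income <= limit  # <= keeps the exact-boundary case eligible
    return Determination(
        eligible=eligible,
        category="adult_expansion" if is_expansion else "adult",
        threshold_used=threshold_pct,
        fpl_amount=limit,
        reasoning=f"income {income} vs limit {limit} ({threshold_pct}% FPL, {table} table)",
    )
```

With these figures, an income of $21,598 for a household of one in an expansion state lands $1 over the $21,597 limit — the exact edge case the guardrail caught.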
User (Browser) ──► FastAPI Server ──► Router (Orchestrator)
│ │
├─ /dashboard ──► Dashboard UI ├─ Memory Agent (Mem0 SDK)
├─ /copilot ────► Eligibility UI ├─ Knowledge Agent (FPL/rules)
└─ / ───────────► Marketing Site ├─ Eligibility Agent (ReAct loop)
│ ├─ MCP: Postgres (patient DB)
│ ├─ MCP: Filesystem (reports)
│ └─ MCP: Fetch (banned)
├─ Correctness Eval (guardrail)
├─ Quality Eval (QA agent)
├─ Risk Scoring Agent
├─ Outreach Agent (TCPA)
├─ Document Agent (LLM + rules)
├─ Workflow Orchestrator (state machine)
└─ Caseworker Copilot (dashboard)
| Agent | LLM? | Purpose |
|---|---|---|
| Eligibility Agent | Yes | ReAct loop — queries DB, determines eligibility |
| Risk Scoring Agent | No | 5-factor deterministic risk scoring (0-1.0) |
| Outreach Agent | No | TCPA-compliant SMS templates, consent, frequency caps |
| Document Agent | Yes | LLM classifies/extracts; deterministic validation |
| Workflow Orchestrator | No | 11-state renewal state machine |
| Caseworker Copilot | Partial | Deterministic alerts + LLM summaries |
| Memory Agent | No | Mem0 SDK for per-patient memory |
| Knowledge Agent | No | FPL tables + 50-state rules |
| Correctness Eval | No | Deterministic guardrail (engine vs LLM) |
| Quality Eval | Yes | QA agent reviews determination |
The agent connects to three MCP servers via stdio:
| Server | Package | Purpose |
|---|---|---|
| PostgreSQL | `@modelcontextprotocol/server-postgres` | Query patient records from the database |
| Fetch | `mcp-server-fetch` | Web access (banned by evals — FPL data is embedded) |
| Filesystem | `@modelcontextprotocol/server-filesystem` | Save eligibility determination reports |
Mem0 runs as a direct Python SDK call, not an MCP tool:
- Why not MCP? Each MCP tool exposed to GPT adds potential API calls. Mem0 as MCP added 2 extra OpenAI round-trips per query.
- SDK approach: Memory search happens in Python before the GPT call (injected into system prompt). Memory save happens after the final response. Zero additional API calls.
- Per-patient scoping: Memories are keyed by `patient-{id}` to prevent cross-patient pollution — a HIPAA-aware design pattern.
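The scoping pattern looks roughly like this. A dict-backed store stands in for the Mem0 SDK here (the real code passes an equivalent scope key to the Mem0 client), and substring search stands in for semantic search:

```python
# Sketch of per-patient memory scoping. The class and method names are
# illustrative; in the repo this is a thin wrapper around the Mem0 SDK.
class ScopedMemory:
    def __init__(self):
        self._store = {}  # scope key -> list of memory strings

    @staticmethod
    def scope_key(patient_id: int) -> str:
        # All memories for one patient live under one key, so a lookup for
        # patient 3 can never surface patient 7's PHI.
        return f"patient-{patient_id}"

    def save(self, patient_id: int, memory: str) -> None:
        self._store.setdefault(self.scope_key(patient_id), []).append(memory)

    def search(self, patient_id: int, query: str) -> list[str]:
        # Real implementation: semantic search via Mem0; here, substring match.
        return [m for m in self._store.get(self.scope_key(patient_id), [])
                if query.lower() in m.lower()]
```

The important property is that `search` takes a patient ID, not a free-form key: callers cannot accidentally query across patients.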
The model is pinned to gpt-4o-mini-2024-07-18 (specific snapshot), not the gpt-4o-mini alias. Aliases silently resolve to new snapshots that can change behavior. Changing the model version requires a deliberate code change + regression eval run.
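One way such pinning might be enforced in code — a sketch only; the constant and helper names are assumptions, not what `config.py` actually contains:

```python
# Sketch: reject bare model aliases at startup so switching models is
# always an explicit code change that goes through the regression evals.
MODEL = "gpt-4o-mini-2024-07-18"  # pinned snapshot, never the bare alias

def assert_pinned(model: str) -> None:
    # A bare alias like "gpt-4o-mini" has no YYYY-MM-DD suffix; a pinned
    # snapshot splits into a name plus three numeric date components.
    parts = model.rsplit("-", 3)
    if len(parts) < 4 or not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"model {model!r} is not a pinned snapshot")
```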
IDENTIFIED ──► NOTIFIED ──► ENGAGED ──► DOC_COLLECTION ──► VALIDATION ──► SUBMISSION_READY ──► COMPLETED
│ │ │ │
▼ ▼ ▼ ▼
NO_RESPONSE DROPPED_OFF (reminder) DOC_COLLECTION
│ │ (invalid doc)
▼ ▼
NOTIFIED ENGAGED ◄── Recovery paths
(escalate) (re-engage)
EXPIRED ◄── Deadline passed in any non-terminal state
Every state transition is logged to audit_log for HIPAA compliance.
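The transition-validation idea can be condensed as follows. This table covers the ten states visible in the diagram above and is illustrative only — the transition table in `workflow_orchestrator.py` is the source of truth:

```python
# Sketch: a state machine as an adjacency map. Illegal transitions raise;
# legal ones are appended to the audit log before the state advances.
VALID_TRANSITIONS = {
    "IDENTIFIED":       {"NOTIFIED", "EXPIRED"},
    "NOTIFIED":         {"ENGAGED", "NO_RESPONSE", "EXPIRED"},
    "NO_RESPONSE":      {"NOTIFIED", "EXPIRED"},        # escalate, retry
    "ENGAGED":          {"DOC_COLLECTION", "DROPPED_OFF", "EXPIRED"},
    "DROPPED_OFF":      {"ENGAGED", "EXPIRED"},          # re-engage recovery
    "DOC_COLLECTION":   {"VALIDATION", "EXPIRED"},
    "VALIDATION":       {"SUBMISSION_READY", "DOC_COLLECTION", "EXPIRED"},
    "SUBMISSION_READY": {"COMPLETED", "EXPIRED"},
    "COMPLETED":        set(),  # terminal
    "EXPIRED":          set(),  # terminal
}

def transition(state: str, target: str, audit_log: list) -> str:
    if target not in VALID_TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {target}")
    audit_log.append((state, target))  # HIPAA: every transition is logged
    return target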
Deterministic 5-factor scoring (no LLM):
| Factor | Weight | Trigger |
|---|---|---|
| Deadline proximity | 0-0.30 | ≤14d: 0.30, 15-30d: 0.20, 31-60d: 0.10 |
| Prior renewal history | 0-0.25 | Lapsed: 0.25, first renewal: 0.15 |
| Response pattern | 0-0.20 | No-response >50%: 0.20 |
| Contact quality | 0-0.10 | Bounced: 0.10, unverified: 0.05 |
| Demographic complexity | 0-0.15 | Age≥65, non-English, household≥5: 0.05 each |
Four risk tiers: Critical (0.70-1.0), High (0.40-0.69), Medium (0.20-0.39), Low (0-0.19).
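The weights above translate almost directly into code. A sketch with the table's weights (parameter names are illustrative, not the repo's actual field names):

```python
# Sketch of the deterministic 5-factor risk score. Weights mirror the
# table above; no LLM is involved, so scores are fully reproducible.
def risk_score(days_to_deadline: int, lapsed_before: bool, first_renewal: bool,
               no_response_rate: float, contact_bounced: bool,
               contact_unverified: bool, age: int, non_english: bool,
               household_size: int) -> float:
    score = 0.0
    # Deadline proximity (0-0.30)
    if days_to_deadline <= 14:   score += 0.30
    elif days_to_deadline <= 30: score += 0.20
    elif days_to_deadline <= 60: score += 0.10
    # Prior renewal history (0-0.25)
    if lapsed_before:   score += 0.25
    elif first_renewal: score += 0.15
    # Response pattern (0-0.20)
    if no_response_rate > 0.50: score += 0.20
    # Contact quality (0-0.10)
    if contact_bounced:      score += 0.10
    elif contact_unverified: score += 0.05
    # Demographic complexity (0-0.15, 0.05 per factor)
    score += 0.05 * sum([age >= 65, non_english, household_size >= 5])
    return round(min(score, 1.0), 2)

def tier(score: float) -> str:
    if score >= 0.70: return "Critical"
    if score >= 0.40: return "High"
    if score >= 0.20: return "Medium"
    return "Low"
```

Because the function is pure, the Risk Scoring Eval can assert exact scores for all 16 scenarios rather than approximate ranges.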
All outreach enforces:
- Consent required — `opted_in` status checked before every message
- Quiet hours — 8am-9pm patient local time only
- Frequency caps — max 3/week, 1/day
- Opt-out — STOP/ALTO immediately blocks all messages
- Bilingual — EN + ES templates, selected by `preferred_language`
- Escalation — 2 unanswered → caseworker alert, 3 → phone outreach
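A pre-send compliance gate covering the first three rules might look like this. Field names are illustrative, and the real agent additionally handles opt-out keywords, bilingual templates, and escalation counters:

```python
from datetime import datetime

# Sketch: every outbound SMS passes this gate first. Any False means the
# message is suppressed, never queued for later.
def may_send_sms(patient: dict, history: list[datetime], now: datetime) -> bool:
    if not patient["opted_in"]:            # consent required
        return False
    if not (8 <= now.hour < 21):           # quiet hours: 8am-9pm local time
        return False
    sent_today = [t for t in history if t.date() == now.date()]
    sent_week = [t for t in history if (now - t).days < 7]
    if len(sent_today) >= 1 or len(sent_week) >= 3:   # 1/day, 3/week caps
        return False
    return True
```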
Pipeline: Classify (LLM) → Extract (LLM) → Validate (deterministic)
- Supports: pay stubs, tax returns, utility bills, ID documents
- Validation: date ranges, amount extraction, name cross-reference
- Confidence < 0.80 → routed to caseworker for manual review
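The routing decision at the end of the pipeline is deliberately simple. A sketch with the 0.80 threshold from above (document structure and field names are assumptions):

```python
# Sketch: deterministic validation gates the LLM's output. A document only
# auto-advances the workflow when both the cross-reference checks and the
# model's confidence pass; everything else goes to a human.
REVIEW_THRESHOLD = 0.80

def route_document(doc: dict) -> str:
    # Name cross-reference is a hard rule, checked regardless of confidence.
    if doc["patient_name"].lower() != doc["extracted_name"].lower():
        return "manual_review"
    if doc["confidence"] < REVIEW_THRESHOLD:
        return "manual_review"   # LLM unsure -> caseworker in the loop
    return "auto_accept"
```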
A unified web UI for non-technical FQHC caseworkers:
- Pipeline view — all patients grouped by workflow state
- Risk-ranked alerts — critical/high/medium patients needing attention
- Patient detail — eligibility chat + renewal status + timeline + actions
- Override capability — caseworker can override agent decisions with audit trail
- Embedded eligibility copilot — per-patient chat with streaming responses
Three new tables added to support renewal workflows:
| Table | Purpose |
|---|---|
| `renewals` | One row per renewal workflow — state, risk score, documents, communication log |
| `documents` | Uploaded documents with classification, extraction data, and review status |
| `audit_log` | HIPAA 6-year retention — every state transition, override, and PHI access |
Every determination is evaluated on three independent dimensions:
| Dimension | Method | Threshold |
|---|---|---|
| Correctness | Deterministic engine comparison | Must match exactly |
| Efficiency | API call count + banned tool check | ≤4 API calls, no Fetch |
| Quality | Keyword matching with alternatives | Must mention category, state, threshold |
A determination can pass correctness but fail quality (or vice versa) — the dimensions are independent.
8 standard cases + 8 edge cases covering the full eligibility decision space:
| # | Patient | Edge Case |
|---|---|---|
| 9 | Elena Ruiz | Income exactly at 138% FPL threshold (<= boundary) |
| 10 | Kevin Park | Income $1 over threshold (just above cutoff) |
| 11 | Yuki Tanaka | Non-US citizen (citizenship disqualification) |
| 12 | Jordan Lee | Age 18 (child→adult category boundary) |
| 13 | Margaret Davis | Age 65 in non-expansion state (adult→elderly boundary) |
| 14 | Tamika Williams | Pregnant in non-expansion state (higher threshold) |
| 15 | John Whitehorse | Alaska (different FPL table) |
| 16 | Leilani Kealoha | Hawaii, household size 8 (FPL table max boundary) |
| Eval | Method | What it checks |
|---|---|---|
| Risk Scoring Eval | Deterministic | Score determinism, tier boundaries, all 16 scenarios |
| Outreach Compliance Eval | Deterministic | Opt-out blocking, quiet hours, frequency caps, STOP text |
| Workflow Eval | Deterministic | Valid transitions, timeout escalations, recovery paths, audit logging |
npm run eval # Deterministic evals (instant, no API calls)
npm run eval:agent # Full agent evals (requires running server + OpenAI key)
npm run build # Full build: pip install → deterministic evals → seed database
# Phase 2 unit tests (94 tests)
python -m pytest tests/ -v

A single combined GitHub Actions workflow (`regression-evals.yml`) handles all eval automation:
- On every push/PR: Deterministic evals run instantly — blocks broken code
- Daily schedule: Full agent evals for drift monitoring
- Manual trigger: `workflow_dispatch` for on-demand agent evals
- Auto-issue on failure: Agent eval failures auto-create GitHub issues with an `eval-regression` label and run link
- Render deployment: Deterministic evals run in the build pipeline — if they fail, deployment is blocked
| Method | Path | Description |
|---|---|---|
| GET | `/` | Marketing site |
| GET | `/copilot` | Eligibility copilot UI |
| GET | `/dashboard` | Caseworker dashboard UI |
| GET | `/health` | MCP server connection status |
| GET | `/patients` | List all patients |
| POST | `/patients` | Create a new patient |
| GET | `/patients/{id}` | Get patient by ID |
| POST | `/check` | Run eligibility check (returns determination + metrics) |
| POST | `/check/stream` | Streaming eligibility check |
| POST | `/check/{patient_id}` | Check eligibility by patient ID |
| GET | `/metrics` | Last query metrics (guardrail, QA, latency, tokens) |
| GET | `/sessions` | List active conversation sessions |
| GET | `/sessions/{id}` | Retrieve conversation history |
| GET | `/patients/{id}/sessions` | List sessions for a patient |
| GET | `/reports` | List saved determination reports |
| GET | `/reports/(unknown)` | Get a specific report |
| Method | Path | Description |
|---|---|---|
| POST | `/renewals/{patient_id}/start` | Initiate renewal workflow (risk score + first outreach) |
| GET | `/renewals/{patient_id}/status` | Current state + audit timeline |
| GET | `/renewals/pipeline` | All patients grouped by workflow state |
| POST | `/renewals/{patient_id}/event` | Trigger state transition |
| POST | `/renewals/{patient_id}/documents` | Upload and process a document |
| GET | `/renewals/{patient_id}/documents` | List documents for a renewal |
| POST | `/renewals/{patient_id}/check_renewal` | Check renewed eligibility with updated info |
| Method | Path | Description |
|---|---|---|
| GET | `/dashboard/portfolio` | Portfolio summary with risk scores |
| GET | `/dashboard/alerts` | Patients needing attention, sorted by priority |
| GET | `/dashboard/metrics` | Pipeline health metrics |
| POST | `/dashboard/override/{renewal_id}` | Caseworker override with audit trail |
| Method | Path | Description |
|---|---|---|
| GET | `/outreach/history/{patient_id}` | Communication log |
| POST | `/outreach/opt_out/{patient_id}` | Process TCPA opt-out |
The `/check` response and the `/metrics` endpoint expose:
{
"llm_api_calls": 3,
"llm_api_calls_breakdown": { "react_loop": 2, "qa_review": 1 },
"tool_call_count": 2,
"tool_names": ["read_query", "write_file"],
"guardrail_match": true,
"guardrail_details": {
"engine_eligible": true,
"llm_eligible": true,
"category": "adult_expansion",
"income_pct": 95.2,
"threshold_pct": 138
},
"qa_approved": true,
"qa_issues": [],
"latency_ms": 4200,
"input_tokens": 3500,
"output_tokens": 800,
"total_tokens": 4300
}

The tool-use loop is capped at `MAX_AGENT_ITERATIONS = 10`. If the model keeps requesting tools beyond this limit, the agent stops and returns a graceful message. This applies to both streaming and non-streaming paths.
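The cap amounts to replacing `while True` with a bounded `for` loop. A sketch of the pattern (the `call_llm`/`execute_tool` interfaces are illustrative, not the repo's actual signatures):

```python
# Sketch of a bounded ReAct loop. call_llm returns either ("final", text)
# or ("tool", request); execute_tool runs the request and returns a
# message to append to the conversation.
MAX_AGENT_ITERATIONS = 10

def run_react_loop(call_llm, execute_tool) -> str:
    messages = []
    for _ in range(MAX_AGENT_ITERATIONS):
        kind, payload = call_llm(messages)
        if kind == "final":
            return payload
        messages.append(execute_tool(payload))
    # Cap hit: stop gracefully instead of looping (and billing) forever.
    return "I couldn't complete this request within the allowed number of steps."
```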
All MCP tool results pass through sanitization before being sent to GPT:
- Truncation: Results over 10,000 characters are truncated
- Control character stripping: Null bytes and non-printable characters removed
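Both steps fit in a few lines. A sketch with the 10,000-character limit from above (function and constant names are illustrative):

```python
# Sketch of MCP tool-result sanitization: strip control characters, then
# truncate. Newlines and tabs are kept so tabular query output stays
# readable in the prompt.
MAX_TOOL_RESULT_CHARS = 10_000

def sanitize_tool_result(text: str) -> str:
    cleaned = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    if len(cleaned) > MAX_TOOL_RESULT_CHARS:
        cleaned = cleaned[:MAX_TOOL_RESULT_CHARS] + "\n[truncated]"
    return cleaned
```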
2025 Federal Poverty Level tables and all 50-state Medicaid thresholds are embedded directly in the system prompt. The agent does not need to fetch from HHS.gov. The Fetch MCP server is available but banned by evals.
The streaming path has full feature parity with the non-streaming path:
- Mem0 search before GPT, Mem0 save after completion
- Loop guardrails (same `MAX_AGENT_ITERATIONS` cap)
- Tool result sanitization
- Post-hoc guardrail + QA agent review
- Metrics tracking and conversation persistence
Reduced from 5-6 OpenAI API calls per query to 3:
- Removing Mem0 MCP (was 2 extra calls, now 0 via SDK)
- Embedding FPL data (was 1-2 fetch calls, now 0)
- Typical flow: 1 initial + 1 tool execution + 1 final = 3 calls
Conversations are stored in PostgreSQL as JSONB, keyed by session ID with patient association. Multi-turn sessions can be recovered across server restarts.
| Component | Technology |
|---|---|
| Language | Python 3.13+ |
| LLM | OpenAI gpt-4o-mini-2024-07-18 (pinned) |
| Framework | FastAPI |
| Database | PostgreSQL (Render managed) — patients, renewals, documents, audit_log |
| Memory | Mem0 SDK (direct Python, not MCP) |
| Tool Protocol | MCP (stdio transport) |
| Deployment | Render (render.yaml blueprint) |
| CI/CD | GitHub Actions (eval-gated) |
| Frontend | Vanilla HTML/CSS/JS (inline, no build step) |
| Testing | pytest (94 Phase 2 tests) + custom eval suite (16 patients) |
| Evals | npm scripts + Python |
.
├── router.py # Multi-agent orchestrator (default)
├── agent.py # Monolith agent (legacy, still functional)
├── eligibility.py # Deterministic engine (single source of truth)
├── prompts.py # System prompt + QA prompt with embedded FPL data
├── config.py # Constants, MCP configs, model pinning, Phase 2 settings
├── mcp_manager.py # Multi-server MCP connection manager with retry
├── server.py # FastAPI endpoints (Phase 1 + Phase 2)
├── seed_db.py # Database schema + 16 seed patients + 16 renewal scenarios
├── agents/
│ ├── __init__.py # All agent exports
│ ├── base.py # AgentResult, EvalResult base types
│ ├── eligibility_agent.py # ReAct loop + renewal eligibility check
│ ├── memory_agent.py # Mem0 SDK integration
│ ├── knowledge_agent.py # FPL tables + state rules
│ ├── risk_scoring_agent.py # Deterministic 5-factor risk scoring
│ ├── outreach_agent.py # TCPA-compliant SMS sequences
│ ├── document_agent.py # LLM classification + deterministic validation
│ ├── workflow_orchestrator.py # 11-state renewal state machine
│ ├── caseworker_copilot.py # Dashboard summaries + alerts
│ ├── eval_correctness.py # Guardrail: engine vs LLM
│ ├── eval_efficiency.py # API call count + banned tool check
│ ├── eval_quality.py # QA agent review
│ ├── eval_risk_scoring.py # Risk score determinism eval
│ ├── eval_outreach_compliance.py # TCPA compliance eval
│ └── eval_workflow.py # State machine validity eval
├── static/
│ ├── index.html # Marketing site
│ ├── copilot.html # Eligibility copilot UI
│ └── dashboard.html # Caseworker dashboard UI
├── tests/
│ ├── test_risk_scoring.py # 16 tests
│ ├── test_outreach_agent.py # 20 tests
│ ├── test_workflow_orchestrator.py # 32 tests
│ ├── test_document_agent.py # 18 tests
│ └── test_renewal_eligibility.py # 8 tests
├── evals/
│ └── test_eligibility.py # 3-dimension eval suite (16 patients)
├── .github/
│ └── workflows/
│ └── regression-evals.yml # Deterministic on push + agent evals daily
├── reports/ # Saved eligibility determination reports
├── render.yaml # Render blueprint (eval-gated deployment)
├── package.json # npm scripts: eval, eval:agent, start, build
└── requirements.txt
- Python 3.13+
- Node.js (for MCP servers)
- PostgreSQL (via Docker or local install)
# Start PostgreSQL
docker run -d --name medicaid-pg -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=medicaid -p 5432:5432 postgres:16
# Install dependencies
pip install -r requirements.txt
npm install
# Configure environment
cp .env.example .env # Add your OPENAI_API_KEY and MEM0_API_KEY
# Run evals + seed the database
npm run build
# Run the server
npm start

Open http://localhost:8000 for the marketing site, http://localhost:8000/dashboard for the caseworker dashboard, or http://localhost:8000/copilot for the standalone eligibility copilot.
- Push to GitHub
- Go to Render Dashboard
- Connect the repo — Render detects `render.yaml`
- Set `OPENAI_API_KEY` and `MEM0_API_KEY` when prompted
- Deploy
The blueprint creates a free web service and a free PostgreSQL database. Deterministic evals run during build — if they fail, deployment is blocked.
MIT