GenAI that knows exactly where every email belongs.
Short link: aka.ms/classymail
ClassyMail is an open-source AI pipeline that automatically classifies emails and PDFs using multi-model inference on Azure. Upload a document, get structured intents with confidence scores β in seconds.
- A PDF arrives (scanned email, letter, invoice) β uploaded through the web dashboard, API, or automatically ingested from Blob Storage via Event Grid.
- Mistral Document AI converts it to Markdown via specialized OCR β preserving layout, headers, and image descriptions. If Mistral is unavailable or rate-limited, the pipeline automatically falls back to Azure Document Intelligence via a circuit breaker pattern (a π badge appears in the UI when this happens).
- Phi-4 (or a fallback model) reads the Markdown and classifies the email into one or more business intents with confidence scores and justification.
- Results appear in the dashboard where a human reviewer can validate, correct, or reject the classification.
- Corrections feed back into exportable fine-tuning datasets (JSONL) to improve the model over time.
The entire pipeline is event-driven: Blob Storage β Event Grid β Service Bus β Worker β so it scales from 1 email to 10,000+ without code changes.
- Hybrid OCR: Mistral Document AI 2512 as primary, Azure Document Intelligence as automatic fallback with circuit breaker pattern. When Document Intelligence is used, a π amber badge appears on the email in the dashboard and detail modal so reviewers can see which OCR engine was used.
- Multi-Model Classification: Phi-4 (8K context) as primary, GPT-4o-mini (120K) as fallback β switch models from the Settings UI at any time
- Multi-Intent Detection: A single email can match multiple categories, each with its own confidence score (0.0β1.0) and text justification
- Custom Category Taxonomy: Define your own business categories with name, slug, description, and exclusion rules β all editable from the Settings UI
- Model Selection: Switch classification model (Phi-4, GPT-4o-mini, GPT-5-mini, Kimi-K2.5, etc.) from Settings β models are auto-discovered from your AI Foundry project deployments with a hardcoded fallback list
- Processing Strategies: Choose between Standard (fast), Deep Reasoning (Chain-of-Thought), Vision (image-aware), or Agentic (Multi-Agent) per processing run
- Batch Reprocessing: Reprocess all emails with a different model or strategy for A/B testing
- Orchestrator Inspector: A fast, cheap model (gpt-4.1-nano) scans the document and shortlists the top 3-5 candidate categories β cuts 80% of unnecessary computation
- Parallel Specialized Agents: One agent per candidate category, each with its own RAG tool calling a dedicated Azure AI Search index for reference examples
- Red Team Quality Gate: Adversarial reviewer activated when confidence is low or agents disagree
- Per-Category AI Search Indexes: Add good and bad examples via the Settings UI β agents use them to calibrate confidence
| Setting | Description |
|---|---|
| Orchestrator Model | Fast routing model (gpt-4.1-nano recommended) |
| Agent Tiers 1/2/3 | Model per confidence band: nano for clear, mini for ambiguous, full for critical |
| Red Team | Quality gate model + threshold slider |
| RAG Mode | Vector, Hybrid, or Semantic retrieval for per-category indexes |
| Per-Category Index | Toggle AI Search RAG on/off per category, manage good/bad examples |
Full documentation: AGENTIC_CLASSIFICATION | AI_SEARCH_INDEXES
- Classification Dashboard: Card or table view with confidence filters (high/low), status filters (processed/review/error), PII indicators, and real-time search
- Email Detail Modal: Side-by-side PDF preview + classified intents with confidence bars, justification text, and one-click correction
- CSV Export: Download classification results as semicolon-delimited CSV with intents, confidence scores, model used, processing time, and PII metadata
- Batch Actions: Mark emails as reviewed, reprocess selections, or export subsets
- Correction Workflow: Reviewers override wrong classifications with a reason β corrections are stored as golden labels with full audit trail
- AI Feedback: Each correction generates an LLM feedback message explaining what the model missed, used for prompt improvement
- Auto-Feed to AI Search: When a user corrects a classification, the email is automatically pushed as a negative example to the old (wrong) category and a positive example to the new (correct) category in the per-category AI Search indexes
- One-Click Reinforcement: Click "Reinforce" on any correctly classified email to push it as a
human_reinforcedpositive example into its category's AI Search index β teaches the agentic pipeline "this is right" - Fine-Tuning Export: Export anonymized JSONL datasets (train/test split) matching the production system prompt format β ready for Microsoft AI Foundry (Phi-4 LoRA, GPT-4o-mini)
- Human Reinforcement: Corrected examples are weighted higher in training data, closing the feedback loop between human reviewers and model quality
- Category AI Assessment: GPT-4.1-nano analyzes your category definitions and suggests improvements based on classification patterns
- Per-Category AI Search Indexes: Each category gets its own Azure AI Search index with positive and negative reference examples β agents use RAG to calibrate confidence (see AI_SEARCH_INDEXES)
- Chat with your emails: GPT-5.2-chat with vector search over all processed documents, orchestrated by Microsoft Agent Framework 1.5 (
agent-framework-core>=1.5+agent-framework-openai>=1.5) β per-localeAgentcache,ContextVar-scoped dependency injection, and full chat history replay vialist[Message] - Agent-driven suggestions: The LLM agent generates contextual follow-up action pills after each response β no hardcoded logic
- Ask AI button: Click β¨ on any email card or table row to open the chatbot pre-filled with that emailβs context
- 12 agent tools: Semantic search (with date filtering), keyword search (case-insensitive), reclassification handoff, sequential review, stats, error analysis, category explanation
- Semantic vector search: Emails are embedded with
text-embedding-3-small(1536-dim) during processing. The chatbot uses Cosmos DBVectorDistance()for concept-level matching - Semantic cache: Repeated similar questions are served from a vector cache (>99% similarity), saving tokens
- Zero Secrets: Managed Identity everywhere β no connection strings, no API keys in config
- PII Detection: LLM-based, Azure AI Language, or Hybrid mode (GDPR compliant)
- Dynamic Cost Tracking: Real token usage per email across 12+ model pricing tiers, with links to Microsoft Azure OpenAI Pricing
- Danger Zone: Admin operations organized by severity β Maintenance (diagnostics), Bulk Operations (reprocess all, rebuild vector index, DLQ), and Destructive (atomic reset)
- Rebuild Vector Index: One-click re-embedding of all emails for semantic search repair
- i18n: 5 languages (EN, FR, DE, ES, IT) with 500+ translation keys
- Build metadata: Commit SHA and build timestamp baked into Docker image, visible in the Info modal
flowchart TD
user[User] -->|Upload PDF| ui["Vue 3 SPA"]
ui -->|API| api["FastAPI"]
api -->|Store| blob[("Blob Storage")]
blob -->|Event Grid| sbq["Service Bus Queue"]
sbq --> worker["Worker"]
worker -->|OCR| ocr["Mistral Document AI 2512"]
ocr -.->|Fallback| di["Doc Intelligence"]
worker -->|Classify| phi4["Phi-4"]
phi4 -.->|Fallback| gpt["GPT-4o-mini"]
worker -->|Save| cosmos[("Cosmos DB")]
cosmos --> api
api --> ui
# Prerequisites: Python 3.12, Node.js 18+, Azure CLI
# Generate secrets from Azure
./scripts/write_secrets_env.ps1 -ResourceGroup "<prefix>-rg" -Force
# Backend
uv sync && uv run uvicorn main:app --reload
# Frontend (separate terminal)
cd frontend && npm install && npm run dev
# Health check
curl http://localhost:8000/healthzSee docs/LOCAL_DEVELOPMENT.md for full setup or docs/DEPLOY_FROM_SCRATCH.md for a fresh Azure deployment.
- Backend: FastAPI (Python 3.12) + uv
- Frontend: Vue 3 + Vite + TailwindCSS + vue-i18n
- Infra: Terraform (azurerm v4 + azapi)
- AI: Microsoft AI Foundry (Mistral, Phi-4, GPT-4o-mini, GPT-5.2-chat) + Microsoft Agent Framework GA 1.0
- Storage: Cosmos DB (serverless + vector search + composite indexes), Blob Storage
- Auth: Managed Identity (zero secrets)
- CI/CD: GitHub Actions with OIDC
| Model | Deployment Name | Purpose | SKU |
|---|---|---|---|
| Phi-4 | phi-4 |
Primary classification | GlobalStandard |
| Mistral Document AI 2512 | mistral-document-ai-2512 |
OCR / PDF extraction | GlobalStandard |
| Model | Deployment Name | Purpose | Required for |
|---|---|---|---|
| GPT-4o-mini | gpt-4o-mini |
Fallback classifier, PII detection | Large emails (>8K tokens) |
| text-embedding-3-small | text-embedding-3-small |
Vector embeddings for RAG chatbot | Chat / semantic search |
| GPT-5.2-chat | gpt-5.2-chat |
RAG chatbot conversations | Chat feature |
Not all models are required. The pipeline works with just Phi-4 + Mistral OCR. Optional models enable fallback classification and RAG chat. Deploy only what you need.
Pricing in the Settings UI is hardcoded (estimated Azure rates as of 2025-2026). Actual costs depend on your region and Azure agreement. The backend cost calculator uses the same hardcoded rates. See
classymail/services/costing.pyandfrontend/src/views/SettingsView.vuefor the pricing tables.
The chatbot uses semantic vector search over all processed emails, powered by Microsoft Agent Framework GA 1.0:
- During processing: Each emailβs OCR markdown is embedded using
text-embedding-3-small(1536 dimensions) and stored in Cosmos DB withtype: "email". Chunks are also embedded separately withtype: "chunk" - During chat: User queries are embedded, then Cosmos DB
VectorDistance()finds the most semantically similar emails β with optional date filtering (daysparameter for βlast weekβ queries) - Chat model (GPT-5.2-chat) generates answers grounded in retrieved context, with source citations
- Semantic cache: Similar questions (>99% cosine similarity) return cached responses instantly
- Agent-driven actions: The LLM appends contextual follow-up suggestions (hidden
<!-- ACTIONS -->block) that appear as clickable pills in the UI
Rebuild embeddings: If emails were processed before the embedding model was deployed, use Settings β Danger Zone β Rebuild Vector Index to regenerate all vectors.
The chat button is automatically hidden in the UI if
CHAT_DEPLOYMENTorEMBEDDING_DEPLOYMENTare not configured. Deploy the optional models to enable it. Category AI Assessment is also automatically disabled if the assessment model isn't deployed in your AI Foundry project.
| Category | Doc | Description |
|---|---|---|
| Getting Started | LOCAL_DEVELOPMENT | Run locally with uv and npm |
| DEPLOY_FROM_SCRATCH | First-time Azure deployment (45β60 min) | |
| Architecture | ARCHITECTURE | System design, data flow, RBAC |
| INFRASTRUCTURE | Terraform resources and networking | |
| MODELS | AI models, fallback logic, fine-tuning | |
| Operations | TROUBLESHOOTING | Common issues and fixes |
| CICD_GITHUB | GitHub Actions CI/CD with OIDC | |
| COSTS_LOGIC | Cost estimation and token tracking | |
| Features | USER_INTERFACE | Dashboard and UI guide |
| CUSTOMIZATION | Category taxonomy and configuration | |
| AI_SEARCH_INDEXES | Per-category AI Search indexes and examples | |
| INTEGRATION | CSV export, slug system, API |
Full index: docs/INDEX.md
MIT β see LICENSE for details.


