🚀 Smart Document Triage with PDF, JSON, Email Routing

📁 Built with Python, LangChain (LLM), Redis, and Streamlit

This project is a multi-agent AI system that accepts documents in various formats (PDF, Email, JSON) and intelligently:
- Classifies the document format and business intent (Invoice, RFQ, Complaint, Regulation)
- Routes the input to the correct processing agent (EmailAgent, JSONAgent, PDFAgent)
- Extracts structured data (e.g., sender info, invoice fields, anomalies)
- Stores traceable output in Redis-based shared memory for chain-of-processing transparency
It combines LLMs, LangChain, and agent orchestration to automate real-world document workflows.
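
The classify-then-route flow described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: `detect_format`, `route`, the `handle_*` functions, and the `ROUTES` table are hypothetical names, and the handlers are placeholders for the real `PDFAgent`, `JSONAgent`, and `EmailAgent`. Format detection here is a plain heuristic; the project itself may rely on an LLM for this step.

```python
import json
from email import message_from_string
from typing import Callable, Dict


def detect_format(raw: bytes) -> str:
    """Heuristic format detection: returns 'PDF', 'JSON', 'Email', or 'Unknown'."""
    # PDF files begin with the magic bytes "%PDF-".
    if raw.lstrip().startswith(b"%PDF-"):
        return "PDF"
    text = raw.decode("utf-8", errors="replace")
    # Anything that parses as a JSON object or array counts as JSON.
    try:
        if isinstance(json.loads(text), (dict, list)):
            return "JSON"
    except json.JSONDecodeError:
        pass
    # Fall back to Email when typical headers are present.
    msg = message_from_string(text)
    if msg["From"] or msg["Subject"]:
        return "Email"
    return "Unknown"


# Placeholder handlers standing in for the real agents (hypothetical).
def handle_pdf(raw: bytes) -> dict:
    return {"agent": "PDFAgent", "size_bytes": len(raw)}


def handle_json(raw: bytes) -> dict:
    return {"agent": "JSONAgent", "keys": sorted(json.loads(raw))}


def handle_email(raw: bytes) -> dict:
    msg = message_from_string(raw.decode("utf-8", errors="replace"))
    return {"agent": "EmailAgent", "sender": msg["From"]}


ROUTES: Dict[str, Callable[[bytes], dict]] = {
    "PDF": handle_pdf,
    "JSON": handle_json,
    "Email": handle_email,
}


def route(raw: bytes) -> dict:
    """Detect the format, dispatch to the matching handler, and tag the result."""
    fmt = detect_format(raw)
    handler = ROUTES.get(fmt)
    if handler is None:
        raise ValueError(f"unsupported document format: {fmt}")
    result = handler(raw)
    result["format"] = fmt
    return result
```

A dispatch table keeps routing separate from agent logic, so supporting a new format would only require one more detection rule and one more entry in `ROUTES`.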
| Feature | Description |
|---|---|
| 📤 Format Detection | Automatically detects if input is PDF, JSON, or Email |
| 🧾 Intent Classification | Determines whether the document is an Invoice, RFQ, Complaint, or Regulation |
| 🧠 LLM-Powered Extraction | Uses LLMs to extract and clean structured fields from natural text |
| 🔁 Multi-Agent Routing | Routes each document to the right processing agent |
| 🧩 Shared Memory via Redis | Persists extracted fields, metadata, thread ID for traceability |
| 📊 Anomaly Detection (JSON) | Flags missing/suspicious fields in structured JSON |
| 📨 CRM Formatting (Email) | Extracts sender info + urgency for downstream CRM integration |
| 📄 Streamlit UI | Upload interface with live logs, previews, and Redis-stored results |
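
As an illustration of the JSON anomaly detection listed in the table, here is a minimal rule-based sketch. The required field names (`invoice_id`, `sender`, `amount`, `currency`) and the `find_anomalies` function are assumptions made for this example; the actual `json_agent.py` may check different fields or use an LLM.

```python
from typing import Any, Dict, List

# Hypothetical required fields for an invoice-style payload.
REQUIRED_FIELDS = ("invoice_id", "sender", "amount", "currency")


def find_anomalies(payload: Dict[str, Any]) -> List[str]:
    """Return human-readable flags for missing or suspicious fields."""
    anomalies: List[str] = []

    # Flag required fields that are absent, None, or empty strings.
    for field in REQUIRED_FIELDS:
        if payload.get(field) in (None, ""):
            anomalies.append(f"missing field: {field}")

    # Flag amounts that are present but non-numeric or not positive.
    amount = payload.get("amount")
    if amount is not None and amount != "":
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            anomalies.append("amount is not numeric")
        elif amount <= 0:
            anomalies.append("amount is not positive")

    return anomalies
```

An empty list means no rule fired; the returned flags could be stored next to the extracted fields so that reviewers see why a document was marked.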
.
├── agents/
│ ├── classifier_agent.py
│ ├── email_agent.py
│ ├── json_agent.py
│ └── pdf_agent.py
├── router/orchestrator.py
├── memory/redis_memory.py
├── utils/
│ ├── logger.py
│ ├── json_cleaner.py
│ └── file_handler.py
├── llm/langchain_llm.py
├── app.py # Streamlit UI
├── requirements.txt
└── README.md
- Clone the repository

      git clone https://github.com/yourname/multi-agent-ai-docs.git
      cd multi-agent-ai-docs

- Install dependencies

      pip install -r requirements.txt

- Configure Environment

  Create a `.env` file:

      GROQ_API_KEY=your_groq_api_key
      REDIS_HOST=localhost
      REDIS_PORT=6379

- Run the Streamlit App

      streamlit run app.py

Requirements (`requirements.txt`):

    langchain
    langchain-groq
    redis
    streamlit
    python-dotenv
    pdfplumber  # for PDF extraction
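
The three `.env` settings shown in the setup steps could be read at startup roughly as follows. This is a sketch under assumptions: the `Settings` dataclass and `load_settings` function are hypothetical names, and the fallback values for `REDIS_HOST` and `REDIS_PORT` simply mirror the example `.env` file. `load_dotenv` from `python-dotenv` is used when the package is installed; otherwise only the process environment is read.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

try:
    # python-dotenv copies values from a local .env file into os.environ.
    from dotenv import load_dotenv
except ImportError:  # keep the sketch runnable without the package
    load_dotenv = None


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    redis_host: str
    redis_port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping, or from .env plus os.environ by default."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()
        env = os.environ

    api_key = env.get("GROQ_API_KEY", "")
    if not api_key:
        # Fail early instead of at the first LLM call.
        raise RuntimeError("GROQ_API_KEY is not set")

    return Settings(
        groq_api_key=api_key,
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
    )
```

Passing an explicit mapping, for example `load_settings({"GROQ_API_KEY": "test-key"})`, makes the function easy to exercise in tests without touching the real environment.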
- Real-time classification using LLMs (via LangChain)
- Seamless orchestration across multi-modal document types
- Context-sharing using Redis-based memory architecture
- Fully interactive UI with Streamlit
- Highly modular structure for extensibility to other formats or intents
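
To make the Redis-based context sharing more concrete, here is a minimal sketch of a shared-memory wrapper keyed by thread ID. The `SharedMemory` class, its key layout (`doc:<thread_id>`), and the JSON serialization are assumptions for illustration and may differ from `memory/redis_memory.py`. The `InMemoryClient` stand-in implements only `set` and `get` so the example runs without a Redis server; a real `redis.Redis` client could be passed in its place.

```python
import json
import uuid
from typing import Any, Dict, Optional


class InMemoryClient:
    """Dict-backed stand-in for the `set`/`get` subset of a Redis client."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set(self, name: str, value: str) -> bool:
        self._data[name] = value
        return True

    def get(self, name: str) -> Optional[str]:
        return self._data.get(name)


class SharedMemory:
    """Stores one JSON record per thread ID under a common key prefix."""

    def __init__(self, client: Any, prefix: str = "doc") -> None:
        self._client = client
        self._prefix = prefix

    def _key(self, thread_id: str) -> str:
        return f"{self._prefix}:{thread_id}"

    def save(self, record: Dict[str, Any], thread_id: Optional[str] = None) -> str:
        # Generate a thread ID when the caller does not supply one.
        thread_id = thread_id or uuid.uuid4().hex
        self._client.set(self._key(thread_id), json.dumps(record))
        return thread_id

    def load(self, thread_id: str) -> Optional[Dict[str, Any]]:
        raw = self._client.get(self._key(thread_id))
        # json.loads accepts both str and bytes, so undecoded replies also work.
        return json.loads(raw) if raw is not None else None


memory = SharedMemory(InMemoryClient())
tid = memory.save({"format": "JSON", "intent": "Invoice", "anomalies": []})
record = memory.load(tid)
```

Because every agent writes under the same thread ID, a later step can read what the classifier and earlier agents recorded for that document.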
Thank you for checking out this project 🙌




