A comprehensive AI-powered legal assistant for Indian legal systems with intelligent document analysis, multi-language support, and advanced RAG capabilities.
Note: This GitHub repository is for demonstration and documentation purposes. The complete, production-ready components are deployed separately:
- AI Backend: Hosted on Hugging Face Spaces
- Node.js Backend: Handles authentication and database operations
- Frontend: User-facing application
Nyay Mitra is a full-stack legal AI assistant specifically designed for Indian legal systems. The platform combines state-of-the-art AI technologies to provide document analysis, intelligent conversational assistance, multi-language translation, and automated document generation for legal professionals and individuals seeking legal information.
Architecture: Frontend Application → Node.js Backend (Auth/DB) → AI Backend (Hugging Face) → External AI Services
- Multi-tool Reasoning Engine: Advanced workflow automation with LangChain orchestration
- Context-Aware Conversations: Session management with conversation continuity
- Automatic Language Detection: Smart routing for 10+ languages
- Multi-step Workflows: Complex query handling with intelligent tool selection
- RAG-Powered Q&A: Ask questions about uploaded documents with high accuracy
- Duplicate Detection: SHA-256 content hashing prevents redundant uploads
- Adaptive Analysis: Smart document analysis (small/medium/large strategies)
- Hybrid Retrieval: Semantic search with fallback mechanisms
- 10+ Languages: English, Hindi, French, Urdu, Tamil, Bengali, Gujarati, Kannada, Malayalam, Telugu
- Bridge Translation: Direct and English-bridge translation paths
- Automatic Detection: Input language detection with confidence scoring
- Context-Aware Translation: Preserves legal terminology accuracy
- Template-Based Generation: Jinja2 templates for legal documents
- AI-Assisted Content: Automated summary and content generation
- Customizable Templates: Support for various legal document types
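The direct and English-bridge translation paths listed above can be illustrated with a small route planner. This is a minimal sketch, not the project's implementation: the `DIRECT_PAIRS` set, the ISO 639-1 codes, and the `plan_route` function are hypothetical names introduced for illustration, and the real service may select paths differently.

```python
from typing import List, Set, Tuple

# Hypothetical set of installed direct language pairs (ISO 639-1 codes).
# The pairs actually available in the deployment are an assumption here.
DIRECT_PAIRS: Set[Tuple[str, str]] = {
    ("hi", "en"), ("en", "hi"),
    ("ta", "en"), ("en", "ta"),
    ("fr", "en"), ("en", "fr"),
}


def plan_route(
    source: str,
    target: str,
    pairs: Set[Tuple[str, str]] = DIRECT_PAIRS,
) -> List[Tuple[str, str]]:
    """Return the translation hops needed to go from source to target."""
    if source == target:
        return []  # nothing to translate
    if (source, target) in pairs:
        return [(source, target)]  # direct path
    if (source, "en") in pairs and ("en", target) in pairs:
        return [(source, "en"), ("en", target)]  # English-bridge path
    raise ValueError(f"No translation path from {source!r} to {target!r}")
```

With the pairs assumed above, `plan_route("hi", "ta")` yields two hops through English, while `plan_route("hi", "en")` yields a single direct hop.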
| Component | Technology | Purpose |
|---|---|---|
| Framework | FastAPI 0.111.0 | High-performance async API |
| LLM | Google Gemini 2.5 | AI reasoning (Flash & Pro) |
| Orchestration | LangChain 0.1.20 | AI workflow management |
| RAG Engine | LlamaIndex 0.10.34 | Document indexing & retrieval |
| Vector Store | ChromaDB 0.4.24 | Semantic search |
| Embeddings | Gemini text-embedding-004 | Document vectorization |
| Translation | Argos Translate 1.9.0 | Self-hosted translation |
| Storage | Hugging Face Datasets | Document persistence |
| Templates | Jinja2 3.1.4 | Document generation |
| Layer | Technology |
|---|---|
| Frontend | Next.js |
| API Gateway | Node.js Backend |
| AI Processing | FastAPI on Hugging Face |
| Database | NeonDB |
| Authentication | JWT + OAuth 2.0 |
┌─────────────────────────────────────────────────┐
│        Frontend Application (React/Vue)         │
│  • User Interface                               │
│  • Document Upload                              │
│  • Chat Interface                               │
└──────────────────┬──────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────┐
│         Node.js Backend (Auth/Database)         │
│  • User Authentication                          │
│  • Session Management                           │
│  • Database Operations                          │
│  • Request Routing                              │
└──────────────────┬──────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────┐
│        AI Backend (Hugging Face Spaces)         │
│ ┌─────────────────────────────────────────────┐ │
│ │      Nyay Mitra AI Agent (Multi-tool)       │ │
│ └─────────────────────────────────────────────┘ │
│  • RAG Service (LlamaIndex + ChromaDB)          │
│  • Translation Service (Argos Translate)        │
│  • Document Generation (Jinja2)                 │
│  • Deduplication Service (SHA-256)              │
└──────────────────┬──────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────┐
│         External AI Services & Storage          │
│  • Google Gemini API (LLM & Embeddings)         │
│  • Hugging Face Hub (Document Storage)          │
└─────────────────────────────────────────────────┘
- Upload: A user uploads a document through the frontend.
- Authentication & Routing: The Node.js backend authenticates the user and securely forwards the request to the AI backend.
- Deduplication: The AI backend checks for duplicates using SHA-256 hashing to prevent redundant processing.
- Processing Pipeline: If the document is unique, it is classified, split into 800-character chunks with a 150-character overlap, embedded, and indexed in the ChromaDB vector store.
- Response: An adaptive analysis is generated based on the document's size, and the results are returned to the user.
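The deduplication step in the upload flow above can be sketched with the standard library. This is an illustrative sketch only: the `DeduplicationCache` class and its method names are hypothetical, and the only details taken from this README are SHA-256 content hashing and an in-memory cache.

```python
import hashlib
from typing import Dict, Optional


class DeduplicationCache:
    """In-memory map from SHA-256 content hash to a document ID (hypothetical)."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}

    @staticmethod
    def content_hash(data: bytes) -> str:
        """Hash the raw file bytes so identical content maps to one key."""
        return hashlib.sha256(data).hexdigest()

    def check(self, data: bytes) -> Optional[str]:
        """Return the ID of an earlier identical upload, or None if unseen."""
        return self._seen.get(self.content_hash(data))

    def register(self, data: bytes, doc_id: str) -> str:
        """Record a newly processed document and return its hash."""
        digest = self.content_hash(data)
        self._seen.setdefault(digest, doc_id)  # keep the first ID for this content
        return digest
```

A caller would run `check` before the processing pipeline and `register` after it, so a repeated upload of the same bytes skips chunking and embedding. Because the cache lives in memory, it would be empty again after a restart.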
- Query & Language Detection: A user sends a query, and the system automatically identifies the input language.
- Workflow Planning: The AI Agent plans a multi-step workflow, selecting the appropriate tools (RAG, translation, etc.) for the task.
- Reasoning & Execution: The agent executes the plan, manages conversational context, and generates a coherent response.
- Display: The final response, translated if necessary, is displayed on the frontend with sources cited for transparency.
- Retrieval: The system performs a semantic search in the vector store to retrieve the top-k relevant chunks (max 12).
- Context Formulation: A context is prepared for the LLM, managing a token limit of ~8,000.
- Generation: The Gemini (Flash/Pro) model generates a precise answer based on the provided context and sources.
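The retrieval and context steps above can be approximated in plain Python. This is a sketch under stated assumptions: cosine similarity stands in for whatever scoring ChromaDB applies, the 0.3 threshold and the 12-chunk cap come from this README's configuration values, the character budget assumes roughly 4 characters per token, and `select_context` is a hypothetical helper name.

```python
import math
from typing import List, Sequence, Tuple

MAX_CHUNKS = 12                # max chunks per query
SIMILARITY_THRESHOLD = 0.3     # minimum similarity for semantic search
CONTEXT_CHAR_BUDGET = 32_000   # ~8,000 tokens at an assumed ~4 chars per token


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_context(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    top_k: int = MAX_CHUNKS,
    threshold: float = SIMILARITY_THRESHOLD,
    char_budget: int = CONTEXT_CHAR_BUDGET,
) -> List[str]:
    """Pick the best-scoring chunks that clear the threshold and fit the budget."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored = [item for item in scored if item[0] >= threshold]
    scored.sort(key=lambda item: item[0], reverse=True)

    selected: List[str] = []
    used = 0
    for _, text in scored[:top_k]:
        if used + len(text) > char_budget:
            break  # stop before the context exceeds the budget
        selected.append(text)
        used += len(text)
    return selected
```

The selected chunks would then be joined into the prompt context, with their sources kept alongside so the answer can cite them.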
- Chunk Size: 800 characters with 150 character overlap
- Max Chunks per Query: 12 (adaptive based on document size)
- Context Token Limit: 8,000 tokens (~32,000 characters)
- Analysis Token Limit: 20,000 tokens (~80,000 characters)
- Similarity Threshold: 0.3 for semantic search
- Deduplication: SHA-256 content hashing with in-memory cache
- Supported Document Types: PDF, TXT, DOCX
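The chunking parameters above (800 characters with a 150-character overlap) can be illustrated with a simple character-window splitter. This is a plain-Python stand-in, not the LlamaIndex splitter the backend presumably uses; `chunk_text` is a hypothetical name, and a real splitter may also respect sentence boundaries.

```python
from typing import List

CHUNK_SIZE = 800      # characters per chunk
CHUNK_OVERLAP = 150   # characters shared between consecutive chunks


def chunk_text(
    text: str,
    size: int = CHUNK_SIZE,
    overlap: int = CHUNK_OVERLAP,
) -> List[str]:
    """Split text into fixed-size windows that overlap by `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")

    step = size - overlap  # each window starts this far after the previous one
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

With these defaults each new chunk starts 650 characters after the previous one, so a 2,000-character document produces three chunks of 800, 800, and 700 characters.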
- Live Demo: Hugging Face Space
- API Docs: Interactive Swagger UI
- Source Code: Hugging Face Repository
- Live Application: Link to Frontend
- Repository: Link to Frontend Repo
- API Server: Link to Backend
- Repository: Link to Backend Repo
View Full API Endpoints
- `POST /api/v1/agent/chat` - Intelligent conversational interface
- `POST /api/v1/agent/upload-and-chat` - Upload & instant analysis
- `GET /api/v1/agent/capabilities` - System capabilities

- `POST /api/v1/chat/rag` - Document Q&A
- `POST /api/v1/chat/rag/batch` - Batch questions
- `POST /document/suggest` - AI analysis & suggestions

- `POST /api/v1/translate` - Text translation
- `POST /api/v1/agent/detect-language` - Language detection
- `GET /api/v1/agent/languages` - Supported languages

- `POST /api/v1/agent/deduplication/check` - Check for duplicates
- `GET /api/v1/agent/deduplication/stats` - Deduplication statistics

- `GET /health` - Health check
- `GET /` - System overview
Full API Documentation: Swagger UI
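As a rough illustration of calling one of these endpoints, the sketch below builds (but does not send) a request to `POST /api/v1/translate` using only the standard library. The base URL is a placeholder, and the JSON field names `text` and `target_language` are assumptions made for this example; the Swagger UI is the authority on the actual request schema.

```python
import json
import urllib.request

# Placeholder host, not the real deployment URL.
BASE_URL = "https://example.invalid"


def build_translate_request(
    text: str,
    target_language: str,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build a JSON POST request for the translation endpoint."""
    # Hypothetical payload field names; check the Swagger UI for the real schema.
    payload = {"text": text, "target_language": target_language}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/translate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. In the deployed system, requests are expected to pass through the Node.js backend for authentication first.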
This AI system is for informational and educational purposes only.
- ❌ Does NOT provide legal advice
- ❌ Does NOT replace qualified legal professionals
- ❌ Does NOT guarantee accuracy or completeness
- ✅ Provides general information and analysis
- ✅ Helps organize legal information
- ✅ Should be verified by legal experts
Always consult a qualified legal professional for specific legal matters.
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
Tejasvi Aryan and Anubrata Guin
For questions, issues, or collaboration:
- Email: tejasviaryan225@gmail.com
- Google Gemini for powerful LLM capabilities
- LangChain & LlamaIndex for AI orchestration
- Hugging Face for hosting infrastructure
- Argos Translate for open-source translation
- FastAPI for an excellent async framework