Evidence-backed, auditable explanations for credit decisions, with citations, PII-safety, and regulator-ready audit trails.
Built for the NSK.AI RAG Hackathon 2025, CreditExplain RAG helps compliance officers and analysts quickly answer:
- "Why was this loan declined?"
- "Which clause justifies KYC step X?"
The system integrates advanced RAG techniques (SELF-RAG critic loop, reranking, provenance logging, PII redaction) to provide trustworthy, auditable explanations.
Try the system here: https://credit-explain.vercel.app
- Project Overview
- Tech Stack
- Installation & Setup
- Basic Usage
- Repository Structure
- Known Issues
- Future Development
- Acknowledgement
- Contact Information
- License
CreditExplain RAG addresses the critical need for transparent, evidence-based explanations in financial compliance and credit decisioning. Traditional AI systems often provide "black box" responses without verifiable sources, making them unsuitable for regulated environments.
Our solution provides:
- Regulatory Compliance: Every explanation cites specific clauses from authoritative documents
- Audit Trail: Complete provenance tracking with reflection token scoring
- PII Protection: Automatic redaction of sensitive personal information
- Multi-jurisdictional Support: Handling of Nigerian, Kenyan, and global financial regulations
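As an illustration of the PII protection point above, here is a minimal sketch of regex-based redaction. The pattern set, the placeholder labels, and the `redact_pii` name are assumptions made for this example; they are not taken from the project's code, and a production system would likely need broader, jurisdiction-specific rules.

```python
import re

# Hypothetical patterns for illustration only; the project's actual
# redaction rules are not shown in this README and may differ.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    # At least 10 digits, optionally separated by spaces or hyphens.
    "PHONE": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}


def redact_pii(text: str) -> str:
    """Replace each match of a known PII pattern with a bracketed label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact_pii("Contact jane.doe@example.com or +254 712 345 678."))
```

Redacting before text reaches the LLM or the audit log keeps sensitive values out of both the prompt and the stored provenance records.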
- FastAPI - Modern Python web framework
- LangChain - RAG orchestration and tooling
- ChromaDB - Vector database for document storage
- HuggingFace Transformers - Embeddings and reranking models
- Groq API - LLM inference for critic and generator components
- Pydantic - Data validation and serialization
- React 18 - User interface library
- TypeScript - Type-safe development
- Vite - Build tool and dev server
- Tailwind CSS - Utility-first CSS framework
- TanStack Query - Server state management
- Axios - HTTP client for API communication
- SELF-RAG Architecture - Adaptive retrieval with reflection tokens
- sentence-transformers/all-MiniLM-L6-v2 - Embedding model
- cross-encoder/ms-marco-MiniLM-L-6-v2 - Reranker model
- Llama 3 - LLMs for generation and critique
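To show how the components listed above might fit together, the following is a simplified control-flow sketch of a retrieve, rerank, generate, and critique loop. It is not the project's implementation and not the full SELF-RAG algorithm: the function names, the 0.7 threshold, the doubling of `top_k`, and the toy stand-ins for the retriever, reranker, generator, and critic are all assumptions chosen to keep the example self-contained.

```python
from typing import Callable, List, Tuple

Result = Tuple[str, float, List[str]]  # (answer, critic score, passages used)


def answer_with_critique(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
    generate: Callable[[str, List[str]], str],
    critique: Callable[[str, str, List[str]], float],
    threshold: float = 0.7,  # assumed acceptance score
    max_rounds: int = 3,     # assumed retry budget
) -> Result:
    """Retrieve, rerank, generate, and widen retrieval until the critic accepts."""
    top_k = 4
    best: Result = ("", 0.0, [])
    for _ in range(max_rounds):
        passages = rerank(question, retrieve(question, top_k))
        draft = generate(question, passages)
        score = critique(question, draft, passages)
        if score > best[1] or not best[0]:
            best = (draft, score, passages)
        if score >= threshold:
            break
        top_k *= 2  # critic was not satisfied: fetch more candidates
    return best


# Toy stand-ins so the sketch runs without any model or vector store.
CORPUS = ["KYC requires a valid ID.", "Loans need collateral.", "Capital ratio is 10%."]


def toy_retrieve(question: str, k: int) -> List[str]:
    return CORPUS[:k]


def toy_rerank(question: str, docs: List[str]) -> List[str]:
    words = set(question.lower().split())
    return sorted(docs, key=lambda d: -len(words & set(d.lower().split())))


def toy_generate(question: str, docs: List[str]) -> str:
    return docs[0] if docs else ""


def toy_critique(question: str, draft: str, docs: List[str]) -> float:
    return 1.0 if draft else 0.0


if __name__ == "__main__":
    print(answer_with_critique("What does KYC require?", toy_retrieve,
                               toy_rerank, toy_generate, toy_critique))
```

In a real deployment the four callables would wrap the vector store, the cross-encoder reranker, and the LLM calls; passing them in as parameters keeps the loop testable without network access.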
- Python 3.9+ installed on your system
- Node.js 18+ and npm for frontend development
- Groq API account and API key for LLM access
- Git for version control
git clone https://github.com/Maina314159/CreditExplain
cd CreditExplain
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install Python dependencies
pip install -r requirements.txt
# Copy and update the environment template
cp .env.example .env
Edit .env with your configuration:
GROQ_API_KEY=your_groq_api_key_here
- For the frontend, also create a .env file with the backend server URL added to it:
# Go to frontend and create .env file
cd frontend
cp .env.example .env
# Then add backend server
VITE_BACKEND_URL=http://localhost:8000
# Install frontend dependencies (skip the cd if you are already in frontend/)
cd frontend
npm install
# Return to the repository root and index the documents
cd ..
python -m ingest.index
Start Backend API:
python -m api.app
# API server starts at http://localhost:8000
Start Frontend (in new terminal):
cd frontend
npm run dev
# Frontend starts at http://localhost:5173
CLI Application
- You can also choose to interact with the terminal app:
python -m core.self_rag
- Access the Web Interface: Open http://localhost:5173 in your browser
- Upload Documents: Navigate to the Upload page to add regulatory PDFs
  - Supported formats: PDF, text documents
  - Documents are automatically chunked and indexed
- Ask Questions: Use the Query interface to ask compliance questions like:
  - "What are the capital requirements for banks in Nigeria?"
  - "What are the financial regulations in Kenya?"
  - "What documents are required for KYC verification?"
- Review Results: Each response includes:
  - Evidence-backed explanations with citations
  - Confidence scores (HIGH, MEDIUM, LOW)
  - Source document references with exact excerpts
  - Suggested follow-up questions
- Monitor Performance: Check the Metrics dashboard for system performance and audit logs
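For readers curious what an audit log entry could look like, here is a minimal sketch of an append-only JSON Lines log with a per-record hash for tamper detection. The field names, the `sha256` integrity field, and the function names are assumptions for illustration; the project's actual provenance format in `core/provenance.py` is not documented here and may differ.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def build_audit_record(query: str, answer: str, sources: List[str], confidence: str) -> dict:
    """Assemble one audit entry and attach a SHA-256 digest of its contents."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "answer": answer,
        "sources": sources,        # e.g. document name plus clause or page
        "confidence": confidence,  # HIGH, MEDIUM, or LOW
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["sha256"] = hashlib.sha256(payload).hexdigest()
    return record


def append_audit_record(path: Path, record: dict) -> None:
    """Append the record as one JSON line; existing lines are never rewritten."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")


def verify_audit_record(record: dict) -> bool:
    """Recompute the digest over every field except the stored hash."""
    body = {key: value for key, value in record.items() if key != "sha256"}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest() == record.get("sha256")
```

A per-record hash only detects edits to individual entries; chaining each record's hash into the next one would additionally reveal deleted or reordered lines.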
.
├── .github/                  # GitHub templates and workflows
├── api/                      # FastAPI backend application
│   ├── app.py                # Main FastAPI application with CORS
│   ├── models.py             # Pydantic models for request/response
│   └── __init__.py
├── core/                     # RAG pipeline core components
│   ├── self_rag.py           # Main SELF-RAG orchestration logic
│   ├── critic.py             # Critic model for retrieval decisions
│   ├── generator.py          # Response generation component
│   ├── retrieval.py          # Vector retrieval functionality
│   ├── reranker.py           # Cross-encoder reranking
│   ├── prompts.py            # LLM prompt templates
│   ├── provenance.py         # Audit logging and provenance tracking
│   └── __init__.py
├── data/                     # Document storage
│   ├── raw/                  # Original PDF documents
│   └── interim/              # Processed data files
├── frontend/                 # React TypeScript frontend
│   ├── src/
│   │   ├── api/              # API client and hooks
│   │   ├── components/       # Reusable UI components
│   │   ├── pages/            # Main application pages
│   │   ├── hooks/            # Custom React hooks
│   │   ├── types/            # TypeScript type definitions
│   │   └── utils/            # Utility functions
│   ├── package.json
│   ├── vite.config.ts
│   └── tsconfig.json
├── ingest/                   # Document ingestion pipeline
│   ├── loader.py             # PDF loading and parsing
│   ├── chunker.py            # Text chunking strategies
│   ├── index.py              # Vector indexing process
│   ├── normalize.py          # Text normalization
│   └── __init__.py
├── eval/                     # Evaluation and metrics
│   ├── metrics.py            # Performance metrics calculation
│   └── __init__.py
├── tests/                    # Test suites
│   ├── unit/                 # Unit tests
│   ├── integration/          # Integration tests
│   ├── demo_data/            # Test data
│   └── scripts/              # Test scripts
├── requirements.txt          # Python dependencies
└── README.md                 # This file
- PDF Parsing Accuracy: Complex PDF layouts with tables and multi-column formats may not parse perfectly
- Rate Limiting: Groq API has rate limits that may affect performance during high usage
- Context Length: Chunks are currently limited to ~1000 tokens each due to model constraints
- Metadata Extraction: Some document metadata (section headers, page numbers) may not be fully preserved
- Initial document ingestion can be slow for large PDF collections
- Real-time query processing typically takes 5-15 seconds depending on complexity
- Vector search performance degrades with very large document collections (>10,000 chunks)
- Best experienced in modern browsers (Chrome, Firefox, Safari, Edge)
- Mobile experience is functional but optimized for desktop use
- Real-time Collaboration: Multi-user support with shared workspaces
- Advanced Document Types: Support for Word documents, HTML, and scanned PDFs
- Custom Model Support: Integration with local LLMs and embedding models
- Enhanced Analytics: Advanced dashboard with trend analysis and compliance reporting
- API Extensions: Webhook support and third-party integrations
- Improved chunking strategies for legal and regulatory documents
- Multi-hop reasoning across multiple documents
- Automated regulatory change detection and alerting
- Cross-jurisdictional compliance mapping
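The chunk-size limit noted under the known issues, and the planned chunking improvements listed above, both concern how documents are split before indexing. As a point of reference, here is a minimal sketch of fixed-size chunking with overlap. It counts whitespace-separated words rather than model tokens, and the defaults (750 words, 75 words of overlap) are assumptions based on the rough rule of thumb that English text averages about 0.75 words per token; the project's `ingest/chunker.py` may work differently.

```python
from typing import List


def chunk_words(text: str, max_words: int = 750, overlap: int = 75) -> List[str]:
    """Split text into word windows where consecutive windows share `overlap` words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    print(chunk_words(sample, max_words=4, overlap=1))
```

Overlap reduces the chance that a clause is cut in half at a chunk boundary, at the cost of indexing some text twice. Splitting on section or clause boundaries instead of fixed windows is one direction the planned improvements could take.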
This project was developed for the NSK.AI RAG Hackathon 2025 and builds upon several open-source technologies and research:
- SELF-RAG Paper (Asai et al., 2023) for the adaptive retrieval framework
- LangChain and LangSmith for RAG orchestration tools
- HuggingFace for transformer models and embeddings
- FastAPI for the high-performance backend framework
- TanStack Query (formerly React Query) for efficient server state management
Special thanks to the regulatory bodies whose documents made this system possible:
- Central Bank of Nigeria (CBN)
- Central Bank of Kenya (CBK)
- Financial Action Task Force (FATF)
For questions, issues, or contributions:
- Project Maintainer: CreditExplain Team
- GitHub Issues: Create an issue
We welcome bug reports, feature requests, and contributions from the community!
This project is licensed under the MIT License - see the LICENSE file for details.