Fidelisaboke/CreditExplain


📘 CreditExplain RAG

Evidence-backed, auditable explanations for credit decisions, with citations, PII safety, and regulator-ready audit trails.

Built for the NSK.AI RAG Hackathon 2025, CreditExplain RAG helps compliance officers and analysts quickly answer:

  • "Why was this loan declined?"
  • "Which clause justifies KYC step X?"

The system integrates advanced RAG techniques (SELF-RAG critic loop, reranking, provenance logging, PII redaction) to provide trustworthy, auditable explanations.

Try the system here: https://credit-explain.vercel.app

✨ Project Overview

CreditExplain RAG addresses the critical need for transparent, evidence-based explanations in financial compliance and credit decisioning. Traditional AI systems often provide "black box" responses without verifiable sources, making them unsuitable for regulated environments.

Our solution provides:

  • Regulatory Compliance: Every explanation cites specific clauses from authoritative documents
  • Audit Trail: Complete provenance tracking with reflection token scoring
  • PII Protection: Automatic redaction of sensitive personal information
  • Multi-jurisdictional Support: Handling of Nigerian, Kenyan, and global financial regulations
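To make the PII protection point concrete, here is a minimal sketch of regex-based redaction. The patterns and the `redact_pii` helper are hypothetical and only illustrate the idea; the project's actual redaction rules may be broader or implemented differently.

```python
import re

# Hypothetical patterns for illustration only; real redaction rules
# would likely cover more PII types (national IDs, account numbers, ...).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}


def redact_pii(text: str) -> str:
    """Replace each match with a placeholder such as [EMAIL] or [PHONE]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact_pii("Contact jane.doe@example.com or +254 712 345 678."))
```

Redacting before text is logged or sent to an external LLM keeps sensitive values out of both the audit trail and third-party requests.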

๐Ÿ› ๏ธ Tech Stack

Backend

  • FastAPI - Modern Python web framework
  • LangChain - RAG orchestration and tooling
  • ChromaDB - Vector database for document storage
  • HuggingFace Transformers - Embeddings and reranking models
  • Groq API - LLM inference for critic and generator components
  • Pydantic - Data validation and serialization

Frontend

  • React 18 - User interface library
  • TypeScript - Type-safe development
  • Vite - Build tool and dev server
  • Tailwind CSS - Utility-first CSS framework
  • TanStack Query - Server state management
  • Axios - HTTP client for API communication

Machine Learning

  • SELF-RAG Architecture - Adaptive retrieval with reflection tokens
  • sentence-transformers/all-MiniLM-L6-v2 - Embedding model
  • cross-encoder/ms-marco-MiniLM-L-6-v2 - Reranker model
  • Llama 3 - LLMs for generation and critique
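The components above can be pictured as a retrieve, rerank, generate, critique loop. The sketch below is a simplified illustration with pluggable callables, not the code in core/self_rag.py. The function name, its parameters, and the retry policy (widening the evidence window by one passage) are all assumptions.

```python
from typing import Callable, List, Tuple


def answer_with_critic(
    question: str,
    retrieve: Callable[[str], List[str]],
    score: Callable[[str, str], float],
    generate: Callable[[str, List[str]], str],
    is_supported: Callable[[str, List[str]], bool],
    top_k: int = 3,
    max_rounds: int = 2,
) -> Tuple[str, List[str]]:
    """Retrieve, rerank, generate, then let a critic accept or reject the draft."""
    passages: List[str] = []
    draft = ""
    for _ in range(max_rounds):
        candidates = retrieve(question)
        # Rerank: highest relevance score first, then keep the top_k passages.
        ranked = sorted(candidates, key=lambda p: score(question, p), reverse=True)
        passages = ranked[:top_k]
        draft = generate(question, passages)
        # The critic decides whether the draft is grounded in the passages.
        if is_supported(draft, passages):
            return draft, passages
        # Assumed retry policy: widen the evidence window by one passage.
        top_k += 1
    # No draft passed the critic; return the last attempt with its evidence.
    return draft, passages
```

In a real deployment, `score` would wrap the cross-encoder reranker, and `generate` and `is_supported` would call the LLM. Passing them in as callables keeps the control flow testable without network access.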

🚀 Installation & Setup

Pre-requisites

  • Python 3.9+ installed on your system
  • Node.js 18+ and npm for frontend development
  • Groq API account and API key for LLM access
  • Git for version control

Setup Instructions

1. Clone & Install Backend Dependencies

git clone https://github.com/Maina314159/CreditExplain
cd CreditExplain

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install Python dependencies
pip install -r requirements.txt

2. Configure Environment Variables

# Copy and update the environment template
cp .env.example .env

Edit .env with your configuration:

GROQ_API_KEY=your_groq_api_key_here
  • For the frontend, also create a .env file that points to the backend server:
# From the repository root, create the frontend .env file
cp frontend/.env.example frontend/.env

# Then set the backend server URL in frontend/.env
VITE_BACKEND_URL=http://localhost:8000
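
A quick way to confirm the backend configuration before starting the server is to check that the required variables are set. This is a minimal sketch, assuming the variables have already been exported or loaded from .env by the application. The `REQUIRED_VARS` tuple and `missing_env_vars` helper are hypothetical names, not part of the repository.

```python
import os

# Only GROQ_API_KEY is named in this README; extend the tuple as needed.
REQUIRED_VARS = ("GROQ_API_KEY",)


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Environment looks complete.")
```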

3. Setup Frontend

cd frontend
npm install

4. Ingest Sample Documents

cd ..
python -m ingest.index

5. Run the Application

Start Backend API:

python -m api.app
# API server starts at http://localhost:8000

Start Frontend (in new terminal):

cd frontend
npm run dev
# Frontend starts at http://localhost:5173

CLI Application

  • You can also interact with the system through the terminal app:
python -m core.self_rag

💡 Basic Usage

  1. Access the Web Interface: Open http://localhost:5173 in your browser

  2. Upload Documents: Navigate to the Upload page to add regulatory PDFs

    • Supported formats: PDF, text documents
    • Documents are automatically chunked and indexed
  3. Ask Questions: Use the Query interface to ask compliance questions like:

    • "What are the capital requirements for banks in Nigeria?"
    • "What are the financial regulations in Kenya?"
    • "What documents are required for KYC verification?"
  4. Review Results: Each response includes:

    • Evidence-backed explanations with citations
    • Confidence scores (HIGH, MEDIUM, LOW)
    • Source document references with exact excerpts
    • Suggested follow-up questions
  5. Monitor Performance: Check the Metrics dashboard for system performance and audit logs
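
The result fields listed in step 4 could be modelled roughly as follows. This is an illustrative sketch using standard-library dataclasses. The class names, field names, and confidence thresholds are assumptions, and the actual Pydantic models in api/models.py may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Citation:
    document: str  # name of the source document
    excerpt: str   # exact excerpt that backs the claim


@dataclass
class Explanation:
    answer: str
    confidence: str  # "HIGH", "MEDIUM", or "LOW"
    citations: List[Citation] = field(default_factory=list)
    follow_ups: List[str] = field(default_factory=list)


def confidence_label(score: float) -> str:
    """Map a 0-1 score to a label; the thresholds here are illustrative."""
    if score >= 0.75:
        return "HIGH"
    if score >= 0.4:
        return "MEDIUM"
    return "LOW"
```

Keeping citations as structured objects rather than inline text makes it straightforward to render exact excerpts in the UI and to store them in audit logs.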

๐Ÿ“ Repository Structure

.
├── .github/                 # GitHub templates and workflows
├── api/                     # FastAPI backend application
│   ├── app.py               # Main FastAPI application with CORS
│   ├── models.py            # Pydantic models for request/response
│   └── __init__.py
├── core/                    # RAG pipeline core components
│   ├── self_rag.py          # Main SELF-RAG orchestration logic
│   ├── critic.py            # Critic model for retrieval decisions
│   ├── generator.py         # Response generation component
│   ├── retrieval.py         # Vector retrieval functionality
│   ├── reranker.py          # Cross-encoder reranking
│   ├── prompts.py           # LLM prompt templates
│   ├── provenance.py        # Audit logging and provenance tracking
│   └── __init__.py
├── data/                    # Document storage
│   ├── raw/                 # Original PDF documents
│   └── interim/             # Processed data files
├── frontend/                # React TypeScript frontend
│   ├── src/
│   │   ├── api/             # API client and hooks
│   │   ├── components/      # Reusable UI components
│   │   ├── pages/           # Main application pages
│   │   ├── hooks/           # Custom React hooks
│   │   ├── types/           # TypeScript type definitions
│   │   └── utils/           # Utility functions
│   ├── package.json
│   ├── vite.config.ts
│   └── tsconfig.json
├── ingest/                  # Document ingestion pipeline
│   ├── loader.py            # PDF loading and parsing
│   ├── chunker.py           # Text chunking strategies
│   ├── index.py             # Vector indexing process
│   ├── normalize.py         # Text normalization
│   └── __init__.py
├── eval/                    # Evaluation and metrics
│   ├── metrics.py           # Performance metrics calculation
│   └── __init__.py
├── tests/                   # Test suites
│   ├── unit/                # Unit tests
│   ├── integration/         # Integration tests
│   ├── demo_data/           # Test data
│   └── scripts/             # Test scripts
├── requirements.txt         # Python dependencies
└── README.md               # This file

โš ๏ธ Known Issues

Current Limitations

  1. PDF Parsing Accuracy: Complex PDF layouts with tables and multi-column formats may not parse perfectly
  2. Rate Limiting: Groq API has rate limits that may affect performance during high usage
  3. Context Length: Chunks are currently limited to roughly 1,000 tokens due to model constraints
  4. Metadata Extraction: Some document metadata (section headers, page numbers) may not be fully preserved
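
To illustrate the chunk-size limit mentioned above, here is a minimal sketch of fixed-size chunking with overlap. It counts whitespace-separated words as a rough stand-in for tokens, which is an assumption. The `chunk_words` helper and its defaults are hypothetical, and the strategies in ingest/chunker.py may differ.

```python
from typing import List


def chunk_words(text: str, max_words: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into chunks of at most max_words words, overlapping by `overlap`."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current chunk has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


print(chunk_words("a b c d e", max_words=3, overlap=1))
```

The overlap means a clause that straddles a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated text in the index.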

Performance Considerations

  • Initial document ingestion can be slow for large PDF collections
  • Real-time query processing typically takes 5-15 seconds depending on complexity
  • Vector search performance degrades with very large document collections (>10,000 chunks)

Browser Compatibility

  • Best experienced in modern browsers (Chrome, Firefox, Safari, Edge)
  • The mobile experience is functional, but the interface is optimized for desktop use

🔮 Future Development

Planned Features

  • Real-time Collaboration: Multi-user support with shared workspaces
  • Advanced Document Types: Support for Word documents, HTML, and scanned PDFs
  • Custom Model Support: Integration with local LLMs and embedding models
  • Enhanced Analytics: Advanced dashboard with trend analysis and compliance reporting
  • API Extensions: Webhook support and third-party integrations

Research Directions

  • Improved chunking strategies for legal and regulatory documents
  • Multi-hop reasoning across multiple documents
  • Automated regulatory change detection and alerting
  • Cross-jurisdictional compliance mapping

๐Ÿ™ Acknowledgement

This project was developed for the NSK.AI RAG Hackathon 2025 and builds upon several open-source technologies and research:

  • SELF-RAG Paper (Asai et al., 2023) for the adaptive retrieval framework
  • LangChain and LangSmith for RAG orchestration tools
  • HuggingFace for transformer models and embeddings
  • FastAPI for the high-performance backend framework
  • TanStack Query (formerly React Query) for efficient server state management

Special thanks to the regulatory bodies whose documents made this system possible:

  • Central Bank of Nigeria (CBN)
  • Central Bank of Kenya (CBK)
  • Financial Action Task Force (FATF)

📞 Contact Information

For questions, issues, or contributions:

  • Project Maintainer: CreditExplain Team

We welcome bug reports, feature requests, and contributions from the community!

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

Built with โค๏ธ for the NSK.AI RAG Hackathon 2025
