Maina314159/CreditExplain

📘 CreditExplain RAG

Evidence-backed, auditable explanations for credit decisions, with citations, PII safety, and regulator-ready audit trails.

CreditExplain RAG helps compliance officers and analysts quickly answer:

  • "Why was this loan declined?"
  • "Which clause justifies KYC step X?"

The system integrates advanced RAG techniques (SELF-RAG critic loop, reranking, provenance logging, PII redaction) to provide trustworthy, auditable explanations.

Try the system here: https://credit-explain.vercel.app

✨ Project Overview

CreditExplain RAG addresses the critical need for transparent, evidence-based explanations in financial compliance and credit decisioning. Traditional AI systems often provide "black box" responses without verifiable sources, making them unsuitable for regulated environments.

Our solution provides:

  • Regulatory Compliance: Every explanation cites specific clauses from authoritative documents
  • Audit Trail: Complete provenance tracking with reflection token scoring
  • PII Protection: Automatic redaction of sensitive personal information
  • Multi-jurisdictional Support: Handling of Nigerian, Kenyan, and global financial regulations
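
As a rough illustration of the PII-protection and audit-trail points above, the sketch below redacts two common patterns with regular expressions and builds one JSON log line per query. The patterns, the `redact_pii` and `audit_record` names, and the record fields are assumptions made for this example, not the project's actual implementation.

```python
import hashlib
import json
import re

# Hypothetical patterns; a real deployment would need jurisdiction-specific rules.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{8,}\d"),
}


def redact_pii(text: str) -> str:
    """Replace each match with a placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def audit_record(query: str, sources: list[str]) -> str:
    """Build one JSON line: the redacted query, its hash, and the cited sources."""
    redacted = redact_pii(query)
    return json.dumps({
        "query": redacted,
        "query_sha256": hashlib.sha256(redacted.encode("utf-8")).hexdigest(),
        "sources": sources,
    })
```

Redacting before hashing and logging means the audit trail never stores the raw personal data, at the cost of not being able to recover the original query from the log.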

How this project promotes responsible and ethical use of AI

AI in Credit Scoring & Loan Approval

1. Description and Purpose

Banks and fintechs use AI models to assess creditworthiness by analyzing transaction history, repayment behavior, mobile data, and alternative data sources. The goal is faster, more accurate lending decisions.

2. Ethical Risks

  • Bias & Financial Exclusion: Marginalized groups may be unfairly denied credit due to biased data.
  • Lack of Explainability: Applicants often don't understand why loans are rejected.
  • Privacy Concerns: Use of alternative data (e.g., phone usage) can be intrusive.

3. Responsible AI Mitigations

This is where CreditExplain comes in:

  • Fairness testing across demographics
  • Explainable credit decisions (reason codes)
  • Strict limits on data sources and informed consent

🛠️ Tech Stack

Backend

  • FastAPI - Modern Python web framework
  • LangChain - RAG orchestration and tooling
  • ChromaDB - Vector database for document storage
  • HuggingFace Transformers - Embeddings and reranking models
  • Groq API - LLM inference for critic and generator components
  • Pydantic - Data validation and serialization

Frontend

  • React 18 - User interface library
  • TypeScript - Type-safe development
  • Vite - Build tool and dev server
  • Tailwind CSS - Utility-first CSS framework
  • TanStack Query - Server state management
  • Axios - HTTP client for API communication

Machine Learning

  • SELF-RAG Architecture - Adaptive retrieval with reflection tokens
  • sentence-transformers/all-MiniLM-L6-v2 - Embedding model
  • cross-encoder/ms-marco-MiniLM-L-6-v2 - Reranker model
  • Llama 3 - LLMs for generation and critique
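
The components listed above can be pictured as a retrieve, rerank, generate, critique loop. The sketch below is a simplified stand-in for that idea, not the code in `core/self_rag.py`: the `answer_with_critic` function, its parameters, and the 0 to 1 support score are hypothetical. In the real stack, `score` would presumably be the cross-encoder reranker and `critique` an LLM call, but here they are plain callables so the control flow can be read on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Answer:
    text: str
    support: float        # critic's support score in [0, 1] (hypothetical scale)
    passages: List[str]   # passages the answer was generated from


def answer_with_critic(
    query: str,
    retrieve: Callable[[str, int], List[str]],
    score: Callable[[str, str], float],
    generate: Callable[[str, List[str]], str],
    critique: Callable[[str, List[str]], float],
    top_k: int = 3,
    min_support: float = 0.5,
    max_rounds: int = 2,
) -> Answer:
    """Retrieve, rerank, generate, then let a critic accept or widen retrieval."""
    best = Answer(text="", support=-1.0, passages=[])
    candidates = top_k * 4
    for _ in range(max_rounds):
        passages = retrieve(query, candidates)
        # Rerank: keep the top_k passages by (query, passage) relevance score.
        ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)[:top_k]
        text = generate(query, ranked)
        support = critique(text, ranked)
        if support > best.support:
            best = Answer(text, support, ranked)
        if support >= min_support:
            break
        candidates *= 2  # widen the candidate pool and try again
    return best
```

Returning the passages alongside the answer is what makes citation and provenance logging possible downstream.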

🚀 Installation & Setup

Pre-requisites

  • Python 3.9+ installed on your system
  • Node.js 18+ and npm for frontend development
  • Groq API account and API key for LLM access
  • Git for version control

Setup Instructions

1. Clone & Install Backend Dependencies

git clone https://github.com/Maina314159/CreditExplain
cd CreditExplain

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install Python dependencies
pip install -r requirements.txt

2. Configure Environment Variables

# Copy and update the environment template
cp .env.example .env

Edit .env with your configuration:

GROQ_API_KEY=your_groq_api_key_here
  • For the frontend, also create a .env file and point it at the backend server:
# Copy the frontend environment template (run from the repository root)
cp frontend/.env.example frontend/.env

# Then set the backend URL in frontend/.env
VITE_BACKEND_URL=http://localhost:8000
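
Before starting the backend, it can help to fail early if the Groq key is missing or still set to the template placeholder. The helper below is a hypothetical sketch: `load_groq_key` is not part of the repository, and it assumes the values from .env have already been loaded into the process environment (for example by a dotenv loader, which is an assumption about how the project reads its configuration).

```python
import os
from typing import Mapping


def load_groq_key(env: Mapping[str, str] = os.environ) -> str:
    """Return GROQ_API_KEY, or fail early with a readable message."""
    key = env.get("GROQ_API_KEY", "").strip()
    # Treat the placeholder from the template as "not configured".
    if not key or key == "your_groq_api_key_here":
        raise RuntimeError("GROQ_API_KEY is not set; edit .env before starting the API.")
    return key
```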

3. Setup Frontend

cd frontend
npm install

4. Ingest Sample Documents

cd ..
python -m ingest.index

5. Run the Application

Start Backend API:

python -m api.app
# API server starts at http://localhost:8000

Start Frontend (in new terminal):

cd frontend
npm run dev
# Frontend starts at http://localhost:5173

CLI Application

  • You can also choose to interact with the terminal app:
python -m core.self_rag

💡 Basic Usage

  1. Access the Web Interface: Open http://localhost:5173 in your browser

  2. Upload Documents: Navigate to the Upload page to add regulatory PDFs

    • Supported formats: PDF, text documents
    • Documents are automatically chunked and indexed
  3. Ask Questions: Use the Query interface to ask compliance questions like:

    • "What are the capital requirements for banks in Nigeria?"
    • "What are the financial regulations in Kenya?"
    • "What documents are required for KYC verification?"
  4. Review Results: Each response includes:

    • Evidence-backed explanations with citations
    • Confidence scores (HIGH, MEDIUM, LOW)
    • Source document references with exact excerpts
    • Suggested follow-up questions
  5. Monitor Performance: Check the Metrics dashboard for system performance and audit logs
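
To make the "Review Results" step more concrete, here is one possible shape for a response carrying the four items listed above. The class and field names, and the cut-offs in `confidence_label`, are illustrative assumptions; the project's actual Pydantic models live in `api/models.py` and may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Citation:
    source: str    # document name or identifier
    excerpt: str   # exact supporting excerpt


@dataclass
class QueryResponse:
    answer: str
    confidence: str                                   # "HIGH", "MEDIUM", or "LOW"
    citations: List[Citation] = field(default_factory=list)
    follow_ups: List[str] = field(default_factory=list)


def confidence_label(score: float) -> str:
    """Map a score in [0, 1] to a label; the 0.75 and 0.4 cut-offs are illustrative."""
    if score >= 0.75:
        return "HIGH"
    if score >= 0.4:
        return "MEDIUM"
    return "LOW"
```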

📁 Repository Structure

.
├── .github/                 # GitHub templates and workflows
├── api/                     # FastAPI backend application
│   ├── app.py               # Main FastAPI application with CORS
│   ├── models.py            # Pydantic models for request/response
│   └── __init__.py
├── core/                    # RAG pipeline core components
│   ├── self_rag.py          # Main SELF-RAG orchestration logic
│   ├── critic.py            # Critic model for retrieval decisions
│   ├── generator.py         # Response generation component
│   ├── retrieval.py         # Vector retrieval functionality
│   ├── reranker.py          # Cross-encoder reranking
│   ├── prompts.py           # LLM prompt templates
│   ├── provenance.py        # Audit logging and provenance tracking
│   └── __init__.py
├── data/                    # Document storage
│   ├── raw/                 # Original PDF documents
│   └── interim/             # Processed data files
├── frontend/                # React TypeScript frontend
│   ├── src/
│   │   ├── api/             # API client and hooks
│   │   ├── components/      # Reusable UI components
│   │   ├── pages/           # Main application pages
│   │   ├── hooks/           # Custom React hooks
│   │   ├── types/           # TypeScript type definitions
│   │   └── utils/           # Utility functions
│   ├── package.json
│   ├── vite.config.ts
│   └── tsconfig.json
├── ingest/                  # Document ingestion pipeline
│   ├── loader.py            # PDF loading and parsing
│   ├── chunker.py           # Text chunking strategies
│   ├── index.py             # Vector indexing process
│   ├── normalize.py         # Text normalization
│   └── __init__.py
├── eval/                    # Evaluation and metrics
│   ├── metrics.py           # Performance metrics calculation
│   └── __init__.py
├── tests/                   # Test suites
│   ├── unit/                # Unit tests
│   ├── integration/         # Integration tests
│   ├── demo_data/           # Test data
│   └── scripts/             # Test scripts
├── requirements.txt         # Python dependencies
└── README.md                # This file

⚠️ Known Issues

Current Limitations

  1. PDF Parsing Accuracy: Complex PDF layouts with tables and multi-column formats may not parse perfectly
  2. Rate Limiting: Groq API has rate limits that may affect performance during high usage
  3. Context Length: Chunks are currently limited to roughly 1,000 tokens due to model constraints
  4. Metadata Extraction: Some document metadata (section headers, page numbers) may not be fully preserved
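
The chunk-size limit above can be illustrated with a fixed-size window that overlaps its neighbor, so a clause cut at a boundary still appears whole in one chunk. This is a minimal sketch, not the strategy in `ingest/chunker.py`: the `chunk_tokens` function and the overlap of 100 are assumptions, and it counts whitespace-separated words, which only approximates model tokens.

```python
from typing import List


def chunk_tokens(text: str, max_tokens: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into overlapping windows of whitespace-separated tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    tokens = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        # Stop once the current window has reached the end of the text.
        if start + max_tokens >= len(tokens):
            break
    return chunks
```

A larger overlap reduces the chance of splitting a clause across chunks but increases the number of chunks to embed and search.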

Performance Considerations

  • Initial document ingestion can be slow for large PDF collections
  • Real-time query processing typically takes 5-15 seconds depending on complexity
  • Vector search performance degrades with very large document collections (>10,000 chunks)

Browser Compatibility

  • Best experienced in modern browsers (Chrome, Firefox, Safari, Edge)
  • Mobile experience is functional but optimized for desktop use

🔮 Future Development

Planned Features

  • Real-time Collaboration: Multi-user support with shared workspaces
  • Advanced Document Types: Support for Word documents, HTML, and scanned PDFs
  • Custom Model Support: Integration with local LLMs and embedding models
  • Enhanced Analytics: Advanced dashboard with trend analysis and compliance reporting
  • API Extensions: Webhook support and third-party integrations

Research Directions

  • Improved chunking strategies for legal and regulatory documents
  • Multi-hop reasoning across multiple documents
  • Automated regulatory change detection and alerting
  • Cross-jurisdictional compliance mapping

🙏 Acknowledgements

This project was developed for the NSK.AI RAG Hackathon 2025 and builds upon several open-source technologies and research:

  • SELF-RAG Paper (Asai et al., 2023) for the adaptive retrieval framework
  • LangChain and LangSmith for RAG orchestration tools
  • HuggingFace for transformer models and embeddings
  • FastAPI for the high-performance backend framework
  • TanStack Query (formerly React Query) for efficient server state management

Special thanks to the regulatory bodies whose documents made this system possible:

  • Central Bank of Nigeria (CBN)
  • Central Bank of Kenya (CBK)
  • Financial Action Task Force (FATF)

📞 Contact Information

For questions, issues, or contributions:

  • Project Maintainer: CreditExplain Team

We welcome bug reports, feature requests, and contributions from the community!

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.
