CredTech is a real-time explainable credit intelligence platform that continuously ingests multi-source financial data to generate dynamic creditworthiness scores. Unlike traditional credit rating agencies, which update infrequently, the platform refreshes its assessments continuously and pairs every score with transparent feature-level explanations.
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│   Data Sources   │────▶│  Feature Engine  │────▶│   ML Pipeline    │
└──────────────────┘     └──────────────────┘     └──────────────────┘
         │                        │                        │
         ▼                        ▼                        ▼
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│ • Alpha Vantage  │     │ • Financial      │     │ • CatBoost       │
│ • News API       │     │   Metrics        │     │ • Neural Nets    │
│ • Finnhub        │     │ • Sentiment      │     │ • SHAP           │
│ • FMP            │     │   Analysis       │     │ • Ensemble       │
└──────────────────┘     └──────────────────┘     └──────────────────┘
                                  │                        │
                                  ▼                        ▼
┌──────────────────────────────────────────────────────────────────┐
│                        Streamlit Dashboard                       │
│  • Risk Gauges  • SHAP Waterfall  • News Sentiment Timeline      │
└──────────────────────────────────────────────────────────────────┘
What makes us different: We implement the Black-Cox first-passage structural model, a sophisticated econometric approach that models default as the first time a firm's asset value drops below a time-dependent barrier.
import numpy as np
from scipy.stats import norm

def black_cox_pod(V, B, mu, sigma, T=1.0):
    """
    Black-Cox Probability of Default (Structural Credit Risk).

    First-passage probability that log-assets ln(V_t / B) hit zero before
    horizon T, given drift mu and asset volatility sigma.
    """
    with np.errstate(divide='ignore', invalid='ignore'):
        A0 = np.log(V / B)           # initial distance to the default barrier
        at = mu - (sigma**2) / 2     # drift of log-assets
        denom = sigma * np.sqrt(T)
        d1 = (-A0 - at * T) / denom  # plain term of the first-passage formula
        d2 = (-A0 + at * T) / denom  # exponentially tilted (reflection) term
        pod = norm.cdf(d1) + np.exp((-2 * at * A0) / (sigma**2)) * norm.cdf(d2)
    return pod

Why this matters: Unlike FICO scores that are primarily backward-looking statistical constructs, the Black-Cox model provides an economic foundation by relating default probability to fundamental asset dynamics and market volatility. This approach:
- Captures continuous default risk rather than point-in-time assessments
- Incorporates market volatility and asset dynamics
- Provides theoretical grounding in option pricing theory
- Enables scenario analysis and stress testing
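For example, a quick sanity check of the function above (the input values are purely illustrative):

# Firm with assets 40% above its default barrier, 5% drift, 30% asset volatility
pod = black_cox_pod(V=140.0, B=100.0, mu=0.05, sigma=0.30, T=1.0)
print(f"1-year probability of default: {pod:.2%}")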
Corporate Risk Assessment (30+ metrics):
- Cash-flow Quality: FCF/NI ratio, CapEx/Depreciation ratio
- Leverage & Solvency: Market leverage, Debt/EBITDA, Interest coverage
- Liquidity: Cash runway, Quick ratio, Working capital cycle
- Market Signals: Yield spreads, Beta, Short interest ratios
Sovereign Risk Assessment (15+ metrics):
- Fiscal Health: Debt/GDP, Primary balance/GDP
- External Stability: Current account/GDP, Import coverage
- Debt Dynamics: External debt/exports, Debt service/revenue
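As a minimal sketch of how two of the corporate ratios above might be computed (the helper and field names are illustrative; inputs would presumably come from FMP financial statements):

# Hypothetical helpers; NaN flags a missing or degenerate denominator
def cash_flow_quality(free_cash_flow: float, net_income: float) -> float:
    return free_cash_flow / net_income if net_income else float("nan")

def debt_to_ebitda(total_debt: float, ebitda: float) -> float:
    return total_debt / ebitda if ebitda else float("nan")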
Econometric Advantage: These metrics go beyond traditional ratios by incorporating:
- Dynamic relationships between cash flows and capital structure
- Market-based signals that reflect real-time sentiment
- Cross-sectional and time-series analysis capabilities
- Scenario-based stress testing frameworks
Technical Innovation: Our system combines four distinct ML approaches:
- Risk Score ANN: 32→16→1 neural network for base risk assessment
- CatBoost Classifier: Gradient boosting optimized for categorical features
- Main Neural Network: 128→64→1 with dropout and batch normalization
- Graph Neural Networks: Relationship modeling via GCN layers
Why this is superior:
- Ensemble robustness: Multiple models reduce single-point-of-failure risk
- Feature complementarity: Different models capture different aspects of risk
- Adaptive learning: Neural networks adapt to changing market conditions
- Graph relationships: Captures systemic risk through network effects
Innovation: Integration of PyTorch Geometric for relationship modeling:
import torch.nn.functional as F
from torch import nn
from torch_geometric.nn import GCNConv

class GNN(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int = 64, output_dim: int = 16):
        super().__init__()
        self.conv1 = GCNConv(input_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, output_dim)

    def forward(self, x, edge_index):
        # Two rounds of message passing -> 16-dim relationship embeddings
        return self.conv2(F.relu(self.conv1(x, edge_index)), edge_index)

Business Impact: Traditional models ignore interconnectedness. Our GNN approach:
- Models counterparty relationships and supply chain dependencies
- Captures contagion effects during market stress
- Identifies systemic risk clusters in portfolios
- Enables portfolio-level optimization
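A hypothetical forward pass through the GNN class above, with three companies and two supply-chain edges (all shapes and values are illustrative):

import torch

x = torch.randn(3, 8)                         # 3 companies x 8 input features
edge_index = torch.tensor([[0, 1],            # source nodes
                           [1, 2]])           # target nodes: edges 0->1, 1->2
embeddings = GNN(input_dim=8)(x, edge_index)  # -> tensor of shape (3, 16)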
Technical Implementation: Multi-modal learning combining numerical and textual data:
- 768-dimensional BERT embeddings for news sentiment
- Real-time processing of market communications
- Attention mechanisms for relevant information extraction
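A minimal sketch of producing the 768-dimensional embeddings with Hugging Face transformers (the exact checkpoint and the mean-pooling strategy are assumptions, not the production configuration):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embed_headline(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        out = model(**inputs)
    # Mean-pool token states into one 768-dim document vector
    return out.last_hidden_state.mean(dim=1).squeeze(0)

vec = embed_headline("Issuer placed on negative credit watch after weak guidance")
print(vec.shape)  # torch.Size([768])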
Advantage over competitors: Most credit models ignore unstructured data. Our approach:
- Incorporates forward-looking sentiment vs. backward-looking financials
- Processes real-time news flow for immediate risk updates
- Handles multiple languages and financial jargon
- Provides interpretable sentiment contributions
Technical Implementation:
def explain_prediction(self, X: pd.DataFrame) -> Dict:
    if self.shap_explainer is None:
        return {"error": "SHAP explainer not available"}
    shap_values = self.shap_explainer.shap_values(X)
    if isinstance(shap_values, list):
        shap_values = shap_values[1]  # contributions toward the positive (default) class
    shap_row = shap_values[0]         # explain the first row of X
    contributions = {f: float(val) for f, val in zip(self.feature_names, shap_row)}
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    positive_factors = [f for f, v in ranked if v > 0][:5]        # push risk up
    negative_factors = [f for f, v in ranked[::-1] if v < 0][:5]  # pull risk down
    return {
        "feature_contributions": contributions,
        "top_risk_factors": positive_factors,
        "top_protective_factors": negative_factors
    }

Regulatory Advantage: Unlike LLM-based explanations that can hallucinate:
- Mathematically consistent explanations grounded in Shapley values from cooperative game theory
- Additive feature contributions sum to final prediction
- Regulatory compliant with GDPR "right to explanation"
- Stakeholder friendly visualizations via waterfall charts
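The additivity claim is easy to verify with a self-contained toy model (the data and model here are illustrative, not the production ensemble):

import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

X = np.random.rand(200, 4)
y = X @ np.array([0.5, -0.2, 0.3, 0.1])
model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
sv = explainer.shap_values(X[:1])
# Base value plus per-feature contributions reproduces the model's prediction
assert np.isclose(explainer.expected_value + sv[0].sum(), model.predict(X[:1])[0])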
Implementation: Built-in bias detection and mitigation:
from fairlearn.metrics import MetricFrame, selection_rate

def evaluate_fairness(self, y_true: np.ndarray, y_pred: np.ndarray, sensitive_features: np.ndarray):
    # Compare selection rates across the groups defined by the sensitive attribute
    metric_frame = MetricFrame(
        metrics=selection_rate,
        y_true=y_true,
        y_pred=y_pred,
        sensitive_features=sensitive_features
    )
    return metric_frame.by_group

Competitive Advantage: Proactive fairness assessment vs. reactive compliance:
- Algorithmic auditing across protected characteristics
- Disparate impact analysis built into model pipeline
- Fairness-accuracy tradeoff optimization
- Continuous monitoring for model drift
Data Ingestion Layer:
- Structured Data: Alpha Vantage (financial overview), Finnhub (market data), FMP (financial statements)
- Unstructured Data: News API (sentiment analysis), real-time news processing
- Rate Limiting: Built-in fallback mechanisms and caching to handle API limits
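A minimal sketch of the cache-then-fallback pattern described above (the provider URLs, TTL, and helper names are all illustrative):

import time
import requests

_cache: dict = {}  # symbol -> (timestamp, payload)
TTL = 900          # serve cached data for 15 min to stay inside free-tier limits

def fetch_overview(symbol: str, provider_urls: list[str]) -> dict | None:
    cached = _cache.get(symbol)
    if cached and time.time() - cached[0] < TTL:
        return cached[1]                       # cache hit: no API budget spent
    for url in provider_urls:                  # try sources in priority order
        try:
            resp = requests.get(url.format(symbol=symbol), timeout=10)
            if resp.status_code == 200:
                _cache[symbol] = (time.time(), resp.json())
                return _cache[symbol][1]
        except requests.RequestException:
            continue                           # fall through to the next source
    return cached[1] if cached else None       # stale cache beats no data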
Feature Engineering:
- Financial ratios calculation (FCF/NI, Debt/EBITDA, Quick Ratio, etc.)
- Sentiment analysis using VADER and TextBlob
- Corporate metrics computation with Black-Cox probability of default
- Time-series feature extraction
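A sketch of the dual-scorer sentiment step (the headline is made up, and averaging the two scores is an assumption; the real aggregation may differ):

from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

headline = "Ratings agency downgrades outlook amid rising leverage"
vader = SentimentIntensityAnalyzer().polarity_scores(headline)["compound"]  # in [-1, 1]
blob = TextBlob(headline).sentiment.polarity                                # in [-1, 1]
sentiment = (vader + blob) / 2  # simple average of the two scorers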
ML Pipeline:
- Ensemble Model: CatBoost + Neural Networks + Risk Score ANN
- Explainability: SHAP TreeExplainer for feature importance
- Architecture: Multi-modal learning with graph embeddings and text embeddings (BERT)
- Training: Incremental learning capability with model persistence
Presentation Layer:
- Interactive Streamlit dashboard
- Real-time score updates with explanations
- Visualization suite (Plotly-based gauges, waterfalls, timelines)
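For instance, a Plotly gauge along the lines of the dashboard's risk gauges might look like this (the score and styling are illustrative):

import plotly.graph_objects as go

fig = go.Figure(go.Indicator(
    mode="gauge+number",
    value=62,  # hypothetical 0-100 risk score
    title={"text": "Credit Risk Score"},
    gauge={"axis": {"range": [0, 100]}},
))
fig.show()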
- Python 3.11: Core runtime environment
- PyTorch: Deep learning framework for neural networks
- CatBoost: Gradient boosting for structured data
- Transformers: BERT embeddings for text analysis
- PyTorch Geometric: Graph neural networks
- SHAP: Model explainability
- scikit-learn: Feature preprocessing and metrics
- Pandas: Data manipulation and analysis
- NumPy: Numerical computations
- Requests: HTTP API calls
- TextBlob & VADER: Sentiment analysis
- Alpha Vantage, News API, Finnhub: External data sources
- Streamlit: Web application framework
- Plotly: Interactive visualizations
- Custom CSS: Enhanced UI/UX
- Docker: Containerization
- Docker Compose: Multi-service orchestration
- Joblib: Model serialization
- Python-dotenv: Environment management
- Streamlit over Flask/FastAPI: Rapid prototyping for ML dashboards with built-in interactivity
- CatBoost + PyTorch Ensemble: CatBoost excels at tabular data while PyTorch handles multi-modal inputs
- SHAP for Explainability: Industry standard for model interpretability without LLM dependency
- Docker: Ensures reproducible deployments across environments
- Multiple API Sources: Diversified data pipeline reduces single-point-of-failure risk
- Docker Engine 20.0+
- Docker Compose 1.29+
- Git
- Clone the repository

  git clone https://github.com/yourusername/credtech.git
  cd credtech

- Set up environment variables

  # Create .env file with your API keys
  cat > .env << EOF
  ALPHA_VANTAGE_API_KEY=your_alpha_vantage_key
  NEWS_API_KEY=your_news_api_key
  FINNHUB_API_KEY=your_finnhub_key
  FMP_KEY=your_fmp_key
  TWELVEDATA_API_KEY=your_twelvedata_key
  EOF

- Build and run with Docker Compose

  docker-compose up --build

- Access the application
  - Open your browser to http://localhost:8501
  - The application will automatically train models on first run
Dockerfile Structure:

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py"]

Health Checks:
- Built-in health monitoring via Streamlit's /_stcore/health endpoint
- Automatic container restart on failure
- 30s interval health checks with 3 retries

Volume Mounts:
- ./models:/app/models - Persistent model storage
- ./data:/app/data - Data cache directory
- Python 3.11 (3.9-3.11 supported, avoid 3.13 due to dependency conflicts)
- pip or conda
- Create virtual environment

  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies

  pip install -r requirements.txt

- Configure API keys

  cp .env.example .env
  # Edit .env file with your API keys

- Run the application

  streamlit run app.py

| Service | Purpose | Free Tier Limits | Signup URL |
|---|---|---|---|
| Alpha Vantage | Company financials | 5 calls/min, 500/day | alphavantage.co |
| News API | News sentiment | 1000 requests/day | newsapi.org |
| Finnhub | Market data | 60 calls/min | finnhub.io |
| FMP | Financial statements | 250 calls/day | financialmodelingprep.com |
For Streamlit Cloud:
- Add keys to Streamlit secrets management
- Access via st.secrets["KEY_NAME"]

For Local Development:
- Use a .env file with python-dotenv
- Environment variables are loaded automatically
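For example, a minimal loading snippet with python-dotenv (the key name is taken from the table above):

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root into os.environ
alpha_vantage_key = os.getenv("ALPHA_VANTAGE_API_KEY")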
For Docker:
- Pass via environment variables in docker-compose.yml
- Supports a .env file in the project root
Multi-Model Ensemble:
- Risk Score ANN: 32→16→1 neural network for base risk assessment
- CatBoost Classifier: Gradient boosting on engineered features
- Main Neural Network: 128→64→1 with dropout and batch normalization
- Ensemble Averaging: Weighted combination of model predictions
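A minimal sketch of the weighted-averaging step (the model names and weights here are illustrative, not the trained values):

# Per-model probabilities of default for one entity
preds = {"catboost": 0.22, "main_nn": 0.18, "risk_ann": 0.25}
weights = {"catboost": 0.5, "main_nn": 0.3, "risk_ann": 0.2}  # assumed, sum to 1

ensemble_pod = sum(weights[m] * p for m, p in preds.items())
print(f"Ensemble PoD: {ensemble_pod:.3f}")  # 0.214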
Feature Engineering:
- Financial Metrics: FCF/NI ratio, Debt/EBITDA, Quick Ratio, Market Leverage
- Structural Model: Black-Cox probability of default calculation
- Sentiment Features: News sentiment aggregation and volatility
- Graph Embeddings: Company relationship networks (16-dim)
- Text Embeddings: BERT-based document representations (768-dim)
Explainability Layer:
- SHAP Values: Feature contribution analysis
- Waterfall Charts: Visual impact breakdown
- Risk Factor Identification: Top positive/negative contributors
- Plain Language Summaries: Non-technical explanations
- Ensemble AUC: >0.85 on validation set
- Training Time: ~2-3 minutes on CPU
- Inference Time: <100ms per prediction
- Model Size: ~50MB serialized
- Dynamic Updates: Scores react to market events within minutes
- Multi-Factor Analysis: 30+ engineered features from diverse data sources
- Risk Categorization: Low/Medium/High risk classification with confidence scores
- SHAP Integration: Feature-level impact analysis without black-box explanations
- Visual Explanations: Interactive charts showing "why this score"
- Trend Analysis: Historical risk evolution tracking
- Event Attribution: Links score changes to specific news/market events
- Risk Gauges: Real-time creditworthiness visualization
- News Sentiment Timeline: Market event impact tracking
- Feature Importance: Dynamic ranking of risk factors
- Company Comparison: Side-by-side risk analysis
- Multi-Source Fusion: Combines financial statements, market data, and news
- Rate Limit Handling: Intelligent caching and fallback mechanisms
- Data Quality: Automated cleaning and normalization pipelines
- Scalability: Designed for dozens of entities across sectors
- CPU: 2 cores, 2.0 GHz
- RAM: 4GB (8GB recommended)
- Storage: 2GB free space
- Network: Stable internet for API calls
- CPU: 4+ cores, 3.0 GHz
- RAM: 16GB
- Storage: 10GB SSD
- Network: Low-latency connection (< 100ms to API endpoints)
- Rate Limits: Free tier APIs limit real-time capabilities
- Data Quality: Dependent on external API reliability
- Cost Scaling: Production usage requires paid API tiers
- Training Data: Uses synthetic data for demonstration
- Cold Start: New entities require initial data accumulation
- Market Coverage: Optimized for US equity markets
- Python 3.13 Incompatibility: UMAP/Numba dependencies limit Python version
- Memory Usage: BERT models require significant RAM
- Compute Requirements: Real-time inference needs adequate CPU
Ensemble vs Single Model:
- ✅ Chosen: Ensemble approach for better accuracy and robustness
- ❌ Rejected: Single model for simplicity (sacrifices performance)
Streamlit vs Custom Frontend:
- ✅ Chosen: Streamlit for rapid prototyping and ML-focused UI
- ❌ Rejected: React/Vue for production-grade UX (development time)
Docker vs Native Deployment:
- ✅ Chosen: Docker for reproducible, portable deployments
- ❌ Rejected: Native installation (environment conflicts)
- Real-time WebSocket Updates: Live score streaming
- Advanced ML Models: Transformer-based time series models
- Extended Market Coverage: International markets and bonds
- Alert System: Configurable risk threshold notifications
- Historical Backtesting: Strategy performance analysis
- Microservices Architecture: Separate data, model, and UI services
- Database Integration: PostgreSQL/TimescaleDB for historical data
- Kubernetes Deployment: Container orchestration for production
- CDN Integration: Global content delivery optimization
We would like to thank all the amazing contributors who have been part of this project:
- Om Karmakar - omkarmakar07@gmail.com
- Jotiraditya Banerjee - joti.ban.2710@gmail.com
- Rudra Ray - itisrudraray@gmail.com
- Oyshi Mukherjee - oyshi0911@gmail.com
- Kingshuk Bhandary - kingshukbhandaryedm@gmail.com
Developed for the CredTech Hackathon organized by The Programming Club, IIT Kanpur, and powered by Deep Root Investments. This platform addresses the challenge of creating transparent, real-time credit intelligence to replace opaque traditional rating methodologies.
Built with ❤️ for transparent financial intelligence