A data-driven credit scoring model for Ethereum wallets based on their transaction history with the Compound V2 protocol. This project leverages a hybrid machine learning strategy to transform raw on-chain data into an interpretable and robust credit score. The model uses the three largest JSON files, `compoundV2_transactions_etherium_chunk_0`, `1`, and `2`, as its datasets; these have not been uploaded due to privacy concerns.
## Table of Contents

- Overview
- Key Features
- Machine Learning Strategy
- Model Validation and Performance
- Project Structure
- Getting Started
- Usage
- License
## Overview

In Decentralized Finance (DeFi), assessing the creditworthiness of an anonymous wallet is a critical challenge. This project implements a complete pipeline to address this problem by analyzing on-chain behavior. It ingests transaction data (deposits, borrows, repays, etc.), engineers features that reflect financial health and risk, and applies a transparent scoring model to generate a final score from 0 to 100.
The result is a reliable and data-driven metric for quantifying wallet risk and reliability.
## Key Features

- Robust Data Ingestion: Efficiently loads and cleans data from multiple JSON files containing raw transaction records.
- Comprehensive Feature Engineering: Creates over 20 meaningful features for each wallet, covering financial health, account history, activity patterns, and risk indicators.
- Interpretable Scoring Model: The final score is a weighted average of three core components: Health, Trust, and Risk.
- Expert-in-the-Loop Logic: The model incorporates domain-specific business rules to penalize high-risk events like liquidations or loan defaults, ensuring scores are practical and realistic.
- Weight Sensitivity Analysis: Includes a script to test the model's stability by measuring how the top-ranked wallets change when model weights are adjusted.
- Detailed Validation Notebook: A Notebook is provided for full model validation, visualizing score distributions and confirming the profiles of the highest and lowest-scoring wallets.
## Machine Learning Strategy

This project employs a hybrid approach that combines statistical modeling with expert-defined business rules. This strategy ensures the model is both data-driven and aligned with fundamental principles of credit risk.
- Feature Engineering: Raw transaction logs are aggregated to create a wallet-level feature set. Logarithmic transformations are applied to normalize skewed distributions (e.g., transaction amounts, account age).
- Non-Linear Scaling: A `sklearn.preprocessing.QuantileTransformer` is used to scale all features to a uniform distribution between 0 and 1. This is a powerful, non-parametric method that is robust to outliers and does not assume a specific distribution for the input data.
- Component-Based Scoring: Features are grouped into three logical components (Health, Trust, Risk). An intermediate score is calculated for each component, providing model interpretability.
- Weighted Aggregation: The final raw score is a linear combination of the three component scores, allowing for easy tuning of their relative importance.
- Rule-Based Overrides: After the statistical score is calculated, a set of deterministic rules is applied. For instance, any wallet that has been liquidated or has a very low repayment ratio receives a significant score penalty. This "expert-in-the-loop" step ensures that critical, unambiguous risk factors are never missed by the model.
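The feature engineering and scaling steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`wallet`, `amount`, `timestamp`) and the specific aggregates are assumptions, since the real schema lives in `loader.py` and `features.py`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

def engineer_features(tx: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw transactions into wallet-level features.

    Column names here are illustrative; the real loader defines the schema.
    """
    grouped = tx.groupby("wallet")
    feats = pd.DataFrame({
        "n_tx": grouped.size(),
        "total_amount": grouped["amount"].sum(),
        "account_age_days": (
            grouped["timestamp"].max() - grouped["timestamp"].min()
        ).dt.days,
    })
    # Log-transform skewed quantities (amounts, account age) to tame
    # heavy-tailed distributions before scaling.
    feats["log_amount"] = np.log1p(feats["total_amount"])
    feats["log_age"] = np.log1p(feats["account_age_days"])
    return feats

def scale_features(feats: pd.DataFrame) -> pd.DataFrame:
    # QuantileTransformer maps every feature to a uniform [0, 1] distribution;
    # it is non-parametric and robust to outliers.
    qt = QuantileTransformer(output_distribution="uniform",
                             n_quantiles=min(1000, len(feats)))
    return pd.DataFrame(qt.fit_transform(feats),
                        index=feats.index, columns=feats.columns)
```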
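The component scoring, weighted aggregation, and rule-based overrides might then look like this minimal sketch. The feature-to-component groupings, the weights, and the penalty values are all illustrative assumptions, not the values used by `scoring.py`.

```python
import pandas as pd

# Illustrative groupings and weights; the real model defines its own.
COMPONENTS = {
    "health": ["repayment_ratio", "collateral_ratio"],
    "trust":  ["log_age", "n_tx"],
    "risk":   ["liquidation_count", "borrow_utilization"],
}
WEIGHTS = {"health": 0.4, "trust": 0.3, "risk": 0.3}

def score_wallets(scaled: pd.DataFrame, raw: pd.DataFrame) -> pd.Series:
    # Intermediate score per component: mean of its scaled features.
    comp = {name: scaled[cols].mean(axis=1) for name, cols in COMPONENTS.items()}
    # Risk features are "higher is worse", so invert that component.
    comp["risk"] = 1.0 - comp["risk"]
    # Weighted aggregation into a 0-100 raw score.
    score = sum(WEIGHTS[name] * comp[name] for name in WEIGHTS) * 100
    # Rule-based overrides: deterministic penalties for unambiguous
    # risk events, applied after the statistical score.
    score[raw["liquidation_count"] > 0] -= 40
    score[raw["repayment_ratio"] < 0.1] -= 30
    return score.clip(0, 100)
```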
## Model Validation and Performance

The model's accuracy and logical consistency were validated using the `tests/model_performance_analysis.ipynb` notebook. The analysis confirms that the model performs as expected and successfully separates user profiles based on risk.
Key Performance Findings:
- High-Scoring Wallets (Score 85-95): The model correctly identifies top-tier users. These wallets consistently exhibit zero liquidations and a perfect repayment ratio (1.0), demonstrating their reliability.
- Low-Scoring Wallets (Score 10-25): The model accurately pinpoints high-risk users. This group is characterized by one of two failure modes:
- A history of being liquidated.
- A near-zero repayment ratio, indicating a loan default.
- Effective Risk Differentiation: The model demonstrates a strong ability to differentiate between distinct types of risk, proving its logic is robust and aligned with real-world credit assessment principles. The score distribution shows clear separation between low-risk, average, and high-risk user groups.
## Project Structure

```
zero-credit-score/
├── data/
│   └── *.json                # Input transaction data files (the three largest chunks)
├── output/
│   └── top_1000_wallets.csv  # Generated output
├── src/
│   └── zeru_credit_score/
│       ├── loader.py         # Data loading and cleaning
│       ├── features.py       # Feature engineering logic
│       ├── scoring.py        # Scoring model and overrides
│       ├── main.py           # Main CLI entrypoint for scoring
│       └── run_sensitivity.py # CLI for sensitivity analysis
├── tests/
│   └── model_performance_analysis.ipynb  # Notebook for model validation
├── README.md                 # This file
└── requirements.txt          # Project dependencies
```
## Getting Started

### Prerequisites

- Python 3.8+
- git
- A virtual environment manager (`venv` or `conda`)

### Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/your-username/zero-credit-score.git
   cd zero-credit-score
   ```

2. Create and activate a virtual environment. Using `conda` (recommended):

   ```bash
   conda create -n zeru_env python=3.10
   conda activate zeru_env
   ```

3. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```
## Usage

To run the main pipeline and generate a CSV file with the top-scoring wallets, use `main.py`:

```bash
python src/zeru_credit_score/main.py --data-dir data/ --output-dir output/ --topk 1000
```

This processes the data in `data/` and saves `top_1000_wallets.csv` to the `output/` directory.
To check how stable the leaderboard is to changes in model weights, run the `run_sensitivity.py` script:

```bash
python src/zeru_credit_score/run_sensitivity.py
```

This outputs the Jaccard similarity between the base model and two alternatives, indicating model robustness.
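The Jaccard similarity reported here measures how much the set of top-ranked wallets overlaps between two weightings. As a rough sketch (the actual script's interface may differ):

```python
def jaccard_similarity(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|: 1.0 means identical leaderboards, 0.0 means disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Compare the top-k wallets from the base model against an alternative
# weighting (wallet addresses below are made up for illustration).
base_top = {"0xabc", "0xdef", "0x123", "0x456"}
alt_top = {"0xabc", "0xdef", "0x123", "0x789"}
print(jaccard_similarity(base_top, alt_top))  # 3 shared / 5 total = 0.6
```

A similarity close to 1.0 across the alternative weightings indicates the ranking is robust to how the Health/Trust/Risk components are weighted.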
To visually inspect and validate the model's performance, use the notebook provided at `tests/model_performance_analysis.ipynb`.
## License

This project is licensed under the MIT License. See the LICENSE file for details.