GitHub Developer Impact Tier Predictor

A Machine Learning-powered web application that predicts a GitHub developer's Impact Tier (Beginner, Advanced, or Elite) using repository activity, stars, forks, language diversity, and account statistics.

The application fetches real-time GitHub data using the GitHub REST API and uses an XGBoost model to classify developers into impact tiers.

📌 Overview

GitHub profiles contain valuable signals about a developer's open-source impact. This project analyzes a user's repositories, calculates impact-related metrics, and predicts their GitHub Impact Tier using Machine Learning.

The model evaluates repository popularity, developer engagement, language diversity, and account activity to estimate a developer's GitHub impact level.

✨ Features

🔍 Analyze any public GitHub profile
🤖 Machine Learning-based tier prediction
📊 Real-time GitHub API integration
📈 Confidence score visualization
🌐 Interactive Streamlit dashboard
⚡ XGBoost-powered predictions
📋 Detailed GitHub profile insights

🛠️ Tech Stack

Machine Learning

Python
Scikit-Learn
XGBoost
Pandas
NumPy

Visualization

Plotly
Matplotlib

Deployment

Streamlit

Data Source

GitHub REST API

📊 Feature Engineering

The model uses the following GitHub metrics:

Feature	Description
Following	Number of users followed
Public Repositories	Total public repositories
Total Stars	Sum of stars across repositories
Total Forks	Sum of forks across repositories
Language Diversity	Number of unique programming languages used
Account Age (Days)	GitHub account age
Stars per Repository	Average stars per repository
Forks per Repository	Average forks per repository
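
As a rough illustration of how the metrics in the table could be derived, the sketch below turns already-parsed GitHub REST API responses into the eight features. The function name `build_features` and the output keys are hypothetical, and dividing by `public_repos` for the per-repository averages is an assumption; the JSON fields (`following`, `public_repos`, `created_at`, `stargazers_count`, `forks_count`, `language`) are the ones the API returns for users and repositories.

```python
from datetime import datetime, timezone


def build_features(user, repos, now=None):
    """Compute the eight model features from parsed GitHub API JSON.

    `user` is the dict from GET /users/{username}; `repos` is the list of
    dicts from GET /users/{username}/repos. Names here are illustrative.
    """
    now = now or datetime.now(timezone.utc)
    # created_at is an ISO 8601 UTC timestamp, e.g. "2020-01-01T00:00:00Z"
    created = datetime.strptime(
        user["created_at"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)

    total_stars = sum(r["stargazers_count"] for r in repos)
    total_forks = sum(r["forks_count"] for r in repos)
    # language is null for repositories with no detected language
    languages = {r["language"] for r in repos if r.get("language")}
    n_repos = user["public_repos"]

    return {
        "following": user["following"],
        "public_repos": n_repos,
        "total_stars": total_stars,
        "total_forks": total_forks,
        "language_diversity": len(languages),
        "account_age_days": (now - created).days,
        # Assumption: averages guard against accounts with zero repositories
        "stars_per_repo": total_stars / n_repos if n_repos else 0.0,
        "forks_per_repo": total_forks / n_repos if n_repos else 0.0,
    }


sample_user = {"following": 10, "public_repos": 2, "created_at": "2020-01-01T00:00:00Z"}
sample_repos = [
    {"stargazers_count": 30, "forks_count": 5, "language": "Python"},
    {"stargazers_count": 10, "forks_count": 1, "language": None},
]
features = build_features(
    sample_user, sample_repos, now=datetime(2021, 1, 1, tzinfo=timezone.utc)
)
```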

🎯 Target Variable

A custom GitHub Impact Score is calculated as:

impact_score = total_stars + 2 * total_forks

Forks are weighted twice as heavily as stars because a fork signals deeper developer engagement and repository adoption than a star does.

The impact score is split into three roughly equal-sized tiers using pandas qcut():

Beginner
Advanced
Elite
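
The labeling step can be sketched with a handful of made-up accounts. The column names and sample values below are illustrative and not taken from github_users.csv; the label order passed to `qcut()` (lowest to highest score) is also an assumption.

```python
import pandas as pd

# Made-up star and fork totals for nine hypothetical accounts
df = pd.DataFrame({
    "total_stars": [0, 2, 5, 12, 40, 150, 900, 3000, 12000],
    "total_forks": [0, 0, 1, 3, 10, 30, 200, 650, 2500],
})

# Impact score as defined above: forks count double
df["impact_score"] = df["total_stars"] + 2 * df["total_forks"]

# q=3 cuts at the 1/3 and 2/3 quantiles, so each tier gets about a third of the rows
df["tier"] = pd.qcut(
    df["impact_score"], q=3, labels=["Beginner", "Advanced", "Elite"]
)

print(df[["impact_score", "tier"]])
```

Because the tiers are quantile-based, the thresholds depend on the dataset: the same account could land in a different tier if the pool of collected users changes. Note also that `qcut()` raises an error when quantile edges coincide, which can happen if many accounts share the same score.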

🤖 Models Evaluated

Decision Tree Classifier

Accuracy: 97.5%

Random Forest Classifier

Accuracy: 94.5%

XGBoost Classifier ⭐

Accuracy: 99.0%

Cross Validation

5-Fold Cross Validation Accuracy: 96.88%

📈 Why Is The Accuracy So High?

The target variable (Impact Tier) is generated using repository impact metrics:

impact_score = total_stars + 2 * total_forks

The model is trained using features such as:

Total Stars
Total Forks
Stars per Repository
Forks per Repository
Public Repositories
Language Diversity

Because the Impact Tier is computed directly from Total Stars and Total Forks, and both are also model inputs, the classification problem is highly learnable for tree-based models such as XGBoost: the model mostly has to recover the score thresholds that qcut() used to define the tiers.

The reported accuracy therefore reflects how closely the GitHub activity metrics determine the engineered Impact Tier. It does not show that the model predicts subjective qualities such as developer skill or experience.
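
The effect can be reproduced on purely synthetic data. The sketch below is not the project's training code: it draws random star and fork counts, builds the tiers with the same formula, and scores a scikit-learn decision tree (used here as a stand-in for the models listed earlier) with 5-fold cross-validation. The sample size and value ranges are arbitrary assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 1000

# Synthetic inputs: random counts, no relation to real GitHub users
X = pd.DataFrame({
    "total_stars": rng.integers(0, 1000, size=n),
    "total_forks": rng.integers(0, 500, size=n),
})

# Target built from the very same columns the model receives
impact_score = X["total_stars"] + 2 * X["total_forks"]
y = pd.qcut(
    impact_score, q=3, labels=["Beginner", "Advanced", "Elite"]
).astype(str)

# 5-fold cross-validated accuracy of a plain decision tree
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
mean_accuracy = scores.mean()
print(f"5-fold accuracy on synthetic data: {mean_accuracy:.3f}")
```

Even with random inputs the tree scores well, since it only needs to approximate the two score thresholds separating the tiers.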

🔄 Project Workflow

Collect GitHub user data using the GitHub REST API
Fetch repository statistics
Perform feature engineering
Calculate GitHub Impact Score
Generate tier labels using qcut()
Preprocess features using Scikit-Learn Pipeline
Train multiple classification models
Evaluate model performance
Save trained models using Joblib
Deploy using Streamlit
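
The preprocessing, training, and serialization steps above might look roughly like the following. This is a sketch under several assumptions: the data is synthetic, `StandardScaler` is a guess at what the preprocessing pipeline contains, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example needs no extra dependency. The file names pipeline.pkl and label_encoder.pkl mirror the project structure; model.pkl is a placeholder name for the stand-in model.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Synthetic training data; columns are total_stars and total_forks
rng = np.random.default_rng(42)
X = rng.integers(0, 1000, size=(300, 2)).astype(float)
score = X[:, 0] + 2 * X[:, 1]
low, high = np.quantile(score, [1 / 3, 2 / 3])
tiers = np.where(score <= low, "Beginner",
                 np.where(score <= high, "Advanced", "Elite"))

# Encode tier names as integers for the classifier
encoder = LabelEncoder()
y = encoder.fit_transform(tiers)

# Preprocessing pipeline (assumed contents) and a stand-in boosted model
pipeline = Pipeline([("scale", StandardScaler())])
model = GradientBoostingClassifier(random_state=0)
model.fit(pipeline.fit_transform(X), y)

# Save the three artifacts with Joblib, then load them back as the app would
with tempfile.TemporaryDirectory() as tmp:
    paths = {name: os.path.join(tmp, name)
             for name in ("pipeline.pkl", "model.pkl", "label_encoder.pkl")}
    joblib.dump(pipeline, paths["pipeline.pkl"])
    joblib.dump(model, paths["model.pkl"])
    joblib.dump(encoder, paths["label_encoder.pkl"])
    loaded_pipeline = joblib.load(paths["pipeline.pkl"])
    loaded_model = joblib.load(paths["model.pkl"])
    loaded_encoder = joblib.load(paths["label_encoder.pkl"])

# Predict tiers for one very low and one very high scoring account
sample = np.array([[5.0, 1.0], [990.0, 990.0]])
predicted = loaded_encoder.inverse_transform(
    loaded_model.predict(loaded_pipeline.transform(sample))
)
print(list(predicted))
```

Saving the encoder alongside the model matters because the classifier outputs integers; without the same fitted encoder at prediction time, the mapping back to tier names is lost.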

📁 Project Structure

GitHub-Developer-Impact-Tier-Predictor/
│
├── app.py
├── model.ipynb
├── github_users.csv
├── xgb_model.pkl
├── pipeline.pkl
├── label_encoder.pkl
├── requirements.txt
├── .env
├── README.md
│
└── assets/

⚙️ Installation

Clone Repository

git clone https://github.com/yourusername/github-developer-impact-tier-predictor.git

cd github-developer-impact-tier-predictor

Install Dependencies

pip install -r requirements.txt

Create Environment Variables

Create a .env file:

GITHUB_TOKEN=your_github_personal_access_token
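
For context, here is one way a token like this could be attached to a GitHub REST API request. How app.py actually loads the .env file is not shown in this README, so the sketch simply reads `GITHUB_TOKEN` from the process environment (which assumes something such as python-dotenv or the shell has already exported it). The helper name `build_github_request` is hypothetical, and no network call is made.

```python
import os
import urllib.request

API_ROOT = "https://api.github.com"


def build_github_request(path, token=None):
    """Build (but do not send) a request for a GitHub REST API path."""
    # Fall back to the environment when no token is passed explicitly
    if token is None:
        token = os.environ.get("GITHUB_TOKEN")
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Authenticated requests get a higher rate limit than anonymous ones
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"{API_ROOT}{path}", headers=headers)


request = build_github_request("/users/octocat", token="example-token")
print(request.full_url)
```

Passing the resulting object to `urllib.request.urlopen()` would perform the call; unauthenticated requests still work for public profiles but hit GitHub's rate limit much sooner.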

▶️ Run Locally

streamlit run app.py

🎓 Learning Outcomes

Data Collection using APIs
Feature Engineering
Classification Problems
XGBoost Modeling
Cross Validation
Model Evaluation
Streamlit Deployment
Real-Time Data Processing
Model Serialization using Joblib

🚀 Future Improvements

GitHub contribution analysis
Developer comparison dashboard
Organization-level analysis
Cloud deployment
Automated retraining pipeline
Advanced feature selection techniques

👨‍💻 Author

Arpit Shirbhate

Machine Learning • Data Science • Open Source Contributor

⭐ If you found this project useful, consider giving it a star.
