Ruth Mutheu Mutuku (Mutheu04)

🧠 About Me

"I don't just analyse data — I translate it into decisions that move the needle."

I'm a commercially-minded Data Analyst & Data Scientist with an MSc in Data Science (University of Salford) and 5+ years of experience turning complex datasets into clear, actionable business insights across non-profit, FMCG, regulatory, and e-commerce domains.

My flagship project is an end-to-end ML pipeline on ~500,000 donor records for Marie Curie. It achieved a ROC-AUC of 0.957, a PR-AUC of 0.775, and a 5× campaign targeting lift, showing that targeting just the top 10% of scored donors could reach a 78% response rate while cutting mailing volume by 70–80% and improving campaign ROI.

I'm equally comfortable writing complex SQL, automating workflows with Python, building Power BI dashboards that cut reporting time by ~60%, and deploying scalable recommendation engines on Databricks.

📊 Impact at a Glance

Metric	Result
🏆 Best ML Model (ROC-AUC)	0.957 — Marie Curie Donor Prediction
📈 Campaign Targeting Lift	5× over baseline response rate
💹 Market Share Growth	+30% within 12 months
🎯 Demand Forecast Accuracy	~90% across 50+ product lines
⚡ Reporting Time Reduction	~60% via automated Power BI dashboards
👥 Stakeholders Trained	50 sales reps, 98% tool adoption rate
🔬 Regulatory Sources Automated	6+ international sources (ECHA, REACH, Stockholm Convention)

🚀 Featured Projects

🏥 Donor Response Prediction — Marie Curie Christmas Cash Appeal

Python · Scikit-learn · SHAP · SMOTE · CRISP-DM · Gradient Boosting

End-to-end ML pipeline on ~500K donor records predicting Christmas appeal response. Engineered 15+ behavioural features including RFM scores, donor fatigue indicators, mailing exposure counts, and engagement trend deltas. Compared Logistic Regression, Random Forest, and Gradient Boosting using hyperparameter tuning via RandomizedSearchCV, 5-fold stratified cross-validation, threshold sensitivity analysis, and SHAP-based model interpretability.

Results: ROC-AUC 0.957 · PR-AUC 0.775 · 5× targeting lift · 70–80% reduction in unnecessary outreach · top-10% scoring threshold → 78% response rate
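
For readers unfamiliar with the term, targeting lift is the response rate among the highest-scored donors divided by the overall response rate. The snippet below is a minimal sketch of that arithmetic on invented toy data. It is not the project code: the function name `lift_at_top_fraction` and every number in it are hypothetical and unrelated to the Marie Curie results.

```python
def lift_at_top_fraction(scores, labels, fraction=0.10):
    """Return (response rate in the top slice, baseline rate, lift)."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")
    # Rank records from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # Keep at least one record in the targeted slice.
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    baseline_rate = sum(labels) / len(labels)
    lift = top_rate / baseline_rate if baseline_rate else float("nan")
    return top_rate, baseline_rate, lift


# Toy data, invented for illustration: 20 donors, 5 responders (25% baseline).
scores = [0.95, 0.90, 0.80, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40,
          0.35, 0.30, 0.25, 0.20, 0.18, 0.15, 0.12, 0.10, 0.05, 0.01]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0,
          0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

top_rate, baseline_rate, lift = lift_at_top_fraction(scores, labels, 0.10)
print(f"top-10% rate={top_rate:.2f}, baseline={baseline_rate:.2f}, lift={lift:.1f}x")
```

On real data the same calculation would normally be run on a held-out set, so that the lift reflects how the model ranks donors it has not seen.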

🍷 Wine Quality Analysis & Prediction

R · Random Forest · XGBoost · Statistical Testing · EDA

ML project predicting wine quality from physicochemical attributes using R. Applied statistical hypothesis testing, feature importance analysis, and comparative model evaluation across Random Forest and XGBoost, with a full visualisation pipeline.

🎮 Steam Game Recommender System

PySpark · ALS · Databricks · MLflow · SparkSQL · Big Data

Scalable collaborative filtering recommendation engine built on Databricks using PySpark ALS, analysing user play-history and rating patterns to serve personalised game recommendations. Experiment tracking with MLflow.

🌍 Global Population Dashboard — Power BI

Power BI · DAX · Power Query · Business Intelligence

Interactive Power BI dashboard analysing global population trends from 1960 to 2050 with demographic projections, regional breakdowns, and trend forecasting. Demonstrates production-level DAX and Power Query skills.

🏋️ Obesity Level Classification

Python · Scikit-learn · Multi-class Classification · Hyperparameter Tuning

Multi-class classification pipeline predicting obesity levels from lifestyle and dietary data. Compared Logistic Regression, Random Forest, and Gradient Boosting with full hyperparameter tuning and cross-validation.

🛒 Online Shopping Intention Clustering

Python · K-Means · Hierarchical Clustering · PCA · Silhouette Analysis

Customer segmentation project using unsupervised learning to profile online shoppers by browsing behaviour. Dimensionality reduction via PCA, cluster optimisation with silhouette analysis, and comparative evaluation of K-Means vs. hierarchical approaches.
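
To show what silhouette analysis measures, here is a small standard-library sketch of the mean silhouette coefficient, assuming Euclidean distance. It is an illustration only, not the project code: the function name, the four toy points, and both labelings are invented. Each point's score compares its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b).

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # By convention a singleton cluster contributes a score of 0.
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance to itself is 0, so it does not affect the sum).
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # groups nearby points together
bad = silhouette_score(points, [0, 1, 0, 1])   # splits each pair across clusters
print(f"good labeling: {good:.3f}, bad labeling: {bad:.3f}")
```

A score near 1 means points sit much closer to their own cluster than to any other, and a negative score suggests points have been assigned to the wrong cluster. Comparing the score across candidate cluster counts is one common way to choose k.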

💬 Cross-Domain Sentiment Analysis

Python · NLP · TF-IDF · Logistic Regression · Linear SVM

NLP pipeline classifying sentiment across Amazon, IMDB, and Yelp review corpora. Full text pre-processing, TF-IDF vectorisation, and comparative model evaluation (Logistic Regression vs. Linear SVM) with cross-domain generalisation testing.
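
As a rough illustration of TF-IDF vectorisation, the sketch below weights each term by its count in a review multiplied by a smoothed inverse document frequency, ln((1 + n) / (1 + df)) + 1, then L2-normalises each vector. That formula, the whitespace tokeniser, the function name `tfidf`, and the three toy reviews are all assumptions made for this example. The project's actual pre-processing and vectoriser settings are not shown here.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document, L2-normalised."""
    # Naive tokenisation: lowercase and split on whitespace.
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))

    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


# Toy corpus, invented for illustration.
reviews = ["great film great cast", "terrible film", "great food"]
vectors = tfidf(reviews)
for review, vector in zip(reviews, vectors):
    print(review, {term: round(weight, 3) for term, weight in vector.items()})
```

Terms that appear in fewer reviews, such as "cast" or "terrible", receive a higher weight than equally frequent terms that appear in several reviews, such as "film". That is what lets a linear classifier focus on distinctive vocabulary.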

🛠️ Tech Stack

Languages

Machine Learning & Data Science

Visualisation & Business Intelligence

Platforms & Tools

Statistical Tools

📚 Education

	Degree	Institution	Year
🎓	MSc Data Science	University of Salford, Manchester	2025 – 2026
🎓	BSc Economics & Statistics	Kenyatta University, Nairobi	2015 – 2019

🏅 Certifications

💼 Professional Timeline

Jan 2026 – Present  │  Data Science Intern    @ Marie Curie              │  ML · SHAP · CRISP-DM · 500K records
Jul 2025 – Present  │  Consultant Analyst     @ Yordas Limited            │  Python · Web Scraping · Regulatory ETL
Sep 2023 – Dec 2024 │  Data Analyst           @ Westside Distillers       │  SQL · Power BI · +30% market share
Jul 2019 – Aug 2023 │  Data Analyst           @ Hasbah Kenya Ltd          │  Big Data · Forecasting · 90% accuracy

⭐ Found a project useful? A star means the world!
📫 Open to Data Analyst & Data Scientist roles — let's connect on LinkedIn!
