"I don't just analyse data — I translate it into decisions that move the needle."
I'm a commercially minded Data Analyst & Data Scientist with an MSc in Data Science (University of Salford) and 5+ years of experience turning complex datasets into clear, actionable business insights across non-profit, FMCG, regulatory, and e-commerce domains.
My flagship project is an end-to-end ML pipeline on ~500,000 donor records for Marie Curie. It achieved a ROC-AUC of 0.957, a PR-AUC of 0.775, and a 5× campaign targeting lift, showing that targeting just the top 10% of scored donors could reach a 78% response rate while cutting mailing volume by 70–80% and improving campaign ROI.
I'm equally comfortable writing complex SQL, automating workflows with Python, building Power BI dashboards that cut reporting time by ~60%, and deploying scalable recommendation engines on Databricks.
| Metric | Result |
|---|---|
| 🏆 Best ML Model (ROC-AUC) | 0.957 — Marie Curie Donor Prediction |
| 📈 Campaign Targeting Lift | 5× over baseline response rate |
| 💹 Market Share Growth | +30% within 12 months |
| 🎯 Demand Forecast Accuracy | ~90% across 50+ product lines |
| ⚡ Reporting Time Reduction | ~60% via automated Power BI dashboards |
| 👥 Stakeholders Trained | 50 sales reps, 98% tool adoption rate |
| 🔬 Regulatory Sources Automated | 6+ international sources (ECHA, REACH, Stockholm Convention) |
Python · Scikit-learn · SHAP · SMOTE · CRISP-DM · Gradient Boosting
End-to-end ML pipeline on ~500K donor records predicting Christmas appeal response. Engineered 15+ behavioural features including RFM scores, donor fatigue indicators, mailing exposure counts, and engagement trend deltas. Compared Logistic Regression, Random Forest, and Gradient Boosting using hyperparameter tuning via RandomizedSearchCV, 5-fold stratified cross-validation, threshold sensitivity analysis, and SHAP-based model interpretability.
Results: ROC-AUC 0.957 · PR-AUC 0.775 · 5× targeting lift · 70–80% reduction in unnecessary outreach · top-10% scoring threshold → 78% response rate
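As a rough illustration of how a targeting lift figure like the one above can be computed, here is a minimal sketch on toy data. The function name `top_fraction_lift` and the sample values are hypothetical and are not taken from the project code.

```python
def top_fraction_lift(y_true, scores, fraction=0.10):
    """Return (top_rate, baseline_rate, lift) for the highest-scored fraction.

    y_true: 0/1 response labels; scores: model scores, higher = more likely.
    """
    if len(y_true) != len(scores) or len(y_true) == 0:
        raise ValueError("y_true and scores must be non-empty and equal length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Number of donors that would be mailed at this threshold (at least one).
    n_top = max(1, int(round(len(scores) * fraction)))

    # Rank donors by score, best first, and keep the labels of the top slice.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_labels = [label for _, label in ranked[:n_top]]

    top_rate = sum(top_labels) / n_top
    baseline_rate = sum(y_true) / len(y_true)
    lift = top_rate / baseline_rate if baseline_rate > 0 else float("nan")
    return top_rate, baseline_rate, lift


# Toy example: 10 donors, 2 responders, mail the top 20% by score.
toy_labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.1, 0.2, 0.8, 0.3, 0.4, 0.05, 0.15, 0.25, 0.35]
top_rate, baseline_rate, lift = top_fraction_lift(toy_labels, toy_scores, fraction=0.20)
```

Lift here is the response rate inside the mailed group divided by the response rate of mailing everyone, so a lift of 5 means the targeted group responds five times as often as the baseline.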
R · Random Forest · XGBoost · Statistical Testing · EDA
ML project predicting wine quality from physicochemical attributes using R. Applied statistical hypothesis testing, feature importance analysis, and comparative model evaluation across Random Forest and XGBoost, with a full visualisation pipeline.
PySpark · ALS · Databricks · MLflow · SparkSQL · Big Data
Scalable collaborative filtering recommendation engine built on Databricks using PySpark ALS, analysing user play-history and rating patterns to serve personalised game recommendations. Experiment tracking with MLflow.
Power BI · DAX · Power Query · Business Intelligence
Interactive Power BI dashboard analysing global population trends from 1960 to 2050 with demographic projections, regional breakdowns, and trend forecasting. Demonstrates production-level DAX and Power Query skills.
Python · Scikit-learn · Multi-class Classification · Hyperparameter Tuning
Multi-class classification pipeline predicting obesity levels from lifestyle and dietary data. Compared Logistic Regression, Random Forest, and Gradient Boosting with full hyperparameter tuning and cross-validation.
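The comparison described above might look roughly like the sketch below. It runs on synthetic three-class data rather than the obesity dataset, and the use of `RandomizedSearchCV`, the macro-F1 metric, and the parameter grids are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic stand-in for the real data: 3 classes, 10 numeric features.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each candidate pairs an estimator with a small, illustrative search space.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.01, 0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, 5, None]},
    ),
    "gradient_boosting": (
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1], "max_depth": [2, 3]},
    ),
}

results = {}
for name, (model, param_distributions) in candidates.items():
    search = RandomizedSearchCV(
        model, param_distributions, n_iter=3,
        scoring="f1_macro", cv=cv, random_state=0,
    )
    search.fit(X, y)
    results[name] = search.best_score_  # mean cross-validated macro-F1

best_model = max(results, key=results.get)
```

Macro-F1 weights every class equally, which is one reasonable choice when class sizes differ; accuracy or weighted F1 would slot into `scoring` the same way.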
Python · K-Means · Hierarchical Clustering · PCA · Silhouette Analysis
Customer segmentation project using unsupervised learning to profile online shoppers by browsing behaviour. Dimensionality reduction via PCA, cluster optimisation with silhouette analysis, and comparative evaluation of K-Means vs. hierarchical approaches.
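A minimal sketch of that workflow, assuming synthetic blob data in place of the shopper dataset and an arbitrary search range of 2 to 6 clusters:

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for browsing-behaviour features.
X, _ = make_blobs(n_samples=300, centers=4, n_features=8, random_state=0)

# Standardise first so no single feature dominates the principal components.
X_scaled = StandardScaler().fit_transform(X)
X_reduced = PCA(n_components=2, random_state=0).fit_transform(X_scaled)

# Score both clustering approaches for each candidate number of clusters.
silhouettes = {}
for k in range(2, 7):
    kmeans_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_reduced)
    hier_labels = AgglomerativeClustering(n_clusters=k).fit_predict(X_reduced)
    silhouettes[k] = {
        "kmeans": silhouette_score(X_reduced, kmeans_labels),
        "hierarchical": silhouette_score(X_reduced, hier_labels),
    }

# Pick the k with the highest K-Means silhouette as a starting point.
best_k = max(silhouettes, key=lambda k: silhouettes[k]["kmeans"])
```

Silhouette values range from -1 to 1, with higher values indicating tighter, better separated clusters, so the same table can be used to compare K-Means and hierarchical results at each k.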
Python · NLP · TF-IDF · Logistic Regression · Linear SVM
NLP pipeline classifying sentiment across Amazon, IMDB, and Yelp review corpora. Full text pre-processing, TF-IDF vectorisation, and comparative model evaluation (Logistic Regression vs. Linear SVM) with cross-domain generalisation testing.
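The cross-domain setup can be sketched as follows: fit on reviews from one domain and score on another. The tiny corpora below are made up for illustration and are not drawn from the Amazon, IMDB, or Yelp datasets.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical "product review" training domain (1 = positive, 0 = negative).
train_texts = [
    "great product works perfectly",
    "excellent quality and fast delivery",
    "love it highly recommend",
    "terrible product broke after a day",
    "awful quality waste of money",
    "hate it would not recommend",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Hypothetical "restaurant review" test domain.
test_texts = [
    "excellent food and great service",
    "terrible meal awful service",
]
test_labels = [1, 0]

# Each pipeline bundles vectorisation with the classifier, so the TF-IDF
# vocabulary is learned from the training domain only.
models = {
    "logistic_regression": make_pipeline(
        TfidfVectorizer(lowercase=True), LogisticRegression(max_iter=1000)
    ),
    "linear_svm": make_pipeline(TfidfVectorizer(lowercase=True), LinearSVC()),
}

cross_domain_accuracy = {}
for name, model in models.items():
    model.fit(train_texts, train_labels)
    cross_domain_accuracy[name] = model.score(test_texts, test_labels)
```

Words that appear only in the test domain are ignored by the fitted vectoriser, which is one reason accuracy tends to drop when a model is moved across review sources.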
Languages
Machine Learning & Data Science
Visualisation & Business Intelligence
Platforms & Tools
Statistical Tools
| | Degree | Institution | Year |
|---|---|---|---|
| 🎓 | MSc Data Science | University of Salford, Manchester | 2025 – 2026 |
| 🎓 | BSc Economics & Statistics | Kenyatta University, Nairobi | 2015 – 2019 |
Jan 2026 – Present │ Data Science Intern @ Marie Curie │ ML · SHAP · CRISP-DM · 500K records
Jul 2025 – Present │ Consultant Analyst @ Yordas Limited │ Python · Web Scraping · Regulatory ETL
Sep 2023 – Dec 2024 │ Data Analyst @ Westside Distillers │ SQL · Power BI · +30% market share
Jul 2019 – Aug 2023 │ Data Analyst @ Hasbah Kenya Ltd │ Big Data · Forecasting · 90% accuracy
📫 Open to Data Analyst & Data Scientist roles — let's connect on LinkedIn!