I am a machine learning engineer and data scientist building from Nairobi, Kenya. My work lives where real-world data gets difficult: noisy property listings, cadastral maps, low-resource language tasks, competition datasets, and product interfaces that need to make model output understandable.
I like the whole system, not just the notebook: scrape the data, clean the edges, build the model, validate hard, ship the API, and make the result useful to someone.

```yaml
current_mode:
  build: Nairobi real estate intelligence
  compete: Zindi ML challenges
  explore: speech, LLMs, OCR, geospatial AI
  ship_with: Python, FastAPI, Next.js, Supabase
```

| Strength | What it covers |
|---|---|
| Competition Brain | Validation, leakage checks, feature engineering, ensembles, and leaderboard discipline |
| Product Hands | APIs, dashboards, scheduled pipelines, and interfaces that make data products usable |
| Local Lens | AI for African housing, maps, language, agriculture, and public-interest datasets |

| Track | How I use it |
|---|---|
| Models | Train, validate, compare, and explain ML systems for messy tabular, text, vision, and map data |
| Pipelines | Scrape, clean, enrich, schedule, and store analytics-ready datasets |
| Maps | Extract parcels, detect boundaries, run OCR, and turn geospatial files into usable layers |
| Products | Wrap models in APIs, dashboards, and workflows people can actually use |
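A common leakage check when validating models on scraped data is to make sure the same record does not appear on both sides of a train/test split. Reposted property listings are a typical source of this kind of duplicate. The sketch below assumes records are compared as free text. The normalization rules, the `fingerprint` and `find_overlap` helpers, and the sample listings are hypothetical illustrations, not code from any of the projects below.

```python
import hashlib
import re
from typing import List, Sequence


def fingerprint(text: str) -> str:
    """Hash a lightly normalized copy of a record so trivial reposts collide."""
    # Lowercase, drop punctuation (keeping whitespace), then collapse whitespace runs.
    normalized = re.sub(r"[^a-z0-9\s]", "", text.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def find_overlap(train_texts: Sequence[str], test_texts: Sequence[str]) -> List[int]:
    """Return indices of test rows whose fingerprint also appears in train."""
    train_hashes = {fingerprint(t) for t in train_texts}
    return [i for i, t in enumerate(test_texts) if fingerprint(t) in train_hashes]


# Illustrative listings: the first test row is a reformatted repost of a train row.
train = ["3 Bedroom apartment, Kilimani. KES 85,000/month", "Studio in Roysambu"]
test = ["3 bedroom apartment kilimani  KES 85000/month", "2BR Westlands"]
print(find_overlap(train, test))
```

Exact-hash matching only catches near-verbatim copies. Fuzzier duplicates would need a similarity measure, which this sketch does not attempt.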

An end-to-end housing intelligence system for Nairobi. It turns raw listings into structured market signals: prices, bedroom counts, neighborhoods, affordability bands, and dashboard-ready summaries.

- Pipeline: listing scrape -> parsing -> Supabase -> analytics -> dashboard
- Stack: Python, Supabase, Next.js, TypeScript, GitHub Actions
- Why it matters: housing data is scattered and inconsistent; the product turns it into something searchable, comparable, and decision-ready.

Backend · Frontend · Live site
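The parsing step of the housing pipeline is where free-text listings become structured fields. The sketch below turns one listing string into a price, a bedroom count and an affordability band. It assumes prices are written with a `KES` or `Ksh` prefix and bedrooms as `N bedroom` or `NBR`. The regexes, the `parse_listing` helper and the KES thresholds in `BANDS` are illustrative assumptions, not the project's actual rules. Neighborhood extraction and the Supabase write are not shown.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical monthly-rent band limits in KES (exclusive upper bounds).
BANDS = [(30_000, "low"), (80_000, "mid"), (float("inf"), "high")]


@dataclass
class Listing:
    price_kes: Optional[int]
    bedrooms: Optional[int]
    band: Optional[str]


def parse_listing(text: str) -> Listing:
    """Pull price and bedroom count out of a free-text listing."""
    price_match = re.search(r"(?:KES|Ksh)\.?\s*(\d[\d,]*)", text, re.IGNORECASE)
    bed_match = re.search(r"(\d+)\s*(?:bed(?:room)?s?|br)\b", text, re.IGNORECASE)
    price = int(price_match.group(1).replace(",", "")) if price_match else None
    bedrooms = int(bed_match.group(1)) if bed_match else None
    band = None
    if price is not None:
        # First band whose upper bound is above the price.
        band = next(label for limit, label in BANDS if price < limit)
    return Listing(price, bedrooms, band)


print(parse_listing("Spacious 2 bedroom apartment in Kilimani, KES 65,000 per month"))
```

Fields that cannot be parsed stay `None` rather than being guessed. This keeps downstream analytics honest about missing data.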

A geospatial AI pipeline for cadastral survey maps, built to move from scanned map imagery to structured polygon and text outputs.

- Pipeline: raster maps -> boundary segmentation -> polygon cleaning -> OCR -> merged GIS output
- Modeling: segmentation for parcels, post-processing for valid geometries, OCR for map labels
- Result: Public score 0.965006861 · Private score 0.970242006

Repository · Data prep notebook
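Polygon cleaning sits between segmentation and the merged GIS output. The sketch below shows a dependency-free version of that step. It drops repeated vertices, filters tiny slivers and orients each ring counter-clockwise, as RFC 7946 specifies for GeoJSON exterior rings. It then attaches an OCR label as a GeoJSON Feature.

Some assumptions to note:

- Parcels are assumed to arrive as vertex lists.
- Georeferencing from pixel space to lon/lat is out of scope.
- `clean_ring`, `to_feature`, the `min_area` threshold and the sample label are hypothetical.
- A geometry library such as Shapely would also repair self-intersections, which this sketch does not handle.

```python
import json
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def signed_area(ring: List[Point]) -> float:
    """Shoelace formula on an open ring; positive means counter-clockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def clean_ring(points: List[Point], min_area: float = 1.0) -> Optional[List[Point]]:
    """Drop repeated vertices, filter slivers, orient CCW, and close the ring."""
    deduped: List[Point] = []
    for p in points:
        if not deduped or p != deduped[-1]:
            deduped.append(p)
    if len(deduped) > 1 and deduped[0] == deduped[-1]:
        deduped.pop()  # work with an open ring internally
    if len(deduped) < 3:
        return None  # not enough distinct vertices for a polygon
    area = signed_area(deduped)
    if abs(area) < min_area:
        return None  # tiny area: likely a segmentation sliver
    if area < 0:
        deduped.reverse()  # RFC 7946: exterior rings are counter-clockwise
    return deduped + [deduped[0]]  # GeoJSON rings repeat the first vertex


def to_feature(ring: List[Point], label: str) -> dict:
    """Attach an OCR label to a cleaned parcel as a GeoJSON Feature."""
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [[list(p) for p in ring]]},
        "properties": {"label": label},
    }


# Clockwise square with a duplicated vertex (illustrative coordinates).
raw = [(0, 0), (0, 10), (0, 10), (10, 10), (10, 0), (0, 0)]
ring = clean_ring(raw)
if ring is not None:
    print(json.dumps(to_feature(ring, "PLOT 1234")))
```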

An NLP experiment that detects emotional signal from text and uses it to generate contextual story responses.

- Idea: sentiment -> context -> generated story
- Focus: language understanding, generation, and interaction design
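The sketch below illustrates the sentiment -> context -> generated story flow with deliberately toy components. A tiny word lexicon stands in for a trained sentiment model, and a fixed template per mood stands in for a language model. `LEXICON`, `OPENINGS`, `score_sentiment` and `story_opening` are hypothetical names; the sketch shows only how the three stages connect, not how the experiment itself is built.

```python
from typing import Dict

# Toy lexicon standing in for a trained sentiment model (hypothetical words and weights).
LEXICON: Dict[str, int] = {"happy": 1, "excited": 1, "love": 1, "sad": -1, "tired": -1, "lost": -1}

# One opening line per mood; a real system would prompt a language model instead.
OPENINGS: Dict[str, str] = {
    "positive": "The morning felt bright, and {topic} seemed full of promise.",
    "negative": "The rain kept falling, and {topic} weighed heavier than usual.",
    "neutral": "It was an ordinary day, and {topic} waited quietly.",
}


def score_sentiment(text: str) -> str:
    """Sum lexicon weights over lightly cleaned words and map the total to a mood."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    total = sum(LEXICON.get(w, 0) for w in words)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


def story_opening(text: str, topic: str) -> str:
    mood = score_sentiment(text)         # stage 1: sentiment
    template = OPENINGS[mood]            # stage 2: mood selects the narrative context
    return template.format(topic=topic)  # stage 3: generation (template stand-in)


print(story_opening("I am so tired and sad today", "the walk home"))
```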

```yaml
input: raw datasets, maps, listings, language, competition briefs
process: clean -> validate -> model -> evaluate -> package
output: notebooks, APIs, dashboards, repositories, field-ready insights
```

| I care about | Because |
|---|---|
| Strong baselines | They expose whether the complex idea is actually useful |
| Validation design | A good score only matters when it survives reality |
| Data quality | Most model problems start before training begins |
| Shipping | A useful model needs a path into a workflow |
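Strong baselines and validation design go together: a baseline only tells you something if the split mirrors how the model will be used. The sketch below scores a "predict the training mean" baseline under grouped k-fold cross-validation. The groups are a hypothetical neighborhood field, so listings from one area never appear in both train and validation.

`group_kfold` and `baseline_cv` are illustrative helpers, and the rents are made-up numbers. scikit-learn's `GroupKFold` is a tested alternative for the split itself.

```python
import random
from statistics import mean
from typing import Iterator, List, Sequence, Tuple


def group_kfold(
    groups: Sequence[str], n_splits: int = 3, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs where no group is on both sides."""
    unique = sorted(set(groups))
    if len(unique) < n_splits:
        raise ValueError("need at least as many groups as folds")
    random.Random(seed).shuffle(unique)
    fold_of = {g: i % n_splits for i, g in enumerate(unique)}  # round-robin groups into folds
    for k in range(n_splits):
        train = [i for i, g in enumerate(groups) if fold_of[g] != k]
        val = [i for i, g in enumerate(groups) if fold_of[g] == k]
        yield train, val


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return mean(abs(a - b) for a, b in zip(y_true, y_pred))


def baseline_cv(y: Sequence[float], groups: Sequence[str], n_splits: int = 3) -> float:
    """Mean validation MAE of a 'predict the training mean' baseline."""
    scores = []
    for train, val in group_kfold(groups, n_splits):
        guess = mean(y[i] for i in train)
        scores.append(mae([y[i] for i in val], [guess] * len(val)))
    return mean(scores)


# Illustrative monthly rents (thousands of KES) grouped by neighborhood.
rents = [50, 55, 120, 130, 30, 35]
areas = ["Kilimani", "Kilimani", "Westlands", "Westlands", "Roysambu", "Roysambu"]
print(baseline_cv(rents, areas))
```

Evaluate any candidate model on these same folds. It should beat this baseline before its extra complexity is worth keeping.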
- Make Nairobi property data easier to search, compare, and understand
- Build stronger competition pipelines for NLP, vision, geospatial, and tabular ML
- Package geospatial OCR and document-understanding workflows into reusable tools
- Push deeper into speech and language systems for low-resource African contexts



