Joseph Gitau josephgitau

Signal

I am a machine learning engineer and data scientist building from Nairobi, Kenya. My work lives where real-world data gets difficult: noisy property listings, cadastral maps, low-resource language tasks, competition datasets, and product interfaces that need to make model output understandable.

I like the whole system, not just the notebook: scrape the data, clean the edges, build the model, validate hard, ship the API, and make the result useful to someone.

current_mode:
  build:     Nairobi real estate intelligence
  compete:   Zindi ML challenges
  explore:   speech, LLMs, OCR, geospatial AI
  ship_with: Python, FastAPI, Next.js, Supabase

Competition Brain

Validation, leakage checks, feature engineering, ensembles, and leaderboard discipline.

Product Hands

APIs, dashboards, scheduled pipelines, and interfaces that make data products usable.

Local Lens

AI for African housing, maps, language, agriculture, and public-interest datasets.

📈 Live Zindi Stats

🔗 View full profile on Zindi →

Last updated: 2026-06-05 07:30:06 UTC

Build Map

Track	How I use it
Models	Train, validate, compare, and explain ML systems for messy tabular, text, vision, and map data
Pipelines	Scrape, clean, enrich, schedule, and store analytics-ready datasets
Maps	Extract parcels, detect boundaries, run OCR, and turn geospatial files into usable layers
Products	Wrap models in APIs, dashboards, and workflows people can actually use

Field Projects

01. Nairobi Property Pricing Platform

An end-to-end housing intelligence system for Nairobi. It turns raw listings into structured market signals: prices, bedroom counts, neighborhoods, affordability bands, and dashboard-ready summaries.

Pipeline: listing scrape -> parsing -> Supabase -> analytics -> dashboard
Stack: Python, Supabase, Next.js, TypeScript, GitHub Actions
Why it matters: housing data is scattered and inconsistent; the product turns it into something searchable, comparable, and decision-ready.
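
The parsing step in that pipeline can be sketched roughly as below. This is a minimal illustration, not the platform's actual parser: the `Listing` fields, the helper names, the regex patterns, and the affordability thresholds in `BANDS` are all assumptions made up for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical monthly-rent ceilings in KES; illustrative only,
# not the platform's real affordability bands.
BANDS = [(30_000, "budget"), (80_000, "mid-market"), (200_000, "upper-mid")]


@dataclass
class Listing:
    price_kes: Optional[int]
    bedrooms: Optional[int]
    band: Optional[str]


def parse_price(text: str) -> Optional[int]:
    """Pull the first 'KES 85,000' or 'Ksh. 85000' style amount out of free text."""
    match = re.search(r"(?:kes|ksh)\.?\s*([\d,]+)", text, flags=re.IGNORECASE)
    if not match:
        return None
    digits = match.group(1).replace(",", "")
    return int(digits) if digits else None


def parse_bedrooms(text: str) -> Optional[int]:
    """Read '2 bedroom', '3-bed', or '3BR'; treat 'studio' or 'bedsitter' as 0."""
    if re.search(r"\b(studio|bedsitter)\b", text, flags=re.IGNORECASE):
        return 0
    match = re.search(r"(\d+)\s*-?\s*(?:bed(?:room)?s?|br)\b", text, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def price_band(price_kes: Optional[int]) -> Optional[str]:
    """Map a price to the first band whose ceiling it does not exceed."""
    if price_kes is None:
        return None
    for ceiling, label in BANDS:
        if price_kes <= ceiling:
            return label
    return "premium"


def parse_listing(raw: str) -> Listing:
    """Turn one raw listing string into structured fields; missing values stay None."""
    price = parse_price(raw)
    return Listing(price_kes=price, bedrooms=parse_bedrooms(raw), band=price_band(price))
```

Keeping unparseable fields as `None` instead of guessing means bad rows stay visible downstream, which is where most of the cleaning work tends to show up.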

Backend · Frontend · Live site

02. Barbados Lands and Surveys Plot Automation Challenge

A geospatial AI pipeline for cadastral survey maps, built to move from scanned map imagery to structured polygon and text outputs.

Pipeline: raster maps -> boundary segmentation -> polygon cleaning -> OCR -> merged GIS output
Modeling: segmentation for parcels, post-processing for valid geometries, OCR for map labels
Result: Public score 0.965006861 · Private score 0.970242006
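
As a rough illustration of what "post-processing for valid geometries" can involve, here is a minimal pure-Python sketch. It is not the challenge solution: `clean_ring`, `signed_area`, and the `min_area` threshold are hypothetical names and values chosen for the example, and it only handles duplicate vertices, tiny slivers, ring closure, and winding order. Self-intersections would need a proper geometry library.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def signed_area(ring: List[Point]) -> float:
    """Shoelace formula; works on open or closed rings, positive means counter-clockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def clean_ring(ring: List[Point], min_area: float = 1.0) -> Optional[List[Point]]:
    """Return a closed, counter-clockwise ring, or None if the polygon is degenerate."""
    # Drop consecutive duplicate vertices left behind by mask-to-contour tracing.
    deduped: List[Point] = []
    for pt in ring:
        if not deduped or pt != deduped[-1]:
            deduped.append(pt)

    # Work on an open ring; the closing vertex is added back at the end.
    if len(deduped) > 1 and deduped[0] == deduped[-1]:
        deduped.pop()

    # Fewer than three distinct vertices cannot enclose any area.
    if len(deduped) < 3:
        return None

    # Discard slivers below the (assumed) minimum area, in squared map units.
    area = signed_area(deduped)
    if abs(area) < min_area:
        return None

    # Normalise winding order to counter-clockwise.
    if area < 0:
        deduped.reverse()

    return deduped + [deduped[0]]
```

For example, `clean_ring([(0, 0), (0, 0), (0, 4), (4, 4), (4, 0), (0, 0)])` drops the repeated vertex, flips the clockwise square to counter-clockwise, and returns a five-point closed ring.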

Repository · Data prep notebook

03. Sentiment Story Generation Bot

An NLP experiment that detects emotional signal from text and uses it to generate contextual story responses.

Idea: sentiment -> context -> generated story
Focus: language understanding, generation, and interaction design

Repository

Operating System

input:   raw datasets, maps, listings, language, competition briefs
process: clean -> validate -> model -> evaluate -> package
output:  notebooks, APIs, dashboards, repositories, field-ready insights

I care about	Because
Strong baselines	They expose whether the complex idea is actually useful
Validation design	A good score only matters when it survives reality
Data quality	Most model problems start before training begins
Shipping	A useful model needs a path into a workflow
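
One concrete form of validation design is a group-aware split, so that rows from the same source (say, repeat listings of one property) never straddle train and validation. The sketch below is a minimal, dependency-free illustration under that assumption; `group_folds` is a hypothetical helper, similar in spirit to scikit-learn's `GroupKFold`, and not a description of any specific competition pipeline.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def group_folds(groups: Sequence[Hashable], n_folds: int = 5) -> List[int]:
    """Assign each row a fold id so that all rows sharing a group land in the same fold."""
    # Count how many rows each group contributes.
    sizes: Dict[Hashable, int] = defaultdict(int)
    for g in groups:
        sizes[g] += 1

    fold_sizes = [0] * n_folds
    fold_of: Dict[Hashable, int] = {}

    # Greedy balancing: place the largest groups first, each into the
    # currently smallest fold. Ties are broken by group name for determinism.
    for g, size in sorted(sizes.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        target = fold_sizes.index(min(fold_sizes))
        fold_of[g] = target
        fold_sizes[target] += size

    return [fold_of[g] for g in groups]
```

Calling `group_folds(property_ids, n_folds=5)` with one id per row gives a fold label per row; a score that holds up under this split is less likely to come from the model memorising duplicates.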

Current Missions

Make Nairobi property data easier to search, compare, and understand
Build stronger competition pipelines for NLP, vision, geospatial, and tabular ML
Package geospatial OCR and document-understanding workflows into reusable tools
Push deeper into speech and language systems for low-resource African contexts

Connect

Messy data in. Useful systems out.
