I've been building ML systems for about a decade now. Started in computer vision, took detours through causal inference and marketing analytics, and now I'm deep in LLM-based systems — mostly multi-agent architectures and RAG pipelines. UWaterloo math background.
Production AI systems — the unglamorous kind where you spend more time on data pipelines and evaluation than on model architecture. Currently doing a lot with retrieval-augmented generation and multi-agent orchestration. Before that, spent years on causal inference (marketing mix modeling, incrementality testing) and classical ML.
Python · PyTorch · LlamaIndex · LangChain · Neo4j · PySpark · Docker · Azure/GCP · Go · SQL
- Implementing RAPTOR (recursive abstractive processing for tree-organized retrieval) from the Sarthi et al. paper to see if hierarchical summarization actually helps on messy enterprise docs (rough sketch after this list)
- Experimenting with DSPy for prompt optimization instead of hand-tuning
- Reading through the Causal Forests paper again; I want to apply generalized random forests (GRF) to some uplift modeling work
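
A rough sketch of the core RAPTOR loop as I read it from the paper: cluster the current level of chunks, summarize each cluster into a new node, and repeat until the tree collapses to a handful of nodes, keeping every level around for retrieval. To be clear about what's mine versus the paper's: the paper uses soft GMM clustering over dense embeddings and an LLM for the summaries; TF-IDF plus KMeans and a naive join-and-truncate `summarize()` are stand-ins here so the sketch runs on its own.

```python
# RAPTOR-style tree construction, heavily simplified.
# Assumptions (not from the paper): TF-IDF + KMeans stand in for the
# paper's embedding + soft GMM clustering, and summarize() is a naive
# join-and-truncate placeholder for an LLM summarization call.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def summarize(texts, max_chars=400):
    """Placeholder for an LLM summary of one cluster of chunks."""
    return " ".join(texts)[:max_chars]


def build_tree(chunks, branching=3, max_levels=3):
    """Cluster the current level, summarize each cluster, and recurse upward."""
    levels = [list(chunks)]
    while len(levels[-1]) > branching and len(levels) <= max_levels:
        nodes = levels[-1]
        n_clusters = max(1, len(nodes) // branching)
        vectors = TfidfVectorizer().fit_transform(nodes)
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)
        levels.append(
            [summarize([n for n, lab in zip(nodes, labels) if lab == c])
             for c in range(n_clusters)]
        )
    return levels  # retrieval searches across all levels, not just the leaves


if __name__ == "__main__":
    docs = [f"Chunk {i}: some messy enterprise paragraph about topic {i % 4}." for i in range(12)]
    for depth, level in enumerate(build_tree(docs)):
        print(f"level {depth}: {len(level)} nodes")
```

The part I actually want to stress-test is retrieving across all levels at once rather than only the leaves, since that's where the paper claims the gains on long, messy documents come from.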
I have strong opinions about evaluation metrics and why most RAG benchmarks are garbage. Also into bouldering, bad chess, and collecting mechanical keyboards I don't need.
