I work at the intersection of data engineering, cloud platforms, and real-time systems.
Most days I'm designing predictable pipelines, taming messy data, tuning Spark jobs, or thinking about how systems behave at scale.
I value systems that are:
- predictable
- observable
- easy to maintain & reason about

Silence helps me focus. Clean logs make me happy. Calm infrastructure > shiny complexity.

- Python, SQL, PySpark
- MySQL / PostgreSQL / Snowflake
- Pandas & data wrangling
- AWS (S3, Glue, Lambda, EMR), Azure (Data Factory, Databricks)
- Airflow / dbt (orchestration & transformation)
- Kafka / Event Hubs
- Spark / Databricks (exploring internals & perf tuning)
- Docker & basic infra automation
- FastAPI / Streamlit for data apps
- React / TS / JS for quick UIs
- Streamlit apps connected to AWS S3 for quick data exploration
- Batch & near-real-time pipelines with Python + SQL
- Diving deeper into Azure Databricks, PySpark, and Kafka for event-driven processing
- Building cleaner, more maintainable data workflows
- Advanced data modeling for analytics & warehouse workloads
- Spark internals, partitioning, broadcast joins, memory tuning
- Event-driven architectures & reliable message queues
- Writing better docs & diagrams for data systems (Mermaid / Excalidraw)