Ankit Kumar Singh aksingh4545

🎯 Focusing


Data Engineer | AWS | Azure | Spark | Databricks | Snowflake | Airflow

5 followers · 5 following

aksingh4545/README.md

👋 A bit about me

I work at the intersection of data engineering, cloud platforms, and real-time systems.

Most days: designing predictable pipelines, taming messy data, tuning Spark jobs, or thinking about how systems behave at scale.

I value systems that are:

predictable
observable
easy to maintain & reason about

Silence helps me focus. Clean logs make me happy. Calm infrastructure > shiny complexity.

🧠 Skills & Technologies

Core Data Stack (daily drivers)

Python, SQL, PySpark
MySQL / PostgreSQL / Snowflake
Pandas & data wrangling
AWS (S3, Glue, Lambda, EMR), Azure (Data Factory, Databricks)
Airflow / dbt (orchestration & transformation)

Streaming & Scale

Kafka / Event Hubs
Spark / Databricks (exploring internals & perf tuning)
Docker & basic infra automation

When needed

FastAPI / Streamlit for data apps
React / TS / JS for quick UIs

🔭 What I’m working on right now

Streamlit apps connected to AWS S3 for quick data exploration
Batch & near-real-time pipelines with Python + SQL
Diving deeper into Azure Databricks, PySpark, Kafka → event-driven processing
Building cleaner, more maintainable data workflows

🌱 Currently learning / leveling up

Advanced data modeling for analytics & warehouse workloads
Spark internals, partitioning, broadcast joins, memory tuning
Event-driven architectures & reliable message queues
Writing better docs & diagrams for data systems (Mermaid / Excalidraw)

🤝 Connect with me

Building calm data systems • One reliable pipeline at a time • 2026

Pinned

image_resize Public

This project implements an event-driven, serverless image processing pipeline on AWS. Images uploaded to Amazon S3 are automatically resized using AWS Lambda and Pillow, stored in a destination buc…

Python · 4 stars · 1 fork
streamlit_s3_pipeline Public

The system supports real-world resumes (PDF, DOCX, TXT), handles noisy formats, and follows industry-grade data engineering practices.

Python · 3 stars · 1 fork
Login_Cognito Public

This repo shows how to use AWS Cognito, a fully managed service, with a Streamlit application.

Python · 2 stars · 2 forks
flink-kafka Public

This repository covers running Flink with Docker and ingesting data from Kafka for real-time processing on Flink.
kafka-project Public

This repository contains one project that reads data from a CSV file and writes it to output.csv. Python produces the data stream and Kafka carries it.

Python
mlflow_pipeline Public

An end-to-end Machine Learning + MLOps project that predicts a student’s final performance percentage using a production-ready ML pipeline, MLflow model registry, FastAPI inference service, and Doc…

Python