📍 Denton, Texas, USA | 📧 GowthamVenkatEathamokkala@my.unt.edu | 📞 +1-940-536-4494
I'm a Data Engineer and Data Analyst with 2+ years of industry experience and an M.S. in Advanced Data Analytics (GPA: 3.64) from the University of North Texas. I specialize in building scalable data pipelines, transforming complex datasets into actionable business insights, and delivering clean, reliable data to analytics and ML teams.
I have hands-on experience across the full data stack, from streaming ingestion with Apache Kafka and ETL design in SQL/Python to dashboards in Power BI and Tableau and ML workflows in scikit-learn. I enjoy working at the intersection of engineering rigor and analytical storytelling.
I'm currently a Ph.D. student in Information Science at UNT (Fall 2026) and am actively seeking RA / TA / GRA opportunities there.
Sep 2024 – Present
- Designed streaming pipelines processing 10M+ financial records/day using Apache Kafka and PostgreSQL; reduced query latency by 40% through optimized SQL views and indexing.
- Built layered analytical data models delivering reliable, clean data to analytics and reporting teams with full documentation.
- Implemented modular ETL transformations achieving 100% data accuracy for downstream applications.
- Collaborated with cross-functional stakeholders to translate business requirements into scalable, maintainable data solutions.
Jan 2022 – Dec 2022
- Built automated Power BI dashboards with dynamic filtering; reduced report generation time by 80% and saved 20+ hours/month in recurring financial workflows.
- Analyzed 500k+ operational and financial records using SQL Server / SSIS; improved resource allocation efficiency by 15%.
- Applied statistical analysis and predictive modeling to forecast trends, improving budget planning and operational efficiency.
| Degree | Institution | Year | GPA |
|---|---|---|---|
| Ph.D. Information Science (Concentration: Data Science) | University of North Texas | 2026– | — |
| M.S. Advanced Data Analytics | University of North Texas | 2024 | 3.64 / 4.00 |
| B.Tech. Electronics Engineering | GRIET, Hyderabad, India | 2022 | 3.36 / 4.00 |
Languages & Data Science
Python · SQL · R · PySpark
Pandas · NumPy · scikit-learn · PyTorch · TensorFlow
Data Engineering & Cloud
Apache Kafka · Apache Spark · Snowflake
GCP · BigQuery · Dataflow · Pub/Sub · Vertex AI
PostgreSQL · MySQL · Microsoft SQL Server
Visualization & BI
Power BI · Tableau · Looker Studio · Excel · Matplotlib · Seaborn
ML & Analytics
Regression · Classification · Clustering · Random Forest · SVM
Time Series Forecasting · PCA · Model Evaluation · Uncertainty Quantification
Tools
Git · Jupyter Notebook · LaTeX · FastAPI
| Repository | Description | Tech |
|---|---|---|
| IPL-Analysis | A deep-dive analysis of Indian Premier League cricket (2008-2023) using custom analytical metrics | Python |
| superstore-profitability-analysis | Analysis of a retail superstore's 4-year sales dataset (2014-2017) to uncover profitability challenges and recommend actionable strategies | Python · Power BI |
| trisql-framework | 3-stage Text-to-SQL pipeline (TriSQL architecture) — semantic schema selector, structure-aware SQL generator, complexity-aware refiner; 70% execution accuracy on the Spider benchmark, 100% executability, zero GPU or API costs | Python · FastAPI · Ollama · SQLite · sentence-transformers |
| TDSP-Transportation_Data_Science_Project | End-to-end spatiotemporal analysis of 200k+ NYC crash records — time-series decomposition, geospatial hotspot clustering, anomaly detection; findings presented at USDOT Federal Highway Administration | Python · Jupyter · GeoPandas |
| Data-Analysis-using-python | Collection of data analysis workflows covering EDA, data cleaning, feature engineering, and statistical visualizations across real-world datasets | Python · Pandas · Jupyter |
| Machine-Learning-using-Python | Foundational to intermediate ML implementations: regression, classification, clustering, and model evaluation with real datasets | Python · scikit-learn · Jupyter |
| Power-BI-Projects | Business intelligence dashboards with dynamic filtering, KPI cards, and drill-through views for financial and operational reporting | Power BI |
| MSSQL_Queries | T-SQL query library covering complex joins, window functions, CTEs, stored procedures, and query performance tuning | T-SQL · SQL Server |
| Pizza-Sales-Excel-Project | Sales analytics dashboard built with Pivot Tables, dynamic charts, and slicers to surface revenue trends and product performance | Excel |
| PySpark-in-DataBricks | Practice work with PySpark inside Databricks, focusing on data manipulation, transformations, and analytics at scale | PySpark |
| SQL-for-Data-Engineering | SQL applied to real-world data engineering workflows, alongside a set of practice queries | PLpgSQL |
| Vsion-Technologies-CaseStudy | Transformed raw data streams from Kafka topics into clean, business-ready datasets by designing optimized SQL views on PostgreSQL staging tables | Kafka · PostgreSQL |
| Zetatek-DataAnalysis-CaseStudy | Analyzed operational and financial datasets to uncover business trends, optimize resource allocation, and streamline reporting processes for leadership | SSIS · MSSQL · Power BI · Excel |
- Al-Edhari, A., Eathamokkala, G. V., & Rahouti, M. (2026). Response drift across frontier large language models. Manuscript under review at Nature Machine Intelligence.
- B. V. Kumar et al. (2022). Analysis of an IoT based Water Quality Monitoring System. IEEE I-SMAC 2022. DOI: 10.1109/I-SMAC55078.2022.9987360
| Certification | Provider | Year |
|---|---|---|
| Data Engineering & ML Specialization | Google Cloud | 2024 |
| Advanced Data Analytics Professional | | 2024 |
| Power BI for Data Analysts | Microsoft | 2024 |
| Advanced SQL for Data Engineering | | 2024 |
English (Professional) · Telugu (Native) · Hindi (Conversational) · Tamil (Conversational)
