This constantly evolving repository brings together my projects, exercises, professional internships, and certifications in the field of data analysis. I am currently in the ninth semester of a Computer Science Engineering degree, while also completing my professional internship at Nidix Networks. This allows me to apply my knowledge in a real-world environment and complement my academic training.
The portfolio includes Python projects I have developed throughout my studies for various elective courses, such as Machine Learning and Data Science. It also covers work with other data analysis tools, including Excel, SQL, Power BI, and Minitab. Beyond showcasing my technical skills, this portfolio reflects my commitment to continuous learning and to good engineering practices such as documentation, version control, and the use of pipelines.
This repository will continue to grow progressively, incorporating the new tools, methodologies, and technologies I learn, especially now that I am exposed to real-world processes within the industry. I believe it's essential to stay up-to-date in a rapidly evolving technological environment; therefore, in addition to reinforcing what I've learned, I strive to integrate modern approaches that add value to any project or company I'm involved with.
Collection of certificates from courses and training programs I have completed or am about to complete (checklist in my main repository: Allan19k), including:
- Kaggle Learn (Python, Pandas, Data Cleaning, Data Visualization, SQL, Machine Learning, Geospatial Analysis, etc.)
- Santander Open Academy (Excel, ChatGPT Fundamentals, Power BI)
Excel exercises applied to analysis and dashboarding (in progress), including:
- Basic exercises from the Santander Open Academy Excel course
- Interactive dashboards and conditional formatting
- Simulation exercises and automated reports
- Intermediate and advanced exercises using various Excel functions and tools
Exercises and projects developed in the 7th semester for the elective Machine Learning course, with the help of Dr. Graciela María de Jesús Ramírez Alonso, along with several complementary Kaggle Learn courses:
- Algebra Review for Neural Networks (Algebra exercises using the NumPy library, focused on reinforcing essential knowledge for ML)
- Hyperparameter Search (GridSearchCV with MLPClassifier on load_wine)
- Time Series Prediction (RNN vs. LSTM for EUR/USD)
- Transfer Learning with ResNet50 for image classification
- Smart Dairy Farming: Milk Yield Classification App (final project for the course): a mobile application that uses a computer vision classification model to predict milk production levels (high, medium, or low) from images. It was published as a scientific article and demonstrates the practical application of machine learning in the agricultural sector.
- Complementary Courses:
  - Intro to Machine Learning
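
As an illustration of the hyperparameter search exercise listed above (GridSearchCV with MLPClassifier on load_wine), here is a minimal sketch. The parameter grid, the scaling step, and the train/test split are my assumptions for the example; they are not necessarily the values used in the course notebook.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load the wine dataset (178 samples, 13 features, 3 classes).
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scaling lives inside the pipeline so each CV fold is scaled
# using only its own training portion.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("mlp", MLPClassifier(max_iter=2000, random_state=42)),
])

# Hypothetical grid: illustrative values only.
param_grid = {
    "mlp__hidden_layer_sizes": [(16,), (32, 16)],
    "mlp__alpha": [1e-4, 1e-2],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

Keeping the scaler inside the pipeline avoids leaking statistics from the validation folds into training, which matters for MLPs because they are sensitive to feature scale.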
Applied Statistics exercises completed in the 5th semester under the guidance of Teacher Patricia Guadalupe Orpinel Ureña:
- Chi-square Tests (goodness of fit and independence)
- ANOVA (one-way and two-way)
- Linear Regression (simple and multiple)
- Formal conclusions, graphs, and validation of assumptions
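
The tests listed above can also be reproduced in Python. The sketch below uses SciPy as a stand-in for whatever tool the course exercises used, and all the numbers are made up for illustration.

```python
from scipy import stats

# Chi-square test of independence on a 2x2 contingency table (made-up counts).
observed = [[30, 10], [20, 40]]
chi2, p_chi2, dof, expected = stats.chi2_contingency(observed)

# One-way ANOVA across three made-up groups.
group_a = [20.1, 21.3, 19.8, 20.7]
group_b = [25.2, 24.8, 26.1, 25.5]
group_c = [30.4, 29.7, 31.0, 30.2]
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)

# Simple linear regression on made-up data that roughly follows y = 2x + 1.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
fit = stats.linregress(x, y)

print(f"chi-square: stat={chi2:.2f}, dof={dof}, p={p_chi2:.4f}")
print(f"ANOVA: F={f_stat:.2f}, p={p_anova:.4f}")
print(f"regression: slope={fit.slope:.2f}, intercept={fit.intercept:.2f}")
```

A formal conclusion would compare each p-value against the chosen significance level (0.05 is a common choice) and state whether the null hypothesis is rejected.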
Projects and exercises in Python using various libraries for data analysis:
- Python Fundamentals (syntax, functions, lists, conditionals…)
- Statistical analysis using the statistics module
- Generation of dummy data with Faker for export to CSV and XLSX
- Kaggle Learn courses (Pandas, Data Cleaning, and Data Visualization)
- Scraping and automation: automated data retrieval from various sources (APIs or web scraping), using threads to improve performance
- Kaggle notebook adaptations
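
To show why threads help with the data retrieval mentioned above, here is a minimal sketch using only the standard library. The `fetch` function and the source names are hypothetical: a short sleep stands in for network latency, so no real API or website is contacted.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(source: str) -> dict:
    """Hypothetical stand-in for an API call or page download."""
    time.sleep(0.1)  # simulate waiting on the network
    return {"source": source, "length": len(source)}


# Hypothetical source identifiers, for illustration only.
sources = ["api/users", "api/orders", "site/page-1", "site/page-2"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map keeps the results in the same order as the inputs.
    results = list(pool.map(fetch, sources))
elapsed = time.perf_counter() - start

print(f"fetched {len(results)} sources in {elapsed:.2f}s")
```

Because the work is I/O-bound, the waits overlap, so the total time is close to the slowest single request rather than the sum of all of them.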
Exercises from the Database Fundamentals course, developed with Professor José Saúl de Lira Miramontes in the 6th semester, plus other SQL exercises and projects:
- Installation and configuration of Oracle 21c XE and HR schema
- Basic queries: SELECT, WHERE, JOIN, GROUP BY, subqueries, DML, views
- Exercises organized by topic with screenshots and explanations
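
The kind of basic query covered above can be tried without an Oracle installation. This sketch uses Python's built-in sqlite3 module as a stand-in for Oracle 21c XE, with two tiny tables loosely modeled on the HR schema. The simplified columns and the rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the Oracle HR schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (
    department_id INTEGER PRIMARY KEY,
    department_name TEXT
);
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    last_name TEXT,
    salary REAL,
    department_id INTEGER REFERENCES departments(department_id)
);
INSERT INTO departments VALUES (10, 'Administration'), (20, 'Marketing');
INSERT INTO employees VALUES
    (1, 'Lopez', 4000, 10),
    (2, 'Garcia', 6000, 20),
    (3, 'Ruiz', 8000, 20);
""")

# SELECT with JOIN, WHERE, and GROUP BY: headcount and average salary per department.
query = """
SELECT d.department_name, COUNT(*) AS headcount, AVG(e.salary) AS avg_salary
FROM employees e
JOIN departments d ON d.department_id = e.department_id
WHERE e.salary > 3000
GROUP BY d.department_name
ORDER BY d.department_name;
"""
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```

The same SELECT statement should run on Oracle against equivalent tables, since it only uses standard SQL.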
Kaggle Learn course focused on the ethical principles of using AI. Through examples and real-world cases, key concepts such as algorithmic bias, privacy, fairness, and accountability in automated systems were explored.
I included this content in my portfolio because I consider it fundamental to understand the social impact of the tools we develop. In particular, I am interested in applying these principles within data analysis and artificial intelligence projects in an ethical and transparent manner.
Projects and exercises built with Microsoft Power BI as part of the Power BI Fundamentals course by Santander Open Academy, covering data import, interactive reports, filters, conditional formatting, and transformations with Power Query. I will add further exercises and more challenging projects later to reinforce what I learned in the course.
Elective course I took in the 8th semester, guided by Dr. Olanda Prieto Ordaz.
Practical Data Science course focused on the complete cycle of a Machine Learning project: data acquisition and cleaning, exploratory analysis, supervised and unsupervised modeling, validation, and deployment. This experience allowed me to apply ML techniques to real-world problems and build reproducible artifacts for my portfolio.
- Social Network Analysis: Introductory exercise developed in Google Colab using the book Data Science from Scratch as a reference. The main objective was to apply data structures in Python to answer questions related to a small, fictitious social network of employees.
- Web Scraping: Based on a YouTube tutorial on web scraping in Python, extended with an analysis stage that extracts mentions of open-source Data Science tools and groups them by category (data management, integration, visualization, etc.). Graphs show the frequency of mentions per tool and category.
- Linear Regression: Car price prediction. Includes a train/test split, EDA, a preprocessing pipeline (imputation, encoding, and scaling where applicable), a linear regression baseline, hyperparameter search (grid and randomized), evaluation with RMSE, and final validation on the test set.
- End-to-end project: Inspired by the handson-ml2 repository (Aurélien Géron). Complete workflow: EDA, preprocessing pipeline, model training (Linear Regression, Decision Tree, Random Forest), tuning with RandomizedSearchCV, comparison by RMSE, and local deployment of the best model with Streamlit (an interface for making predictions).
- Lung Cancer dataset projects: Implementation and comparison of multiple Machine Learning models on a Kaggle dataset about lung cancer.
- Amazon Predictor: Complete system to predict Amazon's Adjusted Close (Adj Close) price from historical data: EDA, creation of lagged features, pipelines, and a comparison of classical models and neural networks (Ridge, SVR, RandomForest, Voting, AdaBoost, GradientBoosting, XGBoost, MLP, DNN, LSTM). The best model was selected based on error metrics (RMSE, MAE, MSE) and deployed locally with Streamlit for interactive prediction.
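
The lagged features used in the Amazon Predictor project can be sketched as follows. This is a simplified illustration, not the project's actual code: the price series is synthetic, the `add_lags` helper and the column names are hypothetical, and a naive "yesterday's price" baseline stands in for the trained models.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for an Adj Close series (random walk around 100).
rng = np.random.default_rng(0)
prices = pd.DataFrame(
    {"adj_close": 100 + rng.normal(0, 1, 60).cumsum()},
    index=pd.date_range("2024-01-01", periods=60, freq="B"),
)


def add_lags(frame: pd.DataFrame, column: str, n_lags: int) -> pd.DataFrame:
    """Add column_lag1..column_lagN and drop rows without a full history."""
    out = frame.copy()
    for k in range(1, n_lags + 1):
        out[f"{column}_lag{k}"] = out[column].shift(k)
    return out.dropna()


features = add_lags(prices, "adj_close", n_lags=3)

# Chronological split: no shuffling, so the test set is strictly in the future.
split = int(len(features) * 0.8)
train, test = features.iloc[:split], features.iloc[split:]

# Naive baseline: predict each value with the previous day's value.
errors = test["adj_close"] - test["adj_close_lag1"]
rmse = float(np.sqrt((errors ** 2).mean()))
mae = float(errors.abs().mean())
print(f"baseline RMSE={rmse:.3f}, MAE={mae:.3f}")
```

A baseline like this gives a reference point for the error metrics: a trained model is only worth deploying if it beats the naive forecast on the held-out period.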
- Complete all Kaggle courses to strengthen SQL, Machine Learning, and other related topics
- Add new tools currently used in Data Analysis
- Conduct Intermediate and Advanced Excel exercises using custom or Kaggle databases
- Add more personal projects with real or simulated data
- Continuously improve the documentation and design of the portfolio
Repositories and sections updated as of 25/01/2026.