This repository contains my completed coursework for Machine Learning Assignment 1. The focus is on building and evaluating supervised learning models from scratch using real-world datasets.
From fitting regression curves to tuning classification algorithms, this project demonstrates my ability to handle data preprocessing, apply core ML algorithms, and assess model performance with standard metrics, all implemented in Python using scikit-learn.
✅ Apply linear and polynomial regression to numerical data
✅ Use GridSearchCV for hyperparameter optimization
✅ Preprocess datasets: handle missing values, normalize, and encode features
✅ Train and evaluate classifiers: Logistic Regression, KNN, Naive Bayes
✅ Compare models using metrics like Accuracy, F1-score, Recall, Precision
✅ Use pipelines to ensure clean, modular, and reproducible ML workflows
The solution is presented in the Jupyter notebook.
- Load and split the dataset from `task1_data.csv`
- Train and evaluate a linear regression model using:
  - MSE, RMSE, MAE, R²
- Perform polynomial regression and select optimal degree using GridSearchCV
- Visualize and compare model performance
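The regression steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the columns of `task1_data.csv` are not shown here, so the example uses synthetic data as a stand-in, and the helper `regression_report`, the degree range 1 to 8, and the 80/20 split are assumptions made for the sketch.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for task1_data.csv (assumption: one feature, one target).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


def regression_report(model, X_eval, y_eval):
    """Hypothetical helper: MSE, RMSE, MAE and R² for a fitted regressor."""
    pred = model.predict(X_eval)
    mse = mean_squared_error(y_eval, pred)
    return {
        "MSE": float(mse),
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(mean_absolute_error(y_eval, pred)),
        "R2": float(r2_score(y_eval, pred)),
    }


# Baseline: plain linear regression.
linear = LinearRegression().fit(X_train, y_train)
linear_scores = regression_report(linear, X_test, y_test)

# Polynomial regression: the degree is chosen by 5-fold cross-validation.
poly = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("reg", LinearRegression()),
])
search = GridSearchCV(
    poly,
    param_grid={"poly__degree": list(range(1, 9))},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X_train, y_train)
best_degree = search.best_params_["poly__degree"]
poly_scores = regression_report(search.best_estimator_, X_test, y_test)

print("best degree:", best_degree)
print("linear:", linear_scores)
print("polynomial:", poly_scores)
```

Wrapping `PolynomialFeatures` and `LinearRegression` in one `Pipeline` lets `GridSearchCV` refit the feature expansion inside every fold, so the degree is selected without touching the test split.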
- Load `pokemon_modified.csv` and perform:
  - Missing value imputation
  - One-hot encoding for categorical features
  - Feature scaling
- Train and tune:
  - Logistic Regression
  - K-Nearest Neighbors (KNN)
  - Naive Bayes
- Evaluate models using:
  - Accuracy, Precision, Recall, F1-score
- Select best-performing model
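One way the classification workflow above could look is sketched below. It is an illustration under stated assumptions rather than the notebook's code: the schema of `pokemon_modified.csv` is not given here, so the example builds a small synthetic table, and the column names `stat_a`, `stat_b`, `category`, the binary target, and the parameter grids are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for pokemon_modified.csv (hypothetical columns and target).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "stat_a": rng.normal(70, 20, n),
    "stat_b": rng.normal(60, 15, n),
    "category": rng.choice(["a", "b", "c"], n),
})
y = (df["stat_a"] + df["stat_b"] + rng.normal(0, 10, n) > 130).astype(int)
# Inject missing values so the imputation step has something to do.
df.loc[rng.choice(n, 20, replace=False), "stat_a"] = np.nan
df.loc[rng.choice(n, 20, replace=False), "category"] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=42
)

# Imputation, scaling and one-hot encoding in a single preprocessing step.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer(
    [("num", numeric, ["stat_a", "stat_b"]), ("cat", categorical, ["category"])],
    sparse_threshold=0,  # force dense output, which GaussianNB requires
)

candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1.0, 10.0]}),
    "knn": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 11]}),
    "nb": (GaussianNB(), {"clf__var_smoothing": [1e-9, 1e-7]}),
}

results, cv_f1 = {}, {}
for name, (model, grid) in candidates.items():
    pipe = Pipeline([("prep", preprocess), ("clf", model)])
    search = GridSearchCV(pipe, grid, scoring="f1", cv=5).fit(X_train, y_train)
    pred = search.predict(X_test)
    cv_f1[name] = search.best_score_
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }

# Pick the model by cross-validated F1; the test metrics are only reported.
best_name = max(cv_f1, key=cv_f1.get)
print("selected:", best_name)
print(pd.DataFrame(results).T.round(3))
```

Selecting on the cross-validated score rather than the test score is a design choice made for this sketch: it keeps the held-out split as an unbiased check on the chosen model.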
- Python (Jupyter Notebook)
- `numpy`, `pandas`, `matplotlib`, `scikit-learn`
- Data preprocessing pipelines
- Cross-validation with `GridSearchCV`
- Classification and regression metrics
Valeria Neganova
Focus: Practical supervised learning, evaluation & model selection
📫 Valerochka.neganova@mail.ru