This project predicts London’s daily temperature using machine learning.
It follows a structured ML workflow: exploratory data analysis (EDA), data cleaning, model training, and experiment tracking with MLflow.
london-weather-predictor/
├── data/
│   ├── raw/        # Original datasets
│   │   └── london_weather.csv
│   ├── clean/      # Processed/clean datasets
│   │   └── london_weather_clean.csv
│   └── mlruns/     # MLflow experiment logs
├── notebooks/
│   ├── eda.ipynb   # Data exploration and cleaning
│   └── train.ipynb # Model training & evaluation
├── src/
│   └── data/
│       └── clean.py # Data cleaning script
└── README.md
- Clone this repo:

  ```bash
  git clone https://github.com/joseph-ajegetina/london-weather-predictor.git
  cd london-weather-predictor
  ```

- Create a virtual environment and install dependencies:

  ```bash
  python -m venv venv
  source venv/bin/activate   # On Linux/Mac
  venv\Scripts\activate      # On Windows
  pip install -r requirements.txt
  ```
- Make sure the dataset is in `data/raw/london_weather.csv`, then run the cleaning script (`src/data/clean.py`) to generate the processed dataset, `data/clean/london_weather_clean.csv`.
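
The contents of `clean.py` are not shown in this README. As a rough sketch of what a cleaning step of this kind might look like, the function below drops rows without a target value and fills remaining numeric gaps with column medians. The column names `date` and `mean_temp`, the `YYYYMMDD` date format, and the median-fill strategy are all assumptions made for illustration, not details taken from the repository.

```python
import pandas as pd


def clean_weather(df: pd.DataFrame, target: str = "mean_temp") -> pd.DataFrame:
    """Return a cleaned copy of a raw weather table.

    Assumed (hypothetical) schema: a "date" column written as YYYYMMDD
    and a numeric target column, here called "mean_temp".
    """
    # Rows without a target value cannot be used for supervised training.
    out = df.dropna(subset=[target]).copy()
    # Parse dates such as 19790101 into proper timestamps (assumed format).
    out["date"] = pd.to_datetime(out["date"].astype(str), format="%Y%m%d")
    # Fill remaining gaps in numeric columns with each column's median.
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out.sort_values("date").reset_index(drop=True)


# Possible usage with the paths from the project layout:
#   raw = pd.read_csv("data/raw/london_weather.csv")
#   clean_weather(raw).to_csv("data/clean/london_weather_clean.csv", index=False)
```

Median filling is only one option; for a daily time series, interpolation between neighbouring days may suit some columns better.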
- EDA: explore the dataset, visualize trends, and check for missing values: `jupyter notebook notebooks/eda.ipynb`
- Training: train and evaluate models with MLflow tracking: `jupyter notebook notebooks/train.ipynb`
MLflow logs are stored in `data/mlruns/`. To browse experiments, run `mlflow ui --backend-store-uri file:data/mlruns` and open http://127.0.0.1:5000.
- EDA (`eda.ipynb`)
  - Explore dataset (trends, correlations, missing values)
  - Feature selection & preprocessing
- Training (`train.ipynb`)
  - Train multiple models:
    - Linear Regression
    - Decision Tree Regressor
    - Random Forest Regressor
  - Evaluate using RMSE, MAE, R²
  - Log experiments with MLflow
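
The training and evaluation steps above can be sketched roughly as follows. This is an illustrative outline using scikit-learn, not the notebook's actual code: the function name `evaluate_models`, the hyperparameters, the random 80/20 split, and the synthetic demo data are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def evaluate_models(X, y, random_state=42):
    """Fit each regressor on a train split and score it on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    models = {
        "linear_regression": LinearRegression(),
        "decision_tree": DecisionTreeRegressor(random_state=random_state),
        "random_forest": RandomForestRegressor(
            n_estimators=50, random_state=random_state
        ),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            # RMSE is the square root of the mean squared error.
            "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
            "mae": float(mean_absolute_error(y_test, pred)),
            "r2": float(r2_score(y_test, pred)),
        }
    return results


# Synthetic stand-in data (NOT the London dataset): 3 features, noisy linear target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)
scores = evaluate_models(X_demo, y_demo)
for name, m in scores.items():
    print(f"{name}: RMSE={m['rmse']:.3f} MAE={m['mae']:.3f} R2={m['r2']:.3f}")
```

Because weather data is ordered in time, a chronological split (training on earlier years, testing on later ones) may give a more realistic estimate than the random split used in this sketch.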
- Relationships between weather variables and mean temperature.
- Monthly average temperatures across years.
- Metric values for each model.

Tools used:

- Python (pandas, numpy, matplotlib, seaborn, scikit-learn)
- MLflow for experiment tracking
- Jupyter Notebooks for development

Possible next steps:

- Add hyperparameter tuning (grid/random search)
- Try Gradient Boosting / XGBoost / LightGBM
- Deploy the best model as an API (FastAPI/Flask)

