Full FPL pipeline: cleans FBRef and FPL data, then predicts player points by combining start probabilities with conditional points. It runs a multi-gameweek optimisation with transfers and fixed-player options, and uses Monte Carlo simulation to report outcome distributions with quantiles.
- Data ingestion: scrape/merge FBRef + FPL, normalize teams, dedupe matches, filter low minutes.
- Availability: leak‑safe per‑season rolling starts -> start_prob (shifted windows, optional league‑only).
- Features: rolling (3/7) performance stats + pre‑match fixture features (home, team/opponent strength).
- ML models: per‑position XGBoost (optional NN) predicting conditional points (if player starts).
- Uncertainty: per‑player residual std + start_prob -> unconditional mean & variance (law of total variance).
- Optimiser: PuLP MILP over multi‑GW horizon (squad, XI, captain, transfers, penalties, constraints).
- Simulation: Monte Carlo of starts + conditional points to get distribution (mean, std, quantiles).
- Reporting: per‑GW transfers, captaincy, XI, bench ordering by expected points; player prediction CSV.
- ID hygiene: manual FBRef ID overrides, duplicate match collapse, safe merges.
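As an illustration of the leak‑safe availability step above, here is a minimal sketch of a shifted rolling start rate. The function name `start_prob_by_season`, the window of 5 matches, the 0.5 prior for a player with no history, and the choice to reset history at each season boundary are all assumptions made for this example; the pipeline's own implementation may differ.

```python
from collections import deque


def start_prob_by_season(matches, window=5, prior=0.5):
    """Shifted rolling start rate for one player (hypothetical helper).

    `matches` is a chronological list of (season, started) pairs.
    The value for each match uses only EARLIER matches from the same
    season, so the outcome being predicted never leaks into its own
    feature. `window` and `prior` are illustrative defaults.
    """
    history = deque(maxlen=window)  # keeps only the last `window` results
    current_season = None
    probs = []
    for season, started in matches:
        if season != current_season:
            # Assumption: "per-season" means history resets each season.
            history.clear()
            current_season = season
        # Compute the feature BEFORE adding the current match (the shift).
        probs.append(sum(history) / len(history) if history else prior)
        history.append(1 if started else 0)
    return probs


example = [("2023", 1), ("2023", 1), ("2023", 0), ("2023", 1),
           ("2024", 0), ("2024", 1)]
print(start_prob_by_season(example, window=3))
```

The key point is the ordering inside the loop: the probability is recorded first and the match result is appended afterwards, which is the plain-Python equivalent of shifting a rolling window by one row.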
- Expected points per player per future GW (conditional & unconditional).
- Optimised multi‑GW squad plan (squad_ids, xi_ids, captains, transfers).
- Risk metrics (distribution quantiles).
- Transparent intermediate artefacts (residual stats, per‑player uncertainty).
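The uncertainty and simulation steps can be sketched together. This example assumes a simplified model in which a player who does not start scores 0 and a starter's points are normally distributed around the conditional prediction; under that assumption the law of total variance gives mean `p * mu` and variance `p * sigma**2 + p * (1 - p) * mu**2`. The function names and the normal distribution are illustrative choices, not the pipeline's confirmed implementation.

```python
import random
import statistics


def unconditional_moments(start_prob, cond_mean, cond_std):
    """Mean and variance of points when non-starters score 0 (assumption)."""
    mean = start_prob * cond_mean
    # E[Var(X | start)] + Var(E[X | start])
    var = (start_prob * cond_std ** 2
           + start_prob * (1 - start_prob) * cond_mean ** 2)
    return mean, var


def simulate_points(start_prob, cond_mean, cond_std, n_sims=20000, seed=0):
    """Monte Carlo draws: Bernoulli start, then normal conditional points."""
    rng = random.Random(seed)  # seeded for reproducibility
    draws = []
    for _ in range(n_sims):
        if rng.random() < start_prob:
            draws.append(rng.gauss(cond_mean, cond_std))
        else:
            draws.append(0.0)
    return draws


def summarise(draws, probs=(0.05, 0.5, 0.95)):
    """Return mean, std and nearest-rank empirical quantiles of the draws."""
    ordered = sorted(draws)
    last = len(ordered) - 1
    quantiles = {q: ordered[round(q * last)] for q in probs}
    return statistics.fmean(draws), statistics.pstdev(draws), quantiles


analytic_mean, analytic_var = unconditional_moments(0.8, 5.0, 2.0)
sim_mean, sim_std, sim_q = summarise(simulate_points(0.8, 5.0, 2.0))
print(analytic_mean, analytic_var ** 0.5)
print(sim_mean, sim_std, sim_q)
```

For a start probability of 0.8, conditional mean of 5 and conditional std of 2, the analytic mean is 4.0 and the variance is about 7.2, and the simulated mean and std should land close to those values. The quantiles show why simulation is useful here: with a 20% chance of not starting, the 5% quantile sits at 0, which a mean and variance alone would not reveal.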
- Create a Python virtual environment (recommended):
python -m venv .venv
.venv\Scripts\activate
- Install required packages:
pip install -r requirements.txt
Prep: install chromedriver, run fixture_scraper.py, then run run_pipeline.py.
- Credit to https://github.com/vaastav/Fantasy-Premier-League for global_scraper.py and FPL Data