Table: runs (CSV in data/raw/runs.csv)
date (ISO, e.g., 2025-10-01)
distance_km (float)
duration_sec (int)
avg_hr (int, optional)
elev_gain_m (float, optional)
surface (categorical: road/trail/track/indoor)
workout_type (easy/tempo/interval/long/race)
sleep_h (float, optional)
rpe (1–10, optional)
shoes (string, optional)
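The column list above can be turned into a small typed loader. The following is a minimal sketch, not the project's actual loader: the helper names `parse_run` and `_optional` are hypothetical, columns not marked optional are assumed to be required, and `rpe` is assumed to be an integer.

```python
import csv
import io
from datetime import date

# Assumption: columns not marked "optional" in the schema are required.
REQUIRED = ("date", "distance_km", "duration_sec", "surface", "workout_type")
SURFACES = {"road", "trail", "track", "indoor"}
WORKOUT_TYPES = {"easy", "tempo", "interval", "long", "race"}


def _optional(value, cast):
    """Return cast(value), or None when the cell is empty or the column is absent."""
    if value is None or value.strip() == "":
        return None
    return cast(value)


def parse_run(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    for col in REQUIRED:
        if not (row.get(col) or "").strip():
            raise ValueError(f"missing required column: {col}")
    if row["surface"] not in SURFACES:
        raise ValueError(f"unknown surface: {row['surface']}")
    if row["workout_type"] not in WORKOUT_TYPES:
        raise ValueError(f"unknown workout_type: {row['workout_type']}")
    rpe = _optional(row.get("rpe"), int)  # assumption: rpe is stored as an integer
    if rpe is not None and not 1 <= rpe <= 10:
        raise ValueError(f"rpe out of range 1-10: {rpe}")
    return {
        "date": date.fromisoformat(row["date"]),
        "distance_km": float(row["distance_km"]),
        "duration_sec": int(row["duration_sec"]),
        "avg_hr": _optional(row.get("avg_hr"), int),
        "elev_gain_m": _optional(row.get("elev_gain_m"), float),
        "surface": row["surface"],
        "workout_type": row["workout_type"],
        "sleep_h": _optional(row.get("sleep_h"), float),
        "rpe": rpe,
        "shoes": _optional(row.get("shoes"), str),
    }


# Usage with an in-memory sample; a real run would open data/raw/runs.csv instead.
sample = io.StringIO(
    "date,distance_km,duration_sec,avg_hr,surface,workout_type\n"
    "2025-10-01,5.0,1500,,road,easy\n"
)
runs = [parse_run(r) for r in csv.DictReader(sample)]
```

Empty or absent optional cells become `None`, so later steps can distinguish "not recorded" from a real zero.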
Derived (placed in processed file by Day 1 script)
pace_sec_per_km = duration_sec / distance_km
is_race_like = workout_type in {"race","interval","tempo"}
target_best5k_next30d_sec = minimum t5k_equiv_sec achieved in the 30 days after each date (future window; same-day runs excluded)
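The derived columns above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the Day 1 script itself: the function names are hypothetical, each run is assumed to already carry a `t5k_equiv_sec` value, the window is assumed to include day 30, and the target is assumed to be `None` when no later run falls in the window.

```python
from datetime import date, timedelta

RACE_LIKE_TYPES = {"race", "interval", "tempo"}


def add_basic_derived(run):
    """Return a copy of run with pace_sec_per_km and is_race_like added."""
    if run["distance_km"] <= 0:
        # Assumption: non-positive distances are rejected rather than skipped.
        raise ValueError("distance_km must be positive")
    out = dict(run)
    out["pace_sec_per_km"] = run["duration_sec"] / run["distance_km"]
    out["is_race_like"] = run["workout_type"] in RACE_LIKE_TYPES
    return out


def add_target(runs, window_days=30):
    """Attach target_best5k_next30d_sec to each run.

    For a run on day d, the target is the minimum t5k_equiv_sec over runs
    dated in (d, d + window_days], so same-day runs are excluded.
    """
    out = []
    for run in runs:
        start = run["date"]
        end = start + timedelta(days=window_days)
        future = [
            other["t5k_equiv_sec"]
            for other in runs
            if start < other["date"] <= end
            and other.get("t5k_equiv_sec") is not None
        ]
        enriched = dict(run)
        enriched["target_best5k_next30d_sec"] = min(future) if future else None
        out.append(enriched)
    return out


# Usage with made-up values chosen only to exercise the window edges.
example = add_target([
    {"date": date(2025, 10, 1), "t5k_equiv_sec": 1500},
    {"date": date(2025, 10, 1), "t5k_equiv_sec": 1400},   # same day: excluded
    {"date": date(2025, 10, 31), "t5k_equiv_sec": 1420},  # day 30: included
    {"date": date(2025, 11, 15), "t5k_equiv_sec": 1300},  # day 45: outside
])
```

Because the target looks forward in time, the most recent runs have no target, and they would normally be dropped from training rather than filled in.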
Dataset runs (examples):
    python -m scripts.make_features --dataset dhruva --inp data/processed/runs_clean__dhruva.csv
    python -m scripts.train_model --name dhruva --table 5k
    python -m scripts.make_features --dataset colby --inp data/processed/runs_clean__colby.csv
    python -m scripts.train_model --name colby --table 5k
Quickstart
Install (creates venv/ and installs deps): make install
Run full pipeline (features → train → eval): make pipeline DATASET=dhruva TABLE=all STAMP=1 INP=data/processed/runs_clean.csv
Evaluate (latest or stamped run): python -m scripts.evaluate_model --name dhruva --table 5k --split test
Artifacts (stamped run):
    models//model.joblib
    models//metadata.json
    models//metrics.json
    reports/figures/pred_vs_actual__dhruva__all__.png
Predict (latest run): python -m scripts.predict --latest --name dhruva
Expected output (short):
    Prediction summary
    Run dir : models/latest
    Rows used :
    Pred (sec) :
    Pred (mm:ss): MM:SS
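The summary reports the prediction both in seconds and as MM:SS. One way to do that conversion is sketched below; `format_mmss` is a hypothetical helper, and rounding to the nearest second is an assumption about how scripts.predict treats fractional seconds.

```python
def format_mmss(total_seconds):
    """Format a non-negative duration in seconds as MM:SS."""
    if total_seconds < 0:
        raise ValueError("duration must be non-negative")
    # Assumption: fractional seconds are rounded to the nearest whole second.
    minutes, seconds = divmod(int(round(total_seconds)), 60)
    return f"{minutes:02d}:{seconds:02d}"
```

Rounding before splitting into minutes and seconds avoids outputs such as 21:60 when the fractional part rounds up.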
Dashboard (Streamlit)
Entry point: app.py
Run locally: streamlit run app.py
Deployment (Streamlit Cloud):
- Set the app file to app.py
- The dashboard loads assets/sample_results.csv by default
Recommended CSV columns: dataset, table, model_name, cv_mae, test_mae, timestamp
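Before pointing the dashboard at a results file, it can help to check the header against the recommended columns. This is a minimal sketch using only the standard library; `missing_columns` is a hypothetical helper and is not part of app.py.

```python
import csv
import io

RECOMMENDED_COLUMNS = (
    "dataset", "table", "model_name", "cv_mae", "test_mae", "timestamp",
)


def missing_columns(csv_file):
    """Return the recommended columns absent from the CSV header, in order."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [col for col in RECOMMENDED_COLUMNS if col not in present]


# Usage with an in-memory header; a real check would open
# assets/sample_results.csv with newline="" instead.
gaps = missing_columns(io.StringIO("dataset,table,model_name,cv_mae\n"))
```

An empty result means every recommended column is present; extra columns are ignored.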