Forecast.fm: Weather-Conditioned Music Classification Engine

Predict a suitable weather category based on a song's audio features (valence, acousticness, tempo, loudness, energy)

Authors

Two Demo Paths

Option 1: Run the Website Locally

This starts the FastAPI backend + the Vite frontend.

Backend Setup:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Set Spotify credentials (go to Spotify Web API):

export SPOTIPY_CLIENT_ID="your_spotify_client_id"
export SPOTIPY_CLIENT_SECRET="your_spotify_client_secret"

Start the API (uses backend/models/model.pkl if present):

uvicorn backend.app.main:app --reload --port 8000

Start the frontend in a second terminal:

cd frontend
npm install
npm run dev

Open the app using the given localhost server.

Video: https://drive.google.com/file/d/1nc7trHjiscxEsSmc9bRy0GR-1T8RSImB/view?usp=sharing

Option 2: Manual Features via the ML Script

This runs a simple prompt-driven predictor and does not use the web app.

Ensure the Python environment is ready (same as Demo 1).
Run the script and enter features when prompted:

python 'ml/features->weather.py'

The user will be prompted for energy [0.0, 1.0], valence [0.0, 1.0], tempo (BPM), acousticness [0.0, 1.0], and loudness (dB) in order, and the predicted weather category is printed.

Tech Stack

Backend: Python - scikit-learn, pandas, matplotlib, FastAPI (served via Uvicorn)
Frontend: React, Typescript, Tailwind CSS

Project Structure

.
├── backend
│   ├── api
│   └── app
├── data
├── frontend
│   ├── public
│   └── src
├── ml
│   └── results
├── README.md
└── requirements.txt

APIs Used

Spotify Web API (song lookup)
ReccoBeats API (audio features)
OpenWeatherMap API (real-time weather context) (NOT USED IN CURRENT VERSION)

ML Component

Data source: data/track_data.csv. We ultmately used 5201 labelled tracks for training/testing, around a third of which were pulled using the Spotify Web API from Spotify-generated and user-made playlists. As no other canonical dataset mapping songs to weather labels exists, and weather labels are weakly supervised, we prompted LLMS (ChatGPT and Anthropic) to generate supplemental data based on some explicit heuristics (e.g. melancholic and reflective ambiance -> rainy).
Features: energy, valence, tempo, acousticness, loudness. StandardScalar was used to standardize each feature.
Target label: weather: one of sunny, cloudy, rainy, snowy
Models in ml/models.py, trained using Pipeline to avoid data leakage: Naive Bayes, Logistic Regression (baseline), Random Forest, Gradient Boosting (production)

Training and evaluation scripts:

python ml/train.py
python ml/evaluate.py

Evaluation metrics and diagnostics used in ml/evaluate.py:

Weighted F1 on a simple holdout split was initially used, and we enhanced by using Stratified K-Fold cross-validation with weighted F1
Permutation feature importance (PFI)
Confusion matrices (visualized using matplotlib)

see ml/results/evaluate_results.txt for other final diagnostics

Conclusion

We found that energy, valence, tempo, acousticness, loudness, standardized using StandardScalar, were a set of features with consistent, moderately high PFI scores across multiple CV folds and all four models. A structured imbalance was present, with 'energy' (mean 0.224) slightly dominating weaker supporting features such as tempo (mean 0.113). We ultimately achieved 0.758 ± 0.014 weighted F1 with 5-fold stratified cross-validation on our Gradient Boosting production model.

We initially intended the Naive Bayes model to be a deliberately weak baseline, having reasoned that audio features such as 'energy' and 'loudness' are highly correlated and conditionally dependent. However we ended up with a surprisingly high 0.745 CV F1. Indeed, in our regime, where features tend to cluster across classes and predictive signal is distributed relatively evenly (see PFI), Naive Bayes can still effectively aggregate signals.

More expressive models such as Gradient Boosting are nonetheless able to achieve additional gains by modeling residual feature dependencies. For instance, Naive Bayes struggled on differentiating between the similar classes of snowy and rainy, but Gradient Boosting reduced from 146 to 59 such misclassifications.

Notes/Possible Improvements

As of November 2024, Spotify API does not provide access to audio features. ReccoBeats was thus added for audio features but API experienced high latency. In the future we can experiment with caching song information for faster response times.
In the future, sentiment/semantic analysis on song lyrics could be used to further our model. For example, our model classifies "Jingle Bell Rock" by Brenda Lee as sunny, but the lyrics are definitely more indicative of a snowy song.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Forecast.fm: Weather-Conditioned Music Classification Engine

Authors

Two Demo Paths

Option 1: Run the Website Locally

Option 2: Manual Features via the ML Script

Tech Stack

Project Structure

APIs Used

ML Component

Conclusion

Notes/Possible Improvements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 168 Commits
backend		backend
data		data
frontend		frontend
ml		ml
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Forecast.fm: Weather-Conditioned Music Classification Engine

Authors

Two Demo Paths

Option 1: Run the Website Locally

Option 2: Manual Features via the ML Script

Tech Stack

Project Structure

APIs Used

ML Component

Conclusion

Notes/Possible Improvements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages