Reduce RAM usage #18

Merged. RadValentin merged 12 commits into main from 17-reduce-ram-usage on May 5, 2026.

@RadValentin RadValentin commented Apr 24, 2026

Summary

This PR reduces total RAM usage in three ways: it no longer loads debug data (feature_matrix_raw) in production, it stores track IDs as UUID bytes instead of strings, and it stores genre labels as integer codes instead of strings.

| Part of feature matrix | RAM used before | RAM used after | Change (after - before) | Notes |
|---|---|---|---|---|
| feature_matrix | 321 MB | 321 MB | 0 MB | no change |
| mbid_to_idx | 443 MB | 80 MB | -363 MB | |
| years | 11 MB | 10 MB | -1 MB | negligible change |
| feature_matrix_raw | 321 MB | 322 MB | +1 MB | won't be loaded in prod |
| feature_names | 0 MB | 0 MB | 0 MB | no change |
| genre_dortmund | 353 MB | 10 MB | -343 MB | |
| genre_rosamerica | 283 MB | 10 MB | -273 MB | |

  • Without the feature matrix loaded, the app uses 126 MB of RAM.
  • After the optimization, the app will use about 560 MB of RAM in prod, down from 1858 MB before. This means I could run 4 workers on a machine with 4 GB of RAM and 2 vcores.
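
The mbid_to_idx saving comes from changing how each track ID is represented. As a rough illustration (not the PR's actual code), the sketch below compares a NumPy array of 36-character UUID strings with the same IDs packed as 16 raw bytes each in a V16 array. The sample IDs and variable names are made up for the example, and how the IDs were held before this PR is not shown here.

```python
import uuid

import numpy as np

# Hypothetical sample of track MBIDs (deterministic, for illustration only).
mbids = [uuid.uuid5(uuid.NAMESPACE_DNS, f"track-{i}") for i in range(1000)]

# Text form: NumPy stores fixed-width unicode at 4 bytes per character,
# so each 36-character UUID string costs 144 bytes.
as_text = np.array([str(m) for m in mbids])

# Compact form: each UUID is exactly 16 raw bytes, held as an opaque V16 value.
as_bytes = np.frombuffer(b"".join(m.bytes for m in mbids), dtype="V16")

print(as_text.nbytes)   # 36 chars * 4 bytes * 1000 IDs = 144000
print(as_bytes.nbytes)  # 16 bytes * 1000 IDs = 16000

# Convert back to the canonical string only when a response needs it.
restored = str(uuid.UUID(bytes=as_bytes[0].tobytes()))
```

In this toy comparison the byte form is 9 times smaller. The ratio measured in the table above differs because it reflects the real data structures in the app.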

Co-authored-by: Copilot <copilot@github.com>
@RadValentin RadValentin linked an issue Apr 24, 2026 that may be closed by this pull request
3 tasks

codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 90.60773% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.52%. Comparing base (9061c19) to head (a53a7c0).
⚠️ Report is 13 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| backend/recommend_api/services/recommender.py | 81.63% | 9 Missing ⚠️ |
| backend/ingest/track_processing_helpers.py | 0.00% | 2 Missing ⚠️ |
| backend/recommend_api/api/track.py | 83.33% | 2 Missing ⚠️ |
| backend/recommend_api/models.py | 83.33% | 2 Missing ⚠️ |
| ...recommend_api/tests/services/test_feature_store.py | 93.33% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #18      +/-   ##
==========================================
+ Coverage   90.36%   90.52%   +0.15%     
==========================================
  Files          36       37       +1     
  Lines        1754     1868     +114     
  Branches      122      130       +8     
==========================================
+ Hits         1585     1691     +106     
- Misses        130      137       +7     
- Partials       39       40       +1     

| Files with missing lines | Coverage Δ |
|---|---|
| backend/recommend_api/api/artist.py | 88.37% <100.00%> (+0.56%) ⬆️ |
| backend/recommend_api/api/genre.py | 100.00% <100.00%> (ø) |
| backend/recommend_api/api/recommend.py | 77.02% <100.00%> (ø) |
| backend/recommend_api/api/search.py | 83.33% <ø> (ø) |
| backend/recommend_api/serializers.py | 100.00% <100.00%> (ø) |
| backend/recommend_api/tests/api/test_genre_api.py | 100.00% <100.00%> (ø) |
| backend/recommend_api/tests/api/test_track_api.py | 100.00% <100.00%> (ø) |
| backend/recommend_api/tests/factories.py | 98.07% <100.00%> (+1.01%) ⬆️ |
| ...d/recommend_api/tests/services/test_recommender.py | 100.00% <100.00%> (ø) |
| backend/ingest/track_processing_helpers.py | 76.21% <0.00%> (ø) |

... and 4 more

@RadValentin RadValentin marked this pull request as ready for review April 28, 2026 12:24
@RadValentin RadValentin requested a review from Copilot April 28, 2026 12:24

Copilot AI left a comment

Pull request overview

This PR reduces memory usage in the recommender stack by compacting in-memory feature-store identifiers/metadata and by normalizing genres into dedicated tables.

Changes:

  • Store MBIDs in the feature index as 16-byte UUID values (V16) and optionally avoid loading raw feature matrices in production (DEBUG-only).
  • Replace track genre string fields with FK references to GenreDortmund / GenreRosamerica, updating serializers and genre/track APIs accordingly.
  • Update ingest pipeline, management command, and tests/factories to use compact genre codes and UUID MBIDs.
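
The "compact genre codes" idea can be sketched as follows. This is a minimal illustration under assumptions, not the pipeline's code: the labels are invented, and the `vocab` array stands in for whatever lookup the genre tables provide. Each distinct label is stored once, and every track keeps only a 2-byte index into that list.

```python
import numpy as np

# Made-up genre labels, one per track.
labels = np.array(["rock", "jazz", "rock", "pop", "jazz", "rock"])

# vocab holds each distinct label once (sorted); codes index into it.
vocab, codes = np.unique(labels, return_inverse=True)
codes = codes.astype(np.uint16)  # 2 bytes per track, room for 65536 labels

print(vocab.tolist())  # ['jazz', 'pop', 'rock']
print(codes.tolist())  # [2, 0, 2, 1, 0, 2]

# Labels are recovered by indexing the vocabulary with the codes.
decoded = vocab[codes]
```

The per-track cost then no longer depends on label length, only on the width of the integer type chosen for the codes.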

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| backend/recommend_api/services/recommender.py | Loads compact MBID/genre arrays, adds feature accessors, and emits UUID strings in recommendation output. |
| backend/recommend_api/models.py | Introduces genre lookup tables and switches Track genres to FK relationships. |
| backend/recommend_api/migrations/0021_genredortmund_genrerosamerica_and_more.py | Adds genre tables and alters track genre fields to FKs. |
| backend/recommend_api/serializers.py | Exposes genre labels via FK (genre_*.label) and makes raw_features optional. |
| backend/recommend_api/api/track.py | Uses new FeatureStore accessors; conditionally includes raw features. |
| backend/recommend_api/api/genre.py | Lists genres from new genre tables rather than distinct track fields. |
| backend/recommend_api/tests/services/test_recommender.py | Updates service tests to use UUID MBIDs and numeric genre codes. |
| backend/recommend_api/tests/factories.py | Adds genre factories and updates TrackFactory to create FK genres. |
| backend/recommend_api/tests/api/test_track_api.py | Updates tests for feature endpoint to use new store accessors and UUID MBIDs. |
| backend/recommend_api/tests/api/test_genre_api.py | Updates genre API tests to seed genre tables directly. |
| backend/ingest/pipeline.py | Writes compact MBID bytes and uint16 genre codes into the NPZ feature/index artifact; populates genre tables. |
| backend/ingest/track_processing_helpers.py | Minor formatting/comment cleanup; retains genre labels during extraction for later coding. |
| backend/ingest/management/commands/recommend.py | Updates CLI to map numeric genre codes back to labels and improves robustness. |
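
One way the DEBUG-only loading described above could work is sketched here. It relies on the fact that `np.load` on an `.npz` archive reads each array only when its key is accessed, so skipping a key avoids loading that array. The function name `load_feature_store`, its `debug` flag, and the tiny arrays are hypothetical; the array keys are taken from the RAM table in the PR description.

```python
import io

import numpy as np


def load_feature_store(source, debug=False):
    """Hypothetical loader: read only the arrays the running mode needs."""
    wanted = ["feature_matrix", "genre_dortmund"]
    if debug:
        # The raw matrix is only useful for debugging, so skip it otherwise.
        wanted.append("feature_matrix_raw")
    with np.load(source) as npz:
        # Each npz[name] access reads that one array from the archive.
        return {name: npz[name] for name in wanted}


# Build a tiny in-memory artifact so the example is self-contained.
buf = io.BytesIO()
np.savez(
    buf,
    feature_matrix=np.zeros((4, 3), dtype=np.float32),
    feature_matrix_raw=np.ones((4, 3), dtype=np.float32),
    genre_dortmund=np.array([0, 1, 1, 2], dtype=np.uint16),
)
payload = buf.getvalue()

prod_store = load_feature_store(io.BytesIO(payload))
debug_store = load_feature_store(io.BytesIO(payload), debug=True)

print(sorted(prod_store))   # ['feature_matrix', 'genre_dortmund']
print(sorted(debug_store))  # ['feature_matrix', 'feature_matrix_raw', 'genre_dortmund']
```

In a Django project the flag would plausibly come from `settings.DEBUG`, but that wiring is an assumption and is left out of the sketch.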

Comment thread backend/recommend_api/tests/factories.py
Comment thread backend/recommend_api/tests/factories.py
Comment thread backend/ingest/management/commands/recommend.py
Comment thread backend/recommend_api/services/recommender.py Outdated
Comment thread backend/ingest/pipeline.py
Comment thread backend/recommend_api/serializers.py

Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • frontend/package-lock.json: Language not supported

Comment thread backend/recommend_api/api/track.py
@RadValentin RadValentin merged commit aab88da into main May 5, 2026
1 check passed
@RadValentin RadValentin deleted the 17-reduce-ram-usage branch May 5, 2026 12:39

Linked issue: Reduce RAM usage of in-memory feature store

2 participants