About β’ Features β’ Requirements β’ Installation β’ Usage β’ Architecture β’ Contributing β’ License
MV Recommender System is an interactive web application built with Python and Streamlit that delivers personalized movie recommendations powered by collaborative filtering. Using a pre-trained KNN (K-Nearest Neighbors) model trained on real user ratings from the MovieLens dataset, the system learns viewing patterns and suggests movies tailored to your preferences.
Select 1-3 movies you love, and get instant recommendations enriched with posters, synopses, and metadata from The Movie Database (TMDB).
Discover your next favorite film. Powered by collaborative filtering and machine learning.
| Feature | Description | Benefit |
|---|---|---|
| Collaborative Filtering | KNN-based item-item recommendation | Learns from millions of user ratings |
| Multi-Movie Selection | Choose 1-3 reference films | More accurate personalization |
| Adjustable Results | 1-10 recommendations in real-time | Find exactly what you need |
| Rich Metadata | Posters, synopses, ratings from TMDB | Complete movie information |
| Interactive UI | Intuitive Streamlit interface | Seamless user experience |
- Pre-trained KNN Model - Instant recommendations without training time
- Sparse Matrix Optimization - Efficient memory usage for millions of ratings
- Docker Containerization - One-command deployment
- API Integration - Real-time data from TMDB
- Production Ready - Error handling and fallback mechanisms
| Component | Technology | Purpose |
|---|---|---|
| Algorithm | K-Nearest Neighbors | Find similar movies |
| Similarity Metric | Cosine Distance | Measure movie similarity |
| Data Structure | Sparse CSR Matrix | Efficient storage (100M+ ratings) |
| Training Data | MovieLens 25M | Real user preferences |
| OS | Support | Notes |
|---|---|---|
| Windows 10+ | β Native | Via Docker or Python |
| Linux | β Native | Ubuntu 18.04+ tested |
| macOS | β Native | Intel & Apple Silicon |
- Python: 3.8 or higher
- Docker: Any version (optional, recommended)
- Docker Compose: Any version (optional)
| Resource | Minimum | Recommended |
|---|---|---|
| Memory | 512 MB | 2 GB |
| Disk | 1 GB | 2 GB |
| Internet | Required | For TMDB API calls |
- TMDB API Key - Free registration
Prerequisites: Docker and Docker Compose installed
# Clone repository
git clone <REPOSITORY_URL>
cd mv-recommender-system
# Download models and data (see Setup step 2)
# ...
# Create .env file with TMDB API key (see Setup step 3)
# ...
# Run with Docker Compose
docker compose up -dOpen browser: http://localhost:8501
β Simplest | β±οΈ ~30 seconds | π Isolated environment
Prerequisites: Python 3.8+
# Clone repository
git clone <REPOSITORY_URL>
cd mv-recommender-system
# Create virtual environment
python -m venv venv
# Activate virtual environment
# Windows:
venv\Scripts\activate
# Linux/macOS:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Download models and data (see Setup step 2)
# ...
# Create .env file with TMDB API key (see Setup step 3)
# ...
# Run application
streamlit run src/app.pyβ Full control | β±οΈ ~2-3 minutes | π― Direct development
git clone https://github.com/Paulogb98/MV-Recommender-System.git
cd MV-Recommender-SystemThe pre-trained KNN model and MovieLens dataset are available on Google Drive.
Required files:
| File | Destination | Purpose |
|---|---|---|
knn_model.pkl |
recommender/models/ |
Pre-trained KNN model |
mappers.pkl |
recommender/models/ |
Movie & user ID mappings |
movies.csv |
data/ |
Movie metadata |
links.csv |
data/ |
TMDB/IMDB mappings |
ratings.csv |
data/ |
(Optional) For retraining |
Directory structure after download:
MV-Recommender-System/
βββ data/
β βββ movies.csv
β βββ links.csv
β βββ ratings.csv (optional)
βββ recommender/
β βββ models/
β βββ knn_model.pkl
β βββ mappers.pkl
βββ ...
- Register at TMDB: https://www.themoviedb.org
- Get API Key: Settings β API β Create Request β Copy API Key
- Create
.envfile in project root:
TMDB_API_KEY=your_api_key_hereβ
Never commit .env to version control (already in .gitignore)
Using Docker Compose:
docker compose up -d
# Access: http://localhost:8501Using Python locally:
streamlit run src/app.py
# Access: http://localhost:8501βββββββββββββββββββββββββββββββ
β 1. Select 1-3 Movies β
β (from dropdown menu) β
ββββββββββ¬βββββββββββββββββββββ
β
ββββββββββΌβββββββββββββββββββββ
β 2. Choose # of Results β
β (1-10 recommendations) β
ββββββββββ¬βββββββββββββββββββββ
β
ββββββββββΌβββββββββββββββββββββ
β 3. Click Submit β
β (or Add Filter for more)β
ββββββββββ¬βββββββββββββββββββββ
β
ββββββββββΌβββββββββββββββββββββ
β 4. Get Recommendations β
β (with posters & info) β
βββββββββββββββββββββββββββββββ
| Component | Type | Purpose | Range |
|---|---|---|---|
| Movie Selector | Dropdown | Choose reference films | 1-3 movies |
| Results Slider | Number Input | Set recommendation count | 1-10 |
| Add Filter Button | Button | Add more reference films | N/A |
| Submit Button | Button | Generate recommendations | N/A |
| Gallery View | Grid | Display movie posters | Dynamic |
Scenario 1: Single Movie Reference
1. Select: "The Matrix"
2. Slide: 5 recommendations
3. Click: Submit
4. Result: 5 sci-fi movies similar to The Matrix
Scenario 2: Multiple References
1. Select: "Inception"
2. Click: Add Filter
3. Select: "Interstellar"
4. Click: Add Filter
5. Select: "The Prestige"
6. Slide: 10 recommendations
7. Click: Submit
8. Result: 10 movies matching all three preferences
ββββββββββββββββββββββββββββββββββββββββββββββββ
β Streamlit UI Layer β
β (Dropdowns, Sliders, Buttons, Gallery) β
ββββββββββββββββ¬ββββββββββββββββββββββββββββββββ
β
ββββββββββββββββΌββββββββββββββββββββββββββββββββ
β Business Logic Layer β
β (utils_functions.py) β
β β’ Movie selection & mapping β
β β’ KNN recommendation engine β
β β’ TMDB API integration β
ββββββββββββββββ¬ββββββββββββββββββββββββββββββββ
β
ββββββββββββββββΌββββββββββββββββββββββββββββββββ
β Model & Data Layer β
β ββββββββββββββββββββββββββββββββββββββββββ β
β β Pre-trained KNN Model β β
β β (knn_model.pkl - 100M+ ratings) β β
β ββββββββββββββββββββββββββββββββββββββββββ€ β
β β Movie Mappings (mappers.pkl) β β
β β (movieId β Index mapping) β β
β ββββββββββββββββββββββββββββββββββββββββββ€ β
β β CSV Data β β
β β (movies.csv, links.csv, ratings.csv) β β
β ββββββββββββββββββββββββββββββββββββββββββ β
ββββββββββββββββ¬ββββββββββββββββββββββββββββββββ
β
ββββββββββββββββΌββββββββββββββββββββββββββββββββ
β External APIs β
β β’ TMDB (Movie Posters) β
β β’ IMDB (via links mapping) β
ββββββββββββββββββββββββββββββββββββββββββββββββ
User Input (Movie Titles)
β
βΌ
Map Titles β Movie IDs (via movies.csv)
β
βΌ
Get KNN Indices (via knn_model.pkl)
β
βΌ
Query Neighbors (cosine similarity)
β
βΌ
Map Indices β Movie IDs (via mappers.pkl)
β
βΌ
Get TMDB IDs (via links.csv)
β
βΌ
Fetch Posters (TMDB API)
β
βΌ
Display Gallery
MV-Recommender-System/
βββ data/
β βββ links.csv # TMDB/IMDB ID mappings
β βββ movies.csv # Movie metadata & titles
β βββ ratings.csv # User ratings (for training)
β
βββ recommender/
β βββ model.ipynb # Model training notebook
β βββ models/
β βββ knn_model.pkl # Pre-trained KNN
β βββ mappers.pkl # ID mappings
β
βββ src/
β βββ app.py # Streamlit application
β
βββ utils/
β βββ utils_functions.py # Helper functions
β
βββ assets/
β βββ img/
β β βββ mv-square-logo.png
β β βββ mv-horizontal-logo.png
β βββ gif/
β βββ homepage.gif
β
βββ .env # Environment variables (TMDB API)
βββ .gitignore # Git ignore rules
βββ .dockerignore # Docker build ignore
βββ docker-compose.yml # Docker orchestration
βββ Dockerfile # Docker image definition
βββ requirements.txt # Python dependencies
βββ README.md # This file
Algorithm Type: Item-Item Collaborative Filtering
Method: K-Nearest Neighbors (KNN)
Distance Metric: Cosine Similarity
Similarity(Movie A, Movie B) = Ξ£(rating_user_i_movie_a Γ rating_user_i_movie_b)
ββββββββββββββββββββββββββββββββββββββββββββ
||Movie A|| Γ ||Movie B||
| Parameter | Value | Description |
|---|---|---|
| algorithm | brute |
Exhaustive search (accurate) |
| metric | cosine |
Cosine distance similarity |
| n_neighbors | User-defined (1-10) | Number of recommendations |
| weights | uniform | Equal weight for all neighbors |
- Source: MovieLens 25M dataset
- Size: 25,000,095 ratings
- Movies: 62,423 unique films
- Users: 162,541 unique users
- Rating Scale: 0.5 to 5.0 stars
- Sparsity: 99.97% (sparse matrix ideal)
# Item-User Matrix: Shape (# movies, # users)
# Each cell = user rating for movie
# Sparse format: Only non-zero ratings stored (CSR matrix)
# Memory efficient: ~2GB for 25M ratings vs ~10GB dense
Example:
User1 User2 User3 ... User162541
Movie1 4.5 3.0 NaN ... 4.0
Movie2 NaN 4.0 3.5 ... NaN
Movie3 3.0 NaN 4.5 ... 3.5
...
Movie62423 4.0 3.5 NaN ... 4.5To retrain the model with updated ratings:
# 1. Update data/ratings.csv with new data
# 2. Open recommender/model.ipynb
# 3. Run all cells
# 4. Models regenerated: knn_model.pkl, mappers.pklNotebook provides:
- Data loading and preprocessing
- Item-User matrix creation
- KNN model training
- Model serialization (pickle)
- Mapper creation
Select: "Mad Max: Fury Road"
Recommend: 5 movies
β
Results:
β’ John Wick
β’ Fast & Furious 7
β’ Mission: Impossible - Fallout
β’ Deadpool
β’ The Raid 2
Select: "Blade Runner 2049"
"2001: A Space Odyssey"
"Arrival"
Recommend: 10 movies
β
Results:
β’ Dune (2021)
β’ Ex Machina
β’ The Matrix
β’ Minority Report
β’ Total Recall
β’ Interstellar
β’ Inception
β’ Tron: Legacy
β’ Ghost in the Shell
β’ Passengers
Select: "Parasite"
Recommend: 3 movies
β
Results:
β’ Moonlight
β’ Manchester by the Sea
β’ The Farewell
Solution: Install dependencies
pip install -r requirements.txt
Cause: Invalid or missing API key
Solution:
1. Verify TMDB_API_KEY in .env file
2. Check API key at https://www.themoviedb.org/settings/api
3. Ensure .env file is in project root
4. Restart application after updating .env
Cause: Model files not downloaded
Solution: Download from Google Drive (Setup Step 2)
Extract to recommender/models/ directory
Cause: Stale browser cache or session
Solution:
1. Hard refresh browser (Ctrl+Shift+R or Cmd+Shift+R)
2. Clear browser cookies
3. Restart application
Cause: Container not running
Solution:
docker compose up -d
docker compose ps # Verify status
docker compose logs # View errors
Cause: Movie title typo or not in dataset
Solution:
1. Check spelling in MovieLens database
2. Try partial movie name
3. Use search feature to filter results
| Operation | Time | Notes |
|---|---|---|
| Model Load | ~500ms | First run, then cached |
| Recommendation Query | ~50ms | For 1-3 reference movies |
| TMDB Poster Fetch | ~500-2000ms | Per 10 movies, depends on API |
| Page Render | ~100ms | After data ready |
| Total End-to-End | ~2-3s | User selects β sees results |
- Sparse Matrix (CSR) - Efficient memory: ~80% reduction vs dense
- Pre-trained Model - Skip training step (hours β milliseconds)
- API Caching - Reduce redundant TMDB requests
- Streamlit Caching - Cache expensive computations
Contributions are welcome!
# 1. Fork repository
# 2. Create feature branch
git checkout -b feature/YourFeature
# 3. Make changes
# 4. Commit
git commit -m 'feat: add YourFeature'
# 5. Push
git push origin feature/YourFeature
# 6. Open Pull Request- β New recommendation algorithms (SVD, Neural Collaborative Filtering)
- β UI/UX improvements (dark mode, better filters)
- β Additional movie metadata sources
- β Performance optimizations
- β Documentation and examples
- β Unit and integration tests
- β Deployment guides (AWS, Heroku, etc.)
For advanced users: update recommendations with new user data
# 1. Load ratings
ratings = pd.read_csv("data/ratings.csv")
# 2. Create matrix
X, movie_mapper, user_mapper, movie_inv_mapper, user_inv_mapper = \
create_item_user_matrix(ratings)
# 3. Train model
train_and_save_item_knn(X, movie_mapper, movie_inv_mapper)
# 4. Restart app - new model automatically loadedSee recommender/model.ipynb for complete walkthrough.
| Metric | Value |
|---|---|
| Total Ratings | 25,000,095 |
| Unique Movies | 62,423 |
| Unique Users | 162,541 |
| Rating Scale | 0.5 - 5.0 β |
| Time Span | 1995 - 2019 |
| Data Freshness | 2018 (last update) |
Citation:
Harper, F. M., & Konstan, J. A. (2015).
The MovieLens datasets: History and context.
ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4), 19.
Download: https://grouplens.org/datasets/movielens/
| Component | Technology | Version |
|---|---|---|
| Frontend | Streamlit | Latest |
| Backend | Python | 3.8+ |
| ML Model | Scikit-Learn | 1.0+ |
| Sparse Matrix | SciPy | 1.7+ |
| Data Processing | Pandas | 1.3+ |
| Model Serialization | Joblib | 1.0+ |
| API Integration | Requests | 2.26+ |
| Environment | Python-dotenv | 0.19+ |
| Containerization | Docker | Any |
This project is licensed under the MIT License.
MIT License
Copyright (c) 2024 Paulo G.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction...
See LICENSE file for details.
- π¬ MovieLens - Dataset and research foundation
- π₯ TMDB (The Movie Database) - Movie posters and metadata
- π Streamlit - Beautiful web framework
- π€ Scikit-Learn - Machine learning tools
- π Python Community - Ecosystem and libraries
| Channel | Type | Response Time |
|---|---|---|
| GitHub Issues | Bugs/Features | 24-48h |
| GitHub Discussions | Questions | 24-48h |
| Urgent | 12-24h |
π LinkedIn: https://www.linkedin.com/in/paulo-goiss/
| Aspect | Status | Details |
|---|---|---|
| Development | β Active | Issues and PRs accepted |
| Production | β Ready | Stable v1.0 |
| Testing | β Complete | Cross-platform verified |
| Performance | β Optimized | Sub-second recommendations |
| Documentation | β Complete | Comprehensive guide |
| Docker Support | β Ready | Production container |
- β KNN collaborative filtering
- β 1-3 movie selection
- β 1-10 adjustable results
- β TMDB poster integration
- β Docker deployment
- β Interactive UI
- π Advanced filtering (genre, year, rating)
- π User ratings & feedback
- π Watchlist management
- π Trending movies section
- π Dark mode UI
- π Movie reviews integration
- π Neural Collaborative Filtering (NCF)
- π Hybrid recommendation system
- π User preference learning
- π Social recommendations
- π Web deployment (AWS/Heroku)
- π Mobile app
Built with β€οΈ in Python
π Repository β’
π Issues β’
π¦ Releases
MV Recommender System v1.0 | β Production Ready

