A lightweight hybrid recommender that blends collaborative filtering with content-based signals to surface personalised learning resources — with keyword explanations for every suggestion.
Three signals, weighted and blended:
| Signal | Model | What it captures |
|---|---|---|
| Collaborative | ALS (implicit feedback) | Patterns from similar users |
| Collaborative | LogisticMF (implicit) | Second opinion on user-item affinity |
| Content | TF-IDF cosine similarity | Keyword overlap with user history |
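
The README does not say how the three signals are weighted or put on a common scale before blending. A minimal sketch of one plausible approach, assuming each model yields a per-item score dictionary: min-max normalise each signal, then take a weighted average. The function names (`min_max`, `blend`), the signal keys, and the weights are hypothetical and are not taken from `hybrid.py`.

```python
def min_max(scores):
    """Rescale one signal's scores to [0, 1] so signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # constant signal carries no ranking information
        return {item: 0.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def blend(signals, weights):
    """Weighted average of normalised signals.

    signals: {signal_name: {item_id: raw_score}}
    weights: {signal_name: positive weight} (assumed values, not the project's)
    Items missing from a signal contribute 0 for that signal.
    """
    total = sum(weights.values())
    normalised = {name: min_max(scores) for name, scores in signals.items()}
    items = set().union(*(scores.keys() for scores in signals.values()))
    return {
        item: sum(weights[n] * normalised[n].get(item, 0.0) for n in signals) / total
        for item in items
    }


# Illustrative raw scores for three items from the three signals.
signals = {
    "als": {"a": 2.0, "b": 1.0, "c": 0.0},
    "lmf": {"a": 0.0, "b": 1.0, "c": 0.5},
    "content": {"a": 0.1, "b": 0.9, "c": 0.5},
}
weights = {"als": 0.5, "lmf": 0.2, "content": 0.3}

blended = blend(signals, weights)
ranking = sorted(blended, key=blended.get, reverse=True)
print(ranking)
```

Normalising first matters because matrix-factorisation scores and cosine similarities live on different scales; without it, the weights would not mean what they appear to mean.
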
Each recommendation includes an explanation: the top TF-IDF terms connecting the item to the user's history.
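
To illustrate how such an explanation could be derived, here is a dependency-free sketch: build TF-IDF vectors for item text, sum the vectors of the user's history into a profile, and rank the candidate item's terms by how much each contributes to the item-profile dot product. The helper names (`tfidf_vectors`, `explain`), the whitespace tokeniser, and the smoothed IDF formula are assumptions for illustration; the project's explainer may tokenise and weight terms differently. Only the first title comes from the sample data; the others are invented.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: {item_id: text}. Returns {item_id: {term: weight}}, L2-normalised."""
    tokens = {i: text.lower().split() for i, text in docs.items()}
    n = len(docs)
    # Document frequency: in how many items each term appears.
    df = Counter(t for toks in tokens.values() for t in set(toks))
    vectors = {}
    for i, toks in tokens.items():
        tf = Counter(toks)
        # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[i] = {t: w / norm for t, w in vec.items()}
    return vectors


def explain(item_id, history, vectors, top_k=3):
    """Top terms shared by the item and the user's history, by contribution."""
    profile = Counter()
    for seen in history:
        profile.update(vectors[seen])  # adds term weights into the profile
    contrib = {t: w * profile[t] for t, w in vectors[item_id].items() if profile[t] > 0}
    # Sort by contribution, breaking ties alphabetically for stable output.
    return sorted(contrib, key=lambda t: (-contrib[t], t))[:top_k]


docs = {
    1: "Intro to Linear Algebra",
    2: "Linear Algebra for Machine Learning",  # hypothetical title
    3: "Intro to Watercolour Painting",        # hypothetical title
}
vectors = tfidf_vectors(docs)
print(explain(2, history=[1], vectors=vectors))
```

Because the explanation reuses the same vectors as the content score, the keywords shown to the user are the terms that actually drove the content-based part of the ranking.
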
| Metric | Score |
|---|---|
| MAP@10 | 1.000 |
| NDCG@10 | 1.000 |
| Recall@10 | 1.000 |
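Evaluated on the bundled 5-user × 8-item sample with leave-one-out holdout. With so few users and items, treat these scores as a smoke test of the pipeline, not as a benchmark.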
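
For reference, the ranking metrics in the table can be sketched as follows, assuming binary relevance and the common convention of normalising AP by `min(len(relevant), k)`. These are standard textbook definitions written for illustration; the functions in `metrics.py` may differ in naming and edge-case handling.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


def ap_at_k(ranked, relevant, k):
    """Average precision at k for one user (binary relevance)."""
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this hit position
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0


def ndcg_at_k(ranked, relevant, k):
    """Normalised discounted cumulative gain at k (binary relevance)."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0


def map_at_k(rankings, holdouts, k):
    """Mean of AP@k over users; both arguments are keyed by user id."""
    return sum(ap_at_k(rankings[u], holdouts[u], k) for u in holdouts) / len(holdouts)


# Leave-one-out: each user has exactly one held-out item.
rankings = {"u1": ["a", "b", "c"], "u2": ["b", "a", "c"]}
holdouts = {"u1": {"a"}, "u2": {"a"}}
print(map_at_k(rankings, holdouts, k=10))  # 0.75: AP is 1.0 for u1 and 0.5 for u2
```

With a single held-out item per user, AP@K and NDCG@K reach 1.0 only when that item is ranked first, while Recall@K reaches 1.0 whenever it appears anywhere in the top K.
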
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
# Browse recommendations in the UI
streamlit run app.py
# Run tests
PYTHONPATH=. pytest tests/ -v
src/
├── data.py — load CSVs, build sparse interaction matrix + TF-IDF content matrix
├── ids.py — user/item ↔ index mappings
├── als_train.py — ALS model (implicit library)
├── lmf_train.py — LogisticMF model (implicit library)
├── hybrid.py — scoring functions + keyword explainer
└── metrics.py — P@K, R@K, AP@K, MAP@K, NDCG@K
tests/
├── test_data.py — data loading, matrix shapes, mappings
├── test_hybrid.py — scoring functions, explainer
└── test_metrics.py — ranking metric correctness
data/sample/
├── items.csv — item_id, title, genre flags
└── ratings.csv — user_id, item_id, rating, timestamp
items.csv
item_id,title,Comedy,Drama,Action,...
1,Intro to Linear Algebra,0,1,0,...
ratings.csv
user_id,item_id,rating,ts
1,1,1,1000
Swap in any dataset that follows this schema (`ts` is the interaction timestamp), e.g. MovieLens 100K with its 1-5 star ratings treated as implicit feedback.
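
As a rough illustration of how a file in this schema might be turned into the index mappings and interaction triples that a sparse matrix needs, here is a standard-library sketch. The function name `load_interactions` and the extra rows are hypothetical (only the first data row comes from the sample above), and the real `data.py` and `ids.py` may work differently.

```python
import csv
import io

# First data row is from the sample; the other two are made up for illustration.
RATINGS_CSV = """user_id,item_id,rating,ts
1,1,1,1000
1,3,1,1010
2,3,1,1020
"""


def load_interactions(text):
    """Parse ratings CSV text into id-to-index maps and (row, col, value) triples."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # dict.fromkeys keeps first-seen order, giving stable, contiguous indices.
    user_index = {u: i for i, u in enumerate(dict.fromkeys(r["user_id"] for r in rows))}
    item_index = {m: j for j, m in enumerate(dict.fromkeys(r["item_id"] for r in rows))}
    triples = [
        (user_index[r["user_id"]], item_index[r["item_id"]], float(r["rating"]))
        for r in rows
    ]
    return user_index, item_index, triples


user_index, item_index, triples = load_interactions(RATINGS_CSV)
print(user_index, item_index, triples)

# The triples map directly onto a coordinate-style sparse constructor, e.g.
# scipy.sparse.csr_matrix((values, (row_idx, col_idx)), shape=(n_users, n_items)).
```

Keeping the raw ids as strings in the mappings avoids assuming they are numeric, which helps when swapping in a dataset with non-integer identifiers.
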
