MovieAnalysis

Scenario & Introduction

Problem Statement

Streaming platforms ingest hundreds of new titles every month, yet most arrive with only minimal metadata. Without reliable genre tags or similarity links, users struggle to discover relevant content, and providers miss out on engagement opportunities.

Business / Research Goal

Our goal is to automatically enrich newly ingested movies by providing:

Primary genre predictions derived solely from a plot summary.
Content-based recommendations that surface semantically similar titles on day 0.

Concrete Research Questions

#	Question
RQ-1	How accurately can genres be inferred from plot summaries, and which model performs best?
RQ-2	Which text-representation strategy (e.g., TF-IDF, Word2Vec, Sentence-BERT) yields the best performance for the recommendation system?
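
RQ-1 asks both how accurate genre inference is and which model is best, so every model should be scored on the same held-out titles. One option is to report accuracy alongside macro-averaged F1. The sketch below assumes each title has a single gold primary genre. The function names and toy labels are illustrative and are not taken from the notebooks.

```python
from collections import Counter


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Share of titles whose predicted primary genre equals the gold label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-genre F1, so rare genres weigh as much as common ones."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    true_positives = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # how often each genre was predicted
    actual = Counter(y_true)     # how often each genre is the gold label
    scores = []
    for genre in set(y_true) | set(y_pred):
        precision = true_positives[genre] / predicted[genre] if predicted[genre] else 0.0
        recall = true_positives[genre] / actual[genre] if actual[genre] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only.
gold = ["Drama", "Drama", "Comedy", "Horror"]
pred = ["Drama", "Comedy", "Comedy", "Drama"]
print(accuracy(gold, pred))            # 0.5
print(round(macro_f1(gold, pred), 3))  # 0.389
```

Accuracy alone hides the fact that the Horror title was never recovered. Macro-F1 penalises that miss, so it may be the fairer yardstick when genre classes are imbalanced.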

Strategy & Methodology

Our workflow is organised in four notebooks, each mapping to a specific project phase:

Phase / Notebook	Key actions
Data Extraction & Preprocessing	• Download TMDB metadata and Wikipedia plots • Normalise titles, merge datasets, remove duplicates
Data Exploration	• Inspect class balance, text length, language distribution • Visualise token counts & top n-grams
Genre Classification	• Train and compare models: Logistic Regression, SVM, Random Forest, LSTM, BERT
Recommender System	• Experiment with TF-IDF, Word2Vec, Sentence-BERT embeddings • Retrieve neighbours via cosine similarity & HNSW
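
The title normalisation and merge step of the first phase can be sketched as follows. This minimal sketch makes several assumptions. Both sources are lists of dicts with hypothetical `title` and `year` keys, and the Wikipedia side also has a `plot` key. The pair (normalised title, release year) is treated as an acceptable join key. The notebook itself may match records differently.

```python
import re
import unicodedata


def normalise_title(title: str) -> str:
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return " ".join(no_punct.split())


def merge_and_dedupe(tmdb_rows: list[dict], wiki_rows: list[dict]) -> list[dict]:
    """Join TMDB metadata to Wikipedia plots on (normalised title, year).

    Titles without a plot are dropped, and only the first record per key is kept.
    """
    plots: dict[tuple[str, int], str] = {}
    for row in wiki_rows:
        key = (normalise_title(row["title"]), row["year"])
        plots.setdefault(key, row["plot"])  # first plot wins if Wikipedia has duplicates

    merged, seen = [], set()
    for row in tmdb_rows:
        key = (normalise_title(row["title"]), row["year"])
        if key in seen or key not in plots:
            continue  # duplicate TMDB entry, or no plot to classify
        seen.add(key)
        merged.append({**row, "plot": plots[key]})
    return merged


# Hypothetical toy records, not real TMDB or Wikipedia data.
tmdb = [
    {"title": "Café Noir", "year": 2010, "genres": ["Crime", "Drama"]},
    {"title": "Cafe Noir", "year": 2010, "genres": ["Crime"]},
    {"title": "Dust Road", "year": 1998, "genres": ["Western"]},
]
wiki = [{"title": "Cafe noir!", "year": 2010, "plot": "A barista uncovers a smuggling ring."}]
merged = merge_and_dedupe(tmdb, wiki)  # keeps only the first "Café Noir" record, with its plot
```

Joining on year as well as title guards against remakes that share a title, at the cost of missing matches when the two sources disagree on the release year.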

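For the exploration phase, plot-length statistics and the most frequent n-grams can be computed with the standard library alone. This is a rough sketch. The ASCII regex tokenizer and the `stopwords` argument are illustrative assumptions, not the notebook's actual preprocessing.

```python
import re
from collections import Counter
from statistics import median

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    """Lower-case and split on anything that is not an ASCII letter, digit or apostrophe."""
    return TOKEN_RE.findall(text.lower())


def ngrams(tokens: list[str], n: int):
    """Yield consecutive n-token tuples, e.g. bigrams for n=2."""
    return zip(*(tokens[i:] for i in range(n)))


def corpus_summary(plots, n=1, top_k=10, stopwords=frozenset()):
    """Plot-length statistics (before stopword removal) and the top_k most frequent n-grams."""
    token_lists = [tokenize(p) for p in plots]
    lengths = [len(tokens) for tokens in token_lists]
    counts = Counter()
    for tokens in token_lists:
        kept = [t for t in tokens if t not in stopwords]
        counts.update(" ".join(gram) for gram in ngrams(kept, n))
    return {
        "median_tokens": median(lengths),
        "max_tokens": max(lengths),
        "top_ngrams": counts.most_common(top_k),
    }


plots = ["A young wizard attends a school of magic.", "A young detective solves a murder."]
summary = corpus_summary(plots, n=1, top_k=1, stopwords={"a", "of"})
# {'median_tokens': 7.0, 'max_tokens': 8, 'top_ngrams': [('young', 2)]}
```

Removing stopwords before forming n-grams lets a bigram bridge the removed word ("school magic" from "school of magic"). Whether that is desirable should be decided explicitly.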
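
With TF-IDF embeddings, the recommender reduces to ranking catalogue plots by their cosine similarity to a query plot. The sketch below is a minimal version under several assumptions. It uses a simple regex tokenizer and an exact brute-force scan, and the `recommend` helper and toy catalogue are hypothetical. The smoothed IDF, ln((1 + N) / (1 + df)) + 1, matches scikit-learn's `TfidfVectorizer` default (`smooth_idf=True`). HNSW would replace the linear scan with an approximate index, for example via hnswlib or FAISS, once the catalogue grows. A new title needs only its plot to be embedded against the existing IDF table, which is what allows recommendations on day 0.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def fit_idf(docs: list[str]) -> dict[str, float]:
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}


def embed(text: str, idf: dict[str, float]) -> dict[str, float]:
    """Sparse, L2-normalised TF-IDF vector; terms unseen during fitting are ignored."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the sparser vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def recommend(query_plot: str, catalogue: dict[str, str], k: int = 3):
    """Exact k nearest neighbours of a new plot (refits IDF on every call for brevity)."""
    titles = list(catalogue)
    idf = fit_idf([catalogue[t] for t in titles])
    vectors = [embed(catalogue[t], idf) for t in titles]
    query = embed(query_plot, idf)
    scored = [(cosine(query, v), title) for v, title in zip(vectors, titles)]
    return sorted(scored, reverse=True)[:k]


# Hypothetical toy catalogue.
catalogue = {
    "Space Saga": "A crew of astronauts explores a distant galaxy in a starship.",
    "Haunted House": "A family moves into a haunted house full of ghosts.",
    "Star Voyage": "Astronauts pilot a starship through a dangerous galaxy.",
}
results = recommend("Rebel astronauts steal a starship to cross the galaxy.", catalogue)
for score, title in results:
    print(f"{score:.2f}  {title}")
```

The two space-travel plots rank well above the haunted-house plot, which shares only the word "a" with the query. Word2Vec or Sentence-BERT embeddings would replace `embed` with dense vectors while keeping the same cosine ranking.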