Project 15: Distributed Text Mining and Sentiment Analysis (Variant B)
Use MapReduce to analyze a large text corpus for word frequencies and sentiment insights. Classify each document's sentiment as positive, negative, or neutral using a pre‑trained lexicon.
1. Project Overview
Objective:
Compute word frequencies over a large text corpus (tweets, reviews, headlines)
Classify each document’s sentiment (positive/negative/neutral) using a pre‑trained lexicon
Deliverable:
Command‑line tool that ingests raw text, runs MapReduce jobs, and prints aggregate word counts and sentiment summaries
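The two objectives above can be sketched as a single map/reduce pass. This is a minimal in-memory illustration, not the project's actual implementation: the tiny `LEXICON`, the scoring rule (sum of word polarities), and the names `map_document`, `reduce_pairs`, and `run_job` are all assumptions made for the example. A real job would run on Hadoop or a local cluster and load a full pre‑trained lexicon.

```python
import re
from collections import Counter
from itertools import chain

# Hypothetical toy lexicon (assumption); a real run would load a pre-trained one.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def map_document(doc):
    """Map phase: emit one (key, 1) pair per word plus one sentiment label."""
    tokens = tokenize(doc)
    pairs = [(("word", token), 1) for token in tokens]
    # Assumed scoring rule: sum the polarity of every token found in the lexicon.
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    pairs.append((("sentiment", label), 1))
    return pairs


def reduce_pairs(pairs):
    """Reduce phase: sum the values of all pairs that share a key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def run_job(docs):
    """Chain the map output of every document into a single reduce step."""
    return reduce_pairs(chain.from_iterable(map_document(d) for d in docs))


docs = [
    "I love this great phone",
    "Awful battery, I hate it",
    "It arrived on Monday",
]
totals = run_job(docs)
for key, count in sorted(totals.items()):
    print(key, count)
```

Keys are tagged as `("word", ...)` or `("sentiment", ...)` so that one reducer can aggregate both outputs; splitting them into two separate jobs would work equally well.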
2. Team Members
Stepan Vagin (lead)
Lana Ermolaeva
Savva Ponomarev
Vyacheslav Molchanov
Danil Valiev
3. Team Roles (5 Engineers)
Engineer
Primary Focus
E1
Environment & Infrastructure (Hadoop/HDFS or local‑cluster setup)
E2
Data Ingestion & Preprocessing (reading, cleaning, partitioning)