DATA605_Spring2026_Real_Time_Stock_Market_Pipeline_Kafka_Spark #461

Project Description
Author: @aashishvinod
Title: DATA605_Spring2026_Real_Time_Tweet_Sentiment_Analysis_Kafka
Course: DATA605 Spring 2026
Link: https://github.com/gpsaggese/gpsaggese.github.io/blob/master/class_project/data605/Spring2026/projects_descriptions/Apache_Kafka_Project_Description.md

Summary: This project builds a real-time tweet sentiment analysis pipeline using Apache Kafka for stream ingestion and a pre-trained HuggingFace RoBERTa model (cardiffnlp/twitter-roberta-base-sentiment) for sentiment classification. Tweets from the Sentiment140 dataset (1.6M labeled tweets) are streamed through Kafka, classified in real time, and visualized using a live Streamlit dashboard with sentiment trends, comparative model analysis, and anomaly detection.
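
Because the summary pairs a three-class model with a labeled dataset, accuracy depends on how the two label schemes are aligned. The sketch below shows one possible way to do the aggregation. It assumes the model emits `LABEL_0`/`LABEL_1`/`LABEL_2` for negative/neutral/positive and that Sentiment140 encodes polarity as 0/2/4; both mappings should be checked against the model card and the dataset file before use. The `aggregate` function and the shape of its input records are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Assumed label scheme of cardiffnlp/twitter-roberta-base-sentiment (verify on the model card).
MODEL_LABELS = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}
# Assumed Sentiment140 polarity codes (verify against the dataset file).
DATASET_LABELS = {0: "negative", 2: "neutral", 4: "positive"}


def aggregate(records):
    """Count predicted sentiments and compute accuracy.

    records: iterable of (polarity_code, model_label) pairs, one per tweet.
    Returns (counts_by_sentiment, accuracy).
    """
    counts = Counter()
    correct = 0
    total = 0
    for polarity, model_label in records:
        predicted = MODEL_LABELS[model_label]
        counts[predicted] += 1
        total += 1
        if predicted == DATASET_LABELS[polarity]:
            correct += 1
    accuracy = correct / total if total else 0.0
    return dict(counts), accuracy


# Hypothetical sample: two correct predictions out of four.
sample = [(0, "LABEL_0"), (4, "LABEL_2"), (4, "LABEL_1"), (0, "LABEL_2")]
counts, accuracy = aggregate(sample)
print(counts, accuracy)
```

In this sketch a neutral prediction on a tweet labeled positive or negative counts as an error. If the streamed portion of Sentiment140 turns out to contain only the codes 0 and 4, that choice lowers the reported accuracy, so the project may prefer to exclude neutral predictions or report them separately.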

Planned Workflow:

  1. Load and preprocess Sentiment140 dataset (1.6M tweets)
  2. Kafka producer setup (publish tweet events to Kafka topic)
  3. HuggingFace RoBERTa model loading and classification
  4. Kafka consumer setup (consume and classify tweets in real time)
  5. Sentiment aggregation (positive, negative, neutral counts and accuracy)
  6. Spark SQL analysis (sentiment distribution, high confidence predictions)
  7. Comparative model analysis (RoBERTa vs DistilBERT accuracy comparison)
  8. Anomaly detection (detect sudden spikes in sentiment using rolling statistics)
  9. Live Streamlit dashboard (real-time pie chart, trend chart, metrics)
  10. Performance analysis (Kafka producer throughput vs batch size)
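
Step 8 can be sketched as a rolling z-score check. This is only one possible reading of "rolling statistics", not necessarily the method the project will use. The function name `detect_spikes`, the window size, and the threshold are all assumptions chosen for illustration; the input is assumed to be a per-interval count of one sentiment class (for example, negative tweets per minute).

```python
from collections import deque
from statistics import mean, pstdev


def detect_spikes(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the mean of the previous
    `window` values by more than `threshold` standard deviations."""
    history = deque(maxlen=window)  # holds only the most recent `window` values
    spikes = []
    for i, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # A perfectly flat history (sigma == 0) is skipped in this sketch.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                spikes.append(i)
        history.append(value)
    return spikes


# Hypothetical per-interval counts with one sudden jump at index 6.
negative_counts = [10, 11, 9, 10, 10, 11, 40, 10, 9]
print(detect_spikes(negative_counts))
```

Each value is compared against the window that precedes it, so a spike does not inflate its own baseline. It does enter the window afterwards, which temporarily raises the standard deviation and can mask a second spike that follows closely.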

Tools: Apache Kafka, HuggingFace Transformers (RoBERTa), Apache Spark (PySpark), Streamlit, Docker, Python, Sentiment140 Dataset

Assigned to: @aashishvinod @gpsaggese @protocorn

Status: To review
