Skip to content

rshalinisai/FakeNewsProject

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Multimodal Fake News Detection Ensemble for Combating Online Misinformation

This project involves curating and integrating three major fake news datasets—Fakeddit, Weibo, and FakeNewsNet—followed by extensive feature engineering and the development of a deep learning model. By combining textual, visual, and metadata features across multiple languages, the project aims to build a robust multilingual, multimodal system for fake news classification.

Report: https://docs.google.com/document/d/18P5IHFZBCZZa9uoN0dCsvX3OWVefIykAAFsvcLglQkI/edit?usp=drive_link

Datasets

  • Fakeddit: A dataset sourced from Reddit, consisting of both rumor and non-rumor posts. Each post contains metadata, text, and images.
  • Weibo: A collection of tweets from the Weibo platform, labeled as rumors or non-rumors. Each tweet contains metadata, text, and associated images.
  • FakeNewsNet: A dataset containing news articles with labels indicating whether they are fake or real, along with relevant metadata and images.

Folder Structure

  • data/: This folder contains the raw data for the three datasets.
    • fakenewsnet/: Processed FakeNewsNet dataset.
    • weibo/: Processed Weibo dataset.
    • fakeddit/: Processed Fakeddit dataset.
    • image_dump/: A folder where all the images from different sources are stored.
  • scripts/: Python scripts for processing the datasets, performing feature engineering, and saving the processed data.
    • data_processing: Notebooks for data preprocessing and joining of the datasets
    • feature_engineering: Notebook for feature engineering
    • model: Notebook for model training and evaluation
  • requirements.txt: Python dependencies needed for the project.

Note: GitHub may not render Jupyter notebooks correctly if they were created using Google Colab, particularly due to compatibility issues with interactive widget metadata. However, all .ipynb files in this repository can be downloaded and opened locally in Jupyter Notebook or Colab with full outputs preserved. This includes all model outputs, visualizations, and metrics.

Setup Guide

Prerequisites

  • Python 3.11 or higher
  • Git
  • Pip package manager

Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/FakeNewsProject.git
    cd FakeNewsProject
    
  2. Create and activate a virtual environment (recommended):

    python -m venv venv
    # On Windows
    venv\Scripts\activate
    # On macOS/Linux
    source venv/bin/activate
    
  3. Install the required dependencies:

    pip install -r requirements.txt
    
  4. Configure environment variables:

    • Copy the .env_sample file to .env
    cp .env_sample .env
    
    • Edit the .env file and add your Reddit API credentials (required for scraping Fakeddit data)

Dataset Setup

The datasets used in this project are large. Two options are available:

  1. Download the processed datasets:

    • The processed dataset is available on Kaggle
    • The feature-engineered dataframe is available on Google Drive
  2. Process the datasets from scratch:

    • Run the data processing notebooks in the scripts/data_processing directory in the following order:
      1. (FakeNewsNet) fakenewsnet_preprocessing.ipynb
      2. (Weibo) weibo_preprocessing.ipynb
      3. (Fakeddit) reddit_scraper.ipynb
      4. (Fakeddit) article_image_scraper.ipynb
      5. final_dataset_preperation.ipynb

Running Feature Engineering

After setting up the datasets, run the feature engineering notebook:

jupyter notebook scripts/feature_engineering/feature_engineering.ipynb

Training and Evaluating Models

Run the model training and evaluation notebook:

jupyter notebook scripts/model/model_train_eval.ipynb

Workflow

Data Processing Workflow

Results

Below are the evaluation metrics of our multimodal model on the test set. For more details refer to model_train_eval.ipynb.

Confusion Matrix

Confusion Matrix

Precision, Recall, F1 Score

Classification Report

ROC-AUC Curve

ROC AUC Curve

Acknowledgments

The datasets used in this project are from the following repositories:

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors