Multimodal Fake News Detection Ensemble for Combating Online Misinformation

This project curates and integrates three major fake news datasets (Fakeddit, Weibo, and FakeNewsNet), engineers features from them, and trains a deep learning model. By combining textual, visual, and metadata features across multiple languages, it aims to build a robust multilingual, multimodal system for fake news classification.

Report: https://docs.google.com/document/d/18P5IHFZBCZZa9uoN0dCsvX3OWVefIykAAFsvcLglQkI/edit?usp=drive_link

Datasets

Fakeddit: A dataset sourced from Reddit, consisting of both rumor and non-rumor posts. Each post contains metadata, text, and images.
Weibo: A collection of posts from the Weibo platform, labeled as rumors or non-rumors. Each post contains metadata, text, and associated images.
FakeNewsNet: A dataset containing news articles with labels indicating whether they are fake or real, along with relevant metadata and images.
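
Since the three sources use different field layouts, integrating them means mapping every record onto one shared schema. The sketch below illustrates that idea only; the `NewsRecord` class, the `FIELD_MAP` table, and every raw column name in it (`title`, `content`, `is_fake`, and so on) are hypothetical and are not taken from the project's notebooks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NewsRecord:
    """Hypothetical unified record; field names are illustrative only."""
    source: str                 # "fakeddit", "weibo", or "fakenewsnet"
    text: str
    image_path: Optional[str]   # None when a post has no image
    label: int                  # 1 = fake/rumor, 0 = real/non-rumor


# Assumed raw column names per dataset (NOT confirmed by the source).
FIELD_MAP = {
    "fakeddit": {"text": "title", "image": "image", "label": "is_fake"},
    "weibo": {"text": "content", "image": "image", "label": "rumor"},
    "fakenewsnet": {"text": "body", "image": "image", "label": "fake"},
}


def to_record(source: str, row: dict) -> NewsRecord:
    """Map one raw row onto the shared schema using FIELD_MAP."""
    keys = FIELD_MAP[source]
    return NewsRecord(
        source=source,
        text=row[keys["text"]],
        image_path=row.get(keys["image"]),
        label=int(row[keys["label"]]),
    )


def merge(rows_by_source: dict) -> list:
    """Flatten {source: [rows]} into a single list of NewsRecord."""
    return [
        to_record(source, row)
        for source, rows in rows_by_source.items()
        for row in rows
    ]
```

Keeping the per-dataset differences in a single lookup table means a fourth source could be added without touching the conversion logic.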

Folder Structure

data/: This folder contains the data for the three datasets.
- fakenewsnet/: Processed FakeNewsNet dataset.
- weibo/: Processed Weibo dataset.
- fakeddit/: Processed Fakeddit dataset.
- image_dump/: A folder where all the images from different sources are stored.
scripts/: Jupyter notebooks for processing the datasets, performing feature engineering, and training the model.
- data_processing/: Notebooks for data preprocessing and joining of the datasets.
- feature_engineering/: Notebook for feature engineering.
- model/: Notebook for model training and evaluation.
requirements.txt: Python dependencies needed for the project.
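
Before running any notebook it can help to confirm that the layout above exists locally. The following is a minimal sketch using only the standard library; the helper name `missing_paths` is made up for illustration, and the path list simply mirrors the folder structure described here.

```python
from pathlib import Path

# Paths taken from the folder structure listed in this README.
EXPECTED = [
    "data/fakenewsnet",
    "data/weibo",
    "data/fakeddit",
    "data/image_dump",
    "scripts/data_processing",
    "scripts/feature_engineering",
    "scripts/model",
    "requirements.txt",
]


def missing_paths(repo_root: str = ".") -> list[str]:
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED if not (root / p).exists()]


if __name__ == "__main__":
    # Run from the repository root; prints one line per missing entry.
    for path in missing_paths():
        print(f"missing: {path}")
```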

Note: GitHub may not render Jupyter notebooks correctly if they were created using Google Colab, particularly due to compatibility issues with interactive widget metadata. However, all .ipynb files in this repository can be downloaded and opened locally in Jupyter Notebook or Colab with full outputs preserved. This includes all model outputs, visualizations, and metrics.

Setup Guide

Prerequisites

Python 3.11 or higher
Git
pip package manager

Installation

Clone the repository:

```
git clone https://github.com/yourusername/FakeNewsProject.git
cd FakeNewsProject
```

Create and activate a virtual environment (recommended):

```
python -m venv venv
# On Windows
venv\Scripts\activate
# On macOS/Linux
source venv/bin/activate
```

Install the required dependencies:
```
pip install -r requirements.txt
```
Configure environment variables:
- Copy the .env_sample file to .env
```
cp .env_sample .env
```
- Edit the .env file and add your Reddit API credentials (required for scraping Fakeddit data)
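
As a rough illustration of how a script could read those credentials, here is a standard-library sketch that parses a `.env` file and checks that the values are present. The key names in `REQUIRED_KEYS` are assumptions; the real names are whatever `.env_sample` defines. The functions `parse_env` and `load_reddit_credentials` are hypothetical helpers, not part of this repository.

```python
import os
from pathlib import Path

# Assumed variable names; check .env_sample for the actual ones.
REQUIRED_KEYS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_reddit_credentials(env_path: str = ".env") -> dict[str, str]:
    """Read credentials, letting real environment variables take precedence."""
    path = Path(env_path)
    file_values = parse_env(path.read_text(encoding="utf-8")) if path.exists() else {}
    creds = {k: os.environ.get(k) or file_values.get(k, "") for k in REQUIRED_KEYS}
    missing = [k for k, v in creds.items() if not v]
    if missing:
        raise RuntimeError(f"Missing credentials: {', '.join(missing)}")
    return creds
```

Failing early with a clear message is usually preferable to discovering an empty credential halfway through a long scraping run.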

Dataset Setup

The datasets used in this project are large. Two options are available:

1. Download the processed datasets:
   - The processed dataset is available on Kaggle
   - The feature-engineered dataframe is available on Google Drive
2. Process the datasets from scratch:
   - Run the data processing notebooks in the scripts/data_processing directory in the following order:
     1. (FakeNewsNet) fakenewsnet_preprocessing.ipynb
     2. (Weibo) weibo_preprocessing.ipynb
     3. (Fakeddit) reddit_scraper.ipynb
     4. (Fakeddit) article_image_scraper.ipynb
     5. final_dataset_preperation.ipynb
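
For the from-scratch route, the notebook order above could be scripted rather than run by hand. The sketch below builds one `jupyter nbconvert --execute` command per notebook. It assumes that `nbconvert` is installed (the source does not say whether `requirements.txt` includes it) and that the notebooks run without interactive steps, which may not hold for notebooks authored in Colab. `build_commands` and `run_all` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

NOTEBOOK_DIR = Path("scripts/data_processing")

# Order copied from the list above; file names are kept exactly as given.
NOTEBOOKS = [
    "fakenewsnet_preprocessing.ipynb",
    "weibo_preprocessing.ipynb",
    "reddit_scraper.ipynb",
    "article_image_scraper.ipynb",
    "final_dataset_preperation.ipynb",
]


def build_commands(notebook_dir: Path = NOTEBOOK_DIR) -> list[list[str]]:
    """Return one nbconvert command per notebook, in execution order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         str(notebook_dir / name)]
        for name in NOTEBOOKS
    ]


def run_all(dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing notebook.
            subprocess.run(cmd, check=True)
```

Calling `run_all()` only prints the commands; `run_all(dry_run=False)` executes them from the repository root and overwrites each notebook with its executed version.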

Running Feature Engineering

After setting up the datasets, run the feature engineering notebook:

```
jupyter notebook scripts/feature_engineering/feature_engineering.ipynb
```

Training and Evaluating Models

Run the model training and evaluation notebook:

```
jupyter notebook scripts/model/model_train_eval.ipynb
```

Workflow

Results

The evaluation metrics of our multimodal model on the test set (confusion matrix, precision, recall, F1 score, and ROC-AUC curve) are reported in scripts/model/model_train_eval.ipynb.
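
For readers unfamiliar with how precision, recall, and F1 relate to a confusion matrix, the following sketch computes them from scratch. The labels are toy values chosen for illustration and are not this project's results; the notebook may compute its metrics differently (for example with a library).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1; return 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels (1 = fake, 0 = real); NOT the project's data.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, tn)          # 3 1 1 3
print(precision, recall, f1)   # 0.75 0.75 0.75
```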
