🚫 Toxic Comment Classification using BERT

This project classifies toxic and sarcastic comments using a fine-tuned BERT model, reaching over 95% validation accuracy. It incorporates unlabeled data from the Jigsaw Toxic Comment Classification Challenge via pseudo-labeling and uses a custom API wrapper named isomina for training orchestration and evaluation.


🧠 Model

  • Base: bert-base-uncased
  • Fine-tuned on the labeled and pseudo-labeled Jigsaw dataset
  • Multi-label classification (toxic, severe_toxic, obscene, threat, insult, identity_hate)
  • Trained with the custom isomina API wrapper for streamlined experimentation and deployment
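Because the classification is multi-label rather than multi-class, each of the six labels gets an independent sigmoid instead of a shared softmax, so one comment can carry several labels at once. A minimal, dependency-free sketch of that decoding step (the 0.5 threshold is an assumed default, not taken from the repository):

```python
import math

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_multilabel(logits, threshold=0.5):
    """Map the six raw logits from the classification head to label names.

    Each label is scored independently, so a comment can be, e.g.,
    both 'toxic' and 'insult' at the same time."""
    probs = [sigmoid(z) for z in logits]
    return [name for name, p in zip(LABELS, probs) if p >= threshold]

# Example: strongly toxic and insulting, nothing else
print(decode_multilabel([2.1, -3.0, -1.2, -4.0, 1.5, -2.8]))
# → ['toxic', 'insult']
```

In a real pipeline these logits would come from the fine-tuned bert-base-uncased model; here they are hard-coded for illustration.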

📊 Output Examples

Visualization of predictions: output1.png, output2.png

  • In output2.png, a sarcastic message was correctly identified.

🚀 Features

  • Over 95% validation accuracy
  • Semi-supervised training using pseudo-labeling
  • BERT fine-tuning pipeline
  • Easy experimentation with the isomina API
  • Robust evaluation with ROC AUC, F1, and accuracy metrics
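The pseudo-labeling step above can be sketched as a confidence filter: the current model scores unlabeled Jigsaw comments, and only comments it is confident about (every per-label probability near 0 or near 1) are added to the next training round. The confidence cutoff of 0.9 and the helper names below are illustrative assumptions, not taken from the repository:

```python
def select_pseudo_labeled(unlabeled, predict_proba, confidence=0.9):
    """Return (text, labels) pairs only for comments the model is sure about.

    A comment is kept if every per-label probability is either
    >= confidence (confident positive) or <= 1 - confidence (confident
    negative); ambiguous comments are excluded from the next round."""
    kept = []
    for text in unlabeled:
        probs = predict_proba(text)
        if all(p >= confidence or p <= 1 - confidence for p in probs):
            kept.append((text, [int(p >= confidence) for p in probs]))
    return kept

# Toy stand-in for the fine-tuned model's six per-label probabilities
def toy_proba(text):
    if "idiot" in text:
        return [0.97, 0.01, 0.03, 0.01, 0.95, 0.02]  # confident: toxic + insult
    if "maybe" in text:
        return [0.50, 0.02, 0.02, 0.02, 0.40, 0.02]  # ambiguous: discarded
    return [0.02, 0.01, 0.01, 0.01, 0.03, 0.01]      # confident: clean

batch = ["you idiot", "have a nice day", "maybe borderline?"]
print(select_pseudo_labeled(batch, toy_proba))
# → [('you idiot', [1, 0, 0, 0, 1, 0]), ('have a nice day', [0, 0, 0, 0, 0, 0])]
```

Note that confidently clean comments are kept with all-zero labels; only genuinely ambiguous predictions are thrown away.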


🔧 Installation & Setup

1. Clone the Repository

git clone https://github.com/Praveenkumar76/Toxic-Comment-Classification.git
cd Toxic-Comment-Classification

Requirements

  • transformers
  • pandas
  • numpy
  • scikit-learn
  • matplotlib
  • seaborn
  • tqdm
  • isomina # custom or private package; ensure it is accessible
  • torch>=1.10.0
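One way to install the public dependencies above (the virtual-environment step is an assumption, and the isomina install path is a placeholder since the package is custom or private):

```shell
# Create and activate an isolated environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate

# Install the public dependencies listed above
pip install "torch>=1.10.0" transformers pandas numpy scikit-learn matplotlib seaborn tqdm

# isomina is a custom/private package; install it from wherever it is
# hosted, e.g. a local checkout (path below is a placeholder):
# pip install ./isomina
```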
