This project classifies toxic and sarcastic comments with a fine-tuned BERT model, reaching over 95% validation accuracy. It combines labeled data from the Jigsaw Toxic Comment Classification Challenge with pseudo-labeled unlabeled data, and uses a custom API wrapper named `isomina` for training orchestration and evaluation.
- Base: `bert-base-uncased`, fine-tuned on the labeled and pseudo-labeled Jigsaw dataset
- Multi-label classification (`toxic`, `severe_toxic`, `obscene`, `threat`, `insult`, `identity_hate`)
- Trained with the custom `isomina` API wrapper for streamlined experimentation and deployment
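The multi-label setup above can be sketched with the Hugging Face `transformers` library (the `isomina` wrapper is not public, so this shows the equivalent plain-transformers calls; the model here is randomly initialized so the sketch runs offline, whereas the project fine-tunes weights loaded from `bert-base-uncased`):

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# problem_type switches the model to a BCE-with-logits loss, one sigmoid per
# label, instead of a softmax over classes.
config = BertConfig(
    num_labels=len(LABELS),
    problem_type="multi_label_classification",
)
model = BertForSequenceClassification(config)  # real runs use .from_pretrained(...)

input_ids = torch.randint(0, config.vocab_size, (1, 16))  # stand-in token ids
with torch.no_grad():
    logits = model(input_ids=input_ids).logits            # shape: (1, 6)
probs = torch.sigmoid(logits)                             # independent per-label probabilities
```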
Visualization of Predictions:

- In `output2.png`, we see that a sarcastic message was correctly identified.
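Predictions like the ones in the output images come from thresholding per-label sigmoid probabilities. A minimal illustration (the logit values below are made up; only the label names come from the Jigsaw task):

```python
import torch

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

logits = torch.tensor([[2.1, -3.0, 1.4, -4.2, 0.9, -3.8]])  # fabricated example logits
probs = torch.sigmoid(logits)[0]                            # per-label probabilities

# A label is predicted when its probability clears the 0.5 threshold.
predicted = [name for name, p in zip(LABELS, probs) if p > 0.5]
print(predicted)  # here: toxic, obscene, insult
```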
- 95% validation accuracy
- Semi-supervised training using pseudo-labeling
- BERT fine-tuning pipeline
- Easy experimentation with the `isomina` API
- Robust evaluation with ROC AUC, F1, and accuracy metrics
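The pseudo-labeling step can be sketched as follows: predict on unlabeled text, keep only examples where the model is confident about every label, and add those as hard labels to the training set. `model` and `tokenize` are stand-ins for the project's own components (assumed names):

```python
import torch

def pseudo_label(model, tokenize, unlabeled_texts, threshold=0.9):
    """Return (text, hard_labels) pairs the model is confident about."""
    model.eval()
    kept = []
    with torch.no_grad():
        for text in unlabeled_texts:
            probs = torch.sigmoid(model(**tokenize(text)).logits)[0]
            # Confident means each label is pushed near 0 or near 1.
            confident = (probs > threshold) | (probs < 1 - threshold)
            if bool(confident.all()):
                kept.append((text, (probs > 0.5).float()))  # binarize to hard labels
    return kept
```

The confident pairs are then mixed into the labeled Jigsaw data for further fine-tuning; the 0.9 threshold here is an illustrative choice, not the project's documented value.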
- Data:
  - Labeled: Jigsaw Toxic Comment Classification Challenge
  - Unlabeled: Jigsaw Unintended Bias dataset
- API: integrated with the custom `isomina` API for:
  - Data preprocessing
  - Model orchestration
  - Metrics visualization
  - Inference handling
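The ROC AUC, F1, and accuracy evaluation mentioned above can be reproduced with scikit-learn for multi-label outputs. The `y_true`/`y_prob` arrays below are toy values for illustration, not results from the actual model:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score, accuracy_score

# Toy ground truth and predicted probabilities for 4 comments x 3 labels.
y_true = np.array([[1, 0, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]])
y_prob = np.array([[0.9, 0.2, 0.8], [0.1, 0.3, 0.2], [0.7, 0.6, 0.4], [0.2, 0.1, 0.9]])
y_pred = (y_prob > 0.5).astype(int)

roc_auc = roc_auc_score(y_true, y_prob, average="macro")  # per-label AUC, averaged
f1 = f1_score(y_true, y_pred, average="macro")            # per-label F1, averaged
subset_acc = accuracy_score(y_true, y_pred)               # exact-match (subset) accuracy
```

Note that for multi-label problems `accuracy_score` computes subset accuracy (every label must match), which is stricter than per-label accuracy.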
```bash
git clone https://github.com/yourusername/toxic-comment-bert.git
cd toxic-comment-bert
```

- transformers
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
- tqdm
- isomina # Custom or private package, ensure it's accessible
- torch>=1.10.0