LittleRabbitHole/ToxicCommentsClassification

Toxic Comments

Project overview

In most online communities, social interaction and discussion are the core mechanisms through which users communicate, share information, and exchange opinions on diverse topics. However, the anonymity these communities afford has led to an increase in misbehavior such as abuse and harassment, the spread of propaganda, and hate speech. Such misbehavior degrades users' online experience and harms the health of the online environment. Using 159,571 human-labelled comments from Wikipedia discussion pages, this project aims to address the problem by providing a tool that detects toxic social interactions accurately and effectively. Using enriched word-embedding feature sets and several machine learning techniques, the proposed model achieves both high overall accuracy and a high F1 score in distinguishing toxic comments from ordinary ones.
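The overview reports both accuracy and F1. A minimal sketch of why the two are worth reporting together is given below: when toxic comments are a minority, a classifier can score high accuracy while missing every toxic comment. The function name `accuracy_and_f1` and the toy labels are illustrative assumptions, not code or data from this repository.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as the toxic class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall on the toxic class.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Made-up toy labels: 10 comments, only one of which is toxic (label 1).
y_true = [0] * 9 + [1]
all_clean = [0] * 10  # a classifier that never flags anything

acc, f1 = accuracy_and_f1(y_true, all_clean)
print(acc, f1)
```

In this toy case the "never flag anything" classifier gets 9 of 10 comments right, yet its F1 on the toxic class is zero because it has no true positives.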

Results:

Presentation slides: https://github.com/LittleRabbitHole/ToxicCommentsClassification/blob/master/Results.pdf

Detection with word level features: DataExplorationPrediction_Features.ipynb

Detection with word embedding features: DataExplorationPrediction_w2v.ipynb

Word embedding: w2v_train.py
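The contents of the notebooks and of w2v_train.py are not shown here, so the following is only a sketch of one common way to turn word embeddings into a fixed-length feature vector for a comment: averaging the vectors of its known words. The repository may do this differently. The names `comment_vector` and `toy_embeddings`, and the two-dimensional vectors, are hypothetical and chosen purely for illustration.

```python
def comment_vector(tokens, embeddings, dim):
    """Average the embedding vectors of the tokens found in `embeddings`.

    Returns a zero vector of length `dim` when no token is known.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical 2-dimensional embeddings; trained vectors would be much larger.
toy_embeddings = {
    "you": [1.0, 0.0],
    "idiot": [0.0, 1.0],
    "thanks": [1.0, 1.0],
}

vec = comment_vector("you idiot".split(), toy_embeddings, dim=2)
print(vec)
```

The resulting vector can then be passed as a feature row to any standard classifier. Averaging discards word order, which is a known trade-off of this approach.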
