For most online communities, social interaction and discussion are the core mechanisms through which users communicate, share information, and exchange opinions on diverse topics. However, the anonymity afforded by such communities has led to an increase in misbehavior, such as abuse and harassment, the spread of propaganda, and hate speech. Such misbehavior degrades users' online experience and harms the health of the online environment. Using 159,571 human-labelled comments from Wikipedia discussion pages, this project aims to address the problem by providing a technical tool that detects toxic social interactions accurately and efficiently. Using an enriched set of word embedding features and several machine learning techniques, the proposed model achieves both a high overall accuracy and a high F1 score in distinguishing toxic comments from ordinary ones.
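The reason for reporting F1 alongside accuracy can be illustrated with a small sketch. It assumes binary labels where 1 marks a toxic comment, and it assumes toxic comments are the minority class; the function name `accuracy_and_f1` and the toy labels are hypothetical and are not taken from the notebooks.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, where 1 marks a toxic comment."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # toxic, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # ordinary, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # toxic, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # ordinary, passed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Illustrative toy data (not project results): 9 ordinary comments, 1 toxic.
y_true = [0] * 9 + [1]
always_ordinary = [0] * 10  # a "classifier" that never flags anything

print(accuracy_and_f1(y_true, always_ordinary))
```

On this toy data, a classifier that never flags a comment still scores 90% accuracy, but its F1 is 0 because it finds no toxic comments. That gap is why a high F1 score is a stronger claim than high accuracy alone.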
Presentation slides: [https://github.com/LittleRabbitHole/ToxicCommentsClassification/blob/master/Results.pdf]
Detection with word level features: DataExplorationPrediction_Features.ipynb
Detection with word embedding features: DataExplorationPrediction_w2v.ipynb
Word embedding: w2v_train.py
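
One common way to turn word embeddings into a fixed-length feature vector for a comment is to average the vectors of its words. The sketch below shows that idea only; whether the notebooks use averaging is an assumption, and the `comment_vector` helper and the two-dimensional toy embeddings are hypothetical stand-ins for vectors that a script such as w2v_train.py might produce.

```python
def comment_vector(tokens, embeddings, dim):
    """Average the embeddings of known tokens; unknown tokens are skipped."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        # No known words: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical toy embeddings; real ones would have many more dimensions.
toy_embeddings = {
    "you": [1.0, 0.0],
    "idiot": [0.0, 1.0],
}

features = comment_vector(["you", "idiot", "xyz"], toy_embeddings, dim=2)
print(features)
```

The resulting vector has the same length for every comment, so it can be passed to a standard classifier regardless of comment length.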