Toxic-comment-classification

Built with TensorFlow, Keras, NumPy, Matplotlib, Jupyter Notebook and Python.

This repository consists of experimental classification models for text toxicity identification.


ABOUT THE DATA SET

The Jigsaw Toxic Comments dataset was used to train the models. It has 159,571 entries and six target labels: toxic, severe_toxic, obscene, threat, insult & identity_hate.

Of these comments, 9,865 carry more than one label.
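The multi-label count can be reproduced by summing the six target columns per row and counting rows where the sum exceeds one. A minimal sketch with pandas; the tiny DataFrame below is a synthetic stand-in for the real CSV, whose path would depend on your copy of the dataset:

```python
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# df = pd.read_csv("train.csv")  # replace the stand-in below with the real data
df = pd.DataFrame({  # synthetic rows with the same six columns
    "toxic":         [1, 1, 0, 0],
    "severe_toxic":  [0, 1, 0, 0],
    "obscene":       [1, 0, 0, 0],
    "threat":        [0, 0, 0, 0],
    "insult":        [0, 1, 0, 0],
    "identity_hate": [0, 0, 0, 0],
})

labels_per_row = df[LABELS].sum(axis=1)          # how many labels each comment has
multi_labeled = int((labels_per_row > 1).sum())  # comments with 2+ labels
print(multi_labeled)
```

Run on the full training CSV, the same count should come out to 9,865.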


VANILLA-RNN based TREE MODEL

An RNN-based NLP model was developed to pick up the sentiment of a comment and label it with the six toxicity classes.

(figure: plot_model diagram of the RNN model)

The above image was produced using the plot_model function from TensorFlow. The diagram shows that the data first passes through the text_vectorization layer, then the embedding layer, and then the vanilla-RNN layer. Dropout layers follow to reduce overfitting, and finally the features flow through six separate sigmoid layers that produce six independent outputs. This branching architecture was built with the Keras Functional API.
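The branching architecture described above can be sketched with the Functional API as follows. The hyperparameters (vocabulary size, sequence length, embedding dimension, recurrent units, dropout rate) are illustrative assumptions, not the repository's actual values:

```python
import tensorflow as tf
from tensorflow.keras import layers

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Hyperparameters here are illustrative guesses.
vectorize = layers.TextVectorization(max_tokens=20000, output_sequence_length=200)
# vectorize.adapt(x_train)  # fit the vocabulary on the training text first

inputs = tf.keras.Input(shape=(1,), dtype=tf.string)
x = vectorize(inputs)                         # raw string -> token ids
x = layers.Embedding(input_dim=20000, output_dim=64)(x)
x = layers.SimpleRNN(32)(x)                   # the "vanilla" RNN layer
x = layers.Dropout(0.3)(x)                    # regularization
# One independent sigmoid head per label.
outputs = [layers.Dense(1, activation="sigmoid", name=name)(x) for name in LABELS]

model_rnn = tf.keras.Model(inputs=inputs, outputs=outputs)
```

Because each head has its own sigmoid, every label is predicted independently, which is what makes this a multi-label (rather than multi-class softmax) classifier.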


model_rnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
                  loss=['binary_crossentropy'] * 6,
                  metrics=['accuracy'])
model_rnn.fit(x=x_train,
              y=[y_train.toxic, y_train.severe_toxic, y_train.obscene,
                 y_train.threat, y_train.insult, y_train.identity_hate],
              batch_size=32, epochs=15,
              validation_data=(x_val, [y_val.toxic, y_val.severe_toxic, y_val.obscene,
                                       y_val.threat, y_val.insult, y_val.identity_hate]))

The average loss across all six outputs is 0.0845.

The average accuracy across all six outputs is 0.967.

LSTM based TREE MODEL

An LSTM-based NLP model was developed to pick up the sentiment of a comment and label it with the six toxicity classes.

(figure: plot_model diagram of the LSTM model)

The above image was produced using the plot_model function from TensorFlow. The diagram shows that the data first passes through the text_vectorization layer, then the embedding layer, and then the LSTM layer. Dropout layers follow to reduce overfitting, and finally the features flow through six separate sigmoid layers that produce six independent outputs. This branching architecture was built with the Keras Functional API.
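The LSTM variant differs from the RNN model only in the recurrent layer. A sketch assuming pre-vectorized token ids as input; all sizes are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

inputs = tf.keras.Input(shape=(200,), dtype=tf.int64)   # token ids from vectorization
x = layers.Embedding(input_dim=20000, output_dim=64)(inputs)
x = layers.LSTM(32)(x)                                  # LSTM replaces SimpleRNN
x = layers.Dropout(0.3)(x)
outputs = [layers.Dense(1, activation="sigmoid", name=n)(x) for n in LABELS]

model_LSTM = tf.keras.Model(inputs=inputs, outputs=outputs)
```

The gating in the LSTM cell helps the model retain context over longer comments than a vanilla RNN, at the cost of more parameters per unit.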


model_LSTM.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
                   loss=['binary_crossentropy'] * 6,
                   metrics=['accuracy'])
model_LSTM.fit(x=x_train,
               y=[y_train.toxic, y_train.severe_toxic, y_train.obscene,
                  y_train.threat, y_train.insult, y_train.identity_hate],
               batch_size=512, epochs=30,
               validation_data=(x_val, [y_val.toxic, y_val.severe_toxic, y_val.obscene,
                                        y_val.threat, y_val.insult, y_val.identity_hate]))

The average loss across all six outputs is 0.078.

The average accuracy across all six outputs is 0.964.
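The per-output averages reported above can be derived from the list that Keras's evaluate returns for a six-output model: the total loss first, then the six per-output losses, then the six per-output accuracies. A sketch with made-up numbers standing in for a real evaluate result:

```python
# results = model_LSTM.evaluate(x_val, [...])  # on the real model
# Illustrative stand-in values, in evaluate's return order:
results = [0.55,                                  # total (summed) loss
           0.10, 0.05, 0.09, 0.06, 0.11, 0.07,   # per-output losses
           0.96, 0.97, 0.96, 0.98, 0.95, 0.97]   # per-output accuracies

losses = results[1:7]
accs = results[7:13]
avg_loss = sum(losses) / len(losses)
avg_acc = sum(accs) / len(accs)
print(avg_loss, avg_acc)
```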
