Toxic-comment-classification

This repository contains experimental classification models for identifying toxicity in text.

ABOUT THE DATA SET

The Jigsaw Toxic Comments data set was used to train the models. The data set has 159571 entries with six classes (targets): toxic, severe_toxic, obscene, threat, insult, and identity_hate.

The total number of multi-labeled comments was 9865.
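As a rough illustration of how such a count could be obtained, the sketch below counts rows that have more than one positive label. It assumes that "multi-labeled" means more than one of the six targets is set, and that the CSV has one 0/1 column per label; the toy rows and the helper name count_multi_labeled are made up for the example and are not taken from the repository.

```python
import csv
import io

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Toy stand-in for the training CSV (assumed layout: one 0/1 column per label).
toy_csv = io.StringIO(
    "id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    "a1,first toy comment,0,0,0,0,0,0\n"
    "a2,second toy comment,1,0,0,0,0,0\n"
    "a3,third toy comment,1,0,1,0,1,0\n"
)

def count_multi_labeled(rows):
    """Count rows in which more than one of the six labels is set."""
    return sum(1 for row in rows if sum(int(row[label]) for label in LABELS) > 1)

multi = count_multi_labeled(csv.DictReader(toy_csv))
print(multi)
```

Only the third toy row has more than one label set, so the count here is 1.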

VANILLA-RNN based TREE MODEL

An RNN based NLP model was developed to pick up the sentiment of a comment and label it with the six labels.

The model diagram was produced using the plot_model function from TensorFlow. It shows that the data first goes through the text_vectorization layer, then the embedding layer, and then the vanilla RNN layer. Dropout is then applied for regularization, and finally the features go through 6 separate sigmoid layers, which produce 6 independent outputs. This architecture was built with the Keras Functional API.
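The layer order described above can be sketched with the Keras Functional API roughly as follows. This is a minimal sketch, not the repository's actual model code: the helper build_model, the vocabulary size, sequence length, embedding size, unit count, and dropout rate are all assumptions chosen for illustration, and the vectorizer is adapted on two toy strings only so that the example is self-contained.

```python
import tensorflow as tf

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def build_model(recurrent_layer, vectorizer, vocab_size=20000, embed_dim=64):
    """Text -> vectorization -> embedding -> recurrent layer -> dropout -> 6 sigmoid heads."""
    inputs = tf.keras.Input(shape=(1,), dtype=tf.string)
    x = vectorizer(inputs)                                   # raw strings to token ids
    x = tf.keras.layers.Embedding(vocab_size, embed_dim)(x)  # token ids to dense vectors
    x = recurrent_layer(x)                                   # sequence to a single feature vector
    x = tf.keras.layers.Dropout(0.5)(x)                      # regularization (rate is an assumption)
    # One independent sigmoid output per label.
    outputs = [
        tf.keras.layers.Dense(1, activation="sigmoid", name=label)(x)
        for label in LABELS
    ]
    return tf.keras.Model(inputs=inputs, outputs=outputs)

# Hypothetical vectorizer settings; the real vocabulary would come from the training comments.
vectorizer = tf.keras.layers.TextVectorization(max_tokens=20000, output_sequence_length=100)
vectorizer.adapt(tf.constant(["a toy comment", "another toy comment"]))

model_rnn = build_model(tf.keras.layers.SimpleRNN(32), vectorizer)
```

Passing the recurrent layer in as an argument means the same helper could build the LSTM variant by swapping in tf.keras.layers.LSTM(32).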

model_rnn.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
                  loss=['binary_crossentropy'] * 6,  # one loss per output
                  metrics=['accuracy'])

model_rnn.fit(x=x_train,
              y=[y_train.toxic, y_train.severe_toxic, y_train.obscene,
                 y_train.threat, y_train.insult, y_train.identity_hate],
              batch_size=32, epochs=15,
              validation_data=(x_val, [y_val.toxic, y_val.severe_toxic, y_val.obscene,
                                       y_val.threat, y_val.insult, y_val.identity_hate]))

The average loss of all 6 outputs is 0.0845

The average accuracy from all 6 outputs is 0.967
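For reference, averaged figures like these can be computed by evaluating each output separately and taking a plain mean. The sketch below does this in pure Python on made-up toy values for two of the labels; whether the numbers reported here were averaged in exactly this way is an assumption, and the 0.5 decision threshold is also assumed.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over one output's predictions."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def binary_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    hits = sum(int((p > threshold) == bool(t)) for t, p in zip(y_true, y_prob))
    return hits / len(y_true)

# Toy (true labels, predicted probabilities) per output; not real results.
toy_outputs = {
    "toxic": ([1, 0, 0, 1], [0.9, 0.2, 0.1, 0.4]),
    "insult": ([0, 0, 1, 0], [0.1, 0.3, 0.8, 0.2]),
}

losses = [binary_crossentropy(t, p) for t, p in toy_outputs.values()]
accuracies = [binary_accuracy(t, p) for t, p in toy_outputs.values()]

avg_loss = sum(losses) / len(losses)
avg_accuracy = sum(accuracies) / len(accuracies)
print(avg_loss, avg_accuracy)
```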

LSTM based TREE MODEL

An LSTM based NLP model was developed to pick up the sentiment of a comment and label it with the six labels.

The model diagram was produced using the plot_model function from TensorFlow. It shows that the data first goes through the text_vectorization layer, then the embedding layer, and then the LSTM layer. Dropout is then applied for regularization, and finally the features go through 6 separate sigmoid layers, which produce 6 independent outputs. This architecture was built with the Keras Functional API.
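Because the six sigmoid outputs are independent, a comment can receive any combination of labels. A minimal sketch of turning six output probabilities into label names is shown below; the helper decode_predictions, the 0.5 threshold, and the example probabilities are assumptions for illustration and do not come from the repository.

```python
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def decode_predictions(probabilities, threshold=0.5):
    """Return the labels whose sigmoid output reaches the threshold."""
    return [label for label, p in zip(LABELS, probabilities) if p >= threshold]

# Made-up probabilities, one per output, in the order of LABELS.
predicted = decode_predictions([0.91, 0.08, 0.77, 0.02, 0.64, 0.05])
print(predicted)  # ['toxic', 'obscene', 'insult']
```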

model_LSTM.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
                   loss=['binary_crossentropy'] * 6,  # one loss per output
                   metrics=['accuracy'])

model_LSTM.fit(x=x_train,
               y=[y_train.toxic, y_train.severe_toxic, y_train.obscene,
                  y_train.threat, y_train.insult, y_train.identity_hate],
               batch_size=512, epochs=30,
               validation_data=(x_val, [y_val.toxic, y_val.severe_toxic, y_val.obscene,
                                        y_val.threat, y_val.insult, y_val.identity_hate]))

The average loss of all 6 outputs is 0.078

The average accuracy from all 6 outputs is 0.964
