Original article was published on Artificial Intelligence on Medium
Using PyTorch for Toxic Content Monitoring with FastText, FastAPI and Docker
In recent years, tools for studying online behavior have grown in popularity in both industry and academia. One area of active study is negative online behavior, such as comments offensive enough to drive participants out of a conversation. Prior models were error-prone and did not let stakeholders choose the type of toxicity they were interested in finding; for example, some models monitor profanity but not other types of offensive behavior. This article presents a multi-headed model capable of detecting several types of toxicity, including threats, obscenity, insults and identity-based hate. The dataset used to train the model comes from Wikipedia’s talk page edits. The following graphic breaks the dataset down into the categories our model will be predicting.
The final model produced an F1 score of 0.753 and an ROC-AUC score of 0.987. I built it as a bidirectional LSTM + GRU neural network in PyTorch with FastText vectorization, served it with FastAPI and deployed it as a Docker image.
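As a rough illustration of the architecture named above, here is a minimal PyTorch sketch. It is not the exact network behind the reported scores: the class name `ToxicityNet`, the layer sizes, the pooling step and the 300-dimensional embedding width are all assumptions made for the example, and in practice the embedding matrix would be filled with pretrained FastText vectors rather than random values.

```python
import torch
import torch.nn as nn


class ToxicityNet(nn.Module):
    """Bidirectional LSTM followed by a bidirectional GRU, one logit per label."""

    def __init__(self, vocab_size, embed_dim=300, hidden=64, num_labels=6,
                 pretrained=None):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        if pretrained is not None:
            # Assumed: a (vocab_size, embed_dim) tensor of FastText vectors.
            self.embedding.weight.data.copy_(pretrained)
            self.embedding.weight.requires_grad = False
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.gru = nn.GRU(2 * hidden, hidden, batch_first=True,
                          bidirectional=True)
        # Average and max pooling are concatenated, hence 4 * hidden inputs.
        self.out = nn.Linear(4 * hidden, num_labels)

    def forward(self, token_ids):
        x = self.embedding(token_ids)      # (batch, seq, embed_dim)
        x, _ = self.lstm(x)                # (batch, seq, 2 * hidden)
        x, _ = self.gru(x)                 # (batch, seq, 2 * hidden)
        avg_pool = x.mean(dim=1)
        max_pool, _ = x.max(dim=1)
        return self.out(torch.cat([avg_pool, max_pool], dim=1))


torch.manual_seed(0)
model = ToxicityNet(vocab_size=1000)
batch = torch.randint(1, 1000, (4, 20))    # 4 comments, 20 token ids each
logits = model(batch)
probs = torch.sigmoid(logits)              # independent probability per label
loss_fn = nn.BCEWithLogitsLoss()           # usual loss for multi-label targets
loss = loss_fn(logits, torch.zeros(4, 6))
```

Each of the six outputs is passed through its own sigmoid, so a single comment can be flagged for several categories at once. That is what makes the model multi-headed rather than a single-choice classifier.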
The first step in building this model is data preprocessing. This includes creating multiple text files, each listing the words most associated with a specific type of behavior, which a comment is then assigned to. The types are toxic, severely toxic, obscene, threatening, insulting and identity hate. Preprocessing was performed in three steps. First, a dictionary is built with all of the words that fall into our categories. Second, a function cleans the text by removing unwanted characters, user identification and non-printable characters that can interfere with the model’s ability to make accurate predictions; it returns a cleaned version of the string. A word-splitting function then accounts for words hidden within normal text, and a tokenizing function replaces look-alike characters with letters and adds spaces around punctuation marks. Finally, a function applies a variety of normalization steps, such as separating punctuation marks and de-censoring, before splitting the words within a sentence into a list of words. The return value is a clean version of the text, ready to be input into our model.
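The cleaning and normalization steps above can be sketched roughly as follows. This is a simplified illustration, not the article's actual code: the function names, the regular expressions for user links and IP addresses, and the small look-alike tables are all hypothetical, and the dictionary-based word-splitting step is left out.

```python
import re

# Hypothetical look-alike tables used for "de-censoring".
SYMBOLS = str.maketrans({"@": "a", "$": "s"})
DIGITS = {"0": "o", "1": "i", "3": "e"}


def clean_text(text):
    """Remove user identification and non-printable characters."""
    text = re.sub(r"\[\[User:[^\]]*\]\]", " ", text)          # wiki user links
    text = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", " ", text)  # IPv4 addresses
    text = re.sub(r"\s+", " ", text)                          # collapse whitespace
    text = "".join(ch for ch in text if ch.isprintable())
    return text.strip()


def decensor(token):
    """Replace look-alike characters with letters in word-like tokens."""
    if not any(ch.isalpha() for ch in token):
        return token                      # leave punctuation and numbers alone
    token = token.translate(SYMBOLS)
    # Only swap digits that sit between two letters, e.g. "idi0t".
    return re.sub(r"(?<=[a-z])[013](?=[a-z])",
                  lambda m: DIGITS[m.group()], token)


def normalize(text):
    """Clean, lowercase, space out punctuation, de-censor and split into words."""
    text = clean_text(text).lower()
    text = re.sub(r'([.,!?;:()"])', r" \1 ", text)
    return [decensor(token) for token in text.split()]


tokens = normalize("[[User:Foo]] You are $tupid, an idi0t!\x00 Signed 192.168.0.1")
print(tokens)
# ['you', 'are', 'stupid', ',', 'an', 'idiot', '!', 'signed']
```

The look-alike tables are deliberately tiny and will mis-handle some ordinary tokens, such as email addresses, so a real pipeline would need a more careful mapping.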
The model uses PyTorch to build a bidirectional LSTM + GRU neural network in combination with FastText vectorization. The final model, and the several other models that did not make the final cut, were evaluated with F1 and ROC-AUC scores. The ROC curve plots the true positive rate against the false positive rate across classification thresholds, and the AUC (the area under that curve) measures separability, in this case how well the model distinguishes toxic from non-toxic comments. The following graphic shows all of the models that were built in an attempt to determine which approach would produce the best results.
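To make the two metrics concrete, here is a small pure-Python sketch for a single label. The toy labels and scores are invented for illustration and have nothing to do with the reported results. The AUC is computed through its ranking interpretation: the probability that a randomly chosen toxic comment receives a higher score than a randomly chosen non-toxic one. In practice a library such as scikit-learn would normally be used instead.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = toxic, 0 = non-toxic, scores are predicted probabilities.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # fixed 0.5 threshold

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(f1, 3), round(auc, 3))
# 0.857 0.889
```

F1 depends on the threshold chosen to turn scores into predictions, while ROC-AUC depends only on how the scores rank the comments. That difference is one reason the two numbers for the same model can be far apart.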