Jigsaw Unintended Bias in Toxicity Classification: A Kaggle Case-Study

Original article can be found here (source): Deep Learning on Medium

4 Understanding the data:

4.1 About Data:

Download the data files from here.

The data includes the following:

  • train.csv: the training set, which includes comments, toxicity labels, and subgroups.
  • test.csv: the test set, which contains comment texts but no toxicity labels or subgroups.
  • sample_submission.csv: a sample submission file in the correct format.

The text of each comment is found in the comment_text column. Each comment in the train set has a toxicity label (target), and models should predict the target toxicity for the test data.

Although there are many identity columns in the train data, only a few are required: male, female, homosexual_gay_or_lesbian, christian, jewish, muslim, black, white, psychiatric_or_mental_illness. These identities will help us in calculating the final metric.

4.2 Exploratory Data Analysis:

Let’s try to study and analyze our data and try to come up with some meaningful insights. EDA helps us in many ways:

  • We might find some patterns in the data which will help us build good models.
  • We might come up with some meaningful insights which would help us make important business decisions.

Let’s first load our CSV datafiles to pandas data frame:

import pandas as pd
train_df = pd.read_csv('train.csv.zip')
test_df = pd.read_csv('test.csv.zip')

4.2.1 Univariate Analysis of target feature:

This feature is the measure of toxicity for a comment text.

import matplotlib.pyplot as plt
import seaborn as sns

plt.title("Distribution of 'target' in the train set")
sns.distplot(train_df['target'], kde=True, hist=False, bins=120, label='target')
  • We can see that the target feature ranges between 0.0 and 1.0.
  • Most of the comments have a toxicity score in the range 0.0 to 0.2.

Let’s now look at the bar plot for each class.

import numpy as np
from collections import Counter

# assigning target >= 0.5 as toxic (1) and target < 0.5 as non-toxic (0)
data = Counter(np.where(train_df['target'] >= 0.5, 1, 0))

fig, axe = plt.subplots(figsize=(7, 5))
sns.barplot(['non-toxic', 'toxic'], [data[0], data[1]], orient="v", ax=axe)
plt.title("# of datapoints : Toxic vs Non-Toxic")
  • We can see that the data is heavily imbalanced: most of the comments are non-toxic and only a small fraction are toxic.
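The degree of imbalance can be quantified in a couple of lines. A minimal sketch on a synthetic target column (swap in train_df['target'] to get the real numbers; the toy values below are for illustration only):

```python
import numpy as np

# Toy toxicity scores standing in for train_df['target'] (illustration only)
target = np.array([0.0, 0.1, 0.0, 0.8, 0.05, 0.3, 0.9, 0.0, 0.2, 0.6])

# Binarize at the competition threshold of 0.5
labels = np.where(target >= 0.5, 1, 0)
toxic_frac = labels.mean()  # fraction of toxic comments
print(f"toxic: {toxic_frac:.0%}, non-toxic: {1 - toxic_frac:.0%}")
```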

4.2.2 Univariate Analysis of Auxiliary target features:

The data also has several additional toxicity-subtype attributes that are highly correlated with the target feature. These features are:

  • severe_toxicity , obscene , threat , insult , identity_attack , sexual_explicit
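Since the claim is that these subtype scores track target, one quick check is a correlation matrix. A sketch on toy data (on the real frame you would pass train_df[['target', 'insult', 'threat', ...]]; the values below are made up for illustration):

```python
import pandas as pd

# Toy scores standing in for the real columns (illustration only)
df = pd.DataFrame({
    'target': [0.0, 0.2, 0.5, 0.8, 1.0],
    'insult': [0.0, 0.1, 0.6, 0.7, 0.9],
    'threat': [0.0, 0.0, 0.1, 0.3, 0.2],
})
# Pearson correlation between target and each subtype
print(df.corr())
```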

Let’s try to visualize them:

def plot_features_distribution(features, title):
    """Plot a distribution plot for each feature in the input list."""
    plt.title(title)
    # loop through each feature and overlay its distribution plot
    for feature in features:
        sns.distplot(train_df.loc[~train_df[feature].isnull(), feature],
                     kde=True, hist=False, bins=120, label=feature)
    plt.xlabel("Toxicity Score")
    plt.legend()

features = ['severe_toxicity', 'obscene', 'identity_attack', 'insult', 'threat', 'sexual_explicit']
plot_features_distribution(features, "Distribution of additional toxicity features in the train set")
def plot_stack_bar(features_to_plot, ylabel_, tlabel):
    """Plot a stacked bar of toxic vs non-toxic counts for the given features."""
    # computing counts of toxic and non-toxic comments for every feature
    toxic = []
    non_toxic = []

    # loop through each input feature
    for feature in features_to_plot:
        # sample the points for the given feature, discarding all NaN points
        subgroup = train_df[["target", feature]][~train_df[feature].isnull()]
        # count toxic (target >= 0.5) vs non-toxic comments mentioning this feature
        subgroup_counts = (subgroup["target"][subgroup[feature] != 0] >= 0.5).value_counts()
        # append number of non-toxic points
        non_toxic.append(subgroup_counts.get(False, 0))
        # append number of toxic points
        toxic.append(subgroup_counts.get(True, 0))

    total_ft = len(features_to_plot)
    indx = np.arange(total_ft)
    width = 0.25

    p1 = plt.bar(indx, non_toxic, width)
    # bottom=non_toxic stacks this plot on top of the other
    p2 = plt.bar(indx, toxic, width, bottom=non_toxic)

    plt.xticks(indx, features_to_plot, rotation=30)
    plt.ylabel(ylabel_)
    plt.title(tlabel)
    plt.legend((p1[0], p2[0]), ('non-toxic', 'toxic'))

plot_stack_bar(features, "Comment Count", tlabel="# of Toxic + Non-toxic comments based on Auxiliary targets")
  • We can see that most of the toxic comments are made with the intention of insulting someone.

4.2.3 Analysis of Identity features:

A subset of comments has also been labeled with a variety of identity attributes. They can be grouped into five categories: race or ethnicity, gender, sexual orientation, religion and disability, as follows:

  • race or ethnicity: asian, black, jewish, latino, other_race_or_ethnicity, white
  • gender: female, male, transgender, other_gender
  • sexual orientation: bisexual, heterosexual, homosexual_gay_or_lesbian, other_sexual_orientation
  • religion: atheist, buddhist, christian, hindu, muslim, other_religion
  • disability: intellectual_or_learning_disability, other_disability, physical_disability, psychiatric_or_mental_illness
features = ['asian', 'black', 'jewish', 'latino', 'other_race_or_ethnicity', 'white']
plot_stack_bar(features, "Comment Count", "Number of comments based on race and ethnicity")

features = ['female', 'male', 'transgender', 'other_gender']
plot_stack_bar(features, "Comment Count", "Number of comments based on gender")

features = ['bisexual', 'heterosexual', 'homosexual_gay_or_lesbian', 'other_sexual_orientation']
plot_stack_bar(features, "Comment Count", "Number of comments based on sexual orientation")

features = ['atheist', 'buddhist', 'christian', 'hindu', 'muslim', 'other_religion']
plot_stack_bar(features, "Comment Count", "Number of comments based on religion")

features = ['intellectual_or_learning_disability', 'other_disability', 'physical_disability', 'psychiatric_or_mental_illness']
plot_stack_bar(features, "Comment Count", "Number of comments based on disability features")

We can derive the following conclusions from the above plots:

  • Most comments mention the ‘Christian’ religion, and it also receives the largest number of toxic comments.
  • Comments that mention ‘gay’ or ‘lesbian’ are more likely to be toxic.

4.2.4 Analysis of comment_text feature:

comment_len = train_df['comment_text'].apply(len)
sns.distplot(comment_len, color='red')
plt.title('Distribution of comment length')
plt.xlabel('Length of comment')
plt.ylabel('Comments count')
plt.xlim(0, 1200)
  • We have a bimodal distribution of comment length in characters.
  • The average comment length is 297 characters.
comment_len2 = [len(text.split(" ")) for text in train_df['comment_text'].values.tolist()]

plt.title('Distribution of number of words in a comment')
plt.xlabel('Number of words in comment')
plt.ylabel('Comments count')
plt.xlim(0, 250)
sns.distplot(comment_len2, color='g')
  • We have a clear unimodal, right-skewed distribution for the number of words in a comment: most comments are short, with a long tail of longer ones.
  • The average number of words in a comment is 52.
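These length statistics can be reproduced in a couple of lines. A self-contained sketch on toy comments (substitute train_df['comment_text'] to get the real averages; the strings below are ours, for illustration):

```python
import pandas as pd

# Toy comments standing in for train_df['comment_text'] (illustration only)
comments = pd.Series(["This is fine.", "Totally toxic comment here!", "ok"])

char_len = comments.apply(len)                 # length in characters
word_len = comments.str.split(" ").apply(len)  # length in words
print(char_len.mean(), word_len.mean())
```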

Next, we will sample 20,000 comments from both toxic and non-toxic comments and see the Wordcloud for the top 100 words.

# https://www.kaggle.com/gpreda/jigsaw-eda
from wordcloud import WordCloud

def show_wordcloud(data, max_words, title=None):
    """Show a word cloud of the most frequent words in the input comments."""
    wordcloud = WordCloud(max_words=max_words).generate(" ".join(data))

    fig = plt.figure(1, figsize=(8, 8))
    if title:
        fig.suptitle(title, fontsize=20)
    plt.imshow(wordcloud)
    plt.axis('off')

Wordcloud for Toxic comments:

show_wordcloud(train_df.loc[train_df['target'] >= 0.50]['comment_text'].sample(20000),
               max_words=100,
               title='Wordcloud of Toxic comments')
show_wordcloud(train_df.loc[train_df['target'] < 0.50]['comment_text'].sample(20000),
               max_words=100,
               title='Wordcloud of Non-Toxic comments')

4.3 Data Preprocessing:

Let’s have a look at some of the comment texts.

import random

idx = random.sample(range(0, len(train_df)), 5)
for i in idx:
    print(train_df['comment_text'].values[i])

It’s ironic to sterilize homosexuals. They won’t, by their OWN nature, have sex with the opposite sex. Thereby not contributing their genes to the pool. That is why homosexuality, scientifically speaking, is considered a fatal genetic mutation. (ps I’m still waiting for fallout on my mutant comment) haha
Natural Law is neither natural nor is it a law. It is a philosophical opinion. IMHO, it’s use in making an argument is invalid.

For what? (Snork)

So, for saying tax ‘Payers’ should stop (in perpetuity) supporting tax ‘Takers’ (aka welfare recipients)?
* I agree with Cory.
Or for not being willing to hold “in person” town halls where violence can happen-like with the alt left baseball shooting?
* I agree with Cory.
Jail? Laughable.

I support Cory. He doesn’t believe in wasting my hard earned tax dollars either. For that I thank him.
A non-issue.
murdering he says
did you murder a cheeseburger today hypocrite?

We see that our data points contain lots of punctuation, contractions, quotes, etc. So next we will clean them to increase their vocabulary coverage.

4.3.1 Handling Contractions:

# https://kite.com/python/docs/nltk.TreebankWordTokenizer
from nltk.tokenize.treebank import TreebankWordTokenizer

tokenizer = TreebankWordTokenizer()

def handle_contractions(x):
    x = tokenizer.tokenize(x)
    return x

The TreebankWordTokenizer tokenizer performs the following steps:

  • split standard contractions, e.g. don't -> do n't and they'll -> they 'll
  • treat most punctuation characters as separate tokens
  • split off commas and single quotes when followed by whitespace
  • separate periods that appear at the end of a line
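A quick check of those rules (this uses nltk's TreebankWordTokenizer directly; no corpus download is needed for it):

```python
from nltk.tokenize.treebank import TreebankWordTokenizer

tokenizer = TreebankWordTokenizer()
# contractions are split and punctuation becomes separate tokens
print(tokenizer.tokenize("They'll say it isn't toxic, right?"))
# ['They', "'ll", 'say', 'it', 'is', "n't", 'toxic', ',', 'right', '?']
```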

4.3.2 Handling punctuation marks: We will remove the unwanted punctuation marks.
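One way to do this, after tokenization, is to filter out punctuation tokens. A minimal sketch; the helper name remove_punctuation and the choice of which marks to keep are ours, not fixed by the article:

```python
import string

def remove_punctuation(tokens, keep="?!"):
    """Drop single-character punctuation tokens, except those in `keep`."""
    drop = set(string.punctuation) - set(keep)
    return [t for t in tokens if t not in drop]

print(remove_punctuation(['do', "n't", 'be', 'toxic', ',', 'ok', '?']))
# ['do', "n't", 'be', 'toxic', 'ok', '?']
```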

We are now done cleaning our data.

4.4 Adding weights to the data samples:

A subset of comments in the dataset has also been labeled with a variety of identity attributes, representing the identities that are mentioned in the comment. The following columns corresponding to identity attributes are included in the evaluation calculation. So, we will add identity information as weights to our data points.

IDENTITY_COLUMNS = ['male', 'female', 'homosexual_gay_or_lesbian', 'christian', 'jewish',
                    'muslim', 'black', 'white', 'psychiatric_or_mental_illness']

# base weight for every sample
weights = np.ones((len(train_df),)) / 4
# Subgroup: comment mentions at least one identity
weights += (train_df[IDENTITY_COLUMNS].fillna(0).values >= 0.5).sum(axis=1).astype(bool).astype(int) / 4
# Background Positive, Subgroup Negative
weights += (((train_df['target'].values >= 0.5).astype(bool).astype(int) +
             (train_df[IDENTITY_COLUMNS].fillna(0).values < 0.5).sum(axis=1).astype(bool).astype(int)) > 1).astype(bool).astype(int) / 4
# Background Negative, Subgroup Positive
weights += (((train_df['target'].values < 0.5).astype(bool).astype(int) +
             (train_df[IDENTITY_COLUMNS].fillna(0).values >= 0.5).sum(axis=1).astype(bool).astype(int)) > 1).astype(bool).astype(int) / 4

loss_weight = 1.0 / weights.mean()
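To see what this scheme actually produces, here is the same computation on a four-row toy frame (two identity columns only, chosen by us for illustration). Toxic comments that mention no identity land at 0.5, comments that mention an identity (toxic or not) land at 0.75, and non-toxic comments with no identity keep the base 0.25:

```python
import numpy as np
import pandas as pd

# Toy frame: rows are (toxic, no identity), (toxic, male),
# (non-toxic, no identity), (non-toxic, male)
df = pd.DataFrame({
    'target': [0.8, 0.8, 0.1, 0.1],
    'male':   [0.0, 1.0, 0.0, 1.0],
    'female': [0.0, 0.0, 0.0, 0.0],
})
ident = ['male', 'female']

weights = np.ones(len(df)) / 4
# +1/4 if the comment mentions any identity
weights += (df[ident].fillna(0).values >= 0.5).sum(axis=1).astype(bool).astype(int) / 4
# +1/4 for "Background Positive, Subgroup Negative" rows
weights += (((df['target'].values >= 0.5).astype(int) +
             (df[ident].fillna(0).values < 0.5).sum(axis=1).astype(bool).astype(int)) > 1).astype(int) / 4
# +1/4 for "Background Negative, Subgroup Positive" rows
weights += (((df['target'].values < 0.5).astype(int) +
             (df[ident].fillna(0).values >= 0.5).sum(axis=1).astype(bool).astype(int)) > 1).astype(int) / 4
print(weights)  # [0.5, 0.75, 0.25, 0.75]
```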

4.5 Featurization:

Deep learning and machine learning models cannot understand human language, so we need to convert our data into a mathematical form before we can feed it as input to our model.

Text tokenization is a method to vectorize a text corpus by turning each text into a sequence of integers (each integer is the index of a token in a dictionary). This can be done with a few lines of code:

from keras.preprocessing import text

tok = text.Tokenizer()
tok.fit_on_texts(list(x_train) + list(x_test))
x_train = tok.texts_to_sequences(x_train)
x_test = tok.texts_to_sequences(x_test)

Note that the sequences are not all the same length, because the comment texts vary in length. So we will pad the sequences to a common length, MAX_LEN. Sequences shorter than MAX_LEN are padded with a value (0 by default) at the start or end, and sequences longer than MAX_LEN are truncated so that they fit the desired length.

from keras.preprocessing import sequence

x_train = sequence.pad_sequences(x_train, maxlen=MAX_LEN)
x_test = sequence.pad_sequences(x_test, maxlen=MAX_LEN)
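To make the padding and truncation behaviour concrete, here is a pure-Python sketch that mimics what pad_sequences does with its default settings (pre-padding with 0 and pre-truncation); the helper name pad is ours, not a Keras API:

```python
def pad(seqs, maxlen, value=0):
    """Mimic keras pad_sequences defaults: pad/truncate at the start."""
    out = []
    for s in seqs:
        if len(s) >= maxlen:
            out.append(list(s[-maxlen:]))  # keep only the last maxlen tokens
        else:
            out.append([value] * (maxlen - len(s)) + list(s))  # left-pad
    return out

print(pad([[5, 6], [1, 2, 3, 4, 5]], maxlen=3))
# [[0, 5, 6], [3, 4, 5]]
```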

Now, we are ready with our data to train a Deep Learning Model.