Severstal: Steel Defect Detection

Source: Deep Learning on Medium

As you might have already guessed, yes! It is a combination of pipeline-3 and pipeline-4. I partially tried all of the above strategies and settled on this one because it gave me better results than the rest. It also takes less time at test time, because we filter out the non-defective images and send only the defective ones to the segmentation model. Let me give you a detailed view of this approach below.

Exploratory Data Analysis

Let us do some EDA to get to know the data better!

Binary Classification

As we can see from the distribution of defective and non-defective classes above, this is a well-balanced binary classification problem.

Multi label Classification


As we can see, this is a challenging problem: the data for our multi-label classification is imbalanced. Class-2 defective images are very few, class-1 defects are very numerous, and classes 3 and 4 are somewhat balanced.

Keeping the imbalance of the multi-label classes in mind, we plot the number of classes per image.

Number of Classes per Image

We can observe that most images contain only one or two defect classes, while images with three or four classes are almost nonexistent.
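The counts behind such a plot can be computed with pandas. This is only an illustration: the one-hot indicator columns and their names here are assumed, mirroring the columns list used later in the post.

```python
import pandas as pd

# Hypothetical one-hot defect indicator columns (names assumed,
# matching the `columns` list used later in the post).
df = pd.DataFrame({'class1': [1, 1, 0, 1],
                   'class2': [0, 1, 0, 0],
                   'class3': [0, 0, 1, 1],
                   'class4': [0, 0, 0, 0]})

# Number of defect classes present in each image
classes_per_image = df[['class1', 'class2', 'class3', 'class4']].sum(axis=1)
counts = classes_per_image.value_counts().sort_index()
# counts.plot(kind='bar') would reproduce the plot above
```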

Visualizing each type of defect in train data:

Let us see some images of each class. This visualization can be done easily by overlaying the given encoded-pixel masks on the train images; you can refer to the code here.

Class-1 Defect

The red-shaded regions in the images indicate class-1 defects. Class-1 images seem to have small defects and look almost similar to non-defective images.

Class-2 Defect

The green-shaded regions indicate class-2 defects, drawn using the encoded pixels provided in the train data. We can notice that these are similar to class-1 defects, so it is somewhat difficult to distinguish between the two.

Class-3 Defect

The blue and yellow pixels indicate class-3 defects. We can observe that class-3 images are worse in terms of defect severity compared to classes 1 and 2.

Class-4 Defect

The cyan pixels indicate class-4 defects. We can see that class-4 images are the most damaged. They can be classified and segmented easily because of their unique edges and defect patterns compared to the other classes. We conclude that classes 1 and 2 are similar and less defective, while classes 3 and 4 are less similar to each other but more defective, and hence easier to classify.

Segmentation model using Binary and Multi label Classification

As I have mentioned, I used pipeline-4 as my strategy; let's view it in detail!

Binary and Multi label Classification

Train-Cv split:

Our data is not time-series data, so we can split it into train and CV sets randomly. From Kaggle discussions and other notebooks I found that the train data is not very similar to the provided test data, so it is advisable to do at least some data augmentation. This does not overcome the problem completely, but it helps partially. Here is the code to split the data and augment it using the Keras data generator.

from sklearn.model_selection import train_test_split

columns = ['class1', 'class2', 'class3', 'class4']  # defect indicator columns

mtr_df, mval_df = train_test_split(mc, random_state=42, test_size=0.1825)
print('train_data shape:', mtr_df.shape, 'val_data:', mval_df.shape)
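To give a minimal, dependency-free flavor of the kind of augmentation meant here (the project itself uses the Keras ImageDataGenerator), random flips can be written in plain NumPy; this helper is a simplified stand-in, not the project's actual augmentation:

```python
import numpy as np

def random_flip(img, rng):
    """Randomly flip an image horizontally and/or vertically.
    A simplified stand-in for the Keras ImageDataGenerator augmentations."""
    if rng.random() < 0.5:
        img = img[:, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]  # vertical flip
    return img

rng = np.random.default_rng(0)
# the competition images are 256x1600; 128x800 matches the resized inputs below
augmented = random_flip(np.ones((128, 800, 3)), rng)
```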




Binary and Multi label classification Models

For both models I used the pre-trained InceptionResNetV2 model from Keras, with weights trained on ImageNet data.

# weights=None here; the ImageNet weights mentioned above are loaded separately
base = InceptionResNetV2(weights=None, input_shape=(299, 299, 3), include_top=False)
x = GlobalAveragePooling2D()(base.output)  # assumed pooling head between the base and the classifier
out = Dense(1, activation='sigmoid')(x)    # final layer: binary classifier
model = Model(inputs=base.input, outputs=out)


The multi-label classification model is loaded in the same way except for the output layer; since it is a four-label classifier, the output layer is out = Dense(4, activation='sigmoid')(x).

After training with the generator for a few epochs, we got good performance from both models.

Binary Model Performance

As we can see, our binary model reached an accuracy of 90% and a recall of 96% by the end of 17 epochs, which means it is doing well :).

Let us see how well our multi-label classifier works. I am describing both models together because they share the same pre-trained base, so the code is common rather than repeated.

Multi label Model Performance

Our multi-label model also performs well, with an accuracy of 95%.

Performance measurement models:

Why do we use precision and recall?

Recall measures how many of the actual positives our model captures by labeling them as positive (true positives), and it should be the model metric when there is a high cost associated with false negatives. In our case a false negative means classifying a defective image as non-defective, which would prevent that image from ever reaching the segmentation model, so we cannot allow our model to make false negatives.

For multi-label classification we used both precision and recall, because it is advisable to use both metrics for multi-label or multi-class classification.
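As a concrete illustration of these two metrics (plain NumPy here, separate from the Keras metric objects used in the models):

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = defective)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # defects caught
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false alarms
    fn = np.sum((y_pred == 0) & (y_true == 1))  # defects missed (the costly case)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# three defective images, one missed -> perfect precision, recall 2/3
p, r = precision_recall([1, 1, 1, 0], [1, 1, 0, 0])
```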

Since we saved our models for future use and they use metrics that are not built into Keras by default, they have to be loaded as follows:

# For multi-label classification: precision and recall are custom metrics
model_mul = load_model('/kaggle/input/multicaugg/multic_aug.h5',
                       custom_objects={'precision': precision, 'recall': recall})
# use only recall for binary classification

Test time Augmentation

For better performance we used test-time augmentation (TTA), which improved our results; we used TTA for both binary and multi-label classification. Here is a code snippet for simple TTA; you can learn more about TTA here.

tta_steps = 10
multi_class = []
for val in tqdm_notebook(test_gen_mul):
    batch_pred = []
    for i in range(tta_steps):
        preds = model_mul.predict_generator(
            augmentation_gen.flow(val, batch_size=bs, shuffle=False), steps=1)
        batch_pred.append(preds)
    # average the predictions over the augmented copies
    pred = np.mean(batch_pred, axis=0)
    multi_class.append(pred)

Segmentation Models(Mask prediction)

Data loading

It is necessary to use a different type of generator rather than the Keras generator, because we have to feed the EncodedPixels (our y data) into the generator for training, which is not possible with the Keras image generator. This can be done with a custom data generator based on the Stanford edu template.

class DataGenerator(keras.utils.Sequence):
    'Generates data for Keras'
    def __init__(self, list_IDs, df, target_df=None, mode='fit', base_path='',
                 batch_size=16, dim=(128, 800), preprocess=None, n_channels=3,
                 n_classes=1, random_state=2019, shuffle=False):
        self.dim = dim
        self.batch_size = batch_size
        self.df = df
        self.mode = mode
        self.preprocess = preprocess
        self.base_path = base_path
        self.target_df = target_df
        self.list_IDs = list_IDs
        self.n_channels = n_channels
        self.n_classes = n_classes
        self.shuffle = shuffle
        self.random_state = random_state
        self.on_epoch_end()  # build the initial index array

    def __len__(self):
        'Denotes the number of batches per epoch'
        return int(np.floor(len(self.list_IDs) / self.batch_size))

    def __getitem__(self, index):
        'Generate one batch of data'
        # Generate indexes of the batch
        indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        # Find list of IDs
        list_IDs_batch = [self.list_IDs[k] for k in indexes]

        X = self.__generate_X(list_IDs_batch)

        if self.mode == 'fit':
            y = self.__generate_y(list_IDs_batch)
            return X, y
        elif self.mode == 'predict':
            return X

        raise AttributeError('The mode parameter should be set to "fit" or "predict".')

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.indexes = np.arange(len(self.list_IDs))
        if self.shuffle:
            np.random.shuffle(self.indexes)

    def __generate_X(self, list_IDs_batch):
        'Generates data containing batch_size samples'
        # Initialization
        X = np.empty((self.batch_size, *self.dim, self.n_channels))

        # Generate data
        for i, ID in enumerate(list_IDs_batch):
            im_name = self.df['imageid'][ID]
            img_path = f"{self.base_path}/{im_name}"
            img = self.__load_rgb(img_path)
            # Resize to the target dimensions and store the sample
            img = cv2.resize(img, (800, 128))
            X[i,] = img

        if self.preprocess is not None:
            X = self.preprocess(X)

        return X

    def __generate_y(self, list_IDs_batch):
        y = np.empty((self.batch_size, *self.dim, self.n_classes), dtype=int)

        for i, ID in enumerate(list_IDs_batch):
            # Decode the RLE for this image and resize the mask to match X
            rles = self.df['EncodedPixels'][ID]
            masks = rle_to_mask(rles, 256, 1600)
            masks = cv2.resize(masks, (800, 128))
            y[i,] = np.expand_dims(masks, -1)

        y = (y > 0).astype(int)
        return y

    def __load_rgb(self, img_path):
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = img.astype(np.float32) / 255.
        return img

    def __load_grayscale(self, img_path):
        img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
        img = img.astype(np.float32) / 255.
        img = np.expand_dims(img, axis=-1)
        return img

RLE (Run Length Encoder)

To submit the predicted defective pixel regions, we need to convert them into RLEs; this encoding is required by Kaggle in order to reduce the submission file size. Here is a code snippet to do so; you can refer here for more details about converting pixels to RLE.

def mask_to_rle(mask):
    '''
    Convert a mask into RLE.

    Args:
        mask (numpy.array): binary mask where 1 = mask, 0 = background
    Returns:
        string: run-length encoding
    '''
    pixels = mask.T.flatten()
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

We also need to convert the RLEs provided in the train data into masks so they can be used as training targets, which is done as follows:

def rle_to_mask(rle_string, height, width):
    rows, cols = height, width
    img = np.zeros(rows * cols, dtype=np.uint8)
    if len(str(rle_string)) > 1:
        rle_numbers = [int(numstring) for numstring in rle_string.split(' ')]
        rle_pairs = np.array(rle_numbers).reshape(-1, 2)
        for index, length in rle_pairs:
            index -= 1
            img[index:index+length] = 255
    else:
        img = np.zeros(cols * rows)
    img = img.reshape(cols, rows)
    img = img.T
    return img
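To sanity-check that the two converters invert each other, here is a round trip on a tiny mask (both functions are repeated in compact form so the snippet is self-contained):

```python
import numpy as np

def mask_to_rle(mask):
    # column-major run-length encoding, as in the function above
    pixels = mask.T.flatten()
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return ' '.join(str(x) for x in runs)

def rle_to_mask(rle_string, height, width):
    # compact version of the decoder above
    img = np.zeros(height * width, dtype=np.uint8)
    if len(str(rle_string)) > 1:
        pairs = np.array([int(n) for n in rle_string.split(' ')]).reshape(-1, 2)
        for index, length in pairs:
            img[index - 1:index - 1 + length] = 255
    return img.reshape(width, height).T

# A tiny 2x2 mask: pixel (0,0) is background, the rest are defect.
mask = np.array([[0, 1],
                 [1, 1]], dtype=np.uint8)
rle = mask_to_rle(mask)          # column-major: run starts at pixel 2, length 3
print(rle)                       # '2 3'
decoded = rle_to_mask(rle, 2, 2)
```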

Segmentation Models

The output of the previous models (the defective images) is the input to the segmentation model. Segmentation models are very useful for predicting defect regions and gave us good results. There are many wonderful GitHub repositories of pre-trained segmentation models; the one I used is from qubvel, who published models in both Keras and PyTorch. All you need to do is pip install segmentation-models and then import.

from segmentation_models import Unet
model = Unet('resnet34')

and start training it like any other Keras model, with model.compile() followed by model.fit().


Training Model

We will build four segmentation models, one per defect class, so that each defective image (the output of the multi-label model) can be passed to the segmentation model of its class to predict the location of the defect area.
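This per-class routing can be sketched as follows. The helper and its names are hypothetical; seg_models would map class ids to the four trained Unet models, represented here by plain callables so the sketch is self-contained:

```python
def route_to_segmentation(image, class_probs, seg_models, threshold=0.5):
    """Send an image to the segmentation model of every defect class
    whose predicted probability exceeds the threshold (hypothetical helper)."""
    masks = {}
    for cls, p in enumerate(class_probs, start=1):
        if p >= threshold:
            masks[cls] = seg_models[cls](image)
    return masks

# toy stand-ins for the four trained models
seg_models = {c: (lambda img, c=c: f'mask-from-model-{c}') for c in (1, 2, 3, 4)}
masks = route_to_segmentation('img', [0.9, 0.1, 0.7, 0.2], seg_models)
# only classes 1 and 3 pass the 0.5 threshold
```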

# For class-1 defective images
pred_c1 = Unet('resnet50', input_shape=(128, 800, 3), classes=1, activation='sigmoid')
pred_c1.compile(optimizer='adam', loss='binary_crossentropy',
                metrics=[dice_coef, loss_dice_coef])
history = pred_c1.fit_generator(
    train_generator_c1,  # class-1 DataGenerator instance (name assumed; elided in the original)
    epochs=25)

Similarly, for classes 2, 3 and 4 we built and trained analogous models and saved them for future use. Let us look at the train-data performance results.

Defective Class-1 Train Results
Defective Class-2 Train Results
Defective Class-3 Train Results
Defective Class-4 Train Results

As we already discussed, and as can be observed from the train-CV results, classes 1 and 2 have a low dice coefficient while classes 3 and 4 have a good one, since classes 3 and 4 have more edges and larger defects.


The Dice coefficient can be used to compare the pixel-wise agreement between a predicted segmentation and its corresponding ground truth. The formula is given by:

Dice(X, Y) = 2 * |X ∩ Y| / (|X| + |Y|)

where X is the predicted set of pixels and Y is the ground truth. The Dice coefficient is defined to be 1 when both X and Y are empty. The leaderboard score is the mean of the Dice coefficients for each <ImageId, ClassId> pair in the test set.
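Written out in NumPy, with the empty-vs-empty convention from the definition above, the metric looks like:

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Pixel-wise Dice coefficient between two binary masks."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    if not pred.any() and not truth.any():
        return 1.0  # defined as 1 when both masks are empty
    return 2.0 * np.logical_and(pred, truth).sum() / (pred.sum() + truth.sum())

# half of the predicted pixels overlap the ground truth -> Dice = 0.5
score = dice_coefficient([1, 1, 0, 0], [0, 1, 1, 0])
```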


Let us have a glance at some random mask prediction results.


Defective Class-1 Cv Results


Defective Class-2 Cv Results


Defective Class-3 Cv Results


Defective Class-4 Cv Results

These results show that the performance of our model is good.

Further scope

We have good results so far, but there is still scope to improve the model:

  • As there is low similarity between the train and test data, we can use adversarial validation to improve performance.
  • Other data augmentation techniques can be used for both train and test augmentation.
  • A better pipeline may perform better than this one.



Thanks for reading… !!!