Source: Deep Learning on Medium
Application Of Deep Learning In Identifying Road Cracks
Recently I had a chance to work with a really cool road crack detection dataset as part of my research. A company (let's call it the Ministry of Road Cracks and Other Important Stuff, or MRCOIS for short 😑) was looking for an autonomous system to localize road cracks and classify them into 3 severity levels (low, medium and high). They were generous enough to provide us with patches of cracked road collected along four major highways in Ontario. The data were collected with the ARAN 9000 (Automatic Road Analyzer 9000), a road surveying system designed by Fugro. The ARAN 9000 integrates LiDAR and cameras and is commonly used to perform road surveys.
In the dataset, cracks were segmented and labeled with one of 3 severity levels (low, medium and high). The camera and LiDAR data were used jointly to detect the road cracks, and the severity of each crack was then determined from its width and depth: deep, wide cracks are rated high (critical) severity, whereas shallow, narrow cracks are rated low to medium severity.
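To make the width/depth rule concrete, here is a minimal sketch of such a severity rule in Python. The threshold values and constant names are hypothetical placeholders chosen only for illustration, not the criteria MRCOIS actually uses, and labeling mixed cases (e.g. narrow but deep) as medium is an assumption.

```python
# Hypothetical thresholds in millimetres, for illustration only (not MRCOIS's criteria).
LOW_MAX_WIDTH_MM = 3.0
LOW_MAX_DEPTH_MM = 5.0
HIGH_MIN_WIDTH_MM = 6.0
HIGH_MIN_DEPTH_MM = 10.0


def crack_severity(width_mm: float, depth_mm: float) -> str:
    """Map a crack's average width and depth to a severity level (illustrative rule)."""
    if width_mm >= HIGH_MIN_WIDTH_MM and depth_mm >= HIGH_MIN_DEPTH_MM:
        return "high"    # deep and wide
    if width_mm <= LOW_MAX_WIDTH_MM and depth_mm <= LOW_MAX_DEPTH_MM:
        return "low"     # shallow and narrow
    return "medium"      # assumption: every mixed case counts as medium


print(crack_severity(8.0, 12.0))  # high
print(crack_severity(1.5, 2.0))   # low
print(crack_severity(4.0, 12.0))  # medium
```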
The system MRCOIS was using combined thresholding techniques for crack localization with a multi-class classification, based on the crack's average width and depth, to identify its severity. Thresholding methods are very sensitive to noise, so road markings were often falsely picked up as cracks. Thresholding is also known to yield incomplete edges, so the method used edge connectivity to join discontinuous edges, which often generated false edges in regions without any cracks.
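The two failure modes of this kind of baseline can be reproduced on a toy patch with a few lines of NumPy. This is a minimal sketch, not MRCOIS's system: the threshold value is arbitrary, the function names are hypothetical, and morphological closing (dilation followed by erosion) is swapped in as a simple stand-in for the edge-connectivity step. In the example, closing bridges a genuine one-pixel gap in the crack, but a single dark noise pixel also survives thresholding and gets linked to the crack as a false edge.

```python
import numpy as np


def threshold_cracks(gray, thresh):
    """Global thresholding: pixels darker than `thresh` become crack candidates."""
    return gray < thresh


def dilate(mask, r=1):
    """Binary dilation with a (2r+1)x(2r+1) square; outside pixels count as background."""
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def erode(mask, r=1):
    """Binary erosion with the same square; outside pixels count as foreground."""
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=True)
    out = np.ones_like(mask, dtype=bool)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def close_gaps(mask, r=1):
    """Morphological closing (stand-in for edge linking): joins nearby fragments."""
    return erode(dilate(mask, r), r)


# Toy 5x7 patch: bright pavement, a dark crack on row 2 with a gap at column 3,
# and one dark noise pixel (e.g. a stain or shadow) at (0, 6).
gray = np.full((5, 7), 200.0)
gray[2, [0, 1, 2, 4, 5, 6]] = 50.0
gray[0, 6] = 60.0

mask = threshold_cracks(gray, thresh=100.0)
closed = close_gaps(mask)
print(mask[2, 3], closed[2, 3])    # False True: the real gap is bridged
print(closed[0, 6], closed[1, 6])  # True True: noise kept and linked to the crack
```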
To solve these issues, my friend and I used two popular deep learning architectures (a modified CNN and an FCN) to detect and classify the road cracks. I was able to obtain a total of ~3,000 images. Since the dataset was very small, I relied heavily on data augmentation to grow it to roughly 220,000 images. A large share of this data was generated by cropping each original image into smaller patches with a 75% overlap between adjacent patches. Since not every patch contained crack pixels, patches with less than 0.5% crack pixels were filtered out. The remaining patches were then split 9:1 into training and validation sets.
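The cropping, filtering and splitting steps can be sketched as follows. The 75% overlap, the 0.5% crack-pixel cutoff and the 9:1 split come from the text above; the 64-pixel patch size, the random seed and the function names are hypothetical. With 75% overlap, the stride between adjacent patches is a quarter of the patch size.

```python
import numpy as np


def extract_patches(image, mask, patch=64, overlap=0.75, min_crack_frac=0.005):
    """Crop overlapping square patches and keep those with enough crack pixels.

    image: (H, W) or (H, W, C) array; mask: (H, W) array, non-zero = crack.
    patch=64 is a hypothetical size; overlap and cutoff follow the post.
    """
    stride = max(1, round(patch * (1 - overlap)))  # 75% overlap -> stride = patch / 4
    h, w = mask.shape
    kept = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            m = mask[y:y + patch, x:x + patch]
            if np.count_nonzero(m) / m.size >= min_crack_frac:  # drop < 0.5% crack pixels
                kept.append((image[y:y + patch, x:x + patch], m))
    return kept


def split_train_val(items, train_frac=0.9, seed=0):
    """Shuffle, then split 9:1 into training and validation lists."""
    order = np.random.default_rng(seed).permutation(len(items))
    n_train = round(train_frac * len(items))
    return [items[i] for i in order[:n_train]], [items[i] for i in order[n_train:]]


# Toy 256x256 image with one 2-pixel-wide vertical crack at columns 100-101.
image = np.zeros((256, 256, 3), dtype=np.uint8)
mask = np.zeros((256, 256), dtype=np.uint8)
mask[:, 100:102] = 1

patches = extract_patches(image, mask)
train, val = split_train_val(patches)
print(len(patches), len(train), len(val))  # 52 47 5
```

On this toy image a 64-pixel window with stride 16 visits 13 x 13 = 169 positions, but only the 52 patches that overlap the crack pass the 0.5% filter, which is why the filtering step matters so much for sparse crack masks.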
Method 1: Convolutional Neural Network with Structured Prediction
The first method was based on a Convolutional Neural Network (CNN) with structured prediction. The original paper that proposed the method performed binary classification to localize crack pixels (i.e. crack or non-crack). In my paper, the architecture was extended to detect different levels of crack severity: a deeper network was used to learn more complex features and to output multiple classes corresponding to the different severity levels. The architecture is illustrated in the figure below.
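Structured prediction means the network labels a small block of pixels for each input patch instead of a single pixel. Below is a minimal NumPy sketch of how such outputs could be decoded and stitched back into a full label map. The block size `OUT`, the 4-class layout (background plus three severity levels), the helper names and the averaging of overlapping blocks are assumptions for illustration, not details taken from the original paper or from my network.

```python
import numpy as np

NUM_CLASSES = 4  # assumption: background + low / medium / high severity
OUT = 5          # hypothetical size of the OUT x OUT label block predicted per patch


def decode_structured(logits):
    """Reshape a flat vector of OUT*OUT*NUM_CLASSES scores into per-pixel class probabilities."""
    scores = np.asarray(logits, dtype=float).reshape(OUT, OUT, NUM_CLASSES)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    probs = np.exp(scores)
    return probs / probs.sum(axis=-1, keepdims=True)


def stitch(block_probs, positions, shape):
    """Average overlapping OUT x OUT blocks into a full-image label map.

    block_probs: list of (OUT, OUT, NUM_CLASSES) arrays.
    positions:   top-left (row, col) where each block lands in the image.
    Pixels covered by no block get label 0 (background).
    """
    acc = np.zeros(shape + (NUM_CLASSES,))
    count = np.zeros(shape)
    for probs, (y, x) in zip(block_probs, positions):
        acc[y:y + OUT, x:x + OUT] += probs
        count[y:y + OUT, x:x + OUT] += 1
    return (acc / np.maximum(count, 1)[..., None]).argmax(axis=-1)


# Two overlapping blocks: one fairly sure of "low" (1), one more sure of "high" (3).
p_low = decode_structured(np.log(np.tile([0.1, 0.7, 0.1, 0.1], OUT * OUT)))
p_high = decode_structured(np.log(np.tile([0.05, 0.05, 0.05, 0.85], OUT * OUT)))
labels = stitch([p_low, p_high], [(0, 0), (0, 2)], (5, 7))
print(labels[0])  # [1 1 3 3 3 3 3]
```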
Method 2: Fully Convolutional Neural Network
The second method used in this study was FCN-8, based on VGG-16 (full paper). The FCN follows an encoder-decoder architecture: the first 7 layers of the network are those of a typical CNN, and the subsequent layers generate the segmentation map by up-sampling the results via deconvolutions.
The network contained 7 convolutional layers and 5 pooling layers. Because it is an FCN-8 architecture, it had 3 deconvolution layers, with skip connections from pooling layers 3 and 4.
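The FCN-8 skip connections can be illustrated with plain NumPy arrays standing in for the class-score maps. This sketch swaps the learned deconvolution layers for nearest-neighbour upsampling, assumes the score maps come from per-layer 1x1 classifiers as in the usual FCN-8 design, and uses a hypothetical 224x224 input with 4 output classes. What it does show is the fusion order: the 1/32-resolution scores are upsampled 2x and added to the pool4 scores (1/16), upsampled 2x again and added to the pool3 scores (1/8), then upsampled 8x to full resolution, giving the three upsampling steps.

```python
import numpy as np


def upsample(x, factor):
    """Nearest-neighbour upsampling of an (H, W, C) score map.

    Stand-in for FCN's learned deconvolution (transposed convolution) layers.
    """
    return x.repeat(factor, axis=0).repeat(factor, axis=1)


def fcn8_fuse(score_pool3, score_pool4, score_conv7):
    """FCN-8 style fusion of class-score maps at 1/8, 1/16 and 1/32 resolution."""
    fuse4 = upsample(score_conv7, 2) + score_pool4  # 1/32 -> 1/16, add pool4 skip
    fuse3 = upsample(fuse4, 2) + score_pool3        # 1/16 -> 1/8, add pool3 skip
    return upsample(fuse3, 8)                       # 1/8 -> full resolution


H, W, C = 224, 224, 4  # hypothetical input size; C = background + 3 severity classes
rng = np.random.default_rng(0)
s3 = rng.standard_normal((H // 8, W // 8, C))    # pool3 scores (28 x 28)
s4 = rng.standard_normal((H // 16, W // 16, C))  # pool4 scores (14 x 14)
s7 = rng.standard_normal((H // 32, W // 32, C))  # conv7 scores (7 x 7)

out = fcn8_fuse(s3, s4, s7)
labels = out.argmax(axis=-1)  # per-pixel class map
print(out.shape)  # (224, 224, 4)
```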
Results & Validation
The networks were trained on a machine with a GTX 1070 GPU and 16 GB of RAM, and the results were evaluated for both localization and severity classification.
Precision, recall and F1-score were calculated for the crack localization (i.e. binary classification) and severity classification results. The accuracy metrics for the validation dataset are shown in the tables below.
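For reference, precision, recall and F1 can be computed per class in a one-vs-rest fashion. Below is a minimal pure-Python sketch, assuming pixel-wise labels with 0 = background and 1 to 3 = low/medium/high severity (this encoding and the helper name `prf1` are assumptions); the localization scores come from collapsing every severity level into a single crack class.

```python
from collections import Counter


def prf1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, treating every other label as negative."""
    counts = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp, fp, fn = counts[(True, True)], counts[(False, True)], counts[(True, False)]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy flattened pixel labels (assumed encoding): 0 = background, 1 = low, 2 = medium, 3 = high.
y_true = [0, 1, 1, 2, 3, 3, 0, 2]
y_pred = [0, 1, 2, 2, 3, 1, 0, 2]

# Localization: collapse all severity levels into a single "crack" class.
loc_true = [int(v > 0) for v in y_true]
loc_pred = [int(v > 0) for v in y_pred]
print("localization", prf1(loc_true, loc_pred, positive=1))  # (1.0, 1.0, 1.0)

# Severity: one-vs-rest scores for each crack class.
for level, name in [(1, "low"), (2, "medium"), (3, "high")]:
    p, r, f = prf1(y_true, y_pred, positive=level)
    print(f"{name:>6}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

In this toy example localization is perfect even though two pixels receive the wrong severity, which is why the two sets of scores are worth reporting separately.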
Although the detections were mostly correct, some issues still led to erroneous segmentation. One of the key problems I faced while training the networks was the poor quality of the training data: the ground truth contained road markings and false edges. As a result, the overall crack localization accuracy suffered, which in turn lowered the crack severity classification accuracy. The amount of training data was also insufficient, since only a few images were provided initially and not every image contained crack pixels.
Though promising results were obtained, a few improvements still need to be made (a better training dataset would be a start 😅). If you guys like the work or need more information, you can find my full publication here.