Real-time Concrete Crack Detection and Instance Segmentation using Deep Transfer Learning

Abstract: Cracks on concrete infrastructure are one of the early indications of structural degradation and need to be identified as early as possible so that preventive measures can be taken to avoid further damage. In this paper, we propose to use YOLACT, a real-time instance segmentation algorithm, for automatic concrete crack detection. We use transfer learning to train the YOLACT network to identify and localise cracks together with masks that delineate each crack instance. Transfer learning allowed us to train the network on a relatively small dataset of 500 crack images. To train the YOLACT network, we created a dataset with ground-truth masks from images collected from publicly available datasets. We evaluated the trained YOLACT model for concrete crack detection with ResNet-50 and ResNet-101 backbone architectures in terms of both precision and detection speed. The trained model achieved high mAP results at real-time frame rates when tested on concrete crack images on a single GPU. The YOLACT algorithm was able to correctly segment multiple cracks with individual instance-level masks and high localisation accuracy.


Introduction
Concrete-based civil infrastructure such as bridges, tunnels and dams undergoes structural deterioration due to weathering, corrosion, thermal cycles and carbonation. Cracks on concrete surfaces are often an early indication of possible structural failures, which could be catastrophic if left unattended. Therefore, it is of utmost importance to inspect concrete structures frequently for cracks so that proactive measures can be taken to avoid further damage.
The use of robotic devices and smart sensors for infrastructure monitoring has become popular in recent years, particularly in places where human access is difficult [1][2][3]. Nowadays, visual inspection of large civil structures is often done using remotely controlled drones. The video footage recorded during these inspection rounds is then watched manually to detect cracks. This is a highly time-consuming process that depends largely on the surveyor's experience and knowledge, which adds subjective bias to the final qualitative analysis. These inefficiencies and human errors can be reduced by developing learning models that automatically identify concrete cracks in recorded video.

Several researchers have attempted to develop deep learning models for concrete crack detection.
There are many deep learning architectures that have been built specifically for concrete crack detection [4][5][6]. Some of them can only distinguish crack images from non-crack images without localizing the cracks [4], while others differentiate crack pixels from the background [5]. More recently, deep transfer learning with architectures such as Mask R-CNN [7] and YOLO [8] has been applied to localize and label individual cracks [9,10]. None of these previous studies examined the possibility of real-time concrete crack detection with instance-level segmentation. Real-time detection is vital because it enables active inspection, in which autonomous robotic platforms such as drones are steered along the cracks. In addition, the robotic platform can be navigated closer to a crack to inspect it in detail. Instance segmentation, in turn, allows multiple localized cracks to be detected in the same image, which may provide extra information for predicting crack propagation.
In this paper, we demonstrate that deep transfer learning can be used to train an instance segmentation model that automatically identifies cracks and localizes them with segmentation masks, in real time, in images collected from video inspections. We specifically investigated YOLACT, a real-time instance segmentation algorithm [11] that offered a substantially better speed-accuracy trade-off than existing instance segmentation algorithms on the COCO dataset. We used it to train a deep learning model on a small dataset of concrete crack images. To train the crack detection model, we built a dataset by collecting images from publicly available datasets and manually annotating a segmentation mask for each crack. The transfer learning approach allowed us to train the network on a smaller dataset, with reduced training time, by reusing high-level features learned from the COCO dataset.

Materials and Methods
Our framework for concrete crack detection is based on YOLACT, a fully convolutional deep learning model for real-time instance segmentation. Instance segmentation allows us to separate all the cracks present in an image into individual segmentation masks. On the COCO dataset, YOLACT has recorded a mean average precision (mAP) of 29.8 at 33 FPS on a single Titan XP GPU. This was significantly faster than other instance segmentation frameworks available at the time. For example, the popular Mask R-CNN recorded only about 9 FPS, albeit with a higher mAP of 35. YOLACT achieves this by breaking instance segmentation into two parallel subtasks. The first subtask generates a set of prototype masks, and the second predicts per-instance mask coefficients. Instance masks are then produced by linearly combining the prototypes with the mask coefficients. Because these two subtasks run in parallel, the instance segmentation process is faster than in other state-of-the-art algorithms. This makes YOLACT a suitable choice for concrete crack detection, as it provides real-time results that could be incorporated into an autonomous UAV for active inspection of concrete cracks.
To train a YOLACT model for concrete crack detection, we used the publicly available open-source implementation of the YOLACT algorithm. Although the original YOLACT model was trained on the COCO dataset with 80 real-world object categories, we needed a model for concrete crack detection and instance segmentation. We achieved this by using deep transfer learning and training the YOLACT model on a custom dataset of concrete crack images.
The dataset and the code base for this research are publicly available at https://github.com/lasithaya/YOLACT. We used the freely available Google Colab for training and testing the YOLACT model. A Colab notebook is provided in the GitHub repository so that the results presented in this paper can be easily replicated.
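The prototype-coefficient combination described above can be written as M = σ(P Cᵀ). Here P holds k prototype masks of size h × w, and C holds one k-dimensional coefficient vector per detected instance. The NumPy sketch below illustrates only this assembly step. The function name `assemble_masks` and the 0.5 threshold are illustrative assumptions. The full YOLACT pipeline also crops each mask with its predicted bounding box, and that step is omitted here.

```python
import numpy as np


def assemble_masks(prototypes: np.ndarray, coefficients: np.ndarray,
                   threshold: float = 0.5) -> np.ndarray:
    """Combine prototype masks linearly with per-instance coefficients.

    prototypes:   array of shape (h, w, k), k prototype masks
    coefficients: array of shape (n, k), one coefficient vector per instance
    returns:      boolean array of shape (n, h, w), one mask per instance
    """
    # Linear combination for every instance: (h, w, k) @ (k, n) -> (h, w, n)
    logits = prototypes @ coefficients.T
    # Sigmoid squashes each combined map into (0, 1)
    probs = 1.0 / (1.0 + np.exp(-logits))
    # Binarise, then move the instance axis to the front
    return np.transpose(probs > threshold, (2, 0, 1))


# Toy example: two 4x4 prototypes, one active on the left half, one on the right
h, w = 4, 4
left = np.where(np.arange(w) < w // 2, 10.0, -10.0) * np.ones((h, 1))
right = -left
prototypes = np.stack([left, right], axis=-1)      # shape (4, 4, 2)
coefficients = np.array([[1.0, 0.0], [0.0, 1.0]])  # instance 0 -> left, 1 -> right
masks = assemble_masks(prototypes, coefficients)
print(masks.shape)  # (2, 4, 4)
print(masks[0].astype(int))
```

The combination is a single matrix product over all instances, so mask assembly for many instances costs little more than for one. This helps explain why the design suits real-time use.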

Transfer Learning
Training a deep network such as YOLACT requires many images, as there are millions of weight parameters to learn. In some domains, e.g. infrastructure monitoring, acquiring such a large dataset is time-consuming and costly. However, instead of training the network from scratch on concrete cracks, we can pre-train a CNN model on a separate large dataset and use the pre-trained weights to initialize the network for crack detection. This is called transfer learning, and it often yields better results with smaller datasets [12]. Transfer learning is commonly used to initialize the backbone of a deep network, where the backbone weights are usually pre-trained on the publicly available ImageNet dataset [13]. We used the commonly available ResNet-50 and ResNet-101 [14] pre-trained backbone models for weight initialization. In addition, as our dataset is small, we used a YOLACT model pre-trained on the COCO dataset [15] to initialize the weights for concrete crack detection. We then trained only the last few layers of the YOLACT model on the concrete crack dataset. The results of these experiments are discussed in Section 3.
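A small NumPy sketch can illustrate the idea of keeping pre-trained layers frozen and training only the last layer. A fixed random projection followed by ReLU stands in for a pre-trained backbone. This is an assumption made only to keep the example self-contained; in practice these weights would come from ImageNet or COCO pre-training. Only a logistic-regression head is updated by gradient descent. The toy data and all names are hypothetical and are not part of the YOLACT code base.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data (hypothetical stand-in for crack / non-crack samples)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float)

# Placeholder for a pre-trained feature extractor: these weights stay frozen.
W_frozen = rng.normal(size=(2, 16))
W_initial = W_frozen.copy()


def extract_features(inputs):
    return np.maximum(inputs @ W_frozen, 0.0)  # frozen linear layer + ReLU


def loss_and_grads(features, labels, w, b):
    """Binary cross-entropy of a logistic head and its gradients w.r.t. (w, b)."""
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    eps = 1e-12
    loss = -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))
    err = p - labels
    return loss, features.T @ err / len(labels), err.mean()


# Trainable head: the only parameters updated during fine-tuning
head_w = np.zeros(16)
head_b = 0.0

features = extract_features(X)  # computed once, because the backbone is frozen
loss_before, _, _ = loss_and_grads(features, y, head_w, head_b)
lr = 0.1
for _ in range(500):
    _, grad_w, grad_b = loss_and_grads(features, y, head_w, head_b)
    head_w -= lr * grad_w
    head_b -= lr * grad_b
loss_after, _, _ = loss_and_grads(features, y, head_w, head_b)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Because the frozen features are computed once and reused, each training step only touches the small head. This mirrors why fine-tuning only the last layers needs less data and less training time.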

Dataset
To test the effectiveness of YOLACT, we combined concrete crack images from five different publicly available datasets [16][17][18][19][20]. However, these images were not annotated for instance segmentation and were therefore not directly usable for training the YOLACT model. Consequently, we built an instance-level segmented dataset with 300 images for training, 100 images for validation and 100 images for testing. All images were rescaled to 448 × 448 pixels for efficient training. Figure 1 shows a crack image with its ground-truth instance-level segmentation mask.
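The 300/100/100 split and the rescaling to 448 × 448 described above could be implemented along the following lines. This is a sketch under stated assumptions: the image identifiers, the 227 × 227 source size, the fixed random seed and the helper names are all hypothetical. Masks are resized with nearest-neighbour sampling so that they stay binary. Bilinear interpolation would introduce fractional values at crack boundaries.

```python
import random

import numpy as np


def split_dataset(image_ids, n_train=300, n_val=100, n_test=100, seed=0):
    """Shuffle image ids reproducibly and split them into train/val/test lists."""
    ids = list(image_ids)
    if len(ids) != n_train + n_val + n_test:
        raise ValueError("split sizes must add up to the number of images")
    random.Random(seed).shuffle(ids)  # fixed seed -> same split on every run
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def resize_mask_nearest(mask, size=448):
    """Resize a 2-D instance mask to (size, size) by nearest-neighbour sampling."""
    h, w = mask.shape
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return mask[rows[:, None], cols[None, :]]


# Usage with hypothetical file names
image_ids = [f"crack_{i:03d}.jpg" for i in range(500)]
train, val, test = split_dataset(image_ids)
print(len(train), len(val), len(test))  # 300 100 100

mask = np.zeros((227, 227), dtype=np.uint8)
mask[100:120, :] = 1  # a horizontal crack band
resized = resize_mask_nearest(mask)
print(resized.shape, np.unique(resized))
```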

Training
As we used a small dataset with pre-trained weights, long training schedules were not required. We experimented with different training schedules and backbone architectures. The model's weights were saved every 1000 iterations and later used to evaluate performance on the validation set. We also tested different hyperparameters, such as learning rate and batch size, to find the best training parameters. We trained the network with a batch size of 8 and a learning rate of 1e-3 for 25,000 iterations on Google Colab.
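The schedule above implies some simple arithmetic. At a batch size of 8, 25,000 iterations correspond to 200,000 training samples, or roughly 667 passes over the 300 training images. Saving every 1000 iterations yields 25 checkpoints. The sketch below computes these figures and picks the checkpoint with the highest validation score. The rule of selecting by validation mAP is an assumption about how the saved weights were compared. The `dummy_scores` values are synthetic placeholders, not results from this study.

```python
def checkpoint_iterations(total_iters=25_000, save_every=1_000):
    """Iterations at which weights are saved."""
    return list(range(save_every, total_iters + 1, save_every))


def epochs_covered(total_iters=25_000, batch_size=8, n_train=300):
    """Number of passes over the training set implied by the schedule."""
    return total_iters * batch_size / n_train


def select_best_checkpoint(val_scores):
    """val_scores: dict mapping iteration -> validation mAP (higher is better)."""
    return max(val_scores, key=val_scores.get)


checkpoints = checkpoint_iterations()
print(len(checkpoints), checkpoints[0], checkpoints[-1])  # 25 1000 25000
print(round(epochs_covered(), 1))  # 666.7

# Synthetic validation scores, only to show the selection step
dummy_scores = {it: 1.0 - abs(it - 12_000) / 25_000 for it in checkpoints}
print(select_best_checkpoint(dummy_scores))  # 12000
```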

Results
We trained two separate networks with ResNet-50 and ResNet-101 backbone architectures and evaluated their performances on the test set.The qualitative and quantitative results are discussed in the following sections.

Qualitative Results
Figure 2 shows qualitative results on a selected set of test images from the dataset. The first column shows the test images and the second column shows the corresponding ground-truth crack locations. The last two columns show the test results with the ResNet-50 and ResNet-101 backbones. Different color segments in the result columns correspond to the instance segmentation of each crack. According to the test results, both backbone architectures performed well in segmenting individual cracks. The trained models identified each branch of a crack as a separate instance. This is preferable because it allows crack propagation to be identified more accurately. A closer look at the test results revealed that the ResNet-101 backbone performed slightly better than ResNet-50 in segmenting some cracks. This can be seen in the last image in Column C, where ResNet-50 failed to identify the small crack propagating upwards. This might be because ResNet-101 is a deeper architecture than ResNet-50 and may therefore provide richer features for the YOLACT network.

Quantitative Results
Table 1 shows the quantitative results of YOLACT on our concrete crack detection dataset with ResNet-101 and ResNet-50 backbones. The last three columns of the table show the segmentation performance for both bounding-box and mask-based localization of cracks. According to Table 1, the ResNet-101 backbone achieved a higher mAP (mean average precision) than the ResNet-50 backbone for both box and mask segmentation. We used the COCO definition of mAP for this evaluation. It averages AP over all object categories and over 10 IoU (Intersection over Union) thresholds from 0.50 to 0.95 in steps of 0.05. The last two columns give the average precision (AP) at IoU thresholds of 50% and 75%, respectively. Again, the ResNet-101 backbone performed better on both AP50 and AP75. This is expected, as ResNet-101 has roughly twice as many layers as ResNet-50. However, training with ResNet-101 takes more time, and inference is slower than with a ResNet-50 backbone. The first two columns of Table 1 show the inference speed in frames per second (FPS) on Tesla P100 and Nvidia Titan XP GPUs. According to the test results, real-time inference is possible with the Titan XP GPU. Near real-time inference is possible even with the slower P100 GPU, which is acceptable in many robotics applications. ResNet-50 recorded a higher frame rate than ResNet-101 and is therefore more suitable for real-time applications.
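The COCO-style mAP mentioned above rests on two ingredients. The first is the IoU between a predicted and a ground-truth mask. The second is the set of ten IoU thresholds (0.50, 0.55, ..., 0.95) over which AP is averaged. The NumPy sketch below shows both. It is a simplified illustration, not the official COCO evaluation code (pycocotools), which also matches detections to ground truth and integrates precision over recall at each threshold. The helper names are hypothetical.

```python
import numpy as np

# COCO averages AP over IoU thresholds 0.50:0.05:0.95 (ten values)
IOU_THRESHOLDS = np.linspace(0.5, 0.95, 10)


def mask_iou(pred, gt):
    """Intersection over Union of two masks of equal shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return np.logical_and(pred, gt).sum() / union


def coco_style_map(ap_per_threshold):
    """Average AP values already computed at each of the ten IoU thresholds."""
    ap = np.asarray(ap_per_threshold, dtype=float)
    if ap.shape != IOU_THRESHOLDS.shape:
        raise ValueError("expected one AP value per IoU threshold")
    return float(ap.mean())


# Usage: a predicted crack mask covering 2 of the 3 ground-truth pixels
gt = np.zeros((4, 4), dtype=bool)
gt[1, :3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1, :2] = True
iou = mask_iou(pred, gt)                # 2/3
print((iou >= IOU_THRESHOLDS).sum())    # 4: passes 0.50, 0.55, 0.60 and 0.65
```

The example shows why AP75 and the full mAP are stricter than AP50. A mask with IoU of 2/3 counts as a correct detection at the 0.50 threshold but not at 0.75 or above.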

Discussion
In this paper, we evaluated YOLACT, a real-time instance segmentation algorithm, for concrete crack detection. We created a small dataset of annotated masks using concrete crack images collected from publicly available datasets. Deep transfer learning with different backbone architectures was used to speed up training. This also reduced the number of training images required, as we only fine-tuned the last few layers of the YOLACT network for concrete crack segmentation. Both qualitative and quantitative tests were carried out with ResNet-50 and ResNet-101 backbone architectures. The ResNet-101 backbone performed slightly better in average precision, but ResNet-50 gave a much higher frame rate when tested on a single GPU. As future work, YOLACT could be integrated with a robotic system to carry out active inspections of concrete structures.

Figure 1.
An image with a crack from our dataset: (a) original image; (b) highlighted crack; and (c) annotated cracks with the segmentation masks.

Figure 2.
Qualitative results on selected test images.

Table 1.
Quantitative results analysis with different backbone architectures.