A Crack Segmentation Model Combining Morphological Network and Multiple Loss Mechanism

With the wide application of computer vision technology and deep-learning theory in engineering, the image-based detection of cracks in structures such as pipelines, pavements and dams has received more and more attention. Aiming at the problems of high cost, low efficiency and poor detection accuracy in traditional crack detection methods, this paper proposes a crack segmentation network by combining a morphological network and a multiple-loss mechanism. First, for improving the identification of cracks with different resolutions, the U-Net network is used to extract multi-scale features from the crack image. Second, for eliminating the effect of polarized light on the cracks under different illuminations, the extracted crack features are further morphologically processed by a white-top hat transform and a black-bottom hat transform. Finally, a multi-loss mechanism is designed to solve the problem of the inaccurate segmentation of cracks on a single scale. Extensive experiments are carried out on five open crack datasets: Crack500, CrackTree200, CFD, AEL and GAPs384. The experimental results showed that the average ODS, OIS, AIU, sODS and sOIS are 75.7%, 73.9%, 36.4%, 52.4% and 52.2%, respectively. Compared with state-of-the-art methods, the proposed method achieves better crack segmentation performance. Ablation experiments also verified the effectiveness of each module in the algorithm.


Introduction
Inevitably, there are some injuries and defects in concrete structures that arise in the process of building and using because of reasons having to do with design, construction, loads and materials and so on. These defects, which initially present as cracks, should seriously affect the safety and durability of a structure, and they need to be discovered as early as possible. Crack segmentation based on machine vision is a process of automatically extracting cracks from images captured by cameras. Automatic crack segmentation not only has important theoretical research significance, but also has an important early warning application in practical engineering.
Compared with the depth-learning method, the traditional crack segmentation method does not require training and is simpler and more direct [1][2][3][4][5][6][7][8][9]. This kind of algorithm is easy to implement and fast in operation. Although the traditional method is simple and fast, it is easily affected by noise. Improper threshold selection will significantly reduce the segmentation accuracy.
Depth-learning-based methods can automatically extract the shallow and deep characteristics of cracks, so they can greatly improve segmentation and detection accuracy with regard to cracks. The backbone networks commonly used for crack segmentation include the multi-scale pyramid structure network [10][11][12][13][14][15][16][17] and the U-Net network [18][19][20][21][22]. Methods based on multi-scale features can extract more abundant crack image features, but the existing methods make it difficult to detect cracks similar to the background. Because of its good performance in medical image segmentation, the U-Net network and its expansion are widely used in crack detection [18][19][20][21][22]. From the experimental results of many studies, it can be seen that the methods based on deep learning have a significant improvement in the effects of crack segmentation. However, the segmentation accuracy of cracks under polarized light is generally poor.
In order to improve crack segmentation performance under polarized light, this paper proposes a crack segmentation method that combines a morphological network and a multi-loss function. In summary, the main contributions of this paper are listed as follows.
(1) In view of inaccurate crack detection caused by the lack of an effective processing mechanism for the influence of the existing depth network model on polarized light, this paper proposes a morphological-processing network module, which processes the crack feature with a white-top hat and a black-bottom hat transformation to correct the influence of uneven lighting, and the bright cracks on the dark background and the dark cracks under the bright background are effectively extracted by the proposed method.
(2) In view of the incomplete detection on different scales of the crack image, a multiscale loss fusion mechanism is adopted to overcome the problem of too thin or too thick cracks extracted by single-scale detection.
(3) The experimental results show that the proposed method can accurately detect all kinds of cracks in the five public crack datasets.
The paper is organized as follows. Related works are presented in Section 2. The proposed crack segmentation method is proposed in Section 3. Experimental results and analysis are presented in Section 4. The conclusions are drawn in Section 5.
In traditional methods, manual features are usually used to extract cracks. Because cracks exhibit certain edge characteristics, commonly used edge detection operators are performed for the extraction of cracks [1]. Based on Euclidean graphs as crack pattern descriptors, salient skeleton features are extracted to locate the cracks in an image [5]. A Gabor filter is used to detect the cracks in any direction [8]. Morphological filters and dynamic thresholding are applied to extract cracks of different thicknesses [9]. Using the pixel brightness difference between cracks and surrounding areas, methods such as random structure forest [4] and the threshold-processing method [6,7] can also effectively extract cracks. Although the traditional method is simple in implementation and fast in operation, it is easily affected by noise. Improper selection of the threshold will significantly reduce the segmentation accuracy.
Because the shallow and deep features of an image can be extracted automatically, the method based on deep learning can greatly improve the accuracy of crack segmentation and detection. Due to the ability to extract the information at different resolutions, multi-scale features are widely used in the segmentation and detection of cracks of different sizes [10][11][12][13][14][15][16][17]. Chun et al. [10] proposed a pavement crack detection method based on multi-scale attention and hesitant fuzzy set (HFS) theory. Saining et al. [11] first performed edge detection with a multiple loss structure, and they then fused multi-scale features to improve the detection effect of cracks. Yang et al. [12] proposed a pyramid structure and a hierarchical boosting network to detect pavement cracks. Wang et al. [13] proposed a bridge crack detection model, which combined the Inception-Resnet-v2 module, multi-scale feature fusion and a GKA clustering mechanism to improve the real-time detection performance. However, it is difficult for this algorithm to segment the small cracks accurately. Sun et al. [14] used an adaptive bilateral filtering algorithm to reduce the influence of noise before using the FPHBN network to segment the cracks. Li et al. [15] fused feature maps of different scales and used a class-balanced cross-entropy loss function to improve the accuracy, speed and robustness of crack segmentation, but the edge of the segmented crack was blurred. The attention mechanism is imposed on the fused multi-scale feature to enhance the distinction of crack feature representation [16]. A feature aggregation network with the spatial-channel squeeze and excitation attention mechanism module was proposed in paper [17] to accurately segment cracks. Although these methods based on multi-scale features can extract more abundant fracture image features, they were unable to distinguish cracks and backgrounds under biased light, which would lead to a decline in segmentation and detection performance.
Since the U-Net network can not only extract multi-scale features of objects with different sizes, the cross-layer connection in the encoder-decoder structure can also avoid information loss in the deep network. Therefore, the U-Net network has been widely used in image target segmentation, since it was proposed for medical image segmentation. Due to its excellent segmentation performance, U-Net network and its extension have also been widely used in crack segmentation [18][19][20][21][22] and rust detection in metallic constructions [23]. Hong et al. [20] introduced an attention module to the U-Net network and fused the features of skip connections to segment cracks on UAV aerial photography pavement. Jacob et al. [21] proposed an improved U-Net algorithm and introduced a data augmentation strategy to improve the accuracy of crack segmentation. A U-Net network with alternately updated clique was designed to separate cracks from the background [22].
In addition, scholars have also conducted research on crack segmentation and detection based on a combination of detection and segmentation [24,25], transformers [26], super-resolution reconstructions [27], an attention mechanism [28,29], transfer-learning technology [30], a fully convolutional network [31][32][33], deep learning and heuristic image post-processing [34]. As a result, relatively effective results have been obtained in the segmentation of various cracks.
Compared with traditional methods, the methods based on depth learning have achieved better crack segmentation performance, but due to the lack of an effective mechanism to remove polarized light, it is difficult to accurately detect cracks with an uneven brightness distribution.
Aiming at the problem of inaccurate crack detection caused by uneven illumination in images, this paper proposes a morphological network module, which performs whitetop hat and black-bottom hat processing on the crack feature map to correct unevenness. Aiming at the incomplete description of cracks by the shallow and deep features of the network, a multi-scale loss fusion mechanism is used to overcome the problem of too thin or too coarse cracks extracted by single-scale detection.

The Proposed Crack Segmentation Method
A flow chart of the proposed crack segmentation method is shown in Figure 1, which consists of four main modules: feature extraction based on a backbone network, feature enhancement based on morphological processing, multi-scale feature fusion and multiobjective loss function calculation.

Multi-Scale Feature Extraction Based on U-Net Network
This paper selects the U-Net network as the feature extraction module. As shown in Figure 2, the U-Net network consists of the left encoding part, the right decoding part and the next two convolution and activation layers. The encoding part consists of four repeating structures, each of which consists of two 3 × 3 convolutional layers, nonlinear ReLU layers and a 2 × 2 max pooling layer, which correspond to the blue and red arrows in Figure 2, respectively. The decoding part is similar to the encoding part and also consists First, the crack image is inputted into the U-Net network to extract features of crack images. The U-Net network uses an encoder-decoder structure to extract features of different scales and fuses features of the same scale of encoder and decoder to make the crack feature more prominent. Second, the extracted features are inputted into the morphological-processing network, and the white-top hat and black-bottom hat transformation are performed, respectively, to correct the influence of uneven illumination on the captured crack images. Then, the output features of the U-Net network and the morphological-processing network are fused and inputted into the side network to obtain a crack prediction map at each scale. Finally, the multi-scale prediction results are fused to obtain the final crack segmentation result, and the loss function of each scale and the final prediction map loss are fused into the final loss function.

Multi-Scale Feature Extraction Based on U-Net Network
This paper selects the U-Net network as the feature extraction module. As shown in Figure 2, the U-Net network consists of the left encoding part, the right decoding part and the next two convolution and activation layers. The encoding part consists of four repeating structures, each of which consists of two 3 × 3 convolutional layers, nonlinear ReLU layers and a 2 × 2 max pooling layer, which correspond to the blue and red arrows in Figure 2, respectively. The decoding part is similar to the encoding part and also consists of four repeating structures. Deconvolution up-conv is used before each repeating structure to halve the number of channels and double the size of the feature map, corresponding to the green arrow in the figure. The deconvolution result is concatenated with the feature map of the same scale from the corresponding encoding part, which corresponds to the white/blue blocks of each repeating structure. The concatenated feature map is subjected to two 3 × 3 convolutions, which correspond to the blue arrows in the decoding part. At the last layer of the network, the feature map with 64 channels is converted into the prediction result of whether it is a crack through 1 × 1 convolution, corresponding to the cyan arrow. As shown in Table 1, the parameters of the U-Net network structure are listed.

Multi-Scale Feature Extraction Based on U-Net Network
This paper selects the U-Net network as the feature extraction module. As shown in Figure 2, the U-Net network consists of the left encoding part, the right decoding part and the next two convolution and activation layers. The encoding part consists of four repeating structures, each of which consists of two 3 × 3 convolutional layers, nonlinear ReLU layers and a 2 × 2 max pooling layer, which correspond to the blue and red arrows in Figure 2, respectively. The decoding part is similar to the encoding part and also consists of four repeating structures. Deconvolution up-conv is used before each repeating structure to halve the number of channels and double the size of the feature map, corresponding to the green arrow in the figure. The deconvolution result is concatenated with the feature map of the same scale from the corresponding encoding part, which corresponds to the white/blue blocks of each repeating structure. The concatenated feature map is subjected to two 3 × 3 convolutions, which correspond to the blue arrows in the decoding part. At the last layer of the network, the feature map with 64 channels is converted into the prediction result of whether it is a crack through 1 × 1 convolution, corresponding to the cyan arrow. As shown in Table 1, the parameters of the U-Net network structure are listed. are extracted and sent to the morphological network for depolarization processing. In this paper, a pavement crack image with a size and dimension of W × H × 3 is used as the input image of the U-Net network, and the three features of the decoder part with sizes of W × H × 1, W 2 × H 2 × 128 and W 4 × H 4 × 256 are extracted and sent to the morphological network for depolarization processing.

Morphological Network
When the camera takes a picture of the crack, different parts of the crack perhaps appear as different colors and brightness values due to the change of light and shadow and different shooting angles, which is called a polarized light phenomenon. This phenomenon will lead to the same crack being treated as different objects, thus affecting the performance of target segmentation. In order to overcome the influence of polarized light on object segmentation performance, a morphological network is designed to enhance the output features of the U-Net network to highlight the bright cracks on a dark background and the dark cracks on a bright background.
The morphological network consists of the white-top hat transformation T hat (I) and the black-bottom hat transformation B hat (I): where I ∈ R W×H×C represents the output feature of the U-Net network, (I W e ) represents the morphological erosion operation on I and (I ⊕ W d ) represents the morphological expansion operation on I. Here, W d ∈ R M×N×K and W e ∈ R M×N×K are dilation and erosion filters, and W × H and C represent the resolution and number of channels of the extracted features, respectively. M × N and K are the size and the number of the filters. For ∀k ∈ [1, K], x ∈ [1, W], y ∈ [1, H], the dilation (⊕) and erosion ( ) operations on the feature map I are as follows: The value ranges of S 1 and S 2 are as follows: Examples of dilation and erosion operations are shown in Figures 3 and 4, respectively. The erosion operation is to make the dark area in the image larger, and the dilation operation is to make the bright area in the image larger. The functions of dilation and erosion are to eliminate noise, segment independent cracks, connect adjacent elements in the image and find the maximum or minimum area in the image. Therefore, subtracting morphologically manipulated images from the original image (and vice versa) can highlight areas that are brighter or darker than that in the original image, thus correcting the effects of uneven illumination. The structure diagram of the morphological network is shown in Figure 5, which consists of parallel erosion and dilation processing layers, a subtracting operation and a concatenation operation. The K feature maps of the expansion layer are obtained by convoluting K expansion filters W d with the output features of U-Net. In the same way, the K feature maps of the erosion layer are obtained by convolving the K erosion filters W e with the output features of the U-Net. After expansion and corrosion processing, the white-top hat feature and black-bottom hat feature are obtained through difference operation. The difference feature map and morphological processing feature map are concatenated in series, weighted by linear combination and convolved by a 1 × 1 filter to obtain the output feature map of the morphological network.
features, respectively. M N  and K are the size and the number of the filters.
For       The value ranges of 1 S and 2 S are as follows: Examples of dilation and erosion operations are shown in Figures 3 and 4, respectively. The erosion operation is to make the dark area in the image larger, and the dilation operation is to make the bright area in the image larger. The functions of dilation and erosion are to eliminate noise, segment independent cracks, connect adjacent elements in the image and find the maximum or minimum area in the image. Therefore, subtracting morphologically manipulated images from the original image (and vice versa) can highlight areas that are brighter or darker than that in the original image, thus correcting the effects of uneven illumination. The structure diagram of the morphological network is shown in Figure 5, which consists of parallel erosion and dilation processing layers, a subtracting operation and a concatenation operation. The K feature maps of the expansion layer are obtained by convoluting K expansion filters d W with the output features of U-Net. In the same way, the K feature maps of the erosion layer are obtained by convolving the K erosion filters e W with the output features of the U-Net. After expansion and corrosion processing, the white-top hat feature and black-bottom hat feature are obtained through difference operation. The difference feature map and morphological processing feature map are concatenated in series, weighted by linear combination and convolved by a 1 × 1 filter to obtain the output feature map of the morphological network.     In order to verify the effectiveness of the morphological network module, fe maps before and after the morphological network processing are extracted, respecti which are shown in Figure 6. As shown in Figure 6, there is a large difference betwee extracted crack shape and the ground truth data without morphological processing there are many false detection areas. Compared to the feature map before morpholo processing, the feature map after morphological processing is closer to the ground t The experimental results verify the effectiveness of the morphological processing.  In order to verify the effectiveness of the morphological network module, feature maps before and after the morphological network processing are extracted, respectively, which are shown in Figure 6. As shown in Figure 6, there is a large difference between the extracted crack shape and the ground truth data without morphological processing, and there are many false detection areas. Compared to the feature map before morphological processing, the feature map after morphological processing is closer to the ground truth. The experimental results verify the effectiveness of the morphological processing. In order to verify the effectiveness of the morphological network module, feature maps before and after the morphological network processing are extracted, respectively, which are shown in Figure 6. As shown in Figure 6, there is a large difference between the extracted crack shape and the ground truth data without morphological processing, and there are many false detection areas. Compared to the feature map before morphological processing, the feature map after morphological processing is closer to the ground truth. The experimental results verify the effectiveness of the morphological processing.

Side Network and Loss Function
To obtain the crack prediction result, the output feature of the morphological network on each scale is inputted to the side network for channel merging and up-sampling. The specific operations are as follows: (1) The enhanced feature sized of W 4 × H 4 × 256 is performed a 1 × 1 convolution operation to obtain the dimension reduction feature with the number of channels being 1. Through two 2 × 2 up-sampling convolution operations, the up-sampled feature with the same size as the original map is obtained. Then, the prediction map of the first scale is obtained by the activation function processing, which is recorded asŶ (1) side .
(2) The prediction result of the second scale is obtained after the enhanced feature sized of W 2 × H 2 × 128 is conducted in a 1 × 1 convolution operation, a 2 × 2 up-sampling and activation function processing, which is denoted asŶ (2) side .
(3) The prediction result of the third scale is obtained by only activation function processing, which is expressed asŶ (3) side with the size of W × H × 1. From the observation of the crack images, it can be seen that the size of the crack takes a small proportion in the image. That is to say, the number of negative samples used for model training is far greater than the number of positive samples. A large number of negative samples will encourage the model to ignore the learning of positive samples, which will lead to a poor prediction of positive samples and a low F1 value. For solving the sample imbalance problem in the model training, the Dice Loss is used to reduce the learning degree of simple negative samples, and thus to improve the segmentation performance. The Dice Loss is calculated as follows.
Given two sets A and B, their Dice similarity coefficient S(A, B) is defined as Equation (7), and its value range is [0, 1]: where |A ∩ B| is the number of elements in the intersection of A and B, and |A| and |B| represent the number of elements in the set A and B, respectively. In this paper, A and B are predicted and true positive sample sets, respectively.  (8): Dice Loss L dice is defined as follows: On the one hand, Dice Loss-based network learning directly uses the segmentation effect evaluation index as the loss function, and on the other hand, a large number of background pixels are ignored in the calculation of the ratio between the intersection and the union. Therefore, the Dice Loss function is chosen to solve the problem of uneven positive and negative samples of crack images and to improve the convergence speed at the same time.
In order to make full use of the multi-scale information of the crack, this paper calculates the loss of the prediction results at each scale and then fuses the loss of the final prediction result to obtain the objective loss function. The design of the multiple loss function is shown in Figure 7.    The final crack prediction resultŶ is obtained by concatenation and 1 × 1 convolution of multi-scale prediction results. The objective loss function L sum dice is the sum of the loss on all scales L side dice and the final prediction loss L f inal dice , which is used to adjust the training of the network model until it converges to the desired value, and thus, the network model is obtained.
whereŶ (m) side and Y side are the predicted result and the ground-truth of the m − th scale, and M is the total number of scales, which is set as three in the experiment. Figure 8 shows the crack image, its prediction results on three scales such asŶ (1) side ,Ŷ (2) side andŶ (3) side and the final prediction resultŶ.Ŷ is the final prediction result by performing a 1 × 1 convolution on the concatenation of the features of three scales. As can be seen from Figure 8, shallow featuresŶ (1) side andŶ (2) side pay more attention to the location information of cracks, but there is a relatively large difference between the prediction result and the ground-truth due to the down-sampling operation.Ŷ

Analysis of Experimental Results
In order to verify the performance of the proposed crack segmentation network, extensive comparisons are made on five open crack datasets, i.e., CrackTree200 [35], Crack500 [36], GAPs384 [37], AEL [38] and CFD [39]. Further, ablation experiments are used to verify the effectiveness and necessity of each module.

Experimental Dataset and Evaluation Indicators
CrackTree200 [35] is a road crack image dataset that was proposed in 2012, which includes 206 road crack images with resolutions of 800 × 600. The image is rich in shadows and occlusion, and the cracks are slender and diversified in distribution. The numbers of images in the training dataset and the test dataset are 126 and 80, respectively.
Crack500 [36] is a dataset of pavement cracks that was taken with mobile phones in 2016. Each high-definition image was cropped into 16 crack images with a size of 360 × 640. The dataset includes 1896 training images, 348 verification images and 1124 test images.
GAPs384 [37] is the German Asphalt Pavement Distress (GAPs) dataset, which contains various common pavement diseases such as cracks, potholes, patches, etc. The images are characterized by low illumination, oil stains, zebra stripes and other noise, and the image resolution is 540 × 440. The training set and test set contain 409 images and 100 images, respectively. AEL [38] is a small dataset of crack images collected under various background environments, which consists of 47 training images and 11 test images.
CFD [39] is a widely used road crack dataset that was photographed by mobile phones, including 118 images with a size of 480 × 320, 95 of which are training images and 23 are test images. Sample labeling is performed at the pixel level.

Analysis of Experimental Results
In order to verify the performance of the proposed crack segmentation network, extensive comparisons are made on five open crack datasets, i.e., CrackTree200 [35], Crack500 [36], GAPs384 [37], AEL [38] and CFD [39]. Further, ablation experiments are used to verify the effectiveness and necessity of each module.

Experimental Dataset and Evaluation Indicators
CrackTree200 [35] is a road crack image dataset that was proposed in 2012, which includes 206 road crack images with resolutions of 800 × 600. The image is rich in shadows and occlusion, and the cracks are slender and diversified in distribution. The numbers of images in the training dataset and the test dataset are 126 and 80, respectively.
Crack500 [36] is a dataset of pavement cracks that was taken with mobile phones in 2016. Each high-definition image was cropped into 16 crack images with a size of 360 × 640. The dataset includes 1896 training images, 348 verification images and 1124 test images.
GAPs384 [37] is the German Asphalt Pavement Distress (GAPs) dataset, which contains various common pavement diseases such as cracks, potholes, patches, etc. The images are characterized by low illumination, oil stains, zebra stripes and other noise, and the image resolution is 540 × 440. The training set and test set contain 409 images and 100 images, respectively.
AEL [38] is a small dataset of crack images collected under various background environments, which consists of 47 training images and 11 test images.
CFD [39] is a widely used road crack dataset that was photographed by mobile phones, including 118 images with a size of 480 × 320, 95 of which are training images and 23 are test images. Sample labeling is performed at the pixel level.
The objective evaluation indicators used in this paper were the segmentation performance evaluation indicators ODS (Optimal Dataset Scale), OIS (Optimal Image Scale), AIU (Average Intersection over Union) [12] and sODS (simplified versions of ODS) and sOIS (simplified versions of OIS) [18]. OIS is the aggregate F measure of the best threshold in each image in the dataset, and ODS is the best F-measure on the fixed threshold dataset. AIU is the coincidence ratio between the predicted fracture area and the real fracture area. SOIS and sODS are simplified versions of OIS and ODS. The higher the value of the five evaluation indicators, the better the segmentation effect.

Comparison and Analysis of Subjective Results
In order to verify the better performance of the proposed algorithm, comparative experiments were conducted with the classical object detection and segmentation algorithms FCN [40], RCF [41], HED [11], FPHBN [12], DAUNet [18] and SPLAC U-Net [23]. Figure 9 shows the original crack image, the ground truth of the crack and the comparison experiment results, where the extracted crack is represented by white pixels, and the background is represented by black pixels.

Comparison and Analysis of Subjective Results
In order to verify the better performance of the proposed algorithm, comparative experiments were conducted with the classical object detection and segmentation algorithms FCN [40], RCF [41], HED [11], FPHBN [12], DAUNet [18] and SPLAC U-Net [23]. Figure 9 shows the original crack image, the ground truth of the crack and the comparison experiment results, where the extracted crack is represented by white pixels, and the background is represented by black pixels. It can be seen from Figure 9 that the segmentation of cracks of FCN [40] is the most incomplete, especially for the slender cracks appearing in the datasets CrackTree200 [34], AEL [38] and CFD [39]. Compared with FCN [40], the result of RCF [41] is slightly better, but the segmentation results contain a lot of noise, and the boundary of the segmented crack is also fuzzy. The segmented cracks by FPHBN [12] and HED [11] are thicker than the ground truth, and the extracted slender cracks are incomplete, that is, an intact crack is divided into broken parts. DAUNet [18] and SPLAC U-Net [23] can accurately locate cracks; however, the details of extracted cracks are not relatively obvious. As shown in the detection results from columns 3 to 9 of the fifth row in Figure 9, although all the comparison methods detect the interference object as a crack to some extent, the false detection ratio of the proposed method is the minimum. Although the crack segmentation result of the proposed method contains some noise and interference objects, it is the most competitive compared with other methods.
In order to verify the good segmentation performance for the cracks with an abruptly changed shape, Figure 10 shows the three images randomly selected from the CRACK500 [36], CFD [39] and GAPs384 [37] datasets and the segmentation results of the proposed method in this paper. The rectangular boxes in the left image and right image, respectively, represent the selected crack region in the original image and the extracted crack (d) RCF [41]; (e) HED [11]; (f) FPHBN [12]; (g) DAUNet [18]; (h) SPLAC U-Net [23]; (i) our method.
It can be seen from Figure 9 that the segmentation of cracks of FCN [40] is the most incomplete, especially for the slender cracks appearing in the datasets CrackTree200 [34], AEL [38] and CFD [39]. Compared with FCN [40], the result of RCF [41] is slightly better, but the segmentation results contain a lot of noise, and the boundary of the segmented crack is also fuzzy. The segmented cracks by FPHBN [12] and HED [11] are thicker than the ground truth, and the extracted slender cracks are incomplete, that is, an intact crack is divided into broken parts. DAUNet [18] and SPLAC U-Net [23] can accurately locate cracks; however, the details of extracted cracks are not relatively obvious. As shown in the detection results from columns 3 to 9 of the fifth row in Figure 9, although all the comparison methods detect the interference object as a crack to some extent, the false detection ratio of the proposed method is the minimum. Although the crack segmentation result of the proposed method contains some noise and interference objects, it is the most competitive compared with other methods.
In order to verify the good segmentation performance for the cracks with an abruptly changed shape, Figure 10 shows the three images randomly selected from the CRACK500 [36], CFD [39] and GAPs384 [37] datasets and the segmentation results of the proposed method in this paper. The rectangular boxes in the left image and right image, respectively, represent the selected crack region in the original image and the extracted crack region in the segmentation image. It can be seen from the Figure 10 that the proposed method can easily identify the cracks at the turning, thinning and thickening points. When the lighting conditions change during the shooting of the crac ent parts of the same crack may show different colors and brightness valu target visually easily misclassified as different objects. Under polarized discrimination of the depth features extracted by U-Net network would which inevitably leads to the decline in target segmentation performance. bias-light phenomenon also brings some challenges to manual labeling. In verify the good crack segmentation performance of the proposed metho bias-light, three crack images with an obvious light change, a shadow cha low illumination were selected, respectively, for comparison experim shows the original crack images and the segmented cracks by FCN [40], [11], FPHBN [12], DAUNet [18], SPLAC U-Net [23] and our method, r shown in Figure 11, the segmentation results of FCN [40], RCF [41] and HE a lot of noise pollution, resulting in poor segmentation performance. Com [40], RCF [41], HED [11] and FPHBN [12], the segmentation performance and SPLAC U-Net [23] was better. However, as shown in the sixth and se of the second row of Figure 11, cracks could hardly be detected by thes When the lighting conditions change during the shooting of the crack image, different parts of the same crack may show different colors and brightness values, making the target visually easily misclassified as different objects. Under polarized light, the target discrimination of the depth features extracted by U-Net network would become worse, which inevitably leads to the decline in target segmentation performance. In addition, the bias-light phenomenon also brings some challenges to manual labeling. In order to further verify the good crack segmentation performance of the proposed method in the case of bias-light, three crack images with an obvious light change, a shadow change and under low illumination were selected, respectively, for comparison experiments. Figure 11 shows the original crack images and the segmented cracks by FCN [40], RCF [41], HED [11], FPHBN [12], DAUNet [18], SPLAC U-Net [23] and our method, respectively. As shown in Figure 11, the segmentation results of FCN [40], RCF [41] and HED [11] involved a lot of noise pollution, resulting in poor segmentation performance. Compared to FCN [40], RCF [41], HED [11] and FPHBN [12], the segmentation performance of DAUNet [18] and SPLAC U-Net [23] was better. However, as shown in the sixth and seventh columns of the second row of Figure 11, cracks could hardly be detected by these two methods under low illumination. Although there were a few false alarms and missed detection, the proposed method still achieved the best crack segmentation performance under a variety of conditions. The excellent performance benefitted from the use of white top-hat transformation and black bottom-hat transformation to eliminate the influence of polarized light and enhance the features of the crack.  (c) RCF [41]; (d) HED [11]; (e) FPHBN [12]; (f) DAUNet [18]; (g) SPLAC U-Net [23]; (h) our method.

Comparison and Analysis of Objective Results
In order to verify the good performance of the proposed method on each dataset, Tables 2-6 show the evaluation and comparison results of the ODS, OIS, AIU, sODS and sOIS indicators of some algorithms on the five datasets. From the results in these tables, it can be seen that the FCN [40] had the worst performance on the five datasets, followed by RCF [41]. Both HED [11] and FPHBN [12] had a similar performance, which had a certain performance improvement compared with FCN [40] and RCF [41]. DAUNet [18] and SPLAC U-Net [23] had some improvements compared with the four other networks; however, our method showed absolute advantages in the five datasets. The average ODS, OIS, AIU, sODS and sOIS of the proposed method on the five crack datasets were 75.7%, 73.9%, 36.4%, 52.4% and 52.2%, respectively.  (c) RCF [41]; (d) HED [11]; (e) FPHBN [12]; (f) DAUNet [18]; (g) SPLAC U-Net [23]; (h) our method.

Comparison and Analysis of Objective Results
In order to verify the good performance of the proposed method on each dataset, Tables 2-6 show the evaluation and comparison results of the ODS, OIS, AIU, sODS and sOIS indicators of some algorithms on the five datasets. From the results in these tables, it can be seen that the FCN [40] had the worst performance on the five datasets, followed by RCF [41]. Both HED [11] and FPHBN [12] had a similar performance, which had a certain performance improvement compared with FCN [40] and RCF [41]. DAUNet [18] and SPLAC U-Net [23] had some improvements compared with the four other networks; however, our method showed absolute advantages in the five datasets. The average ODS, OIS, AIU, sODS and sOIS of the proposed method on the five crack datasets were 75.7%, 73.9%, 36.4%, 52.4% and 52.2%, respectively.

Comparison and Analysis of Ablation Experiments
Ablation experiments were performed to further verify the contribution of each module to the segmentation performance. Figure 12a-f show the original crack image, the ground truth and the segmentation results using only the single-objective loss function, the multi-objective loss function, only the morphological module and a combination of the multi-loss function and the morphological module, respectively. It can be seen from Figure 12 and Table 7 that the segmentation result using only the single-objective loss function was the worst, followed by multi-loss function. The segmentation result of using the morphological module and single-loss function was better, and the segmentation result of the combination of the multi-loss function and the morphological module was the best. Compared with the basic U-Net segmentation network, the segmentation performance indicators ODS, OIS, AIU, sODS and sOIS of our method were increased by 19.8%, 15.2%, 20.9%, 26.6% and 26.0%, respectively.

Comparison and Analysis of Computational Cost
Normally, the FLOPs (floating point operations) and the PARAMS (parameter amount of the neural network) are used to measure the complexity and size of the neural network model, respectively. The FLOPs and PARAMS of our method were about 65.23G and 32.45M, respectively. Compared to the U-Net backbone network, the FLOPs and PARAMS of the combination of the morphological-processing network and the multi-loss function module were only increased by 20.31G and 1.41M, respectively.
We further investigated the execution performance of our crack detection method. Experimental tests were performed on NVIDIA GeForce RTX 2080 SUPER. The crack segmentation ran on average at 3.0fps, and the computational costs included reading crack images, image preprocessing, feature extraction and fusion and crack segmentation.

Conclusions
A novel crack segmentation method using a combination of the U-Net backbone network, a morphological-processing network and multi-loss function was proposed in this paper. Aiming at the problem that the existing depth network models lack an effective processing mechanism to eliminate the influence of polarized light, which leads to inaccurate crack segmentation, this paper has designed a morphological-processing network module composed of a white-top hat transformation and a black-bottom hat transformation. This module can correct the influence of uneven lighting and effectively extract the bright cracks on a dark background and the dark cracks on a bright background. In order to avoid the problem of too thin or too thick cracks segmented on a single scale, this paper extracts and fuses the cracks on each scale to obtain more accurate cracks. In addition, a multi-loss function was designed to enhance the robustness of the network model. The experimental results on five crack datasets prove that the proposed algorithm has absolute superiority over several classical crack segmentation algorithms.  We further investigated the execution performance of our crack detection method. Experimental tests were performed on NVIDIA GeForce RTX 2080 SUPER. The crack segmentation ran on average at 3.0 fps, and the computational costs included reading crack images, image preprocessing, feature extraction and fusion and crack segmentation.

Conclusions
A novel crack segmentation method using a combination of the U-Net backbone network, a morphological-processing network and multi-loss function was proposed in this paper. Aiming at the problem that the existing depth network models lack an effective processing mechanism to eliminate the influence of polarized light, which leads to inaccurate crack segmentation, this paper has designed a morphological-processing network module composed of a white-top hat transformation and a black-bottom hat transformation. This module can correct the influence of uneven lighting and effectively extract the bright cracks on a dark background and the dark cracks on a bright background. In order to avoid the problem of too thin or too thick cracks segmented on a single scale, this paper extracts and fuses the cracks on each scale to obtain more accurate cracks. In addition, a multi-loss function was designed to enhance the robustness of the network model. The experimental results on five crack datasets prove that the proposed algorithm has absolute superiority over several classical crack segmentation algorithms.