Automatic Pixel-Level Pavement Crack Recognition Using a Deep Feature Aggregation Segmentation Network with a scSE Attention Mechanism Module

Pavement crack detection is essential for safe driving. The traditional manual crack detection method is highly subjective and time-consuming. Hence, an automatic pavement crack detection system is needed to facilitate this progress. However, this is still a challenging task due to the complex topology and large noise interference of crack images. Recently, although deep learning-based technologies have achieved breakthrough progress in crack detection, there are still some challenges, such as large parameters and low detection efficiency. Besides, most deep learning-based crack detection algorithms find it difficult to establish good balance between detection accuracy and detection speed. Inspired by the latest deep learning technology in the field of image processing, this paper proposes a novel crack detection algorithm based on the deep feature aggregation network with the spatial-channel squeeze & excitation (scSE) attention mechanism module, which calls CrackDFANet. Firstly, we cut the collected crack images into 512 × 512 pixel image blocks to establish a crack dataset. Then through iterative optimization on the training and validation sets, we obtained a crack detection model with good robustness. Finally, the CrackDFANet model verified on a total of 3516 images in five datasets with different sizes and containing different noise interferences. Experimental results show that the trained CrackDFANet has strong anti-interference ability, and has better robustness and generalization ability under the interference of light interference, parking line, water stains, plant disturbance, oil stains, and shadow conditions. Furthermore, the CrackDFANet is found to be better than other state-of-the-art algorithms with more accurate detection effect and faster detection speed. Meanwhile, our algorithm model parameters and error rates are significantly reduced.


Introduction
As an important part of the transportation hub, highways not only bear the heavy responsibility of transporting goods, but also concerns for the safety of transport personnel. However, due to natural or human factors, highways suffer from various damages. Among all kinds of pavement damage, cracks re the most common type and an early form of damage, which poses a latent threat to highway safety. To keep the pavement in good condition, it is an important task for the transportation and maintenance department to locate and repair the cracks in a timely way [1].
Manual crack detection methods are utterly dependent on the knowledge and experience of the inspection personnel, and these methods are very subjective, time-consuming, and labor-intensive. In addition, there are other crack detection techniques based on traditional image algorithms, such as edge detection [2,3] and image processing [4][5][6]. Although For pixel-level crack detection tasks, there are only two classes: cracks and non-cracks. When crack samples are seriously insufficient in the training set, the model may incorrectly classify part of the crack samples as non-crack samples, thereby reducing the accuracy of the model. Solutions can be summarized into three categories: data-level methods, algorithm-level methods and hybrid methods [17]. To address the problem, algorithmlevel methods are applied in this work. These methods can modify the algorithm to consider category penalty and weight, or modify the decision threshold to reduce the deviation of negative category samples. This paper use the focal loss to solve the class imbalance problem [18]. The main contributions of this paper can be summarized as follows: (1) This paper remodified the network architecture of DFANet, and integrated the scSE attention mechanism behind each encoder module, and the novel pavement crack detection algorithm calls CrackDFANet, which can make full use of the multi-scale receiving field to refine the crack detection results and extract the high-level features and low-level details of the crack. This allow us to achieve better results between detection speed and detection accuracy. (2) The scSE attention mechanism was integrated to the DFANet, a set of weight coefficients are learned independently through the network, and the mechanism of "dynamic weighting" to emphasize the crack region of interest while suppressing the irrelevant background region. And the attention module can correlate the global information of the crack. In this way, it effectively improves the detection efficiency of the model and reduces the computational cost. (3) The focal loss function can decrease the weight of easy-to-classify samples, and the model can focus on hard-to-classify samples during training. This paper uses the focal loss to solve the category imbalance problem.
The rest of the work is arranged as follows. Firstly, Section 2 reviews the common crack detection algorithms. Then Section 3 introduces the production process of the datasets of this paper and four common public datasets. Next, Section 4 detailly introduces the model of crack detection algorithm proposed in this paper. Section 5 carries out experimental verification and analysis of the proposed crack detection algorithm. Section 6 calculates the error rate. Finally, Section 7 summarizes the main work and conclusions of the paper. For pixel-level crack detection tasks, there are only two classes: cracks and non-cracks. When crack samples are seriously insufficient in the training set, the model may incorrectly classify part of the crack samples as non-crack samples, thereby reducing the accuracy of the model. Solutions can be summarized into three categories: data-level methods, algorithm-level methods and hybrid methods [17]. To address the problem, algorithm-level methods are applied in this work. These methods can modify the algorithm to consider category penalty and weight, or modify the decision threshold to reduce the deviation of negative category samples. This paper use the focal loss to solve the class imbalance problem [18]. The main contributions of this paper can be summarized as follows: (1) This paper remodified the network architecture of DFANet, and integrated the scSE attention mechanism behind each encoder module, and the novel pavement crack detection algorithm calls CrackDFANet, which can make full use of the multi-scale receiving field to refine the crack detection results and extract the high-level features and low-level details of the crack. This allow us to achieve better results between detection speed and detection accuracy. (2) The scSE attention mechanism was integrated to the DFANet, a set of weight coefficients are learned independently through the network, and the mechanism of "dynamic weighting" to emphasize the crack region of interest while suppressing the irrelevant background region. And the attention module can correlate the global information of the crack. In this way, it effectively improves the detection efficiency of the model and reduces the computational cost. (3) The focal loss function can decrease the weight of easy-to-classify samples, and the model can focus on hard-to-classify samples during training. This paper uses the focal loss to solve the category imbalance problem.
The rest of the work is arranged as follows. Firstly, Section 2 reviews the common crack detection algorithms. Then Section 3 introduces the production process of the datasets of this paper and four common public datasets. Next, Section 4 detailly introduces the model of crack detection algorithm proposed in this paper. Section 5 carries out experimental verification and analysis of the proposed crack detection algorithm. Section 6 calculates the error rate. Finally, Section 7 summarizes the main work and conclusions of the paper.

Related Work
The pavement crack detection is essential for safe driving and transportation. For a long time, crack detection has received extensive concern by the academic and engineering circles in the world, and it has obtained some good research results. In this section, we will review and summarize some typical crack detection algorithms.
In this paper, the crack detection methods based on the non-deep learning technologies are called the traditional crack detection methods. These methods can classify into five types: (1) wavelet transform-based methods, (2) image thresholding-based methods, (3) traditional machine learning-based methods, (4) edge detection-based methods, (5) minimal path-based methods. These methods are successfully used for the crack detection, such as Gabor filters [19], histogram of oriented gradient (HOG) [20], local binary pattern (LBP) [21], and minimal path selection (MPS) [22]. Although these algorithms achieve better results than manual detection, it is essential to note that they are insufficient to differentiate between cracks and complex backgrounds in low-level images.
Recently, some deep learning technologies have been successfully used for crack detection and have achieved better detection results, such as object detection, image classification and image segmentation. The accuracy attained by the convolutional neural network (CNN) based networks has extremely surpassed the accuracy achieved by traditional crack detection methods [23][24][25]. Reference [26] used the object detection method to detect the location of cracks in pavement images. References [27,28] divided pavement crack images into smaller image blocks by using the road grid or sliding windows, and then used a CNN to recognize whether the image block includes a crack or not. In [29], the shallow CNN-based architecture for concrete crack detection was proposed. This model enabled the employment of deep learning algorithms using low-power computational devices for a hassle-free monitoring of civil structures. Le et al. proposed a crack detection algorithm based on a DL convolutional neural network, which achieved a good crack classification effect [30]. Ali et al. proposed a customized convolutional neural network for crack detection in concrete structures, which can achieve higher crack detection accuracy [31]. In [32], a two-stage data enhancement method was used to construct a CNN-based crack detector. The results show that the crack detector with two-stage data enhancement training on a small dataset has a better crack detection effect.
Although the methods above can accurately locate cracks, they cannot detect cracks at a pixel-level. For this reason, Huang et al. used the FCN for the crack detection and achieved high precision [33]. Zou et al. proposed an end-to-end deep convolutional neural network (DeepCrack) to realize the automatic detection of cracks by learning highlevel characteristics of cracks. The method incorporates features of the multi-scale deep convolution learned in the hierarchical convolution stage to capture linear structures, which can achieve better detection results [34]. In reference [35], a lightweight end-to-end pixel by pixel classification network (SegNet) was used to detect cracks. In this work, SegNet used max-pooling indexes calculated in the pooling step of the encoder to implement non-linear up-sampling in the matching decoder, which doesn't need to learn in the upsample. The trained SegNet model can segment crack images in any size image with the help of sliding window scanning technology. Reference [36] showed that deeper backbone networks in FCN models and skip connections in U-Net both improved the performance of the crack detection. Li et al. combined generative adversarial network (GAN) with fully convolutional DenseNet (FC-DenseNet), and proposed a pavement crack detection method based on adversarial learning semi-supervised semantic segmentation. The improved algorithm can use a small number of labeled datasets to train a crack detection model with better robustness. However, when detecting relatively thin cracks, the effect is still not very satisfactory [37]. In [38], a block crack detection method was proposed, which divided the input image into block units, judged whether there were cracks in each block through the classifier, and then segmented the cracks from the classified blocks. This method can recognize the images of different sizes of cracks. Most deep learning-based crack detection algorithms are not effective when detecting cracks with uneven intensity, complex noise interference, complex topology, etc. Meanwhile, due to the characteristics of the network themselves, the number of parameters is relatively large, and it is difficult to achieve a balance between crack detection speed and crack detection performance.

Data Collection
This section introduces the production process of datasets. To show that the algorithm proposed in this paper has good generalization ability, four public datasets are selected for the experimental analysis.

Data Acquisition Process
In the two types of asphalt and concrete pavement, this paper selects sunny and cloudy weather conditions to take pavement crack images at different times of the day. Compared with the concrete pavement, the asphalt pavement is rougher and the background is more complex. As shown in Figure 2, an FDR-AX60 camera (SONY, Shanghai, China) was used to take 1000 crack images with a resolution of 3840 × 2160 pixels on five highways. Then the images captured by the camera were encoded and compressed through the image acquisition card, and stored in the memory of the computer to complete the collection of crack images. Most deep learning-based crack detection algorithms are not effective when detecting cracks with uneven intensity, complex noise interference, complex topology, etc. Meanwhile, due to the characteristics of the network themselves, the number of parameters is relatively large, and it is difficult to achieve a balance between crack detection speed and crack detection performance.

Data Collection
This section introduces the production process of datasets. To show that the algorithm proposed in this paper has good generalization ability, four public datasets are selected for the experimental analysis.

Data Acquisition Process
In the two types of asphalt and concrete pavement, this paper selects sunny and cloudy weather conditions to take pavement crack images at different times of the day. Compared with the concrete pavement, the asphalt pavement is rougher and the background is more complex. As shown in Figure 2, an FDR-AX60 camera (SONY, Shanghai, China) was used to take 1000 crack images with a resolution of 3840  2160 pixels on five highways. Then the images captured by the camera were encoded and compressed through the image acquisition card, and stored in the memory of the computer to complete the collection of crack images.  To ensure the diversity of crack images in the dataset, the crack datasets containing water stains, oil stains, parking lines, plant interference, pseudo cracks, etc. are collected at different lighting stages of the day. These crack images containing noise become training samples for network models, which can improve the robustness of the crack detection algorithm.

Data Augmentation and Annotation
The number of crack images collected in this paper is relatively small and the size is relatively large, it is not conducive to the training of the model. In addition, the insufficient number of datasets is likely to cause overfitting in the model training. Therefore, it is necessary to use the image preprocessing to enhance the collected crack images. Through the image enhancement, the number of images can be increased. The image enhancement methods in this paper include a clockwise rotation of 90° and 180°, horizontal flip, and To ensure the diversity of crack images in the dataset, the crack datasets containing water stains, oil stains, parking lines, plant interference, pseudo cracks, etc. are collected at different lighting stages of the day. These crack images containing noise become training samples for network models, which can improve the robustness of the crack detection algorithm.

Data Augmentation and Annotation
The number of crack images collected in this paper is relatively small and the size is relatively large, it is not conducive to the training of the model. In addition, the insufficient number of datasets is likely to cause overfitting in the model training. Therefore, it is necessary to use the image preprocessing to enhance the collected crack images. Through the image enhancement, the number of images can be increased. The image enhancement methods in this paper include a clockwise rotation of 90 • and 180 • , horizontal flip, and random color jittering. Then randomly cut into the dataset of 512 × 512 pixels for model training. No data augmentation is used during the testing.
In addition, this paper used the open-sourced deep learning labeling tool LabelMe [39] to manually label the crack images at the pixel level. In the manual marking process, this paper classified and marked the cracks in the image, and then saved them in the x.json file containing the image itself and related annotations. The information in the x.json file was converted into label images and label visualization, and then converted into binary label images. This paper repeated the above labeling process, and finally used the Python program to generate the corresponding label images in batches. Partial image annotation results are shown in Figure 3.  [39] to manually label the crack images at the pixel level. In the manual marking process, this paper classified and marked the cracks in the image, and then saved them in the x.json file containing the image itself and related annotations. The information in the x.json file was converted into label images and label visualization, and then converted into binary label images. This paper repeated the above labeling process, and finally used the Python program to generate the corresponding label images in batches. Partial image annotation results are shown in Figure 3. As shown in Table 1, this paper selects 3320 crack images from our crack datasets for the training and verification of the deep learning model. Crack images include asphalt cracks and concrete cracks, and this dataset is more extensive. The training set, validation set, and test set are divided at a ratio of 6: 2: 2, of which 2000 images in the training set and 660 images in the verification set are used to train and optimize the model, and 660 images in the test set are used to verify the effect of the model.

Public Datasets
In order to further verify the effect of the CrackDFANet model on the public pavement crack dataset and reflect the generalization ability of the model, this paper selects four public pavement crack datasets for the experimental verification. The labels of all public datasets use the original label images. The characteristics of the dataset are shown in Table 2.

Datasets
Image Size (Pixels) Number of Images GAPs384 [40] 1920 × 1080 1969 Crack500 [41] 2000 × 1500 500 AigleRN [22] 768 × 512 269 CFD [1] 480 × 320 118 As shown in Table 1, this paper selects 3320 crack images from our crack datasets for the training and verification of the deep learning model. Crack images include asphalt cracks and concrete cracks, and this dataset is more extensive. The training set, validation set, and test set are divided at a ratio of 6 : 2 : 2, of which 2000 images in the training set and 660 images in the verification set are used to train and optimize the model, and 660 images in the test set are used to verify the effect of the model.

Public Datasets
In order to further verify the effect of the CrackDFANet model on the public pavement crack dataset and reflect the generalization ability of the model, this paper selects four public pavement crack datasets for the experimental verification. The labels of all public datasets use the original label images. The characteristics of the dataset are shown in Table 2. Table 2. The characteristics of the public crack dataset.

Methods
For the purpose of obtaining more detailed characteristics of the input images to get better crack detection results in the output, we should acquire multi-layer semantic information through different layers of the convolution operation. By this means, both high-level features and low-level details can be maintained. This paper uses the concept of DFANet [16] as the basic network backbone for the pavement crack detection. Furthermore, to enhance important information features, while suppressing unimportant information features in space and channels, and effectively improve the accuracy of crack detection, we integrated the scSE module to optimize the model. Thus, the proposed network structure is called CrackDFANet. In the work of DFANet, the authors used ImageNet pretrained network model to achieve comprehensive feature extraction. Since our crack detection is a pixel-level binary classification task, and we have enough training dataset including all kinds of noise interference, so the proposed CrackDFANet does not need pretrained network models. This section first introduces the CrackDFANet in detail, and then introduces the loss function and the training process of the model.

Overall Architecture of the Proposed Method
In this work, the crack detection is described as a binary classification task at pixel-level. Given a crack image, the designed method will generate a crack prediction image. Figure 4 shows the structure of the proposed CrackDFANet. The model structure is explained in detail below.

Methods
For the purpose of obtaining more detailed characteristics of the input images to get better crack detection results in the output, we should acquire multi-layer semantic information through different layers of the convolution operation. By this means, both highlevel features and low-level details can be maintained. This paper uses the concept of DFANet [16] as the basic network backbone for the pavement crack detection. Furthermore, to enhance important information features, while suppressing unimportant information features in space and channels, and effectively improve the accuracy of crack detection, we integrated the scSE module to optimize the model. Thus, the proposed network structure is called CrackDFANet. In the work of DFANet, the authors used ImageNet pretrained network model to achieve comprehensive feature extraction. Since our crack detection is a pixel-level binary classification task, and we have enough training dataset including all kinds of noise interference, so the proposed CrackDFANet does not need pretrained network models. This section first introduces the CrackDFANet in detail, and then introduces the loss function and the training process of the model.

Overall Architecture of the Proposed Method
In this work, the crack detection is described as a binary classification task at pixellevel. Given a crack image, the designed method will generate a crack prediction image. Figure 4 shows the structure of the proposed CrackDFANet. The model structure is explained in detail below.  As shown in Figure 4, the CrackDFANet consists of five parts: (1) the lightweight backbone network, (2) the scSE attention mechanism module, (3) the sub-network aggregation module, (4) the sub-stage aggregation module, (5) the dual-path decoder. It has been proved that the deepwise separable convolution (a depthwise convolution followed by a pointwise convolution) is one of the most effective methods for real-time reasoning. Compared with the traditional convolution, the deepwise separable convolution reduces As shown in Figure 4, the CrackDFANet consists of five parts: (1) the lightweight backbone network, (2) the scSE attention mechanism module, (3) the sub-network aggregation module, (4) the sub-stage aggregation module, (5) the dual-path decoder. It has been proved that the deepwise separable convolution (a depthwise convolution followed by a pointwise convolution) is one of the most effective methods for real-time reasoning. Compared with the traditional convolution, the deepwise separable convolution reduces the computation cost and the number of parameters while maintaining similar (or slightly better) performance. We alter the Xception network [42] to the backbone framework. Depthwise separable convolution is used in the Xception network, which can reduce the total amount of calculations and parameters while maintaining good performance. To pursue higher accuracy, we fuse the scSE attention mechanism module to recalibrate the output characteristics of each encoder block in both the spatial and the channel aspects. The sub-network aggregation module is centered on up-sampling the inputs from the previous backbone to the next backbone on the high-level feature graphs to refine the crack prediction results. From another point of view, the sub-network aggregation module can be viewed as a process of pixel classification from rough to fine. The sub-stage aggregation module is performed by assembling feature representations between the matching phases through "rough" part and "fine" part, which provides the receptive domain and highdimensional framework details by connecting the layers with the same dimension. After these modules, a dual-path decoder made up of deconvolution and bilinear up-sampling operations is used to connect the outputs of each phase to obtain the crack detection results from rough to fine. The following describes the network structure in more detail: The primary backbone is the lightweight Xception network, which makes some changes to crack detection tasks. In the first convolution, this paper only uses ordinary convolution with a convolution kernel of 3 × 3, and other convolutions use the deepwise separable convolution with a convolution kernel of 3 × 3. Table 3 summarizes the detailed architecture of the Xception model. For crack detection, it is not only necessary to provide dense feature representation, but also how to obtain effective semantic context information is also a problem. For this reason, we fuse the scSE attention mechanism module to simultaneously recalibrate the output characteristics of each code block in terms of space and channel, which can effectively improve the accuracy of crack detection. Table 3. The detailed architecture of the Xception model (3 × 3 means a depthwise separable convolution except "conv1". In "conv1" stage, we only implement a 3 × 3 convolution layer).

Stage
Kernel Size Padding Stride As shown in Figure 5, the essence of scSE module is the superposition of two attention mechanisms. After each backbone output U of the encoder enters the scSE module, it will enter two branches, the upper branch is called sSE branch. Through a 1 × 1 convolution operation, the branch obtains a weight matrix with the same length and width as the U. the matrix is multiplied with U to obtain the feature U sSE that is recalibrated in space. The following branch is called cSE branch. This branch first passes a maximum pooling operation to obtain a weight matrix with the same number of channels as the U. This matrix passes through two fully connected (FC) layers, the number of neurons in the first FC layer is half the number of channels (to reduce computational complexity). The number of the second FC layer is equal to the number of channels. After the activation function, the nonlinearity is added, and the matrix restored to the number of channels through the second FC layer is multiplied by U to obtain the characteristic U cSE recalibrated in the channel direction. Finally, merge the features recalibrated along the channel and the space

Sub-Network Aggregation Module
Sub-network aggregation accomplishes the association of high-level features at the network level. In this paper, our crack detection network architecture consists of a series of backbone networks, which provide the output of the prior backbone network to the next as input. From another point of view, sub-network aggregation can be viewed as a refining process. The backbone process is expressed as = ϕ( ), and the output of encoder ϕ is the input of the encoder ϕ +1 . Therefore, the sub-network aggregate process can be expressed as:

Sub-Stage Aggregation Module
The focus of sub-stage aggregation is to fuse semantic and spatial information at the stage level among multiple networks. With the increase of network depth, the spatial details are gradually lost. Common methods, such as the U-shape, implement skipping connections to restore image details in the decoder module. Nevertheless, due to the lack of low-level features and spatial information in deeper encoder blocks, so they cannot judge a wide variety of objects and precise structural edges. The parallel branch design usages the original and reduced resolutions as input, and the output is the fusion of the results of large-scale and small-scale branches. However, this design lacks information interaction between parallel branches.
In this paper, sub-stage aggregation is used to combine features through the encoding period. Under the same depth of the sub-network, the different stages are fused. In more detail, the output of one stage in the prior sub-network contributes to the input of the matching stage position of the next sub-network.

Sub-Network Aggregation Module
Sub-network aggregation accomplishes the association of high-level features at the network level. In this paper, our crack detection network architecture consists of a series of backbone networks, which provide the output of the prior backbone network to the next as input. From another point of view, sub-network aggregation can be viewed as a refining process. The backbone process is expressed as y = φ(x), and the output of encoder φ n is the input of the encoder φ n+1 . Therefore, the sub-network aggregate process can be expressed as:

Sub-Stage Aggregation Module
The focus of sub-stage aggregation is to fuse semantic and spatial information at the stage level among multiple networks. With the increase of network depth, the spatial details are gradually lost. Common methods, such as the U-shape, implement skipping connections to restore image details in the decoder module. Nevertheless, due to the lack of low-level features and spatial information in deeper encoder blocks, so they cannot judge a wide variety of objects and precise structural edges. The parallel branch design usages the original and reduced resolutions as input, and the output is the fusion of the results of large-scale and small-scale branches. However, this design lacks information interaction between parallel branches.
In this paper, sub-stage aggregation is used to combine features through the encoding period. Under the same depth of the sub-network, the different stages are fused. In more detail, the output of one stage in the prior sub-network contributes to the input of the matching stage position of the next sub-network.
For a unitary backbone network φ n (x), a stage process can be defined as φ i n . The previous stage in the backbone network is φ i n−1 . i is the index of the stage. The sub-stage aggregation approach can be expressed as: where x i−1 n is coming from: In our algorithm, the stage aggregation approach is to learn the residual formula of x i−1 n , x i n−1 at the beginning of each stage. In case of n > 1, the input of the ith stage of the nth network is obtained by the output of the ith stage of the (n − 1)th network and the output of the (i − 1)th stage of the nth network, and then the ith stage learns a residual representation of x i−1 n , x i n−1 . x i−1 n and x i n−1 have the same resolution, and we achieve the fusion characteristics of concatenation operation.
We always keep the features flowing from high resolution to low resolution. Equation (2) not only learns the n − th feature maps, but also retains (n − 1) − th features and receptive field. Meanwhile, Information flows can be transmitted over multiple networks.

The Dual-Path Decoder
After the crack feature is extracted by the encoder network, the corresponding decoder network needs to be used to restore the feature map to a binary image with the same size as the input image, so as to obtain the detection result of the crack. To improve the real-time reasoning ability of the CrackDFANet model, the decoder network in this paper is designed as an efficient feature up-sampling module, which combines the crack features from the low-level and high-level of the image.
In the decoder network of CrackDFANet, this paper uses convolution and bilinear upsampling to directly fuse the high-level features and low-level details of the crack to form a simple decoder network. Since the encoder network of CrackDFANet is composed of three backbone networks, this paper first merges high-level features from the bottom of the three backbone networks, and then performs 4× bilinear up-sampling on these high-level features, and fuse them with the same spatial resolution. The low-level information of each backbone network is finally added with high-level features and low-level details, and 4× bilinear up-sampling is performed to obtain the final crack detection result. In the decoder module, to decrease the number of channels, a small number of convolution operations are used.

Loss Function
In the deep learning model, the loss function is the most crucial mathematic constituent part. The loss function is applied to measure the deviation between the predicted value and the true value [43]. Therefore, the loss function is usually defined as the objective function of the crack detection model optimization. What we should pay attention to is that compared with the non-crack pixels, crack pixels occupy a small part of the crack image. Hence, the definition of total loss may be affected by this class imbalance between the two categories of cracks and non-cracks. For this reason, this paper uses a novel loss function: focal loss (FL) solves the class imbalance problem [18]. The FL is modified on the basis of the standard cross-entropy loss. And the FL function can decrease the weight of easy-to-classify samples. Therefore, the model can center on hard-to-classify samples during training. The FL can be expressed as: where the weight factor α t = 0.25, (1 − p t ) γ represents the modulation coefficient. (1 − p t ) γ is added, the loss obtained by the sample with a large prediction probability will be reduced, and the loss of the sample with a small prediction probability will become large, thereby strengthening the attention to the positive sample. The p t is defined as: where y ∈ {0, 1} is the ground truth class, p ∈ {0, 1} is the model's estimated probability for label y = 1. t ∈ {0, 1} represents the category number, t = 0 denotes the background pixel, and t = 1 denotes the crack pixel. The properties of focal loss are as follows: (1) When an example is misclassified. At this time, p t is small, and then the modulation factor (1 − p t ) is close to 1, and the loss is not affected. At this time, the model enhances the focus on positive samples. When p t → 1 , the modulation factor (1 − p t ) is close to 0, which indicates that the classification is true and the samples are easy to classify at this time, and the modulation coefficient (1 − p t ) γ will approach 0, that is, the sample contributes little to the total loss. (2) The focusing parameter γ(γ ≥ 0) smoothly adjusts the proportion of the weight of the easy-to-divide samples to be reduced. Increasing γ can enhance the influence of modulation factors. γ = 2 in this paper.

Model Training and Optimization
To obtain a more robust crack segmentation model, this paper selects 2000 crack datasets, including water stains, oil stains, parking lines, plant disturbances, rough backgrounds, etc. as the training set, and 660 datasets as the validation set for the model training. These datasets contain asphalt crack images and concrete crack images. The training structure diagram is shown in Figure 6.
will be reduced, and the loss of the sample with a small prediction probability will becom large, thereby strengthening the attention to the positive sample. The is defined as: where ∈ {0,1} is the ground truth class, ∈ {0,1} is the model's estimated probabilit for label = 1. ∈ {0,1} represents the category number, = 0 denotes the backgroun pixel, and = 1 denotes the crack pixel. The properties of focal loss are as follows: (1) When an example is misclassified. At this time, is small, and then the modulatio factor (1 − ) is close to 1, and the loss is not affected. At this time, the model en hances the focus on positive samples. When → 1, the modulation factor (1 − is close to 0, which indicates that the classification is true and the samples are easy t classify at this time, and the modulation coefficient (1 − ) will approach 0, that i the sample contributes little to the total loss.
(2) The focusing parameter ( ≥ 0) smoothly adjusts the proportion of the weight o the easy-to-divide samples to be reduced. Increasing can enhance the influence o modulation factors. = 2 in this paper.

Model Training and Optimization
To obtain a more robust crack segmentation model, this paper selects 2000 crack da tasets, including water stains, oil stains, parking lines, plant disturbances, rough back grounds, etc. as the training set, and 660 datasets as the validation set for the model train ing. These datasets contain asphalt crack images and concrete crack images. The trainin structure diagram is shown in Figure 6. Meanwhile, this paper applies the Adam algorithm to optimize the model durin model training. The Adam algorithm can calculate the adaptive learning rate of differen parameters from the first-order and the second-order estimation of the loss function. I particular, the Adam algorithm combines the advantages of two stochastic gradient de scent algorithms. The first is the AdaGrad algorithm, which adopts a specific learning rat for each parameter to improve the performance of the spare gradient. The second is th Meanwhile, this paper applies the Adam algorithm to optimize the model during model training. The Adam algorithm can calculate the adaptive learning rate of different parameters from the first-order and the second-order estimation of the loss function. In particular, the Adam algorithm combines the advantages of two stochastic gradient descent algorithms. The first is the AdaGrad algorithm, which adopts a specific learning rate for each parameter to improve the performance of the spare gradient. The second is the RMSProp algorithm, which adaptively assigns a learning rate to each parameter based on the average of the closest magnitude of the weighted gradient [44].
During the model training process, the model is saved every 10 epochs, and then each saved model is tested, as shown in Figure 7. The experimental results show that the model has the best detection effect when it is trained for 600 epochs. After training, the model appears over-fitting phenomenon, and the results of crack model detection become worse. Therefore, this paper chooses to train 600 epochs.
Sensors 2021, 21, x FOR PEER REVIEW RMSProp algorithm, which adaptively assigns a learning rate to each parameter the average of the closest magnitude of the weighted gradient [44].
During the model training process, the model is saved every 10 epochs, and t saved model is tested, as shown in Figure 7. The experimental results show that th has the best detection effect when it is trained for 600 epochs. After training, th appears over-fitting phenomenon, and the results of crack model detection becom Therefore, this paper chooses to train 600 epochs.  As shown in Figure 8, through 600 epochs iteration, a real-time crack segm model with good robustness is obtained. In this time, the loss of model training co to 0.024.

Experiment and Analysis
The previous section introduced the crack detection algorithm proposed in per. This section will verify and discuss the proposed algorithm through expe Firstly, we introduce the implementation details of the proposed CrackDFANet. explain the evaluation criteria. Next, we introduce the compared methods. Fin analyze and discuss the experimental results.

Computation Platform
All algorithms in this paper used Python 3 for programming. The algorith As shown in Figure 8, through 600 epochs iteration, a real-time crack segmentation model with good robustness is obtained. In this time, the loss of model training converges to 0.024.
Sensors 2021, 21, x FOR PEER REVIEW 12 RMSProp algorithm, which adaptively assigns a learning rate to each parameter base the average of the closest magnitude of the weighted gradient [44]. During the model training process, the model is saved every 10 epochs, and then saved model is tested, as shown in Figure 7. The experimental results show that the m has the best detection effect when it is trained for 600 epochs. After training, the m appears over-fitting phenomenon, and the results of crack model detection become wo Therefore, this paper chooses to train 600 epochs.  As shown in Figure 8, through 600 epochs iteration, a real-time crack segmenta model with good robustness is obtained. In this time, the loss of model training conve to 0.024.

Experiment and Analysis
The previous section introduced the crack detection algorithm proposed in this per. This section will verify and discuss the proposed algorithm through experim Firstly, we introduce the implementation details of the proposed CrackDFANet. Then explain the evaluation criteria. Next, we introduce the compared methods. Finally analyze and discuss the experimental results.

Computation Platform
All algorithms in this paper used Python 3 for programming. The algorithms w implemented by using the deep learning framework Pytorch. The experimental softw environment is Windows 10 operating system, and the hardware environment is an In

Experiment and Analysis
The previous section introduced the crack detection algorithm proposed in this paper. This section will verify and discuss the proposed algorithm through experiments. Firstly, we introduce the implementation details of the proposed CrackDFANet. Then we explain the evaluation criteria. Next, we introduce the compared methods. Finally, we analyze and discuss the experimental results.

Computation Platform
All algorithms in this paper used Python 3 for programming. The algorithms were implemented by using the deep learning framework Pytorch. The experimental software environment is Windows 10 operating system, and the hardware environment is an Intel ® Quad-CoreTM i7-9750 CPU @ 3.6 GHz Processor, 16 GB RAM and an NVIDIA GeForce GTX 1080Ti 16 GB GPU.

Parameter Settings
The hyperparameters include: the momentum is set to 0.9, and the weight decay is set to 0.0001. The mini-batch stochastic gradient descent (MSGD) with a batch size of 8 is used to update the network parameters. As a general configuration, the "poly" learning rate strategy is applied where the initial rate is multiplied by 1 − iter max_iter power with power 0.9, and the base learning rate is set as 0.001 [16].

Evaluation Criteria
For the crack segmentation task, the generally applied evaluation indexes are: precision (P), recall (R), F1, accuracy (ACC) and mean intersection over union (MIoU). Crack pixels are defined as positive instances, while non-crack pixels are defined as false instances. According to the combination of ground truth and prediction, pixels are divided into four types: true positive (TP), false positive (FP), true negative (TN) and false negative (FN), as shown in Table 4. Then, P, R, F1, ACC and MIoU can be defined as follows:

Compared Methods
We compare the performance of CrackDFANet with FCN, SegNet, and U-Net crack detection algorithms. Meanwhile, to solve the problem of class imbalance, the loss function of all comparison methods in this paper chooses the focal loss. The following is a brief description of the methods compared: (1) FCN: We use FCN-8s to detect cracks by replacing the loss function with FL. The hyperparameter settings are as follows: the base learning rate is set to 1 × 10 −5 , the momentum is set to 0.99, and the weight decay is set to 0.0005. We train the FCN model on our dataset. (2) SegNet: This network can achieve end-to-end learning and crack segmentation by using an encoder network and its corresponding decoder network. Except that the basic learning rate is set to 1 × 10 −6 , the settings of other hyperparameters are the same as those in FCN. We train the SegNet model on our dataset.
(3) U-Net: This network uses skip-layer in the encoder-decoder network for crack segmentation. Except that the basic learning rate is set to 1 × 10 −3 , the settings of other hyperparameters are the same as those in FCN. We train the U-Net model on our dataset. (4) CrackDFANet: The CrackDFANet model is trained on our dataset.

Experiment Results and Discussion
To verify the crack detection effect of the trained and validated model described in Section 4, we compared and analyzed FCN, SegNet, and U-Net crack detection algorithms on our dataset, GAPs384 dataset, Crack500 dataset, AigleRN dataset and CFD dataset, respectively. The following is a comparative analysis of the experimental results on five datasets. In addition, to further compare and analyze the advantages of the proposed crack detection method, we have carried out experiments in some special cases, that is, light interference, water stains, oil pollution, and shadow, etc. During the test phase, the hyperparameter settings of all algorithms are the same as the training process.

Results on Our Dataset
On our dataset, we perform the crack detection effect verification. As shown in Figure 9, the crack detection results of five typical input images of this method and the comparison method are given. The first column is the original crack image, the second column is the label image, and the next four columns are the output images of the comparison detection algorithm. The first four rows of images are concrete cracks, the latter one is asphalt cracks. (3) U-Net: This network uses skip-layer in the encoder-decoder network for crack segmentation. Except that the basic learning rate is set to 1 × 10 −3 , the settings of other hyperparameters are the same as those in FCN. We train the U-Net model on our dataset. (4) CrackDFANet: The CrackDFANet model is trained on our dataset.

Experiment Results and Discussion
To verify the crack detection effect of the trained and validated model described in Section 4, we compared and analyzed FCN, SegNet, and U-Net crack detection algorithms on our dataset, GAPs384 dataset, Crack500 dataset, AigleRN dataset and CFD dataset, respectively. The following is a comparative analysis of the experimental results on five datasets. In addition, to further compare and analyze the advantages of the proposed crack detection method, we have carried out experiments in some special cases, that is, light interference, water stains, oil pollution, and shadow, etc. During the test phase, the hyperparameter settings of all algorithms are the same as the training process.

Results on Our Dataset
On our dataset, we perform the crack detection effect verification. As shown in Figure  9, the crack detection results of five typical input images of this method and the comparison method are given. The first column is the original crack image, the second column is the label image, and the next four columns are the output images of the comparison detection algorithm. The first four rows of images are concrete cracks, the latter one is asphalt cracks.

Original images
Label images FCN SegNet U-Net Ours Figure 9. The visualization of detection results of compared methods on our dataset (from left to right: original images, label images, FCN, SegNet, U-Net, our detection results). It can be seen from Figure 9 that all these algorithms can detect the crack region. However, for the thin crack and the crack image with noises, CrackDFANet obtains crack detection results close to the label image, while the other algorithms produce many false detections.
As shown in Table 5, we made a quantitative comparison of several deep learningbased crack detection algorithms. The MIoU of our crack segmentation algorithm is 0.8972, which is significantly higher than the other three crack segmentation algorithms. Meanwhile, our model parameter is 14.5 M, which is significantly reduced compared to other algorithms. The speed is an important factor in crack segmentation algorithms, and we try to compare crack segmentation models in the same state. In this paper, the speed of crack detection is measured by the frames per second (FPS). The detection speed of our crack segmentation algorithm reaches 64.5 FPS with 512 × 512 pixels. In other words, our algorithm takes 15.50 milliseconds to test an image. It can be seen that our algorithm has achieved a good balance between detection accuracy and detection speed. One way to obtain the model speed of the model is to summarize its calculations. We use floating point operations per second (FLOPs) to measure the computation speed of the model. From the experimental results in Table 5, it can be concluded that the calculation speed of our model is much faster than the speed of the other three comparison models.

Results on GAPs384 Dataset
To further verify the generalization ability of the CrackDFANet, we performed experimental verification on the public GAPs384 dataset (1920 × 1080 pixels). The crack images in this dataset are all asphalt cracks. Figure 10 shows the five most representative crack detection effects of different algorithms under the same conditions. As shown in Figure 10, the topological structure of the images in the GAPs384 dataset is complex, and noise interference is massive, which poses greater challenges to various crack detection algorithms. Under this kind of complicated noise interference, our algorithm still shows good detection performance, and FCN, SegNet, and U-Net crack detection algorithms are very willing to erroneously detect, such as judging parking lines, oil pollution, and manhole covers as crack regions. This is because our detection algorithm recognizes crack characteristics through the sub-network aggregation and the sub-stage aggregation. Sub-network aggregation can be regarded as a rough-to-fine process of the pixel classification. Sub-stage aggregation is mainly the fusion of semantic and spatial information at the stage level from multiple networks. Meanwhile, we fuse the scSE attention mechanism module to simultaneously recalibrate the output characteristics of each code block in terms of space and channel. In this way, the crack characteristics can be refined multiple times, thereby achieving better detection results.
In Table 6, we performed a quantitative comparison on the GAPs384 dataset. Our algorithm has an MIoU of 0.8723, and a detection speed is 8.2 FPS at 1920 × 1080 pixels, which is higher than the other three deep learning-based crack detection algorithms under the same conditions. As shown in Figure 10, the topological structure of the images in the GAPs384 dataset is complex, and noise interference is massive, which poses greater challenges to various crack detection algorithms. Under this kind of complicated noise interference, our algorithm still shows good detection performance, and FCN, SegNet, and U-Net crack detection algorithms are very willing to erroneously detect, such as judging parking lines, oil pollution, and manhole covers as crack regions. This is because our detection algorithm recognizes crack characteristics through the sub-network aggregation and the sub-stage aggregation. Sub-network aggregation can be regarded as a rough-to-fine process of the pixel classification. Sub-stage aggregation is mainly the fusion of semantic and spatial information at the stage level from multiple networks. Meanwhile, we fuse the scSE attention mechanism module to simultaneously recalibrate the output characteristics of each code block in terms of space and channel. In this way, the crack characteristics can be refined multiple times, thereby achieving better detection results.
In Table 6, we performed a quantitative comparison on the GAPs384 dataset. Our algorithm has an MIoU of 0.8723, and a detection speed is 8.2 FPS at 1920 × 1080 pixels, which is higher than the other three deep learning-based crack detection algorithms under the same conditions. In addition, to further compare the detection performance of the algorithms, we carried out experimental verification on a public Crack500 dataset with larger image size

Results on Crack500 Dataset
In addition, to further compare the detection performance of the algorithms, we carried out experimental verification on a public Crack500 dataset with larger image size (2000 × 1500 pixels). The crack images in this dataset are all asphalt cracks. Figure 11 shows five representative visual comparison results.
As shown in Figure 11, the texture of the cracks in the Crack500 dataset is complex, the contrast is low, and the characteristics of the cracks are not obvious. It is very apt to cause false detection. However, the experimental results show that in this complicated case, our crack detection algorithm still shows a good detection effect, in contrast, other compared algorithms have poor anti-interference ability, which shows that the CrackDFANet has better robustness and better detection effect under the same conditions.
To quantitatively compare the detection performance of various algorithms, we compute P, R, F1, ACC, MIoU and FPS on the Crack500 dataset. Table 7 shows the quantitative results. The results show that the CrackDFANet realizes best P, R, F1, ACC, MIoU and FPS and surpasses three other algorithms. This proves that the CrackDFANet has better generality than other comparison algorithms. Meanwhile, it shows that our detection algorithm improves the detection effect while increasing the detection speed. (2000 × 1500 pixels). The crack images in this dataset are all asphalt cracks. Figure 11 shows five representative visual comparison results. Figure 11. The visualization of detection results of compared methods on Crack500 dataset (from left to right: original images, label images, FCN, SegNet, U-Net, our detection results).

Original images Label images FCN SegNet U-Net Ours
As shown in Figure 11, the texture of the cracks in the Crack500 dataset is complex, the contrast is low, and the characteristics of the cracks are not obvious. It is very apt to cause false detection. However, the experimental results show that in this complicated case, our crack detection algorithm still shows a good detection effect, in contrast, other compared algorithms have poor anti-interference ability, which shows that the Crack-DFANet has better robustness and better detection effect under the same conditions.
To quantitatively compare the detection performance of various algorithms, we compute P, R, F1, ACC, MIoU and FPS on the Crack500 dataset. Table 7 shows the quantitative results. The results show that the CrackDFANet realizes best P, R, F1, ACC, MIoU and FPS and surpasses three other algorithms. This proves that the CrackDFANet has better generality than other comparison algorithms. Meanwhile, it shows that our detection algorithm improves the detection effect while increasing the detection speed. In this set of comparative experiments, we conduct experimental analysis on the public AigleRN dataset, the size of the dataset is 768 × 512 pixels. The crack images in this dataset are all asphalt cracks, and cracks are generally thin and have low contrast, which is a challenging task for all detection algorithms.

Results on AigleRN Dataset
In this set of comparative experiments, we conduct experimental analysis on the public AigleRN dataset, the size of the dataset is 768 × 512 pixels. The crack images in this dataset are all asphalt cracks, and cracks are generally thin and have low contrast, which is a challenging task for all detection algorithms.
The results of the AigleRN dataset are shown in Figure 12. Although FCN, SegNet, and U-Net can detect most the cracks in the image, but it still retains a lot of noises. In addition, due to the loss of information, some of the cracks detected are discontinuous. Meanwhile, they may hallucinate cracks that do not exist. Under the same conditions, the CrackDFANet can extract the edges of cracks in images, whether they contain some types of noise or not. This proves that our algorithm has better high interference capability and better detection effect.
The quantitative results of the experiment are shown in Table 8, CrackDFANet shows good results on most indicators. Specifically, the CrackDFANet demonstrates not only a higher detection accuracy but also faster detection speed, and it has better engineering application value.
CrackDFANet can extract the edges of cracks in images, whether they contain some types of noise or not. This proves that our algorithm has better high interference capability and better detection effect.
The quantitative results of the experiment are shown in Table 8, CrackDFANet shows good results on most indicators. Specifically, the CrackDFANet demonstrates not only a higher detection accuracy but also faster detection speed, and it has better engineering application value.

Original images
Label images FCN SegNet U-Net Ours Figure 12. The visualization of detection results of compared methods on AigleRN dataset (from left to right: original images, label images, FCN, SegNet, U-Net, our detection results).

Results on CFD Dataset
In this set of comparative experiments, we selected the public CFD dataset for experimental verification, the size of the dataset is 480 × 320 pixels. This dataset contains asphalt cracks and concrete cracks.

Results on CFD Dataset
In this set of comparative experiments, we selected the public CFD dataset for experimental verification, the size of the dataset is 480 × 320 pixels. This dataset contains asphalt cracks and concrete cracks.
To further verify the generalization ability of the method, four methods are tested on CFD dataset. Results are shown in Figure 13. The noise interference of the CFD dataset is relatively small, and all methods can roughly detect the contour of the crack. It can be intuitively observed that our method is superior to other methods. Three other comparisons of crack detection algorithms based on deep learning do not perform well on edge cracks and web cracks. They may hallucinate cracks that do not exist. Through comparative analysis, our proposed algorithm still achieves good detection results. The predicted image is basically close to the label image. takes 8.98 milliseconds to detect an image. The performance improvement is mainly du to the use of the sub-network aggregation and the sub-stage aggregation. In addition, w fuse the scSE attention mechanism module to the lightweight Xception network, whic can simultaneously recalibrate the output characteristics of each code block in terms o space and channel. In this way, the crack characteristics can be refined multiple time thereby achieving better detection results.

Original images
Label images FCN SegNet U-Net Ours Figure 13. The visualization of detection results of compared methods on CFD dataset (from left to right: original images, label images, FCN, SegNet, U-Net, our detection results).

Special Cases Discussion
To further compare and analyze the CrackDFANet and other deep learning-base several crack detection algorithms, we perform experiments in some special cases, i.e light interference, parking line, water stains, oil stains, and shadow, etc. Figure 14 show As shown in Table 9, compared with other crack detection methods, the CrackDFANet achieves the highest P, R, F1, ACC, MIoU, and FPS. The MIoU value of CrackDFANet reaches the highest 87.59%. The FPS of CrackDFANet is 111.3. In other words, it takes 8.98 milliseconds to detect an image. The performance improvement is mainly due to the use of the sub-network aggregation and the sub-stage aggregation. In addition, we fuse the scSE attention mechanism module to the lightweight Xception network, which can simultaneously recalibrate the output characteristics of each code block in terms of space and channel. In this way, the crack characteristics can be refined multiple times, thereby achieving better detection results.

Special Cases Discussion
To further compare and analyze the CrackDFANet and other deep learning-based several crack detection algorithms, we perform experiments in some special cases, i.e., light interference, parking line, water stains, oil stains, and shadow, etc. Figure 14 shows five representative results showing all methods. The second and third lines are asphalt crack images, and the rest are concrete crack images.

algorithms.
In addition, to further prove the performance of different methods, we draw the precision-recall curve (PR-curve) based on the precision and recall of various comparison methods on our dataset, GAPs384 dataset, Crack500 dataset, AigleRN dataset, and CFD dataset. As shown in Figure 15, CrackDFANet is the highest among other three comparison methods, which shows that the CrackDFANet has better crack detection effect and generalization ability.   As shown in Figure 14, under the interference of complex noise, real cracks are surrounded by disturbing fake cracks. In this case, because the patterns between cracks and pseudo-cracks are similar, which is a challenging problem for all crack detection algorithms. Compared to the other crack detection algorithms, the CrackDFANet can produce clearer and better results, regardless of whether they contain some noises or not. For crack only images, light interference images, parking line images, water stains, plant disturbance images, oil stains, and shadow images, the CrackDFANet can accurately extract crack edges, and it is best to eliminate different types of noise. This shows that the CrackDFANet model is more resistant to noise interferes than FCN, SegNet and U-Net crack detection algorithms.
In addition, to further prove the performance of different methods, we draw the precision-recall curve (PR-curve) based on the precision and recall of various comparison methods on our dataset, GAPs384 dataset, Crack500 dataset, AigleRN dataset, and CFD dataset. As shown in Figure 15, CrackDFANet is the highest among other three comparison methods, which shows that the CrackDFANet has better crack detection effect and generalization ability.

Error Rate
To compare the accuracy of different methods in crack detection, the error rate is introduced, that is, the ratio of the number of misclassified pixels to the total number of pixels in the image. The formula description is shown in Equation (11):

Error Rate
To compare the accuracy of different methods in crack detection, the error rate is introduced, that is, the ratio of the number of misclassified pixels to the total number of pixels in the image. The formula description is shown in Equation (11) where N error denotes the number of misclassification pixels, N total denotes the total pixel number of the images. The error rates of four algorithms are shown in Table 10. The error rate of the CrackDFANet model on average is 0.73%, while for FCN, SegNet, and U-Net, it is 5.75%, 6.35%, and 4.83%, respectively. That is to say, the error rate of the CrackDFANet is obviously lower than that of the other three comparison methods. The CrackDFANet recognizes crack characteristics through the sub-network aggregation and the sub-stage aggregation. In this way, the crack characteristics can be refined multiple times. In addition, to pursue higher accuracy, we fuse the scSE attention mechanism module to recalibrate the output characteristics of each encoder block in both the spatial and the channel aspects. Based on the above analysis, the CrackDFANet algorithm can significantly reduce the error rate.

Conclusions
In this work, the CrackDFANet is proposed for the pavement crack detection. The main work and conclusions are as follows: (1) Model validation performed on five different crack datasets, CrackDFANet records the MIoU of 0.8972 on our dataset. And the model processes in real-time (64.5 FPS) images with 512 × 512 pixels. At the same time, to further verify the generalization ability of the model, the model tests were performed on four public datasets: GAPs384, Crack500, AigleRN and CFD. Experimental results show that CrackDFANet achieves best P, R, F1, ACC, MIoU and FPS and surpasses FCN, SegNet, and U-Net crack detection algorithms. (2) The CrackDFANet can get a good balance between detection accuracy and detection speed. Meanwhile, the parameters of the model are greatly reduced. (3) Under the interference of light interference, parking line, water stains, plant disturbance, oil stains, and shadow conditions, the CrackDFANet has strong antiinterference ability, and has better robustness and generalization ability. (4) To compare the accuracy of different algorithms in the crack detection, we calculated the error rate for different images. The experimental results show that the average error rate of our algorithm is 0.73%, which is significantly better than the other three comparison algorithms.
In future research, we will build a wider crack dataset and test our crack detection algorithm on more datasets.