GGMNet: Pavement-Crack Detection Based on Global Context Awareness and Multi-Scale Fusion

Abstract: Accurate and comprehensive detection of pavement cracks is important for maintaining road quality and ensuring traffic safety. However, the complexity of road surfaces and the diversity of cracks make it difficult for existing methods to accomplish this challenging task. This paper proposes a novel network named the global graph multiscale network (GGMNet) for automated pixel-level detection of pavement cracks. GGMNet has several innovations compared with mainstream road crack detection networks: (1) a global contextual Res-block (GC-Resblock) is proposed to guide the network to emphasize the identities of cracks while suppressing background noise; (2) a graph pyramid pooling module (GPPM) is designed to aggregate multi-scale features and capture the long-range dependencies of cracks; (3) a multi-scale feature fusion module (MFF) is established to efficiently represent and deeply fuse multi-scale features. We carried out extensive experiments on three pavement crack datasets: the DeepCrack dataset, with complex background noise; the CrackTree260 dataset, with various crack structures; and the Aerial Track Detection dataset, captured from a drone's perspective. The experimental results demonstrate that GGMNet has excellent performance, high accuracy, and strong robustness. In conclusion, this paper provides support for accurate and timely road maintenance and offers a useful reference for further research on linear feature extraction.


Introduction
Roads serve as the foundation of contemporary economic growth and sustainability. The operational and structural status of roads is a pivotal factor in shaping the economic landscape of a nation and is deemed an essential criterion by the World Bank for assessing the competitiveness of national economies [1]. However, due to traffic loads, construction defects, and environmental conditions, road surfaces are susceptible to a variety of defects, the most prevalent of which are cracks. If these defects are not repaired promptly and professionally, they can seriously affect driving quality and traffic safety. It has therefore become imperative to monitor and assess pavement conditions more frequently. Nevertheless, pavement management departments and agencies still rely on traditional manual inspection, which is not only inefficient and costly but also depends heavily on subjective human judgment. When inspection staff are fatigued, they tend to misidentify or overlook cracks, which can significantly reduce detection accuracy. There is thus an urgent need for an efficient, automated crack-detection method to support refined pavement maintenance operations.
For many years, crack detection has mostly relied on traditional image-processing methods, such as threshold segmentation [2][3][4][5], morphological methods [6], wavelet transforms [7,8], and artificial feature engineering [9,10]. However, these techniques have many drawbacks, so pavement-crack detection algorithms that rely on them have not been effectively applied in engineering practice. For example, threshold segmentation is highly sensitive to road surface background information and dependent on expert knowledge [2]; its results depend largely on the chosen hyperparameters, and it can only extract cracks whose lightness differs significantly from the background. Compared to threshold segmentation, the semi-automatic method based on morphology maintains the structure of the cracks and enhances the quality of the segmented image, but it still struggles to completely extract crack features with complex structure and remains sensitive to noise [6]. The wavelet transform can effectively strike a balance between suppressing noise and portraying edge details [7], but it is less effective in extracting cracks with uneven signal strength. Artificial feature engineering techniques have improved the accuracy of crack extraction, but they are time-consuming and labor-intensive because features must be designed manually, and they exhibit poor robustness [9]. In conclusion, although traditional image-processing techniques can detect cracks, they demonstrate poor robustness and low accuracy and cannot meet the demand for high-precision, fully automatic detection of pavement cracks.
Deep learning has exhibited exceptional performance in a range of computer vision tasks in recent years, such as image-level classification [11], object-level detection [12], and pixel-level segmentation [13,14]. Building on these achievements, deep learning has been widely employed for pavement-crack detection and has attained satisfactory results. Some researchers have embedded CNNs into edge detectors to enhance the accuracy of crack edge detection. For example, HED [15], which combines CNNs with edge detection to extract image edges, has achieved better results than traditional edge detectors. RCF [16], with deeper convolutional layers, obtains better performance on pavement-crack detection than HED. More recently, to achieve more robust results and superior performance, research on crack detection has increasingly shifted toward end-to-end, pixel-level methods. The introduction of FCN [17] to crack detection effectively alleviated the issue of low detection efficiency. The use of symmetrical encoder-decoder models, exemplified by U-Net [18] and SegNet [19], has led to further improvement in detection accuracy. Due to the complexity of road-surface backgrounds, segmentation models with larger receptive fields, such as PSPNet [20], Deeplabv3+ [21], TransUNet [22], and SwinUNet [23], have been employed for crack detection. Additionally, some specialized deep neural networks for crack extraction have been designed. Zou et al. [24] constructed a multi-level fusion structure based on SegNet for crack segmentation. Liu et al. [25] proposed a network combining CNNs and a Transformer, which significantly enhanced crack extraction. Bai et al. [26] designed a dual-path crack extraction network, enhancing the ability to describe complex crack features. Zhang et al. [27] proposed a network based on deformable convolution to adapt to the morphology of cracks. However, the following issues are still common: (1) insufficient global contextual awareness: due to the diversity and complexity of road scenes, simple convolutions with inadequate global contextual awareness fail to capture the spatial correlations of crack features, so background noise has a significant impact on crack detection; (2) inadequate capability to integrate multi-scale features: because crack structures are diverse, the results from a single scale often fail to represent crack information accurately and comprehensively, limiting the performance of the model.
To address these issues, this paper introduces a pixel-level crack detection network (GGMNet). The results from experiments conducted on three public datasets demonstrate the excellent performance, high accuracy, and strong robustness of GGMNet. Specifically, the DeepCrack dataset was used to evaluate the ability of GGMNet to extract cracks in complex scenarios, the CrackTree260 dataset was used to demonstrate the model's proficiency in recognizing various types of cracks, and the Aerial Track Detection dataset was used to assess the generalization capability of GGMNet under different viewpoints. The primary contributions of this paper can be outlined as follows:
(1) A novel network for pavement-crack detection with excellent accuracy and strong robustness was constructed.
(2) For complex backgrounds, the GC-Resblock was constructed to guide the model to focus more on crack-information extraction. For intricate crack structures, the GPPM was designed to effectively aggregate features of various sizes and shapes. Moreover, the MFF was constructed to reduce the probability of missed detections.

Methods
This section provides a comprehensive explanation of the proposed GGMNet architecture, the details of each model component, and the deep supervision training strategy employed.

Model Overview
The proposed GGMNet is presented in Figure 1 and comprises the encoder, the decoder, and the multi-scale feature fusion module. First, crack images are fed into the encoder, which uses GC-Resblocks to obtain the local spatial and global contextual information of cracks. Next, the GPPM processes the encoder features using graph reasoning operators and a pooling pyramid structure, enriching them with higher-dimensional and higher-order feature representations. Finally, the decoder gradually restores the crack features to the original resolution and employs the MFF to integrate feature information from different levels, thereby producing outputs with rich spatial and semantic information.

Global Contextual Res-Block
Real-world crack images frequently contain complex backgrounds, such as shadows, oil stains, debris, and garbage. If these types of noise are not suppressed, the performance of the model is weakened. Therefore, a global contextual Res-block (GC-Resblock) is proposed as the basic module of the encoder to guide the network to emphasize the identities of cracks. The structure is depicted in Figure 2. As indicated in Figure 2, the GC-Resblock incorporates two components: a residual block [28] and a global contextual unit [29]. The residual block extracts local spatial information of crack features through convolutional operations and uses an identity mapping mechanism to aid model training. The global contextual unit captures the overall information of the image, adaptively enhancing the output feature representation from the residual block.
(1) Residual block: The residual block consists mainly of two 3 × 3 convolutions and a residual connection. Batch normalization and a rectified linear unit follow every convolutional layer to normalize the hidden features and introduce non-linearity. The success of this module lies in its ability to preserve the original features to a certain extent, which helps gradients propagate backward through the network.
(2) Global contextual unit: The features output by the residual block can represent shallow information but often contain significant noise that requires further processing. Therefore, this paper introduces the global contextual unit after the residual block. First, this module employs a 1 × 1 convolution and the sigmoid activation function to produce pixel attention weights, where highlighted regions denote cracks and dim regions represent the background. Subsequently, the input maps are multiplied by the generated attention weights to obtain redistributed results. Finally, convolutional operations are applied to recalibrate inter-channel relationships, yielding outputs that incorporate both spatial and channel attention.

$$y' = y \oplus f_{cw}\bigl(f_{pw}(y)\bigr)$$
where $y$ and $y'$ denote the features from the residual block and the final features, respectively; $f_{pw}$ and $f_{cw}$ represent the pixel operation and the channel operation, respectively; $f_{conv}$ represents 1 × 1 convolution; $f_{ln}$ is layer normalization; and $\delta_s$ and $\delta_r$ denote the sigmoid and ReLU activation functions, respectively.
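The text describes the pixel and channel operations only in words. One plausible way to write them out is sketched below. The internal composition and ordering of $f_{pw}$ and $f_{cw}$ are assumptions inferred from the description (a 1 × 1 convolution plus sigmoid producing pixel weights that multiply the input; convolution, layer normalization, and ReLU recalibrating the channels) and are not confirmed by the source.

```latex
% Assumed decomposition of the global contextual unit (a sketch, not the authors' stated formulas).
% \otimes is elementwise multiplication, \oplus is elementwise addition.
\begin{aligned}
f_{pw}(y) &= y \otimes \delta_s\bigl(f_{conv}(y)\bigr), \\
f_{cw}(u) &= f_{conv}\Bigl(\delta_r\bigl(f_{ln}\bigl(f_{conv}(u)\bigr)\bigr)\Bigr), \\
y'        &= y \oplus f_{cw}\bigl(f_{pw}(y)\bigr).
\end{aligned}
```

Under this reading, the sigmoid output acts as a per-pixel gate in $[0, 1]$, and the residual addition keeps the original features available even where the gate is close to zero.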

Graph Pyramid Pooling Module
The GC-Resblock effectively suppresses irrelevant information and enhances the performance of the model against complex backgrounds. However, because the forces acting on the road surface are uneven, cracks exhibit diverse structures, and the network faces challenges in learning crack features of varying shapes and sizes. Therefore, to extract complex crack structures, this paper establishes the graph pyramid pooling module (GPPM), as depicted in Figure 3. As illustrated in Figure 3, the GPPM comprises two components: a pooling pyramid module [30] and a graph reasoning unit [31]. The pooling pyramid module processes multi-scale features with pooling layers of different sizes, thereby aggregating crack information at different scales. The graph reasoning unit perceives global contextual information, capturing the relationships between disjoint, irregularly shaped regions.
(1) Pooling pyramid module: The feature map undergoes average-pooling operations with various kernel sizes, each followed by a 1 × 1 convolution. To preserve the original feature, one path is not pooled. These feature maps are then passed to the graph reasoning unit to gain more contextual information, followed by elementwise addition and a 1 × 1 convolution to obtain the output feature.
(2) Graph reasoning unit: Traditional convolutions can only handle pixels within a neighborhood; they struggle to capture long-range relationships between distant regions and require multiple stacked convolutional layers to do so. The advantage of graph convolutions is the ability to directly capture the contextual information of the entire graph. Thus, the graph reasoning unit is embedded in the pooling pyramid module to establish relationships between distant regions. As shown in Figure 3, the detailed operation is described below.
Projection: Before graph-based relational inference can be performed, the features must be transformed from the coordinate space and projected onto the graph space. As shown in Figure 4, in contrast to the feature map in the coordinate space, the projected feature map in the graph space stores the features in nodes. The projection function is learned by two 1 × 1 convolutions followed by elementwise multiplication.

Graph reasoning: The nodes denote the semantics of the original feature and facilitate the identification of relationships between distant and irregular regions. To capture the attributes of related nodes, the contextual relationship between each pair of nodes is represented and reasoned over by applying graph convolutions. Here, $f_g$ denotes the state update function of the nodes in the graph convolution and $A$ represents the node adjacency matrix; both are learnable.
Reverse projection: The final step maps the output features back to the original coordinate space after the relationships have been reasoned over. Reverse projection is very similar to projection.
In conclusion, the GPPM aggregates multi-scale features with a pooling pyramid module and captures the relationships of arbitrarily shaped cracks with a graph reasoning unit. This module enhances the ability of GGMNet to identify cracks of various sizes and shapes.
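The three steps of the graph reasoning unit (projection, graph reasoning, reverse projection) can be illustrated with plain matrix products. The sketch below is a minimal, dependency-free illustration, not the authors' implementation. It assumes a graph convolution of the form `(V - A V) W`, as used in common global-reasoning units, and a residual connection on the output. In the paper the projection is learned by 1 × 1 convolutions, whereas here the hypothetical `proj` matrix is simply given. All function and parameter names are illustrative.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def graph_reasoning(x, proj, adj, w_g):
    """Project pixels to nodes, reason over the node graph, project back.

    x:    N x C pixel features (coordinate space)
    proj: K x N projection weights (assumed given; learned in the paper)
    adj:  K x K node adjacency matrix (learnable in the paper)
    w_g:  C x C state update weights (learnable in the paper)
    """
    nodes = matmul(proj, x)                       # projection: K x C node features
    mixed = matmul(adj, nodes)                    # information exchanged between nodes
    diffused = [[v - m for v, m in zip(vr, mr)]   # assumed form: V - A V
                for vr, mr in zip(nodes, mixed)]
    updated = matmul(diffused, w_g)               # node state update
    back = matmul(transpose(proj), updated)       # reverse projection: N x C
    # Residual connection (an assumption) keeps the original pixel features.
    return [[xi + bi for xi, bi in zip(xr, br)] for xr, br in zip(x, back)]


if __name__ == "__main__":
    x = [[1, 0], [0, 1], [1, 1], [2, 0]]          # 4 pixels, 2 channels
    proj = [[1, 1, 0, 0], [0, 0, 1, 1]]           # 2 nodes, each pooling 2 pixels
    adj = [[0, 0], [0, 0]]                        # no interaction between nodes
    w_g = [[1, 0], [0, 1]]                        # identity state update
    print(graph_reasoning(x, proj, adj, w_g))
```

With a zero adjacency matrix and an identity update, each pixel simply receives the summed features of the node it was assigned to, which makes the data flow easy to check by hand. Non-zero entries in `adj` let distant regions influence each other in a single step.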

Multi-Scale Feature Fusion
To reduce the probability of missed detections and to ensure accurate detection of pavement cracks, the multi-scale feature fusion (MFF) module is constructed to assemble the feature maps of all layers and avoid losing contextual and spatial information. As depicted in Figure 1, the feature maps of each layer are upsampled to 256 × 256 and then passed to the channel-weighting fusion unit (CWF), which is designed to learn and assign a weight to each channel. In contrast to previous studies [32,33], we presume that the importance of each feature is different: if the relationships among these maps are not explored, effective complementary knowledge is overlooked while redundant information is retained. As shown in Figure 5, the channel-weighting fusion unit is composed of convolution, pooling, and a sigmoid activation function, and it computes the output maps $z'$ from the input maps $z$. Within the channel-weighting fusion unit, the same weight is shared among all spatial positions of a feature channel, while the weights of different channels are redistributed. Uninformative channels are suppressed, and important channels are emphasized.
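A channel-weighting unit built from pooling, convolution, and a sigmoid can be sketched in a few lines. The version below is an illustrative assumption, not the authors' formula. It supposes the order global average pooling, then 1 × 1 convolution, then sigmoid, then channel-wise scaling. A hypothetical `weight` matrix and `bias` vector stand in for the learned 1 × 1 convolution.

```python
import math


def channel_weighting_fusion(z, weight, bias):
    """Scale each channel of z by a learned gate in (0, 1).

    z:      list of C channels, each an H x W list of floats
    weight: C x C matrix standing in for a learned 1 x 1 convolution (assumption)
    bias:   list of C floats (assumption)
    """
    # Global average pooling: one scalar per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in z]
    # On a 1 x 1 map, a 1 x 1 convolution is a linear map across channels.
    logits = [sum(w * p for w, p in zip(w_row, pooled)) + b
              for w_row, b in zip(weight, bias)]
    # Sigmoid turns each logit into a channel weight.
    gates = [1.0 / (1.0 + math.exp(-v)) for v in logits]
    # The same gate is applied at every spatial position of its channel.
    return [[[g * v for v in row] for row in ch] for ch, g in zip(z, gates)]


if __name__ == "__main__":
    z = [[[2.0, 4.0]], [[1.0, 3.0]]]              # 2 channels, 1 x 2 each
    zero_weight = [[0.0, 0.0], [0.0, 0.0]]
    print(channel_weighting_fusion(z, zero_weight, [0.0, 0.0]))
```

With zero weights and biases every gate equals 0.5, so the output is the input halved. Training would move the gates of informative channels toward 1 and those of redundant channels toward 0.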

Loss Function
The BCE loss function is effective in image segmentation tasks, measuring the disparity between the ground truth and the predicted result [34]. However, crack detection suffers from a significant imbalance between positive and negative samples, and if only the BCE loss function is used, the model may fail to reach the global optimum. Thus, the Dice loss function [35], which is designed to alleviate such imbalance, is introduced into the training process. In addition, a deep supervision mechanism [36] is applied separately to each output layer to enhance the network's segmentation accuracy and accelerate convergence.
Here, $M$ represents the number of output layers; $\alpha_m$ denotes the weight of each output layer; and $L_{side}$ and $L_{fuse}$ represent the loss of each output layer and of the fused predicted result, respectively.
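The combination of BCE, Dice, and deep supervision can be sketched as follows. This is a hedged illustration: it assumes that each output is scored with an unweighted sum of BCE and a soft Dice loss, and that the total is the weighted sum of the side-output losses plus the loss of the fused output. The function names, the `eps` smoothing constant, and the equal weighting of BCE and Dice are illustrative choices rather than details taken from the source.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flattened pixel probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def dice_loss(pred, target, eps=1e-7):
    """Soft Dice loss; the overlap term plays the role of TP."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


def combined_loss(pred, target):
    """BCE plus Dice for one output map (equal weighting is an assumption)."""
    return bce_loss(pred, target) + dice_loss(pred, target)


def deep_supervision_loss(side_preds, fused_pred, target, alphas):
    """Weighted sum of side-output losses plus the fused-output loss."""
    side = sum(a * combined_loss(p, target) for a, p in zip(alphas, side_preds))
    return side + combined_loss(fused_pred, target)


if __name__ == "__main__":
    target = [1, 0, 0, 0]                         # one crack pixel among four
    side_outputs = [[0.6, 0.4, 0.3, 0.2], [0.8, 0.2, 0.1, 0.1]]
    fused = [0.9, 0.1, 0.1, 0.05]
    print(deep_supervision_loss(side_outputs, fused, target, alphas=[0.5, 0.5]))
```

Dice depends only on the overlap between prediction and label, so a large number of easy background pixels does not dilute it the way it dilutes the pixel-averaged BCE term.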

Datasets
(1) DeepCrack: The DeepCrack dataset was collected by Y. Liu et al. [24] and contains 537 images of concrete pavement cracks under different scenes and lighting conditions. All crack images in the dataset are 544 × 384 pixels. Notably, this dataset contains a substantial amount of noise, such as shadows, oil stains, and road debris of different shapes.
(2) CrackTree260: The CrackTree260 dataset was collected by Q. Zou et al. [37]. This dataset comprises 260 pavement crack images of 800 × 600 pixels, with multiple crack types, such as transverse, longitudinal, mesh, and block cracks. The cracks in this dataset vary in size and shape, and a number of them are relatively narrow.
(3) Aerial Track Detection: The Aerial Track Detection dataset was collected by Z. Hong et al. [38]. In contrast to the images in the two datasets above, the images in this dataset were acquired from an unmanned aerial vehicle; it includes 4118 images of post-earthquake pavement cracks, each 512 × 512 pixels. This dataset is used for training and testing to verify the robustness of our network.

Parameter Setting
The proposed GGMNet was implemented in the PyTorch framework, and an NVIDIA RTX A5000 was used to accelerate training. The specific training parameters are presented in Table 1. In this study, the datasets were partitioned into training, validation, and testing sets at a ratio of 6:2:2. To mitigate the risk of overfitting, data augmentation techniques, including random horizontal and vertical flipping, random cropping, and random color mapping, were used during training.
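A 6:2:2 partition can be reproduced with a seeded shuffle. The sketch below is one way to do it. The source does not state the seed, the rounding rule, or whether the split was random, so those are assumptions made here for illustration, and `split_dataset` is a hypothetical helper.

```python
import random


def split_dataset(items, ratios=(6, 2, 2), seed=0):
    """Shuffle items reproducibly and split them by integer ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)            # seed value is an assumption
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total     # round down; remainder goes to test
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    # 537 is the number of images reported for the DeepCrack dataset.
    train, val, test = split_dataset(range(537))
    print(len(train), len(val), len(test))        # 322 107 108
```

Fixing the seed keeps the three subsets identical across runs, which matters when several models are compared on the same testing set.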

Experimental Results
To evaluate the effectiveness of GGMNet, we conducted comparison experiments on the DeepCrack, CrackTree260, and Aerial Track Detection datasets. Four evaluation metrics (precision, recall, F1 score, and IOU) were employed for quantitative analysis. Additionally, the results were visualized to qualitatively analyze the detection performance of GGMNet and other mainstream networks.

Results for DeepCrack
Table 2 shows the quantitative crack detection results of GGMNet. To evaluate the accuracy and performance of GGMNet, we compared it with several mainstream models. The results indicate that GGMNet outperformed current mainstream networks on the DeepCrack dataset, with a precision of 83.63%, a recall of 90.93%, an F1 score of 87.13%, and an IOU value of 77.19%. Except for recall, which was slightly lower, all metrics were the best among the compared models.
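The four metrics are related, which allows a quick consistency check on reported values. The sketch below uses the standard definitions in terms of TP, FP, and FN (assumed here, since the source does not print the formulas). It also uses the identity IOU = F1 / (2 − F1), which holds when both metrics are computed from the same pixel counts. If the metrics were averaged per image instead, the identity would hold only approximately.

```python
def precision_recall_f1_iou(tp, fp, fn):
    """Standard pixel-level metrics from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return precision, recall, f1, iou


def f1_from_pr(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def iou_from_f1(f1):
    """Follows from F1 = 2TP / (2TP + FP + FN) and IOU = TP / (TP + FP + FN)."""
    return f1 / (2 - f1)


if __name__ == "__main__":
    # Precision and recall reported for GGMNet on DeepCrack.
    f1 = f1_from_pr(0.8363, 0.9093)
    print(round(f1, 4), round(iou_from_f1(f1), 4))   # 0.8713 0.7719
```

The derived values match the reported F1 score of 87.13% and IOU of 77.19%, so the four DeepCrack figures are mutually consistent.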

Figure 6 shows the qualitative results of each model. GGMNet achieved better visual performance than the other models. As the top row in Figure 6 shows, all models produced acceptable detection results when background interference was weak and the crack structure was simple. However, except for GGMNet, HED, and DeepCrack, the networks all omitted some crack details (see the red rectangular box). In the second and third rows, only GGMNet detected the cracks completely and without misdetection, whereas the other models still showed a considerable number of missed cracks (see the red rectangular box) and background regions misrecognized as cracks (see the yellow rectangular box). This is mainly because GGMNet uses the GC-Resblock in the encoder to suppress background noise and highlight crack features, which reduces the probability of misdetecting background information and yields coherent crack information. In the fourth and fifth rows, only GGMNet achieved satisfactory crack detection when the crack structure was complex and background noise interference was strong. A closer look at the red rectangular box in the fifth row shows that GGMNet alone extracted cracks that were not labeled, because GGMNet applies both the GPPM and the MFF, enabling awareness of global semantic information and of the spatial relationships of microcracks. In conclusion, GGMNet obtained outstanding experimental results for different scenes and for cracks of diverse scales, and it could effectively distinguish cracks from the background even under strong background interference.

Results for CrackTree260
To further validate the effectiveness and generalizability of GGMNet, we also performed experiments on the CrackTree260 dataset. This dataset contains multiple crack types, and the crack structures are more complex than those in the DeepCrack dataset. In particular, it includes some thin cracks. Table 3 presents the quantitative comparison of the models. The precision, recall, F1 score, and IOU value of GGMNet were the highest and significantly exceeded those of the other models. Figure 7 shows the qualitative results of each model. The outputs of the proposed GGMNet were more accurate and complete. As the first and second rows in Figure 7 show, all models except GGMNet missed detections when the crack structure was intricate (see the red rectangular box). Compared with that of the other models, the visual performance of GGMNet was remarkably improved. This is because GGMNet, with the GPPM and the MFF, can effectively aggregate multi-scale crack features and capture the relationships of cracks across different regions and shapes. The third row shows that GGMNet displayed exceptional extraction ability for microcracks and overcame the interference of background noise. In contrast, the other models all exhibited detection errors and omissions, and their visual performance was markedly inferior to that of GGMNet. In addition, the fourth and fifth rows show that GGMNet performed robustly even under uneven lighting conditions. This can be attributed to the GC-Resblock, which guides the network to concentrate on the cracks and suppress other noise. In summary, GGMNet showed excellent detection performance for cracks of various sizes and shapes.

Results for Aerial Track Detection
In contrast to the two datasets above, this dataset was acquired from the aerial viewpoint of drones. Because the images depict post-earthquake highway cracks, which are severe and appear in relatively homogeneous scenes, the crack information can be easily identified and extracted. The quantitative results of the segmentation models on the Aerial Track Detection dataset are shown in Table 4. As shown in Table 4, the proposed GGMNet achieved 94.13% precision, 91.37% recall, a 92.73% F1 score, and an 86.45% IOU value. Compared to the other models, GGMNet exhibited superior performance in all evaluation metrics except recall. For recall, GGMNet achieved the second-best result, only 0.01% lower than that of the DeepCrack model. Figure 8 displays the qualitative results of each model. All models achieved satisfactory performance, but except for GGMNet, the models still showed some misdetections (the red rectangular box in the first row) and omissions (the red rectangular boxes in the other rows). As depicted in Figure 8, the crack-detection results of GGMNet were remarkably similar to the labels.

Experimental Conclusions
The results of the experiments on three publicly available datasets indicate that GGMNet achieved the best performance. The results on the DeepCrack dataset provide evidence that GGMNet performs well even in the presence of complex background information. The experiments on CrackTree260 demonstrate that GGMNet is effective in extracting cracks of various shapes and sizes, particularly thin cracks. The results on the Aerial Track Detection dataset show that the proposed GGMNet can adapt to different perspectives and fields of view. In conclusion, GGMNet is characterized by outstanding performance and strong robustness.

Comparison of Effectiveness among Different Levels of GC-Resblock
To show the effectiveness of the GC-Resblock at different levels, we further investigated this module through ablation experiments and feature visualization on the DeepCrack dataset.
Table 5 displays the results for different levels of GC-Resblock on the DeepCrack testing set. Compared with the No. 1 model, the No. 5 model obtained the best results, with the F1 score and IOU value improving by 1.43% and 2.22%, respectively. The evaluation metrics improved incrementally from the No. 1 to the No. 5 model, indicating the effectiveness of incorporating the GC-Resblock across different stages. The most significant improvements were obtained by incorporating this module in stage 1 and stage 4, which increased the IOU value by 0.57% and 0.81%, respectively. To further investigate the role of the GC-Resblock, the feature maps were visualized before and after the GC-Resblock was applied at different levels. Figure 9 shows the visualized results, where different brightness levels indicate the model's attention to different regions. As shown in Figure 9, the feature maps all exhibited changes in brightness after the addition of the GC-Resblock. Specifically, panels (a,b), (c,d), and (e,f) show that the brightness of the cracks increased while the brightness of the background regions decreased. The top row in Figure 9 shows that the semantic information of cracks in (f) was more extensive than that in (a). In conclusion, the evaluation metrics and visualizations both demonstrate that the GC-Resblock guides the network to emphasize the identities of cracks while attenuating background interference.

Comparison of the Effectiveness among Different Multi-Scale Aggregation Schemes
Multi-scale aggregation approaches have been widely applied in tasks such as object detection and semantic segmentation, with related studies confirming their effectiveness [44][45][46][47][48]. In this study, the proposed GPPM enabled our network to perceive distant multi-scale crack features, resulting in a better representation of complex multidimensional crack features. To further demonstrate the superiority of this module, we compared it with other mainstream multi-scale aggregation approaches. The comparative results are presented in Table 6 and indicate that our module is more suitable for extracting complex crack features.

Comparison of Effectiveness among Various Feature Fusion Methods
To confirm the superiority of the CWF, we conducted a comprehensive comparison of various feature fusion methods on the DeepCrack testing set. As shown in Figure 10, the proposed CWF obtained the best F1 score and IOU value among the compared methods. This is because features at different layers contain both complementary and redundant information. If only the output features of a single layer are used, the results are often incomplete. If features from different layers are concatenated without further processing, the result is feature redundancy and unsatisfactory output. Therefore, considering the contributions of features from different layers is of paramount importance. We devised a channel-weighting fusion module (CWF) that adaptively learns the weight of each channel, facilitating the propagation of informative features. The proposed CWF is better suited to crack detection than the SE module [50].

Ablation Experiments
To show the effectiveness of each proposed component, we examined the effect on model performance of removing the GC-Resblock, the GPPM, and the MFF in turn. The results of the ablation experiments on the DeepCrack dataset are displayed in Table 7. The F1 score and IOU value of the model decreased significantly after removal of the GC-Resblock, which indicates that focusing on essential information and suppressing background noise is of paramount importance. The two metrics also decreased to some extent after removal of the GPPM and of the MFF, which indicates that the aggregation and interaction of multi-scale information are also of great importance. In addition, discarding the graph reasoning unit from the GPPM degraded the model's performance, which indicates that capturing the relationships among different regions and extracting irregular spatial information are crucial for crack detection.

Conclusions
This paper introduces a novel pavement-crack detection network named GGMNet. On three crack datasets, quantitative assessment and qualitative analysis demonstrate that GGMNet exhibits excellent performance and strong robustness. This method will facilitate accurate and comprehensive pavement-crack detection and is of practical engineering value for digital highway management and maintenance. The specific contributions of this paper are as follows:
(1) An accurate and robust network, named GGMNet, is proposed for pavement-crack detection.
(2) A GC-Resblock was developed to guide the network to emphasize the identities of cracks while effectively suppressing background noise.
(3) A GPPM was constructed to help the model aggregate multi-scale features and capture the long-range dependencies of cracks.
(4) An MFF structure was designed to facilitate channel interaction and achieve feature complementarity across different layers.
Although the proposed GGMNet shows the best detection performance among the compared models, it has certain limitations. The model's parameter count and computational complexity are slightly higher. Consequently, our future work will focus on improving the model's accuracy while also addressing speed.
Qunxiong Zhuo is employed by the Fujian Luoning Expressway Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figure 1. Network framework of the proposed GGMNet.
For the channel-weighting fusion unit, $z$ and $z'$ represent the input maps and output maps, respectively; $f_{conv}$ represents 1 × 1 convolution; $f_p$ represents the global average pooling operation; and $\delta_s$ denotes the sigmoid activation function.

Figure 5. Framework of the proposed CWF.
loss function, BCE, and Dice loss function, respectively; $N$ denotes the total number of pixels in the image; $Y_p^*$ and $Y_p$ represent the true and predicted values at pixel $p$, respectively; TP denotes the positive samples predicted as positive by the network; FP represents the negative samples predicted as positive by the network; and FN denotes the positive samples predicted as negative by the network.

Figure 6. Visualization of the outcomes produced by diverse methods for DeepCrack.

Figure 7. Visualization of the outcomes produced by diverse methods for CrackTree260.

Figure 8. Visualization of the outcomes produced by diverse methods for Aerial Track Detection.

Figure 9. Visualization outcomes for different levels of GC-Resblock. (a,b) Before and after the addition of GC-Resblock at the first layer of the encoder. (c,d) Before and after the addition of GC-Resblock at the second layer of the encoder. (e,f) Before and after the addition of GC-Resblock at the third layer of the encoder.

Figure 10. F1 score and IOU value of the different feature fusion methods for DeepCrack.

Table 1. The parameter settings.

Table 3. Comparison of the methods' P, R, F1, and IOU for the CrackTree260 dataset (%).

Table 4. Comparison of the methods' P, R, F1, and IOU for the Aerial Track Detection dataset (%).

Table 5. Assessment findings for different levels of GC-Resblock for DeepCrack (%).

Table 6. F1 score and IOU value of the different multi-scale aggregation schemes for DeepCrack (%).