Transfer Learning Model for Crack Detection in Side Slopes Based on Crack-Net

Li, Na; Zhang, Yilong; Zhang, Qing; Zhu, Shaoguang

doi:10.3390/app15136951

College of Artificial Intelligence and Computer Science, Xi’an University of Science and Technology, Xi’an 710054, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(13), 6951; https://doi.org/10.3390/app15136951

Submission received: 12 May 2025 / Revised: 13 June 2025 / Accepted: 17 June 2025 / Published: 20 June 2025

Abstract

Accurate detection of slope cracks plays a crucial role in early landslide disaster warning; however, traditional approaches often struggle to identify fine and irregular cracks. This study introduces a novel deep learning model, Crack-Net, which leverages a multi-modal feature fusion mechanism and is developed using transfer learning. To resolve the blurred representation of small-scale cracks, a nonlinear frequency-domain mapping module is employed to decouple amplitude and phase information, while a cross-domain attention mechanism facilitates adaptive feature fusion. In addition, a deep feature fusion module integrating deformable convolution and a dual attention mechanism is embedded within the encoder–decoder architecture to enhance multi-scale feature interactions and preserve crack topology. The model is pre-trained on the CrackVision12K dataset and fine-tuned on a custom dataset of slope cracks, effectively addressing performance degradation in small-sample scenarios. Experimental results show that Crack-Net achieves an average accuracy of 92.1%, outperforming existing models such as DeepLabV3 and CrackFormer by 9.4% and 5.4%, respectively. Furthermore, the use of transfer learning improves the average precision by 1.6%, highlighting the model’s strong generalization capability and practical effectiveness in real-world slope crack detection.

Keywords: crack detection; transfer learning; semantic segmentation; feature fusion

1. Introduction

In recent years, China’s rapid economic development has led to a significant expansion of infrastructure construction, particularly in mountainous and hilly regions, including expressways, railways, and tunnels. Slope stability is a critical aspect of these projects, as it directly affects both project safety and the protection of lives and property. In complex geological environments, slope cracks are among the primary causes of slope instability [1]. Crack formation typically occurs gradually, making early detection challenging. Once cracks reach a critical size, they can trigger landslides, collapses, and other catastrophic events. Therefore, timely and accurate detection and assessment of slope cracks are essential for the prevention and control of geological disasters. Current slope crack detection methods include traditional manual inspection, the use of physical sensors, image processing techniques, and automated approaches based on machine learning [2]. Deep learning, particularly convolutional neural networks (CNNs), has demonstrated strong performance in crack detection tasks. Despite significant progress in slope crack detection, several challenges remain [3]. Recent studies in geotechnical and civil engineering have shown that factors such as material type, surface moisture, inherent shadows, and vegetation coverage can significantly impact the detection and identification of slope cracks. These factors may obscure or alter crack features, increasing the complexity of automated detection [4,5]. For instance, variations in soil or rock composition and humidity can affect crack visibility, while shadows and vegetation may cause false positives or missed detections in image-based methods. These environmental and material factors have become key concerns in recent civil engineering research and are increasingly being considered in the development and application of information-based detection methods. 
First, slope cracks are typically slender and irregular, and their appearance is strongly influenced by environmental factors. This makes it difficult for traditional image processing and machine learning models to accurately identify small cracks, especially in complex backgrounds, where false detections and missed detections remain significant issues [6,7]. Secondly, most deep learning models require large amounts of labeled data for training. However, in slope crack detection, data are limited, and labeling is challenging, which hinders model training and generalization [8,9]. Additionally, the robustness and real-time performance of current models require further improvement, particularly when dealing with complex geological environments and diverse crack morphologies. Detection accuracy often falls short of engineering requirements [10].

To address these challenges, this paper proposes a crack detection model based on transfer learning. The model is initially trained on a large-scale fissure dataset; then, transfer learning is applied to adapt the model to specific slope crack datasets. This approach effectively mitigates the issue of limited data and enhances the model’s generalization ability and accuracy. To address the challenges of complex crack morphology and the detection of fine cracks, a Frequency-Domain Nonlinear Mapping (FDNM) module is introduced. This module enhances frequency-domain features of crack images, enabling better capture of fine crack characteristics and improving detection accuracy. Additionally, to optimize model performance in complex backgrounds, a Bi-Attention Fusion Module (BAFM) is designed. This module fuses multi-level features, activates crack-related features, and suppresses non-crack features, thereby enhancing the model’s ability to recognize crack details. The main contributions of this work are outlined as follows:

(1): We propose a frequency-domain nonlinear mapping module that applies the fast Fourier transform and enhances the amplitude and phase information to better capture the characteristics of small cracks;
(2): We design a bidirectional attention fusion module that enables the model to focus on the key features of cracks and better suppress irrelevant features;
(3): We propose the Crack-Net model combined with a transfer learning strategy to overcome the scarcity of slope crack data.

2. Related Works

2.1. Crack Detection Method

Early crack detection techniques predominantly depend on manual visual inspections and traditional physical sensors. However, their installation process is intricate and necessitates extensive data processing, which limits their ability to fully address issues of efficiency and comprehensiveness. With advancements in technology, crack detection has progressively transitioned into an image processing-based stage. Huang and Zhang [11] introduced a road crack detection and recognition method to enhance the accuracy of crack identification. Cao et al. [12] developed a technique for identifying and classifying cracks on concrete component surfaces, addressing early defect detection and maintenance challenges in concrete structures. Safaei et al. [13] proposed an automatic image processing algorithm based on crack pixel density, overcoming the limitations of supervised learning methods in small-scale and localized projects due to complex training requirements and resource constraints. Kheradmandi and Mehranfar [14] conducted a critical review and comparative analysis of pavement crack detection technologies based on image segmentation, highlighting the absence of unified standards in this domain.

Despite these advancements, image processing remains constrained by environmental factors such as illumination and background complexity. Consequently, machine learning technologies have been increasingly integrated into crack detection applications. Müller et al. [15] presented an image-based machine learning approach to mitigate uncertainties and errors in determining fracture onset points during mechanical experiments through visual inspection. Loverdos and Sarhosis [16] combined image-based techniques with machine learning to enhance the automation of brick segmentation and crack detection in masonry walls, addressing the low automation levels of traditional methods. Aravind et al. [17] proposed a method leveraging image processing and suitable machine learning algorithms for the detection of cracks and to recognize failure modes in concrete structures, resolving inefficiencies in traditional manual inspections within the construction industry. Ahmadi et al. [18] introduced an ensemble machine learning model for the automated detection and classification of road cracks in urban environments. Jiang et al. [19] developed an efficient deep learning network, MFPA-Net, to improve the efficiency of ground-surface crack detection in coal mining areas.

Zhang et al. [20] pioneered a deep learning-based crack detection method, overcoming the limitations of traditional manual feature extraction in complex road scenarios. Yang et al. [21] devised an automatic pixel-level crack detection technique using fully convolutional networks, thereby enhancing the precision of crack segmentation. Gao et al. [22] proposed a pixel-level road crack detection method utilizing UAV remote sensing images, addressing the inefficiency of traditional road crack detection approaches. Laxman et al. [23] introduced a method for automatic crack detection and depth prediction in reinforced concrete structures using deep learning, solving problems related to crack detection and depth evaluation in concrete structures. Ali et al. [24] reviewed the application of convolutional neural networks in civil structure crack detection, emphasizing technological gaps in this field. Su et al. [25] introduced an enhanced MOD-YOLO algorithm to address the issues of feature information loss, limited multi-scale adaptability, and insufficient computational efficiency associated with the traditional YOLO algorithm in crack detection.

2.2. Transfer Learning Method

Transfer learning refers to a methodology that leverages pre-existing knowledge to address novel problems, which is particularly advantageous in practical scenarios characterized by limited data or challenging labeling processes [26]. Zhang et al. [27] introduced a unified transfer learning-based approach for detecting both pavement cracks and seal cracks, effectively addressing issues such as uneven intensity in pavement crack detection. Jin et al. [28] developed an unsupervised representation federated transfer learning method, which significantly enhanced the accuracy of crack detection. Additionally, Qingyi and Bo [29] proposed a real-time concrete crack detection model utilizing a Transformer architecture, achieving superior precision in crack identification. Katsigiannis et al. [30] presented a deep learning framework based on transfer learning to overcome the limitations of traditional crack detection methods for masonry building facades, which are often time-consuming and costly. Wu et al. [31] introduced a crack detection technique grounded in the GoogLeNet Inception V3 convolutional neural network, resolving challenges associated with crack detection during the operational life cycle of civil structures. Zheng et al. [32] devised a deep transfer learning-based framework to tackle the difficulties of railway surface crack detection caused by complex image backgrounds and significant noise. Lastly, Su and Wang [33] enhanced the EfficientNetB0 model through transfer learning, enabling efficient and objective crack detection on concrete surfaces while overcoming the inefficiencies and subjectivity inherent in traditional manual methods. Vinodhini and Sidhaarth [34] proposed a pothole detection method for asphalt pavement based on transfer learning and convolutional neural networks, which solved the problems of low efficiency and strong subjectivity in traditional manual detection of asphalt pavement potholes.

Despite the prevalence of transfer learning in numerous domains, research exploring its application in detecting cracks in high-steepness slopes remains limited. The scarcity of image data, particularly about cracks in such slopes, poses a significant challenge in training deep learning models. The irregular geometry of cracks further complicates accurately recognizing their minute characteristics. When feature extraction is performed in a U-shaped network, the number of feature channels in different layers does not match, resulting in insufficient fusion of deep and shallow features. To address these problems, this paper proposes a slope fissure detection method based on transfer learning. By capturing global and local detailed features and introducing an attention mechanism in feature fusion, the above problems are effectively addressed. It provides a reliable detection method for early warning of geological disasters such as landslides.

A comprehensive review of the literature reveals that existing crack detection methods possess unique strengths across various technical paradigms, yet each approach is accompanied by specific challenges. These limitations are particularly pronounced when these methods are applied to the specialized context of slope crack detection. Table 1 provides a systematic overview of the main categories of crack detection methods and representative studies, as well as their respective advantages and core limitations. It also offers a focused assessment of their applicability to slope crack detection. Deep learning-based transfer learning strategies have emerged as effective solutions to address the scarcity of annotated slope data. However, there remains a lack of research that specifically targets the unique characteristics of geotechnical slope cracks, such as their fine scale, irregular morphology, and susceptibility to strong background interference, while fully leveraging transfer learning to bridge domain gaps. This gap highlights a clear research opportunity and provides a foundation for the innovations proposed in this study.

3. Materials and Methods

3.1. Detection Model with Crack-Net

3.1.1. Overall Structure

This paper aims to establish simultaneous interaction and local context awareness between low-level feature maps of crack images and enhance the detectability of cracks. To this end, a Transformer-based crack detection model, Crack-Net, was developed. Crack-Net adopts an encoder–decoder structure and combines an attention mechanism and a fusion module to improve the ability to extract crack features and integrate information. The overall structure is shown in Figure 1.

Crack-Net consists of three main stages: the encoder, the decoder, and the feature fusion stage. Initially, the input image passes through a 3 × 3 convolutional layer, which transforms the RGB channels into a 64-dimensional feature representation. H, W, and C represent the height, width, and number of channels of the feature map, respectively. After each encoder block, max pooling is applied to reduce the spatial dimensions of the feature map by half, which facilitates the extraction of semantic information. Simultaneously, convolutional operations increase the number of channels (C→2C→4C→8C), allowing the network to extract richer features at lower spatial resolutions. Max unpooling is used to upsample the feature map at corresponding positions in the decoder. During decoding, the spatial resolution of the feature map is gradually restored (from H/32 × W/32 to H × W), while the number of channels is reduced (8C→4C→2C→C). Finally, a 1 × 1 convolution generates a single-channel crack detection map. The prediction results from each scale and the fused features are then resized to match the original dimensions of the input image, producing the final crack detection map. This multi-scale, multi-path feature fusion strategy enables the model to capture subtle crack features at various spatial resolutions, thereby improving detection accuracy and robustness.

However, the traditional encoder–decoder architecture confronts two challenges in the context of slope crack detection. First, the fine-grained characteristics of cracks are susceptible to disruption by complex backgrounds, and conventional convolution operations are ineffective in separating frequency-domain coupled features. Second, channel mismatch and spatial misalignment during cross-layer feature fusion hinder the accurate alignment of shallow details with deep semantics. To address these challenges, this paper proposes a novel integration of a frequency-domain nonlinear mapping module (FDNM) and a bidirectional attention fusion module (BAFM) into the existing architecture. The FDNM utilizes dynamic spectral processing to decouple frequency-domain features, while the BAFM employs an attention-gating mechanism to optimize cross-layer feature interaction. The collaborative effect of these two modules significantly enhances the model’s capacity to accurately identify and classify crack features.

3.1.2. Frequency-Domain Nonlinear Mapping Module

In slope crack detection, the main challenge lies in the sparsity and small size of cracks, which lead to a severe imbalance between crack and background features. This imbalance can easily mask or weaken crack regions and thereby degrade detection results. To address this challenge, this paper proposes a frequency-domain nonlinear mapping module (FDNM), which operates as follows. First, the input is normalized and fed to a dynamic spectrum processing module, where a 1 × 1 convolution adjusts the number of channels. The result of the first residual connection is then normalized again, which further stabilizes the feature distribution. The normalized features are passed to a feedforward network (FFN) consisting of multiple fully connected layers, which performs nonlinear transformation and information integration on the features. Finally, the output of the FFN is summed element-wise with its pre-normalization input to complete the second residual connection, and the result is output.

Because the detection process must capture both local detail and global context, this paper designs a dynamic frequency spectrum processing module (DFSP), whose detailed structure is shown in Figure 2a. The input feature map is first converted to the frequency domain using a fast Fourier transform (FFT). The module then splits into two parallel processing paths: one for amplitude information and the other for phase information. After these two paths, the module adds the original amplitude and phase features to the processed results, forming enhanced amplitude and phase features. The enhanced features are then converted back to the spatial domain and fused with the original features, enhancing the diversity and richness of the feature representation.

First, the input feature $x$ is subjected to a fast Fourier transform to obtain its representation in the frequency domain:

$$X_{freq} = \mathrm{FFT}(x) \tag{1}$$

Secondly, the magnitude $|X_{freq}|$ and phase $\angle X_{freq}$ are passed to two parallel pathways for processing. Each pathway comprises a 1 × 1 convolution operation and an SE module. The SE layer employs an adaptive channel attention mechanism to accentuate salient feature channels (Figure 2b). This process is expressed as follows:

$$\begin{aligned} \hat{X}_{mag} &= \mathrm{processmag}(|X_{freq}|) \\ \hat{X}_{pha} &= \mathrm{processpha}(\angle X_{freq}) \end{aligned} \tag{2}$$

In each path, the features undergo nonlinear activation and an attention-based mechanism and are then added to the original amplitude and phase to yield the enhanced amplitude $\hat{X}'_{mag}$ and phase $\hat{X}'_{pha}$:

$$\begin{aligned} \hat{X}'_{mag} &= |X_{freq}| + \hat{X}_{mag} \\ \hat{X}'_{pha} &= \angle X_{freq} + \hat{X}_{pha} \end{aligned} \tag{3}$$

Then, the complex frequency-domain representation is reconstructed from the enhanced amplitude and phase using trigonometric functions:

$$X_{out} = \hat{X}'_{mag} \cdot \left(\cos(\hat{X}'_{pha}) + i \cdot \sin(\hat{X}'_{pha})\right) \tag{4}$$

Subsequently, the enhanced frequency-domain features are converted back to the spatial domain through an inverse fast Fourier transform (IFFT) to obtain the enhanced features $x_{fout}$:

$$x_{fout} = \mathrm{IFFT}(X_{out}) \tag{5}$$

Finally, the enhanced frequency-domain features are fused with the original input by element-wise addition to obtain the final output features $x_{out}$:

$$x_{out} = x + x_{fout} \tag{6}$$

The FDNM module is designed to leverage the complementary characteristics of amplitude and phase information in the frequency domain. By independently processing these two components and incorporating adaptive channel attention, the module selectively enhances salient features while suppressing noise and irrelevant information. This dual-pathway architecture ensures that both the intensity and structural attributes of cracks are effectively captured and retained. In addition, the use of residual connections at multiple stages stabilizes gradient flow and promotes the integration of original and enhanced features, thereby improving the network’s robustness and generalization ability in challenging crack detection scenarios.
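As an illustration, Equations (1)–(6) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `dfsp` and the callables `process_mag` and `process_pha` are hypothetical stand-ins for the 1 × 1 convolution and SE pathways, and keeping only the real part after the inverse transform is an assumption, since the text does not say how a complex result is handled.

```python
import numpy as np


def dfsp(x, process_mag, process_pha):
    """Sketch of Eqs. (1)-(6) for a real feature map x of shape (..., H, W)."""
    # Eq. (1): 2D FFT over the two spatial axes.
    x_freq = np.fft.fft2(x, axes=(-2, -1))
    # Decouple amplitude and phase.
    mag, pha = np.abs(x_freq), np.angle(x_freq)
    # Eqs. (2)-(3): process each branch, then add back the original component.
    mag_enh = mag + process_mag(mag)
    pha_enh = pha + process_pha(pha)
    # Eq. (4): rebuild the complex spectrum from enhanced amplitude and phase.
    x_out_freq = mag_enh * (np.cos(pha_enh) + 1j * np.sin(pha_enh))
    # Eq. (5): back to the spatial domain (assumption: keep the real part).
    x_fout = np.fft.ifft2(x_out_freq, axes=(-2, -1)).real
    # Eq. (6): residual fusion with the original input.
    return x + x_fout


def zero_branch(t):
    """Hypothetical placeholder branch that contributes no enhancement."""
    return np.zeros_like(t)


rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8, 8))  # (channels, H, W)
identity_result = dfsp(features, zero_branch, zero_branch)
```

With both branches returning zero, the spectrum is left unchanged and the output reduces to x + x, which gives a convenient sanity check for the transform round trip.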

3.1.3. Bi-Attention Fusion Module

At each stage, the feature vectors in the encoder and decoder are connected and fused through a bidirectional attention fusion module (BAFM) to generate a clear and distinct crack boundary map. The decoder module was engineered to improve segmentation performance through the combination of feature vectors. Specifically, an attention gate mechanism is implemented to generate an attention mask, which is normalized through a sigmoid activation function. Following this, element-wise multiplication is performed on features requiring refinement; this process is similar to a filter, activating the features of the region of interest while suppressing irrelevant features. Finally, the attention mask generated using the features in the encoder is used as the attention coefficient and multiplied by the corresponding features in the decoder to activate the crack-related features and suppress non-crack features. As illustrated in Figure 3, this process enables the refinement of crack-related features, enhancing the clarity and precision of the resulting crack boundary map.
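The gating step described above can be sketched as follows. This is a minimal NumPy sketch, not the authors' BAFM: the function name `attention_gate`, the single-channel mask, and the use of a learned per-channel weight vector `w` (standing in for a 1 × 1 convolution) are assumptions introduced for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_gate(enc, dec, w, b=0.0):
    """Gate decoder features with a mask computed from encoder features.

    enc: (C_enc, H, W) encoder features
    dec: (C_dec, H, W) decoder features at the same resolution
    w:   (C_enc,) weights of a hypothetical 1x1 projection to one channel
    """
    # Project encoder channels to a single-channel map, shape (H, W).
    logits = np.tensordot(w, enc, axes=([0], [0])) + b
    # Sigmoid normalizes the mask into (0, 1).
    mask = sigmoid(logits)
    # Element-wise multiplication: keeps responses where the mask is high
    # and attenuates them where it is low.
    return dec * mask[None, :, :], mask


rng = np.random.default_rng(0)
enc = rng.standard_normal((16, 8, 8))
dec = rng.standard_normal((32, 8, 8))
w = rng.standard_normal(16) * 0.1
gated, mask = attention_gate(enc, dec, w)
```

Because the mask lies strictly between 0 and 1, gating can only shrink the magnitude of a decoder response, which is what lets it behave like a soft filter.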

In the context of feature extraction in encoders and decoders, where there are often mismatches in the number of feature channels across layers, a Deep Feature Fusion Module (DFFM) is proposed. This module aims to enhance the fusion of deep and shallow features, focusing on optimizing the extraction and integration of features. The DFFM integrates a Channel Attention (CA) module, a Multi-Regional Fusion Module (MRF), and the relevance enhancement operation. This module has been shown to effectively fuse and streamline the intermediate features extracted by CNNs and Transformers, enhancing the overall efficiency of interaction between disparate feature domains and significantly improving the expressiveness of crack features. Figure 4 illustrates that the module introduces an adaptive weighting mechanism for feature importance. This mechanism learns the importance weights of different feature maps through global average pooling and a fully connected layer. This process has been demonstrated to help suppress invalid features and highlight valid features. The attention mechanism employs a bidirectional interaction structure, capturing complementary information by calculating the attention matrix between two feature maps. This mechanism involves the calculation of the attention matrix through the inner product of Q and K, with features calculated separately for Q, K, and V. The values are subsequently weighted and summed using these attention weights to generate an enhanced feature map:

$$\begin{aligned} \mathrm{Attention}_{y} &= \mathrm{softmax}\left(\frac{Q_{y} \cdot K_{x}^{\top}}{\sqrt{c}}\right) \cdot V_{x} \\ \mathrm{Attention}_{x} &= \mathrm{softmax}\left(\frac{Q_{x} \cdot K_{y}^{\top}}{\sqrt{c}}\right) \cdot V_{y} \end{aligned} \tag{7}$$

In these equations, $Q_{x}, K_{x}, V_{x}$ and $Q_{y}, K_{y}, V_{y}$ denote the query, key, and value matrices derived from feature maps x and y, respectively. For a feature map $x \in \mathbb{R}^{B \times C \times H \times W}$, these matrices are generated by applying separate linear projections (such as $1 \times 1$ convolutions) to x, resulting in $Q_{x}, K_{x}, V_{x} \in \mathbb{R}^{B \times d \times N}$, where d is the embedding dimension and $N = H \times W$ is the number of spatial locations. The same procedure is applied to y. Attention weights are calculated by performing a scaled dot product between the query and key matrices, followed by a softmax operation along the spatial dimension. The resulting attention map is then used to weight the value matrix, yielding an enhanced feature representation that captures both local and global dependencies. This bidirectional attention mechanism allows the DFFM to effectively integrate complementary information from different feature domains, thereby enhancing the accuracy and robustness of crack detection.
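Equation (7) can be illustrated with a small NumPy sketch for a single sample. Several choices here are assumptions made for the example rather than details from the paper: features are flattened to token matrices of shape (N, C), the projections are plain weight matrices (equivalent to a 1 × 1 convolution applied per location), the scaling factor is taken to be the projected dimension, and all function names are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_weights(q, k):
    """Scaled dot-product weights; q and k have shape (N, d)."""
    c = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(c), axis=-1)


def cross_attention(q, k, v):
    return attention_weights(q, k) @ v


def bidirectional_attention(x_tokens, y_tokens, proj_x, proj_y):
    """x_tokens, y_tokens: (N, C); proj_*: three (C, d) matrices for Q, K, V."""
    q_x, k_x, v_x = [x_tokens @ w for w in proj_x]
    q_y, k_y, v_y = [y_tokens @ w for w in proj_y]
    # Attention_y: queries from y attend to keys and values from x.
    att_y = cross_attention(q_y, k_x, v_x)
    # Attention_x: queries from x attend to keys and values from y.
    att_x = cross_attention(q_x, k_y, v_y)
    return att_x, att_y


rng = np.random.default_rng(0)
n_tokens, channels, dim = 16, 8, 4  # e.g. a 4 x 4 map with 8 channels
x_tokens = rng.standard_normal((n_tokens, channels))
y_tokens = rng.standard_normal((n_tokens, channels))
proj_x = [rng.standard_normal((channels, dim)) for _ in range(3)]
proj_y = [rng.standard_normal((channels, dim)) for _ in range(3)]
att_x, att_y = bidirectional_attention(x_tokens, y_tokens, proj_x, proj_y)
weights = attention_weights(x_tokens @ proj_x[0], y_tokens @ proj_y[1])
```

Each row of `weights` sums to one, so every output location is a convex combination of value vectors drawn from the other feature map.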

Additionally, the DFFM uses depthwise separable convolutions in place of standard convolutions, which substantially reduces the number of parameters without sacrificing performance. It also adopts a multi-path feature aggregation strategy to integrate features obtained from diverse sources: the original features, the directly multiplied features, and the attention interaction features. This integration is carried out by the fusion module, also known as the Feature Fusion Block (FFB). The comprehensive fusion approach ensures that both the original and attention-enhanced features are fully exploited, resulting in more expressive and discriminative representations for crack detection.

3.1.4. Loss Function

In deep learning, specifically within the PyTorch framework, BCEWithLogitsLoss is a loss function well suited to binary classification problems. It combines the sigmoid activation function and the binary cross-entropy loss in a single operation, which improves numerical stability during training. The formula is as follows:

$$\mathrm{Loss}(x, y) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_{i} \log(\sigma(x_{i})) + (1 - y_{i}) \log(1 - \sigma(x_{i})) \right] \tag{8}$$

where $x_{i}$ denotes the raw output (logit) of the model; $y_{i}$ is the true label, which takes a value of 0 or 1; $\sigma(x_{i})$ is the sigmoid activation function; and N is the total number of samples.
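To make the stability argument concrete, Equation (8) can be rewritten as max(x, 0) − x·y + log(1 + exp(−|x|)) per sample, which never evaluates the logarithm of a value that has underflowed to zero. The pure-Python sketch below compares this form against a direct evaluation of Equation (8); the function names are hypothetical, and the sketch is an illustration of the identity rather than the library's internal code.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy on raw logits, in a numerically stable form."""
    total = 0.0
    for x, y in zip(logits, targets):
        # softplus(x) - x * y, with softplus written to avoid overflow.
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def bce_naive(logits, targets):
    """Direct evaluation of Eq. (8); breaks down for large |x|."""
    total = 0.0
    for x, y in zip(logits, targets):
        s = 1.0 / (1.0 + math.exp(-x))
        total += -(y * math.log(s) + (1.0 - y) * math.log(1.0 - s))
    return total / len(logits)


logits = [2.0, -1.0, 0.5]
targets = [1.0, 0.0, 1.0]
stable = bce_with_logits(logits, targets)
naive = bce_naive(logits, targets)
extreme = bce_with_logits([1000.0], [0.0])  # naive form would hit log(0)
```

For moderate logits the two forms agree, while for the extreme logit only the stable form returns a finite value.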

3.2. Transfer Learning Strategy

In the transfer learning scheme, a three-stage training strategy is adopted to minimize overfitting and optimize model performance:

(1): The early layers of the encoder are frozen, and the model is trained for 30 epochs with a learning rate of $1 \times 10^{-4}$.
(2): The Down 2.nn2 module is unfrozen, and the learning rate is reduced to $5 \times 10^{-5}$, with training conducted for 50 epochs.
(3): All layers are unfrozen, and a learning rate of $1 \times 10^{-5}$ is used for 20 epochs of fine-tuning. This phase aims to further optimize the model’s performance for the slope crack detection task.

The entire training process utilizes the AdamW optimizer with the CosineAnnealingLR learning rate scheduler. This scheduler gradually reduces the learning rate to mitigate overfitting and enhance model convergence and stability. A suite of data augmentation techniques is also employed to improve the model’s robustness and generalizability, including random rotation, scaling, and brightness adjustment.

As a network structure adjustment, the number of channels and heads within the high-level attention module is increased to enhance the model’s ability to capture and interpret complex slope textures.

The rationale for this staged transfer learning strategy is grounded in established best practices in deep learning. Initially freezing the early encoder layers allows the model to preserve generic low-level features acquired from large-scale pre-training datasets, which are typically transferable and beneficial for downstream tasks. By restricting updates to the later layers, the model can efficiently adapt to the specific characteristics of the slope crack dataset without disrupting foundational representations. Gradually unfreezing additional layers and reducing the learning rate in subsequent stages facilitates a smooth adaptation process, thereby minimizing the risk of catastrophic forgetting and overfitting. Furthermore, the use of a decaying learning rate schedule ensures stable convergence and assists the model in achieving a better local optimum. Overall, this parameter configuration is designed to balance the retention of pre-trained knowledge with effective task-specific adaptation, thereby maximizing the benefits of transfer learning for slope crack detection.
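The three-stage schedule can be laid out as a per-epoch learning-rate table. The sketch below uses the standard closed form of cosine annealing, lr = eta_min + 0.5 · (base_lr − eta_min) · (1 + cos(π · t / T_max)). Two points are assumptions, since the text does not specify them: the cosine cycle is restarted at each stage with T_max equal to the stage length, and eta_min is zero. The `trainable` strings are descriptive labels only, not real module names.

```python
import math

# Epoch counts and base learning rates are taken from the three stages above.
STAGES = [
    {"trainable": "early encoder layers frozen", "epochs": 30, "base_lr": 1e-4},
    {"trainable": "Down 2.nn2 unfrozen", "epochs": 50, "base_lr": 5e-5},
    {"trainable": "all layers unfrozen", "epochs": 20, "base_lr": 1e-5},
]


def cosine_annealing_lr(base_lr, epoch, t_max, eta_min=0.0):
    """Closed-form cosine annealing from base_lr toward eta_min."""
    cosine = 1.0 + math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * cosine


def build_schedule(stages, eta_min=0.0):
    """Return a list of (stage_index, epoch_in_stage, learning_rate)."""
    schedule = []
    for index, stage in enumerate(stages, start=1):
        for epoch in range(stage["epochs"]):
            lr = cosine_annealing_lr(
                stage["base_lr"], epoch, stage["epochs"], eta_min
            )
            schedule.append((index, epoch, lr))
    return schedule


schedule = build_schedule(STAGES)
total_epochs = len(schedule)
```

In an actual training loop, each stage would also toggle which parameters receive gradients before its first epoch; that step depends on the model's layer names and is omitted here.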

Figure 5 depicts the transfer learning workflow employed in this study. First, the Crack-Net model is trained on the publicly available CrackVision12K dataset to obtain a source-domain model. Through training on a large-scale, generic crack dataset, the source-domain model develops robust crack feature extraction capabilities. Subsequently, the weights of the source-domain model are transferred (weight shift) to the target task of slope crack detection. Specifically, the Crack-Net model pre-trained on the CrackVision12K dataset serves as the initial model and is further fine-tuned on the slope crack dataset, resulting in a model optimized for slope crack detection. This approach effectively leverages the rich features of the public dataset, thereby improving the accuracy and robustness of crack detection in specific scenarios, such as slope surfaces.

4. Experiment

4.1. Implementation Details

The hardware environment consists of an Intel Xeon(R) Platinum 8352V central processing unit (CPU), an NVIDIA GeForce RTX 4090 graphics processing unit (GPU) with 24 gigabytes (GB) of video memory, and 64 GB of main memory. The GPU software stack comprises CUDA 11.3 and cuDNN 8.2.1. The software environment consists of the Ubuntu operating system and PyTorch 1.11.0.

4.2. Experimental Settings

(1): Datasets

The dataset employed in this experiment comprises two parts. The first is the publicly accessible CrackVision12K dataset, which contains 12,000 images of cracks in a variety of structures, including roads, bridges, and walls, together with masks that delineate the regions of interest. The images in this dataset were uniformly resized to 256 × 256 and used to pre-train the model. The second is the slope crack dataset, which contains 400 original images collected manually and via drone inspection, covering different types of cracks in rock slopes. The Labelme tool was used to label all collected images of slope cracks. Each of the two datasets was divided into training, validation, and test sets in a ratio of 8:1:1, so that each subset is representative of the crack types and image sources, as shown in Figure 6.
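An 8:1:1 split can be sketched in a few lines of standard-library Python. This is a simple shuffled split with a fixed seed, which is an assumption: the text does not say whether the split was stratified by crack type or image source, and the function name `split_8_1_1` is hypothetical.

```python
import random


def split_8_1_1(items, seed=0):
    """Shuffle and split items into train/validation/test at 8:1:1."""
    items = list(items)
    random.Random(seed).shuffle(items)
    # Integer arithmetic avoids floating-point rounding at the boundaries.
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# 400 image indices, matching the size of the slope crack set.
train_ids, val_ids, test_ids = split_8_1_1(range(400))
```

For 400 images this yields 320 training, 40 validation, and 40 test samples, and the same function gives 9600/1200/1200 for a 12,000-image set.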

To enhance the model’s generalization capability, data augmentation was applied to the training set of the slope crack dataset. The augmentation methods included cropping, flipping, and rotation. To avoid generating duplicate images after rotation and flipping, rotation angles of 19°, 23°, and 90° were used, followed by rectangular cropping inside the rotated image. Furthermore, horizontal flipping and two-step gamma correction were applied to the images to mitigate the impact of brightness changes on model training, as illustrated in Figure 7.

The data augmentation strategy was specifically designed to address the unique challenges of slope crack detection, where cracks are typically slender and irregular and exhibit diverse orientations under complex lighting conditions. All augmentation operations were based on the original images to preserve the intrinsic characteristics of cracks in natural scenes. Rotation at non-standard angles (19°, 23°, and 90°) increased the directional diversity of the dataset, enabling the model to recognize cracks in arbitrary orientations and reducing directional bias. In addition, combining rotation with gamma correction simulated variations in brightness and contrast, thereby improving the model’s robustness to illumination changes. Gamma correction, as a nonlinear brightness adjustment, is particularly effective for enhancing crack detection under low contrast, strong light, or shadowed conditions. Horizontal flipping further enriched the spatial distribution of cracks, preventing the model from overfitting to specific spatial patterns. The combination of flipping, rotation, and gamma correction comprehensively simulated the multi-faceted variations encountered in real-world scenarios, ensuring that the model maintains high detection accuracy across different orientation, lighting, and contrast conditions.
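Gamma correction and horizontal flipping, two of the operations described above, can be sketched on a small grayscale image. The gamma values 0.5 and 2.0 are illustrative assumptions, since the paper does not report the values used in its two-step gamma correction, and the helper names are hypothetical.

```python
def gamma_correct(image, gamma):
    """Apply out = 255 * (in / 255) ** gamma to an 8-bit grayscale image
    given as a list of rows of integers in [0, 255]."""
    lut = [round(255 * (v / 255) ** gamma) for v in range(256)]
    return [[lut[v] for v in row] for row in image]


def hflip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


image = [[0, 64, 128],
         [192, 255, 32]]

brighter = gamma_correct(image, 0.5)  # gamma < 1 lifts mid-tones
darker = gamma_correct(image, 2.0)    # gamma > 1 suppresses mid-tones
print(brighter[0], darker[0])
print(hflip(image)[0])
```

Because the mapping is nonlinear, pure black and pure white are left unchanged while mid-tones shift. This is why gamma correction changes the apparent contrast of faint cracks more than a uniform brightness offset would.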

(2): Evaluation Metrics

To evaluate the algorithm comprehensively, this paper employs the following metrics: Precision (P), Recall (R), Average Precision (AP), Optimal Dataset Scale (ODS), Optimal Image Scale (OIS), F1 score (F1), mean Intersection over Union (mIoU), convergence epochs, and training time (h). ODS is the best score obtained when a single threshold is optimized over the entire dataset and is suitable for evaluating overall performance. Conversely, OIS is the mean of the best scores obtained when the threshold is optimized independently for each image; it therefore reflects the model’s adaptability to different images.
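The difference between ODS and OIS can be made concrete with a small sketch. It assumes, as is conventional for these two metrics, that the score being optimized is the F-measure and that ODS pools pixel counts over the whole dataset. The toy probability maps, the thresholds, and the helper names are made up for illustration.

```python
def confusion(prob, gt, thr):
    """Count TP, FP, FN for one image (flat lists) at a given threshold."""
    tp = fp = fn = 0
    for p, g in zip(prob, gt):
        pred = p >= thr
        if pred and g:
            tp += 1
        elif pred:
            fp += 1
        elif g:
            fn += 1
    return tp, fp, fn


def f_measure(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def ods_ois(probs, gts, thresholds):
    # ODS: one threshold for the whole dataset, counts pooled over images.
    ods = 0.0
    for thr in thresholds:
        pooled = [0, 0, 0]
        for prob, gt in zip(probs, gts):
            for i, c in enumerate(confusion(prob, gt, thr)):
                pooled[i] += c
        ods = max(ods, f_measure(*pooled))
    # OIS: best threshold chosen separately per image, then averaged.
    per_image_best = [
        max(f_measure(*confusion(prob, gt, thr)) for thr in thresholds)
        for prob, gt in zip(probs, gts)
    ]
    ois = sum(per_image_best) / len(per_image_best)
    return ods, ois


probs = [[0.6, 0.4, 0.2, 0.1], [0.9, 0.8, 0.7, 0.1]]
gts = [[1, 1, 0, 0], [1, 1, 0, 0]]
ods, ois = ods_ois(probs, gts, thresholds=[0.25, 0.5, 0.75])
print(round(ods, 3), round(ois, 3))  # 0.889 1.0
```

In this toy case each image can be segmented perfectly, but only at a different threshold, so OIS reaches 1.0 while the single shared threshold of ODS has to compromise.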

4.3. Comparative Analysis

The effectiveness of the proposed model was assessed through a comparative experiment against seven models: U-Net [35], SegNet [36], DeepLab [37], DeepLabV3, HrSegNet-B16 [38], CrackNex [39], and CrackFormer [40]. All models were evaluated on the custom-built slope crack dataset. As shown in Table 2, the proposed Crack-Net achieves the best overall performance across the evaluation metrics, including the highest AP (0.921). Its AP is 5.2 percentage points higher than that of CrackFormer (0.869), the strongest baseline, which highlights the effectiveness of the frequency-domain nonlinear mapping module in separating background noise from crack features. In addition, the deep feature fusion module improves subpixel boundary localization through cross-layer feature alignment, which significantly increases model accuracy while maintaining a high recall rate. Besides AP, the proposed model achieves the highest F1 score (0.833) and mIoU (0.865), demonstrating superior performance in both segmentation accuracy and region overlap. Furthermore, the proposed model contains 22.3 million parameters, fewer than five of the seven comparison models.

Moreover, the transfer learning experiment (Table 3) shows that pre-training raised the AP from 92.1% to 93.7%. Regarding convergence speed, the transfer learning group converged in 73 epochs, whereas the group without transfer learning required 126 epochs, a 42.1% reduction in the number of epochs; the training time fell from 8.7 h to 4.2 h. These results indicate that pre-trained weights provide a more stable optimization path. Overall, the findings demonstrate that the transfer learning strategy improves training efficiency and model stability and confirm the practical value of the proposed method for slope crack detection with limited sample sizes.
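The relative improvements quoted from Table 3 follow from simple arithmetic, reproduced here as a check. The helper name `relative_reduction` is only illustrative.

```python
def relative_reduction(before, after):
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before


epochs = relative_reduction(126, 73)   # convergence epochs, Table 3
hours = relative_reduction(8.7, 4.2)   # training time in hours, Table 3
ap_gain = 0.937 - 0.921                # AP with vs. without transfer learning

print(f"epochs: {epochs:.1%}, training time: {hours:.1%}, "
      f"AP: +{ap_gain * 100:.1f} points")
```

The epoch count falls by about 42.1% and the training time by about 51.7%, while the AP gain is 1.6 percentage points.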

Figure 8 compares the predicted masks of the proposed model with those of U-Net, SegNet, DeepLabV3, CrackNex, and CrackFormer on the custom slope crack dataset. U-Net performs worst, producing incomplete detections compared with the other models. The proposed model outperforms the other models and identifies minor cracks, although its predictions still fall short of the ground-truth masks, which leaves room for further refinement.

4.4. Extreme Sample Visualization Analysis

To further assess the robustness and generalization capability of the proposed model, we performed a visual analysis of the best and worst performing samples from the test set. As illustrated in Figure 9, for the best performing samples, the model achieves high F1 scores (greater than 0.89) and IoU values (greater than 0.81) while maintaining a favorable balance between precision and recall. The predicted probability maps closely align with the ground truth, and the binary segmentation results accurately delineate crack regions, even in the presence of complex backgrounds or fine crack structures. This demonstrates the model’s strong ability to capture subtle crack features and suppress background noise.

In contrast, the worst performing samples show lower F1 and IoU scores (Figure 10), mainly due to challenging conditions such as low contrast, blurred cracks, or significant background interference. In these cases, the probability maps display weaker responses along crack regions, and the binary segmentation results may miss fine cracks or introduce false positives. Notably, samples with high recall but low precision indicate a tendency to over-segment ambiguous regions, whereas those with high precision but low recall suggest missed detections of faint cracks.
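The per-sample scores used to rank best and worst cases can be sketched as follows. Since IoU = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN), the two are linked by IoU = F1 / (2 − F1), so an F1 of 0.89 corresponds to an IoU of about 0.80. The `margin` cutoff, the toy masks, and the function names are illustrative assumptions, not part of the paper.

```python
def sample_metrics(pred, gt):
    """Precision, recall, F1 and IoU for one binary mask pair (flat 0/1 lists)."""
    tp = sum(1 for p, g in zip(pred, gt) if p and g)
    fp = sum(1 for p, g in zip(pred, gt) if p and not g)
    fn = sum(1 for p, g in zip(pred, gt) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou


def failure_mode(precision, recall, margin=0.1):
    """Label the dominant error type; `margin` is an arbitrary cutoff."""
    if recall - precision > margin:
        return "over-segmentation"
    if precision - recall > margin:
        return "missed detections"
    return "balanced"


gt = [1, 1, 1, 1, 0, 0, 0, 0]
pred = [1, 1, 1, 1, 1, 1, 0, 0]  # all crack pixels found, two false positives
p, r, f1, iou = sample_metrics(pred, gt)
print(failure_mode(p, r), round(f1, 3), round(iou, 3))
```

Here recall is perfect but precision is lower, which matches the over-segmentation pattern described for high-recall, low-precision samples.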

Building upon these comparative results, we further evaluate the stability and reproducibility of our proposed method through comprehensive statistical analysis, as presented in the following section.

4.5. Model Stability and Reproducibility Analysis

To comprehensively assess the stability and reproducibility of Crack-Net under various experimental conditions, we performed 50 independent runs for each of five representative scenarios: baseline, test-time augmentation (TTA), MC dropout, weight noise perturbation, and threshold variation. We conducted statistical analyses of key performance metrics. The mean, standard deviation, and coefficient of variation (CV) for F1, AP, and mIoU are summarized for all experimental settings in Figure 11. All CV values remain below 0.017, indicating excellent model stability. Notably, the baseline model achieves CVs of 0.007 for F1 and 0.010 for AP, while the TTA strategy further reduces the CV of F1 to 0.006, demonstrating highly consistent performance across different scenarios. Furthermore, the MC dropout and weight noise experiments confirm the model’s robustness to uncertainty and parameter perturbations. The slightly higher CV observed for mIoU highlights the inherent complexity of pixel-level crack segmentation.
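The summary statistics reported for each setting can be computed as in the sketch below. The scores are made-up values for illustration, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the paper does not state which estimator was used.

```python
import statistics


def stability_summary(scores):
    """Return mean, sample standard deviation, and coefficient of variation."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample std (n - 1); an assumption
    return mean, std, std / mean


# Made-up F1 scores from three runs, for illustration only.
runs = [0.82, 0.83, 0.84]
mean, std, cv = stability_summary(runs)
print(f"mean={mean:.3f} std={std:.3f} CV={cv:.3f}")
```

Because the CV divides the spread by the mean, it allows metrics with different typical magnitudes, such as F1 and mIoU, to be compared on one scale.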

Figure 12 shows the distribution of F1, AP, mIoU, and precision for each experimental condition using box plots. The TTA and baseline methods display the most concentrated performance distributions, characterized by narrow interquartile ranges and the absence of significant outliers, which further confirms the reliability of the model architecture and transfer learning strategy. In contrast, the distributions for MC dropout and weight noise are somewhat wider, indicating increased performance variability when uncertainty or parameter perturbations are introduced. Nevertheless, the overall variability remains well controlled. In summary, Crack-Net exhibits outstanding stability and reproducibility across diverse practical scenarios, providing strong statistical evidence for its deployment in engineering applications such as slope crack detection.
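The quantities a box plot displays can be computed directly. The following sketch uses Python's `statistics.quantiles` with its default method and the common 1.5 × IQR rule for flagging outliers. Both the rule and the made-up scores are assumptions for illustration, as the paper does not specify how its box plots were drawn.

```python
import statistics


def box_stats(scores):
    """Quartiles, interquartile range, and outliers by the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [s for s in scores if s < low or s > high]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


# Made-up scores, for illustration only.
tight = [0.828, 0.829, 0.830, 0.830, 0.831, 0.832]
wide = [0.78, 0.80, 0.83, 0.85, 0.86, 0.88]
print(box_stats(tight)["iqr"] < box_stats(wide)["iqr"])  # True
```

A narrow interquartile range with no flagged outliers corresponds to the concentrated distributions described for the baseline and TTA settings.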

4.6. Ablation Study

Following the stability analysis that confirmed the reproducibility of our method, we now examine the individual contributions of each proposed component. This paper substantiates the optimization effect of each module through a series of progressive ablation experiments, as illustrated in Table 4.

With the incorporation of FDNM alone, the AP increases by 3.3 percentage points to 88.4%, and the precision improves to 78.9%. This result indicates that the dynamic spectral decomposition strategy of FDNM effectively enhances the identification of fine crack features. Meanwhile, the FPS increases to 25.5, demonstrating that the FDNM not only improves detection performance but also accelerates inference speed. When the DFFM is used alone, the AP reaches 89.2%, and the OIS index shows a significant improvement. This finding verifies the optimization effect of the multi-region fusion block on cross-layer feature alignment. At the same time, the FPS increases to 26.7, further confirming that the DFFM enhances both accuracy and efficiency. After integrating both modules, the AP further increases to 91.2%. This result demonstrates that the combined optimization strategy in both the frequency and spatial domains significantly improves the continuity prediction of complex crack structures. The improvement is due to the complementary nature of the feature enhancement directions, which clarifies the regulatory mechanism of collaborative multi-scale feature fusion. Notably, the FPS of the full model reaches 27.9, indicating that the combined modules not only maximize detection accuracy but also achieve the highest inference speed among all configurations. These findings highlight that previous methods often overlook small cracks, focusing primarily on larger ones. The proposed approach, which enhances detailed crack features and then extracts and fuses them, leads to substantial improvements in crack detection.
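The gains attributed to each module can be recomputed from Table 4. The sketch below only restates the AP and FPS columns of that table and subtracts the baseline row.

```python
# AP and FPS values copied from Table 4.
ablation = {
    "Original": {"AP": 0.851, "FPS": 22.9},
    "Only FDNM": {"AP": 0.884, "FPS": 25.5},
    "Only DFFM": {"AP": 0.892, "FPS": 26.7},
    "Full model": {"AP": 0.912, "FPS": 27.9},
}

base = ablation["Original"]
gains = {
    name: (round((row["AP"] - base["AP"]) * 100, 1),
           round(row["FPS"] - base["FPS"], 1))
    for name, row in ablation.items()
    if name != "Original"
}
for name, (ap_points, fps) in gains.items():
    print(f"{name}: AP +{ap_points} points, FPS +{fps}")
```

The two single-module gains (3.3 and 4.1 points) sum to more than the 6.1-point gain of the full model, which suggests that the modules' contributions overlap in part rather than adding up independently.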

5. Conclusions

This paper presents a Crack-Net model based on a transfer learning framework to address challenges in slope crack detection. Pre-training on a large crack dataset and subsequently transferring the model to a smaller, target-specific dataset significantly enhances its generalizability. This improvement is particularly evident when only limited samples are available. The model incorporates a novel frequency-domain nonlinear mapping module to address the challenges posed by the irregular and sporadic nature of cracks. This module enhances the expressiveness and diversity of crack features through frequency-domain feature enhancement, enabling the model to detect and recognize even minimally sized cracks. In addition, a novel deep feature fusion module is proposed to improve feature interconnectivity and integration between the encoder and decoder. This module further enhances crack feature representation, suppresses irrelevant features, and increases the model’s robustness.

The Crack-Net model performs best on slope surfaces that are relatively clean, with moderate-to-steep gradients, adequate natural lighting, and minimal interference from vegetation or shadows. Under these conditions, crack features are more prominent and can be effectively extracted. Conversely, detection accuracy may decline in environments with dense vegetation, significant shadowing, low-light conditions, or adverse weather such as heavy rain or fog, as these factors can obscure or distort crack features. Future research will focus on further optimizing the Crack-Net architecture, exploring advanced feature extraction and integration techniques, and enhancing the model’s robustness and adaptability to such complex real-world environments. Additionally, the model will be rigorously tested in diverse, practical application scenarios. Crack-Net is expected to provide reliable and intelligent technical support for slope safety monitoring, thereby contributing to reductions in geological disasters such as landslides and enhancing the protection of lives and property.

Author Contributions

Conceptualization, Y.Z.; methodology, N.L. and Y.Z.; investigation, N.L. and Y.Z.; resources, Q.Z. and S.Z.; writing—original draft preparation, N.L.; writing—review and editing, N.L., Y.Z., Q.Z. and S.Z.; visualization, Y.Z.; supervision, Q.Z. and S.Z.; project administration, N.L. All authors reviewed the results and approved the final version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number 62002285) and the Young Innovation Team of Shaanxi Universities.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The public CrackVision12K dataset can be downloaded from https://www.kaggle.com/datasets/vangiap/crackvision12k/data (accessed on 17 December 2024). As the customized dataset used in this study is not easily accessible and is subject to third-party, commercial, or privacy restrictions, interested researchers may contact the corresponding author via email to discuss data access.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Azarafza, M.; Akgün, H.; Ghazifard, A.; Asghari-Kaljahi, E.; Rahnamarad, J.; Derakhshani, R. Discontinuous rock slope stability analysis by limit equilibrium approaches—A review. Int. J. Digit. Earth 2021, 14, 1918–1941.
2. Zhang, R.; Tang, P.; Lan, T.; Liu, Z.; Ling, S. Resilient and sustainability analysis of flexible supporting structure of expansive soil slope. Sustainability 2022, 14, 12813.
3. Xiao, S.; Dai, T.; Li, S. Review and comparative analysis of factor of safety definitions in slope stability. Geotech. Geol. Eng. 2024, 42, 4263–4283.
4. Chen, X.; Jing, X.; Li, X.; Chen, J.; Ma, Q.; Liu, X. Slope crack propagation law and numerical simulation of expansive soil under Wetting–Drying cycles. Sustainability 2023, 15, 5655.
5. Gao, Q.F.; Zeng, L.; Shi, Z.N. Effects of desiccation cracks and vegetation on the shallow stability of a red clay cut slope under rainfall infiltration. Comput. Geotech. 2021, 140, 104436.
6. Yuan, Q.; Shi, Y.; Li, M. A review of computer vision-based crack detection methods in civil infrastructure: Progress and challenges. Remote Sens. 2024, 16, 2910.
7. Neyestani, A.; Ahmed, I.; Daponte, P.; De Vito, L. Concrete Crack Detection and Segmentation in Civil Infrastructures Using UAVs and Deep Learning. In Proceedings of the 2023 7th International Conference on Internet of Things and Applications (IoT), Athens, Greece, 18–20 September 2023; pp. 1–6.
8. Zhuang, H.; Cheng, Y.; Zhou, M.; Yang, Z. Deep learning for surface crack detection in civil engineering: A comprehensive review. Measurement 2025, 248, 116908.
9. Mohammed, M.A.; Han, Z.; Li, Y. Exploring the detection accuracy of concrete cracks using various CNN models. Adv. Mater. Sci. Eng. 2021, 2021, 9923704.
10. Wu, Q.; Song, Z.; Chen, H.; Lu, Y.; Zhou, L. A highway pavement crack identification method based on an improved U-Net model. Appl. Sci. 2023, 13, 7227.
11. Huang, W.; Zhang, N. A novel road crack detection and identification method using digital image processing techniques. In Proceedings of the 2012 7th International Conference on Computing and Convergence Technology (ICCCT), Daejeon, Republic of Korea, 29–31 October 2012; pp. 397–400.
12. Cao, X.; Li, T.; Bai, J.; Wei, Z. Identification and Classification of Surface Cracks on Concrete Members Based on Image Processing. Trait. Du Signal 2020, 37, 519–525.
13. Safaei, N.; Smadi, O.; Masoud, A.; Safaei, B. An automatic image processing algorithm based on crack pixel density for pavement crack detection and classification. Int. J. Pavement Res. Technol. 2022, 15, 159–172.
14. Kheradmandi, N.; Mehranfar, V. A critical review and comparative study on image segmentation-based techniques for pavement crack detection. Constr. Build. Mater. 2022, 321, 126162.
15. Müller, A.; Karathanasopoulos, N.; Roth, C.C.; Mohr, D. Machine learning classifiers for surface crack detection in fracture experiments. Int. J. Mech. Sci. 2021, 209, 106698.
16. Loverdos, D.; Sarhosis, V. Automatic image-based brick segmentation and crack detection of masonry walls using machine learning. Autom. Constr. 2022, 140, 104389.
17. Aravind, N.; Nagajothi, S.; Elavenil, S. Machine learning model for predicting the crack detection and pattern recognition of geopolymer concrete beams. Constr. Build. Mater. 2021, 297, 123785.
18. Ahmadi, A.; Khalesi, S.; Golroo, A. An integrated machine learning model for automatic road crack detection and classification in urban areas. Int. J. Pavement Eng. 2022, 23, 3536–3552.
19. Jiang, X.; Mao, S.; Li, M.; Liu, H.; Zhang, H.; Fang, S.; Yuan, M.; Zhang, C. MFPA-Net: An efficient deep learning network for automatic ground fissures extraction in UAV images of the coal mining area. Int. J. Appl. Earth Obs. Geoinf. 2022, 114, 103039.
20. Zhang, L.; Yang, F.; Zhang, Y.D.; Zhu, Y.J. Road crack detection using deep convolutional neural network. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Atlanta, GA, USA, 25–28 September 2016; pp. 3708–3712.
21. Yang, X.; Li, H.; Yu, Y.; Luo, X.; Huang, T.; Yang, X. Automatic pixel-level crack detection and measurement using fully convolutional network. Comput.-Aided Civ. Infrastruct. Eng. 2018, 33, 1090–1109.
22. Gao, Y.; Cao, H.; Cai, W.; Zhou, G. Pixel-level road crack detection in UAV remote sensing images based on ARD-Unet. Measurement 2023, 219, 113252.
23. Laxman, K.; Tabassum, N.; Ai, L.; Cole, C.; Ziehl, P. Automated crack detection and crack depth prediction for reinforced concrete structures using deep learning. Constr. Build. Mater. 2023, 370, 130709.
24. Ali, R.; Chuah, J.H.; Talip, M.S.A.; Mokhtar, N.; Shoaib, M.A. Structural crack detection using deep convolutional neural networks. Autom. Constr. 2022, 133, 103989.
25. Su, P.; Han, H.; Liu, M.; Yang, T.; Liu, S. MOD-YOLO: Rethinking the YOLO architecture at the level of feature information and applying it to crack detection. Expert Syst. Appl. 2024, 237, 121346.
26. Saberironaghi, A.; Ren, J. DepthCrackNet: A Deep Learning Model for Automatic Pavement Crack Detection. J. Imaging 2024, 10, 100.
27. Zhang, K.; Cheng, H.D.; Zhang, B. Unified approach to pavement crack and sealed crack detection using preclassification based on transfer learning. J. Comput. Civ. Eng. 2018, 32, 04018001.
28. Jin, X.; Bu, J.; Yu, Z.; Zhang, H.; Wang, Y. FedCrack: Federated transfer learning with unsupervised representation for crack detection. IEEE Trans. Intell. Transp. Syst. 2023, 24, 11171–11184.
29. Qingyi, W.; Bo, C. A novel transfer learning model for the real-time concrete crack detection. Knowl.-Based Syst. 2024, 301, 112313.
30. Katsigiannis, S.; Seyedzadeh, S.; Agapiou, A.; Ramzan, N. Deep learning for crack detection on masonry façades using limited data and transfer learning. J. Build. Eng. 2023, 76, 107105.
31. Wu, L.; Lin, X.; Chen, Z.; Lin, P.; Cheng, S. Surface crack detection based on image stitching and transfer learning with pretrained convolutional neural network. Struct. Control Health Monit. 2021, 28, e2766.
32. Zheng, Z.; Qi, H.; Zhuang, L.; Zhang, Z. Automated rail surface crack analytics using deep data-driven models and transfer learning. Sustain. Cities Soc. 2021, 70, 102898.
33. Su, C.; Wang, W. Concrete cracks detection using convolutional neural network based on transfer learning. Math. Probl. Eng. 2020, 2020, 7240129.
34. Vinodhini, K.A.; Sidhaarth, K.R.A. Pothole detection in bituminous road using CNN with transfer learning. Meas. Sens. 2024, 31, 100940.
35. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241.
36. Badrinarayanan, V.; Kendall, A.; Cipolla, R. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
37. Voon, Z.C.; Chaw, J.K. Crack Segmentation Using DeepLab. Ph.D. Thesis, Tunku Abdul Rahman University College, Kuala Lumpur, Malaysia, 2020.
38. Li, Y.; Ma, R.; Liu, H.; Cheng, G. Real-time high-resolution neural network with semantic guidance for crack segmentation. Autom. Constr. 2023, 156, 105112.
39. Yao, Z.; Xu, J.; Hou, S.; Chuah, M.C. Cracknex: A few-shot low-light crack segmentation model based on retinex theory for uav inspections. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024; pp. 11155–11162.
40. Liu, H.; Miao, X.; Mertz, C.; Xu, C.; Kong, H. Crackformer: Transformer network for fine-grained crack detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 3783–3792.

Figure 1. Structure of Crack-Net.

Figure 2. Structural diagram of the dynamic frequency spectrum processing module.

Figure 3. Structural diagram of the bi-attention fusion module.

Figure 4. Structural diagram of the deep feature fusion module.

Figure 5. Schematic diagram of transfer learning for slope cracks.

Figure 6. Dataset annotation.

Figure 7. Data augmentation.

Figure 8. Comparison of model results.

Figure 9. Optimal sample.

Figure 10. Worst sample.

Figure 11. Performance and stability comparison across experimental settings.

Figure 12. Distribution analysis of performance metrics across experimental settings.

Table 1. Comparison of crack detection methods.

Method	Typical Scenario	Advantages	Limitations	Applicability to Slopes
Image Processing	Roads/Concrete surfaces	Simple and fast	Sensitive to lighting; background interference; poor for fine cracks	Low
Machine Learning	Buildings/Pavements	Learns complex features	Requires feature engineering; limited generalization	Moderate to low
Deep Learning	General structures	Automatic feature extraction; high accuracy	Requires large datasets; high computational cost	High potential
Transfer Learning	Pavements/Slopes	Efficient with small samples; strong generalization	Sensitive to domain differences; few studies on slopes	High

Table 2. Comparison test.

Model	P	R	AP	ODS	OIS	F1	mIoU	Params (M)
U-Net	0.681	0.716	0.762	0.665	0.684	0.698	0.761	31.0
SegNet	0.674	0.806	0.771	0.689	0.704	0.734	0.758	29.5
DeepLab	0.694	0.802	0.793	0.697	0.714	0.744	0.772	26.6
DeepLabV3	0.712	0.810	0.827	0.715	0.735	0.758	0.785	11.38
HrSegNet-B16	0.748	0.813	0.844	0.734	0.757	0.779	0.784	9.51
CrackNex	0.776	0.808	0.853	0.757	0.789	0.792	0.801	28.3
CrackFormer	0.780	0.818	0.869	0.764	0.797	0.799	0.843	25.4
Ours	0.823	0.844	0.921	0.817	0.841	0.833	0.865	22.3

Table 3. Results of the transfer learning experiment.

Experimental Group	P	R	AP	ODS	OIS	Convergence Epochs	Training Time (h)
No transfer learning	0.823	0.844	0.921	0.817	0.841	126	8.7
With transfer learning	0.854	0.847	0.937	0.821	0.845	73	4.2

Table 4. Ablation experiment.

Model	FDNM	DFFM	P	R	AP	ODS	OIS	FPS
Original	×	×	0.741	0.819	0.851	0.731	0.762	22.9
Only FDNM	✓	×	0.789	0.835	0.884	0.779	0.811	25.5
Only DFFM	×	✓	0.816	0.821	0.892	0.774	0.815	26.7
Full model	✓	✓	0.834	0.846	0.912	0.808	0.830	27.9

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
