Article

Wind Turbine Surface Crack Detection Based on YOLOv5l-GCB

1
CGN New Energy Investment (Shenzhen) Co., Ltd. (Jilin Branch), Changchun 130028, China
2
School of Civil Engineering and Architecture, Northeast Electric Power University, Jilin 132012, China
*
Author to whom correspondence should be addressed.
Energies 2025, 18(11), 2775; https://doi.org/10.3390/en18112775
Submission received: 23 April 2025 / Revised: 17 May 2025 / Accepted: 22 May 2025 / Published: 27 May 2025

Abstract

As a fundamental element of the wind power generation system, the wind turbine tower requires timely detection and rectification of surface cracks and other defects to ensure the stable operation of the entire system. A new wind turbine tower surface crack detection model, You Only Look Once version 5l GhostNetV2-CBAM-BiFPN (YOLOv5l-GCB), is proposed to accurately classify wind turbine tower surface cracks. Ghost Network Version 2 (GhostNetV2) is integrated into the backbone of YOLOv5l to lighten the backbone, which reduces the complexity of the model and increases inference speed; the Convolutional Block Attention Module (CBAM) is added to strengthen the model's attention to the target region; and the bidirectional feature pyramid network (BiFPN) is introduced to enhance the model's detection accuracy in complex scenes. The proposed improvement strategy is verified through ablation experiments. The experimental results indicate that the precision, recall, F1 score, and mean average precision of YOLOv5l-GCB reach 91.6%, 99.0%, 75.0%, and 84.6%, respectively, which are 4.7%, 2%, 1%, and 10.4% higher than those of YOLOv5l; the model accurately recognizes multiple types of cracks and detects an average of 28 images per second, improving the detection speed.

1. Introduction

The automatic detection and analysis of cracks in reinforced concrete structures is one of the most essential research topics in the field of structural engineering. The formation of cracks is affected by a number of factors such as the structure’s mechanical behavior, external environment, construction quality, material properties, and long-term loading [1]. Crack detection is an important aspect of the safety evaluation of concrete structures, as cracks are among the most common defects in concrete structures, and their presence can severely affect the functionality, durability, and safety of structures, or worse, lead to catastrophic events such as collapse.
In engineering practice, the detection of surface cracks on structures has long relied on manual inspection methods: applying plaster or pasting paper tape over the cracks to observe whether they break as the cracks expand, or placing a crack comparison card close to the cracks and estimating the width by visual comparison. Although these methods are quick, easy, and inexpensive, the processes are laborious and time-consuming, the interpretation of the final results is potentially subjective, and fieldwork presents a safety risk. Some advanced non-destructive testing (NDT) techniques have been applied to surface crack identification in concrete structures, but these methods also have obvious drawbacks: the ultrasonic method requires the structural surface to be leveled and places high technical demands on the operator; the thermal imaging method is suitable for rapid screening of large areas but has low sensitivity to fine cracks; and the acoustic emission technique may mistake non-crack-related stress waves (e.g., rebar friction or temperature deformation) for crack signals, is costly to apply to large structures, and is unsuitable for concrete with severely deteriorated surfaces [2,3,4,5,6,7,8,9]. Driven by two interlinked factors, the rapid progress of computer vision technologies and the continuous improvement of image capture equipment, crack detection methods based on image recognition have begun to attract attention. As a better alternative to traditional detection methods, the application of image recognition technology in this domain can be divided into two categories: deep learning-based techniques and image processing techniques [10,11]. Various image processing techniques exist, such as image preprocessing, morphological operations, image percolation, and edge detection.
In order to identify cracks in an image, a variety of image processing methods can be employed, including edge detection, image normalization, smoothing, thresholding, and grayscale scaling [12].
As deep learning theory continues to develop, a substantial corpus of research has been amassed by scholars regarding the development laws of concrete structural cracks, as well as target detection, using various deep learning models. Dung et al. [13] raised an approach for crack detection in concrete images based on a deep fully convolutional network (FCN). Yang et al. [14] investigated the crack expansion in reinforced concrete members using the kinetic evolution law of cracks. Sarhadi et al. [15] raised a new image segmentation method to recognize cracks in concrete using optimized U-Net++ architecture. In light of the identified deficit pertaining to a paucity of crack sample sets, Wu et al. [16] raised a self-supervised concrete surface crack recognition method utilizing the second-generation vector-quantized variational autoencoder (VQ-VAE-2), which achieved efficient detection. However, these methods have more parameters, high arithmetic capacity, and high requirements on computer configuration. Song et al. [17] proposed an enhanced semantic segmentation algorithm, termed Another Finite State Machine Network (AFSM-Net), which was developed on the basis of DeepLabv3+. This innovation addressed the limitations of conventional semantic segmentation models, such as low segmentation accuracy, extensive parameterization, and substantial computing demands for cracks. Wang et al. [18] developed a concrete structure surface crack recognition method using Residual Network 18 layers (ResNet-18), combining algorithms for image processing and deep learning theory. This approach has been shown to have the advantages of low cost, high efficacy, and strong practicality. Numerous scholars have also utilized the You Only Look Once (YOLO) model for detecting and counting structural surface cracks [19,20,21,22,23], which achieved higher accuracy and speed, but did not accurately classify the cracks.
In summary, existing research has made significant progress in identifying surface cracks on concrete structures, but little of it addresses wind turbine towers, whose complex surrounding environment and towering structure make detection more difficult. Combining deep learning technology, this paper proposes a new wind turbine tower surface crack detection model, YOLOv5l-GCB, to accurately classify wind turbine tower surface cracks. GhostNetV2 is incorporated into the backbone of YOLOv5l to lighten the backbone, reduce model complexity, and increase inference speed; the CBAM is added to enhance the model's attention to the target region; and the bidirectional feature pyramid network (BiFPN) is introduced to enhance detection performance in complex scenes. Finally, the feasibility of these improvements is verified by ablation experiments, and the detection performance of YOLOv5l-GCB is further verified through comparison experiments with other crack detection models.

2. Wind Turbine Tower Surface Crack Detection Model

2.1. YOLOv5l Target Detection Model

The network architecture of YOLOv5 consists mainly of backbone, neck, and head networks. Depending on the depth of the network and the width of the feature layers, YOLOv5 is divided into the following versions: n, s, m, l, and x. The dataset generated in this study was used to test the detection performance of the five versions of YOLOv5 in order to select the appropriate one; Table 1 shows the results. Among the five versions, YOLOv5n has the lowest floating-point operations (FLOPs), parameter count (Params), and precision, while YOLOv5x has the highest, and the FLOPs, parameter count, and detection precision of YOLOv5m and YOLOv5l lie between those of YOLOv5s and YOLOv5x. Considering the balance between model complexity and detection precision, this paper selects YOLOv5l as the basic model and improves it. Compared with YOLOv5x, which has the best detection performance, YOLOv5l dramatically reduces FLOPs and parameter count at the cost of only a 0.2-percentage-point decrease in precision.

2.2. YOLOv5l-GCB Target Detection Model

In this study, the following improvements are made on the basis of YOLOv5l to obtain a wind turbine tower surface crack detection model, YOLOv5l-GCB, with higher accuracy and efficiency: (1) GhostNetV2 is introduced into the backbone; (2) the CBAM is added to the neck; and (3) the BiFPN is introduced into the neck. Figure 1 shows the structure of YOLOv5l-GCB.

2.2.1. GhostNetV2

The introduction of lightweight networks is currently an effective strategy for reducing a model's computational cost and parameters; the most representative lightweight networks are the Mobile Network (MobileNet) [24], Shuffle Network (ShuffleNet) [25], and Ghost Network (GhostNet) [26]. MobileNet decomposes the standard convolution into depthwise and pointwise convolutions, and Mobile Network Version 2 (MobileNetV2) [27] and Mobile Network Version 3 (MobileNetV3) [28] improve the network structure by introducing inverted residual blocks. ShuffleNet uses channel rearrangement to achieve information exchange, while Shuffle Network Version 2 (ShuffleNetV2) [29] reduces the network branches on this basis to improve the model's inference speed. Results on the ImageNet dataset demonstrate that GhostNetV2 outperforms the other networks in classification accuracy, parameter count, and detection speed [30], so in this article, we introduce GhostNetV2 into the backbone of YOLOv5l to replace the ELAN module and test its effectiveness.
Given an input feature X ∈ R^(H×W×C), where H, W, and C are the height, width, and number of channels of the feature map, respectively, GhostNetV1 linearly transforms part of the feature map generated by a small number of conventional convolutions, which exploits the redundancy among feature maps while inheriting the advantages of conventional convolutions. The intrinsic features are first generated using a 1 × 1 convolution, as shown in Equation (1), where ∗ denotes the convolution operation, Y′ denotes the intrinsic features, and F1×1 denotes the pointwise (1 × 1) convolution filter.
Y′ = X ∗ F1×1
Y′ ∈ R^(H×W×C′out) denotes the intrinsic features, whose channel count is generally lower than that of the final output feature Y ∈ R^(H×W×Cout), i.e., C′out < Cout. Subsequently, more features are generated using a cheap depthwise convolution, and the two parts are then spliced along the channel dimension as shown in Equation (2), where Fdp denotes the depthwise convolutional filter and Concat denotes the splicing operation.
Y = Concat([Y′, Y′ ∗ Fdp])
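As a concrete illustration of Equations (1) and (2), a ghost module can be sketched in NumPy as follows. This is a simplified sketch with illustrative weight arrays (`w_point`, `w_depth` are placeholders, not trained GhostNetV2 parameters), intended only to show the intrinsic-plus-cheap-feature construction:

```python
import numpy as np

def ghost_module(x, w_point, w_depth):
    """Ghost module sketch following Equations (1) and (2): generate a few
    intrinsic features with a 1x1 (pointwise) convolution, derive cheap
    features from them with a 3x3 depthwise convolution, then concatenate."""
    # x: (H, W, C_in); w_point: (C_in, C_half); w_depth: (3, 3, C_half)
    y_prime = np.einsum('hwc,cd->hwd', x, w_point)      # Eq. (1): Y' = X * F1x1
    H, W, C = y_prime.shape
    padded = np.pad(y_prime, ((1, 1), (1, 1), (0, 0)))  # zero-pad for 3x3 conv
    cheap = np.zeros_like(y_prime)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + 3, j:j + 3, :]          # (3, 3, C_half) window
            cheap[i, j] = np.einsum('klc,klc->c', patch, w_depth)
    return np.concatenate([y_prime, cheap], axis=-1)     # Eq. (2): channel splice
```

Because the cheap depthwise branch operates on each intrinsic channel independently, the module produces twice the channels of the pointwise step at a fraction of the cost of a full convolution.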
However, only the intrinsic half of the features achieves information interaction through the 1 × 1 pointwise convolution, while the cheaply generated half does not. Therefore, GhostNetV2 proposes a novel decoupled fully connected (DFC) attention mechanism: given a feature Z ∈ R^(H×W×C), with the feature map flattened into tokens A = [α11, α12, …, αHW], the fully connected (FC) layer is decoupled so that attention is aggregated along horizontal and vertical pixels separately.
The architecture of GhostNetV2 is characterized by an inverted bottleneck design, which incorporates two ghost modules that increase and then decrease the feature dimensions, respectively. The DFC mechanism operates in parallel with the first ghost module to enhance the capture of long-distance spatial information. The schematic diagram of GhostNetV2 with the DFC attention mechanism is shown in Figure 2.

2.2.2. CBAM

The CBAM attention mechanism constitutes a module that integrates both the channel attention mechanism (CAM) and the spatial attention mechanism (SAM) [31]. Figure 3 shows its structure.
The CAM augments the model's focus on the salient contour features of the target. Maximum and average pooling are applied concurrently to the original feature map F, yielding two 1 × 1 × C channel descriptors. These descriptors are then processed by a shared two-layer fully connected network, summed, and passed through the sigmoid activation function, producing a one-dimensional weight vector whose length equals the number of channels of the input feature map. This vector is multiplied channel-wise with the input feature map. The process is given in Equation (3), where M_C(F) denotes the channel attention output weights; σ denotes the sigmoid activation function; F^c_avg and F^c_max denote the channel descriptors obtained by average and maximum pooling, respectively; and W0 and W1 denote the weight matrices of the first and second fully connected layers.
M_C(F) = σ(W1(W0(F^c_avg)) + W1(W0(F^c_max)))
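Equation (3) can be sketched in NumPy as follows. This is a minimal sketch: `w0` and `w1` stand in for the shared two-layer MLP weights and are illustrative, not trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(f, w0, w1):
    """Channel attention per Eq. (3): pool spatially, pass both descriptors
    through a shared two-layer MLP (w0, w1), sum, squash, reweight channels."""
    # f: (H, W, C); w0: (C, C//r); w1: (C//r, C)
    avg = f.mean(axis=(0, 1))                     # F^c_avg: (C,)
    mx = f.max(axis=(0, 1))                       # F^c_max: (C,)
    mc = sigmoid(avg @ w0 @ w1 + mx @ w0 @ w1)    # M_C(F): (C,)
    return f * mc                                 # channel-wise reweighting
```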
The SAM enhances the model's ability to concentrate on location-relevant features. Maximum and average pooling are applied along the channel dimension of the input feature map F, producing two single-channel feature maps with the same spatial dimensions as the input. These two maps are concatenated and processed by a standard 7 × 7 convolution followed by a sigmoid function, yielding a single-channel spatial weight map with the same spatial dimensions as the input. The weight map is then multiplied element-wise with the input feature map. The process is expressed in Equation (4), where M_S(F) denotes the output weights of the spatial attention mechanism; f^(7×7) denotes the 7 × 7 convolution kernel; and F^s_avg and F^s_max denote the feature maps obtained by average and maximum pooling across the channel dimension.
M_S(F) = σ(f^(7×7)([F^s_avg; F^s_max]))
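Equation (4) can likewise be sketched in NumPy. Again a minimal sketch: `w7` stands in for the learned 7 × 7 kernel over the two pooled channels:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_attention(f, w7):
    """Spatial attention per Eq. (4): pool across channels, concatenate the
    two maps, apply a 7x7 convolution and a sigmoid, reweight each location."""
    # f: (H, W, C); w7: (7, 7, 2) convolution kernel
    avg = f.mean(axis=-1, keepdims=True)                 # F^s_avg: (H, W, 1)
    mx = f.max(axis=-1, keepdims=True)                   # F^s_max: (H, W, 1)
    stacked = np.concatenate([avg, mx], axis=-1)         # (H, W, 2)
    padded = np.pad(stacked, ((3, 3), (3, 3), (0, 0)))   # zero-pad for 7x7 conv
    H, W = f.shape[:2]
    ms = np.empty((H, W, 1))
    for i in range(H):
        for j in range(W):
            ms[i, j, 0] = sigmoid(np.sum(padded[i:i + 7, j:j + 7] * w7))
    return f * ms                                        # location reweighting
```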

2.2.3. BiFPN Bidirectional Pyramids

In the original YOLOv5l model, a top-down feature pyramid network (FPN) [32] provides only unidirectional information flow. The BiFPN treats each pair of top-down and bottom-up paths as one feature network layer and stacks such layers, enabling bidirectional cross-scale connectivity. In addition, it applies weighted feature fusion, which has been shown to improve performance while reducing the number of parameters required: learnable weights define the importance of different input features, and features are fused iteratively through repeated top-down and bottom-up multiscale fusion. In this study, the BiFPN (a multiscale feature fusion method centered on bidirectional cross-scale connectivity and weighted feature fusion) is adopted because it can aggregate information from features of differing resolutions and incorporate more features without introducing an excessive number of parameters, which provides richer semantic information for the network and markedly enhances the model's recognition capability in complex scenes. The BiFPN structure is shown in Figure 4.
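The weighted fusion step can be sketched as follows, using BiFPN's fast normalized fusion rule O = Σᵢ (wᵢ / (ε + Σⱼ wⱼ)) · Iᵢ with non-negative learnable weights wᵢ. This is a sketch of the fusion arithmetic only, not the full network:

```python
import numpy as np

def weighted_fusion(features, weights, eps=1e-4):
    """Fast normalized weighted fusion of same-shaped feature maps:
    O = sum_i (w_i / (eps + sum_j w_j)) * I_i, with weights kept >= 0."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU on the weights
    w = w / (w.sum() + eps)                                # normalize to ~1
    return sum(wi * fi for wi, fi in zip(w, features))
```

Normalizing by the weight sum (plus a small ε for stability) keeps the fused output on the same scale as its inputs, which is why this scheme adds expressiveness at almost no parameter cost.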

3. Experiments and Results

3.1. Dataset

Since there is no publicly available dataset on the surface cracks of wind turbine towers, a dataset was constructed in this study through on-site photography. To meet the actual needs of the project, the following factors were taken into account when taking the photos: lighting conditions, shooting angle, and shooting distance. The shooting equipment comprised a Huawei Mate50Pro and a Mate30Pro (Huawei, Shenzhen, China), with photo resolutions of 1280 × 720 and 1080 × 1920 pixels. The 3144 photos are divided into six categories: block crack, longitudinal crack, transverse crack, alligator crack, oblique crack, and pothole, as shown in Figure 5. The LabelImg (v1.8.5) image annotation tool was then used to draw rectangular bounding boxes in the Pascal VOC [33] format, and the annotations were converted to the YOLO format after labeling was completed. The dataset was split into a training set and a test set at 80% and 20%, respectively.
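The annotation conversion described above follows the standard mapping from Pascal VOC corner coordinates to YOLO normalized center coordinates, which can be sketched as (the function name is illustrative, not a tool from the paper):

```python
def voc_to_yolo(box, img_w, img_h):
    """Convert a Pascal VOC box (xmin, ymin, xmax, ymax, in pixels) to the
    YOLO format (x_center, y_center, width, height, normalized to [0, 1])."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2 / img_w,
            (ymin + ymax) / 2 / img_h,
            (xmax - xmin) / img_w,
            (ymax - ymin) / img_h)
```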

3.2. Experimental Platforms and Evaluation Indicators

The experimental platform is shown in Table 2 and the hyper-parameter settings are shown in Table 3. The hyper-parameters are set with reference to the official best configuration when training the YOLOv5 model on the COCO public dataset and, by transplanting the training parameters of the original model, can provide a better starting point for the training of YOLOv5l-GCB, accelerate the training process, and thus enhance the efficacy of the model on the concrete crack detection task [34].
To measure the experimental effect of the methodology in this article, this paper uses the evaluation metrics commonly used in deep learning, as demonstrated in Table 4.
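The precision, recall, and F1 metrics used throughout follow their standard definitions, which can be sketched as (the true-positive, false-positive, and false-negative counts are hypothetical inputs):

```python
def precision_recall_f1(tp, fp, fn):
    """Standard detection metrics: precision = TP/(TP+FP),
    recall = TP/(TP+FN), F1 = harmonic mean of precision and recall."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return p, r, f1
```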

3.3. Training Results

YOLOv5l and YOLOv5l-GCB were trained using the dataset created for this paper, and the resulting loss curves are shown in Figure 6. Comparing the loss trends before and after the model improvement, it is evident that at the end of training, the objectness, box, and classification loss functions of YOLOv5l-GCB reach smaller values. This indicates that the error between the prediction boxes generated by YOLOv5l-GCB and the ground-truth boxes is smaller, that the predicted box locations are more accurate, and that YOLOv5l-GCB has a stronger ability to correctly predict targets.

3.4. Ablation Experiments

To confirm the effectiveness of YOLOv5l-GCB in identifying wind turbine tower surface cracks, five ablation experiments were conducted to verify the proposed improvements. Table 5 compares the baseline network with the networks improved by each module. Firstly, integrating GhostNetV2 into the backbone of YOLOv5l raises the FPS by 5.6, indicating that the GhostNetV2 module lightens the backbone, reduces model complexity, and accelerates inference. After incorporating the CBAM, P, R, and mAP increased by 0.6%, 1%, and 3%, respectively, showing that the CBAM reinforces the model's emphasis on the target region. After introducing the BiFPN, P, R, and mAP increased by 0.9%, 1%, and 3%, respectively, suggesting that YOLOv5l-GCB effectively integrates the feature information of the shallow and deep networks after the structural replacement and improves detection precision in complex scenes.

3.5. Comparison Results of Different Model Detection

In order to perform a more objective evaluation of the overall performance of YOLOv5l-GCB, the same dataset, experimental environment, and hyper-parameter configurations were used to compare it with the existing crack detection models of the YOLO series, including YOLOv3 [19], YOLOv4 [20], YOLOv5l [21], YOLOv7 [22], and YOLOv8 [23].
The experimental results of the above models are shown in Figure 7. The YOLOv5l-GCB model improves precision, recall, F1-score, and mAP by 4.7%, 2%, 1%, and 10.4%, respectively, compared with its predecessor YOLOv5l. In terms of detection speed, YOLOv5l-GCB achieves an FPS of 28.6, meaning that 28 images can be detected per second, a 39% improvement in detection speed and a 60% saving in training and testing time compared with YOLOv5l. Compared with the other models, its P, R, F1, and mAP are the highest. Although YOLOv5l-GCB is not optimal in detection speed or training and testing time, in terms of overall performance it achieves higher accuracy and has a certain advantage in detecting cracks on the surface of wind turbine towers.
The detection results of each model for the six crack categories are shown in Figure 8.
Oblique cracks can lead to shear damage or overall destabilization of the member and are therefore harmful to the structure. For oblique cracks, YOLOv3 failed to identify the oblique crack but did not misidentify the shadow as one. YOLOv4 also failed to identify the oblique crack, misidentified the shadow as an oblique crack split into two parts, and additionally misidentified the painted area as an oblique crack. YOLOv5l identified the oblique crack successfully but misidentified the shadow as one. YOLOv8, YOLOv7, and YOLOv5l-GCB all recognized the oblique crack, with YOLOv5l-GCB delineating it more precisely.
Longitudinal cracks may reduce the overall stiffness and load-carrying capacity and are therefore more harmful to the structure. For longitudinal cracks, YOLOv3 and YOLOv4 identified only the obvious link joints rather than the cracks; YOLOv5l and YOLOv8 identified the small longitudinal cracks but with overly large bounding boxes; and YOLOv7 and YOLOv5l-GCB accurately identified both the link joints and the small longitudinal cracks, with YOLOv5l-GCB locating the small cracks more accurately and with higher confidence.
Transverse cracks, although not as hazardous as oblique and longitudinal cracks, can also reduce the flexural stiffness of a member. For transverse cracks, YOLOv3, YOLOv4, and YOLOv8 identified them successfully but also misidentified the shaded portion as a transverse crack, whereas YOLOv5l, YOLOv7, and YOLOv5l-GCB identified them successfully without misidentifying the shaded portion, with YOLOv5l-GCB discriminating all of the transverse cracks and delineating each crack in greater detail.
Alligator cracks are mostly surface damage but may accelerate carbonation and chloride ion penetration, affecting durability in the long term. For these cracks, YOLOv3 misidentified them as longitudinal cracks; YOLOv4 identified one crack with low confidence; YOLOv5l and YOLOv8 each identified only one crack, though with high confidence; YOLOv7 identified two cracks with high confidence but misidentified one as an oblique crack; and YOLOv5l-GCB identified two alligator cracks, although one with low confidence.
Block cracks usually do not affect structural safety but need to be repaired to prevent further deterioration. For block cracks, YOLOv3 misidentified them as two transverse cracks; YOLOv4 misidentified them as one transverse crack and one longitudinal crack; YOLOv5l identified them successfully but with a smaller bounding range; YOLOv7 and YOLOv5l-GCB both identified them successfully with high confidence and coverage, with YOLOv5l-GCB achieving the higher confidence and coverage; and YOLOv8 recognized them successfully but with much lower confidence and coverage than YOLOv7 and YOLOv5l-GCB.
Potholes affect the esthetics of the structure and are less harmful to it. For potholes, YOLOv3 and YOLOv4 incorrectly identified surface contamination as potholes; YOLOv5l and YOLOv8 likewise misidentified surface contamination, although they identified the true pothole defects with higher confidence; and both YOLOv7 and YOLOv5l-GCB accurately identified the pothole defects, though YOLOv7 produced one more false detection than YOLOv5l-GCB.

4. Conclusions

(1)
A wind turbine tower surface crack detection model, YOLOv5l-GCB, is proposed, which simplifies the model’s complexity and enhances the inference speed by introducing GhostNetV2 into the backbone of YOLOv5l to realize lightweighting of the backbone, includes the CBAM to enhance the model’s attention to the target region, and introduces the BiFPN to improve the detection accuracy of the model under complex scenarios, with ablation experiments conducted to confirm the viability of the above improvement measures.
(2)
The precision, recall, F1 score, and mean average precision of YOLOv5l-GCB reached 91.6%, 99.0%, 75.0%, and 84.6%, improvements of 4.7%, 2%, 1%, and 10.4% over YOLOv5l. The detection speed is improved to 28 images per second, and the training and testing time is reduced by 60%; the model shows a certain superiority over other commonly used detection models, such as YOLOv3, YOLOv4, YOLOv7, and YOLOv8.
(3)
YOLOv5l-GCB provides a new way of thinking for the precise classification of surface cracks in concrete structures. In the practical application of the project, an unmanned aerial vehicle could be used to detect the surface cracks of wind turbine towers in real time to complete the precise location and classification of cracks, so as to take corresponding reinforcement measures according to the development of the cracks and prevent the problem before it occurs.
(4)
YOLOv5l-GCB has shown good results in detecting a wide range of cracks (block cracks, longitudinal cracks, transverse cracks, alligator cracks, oblique cracks, and potholes) under normal weather conditions, but the robustness of its detection performance under different weather conditions (e.g., rainy and snowy) has not been verified, which will be a clear direction for the next research.

Author Contributions

F.H.: project administration and writing—review and editing; X.L.: formal analysis; C.M.: data curation; G.S.: supervision and methodology; D.W.: funding acquisition and writing—review and editing; D.L.: writing—original draft and visualization; Z.Z.: software and investigation. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

Authors Feng Hu, Xiaohui Leng, Chao Ma, and Guoming Sun were employed by the company CGN New Energy Investment (Shenzhen) Co., Ltd. (Jilin Branch). The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Kim, H.; Lee, J.; Ahn, E.; Cho, S.; Shin, M.; Sim, S.-H. Concrete crack identification using a UAV incorporating hybrid image processing. Sensors 2017, 17, 2052. [Google Scholar] [CrossRef] [PubMed]
  2. Grosse, C.U.; Finck, F. Quantitative evaluation of fracture processes in concrete using signal-based acoustic emission techniques. Cem. Concr. Compos. 2006, 28, 330–336. [Google Scholar] [CrossRef]
  3. Ibrahim, M.E. Nondestructive testing and structural health monitoring of marine composite structures. In Marine Applications of Advanced Fibre-Reinforced Composites; Woodhead Publishing: Sawston, UK, 2016; pp. 147–183. [Google Scholar]
  4. Özcebe, A.G.; Tiganescu, A.; Ozer, E.; Negulescu, C.; Galiana-Merino, J.J.; Tubaldi, E.; Toma-Danila, D.; Molina, S.; Kharazian, A.; Bozzoni, F.; et al. Raspberry shake-based rapid structural identification of existing buildings subject to earthquake ground motion: The case study of Bucharest. Sensors 2022, 22, 4787. [Google Scholar] [CrossRef]
  5. Abdelkader, E.M.; Zayed, T.; Faris, N. Synthesized evaluation of reinforced concrete bridge defects, their non-destructive inspection and analysis methods: A systematic review and bibliometric analysis of the past three decades. Buildings 2023, 13, 800. [Google Scholar] [CrossRef]
  6. Zhao, M.; Nie, Z.; Wang, K.; Liu, P.; Zhang, X. Nonlinear ultrasonic test of concrete cubes with induced crack. Ultrasonics 2019, 97, 1–10. [Google Scholar] [CrossRef]
  7. Watanabe, T.; Trang, H.T.H.; Harada, K.; Hashimoto, C. Evaluation of corrosion-induced crack and rebar corrosion by ultrasonic testing. Constr. Build. Mater. 2014, 67, 197–201. [Google Scholar] [CrossRef]
  8. Wang, D.; Ma, Y.; Kang, M.; Ju, Y.; Zeng, C. Durability of reactive powder concrete containing mineral admixtures in seawater erosion environment. Constr. Build. Mater. 2021, 306, 124863.
  9. Farhidzadeh, A.; Dehghan-Niri, E.; Salamone, S.; Luna, B.; Whittaker, A. Monitoring crack propagation in reinforced concrete shear walls by acoustic emission. J. Struct. Eng. 2013, 139, 04013010.
  10. Gupta, P.; Dixit, M. Image-based crack detection approaches: A comprehensive survey. Multimed. Tools Appl. 2022, 81, 40181–40229.
  11. Golding, V.P.; Gharineiat, Z.; Munawar, H.S.; Ullah, F. Crack detection in concrete structures using deep learning. Sustainability 2022, 14, 8117.
  12. Lee, J.; Kim, H.-S.; Kim, N.; Ryu, E.-M.; Kang, J.-W. Learning to detect cracks on damaged concrete surfaces using two-branched convolutional neural network. Sensors 2019, 19, 4796.
  13. Dung, C.V. Autonomous concrete crack detection using deep fully convolutional neural network. Autom. Constr. 2019, 99, 52–58.
  14. Yang, Y.; Yang, H.; Fan, Z.; Mu, Z. Crack Propagation Law of Reinforced Concrete Beams. Appl. Sci. 2024, 14, 409.
  15. Sarhadi, A.; Ravanshadnia, M.; Monirabbasi, A.; Ghanbari, M. Using an improved U-Net++ with a T-Max-Avg-Pooling layer as a rapid approach for concrete crack detection. Front. Built Environ. 2024, 10, 1485774.
  16. Wu, J.; Liu, C. VQ-VAE-2 based unsupervised algorithm for detecting concrete structural apparent cracks. Mater. Today Commun. 2025, 44, 112075.
  17. Song, F.; Wang, D.; Dai, L.; Yang, X. Concrete bridge crack semantic segmentation method based on improved DeepLabV3+. In Proceedings of the 2024 IEEE 13th Data Driven Control and Learning Systems Conference (DDCLS), Kaifeng, China, 17–19 May 2024; IEEE: Piscataway, NJ, USA; pp. 1293–1298.
  18. Wang, R.; Zhou, X.; Liu, Y.; Liu, D.; Lu, Y.; Su, M. Identification of the surface cracks of concrete based on ResNet-18 depth residual network. Appl. Sci. 2024, 14, 3142.
  19. Gu, H.; Zhu, K.; Strauss, A.; Shi, Y.; Sumarac, D.; Cao, M. Rapid and accurate identification of concrete surface cracks via a lightweight & efficient YOLOv3 algorithm. Struct. Durab. Health Monit. 2024, 18, 363–380.
  20. Yao, G.; Sun, Y.; Wong, M.; Lv, X. A real-time detection method for concrete surface cracks based on improved YOLOv4. Symmetry 2021, 13, 1716.
  21. Liu, Y.; Zhou, T.; Xu, J.; Hong, Y.; Pu, Q.; Wen, X. Rotating target detection method of concrete bridge crack based on YOLO v5. Appl. Sci. 2023, 13, 11118.
  22. Ye, G.; Qu, J.; Tao, J.; Dai, W.; Mao, Y.; Jin, Q. Autonomous surface crack identification of concrete structures based on the YOLOv7 algorithm. J. Build. Eng. 2023, 73, 106688.
  23. Dong, X.; Liu, Y.; Dai, J. Concrete Surface Crack Detection Algorithm Based on Improved YOLOv8. Sensors 2024, 24, 5252.
  24. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
  25. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856.
  26. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
  27. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
  28. Lei, Y.; Yao, Z.; He, D. Automatic detection and counting of urediniospores of Puccinia striiformis f. sp. tritici using spore traps and image processing. Sci. Rep. 2018, 8, 13647.
  29. Zhao, Y.; Liu, S.; Hu, Z.; Bai, Y.; Shen, C.; Shi, X. Separate degree based Otsu and signed similarity driven level set for segmenting and counting anthrax spores. Comput. Electron. Agric. 2020, 169, 105230.
  30. Ma, X.; Li, Y.; Yang, Z.; Li, S.; Li, Y. Lightweight network for millimeter-level concrete crack detection with dense feature connection and dual attention. J. Build. Eng. 2024, 94, 109821.
  31. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
  32. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
  33. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
  34. Sohaib, M.; Jamil, S.; Kim, J.M. An ensemble approach for robust automated crack detection and segmentation in concrete structures. Sensors 2024, 24, 257.
  35. Yu, H.; Yang, S.; Zhu, S. Parallel restarted SGD with faster convergence and less communication: Demystifying why model averaging works for deep learning. Proc. AAAI Conf. Artif. Intell. 2019, 33, 5693–5700.
Figure 1. Structure of YOLOv5l-GCB.
Figure 2. GhostNetV2 and DFC attention mechanism structure.
Figure 3. Structure of CBAM.
Figure 4. BiFPN structure.
Figure 5. Crack categories.
Figure 6. Loss function variation.
Figure 7. Experimental results of different models.
Figure 8. Detection effects of different models.
Table 1. Different versions of YOLOv5.

Model | Params/×10⁶ | FLOPS/×10⁹ | Precision P/%
YOLOv5n | 2.67 | 7.7 | 80.6
YOLOv5s | 7.25 | 22.4 | 86.3
YOLOv5m | 18.60 | 64.2 | 88.2
YOLOv5l | 44.10 | 107.5 | 91.4
YOLOv5x | 92.20 | 189.1 | 91.6
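The trade-off in Table 1 (YOLOv5l reaches 91.4% precision with roughly half the parameters and FLOPS of YOLOv5x) can be checked numerically. The sketch below uses the table's figures; the selection rule (smallest model within 0.5 precision points of the best) is an illustrative post-hoc check, not a procedure stated in the paper.

```python
# Figures taken from Table 1: (params ×10^6, FLOPS ×10^9, precision %)
versions = {
    "YOLOv5n": (2.67, 7.7, 80.6),
    "YOLOv5s": (7.25, 22.4, 86.3),
    "YOLOv5m": (18.60, 64.2, 88.2),
    "YOLOv5l": (44.10, 107.5, 91.4),
    "YOLOv5x": (92.20, 189.1, 91.6),
}

best_p = max(p for _, _, p in versions.values())  # 91.6

# Smallest model whose precision is within 0.5 points of the best
candidates = [(params, name) for name, (params, _, p) in versions.items()
              if best_p - p <= 0.5]
_, choice = min(candidates)
print(choice)  # YOLOv5l
```

Under this tolerance only YOLOv5l and YOLOv5x qualify, and YOLOv5l is roughly half the size, which is consistent with its use as the base model here.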
Table 2. Experimental platform.

Configuration | Specific Parameters
CPU | Intel(R) Core(TM) i7-9750H @ 2.60 GHz (Intel, Santa Clara, CA, USA)
GPU | NVIDIA GeForce RTX 2060 (NVIDIA, Santa Clara, CA, USA)
Computer Memory | 16 GB
Operating System | Windows 10 (64-bit)
Software Framework | Python 3.7 + PyTorch 1.8.2
GPU Acceleration Library | CUDA 12.6 + cuDNN 8.0.5
Table 3. Hyper-parameter settings.

Hyper-Parameter | Value
Input image size | 640 × 640 pixels
Batch size | 4
Workers | 4
Epochs | 300
Momentum factor | 0.937
Initial learning rate | 0.01
Weight decay coefficient | 0.0005
Learning rate adjustment strategy | cosine annealing
Optimization algorithm | stochastic gradient descent (SGD) [35]
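The cosine-annealing schedule in Table 3 decays the learning rate from its initial value along a half cosine over the 300 training epochs. A minimal sketch of the standard formula follows; the final-rate factor `LRF` is an assumption for illustration (the table states only the initial rate and epoch count):

```python
import math

LR0, EPOCHS = 0.01, 300   # initial learning rate and epochs, from Table 3
LRF = 0.01                # final-rate factor (assumed, not stated in the paper)

def cosine_lr(epoch: int) -> float:
    """Learning rate at a given epoch under cosine annealing:
    decays smoothly from LR0 down to LR0 * LRF."""
    return LR0 * (LRF + (1 - LRF) * 0.5 * (1 + math.cos(math.pi * epoch / EPOCHS)))

print(cosine_lr(0))    # starts at the initial rate, 0.01
print(cosine_lr(300))  # ends near LR0 * LRF
```

The schedule keeps the rate high early for fast progress and lets it taper smoothly, which pairs well with SGD plus momentum as configured in the table.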
Table 4. Evaluation indicators.

Evaluation Indicator | Significance
Precision (P) | P is the proportion of true positives among all predictions that are positive; it measures how accurate the positive predictions are.
Recall (R) | R is the proportion of true positives among all actual positive cases.
F1-score (F1) | The F1-score combines P and R into a single measure (their harmonic mean).
Mean average precision (mAP) | Plotting P on the vertical axis against R on the horizontal axis gives the P-R curve. The area under this curve is the average precision (AP) for a category, and the mean of AP over all crack types is the mAP.
Frames per second (FPS) | FPS is the number of images the model processes per second; a higher FPS indicates faster detection.
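The indicators in Table 4 follow the standard definitions P = TP/(TP + FP), R = TP/(TP + FN), F1 = 2PR/(P + R), with AP the area under the P-R curve and mAP the mean of AP over crack categories. A self-contained sketch with made-up counts (the numbers are illustrative only, not the paper's results):

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard detection metrics from true/false positives and false negatives."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return p, r, f1

def average_precision(points):
    """AP as the area under a P-R curve, via the trapezoidal rule.
    `points` is a list of (recall, precision) pairs sorted by recall."""
    ap = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        ap += (r1 - r0) * (p0 + p1) / 2
    return ap

# Illustrative counts for one crack class (not from the paper)
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.9 0.75 0.82

# mAP = mean of per-class APs over all crack types
aps = [average_precision([(0.0, 1.0), (0.5, 0.9), (1.0, 0.6)]),
       average_precision([(0.0, 1.0), (1.0, 0.7)])]
map_score = sum(aps) / len(aps)
```

In practice detection frameworks interpolate the P-R curve (e.g., the Pascal VOC [33] convention) rather than integrating it directly; the trapezoidal version above is only the simplest reading of the table's definition.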
Table 5. Results of the ablation experiment.

GhostNetV2 | CBAM | BiFPN | P/% | R/% | mAP/% | FPS
× | × | × | 86.9 | 97.0 | 74.2 | 20.6
√ | × | × | 87.2 | 97.0 | 79.7 | 26.2
× | √ | × | 87.5 | 98.0 | 77.6 | 28.0
× | × | √ | 87.8 | 98.0 | 77.2 | 27.8
√ | √ | √ | 91.6 | 99.0 | 84.6 | 28.6

Note: √ indicates that the module is used, × indicates that the module is not used.
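The gains of the full YOLOv5l-GCB over the YOLOv5l baseline can be read directly from the first and last rows of Table 5; the sketch below reproduces the point improvements quoted in the abstract (+4.7 precision, +2 recall, +10.4 mAP, plus the FPS gain).

```python
# First (YOLOv5l baseline) and last (YOLOv5l-GCB) rows of Table 5: P/%, R/%, mAP/%, FPS
baseline = {"P": 86.9, "R": 97.0, "mAP": 74.2, "FPS": 20.6}
full     = {"P": 91.6, "R": 99.0, "mAP": 84.6, "FPS": 28.6}

# Point improvement per metric, rounded to one decimal
deltas = {k: round(full[k] - baseline[k], 1) for k in baseline}
print(deltas)  # {'P': 4.7, 'R': 2.0, 'mAP': 10.4, 'FPS': 8.0}
```

The single-module rows show that each component helps on its own, while the largest mAP jump comes from combining all three.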

Share and Cite

MDPI and ACS Style

Hu, F.; Leng, X.; Ma, C.; Sun, G.; Wang, D.; Liu, D.; Zhang, Z. Wind Turbine Surface Crack Detection Based on YOLOv5l-GCB. Energies 2025, 18, 2775. https://doi.org/10.3390/en18112775
