1. Introduction
To realize the goal of sustainable development, many countries are striving to promote energy transition. According to the International Renewable Energy Agency (IRENA) [1], global renewable generation capacity grew by approximately 295 GW in 2022, an 83% share of the global capacity expansion, with wind energy adding 74.7 GW, approximately 25% of the total. It is expected that the contribution of wind power generation will increase by 5–30% by 2050 [2]. Cold, high-altitude climates provide around 10% more exploitable wind energy because denser air increases power output [3,4], prompting large wind-farm deployment in China’s “Three-North” region and the Yunnan–Guizhou Plateau [5] as well as in northern Europe and North America [6]. However, moisture in such environments poses a higher risk of freezing for wind turbine units, especially exposed components such as wind turbine blades (WTB) and anemometers [7]. Moreover, the rapid development of offshore wind power in recent years has also faced icing issues, as high moisture content and ocean spray cause WTB freezing at low temperatures [8].
The main hazards of WTB ice cover include variation in blade aerodynamic profile, changes in load distribution, ice throw, and damage to sensing equipment. (1) Icing, especially glaze ice formation, significantly alters the flow field, resulting in reduced aerodynamic performance of the WTB [
9]. When the WTB lift coefficient is reduced, it will affect the output power, resulting in economic damage; when the stall of WTB is delayed, the wind turbine is prone to overload, and safety will be affected [
10,
11]. Gao et al. [
12,
13,
14,
15] conducted a series of experimental studies using icing tunnels to investigate the degradation of aerodynamic performance due to ice accumulation through the dynamic process of ice accumulation on surfaces. (2) Uneven load distribution due to icing can exacerbate the fatigue of WTB and even cause them to crack. Afzal et al. [
16] proposed that the weight of ice on a WTB can be up to 50% of its weight. Moreover, this uneven distribution induces additional vibration in WTB [
17]. (3) The centrifugal force generated by the rotation of the wind turbine can throw ice from the blades, posing a threat to the safety of nearby people, buildings, etc. [
18,
19]. Tammelin B et al. [
20] found that the distance over which ice is thrown from WTB can reach 1.5 times the sum of the turbine height and the rotor diameter. The shedding of bulk ice induces additional vibrations [
21,
22]. (4) The presence of ice on WTB interferes with sensing equipment, such as anemometers and wind vanes [
23]. This affects the control system of the wind turbine, causing the turbine to operate away from optimal conditions. In 2021, heavy snowfall in Texas severely impacted the operation of local wind turbines, resulting in various problems, including pump system failures, internet outages, and heating issues [
24]. Hence, it is crucial to develop an appropriate monitoring method to detect icing on the surface of WTB.
Wind turbine condition monitoring becomes increasingly important in wind farm management [
25]. This aligns with a broader trend in engineering where dynamic modeling of a physical system’s input and output data is foundational to creating digital twins and enabling intelligent risk assessment for critical infrastructure. For instance, recent studies have developed sophisticated probabilistic models to predict extreme events and assess risks in transportation infrastructures based on monitoring data [
26]. The core principles of using sensor-derived data to identify risk states are directly transferable to energy systems, where early and accurate detection of anomalies like icing is paramount. The inspection of WTB can be categorized into offline and online methods [
27]. Offline monitoring involves shutting down the units for regular inspections, which is not only expensive but impractical for identifying failures between inspection intervals [
28]. Therefore, online condition monitoring technology is more suitable for wind farms with longer inspection intervals [
29,
30]. The use of computer vision, a real-time detection tool, to detect WTB icing has the following advantages: (1) The ability to detect the icing condition of WTB in the early stages. (2) There is no need to install monitoring equipment on the surface of WTB or inside, which could protect the aerodynamic properties of wind turbines. (3) The operation and maintenance of detection equipment is convenient.
The use of computer vision for the detection of WTB icing is currently divided into two categories: the traditional method, which relies primarily on image preprocessing, machine learning, and edge detection; and the deep learning method, which relies on convolutional neural network-based target detection algorithms. The basic train of thought behind traditional detection is to preprocess images with image denoising, enhancement, correction, etc., and then discriminate the ice-covering condition by using the edge detection algorithm or pixel coordinate judgment. Hu et al. [
31] used the above-mentioned preprocessing method to solve issues of low contrast, unclear details, and blurred edges in the data. Subsequently, the edge of icing on WTB was detected using multiscale wavelet analysis, and the thickness of ice on the leading edge of WTB was calculated using pixel coordinates. Similarly, M. Akhloufi et al. [
32] and G. Skrimpas et al. [
33] all used traditional detection methods to identify WTB icing images by image preprocessing combined with different classification or edge detection algorithms. The aforementioned traditional methods, along with hybrid approaches that integrate conventional techniques with the strengths of deep learning, continue to exhibit substantial application value and competitiveness, primarily attributed to their streamlined design and robust interpretability. In specific scenarios—such as environments with stable lighting, simple backgrounds, or constrained computing resources—the traditional methods referenced above are feasible for detecting the icing state of wind turbine blades (WTB). Nevertheless, the detection process of these methods excessively hinges on prior information that only remains valid within specific environmental contexts. This over-reliance not only leads to insufficient efficiency but also renders them incapable of meeting the demands of real-time monitoring. As a consequence, such detection methods are deficient in universal applicability. Furthermore, enhancing model robustness in complex operational environments is a key research focus; some studies even show that deliberately introducing ‘positive-incentive noise’ can paradoxically improve the diagnostic accuracy of AI models, challenging traditional denoising paradigms [
34]. Simultaneously, as models become more complex, their ‘black-box’ nature poses a significant challenge for deployment in safety-critical systems, making the pursuit of interpretable AI methods that can provide physically meaningful results a vital research avenue [
35].
With the rapid development of deep learning, it has been widely used in the field of WTB image recognition, but most of the current research using deep learning to detect WTB focuses on the detection of surface damage [
36,
37,
38], and fewer studies have focused on WTB ice cover detection. Markus et al. [
39] installed a camera on the nacelle of a wind turbine to record the ice on the blades. They compared various convolutional neural network models and determined that the VGG19 model exhibited the highest level of performance. Kemal H et al. [
24] trained several network models to classify WTB icing and demonstrated their reliability with the class activation mapping method. Finally, the accuracy of the proposed model was verified by comparing it with the best model using U-Net. Compared with the general classification models employed in previous studies, such as VGG19, ResNet-50, and Inception-V3, YOLO and its improved versions offer notable advantages in model design, performance, application scenarios, efficiency, and real-time capability, because they are designed to identify objects within images and localize their positions concurrently. However, like other widely used target detection models, YOLO-based detectors still face the challenge of balancing precise localization against fast detection.
Accordingly, to achieve real-time detection of WTB icing thickness, this paper adopts PP-YOLOE, an anchor-free detector that offers fast inference while maintaining high accuracy, making it suitable for real-time object detection tasks. An improved PP-YOLOE network is proposed, which achieves fast and accurate recognition on a dataset constructed in a wind tunnel laboratory. The main contributions of this paper include the following three points:
(1) A wind-tunnel-based dataset of wind-turbine-blade icing states is established, filling a data-resource gap in the field and providing a solid foundation for subsequent research.
(2) An improved PP-YOLOE network is developed with multiple innovations over the original version, including embedding a Coordinate Attention (CA) mechanism in the CSPRes backbone and introducing an Atrous Spatial Pyramid Pooling (ASPP) module into the CSPBlock of the PANet structure; these enhancements markedly strengthen the model’s ability to detect blade icing.
(3) Particle Swarm Optimization (PSO) is applied for the first time to tune the model’s parameters, further boosting detection performance and constituting another key innovation of the study.
2. Model and Methods
To improve the speed and accuracy of ice cover thickness detection on WTB, this paper adopts PP-YOLOE as the base model. PP-YOLOE is an ‘evolved’ version of the YOLO (You Only Look Once) series of models and is an industrial target detector with high performance and easy deployment [40]. The PP-YOLOE network structure is divided into three key components: backbone, neck, and head. The backbone uses the CSPRes network to extract features, the neck employs the PANet structure for multiscale fusion, and the head completes the classification and regression tasks.
Figure 1 shows the PP-YOLOE network structure as well as its core module components. In the backbone of Figure 1, the first parameter denotes the change in the number of channels, while the second parameter represents the number of ResBlocks in the CSPRes structure.
To improve the detection accuracy of WTB icing state, not only have CA and ASPP been introduced to improve detection performance but an innovative approach to PSO has also been used to optimize the model parameters [
41,
42]. These improvements have led to an increase in the performance of the model, which can identify the ice cover thickness condition on WTB with higher accuracy.
2.1. Coordinate Attention-Based Model of Ice Thickness Measurement Y-Att-Net
The ice cover thickness measurement model Y-Att-Net was constructed by introducing into PP-YOLOE an attention mechanism, a module with a stronger ability to capture ice cover features.
In traditional backbone networks, feature extraction is mainly based on local relationships, which means that the network can only consider local regions in the input data and cannot effectively capture global or directional position information. As the depth of the network increases, model performance may decrease because it becomes more difficult to extract useful features using traditional methods. To address this problem, an attention mechanism was used that considers directional location information and focuses on channel information. Specifically, coordinate attention [
43] was embedded into the Res Block module of the backbone network, a common residual network component used to solve the gradient vanishing or gradient explosion problem in deep networks, and the structure of the Res Block is shown in
Figure 2a. The Coordinate Attention (CA) module is embedded into the Res Block module, specifically after the CBS and before the Rep VGG Block, as shown in
Figure 2b. The incorporation of the CA module enhances the attention mechanism and the utilization of positional information, effectively improving the perception of ice-edge and texture features while maintaining the model’s lightweight nature.
Coordinate attention encodes accurate positional information into the channel attention, establishing long-range dependencies along each spatial direction during feature extraction. This enables the model to make better use of position information, thus improving localization and recognition performance. Figure 3 depicts the process of generating coordinate attention.
Given an input $X$, each channel is encoded along the horizontal and vertical coordinates using pooling kernels of size $(H, 1)$ and $(1, W)$, respectively. Thus, the output of the $c$-th channel at height $h$ and the output of the $c$-th channel at width $w$ can be expressed as

$$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i), \qquad z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w)$$

The aggregated feature maps obtained for the $c$-th channel from the height and width formulas above are concatenated, and the concatenated features are then fed into a shared $1 \times 1$ convolutional transform function $F_1$ to obtain the intermediate feature mapping $f$:

$$f = \delta\left(F_1\left(\left[z^h, z^w\right]\right)\right)$$

where $[\cdot, \cdot]$ denotes the concatenation operation along the spatial dimension, $\delta$ is a non-linear activation function (specifically, a ReLU function in our implementation), and $f$ is an intermediate feature mapping encoding spatial information in the horizontal and vertical directions. Along the spatial dimension, $f$ is split into two different tensors $f^h$ and $f^w$. Using two $1 \times 1$ convolutions $F_h$ and $F_w$, $f^h$ and $f^w$ are converted into two tensors with the same number of channels as the input $X$. The equation can be expressed as

$$g^h = \sigma\left(F_h\left(f^h\right)\right), \qquad g^w = \sigma\left(F_w\left(f^w\right)\right)$$

where $\sigma$ is the sigmoid activation function, which scales the output to a range of [0, 1]. The resulting $g^h$ and $g^w$ serve as the attention weights for the horizontal and vertical directions, respectively. The final output of the coordinate attention module is then computed as

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$

where $x_c(i, j)$ is the basic feature input at $(i, j)$, and $y_c(i, j)$ is the output of the coordinate attention module at $(i, j)$.
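The coordinate attention computation described above can be illustrated with a minimal NumPy sketch. It is not the authors' PaddlePaddle implementation: the $1 \times 1$ convolutions are modeled as plain channel-mixing matrices, biases and normalization layers are omitted, and the names `coordinate_attention`, `w1`, `wh`, `ww` and the reduction ratio `r` are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, w1, wh, ww):
    """Simplified coordinate attention for one feature map x of shape (C, H, W).

    w1: (C_mid, C) shared 1x1 convolution; wh, ww: (C, C_mid) 1x1 convolutions.
    Biases and normalization are left out (illustrative simplification).
    """
    C, H, W = x.shape
    z_h = x.mean(axis=2)                      # (C, H): average over the width
    z_w = x.mean(axis=1)                      # (C, W): average over the height
    z = np.concatenate([z_h, z_w], axis=1)    # (C, H + W): spatial concatenation
    f = np.maximum(w1 @ z, 0.0)               # shared 1x1 conv followed by ReLU
    f_h, f_w = f[:, :H], f[:, H:]             # split back along the spatial axis
    g_h = sigmoid(wh @ f_h)                   # (C, H) attention weights
    g_w = sigmoid(ww @ f_w)                   # (C, W) attention weights
    y = x * g_h[:, :, None] * g_w[:, None, :] # re-weight every position (i, j)
    return y, g_h, g_w

# Toy example with hypothetical sizes and random weights.
rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 7, 4
x = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((C // r, C))
wh = 0.1 * rng.standard_normal((C, C // r))
ww = 0.1 * rng.standard_normal((C, C // r))
y, g_h, g_w = coordinate_attention(x, w1, wh, ww)
```

Because both attention maps lie in [0, 1], the module can only attenuate each activation, and the output keeps the shape of the input, which is what allows it to be dropped into the Res Block without changing the surrounding layers.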
2.2. Atrous Spatial Pyramid Pooling-Based Model of Ice Thickness Measurement SP-Y-Net
The SP-Y-Net model is created by combining Y-Att-Net with the ASPP [44] structure. This model further improves ice thickness detection performance through a wider range of receptive fields and richer feature extraction, allowing more accurate identification of ice cover conditions at different scales.
In the neck of PP-YOLOE, the SPP structure serves to extract information from different receptive fields. As illustrated in
Figure 4a, it achieves this by employing multiple parallel max-pooling layers, each with a different kernel size. This allows the model to capture feature information at various scales. However, we argue that it still cannot adequately capture the relations in global and local contexts when compared to our proposed alternative. The theoretical advantage of replacing SPP with ASPP lies in how they capture multi-scale information. SPP utilizes max-pooling with different kernel sizes, which aggressively down-samples features and can lead to a loss of fine-grained spatial detail. In contrast, ASPP employs atrous (dilated) convolutions with varying dilation rates. This allows the model to probe features at multiple scales and capture a wider context without reducing spatial resolution. For WTB icing detection, this is particularly beneficial: a small dilation rate can focus on the texture of fine ice crystals, while a large dilation rate can perceive the overall shape and extent of the ice coverage along the blade’s edge. This richer, multi-scale contextual information, gathered without sacrificing spatial detail, theoretically leads to more robust feature representation and, consequently, superior detection performance compared to the SPP module. In ASPP, atrous convolutions with different dilation rates are applied to the input features, the resulting feature maps are merged with the feature map of the pooled input features, and finally the fused features are output using a $1 \times 1$ convolution. The working process of ASPP is depicted in
Figure 4b.
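To make the resolution-preserving property of atrous convolution concrete, the following NumPy sketch applies a dilated 3 × 3 kernel to a single-channel feature map and fuses several branches as in ASPP. It is a simplified illustration, not the network used in this paper: the dilation rates (1, 2, 4), the random kernels, and the fusion weights `fuse_w` are hypothetical, and the $1 \times 1$ fusion convolution is reduced to a weighted sum over branches.

```python
import numpy as np

def atrous_conv2d(x, kernel, rate):
    """'Same'-padded dilated convolution (cross-correlation) of a 2D map.

    kernel is assumed square with odd size; rate is the dilation rate.
    """
    k = kernel.shape[0]
    pad = rate * (k // 2)
    xp = np.pad(x, pad)                       # zero padding keeps H x W unchanged
    H, W = x.shape
    out = np.zeros((H, W), dtype=float)
    for i in range(k):
        for j in range(k):
            # Kernel taps are spaced `rate` pixels apart.
            out += kernel[i, j] * xp[i * rate:i * rate + H, j * rate:j * rate + W]
    return out

def aspp(x, kernels, rates, fuse_w):
    """Parallel atrous branches plus an image-level pooling branch, then fusion."""
    branches = [atrous_conv2d(x, k, r) for k, r in zip(kernels, rates)]
    branches.append(np.full(x.shape, x.mean()))   # pooled input, broadcast back
    stacked = np.stack(branches)                  # (B, H, W)
    return np.tensordot(fuse_w, stacked, axes=1)  # 1x1 conv across branches

rng = np.random.default_rng(0)
feature = rng.standard_normal((16, 16))
rates = (1, 2, 4)                                 # hypothetical dilation rates
kernels = [rng.standard_normal((3, 3)) for _ in rates]
fuse_w = np.full(len(rates) + 1, 1.0 / (len(rates) + 1))
fused = aspp(feature, kernels, rates, fuse_w)

# A kernel with a single centre tap leaves the map unchanged at any rate.
delta = np.zeros((3, 3))
delta[1, 1] = 1.0
```

Every branch returns a map of the same height and width as its input regardless of the dilation rate, so the branches can be stacked and fused directly.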
The original PP-YOLOE network utilizes SPP solely in the first CSPBlock module of the multiscale-structured PANet and not in any other CSPBlock modules. In contrast, the proposed network not only replaces SPP with ASPP in the first CSPBlock module of the PANet but also integrates ASPP into the other CSPBlock modules.
In summary, this paper presents two improvements to the PP-YOLOE network structure, as shown in
Figure 5. First, the attention mechanism is integrated into the CSPRes module of the backbone network. Second, ASPP replaces SPP in the initial CSPBlock module of the intermediate layer network and is also incorporated into other CSPBlock modules.
2.3. PSO Algorithm-Based Model of Ice Thickness Measurement PSO-Opti-Net
To further enhance the performance of the model, we introduce a parameter optimization phase using the Particle Swarm Optimization (PSO) algorithm, resulting in the final PSO-Opti-Net model. It is important to clarify that our innovation does not lie in modifying the canonical PSO algorithm itself, but rather in its novel application to automate the hyperparameter tuning process for a WTB icing detection network. This approach systematically explores the complex parameter space (e.g., learning rate, batch size) to identify a more optimal configuration than what is typically achievable through manual or grid-search methods, thereby constituting a key contribution of our methodology. The performance of the model depends greatly on the choice of parameters, and to achieve the best combination of parameters, this paper employs PSO, the core idea of which is to build a particle swarm, where each particle represents a set of parameters of the SP-Y-Net model. These hyperparameters, which govern the training process, include the initial learning rate, the batch size, and the momentum for the Adam optimizer. The PSO continually updates the velocity and position of the particles to find parameter combinations with better fitness values. This process is accomplished through many iterations, the particle swarm gradually converges to the global best position, and the final parameter combination obtained is the best PSO-Opti-Net parameter setting.
The PSO-Opti-Net for WTB ice cover detection balances global search and local search, allowing the model’s parameter combinations to better fit the data and reduce errors. In this way, the researcher can automatically search the parameter space, which improves the performance of the SP-Y-Net model and provides an effective parameter tuning strategy, which is implemented in the following steps:
1. Initialize the particle swarm: create a swarm of particles, each representing a combination of parameters of the SP-Y-Net model, as follows:

$$x_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$$

where $x_i$ denotes the combination of parameters for particle $i$ and $D$ denotes the number of parameters.
2. Initialize the velocity and position: randomly initialize the velocity and position for each particle as follows:

$$v_i = \mathrm{rand}(v_{\min}, v_{\max}), \qquad x_i = \mathrm{rand}(x_{\min}, x_{\max})$$

where $v_i$ and $x_i$ are the velocity and position of the current particle, respectively, and the subscripts min and max denote the bounds of the corresponding search range.
3. Initialize the individual best position: calculate the individual best position of each particle, i.e., find the minimum fitness value in the parameter space as follows:

$$p_i = \arg\min_{x_i} f(x_i)$$

where $\arg\min$ represents finding the value of $x_i$ that minimizes the fitness function $f(x_i)$ among all possible values of $x_i$.
4. Initialize the global best position: find the particle with the best fitness value in the whole particle swarm and set its parameters as the global best position as follows:

$$g = \arg\min_{p_i} f(p_i)$$

5. Update Velocity and Position: For each particle, update the velocity and position according to the following equations:
Velocity Updates:

$$v_{id}^{t+1} = w\,v_{id}^{t} + c_1 r_1 \left(p_{id} - x_{id}^{t}\right) + c_2 r_2 \left(g_d - x_{id}^{t}\right)$$

Location Updates:

$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}$$

where $v_{id}$ denotes the velocity of particle $i$ in dimension $d$, $x_{id}$ denotes the current position of particle $i$ in dimension $d$, $p_{id}$ denotes the individual best position of particle $i$ in dimension $d$, $g_d$ denotes the global best position in dimension $d$, $w$ is the inertia weight, $c_1$ and $c_2$ are the acceleration factors, and $r_1$ and $r_2$ are random numbers independently sampled from a uniform distribution in the range [0, 1].
6. Fitness evaluation: For each particle, train the SP-Y-Net model using its current combination of parameters and calculate its prediction error or loss function value as the fitness value as follows:

$$f(x_i) = \mathrm{Loss}\left(\text{SP-Y-Net}(x_i)\right)$$
7. Update Individual Best Position: If the current fitness value of a particle is better than the fitness value of its individual best position, then update the individual best position.
8. Update global best position: If the fitness value corresponding to the individual best position of a particle is better than the fitness value of the global best position, then update the global best position.
9. Iteration: Repeat steps 5 to 8 until the set number of iterations is reached or the convergence condition is reached.
10. Final result: When the PSO converges or reaches the number of iterations, the parameter combination corresponding to the global optimal position finally obtained is the optimal PSO-Opti-Net parameter setting.
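The ten steps above can be sketched in a few lines of Python. This is a minimal illustration of canonical PSO under stated assumptions, not the authors' code: training SP-Y-Net is replaced by a hypothetical quadratic `toy_loss`, the two search dimensions and their bounds are chosen only for the example, and clipping positions to the search bounds is an added safeguard that the text does not describe.

```python
import numpy as np

def pso(fitness, lower, upper, n_particles=10, n_iter=30,
        c1=2.0, c2=2.0, w_start=0.9, w_end=0.4, seed=0):
    """Canonical PSO minimizing `fitness` inside the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    span = upper - lower
    dim = lower.size
    # Steps 1-2: random positions and velocities for every particle.
    x = lower + rng.random((n_particles, dim)) * span
    v = (rng.random((n_particles, dim)) - 0.5) * span
    # Step 3: each particle's first position is its individual best.
    pbest = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    # Step 4: the best individual best is the global best.
    g = int(np.argmin(pbest_f))
    gbest, gbest_f = pbest[g].copy(), float(pbest_f[g])
    history = [gbest_f]
    for t in range(n_iter):
        w = w_start - (w_start - w_end) * t / max(n_iter - 1, 1)
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Step 5: velocity and position updates (clipping is an assumption).
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)
        # Step 6: fitness evaluation (stands in for training the detector).
        f = np.array([fitness(p) for p in x])
        # Step 7: update individual bests.
        better = f < pbest_f
        pbest[better] = x[better]
        pbest_f[better] = f[better]
        # Step 8: update the global best.
        g = int(np.argmin(pbest_f))
        if pbest_f[g] < gbest_f:
            gbest, gbest_f = pbest[g].copy(), float(pbest_f[g])
        history.append(gbest_f)   # Step 9: repeat until n_iter is reached.
    return gbest, gbest_f, history  # Step 10: final parameter combination.

def toy_loss(p):
    """Hypothetical stand-in for the validation loss of a trained model."""
    target = np.array([0.003, 0.9])
    return float(np.sum((p - target) ** 2))

bounds_lo = [0.0005, 0.85]
bounds_hi = [0.01, 0.95]
best, best_f, history = pso(toy_loss, bounds_lo, bounds_hi)
```

In practice each call to the fitness function would involve a full or shortened training run, so the swarm size and iteration count directly determine the computational cost of the search.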
3. Experiment
3.1. Experimental Equipment
3.1.1. Cryogenic Wind Tunnel Experimental System
Figure 6 shows the test section of the environmental wind tunnel laboratory at the Haval Technology Center. The system uses a fan with a total power of 2 MW to drive the airflow, and the maximum wind speed at the nozzle can reach 250 km/h. The temperature is regulated through a heat exchanger and can be varied between −40.0 °C and 60.0 °C. The laboratory has a super-large-scale nozzle and test section and contains a water mist injection system, a rain and snow simulation system, etc.
3.1.2. Experimental Blade
The experimental blade is produced by Qingdao Yineng Wind Power Equipment Manufacturing Co., Ltd., model YN-2.7 D. The blade is 1400 mm long, made of fiberglass-reinforced plastic (FRP), with a rated power of 1 kW, a maximum power of 1.2 kW, an operating wind speed of 3–25 m/s, a rated wind speed of 8 m/s, and a rated rotational speed of 400 r/min. The blade is fixed by a flange connection to a pedestal whose angle can be adjusted manually.
3.2. Experimental Program
After evaluating the preliminary experimental performance and efficiency, and considering the impact of environmental factors on blade icing, an icing type of Glaze ice was selected for its more significant visual effect on ice accumulation. To achieve more visually significant images of ice accumulation on the side of the blade, the blade angle of attack was set to 0°. Considering the icing efficiency, the experimental conditions from group 2 (wind speed 8 m/s) were chosen. The experimental duration was set to 60 min. The specific experimental conditions are presented in
Table 1.
3.3. Data Preprocessing
Preprocessing operations such as labeling, division, and data enhancement need to be performed on the images before training the network model. In this paper, the initial data preprocessing includes a normalization process and a wavelet denoising technique to eliminate image noise, ensuring image quality and stability. This provides clean and consistent data for subsequent prediction.
Normalization is the process of mapping the pixel values of an image to a specific range, usually $[0, 1]$ or $[-1, 1]$, for more efficient analysis. The Min–Max normalization formula is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ is the original data and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the image pixels, respectively. This formula scales the data to the range $[0, 1]$.
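As a brief illustration of Min–Max normalization, the sketch below rescales an 8-bit image to [0, 1]. The function name `min_max_normalize` and the small `eps` guard against constant images are assumptions added for the example.

```python
import numpy as np

def min_max_normalize(img, eps=1e-12):
    """Scale pixel values linearly so the minimum maps to 0 and the maximum to 1."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    # eps avoids division by zero when all pixels share one value.
    return (img - lo) / max(hi - lo, eps)

pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)
normalized = min_max_normalize(pixels)
flat = min_max_normalize(np.full((2, 2), 7))
```

A constant image has no dynamic range, so in this sketch it is mapped to all zeros rather than raising an error.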
Wavelet denoising is a technique commonly used in signal processing and image noise reduction. It uses wavelet transform to separate the high-frequency and low-frequency components of a signal and then removes the high-frequency noise. The basic equation of the wavelet transform is given below:
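
$$W(a, b) = \int_{-\infty}^{\infty} f(t)\,\psi_{a,b}^{*}(t)\,dt, \qquad \psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\left(\frac{t - b}{a}\right)$$

where $W(a, b)$ are the wavelet coefficients, $f(t)$ is the original signal, and $\psi_{a,b}(t)$ is the wavelet basis function, where $a$ and $b$ are the scale and translation parameters. In wavelet denoising, thresholding is commonly used to eliminate wavelet coefficients that are below a specific threshold and thereby suppress noise. Once thresholding has been performed, an inverse wavelet transform is necessary to recover the denoised signal. The formula for the inverse wavelet transform is given below:

$$\hat{f}(t) = \frac{1}{C_{\psi}} \int_{0}^{\infty} \int_{-\infty}^{\infty} \hat{W}(a, b)\,\psi_{a,b}(t)\,\frac{db\,da}{a^{2}}$$

where $\hat{f}(t)$ is the denoised signal, $\hat{W}(a, b)$ are the thresholded wavelet coefficients, and $C_{\psi}$ is the admissibility constant of the wavelet.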
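The transform, threshold, and inverse-transform pipeline can be illustrated with a one-level discrete Haar wavelet on a 1D signal. This is a deliberately small stand-in: the wavelet family, decomposition level, and threshold used in this paper are not stated here, so the Haar basis, the threshold of 0.2, and the synthetic step signal are all assumptions for the example.

```python
import numpy as np

def haar_dwt(x):
    """One-level Haar transform of an even-length signal."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-frequency component
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-frequency component
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

def hard_threshold(coeffs, thr):
    """Zero every coefficient whose magnitude is below thr."""
    return np.where(np.abs(coeffs) >= thr, coeffs, 0.0)

def haar_denoise(x, thr):
    approx, detail = haar_dwt(x)
    return haar_idwt(approx, hard_threshold(detail, thr))

rng = np.random.default_rng(0)
clean = np.repeat([0.0, 1.0, 0.0, 1.0], 8)           # synthetic step signal
noisy = clean + 0.05 * rng.standard_normal(clean.size)
denoised = haar_denoise(noisy, thr=0.2)
```

With a threshold of zero the signal is reconstructed exactly, which is a convenient check that the forward and inverse transforms are consistent before any coefficients are discarded.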
The data preprocessing method of normalization and wavelet denoising is utilized to obtain signals with more prominent features. Subsequently, this paper labels and expands the processed data as follows:
- (1)
Divide the dataset: To provide a more precise depiction of the various ice-covering conditions, the images captured were categorized during the experiment at 0 min, 20 min, 40 min, and 60 min into four levels: lv0, lv1, lv2, and lv3. This was based on the time at which the experiment was conducted. In this case, ice covering is represented by different levels: lv0 indicates no ice covering, while lv1, lv2, and lv3 correspond to mild, moderate, and severe ice-covering conditions. Sample pictures are displayed in
Figure 7. During the screening process, a set of strict quality criteria was applied to the initial 3669 images. Images were discarded if they exhibited significant motion blur, poor focus, over or under-exposure, or contained reflections that obscured the details of ice accretion on the blade surface. Furthermore, only images providing a clear, perpendicular view of the blade’s leading edge were retained to ensure consistency in representing the icing state. This systematic screening process yielded a final high-quality dataset of 3000 images. To ensure the consistency and reliability of the annotations, a standardized labeling protocol was established. Initially, two experienced researchers independently annotated a random subset of 200 images (approximately 7% of the dataset). The annotations were then compared, and any discrepancies were discussed to resolve ambiguities and refine the labeling guidelines. This finalized protocol was subsequently used by one researcher to annotate the entire dataset, with the second researcher performing spot-checks to maintain high-quality standards throughout the process. Among them, there are 806 images belonging to lv0, 730 images belonging to lv1, 694 images of lv2, and 670 images belonging to lv3.
- (2)
Data annotation: The images of the WTB were labeled using the LabelImg tool, with the labels lv0, lv1, lv2, and lv3 each representing a different degree of ice coverage. A corresponding XML file is created after labeling, containing the size and position of the ice cover as well as the image name and size.
- (3)
Data Enhancement: Data augmentation techniques were employed to expand the dataset, enriching its diversity and improving the network’s generalization capacity so that it adapts better to various backgrounds in the images. The augmentation strategy was designed to improve the model’s robustness against common visual variations encountered in real-world scenarios. Specifically, the dataset was expanded through three operations: horizontal flipping, Gaussian noise addition, and contrast adjustment. After augmentation, the dataset contained 10,000 images, with the training and test sets retaining the original 7:3 split.
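The three augmentation operations can be sketched with NumPy as follows. The noise standard deviation, the contrast factor, and the helper names are hypothetical values chosen for illustration; the sketch also shows that a horizontal flip must be applied to the bounding-box annotations as well as to the pixels.

```python
import numpy as np

def horizontal_flip(img):
    """Mirror the image left to right."""
    return img[:, ::-1]

def flip_box(box, image_width):
    """Mirror an (xmin, ymin, xmax, ymax) box to match a horizontal flip."""
    xmin, ymin, xmax, ymax = box
    return (image_width - xmax, ymin, image_width - xmin, ymax)

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and return an 8-bit image."""
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)

def adjust_contrast(img, factor):
    """Stretch (factor > 1) or compress (factor < 1) pixels about the mean."""
    mean = img.mean()
    out = (img.astype(np.float64) - mean) * factor + mean
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6), dtype=np.uint8)  # toy grayscale image
flipped = horizontal_flip(image)
noisy_image = add_gaussian_noise(image, sigma=10.0, rng=rng)
contrasted = adjust_contrast(image, factor=1.5)
```

Noise addition and contrast adjustment leave the box coordinates untouched, so only the flipped copies need their annotation files rewritten.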
3.4. Experimental Setup
The initial dataset of 3000 images was divided into training and test sets in a 7:3 ratio before data enhancement. After enhancement, the dataset was augmented to 10,000 images while maintaining the original ratio. The experimental platform is Python 3, PaddlePaddle 2.3.2, CUDA 11.2, and cuDNN 8.2. The GPU is a Tesla V100, the memory is 16 GB, and the operating system is Linux.
The experiments are conducted in the same computer operating environment. The initial parameters were set as follows: batch size was set to 16, learning rate was 0.01, and the number of training steps was 10,000. Adam optimization was used, and momentum was set to 0.9. Other classical deep learning detection algorithms also used the Adam optimizer for optimization with momentum set to 0.9; the batch size, learning rate, and number of training steps were set according to the network model and data volume.
To achieve the search and optimization of the best model parameters and provide faster convergence, a Particle Swarm Optimization (PSO) algorithm was introduced for model parameter optimization. The core idea of PSO is to establish a particle swarm, where each particle represents a set of parameters for the SP-Y-Net model. These hyperparameters, which govern the training process itself, are optimized using the PSO algorithm. Each particle possesses its own velocity and position, which represent the rate of change of parameters and the current parameter values, respectively. Each particle also maintains a personal best position, corresponding to the parameter setting with the smallest fitness value found by itself, as well as the global best position, which refers to the parameter setting with the smallest fitness value identified by the entire swarm. PSO continuously updates the velocity and position of particles based on these best positions to find better parameter combinations. The process involves multiple iterations until the swarm converges to the global best position. The final parameter combination obtained is the optimal setting for the PSO-Opti-Net model. The fitness function for PSO was defined based on the prediction error or loss function value of the model.
It is worth noting that the optimization problem in this study revolves around improving the performance of the proposed improved PP-YOLOE network for detecting WTB icing status. Specifically, the overarching goal of our study is to obtain a model with optimal performance. This is achieved through two distinct optimization processes: (1) The model’s internal parameters (convolutional layer weights and biases) are optimized through the standard training process via gradient descent. (2) The external training hyperparameters (such as learning rate and batch size), which govern the training process itself, are optimized using the PSO algorithm. The aim of using PSO is to find a hyperparameter configuration that allows the gradient-based training to converge to a better final model. The optimization process is guided by loss functions and training algorithms, which inherently constrain parameter values through gradient descent and other optimization techniques. However, regularization techniques such as weight decay and learning-rate scheduling were adopted to mitigate overfitting and maintain training stability; collectively, these methods impose implicit constraints on the design variables.
Optimization outcomes were assessed using the mean average precision (mAP) at an Intersection over Union (IoU) threshold of 0.5 (mAP@0.5) as the primary evaluation metric. mAP is widely applied in object detection and classification tasks to jointly quantify precision and recall; here it is evaluated at the single IoU threshold of 0.5. Furthermore, auxiliary indicators such as recall, F1 score, mean absolute error, and mean square error were evaluated to provide a comprehensive assessment of model performance. These indicators are reported in detail in the results section to evaluate the effectiveness of our optimization work.
The swarm comprised 30 particles, with a maximum of 50 iterations for convergence. A linearly decreasing inertia weight (w) from 0.9 to 0.4 was employed to balance global exploration and local exploitation. The cognitive and social acceleration factors, c1 and c2, were both set to the standard value of 2.0. The algorithm systematically explored a defined search space for the key training hyperparameters: the initial learning rate was searched within the range [0.0005, 0.01], the batch size was selected from a discrete set of {8, 16, 32}, and the optimizer’s momentum was tuned within the interval [0.85, 0.95].
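The PSO configuration described above can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the swarm size, iteration count, inertia schedule, acceleration factors, and search ranges are taken from the text, whereas `surrogate_fitness` is a hypothetical stand-in for the real fitness evaluation (training the detector and returning its loss), and the handling of the discrete batch size via a rounded index, the zero initial velocities, and the boundary clamping are assumptions.

```python
import random

# Search space from the text. Batch size is encoded as a continuous index
# into BATCH_SIZES and rounded on decoding (an assumption of this sketch).
LR_RANGE = (0.0005, 0.01)
MOMENTUM_RANGE = (0.85, 0.95)
BATCH_SIZES = (8, 16, 32)
LOWER = (LR_RANGE[0], 0.0, MOMENTUM_RANGE[0])
UPPER = (LR_RANGE[1], len(BATCH_SIZES) - 1.0, MOMENTUM_RANGE[1])


def decode(position):
    """Map a particle position to (learning rate, batch size, momentum)."""
    lr, batch_idx, momentum = position
    return lr, BATCH_SIZES[int(round(batch_idx))], momentum


def surrogate_fitness(position):
    """Hypothetical loss surface; a real run would train the model here."""
    lr, batch, momentum = decode(position)
    return ((lr - 0.003) / 0.01) ** 2 + ((batch - 16) / 32) ** 2 + (momentum - 0.9) ** 2


def pso(fitness, n_particles=30, n_iters=50, w_start=0.9, w_end=0.4,
        c1=2.0, c2=2.0, seed=0):
    rng = random.Random(seed)
    dims = range(len(LOWER))
    pos = [[rng.uniform(LOWER[d], UPPER[d]) for d in dims] for _ in range(n_particles)]
    vel = [[0.0 for _ in dims] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for it in range(n_iters):
        # Linearly decreasing inertia weight: w_start at the first iteration,
        # w_end at the last one.
        w = w_start - (w_start - w_end) * it / (n_iters - 1)
        for i in range(n_particles):
            for d in dims:
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # cognitive term
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # social term
                # Clamp to the search space so every position stays decodable.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], LOWER[d]), UPPER[d])
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return decode(gbest), gbest_fit


best_params, best_fit = pso(surrogate_fitness)
print("best (lr, batch size, momentum):", best_params, "fitness:", best_fit)
```

In practice each fitness call is a full or shortened training run, so 30 particles over 50 iterations imply up to 1,530 evaluations including initialization; this cost is the main design consideration when choosing the swarm size and iteration limit.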
3.5. Ablation Experiments
To assess the effectiveness of the three enhancements of the PSO-Opti-Net network structure, ablation experiments were carried out on each enhancement point of the PP-YOLOE baseline model utilizing the following methodology:
Ablation Experiment 1: Removal of Coordinate Attention Mechanisms
A version without the attention mechanism embedded in the CSPRes module of the backbone network is constructed while keeping other network structures and hyperparameters unchanged. Compare the performance of the models with and without the CA in the task of WTB ice-cover thickness detection, including the metrics Recall, F1-Score, and mAP.
Recall indicates the ability of the model to find all correct objects. A higher recall indicates that the model correctly finds a larger share of the positive examples.
F1-Score is the harmonic mean of Precision and Recall. It balances the two indicators, so that a high score requires the model to perform well on both rather than being biased toward either Precision or Recall.
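As a minimal sketch of how these two indicators are computed from detection counts, the snippet below uses the standard definitions. The function name and the counts of true positives, false positives, and false negatives are hypothetical values chosen for illustration; they are not taken from the paper’s data.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only.
p, r, f1 = precision_recall_f1(tp=90, fp=6, fn=10)
print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, F1 always lies between Precision and Recall and drops sharply when either one is low.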
The mAP (mean average precision) is used to measure overall model performance. It is calculated by summing the average precision (AP) values of all categories and dividing by the number of categories. AP is the area under the P-R curve and measures the average accuracy of the model for a single category.
where
is the number of positive examples for each category, and
is interpreted as the predicted probability for the corresponding category
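For reference, the textbook forms of the three indicators described above are given below. The notation is an assumption and may differ from the authors’ own symbols: $TP$, $FP$, and $FN$ denote true positives, false positives, and false negatives, $P(R)$ is precision as a function of recall, and $C$ is the number of categories.

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}

F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

\mathrm{AP} = \int_0^1 P(R)\,\mathrm{d}R, \qquad
\mathrm{mAP} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{AP}_c
```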
Ablation Experiment 2: Replacing ASPP back to SPP
Replace the ASPP operation in the first CSPBlock module of the middle layer network with the original SPP operation, keeping other network structures and hyperparameters unchanged. Compare the performance of the models using ASPP and SPP on the task of WTB ice-cover thickness detection, including the three metrics of F1-Score, recall, and mAP.
Ablation Experiment 3: Changing parameter settings
The baseline run used the initial parameter settings: a batch size of 16, a learning rate of 0.01, and 10,000 training steps, with the Adam optimizer and a momentum of 0.9. Its results were then compared with those obtained using the hyperparameters optimized by the particle swarm, with MAE and MSE as the comparison metrics.
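MAE (mean absolute error):
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
MSE (mean squared error):
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
where $n$ denotes the number of samples, $y_i$ denotes the actual value of the $i$-th sample, and $\hat{y}_i$ denotes the predicted value of the $i$-th sample.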
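A short numerical sketch of the two error metrics follows. The thickness values are hypothetical and serve only to show the arithmetic; the function names are likewise illustrative.

```python
def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error: average of (actual - predicted)^2."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


# Hypothetical ice thickness values (e.g., in mm), for illustration only.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.0, 9.0]

print("MAE:", mae(actual, predicted))  # (0.5 + 0.5 + 0.0 + 1.0) / 4 = 0.5
print("MSE:", mse(actual, predicted))  # (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375
```

The single 1.0 error contributes half of the MAE but two thirds of the MSE, which illustrates why MSE penalizes large deviations more heavily than MAE.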
4. Results and Discussions
4.1. Ablation Experiments Results
In ablation experiments, it is necessary to maintain consistency in other parameters and settings to focus only on specific improvements. By comparing the experimental results, the impact of the attention mechanism and ASPP replacement on the PP-YOLOE network in the WTB icing thickness detection task can be evaluated, and their effectiveness and contribution can be determined. Models 1, 2, 3, and 4 represent CSPRes + Attention, CSPRes, CSPBlock + ASPP, and PSO-Opti-Net, respectively. The experimental results are shown in
Table 2.
From the results of the ablation experiments, it can be seen that when the training step reaches 10,000, the PSO-Opti-Net network has a recall of 0.9 in the WTB ice-cover thickness detection task, which is 0.03, 0.02, and 0.05 higher than that of CSPRes + Attention, CSPBlock + ASPP, and PP-YOLOE, respectively. The mAP and F1-score are also both higher than those of the other models.
As seen from the results of the ablation experiment in Figure 8, when the training step reaches 10,000, the PSO-Opti-Net network has a mAP of 0.94 in the WTB ice-cover thickness detection task, which is 0.02, 0.01, and 0.03 higher than CSPRes + Attention, CSPBlock + ASPP, and PP-YOLOE, respectively. The detection results achieved a recall of 0.90 across test images, which is also higher than that of CSPRes + Attention, CSPBlock + ASPP, and PP-YOLOE. It is important to note that while the incremental gains from individual components in the ablation study may appear marginal, their combined effect in the final PSO-Opti-Net model shows a consistent and synergistic improvement across all key metrics (mAP, Recall, and F1-Score). This consistent outperformance suggests that each modification contributes meaningfully to the model’s enhanced capability for this specific task.
The reason for the higher recall and accuracy of CSPRes + Attention over PP-YOLOE is that the attention mechanism helps the model to focus on the regions that are particularly important for the task of identifying ice cover, thus making better use of contextual information. By introducing the attention mechanism, the model can automatically learn which regions are more critical for the identification of ice cover, enabling the model to focus more on these regions and improve the detection of ice cover.
The reason for the higher recall and accuracy of CSPBlock + ASPP over PP-YOLOE is that ASPP extracts features over a range of different receptive fields, capturing both fine details and larger contextual information. In this way, the model can understand the image content more comprehensively, which helps to detect the ice cover more accurately.
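The effect of atrous (dilated) convolution on the receptive field can be illustrated with a small calculation. For a kernel of size $k$ and dilation rate $d$, the effective kernel extent is $d(k-1)+1$. The 3 × 3 kernel and the dilation rates below are assumptions borrowed from common ASPP configurations; this section does not state the rates used in PSO-Opti-Net.

```python
def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by a dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


# Assumed dilation rates (typical of ASPP designs, not taken from this paper).
assumed_rates = (1, 6, 12, 18)
for rate in assumed_rates:
    size = effective_kernel_size(3, rate)
    print(f"dilation {rate:2d}: a 3x3 kernel covers {size}x{size} pixels")
```

The parallel branches therefore sample context at several scales with the same number of weights per branch, which is the property the paragraph above attributes to ASPP.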
From the results of the ablation experiments, it can be seen that the F1-Score of the PSO-Opti-Net network in the WTB ice-cover thickness detection task is 0.92, and the P-R curve of the PSO-Opti-Net network is always above those of CSPRes + Attention, CSPBlock + ASPP, and PP-YOLOE. This means that PSO-Opti-Net provides higher precision at the same recall rate. In other words, PSO-Opti-Net recognizes WTB ice-cover thickness more accurately.
The two evaluation metrics, MAE and MSE, are used in ablation experiment 3 in this paper mainly because of their comprehensiveness and usefulness. Together they provide information on several aspects of model performance, including the magnitude of errors and sensitivity to outliers. MAE and MSE are also intuitively interpretable, with MAE expressing the magnitude of errors in the same units as the problem domain, making it easier for decision-makers to understand the results. In addition, these metrics are generalizable across applications, serving not only regression problems but also the comparison of different models. Therefore, the selection of MAE and MSE as evaluation metrics helps to comprehensively assess the predictive performance of models and provides a strong basis for model selection and improvement.
In addition, experiments were designed to compare the performance of SP-Y-Net and PSO-Opti-Net in predicting WTB ice-cover thickness, in order to explore the impact of particle swarm hyperparameter optimization on the detection task. The results are shown in Table 3.
After particle swarm optimization of its hyperparameters, the SP-Y-Net model successfully predicts the ice cover on WTB. This is attributed to the more thorough hyperparameter search: PSO explores the search space to find better configurations and reduces the risk of settling in a local optimum. As a result, training yields weights and hyperparameter settings better suited to the specific ice-cover thickness prediction task. In contrast, an unoptimized SP-Y-Net relies on default or empirical parameters, which may limit the model’s performance. Thus, PSO helps to improve the predictive accuracy of the model in WTB ice-cover thickness prediction tasks.
4.2. Comparison Experiments Results
Different deep learning models have their own characteristics and performance attributes. Comparison experiments make it possible to analyze their strengths and weaknesses on WTB ice-cover thickness detection and to identify the most appropriate model, providing a reference point for eventual decision-making and implementation. PP-YOLOE, Faster R-CNN, EfficientDet, and YOLOv5 were selected as the comparison models in this paper. The experimental results are shown in Table 4.
Based on the results of the comparison experiments, when the training step reaches 10,000, the PSO-Opti-Net network has a recall of 0.90 in identifying the thickness of the ice covering the blades. This recall is higher than that of PP-YOLOE and Faster R-CNN, comparable to that of YOLOv5, and slightly lower than EfficientDet’s 0.91.
As seen from the results of the comparison experiments in Figure 9, when the training step reaches 10,000, the PSO-Opti-Net network has a mAP of 0.94 in the task of detecting the thickness of WTB ice cover, which is higher than that of PP-YOLOE, Faster R-CNN, EfficientDet, and YOLOv5. Because PSO-Opti-Net uses the PANet structure, it can effectively fuse feature information from different scales. This makes PSO-Opti-Net more advantageous in dealing with targets of different sizes, which may explain why it performs better than models such as YOLOv5 for WTB ice-cover thickness detection.
Based on the comparison experiments, the P-R curve of the PSO-Opti-Net network lies consistently above those of PP-YOLOE, Faster R-CNN, EfficientDet, and YOLOv5. This indicates that the proposed model achieves higher precision at a given recall than the compared models.
The strong performance of PSO-Opti-Net in recall, accuracy, and F1-Score is attributed to the application of the attention mechanism and ASPP, optimization of the model structure, and enhanced data and training procedures. Together, these enhancements enable PSO-Opti-Net to recognize the characteristics of ice accumulation on WTB more precisely, improving its ability to identify the thickness of the accumulation and allowing it to surpass the other approaches on most evaluation metrics.
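The per-metric ranking can be checked with a few lines of code. The scores below are the recall, mAP, and F1 values quoted for each model in the Conclusions of this paper; the dictionary layout and the helper name `best_models` are illustrative choices.

```python
# Scores as quoted in the paper's Conclusions section.
results = {
    "PSO-Opti-Net": {"recall": 0.90, "mAP": 0.94, "F1": 0.92},
    "PP-YOLOE":     {"recall": 0.85, "mAP": 0.91, "F1": 0.87},
    "Faster R-CNN": {"recall": 0.88, "mAP": 0.93, "F1": 0.91},
    "EfficientDet": {"recall": 0.91, "mAP": 0.92, "F1": 0.90},
    "YOLOv5":       {"recall": 0.90, "mAP": 0.93, "F1": 0.91},
}


def best_models(scores_by_model, metric):
    """Return every model that attains the highest value of the given metric."""
    top = max(scores[metric] for scores in scores_by_model.values())
    return [name for name, scores in scores_by_model.items() if scores[metric] == top]


for metric in ("recall", "mAP", "F1"):
    print(metric, "->", best_models(results, metric))
```

On these figures PSO-Opti-Net leads in mAP and F1-Score, while EfficientDet has the highest recall by 0.01.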
4.3. Detection Effect
In this study, the optimization problem revolves around enhancing the performance of the proposed improved PP-YOLOE network for detecting icing states of WTBs. Specifically, model parameters, including but not limited to convolutional layer weights and biases, are optimized to minimize classification error, thereby enhancing overall accuracy. This optimization is implicitly addressed through the training process, where the model is trained to minimize a loss function that reflects the classification error.
In
Figure 10, the process of training the model is visualized. Our experiments have shown that mAP has reached a relatively high level and remains stable over multiple iterations. The stability of mAP on the validation set suggests that the model has effectively learned the features within our dataset and converged well, without showing signs of significant overfitting to the training data. The fact that mAP has not undergone significant changes beyond this point is a powerful indicator of convergence. Similarly, the loss function measures the difference between the predicted label and the true label and converges to a stable value after 10,000 iterations. The loss curve tends to stabilize, indicating that the model no longer shows significant improvement in reducing prediction errors. This is a sign of convergence in machine learning models. The model checkpoint with the highest validation mAP of 0.976 was saved during training. For final evaluation on the test set, this model achieved a stable and generalizable mAP of 0.94, which is the value reported in our comparative results.
Figure 11 illustrates the detection results obtained when the optimal trained model is applied to images of ice-covered wind turbine blades.
While this study demonstrates the effectiveness of the proposed PSO-Opti-Net model, it has certain limitations that lay the groundwork for future research. A primary limitation stems from the dataset being collected within a controlled wind tunnel laboratory, utilizing a single blade model under fixed environmental parameters and a static angle of attack. This controlled setting, while crucial for foundational validation, may not fully capture the complexities of real-world operational environments, which involve dynamic blade rotation, diverse weather patterns, variable lighting, and different blade types. Consequently, the model’s generalization capability across these varied, uncontrolled scenarios warrants further investigation.
5. Conclusions
To detect WTB ice covering, an improved PP-YOLOE network, PSO-Opti-Net, is proposed in this paper. In experimental testing, PSO-Opti-Net outperformed PP-YOLOE, Faster R-CNN, EfficientDet, and YOLOv5 in mAP and F1-Score and achieved competitive recall. PSO-Opti-Net has a recall of 0.90, compared with 0.85 (PP-YOLOE), 0.88 (Faster R-CNN), 0.91 (EfficientDet), and 0.90 (YOLOv5). PSO-Opti-Net achieves a mAP of 0.94, while the other models have mAP scores of 0.91 (PP-YOLOE), 0.93 (Faster R-CNN), 0.92 (EfficientDet), and 0.93 (YOLOv5). This shows that PSO-Opti-Net leads the other models in terms of the average accuracy of detection results. In addition, PSO-Opti-Net’s F1 score is 0.92, compared with 0.87 (PP-YOLOE), 0.91 (Faster R-CNN), 0.90 (EfficientDet), and 0.91 (YOLOv5), which reflects a good balance between precision and recall.
The performance of the model is driven by key innovations, including architectural enhancements through the attention mechanism and ASPP, and a novel application of particle swarm optimization for systematic hyperparameter search. These adjustments enable the model to capture the characteristics of wind blade ice cover more accurately. Compared to previous methods used to detect icing on wind turbine blades, PSO-Opti-Net bases its prediction on global image information, as opposed to detection algorithms that employ sliding windows or region proposals. Compared to two-stage detectors like Faster R-CNN, which achieved a mAP of 0.93, our single-stage PSO-Opti-Net not only achieved a higher mAP of 0.94 but also demonstrated strong performance in extracting generic and representational features of the object directly from the global image information. This contributes to its good performance and robustness on the established dataset.
The wind turbine blade icing detection model based on the PP-YOLOE network introduced in this article improves the balance between detection accuracy and speed and reduces dependence on anchor settings compared to some traditional architectures. It also avoids special operators such as deformable convolution and matrix NMS (Non-Maximum Suppression), which makes it easy to deploy on a wide range of hardware and provides guidance for improving the accuracy of intelligent detection and recognition of wind turbine blades. Ultimately, this research not only provides an efficient tool for WTB ice detection, but also demonstrates a referenceable approach that can integrate deep learning components to address more complex detection challenges.
Future work will focus on extending the dataset to include images from in-service wind turbines, thereby enhancing the model’s robustness and applicability. To further improve performance in complex visual environments, we plan to explore advanced signal processing methods for image quality enhancement, such as Positive-Incentive noise concepts, which have demonstrated promise in fault diagnosis. Moreover, addressing the inherent “black-box” nature of deep learning constitutes a critical future direction, with the aim of enhancing the model’s interpretability to foster greater trust for industrial deployment. This objective will draw inspiration from emerging techniques that improve the transparency of AI decisions. Collectively, these efforts will advance the development of a more reliable and field-deployable intelligent system for WTB icing detection.