Article

Wind Turbine Blade Defect Recognition Method Based on Large-Vision-Model Transfer Learning

Xin Li, Jinghe Tian, Xinfu Pang, Li Shen, Haibo Li and Zedong Zheng
1 Key Laboratory of Energy Saving and Controlling in Power System of Liaoning Province, Shenyang Institute of Engineering, Shenyang 110136, China
2 School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
* Author to whom correspondence should be addressed.
Sensors 2025, 25(14), 4414; https://doi.org/10.3390/s25144414
Submission received: 24 May 2025 / Revised: 25 June 2025 / Accepted: 9 July 2025 / Published: 15 July 2025
(This article belongs to the Special Issue Deep Learning for Perception and Recognition: Method and Applications)

Abstract

Timely and accurate detection of wind turbine blade surface defects is crucial for ensuring operational safety and improving maintenance efficiency in large-scale wind farms. However, existing methods often suffer from poor generalization, background interference, and inadequate real-time performance. To overcome these limitations, we developed an end-to-end defect recognition framework structured as a three-stage process: blade localization using YOLOv5, robust feature extraction via the large vision model DINOv2, and defect classification using a Stochastic Configuration Network (SCN). Unlike conventional CNN-based approaches, the use of DINOv2 significantly improves representation capability under complex textures. The experimental results reveal that the proposed method achieved a classification accuracy of 98.7% and an average inference time of 19.65 ms per image, satisfying real-time requirements. Compared to traditional methods, this framework provides a more scalable, accurate, and efficient solution for the intelligent inspection and maintenance of wind turbine blades.

1. Introduction

1.1. Literature Review

As wind is a renewable energy source, wind power generation does not emit greenhouse gases or other pollutants (unlike traditional thermal power generation), thereby supporting the achievement of carbon peaking and carbon neutrality goals. Severe weather and gravity-related loads affect the long-term stable operation of wind turbine blades outdoors, and blade defects, such as paint chipping, cracking damage, and oil stains, usually develop over time. Regular inspection of wind turbine blades is currently carried out manually or with the assistance of drones. Manual detection usually requires workers to climb tall wind turbine towers to conduct visual inspections, posing high safety risks, and this method is prone to misjudgment and missed detection. In addition, the efficiency of manual inspection is low, and it is difficult to quickly cover the many wind turbines on a large wind farm. Therefore, it is vital to detect blade defects safely and efficiently to ensure the continuous and reliable operation of a wind turbine.
In Ref. [1], the Bayesian classification method was used to classify the vibration signals generated by cracks, corrosion, loose connections, and other faults. In Ref. [2], RPCA was employed to reduce the data dimensions of vibration signals. Wind turbine blade inspection datasets are large, and traditional machine learning has low processing efficiency for large-scale data and high data-quality requirements. Moreover, traditional machine learning methods must manually design and extract features, and their over-reliance on prior knowledge makes them subject to subjective factors.
In recent years, owing to advancements in deep learning, neural-network-based methods have become the standard approach to wind turbine blade defect recognition. In Ref. [3], a characteristic map of vibration signals was input into an MCNN to extract the features of different defect types, and an ART network was used as the classifier. In Ref. [4], the Haar-AdaBoost cascade classifier was used to determine the damaged area, and a variant of VGG16 was then used to classify the types of damage; however, the Haar features used in this method must be designed manually, which limits the model's generalization ability. In Ref. [5], the Otsu algorithm was used to remove the complex background in blade images, AlexNet combined with transfer learning was used for feature extraction, and a random forest was finally used to classify defect types. In Ref. [6], a two-stage object detection network, Faster-RCNN, with an Inception-ResNet-v2 backbone was used to identify and classify blade defects, and the images were augmented via flipping, brightness transformation, Gaussian blur, and other methods. In Ref. [7], the ADMM algorithm was used to compress the weights of VGG11 for blade defect detection. In Ref. [8], an improved ResNet50 was used to replace the VGG backbone of the SSD network to realize blade defect detection. In Ref. [9], a UAV was flown around a wind turbine blade to obtain blade images, and AlexNet was used for damage classification. In Ref. [10], a multi-feature fusion residual structure based on ResNet34 was proposed, with a smaller network depth than the original ResNet34. In Ref. [11], a DCNN pre-trained on ImageNet was used as a feature extractor, and an SVM was used as a classifier to classify blade defects. In Ref. [12], combined with transfer learning, ResNet101 was fine-tuned as the backbone network of an R-CNN to identify three defects: damage, cracks, and oil pollution, and an improved k-means algorithm was used to reduce the influence of complex backgrounds. Although this method successfully classified these three types of defects, its accuracy still needs to be improved. In Ref. [13], VGG16 with an attention mechanism and an adaptive learning rate was used as a feature extractor. In Ref. [14], Mask-RCNN and MRNet were combined to reduce background influence. In Ref. [15], the authors compared the performance of ResNet50 and AlexNet in blade defect recognition tasks, and the results showed that ResNet50 was better. The authors of [16,17,18,19,20] all used original or improved YOLO-series algorithms to realize defect recognition, adding attention mechanisms to reduce the influence of the image background on defect features or using lightweight models to increase inference speed and lower computational power consumption. In recent years, foundation models have received extensive attention in both the language and vision domains. Large-scale language models, such as GPT and BERT, are pre-trained on vast textual corpora for language comprehension and generation. In contrast, visual foundation models such as DINOv2 and CLIP are pre-trained on large-scale image datasets to extract general-purpose visual representations. DINOv2 offers strong generalization and feature representation capabilities, particularly in complex visual environments and small-sample scenarios, making it well suited to tasks such as wind turbine blade defect classification.
In Ref. [21], a fuzzy-system-based genetic algorithm was designed to perform adaptive segmentation of trajectory sequences, enabling global optimization through dynamic mutation and crossover strategies; its core ideas of adaptive structure adjustment and optimization are relevant to visual detection in complex environments. In Ref. [22], a dual-scale complementary spatial–spectral joint model was introduced for hyperspectral image classification, improving the robustness of feature representation by integrating multi-scale spatial and spectral information. Other recent studies offer important references for building efficient, generalizable models in the field of defect detection. In Ref. [23], an energy-efficient mechanical fault diagnosis method named SpikingFormer was proposed, inspired by neural dynamics and metric learning; it addresses the challenge of limited sample availability in industrial scenarios by combining low-energy computing with prototype-based classification. In Ref. [24], a multi-source domain adversarial transfer network adhering to a dual-fusion strategy was developed to enhance fault diagnosis performance in cross-domain and noisy environments; it integrates both feature-level and decision-level information to improve generalization. These studies reflect recent advancements in deep-learning-based fault diagnosis, highlighting the importance of robustness, domain adaptation, and efficient representation learning. In Ref. [25], a meta-learning framework named MDGCML was proposed to address imbalanced open-set domain generalization in fault diagnosis; by coordinating gradients across domains and classes, it enables balanced decision boundaries and fast adaptation to unknown conditions, demonstrating strong generalization in few-shot and cross-domain scenarios. A summary of related work is given in Table 1.

1.2. Motivation and Contributions

The analysis of vibration signals is a standard method for detecting defects in wind turbine blades and has been applied in previous studies. However, it is hindered by environmental interference, poor real-time performance, and high equipment costs, and it is not suitable for large-scale blade defect detection tasks. Defect detection methods based on visual images have therefore become the mainstream, yet they still face three challenges. (1) Traditional machine learning requires the manual design of features, relying on domain experts' knowledge and experience; feature extraction for high-dimensional and unstructured data is time-consuming and complex. (2) The performance of existing deep learning methods depends on image quality. Noise in the complex background of a wind turbine blade image may be confused with defect features; for example, background elements such as sky, clouds, and earth may resemble blade defects, resulting in false or missed detections. (3) To ensure the continuous and stable operation of a wind turbine and avoid excessively long downtimes, which could cause more severe blade damage, blade defects should be detected in real time.
The following contributions were made to solve the above problems and improve the accuracy of wind turbine blade defect detection:
(1) An end-to-end defect detection framework for wind turbine blades was constructed, comprising three key stages—blade region localization, defect feature extraction, and defect classification—thus forming a complete visual detection process.
(2) Over-fitting suppression and robustness enhancement in blade positioning were achieved using YOLOv5. Drone-captured blade images are taken from different angles and perspectives, and changes in light alter the brightness of the blade surface (via shadows), affecting the model's generalizability. Therefore, we suppressed over-fitting using Mixup, brightness transformation, flipping, random scaling, and Mosaic methods for image enhancement. These augmentations improve the model's ability to generalize to unseen scenarios, contributing to the robustness of the system. After positioning, the augmented and cropped blade regions were used as inputs for the subsequent classification process. This ensures that the DINOv2-based feature extraction and SCN-based classification operate on inputs already optimized for variability and complexity, allowing the system to maintain high performance across diverse real-world conditions. This augmentation strategy effectively enhances the end-to-end robustness of the proposed defect recognition framework.
(3) We performed feature extraction of wind turbine blades based on a transfer-learning DINOv2 large vision model. In traditional feature extraction methods, features must be manually selected based on previous experience. After dimensionality reduction, some data information may be lost, resulting in limited feature expression ability. For wind turbine blade defect recognition, we propose using the DINOv2 large vision model to extract the features of the blade. This model autonomously extracts image features and possesses strong generalization capabilities, significantly enhancing the accuracy of detecting defects in turbine blades.
(4) We achieved wind turbine blade defect classification based on an SCN. Existing blade defect detection methods suffer from limited classification accuracy; using an SCN classifier, which incrementally adds hidden nodes under a supervisory mechanism, effectively improves it.
The remainder of this study is structured as follows. The second section describes the problems inherent in wind turbine blade defect recognition and introduces the specific steps of defect detection and recognition, including YOLOv5-based blade area positioning, blade feature extraction via the DINOv2 large vision model, and Stochastic-Configuration-Network-based defect classification. The third section provides the details of the simulation experiments. The fourth section concludes the study.

2. Wind Turbine Blade Positioning and Defect Recognition

2.1. Framework for Wind Turbine Blade Defect Recognition

Wind turbines are generally installed in windy areas such as hills, mountains, coastal regions, and vast plains. Regular maintenance of wind turbine blades is essential to ensuring the safe operation of wind turbines and prolonging their service lives. Wind turbines are generally tens of meters high, so manual blade inspection is highly risky, inefficient, and prone to subjective error, especially for turbines erected in harsh natural environments. Therefore, we propose using a UAV to capture images of wind turbine blades for automatic defect detection. The process comprises three parts: First, locate the blade area in the image and remove the background that does not contain the blade. Second, input the localized blade-area image into the defect recognition algorithm. Finally, annotate the position of the blade area and the corresponding defect type in the original image. This process is depicted in Figure 1.
Wind turbine blade defect recognition has three parts: blade area positioning, blade feature extraction, and blade defect classification. A framework of this strategy is shown in Figure 2, and a minimal sketch of the pipeline is given below. (1) Wind turbine blade area positioning: original UAV-captured images containing complex backgrounds are input into the YOLOv5 object detection model, which accurately locates the blade regions; these are then cropped for further processing. This step effectively isolates the blade from noisy backgrounds, improving downstream feature extraction. (2) Blade feature extraction: the localized blade images are resized and fed into the large vision model DINOv2, which generates a 384-dimensional feature vector representing the visual characteristics of the image. These features are robust and discriminative, and they generalize well under complex textures. (3) Blade defect classification: the 384-dimensional feature vector extracted by DINOv2 is fed into a Stochastic Configuration Network. An SCN is an incremental learning model capable of dynamically constructing its network structure while ensuring universal approximation; it selects hidden nodes based on a predefined error tolerance and activation constraints, allowing fast convergence and high classification accuracy. This step outputs the final predictions for each defect type, including damage, paint shedding, and dirt buildup.
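The following Python sketch illustrates how the three stages could be chained. The helper names (crop_and_resize, recognize_defects) and the detector/extractor/classifier callables are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def crop_and_resize(image, box, size=(224, 224)):
    # image: CHW float tensor; box: (x1, y1, x2, y2) in pixels (hypothetical format).
    x1, y1, x2, y2 = (int(v) for v in box)
    crop = image[:, y1:y2, x1:x2]
    return F.interpolate(crop.unsqueeze(0), size=size, mode='bilinear',
                         align_corners=False).squeeze(0)

def recognize_defects(image, detector, extractor, classifier):
    results = []
    for box in detector(image):                    # stage 1: YOLOv5 blade boxes
        crop = crop_and_resize(image, box)
        with torch.no_grad():
            feat = extractor(crop.unsqueeze(0))    # stage 2: (1, 384) DINOv2 features
        probs = classifier(feat)                   # stage 3: SCN class probabilities
        results.append((box, int(probs.argmax(dim=1))))  # damage / paint shedding / dirt
    return results
```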

2.2. Wind Turbine Blade Area Positioning Method

The next step entails locating the blade in an image and removing the complex background. The YOLOv5 detection network, known for its speed, accuracy, and simple design, is used to identify and localize wind turbine blade areas, aiding efficient monitoring and analysis. The blades in the original dataset images are extracted so that the blades' surface defects are not confused with background information during feature extraction, thereby helping the classifier improve recognition accuracy.
YOLOv5 is a single-stage detector [26,27,28]. Owing to its high efficiency, accuracy, and ease of use, it is widely used in industrial automation, automatic driving, medical imaging, and other fields. The YOLOv5 target detection model adds a CSP structure to the backbone network based on YOLOv3 [29,30]. Its structure includes backbone feature extraction, neck feature fusion, and head target prediction. The wind turbine blade area positioning process based on YOLOv5 is shown in Figure 3.
The steps of the wind turbine blade area positioning algorithm based on YOLOv5 are as follows:
Step 1: Wind turbine blade feature extraction.
The original UAV-captured wind turbine blade images, which contain large background areas, are input into the YOLOv5 backbone feature extraction network. Focus, CBS, C3, and SPP modules are applied to extract and enhance image features.
The CBS module comprises Conv (convolution), BN (batch normalization), and the SiLU activation function for feature extraction. The activation function expressions are given in Equations (1) and (2), and the batch normalization process is shown in Algorithm 1. The C3 module performs feature extraction by stacking convolutional layers (Conv) and the BottleneckCSP module.
$\mathrm{SiLU}(x) = x \times \mathrm{Sigmoid}(x)$ (1)
$\mathrm{Sigmoid}(x) = \dfrac{1}{1 + e^{-x}}$ (2)
Algorithm 1: Batch normalization
Input: $A = \{x_1, x_2, x_3, \ldots, x_n\}$, where $n$ is the input batch size
Output: $B = \{y_1, y_2, y_3, \ldots, y_n\}$
1 For $i = 1$ to $n$ do
2     Calculate the input mean: $\mu_A = \frac{1}{n} \sum_{i=1}^{n} x_i$
3     Calculate the variance: $\sigma_A^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu_A)^2$
4     Normalize each input using the variance and mean: $\tilde{x}_i = \frac{x_i - \mu_A}{\sqrt{\sigma_A^2 + \varepsilon}}$, where $\varepsilon$ is a small nonzero constant
5     Apply a linear transformation with learnable parameters $\gamma$ and $\beta$: $y_i = \gamma \tilde{x}_i + \beta$
6 End For
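As a concrete illustration of Equations (1) and (2) and Algorithm 1, the following minimal NumPy sketch computes the SiLU activation and one batch normalization pass; the function names and the eps default are assumptions for illustration.

```python
import numpy as np

def silu(x):
    # Equation (1): SiLU(x) = x * Sigmoid(x), with Sigmoid from Equation (2).
    return x * (1.0 / (1.0 + np.exp(-x)))

def batch_norm(A, gamma, beta, eps=1e-5):
    # Algorithm 1: normalize a batch A of shape (n, d), then apply the
    # learnable affine transform y_i = gamma * x_hat_i + beta.
    mu = A.mean(axis=0)                     # batch mean
    var = A.var(axis=0)                     # batch variance
    x_hat = (A - mu) / np.sqrt(var + eps)   # normalized inputs
    return gamma * x_hat + beta
```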
Step 2: Wind turbine blade feature fusion.
The SPP module performs maximum pooling on the feature map by pooling kernels of different sizes. Then, it performs splicing operations to obtain a new feature map, which improves the model’s adaptability to various sizes of wind turbine blades and helps express blade features. After the convolution operation, the resolution of the feature map is improved via nearest-neighbor upsampling so that this map can be fused with other feature maps to form a multi-scale feature pyramid, which can improve the detection ability of the network for different sizes of blades. The nearest neighbor upsampling is shown in Equation (3).
$x_{src} = x_{dst} \cdot \dfrac{Width_{src}}{Width_{dst}}, \quad y_{src} = y_{dst} \cdot \dfrac{Height_{src}}{Height_{dst}}$ (3)
Here, $x_{src}$ and $y_{src}$ are the abscissa and ordinate of a pixel in the original image; $Width_{src}$ and $Height_{src}$ are the width and height of the original image; $x_{dst}$ and $y_{dst}$ are the abscissa and ordinate of the corresponding pixel in the target image; and $Width_{dst}$ and $Height_{dst}$ are the width and height of the target image.
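The coordinate mapping in Equation (3) can be vectorized as in the following sketch (function name assumed for illustration):

```python
import numpy as np

def nearest_neighbor_upsample(src, dst_h, dst_w):
    # Equation (3): map each target pixel back to its nearest source pixel.
    src_h, src_w = src.shape[:2]
    ys = (np.arange(dst_h) * src_h / dst_h).astype(int)  # y_src = y_dst * H_src / H_dst
    xs = (np.arange(dst_w) * src_w / dst_w).astype(int)  # x_src = x_dst * W_src / W_dst
    return src[ys[:, None], xs[None, :]]
```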
Step 3: Obtain the wind turbine blade area results.
Feature decoding is performed on the feature pyramid of the wind turbine blade neck. The head network predicts the position and size of the wind turbine blade bounding box in each image through the convolution and activation layers. It distinguishes the wind turbine blade from the background area. After upsampling, the original image size is restored, and the position of the wind turbine blade is framed.

2.3. Classification of Wind Turbine Blade Defects

Wind turbine blade defect classification comprises feature extraction and dimensionality reduction, followed by data classification. First, the DINOv2 large vision model is used to extract features from the blade images located by YOLOv5. To meet the requirements of resource-limited environments and fast inference, the DINOv2 ViT-S/14 model was selected as the feature extractor; the embedding dimension of the model is 384 [31], so the obtained blade feature vector has 384 elements. Then, the Stochastic Configuration Network is used to classify the dimensionality-reduced feature vectors. Finally, the probability values of the various wind turbine blade defect types are obtained.

2.3.1. Method for Extracting Features Utilizing DINOv2

The wind turbine blade area extracted by YOLOv5 still contains some background. To reduce the redundant features generated by the background and the feature dimensions of the defective blade, thereby diminishing the classifier's computational load and improving classification efficiency, the DINOv2 ViT-S/14 model is used as the feature extractor for the blade area in combination with transfer learning [31]. A flow chart of feature extraction via the DINOv2 large vision model is shown in Figure 4, and the corresponding algorithm is shown in Algorithm 2. DINOv2 ViT-S/14 learns features from unlabeled image data through self-supervised learning: the student model learns the output of the teacher model and improves feature expression ability through knowledge distillation [32]. The ViT neural network has an attention mechanism that can capture the global information of an image, and parallel computing improves the running speed of the model. The attention mechanism is shown in Equation (4).
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\dfrac{QK^T}{\sqrt{d_k}}\right)V$ (4)
In this equation, $Q$ is the query matrix, $K$ is the key matrix, $V$ is the value matrix, and $d_k$ is the key dimension.
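A direct transcription of Equation (4) in PyTorch:

```python
import torch
import torch.nn.functional as F

def attention(Q, K, V):
    # Equation (4): softmax(Q K^T / sqrt(d_k)) V
    d_k = K.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5
    return F.softmax(scores, dim=-1) @ V
```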
Figure 4. A structural diagram of the characteristics of the wind turbine blades obtained via DINOv2.
Algorithm 2: DINOv2 wind turbine blade feature extraction algorithm
Input: blade image $X_i \in \mathbb{R}^{224 \times 224 \times 3}$, $\varepsilon = 10^{-6}$
Output: blade feature vector $y \in \mathbb{R}^{384}$
1 Divide the blade image into 16 × 16 = 256 patches of 14 × 14 pixels to obtain $X_c \in \mathbb{R}^{256 \times 14 \times 14 \times 3}$
2 Flatten the three-channel $X_c$ to obtain $X_f \in \mathbb{R}^{256 \times 588}$
3 Linearly project $X_f$ onto a 384-dimensional vector space: $X_{fc\_1} = X_f A^T + b$, $A \in \mathbb{R}^{384 \times 588}$, $X_{fc\_1} \in \mathbb{R}^{256 \times 384}$
4 Prepend the class token and embed position encoding to distinguish the relative positions of sequence elements: $X_{fc\_1} \in \mathbb{R}^{257 \times 384}$
5 For $i = 1$ to $12$ do
6     Layer normalization: $X_{LN\_1} = \frac{X_{fc\_1} - E[X_{fc\_1}]}{\sqrt{\mathrm{Var}[X_{fc\_1}] + \varepsilon}} \gamma + \beta$
7     Attention mechanism: $X_{att} = \mathrm{Attention}(X_{LN\_1}, X_{LN\_1}, X_{LN\_1})$
8     Layer normalization: $X_{LN\_2} = \frac{X_{att} - E[X_{att}]}{\sqrt{\mathrm{Var}[X_{att}] + \varepsilon}} \gamma + \beta$
9     First fully connected layer of the multi-layer perceptron: $X_{fc\_2} = X_{LN\_2} A_1^T + b$, $A_1 \in \mathbb{R}^{1536 \times 384}$, $X_{fc\_2} \in \mathbb{R}^{257 \times 1536}$
10    Second fully connected layer of the multi-layer perceptron: $X_{fc\_3} = X_{fc\_2} A_2^T + b$, $A_2 \in \mathbb{R}^{384 \times 1536}$, $X_{fc\_3} \in \mathbb{R}^{257 \times 384}$
11 End For
12 Layer normalization: $X_{LN\_3} = \frac{X_{fc\_3} - E[X_{fc\_3}]}{\sqrt{\mathrm{Var}[X_{fc\_3}] + \varepsilon}} \gamma + \beta$
13 The blade feature vector $y$ is the first row of $X_{LN\_3}$
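In practice, the pre-trained backbone can be loaded and queried for the 384-dimensional CLS embedding in a few lines. The snippet below assumes the torch.hub distribution of DINOv2 published by Meta AI ('facebookresearch/dinov2', model 'dinov2_vits14') and ImageNet normalization statistics; verify these identifiers against the current release.

```python
import torch
from torchvision import transforms
from PIL import Image

model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open('blade_crop.jpg').convert('RGB')).unsqueeze(0)
with torch.no_grad():
    feat = model(img)   # CLS-token embedding, shape (1, 384)
```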

2.3.2. Feature Vector Classification Based on a Stochastic Configuration Network

The wind turbine blade feature vector extracted by DINOv2 has many dimensions and a large quantity of data, and it is difficult to manually distinguish the characteristics of different defect types. Therefore, the Stochastic Configuration Network [33], which automatically learns classification rules from the input feature vectors, is used as the classifier. It is robust and fits the data quickly. The Stochastic Configuration Network is a randomized learning algorithm with a supervisory mechanism. The algorithm starts with a small number of hidden layer nodes and, according to the residual error in fitting the input blade feature vectors and the preset tolerance error, decides whether to add hidden layer nodes; Equation (7) gives the residual of the current network. The supervisory mechanism randomly assigns the weights and biases of the hidden layer, and the network structure is shown in Figure 5. As expressed in Equations (5) and (6), all the blade feature vectors extracted by DINOv2 are stacked into a matrix and input into the Stochastic Configuration Network, which reduces the error by automatically adding hidden layer nodes until the tolerance error is satisfied. Finally, the probability values of the different types of wind turbine blade defects are output.
$f_{L-1}(x) = \sum_{j=1}^{L-1} \beta_j \, \mathrm{sigmoid}(\omega_j^T x + b_j), \quad L = 1, 2, 3, \ldots, n; \; f_0 = 0$ (5)
$\mathrm{sigmoid}(x) = \dfrac{1}{1 + e^{-x}}$ (6)
$e_{L-1} = f - f_{L-1} = [e_{L-1,1}, e_{L-1,2}, \ldots, e_{L-1,m}]$ (7)
If the output residual cannot meet the preset tolerance error, the number of hidden layer nodes is automatically increased. The output of the L-th hidden node can be expressed as Equation (8).
$h_L = \mathrm{sigmoid}(\omega_L^T x + b_L)$ (8)
Theorem 1 ([33]). Suppose that span($\Gamma$) is dense in $L^2$ space and that, for $h \in \Gamma$, $0 < \|h\| < b_h$. Given $0 < r < 1$ and a non-negative sequence $\{\mu_L\}$ with $\mu_L \le 1 - r$ and $\lim_{L \to +\infty} \mu_L = 0$, for $L = 1, 2, \ldots$, denote Equation (9):
$\delta_L = \sum_{q=1}^{m} \delta_{L,q}, \quad \delta_{L,q} = (1 - r - \mu_L) \| e_{L-1,q} \|^2, \quad q = 1, 2, \ldots, m$ (9)
If $h_L$ satisfies the following inequality
$\langle e_{L-1,q}, h_L \rangle^2 \ge b_h^2 \, \delta_{L,q}, \quad q = 1, 2, \ldots, m$ (10)
and the output weights are evaluated as
$[\beta_1^*, \beta_2^*, \ldots, \beta_L^*] = \arg\min_{\beta} \left\| f - \sum_{j=1}^{L} \beta_j h_j(X) \right\|$ (11)
then $\lim_{L \to +\infty} \| f - f_L^* \| = 0$, where $f_L^* = \sum_{j=1}^{L} \beta_j^* h_j(X)$ and $\beta_j^* = [\beta_{j,1}^*, \beta_{j,2}^*, \ldots, \beta_{j,m}^*]^T$.
Theorem 1 demonstrates that the universal approximation capability of SCNs is theoretically guaranteed through the constraint of Inequality (10). Therefore, the continual addition of new hidden nodes based on (10) and (11) ensures that the model’s error will converge within the tolerance error.
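A compact sketch of the SCN construction loop follows. It uses a simplified scoring proxy in place of the full supervisory inequality (10) and solves Equation (11) by least squares; the parameter defaults mirror Section 3.2, but all names and the candidate-scoring heuristic are illustrative assumptions.

```python
import numpy as np

def train_scn(X, T, L_max=500, T_max=100, tol=1e-4):
    """Incrementally add sigmoid hidden nodes until the residual meets tol."""
    rng = np.random.default_rng(0)
    H = np.empty((X.shape[0], 0))
    e = T.copy()                       # residual starts at the target matrix
    for _ in range(L_max):
        best_h, best_score = None, -np.inf
        for _ in range(T_max):         # candidate pool for this node
            w = rng.uniform(-1, 1, X.shape[1])
            b = rng.uniform(-1, 1)
            h = 1.0 / (1.0 + np.exp(-(X @ w + b)))
            # Simplified proxy for inequality (10): prefer candidates most
            # correlated with the current residual.
            score = sum((e[:, q] @ h) ** 2 for q in range(T.shape[1])) / (h @ h)
            if score > best_score:
                best_h, best_score = h, score
        H = np.column_stack([H, best_h])
        beta, *_ = np.linalg.lstsq(H, T, rcond=None)   # Equation (11)
        e = T - H @ beta
        if np.mean(e ** 2) < tol:      # tolerance error reached
            break
    return H, beta
```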

3. Experimental Validation and Evaluation

3.1. Experimental Environment

The software and hardware equipment required for the experiment in this study are shown in Table 2.

3.2. Model Architecture and Parameter Setting

(1) The entire model architecture consists of three sequential modules: YOLOv5 for blade area localization, DINOv2 for visual feature extraction, and the Stochastic Configuration Network (SCN) for defect classification. In the first stage, YOLOv5 was used for object detection: before being input into YOLOv5, the original UAV-captured images were resized to 640 × 640, and the YOLOv5 output was used to crop the bounding box of the wind turbine blade area. In the second stage, the localized and cropped blade image was resized to 224 × 224 and passed into the DINOv2 ViT-S/14 large vision model, which output a 384-dimensional feature vector representing the semantic and spatial information of the blade surface. In the third stage, the SCN received the DINOv2 feature vector and classified it; the SCN constructs a single-hidden-layer feedforward neural network, randomly generates hidden nodes, and analytically calculates the output weights. The final prediction indicated whether the defect type was damage, paint shedding, or dirt buildup. The integration of a transformer-based feature extractor and an adaptive SCN classifier contributes to the overall accuracy and robustness of the proposed framework.
(2) In the blade-positioning experiment, the batch size of YOLOv5 was set to n = 16, the number of training epochs was set to 100, the learning rate was set to η = 0.01, and the momentum was set to γ = 0.937. Choosing a lower learning rate and momentum accelerates convergence and reduces oscillation. To accelerate the convergence of the model, improve its generalizability, and avoid premature over-fitting, Warmup and cosine annealing learning rates were used. At the beginning of training, the Warmup preheating learning rate was applied, as shown in Equation (12); the cosine annealing strategy is shown in Equation (13).
$lr_{warmup} = lr_{in} + (lr_{max} - lr_{in}) \cdot \dfrac{epoch_{now}}{epoch_{tt}}$ (12)
$lr = \dfrac{1}{2} lr_{max} \left(1 + \cos\left(\pi \times \dfrac{epoch_{now}}{epoch_{tt}}\right)\right)$ (13)
where $lr_{in}$ is the initial learning rate, $lr_{max}$ is the preset maximum learning rate, $epoch_{now}$ is the current training round, and $epoch_{tt}$ is the total number of training rounds.
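Equations (12) and (13) translate directly into a small scheduler. In this sketch the initial learning rate and the three warmup rounds reported in Section 3.3 are assumptions; note that Equation (12) is written over the total round count, whereas warmup is applied only during the preheating rounds.

```python
import math

def learning_rate(epoch_now, epoch_tt, lr_in=1e-4, lr_max=0.01, warmup=3):
    if epoch_now < warmup:
        # Equation (12): linear ramp from lr_in to lr_max during preheating.
        return lr_in + (lr_max - lr_in) * epoch_now / warmup
    # Equation (13): cosine annealing from lr_max toward zero.
    return 0.5 * lr_max * (1 + math.cos(math.pi * epoch_now / epoch_tt))
```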
(3) In the defect type recognition experiment, the SCN was configured with a maximum number of hidden nodes $L_{max}$ = 500, a tolerance error $\varepsilon$ = 0.0001, and a maximum number of candidate nodes $T_{max}$ = 100. Setting 500 hidden nodes effectively guarantees the complete convergence of the SCN, and setting 100 candidate nodes ensures that the error is minimized each time the SCN adds a hidden node.

3.3. Analysis of Wind Turbine Blade Area Positioning

The dataset used for blade area positioning was obtained from a public online resource. It consists of 4590 high-resolution images of wind turbine blades captured by drones under different conditions. These images cover different perspectives, lighting changes, and backgrounds, such as hills, skies, and grasslands. The dataset was randomly divided into three subsets in an 8:1:1 ratio using a Python script, with 3672 images for training, 459 images for validation, and 459 images for testing. This random division ensured the data distribution was balanced and facilitated repeatable experimental evaluation. Some examples of the blade area positioning dataset are shown in Figure 6.
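A minimal sketch of the 8:1:1 random split described above (directory name and seed are assumptions):

```python
import random
from pathlib import Path

paths = sorted(Path('blade_images').glob('*.jpg'))
random.Random(42).shuffle(paths)            # fixed seed for repeatability
n = len(paths)
train = paths[: int(0.8 * n)]               # 3672 images
val = paths[int(0.8 * n): int(0.9 * n)]     # 459 images
test = paths[int(0.9 * n):]                 # 459 images
```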
To avoid over-fitting of the YOLOv5 target detection network in the early training stage, various image data enhancement techniques were adopted to enrich the diversity and complexity of the dataset, as sketched below. These techniques include Mosaic enhancement, which dramatically increases dataset diversity by combining multiple images into one, as well as flipping and rotation of the images; these two geometric transformations simulate wind turbine blade images taken by the UAV from different perspectives so that the model can adapt to positioning tasks at various angles. By adjusting the brightness, contrast, and saturation of the images, blade images under different illumination and weather conditions were simulated, improving the model's robustness. Adding noise simulates image noise in the natural environment, helping the model learn to accurately detect the target in noisy images. The image enhancement methods are shown in Figure 7.
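The per-image photometric and geometric augmentations can be expressed with torchvision transforms, as in the following sketch; Mosaic and Mixup operate at the batch level inside the YOLOv5 training pipeline and are omitted here, and all parameter values are illustrative assumptions.

```python
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                 # flipping
    transforms.RandomRotation(degrees=15),                  # rotation
    transforms.ColorJitter(brightness=0.4, contrast=0.4,
                           saturation=0.4),                 # illumination changes
    transforms.RandomResizedCrop(640, scale=(0.8, 1.0)),    # random scaling
    transforms.ToTensor(),
    transforms.Lambda(lambda t: (t + 0.01 * torch.randn_like(t)).clamp(0, 1)),  # noise
])
```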
The prediction results from the wind turbine blade area positioning experiment can be classified into four cases: a positive sample correctly classified as positive, i.e., a True Positive (TP); a positive sample mistakenly identified as negative, i.e., a False Negative (FN); a negative sample mistakenly identified as positive, i.e., a False Positive (FP); and a negative sample correctly identified as negative, i.e., a True Negative (TN). These four cases yield two indicators for evaluating blade-positioning performance, namely, Precision and Recall, calculated as shown in Equations (14) and (15), respectively. Given the difficulty of direct integral calculation, we used the interpolation method to simplify the process: the average of the precision values at different recall levels is computed to obtain the AP value. This index comprehensively reflects the performance of the blade position detection network; it is calculated as shown in Equation (16).
$\mathrm{precision} = \dfrac{TP}{TP + FP}$ (14)
$\mathrm{recall} = \dfrac{TP}{TP + FN}$ (15)
$AP = \int_0^1 P(r) \, dr \approx \sum_{n=1}^{11} \left( \max_{\tilde{n} \ge n} P(\tilde{n}) \right) \Delta r_n$ (16)
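A small NumPy sketch of Equations (14)-(16), using the standard 11-point interpolation (function names are assumptions):

```python
import numpy as np

def precision_recall(tp, fp, fn):
    # Equations (14) and (15)
    return tp / (tp + fp), tp / (tp + fn)

def ap_11_point(recalls, precisions):
    # Equation (16): at each recall level r in {0.0, 0.1, ..., 1.0},
    # take the maximum precision achieved at recall >= r, then average.
    recalls, precisions = np.asarray(recalls), np.asarray(precisions)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recalls >= r
        ap += precisions[mask].max() if mask.any() else 0.0
    return ap / 11.0
```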
The YOLO-series single-stage target detection models and the Faster-RCNN two-stage model were used to locate the wind turbine blades in this experiment. The accuracy of each model is shown in Table 3. AP:50 denotes the average precision calculated at an IOU threshold of 0.5; the formula for the IOU is shown in Equation (17). The IOUs of the predicted and actual bounding boxes were calculated, and a prediction was considered correct if IOU ≥ 0.5.
$IOU = \dfrac{\mathrm{area}(B_p \cap B_{ab})}{\mathrm{area}(B_p \cup B_{ab})}$ (17)
In this equation, $B_p$ is the model's predicted bounding box, and $B_{ab}$ is the actual bounding box of the wind turbine blade.
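Equation (17) for axis-aligned boxes in (x1, y1, x2, y2) format:

```python
def iou(box_p, box_a):
    # Equation (17): intersection over union of predicted and actual boxes.
    x1, y1 = max(box_p[0], box_a[0]), max(box_p[1], box_a[1])
    x2, y2 = min(box_p[2], box_a[2]), min(box_p[3], box_a[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    return inter / (area_p + area_a - inter)
```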
As shown in Table 3, YOLOv5 offered the best balance of accuracy and speed and is the preferred model for the wind turbine blade area positioning task. The accuracy of the Faster-RCNN two-stage target detection network was much lower than that of the YOLO series, and its inference speed was also lower than that of YOLOv5 and YOLOv7. Under our experimental conditions, using an RTX 4090 graphics card, YOLOv5 located the blade in a single original image in 9.0 ms, meeting the real-time requirements of wind turbine inspection tasks. The experimental results regarding YOLOv5's ability to locate wind turbine blades are shown in Figure 8.
Figure 9a–d show the training process; Figure 9d shows the trend of the learning rate, for which Warmup and cosine decay strategies were adopted. After three rounds of preheating, the learning rate reached the preset value and then decayed along the cosine curve. In the Warmup rounds, a lower learning rate was used to avoid over-fitting and reduce oscillation in the training process; after the model stabilized, the preset learning rate was used. The cosine annealing strategy lets the learning rate follow the cosine function during training, producing a smooth attenuation.

3.4. Analysis of Wind Turbine Blade Defect Classification

The dataset used to classify wind turbine blade defects consisted of the blade area images located by YOLOv5, comprising a total of 2989 images. The specific categories and division of wind turbine blade images are shown in Table 4.
The feature vectors extracted by DINOv2 ViT-S/14, ResNet50, and ResNet18 were reduced to three-dimensional space using the t-SNE algorithm for feature visualization [34]. t-SNE reduces the dimensionality of a feature vector to suit human observation, although some feature information is lost. The feature vectors extracted by DINOv2 ViT-S/14 separate the wind turbine defects into three clear categories, whereas the vectors extracted by ResNet18 and ResNet50 are poorly separated, which lowers the defect classification accuracy of the Stochastic Configuration Network. The feature visualization is shown in Figure 10.
As shown in Figure 11, using the pre-trained DINOv2 ViT-S/14 model as the feature extractor and an SCN as the classifier yielded the highest wind turbine blade defect classification accuracy. The feature vectors extracted by PCA had the lowest classification accuracy, mainly because PCA captures only the linear structure of the data, making it ill-suited to high-dimensional data such as images. ResNet pre-trained on ImageNet extracted better features than PCA but was inferior to DINOv2 ViT-S/14 combined with transfer learning. Therefore, feature extraction with DINOv2 or ResNet combined with transfer learning effectively improved the accuracy of the classifier.
Table 5 compares the SCN with other classification algorithms. K-means clustering used k = 3, KNN used an initial k = 5, the SVM adopted a linear kernel function, and the random forest used the Gini coefficient as its evaluation criterion. The experimental results show that the SCN, with automatically added nodes and randomly assigned weights and biases, can reduce the error, effectively fit wind turbine blade data with different defect types, and improve defect classification accuracy. For the same classifier, the performance of DINOv2 is significantly better than that of ResNet18, reflecting its stronger visual representation learning. When using the features extracted by DINOv2, the SCN exhibited the best classification performance among all evaluated classifiers, verifying its stronger discriminative learning and recognition ability. The DINOv2 + SCN method exhibited good feature extraction ability and could effectively classify defective wind turbine blades.
To evaluate the feasibility of real-time deployment, the inference speed of each component in the proposed defect recognition framework was measured. The YOLOv5-based blade-positioning module achieved an average inference time of 9.0 ms per image, while the DINOv2 model required 10.60 ms for feature extraction. The SCN classifier completed defect-type classification with a time of only 0.05 ms per image. The total end-to-end inference time was approximately 19.65 ms, which corresponds to over 50 frames per second (FPS). This result demonstrates that the proposed method satisfies the requirements for real-time blade defect detection.
To evaluate the contribution of the blade region localization stage, an ablation experiment was conducted by removing the YOLOv5-based positioning module. Under these conditions, the original UAV images containing background elements such as sky, hills, and vegetation were directly input into the DINOv2 feature extractor without being cropped. As shown in Table 5, the classification accuracy dropped from 0.987 (with YOLOv5) to 0.944 (without YOLOv5). This result confirms that YOLOv5 can effectively suppress background interference and enable more focused and accurate feature extraction by DINOv2, thereby improving overall classification performance.

4. Conclusions

To address the low positioning efficiency for wind turbine blades, we used the YOLOv5 target detection network to remove complex background areas and retain the areas containing the blades. To address the poor robustness of feature extraction, poor adaptability to different defect types, and low defect classification accuracy, we propose a method for detecting and classifying wind turbine blade defects that uses DINOv2 ViT-S/14 as a feature extractor and a Stochastic Configuration Network, combined with transfer learning, as a classifier. Our conclusions are as follows.
(1) The original images contain large complex background regions that interfere with blade defect detection, so YOLOv5 was used to crop the blade areas. To avoid premature over-fitting of YOLOv5, several image enhancement techniques were applied to enhance dataset diversity. This strategy broadened data variation, improving model generalization and robustness.
(2) To address the problem wherein traditional feature extraction methods rely on prior experience and the manual design and extraction of features, resulting in poor expression ability and low classification accuracy, we used DINOv2 ViT-S/14 combined with transfer learning as the feature extractor and a Stochastic Configuration Network as the classifier. The accuracy of classifying defective wind turbine blades reached 98.7% using this method.
In this study, using an Nvidia RTX 4090 GPU, the AP:50 for wind turbine blade area positioning was 99.4%, the inference speed for a single image was 9.0 ms, and the blade defect classification accuracy was 98.7%. The proposed method outperforms existing techniques in both accuracy and speed for wind turbine blade image positioning and defect classification.
While the proposed method effectively classifies the surface defects of wind turbine blades, it does not provide information about the shape or extent of the defects, limiting its ability to support finer-grained maintenance strategies that rely on a quantitative analysis of defect areas. Future work will explore the integration of segmentation networks to extract the exact locations and sizes of defects, enabling severity-level estimation and enhancing the practical value of the system in condition-based maintenance scenarios.

Author Contributions

Conceptualization, J.T.; methodology, X.L. and J.T.; software, X.L.; validation, X.L.; formal analysis, X.L.; investigation, X.L. and J.T.; resources, H.L.; data curation, X.L.; writing—original draft, X.L. and J.T.; writing—review and editing, Z.Z.; visualization, X.L.; supervision, L.S.; project administration, L.S.; funding acquisition, X.P. and J.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation of Liaoning Province, China (2023JH2/101700261, 2024-MS-217); the Department of Education of Liaoning Province, China (LJ222411632035, LJ212411632075); and the Shenyang Science and Technology Plan Project (24-213-3-29).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Abbreviations

YOLO: You Only Look Once
SCN: Stochastic Configuration Network
DINO: Self-Distillation with No Labels
CNN (Conv): Convolutional Neural Network
Faster-RCNN: Faster Region Convolutional Neural Network
VGG: Visual Geometry Group Network
ResNet: Residual Network
SSD: Single-Shot Multi-Box Detector
SVM: Support Vector Machine
RPCA: Recursive Principal Component Analysis
MCNN: Multi-Channel Convolutional Neural Network
ART: Adaptive Resonance Theory
CSP: Cross-Stage Partial Network
CBS: Conv + Batch Normalization + SiLU
SPP: Spatial Pyramid Pooling
BN: Batch Normalization
SiLU: Sigmoid Linear Unit
AP: Average Precision
IOU: Intersection over Union
PCA: Principal Component Analysis
t-SNE: t-Distributed Stochastic Neighbor Embedding
KNN: K-Nearest Neighbor
K-means: k-Means Clustering
RVFL: Random Vector Functional Link network

References

1. Joshuva, A.; Sugumaran, V. A Comparative Study of Bayes Classifiers for Blade Fault Diagnosis in Wind Turbines through Vibration Signals. Struct. Durab. Health Monit. 2017, 12, 69–90.
2. Rezamand, M.; Kordestani, M.; Carriveau, R.; Ting, D.S.-K.; Saif, M. A New Hybrid Fault Detection Method for Wind Turbine Blades Using Recursive PCA and Wavelet-Based PDF. IEEE Sens. J. 2020, 20, 2023–2033.
3. Wang, M.H.; Lu, S.D.; Hsieh, C.C.; Hung, C.C. Fault Detection of Wind Turbine Blades Using Multi-Channel CNN. Sustainability 2022, 14, 1781.
4. Guo, J.; Liu, C.; Cao, J.; Jiang, D. Damage Identification of Wind Turbine Blades with Deep Convolutional Neural Networks. Renew. Energy 2021, 174, 122–133.
5. Yang, X.; Zhang, Y.; Lv, W.; Wang, D. Image Recognition of Wind Turbine Blade Damage Based on a Deep Learning Model with Transfer Learning and an Ensemble Learning Classifier. Renew. Energy 2021, 163, 386–397.
6. Shihavuddin, A.S.; Chen, X.; Fedorov, V.; Nymark Christensen, A.; Andre Brogaard Riis, N.; Branner, K.; Bjorholm Dahl, A.; Reinhold Paulsen, R. Wind Turbine Surface Damage Detection by Deep Learning Aided Drone Inspection Analysis. Energies 2019, 12, 676.
7. Xu, D.; Wen, C.; Liu, J. Wind Turbine Blade Surface Inspection Based on Deep Learning and UAV-Taken Images. J. Renew. Sustain. Energy 2019, 11, 053305.
8. Lv, L.; Yao, Z.; Wang, E.; Ren, X.; Pang, R.; Wang, H.; Zhang, Y.; Wu, H. Efficient and Accurate Damage Detector for Wind Turbine Blade Images. IEEE Access 2022, 10, 2169–3536.
9. Zhao, X.Y.; Dong, C.Y.; Zhou, P.; Zhu, M.J.; Ren, J.W.; Chen, X.Y. Detecting Surface Defects of Wind Turbine Blades Using an AlexNet Deep Learning Algorithm. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 2019, E102-A, 1817–1824.
10. Zhu, J.; Wen, C.; Liu, J. Defect Identification of Wind Turbine Blade Based on Multi-Feature Fusion Residual Network and Transfer Learning. Energy Sci. Eng. 2022, 10, 219–229.
11. Yu, Y.; Cao, H.; Yan, X.; Wang, T.; Ge, S.S. Defect Identification of Wind Turbine Blades Based on Defect Semantic Features with Transfer Feature Extractor. Neurocomputing 2020, 376, 1–9.
12. Mao, Y.; Wang, S.; Yu, D.; Zhao, J. Automatic Image Detection of Multi-Type Surface Defects on Wind Turbine Blades Based on Cascade Deep Learning Network. Intell. Data Anal. 2021, 25, 463–482.
13. Liu, Z.H.; Chen, Q.; Wei, H.L.; Lv, M.Y.; Chen, L. Channel-Spatial Attention Convolutional Neural Networks Trained with Adaptive Learning Rates for Surface Damage Detection of Wind Turbine Blades. Measurement 2023, 217, 113097.
14. Zhang, C.; Wen, C.; Liu, J. Mask-MRNet: A Deep Neural Network for Wind Turbine Blade Fault Detection. J. Renew. Sustain. Energy 2020, 12, 053302.
15. Yang, P.; Dong, C.; Zhao, X.; Chen, X. The Surface Damage Identifications of Wind Turbine Blades Based on ResNet50 Algorithm. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020.
16. Yao, Y.; Wang, G.; Fan, J. WT-YOLOX: An Efficient Detection Algorithm for Wind Turbine Blade Damage Based on YOLOX. Energies 2023, 16, 3776.
17. Ran, X.; Zhang, S.; Wang, H.; Zhang, Z. An Improved Algorithm for Wind Turbine Blade Defect Detection. IEEE Access 2022, 10, 122171–122181.
18. Foster, A.; Best, O.; Gianni, M.; Khan, A.; Collins, K.; Sharma, S. Drone Footage Wind Turbine Surface Damage Detection. In Proceedings of the 2022 IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP), Nafplio, Greece, 26–29 June 2022.
19. Zhang, Y.; Yang, Y.; Sun, J.; Ji, R.; Zhang, P.; Shan, H. Surface Defect Detection of Wind Turbine Based on Lightweight YOLOv5s Model. Measurement 2023, 220, 113222.
20. Zhang, R.; Wen, C. SOD-YOLO: A Small Target Defect Detection Algorithm for Wind Turbine Blades Based on Improved YOLOv5. Adv. Theory Simul. 2022, 5, 2100631.
21. Ran, X.; Suyaroj, N.; Tepsan, W.; Lei, M.; Ma, H.; Zhou, X.; Deng, W. A Novel Fuzzy System-Based Genetic Algorithm for Trajectory Segment Generation in Urban Global Positioning System. J. Adv. Res. 2025, in press.
22. Chen, H.; Sun, Y.; Li, X.; Zheng, B.; Chen, T. Dual-Scale Complementary Spatial-Spectral Joint Model for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 6772–6789.
23. Wang, C.; Yang, J.; Jie, H.; Zhao, Z.; Wang, W. An Energy-Efficient Mechanical Fault Diagnosis Method Based on Neural-Dynamics-Inspired Metric SpikingFormer for Insufficient Samples in Industrial Internet of Things. IEEE Internet Things J. 2025, 12, 1081–1097.
24. Wang, C.; Jie, H.; Yang, J.; Gao, T.; Zhao, Z.; Chang, Y.; See, K.Y. A Multi-Source Domain Feature-Decision Dual Fusion Adversarial Transfer Network for Cross-Domain Anti-Noise Mechanical Fault Diagnosis in Sustainable City. Inf. Fusion 2025, 115, 102739.
25. Wang, C.; Shu, Z.; Yang, J.; Zhao, Z.; Jie, H.; Chang, Y.; Jiang, S.; See, K.Y. Learning to Imbalanced Open Set Generalize: A Meta-Learning Framework for Enhanced Mechanical Diagnosis. IEEE Trans. Cybern. 2025, 55, 1464–1475.
26. Zheng, Y.; Liu, Y.; Wei, T.; Jiang, D.; Wang, M. Wind Turbine Blades Surface Crack-Detection Algorithm Based on Improved YOLO-v5 Model. J. Electron. Imaging 2023, 32, 033012.
27. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
28. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525.
29. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
30. Liu, X.; Liu, C.; Jiang, D. Wind Turbine Blade Surface Defect Detection Based on YOLO Algorithm. In Proceedings of the International Congress and Workshop on Industrial AI and eMaintenance 2023, Luleå, Sweden, 13–15 June 2023; pp. 367–380.
31. Oquab, M.; Darcet, T.; Moutakanni, T.; Vo, H.; Szafraniec, M.; Khalidov, V.; Fernandez, P.; Haziza, D.; Massa, F.; El-Nouby, A.; et al. DINOv2: Learning Robust Visual Features without Supervision. arXiv 2023, arXiv:2304.07193.
32. Caron, M.; Touvron, H.; Misra, I.; Jegou, H.; Mairal, J.; Bojanowski, P.; Joulin, A. Emerging Properties in Self-Supervised Vision Transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 9650–9660.
33. Wang, D.; Li, M. Stochastic Configuration Networks: Fundamentals and Algorithms. IEEE Trans. Cybern. 2017, 47, 3466–3479.
34. van der Maaten, L.; Hinton, G. Visualizing Data Using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
Figure 1. Wind turbine blade-positioning and defect classification.
Figure 2. Framework of the wind turbine blade defect recognition process.
Figure 3. A flow chart of wind turbine blade area positioning based on YOLOv5.
Figure 5. A structural diagram of the Stochastic Configuration Network.
Figure 6. Example images from the wind turbine blade area positioning dataset.
Figure 7. Wind turbine blade area positioning data enhancement methods.
Figure 8. YOLOv5 wind turbine blade positioning.
Figure 9. YOLOv5 training curves: (a) training set loss curve; (b) validation set loss curve; (c) recall, mAP:50, and mAP:50-95; and (d) learning rate.
Figure 10. Wind turbine blade feature visualization: (a) ResNet18, (b) ResNet50, and (c) DINOv2 ViT-S/14 feature extraction.
Figure 11. Experimental curve of defect classification.
Table 1. Summary of related work.
References | Wind Turbine Blade Positioning | Blade Defect Feature Extraction | Blade Defect Classification
[1] | - | Descriptive statistical parameters + J48 decision tree algorithm | Bayesian classification
[2] | - | RPCA | RPCA
[3] | - | MCNN | ART
[4] | Haar-AdaBoost | Improved VGG16 | Fully connected layer of VGG16
[5] | Otsu algorithm | AlexNet | Random forest
[6] | - | Inception-ResNet-v2 | Fully connected layer of Faster-RCNN
[7] | - | Improved VGG11 | Fully connected layer of VGG11
[8] | - | ResNet50 | Fully connected layer of SSD
[9] | - | AlexNet | AlexNet
[10] | - | Improved ResNet34 | Fully connected layer
[11] | - | DCNN | SVM
[12] | Improved k-means algorithm | Fine-tuned ResNet101 | Fully connected layer of R-CNN
[13] | - | Improved VGG16 | Fully connected layer of VGG16
[14] | MR algorithm | ResNet50 | DenseNet-121
[15] | - | ResNet50 | Fully connected layer
[16] | - | CSPDarknet + RepVGG | Fully connected layer of YOLOX
[17] | - | CSPDarknet | Fully connected layer of YOLOv5
[18] | - | CSPDarknet | Fully connected layer of YOLOv5
[19] | - | MobileNetv3 | Fully connected layer of YOLOv5
[20] | - | CSPDarknet | Fully connected layer
This study | YOLOv5 | DINOv2 large-vision-model feature extraction | Stochastic Configuration Network
Table 2. Experimental equipment.
Computer Software and Hardware | Version/Model
GPU | Nvidia RTX 4090 (24 GB)
CPU | Intel Xeon Platinum 8352 V
Operating System | Ubuntu 22.04
Python | 3.8
CUDA | 11.3
Pytorch | 1.10
Table 3. Localization accuracy of the wind turbine blade area for each model.
Blade-Positioning Model | AP:50 | Inference Speed
YOLOv3 | 0.993 | 10.3 ms
YOLOv5 | 0.994 | 9.0 ms
YOLOv7 | 0.941 | 8.1 ms
YOLOv8 | 0.990 | 21.9 ms
YOLOv9 | 0.994 | 31.3 ms
FasterRCNN | 0.909 | 9.6 ms
Table 4. Division and statistics of the wind turbine blade defect dataset.
Sample | Damage | Paint Shedding | Dirt Buildup
Training set | 647 | 687 | 758
Validation set | 138 | 148 | 163
Testing set | 139 | 147 | 162
Total images | 924 | 982 | 1083
Table 5. Experimental results of wind turbine blade defect classification.
Wind Turbine Blade Defect Classification Algorithm | Accuracy
DINOv2 + SCN | 0.987
DINOv2 + K-means | 0.515
DINOv2 + KNN | 0.982
DINOv2 + SVM | 0.906
DINOv2 + random forest | 0.983
DINOv2 + RVFL | 0.958
ResNet50 + SCN | 0.873
ResNet18 + SCN | 0.795
PCA + SCN | 0.453
ResNet18 + KNN | 0.759
ResNet18 + K-means | 0.360
DINOv2 + SCN (no YOLOv5 positioning) | 0.944
