1. Introduction
The year 2023 marked a milestone for the wind industry, achieving unprecedented growth with a 50% increase in installations compared to the previous year. This surge reflects the increasing global demand for renewable energy sources, driven in part by ambitious international goals, such as the commitment in the COP28 agreement to triple renewable energy capacity by 2030 [1].
Wind turbines play a pivotal role in the transition to renewable energy. However, the blades are particularly vulnerable, as they experience the highest levels of stress and are more susceptible to environmental damage [2]. Wind turbine blades (WTBs) are subjected to strong loads during operation. Cyclic loading can result in fatigue-induced degradation, which can cause cracks to develop on the blade and lead to premature failure [3]. Wind turbines are also exposed to harsh and variable environmental conditions while in operation, including storm winds, rain, hail, ice accumulation, bird collisions, and lightning [2,3]. Studies indicate that blade failures account for approximately 19.4% of all wind turbine failures, underscoring the urgent need for effective inspection and maintenance strategies [4]. This paper aims to investigate some of these challenges and propose a potential solution for the detection and classification of WTB defects.
There are many ways to minimize downtime and ensure healthy WTBs during operation. The blades can be tested statically, with numerical or physical methods, or they can be put through a process known as fatigue testing to estimate the life of the blades [3]. However, once WTBs are installed, unpredictable events and potential damage may occur, highlighting the importance of early detection through inspection methods. The primary goal of WTB inspections is to prevent disasters before they materialize. Regular inspections not only minimize risks but also enhance efficiency by reducing downtime and operational costs [3].
Traditionally, wind turbines undergo manual inspections by rope teams, who visually examine the blades for signs of damage. While effective, this method is labor-intensive, time-consuming, and potentially hazardous. Recent advancements in technology have introduced alternative solutions, such as drone-based inspections investigated in this paper, which offer safer and more efficient options [5]. Compared to manual inspections, drone-based solutions can lead to cost reductions of up to 70% for the inspection itself and reduce revenue lost to downtime by up to 90% [6]. Complementing these advancements with breakthroughs in deep learning has enabled the use of neural networks for defect detection, significantly improving accuracy and reliability. Convolutional neural networks (CNNs) have proven effective for anomaly detection [7], and sophisticated architectures such as YOLO, faster region-based CNN (R-CNN), and Mask R-CNN have shown promise in identifying and segmenting defects on turbine blades [8].
Building on these advancements and previous works [3,7,9], this study focuses on enhancing the performance of Mask R-CNN for defect detection on wind turbine blades. As opposed to [7], which classifies the presence of WTB damage with deep learning neural networks, we propose a post-processing system designed to improve detection accuracy by leveraging fuzzy voting. Note that although [9] implements fuzzy logic as a dataset preprocessing step, it does not implement any voting between networks, nor is it applied to defect detection on wind turbine blades. Our proposed system aggregates outputs from multiple Mask R-CNN models and applies fuzzification techniques combined with various processing methods. The goal is to improve the robustness and reliability of defect detection, paving the way for more effective maintenance practices in the wind energy sector. The Mask R-CNN models used in this study utilize three different backbone networks: Visual Geometry Group 19-layer CNN (VGG19), Extreme Inception (Xception), and Residual Network 50-layer (ResNet-50). To our knowledge, voting systems have been applied to CNNs in the past [10] as well as to Mask R-CNN [11]. However, ref. [10] utilized a basic weighted majority voting approach among various detection systems, resulting in only a slight improvement in accuracy for pneumonia detection from chest X-ray images. Similarly, ref. [11] proposed weighted majority voting with Mask R-CNN, using each prediction's confidence as its weight instead of manually assigned values. This led to significantly fewer false positives. We were unable to find any voting systems with similar applications that also utilize fuzzy logic. The key contributions of this paper are as follows:
The creation and annotation of a novel RGB dataset comprising 3000 aerial images of a Primus Air Max small wind turbine, with blade damage mimicking defects commonly found on wind turbine blades, including cracks, holes, and edge erosion.
The implementation of Mask R-CNN architectures leveraging VGG19, Xception, and ResNet-50.
A detailed tuning strategy for the Mask R-CNN models utilizing the Hyperband algorithm.
The design, modeling, training, and implementation of a multi-variable fuzzy (MVF) voting system and a data preprocessing stage to utilize outputs from multiple Mask R-CNN models, thereby enhancing detection accuracy.
The remainder of this paper is organized as follows: The theoretical background of this paper is covered in Section 2. This includes the CNN backbone architectures, such as VGG19, Xception, and ResNet, as well as Mask R-CNN, and the mathematical basis for model evaluation and accuracy. Our self-created and annotated RGB dataset with cracks, holes, and edge erosion is presented in Section 3, along with the hyperparameter tuning of backbone architectures with Hyperband. The main innovation of this paper is the MVF voting system proposed in Section 4. The experimental results are presented in Section 5, followed by the conclusions and suggestions for future work in Section 6.
4. Proposed Multi-Variable Fuzzy (MVF) Voting for Defect Detection Using Mask R-CNN with Xception, ResNet, and VGG19 Backbones
Fuzzy logic provides a robust approach to reasoning under uncertainty by enabling input variables to continuously range from 0 to 1, rather than exclusively taking binary true/false values. This characteristic makes fuzzy logic particularly effective for decision-making with imprecise or ambiguous input parameters such as image-based defect detection, where outputs often exhibit ambiguity and partial correctness. Therefore, it is used to combine results and facilitate voting among different Mask R-CNN models.
We propose a multi-variable fuzzy (MVF) voting system, which leverages fuzzy inference to integrate predictions from three Mask R-CNN models (Xception, ResNet, and VGG19) and produce a refined set of defect masks for wind turbine blades. In this research, MVF voting refers solely to the proposed methodology and does not have an established meaning beyond identification. The MVF system comprises the following key components: crisp input extraction, input variable fuzzification, fuzzy rule-based inference, defuzzification, and refined output generation. Each step is explained in detail below.
4.1. Crisp Input Extraction and Preprocessing
Predictions from all Mask R-CNN backbones (Xception, ResNet, and VGG19) often exhibit issues such as overlapping masks, fragmented masks, or empty masks with only a few pixels. To address this, we apply a mask preprocessing stage, which includes:
Mask isolation: Masks containing multiple disconnected regions are separated into individual masks, each corresponding to a single defect. Isolated masks are identified using SciPy's ndimage.label function [26].
Noise filtering: Masks with an area of five or fewer pixels are discarded.
Mask merging: Highly overlapping masks are merged using a logical OR operation to prevent duplicates.
These preprocessing steps are applied to the output of each backbone individually, reducing false positives and forming the foundation for subsequent fuzzy reasoning.
Figure 6 illustrates the impact of the mask-cleaning pipeline on Mask R-CNN predictions.
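The three preprocessing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`split_isolated`, `clean_masks`, `iou`) are hypothetical, and the merge threshold `MERGE_IOU = 0.5` is an assumed value, since the text only says "highly overlapping". The five-pixel noise limit and the use of `scipy.ndimage.label` come from the description above.

```python
import numpy as np
from scipy import ndimage

MAX_NOISE_AREA = 5   # from the text: masks of five or fewer pixels are discarded
MERGE_IOU = 0.5      # ASSUMPTION: the text does not state the overlap threshold


def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over Union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def split_isolated(mask: np.ndarray) -> list[np.ndarray]:
    """Mask isolation: return one boolean mask per connected region."""
    labels, count = ndimage.label(mask)
    return [labels == k for k in range(1, count + 1)]


def clean_masks(masks: list[np.ndarray]) -> list[np.ndarray]:
    """Apply isolation, noise filtering, and merging to one backbone's masks."""
    # Steps 1 and 2: split disconnected regions, then drop tiny fragments.
    parts = [
        part
        for mask in masks
        for part in split_isolated(mask.astype(bool))
        if part.sum() > MAX_NOISE_AREA
    ]
    # Step 3: greedily OR together masks whose IoU exceeds the threshold.
    merged: list[np.ndarray] = []
    for part in parts:
        for i, kept in enumerate(merged):
            if iou(part, kept) > MERGE_IOU:
                merged[i] = np.logical_or(part, kept)
                break
        else:
            merged.append(part)
    return merged
```

The greedy single pass keeps the sketch short; a production pipeline might repeat the merge until no pair exceeds the threshold.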
4.2. Grouping of Mask Predictions
To facilitate voting among masks, it is essential to identify which masks correspond to the same defect. Predictions from different models that overlap are grouped together. Each group becomes an individual unit for fuzzy evaluation. For example, if all three models predict a defect in the same area of the image, their masks are grouped together. However, if overlapping predictions refer to distinct defect types (e.g., a crack and a hole), separate groups are maintained.
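One way to picture the grouping step is as a union-find over all predictions from the three models. The sketch below is illustrative only: the `Prediction` container and `group_predictions` function are hypothetical names, and it assumes that two masks "overlap" as soon as they share a single pixel, which the text does not specify.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Prediction:
    model: str          # e.g. "xception", "vgg19", "resnet50"
    label: str          # e.g. "crack", "hole", "erosion"
    confidence: float   # in [0, 1]
    mask: np.ndarray    # boolean array, same shape for every prediction


def group_predictions(preds: list[Prediction]) -> list[list[Prediction]]:
    """Group same-class predictions whose masks share at least one pixel."""
    parent = list(range(len(preds)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(preds)):
        for j in range(i + 1, len(preds)):
            same_class = preds[i].label == preds[j].label
            # ASSUMPTION: any shared pixel counts as an overlap.
            if same_class and np.logical_and(preds[i].mask, preds[j].mask).any():
                parent[find(i)] = find(j)

    groups: dict[int, list[Prediction]] = {}
    for i, pred in enumerate(preds):
        groups.setdefault(find(i), []).append(pred)
    return list(groups.values())
```

Requiring matching labels before linking two masks is what keeps a crack and a hole in separate groups even when they cover the same pixels.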
4.3. Fuzzification of Input Variables
In this stage, the MVF voting system evaluates each group of masks to determine whether it represents a true positive or a false positive. If the group is deemed valid, the best mask within the group is selected and included in the output.
Fuzzification maps crisp input values to fuzzy sets using defined membership functions. Two fuzzy input variables are defined and used:
Confidence score ($c$): Derived from the maximum confidence value among all masks in a group, as described in Section 4.3.1.
Agreement score ($a$): Based on the maximum Intersection over Union (IoU) among the masks in the group, as described in Section 4.3.2.
These fuzzy variables are transformed into fuzzy values using linear clamped membership functions, described in Equation (7):

$$\mu(x) = \begin{cases} 0, & x \le a \\ \dfrac{x - a}{b - a}, & a < x < b \\ 1, & x \ge b \end{cases} \tag{7}$$

where $x$ represents the input value (e.g., confidence, IoU, etc.), $a$ is the starting point of interpolation, and $b$ marks the endpoint, beyond which the function remains constant at 1.
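A linear clamped membership function of this kind reduces to a single clipped ramp. The helper below is a minimal sketch with a hypothetical name (`clamped_linear`); it assumes the function is 0 at or below the starting point and requires `a < b`.

```python
import numpy as np


def clamped_linear(x, a: float, b: float):
    """Membership degree: 0 at or below a, 1 at or above b, linear in between."""
    if not a < b:
        raise ValueError("the starting point a must be smaller than the endpoint b")
    return np.clip((np.asarray(x, dtype=float) - a) / (b - a), 0.0, 1.0)
```

For example, with the 0.9 to 1.0 confidence range mentioned later for the F1-focused configuration, an input of 0.95 maps to a membership of about 0.5, while any input at or below 0.9 maps to 0.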
4.3.1. Confidence Score
The Mask R-CNN network assigns a confidence value between 0% and 100% to each prediction. This confidence level was identified as the most reliable indicator of a mask's quality, making it a critical factor in the voting process. To generate the confidence score, the highest confidence among the masks in a group is used as input to the membership function, as given in Equation (8):

$$c = \mu\big(\max(\text{X. Conf.},\ \text{V. Conf.},\ \text{R. Conf.})\big) \tag{8}$$

where $c$ is the confidence score, $\mu(\cdot)$ is the linear clamped membership function of Equation (7), and X. Conf., V. Conf., and R. Conf. are the confidence values of the Xception, VGG19, and ResNet-50 masks, respectively. The membership function used to calculate the confidence score, $c$, is visualized in Figure 7.
4.3.2. Agreement Score
When all three models predict damage in the same area of a wind turbine blade, it strongly indicates true damage. The MVF voting system incorporates this by creating the agreement score, $a$, which quantifies the level of agreement between the models. This score is derived by fuzzifying the maximum IoU between the three models, as defined in Equation (9) and illustrated in Figure 8:

$$a = \mu\big(\max(IoU_{XV},\ IoU_{VR},\ IoU_{RX})\big) \tag{9}$$

where $\mu(\cdot)$ is the linear clamped membership function of Equation (7), and $IoU_{XV}$, $IoU_{VR}$, and $IoU_{RX}$ represent the IoUs between "Xception and VGG19", "VGG19 and ResNet-50", and "ResNet-50 and Xception", respectively. The agreement score, $a$, is a constant for each group and does not vary between individual masks.
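The agreement score can be illustrated by computing the pairwise IoUs within a group and fuzzifying the largest one. In this sketch the names `mask_iou` and `agreement_score` are hypothetical, and the breakpoints `lo` and `hi` are placeholders for the membership function parameters shown in Figure 8, which are not reproduced here. Treating a group with a single mask as having zero agreement is also an assumption.

```python
from itertools import combinations

import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over Union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def agreement_score(masks: dict[str, np.ndarray], lo: float, hi: float) -> float:
    """Fuzzify the largest pairwise IoU among the masks present in a group."""
    ious = [mask_iou(m1, m2) for m1, m2 in combinations(masks.values(), 2)]
    # ASSUMPTION: a group with one mask has no pairs, so its maximum IoU is 0.
    max_iou = max(ious, default=0.0)
    return float(np.clip((max_iou - lo) / (hi - lo), 0.0, 1.0))
```

Because the score depends only on the group's masks, it is computed once per group and shared by every mask in it.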
4.4. Defuzzification and Decision Threshold
The fuzzy inference output is defuzzified to a crisp value representing the group validity score ($v$). A simple aggregation method is used to calculate the group validity score from the confidence and agreement scores, as seen in Equation (10), where $v$ is the validity score, $c$ is the confidence score, and $a$ is the agreement score. If $v$ meets the final cutoff threshold (0.8 in the baseline configuration reported in Section 5), the group is accepted as a valid detection, and the mask with the highest individual confidence is selected as the group's representative output. Otherwise, the group is discarded.
The final output of the MVF voting system is a consolidated and refined set of high-quality defect masks. By leveraging fuzzy reasoning and integrating predictions from diverse backbone architectures, the MVF voting system enhances detection accuracy, robustness, and generalization beyond what any single model can achieve. The steps of the proposed MVF voting system are delineated in Figure 9, illustrated in Figure 10, and enumerated in Algorithm 1.
Algorithm 1. Multi-variable fuzzy (MVF) defect Mask R-CNN voting.
Input: Mask R-CNN predictions from Xception, VGG19, and ResNet-50 for one image.
Output: A new set of masks.
1: Cleaning:
2:   Split isolated masks.
3:   Discard small masks.
4:   Merge overlapping masks.
5: Grouping: Group masks that overlap together.
6: Fuzzification: For each group, calculate the degree of membership for both the confidence score and agreement score.
7:   Compute the confidence score c (Equation (8)).
8:   Compute the agreement score a (Equation (9)).
9: Defuzzification:
10:   Compute the validity score v (Equation (10)):
11:   If v meets the decision threshold:
12:     The group passes.
13:     Return the highest confidence mask.
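Steps 6 to 13 of Algorithm 1 can be sketched for a single group as follows. Several parts of this sketch are assumptions rather than the published method: the names `FuzzyParams`, `clamp_linear`, and `vote` are hypothetical, the aggregation is taken to be the arithmetic mean of $c$ and $a$ because the exact form of Equation (10) is not reproduced in this text, and the comparison is taken to be "greater than or equal to". The parameter values in the usage note are the F1-focused ones reported in Section 5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FuzzyParams:
    conf_range: tuple[float, float]  # (start, end) of the confidence membership
    iou_range: tuple[float, float]   # (start, end) of the agreement membership
    cutoff: float                    # decision threshold on the validity score


def clamp_linear(x: float, a: float, b: float) -> float:
    """Linear clamped membership: 0 at or below a, 1 at or above b."""
    return min(1.0, max(0.0, (x - a) / (b - a)))


def vote(confidences: list[float], max_iou: float, p: FuzzyParams) -> Optional[int]:
    """Return the index of the winning mask, or None if the group is rejected."""
    c = clamp_linear(max(confidences), *p.conf_range)  # confidence score
    a = clamp_linear(max_iou, *p.iou_range)            # agreement score
    # ASSUMPTION: arithmetic mean stands in for the aggregation of Equation (10).
    v = (c + a) / 2.0
    if v >= p.cutoff:
        # The group passes: keep the mask with the highest individual confidence.
        return max(range(len(confidences)), key=confidences.__getitem__)
    return None
```

With `FuzzyParams(conf_range=(0.9, 1.0), iou_range=(0.8, 1.0), cutoff=0.9)`, a group with confidences `[1.0, 0.97]` and a maximum IoU of 1.0 passes and returns index 0, whereas a lone mask with confidence 0.92 and a maximum IoU of 0.5 is rejected.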
5. Experimental Results
The Mask R-CNN models with VGG19, Xception, and ResNet-50 backbones were trained and evaluated using the created dataset introduced in Section 3.1. Each backbone underwent hyperparameter tuning, and the optimal hyperparameters, as detailed in Section 3.2, were used to generate the results provided in Tables 3 to 9. Validation was performed on a separate validation subset of 300 images; the accuracies of the three base models and the proposed MVF voting system are presented in Table 3.
Among individual backbones, Xception achieved the highest performance with an mAP of 77.48% (Table 3, row 1, column 2). The best F1 score, 78.33%, was achieved by VGG19 (Table 3, row 2, column 3).
The predictions from all three models were further processed using the proposed MVF voting system, which improved accuracy.
The most notable improvement was observed in mAP, as it was the primary focus during the development of the voting system. However, no improvement was observed in the secondary metric, the F1 score, indicating that the models were already well-balanced in terms of precision and recall.
The MVF voting system increased the mAP by 2.62 percentage points, from Xception's 77.48% to 80.10%, as shown in the second column of Table 3. This improvement was achieved by prioritizing recall at the expense of precision, resulting in a slight decline in the F1 score.
Detailed class-based results from the proposed MVF voting system are provided in Table 4. For completeness, matching results are also provided for the Xception, VGG19, and ResNet-50 backbones individually, in Table 5, Table 6, and Table 7, respectively.
MVF outperformed Xception for all damage types except holes.
MVF outperformed VGG19 for all damage types except cracks.
ResNet-50 had the highest recall for crack detection, even surpassing MVF, which had the highest recall for all other damage types.
Each model was excellent at recognizing holes in wind turbine blades. Xception was the best with an mAP of 90.22%, being the only model to rise above 90% for any class.
Edge erosion proved to be the most challenging defect type for the models to identify. MVF voting achieved the highest mAP at 70.58%, exceeding the next-best model, VGG19, by over 5%.
The computational cost of the proposed MVF voting system is comparable to the cost of the three individual models. In order to use the voting system, Mask R-CNN predictions using each of the three backbones must be generated first. This is computationally intensive, as the deep neural networks require a significant amount of video memory (VRAM) to operate. A quick benchmark was performed, averaging the computation time of each step across the validation subset of the dataset (300 images). The results are presented in Table 8.
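A per-step timing benchmark of this kind can be reproduced with a small harness such as the one below. It is a generic sketch, not the benchmark code used for Table 8: the `benchmark` function and the step names passed to it are hypothetical, and it assumes the steps form a chain in which each step consumes the previous step's output.

```python
import time
from collections import defaultdict
from typing import Any, Callable, Iterable


def benchmark(steps: dict[str, Callable[[Any], Any]],
              images: Iterable[Any]) -> dict[str, float]:
    """Average wall-clock seconds per image for each named pipeline step."""
    totals: dict[str, float] = defaultdict(float)
    count = 0
    for image in images:
        data = image
        for name, step in steps.items():
            start = time.perf_counter()
            data = step(data)  # ASSUMPTION: steps are chained in dict order
            totals[name] += time.perf_counter() - start
        count += 1
    if count == 0:
        return {}
    return {name: total / count for name, total in totals.items()}
```

When timing GPU inference, the device usually needs to be synchronized before each clock read; otherwise asynchronous kernels make a step look faster than it is.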
The intended use case of the proposed MVF voting system is for static, offline processing of images, where speed is not as important. For real-time applications, to speed up inference, we propose using only one of the individual models. Xception, for example, has the highest precision for holes and erosion, as seen in Table 5. VGG19 has the highest precision for cracks, as seen in Table 6, making it a compelling candidate as well.
A notable advantage of the proposed MVF voting system is that it is highly customizable, with adjustable fuzzy parameters that can be adapted to specific applications, such as optimizing for precision or F1 scores in targeted use cases. A modified version of the MVF voting system was developed specifically to maximize the F1 score. This adjustment involved altering the membership function for the confidence score to range from 0.9 to 1.0, while the agreement score function was modified to range from 0.8 to 1.0. Additionally, the final cutoff threshold was increased from 0.8 to 0.9. The process of optimizing the voting system parameters was manual and not comprehensive. From experience, we knew that increasing the quality cutoff for each group would increase the F1 score at the expense of mAP. Initial values for the parameters, numerically larger than before, were chosen by hand and then iteratively adjusted with small nudges to arrive at the above set of F1 score-focused parameters. As a result, the modified MVF voting system achieved an F1 score of 80.07%, 1.74 percentage points higher than the next-best model, VGG19 (78.33%). Detailed class-based accuracy results for this alternative MVF voting system are provided in Table 9.
6. Conclusions
To facilitate fully automated wind turbine blade inspection using deep learning and drones, three Mask R-CNN models with Xception, VGG19, and ResNet-50 backbones were developed, trained, and evaluated. The models were trained on a novel dataset created at Utah Valley University, which consists of 3000 RGB images (300 × 300 pixels) annotated with three common types of defects: cracks, holes, and edge erosion. To further improve defect detection accuracy, a multi-variable fuzzy (MVF) voting system was proposed, trained, and implemented. The MVF voting system functions as follows: (i) it takes sets of binary masks from the three Mask R-CNN models (VGG19, Xception, and ResNet-50) as input, (ii) preprocesses the input masks to clean them and groups any overlapping masks, and (iii) employs fuzzy logic to process each group, selecting the optimal masks to generate the final output.
The results showed that the fuzzy voting system outperformed the individual models in mAP, providing a promising step toward safer and more efficient automated inspections of wind turbine blades. Among individual backbones, Xception achieved the highest mAP of 77.48% (Table 5), while the proposed MVF voting system improved the mAP score by 2.62 percentage points to 80.10% (Table 4). VGG19 had the highest F1 score among individual models at 78.33% (Table 6).
A key advantage of the proposed MVF voting system is its high customizability, with adjustable parameters to suit specific applications. For instance, a modified version of the MVF voting system, optimized for F1 score, improved the F1 performance from 74.52% (achieved by the unmodified MVF, Table 4) to 80.07% (Table 9). This score surpasses those of Xception (77.51%, Table 5), VGG19 (78.33%, Table 6), and ResNet-50 (74.52%, Table 7).
Our future research will aim to improve the fuzzy voting system and explore new voting mechanisms capable of integrating more than three input models. A key approach to increase the accuracy would be to train the deep learning networks on a dataset larger than 3000 images. That will be a focus for future work, as well as extending the defect analysis to include fuzzy fault size computation, enhancing the depth of inspection insights.