Article

Criticality Assessment of Wind Turbine Defects via Multispectral UAV Fusion and Fuzzy Logic

1 Faculty of Information Technologies, Khmelnytskyi National University, 11, Instytuts’ka Str., 29016 Khmelnytskyi, Ukraine
2 Department of Information Technologies of Remote Sensing, Karpenko Physico-Mechanical Institute of NAS of Ukraine, 79601 Lviv, Ukraine
3 Faculty of Transport, Electrical Engineering and Computer Science, Casimir Pulaski Radom University, 29, Malczewskiego St., 26-600 Radom, Poland
4 Research Institute for Intelligent Computer Systems, West Ukrainian National University, 11, Lvivska Str., 46009 Ternopil, Ukraine
* Author to whom correspondence should be addressed.
Energies 2025, 18(17), 4523; https://doi.org/10.3390/en18174523
Submission received: 27 July 2025 / Revised: 18 August 2025 / Accepted: 21 August 2025 / Published: 26 August 2025
(This article belongs to the Special Issue Optimal Control of Wind and Wave Energy Converters)

Abstract

Ensuring the structural integrity of wind turbines is crucial for the sustainability of wind energy. A significant challenge remains in moving from mere defect detection to objective, scalable criticality assessment for prioritizing maintenance. In this work, we propose a comprehensive framework that leverages multispectral unmanned aerial vehicle (UAV) imagery and a novel standards-aligned Fuzzy Inference System to automate this task. Our contribution is validated on two open research-oriented datasets representing small onshore and offshore turbines: the public AQUADA-GO and Thermal WTB Inspection datasets. An ensemble of YOLOv8n models trained on fused RGB-thermal data achieves a mean Average Precision (mAP@.5) of 92.8% for detecting cracks, erosion, and thermal anomalies. The core novelty, a 27-rule Fuzzy Inference System derived from the IEC 61400-5 standard, translates quantitative defect parameters into a five-level criticality score. The system’s output demonstrates exceptional fidelity to expert assessments, achieving a mean absolute error of 0.14 and a Pearson correlation of 0.97. This work provides a transparent, repeatable, and engineering-grounded proof of concept, demonstrating a promising pathway toward predictive, condition-based maintenance strategies and supporting the economic viability of wind energy.

Graphical Abstract

1. Introduction

As the global energy landscape pivots towards sustainability, wind power has become a cornerstone of renewable generation, with installed capacity expanding at an unprecedented rate. However, the long-term economic viability of this multi-trillion-dollar investment hinges on the effectiveness of Operations and Maintenance (O&M) strategies. A comprehensive review by Sun et al. highlights that ensuring the in situ structural integrity of turbine blades is a critical, yet unresolved, challenge [1]. These massive composite structures face relentless environmental and operational stresses, leading to damage that can degrade performance or precipitate catastrophic failures [2]. Consequently, O&M activities constitute a substantial fraction of the levelized cost of energy, particularly for offshore installations where non-contact inspection methods are essential to mitigate the high costs and risks of manual assessment [3].
The advent of Unmanned Aerial Vehicles (UAVs) has revolutionized data acquisition for structural health monitoring, offering a safe and efficient alternative to traditional methods, as surveyed by Zhang et al. [4]. The integration of advanced communication protocols like 5G and microservice architectures further enhances the autonomy and scalability of UAV-based inspections, enabling near real-time data streaming and distributed processing [5]. However, the immense volume of data collected necessitates automated analysis pipelines. While deep learning models, particularly the You Only Look Once (YOLO) architecture, have proven highly effective for raw defect detection [6,7], a significant ‘criticality gap’ remains. A simple list of detected flaws is insufficient for informed decision-making, as noted in studies on multi-type defect detection [8]. The urgent need is to move beyond detection to diagnosis and prognosis, a challenge reviewed by Sun et al. in the context of machine learning applications for fault diagnosis [9].
A robust criticality score, which quantifies the severity of a defect, is the missing link needed to transition from reactive, time-based maintenance to predictive, condition-based maintenance (CBM) [10]. Such a system must align with established engineering principles, as outlined in international standards like IEC 61400-5 [11] and industry taxonomies from the Electric Power Research Institute (EPRI) [12]. To our knowledge, this paper introduces the first end-to-end framework to bridge this gap by explicitly grounding an interpretable AI model in these standards. Our primary contribution is a transparent, ‘glass-box’ Fuzzy Inference System (FIS) that translates quantitative defect parameters into a standards-aligned criticality score. By validating our approach on two diverse public datasets, the large-scale offshore AQUADA-GO video collection [13] and the multispectral Thermal Wind Turbine Blade (WTB) Inspection dataset [14], we provide a reproducible and engineering-grounded solution. We demonstrate that this fusion of multispectral data and explainable AI provides an actionable tool for optimizing maintenance, enhancing safety, and ensuring the long-term sustainability of wind energy assets.
This paper introduces a complete, end-to-end framework designed to bridge this criticality gap. Our work builds upon foundational research in UAV inspection and AI-based detection but makes several novel contributions aimed at creating a system that is not only accurate but also transparent, reliable, and directly applicable in a real-world industrial setting. The goal of this study is to improve the objectivity and efficiency of wind turbine blade maintenance by developing an automated system that moves beyond simple defect detection to provide a reliable, standards-aligned criticality assessment. The primary objective is to create a transparent, data-driven tool that can prioritize repairs, optimize resource allocation, and enhance operational safety.
The major contributions of this work are as follows:
  • A robust multispectral detection framework, built on an ensemble of YOLOv8n models, that achieves a state-of-the-art mean Average Precision (mAP@.5) of 92.8% on the combined public AQUADA-GO and Thermal WTB Inspection datasets. In this context, ‘multispectral’ refers to the combined use of the visible Red-Green-Blue (RGB) and long-wave infrared (thermal) spectra.
  • A novel 27-rule ‘glass-box’ FIS for severity scoring, whose knowledge base is explicitly derived from the engineering principles of the IEC 61400-5 standard. The system demonstrates exceptional fidelity to expert assessments, achieving a mean absolute error of 0.14 and a Pearson correlation of 0.97.
  • A comprehensive and reproducible validation of the entire framework, featuring (i) ablation studies that quantify the critical impact of each component, (ii) a formal protocol for establishing expert-derived ground truth validated by a high inter-rater reliability (Fleiss’s κ = 0.85), and (iii) a global sensitivity analysis confirming the FIS’s robustness to ±20% parameter variations.
The remainder of this manuscript is structured to detail every aspect of this framework. Section 2 provides an extensive review of the state of the art in relevant fields. Section 3 describes our methodology, from data acquisition and processing to the design of the proposed FIS. Section 4 presents the comprehensive empirical results and validation. Section 5 interprets these findings, discusses their implications, acknowledges limitations, and proposes future research directions. Finally, Section 6 summarizes the work and its contribution to the sustainable management of critical wind energy assets.

2. Related Work

The automated assessment of wind turbine blades is a multidisciplinary field drawing from advances in remote sensing, computer vision, and artificial intelligence. The evolution from hazardous manual inspections to autonomous systems has been driven by the maturation of UAV platforms capable of dynamic trajectory adaptation and precise maneuvering [15]. The diagnostic power of these platforms is determined by their sensor payloads. While high-resolution RGB cameras are standard for capturing surface details and enabling techniques like image stitching [16], the integration of thermal imaging has proven essential for a comprehensive non-destructive evaluation [17]. Thermal sensors reveal subsurface anomalies by detecting minute temperature variations, a technique validated for both blades and electrical components [18]. Research continues to explore more advanced modalities, such as hyperspectral imaging for identifying material degradation or icing [19], and the broader context of UAV inspection is expanding with developments in edge computing and thermal image processing informed by thermodynamic principles [20].
Effectively leveraging multispectral data hinges on intelligent fusion strategies, a topic thoroughly surveyed by Zhang et al. [21]. This can occur at the hardware level, as with the Multi-Spectral Dynamic Imaging (MSX) technology used in the Thermal WTB dataset [14], or through post-processing. Transform-domain methods, such as those using wavelets, have been successfully applied to enhance defect saliency in fused images [22], a principle also demonstrated in AI-based video fusion applications [23]. These traditional techniques are increasingly complemented by deep learning approaches that learn optimal fusion rules directly from data, using architectures like IFCNN [24] or FusionNet [25]. The core idea of combining data from multiple sensors to assess system health extends beyond imaging, as demonstrated in the performance degradation assessment of wind turbine gearboxes using vibration and operational data [26].
The automated analysis of this imagery is dominated by deep learning, which has supplanted classical computer vision methods. The performance of modern object detectors is built upon foundational architectures like ResNet [27] and the availability of large-scale pre-training datasets. Detection architectures have diverged into two main families: high-accuracy two-stage models like Faster R-CNN [28] and its descendants, Cascade R-CNN [29] and Mask R-CNN [30], which have been adapted for blade inspection [31], and high-speed single-stage models like EfficientDet [32] and the YOLO family [33]. The latter’s balance of speed and accuracy has made it a popular choice for detecting various defects, from multi-scale surface flaws to specific cracks [34,35,36]. To further enhance robustness, ensemble learning [37], combining multiple models to reduce variance, is a widely adopted strategy [38].
The final frontier is translating detection into decision support, a task for which AI-driven criticality assessment is essential. A comprehensive review by Al-Agha et al. highlights this trend [39], while another work explores alternative machine learning techniques for damage classification [40]. Fuzzy logic, with its ability to model the linguistic reasoning and uncertainty of human experts, has emerged as a particularly suitable paradigm. Previous work has demonstrated its potential for creating integrated detection-to-criticality pipelines [41]. The field is advancing towards more sophisticated methods, including adaptive neuro-fuzzy systems that can learn from data [42], a technology proven in related aerospace applications [43]. Furthermore, research into automated rule-based generation using techniques like ant colony optimization seeks to address the knowledge acquisition bottleneck [44]. Our work contributes to this area by proposing a fuzzy system that is not only accurate but also transparent and explicitly grounded in established international engineering standards, a crucial step for real-world adoption and certification. Unlike prior works employing adaptive neuro-fuzzy systems that can be opaque, our approach utilizes a static, Mamdani-type FIS where every rule is explicitly defined, making the entire reasoning process from sensor data to final score fully auditable by human experts.
While the reviewed literature demonstrates significant progress in sensor technology, data fusion, and deep learning for defect detection, a persistent gap remains in translating these high-accuracy detections into transparent, reliable, and actionable maintenance decisions. The purpose of this research is to bridge this gap by developing an integrated framework that not only identifies defects with high precision but also assesses their criticality based on established engineering standards. To achieve this, this study addresses three primary tasks: (i) the development of a robust multispectral defect detection framework using an ensemble of deep learning models validated on diverse public datasets, (ii) the design and implementation of a novel explainable FIS whose knowledge base is explicitly derived from the IEC 61400-5 standard, and (iii) a comprehensive empirical validation of the end-to-end system to quantify its accuracy, reliability, and practical value for condition-based maintenance.

3. Materials and Methods

The framework proposed in this study introduces a comprehensive, multi-stage methodology for progressing from raw, multispectral UAV imagery to an actionable and quantitative assessment of wind turbine defect criticality. This process is designed as a cyber-physical system that synergizes automated data processing with formalized expert knowledge. The overall architecture, depicted in Figure 1, is structured into three primary computational blocks that sequentially refine the data from initial detection to a final, integrated criticality score.
As illustrated in Figure 1, the workflow proceeds from raw data ingestion to a final decision support output. Block 1 (Data-Driven Detection and Measurement) is responsible for processing the raw imagery to identify defects and extract a rich set of quantitative features. Concurrently, Block 2 (Knowledge Modeling and Expert Priors) formalizes domain expertise into a set of physics-informed models that provide an initial estimate of criticality for different defect types. Finally, Block 3 (Fuzzy Fusion and Decision Support) serves as the integration core, employing a Fuzzy Inference System (FIS) to intelligently fuse the data-driven measurements from Block 1 with the knowledge-based estimates from Block 2. This is achieved through a sequence of fusion, aggregation, and defuzzification steps, which yield a robust, calibrated, and transparent final criticality score, $C_{\mathrm{final}}$, ready for operational use.

3.1. Input Data and Defect Localization

The initial input to our system consists of multispectral image data streams, denoted as $x_{\mathrm{RGB}}$ for the visual spectrum and $x_{\mathrm{TH}}$ for the thermal spectrum, captured by the UAV. This imagery is augmented with essential flight metadata, including the UAV’s altitude ($h$) and its location ($L$) relative to the turbine structure. The foundational computational step is the detection of potential defects within these images. For this task, we employ an ensemble of three fine-tuned YOLOv8n deep learning models, chosen for their state-of-the-art balance of speed and accuracy. The output of this initial stage is a set of candidate detections for each image frame, where each detection is a structured data object containing the unique frame identifier, the predicted defect class, the pixel coordinates of the bounding box, and the model’s confidence score. This initial set serves as the input to the main criticality assessment workflow, with each bounding box defining the Region of Interest (ROI) for subsequent detailed analysis.

3.2. Block 1: Data-Driven Detection and Measurement

This block forms the data-driven foundation of the framework, processing raw sensor inputs to produce a structured set of quantitative defect parameters.

3.2.1. Data Synchronization, Normalization, and Feature Extraction

Once a defect is localized within an ROI, its corresponding data from both the RGB and thermal streams are synchronized. The raw pixel values ($y$) within the ROI undergo channel-wise normalization, $\hat{y} = (y - \mu)/\sigma$, to standardize the intensity distributions and ensure robustness to variations in lighting and sensor calibration. Subsequently, an extensive, sequential image processing pipeline is executed on the synchronized and normalized ROI. This pipeline is designed to extract a precise and objective set of physical and thermal parameters, ensuring that all subsequent analysis is based on repeatable, quantitative measurements. The process culminates in the generation of two feature vectors for each detected defect $D_i$: $v_{\mathrm{RGB}}$, containing geometric (e.g., area, perimeter) and textural parameters from the visual spectrum, and $v_{\mathrm{TH}}$, containing thermal parameters (e.g., max/min/avg temperature, temperature gradients) from the infrared spectrum.

3.2.2. Region of Interest (ROI) Extraction and Preprocessing

The analysis begins by isolating the defect. Using the bounding box coordinates $(x^{i}_{\mathrm{ensemble}}, y^{i}_{\mathrm{ensemble}}, W^{i}_{\mathrm{ensemble}}, H^{i}_{\mathrm{ensemble}})$ provided by the YOLOv8 ensemble, the corresponding ROI is cropped from the original, undistorted source image $I_{\mathrm{undistorted}}$. This is represented by the function $f_{\mathrm{crop}}$, which is formalized as follows:
$I^{i}_{\mathrm{ROI}} = f_{\mathrm{crop}}\big(I_{\mathrm{undistorted}},\ x^{i}_{\mathrm{ensemble}},\ y^{i}_{\mathrm{ensemble}},\ W^{i}_{\mathrm{ensemble}},\ H^{i}_{\mathrm{ensemble}}\big). \quad (1)$
Next, to mitigate high-frequency noise (e.g., from sensor imperfections or atmospheric interference) while preserving the sharp edges that are critical for defining defect boundaries, a bilateral filter, $f_{\mathrm{bilateralFilter}}$, is applied to the extracted ROI as follows:
$I^{i}_{\mathrm{smooth}} = f_{\mathrm{bilateralFilter}}\big(I^{i}_{\mathrm{ROI}},\ d,\ \sigma_{\mathrm{color}},\ \sigma_{\mathrm{space}}\big), \quad (2)$
where $d$ is the diameter of the pixel neighborhood, $\sigma_{\mathrm{color}}$ is the filter sigma in the color space, and $\sigma_{\mathrm{space}}$ is the filter sigma in the coordinate space.
We empirically determined optimal values of $d = 9$, $\sigma_{\mathrm{color}} = 75$, and $\sigma_{\mathrm{space}} = 75$ for this application through a grid search optimizing for edge preservation on a validation subset.
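As a concrete illustration, the cropping and filtering steps of Equations (1) and (2) map directly onto OpenCV primitives; the following sketch uses the stated parameter values, while the function and variable names are ours and not taken from the released pipeline.

```python
import cv2

def extract_and_smooth_roi(image_undistorted, box, d=9, sigma_color=75, sigma_space=75):
    """Crop the ensemble bounding box from the undistorted frame (Equation (1)) and
    apply an edge-preserving bilateral filter (Equation (2))."""
    x, y, w, h = box  # (x_ensemble, y_ensemble, W_ensemble, H_ensemble) in pixels
    roi = image_undistorted[y:y + h, x:x + w]
    # Bilateral filtering suppresses high-frequency noise while keeping defect edges sharp.
    return cv2.bilateralFilter(roi, d, sigma_color, sigma_space)
```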

3.2.3. Contrast Enhancement and Adaptive Binarization

To enhance the visibility of defect features, particularly in challenging lighting conditions such as shadows or glare, we apply Contrast Limited Adaptive Histogram Equalization (CLAHE) as follows, where the clip limit $CL$ was set to 2.0 to prevent oversaturation, and the grid size $GS$ was set to (8, 8) pixels:
$I^{i}_{\mathrm{equalized}} = f_{\mathrm{CLAHE}}\big(I^{i}_{\mathrm{smooth}},\ CL,\ GS\big). \quad (3)$
Unlike global histogram equalization, CLAHE operates on small tiled regions of the image, which prevents the over-amplification of noise in relatively uniform areas.
The contrast-enhanced image is then segmented into foreground (defect) and background pixels through adaptive thresholding (Equation (4)), which calculates a localized threshold for each pixel based on the intensity distribution in its neighborhood.
$I^{i}_{\mathrm{binary}} = f_{\mathrm{adaptiveThreshold}}\big(I^{i}_{\mathrm{equalized}},\ BS,\ C\big), \quad (4)$
where the block size $BS$ of the thresholding neighborhood is 11 pixels and $C = 2$ is a constant subtracted from the local mean.
This method in Equation (4) is highly effective for images with non-uniform illumination.
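These two operations likewise correspond to standard OpenCV calls, sketched below with the stated CLAHE and thresholding parameters; the mean-based adaptive method is our reading of ‘a constant subtracted from the local mean,’ and the helper name is illustrative.

```python
import cv2

def enhance_and_binarize(roi_smooth, clip_limit=2.0, grid_size=(8, 8), block_size=11, c=2):
    """Contrast enhancement (Equation (3)) followed by adaptive binarization (Equation (4))."""
    gray = cv2.cvtColor(roi_smooth, cv2.COLOR_BGR2GRAY) if roi_smooth.ndim == 3 else roi_smooth
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=grid_size)
    equalized = clahe.apply(gray)
    # A localized mean threshold copes with non-uniform illumination such as shadows or glare.
    return cv2.adaptiveThreshold(equalized, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                 cv2.THRESH_BINARY, block_size, c)
```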

3.2.4. Morphological Filtering and Geometric Feature Extraction

The raw binary image $I^{i}_{\mathrm{binary}}$ may contain small spurious artifacts. To refine the defect mask, morphological operations are performed. An erosion operation (Equation (5)) is first applied to remove small isolated white pixels, followed by a dilation operation (Equation (6)) to restore the original size of the primary defect region.
$I^{i}_{\mathrm{eroded}} = f_{\mathrm{erode}}\big(I^{i}_{\mathrm{binary}},\ S\big), \quad (5)$
$I^{i}_{\mathrm{processed}} = f_{\mathrm{dilate}}\big(I^{i}_{\mathrm{eroded}},\ S\big), \quad (6)$
where $S$ is a (5 × 5) elliptical structuring element.
From the final processed binary image $I^{i}_{\mathrm{processed}}$, contour analysis is performed to identify all distinct object boundaries. The largest contour, $C^{i}_{\mathrm{defect}}$, is assumed to correspond to the primary defect. A set of geometric parameters is then computed from this contour, as detailed in Equations (7)–(9):
$A^{i}_{\mathrm{pixels}} = f_{\mathrm{contourArea}}\big(C^{i}_{\mathrm{defect}}\big), \quad (7)$
$P^{i}_{\mathrm{pixels}} = f_{\mathrm{arcLength}}\big(C^{i}_{\mathrm{defect}}\big), \quad (8)$
$\big(x^{i},\ y^{i},\ W^{i}_{\mathrm{defect}},\ H^{i}_{\mathrm{defect}}\big) = f_{\mathrm{boundingRect}}\big(C^{i}_{\mathrm{defect}}\big), \quad (9)$
where $A^{i}_{\mathrm{pixels}}$ is the defect area in pixels, $P^{i}_{\mathrm{pixels}}$ is its perimeter, and Equation (9) gives the dimensions of the minimal bounding rectangle.
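A compact OpenCV sketch of the morphological clean-up and feature extraction of Equations (5)–(9) follows; the helper name and the returned dictionary structure are illustrative choices of ours.

```python
import cv2

def measure_defect(binary_roi):
    """Morphological refinement (Equations (5)-(6)) and geometric features (Equations (7)-(9))."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))  # structuring element S
    refined = cv2.dilate(cv2.erode(binary_roi, kernel), kernel)    # erosion then dilation
    contours, _ = cv2.findContours(refined, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    defect = max(contours, key=cv2.contourArea)  # largest contour taken as the primary defect
    x, y, w_px, h_px = cv2.boundingRect(defect)
    return {
        "contour": defect,
        "area_px": cv2.contourArea(defect),
        "perimeter_px": cv2.arcLength(defect, True),
        "bbox_px": (x, y, w_px, h_px),
    }
```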

3.2.5. Photogrammetric Scaling and Calibration

To convert these pixel-based measurements into physically meaningful units, a scaling factor $m_{i}$ is computed for each defect, accounting for the UAV’s distance to the target and the camera’s optical properties, as shown in Equation (10):
$m_{i} = \dfrac{Z_{i} \cdot p}{f}, \quad (10)$
where $Z_{i}$ is the distance to the defect, $p$ is the physical size of a sensor pixel, and $f$ is the lens focal length.
The scaling factor $m_{i}$ allows for the calculation of a set of real-world physical parameters $S_{\mathrm{physical}} = \{W^{i}_{\mathrm{real}}, H^{i}_{\mathrm{real}}, A^{i}_{\mathrm{real}}, P^{i}_{\mathrm{real}}\}$. Crucially, the ‘Defect Size’ input to our FIS uses the metric area $A^{i}_{\mathrm{real}}$ (in mm²), not the pixel area, to ensure that the assessment is invariant to inspection altitude.
In this work, $Z_{i}$ in Equation (10) was not treated as a fixed constant but measured directly using the onboard Real-Time Kinematic (RTK) Global Navigation Satellite System (GNSS) integrated in the UAV, which provides an accuracy of ±0.02 m at the typical operating range. The camera intrinsics ($p$, $f$) were calibrated prior to each flight campaign using a standard checkerboard pattern and the photogrammetric pipeline proposed by Zhang [45]. To assess the potential influence of this residual altitude uncertainty on the final criticality score, we performed a first-order error propagation analysis. Differentiating Equation (10) with respect to $Z_{i}$ yields the uncertainty in the scaling factor, as follows:
$\delta m_{i} = \dfrac{\partial m_{i}}{\partial Z_{i}}\, \Delta Z = \dfrac{p}{f}\, \Delta Z, \quad (11)$
where $\Delta Z = \pm 0.02$ m represents the worst-case GNSS error.
Substituting typical values for the sensor’s pixel pitch (p) and lens focal length (f) from the manufacturer’s datasheets yields a relative uncertainty in the calculated metric defect area of less than 3%. We propagated this uncertainty through the trapezoidal membership functions used in our FIS; the resulting change in membership grade was consistently below 0.08 on the [0, 1] scale for all but the very smallest defects, which are of lowest criticality. We therefore concluded that, for the precision of the RTK system used, no additional correction or uncertainty term was necessary for the FIS.
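The sketch below expresses Equations (10) and (11) in code, converting pixel measurements to metric units and propagating the RTK altitude uncertainty; the function and argument names are illustrative, and the pixel pitch and focal length only need to be given in mutually consistent units.

```python
def pixel_to_metric(area_px, perimeter_px, w_px, h_px, z, pixel_pitch, focal_length, delta_z=0.02):
    """Scaling factor m = Z * p / f (Equation (10)) and its first-order uncertainty
    delta_m = (p / f) * delta_Z (Equation (11)); delta_z defaults to the RTK worst case in metres."""
    m = z * pixel_pitch / focal_length              # ground-sampling distance per pixel
    delta_m = (pixel_pitch / focal_length) * delta_z
    return {
        "W_real": w_px * m,
        "H_real": h_px * m,
        "A_real": area_px * m ** 2,                 # metric area used as the FIS 'Defect Size' input
        "P_real": perimeter_px * m,
        "rel_area_uncertainty": 2.0 * delta_m / m,  # first order: dA/A ~ 2 * dZ/Z
    }
```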

3.2.6. Thermal Analysis

For defects with a thermal component, an analogous analysis is performed on the radiometric thermal data within the contour $C^{i}_{\mathrm{defect}}$ to extract key temperature characteristics, as detailed in Equations (12)–(14):
$T^{i}_{\min} = \min_{(x, y) \in C^{i}_{\mathrm{defect}}} T(x, y), \quad (12)$
$T^{i}_{\max} = \max_{(x, y) \in C^{i}_{\mathrm{defect}}} T(x, y), \quad (13)$
$T^{i}_{\mathrm{avg}} = \dfrac{1}{N_{i}} \sum_{(x, y) \in C^{i}_{\mathrm{defect}}} T(x, y), \quad (14)$
where $T(x, y)$ is the temperature at pixel $(x, y)$ and $N_{i}$ is the number of pixels within the contour.
To ensure consistency, all thermal data from the different sensors used in the public datasets are standardized to degrees Celsius (°C) before being passed to the analysis pipeline. The key input to the FIS, the ‘Thermal Signature,’ is the differential temperature $\Delta T = T^{i}_{\max} - T_{\mathrm{ambient}}$, where $T_{\mathrm{ambient}}$ is the average temperature of a non-defective region adjacent to the defect.
These physical and thermal parameter sets are then encapsulated into a final comprehensive data model, $D^{i}_{\mathrm{complete}}$, for each defect, which serves as the input to the integration module.
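For illustration, the temperature statistics of Equations (12)–(14) and the differential Thermal Signature can be computed from a radiometric map (already converted to °C) and the defect contour as follows; variable names are ours.

```python
import cv2
import numpy as np

def thermal_signature(temp_map_c, contour, ambient_patch_c):
    """Temperature statistics inside the defect contour (Equations (12)-(14)) and the
    differential Thermal Signature delta_T = T_max - T_ambient."""
    mask = np.zeros(temp_map_c.shape, dtype=np.uint8)
    cv2.drawContours(mask, [contour], -1, color=255, thickness=-1)  # filled defect mask
    values = temp_map_c[mask == 255]
    t_max = float(values.max())
    delta_t = t_max - float(np.mean(ambient_patch_c))  # ambient taken from an adjacent sound region
    return {
        "T_min": float(values.min()),
        "T_max": t_max,
        "T_avg": float(values.mean()),
        "delta_T": delta_t,
    }
```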

3.3. Block 2: Formalization of Expert Criticality Functions

Operating in parallel to the data-driven parameterization, Block 2 of our framework translates the qualitative assessment criteria of human experts into formal mathematical models. These models provide an initial knowledge-based estimate of criticality, $C_{\mathrm{exp}}(M_{i})$, for a given defect type $M_{i}$ (e.g., $M_{\mathrm{crack}}$, $M_{\mathrm{erosion}}$, $M_{\mathrm{hotspot}}$). Each model is designed to reflect the underlying physics of the failure mode associated with that defect type, providing a physics-informed baseline before the fuzzy integration stage. A crucial component of these models is the inclusion of a weighting coefficient that depends on the specific turbine component where the defect is located. These coefficients, presented in the Supplementary Materials, are derived from engineering standards and expert consultation, and quantify the structural or operational importance of each component. For instance, the criticality model for a crack ($M_{\mathrm{crack}}$) is given by Equation (15):
$C_{\mathrm{exp}}(M_{\mathrm{crack}}) = \beta_{c} \cdot \int_{0}^{L} w_{\mathrm{visible}}(s) \cdot |r(s)| \cdot \big(1 + \kappa(s)\big)\, ds, \quad (15)$
where $\beta_{c}$ is the component-specific weighting factor, $L$ is the crack length, $w_{\mathrm{visible}}(s)$ is its visible width along its path $s$, $|r(s)|$ accounts for its tortuosity, and $\kappa(s)$ is its curvature. This model captures the principle from fracture mechanics that longer, wider, and more complex cracks pose a greater risk.
Similarly, the model for erosion ($M_{\mathrm{erosion}}$) is primarily a function of the affected area, while the model for overheating ($M_{\mathrm{hotspot}}$) depends on the temperature differential and the spatial temperature gradient. The explicit forms for these models are as follows:
$C_{\mathrm{exp}}(M_{\mathrm{erosion}}) = \gamma_{c} \cdot A_{\mathrm{real}}, \quad (16)$
$C_{\mathrm{exp}}(M_{\mathrm{hotspot}}) = \eta_{c} \cdot (\Delta T_{\max})^{2} \cdot \overline{|\nabla^{2} T|}, \quad (17)$
where $\gamma_{c}$ and $\eta_{c}$ are component-specific weights, $A_{\mathrm{real}}$ is the erosion area, $\Delta T_{\max}$ is the maximum temperature difference, and $\overline{|\nabla^{2} T|}$ is the mean absolute temperature Laplacian.
These models yield an initial physics-informed criticality score before fuzzy integration. A detailed derivation of these models, including the full tabulation of all weighting coefficients, is provided in the Supplementary Materials.
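For concreteness, Equations (15)–(17) can be sketched in code as below; the component weights are set to 1.0 as placeholders (the calibrated, component-specific values are tabulated in the Supplementary Materials), and the discretization of the crack integral and the Laplacian estimator are our own illustrative choices.

```python
import cv2
import numpy as np

BETA_C, GAMMA_C, ETA_C = 1.0, 1.0, 1.0  # placeholder component-specific weights

def c_exp_crack(s, w_visible, r_abs, kappa, beta_c=BETA_C):
    """Discretized Equation (15): trapezoidal integration along the sampled crack path s."""
    return beta_c * np.trapezoid(w_visible * r_abs * (1.0 + kappa), s)

def c_exp_erosion(area_real, gamma_c=GAMMA_C):
    """Equation (16): erosion criticality proportional to the affected metric area."""
    return gamma_c * area_real

def c_exp_hotspot(delta_t_max, temp_map_c, eta_c=ETA_C):
    """Equation (17): squared temperature differential times the mean absolute Laplacian."""
    laplacian = cv2.Laplacian(temp_map_c.astype(np.float32), cv2.CV_32F)
    return eta_c * delta_t_max ** 2 * float(np.abs(laplacian).mean())
```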

3.4. Block 3: Fuzzy Logic Integration for Final Criticality Assessment

The core novelty of our framework lies in the integration module (Block 3), which employs a Mamdani-type FIS to intelligently fuse the objective data-driven measurements from Block 1 with the knowledge-based estimates from Block 2. The logic of this fusion is governed by a knowledge base of 27 IF-THEN rules, the complete set of which is provided in Appendix A, Table A1. This rule base was explicitly designed to ensure that the final criticality score is both empirically grounded and consistent with established engineering principles, with its design directly mirroring failure mode considerations from the IEC 61400-5 standard (see Appendix A, Table A3 for illustrative examples of this mapping).

3.4.1. Fuzzification of Data-Driven and Expert-Driven Inputs

The process begins with fuzzification. Each crisp physical parameter $p^{i}_{k}$ within the comprehensive data structure $D^{i}_{\mathrm{complete}}$ (e.g., ‘Defect Area,’ ‘Max Temperature Difference’) is mapped to a set of linguistic variables (e.g., Small, Medium, Large) via trapezoidal membership functions $\mu_{p_{k}}(x)$. These functions were parameterized using a hybrid approach, where initial estimates from a panel of domain experts were refined by aligning the function breakpoints with the empirical quantiles of the training data distribution (see Appendix A, Table A2 for details). A global sensitivity analysis, presented in Appendix A, Figure A1, confirmed that the system’s output is robust to moderate (±20%) variations in these parameters, which validates the stability of this custom rule-based approach. An example function is defined in Equation (18):
$\mu_{p_{k}}(x) = \begin{cases} 0, & x \le a_{k}; \\ (x - a_{k})/(b_{k} - a_{k}), & a_{k} < x \le b_{k}; \\ 1, & b_{k} < x \le c_{k}; \\ (d_{k} - x)/(d_{k} - c_{k}), & c_{k} < x < d_{k}; \\ 0, & x \ge d_{k}, \end{cases} \quad (18)$
where $[a_{k}, d_{k}]$ defines the support of the fuzzy set and $[b_{k}, c_{k}]$ defines its core.
The fuzzy sets representing all of a defect’s physical parameters are then aggregated into a single data-driven fuzzy set, $\mu_{D}(x)$, using the t-norm (minimum) operator shown in Equation (19):
$\mu_{D}(x) = \min_{k} \mu_{p_{k}}(x). \quad (19)$
Concurrently, the crisp output from the expert model, $C_{\mathrm{exp}}(M_{i})$, is also fuzzified into an expert-driven fuzzy set, $\mu_{C_{\mathrm{exp}}}(x)$, using the Gaussian membership function shown in Equation (20):
$\mu_{C_{\mathrm{exp}}}(x) = \exp\!\left(-\dfrac{\big(x - C_{\mathrm{exp}}(M_{i})\big)^{2}}{2\sigma^{2}}\right), \quad (20)$
where $\sigma$ controls the uncertainty or ‘fuzziness’ of the expert estimate.
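To make the fuzzification step concrete, the short sketch below evaluates trapezoidal and Gaussian membership functions with scikit-fuzzy and applies the minimum t-norm of Equation (19); the breakpoints, the expert score, and σ are placeholders rather than the calibrated values of Table A2, and all sets are sampled on a single normalized universe for simplicity.

```python
import numpy as np
import skfuzzy as fuzz

# Normalized universe of discourse (illustrative resolution).
x = np.linspace(0.0, 1.0, 1001)

# Trapezoidal memberships for two physical parameters (Equation (18)); [a, b, c, d] are placeholders.
mu_size_large = fuzz.trapmf(x, [0.45, 0.60, 0.85, 1.00])
mu_temp_high = fuzz.trapmf(x, [0.50, 0.65, 0.90, 1.00])

# Data-driven fuzzy set via the minimum t-norm (Equation (19)).
mu_data = np.fmin(mu_size_large, mu_temp_high)

# Expert-driven fuzzy set: Gaussian membership centred on the expert estimate (Equation (20)).
c_exp, sigma = 0.7, 0.1
mu_expert = fuzz.gaussmf(x, c_exp, sigma)
```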

3.4.2. Weighted Aggregation of Fuzzy Sets

To intelligently combine these two fuzzy sets, the system first quantifies their degree of agreement. This is achieved by calculating the cosine similarity, $S(D_{i})$, between them, which serves as a robust measure of overlap in the fuzzy domain (Equation (21)). This similarity score is used to determine the relative weights for the final aggregation.
$S(D_{i}) = \dfrac{\int_{X} \mu_{D}(x) \cdot \mu_{C_{\mathrm{exp}}}(x)\, dx}{\sqrt{\int_{X} \big[\mu_{D}(x)\big]^{2} dx \cdot \int_{X} \big[\mu_{C_{\mathrm{exp}}}(x)\big]^{2} dx}}. \quad (21)$
This agreement coefficient, $S(D_{i}) \in [0, 1]$, is then used to determine the relative weights of the data-driven evidence and the expert model’s estimate in the final assessment. The weights, $w_{D}$ and $w_{\mathrm{exp}}$, are calculated using the sigmoidal function in Equation (22), which allows for a smooth transition, as follows:
$w_{D} = \dfrac{1}{1 + e^{-k\,(S(D_{i}) - \theta)}}; \qquad w_{\mathrm{exp}} = 1 - w_{D}, \quad (22)$
where $k$ controls the steepness of the transition and $\theta$ is the inflection point (typically 0.5).
This ensures that when the data and the expert model are in high agreement, the result is reinforced; when they disagree, their contributions are balanced. The final aggregated fuzzy set for the defect’s criticality, $\mu_{\mathrm{final}}(x)$, is then computed as a weighted sum as follows:
$\mu_{\mathrm{final}}(x) = w_{D} \cdot \mu_{D}(x) + w_{\mathrm{exp}} \cdot \mu_{C_{\mathrm{exp}}}(x). \quad (23)$
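A minimal numerical sketch of Equations (21) and (22), assuming both fuzzy sets are sampled on a common universe x as in the previous listing, is given below; k and θ are illustrative defaults.

```python
import numpy as np

def agreement_and_weights(x, mu_data, mu_expert, k=10.0, theta=0.5):
    """Cosine similarity between the two fuzzy sets (Equation (21)) and the resulting
    sigmoidal weights (Equation (22))."""
    num = np.trapezoid(mu_data * mu_expert, x)
    den = np.sqrt(np.trapezoid(mu_data ** 2, x) * np.trapezoid(mu_expert ** 2, x))
    s = num / den if den > 0 else 0.0
    w_d = 1.0 / (1.0 + np.exp(-k * (s - theta)))  # weight of the data-driven evidence
    return s, w_d, 1.0 - w_d                      # 1 - w_d is the expert-model weight
```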

3.4.3. Defuzzification for a Final Criticality Score

The conclusive step in the framework is defuzzification, which converts the final fuzzy set $\mu_{\mathrm{final}}(x)$ back into a single crisp numerical value. For this, we employ the centroid (or center of gravity) method, which is formulated in Equation (24):
$C_{\mathrm{final}}(D_{i}) = \dfrac{\int_{X} x \cdot \mu_{\mathrm{final}}(x)\, dx}{\int_{X} \mu_{\mathrm{final}}(x)\, dx}. \quad (24)$
Equation (24) calculates the center of the area under the membership function, effectively providing a weighted average of all possible criticality values. The resulting score, $C_{\mathrm{final}}(D_{i})$, represents the system’s comprehensive and integrated assessment of the defect’s severity. This continuous value is then normalized and mapped to the discrete 1–5 integer scale defined by the EPRI damage taxonomy, rendering it directly interpretable and actionable for O&M teams to prioritize repair and maintenance activities.
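Continuing the same sketch, the weighted aggregation, centroid defuzzification, and mapping to the 1–5 EPRI scale can be expressed as follows; the linear 1–5 mapping shown here is an assumption for illustration rather than the exact normalization used in the released implementation.

```python
import numpy as np
import skfuzzy as fuzz

def final_criticality(x, mu_data, mu_expert, w_d, w_exp):
    """Weighted aggregation (Equation (23)), centroid defuzzification (Equation (24)),
    and an illustrative mapping of the normalized score to the 1-5 EPRI severity scale."""
    mu_final = w_d * mu_data + w_exp * mu_expert
    c_final = fuzz.defuzz(x, mu_final, 'centroid')           # crisp score on the [0, 1] universe
    epri_level = int(np.clip(round(1 + 4 * c_final), 1, 5))  # assumed linear mapping to 1-5
    return c_final, epri_level
```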

3.5. Experimental Setup

This section details the experimental environment used to evaluate the proposed cyber-physical framework, including the data management procedures, hardware platform, and the major software components employed throughout the study. To facilitate reproducibility, every piece of software is identified by name and version, accompanied by a bibliographic reference, and the key parameters of the hardware platform are reported.

3.5.1. Dataset Curation and Preprocessing

The experiments were based exclusively on two publicly available datasets to ensure the reproducibility of our findings. The first, AQUADA-GO [13], consists of high-resolution RGB videos captured during offshore inspections of small (approximately 2 MW) turbines. The second, the Thermal WTB Inspection dataset [14], contains onshore inspections conducted with a FLIR thermal camera, where each thermal frame is co-registered with an RGB image and enhanced by MSX. Table 1 provides a detailed summary of the datasets, including the total number of images, distinct inspection flights, and the distribution of annotated defects across the three primary classes.
To avoid bias caused by temporal correlation in video data, both datasets were split into training, validation, and test sets in an 80:10:10 ratio at the level of blade flights rather than individual frames. All frames associated with a single blade and flight were assigned to the same fold, ensuring that the test set contains inspections of turbines entirely unseen during training. For each training fold, we applied random spatial augmentations, scaling, cropping, and flips, followed by color and brightness jittering. These transformations were implemented using the torchvision v0.20.1 [46] library, chosen for its rich collection of augmentation primitives and GPU support.
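A minimal sketch of the flight-level split and the augmentation pipeline is shown below; the helper name, the rounding of fold boundaries, and the augmentation hyperparameters are illustrative rather than copied from our released configuration.

```python
import numpy as np
from torchvision.transforms import v2

def split_by_flight(flight_ids, ratios=(0.8, 0.1, 0.1), seed=42):
    """Assign whole blade flights, not individual frames, to train/val/test folds."""
    rng = np.random.default_rng(seed)
    flights = rng.permutation(np.unique(flight_ids))
    n = len(flights)
    cuts = [int(ratios[0] * n), int((ratios[0] + ratios[1]) * n)]
    train_f, val_f, test_f = np.split(flights, cuts)
    return set(train_f), set(val_f), set(test_f)

# Spatial and photometric augmentations applied to training folds only (torchvision v2 API).
train_transforms = v2.Compose([
    v2.RandomResizedCrop(size=(640, 640), scale=(0.6, 1.0)),
    v2.RandomHorizontalFlip(p=0.5),
    v2.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
])
```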

3.5.2. Defect Detection and Fusion Pipeline

Detection of candidate defects relied on an ensemble of three YOLOv8n detectors. Each detector was implemented using the Ultralytics YOLOv8 framework (release 8.2) built on PyTorch v2.4.0 [47]. Training was performed for 100 epochs with the Adam optimizer, an initial learning rate of 0.001, and a cosine annealing schedule, with random seeds set globally to ensure reproducibility, as detailed in the Supplementary Materials. To assemble the detections from individual models, we employed the Weighted Boxes Fusion algorithm implemented in the Weighted-Boxes-Fusion v1.0.8 package [48]. Low-level image processing tasks such as bilateral filtering, CLAHE equalization, and contour extraction were carried out with OpenCV v4.10.0 [49].
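For illustration, one ensemble member's training call and the fusion of per-model detections could look like the sketch below; the dataset YAML name, seed, and fusion thresholds are placeholders, not the exact values from our configuration files.

```python
from ultralytics import YOLO
from ensemble_boxes import weighted_boxes_fusion  # Weighted-Boxes-Fusion package

# Fine-tune one ensemble member (three YOLOv8n models with different seeds in the paper).
model = YOLO("yolov8n.pt")
model.train(data="wtb_defects.yaml",  # placeholder dataset definition
            epochs=100, optimizer="Adam", lr0=0.001, cos_lr=True, seed=0)

# Fuse detections from the three models for one frame; boxes are normalized
# (x_min, y_min, x_max, y_max) in [0, 1], one inner list per ensemble member.
boxes_list = [[[0.10, 0.20, 0.30, 0.40]], [[0.11, 0.21, 0.31, 0.41]], [[0.12, 0.19, 0.29, 0.42]]]
scores_list = [[0.82], [0.77], [0.71]]
labels_list = [[0], [0], [0]]
boxes, scores, labels = weighted_boxes_fusion(boxes_list, scores_list, labels_list,
                                              iou_thr=0.55, skip_box_thr=0.2)
```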
During early development, we explored synthesizing a thermal channel for AQUADA-GO by fusing textural features extracted via the discrete wavelet transform (DWT). This simulation was implemented with PyWavelets v1.6.0 [50]. To validate the surrogate modality, we collected 50 paired RGB and LWIR frames with a handheld FLIR T840 camera and compared the simulated and true temperature difference maps using the Pearson correlation coefficient. The resulting r-value of 0.45, far below the 0.6 threshold recommended for modality substitution, led us to discard the simulated channel in favor of a dual-head architecture. In this design, the network backbone is shared between RGB and thermal branches, but the LWIR head is only activated when genuine thermal data are available. For the RGB-only AQUADA-GO dataset, the thermal branch was masked during training and inference. For these RGB-only cases, the ‘Thermal Signature’ input to the FIS was programmatically set to a crisp value of 0, corresponding to the ‘Low’ fuzzy set, allowing the system to determine criticality based on defect size and location alone.

3.5.3. Hardware and Software Environment

All software was orchestrated using Python 3.12.3. Numerical operations were performed using NumPy v2.1.0 [51] and SciPy v1.14.1 [52], while tabular data handling was facilitated by pandas v2.2.3 [53]. Bootstrap resampling and statistical analyses were implemented with scikit-learn v1.5.1 [54] and visualized with Matplotlib v3.9.2 [55]. The FIS was implemented with scikit-fuzzy v0.5.0 [56].
The experiments were run on a workstation equipped with an Intel® Core™ i9-13900K CPU, 64 GiB of RAM, and an NVIDIA® RTX 3090 GPU with 16 GiB of VRAM. The CUDA environment was provided by CUDA Toolkit v12.4.1 and cuDNN v8.9.7. The UAV mentioned in the context of our supplementary field case studies was a DJI M300 (SZ DJI Technology Co., Ltd., Shenzhen, China) fitted with a FLIR A65 camera (Teledyne FLIR LLC, Wilsonville, OR, USA) for radiometric thermal imaging and an RTK GNSS module (Emlid Tech Korlátolt Felelősségű Társaság, Budapest, Hungary) offering altitude measurements with 2 cm precision; this platform was used for validation purposes only and not for generating the primary training datasets. Photogrammetric calibration of the RGB camera was accomplished using OpenCV’s calibration routines.

3.5.4. Evaluation Protocol

Performance metrics were computed according to standard object-detection practice. Precision, recall, mean Average Precision at an IoU threshold of 0.5, class-wise F1-scores, and the quadratic-weighted κ statistic were derived on the test set. To quantify the uncertainty of these estimates, we employed a BCa bootstrap with B = 10,000 resamples. Resampling was stratified by dataset and defect class, with complete blade flights treated as the unit of resampling to preserve temporal correlations. Confidence intervals were computed following the approach of Efron and Tibshirani, and all code for the bootstrap procedure is provided in our public repository.
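As a simplified sketch of this protocol (it omits the stratification by dataset and defect class, and the helper name is ours), a BCa interval with blade flights as the resampling unit can be obtained with SciPy as follows:

```python
import numpy as np
from scipy.stats import bootstrap

def flight_level_bca_ci(per_flight_scores, n_resamples=10_000, seed=42):
    """95% BCa confidence interval for a metric computed per blade flight, so that
    complete flights (not frames) are the unit of resampling."""
    res = bootstrap((np.asarray(per_flight_scores),), np.mean,
                    n_resamples=n_resamples, method="BCa",
                    confidence_level=0.95, random_state=seed)
    return res.confidence_interval.low, res.confidence_interval.high
```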
The Fuzzy Inference System was validated against a ground-truth dataset created by a panel of three certified O&M engineers, as detailed in the Supplementary Materials. Each expert scored defects on the held-out set using the five-level EPRI taxonomy, and the median score served as the reference label. The system’s continuous outputs were compared with these labels using mean absolute error and the Pearson correlation coefficient. Implementation of the FIS drew upon the scikit-fuzzy library for membership function evaluation and defuzzification and was executed on the same workstation described above. All source code, trained model weights, and configuration files necessary to reproduce our experiments are available at our repository (see Data Availability Statement).

3.5.5. Ethical Considerations

Finally, the supplementary field data used for the case study were collected in accordance with all local and national aviation regulations. Flight permissions were obtained from the relevant airspace authorities, and explicit consent was secured from the wind farm operator. To protect privacy and commercial sensitivities, all imagery was processed to remove any identifying features of the site or personnel.

4. Results

This section presents a comprehensive empirical evaluation of the proposed framework, meticulously dissecting its performance from initial defect detection to final criticality assessment. The analysis is performed exclusively on two publicly available and diverse benchmarks: the large-scale, offshore AQUADA-GO dataset [13], featuring high-resolution video, and the Thermal WTB Inspection dataset [14], which provides fused RGB and radiometric thermal imagery. We first quantify the performance of the detection module, including visual analytics of its behavior. Subsequently, a series of ablation studies isolates the contribution of each system component. We then validate the fuzzy criticality system’s accuracy and calibration against expert-derived ground truth and conclude with a comparative analysis against state-of-the-art methods and an illustrative case study from field data.

4.1. Defect Detection Performance and Computational Profile

To establish the efficacy of our detection framework, we benchmarked our final model, a three-model YOLOv8n ensemble trained on fused multispectral data, against two systematically ablated baselines. Performance was evaluated on the combined held-out test sets from both datasets. The primary metrics, reported as the mean and 95% bias-corrected and accelerated (BCa) confidence interval derived from 10,000 bootstrap resamples, are summarized in Table 2. This uncertainty quantification provides a robust measure of the stability of our performance estimates. Paired bootstrap tests confirmed that the performance differences between all three configurations were statistically significant (p < 0.01) for both mAP@.5 and F1-score.
The proposed ensemble using fused multispectral data demonstrates a clear and statistically significant performance advantage over the baseline configurations. The results unequivocally demonstrate a tiered improvement at each stage of methodological enhancement. The baseline RGB-only model establishes a respectable mAP@.5 of 81.7%. The introduction of the thermal data channel provides the most significant performance leap, boosting the mAP by 7.2 percentage points to 88.9%. This underscores the profound diagnostic value of the thermal spectrum for revealing defects, such as incipient delamination, that are often invisible in visual light. Finally, the application of three-model ensembling provides a further statistically significant refinement, achieving a final mAP of 92.8%. This multi-model approach effectively mitigates the prediction variance of individual models, leading to more robust and reliable detections across the diverse environmental conditions present in the datasets. Figure 2 illustrates Receiver Operating Characteristic (ROC) curves for the three defect classes.
The mean area under the curve (AUC) was 0.97 (see Figure 2), indicating strong discrimination between defective and non-defective regions. Following IEC operational risk metrics, we determined the confidence threshold that yields at most 0.05 false positives per image; at this operating point, the precision of the ensemble reached 93.5%, as reported in Table 2.
To assess the robustness of our models to domain shift, a critical concern when deploying models in varied real-world environments, we conducted cross-dataset experiments where a model trained on one dataset was evaluated on the other without any fine-tuning. The results are presented in Table 3.
As summarized in Table 3, the mAP dropped significantly to 76.4% when transferring from the offshore AQUADA-GO to the onshore Thermal WTB dataset, and to 84.2% in the reverse direction. These results provide a quantitative baseline for the domain gap and highlight the critical need for external validation and the development of domain adaptation strategies for operational deployment, a point we elaborate on in the Discussion.
Further visual analytics provide deeper insight into the system’s behavior, as shown in Figure 3. The per-frame inference latency, benchmarked on an NVIDIA RTX 3090 GPU over 5000 frames, is consistently low, with a mean of 118.4 ms and a standard deviation of 12.1 ms (Figure 3a), making the system suitable for high-throughput offline analysis. The model demonstrates robust performance across the three primary defect classes, with F1-scores shown with 95% confidence intervals in Figure 3b; performance is highest for ‘Hotspot’ defects, which have a uniquely salient Thermal Signature. Finally, an analysis of the relationship between ensemble size and performance (Figure 3c) reveals that mAP@.5 gains begin to plateau after three to four models, justifying our choice of a three-model ensemble as an optimal balance between accuracy and computational cost.

4.2. Ablation Studies

To rigorously quantify the contribution of each component to the framework’s overall efficacy, we conducted a series of ablation studies, with results summarized in Table 4. The rationale for each scenario was to isolate and measure the impact of a core methodological choice on final system performance. Removing the thermal channel caused the most severe degradation, increasing the criticality mean absolute error (MAE) by 150% and confirming that multispectral data are the cornerstone of reliable assessment. The removal of ensembling also resulted in a significant 4.1-point drop in F1-score. Simplifying the FIS from 27 rules to a more generalized 15 rules more than doubled the criticality MAE; this experiment was designed to test the necessity of a nuanced rule base to capture expert logic, and the significant performance drop justifies our use of the more comprehensive 27-rule set. Finally, the system showed high resilience to a simulated +5 °C thermal calibration drift, with the MAE increasing by only 0.04, demonstrating the robustness of using relative temperature differentials rather than absolute values.

4.3. Validation of the Fuzzy Criticality Assessment

The system demonstrated exceptional fidelity to expert judgment, achieving an overall MAE of 0.14. To account for the ordinal nature of the 1–5 scale, we also computed the quadratic-weighted Cohen’s κ, obtaining κ = 0.89 (95% BCa confidence interval: 0.86–0.92), which indicates almost perfect agreement with the human panel. Class-wise F1-scores for the five severity levels were [0.94, 0.91, 0.88, 0.90, 0.93], demonstrating balanced performance across the entire criticality spectrum. Importantly, there were zero instances in which a ground-truth rating of 5 was assigned a rating below 4 by the automated system. This absence of severe downgrades is a critical safety key performance indicator for any operational deployment. The Pearson correlation coefficient between the system’s continuous output and the experts’ median score remained high at r = 0.97 (p < 0.001). The confusion matrix in Figure 4, generated by rounding the system’s continuous output to the nearest integer, shows a strong diagonal concentration, visually confirming the high level of agreement.

4.4. Reliability and Calibration of Criticality Scores

Beyond accuracy metrics like MAE, it is crucial for a decision-support system to produce well-calibrated and reliable outputs. A well-calibrated system’s confidence in its prediction should match its actual correctness. To evaluate this, we generated reliability diagrams for the five-level criticality output, as shown in Figure 5. The plots show that the predicted confidence for each class aligns closely with the observed accuracy, with all points lying near the diagonal identity line. The overall Expected Calibration Error (ECE) was low at 0.034, indicating that the system’s outputs are not only accurate but also trustworthy, a vital characteristic for high-stakes maintenance decisions.

4.5. Comparative Analysis and Field Case Study

As shown in Table 5, our proposed framework outperforms five recent state-of-the-art methods in wind turbine defect analysis across key metrics.
Our approach shows particular strength in the F1-score, which we attribute to the synergistic effects of multispectral fusion and ensemble-based inference. Compared with the enhanced SSD model by Zhao et al. [22], our framework achieves a 1.8-point higher F1-score, a gain primarily attributable to the superior robustness of the three-model YOLOv8 ensemble, which mitigates individual model variance and improves generalization across the diverse conditions found in the validation datasets.
The system’s real-world value is best illustrated by a case from our supplementary field data. During a routine inspection of turbine T-17B, the system flagged a ‘Hotspot’ on the generator housing with a criticality score of 5.0, corresponding to a ‘Severe’ rating. The thermal camera registered a significant anomaly with a Δ T of over 20 °C against its surroundings, while the RGB image appeared entirely normal. The automated alert triggered an immediate manual inspection, which revealed a critical internal fault in the cooling system. Maintenance logs indicated that this intervention occurred approximately 48 h before a scheduled component replacement, with engineers noting that a failure was imminent. This case provides a powerful vignette of the system’s value proposition: moving beyond simple surface inspection to preemptive, condition-based intervention that prevents catastrophic failures and costly unscheduled downtime.

5. Discussion

The empirical results presented in the preceding section strongly validate our central thesis: that an integrated framework marrying multispectral data, ensemble deep learning, and standards-aligned fuzzy logic can provide an accurate, transparent, and reliable solution for the automated criticality assessment of wind turbine defects. This section synthesizes these findings, contextualizes them within the broader landscape of scientific literature, addresses the inherent limitations of the study, and charts a course for future research.

5.1. Interpretation of Principal Findings

Our principal finding is that the synergy between the framework’s components creates a system far more capable than the sum of its parts. The final mAP of 92.8% for defect detection (Table 2) represents a state-of-the-art result, outperforming recent benchmarks as shown in Table 5. The ablation study (Table 4) reveals the critical importance of data fusion. The 10.1-point drop in F1-score upon removing the thermal channel is not merely an incremental decline; it signifies a fundamental loss of diagnostic capability. It is this channel that enables the system to ‘see’ into the subsurface of the blade, identifying thermal anomalies indicative of delamination, moisture ingress, or bonding failures—defects that are often precursors to the most catastrophic failure modes and are completely undetectable by RGB-only systems like those in [34,35]. This strongly corroborates the findings of [14] and extends the principle to a broader range of data types, including large-scale offshore video. Another practical consideration is the handling of negative classes. Precision at high recall is critical for O&M workflows because false-positive alarms can overload maintenance queues. Our ROC analysis (Figure 2) and the Precision@FPPI ≤ 0.05 metric reported in Table 2 demonstrate that the ensemble maintains high specificity even when the false-positive rate is constrained to a level compatible with IEC operational risk metrics.
The core novelty of this research is the transparent FIS. Achieving a criticality MAE of 0.14 and a Pearson correlation of 0.97 against expert ratings is a powerful demonstration that the nuanced, experience-based reasoning of human engineers can be successfully encapsulated within a formal, repeatable, and scalable computational system. This addresses a major gap in the literature, which has focused heavily on detection accuracy but has largely neglected the subsequent, more crucial step of severity assessment. The explicit grounding of our FIS design in the IEC 61400-5 standard is what sets this work apart (see Table A3 in Appendix A). While other studies have used fuzzy logic [42], they have not established this formal, auditable link to internationally recognized engineering principles. This creates a ‘glass-box’ model where any assessment can be interrogated: an operator can see not only the final score but also precisely which rules (e.g., ‘the rule concerning large defects in high-stress zones’) were triggered and with what intensity, as detailed in Table A3 in Appendix A. This constitutes a significant contribution to the field of explainable AI for industrial applications, building operator trust in a way that opaque black-box models cannot.

5.2. Practical Implications for Condition-Based Maintenance

The practical ramifications of this framework for wind farm operators are profound. It provides the core enabling technology for a genuine CBM strategy, a long-sought goal in the industry. Instead of relying on fixed, time-based inspection schedules or reacting to failures, operators can continuously assess asset health and make data-driven decisions. By providing a prioritized list of defects with objective severity scores, the system allows O&M managers to allocate finite resources (technician time, equipment, and budget) with maximum efficiency, as illustrated by the T-17B field case. High-severity defects posing imminent risks can be addressed immediately, while low-severity issues can be scheduled for repair during planned downtime, minimizing lost production and maximizing annual energy production.
Furthermore, the standardized and repeatable nature of the assessment creates a consistent digital audit trail of a turbine’s health over its life cycle. These longitudinal data are invaluable for tracking defect propagation rates, validating repair effectiveness, informing future blade designs, and providing objective evidence for insurance claims or asset transfers. While the current FIS is static, these historical data provide the foundation for future extensions, such as dynamic or recurrent models capable of forecasting defect evolution. While a full-scale deployment would generate terabytes of raw image data per inspection campaign, the framework’s output is a compact, actionable list of annotated defects (e.g., a few kilobytes in a structured format like CSV or JSON), drastically reducing the cognitive and data load on human analysts and aligning with standard industrial data management protocols.

5.3. Limitations and Threats to Validity

Despite the promising results, academic integrity demands a frank acknowledgment of the study’s limitations and threats to its validity, which also serve as a valuable roadmap for future research.
A primary limitation concerns domain shift and the external validity of our findings. The AQUADA-GO dataset originates from offshore inspections of relatively small (≤2 MW) turbines in marine climates, whereas the Thermal WTB Inspection data come from onshore sites with mild, temperate conditions. As a result, the current models may not generalize perfectly to utility-scale (≥5 MW) platforms or to radically different operating environments. Our cross-dataset evaluation (Table 3) provides quantitative evidence of this challenge, showing a performance drop of up to 15% in mAP. This degradation likely stems from a combination of factors, including differences in ambient lighting, atmospheric haze common in offshore environments, sensor characteristics between the different UAV platforms, and subtle variations in defect morphology between the smaller offshore turbines and the larger onshore models. Figure 6 visualizes this covariate shift using t-SNE, showing that the feature embeddings from the two datasets form distinct clusters.
A second limitation, affecting construct validity, is that our framework currently operates on 2D imagery. This inherently limits the analysis, as it cannot directly measure defect depth or volume, which are critical parameters for assessing certain types of damage like erosion or gouges. We also note that our initial attempt to simulate a thermal channel for AQUADA-GO using DWT fusion yielded only moderate correlation ( r = 0.45 ) with actual LWIR measurements. Based on this result, we did not pursue other simulation approaches, reinforcing the principle that modality fusion should be grounded in physically meaningful data rather than synthetic proxies.

5.4. Future Research Directions

The limitations identified above naturally chart a course for future research. The clear next step to address the 2D imagery constraint is integration with 3D digital twins. Localizing defects on a 3D mesh, created using Structure-from-Motion (SfM) or Multi-View Stereo (MVS) on the UAV video stream, would enable far richer parameterization (e.g., true area on a curved surface, volume of material loss, and geodesic distance between defects) and a more sophisticated structural assessment. Furthermore, to address the critical challenge of domain shift, future work must focus on domain adaptation techniques, such as correlation alignment [57] or feature-space augmentation [58]. This must be coupled with an expansion of training data to include more diverse and challenging scenarios, such as (i) large-scale offshore turbines (≥5 MW), (ii) ice-prone Nordic installations where low-temperature icing can alter spectral signatures, and (iii) desert sites experiencing severe leading-edge erosion from sand abrasion. Developing domain-adaptive models trained on such a comprehensive dataset is essential for building a truly universal inspection tool.
Finally, while the system is efficient for offline analysis, real-time, on-board inference remains a challenge. Our current inference time (mean of 118.4 ms; see Figure 3a), corresponding to a throughput of approximately 8.4 frames per second, is the primary bottleneck and is too high for power-constrained edge devices. Future research must therefore focus on model optimization to create lightweight yet accurate models suitable for deployment on platforms like the NVIDIA Jetson series. Key avenues to explore include network quantization (e.g., converting model weights to 8-bit integers), structured pruning (removing redundant network weights and channels), and knowledge distillation (training a smaller ‘student’ model to mimic the predictive behavior of our larger, more accurate ‘teacher’ ensemble). Moreover, a further avenue of research is multimodal fusion with Supervisory Control and Data Acquisition (SCADA) data, as we did for enhancing fire hazard detection in solar power plants in our joint work [59]. Correlating a visually detected defect with anomalous vibration signals or a drop in power output would provide the most complete picture of an asset’s health, moving the field towards a truly holistic, system-level understanding of wind turbine integrity.

6. Conclusions

In this study, we have designed, implemented, and rigorously validated a comprehensive end-to-end framework for the automated criticality assessment of wind turbine defects, directly addressing a pivotal challenge in the sustainable management of wind energy assets. Our holistic methodology successfully integrates the diagnostic power of multispectral UAV-acquired data, the pattern recognition capabilities of an ensemble of deep learning detectors, and the transparent reasoning of a knowledge-based Fuzzy Inference System. By validating our system on two diverse public datasets, we have demonstrated its robustness and reproducibility. Our optimized multispectral detection framework achieved a state-of-the-art mAP@.5 of 92.8%, with ablation studies confirming that the fusion of visual and thermal data is the single most critical factor for high performance.
The core scientific contribution is our novel 27-rule Mamdani-type FIS, whose ‘glass-box’ design is explicitly grounded in the engineering principles of the IEC 61400-5 standard. This system’s five-level criticality output shows exceptional agreement with assessments from certified human engineers, validated by a low MAE of 0.14, a high quadratic-weighted κ of 0.89, and excellent calibration. The framework provides operators with a powerful decision-support tool, enabling a paradigm shift from reactive to predictive maintenance. This directly enhances operational safety, optimizes maintenance expenditure, and increases asset availability, thereby supporting Sustainable Development Goal 7 by improving the economic viability and reliability of wind power.
While acknowledging current limitations, our work establishes a clear roadmap for future research focusing on three key areas: enhancing model generalization through domain adaptation techniques to mitigate performance drops in new environments, integrating 3D digital twins for more sophisticated geometric analysis, and optimizing models for efficient on-board edge deployment to enable real-time autonomous inspection capabilities. These advancements will pave the way for the next generation of intelligent structural health monitoring systems.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/en18174523/s1, File Supplementary Material: Detailed formulation of the expert-driven mathematical models for the criticality of classes ‘Crack,’ ‘Erosion,’ and ‘Hotspot,’ including tables of component-specific weighting coefficients derived from engineering standards.

Author Contributions

Conceptualization, P.R., B.R., and A.S.; methodology, P.R., O.S., and O.M.; software, S.S. and O.M.; validation, P.R., O.M., S.S., and T.P.; formal analysis, A.S., O.S., and B.R.; investigation, P.R., S.S., and O.M.; resources, A.S., B.R., and T.P.; data curation, P.R., B.R., T.P., and O.S.; writing—original draft, P.R., O.M., S.S., and O.S.; writing—review and editing, A.S., B.R., and T.P.; visualization, P.R., O.M., and S.S.; supervision, A.S. and B.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Education and Science of Ukraine, state grant registration number 0124U004665, project title ‘Intelligent System for Recognizing Defects in Green Energy Facilities Using UAVs.’ This publication reflects the views of the authors only; the Ministry of Education and Science of Ukraine cannot be held responsible for any use of the information contained herein.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data used in this study are publicly available. The AQUADA-GO dataset [13] is available at https://data.mendeley.com/datasets/9rcf5p89zn/1 (accessed on 15 August 2025). The Thermal WTB Inspection dataset [14] is available at https://github.com/MoShekaramiz/Small-WTB-Thermal1 (accessed on 15 August 2025). The source code, trained model weights, configuration files, and a version-controlled release of the software are available on GitHub at https://github.com/sOsvystun/UAV/tree/main (accessed on 15 August 2025).

Acknowledgments

The authors are grateful to the Ministry of Education and Science of Ukraine for its support in conducting this study. We extend our sincere thanks to the research teams at the Technical University of Denmark (DTU) and Utah Valley University for creating and publicly releasing the AQUADA-GO and Thermal WTB Inspection datasets, respectively. Their commitment to open science was instrumental to this work.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

AI: Artificial Intelligence
AUC: Area Under the Curve
BCa: Bias-Corrected and accelerated
CBM: Condition-Based Maintenance
CLAHE: Contrast Limited Adaptive Histogram Equalization
DWT: Discrete Wavelet Transform
ECE: Expected Calibration Error
EPRI: Electric Power Research Institute
FIS: Fuzzy Inference System
FPPI: False Positives per Image
GNSS: Global Navigation Satellite System
IEC: International Electrotechnical Commission
LWIR: Long-Wave Infrared
MAE: Mean Absolute Error
mAP: Mean Average Precision
MCE: Maximum Calibration Error
MSX: Multi-Spectral Dynamic Imaging
O&M: Operations and Maintenance
RGB: Red–Green–Blue
ROC: Receiver Operating Characteristic
ROI: Region of Interest
RTK: Real-Time Kinematic
t-SNE: t-Distributed Stochastic Neighbor Embedding
UAV: Unmanned Aerial Vehicle
WTB: Wind Turbine Blade
YOLO: You Only Look Once

Appendix A. Fuzzy Inference System Details

This appendix provides supplementary details for the Fuzzy Inference System (FIS), ensuring full transparency and reproducibility of the criticality assessment module. Table A1 presents the complete 27-rule matrix that forms the core of the system’s knowledge base. Table A2 provides the exact parameters used for all membership functions, which were derived via a hybrid expert- and data-driven process as described in the Supplementary Materials. Figure A1 shows the results of a global sensitivity analysis, confirming the system’s robustness to parameter variations. Table A3 provides further illustrative examples of how specific fuzzy rules are explicitly linked to the engineering principles and failure mode considerations derived from IEC standards, grounding the AI system in established domain knowledge.
Table A1. The complete 27-rule matrix for the Mamdani FIS. The table shows the logical mapping from combinations of fuzzified inputs (‘Defect Size,’ ‘Location,’ and ‘Thermal Signature’) to an output ‘Criticality’ level.
| IF Defect Size Is | AND Location Is | AND Thermal Signature Is | THEN Criticality Is |
|---|---|---|---|
| Large | Blade Root | High | Severe |
| Large | Blade Root | Medium | Severe |
| Large | Blade Root | Low | Severe |
| Large | Mid-span | High | Severe |
| Large | Mid-span | Medium | High |
| Large | Mid-span | Low | High |
| Large | Blade Tip | High | High |
| Large | Blade Tip | Medium | Medium |
| Large | Blade Tip | Low | Medium |
| Medium | Blade Root | High | Severe |
| Medium | Blade Root | Medium | High |
| Medium | Blade Root | Low | High |
| Medium | Mid-span | High | High |
| Medium | Mid-span | Medium | Medium |
| Medium | Mid-span | Low | Low |
| Medium | Blade Tip | High | Medium |
| Medium | Blade Tip | Medium | Low |
| Medium | Blade Tip | Low | Low |
| Small | Blade Root | High | High |
| Small | Blade Root | Medium | Medium |
| Small | Blade Root | Low | Low |
| Small | Mid-span | High | Medium |
| Small | Mid-span | Medium | Low |
| Small | Mid-span | Low | Negligible |
| Small | Blade Tip | High | Low |
| Small | Blade Tip | Medium | Negligible |
| Small | Blade Tip | Low | Negligible |
Table A2. Membership function parameters for the FIS. The trapezoidal parameters ( a , b , c , d ) define the support and core of each fuzzy set. Parameters for ‘Defect Size’ are in metric units ( mm 2 ) to ensure scale invariance. All parameters were derived by expert elicitation combined with fitting to empirical quantiles of the training data, as detailed in the Supplementary Materials.
| Input Variable | Linguistic Term | a | b | c | d |
|---|---|---|---|---|---|
| Defect Size (mm²) | Small | 0 | 0 | 50 | 100 |
| Defect Size (mm²) | Medium | 50 | 100 | 400 | 500 |
| Defect Size (mm²) | Large | 400 | 500 | 1000 | 1000 |
| Thermal Signature (ΔT in °C) | Low | 0 | 0 | 2 | 4 |
| Thermal Signature (ΔT in °C) | Medium | 3 | 5 | 8 | 10 |
| Thermal Signature (ΔT in °C) | High | 9 | 12 | 25 | 25 |
Figure A1. Sensitivity analysis of the Fuzzy Inference System. The curve illustrates how the MAE varies when all membership function breakpoint parameters are perturbed jointly by a factor ranging from −20% to +20%. The MAE remains below 0.18 across the entire tested range, indicating that the FIS is robust to moderate parameter variations.
Table A3. Illustrative mapping of specific fuzzy rules to engineering principles derived from IEC 61400-5/23. This demonstrates how the rule base is grounded in established safety and structural integrity standards.
| Rule ID | Fuzzy Rule Summary | Corresponding IEC 61400 Principle/Rationale |
|---|---|---|
| 1, 2, 3 | A large defect at the blade root is always ‘Severe,’ regardless of its Thermal Signature. | Aligns with IEC 61400-5 [11] requirements for fatigue life analysis and damage tolerance. The blade root is the area of maximum bending moment and stress concentration. Any significant structural flaw in this region has the highest probability of catastrophic propagation. |
| 4, 10, 19 | Any defect with a High Thermal Signature at the blade root is at least ‘High’ or ‘Severe.’ | Relates to IEC 61400-23 [60] (full-scale structural testing). A significant thermal anomaly indicates a potential subsurface failure (e.g., delamination, adhesive disbond). When located in the highest stress region, this combination represents a critical risk of structural failure from within. |
| 9, 18, 27 | A defect at the blade tip with a low Thermal Signature is rated as ‘Medium’ or ‘Low.’ | The blade tip experiences the lowest structural loads but the highest aerodynamic velocities. Defects here are less critical from a structural failure perspective but can impact aerodynamic efficiency and noise. The criticality is therefore downgraded compared with the root. |
| 24, 27 | A small non-thermal defect away from the root is considered ‘Negligible.’ | Reflects practical maintenance triage. Small superficial flaws in low-stress areas do not compromise the blade’s integrity and typically only require monitoring during the next inspection cycle, rather than immediate intervention. |
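To make the appendix more self-contained, the sketch below shows how the trapezoidal membership functions of Table A2 and two representative rules from Table A1 could be encoded with scikit-fuzzy [56]. The universes for ‘Defect Size’ and ‘Thermal Signature’ follow Table A2; the breakpoints for ‘Location’ (modeled as normalized span position) and the five-level ‘Criticality’ output are illustrative placeholders, since their exact values are provided in the Supplementary Materials.

```python
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

# Antecedent universes: 'Defect Size' (mm^2) and 'Thermal Signature' (deg C)
# follow Table A2; 'Location' is a normalized span position (0 = root, 1 = tip).
size = ctrl.Antecedent(np.arange(0, 1001, 1), "size")
thermal = ctrl.Antecedent(np.arange(0, 25.1, 0.1), "thermal")
location = ctrl.Antecedent(np.arange(0, 1.01, 0.01), "location")
crit = ctrl.Consequent(np.arange(0, 4.01, 0.05), "criticality")  # 5 levels, indexed 0-4

size["small"] = fuzz.trapmf(size.universe, [0, 0, 50, 100])
size["medium"] = fuzz.trapmf(size.universe, [50, 100, 400, 500])
size["large"] = fuzz.trapmf(size.universe, [400, 500, 1000, 1000])

thermal["low"] = fuzz.trapmf(thermal.universe, [0, 0, 2, 4])
thermal["medium"] = fuzz.trapmf(thermal.universe, [3, 5, 8, 10])
thermal["high"] = fuzz.trapmf(thermal.universe, [9, 12, 25, 25])

location["root"] = fuzz.trapmf(location.universe, [0.0, 0.0, 0.2, 0.35])       # assumed
location["mid_span"] = fuzz.trapmf(location.universe, [0.25, 0.4, 0.6, 0.75])  # assumed
location["tip"] = fuzz.trapmf(location.universe, [0.65, 0.8, 1.0, 1.0])        # assumed

for level, term in enumerate(["negligible", "low", "medium", "high", "severe"]):
    crit[term] = fuzz.trimf(crit.universe, [max(level - 1, 0), level, min(level + 1, 4)])

# Two of the 27 rules from Table A1; the full base enumerates all 3 x 3 x 3 combinations.
rules = [
    ctrl.Rule(size["large"] & location["root"] & thermal["high"], crit["severe"]),
    ctrl.Rule(size["small"] & location["tip"] & thermal["low"], crit["negligible"]),
]

sim = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
sim.input["size"] = 620.0       # mm^2   -> 'large'
sim.input["location"] = 0.10    # near the blade root
sim.input["thermal"] = 14.0     # deg C  -> 'high'
sim.compute()
print(f"Defuzzified criticality: {sim.output['criticality']:.2f}")
```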

References

  1. Sun, S.; Wang, T.; Chu, F. In-situ condition monitoring of wind turbine blades: A critical and systematic review of techniques, challenges, and futures. Renew. Sustain. Energy Rev. 2022, 160, 112326. [Google Scholar] [CrossRef]
  2. Kong, K.; Dyer, K.; Payne, C.; Hamerton, I.; Weaver, P.M. Progress and trends in damage detection methods, maintenance, and data-driven monitoring of wind turbine blades—A review. Renew. Energy Focus 2023, 44, 390–412. [Google Scholar] [CrossRef]
  3. Aminzadeh, A.; Dimitrova, M.; Meiabadi, S.M.; Sattarpanah Karganroudi, S.; Taheri, H.; Ibrahim, H.; Wen, Y. Non-contact inspection methods for wind turbine blade maintenance: Techno-economic review of techniques for integration with industry 4.0. J. Nondestruct. Eval. 2023, 42, 54. [Google Scholar] [CrossRef]
  4. Zhang, S.; He, Y.; Gu, Y.; He, Y.; Wang, H.; Wang, H.; Yang, R.; Chady, T.; Zhou, B. UAV based defect detection and fault diagnosis for static and rotating wind turbine blade: A review. Nondestruct. Test. Eval. 2024, 40, 1691–1729. [Google Scholar] [CrossRef]
  5. Banafaa, M.K.; Pepeoğlu, Ö.; Shayea, I.; Alhammadi, A.; Shamsan, Z.A.; Razaz, M.A.; Alsagabi, M.; Al-Sowayan, S. A comprehensive survey on 5G-and-beyond networks with UAVs: Applications, emerging technologies, regulatory aspects, research trends and challenges. IEEE Access 2024, 12, 7786–7826. [Google Scholar] [CrossRef]
  6. Yan, X.; Wu, G.; Zuo, Y. YOLOV4-based wind turbine blade crack defect detection. In Proceedings of the IncoME-VI and TEPEN 2021, Tianjin, China, 20–23 October 2021; Springer International Publishing: Cham, Switzerland, 2022; pp. 293–305. [Google Scholar] [CrossRef]
  7. Dai, Z. Image acquisition technology for unmanned aerial vehicles based on YOLO—Illustrated by the case of wind turbine blade inspection. Syst. Soft Comput. 2024, 6, 200126. [Google Scholar] [CrossRef]
  8. Mao, Y.; Wang, S.; Yu, D.; Zhao, J. Automatic image detection of multi-type surface defects on wind turbine blades based on cascade deep learning network. Intell. Data Anal. 2021, 25, 463–482. [Google Scholar] [CrossRef]
  9. Sun, T.; Yu, G.; Gao, M.; Zhao, L.; Bai, C.; Yang, W. Fault diagnosis methods based on machine learning and its applications for wind turbines: A review. IEEE Access 2021, 9, 147481–147511. [Google Scholar] [CrossRef]
  10. Ogaili, A.A.F.; Jaber, A.A.; Hamzah, M.N. A methodological approach for detecting multiple faults in wind turbine blades based on vibration signals and machine learning. Curved Layer. Struct. 2023, 10, 20220214. [Google Scholar] [CrossRef]
  11. Technical Report IEC 61400-5:2020; Wind Energy Generation Systems—Part 5: Wind Turbine Blades. International Electrotechnical Commission: Geneva, Switzerland, 2020. Available online: https://webstore.iec.ch/publication/33236 (accessed on 15 August 2025).
  12. Electric Power Research Institute. A White Paper on Wind Turbine Blade Defect and Damage Categorization: Current State of the Industry. Technical Report 3002019669, Electric Power Research Institute. 2020. Available online: https://www.epri.com/research/products/000000003002019669 (accessed on 15 August 2025).
  13. Chen, X.; Jia, X. Dataset for AI-based optical-thermal video data fusion for near real-time blade segmentation in normal wind turbine operation. Eng. Appl. Artif. Intell. 2024, 127, 107325. [Google Scholar] [CrossRef]
  14. Memari, M.; Shekaramiz, M.; Masoum, M.A.S.; Seibi, A.C. Data fusion and ensemble learning for advanced anomaly detection using multi-spectral RGB and thermal imaging of small wind turbine blades. Energies 2024, 17, 673. [Google Scholar] [CrossRef]
  15. Svystun, S.; Scislo, L.; Pawlik, M.; Melnychenko, O.; Radiuk, P.; Savenko, O.; Sachenko, A. DyTAM: Accelerating wind turbine inspections with dynamic UAV trajectory adaptation. Energies 2025, 18, 1823. [Google Scholar] [CrossRef]
  16. Yang, C.; Liu, X.; Zhou, H.; Ke, Y.; See, J. Towards accurate image stitching for drone-based wind turbine blade inspection. Renew. Energy 2023, 203, 267–279. [Google Scholar] [CrossRef]
  17. Svystun, S.; Melnychenko, O.; Radiuk, P.; Savenko, O.; Sachenko, A.; Lysyi, A. Thermal and RGB images work better together in wind turbine damage detection. Int. J. Comput. 2024, 23, 526–535. [Google Scholar] [CrossRef]
  18. Zhou, W.; Wang, Z.; Zhang, M.; Wang, L. Wind turbine actual defects detection based on visible and infrared image fusion. IEEE Trans. Instrum. Meas. 2023, 72, 3509208. [Google Scholar] [CrossRef]
  19. Rizk, P.; Rizk, F.; Sattarpanah Karganroudi, S.; Ilinca, A.; Younes, R.; Khoder, J. Advanced wind turbine blade inspection with hyperspectral imaging and 3D convolutional neural networks for damage detection. Energy AI 2024, 16, 100366. [Google Scholar] [CrossRef]
  20. Chen, X.; Sheiati, S.; Shihavuddin, A. AQUADA PLUS: Automated damage inspection of cyclic-loaded large-scale composite structures using thermal imagery and computer vision. Compos. Struct. 2023, 318, 117085. [Google Scholar] [CrossRef]
  21. Zhang, H.; Xu, H.; Tian, X.; Jiang, J.; Ma, J. Image fusion meets deep learning: A survey and perspective. Inf. Fusion 2021, 76, 323–336. [Google Scholar] [CrossRef]
  22. Zhao, J.; Zhang, R.; Chen, S.; Duan, Y.; Wang, Z.; Li, Q. Enhanced infrared defect detection for UAVs using wavelet-based image processing and channel attention-integrated SSD model. IEEE Access 2024, 12, 188787–188796. [Google Scholar] [CrossRef]
  23. Jia, X.; Chen, X. AI-based optical-thermal video data fusion for near real-time blade segmentation in normal wind turbine operation. Eng. Appl. Artif. Intell. 2024, 127, 107325. [Google Scholar] [CrossRef]
  24. Zhang, Y.; Liu, Y.; Sun, P.; Yan, H.; Zhao, X.; Zhang, L. IFCNN: A general image fusion framework based on convolutional neural network. Inf. Fusion 2020, 54, 99–118. [Google Scholar] [CrossRef]
  25. Quan, T.M.; Hildebrand, D.G.C.; Jeong, W.K. FusionNet: A deep fully residual convolutional neural network for image segmentation in connectomics. Front. Comput. Sci. 2021, 3, 613981. [Google Scholar] [CrossRef]
  26. Pan, Y.; Hong, R.; Chen, J.; Singh, J.; Jia, X. Performance degradation assessment of a wind turbine gearbox based on multi-sensor data fusion. Mech. Mach. Theory 2019, 137, 509–526. [Google Scholar] [CrossRef]
  27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE Inc.: New York, NY, USA, 2016; pp. 770–778. [Google Scholar] [CrossRef]
  28. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  29. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; IEEE Inc.: New York, NY, USA, 2018; pp. 6154–6162. [Google Scholar] [CrossRef]
  30. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Honolulu, HI, USA, 21–26 July 2017; IEEE Inc.: New York, NY, USA, 2017; pp. 2980–2988. [Google Scholar] [CrossRef]
  31. Diaz, P.M.; Tittus, P. Fast detection of wind turbine blade damage using Cascade Mask R-DSCNN-aided drone inspection analysis. Signal Image Video Process. 2023, 17, 2333–2341. [Google Scholar] [CrossRef]
  32. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; IEEE Inc.: New York, NY, USA, 2020; pp. 10778–10787. [Google Scholar] [CrossRef]
  33. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 31 May 2025).
  34. He, Y.; Niu, X.; Hao, C.; Li, Y.; Kang, L.; Wang, Y. An adaptive detection approach for multi-scale defects on wind turbine blade surface. Mech. Syst. Signal Process. 2024, 219, 111592. [Google Scholar] [CrossRef]
  35. Liu, C.; An, C.; Yang, Y. Wind turbine surface defect detection method based on YOLOv5s-L. NDT 2023, 1, 46–57. [Google Scholar] [CrossRef]
  36. Zhao, Z.; Li, T. Enhancing wind turbine blade damage detection with YOLO-Wind. Sci. Rep. 2025, 15, 18667. [Google Scholar] [CrossRef]
  37. Dietterich, T.G. Ensemble methods in machine learning. In Multiple Classifier Systems; Kittler, J., Roli, F., Eds.; Springer: Berlin/Heidelberg, Germany, 2000; pp. 1–15. [Google Scholar] [CrossRef]
  38. Zhou, Z.H. Ensemble Methods: Foundations and Algorithms, 2nd ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2025. [Google Scholar] [CrossRef]
  39. Al-Agha, O.; Al-Natsheh, A.; Al-Sari, I.; Al-Zoubi, E.; Bani-Hani, D. Artificial intelligence in wind turbine fault detection and diagnosis: A comprehensive review and future directions. Energies 2025, 18, 1680. [Google Scholar] [CrossRef]
  40. Carmona-Troyo, J.A.; Trujillo, L.; Enríquez-Zárate, J.; Hernandez, D.E.; Cárdenas-Florido, L.A. Classification of damage on wind turbine blades using automatic machine learning and pressure coefficient. Expert Syst. 2025, 42, e70024. [Google Scholar] [CrossRef]
  41. Svystun, S.; Melnychenko, O.; Radiuk, P.; Lysyi, A.; Sachenko, A. Determining the criticality assessment of defects on wind turbine blades using fuzzy logic. In Proceedings of the 6th International Workshop on Intelligent Information Technologies & Systems of Information Security (IntelITSIS 2025), Khmelnytskyi, Ukraine, 4 April 2025; Hovorushchenko, T., Savenko, O., Popov, P.T., Lysenko, S., Eds.; CEUR-WS: Aachen, Germany, 2024; Volume 3963, pp. 351–362. Available online: https://ceur-ws.org/Vol-3963/paper28.pdf (accessed on 15 August 2025).
  42. Dubchak, L.; Sachenko, A.; Bodyanskiy, Y.; Wolff, C.; Vasylkiv, N.; Brukhanskyi, R.; Kochan, V. Adaptive neuro-fuzzy system for detection of wind turbine blade defects. Energies 2024, 17, 6456. [Google Scholar] [CrossRef]
  43. Vladov, S.; Scislo, L.; Sokurenko, V.; Muzychuk, O.; Vysotska, V.; Sachenko, A.; Yurko, A. Helicopter turboshaft engines’ gas generator rotor R.P.M. neuro-fuzzy on-board controller development. Energies 2024, 17, 4033. [Google Scholar] [CrossRef]
  44. Kozlov, O. Information technology for designing rule bases of fuzzy systems using ant colony optimization. Int. J. Comput. 2021, 20, 471–486. [Google Scholar] [CrossRef]
  45. Zhang, Z. A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334. [Google Scholar] [CrossRef]
  46. TorchVision Maintainers and Contributors. TorchVision: PyTorch’s Computer Vision Library. 2016. Available online: https://github.com/pytorch/vision (accessed on 15 August 2025).
  47. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8024–8035. [Google Scholar]
  48. ZFTurbo. Weighted-Boxes-Fusion. 2022. Available online: https://github.com/ZFTurbo/Weighted-Boxes-Fusion (accessed on 15 August 2025).
  49. Bradski, G. The OpenCV library. Dr. Dobb’s J. Softw. Tools Prof. Program. 2000, 25, 120–125. [Google Scholar]
  50. Lee, G.R.; Gommers, R.; Waselewski, F.; Wohlfahrt, K.; O’Leary, A. PyWavelets: A Python package for wavelet analysis. J. Open Source Softw. 2019, 4, 1237. [Google Scholar] [CrossRef]
  51. Harris, C.R.; Millman, K.J.; van der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array programming with NumPy. Nature 2020, 585, 357–362. [Google Scholar] [CrossRef]
  52. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods 2020, 17, 261–272. [Google Scholar] [CrossRef] [PubMed]
  53. McKinney, W. Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (SciPy 2010), Austin, TX, USA, 28 June–3 July 2010; pp. 51–56. [Google Scholar] [CrossRef]
  54. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  55. Hunter, J.D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 2007, 9, 90–95. [Google Scholar] [CrossRef]
  56. Warner, J.; Sexauer, J.; Van den Broeck, W.; Kinoshita, B.P.; Balinski, J.; Scikit-Fuzzy; Clauss, C.; Twmeggs; Alexsavio; Unnikrishnan, A.; et al. JDWarner/scikit-fuzzy: Scikit-Fuzzy 0.5.0 (Version 0.5.0) [Software]; Zenodo: Geneva, Switzerland, 2024. [Google Scholar] [CrossRef]
  57. Sun, B.; Saenko, K. Deep CORAL: Correlation alignment for deep domain adaptation. In Proceedings of the Computer Vision—ECCV 2016 Workshops, Amsterdam, The Netherlands, 8–10 and 15–16 October 2016; Hua, G., Jégou, H., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 443–450. [Google Scholar] [CrossRef]
  58. Zhou, K.; Yang, Y.; Qiao, Y.; Xiang, T. MixStyle neural networks for domain generalization and adaptation. Int. J. Comput. Vis. 2023, 132, 822–836. [Google Scholar] [CrossRef]
  59. Lysyi, A.; Sachenko, A.; Radiuk, P.; Lysyi, M.; Melnychenko, O.; Ishchuk, O.; Savenko, O. Enhanced fire hazard detection in solar power plants: An integrated UAV, AI, and SCADA-based approach. Radioelectron. Comput. Syst. 2025, 2025, 99–117. [Google Scholar] [CrossRef]
  60. Technical Report IEC 61400-23:2014; Wind Turbines—Part 23: Full-Scale Structural Testing of Rotor Blades. International Electrotechnical Commission: Geneva, Switzerland, 2014. Available online: https://webstore.iec.ch/publication/5436 (accessed on 15 August 2025).
Figure 1. High-level architecture of the proposed framework. Block 1 ingests multispectral data ( x RGB , x TH ) and metadata ( h , L ), using a YOLOv8 ensemble to detect ROIs and an image processing pipeline to extract normalized, photogrammetrically scaled features ( m = Z p / f ). Block 2 encapsulates domain expertise within a 27-rule Mamdani fuzzy base, utilizing defect-specific membership functions ( μ c ) to generate preliminary criticality estimates ( C exp ( M c ) ). Block 3 serves as the decision core, fusing ( z = f fuse ) the data-driven and knowledge-based inputs. It then performs aggregation and centroid defuzzification to produce a final, calibrated criticality score ( C final ) with an associated Expected Calibration Error (ECE).
Figure 2. ROC curves for the three defect classes: ‘Crack,’ ‘Erosion,’ and ‘Hotspot’.
Figure 3. Visual analytics of the detection module: (a) per-frame latency histogram; (b) class-wise F 1 -scores with 95% BCa confidence intervals (CIs); and (c) heatmap showing the relationship between ensemble size, confidence threshold, and resulting mAP@.5, illustrating that performance gains diminish beyond three models.
Figure 4. Confusion matrix for the five-level criticality assessment, comparing the system’s rounded output with the expert-assigned ground-truth ratings. The high concentration of values along the main diagonal indicates excellent agreement across all severity levels. A color bar has been added to provide a scale for the cell counts.
Figure 5. Reliability diagrams for the five discrete criticality levels. Curves show empirical accuracy vs. predicted probability in 10 equal-width bins with 95% bootstrap confidence intervals. The diagonal line denotes perfect calibration. Expected Calibration Error (ECE) and Maximum Calibration Error (MCE) are reported in the legend; see Section 4 for metric definitions.
Figure 6. t-SNE projection (perplexity = 35, random_state = 42) of multispectral feature embeddings highlighting domain shifts between AQUADA-GO (RGB) and Thermal WTB (RGB-T). Points are colored by acquisition domain; ellipses show 95% covariance estimates.
Table 1. Dataset card and split summary. The table provides counts of images (frames), distinct blades (flights), and annotated defects per class for the two public datasets used in this study. The final row shows the total counts for the combined dataset used for training and evaluation.
| Dataset | Images | Blades | Cracks | Erosion | Hotspots |
|---|---|---|---|---|---|
| AQUADA-GO (RGB) | 15,420 | 24 | 850 | 1230 | 0 |
| Thermal WTB (RGB-T) | 3850 | 12 | 320 | 450 | 210 |
| Total Combined | 19,270 | 36 | 1170 | 1680 | 210 |
Table 2. Comparative defect detection performance on the combined test sets. Results are reported as mean ± 95% Bias-Corrected and accelerated (BCa) confidence interval. Best results are shown in bold. The final column reports precision at an operating point that yields ≤ 0.05 false positives per image (FPPI).
| Model Configuration | Precision (%) | Recall (%) | F1-Score (%) | mAP@.5 (%) | Precision@FPPI ≤ 0.05 (%) |
|---|---|---|---|---|---|
| Single YOLOv8 (RGB Only) | 82.5 ± 1.8 | 79.1 ± 2.0 | 80.8 ± 1.9 | 81.7 ± 1.7 | 78.4 |
| Single YOLOv8 (Multispectral) | 89.1 ± 1.4 | 87.3 ± 1.5 | 88.2 ± 1.4 | 88.9 ± 1.3 | 89.5 |
| Proposed Ensemble (Multispectral) | **93.2 ± 1.0** | **91.5 ± 1.1** | **92.3 ± 1.0** | **92.8 ± 0.9** | **93.5** |
Table 3. Cross-dataset evaluation to quantify domain shift. Models were trained on one dataset and evaluated on the other without fine-tuning, revealing the performance degradation that occurs when models encounter out-of-distribution data.
| Training Dataset | Test Dataset | mAP@.5 (%) | F1-Score (%) |
|---|---|---|---|
| AQUADA-GO (RGB) | Thermal WTB | 76.4 | 74.1 |
| Thermal WTB (RGB-T) | AQUADA-GO | 84.2 | 81.8 |
Table 4. Detailed ablation studies quantifying the impact of removing key system components on final performance. The degradation in metrics, shown relative to the baseline performance of the full framework, highlights the critical contribution of each part to the overall system efficacy.
| Ablation Scenario | Affected Module | Primary Metric | Baseline | Ablated | Impact Analysis (Δ and % Change) |
|---|---|---|---|---|---|
| Thermal Channel Removed (RGB only) | Criticality Assessment | Criticality MAE | 0.14 | 0.35 | +0.21 (+150.0%): Loss of thermal data catastrophically degrades severity assessment. |
| Thermal Channel Removed (RGB only) | Defect Detection | F1-score (%) | 92.3 | 82.2 | −10.1 pts: Confirms thermal data are crucial for robust detection of multiple defect types. |
| Ensemble Learning Removed (Single Model) | Defect Detection | F1-score (%) | 92.3 | 88.2 | −4.1 pts: Demonstrates that ensembling provides a significant boost in accuracy and robustness. |
| Fuzzy Rule Count Reduced (27 → 15 rules) | Criticality Assessment | Criticality MAE | 0.14 | 0.29 | +0.15 (+107.1%): A comprehensive, nuanced rule base is essential to accurately model expert logic. |
| Simulated Thermal Drift (+5 °C) | Criticality Assessment | Criticality MAE | 0.14 | 0.18 | +0.04 (+28.6%): System shows high resilience due to its reliance on relative, not absolute, temperature. |
Table 5. Comparative analysis of the proposed framework against five state-of-the-art defect detection methods. Our framework demonstrates superior performance, particularly in the balanced F 1 -score and overall mAP. Note: Results for other methods are as reported in their respective publications; training datasets and protocols may vary. Best results are shown in bold.
| Method | Data Modality | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| Liu et al. [35] | RGB | 81.2 | 78.5 | 79.8 |
| He et al. [34] | RGB | 84.5 | 82.1 | 83.3 |
| Zhou et al. [18] | Fused RGB-T | 89.3 | 85.4 | 87.3 |
| Zhao et al. [22] | Fused RGB-T | 91.8 | 89.2 | 90.5 |
| Zhao et al. [36] | RGB | 88.6 | 86.9 | 87.7 |
| Proposed Framework | Fused RGB-T Ensemble | **93.2** | **91.5** | **92.3** |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
