1. Introduction
Due to the harsh environments in which wind turbines (WTs) operate, failures occur frequently, resulting in a heightened demand for operation and maintenance (O&M). Traditional O&M strategies, including shutdown, inspection, and regular maintenance, affect the health condition of WTs in varying ways [
1], as illustrated in
Figure 1.
Each inflection point represents a manual maintenance intervention. In essence, the revenue generated by a wind farm is proportional to the area under the curves. Ideally, it is crucial to maintain WTs in a consistently healthy condition. Among various O&M strategies, shutdown has the most detrimental impact on wind farm efficiency. Although inspection is less disruptive, it is constrained by high labor and material costs, as well as low operational efficiency. Regular maintenance also faces cost-efficiency challenges and often results in ineffective investment due to unnecessary servicing of normally functioning WTs. Gradually, predictive maintenance has emerged as a promising alternative, offering a cost-effective and efficient approach by enabling accurate detection of abnormal WTs and timely forecasting of potential failures.
A critical task in predictive maintenance is the accurate identification of inflection points, which necessitates a precise definition of the condition of WTs. Therefore, it is essential to develop an efficient, automated, and intelligent O&M method, along with a condition monitoring (CM) method [
2], which supports the early identification of critical information before the occurrence of anomalies or failures. This approach minimizes unnecessary losses and provides maintenance personnel with sufficient time for proactive intervention.
CM is fundamentally based on anomaly detection (AD) [
3], which plays a crucial role in industrial processes by enhancing quality control and improving product qualification rates [
4]. In the wind power industry, AD is primarily employed in O&M to identify sudden or unexpected anomalies [
5]. Using both supervised [
6] and unsupervised [
7] methods, AD detects anomalies in individual wind turbine components instantaneously. The key distinction between CM and AD lies in scope and temporal focus: while AD identifies short-term abnormal patterns, CM continuously evaluates the condition of components over a longer time horizon, providing a more comprehensive understanding of equipment health.
The application of traditional model-based and signal-processing-based CM is often constrained by modeling complexity and high implementation costs. In contrast, data-driven CM provides an effective approach to extracting valuable insights from WT data. Capturing the coupling effects among multiple variables enables the identification of patterns corresponding to different operational conditions. Leveraging pattern recognition techniques, data-driven CM supports end-to-end monitoring. More importantly, it can detect and extract micro-level features before anomalies become perceptible to maintenance personnel, allowing for early fault warnings [
8,
9,
10]. Additionally, when implemented using supervisory control and data acquisition (SCADA) systems, it eliminates the need for additional sensors or hardware, thereby enhancing internal system reliability while reducing O&M costs [
11,
12]. Consequently, data-driven CM has become a widely accepted and practical solution in the wind power industry.
With its various features, SCADA provides a relatively intuitive description of WT operational processes [
13,
14]. However, significant dynamic fault characteristics may be lost as the time intervals between data points are relatively long, typically around 10 min. This limitation can be addressed using intelligent methods that possess powerful learning and data mining capabilities.
Extensive research has been conducted to monitor WT conditions, primarily focusing on either the entire WT system or specific individual components. Detecting anomalies in the entire system often leads to inefficiencies and high labor costs. Moreover, data collected from operational wind farms are typically unlabeled, making it challenging to obtain component-specific operational data. Extracting data for individual components from numerous features is a difficult and highly labor-intensive process that requires extensive research. Additionally, the lifespan of WTs is predominantly influenced by key components, while other components have a lesser impact. Therefore, it is essential to develop adaptive CM and anomaly analysis methods that specifically target the critical subsystems of WTs. Statistically, among the various subsystems, the gearbox is the most critical when considering both failure rates and downtime caused by failures [
15].
Traditional condition monitoring methods that rely on vibration analysis based on signal processing face the challenge of high computational costs. Additionally, the original signals often contain significant noise, resulting in a low signal-to-noise ratio, which severely limits the effectiveness of vibration analysis in condition monitoring tasks. Furthermore, the conventional simple threshold methods based on SCADA data introduce substantial uncertainty during the modeling process, leading to a high false alarm rate and significantly reduced accuracy, thereby impairing overall condition monitoring performance.
CM for WT gearboxes is often hindered by the lack of an effective model and framework, which limits its practical application in the wind industry [
16]. To address this, a normal behavior model [
17] is proposed, comprising the following key steps: feature selection, model training, difference comparison, and condition monitoring. First, a machine learning model is trained using raw data obtained from healthy operating conditions. Next, unknown data requiring monitoring are input into the pretrained model. The results are then compared with corresponding healthy-condition data. Smaller differences indicate that the monitored data sample closely resembles a healthy condition. A failure threshold is established based on these differences to quantitatively detect anomalies in WTs. However, existing CM methods suffer from low accuracy and efficiency due to random feature selection, low accuracy in abnormal condition recognition, an insufficient consideration of feature correlations during model training, and uncertainty in difference calculation.
Feature selection for condition monitoring can be classified into subjective methods and objective methods [
18]. The former relies on personal experience and prior literature to manually identify features related to the target subsystem [
19,
20]. However, subjective methods can lead to information loss due to inherent biases [
21]. For example, the SCADA system can only describe the operational conditions of wind turbines from the perspective of sensor networks, but it lacks the precision to provide a comprehensive overview. Consequently, subjectively selecting variables as model inputs introduces substantial uncertainty. Additionally, some valid variables may be overlooked, resulting in the loss of valuable information during model training. Therefore, objective methods, which quantitatively identify relevant features based on computational relationships [
22,
23,
24], have been more widely adopted by researchers. These methods often use various correlation coefficients, such as the
Pearson coefficient [
25], maximal information coefficient [
26], and
Kendall coefficient [
27]. This approach effectively eliminates subjective biases and uncertainties, preserves the maximum amount of useful information in the data, ensures proper training of the model, and enables high-accuracy anomaly detection. However, different coefficients vary in their applicable ranges, and their applications are limited to specific scenarios, an aspect that has not been sufficiently addressed.
Machine learning plays a significant role in extracting, recognizing, and modeling various conditions within a multidimensional space [
28]. According to the training process, it can be classified into supervised [
29], semi-supervised [
30], and unsupervised [
31] methods. However, considering the lack of labeled data, the insufficiency of effective samples, the discrepancies between data and features, and the uncertainty in describing WT conditions in real-world condition monitoring, a self-supervised method [
32,
33], which is similar to unsupervised learning, has attracted increasing attention due to its ability to automatically construct supervisory information from unlabeled data [
34,
35]. Self-supervised learning uses the data itself as labels to generate self-supervised signals, allowing the model to learn representations by predicting missing parts, perturbations, or correlations in the data [
36,
37]. Under the self-supervised framework, specific models play a crucial role. Among them, graph neural networks (GNNs) [
38] are particularly noteworthy due to their unique architecture. By representing data as nodes and edges, GNNs can effectively model and learn the characteristics of raw data, addressing the differences among various features in the raw SCADA data and capturing the correlations among them.
Healthy conditions are modeled using a model trained on historical healthy data. Data from unknown conditions are represented as graphs and processed through the pretrained model. Subsequently, a comparison is made between the corresponding results obtained from healthy-condition data and unknown-condition data. Commonly used metrics for comparison include residuals [
39,
40], the root mean square error [
26], square prediction error [
41], Mahalanobis distance [
42,
43], and Kantorovich distance [
44]. Based on the distance metrics, a numerical representation is constructed to describe the healthy condition of the subsystem and provide a threshold for determining failures. The health index (
HI) is often established using an exponentially weighted moving average (EWMA) [
45], and the failure threshold is determined according to the $3\sigma$ principle or statistical process control (SPC) [
46]. When the
HI exceeds this threshold, the subsystem is considered abnormal, and a warning is issued.
In summary, traditional CM suffers from low anomaly recognition accuracy and efficiency because of its high cost, subjective factors in feature selection, random factors in method application, and insufficient consideration of feature correlations. Specifically, manual feature selection overlooks the correlations between features, and there is no quantitative selection standard targeting specific subsystems; this results in high labor costs and significant variability, leading to low efficiency and reliability. Additionally, the weak expressive power and low accuracy of existing models often stem from the inefficiency of manually labeling data for deep model training, as well as the tendency of basic models to overlook feature differences and correlations. Moreover, residuals for specific target features can only reflect subsystem conditions from a single perspective during prediction and discrepancy calculation, making them susceptible to subjective biases. The overall reliability is further compromised by the difficulty of representing subsystem conditions with a single-dimensional measurement. To address these limitations, an adaptive condition-monitoring method based on SCADA data for wind turbine gearboxes is proposed. The main contributions are as follows:
Modified normal behavior model-based wind turbine gearbox condition monitoring.
Correlation-based adaptive feature selection with a quantitative indicator defined.
Optimized self-supervised contrastive residual graph neural network for data mining.
Generalized healthy index and adaptive failure threshold for condition monitoring.
Over 90% recognition accuracy and up to 40 h advance warning on two real SCADA datasets.
This paper is structured as follows.
Section 2 presents the methodology applied at each step, including five subsections: feature selection, model training, health index, failure threshold, and the overall online condition monitoring system.
Section 3 presents the experimental process conducted on two datasets for validation, along with relevant applications and discussions. Finally,
Section 4 concludes the paper and outlines prospects for future works.
2. Methodology
An adaptive condition-monitoring method, adaptive feature selection and self-supervised contrastive residual graph neural network (AFS-CRGN), is proposed in this paper for a wind turbine gearbox. For data preparation, AFS is initially developed by analyzing correlations between features, with a quantitative indicator established. To facilitate data mining and model training, a self-supervised CRGN model is designed. After this, online monitoring begins for the gearbox HI, which is established using distance metrics in multidimensional space and EWMA. Based on the HI, a failure threshold is established using SPC. Finally, online condition monitoring of the wind turbine gearbox is realized.
2.1. Adaptive Feature Selection Based on Correlation Analysis
AFS aims to identify suitable data features for the CM task through correlation analysis. This analysis determines the degree of relationship between features by examining the correlation coefficients among various data features.
The Anderson–Darling test is performed to assess whether each feature of the raw SCADA data follows a normal distribution. However, none of the variables follow a normal distribution, rendering the
Pearson correlation coefficient unsuitable. Therefore, the
Spearman coefficient is employed, as shown in Equation (1):
$\rho = 1 - \dfrac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$ (1)
where $d_i$ represents the difference in ranks for each pair of observations, and $n$ represents the total number of observations.
Manually and directly selected features typically exhibit high randomness and are prone to introducing subjective factors, which creates significant uncertainty for the condition monitoring task of wind turbine gearboxes. To avoid introducing excessive subjective factors and to mitigate the impact of random factors on the results, adaptive feature selection is proposed. First, several core features are selected based on industry experience, as they are the most relevant to the target subsystem, the gearbox. Then, the correlations between the remaining features and these core features are evaluated to select the most appropriate features for condition monitoring. In detail, five features closely associated with the gearbox are identified as core features based on previous research, the primary functions of the gearbox, and O&M reports from wind farms (only for
Case I), as shown in
Table 1. The rotor speed and generator speed represent the rotational speeds of the low-speed shaft and high-speed shaft of the gearbox, respectively.
For each core feature, the
Spearman correlation coefficients between the remaining features and that core feature are calculated. Each remaining feature is correlated with the five core features, and the average of the results is taken as the degree of similarity of that feature. For each feature $j$ excluding the core, the following equation applies:
$\bar{\rho}_j = \dfrac{1}{5}\sum_{k=1}^{5} \rho_{j,k}$ (2)
where $\rho_{j,k}$ represents the correlation coefficient of remaining feature $j$ with respect to core feature $k$. Subsequently, a quantitative indicator is established. The variables whose average correlation coefficients exceed this indicator are regarded as strongly correlated variables. These variables, combined with the core features, constitute the raw data samples used for model training. The quantitative indicator is a key parameter, and the number of adaptively selected variables varies depending on its value. Typically, it ranges from 0.5 to 0.8. In this paper, it is set to 0.6 based on the experiments and discussions presented in Section 3. After feature selection, the selected data undergo min–max normalization to eliminate dimensionality effects, as shown in Equation (3):
$x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$ (3)
Through the above AFS, the data for subsequent model training are prepared. "Adaptive" refers to the fact that the proposed feature selection method provides a framework that can autonomously select features for different wind turbines and operating conditions. This ensures the generalization of the method, allowing feature selection to be performed on a larger scale. Compared to the traditional recursive feature elimination method and importance-based feature selection methods, the proposed AFS can operate independently, without binding subsequent tasks to this step or requiring complex computations. Building on the physical meaning used to determine the core features, it starts from the data itself and uses the relationships among the data for its calculations. The proposed AFS reduces the influence of subjective factors in feature selection, making wind turbine gearbox condition monitoring more efficient. It is simple, efficient, and able to adjust dynamically, making it suitable for adoption in real industrial scenarios.
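As a minimal illustration of the AFS procedure above (assuming the SCADA data are held in a pandas DataFrame and using hypothetical core-feature column names), the following Python sketch computes the average Spearman correlation of every remaining feature against the five core features, retains those exceeding the quantitative indicator of 0.6, and applies min–max normalization; the Anderson–Darling normality check described earlier is assumed to have been run beforehand.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical core-feature names; the actual SCADA tags depend on the wind farm.
CORE_FEATURES = ["rotor_speed", "generator_speed", "gearbox_oil_temp",
                 "gearbox_bearing_temp_hss", "gearbox_bearing_temp_lss"]
INDICATOR = 0.6  # quantitative indicator, set per Section 3

def adaptive_feature_selection(df: pd.DataFrame) -> pd.DataFrame:
    """Keep features whose mean Spearman correlation with the core features
    exceeds the quantitative indicator, then min-max normalize (Eqs. (2)-(3))."""
    selected = list(CORE_FEATURES)
    for col in df.columns:
        if col in CORE_FEATURES:
            continue
        # Spearman correlation of this feature with each of the five core features.
        rhos = [spearmanr(df[col], df[core]).correlation for core in CORE_FEATURES]
        if sum(rhos) / len(rhos) > INDICATOR:   # Equation (2): average correlation
            selected.append(col)
    data = df[selected]
    # Equation (3): min-max normalization to remove dimensionality effects.
    return (data - data.min()) / (data.max() - data.min())
```

In a deployment, the indicator and the core-feature list would be adjusted per turbine and operating condition, which is what makes the selection adaptive.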
2.2. Model Training Based on Contrastive Residual Graph Neural Network
Once the data is prepared, model training is conducted to learn specific representations from the data and its samples. In this section, the data sampling and model training methods are presented.
As a type of deep learning model, GNN takes into account the correlations between different features of raw data. This study focuses on the CM task of the WT gearbox. The features selected through AFS exhibit significant correlations. Using raw data directly for model training overlooks these inter-feature correlations, resulting in suboptimal training performance. Self-supervised learning uses the data itself as labels to generate self-supervised signals, allowing the model to learn representations by predicting missing parts, perturbed information, or correlations in the data. The gearbox monitoring data used in this study come from the SCADA system of the wind farm and consist of a high-dimensional, non-stationary time series. Without effective labeled samples, self-supervised learning methods can extract useful information from a large amount of unlabeled data and complete model training. In addition, contrastive learning can further enhance the robustness of the model by employing data augmentation strategies to maintain invariance to noise and operational fluctuations in sensor data. Therefore, a CRGN based on GNN and self-supervised contrastive learning is constructed as the foundational model for data mining and model training.
The data samples used for CRGN are referred to as graph samples, which include raw data and the correlation relationships among different features. Each feature is represented as a node, while the correlations among features are represented as edges. Based on the
Spearman coefficients between features, an adjacency matrix is constructed, as shown in Equation (4):
$A_{ij} = \rho_{ij}$ (4)
where $\rho_{ij}$ represents the Spearman coefficient between features $i$ and $j$.
The formation of the graph data samples is illustrated in
Figure 2, where F1 to F5 represent the core features, and Ft1 to Ftn denote the features selected via AFS.
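To make the graph-sample construction concrete, the sketch below (an illustrative reading of Figure 2 and Equation (4), not the authors' exact implementation) derives the adjacency matrix from pairwise Spearman coefficients over a data window and pairs it with the node feature matrix.

```python
import numpy as np
import pandas as pd

def build_graph_sample(window: pd.DataFrame):
    """Return (node_features, adjacency) for one graph sample.

    Each selected feature becomes a node; edge weights are the pairwise
    Spearman coefficients between features, cf. Equation (4)."""
    adjacency = window.corr(method="spearman").to_numpy()
    np.fill_diagonal(adjacency, 0.0)        # self-connections are added later via Eq. (6)
    node_features = window.to_numpy().T     # shape: (num_features, window_length)
    return node_features, adjacency
```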
The CRGN constructed in this study consists of four graph convolutional layers, each referred to as a graph convolutional network (GCN) layer. By combining the advantages of GNNs and convolutional neural networks, it is well suited to performing the CM task on the generated graph samples. A GCN is a class of neural network designed to operate on graph-structured data. It extends the concept of convolution from grid-like data, such as images, to graphs by aggregating features from neighboring nodes, thereby capturing both local and global structural information. A common formulation for a GCN layer is given by Equation (
5):
$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$ (5)
where $H^{(l)}$ represents the feature matrix at the $l$-th layer, with $H^{(0)} = X$ as the input features; $\tilde{A}$ represents the adjacency matrix with added self-connections, as shown in Equation (6):
$\tilde{A} = A + I_N$ (6)
$\tilde{D}$ represents the diagonal degree matrix of $\tilde{A}$, defined as shown in Equation (7):
$\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$ (7)
$W^{(l)}$ represents the trainable weight matrix at layer $l$, and $\sigma(\cdot)$ represents the activation function.
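The propagation rule of Equations (5)–(7) is plain matrix algebra; the NumPy sketch below illustrates a single GCN layer under those equations (the handling of negative Spearman edge weights and the weight initialization are assumptions, not details given in the text).

```python
import numpy as np

def gcn_layer(h, adjacency, weight, negative_slope=0.1):
    """One GCN layer: H_{l+1} = LeakyReLU(D~^{-1/2} A~ D~^{-1/2} H_l W_l), Eqs. (5)-(7)."""
    # Negative Spearman weights would break the square-root normalization,
    # so absolute values are used here as a simplifying assumption.
    a_tilde = np.abs(adjacency) + np.eye(adjacency.shape[0])   # Eq. (6): self-connections
    deg = a_tilde.sum(axis=1)                                  # Eq. (7): node degrees
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    z = d_inv_sqrt @ a_tilde @ d_inv_sqrt @ h @ weight         # Eq. (5): propagation
    return np.where(z > 0, z, negative_slope * z)              # Leaky-ReLU, slope 0.1
```

Stacking four such layers, with residual (skip) connections between them, yields the encoder structure described in this section.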
The Leaky-ReLU is used as the activation function, with the negative slope coefficient set to 0.1. Additionally, residual connections are implemented to preserve information integrity. Under a self-supervised learning framework, two encoders are constructed for contrastive learning. The raw data is masked by
Gaussian random noise, with a standard deviation of 0.1. Both the raw and masked data are input into the two encoders. The contrastive loss-based objective function is presented as Equation (8):
$\mathcal{L} = -\log \dfrac{\exp\left(\mathrm{sim}(z, z^{+})/\tau\right)}{\exp\left(\mathrm{sim}(z, z^{+})/\tau\right) + \sum_{z^{-}} \exp\left(\mathrm{sim}(z, z^{-})/\tau\right)}$ (8)
where $z$ represents the feature representation; $z^{+}$ and $z^{-}$ represent the positive and negative samples, respectively; $\mathrm{sim}(\cdot,\cdot)$ refers to the cosine similarity; and $\tau$ signifies the temperature parameter that determines the shape of the distribution. The stochastic gradient descent (SGD) optimizer with Nesterov momentum is used for optimization, along with L2 regularization. The model framework is shown in
Figure 3.
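The contrastive objective of Equation (8) can be sketched as an InfoNCE-style batch loss, assuming that each masked view is the positive for its raw counterpart and that the other samples in the batch serve as negatives (the authors' exact positive/negative sampling scheme and the temperature value used here are assumptions).

```python
import numpy as np

def contrastive_loss(z_raw, z_masked, temperature=0.5):
    """InfoNCE-style contrastive loss over a batch, cf. Equation (8).

    z_raw, z_masked: arrays of shape (batch, dim) produced by the two encoders."""
    def l2_normalize(z):
        return z / np.linalg.norm(z, axis=1, keepdims=True)
    z1, z2 = l2_normalize(z_raw), l2_normalize(z_masked)
    sim = z1 @ z2.T / temperature          # cosine similarities scaled by the temperature
    # Row i: sample i's positive is the diagonal entry; off-diagonals act as negatives.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```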
After model training, the health feature representations are learned from data samples. The trained models are then deployed to the online monitoring system to participate in condition monitoring.
2.3. Healthy Index Based on Distance Metric
Under the same data preprocessing and sampling methods, graph samples for the target gearbox under monitoring are constructed. After loading the CRGN model trained with healthy data, the unknown condition samples are input into the pre-trained model. Based on distance metrics, a similarity comparison is performed on the obtained results. Then, HI is established to characterize the varying health conditions of the WT gearbox.
Since relying on a single distance metric introduces randomness, multiple distance metrics are employed to mitigate the impact of this variability on the results, such as the
Manhattan distance $d_M$, Euclidean distance $d_E$, cosine similarity $S_C$, and Chebyshev distance $d_C$. The average of these distances, $\bar{d}$, is considered the mean distance between healthy samples and samples with unknown conditions. The formulas used to calculate each metric are expressed as Equations (9)–(13) below, where $x$ and $y$ represent the two vectors involved in the distance calculations, and Normed indicates that the results have been normalized.
$d_M(x, y) = \sum_{i=1}^{n} |x_i - y_i|$ (9)
$d_E(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$ (10)
$S_C(x, y) = \dfrac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$ (11)
$d_C(x, y) = \max_i |x_i - y_i|$ (12)
$\bar{d} = \dfrac{1}{4}\left[\mathrm{Normed}(d_M) + \mathrm{Normed}(d_E) + \mathrm{Normed}(1 - S_C) + \mathrm{Normed}(d_C)\right]$ (13)
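The four metrics and their normalized average of Equations (9)–(13) translate directly into NumPy; in the sketch below, the normalization is assumed to be min–max scaling of each metric over the monitored sequence.

```python
import numpy as np

def distance_metrics(x, y):
    """Manhattan, Euclidean, cosine and Chebyshev metrics for two vectors, Eqs. (9)-(12)."""
    manhattan = np.abs(x - y).sum()
    euclidean = np.sqrt(((x - y) ** 2).sum())
    cosine_dist = 1.0 - x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    chebyshev = np.abs(x - y).max()
    return np.array([manhattan, euclidean, cosine_dist, chebyshev])

def mean_distance(healthy_repr, unknown_repr):
    """Average of the min-max normalized metrics over a sequence, cf. Eq. (13)."""
    d = np.array([distance_metrics(h, u) for h, u in zip(healthy_repr, unknown_repr)])
    d_normed = (d - d.min(axis=0)) / (d.max(axis=0) - d.min(axis=0) + 1e-12)
    return d_normed.mean(axis=1)            # one averaged distance per time step
```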
According to EWMA and normalized average distance,
HI is established for the gearbox. The weighted moving average assigns different weights to observed values for the subsequent calculation of the moving average. The forecast value is then determined based on the most recent moving average value. In the case of EWMA, the weighting coefficients decrease exponentially over time, giving greater weight to values closer to the current time.
HI is calculated by Equation (14), where the penalty factor $\lambda$ is set to 0.2:
$HI_t = \lambda \bar{d}_t + (1 - \lambda) HI_{t-1}$ (14)
Furthermore, the calculation results are processed using median filtering to remove noise, employing a filter window size of 100.
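A sketch of the HI construction of Equation (14) followed by the median filtering step described above (penalty factor 0.2; the paper's window of 100 is approximated by the nearest odd size required by the filter, and the initial HI value is an assumption).

```python
import numpy as np
from scipy.signal import medfilt

def health_index(mean_distances, penalty=0.2, filter_window=101):
    """EWMA-based HI (Eq. (14)) followed by median filtering to suppress noise."""
    hi = np.zeros_like(mean_distances, dtype=float)
    hi[0] = mean_distances[0]                     # assumed initialization
    for t in range(1, len(mean_distances)):
        hi[t] = penalty * mean_distances[t] + (1 - penalty) * hi[t - 1]
    return medfilt(hi, kernel_size=filter_window)  # median filter, window of about 100
```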
2.4. Failure Threshold Based on Statistical Process Control
Statistically, SPC relies on statistical analysis to monitor the production process in real time, aiming to scientifically differentiate between common-cause and special-cause variation in product quality. Assuming that quality characteristic X follows a normal distribution with mean $\mu$ and standard deviation $\sigma$, the probability P of the quality characteristic falling within a certain range $[a, b]$ is calculated as Equation (15):
$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$ (15)
where $f(x)$ represents the probability density function of the normal distribution. The HI in this paper can be assumed to follow a normal distribution. Therefore, the probability that the quality characteristic lies within a specific interval can be calculated using Equation (15).
In the practical application of this study, which addresses condition monitoring for wind turbine gearboxes, a confidence level of 0.99 is adopted. This choice is based on historical experience and aims to ensure the method’s generalizability. That is, the probability that the HI lies within a reasonable confidence interval needs to be 0.99.
When defining the failure threshold, an upper bound is required, so the interval in which the quality characteristic lies should have an upper limit. According to SPC and the integration rule for the normal distribution, the probability P of the quality characteristic falling within the range $(-\infty, \mu + v\sigma]$ is shown in Equation (16):
$P(X \le \mu + v\sigma) = \int_{-\infty}^{\mu + v\sigma} f(x)\,dx$ (16)
where $v$ is a coefficient to be determined. In practice, the sample mean $\bar{x}$ and standard deviation $s$ can be used to represent the mean and standard deviation of the normal distribution. When an upper bound and a confidence level of 0.99 are required, $P(X \le \mu + v\sigma) = 0.99$ must hold. It can be found that $v = 2.33$ meets this requirement. So, we have the following:
$P(X \le \bar{x} + 2.33s) = 0.99$ (17)
According to SPC, the failure condition is defined as the range that exceeds a certain threshold Th. Therefore, the calculated condition indication parameters are used to compute the threshold according to the SPC principles. Considering the definition of HI, only the upper boundary is taken into account when selecting the region boundary value. This upper boundary is defined as the failure threshold Th, as shown in Equation (18):
$Th = \bar{x} + 2.33s$ (18)
In the process of CM, the operating condition of the WT gearbox at that moment is deemed abnormal if the HI exceeds the corresponding Th. Therefore, the existing issues must be investigated before any continued operations.
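Under the SPC derivation above, the threshold of Equation (18) reduces to a simple statistic of the HI computed over a healthy reference period (that the statistics are taken over healthy-period HI values is an assumption consistent with the normal behavior model); the sketch also shows that v ≈ 2.33 corresponds to the one-sided 0.99 quantile.

```python
import numpy as np
from scipy.stats import norm

def failure_threshold(healthy_hi, confidence=0.99):
    """Upper SPC boundary Th = mean + v * std, cf. Equations (16)-(18)."""
    v = norm.ppf(confidence)                  # one-sided quantile; ~2.33 for 0.99
    return np.mean(healthy_hi) + v * np.std(healthy_hi)
```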
2.5. Adaptive Condition Monitoring for Wind Turbine Gearbox
On the basis of the normal behavior model, a WT gearbox condition monitoring method including AFS and CRGN is developed, as shown in
Figure 4.
There are three stages in the proposed methodology: adaptive feature selection, CRGN training, and online condition monitoring. Stage 1 consists of data preprocessing and adaptive feature selection. It provides graph data samples for model training in Stage 2 and for condition monitoring in Stage 3. Stage 2 involves deep learning-based training of the CRGN model proposed in this study. It supplies pre-trained models for condition monitoring in Stage 3. Stage 3 is the process of online condition monitoring using samples from Stage 1 and the pre-trained models from Stage 2.
Stage 1: The raw SCADA data is initially preprocessed to eliminate null values and other similar anomalies. Subsequently, features strongly correlated with the wind turbine gearbox are selected based on AFS. A set of selected features is then prepared, dynamically varying according to different wind turbines. Finally, a graph data sample is generated based on feature correlations and the raw data.
Stage 2: After graph data samples are prepared, Gaussian random noise is introduced into the samples to construct a mask for model training. Based on contrastive learning, offline CRGN model training is performed to develop two encoders that share initial weights, each comprising a GNN with residual connections.
Stage 3: The pre-trained model from Stage 2 is integrated into the online monitoring system. Meanwhile, according to Stage 1, graph samples are constructed using online data under unknown operating conditions. Subsequently, the predictions generated via the pre-trained model are compared with those from healthy samples to compute the distances in multi-dimensional space. Based on these distance metrics, HI for the gearbox is constructed in conjunction with EWMA. Finally, the failure threshold Th is established using SPC, and the relative magnitudes of HI and Th are compared to facilitate online CM for the WT gearbox.
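Tying the three stages together, the online decision rule can be summarized by a short loop such as the sketch below; build_graph_sample, distance_metrics, and the encoder call refer to the illustrative components sketched earlier, and the per-metric normalization and median filtering are omitted for brevity.

```python
def monitor_online(scada_windows, encoder, healthy_repr, threshold, penalty=0.2):
    """Online CM loop: flag the gearbox whenever its HI exceeds the threshold Th."""
    hi, warnings = 0.0, []
    for window in scada_windows:                          # Stage 1: one graph sample per window
        features, adjacency = build_graph_sample(window)
        z = encoder(features, adjacency)                  # Stage 2: pre-trained CRGN encoder
        d_bar = distance_metrics(healthy_repr, z).mean()  # simplified multi-metric distance
        hi = penalty * d_bar + (1 - penalty) * hi         # EWMA update, Eq. (14)
        warnings.append(hi > threshold)                   # Stage 3: compare HI with Th
    return warnings
```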
The three stages constitute a comprehensive framework for wind turbine gearbox condition monitoring, encompassing various aspects such as data acquisition, data preprocessing, feature selection, graph sample construction, graph neural network-based model training, online data integration and model configuration, similarity measurement, the development of a health index, and the determination of failure thresholds.
4. Conclusions
A novel condition monitoring framework for wind turbine gearboxes based on adaptive feature selection and a self-supervised contrastive residual graph neural network (AFS-CRGN) is proposed, including AFS for data preparation, CRGN for model training, and an HI and failure threshold for online CM. Validation using SCADA data from two wind farms in China and Portugal demonstrates that AFS enhances monitoring efficiency and reliability, while CRGN exhibits strong generalizability across different turbines and consistently delivers satisfactory performance. For gearbox anomaly detection, the framework enables effective condition monitoring and early warning of abnormalities. The accuracy and F1 score of abnormal condition identification both exceed 90%, indicating the effectiveness of the proposed method and the consistency of the data samples, and the lead time for anomaly detection and warning before a fault occurs can reach 30 to 40 h. Compared to other methods applied to the two datasets, the proposed approach shows superior performance and improved generalization.
Directly predicting anomalies in WT gearboxes in advance using simple sensor signals is challenging. However, intelligent methods remain effective for categorizing abnormal conditions into distinct patterns and extracting deep data insights through pattern recognition. In this paper, based on the correlations among various data features, the proposed CRGN model evaluates the descriptive contribution of each feature to the gearbox condition. By constructing graph samples and adjacency matrices for training, the residual connections enhance the model’s expressive power and generalizability. Additionally, adaptive model training is achieved through contrastive learning, enabling self-supervised learning on unlabeled data. Furthermore, the established quantified HI is more adaptive to variations among different data features, mitigating randomness in CM, supporting abnormality assessment, and facilitating abnormality detection and early warning.
The main limitations are the requirement for a large amount of data and the need to generalize across different wind turbines and operating conditions. In addition, we have found that the degradation modes of the main shaft bearing and the generator follow a pattern similar to that of the gearbox; such similar degradation trends suggest the method’s potential for extension to other subsystems in future research. At the same time, to apply the method in real-world industrial scenarios, improving computational efficiency and reducing computational costs while maintaining excellent condition monitoring performance will also be an objective of future efforts.