1. Introduction
Interest in wind power generation technology has increased significantly worldwide, and large-scale deployment is progressing in numerous countries, both onshore and offshore. At the 2023 Conference of the Parties to the UNFCCC (COP28), a target of tripling the global installed capacity of renewable energy was set, further strengthening the role of wind power [1]. Similar trends are also being observed in Japan. In the 7th Strategic Energy Plan proposed by the Ministry of Economy, Trade and Industry, offshore wind power is positioned as a “key driver” for making renewable energy a major power source [
2].
On the other hand, Japan faces a distinctive challenge in wind power development, as winter lightning frequently damages wind turbines, thereby hindering the sustainable expansion of wind power generation projects. Winter lightning is a frequent phenomenon along the coast of the Sea of Japan and is characterized by long-lasting currents and high energy levels [
3]. Approximately one-quarter of Japan’s wind turbines are situated along the coast of the Sea of Japan, where wind conditions are favorable; consequently, many of these turbines are exposed to the threat of winter lightning [
4]. According to a study, lightning strikes caused 58% of wind turbine accidents in Japan that resulted in insurance claims [
5]. This indicates that more than half of all major wind turbine accidents covered by insurance are caused by lightning strikes.
Against this backdrop of challenges, Japan has been developing legal frameworks to mitigate damage to wind turbines caused by lightning strikes. The most fundamental regulations on lightning protection for wind turbines are set out in the “Interpretation of Technical Standards for Wind Power Generation Facilities” [
6]. This regulation mandates that all wind turbines in Japan be equipped with Lightning Detection Systems (LDSs) and requires that they be immediately stopped in the event of a lightning strike. This allows for the safe stoppage of wind turbines struck by lightning, thereby reducing the risk of severe damage such as blade breakage [
7]. Wind turbines stopped due to lightning strikes are typically inspected by workers to verify their soundness before being restarted. Since 2024, inspections using digital technologies such as AI and drones have been legally permitted, which is expected to lead to more sophisticated and efficient inspection methods [
8].
Nevertheless, these regulations alone have not been sufficient to eliminate the risk of lightning damage to wind turbines; therefore, there is a need to develop a lightning damage detection system that operates rapidly and accurately. An investigation report [
7] indicates that minor lightning damage may go undetected by visual inspection, potentially leading to severe failures after turbine restart. Moreover, the current operational practice tends to be overly conservative, as turbines are stopped after any lightning strike regardless of actual damage. This approach, combined with the time required for site access and inspection, often results in prolonged downtime and reduced availability [
9]. For these reasons, the negative impact of lightning strikes on wind turbine operation remains significant; hence, there is an urgent need to develop technologies for rapid and accurate remote monitoring of turbine soundness.
A large volume of research has been dedicated to the development of methodologies for the detection and location of faults in wind power plants [
10]. As an IoT-based method, Liu et al. proposed a novel iterative nonlinear filter for the detection of bearing faults in wind turbines. This approach accurately identified a naturally damaged inner-race fault in a 15-year-old in-service bearing, outperforming conventional filtering and Hilbert-envelope methods [11]. As a data-driven technique, Wang et al. proposed a multi-channel CNN for wind-turbine blade and pitch-angle fault diagnosis, achieving up to 87.8% accuracy on four FRP-blade states using real vibration data from a laboratory-scale test turbine [12]. These studies indicate the need for technology capable of automatically monitoring faults in remote wind turbines.
One method for verifying the soundness of wind turbines involves the use of a Supervisory Control and Data Acquisition (SCADA) system. Because SCADA systems are already installed on all turbines, the soundness of wind turbines can be expected to be monitored without additional equipment [
13] are already installed on all turbines. Several methods have been proposed for detecting anomalies in wind turbines based on SCADA data, including approaches using feature selection and support vector regression [
14], feature extraction through clustering and dimensionality reduction combined with deep learning [
15], and change-point detection techniques [
16]. However, attempts to detect lightning damage have not yet been put to practical use. Our research group is currently investigating methods for the rapid and accurate detection of lightning damage to wind turbines using anomaly detection techniques based on SCADA data. These studies have identified wind speed, rotational speed, and power as effective features for detecting lightning damage [
17]; moreover, anomaly detection models such as the Gaussian Mixture Model (GMM) and autoencoders have been proposed [
18,
19].
Figure 1 illustrates the conceptual flow of a wind turbine lightning protection system based on SCADA data, which represents the ultimate objective of this study. After a lightning strike, the turbine is automatically restarted in de-rated operation, during which it is immediately stopped if an anomaly is detected. If no anomaly is detected, it is then operated in rated mode, where anomaly detection continues to be applied. This process enables turbines to be restarted faster after lightning strikes while maintaining safety, thereby improving their availability.
However, the present method cannot accommodate multiple operating modes of wind turbines including de-rated operation [
19,
20]. The anomaly detection model needs to be robust to variations between rated and de-rated operation, because mode switching is independent of the structural condition of the wind turbine. However, it is difficult to achieve this robustness solely by training the model on the available data, because de-rated operation is less frequent and its data are scarcer than those for rated operation.
This study proposes an anomaly detection method based on a “scaled power curve” for detecting lightning damage in wind turbines under de-rated operation. The scaled power curve, normalized by the maximum power limit, aligns power curves across operating modes and provides a feature robust to such variations. Accordingly, anomaly detection based on this feature enables robust performance under multiple operating modes. In addition, a GMM is employed as a simple and effective method for SCADA-based anomaly detection to clearly evaluate the effectiveness of the proposed feature [
18,
21].
The structure of this paper is as follows.
Section 2 describes the SCADA data used in this study and the wind turbine accident.
Section 3 describes the process and the evaluation methods used for anomaly detection.
Section 4 presents the results of anomaly detection and compares them with those obtained without using the scaled power curve.
Section 5 is dedicated to a discussion of the feasibility of this method based on the results obtained. Finally,
Section 6 provides the conclusions of the paper.
3. Lightning Damage Detection Method
3.1. Workflow
Figure 3 shows the workflow of the proposed method. As illustrated in
Figure 3, the method consists of a training process and a test process. Both processes were implemented in Python 3.9 using the scikit-learn library.
The training process is conducted in the following sequence. First, the scaled power curve is calculated from the training dataset and used as a feature (see
Section 3.2). Second, pre-processing is applied to these features (see
Section 3.2). Third, the Gaussian Mixture Model (GMM) is trained using the pre-processed features (see
Section 3.3). Fourth, the trained GMM is used to calculate the anomaly scores of the training data, in order to determine the threshold. The anomaly threshold was set to the 99.9th percentile of the training scores, corresponding to a false-alarm rate of 0.1%. This value was selected to minimize unnecessary turbine shutdowns, as frequent false alarms increase downtime and maintenance costs. Therefore, a 0.1% false-alarm rate was considered an appropriate trade-off between detection sensitivity and operational reliability.
The test process is conducted in the following sequence. First, the scaled power curve is calculated from the test dataset using the same method as in the training process, after which pre-processing is applied. Second, the extracted features are input into the trained GMM to calculate the anomaly score. Finally, data points are detected as abnormal if their anomaly scores exceed the threshold defined during the training process.
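The thresholding and detection steps of this workflow can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the nearest-rank percentile convention are assumptions, and the toy scores stand in for anomaly scores produced by a trained GMM.

```python
import math

def percentile_threshold(train_scores, q=99.9):
    """Return the q-th percentile of the training scores (nearest-rank rule, an assumed convention)."""
    ordered = sorted(train_scores)
    # Nearest rank: smallest value with at least q% of the samples at or below it.
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]

def detect(scores, threshold):
    """Flag a sample as abnormal when its anomaly score exceeds the threshold."""
    return [s > threshold for s in scores]

# Toy training scores (hypothetical values): 0.000, 0.001, ..., 0.999
train_scores = [i / 1000.0 for i in range(1000)]
threshold = percentile_threshold(train_scores)

# Toy test scores: one clearly normal, two above the threshold
flags = detect([0.5, 0.9995, 2.0], threshold)
```

With the threshold at the 99.9th percentile, at most about 0.1% of the training samples are flagged, which is the false-alarm rate described above.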
3.2. Feature Extraction: Scaled Power Curve
In this study, the scaled power curve was calculated from SCADA data and used as a feature. The scaled power curve is a normalized representation that aligns the power curves during both rated and de-rated operation. The calculation of the scaled power curve assumes that the power curve of the wind turbine follows Equation (1) below [
23].
Here, P denotes the output, vw denotes the wind speed, and Pmax denotes the maximum allowable output. In addition, CP is the power coefficient, which depends on factors such as blade shape; ρ is the air density; A is the rotor area of the wind turbine; and K is assumed to be a constant. Pmax varies depending on the operating mode of the wind turbine and can range from 0 kW to the rated output. The power curve (i.e., the vw-P characteristic) can therefore be normalized by dividing P by Pmax and vw by the cube root of Pmax, which yields a scaled power curve that is independent of the value of Pmax.
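For reference, a form of Equation (1) consistent with the definitions above can be written as follows. This is a reconstruction under the assumption of a cubic power law capped at the maximum allowable output, with K = ½·CP·ρ·A; the notation P_max, ρ, P*, and v_w* is assumed, and the expression should not be read as the authors' exact formula.

```latex
% Assumed form of Equation (1): cubic power law capped at the maximum allowable output
P = \min\!\left( \tfrac{1}{2}\, C_P\, \rho\, A\, v_w^{3},\; P_{\max} \right)
  = \min\!\left( K\, v_w^{3},\; P_{\max} \right),
\qquad K = \tfrac{1}{2}\, C_P\, \rho\, A

% Scaled variables: dividing P by P_max and v_w by the cube root of P_max
P^{*} = \frac{P}{P_{\max}}, \qquad
v_w^{*} = \frac{v_w}{P_{\max}^{1/3}}
\quad\Longrightarrow\quad
P^{*} = \min\!\left( K\, (v_w^{*})^{3},\; 1 \right)
```

Under this assumption, the scaled relation no longer contains P_max, so rated and de-rated operation collapse onto the same curve with a saturation value of 1.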
The scaled power curve can be calculated using the flowchart shown in Figure 4. Pmax may be determined either by the upper limit of the power or by the upper limit of the rotational speed. Consequently, as shown in Figure 4, the limit given by the upper limit of the power is compared with the limit given by the upper limit of the rotational speed, and the lower value is adopted. Note that Figure 4 uses a function that returns Pmax when the upper limit of the rotational speed is given; this function must be approximated in advance from the SCADA data. In this study, linear regression was performed on the Nlim − P characteristics of the normal data to obtain an approximate equation. The calculated P* and vw* are then entered into the GMM.
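The calculation described above can be sketched in a few lines. This is a hedged illustration only: the function names, the linear coefficients `slope` and `intercept`, and the numerical values are hypothetical, and the cube-root scaling of the wind speed assumes a cubic power law below saturation (with K = 1 in the toy samples).

```python
def p_from_speed_limit(n_lim, slope, intercept):
    """Hypothetical linear approximation mapping a rotational-speed limit to a power limit."""
    return slope * n_lim + intercept

def scaled_power_curve(p, v_w, p_lim, n_lim, slope, intercept):
    """Return (P*, vw*) for one SCADA sample."""
    # Adopt the lower of the two candidate limits as the maximum allowable output.
    p_max = min(p_lim, p_from_speed_limit(n_lim, slope, intercept))
    if p_max <= 0:
        raise ValueError("p_max must be positive")
    p_star = p / p_max                     # scaled power
    v_star = v_w / p_max ** (1.0 / 3.0)    # scaled wind speed (assumes P proportional to vw^3)
    return p_star, v_star

# Rated operation: the power limit (2000 kW) is the binding limit.
rated = scaled_power_curve(p=1000.0, v_w=10.0, p_lim=2000.0,
                           n_lim=2500.0, slope=1.0, intercept=0.0)

# De-rated operation: the power limit is lowered to 1000 kW.
v_d = 10.0 * 0.5 ** (1.0 / 3.0)
derated = scaled_power_curve(p=v_d ** 3, v_w=v_d, p_lim=1000.0,
                             n_lim=2500.0, slope=1.0, intercept=0.0)

# Case where the rotational-speed limit gives the lower value (800 kW).
speed_limited = scaled_power_curve(p=400.0, v_w=5.0, p_lim=2000.0,
                                   n_lim=800.0, slope=1.0, intercept=0.0)
```

In this toy case the rated and de-rated samples map to the same scaled point, which is the alignment across operating modes that the feature is intended to provide.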
Figure 5 compares the normal power curve and the scaled power curve. It is evident that the (a) normal power curve exhibits multiple saturation values. The presence of these saturation values is due to differences in operation modes, and the saturation value of around 1000 kW indicates de-rated operation. By contrast, in the (b) scaled power curve, these saturation values are scaled to 1, thus demonstrating a uniform distribution irrespective of the operating modes. The utilisation of scaled wind speed and scaled power as features is anticipated to facilitate robust anomaly detection, independent of operating modes.
In this study, data points corresponding to periods of inactivity were excluded from the extracted features as part of pre-processing. This is because the study focuses on anomaly detection in wind turbines under operating conditions. Here, data with power less than 1 kW were excluded.
3.3. Gaussian Mixture Model
The GMM is an unsupervised learning method commonly employed for anomaly detection [
24]. The GMM estimates the probability distribution of normal data using a weighted linear combination of multiple Gaussian distributions, thereby enabling the calculation of data likelihoods. Abnormal data have a low probability of occurrence, resulting in a low likelihood; this property can be exploited for anomaly detection.
The basic equation of the GMM is shown in Equation (2) below. In Equation (2), p denotes the probability distribution approximated by the GMM, N denotes the multivariate Gaussian distribution, and x denotes a d-dimensional random variable. Equation (2) shows that the probability distribution is approximated as a weighted sum of K Gaussian distributions. Here, μk and Σk denote the mean vector and covariance matrix of the k-th Gaussian distribution, respectively, and πk denotes its weight. The parameter set {πk, μk, Σk} is denoted by θ, and these parameters constitute the model parameters of the GMM. The optimal value of θ is estimated using the EM algorithm [25] by maximizing the log-likelihood function shown in Equation (3). Note that the value of K must be determined before executing the EM algorithm.
The anomaly score for a given data point x is calculated using Equation (4) below, based on the likelihood output by the trained GMM. As Equation (4) shows, the greater the deviation of x from the distribution of the training data, the higher the anomaly score. The data point x is classified as normal or abnormal by comparing this anomaly score with a threshold.
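For reference, the standard textbook forms corresponding to Equations (2) to (4) are given below. The notation (π_k, μ_k, Σ_k, θ, and the training samples x_n, n = 1, …, N) and the choice of the negative log-likelihood as the anomaly score are assumptions based on common usage, not a transcription of the authors' equations.

```latex
% Mixture density, cf. Equation (2)
p(\mathbf{x} \mid \theta) = \sum_{k=1}^{K} \pi_k \,
    \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right),
\qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0

% Log-likelihood maximized by the EM algorithm, cf. Equation (3)
\ln L(\theta) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k \,
    \mathcal{N}\!\left(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right)

% Anomaly score as negative log-likelihood, cf. Equation (4)
a(\mathbf{x}) = -\ln p(\mathbf{x} \mid \theta)
```

With this definition, a data point lying in a low-density region of the fitted mixture receives a large anomaly score.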
The GMM settings utilised in this study are delineated in
Table 3 below. The GMM was defined using scikit-learn: an open-source machine learning library for the Python programming language. The hyperparameter
K was set to 128 based on previous research [
21], as it must be determined prior to anomaly detection. A sensitivity analysis with respect to
K was also conducted, as described in
Section 3.4. The remaining parameters were set to their default values provided by the implementation library, as these settings are widely used and ensure stable convergence.
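A minimal scikit-learn sketch of this configuration is shown below. It is an illustration under stated assumptions, not the authors' implementation: the training data are synthetic stand-ins for scaled features, K = 8 is used only because the toy data set is small (the study uses K = 128), `random_state` is fixed for reproducibility, and the anomaly score is taken as the negative log-likelihood returned by `score_samples`.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for pre-processed (vw*, P*) features of normal operation:
# a cubic curve saturating at 1, plus small measurement noise.
v_star = rng.uniform(0.2, 1.2, size=2000)
p_star = np.minimum(v_star ** 3, 1.0) + rng.normal(0.0, 0.02, size=2000)
X_train = np.column_stack([v_star, p_star])

# All other hyperparameters are left at the library defaults.
gmm = GaussianMixture(n_components=8, random_state=0)
gmm.fit(X_train)

def anomaly_score(model, X):
    """Negative log-likelihood of each sample under the fitted mixture."""
    return -model.score_samples(X)

# Threshold at the 99.9th percentile of the training scores.
threshold = np.percentile(anomaly_score(gmm, X_train), 99.9)

X_test = np.array([[0.8, 0.8 ** 3],   # on the normal curve
                   [1.1, 0.1]])       # high wind but very low power
is_abnormal = anomaly_score(gmm, X_test) > threshold
```

The second test point imitates a turbine producing far less power than the wind speed would suggest, so it lies in a low-density region of the fitted mixture.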
3.4. Evaluation Method
In this study, the performance of the anomaly detection was evaluated using a confusion matrix, the Precision–Recall (PR) curve, Average Precision (AP), Receiver Operating Characteristic (ROC) curve, and Area Under the Curve (AUC). This subsection provides an overview of these evaluation methods.
The confusion matrix is a table that represents the results of anomaly detection, categorized into the four events shown in
Figure 6. Since test data used for anomaly detection are typically highly imbalanced, with abnormal samples being extremely scarce relative to normal data, a confusion matrix is required to evaluate performance separately for each class.
Recall and precision are defined by Equation (5) using the values in the confusion matrix [
26]. Recall indicates the proportion of actual abnormal data that are correctly detected, whereas precision indicates the proportion of detected data that are actually abnormal. Recall and precision exhibit a trade-off relationship: lowering the threshold increases recall, whereas it decreases precision.
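The definitions of recall and precision can be illustrated with a short sketch. The function names and the toy labels are hypothetical; the label 1 denotes abnormal (positive) and 0 denotes normal.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN with 1 = abnormal (positive) and 0 = normal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy example: three abnormal samples, five normal samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
```

Here two of the three abnormal samples are detected and one normal sample is falsely flagged, so both precision and recall equal 2/3.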
The PR curve illustrates the relationship between recall and precision as the threshold varies [
26]. AP represents the average precision calculated from the PR curve: a value closer to 1 indicates higher anomaly detection performance. Calculating the AP does not require setting a threshold, thereby allowing the evaluation of anomaly detection performance without relying on a single threshold value.
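As an illustration of a threshold-free metric, AP can be computed directly from ranked anomaly scores. The sketch below uses the step-wise definition (the sum of precision values weighted by the increase in recall), which is the definition used by scikit-learn's `average_precision_score`; whether the study used exactly this definition is an assumption, and tied scores are not handled.

```python
def average_precision(y_true, scores):
    """Step-wise AP: mean of the precision values at the rank of each positive sample."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive sample is required")
    # Rank samples from the highest to the lowest anomaly score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            ap += tp / rank   # precision at the point where recall increases
    return ap / n_pos

# Toy example: positives ranked first and third.
ap_mixed = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
# Perfect ranking: all positives above all negatives.
ap_perfect = average_precision([1, 1, 0, 0], [0.9, 0.8, 0.7, 0.1])
```

A perfect ranking yields an AP of 1, and each normal sample ranked above an abnormal one lowers the value.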
The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) as the threshold varies [27]. Equation (6) defines TPR and FPR. The area under the ROC curve is defined as the AUC. The more accurate the anomaly detection model, the larger the AUC.
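The construction of the ROC curve and the AUC can be sketched as follows, using the usual definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The function names and toy data are hypothetical, and the threshold sweep flags samples whose score is greater than or equal to each candidate threshold, which is one common convention.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the threshold over all score values."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Toy example: one normal sample is ranked above one abnormal sample.
points = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
area = auc(points)
```

In this toy case three of the four abnormal-normal pairs are ranked correctly, which corresponds to an AUC of 0.75.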
In addition, a sensitivity analysis of the GMM hyperparameter K was conducted by varying K (32, 64, 128, and 256) to evaluate its effect on anomaly detection results.
5. Discussion
5.1. Evaluation and Comparison
This subsection provides a comparison between the results obtained using the scaled power curve as a feature and those obtained using the simple power curve. Note that the only difference between the two methods lies in the features; all other parameters are kept identical.
First,
Figure 10 shows the anomaly detection results for three scenarios—during a lightning strike accident, during normal rated operation, and during normal de-rated operation—presented as a confusion matrix.
Figure 10a shows the result of the conventional method, obtained using the power curve as a feature.
Figure 10b shows the result of the proposed method, obtained using the scaled power curve as a feature.
With the conventional method, 226 false positives were observed, whereas with the proposed method, the number was reduced to 20. In this case, the scaled power curve led to a substantial reduction in false positives. Note, however, that the proposed method shows an increase in false negatives compared with the conventional method. Furthermore, the period between the first and ninth lightning strikes was excluded from the evaluation because the exact timing of the blade damage is unknown in this study. Therefore, the number of false negatives in both results may be underestimated.
Next,
Figure 11 presents the power curve along with the results of anomaly detection based on data obtained during de-rated operation.
Figure 11a,c,e show the results of the conventional method, obtained using the power curve as a feature.
Figure 11b,d,f show the results of the proposed method, obtained using the scaled power curve as a feature. Note that Figure 11b,d,f are reproductions of Figure 9c,d,e, respectively.
As illustrated in
Figure 11a,c, a total of 212 false positives were observed when the power curve was used as a feature. This is because the anomaly detection model mistakenly identifies the upper limit value that appears during de-rated operation as abnormal. The anomaly score continues to exceed the threshold at the saturation values during de-rated operation, as shown in
Figure 11e. Accordingly, power-curve-based anomaly detection methods cannot be employed during de-rated operation.
By contrast, only 14 false positives were observed when the scaled power curve was used as a feature, as illustrated in Figure 11b,d. Using the scaled power curve as a feature therefore allows the saturation values observed during de-rated operation to be correctly classified as normal. Furthermore, the anomaly score remains below the threshold, except for the periods immediately before and after the stoppage or restart of the wind turbines. These results suggest that the scaled power curve may be less sensitive to operating modes in the present case.
Figure 12 shows the anomaly detection results for three scenarios—during a lightning strike accident, during normal rated operation, and during normal de-rated operation—presented as a PR curve and ROC curve. In
Figure 12, the black line represents the results of the conventional method, obtained using the power curve, whereas the red line represents the results of the proposed method, obtained using the scaled power curve.
Figure 12 demonstrates that the proposed method detects anomalies better than the conventional method. Compared with using the power curve as a feature, using the scaled power curve yields an improved PR curve and an AP that is higher by 0.114. It also yields an improved ROC curve and an AUC that is higher by 0.062. These results demonstrate that the use of the scaled power curve as a feature enables more accurate anomaly detection.
Table 4 presents the recall, precision, AP and AUC of the conventional and proposed methods for each value of
K. Compared with the conventional method, the proposed method shows improved anomaly detection accuracy, as evidenced by the higher precision, AP, and AUC values. Notably, precision improved while recall remained at a comparable level. The present results also exceed those of a previous study [20] that used the same dataset: the maximum AUC achieved there was 0.803, whereas the proposed method attains a maximum AUC of 0.868. These findings provide preliminary evidence that the scaled power curve may be useful for detecting lightning damage.
5.2. Feasibility Studies and Issues
The scaled power curve is likely to be useful for the lightning protection system shown in
Figure 1, as it is an effective way of detecting lightning damage to wind turbines as discussed in
Section 5.1. In addition, with an average calculation time of 0.122 ms for training and 0.105 ms for testing (on a 13th Gen Intel Core i7-13700KF CPU, Intel Corporation, Santa Clara, CA, USA), this method is feasible for real-time use. Because stopping wind turbines when lightning strikes is already standard practice in Japan, introducing this method, which detects anomalies based on the scaled power curve, in conjunction with an LDS and a SCADA system is expected to enable safer and more efficient stopping of wind turbines. Furthermore, because anomaly detection allows operation to be restarted rapidly without the need for inspection, the availability of the wind turbines is expected to improve.
On the other hand, the implementation of the scaled power curve as a feature for anomaly detection led to an increase in false negatives and a decrease in recall. It is believed that this is a limitation of the GMM employed in this method. GMM is a simple anomaly detection method, characterized by a lower modelling expressiveness than more recent methods. Consequently, the model exhibits a delayed response to anomalies and is susceptible to overlooking them. The scaled power curve can serve as an input feature for other anomaly detection models. Future work will be required to integrate it with more advanced anomaly detection models.
In practical applications, it is presumed that false positives can be eliminated by stopping the wind turbines only when consecutive anomalies are detected. As an example, consider an operational rule under which the wind turbine is stopped when the anomaly score exceeds the threshold continuously for 5 min. First,
Figure 13b enlarges the temporal variation in the anomaly score shown in
Figure 7e. The period during which the threshold is exceeded continuously for 5 min is highlighted in red. For comparison, the result obtained using a simple power curve is shown in
Figure 13a. As is evident from a comparison between
Figure 13a,b, a larger number of intervals are identified as abnormal when the scaled power curve is used than when the simple power curve is applied. However, because the wind turbine is stopped at the moment when the threshold is exceeded for 5 consecutive occurrences for the first time, it is confirmed that the turbine can be safely stopped by either method.
Second,
Figure 14 enlarges the temporal variation in the anomaly score shown in
Figure 9e and
Figure 11e under normal de-rated operation. The period during which the threshold is exceeded continuously for five minutes is highlighted in red. Because
Figure 14 uses data obtained during normal operation, it is desirable that all data be classified as normal. However, in
Figure 14a, the threshold is exceeded for 5 consecutive occurrences during many intervals, which would cause unnecessary stoppage of the wind turbine. This indicates that practical operation is infeasible with this method. In contrast, in
Figure 14b, the threshold is not exceeded for 5 consecutive occurrences at any time. Therefore, it is confirmed that practical operation is feasible when the scaled power curve is employed.
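The operational rule discussed above can be sketched as a simple run-length check. This is an illustrative sketch: the function name and the toy scores are hypothetical, and it assumes one SCADA sample per minute so that five consecutive exceedances correspond to 5 min.

```python
def first_stop_index(scores, threshold, n_consecutive=5):
    """Return the index at which the threshold has been exceeded for
    n_consecutive samples in a row for the first time, or None if it never happens."""
    run = 0
    for i, s in enumerate(scores):
        # Extend the current run on an exceedance, otherwise reset it.
        run = run + 1 if s > threshold else 0
        if run >= n_consecutive:
            return i
    return None

# Sustained exceedance: the turbine would be stopped at index 8.
sustained = first_stop_index([0, 2, 2, 0, 2, 2, 2, 2, 2, 0], threshold=1)

# Isolated exceedances only: no stop is triggered.
isolated = first_stop_index([2, 0, 2, 0, 2, 2, 0, 2, 2, 2], threshold=1)
```

Short bursts of false positives, such as those around a stoppage or restart, do not trigger a stop under this rule, whereas a sustained anomaly does.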
In addition, the reliability of the model’s anomaly detection performance must be improved before practical implementation. This study is a case study that examines only a single wind turbine accident; therefore, further validation is required in future work. However, lightning strikes rarely cause major wind turbine accidents, and very few SCADA records of such accidents are stored in a format that can be analyzed. This scarcity of data constitutes an important limitation of the present study. To verify the applicability of this method to wind turbines under different conditions, further data collection and investigation are required. For example, the proposed method illustrated in Figure 1 may be applied in conjunction with conventional visual inspections, thereby enabling the collection of SCADA data with ground-truth labels while maintaining safety. Meanwhile, because the method uses the power curve as a feature, it is expected to be applicable without retraining to wind turbines in the same wind farm as that used for validation, provided that they are from the same manufacturer and operate under identical wind conditions.
Another issue is the need for a more rigorous definition of the scaled power curve. In this study, the air density and the rotor area of the wind turbine were assumed to be constant; however, the air density varies depending on location, season, and time of day. By incorporating temperature and atmospheric pressure as features and subsequently defining the scaled power curve with air density as a variable, features robust to variations in air density can be obtained. However, it should be noted that, to obtain these results, the threshold must be set to at least the 99th percentile.
6. Conclusions
This paper reports our findings on a lightning damage detection model based on SCADA data, which is applicable even to wind turbines under de-rated operation. First, we proposed the “scaled power curve” as a robust feature that is less affected by multiple operating modes. The scaled power curve is normalized using the maximum allowable output Pmax, thereby enabling the analysis of both de-rated and rated operation on a comparable scale. The GMM was employed for anomaly detection, after which experiments were conducted using both the scaled power curve and the original power curve.
The experimental results demonstrated that, in the field data used in this study, using the scaled power curve as a feature yielded more accurate anomaly detection than using the original power curve. The anomaly detection model using the scaled power curve correctly classified normal conditions, resulting in a significant reduction in false positives, equivalent to 206 min of false positive events. Furthermore, the PR and ROC curves also improved, with higher AP and AUC values. These results suggest that the scaled power curve may be less sensitive to operating modes in the present case.
Meanwhile, several issues were identified in anomaly detection when the scaled power curve was used as a feature. The first issue is an increase in false negatives and a corresponding decrease in recall. To address this problem, an operational countermeasure is effective in practical implementation. Specifically, it was demonstrated that anomaly detection based on the scaled power curve becomes practically feasible if an operational rule is introduced such that the wind turbine is stopped only when the threshold is exceeded five consecutive times. The second issue is the reliability of the anomaly detection model. Further data collection and analysis are required to address this issue.
Based on the findings from this study, it is expected that a lightning damage detection model applicable to wind turbines under de-rated operation can be developed. This technology enables the rapid and accurate remote assessment of wind turbine soundness after lightning strikes, allowing quicker restart and thereby improving overall availability.