1. Introduction
In 2021, 93.6 GW of wind power capacity was installed worldwide, bringing the global total capacity to 837 GW [1]. As wind turbine technology continues to evolve, sophisticated multi-MW wind turbines have been deployed in onshore and offshore wind farms [2]. However, larger wind turbines have been found to suffer more failures than smaller ones [3]. Moreover, wind farms are generally located in remote areas with harsh operating environments, and their limited accessibility leads to high operation and maintenance (O&M) costs. Statistics show that O&M costs account for 10–15% of total project costs for onshore wind farms [4] and 14–30% for offshore wind farms [5]. Therefore, reducing O&M costs is vital to enhancing the competitiveness of wind farms.
Condition monitoring and fault diagnosis of wind turbines, which aim to detect incipient faults, can improve turbine reliability and reduce O&M costs [6]. Many techniques have been proposed in recent years with some success. Vibration analysis [7,8,9], oil analysis [10], and strain measurement [11] have been widely studied. Because of the costs of mounting and maintaining additional sensors, they are mainly used to monitor the highest-cost subcomponents of wind turbines (e.g., the main bearing, gearbox, and electric generator). On the other hand, supervisory control and data acquisition (SCADA) systems have become a standard installation on large wind turbines and provide a wide range of operational signals. As a potentially low-cost and wide-coverage solution, SCADA data have been used for fault diagnosis in many studies [12,13,14]. In addition, the analysis of alarms generated by wind turbine alarm systems is a promising approach to fault diagnosis. Typically, alarms are triggered and recorded when key component signals exceed threshold limits [15], indicating that the operator must act urgently to keep the wind turbine from running into risky conditions. Alarm systems are critically important for the safety and efficiency of wind turbines. Because modern large wind turbines place high demands on condition monitoring, more and more alarm configurations are added to alarm systems, which produce large amounts of alarm data covering almost all wind turbine subcomponents. The performance of a wind turbine can therefore be monitored through proper analysis of the collected alarms.
However, it is not easy for on-site operators to diagnose wind turbine faults from alarms. Alarms typically contain descriptive information about an abnormal situation, which does not directly indicate the fault type. Moreover, once a specific fault occurs, a large number of alarms are usually triggered within a short period. The operator is easily overwhelmed because the volume of alarms exceeds the operator's response capacity. There are three main reasons for this situation. First, irrational and redundant alarm configurations are common in alarm systems [16] and cause false and repeated alarms. Second, modern turbines are highly interconnected through their mechanical structures, electrical connections, and complex control systems [17], so the propagation of faults triggers many consequential and related alarms [18]. Third, the operating conditions of wind turbines are complex and changeable, and the same fault can trigger different alarms under different operating conditions [19]. As a result, an operator overwhelmed by alarms has to rely on additional expert consultation for fault analysis.
Some researchers have focused on the use of alarms for wind turbine fault diagnosis. A feasibility study of alarm-based wind turbine diagnosis using an artificial neural network was presented in [20]. To find alarm patterns, the alarms triggered by a fault were transformed into an alarm matrix. However, the method depends exponentially on data volume, a requirement that the available fault samples can hardly satisfy. A time-sequence method and a probability-based method were proposed in [15] for analyzing alarms. Fault cases from the wind turbine converter and pitch system were used to verify the proposed methods. The results showed that both methods had the potential to rationalize alarm data and identify fault locations. However, their computation time must be reduced before they can be applied to larger datasets. An improved Apriori algorithm was proposed in [21] to analyze the alarms that occurred during a blade angle asymmetry fault. The results showed that related alarms could be integrated into one critical alarm to reduce the number of alarms. However, the accuracy of the method is limited by its dependence on sufficient sample data. A clustering analysis of alarm sequences for characterizing and classifying wind turbine stoppages was conducted in [22], but the accuracy of the clustering still requires improvement. A multi-dimensional information fusion method based on the Dempster–Shafer evidence theory was proposed in [23] and achieved a higher diagnosis accuracy for alarm sequences. The results showed that the diagnosis accuracy was affected by the quality of the fault labels recorded in maintenance records. A weighted Hamming distance was proposed and applied to the similarity analysis of alarm lists to identify the fault category [24]. It did not require a time-consuming training procedure and was easy to apply. However, the improvement in accuracy is limited by the number of labeled alarm sequences.
The studies above show that, when wind turbine faults are diagnosed from triggered alarms, the small number of fault samples and the low quality of fault labels limit the improvement of diagnosis accuracy. Both factors are related to maintenance records, because the fault types that trigger alarms are recorded there. However, due to the self-inspection function of wind turbines and the irregular working practices of operators, a large proportion of alarms have no corresponding maintenance record and therefore no recorded fault label. Consequently, actual alarm data contain few labeled alarms and many unlabeled alarms. Existing studies mainly focus on the analysis of labeled alarms. To the best of our knowledge, no research has addressed how to improve diagnosis accuracy with few labeled and many unlabeled alarms. In addition, the existing literature does not explore the relationships between individual alarms in depth. Some studies consider only the temporal order or occurrence probability of individual alarms [15,21], while others focus only on the relationship between one alarm sequence and another [20,22,24] without considering individual alarms.
To fill this gap, this paper proposes a new alarm-based fault diagnosis method for wind turbines, built on the word embedding technique and a Siamese neural network. First, the Skip-gram model from word embedding is employed to convert non-numerical alarm codes into real-valued vector representations that reflect their sequential relationships and frequencies within the alarm sequence (the Skip-gram model is described in detail in Section 3.2). The pretraining technique from word embedding is used to explore the relationships among individual alarms in unlabeled alarm data. The alarm vectors obtained from pretraining are then further optimized using labeled alarm data, so that labeled and unlabeled data are used jointly. Second, a fault type diagnostic model based on the Siamese neural network learns the similarity features among alarm sequences and produces similarity scores (the diagnostic model is described in Section 3.3). In this study, the fault type of an unknown alarm sequence is diagnosed from the similarity scores between that sequence and known alarm sequences. The overall strategy of the proposed method can therefore be divided into two steps. In the first step, a Siamese convolutional neural network with an embedding layer (S-ECNN) is proposed to distinguish different alarm sequences. In the second step, the fault category of an unknown alarm sequence is deduced from the similarity scores obtained through the S-ECNN model.
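The second step, deducing a fault category from similarity scores, can be illustrated with a minimal sketch. The Jaccard similarity below is only a placeholder standing in for the score produced by the S-ECNN model, and averaging the scores per fault type is an assumption made for illustration; the alarm codes and fault names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

AlarmSequence = Sequence[str]


def jaccard_similarity(a: AlarmSequence, b: AlarmSequence) -> float:
    """Placeholder similarity in [0, 1]; this is NOT the S-ECNN score."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def diagnose(
    unknown: AlarmSequence,
    labeled: List[Tuple[AlarmSequence, str]],
    similarity: Callable[[AlarmSequence, AlarmSequence], float] = jaccard_similarity,
) -> Tuple[str, float]:
    """Return the fault type whose known sequences are most similar on average.

    Assumption: scores are aggregated per fault type by their mean.
    """
    if not labeled:
        raise ValueError("at least one labeled alarm sequence is required")
    scores: Dict[str, List[float]] = {}
    for sequence, fault in labeled:
        scores.setdefault(fault, []).append(similarity(unknown, sequence))
    mean_scores = {fault: sum(v) / len(v) for fault, v in scores.items()}
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores[best]


# Hypothetical alarm codes and fault names, for illustration only.
known = [
    (["A1", "A2", "A3"], "pitch"),
    (["A1", "A2"], "pitch"),
    (["B1", "B2"], "converter"),
]
fault, score = diagnose(["A1", "A2", "A4"], known)
```

Any function that maps two alarm sequences to a score can be passed as `similarity`, so a trained model could replace the placeholder without changing the decision rule.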
The main contributions of this paper can be summarized as follows:
The unlabeled and labeled alarms can be collaboratively applied in the proposed S-ECNN model, which can effectively improve the fault diagnosis accuracy of wind turbines.
The potential relationships among individual alarms are captured in n-dimensional space using a word embedding method, which considers not only the alarm order but also the frequency of occurrence.
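How alarm order and frequency enter a Skip-gram model can be seen from its training pairs. The sketch below only generates (center, context) pairs from alarm sequences; it does not train embeddings. The window size and the alarm codes are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def skipgram_pairs(sequence: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for alarms within `window` positions."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


def pair_counts(sequences: Iterable[Sequence[str]], window: int = 2) -> Counter:
    """Count how often each (center, context) pair occurs across sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(skipgram_pairs(seq, window))
    return counts


# Hypothetical alarm codes, for illustration only.
pairs = skipgram_pairs(["A", "B", "C"], window=1)
counts = pair_counts([["A", "B"], ["A", "B", "C"]], window=1)
```

Alarms that are close together in a sequence form pairs (order), and alarms that co-occur in many sequences form the same pair repeatedly (frequency), so both properties shape the vectors that a Skip-gram model would learn from these pairs.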
The rest of the paper is organized as follows: Section 2 describes the background of wind turbine alarms and maintenance records; Section 3 presents the proposed fault diagnosis method; Section 4 provides the experimental verification results and discussion; and Section 5 presents the conclusions.
2. Background
In this section, a brief description of wind turbine alarms and maintenance records is given. Moreover, we analyze the control principle of a wind turbine’s main control system when it deals with alarms, which will explain why there are many unlabeled alarms and few labeled alarms.
2.1. Wind Turbine Alarms
Wind turbine alarm systems vary widely between manufacturers but generally share the same broad functionality. They monitor wind turbines’ operational variables and trigger alarms when the signals exceed threshold limits. A sample alarm list is shown in Table 1. Alarms are recorded continuously in chronological order. Each alarm record contains the turbine number, triggering time, alarm type, alarm code, alarm flag, and description. The alarm code uniquely identifies an alarm, and the alarm flag marks the start or the end of each alarm. Hence, each alarm has two records.
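Because each alarm appears as a start record and an end record, a common preprocessing step is to pair the two into a single alarm event. The following is a minimal sketch under stated assumptions: the field names, the flag values `"start"` and `"end"`, and the sample codes are hypothetical and do not reproduce the actual format of Table 1.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class AlarmRecord:
    turbine: str
    time: datetime
    code: str
    flag: str  # assumed values: "start" or "end"


@dataclass(frozen=True)
class AlarmEvent:
    turbine: str
    code: str
    start: datetime
    end: Optional[datetime]  # None if no end record was found


def pair_records(records: Iterable[AlarmRecord]) -> List[AlarmEvent]:
    """Pair start/end records of the same turbine and alarm code into events."""
    open_starts: Dict[Tuple[str, str], datetime] = {}
    events: List[AlarmEvent] = []
    for rec in sorted(records, key=lambda r: r.time):
        key = (rec.turbine, rec.code)
        if rec.flag == "start":
            open_starts.setdefault(key, rec.time)  # ignore repeated starts
        elif rec.flag == "end" and key in open_starts:
            events.append(AlarmEvent(rec.turbine, rec.code, open_starts.pop(key), rec.time))
    # Alarms that never received an end record are kept as still active.
    events.extend(AlarmEvent(t, c, s, None) for (t, c), s in open_starts.items())
    return sorted(events, key=lambda e: e.start)


# Hypothetical records, for illustration only.
records = [
    AlarmRecord("T01", datetime(2021, 3, 1, 10, 0, 0), "101", "start"),
    AlarmRecord("T01", datetime(2021, 3, 1, 10, 0, 5), "205", "start"),
    AlarmRecord("T01", datetime(2021, 3, 1, 10, 3, 0), "101", "end"),
]
events = pair_records(records)
```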
When a wind turbine experiences a fault, it can result in alterations to multiple variable values and the subsequent generation of multiple alarms. Nevertheless, these alarms, occurring in a short time frame, are not indicative of the specific fault type. As such, further analysis of the alarms is necessary to identify the underlying cause of the fault.
Furthermore, it can be observed that the alarm data are in non-numerical form. To analyze and process these data efficiently, they must be converted into numerical form. Finding a reasonable and effective transformation method is one of the problems addressed in this paper.
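As a first illustration of such a conversion, alarm codes can be mapped to integer indices, which can then serve as lookup keys for an embedding layer. This is a minimal sketch, not the transformation proposed in the paper; the `<unk>` token, the frequency-based ordering, and the alarm codes are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence

UNKNOWN = "<unk>"  # hypothetical placeholder for codes not seen when building the vocabulary


def build_vocab(sequences: Iterable[Sequence[str]]) -> Dict[str, int]:
    """Assign an integer index to each alarm code, most frequent first."""
    counts = Counter(code for seq in sequences for code in seq)
    vocab = {UNKNOWN: 0}
    # Sort by descending frequency, then by code, so the result is deterministic.
    for code, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        vocab[code] = len(vocab)
    return vocab


def encode(sequence: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Replace each alarm code by its index; unseen codes map to UNKNOWN."""
    return [vocab.get(code, vocab[UNKNOWN]) for code in sequence]


# Hypothetical alarm codes, for illustration only.
vocab = build_vocab([["E12", "E07", "E12"], ["E07", "E30"]])
encoded = encode(["E12", "E99", "E30"], vocab)
```

The indices themselves carry no meaning about how alarms relate to one another, which is why a learned vector representation is needed on top of them.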
2.2. Maintenance Records
After a wind turbine stops due to alarms, manual inspections are arranged by maintenance personnel. The technicians investigate the turbine malfunction and document the specific details in maintenance records. Consequently, the fault type or tag that triggers the alarm is recorded in the maintenance records.
Table 2 provides an example of a maintenance record. The record contains the turbine number, the start time and end time of maintenance activity, the actual faults, and the solutions to faults. However, not all faults can be found in the maintenance records. This is primarily because the wind turbine’s main control system automatically handles certain alarms.
To ensure the safety of wind turbine operation, the main control system responds to specific faults that trigger multiple alarms by performing different operations to eliminate them. The controlling principle is illustrated in Figure 1, wherein each alarm level corresponds to a particular severity of abnormality. When the alarm level is low, no operation is performed. When the alarm level is moderate, the wind turbine is restarted or reset. If the moderate-level alarm persists even after a restart or reset, the wind turbine is shut down. When the alarm level is high, the wind turbine is immediately shut down. After the shutdown, the main control system executes pre-set actions through the self-inspection function. If the alarms persist, manual maintenance is performed and the fault events are documented in the maintenance records.
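The control principle described above can be summarized as a small decision function. This is a sketch of the logic as stated in the text, not the controller's actual implementation; the enum, the function name, and the two boolean inputs are hypothetical.

```python
from enum import Enum
from typing import List, Tuple


class AlarmLevel(Enum):
    LOW = 1
    MODERATE = 2
    HIGH = 3


def handle_alarm(
    level: AlarmLevel,
    persists_after_reset: bool,
    persists_after_self_inspection: bool,
) -> Tuple[List[str], bool]:
    """Return (actions taken, whether a maintenance record is written)."""
    if level is AlarmLevel.LOW:
        return ["no operation"], False
    actions: List[str] = []
    if level is AlarmLevel.MODERATE:
        actions.append("restart or reset")
        if not persists_after_reset:
            return actions, False
    # Reached for high-level alarms and for moderate alarms that persist.
    actions.append("shutdown")
    actions.append("self-inspection")
    if not persists_after_self_inspection:
        return actions, False
    actions.append("manual maintenance")
    return actions, True


cleared_by_reset = handle_alarm(AlarmLevel.MODERATE, False, False)
needs_technician = handle_alarm(AlarmLevel.HIGH, False, True)
```

Only the last branch produces a maintenance record, so every path that returns `False` leaves its alarms without a recorded fault event.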
From the above, we can draw the following conclusions:
When a wind turbine is shut down due to alarms, manual maintenance will be performed. However, many alarms cannot cause a shutdown. Thus, the fault events that trigger these alarms are not available.
Some alarms that can cause a shutdown are eliminated by the self-inspection function and thus have no recorded fault events.
In addition, during actual maintenance activities, some maintenance details are missing because of the irregular working practices of operators, so even more alarms have no available fault events. In this paper, alarms without available fault events are called unlabeled data, whereas alarms with available fault events are called labeled data. The scarcity of labeled data makes it harder to diagnose wind turbine faults, and the unlabeled data are generally ignored by existing studies. We address both issues in this paper.
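The split into labeled and unlabeled data can be sketched as a join between alarm sequences and maintenance records. The matching rule used here (same turbine, maintenance starting within a fixed time gap after the alarm sequence) is an assumption for illustration only, as are the field names, the 24-hour gap, and the sample fault name.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AlarmSequence:
    turbine: str
    start: datetime
    codes: Tuple[str, ...]


@dataclass(frozen=True)
class MaintenanceRecord:
    turbine: str
    start: datetime
    end: datetime
    fault: str


def find_label(
    seq: AlarmSequence,
    records: List[MaintenanceRecord],
    max_gap: timedelta = timedelta(hours=24),  # assumed matching window
) -> Optional[str]:
    """Return the fault of the nearest later maintenance record, if any."""
    candidates = [
        r for r in records
        if r.turbine == seq.turbine and timedelta(0) <= r.start - seq.start <= max_gap
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.start - seq.start).fault


def split_labeled(sequences, records, max_gap=timedelta(hours=24)):
    """Split alarm sequences into (labeled, unlabeled) lists."""
    labeled, unlabeled = [], []
    for seq in sequences:
        fault = find_label(seq, records, max_gap)
        if fault is None:
            unlabeled.append(seq)
        else:
            labeled.append((seq, fault))
    return labeled, unlabeled


# Hypothetical data, for illustration only.
maintenance = [
    MaintenanceRecord("T01", datetime(2021, 3, 1, 12), datetime(2021, 3, 1, 15), "pitch fault"),
]
s1 = AlarmSequence("T01", datetime(2021, 3, 1, 9, 30), ("101", "205"))
s2 = AlarmSequence("T01", datetime(2021, 3, 5, 8, 0), ("101",))
s3 = AlarmSequence("T02", datetime(2021, 3, 1, 9, 30), ("310",))
labeled, unlabeled = split_labeled([s1, s2, s3], maintenance)
```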
5. Conclusions
This paper proposed a novel alarm-based fault diagnosis method for wind turbines built on word embedding and a Siamese convolutional neural network. To improve diagnosis accuracy, the method uses labeled and unlabeled alarm sequences jointly. For the unlabeled alarm sequences, the potential relationships among alarms were mined using the Skip-gram model to obtain n-dimensional pretrained alarm vectors. For the labeled alarm sequences, the proposed S-ECNN model, in which the pretrained alarm vectors were further optimized, extracted discriminative features to distinguish different alarm sequences. The effectiveness of the proposed method was demonstrated using actual alarm data from a wind farm in China. The accuracy of the proposed S-ECNN model in distinguishing different alarm sequences was 86.8%, higher than that of its variants, which indicates that the joint use of labeled and unlabeled alarm sequences effectively improves the distinguishing ability. The macro-average accuracy of the proposed method for fault diagnosis was 97.0%, higher than that of its variants and of three existing methods, which indicates that the proposed method effectively improves fault diagnosis accuracy. In addition, the embedding layer introduced in the proposed network makes transfer learning possible, which will be investigated in future work.
The method proposed in this paper uses word embedding to convert alarms into numerical vector representations, so that alarm sequences consisting of multiple alarms can be represented in matrix form. In industrial settings, alarms keep accumulating and are not presented as ready-made alarm sequences. Therefore, it is of great research value to investigate how to predict the next alarm code from historical alarm sequences and thereby forecast the type of failure that a wind turbine is likely to experience. Studying the relationship between the occurrence of alarms and wind turbine power and load is another worthwhile research question. Based on the numerical representation of alarm sequences, alternative time series data mining models such as Bi-LSTM [37] can be used to establish prediction models.