Effective Fault Detection and Diagnosis for Power Converters in Wind Turbine Systems Using KPCA-Based BiLSTM

Abstract: The current work presents an effective fault detection and diagnosis (FDD) technique for wind energy converter (WEC) systems. The proposed FDD framework merges the benefits of the kernel principal component analysis (KPCA) model and the bidirectional long short-term memory (BiLSTM) classifier. The developed KPCA-based BiLSTM approach involves two main steps: feature extraction and selection, in which the KPCA model extracts and selects the most effective features, and fault classification, in which the final features are fed to the BiLSTM to distinguish between the different working modes. Different simulation scenarios are considered in this study in order to show the robustness and performance of the developed technique compared to conventional FDD methods. To evaluate the effectiveness of the proposed KPCA-based BiLSTM approach, we use data obtained from a healthy wind turbine converter (WTC), into which several fault scenarios are then injected: simple fault generator-side, simple fault grid-side, multiple fault generator-side, multiple fault grid-side, and mixed fault on both sides. The diagnosis performance is analyzed in terms of accuracy, recall, precision, and computation time, with classification accuracy used as the main indicator of diagnosis efficiency. The experimental results show the efficiency of the developed KPCA-based BiLSTM technique compared to the classical FDD techniques (an accuracy of 97.30%).


Introduction
Wind energy is one of the most essential alternative energy sources owing to its competitive cost and the maturity of its technology. According to the World Wind Energy Association (WWEA), the total capacity of all wind farms worldwide reached 744 GW in 2020.
Due to the growth of wind power production, the control of wind energy conversion (WEC) systems needs to be improved. For this reason, manufacturers have focused their efforts on extending the lifetime of these systems and reducing operation breakdowns (maintenance downtime), leading to continuous energy production with high power quality [1,2].
Wind energy conversion (WEC) systems are composed of various interconnected electrical and mechanical elements. However, unexpected failures usually accompany the operation of these systems. When a fault occurs, it can adversely affect the system's availability as well as its production rate. Indeed, many components of wind turbines (WT) can fail due to harsh environmental and operating conditions, resulting in lengthy maintenance downtime [3,4]. The most common failures are related to blades [5,6], generators [7,8], power converters [1,9], and gearboxes [10,11]. As a crucial component and the heart of these systems, the power converter plays a significant role in transferring the generated power to the grid. It converts electrical energy that varies with the wind speed into energy with a constant frequency complying with grid specifications [12]. It was indicated in [13] that 21% of 25% of the total failures in WEC converters (WECC) are caused by semiconductors. In order to avoid WECC collapse, these failures should be detected and diagnosed at an early stage. Therefore, fault detection and diagnosis (FDD) is viewed as an essential means to achieve these goals [14]. The authors of [15,16] considered multiple faults in the same-side converter. They address multiple faults on both converter sides at once. The authors of [17] studied multiple faults by modeling both converter sides as a state space equation. In [18], the authors examined two open-switch faults in one sub-module and also addressed the detection of multiple faults in random sub-module elements. However, the coupling effects between the generator-side and grid-side converters are not taken into account, although they could considerably affect the system behavior. The authors in [14] focused on simple faults on both converter sides. The current work deals with faults on both converter sides, taking into consideration all possible fault scenarios: simple fault generator-side, simple fault grid-side, multiple fault generator-side, multiple fault grid-side, and mixed fault on both sides. Each scenario affects the system behavior in a different way; accordingly, considering each of them is crucial.
Generally, FDD approaches can be categorized into two main classes: model-based and data-driven methods. Model-based FDD uses observers and system identification models of the processes; it demands a precise mathematical model, which is complicated to acquire in practice, and its performance is dramatically impacted by uncertainties and unmodeled noise [19,20]. Data-driven methods extract information from the measured signals to train a model and then use this information for diagnosis in the testing phase [21][22][23]. Numerous machine learning approaches have been employed in WEC FDD, such as decision tree (DT) [24], naive Bayes (NB) [25], support vector machine (SVM) [26], K-nearest neighbors (KNN) [27], and random forest (RF) [14]. In [2], a WEC fault diagnosis technique combining RF with kernel principal component analysis (KPCA) is developed; KPCA extracts the most informative features from the data in order to improve the classification results of the RF classifier. In [24], the authors introduce five-stage statistical process control and machine learning methods to diagnose wind turbine faults (rotary blades, gearboxes, generators, and hydraulic oil systems) and predict maintenance demands. The five analytical tools adopted in statistical process control are: (1) check lists, (2) Pareto charts, (3) cause and effect diagrams, (4) scatter plots, and (5) control charts. First, the check list comprises information such as the type of wind turbine faults, the duration of faults, their causes, and repair events; the authors classified the repair events by the frequency of anomalies in the dataset. Second, a Pareto chart is built from the classified check list items and presents the repair events in terms of cumulative percentage. Third, a cause and effect diagram is used to identify the essential causes of the principal mechanical issues and to produce maintenance recommendations for technicians. Fourth, scatter plots are applied to investigate the relationships between features and to detect abnormal data. Lastly, control charts are applied to show changes and variation in the observed data over time. After that, a density-based spatial clustering of applications with noise (DBSCAN) approach is used to represent the relationship between the total wind generation and the five attributes, and to separate normal from abnormal data. Finally, two machine learning techniques, decision tree and random forest, are applied to construct predictive maintenance models for anomalies. However, traditional ML-based approaches have inherent limitations: their performance is often unsatisfactory and they generalize poorly, which makes them ineffective at representing complex functions. With the explosion of deep learning (DL) algorithms in artificial intelligence (AI) applications, DL has shown a strong ability to surpass conventional intelligent algorithms [28], whose problems include their dependence on hand-designed features and their difficulty in handling sequential data. Thus, many researchers have opted to use DL models instead of traditional classifiers in fault diagnosis. In fact, the major distinction between traditional ML models and DL models is that the latter can automatically learn useful features directly from raw data. With the rapid rise of DL, many architectures have been developed, such as convolutional neural networks (CNN), deep belief networks (DBN), and recurrent neural networks (RNN).
The authors of [29] propose an ensemble transfer CNN driven by multi-channel signals for fault diagnosis of rotating machinery. In this approach, modified CNNs based on stochastic pooling and the leaky rectified linear unit (LReLU) are pre-trained using multi-channel signals. Then, the target CNN is initialized with the parameter knowledge learned by each individual source CNN through parameter transfer. Lastly, a new decision fusion procedure is constructed to flexibly fuse the individual target CNNs into a comprehensive result. An FDD approach based on a convolutional neural network with long short-term memory and an attention mechanism (CNN-LSTM-AM) is suggested in [30] for anomaly recognition and fault detection in wind turbines. The CNN extracts state-space features of the wind turbine, the LSTM improves the fusion of the time characteristics of the different component states, and the AM helps the model make more accurate judgments through weight mapping and parameter learning. The authors of [31] propose an approach that regularizes the discriminant structure of a deep network with both intrinsic and extrinsic generalization objectives, in order to learn more robust features and to generalize to unseen domains. In [32], the authors develop improved RNN techniques for fault detection and diagnosis in wind energy conversion (WEC) systems. First, a reduced RNN based on hierarchical K-means clustering is adopted to reduce model complexity in terms of training and computation time; the clustering handles the correlations between samples and extracts a reduced number of observations from the training data matrix. Then, two reduced RNN-based interval-valued-data methods are developed for classification.
With an RNN, input sequences of variable length can be handled thanks to the recurrent hidden state, whose activation at any given time depends on that of the previous time step. Other research uses long short-term memory (LSTM) networks to learn features directly from time-series data [33]. The recursive gating architecture of the LSTM allows it to capture long-term dependencies efficiently without suffering from the vanishing gradient problem of recurrent neural networks (RNNs) [34].
In the current work, we propose an innovative fault diagnosis paradigm using a KPCA-based BiLSTM. Previously studied LSTM-based fault diagnosis approaches were applied directly to raw data, without taking into account the impact of feature extraction and selection on the classification accuracy, or the nonlinear behavior of the features. To address these issues, a KPCA-based bidirectional LSTM (KPCA-based BiLSTM) FDD approach is proposed to detect faults and distinguish between the working modes of WTC systems. The KPCA model is able to deal with noisy, nonlinear, multivariate, and statistical features [35]. Compared to other nonlinear techniques, KPCA has the advantages of not involving nonlinear optimization, not requiring prior specification of the reduced space dimension, and being able to handle a wide range of nonlinearities through the choice of different kernels [36]. Therefore, in this work, KPCA is used for feature extraction and selection and a BiLSTM model for classification, so as to address both nonlinear, statistical, and multivariate feature extraction and fault diagnosis in WTC systems.
This paper is organized as follows: Section 2 briefly describes the KPCA tool used for feature extraction and selection and the BiLSTM technique used for classification. Section 3 presents the application of the developed methodology to fault detection and diagnosis. Finally, the conclusions are presented in Section 4.

Model
Bidirectional LSTM Description
LSTM was derived from recurrent neural networks (RNN) in 1997 by Hochreiter and Schmidhuber [28] to tackle the vanishing gradient issue encountered in RNNs. To achieve this, the LSTM architecture has three gates: the input gate, the forget gate, and the output gate. Figure 1 illustrates the LSTM cell with input gate ($i_t$), forget gate ($f_t$), and output gate ($o_t$), which are described by the following equations. The forget gate ($f_t$) decides which information of the previous cell state ($C_{t-1}$) is forgotten or kept by looking at the current input vector ($x_t$) and the previous hidden state ($h_{t-1}$), as given in the following equation:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$

where $W_f$ and $b_f$ represent the weight matrix and the bias term, respectively, and $\sigma$ is the sigmoid function.
In the same way, in order to update the cell state, the input gate ($i_t$) decides how much information from the input $x_t$ and $h_{t-1}$ must pass, expressed as:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$

where $\tilde{C}_t$ denotes the candidate (intermediate) cell state.
The updated cell state (after deciding which information to retain and which to forget) is given as follows:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

where $C_t$ represents the long-term state and the symbol $\odot$ denotes element-wise vector multiplication. The output gate ($o_t$) controls the flow of information from the current cell state to the hidden state:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t)$$

where $h_t$ denotes the output. LSTM exists in several architectures [37] and may be used in the following forms: vanilla LSTM, stacked LSTM, CNN-LSTM, encoder-decoder LSTM, and bidirectional LSTM. The last of these is the focus of this study.
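The LSTM gates described above can be traced step by step in code. The following is a minimal NumPy sketch of a single LSTM time step, assuming each weight matrix acts on the concatenation $[h_{t-1}, x_t]$; the dictionary layout, the random initialization, and the sizes (10 inputs, 4 hidden units) are illustrative choices, not the configuration used in this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W maps a gate name ('f', 'i', 'c', 'o') to a weight matrix of shape
    (hidden, hidden + input) applied to [h_{t-1}, x_t]; b maps it to a bias.
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])        # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])    # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde        # element-wise update of the long-term state
    o_t = sigmoid(W["o"] @ z + b["o"])        # output gate
    h_t = o_t * np.tanh(c_t)                  # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 10, 4  # illustrative sizes only
W = {g: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for g in "fico"}
b = {g: np.zeros(n_hid) for g in "fico"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):  # a toy sequence of 5 time steps
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape)  # (4,)
```

Because $o_t \in (0, 1)$ and $\tanh(C_t) \in (-1, 1)$, every component of $h_t$ stays strictly inside $(-1, 1)$, while the cell state $C_t$ itself is unbounded; this separation is what lets the cell carry information over long horizons.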
In 2005, Graves and Schmidhuber developed the bidirectional LSTM by combining the bidirectional RNN (BRNN) with the LSTM cell. In machine condition monitoring systems, sequential data have strong temporal dependencies [38]. Thus, it is important to take into consideration the future context as well as the past [39], and the BiLSTM is an essential means of handling this case. Figure 2 illustrates the general concept of the BiLSTM architecture. The architecture used for classification is shown in Figure 3: it is composed of an input layer, a BiLSTM layer followed by a fully connected layer, and a softmax layer at the output.
BiLSTM learns from the input in both directions: forward and backward. The forward LSTM processes the data from left to right (from $t = 1$ to $T$), and its hidden state can be expressed as

$$\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}\left(x_t, \overrightarrow{h}_{t-1}\right)$$

while the backward LSTM processes the data in the opposite direction, and its hidden state can be presented as

$$\overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}\left(x_t, \overleftarrow{h}_{t+1}\right)$$

Finally, the forward and backward states are concatenated to generate the BiLSTM output, as presented in the following equation:

$$h_t = \left[\overrightarrow{h}_t ; \overleftarrow{h}_t\right]$$

The final hidden state $h_f$ encodes most of the features of the input signal and is used as input to the fully connected layer, which converts it into a vector whose length equals the number of classes. A softmax layer is then used for fault classification. The probability distribution is given as:

$$\tilde{Y} = \mathrm{softmax}\left(W_s h_f + b_s\right)$$

where $W_s$ and $b_s$ indicate the weight and bias, respectively, and the softmax function is defined as

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$

where $z_i$ is the $i$th element of the input vector $z$. The BiLSTM model is trained by minimizing the error between the predicted output $\tilde{Y}$ and the actual output $Y$.
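To make the data flow of Figure 3 concrete, here is a minimal NumPy sketch of the forward pass only (no training): one LSTM reads the sequence forward, a second reads it reversed, their last hidden states are concatenated into $h_f$, and a fully connected layer followed by softmax yields $\tilde{Y}$. Taking the last state of each direction as $h_f$, using cross-entropy as the training error, and all sizes and random weights are assumptions for illustration; in practice a deep learning framework would learn the weights by backpropagation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h, c, W, b):
    """One LSTM step; W[g] acts on [h_{t-1}, x_t] for gates g in f, i, c, o."""
    z = np.concatenate([h, x_t])
    f, i, o = (sigmoid(W[g] @ z + b[g]) for g in "fio")
    c = f * c + i * np.tanh(W["c"] @ z + b["c"])
    return o * np.tanh(c), c

def run_lstm(xs, W, b, n_hid):
    """Run one direction over the sequence and return its last hidden state."""
    h, c = np.zeros(n_hid), np.zeros(n_hid)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, b)
    return h

def softmax(z):
    e = np.exp(z - z.max())  # shifting by max(z) avoids overflow; result unchanged
    return e / e.sum()

def bilstm_classify(xs, fwd, bwd, W_s, b_s, n_hid):
    h_fwd = run_lstm(xs, fwd[0], fwd[1], n_hid)        # forward: t = 1 ... T
    h_bwd = run_lstm(xs[::-1], bwd[0], bwd[1], n_hid)  # backward: t = T ... 1
    h_f = np.concatenate([h_fwd, h_bwd])               # concatenated state, size 2*n_hid
    return softmax(W_s @ h_f + b_s)                    # predicted class probabilities

def cross_entropy(y_prob, label):
    """Loss for one sample whose true class index is `label`."""
    return -np.log(y_prob[label])

rng = np.random.default_rng(1)
n_in, n_hid, n_cls, T = 10, 8, 16, 20  # 16 classes: C0 (healthy) + C1..C15

def init_direction():
    W = {g: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for g in "fico"}
    b = {g: np.zeros(n_hid) for g in "fico"}
    return W, b

fwd, bwd = init_direction(), init_direction()
W_s = rng.normal(scale=0.1, size=(n_cls, 2 * n_hid))
b_s = np.zeros(n_cls)
p = bilstm_classify(rng.normal(size=(T, n_in)), fwd, bwd, W_s, b_s, n_hid)
print(p.shape, p.argmax())
```

The two directions have separate weights; only their outputs are shared through the concatenation, which is why the fully connected layer takes an input of size $2 \times$ `n_hid`.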

Kernel Principal Component Analysis
KPCA extends conventional PCA to handle nonlinear data [40].In fact, KPCA includes two main steps: (1) mapping the data into a higher dimensional feature space, and (2) performing the linear PCA in that space.

KPCA-Based Feature Extraction
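Consider the data matrix $X = [x_1, \ldots, x_N]^T \in \mathbb{R}^{N \times m}$, where $N$ denotes the number of samples and $m$ represents the number of variables. Each sample is mapped by a nonlinear function $\phi(\cdot)$ into the feature space, and the mapped data are organized as follows:

$$\Phi = [\phi(x_1), \ldots, \phi(x_N)]^T \in \mathbb{R}^{N \times h}$$

where $h \gg m$ is the dimension of the feature space. The following eigenvalue problem is used to compute the kernel principal components (KPCs):

$$K \alpha = \lambda \alpha$$

This equation indicates that $\alpha$ and $\lambda$ are the eigenvectors and eigenvalues of the kernel matrix $K$. The kernel matrix $K$ is defined as:

$$K = \Phi \Phi^T, \qquad K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle = k(x_i, x_j)$$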
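The kernel matrix can be formed without ever computing $\phi(x)$ explicitly. The NumPy sketch below assumes a Gaussian (RBF) kernel $k(x_i, x_j) = \exp(-\lVert x_i - x_j \rVert^2 / 2\sigma^2)$ with a hypothetical width $\sigma = 3$, since this section does not state which kernel is used; it also centres $K$ in the feature space, the usual KPCA preprocessing step, before solving $K\alpha = \lambda\alpha$.

```python
import numpy as np

def rbf_kernel_matrix(X, sigma):
    """K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) for the rows of X (N x m)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def center_kernel(K):
    """Centre the mapped data in feature space: K~ = K - 1_N K - K 1_N + 1_N K 1_N."""
    N = K.shape[0]
    one = np.full((N, N), 1.0 / N)
    return K - one @ K - K @ one + one @ K @ one

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                  # N = 50 samples, m = 10 variables
K = center_kernel(rbf_kernel_matrix(X, sigma=3.0))
lam, alpha = np.linalg.eigh(K)                 # K alpha = lambda alpha, ascending order
lam, alpha = lam[::-1], alpha[:, ::-1]         # sort eigenpairs in descending order
print(np.allclose(K @ alpha[:, 0], lam[0] * alpha[:, 0]))  # True
```

`np.linalg.eigh` is appropriate here because $K$ is symmetric; it returns eigenvalues in ascending order, hence the reversal so that the first columns correspond to the dominant KPCs.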

KPCA-Based Feature Selection
The feature selection step aims to select the smallest subset of the most relevant and expressive features by eliminating irrelevant and redundant ones. The eigenvectors $v$ in the feature space satisfy [41]:

$$\Phi^T \Phi \, v = \lambda v$$

The matrix $P = [v_1, \ldots, v_\ell]$ denotes the matrix of the $\ell$ retained principal loadings of the KPCA in the feature space. Using the eigendecomposition of the kernel matrix $K$, the matrix $P$ can be defined as:

$$P = \Phi^T P^* \Lambda^{-1/2}$$

where $P^* = [\alpha^*_1, \ldots, \alpha^*_\ell]$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_\ell)$ are the principal eigenvectors and eigenvalues of $K$, respectively.
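Moreover, the kernel principal components of a sample $x$ are computed as:

$$t = P^T \phi(x) = \Lambda^{-1/2} P^{*T} \Phi \, \phi(x) = \Lambda^{-1/2} P^{*T} k(x), \qquad k(x) = [k(x_1, x), \ldots, k(x_N, x)]^T$$

The selection of the number of KPCs has been the subject of various studies; Ref. [42] details some of them. In this work, the cumulative percent variance (CPV) criterion is used to select the first $\ell$ KPCs of the KPCA model, where

$$\mathrm{CPV}(\ell) = \frac{\sum_{j=1}^{\ell} \lambda_j}{\sum_{j=1}^{N} \lambda_j} \times 100\%$$

The features extracted from the KPCA model are these first $\ell$ retained KPCs.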
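Putting the extraction and selection steps together, the sketch below fits a KPCA model, keeps the smallest number $\ell$ of components whose CPV reaches 95% (the threshold reported in the Parameters Setting section), and projects samples with $t = \Lambda^{-1/2} P^{*T} \tilde{k}(x)$. The RBF kernel, the width `sigma`, the centring of new kernel vectors against the training data, and the function names `fit_kpca` and `transform` are assumptions made for this illustration, not details taken from the paper.

```python
import numpy as np

def rbf(A, B, sigma):
    """Gaussian kernel values k(a_i, b_j) for the rows of A and B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def fit_kpca(X, sigma, cpv_threshold=0.95):
    N = X.shape[0]
    K = rbf(X, X, sigma)
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one          # kernel centred in feature space
    lam, alpha = np.linalg.eigh(Kc)
    lam, alpha = lam[::-1], alpha[:, ::-1]              # eigenvalues in descending order
    lam = np.clip(lam, 0.0, None)                       # remove tiny negative round-off
    cpv = np.cumsum(lam) / lam.sum()                    # cumulative percent variance
    ell = int(np.searchsorted(cpv, cpv_threshold)) + 1  # smallest ell with CPV >= threshold
    return {"X": X, "K": K, "sigma": sigma,
            "P_star": alpha[:, :ell], "lam": lam[:ell]}

def transform(model, X_new):
    """KPCs t = Lambda^{-1/2} P*^T k~(x) for each row x of X_new."""
    X, K = model["X"], model["K"]
    N = X.shape[0]
    k = rbf(X_new, X, model["sigma"])                   # shape (n_new, N)
    one = np.full((N, N), 1.0 / N)
    one_new = np.full((X_new.shape[0], N), 1.0 / N)
    k_c = k - one_new @ K - k @ one + one_new @ K @ one  # centre w.r.t. training data
    return k_c @ model["P_star"] / np.sqrt(model["lam"])

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))   # stand-in for standardized measurements
model = fit_kpca(X_train, sigma=3.0)
T_train = transform(model, X_train)    # retained KPCs: the classifier's input features
print(len(model["lam"]), T_train.shape)
```

On the training data the projection reduces to $\sqrt{\lambda_j}\,\alpha_j$, a convenient sanity check. Because $\ell$ is chosen from the eigenvalue spectrum, the number of features fed to the classifier depends on the kernel width, so `sigma` would normally be tuned together with the classifier.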

Proposed Approach
The proposed methodology includes two major steps: feature extraction and selection, and fault classification. Kernel principal component analysis (KPCA) is applied for feature extraction, and the BiLSTM classifier is used for fault diagnosis. The goal of this methodology is to reduce the complexity of the classifier. The first step of the proposed methodology is the gathering of WECC data. Then, KPCA is applied to the data to extract and select the most effective and relevant features. In the next step, the final feature subset is used as input to the BiLSTM to classify faults and distinguish between the different operating modes. To summarize, the current paper presents an intelligent fault diagnosis approach based on the KPCA model and the BiLSTM classifier. Classical BiLSTM-based fault diagnosis techniques were previously applied directly to raw data, without considering the impact of the feature extraction and selection phase on the diagnosis performance. To deal with this issue, a multivariate KPCA-based bidirectional LSTM classifier approach is presented to detect and identify faults in WTC systems. In the developed FDD approach (called KPCA-based BiLSTM), the KPCA model is applied to extract nonlinear, multivariate, and statistical features, and the BiLSTM is used for fault classification.

System Description
In this paper, a variable-speed wind turbine based on a squirrel cage induction generator (SCIG) is considered, as illustrated in Figure 5. This structure offers unlimited variable-speed operation: whatever the rotation speed of the machine, the generated voltage is rectified into a DC voltage. The grid-side converter then delivers an AC voltage with a constant frequency matching that of the grid. The maximum power generated by the turbine is determined by the nominal power of the generator. In this configuration, the generator-side converter is based on insulated gate bipolar transistors (IGBT), and its structure is the same as that of the grid-side converter. The wind turbine parameters are shown in Table 1. The power converters are a crucial component of WEC systems. The authors of [43] reported that 21% of the faults in power converters are attributed to semiconductors (IGBT, diode), as shown in Figure 6. In the wind chain, the power converters have a two-level topology. Each converter is composed of three arms, and each arm includes an upper and a lower IGBT (as shown in Figure 8).

Data Collection
In order to construct a database to perform FDD, a test bench must be designed under realistic conditions. As detailed in Figure 9, the test setup should be arranged to stress the IGBT modules as they would be in a real wind turbine application. To inject short-circuits and open-circuits, a controlled switch is added either in parallel with the IGBT (short-circuit) or in series with it (open-circuit). This paper deals with several fault scenarios, each comprising different cases, as shown in Table 2.

• First scenario: simple faults that concern just one IGBT on the generator-side converter (SFGS);
• Second scenario: simple faults that concern just one IGBT on the grid-side converter (SFGrS);
• Third and fourth scenarios: in practice, there can be more than one fault on the same converter side; in this paper, we consider multiple faults on the generator side (MFGS) and on the grid side (MFGrS) separately;
• Fifth scenario: in the real world, faults may happen on both converter sides simultaneously; for that reason, we consider mixed faults (MxF);
• Sixth scenario: in order to monitor the system in all its states, we combine all the above scenarios.

Figures 10 and 11 clearly demonstrate that faults do not affect the system behavior in the same way. Some fault scenarios do not significantly affect the behavior of the system, in which case service can be maintained until the fault is isolated, as illustrated in Case 12. For example, the output power in healthy mode is almost constant, and when the fault is injected, the same power level is found with some oscillations. Other faults considerably affect the behavior of the system and are considered serious. In Case 2, for instance, the generator current reaches around 500 A, which the system cannot withstand; in this situation, the system must be taken out of service immediately.

Performance Metrics
In order to evaluate and compare performance, the following criteria are used: accuracy (%), the rate of observations correctly predicted over the total number of observations; recall (%), the proportion of observations of a given class that are correctly predicted as that class; precision (%), the number of positive observations correctly predicted divided by the total number of predicted positive observations; F1 score (%), the harmonic mean of precision and recall; and computation time (CT(s)), the time required to run the algorithm.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where TP (true positive) is the number of positive observations correctly classified, FP (false positive) the number of negative observations misclassified as positive, TN (true negative) the number of negative observations correctly classified, and FN (false negative) the number of positive observations misclassified as negative.
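In the multi-class setting, these definitions are applied one-versus-rest to each class of the confusion matrix. The sketch below is an assumption-level illustration in NumPy, not the authors' evaluation code; the first row of the toy matrix reuses the healthy-class counts reported later for Table 9 (2320 of 2500 correctly classified), while the second row is invented purely for the example.

```python
import numpy as np

def per_class_metrics(cm):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as this class but belonging to another
    fn = cm.sum(axis=1) - tp  # belonging to this class but predicted as another
    zeros = np.zeros_like(tp)
    precision = np.divide(tp, tp + fp, out=zeros.copy(), where=(tp + fp) > 0)
    recall = np.divide(tp, tp + fn, out=zeros.copy(), where=(tp + fn) > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    accuracy = tp.sum() / cm.sum()  # all correct predictions over all samples
    return accuracy, precision, recall, f1

# Row 0: healthy class C0 (2320 of 2500 correct); row 1: an invented faulty class.
cm = [[2320, 180],
      [0, 2500]]
acc, prec, rec, f1 = per_class_metrics(cm)
print(round(acc, 3), rec.round(3), prec.round(3))
```

Here the recall of $C_0$ is $2320/2500 = 92.8\%$, so the remaining 7.2% of healthy samples are false alarms, while the precision of $C_0$ stays at 100% because no faulty sample in this toy matrix is predicted as healthy.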

Parameters Setting
In this work, the 95% cumulative variance criterion is applied to select the retained KPCs, which leads to 32 KPCs being kept. Sampling noise can appear during the training process due to the complex relationships between the inputs and outputs of neural networks, leading to overfitting, which decreases the predictive capability of the model [44]. To avoid this issue, optimal hyperparameters are used in this paper (as shown in Table 3). The Adam optimization algorithm is used to decrease the error at each iteration; Adam is preferred over other optimization algorithms partly because of its relatively low memory requirement [45]. Dropout is also used: it randomly deactivates units during training so that the network does not repeatedly extract the same features, which reduces the risk of overfitting [46]. For the NN, FFNN, CFNN, and RNN classifiers, the number of hidden layers is 10 and the number of neurons in the hidden layers is 50. For the CNN classifier, we used a convolution layer, a ReLU function, a pooling layer, a fully connected layer, and a softmax layer; the CNN is trained with the cross-entropy loss function and the Adam optimization algorithm.
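For readers unfamiliar with the optimizer and the regularization technique named above, the following NumPy sketch shows the standard Adam update (with its usual default constants $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$) and inverted dropout. It is a generic illustration on a toy quadratic objective; the learning rate and dropout rate are arbitrary and do not reproduce the values of Table 3.

```python
import numpy as np

def adam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; state holds the moment estimates m, v and step count t."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad**2
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected second moment
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)

def inverted_dropout(a, p_drop, rng, training=True):
    """Zero each activation with probability p_drop and rescale the survivors."""
    if not training or p_drop == 0.0:
        return a
    mask = rng.random(a.shape) >= p_drop
    return a * mask / (1.0 - p_drop)

# Minimise f(theta) = ||theta - 3||^2, whose gradient is 2 (theta - 3).
theta = np.zeros(2)
state = {"m": np.zeros(2), "v": np.zeros(2), "t": 0}
for _ in range(2000):
    theta = adam_step(theta, 2 * (theta - 3.0), state, lr=0.05)
print(np.round(theta, 2))
```

Adam stores only two moment vectors per parameter, which is the memory cost referred to above. Rescaling by $1/(1 - p)$ during training keeps the expected activation unchanged, so dropout can simply be switched off at test time.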

Fault Classification Results
For the different FDD experiments, ten variables are measured, as listed in Table 4. The data cover 1 healthy operating condition (assigned to class $C_0$) and 15 faulty operating conditions of the WECC (assigned to $C_i$, $i = 1, \ldots, 15$), as shown in Table 5. The behavior of each mode is described by 2000 10-spaced samples, with a sampling frequency of 20 kHz for the training phase. We used 80% of the samples for the training phase and 20% for the testing phase.
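A per-mode (stratified) 80/20 split keeps all 16 operating modes represented in both sets. The sketch below uses random synthetic data as a stand-in for the ten measured variables and assumes a random split within each mode; the paper does not state how the split was drawn, and the function name `stratified_split` is hypothetical.

```python
import numpy as np

def stratified_split(X, y, train_frac=0.8, seed=0):
    """Split so that every operating mode keeps the same train/test proportion."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_train = int(round(train_frac * len(idx)))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    train_idx, test_idx = np.array(train_idx), np.array(test_idx)
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Synthetic stand-in: 16 modes (C0..C15), 2000 samples each, 10 variables.
n_modes, n_per_mode, n_vars = 16, 2000, 10
rng = np.random.default_rng(0)
X = rng.normal(size=(n_modes * n_per_mode, n_vars))
y = np.repeat(np.arange(n_modes), n_per_mode)
X_tr, X_te, y_tr, y_te = stratified_split(X, y)
print(X_tr.shape, X_te.shape)  # (25600, 10) (6400, 10)
```

For time-series measurements, a chronological split within each mode (first 80% for training, last 20% for testing) would be a reasonable alternative, since it prevents neighbouring, nearly identical samples from appearing in both sets.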

Variable | Description
$N_g$ | Generator speed (rpm)
$x_3$: $i_{sag}$ | Generator current, phase a (A)
$x_4$: $i_{sbg}$ | Generator current, phase b (A)
$x_5$: $i_{scg}$ | Generator current, phase c (A)
$x_6$: $V_{DC}$ | DC bus voltage (V)
$x_7$: $P_{out}$ | Output power (W)
$x_8$: $i_{sar}$ | Grid current, phase a (A)
$x_9$: $i_{sbr}$ | Grid current, phase b (A)
$x_{10}$: $i_{scr}$ | Grid current, phase c (A)

In this paper, various classifiers are applied, and the best classifier is selected on the basis of its classification accuracy. Table 6 illustrates the overall classification accuracy.
In Scenario 1, the faults occur in the grid-side converter and do not seriously affect the behavior of the wind system. In this case, all the developed techniques showed high diagnosis performance except CNN. However, in Scenario 2, the faults occur in the generator-side converter and considerably affect the behavior of the system, which degrades the diagnosis performance of the applied FDD techniques. In Scenarios 3, 4, and 5, the FDD techniques showed good results, with the exception of CNN. When dealing with all fault scenarios together, it is clear from Tables 6 and 7 that the BiLSTM classifier provides better classification performance than the classical methods. To further improve these results, a novel FDD approach is proposed using a KPCA-based BiLSTM, in which the most informative features are extracted and selected using KPCA and then fed to the BiLSTM for fault classification.
As shown in Table 8, the developed KPCA-based BiLSTM approach reached an accuracy rate of 97.20%, an improvement in classification performance over the standard BiLSTM. To better assess the efficiency of the proposed approach, the testing classification results are given in Table 9 as a confusion matrix (CM). The CM shows the correctly classified and misclassified samples for the healthy case ($C_0$) and the faulty cases ($C_1$ to $C_{15}$). For example, for the healthy case ($C_0$), the KPCA-based BiLSTM approach correctly identified 2320 of 2500 observations (true positives); the remaining 7.2% (180 observations) were misclassified as faulty (false alarms). For the faulty operating modes $C_5$, $C_8$, $C_9$, $C_{11}$, $C_{12}$, and $C_{13}$, the precision and recall were both 100%, with 0.0% misclassification.

Conclusions and Future Works
In this paper, an enhanced KPCA-based BiLSTM method was presented for wind energy conversion (WEC) system fault detection and diagnosis (FDD). In the proposed FDD approach, the features extracted and selected by the KPCA model are used as input to the BiLSTM for classification. The effectiveness of the proposed classifier was validated by comparing it with several classical methods, including NN, FFNN, CFNN, RNN, and CNN. To evaluate the performance of the developed KPCA-based BiLSTM approach, we used data obtained from healthy WEC converters (WECC) into which several fault scenarios were injected: simple fault generator-side, simple fault grid-side, multiple faults generator-side, multiple faults grid-side, and mixed faults on both sides. The obtained results showed the effectiveness and robustness of the proposed FDD approach in terms of accuracy, recall, precision, and computation time. Nevertheless, the proposed tools still produced some missed detections and false alarms, and some faults were not correctly classified. Thus, one future research direction is to develop adaptive BiLSTM-based tools that update the model in order to reduce misclassifications. Another future direction is to develop adaptive BiLSTM-based approaches that deal with uncertainties in WTC systems using an interval-valued data representation. Additionally, ensemble-based models, which merge multiple learning models to produce one optimal predictive model, will be developed to enhance decision-making accuracy. Furthermore, in this study, we considered a wind profile in which the mean wind speed and the pitch angle are constant. In the real world, the wind profile varies with climatic conditions; thus, a further research direction is to implement an FDD approach that takes wind variations into account.

Figure 4. Architecture of the proposed approach.

Figure 6. Common catastrophic failures of IGBT.

The usual faults in power switches involve two types of failures: wear-out failures and catastrophic failures. The first type results from long-term degradation, while catastrophic faults generally happen due to a single overstress event. This paper concerns only open-circuit and short-circuit faults, which can cause irretrievable harm to the converter system. Open-circuit faults of IGBT do not cause serious damage to the converter itself, but they influence the performance of the converter on the other side and the feedback signals in the control loop. Figure 7 classifies IGBT catastrophic failures into open-circuit and short-circuit states arising from various failure mechanisms.

Figure 9. Illustration of failure mode distribution.

Figures 10 and 11 show the behavior of some electrical and mechanical variables in different faulty cases.

Figure 10. Input torque and output power for different cases.

Figure 11. Generator current and grid current for different cases.

Table 2. Construction of database for the fault diagnosis system.

Table 4. Labeling and description of the measured and monitored system variables.

Table 5. Creation of database for the fault diagnosis system.

Table 6. Performance comparison of conventional techniques.

Table 7. Performance comparison of deep learning techniques.

Table 8. Performance comparison of different techniques.

Table 9. Confusion matrix of KPCA-based BiLSTM in testing phase.