Misalignment Fault Prediction of Wind Turbines Based on Combined Forecasting Model

Abstract: Because wind turbines work in harsh environments, various faults are prone to occur during long-term operation. Misalignment between the gearbox and the generator is a common latent fault of doubly-fed wind turbines. Compared with gear and bearing faults, misalignment faults have received relatively little prediction research, and accurately predicting their development trend remains difficult. In this paper, a combined forecasting model is proposed for misalignment fault prediction of wind turbines based on vibration and current signals. In the model, the improved Multivariate Grey Model (IMGM) predicts the deterministic trend of the fault index and the Least Squares Support Vector Machine (LSSVM) optimized by the quantum genetic algorithm (QGA) predicts its stochastic trend; another LSSVM optimized by QGA is used as a non-linear combiner. Time-domain, frequency-domain and time-frequency domain features of the wind turbine's vibration or current signals are extracted as the input vectors of the combined forecasting model, and the kurtosis index is regarded as the output. The simulation results show that the proposed combined model has higher prediction accuracy than the single forecasting models.


Introduction
Energy shortage and environmental degradation are becoming increasingly serious worldwide. Wind energy, as an environmentally friendly and renewable energy source, has attracted increasing attention [1], and the cumulative installed capacity of global wind power has steadily increased in recent years [2]. Because wind turbines are often located in remote areas with harsh working conditions, many of them fail during operation, which greatly reduces their work quality and efficiency and increases maintenance costs [3]. Therefore, effectively decreasing the risk of faults during the operation of wind turbines has become a difficult problem.
At present, doubly-fed wind turbines (DFWT) have become the main units of large-capacity wind farms [4]. Due to installation errors, deformation after loading or frequent wind speed fluctuations, misalignment between the gearbox and the generator often occurs [5]. The misalignment fault of wind turbines is a latent fault [6,7]: when it occurs in actual operation, the unit's operating parameters do not reach their early warning values immediately, but once the fault has accumulated to a certain extent, it seriously damages the unit's equipment and causes a shutdown [8]. Therefore, it is necessary to predict the latent trend of misalignment, which avoids handling the fault blindly and reduces the loss of human and material resources.
Many signals are commonly used for mechanical fault prediction, such as vibration, current, temperature and pressure signals [9][10][11][12][13]. When the equipment fails, the amplitude of the mechanical vibration increases [14], so vibration signals often reflect the operational status of the equipment more quickly and directly. Compared with vibration signals, current signals are easier to obtain and less easily affected by noise. Thus, in this paper, both the vibration signal and the current signal are used as signal sources to study the misalignment fault of wind turbines.
After the fault signals are obtained, a reasonable and effective prediction method is necessary for accurately predicting the future operational status of the equipment faults. At present, many scholars have studied the prediction techniques for different types of faults [15][16][17]. In order to determine a suitable forecasting model of misalignment fault in wind turbines, the commonly used prediction methods are summarized in Table 1. For the complex non-linear system, a single forecasting model is not enough to obtain ideal prediction results. Therefore, in order to predict the mechanical fault accurately, the combined forecasting model has attracted more and more attention from scholars. For example, in Ref. [18], the improved Grey Model (GM (1,1)) and the Back Propagation (BP) neural network optimized by Genetic Algorithm (GA) were used as the single forecasting models. The minimum sum of error squares was used as the combination principle to assign appropriate weight coefficients to them. The combined forecasting model had a smaller prediction error. Ref. [19] proposed a calculation method of combined weight coefficients for the unequal weight of error. The combined forecasting model was constructed by Multivariate Grey Model (MGM (1, n)) and Extreme Learning Machine (ELM) neural network. The combined forecasting model was more suitable for predicting the trend of the bearing fault. In Ref. [20], according to the minimum variance principle, Support Vector Machine (SVM) and grey model were combined to make up the shortcomings of single forecasting models. In Ref. [21], SVM was used as the combiner of forecasting models. The Kalman filter, BP neural network and SVM model were used as single forecasting models. The prediction errors of the single forecasting models were larger than that of the combined model. In Ref. [22], the BP neural network was used to determine the weight coefficients of each single forecasting model. 
In that work, the combined forecasting model built from GM (1,1, θ) optimized by the Particle Swarm Optimization (PSO) algorithm and SVM optimized by PSO achieved better prediction accuracy for the short-term load of a regional power grid.
Because the wind turbine is a complex non-linear system, the fault signals have both deterministic and random characteristics when a misalignment fault occurs [23]. In addition, relatively few misalignment fault samples are available in this paper. Table 1 indicates that the grey prediction model is suitable for predicting deterministic trends with few samples, while the Least Squares Support Vector Machine (LSSVM) is suitable for predicting non-linear and stochastic trends with few samples and at higher speed [24]. Therefore, the grey prediction model and LSSVM are selected as the prediction methods for the misalignment fault of wind turbines. Among the grey prediction models, MGM (1, n) can use multivariate characteristic parameters of the fault state as the inputs of the prediction model [25], which comprehensively reflects the fault state at the previous moment and yields a more accurate prediction model. Therefore, this paper uses MGM (1, n) to predict the misalignment faults of wind turbines. Because MGM (1, n) is only suitable for short-term prediction, the rolling prediction method is used to improve it. Compared with a combination using fixed weight coefficients, LSSVM can dynamically assign non-linear weight coefficients to the prediction values of the single prediction models, which makes the weight allocation more reasonable. Therefore, in this paper, a combined forecasting model is proposed that uses LSSVM optimized by the quantum genetic algorithm (QGA) as a non-linear, variable-weight combiner. The predictions of the QGA-optimized LSSVM model and of the improved Multivariate Grey Model (IMGM (1, n)) are input to the non-linear combiner to obtain the final predicted values. Vibration and current signals from a misalignment fault simulation model of a wind turbine demonstrate that the combined forecasting model has higher prediction accuracy than the single ones.

Multivariate Grey Model
In 1982, Professor Julong Deng of Huazhong University of Science and Technology proposed the grey system theory, which adapts well to small samples and uncertain systems [26]. Because the GM (1,1) model uses only a single time series and cannot reflect the interaction between multiple variables, some scholars have proposed the MGM (1, n) model, where n is the number of variables. The MGM (1, n) is not a simple combination of GM (1,1) models, but a generalization of the univariate GM (1,1) to the multivariate case, obtained by solving n differential equations simultaneously [27].
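It is assumed that $x_i^{(0)}(k)$ $(i = 1, 2, \ldots, n;\ k = 1, 2, \ldots, m)$ are n sets of time series data, where m is the number of data points in each set. The process of establishing an MGM (1, n) prediction model is as follows:
(1) Accumulate the original data to generate the new sequences $x_i^{(1)}(k)$:
$$x_i^{(1)}(k) = \sum_{j=1}^{k} x_i^{(0)}(j), \quad i = 1, 2, \ldots, n;\ k = 1, 2, \ldots, m \tag{1}$$
(2) A system of n first-order ordinary differential equations can be used to express MGM (1, n):
$$\frac{dx_i^{(1)}}{dt} = \sum_{j=1}^{n} a_{ij}\, x_j^{(1)} + b_i, \quad i = 1, 2, \ldots, n \tag{2}$$
Equation (2) can be written in matrix form as:
$$\frac{dX^{(1)}}{dt} = A X^{(1)} + B$$
where $X^{(1)} = \left(x_1^{(1)}, x_2^{(1)}, \ldots, x_n^{(1)}\right)^T$, $A = (a_{ij})_{n \times n}$ and $B = (b_1, b_2, \ldots, b_n)^T$.
(3) By the method of least squares, the parameter matrices A and B can be estimated. Assume that $a_i = (a_{i1}, a_{i2}, \ldots, a_{in}, b_i)^T$ $(i = 1, 2, \ldots, n)$. Its estimate is:
$$\hat{a}_i = (L^T L)^{-1} L^T Y_i$$
where
$$L = \begin{bmatrix} \frac{1}{2}\left(x_1^{(1)}(2) + x_1^{(1)}(1)\right) & \cdots & \frac{1}{2}\left(x_n^{(1)}(2) + x_n^{(1)}(1)\right) & 1 \\ \vdots & & \vdots & \vdots \\ \frac{1}{2}\left(x_1^{(1)}(m) + x_1^{(1)}(m-1)\right) & \cdots & \frac{1}{2}\left(x_n^{(1)}(m) + x_n^{(1)}(m-1)\right) & 1 \end{bmatrix}, \quad Y_i = \left(x_i^{(0)}(2), x_i^{(0)}(3), \ldots, x_i^{(0)}(m)\right)^T$$
The corresponding time response can be expressed as:
$$\hat{X}^{(1)}(k) = e^{\hat{A}(k-1)} X^{(1)}(1) + \hat{A}^{-1}\left(e^{\hat{A}(k-1)} - I\right)\hat{B}, \quad k = 1, 2, \ldots$$
where $e^{\hat{A}t} = I + \sum_{j=1}^{\infty} \frac{\hat{A}^j}{j!}\, t^j$ and I is the unit matrix.
(4) The predicted values of MGM (1, n) can be obtained by inverse accumulation:
$$\hat{X}^{(0)}(1) = X^{(0)}(1), \qquad \hat{X}^{(0)}(k) = \hat{X}^{(1)}(k) - \hat{X}^{(1)}(k-1), \quad k = 2, 3, \ldots$$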
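The four modelling steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: it assumes the commonly used MGM (1, n) formulation (background values taken as the mean of consecutive accumulated values, least-squares estimation of A and B, and the closed-form solution of the linear differential system), and the helper names `solve_linear`, `matmul`, `mgm_fit` and `mgm_predict` are hypothetical. The closed-form response is evaluated by exact one-step propagation, which is mathematically equivalent to the matrix-exponential expression and avoids inverting A.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def matmul(p, q):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(pv * q[k][j] for k, pv in enumerate(row)) for j in range(len(q[0]))]
            for row in p]


def mgm_fit(series):
    """Estimate A (n x n) and B (length n) from n sequences of equal length m."""
    n, m = len(series), len(series[0])
    ago = []
    for s in series:                       # step (1): accumulated sequences x^(1)
        total, acc = 0.0, []
        for v in s:
            total += v
            acc.append(total)
        ago.append(acc)
    # Rows of L: background values 0.5 * (x_j^(1)(k) + x_j^(1)(k-1)) plus a constant 1
    L = [[0.5 * (ago[j][k] + ago[j][k - 1]) for j in range(n)] + [1.0]
         for k in range(1, m)]
    Lt = [list(col) for col in zip(*L)]
    LtL = matmul(Lt, L)
    A, B = [], []
    for i in range(n):                     # step (3): least squares, one row of [A | B] at a time
        LtY = [sum(lv * series[i][k + 1] for k, lv in enumerate(row)) for row in Lt]
        coef = solve_linear(LtL, LtY)
        A.append(coef[:n])
        B.append(coef[n])
    return A, B


def mgm_predict(A, B, first_values, steps, terms=60):
    """Return restored x^(0) estimates: one list of length `steps` per variable."""
    n = len(A)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # G = sum_{j>=1} A^(j-1) / j!  (equals A^-1 (e^A - I) when A is invertible), E = e^A = I + A G
    term = [row[:] for row in eye]
    G = [row[:] for row in eye]
    for j in range(2, terms):
        term = [[v / j for v in row] for row in matmul(term, A)]
        G = [[g + v for g, v in zip(grow, trow)] for grow, trow in zip(G, term)]
    AG = matmul(A, G)
    E = [[eye[i][j] + AG[i][j] for j in range(n)] for i in range(n)]
    x1 = [list(first_values)]              # x^(1)(1) = x^(0)(1)
    for _ in range(steps - 1):             # exact unit-step propagation of the linear ODE
        prev = x1[-1]
        x1.append([sum(E[i][j] * prev[j] + G[i][j] * B[j] for j in range(n))
                   for i in range(n)])
    # step (4): inverse accumulation restores x^(0)
    x0 = [x1[0]] + [[c - p for c, p in zip(cur, prv)] for prv, cur in zip(x1, x1[1:])]
    return [list(col) for col in zip(*x0)]
```

Calling `mgm_fit` on the modelling window and then `mgm_predict` with `steps` larger than the window length yields both the fitted values and the out-of-sample predictions. The truncated power series assumes that the entries of A are of moderate size, which is an assumption of this sketch.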

Improved Multivariate Grey Model
Compared with GM (1,1), MGM (1, n) has the advantage of considering multiple input features at the same time. However, neither GM (1,1) nor MGM (1, n) is suitable for medium- to long-term forecasting. Because the rolling prediction method enables a prediction model to achieve medium- to long-term forecasting, MGM (1, n) is combined with the rolling prediction method in this paper to obtain the improved MGM (1, n) prediction model (IMGM (1, n)). The schematic diagram of IMGM (1, n) is shown in Figure 1. In the IMGM (1, n), the model is established in the same way as MGM (1, n), but the modelling data are updated in a rolling manner: the actual value corresponding to the step before the current prediction point is added and the first value of the previous modelling data is removed, so that new information is added dynamically. The modelling data are updated every time the model outputs one predicted value.
Assume that the number of cycles is T and the number of predicted steps for each cycle is one. The prediction process of IMGM (1, n) is as follows:
• As shown in Figure 1, the original data $x^{(0)}$ …
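The rolling update described in this section can be illustrated with a small, model-agnostic sketch. It is an assumption-laden illustration rather than the paper's code: `rolling_forecast` and `fit_predict_one` are hypothetical names, and a naive linear extrapolation stands in for the MGM (1, n) model so that the example stays short. In each cycle the model is rebuilt on the current window, one step is predicted, and the window then drops its oldest value and appends the newly observed actual value.

```python
def rolling_forecast(history, future_actuals, fit_predict_one):
    """One-step-ahead rolling prediction with a fixed-length modelling window.

    history:          initial modelling data (the window)
    future_actuals:   actual values that become available after each prediction
    fit_predict_one:  callable that refits a model on a window and returns
                      the next-step prediction (hypothetical interface)
    """
    window = list(history)
    predictions = []
    for actual in future_actuals:
        predictions.append(fit_predict_one(window))
        # Rolling update: remove the oldest point, add the newest actual value
        window = window[1:] + [actual]
    return predictions


def linear_extrapolation(window):
    """Stand-in forecaster (not MGM): continue the last observed slope."""
    return window[-1] + (window[-1] - window[-2])


if __name__ == "__main__":
    print(rolling_forecast([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], linear_extrapolation))
```

Replacing `linear_extrapolation` with a function that fits a grey model on the window and returns its next restored value gives the IMGM-style scheme; the number of loop iterations corresponds to the number of cycles T.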

LSSVM Optimized by Quantum Genetic Algorithm
The Least Squares Support Vector Machine (LSSVM) replaces the inequality constraints of SVM with equality constraints and takes the sum of squared errors as the empirical loss on the training set, which transforms the quadratic programming problem into a set of linear equations [28]. The Radial Basis Function (RBF) is simple to calculate, requires few parameters to be determined and has strong generalization ability. The RBF is expressed as:
$$K(x, x_i) = \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right)$$
where σ is the kernel width. In this paper, the RBF is used as the kernel function of LSSVM. The regularization parameter γ and the RBF kernel parameter σ² have a great influence on the prediction accuracy of LSSVM. If the value of γ is too large or too small, the generalization ability of the model deteriorates. The value of σ² also affects the performance of the model. Therefore, choosing appropriate parameters gives LSSVM a good prediction effect. In this paper, the QGA is used to realize the adaptive selection of the parameters of LSSVM.
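To make the "linear equations" statement concrete, the following sketch trains an LSSVM regressor by solving the standard linear system of the LSSVM dual, [[0, 1ᵀ], [1, K + I/γ]] [b; α] = [0; y], and predicts with f(x) = Σ αᵢ K(x, xᵢ) + b. It is a minimal sketch, not the authors' code: the function names are hypothetical, the data are invented for illustration, and the 2σ² scaling of the RBF kernel is an assumption (some toolboxes use σ² without the factor 2).

```python
import math


def rbf(x, z, sigma2):
    """RBF kernel; the 2 * sigma^2 scaling is an assumption of this sketch."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * sigma2))


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def lssvm_train(X, y, gamma, sigma2):
    """Return (b, alpha) from the LSSVM linear system."""
    n = len(X)
    system = [[0.0] + [1.0] * n]           # first row enforces sum(alpha) = 0
    for i in range(n):
        system.append([1.0] + [rbf(X[i], X[j], sigma2) + (1.0 / gamma if i == j else 0.0)
                               for j in range(n)])
    solution = solve_linear(system, [0.0] + list(y))
    return solution[0], solution[1:]


def lssvm_predict(x, X, alpha, b, sigma2):
    """Evaluate f(x) = sum_i alpha_i K(x, x_i) + b."""
    return sum(a * rbf(x, xi, sigma2) for a, xi in zip(alpha, X)) + b


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0]]       # illustrative inputs, not data from the paper
    y = [0.0, 0.8, 0.9, 0.1]
    b, alpha = lssvm_train(X, y, gamma=10.0, sigma2=1.0)
    print(lssvm_predict([1.5], X, alpha, b, sigma2=1.0))
```

The two hyperparameters `gamma` and `sigma2` are exactly the quantities that an optimizer such as grid search, GA or QGA would tune against a validation error.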
The QGA is an intelligent optimization algorithm combining quantum computing and genetic algorithms, which was proposed by K. H. Han et al. [29]. The probability amplitude representation of qubits is applied to the coding of the chromosome so that a chromosome can express the superposition of multiple states. The operation of quantum revolving gate can update the chromosome, and thus the optimal solution of the goal can be achieved [30]. Compared with GA, QGA has the characteristics of small population size, strong optimization ability, high convergence speed and short calculation time [31].
In GA, the commonly used coding methods for chromosomes are binary coding, real coding and symbol coding. In QGA, chromosomes are encoded using qubits [32].
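The state of a qubit can be expressed as:
$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$$
where α and β are complex constants that satisfy $|\alpha|^2 + |\beta|^2 = 1$. Qubits are used to store and express a gene. The gene can be in a "0" state, a "1" state, or any superposition of them, which gives QGA better diversity than GA. Multi-state genes encoded by qubits are shown in equation (10):
$$q_j^t = \begin{bmatrix} \alpha_{1,1}^t & \cdots & \alpha_{1,k}^t & \alpha_{2,1}^t & \cdots & \alpha_{2,k}^t & \cdots & \alpha_{m,1}^t & \cdots & \alpha_{m,k}^t \\ \beta_{1,1}^t & \cdots & \beta_{1,k}^t & \beta_{2,1}^t & \cdots & \beta_{2,k}^t & \cdots & \beta_{m,1}^t & \cdots & \beta_{m,k}^t \end{bmatrix} \tag{10}$$
where $q_j^t$ is the j-th chromosome of the t-th generation; $\alpha_{i,j}^t$ and $\beta_{i,j}^t$ $(i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, k)$ are the quantization information of the gene; k is the number of qubits of each gene and m is the number of genes of the chromosome. In QGA, the quantum revolving gate of quantum theory is used to update the population, and the selection, crossover and mutation operations of GA are not adopted. The matrix of the quantum revolving gate is expressed as equation (11):
$$U(\theta_i) = \begin{bmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{bmatrix} \tag{11}$$
The update process is as follows:
$$\begin{bmatrix} \alpha_i' \\ \beta_i' \end{bmatrix} = U(\theta_i) \begin{bmatrix} \alpha_i \\ \beta_i \end{bmatrix} = \begin{bmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{bmatrix} \begin{bmatrix} \alpha_i \\ \beta_i \end{bmatrix} \tag{12}$$
where $(\alpha_i', \beta_i')^T$ is the new qubit after the update and $\theta_i$ is the rotation angle, which is determined according to a previously designed adjustment strategy [33]. Before QGA is used to optimize the parameters of LSSVM, the fitness function needs to be defined. In this paper, the Root Mean Square Error (RMSE) is used as the objective function. The calculation formula of RMSE is shown in equation (13):
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \tag{13}$$
where $y_i$ is the actual value of the i-th sample, $\hat{y}_i$ is the predicted value of the i-th sample, and N is the number of samples.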
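The qubit encoding and the revolving (rotation) gate can be sketched as follows. This is a simplified illustration under stated assumptions, not the adjustment strategy of [33]: amplitudes are taken as real numbers, the rotation step `delta` is a fixed hypothetical value, and a qubit is rotated toward the corresponding bit of the best individual only when its measured bit disagrees. The function names and the decoding range are likewise assumptions.

```python
import math
import random


def init_chromosome(num_bits):
    """All qubits start in the equal superposition alpha = beta = 1/sqrt(2)."""
    amp = 1.0 / math.sqrt(2.0)
    return [(amp, amp) for _ in range(num_bits)]


def measure(chromosome, rng):
    """Collapse each qubit to bit 1 with probability beta^2, otherwise to bit 0."""
    return [1 if rng.random() < beta * beta else 0 for _, beta in chromosome]


def rotate(qubit, theta):
    """Apply the rotation gate [[cos, -sin], [sin, cos]] to (alpha, beta)."""
    alpha, beta = qubit
    return (math.cos(theta) * alpha - math.sin(theta) * beta,
            math.sin(theta) * alpha + math.cos(theta) * beta)


def update(chromosome, bits, best_bits, delta=0.05 * math.pi):
    """Simplified (hypothetical) strategy: rotate disagreeing qubits toward the best bit."""
    updated = []
    for (alpha, beta), bit, best in zip(chromosome, bits, best_bits):
        if bit == best:
            updated.append((alpha, beta))
            continue
        direction = 1.0 if best == 1 else -1.0   # raise or lower |beta|
        if alpha * beta < 0:                     # quadrant-dependent sign of the rotation
            direction = -direction
        updated.append(rotate((alpha, beta), direction * delta))
    return updated


def decode(bits, low, high):
    """Map a measured bit string to a real parameter value in [low, high]."""
    value = int("".join(str(b) for b in bits), 2)
    return low + (high - low) * value / (2 ** len(bits) - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    chrom = init_chromosome(8)
    bits = measure(chrom, rng)
    print(bits, decode(bits, 0.1, 100.0))
```

In a full optimizer, each chromosome would be measured and decoded into a candidate pair (γ, σ²), the fitness (here the RMSE of the resulting LSSVM) would be evaluated, and `update` would be applied with the best individual found so far.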

Combined Prediction
Combined prediction was first proposed by J. M. Bates and C. W. J. Granger. The combined forecasting model is constructed by assigning different weights to the prediction values of single forecasting models [34]. Combined prediction can be classified in the following two ways:
• According to the functional relationship between the combined and the single forecasting models, combined forecasting can be divided into linear and non-linear combination prediction [35].
• According to the weight coefficients of the single models, combined forecasting can be divided into fixed weight and variable weight combination prediction [36].
Linear combination prediction has relatively large errors and significant limitations, while fixed weight combination prediction cannot adjust the combination weights dynamically. Therefore, it is necessary to use a non-linear and variable weight combined forecasting model. For example, neural networks and SVM are non-linear combiners, and both can realize a non-linear and variable weight combination of the single forecasting models. However, neural networks are not suitable for data sets with few samples: they tend to overfit and their prediction accuracy is not satisfactory [37]. SVM and LSSVM have obvious advantages in solving few-sample, non-linear and high-dimensional problems [38]. Therefore, LSSVM optimized by QGA is adopted in this paper as a non-linear and variable weight combiner. The flow chart of the proposed combined prediction is shown in Figure 3.
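For contrast with the non-linear combiner adopted here, the fixed weight linear baseline can be written in closed form. The sketch below is an illustration, not part of the proposed method: for two single forecasts f1 and f2 it computes the single weight w that minimizes the sum of squared errors of w·f1 + (1 − w)·f2 on a training set. The function names are hypothetical. Because w is one constant for all samples, it cannot adapt to regions where one model is locally better, which is the limitation that motivates a variable weight combiner.

```python
def optimal_fixed_weight(actual, f1, f2):
    """Weight w minimizing sum((y - (w*f1 + (1-w)*f2))^2) over the training set."""
    e1 = [y - a for y, a in zip(actual, f1)]   # errors of the first single model
    e2 = [y - b for y, b in zip(actual, f2)]   # errors of the second single model
    denom = sum((a - b) ** 2 for a, b in zip(e1, e2))
    if denom == 0.0:                           # identical forecasts: any weight is optimal
        return 0.5
    return sum(b * (b - a) for a, b in zip(e1, e2)) / denom


def combine_fixed(f1, f2, w):
    """Linear, fixed weight combination of two forecasts."""
    return [w * a + (1.0 - w) * b for a, b in zip(f1, f2)]
```

In the proposed model this step is replaced by a trained regressor whose inputs are the pairs of single-model predictions and whose target is the actual value.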

Accuracy Test of Grey Prediction Model
After the grey prediction model is established, it can be used for prediction only after it passes the accuracy test. The posteriori error test is adopted in this paper. The equations of the posteriori error test are as follows.
The posteriori error ratio C:
$$C = \frac{S_2}{S_1}$$
where $S_1$ is the standard deviation of the original sequence and $S_2$ is the standard deviation of the residual sequence. The small error probability P:
$$P = P\left\{\left|\varepsilon(i) - \bar{\varepsilon}\right| < 0.6745\, S_1\right\}$$
where $\varepsilon(i)$ is the residual error and $\bar{\varepsilon}$ is the residual average.
The smaller the value of C and the greater the value of P, the higher the grade of the grey prediction model. Generally, the accuracy grade of the grey prediction model can be divided into four levels, which are shown in Table 2, where the precision grade of the grey prediction model is the maximum of the grade of P and the grade of C.
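The two indexes of the posteriori error test are straightforward to compute from the actual and fitted sequences. The sketch below assumes the commonly used definitions (C as the ratio of the residual standard deviation to the standard deviation of the original sequence, and P as the share of residuals whose deviation from the residual mean is below 0.6745·S1) and uses population standard deviations, which is an assumption; some authors divide by N − 1 instead. The function name is hypothetical.

```python
import math


def posterior_error_test(actual, fitted):
    """Return (C, P) of the grey-model posteriori error test."""
    n = len(actual)
    residuals = [a - f for a, f in zip(actual, fitted)]
    mean_actual = sum(actual) / n
    mean_residual = sum(residuals) / n
    # Population standard deviations of the original and residual sequences
    s1 = math.sqrt(sum((a - mean_actual) ** 2 for a in actual) / n)
    s2 = math.sqrt(sum((r - mean_residual) ** 2 for r in residuals) / n)
    c = s2 / s1
    small = sum(1 for r in residuals if abs(r - mean_residual) < 0.6745 * s1)
    return c, small / n
```

The returned pair can then be compared against the grade thresholds of Table 2.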

Forecasting Evaluation Index
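In order to compare the prediction effects of the combined model and the single ones, the following three indexes are calculated.
Assume that $y_i$ is the actual value of the i-th sample, $\hat{y}_i$ is its predicted value, $\bar{y}$ is the mean of the actual values and N is the number of samples.
Root Mean Square Error (RMSE):
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$
The closer the RMSE is to zero, the smaller the prediction error and vice versa.
Mean Absolute Error (MAE):
$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$
The closer the MAE is to zero, the smaller the prediction error and vice versa.
Coefficient of determination (R²):
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$
The larger the R² is and the closer it is to 1, the better the fitting effect of the prediction model and vice versa [39].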
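The three evaluation indexes can be computed with a few helper functions. This sketch assumes the usual textbook definitions, in particular R² = 1 − SS_res/SS_tot; the function names are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

For example, a forecast that is offset from the actual values [1, 2, 3, 4] by a constant 1 has RMSE = MAE = 1 and R² = 1 − 4/5 = 0.2, which shows that R² penalizes a systematic bias even when the shape of the trend is reproduced.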

Signal Acquisition
A 1.5 MW wind turbine is taken as the research object in this paper. The three-dimensional model of the drive system of the 1.5 MW wind turbine is established in Solidworks and then imported into ADAMS for dynamic simulation. In the dynamic simulation model, parallel misalignment, angular misalignment and comprehensive misalignment are simulated. The acceleration signal at the high-speed output of the gearbox is used as the vibration signal (details in [40]). In MATLAB, the wind turbine and its control system are modelled to obtain the stator current under misalignment faults (details in [41]). The simulation model of the wind turbine operates in the Maximum Power Point Tracking (MPPT) stage, in which the input speed is 81.3 °/s, and the vibration signal and stator current signal under the parallel misalignment fault are obtained for the study.

Feature Extraction
Root Mean Square (RMS): The RMS represents the average energy of the signal and is very stable [42].
Kurtosis: The kurtosis is highly sensitive to early faults and impact signals, but its stability is poor [43,44].
Kurtosis index: The kurtosis index is very sensitive to the impacts in vibration signals [45].
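The three time-domain features can be illustrated as below. The definitions are assumptions, since the formulas are not reproduced in this text: the sketch follows a convention common in the vibration-diagnosis literature, in which the kurtosis is the mean fourth power of the samples (a dimensional quantity) and the kurtosis index is that value divided by the fourth power of the RMS (dimensionless). Some authors subtract the mean first or use the standard deviation instead of the RMS.

```python
import math


def rms(x):
    """Root mean square of the samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def kurtosis(x):
    """Mean fourth power of the samples (assumed dimensional definition)."""
    return sum(v ** 4 for v in x) / len(x)


def kurtosis_index(x):
    """Dimensionless kurtosis index: kurtosis divided by RMS^4 (assumed definition)."""
    return kurtosis(x) / rms(x) ** 4
```

Under these definitions, scaling a signal changes its kurtosis but not its kurtosis index, while a single large impact in an otherwise quiet segment raises the index sharply, which is consistent with the sensitivity to impacts noted above.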

Frequency-Domain Feature Parameters
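Centroid Frequency (FC):
$$FC = \frac{\sum_{k} \omega_k\, S(\omega_k)}{\sum_{k} S(\omega_k)}$$
Variance Frequency (VF):
$$VF = \frac{\sum_{k} \left(\omega_k - FC\right)^2 S(\omega_k)}{\sum_{k} S(\omega_k)}$$
where $S(\omega)$ is the power spectrum of the signal and $\omega$ is the discretized angular frequency.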
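Both frequency-domain features are weighted moments of the power spectrum. The sketch below assumes the discrete, spectrum-weighted definitions (FC as the power-weighted mean frequency and VF as the power-weighted variance around FC) and takes the frequency grid and the power spectrum as already computed lists; how the spectrum is estimated is outside the scope of this sketch.

```python
def centroid_frequency(omega, power):
    """Power-weighted mean of the discretized angular frequencies."""
    return sum(w * s for w, s in zip(omega, power)) / sum(power)


def variance_frequency(omega, power):
    """Power-weighted variance of the frequencies around the centroid."""
    fc = centroid_frequency(omega, power)
    return sum((w - fc) ** 2 * s for w, s in zip(omega, power)) / sum(power)
```

A flat spectrum over the frequencies 1, 2 and 3 rad/s, for instance, has FC = 2 and VF = 2/3; concentrating the power in a single bin drives VF to zero.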

Time-Frequency Domain Feature Parameters
The Empirical Mode Decomposition (EMD) improved by mirror extension is used to process the vibration signal [46]. The advantage of this method is the suppression of endpoint effects. The energy entropy, which is suitable for non-stationary and non-linear complex signals [47], is extracted from the processed vibration signal.
Following [48], the energy of each Intrinsic Mode Function (IMF) obtained from the decomposition is computed and normalized by the total energy, and the energy entropy $P_i$ of the i-th IMF is calculated from this normalized energy, where $i = 1, 2, \ldots, n$ indexes the IMFs selected according to their correlation with the original signal [49]. The stator current signal obtained from the simulation model is decomposed into 4 layers by the Dual-tree Complex Wavelet Transform (DTCWT) [50]. The 5 sub-band signals obtained are reconstructed and their sample entropy is calculated. Sample entropy gives better and more stable results with less data and is more robust [51]. For a finite sequence of length $N_t$, the sample entropy $Se_i$ is:
$$Se_i = -\ln\frac{B^{m+1}(r)}{B^{m}(r)}$$
where m is the number of dimensions of the constructed vector, r is a given threshold, and $B^{m}(r)$ is the average proportion of pairs of m-dimensional vectors whose maximum distance is within r.
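A direct, unoptimized computation of sample entropy is sketched below. It follows the usual definition: count the pairs of template vectors of length m and of length m + 1 whose Chebyshev (maximum coordinate) distance is within the threshold r, excluding self-matches, and take the negative logarithm of the ratio. Treating "within" as ≤ r, using the same number of templates for both lengths, and returning infinity when no matches exist are choices of this sketch; the function name is hypothetical.

```python
import math


def sample_entropy(x, m, r):
    """Sample entropy of sequence x with embedding dimension m and tolerance r."""
    n = len(x)

    def count_matches(length):
        # Use n - m templates for both lengths so the two counts are comparable
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):   # unordered pairs, no self-matches
                distance = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if distance <= r:
                    count += 1
        return count

    matches_m = count_matches(m)
    matches_m1 = count_matches(m + 1)
    if matches_m == 0 or matches_m1 == 0:
        return float("inf")
    return -math.log(matches_m1 / matches_m)
```

A perfectly regular sequence such as 1, 2, 1, 2, … yields a sample entropy of zero, because every match of length m extends to a match of length m + 1; irregular signals give larger values. In practice r is often chosen as a fraction of the standard deviation of the signal, which is a convention rather than something stated in this paper.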

Feature Vectors and Normalization
The input speed of the dynamic simulation model of the wind turbine drive system is set to 81.3 °/s, and 60 segments of vibration and current signals under parallel misalignment faults are collected. The time-domain, frequency-domain and time-frequency domain feature parameters of each signal segment are extracted to construct feature vectors. When the vibration signal is decomposed by the improved EMD (IEMD), the correlation coefficients of the first three Intrinsic Mode Functions (IMFs) are all greater than 0.1. Therefore, the first three IMFs are selected as effective components in this paper. The feature parameters of the vibration and current signals are listed in Tables 3 and 4.
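The feature parameters are normalized as follows:
$$y = \left(y_{max} - y_{min}\right)\frac{x - x_{min}}{x_{max} - x_{min}} + y_{min}$$
where x is the unnormalized matrix, $x_{min}$ and $x_{max}$ are the minimum and maximum values of x, $y_{min}$ is the minimum value of the normalization interval, and $y_{max}$ is the maximum value of the normalization interval. The feature parameters are normalized to the range [0,1] in this paper.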
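Min-max normalization to a target interval can be sketched as follows. The sketch assumes the usual linear mapping of each feature from its own minimum and maximum to [y_min, y_max] and applies it column by column to a feature matrix; mapping a constant feature to y_min is a choice made here to avoid division by zero, and the function names are illustrative.

```python
def min_max_normalize(values, y_min=0.0, y_max=1.0):
    """Linearly map a list of values to the interval [y_min, y_max]."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:                       # constant feature: no spread to rescale
        return [y_min for _ in values]
    scale = (y_max - y_min) / (x_max - x_min)
    return [y_min + scale * (v - x_min) for v in values]


def normalize_columns(rows, y_min=0.0, y_max=1.0):
    """Normalize each feature (column) of a sample-by-feature matrix separately."""
    columns = [min_max_normalize(list(col), y_min, y_max) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]
```

Normalizing each feature separately keeps features with large numerical ranges, such as frequency-domain parameters, from dominating features with small ranges, such as entropies, in the kernel distance.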
For the vibration and current signals, multivariate feature parameters with 9 dimensions and 11 dimensions are used as input vectors in this paper. Because the kurtosis is sensitive to early faults [43,44], the kurtosis is selected as the output of the prediction model.

The Results of IMGM (1, n)
The MGM (1, n) is suitable for prediction with few samples, and the number of modelling samples required is usually between 5 and 60 [52][53][54]. Therefore, the 60 collected vibration signal samples are divided into the first 45 samples and the last 15 samples according to the ratio of 3:1. The first 45 samples are used to construct an IMGM (1, n) prediction model. The indexes of the posteriori error test calculated from the fitted values are listed in Table 5.

Table 5. Posteriori error test indexes of MGM (1,9) and IMGM (1,9) for the vibration signal.
Model         C        P        Precision Grade
MGM (1,9)     0.4952   0.8667   Qualified
IMGM (1,9)    0.4880   0.9048   Qualified

Table 5 shows that both MGM (1,9) and IMGM (1,9) belong to the qualified grade, so they can be used for prediction. Because the C value of IMGM (1,9) is smaller and its P value is larger, IMGM (1,9) is more suitable for predicting the kurtosis of vibration signals than MGM (1,9). Figure 4 shows the prediction results of MGM (1,9) and IMGM (1,9). In Figure 4, the predicted and fitted values of both MGM (1,9) and IMGM (1,9) fluctuate around their actual values. The forecasting evaluation indexes of IMGM (1,9) and MGM (1,9) are listed in Table 6. For the fitted values, the RMSE and R² of IMGM (1,9) are nearly the same as those of MGM (1,9), while the MAE of IMGM (1,9) is better than that of MGM (1,9). For the predicted values, the RMSE, MAE and running time of IMGM (1,9) are smaller than those of MGM (1,9), and the R² of IMGM (1,9) is larger than that of MGM (1,9). Therefore, IMGM (1,9) improves on the prediction accuracy of MGM (1,9) for the misalignment fault vibration signal of the wind turbine. This is because, when the last 15 points are predicted, the data used to establish the MGM (1,9) model are always the first 45 points and are not updated as the number of prediction steps increases. In contrast, the modelling data of IMGM (1,9) are updated by adding the actual value corresponding to the step before the current prediction point, and this update takes place every time the model outputs one predicted value. Hence, IMGM (1,9) can be one of the single prediction models input to the combined forecasting model for the vibration signal.

Table 6. Comparison of forecasting evaluation indexes of MGM (1,9) and IMGM (1,9) for the vibration signal.

The optimized parameters of the three LSSVM forecasting models (Grid Search_LSSVM, GA_LSSVM and QGA_LSSVM) are listed in Table 7. The three forecasting models are established based on the optimized parameters in Table 7, and the corresponding forecasting evaluation indexes are listed in Table 8. In Table 8, for the training set, the indexes of Grid Search_LSSVM show that it has smaller prediction errors than GA_LSSVM and QGA_LSSVM. For the testing set, the RMSE and MAE of QGA_LSSVM are smaller than those of GA_LSSVM and Grid Search_LSSVM, the R² of QGA_LSSVM is the largest and its running time is the shortest. Therefore, compared with Grid Search_LSSVM and GA_LSSVM, QGA_LSSVM has the best prediction accuracy on the testing set and the shortest running time, so it can be one of the single forecasting models input to the combined forecasting model for the vibration signal.

The Results of the Combined Forecasting Model
In this paper, the prediction results of IMGM (1,9) and QGA_LSSVM are used as the inputs of the QGA_LSSVM combiner and the actual kurtosis values of the vibration signal are used as the output. The predicted results of the single forecasting models and the combined model are compared in Figure 5. In Figure 5a, the fluctuation range of IMGM (1,9) around the actual value is the largest, while the prediction results of QGA_LSSVM and the combined forecasting model are closer to the actual value. Figure 5b shows that the absolute value of the relative error of IMGM (1,9) is significantly large. From the 46th to the 51st predicted point, the absolute value of the relative error of the combined forecasting model is less than that of QGA_LSSVM. After the 53rd predicted point, the two have similar absolute values of relative error. Therefore, the combined forecasting model has a smaller absolute value of relative error than the two single forecasting models. Table 9 also shows that, for the training set, the forecasting evaluation indexes of the combined forecasting model are close to those of QGA_LSSVM and better than those of IMGM (1,9), while for the testing set, the indexes of the combined forecasting model are the best. Thus, compared with the single forecasting models, the combined forecasting model improves the prediction accuracy and reduces the running time for the vibration signal. For the vibration signal, the optimal parameters of the combiner are $[\gamma, \sigma^2] = [24.5174, 9.4170]$.
For the current signal, the indexes of the posteriori error test of MGM (1,11) and IMGM (1,11) are listed in Table 10. In Table 10, MGM (1,11) belongs to the barely qualified grade and IMGM (1,11) belongs to the qualified grade. Therefore, IMGM (1,11) is more suitable for predicting the kurtosis of the current signal than MGM (1,11). The prediction results of MGM (1,11) and IMGM (1,11) for the current signal are shown in Figure 6.
A comparison of the prediction results in Figure 6a and Figure 6b shows that the predicted fluctuation range of IMGM (1,11) is obviously smaller than that of MGM (1,11). This is because, when the last 15 points are predicted, the data used to establish MGM (1,11) are always the first 45 points and are not updated as the number of prediction steps increases. In contrast, the modelling data of IMGM (1,11) are updated by adding the actual value corresponding to the step before the current prediction point, and this update takes place every time the model outputs one predicted value. Hence, when predicting the kurtosis of a current signal with a large rising slope, IMGM (1,11) has significantly better prediction accuracy than MGM (1,11). The forecasting evaluation indexes of IMGM (1,11) and MGM (1,11) are listed in Table 11. For the fitted values, the RMSE of IMGM (1,11) is nearly the same as that of MGM (1,11), while the MAE and R² of IMGM (1,11) are better than those of MGM (1,11). For the predicted values, the RMSE, MAE and running time of IMGM (1,11) are smaller than those of MGM (1,11), and the R² of IMGM (1,11) is much larger than that of MGM (1,11). Therefore, compared with MGM (1,11), IMGM (1,11) improves the prediction accuracy and stability for the current signal, and it can be one of the single prediction models input to the combined forecasting model for the current signal.
The optimal parameters $[\gamma, \sigma^2]$ obtained by the three prediction models are listed in Table 12. The three forecasting models are established based on the optimized parameters in Table 12, and the corresponding forecasting evaluation indexes are listed in Table 13. For the training set, Grid Search_LSSVM has smaller errors than GA_LSSVM and QGA_LSSVM. For the testing set, all of the forecasting evaluation indexes of QGA_LSSVM in Table 13 are better than those of GA_LSSVM and Grid Search_LSSVM. Thus, QGA_LSSVM has a good prediction effect not only for the vibration signal but also for the current signal, and it can be one of the single forecasting models input to the combined forecasting model for the current signal.

The Results of the Combined Forecasting Model
The prediction results of IMGM (1,11) and QGA_LSSVM are used as the inputs of the QGA_LSSVM combiner, and the actual kurtosis values of the current signal are used as the output. The predicted results are shown in Figure 7. Figure 7a indicates that, compared with IMGM (1,11), the prediction results of QGA_LSSVM and the combined forecasting model are closer to the actual values. In Figure 7b, the absolute values of the relative error of QGA_LSSVM and the combined forecasting model are within 6%, while that of IMGM (1,11) is much larger than those of the other models. In Table 14, for the training set, the forecasting evaluation indexes of the combined forecasting model are close to those of QGA_LSSVM and better than those of IMGM (1,11), while for the testing set, the indexes of the combined forecasting model are the best. Therefore, the current signals also show that the combined forecasting model has higher prediction accuracy with less running time.

Conclusions
To improve on the prediction accuracy of single forecasting models, this paper proposes a combined forecasting model that uses LSSVM optimized by QGA as a combiner to predict the misalignment fault of wind turbines based on vibration and current signals. First, the time-domain, frequency-domain and time-frequency domain feature parameters are extracted from the signals. After normalization, the extracted features are used for prediction by IMGM (1, n) and QGA_LSSVM separately, and the predicted values are then taken as the inputs of the combiner, which dynamically assigns non-linear weight coefficients to them. The simulation results indicate that the prediction accuracy of the combined prediction model is higher than that of the single models and that the running time is shortened. The new model is suitable for fault prediction with both deterministic and random trends. In future research, the kurtosis predicted by the proposed model will be combined with a kurtosis threshold to achieve early warning of wind turbine misalignment faults.
Author Contributions: Conceptualization, Y.X.; Writing-original draft, Y.X. and Z.H.; Writing-review & editing, Y.X. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China (51577008).