An Improved LightGBM Algorithm for Online Fault Detection of Wind Turbine Gearboxes

Abstract: It is widely accepted that conventional boosting algorithms suffer from low efficiency and accuracy when dealing with the big data collected from wind turbine operations. To address this issue, this paper is devoted to the application of an adaptive LightGBM method for wind turbine fault detection. To this end, feature selection for fault detection is first achieved by utilizing the maximum information coefficient to analyze the correlation among features in the supervisory control and data acquisition (SCADA) data of wind turbines. After that, a performance evaluation criterion is proposed for the improved LightGBM model to support fault detection. In this scheme, by embedding the confusion matrix as a performance indicator, an improved LightGBM fault detection approach is developed. Based on the adaptive LightGBM fault detection model, a fault detection strategy for wind turbine gearboxes is investigated. To demonstrate the applications of the proposed algorithms and methods, a case study with a three-year SCADA dataset obtained from a wind farm sited in Southern China is conducted. Results indicate that the proposed approaches establish a fault detection framework for wind turbine systems with a lower false alarm rate and a lower missing detection rate.


Introduction
Wind turbines usually operate in remote and harsh areas with extreme weather conditions, which may cause faults. Gearbox faults affect the overall performance of the equipment and can even cause human injury and economic loss [1]. Therefore, fault detection and rapid fault identification of wind turbine gearbox components are of great importance for reducing the operation and maintenance costs of wind turbines and improving the production of wind farms [2,3]. Over the years, extensive research has been carried out on the fault diagnosis of wind turbines.
At present, monitoring and fault diagnosis methods are mainly applied to wind turbine gearboxes and other major components, using approaches such as wavelet analysis, statistical analysis, machine learning, and other hybrid and modern techniques [4][5][6][7][8]. In the first stage of the proposed framework, the range of each feature needs to be scaled to [0, 1] or [−1, 1]; the 0-1 scaling of x can be computed as follows:

x_i = (x − x_min) / (x_max − x_min)    (1)

where x_i denotes the normalized value, x is the initial value, and x_min and x_max are the minimum and maximum values of x, respectively. Missing values also affect model estimation performance, and common treatments include deletion methods and imputation methods [25]. LightGBM was selected to handle the missing values here directly, since samples with missing entries still carry knowledge that cannot be overlooked. The second stage is feature selection. Through feature selection, reasonable parameters of the wind turbine gearbox are selected and model performance is improved. In this part, the maximum information coefficient is used to measure how much information two wind turbine features share. Taking the original feature set as input, the maximum information coefficient method selects parameters and outputs the optimal feature subset.
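The preprocessing step above can be sketched as follows; treating NaN as the missing-value marker (to be passed through to LightGBM untouched) is an illustrative assumption:

```python
import numpy as np

def min_max_scale(x):
    """0-1 scaling of a feature vector, as in Equation (1).

    NaNs (assumed to mark missing SCADA readings) are ignored when
    computing the min/max and propagated unchanged, since LightGBM
    can handle missing values natively.
    """
    x = np.asarray(x, dtype=float)
    x_min, x_max = np.nanmin(x), np.nanmax(x)
    if x_max == x_min:          # constant feature: map to 0 to avoid 0/0
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)
```

For example, `min_max_scale([10.0, 20.0, 30.0])` yields `[0.0, 0.5, 1.0]`.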
The third stage is Bayesian hyper-parameter optimization, as LightGBM is a powerful gradient boosting algorithm with numerous hyper-parameters. Therefore, Bayesian hyper-parameter optimization is used here to tune the hyper-parameters of LightGBM. The processed data are divided into two subsets, a training dataset and a testing dataset, and the training dataset is used to construct the improved LightGBM fault detection model. With the training and test datasets as input, the LightGBM parameter search space is defined, Bayesian hyper-parameter optimization is applied, and the optimal hyper-parameters and the final model are obtained as output.
The final step is LightGBM online fault detection. The optimal LightGBM hyper-parameters are used to obtain the final model, which is then applied to the testing datasets; the missing detection rate and the false alarm rate are used to calculate the performance evaluation criteria. Faulty samples and fault-free samples are distinguished by the improved LightGBM method. This paper proposes a performance evaluation criterion for the improved LightGBM model to support fault detection. By embedding the confusion matrix as a performance indicator, an improved LightGBM fault detection approach is developed. Subsequently, the improved LightGBM method is used to detect faults of wind turbines. The framework of this study is shown in Figure 1.

Maximum Information Coefficient

The theory of the maximum information coefficient is used to measure the strength of the numerical correlation between two features [26]. Given a discrete variable X, the information entropy [27] of X can be expressed as

H(X) = −∑_{x∈X} p(x) log p(x)

Conditional entropy refers to the remaining uncertainty of X under the conditional probability distribution that arises when the random variable Y occurs:

H(X|Y) = −∑_{y∈Y} p(y) ∑_{x∈X} p(x|y) log p(x|y)

and the mutual information between X and Y is I(X; Y) = H(X) − H(X|Y). For the random variables X and Y, the maximum information coefficient is

MIC(X, Y) = max_{|X|·|Y| < B} I(X; Y) / log min(|X|, |Y|)

where |X|·|Y| represents the number of grid cells and the parameter B is the 0.6th power of the total amount of data. The maximum information coefficient ranges from 0 to 1, and the closer the value is to 1, the stronger the correlation between the two variables, and vice versa.
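As a sketch of the grid search implied by this definition, the simplified implementation below searches only equal-width grids (a full MIC implementation such as minepy also optimises the grid boundaries); the bound B = n^0.6 follows the text:

```python
import numpy as np

def mic(x, y):
    """Simplified maximum information coefficient.

    For each nx-by-ny grid with nx*ny <= B = n**0.6, compute the
    mutual information of the binned variables, normalise it by
    log(min(nx, ny)), and keep the maximum over all grids.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b = int(n ** 0.6)
    best = 0.0
    for nx in range(2, b + 1):
        for ny in range(2, b + 1):
            if nx * ny > b:
                continue
            counts, _, _ = np.histogram2d(x, y, bins=(nx, ny))
            pxy = counts / n                       # joint distribution
            px = pxy.sum(axis=1, keepdims=True)    # marginal of x
            py = pxy.sum(axis=0, keepdims=True)    # marginal of y
            nz = pxy > 0
            i_xy = (pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum()
            best = max(best, i_xy / np.log(min(nx, ny)))
    return best
```

For perfectly dependent variables (e.g. `mic(x, x)`) the score is 1, and it decreases toward 0 as the dependence weakens.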

LightGBM
Light Gradient Boosting Machine (LightGBM) is a gradient boosting decision tree (GBDT) framework based on the decision tree algorithm that introduces gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB). The GBDT algorithm can discretize continuous features, but it uses only first-order derivative information when optimizing the loss function, and the decision trees in GBDT can only be regression trees, because each tree of the algorithm learns from the conclusions and residuals of all previous trees. Moreover, GBDT is challenged in accuracy and efficiency as data volume grows. The XGBoost algorithm introduces the second derivative through a Taylor expansion of the loss function and L2 regularization of the parameters to evaluate model complexity, and it can automatically use the CPU for multi-threaded parallel computation, improving the efficiency and accuracy of diagnosis. However, its leaves grow with a greedy layer-by-layer training method. LightGBM instead adopts a histogram-based decision tree algorithm; its leaf-wise growth strategy with depth limitation and its multi-thread optimization help solve the excessive memory consumption of XGBoost, so that big data can be processed with higher efficiency, a lower false alarm rate, and a lower missing detection rate.
Given the supervised learning dataset X = {(x_i, y_i)}, i = 1, …, N, LightGBM is developed to minimize a regularized objective of the form

L = ∑_{i=1}^{N} l(ŷ_i, y_i) + ∑_k Ω(f_k)

where l is the loss function and Ω penalizes the complexity of each regression tree f_k.
In this algorithm, the logistic loss function is used to measure the difference between the prediction ŷ_i and the target y_i.

The regression tree can be represented in another form, namely f(x) = w_{q(x)}, q ∈ {1, 2, …, J}, where J is the number of leaf nodes, q is the decision rule of the tree, and w is the vector of leaf weights. The traditional GBDT uses the steepest descent method, which considers only the gradient of the loss function; in LightGBM, Newton's method is used to quickly approximate the objective function. After a second-order expansion and further simplification, the objective function at iteration T can be expressed as

L_T ≈ ∑_{i=1}^{N} [ g_i f_T(x_i) + (1/2) h_i f_T(x_i)^2 ] + Ω(f_T)

where g_i and h_i represent the first-order and second-order gradients of the loss function, respectively.
Using I_j to represent the sample set of leaf j, the objective can be rewritten as

L_T = ∑_{j=1}^{J} [ (∑_{i∈I_j} g_i) w_j + (1/2)(∑_{i∈I_j} h_i + λ) w_j^2 ] + γJ

Given the structure of the tree q(x), the optimal weight of each leaf node and the minimum of L_T can be obtained through quadratic programming:

w_j* = −(∑_{i∈I_j} g_i) / (∑_{i∈I_j} h_i + λ)

L_T* = −(1/2) ∑_{j=1}^{J} (∑_{i∈I_j} g_i)^2 / (∑_{i∈I_j} h_i + λ) + γJ

The gain of splitting a leaf into left and right child nodes I_L and I_R is then

Gain = (1/2) [ (∑_{i∈I_L} g_i)^2 / (∑_{i∈I_L} h_i + λ) + (∑_{i∈I_R} g_i)^2 / (∑_{i∈I_R} h_i + λ) − (∑_{i∈I} g_i)^2 / (∑_{i∈I} h_i + λ) ] − γ

LightGBM uses the maximum tree depth to trim trees and avoid over-fitting, and uses multi-threaded optimization to increase efficiency and save time.
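As a small numerical illustration of the closed-form leaf weight and split gain, the sketch below uses assumed regularization constants λ = 1 and γ = 0 (not values from the paper):

```python
import numpy as np

LAMBDA, GAMMA = 1.0, 0.0   # assumed L2 penalty and split cost

def leaf_weight(g, h):
    """Optimal leaf weight w* = -sum(g) / (sum(h) + lambda)."""
    return -np.sum(g) / (np.sum(h) + LAMBDA)

def split_gain(g_left, h_left, g_right, h_right):
    """Gain of splitting one leaf into left/right children."""
    def score(g, h):
        # Contribution -G^2/(H + lambda) of a leaf, up to the -1/2 factor
        return np.sum(g) ** 2 / (np.sum(h) + LAMBDA)
    parent_g = np.concatenate([g_left, g_right])
    parent_h = np.concatenate([h_left, h_right])
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(parent_g, parent_h)) - GAMMA
```

For gradients g = [-1, -1] on the left, g = [1, 1] on the right, and all Hessians equal to 1, the left and right scores are each 4/3 while the parent score is 0, giving a gain of 4/3.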

Bayesian Hyper-Parameter Optimization
The main parameters affecting the performance of the LightGBM model are the number of leaves, the learning rate, and so on; these parameters are not obtained through training but must be adjusted manually, and are defined as hyper-parameters [28]. Traditional methods of hyper-parameter optimization include grid searching, random searching, and so on. Although grid searching supports parallel computing, it is memory consuming [29]. Random searching seeks an approximation of the optimal solution by sampling randomly within the search range; it escapes local optima more easily but cannot guarantee an optimal solution. Bayesian optimization builds on the past evaluation results of the objective function, using them to form a probability model that maps the hyper-parameters to a probability of a score on the objective function, expressed as P(Y|X), in order to find the optimal parameters θ [30]. The probability model can be a Gaussian process, random forest regression, or a tree-structured Parzen estimator (TPE). The TPE method has been found to achieve better performance, so the Bayesian tree-structured Parzen estimation method is used here to optimize the parameters of LightGBM.
Suppose θ = {θ_1, θ_2, …, θ_n} represents the hyper-parameters of a machine learning algorithm A (such as LightGBM); the dataset D_train is used for training and the dataset D_valid for validation (i.e., hyper-parameter optimization), the two being independently and identically distributed. L(A, θ, D_valid, D_train) represents the validation loss of algorithm A. K-fold cross-validation is generally used to address the optimization requirement:

θ* = argmin_θ (1/K) ∑_{k=1}^{K} L(A, θ, D_valid^(k), D_train^(k))

Interval ranges for the parameters need to be set in the LightGBM algorithm. In the process of parameter optimization, the model is trained continuously, and the classification result obtained by each parameter combination is evaluated by the evaluation function. Finally, the optimal parameter combination is obtained and substituted into the LightGBM algorithm, improving the classification performance.
Implementation of the proposed LightGBM hyper-parameter optimization is detailed in Algorithms 1-3 [31]. In the online stage, if a sample is detected as faulty, the error between the model prediction y_p and the online test data y_0 is calculated, and the performance is evaluated according to Equations (15) and (16), with the false alarm rate and missing detection rate as outputs. Algorithms 1, 2, and 3 describe the LightGBM hyper-parameter optimization model, the off-line implementation of the improved LightGBM fault detection method, and the online implementation of the improved LightGBM fault detection method, respectively. LightGBM is a powerful machine learning method with numerous hyper-parameters; in this paper, TPE is proposed to tune the hyper-parameters in LightGBM.
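The core TPE idea can be illustrated on a single hypothetical hyper-parameter (a learning rate, with a quadratic stand-in for the cross-validated LightGBM loss); the kernel width, candidate count, and good/bad split fraction below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.RandomState(42)

def objective(lr):
    """Stand-in validation loss with its minimum at lr = 0.1; in the
    paper this would be the cross-validated loss of LightGBM."""
    return (lr - 0.1) ** 2

def tpe_minimize(f, low, high, n_startup=10, n_iter=40, gamma=0.25):
    """Toy tree-structured Parzen estimator over one hyper-parameter.

    Past trials are split into the best gamma fraction l(x) and the
    rest g(x); Gaussian kernels centred on past points model each
    group, and the candidate maximising l(x)/g(x) is evaluated next.
    """
    xs = list(rng.uniform(low, high, n_startup))
    ys = [f(x) for x in xs]
    width = (high - low) / 10.0            # assumed fixed kernel width
    for _ in range(n_iter):
        order = np.argsort(ys)
        n_good = max(1, int(gamma * len(xs)))
        good = np.array(xs)[order[:n_good]]
        bad = np.array(xs)[order[n_good:]]
        # Sample candidates around the good points, score by density ratio
        cand = rng.normal(good[rng.randint(n_good, size=50)], width)
        cand = np.clip(cand, low, high)
        def dens(points, c):
            z = (c[:, None] - points[None, :]) / width
            return np.exp(-0.5 * z ** 2).sum(axis=1) + 1e-12
        x_next = cand[np.argmax(dens(good, cand) / dens(bad, cand))]
        xs.append(float(x_next))
        ys.append(f(x_next))
    return xs[int(np.argmin(ys))]
```

Running `tpe_minimize(objective, 0.0, 1.0)` concentrates evaluations near the minimum at 0.1, which is the behaviour that makes TPE cheaper than grid search for expensive objectives.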

Experimental Setup
To validate the effectiveness of the proposed gearbox fault detection model, a 1.5 MW wind turbine located in a wind farm in China was selected for case studies, with three years of gearbox data extracted from the SCADA dataset. Based on analysis of the wind turbine gearbox mechanism and expert experience, the data from 30 min before the start of each fault to 30 min after the fault were selected. The selected raw data can be found in Table 1. A schematic diagram of the wind turbine, including the wind rotor, gearbox, etc., is illustrated in Figure 2. As shown in Table 2, this dataset contains three different datasets (dataset 1, dataset 2, and dataset 3), each with two types of samples: fault-free and faulty. Dataset 1 includes the gearbox oil over-temperature data and fault-free data, dataset 2 includes the gearbox oil level fault data and fault-free data, and dataset 3 includes the gearbox lubrication oil pressure fault data and fault-free data.

Feature Selection
The gearbox bearing temperature information is used to evaluate the health of the gearbox, so the parameters that most strongly influence this quantity were chosen. Based on expert experience and feature extraction methods for wind turbine gearboxes, the 18 parameters most relevant to the gearbox oil temperature were obtained. The maximum information correlation between these datasets is shown in Figures 3-5.

As illustrated in Figures 3-5, the correlation between features differs considerably. To avoid the influence of weak and redundant features, the correlation among the 18 state features was further explored. Using the maximum information coefficient correlation analysis method, the correlation coefficient between each feature and the gearbox oil temperature was calculated (shown in Table 3). From the correlation analysis results in Table 3, it can be concluded that the correlation between the various state parameters and the gearbox bearing temperature differs.
To avoid the impact of uncorrelated and weakly correlated state parameters on gearbox fault detection, features with correlation coefficients between 0.50 and 0.95 were retained; these characteristics are marked in bold in Table 3.

Hyper-Parameter Optimization in LightGBM
The selection of hyper-parameters is of great importance in modelling, and LightGBM offers a great many hyper-parameters to choose from. To improve the real-time performance of fault detection, only the parameters with significant influence on model performance were selected for hyper-parameter optimization. The main LightGBM parameters used in the experiment are shown in Table 4 [32].

Gearbox Fault Detection Performance Evaluation Criteria
There are four states, corresponding to the normal state, gearbox total failure, gearbox oil temperature overrun, and gearbox oil pressure failure, recorded as P = [0, 1, 2, 3]. Each of the three faults is combined with the normal state, and fault diagnosis is performed through the LightGBM algorithm to obtain four sets of classification types; the fault diagnosis problem studied in this paper can therefore be regarded as binary classification. The false alarm rate (FAR) and the missing detection rate (MDR), both derived from the confusion matrix commonly used to measure the performance of a classification method, are adopted as the performance evaluation criteria. The confusion matrix of the binary classification problem is shown in Table 5. Table 5. Confusion matrix of binary classification problems.
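Both criteria can be computed directly from the confusion matrix counts; the definitions below (FAR = FP/(FP + TN), MDR = FN/(FN + TP)) are the standard ones and presumably correspond to Equations (15) and (16):

```python
def far_mdr(y_true, y_pred):
    """False alarm rate and missing detection rate from the confusion
    matrix of a binary fault detection problem (1 = faulty, 0 = fault-free).

    FAR = FP / (FP + TN): fault-free samples wrongly flagged as faulty.
    MDR = FN / (FN + TP): faulty samples the detector failed to catch.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    far = fp / (fp + tn) if fp + tn else 0.0
    mdr = fn / (fn + tp) if fn + tp else 0.0
    return far, mdr
```

For example, with true labels [0, 0, 0, 0, 1, 1, 1, 1] and predictions [0, 0, 0, 1, 1, 1, 0, 0], one fault-free sample is flagged (FAR = 0.25) and two faults are missed (MDR = 0.5).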

Results and Discussion
In this section, case studies were conducted with a three-year SCADA dataset collected from a wind farm sited in Southern China, and the effectiveness of the proposed improved LightGBM fault detection framework was validated. To further demonstrate the superiority of the proposed framework, comparative studies were implemented among three mainstream fault diagnosis methods, namely GBDT, XGBoost, and LightGBM.
Using the evaluation criteria on the three different datasets, the FAR and MDR under the different algorithms are depicted in Figures 6-11. To avoid over-fitting, this paper employed 10-fold cross-validation to evaluate the model; the smaller the FAR and MDR, the better the performance.
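The 10-fold evaluation procedure can be sketched as follows; `model_fit_predict` is a hypothetical stand-in for training the improved LightGBM model on each fold, and fold accuracy is used here in place of FAR/MDR purely for brevity:

```python
import numpy as np

def kfold_scores(model_fit_predict, X, y, k=10, seed=0):
    """k-fold cross-validation of a fault detection model.

    model_fit_predict(X_train, y_train, X_test) -> predicted labels;
    a stand-in for training and applying the detection model per fold.
    """
    idx = np.random.RandomState(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = model_fit_predict(X[train], y[train], X[test])
        scores.append(float(np.mean(pred == y[test])))  # fold accuracy
    return scores
```

Averaging the per-fold scores gives a single estimate that is less sensitive to one lucky or unlucky train/test split.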
Gradient boosting decision tree (GBDT) is a powerful boosting framework that is widely used in machine learning models and has been successfully applied in fault diagnosis [33]. Thus, GBDT was applied to predict the faults and classify the fault types of wind turbine gearboxes. As shown in Figures 6-10, all the fault detection results using GBDT have a relatively higher FAR and MDR than the other boosting algorithms. From Figure 6, the average FAR using GBDT is 0.107. The boxplot shows that the classification results of the GBDT method are relatively stable, and the comparison of the FAR and MDR in Figures 6 and 7 suggests that the model has not been over-fitted.

XGBoost, as a strong classification model in machine learning, has been widely applied in fault diagnosis [34], and it has been reported to detect faults successfully in industrial fields [35]. Therefore, XGBoost was also applied to detect faults for comparison. The results in Figures 8 and 9 indicate that its fault diagnosis performance is slightly worse than that of LightGBM: the average FAR and MDR using XGBoost were 0.165 and 0.178, respectively. The general performance of XGBoost is better than that of GBDT, which may be because XGBoost uses a second-order Taylor expansion to approximate the optimal solution of the objective function.
LightGBM includes two novel techniques, gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB), which can deal with large numbers of data instances and large numbers of features in wind turbines, respectively [36]. In this research, GOSS is adopted to select the optimal split node using the variance gain, together with EFB. GOSS has no impact on the training accuracy and outperforms random sampling. The results of the LightGBM method are illustrated in Figures 6-11; the average FAR and MDR in Figures 10 and 11 indicate that it performs better than the existing methods.
To reduce the FAR and MDR further, the maximum information coefficient (MIC) is used for feature selection and the tree-structured Parzen estimator (TPE) for hyper-parameter optimization, yielding the improved LightGBM method for detecting wind turbine gearbox faults, including gearbox total failure, gearbox oil temperature overrun, and gearbox oil pressure failure. Experimental results indicate that the proposed method also achieves good performance for real-time fault detection. Figures 8 and 9 show that the average FAR and average MDR of LightGBM via the TPE method are 0.10 and 0.16, respectively, lower than those of GBDT and XGBoost. Similarly, as shown in Figures 10 and 11, LightGBM via the TPE method has stronger generalization capability than GBDT and XGBoost. The experiments show that hyper-parameter optimization of LightGBM successfully solves the fault detection problems and improves model performance, and that the TPE method is superior to the grid search method. Consequently, the improved LightGBM method is effective and advanced for wind turbine gearbox fault detection.
The preceding comprehensive comparison studies demonstrate that the improved LightGBM has superior performance over GBDT, XGBoost, and LightGBM for wind turbine gearbox fault diagnosis. Experimental results demonstrated that the proposed improved LightGBM fault diagnosis significantly outperformed the traditional boosting algorithm in terms of feature learning, model training, and classification performance.

Conclusions
Over the years, machine learning methods for fault diagnosis have been studied extensively, with effort devoted to formulating boosting-based fault diagnosis methodologies and developing corresponding fault diagnosis systems. However, challenges remain. This paper provides a novel method for fault detection. The main contributions are as follows: a feature selection approach based on MIC is constructed to select state parameters and remove irrelevant, redundant, or useless variables, which improves fault detection performance.
By combining TPE hyper-parameter optimization with a novel LightGBM algorithm, an intelligent fault detection method is developed in this research. Under the proposed evaluation criteria, the improved LightGBM's classification performance is better than that of the other algorithms, with highly efficient parallelization, fast speed, high model accuracy, and low memory occupancy. In addition, the fault detection accuracy reaches 98.67%, so the presented approach for wind turbine gearboxes is feasible in practical engineering, not only for wind turbine fault detection but also for large-scale industrial fault detection.
Experimental results show that the method is suitable not only for fault diagnosis of wind turbine gearboxes but can also be applied to industrial system fault diagnosis with multiple feature vectors and low diagnostic accuracy. Based on the improved LightGBM wind turbine gearbox fault detection presented in this paper, suggestions for future studies include:
1. In the case of imbalanced data distributions in the fault diagnosis field, further investigation can be carried out on imbalanced datasets based on boosting algorithms to mitigate the influence of skewed data distributions between faulty and fault-free samples.
2. Real-time fault prediction is of great importance in industrial applications and deserves further study.
3. Combined applications of the improved LightGBM algorithm with other techniques might offer the potential to overcome the drawbacks of each method.
4. To improve fault diagnosis performance, hybrid fault diagnosis approaches might be a desirable solution worth investigating in upcoming studies.