An Online Hybrid Model for Temperature Prediction of Wind Turbine Gearbox Components

Condition monitoring can improve the reliability of wind turbines and thereby reduce operation and maintenance costs. A temperature prediction model for wind turbine gearbox components is of great significance for monitoring the operating status of the gearbox. However, the complex operating conditions of wind turbines make it challenging to predict the temperature of gearbox components. In this study, an online hybrid model based on a long short-term memory (LSTM) neural network and adaptive error correction (LSTM-AEC), using single-variable data, is proposed. In the proposed model, the LSTM algorithm, a deep learning approach well suited to time series, produces the preliminary temperature prediction, since it has a stronger ability to capture the non-stationary and non-linear characteristics of the gearbox component temperature series. To enhance the performance of the LSTM prediction model, an adaptive error correction model based on the variational mode decomposition (VMD) algorithm is developed; VMD mitigates the prediction difficulty caused by the non-stationary, high-frequency and chaotic characteristics of the error series. To apply the hybrid model to online prediction, a real-time rolling data decomposition process based on the VMD algorithm is proposed. To validate the effectiveness of the proposed hybrid model, several traditional models are introduced for comparative analysis. The experimental results show that the hybrid model has better prediction performance than the comparative models.


Introduction
Wind energy, as a clean and renewable energy source, has become one of the major practical renewable resources. In recent years, the installed capacity of wind turbines worldwide has increased rapidly [1,2]. With the growth of installed capacity and wind turbine complexity, frequent malfunctions lead to low reliability and high maintenance costs. According to statistics, the operation and maintenance costs of onshore and offshore wind farms account for about 15-20% and 30-35% of total revenue, respectively [2,3]. To raise the availability and reliability of wind turbines, monitoring their operating status and detecting potential faults is increasingly important. The gearbox, a key component of wind turbines, frequently suffers faults, which leads to high maintenance costs. Statistically, the maintenance cost caused by the gearbox is as high as 13% of the total cost [4]. In recent years, monitoring the operating status of the gearbox has therefore attracted wide attention.
With the development of the wind power industry, numerous studies have addressed wind turbine fault diagnosis and condition monitoring. According to the methods adopted, these studies can be roughly classified into two types: model-based methods and data-driven methods [5]. In addition to classical methods such as state estimation and parameter estimation, many new model-based approaches have been proposed in recent years [6][7][8][9][10]. In [8], a set-valued approach is proposed for wind turbine fault diagnosis. To ensure diagnostic performance, model-based methods need an accurate mathematical model of the wind turbine system. However, because wind turbine systems are complex, such a model is difficult to establish, which makes model-based methods hard to apply in practice [5]. In contrast, data-driven methods do not require an accurate mathematical model, and most wind turbines are equipped with a supervisory control and data acquisition (SCADA) system, which makes data easy to obtain. Data-driven methods are therefore well worth studying for wind turbine fault diagnosis and condition monitoring. The temperature of gearbox components is closely related to the operating state of the gearbox. Excessive temperature can cause faults, and, conversely, a fault in a component is also accompanied by a significant change in temperature [11]. Therefore, high-temperature warning of gearbox components is crucial for condition monitoring of wind turbines and for reducing operation and maintenance costs. The key to high-temperature warning is to make the temperature prediction model as accurate as possible. In this paper, a data-driven method based on temperature prediction is studied to monitor the operating status of the gearbox.
Generally, according to the source of the data, time series prediction models for wind turbine systems can be divided into two categories: multi-variable models and single-variable models. At present, most temperature prediction models adopt multi-variable data from the SCADA system [12,13]. Huang et al. [12] put forward a hybrid method combining principal component analysis (PCA) and a nonlinear autoregressive dynamic neural network to establish a gearbox oil temperature prediction model. Wang et al. [13] presented a condition monitoring method for the wind turbine main bearing based on the deep belief network (DBN), in which the DBN is used to establish the normal temperature prediction model. However, the use of multi-variable data may increase the complexity and uncertainty of the modeling process, which can reduce the performance of the prediction model. Compared with the multi-variable model, the single-variable model has lower computational complexity and its data are easier to acquire [14].
Although single-variable methods are seldom used for temperature prediction of gearbox components, many such prediction methods have proven effective in other aspects of wind energy systems, such as wind speed and wind power. The prediction methods can be roughly classified into three categories: statistical methods [15,16], conventional machine learning methods [17,18] and deep learning methods [19,20]. Among the statistical methods, the autoregressive integrated moving average (ARIMA) model is the most classical and widely adopted. However, most statistical methods have difficulty dealing with the non-linear characteristics of time series, which results in low prediction accuracy. Conventional machine learning methods are also widely used in time series prediction; they mainly include back propagation (BP) neural networks, radial basis function (RBF) neural networks, the extreme learning machine (ELM) and the support vector machine (SVM). Nevertheless, because of their shallow structure, these methods have a limited ability to learn the non-linearity and non-stationarity of data. In recent years, with breakthroughs in neural network technology, deep learning approaches have attracted wide attention because of their better performance in many tasks. Compared with shallow methods, deep learning methods have a better capacity for non-linear expression and data feature extraction [21]. Wang et al. [19] proposed a novel hybrid deep learning-based approach, and the comparison results indicate that the hybrid model can better learn non-linear and non-stationary characteristics.
The performance of gearbox condition monitoring depends on a high-precision temperature prediction model, especially for the high-temperature part of the series. To this end, it is important to develop optimization methods that improve prediction performance. Existing optimization algorithms fall into three main groups: signal processing techniques [22][23][24], parameter optimization techniques [25,26] and error correction techniques [27,28]. Table 1 summarizes the above-mentioned and related algorithms.
Among signal processing techniques, signal decomposition methods are widely used, such as empirical mode decomposition (EMD), ensemble empirical mode decomposition (EEMD), fast ensemble empirical mode decomposition (FEEMD) and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). Many studies have demonstrated the effectiveness of decomposition algorithms. However, these traditional decomposition methods have some shortcomings. For example, wavelet decomposition (WD) and wavelet packet decomposition (WPD) sometimes have difficulty separating multiple low-frequency components, while other decomposition algorithms, including EMD, EEMD and FEEMD, currently lack a strict mathematical proof [29]. To overcome these drawbacks, some newer decomposition algorithms have been adopted in time series prediction, such as the empirical wavelet transform (EWT) and variational mode decomposition (VMD). In [24], the VMD approach is chosen to decompose the corresponding time-series signals, which avoids the interaction between different modes. In addition to decomposition, error correction is another way to improve the performance of a prediction model [30]. In [28], an error correction model based on the ICEEMDAN and ARIMA algorithms is proposed to improve prediction accuracy.
Some deficiencies in this field of research still need further study. First, many studies decompose the training data and testing data together [31,32], which is not feasible in real-time prediction, because the testing data are not yet available when the model is applied. Other studies do not clearly explain how the modeling data were constructed. Second, unlike wind speed, the temperature drops dramatically during shutdowns and similar events in wind turbine operation, which leads to inaccurate prediction results.
In this study, a new hybrid forecasting method is proposed, which consists of a preliminary temperature prediction model and an adaptive error correction model. The innovations and contributions of the proposed hybrid model are as follows: (a) to avoid the complexity and uncertainty of multi-variable prediction models, a prediction model based on single-variable data is proposed, in which the long short-term memory (LSTM) model, a deep learning model well suited to time series analysis, is adopted to better learn the non-linear and non-stationary characteristics of the temperature series; (b) to address the drastic temperature drops caused by the downtime mentioned above, an adaptive error correction model is designed to improve the precision of the prediction model; (c) to avoid the weaknesses of decomposition algorithms such as EMD, EEMD, FEEMD and CEEMDAN, the VMD algorithm is employed, which can effectively reduce the chaotic and non-stationary characteristics of the error series; (d) to address the modeling data construction problem mentioned above, a rolling data decomposition process that can be applied in practice is proposed.
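The paper is organized as follows: (a) the framework and algorithms of the hybrid prediction model are explained in Section 2; (b) gearbox component temperature forecasting case studies are presented in Section 3; and (c) conclusions are drawn in Section 4.

Table 1. Summary of the above-mentioned and related algorithms.

| Category | Reference | Data | Algorithm |
| --- | --- | --- | --- |
| Statistical methods | [16] | single-variable | AR |
| Conventional machine learning methods | Huang et al. [12] | multi-variable | PCA, NARX |
| Conventional machine learning methods | Li et al. [17] | single-variable | ANN, RBF |
| Conventional machine learning methods | Abdoos et al. [18] | single-variable | ELM |
| Deep learning methods | Wang et al. [13] | multi-variable | DBN |
| Deep learning methods | Wang et al. [19] | single-variable | DBN |
| Deep learning methods | Wang et al. [20] | single-variable | CNN |
| Optimization methods: signal processing techniques | Liu et al. [22] | single-variable | WPD, FEEMD |
| Optimization methods: signal processing techniques | Mi et al. [23] | single-variable | WPD, EMD |
| Optimization methods: signal processing techniques | Naik et al. [24] | single-variable | VMD |
| Optimization methods: parameter optimization techniques | Meng et al. [25] | single-variable | crisscross optimization |
| Optimization methods: parameter optimization techniques | Liu et al. [26] | multi-variable | GA |
| Optimization methods: error correction techniques | Wang et al. [27] | single-variable | Markov |
| Optimization methods: error correction techniques | Wang et al. [28] | single-variable | ICEEMDAN-ARIMA |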

The Overall Framework of the Proposed Model
The overall framework of the hybrid model presented in this paper is shown in Figure 1. The general process of the proposed hybrid model is described as follows:
• The original temperature series was predicted by the LSTM model to generate preliminary prediction results. Meanwhile, the error series was generated by comparing the predicted values with the actual values.
• Faced with the non-stationary, high-frequency and chaotic characteristics of the error series, the VMD algorithm was employed to decompose it into sub-sequences of different frequencies. In order to apply the model to the online prediction process, a rolling data decomposition process was developed, as shown in Figure 2. In Figure 2, $T_i$, $U_i$ and $R_{i,j}$ represent the original temperature series, the error series of the preliminary prediction and the frequency components of the error series decomposed by the VMD algorithm, respectively, where $i$ is a time label and $j$ labels the different frequency components.
• The prediction model of each frequency component was established by the error prediction model, and the final error prediction results were reconstructed based on the adaptive error correction algorithm.
• The final forecasting results were obtained by adding the error prediction results to the preliminary temperature prediction results. When the predicted temperature exceeds a certain threshold, a high-temperature warning should be issued.

Preliminary Prediction Model
This paper is devoted to the temperature prediction of wind turbine gearbox components so as to better realize condition monitoring of the gearbox. To avoid the complexity and uncertainty of the multi-variable prediction model, a single-variable prediction method is proposed. Because of the complex operating conditions, it is difficult for conventional machine learning approaches to learn the non-linear and non-stationary characteristics of gearbox component temperature data. LSTM is a deep learning model that not only has a stronger non-linear expression ability than traditional machine learning methods, but is also well suited to time series prediction because of its memory characteristics. Therefore, the LSTM algorithm was applied as the preliminary prediction model for the gearbox component temperature series in this paper.
The LSTM neural network is an improved model based on the recurrent neural network (RNN) [33]. The output of an LSTM unit depends not only on the current input and weights, but also on the state carried over from the previous time step. Therefore, the LSTM structure is usually more suitable for processing time-series data. The basic unit structure of the LSTM model is shown in Figure 3. Four elements, namely the state of each unit, the input gate, the forget gate and the output gate, form the core of the LSTM model. The relationship between the LSTM unit state and the three gates is expressed by Equations (1)-(5) [34,35].
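where $X_t$ is the input vector; $i_t$, $o_t$ and $f_t$ represent the outputs of the input gate, output gate and forget gate, respectively; $c_t$ represents the activation state of each cell; and $h_t$ is the output of the memory unit. In addition, $w_{xi}$, $w_{hi}$, $w_{xf}$, $w_{hf}$, $w_{xc}$, $w_{hc}$, $w_{xo}$ and $w_{ho}$ are the corresponding weight vectors.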
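For reference, the standard LSTM cell without peephole connections, written with the symbols defined above, takes the following form. This is a sketch of what Equations (1)-(5) conventionally contain rather than a transcription: the bias vectors $b_i$, $b_f$, $b_c$ and $b_o$, the sigmoid function $\sigma$ and the element-wise product $\odot$ are notational assumptions.

```latex
\begin{align}
i_t &= \sigma\left(w_{xi} X_t + w_{hi} h_{t-1} + b_i\right) \tag{1} \\
f_t &= \sigma\left(w_{xf} X_t + w_{hf} h_{t-1} + b_f\right) \tag{2} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\left(w_{xc} X_t + w_{hc} h_{t-1} + b_c\right) \tag{3} \\
o_t &= \sigma\left(w_{xo} X_t + w_{ho} h_{t-1} + b_o\right) \tag{4} \\
h_t &= o_t \odot \tanh\left(c_t\right) \tag{5}
\end{align}
```

The forget gate $f_t$ scales the previous cell state $c_{t-1}$, which is how information from earlier time steps is retained or discarded.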

Adaptive Error Correction Model
The temperature prediction accuracy for gearbox components greatly affects the high-temperature monitoring performance of the gearbox. Therefore, an adaptive error correction model is presented in this paper, which increases prediction accuracy by predicting the error of the preliminary model. However, because of the non-stationary and chaotic characteristics of the error series, it is difficult to predict the error series directly. Signal decomposition can effectively reduce the non-stationary and chaotic characteristics of a time series, and many studies have demonstrated its effectiveness. To overcome the shortcomings of conventional decomposition algorithms, such as the mode mixing problem and the lack of mathematical proof, the state-of-the-art VMD algorithm was applied in this paper. The final error value was reconstructed from the predictions of the individual decomposed components.

The VMD Algorithm
VMD, a relatively new signal decomposition method, has been widely used in recent years. Unlike the recursive sifting of EMD, VMD casts the decomposition as a variational problem. The purpose of the VMD algorithm is to find a specified number of intrinsic mode components. To solve this variational problem, the alternating direction method of multipliers (ADMM) is used to obtain the modes and their corresponding central frequencies. The specific process of the VMD algorithm is as follows [36].
(1) The construction of the variational problem
To calculate the bandwidth of each mode component, the analytic signal of each mode component is obtained by the Hilbert transform, and then the unilateral frequency spectrum is obtained as follows.
where the intrinsic mode function (IMF) is defined as an amplitude-modulated, frequency-modulated signal.
Then the corresponding baseband is obtained by shifting the spectrum of the analytic signal.
By calculating the $L^2$-norm of the derivative of the above analytic signal to estimate the bandwidth of each mode, the constrained variational problem is constructed as follows.
(2) The solution of the variational problem
To solve this problem, the constrained variational problem of Equation (8) is transformed into an unconstrained one by using the Lagrange multiplier method.
where η is a quadratic penalty factor and λ(t) represents the Lagrangian multiplier. The ADMM algorithm is used to solve the above variational problem. In ADMM, the saddle point of the Lagrangian expression is found by alternately updating $u_k^{n+1}$, $\omega_k^{n+1}$ and $\lambda^{n+1}$. Among them, $u_k^{n+1}$ can be updated using the following equation.
where $\omega_k$ and $\sum_i u_i(t)$ are equivalent to $\omega_k^{n+1}$ and $\sum_i u_i(t)^{n+1}$, respectively, and $n$ is the number of iterations.
By using the Parseval/Plancherel Fourier isometry, Equation (10) can be converted into a frequency-domain form and solved in the frequency domain.
where $\hat{(\cdot)}$ denotes the frequency-domain form of the corresponding signal and sgn is the sign function. Then $\omega$ is replaced by $\omega - \omega_k$ in the first term.
The problem can then be changed into an integral over the non-negative frequency interval.
Finally, the solution of the quadratic optimization problem is obtained as follows.
where $\hat{u}_k^{n+1}(\omega)$ can be regarded as the Wiener filtering of the current residual. Similarly, the central frequencies of the corresponding modes are updated as follows:
where $\omega_k^{n+1}$ is the center of the power spectrum of the $k$-th mode component at iteration $n+1$. $\hat{\lambda}^{n+1}(\omega)$ can be updated as:
where ρ is the update coefficient of $\hat{\lambda}^{n+1}(\omega)$.
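The steps above follow the standard VMD derivation, which can be written out as below with this section's symbols (η for the quadratic penalty factor, ρ for the multiplier update coefficient). The forms and the numbering (6)-(16), chosen so that the constrained problem is Equation (8) and the mode update is Equation (10) as the text refers to them, are reconstructed from that standard derivation and should be read as an assumption. The symbols $f$ (the signal being decomposed), $K$ (the number of modes), $\delta$ (the Dirac distribution) and $*$ (convolution) are likewise assumed notation.

```latex
\begin{align}
&\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t) \tag{6} \\
&\left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \tag{7} \\
&\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2 \right\}
  \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{8} \\
&L\left(\{u_k\},\{\omega_k\},\lambda\right) = \eta \sum_{k=1}^{K} \left\| \partial_t \left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2
  + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2
  + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{9} \\
&u_k^{n+1} = \arg\min_{u_k} \left\{ \eta \left\| \partial_t \left[\left(\delta(t) + \frac{j}{\pi t}\right) * u_k(t)\right] e^{-j\omega_k t} \right\|_2^2
  + \left\| f(t) - \sum_{i} u_i(t) + \frac{\lambda(t)}{2} \right\|_2^2 \right\} \tag{10} \\
&\hat{u}_k^{n+1} = \arg\min_{\hat{u}_k} \left\{ \eta \left\| j\omega \left[\left(1 + \operatorname{sgn}(\omega + \omega_k)\right) \hat{u}_k(\omega + \omega_k)\right] \right\|_2^2
  + \left\| \hat{f}(\omega) - \sum_{i} \hat{u}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2} \right\|_2^2 \right\} \tag{11} \\
&\hat{u}_k^{n+1} = \arg\min_{\hat{u}_k} \left\{ \eta \left\| j(\omega - \omega_k) \left[\left(1 + \operatorname{sgn}(\omega)\right) \hat{u}_k(\omega)\right] \right\|_2^2
  + \left\| \hat{f}(\omega) - \sum_{i} \hat{u}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2} \right\|_2^2 \right\} \tag{12} \\
&\hat{u}_k^{n+1} = \arg\min_{\hat{u}_k} \left\{ \int_0^{\infty} 4\eta (\omega - \omega_k)^2 \left|\hat{u}_k(\omega)\right|^2
  + 2 \left| \hat{f}(\omega) - \sum_{i} \hat{u}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2} \right|^2 d\omega \right\} \tag{13} \\
&\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \dfrac{\hat{\lambda}(\omega)}{2}}{1 + 2\eta (\omega - \omega_k)^2} \tag{14} \\
&\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega}{\int_0^{\infty} \left|\hat{u}_k^{n+1}(\omega)\right|^2 d\omega} \tag{15} \\
&\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \rho \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right) \tag{16}
\end{align}
```

Equation (14) makes the Wiener-filter reading explicit: the residual left by the other modes is weighted by a factor that equals 1 at $\omega = \omega_k$ and decays as $\omega$ moves away from the center frequency, at a rate set by η.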

Adaptive Error Correction Algorithm
A prediction model is needed for each modal component obtained by VMD. To keep the model simple, the LSTM model is again employed to predict each decomposed component. The input of the error prediction model is the data of the past four moments; this number of inputs was chosen by a grid search over the prediction performance of the model. Finally, the predictions of the components are combined to reconstruct the final predicted value. However, the error prediction model performs better on the weakly volatile parts of the series than on the strongly volatile parts. In highly volatile parts, the correction model may make the results worse. To reduce this effect, an adaptive error correction algorithm is proposed to further improve accuracy. The algorithm considers two aspects: the effectiveness of the correction model and the amplitude of the primary error series. Let $d$, $m$ and $c$ denote the result of error prediction, the error series after correction and the error series before correction, respectively, and let $g$ be the difference between $|m|$ and $|c|$. When $g > 0$, the correction is defined as invalid, since the error after correction is then larger in magnitude than the error before correction. When $g$ exceeds a certain threshold for several consecutive moments, the correction at the next moment may also be invalid. In addition, to limit the influence of a worsening correction, it is necessary to limit the amplitude of the correction errors. The adaptive error correction algorithm is described in Algorithm 1.

Algorithm 1. The adaptive error correction algorithm.
In Algorithm 1, ξ, α and β are thresholds.
To set the thresholds ξ, α and β, the magnitude of the error series was analyzed, and the thresholds were chosen by grid search within a reasonable range; the same grid search was used to study the effect of the threshold setting on the accuracy of the hybrid temperature prediction model. In the experiments, the prediction performance of the model fluctuates slightly because of parameter initialization. Therefore, a group of thresholds whose prediction performance lies in the middle of the observed range was chosen, which gives the algorithm better generalization ability and robustness.
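The two ideas described in this section, skipping a correction after a run of invalid corrections and limiting the correction amplitude, can be sketched as follows. This is an illustrative sketch only and is not Algorithm 1: the parameter names `g_threshold`, `max_correction` and `patience` are hypothetical, how they map onto ξ, α and β is not assumed, and the error is assumed to be defined as actual value minus preliminary prediction.

```python
import numpy as np

def adaptive_error_correction(prelim, d, actual, g_threshold=1.0,
                              max_correction=1.0, patience=2):
    """Add predicted errors d to preliminary predictions, adaptively.

    prelim, d, actual: 1-D sequences of equal length. actual[t] is assumed to
    become available only after the prediction for step t has been issued.
    All three keyword parameters are hypothetical, illustrative thresholds.
    """
    prelim = np.asarray(prelim, dtype=float)
    d = np.asarray(d, dtype=float)
    actual = np.asarray(actual, dtype=float)
    final = np.empty_like(prelim)
    bad_streak = 0  # consecutive steps on which correcting made things worse
    for t in range(len(prelim)):
        # Limit the amplitude of the correction (assumed clipping rule).
        correction = float(np.clip(d[t], -max_correction, max_correction))
        # After a run of harmful corrections, skip the next one.
        if bad_streak >= patience:
            correction = 0.0
        final[t] = prelim[t] + correction
        # Once actual[t] is known, score the correction model at this step.
        c = actual[t] - prelim[t]   # error before correction
        m = c - d[t]                # error after applying the raw correction
        g = abs(m) - abs(c)         # g > 0: the correction was invalid
        bad_streak = bad_streak + 1 if g > g_threshold else 0
    return final
```

Scoring with the raw correction `d[t]` even when it was skipped lets the streak reset as soon as the error model becomes useful again; this is a design choice of the sketch, not something stated in the text.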

Model Performance Evaluation
To compare the prediction performance of different models, three evaluation indexes are used in this study: the mean square error (MSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE). The three evaluation indexes are defined as follows: where $\hat{y}_i$ and $y_i$ are the predicted results of the model and the actual temperature values, respectively, and $n$ is the length of the predicted temperature series.
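The three indexes can be computed as in the sketch below, which uses their conventional definitions. Expressing MAPE in percent (multiplying by 100) is an assumption about the scaling used here, and the function names are illustrative.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error: mean of (y_i - y_hat_i)^2."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error: mean of |y_i - y_hat_i|."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent: 100 * mean of |(y_i - y_hat_i) / y_i|.

    Assumes no actual value is zero, which holds for the temperature ranges
    discussed in this paper.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))
```

For example, with actual values 50, 60, 80 and predictions 52, 57, 80, the errors are -2, 3 and 0, giving an MSE of 13/3, an MAE of 5/3 and a MAPE of 3%.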

Data Description
Almost all wind turbines are equipped with a SCADA system, which makes it easy to obtain the temperature data of the gearbox components. In this study, the data were gathered from a wind farm in Shandong Province, China, and contain SCADA data from thirty-three wind turbines at 10-min intervals from 1 February 2014 to 27 June 2014. Three wind turbine prediction cases are provided to verify the superiority of the proposed hybrid model. The temperature data of gearbox components, including gearbox oil temperature, gearbox input shaft temperature and gearbox output shaft temperature, are taken from the SCADA systems of #1, #2 and #3, where #1, #2 and #3 represent wind turbine 1, wind turbine 2 and wind turbine 3, respectively. Each data set contains 6400 observations at 10-min intervals and is divided into two parts: the first 5400 observations were used for training and the last 1000 for testing. Generally speaking, in common types of wind turbines, the early warning threshold and the alarm threshold of the gearbox oil temperature can be set to 75 °C and 80 °C, respectively. The high-temperature warning threshold of the gearbox input and output shafts can be set to 80 °C. In addition, two sets of wind speed data from wind turbine 1 were collected to analyze the influence of the decomposition process on the online prediction model. The two datasets contain 600 and 601 observations at 10-min intervals, respectively.
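The chronological split and the oil temperature thresholds described above can be sketched as follows. The function names are illustrative, and treating a value as triggering a level only when it strictly exceeds the threshold is an assumption.

```python
def split_train_test(series, n_test=1000):
    """Chronological split: everything before the last n_test points is training data."""
    return series[:-n_test], series[-n_test:]

def oil_temperature_status(predicted, warning=75.0, alarm=80.0):
    """Map a predicted gearbox oil temperature to a monitoring status.

    The default thresholds are the early-warning and alarm values quoted in
    the text; the three status labels are hypothetical.
    """
    if predicted > alarm:
        return "alarm"
    if predicted > warning:
        return "warning"
    return "normal"
```

With a series of 6400 observations, `split_train_test` returns the first 5400 for training and the last 1000 for testing, so no test observation precedes a training observation in time.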

The Case of the Decomposition Algorithm
To analyze the application of the decomposition algorithm in real-time time series prediction, the above two wind speed series are decomposed by the EMD and VMD algorithms. Figures 4 and 5 show the decomposition results.
Figures 4 and 5 show that, for both the EMD and the VMD algorithm, new data affect the decomposition results of the previous data to a certain extent; that is, the decomposition of earlier points already carries information from later points. Therefore, it is not suitable for a real-time prediction model to decompose the training data and testing data together. As shown in Figure 2, a real-time rolling data decomposition process based on the VMD algorithm is proposed, which is better suited to real-time prediction. The original temperature series is used to train the preliminary prediction model (LSTM). In the training process of the LSTM model, nine temperature values (such as $T_1, \ldots, T_9$) are used as the input vector ($X_i$) and the next temperature value (such as $T_{10}$) is used as the output ($y_i$). Then, the error series is generated by comparing the predicted results with the actual values. In the training data of the error prediction model, every 200 error values (such as $U_1, \ldots, U_{200}$) are decomposed as a group by the VMD algorithm. The last value after decomposition (such as $R_{200,j}$) is used as the target, and the four values before it (such as $R_{196,j}, \ldots, R_{199,j}$) are used as input. In the testing data of the error prediction model, the last four decomposed values (such as $R_{1397,j}, \ldots, R_{1400,j}$) are taken as the input vector. The final error prediction results are reconstructed from the predicted values of each frequency component. The final prediction results are obtained by adding the adaptive error prediction results to the preliminary temperature prediction results.
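The data construction described above can be sketched as follows. The `vmd` function is a minimal frequency-domain implementation of the standard VMD updates (Wiener-filter mode update, center-of-gravity frequency update, dual ascent) without boundary mirroring, not the implementation used in the experiments. The defaults `eta=2000.0`, `rho=0.0` and `n_iter=200`, frequencies expressed in cycles per sample, and the stride of one between consecutive 200-point groups are all assumptions.

```python
import numpy as np

def vmd(signal, n_modes=8, eta=2000.0, rho=0.0, n_iter=200):
    """Minimal VMD sketch on the non-negative half spectrum (via rfft).

    Returns (modes, omega): modes has shape (n_modes, len(signal)) and omega
    holds the center frequencies in cycles per sample, in [0, 0.5].
    """
    f = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(len(f))
    f_hat = np.fft.rfft(f)
    u_hat = np.zeros((n_modes, len(freqs)), dtype=complex)
    omega = 0.5 * np.arange(n_modes) / n_modes  # uniform initial centers
    lam = np.zeros(len(freqs), dtype=complex)   # Lagrangian multiplier
    for _ in range(n_iter):
        for k in range(n_modes):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Wiener-filter update of mode k around its center frequency.
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * eta * (freqs - omega[k]) ** 2)
            # Center frequency = center of gravity of the mode's power spectrum.
            power = np.abs(u_hat[k]) ** 2
            omega[k] = freqs @ power / (power.sum() + 1e-12)
        lam = lam + rho * (f_hat - u_hat.sum(axis=0))  # dual ascent step
    return np.fft.irfft(u_hat, n=len(f), axis=1), omega

def sliding_windows(series, n_in=9):
    """Rows of X hold n_in consecutive values; y holds the value that follows each row."""
    s = np.asarray(series, dtype=float)
    X = np.lib.stride_tricks.sliding_window_view(s[:-1], n_in)
    return X, s[n_in:]

def rolling_error_samples(errors, group=200, n_in=4, n_modes=8):
    """Per-mode (inputs, targets) built from windows that end at each time step.

    Only the `group` most recent errors are decomposed at a time, so no sample
    uses data later than its own target.
    """
    errors = np.asarray(errors, dtype=float)
    inputs, targets = [], []
    for end in range(group, len(errors) + 1):
        modes, _ = vmd(errors[end - group:end], n_modes=n_modes)
        inputs.append(modes[:, -n_in - 1:-1])  # e.g. R_{196,j} ... R_{199,j}
        targets.append(modes[:, -1])           # e.g. R_{200,j}
    return np.stack(inputs), np.stack(targets)
```

`sliding_windows` builds the nine-input samples for the preliminary model, and `rolling_error_samples` builds the four-input samples for each frequency component. At prediction time, the most recent group would be decomposed in the same way and its last four values per mode used as the input vector.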

The Case of Gearbox Components Temperature Prediction
The case uses the gearbox component temperature data of #1, #2 and #3. Each experiment involves seven prediction models: the LSTM model, the BP neural network, the ELM model, the LSTM model with error correction (LSTM-EC), the ELM model with error correction (ELM-EC), the ELM model with adaptive error correction (ELM-AEC) and LSTM-AEC. When comparing the hybrid model with the other models, all models have similar parameter settings. All models were built and simulated under the Windows 10 operating system on an Intel Core i5-7500 CPU @ 3.40 GHz with 8 GB of RAM, and all experiments were implemented in Python 3.6. The parameters α, β, ξ and the number of input data in the adaptive error correction algorithm were set to 0.5, 1, 1 and 4, respectively. In the preliminary temperature prediction model, nine historical temperature values were used as inputs. A BP neural network containing one hidden layer with 26 neurons was used in all three experiments. The three experiments used ELM networks containing one hidden layer with 6, 10 and 10 neurons, respectively. In selecting the training data of the LSTM model, 1000-4000 observations were selected from 4000 observations to make the model more robust, which prevents special results arising from special training sets. The number of modes in the VMD algorithm was set to 8. In addition, the learning rate was 0.6 and the Adagrad optimization algorithm was used in the LSTM and BP models.
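For reproducibility, the settings stated in this paper can be gathered in one place, for example as the dictionary below. The key names and grouping are illustrative; the values are those given in the text, and settings the text does not state (such as the number of LSTM units or training epochs) are deliberately left out.

```python
EXPERIMENT_CONFIG = {
    # Preliminary temperature prediction model (LSTM).
    "preliminary_model": {"n_inputs": 9, "optimizer": "Adagrad", "learning_rate": 0.6},
    # Error prediction model: VMD on groups of 200 errors, four inputs per mode.
    "error_model": {"n_inputs": 4, "vmd_modes": 8, "group_size": 200},
    # Thresholds of the adaptive error correction algorithm.
    "adaptive_correction": {"alpha": 0.5, "beta": 1.0, "xi": 1.0},
    # Comparison models.
    "bp": {"hidden_neurons": 26, "optimizer": "Adagrad", "learning_rate": 0.6},
    "elm": {"hidden_neurons_per_experiment": [6, 10, 10]},
    # Data set layout per wind turbine.
    "data": {"n_train": 5400, "n_test": 1000, "interval_minutes": 10},
}
```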
The temperatures of three gearbox components, namely the gearbox oil, the gearbox input shaft and the gearbox output shaft, are predicted by the proposed hybrid model. Tables 2-4 and Figures 6-14 show the prediction results of the different models. In Figures 6-14, models A-H represent the real value, LSTM-EC, LSTM-AEC, LSTM, ELM, ELM-AEC, ELM-EC and BP, respectively. From the above prediction case, the following can be concluded: (a) Comparing the predictive performance of LSTM, ELM and BP, the forecasting accuracy of the LSTM model is higher than that of the other two models under the same conditions. Taking the gearbox oil temperature of wind turbine 1 in Table 2 as an example, the MSE improvements of the LSTM model over the BP and ELM models are 0.7129 and 0.4046, respectively. Thus, the LSTM model can better learn the non-stationary and non-linear characteristics of the temperature data to a certain extent. (b) In general, the prediction model with error correction has higher accuracy than the single prediction model. Some prediction results support this point, such as the ELM and ELM-EC results for the gearbox input shaft temperature in Table 3. However, there are some special cases with the opposite result, including the three LSTM and LSTM-EC results for the gearbox output shaft temperature in Table 4. It can therefore be seen that some residual series lead to worse results after correction. (c) With either ELM or LSTM, adaptive error correction improves the accuracy of the prediction model. For example, for the gearbox oil temperature in Table 2, the MSE improvements of the LSTM-AEC model over the LSTM model are 0.2317, 0.0654 and 0.0819, respectively. (d) Among all the prediction models involved, the proposed hybrid model has the best forecasting performance.
From Figures 6-14, it can be seen that the predicted values of the proposed hybrid model in the high-temperature part are very accurate, which supports high-temperature warning of gearbox components. As shown in Figures 12 and 14, the temperature of the gearbox output shaft exceeds the high-temperature warning threshold at several points in #1 and #3, respectively, such as the high-temperature series starting from time points 45, 218, 629 and 938 in #1. In practical application, the training of the model is completed off-line. Once training is complete, the model can be used for real-time temperature prediction, which is made possible by the rolling data decomposition process proposed in this paper. Our forecasting case simulates this whole process, including model training and real-time forecasting. In addition, the running time was measured. The training time of the hybrid model was about 325.1167 s, but it should be noted that training is completed off-line. Predicting 1000 temperature values took 49.8446 s, so the average prediction time per value was about 0.0498 s, which fully satisfies the demand of temperature prediction at 10-min intervals.

Conclusions
The accuracy of the prediction model directly affects the high-temperature warning performance for wind turbine gearbox components. To achieve higher forecasting accuracy, a novel hybrid model, named LSTM-AEC, is proposed in this study, which consists of the LSTM preliminary prediction model and an adaptive error correction algorithm based on the VMD method. In addition, the rolling, real-time data decomposition process of the VMD algorithm ensures that the proposed model can be used online. To demonstrate the effectiveness and superiority of the proposed hybrid model, three wind turbine prediction experiments are presented. The prediction models compared are the hybrid model (LSTM-AEC), BP, ELM, LSTM, ELM-EC, LSTM-EC and ELM-AEC. Based on the comparative analysis of their prediction performance, the following conclusions can be drawn: (a) comparing LSTM with the ELM and BP algorithms shows that LSTM is superior to the other two models to some extent; (b) comparing the two sets of models, namely ELM, ELM-EC and ELM-AEC, and LSTM, LSTM-EC and LSTM-AEC, shows that the adaptive error correction algorithm can improve the preliminary prediction results to a certain extent; (c) according to the prediction results for three wind turbines, the proposed hybrid model performs better than the other comparative models. Moreover, the prediction accuracy of the proposed hybrid model in the high-temperature part of the series is high, which lays a solid foundation for high-temperature warning of wind turbine gearbox components.
Although the current research shows that the hybrid model has better performance in temperature prediction of gearbox components, the model still has some limitations that need further study. Model parameter initialization causes the prediction performance to fluctuate. Although this fluctuation does not affect the conclusions drawn in this paper, it shows that the proposed hybrid model can be further improved. In addition, the hybrid model only predicts the temperature of gearbox components one step ahead, whereas in practical applications multi-step prediction is more useful because it leaves more time for maintenance. In future work, the problem of parameter initialization will be studied to further improve the performance and robustness of the prediction model, and a multi-step temperature prediction model will be developed to make the prediction model more practical.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:
$o_t$: The output results of LSTM output gate
$c_t$: The activation state of each cell
$h_t$: The output results of memory unit
$w_{xi}$, $w_{hi}$, $w_{xf}$, $w_{hf}$: The corresponding weight vectors
$w_{xc}$, $w_{hc}$, $w_{xo}$, $w_{ho}$: The corresponding weight vectors
$\|\cdot\|_2$: The $L^2$-norm symbol
$n$: The number of iterations
$\hat{(\cdot)}$: The frequency form of the corresponding signal
sgn: Sign function
ρ: Update coefficient of $\hat{\lambda}^{n+1}(\omega)$
$d$: Result of error prediction
$m$: Error series after correction
$c$: Error series before correction
$|\cdot|$: The absolute value symbol
$g$: Difference between $|m|$ and $|c|$
ξ, α, β: Thresholds of Algorithm 1
$T_i$: The original temperature series
$U_i$: The error series of the preliminary prediction
$R_{i,j}$: The frequency component decomposed by the VMD algorithm
#1, #2, #3: Wind turbine one, wind turbine two and wind turbine three