An Adaptive Generalized Cauchy Model for Remaining Useful Life Prediction of Wind Turbine Gearboxes with Long-Range Dependence

: Remaining useful life (RUL) prediction is important for wind turbine operation and maintenance. The degradation process of gearboxes in wind turbines is a slowly and randomly changing process with long-range dependence (LRD). The degradation trend of the gearbox is constantly changing, and a single drift coefﬁcient is not accurate enough to describe the degradation trend. This paper proposes an original adaptive generalized Cauchy (GC) model with LRD and randomness to predict the RUL of wind turbine gearboxes. The LRD is explained jointly by the fractal dimension and the Hurst exponent, and the randomness is explained by the diffusion term driven by the GC difference time sequence. The estimated value of the unknown parameter of adaptive GC model is deduced, and the speciﬁc expression of the RUL estimation is deduced. The adaptability is manifested in the time-varying drift coefﬁcient of the GC model: by continuously updating the drift coefﬁcient to adapt to the change in the degradation trend, the adaptive GC model offers high accuracy in the prediction of the degradation trend. The performance of the proposed model is analyzed using real wind turbine gearbox data.


Introduction
As wind energy continues to grow, ensuring the reliability and reducing maintenance of wind turbines gains increasing importance [1,2]. Monitoring the health state and predicting the RUL of wind turbines has become important to limiting premature failures of wind turbines [3,4]. Bearings and gearbox failures are common wind turbines failures [5]. According to the statistics of the latest reliability database in 2011, approximately 76% of wind turbine failures occur due to bearings and gearboxes [6]. Most gearbox failures start at the bearing location and transfer to the gear teeth in the form of bearing fragments leading to shutdown [7,8].
Research has been conducted in wind turbine condition monitoring, fault detection and diagnosis, and failure prediction. The methods proposed are mainly divided into data-driven methods and model-based methods [9]. For example, a multi-class support vector machine is proposed for fault detection and identification in wind power transmission gearboxes by fusing different time-domain and frequency-domain features. An integral extension load mean decomposition multi-scale entropy method is used in [10] to extract features, and a least square support vector machine is developed to perform fault diagnosis and RUL prediction for wind turbines. In [11], an adaptive Bayesian algorithm is used to predict the bearing's RUL in wind turbines. A practical disadvantage of the data-driven methods is that they usually require historical data of degradation to train the model [12]. Wind turbines are usually complex, large-scale and highly reliable, and obtaining large amounts of fault measurement data is expensive and time-consuming. The degradation process of wind turbine bearings and gearboxes can be affected by many factors (for example, different environmental factors, failure modes, operating conditions, etc.), which introduces difficulties to the construction and training of data-driven models [13]. Therefore, fault diagnosis and failure prediction based on limited fault data samples are still challenging problems [3].
As a practical alternative, model-based wind turbine life prediction methods have been considered. For instance, a gamma model has been proposed in [14] to describe degradation processes. The model parameters are estimated and updated by the maximum likelihood estimation method and Bayesian method, respectively. A monotonic degradation model based on the inverse Gaussian process is used in [15] to describe the degradation process. The change point of the bearing degradation process is considered in [16] to establish a stochastic prediction model based on the Wiener process. However, the degradation of wind turbine gearboxes is a slow and continuous process with the LRD characteristic, which is affected by the historical state evolution. The above-mentioned model-based prediction methods of wind turbine do not take the LRD into account and do not consider the adaptive nature of the drift coefficient.
In this context, the work presented in this paper proposes an adaptive GC model to describe the LRD and randomness of wind turbine degradation. The adaptive GC model considers the time-varying nature of the drift coefficient of the degradation process [17]. In the continuous change in the degradation trend, the drift coefficient is automatically updated and adjusted in real time to adapt to the change so that the adaptive GC model can predict the degradation trend with high accuracy. The LRD and randomness are explained by the diffusion term driven by the GC difference time sequence. The LRD of the GC process is evaluated by its autocorrelation function (ACF) [18]. The GC process is considered to satisfy the LRD characteristics if its ACF integral diverges. Based on the integration of ACF, the GC process has been found to satisfy the LRD characteristic when 0 < (4 − 2D)(2 − 2H) ≤ 1 [19]. H is the Hurst index, and D is the fractal dimension. The GC process satisfies the LRD characteristics, which means that its ACF slowly decays to the extent of integral divergence. According to the above conditions, the LRD characteristics of the GC process are jointly evaluated by the fractal dimension and the Hurst exponent.
In summary, the main contributions of this paper are highlighted as follows: 1. The LRD prediction model driven by a GC process is introduced to describe the degradation process of wind turbine gearboxes. The LRD is explained jointly by the fractal dimension and the Hurst exponent, and the randomness is explained by the diffusion term driven by the GC difference time sequence; 2.
The time-varying drift coefficient is used to describe the adaptive update of the degradation trend, and the parameter estimation and RUL estimation of the adaptive GC model are deduced; 3.
The normality test of the GC difference time sequence ensures the accuracy of the multi-dimensional joint distribution of the predicting model.
The remaining sections of this paper are organized as follows. Section 2 describes the LRD characteristics of the GC process. The GC difference time sequence is constructed by ACF, and a normality test is carried out. Section 3 proposes an adaptive GC model and derives the parameter estimation and RUL estimation of the degradation model. Section 4 describes how the proposed procedure is applied to online monitoring and RUL prediction of wind turbines gearboxes. Finally, Section 5 is the conclusion of this paper.

The LRD Characteristic of The GC Process
Let {x(t), 0 < t < ∞} be a GC stochastic process; the ACF, R(τ), of x(t) can be written as follows [19]: where τ is the lag time, and α and β are parameters describing the LRD characteristics. The LRD characteristics of the GC process are evaluated jointly by the Hurst index H and the fractal dimension D, which represent global self-similarity and local roughness, respectively. The specific relationships are H = 1 − β/2, D = 2 − α/2 [19].
If the improper integral of the ACF of x(t) on (0, ∞) is infinite, then x(t) is said to be a stochastic process with LRD characteristics. The infinite integration of the ACF can be calculated as follows: where B(x, y) = Γ(x)Γ(y)/Γ(x + y) is the beta function and Γ(·) is the gamma function.
Consider the condition when αβ = 1, which diverges since Γ(z) ∼ 1/z as z → 0 . Thus,Γ(0)/α ∼ ∞ and the GC process is LRD for 0 < αβ ≤ 1, whereas it is short-range dependent if αβ > 1. According to the definition of LRD, we obtain that when 0 < αβ ≤ 1, the GC process satisfies the LRD condition: Substituting the fractal dimension and Hurst exponent for α and β, one obtains the evaluation criterion for LRD, 0 < (4 − 2D)(2 − 2H) ≤ 1. The ACF for different values of H and D is plotted in Figure 1. It can be seen from Figure 1 that low D and high H values make the ACF curve decay slowly. When the lag time is large enough, the ACF curve does not decay to zero and the improper integral in Equation (4) diverges.

The Construction of the GC Difference Time Sequence
In this paper, the ACF is used to construct the GC time sequence. The specific steps are as follows.
Step 1: The power spectral density (PSD) function of the GC process is obtained from the relationship between the ACF and the PSD: where F stands for the Fourier transform. Suppose that H(ω) and h(t) are the system function and impulse response function, respectively, and that h(t) is the inverse Fourier transform of H(ω). The relationship between the power spectral density function S(ω) and the system function H(ω) is S(ω) = |H(ω)| 2 . From Equation (5), we can derive the expression of the impulse response function h(t): Step 2: The GC time sequence can be obtained from the white noise signal and the impulse response function of the GC process [20]: where w(t) is the white noise signal. Suppose that the frequency spectrum of unit white noise is W(ω) = exp(jθ(ω)). The white noise signal w(t) is obtained by inverse Fourier transform of the frequency spectrum W(ω) of the white noise: Step 3: Set the time interval to 1 and obtain the GC difference time sequence through the difference The specific flowchart of the above generation procedure is shown in Figure 2. The GC time sequence is generated with D = 1.2 and H = 0.7. The GC difference time sequence is shown in Figure 3.

The Normality Test on the GC Difference Time Sequence
The PSD is generated by the Fourier transform of the ACF of the GC process, and the impulse response function is obtained by the system function. Then, the GC time sequence is obtained by convolving the impulse response function with the white noise sequence. Finally, the first-order difference processing is performed to obtain the GC difference time sequence. Before obtaining the distribution to which the increment of the degradation process obeys, it is necessary to test the normality of the GC difference time sequence. The histograms under different lengths of the GC difference time sequence are shown in Figure 4. The histogram test, the joint test of skewness and kurtosis, the Shapiro-Wilk (SW) test and the Kolmogorov-Smirnov (KS) test were used to verify the normality of the GC difference time sequence [21,22]. It can be seen from Figure 4 that as the length of the time sequence increases, the time sequence is closer to the normal distribution. The skewness describes the symmetry of the time sequence, and the skewness of the normal distribution is zero. The kurtosis describes the steepness of the time sequence, and the kurtosis of the normal distribution is three. The result of the joint test of skewness and kurtosis is shown in Table 1.  In Tables 2 and 3, the p-values of the different-length time sequences after the SW test and the KS test are larger than 0.05, which shows that there is no significant difference between the GC difference time sequence and the normal distribution. Therefore, the GC difference time sequence obeys the normal distribution.

The Adaptive GC Model
The degradation process of wind turbine gearboxes can be described as [23] where y k is the degradation value at time t k ,dy k is the difference of y k ,a k Λ k is the drift function of the degradation process, reflecting the degradation trend item Λ k is a function of time t k and σdB(t k ) is the diffusion function of the degradation process. The GC degradation process is obtained by replacing the random term B(t k ) with GC(t k ) [24]: In order to express the time-varying nature of the degradation trend, suppose that a = [a 1 , a 2 , . . . , a n ] is a time-varying drift coefficient that obeys the normal distribution N(µ a , σ 2 a ). Then, an adaptive GC model is proposed. Adaptability is necessary to automatically adjust and update the model parameter values and associated RUL expressions according to the data characteristics. The principle of adaptive update of the drift coefficient is shown in Figure 5. The cycle ends when the health indicator (HI) exceeds the fault threshold (FT). Bringing the difference time sequence into Equation (10) yields Equation (11):

Parameter Estimation in the Adaptive GC Model
There are five unknown parameters-µ a , σ a , σ, H, D-in Equation (11), the values of which need to be determined. Let us denote the unknown parameter vector as Φ= [µ a , σ a , σ, H, D] T , where [·] T represents the transpose of the vector. The maximum likelihood estimation method is used to estimate the values of the unknown parameters based on actual degradation data. Suppose that the measured values of the degradation process of the HI at time t 1 , t 2 , . . . , t n are contained in the vector variable Y = [y 1 , y 2 , . . . , y n ] T , where n is the number of measurements. Let Λ= [Λ 1 , Λ 2 , . . . , Λ n ] T . According to the nature of the independent increment of the GC process, the multi-dimensional random variable Y follows: where Q is the covariance matrix of the GC process. The likelihood function is obtained from the probability density function of the multi-dimensional joint normal distribution Calculating the partial derivatives of Equation (13) can predict the parameters µ a and σ a , respectively, ∂l Setting Equations (16) and (17) to 0 and solving to obtain the maximum likelihood estimates ofμ a andσ 2 The contour log-likelihood function about σ can be obtained by substituting Equations (18) and (19) into Equation (13): whereΣ =σ 2 a ΛΛ + σ 2 Q. The maximum likelihood valueσ 2 can be obtained by maximizing the contour log-likelihood function.
In order to maximize the contour log-likelihood function, it is usually necessary to set the initial value of σ 2 . An inaccurate setting of the initial value will affect the result of the estimated valuesσ 2 . The initial values of σ 2 can be obtained as follows:
Regarding the initial values of a 1 , a 2 , . . . , a n as n actual values of a i ∼ N(µ a , σ 2 a ), the initial estimated values of µ a , σ 2 a are obtained by calculating the mean and variance of a 1 , a 2 , . . . , a n . 3.
According to a 1 , a 2 , . . . , a n , maximize l(σ 2 Y) to obtain the estimated valueσ 2 The Hurst exponent H is calculated by the rescaled range (RS) method [25]: where y is the mean of the HI, y i and y i,N are dispersion values, R and S are the range and standard deviation, respectively. The estimated value of the Hurst exponent H is obtained by the logarithmic relationship between R/S and N: The value of the fractal dimension D is estimated by the box dimension method [26]: where N d is the total number of squares occupied by the HI and d is the square side length.

RUL Estimation Based on the Adaptive GC Model
According to the concept of a RUL based on the first arrival time of degradation process to a given failure threshold, RUL at the moment l k is defined as: where ω is the FT. When the gearbox degradation process exceeds FT after the prediction start time (PST) for the first time, it is considered that a fault has occurred, and this time is called the end of life (EOL). In Figure 6, the RUL prediction schematic is clearly explained. Based on the RUL of the Wiener process degradation model, it is easy to obtain the RUL based on the GC process degradation model [27].
With the above parameter update mechanism, one can adaptively update the drift coefficient a k according to the observation data at the current moment. It can be seen from Equation (27) that the expression of the RUL is related to the drift coefficient a k . Next, we discuss the adaptive update of the RUL. The probability density function (PDF) of the RUL is obtained from the total probability formula where f L k (t y(t k )) is the PDF of the RUL considering the adaptive update of the drift coefficient a k , f L k (t y(t k ), a k ) is the conditional PDF of the RUL when the drift coefficient a k is determined, and f (a k |y(t k )) is the PDF of the drift coefficient a k following a normal distribution. If Z ∼ N(µ Z , σ 2 Z ), we obtain ). (29) According to the equation for the calculation of the mean value, we can regard f L k (t y(t k ), a k ) as the integrand function. Then, Equation (28) can be written as Comparing Equations (29) and (30), we can set Z, K, K 1 , K 2 and K 3 to Z = a k , K = ω Λ k −tdΛ k /dt , K 1 = ω, K 2 = Λ k and K 3 = tσ 2 , respectively, where S k = Λ 2 k σ 2 a + tσ 2 . When the drift coefficient a k updates adaptively, the mean µ a and variance σ a of the drift coefficient a k also change accordingly. It can be seen from Equation (31) that the PDF of the RUL is related to µ a and σ a , so that the RUL is also updated adaptively with the drift coefficient a k . According to the above, the flowchart of the specific procedure for RUL prediction is shown in Figure 7.

Data Description
We consider a case study that concerns experimental data obtained from the FZG test bench of the company STRAM, which operates a wind turbine. As shown in Figure 8, the test bench is composed of a cooling and lubricating controller, gear platform, torque controller and actual operation platform. The sampling frequency is set to 50,000 Hz, and the sampling duration is set to the first 20 s of each minute, that is, one million data are sampled each time. The experimental temperature is set to 70 • C. The vibration signal is collected by an accelerometer placed on the experimental gearboxes. When the amplitude of the collected vibration signal exceeds a certain fault threshold, the experiment stops. In Figure 9, the normal gears before the experiment and the broken gears after the experiment are shown. In this paper, the gearbox vibration signal data collected by the gear contact fatigue test bench are used to verify the effectiveness of the adaptive GC model.

Data Preprocessing
In Figure 10, the original signal is depicted, and feature extraction is required to reflect the degradation information of the gearboxes. The unit of sampling interval in Figure 10 is minutes. In Figure 11, nine time-domain features of the data are extracted to describe the degradation process of the gearboxes [28].  In order to reduce the computational burden, these high-dimensional features need to be fused to form a synthetic HI. In this paper, the Mahalanobis distance (MD) method is used to construct the HI [29,30]: where MD i is the MD calculated at the i-th moment, Z i = [z i,1 , z i,2 , . . . , z i,m ] are the m features extracted at the i-th moment in time, and Σ Z is the covariance matrix of Z i . The new HI is obtained by calculating the MD of Equation (32), and the result is shown in Figure 12. The extracted features are fused by the MD method as the HI, which is used to describe the degradation process in the RUL prediction.

RUL Prediction
The MD method is used to perform the fusion of the features extracted from the gearbox vibration signal into an HI, which is often used to predict the RUL. When the failure threshold is set to 10 units of HI value, the corresponding fault occurs at 566 min. For the parameter estimation, we give the concrete expression of the adaptive drift coefficient. The specific estimation results of the unknown parameters Φ= [µ a , σ 2 a , σ 2 , H, D] T of the adaptive GC model are given in Table 4. The GC model, fractional Brownian motion (FBM) model, long short-term memory (LSTM) model and adaptive Wiener are selected for comparison with the adaptive GC model [31,32]. The difference between the GC model and the adaptive GC model is that the adaptive GC model takes into account the time variation of the drift coefficient. From the estimated parameter value if 0 < (4 − 2D)(2 − 2H) = 0.9370 ≤ 1, the gearbox degradation process has an LRD characteristic. Therefore, the adaptive GC model with an LRD characteristic can be used to predict the gearbox RUL. The estimation of the Hurst exponent H by the RS Method is shown in Figure 13. According to Equation (24), the estimated value of the Hurst exponent H is the slope in Figure 13. The calculated result is 0.5723. The predicted RUL is obtained through 1000 trials of Monte Carlo simulation [28,33]. The histogram and PDF of 100 RUL prediction are shown in Figure 14. As shown in Figure 14, according to the statistical results, the RUL has the largest value of the PDF in 46. Hence, the RUL estimate from Monte Carlo simulation is 46. Since the sampling interval is minutes, all RUL values in this work are given in minutes. Then, the PDF of the RUL at 10 different prediction starting points of the operation process is obtained by Monte Carlo simulation. The prediction results of the PDFs by different models are shown in Figure 15 [34][35][36][37]. In Figure 15, the black straight line represents the actual RUL, and the five-colored star represents the estimated RUL. The RUL obtained by the five different prediction models is expressed in a two-dimensional plane, and the results are shown in Figure 16. The estimated RUL value of the different models are shown in Table 5.  To comprehensively compare the prediction advantage of the different models, the evaluation indicators of mean absolute error (MAE), root mean square error (RMSE) and health degree (HD) are considered as follows [38]: where L k is the predicted RUL, L * k is the actual RUL, s is the number of predictions and L k is the mean value of L k .
In Table 6, the RUL prediction accuracy of the adaptive GC model is higher than that of the other four models. One of the reasons is that the drift coefficient of the adaptive GC model has been adaptively updated on the data collected, which gives a more accurate description of the degradation trend of the gearboxes with a fixed drift coefficient. Another reason is that it is more accurate to use an adaptive GC model with LRD characteristics to predict the degradation process of wind turbine gearboxes with LRD characteristics.

Conclusions, Limitations and Future Research
Based on the GC process, an adaptive GC model with LRD characteristics and randomness is originally proposed for RUL prediction of the gearboxes in a wind turbine. The general conclusions of the work related to such model are as follows: 1.
The degradation of wind turbine gearboxes is a slow and continuous process with LRD characteristics. The LRD is demonstrated by the Hurst exponent and the fractal dimension. The randomness is explained by the diffusion term driven by the GC difference time sequence. In this paper, the GC difference time sequence is constructed through the ACF of the GC process, and the normality of the time sequence is verified; 2.
The adaptability of the adaptive GC model is manifested by using time-varying drift coefficients to update the degradation trend in real time so as to improve prediction accuracy. The expressions for the adaptive estimation of the unknown parameters and RUL are derived. The upgradation of the parameters is beneficial to the research of RUL prediction compared with the models of fixed parameters; 3.
The upgrading of the drift coefficient is based on the random sampling of the Gaussian distribution, which brings little error. Future work can focus on a better upgradation of the drift coefficient.