Bearing Remaining Useful Life Prediction Based on a Scaled Health Indicator and a LSTM Model with Attention Mechanism

: Rotor systems are of considerable importance in most modern industrial machinery, and the evaluation of the working conditions and longevity of their core component—the rolling bear-ing—has gained considerable research interest. In this study, a scale-normalized bearing health indicator based on the improved phase space warping (PSW) and hidden Markov model regression was established. This indicator was then used as the input for the encoder–decoder LSTM neural network with an attention mechanism to predict the rolling bearing RUL. Experiments show that compared with traditional health indicators such as kurtosis and root mean square (RMS), this scale-normalized bearing health indicator directly indicates the actual damage degree of the bearing, thereby enabling the LSTM model to predict RUL of the bearing more accurately.


Introduction
Bearing reliability evaluation and remaining useful life (RUL) prediction have received extensive attention due to the increasingly extreme working conditions of the entire system [1,2]. Generally, bearing life follows a specific statistical distribution that can be inferred from a large number of life data samples. However, due to the influence of assembly errors, material defects, and load fluctuations, in practice, bearing life has strong randomness. Studies carried out using public bearing life datasets can be taken as examples. In the accelerated degradation datasets of bearings collected by FEMTO-ST on the PRONOSTIA experimental platform, the RUL of the test bearing under specific load and speed conditions falls within the range of 2-7 h [3]. The results of the datasets provided by the BPC system from Sumyoung Technology Co. are similar, where the life spans of the test bearings are between 30 min and 8 hours [4]. In addition to the accelerated degradation experiments mentioned above, the RULs obtained from the natural degradation experiments are also different. For example, in the life test of the Rexnord ZA-2115 double-row bearing carried out on the durability test rig by NASA, the results are between seven days and one month [5]. Therefore, it is highly challenging to predict bearing RUL from a statistical point of view accurately. For this reason, in engineering practice, research focus has shifted to the study of the RUL of individual bearing, considering its actual working condition. Kundu [6] predicted the bearing RUL by establishing a Weibull proportional regression model based on the monitored signals of the PRONOSTIA platform. By combining the respective advantages of long-and short-term memory (LSTM) and statistical process analysis, Liu [7] proposed a new network to predict the bearing RUL using the datasets released by NASA and FEMTO-ST. Huang [8] introduced the transfer learning method and constructed a transfer depth-wise separable convolution recurrent network to predict the bearing RUL from the same public datasets considering different work conditions. Various mathematical and physical models were successfully applied to the prediction of bearing RUL. However, an overview of the aforementioned studies indicates that the actual degree of damage to the test bearing was not considered in the studies using the public bearing datasets; only the physical signals, e.g., acceleration and temperature signals, monitored in the bearing life test were studied. Take the FEMTO dataset as an example, and it did not provide damage sizes of the test bearings. The termination criterion was only created by the acceleration signal exceeding the threshold. Similarly, the same criterion was defined as the end of bearing life in many other representative studies [7][8][9][10]. However, it was found in our experiments that under the same operating conditions, even if the same acceleration threshold is used as the test termination condition, the final damage degree, namely the crack length in the spall area, can be remarkably different. Figure 1 shows the damage to four test bearings under the same operating condition, for which the tests were stopped based on the same acceleration threshold. It was demonstrated that even bearings with the same damage exhibited utterly different degrees of damage under similar acceleration levels. Therefore, under different application scenarios, the construction of health indicators and the development of life prediction methods need to be explicitly designed.

Inner-race crack length
Outer-race crack length Multiple spall areas Among the conventional health indicators [11,12], the root mean square (RMS) is the most commonly used, although it has the disadvantage of strong fluctuations. Additionally, the relative root mean square (RRMS) was adopted in the literature to reduce the negative impact of the fluctuations on the calculation results [9]. In addition to the RMS, Zhong [13] used a nonparametric health index for machine condition monitoring by combining envelope spectrum and statistic model, and principal component analysis (PCA) was used by Kundu [6] for dimensionality reduction of feature vectors to extract those features with better monotonicity. Xu [14] purposed a new health indicator called moving average cross-correlation of the power spectral density (MACPSD) to predict the health state of ball bearing. However, the problem with these health indicators in practice is that their calculated values are not directly related to the actual damage degree of the bearing, as shown by the crack length of the spall area in Figure 1.
In RUL prediction, researchers have proposed numerous prediction methods, which can be mainly divided into the model-driven method, data-driven method, and hybrid method. Model-driven methods, such as the Paris-Erdogan method [15], can accurately predict the bearing RUL under different working conditions. Still, the accuracy depends on establishing a practical model based on the physical system. The data-driven method utilizes statistical models or artificial intelligence (AI) models to predict the RUL, where the statistical model is often referred to as an empirical model-based method. For example, Kumar [16] combined the Kullback-Leibler divergence and Gaussian process regression to predict the RUL. Li [11] and Qiu [17] presented a stochastic process to predict the RUL. Xing [18] and Li [19] proposed a mixed Gauss model-hidden Markov model (GM-HMM) to predict the RUL of wind turbine bearings. Other methods include AI mathematical models such as the support vector machine [9] and artificial neural network [20], which can handle complex systems problems without any prior knowledge. However, datadriven methods can only manage the targeted work condition and exhibit no universal adaptability. On the other hand, the hybrid method combines the advantages of the model-driven method and the data-driven method; techniques such as the Kalman filter [21] and particle filter [22] belong to this type. Although many works were proposed to study the problem of bearing RUL prediction, few relevant studies have considered detecting accurate damage occurrence time and end of lifetime, which are crucial for prediction outputs. Antoni [23] used entropic evidence to detect the initial faults in rotating machinery. Chegini [24] purposed ensemble empirical model decomposition and wavelet packet decomposition to detect the initial faults in rotating machinery, and the Nirwan [25] used the acoustic emission to detect faults. Although such mentioned works provided relatively accurate detection results, the detection models or the extracted features originated from extensive calculations. It is not easy to implement such methods under the requirement of real-time response, mainly when the prediction of bearing RUL is an ongoing process.
To address the problem of detecting initial bearing damage, an adaptive envelope analysis method was employed in Section 2. Subsequently, in Section 3, an improved phase-space warping (PSW) algorithm was proposed to construct the bearing health indicator, followed by the indicator normalization using the hidden Markov model regression (HMMR) introduced in Section 4. The test bearing datasets verified the capability of the normalized health indicator to reasonably reflect the actual degree of bearing damage. In Section 5, the health indicator threshold, indicating the true damage extent, was defined as the end of bearing lifetime, which allows the proposed encoder-decoder long-short term memory (LSTM) model with an attention mechanism to predict the bearing RUL. The results through analyzing the experimental data of bearing life proved the effectiveness of the proposed method, see Section 6.

Detection of Bearing Initial Damage
In engineering practice, it is not practical to take the operation starting time of the rolling bearing as the initiation for bearing RUL prediction. This method has the shortcomings of extensive computation and unclear engineering significance. In contrast, predicting the bearing RUL after detecting the initial damage through real-time monitoring of signals can help determine the maintenance schedule and have more practical engineering significance as the amount of calculation is relatively small. Therefore, before predicting the bearing RUL, it is necessary to develop a damage detection algorithm with high sensitivity and strong robustness to determine the starting time of prediction (i.e., damage occurrence time). This paper proposed a sensitive and stable envelope analysis method that does not need a large amount of calculation and can adaptively determine a reasonable envelope bandwidth to identify the initial damage of the bearing as well as the starting time for RUL prediction.

Adaptive Frequency Band Selection
The vibration signal should theoretically contain ball passing frequencies (BPFs) when bearing damage occurs, whose calculation formulas are shown as follows, in which, the R , P D , b D , and n are, respectively, the rotational speed, pitch diameter, roller diameter, contact angle and number of rollers.
Envelope analysis is the standard procedure to extract the ball passing frequencies, called the damage-related characteristic frequencies, from the signal envelopes in real applications. In the envelope analysis, the selection of frequency band for the bandpass filter is a critical step, for which researchers have proposed various methods. Among them, the fast Kurtogram [26] is the most commonly used one for selecting envelope frequency bands. The signal components extracted by this method are primarily impulsive, as fast Kurtogram was initially designed to be highly sensitive to the impacts hidden in the original signal. Due to the impulsive nature of bearing damage response, the method has proven effective in many applications. However, in some other cases, the slide-roll ratio of the bearing or the collision between the cage and rolling elements under normal operations also have significant impacts. As a result, envelope analysis based on fast Kurtogram would lead to misjudgment owing to its excessive sensitivity. Recently, Peeters [27] proposed to select the optimal frequency band based on the sparsity of envelope signals. This approach utilizes the second-order stationarity of healthy bearing signals to maximize the envelope sparsity, thereby realizing the selection of the envelope frequency band. However, when the normal impacts break the hypothesis of second-order cyclostationary, this method would still fail to extract the characteristic frequencies. Therefore, we shift our focus to develop a sensitive and robust detection method for damage frequency extraction. The method must predict the bearing RUL automatically and avoid the interferences caused by slight impacts from the slide-roll mechanism or the collision between the cage and rolling elements during the regular operation of the bearing.
It is well-known that the bearing damage-induced impacts excite vibrations at the bearing resonance frequency [28]. In other words, if the resonance frequency can be identified, the frequency band close to it can be used as the filter band to perform the envelope analysis. To this end, the peaks of power spectral densities (PSDs) of bearing accelerations are used as the center frequencies of the envelope frequency bands. Moreover, it is assumed that the natural frequency rests at the local peak of its PSDs under 10 kHz [29]. Then, eleven frequency bands from the local minimum were selected as the envelope passbands to extract the characteristic frequencies of the bearing damage, see Figure 2.

Initial Damage Detection
Spectral peaks of a healthy bearing would be evident at the corresponding characteristic frequencies in the envelope spectrum.
Specifically, when a local spalling occurs in the inner-ring, the ball passing frequency of the inner-ring (BPFI), its low-order harmonics, and the nearby sidebands may all exhibit peak characteristics. When a bearing has an outer-ring spalling, the ball passing frequency of the outer-ring (BPFO) and its low-order harmonics would also exhibit corresponding peak characteristics. Table 1 shows the geometrical parameters, experimental conditions and the calculated characteristic frequencies of our test bearing. On the other side, Figures 3 and 4 show the envelope spectrums of the inner-and outer-ring damaged bearing, respectively. They are obtained from the pre-divided bandpass frequency bank with eleven passbands, see Figure 2 as an example.  Each time the acceleration signal is collected from the accelerometer, we select filter bands from the envelope spectrum using the method described in Section 2.1. Then, the obtained amplitudes of the characteristic frequencies would be compared with the predefined threshold to determine whether damage has occurred in the bearing. In this paper, the threshold is defined as th n 3 , where n represents noise level expressed as local average, excluding the amplitudes of BPFs. Taking the definition of the BPFO threshold as example, the BPFO can be determined using Equation (5). In the equation, s f represents the bandwidth that needs to be captured, (freq) s F represents the amplitude at the target frequency, s n is the number of spectral lines, and the BPFO is the ball passing frequency of the outer-ring damaged bearing. To avoid misjudgment, when the initial damage occurs in the bearing, it is assumed that at least two of the amplitudes at its characteristic frequencies, i.e., the BPF, the loworder harmonics, and the sideband frequencies (if inner-ring damage occurs in the bearing), would exceed the threshold [30]. Therefore, 11 time instants can be determined based on the 11 envelope spectrums as the bearing service time goes by. The minimum of these time instants is defined as the occurrence time of bearing damage. Figure 5a,b are the zoom-in envelope spectrums corresponding to the damage occurrence time obtained by the above method. Compared with other frequency components, the characteristic frequencies are dominant in amplitude (see the BPFI and its sideband in Figure 5a and the BPFO in Figure 5b). Thus, when damages occur to the bearing, if reasonable bandpass filter ranges are selected, there would be significant increases in the amplitudes of characteristic frequencies in the envelope spectrum. On the other side, Figure 5c,d show the envelope spectrums for the same time instant obtained after selecting the filter band using the fast Kurtogram. The amplitudes of the BPFI and its sideband in Figure 5c, together with the amplitude of BPFO in Figure 5d, are not significantly larger than those of other frequency components. Disappointingly, the fast Kurtogram based envelope analysis method fails to detect bearing damages at last and cannot be applied to initialize the prediction of bearing RUL. Figure 5. The zoom-in envelope spectrums of inner-and outer-ring damaged bearing, in which the calculated results of (a) the BPFI and its sidebands from our method, (b) the BPFO from our method, (c) the BPFI and its sidebands from the fast Kurtogram and (d) the BPFO from the fast Kurtogram.

Construction of Health Indicator
The amplitudes of BPFs in the envelope spectrum can help determine the occurrence time of bearing initial damage. Nevertheless, it is not a good option as a health indicator to predict the RUL because of its poor monotonicity against the bearing degradation process. Therefore, once the initial damage is detected, a new bearing health indicator needs to be established to predict the bearing RUL.
Previous studies often use time-domain statistical features, e.g., RMS, kurtosis, and peak-to-peak value, as health indicators in industrial applications. In contrast, applicational cases related to frequency-domain features, e.g., wavelet transform and Hilbert-Huang Transform, were presented in the references until recent years. However, due to the introduction of pre-defined parameters in the mathematical transformation, its applications in industrial fields still have limitations. Time-domain features are still the most commonly used health indicators. The calculation formulas of RMS and kurtosis are as follows: where xi is the raw data and n is the number of data points used to calculate the feature value. μx and σx are respectively the mean value and standard deviation of the raw data. However, owing to the complexity of actual physical systems, it is challenging to accurately express the service status of the bearing with the above features. Their respective trends in Figure 6 shows statistical characteristics such as RMS and kurtosis. The RMS exhibits evident fluctuations, while the kurtosis would decrease as the number of defectinduced impacts increases when the bearing damage becomes more severe. These poor monotonic features are likely to cause misjudgment of bearing life in engineering applications. Furthermore, for features extracted based on AI methods, it is challenging to explain the physical meaning of such features. Most importantly, the above-mentioned features are incapable of reflecting the crack length of the bearing spall area shown in Figure  1. As shown by a group of experimental results listed in Table 2, when the experiment ended, the RMS and kurtosis after scale standardization (the standardization method is introduced in Section 4) cannot effectively reflect the crack length in the bearing (+represents multiple spall areas, as shown by Figure 1). Therefore, to overcome the afore-described challenges, this paper proposes to use the PSW algorithm as the feature extraction method.

Phase Space Warping (PSW) Theory
PSW is an effective method for evaluating the damage of multi-layer dynamical systems, which was first proposed by Chelidze [31]. From the perspective of dynamic theory, the damage evolution of the entire rotor system can be regarded as a high-order system composed of a "fast time" scale and a "slow time" scale. Its definition is as follows: ,, ,, In Equation (7), n xR  is the "fast time" scale variable, which can be measured directly. m R is the "slow time" scale variable, which can directly reflect the damage state of the entire system but cannot be measured directly.
() f  and () g  are the "fast time" and "slow time" scaling functions, respectively.
()is a function of the variable , t represents time, and (0 1) is a constant parameter that defines the damage rate of the system. The response states of the bearing system at the initial time point t0 and after operating for a certain period tp can be expressed as ,, Suppose that the bearing system always maintains its initial state without experiencing any change or damage, then the response of the bearing system can be calculated using Equation (9). x t F x t (9) Therefore, with the initial operating state of the bearing system as the reference state, there is 0 R tt  . Then, the damage state of the bearing system (or damage tracking) can be expressed as follows: In Ref. [31], after performing the Taylor expansion, the bearing system's damage state can be finally expressed as where () O  represents higher-order infinitesimal. In this paper, the raw acceleration signal of the bearing is regarded as an observable "fast time" scale variable, while the damage state of the bearing system is regarded as a "slow time" scale variable. Typically, to calculate the damage state of the bearing, the phase space reconstruction theory based on the Takens embedding theorem would be introduced, and the damage state of the bearing would be quantified on this basis. The phase space reconstruction is mathematically expressed as follows: where D R yR  is the reference initial state of the phase space of the bearing system, n is the number of vectors in the phase space, and N is the total number of observation data points.  and D represent the time delay and embedding dimension of the phase space, which can be calculated using the mutual information [32] method and Cao's method [33]. Theoretically, the unknown mapping between the reconstruction vector (  . Yn+1 is the target response, that is, the combination of the next step vectors of the I nearest neighbor vectors. However, in actual scenarios, the "slow time" scale parameter (i.e., the operating state of the bearing) is in a state of continuous degradation. Thus, after degradation for time tp, the mapping between each phase space point and its next step would be changed, and the true mapping can no longer be expressed by Equation (13) but can be obtained using Equation (17).
Assuming that during the bearing operation, there is no damage to its operating state, and the mapping relationship is not changed after the elapse of time tp. In this case, the theoretical reconstruction vector of the next step Based on the above idea, the damage "trajectory" during bearing operation after the elapse of time tp can be obtained as follows: ;;  ship but exhibits a certain degree of nonlinearity. Therefore, a PSW mapping model that takes into account actual nonlinear factors is proposed in this section.

Improved PSW Algorithm
Ref. [34] proposed the random vector functional-link net (FLNet), where the input and output layers are directly connected without an activation function. At the same time, an enhanced pattern is employed as an alternative to the nonlinear activation function. The structure of FLNet used for phase space nonlinear mapping in this paper is illustrated in Figure 7. () G is the randomly input activation function, representing the nonlinear part of the model, such as the sigmoid function and so on.
is a parameter generated randomly from the 0-1 continuous uniform distribution to generate nonlinear effects in combination with the activation function, which is called the enhancement node, and m is the number of enhancement nodes. Compared with conventional neural network structures with nonlinear relationships, the structure adopted herein not only provides nonlinear characteristics that are not originally available for the mapping between phase space reconstruction vectors but is also more in line with engineering reality. At the same time, under the framework of this structure, the regression parameters can be directly calculated by linear regression formulas, which can eliminate calculations in neural networks that need to use gradient descent methods, thus exhibiting the superior advantage of small calculation amounts. In summary, when applying the phase space reconstruction function, the mapping between where ( 1) md R is a random parameter matrix generated from the continuous uniform distribution, which is composed of ; is the combination of the reconstruction vector and the enhancement node. In summary, by rewriting Equation (16) using Equation (20), the following can be obtained: Then, according to the Ref. [31], the final quantified damage state of the bearing in the phase space can be calculated using Equation (23). q n e n E qn (23) where () qn is the weight function. n r is the Euclidean distance between the current reconstruction vector () p yn and the one that has the farthest Euclidean distance in the space composed of the I nearest neighbor vectors. M is the correlation dimension of the reference phase space. Finally, with the sample Bearing 1-1 as an example, the improved PSW results are shown in Figure 8. Compared with kurtosis and RMS, it exhibits significantly smaller fluctuations and better monotonicity, so it is more appropriate to serve as the health indicator for the bearing. In Refs. [35,36], several other features shown in Table 3 are used for the prediction of bearing RUL. However, the prediction task begins after the occurrence of bearing initial damage. Hence, we select the monotonicity of the features after the bearing initial damage as the evaluation criterion. For a comparison purpose, the result from the PSW method is presented together with those from other features, as shown in Figure 9.

Feature Normalization
The improved PSW can better describe the damage state of the bearing from the perspective of the bearing system, but there are still certain problems with PSW for different bearings. Figure 10a shows the full life PSW data of five identical bearings under the same working conditions. There are relatively large initial fluctuations in the PSW values of different bearings. Similarly, the traditional RMS also faces the same problem. Therefore, it is necessary to develop a numerical standardization method to unify the quantitative standard of the bearing health indicator. This study utilized a method proposed in the Ref. [9], which used relative values to unify and quantify health indicators. In this method, the relative PSW value is calculated as follows: Eh hh (25) where h1 and h2 are the starting and ending point of the steady phase during the full life of the bearing. In Ref. [9], these two points were selected empirically, which is not a scientifically effective approach. In this paper, the ending point h2 of the steady phase of the bearing is defined as the point of occurrence of minor faults; that is, the starting point of the fault obtained by the adaptive envelope analysis method described in Section 2. As for the starting point of the steady phase, Figure 10 indicates that once the bearing starts to operate, there is a certain "shift" in the bearing system state at the initial phase. However, this de facto "shift" is not damage; therefore, it is unreasonable to use the bearing's starting point as the starting point of its steady phase. Therefore, this paper proposes to employ HMMR and use the PSW data before the fault point as its good input to perform unsupervised classification, thereby automatically dividing this segment of data into the initial phase and the steady phase.

Hidden Markov Theory
The HMM model is a model for nested stochastic processes, which contains two states: one is the invisible hidden state

Z(1) Z(T) Z(2)
The evolution of this model is generally described by the transition probability matrix A, which can be expressed as The initial probability of the stochastic process is expressed by 12 , ... Q . In summary, the parameters contained in a complete HMM model are given as

HMMR-Based Normalization
In this study, after extracting the healthy data of the bearing, it is assumed that the healthy phase of the bearing can be divided into two phases-namely the initial running-in stage and the steady stage-and regarded as the hidden states of the HMM model [37]; moreover, this is an irreversible process. Thus, the transition matrix A can be expressed as 11 11 1 01 Figure 12 indicates that in the initial running-in stage, the bearing's PSW value is in the form of an increasing cubic spline, while in the steady phase, it tends to be a stable straight line. Therefore, the explicit state is treated as a continuous Gaussian random distribution based on a polynomial regression model in this paper, rather than a traditional discrete emission matrix. This way, the traditional HMM model is modified into an HMMR model, where the running-in phase is a cubic spline Gaussian regression model, and the steady phase is a linear Gaussian regression model, as shown by Equation (28).
where Uj is the regression coefficient of the ( 1) l -dimensional lth order function under the jth hidden state.
2T [1, , ] p j j j j t t t is the regression input (i.e., time input) under the jth hidden state.
(0, 1) N is the Gaussian distribution with zero mean and one standard deviation, j is the standard deviation of the regression model under the jth hidden state. Therefore, the parameters for the HMMR model in this paper are rewritten as follows: Finally, according to the maximum likelihood estimation algorithm, the model parameters can be obtained as According to the method proposed in the Ref. [38], Equation (30) can be solved by the expectation-maximization (EM) algorithm. Its results are shown in Figure 13, where the service state of the healthy bearing is automatically divided into the initial running-in phase and the steady operating phase. The end of the initial running-in phase-that is, the starting point of the steady operation phase-will be finally regarded as the theoretical point h1 in Equation (25). Figure 10b shows the standardized result of the PSW data in Figure 10a obtained by the above method.  Figure 1 illustrates the crack length in the spall area to represent the damage degree of a specific damaged bearing. Multiple spall areas are also shown in Figure 1 and are represented as the identifier '+' in Table 2. Damage related standardized PSW value is given in the same column of Table 2, together with RMS and kurtosis. Compared with the traditional RMS and kurtosis, an obvious positive correlation can be observed between the standardized PSW value and the crack length.

Ending Point of the Bearing Life
According to the theory introduced in the above sections, the standardized PSW indicator of the bearing can be obtained, and the starting point of the bearing's minor fault can be accurately identified. Based on the above information, the bearing RUL can be predicted with the starting point of the bearing's minor fault as the starting point. Table 2 indicates that there is a positive correlation between the bearing's PSW indicator and its physical damage crack. Therefore, a PSW threshold indicator can be defined based on the above information to serve as the ending condition of bearing life. In this paper, this threshold is defined as Threshold 1.7 , as shown in Figure 14.

Feature Smoothing
To further reduce the error caused by feature fluctuations and to lower the influence of noise, exponential curve fitting is performed on the data before training the mathematical model for bearing RUL prediction [7,9]. The theoretical model is expressed as where E , E , and E are the exponential curve fitting parameters. The fitting result is shown in Figure 15. Finally, the fitted data are input into the mathematical model presented in Section 5.3 for calculation. As shown in Figure 16, the method with exponential curve fitting has a better performance in predicting the bearing RUL against the method without curve fitting.

Bearing RUL Prediction
The LSTM model is a classic time series prediction network model [39]. This model consists of the following three components: forget gate, input gate, and output gate. In this model, the current hidden state Ht and the cell memory state Ct update on Ht−1 and Ct−1 based on the input It−1 of the previous time point according to the relevant algorithm. Finally, the time information of the time series is input. Its forward propagation process is expressed as follows: To enhance the prediction effectiveness of the LSTM model for bearing life, an LSTMbased encoder-decoder model [7] was proposed. To further improve the performance of the network, this paper sought to improve the model itself, such that in addition to introducing the attention mechanism reference, the bi-directional long-short term memory (BILSTM) was adopted to further extract the time information between bearing PSW indicators, and a residual model was also introduced to prevent gradient divergence. The calculation of the attention mechanism is shown by Equations (33)- (35). In the traditional LSTM model, the initial hidden unit H0 is usually a zero matrix. However, in this model, the damage type and working condition of the bearing are used as the one-hot matrix and the hidden unit input, as illustrated in Figure 17. 1  2  3  1  1   2  3  4  1  2   1  1  3  2  -1   1  2  2  1   , , , ,  ,   , , , , , ,  Figure 18. Flowchart of RUL prediction.

Description of the Test Rig
The bearing accelerated life test rig shown in Figure 19 is used for the experiment. In this test rig, the rotation of the shaft, test bearing, and support bearing are driven by a motor. The load on the test bearing is provided by the hydraulic loading system, and the vibration during bearing operation is measured using acceleration sensors.  The radial vibration of the bearing seat during the experiment is picked up by two horizontal and vertical PCB 608A11 acceleration sensors. The NI-9234 vibration acceleration acquisition card is used to acquire acceleration signals at a sampling frequency of 51.2 kHz. The sampling duration of each acquisition is 1.28 sec, and data are acquired every 12 sec. Section A sampling termination rule is formulated during the experiment--when any one of the two radial vibration accelerations exceeds 20 g, the test would be stopped immediately. The LDK-UER 205 bearing is used as the test bearing, and the corresponding geometric parameters are listed in Table 1. The test was carried out under three different working conditions, namely the rotation speed of 33 Hz/load of 800 kg, the rotation speed of 33 Hz/load of 900 kg, and the rotation speed of 33 Hz/load of 1000 kg. The tests are detailed in Table 4.

Experimental Results
In this experiment, it is defined that the life of the bearing is terminated only after the standardized PSW indicator exceeds 1.7. Therefore, based on the information in Table 4 To quantitatively assess the accuracy of the model proposed in this paper, the RMS error (see Equation (37)) is used as the criterion.  Table 5 shows the average error of the eleven bearing datasets (see Figure 20), respectively, as the test sets, using several other methods, including SVR [9], 3-layer FC network, LSTM [40], TCN-LSTM [41] and raw encoder-decoder model [7].

Conclusions
Most extant studies on the RUL prediction of rolling bearings have limited themselves to online public data sets, which do not consider the actual damage degree of the bearing and also rarely consider the starting and ending points for the bearing RUL prediction or the damage type of the bearing. Compared with the traditional health indicators used for life prediction, the standardized PSW indicator based on HMMR and improved PSW proposed in this paper can effectively evaluate the actual damage size in the bearing. Moreover, it was experimentally verified that with this indicator as input, the encoderdecoder neural network that combines the attention mechanism (which takes into account the bearing damage type and working conditions) and the BILSTM model can effectively improve the prediction accuracy for bearing RUL, thereby generating certain engineering value.
The main contributions of this paper are summarized as follows: Instead of using the traditional statistical life distribution to predict the bearing life, the damage starting time point for bearing life prediction is accurately defined from the perspective of the characteristic frequency of the rolling bearing through the adaptive envelope analysis method.
A health indicator that can effectively reflect the actual damage of the bearing is obtained by combining the HMMR model and the PSW model.
Compared with the traditional LSTM model, the encoder-decoder LSTM model proposed in this paper considers the timing signal characteristic of the health indicator. Not only are the BILSTM model and attention mechanism introduced into the model, but also the operating conditions of the bearing are considered as the LSTM hidden state input for the calculation, thereby improving the prediction accuracy for bearing RUL.
Author Contributions: Data curation, Y.Z. and J.Z.; Investigation, S.G. and X.X. All authors have read and agreed to the published version of the manuscript.