Combined Remaining Life Prediction of Multiple Bearings Based on EEMD-BILSTM

: To improve the accuracy of a symmetrical structural rolling bearing life prediction under noise interference, a multi-bearing life prediction method combining Ensemble Empirical Mode Decomposition (EEMD) and Bi-directional Long Short-Term Memory (BiLSTM) is proposed. First, EEMD is proposed to decompose the original vibration signal to obtain a ﬁnite number of Intrinsic Mode Function (IMF), and the IMFs are further ﬁltered by combining the correlation criterion and kurtosis criterion. Then, the time domain features and frequency domain features of the reconstructed signal are ﬁltered by monotonicity index to obtain the set of features containing key information. Finally, the BiLSTM network is trained on the ﬁltered features set, and the method is proved to accurately predict the remaining life of rolling bearings under different operating conditions through rolling bearing full-life experiments, and the effectiveness of the method is veriﬁed by comparing the prediction results with several main recurrent neural networks.


Introduction
The functioning of rolling bearings, being the most significant component of rotating equipment, has a direct impact on the machine's operation [1]. Extracting and modeling the rolling bearing symmetry characteristics is significant in improving remaining useful life prediction for mechanical equipment. The proactive maintenance of rolling bearings before they are damaged can effectively avoid accidents and thus reduce property losses. Therefore, it is important to study the prediction of its remaining life [2]. When it comes to the life prediction issue, multi-bearing synergetic analysis is more difficult since it considers many bearings with identical operating circumstances in order to determine the remaining life. Indeed, since the deterioration patterns of different bearings in a set might vary widely, it is difficult to accurately anticipate the remaining service life of numerous bearings at once [3].
The analysis of rolling bearing vibration signals has developed into the most efficient approach for monitoring bearing condition. On the one hand, there is typically noise interference in the bearing working environment, and the signal-to-noise ratio is poor. The measured vibration signal from numerous types of bearings, on the other hand, often exhibits nonlinearity. Fourier analysis and wavelet variation are often used in traditional fault analysis techniques. Fourier analysis is often used to deal with non-smooth data, although it has a low adaption rate. Wavelet transforms have multi-resolution properties, wavelet transform methods are sensitive to frequency mixing, wavelet basis selection is challenging, and wavelet transform algorithms are complex to develop. Empirical Mode Decomposition (EMD) is a time-frequency signal decomposition approach proposed by N.E. Huang [4] to analyze the non-smooth nonlinear signal. EEMD is a noise-assisted The majority of current research on rolling bearing life focuses on the individual bearings. However, different bearings have certain similarity in characteristics under the same working conditions, and machine learning algorithms can mine the characteristics of vibration data from vibration data well.
The challenge of forecasting the remaining service life of numerous bearings using condition monitoring data obtained under the same operating circumstances is known as cooperative prediction of the remaining service life of multiple bearings. Cooperative prediction of the remaining life of numerous bearings tries to forecast the remaining life of the bearings using both the bearings' own monitoring data and the monitoring data for other bearings of the same kind and operating circumstances.

EEMD Noise Reduction Principle
In recent years, as a successful approach for coping with nonlinear non-smooth signals, EMD has been able to deconstruct the signal into numerous IMFs, each of which reflects a component of each frequency in the original signal. This approach can address the issue of difficult basic function selection in wavelet noise reduction, but it cannot tackle the problem of noise-induced mode mixing [20]. Mode mixing refers to 1 IMF containing widely different characteristic timescales, or similar characteristic timescales distributed in different IMFs. In order to mitigate the mode mixing, Wu and Huang et al. [5] proposed an Ensemble Empirical Modal Decomposition (EEMD). EEMD is capable of adaptively separating the high frequency information in the fault signal well. The EEMD noise reduction method is shown in Figure 1: the bearings using both the bearings' own monitoring data and the monitoring data for other bearings of the same kind and operating circumstances.

EEMD Noise Reduction Principle
In recent years, as a successful approach for coping with nonlinear non-smooth signals, EMD has been able to deconstruct the signal into numerous IMFs, each of which reflects a component of each frequency in the original signal. This approach can address the issue of difficult basic function selection in wavelet noise reduction, but it cannot tackle the problem of noise-induced mode mixing [20]. Mode mixing refers to 1 IMF containing widely different characteristic timescales, or similar characteristic timescales distributed in different IMFs. In order to mitigate the mode mixing, Wu and Huang et al. [5] proposed an Ensemble Empirical Modal Decomposition (EEMD). EEMD is capable of adaptively separating the high frequency information in the fault signal well. The EEMD noise reduction method is shown in Figure 1: Ensembly y 1 (t) Ensembly y 1 (t) Ensembly y n (t) Ensembly y n (t)

EMD Decomposition EMD Decomposition
Add White Noise S n (t) Add White Noise S n (t) Add White Noise S 1 (t) Add White Noise S 1 (t) … (1) A normally distributed white noise sequence ( ) is added to the collected signal ( ) to obtain ( ), and normalize the noise-added signal.
(2) The obtained signal ( ) is decomposed by EMD to obtain different IMFs ( ) and the residuals ( ) that ( ) represents the first noise is decomposed to obtain the first residual.
(3) Repeat steps (1), (2) times and add a new random normally distributed white noise sequence each time.
(4) The overall averaging operation of all decomposed obtained IMFs can eliminate the effect of adding white noise to the real IMF several times, and the final IMF components and residuals after EEMD decomposition are calculated as follows: (1) (1) A normally distributed white noise sequence S i (t) is added to the collected signal x(t) to obtain y i (t), and normalize the noise-added signal. (2) The obtained signal y i (t) is decomposed by EMD to obtain different IMFs C ij (t) and the residuals r i (t) that C ij (t) represents the first i noise is decomposed to obtain the first j residual. (3) Repeat steps (1), (2) n times and add a new random normally distributed white noise sequence each time. (4) The overall averaging operation of all decomposed obtained IMFs can eliminate the effect of adding white noise to the real IMF several times, and the final IMF components and residuals after EEMD decomposition are calculated as follows: where: C j is the EEMD decomposition of the first j component and r is the residual after decomposition. (5) Selection of IMF components: the decomposed I MFi components are listed in frequency order from high to low. When reconstructing the signal, the appropri- ate IMF components should be selected to retain the feature information to the maximum extent.

Principle of Bi-Directional Long and Short-Term Memory Network
LSTM, as a particular case of RNN, not only avoids gradient explosion and disappearance, but also has a larger memory capacity than RNN, which has just one state h in the implicit layer that is sensitive to short-term input. The LSTM augments this with a long-term memory unit c. As seen in Figure 2, the LSTM model is primarily responsible for controlling the memory unit c through three gate structures: an input gate, a forgetting gate, and an output gate [21]. where: is the EEMD decomposition of the first component and is the residual after decomposition.
(5) Selection of IMF components: the decomposed components are listed in frequency order from high to low. When reconstructing the signal, the appropriate IMF components should be selected to retain the feature information to the maximum extent.

Principle of Bi-Directional Long and Short-Term Memory Network
LSTM, as a particular case of RNN, not only avoids gradient explosion and disappearance, but also has a larger memory capacity than RNN, which has just one state h in the implicit layer that is sensitive to short-term input. The LSTM augments this with a long-term memory unit c. As seen in Figure 2, the LSTM model is primarily responsible for controlling the memory unit c through three gate structures: an input gate, a forgetting gate, and an output gate [21]. The forgetting gate is primarily responsible for determining which information is forgotten at the final possible time. A number of (0, 1) is output by reading ℎ −1 and . This value determines how much information will be retained by the cell −1 state, with the formula: The input gate's primary duty is to contribute information to the cell's state. The first section establishes values that are updated by the sigmoid function once the information a_t is produced through the tanh activation function, as follows: The update module status: where, "∘" denotes the Hadamard Product. The output gate is in charge of determining whether or not to output the current state information, which is likewise formed of two sections: the first section is composed of ℎ −1 , , and sigmoid activation. The second section includes the tanh activation function: The forgetting gate is primarily responsible for determining which information is forgotten at the final possible time. A number of (0, 1) is output by reading h t−1 and x t . This value determines how much information will be retained by the cell c t−1 state, with the formula: The input gate's primary duty is to contribute information to the cell's state. The first section establishes values that are updated by the sigmoid function once the information a t is produced through the tanh activation function, as follows: The update module status: where, "•" denotes the Hadamard Product. The output gate is in charge of determining whether or not to output the current state information, which is likewise formed of two sections: the first section is composed of h t−1 , x t , and sigmoid activation. The second section includes the tanh activation function: In the above equations, W, U, and b are linear correlation coefficients and bias coefficients, and σ is the sigmoid activation function.
The LSTM's primary property is that each new input has some characteristics of the prior input. By regularly updating the model's cellular state, the model retains long-term memory capacity. BiLSTM is made up of two LSTMs, one forward and one reverse, which could consider both past and future states by obtaining two time series with opposite hidden states and then connecting them to get the same output [22]. Therefore, BiLSTM can better grasp the complete information when processing time series data, and the BiLSTM structure is shown in Figure 3.
In the above equations, W, U, and b are linear correlation coefficients and bias coefficients, and is the sigmoid activation function. The LSTM's primary property is that each new input has some characteristics of the prior input. By regularly updating the model's cellular state, the model retains long-term memory capacity.
BiLSTM is made up of two LSTMs, one forward and one reverse, which could consider both past and future states by obtaining two time series with opposite hidden states and then connecting them to get the same output [22]. Therefore, BiLSTM can better grasp the complete information when processing time series data, and the BiLSTM structure is shown in Figure 3.
Passing of the hidden layer LSTM The hidden state of BiLSTM at moment contains the forward ℎ ⃗⃗⃗ and backward ℎ ⃖⃗⃗⃗

Methodological Steps
Three critical aspects must be considered in order to correctly anticipate the remaining life of rolling bearings.
① Filtering of the IMFs after EEMD decomposition. If the EEMD decomposition results are not selected, it may cause the extracted features to be vulnerable to noise interference. To accomplish noise reduction in this study, the correlation coefficient and kurtosis index are employed to filter the IMFs, and signal reconstruction is done on the filtered IMFs.
The correlation coefficient may be used to measure the degree of linear correlation between each IMF and the original signal, and it can be stated as: where denotes the correlation coefficients for x and y.  The hidden state of BiLSTM at moment H t contains the forward where: T is the sequence length.

Methodological Steps
Three critical aspects must be considered in order to correctly anticipate the remaining life of rolling bearings. 1 Filtering of the IMFs after EEMD decomposition. If the EEMD decomposition results are not selected, it may cause the extracted features to be vulnerable to noise interference. To accomplish noise reduction in this study, the correlation coefficient and kurtosis index are employed to filter the IMFs, and signal reconstruction is done on the filtered IMFs.
The correlation coefficient may be used to measure the degree of linear correlation between each IMF and the original signal, and it can be stated as: where ρ xy denotes the correlation coefficients for x and y. E[·] denotes the mathematical expectation, µ x and µ y are the mean values of the original signals x and y, and σ x and σ y are the standard deviations of the original signals x and y.
Kurtosis is a dimensionless parameter that can reflect the sharpness of the waveform. As can be seen from Equation (13), the kurtosis, as a fourth-order cumulative quantity, is unrelated to the size of the bearing, the rotational speed, or the load applied, but is very sensitive to the fault shock component of the signal, where: N denotes the number of samples, σ t is the standard deviation. The correlation coefficient r is taken in the range (−1,1), and the interference of uncorrelated quantities in the vibration signal is reduced by finding the correlation coefficient of each IMF with the original signal. Normalize the desired kurtosis value and combine it with the desired correlation coefficient to define it as the K ρ value of IMF: In Equation (14) K and ρ correspond to the kurtosis value and correlation coefficient of an IMF, and α is the degree of the current IMF component's kurtosis value on the weight of K ρ , and in this paper, we choose α to be 0.6 through experiments. 2 Analyzing the bearing state and choosing the best features. In this paper, a total of 13-time domain features and frequency domain features are extracted, among which some can well characterize the full life state of the bearing, while others cannot. In this research, monotonicity is used as a quantitative indicator to assess the extracted feature parameters quantitatively. The monotonicity index is calculated as follows: . . , f (t k )] and the time series T = [t 1 , t 2 , . . . , t k ], the f (t k ) denotes the time at which t k corresponds to the eigenvalues at the time, K is the total length of the sample time, and δ(•) is the simple unit step function. 3 Selecting a suitable lifespan prediction model. The typical neural network predicts the remaining life using data from the present instant, and data volatility has a big influence on the prediction result, and it cannot understand the trend features of the data change with time well. The author presents a technique for predicting the remaining life of rolling bearings based on BiLSTM, and Figure 4 depicts the system's flow chart. The following are the particular steps: (1) Filtering of IMFs: After EEMD decomposition of the original signal and obtaining a set of IMFs, the correlation coefficient of each IMF with the original signal is calculated according to Equation (12), the kurtosis value of each IMF is calculated according to Equation (13), and the K ρ of each IMF is calculated using Equation (14). For noise reduction, the IMF corresponding to the top three K ρ is chosen and rebuilt. (2) Extraction of feature parameters: The bearing's full-life vibration data are used to extract the time and frequency domain features, and according to Equation (15), the features that do not accurately depict bearing deterioration are discarded, leaving the characteristics that do. (3) Determination of the degradation start moment point and define the network training label: Bearings are in normal operation for a long time throughout their lifetime, therefore we need to figure out where the degradation begins, and life forecast of their deterioration process may be a smart approach to conserve computing resources. In this paper, the degradation threshold is determined using µ + 3σ (µ is the median of the series and σ is the standard deviation of the series) [23]. The remaining life of the bearing is normalized to between (0, 1) from the moment of degradation to the complete failure of the bearing as the network prediction label. (4) BiLSTM network training and prediction: The whole lifespan of the first two bearings in each operational condition is utilized as the network's training set, and the degraded  (1) Filtering of IMFs: After EEMD decomposition of the original signal and obtaining a set of IMFs, the correlation coefficient of each IMF with the original signal is calculated according to Equation (12), the kurtosis value of each IMF is calculated according to Equation (13), and the of each IMF is calculated using Equation (14). For noise reduction, the IMF corresponding to the top three is chosen and rebuilt. (2) Extraction of feature parameters: The bearing's full-life vibration data are used to extract the time and frequency domain features, and according to Equation (15), the features that do not accurately depict bearing deterioration are discarded, leaving the characteristics that do. (3) Determination of the degradation start moment point and define the network training label: Bearings are in normal operation for a long time throughout their lifetime, therefore we need to figure out where the degradation begins, and life forecast of their deterioration process may be a smart approach to conserve computing resources. In this paper, the degradation threshold is determined using μ+3σ(μ is the median of the series and σ is the standard deviation of the series) [23]. The remaining life of the bearing is normalized to between (0, 1) from the moment of degradation to the complete failure of the bearing as the network prediction label. (4) BiLSTM network training and prediction: The whole lifespan of the first two bearings in each operational condition is utilized as the network's training set, and the degraded feature set is used as the network's input. The network is trained according to the labels so that the parameters of the network can be determined. The test bearing dataset is fed into the training network to validate the model's accuracy and compare it to other algorithms.

Data Description
The XJTU-SY bearing data were used in this experiment and the specific experimental setup is shown in Figure 5. The purpose of the XJTU-SY test platform is to provide vibration signals throughout the service life of the naturally degraded bearing under different operating conditions. The rotational speed of the AC induction motor is controlled by the speed controller of the AC induction motor, which generates different radial forces that are applied to the tested bearing. The test is designed for three types of working conditions, and 5 samples are collected for each working condition of rolling bearing operation, and 15 samples in total for the three working conditions. They are named rolling bearing 1-1-rolling bearing 1-5, rolling bearing 2-1-rolling bearing 2-5, and rolling bearing 3-1-rolling bearing 3-5. Within each working condition, the first two bearings of each working condition are utilized as training samples, and the remaining three bearings of each working condition are employed as test samples in this experiment. Since the vertical vibration signals do not reflect the degradation information well, the horizontal vibration signals of the bearings are used for this study [24]. tion, and 15 samples in total for the three working conditions. They are named rolling bearing 1_1-rolling bearing 1-5, rolling bearing 2_1-rolling bearing 2-5, and rolling bearing 3_1-rolling bearing 3-5. Within each working condition, the first two bearings of each working condition are utilized as training samples, and the remaining three bearings of each working condition are employed as test samples in this experiment. Since the vertical vibration signals do not reflect the degradation information well, the horizontal vibration signals of the bearings are used for this study [24].

Sampling Point Setting
The sampling frequency in the experiment is 25.6 kHz while the length of each sample is 1.28 s, and the original signal is 32,768 sample points per group. In order to increase the experimental sample size, 4096 points were set as one sample in this study. The operating conditions of the rolling bearing under the three operating conditions are shown in Table 1. Figure 6 depicts the whole life signal of bearing 1-1.

Sampling Point Setting
The sampling frequency in the experiment is 25.6 kHz while the length of each sample is 1.28 s, and the original signal is 32,768 sample points per group. In order to increase the experimental sample size, 4096 points were set as one sample in this study. The operating conditions of the rolling bearing under the three operating conditions are shown in Table 1. Figure 6 depicts the whole life signal of bearing 1-1.

EEMD Bearing Vibration Signal Noise Reduction
The data of a certain time period of the bearing are selected to verify the bearing reduction effect. The time-domain waveform and frequency spectrum of the beari fore noise reduction are presented in Figure 7, and the frequency spectrum shows th initial vibration signal has a certain noise, which has a considerable interference bearing's ability to extract useful information.

EEMD Bearing Vibration Signal Noise Reduction
The data of a certain time period of the bearing are selected to verify the bearing noise reduction effect. The time-domain waveform and frequency spectrum of the bearing before noise reduction are presented in Figure 7, and the frequency spectrum shows that the initial vibration signal has a certain noise, which has a considerable interference to the bearing's ability to extract useful information.

EEMD Bearing Vibration Signal Noise Reduction
The data of a certain time period of the bearing are selected to verify the bearing noise reduction effect. The time-domain waveform and frequency spectrum of the bearing before noise reduction are presented in Figure 7, and the frequency spectrum shows that the initial vibration signal has a certain noise, which has a considerable interference to the bearing's ability to extract useful information.  Next, the vibration signals of rolling bearings are processed for noise reduction by the EEMD method. In this paper, 100 groups of white noise sequences are added to the original signal x(t). The amplitude of each group of white noise is set to 0.3, and a total of 9 IMFs and one residual are generated. Figure 8 shows the results of EEMD decomposition. of 9 IMFs and one residual are generated. Figure 8 shows the results of EEMD decomposition.  Table 2. The values are sorted from largest to smallest, and the first three IMFs with larger Then the correlation coefficients, kurtosis values, and K ρ values are calculated for each component, the first six IMFs are more correlated, but the kurtosis values of IMF1, IMF5, and IMF6 are smaller, and the results of K ρ values are shown in Table 2. The K ρ values are sorted from largest to smallest, and the first three IMFs with larger K ρ values are selected for superposition to form the new vibration signal, and the selected components are IMF1, IMF3, and IMF4, and the other IMFs are rejected as noise. Figure 9 shows the time-domain waveform and frequency spectrum after noise reduction. each component, the first six IMFs are more correlated, but the kurtosis va IMF5, and IMF6 are smaller, and the results of values are shown in Tab The values are sorted from largest to smallest, and the first three IM values are selected for superposition to form the new vibration signal, an components are IMF1, IMF3, and IMF4, and the other IMFs are rejected as n shows the time-domain waveform and frequency spectrum after noise redu As can be seen in Figure 9, the low and high frequency components o weakened after noise reduction compared with the original signal, and the of the signal are more significant.  As can be seen in Figure 9, the low and high frequency components of the signal are weakened after noise reduction compared with the original signal, and the shock feature of the signal are more significant. It is proved that too many feature parameters can lead to an increase in the number of operations and result in long computation times. Fewer parameters may not reflect the bearing degradation trend better. In this paper, the monotonicity index is used to quantitatively filter out the feature parameters with strong characterization ability. The monotonicity can well reflect the time-series of the feature parameters, and the calculation is referred to Equation (15).

Feature Parameter Extraction
After quantification, the sorting results are shown in Figure 10, and the best result is obtained by testing that the threshold value of 0.5 is selected. The six features larger than the threshold value are extracted, and the normalization method is used to convert the extracted features to (0, 1) in order to eliminate the negative impact of different value ranges, which is then used as the input value of the life prediction model. It is proved that too many feature parameters can lead to an increase in the of operations and result in long computation times. Fewer parameters may not re bearing degradation trend better. In this paper, the monotonicity index is used to tatively filter out the feature parameters with strong characterization ability. Th tonicity can well reflect the time-series of the feature parameters, and the calcu referred to Equation (15).
After quantification, the sorting results are shown in Figure 10, and the best obtained by testing that the threshold value of 0.5 is selected. The six features lar the threshold value are extracted, and the normalization method is used to con extracted features to (0,1) in order to eliminate the negative impact of differen ranges, which is then used as the input value of the life prediction model.

BiLSTM Lifetime Prediction
In this paper, we construct a BiLSTM model based on Keras, which provides various methods for initializing parameters based on probability distributions. A uniform distribution is generated as the initial weight initialization method. The first two layers use the sigmoid function as the activation function, while the ReLU function is chosen as the activation function for the Dence layer, where the optimizer is selected from Adam and Dropout technique is applied to prevent overfitting [23,25]. The network parameters of the BiLSTM are shown in Table 3. We performed a comparative experiment by using BiLSTM, LSTM, and RNN to validate the accuracy of the EEMD-BiLSTM remaining life prediction model. Because all recurrent networks have the same structure, the only thing that has to be changed is the kind of recurrent network, while the parametric network stays unaltered.
As shown in Figure 11, the full-life prediction results of bearing 1-4 under each model can predict the life degradation trend well, among which RNN is less effective since RNN has only one hidden state. Assuming that the bearing degrades at point t, the reason for the large prediction accuracy error at the time when the bearing first begins to degrade is that the bearing has just failed, and because the damage at point t is greater than at other locations, the vibration signal when the bearing is running to point t should be significantly different from the state at other points in the same rotation cycle, resulting in low prediction accuracy. As the bearing wears, the signal variation at point t becomes less noticeable compared to other places, and model's accuracy improves. Therefore, the proposed method can better predict the remaining bearing life and provide for the effective operation of the bearings.

Dense_2
(None,6,25) (None,6,1) Relu - We performed a comparative experiment by using BiLSTM, LSTM, and RNN to validate the accuracy of the EEMD-BiLSTM remaining life prediction model. Because all recurrent networks have the same structure, the only thing that has to be changed is the kind of recurrent network, while the parametric network stays unaltered.
As shown in Figure 11, the full-life prediction results of bearing 1-4 under each model can predict the life degradation trend well, among which RNN is less effective since RNN has only one hidden state. Assuming that the bearing degrades at point t, the reason for the large prediction accuracy error at the time when the bearing first begins to degrade is that the bearing has just failed, and because the damage at point t is greater than at other locations, the vibration signal when the bearing is running to point t should be significantly different from the state at other points in the same rotation cycle, resulting in low prediction accuracy. As the bearing wears, the signal variation at point t becomes less noticeable compared to other places, and model's accuracy improves. Therefore, the proposed method can better predict the remaining bearing life and provide for the effective operation of the bearings. The Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) assessment indices of the model are utilized in this paper to validate the applicability of the suggested method's applicability. The two equations are shown as follows: where: n is the sample size of data, y i is the value of the ith sample, andŷ i is the prediction value of the ith sample. Table 4 shows the results of a comparison of the prediction results with the commonly used time series algorithms. As shown in table, except for bearings 1-5 and 2-3, which have slightly higher prediction errors, all seven bearings perform well. 1-5 and 2-3 are due to bearing cage damage, and future research in this area can focus on life prediction using the proposed algorithm. The proposed method's mean values for RMSE and MAE are 0.125 and 0.257, respectively, and the experimental results confirm the effectiveness of the proposed algorithm for lifetime prediction.

Conclusions
The remaining life of multiple bearings are studied based on combining model in this paper and the follows conclusions are drawn.
(1) The ensemble empirical modal decomposition method is proposed and the decomposed IMFs are filtered by combining the correlation coefficient and the kurtosis index. It is proved that the final reconstructed signal can adequately remove the irrelevant noise. (2) Thirteen time-domain and frequency-domain features are extracted from the noisereduced signal, and the initial feature set is filtered using monotonicity, and the feature set composed of the filtered features can fully retain the degradation information of the bearings. (3) The experimental analysis results show that the proposed method in this paper is more effective and has better prediction accuracy compared to other recurrent neural networks. It is more applicable and shows good performance in different working conditions.