Bearing Remaining Useful Life Prediction Based on Naive Bayes and Weibull Distributions

Bearings play an important role in mechanical equipment, and predicting their remaining useful life (RUL) is an important research topic. To predict the RUL of bearings accurately, this paper proposes a data-driven RUL prediction method. First, statistical features are extracted from the vibration signal, and the root mean square (RMS) is taken as the main performance degradation index. Second, the correlation coefficient is used to select the statistical features that are highly correlated with the RMS. Then, to suppress the fluctuation of the statistical features, an improved Weibull distribution (WD) algorithm is used to fit the fluctuating features of the bearing at the different recession stages, and the fitted features are used as the input of the Naive Bayes (NB) training stage. During the testing stage, the true features of the bearings are used as the input of NB. NB testing yields five classes: a healthy state and four degradation states. Finally, the exponential smoothing algorithm is used to smooth the five classes and predict the RUL of the bearing. The experimental results show that the proposed method is effective for bearing RUL prediction.


Introduction
As key equipment in industrial production, rotating machinery is used in many fields, such as agriculture, machinery manufacturing, industry, electric power and aerospace, and plays an important role in industrial production. Rotating machinery improves production efficiency and reduces energy consumption. However, in actual production, long-term operation and improper operation of parts make mechanical equipment prone to failure, which causes unnecessary losses. The rolling bearing is an important part of mechanical equipment and is indispensable to the healthy operation of rotating machinery. Failures of rotating machinery are mainly caused by rolling bearing faults, and the health state of the bearings determines the running state of the equipment [1][2][3]. Therefore, monitoring the bearing state and evaluating its life expectancy are very important. Prognostics and Health Management (PHM) has recently become a promising research direction for improving the safety and performance of mechanical equipment. PHM predicts the life of the equipment based on analysis of its actual performance. Maintaining equipment before its predicted end of life can greatly improve the reliability and … with ANN. However, because the number of hidden layers in a neural network is uncertain, it is difficult to determine how many layers to use when constructing the network. The structure is found by trial and error during training, which makes the training results random. To avoid the influence of an uncertain number of layers on the training results, Huang Guangbin proposed a new learning algorithm called the extreme learning machine (ELM) [27]. ELM is a single-hidden-layer neural network that is widely used because of its simple structure and fast training speed.
Fang Liu et al. [17] proposed a two-layer joint approximate diagonalization of eigen-matrices (JADE) method, whose output can be regarded as a new degradation index from which redundant features have been eliminated; the extracted degradation index is then passed to the ELM to predict bearing RUL. Fang Liu et al. [21] further combined phase space reconstruction with JADE to jointly extract sensitive features, and then used ELM to predict the RUL of the bearing. ELM greatly shortens the training time, but the random choice of its parameters still makes the training results random. More recently, Lei Ren [28] compressed the features with a deep autoencoder and then used a deep learning framework to predict the bearing life in real time, achieving better efficiency in bearing RUL prediction. Reference [29] performs real-time prediction using a multilayer perceptron (MLP) and radial basis functions (RBF), and the results show that RBF outperforms MLP in both accuracy and time. Andres Bustillo et al. [30] applied various popular artificial intelligence (AI) techniques to sample data sets to estimate the residual life of machines under actual industrial conditions, and their experiments show that AI techniques predict the residual life with higher precision. However, existing data-driven residual life prediction methods do not accumulate knowledge to determine the bearing state; health status determination is based on expert experience [31]. Bayesian methods are data-driven methods based on prior knowledge, which effectively avoid the randomness of results. Nagi Z. Gebraeel et al. [32] proposed a Bayesian updating method that updates the random parameters of the bearing degradation model and then estimates the RUL of the degraded device. The method proposed in [9] is based on parameters and models.
In such methods, the selection of parameters and the construction of models are very complicated. F. Di Maio et al. [33] applied NB, a non-parametric data-driven method, to bearing prognostics. However, when the bearing signal fluctuates strongly, NB cannot classify the bearing state accurately, and the RUL of the bearing therefore cannot be predicted accurately. According to [34], bearing degradation increases over time. The running process of a bearing is generally divided into three stages: the normal operation stage, the continuous recession stage and the final failure stage. Therefore, this paper uses the improved WD to fit the bearing signals at the different stages in order to predict the RUL. This paper focuses on bearing degradation signals. First, time-domain statistical features are extracted from the bearing vibration signal. Next, the sensitive degradation indexes of the bearing are selected by correlation analysis. The improved WD algorithm is then used to fit the fluctuating degradation indexes at the different recession stages, and the fitted indexes are used as the input of the NB training stage, while the actual bearing degradation data are used as test samples. Finally, the classification results of the time series are smoothed by exponential smoothing to obtain the RUL of the bearing.
The rest of the paper is organized as follows. Section 2 briefly describes correlation analysis, WD and NB. Section 3 presents the bearing RUL prediction, the experimental data and the results. Section 4 draws conclusions from the experiments in Section 3.

Correlation Analysis
Correlation analysis is a statistical method for analyzing the relationship between variables [35]. It measures the degree of correlation between variables, which is mainly expressed by the correlation coefficient: the higher the correlation coefficient, the stronger the correlation between the variables. The correlation coefficient of two random variables A and B is

$$\rho(A,B) = \frac{\mathrm{Cov}(A,B)}{\sqrt{D(A)}\,\sqrt{D(B)}} \qquad (1)$$

where Cov(A, B) is the covariance of A and B, and D(A) and D(B) are the variances of A and B, respectively.
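The correlation coefficient ρ(A, B) = Cov(A, B) / √(D(A)·D(B)) can be estimated from two sampled sequences as in the minimal Python sketch below. It uses population (1/n) moments; the normalization factor cancels in the ratio, so the result equals the usual Pearson coefficient. The function name `correlation` is illustrative and not taken from the paper.

```python
import math


def correlation(a, b):
    """Sample estimate of rho(A, B) = Cov(A, B) / sqrt(D(A) * D(B))."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("a and b must have the same length (at least 2)")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Covariance and variances with 1/n normalization; the factor cancels in rho.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)


print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

A perfectly increasing linear relation gives 1, a perfectly decreasing one gives −1, and unrelated sequences give values near 0.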

Weibull Distribution
The Weibull distribution (WD) [36] is widely used in reliability theory, and the lives of most mechanical equipment follow the WD. The two main forms are the two-parameter WD and the three-parameter WD, as shown in Table 1, where δ, k and u are the scale, shape and location parameters, respectively, and t is the input variable.

Table 1. Two-parameter Weibull and three-parameter Weibull.

An improved WD proposed in [26], called the Universal Failure Rate Function (UFRF), has four parameters: the scale parameter δ > 0, the shape parameter k > 0, the parameter c, which allows the WD to adapt to any range, and the parameter b, which adjusts the value of the WD at the beginning.
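For reference, the textbook two- and three-parameter Weibull forms can be evaluated as below. This is a sketch assuming the standard cumulative distribution function F(t) = 1 − exp(−((t − u)/δ)^k) and failure (hazard) rate h(t) = (k/δ)((t − u)/δ)^(k−1); the exact expressions in Table 1 are not reproduced here, and the UFRF of [26] is not implemented. Setting u = 0 gives the two-parameter form.

```python
import math


def weibull_cdf(t, delta, k, u=0.0):
    """Weibull CDF with scale delta, shape k, location u (u = 0: two-parameter form)."""
    if delta <= 0 or k <= 0:
        raise ValueError("delta and k must be positive")
    if t <= u:
        return 0.0  # no probability mass before the location parameter
    return 1.0 - math.exp(-(((t - u) / delta) ** k))


def weibull_hazard(t, delta, k, u=0.0):
    """Weibull failure (hazard) rate, evaluated here only for t > u."""
    if delta <= 0 or k <= 0:
        raise ValueError("delta and k must be positive")
    if t <= u:
        raise ValueError("hazard is evaluated here only for t > u")
    return (k / delta) * ((t - u) / delta) ** (k - 1)


# At t - u = delta the CDF equals 1 - 1/e for any shape k.
print(round(weibull_cdf(100.0, 100.0, 2.0), 4))          # 0.6321
print(round(weibull_cdf(150.0, 100.0, 3.5, u=50.0), 4))  # 0.6321
# k > 1 gives an increasing failure rate (wear-out behaviour).
print(weibull_hazard(2.0, 1.0, 2.0))                     # 4.0
```

The shape parameter k controls whether the failure rate decreases (k < 1), stays constant (k = 1) or increases (k > 1) with time.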

Naive Bayes
NB is a classification method based on Bayes' rule [37,38]; both Bayes and NB are classifiers that use prior knowledge. Consider a known data set X = {X_1, X_2, …, X_n}, where X_i = ⟨A, Y⟩, A = {A_1, A_2, …, A_t} is the set of attributes of the data set and Y = {Y_1, Y_2, …, Y_m} is the set of categories. For an unknown sample C = {C_1, C_2, …, C_t}, NB assigns C to the category Y_i with the largest posterior probability, i.e., it finds max_i P(Y_i|C). By Bayes' rule with prior knowledge,

$$P(Y_i \mid C) = \frac{P(C \mid Y_i)\,P(Y_i)}{P(C)}$$

NB is a Bayesian method that assumes the attributes are conditionally independent given the class. Because P(C) is the same for every class, only P(C|Y_i)P(Y_i) needs to be calculated. From the known prior knowledge,

$$P(Y_i) = \frac{n_i}{n}$$

where n_i is the number of samples of class i in the data set and n is the total number of samples. When an attribute A_c is continuous, it is usually assumed to follow a Gaussian distribution A_c ∼ N(u_c, δ_c^2), so

$$P(A_{cj} \mid Y_i) = \frac{1}{\sqrt{2\pi}\,\delta_c}\exp\!\left(-\frac{(A_{cj}-u_c)^2}{2\delta_c^2}\right)$$

where u_c and δ_c^2 are the mean and variance of attribute A_c over the training samples of class Y_i.
The discriminant function of NB is therefore

$$P(Y_i \mid C) \propto P(Y_i)\prod_{j=1}^{t} P(C_j \mid Y_i)$$

and the unknown sample C is assigned to the category with the maximum value of the discriminant function P(Y_i|C).
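The training and decision rules of a Gaussian NB classifier (prior P(Y_i) = n_i/n, per-class Gaussian likelihood for each continuous attribute, assignment to the class with the largest score) can be sketched in plain Python as follows. This is a minimal sketch, not the authors' code: it compares log-probabilities, which preserves the argmax while avoiding underflow from multiplying many small densities, and the small variance floor `var_floor` is an assumption added to avoid division by zero. The function names are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels, var_floor=1e-9):
    """Return {class: (prior, means, variances)} estimated from training data."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    n = len(samples)
    model = {}
    for y, rows in groups.items():
        n_i = len(rows)
        t = len(rows[0])
        means = [sum(r[j] for r in rows) / n_i for j in range(t)]
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n_i + var_floor
                     for j in range(t)]
        model[y] = (n_i / n, means, variances)  # prior P(Y_i) = n_i / n
    return model


def predict_gaussian_nb(model, c):
    """Assign c to the class maximizing log P(Y_i) + sum_j log P(C_j | Y_i)."""
    best_class, best_score = None, -math.inf
    for y, (prior, means, variances) in model.items():
        score = math.log(prior)
        for c_j, u, var in zip(c, means, variances):
            # log of the Gaussian density N(c_j; u, var)
            score += -0.5 * math.log(2 * math.pi * var) - (c_j - u) ** 2 / (2 * var)
        if score > best_score:
            best_class, best_score = y, score
    return best_class


# Toy two-class data (not bearing data).
train_x = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [10.0, 9.9], [9.8, 10.1], [10.1, 10.0]]
train_y = [1, 1, 1, 2, 2, 2]
model = fit_gaussian_nb(train_x, train_y)
print(predict_gaussian_nb(model, [0.05, 0.0]))  # 1
print(predict_gaussian_nb(model, [9.9, 10.2]))  # 2
```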

Prediction of Bearing RUL
The proposed bearing RUL prediction framework consists of three stages, as shown in Figure 1.

Signal Acquisition
To verify the effectiveness of the proposed method, the full-cycle (run-to-failure) bearing data used in this paper are taken from the Intelligent Maintenance Center of the University of Cincinnati [39]. The run-to-failure data acquisition setup is shown in Figure 2, which gives a photograph of the test rig together with the corresponding sensor-placement illustration. As Figure 2 shows, the test platform consists of four Rexnord ZA-2115 double-row test bearings (Bearings 1-4) mounted on a rotating shaft. The shaft is driven by an AC motor and kept at a constant speed of 2000 revolutions per minute, and a radial load of 2724 kg is applied to the bearings and the drive shaft. Two ICP accelerometers (model PCB 353B33 high-sensitivity quartz ICP, produced by PCB, USA) measure the vibration data along the x and y channels. The vibration data were collected with an NI DAQ card 6062E and recorded every 10 min at a sampling frequency of 20 kHz. Bearing degradation is monitored mainly through the debris collected by magnetic plugs, and the test ends when the amount of debris reaches a certain level. After 35 days of testing, a total of 3 sets of run-to-failure data were collected; each recorded file contains 20,480 sampling points (about 1 s of signal at 20 kHz). The run-to-failure data of all three data sets are used in this experiment, as shown in Table 2. Figure 3 shows the vibration data of the entire cycle, from normal operation to fault, for the three data sets. It can be seen from Figure 3 that the signal frequency of bearings 1, 3 and 4 is low at the beginning and that the vibration frequency increases gradually over time. However, amplitude analysis of the signal alone cannot yield the RUL of the bearing. Furthermore, the signals contain noise, which makes them very difficult to analyze directly, so the signals are first preprocessed.

Feature Extraction
The time-domain signal is a waveform that varies with time and carries information about the bearing state, so the bearing state can be diagnosed by analyzing the time-domain waveform. Under real working conditions, however, the measured signal is contaminated by noise during operation, so this paper uses statistical time-domain methods to extract sensitive features from the signals; that is, the degradation information of the bearing health status is extracted by statistical calculation. As shown in Table 3, a total of 16 statistical time-domain features are used.

Table 3. Time domain analysis of bearing run-to-failure data.

The 16 original statistical time-domain features extracted from the vibration signal (Table 3) are plotted in Figures 4-6. Features F1-F8 are the RMS, average value, absolute mean, average power, square amplitude, peak, peak-to-peak value and variance, respectively. Features F9-F16 are the standard deviation, skewness, kurtosis, waveform index, crest index, impulse index, margin index and skewness index, respectively. Figures 4-6 show that not all of the features F1-F16 are robust during the degradation process. For example, F6 in Figures 4 and 5 does not reflect the bearing degradation process well, whereas F4 does. Therefore, suitable and robust features that better reflect the bearing degradation process must be chosen for RUL prediction.
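Several of the features listed above can be computed from a single vibration record as in the sketch below. It assumes common textbook definitions (for example, crest index = peak / RMS, impulse index = peak / absolute mean, margin index = peak / square-root amplitude, waveform index = RMS / absolute mean); the exact equations of Table 3 may differ, and the square amplitude (F5) and skewness index (F16) are omitted because their definitions cannot be confirmed here. The function name is hypothetical, and the record is assumed to be non-constant.

```python
import math


def time_domain_features(x):
    """Common statistical time-domain features of one vibration record.

    Assumed textbook definitions with population (1/N) moments; the record
    must be non-constant (otherwise skewness and kurtosis are undefined).
    """
    n = len(x)
    mean = sum(x) / n
    abs_mean = sum(abs(v) for v in x) / n
    power = sum(v * v for v in x) / n                 # average power (mean square)
    rms = math.sqrt(power)
    peak = max(abs(v) for v in x)
    variance = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(variance)
    sqrt_amp = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2  # square-root amplitude
    return {
        "rms": rms,
        "mean": mean,
        "abs_mean": abs_mean,
        "power": power,
        "peak": peak,
        "peak_to_peak": max(x) - min(x),
        "variance": variance,
        "std": std,
        "skewness": sum((v - mean) ** 3 for v in x) / n / std ** 3,
        "kurtosis": sum((v - mean) ** 4 for v in x) / n / std ** 4,
        "waveform_index": rms / abs_mean,
        "crest_index": peak / rms,
        "impulse_index": peak / abs_mean,
        "margin_index": peak / sqrt_amp,
    }


features = time_domain_features([3.0, -4.0, 0.0, 1.0])
print(features["peak"], features["peak_to_peak"], features["power"])  # 4.0 7.0 6.5
```

In practice the function would be applied to each 20,480-point record in turn, giving one feature vector per recording time.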

Feature Selection
Before building the life prediction model, features that properly reflect the bearing degradation process must be chosen. Therefore, suitable features for predicting the RUL of the bearing need to be selected from the 16 time-domain features. The RMS effectively reflects the overall health status of bearings, and it increases gradually as the bearing degrades, so RMS is often used as the main feature for bearing trend analysis and RUL prediction [40,41]. This paper takes RMS as the main feature and uses correlation coefficients to select the statistical time-domain features whose trends are consistent with RMS, which are then used to predict the RUL of the bearing.
In this paper, the correlation coefficient is used to select the time-domain features that are highly correlated with RMS; the flow chart is shown in Figure 7. Let F_i = {f_1i, f_2i, …, f_ni} be the ith feature, where n is the number of samples and t is the number of feature attributes. The feature matrix M_nt is written as

$$M_{nt} = [F_1, F_2, \ldots, F_t] = \begin{bmatrix} f_{11} & f_{12} & \cdots & f_{1t} \\ f_{21} & f_{22} & \cdots & f_{2t} \\ \vdots & \vdots & \ddots & \vdots \\ f_{n1} & f_{n2} & \cdots & f_{nt} \end{bmatrix}$$

Then the correlation coefficient matrix R_1t = [r_11, r_12, …, r_1t] between RMS and M_nt is calculated, where each coefficient is obtained from Equation (1) as r_1i = ρ(RMS, F_i). Finally, a threshold T is set and each entry of R_1t is compared with T. If r_1i > T, the feature F_i is strongly correlated with RMS; otherwise, the correlation between F_i and RMS is low and the ith column of the feature matrix is dropped.
In the experiment of this paper, there are 16 feature vectors, so the correlation coefficients between RMS and each of the 16 feature vectors need to be calculated. In general, two vectors whose correlation coefficient exceeds 0.9 are considered significantly correlated. Since this paper requires features that are highly correlated with RMS, the threshold is set to T = 0.9 to filter out weakly correlated features, which reduces the feature dimension to 6: F1, F3, F4, F5, F8 and F9. The correlation coefficients between the bearing features and RMS are listed in Table 4.
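The threshold-based selection described above (keep F_i when r_1i = ρ(RMS, F_i) > T, with T = 0.9) can be sketched as follows. The sketch assumes the feature columns of M_nt are stored in a dictionary keyed by feature name with RMS under "F1"; the function names and the toy data are illustrative, not the IMS measurements or the values in Table 4.

```python
import math


def pearson(a, b):
    """Correlation coefficient rho(A, B) estimated from two equal-length samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def select_features(features, main="F1", threshold=0.9):
    """Keep the features whose correlation with the main feature (RMS) exceeds T.

    features: dict mapping feature name -> values over time (one column of M_nt).
    Returns {name: r_1i} for the retained features.
    """
    rms = features[main]
    kept = {}
    for name, column in features.items():
        r = pearson(rms, column)  # r_1i = rho(RMS, F_i)
        if r > threshold:
            kept[name] = r
    return kept


toy = {
    "F1": [1.0, 1.1, 1.3, 1.8, 2.9],  # RMS-like, increasing
    "F4": [1.0, 1.2, 1.7, 3.2, 8.4],  # average-power-like, increasing
    "F6": [2.0, 3.5, 1.0, 2.5, 3.0],  # fluctuating, weakly related
}
print(sorted(select_features(toy)))  # ['F1', 'F4']
```

RMS is trivially retained because its correlation with itself is 1; the dropped columns correspond to the features removed from M_nt.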

Classification Model Construction
In this paper, NB is trained and tested on the bearing degradation data. The training data are obtained by fitting the degradation trend of the bearing data with UFRF, while in the testing process the real bearing degradation data are classified and the results are then used to predict the RUL.
The overall degradation of the bearing increases over time. Figures 4-6 show that the curves of the six selected time-domain features F1, F3, F4, F5, F8 and F9 fluctuate strongly, which makes it difficult for the constructed classifier to predict the RUL of the bearing. These fluctuations have many causes, such as noise and speed. To avoid their influence, the UFRF proposed in [26] is used to fit the features with smooth curves that increase with time.
However, the bearing degradation process can be divided into three stages, each with different behavior. The first stage is the normal operation stage, in which the RMS curve of the bearing shows no obvious change and no degradation occurs. The second stage is the continuous recession stage, in which degradation is relatively obvious but proceeds slowly and continuously, generally without severe degradation. The third stage is the final failure stage, in which bearing degradation is severe and the fluctuation is especially pronounced. Because the degradation behavior differs between stages, this paper fits the features of each stage with UFRF using different parameters, while the overall fitted trend still increases over time, consistent with the bearing degradation process. Reference [42] shows that the failure of bearing 1 and bearing 3 starts at the 586th and 1808th points, respectively, after which the degradation trend appears in the degradation curves; the degradation curves then change rapidly after reaching the failure threshold (the 950th and 2120th points, respectively). Reference [21] shows that the time ratio of the three stages for bearing 4 is 0.5:0.2:0.3. Tables 5-7 show the parameters of the different stages used for the feature extraction of bearing 1. The parameters are first generated randomly according to the distribution rules of the WD and then adjusted against the real data to obtain the optimal values. Using the parameters in Tables 5-7, features consistent with bearing degradation are fitted, and Figures 8-10 show that the fitted features are consistent with the bearing degradation process. The fitted features are then used to construct the training data. Because NB is a supervised learning method, the training data must be labeled.
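The stage-wise fitting described above can be illustrated with the sketch below, under loudly stated assumptions. The UFRF formula of [26] is not reproduced here, so `trend_curve` is a hypothetical stand-in (a Weibull-CDF shape scaled by c and offset by b); the "randomly generate, then adjust" parameter selection is approximated by a seeded random search that keeps the parameter set with the smallest squared error; and the stage boundaries follow a time ratio such as the 0.5:0.2:0.3 quoted for bearing 4. All parameter ranges and the toy feature are illustrative.

```python
import math
import random


def stage_bounds(n_points, ratios=(0.5, 0.2, 0.3)):
    """Split sample indices 0..n_points-1 into consecutive stages by time ratio."""
    total = sum(ratios)
    bounds, start, cumulative = [], 0, 0.0
    for r in ratios:
        cumulative += r
        end = round(n_points * cumulative / total)
        bounds.append((start, end))
        start = end
    return bounds


def trend_curve(t, delta, k, c, b):
    """HYPOTHETICAL stand-in for a UFRF-style curve, not the formula of [26]."""
    return c * (1.0 - math.exp(-((t / delta) ** k))) + b


def fit_stage(ts, ys, n_trials=2000, seed=0):
    """Random search over (delta, k, c, b); keep the least-squares parameter set."""
    rng = random.Random(seed)
    t_max, y_min, y_max = max(ts), min(ys), max(ys)
    best_params, best_err = None, math.inf
    for _ in range(n_trials):
        params = (
            rng.uniform(0.1 * t_max, 2.0 * t_max),          # delta: scale
            rng.uniform(0.5, 5.0),                          # k: shape
            rng.uniform(0.0, 2.0 * (y_max - y_min) + 1e-9),  # c: range
            rng.uniform(y_min - 0.5, y_min + 0.5),          # b: starting level
        )
        err = sum((trend_curve(t, *params) - y) ** 2 for t, y in zip(ts, ys))
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


# Toy noisy, increasing feature (not IMS data): fit each stage separately.
noise = random.Random(1)
feature = [0.1 * (i / 100) ** 3 + noise.gauss(0, 0.02) for i in range(200)]
for start, end in stage_bounds(len(feature)):
    ts = list(range(start, end))
    params, err = fit_stage(ts, feature[start:end])
    print((start, end), [round(p, 3) for p in params], round(err, 4))
```

A gradient-based least-squares fit could replace the random search; the random search is used here only because it mirrors the "generate randomly, then keep the best" description and needs no external library.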
Figure 11 shows the labels assigned for different degrees of degradation according to the RMS degradation process. As Figure 11a shows, there are five labels: label 1 represents normal bearing data, and labels 2, 3, 4 and 5 represent 25%, 50% and 90% degradation and complete failure, respectively. In Figure 11b,c, the five labels represent 0, 30%, 60%, 90% and 100%, and 0, 40%, 80%, 95% and 100% degradation, respectively. After UFRF fitting, the fitted features are used as the input of NB; in the test phase, the true features of bearing degradation are used as the input.
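One way to turn degradation percentages such as those of Figure 11a (normal, 25%, 50%, 90%, complete failure) into NB labels is sketched below. The paper does not state how a sample's degradation level is measured, so this sketch assumes it is the fitted RMS normalized to [0, 1] between its first and last values, and it treats 25%, 50% and 90% as upper bounds of labels 2-4. Both function names and the normalization are hypothetical.

```python
from bisect import bisect_left


def degradation_fraction(rms_fitted):
    """Normalize a fitted RMS curve to [0, 1] between its first and last values (assumed)."""
    start, end = rms_fitted[0], rms_fitted[-1]
    return [(v - start) / (end - start) for v in rms_fitted]


def assign_labels(fractions, bounds=(0.0, 0.25, 0.5, 0.9)):
    """Label 1: no degradation; labels 2-4: up to 25%, 50%, 90%; label 5: beyond 90%."""
    return [bisect_left(bounds, f) + 1 for f in fractions]


rms_fitted = [0.10, 0.10, 0.12, 0.15, 0.22, 0.31, 0.46, 0.60]
print(assign_labels(degradation_fraction(rms_fitted)))  # [1, 1, 2, 2, 2, 3, 4, 5]
```

The label boundaries for Figure 11b,c would be passed through `bounds` in the same way.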
Then, following the NB classification principle, the bearing data are divided into five categories (labels 1-5). The bearing data are described in Tables 8-10. After fitting, the training data set was obtained and the NB classification model was trained. The real bearing data were then used as the model input, giving the test results shown in Tables 11-13. Table 14 compares the results for bearing 1 and shows that the NB classification outperforms the method of reference [26], whose result is 74.2%.

RUL Construction
Classification yields the category of each sample, but these categories alone cannot directly give the RUL of the bearing. Therefore, an additional step is needed to convert the category results of the time series into a degradation trend over time, from which the RUL of the bearing is predicted.
Exponential smoothing was proposed by Brown, who assumed that time series are stable and regular [43], so that they can be forecast reasonably; it is a common prediction method. This paper therefore uses exponential smoothing to obtain the real degradation process of bearings and thus predict their RUL. The smoothing equation is

$$s_t = z\,x_t + (1 - z)\,s_{t-1}$$

where z is the smoothing constant, s_t is the smoothed value at time t, x_t is the real value at time t, and s_{t-1} is the smoothed value at time t − 1. The smoothing constant z controls the degree of smoothing: it determines how quickly the smoothed value responds to the gap between the predicted value and the actual value. z ranges from 0 to 1, and in general the closer z is to 0, the stronger the smoothing effect, so this paper uses values of z close to 0. The experimental results show that RUL prediction performs best when the smoothing constants z of bearings 1, 3 and 4 are 0.08, 0.05 and 0.07, respectively, as shown in Figure 12, where Figure 12a-c give the predicted results of bearings 1, 3 and 4. As Figure 12 shows, the RUL at each point is obtained after smoothing.
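Simple exponential smoothing with smoothing constant z, s_t = z·x_t + (1 − z)·s_{t−1}, can be sketched in a few lines. The symbol x_t for the real value and the initialization s_0 = x_0 are assumptions (the paper does not state how the recursion is started), and the toy input is a made-up sequence of NB class labels, not the paper's data. With z close to 0, isolated misclassifications are damped into a gradual trend.

```python
def exponential_smoothing(values, z):
    """s_t = z * x_t + (1 - z) * s_{t-1}, with s_0 = x_0 (assumed initialization)."""
    if not 0.0 < z <= 1.0:
        raise ValueError("smoothing constant z must lie in (0, 1]")
    smoothed = [float(values[0])]
    for x in values[1:]:
        smoothed.append(z * x + (1.0 - z) * smoothed[-1])
    return smoothed


labels = [1, 1, 1, 2, 1, 2, 2, 3, 3, 4, 3, 4, 5, 5, 5]  # hypothetical NB outputs
print([round(s, 2) for s in exponential_smoothing(labels, 0.08)])
```

With z = 1 the output simply reproduces the input, while smaller z trades responsiveness for smoothness, which is why z is tuned per bearing.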
The error is used as the index for measuring the precision of bearing RUL prediction:

$$\mathrm{Error} = |s_t - l_t| \qquad (14)$$

where s_t is the predicted RUL and l_t is the real RUL of the bearing. Table 15 compares the experimental results on the bearing degradation data sets, and shows that the proposed algorithm predicts the RUL of the bearing more accurately.
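The point-wise error |s_t − l_t| can be computed as below. The RUL values are hypothetical, and the mean absolute error printed at the end is an extra summary added for illustration, not a metric reported in Table 15.

```python
def rul_errors(predicted, actual):
    """Point-wise error |s_t - l_t| between predicted and real RUL."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual RUL sequences must have equal length")
    return [abs(s - l) for s, l in zip(predicted, actual)]


predicted = [120.0, 95.0, 70.0, 41.0, 18.0]  # hypothetical predicted RUL
actual = [125.0, 100.0, 75.0, 50.0, 25.0]    # hypothetical real RUL
errors = rul_errors(predicted, actual)
print(errors)                     # [5.0, 5.0, 5.0, 9.0, 7.0]
print(sum(errors) / len(errors))  # 6.2
```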

Conclusions
RUL prediction is very important for improving the safety, reliability, availability and maintainability of rotating machinery. By predicting the residual life, possible faults can be detected and predicted in time, and the rotating machinery can be repaired so as to prolong its service life. Starting from the bearing vibration signal, this paper combines data-driven techniques with fault prediction methods to predict the residual life of bearings. First, features are extracted from the vibration signal and selected. Then, according to the law of bearing degradation, the bearing data are divided into three stages and labeled with different categories. Next, the bearing degradation data are classified using NB. Finally, exponential smoothing is used to predict the RUL of the bearing. To verify the effectiveness of the method, experiments were performed on real bearing degradation data, and the results show that NB is effective in predicting the RUL of the bearing. In this paper, the residual life of bearings is predicted with a classification method; future work will aim to predict the remaining life of the bearing directly.