Predicting Remaining Useful Life of Rolling Bearings Based on Reliable Degradation Indicator and Temporal Convolution Network with the Quantile Regression

: High precision and multi information prediction results of bearing remaining useful life (RUL) can effectively describe the uncertainty of bearing health state and operation state. Aiming at the problem of feature efﬁcient extraction and RUL prediction during rolling bearings operation degradation process, through data reduction and key features mining analysis, a new feature vector based on time-frequency domain joint feature is found to describe the bearings degradation process more comprehensively. In order to keep the effective information without increasing the scale of neural network, a joint feature compression calculation method based on redeﬁned degradation indicator (DI) was proposed to determine the input data set. By combining the temporal convolution network with the quantile regression (TCNQR) algorithm, the probability density forecasting at any time is achieved based on kernel density estimation (KDE) for the conditional distribution of predicted values. The experimental results show that the proposed method can obtain the point prediction results with smaller errors. Compared with the existing quantile regression of long short-term memory network(LSTMQR), the proposed method can construct more accurate prediction interval and probability density curve, which can effectively quantify the uncertainty of bearing running state.


Introduction
With the development of sensor technology, signal data shows explosive growth. Under such circumstances, effective information extraction and information value analysis become the key links for the efficient use of big data to promote the development of manufacturing industry. In Smart Manufacturing technology, accurate remaining useful life (RUL) prediction provides decision guidance for the formulation of appropriate preventive maintenance and replacement strategies. On the premise of ensuring the operational reliability of mechanical equipment, it can avoid resource waste due to over frequency maintenance or other serious consequences caused by mechanical parts failure. The efficiency of prediction and maintenance decision making will be improved through the mining and analysis of key features of the bearings operation status monitoring data. Therefore, the prediction of RUL of mechanical equipment parts (such as gears and bearings) has been paid more and more attention by scholars. As an essential part of rotating machinery, the performance degradation or failure of rolling bearings directly affects the performance and operation reliability of mechanical equipment. Prognostics Health Management (PHM) of rolling bearings is a process of monitoring and predicting current or future degradation based on historical operational degradation data, designed to assist personnel in making reasonable maintenance decisions to prevent or avoid bearing failure. At present, most sensor-based PHM technologies are mainly based on data-driven statistical models that de-termine rules in a probabilistic manner [1,2] and artificial intelligence algorithms that rely on machine learning tools [3]. Some experts and scholars have put forward the rolling bearing RUL prediction method based on nonlinear degradation and model [4,5].
At the same time, under the background of rapid development of technology, mechanical equipment status monitoring data shows a trend of massive growth, which provides information basis for real-time acquisition and advance prediction of equipment operation status information. However, when a significant measurement error is introduced due to the complex and changeable operating environment of the equipment, environmental interference, abnormal sensor or system disturbance, some isolated singularities deviating from expected values may occur in the original monitoring data. During the operation of the equipment, due to the occurrence of faults or defects, the monitoring data will also appear abnormal. Obviously, such data contains important information, which cannot be processed in the same way as noise data, anomalous sensor data, and anomalous data caused by environmental interference. Therefore, in obtaining the running state of the equipment, noise processing, and key features mining analysis become the essential work of predicting the operating state of the material and carrying out PHM tasks.
Because of the problem that the high dimension original monitoring data often contain different types of noise signals, it is necessary to extract sufficient information from the contaminated original monitoring data and restore the pure data to achieve the effectiveness of degradation indicators. In recent years, key features mining analysis technology has made rapid development in the field of deep learning. Many degradation indicators construction methods and neural networks have achieved satisfactory performance in different scenarios, vibration signal feature extraction technology based on wavelet transform [6,7], wavelet packet decomposition technology [8] has been frequently used in health indicator construction task, and has been proved to be effective. Based on the idea of deep learning, as an unsupervised learning method, autoencoder, is widely used in dimensionality reduction and information retrieval tasks in the field of PHM because of its robust feature extraction and generalization ability [9][10][11]. For the past few years, some experts and scholars have proposed to use autoencoder, or improved version of autoencoder, such as stacked sparse autoencoder [12], stacked denoising autoencoder [13,14], convolutional autoencoder [15], and to reconstruct sensor readings and automatically extract degradation characteristics to obtain unsupervised degradation indicator values to identify the severity of equipment degradation. The long short-term memory network (LSTM) [16], convolutional neural network (CNN) [17], deep convolution neural network (DCNN) [18], Convolution and LSTM Hybrid Deep Neural Networks [19], recursive gating unit (Gru) neural network [20], and recurrent convolutional neural network [21] are combined to realize the task of PHM.
The above research work provides a useful reference for the effective learning of the degradation state characteristics of bearing operation. In particular, literature [22] realized the effective prediction of bearings RUL based on deep autoencoder and deep neural networks. On this basis, we will focus on the dimension, purity and efficiency of data. In practical engineering applications, the actual operation data and forecast model parameters of the bearing have a substantial uncertainty, so the traditional point prediction results will inevitably have errors, which is difficult to reflect the uncertainty of bearings degradation state. In the establishment of maintenance strategy based on reliability, if able to quantify the uncertainty of prediction results, they can run for maintenance decision-making and risk assessment to provide more abundant information. The probability prediction offers an effective way to quantitatively balance the risk, which can better describe the possible fluctuation range, uncertainty, and risk of the RUL in the process of future operation degradation, so it has more research value.
For the probability density prediction, most studies adopt nonparametric modelings, such as quantile regression (QR) [23], kernel density estimation(KDE) [24], etc. Which can directly calculate the distribution function or quantile. Taylor [25] introduced a more flexible model in 2000, namely, the quantile regression neural network (QRNN); how-ever, the shallow structure of QRNN lacks enough ability to simulate the complex time characteristics of time series model. With the continuous development of deep learning, some deep learning models have been gradually applied to the research of time series data. On this basis, considering the correlation between RUL and time, the previous time information can be associated with the current task, the LSTM network model [26] that can learn long-term dependence relationship is constructed. Compared with LSTM and other networks, CNN has a natural advantage in large-scale parallel data processing. Temporal convolution network (TCN) [27] is an improvement of one-dimensional CNN for time series problems, literature [27] shows that TCN has the advantages of faster prediction speed and higher accuracy in most scenarios. At present, there is little research on TCN in the field of engineering reliability, and there is no report on the research of fusion of TCN and neural network in the probability density prediction of RUL.
This study mainly focuses on the problems such as the current running state of the motor rolling element bearings and RUL, which cannot truly represent the abnormal monitoring data caused by various reasons during the equipment condition monitoring process. In this paper, a degradation indicator (DI) construction method based on data reduction and key features mining analysis is proposed. The QR was combined with TCN to improve the performance of the prediction model. Firstly, time domain and frequency domain features are compressed by stacked denoising autoencoder (SDAE). Through redefinition and combination of correlation, redundancy, and monotonicity, the sensitivity measurement standard Std DI of DI is further proposed. The optimal features are selected as optimal DI set Opt DI . Then bearings DI in set Opt DI and its corresponding RUL are taken as input and output variables of temporal convolution network quantile regression (TCNQR) model. The relationship between input variables and response variables under different quantiles is obtained, based on the above results, the probability density function of the predicted bearing RUL is obtained by KDE, which provides more abundant information than traditional point prediction results.
The main contributions of this paper are summarized as follows: (1) The original features are compressed and reconstructed based on the SDAE compression method to obtain low-dimensional representative features and retain sufficient information on the premise of not increasing the size of the neural network. (2) The redundancy, correlation, and monotonicity measures were incorporated into the DI selection criteria. The sensitivity standard of DI is redefined to further reduce the dimension of input feature variables on the premise of ensuring the efficiency of DI set. (3) Combining TCN with QR, a probability density prediction method based on TCNQR is proposed to obtain the predicted value of bearings RUL probability density, obtain more comprehensive and effective information of bearings degradation state, and further reflect the uncertainty of bearings RUL, so as to guide the equipment maintenance decision of actual production and manufacturing activities and avoid large errors and economic losses.
The rest of the paper is organized as follows: Section 2 introduces the basic theory of SAE and QR. The detailed implementation process of the construction of bearings DI set and the TCNQR prediction model are presented in Section 3. The performance of the proposed method was verified using the motor rolling element bearings datasets from Xi'an Jiaotong University (Shaanxi, China) and compared with other methods in Section 4. Finally, conclusions are drawn in Section 5.

Basic Theory of SAE
Equipment condition monitoring data often has noise signals in the data due to the working environment or operating conditions, which leads to "impure" data. To realize the accurate analysis of the data, we need to "clean" the data, extract the effective features, and restore the pure data. In practical application, principal component analysis (PCA), as a classical method to improve the signal-to-noise ratio (SNR) and linear dimension reduction, is limited in subspace feature extraction [28], and the determination of principal component weight is often subjective, so it is not suitable for the understanding of nonlinear features. However, Hinton [29] in 2006 proposed a low dimensional expression method of learning high-dimensional features by training deep "autoencoder" network, which is essentially a signal compression model based on neural network. Autoencoder(AE) compression embodies its advantages in feature compression and provides a new direction for the research in this field. The simplified network structure of shallow AE is shown in Figure 1. The AE is a neural network that reconstructs the input signal from the target expression. The purpose is to obtain a dimension reduction feature expression H = {h (1) , h (2) , h (3) , . . . } of the data through training based on the input unlabeled data The AE encodes the input vector X to the compression feature of the hidden layer through activation function mapping to express H: The feature expression H of the hidden layer is reconstructed by mapping and decoding as follows:x where, θ = (W, b) and θ = (W , b ) are coding model parameters and decoding model weight parameters respectively; W and W are encoding weight matrix and decoding weight matrix respectively; b and b are bias vectors; s function is the activation function, and the general expression is s(u) = sigmoid(u) = 1/(1 + exp(−u)).
The reconstruction resultx cannot reproduce the input x completely and accurately. Our goal is to find the minimum reconstruction error of parameters θ and θ , and to use loss function to represent the training effect and to minimize the loss function. At this time, the common characteristics of the input information x and the reconstruction information x are extracted to the maximum. Generally, there are square error loss function and cross entropy loss function, which are respectively expressed as: The optimization function is expressed as: Considering that there will be noise information in the actual operation process of bearings, this paper uses the de-noising autoencoder (DAE) to add noise to the original features, that is, the original input vector x is added with noise to get x , the unit is randomly selected according to a certain proportion and forced to be set to 0, and then the de-noised data x is trained for encoding and decoding. At this time, the loss function is consistent with the traditional AE loss function, and the optimization function is as follows: The cost function is: where h W,b ( x i ) is the activation value of neurons in the hidden layer. The combination of DAE is stacked into a deep learning hierarchical structure, that is, multiple DAE are cascaded to complete the task of feature extraction layer by layer, and representative features with lower dimension are obtained. Its essence is to take the damaged information with noise as the input signal, so that the reconstructed signal has a certain robustness to the noise in the input signal. The stacked denoising autoencoder (SDAE) takes the hidden layer output of each layer as the input of the next layer, and obtains the parameter sum of each layer. In this paper, Hinton's layer by layer greedy learning algorithm is used to construct a SDAE network. The main idea of the algorithm is to train only one layer of the network at a time, that is, to train a DAE with only one hidden layer at a time. When the DAE of this layer reaches the optimization, the next DAE will be trained.
Build the three-layer SDAE model, the input layer nodes is n, hidden layer nodes is m, the original input data x is de-noised according to the proportion coefficient λ to get x. According to Equations (1)-(6), the hidden layer H of the first layer is obtained, and the hidden layer output of the first layer network is obtained according to formula (7) of the minimized cost function , that is, the model parameters θ 1 = (W, b) are obtained. According to the layer by layer greedy training algorithm, the hidden layer outputs of the second and third layer networks are obtained in turn.
The model parameters W and b are obtained by updating the weight once in each iteration using the gradient descent method. The process is as follows: where α is the learning rate.

Basic Theory of Quantile Regression
The essence of quantile regression (QR) [30] is to estimate the different conditional quantiles of response variables by taking different values of τ , so as to obtain the regression prediction model under all quantiles. The calculation formula is: is the model parameter related to quantile τ, the appropriate α(τ) needs to be solved by optimizing the following formula : where, N is the number of samples, ρ is the optimization function.
After obtaining the estimated value of α(τ), the estimated value of the response variable under τ quantile can be obtained according to Equation (8).

Methodology
To formulate a reasonable and effective maintenance plan, it is necessary to take the RUL as the reference. Bearing as the spare part of mechanical equipment, its RUL prediction is mainly divided into the following four steps: (1) bearing operation state data acquisition; (2) Representative feature extraction of bearing running state information; (3) Predictive modeling; (4) RUL prediction. Figure 2 is the data-driven based RUL prediction. Based on the original bearings vibration signal data, different features are extracted. Data reduction and key features analysis techniques are used to drive the construction of DI set. that is, the SDAE model is used to reduce the dimension of time-domain and frequencydomain features, the initial representative feature set F SDAE are extracted from the timedomain and frequency-domain features based on the SDAE compression method; the features in F SDAE are de-redundant, and the de-redundant feature set DE MIC FF is obtained based onmaximum information coefficient (MIC) method. The optimal DI set Opt DI which can reveal the degradation state of the bearing is constructed based on the sensitivity standard Std DI proposed in this paper. Then, the elements in DI set Opt DI are input into the TCNQR network to predict the RUL probability density of the bearing. Finally, the prediction results are compared and analyzed. The structure of the bearings RUL prediction model proposed is shown in Figure 3.

Feature Extraction
The time-domain feature represents the change of bearings vibration signal with the passage of time. Although the time-domain features of bearings vibration signal cannot provide enough degradation information to predict the bearings RUL, it describes the attenuation trend of bearings and reveals whether the running state of bearings is stable. For example, according to the amplitude change of vibration signal and other time-domain features, the bearing can be peliminarily judged whether damage occurs, but different time domain features have different ability to characterize the health status of bearing operation, so it is particularly important to select appropriate time-domain features in bearing running condition monitoring. Frequency variation can display the noise information of bearing vibration signal, and the influence of noise signal can be eliminated by processing the frequency domain features of bearings vibration signal. Therefore, the time and frequency domain characteristics of bearings running signal are selected as the initial research object.

Feature Compression
In order to select efficient features, this paper adopts the SDAE to compress the features in the time domain and frequency domain.
The SDAE network is used to realize feature compression. Due to the large number of time-domain and frequency-domain features, it is difficult to combine different time-domain and frequency-domain features to predict the bearing RUL. However, the SDAE network can realize feature compression, ensure the similarity between the decoding results and the input data, and restore the original features to the maximum extent. In the training stage of SDAE network, the input and output are both time domain and frequency domain feature vectors extracted from the original training set. We put the features compressed by SDAE into feature subset F SDAE as the basis for the construction of the optimal DI set Opt DI .

Feature Fusion
Through the SDAE compression features, to a certain extent, can improve the efficiency of features, and the characteristics of advantage can be found, but because of the existence of redundancy, the optimal feature attributes together is not necessarily the optimal and sensitive feature subset. The sensitive features that can represent the running state of the bearing should be monotonically related to the degradation process of the bearing and have low redundancy with other features, since irrelevant and redundant features will affect the prediction efficiency and accuracy, eliminating those features that cannot provide sufficient fault information has become the main task in the feature selection stage.
For the measurement of correlation, most studies use the method of Person linear correlation coefficient [31,32] to measure the correlation, so as to select the features with higher contribution rate. However, the nonlinear relationship between variables is often ignored, resulting in the unreliability of measurement results, because the complex relationship between general variables cannot be modeled by a single function. The MIC [33] solves the above problems very well. The MIC is suitable for measuring the linear and nonlinear relationship between variables in the measurement data, and mining the non functional dependency relationship between variables.
Based on these goals, MIC is used to measure the relationship between features and degradation trend and the correlation between features.
The MIC is calculated mainly by mutual information (MI) and meshing method. MI is used to measure the degree of correlation between variables. Given variable A = {a i , i = 1, 2, . . . n} and variable B = {b i , i = 1, 2, . . . n}, where n is the number of samples, the mutual information (MI) is defined as follows: where P(a, b) is the joint probability density of A and B, and P(a) and P(b) are the boundar probability densities of A and B, respectively.
. . n} is a set of finite ordered pairs. It defines a division G, which is used to divide the value range of variable A into x segments and divide the value range of variable B into y segments. G is a grid with a size of x × y. Calculate MI(A, B) within each grid partition obtained, since the same grid can be divided several ways. The maximum value of MI(A, B) under different division methods is chosen as the MI value of a division G.
The maximum mutual information formula of D under a division is defined as MI * (D, x, y) = max MI(D|G ), where D|G denotes data D are divided by G. The maximum information coefficient (MIC) uses MI to indicate the quality of the grid; a feature matrix is formed by maximum normalized MI values under different divisions. The feature matrix is defined as M(D) x,y and the formula is : MIC is defined as: where n is the sample size of the sample and B(n) is a function of sample size and represents the upper limit of the grid x × y. Generally, ω (1) ≤ B(n) ≤ o n 1−ε , 0 < ε < 1. In this paper, the MIC is used to measure the correlation between features and degradation trend as well as features and features. In essence, it is a normalized maximum MI with the value interval of [0, 1]. If Std DI > 0.5, the feature will be selected into the optimal DI set Opt DI .

Optimal DI Set Construction
A subset of F SDAE is composed of low-dimensional feature expression based on SDAE feature compression method. We will further de-redundant the features in the subset F SDAE .
Based on the MIC feature selection method [34], a de-redundancy feature subset DE MIC FF can be obtained by de-redundancy operation.
After the de-redundant feature subset DE MIC FF is obtained, we will calculate the correlation and monotonicity of the features in the DE MIC FF subset, and measure the sensitivity of the DI based on the proposed Std DI standard, which ultimately constitutes the optimal DI set Opt DI . The steps for obtaining the set Opt DI are shown in Algorithm 1.  3: for Every value in every row of the MIC − Mon matrix do 4: Calculate the Std DI value based on Std DI = MIC FR +Mon F−DT 2 , 5: if Std DI > 0.5 then, 6: Select the features in subset DE MIC FF to form the optimal DI set Opt DI .

Quantile Regression of Temporal Convolution Network (TCNQR)
Based on the results of literature [27], TCN has faster prediction speed and higher prediction accuracy. In this paper, quantile regression (QR) was combined with TCN, and the RUL probability density prediction of bearings was realized based on the optimal DI construction method.
The optimization problem of quantile regression (QR) will be introduced into the TCN to realize the bearing RUL probability density prediction. By optimizing the following objective function, the parameters of the TCNQR model are estimated.
X i is the input data constructed by the above DI construction method. The RUL prediction can be expressed as Y p i = g(X i ,Ŵ,b),Ŵ andb are the optimal estimate of the weight parameter and the bias parameter.
The essence of TCN is the optimization of CNN for time series problems. By adjusting parameters such as convolution kernel, number of convolution layers and expansion coefficient, the global perception of sequence data with specified length is realized, it can be expressed as: where, X is the input time series; * is the convolution operation; o is the size of the convolution kernel; d is the dilation factor; D is the expansion coefficient. Further, in order to simplify the calculation of parameters and accelerate the convergence speed of the algorithm, TCN rewrited the weight parameter W as a joint representation of modulus m and direction V through weight normalization [27], as shown in Equation (16).
To avoid too much information loss in the process of feature extraction, TCN replaces the convolutional layer with the residual block, and the input data passes through 1 × 1 convolution processing, reaches the specified dimension, it is added with the feature data extracted by the dilated causal convolution [27] to serve as the final output of this layer. As shown in the Figure 4, the two dilated causal convolution layers and their corresponding weight normalization, spatial dropout [35] (which added after each dilated convolution for regularization), activation function and residual connection link, and finally formed a new residual block, which is the basic unit of deep TCN. In this paper, combining the QR theory and TCN algorithm, Quantile Regression of temporal convolution network (TCNQR) algorithm is proposed for bearings RUL prediction problem. The QR loss function is as follows: The value range of τ is 0 to 1. Through constant optimization and adjustment of W and b, equation (17) can be minimized. In TCNQR, equation (14) is rewritten as: whereQ Y (τ | X) is the optimal estimate of Y when input data is X in the case of determining quantile τ.

Kernel Density Estimation (KDE)
Kernel density estimation (KDE) [36] is a method to estimate the probability density function of unknown variables from the data itself [37]. Taking the above conditional quantile as the input value of KDE and selecting the appropriate window width for probability density prediction, the following quantile function can be obtained. Z i =Q Y (τ | X), i = (1, . . . , n), n is the number of quantiles.
Assuming that Z 1 , Z 2 , . . . , Z n is an independent quantile function derived from the estimated probability distribution, then its kernel density is estimated as: f h (Z) = 1 nh ∑ n i=1 K( Z−Z i h ), h is the window width, and K(·) is the kernel function.
For the selection of kernel function, there are Gaussian kernel function, matrix kernel function and trigonometric kernel function. In this paper, Gaussian kernel function is selected, and the expression is: , the value range of h is 1.8 − 2.0.

Prediction Accuracy Measures
In this paper, the mean absolute error (MAE) and root mean square error (RMSE) are used as the evaluation indexes of deterministic point prediction results, and the prediction interval coverage (PICP) and Mean prediction interval width (MPIW) are used as the reliability and clarity evaluation indexes of probabilistic prediction results [38].
(1) Mean absolute error (MAE) n is the number of prediction points, Y i is the real RUL of the sample, Y P i is the predicted value.
(3) Reliability Prediction interval coverage probability (PICP) is usually used to evaluate the accuracy of prediction interval. It is composed of upper bound and lower bound of coverage target value.
A larger PICP value means that more target values Y P i fall within the constructed prediction interval. n is the number of prediction points, C i is the coverage of the prediction interval, , otherwise, C i = 0. u i and l i are the upper and lower bounds of the target value Y P i respectively, and the optimal value of PICP is 100%, which means that all the target values Y P i fall within the prediction interval, that is, the coverage rate is 100%. (4) Clarity Mean prediction interval width (MPIW) shows the average width of prediction interval [39].
where D is the difference between the maximum value and the minimum value of the target values Y P i , and the target with D is used to normalize the average width. The bandwidth of the prediction interval should be as small as possible, which determines the amount of information in the prediction interval. Therefore, the width of prediction interval can be used as a standard to compare the prediction results.

Experiment and Analysis
The bearings run-to-failure data acquired from accelerated degradation tests were used to demonstrate the effectiveness of the proposed method. The proposed method was compared with other methods.

Data Description
The bearings testbed is shown in Figure 5. Detailed information about the platform and experiments can be found in [40].
As tabulated in Table 1, 15 rolling element bearings were tested under three different operating conditions. The first two bearings and the last bearing in every operating condition were regarded as a training set and the others were used as a testing set.
The change trend of the original time domain and frequency domain characteristics with the monitoring time is shown in Figure 6. The amplitude of vibration signal varies with time, which indicates that vibration signal plays an important role in the evaluation of bearings performance degradation.

Data Preprocessing
The original vibration signal data usually contains rich degradation information. In order to effectively characterize the degradation state, it is necessary to process the vibration signal properly, such as feature extraction and transformation.
In the bearing vibration signal data, in order to avoid information loss, we extract multiple features from time domain and frequency domain to form the initial feature set. In addition, to improve the convergence speed and prediction accuracy of the prediction model, all characteristics are normalized and set between [0,1].
When bearing failure occurs in mechanical equipment, both time-domain and frequencydomain signals will change. Vibration signal data has many time-domain and frequencydomain characteristics, and different features have different representational capabilities for bearing health state. Some time-domain features even have almost no representational capabilities. Therefore, we need to select the appropriate time domain and frequency domain characteristics to achieve efficient prediction of bearings RUL.
The initial features are extracted from the original vibration signals, the time-domain and frequency-domain features in both horizontal and vertical directions are calculated, and the total number of features in both directions is 50, the time-domain feature and frequencydomain features were calculated using the feature parameters listed in Table 2 [41]. As the detailed characteristic parameters in Table 2 where x(n) is the time-domain signal series, for n = 1, 2, · · · , N, N is the number of each sample points.
where s(k) is the frequency-domain signal series, for k = 1, 2, · · · , K, K is the number of spectral lines. f k is the frequency value of the k-th spectral line.

Construction of Bearing Optimal Degradation Indicator Set
In this paper, the SDAE feature compression method is used to compress and extract low dimensional features from 24 time-domain features and 26 frequency-domain features in horizontal and vertical directions. In this experiment, different output features are used to test. Before the original time-domain and frequency-domain features are input into the SDAE network, the input data are normalized to ensure the effectiveness of the compression [22]. It can be seen from Figure 7 that the decoding error decreases with the increase of the number of features. However, when the number of features is 35, the average decoding error of bearings from three working conditions tends to be flat. To retain decoding information as much as possible, the first 35 features are selected, that is, the network inputs 50-dimensional time domain and frequency domain vectors, and outputs 35-dimensional compressed time domain and frequency domain features. It is called the SDAE compression feature set F SDAE .  At last, the correlation and monotonicity of elements in feature subset DE MIC FF are calculated, the correlation between features and degradation trend MIC FR , and the monotonicity Mon F−DT between features and degradation trend obtained by monotonicity measurement method Mon F−DT . In order to get the optimal feature set Opt DI , the index Std DI is used to select features, and 0.5 is taken as the threshold, as shown in Figure 9. Features higher than 0.5 will become members of the optimal feature set Opt DI , and features lower than 0.5 will not be retained. Therefore, there are 14 features in the final optimal feature set Opt DI . The MIC FR value, Mon F−DT value, and Std DI value of the features in the final optimal feature set Opt DI are shown in Figure 10.  In the same process, the bearings degradation indicator set in working condition B and working condition C are constructed with the same method and the specific parameters of the features in the constructed optimal degradation indicator set are shown in Figure 11

Train Prediction Model
After obtaining the set Opt DI , the prediction model is trained. In the prediction stage, the feature vectors of the test data set are input into the trained TCNQR network, and the predicted RUL values will be output . The prediction results will be evaluated based on the prediction accuracy measures in Section 3.6.
In the TCNQR network, the number of residual blocks and the size of the convolution kernel will affect the computation amount and computation speed of the neural network, and the appropriate dropout rate can effectively solve the problem of neural network over fitting, the network is built by using Keras deep learning framework, the activation function is the default Relu function. Using Adam optimization algorithm, we design independent adaptive learning rates for different parameters. The number of iterations is 100, the quantile interval is 0.01, then a TCNQR network was constructed. The optimal structure and parameters of the TCNQR model under each quantile are determined by training samples, based on the trained model, the quantiles of the predicted bearing at all quantiles are obtained, which are substituted into the Gaussian kernel density estimation function to estimate the probability density curve of the predicted bearing RUL.

Comparison of Point Prediction Results
To reflect the advantages of constructing bearing DI set by the method proposed in this paper, different DI construction methods were applied to the bearings under three different working conditions. Each bearing in the test set was run 10 times; we obtained the average prediction accuracy of 3 bearings under the every operating condition. Figure 12 (a) − (c) depict the average prediction accuracy and the number of features in the DI set of operating condition A, operating condition B, and operating condition C, respectively. As shown in Figure 12, the proposed method extracts fewer features than the other feature selection methods, and has relatively high accuracy. This is mainly because the proposed method measures redundancy, correlation and monotonicity on the basis of dimension reduction, which ensures the sensitivity robustness of the degradation index set based on dimension reduction features.
The median of TCNQR prediction result can be regarded as estimating the RUL of bearing as a certain value or at a certain point, in order to further quantify the effectiveness of the prediction results, we take MAE and RMSE as the evaluation indexes of the prediction results, the results were compared with the predicted results of TCNQR. We predicted the bearings under different operating conditions in the test set 10 times, and calculated the MAE and RMSE of the three bearings under each operating condition, compared the RUL prediction results based on the PCA construction method, SDAE construction method, de-redundant construction method and the DI construction method proposed in Table 3.
According to the comparison results in Table 3, it is found that the proposed method has obvious advantages in the error of RUL deterministic prediction results under these three different working conditions, on the one hand, the feature selection method we proposed in the same prediction model has a lower error, which proves the effectiveness of the DI construction method proposed in this paper. On the other hand, under the same DI construction method, TCNQR prediction model has better prediction performance. The results show that the proposed method achieves the goal of optimizing the prediction model from the perspective of data analysis.
Taking the bearing_3 of different conditions as case, taking the median of density function as the result of deterministic point prediction, and the influence of DI construction on the prediction results is considered. In order to increase the readability of the comparison graph, three sets of degradation indicators with fewer features were selected to compare the predicted results with the real RUL. The prediction results of TCNQR prediction model are shown in the Figure 13. Figure 13a-c are the deterministic prediction results of three different bearing _3 under working condition A, condition B, and condition C, respectively. It can be seen that the prediction results of the method proposed in this paper is closest to the real RUL value, this is more clearly shows the effectiveness of the DI method proposed in this paper.

Comparison of KDE Prediction Results
To verify that the method proposed in this paper can more effectively predict the random uncertainty characteristics of the RUL, based on the above contents, the probability density prediction results of bearing A_4 are obtained by TCNQR. Taking three operation moments (75 min, 90 min and 105 min) of bearing A_4 as an example, the probability density curve of bearing RUL is obtained. The real RUL values corresponding to the above there operation moments are 47 min, 32 min and 17 min. The prediction results are shown in the Figure 14. Figure 14d-f are the KDE prediction results of 75 min, 90 min and 105 min, respectively.
As can be seen from the above figure, the density of the predicted value of the proposed method is concentrated in a range smaller than the real value (at three different moments), which meets the demand for early warning in practical engineering applications. Compared with PCA method, the prediction curve of De-redundant method is more concentrated and closer to the real RUL value. Of course, the peak value of the probability density curve of the RUL prediction result of the proposed method is the closest to the real RUL value, indicating that the probability density forecasting method of bearing RUL based on the TCNQR model and KDE is better than other methods.
In order to verify the effectiveness of the proposed method, the RUL interval prediction of three bearings under different working conditions was carried out, the PICP value and MPIW value of each model at the confidence level of 90% were recorded, and the results were compared in Table 4.   Table 4 shows the prediction results at different time points, among which the RUL prediction results at three different moments are shown. Sometimes the PICP value of PCA method and F SDAE method is less than 90%, it is less than the given confidence degree, which indicates that the method can not guarantee the accuracy of the prediction results. In terms of accuracy, the PICP values of de-redundant method and DI method proposed in this paper are all greater than 90% at most times, especially the DI method proposed in this paper has higher prediction accuracy. From the perspective of MPIW value analysis, the MPIW value of PCA method and de-redundant method are both larger than the DI construction method proposed in this paper. It can be seen that the method proposed in this paper can obtain a relatively narrow prediction range on the premise of meeting the accuracy. Through the horizontal comparison between LSTMQR and TCNQR prediction model, it can be seen that the PICP value of TCNQR is larger than LSTMQR, and the MPIW value of TCNQR is significantly smaller, which further shows that TCNQR has better prediction performance in probability density prediction of time series problems.

Conclusions
The performance of rolling bearings DI set will affect the prediction accuracy of the bearing RUL to a large extent. In this paper, the SDAE feature compression method is introduced to compress complex multi-dimensional degradation information. The optimal feature set Opt DI is obtained based on the redefined bearing DI standard Std DI . The influence of DI on bearing RUL prediction is further quantified. Combining TCN with QR, the TCNQR method based on kernel density estimation is used to realize the bearing RUL prediction. Through experiment validation, the proposed method was compared with different DI construction methods and prediction models. The results showed that the method could not only improve the prediction accuracy, but also obtain the probability density function of the RUL of bearings operation degradation at any time. Through the method proposed in this paper, a higher prediction accuracy and a narrower prediction range can be obtained, and an optimal decision can be made for employees when making maintenance plans, thus reducing resource waste and production downtime and other consequences caused by bearing failure due to over frequency maintenance.