An Automatic Partition Time-Varying Markov Model for Reliability Evaluation

As the service time of mechanical devices is getting longer and longer, the safe and reliability evaluation during operation is highlighted. Moreover, real-time reliability evaluation with consideration of multi-state performance degradation becomes increasingly important nowadays, since the consequences of sudden failures are more unacceptable than ever before. The Markov process is a commonly used model in multi-state reliability evaluation. However, little research of the Markov model can deal with multi-source monitoring data and time-varying properties of device performance degradation, as well as the scientific state number determination. In this article, a real-time reliability evaluation model based on automatic partition and the time-varying Markov chain is proposed to solve the problems of the scientific state number selection and time-varying properties description with the state transition matrix of the Markov process, together with taking advantage of multi-source information. The effectiveness of the proposed algorithm is demonstrated on the bearing with life-long vibration and temperature data. It shows that the proposed automatic partition time-varying Markov model can decide the state number automatically according to the trend of life-long data, and evaluate real-time reliability based on equipment operating hours and operating status. The result of predicted remaining useful life obtained by the proposed model is more accurate, and it also shows great superiority in conformity with reality.


Introduction
With the continuous improvement of design and manufacturing technology, as well as the materials used, service time of mechanical devices is getting longer and longer, especially in the field of rail transit. For instance, the design service life of bearing on a bogie axle box is more than 20 years. These products usually have only a small number of failures or no failures in life tests, including accelerated life test methods. However, the traditional reliability evaluation methods such as fault tree analysis (FTA) [1], event tree analysis (ETA) [2], and failure mode effects and criticality analysis (FMECA) [3], etc. mainly focus on the "average" characteristics of the same kind of product with fault record data. There is an insurmountable gap of these methods in describing the performance degradation of specific components. Moreover, devices with the same type have different degradation trajectories under different operating environments [4]; even devices of the same type with the same operating environments may also have different degradation trajectories, caused by randomness in the manufacturing process or other unexpected factors. Since real-time reliability evaluation becomes increasingly important nowadays, performance monitoring data is generally used for reliability evaluation of the specific product, instead of traditional failure records. Thanks to the rapid development of sensor technology, more and more data for the service performance evaluation and reliability analysis hasve been provided, which greatly improves the possibility of accurate reliability evaluation and prediction.
The concept of reliability analysis based on performance degradation data was first proposed by Gertsbackh and Kordonskiy in 1969 [5]. Since then, many experts and scholars have started exploratory research in this area. At present, it can be mainly divided into two kinds of performance degradation reliability analysis methods based on regression model and stochastic process. A simple regression model is used in the escalators Avante model based on time series analysis [6]. The effect of variable amplitude service loading conditions and wear phenomenon are investigated based on nonlinear regression reliability analysis for rail steel [7]. Reference [8] combined the fault data with the state detection data of performance degradation, and used the DTS (Degradation Threshold Shock) method to analyze the reliability with competitive failure phenomenon. The commonly used stochastic processes are mainly the Wiener processes [9,10], Gamma process [11], Poisson process [12], Markov process [13], etc. Some reliability analysis methods generally assume that the device only has two states (normal and failure) [14][15][16]. The potential changes during device degradation had been ignored, which made the reliability results too crude to guide practical application. Compared with two-state (normal, fault) reliability analysis models, the multi-state model is more suitable for actual equipment operation. Reference [17] provided a reliability evaluation framework based on multi-performance conversion, which is conducted on the conversion capabilities of the components. A multistate physics model for dynamic reliability evaluation with the crack propagation state transitions and the Forman model is proposed for a multi-cracked structure [18]. As a powerful tool for multi-state performance degradation modeling, the Markov process of different types has obtained many achievements, such as discrete time Markov process (DTMP) [19,20], continuous time Markov process (CTMP) [21,22], discrete-time hidden aging Markov process (DTHAMP) [23,24], nonhomogeneous continuous-time hidden Markov process (NHCTHMP) [25,26], homogeneous discrete-time hidden semi-Markov process (HDTHSMP) [27][28][29], etc. However, the number of device states was set directly, which may not objectively reflect the overall trend of the device's performance and may lead to a decrease in model accuracy. Very few pieces of literature have studied the actual number of the device's states. Reference [30] achieved the optimal number of states by multiple trials under different states number with BIC and AIC as evaluation indicators. However, the time-consuming problem stands out with the increase of available data. Besides that, the state transition matrix of most Markov models is fixed, which means no matter how the device performance degraded, the device reliability does not change when it is detected in a certain state of such a Markov model. In fact, the longer the system or device stays in a certain state, the greater the probability of transitioning out of the state, so these mentioned Markov models cannot reflect the performance degradation within the same state interval over time, which is not in line with reality [31]. That is a great obstacle in dealing with time-varying characteristics of device performance. Moreover, most degradation reliability evaluation methods, especially for mechanical devices, are based on single monitoring data, such as vibration data. Multi-source monitoring is the trend of device performance condition monitoring, and more information is provided for accurate detection and prediction [32]. How to make full use of multi-source data is an urgent problem to be solved.
Hence, in this article, a real-time reliability evaluation model based on the automatic partition method and time-varying Markov chain is proposed, in which the automatic partition is utilized to select the state number of Markov processes scientifically, while the time-varying properties of device performance are described by the time-varying transition matrix of the Markov process. The rest of this article is structured as follows. The basic concept of Tsallis entropy working as data features is presented in Section 2, first. Then the framework of the proposed real-time reliability evaluation algorithm based on automatic partition time-varying Markov process is illustrated. An experiment is carried out in Section 3 to demonstrate the superiority of the proposed algorithm. In particular, it is conducted on bearing with a condition monitoring vibration signal in two directions and a temperature signal. The article is summarized in Section 4.

Tsallis Entropy
Tsallis entropy is non-extensive entropy; it is an expression of entropy with exponent q proposed by Brazilian physicist Constantino Tsallis in 1988 [33]. Its mechanism is based on the non-extensive statistics derived from the Boltzmann-Gibbs theory. The Tsallis entropy performs very well in the description of the system with long-range interactions, long-term memory effects, or systems evolving in a multifractal-like space-time. Due to the above characteristics, the Tsallis entropy has a wide range of applications in many fields, such as physics, medicine, and economics. The Tsallis entropy is concave, stable, and is limited in the rate of entropy generation per unit time [34]. Although Shannon entropy, Renyi entropy, etc., have very rich applications in the field of bearing analysis, these entropies are not good enough to describe the bearing performance degradation with vibration signal. Therefore, the Tsallis entropy is extracted as the data featured from the vibration signal in this article. The definition is as below: where q describes the non-extension degree of the target object. In this research, it represents the long-range correlation of time series data. n is the total data number in one sample. p i is the i-th value of the sample. The purpose of Tsallis entropy is to find a clear q value (generally not equal to 1) to describe as much as possible the phenomenon that is neither regular nor completely chaotic or random. After many experiments, we found that q = 2 can better describe the changing characteristics of bearing.

The Automatic Partition Time-Varying Markov Model for Reliability Evaluation
The reliability analysis method based on the automatic partition time-varying Markov process mainly includes five steps: degradation feature extraction, state number decision, training data preparation, time-varying Markov process modeling, and reliability assessment. The automatic partition method is used to automatically divide the component degradation sample data into multiple stages, to avoid the artificial and time-consuming selection of the number of Markov process states. In the data processing procedure, in order to deal with the different degradation processes of bearings, the slope-based dynamic time warping method is used to effectively regularize the collected samples, so as to get more accurate bearing Markov model. On the basic Markov process, a time-varying transition probability matrix is proposed to describe the changing trend of the device's own characteristics with time, which provides a necessary means for more accurate assessment of device state reliability and life prediction. The functions for reliability and remaining useful life are provided at the last step.
The basic steps are provided as follows and shown in Figure 1.
(1) Extract feature on the full-life-running vibration data.
(2) The automatic partition method is used to decide the state number of Markov model, corresponding to the degradation state of device.
(3) Prepare training data. The slope-based dynamic time warping method is used for sample time warping to complete the sample data processing in this step.
(4) Establish the time-varying Markov model. The model parameters are re-estimated using the conditional probability formula for calculating the initial forward probability, backward probability, and state sequence. (1) Extract feature on the full-life-running vibration data.
(2) The automatic partition method is used to decide the state number of Markov model, corresponding to the degradation state of device.
(3) Prepare training data. The slope-based dynamic time warping method is used for sample time warping to complete the sample data processing in this step.
(4) Establish the time-varying Markov model. The model parameters are re-estimated using the conditional probability formula for calculating the initial forward probability, backward probability, and state sequence.
The objective function is generally represented as the distances between the monitoring data in each segment and their corresponding fitting function.

The Automatic Partition Method for State Number Determination of the Markov Model
The automatic partition method based on fuzzy time series segmentation is used to automatically obtain the state number of the Markov model.
It is assumed that x l is an M dimensional multivariate sample, and S is the sample se- According to the trend of data change, that is based on the equipment performance degradation, sample sequence S can be partition into several segments.
In this article, sample sequence S is assumed to be divided into P segments, noted as Sub P S = Sub p S |p = 1, 2, . . . , P , and the boundaries are s 1 < s 2 < · · · s P , where s 1 , s 2 , . . . , s P ∈ S and s 1 = d 1 , s P = d L . That is, Sub p S = (s p−1 + 1, s p ), p = 1, 2, . . . , P. Define the sample sequence segmentation objective function as C(Sub P S ).
The objective function is generally represented as the distances between the monitoring data in each segment and their corresponding fitting function.
where v x p is the cluster center of the p-th segment, D 2 (x l , v x p ) is the distance between x l and v x p , β p (t l ) is the membership function of x l subordinate to the p-th segment.
The objective function mentioned above is minimized to obtain the segment boundaries with optimal segment positions s p−1 + 1, s p , p = 1, 2, . . . , P. It is generally conducted by various heuristic algorithms.
β p (t l ) is defined as a (0, 1) binary function in references [35,36], which is not appropriate in practice. In this article, the multivariate mixture Gaussian function is introduced into a cluster prototype as the sequence fitting function to describe the fuzzy characteristic of boundaries between segments. The optimal segment positions are obtained by minimizing the weighted sum of squared distances between the monitoring data in each segment and the center of the cluster prototype.
is the weighted index of fuzzy clustering, generally taking m = 2 [37]; D 2 d l , η p represents the distance between the monitoring data in each segment and the center of the cluster prototype; η p is the clustering prototype function of p-th segment. It is assumed that the expectation and the covariance matrix of the used multivariate Gaussian distribution for sample sequence are v p and F p , respectively. The probability density function f (d l |η ) denoting the adscription of monitoring data to the p-th segment is According to probability theory, the distance function and the membership in the p-th segment are inversely proportional based on the Gath-Geva clustering algorithm [38]. The time and feature variables of sample sequence are independent with each other. Then, where α k is the initial probability of clustering, ) is the distance between the time variable and the clustering center of time variable, ) is the distance between the feature variable in the l-th sample and the feature cluster center, v x p is the feature cluster center of p-th segment, and r is the rank of the feature variable distance norm, A p .
There are many ways to calculate the distance norm A p . Mahalanobis distance is a good choice and can be used to adjust the correlation between variables. Let A p = F p , where F p is the fuzzy covariance matrix of the multivariate Gaussian distribution, Principal Component Analysis (PCA) is introduced to eliminate the strong correlations between variables while reducing dimensionality.
It is assumed that λ p,q , q = 1, 2, . . . , Q are the Q non-zero eigenvalues of F p in descending order. Their corresponding eigenvectors are u p,q , q = 1, 2, . . . , Q. It can be obtained that: The feature variables of the sample sequence are reduced to Q-dimension by the PCA method, and y p, to probabilistic principal component analysis [39], Until now, the automatic partitioning algorithm of fuzzy segmentation has been transformed into an operational research optimization problem. The model is shown below, The alternate optimization algorithm and the Picard iterative algorithm can be used to solve this problem.
The basic steps are as follows: (1) Initialization The initial segments number of S and the dimension of the eigenvector space are given at first. Then, select the appropriate stopping condition, ε, ε > 0, and initialize W p , v x p , σ 2 p,x , µ p,l ; (2) Loop calculation.
(i) Calculate the cluster prototype η p .
Cluster initial probability: Cluster center: where Variance update: Distance norm: Calculation of cluster prototype center and standard deviation parameter of time parameter for sample sequence: (ii) Cluster merging The similarity factor, S a,b PCA , based on PCA proposed by Krzanowski is applied as one of the merging criteria for two adjacent segments, Sub a S , Sub b S , to determine whether they need to be merged or not.
where U a,Q , U b,Q are the first Q eigenvalues of the eigenvectors for segments Sub a S and Sub b S , respectively. The distance between the cluster centers of Sub a S and Sub b S is another merging criterion.
Since the clustering process is carried out in the whole range of S, the clustering similarity of each state space is measured by the fuzzy decision algorithm proposed by U. Kaymak et al. are featured in [40], in which the overall similarity matrix is If the clustering similarity factor, h a,a+1 , of two adjacent segments, Sub a S , Sub a+1 S , is exceeded the set threshold, Sub a S , Sub a+1 S will be merged. It ends when the algorithm termination condition, ε, is reached.

The Time-Varying Markov Model
The automatic segmentation method provides a scientific basis for the states number selection of the Markov process. Moreover, we argue that the transition probability between states is not only related to the state that the device stays, but also related to the sojourn time that the equipment stays in the present state. In practical situations, the longer a device stays in a state, the greater the probability of its transition from that state to another states. That is to say, the state sojourn time affects the state transition probability to a certain extent. Therefore, the state transition matrix (Sojourn time-based state transition probabilities) related to the state sojourn time is introduced to establish a time-varying Markov process for real-time reliability evaluation. The proposed Markov process is noted as M = (π, A, ST), where, π is the initial state probability distribution π = {π i }, 1 ≤ i ≤ N and π i = P[s 1 = i], 1 ≤ i ≤ N; N is the state number, which is determined by the automatic partition method mentioned above; A is the time-varying state transition matrix, and a ij (st i t ) represents the state transition probability of the device transitioning from state i to state j when the device stays in state i ST i is the maximum sojourn time in state i, and ST is the state sojourn time distribution, which is assumed to obey a Gaussian distribution.
Firstly, according to the sample set, the sojourn time distribution of each state is determined, and then the model parameters are estimated by the improved Baum-Welch algorithm.
(1) When the model parameters M = (π, A, ST) are known, the probability that the device stays in state i for a sojourn time, T * , at time t is α t (i, T t i * ) = P(q t = i, st i t = T t i * |M ), When t = 1, the initial forward probability is When t = 2, 3, . . . , T, the forward probability recursion formula is (2) At time t, when the equipment stays in state i for a sojourn time st i t , the backward probability of producing the state sequence in time T − t is β t (i, st i t ) = P(s t = i, st i t |M ). When t = T, the initial backward probability formula is When 0 < t < T, the backward probability recursion formula is Based on the forward probability formula and the backward probability formula, the conditional probability formula S = (s 1 , s 2 , . . . , s T ), s 1 , s 2 , . . . , s T ∈ {i, 1 ≤ i ≤ N} of the state vector can be obtained as (3) The re-estimated parameters of the time-varying Markov model can be obtained by the formula is the probability when the device stays in state i for a sojourn time of st at time of t, It can be seen from a ij (st) = p(s t+1 = j|s t = i, st i t = st) that the revaluation formula of a ij (st) is Assume that the state sojourn time probability obeys a Gaussian distribution (µ i , σ 2 i ),

(4) Reliability and remaining useful life prediction
The device in the completely failed segment means that it cannot complete the specified task as required. Therefore, it can be considered that the probability ζ t (i, N, st) of the device transferring to the completely failed segment, N, at time, t, with a sojourn time, st, is its unreliability. The sum probabilities for device from the current state transferring to every other states is its reliability.
The reliability of a device at different times can be defined as The failure rate function is noted as λ(t), and its life distribution function is as F(t); density function is f (t), f (t) = F (t), the reliability function is R(t), F(t) + R(t) = 1, and λ(t) = f (t)/R(t). For ∀t, there is R(t) > 0. Therefore, it can be considered that λ(t) is an estimate of the interval fault conditional probability.
Let L denote the life cycle of the device and RUL(t) denote the remaining life expectancy of the equipment running normally at time, t, from time, t, to the failure time; At time t * , if the device stays in state i with a sojourn time of st, its remaining useful life is equal to the sum of the effective remaining sojourn time of running state i and the remaining time in other remaining states (assuming that it traverses i + 1, i + 2, . . . , N − 1 states).
Among them, the effective remaining sojourn time in operating state i is where rul i t * +st is the effective remaining sojourn time when the device stays in state i with a sojourn time of st at time t * ; rul i is the expected sojourn time in state i, rul i = µ i . Therefore, the remaining useful life for device at time t * , when it stays in state i with a sojourn time of st can be obtained,

Results
Rolling bearing is one of the most widely used general mechanical components in rail vehicles [35]. Only 10~20% of bearings can reach their design life time, according to statistics. They usually cannot work properly due to fatigue, wear, strain, electrical corrosion, fracture, and gluing, etc. Therefore, the bearing is taken as an example to verify the science of the proposed method in this article.

Experiment Description
The full life data of bearings collected on the PRONOSTIA experimental platform (Figure 2), specially designed and implemented by the AS2M department of the French FEMTO-ST Institute, are used. The PRONOSTIA experimental platform consists of three main parts: the rotating part, the loading part (where radial force is applied on the test bearing), and the measuring part.

Experiment Description
The full life data of bearings collected on the PRONOSTIA experimental platform (Figure 2), specially designed and implemented by the AS2M department of the French FEMTO-ST Institute, are used. The PRONOSTIA experimental platform consists of three main parts: the rotating part, the loading part (where radial force is applied on the test bearing), and the measuring part. The operating conditions of the bearing are measured by the radial force exerted on the bearing, the rotational speed of the bearing, and the torque on the bearing, with a sampling frequency of 100 Hz. The amount of degradation of the bearing is measured by vibration and temperature sensors, and the vibration signal is measured by two placements at a 90-degree angle on the horizontal and vertical axes. The sampling frequency is 25.6 kHz, sampling every 10 s, and each sampling duration of 0.1 s, including 2560 sample points. The temperature sensor is located in the hole near the outer ring of the bearing, with a sampling frequency of 10 Hz and 600 points per minute.
The test dataset used for training in this article is shown in Table 1.  The operating conditions of the bearing are measured by the radial force exerted on the bearing, the rotational speed of the bearing, and the torque on the bearing, with a sampling frequency of 100 Hz. The amount of degradation of the bearing is measured by vibration and temperature sensors, and the vibration signal is measured by two placements at a 90-degree angle on the horizontal and vertical axes. The sampling frequency is 25.6 kHz, sampling every 10 s, and each sampling duration of 0.1 s, including 2560 sample points. The temperature sensor is located in the hole near the outer ring of the bearing, with a sampling frequency of 10 Hz and 600 points per minute.
The test dataset used for training in this article is shown in Table 1.

Feature Extraction
Calculate the entropy of the bearing vibration acceleration data and align the temperature data with the vibration acceleration data. The Tsallis entropy with q = 2, kurtosis, Shannon entropy, and Renyi entropy were extracted from the same full-life of the bearing vibration signal.
As shown in Figure 3, it can be seen that Shannon entropy and Renyi entropy are not monotonic, and they also fluctuate greatly during the full-life signal period. The trend of the kurtosis value is not obvious enough, and there are some values larger than the failure value, where the sample number ranges from 450 to 900. Those cannot accurately express the performance of the bearing degradation. The Tsallis entropy exhibits monotonic characteristics during the whole life cycle, but also begins to degrade when the bearing performance is not very well (sample number = 1500 in Figure 3) and the numerical stability of the Tsallis entropy is good with bearing degradation. monotonic, and they also fluctuate greatly during the full-life signal period. The tren the kurtosis value is not obvious enough, and there are some values larger than failure value, where the sample number ranges from 450 to 900. Those cannot accura express the performance of the bearing degradation. The Tsallis entropy exhi monotonic characteristics during the whole life cycle, but also begins to degrade w the bearing performance is not very well (sample number = 1500 in Figure 3) and numerical stability of the Tsallis entropy is good with bearing degradation. The bearing temperature signal is resampled to make its length the same as vibration signal; the maximum value is extracted, and the three groups of signals normalized respectively to transform it into a dimensionless expression. Tak Bearing1_1 as an example, its characteristic expression is shown in Figure 4. The bearing temperature signal is resampled to make its length the same as the vibration signal; the maximum value is extracted, and the three groups of signals are normalized respectively to transform it into a dimensionless expression. Taking Bearing1_1 as an example, its characteristic expression is shown in Figure 4.

The Automatic State Number Determination of the Markov Model
The automatic partition method is performed on the feature data of condit 1-bearing 1 in the dataset. The input feature is [Tsallis 1, Tsallis 2,Temperature], and result is shown in Figure 5.

The Automatic State Number Determination of the Markov Model
The automatic partition method is performed on the feature data of condition 1-bearing 1 in the dataset. The input feature is [Tsallis 1, Tsallis 2, Temperature], and the result is shown in Figure 5.

The Automatic State Number Determination of the Markov Model
The automatic partition method is performed on the feature data of condition 1-bearing 1 in the dataset. The input feature is [Tsallis 1, Tsallis 2, Temperature], and the result is shown in Figure 5. . That is, the degradation process of the bearing can be divided into four stages, which can be considered as the running-in stage, the normal operation stage, the degradation failure stage, and the complete failure stage. It is consistent with the actual use, which fully verifies the validity and scientific rationality of the multivariate automatic partition algorithm. . That is, the degradation process of the bearing can be divided into four stages, which can be considered as the running-in stage, the normal operation stage, the degradation failure stage, and the complete failure stage. It is consistent with the actual use, which fully verifies the validity and scientific rationality of the multivariate automatic partition algorithm.
The complete failure stage is regarded as the complete failure state; the running-in stage, the normal operation stage, and the degenerate failure stage are denoted as the state 1, the state 2, and the state 3 in this article.
Different bearings have different degradation trajectories. Samples are aligned based on the slope-based dynamic time warping (DTW) method to obtain more accurate training data, according to the change trend of the degradation amount, not the degradation amount. Taking the alignment process of Bearing1_1 and Bearing1_2 as an example, the optimal path for slope alignment, the slope alignment result and sample alignment results are shown in Figures 6-8 The complete failure stage is regarded as the complete failure state; the running-in stage, the normal operation stage, and the degenerate failure stage are denoted as the state 1, the state 2, and the state 3 in this article.
Different bearings have different degradation trajectories. Samples are aligned based on the slope-based dynamic time warping (DTW) method to obtain more accurate training data, according to the change trend of the degradation amount, not the degradation amount. Taking the alignment process of Bearing1_1 and Bearing1_2 as an example, the optimal path for slope alignment, the slope alignment result and sample alignment results are shown in Figures 6-8, respectively.     However, not every bearing will go through these four degradation stages. In t tested samples, as shown in Figure 9, the Bearing2_4 fails directly without obvio degradation stage. The break-in stage of Bearing2_4 is [1,402], the degradation failu stage is [403,740], and the complete failure stage is [741,751]. It can be concluded that t slope-based dynamic time warping method performed very well for samples that do n fully conform to the degenerate process. It avoids the negative impact caused confusion of different degradation stages during Markov model training. T segmentation result of dataset is listed in Table 2. However, not every bearing will go through these four degradation stages. In the tested samples, as shown in Figure 9, the Bearing2_4 fails directly without obvious degradation stage. The break-in stage of Bearing2_4 is [1,402], the degradation failure stage is [403,740], and the complete failure stage is [741,751]. It can be concluded that the slope-based dynamic time warping method performed very well for samples that do not fully conform to the degenerate process. It avoids the negative impact caused by confusion of different degradation stages during Markov model training. The segmentation result of dataset is listed in Table 2. tested samples, as shown in Figure 9, the Bearing2_4 fails directly without obvi degradation stage. The break-in stage of Bearing2_4 is [1,402], the degradation fai stage is [403,740], and the complete failure stage is [741,751]. It can be concluded that slope-based dynamic time warping method performed very well for samples that do fully conform to the degenerate process. It avoids the negative impact caused confusion of different degradation stages during Markov model training. segmentation result of dataset is listed in Table 2.

Reliability Evaluation
In order to verify the validity and accuracy of the state division of mechanical devices of rail transit trains proposed in this article, Bearing1_1, Bearing1_2, Bearing2_1, Bearing2_4, and Bearing3_1 datasets are used for training, and other bearing data are used for testing.
In the time-varying Markov process model, it is assumed that the state transition probability obeys the probability density function of mixed Gaussian distribution, and the probability distribution function of state sojourn time obeys the probability density function of a single Gaussian distribution. Based on the automatic partition method, the state number of the time-varying Markov model is 4, that is N = 4.
The maximum number of iteration steps is set to be 1000, and the stop criteria of the error convergence is 1 × 10 −5 . The initial matrix of mutual conversion between each state is shown in Table 3. The average sojourn time in each state can be obtained based on training datasets; its result is shown in Table 4. A 4-state, time-varying Markov process reliability and life prediction model is obtained. The trained time-varying Markov model is updated when a new monitoring data sample is arrived. The bearing staying in state 1 for 10 h is taken as an example. After feature extraction of the monitoring data sample, the data is sent into the trained timevarying Markov model, and state transition matrix is updated at this time. At this time, the state transition matrix is shown in Table 5, while the mean and variance of the state sojourn time are shown in Table 6. The bearing reliability at current time is R(i = 1, st 1 = 10) = 1 − 0.0003 = 0.9997.
To demonstrate the adaptability of the proposed model to changes in time, we also show the model changing parameters when the bearing stays in state 2 for 40 h. At this time, the state transition matrix is shown in Table 7, while the mean and variance of the state sojourn time are shown in Table 8.  The bearing reliability at current time is R(i = 2, st 1 = 40) = 1 − 0.1179 = 0.8821. For the bearing used in this experiment, its effective life is the sum of sojourn time in state 1, 2, and 3. When the bearing reaches state 4, it has completely failed. Compared with the hidden Markov model (HMM) algorithm, the prediction result of Bearing1_1 is shown in Figure 10. The lifetime prediction of the HMM presents an obvious step-descent characteristic. Owing to the stable state transition matrix of HMM, the remaining useful life would not change when device performance is detected into a certain state. This characteristic makes the HMM seriously inconsistent with the actual situation. In contrast, the predicted lifetime of the proposed algorithm in this paper has higher accuracy than HMM, which is in line with the gradual decline with time, since the state transition probability matrix has been designed to update over time theoretically. In addition, the proposed algorithm becomes more and more accurate as the bearing performance gradually degrades; that is, the results of predicted life are more accurate in the later stage. It should be emphasized that there is also a slight step-descent as the red line shown in Figure 10. Because the state transition probability matrix is not changing over time in practice, it is only updated when a new monitoring data sample has arrived. The shorter the interval of condition monitoring data, the smoother the prediction curve would be.

Conclusions
The proposed automatic partition time-varying Markov model has two main contributions on real time reliability evaluation, based on the Markov process. Firstly, the state number of Markov processes is selected scientifically on the basis of the device's performance degradation stages. This work is realized automatically, based on the characteristics of the degradation curve, with full use of the multi-source monitoring data. It is done without manual intervention, and the state number is obtained by taking full advantage of multidimensional information, instead of by only a single threshold. This work builds a good foundation for a time-varying Markov model. The other main contribution is the time-varying Markov model. It describes the timely changing degradation process of device performance. The polymorphic real-time reliability assessment and remaining useful life prediction are achieved in this procedure. From the experiment, the proposed algorithm demonstrates a strong capability for data changes and great potential in accurately performing degradation prediction. This model would be more accurate if the monitoring data points are provided more densely.
There are still some aspects left for improvement of this work. A good feature is the In contrast, the predicted lifetime of the proposed algorithm in this paper has higher accuracy than HMM, which is in line with the gradual decline with time, since the state transition probability matrix has been designed to update over time theoretically. In addition, the proposed algorithm becomes more and more accurate as the bearing performance gradually degrades; that is, the results of predicted life are more accurate in the later stage. It should be emphasized that there is also a slight step-descent as the red line shown in Figure 10. Because the state transition probability matrix is not changing over time in practice, it is only updated when a new monitoring data sample has arrived. The shorter the interval of condition monitoring data, the smoother the prediction curve would be.

Conclusions
The proposed automatic partition time-varying Markov model has two main contributions on real time reliability evaluation, based on the Markov process. Firstly, the state number of Markov processes is selected scientifically on the basis of the device's performance degradation stages. This work is realized automatically, based on the characteristics of the degradation curve, with full use of the multi-source monitoring data. It is done without manual intervention, and the state number is obtained by taking full advantage of multidimensional information, instead of by only a single threshold. This work builds a good foundation for a time-varying Markov model. The other main contribution is the time-varying Markov model. It describes the timely changing degradation process of device performance. The polymorphic real-time reliability assessment and remaining useful life prediction are achieved in this procedure. From the experiment, the proposed algorithm demonstrates a strong capability for data changes and great potential in accurately performing degradation prediction. This model would be more accurate if the monitoring data points are provided more densely.
There are still some aspects left for improvement of this work. A good feature is the basis of the whole model. In this work, the feature was manually designed on the vibration signal. Its quality depends entirely on human experience, and this step is also time-consuming. In addition, the assessments procedure could be simplified in the future. Some migration algorithms may be considerable approaches to this problem.
Author Contributions: L.K. contributed to the conceptualization, methodology, formal analysis, and writing. B.C. designed and performed the research and approved the final manuscript. Y.C. analyzed the data, interpreted the results, and revised the manuscript. Y.Q. worked on conceptualization, formal analysis, and final writing review. All authors have read and agreed to the published version of the manuscript.