Three-Stage Wiener-Process-Based Model for Remaining Useful Life Prediction of a Cutting Tool in High-Speed Milling

Tool condition monitoring can be employed to ensure safe and full utilization of the cutting tool. Hence, remaining useful life (RUL) prediction of a cutting tool is an important issue for an effective high-speed milling process-monitoring system. However, it is difficult to establish a mechanism model for the life-decreasing process owing to the different wear rates in the various wear stages of a cutting tool. This study proposes a three-stage Wiener-process-based degradation model for cutting tool wear estimation and remaining useful life prediction. Tool wear stage classification and RUL prediction are jointly addressed in this work in order to take full advantage of the Wiener process: the three-stage Wiener process describes the degradation process at each wear stage, based on which the overall useful life can be accurately obtained. The numerical results obtained from extensive experiments indicate that the proposed model can effectively predict the cutting tool's remaining useful life. Empirical comparisons show that the proposed model performs better than existing models in predicting the cutting tool RUL.


Introduction
The cutting tool plays an important role in the machining system, and its wear increases friction and heat generation in the machining process. Wear of cutting tools not only affects the quality of the machined surface and the machining precision but also increases the machining cost. Moreover, unnecessary tool replacement that aims at preventing a decrease in surface quality increases the downtime and machining cost in high-speed milling. Tool condition monitoring can be employed to ensure safe and full utilization of a cutting tool. Hence, the remaining useful life (RUL) prediction of a cutting tool is an important issue for an effective high-speed milling process-monitoring system.
In the past few decades, various studies in both academia and industry have been devoted to predicting the cutting tool RUL. These studies can be broadly divided into physics-based and data-driven models. The Taylor model [1] and the Forman crack growth model [2] are typical physics-based models. Although physics-based methods can accurately describe the degradation process of equipment and require less training data, it is very difficult, if not impossible, to build an exact physical model because of the heavy dependence on expert knowledge and on the degradation law of the equipment.
Recently, intensive research has been conducted on data-driven models, which are regarded as an effective tool for the RUL estimation of various degrading systems. Machine learning and statistical learning applications in the field of data-driven models have been reported to outperform the traditional physics-based methods. For instance, Benkedjouh et al. [3] extracted features from vibration, acoustic emission (AE) and force signals and then utilized support vector regression (SVR) to predict the tool RUL. Sun et al. [4] evaluated the RUL of a cutting tool based on the result of an operational reliability assessment and a back propagation neural network. Zhou et al. [5] used the long short-term memory (LSTM) model to handle complex correlation and memory accumulation effects and then established the RUL prediction model under different conditions. Babu et al. [6] proposed a convolutional neural network (CNN)-based regression approach, which learned the features and estimated the RUL by supervised feedback. Zhang et al. [7] proposed a deep belief network (DBN)-based multi-objective ensemble method for RUL estimation in prognostics, and the experimental results demonstrated the superiority of the proposed method compared to the existing approaches. An et al. [8] developed a hybrid model by combining the CNN with the LSTM network and then predicted the RUL with sequential tool wear data. The proposed model proved to be efficient in tracking tool wear evolution, and the RUL prediction accuracy reached up to 90%.
Though the models in the above cited work proved to be effective in real applications, they ignore the correlation characteristics in the time-series wear data and cannot sufficiently model the tool wear process. In order to address this issue, the hidden Markov model (HMM) and its variants, which are among the most commonly used statistical learning techniques, have been widely employed to describe the dynamic evolution of the wear process owing to their powerful ability to process time-series data. Geramifard et al. [9] proposed a physically segmented HMM to build the relationship between the hidden stages and the real health stages of a cutting tool, and this relationship was further used for diagnostics and prognostics. Following the study of [9], Geramifard et al. [10] proposed a multimodal HMM-based approach for tool wear monitoring. Yu et al. [11] developed a weighted HMM approach, which takes the wear rate as the hidden state, and then predicted the RUL and estimated the tool wear during tool operation. The experimental results show that their approach outperforms the conventional HMM approach. Zhu and Liu [12] utilized the hidden semi-Markov model (HSMM) to model the complex tool wear process and then estimated the tool wear and predicted the RUL of the tool with a forward algorithm. The experimental results show that the proposed method leads to more accurate RUL prediction in high-speed milling.
Although the aforementioned methods obtain acceptable RUL estimation results, it is still difficult for them to describe the nonmonotonic dynamic characteristics of the system. Fortunately, the Wiener process has been shown to be a well-suited degradation modeling tool for explaining the physical behavior of a dynamic system, due to its favorable mathematical properties and physical interpretations [13]. Ghorbani and Salahshoor [14] developed a degradation model for turbofan engine faults by combining a physics-based model and a Wiener process with positive drift. Li et al. [15] considered the various degradation processes of different units and developed a Wiener-process-model-based method for RUL prediction. For a complicated dynamic system, stochastic behavior is inevitable due to multiple sources of variability, which contribute to the uncertainty of the RUL estimation. Therefore, the effect of these uncertainties and different kinds of variability should be incorporated into a wear model to improve the accuracy of the RUL estimation. Tsai et al. [16] analyzed quality characteristics (QC) whose degradation over time can be related to the reliability of the product and established the degradation model by taking this variability into account. Sun et al. [17] modeled the wear process of a cutting tool with the Wiener process considering the measurement variability and then estimated the RUL of the cutting tool.
The above research studies ignore the mechanism of degradation and regard it as a single overall process. However, the degradation process of many systems (e.g., rotating machinery, lithium-ion batteries) exhibits multiple-stage characteristics in practice due to the environmental working condition, internal materials and so on. Lim et al. [18] proposed a two-phase accelerated degradation test to efficiently derive product lifetime information in the development stage for new products, when reliability-related information is generally limited and there is limited sample availability. Zhang et al. [19] proposed a multi-phase stochastic degradation model based on the Wiener process and then used this method for the RUL prediction of lithium-ion batteries. Chen and Tsui [20] predicted the RUL of rotational bearings with a two-phase model, which characterizes and determines different stages of the degradation process. Wen et al. [21] proposed a flexible Bayesian multiple-phase modeling approach to characterize degradation signals for prognosis. Later, Wen et al. [22] extended the previous work and developed a multiple change-point Wiener process model for RUL prediction.
According to the literature review given above, the aim of this study is to develop a three-stage Wiener-process-based degradation model that can predict the tool's remaining useful life in high-speed milling processes. First, the time domain, frequency domain and time-frequency domain features are extracted from raw sensor data (i.e., AE, cutting force, vibration), and then the stacked denoising autoencoder (SDAE) is utilized to automatically select the most relevant features. Second, tool wear stages and corresponding wear value are estimated with the extreme learning machine (ELM). Third, the degradation process of each wear stage is established based on Wiener process and the overall useful life is estimated with the stage RUL prediction. Additionally, multi-source variabilities (i.e., the inherent temporal variability of wear path, individual variability of machining condition or measurement variability of sensors) are considered and quantified in the modeling process of each stage, which helps describe the physical behavior of tool wear process more precisely and significantly improves the accuracy of RUL prediction.

Motivation
In most existing research studies, the degradation mechanism is assumed to be stationary over the entire life of a cutting tool and is characterized with a fixed model. However, the degradation process of a cutting tool exhibits multiple-stage characteristics in practice due to the environmental working condition, internal materials and so on. It can be seen from Figure 1a that a typical degradation process of a cutting tool can be divided into three stages depending on the wear rate, namely the slight wear stage (Figure 1b), the medium wear stage (Figure 1c) and the severe wear stage (Figure 1d).
Motivated by this phenomenon, three models were developed to characterize the different stages. It can be seen from Figure 1b that tool wear increases sharply in the first stage, which can be fitted with a power law. In the second stage depicted in Figure 1c, the wear rate becomes slow and can be described with an integrated power-exponential law. In the third stage in Figure 1d, the wear rate becomes faster and an exponential law can be utilized to describe the degradation process. The basic law of these models can be formulated as follows:
$$X(t) = \begin{cases} a_1 t^{b_1} + k_1, & 0 \le t \le \tau_1 \\ a_2 t^{b_2} + c\,\exp(d t) + k_2, & \tau_1 < t \le \tau_2 \\ a_3 \exp(b_3 t) + k_3, & t > \tau_2 \end{cases} \tag{1}$$
where $t$ and $X(t)$ denote the cutting pass and the corresponding wear value, $a_1, b_1, k_1, a_2, b_2, c, d, k_2, a_3, b_3$ and $k_3$ are the model parameters of the different wear stages, and $\tau_1$ and $\tau_2$ are the boundaries of the three stages.
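The three wear laws described above (power law, integrated power-exponential law and exponential law) can be sketched as a single piecewise function. This is only an illustrative sketch: the parameter values in `PARAMS`, the stage boundaries and the choice of closed or open interval ends are assumptions made for the example and are not taken from the paper.

```python
import math

# Hypothetical parameter values, chosen only to make the example runnable.
PARAMS = {
    "a1": 10.0, "b1": 0.5, "k1": 0.0,                         # stage 1: power law
    "a2": 0.1, "b2": 1.0, "c": 1.0, "d": 0.01, "k2": 60.0,   # stage 2: power-exponential law
    "a3": 2.0, "b3": 0.015, "k3": 40.0,                      # stage 3: exponential law
}

def wear_law(t, p, tau1, tau2):
    """Deterministic three-stage wear value X(t) at cutting pass t.

    tau1 and tau2 are the stage boundaries; which stage owns each
    boundary point is an assumption of this sketch.
    """
    if t <= tau1:
        return p["a1"] * t ** p["b1"] + p["k1"]
    if t <= tau2:
        return p["a2"] * t ** p["b2"] + p["c"] * math.exp(p["d"] * t) + p["k2"]
    return p["a3"] * math.exp(p["b3"] * t) + p["k3"]
```

Note that nothing in this sketch forces the curve to be continuous at the boundaries; in a fitted model the offsets `k1`, `k2` and `k3` would have to absorb the jumps.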

Model Formulation
In practice, the wear process of a cutting tool usually exhibits stochastic behavior with nonlinearity and multiple variability sources. The Wiener process has proved to be effective in characterizing degradation signals due to its favorable mathematical properties. In order to quantify the multi-source variabilities (e.g., inherent temporal variability of the wear path and individual variability of the machining condition) and characterize the degradation process of each stage, a three-stage Wiener-process-based model considering multi-source variabilities is proposed in this subsection.
Let $\{S(t), t \ge 0\}$ denote the degradation state of tool wear over time $t$ with the initial value $S(0) = s_0$. The stochastic process can then be expressed as follows:
$$S(t) = s_0 + \int_0^t g(\tau; \theta)\,d\tau + \sigma_B B(t) \tag{2}$$
where the stochastic process is driven by a standard Brownian motion $B(t)$ with a nonlinear drift term $\int_0^t g(\tau; \theta)\,d\tau$ that characterizes the nonlinearity in the tool wear process, and $\sigma_B$ is the diffusion coefficient, which represents the inherent temporal variability of the wear path. Typically, the drift term, which describes the individual variability of different machining conditions, can be expressed as $g(t; \theta) = a b t^{b-1}$, $g(t; \theta) = a b \exp(bt)$ or $g(t; \theta) = a b t^{b-1} + c d \exp(dt)$. In the proposed model, the nonlinear drift term is set with a different law in each wear stage. Combining the segmented model in Equation (1) and the Wiener process in Equation (2), the proposed three-stage Wiener-process-based model and the constraints on its drift parameters are given by Equations (3) to (5), where $\omega_k \sim N(0, 1)$ and $a, b, c, d$ are the parameters of the drift term.
To define the lifetime and predict the RUL, the concept of first hitting time (FHT) is adopted. According to the concept of FHT, the lifetime $T$ can be defined as:
$$T = \inf\{t : S(t) \ge \omega \mid S(0) < \omega\} \tag{6}$$
where $\omega$ is the failure threshold of the cutting tool. The probability density function (PDF) of the lifetime $T$ defined by Equation (6) is given by Equation (7), in which $I_B(t) = \left(\omega - \int_0^t g(\tau; \theta)\,d\tau\right)/\sigma_B$. The RUL $L_k$ at time $t_k$ can be obtained through time and failure threshold translation, as expressed in Equation (8).
Usually, the degradation state of a cutting tool cannot be measured directly but is inferred from the health indicator (HI) generated from the observable sensor data. The relationship between the degradation state and the HI can be expressed as follows:
$$O(t) = \varphi(S(t); \xi) + \varepsilon(t) \tag{9}$$
where $O(t)$ is the HI and $\varepsilon(t)$ is the random measurement error with $\varepsilon(t) \sim N(0, \gamma^2)$, which describes the effect of the measurement variability of sensing. $\varphi(S(t); \xi)$ denotes the relationship between the HI and the underlying degradation state, which can be formed as:
$$\varphi(S(t); \xi) = \beta_0 + \beta_1 S(t) \tag{10}$$
where $\beta_0$ and $\beta_1$ are the parameters of the linear model. To estimate the underlying degradation state $S_k$, the Kalman filter and the Rauch-Tung-Striebel (RTS) smoother are introduced into the proposed model. Table 1 shows the steps of the Kalman filter algorithm.
Table 1. Computational flow of Kalman filter algorithm.
Step 1: Set the parameters $\theta$, $\sigma_B$, $\xi$, $\gamma$.
Step 2: Estimate the state $\hat{S}_{k|k-1}$ and variance $P_{k|k-1}$.
Step 3: Compute the Kalman gain.
Step 4: Update the state $\hat{S}_{k|k}$ and variance $P_{k|k}$.
After the Kalman filter algorithm, the RTS smoother is used to obtain the optimal estimation of the preceding conditional expectations. The smoothing algorithm is summarized in Table 2.
Table 2. Computational flow of RTS smoothing algorithm.
Step 1: Forward iteration through the Kalman filter to obtain the optimal estimates $\hat{S}_{k|k}$ and $P_{k|k}$.
Step 2: Backward iteration to obtain the optimal smoothing estimates.
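The filtering and smoothing steps summarized in Tables 1 and 2 can be illustrated with a scalar Kalman filter followed by an RTS backward pass. The sketch below assumes a discretized state equation of the form s_k = s_{k-1} + drift_k + σ_B·sqrt(Δt)·ω_k and the linear observation O_k = β0 + β1·s_k + ε_k; the function name `kalman_rts` and its argument layout are hypothetical, and the discretization is an assumption rather than the authors' exact formulation.

```python
def kalman_rts(obs, drift, s0, p0, sigma_b, dt, beta0, beta1, gamma):
    """Scalar Kalman filter plus Rauch-Tung-Striebel smoother.

    obs[k]   : observed health indicator O_k
    drift[k] : assumed deterministic drift increment over step k
    Returns (filtered states, filtered variances, smoothed states, smoothed variances).
    """
    s_pred, p_pred, s_filt, p_filt = [], [], [], []
    s_prev, p_prev = s0, p0
    for k in range(len(obs)):
        # Prediction: propagate state and variance through the state equation.
        sp = s_prev + drift[k]
        pp = p_prev + sigma_b ** 2 * dt
        # Gain for the linear observation O = beta0 + beta1 * s + noise.
        gain = pp * beta1 / (beta1 ** 2 * pp + gamma ** 2)
        # Update with the innovation (observed minus predicted indicator).
        s_prev = sp + gain * (obs[k] - beta0 - beta1 * sp)
        p_prev = (1.0 - gain * beta1) * pp
        s_pred.append(sp)
        p_pred.append(pp)
        s_filt.append(s_prev)
        p_filt.append(p_prev)

    # Backward RTS pass; the state transition coefficient is 1 in this model.
    s_sm, p_sm = s_filt[:], p_filt[:]
    for k in range(len(obs) - 2, -1, -1):
        g = p_filt[k] / p_pred[k + 1]
        s_sm[k] = s_filt[k] + g * (s_sm[k + 1] - s_pred[k + 1])
        p_sm[k] = p_filt[k] + g ** 2 * (p_sm[k + 1] - p_pred[k + 1])
    return s_filt, p_filt, s_sm, p_sm
```

Because the smoother uses all observations, its variances are never larger than the filtered ones, which is why smoothed quantities are the natural inputs for parameter re-estimation.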

RUL Prediction Framework
The RUL prediction framework is presented in this section, and its overall architecture is depicted in Figure 2. As can be seen in Figure 2, the time domain, frequency domain and time-frequency domain features are extracted from raw sensor data (i.e., AE, cutting force, vibration), and then SDAE is utilized to automatically select the most relevant features. Next, the feature vector is fed into the ELM to classify the tool wear stage and estimate the corresponding wear value. After that, the degradation process of each wear stage is established based on the Wiener process, and the estimated wear value is regarded as the observable health indicator. Finally, the overall useful life can be estimated with the stage RUL prediction.

Feature Extraction
Feature extraction is conducted to reduce noise interference and remove irrelevant components from the original signals. Statistical features in the time domain and frequency domain are extracted from the force signals and vibration signals in each direction (X, Y and Z) and from the acoustic emission signal. In total, 16 kinds of time domain features, namely mean, standard deviation, variance, peak value, peak to peak, root mean square (RMS), skewness, kurtosis, kurtosis factor, average absolute value, shape factor, crest factor, impact factor, margin factor, skewness factor and bias factor, are extracted from each of the seven channels. A total of 112 features can be obtained, and these features are listed in Table 3.
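As an illustration of the time domain statistics named above, the following sketch computes a subset of them for one channel. Table 3 is not reproduced here, so the definitions used (population variance, peak as maximum absolute value, and the usual ratio definitions of the shape, crest, impact and margin factors) are common conventions assumed for this example and may differ from the authors' exact formulas.

```python
import math

def time_domain_features(x):
    """Return a dict of common time domain statistics for one signal channel.

    Assumes a non-constant signal (std > 0) with non-zero mean absolute value.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n          # population variance
    std = math.sqrt(var)
    rms = math.sqrt(sum(v * v for v in x) / n)
    abs_mean = sum(abs(v) for v in x) / n
    peak = max(abs(v) for v in x)
    sqrt_amp = (sum(math.sqrt(abs(v)) for v in x) / n) ** 2
    return {
        "mean": mean,
        "std": std,
        "variance": var,
        "peak": peak,
        "peak_to_peak": max(x) - min(x),
        "rms": rms,
        "skewness": sum((v - mean) ** 3 for v in x) / (n * std ** 3),
        "kurtosis": sum((v - mean) ** 4 for v in x) / (n * std ** 4),
        "abs_mean": abs_mean,
        "shape_factor": rms / abs_mean,
        "crest_factor": peak / rms,
        "impact_factor": peak / abs_mean,
        "margin_factor": peak / sqrt_amp,
    }
```

Applying such a function to each of the seven channels and concatenating the results gives one feature vector per cutting pass.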
Eight frequency domain features are also extracted from the raw sensor signals. These statistical features are listed in Table 4. In Table 4, $p_i$ represents the ith spectrum of the sensor signal $f(t)$, $n$ is the number of spectra and $f_i$ is the frequency of the ith spectrum.
The wavelet packet decomposition (WP) is a time-frequency domain analysis tool that is well suited to analyzing non-stationary and time-varying signals. It can decompose the raw signal into multiple levels, and each level consists of several frequency bands, from which abundant information can be obtained. In this study, three channels of force, three channels of vibration and one channel of acoustic emission signals are decomposed into seven levels by WP. The energy of a frequency band can be expressed as Equation (11), in which $c_{j,n,k}$ is the wavelet packet coefficient [23]. The summary energy of all frequency bands at the jth level can be expressed as Equation (12). Furthermore, the normalization of energy can be expressed as Equation (13). According to Equation (13), seven time-frequency domain features can be obtained in total.
$$E_{j,n} = \sum_k c_{j,n,k}^2 \tag{11}$$
To deal with the high dimensionality of the extracted features, SDAE is utilized for feature dimensionality reduction in this study. The SDAE is composed of two DAEs. The raw features with high dimensionality are used as the input of the first DAE. The low-dimensional and representative features are obtained robustly from the hidden layer of the second DAE. To employ the SDAE for feature dimensionality reduction, the relevant parameters are set as follows: learning rate, 1.0; sparsity penalty, 0.05; activation function, sigmoid; input zero masked fraction, 0.5. To obtain the most representative features, the selection of node sizes in the SDAE is investigated. The node number of the hidden layer is chosen from 10 to 160 with an interval of 10 in the first DAE and from 5 to 50 with an interval of 10 in the second DAE. Considering the computation time and robustness, we finally recommend node sizes of 30 and 15 in the first and second DAE, respectively.
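The band energies and their normalization can be sketched with a small wavelet packet decomposition. The paper does not state which wavelet basis is used, so the Haar wavelet below is an assumption chosen to keep the example dependency-free; the function names are hypothetical, and normalizing each band energy by the total energy of the level is one plausible reading of the normalization step.

```python
import math

def haar_split(x):
    """One Haar analysis step: (approximation, detail), each half the input length.

    A trailing unpaired sample is dropped, so the input length should be
    divisible by 2**level for energy to be preserved exactly.
    """
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail

def wavelet_packet_energies(x, level):
    """Return (band energies E_{j,n}, normalized energies) at the given level.

    Bands are in natural (decomposition) order, not sorted by frequency.
    """
    nodes = [list(x)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_split(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    energies = [sum(c * c for c in node) for node in nodes]  # sum of squared coefficients
    total = sum(energies)
    normalized = [e / total for e in energies] if total > 0 else energies
    return energies, normalized
```

With an orthonormal basis such as Haar, the band energies at any level add up to the energy of the original signal, which makes the normalized values directly comparable across cutting passes.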

Tool Wear Stage Classification and Health Indicator Construction
The current wear stage is determined and the health indicator is constructed based on the feature vector. Conventionally, this step can be conducted with a machine-learning-based or deep-learning-based algorithm. In this subsection, we adopt the ELM to perform the classification and regression tasks, while other intelligent algorithms (e.g., SVM, BP network, RVM, etc.) can also be used as alternatives.
ELM was first proposed by Huang in 2006 [24] and has since been widely used for classification problems. However, the abovementioned method is only suitable for binary classification, while there are more than two tool wear stages in this study. To address this problem, pairwise coupling is utilized to estimate the multi-class probabilistic outputs [25]. Suppose that there are C different wear states; the multi-class classification problem is then transformed into C(C − 1)/2 binary classification problems. For each binary classification problem, the probabilities of class i and class j are given by Equation (14), where P(·|·) is the probability output of the ELM with a sigmoid function. By fusing the probabilities of the C(C − 1)/2 binary classification problems, the probability of class i in the multi-class classification can be expressed as Equation (15). The probability $p_i$ can then be obtained by solving the objective function in Equation (16), which can be rewritten in the matrix form of Equations (17) and (18). Finally, the probabilistic outputs $P = \{p_1, p_2, \cdots, p_C\}$ can be solved from the following linear system:
$$\begin{bmatrix} Q & e \\ e^T & 0 \end{bmatrix} \begin{bmatrix} P \\ b \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ 1 \end{bmatrix} \tag{19}$$
where $b$ is the Lagrangian multiplier of the equality constraint in Equation (16), $e$ is the C × 1 vector of all ones and $\mathbf{0}$ is the C × 1 vector of all zeros. By ranking the probabilistic outputs, we can obtain the current wear stage. After classifying the tool wear stage, the tool wear value can be estimated with the regression ability of the ELM. Moreover, the estimated tool wear value can be used as the observable health indicator.
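The pairwise coupling step can be sketched as follows. The sketch assumes the widely used formulation in which r[i][j] is the binary classifier's estimate of P(class i | class i or j), the objective is the sum over i and j ≠ i of (r[j][i]·p_i − r[i][j]·p_j)² subject to the probabilities summing to one, and the solution is obtained from a bordered linear system with a Lagrange multiplier. Whether this coincides term by term with the authors' objective is an assumption; the function names are hypothetical, and the pairwise probabilities are passed in directly instead of coming from an ELM.

```python
def solve_linear(a, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def couple(r):
    """Fuse pairwise probabilities r[i][j] into multi-class probabilities p."""
    c = len(r)
    q = [[0.0] * c for _ in range(c)]
    for i in range(c):
        for j in range(c):
            if i == j:
                q[i][i] = sum(r[s][i] ** 2 for s in range(c) if s != i)
            else:
                q[i][j] = -r[j][i] * r[i][j]
    # Bordered system [[Q, e], [e^T, 0]] [p; b] = [0; 1].
    kkt = [q[i] + [1.0] for i in range(c)] + [[1.0] * c + [0.0]]
    solution = solve_linear(kkt, [0.0] * c + [1.0])
    return solution[:c]  # the last entry is the Lagrange multiplier b
```

The predicted wear stage is then the index of the largest entry of the returned vector. This sketch does not enforce non-negativity of the probabilities explicitly.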

Parameter Estimation and RUL Prediction
In this subsection, the initial parameters of the proposed model are first estimated with the historical sensor data and wear labels and then updated online with the new sensory data. Benefiting from the offline estimation, the parameters updated by our algorithm converge quickly and achieve good prediction performance.

Offline Estimation of Initial Model Parameters
According to the proposed three-stage model in Section 2, we can obtain the state-space model of the wear process (Equation (20)). Suppose that there are N different cutting tools and that each cutting tool has the same measurement times in the historical tool wear data. Let $S_n = [s_{n,1}, s_{n,2}, \cdots, s_{n,m}]^T$ denote the m wear values of the nth cutting tool, and let the incremental data be $\Delta S_n = [\Delta s_{n,1}, \Delta s_{n,2}, \cdots, \Delta s_{n,m}]^T$. Based on the incremental dataset and maximum likelihood estimation (MLE), the log-likelihood function of $\Theta = [a, b, c, d, \sigma_B]$ can be written as Equation (21). For the convenience of calculation and simplification of expression, two intermediate variables $P_i$ and $Q_i$ are defined in Equation (22). The log-likelihood function can then be expressed as Equation (23), in which the data enter through the squared residuals $(\Delta s_{n,i} - (aP_i + cQ_i))^2$.
To estimate the parameters $\Theta$, we first fix the parameters b, c, d and take the partial derivatives of Equation (23) with respect to the parameters $a$ and $\sigma_B^2$ (Equations (24) and (25)). The estimates $\hat{a}$ and $\hat{\sigma}_B^2$ can then be obtained by setting Equations (24) and (25) to zero, which gives Equations (26) and (27), respectively. The log-likelihood function of the parameters b, c, d (Equation (28)) can be deduced by substituting Equations (26) and (27) into Equation (23), and the parameters b, c, d can be calculated by maximizing Equation (28) with a multidimensional search algorithm.
Furthermore, the estimates of $a$ and $\sigma_B^2$ can be obtained by substituting the estimated parameters b, c, d into Equations (26) and (27), respectively.
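The closed-form step of this procedure (estimating a and σ_B² with b, c, d held fixed) can be sketched as below. The sketch rests on assumptions that are not confirmed by the text: increments are taken to be independent Gaussians with mean a·P_i + c·Q_i and variance σ_B²·Δt_i, with P_i = t_i^b − t_{i−1}^b and Q_i = exp(d·t_i) − exp(d·t_{i−1}). The function name `profile_estimates` is hypothetical.

```python
import math

def profile_estimates(times, wear_paths, b, c, d):
    """Closed-form estimates of a and sigma_B^2 for fixed b, c, d.

    times      : measurement times t_0..t_m shared by all tools
    wear_paths : one list of wear values s_0..s_m per cutting tool
    """
    m = len(times) - 1
    # Assumed intermediate variables P_i and Q_i (increments of the drift basis functions).
    p = [times[i + 1] ** b - times[i] ** b for i in range(m)]
    q = [math.exp(d * times[i + 1]) - math.exp(d * times[i]) for i in range(m)]
    dt = [times[i + 1] - times[i] for i in range(m)]

    # Zeroing the derivative with respect to a gives a weighted least-squares ratio.
    num = den = 0.0
    for path in wear_paths:
        for i in range(m):
            ds = path[i + 1] - path[i]
            num += p[i] * (ds - c * q[i]) / dt[i]
            den += p[i] ** 2 / dt[i]
    a_hat = num / den

    # Zeroing the derivative with respect to sigma_B^2 gives the mean scaled squared residual.
    sse = sum(
        (path[i + 1] - path[i] - a_hat * p[i] - c * q[i]) ** 2 / dt[i]
        for path in wear_paths
        for i in range(m)
    )
    sigma2_hat = sse / (len(wear_paths) * m)
    return a_hat, sigma2_hat
```

An outer search over b, c and d would call this function for each candidate and keep the combination with the highest profile log-likelihood.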

Online Updating of Model Parameters
$\Phi = [\Phi_1, \Phi_2]$ is the model parameter vector that needs to be updated online with the real-time wear data. The parameter vector consists of two parts, i.e., $\Phi_1 = [a, b, c, d, \sigma_B]^T$ and $\Phi_2 = [\beta_0, \beta_1, \gamma]^T$, corresponding to the state equation and the observation equation in the state-space model, respectively. In this study, the EM algorithm is used to estimate the parameter vector online because the underlying wear states of the cutting tool cannot be observed directly. It is worth mentioning that the parameters estimated from the historical tool wear data in Section 3.3.1 are used as the initial parameters in the online updating of the model parameters. According to the Bayesian chain rule, the joint log-likelihood function of the underlying wear states $S_{1:k}$ and the wear indicators $O_{1:k}$ constructed from the observable sensor data can be derived. Substituting the model parameters and omitting the terms that are unrelated to the estimated parameters, we can obtain the expectation of the log-likelihood function.
The estimation procedure then consists of two steps: the expectation step (E-step) and the maximization step (M-step).
• E-Step: Calculate the expectation of the log-likelihood function based on the jth iteration, where the intermediate variables $\hat{S}_{i|k}$, $\hat{S}_{i-1|k}$, $P_{i|k}$, $P_{i-1|k}$ and $P_{i,i-1|k}$ can be calculated through the RTS algorithm presented in Section 2. $A_i$ and $B_i$ can then be expressed by these intermediate variables.
• M-Step: Equation (30) can be divided into two parts, corresponding to $\Phi_1$ and $\Phi_2$. The parameters $\Phi_1$ and $\Phi_2$ can be estimated by using the MLE introduced in Section 3.3.1.

RUL Prediction
After the model parameter estimation, the overall lifetime and the remaining useful life at a specific moment, together with their corresponding PDFs, can be obtained.

• PDF of lifetime
The PDF of the lifetime is obtained from the first hitting time defined in Section 2. The model parameters, the underlying wear state $\hat{S}_{k|k}$ and its variance $P_{k|k}$ can be updated by the EKF once new observable data are available. With the wear state and the updated parameters, the analytical form of the RUL distribution at time $t_k$ can be written as Equation (7) based on the law of total probability.
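The lifetime and RUL densities can be sketched numerically. The closed-form expression is not reproduced in the text, so the sketch assumes a commonly used approximation for Wiener processes with nonlinear drift, f_T(t) ≈ (1/sqrt(2πt))·(I_B(t)/t + g(t;θ)/σ_B)·exp(−I_B(t)²/(2t)), with I_B(t) as defined in the model formulation. Whether this matches the authors' expression exactly is an assumption, as are the function names and the way the RUL is obtained by shifting time and threshold; the uncertainty of the state estimate is ignored here.

```python
import math

def fht_pdf(t, omega, sigma_b, drift, cum_drift):
    """Approximate first-hitting-time density at time t > 0.

    drift(t)     : drift rate g(t; theta)
    cum_drift(t) : integral of g from 0 to t
    """
    i_b = (omega - cum_drift(t)) / sigma_b
    return (i_b / t + drift(t) / sigma_b) * math.exp(-i_b ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def rul_pdf(l, t_k, s_k, omega, sigma_b, drift, cum_drift):
    """RUL density at horizon l, given wear state s_k at time t_k.

    Uses time translation (u -> u + t_k) and threshold translation (omega -> omega - s_k).
    """
    return fht_pdf(
        l,
        omega - s_k,
        sigma_b,
        lambda u: drift(u + t_k),
        lambda u: cum_drift(u + t_k) - cum_drift(t_k),
    )
```

For a constant drift the approximation reduces to the inverse Gaussian density, which gives a convenient sanity check. For strongly nonlinear drifts the expression can become negative far from the mode, so it should be treated as an approximation.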

Experimental Study
In this section, the proposed tool wear model is demonstrated with a practical experimental study based on the dataset from the 2010 Prognostic and Health Management (PHM) competition (Prognostics and Health Management Society 2010) [26].

Experiment Set-Up and Data Acquisition
In the conducted experiment, as shown in Figure 3, a high-speed CNC milling machine with a three-flute ball-nose cutter was used to mill the workpieces (material: Inconel 718). During the cutting process, the spindle rotation speed was set to 10,400 r/min and the feed rate was set to 1555 mm/min, while the cutting width in the Y direction (radial) and the cutting depth in the Z direction (axial) were set to 0.125 mm and 0.2 mm, respectively. A quartz three-channel dynamometer was mounted on the CNC milling machine to measure the cutting force. Simultaneously, the vibration signals along the same three directions (X, Y and Z) were measured by a piezo-accelerometer. Moreover, an acoustic emission sensor was mounted on the workpiece to capture the high-frequency stress wave generated by the cutting process. All the signals collected above were amplified by the charge amplifier and converted into voltage signals. The voltage signals of the seven channels were then sampled at 50 kHz. Additionally, the tool wear value was measured offline with a microscope after each cutting pass. Finally, three groups of experiments (Cutter #1, Cutter #4 and Cutter #6) were conducted. Each cutting process lasted for 315 passes, and each cutting pass lasted four seconds. The raw data consist of two parts: the true wear value measured with a microscope and the sensory data of several channels. Figure 4 depicts the sensory data of several channels at one cutting pass and the true wear value over the entire life of Cutter #1.

Feature Selection with SDAE
In this study, to obtain the most representative features, the selection of node sizes in the SDAE was investigated. The node number of the hidden layer was chosen from 10 to 160 with an interval of 10 in the first DAE and from 5 to 50 with an interval of 10 in the second DAE. For the different combinations of node sizes, the pre-classification accuracy results are presented in Figure 5. As seen in Figure 5, the classification accuracy approaches its maximum when the node sizes in the first and second DAE are set to 30 and 15, respectively. It can also be observed from Figure 5 that the accuracy is relatively high when the node sizes in the first and second DAE are set to 90 and 35. However, more features increase the computing time. Thus, the node sizes in the hidden layers of the first and second DAE were recommended to be 30 and 15, respectively.
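The search space described above can be enumerated explicitly. In the sketch below the callable `accuracy_of` is a hypothetical stand-in for training the SDAE with a given pair of node sizes and returning the pre-classification accuracy; breaking ties in favor of the smaller network is an assumption that mirrors the stated preference for lower computing time.

```python
from itertools import product

first_dae_nodes = list(range(10, 161, 10))   # 10, 20, ..., 160
second_dae_nodes = list(range(5, 51, 10))    # 5, 15, 25, 35, 45 (50 is not reached with step 10)
grid = list(product(first_dae_nodes, second_dae_nodes))

def select_best(accuracy_of):
    """Return the (first, second) node sizes with the highest accuracy.

    Ties are broken in favor of the smaller total number of hidden nodes.
    """
    return max(grid, key=lambda pair: (accuracy_of(pair), -(pair[0] + pair[1])))
```

The grid contains 16 × 5 = 80 combinations, and both (30, 15) and (90, 35) are among them.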

Experimental Results
In this study, each group of experimental data describes the whole wear process of a cutter and can be divided into three stages according to the Taylor tool life curve [12]. We select two groups of experimental data as the training set and the remaining one as the testing set. Therefore, we can obtain three groups of validation results (i.e., Cutter #1, Cutter #4 and Cutter #6). In order to evaluate the effectiveness of the proposed model, three other models from previous studies (i.e., M1 [17], M2 [27] and M3 [28]), which follow the power law, the exponential law and an integrated power-exponential law, respectively, are also applied to the same data.
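The resulting evaluation protocol amounts to a leave-one-cutter-out split, which can be written down in a few lines. The helper name `leave_one_out` is hypothetical.

```python
cutters = ["Cutter #1", "Cutter #4", "Cutter #6"]

def leave_one_out(groups):
    """Return (training groups, held-out test group) for every group in turn."""
    return [([g for g in groups if g != held_out], held_out) for held_out in groups]

splits = leave_one_out(cutters)
for train, test in splits:
    print("train on", train, "-> test on", test)
```

Each cutter serves exactly once as the test set, which yields the three groups of validation results.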

Model Parameter Estimation
The parameters of the proposed three-stage Wiener-process-based model were estimated with the experimental data. The key model parameters of each stage are listed in Table 5.
According to the proposed model in Section 2, the three stages listed in Table 5 can be expressed as follows:

RUL Prediction Result
The whole life of each cutter is 315 cutting passes, and each cutting pass lasts four seconds. The RUL prediction results of the three cutters with our proposed model and the three other models are presented in Figure 6. The predictions of the proposed three-stage model are much closer to the actual RUL in every wear stage than those of the other three models. The performance of the other three models varies across wear stages. For example, the M1 model, which is based on the power law, performs poorly at the first and second stages for all three cutters. The M2 and M3 models both perform better than M1 and show different deviations at the three stages, but there is still a certain gap compared with our three-stage model. These results indicate that our model can accurately predict the RUL of a cutting tool.
To further investigate the superiority of the proposed three-stage Wiener-process-based degradation model, the PDFs of the RUL at different wear stages are calculated. Taking Cutter #1 as an example, Figure 7 presents the PDFs of the RUL, the predicted RULs and the actual RULs over a period of cutting passes at the three wear stages. As seen in Figure 7, the predicted RULs are close to the actual RULs, and the PDFs calculated with our model cover the actual RULs at all three stages.
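To illustrate what such an RUL PDF looks like, the sketch below evaluates the first-passage-time density of a basic linear Wiener process with constant drift, which is the standard inverse Gaussian form. It is a simplified stand-in and not the full three-stage model with multi-source variability proposed here. The function names and all numerical values (`remaining`, `drift`, `diffusion`) are assumptions chosen only for illustration.

```python
import math


def rul_pdf(t, remaining, drift, diffusion):
    """Inverse Gaussian first-passage-time density of a linear Wiener process.

    remaining: distance from the current wear value to the failure threshold (> 0)
    drift:     mean wear rate per cutting pass (> 0)
    diffusion: diffusion coefficient sigma (> 0)
    """
    if t <= 0:
        return 0.0
    coef = remaining / math.sqrt(2.0 * math.pi * diffusion ** 2 * t ** 3)
    return coef * math.exp(-((remaining - drift * t) ** 2) / (2.0 * diffusion ** 2 * t))


def mean_rul(remaining, drift):
    """Mean of the inverse Gaussian RUL distribution."""
    return remaining / drift


# Hypothetical values, not taken from Table 5.
remaining, drift, diffusion = 40.0, 0.5, 0.8

# Numerical check: the density integrates to about 1 and its mean matches mean_rul.
dt = 0.01
ts = [k * dt for k in range(1, 40001)]
area = sum(rul_pdf(t, remaining, drift, diffusion) * dt for t in ts)
mean_numeric = sum(t * rul_pdf(t, remaining, drift, diffusion) * dt for t in ts)

print(area, mean_numeric, mean_rul(remaining, drift))
```

A smaller diffusion coefficient gives a narrower and taller density, which is the sense in which a PDF can be said to reduce the uncertainty of the RUL prediction.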

Comparison and Evaluation
In order to compare the proposed three-stage model with the other models more clearly, the prediction errors are calculated, and the error results for Cutter #1 are presented in Figure 8. We can observe that the prediction errors of our model remain in a low range throughout the whole process, while those of the other three models fluctuate greatly. To quantitatively compare the RUL prediction accuracy of the proposed model and the other three models, we list the mean square error (MSE), mean absolute error (MAE) and coefficient of determination ($R^2$) of the estimation error in Table 6. The prediction errors are calculated as follows:

(1) MSE

$$\mathrm{MSE} = \frac{1}{N}\sum_{k=1}^{N}\left(\hat{l}_k - l_k\right)^2$$

(2) MAE

$$\mathrm{MAE} = \frac{1}{N}\sum_{k=1}^{N}\left|\hat{l}_k - l_k\right|$$

(3) $R^2$

$$R^2 = 1 - \frac{\sum_{k=1}^{N}\left(l_k - \hat{l}_k\right)^2}{\sum_{k=1}^{N}\left(l_k - \bar{l}\right)^2}$$

where $\hat{l}_k$, $l_k$ and $\bar{l}$ denote the predicted RUL, actual RUL and mean actual RUL, respectively, and $N$ is the number of predictions. We can observe from Table 6 that our model achieves much better performance than the other three models.

To further explore the inherent reasons for the performance gap between the proposed model and the other three models, a comparative study is carried out. The PDFs of Cutter #1 were calculated with the different models. Figure 9 presents the PDFs of the different models at intervals of ten cutting passes from the 230th to the 310th cutting pass. It can be seen from Figure 9 that the PDFs calculated using our model are much tighter and higher than those of the other three models for all nine cutting passes, which indicates that our model can reduce the uncertainty of RUL prediction when compared with other existing models. Additionally, the estimated RULs become closer and closer to the true RUL with the accumulation of sensor data.
All these results indicate that the parameters of our model are updated constantly with online sensing data, which causes the proposed model to become more similar to the true degradation model.
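The three error measures named in this section (MSE, MAE and the coefficient of determination) can be computed as in the following sketch. The function name `rul_errors` is hypothetical. The actual RULs use the stated whole life of 315 cutting passes, while the predicted values are made up for illustration and are not results from this study.

```python
def rul_errors(predicted, actual):
    """Return (MSE, MAE, R^2) of predicted RULs against actual RULs."""
    n = len(actual)
    mean_actual = sum(actual) / n
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    mse = sum(squared) / n
    mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / n
    # R^2 compares the residual error with the spread of the actual RULs.
    total = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - sum(squared) / total
    return mse, mae, r2


whole_life = 315                                    # cutting passes, as stated in the text
passes = [230, 240, 250, 260, 270]
actual = [whole_life - k for k in passes]           # 85, 75, 65, 55, 45
predicted = [88, 73, 66, 52, 45]                    # illustrative values only

mse, mae, r2 = rul_errors(predicted, actual)
print(mse, mae, r2)
```

Lower MSE and MAE and an $R^2$ closer to 1 indicate predictions that track the actual RUL more closely.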

Discussion and Conclusions
This study proposes a three-stage Wiener-process-based degradation model for cutting tool wear estimation and remaining useful life prediction. Tool wear stage classification and RUL prediction are implemented jointly, using the Wiener process to describe the degradation processes and the ELM to classify the tool wear stages. Moreover, by considering multi-source variabilities in the modeling process of each stage, the proposed model can describe the physical behavior of the tool wear process more precisely and significantly improve the accuracy of RUL prediction. The numerical results obtained from extensive experiments indicate that the proposed model can effectively predict the tool's remaining useful life in high-speed milling processes. Empirical comparisons show that the proposed three-stage Wiener-process-based model performs better than existing models in predicting the tool's RUL, while also providing an analytical form of the RUL distribution, which is very useful for the health management of the cutting tool.
Although the experiments in this study show that our model can accurately predict the RUL and estimate the wear value of the cutting tool, there are still several issues that need to be studied further. Firstly, the three stages of a cutting tool are assumed to be mutually independent, which leads to abrupt jumps in the prediction result at the boundaries between stages. Therefore, the correlation between different stages needs to be considered. Secondly, we mainly concentrate on the gradual and continuous wear process of a cutting tool; however, the sudden failures of the cutting tool (such as worn, tipping and so on) also need to be considered in practical applications. These research directions are worth pursuing in our future work.