Real-Time Remaining Useful Life Prediction of Cutting Tools Using Sparse Augmented Lagrangian Analysis and Gaussian Process Regression

Remaining useful life (RUL) of cutting tools is concerned with cutting tool operational status prediction and damage prognosis. Most RUL prediction methods utilized different features collected from different sensors to predict the life of the tool. To increase the prediction accuracy, it is often necessary to mount a great deal of sensors on the machine in order to collect more types of signals, which can heavily increase the cost in industrial applications. To deal with this issue, this study, for the first time, proposed a new feature network dictionary, which can enlarge the number of candidate features under limited sensor conditions, and the developed dictionary can potentially contain as much useful information as possible. This process can replace the installation of more sensors and incorporate more information. Then, the sparse augmented Lagrangian (SAL) feature selection method is proposed to reduce the number of candidate features and select the most significant features. Finally, the selected features are input to the Gaussian Process Regression (GPR) model for the RUL estimation. Extensive experiments demonstrate that our proposed RUL estimation framework output performs traditional methods, especially for the cost savings for on-line RUL estimation.


Introduction
In recent years, the fourth industrial revolution (Industry 4.0) has brought about an increase in demand for precision machining and production manufacturing. As one of the typical manufacturing machines, high-speed computer numerically controlled (CNC) milling machines are widely used in industry [1]. Statistics show that CNC milling machines now account for approximately 46% or more of the total modern industrial machining cluster [2]. In order to ensure the safe operation of CNC machines and the quality of the products, cutting tools have a crucial role in the overall CNC machine system. However, cutting tools are usually operated in high-pressure and harsh environments, which can result in wear and tear on the tools. Some of the studies show that the failure rate of cutting tools is higher than 38% among all mechanical failures of CNC machines. When tool wear exceeds a certain threshold, the machined part no longer meets the machining requirements, which in turn causes losses to the plant. Drawing on these insights, there is an urgent need for researchers to develop efficient and cost-effective approaches to estimate the real-time remaining useful life (RUL) of tools in order to free up sufficient time to plan replacements and repairs, which can prevent potential unscheduled machine downtime and shorten the manufacturing cycle time of products [3,4].
At present, a variety of active achievements have been made to estimate the RUL predictions of cutting tools including direct and indirect techniques [5]. Direct methods basically use machine vision systems to capture the images of the cutting cools by means of digital cameras or optical sensors [6][7][8]. For example, Prasad et al. [9] used a pair of stereo images taken from two different locations to measure the depth of wear. Then the measured wear was used as test data to predict flank wear using a back propagation neural network. Lanzetta [10] measured and classified various categories of wear (e.g., flank wear, crater wear, fracture, and breakage) of cutting tool inserts using a vision sensor approach. The approach comprised a camera, an autofocus zoom lens, and different types of illumination. A resolution enhancement algorithm was used by the author which can enhance the resolution of the measured picture to 40 µm/pixel. The advantages of direct methods are that they can provide high identification accuracy for the tool wear measurement. However, they are essentially off-line techniques that often result in unnecessary machine downtime and increased costs due to lost productivity. To address this issue, indirect methods have been extensively investigated which are conducted by measuring parameters that are related to the condition of the cutting tool, i.e., acoustic, vibration, force, and sound [11][12][13]. Compared with direct approaches, indirect methods facilitate the processing of different types of signals and can easily be applied to on-line monitoring, and, therefore, have been a popular topic for estimating the RUL of cutting tools.
The application of indirect methods is based on two stages containing feature extraction and feature-based RUL estimation [1]. In regard to feature extraction, the significant features are extracted from the candidate feature dictionary developed from the collected signals in order to represent the status of the cutting tool. With respect to the feature-based RUL estimation, the relationship between the extracted features and the status of the cutting tool is established to predict the life of the cutting tool. For example, Liu et al. [14]. utilized 14 signal features developed from the collected acoustic emission (AE) signals during the feature extraction stage and input the features to the support vector regression (SVR) for the life prediction during the RUL estimation stage. The results show that the method can achieve a prediction accuracy of up to 94.35%. Hu et al. [15] extracted time-domain features, such as mean, variance, and kurtosis; frequency-domain features, such as spectral kurtosis and spectral skewness; and time-frequency domain features, such as wavelet energy as the significant features; and the Long Short-term Memory (LSTM) network was applied to conduct tool wear prediction. Li et al. [16] utilized 16 features from the force signals during the feature extraction stage and applied a fuzzy neural network for the RUL estimating stage. Wang et al. [17] proposed a multisensory data fusion approach for the feature extraction and SVR for the RUL estimation. As can be seen from the aforementioned literature, to deliver the desired RUL estimation performance of indirect methods, the feature extraction that converts the raw signal into useful knowledge about the health status of cutting tools plays a fundamental role [1]. To extract significant features, the size of the candidate feature dictionary is required large enough to contain sufficient information about the tool state. Nevertheless, this process gives rise to a significant challenge which requires more sensors to be fitted on the machine which can heavily increase the cost.
To solve this issue, in the present study, a new feature network dictionary is proposed during the off-line training stage that enlarges the number of candidate features under limited sensor conditions, and the developed candidate features in the dictionary can potentially contain as much useful information as possible. After that, the sparse augmented Lagrangian (SAL) feature selection method is proposed during the feature extraction stage which can reduce the number of candidate features and select the most significant features [18]. The characteristics of SAL are that it utilizes Split Augmented Lagrangian Shrinkage Algorithm (SALSA) to produce some intermediate features with a sub-sampling technique, and then only the features with high selecting probability are chosen as the final selected features. Therefore, SAL has a very high sparsity and only a very few significant features can be selected. These selected features are often associated with some specific sensors, and only these specific sensors will be used for on-line monitoring. The sensors that are not selected can be removed from the machine, which can save costs for on-line monitoring. Finally, the selected features are input to the Gaussian Process Regression (GPR) model for the RUL estimation. GPR is a non-parametric Bayesian regression method that performs well on small datasets and has the capability to produce uncertainty measurements on the predictions [19]. Promising results have demonstrated the effectiveness of the newly proposed RUL estimation framework including SAL and GPR has high prediction accuracy, demonstrating the advantage over many existing methods.
The main contributions of this article can therefore be summarized as follows.
• First, the feature network dictionary is proposed to extend the number of candidate features under limited sensor conditions in order to contain as much useful information as possible. • Second, a SAL feature selection approach is proposed. It has very high sparsity and only the significant features can be selected. • Finally, during the on-line monitoring, only the selected sensors will be used, and the sensors that are not selected can be potentially removed from the machine creating a favorable opportunity for cost savings in real-world industry applications.
The remainder of this paper is organized as follows. In Section 2, the theoretical background of the feature network dictionary, SALSA, SAL for feature selection, and GPR are discussed in detail. Then, Section 3 is dedicated to a description of the first case study and Section 4 shows the results of the second case study. Section 5 discusses the proposed method in detail. Finally, Section 6 concludes this study.

Feature Network Dictionary
During the machining process, different types of sensors are installed on the machine to monitor the cutting tool status. Suppose x i,j (t), t = 1, ..., T, i = 1, .., I and j = 1, ..., J is the measured raw signal responses from the i-th sensor output during the j-th cutting process, under which the tool wear is denoted as y j . The signal features of x i,j (t) can be used to assess the tool wear severity and are given by i,j , P 2 i,j , ..., P n i,j ], n = 1, ..., N and where the notation n represents the n-th signal feature and N represents the number of signal features for i-th sensor output. P n i,j is the generated feature and P i,j represents the feature matrix enclosing N signal features. If the features from all I sensor outputs are combined in a single matrix, the signal feature dictionary during the j-th cutting process is given by [20] P j =[P 1,j , P 2,j , ..., P I,j ] =[P 1 1,j , ..., P N 1,j , P 1 2,j , ..., P N 2,j , ..., P 1 I,j , ..., P N I,j ], j = 1, ..., J From (3), there are a total of N I features in the signal feature dictionary. To extend the size of the dictionary under the I sensor outputs conditions in order to contain as much useful information as possible, the feature network dictionary is developed where the process of the extension of (3) is an iterative process. Support P l j is the extended dictionary after the l-th extension iteration, and the original feature dictionary (3) can be written as P 0 j = P j =[P 1 1,j , ..., P N 1,j , P 1 2,j , ..., P N 2,j , ..., P 1 I,j , ..., P N I,j ] =[a 1 , a 2 , . . . , a φ ], φ = 1, . . . , N I Define A 1 as an intermediate matrix during the l-th iteration, which records the result of the matrix product, providing The vectorization of the matrix A 1 , denoted V E C(A 1 ), is the 1 × φ 2 row vector: Define D as a process to delete the element which has the same expression of a vector, so D(V E C(A 1 )) can be expressed as where one of the same expression elements of A 1 , such as a 2 a 1 and a 1 a 2 , will be deleted. Based on the permutations and combinations formula [21], the number of elements in . Then, the extended dictionary after the first iteration is where the size of the dictionary P 1 j extends to 1 × (N I + N I(N I+1)
In the same way, after the l-th extension with l > 0 and l ∈ Z + , the extended dictionary can be expressed as where and the dictionary P l j is defined as the feature network dictionary during the j-th cutting process having the size of 1 To facilitate understanding of the process of feature network dictionary generation, Figure 1 further demonstrates the procedure. As a consequence, after a total of J cutting processes, the feature network dictionary can be rewritten as where the feature network dictionary P is a J × N I + ( N I+1

Split Augmented Lagrangian Shrinkage Algorithm (SALSA)
Assume that the tool wear vector is y = [y 1 , ..., y J ] T , then the tool wear estimation model can be expressed as where Θ stands for the model coefficients, and Ξ = ξ 1 , ..., ξ J T is the unmodeled noise.
The feature network dictionary is required to be large enough to contain sufficient information with the aim of giving an accurate estimation of the tool wear. However, the features in the dictionary P are usually redundant and may not be necessary for the tool wear estimation. A sparse model representation is therefore beneficial in this case [22,23].
The sparse solution of y = PΘ + Ξ can be obtained by solving the following l 1 -norm optimization problem [24]: SALSA is applied to solve this equation, which is a variable splitting technique converting the original l 1 -norm minimization problem into a series of subproblems which can be solved separately. Equation (13) can be therefore converted as: where The solution of (14) can be approximated to the weighted l 1 -norm optimization problem (13) with the increase of µ. By applying the augmented Lagrangian method, the optimization problem of (14) is then expressed as: where u is a vector of Lagrange multipliers. Substitute d = u/µ into (15), then the problem is transformed into: The problem (16) is then solved by converting it into three suboptimization problems: solving Θ, u and d individually, and in detail: where Θ k+1 represents the estimation of Θ and λ is the penalty parameter. It can be seen that Θ k+1 is calculated iteratively and the iteration will stop when the nonzero elements of Θ k+1 and Θ k have the same sign and location. Algorithm 1 briefly summarizes the main procedure of the SALSA algorithm.

Algorithm 1 SALSA Algorithm
Input: Model signal y and dictionary P Initialization: sets k = 0, the Lagrange multiplier

Sparse Augmented Lagrangian (SAL) Algorithm for Feature Selection
By applying the SALSA algorithm, the l 1 -norm optimization problem can be solved. However, the results are often not sufficiently sparse. To solve this issue, the SAL algorithm is used to improve the performance of SALSA. Firstly, SAL uses SALSA to generate a series of intermediate models. Secondly, certain features from the model series are selected to build the final model. The detailed procedure of the SAL algorithm is presented as follows: (1) The downsampling method is repeatedly used to process the tool wear vector y and the feature network dictionary P. Specifically, some rows are randomly selected from y to form a new vector as y υ . Similarly, the same rows are also selected from P to form a new matrix denoted as P υ . After that, the intermediate models based on y υ and P υ can be estimated as where υ indicates the υ-th repetition of the downsampling method with υ = 1, ..., Υ, where J s is the number of random subsampling, and κ represents the ratio of the downsampling. (2) SALSA is then applied to calculate Θ υ from the υ-th intermediate model using the downsampling data y υ and P υ .
(3) The set of the selected features by SALSA for the υ-th intermediate model can be expressed as where p ξ ∈ P and Θ ξ ∈ Θ. The notation ξ is the ξ th element of the Θ, and p ξ are the ξ th features. (4) Based on all M υ , υ = 1, ..., Υ, the frequency of each feature being selected in all sets can be calculated. Define selection frequency (SF) as where χ and η are the select times and the select frequency, respectively, of p ξ . (5) Then, the newly selected dictionary P s is defined as where the notation δ indicates a predefined feature selection threshold. This method can discard certain irrelevant features and only keep the features with high SF. The newly selected dictionary can be expressed as (6) Finally, the location of each selected feature in the feature network dictionary P is recorded as Λ.
The final implementation of SAL for feature selection can be summarized in Algorithm 2 and Figure 2.

Algorithm 2 SAL for feature selction
Input: Model signal y, dictionary P, downsampling ratio κ and feature selection threshold δ Output: Selected features P s and the location Λ.

Gaussian Process Regression (GPR) for the Remain Useful Life (RUL) Estimation
After the features are selected by SAL, GPR is applied to conduct the RUL estimation of the cutting tool. Assume the selected features P s and the measured tool wear vector y can be divided into two datasets which are the training datasets used for GPR model training and the validation/testing dataset used for model evaluation. The training datasets are denoted as P α s and y α , respectively, where P α s indicates the selected dictionary and y α indicates the measured tool wear vector. Similarly, the validation/testing datasets are denoted as P β s and y β , respectively. The first step for the RUL estimation is to define a Gaussian kernel by using the training datasets. In the present study, the radial basis function (RBF) is selected as the kernel function because it has outstanding nonlinear ability and infinitely differentiable characteristics [19]. The RBF kernel is defined as [25] where σ f and l are hyperparameters. After that, the predictive mean of the RUL of the validation/testing dataset denoted asȳ β can be derivedȳ with L = C(K(P α s , P α s ) + σ 2 y I) where The function C represents the Cholesky decomposition, and the notation σ y presents the noise estimation parameter relating to the noise level of y.
Then, the predictive variance of the RUL of the validation/testing dataset is represented as where V(.) indicates the variance function. Finally, the log marginal likelihood can be calculated as where the notation Γ = L T \(L\y).

RUL Estimation of the Cutting Tool
The cutting tool RUL estimation can be performed using the following algorithm, which has two parts, including off-line training and on-line RUL estimation.
Firstly, off-line training will extract the significant features from the proposed feature network dictionary, and train the GPR model off-line. The detailed process is as follows: (1) The collected raw signals from different sensor outputs are used to generate a feature network dictionary denoted as P, and the features inside the dictionary P are normalized by using min-max normalization. (2) The normalized feature network dictionary P and the measured tool wear vector y are divided into two parts which are the training datasets and validation datasets. The training datasets are denoted as P α and y α , respectively, and the validation datasets are denoted as P β and y β , respectively. (3) The generated dictionary P α and the measured tool wear vector y α are used to train the SAL feature selection model, and the selected dictionary is denoted as P α s , where the location of each selected feature in the dictionary P α is recorded as Λ. Based on Λ, the significant features from the validation dataset denoted as P β s can be directly selected. (4) GPR is conducted to estimate the RUL of the cutting tools. First, P α s is applied to define the Gaussian kernel. Then, the predictive mean denoted asȳ β s and the predictive variance denote as V(y β ) of the validation dataset can be estimated. (5) Mean square error (MSE) shown below is used to evaluate the predictive results.
where the notation ς indicates the length of y β s . If the MSE β is greater than a predefined threshold T β , the corresponding parameters of the SAL feature selection procedure need to be re-tuned, and (4) and (5) need to be repeated until MSE β ≤ T β .
The second part is the on-line RUL estimation using the on-line extracted features and off-line trained SAL and GPR models to conduct the on-line RUL prediction. The procedure is as follows: (1) Based on the location Λ, the significant features can be directly selected from the on-line collected signals.

Experiment Setup
To demonstrate the effectiveness of the proposed SAL and GPR framework, the down milling operation experiment conducted by the PHM society in 2010 is employed as shown in Figure 4 [26]. As can be seen, different types of sensors including the dynamometer, accelerometer, and acoustic emission (AE) sensor are used to mount on the different places of the machine with the aim of collecting sensor signals to conduct the RUL of the cutting tools. First, a Kistler quartz 3-component dynamometer is installed between the workpiece and the machining table in order to measure the cutting forces in X, Y, and Z directions during machining. The collected signals are denoted as force x , force y , and force z , respectively. Then, three Kistler piezo accelerometers denoted as acce x , acce y , and acce z , respectively, are mounted in the X, Y, and Z directions of the workpiece in order to measure the vibrations in each of these directions. Furthermore, a Kistler AE sensor denoted as AE RMS is used, which can measure high-frequency energy signals generated during material removal from the workpiece in the machining process. As a consequence, seven sensor outputs which are force x , force y , force z , acce x , acce y , acce z , and AE RMS , respectively, are collected from the applied sensors, and these sensor outputs are sampled by a NI DAQ PCI 1200 board with 12 kHz sampling rate. During the experiment, the cutting tool with three flute cutters was utilized to cut the workpiece in order to machine a sloping surface [16]. The cutting speed is set up to 4.7 m/min and the spindle speed is set to 23,600 rpm. After each cutting process, the machine will stop to measure the flank wear of the cutter, which was conducted by employing the LEICA MZ12 microscopy system. At the end of the experiment, a total of three cutting tools denoted as T1, T2, and T3, respectively, were used to machine three aluminium workpieces, and each cutting tool made a total of 300 cutting processes so as to produce 300 datasets enclosing 7 columns sensor outputs and 1 column tool wear measurements.

Sparse Augmented Lagrangian (SAL) Algorithm for the Feature Selection
To conduct the SAL-based feature selection, as can be seen in Figure 5, the original datasets collected from T1 and T2 were first used as the training and validation datasets and the rest dataset collected from T3 was used as the testing dataset. As shown in Table 1, this experimental design for data partitioning is denoted as E1 where the training and validation datasets contain 600 (=300 × 2) datasets and the testing dataset contains 300 datasets.   E1  T1 and T2  T3  E2  T1 and T3  T2  E3 T2 and T3 T1

Symbol Training/Validation Testing
During the off-line training and validation stage, 90% of the training and validation datasets (540 datasets) were randomly selected as the training dataset and the rest 10% of the datasets (60 datasets) were used as the validation dataset. Then, the training datasets were used to generate a feature network dictionary, and the features utilized in the current study are presented in Table 2. Based on the utilized features, a feature dictionary can be generated according to Equation (11). As can be seen in the equation, the critical parameter for the feature network dictionary is the extension iteration l. The ideal value of l can help in producing the gratifying size of the feature dictionary to contain as much useful information as possible. However, the increase of l will greatly expand the size of the initial dictionary matrix which heavily increases the computational load. For the current study, the initial extension iteration l is set to 1 for the SAL-based feature selection. According to Equation (11), the feature network dictionary of the training dataset is a 540 × 5670 matrix, meaning that the number of the designed features in the dictionary is extended to 5670. In a similar way, the validation dataset can also be used to generate a feature network dictionary where the size of the dictionary is 60 × 5670.
After the feature dictionary is generated, the SAL algorithm was applied to select the significant features from the candidate 5670 features. Based on the algorithm introduced in Section 2.3, only 8 features denoted as P α s were finally selected among the 5670 candidate features, where the selected features are AE RMS , force x × force x , force x × force y , force x × force z , force y × force y , force y × force z , force z × force z , and AE RMS × force z . The location of these selected features is recorded as Λ. As can be seen from the selected features, they are only related to the AE signal and the force signals, and there is no dependence on the vibration signals. Therefore, for the future on-line RUL estimation, there will be no need to install vibration sensors, which is very promising for industrial applications because the combination of feature network dictionary and SAL can be effective in cost saving. Finally, according to the location Λ of the selected features in the dictionary, the selected features of the validation dataset can be represented as P β s .

Gaussian Process Regression (GPR) for the RUL Estimation
After the features were selected via the SAL algorithm, they were then input to GPR for the RUL estimation. First, based on the algorithm introduced in Section 2.4, 8 selected features P α s from the training dataset were used to train the GPR model. After that, the 8 selected features P β s from the validation dataset were input to the trained GPR model in order to predict the RUL of the validation dataset. To estimate the performance of the prediction results, the mean square error (MSE) and the area of uncertainty (AoU) are used. The equation of the MSE is computed as where y β ς andŷ β ς are the measured tool wear of the validation dataset and the GPR predicted tool wear, respectively. The notation Ξ indicates the length of the predicted tool wearŷ β ς , and ς indicates the index of the dataset. The smaller the MSE, the more accurate the model's predictions will be. Furthermore, the area of uncertainty (AoU) is proposed to measure the uncertainty of the prediction results, which can be expressed as where the notation σ ς = sqrt(V(ŷ β ς )) represents the standard deviation of the prediction results at the ς-th dataset. The smaller the AoU, the higher the reliability of the model's prediction will be. Figure 6a shows the validation result where the red dash line indicates the mean value of the estimation and the gray color represents the uncertainty of the estimation which is plotted byŷ v ς ± 2 × σ ς . As can be seen, the MSE is 39.84 and the AoU is 0.12.  Figure 6. RUL prediction resultsof the experiment E1 of (a) the validation dataset and (b) the testing dataset.

On-Line RUL Estimation
After the SAL-based feature selection model and GPR model were trained by using the data from T1 and T2, they were then applied to the testing dataset acquired from T3 with the aim of conducting the on-line RUL estimation. First, the dataset from T3 was used to generate a feature network dictionary where the size of the dictionary is 300 × 5670. Second, based on the off-line trained location Λ, the features of the test dataset were selected which can be expressed as P β * s . Finally, the selected features were input into the GPR model for the RUL estimation. Figure 6b shows the predicted RUL on the test dataset producing an MSE of 192.65 and an AoU of 0.41.

Additional Experiment Designs
To validate our proposed RUL estimation framework, as can be seen in Table 1, two additional experimental designs denoted as E2 and E3 were carried out. In the case of E2, the training and validation datasets are from T1 and T3, and the testing dataset is from T2; while for E3, the datasets acquired from T2 and T3 were used as the training and validation datasets, and the dataset collected from T1 was utilized as the testing dataset. For both experiments, 7 features were selected from 5670 candidate features during the training and validation stage. Similar to E1, the selected features are also associated with AE and force signals only, with no dependence on vibration signals. The validation results of E2 and E3 were presented in Figures 7a and 8a, respectively. Finally, after applying testing datasets to the trained SAL and GPR model, the RUL testing results are presented in Figures 7b and 8b. Tables 3 and 4 summarized the validation and testing results of E1, E2, and E3.   Furthermore, the parameter l is also set to 0 to evaluate the RUL predictions in order to highlight the advantages of the feature extension in the feature network dictionary. For E1, based on Equation (11), with a non-extended dictionary, the number of features in the dictionary is only 105 which contains less useful information than the extended feature network dictionary. After applying SAL, the number of selected features is 39, which corresponds to the whole 7 sensor outputs. Compared to Figure 6, which utilizes the extended feature network dictionary, the prediction results of the non-extended dictionary (see Figure 9) show an inferior performance and use more sensors. In the same way, E2 and E3 also show poorer performance when l = 0. As a result, the extended feature network dictionary can potentially contain as much useful information as possible which can replace the installation of more sensors and incorporate more information meaning that the cost relating to sensor installation can be saved.

Comparative Study
Since our method has two parts containing the SAL-based feature selection method and GPR prediction method, some comparisons are made in terms of RUL predictions by using a Least Absolute Shrinkage and Selection Operator (LASSO) approach [27], a pure SAL algorithm, and a pure GPR algorithm: • First, with respect to the LASSO approach, as can be seen in Tables 3 and 4, the prediction results are promising and are close to that of our proposed method. However, the number of selected features exceeds 1000 in terms of E1, E2, and E3, which far exceeds the SAL-based feature selection method. • Second, for the pure SAL algorithm, it can produce a very sparse solution with an adequate prediction performance (see Tables 3 and 4), but it cannot output the uncertainty of the model which limits the ability of engineers/researchers to have an evaluation of the reliability of the prediction results. • Last, for the pure GPR method, as presented in Tables 3 and 4, it can output the uncertainty of the model, but the method is not sparse, i.e., it exploits the overall feature information for RUL prediction, which loses efficiency in high-dimensional spaces, especially when the number of features is tremendous.
As a result, the combination of SAL and GPR can take full advantage of the benefits of uncertainty estimation and overcome the drawbacks due to oversized features.

Case Study 2: NASA Ames Milling Datasets
To further demonstrate the effectivenss of the proposed RUL estimation framework, a second case study was conducted by analysing the NASA milling dataset [28]. Figure 10 depicts the setup of the experiment. As can be seen, AE sensors, vibration sensors, and motor current sensors are mounted to the table and the spindle of the Matsuura machining center MC-510V; moreover, six signal outputs which are represented as AE table (AE table  indicates AE signals at table), AE spindle (AE spindle indicates AE signals at spindle), VB table  (VB table indicates table vibration signals), VB spindle (VB spindle indicates spindle vibration signals), smc AC (smc AC indicates AC spindle motor current), and smc DC (smc DC indicates DC spindle motor current), respectively, are collected for estimating the RUL of the cutting tools. During the experiment, the cutting speed was set to 200 m/min, and two different depths of cut (DoC), namely, 1.5 mm and 0.75 mm, and two different feed rates, namely, 413 mm/min and 206.5 mm/min, were designed for the experiments. Furthermore, the workpieces for the experiments have two materials, which are cast iron and stainless steel J45 having the size of 483 mm × 178 mm × 51 mm. All experiments were performed a second time with the same parameters and a second cutting tool. The tools used for this case study are KC710-type carbide insets. Table 5 summarized the conducted experiments under different process parameter conditions. As can be seen from the table, it contains 8 pairs of experiments with the same 8 different parameter settings. For example, Experiment 9 is a repeat experiment of Experiment 1. Therefore, the dataset acquired from Experiment 1 can be considered as the training/validation dataset and the dataset collected from Experiment 9 can be used as the testing dataset, and this pair of experiments can be denoted as P1. Similarly, the rest of the datasets can be named in the same way. Table 6 summarizes these experimental pairs.   Figure 11a shows the RUL estimation results of the validation dataset of the experiment pair P1. During the off-line training/validation stage, the number of the candidate features in the feature network dictionary is 4185, and only 9 features related to 3 sensors, namely, AE table , AE spindle , and VB spindle , were selected. This process can result in considerable cost savings in industrial applications because unselected sensors will not be used during the on-line test. After that, for the on-line estimation, as can be seen in Figure 11b, the testing results are consistent with the validation results in terms of MSE and AoU, which implies that our proposed RUL estimation framework inherits some robustness. Furthermore, the traditional LASSO approach, SAL approach, and GPR approach were also used to analyze the dataset for comparison with our method. As can be seen from the pictures shown in Figures 12-14, they have higher MSE and AoU in comparison to our method. This means that the proposed framework has higher accuracy compared to other methods. Furthermore, Tables 7 and 8 present the validation results and testing results of the 8 experiment pairs. As can be seen, LASSO can have promising results on validations datasets but has poor results on testing datasets meaning that this method is not robust. In terms of the proposed SAL and GPR method, for the majority of cases of the testing dataset, the performance of the proposed method has higher accuracy than the traditional methods in terms of the MSE and AoU demonstrating the superiority of our proposed RUL estimation framework over traditional techniques.

Discussion
To better understand the advantages of the proposed SAL and GPR-based RUL estimation framework, some discussions are summarized as follows: (1) Sparsity: This study proposed a novel feature network dictionary which can extend the number of candidate features under limited sensor conditions with the aim of containing as much useful information as possible. However, the features in the dictionary are usually redundant and may not be necessary for the RUL estimation of cutting tools. The proposed SAL feature selection method can reduce the number of candidate features and select the most significant features. Compared with the conventional feature selection method, such as LASSO, SAL has a very high sparsity and only a very few significant features can be selected. (2) Cost saving: For traditional feature selection methods, they have low sparsity. Therefore, they lead to the selection of features associated with more sensors. In contrast, the high sparsity of SAL leads to the selection of fewer sensors, which can save costs for online monitoring. (3) Limitations: The process of feature network dictionary extension can enlarge the number of candidate features but increase a significant amount of computation to select the features. Therefore, a more efficient optimization algorithm is needed to reduce the computational requirements, which will be studied in the future research.

Conclusions
In this paper, an innovative feature network dictionary was proposed to enlarge the number of candidate features under limited sensor conditions, and the developed dictionary can potentially contain as much useful information as possible. As demonstrated by the experiments, this process can replace the installation of more sensors and incorporate more information. After that, the Sparse Augmented Lagrangian (SAL) feature selection method was applied to select the significant features and the Gaussian Process regression (GPR) algorithm was applied to estimate the Remaining Useful Life (RUL) of cutting tools. The results from several case studies demonstrated the effectiveness of the proposed technique.
This study can be treated as the initial research for RUL estimation of cutting tools because it mainly focuses on simple machining processes and strategies. In future research activities, more experimental cases including different cutting processes and machining strategies can be tested to validate our proposed methods. Furthermore, we plan to use different materials in the training and testing stages for evaluating the RUL prediction results.