1. Introduction
Traditional manufacturing is undergoing a transformation towards intelligent manufacturing with the development of information technology, demanding intelligent and automated upgrades in machining [1]. In the machining process, tool wear is inevitable, which directly affects the tool surface integrity and machining accuracy, and may even damage the machine tool [2,3,4]. Relevant studies have shown that 10–40% of machine tool downtime is caused by abnormal tool conditions [5] and the service life of tools is only 50–80% of the recommended service life [6]. Therefore, tool condition monitoring (TCM) offers advantages in cutting costs and enhancing production efficiency and product quality, and holds significance for smart manufacturing.
Two methods are commonly applied in TCM, namely, direct and indirect methods. For the direct method, the tool wear is directly measured with an optical microscope or a CCD camera based on computer vision methods [7]. In the indirect method, models of TCM are established based on signals collected by sensors, such as cutting force [8], vibration [9], acoustic emission [10], and spindle power [11]. Compared with direct methods that require machine tools to be shut down for measuring, indirect methods allow in situ estimates of tool condition based on sensor data. Consequently, indirect methods are considered to be more suitable for in situ tool condition monitoring.
Many models have been proposed for indirect TCM, including physical models, data-driven models, and hybrid models. Most physical models consider only the dominant factors related to tool wear; they have the advantages of low complexity and physical interpretability, but require expert domain knowledge to construct [12]. Hybrid models combine physical knowledge with data-driven models, easing the lack of sample data and improving their generalization ability [13,14]. However, the applied physical knowledge mainly relies on prior information about tool wear and lacks deep embedding of tool wear mechanisms. In addition, these methods require large-scale tool wear labels for model training, which makes it difficult to meet the needs of online prediction. Data-driven models learn a mapping between signal features and wear to solve the tool condition monitoring problem. Liao et al. [15] proposed a method based on acoustic emission signals that uses wavelet packet decomposition to extract energy features; a support vector machine (SVM) model was then established for TCM. Li et al. [16] proposed a time-varying and condition-adaptive hidden Markov model for capturing the time dependence of tool wear. Chen et al. [17] proposed an artificial neural network-based in-process tool wear prediction (ANN-ITWP) model, which estimates the tool wear values from the cutting parameters and the average peak force in the y direction. Cheng et al. [18] used a support vector regression (SVR) model to estimate tool flank wear, with a grid search algorithm (GS), a genetic algorithm (GA), and particle swarm optimization used for parameter optimization. The prediction accuracy of the proposed method was 97.32% and 96.72% under the GA-SVR and GS-SVR prediction models, respectively. The abovementioned machine learning models have achieved considerable success in TCM. However, they require a lot of feature engineering work in feature extraction and screening [19,20].
Deep learning has a strong ability to extract features automatically compared with traditional machine learning methods, without the need for manual feature engineering. Therefore, many deep learning models have been proposed for TCM in recent studies. Convolutional neural networks (CNNs) can extract spatial relationships from feature maps through the combination of convolution layers and pooling layers. Xu et al. [21] proposed multi-scale feature fusion implemented with parallel convolutional neural networks. A channel attention mechanism combined with residual connections was developed to enhance the performance of the model. Duan et al. [22] enlarged the sample set and applied a three-layer wavelet packet decomposition. A multi-frequency-band feature extraction structure based on a deep convolutional neural network was introduced to predict tool wear conditions. Yong et al. [23] proposed a model combining a one-dimensional convolutional neural network (1D-CNN) with deep generalized canonical correlation analysis (DGCCA). In particular, the 1D-CNN was used to extract features from 1D raw data, whereas DGCCA with an attention mechanism was used to fuse the feature output from each 1D-CNN. Shah et al. [24] extracted image quality parameters from scalograms constructed from Morlet wavelets, and built several LSTM models for tool wear prediction. Wu et al. [25] applied a feature extraction method based on singular value decomposition (SVD) and used a BiLSTM model to predict tool wear. Zhang et al. [26] used a 1D-CNN to automatically extract features, and then used a BiLSTM model to mine the time dependency of the features and monitor tool wear. Xu et al. [27] proposed an integrated model based on deep learning and multi-sensory feature fusion, in which a parallel convolutional neural network (PCNN) achieved the multi-sensory feature fusion and a fully connected neural network generated the prediction results.
Existing deep learning studies mostly focus on tool condition monitoring, with less attention paid to tool wear prediction. The significance of tool wear prediction lies in achieving early warning of tool condition and reducing the probability of outliers. In this paper, a novel method for tool wear monitoring and prediction is proposed based on a residual convolutional network and a seq-to-seq structure. The contributions of this work are as follows:
(1) A deep convolutional network based on a residual structure is proposed to achieve multi-scale feature fusion of multi-source information and to alleviate the problems of vanishing gradients and performance degradation. In addition, the introduction of batch normalization (BN) and dropout layers improves the generalization ability of the model and reduces overfitting.
(2) An encoder–decoder network for short-term monitoring and long-term prediction is built based on the attention mechanism. It can capture the time dependence of deep features, as well as the instantaneous features and long-term trends of time-series features.
(3) The encoder–decoder temporal model can be used for in-process smoothing to reduce local fluctuations in wear values, which improves monitoring and prediction accuracy compared with traditional smoothing methods.
The rest of the paper is organized as follows. Section 2 introduces the theoretical background of the CNN and the temporal model. In Section 3, the proposed model for tool condition monitoring and prediction is introduced. Then, the experiment is presented and the results are discussed in Section 4. Finally, conclusions are drawn in Section 5.
3. Methodology
In this paper, a hybrid model based on 1D-CNN and ResNet (DResNet-1d) is proposed for tool condition monitoring. On this basis, tool condition prediction is carried out by a time-series model to achieve early warning of tool condition, as shown in Figure 5. The sensor signals collected during machining and the corresponding tool wear values are fed into the proposed model, and model training determines its optimal parameters. The initial estimates are then passed to a multi-step predictive time-series model. Through smoothing corrections, tool wear monitoring and prediction can be achieved.
3.1. Deep Feature Extraction for Tool Wear
A deep network is expected to have an excellent capability for feature extraction, because it assembles elementary features from shallow layers into advanced features in deep layers. In this paper, a DResNet-1d model was established to estimate tool wear values, as shown in Figure 6. The convolutional unit at the beginning was used to fuse multi-sensor data and to transform the number of channels of the feature map. ResBlock represents a residual block, the structure of which follows the pre-activation residual block [34] shown in Figure 7. Conv is the convolution layer, BN represents the BatchNorm layer, dropout represents the dropout layer, and the activation function is ReLU. The number of convolution kernels in the residual block is 64 and 128, with strides of 1 and 2, and the kernel size is 3. In this paper, the ResBlock was repeated 20 times to form a deep network. FC is the fully connected layer, which is used to nonlinearly map the features into tool wear values.
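As a rough illustration of the pre-activation ordering described above, the following pure-Python sketch applies BN, ReLU, and Conv twice and then adds the shortcut. It is not the authors' implementation: the helper names (`batch_norm`, `conv1d`, `pre_act_res_block`), the zero padding, the kernel-size-1 projection shortcut used when the stride or channel count changes, the toy weights, and the omission of dropout and of learnable BN parameters are all assumptions made to keep the example short.

```python
import math

def batch_norm(x, eps=1e-5):
    """Normalize each channel of x (a list of channels) to zero mean, unit variance."""
    out = []
    for channel in x:
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        out.append([(v - mean) / math.sqrt(var + eps) for v in channel])
    return out

def relu(x):
    return [[max(0.0, v) for v in channel] for channel in x]

def conv1d(x, weight, stride=1):
    """1D convolution with zero padding; weight is indexed [out_channel][in_channel][tap]."""
    kernel = len(weight[0][0])
    pad = kernel // 2
    length = len(x[0])
    out_len = (length + 2 * pad - kernel) // stride + 1
    out = []
    for w_out in weight:
        row = []
        for i in range(out_len):
            start = i * stride - pad
            acc = 0.0
            for w_in, channel in zip(w_out, x):
                for tap, w in enumerate(w_in):
                    idx = start + tap
                    if 0 <= idx < length:  # positions outside the signal count as zero
                        acc += w * channel[idx]
            row.append(acc)
        out.append(row)
    return out

def pre_act_res_block(x, w1, w2, w_short=None, stride=1):
    """Pre-activation residual block: (BN -> ReLU -> Conv) twice, plus a shortcut."""
    h = conv1d(relu(batch_norm(x)), w1, stride)
    h = conv1d(relu(batch_norm(h)), w2, 1)
    # Identity shortcut when shapes match; assumed projection shortcut otherwise.
    shortcut = x if w_short is None else conv1d(x, w_short, stride)
    return [[a + b for a, b in zip(hc, sc)] for hc, sc in zip(h, shortcut)]

# Toy check: 2 input channels of length 8, widened to 4 channels with stride 2.
x = [[float(i) for i in range(8)], [float(8 - i) for i in range(8)]]
w1 = [[[0.1, 0.2, 0.1] for _ in range(2)] for _ in range(4)]
w2 = [[[0.05, 0.1, 0.05] for _ in range(4)] for _ in range(4)]
w_short = [[[0.5] for _ in range(2)] for _ in range(4)]
y = pre_act_res_block(x, w1, w2, w_short, stride=2)
print(len(y), len(y[0]))  # prints: 4 4
```

Because the shortcut carries the input past both convolutions, a block whose convolution weights are all zero reduces to the identity mapping, which is the property that makes very deep stacks of such blocks trainable.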
The inputs and outputs of the model can be represented as Equation (7):

$$\hat{y} = D(X), \qquad X = [x_1, x_2, \ldots, x_n] \tag{7}$$

where $X$ is the input of the model, $n$ is the number of sensor signal channels, $x_1$ to $x_n$ are the signals collected by the sensors, $D$ represents the model, and the output $\hat{y}$ is the tool wear estimation.
3.2. Multi-Step Prediction Model for Tool Wear
Tool wear is a continuous process, and there exists a temporal correlation between adjacent tool wear values. Considering the ability of an RNN to capture time dependency, an encoder- and decoder-based temporal model was established to achieve long-term and short-term predictions from historical information.
The encoder- and decoder-based temporal model, which is shown in Figure 8a, adopts a GRU-based encoder–decoder structure and applies the attention mechanism. The inputs of the encoder are the tool wear values of past moments, and the encoder outputs the semantic vector. The attention layer assigns a weight to the semantic vector at each moment; the weighted result is fed into the decoder as the initial hidden state, and the decoder then outputs the tool wear predictions for future moments, as shown in Figure 8b.
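The weighting step can be sketched as follows. The scoring function is not specified in the text, so this example assumes plain dot-product attention against a hypothetical `query` vector; the function names and the toy encoder states are likewise illustrative only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(encoder_states, query):
    """Weight each encoder state by its dot product with the query (assumed scoring)."""
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the attention-weighted sum of the encoder states.
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states)) for d in range(dim)]
    return weights, context

# Three encoder time steps with two hidden units each (toy values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, context = attention_context(encoder_states, query)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

In the structure described above, a vector like `context` would play the role of the initial hidden state handed to the decoder.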
The integrated model for multi-step prediction of tool wear is used to construct an intrinsic link between multi-domain features of multi-channel signals and multi-step wear values. The continuous historical wear values obtained from the monitoring model are used as inputs to the time-series model, and the information in the historical wear values is extracted by attention for the prediction of short-term and long-term wear values. The whole process can be expressed as Equation (8):

$$[\hat{y}_{t-m+1}, \ldots, \hat{y}_{t}] = G(X_{t-m+1}, \ldots, X_{t}), \qquad [\tilde{y}_{t+1}, \ldots, \tilde{y}_{t+n}] = F(\hat{y}_{t-m+1}, \ldots, \hat{y}_{t}) \tag{8}$$

The monitoring model reads the multichannel features $X_{t-m+1}, \ldots, X_{t}$ to obtain the historical wear values $\hat{y}_{t-m+1}, \ldots, \hat{y}_{t}$, where $G$ represents the monitoring model. The temporal model takes the tool wear values of the past $m$ consecutive moments as inputs of the encoder, and $F$ represents the temporal model. The outputs of the decoder are the tool wear predictions of the next $n$ moments, denoted as $\tilde{y}_{t+1}, \ldots, \tilde{y}_{t+n}$. The process is shown in Figure 9.
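A minimal sketch of how a sequence of estimated wear values might be cut into encoder inputs (the past m values) and decoder targets (the next n values) is given below. The function name `make_windows` and the sample wear values are hypothetical and serve only to show the windowing.

```python
def make_windows(wear, m, n):
    """Pair each run of m past wear values with the n values that follow it."""
    pairs = []
    for end in range(m, len(wear) - n + 1):
        history = wear[end - m:end]   # encoder input: m consecutive past values
        future = wear[end:end + n]    # decoder target: the next n values
        pairs.append((history, future))
    return pairs

# Ten illustrative wear estimates, with m = 4 past steps and n = 2 future steps.
wear = [float(v) for v in range(10)]
pairs = make_windows(wear, m=4, n=2)
print(len(pairs))   # prints: 5
print(pairs[0])     # prints: ([0.0, 1.0, 2.0, 3.0], [4.0, 5.0])
```

Each pair corresponds to one training or inference sample for the temporal model; consecutive windows overlap, so the same future moment is predicted from several different histories.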
3.3. Smoothing of Estimation
The DResNet-1d model achieves multi-sensor signal fusion, which reduces the impact of signal fluctuations on tool wear estimation. However, the estimates at adjacent moments are produced independently and lack temporal correlation, which results in inevitable outliers. To address this shortcoming, and inspired by the temporal model, smoothing correction was applied in this paper to reduce the probability of outliers and improve the accuracy of the estimation, as shown in Figure 10.
The smoothing correction model adopted in this section follows the aforementioned encoder- and decoder-based temporal model. The historical tool wear values are fed into the model and a short-term prediction is output. Multiple predicted tool wear values exist at each moment, which can be denoted as $\{\tilde{y}_t^{(1)}, \tilde{y}_t^{(2)}, \ldots, \tilde{y}_t^{(k)}\}$, where $k$ is the number of predictions available for moment $t$. A statistical characteristic of the available predicted tool wear values at each moment is selected to perform the smoothing correction, which can be expressed as follows:

$$\bar{y}_t = S\left(\tilde{y}_t^{(1)}, \tilde{y}_t^{(2)}, \ldots, \tilde{y}_t^{(k)}\right)$$

where $S$ represents the function used to calculate statistical features.
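The idea of collapsing several overlapping predictions for the same moment into one value can be sketched as follows. The choice of statistic is not stated in the text; the median and mean used here, the function name `smooth_overlapping`, and the forecast values are assumptions for illustration.

```python
from statistics import mean, median

def smooth_overlapping(forecasts, stat=median):
    """forecasts[s] holds predictions for moments s, s+1, ..., s+horizon-1.

    All predictions that refer to the same moment are collected and reduced
    to a single value with the statistic `stat`.
    """
    horizon = len(forecasts[0])
    buckets = [[] for _ in range(len(forecasts) + horizon - 1)]
    for start, window in enumerate(forecasts):
        for offset, value in enumerate(window):
            buckets[start + offset].append(value)
    return [stat(b) for b in buckets]

# Four overlapping 3-step forecasts; the value 40.0 is a deliberate outlier.
forecasts = [
    [10.0, 12.0, 14.0],
    [11.0, 13.0, 15.0],
    [12.0, 40.0, 16.0],
    [15.0, 17.0, 19.0],
]
smoothed = smooth_overlapping(forecasts)
print(smoothed)  # prints: [10.0, 11.5, 13.0, 15.0, 16.5, 19.0]
```

With the median, the outlier at the fourth moment is suppressed (15.0), whereas `smooth_overlapping(forecasts, stat=mean)` would be pulled towards it. Which statistic works best in practice depends on the data.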
5. Results and Discussion
5.1. Evaluation Metrics
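To evaluate the performance of the model, the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) were employed. MAE describes the average deviation between the estimated and actual tool wear values, RMSE penalizes large errors more heavily because the errors are squared before averaging, and MAPE expresses the error relative to the actual value. The expressions are as follows:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

where $y_i$ is the actual tool wear value, $\hat{y}_i$ is the estimated value, and $N$ is the number of samples.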
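For reference, the three metrics can be computed with a few lines of standard-library Python. The function names and the sample values below are illustrative and are not taken from the experiments.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Made-up wear values for demonstration only.
actual = [100.0, 120.0, 150.0, 200.0]
predicted = [98.0, 123.0, 150.0, 190.0]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

In this toy case the single 10-unit error dominates the RMSE (about 5.3) far more than the MAE (3.75), which is why the two are usually reported together.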
5.2. Performance of the DResNet-1d Model for Tool Condition Monitoring
To assess the effectiveness and superiority of the DResNet-1d model, several common models were compared with it on the same datasets. The comparison models are listed as follows:
SVR: traditional machine learning model based on support vector regression;
1D-CNN: deep learning model based on a one-dimensional convolutional neural network;
BiLSTM: a variant of the RNN, with two LSTMs running in opposite directions;
BiGRU: a variant of the LSTM network with fewer parameters, with two GRUs running in opposite directions.
To make the comparison between the five models fair, the main parameters were kept consistent and a series of unified settings was adopted. For example, the optimizer was Adam, the dropout rate was 0.5, and the loss function was MSE.
The validation results of C1 are presented in Figure 13. The evaluation metrics for each model are listed in Table 3. Comparing the validation results, the deep learning models outperformed the traditional machine learning model. Among the deep learning methods, the mean RMSE and MAE values of the DResNet-1d model were 3.09 and 2.28, respectively, which stand out in comparison with the results of the other models. This suggests that incorporating residual connections into the CNN mitigates issues such as vanishing gradients and network degradation, allowing the network to extract features from the signal efficiently and to learn more abstract representations.
5.3. Effectiveness of Smoothing Correction for Tool Condition Monitoring
It can be observed from the results of the monitoring that there were inevitably local fluctuations and outliers in the wear curve of each model, which affected the total accuracy. To improve the accuracy of estimation results, it is possible to improve the structure of the model, or to correct the estimation to reduce local fluctuations in the curve.
Some traditional algorithms have been used for time-series smoothing, such as moving average models and autoregressive integrated moving average (ARIMA) models. Considering the latency and performance of the model, triple exponential smoothing can be chosen [37].
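The text does not state which variant of triple exponential smoothing was used. The sketch below assumes Brown's non-seasonal form, in which the series is exponentially smoothed three times and the smoothed level is recovered as 3·S1 − 3·S2 + S3. The function name, the value `alpha=0.5`, the initialization with the first sample, and the wear values are all illustrative assumptions.

```python
def triple_exponential_smoothing(series, alpha=0.5):
    """Brown's triple exponential smoothing (assumed variant), returning the level."""
    s1 = s2 = s3 = series[0]  # initialize all three stages with the first sample
    smoothed = []
    for x in series:
        s1 = alpha * x + (1 - alpha) * s1    # first smoothing pass
        s2 = alpha * s1 + (1 - alpha) * s2   # second pass, applied to s1
        s3 = alpha * s2 + (1 - alpha) * s3   # third pass, applied to s2
        smoothed.append(3 * s1 - 3 * s2 + s3)
    return smoothed

# Illustrative noisy wear estimates.
wear = [100.0, 104.0, 103.0, 109.0, 115.0, 112.0, 120.0]
print([round(v, 2) for v in triple_exponential_smoothing(wear)])
```

A smaller `alpha` gives more weight to history and therefore a smoother but slower-reacting curve; the method only needs the three running values, which is what makes it cheap enough for in-process use.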
The encoder- and decoder-based temporal model was used to perform smoothing correction on the estimation. The comparison of the results after correction using the temporal model and the triple exponential smoothing algorithm is shown in Figure 14, and the MAE and RMSE are listed in Table 4.
The experimental results demonstrated that both methods performed well, reducing local fluctuations between adjacent time steps while retaining the trend of the wear curve, which made the estimated tool wear values more stable. However, although the triple exponential smoothing method is simpler, its overall smoothing effect was inferior to that of the encoder- and decoder-based temporal model.
5.4. Performance of Tool Wear Prediction Model on Multi-Step Prediction
Similar to the smoothing process, the tool wear prediction model took as inputs multiple consecutive historical tool wear values estimated by the DResNet-1d model and computed predicted future tool wear values. It established a mapping relationship between historical tool wear information and future tool wear values. In this paper, both short-term and long-term predictions were performed: tool wear values were predicted for each dataset 5, 10, and 15 moments into the future. The results of the five-step prediction are shown in Figure 15. To validate the performance of the proposed model, its results were compared with those of standard deep learning models (CNN and GRU) and advanced deep learning models (Transformer and SMAML). All the results are shown in Table 5.
According to the experimental results listed in Table 5, the proposed prediction model outperformed both the basic and the advanced deep learning models on multi-step prediction, which indicates that the model can learn the short-term features and long-term trends of the time-series information. Meanwhile, the RMSE and MAE increased from 5 to 10 to 15 steps, indicating a higher accuracy in short-term prediction than in long-term prediction.
5.5. Validation of Generalization Capability of the Tool Wear Prediction Model
The proposed tool wear prediction method was applied to the machining of carbon fiber-reinforced polymer (CFRP) to test its generalization in a real machining environment. Side milling experiments were conducted on a three-axis vertical machining center (VMC855) in a dry cutting environment. The workpiece material was CFRP with a size of 450 mm × 40 mm × 11 mm, and the milling cutter was equipped with one APMT1135 PCD insert with a cutting diameter of 20 mm (specific process parameters are shown in Table 6).
As shown in Figure 16, the force acquisition system collected the three-axis cutting force signals Fx, Fy, and Fz and the Z-axis bending moment Mz. It consisted of a wireless rotary dynamometer (Kistler-9170B), a wireless transmission module, and display software (PTS App Type Z22059-900). The sampling frequency was 2.5 kHz. The acceleration acquisition system consisted of a self-developed ADXL356CEZ triaxial capacitive acceleration sensor, a wireless acquisition board, and self-developed visualization software from Chongqing University. The accelerometer was attached to the surface of the workpiece with adhesive and the sampling frequency was 10 kHz.
Three single-tooth tools were used for multiple milling experiments on CFRP components. Taking a wear value of 300 μm as the criterion for tool failure, each tool performed approximately 80–100 milling passes. Three datasets consisting of cutting signals and wear values were obtained, i.e., T1, T2, and T3. To minimize measurement errors, a digital microscope (MV-HM2000GM) was used to collect the tool wear images after each cutting experiment without disassembling the inserts, and the corresponding tool wear values were measured. The tool life for one set of experiments is shown in Figure 17, where the width of the flank wear increased significantly with the number of cuts.
Multi-domain feature extraction was performed on signals from six channels (cutting force and cutting vibration in three mutually perpendicular directions). Figure 18 illustrates the signal features for the three sets of experiments, with the different features distinguished by color. Then, the ten features described in Table 1 were selected from each experiment.
The extracted features were input into the monitoring and prediction model to verify model accuracy. Figure 19 and Figure 20 show the predicted results of flank wear in the three tool experiments. Specific results are shown in Table 7 and Table 8. The prediction results show that the proposed model was able to accurately monitor and predict tool wear during the milling of CFRP workpieces, which supports the ability of the model to generalize to other machining conditions.