Machine Tool Wear Prediction Technology Based on Multi-Sensor Information Fusion

The intelligent monitoring of cutting tools used in the manufacturing industry is steadily becoming more convenient. To accurately predict the state of tools and tool breakages, this study proposes a tool wear prediction technique based on multi-sensor information fusion. First, the vibrational, current, and cutting force signals transmitted during the machining process were collected, and the features were extracted. Next, the Kalman filtering algorithm was used for feature fusion, and a predictive model for tool wear was constructed by combining the ResNet and long short-term memory (LSTM) models (called ResNet-LSTM). Experimental data for thin-walled parts obtained under various machining conditions were utilized to monitor the changes in tool conditions. A comparison between the ResNet and LSTM tool wear prediction models indicated that the proposed ResNet-LSTM model significantly improved the prediction accuracy compared to the individual LSTM and ResNet models. Moreover, ResNet-LSTM exhibited adaptive noise reduction capabilities at the front end of the network for signal feature extraction, thereby enhancing the signal feature extraction capability. The ResNet-LSTM model yielded an average prediction error of 0.0085 mm and a tool wear prediction accuracy of 98.25%. These results validate the feasibility of the tool wear prediction method proposed in this study.


Introduction
With the rapid development of modern industries and scientific technology, manufacturing equipment is gradually becoming larger, more integrated, faster, more automated, and more intelligent. In the manufacturing industry, computer numerical control (CNC) milling is widely used, and cutting tools are important because they directly affect the dimensional accuracy and surface quality of products. To cope with large-scale processing environments, it is convenient to replace tools after a specified number of pieces or a specified machining time. However, this method has certain limitations. First, it relies heavily on worker experience to judge tool wear. Second, replacing tools by piece count or machining time cannot accurately determine the service life of the tools, which may lead to unnecessary tool waste and, more significantly, affect the quality of the products. Developing tool wear prediction technology can avoid tool damage and other problems, help to improve tool chip speed, and lead to substantial savings in production costs [1].
Tool wear prediction methods can be divided into two general categories: direct and indirect [2]. Direct measurement methods involve directly measuring tool wear using equipment such as microscopes to determine the degree of tool wear. In contrast, indirect measurement methods predict tool wear based on relevant machining parameters. Using high-magnification microscopes to directly capture images of the cutting edges of tools studies demonstrate the diverse applications of traditional machine learning methods in tool wear prediction.
Deep learning methods are also very common in the fields of feature extraction and tool wear prediction. Convolutional neural networks (CNNs) are typically used to extract key features and predict tool wear amounts [17]. For example, Lu et al. [18] proposed the use of shallow CNNs in the feature extraction of monitoring signals. In addition, Kong et al. [19] proposed a tool wear prediction model based on an integrated radial basis function kernel principal component analysis (KPCA_IRBF) and a relevance vector machine (RVM). Compared with traditional methods, such as partial least squares (PLS), artificial neural networks (ANN), and support vector machines (SVM), the RVM method provided more accurate predictions and offered additional advantages in terms of confidence intervals. Zhang et al. [20] proposed an improved integrated estimation method based on long short-term memory (LSTM) networks and particle filter (PF) algorithms. The integrated PF-LSTM recognition method predicted the random tool wear process based on historical measurement data, and the accuracy of the PF-LSTM method was verified through micromilling experiments.
Mathematical and modeling methods are also used to diagnose tool wear. For example, Awasthi et al. [21] developed a physics-based digital twin method for tool wear diagnosis during machining. For milling tools, information theory methods were used to optimize the test design and sensor suites were used for fault detection, thereby improving the inference of the tool wear. The robustness of the design was verified using dynamic time warping and k-NN classification methods. Li et al. [22] proposed a new physics-based meta-learning framework to predict tool wear at different wear rates. Piecewise fitting parameters were used to combine data-driven analysis and parameter estimation, which ensured the accuracy of the parameters, improved the interpretability of the tool wear prediction, and accurately reflected changes in tool wear rates.
However, regardless of whether a single-sensor detection method, multi-sensor fusion method, or machine learning algorithm is used, none of these methods consider the influence of multiple operating conditions during processing. Most methods primarily focus on monitoring processing under a single working condition and cannot adapt to the complex and dynamic conditions in actual processing situations.
Therefore, in this study, we conducted an analysis of the characteristics of the processed parts to select appropriate sensors as signal sources. To ensure the processing quality and efficiency of the parts and avoid losses caused by tool breakage, a tool wear prediction technology based on multi-sensor information fusion is proposed. The technology monitors changes in tool status during the processing of thin-walled parts. To improve the accuracy of tool wear prediction, data collected by sensors during processing were used for model training and prediction, and a predictive model for tool wear based on combining the ResNet and long short-term memory (LSTM) models (called ResNet-LSTM) was constructed. Experimental data for thin-walled parts under various machining conditions were utilized to monitor the changes in tool conditions. The proposed ResNet-LSTM model significantly improved the prediction accuracy compared to the individual LSTM and ResNet models.
The basic structure of the method developed in this study is shown in Figure 1. The rest of this paper is organized as follows. Section 2 introduces the data fusion method and describes the construction of the model, and Section 3 describes the data collection process. Section 4 analyzes the results of the processing experiments and model predictions, and validates the accuracy of the model. Finally, Section 5 concludes the study.

Multi-Sensor Information Fusion Technology
The complexity of the tool-cutting process results in the generation of signals in a non-stationary state, which poses challenges for tool monitoring. Traditional single-sensor monitoring methods can reduce the accuracy and reliability of analyses, particularly when they are utilized improperly. In addition, the complex and interrelated structures of machine tool systems can easily lead to one-sidedness when single-sensor monitoring methods are used.
Multi-sensor information fusion technology is a comprehensive automated information-processing method that has become widely researched. Bayesian inference, Kalman filtering, fuzzy set theory, neural networks, and wavelet analysis methods are commonly used for information fusion. The application of these methods enables more accurate data processing and more effective decisions, thereby improving the performance and reliability of systems. The main goal of information fusion is to extract as much valid information as possible from the measured objects and environment by optimizing the combination of observations from various sensors.
The structure of a state-recognition system based on multi-sensor information fusion is illustrated in Figure 2. This study utilized the weighted observation fusion Kalman estimation algorithm to handle the problem of fusing large amounts of data from multiple sensors. Details of the equations can be found in Reference [23]. Based on data fusion, the initial values $x_0$ and $P_0$ are set. At time $k$, measurements are obtained from the sensors, and these values are denoted as $z_k$. Then, using a recursive method, the state estimate at time $k$, denoted as $\hat{x}_k$ $(k = 1, 2, \ldots, N)$, is calculated. These steps are repeated until the estimation requirements are satisfied, which terminates the recursive calculations.
The basic principle of Kalman filtering involves the "predict-measure-correct" logical sequence to eliminate interference data from the collected sensor data and reconstruct the system's state vector using the measured values, thereby effectively estimating the state data. The state equation of the system infers the current state based on the previous state and control variables, and is calculated as follows:

$$x_k = A x_{k-1} + B u_{k-1} + w_{k-1}$$

where $x_k$ is the n-dimensional vector of state components, $A$ denotes the state transition matrix, $u_{k-1}$ is the external input that the system can accept, $B$ is the matrix that converts the inputs into states, and $w_{k-1}$ is the noise of the prediction process (corresponding to the noise of each component in $x_k$), which is Gaussian white noise with an expectation of 0 and a covariance of $Q$. The system's observation equation is expressed as follows:

$$z_k = H x_k + v_k$$

where $z_k$ is the measurement value and input of the filter, $H$ is the matrix used to transform the state variables, and $v_k$ is the observation noise that follows a Gaussian distribution $N(0, R)$. The basic steps involved in the Kalman filter are as follows:

Step 1: Predict an estimate:

$$\hat{x}_k^- = A \hat{x}_{k-1} + B u_{k-1}$$

Step 2: Compute the covariance:

$$P_k^- = A P_{k-1} A^T + Q$$

Step 3: Compute the Kalman gain $K_k$:

$$K_k = P_k^- H^T \left(H P_k^- H^T + R\right)^{-1}$$

Step 4: Update the estimate:

$$\hat{x}_k = \hat{x}_k^- + K_k \left(z_k - H \hat{x}_k^-\right)$$

Step 5: Update the estimate covariance for the next time step using the following:

$$P_k = \left(I - K_k H\right) P_k^-$$

where $I$ is the identity matrix. The noise $w$ (system error) and observation noise $v$ (measurement error) in the state and measurement equations are generally assumed to be Gaussian white noise that follows a normal distribution, $p(w) \sim N(0, Q)$ and $p(v) \sim N(0, R)$, where $Q$ and $R$ are different covariance matrices at time $k$.

Sensors 2024, 24, 2652 6 of 24

In the practical collection of spindle vibration signals from machine tools, the obtained signals often contain not only the original vibration signal, but also other noise or interference signals with high randomness. Noise signals are a common problem in signal analysis and may originate from various sources of interference, such as electromagnetic waves and mechanical vibrations. In the analysis process, a series of denoising measures is required to reduce the influence of noise, thereby improving the reliability and accuracy of the signal.
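The "predict-measure-correct" recursion (Steps 1 to 5) given earlier in this section can be illustrated with a minimal scalar sketch. It is not the implementation used in this study: it assumes a single state variable with $A = H = 1$, no control input, and known sensor noise variances, and it takes "weighted observation fusion" to mean an inverse-variance weighted average of the sensor readings before the update. The function name `kalman_fuse` and all default values are hypothetical.

```python
def kalman_fuse(measurements, variances, x0=0.0, p0=1.0, a=1.0, q=1e-4):
    """Scalar Kalman filter that fuses several sensor readings per time step.

    measurements: list of time steps, each a list with one reading per sensor.
    variances:    assumed observation noise variance R of each sensor.
    Returns the list of state estimates and the final estimate covariance.
    """
    x, p = x0, p0
    # Inverse-variance weights (assumption: sensors observe the same state, H = 1).
    weights = [1.0 / r for r in variances]
    r_fused = 1.0 / sum(weights)
    estimates = []
    for z_k in measurements:
        # Step 1: predict the state (no control input, so B u = 0).
        x_pred = a * x
        # Step 2: predict the covariance.
        p_pred = a * p * a + q
        # Fuse the sensor readings into one weighted observation.
        z_fused = r_fused * sum(w * z for w, z in zip(weights, z_k))
        # Step 3: Kalman gain.
        k_gain = p_pred / (p_pred + r_fused)
        # Step 4: correct the estimate with the fused observation.
        x = x_pred + k_gain * (z_fused - x_pred)
        # Step 5: update the covariance for the next time step.
        p = (1.0 - k_gain) * p_pred
        estimates.append(x)
    return estimates, p


# Two hypothetical sensors observing a constant value of 5.0.
estimates, final_p = kalman_fuse([[5.0, 5.0]] * 20, [0.04, 0.09])
```

Because the fused observation has a smaller variance than either sensor alone, the gain stays higher and the estimate settles faster than it would with a single sensor.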
The wavelet packet transform is a multiscale time-frequency domain transformation method commonly used in signal analysis [24]. It can decompose high-frequency band signals into subsignals with local characteristics, thereby providing more detailed information about the signal. This method can be applied to analyze and extract changes in the state of the monitoring equipment [25].
The wavelet packet is defined as follows:

$$u_{2n}(t) = \sqrt{2}\sum_{k} h(k)\,u_n(2t-k), \qquad u_{2n+1}(t) = \sqrt{2}\sum_{k} g(k)\,u_n(2t-k)$$

where $h(k)$ denotes the coefficients of the low-pass filter and $g(k)$ denotes the coefficients of the high-pass filter. At the $j$-th level of the wavelet packet decomposition, there are a total of $2^j$ wavelet packet bases. When $n = 0$, $u_0(t) = \phi(t)$ and $u_1(t) = \psi(t)$, so the scaling function $\phi(t)$ and basic wavelet function $\psi(t)$ are defined as follows:

$$\phi(t) = \sqrt{2}\sum_{k} h(k)\,\phi(2t-k), \qquad \psi(t) = \sqrt{2}\sum_{k} g(k)\,\phi(2t-k)$$

respectively. Using the method for determining the number of decomposition levels mentioned above, the optimal number of decomposition levels was determined to be three. Therefore, the signal was subjected to three-level wavelet packet decomposition, as shown in Figure 3.
In the figure, signal $X(t)$ represents the original signal before decomposition. This is decomposed into a low-frequency component signal (obtained using the low-pass filter coefficients $h(k)$) and a high-frequency component signal (obtained using the high-pass filter coefficients $g(k)$). The high- and low-pass filter coefficients must satisfy the following orthogonal relationship:

$$g(k) = (-1)^k\,h(1-k)$$

The decomposed signals obtained at different decomposition levels are calculated layer-by-layer using the following equations:

$$d_{j+1}^{2n}(k) = \sum_{l} h(l-2k)\,d_j^{n}(l), \qquad d_{j+1}^{2n+1}(k) = \sum_{l} g(l-2k)\,d_j^{n}(l)$$

where $d_j^{n}$ denotes the coefficients of the $n$-th node at level $j$. Following the aforementioned decomposition method, after the signal undergoes wavelet packet decomposition at the $i$-th level, $2^i$ characteristic signals are obtained, each corresponding to a specific frequency band.
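As an illustration of the layer-by-layer decomposition, the following sketch builds a three-level wavelet packet tree in plain Python. The Haar filter pair is an assumption made for brevity (the wavelet basis used in this study is not stated here), and the function names are hypothetical. The sub-bands are returned in natural tree order rather than in order of increasing frequency, and the signal length is assumed to be divisible by $2^3$.

```python
import math


def haar_split(signal):
    """One decomposition step: return the (low-pass, high-pass) halves of a signal."""
    s = 1.0 / math.sqrt(2.0)
    low = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    high = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return low, high


def wavelet_packet(signal, levels=3):
    """Full wavelet packet tree: every node is split again, giving 2**levels sub-bands."""
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            low, high = haar_split(node)
            next_nodes.extend([low, high])
        nodes = next_nodes
    return nodes


def band_energies(nodes):
    """Energy of each sub-band, a common per-band feature."""
    return [sum(c * c for c in node) for node in nodes]


# A hypothetical 16-sample signal decomposed into 8 sub-bands of 2 coefficients each.
bands = wavelet_packet([math.sin(0.4 * i) for i in range(16)], levels=3)
energies = band_energies(bands)
```

Unlike the ordinary wavelet transform, which only splits the low-frequency branch again, every node is split at each level, which is why three levels yield eight frequency bands.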

Figure 3. Tree structure of the three-level wavelet packet decomposition.

Time-Frequency Domain Feature Extraction Based on Wavelet Packet and Sample Entropy
Sample entropy, proposed by Richman and Moorman in 2000 as an improvement on approximate entropy, is a method for measuring the complexity of a time series. This method can be used to analyze the time series obtained from continuously sampled processes. In theory, the sample entropy reflects the irregularity and complexity of signals and is considered a useful tool for analyzing vibration signals [26]. By applying sample entropy, a better understanding of the characteristics of the vibration signals can be attained. The specific steps of the algorithm are as follows.

Step 1: Let the original sequence of $n$ samples be $x(1), x(2), \ldots, x(n)$.

Step 2: Denoting the pattern dimension as $m$, construct an $m$-dimensional vector from the original sequence:

$$X_m(i) = [x(i), x(i+1), \ldots, x(i+m-1)], \quad i = 1, 2, \ldots, n-m+1$$

Step 3: Define the distance between $X_m(i)$ and $X_m(j)$ as follows:

$$d(i, j) = \max_{0 \le k \le m-1} \left| x(i+k) - x(j+k) \right|$$

Step 4: Set a threshold value $r$, and for each $i$, compute the ratio of the number of $d(i, j) < r$ occurrences to the total number of vectors $n - m + 1$, denoted as $B_i^m(r)$:

$$B_i^m(r) = \frac{1}{n-m+1}\,\mathrm{num}\{d(i, j) < r\}$$

Calculate the mean of $B_i^m(r)$ for all $i$ values:

$$B^m(r) = \frac{1}{n-m+1}\sum_{i=1}^{n-m+1} B_i^m(r)$$

Step 5: For $m + 1$ dimensions, repeat Steps 2-4 to obtain $B_i^{m+1}(r)$ and its mean $B^{m+1}(r)$. The sample entropy of the sequence is then obtained as follows:

$$\mathrm{SampEn}(m, r) = \lim_{n \to \infty}\left\{-\ln\frac{B^{m+1}(r)}{B^m(r)}\right\}$$

In practical vibration signals, $n$ adopts a finite value; therefore, the estimated sample entropy of the sequence is:

$$\mathrm{SampEn}(m, r, n) = -\ln\frac{B^{m+1}(r)}{B^m(r)}$$
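The steps above can be sketched in plain Python as follows. This is a minimal illustration under the common Richman and Moorman convention, which is an assumption here: self-matches ($i = j$) are excluded, and the same $n - m$ template start positions are used for both dimensions so that the normalizing constants cancel in the ratio. The threshold `r` is an absolute value (in practice it is often chosen relative to the standard deviation of the signal), and the function name is hypothetical.

```python
import math


def sample_entropy(x, m=2, r=0.2):
    """Estimate SampEn(m, r, n) of the sequence x using Chebyshev distance."""
    n = len(x)

    def count_matches(length):
        # Templates start at positions 0 .. n-m-1 for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(len(templates)):
                if i == j:
                    continue  # exclude self-matches
                d = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if d < r:
                    count += 1
        return count

    b = count_matches(m)      # matches in dimension m
    a = count_matches(m + 1)  # matches in dimension m + 1
    if a == 0 or b == 0:
        return float("inf")   # entropy is undefined when no matches are found
    return -math.log(a / b)


# A perfectly periodic sequence is fully predictable, so its sample entropy is zero.
periodic = sample_entropy([1, 2] * 5, m=2, r=0.5)
```

A regular signal keeps matching when the template is extended by one sample, so the ratio stays near 1 and the entropy near 0; an irregular signal loses matches and the entropy grows.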

LSTM-Based Tool Wear Prediction Model
LSTM is a special variant of recurrent neural networks (RNN). It features unique "gate" structures that address the drawbacks of traditional RNNs, such as the excessive influence of the weights, which leads to gradient explosion or vanishing. LSTM networks converge faster and more effectively, resulting in an improved prediction accuracy.
LSTM networks consist of three crucial gates: forget, input, and output. These gates collaborate to determine what information is memorized and forgotten at each moment. Specifically, at each moment, they control the amount of new information added to the cell, whether information is forgotten, and whether any information is used as output. This gate control mechanism enables LSTM networks to more effectively capture long-term dependencies in time-series data, qualifying them as excellent tools for processing data with temporal properties, such as speech and text. In addition, the gate mechanisms of LSTM effectively address the issues with traditional RNNs, making neural networks more suitable for handling sequential data as well as improving model performance and learning capabilities. The basic structure of LSTM is illustrated in Figure 4. Equations are detailed in Reference [27]. In the forget gate, a sigmoid function determines the information discarded from the cell state and is expressed as follows:

$$\Gamma_f = \sigma\left(\omega_f[\alpha_{t-1}, x_t] + b_f\right)$$

where the output at time step $t-1$ is denoted by $\alpha_{t-1}$, the input at time $t$ is denoted by $x_t$, the weight of each variable is represented by $\omega_f$, the bias term is denoted by $b_f$, and $\sigma(x)$ represents the sigmoid function, which is defined as follows:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Here, $\Gamma_f$ ranges between 0 and 1, which indicates the extent to which each value in the cell state $c_{t-1}$ should be preserved; a value of 1 indicates "fully retained" and a value of 0 indicates "completely discarded". Updating the information stored in the cell state is the primary function of the input gate and involves the following three steps.
Step 1: The sigmoid function of the input gate is used to compute the result $\Gamma_u$, which determines which values to update.
Step 2: A new candidate value vector $\tilde{c}_t$ is created based on the tanh function and added to the new cell.
Step 3: The old cell state is multiplied by the forget gate to forget some of the old information. Then, the product $\Gamma_u * \tilde{c}_t$ is added, that is, the new candidate value scaled by how much each state value is to be updated. Finally, the current cell state is updated. The formulas are expressed as follows:

$$\Gamma_u = \sigma\left(\omega_u[\alpha_{t-1}, x_t] + b_u\right)$$

$$\tilde{c}_t = \tanh\left(\omega_c[\alpha_{t-1}, x_t] + b_c\right)$$

$$c_t = \Gamma_f * c_{t-1} + \Gamma_u * \tilde{c}_t$$

where the weights $\omega_u$, $\omega_c$ and biases $b_u$, $b_c$ are defined analogously to $\omega_f$ and $b_f$. The $\Gamma_u$ values range from 0 to 1, whereas the tanh function is a hyperbolic tangent activation function with an output range of −1 to 1. The cell state value at time $t-1$ is denoted as $c_{t-1}$, $\tilde{c}_t$ represents the recorded information to be extracted from the input information at time $t$, and $c_t$ denotes the updated cell state value. The sigmoid function determines the amount of output information controlled by the output gate. The value of $c_t$ is passed through the tanh function and multiplied by $\Gamma_o$ to obtain the output value at time $t$, as expressed by

$$\Gamma_o = \sigma\left(\omega_o[\alpha_{t-1}, x_t] + b_o\right)$$

$$\alpha_t = \Gamma_o * \tanh(c_t)$$

In summary, processing within a single neuron relies on the three control gates, a mechanism that makes the best use of the input data and forms memories of long-term past data in the LSTM model.
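To make the interaction of the forget, input, and output gates concrete, the sketch below computes a single LSTM time step in plain Python. It is an illustration only, not the network trained in this study: the parameter layout (a dictionary mapping `"f"`, `"u"`, `"c"`, `"o"` to a weight matrix and bias vector) and the function names are hypothetical, and vectors are plain lists.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(w, b, vec):
    """Return w @ vec + b for a weight matrix given as a list of rows."""
    return [sum(wi * vi for wi, vi in zip(row, vec)) + bi for row, bi in zip(w, b)]


def lstm_step(x_t, a_prev, c_prev, params):
    """One LSTM time step; returns the new output a_t and cell state c_t."""
    concat = a_prev + x_t  # concatenated vector [a_{t-1}, x_t]
    gate_f = [sigmoid(v) for v in affine(*params["f"], concat)]    # forget gate
    gate_u = [sigmoid(v) for v in affine(*params["u"], concat)]    # input (update) gate
    c_cand = [math.tanh(v) for v in affine(*params["c"], concat)]  # candidate values
    gate_o = [sigmoid(v) for v in affine(*params["o"], concat)]    # output gate
    # Keep part of the old state and add the scaled candidate values.
    c_t = [f * c + u * cc for f, c, u, cc in zip(gate_f, c_prev, gate_u, c_cand)]
    # The output is the gated, squashed cell state.
    a_t = [o * math.tanh(c) for o, c in zip(gate_o, c_t)]
    return a_t, c_t


# Hidden size 2, input size 1, all-zero parameters (every gate then equals 0.5).
zero = {k: ([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0]) for k in "fuco"}
a_t, c_t = lstm_step([1.0], [0.0, 0.0], [2.0, -2.0], zero)
```

With all-zero parameters each gate outputs 0.5 and the candidate is 0, so the step simply halves the previous cell state, which makes the role of the forget gate easy to see.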
(1) Building the LSTM network model
According to [27], the use of up to three layers yields optimal results for LSTM models. Therefore, a two-layer LSTM network was constructed for this experiment. Its structure is displayed in Figure 5.
First, the data collected from the sensors, including the X-, Y-, and Z-axis vibrations, force signals, and the current signal, were preprocessed. When each signal component was treated separately, the input layer dimension was set to 6, resulting in $X = [x_1, x_2, x_3, x_4, x_5, x_6, x_7]$. However, when the feature vectors were used as the input, the input layer dimension was set to 40, resulting in $X = [x_1, x_2, x_3, \ldots, x_{40}]$. Next, the number of neurons in the hidden layer was set to 100 to retain both the long- and short-term memory information. Subsequently, the number of neurons in the hidden layer was adjusted to 50 and then reduced to 20 before proceeding with tool wear prediction. The dimension of the fully connected output layer was set to 1, enabling the tool wear to be predicted based on the output value. This structural design aimed to fully utilize the hierarchical structure of neural networks and memory units at different levels to achieve a more accurate tool wear prediction.
(2) Network parameter configuration
Step 1: Normalization: The data were normalized.
Step 2: Loss function calculation: The root mean square error (RMSE) was selected as the loss function in the LSTM prediction; it was defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(\hat{y}_t - y_t\right)^2}$$

where $\hat{y}_t$ represents the predicted value, $y_t$ is the true value, and $T$ is the number of samples.
Step 3: Evaluation metrics: The selection of the evaluation metrics significantly affected the assessment of the experimental results. In this study, three coefficients, namely, the mean absolute error (MAE), RMSE, and coefficient of determination $R^2$, were chosen as indicators to evaluate the model's prediction capability. The latter ($R^2$) represents the degree of fit between the predicted and actual data (the higher the value of $R^2$, the better the fit), and served as the criterion to determine the accuracy of the model's predictions. It is expressed as follows:

$$R^2 = 1 - \frac{\sum_{i}\left(y_i - \hat{y}_i\right)^2}{\sum_{i}\left(y_i - \bar{y}\right)^2}$$

where $y_i$ represents the true value, $\hat{y}_i$ denotes the predicted value, and $\bar{y}$ is the mean of the actual values. The initialization parameters for the LSTM network model are shown in Table 1.

Predictive Model of Tool Wear Based on ResNet
ResNet addresses an insufficiency in feature extraction capability by introducing the concept of identity mapping. This concept allows the network to learn residuals instead of directly learning low-level features, thereby facilitating gradient propagation. The ResNet network model proposed by Yu et al. [28] consists of multiple residual modules that are stacked together. The structure of these residual modules helps maintain a stable gradient propagation, enabling the network to learn features at deeper levels. The structure of residual modules is shown in Figure 6.
First, the residual module transforms the input $x$ into an output $H(x)$. Here, $H(x)$ can be computed by simply adding $F(x)$ and $x$: $H(x) = F(x) + x$. This formula indicates that the output $H(x)$ is composed of the residual part $F(x)$ and the input $x$. The purpose of this design is to maintain the integrity of information propagation via identity mapping, which maps the input directly to the output without any change. By introducing identity mapping, residual networks can prevent the degradation of network performance as the depth increases.
Second, networks designed with identity mapping can focus on learning the residual part $F(x)$. Because the identity mapping part remains unchanged, the network only needs to focus on learning how to better utilize the residual information to improve performance. The advantage of this design is that it simplifies the complexity of the training process. Researchers can focus more on optimizing the residual part to enhance the network's learning ability without being concerned with how identity mapping will degrade the performance. This approach significantly reduces the difficulty of network training because the model only needs to capture the differences between the input and expected outputs.
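The relation $H(x) = F(x) + x$ can be illustrated with a few lines of plain Python. This is only a sketch of the identity shortcut, not a trainable ResNet layer: `residual_fn` stands in for the learned mapping $F$, and both function names are hypothetical.

```python
def residual_block(x, residual_fn):
    """Return H(x) = F(x) + x, where residual_fn plays the role of F."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("F(x) must have the same length as x for the identity shortcut")
    return [f + xi for f, xi in zip(fx, x)]


def small_correction(x):
    """A stand-in residual branch that adds a 10% correction to each feature."""
    return [0.1 * v for v in x]


features = [1.0, 2.0, 3.0]
# If the residual branch learns nothing (F(x) = 0), the block passes x through unchanged.
unchanged = residual_block(features, lambda v: [0.0] * len(v))
# Stacked blocks only need to learn small corrections to their input.
refined = residual_block(residual_block(features, small_correction), small_correction)
```

The first call shows why depth does not have to degrade performance: a block whose residual branch outputs zero is exactly the identity mapping.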
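In the predictive model, metrics such as the MAE, RMSE, and $R^2$ were primarily used as evaluation indicators, and they were defined as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad \mathrm{SSR} = \sum_{i=1}^{n}\left(\hat{y}_i - \bar{y}\right)^2, \quad \mathrm{SSE} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \quad \mathrm{SST} = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$$

where $n$ is the number of samples, $y_i$ represents the true tool wear value, $\hat{y}_i$ represents the tool wear value predicted by the model, $\bar{y}$ represents the mean of the true values, SSR is the "sum of squares due to regression" and measures the total variation explained by the regression model, SSE is the "sum of squares due to error" and measures the variation that is unexplained by the regression model, and SST is the "total sum of squares" and represents the total variation in the true tool wear values. In regression analysis, these metrics are fundamental for assessing how well the model's predictions align with the actual values and how much of the total variation in the data is explained by the model.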
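The three evaluation indicators can be computed directly from paired true and predicted wear values, as in the following sketch. The function name and the sample values are hypothetical and are not data from this study; $R^2$ is computed as $1 - \mathrm{SSE}/\mathrm{SST}$ with the mean taken over the true values.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired true and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # SSE: variation left unexplained by the model.
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # SST: total variation of the true values around their mean.
    sst = sum((y - mean_true) ** 2 for y in y_true)
    mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(sse / n)
    r2 = 1.0 - sse / sst
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Hypothetical wear values in mm, for illustration only.
metrics = regression_metrics([0.10, 0.12, 0.15, 0.20], [0.11, 0.12, 0.14, 0.21])
```

MAE and RMSE are in the same unit as the wear values, whereas $R^2$ is dimensionless; note that $R^2$ is undefined when all true values are identical, because SST is then zero.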

Introduction to Tool Wear States
Tool wear can be broadly categorized as normal or abnormal. Normal tool wear is primarily caused by friction, high temperatures, and vibrations. In CNC machining, the contact between the tool and metal generates friction, leading to high temperatures and vibrations under complex working conditions. Gradual tool wear occurs during the machining process, which affects tool performance and lifespan. Abnormal wear is caused by various sudden tool failures, which are primarily caused by impact forces generated during milling processes [29]. Tool failure manifests primarily as chipping, cracking, delamination, or plastic deformation.
Figure 7 illustrates a typical tool wear curve, which indicates that tool wear evolves with increasing cutting time in three main stages: the initial wear, normal wear, and rapid wear stages [30].
The characteristics of the tools vary across different wear stages, as shown in Figure 8, which illustrates the three tool wear stages:
(a) Initial wear stage. Figure 8a shows an image of a tool in the initial wear stage. During this stage, the tool exhibits minor wear patterns as it engages with the workpiece. The initial wear is characterized by a slight removal of material from the tool's surface.
(b) Normal wear stage. After machining operations, the tool progresses to the normal wear stage, as depicted in Figure 8b. In this stage, the wear pattern becomes more pronounced, reflecting a consistent removal of material from the tool's surface as the machining operations continue. Although the tool experiences wear, it remains functional.
(c) Rapid wear stage. Figure 8c displays an image of the tool in the rapid wear stage, in which the tool undergoes significant wear, signaling that the end of its lifespan is near. At this stage, the tool exhibits severe damage, such as chipping, cracking, or plastic deformation, indicating imminent failure.

Experimental Design and Data Collection
To obtain raw data for developing the data functions and constructing the algorithms described in the subsequent sections, milling experiments were designed and conducted using an intelligent monitoring system for cutting processes. The workpiece material chosen for acquiring multisource physical data during machining was heat-resistant stainless steel (1Cr11Ni2W2MoV). The cutting experiments were conducted on a VMC-1000B vertical machining center. The workpiece was wire-cut into a rectangular block measuring 200 mm × 100 mm × 30 mm to facilitate clamping. Vibration data were collected using an NI acquisition box, a filtering amplifier, and an RS485 temperature and vibration sensor, as shown in Figure 9a. The cutting tools were HRC550 LYD-type hard alloy end mills, including D8, D10, D12, and D16 double-edge end mills, as shown in Figure 9b.
The milling process involved face milling with a cutter path length of 200 mm and a cutting width of 75% of the tool diameter. In the face milling experiments, the cutting data were obtained under different conditions and the tool was worn to the stage required for the machining experiments.
To develop a predictive model of tool wear applicable to various conditions, milling experiments were conducted by varying the cutting parameters. Signals such as the cutting force and vibration acceleration were collected for different sets of cutting parameters and tool wear stages (initial wear, normal wear, and rapid wear). The machining path and structure of the finished parts are shown in Figures 10a and 10b, respectively. In total, 105 milling experiments were conducted using different cutting parameters. After each cutting operation, the tool wear was measured using an HY-H2100 portable electronic microscope, as shown in Figure 11. After machining, each part was examined using a micrometer, as shown in Figure 12a. To efficiently gather additional data, targeted supplementary experiments on thin-walled specimens were designed and conducted, as shown in Figure 12b.
Figure 13 shows the sensors used during the experimental machining process and their installation positions, including the arrangement of each sensor, the types of tools, and the clamping of the machining material. The experiment was designed using the aforementioned equipment to prepare for the subsequent data collection, experimental analysis, and derivation of results.

Selection of Experimental Data
To ensure the completeness of the experimental data, each operating condition was treated as a separate experimental objective. Complete thin-wall milling was performed to collect the data and validate the results. The five best thin-walled pieces produced during the experiment were selected for analysis. For each thin-walled piece, 20 datasets were chosen based on the processing parameters. Thus, a total of 100 sets of experimental data were analyzed. The selection of the data focused on the x-axis owing to the intense spindle vibration that occurred in this direction during cutting. The experimental parameters and machining conditions are listed in Table 2.

Feature Signal Analysis
Research on the technology used to monitor machine tool spindle vibrations is crucial for reducing downtime and ensuring product quality. Effective monitoring and diagnostic techniques are often required to track the status of equipment. Among the various signals that reflect machine tool status, vibration signals directly indicate the machining status and dynamic characteristics of a machine tool. Therefore, they are widely used to monitor and identify a machine tool's status. Taking the collected vibration signal as an example, the vibration signal after the three-level wavelet packet decomposition is shown in Figure 14, and the frequency-domain signals reconstructed after the decomposition are illustrated in Figure 15. Figure 16 shows the spindle vibration signals and their frequency spectra for four different states.
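A three-level wavelet packet decomposition splits a signal into 2^3 = 8 terminal sub-bands, and the relative energy of each sub-band is a common feature vector. The following is a minimal sketch of that idea, not the implementation used in this study. It assumes a Haar wavelet (the wavelet basis is not stated in this section) and a synthetic test signal; the function names `haar_step` and `wavelet_packet_energies` are hypothetical.

```python
import numpy as np

def haar_step(x):
    """Split a signal into its Haar approximation and detail halves."""
    x = x[: len(x) // 2 * 2]              # drop a trailing odd sample
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-pass branch
    detail = (even - odd) / np.sqrt(2.0)  # high-pass branch
    return approx, detail

def wavelet_packet_energies(signal, levels=3):
    """Relative energy of every terminal node after `levels` full splits.

    Nodes are returned in natural (filter-bank) order, not frequency order.
    """
    nodes = [np.asarray(signal, dtype=float)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            next_nodes.extend(haar_step(node))  # split both branches
        nodes = next_nodes
    energies = np.array([np.sum(n ** 2) for n in nodes])
    return energies / energies.sum()

# Synthetic stand-in for a vibration record (assumed, not experimental data):
# a strong low-frequency component plus a weak high-frequency component.
t = np.arange(1024) / 1024.0
demo = np.sin(2 * np.pi * 8 * t) + 0.1 * np.sin(2 * np.pi * 300 * t)
band_energy = wavelet_packet_energies(demo, levels=3)
```

In this sketch the first node (three successive low-pass splits) carries most of the energy because the synthetic signal is dominated by its low-frequency component. A different wavelet basis would change the leakage between bands but not the overall structure.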
Directly observing the working status of the machine tool spindle from the sensor feature data alone is challenging. Therefore, it is necessary to extract feature coefficients that can effectively characterize the overall spindle and feature parameters that represent the working state under different conditions. These feature parameters can be obtained by analyzing the vibration signal amplitudes, frequencies, and phases. By comparing the feature parameters corresponding to different conditions, the trends in the machine tool spindle vibrations can be determined, which enables abnormal states or faults to be identified. The timely monitoring and diagnosis of the machine tool spindle vibrations can prevent potential failures and enable appropriate maintenance and repair measures to be taken, thus minimizing downtime and maximizing product quality. This process provides critical information for identifying the vibration status and enables a deeper understanding of the operational status of the machine tool.
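To illustrate how feature parameters can be derived from signal amplitudes and frequencies, the sketch below computes a few widely used time- and frequency-domain indicators. The particular feature set (RMS, peak, crest factor, kurtosis, dominant frequency), the function name `vibration_features`, the sampling rate, and the synthetic signals are all assumptions chosen for illustration; the features actually selected in this study may differ.

```python
import numpy as np

def vibration_features(signal, fs):
    """Return simple amplitude and frequency features of one signal segment."""
    x = np.asarray(signal, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    centered = x - x.mean()
    # Non-excess kurtosis: equals 1.5 for a pure sine, grows with impulsiveness.
    kurtosis = np.mean(centered ** 4) / np.mean(centered ** 2) ** 2
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms,
        "kurtosis": kurtosis,
        "dominant_freq_hz": freqs[np.argmax(spectrum)],
    }

# Synthetic stand-ins (assumed): a smooth signal and one with added impacts.
fs = 1000.0
t = np.arange(1000) / fs
smooth = np.sin(2 * np.pi * 50 * t)
impulsive = smooth.copy()
impulsive[::100] += 5.0          # periodic transient impacts

f_smooth = vibration_features(smooth, fs)
f_impulsive = vibration_features(impulsive, fs)
```

Comparing `f_smooth` with `f_impulsive` shows the kind of contrast described above: the dominant frequency is unchanged, whereas the crest factor and kurtosis rise sharply once transient impacts appear.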
In Figure 16, the x-axis represents the number of sampled points and the y-axis represents the amplitude of the vibration signal. Figure 16a shows that during normal stable cutting, the changes in the vibration signal are relatively smooth and regular. This occurs because, during normal wear, the wear intensity of the tool edge is uniform, resulting in a stable signal.
The vibration signals exhibited during moderate wear are shown in Figure 16b. Compared to normal wear, very few transient impacts and abrupt high-frequency components are present. When the wear becomes severe, the temporal signal changes become more pronounced. In the rapid wear signal, a large number of nonstationary random components and abrupt frequency components are present, as shown in Figure 16c. Finally, Figure 16d indicates that the signal changes dramatically when the tool reaches the chipped edge stage. The energy of the chipped edge signal reaches its maximum, which produces transient impact components with much greater intensities than the wear signal.

Results, Discussion, and Analysis
The data collected in the experiments described in the previous section were used to train the model. During the experiment, data were collected from vibration, cutting force, and current sensors on the CNC milling machine worktable in the X-, Y-, and Z-directions. This diverse dataset provided an accurate and comprehensive basis for monitoring tool wear.

LSTM-Based Tool Wear Prediction Model
First, feature extraction was performed on the data collected from the sensors, followed by feature selection. The selected feature vectors were then fed into the LSTM prediction model, and the actual tool wear measured during machining served as the training labels for the model.
LSTM neural network models possess strong self-learning capabilities for handling sequential data. They possess both long- and short-term memories that enable them to extract deep features from sequential data. This implies that LSTM networks can predict and classify sequential data by learning the patterns and rules within the data. In this section, the preprocessed signal data are used as input to directly train the LSTM model and validate its self-learning capabilities.
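The long- and short-term memory mechanism can be made concrete with the standard LSTM cell equations. The sketch below runs one forward pass of a single LSTM layer over a short feature sequence and maps the final hidden state to a scalar. It is only an illustration under stated assumptions: the dimensions, the random (untrained) weights, and the names `lstm_step`, `W`, `U`, `b`, and `w_out` are hypothetical and do not reflect the trained model in this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. Shapes: W (4H, D), U (4H, H), b (4H,)."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate cell state
    c_t = f * c_prev + i * g       # long-term memory update
    h_t = o * np.tanh(c_t)         # short-term (hidden) state
    return h_t, c_t

# Hypothetical sizes: 6 features per step, 4 hidden units, 10 time steps.
rng = np.random.default_rng(0)
D, H, T = 6, 4, 10
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
w_out = rng.normal(scale=0.1, size=H)   # stand-in for a one-neuron output layer

features = rng.normal(size=(T, D))      # synthetic feature sequence
h, c = np.zeros(H), np.zeros(H)
for x_t in features:
    h, c = lstm_step(x_t, h, c, W, U, b)
predicted_wear = float(w_out @ h)       # untrained, so the value is meaningless
```

The cell state `c` is what lets information persist across many time steps, whereas the gates decide at each step how much of it to keep, overwrite, or expose.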
The specific steps of Experiment 1 were as follows. First, feature vectors were obtained from the preprocessed normalized signals of the tools, and they served as input for training the model. This approach effectively connected the tool wear with the features of the monitored signals. During the training phase, the collected wear data were used as labels for supervised model training. In the testing phase, the preprocessed signals were used as the test set to validate the LSTM model's predictions. The loss function stabilized after approximately 120 iterations, yielding an RMSE of 0.0281. This indicated that the model performed well in predicting tool wear.
Subsequently, the preprocessed monitoring signal data were used to test the model, and the LSTM model was employed for tool wear prediction. The evaluation of the prediction results for the training and test sets is shown in Figure 17. The average MAE of the tool wear prediction was 0.0036 mm for the training set and 0.0181 mm for the test set. These results demonstrated the accuracy and feasibility of the proposed method.
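For reference, the two error measures reported above can be computed as in the following sketch. The `measured` and `predicted` arrays are hypothetical values chosen only to exercise the functions; they are not data from the experiments.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the same unit as the inputs."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean square error; penalizes large deviations more than MAE."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical wear values in mm (illustrative only).
measured = np.array([0.10, 0.12, 0.15, 0.19])
predicted = np.array([0.11, 0.12, 0.13, 0.20])

wear_mae = mae(measured, predicted)
wear_rmse = rmse(measured, predicted)
```

For any set of predictions the RMSE is at least as large as the MAE, so a large gap between the two suggests that a few samples contribute disproportionately large errors.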

ResNet-Based Tool Wear Prediction Model
The model training process is illustrated in Figure 18. A fusion feature matrix combining the vibration, current, and cutting force signals was constructed, and this matrix was used to train the tool wear prediction model.
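The abstract states that Kalman filtering was used for feature fusion. One simple way to fuse several sensor-derived indicators of the same quantity is a scalar Kalman filter that applies one measurement update per sensor at every time step. The sketch below is a generic illustration under strong assumptions (a random-walk state model, known measurement variances, synthetic data, and already normalized features). The state model and parameters actually used in this study are not given in this section, and the name `kalman_fuse` is hypothetical.

```python
import numpy as np

def kalman_fuse(measurements, meas_var, process_var=1e-4, x0=0.0, p0=1.0):
    """Fuse S sensor channels into one estimate per time step.

    measurements: array of shape (T, S); meas_var: array of shape (S,).
    """
    z = np.asarray(measurements, dtype=float)
    r = np.asarray(meas_var, dtype=float)
    x, p = x0, p0
    fused = np.empty(z.shape[0])
    for t in range(z.shape[0]):
        p = p + process_var                # predict step (random-walk model)
        for s in range(z.shape[1]):        # sequential update, one sensor at a time
            k = p / (p + r[s])             # Kalman gain for this sensor
            x = x + k * (z[t, s] - x)
            p = (1.0 - k) * p
        fused[t] = x
    return fused

def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Synthetic example (assumed): three noisy views of one slowly rising trend.
rng = np.random.default_rng(42)
true_trend = np.linspace(0.0, 1.0, 200)
noise_std = np.array([0.05, 0.10, 0.20])
sensors = true_trend[:, None] + rng.normal(size=(200, 3)) * noise_std

fused = kalman_fuse(sensors, noise_std ** 2)
fused_rmse = rmse(fused, true_trend)
sensor_rmse = np.array([rmse(sensors[:, s], true_trend) for s in range(3)])
```

Channels with a smaller measurement variance receive a larger gain, so the fused estimate leans on the more reliable sensors while still drawing on the others.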
The specific steps of Experiment 2 were as follows. First, feature vectors were obtained from the tool's full-life monitoring signals, and they served as input for training the model. This approach effectively connected the tool wear with the features of the monitored signals. The experiment indicated that, although the convergence speed of the loss function was relatively slow for the same number of iterations, the overall change in the loss function was significantly smaller, resulting in an RMSE of 0.0182. This indicated that the model performed well in predicting tool wear.
To further extract features from the monitoring signals, a wavelet packet transform was applied. This method allowed a more refined feature extraction, which improved the accuracy of the tool wear prediction. The feature vectors obtained were used as inputs for the ResNet model, which yielded satisfactory tool wear predictions. The evaluation of the prediction results for the training and test sets is shown in Figure 19. The average MAE of the tool wear prediction was 0.0037 mm for the training set and 0.0117 mm for the test set. These results demonstrated the accuracy and feasibility of the proposed method.
Therefore, the results indicated that using feature vectors and the ResNet model for tool wear prediction was effective. This approach is capable not only of improving the prediction accuracy but also of contributing to the timely replacement of worn tools during the manufacturing process, thereby enhancing the production efficiency and product quality.

Prediction Model of Tool Wear Based on ResNet-LSTM
The ResNet-LSTM network model is illustrated in Figure 20. The feature signals, which were preprocessed but not denoised, were converted into grayscale images. Two 3 × 3 convolutional layers were used. The convolutional layers of each residual module were followed by a 2 × 2 max-pooling layer. The number of neurons was set to 100, and they were connected to the LSTM layer through the pooling layer. Two LSTM layers were used, with the numbers of hidden-layer neurons shown in Figure 20. The fully connected layer had one neuron, and its output value represented the predicted tool wear.
Because of the small input dimensions of the ResNet-LSTM network model, the training speed was relatively slow. To improve the training speed, the network model was initialized using the rectified linear unit (ReLU) activation function and the network input dimensions were set to 70 × 70. The batch size of the model was set to 30.
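The conversion of a signal segment into a 70 × 70 grayscale image and the effect of a 2 × 2 max-pooling layer can be sketched as follows. This is an assumed procedure for illustration: the text does not specify how samples are mapped to pixels, so the sketch uses min-max scaling to 0-255 and row-wise reshaping of 4900 samples. The names `signal_to_grayscale` and `max_pool_2x2` and the random segment are hypothetical.

```python
import numpy as np

def signal_to_grayscale(segment, side=70):
    """Map side*side samples to a (side, side) uint8 image via min-max scaling."""
    x = np.asarray(segment, dtype=float)
    if x.size != side * side:
        raise ValueError(f"expected {side * side} samples, got {x.size}")
    span = x.max() - x.min()
    scaled = np.zeros_like(x) if span == 0 else (x - x.min()) / span
    return np.round(scaled * 255).astype(np.uint8).reshape(side, side)

def max_pool_2x2(image):
    """Non-overlapping 2 x 2 max pooling (stride 2) on a 2-D array."""
    h, w = image.shape
    trimmed = image[: h // 2 * 2, : w // 2 * 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Synthetic segment standing in for a preprocessed signal (assumed).
rng = np.random.default_rng(0)
segment = rng.normal(size=70 * 70)

image = signal_to_grayscale(segment)   # 70 x 70 network input
pooled = max_pool_2x2(image)           # 35 x 35 after one pooling layer
batch = np.stack([image] * 30)         # one batch at the stated batch size of 30
```

Each pooling layer halves both spatial dimensions, which is one reason the choice of input size directly affects how many pooled stages the network can contain.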
The input to the ResNet-LSTM network model is the preprocessed signal. The number of iterations was set to 500, and the other training parameters were the same as previously described. The tool life data collected by the sensors during the machining process were used as the training set. The split between the training and test sets was the same as that described in previous sections. After training the model, the loss function approached zero and remained stable. The validation loss corresponded to an RMSE of 0.0101, and the model achieved the expected convergence after approximately 100 iterations.
Subsequently, the model was evaluated using the test set, and the predicted results are shown in Figure 21. The average error of the tool wear prediction for the training set was MAE = 0.0021 mm and that for the test set was MAE = 0.0085 mm. These results indicated that this model provided the most accurate prediction, and that the experimental results were consistent with the expected ideal outcomes.
Table 3 compares the prediction accuracy and wear error of the three network models using the same tool data as the test set. By comparing the prediction results of each model, the following conclusions can be drawn. When using the ResNet network model, wear prediction was performed by extracting the feature vectors of the signal. The experimental results showed that as the number of model layers increased, the loss function significantly decreased. Moreover, as the network depth increased further, the accuracy approached saturation without decreasing. However, after adding two LSTM layers, the accuracy further improved, indicating that the feature extraction of the LSTM model was more effective, improving the tool wear prediction. Finally, the ResNet-LSTM model was proposed by combining residual neural networks with the LSTM network model, which significantly improved the prediction accuracy of the model compared to the individual LSTM and ResNet models. The ResNet-LSTM model yielded an average prediction error of 0.0085 mm and a tool wear prediction accuracy of 98.25%.
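The size of the improvement can be made explicit from the test-set MAE values reported in the text (0.0181 mm for LSTM, 0.0117 mm for ResNet, and 0.0085 mm for ResNet-LSTM). The short calculation below derives the relative reduction in test MAE; the helper name `relative_reduction` is hypothetical, and the percentages are simple arithmetic on the reported figures rather than additional experimental results.

```python
# Test-set MAE values in mm, as reported in the text.
test_mae_mm = {"LSTM": 0.0181, "ResNet": 0.0117, "ResNet-LSTM": 0.0085}

def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

reductions = {
    name: relative_reduction(test_mae_mm[name], test_mae_mm["ResNet-LSTM"])
    for name in ("LSTM", "ResNet")
}
for name, drop in reductions.items():
    print(f"ResNet-LSTM vs {name}: {drop:.1f}% lower test MAE")
```

On these figures, the combined model reduces the test MAE by roughly 53% relative to the LSTM model and by roughly 27% relative to the ResNet model.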

Conclusions
With the widespread application of CNC machine tools, the accurate monitoring of machining process states and the precise identification of tool wear have become increasingly important. Experiments on tool wear prediction during machine tool processing were designed, and a tool wear prediction system based on multi-sensor information fusion was proposed. The main conclusions of this study are as follows:
(1) The use of the Kalman filtering algorithm for feature extraction and the fusion of multi-sensor signals provided a basis for subsequent model training.
(2) Using the LSTM network model and training it with the fused features of three signals generated a favorable prediction performance, although the signal features were not distinct.
(3) The ResNet model was constructed for experiments with the same tool wear data, resulting in improved accuracy but a slower convergence speed for the loss function.
(4) The ResNet-LSTM model was constructed by combining residual neural networks with the LSTM network model, which significantly improved the prediction accuracy compared to the individual LSTM and ResNet models. Moreover, the combination of residual neural networks and LSTM networks exhibited a certain adaptive denoising capability at the front end of the network for feature extraction, thereby enhancing the signal feature extraction capability.
(5) Finally, the reliability of the method was verified through actual machining experiments.
However, in actual production and machining processes, more complex machining phenomena, in which the machining efficiency involves multiple influencing factors, are often encountered. This study collected and processed data from only four working conditions. Therefore, in future research, we will aim for a more comprehensive understanding of the tool wear status that occurs during machining and conduct more in-depth experiments and data analyses of the complex working conditions encountered during machine tool processing. In addition, high temperatures significantly affect tool life, but the influence of high temperatures on tool life was not considered in this study because of the use of cutting fluids. Accordingly, in future work, we will consider adding external temperature sensors to monitor the impact of high temperatures on tool life.

Figure 1. Diagram of the basic structure of the method developed in this study.

Figure 2. Structure of a state-recognition system based on multi-sensor information fusion.

Figure 3. Tree structure of the three-level wavelet packet decomposition.

Figure 4. Diagram of the basic structure of LSTM.

Figure 5. Structure of the LSTM network model.

Figure 8. Wear status of milling cutters in different stages: (a) initial wear stage; (b) normal wear stage; (c) rapid wear stage.

Figure 9. Experimental setup and tool selection: (a) CNC machine tool experimental platform; (b) HRC550 LYD hard alloy end mills.

Figure 12. All machined parts processed in the experiment: (a) measurement with a dial gauge micrometer; (b) experimental processing of thin-walled specimens.

Figure 13. Diagram of the installation positions of the sensors used in the experiment.

Figure 15. Frequency-domain signal reconstructed by the three-level wavelet packet decomposition.

Figure 16. Wavelet analysis of the characteristics of the spindle vibration signals for four different states: (a) normal wear cutting state; (b) moderate wear cutting state; (c) rapid wear cutting state; (d) tool breakage cutting state.

Figure 19. Tool wear prediction results: (a) training set wear prediction; (b) test set wear prediction.

Figure 21. Tool wear prediction results: (a) training set wear prediction; (b) test set wear prediction.

Table 1. Initialization parameters for the LSTM network model.

Table 2. Experimental parameters and machining conditions in the milling process.

Table 3. Comparison of tool wear prediction results.