Tool Wear Estimation in the Milling Process Using Backpropagation-Based Machine Learning Algorithm

Abstract: Tool condition monitoring (TCM) systems are essential in milling operations to guarantee product quality, and when they are paired with indirect measurement techniques, such as vibration or acoustic emission sensors, monitoring can happen without sacrificing productivity. More advanced tool wear estimation techniques are based on supervised machine learning algorithms, as are several other applications in the Industry 4.0 context; however, satisfactory performance can be obtained with simple techniques and low computational power. This work focuses on tool wear estimation using a simple backpropagation neural network on a milling dataset. Statistical measures, i.e., the mean, variance, skewness, and kurtosis, were used as features extracted from the vibration and acoustic emission sensors' data in a real milling testbench dataset containing multiple experiments with sensor data and a direct measurement of the flank wear (VB) in most instances. The data was preprocessed to obtain clean and normalized values for the neural network training, with the VB measurement as the target variable for predicting tool wear; all incomplete samples without a VB measurement, as well as outliers, were removed beforehand. The training and test subsets were chosen randomly after making sure that the maximum values of every variable were represented in the training subset. A multiple-topology approach was implemented to test several backpropagation neural network configurations and determine the most suitable one based on two performance criteria, i.e., the mean absolute percentage error (MAPE) and the variance.


Introduction
Machining is a material removal process, mainly used in computer numerical control (CNC) systems. The system's cutting tool is worn down by the workpiece with each operation, and this wear can irreversibly affect the surface of the workpiece, so tool condition monitoring (TCM) systems are essential to prevent failures and guarantee product quality [1].
The use of TCM systems can be divided into two groups [2]: (1) direct monitoring, which directly measures fault values such as tool wear (TW) using more expensive lasers and optical sensors, and (2) indirect monitoring, which measures physical parameters that represent the tool condition indirectly. Some direct monitoring methods that use microscopy or vision systems require the machining process to be interrupted and the tool to be removed from the system, while indirect methods can measure with lower precision without affecting the production line [3]. Various types of sensors can be used for indirect measurements, such as dynamometers to measure cutting forces, accelerometers to measure vibration, acoustic emission (AE) sensors, motor current sensors, and even microphones that capture the sound of the process to indirectly measure TW [4]. The next step is to extract signal features (from vibration, AE, electric current, etc.) from the pre-processed data. Such techniques can operate only in the time domain, e.g., mean and variance; in the frequency domain, e.g., the fast Fourier transform (FFT); or in both time and frequency domains, such as the Wavelet transform (WT) [5].
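As a minimal illustration of the two feature-domain families mentioned above (not code from the paper), the sketch below computes time-domain statistics and a simple FFT-based frequency-domain feature on a synthetic signal; the sampling rate and the test tone are hypothetical:

```python
import numpy as np

def time_domain_features(signal):
    """Simple time-domain features: mean and variance."""
    return {"mean": float(np.mean(signal)),
            "variance": float(np.var(signal))}

def dominant_frequency(signal, fs):
    """Frequency-domain feature via the FFT: the frequency bin
    with the largest magnitude (DC component skipped)."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return float(freqs[np.argmax(spectrum[1:]) + 1])

# Hypothetical example: a 50 Hz tone sampled at 1 kHz for 1 s
fs = 1000
t = np.arange(0, 1, 1 / fs)
sig = np.sin(2 * np.pi * 50 * t)
print(time_domain_features(sig))   # mean ~0, variance ~0.5
print(dominant_frequency(sig, fs)) # 50.0
```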
Several artificial intelligence (AI) tools can be used in the decision-making process about the estimated condition of the tool based on the signal features, notably classical supervised machine learning algorithms such as decision trees and k-nearest neighbors (KNN) for classification, and linear regression. However, deep learning algorithms such as convolutional neural networks (CNN) and recurrent neural networks (RNN) are also used [6].
Indirect measurement can be applied to a specific type of machining operation, such as turning, grinding, or milling. Among these, milling is one of the most common and important, so reducing its costs and increasing product quality are essential. TCM systems can monitor parameters such as AE, vibration, cutting force, etc. to detect faults or adverse conditions in the operation [7]. The proposed work focuses on monitoring the milling operation.
Several studies focus on monitoring the TW of milling operations. The method in [8] uses the short-time Fourier transform (STFT) to extract features from the vibration signal of milling operations. The STFT forms a time-frequency map that serves as the input image to a supervised convolutional neural network (CNN) that predicts tool flank wear. Another work [1] with the same goal compares statistical techniques and the discrete WT, with or without the Hölder exponent, for analyzing milling vibration signals. It also compares the performance of various machine learning algorithms, such as decision trees, k-nearest neighbors (KNN), and multi-layer perceptron (MLP) neural networks. A further method in the literature predicts flank wear in a milling operation using features from a cutting force model and Wavelet packet decomposition (WPD) as inputs to a deep MLP neural network [9].
Supervised machine learning methods that estimate TW from indirect measurements, specifically AE and vibration sensor signals, can accurately estimate the wear state of the machining tool. Therefore, this work proposes an MLP artificial neural network (ANN) model to estimate TW, supported by statistical features of the AE and vibration signals. The public dataset used [10] provides real data from milling experiments with enough measurements to train several MLP ANN configurations and test their ability to predict TW.

Materials and Methods
This section presents the methods and materials used in the work, specifically, the dataset pre-processing and the conditions for implementing, training, and testing the proposed neural networks.
The chosen dataset is made up of experimental data from the milling operation and is called the mill dataset. Pre-processing of the data is necessary to obtain clean and normalized values to serve as inputs for the ANN. The file is a MATLAB-specific .mat file with a mill struct made up of 167 samples and 13 fields holding different variables. Each variable, as well as other important information about the dataset, is described in the Readme file distributed with the data file.
The target variable for the ANN to estimate is called VB, the flank TW. The values of case, a specific test condition, are a combination of the variables DOC, feed, and material, meaning, respectively, the depth of cut, the feed rate of the tool, and the type of material, which can be cast iron (1) or steel (2).
Each case has a unique number of passes of the tool over the workpiece; this variable is called run. The duration of each case is stored in the variable time. Finally, the sensor variables already contain root mean square (RMS) values collected in around 9000 samples. Only two of them are used here: the vibration signals from the variable vib_table and the AE signals from the variable AE_table. Typical raw AE signal values are shown in Figure 1. Some missing values (NaN) were found in the MATLAB structure. This had already been anticipated, as the dataset's Readme mentions that VB was not measured for every run. Therefore, the samples (runs) with missing values were simply removed using MATLAB's own function. The DOC and feed variables were also removed, since they were redundant: the case already represents a combination of these variables.
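The removal of runs without a VB measurement can be sketched as follows; this is a hypothetical miniature of the table (the paper used MATLAB's rmmissing), with made-up VB values and stand-in features:

```python
import numpy as np

# Hypothetical miniature mill table: one row per run, NaN where VB was not measured.
VB = np.array([0.11, np.nan, 0.25, np.nan, 0.30])
features = np.arange(10, dtype=float).reshape(5, 2)  # stand-in sensor features

# Keep only the runs with a measured VB (the rmmissing step in the paper).
mask = ~np.isnan(VB)
VB_clean, features_clean = VB[mask], features[mask]
print(len(VB_clean))  # 3 complete samples remain
```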
The vib_table and AE_table sensor variables were converted into statistical variables, specifically the mean, variance, skewness, and kurtosis of each signal. A simple analysis of these statistics highlighted an anomaly in the mean and variance values, with a huge difference between samples. The outlier sample was identified (Figure 2); its signal appears to have been corrupted, so it was removed from all tables, resulting in much more realistic values. The statistical values replaced the signal values with 8 new variables, specifically m_AE, m_vib, v_AE, v_vib, s_AE, s_vib, k_AE, and k_vib. Each variable's name combines the first letter of mean, variance, skewness, or kurtosis with AE or vib, specifying the statistic and the sensor. The resulting table had 145 samples and 13 scalar variables. The resulting values were normalized between 0 and 1 with a MATLAB function, but the run variable's range depends on each case, so it needed to be normalized separately, as there are different numbers of runs per case, especially after removing some samples. The run variable was added to the normalized table afterwards.
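The feature extraction step above can be sketched in Python (the paper used MATLAB); the signals below are random stand-ins for one run's RMS samples, and the moment formulas are the standard definitions of skewness and kurtosis:

```python
import numpy as np

def signal_features(x, sensor):
    """Mean, variance, skewness, and kurtosis of one RMS signal, named
    m_/v_/s_/k_ plus the sensor suffix, following the paper's convention."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    v = x.var()
    s = ((x - m) ** 3).mean() / v ** 1.5   # skewness
    k = ((x - m) ** 4).mean() / v ** 2     # kurtosis
    return {f"m_{sensor}": m, f"v_{sensor}": v,
            f"s_{sensor}": s, f"k_{sensor}": k}

rng = np.random.default_rng(0)
ae = rng.normal(size=9000)    # stand-in for one run's AE_table RMS samples
vib = rng.normal(size=9000)   # stand-in for one run's vib_table RMS samples
features = {**signal_features(ae, "AE"), **signal_features(vib, "vib")}
print(sorted(features))       # the 8 feature names, k_AE ... v_vib
```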
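The normalization step, including the separate per-case treatment of run, can be sketched as below; the miniature table and its values are hypothetical:

```python
import numpy as np

# Hypothetical miniature table: min-max normalization to [0, 1]; run is
# normalized per case because each case has a different number of runs.
case = np.array([1, 1, 1, 2, 2])
run = np.array([1.0, 2.0, 3.0, 1.0, 2.0])
m_AE = np.array([0.2, 0.5, 0.8, 0.1, 0.9])

def minmax(x):
    return (x - x.min()) / (x.max() - x.min())

m_AE_n = minmax(m_AE)           # global min-max normalization
run_n = np.empty_like(run)
for c in np.unique(case):       # per-case normalization of run
    sel = case == c
    run_n[sel] = minmax(run[sel])
print(run_n.tolist())  # [0.0, 0.5, 1.0, 0.0, 1.0]
```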
Samples were split into random training and test sets, using 80% for training and 20% for testing, according to the guidelines in [11]. The same author also points out that the minimum and maximum values of each variable must be present in the training set, so the samples holding these values are excluded from the random selection of the test set and assigned to the training set.

Backpropagation Neural Network
The MLP neural network was used as a universal function approximator, since it must estimate a real value between 0 and 1, matching the normalization done previously. Three candidate topologies were tested, each with a single hidden layer of 5, 10, or 15 neurons, respectively.
The MLP networks were trained with the generalized delta rule three times for each candidate topology. The training set had 116 samples, each training ran for at most 1000 epochs, and all trainings converged for a network precision ϵ = 0.5 × 10⁻⁶ and a learning rate η = 0.1.
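A minimal sketch of this training loop is given below: a single-hidden-layer sigmoid MLP trained by batch gradient descent on the MSE (one reading of the generalized delta rule), stopping when the MSE change between epochs falls below ϵ. The initialization scale and the toy target are assumptions; this is not the paper's MATLAB implementation:

```python
import numpy as np

def train_mlp(X, y, hidden=5, eta=0.1, eps=0.5e-6, max_epochs=1000, seed=0):
    """Single-hidden-layer MLP, sigmoid activations, trained on MSE.
    Stops when |MSE(t) - MSE(t-1)| < eps, as in the paper's criterion."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1)); b2 = np.zeros(1)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    prev_mse = np.inf
    for epoch in range(1, max_epochs + 1):
        H = sig(X @ W1 + b1)            # hidden-layer activations
        out = sig(H @ W2 + b2)          # network output in [0, 1]
        err = out - y.reshape(-1, 1)
        mse = float(np.mean(err ** 2))
        if abs(prev_mse - mse) < eps:   # convergence criterion
            break
        prev_mse = mse
        d_out = err * out * (1 - out)          # output-layer delta
        d_hid = (d_out @ W2.T) * H * (1 - H)   # backpropagated hidden delta
        W2 -= eta * H.T @ d_out / n; b2 -= eta * d_out.mean(axis=0)
        W1 -= eta * X.T @ d_hid / n; b1 -= eta * d_hid.mean(axis=0)
    predict = lambda Xq: sig(sig(Xq @ W1 + b1) @ W2 + b2).ravel()
    return predict, mse, epoch

# Toy usage on a synthetic normalized target
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = 0.2 + 0.6 * X.ravel()
predict, mse, epochs = train_mlp(X, y, hidden=5)
```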
The final training results (T1, T2, and T3) are compiled in Table 1, based on the number of epochs, i.e., how many times the algorithm had to be presented with all the training samples until it converged, and the mean squared error (MSE). The network only converges when the absolute difference between the MSE of the current and previous epochs is less than the precision ϵ. All the training sessions had good results, except for T3 in Network 2; the best were T2 in Network 1 and T1 in Network 3, which had the lowest MSE and the lowest number of epochs and thus converged fastest.

Tests
Results are based on 9 different tests for each training and topology configuration. The main metrics used were the mean absolute percentage error (MAPE) and the variance (σ²).
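For reference, the two metrics can be computed as below; the example targets and predictions are made up for illustration:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * float(np.mean(np.abs((y_true - y_pred) / y_true)))

y_true = np.array([0.2, 0.4, 0.5])   # hypothetical normalized VB targets
y_pred = np.array([0.25, 0.38, 0.45])  # hypothetical network outputs
print(round(mape(y_true, y_pred), 2))  # per-sample errors 25%, 5%, 10% -> 13.33
print(float(np.var(y_pred)))           # sigma^2 of the predictions
```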

Results and Discussion
The plots in Figure 3a-c correspond to the smallest MAPE, each resulting from one of the training sessions (T1, T2, or T3) of each network compiled in Table 2. The output values are very close to the desired values, represented by a black dotted line, but there was no VB result greater than 0.5, which contributed to a higher MAPE. All the networks failed on the first sample. One solution would be to obtain more training data with high values (above 0.5) to balance the distribution, or to use a better method for separating training and test sets, such as k-partition cross-validation, which evaluates each network topology on different training and test subsets of a specific size (k) [11]. The best results were close to 24% MAPE and 0.43 variance, in Network 3.
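The k-partition split suggested above can be sketched as follows; the fold count and sample size are arbitrary illustration values:

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    """k-partition (k-fold) cross-validation split: shuffle the sample
    indices and cut them into k roughly equal folds; each fold serves
    once as the test set while the rest form the training set."""
    idx = np.random.default_rng(seed).permutation(n)
    return [fold.tolist() for fold in np.array_split(idx, k)]

folds = k_fold_indices(10, 5)
print([len(f) for f in folds])  # [2, 2, 2, 2, 2]
```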

Conclusions
The work demonstrated how a relatively simple supervised machine-learning algorithm can estimate values arising from complex relationships between the milling tool and the workpiece. Simple methods were also used to extract features from the signals, and the result was satisfactory as a proof of concept of the power of an ANN for estimating values. Clearly, more complex ANN models such as CNNs and more robust feature extraction methods such as the WT would give better results, but at a higher computational cost to train, compared to an MLP with only one hidden layer and fast training convergence.
Still, it is possible to improve the MLP's performance with more efficient algorithms and better data preparation, including adding two or more hidden layers, eventually reaching a deep MLP (DMLP) with at least four hidden layers. Therefore, future work will compare a more robust MLP with other, more advanced ANNs, considering both the accuracy and the computational cost of training and implementation. The best algorithm will be the one most appropriate to the problem constraints.

Figure 1. Typical raw RMS values of the AE sensor in a run.

Table 2. Test results.

Figure 4a-c show the convergence of each network in its best training, i.e., the MSE for each epoch. All networks had similar convergence curves, but networks 2 and 3 managed to converge in fewer epochs. Network 2 had the smallest MSE and converged in a number of epochs similar to network 3 in training, but it did not perform better in the test phase.