Applying a Neural Network to Predict Surface Roughness and Machining Accuracy in the Milling of SUS304

Surface roughness and machining accuracy are essential indicators of part quality in milling. With recent advances in sensor technology and data processing, the cutting force signals collected during machining can be used to predict and determine the machining quality. Deep-learning-based artificial neural networks (ANNs) can process large sets of signal data and make predictions according to the extracted data features. For the final stage of milling SUS304 stainless steel, we selected the cutting speed, feed per tooth, axial depth of cut, and radial depth of cut as the experimental parameters and synchronously measured the cutting force signals with a sensory tool holder. The signals were preprocessed for feature extraction using a Fourier transform. Three different ANNs (a deep neural network, a convolutional neural network, and a long short-term memory network) were then trained to predict the machining quality under different cutting conditions. Two training methods, namely whole-data training and training by data classification, were adopted. We compared the predictive accuracy and training efficiency of the three models using the same training data. The training results and post-machining measurements indicated that, when surface roughness was predicted with classification by feed per tooth, all the models had a percentage error within 10%. With whole-data training, however, the convolutional neural network (CNN) and long short-term memory (LSTM) models had a percentage error of 20%, while that of the deep neural network (DNN) model exceeded 50%. For machining accuracy prediction with whole-data training, the percentage error of the DNN and CNN models was below 10%, while that of the LSTM model was as large as 20%; classification training did not significantly improve these results.
In all the training processes, the CNN model had the best analytical efficiency, followed by the LSTM model, and the DNN model performed the worst.


Introduction
Many studies have investigated the machining results of end milling. The feed rate, workpiece material properties, cutting speed, depth of cut, cutting tool, and machine rigidity all affect the surface quality and dimensional accuracy of the parts. However, because cutting is a complex process, ideal cutting conditions can be achieved only in laboratories or in theoretical analyses, which makes it difficult to build an effective prediction model for real-world milling. The model parameters must be obtained through many experiments, and various optimization techniques must be used to refine the model before the cutting conditions can be set [1].
ANNs have long been used to optimize cutting processes, for example in tool wear monitoring and surface roughness prediction. Das et al. used the backpropagation algorithm to train a neural network for carbide inserts in turning, and the system showed potential for successful tool wear monitoring [2]. Chien et al. developed an ANN-based predictive model for the machinability of 304 stainless steel to predict the surface roughness of the workpiece, the cutting force, and the tool life. The errors of the predicted surface roughness, cutting force, and tool life were 4.4%, 5.3%, and 4.2%, respectively [3]. Karabulut et al. used ANNs and analysis of variance results to predict the surface roughness of compacted graphite iron after face milling [4]. The results showed a strong correlation between the lead angle, chip thickness, and surface quality, and the surface roughness improved as the lead angle increased.
During the early development of this technology, detecting and controlling cutting forces was expected to optimize milling results. Tsai et al. employed an accelerometer and a proximity sensor in the milling process to collect vibration and rotation data [5–10]. The spindle speed, feed rate, depth of cut, and vibration average per revolution (VAPR) were used as input parameters to develop a backpropagation-based ANN model to predict the surface roughness. The proposed model predicted surface roughness with a very high accuracy rate (96–99%), which showed that an ANN can make accurate real-time predictions of surface roughness during end milling. Alique et al. established a versatile neural network model with a single hidden layer [6]. Input parameters such as the feed rate and depth of cut were used to predict the average cutting force under different conditions. The model could be used for monitoring, adaptive control, and real-time prediction of surface roughness and cutting tool vibration. Cus et al. predicted the cutting force of a ball nose cutter using a three-layer ANN [7]. The cutting speed, feed rate, radial and axial depths of cut, and cutter diameter were selected as the machining parameters to predict the cutting force components during milling, yielding an accuracy of ±4%. Kadirgama et al. employed an ANN to predict the cutting force in milling 618 stainless steel [8]. The cutting speed, feed rate, axial depth of cut, and radial depth of cut were the inputs, and the cutting force was the output. The prediction error was approximately 12%, which was considered acceptable. These studies show that, through data training, ANN models can predict the cutting force during milling under different cutting conditions. Nevertheless, at this stage of development, the models remained limited to the experimental environments and conditions in which they were developed.
In recent years, advances in sensor technology have improved signal transmission and increased the amount of signal data that can be collected. Signal data can be captured, recorded, and transmitted in real time during machining, so large amounts of machining data can be obtained. In addition, big data analysis has become possible because of recent improvements in computing and data storage. Neural network algorithms developed with deep learning can extract features from data. If the machining data retrieved by sensors can be analyzed using ANNs, effective prediction models can be established.
A wireless sensory tool holder can be applied to machine tools whose loads must be dynamically monitored, enabling real-time monitoring and process recording. Ye used the sensory tool holder system to analyze the rough machining of turbine blades and improved process planning to shorten the machining time [11]. Chen et al. employed a sensory tool holder to measure the cutting force when milling thin-walled parts [12]. The cutting force was used as the load to determine the elastic deformation of the parts, and the resulting error was compensated with the deformation data to correct the toolpath. This method successfully increased machining accuracy and efficiency. Lu et al. collected signals during machining with a wireless sensory tool holder and extracted their features [13]. The deep forest algorithm was applied to estimate the surface quality. The monitoring model reached an accuracy of 99.54% on the training sets and 90.91% on the validation sets. This approach ensured surface quality and increased machining efficiency. The use of wireless sensory tool holders in machining could be expanded in the future.
With the development of deep learning, various ANN models have been established, the most common of which are the multilayer perceptron (MP), deep neural network (DNN), convolutional neural network (CNN), recurrent neural network (RNN), and long short-term memory (LSTM) network. Because the models differ in how their neurons are connected and how information is transmitted, they produce different analytical results. Accordingly, we input the same machining signals into different models to compare their prediction accuracies.
Although ANNs have been widely used to predict the effects of cutting parameters on machining results, most studies have used the machining conditions as the input. Few studies have used real-time machining signals as the input for ANNs, and few have discussed the differences between models in cutting analysis. In the present study, we used a sensory tool holder to collect cutting force signals during machining and converted the signals through a Fourier transform for feature processing. The data were then input into three different ANNs (DNN, CNN, and LSTM) for training. The goal of the training was to predict the surface roughness and dimensional accuracy measured after machining. After training was completed, the data not used for training were used for testing to determine the training effects and prediction error rates of the models. Finally, by comparing the prediction accuracy and analytical efficiency of the three models, we aimed to identify a model with high accuracy (a prediction percentage error below 10%) and the shortest computing time. Such a model can facilitate real-time prediction of surface roughness and machining accuracy.
The remainder of this paper is organized as follows. Section 2 introduces the methods and instruments used in this study, including the experimental procedure and the setup of the mathematical models. Section 3 presents the data training results and error analysis. Section 4 presents the conclusions.

Artificial Neural Network (ANN)
Machine learning is an approach to realizing artificial intelligence. Instead of relying on previously established methods, machine learning algorithms discover rules and form judgements through repeated exposure to data. Deep learning, a branch of machine learning, was initially a stagnant field because of insufficient computational resources and efficiency. With recent improvements in hardware, particularly the emergence of high-performance graphics processing units, and the rise of big data, deep learning has become the mainstream method of machine learning. An ANN is a biomimetic mathematical model of a neural network and is the basis of current deep learning models. Composed of artificial neurons, it contains an input layer, hidden layers, and an output layer. Such models can store or learn from data and signals. Each neuron computes a weighted sum of its inputs plus a bias and passes the result through an activation function to produce its output; during training, the weights are adjusted to reduce the error between the predicted and target values. The most commonly used activation functions are the sigmoid function, the rectified linear unit (ReLU), and the hyperbolic tangent function. To construct an ANN, the hyperparameters are set manually: users determine the appropriate number of layers and neurons according to their requirements, and the weights are then learned through repeated training. The numerous neural networks developed to date have produced satisfactory results in fields such as machine vision, speech recognition, natural language processing, and biomedicine.
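The neuron computation described above can be illustrated with a minimal NumPy sketch. The inputs, weights, and bias are arbitrary illustrative values, not values from this study; the three activation functions are the ones named in the text.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified linear unit: max(0, z)."""
    return np.maximum(0.0, z)

def neuron(x, w, b, activation):
    """One artificial neuron: weighted sum of the inputs plus a bias, then an activation."""
    z = np.dot(w, x) + b
    return activation(z)

# Arbitrary illustrative values (not taken from the experiments)
x = np.array([1.0, -2.0, 0.5])   # inputs
w = np.array([0.5, 0.25, 1.0])   # weights (learned during training in a real network)
b = 0.25                         # bias

for name, act in [("sigmoid", sigmoid), ("ReLU", relu), ("tanh", np.tanh)]:
    print(name, round(float(neuron(x, w, b, act)), 4))
```

Here the weighted sum is z = 0.5 − 0.5 + 0.5 + 0.25 = 0.75, so the three activations give about 0.6792 (sigmoid), 0.75 (ReLU), and 0.6351 (tanh).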
ANN models can use various deep learning architectures, including MP, DNN, CNN, RNN, and LSTM. Different models have been used to predict machining results and achieve adaptive machine control. Lai et al. proposed a hybrid recurrent neural network (HRNN) model based on a diagonal recurrent neural network [14], and its effectiveness for constant force control during machining was verified through simulations and experiments. Huang developed a new intelligent neural-fuzzy system to assess surface roughness in end milling [15]. The model used a neural-network-assisted method to generate fuzzy IF-THEN rules and achieved higher accuracy in surface roughness prediction. Huang et al. adopted a holistic-local LSTM (HLLSTM) model to capture data features, using time-series machining signals from a triaxial accelerometer for training and testing, to establish a deep-learning-based tool wear prediction system [16]. Compared with CNN and LSTM models, the HLLSTM model showed better performance. Huang et al. proposed a deep convolutional neural network (DCNN) based on multi-domain feature fusion to predict tool wear [17]. The prediction method was experimentally validated using a three-flute ball nose tungsten carbide cutter for dry milling on a high-speed CNC machine tool. Chan et al. also conducted tool wear prediction with an HLLSTM model [18], which reduced the average error relative to the actual tool wear values and accurately predicted tool wear.

Experimental Procedure
In this study, we applied a full factorial design to determine the cutting parameters and used a five-axis machine for milling. The milling machine was a CT-350 five-axis machining center manufactured by Tongtai Inc. (Kaohsiung, Taiwan), equipped with a Siemens 840Dsl numerical control. The axes of the machine are shown in Figure 1. The workpieces were 80 mm × 80 mm × 60 mm SUS304 stainless steel hexahedrons. Table 1 lists the mechanical properties and chemical composition of SUS304. In the cutting process, the four sides of the hexahedron in the XY plane were milled with the side edge of the tool, and the depth of cut was in the Z direction. The toolpath was generated using Siemens NX, as shown in Figure 2 [19,20]. A Ø10 mm tungsten steel end mill from Chin Ming Precision Tools Co. (Tainan, Taiwan) was used for side milling. The specifications of the tool are presented in Table 2. During machining, a sensory tool holder (Pro-micron GmbH & Co. KG, Kaufbeuren, Germany) was used to collect the cutting force signals. The specifications of the sensory tool holder are presented in Table 3. It could measure the axial cutting force, cutting torque, and bending moment in the X-Y direction and send the data wirelessly to a computer. We wrote the neural network program in Python and extracted features from the captured cutting force data. The program was developed and run in Colaboratory, a free service from Google Research. Finally, we imported the features into the program for model training and prediction.
Table 2. Specification of the tool.
Helix angle (degree): 45
To distinguish the effects of the cutting parameters on the surface roughness and dimensional accuracy after processing, we selected the cutting speed, feed per tooth, axial depth of cut, and radial depth of cut as the four factors and set three levels for each factor to conduct a full factorial experiment. The machining parameters are shown in Table 4. A total of 81 tests (3^4 combinations) were designed, and each was conducted twice, resulting in 162 datasets. We conducted side milling on the hexahedral workpieces after face milling was first applied to the surface. Each workpiece had eight machined surfaces: four upper and four lower. After machining, the surface roughness was measured with a Hommel-etamic T8000 instrument, as shown in Figure 3. Each machined surface was measured three times, and the average value was used. A TESA-hite Magna 400 height gauge was used to measure the machined surfaces relative to the datum surface. Three points on each surface were measured, and their mismatch and average values were used to obtain the machining dimensional error, as illustrated in Figure 4.
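The run count follows directly from the design: three levels of four factors give 3^4 = 81 combinations, and two replicates give 162 runs. The sketch below enumerates such a design with Python's `itertools.product`. Only the feed-per-tooth levels appear in this text (in the surface roughness results); the other level labels are hypothetical placeholders for the values listed in Table 4.

```python
from itertools import product

# Three levels per factor. Only the feed-per-tooth levels are stated in the text;
# the other entries are HYPOTHETICAL placeholders for the values in Table 4.
levels = {
    "cutting_speed": ["Vc1", "Vc2", "Vc3"],
    "feed_per_tooth_mm": [0.03, 0.06, 0.10],
    "axial_depth": ["ap1", "ap2", "ap3"],
    "radial_depth": ["ae1", "ae2", "ae3"],
}
REPLICATES = 2  # each test was conducted twice

factor_names = list(levels)
# Full factorial design: every combination of levels, 3**4 = 81 tests
design = [dict(zip(factor_names, combo)) for combo in product(*levels.values())]
# Replicate each test to obtain the full set of runs
runs = [{"replicate": r, **test} for r in range(1, REPLICATES + 1) for test in design]

print(len(design), len(runs))  # 81 162
```

Each feed-per-tooth level appears in 27 of the 81 combinations, so each level accounts for 54 of the 162 runs.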

Signal Preprocessing
The sensory tool holder could collect three types of cutting force signals (i.e., tension, torque, and bending moment). We adopted the bending moment as the basis for evaluating side milling (Figure 5). The bending moment increased to 9–12 N·m after the tool came into contact with the workpiece during side milling and decreased after the tool left the workpiece. The sample interval of the tool holder was 0.[…] Therefore, the number of points in each dataset was half of the original: 2500. Figure 6 shows the idling spectrum at a spindle speed of 6000 RPM. A peak is observable at 100 Hz, which corresponds to the rotational frequency (6000 RPM / 60 = 100 Hz). The other three peaks recorded in Figure 6 correspond to the natural frequencies of the sensory tool holder. Figure 7 shows the spectrum during cutting at a cutting speed of 70 m/min and a spindle speed of 2228 RPM. A peak is observable at 37 Hz, the rotational frequency. In addition, peaks appeared at the harmonic frequencies of 74 Hz, 112 Hz, and 186 Hz.
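The frequencies above can be checked from the spindle speed n = 1000·Vc/(πD): with Vc = 70 m/min and D = 10 mm, n ≈ 2228 RPM, i.e. a rotational frequency of about 37 Hz. The NumPy sketch below reproduces this and extracts a one-sided amplitude spectrum from a synthetic bending-moment signal. It is not the authors' preprocessing code: the sampling rate (5000 Hz), the 5000-sample window, and the synthetic signal are assumptions, and keeping the first N/2 spectrum bins is only one plausible reading of why each dataset has 2500 points.

```python
import numpy as np

def spindle_speed_rpm(cutting_speed_m_min, tool_diameter_mm):
    """Spindle speed n = 1000 * Vc / (pi * D), with Vc in m/min and D in mm."""
    return 1000.0 * cutting_speed_m_min / (np.pi * tool_diameter_mm)

def one_sided_spectrum(signal, fs):
    """Amplitude spectrum of a real signal, keeping the first len(signal) // 2 bins."""
    n = len(signal)
    centered = signal - signal.mean()               # remove the static (DC) bending-moment level
    amp = np.abs(np.fft.rfft(centered)) * 2.0 / n   # one-sided amplitude
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    half = n // 2                                   # e.g. 5000 samples -> 2500 spectral features
    return freqs[:half], amp[:half]

n_rpm = spindle_speed_rpm(70.0, 10.0)   # about 2228 RPM for the 10 mm end mill
f_rot = n_rpm / 60.0                    # rotational frequency, about 37 Hz

fs = 5000.0                             # ASSUMED sampling rate; not recoverable from the text
t = np.arange(5000) / fs                # ASSUMED 1 s window -> 1 Hz frequency resolution
# Synthetic bending moment: about 10 N·m static level plus rotation frequency and two harmonics
moment = (10.0
          + 1.0 * np.sin(2 * np.pi * f_rot * t)
          + 0.4 * np.sin(2 * np.pi * 2 * f_rot * t)
          + 0.2 * np.sin(2 * np.pi * 3 * f_rot * t))

freqs, amp = one_sided_spectrum(moment, fs)
print(round(n_rpm), len(amp), freqs[np.argmax(amp)])
```

With these assumptions the spectrum has 1 Hz resolution, so the largest peak falls in the 37 Hz bin, and the synthetic harmonics appear near 74 Hz and 111 Hz.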

Modeling Set-Up
Before the data were input into the ANNs for training, the parameters of each ANN were set. The parameters common to all the ANNs were the number of training steps, the learning rate, and the batch size. The learning rate affects the number of steps required: a lower learning rate requires more training steps. In this study, the learning rate was set to 0.00015. The mean-square error (MSE) was used as the loss function during training, and the root-mean-square error (RMSE) was used to evaluate the test results. Training revealed that after a certain number of steps, the loss function and RMSE no longer changed significantly. After observing the convergence of the models, we set the number of training steps to 1000 and the batch size to half of the training data.
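The training configuration above (MSE loss, RMSE for evaluation, a learning rate of 0.00015, 1000 training steps, and a batch size of half the training data) can be sketched with plain NumPy mini-batch gradient descent. To keep the sketch self-contained, a linear model on synthetic data stands in for the DNN, CNN, and LSTM models; the data, the model, and the function name `train_linear` are hypothetical.

```python
import numpy as np

LEARNING_RATE = 0.00015   # learning rate reported in the text
N_STEPS = 1000            # number of training steps reported in the text

def mse(y_true, y_pred):
    """Mean-square error (training loss)."""
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root-mean-square error (evaluation metric)."""
    return float(np.sqrt(mse(y_true, y_pred)))

def train_linear(X, y, lr=LEARNING_RATE, n_steps=N_STEPS, seed=0):
    """Hypothetical stand-in: mini-batch gradient descent on the MSE loss of a linear model,
    with the batch size set to half of the training data."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    batch = n // 2
    w, b = np.zeros(d), 0.0
    history = []
    for _ in range(n_steps):
        idx = rng.choice(n, size=batch, replace=False)   # random half of the training set
        err = X[idx] @ w + b - y[idx]                    # residuals on the mini-batch
        w -= lr * 2.0 * (X[idx].T @ err) / batch         # dMSE/dw
        b -= lr * 2.0 * err.mean()                       # dMSE/db
        history.append(mse(y, X @ w + b))                # full training-set loss after the step
    return w, b, history

# Synthetic stand-in data (NOT the experimental data): 150 samples x 2500 spectral features
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(150, 2500))
y = 0.1 * X[:, :10].sum(axis=1) + 0.5
w, b, history = train_linear(X, y)
print(f"loss after step 1: {history[0]:.4f}, after step {N_STEPS}: {history[-1]:.4f}")
print("training RMSE:", rmse(y, X @ w + b))
```

Plotting `history` is one way to observe the plateau the text describes before fixing the number of steps.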
In total, 162 experimental data points were obtained. Two training methods were used: training on all the data collectively (whole-data training) and training on classified subsets of the data in turn (classification training).
For classification training, we divided the data into three sets according to the three levels of one of the four factors (i.e., the cutting speed, feed per tooth, axial depth of cut, or radial depth of cut), so each set had 54 data points. Training on each set used MSE as the loss function, and four data points were randomly selected from each set to test the accuracy with RMSE. Convergence was achieved after three tests, and the mean absolute percentage error (MAPE) was then calculated.
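The classification split can be sketched as follows: the 162 runs are grouped by the level of one factor (feed per tooth, the factor later used for surface roughness), giving three groups of 54, and four runs per group are held out at random for testing, with accuracy summarized as MAPE. The helper names, the random seed, and the ordering of runs in the example array are illustrative assumptions.

```python
import numpy as np

def split_by_level(levels, n_test=4, seed=0):
    """Group run indices by factor level and hold out n_test random runs per group."""
    rng = np.random.default_rng(seed)
    splits = {}
    for level in np.unique(levels):
        idx = np.flatnonzero(levels == level)             # runs at this level
        test = rng.choice(idx, size=n_test, replace=False)
        train = np.setdiff1d(idx, test)                   # remaining runs for training
        splits[float(level)] = (train, test)
    return splits

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Feed per tooth of the 162 runs: 54 runs at each of 0.03, 0.06, and 0.1 mm/tooth
feed = np.repeat([0.03, 0.06, 0.10], 54)
splits = split_by_level(feed)
for level, (train, test) in splits.items():
    print(level, len(train), len(test))   # 50 training and 4 test runs per group
```

For whole-data training, the same hold-out idea would simply be applied to all 162 runs at once instead of per group.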

Convolutional Neural Network (CNN)
A CNN is a type of deep learning model mainly used for image recognition. It can identify and learn features effectively while reducing the size of the data. It comprises an input layer, multiple hidden layers, and an output layer. The hidden layers are composed of convolutional layers, pooling layers, and fully connected layers. The convolutional layers are mainly responsible for feature extraction and are effective at learning spatial features. The pooling layers filter the feature data and retain the essential features to downsize the data, effectively reducing the difficulty of training.
The input data for this study were obtained in time series order, and such data are suitable for storage as a one-dimensional array. Thus, for the CNN model, we set the number of input channels to one and the input data format to a 1-by-2500 matrix. To keep the model size reasonable, the model contained one convolutional layer and one pooling layer, as shown in Figure 8. The convolutional kernel size was 500, the stride of the kernel (the step of each movement of the convolution kernel) was 300, and the zero padding was 200. The window (depth slice) of the pooling layer was 3, with a stride of 2. Finally, we set the number of output channels to 16.
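With these settings, the convolution output length is ⌊(L + 2P − K)/S⌋ + 1 = ⌊(2500 + 400 − 500)/300⌋ + 1 = 9 per channel, and a pooling window of 3 with stride 2 reduces it to ⌊(9 − 3)/2⌋ + 1 = 4. The NumPy sketch below checks these shapes. It assumes that the "depth slice" is a max-pooling window and that a ReLU follows the convolution, neither of which is stated in the text, and it uses random weights in place of trained ones.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv1d(x, w, b, stride, padding):
    """1-D cross-correlation as used in deep-learning frameworks.
    x: (in_ch, length), w: (out_ch, in_ch, kernel), b: (out_ch,)."""
    x = np.pad(x, ((0, 0), (padding, padding)))                            # zero padding on both ends
    windows = sliding_window_view(x, w.shape[-1], axis=1)[:, ::stride, :]  # (in_ch, L_out, kernel)
    return np.einsum("clk,ock->ol", windows, w) + b[:, None]

def max_pool1d(x, size, stride):
    """Max pooling along the last axis of x with shape (channels, length)."""
    return sliding_window_view(x, size, axis=1)[:, ::stride, :].max(axis=-1)

rng = np.random.default_rng(0)
signal = rng.normal(size=(1, 2500))             # one input channel, 1-by-2500 input
w = rng.normal(scale=0.01, size=(16, 1, 500))   # 16 output channels, kernel size 500
b = np.zeros(16)

feat = np.maximum(conv1d(signal, w, b, stride=300, padding=200), 0.0)  # ReLU (assumed)
pooled = max_pool1d(feat, size=3, stride=2)
print(feat.shape, pooled.shape)   # (16, 9) (16, 4)
```

The 16 × 4 = 64 pooled values would then feed the fully connected layers; their sizes are not given in this section.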

Deep Neural Network (DNN)
A DNN, as the name implies, is a neural network with many hidden layers; large DNNs can have dozens or hundreds of layers. Each layer contains many neurons, and each neuron transmits its weighted output to the neurons in the next layer. Users must set appropriate parameters according to the project requirements. The activation function is mainly used for nonlinear transformation, and the loss function is used to estimate the difference between the predicted and actual values. The model operates in two phases: forward propagation, which computes the output, and backpropagation, which propagates the error backward to update the weights.
Since each input dataset contained 2500 points, the DNN model had 2500 input nodes. We adjusted the number of hidden layers and the number of neurons to obtain the desired convergence. A model with more hidden layers is more complex and may overfit; reducing the number of hidden layers lowers the model complexity and helps avoid overfitting. The configuration of the hidden layers and nodes in the DNN model after the convergence analysis is shown in Figure 9.
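A forward pass through such a fully connected network can be sketched in NumPy. The 2500 input nodes come from the text, but the hidden-layer sizes (256 and 64) are hypothetical placeholders for the converged configuration in Figure 9. Counting the parameters illustrates how quickly model size grows relative to the 162 available samples.

```python
import numpy as np

INPUT_SIZE = 2500          # one input node per data point (from the text)
HIDDEN_SIZES = [256, 64]   # HYPOTHETICAL; the converged layout is given in Figure 9

def init_dnn(sizes, seed=0):
    """He-initialized weights and zero biases for consecutive fully connected layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(scale=np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def dnn_forward(x, params):
    """Forward propagation: ReLU on hidden layers, linear output for regression."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

params = init_dnn([INPUT_SIZE, *HIDDEN_SIZES, 1])
batch = np.random.default_rng(1).normal(size=(8, INPUT_SIZE))   # 8 synthetic input spectra
pred = dnn_forward(batch, params)                                 # one prediction per sample
n_params = sum(W.size + b.size for W, b in params)
print(pred.shape, n_params)   # (8, 1) 656769
```

Even this modest layout has about 657,000 trainable parameters, most of them in the first layer.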

Long Short-Term Memory (LSTM)
An LSTM network is a special type of RNN. It was developed to address the problem of vanishing and exploding gradients during training. Gradients vanish or explode because they are repeatedly multiplied during backpropagation through time, so that they shrink toward zero or grow excessively large, and the resulting weight updates become negligible or unstable. An LSTM network mitigates this problem with a gating mechanism consisting of an input gate, a forget gate, and an output gate, which allows gradients to propagate across many time steps and thus enables the identification of time-series correlations in the data.
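For reference, the gating mechanism can be written in the standard formulation of an LSTM cell with a forget gate. Here $x_t$ is the input at time step $t$, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, and $\odot$ element-wise multiplication; the weight symbols $W$, $U$, and $b$ are generic notation, not taken from this study.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

Because $c_t$ is updated additively, with $c_{t-1}$ scaled only by the forget gate $f_t$, the gradient along the cell state is not forced through repeated multiplication by the same weight matrix, which is what mitigates vanishing gradients.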
Our LSTM model had an LSTM layer and an output layer. The number of input nodes was 500, and the number of nodes in the LSTM layer was 32. Before the data were input, they were dimensionally transformed and rearranged into the form required by the LSTM model. MSE was used as the loss function, and RMSE was used to evaluate the training results.
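The sketch below, written in plain NumPy, runs one LSTM layer with 32 units over a sequence whose steps each have 500 features. It maps the final hidden state to a single output through a dense layer and provides MSE and RMSE helpers. Reshaping each 2500-value spectrum into 5 steps of 500 values (2500 = 5 × 500) is an assumption about the "dimensional transformation" mentioned above. The gate ordering, the initialisation, and names such as `lstm_forward` are hypothetical.

```python
import numpy as np

N_FEATURES = 2500                 # FFT magnitudes per dataset
N_INPUT = 500                     # input nodes per time step (from the text)
N_UNITS = 32                      # nodes in the LSTM layer (from the text)
N_STEPS = N_FEATURES // N_INPUT   # assumed reshape: 2500 values -> 5 steps x 500


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_lstm(rng, n_in=N_INPUT, n_units=N_UNITS):
    """Weights for the input, forget, output, and candidate blocks, stacked."""
    scale = 1.0 / np.sqrt(n_in + n_units)
    return {
        "W": rng.normal(0.0, scale, size=(n_in, 4 * n_units)),
        "U": rng.normal(0.0, scale, size=(n_units, 4 * n_units)),
        "b": np.zeros(4 * n_units),
        "w_out": rng.normal(0.0, scale, size=(n_units, 1)),
        "b_out": np.zeros(1),
    }


def lstm_forward(p, x):
    """x has shape (batch, steps, N_INPUT); returns predictions (batch, 1)."""
    h = np.zeros((x.shape[0], N_UNITS))
    c = np.zeros((x.shape[0], N_UNITS))
    for t in range(x.shape[1]):
        z = x[:, t, :] @ p["W"] + h @ p["U"] + p["b"]
        i, f, o, g = np.split(z, 4, axis=1)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g      # forget gate scales old state, input gate adds new
        h = o * np.tanh(c)     # output gate controls what the cell exposes
    return h @ p["w_out"] + p["b_out"]   # dense output layer on the final state


def mse(y_true, y_pred):
    """Mean squared error (loss function)."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def rmse(y_true, y_pred):
    """Root-mean-square error (evaluation metric)."""
    return float(np.sqrt(mse(y_true, y_pred)))


rng = np.random.default_rng(0)
params = init_lstm(rng)
spectra = rng.normal(size=(4, N_FEATURES))      # 4 synthetic spectra
seq = spectra.reshape(-1, N_STEPS, N_INPUT)      # (4, 5, 500)
pred = lstm_forward(params, seq)
print(seq.shape, pred.shape)                     # (4, 5, 500) (4, 1)
```

Training (backpropagation through time) is omitted. In practice a framework layer would compute the same recurrence and its gradients.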

Surface Roughness
We employed a surface roughness measuring instrument, obtained 162 sets of roughness data, and arranged them from smallest to largest, as illustrated in Figure 10. The distributions of the roughness with respect to the four control factors are shown in Figures 11–14. The results revealed that the surface roughness data formed three subsets corresponding to the three feed per tooth values. The measured roughness values were within the range of 0.12 to 0.2 µm for a feed per tooth of 0.03 mm/t, 0.34 to 0.46 µm for 0.06 mm/t, and 0.67 to 1.08 µm for 0.1 mm/t. By contrast, the remaining factors had no evident influence on the surface roughness distribution: for each of them, the roughness values were spread uniformly over the range of 0.12 to 1.08 µm. The feed per tooth was therefore the crucial variable affecting the roughness, and the data were divided according to the feed per tooth during the classification training. The training results are displayed in Figure 15. With classification training, the predictions of all three models had a mean percentage error within 10%. With whole-data training, however, the mean percentage error of all three models exceeded 10%: the CNN and LSTM models had a minimum percentage error of approximately 20%, and the DNN model had a minimum percentage error of 50%. This suggests that if the data are classified by feed per tooth at an early stage of data processing, each model can produce accurate predictions. With whole-data analysis, only the CNN and LSTM models achieved reasonable accuracy, whereas the DNN model had low accuracy, which is consistent with results reported in the literature. In terms of analytical efficiency, whether whole-data or classification analysis was applied, the CNN model had the fastest computing speed, followed by the LSTM model, and the DNN model required the longest computing time. The model calculation times are shown in Figure 16.
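The difference between classification training and whole-data training can be sketched as follows. The samples are partitioned by feed per tooth (0.03, 0.06, and 0.1 mm/t in this study), one model is fitted per group, and the percentage error is computed on held-out samples of each group. The error definition (mean absolute percentage error), the random 80/20 split, the helper names, and the toy `fit_mean` model that merely stands in for the DNN, CNN, or LSTM are all assumptions. The roughness values in the usage lines are synthetic.

```python
from collections import defaultdict

import numpy as np


def mean_percentage_error(y_true, y_pred):
    """Mean absolute percentage error in % (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)) * 100.0)


def split_by_feed(feeds):
    """Map each feed-per-tooth level (mm/tooth) to the indices cut at that level."""
    groups = defaultdict(list)
    for idx, fz in enumerate(feeds):
        groups[round(float(fz), 4)].append(idx)
    return dict(groups)


def fit_mean(x_train, y_train):
    """Hypothetical placeholder model: always predicts the training mean."""
    mean = float(np.mean(y_train))
    return lambda x: np.full(len(x), mean)


def grouped_errors(x, y, feeds, fit, by_feed=True, test_fraction=0.2, seed=0):
    """Fit one model per group and return the test percentage error per group.

    by_feed=True mimics classification training (one group per feed level);
    by_feed=False mimics whole-data training (a single group).
    """
    x, y = np.asarray(x), np.asarray(y, dtype=float)
    groups = split_by_feed(feeds) if by_feed else {"all": list(range(len(y)))}
    rng = np.random.default_rng(seed)
    errors = {}
    for key, idx in groups.items():
        idx = rng.permutation(idx)
        n_test = max(1, int(round(test_fraction * len(idx))))
        test, train = idx[:n_test], idx[n_test:]
        model = fit(x[train], y[train])
        errors[key] = mean_percentage_error(y[test], model(x[test]))
    return errors


# Synthetic illustration only (not measured data): roughness set by the feed.
feeds = [0.03] * 10 + [0.06] * 10 + [0.1] * 10
y = [0.15] * 10 + [0.40] * 10 + [0.90] * 10
x = np.zeros((30, 2500))                                     # placeholder spectra
print(grouped_errors(x, y, feeds, fit_mean))                 # near 0 for each feed
print(grouped_errors(x, y, feeds, fit_mean, by_feed=False))  # much larger
```

Even a trivial per-group model does well once the dominant factor is used to split the data, which mirrors the reasoning above about classifying by feed per tooth early.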

Dimensional Accuracy
The dimensional accuracy errors were arranged from smallest to largest, as illustrated in Figure 17. The results revealed that negative deviations in the cutting amount occurred during the cutting of stainless steel; that is, residual material remained on the parts. This is because stainless steel has high toughness: plastic deformation occurs on a large scale during the cutting process, which makes it difficult to cut the parts to their finished size. Error compensation and additional cutting are therefore required to finish the parts, a finding that is consistent with our experimental results. The dimensional accuracy was then examined for each cutting factor. Unlike the roughness, the dimensional accuracy showed no evident subset distribution for the different feeds per tooth (Figure 18). For the different radial depths of cut, the dimensional errors at the minimum radial depth of cut (0.05 mm) were larger, as shown in Figure 19. For the different axial depths of cut, the dimensional errors at the maximum axial depth of cut (20 mm) were larger, as shown in Figure 20. After the outliers were removed, the cutting speed had no significant effect on the dimensional error, as shown in Figure 21.
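One way to compute the dimensional error and compare its distribution across the levels of a cutting factor is sketched below. The sign convention (removed minus commanded depth, so that a negative value means residual material) and the 1.5 × IQR rule (Tukey's fences) used to drop outliers are assumptions, since the text does not state how the outliers were identified. The depths in the usage lines are synthetic.

```python
import numpy as np


def cutting_deviation(commanded_mm, removed_mm):
    """Removed minus commanded depth (mm); negative => material left on the part."""
    return np.asarray(removed_mm, dtype=float) - np.asarray(commanded_mm, dtype=float)


def drop_iqr_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    mask = (values >= q1 - k * iqr) & (values <= q3 + k * iqr)
    return values[mask]


def summarize_by_level(levels, errors, remove_outliers=True):
    """(mean, std, count) of the dimensional error for each level of one factor."""
    levels = np.asarray(levels)
    errors = np.asarray(errors, dtype=float)
    summary = {}
    for level in np.unique(levels):
        e = errors[levels == level]
        if remove_outliers:
            e = drop_iqr_outliers(e)
        summary[level.item()] = (float(e.mean()), float(e.std()), e.size)
    return summary


# Synthetic illustration only: commanded vs. removed radial depth (mm).
commanded = [0.05, 0.05, 0.10, 0.10, 0.10]
removed = [0.035, 0.040, 0.095, 0.092, 0.060]
errors = cutting_deviation(commanded, removed)
print(summarize_by_level(commanded, errors))
```

Grouping by a different column (axial depth of cut, feed per tooth, or cutting speed) gives the per-factor comparison in the same way.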
The training results are presented in Figure 22. For the machining accuracy, no obvious trend was observed when the models were trained on the classified data, which differs from the result for the surface roughness. The prediction errors of the three models were similar, ranging from 7% to 28%. With whole-data training, the DNN model had the lowest percentage error (6.99%), followed by the CNN model (10.51%), whereas the LSTM model performed the worst. The model calculation times are presented in Figure 23. The trend was the same as that for the surface roughness: whether whole-data or classification analysis was used, the CNN model had the fastest computing speed, followed by the LSTM model, and the DNN model required the most time. Therefore, if a model that is both accurate and efficient is required, the CNN model remains the ideal choice.

Conclusions
In this study, three different models were applied to extract features from machining data in order to train neural networks that predict the surface roughness and machining accuracy. The evaluation of the resulting data confirmed the feasibility of using neural networks to predict the machining outcomes. Additionally, preprocessing and grouping the data improved the prediction accuracy. After comparing and discussing the results of the three models, we identified the model with high accuracy (a prediction percentage error below 10%) and the shortest computation time. The results are as follows:

1. The feed per tooth was the most influential factor affecting the surface roughness in this study; different feeds per tooth produced significant differences in the surface roughness.

2. The machining of stainless steel showed negative deviations in the cutting amount (with residual material) because of the material properties. This phenomenon is consistent with the literature.

3. When the surface roughness was predicted with the data grouped by feed per tooth, all three ANN models had high accuracy, with percentage errors below 10%. However, when whole-data training was used for surface roughness prediction, the percentage error increased significantly; for the DNN model, it exceeded 50%.

4. For the prediction of the dimensional accuracy, we could not determine whether whole-data training or classified-data training was preferable. Nevertheless, the predictions were still accurate (percentage error below 20%). The most accurate prediction method was whole-data training of the DNN model.

5. When the DNN model was employed to predict the machining accuracy, whole-data training yielded the best performance. By contrast, for surface roughness prediction, the DNN model was the worst model under whole-data training. The CNN and LSTM models did not show substantially different prediction performance between whole-data training and classified-data training for the dimensional accuracy.

6. For both surface roughness prediction and machining accuracy prediction, the CNN model had the shortest computation time, followed by the LSTM model; the DNN model had the longest computation time.

Figure 1. Sensory tool holder and machining center.

Electronics 2023, 12, x FOR PEER REVIEW 7 of 16
… 9–12 N·m after the tool came into contact with the workpiece during the side milling process. After the tool left the workpiece, the bending moment decreased. The sampling interval of the tool holder was 0.0004 s, corresponding to a sampling frequency of 2500 Hz. During machining, 2 s of signal was captured, yielding 5000 samples for each dataset. To achieve satisfactory training results, we performed feature extraction before inputting the data into the ANN models for training. The purpose of feature extraction was to obtain essential and meaningful features from the raw data and to increase the analytical efficiency. The bending moment signals were converted from the time domain to the frequency domain using the fast Fourier transform (FFT). Because the original signals were sampled at 2500 Hz, the effective bandwidth of the spectrum after the FFT was 1250 Hz, that is, the Nyquist frequency.
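This preprocessing step can be sketched with NumPy's real-input FFT. The sampling rate (2500 Hz) and record length (5000 samples) come from the text. Taking the magnitude spectrum, normalising it by the record length, and discarding the Nyquist bin so that exactly 2500 values feed a 2500-node input layer are assumptions about details the text does not specify. The test tone in the usage lines is synthetic.

```python
import numpy as np

FS = 2500.0         # sampling rate in Hz (0.0004 s sampling interval)
N_SAMPLES = 5000    # 2 s of bending-moment signal per dataset
N_FEATURES = 2500   # values passed to the 2500-node input layer


def fft_features(signal):
    """Return (frequencies in Hz, magnitude spectrum) with N_FEATURES bins.

    The one-sided FFT of 5000 real samples has 5000 // 2 + 1 = 2501 bins
    spanning 0 to 1250 Hz (the Nyquist frequency) in 0.5 Hz steps; the last
    (Nyquist) bin is dropped here so that exactly 2500 features remain.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.shape[-1] != N_SAMPLES:
        raise ValueError(f"expected {N_SAMPLES} samples, got {signal.shape[-1]}")
    magnitude = np.abs(np.fft.rfft(signal, axis=-1)) / N_SAMPLES
    freqs = np.fft.rfftfreq(N_SAMPLES, d=1.0 / FS)
    return freqs[:N_FEATURES], magnitude[..., :N_FEATURES]


# Synthetic check: a 100 Hz tone of amplitude 3 (not measured data).
t = np.arange(N_SAMPLES) / FS
freqs, mag = fft_features(3.0 * np.sin(2.0 * np.pi * 100.0 * t))
print(mag.shape, freqs[np.argmax(mag)])
```

The function also accepts a batch of shape `(n_datasets, 5000)` and transforms along the last axis, which is convenient when preparing all datasets at once.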

Figure 11. Roughness distribution graph in terms of feed per tooth.

Figure 12. Roughness distribution graph in terms of radial depth of cut.

Figure 13. Roughness distribution graph in terms of axial depth of cut.

Figure 14. Roughness distribution graph in terms of cutting speed.

Figure 16. Comparison of computing time between models.

Figure 18. Distribution graph of dimensional accuracy in terms of feed per tooth.

Figure 19. Distribution graph of dimensional accuracy in terms of radial depth of cut.

Figure 20. Distribution graph of dimensional accuracy in terms of axial depth of cut.

Figure 21. Distribution graph of dimensional accuracy in terms of cutting speed.

Figure 22. Percentage error of dimensional accuracy prediction.

Figure 23. Comparison of computing time between models.

Table 1. Mechanical properties and chemical composition of SUS304.

Table 2. Specification of the tool.

Table 3. Specification of the sensory tool holder.

Experimental parameters: … min), Feed per Tooth (mm/tooth), Axial Depth of Cut (mm), Radial Depth of Cut (mm)