Recent Advances on Machine Learning Applications in Machining Processes

Abstract: This study aims to present an overall review of the recent research status regarding Machine Learning (ML) applications in machining processes. In the current industrial systems, processes require the capacity to adapt to manufacturing conditions continuously, guaranteeing high performance in terms of production quality and equipment availability. Artificial Intelligence (AI) offers new opportunities to develop and integrate innovative solutions in conventional machine tools to reduce undesirable effects during operational activities. In particular, the significant increase of the computational capacity may permit the application of complex algorithms to big data volumes in a short time, expanding the potentialities of ML techniques. ML applications are present in several contexts of machining processes, from roughness quality prediction to tool condition monitoring. This review focuses on recent applications and implications, classifying the main problems that may be solved using ML related to the machining quality, energy consumption and condition monitoring. Finally, a discussion on the advantages and limits of ML algorithms is summarized for future investigations.


Introduction
The Fourth Industrial Revolution has enhanced the application of Machine Learning (ML), improving the machining capabilities and reducing the production costs. Articles on ML in machining have appeared since the late 1980s and 1990s. For example, Junkar et al. [1] applied a Decision Tree algorithm for performance monitoring in plunge grinding based on vibration signals with different frequency domain features. Rangwala et al. [2] used an Artificial Neural Network (ANN) for predicting optimal operating conditions (maximization of the material removal rate, MRR) in turning using process parameters as the input. Okafor et al. [3] in 1995 used an ANN based on time domain features from acoustic emission, vibration, cutting force and time signals in milling for surface roughness and bore tolerance prediction. These were some of the first articles that showed the application of ML in machining; nevertheless, they were sparse and limited due to the low computational capacity of their time.
Stochastic-based models have been implemented in the literature for machining applications [4][5][6]. The well-known distributions, such as Gaussian, Log-normal, Exponential and Weibull, are the mathematical techniques applied as the lives of tools are explored, and the optimization of the parameters is requested [7]. These models are based on the inner-relations searches between the tool wear factor and process parameters or surface roughness, with the aim of optimizing the tool life. The proportional hazards model developed by Cox may be used for fog model optimization [8]. Other models differ on the process parameters and signal features, and an optimization step is required for Cox and the selected distribution parameters [9].
The conventional methods in the machining context are commonly established on a physical model with the aim of extrapolating correlations between the model variables. Meanwhile, the purpose of an ML model is to obtain a predictor that is as accurate as possible, which is frequently difficult to interpret in physical terms. Conventional methods require several assumptions, such as the statistical distribution model for the studied variables. ML methods make it possible to consider the available information without screening or prioritization actions, obtaining a superior degree of flexibility [10]. Moreover, a significant expansion of ML started from 2015 with an intensive application of methods and algorithms. For this reason, this review aims to evaluate the advances of ML focusing on the recent state-of-the-art innovations, analyzing the main techniques and the obtained results.
ML architectures may be classified into supervised, semi-supervised and unsupervised learning. Supervised learning applications consist of processes where the input and output dataset values are known, and the forecast outcomes of the models need to be verified against the real resultant dataset. An unsupervised dataset consists of input information without knowledge of the resultant output. Finally, semi-supervised learning mixes data with known outputs and data with unknown outputs. This learning method has been implemented to reduce the time consumption and the corresponding expenses of supervised learning. The combination of the previously mentioned architectures poses a number of challenges due to the uncertainty of the unknown result dataset [11].
The ML approach is applied in machining to solve a broad range of issues and related root causes, such as the poor quality of the production due to process vibrations, thermal conditions and component breakages or a low efficiency of equipment due to ineffective monitoring of the component life, incorrect maintenance scheduling and high energy consumption of a machine tool. Although these types of problems may seem different, they can be addressed using the same approach. In fact, the application of ML algorithms increases the machining capabilities, obtaining more robust conditions in monitoring systems and models, since they are trained from data through seven main steps: (1) problem definition, (2) data collection, (3) signal processing, (4) feature extraction, (5) feature selection, (6) model implementation and (7) model validation.
The first step is the problem statement (1). The correct selection of input data determines the capability of the deployed model. Typical choices in machining are the measurements of the process forces [12,13], accelerations [14,15], vibrations or acoustic emission sensors [16,17] (2). Furthermore, the availability of the inner sensors or process parameters allows simple and reliable applications [18,19], including all the data available during the production. The use of new external sensors should be evaluated in detail, since it may increase the time and cost of the study without implying better results in terms of accuracy for the ML application.
In order to use the data collected from the different sensors, a signal treatment (3) is required for correct feature extraction (4). Typically, there are three main domains: (i) Time Domain (TD), (ii) Frequency Domain (FD) and (iii) Time-Frequency Domain (TFD). For each domain, there are typical coefficients: for example, statistical features (e.g., kurtosis, standard deviation) are used for TD signals [20], while other types of features are obtained (e.g., RMS and peak values [21]) for FD signals.
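As an illustration of the feature extraction step, the following minimal sketch computes two time-domain statistics (standard deviation and kurtosis) and one frequency-domain feature (the dominant spectral peak) for a signal. The function names, the synthetic 50 Hz test signal and the naive discrete Fourier transform are assumptions made for this example and are not taken from the reviewed studies; a real application would typically rely on an FFT library.

```python
import cmath
import math


def time_domain_features(signal):
    """Return the standard deviation and (non-excess) kurtosis of a 1-D signal."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n  # population variance
    # Kurtosis: fourth central moment divided by the squared variance.
    kurt = sum((x - mean) ** 4 for x in signal) / n / var ** 2
    return {"std": math.sqrt(var), "kurtosis": kurt}


def dominant_frequency(signal, sample_rate):
    """Return (frequency_hz, amplitude) of the largest spectral peak,
    excluding the DC and Nyquist bins, using a naive O(n^2) DFT."""
    n = len(signal)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        mag = 2 * abs(coeff) / n  # single-sided amplitude
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * sample_rate / n, best_mag


# Hypothetical vibration signal: a unit 50 Hz sine sampled at 1 kHz for 0.2 s.
fs = 1000
vibration = [math.sin(2 * math.pi * 50 * i / fs) for i in range(200)]
td = time_domain_features(vibration)
peak_hz, peak_amp = dominant_frequency(vibration, fs)
```

For a pure sine, the kurtosis is 1.5 and the spectral peak sits at the excitation frequency; an impulsive event such as a tool fracture would raise the kurtosis well above this value.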
This analysis helps to identify the features that should be used in the prediction of the ML model, excluding those signals that may create noise in the system while increasing the ML complexity (5). This step can be developed using wrapper methods, which consist of testing the result variations for different settings of the input data. In order to improve the selection of features, further methods, called filters, may be considered; these rank the features based on external criteria. In this case, PCA (Principal Component Analysis) [22,23] and PCC (Pearson Correlation Coefficient) evaluations may be applied [24,25]. The filters could be used as the first processing phase, while the wrapper methods could be implemented to obtain the optimal feature selection for the ML applications [26,27].
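A filter based on the Pearson Correlation Coefficient can be sketched as follows. The threshold of 0.5, the feature names and the small dataset are hypothetical values chosen only to illustrate the idea; they do not come from References [24,25].

```python
import math


def pearson(x, y):
    """Pearson Correlation Coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def filter_features(features, target, threshold=0.5):
    """Keep the feature names whose |PCC| with the target reaches the
    threshold, ranked from the most to the least correlated."""
    scores = {name: abs(pearson(values, target))
              for name, values in features.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda name: scores[name], reverse=True)


# Hypothetical data: tool wear grows with the vibration RMS,
# while the second feature is unrelated noise.
wear = [0.10, 0.15, 0.22, 0.30, 0.41, 0.55]
features = {
    "vibration_rms": [1.0, 1.4, 2.1, 2.9, 4.2, 5.4],
    "noise": [0.3, -0.2, 0.1, -0.3, 0.2, -0.1],
}
selected = filter_features(features, wear)
```

Such a filter only captures linear relations with the target, which is one reason why it is usually followed by a wrapper method.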
Once these steps are defined (the feature selection with wrapper methods is completed only after the ML model selection), the ML model can be applied (6). The selection of the correct ML algorithm depends on the type of problem (e.g., regression or classification approach). The model selection is influenced by the dataset preprocessing activity for a robust comparison between the prediction methods to maintain the generalization ability of the architecture. Cross-validation techniques are commonly applied to determine the best model. The conventional method separates the data into training and testing substructures. The groups require that the training dataset contains representative conditions. Several different approaches have been developed, such as k-fold cross-validation or leave-p-out cross-validation [28], to allow a superior model design. The limitation is represented by the time consumption and computational requirements. However, the ability to predict data out-of-learning edges is a central characteristic of Machine Learning applications.
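The k-fold procedure mentioned above can be sketched in a few lines. This is a minimal, library-free illustration: the mean-value baseline model and the helper names are assumptions introduced for the example, and no shuffling is performed, although time-ordered machining data would normally be shuffled or split by experiment before use.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.
    Fold sizes differ by at most one sample; no shuffling is done here."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(xs, ys, k, fit, predict):
    """Return the mean absolute error averaged over the k held-out folds."""
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [abs(predict(model, xs[i]) - ys[i]) for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k


# Hypothetical baseline: always predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


xs = list(range(10))
ys = [2.0 * x for x in xs]
folds = list(k_fold_indices(10, 3))
score = cross_validate(xs, ys, 5, fit_mean, predict_mean)
```

Each sample is used exactly once for testing, so the averaged score reflects the behaviour of the model on data that it has not seen, at the cost of training the model k times.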
In this way, the wide availability of ML algorithms suggests comparing different methods in order to identify the most suitable scheme (7), although this comparison is limited by the time-consuming work required to collect, process and analyse data. In particular, an increment of sensors may be required, resulting in big volumes of data.
These available datasets allow further extensions, such as Deep Learning (DL) applications, whose learning, unlike that of ML, is not limited by the amount of data. DL applications are used as feature extraction and selection methods (4 to 5) for ML applications, proving to be a reliable and efficient application based on the obtained results, as shown in References [29,30]. Moreover, this allows the implementation of simpler ML methods, such as k-nearest neighbors (k-NN), simple ANNs or a Decision Tree (DT). Besides this application, the use of DL algorithms in machining as a substitute for ML reduces the number of required steps, since feature extraction and selection (4 to 5) are typically done by the same model [31,32]. These steps may improve the final results of the DL applications, as highlighted in References [33,34]. The main limits of DL algorithms arise when the acquired data are not enough to train the models correctly. For example, breakage or malfunction data tend to be scarce compared to the massive amount of data collected in conventional operational conditions of a machine tool, which may bias the model.
A further extension of ML applications is transfer learning (TL), as shown in References [35,36], that consists of storing information and knowledge obtained in solving one problem and applying them to a different but related problem. This approach may be used in order to: (1) reduce the learning time required, (2) avoid the first problematic steps in learning for ML and DL and (3) improve the results.
In recent years, the application of these AI methods has rapidly expanded in machining processes. Their applications are typically based on nonlinear time variant problems that can be clustered into five groups:
• Condition Monitoring [37,38]: it consists of monitoring process parameters (e.g., temperature, vibrations, accelerations and tool wear) in order to predict the machine tool (MT) conditions. For example, it allows the correct definition of the tool replacement time and reduces the periods of time that the machine is stopped due to critical breakages. In this case, further approaches may be used, such as Tool Condition Monitoring (TCM) and Condition Monitoring (CM, related to the current machining structure).
• Chatter [39,40]: it is a self-excited vibration caused by the continuous interaction between the tool and the workpiece that creates several issues in machining. The ability to correctly determine or predict when chatter occurs reduces the probability of having to refinish the workpiece and the corresponding tool wear.
• Quality [41,42]: the thermomechanical behavior of the machining structure may generate unwanted tool tip displacements, which would increase the final machining error. In this case, the main issue is related to the quality level of the surface roughness, which is one of the main requirements in machining.
• Modeling [43,44]: there are several applications that require modeling and predicting a phenomenon related to the machine technology. For example, it is useful to determine the correct material removal or the most suitable cooling/lubricating technology to improve the equipment sustainability.
• Energy [45,46]: the energy consumption prediction of a machine tool is becoming increasingly critical in terms of emission reduction and energy efficiency of the manufacturing processes. The application of ML techniques makes it possible to predict the most suitable MT setting to save energy in machining, guaranteeing the required production performance.
Although the literature presents various ML applications in machining [47][48][49][50], further studies are required to investigate the recent advances in algorithms and approaches in development. In particular, the increase of the ML extension shows the need of a set of structured rules to allow an effective comparison between different applications and results. In this work, the authors aim to present a review of the recent state-of-the-art innovations, classifying the type of application problems solved by ML implementation, the applied approaches (input and features) and the obtained results, highlighting the practical implications, as shown in Table 1.
This article is structured as follows: Section 2 presents the main applications and the different ML models and configurations. In Section 3, a discussion of the analyzed research is proposed, highlighting the advantages and limitations. Finally, Section 4 presents the conclusions of the review.

Machine Learning Paradigm in Machining Application
In the last decades, ML has attracted much attention from academic researchers and industrial engineers in a wide research area. In order to evaluate the practical applications in industrial machining, this review classifies six macro-categories of problems addressed by ML: chatter, roughness, quality, modeling, machine condition monitoring and tool condition monitoring.

Chatter
Chatter is a self-excited vibration that occurs in machining (e.g., milling and turning) while operating at a high speed. This is an undesirable phenomenon that has negative effects, such as a poor surface finish, unacceptable accuracy, excessive noise and tool wear. A conventional structured approach is the application of the stability lobe diagram (SLD) that determines the process parameters to anticipate unwanted vibrations. SLD is used to label the data; nevertheless, the system is learnt from analytical models, and therefore, it requires effective initial modeling. In this way, ML may offer new opportunities for chatter online recognition, employing extra sensors (e.g., accelerometers).
Denkena et al. [51] used the process parameters to train and compare a number of models: an ANN, a support vector machine (SVM) and a new approach based on kernel interpolation (KI) to predict the SLD. The study was applied to a five-axis milling center with a milling tool of 10 mm in diameter and four teeth. The results showed that the ML-based creation of SLD may be a good alternative to analytically calculated SLD. All the models achieved accuracies over 88%. In particular, the KI model obtained the highest accuracy (94%). The chatter phenomenon was recorded using acceleration sensors and a microphone. The main limit of this application was the availability of data when the system changed the conditions. For this reason, Postel et al. [19] proposed a new approach based on Deep Neural Networks (DNN) and transfer learning, in which the DNN models are pretrained with simulated data from analytical stability models. The aim was to reduce the differences between the real measurements and the model output due to the cutting forces and the tool tip dynamics. The study obtained an accuracy of 83.6% for the testing set, using as inputs the spindle speed, depth of cut and tool clamping length, as well as the entry angle and exit angle. Another transfer learning example with the Random Forest (RF) algorithm was studied in Reference [35], where the vibration signal was used together with the stick out length, the Ensemble Empirical Mode Decomposition (EEMD) was applied as the feature extraction and the Recursive Feature Elimination (RFE) method was executed to classify the machining state into three categories: no chatter, mild chatter and chatter. The final results showed that, for transfer learning feature extraction, the EEMD outperformed the Wavelet Packet Transform (WPT).
Yesili et al. [52] focused on the feature extraction method to determine the presence of chatter based on the simulated oscillations of the milling tool (1DOF) data with high noise. They implemented the Topological Data Analysis (more specifically, Carlsson Coordinates and Template Functions), obtaining high accuracies with several algorithms; the ones with the highest accuracies were the SVM and Gradient Boosting. Binary chatter classification employing vibration signals in the input also included SVM with frequency domain features in Reference [53] and ANN in Reference [54], where the statistical time domain features were used together with a DT (J48) for feature selection.
Other than accelerometers, sound sensors are sometimes considered for chatter recognition, as in Reference [55], where the binary classification was obtained with an SVM, the short-time Fourier-transform (STFT) features and an Autoencoder for dimension reduction. Mixing sound signals and vibration information has also been implemented in Reference [56] with the t-distributed stochastic neighbour embedding (t-SNE) to obtain the final features and the Reinforced k-NN for chatter prediction.
Finally, force signals were employed for chatter classification into three classes, in Reference [13] with the Gradient Tree Boosting algorithm, the ASA for feature extraction and the Laplacian score for the final selection; in Reference [12], a convolutional neural network (CNN) was implemented based on the features extracted from the continuous wavelet transform (CWT). The obtained results showed new opportunities to combine different signals to predict the chatter phenomenon.

Roughness
Roughness estimation and prediction may be evaluated through a set of models based on two main clusters: classification and regression. The classification models permit a discretization of the interval of the regression values, as demonstrated in Pan et al. [57], where the roughness, from 0 to 300 nm, was discretized in 150 clusters of 2 nm each. Despite limiting the maximum achievable accuracy in the final roughness value, the discretization made it possible to increase the classification accuracy and reduce the output variability. The vibration signals measured with a laser vibrometer were applied in the t-distributed stochastic neighbour embedding (t-SNE) method for feature selection to enhance the capability of the CNN, obtaining high accuracies in their predictions.
The classification approach may generate a prediction similar to a regression when a high number of classes is used; nevertheless, other applications [58][59][60] defined few levels (three to four) of surface roughness for the prediction, guaranteeing effective results. In this way, Yu et al. [58] used the vibration signal and process parameters with a DL method, namely the Knowledge-Based Deep Belief Network (KBDBN), to classify the roughness value within three levels, obtaining a recognition rate of 97.67%. Differently, other studies based on tree algorithms used the model to classify the roughness within four levels. In this case, Lu et al. [59] implemented the Deep Forest, a DL method based on the WPT and Fast Fourier-Transform (FFT) features of forces and load signals to predict the roughness class, obtaining a testing accuracy of 90.91%. Finally, Grzenda et al. [60] applied a semi-supervised structure based on Random Forest; although the accuracy was low if compared to the other results (around 70%), the authors showed the potential contribution of unlabeled data in improving the model accuracy.
Considering the regression predictions, the best-known algorithm is the ANN: Shi et al. [61] used double-modeling (BP-ANN) to predict the cutting force and the final roughness. Thankachan et al. [62] applied ANOVA as a feature selection method to predict roughness in wire electrical discharge machining (WEDM). Mirifar et al. [63] proposed a roughness prediction model based on the discrete wavelet transform (DWT) and RMS peak value of the acoustic emission and on the grinding parameters, obtaining accuracies over 97%. Finally, Segreto et al. [42] studied the acoustic emission, forces and current measurements through PCA feature reduction to obtain through the ANN the correct time for tool changes based on the roughness prediction, obtaining a mean absolute percentage error (MAPE) of 4.2% in their prediction.
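The discretization described above can be sketched as follows, using the 0 to 300 nm range and the 2 nm bin width reported for Reference [57]. The function names and the choice of the bin centre as the class estimate are assumptions made for this illustration, not details of the original study.

```python
def roughness_to_class(ra_nm, bin_width_nm=2.0, max_nm=300.0):
    """Map a roughness value in [0, max_nm] to a 0-based class index."""
    if not 0.0 <= ra_nm <= max_nm:
        raise ValueError("roughness outside the modelled range")
    n_classes = int(max_nm / bin_width_nm)
    # The upper bound itself is assigned to the last class.
    return min(int(ra_nm // bin_width_nm), n_classes - 1)


def class_to_roughness(index, bin_width_nm=2.0):
    """Return the bin centre, used here as the roughness estimate of a class."""
    return (index + 0.5) * bin_width_nm


n_classes = int(300.0 / 2.0)
label = roughness_to_class(47.3)
estimate = class_to_roughness(label)
```

With this scheme, even a perfect classifier has a residual error of up to half the bin width (1 nm), which is the accuracy limit that the discretization introduces.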
Alajmi et al. [64] also implemented an ANN-based model (Adaptive Neuro-Fuzzy Inference System, ANFIS) based on the Takagi-Sugeno Fuzzy Rules and the Quantum Particle Swarm Optimization Algorithm (QPSO) to predict and optimize the surface roughness in turning. The only DL application was also based on a neuro-like model; the authors showed how 1D-CNN is able to extract high feature information in the medium range of surface roughness, whilst the FFT-LSTM (long short-term memory) structure obtained better results at higher roughness values, where its temporal modeling is advantageous [65].
Liu et al. [66] considered Gaussian modeling and Bayesian learning to implement a surface roughness estimation in ring-shaped thin-walled discs to reduce the number of measurements required based on measured values along different trajectories. Another Bayesian application [22] applied as an input the vibration and process parameter time domain features and the radial basis function-based kernel principal component analysis (KPCA-IRBF) as a feature selection to apply the Sparse Bayesian Linear Regression (SBLR) for the roughness prediction, obtaining a Root Mean Square Error (RMSE) of 0.0317 and a Pearson Correlation Coefficient of 0.9926.
Finally, Vuong et al. [67] used a Quadratic Regression model based on the cutting parameters, vibration signal and force components. In order to apply this model, a signal treatment was implemented based on the extraction of the time-domain statistical and the frequency-domain features from the input signals using FFT. The k-fold cross-validation was applied with the t-statistics and p-value for feature selection, obtaining an accuracy of 95.25% and an adjusted R² of 71.75%.
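The error measures quoted throughout this section (RMSE, MAPE, R² and adjusted R²) follow standard definitions, sketched below. The measured and predicted values are hypothetical numbers used only to exercise the functions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; y_true must not contain zeros."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(y_true, y_pred, n_features):
    """R^2 penalised for the number of predictors used by the model."""
    n = len(y_true)
    return 1.0 - (1.0 - r2(y_true, y_pred)) * (n - 1) / (n - n_features - 1)


# Hypothetical roughness measurements and predictions (same unit).
measured = [1.0, 2.0, 4.0, 5.0]
predicted = [1.1, 1.9, 4.2, 4.8]
```

Because the adjusted R² penalises every additional input feature, it can be noticeably lower than a plain accuracy or R² figure obtained on the same model.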

Quality
The literature shows a number of ML applications to improve the overall performance level and quality of a machine tool (MT). In this way, Aggogeri et al. [41] presented an ANN application (Multi-Layer Perceptron, MLP) to model the MT thermomechanical deformations of CFRP (Carbon Fiber Reinforced Polymers) structures based on the global variations of the temperature, the gradient of the temperature between the front and rear of the ram, the gradient between the spindle flange and the vertical axis and the environmental condition temperatures. The authors reduced the detected error to under 10 µm. Wang et al. [68] used DL applications for thermal deformation modeling using data mining based on RST, reducing the thermal error by ~99%. Fujishima et al. [69] focused on thermal displacement prediction, applying a DNN with Bayesian Dropout and considering the sensor failures to test the robustness of the model. In Reference [70], the research aimed to predict the thermal drift based on four different working conditions; the authors applied pretrained coefficients for the CNN initialization, obtaining a so-called CNN-FT (fine-tuning). A model accuracy of 87.06% with an MSE of 0.0124 and a MAPE of 0.2154 was obtained. Li et al. [71] applied the CNN approach and a Domain Adaptation Module for the thermal error prediction, achieving a model accuracy over 94.87% and an MSE under 6.1 × 10⁻⁶.
The CNN scheme is also applied to detect workpiece surface defects [72], using, as the input, the scattering data from a laser beam and simulated data for training to reduce the required time. A further application to detect edge inconsistencies from images was demonstrated in Reference [73]. The quality error on the workpiece was evaluated in several studies by analyses performed with tree-based algorithms. For example, Bustillo et al. [74] applied RF Ensemble combined with the Synthetic Minority Over-Sampling Technique (SMOTE) for flatness deviation prediction based on the tool's life, average drive power and flank wear, with a testing accuracy of 86.44%. RF was also implemented for the diameter, roundness and other quality parameters using torque values and speed statistical time-domain features [75] or the axial force and torque together with the process parameters [76]. Finally, an Extreme Tree Regressor (ETR) was considered for the workpiece diameter and concentricity control in drilling and reaming [77].
A further ML approach is the Support Vector Machine. Liu et al. [78] showed a study on the residual properties of the ball screw raceway in dry machining using a Gaussian RBF kernel and the combination of SVM with the Least Squares method (LSSVM). The authors considered as input the cutting parameters, tool parameters and the machining condition (clamping coefficient). RBF kernels were also applied for tool deflection modeling [79], obtaining a 94% accuracy, including the SVM (k-RBF) based on LIBSVM. Glatt et al. [80] predicted the martensite content after cryogenic turning, considering the passive, cutting and feed forces and temperature, applying the PCC approach for feature selection and obtaining an RMSE of ~0.8. A linear kernel was also applied for flatness classification prediction in honeycomb cores using the time-domain and frequency-domain (based on FFT) features of the force signal and the PCA for feature reduction [81,82]. Finally, Nain et al. [83] implemented a Gaussian Process with Polynomial Kernel to predict the SR peculiarities of WEDM.
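The oversampling idea behind SMOTE, mentioned above for the flatness deviation study, can be illustrated with a simplified sketch: a synthetic sample is created by interpolating between a minority-class sample and one of its nearest minority-class neighbours. The function below is a didactic approximation written for this purpose, with hypothetical data and parameter values; it is not the implementation used in Reference [74].

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, mate)])
    return synthetic


# Hypothetical minority class: three parts with an out-of-tolerance flatness,
# described by two normalised features.
defective = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]
augmented = defective + smote(defective, n_new=5)
```

Since every synthetic point lies on a segment between two real minority samples, the method enlarges the rare class without leaving the region already covered by the measurements.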
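For reference, the Gaussian RBF and linear kernels mentioned above have simple closed forms, sketched below together with the Gram matrix that a kernel-based model such as an SVM operates on. The value of the width parameter gamma and the sample vectors are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def linear_kernel(x, y):
    """Linear kernel k(x, y) = <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def gram_matrix(samples, kernel):
    """Pairwise kernel evaluations between all samples."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Hypothetical normalised inputs (e.g., cutting speed and feed).
samples = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1]]
K = gram_matrix(samples, rbf_kernel)
```

The RBF kernel decays with the distance between two operating points, so nearby cutting conditions influence each other's prediction more than distant ones, whereas the linear kernel yields a model that is linear in the input features.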

Modeling
Machine learning and Deep Learning techniques may be also exploited to model the behavior of a number of MT components and structural parts. Interesting applications are related to the prediction of the process forces on the workpiece and the computation of coefficients to define the stability of the machining. These applications could be seen as a "virtual sensor" generator. Vaishnav et al. [84] developed an ANN scheme to model the cutting force in end milling operations. The authors studied the process parameters and the rotation angle to generate the necessary data to train the ML model, obtaining an RMSE of 1.0058 and an R² of 99.98%. This structure was also implemented to develop a Reinforcement Learning (RL) structure based on an ANN [85]. The aim was to limit the resulting force in press-and-release systems, obtaining a resulting final roughness reduction of 30.68%. A further example of RL application was described in Reference [86] to predict the optimal clamping position of the workpiece.
Neural Network applications have been applied for tool tip dynamics, stability and optimization problems [36,44,87,88]. Misaka et al. [89] considered Neural Networks, under the form of CNN, based on camera images of the metal cutting processing for machining parameters extraction, obtaining a model accuracy of 85.5% and a precision of 92.9%. In the same way, a Convolutional Neural Network structure was implemented with Res-Net configuration, considering the responsive fixtures and process data as input, in order to allow the Bidirectional LSTM model to predict the part deformation with an error equal to 10.61% [29]. LSTM was also applied to model the dependency between the deviation, tensile force and eccentricity of low-rigidity shaft machining with an MSE of 1.5456 × 10⁻⁵ [90].
In order to predict the dynamic heat input for monitoring robotic belt grinding, Ren et al. [91] presented a Bayesian Adaptive Direct Search-Least Squares Support Vector Machine (BADS-LSSVM) with an RBF kernel, considering the sound signal and the process forces (e.g., normal and tangential). In this case, the force features were obtained using the moving smoothing filter method and extracting the Dynamic Friction Coefficient. The Wavelet Packet Decomposition method was applied for the sound signal, calculating several features such as average amplitude, kurtosis and zero passage rate.
SVM may be also applied to model and control the trajectory [92] and several MT parameters, such as the tool velocity, the roughness generation (90% of accuracy) and the part features (e.g., diameter) prediction (97% of accuracy), as demonstrated in Reference [93]. The models are accurate enough to provide useful conclusions applicable to the current industrial practices.
Gurgenc et al. [94] applied an Extreme Learning Machine (ELM) in milling for the time estimation of cycloidal gear machining based on the design and manufacturing parameters of the gear. The study obtained an RMSE of 1.6837 and an R² of 99.34%. A further example in milling was presented by Garrido-Labrador et al. [95], who evaluated the machining mode, state and motor temperature (regression) predictions by applying the ML structure. The authors implemented the process parameters (e.g., axis position, cutting tool position and machine speed) as the model input.
Gao et al. [43] considered a tree-based structure in modeling the material removal, where the DWT and FFT of the acoustic signal were used as input for the k-fold XGBoost model, obtaining a MAPE of 4.373%.
Finally, interesting applications of ML focus on Energy modeling in order to optimize the MT consumption, guaranteeing the machining performance level. Pantazis et al. [45] applied Hidden Markov Modeling using the power signal as an input while using dynamic time warping with hierarchical clustering for the feature extraction. The authors obtained a MAPE of 1.12%. Considering the instantaneous power and process parameters, He et al. [46] proposed a complex structure with two different CNN, both for modeling and feature extraction, including a fully connected layer.
Shin et al. [96] applied an ANN with a transfer learning approach based on the similarity between manufacturing conditions. The authors considered the feed rate, spindle speed and cutting speed as the model inputs, obtaining a maximum relative error of 5.94%. Through the machining parameters and the ANOVA study for the feature selection, a Gradient Boosting Regression Tree (GBRT) model was presented in Reference [18] to predict the energy consumption of a five-axis MT. The study predicted the energy consumption ratio of each considered component (e.g., basic, spindle, feed and milling) with the prediction errors within 6%. Finally, a further example of energy consumption prediction was shown in Reference [97]. The authors developed a multitask learning approach for power classification in laser machining, achieving a classification accuracy over 90.61% and an MAE of 0.29558 for the regression.

Machine Condition Monitoring
Condition monitoring is one of the most recognized applications of ML in machining processes. The capacity to predict and detect a failure or identify the wear level of the tool may determine and improve the quality of the production and the performance level of an MT. In particular, the prognostics of the behavior of MT components and parts is one of the main topics of the state-of-the-art innovations in this area, although the literature shows several studies related to Tool Condition Monitoring rather than to the condition monitoring of MT structures, since it may strongly impact equipment costs and efficiency, as demonstrated in Section 2.6.
Li et al. [98] presented a deep transfer learning perspective on condition monitoring based on the CNN approach. Three types of faults were identified: spindle failure, severe tool wear and tool breakdown. Through similarity coefficients, the authors addressed the feature distribution mismatch during transfer learning, on which the CNN was applied based on the maximum mean discrepancy in a Reproducing Kernel Hilbert Space. The experimental results indicated that the approach achieved 94% accuracy in order to evaluate the lifecycle of a MT.
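The maximum mean discrepancy (MMD) mentioned above measures how far apart two feature distributions are in a Reproducing Kernel Hilbert Space, which is what a transfer learning method tries to reduce between the source and target domains. The sketch below computes the standard biased estimate of the squared MMD with a Gaussian RBF kernel; the kernel width and the one-dimensional feature samples are hypothetical, and the code does not reproduce the network of Reference [98].

```python
import math


def rbf(a, b, gamma=1.0):
    """Gaussian RBF kernel between two feature vectors."""
    return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy:
    mean k(s, s') + mean k(t, t') - 2 * mean k(s, t)."""
    def mean_kernel(p, q):
        return sum(rbf(a, b, gamma) for a in p for b in q) / (len(p) * len(q))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


# Hypothetical features from a source machine and from two target machines.
source = [[0.0], [0.1], [0.2], [0.3]]
similar_target = [[0.05], [0.15], [0.25], [0.35]]
different_target = [[2.0], [2.1], [2.2], [2.3]]
```

A small value indicates that features learned on the source machine are likely to remain meaningful on the target one, whereas a large value signals a distribution mismatch that the adaptation step has to compensate.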
CNN was also applied for an imbalanced classification problem, as demonstrated in Reference [99], applying the Deep Cost Adaptive CNN configuration to the vibration signal and obtaining a final MAE of 2.5 µm.
In order to predict the backlash error in machining centers [100], a Deep Belief Network (DBN) built by stacking Restricted Boltzmann Machines (RBM) was applied using a number of inputs, such as the number of weeks since the last maintenance, the temperature of the coolant tank, the temperature of the machining center, the ambient temperature, machining torque and the backlash error in the previous week. The result was an MSE of 0.0122 µm and an ME of 0.2041 µm. A further extension of the neural-DL structure was the BLSTM-ANN, which considered the FFT and sliding time window of the forces, vibrations and acoustic emissions as the input to predict the Remaining Useful Life (RUL) of a MT [37].
Lu et al. [101] defined a six-state classification Deep Forest model based on tool wear, chatter and machining deformation. In this case, the FFT and WPT of the vibrations and sound signal were considered as the inputs, while the Lasso technique was implemented to select the model features.
An interesting application of MT performance monitoring and fault classification was developed with a Random Forest structure in Reference [21] with the FFT, peaks and RMS of the vibration signal from the spindle. The obtained results showed a structured approach to monitor the machine health and performance in real time.
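As a rough illustration of the kind of vibration features named above (RMS, peak and a spectral descriptor), the following sketch uses only the Python standard library. The naive DFT is O(n²) and stands in for an FFT purely to keep the example self-contained; the synthetic 50 Hz signal and the sample rate are assumptions, not data from Reference [21].

```python
import cmath
import math

def rms(signal):
    """Root mean square of the signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def peak(signal):
    """Largest absolute amplitude."""
    return max(abs(x) for x in signal)

def dominant_frequency(signal, sample_rate):
    """Frequency (Hz) of the largest non-DC bin of a naive DFT."""
    n = len(signal)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n

# Synthetic spindle vibration: a 50 Hz sine of amplitude 2 sampled at 1 kHz.
sample_rate = 1000
signal = [2.0 * math.sin(2 * math.pi * 50 * t / sample_rate) for t in range(200)]

features = {
    "rms": rms(signal),
    "peak": peak(signal),
    "dominant_hz": dominant_frequency(signal, sample_rate),
}
print(features)
```

A feature dictionary of this kind would then be passed to a classifier such as a Random Forest; in practice, an FFT routine from a numerical library would replace the naive DFT.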
The last tree-like structure was applied in Zhang et al. [102] for failure prediction in cyber-physical production systems using the spindle speed, spindle power and vibration level. The authors extracted the features through the Dynamic Principal Component Analysis, and a Gradient Boosting Decision Tree was applied as the model. The experimental results indicated that the accuracy of the predicted production failures using the proposed predictive tool was close to 73%.
Finally, Nguyen et al. [103] presented an interesting study on bearing fault diagnosis using the FFT of vibration data as the input signal, from which a Stacked Autoencoder extracted the features used by a Least-Squares Support Vector Machine optimized by the Chemical Reaction Optimization Algorithm.

Tool Condition Monitoring
The modeling and prediction of the tools' condition may offer significant opportunities to increase the MT performance level, since it may ensure the quality of the parts, improve machining efficiency and reduce the operation costs. For this reason, several ML techniques may be evaluated and applied to model this phenomenon. The wear evolution may be classified and discretized into three main phases. The first phase presents fast wear of the tool, usually at the beginning of the process, when the friction between the tool and the workpiece generates the maximum stress. In the second phase, the tool wear and the wear speed are steady, since the roughness of the tool achieves a certain smoothness. Finally, in the third phase, the cutting force rises due to the blunt tool edge, and the wear speed increases with the resulting temperature, causing heat deformation, precision errors and, finally, the tool breakage.
The literature shows two main groups of ML techniques to address this problem, based on direct and indirect methods. Direct methods aim to directly measure the wear on the tool; this could also be achieved through the application of ML to images, avoiding the time needed for the manual measurement. Wu et al. [31] developed a CNN approach based on tool images, focusing on the spindle speed in order to allow the camera to capture effective images. The results indicated an average recognition precision rate of 96.20% in tool wear classification (flank wear, tool breakage, adhesive wear and rake face wear). Other applications require stopping the machining process for the tool wear measurement [32,104], with significant limits in terms of time waste. Despite the high model accuracy and precision of direct methods, the need to stop the MT generates several issues in production. Moreover, the direct measure requires a high knowledge of the process to define when the tool wear should be measured, increasing the complexity of the sampling method.
In contrast, the indirect methods are based on sensor signals (e.g., vibrations, acoustic emission or forces) and are applied to conduct continuous or intermittent estimations and predictions of the tool wear. The implementation of indirect methods, despite a reduction in the final accuracy, may be preferred and provide new opportunities to develop a TCM application. The indirect methods are based on two main approaches: classification, which may vary from multiple classes to a binary application, and regression, to obtain the current wear status or the RUL.
Li et al. [23] developed a TCM classification in six wear states: good, slight, average, heavy, severe and failure. The authors obtained the wavelet packets from the sound signal in order to separate the source signals from the wavelet sub-bands. They applied the Extended Convolutive Bounded Component Analysis (ECBCA), and finally, they decomposed the source signals into time-varying oscillatory components using the Multivariate Synchrosqueezing Transform (MSST) for denoising. The Adaptive Kernel Principal Component Analysis (AKPCA) was implemented to decompose the extracted features into a set of linearly uncorrelated components, obtaining an input vector that allowed the application of a simpler ML method, namely an SVM with an RBF kernel, which reached a testing accuracy of 98.47%. Simpler ML applications need extra signal treatment and feature extraction to function correctly. For example, a Two-Layer Angle Kernel Extreme Learning Machine (TAKELM) was found in three different applications by changing the input signals: sound [105], current [106] and multi-sensor [107], where a Binary Differential Evolution (DBE) was applied to search the optimal feature parameter combination.
In order to avoid any extra sensors mounted on the machining center, Pagani et al. [108] presented an indirect measurement system using HSV images of the produced chips, obtaining good performances on medium and high wear but poor performances in initial wear recognition. Despite the advantages of this method in terms of sensor choice, positioning and acquisition, it still showed a few practical problems due to the high variability in chips and coolant applications.
The literature shows interesting applications of the ANN method. Liu et al. [109] used the WPT of the sound input signal and a Collinearity Diagnostic with a Stepwise Regression for feature selection. The authors obtained an overall error of 7.20%. In contrast, Segreto et al. [25] used statistical features extracted from the WPT of force, AE and vibration signals. A feature selection step based on the PCC was implemented, obtaining a MAPE of 5.17%.
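Feature selection based on the Pearson Correlation Coefficient (PCC) can be sketched as follows: each candidate feature is correlated with the measured wear, and only the features whose absolute correlation reaches a threshold are kept. The feature names, the toy values and the 0.8 threshold are hypothetical and are not taken from Reference [25].

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant sequence carries no linear information
    return cov / math.sqrt(var_x * var_y)

def select_features(features, target, threshold=0.8):
    """Return feature names with |PCC| >= threshold, strongest first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda name: -abs(scores[name]))

# Hypothetical statistical features and the corresponding flank wear (mm).
wear = [0.05, 0.10, 0.15, 0.20, 0.25]
features = {
    "force_rms": [10.0, 12.0, 14.0, 16.0, 18.0],
    "vib_kurtosis": [3.1, 2.9, 3.0, 3.2, 2.8],
    "ae_energy": [5.0, 4.1, 3.2, 1.9, 1.0],
}

print(select_features(features, wear))
```

The PCC only captures linear dependence, so a feature with a strong but nonlinear relation to wear could be discarded by this filter.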
The multi-sensor scheme was applied in further applications, for example, with tree-based structures. Wu et al. [110] considered time-domain features with a RF model, obtaining an MSE of 10.156 µm. Shen et al. [111] introduced an ensemble conceptual design based on several ML methods (e.g., RF, Gradient Boosting Regression, ANN, Linear Regression and SVM), with a final dynamic smoothing phase to predict the tool wear size. As in other applications, the PCC was applied for feature selection, including the process parameters, obtaining a final RMSE of 0.00834.
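Because the studies above report different error metrics (MSE, RMSE, MAE and MAPE), a short sketch of how they relate may help when comparing results. For a wear measured in µm, the MAE and RMSE are in µm, the MSE is in µm² and the MAPE is a percentage. The measured and predicted values below are invented for illustration.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error, in squared units of the target."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    return math.sqrt(mse(y_true, y_pred))

def mape(y_true, y_pred):
    """Mean absolute percentage error; undefined if a true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical measured and predicted flank wear values in µm.
measured = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

print(mae(measured, predicted), mse(measured, predicted))
print(rmse(measured, predicted), mape(measured, predicted))
```

In this toy case every absolute error is 10 µm, so the MAE and RMSE coincide, while the MAPE weights the same error more heavily on the smaller wear values.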
In this context, Deep Learning applications may improve the prediction model, as demonstrated in different studies. In Reference [112], the authors applied a LeNet-WSRMC network for tool state recognition and classification, employing vibration and current signals and obtaining an average testing accuracy of 96%. Similarly, Lee et al. [38] obtained an accuracy of 97.44% in worn grinding wheel recognition, evaluating the sound signal and using the FFT and the retransformation into the time domain through iFFT for signal processing. A further application of the CNN method was proposed by Martínez-Arellano et al. [113]. They considered the GASF method to obtain an image representation of the force signal input. Cao et al. [14] applied the DWF and the Hilbert Envelope Demodulation Spectra (HEDS) approach to the vibration signals with a 2D input for the CNN, resulting in a final classification accuracy (for five categories) of 98.7%.
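The Gramian Angular Summation Field (GASF) encoding mentioned above turns a one-dimensional signal into a square image that a CNN can consume. A minimal sketch follows: the series is rescaled to [-1, 1], each value is mapped to an angle through the arccosine, and the image entry (i, j) is the cosine of the sum of the two angles. The short force series is a made-up example, and the sketch omits the downsampling step that is usually applied to long signals.

```python
import math

def rescale(series):
    """Linearly rescale a series to the interval [-1, 1]."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)  # constant series: map to the midpoint
    return [(2.0 * x - hi - lo) / (hi - lo) for x in series]

def gasf(series):
    """Gramian Angular Summation Field of a 1-D series."""
    # Clamp to guard against rounding slightly outside [-1, 1].
    scaled = [max(-1.0, min(1.0, x)) for x in rescale(series)]
    phi = [math.acos(x) for x in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]

# Hypothetical cutting force samples (N).
force = [0.0, 1.0, 2.0]
image = gasf(force)
for row in image:
    print([round(v, 3) for v in row])
```

The resulting matrix is symmetric, and its main diagonal preserves the rescaled values of the original series, since cos(2φ) = 2x² − 1.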
Finally, the combination of LSTM and CNN improves the feature extraction step and the ability to extract time-dependent features. An example was presented by An et al. [33], where the vibrations and PLC controller signals (axes position and spindle power) were analyzed with a SBULSTM (Stacked Bi-Directional and Uni-Directional LSTM Network) with a fully connected layer at the top, adding nonlinearity to the output, and one regression layer to generate the target RUL. The results indicated an accuracy of 90%. Several other applications of both ML and DL schemes, such as SSAE [27] or DBN [114], are classified and summarized in Table 1.
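Recurrent models for RUL regression are typically fed with fixed-length windows of the sensor signal, each paired with a target value. The sketch below shows one simple way to build such pairs, assuming a run-to-failure record (the last sample coincides with the end of the tool life) and a linear RUL label expressed in samples. Both assumptions, as well as the window length and stride, are illustrative and do not reproduce the pipeline of Reference [33].

```python
def make_windows(signal, window, stride):
    """Split a run-to-failure signal into (window, RUL) training pairs.

    The RUL label is the number of samples remaining after the last
    sample of each window, assuming failure at the end of the record.
    """
    pairs = []
    for start in range(0, len(signal) - window + 1, stride):
        end = start + window
        rul = len(signal) - end
        pairs.append((signal[start:end], rul))
    return pairs

# Hypothetical vibration record of ten samples.
record = [0.1, 0.1, 0.2, 0.2, 0.3, 0.4, 0.6, 0.9, 1.3, 1.8]
pairs = make_windows(record, window=4, stride=2)

for segment, rul in pairs:
    print(segment, rul)
```

Overlapping windows (a stride smaller than the window length) increase the number of training pairs, at the cost of correlated samples that should be kept within the same train or test split.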

Discussion
The application of ML methods to machining processes is a challenging task that covers a broad range of domains. The literature presents several recent studies, examples and practical experiments, with particular focus on the TCM applications and roughness modeling. The main problems to be solved using ML and DL are often correlated; for example, the chatter phenomenon may generate unwanted vibrations that affect the workpiece surface finishing (roughness) and, consequently, cause a quicker degradation of the tool. In the same way, the tool wear is directly correlated both with the final quality of the workpiece and with an increase of the energy consumption. As noted in the different applications, the main strategies are based on the same methods; nevertheless, the integration of new algorithms and schemes is required to satisfy specific conditions. This point may limit the identification of a structured rule to implement the ML approach. Table 1 shows an overall summary of the recent literature classified by type of application problem, machining, algorithm, input, features and obtained results. It is noted that milling and turning machining show the most promising results to model and predict the MT behavior. Nevertheless, new achievements have been obtained from other applications that represent the new frontier for ML.
In this review, an overall analysis of the recent advances of ML and DL in machining processes has been developed, starting from the application problem. Considering chatter applications, several studies with fewer sensors (or even none, if a data-driven model is deployed instead of a real-time recognition system) have been found. Transfer Learning and DL applications are the main implemented methods, with satisfying results; despite this, this type of architecture may be more complex than the problem requires. In this context, significant results are obtained using the SVM or ANN approaches, which represent an effective solution. Moreover, in order to have real-time recognition, a sensor, such as a vibration or sound sensor, should be applied; nevertheless, the limits for industrial implementation should be considered.
Roughness prediction models can also be deployed with ML algorithms. As demonstrated, there are two main ways: a regression model and a multi-classification system, which corresponds to a coarser discretization of the considered roughness interval to reduce the output variability. For these applications, good results were found with Bayesian applications, especially with correct PCA applications for regression problems; for grinding processes and a coarser discretization (multi-classification), good results have been obtained by applying an ANN with a DWT feature extraction method. Finally, DL applications are used for classification problems, where a DBN application with a vibration signal obtained the highest accuracies.
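The difference between the regression and the multi-classification formulation can be made concrete with a small sketch: a continuous roughness value is mapped to a class index by comparing it with a sorted list of interval boundaries. The boundary values (0.8, 1.6 and 3.2 µm) are hypothetical and chosen only for illustration; each study defines its own discretization.

```python
import bisect

def roughness_class(ra, boundaries):
    """Map a roughness value to a class index.

    `boundaries` must be sorted in ascending order; a value equal to a
    boundary falls into the upper class.
    """
    return bisect.bisect_right(boundaries, ra)

# Hypothetical class boundaries for Ra in µm, giving four classes (0 to 3).
boundaries = [0.8, 1.6, 3.2]
measurements = [0.5, 0.8, 1.0, 2.0, 5.0]

labels = [roughness_class(ra, boundaries) for ra in measurements]
print(labels)
```

A regression model would be trained on `measurements` directly, whereas a classifier would be trained on `labels`; fewer and wider classes reduce the output variability at the cost of resolution.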
Quality problems, together with model problems, tend to be case-specific development problems; there are several applications, depending on the needs found by each author in real applications. The literature shows different practical studies with a broad range of algorithms and techniques. The adopted methods depend on the type of problem to be solved. Special interest and promising results were found for thermal deformation modeling and CFRP machining.
Machine Condition Monitoring through Deep Transfer Learning is an interesting area, since it allows a greater generalization in deployment. Most applications were developed with DL structures; the few ML algorithms, such as SVM, relied on a Stacked Autoencoder or Random Forest for feature extraction. Deep Forest, together with other DL algorithms, showed significant results, as did the backlash error prediction through a DBN. Finally, TCM has been the main source of articles for Smart Machining. As for other applications, two main structures have been found: regression and classification problems, apart from direct and indirect methods. Considering indirect methods, which are the ones that allow a continuous functioning of the machining center and a real-time control of tool wear and RUL estimation, a correct choice would be to deploy a DL strategy, through SSAE, LSTM (or RNN) and CNN, for feature extraction, which would allow an extra degree of freedom in the choice of the actual modeling algorithm, which may even be a ML one. If the available dataset is small, a good choice is to use a simpler ML algorithm, such as ELM or SVM, both of which obtained good results when the lack of data was one of the issues. If the amount of data available is not an issue, a DL application should be deployed, as this type of configuration allows for less interference of human choices and a greater generalization of the algorithm scheme. The application of CNN is particularly interesting for feature extraction, together with an LSTM for temporal feature recognition.
The physical comprehension of the model and the amount of data expected are two factors that require extensive experience in ML strategy deployment. White-box models (e.g., DT and RT) are based on patterns or rules that can be understood by experts and/or technicians. Black-box models (e.g., CNN and DNN) contain complex mathematical functions, and their deployment is not determined by predictable interactions [115]. The tradeoff between model complexity and depth affects the computation time of the training phase. A concern regarding the model depth is the double-descent phenomenon, where the performance varies when the model complexity increases or the number of training epochs is altered, as was investigated in Reference [116].
The selection and installation of the sensors may play a critical role in the ML results. Depending on the application, the use of sound sensors (microphones) and cameras may be suitable, although their implementation in an actual industrial environment may face several drawbacks. For example, cameras, apart from being expensive sensors, need correct illumination, a positioning scheme and a clean environment (dust may influence the measurement), and even floor vibrations may influence the final image quality. Moreover, high image quality results in high-dimensional input data, which requires more complex algorithms; for CNN applications, although the image could be used directly, a first filtering and data reduction phase should be implemented in order to reduce the computational time required.
Sound sensors, instead, despite their great ability in worn grinding wheel detection and chatter recognition, require an important phase of signal treatment that is typically developed for each application, and the microphone placement may generate several issues in practical applications. The advantage of these sensors is that they do not influence the machining center or change its configuration, as happens, for example, with the installation of a dynamometer. Dynamometers may limit the dimensions of the workpieces; moreover, they are expensive and interfere with the rigidity of milling machines. Although sensor positioning is crucial for the measurements, vibration sensors tend to be used across many different applications. As with sound sensors, they require an extensive signal processing phase, as their signals are difficult to filter, and vibrations exist even during air-cut operations.
AE sensors have a superior sensitivity to high-frequency signals but are highly sensitive to environmental noise, and their signals are difficult to process under intermittent cutting; in milling processes, for example, there is a spike whenever an individual flute enters or exits the workpiece.
Due to the nature of the manufacturing process, signals are usually nonstationary and contain both high- and low-frequency components, which makes the application deployment with single sensors more difficult. This is why, for example, both vibration and AE sensors tend to be used together in order to simultaneously consider both high- and low-frequency components. Multi-sensor applications tend to have a wider application and better results, although correct feature extraction and selection methods are necessary to allow ML and DL applications with higher amounts of data. Interestingly, electric current sensors are getting more attention for TCM, as they tend to be able to correctly represent the wear stages; however, these applications tend to need an extra sensor to recognize the actual state, as the current values depend on the machining parameters. Current sensors also tend to be subjected to considerable noise, and the actual signal is influenced by the damping and friction between elements; moreover, they have a limited sensitivity band with respect to other sensors, such as AE sensors.

Conclusions
This review aimed to present an overall analysis of the recent advances in machining processes using ML. The literature presents several studies with promising results. The proposed approaches are based on the same methods; nevertheless, the integration of new schemes and algorithms is required to satisfy specific conditions. This point may limit the identification of a common rule to implement the ML approach. Although a number of practical studies show promising results (in particular, for milling and turning machining), the limits and constraints of ML application cannot be ignored. First, semi-supervised methods are rarely found despite being a promising typology for industrial applications, probably due to their critical aspect of depending on the output of a trained model to train the actual ML model. A further issue is the usage of reinforcement learning, which is still difficult to apply due to the training duration and complexity.
These drawbacks need to be considered in the overall evaluation of ML/DL machining applications. They represent an open challenge to improve the integration of ML and DL schemes into a real manufacturing environment. Smart equipment, sensors and cloud data sources that connect industrial machines are the boosting drivers for ML/DL applications. Consequently, model deployment and testing are becoming a work activity for experienced technicians and lab researchers, achieving significant results for manufacturing and production applications [160-163]. Additionally, the ML/DL combination with stochastic procedures permits a superior accuracy. In particular, a parallel operation would allow the models to be trained in the presence of unstable or transient processes, assuring a minimization of the convolution time.
The main technology adopted for most applications is based on Deep Neural Networks (DNN), where CNN and LSTM have proven to be particularly important in obtaining higher accuracies with respect to other ML methods, especially for TCM, where a combination of both has proven to be the optimal structure to extract the temporal features and for wear calculations. Despite the accuracy obtained with Deep Learning methods, their application depends on the availability of large amounts of data, whereas other, simpler ML methods sometimes manage to obtain close results with smaller datasets and a fraction of the computational time. Deep Learning applications do not require any feature extraction and selection phase, whereas these phases are required in ML applications. Moreover, the choice of the correct features, or even their extraction, may not be optimal; thus, expertise and know-how are crucial for this type of application. The review of the literature highlighted several research directions and unexploited opportunities for ML and DL applications that require further investigation.
Author Contributions: All the authors contributed equally. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflicts of interest.