A Comparative Analysis on Prediction Performance of Regression Models during Machining of Composite Materials

Modeling the interrelationships between the input parameters and outputs (responses) in any machining processes is essential to understand the process behavior and material removal mechanism. The developed models can also act as effective prediction tools in envisaging the tentative values of the responses for given sets of input parameters. In this paper, the application potentialities of nine different regression models, such as linear regression (LR), polynomial regression (PR), support vector regression (SVR), principal component regression (PCR), quantile regression, median regression, ridge regression, lasso regression and elastic net regression are explored in accurately predicting response values during turning and drilling operations of composite materials. Their prediction performance is also contrasted using four statistical metrics, i.e., mean absolute percentage error, root mean squared percentage error, root mean squared logarithmic error and root relative squared error. Based on the lower values of those metrics and Friedman rank and aligned rank tests, SVR emerges out as the best performing model, whereas the prediction performance of median regression is worst. The results of the Wilcoxon test based on the drilling dataset identify the existence of statistically significant differences between the performances of LR and PCR, and PR and median regression models.


Introduction
A composite material usually consisting of a combination of two or more materials with varying physical and chemical properties has superior characteristics as compared to its individual constituents. Without losing the properties of the entities, they are combined together, contributing to the most useful properties of a composite material for a special purpose application [1]. Several advantageous properties of composite materials, such as high impact strength, stiffness, corrosion resistance, strength-to-weight ratio, thermal conductivity, dimensional stability, customized surface finish, lightweight, etc., have made them a popular choice in manufacturing of aerospace structures, electrical equipment, pipes and tanks, laminated beams, etc. Thus, a composite material has multiple desirable properties which cannot be found in a single traditional material.
Among different types of composite materials, fiber-reinforced polymer (FRP) composites have a polymer matrix which is reinforced with an artificial or natural fiber (e.g., carbon, glass or aramid). In FRP composites, the matrix protects the fibers from environmental and external damage, while the fibers provide strength and stiffness, resisting crack generation and failure of the base material. On the other hand, in metal matrix composites (MMCs), the matrix is usually made of a lighter metal (e.g., aluminum or magnesium).
The application of machine learning (ML), despite its tremendous strides in some other fields, is at a nascent stage in manufacturing/machining sciences. The primary goal of this work is to analyze the utility of various ML-based regression methods in predictive modeling of machining processes. In this paper, LR, PR, SVR, PCR, quantile regression, median regression, ridge regression, lasso regression and elastic net regression are considered because of their ability to deal with continuous data for predicting the response values during turning and drilling operations of composite materials based on two past experimental datasets. To the best of the authors' knowledge, these regression models have been individually applied as prediction tools in separate machining processes, and no study has been conducted dealing with their applications in a single research framework. The predictive performance of the considered regression models is contrasted using four statistical error estimators, i.e., mean absolute percentage error (MAPE), root mean squared percentage error (RMSPE), root mean squared logarithmic error (RMSLE) and root relative squared error (RRSE), for both the case studies. Finally, two non-parametric tests in the form of the Friedman test and Wilcoxon test are performed to respectively identify the best performing regression model and statistically significant differences between those models.

Linear Regression
It is the simplest form of the regression models, where the relationship between the independent and dependent variables is considered to be linear. It only takes into account the main effects of the independent variables on the dependent variable, having the following form:
$$y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \varepsilon$$
where $y$ is the dependent variable, $\beta_0$ is the intercept, $\beta_i$ is the coefficient of the ith independent variable, $x_i$ is the ith independent variable (i = 1, 2, ..., n) and $\varepsilon$ is the error term. Thus, based on this simple linear equation, values of the responses for any combination of the input parameters within the specified range can be predicted.
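As a minimal sketch of how such a main-effects model can be fitted by ordinary least squares, the following Python code solves the normal equations for a small synthetic dataset. The function names (`solve_linear_system`, `fit_ols`, `predict`) and the data are illustrative assumptions; they are not taken from the turning or drilling experiments, and the paper itself reports using R.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so that row operations act on both sides.
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [beta_0, beta_1, ..., beta_n] minimizing the sum of squared errors."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 models the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    """Evaluate beta_0 + sum(beta_i * x_i) for one input combination."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Synthetic, noise-free data generated from y = 1 + 2*x1 + 3*x2 (hypothetical).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
beta = fit_ols(rows, y)
```

Because the synthetic responses contain no noise, the fitted coefficients recover the generating values, and `predict(beta, (3, 2))` evaluates the fitted equation at a new input combination.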

Polynomial Regression
Unlike multivariate LR, a PR model is usually developed while considering higher-order terms of the input parameters (independent variables). Both LR and PR models determine the corresponding coefficient values based on the ordinary least squares estimator. In this paper, PR models of order two are developed, which can be expressed as below:
$$y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \sum_{i=1}^{n} \beta_{ii} x_i^2 + \varepsilon$$
where $\beta_{ii}$ is the coefficient of the $x_i^2$ term.
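A second-order PR model can be viewed as an LR model fitted on an expanded set of inputs. The sketch below only shows that expansion step, under the assumption that the squared terms are the only higher-order terms added (interaction terms are left out here). The function name `second_order_features` and the input values are hypothetical.

```python
def second_order_features(row):
    """Return the original inputs followed by their squares x_i^2."""
    return list(row) + [x * x for x in row]


# Two hypothetical input combinations with two parameters each.
rows = [(2.0, 3.0), (4.0, 0.5)]
expanded = [second_order_features(r) for r in rows]
```

The expanded rows can then be passed to any ordinary least squares routine, so that the coefficients of the squared columns play the role of the second-order coefficients.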

Support Vector Regression
The SVR is a supervised learning technique based on the principle of the support vector machine (SVM), which is applied for both classification and regression and develops a hyperplane between two sets of data [22,23]. A margin is created while developing two parallel hyperplanes, one on each side, and its width reaches its maximum at the optimal solution. The optimal separation (solution) is achieved at the minimum generalization error of the model, thus ensuring the highest margin between the two hyperplanes. The data subset defining the optimal margin is known as the set of support vectors.
In SVM, the dimension of the classified vectors has less influence on its performance, unlike other conventional regression models. It employs a set of training data to learn and develop a model in order to minimize the generalization error when its performance is validated with different sets of testing data. Although SVM is mainly applied for solving classification problems, after the introduction of SVR, it has received great interest among the research community for solving regression problems which are quite difficult to solve by the conventional models. As it has very few tuning parameters, the corresponding computational effort is greatly reduced while searching for its appropriate architecture for a given problem. Having the ability to deal with both linear and non-linear models, it employs non-linear kernel functions (such as polynomial) to derive the optimal solutions for non-linear models.

Principal Component Regression
The PCR model combines both principal component analysis (PCA) and least squares regression [24]. Its application starts with developing a stepwise regression with a dependent variable y and a set of independent variables x for deriving p statistically significant independent variables (significance value less than 0.05) and revealing the presence of multicollinearity among the p independent variables. A PCA is then performed with the p independent variables for transforming a set of correlated variables into a set of uncorrelated principal components, while indicating the information quantities of different sets of principal components. In the subsequent steps, values of the standardized dependent variable, p standardized independent variables and p principal components are determined for developing p standardized PCR models [25]. The standardized PCR model is thus formulated with the first principal component, and the other principal components are added one by one to derive p standardized PCR models. In this paper, all the input parameters for the considered turning and drilling processes are treated as the principal components.

Quantile Regression
Quantile regression is a technique to estimate the relationship between a set of variables for all portions of a given probability distribution. While the conventional regression models provide information with respect to mean values of the distributions for a set of regressors, quantile regression computes several different regression models for various percentage points of the distribution, providing a more complete depiction of the data [26]. For the Tth quantile (0 < T < 1), the area under the probability distribution curve can be split into two sections, i.e., one with area T below the Tth quantile and the other with area (1 − T) above it. Thus, the regression model for the Tth quantile can be represented as below:
$$Q_T(y \mid x) = \beta_0(T) + \sum_{i=1}^{n} \beta_i(T)\, x_i$$
where $Q_T(y \mid x)$ is the Tth conditional quantile of the dependent variable y, and $\beta_0(T)$ and $\beta_i(T)$ are the intercept and the coefficient of the ith independent variable $x_i$ for that quantile.
In multivariate regression models, the change in the conditional mean of the dependent variable related to a change in the regressors (independent variables) is specified, while quantile regression specifies changes in the conditional quantile. Thus, it can be considered as an extension of multivariate regression models. This model helps in inspecting the rate of change of the dependent variable by quantiles. When the model is developed for the 50th percentile (T = 0.5), it is called median regression.
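Quantile regression coefficients are commonly estimated by minimizing the so-called check (pinball) loss, which weights positive and negative residuals by T and (1 − T). The sketch below illustrates that loss for constant predictions only; the function name `pinball_loss` and the data are hypothetical, and it is an assumption that the software used in this study follows the same formulation.

```python
def pinball_loss(actual, predicted, t):
    """Average check loss for quantile level t, with 0 < t < 1."""
    total = 0.0
    for a, p in zip(actual, predicted):
        residual = a - p
        # Under-prediction (residual >= 0) is weighted by t,
        # over-prediction (residual < 0) by (1 - t).
        total += t * residual if residual >= 0 else (t - 1.0) * residual
    return total / len(actual)


# Hypothetical responses with one large value.
actual = [1.0, 2.0, 3.0, 4.0, 10.0]
loss_at_median = pinball_loss(actual, [3.0] * 5, 0.5)  # constant = median
loss_at_mean = pinball_loss(actual, [4.0] * 5, 0.5)    # constant = mean
```

For T = 0.5 the loss is half the mean absolute residual, so the sample median gives a smaller value than the sample mean for this skewed example.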

Median Regression
As already stated, quantile regression for the 50th percentile (T = 0.5) is known as median regression. Median regression is also sometimes referred to as LAV (least absolute-value) regression, as its parameters are estimated by minimizing the sum of the absolute values of the residuals. If covariates are absent in the median regression model, the calculated intercept would be the usual estimate of the median [27]. The adjusted median computed using LAV is relatively insensitive to outliers as compared to LR models. The following equation for median regression can now be derived from quantile regression:
$$Q_{0.5}(y \mid x) = \beta_0(0.5) + \sum_{i=1}^{n} \beta_i(0.5)\, x_i$$
where $Q_{0.5}(y \mid x)$ is the conditional median of the dependent variable y, and $\beta_0(0.5)$ and $\beta_i(0.5)$ are the intercept and coefficients estimated for T = 0.5.
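The statement about the covariate-free case can be checked numerically. The following sketch searches a grid of candidate intercepts for the one minimizing the sum of absolute residuals and compares it with the sample median and mean. The data, the grid and the name `sum_abs_residuals` are illustrative assumptions.

```python
import statistics


def sum_abs_residuals(values, intercept):
    """LAV objective for a model consisting of an intercept only."""
    return sum(abs(v - intercept) for v in values)


# Hypothetical responses; the last value acts as an outlier.
values = [2.1, 2.4, 2.6, 2.9, 9.5]

# Candidate intercepts 0.00, 0.01, ..., 10.00.
candidates = [i / 100 for i in range(0, 1001)]
lav_intercept = min(candidates, key=lambda c: sum_abs_residuals(values, c))

sample_median = statistics.median(values)
sample_mean = statistics.mean(values)
```

In this example the LAV intercept coincides with the sample median, while the sample mean is pulled upwards by the outlier.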

Ridge Regression
As multivariate LR models are based on least squares estimates, they do not perform well for ill-conditioned data with respect to both prediction accuracy and model size. While deriving the optimal fit to the estimation data, least squares often do not perform well for new data (outside the region of the estimation data). To overcome these drawbacks of ordinary least squares estimates, several regularized regression models, such as ridge regression, have evolved over the last few decades.
In ridge regression, the main focus is to determine an appropriately small value of the ridge parameter k to provide the estimates without any prior information [28]. A ridge analysis is based on the original data or principal components. The orthogonality of both the data and priors provides estimates which are simple weighted averages of the likelihood estimate and the prior mean. The estimates with the largest variances are maximally shrunk, and larger values of k force all these estimates closer to zero. Ridge regression does not reduce the coefficients to exactly zero and thus cannot eliminate the statistically insignificant predictors.
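The shrinkage behavior described above can be illustrated with the simplest possible case: one centered predictor and no intercept, for which the ridge estimate has the closed form Σxy / (Σx² + k). This is a sketch under that simplifying assumption; the function name `ridge_slope` and the data are hypothetical.

```python
def ridge_slope(x, y, k):
    """Ridge estimate of the slope for one centered predictor, no intercept."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + k)  # k = 0 gives the ordinary least squares slope


# Hypothetical centered data lying close to y = 2x.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.2, 2.1, 3.9]

slopes = {k: ridge_slope(x, y, k) for k in (0.0, 1.0, 10.0, 100.0)}
```

As k grows, the slope moves towards zero but stays strictly positive, which mirrors the observation that ridge regression shrinks coefficients without removing predictors.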

Lasso Regression
The conventional multivariate regression models usually suffer from the problems of overfitting of data and overestimation (of how well the model would explain the observed variability using the considered variables). Overfitting occurs due to the presence of statistically insignificant terms in the model, which inflates the training goodness-of-fit. Such models tend to perform poorly while predicting dependent variables having extremum risk. The least absolute shrinkage and selection operator regression, i.e., lasso, can effectively address both the problems. It is a shrinkage and variable selection method for developing regression models. It primarily aims to identify variables and corresponding coefficients to develop a model with minimum prediction error [29]. This can be attained while imposing a constraint on the model parameters to shrink the regression coefficients towards zero, i.e., by forcing the sum of absolute values of the coefficients to be less than a fixed threshold (λ). After shrinkage, variables having regression coefficients of zero are excluded from the model. In this technique, λ is determined based on an automated k-fold cross-validation. First, k equal-sized sub-samples are generated from the initial dataset. Then, (k − 1) sub-samples are employed for developing the corresponding regression model, and the remaining sub-sample is utilized for model validation. This procedure is repeated k times, with each one of the k sub-samples being used once for validation and the others for model development. The k separate validation results for a range of λ values are combined together to identify the most preferred value of λ, which is then used to formulate the final model. Its main advantage is that it minimizes overfitting of data and may outperform other regression models for a particular set of tuning parameters.
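To see why lasso can set coefficients exactly to zero, consider again one centered predictor without an intercept. In the penalized form, minimizing ½Σ(y − βx)² + λ|β| has the closed-form solution given by the soft-thresholding operator. Note that this sketch uses the penalized formulation rather than the constrained one described above, which is an assumption made for simplicity; the names `soft_threshold` and `lasso_slope` and the data are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_slope(x, y, lam):
    """Minimizer of 0.5 * sum((y - b*x)^2) + lam * |b| for one predictor."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return soft_threshold(sxy, lam) / sxx


# Hypothetical centered data lying close to y = 2x.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.2, 2.1, 3.9]

slope_small_penalty = lasso_slope(x, y, 5.0)
slope_large_penalty = lasso_slope(x, y, 25.0)
```

With the small penalty the slope is only shrunk, whereas with the large penalty it becomes exactly zero, i.e., the predictor drops out of the model.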

Elastic Net Regression
Elastic net is an amalgam of lasso and ridge regression models, combining both the principles of shrinkage and variable selection [30]. It is extremely suitable for analyzing high-dimensional data and is quite robust against extreme correlations among the predictor variables. The lasso part of elastic net helps in automatic variable selection, whereas the ridge part aids in group selection while stabilizing the solution paths with regard to random sampling, which improves the prediction accuracy. Owing to the grouping effect during variable selection, a group of highly correlated variables tends to have coefficients of similar magnitude. It can select groups of correlated features when the groups are not known in advance. For developing the corresponding model, elastic net adopts a combined penalty of lasso and ridge regression penalties. The penalty parameter α determines the weight to be provided to lasso or ridge regression. The elastic net with α = 0 is equivalent to ridge regression. On the other hand, the elastic net with α close to 1 behaves much like lasso, while removing any degeneracy and odd behavior due to high correlations among the predictor variables. It has been noticed that the application of elastic net can result in lower mean squared errors for correlated variables.
It has already been mentioned that this paper focuses on the applications of nine different regression models as prediction tools during turning and drilling operations of composite materials. To have better performance of some of these models, values of the corresponding tuning parameters are chosen based on 5-fold cross-validation, as shown in Table 1 for both the machining processes. The value of λ adds a penalty in a given regression model. With its higher values, flexibility of the regression fit decreases, leading to lower variance but increased bias. In elastic net, value of α helps to reach a trade-off between ridge and lasso regression models. It behaves like ridge for α = 0, and α = 1 corresponds to lasso.
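The roles of λ and α can be made concrete with the combined penalty term. The sketch below assumes the parameterization λ[α‖β‖₁ + ½(1 − α)‖β‖₂²], which is the convention of the glmnet family of implementations; the paper does not state the exact form used, and the function name `elastic_net_penalty` and the coefficient values are hypothetical.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Combined penalty: lam * (alpha * L1 + 0.5 * (1 - alpha) * L2 squared)."""
    l1 = sum(abs(b) for b in beta)   # lasso part
    l2 = sum(b * b for b in beta)    # ridge part (squared L2 norm)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


# Hypothetical coefficient vector (intercept excluded).
beta = [1.0, -2.0, 0.5]

ridge_like = elastic_net_penalty(beta, lam=2.0, alpha=0.0)  # pure ridge penalty
lasso_like = elastic_net_penalty(beta, lam=2.0, alpha=1.0)  # pure lasso penalty
mixed = elastic_net_penalty(beta, lam=2.0, alpha=0.5)       # equal weighting
```

Setting α between 0 and 1 interpolates between the two penalties, while λ scales the overall strength of the regularization.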

Statistical Metrics
In this paper, to validate the prediction performance of the nine regression models, four statistical error estimators, i.e., MAPE, RMSPE, RMSLE and RRSE, are considered [31]. The MAPE compares the actual ($A_i$) and the predicted ($P_i$) responses in terms of percentage error. The RMSPE is a well-accepted measure to appraise the goodness-of-fit of a regression model, describing the average percent error during prediction of the response values. In RMSLE, the use of the logarithm helps in estimating the relative variation between the $A_i$ and $P_i$ response values. In this measure, small differences between small $A_i$ and $P_i$ response values are treated similarly to big differences between large $A_i$ and $P_i$ response values. The RRSE is calculated by first finding the total squared error and then normalizing it by dividing it by the total squared error of the simple predictor, i.e., the average of the actual responses. The MAPE, RMSPE, RMSLE and RRSE are computed as:
$$\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{A_i - P_i}{A_i} \right|$$
$$\mathrm{RMSPE} = 100 \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \frac{A_i - P_i}{A_i} \right)^2}$$
$$\mathrm{RMSLE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \log(P_i + 1) - \log(A_i + 1) \right)^2}$$
$$\mathrm{RRSE} = \sqrt{\frac{\sum_{i=1}^{n} (P_i - A_i)^2}{\sum_{i=1}^{n} (A_i - \bar{A})^2}}$$
where $A_i$ and $P_i$ are respectively the values of actual and predicted responses, $\bar{A}$ and $\bar{P}$ are the averages of all the actual and predicted responses respectively, and n is the number of test data.
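The four error estimators can be sketched in a few lines of Python. The code follows the commonly used definitions of these metrics (percentage scaling for MAPE and RMSPE, log(1 + value) for RMSLE, and the mean of the actual responses as the simple predictor for RRSE); it is an assumption that the study used exactly these scalings. The response values below are hypothetical.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error (in %)."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


def rmspe(actual, predicted):
    """Root mean squared percentage error (in %)."""
    n = len(actual)
    return 100.0 * math.sqrt(
        sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / n
    )


def rmsle(actual, predicted):
    """Root mean squared logarithmic error; inputs must exceed -1."""
    n = len(actual)
    return math.sqrt(
        sum((math.log(p + 1.0) - math.log(a + 1.0)) ** 2
            for a, p in zip(actual, predicted)) / n
    )


def rrse(actual, predicted):
    """Root relative squared error against the mean-of-actual predictor."""
    mean_a = sum(actual) / len(actual)
    num = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    den = sum((a - mean_a) ** 2 for a in actual)
    return math.sqrt(num / den)


# Hypothetical actual and predicted responses for four test observations.
actual = [2.0, 4.0, 5.0, 8.0]
predicted = [2.2, 3.6, 5.0, 8.8]
scores = {
    "MAPE": mape(actual, predicted),
    "RMSPE": rmspe(actual, predicted),
    "RMSLE": rmsle(actual, predicted),
    "RRSE": rrse(actual, predicted),
}
```

All four functions return zero for a perfect prediction, and lower values indicate better prediction performance.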

Turning
Using a CNC lathe and based on Taguchi's L16 orthogonal array as the experimental design plan, Laghari et al. [20] conducted 16 experiments on SiCp/Al MMC with cutting speed (v_c) (in m/min), feed rate (f) (in mm/rev) and depth of cut (a_p) (in mm) as the turning parameters, and surface roughness (Ra) (in µm) and tool life (TL) (in min) as the process outputs (responses). Turning operations were performed on the considered work material using a carbide cutting tool under dry machining conditions. Each of the turning parameters was varied at four different operating levels to study their effects on the responses. The measured response values at varying combinations of the turning parameters are provided in Table 2. Among these 16 experimental observations, 11 are randomly selected for training the considered regression models, whereas the remaining five are adopted for testing purposes.
Table 2. Turning parameters and measured responses [20].

Now, for this turning process, to explore the applicability and potentiality of the considered regression models, and validate their prediction performance, the corresponding regression models are developed using the open-source programming language R (version 4.0.5). The related LR and PR-based models for Ra and TL are provided as below:
Tables 3 and 4, respectively, show the predicted values of Ra and TL during the turning operation for all the nine regression models. On the other hand, Figure 1 depicts the actual versus predicted responses for the testing data by the considered regression models. The closer the test data points are to the diagonal identity line, the better is the prediction performance with lesser error. If a data point lies exactly on the identity line, it indicates 100% prediction accuracy for that data point. Similarly, in Figure 2, if the data points lie on the zero line, there would be no residue (error) after prediction. The larger the vertical distance of a data point from the zero line, the larger is the residue. Positive residues indicate underprediction, whereas negative residues denote overprediction by the corresponding regression model. Conversely, in Figure 1, points above the identity line indicate overprediction, and points below the identity line indicate underprediction. Thus, from Figures 1a and 2a, it is observed that PR has large residues for all the test data points. On the other hand, the predictions are quite accurate for the SVR model, barring one test data point. Small residues are also noticed for LR models. For tool life, all the regression models are found to be overpredicting, as revealed from Figures 1b and 2b. Here too, PR-based predictions have high residues. However, having a simple mathematical formulation and structure, LR seems to be the most adequate model in correctly predicting both responses.
Values of all the statistical error estimators, i.e., MAPE, RMSPE, RMSLE and RRSE, are now plotted in Figure 3. This figure reveals that SVR has the minimum values for all the error metrics, whereas PR has high prediction errors.
In an attempt to identify the best and worst-performing regression models, and statistically significant differences between pairs of the regression models based on the predicted response values, the Friedman test and Wilcoxon test are respectively carried out [32]. The Friedman ranks and aligned ranks are respectively provided in Tables 5 and 6 for Ra values during the turning operation of SiCp/Al MMC. While assigning aligned ranks using the Friedman test, the average prediction performance of all the models is first computed for each test dataset. The differences between the performances of all the models and the average are then calculated, and are subsequently ranked. The results of both Friedman rank and aligned rank tests identify SVR as the best performing regression model (having the minimum average ranks) for the considered test dataset, whereas the prediction performance of median regression is not at all satisfactory. The results of the Wilcoxon test for Ra, as provided in Table 7, exhibit no statistically significant difference between any pair of the regression models with respect to their prediction performance. Similar observations are also noticed for the TL response during the said turning operation.
1  12  38  4  23  41  43  22  6  13
2  32  14  5  25  45  44  24  3  7
3  11  1  19  30  37  39  29  33  35
4  15  2  8  27  36  40  28  31  34
5  17  42  9  18  16  26  20  10
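The two ranking schemes described above can be sketched as follows. The code assumes that the quantity being ranked is the absolute prediction error of each model on each test observation, so that a lower rank means a better model; this choice, the function names and the error values are illustrative assumptions and do not reproduce Tables 5 and 6.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_mean_ranks(errors):
    """errors[r][m] is the error of model m on test observation r.
    Models are ranked within each observation, then averaged per model."""
    per_row = [average_ranks(row) for row in errors]
    n_models = len(errors[0])
    return [sum(r[m] for r in per_row) / len(per_row) for m in range(n_models)]


def aligned_mean_ranks(errors):
    """Subtract each observation's mean error, rank all aligned values
    jointly, then average the ranks per model."""
    n_rows, n_models = len(errors), len(errors[0])
    aligned = []
    for row in errors:
        row_mean = sum(row) / n_models
        aligned.extend(v - row_mean for v in row)
    flat = average_ranks(aligned)
    return [
        sum(flat[r * n_models + m] for r in range(n_rows)) / n_rows
        for m in range(n_models)
    ]


# Hypothetical absolute errors: 3 test observations (rows) x 3 models (columns).
errors = [
    [0.30, 0.10, 0.50],
    [0.20, 0.05, 0.40],
    [0.25, 0.15, 0.60],
]
friedman = friedman_mean_ranks(errors)
aligned = aligned_mean_ranks(errors)
```

In this toy example both schemes place the second model first, since it has the smallest error on every observation; with R test observations and M models, the aligned ranks run from 1 to R × M.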

Drilling
Chaudhary et al. [33] performed drilling operations on aluminum MMCs with spindle speed (S) (in rpm), feed rate (f) (in mm/rev) and point angle (P) (in degree) as the input parameters, and material removal rate (MRR) (in mm³/min), Ra (in µm) and oversize (OS) (in mm) as the responses. Based on a central composite design plan, 20 experiments were conducted while varying the drilling parameters at three different levels. Table 8 shows the values of different drilling parameters and responses at various experimental trials. Among the 20 experimental runs, 14 trials are chosen for training of the regression models and their prediction performance is evaluated using the remaining six observations.
Figures 4a and 5a depict that LR, PR and SVR are the top three models for accurate prediction of MRR values in the said drilling operation. Quantile and median regression models have the largest residues. Furthermore, the overprediction errors for all the models are observed to be larger than their corresponding underprediction errors. On the contrary, during prediction of Ra values, the order of magnitude of error for underprediction is larger than that for overprediction, as revealed from Figures 4b and 5b. Here, SVR is observed to be the best performing model, followed by PR and PCR. From Figures 4c and 5c, it can be observed that there are more overprediction errors than underprediction errors for the OS response, and SVR appears to be the best performing regression model, followed by LR.
When the values of the various statistical error estimators are plotted in Figure 6, it can be noticed that SVR has the superior prediction performance, followed by LR and PR models. On the other hand, ridge and median regression models have worse prediction performance. As in the turning problem, applications of Friedman rank and aligned rank tests (not shown here due to paucity of space) also recognize SVR as the best performing regression model for predicting all the response values for the said drilling process. Table 9 depicts the calculated p-values of the Wilcoxon test for MRR, which reveal significant differences in the prediction performance between LR and PCR, and PR and quantile regression models. Similar differences are also noticed for Ra and OS responses for the said drilling process.

Conclusions
This paper deals with exploring the application potentiality of nine different types of regression models, i.e., LR, PR, SVR, PCR, quantile regression, median regression, ridge regression, lasso regression and elastic net regression, as effective prediction tools for envisaging the response values during turning and drilling operations of composite materials. Two past experimental datasets are employed here for training and subsequent validation of the developed regression models. Values of the required model tuning parameters are evaluated using a 5-fold cross-validation approach. It is noticed that for both the machining processes, SVR emerges as the best regression model with minimum values of MAPE, RMSPE, RMSLE and RRSE, followed by LR and PR models. On the contrary, ridge and median regression models have poor prediction performance. Results of Friedman rank and aligned rank tests also portray the same observations. The superiority of the SVR model for the two case studies reported in the paper may be due to its smaller number of tuning parameters, robustness, and capability to deal with both linear and non-linear models. The application of another non-parametric test (Wilcoxon test) identifies differences in the prediction performances between LR and PCR, and PR and quantile regression models at the 5% significance level for the drilling process. In this paper, the prediction performance of all the nine regression models is contrasted using small experimental datasets. Better and more accurate results may be expected while applying these models to large datasets. As a future scope, other regression models dealing with categorical variables, such as logistic regression, Cox regression, Tobit regression, etc., may be employed as prediction tools in real-time machining environments.