Performance Comparison of Parametric and Non-Parametric Regression Models for Uncertainty Analysis of Sheet Metal Forming Processes

Abstract: This work aims to compare the performance of various parametric and non-parametric metamodeling techniques when applied to sheet metal forming processes. For this, the U-Channel and the Square Cup forming processes were studied. In both cases, three steel grades were considered, and numerical simulations were performed in order to establish a database for each combination of forming process and material. Each database was used to train and test the various metamodels, and their predictive performances were evaluated. The best performing metamodeling techniques were Gaussian processes, multi-layer perceptron, support vector machines, kernel ridge regression, and polynomial chaos expansion.

Author Contributions: Formal analysis, A.E.M., P.A.P., A.F.G.P., and M.C.O.; funding acquisition, P.A.P. and J.V.F.; investigation, A.E.M.; software, M.C.O. and B.M.R.; supervision, P.A.P. and B.M.R.


Introduction
Sheet metal forming is a widely used manufacturing technique in the automotive and aerospace industries. As the standards of modern industry become more demanding, the traditional trial-and-error approach to process design is too costly to be viable, both in scrap losses and in time spent. As such, researchers look for ways to make process design more efficient. Since sheet metal forming problems present high non-linearity with regard to material properties, boundary conditions, and geometry, the creation of analytical models is unfeasible. As a result, researchers began to focus on the use of the finite element method (FEM) to model forming processes. However, the FEM simulation of complex forming processes can be computationally expensive, and a large number of simulations can be required in order to find a good design solution, due to the high number of variables. Alternatively, metamodeling techniques can be used to create predictive models based on the data obtained from a set of numerical simulations, limiting the number of simulations required during the design process and, as such, reducing the computational cost. Parametric metamodeling techniques, such as the response surface method (RSM), have been used extensively in sheet metal forming problems. Wei et al. [1] applied this method to reduce the number of FEM simulations required to optimize the forming process of a deck-lid outer panel, while Naceur et al. [2] used a moving least squares iterative adaptation of the method in two different problems: the minimization of springback in the deep drawing of a cylindrical cup and the optimization of the initial blank shape for a forming process. These models can achieve good prediction accuracy; however, they may struggle in cases with high non-linearity. As such, in recent years much attention has been given to machine learning (ML) metamodels.
In particular, artificial neural networks (ANNs), support vector machines (SVMs), and Gaussian processes (GPs) have been applied to various sheet metal forming processes. Sun et al. [3] applied SVM, alongside RSM and Kriging (a particular case of GP), in the optimization of the forming process of an automobile inner panel. Teimouri et al. [4] explored various ANN algorithms in a springback optimization problem and compared them with the RSM, concluding that the ANN algorithms showed better performance. Wessing et al. [5] compared the application of ANN and Kriging in predicting the final sheet thickness of a B-pillar and concluded that Kriging performed better. Similarly, Ambrogio et al. [6] obtained better results from the Kriging method when compared to ANNs and RSM, when applied to the prediction of the final sheet thickness in an incremental sheet metal forming problem. Feng et al. [7] used SVM in an optimization problem related to variable blank-holder force, and Jingdong et al. [8] used GP in the prediction of forming defects, namely, the occurrence of fractures and the appearance of wrinkles. Despite the growing interest in the application of these techniques, researchers usually select just one or a few based on subjective criteria. To the authors' knowledge, no in-depth study has been conducted to determine the relative performance of the many available regression metamodeling techniques when applied to sheet metal forming processes, as the existing studies only focus on a small number of techniques.
The present work consists of a performance evaluation of various regression metamodeling techniques when applied to the prediction of results of sheet metal forming processes. The parametric metamodeling techniques evaluated are the response surface method (RSM) and polynomial chaos expansion (PCE), while the non-parametric metamodeling techniques evaluated include Gaussian processes (GPs), artificial neural networks (multi-layer perceptron or MLP), decision trees (DTs), random forest (RF), k-nearest neighbors (kNN), support vector regression (SVR), and kernel ridge regression (KRR). All the non-parametric techniques considered can be classified as machine learning (ML) techniques. The forming processes considered were the U-Channel and the Square Cup. For each forming process, three steel grades were considered to cover a wide range of hardening behavior. For each grade, it was assumed that the elastic and plastic properties present some variability, described by a normal distribution [9]. The same type of distribution was also used to describe the initial thickness of the sheet, the frictional contact conditions, and one process parameter. The values of maximum thinning were evaluated for both processes, as well as springback for the U-Channel case and maximum equivalent plastic strain for the Square Cup case.
The rest of the paper is arranged as follows: first, a brief theoretical introduction of each of the metamodeling techniques is given, followed by a description of the FEM models built for each forming process, including the material properties. Then, the dataset generation process is described, followed by the results obtained and respective discussion. The final section contains the general conclusions taken from this work.

Metamodeling
Metamodeling techniques allow mathematical relationships to be established between the design variables (i.e., sources of variability) and the simulated outputs (i.e., responses) of forming processes. The vector of design variables is defined as x = {xᵢ}, with i = 1, …, n, where n is the total number of sources of variability (inputs). In order to train the metamodel, it is necessary to evaluate the metamodel response y*(x) for a predefined set of training points, X, to ensure that at those points the simulation outputs y(x) are well represented. In this context, it is possible to define a training matrix X = {xᵢⱼ}, with i = 1, …, n and j = 1, …, m, where m is the total number of training points.

Response Surface Method (RSM)
RSM is a regression model that fits a polynomial function to a set of training points [3]. In this work, a quadratic function is used, as follows:

$$y^*(\mathbf{x}) = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \sum_{i=1}^{n} \sum_{j=i}^{n} \beta_{ij} x_i x_j,$$

where y*(x) is the estimated response for a given set of inputs and β₀, βᵢ, and βᵢⱼ are the RSM coefficients, which can be organized in the vector of unknowns β, with a dimension equal to the total number of RSM coefficients: p = 0.5n² + 1.5n + 1. Note that for m < p the system of equations is underdetermined, while for m > p it is overdetermined (i.e., there is a unique solution only when m = p). Thus, for m ≠ p, the least squares method is used. This means that for m < p the Euclidean norm ‖β‖ is minimized, imposing that Aβ = y, where A is the linear system matrix and y is the vector of simulation responses. For m > p it is the Euclidean norm ‖Aβ − y‖ that is minimized.
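As an illustration, a quadratic RSM fit of this kind can be sketched with NumPy. This is a minimal, hypothetical example (the function names and design-matrix layout are ours, not taken from the original implementation); `lstsq` returns the minimum-norm solution in the underdetermined case and the least-squares solution in the overdetermined one, matching the two cases described above.

```python
import numpy as np
from itertools import combinations_with_replacement

def quadratic_design_matrix(X):
    """Build the quadratic RSM basis: intercept, linear terms,
    and all second-order terms (squares and interactions)."""
    m, n = X.shape
    cols = [np.ones(m)]                        # beta_0
    cols += [X[:, i] for i in range(n)]        # linear terms
    cols += [X[:, i] * X[:, j]                 # quadratic terms
             for i, j in combinations_with_replacement(range(n), 2)]
    return np.column_stack(cols)

def fit_rsm(X, y):
    """Least-squares fit of the RSM coefficient vector beta."""
    A = quadratic_design_matrix(X)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_rsm(beta, X):
    """Evaluate the fitted quadratic response surface."""
    return quadratic_design_matrix(X) @ beta

# hypothetical demo: recover a known quadratic response exactly
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - X_demo[:, 1] * X_demo[:, 2]
beta = fit_rsm(X_demo, y_demo)
y_hat = predict_rsm(beta, X_demo)
```

For n = 3 inputs the coefficient vector has p = 0.5·3² + 1.5·3 + 1 = 10 entries, consistent with the formula above.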

Polynomial Chaos Expansion (PCE)
The polynomial chaos expansion (PCE) is a metamodel that estimates the response, y*(x), for a given vector of probabilistic input variables, x, through a basis of orthogonal stochastic polynomials. Assuming that the input variables are independent, the model response, y*(x), is given by:

$$y^*(\mathbf{x}) = \sum_{\alpha \in A} c_{\alpha} \Psi_{\alpha}(\mathbf{x}),$$

where Ψα is an orthogonal polynomial basis, cα are the associated coefficients, and A is a set of preselected multi-indexes α, which represent the polynomial degrees of the input variables. In order to avoid a high number of response evaluations, only the multi-indexes that consider input variables up to a degree of 4 and low-order interactions between those variables, following a hyperbolic truncation scheme [10], are retained. Hermite polynomials are used to construct the polynomial basis, Ψα, since the input variables are Gaussian. The coefficients cα are calculated with the ordinary least squares method by minimizing the difference between the model responses y*(x) and the simulated outputs y(x).
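A minimal PCE sketch is shown below, using the probabilists' Hermite polynomials from NumPy (orthogonal with respect to the standard normal) and ordinary least squares for the coefficients. For simplicity it uses a plain total-degree truncation of the multi-index set rather than the hyperbolic truncation scheme of [10]; all names and data are illustrative assumptions.

```python
import numpy as np
from itertools import product
from numpy.polynomial.hermite_e import hermeval

def pce_basis(X, degree):
    """Evaluate a total-degree-truncated probabilists' Hermite basis
    at the points in X; returns the basis matrix and the multi-indexes."""
    m, n = X.shape
    # keep multi-indexes alpha with |alpha| <= degree
    alphas = [a for a in product(range(degree + 1), repeat=n)
              if sum(a) <= degree]
    cols = []
    for a in alphas:
        col = np.ones(m)
        for i, d in enumerate(a):
            c = np.zeros(d + 1)
            c[d] = 1.0                 # series coefficients of He_d
            col *= hermeval(X[:, i], c)
        cols.append(col)
    return np.column_stack(cols), alphas

def fit_pce(X, y, degree=4):
    """Ordinary least-squares estimate of the PCE coefficients c_alpha."""
    Psi, alphas = pce_basis(X, degree)
    c, *_ = np.linalg.lstsq(Psi, y, rcond=None)
    return c, alphas

# hypothetical demo with Gaussian inputs and a response inside the basis
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(80, 2))
y_demo = 1.0 + X_demo[:, 0] + X_demo[:, 0] * X_demo[:, 1]
coeffs, alphas = fit_pce(X_demo, y_demo, degree=2)
Psi_demo, _ = pce_basis(X_demo, 2)
```

With two inputs and total degree 2, the truncated basis contains 6 terms.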

Gaussian Process (GP)
A Gaussian process (GP) corresponds to a collection of random variables, any finite number of which have a joint Gaussian distribution [8]. The properties of these variables can be specified by the mean and covariance functions of the GP. In practice, the mean function is often considered to be zero, which means that the GP is completely defined by the covariance function. The GP regression model is represented as follows:

$$y(\mathbf{x}) = f(\mathbf{x}) + \varepsilon,$$

where y(x) is an observed response, f(x) is the corresponding random GP variable, and ε is the noise. The joint normal distribution of the training outputs y and the test outputs f* is given by:

$$\begin{bmatrix} \mathbf{y} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right),$$

where σn² represents the noise variance, I is the identity matrix, and each matrix K is a covariance matrix evaluated for all pairs of points considered, with X representing training points and X* representing test points. The GP prediction for the group of testing points can be obtained through the following equations:

$$\bar{\mathbf{f}}_* = K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} \mathbf{y},$$

$$\operatorname{cov}(\mathbf{f}_*) = K(X_*, X_*) - K(X_*, X)\left[K(X, X) + \sigma_n^2 I\right]^{-1} K(X, X_*),$$

where f̄* is the vector of predicted results (mean) and cov(f*) represents the covariance of the model outputs, which acts as a measure of the predictions' uncertainty.
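The two prediction equations above can be implemented directly in NumPy. The sketch below assumes a squared-exponential (RBF) covariance function with fixed hyperparameters; a practical implementation (e.g., in GPy, used later in this work) would optimize the hyperparameters and use a Cholesky solve rather than an explicit inverse.

```python
import numpy as np

def rbf_kernel(A, B, length=1.0, var=1.0):
    """Squared-exponential covariance between two point sets."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length**2)

def gp_predict(X, y, Xs, noise=1e-6, length=1.0, var=1.0):
    """Zero-mean GP posterior: predictive mean and covariance at Xs,
    following the two equations given in the text."""
    K = rbf_kernel(X, X, length, var) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xs, X, length, var)      # K(X*, X)
    Kss = rbf_kernel(Xs, Xs, length, var)    # K(X*, X*)
    K_inv = np.linalg.inv(K)
    mean = Ks @ K_inv @ y
    cov = Kss - Ks @ K_inv @ Ks.T
    return mean, cov

# hypothetical demo: with low noise, the posterior mean interpolates
X_train = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y_train = np.sin(2.0 * np.pi * X_train[:, 0])
mean, cov = gp_predict(X_train, y_train, X_train, noise=1e-6, length=0.3)
```

At the training points the predictive mean is close to the observed responses and the predictive variance is small, illustrating how the covariance acts as an uncertainty measure.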

Multi-Layer Perceptron (MLP)
A multi-layer perceptron is a type of feedforward neural network, which can be used for both classification and regression. It is formed by a series of nodes (neurons) grouped into layers [5]. Each node is connected to the nodes in the next layer, but there are no interconnections between nodes in the same layer. The first layer, called the input layer, is formed by a number of nodes equal to the number of inputs in the data, n. The output layer receives information from the previous layer to make a prediction. Between the input and output layers, the model has one or more hidden layers. Each node in a hidden layer has a nonlinear activation function. The output of a node in a hidden layer can be described by the following equation:

$$h_j = \phi\left(\sum_{i} w_{ij} h'_i + b_j\right),$$

where hⱼ is the output of the current node, j; h′ᵢ is the value obtained from node i of the prior layer; wᵢⱼ is the weight associated with h′ᵢ; bⱼ is the bias term; and φ represents the activation function. For regression, the output layer nodes have a similar formulation, the only difference being the lack of an activation function.
The weights are adjusted when the model is fitted to the training data, through a process called backpropagation. This algorithm consists of assessing how each weight should be changed (increased or decreased) in order to obtain a better prediction, and then updating all weights in the network accordingly, in small increments, until a minimum error estimate for the prediction is achieved.
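As a sketch, an MLP regressor of this kind can be trained with Scikit-learn (the library used for the ML metamodels later in this work). The data, network size, and hyperparameters below are illustrative assumptions, not the settings used in the study; input scaling is included because MLP training is sensitive to it.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# hypothetical stand-in for a training set of 700 simulations, 4 inputs
rng = np.random.default_rng(0)
X = rng.normal(size=(700, 4))
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.1 * rng.normal(size=700)

# two hidden layers with ReLU activations; weights are adjusted by
# backpropagation inside fit()
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 32), activation="relu",
                 max_iter=1000, random_state=0),
)
model.fit(X, y)
train_r2 = model.score(X, y)   # coefficient of determination on the training set
```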

Decision Trees (DTs) and Random Forest (RF)
Decision trees are models that split the data recursively, based on simple decision rules. During training, the choice of how to split the data at each node is made so that an error metric is minimized [11]. The most common metric in this case is the MSE (mean squared error). This process is repeated until each of the final nodes (leaf nodes) has a value of the MSE associated with its data under a certain threshold, defined a priori. The prediction value for each leaf node becomes the average of the values of the dependent variable associated with the training points in the node. The random forest model is an extension of the decision tree model. It consists of training multiple decision trees, each with a different sample of the training data. The predictions made by this model are the average of the predictions obtained by each of the trees [12].
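Both models are available in Scikit-learn; the sketch below (with hypothetical data) fits a single tree and a forest of 200 trees, whose predictions are the average over the individual trees.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# hypothetical training data
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(500, 3))
y = X[:, 0] * X[:, 1] + 0.05 * rng.normal(size=500)

# a single fully grown tree essentially interpolates its training data
tree = DecisionTreeRegressor(random_state=0).fit(X, y)

# the forest trains 200 trees on bootstrap samples and averages them,
# which reduces the variance of the single-tree predictions
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
```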

k-Nearest Neighbors (kNN)
The k-nearest neighbors method does not create a model from the training data. Each time it makes a prediction, it calculates the distance between the point for which the prediction will be made and each of the training points. Then, the k training points that are closest to the prediction point are selected. The result of the prediction will be either the average of the response values associated with these k training points, or a weighted average based on the distance, so that, among these k training points, more influence is given to the ones closer to the prediction point [13].
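The two prediction variants (plain average and distance-weighted average) map directly onto the `weights` option of Scikit-learn's kNN regressor; the data below is a hypothetical illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# hypothetical training data
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 2))
y = np.sin(4.0 * X[:, 0]) + X[:, 1]

# plain average of the k closest training responses
knn_uniform = KNeighborsRegressor(n_neighbors=5, weights="uniform").fit(X, y)

# inverse-distance weighting: closer neighbours get more influence
knn_distance = KNeighborsRegressor(n_neighbors=5, weights="distance").fit(X, y)
```

Note that with distance weighting, a query that coincides with a training point reproduces its response exactly, since that point receives all the weight.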

Support Vector Regression (SVR)
The support vector regression model consists of finding a function that fits the training data and is as flat as possible, under the assumption that errors below a certain value ε are accepted without penalty [14]. This means finding the function that can include the most training points in the tube around it, with distance less than or equal to ε. Since this restriction can sometimes be unfeasible, slack variables ξ and ξ* can also be defined, which work as soft margins. Points with errors larger than ε still affect the function's shape, but under a penalty.
When applied to a linear case, this problem can be represented by:

$$\min_{\mathbf{w},\, b,\, \xi,\, \xi^*} \ \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{m} \left(\xi_i + \xi_i^*\right)$$

subject to

$$y_i - \mathbf{w}^{\top}\mathbf{x}_i - b \leq \varepsilon + \xi_i, \qquad \mathbf{w}^{\top}\mathbf{x}_i + b - y_i \leq \varepsilon + \xi_i^*, \qquad \xi_i,\, \xi_i^* \geq 0,$$

where w is the weight vector normal to the surface that is being approximated, and C is a constant that represents the trade-off between function flatness and tolerance for deviations above ε. This problem can be generalized to non-linear cases by applying the kernel trick. A kernel is a similarity function between the training inputs and the unlabeled inputs for which the model will make a prediction. The kernel trick is used to transform the data into a higher dimensional space, allowing a linear learning model to learn non-linear functions without explicit mapping.
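In Scikit-learn, the kernel trick and the ε-tube are exposed directly as parameters of `SVR`; the data and hyperparameter values below are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR

# hypothetical 1-D training data
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(400, 1))
y = np.sinc(X[:, 0])

# the RBF kernel supplies the implicit higher-dimensional mapping;
# epsilon sets the width of the penalty-free tube, and C the penalty
# applied to points falling outside it
model = SVR(kernel="rbf", C=100.0, gamma=1.0, epsilon=0.01).fit(X, y)

# only the training points on or outside the tube become support vectors
n_support = len(model.support_)
```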

Kernel Ridge Regression (KRR)
Ridge regression creates a model of similar form to the one obtained by support vector regression, the main difference being the loss function used, with ridge regression using squared error loss. For a linear case, training this model consists in minimizing the cost function J:

$$J(\mathbf{w}) = \sum_{i=1}^{m} \left(y_i - \mathbf{w}^{\top}\mathbf{x}_i\right)^2 + \lambda \|\mathbf{w}\|^2,$$

where λ is the regularization term. Once again, in order to generalize this model to non-linear cases, a kernel trick is applied, mapping the data into a higher dimensional space [15].
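In Scikit-learn's `KernelRidge`, the parameter `alpha` plays the role of the regularization term λ; as before, the data and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# hypothetical 1-D training data with a little noise
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(400, 1))
y = np.sinc(X[:, 0]) + 0.02 * rng.normal(size=400)

# alpha is the regularization term lambda; the RBF kernel provides the
# implicit mapping to a higher dimensional space
model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=1.0).fit(X, y)
```

Unlike SVR, every training point contributes to the kernel ridge solution, since the squared error loss has no penalty-free ε-tube.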

Forming Simulations and Metamodeling Procedure
This section presents the details of the numerical models of the U-Channel and the Square Cup forming processes, including the materials considered and the relevant input variables. The procedure adopted for the generation and evaluation of the metamodels is also described.

Numerical Models
The numerical models of the U-Channel and Square Cup forming processes are represented in Figure 1. Both processes comprise three main elements: the blank holder, the die, and the punch. The first stage of the forming process consists of reducing the distance between the die and the blank holder, until an imposed force is attained (the blank holder force (BHF)). Then, the punch moves to promote the material flow into the die cavity, while the BHF remains constant. The U-Channel forming process ends after a total punch displacement of 30 mm, while the Square Cup forming process ends after a total punch displacement of 40 mm. The last stage consists of the removal of the tools, which promotes the recovery of the elastic energy stored in the part (springback). The initial dimensions of the blank of the U-Channel and the Square Cup forming processes are, respectively, 150 × 35 × 0.78 mm³ and 75 × 75 × 0.78 mm³. The material is considered orthotropic. Due to material and geometry symmetries, only one fourth of the blank is simulated for the Square Cup deep-drawing process, considering a finite element mesh with 1800 eight-node hexahedral solid elements. For the U-Channel, only half of the blank is considered, and boundary conditions are set to guarantee a plane strain state along the width of the blank, which enables the use of a total of 450 eight-node hexahedral solid elements. The numerical simulations were carried out with the in-house finite element code DD3IMP, developed and optimized for simulating sheet metal forming processes [16]. The forming tool geometry was modeled using Nagata patches [17]. The contact with friction is described by Coulomb's law, with a constant value for the friction coefficient, µ, between the sheet and the tools.
The constitutive model adopted in this study assumes (i) isotropic elastic behavior, described by the generalized Hooke's law, and (ii) anisotropic plastic behavior, as generally observed in metallic sheets, described by the orthotropic Hill'48 yield criterion combined with the Swift isotropic hardening law. The Hill'48 yield criterion is described as follows:

$$F\left(\sigma_{yy} - \sigma_{zz}\right)^2 + G\left(\sigma_{zz} - \sigma_{xx}\right)^2 + H\left(\sigma_{xx} - \sigma_{yy}\right)^2 + 2L\sigma_{yz}^2 + 2M\sigma_{xz}^2 + 2N\sigma_{xy}^2 = Y^2,$$

where σxx, σyy, σzz, σxy, σxz, and σyz are the components of the Cauchy stress tensor defined in the orthotropic coordinate system of the material; F, G, H, L, M, and N are the anisotropy parameters and Y is the flow stress. The condition G + H = 1 is assumed, and so Y is represented by the uniaxial tensile stress along the rolling direction of the sheet. The parameters L and M are assumed equal to 1.5, as in isotropy (von Mises). The parameters F, H, and N can be related with the anisotropy coefficients r₀, r₄₅, and r₉₀, as follows:

$$F = \frac{r_0}{r_{90}\left(1 + r_0\right)}, \qquad H = \frac{r_0}{1 + r_0}, \qquad N = \frac{\left(r_0 + r_{90}\right)\left(1 + 2r_{45}\right)}{2r_{90}\left(1 + r_0\right)}.$$

The Swift hardening law is expressed by:

$$Y = C\left(\varepsilon_0 + \bar{\varepsilon}^{\mathrm{p}}\right)^{n},$$

where ε̄ᵖ is the equivalent plastic strain and C, ε₀, and n are material parameters. Two types of numerical simulation outputs were considered for each forming process: (i) springback and maximum thinning for the U-Channel process; and (ii) maximum equivalent plastic strain and maximum thinning for the Square Cup deep-drawing.

Parameter Variability
Three different steel grades were considered for each forming process: DC06, DP600, and HSLA340. For each of these materials, a normal distribution was assumed for describing the variability of the following inputs: the parameters C, ε₀, and n of the Swift hardening law; Young's modulus, E, and Poisson coefficient, ν, of the generalized Hooke's law; the anisotropy coefficients r₀, r₄₅, and r₉₀; the initial sheet thickness, t₀; and the friction coefficient, µ. The mean and standard deviation (SD) values of each parameter are detailed in Table 1. In addition to the material parameters, the value of the BHF was also considered, to introduce some variability in the process conditions; the mean and standard deviation values of the BHF for the U-Channel are 4900 N and 245 N, respectively; in the case of the Square Cup, the mean and standard deviation values of the BHF are 2450 N and 122.5 N, respectively.
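Sampling normally distributed input sets of this kind is straightforward with NumPy. In the sketch below, only the U-Channel BHF mean and SD come from the text; the remaining means and SDs are placeholders, not the values of Table 1.

```python
import numpy as np

rng = np.random.default_rng(42)

# (mean, SD) per input; only the BHF entry uses values given in the text,
# the rest are illustrative placeholders
params = {
    "E":   (210e3, 2.1e3),    # Young's modulus [MPa] (placeholder)
    "nu":  (0.30, 0.003),     # Poisson coefficient (placeholder)
    "t0":  (0.78, 0.008),     # initial sheet thickness [mm] (placeholder SD)
    "mu":  (0.10, 0.005),     # friction coefficient (placeholder)
    "BHF": (4900.0, 245.0),   # blank holder force [N], U-Channel values
}

n_samples = 1000
samples = {k: rng.normal(m, s, size=n_samples) for k, (m, s) in params.items()}

# one row per simulation run, one column per input
inputs = np.column_stack(list(samples.values()))
```

Each row of `inputs` then defines one set of perturbed parameters for a numerical simulation.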

Metamodel Generation and Evaluation
Based on the normal distribution of each input shown in Table 1, 1000 sets of inputs were randomly generated for each material. Numerical simulations of the U-Channel and Square Cup forming processes were performed for each of these randomly generated input sets, with a total of 3 (materials) × 1000 (sets of inputs) = 3000 simulations for each forming process. For each material, the numerical simulations of each forming process were grouped into two sets: one training set, with 700 simulations, used to generate the metamodels, and one testing set, with 300 simulations, used to evaluate the performance of the generated metamodels by comparing the estimated/predicted output values with those obtained by numerical simulation. In addition to these sets, an extra training set and test set that include simulations from all three materials were considered for each forming process. This was done to evaluate the impact of considering multiple materials on the predictive performance of the metamodels. The root mean square relative error (RMSRE) was used to evaluate the performance of each metamodel:

$$\mathrm{RMSRE} = \sqrt{\frac{1}{q} \sum_{k=1}^{q} \left(\frac{y(\mathbf{x}_k) - y^*(\mathbf{x}_k)}{y(\mathbf{x}_k)}\right)^2},$$

where y(xₖ) and y*(xₖ) are the simulated and predicted response values for the k-th testing input, respectively, and q is the number of testing points. The parametric metamodels (RSM and PCE) were generated in Excel, while the ML metamodels were generated with Python libraries, specifically GPy [19] for the GP metamodels and Scikit-learn [20] for the remaining models. Table 2 presents the RMSRE values of the metamodels generated for each forming process ("U-Channel" and "Square Cup"), material ("DC06", "DP600", and "HSLA340"), and simulation output ("Springback", "Maximum Thinning", and "Maximum Equivalent Plastic Strain"); the results for the cases labeled as "Mixed" correspond to the metamodels generated from a training set that includes all three materials. The lowest value of RMSRE for each case, which corresponds to the best predictive performance, is highlighted.
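The RMSRE metric defined above reduces to a few lines of NumPy; the worked values in the demo are our own illustration.

```python
import numpy as np

def rmsre(y_true, y_pred):
    """Root mean square relative error over the q testing points."""
    rel = (y_true - y_pred) / y_true
    return float(np.sqrt(np.mean(rel ** 2)))

# worked check: relative errors of 50% and 0% give RMSRE = sqrt(0.125)
value = rmsre(np.array([2.0, 4.0]), np.array([1.0, 4.0]))
```

Note that, being a relative metric, RMSRE is undefined when a simulated response is zero, which is not an issue for the strictly positive outputs considered here.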
The MLP model achieved the best performance in 6 of the 16 cases presented. For the remaining cases, the best performances were achieved by the GP (5), SVR (2), KRR (2), and PCE (1) models. It should be noted that the differences in performance between these five models were generally small. The RSM metamodels showed performances that were, in general, very similar to the PCE metamodels, and as such, can be considered as competitive with the models that achieved the best performances. On the other hand, the DT, RF, and kNN models performed clearly worse than the remaining metamodels for all cases. The few comparative studies found in the literature, namely Wessing et al. [5] and Ambrogio et al. [6], favored the use of the GP technique instead of ANN and RSM for thickness prediction, as it achieved significantly better results. In the current study, the GP technique also tended to present the best performance for the prediction of maximum thinning, but this was not valid for the other responses.

Results and Discussion
The inclusion of all three materials in the training and testing of the metamodels did not lead to significantly worse results, when compared to the performances obtained for the single material cases. In fact, in certain cases, the performance obtained for the models trained with the three materials surpassed the performance of models trained with just one material. For example, in the springback prediction for the U-Channel case, more than half of the metamodels tested achieved better performance when trained with the three materials than when trained specifically for the DC06 material. Thus, when training metamodels to predict forming process results considering various materials, it is worth considering the usage of just one dataset containing training data representative of all materials available, instead of training a different model for each material.
As an example, Figure 2 represents the comparison between the simulated values and the values predicted by the MLP and kNN algorithms for the testing dataset, for the maximum thinning in the U-Channel process using DC06. The MLP and kNN algorithms were chosen for this comparison because they achieved the best and poorest performances, respectively, in this case. Figure 3 presents the frequency distributions corresponding to the previous example, generated from 1000 new random points, according to the variability described in Table 1. The frequency distribution of the numerical simulation results is also represented and taken as a reference. The distribution obtained for the MLP metamodel closely resembles the distribution of the simulated results; however, the kNN metamodel concentrates more predictions in the range between 2.6% and 3.2%. In fact, the average value of the simulated results is 2.87%. This is in agreement with Figure 2b, where it is clear that the difference between predicted values and the corresponding simulated values is, in general, larger when the simulated values are further from the average.

Conclusions
In this work, parametric and non-parametric regression models were applied to predict the results of sheet metal forming processes, with the goal of evaluating their performance and establishing which metamodels offer the best results. The following conclusions were drawn:
- The ML techniques can be divided into two groups in terms of performance:
  - The first group consists of the DT, RF, and kNN metamodeling techniques, which generally showed poor performances, with kNN in particular producing the poorest predictions.
  - The second group consists of the MLP, GP, SVR, and KRR techniques. For almost all cases studied, the best predictive performance corresponded to one of these techniques, with MLP showing the best performance in more cases than any other. It is also of note that the performance of these techniques is comparable and, as such, the usage of any of them can be recommended.
- The parametric modeling techniques, RSM and PCE, showed competitive performances when compared with the second group of ML techniques and should be considered as valid alternatives.
- The performance of both types of modeling techniques depends on the response under analysis. For the particular case of maximum thinning, GP shows better performance compared to the other techniques.
- When training metamodels to predict forming process results for different materials, the usage of a single dataset containing the training data of all materials should be considered, instead of training a different model for each material.