Machine Learning-Based Models for the Estimation of the Energy Consumption in Metal Forming Processes

This research provides insight into the performance of machine learning (ML)-based algorithms for the estimation of the energy consumption in metal forming processes, applied to the radial-axial ring rolling process. To define the mutual influence between ring geometry, process settings, and ring rolling mill geometries and the resulting energy consumption, measured in terms of the force integral over the processing time (FIOT), FEM simulations have been implemented in the commercial software Simufact Forming 15. A total of 380 finite element simulations with final ring outer diameters in the range 650 mm ≤ DF ≤ 2000 mm have been implemented and constitute the bulk of the training and validation datasets. Both the finite element simulation settings (input) and the FIOT (output) have been utilized for the training of eight machine learning models, implemented with Python scripts. The results show that the Gradient Boosting (GB) method is the most reliable for the FIOT prediction in forming processes, with maximum and average errors equal to 9.03% and 3.18%, respectively. The trained ML models have also been applied to the authors' own and literature experimental cases, showing maximum and average errors equal to 8.00% and 5.70%, respectively, thus further supporting their reliability.


Introduction
Radial-axial ring rolling (RARR) is a versatile forging process widely used in different industrial sectors such as automotive, agricultural, wind power, piping, and aerospace [1]. In recent years, several improvements have been introduced, helping to obtain good surface quality, fine tolerances, and considerable savings in material cost [2] with less production time compared to machining. Rings manufactured through RARR have high durability and structural strength, but the complexity of the process makes its settings and control difficult to handle without numerical simulations or prediction algorithms. For these reasons, several authors have focused on the development of algorithms and finite element models for a better understanding of the ring rolling process, as summarized hereafter.
Lugora and Bramley [3] utilized Hill's general method for predicting the evolution of the ring during the process considering a rigid-perfectly plastic and incompressible material. Bruschi et al. [4] established a real-time control model, based on the artificial neural network (ANN) approach, to predict the geometrical accuracy of the ring, showing a good correlation between the ANN model and FEM results. Guo and Yang [5] defined the steady forming condition for the ring rolling process and built a mathematical model based on a constant velocity growth condition of the ring, considering the ring geometry in terms of average diameters. More recently, Quagliato and Berti [6] overcame this limitation by proposing a more accurate mathematical approach for the determination of the ring geometry for a subsection of the ring, defined as a slice.
As concerns the force prediction for the ring rolling process, Quagliato and Berti [7,8] proposed two mathematical models based on the slip line theory and estimated radial and axial forces with a deviation of approximately 5% and 6%, respectively, in comparison to the relevant FEM simulations and experimental results. Furthermore, Ryoo et al. [9] defined the relationship between the parameters that affect the ring rolling process at high temperatures and investigated the influence of the main roll rotational speed and the mandrel feeding speed. Kalyani et al. [10] investigated radial and axial forces during the forming process of profiled rings in terms of time and temperature, calculated the forces with an analytical approach, and compared them with FEM simulations. Kim et al. [11] investigated the influence of process parameters in producing large rings, focusing on minimizing the load, but did not consider temperature and the reciprocal influence of the process parameters.
As concerns energy estimation and starvation algorithms for industrial processes, several authors have focused on this topic due to the strong influence of the energy demand on production planning and control [12]. Unver and Kara [13] introduced a decision support tool called HORUS 5.0 to determine the lowest energy-consuming route within the scope of sustainable energy efficiency. Meissner et al. [14] developed an indicator system considering the impact of the materials, energies, and economic attributes of energy efficiency, concluding that strategic decision-making concerning energy optimization is important to be competitive. Larkiola et al. [15] investigated the role of energy efficiency in the rolling processes employing an ANN-based approach and achieved an improvement of the overall energy efficiency estimated at 1.8%. Giorleo et al. [16] compared simulation analyses with an industrial case to evaluate the effect of utilizing different ring preform geometries to reduce the total energy required during the process, but focused on a single material and a single set of process parameters. Allegri et al. [17] defined a main roll speed law that maintains a constant ring angular velocity and achieved a 35% fishtail defect reduction and a 9% energy consumption reduction.
As summarized so far, several authors have developed models for the prediction of the kinematic expansion, the force, and the torque, but the impact of the process parameters on the energy consumption does not appear to have been thoroughly investigated in the literature. In the hot radial-axial ring rolling process, as in every metal forming process carried out at hot or warm forming conditions, a lower process force can be obtained by reducing the feeding, or the deformation, over time. On the other hand, a longer manufacturing time induces a higher temperature drop in the workpiece, which leads to an increase in the resistance of the material to deformation. The energy integral over time can be estimated by employing numerical simulations [18-21] but, considering the complex tools-workpiece interaction in the RARR process, the computational time required for a single simulation might range between several hours and a few days.
In the literature, machine learning algorithms have been already applied to various manufacturing topics, such as for the prediction of joint strength of ultrasonic welding processes [22], to estimate the tool wear in milling operations [23], to diagnose the dimensional variation of additive manufactured parts [24], to classify the cutting phase of the natural fiber reinforced plastic composites [25] and to predict the tool life in the micro-milling process [26]. More recently, Wang et al. [27] developed a deep learning-based algorithm for the recognition of the defects in the strip rolling process, Marques et al. [28] investigated the performances of parametric and non-parametric models for the correlation of process and material variables to springback and wall thinning, Palmieri et al. [29] defined a metamodel to correlate the process parameters and key-quality indicators for the optimization of the blank-holding forces in the stamping process, and Winiczenko [30] utilized a hybrid response surface methodology combined with a genetic algorithm to simulate and optimize the friction welding parameters in AISI 1020-ASTM A536 joints.
Although ML algorithms have been applied to various manufacturing processes, they have not yet been utilized for the investigation of the influence of process, material, and geometrical parameters in metal forming processes and have not yet been applied to the RARR process. Accordingly, the research presented in this paper aims to fill this gap in the literature by investigating the influence of (i) process parameters, (ii) material properties, (iii) initial/final ring geometries, and (iv) processing conditions on energy consumption. Based on the implemented numerical simulation database, eight machine learning (ML) models have been trained and utilized for the prediction of the energy consumption during the process based on the above-mentioned parameter clusters (i)-(iv). The mandrel forming force integral over time (FIOT) has been utilized as the output variable in the analysis and as the response value for the training and validation of the ML algorithms. Based on the most recent applications of machine learning models, the eight models adopted in the research presented in this paper belong to the linear methods [31,32], the kernel methods [33,34], the ensemble methods [35-37], and the artificial neural network (ANN) methodology [38-40].
To create the dataset for the training and the validation of the machine learning models, radial-axial ring rolling finite element simulation models have been implemented in the commercial software Simufact Forming 15. Six final ring outer diameters, equal to 650, 800, 1100, 1400, 1700, and 2000 mm, have been considered along with three different materials largely utilized in the ring rolling process, namely the 42CrMo4 steel [4], the Inconel 718 superalloy [41], and the AA6082 (AlMgSi) aluminum alloy [42]. The material properties have been accounted for by their temperature-dependent elastic modulus and yield strength. Since the training and validation datasets have all been acquired through FE simulations, the implemented FEM model has been validated by comparing its results with previously published ones [8], showing a maximum deviation equal to 2.15% and 0.95% in the prediction of the radial forming force and of the outer diameter of the ring, respectively.
A total of 380 numerical simulation models have been implemented; 80% of the results have been utilized for the training of the ML models, whereas the remaining 20% have been used for their validation. An additional validation phase has been carried out considering the literature experimental results published in [5,8,11]. Based on both validation phases, the Gradient Boosting method, belonging to the ensemble methods, has been shown to accurately predict the force integral over time (FIOT) and is therefore considered to be the most reliable for the case of a complex thermo-mechanical forming process, such as the radial-axial ring rolling process.

Finite Element Simulation Model Definition
To create the database for the training of the machine learning-based force integral over time (FIOT) prediction models, presented in Section 3 of the paper, thermo-mechanical FEM simulations have been implemented in the commercial software Simufact Forming 15 following the general implementation scheme shown in Figure 1. In the numerical simulation models, the dies are considered as rigid, with conductive, convective, and radiation heat transfer with the ring and the surrounding environment. This approximation is justified by the fact that, although the elastic deformation of the rolls can slightly affect the final shape of the ring, its influence is negligible in comparison to the size of the rings considered in this paper. The dimensions of the tools of the ring rolling mill utilized in all the implemented FEM models are summarized in Table 1 along with the additional common process conditions. Friction has been modeled considering a shear friction law, Equation (1), and the utilized friction factor [6-8,18] is also reported in Table 1. In Equation (1), k is defined as the ratio between the yield strength of the material and the square root of 3, according to the von Mises criterion.
τ = m·k    (1)
A higher friction factor has been considered for the contact conditions between the ring, the mandrel, and the main roll due to the higher thickness draft along the radial direction in comparison to the vertical deformation, carried out by the axial rolls. As concerns the centering rolls, their role is mainly to avoid excessive shifting of the ring during the process, thus their contact with the ring is limited and discontinuous over the processing time.
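The shear friction law of Equation (1) can be sketched in a few lines of Python. This is only an illustration: the function names and the numerical values are hypothetical and are not taken from Table 1. The friction angle is computed as the arctangent of the friction factor, as done later for the process settings.

```python
import math


def shear_yield_strength(yield_strength_mpa: float) -> float:
    """Shear yield strength k = sigma_y / sqrt(3) (von Mises criterion)."""
    return yield_strength_mpa / math.sqrt(3.0)


def shear_friction_stress(m: float, yield_strength_mpa: float) -> float:
    """Shear friction law: tau = m * k, with the friction factor m in [0, 1]."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("friction factor m must lie in [0, 1]")
    return m * shear_yield_strength(yield_strength_mpa)


def friction_angle(m: float) -> float:
    """Friction angle beta = arctan(m), in radians."""
    return math.atan(m)


# Hypothetical inputs for illustration only (not the values of Table 1).
tau = shear_friction_stress(m=0.8, yield_strength_mpa=100.0)
beta = friction_angle(0.8)
```

Because k scales with the yield strength, the friction stress follows the temperature dependence of the material even when m is kept constant.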
Friction influences the force calculation. However, as will be shown in the results section, although the training of the machine learning models has been carried out with a single set of friction constants (Table 1), an accurate FIOT prediction can still be achieved when the model is applied to literature experimental cases where different friction conditions are considered. Taken together, the training and validation (phase 1) datasets are composed of 380 thermo-mechanical radial-axial ring rolling numerical simulations, where the final outer diameter of the ring (D F) ranges from 650 mm to 2000 mm.
As concerns the initial annular blanks, they have been defined in terms of initial outer diameter D 0 , initial blank height h 0 , and initial inner diameter d 0 according to the relevant final shape, by means of the procedure defined in Berti et al. [18]. A total of 16 different preform sizes have been utilized for the considered six final outer diameter geometries, as summarized in Table 2.
The ring preforms have been meshed considering four different mesh detail levels, and the best compromise between accuracy and computational time has been identified in 1 element every 0.5° for the circumferential direction, 1 element every 2.5 mm for the radial direction, and 1 element every 5 mm for the vertical direction. These 16 geometries have been combined with different materials, Section 2.2, and process settings, Section 2.3, yielding the final database of 380 FEM simulations. Due to the impracticality of reporting all 380 settings in table form, a summary is added in Appendix A, whereas the whole dataset is made available as Supplementary Material.

Materials
In the FEM models, presented in the previous section, three materials largely utilized in the hot ring rolling process [4,41,42] have been considered: (i) 42CrMo4 steel, (ii) Inconel 718 superalloy, and (iii) AA6082 (AlMgSi) aluminum alloy. Due to the high diversity of their mechanical behaviors, the consideration of these three materials widens the range of validity of the proposed investigation. For the definition of the plastic material behavior, the Hansel-Spittel flow stress model [43] has been utilized, as reported in Equation (2), whereas the relevant model constants (C1, C2, n1, n2, L1, L2, m1, m2) for the three considered materials are reported in Table 3. In Equation (2), ε, ε̇, and T represent the strain, the strain rate, and the temperature, respectively. The combined consideration of these three parameters during the FEM simulations allows estimating the flow stress of the material for each element of the mesh, thus accurately estimating the relevant forming force. To be able to consider the influence of the material in the FIOT prediction models, the initial temperature of the ring, set as the initial boundary condition in the FEM models, as well as Young's modulus and yield strength at that temperature, have been considered as features in the analysis. The three considered temperatures, for each one of the three materials, are reported in Table 4 along with the two above-mentioned mechanical properties. All the elastic, plastic, and thermo-mechanical properties for the three considered materials have been acquired from the MATILDA® (Material Information Link and Database Service) database available in Simufact Forming 15.
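As an illustration of how an eight-constant Hansel-Spittel-type law combines strain, strain rate, and temperature, the sketch below evaluates one commonly used form of the model. The functional form is an assumption: it is a widely reported variant with temperature-dependent exponents and may differ from Equation (2) of this paper. The constants are hypothetical placeholders, not the values of Table 3.

```python
import math


def hansel_spittel_flow_stress(strain, strain_rate, temperature,
                               C1, C2, n1, n2, L1, L2, m1, m2):
    """Flow stress for an assumed eight-constant Hansel-Spittel-type form:

        sigma = C1 * exp(C2*T) * eps^(n1*T + n2)
                * exp((L1*T + L2) / eps) * eps_dot^(m1*T + m2)
    """
    if strain <= 0.0 or strain_rate <= 0.0:
        raise ValueError("strain and strain rate must be positive")
    return (C1 * math.exp(C2 * temperature)
            * strain ** (n1 * temperature + n2)
            * math.exp((L1 * temperature + L2) / strain)
            * strain_rate ** (m1 * temperature + m2))


# Hypothetical constants for illustration only (NOT taken from Table 3).
constants = dict(C1=3000.0, C2=-0.003, n1=0.0, n2=0.15,
                 L1=0.0, L2=-0.01, m1=0.0002, m2=0.0)

sigma_ref = hansel_spittel_flow_stress(0.5, 1.0, 1000.0, **constants)
sigma_fast = hansel_spittel_flow_stress(0.5, 10.0, 1000.0, **constants)
sigma_hot = hansel_spittel_flow_stress(0.5, 1.0, 1100.0, **constants)
```

With these placeholder constants the flow stress increases with the strain rate and decreases with the temperature, which is the qualitative behavior expected in hot forming.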
The material features reported in Table 4 have been combined with the geometrical features, presented in previous Section 2.1, and with the process setting features, reported in the following Section 2.3, allowing the creation of the dataset utilized for the training and tests of the considered machine learning algorithms.

Radial-Axial Ring Rolling FEM Simulation Settings
The process parameters for the numerical simulations have been set considering the models proposed and validated in Berti et al. [18]. The three main parameters utilized in the analysis are reported in Equation (3), for the main roll rotational speed ω R , in Equations (4) and (5) for the mandrel initial [v M ] 0 and final [v M ] F feeding speeds, and in Equations (6) and (7) for the upper axial roll initial [v A ] 0 and final [v A ] F feeding speeds.
In Equations (4)-(7), R R is the radius of the main roll, R M the radius of the mandrel, R 0 , r 0 , and h 0 the outer radius, inner radius, and height of the initial ring blank, R F , r F , and h F the outer radius, inner radius, and height of the final ring, and θ half of the axial rolls vertex angle, whereas β R and β A are the friction angles in the contact of the ring with the main roll and mandrel and with the axial rolls, respectively. The friction angle is calculated based on the friction factors, Table 1, as β R = arctan(m).
For each one of the implemented numerical simulations, the above-mentioned process parameters have been set according to the range proposed in [18] and have been considered as input for the force integral over time (FIOT) estimation models, presented in Section 4.3 of the paper. Since the process parameter setting is based on a kinematic approach, different temperatures or materials result in the same set of speeds. The summary of the implemented study cases is reported in Appendix A and is fully disclosed in the Supplementary Material.
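The FIOT used as the response value throughout the paper is the integral of the forming force over the processing time. A minimal sketch of how it could be computed from a sampled force history with the trapezoidal rule is given below. The function name and the sample data are hypothetical; the paper does not state which integration scheme was used.

```python
def force_integral_over_time(times_s, forces_n):
    """Trapezoidal integral of a sampled force history, in N*s.

    times_s must be strictly increasing and as long as forces_n.
    """
    if len(times_s) != len(forces_n) or len(times_s) < 2:
        raise ValueError("need two equally long sequences with >= 2 samples")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        if dt <= 0.0:
            raise ValueError("time stamps must be strictly increasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (forces_n[i] + forces_n[i - 1]) * dt
    return total


# Hypothetical force history for illustration (not simulation output).
times = [0.0, 1.0, 2.0, 4.0]
forces = [0.0, 100.0, 100.0, 0.0]
fiot = force_integral_over_time(times, forces)  # 50 + 100 + 100 = 250 N*s
```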

Machine Learning Models Definition, Preprocessing, and Training
Due to the complex interaction between the considered process, material, and geometry parameters, eight machine learning (ML) algorithms with different levels of complexity, one of which is based on the artificial neural network (ANN), have been considered in this paper. The target is to implement a methodology for the estimation of the energy consumption in the radial-axial ring rolling process based on a set of input variables composed of geometry, process conditions, and materials. The architecture of the implemented ANN model is shown in Figure 2, where four hidden layers have been considered. As concerns the remaining ML models, the input and output layers are the same as shown in Figure 2 but are connected through the weights vectors, defined during the optimization process. All the considered models have been applied to the above-mentioned dataset, considering 80% of the set for the model training, whereas the remaining 20% has been employed for the assessment of the model accuracy. Neither set is predetermined; both are randomly selected before the training. The employed algorithms belong to the (i) linear, (ii) kernel, (iii) ensemble, and (iv) artificial neural network approaches.
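The random 80/20 subdivision and the maximum/average percentage errors used to compare the models can be sketched as follows. This is a minimal illustration with scikit-learn, under the assumption that the Python scripts follow a similar workflow. The feature matrix is synthetic, the number of features is a placeholder, and `percentage_errors` is a hypothetical helper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the dataset: 380 samples as in the paper,
# 10 features chosen arbitrarily for illustration.
n_samples, n_features = 380, 10
X = rng.uniform(size=(n_samples, n_features))
y = 1.0 + X @ rng.uniform(0.5, 1.5, size=n_features)

# Random (not predetermined) 80/20 split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0
)


def percentage_errors(y_true, y_pred):
    """Return (maximum, average) absolute percentage error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rel = 100.0 * np.abs(y_pred - y_true) / np.abs(y_true)
    return rel.max(), rel.mean()
```

Fixing `random_state` only makes the sketch reproducible; leaving it unset gives a different random subdivision at every run.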

Linear Methods
Linear regression methods [31,32] are utilized to model linear correlations between the independent variable x and the dependent variable y, as in Equation (8). The prediction calculated by the model is defined as ŷ and the aim is to minimize the Residual Sum of Squares (RSS) of the objective function, as shown in Equation (9). The subscript D represents the number of considered features, whereas N represents the size of the dataset.
Linear methods can be expanded to model the non-linear relationships by replacing X with non-linear functions. In this paper, to avoid the over-fitting problem, the regularized linear method has been utilized where constraints have been imposed on the weights vector (w) of Equation (8).
Based on the general form of Equation (8), the Ridge model is defined to minimize the squared sum of weights, thus resulting in the objective function W(w), Equation (10). If the hyperparameter λ of Equation (10) is equal to 0, we return to the original linear model of Equation (9). The hyperparameter, present in the Ridge model as well as in other models presented subsequently, is a tuning parameter utilized to increase the accuracy of the prediction and is calculated, during the training, to maximize the correlation factor between independent and dependent variables [44].
Another variation of Equation (8) is the LASSO model, which considers the norm of the weights in the objective function W(w), defined as in Equation (11).
Considering together the square of the weights, as in the Ridge model of Equation (10), and the norm of the weights, as in the LASSO algorithm of Equation (11), the third considered linear model is shown in Equation (12) and is defined as the Elastic Net model.
In Equation (12), if the hyperparameter is set as λ 1 = 1 then the LASSO Equation (11) is obtained, whereas if λ 1 = 0 it results in the Ridge model, respectively. The λ 1 and λ 2 parameters represent the constants related to the first and second-order norms, respectively, and are calculated based on the random search method [44].
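The three regularized linear models and the random search of their hyperparameters can be sketched with scikit-learn as below. This is an assumption about tooling: the paper only states that Python scripts were used. The data are synthetic, the candidate hyperparameter values are arbitrary, and the R² score is used here as a stand-in for the correlation factor mentioned in the text. In scikit-learn's parametrization, `alpha` sets the overall penalty strength and `l1_ratio` the mix between the first- and second-order norms, with `l1_ratio=1` reducing the Elastic Net to the LASSO.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic linear data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
true_w = np.array([2.0, -1.0, 0.5, 0.0, 0.0, 1.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

# Ridge (squared-norm penalty) and LASSO (first-order-norm penalty).
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.01)).fit(X, y)

# Elastic Net with a random search over both penalty hyperparameters.
search = RandomizedSearchCV(
    make_pipeline(StandardScaler(), ElasticNet(max_iter=10_000)),
    param_distributions={
        "elasticnet__alpha": [1e-3, 1e-2, 1e-1, 1.0, 10.0],
        "elasticnet__l1_ratio": [0.1, 0.3, 0.5, 0.7, 0.9, 1.0],
    },
    n_iter=10,          # number of randomly sampled combinations
    cv=5,               # 5-fold cross-validation on the training data
    scoring="r2",
    random_state=0,
)
search.fit(X, y)
best_elastic_net = search.best_estimator_
```

Standardizing the features before fitting is a design choice of this sketch: the penalties act on the weights, so features with very different scales would otherwise be penalized unevenly.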

Kernel Methods
Linear methods can be expanded to model non-linear relationships between the independent and dependent variables by replacing X, Equation (8), with the feature function φ(x). The feature function can be written with the Gram matrix (K), as shown in Equations (13) and (14) where κ(x i , x j ) is the kernel function [33,34], defined to model the considered relationship. The Kernel Ridge (KR) model combines the kernel method with the Ridge model (10) and, in this paper, the polynomial kernel of Equation (15) is utilized for the Kernel Ridge model. The c and d constants in Equation (15) influence the feature functions and are determined through the random search method [44] during the training process.
Another kernel method based on the squared norm of the weight factors is the Support Vector Machine (SVM) model [45], Equation (16), where the RSS(w) function of Equation (10) is replaced by the epsilon-insensitive loss function, Equation (17). In this paper, as for the case of the Kernel Ridge model, the polynomial kernel function of Equation (15) has been utilized for the SVM model.
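A minimal sketch of the two kernel models with a polynomial kernel is given below, assuming the kernel has the usual form (x_i·x_j + c)^d and that scikit-learn is used. Both are assumptions; the helper functions, the synthetic data, and the hyperparameter values are hypothetical. Setting `gamma=1.0` makes scikit-learn's polynomial kernel coincide with that form.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import polynomial_kernel
from sklearn.svm import SVR


def poly_kernel(xi, xj, c=1.0, d=2):
    """Polynomial kernel (xi . xj + c)^d for two feature vectors."""
    return (np.dot(xi, xj) + c) ** d


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon tube, linear in the residual outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)


# Synthetic non-linear (quadratic) data for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))
y = 1.0 + X[:, 0] * X[:, 1] + X[:, 0] ** 2

c, d = 1.0, 2
# Gram matrix built entry by entry, and the same matrix from scikit-learn.
gram_manual = np.array([[poly_kernel(a, b, c, d) for b in X] for a in X])
gram_sklearn = polynomial_kernel(X, X, degree=d, gamma=1.0, coef0=c)

# Kernel Ridge and support vector regression with the same kernel.
kr = KernelRidge(kernel="polynomial", degree=d, gamma=1.0, coef0=c,
                 alpha=1e-3).fit(X, y)
svr = SVR(kernel="poly", degree=d, gamma=1.0, coef0=c,
          C=10.0, epsilon=0.05).fit(X, y)
```

In practice c and d would be tuned by a random search rather than fixed as done here.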

Ensemble Methods
The ensemble methods [35][36][37] combine different approaches and apply them to randomly selected data sub-sets to improve the prediction performances. Among the ensemble approaches, the Random Forest (RF) model, utilized in this paper, trains M decision trees and calculated the response for each one of them. For each tree, the response is defined considering different intervals for the estimator, allowing the subdivision of the problem into subclasses, which are classified by their accuracy by comparing their values with the true value. The final prediction is given by the average of the M ones, calculated as the average of each one relevant for each one of the subsets of every tree as shown in Equation (18) whereŷ is the average prediction over M-trees andŷ m is the prediction of each tree.ŷ Another ensemble method is defined as Gradient Boosting (GB) and, differently from the RF, in the beginning, only one tree ( f ) is created and it is progressively updated to minimize the objective function W 8 of 20

Linear Methods
Linear regression methods [31,32] are utilized to model linear correlations between the independent variable x and dependent variable y as in Equation (8). The prediction calculated by the model is defined as ŷ and the aim is to minimize the Residual Sum of Squares (RSS) of the objective function, as shown in Equation (9). The subscript D represents the number of considered features, whereas N represents the size of the dataset.
Linear methods can be expanded to model non-linear relationships by replacing X with non-linear functions. In this paper, to avoid the over-fitting problem, the regularized linear method has been utilized, where constraints are imposed on the weights vector (w) of Equation (8). Based on the general form of Equation (8), the Ridge model is defined to minimize the squared sum of the weights, thus resulting in the objective function of Equation (10). If the hyperparameter of Equation (10) is equal to 0, we return to the original linear model of Equation (9). The hyperparameter, present in the Ridge model as well as in other models subsequently presented, is a tuning parameter utilized to increase the accuracy of the prediction and is calculated, during the training, to maximize the correlation factor between independent and dependent variables [44]. The second considered linear model is the LASSO algorithm, which constrains the norm of the weights and is defined as in Equation (11).
Considering together the square of the weights, as in the Ridge model of Equation (10), and the norm of the weights, as in the LASSO algorithm of Equation (11), the third considered linear model is shown in Equation (12) and is defined as the Elastic Net model. In Equation (12), the two hyperparameters represent the constants related to the first- and second-order norms, respectively, and are calculated based on the random search method [44].
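The three regularized objectives described above can be sketched as follows. This is only an illustration under stated assumptions: the hyperparameter names (`lam`, `lam1`, `lam2`) are placeholders, the penalties are written in their common textbook form without any scaling factor, and the data are made up, so the expressions may differ in detail from Equations (10) to (12).

```python
def predict(w0, w, x):
    """Linear prediction: bias plus weighted sum of the D features."""
    return w0 + sum(wj * xj for wj, xj in zip(w, x))

def rss(w0, w, X, y):
    """Residual Sum of Squares over the N samples."""
    return sum((yi - predict(w0, w, xi)) ** 2 for xi, yi in zip(X, y))

def ridge_objective(w0, w, X, y, lam):
    """RSS plus the squared sum of the weights (second-order norm)."""
    return rss(w0, w, X, y) + lam * sum(wj ** 2 for wj in w)

def lasso_objective(w0, w, X, y, lam):
    """RSS plus the sum of the absolute weights (first-order norm)."""
    return rss(w0, w, X, y) + lam * sum(abs(wj) for wj in w)

def elastic_net_objective(w0, w, X, y, lam1, lam2):
    """RSS plus both penalties, each with its own constant."""
    l1 = sum(abs(wj) for wj in w)
    l2 = sum(wj ** 2 for wj in w)
    return rss(w0, w, X, y) + lam1 * l1 + lam2 * l2

# Made-up data: N = 3 samples, D = 2 features
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
y = [5.0, 4.0, 7.0]
w0, w = 1.0, [1.0, 2.0]

plain = rss(w0, w, X, y)
ridge = ridge_objective(w0, w, X, y, lam=0.5)
lasso = lasso_objective(w0, w, X, y, lam=0.5)
enet = elastic_net_objective(w0, w, X, y, lam1=0.5, lam2=0.5)
```

Setting the penalty constant to zero in any of the three functions returns the plain RSS, which mirrors the remark that a zero hyperparameter recovers the original linear model.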

Kernel Methods
Linear methods can be expanded to model non-linear relationships between the independent and dependent variables by replacing X, Equation (8), with non-linear functions, as in Equations (13) and (14), where the function of the sample pair $(x_i, x_j)$ is the kernel function [33,34].

Within the Gradient Boosting procedure, which minimizes the objective function of Equation (19), the m + 1 tree is based on the results of the m tree, compensated by the gradient residual of the previous tree scaled by the learning rate η, as shown in Equation (20). The learning rate η is defined as the speed by which the algorithm minimizes the loss function, $L_\delta$ for the ensemble methods, and is present only for the case of ensemble and ANN methods. For the case of the linear methods, like those of Sections 3.1 and 3.2, the learning rate is not considered since the optimized value is defined as the minimum of the loss function.
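The boosting update, in which each new tree corrects the residual of the previous prediction and is scaled by the learning rate η, can be sketched as below. The sketch assumes a squared-error loss (whose negative gradient is the plain residual) instead of the $L_\delta$ loss used in this research, a single input feature, and depth-1 trees; all function names and data are hypothetical.

```python
from statistics import mean

def fit_stump(xs, rs):
    """Least-squares depth-1 regression tree on one feature."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all x values identical: fall back to the mean
        m = mean(rs)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def gradient_boost(xs, ys, n_rounds=100, eta=0.1):
    """Start from a constant model and add eta-scaled residual corrections."""
    f0 = mean(ys)
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the squared-error loss = current residuals
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        stumps.append(h)
        # Update: f_(m+1)(x) = f_m(x) + eta * h_m(x)
        preds = [p + eta * h(x) for p, x in zip(preds, xs)]

    def model(x):
        return f0 + eta * sum(h(x) for h in stumps)

    return model, preds

# Made-up non-linear data
xs = [float(i) for i in range(20)]
ys = [x ** 2 for x in xs]
model, train_preds = gradient_boost(xs, ys)
```

A small η makes each correction conservative, so more rounds are needed, but the model is less likely to chase noise in a single step.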

Artificial Neural Network Methods
Artificial Neural Network (ANN) models [38][39][40] consist of (i) input layers, (ii) hidden layers, and (iii) output layers (Figure 2). Input layers are connected to the hidden layers by the weight functions ($w_{ij}$), which are calculated during the training of the ANN algorithm. For each node, the inputs coming from the previous layer, defined as $x_{ij}$, are multiplied by the weight functions ($w_{ij}$) and summed with the bias value ($w_{i0}$), and the output of the layer is derived through the activation function (Ψ), Equation (21).
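The computation carried out by one layer, a weighted sum of the inputs plus the bias passed through the activation function, can be sketched as follows. The ReLU activation and the numerical values are assumptions chosen for illustration; the exact form of Ψ is the one given in the referenced equations.

```python
def relu(z):
    """Assumed activation: the node passes a signal only above zero."""
    return max(0.0, z)

def layer_forward(x, W, b, activation):
    """Output of one layer.

    x: inputs coming from the previous layer
    W: W[i][j] is the weight from input j to node i
    b: b[i] is the bias of node i
    """
    return [
        activation(b_i + sum(w_ij * x_j for w_ij, x_j in zip(row, x)))
        for row, b_i in zip(W, b)
    ]

# Made-up values: 3 inputs feeding 2 nodes
x = [0.5, -1.0, 2.0]
W = [[1.0, 0.0, 0.5],
     [-1.0, 1.0, 0.0]]
b = [0.1, 0.2]
out = layer_forward(x, W, b, relu)
```

In this example the first node receives a positive weighted sum and fires, whereas the second node receives a negative one and outputs zero.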
In the research presented in this paper, the weight matrix is updated considering the RMSprop algorithm [46], reported in Equation (22). The learning rate η and the hyperparameter ρ are optimized considering the random search method [44]. Finally, the activation function for the ANN algorithm, Ψ of Equation (23), is defined as the threshold for the activation of a considered node in the hidden layers.
Only if Ψ exceeds the threshold is the considered node in the i-th layer connected to the nodes of the (i + 1)-th layer. The considered ANN algorithm is composed of four hidden layers made up of 200, 100, 50, and 25 nodes, respectively. The number of neurons for each layer has been optimized to minimize the loss function. The $L_\delta$ target function of Equation (19) has also been utilized for the case of the ANN model.
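The role of the learning rate η and of the hyperparameter ρ in the RMSprop weight update can be illustrated with a short sketch. It assumes the commonly used form of RMSprop (a running average of the squared gradients that rescales each step) and a simple quadratic loss; the default values and the small constant `eps` are illustrative assumptions, not the optimized values of this research.

```python
import math

def rmsprop_minimize(grad, w, eta=0.01, rho=0.9, eps=1e-8, steps=1000):
    """Minimize a function given its gradient, using RMSprop updates."""
    v = [0.0] * len(w)  # running average of the squared gradients
    for _ in range(steps):
        g = grad(w)
        v = [rho * vi + (1.0 - rho) * gi ** 2 for vi, gi in zip(v, g)]
        w = [wi - eta * gi / (math.sqrt(vi) + eps)
             for wi, gi, vi in zip(w, g, v)]
    return w

# Quadratic loss sum((w - target)^2), whose gradient is 2 * (w - target)
target = [3.0, -2.0]

def grad(w):
    return [2.0 * (wi - ti) for wi, ti in zip(w, target)]

w_opt = rmsprop_minimize(grad, [0.0, 0.0])
```

Because every step is divided by the recent gradient magnitude, the step length is governed mainly by η, while ρ controls how quickly older gradients are forgotten.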

Data Preprocessing and Machine Learning Algorithm Training
For the training of the selected machine learning algorithms, presented in Sections 3.1-3.4, the input data for the 380 FEM simulations as well as the result, in terms of radial forming force integral over the mandrel time (FIOT), have been randomly arranged to avoid any bias. In both the training and test datasets, a single sample is defined as a row of the table composed of the following data: (i) main roll rotational speed, (ii) average mandrel feeding speed, (iii) initial ring geometry, (iv) final ring geometry, (v) initial ring temperature, (vi) material yield strength, (vii) material Young's modulus and (viii) force integral over the mandrel time. Since the parameters considered in this research have different intervals and measurement units, normalization has been applied to convert them to a 0 to 1 range. As concerns the FIOT, due to the skewness of the data distribution, the data have been converted into log(1 + FIOT) before the normalization process. Similarly, the remaining parameters have also been converted by a Box-Cox transformation defined as $(x^{0.15} - 1)/0.15$, where x is the considered parameter.
This procedure reduces the computational burden during the training and increases the accuracy. The hyperparameters of the prediction models described in Section 3 have been obtained by applying the random search method, aiming to maximize the correlation factor on both the training and the test datasets. The algorithms presented in the previous sections have been implemented in a Windows OS environment utilizing the scikit-learn 0.22.2 and Keras 2.3.1 modules in the Anaconda Spyder program with Python 3.7.4. As previously mentioned, 80% of the dataset, corresponding to 304 cases, has been randomly selected from the whole database and the remaining 20%, 76 cases, has been utilized as a test set.
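The preprocessing chain and the 80/20 split described above can be sketched as follows. The transformation constants (0.15 exponent, log(1 + FIOT), 0 to 1 range, 380 cases) come from the text; the function names, the random placeholder values, and the seed are assumptions made only for this illustration.

```python
import math
import random

LAMBDA = 0.15  # Box-Cox exponent quoted in the text

def box_cox(x, lam=LAMBDA):
    """Box-Cox transformation (x^lam - 1) / lam for a positive parameter x."""
    return (x ** lam - 1.0) / lam

def min_max(values):
    """Normalize a list of values to the 0 to 1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def preprocess_feature(values):
    """Input parameter: Box-Cox transformation, then normalization."""
    return min_max([box_cox(v) for v in values])

def preprocess_target(fiot_values):
    """FIOT: log(1 + FIOT) to reduce the skewness, then normalization."""
    return min_max([math.log1p(v) for v in fiot_values])

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into training and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

# Placeholder data standing in for one input parameter and the FIOT
rng = random.Random(1)
speeds = [rng.uniform(5.0, 50.0) for _ in range(380)]
fiot_values = [rng.uniform(1e3, 1e6) for _ in range(380)]

rows = list(zip(preprocess_feature(speeds), preprocess_target(fiot_values)))
train, test = train_test_split(rows)
```

With 380 rows and a test fraction of 0.2, the split yields 304 training cases and 76 test cases, matching the figures reported above.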
For the evaluation of the accuracy of each model, three validation steps have been considered: (i) in the first step, the training dataset is fed once again to the model after the hyperparameters, if present, have been optimized; (ii) the test set is fed to the model and the accuracy, for the case of untrained data, is evaluated; (iii) finally, experimental values from reference papers and self-developed experiments are fed to the model and its accuracy is defined. The results concerning the accuracy of each of the considered methods, for the above-mentioned three validation steps, are reported in Section 4 of the paper along with the relevant optimized hyperparameters.

Results
To condense the vast amount of data relevant for the 380 numerical simulations composing the training and test datasets, the key results of three numerical simulations, in terms of equivalent plastic strain and mandrel force, have been summarized in Section 4.1. In addition to that, to prove the reliability of the numerical model implementation procedure, in Section 4.2 a validation has been carried out by comparing the outer diameter expansion and mandrel force over time. The results presented in Section 4.2 are from the authors' previous work [8] and have been briefly summarized.
Finally, in Section 4.3, the results of the optimization of the hyperparameters as well as the performances of the considered machine learning models, as presented in Section 3, are reported. To enhance the validation of the proposed FIOT estimation procedure, the four most accurate machine learning models, among the eight employed, have been utilized for the prediction of three experimental cases from literature papers. This second validation phase allowed confirming the reliability of the defined investigation procedure as well as the accuracy of the implemented solutions.

Thermo-Mechanical FEM Models Results
To provide insight into the results of the numerical simulations implemented for all the 380 analyzed cases, Figure 3 reports the equivalent plastic strain distribution at the end of the calibration phase, together with the outer diameter and radial force evolution during the process, for the case of an 1100 mm final outer diameter ring made of 42CrMo4 steel with an initial temperature of 1200 °C. The radial (mandrel) forming forces relevant for all the 380 cases have been exported from the FEM simulations and utilized for the creation of the training and test database for the machine learning algorithms. Due to the large amount of data composing the database, they are not included in the manuscript but submitted along with the paper as Supplementary Material. After the export of the radial forming force results from the Simufact Forming 15 numerical simulations, a script has been implemented in MS-Excel for the automatic calculation of the time integral of the force, i.e., the FIOT, which is utilized as a measure of the amount of energy required in the whole forming process.
For the calculation of the FIOT, only the mandrel time, Figure 3c, utilized as user input in the numerical simulation, has been considered. The overall simulation time is composed of mandrel time and calibration time but, since the latter one can be extended at will to increase the accuracy of the ring geometry, it has not been considered in the analysis. The mandrel time instead is the time during which the mandrel is actively translating towards the main roll, thus when most of the process energy is employed.
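The calculation of the FIOT over the mandrel time only can be sketched as below. In this research the integration was carried out with an MS-Excel script; the Python function, the trapezoidal rule, and the sample force history are assumptions used here purely for illustration.

```python
def fiot(times, forces, mandrel_time):
    """Trapezoidal time integral of the force, restricted to t <= mandrel_time."""
    total = 0.0
    points = list(zip(times, forces))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if t0 >= mandrel_time:
            break  # calibration phase: not considered
        if t1 > mandrel_time:
            # Clip the last segment at the end of the mandrel time
            f1 = f0 + (f1 - f0) * (mandrel_time - t0) / (t1 - t0)
            t1 = mandrel_time
        total += 0.5 * (f0 + f1) * (t1 - t0)
    return total

# Made-up force history: mandrel feeding up to t = 3, calibration afterwards
times = [0.0, 1.0, 2.0, 3.0, 4.0]
forces = [0.0, 10.0, 10.0, 10.0, 2.0]
value = fiot(times, forces, mandrel_time=3.0)
```

Samples recorded after the mandrel time are ignored, so extending the calibration phase does not change the computed integral.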

Thermo-Mechanical FEM Model Validation
To validate the developed numerical simulation model, the experimental results presented in the authors' previous research [8] have been utilized and are hereafter summarized. For the validation, a Pb75-Sn25 alloy has been utilized for the manufacturing of the ring preform, with initial dimensions $D_0$, $d_0$ and $h_0$ equal to 155 mm, 105 mm, and 42 mm, and final dimensions $D_F$, $d_F$ and $h_F$ equal to 195 mm, 153 mm, and 37 mm, respectively (Figure 4).

According to the results presented in Figure 5, the maximum deviation between experimental and finite element results is equal to 0.95% for the outer diameter and 2.15% for the radial forming force, showing the reliability of the implemented numerical simulation model in replicating real process conditions. Since the FIOT estimation is based on the precise estimation of the forming force for the whole mandrel feeding time, the validation carried out against experimental results confirms the accuracy of the implemented finite element model solution, and thus the reliability of the input dataset for the training of the considered machine learning models.

Energy Prediction Models Results and Validation
By considering the setting parameters and FIOT results of the 380 implemented FEM simulations, as presented in Section 2, the hyperparameters relevant for the eight considered machine learning models have been calculated by means of the random search method and optimized during the training phase of the algorithms. The hyperparameters have all been set as random numbers at the beginning of the training process and optimized during the training to minimize the residual between prediction and true values. During the training phase, 80% of the 380 simulations have been utilized, and this set is defined as the "train set". After the optimization of the hyperparameters, the trained machine learning models have been applied to the remaining 20% of the 380 simulations, not utilized during the training, and the accuracy in the estimation of the FIOT has been investigated. The accuracy of the training process has been verified by considering the correlation factor (R²). The optimized hyperparameters as well as the correlation factors for each model, relevant for the training dataset and the test dataset and calculated for the optimized hyperparameters, are reported in Table 5. According to the results presented in Table 5, the Gradient Boosting method shows the best correlation factor (R²) in both the training and test datasets. This high accuracy is related to the capability of the ensemble methods to subdivide the training dataset into sub-problems and thus, as concerns the research presented in this paper, to properly interpret the influence of different levels of the process, material, and geometrical parameters on the FIOT. Moreover, the higher accuracy of the Gradient Boosting method in comparison to the Random Forest method is related to the nature of the error minimization of the former. For the case of the Gradient Boosting method, only one tree is considered, and it is progressively optimized to minimize the residuals.
The Random Forest method instead creates several trees and assigns a sub-problem to each one of them, optimizing the solution for each. However, the subdivision into sub-classes might lead to biases during the training process, a fact which is clear from the drop of the correlation factor between train set and test set for the case of the Random Forest method (Table 5).
To provide a more comprehensive evaluation of the performances of the four machine learning models that have shown the best results in terms of correlation factors (Table 5), the true values vs. prediction as well as the percentage residuals for the 76 cases of the test set are reported in Figure 6a,b, respectively. The true values in Figure 6a are the FEM result whereas the prediction values refer to the relevant ML models predictions.
The analysis of the residuals, Figure 6b, shows that although the Kernel, Random Forest and ANN methods have a remarkably high correlation factor, their residuals are considerably high, especially for small prediction values. On the other hand, the Gradient Boosting method yields low residuals for all FIOT levels. The maximum and average residuals for the four methods summarized in Figure 6 are reported in Table 6. In addition to that, as previously mentioned, the accuracy in the prediction of the FIOT has also been evaluated for three experimental ring rolling cases from the literature [5,8,11] by applying the four trained machine learning models that showed the highest correlation factors (Table 6). These experimental results are relevant for experiments carried out on a GH4169 nickel-based superalloy [5], the Pb-Sn alloy ring also utilized for the finite element model validation [8], and an AISI-304 steel alloy [11]. These three cases are completely different in terms of the geometry and material of the ring, process conditions, and size of the ring rolling mill, and have been selected to provide additional insight into the accuracy of the predictions carried out by the proposed models. The percentage residuals between true and predicted values for these three cases are summarized in Table 7. Considering the results presented in Table 7, it is once again clear that the structure of the Gradient Boosting method can catch the complex nature of the interaction between geometrical, material, and process parameters in the radial-axial ring rolling process, thanks to its ability to subdivide the given task into sub-problems while keeping the error minimization linked to a single residual function.
Moreover, as mentioned in Section 2.1, although the process conditions relevant for [5,8,11] are different from those of the FEM simulations utilized for the training, where the same friction conditions have been considered in all the cases, the accuracy is still remarkably good. In addition, the computation is almost real-time, a considerable improvement in comparison to the thermo-mechanical numerical simulations, whose computational time ranges from 9 to 12 h for the 650 mm final ring outer diameter simulations to 1.5 to 3 days for the 2000 mm final ring outer diameter simulations.
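The two accuracy measures used throughout this section, the correlation factor and the percentage residuals, can be computed as in the sketch below. It assumes that R² denotes the usual coefficient of determination and that the percentage residual is the absolute deviation divided by the true value; both definitions, as well as the numerical values, are assumptions made for illustration and are not taken from Tables 5 to 7.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def percentage_residuals(y_true, y_pred):
    """Absolute deviation of each prediction, in percent of the true value."""
    return [abs(t - p) / abs(t) * 100.0 for t, p in zip(y_true, y_pred)]

# Made-up FIOT values: "true" FEM results and model predictions
y_true = [100.0, 200.0, 400.0, 800.0]
y_pred = [105.0, 190.0, 400.0, 780.0]

residuals = percentage_residuals(y_true, y_pred)
max_res = max(residuals)
avg_res = sum(residuals) / len(residuals)
r2 = r2_score(y_true, y_pred)
```

The example also shows why both measures are reported: R² stays close to 1 even though the smallest cases carry a 5% residual, which is the kind of behaviour noted above for small prediction values.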

Discussion
Considering the results relevant for all the utilized machine learning models, as presented in Table 5, the relatively low correlation factor shown by the linear models is an indication of the fact that the relationship between the considered input and output parameters is not linear. For the same reason, the Kernel methods, which utilize a polynomial function, show better accuracy than the linear methods but still have high residuals, as shown in the detailed analysis of Figure 6 and Table 6.
Both linear and Kernel methods calibrate the components of the weights vector (w), Equation (8) by minimizing the residual of the objective function, thus they tend to show low residuals for the case of the training dataset but relatively high ones for untrained data. On the other hand, both the Gradient Boosting as well as the Neural Network algorithms calibrate the components of the weights vector (w) considering the learning rate which allows a more robust consistency in both training and test datasets, as well as for additional predictions. Considering altogether the three validation steps carried out considering the (i) training dataset, (ii) the test dataset and (iii) the literature experimental cases, the complex interaction between process, material, and geometry parameters is therefore representable neither by a linear nor by a polynomial function.
As concerns the applicability of the proposed procedure outside the ranges considered for the construction of the training dataset, the validation case relevant for [5] gives a remarkably interesting insight. Although geometry, process parameters, and material are all different in comparison to those considered in this paper, the robustness of the trained Gradient Boosting model allows obtaining a reasonable residual in the estimation of force integral, as shown in Table 7. On the other hand, as previously mentioned, the linear and polynomial correlations considered by the linear and Kernel methods render their prediction to be affected by a high residual if the requested prediction is outside the trained ranges. Furthermore, the choice of normalizing all the parameters is also an important step for the application of the proposed procedure outside its training ranges.
Finally, an interesting feature relevant to the machine learning methods concerns the balance of the training dataset. In the considered research, the amount of data relevant for low FIOT is considerably higher than that for high FIOT and, for the case of multivariable regression methods, this fact would have resulted in good predictions for the former scenario and poor ones for the latter. In principle, the need for a balanced training dataset is also valid for the machine learning models but, for the case of the ensemble methods as well as the neural network, the sensitivity to data clustering is almost negligible, and they are therefore suitable for application in sparse and unbalanced data environments. Considering altogether the investigation proposed in this paper, the applicability of machine learning-based algorithms for the prediction of energy consumption in forming processes, measured in terms of force integral over time, has been largely explored, and both the results and analysis reported in this paper might be helpful for the extension of their application to additional industrially relevant processes.

Conclusions
The research presented in this paper highlighted the importance of considering the material, geometrical, and process parameters when estimating the forming force during the radial-axial ring rolling process. Moreover, eight different machine learning-based algorithms have been utilized for the prediction of the mandrel force integral over time (FIOT), showing that the Gradient Boosting (GB) algorithm, belonging to the ensemble methods, grants the best accuracy in the prediction of the FIOT, with a maximum residual equal to 9.03%. Since the validation has been carried out on previously published results where ring geometries, process conditions, and materials were not included in the training dataset, the proposed approach has proven its robustness in predicting the FIOT also outside the range of the training dataset. The trained GB algorithm can be directly applied to the radial-axial ring rolling (RARR) process through the algorithm provided as Supplementary Material and applies also to other forming and forging processes where the contact between workpiece and tools is defined by a curved line, as in the RARR process. The application of the proposed procedure allows a significant reduction in the time required for the estimation of the energy consumption during forming processes, its calculation being almost real-time, in comparison to the case of FEM simulations, where the computational time ranges between ~10 h (for 650 mm final outer diameter rings) and 3 days (for 2000 mm final outer diameter rings). The procedure presented in this paper can also be extended to different metal forming and forging processes by considering the influence of the same geometry, material, and process parameters on the energy consumption, but the creation of a new training dataset might be required. For these reasons, the research presented in this paper might be of interest to researchers and process engineers interested in energy consumption in metal forming processes.

Appendix B
The material properties of the Pb75-Sn25 alloy have been determined by carrying out compression tests at four different strain rates at room temperature and the relevant flow stress curves have been derived considering the model presented in Equation (A1). The model constants for the flow stress model of Equation (A1) are reported in Table A1. The laboratory-size ring rolling machine, utilized for the validation experiment, is reported in Figure A1a whereas the comparison between experimental and numerical flow stress curves is shown in Figure A1b.