A Comparative Assessment of Six Machine Learning Models for Prediction of Bending Force in Hot Strip Rolling Process

Abstract: In the hot strip rolling (HSR) process, accurate prediction of bending force can improve the control accuracy of the strip crown and flatness, and further improve the strip shape quality. In this paper, six machine learning models, including Artificial Neural Network (ANN), Support Vector Machine (SVM), Classification and Regression Tree (CART), Bagging Regression Tree (BRT), Least Absolute Shrinkage and Selection Operator (LASSO), and Gaussian Process Regression (GPR), were applied to predict the bending force in the HSR process. A comparative experiment was carried out on a real-life dataset, and the prediction performance of the six models was analyzed in terms of prediction accuracy, stability, and computational cost. Prediction performance was assessed using three evaluation metrics: root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²). The results show that GPR is the optimal model for bending force prediction, with the best prediction accuracy, better stability, and acceptable computational cost. The prediction accuracy and stability of CART and ANN are slightly lower than those of GPR. Although BRT also shows a good combination of prediction accuracy and computational cost, its stability is the worst of the six models. SVM not only has poor prediction accuracy but also has the highest computational cost, while LASSO shows the worst prediction accuracy.


Introduction
In recent years, with the development of hot strip rolling (HSR) technology, customers have continuously raised their requirements for strip variety, specifications, and strip shape quality. Good strip shape quality means that the strip has the desired crown and flatness, and it is an important factor in the market competitiveness of the strip. Strip shape quality has therefore become a research focus for many scholars [1,2].
Many factors affect the strip shape quality; they are mainly related to the roller, the strip, and the rolling conditions in the HSR process. Because the field environment of the rolling process is very complex, there is still no perfect solution to the strip shape quality problem. To improve the strip shape quality, most scholars have studied two aspects. The first is production equipment. Improving strip shape quality requires effective control of the roll crown, which can be achieved by replacing the work rolls with rolls of ultra-high strength and ultra-high hardness to reduce their flexural deformation. The second is rolling technology. To improve the precision of the preset model and compensate for the influence of external factors on strip shape measurement precision, various factors affecting the shape control model have been studied. For example, if the detection accuracy of the roller is too low during strip production, it directly impairs the adjustment ability of the strip shape control mechanism, so the strip shape quality cannot be improved [1,3].
Hydraulic roll bending control is one of the main methods to control the shape of hot rolled strip. The hydraulic roll bending system is more and more widely used in shape control of rolling mill because of its fast response and convenient real-time control. As shown in Figure 1, the principle of the hydraulic roller bending control system is that the bending force generated by the hydraulic cylinder is applied to the roller neck between the working roller and the supporting roller to change the deflection of the working roller instantaneously. Therefore, the shape of the gap of the load rollers is changed and the strip shape is controlled [4].

When the rolling process and equipment parameters change, the preset value of the bending force needs to be adjusted in time. As a result of the adjustment, the roll gap shape is consistent with the cross-sectional shape of the strip, so that the shape of the strip rolled by the mill can meet the requirements. In the production process, the setting value of the bending force needs to be adjusted constantly according to the requirements of the rolling process. Therefore, the bending force is usually calculated from rolling factors such as temperature, thickness, width, rolling force, material, thermal expansion of the rolls, roll wear, and so on, with the convexity and flatness of the strip as the targets. Because the rolling factors are multivariable, strongly coupled, nonlinear, and time-varying, the calculation model of the hot rolling bending force is extremely complicated [5,6].
The traditional mathematical model considers that all the rolling factors related to bending force have linear or approximate linear effects on bending force, and the coupling relationship between the rolling factors is weak in the model. Therefore, the mathematical model established according to the traditional theory is difficult to achieve the ideal prediction effect of bending force because of the limitations of its own structure [7].
Since the 1990s, artificial intelligence methods have been widely applied to rolling processes. Furthermore, Artificial Neural Networks (ANN) have been extensively studied and applied in the fields of mechanical property prediction [8][9][10], rolling force prediction [11][12][13][14], roughing mill temperature prediction [15], strip shape and crown prediction [16][17][18][19][20]. For the first time, Wang et al. [21] used the ANN model optimized by genetic algorithm to predict the bending force in the rolling process. The accuracy of the model is verified by actual factory data, which shows that the model can be flexibly used for on-line control and rolling schedule optimization.
These studies reveal that ANNs handle nonlinear data well and, owing to their good learning ability, have better predictive quality than traditional mathematical models. However, they also have some shortcomings, such as the unexplained nature of the relationships between the input and output parameters of the process, the need for longer computation times, and the tendency to over-fit, which leads to poor performance [22]. In recent years, besides ANN modeling, some new machine learning methods have emerged, such as Support Vector Machine (SVM), Classification and Regression Tree (CART), Bagging Regression Tree (BRT), Least Absolute Shrinkage and Selection Operator (LASSO), and Gaussian Process Regression (GPR). In prediction research in the rolling field, some scholars have recognized that better prediction results and prospects can be obtained by adopting new machine learning methods [21,23,24], but so far there are few literature reports on bending force prediction.
Therefore, this research is motivated to investigate the application of SVM, CART, BRT, LASSO, GPR and ANN on bending force prediction in hot rolling process, and to comprehensively analyze and evaluate the prediction performance of these models from prediction accuracy, stability and computational cost. Through the comprehensive evaluation results of these models, a prediction model of bending force with high prediction accuracy and good stability can be proposed, and finally the profile quality of strip can be improved.
Inspired by this motivation, this paper first presents the basic principles of the six models and then verifies the predictability of the bending force using these models on a real-life dataset from a 1580-mm hot rolling process in a steel factory. All the gauges used in this research were calibrated, and the measurement results are reliable and valid. The remainder of the paper is organized as follows. Section 2 briefly describes the HSR process, the factors influencing the bending force, and the acquisition and processing of the experimental data. Section 3 gives a brief literature review and the basic theory of the six machine learning models, together with the three evaluation metrics. Sections 4 and 5 report the experimental results and discussion, respectively. Section 6 draws the conclusions.

Hot Rolling Technology and Bending Force
Figure 2 shows the complete rolling process in a typical HSR line. The HSR process consists of 6 key parts: the reheating furnace, the roughing mill, the hot coil box and flying shear, the finishing mill, the laminar cooling, and the coiler. The key equipment of the production line is a finishing mill group composed of 8 stands, which determines the final shape of the strip. Each stand consists of a pair of work rolls and a pair of backup rolls. The spacing between the stands is 5.5 m. The whole line is equipped with work roll shifting and hydraulic roll bending systems to control flatness and plate crown.
A single batch consists of a coil of rough steel, which enters the reheating furnace to be reheated to the appropriate temperature. Next, the strip passes through the roughing mill, where its thickness and width are reduced to close to the desired values. Then, the strip enters the finishing mill section, where it is carefully rolled to the required width and thickness. The profile of the strip can be controlled by changing the bending forces between the two work rolls [25]. The strip thickness and flatness are measured in real time by an X-ray gauge at the end of the finishing stands, as shown in Figure 2. Measuring the final dimensions of the strip is vital for the mill controllers, which adjust mill parameters in real time with feedback from the gauge to minimize strip flatness deviation. Next, the strip is cooled by water to an appropriate final temperature. Finally, the strip is coiled and is ready for shipment.

Data Collection and Analysis
In this paper, the final stand rolling data of a 1580-mm HSR process in a steel factory are collected for experiments. The purpose of models in hot rolling is to predict strip characteristics prior to rolling the strip based on information about the mill. For the proposed prediction model of bending force, the input variables are entrance temperature (°C), entrance thickness (mm), exit thickness (mm), strip width (mm), rolling force (kN), rolling speed (m/s), roll shifting (mm), yield strength (MPa), and target profile (µm). The output variable of the model is the bending force (kN). The information on the detection equipment for these parameters is shown in Table 1. To ensure the validity of the parameters, all the gauges used in this research were calibrated.
The fractal dimension visualization diagram of the collected dataset is shown in Figure 3. The input data clearly vary considerably across dimensions. Table 2 shows the data distribution of each input variable. To eliminate the differences in magnitude between data of different dimensions, to avoid the increase in prediction error caused by large differences between input and output data, and to make it convenient to update the weights and biases during modeling, the data must be scaled proportionally to a small interval. Normalization is therefore required before the data enter the model [26]. The following formula is used to normalize the data:
$$y_i' = \frac{y_i - y_{\min}}{y_{\max} - y_{\min}} \qquad (1)$$
where $y_i'$, $y_i$, $y_{\min}$, and $y_{\max}$ are the normalized data, original data, minimal data, and maximal data, respectively.
In the present study, the K-fold cross validation method was used, and the 1440 pairs of measured bending force data were divided into five subsets. Four subsets were employed to train the machine learning models, and the remaining one was used to test them. Furthermore, the measurement data should be processed with z-score normalization to the same scale to reduce the impact of different magnitudes and dimensions.
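The scaling and data-splitting steps described above can be illustrated with a minimal Python sketch. It is not the authors' Matlab code: the function names, the random shuffling, the seed, and the sample temperatures are assumptions made for illustration, and only the min-max scaling, the five folds, and the 1440 samples come from the text.

```python
import random

def min_max_normalize(values):
    """Scale a list of numbers to [0, 1] using min-max normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffling is an assumption
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# Hypothetical entrance temperatures in degrees Celsius (not from the paper).
temps = [900.0, 950.0, 1000.0]
scaled = min_max_normalize(temps)

# 1440 samples split into five folds: four for training, one for testing.
splits = list(k_fold_indices(1440, k=5))
```

With 1440 samples and five folds, each split holds 1152 training and 288 testing samples. In practice the minimum and maximum would usually be taken from the training folds only, so that no information from the test fold leaks into the scaling.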

Artificial Neural Network (ANN)
ANNs are complex computational models inspired by the human nervous system that are capable of machine learning and pattern recognition. ANNs include a wide range of learning algorithms developed in statistics and artificial intelligence, and they use an analogy with biological neurons to generate general solutions to a problem. Because ANNs are nonlinear classification techniques composed of an interconnected group of artificial neurons, they can learn complex relationships between input and output variables [17]. ANN is the earliest prediction model applied in the rolling field, with applications including mechanical property prediction, rolling force prediction, roughing mill temperature prediction, and flatness and crown prediction [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21]. ANN is therefore the most widely applied model and serves as the basic model for comparison.

Support Vector Machine (SVM)
Support Vector Machine (SVM) is a supervised learning method developed from statistical learning theory for data analysis and pattern recognition, and it can be used for both classification and regression [27]. SVM as a regression technique (SVR) is a nonlinear algorithm. Its basic principle is to map the data to a high-dimensional feature space using a nonlinear mapping, construct the regression estimation function in that space, and then map back to the original space; this nonlinear transformation is achieved by defining an appropriate kernel function. Many machine learning algorithms follow the principle of empirical error minimization, whereas SVR follows the principle of structural risk minimization, so it can obtain better generalization performance [28]. SVRs are prominent in research and practice because they use linear optimization techniques to find optimal solutions to nonlinear prediction problems in higher-dimensional feature spaces. SVR has therefore been widely employed for regression and forecasting in the fields of agriculture, hydrology, the environment, and metallurgy [29][30][31]. This encourages us to apply SVR to prediction in the HSR process.
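The role of the kernel function can be sketched in a few lines of Python. This is an illustrative assumption rather than the paper's implementation: the Gaussian (RBF) kernel is written as exp(-||x - z||² / (2σ²)), which is one common parameterization, and the helper names `rbf_kernel` and `svr_predict` are hypothetical.

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian (RBF) kernel between two equal-length feature vectors.

    Assumed form: exp(-||x - z||^2 / (2 * sigma^2)).
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svr_predict(x, support_vectors, dual_coefs, bias, sigma):
    """Evaluate f(x) = bias + sum_i coef_i * k(sv_i, x) for a trained SVR."""
    return bias + sum(c * rbf_kernel(sv, x, sigma)
                      for sv, c in zip(support_vectors, dual_coefs))
```

The prediction is linear in the kernel values, even though it is nonlinear in the original inputs, which is what allows a linear method to fit a nonlinear relationship.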

Classification and Regression Tree (CART)
Decision trees (DT) are an important class of machine learning algorithms. The classification and regression tree methodology, also known as CART, was introduced in 1984 by Breiman et al. [32]. CART has low computational complexity because of its recursive computation. To predict a response, one follows the decisions in the tree from the root (beginning) node down to a leaf node, which contains the response. CART is non-parametric and can find complex relationships between input and output variables. It therefore also has the advantage of discovering nonlinear structures and variable interactions in the training samples [33]. The regression tree is a data mining algorithm widely used in regression problems in biology [34], the environment [35], and material processing [36]. We selected CART to predict the bending force because it is an explanatory technique that is able to reveal data structure, identify important characteristics, and develop rules.
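The root-to-leaf prediction described above can be sketched as follows. The tree below is hand-written for illustration only: the feature names, thresholds, and leaf values are hypothetical and are not fitted to the paper's data.

```python
# Each internal node holds a split; each leaf holds the predicted response.
# Feature names, thresholds, and leaf values are hypothetical.
tree = {
    "feature": "rolling_force_kN", "threshold": 15000.0,
    "left": {"value": 620.0},
    "right": {
        "feature": "strip_width_mm", "threshold": 1250.0,
        "left": {"value": 700.0},
        "right": {"value": 780.0},
    },
}

def predict(node, sample):
    """Follow the decisions from the root down to a leaf and return its value."""
    while "value" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]

sample = {"rolling_force_kN": 16000.0, "strip_width_mm": 1300.0}
prediction = predict(tree, sample)  # takes the right branch twice
```

Because every prediction is a short chain of threshold comparisons, the path through the tree can be read directly as a rule, which is the interpretability advantage mentioned in the text.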

Bagging Regression Tree (BRT)
Bagging (short for bootstrap aggregating) is a simple and very powerful ensemble method. It is one of the simplest techniques that can reduce variance when combined with a base learner, and it often performs surprisingly well [37]. Bagging Regression Tree (BRT) is the application of the bootstrap procedure to regression trees (RT). The basic idea underlying BRT is the recognition that part of the output error of a single regression tree is due to the specific choice of the training dataset. Therefore, if several similar datasets are created by resampling with replacement (that is, bootstrapping) and regression trees are grown on them without pruning and then averaged, the variance component of the output error is reduced [38,39]. BRT has been widely used in the fields of biostatistics [40], remote sensing [41], and material processing [42] because of its flexibility and interpretability in high-order nonlinear modeling. It is therefore reasonable to compare and evaluate BRT as one of the candidate models.
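The bootstrap-and-average idea can be sketched in Python as below. To keep the example short, the base learner is a one-split regression tree (a stump) on a single input, which is a simplification assumed here; the paper grows full unpruned trees with Matlab's fitensemble. The data, the number of trees, and the seed are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree (stump) on 1-D data by least squares."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds; both sides stay non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all inputs identical: predict the mean everywhere
        m = sum(ys) / len(ys)
        return lambda x: m
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Train n_trees stumps on bootstrap resamples and average their outputs."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)

# Hypothetical 1-D data with a step between x = 3 and x = 4.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [10.0, 11.0, 10.5, 20.0, 21.0, 20.5]
model = fit_bagged(xs, ys)
```

Each stump sees a slightly different resample, so its split and leaf means vary; averaging many of them smooths out that variation, which is the variance reduction the text refers to.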

Least Absolute Shrinkage and Selection Operator (LASSO)
LASSO stands for Least Absolute Shrinkage and Selection Operator [43]. The idea behind the LASSO algorithm is to minimize the residual sum of squares subject to the constraint that the sum of the absolute values of the coefficients is less than a given constant. LASSO technique has been successfully developed in recent years, combining shrinkage and highly correlated variables. LASSO regression is characterized by variable selection and regularization while fitting the generalized linear model. Regularization controls the complexity of the model through a series of parameters so as to avoid over-fitting. LASSO has been widely used in temperature prediction [44], wavelength analysis [45], and streamflow prediction [46]. In view of its wide application in industry, LASSO is also taken as one of the research models in this paper.
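How the absolute-value penalty produces both shrinkage and variable selection can be seen in a small coordinate descent sketch. This is an illustration under stated assumptions, not Matlab's `lasso` routine: it uses the penalized form (1/2n)·||y − Xw||² + λ·||w||₁, which is equivalent to the constrained form in the text, omits the intercept, and runs on hypothetical data.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: leave feature j out of the current fit.
            r = [y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w

# Hypothetical data: y depends strongly on feature 0 and weakly on feature 1.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]
w = lasso_coordinate_descent(X, y, lam=0.5)
```

In this example the strong coefficient is shrunk from 2.0 to 1.5, while the weak one is set exactly to zero, so the second variable is dropped from the model.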

Gaussian Process Regression (GPR)
Gaussian Process Regression (GPR) is a machine learning regression method developed in recent years; it is a non-parametric model based on the Bayesian framework. The GPR algorithm can adaptively determine the number of model parameters according to the information in the training samples, incorporate prior knowledge of the object being modeled, and then combine it with the actual experimental data to obtain the posterior Gaussian process model [47]. When applied to practical problems, GPR gives a confidence interval together with the predicted mean value, which makes the prediction result more credible. In addition, because GPR can quantitatively model Gaussian noise, it has excellent prediction accuracy [48,49]. Because of its good predictive ability, GPR has been widely used in data-driven modeling of various industrial problems [50][51][52][53], so GPR is also one of the candidate models in this paper.
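The way GPR returns a mean together with a confidence interval can be sketched as follows. This is a minimal illustration, not the fitrgp implementation used in the paper: it assumes a zero-mean prior, a squared-exponential kernel with fixed hyperparameters (no hyperparameter optimization), scalar inputs, and hypothetical data.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_predict(x_train, y_train, x_new, noise_var=0.1):
    """Return the posterior mean and variance of a zero-mean GP at x_new."""
    n = len(x_train)
    # Covariance of the noisy training outputs: K + noise_var * I.
    K = [[rbf(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(x, x_new) for x in x_train]
    alpha = solve(K, y_train)  # (K + noise*I)^-1 y
    v = solve(K, k_star)       # (K + noise*I)^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, var

# One hypothetical observation y = 1 at x = 0, with unit noise variance.
mean, var = gp_predict([0.0], [1.0], 0.0, noise_var=1.0)
# Approximate 95% interval for the underlying (noise-free) function value.
lower = mean - 1.96 * math.sqrt(var)
upper = mean + 1.96 * math.sqrt(var)
```

With a single noisy observation, the posterior mean moves only halfway from the prior mean (0) toward the observed value (1), and the variance drops from 1 to 0.5. The noise term is what keeps the model from interpolating the data exactly.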

Model Information
All models were implemented in Matlab (Version 2015b, MathWorks, Natick, MA, USA) on a computer with an Intel Core i7-7500U CPU at 2.7 GHz and 8 GB of RAM. CART, BRT, LASSO, and GPR were carried out with the treefit, fitensemble, lasso, and fitrgp functions, respectively, which are included in Matlab's Statistics and Machine Learning Toolbox. The parameters of these models were automatically optimized by the Matlab functions according to the training dataset. For SVR, the parameter C (the trade-off between the minimization of errors and the smoothness of the solution) and the parameter σ (the width of the RBF kernel function) must be determined during model establishment. To reveal the effect of C and σ on the prediction results in the training dataset, ten logarithmically equally spaced points were generated between 1 and 1000 for C, and twenty logarithmically equally spaced points were generated between 10 and 1000 for σ. The optimized C and σ were determined to be 100 and 69.5, respectively. For ANN, the transfer functions of the hidden layer and output layer are "tansig" and "purelin", respectively. "trainlm" (Levenberg-Marquardt) was chosen as the learning algorithm. The performance of neural networks with a single hidden layer of 2-30 neurons was investigated, and the ANN performed best with 6 hidden neurons.
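The search grid for C and σ described above can be reproduced with a short Python sketch. The helper name `logspace` and the use of a full Cartesian grid are assumptions made for illustration; the paper gives only the number of points and the ranges.

```python
import math
from itertools import product

def logspace(start, stop, num):
    """Return num points spaced evenly on a log10 scale from start to stop."""
    lo, hi = math.log10(start), math.log10(stop)
    step = (hi - lo) / (num - 1)
    return [10 ** (lo + i * step) for i in range(num)]

c_grid = logspace(1, 1000, 10)       # 10 candidate values for C
sigma_grid = logspace(10, 1000, 20)  # 20 candidate values for sigma

# Assumed: every (C, sigma) pair is evaluated, giving 200 candidates.
candidates = list(product(c_grid, sigma_grid))
```

Under this reading, the reported optimum falls on grid points: the seventh value of the C grid is 100, and the ninth value of the σ grid is approximately 69.5.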

Comparison of Model and Statistical Error Analysis
The accuracy and performance of the studied models for bending force prediction were evaluated and compared using three commonly used statistical metrics: root mean square error (RMSE, Equation (2)), mean absolute error (MAE, Equation (3)), and coefficient of determination (R², Equation (4)). The mathematical equations of the statistical indicators are described below.
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - y_i^*\right)^2} \qquad (2)$$
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - y_i^*\right| \qquad (3)$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - y_i^*\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2} \qquad (4)$$
where $y_i$ and $y_i^*$ are the measured and predicted values, respectively, $\bar{y}$ is the mean of the measured values, and N is the total number of predicted data. Higher values of R² are preferred: a value closer to 1 means better model performance and a regression line that fits the data well. Conversely, the lower the RMSE and MAE values, the better the model performs.
Table 3 shows the results of the ANN, SVM, CART, BRT, LASSO, and GPR models over 30 trials on the training and testing datasets. The predicted bending force varies considerably depending on the model selected. On the training dataset, whichever evaluation metric is used, BRT shows the best prediction performance, with the highest R² and the lowest RMSE and MAE. On the testing dataset, GPR shows the best prediction performance, with the highest R² and the lowest RMSE and MAE, followed by BRT. In contrast, LASSO has the worst prediction accuracy on both the training and the testing datasets.
Figure 4 shows the accuracy ranking of the six models on the training and testing datasets. The prediction accuracy of each model can be read from the ordinate, and the rankings are shown as numbers above the color bars. As can be seen from Figure 4, the accuracy rankings differ slightly between the training and testing datasets. On the training dataset, the ranking is the same under all three evaluation metrics: BRT, GPR, CART, ANN, SVM, LASSO. On the testing dataset, the ranking changes slightly and also depends on the evaluation metric. From best to worst, the ranking under RMSE and R² is GPR, BRT, CART, ANN, SVM, LASSO, and the ranking under MAE is GPR, CART, BRT, ANN, SVM, LASSO. Based on the overall performance on the two datasets, the GPR model shows the best prediction accuracy and the LASSO model the worst.
The scatter plots of the bending force values measured in the factory against the values predicted by the six machine learning models on the training and testing datasets are presented in Figure 5. Scattered points of different colors represent the values predicted by different models. On both datasets, the predicted values of the BRT and GPR models are closely distributed on both sides of the straight line y = x. This shows that the bending forces predicted by these two models correlate better with the measured bending force, and that the two models are superior to the other four for bending force prediction. In contrast, LASSO has the worst prediction accuracy, with the predicted values most widely scattered around the line y = x, indicating that they differ considerably from the measured values. The maximum error of all other data points predicted by the six models is within 10%. Therefore, these six models have achieved good prediction performance.
Figure 6 shows the measured bending force and the bending force predicted by the six models on the training and testing datasets, with the prediction errors plotted below. BRT clearly has the best prediction performance on the training dataset, with a maximum positive error of 23.60 kN and a maximum negative error of −25.28 kN. On the testing dataset, GPR has the best prediction performance, with a maximum positive error of 31.84 kN and a maximum negative error of −26.50 kN. The errors of BRT and GPR are more concentrated around 0 kN, which means that fewer samples have large errors. In contrast, LASSO performs worst of the six models on both datasets. On the training dataset, its maximum positive error is 41.07 kN and its maximum negative error is −44.56 kN. On the testing dataset, its maximum positive error is 42.99 kN and its maximum negative error is −27.04 kN.
Figure 7 shows the histograms and distribution curves of the errors. All the error distribution curves have the bell shape of a normal distribution, which indicates that the prediction errors of all models are approximately normally distributed. On both the training and the testing datasets, the GPR, BRT, and CART models perform relatively well, and their normal distribution curves are higher and narrower, which indicates that more predictions with smaller errors are obtained. In addition, the dataset centers (the highest point of the normal distribution curve) of most models are close to zero error. The dataset center represents the average value of the errors, so this indicates that the probabilities of positive and negative errors are almost equal. However, the normal distribution curve of LASSO shifts clearly to the right on the testing dataset, which indicates that the LASSO model produces many more positive errors than negative ones and that its predicted values are higher.
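For readers who want to reproduce the accuracy comparison, the three metrics used in this section (RMSE, MAE, and R²) can be computed with the short Python sketch below. The standard definitions are assumed, with R² taken as one minus the ratio of the residual sum of squares to the total sum of squares, and the measured and predicted values are hypothetical rather than taken from Table 3.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical bending forces in kN (not from the paper's dataset).
measured = [600.0, 650.0, 700.0, 750.0]
predicted = [610.0, 640.0, 705.0, 745.0]

scores = {
    "RMSE": rmse(measured, predicted),
    "MAE": mae(measured, predicted),
    "R2": r2(measured, predicted),
}
```

RMSE penalizes large errors more heavily than MAE because the errors are squared before averaging, which is one reason the two metrics can rank models differently, as observed for BRT and CART on the testing dataset.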

Stability of Various Models
The prediction accuracy results show that, for all models, the prediction accuracy in the testing dataset is lower than that in the training dataset. In addition, in the training dataset the prediction accuracy of the BRT model is better than that of GPR, whereas in the testing dataset the prediction accuracy of GPR is better than that of BRT (shown in Figure 4). The difference in prediction accuracy between the training and testing datasets can be regarded as the stability of the model. The stability of a machine learning model is also an important factor affecting prediction performance and should be taken into account when evaluating the reliability of the predicted results. Here, the stability of a model is measured by the relative change percentage of its evaluation metrics (RMSE, MAE, and R²) between the training and testing datasets [31]. The smaller the relative change percentage, the higher the stability of the model. The relative change percentage of an evaluation metric between the two datasets is denoted δ_{i,N}, where i represents the evaluation metric (RMSE, MAE, or R²) and N represents one of the six models.
Figure 8 shows δ_{i,N} for the three evaluation metrics of the six models over 30 trials. The stability rankings of the ANN, SVR, CART, BRT, LASSO, and GPR models differ slightly depending on the evaluation metric. For RMSE and R², SVR shows the most stable performance, with the lowest δ_{i,N} values of 2.33% and 0.35%, respectively. For MAE, LASSO shows the most stable performance, with the lowest δ_{i,N} value of 2.33%. However, whichever evaluation metric is used, BRT shows the most unstable performance, with the highest δ_{i,N}: 56.68%, 58.93%, and 2.05% for RMSE, MAE, and R², respectively. This instability means that the prediction accuracy of BRT drops significantly when new input data are used. This is because the BRT model has a large number of hyper-parameters, which need to be carefully optimized for model application [31].
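As a sketch of how the relative change percentage could be computed, the snippet below assumes δ_{i,N} = |i_test − i_train| / |i_train| × 100%, that is, the change is taken relative to the training value. The choice of the training value as the reference, the function name `relative_change_percent`, and the sample metric values are assumptions for illustration only.

```python
def relative_change_percent(train_value, test_value):
    """Relative change (%) of one evaluation metric between the two datasets.

    Assumption: the training value is used as the reference (denominator).
    """
    if train_value == 0:
        raise ValueError("training value must be non-zero")
    return abs(test_value - train_value) / abs(train_value) * 100.0

# Hypothetical metric values for a single model (not taken from the paper).
train_metrics = {"RMSE": 8.0, "MAE": 6.0, "R2": 0.98}
test_metrics = {"RMSE": 10.0, "MAE": 6.3, "R2": 0.96}

# One delta value per evaluation metric; smaller means more stable.
delta = {
    name: relative_change_percent(train_metrics[name], test_metrics[name])
    for name in train_metrics
}
for name, value in delta.items():
    print(f"{name}: {value:.2f}%")
```

Because R² is bounded and usually close to 1 for well-fitted models, its relative change tends to be much smaller than that of RMSE or MAE, which is consistent with the differing magnitudes reported for the three metrics.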
Metals 2020, 10, x FOR PEER REVIEW 13 of 18
The distributions of the three evaluation metrics obtained from the six machine learning models over 30 trials in the training and testing datasets are illustrated in Figure 9 using boxplots, which show the spread of the prediction accuracy through its quartiles. In the training dataset, suspected outliers of the prediction accuracy appear only in the ANN and CART models. In the testing dataset, suspected outliers appear in the ANN, SVM, CART, and BRT models. At the same time, the quartile distance in the testing dataset is larger than that in the training dataset, which indicates that the dispersion of the prediction accuracy has increased. Although Figure 8 shows that both the SVM and LASSO models have the most stable performance, the change in quartile distance between the two datasets is most pronounced for the LASSO model. Taken together, SVM can be considered the most stable model.
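The boxplot quantities used above can be sketched as follows. The code assumes the common convention that a suspected outlier lies more than 1.5 times the quartile distance (interquartile range) beyond the first or third quartile; the text does not state which rule Figure 9 uses. The function name `boxplot_summary` and the sample RMSE values are hypothetical, and only five trials per dataset are shown instead of 30 to keep the example short.

```python
import numpy as np

def boxplot_summary(values, whisker=1.5):
    """Quartiles, quartile distance, and suspected outliers of one metric."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1  # quartile distance
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    # Points outside the whisker limits are flagged as suspected outliers.
    outliers = values[(values < lower) | (values > upper)]
    return {
        "q1": float(q1),
        "q3": float(q3),
        "iqr": float(iqr),
        "outliers": outliers.tolist(),
    }

# Hypothetical RMSE values (kN) from repeated trials of one model.
train_rmse = [8.0, 8.1, 8.2, 8.3, 8.4]
test_rmse = [9.0, 9.5, 10.0, 10.5, 15.0]

train_summary = boxplot_summary(train_rmse)
test_summary = boxplot_summary(test_rmse)
print(train_summary)
print(test_summary)
```

Comparing the `iqr` entries of the two summaries corresponds to the comparison of quartile distances between the training and testing datasets made in the text.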
Table 4 and Figure 10 show the computational cost (time used for computation) of the six machine learning models. CART and LASSO have the lowest computational costs, 0.59 s and 0.35 s, respectively. BRT and ANN also have small computational costs, 2.11 s and 9.35 s, respectively. Compared with these four models, the computational cost of GPR is considerably higher at 63.25 s. The computational cost of SVM is the highest, reaching 305.09 s.
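The reported computation times can be ordered and grouped with a short sketch. The times are the values quoted in the text. The 10 s boundary reflects the statement in the Conclusions that four models finish within ten seconds, while the 100 s boundary between the middle and highest level is an assumption chosen only to separate GPR from SVM. The function name `cost_level` and the level labels are hypothetical.

```python
# Computational cost in seconds, as quoted in the text for the six models.
cost_seconds = {
    "ANN": 9.35,
    "SVM": 305.09,
    "CART": 0.59,
    "BRT": 2.11,
    "LASSO": 0.35,
    "GPR": 63.25,
}

def cost_level(seconds, low_limit=10.0, high_limit=100.0):
    """Assign a cost level; high_limit is an illustrative assumption."""
    if seconds <= low_limit:
        return "low"
    if seconds <= high_limit:
        return "medium"
    return "high"

# Models ordered from cheapest to most expensive.
ranking = sorted(cost_seconds, key=cost_seconds.get)
levels = {model: cost_level(cost_seconds[model]) for model in ranking}

for model in ranking:
    print(f"{model:5s} {cost_seconds[model]:7.2f} s  {levels[model]}")
```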

Comprehensive Evaluation of Various Models
Based on the above results, the six machine learning models were comprehensively evaluated in terms of prediction accuracy, stability, and computational cost, and the results are shown in Figure 11. Note that the prediction accuracy here refers to the testing dataset. Figure 11 shows that GPR provides the best combination of prediction accuracy, stability, and computational cost. The prediction accuracy of the BRT, CART, and ANN models is slightly worse than that of GPR. These three models differ little in prediction accuracy and computational cost, but BRT has the worst stability. In addition, SVM does not perform well in terms of prediction accuracy or computational cost. LASSO has good stability and low computational cost, but its prediction accuracy is the worst.

Discussion
With the development of technology, the equipment used in steel production today maintains a stable operating state, so the rolling process is also carried out stably. Therefore, for a specific strip specification, most rolling processes yield relatively stable datasets without large variability. Such stable data better reflect the normal rolling process, and the key issue in rolling parameter prediction then becomes how to improve the prediction accuracy. The purpose of this paper is to compare the prediction accuracy of different models when they are applied to bending force prediction. Therefore, to reflect the essential characteristics of each model and obtain a fair comparison, the machine learning models used in this paper were not further optimized by combining them with intelligent optimization algorithms (such as Genetic Algorithm (GA), Particle Swarm Optimization (PSO), etc.). We believe that this paper can serve as a reference for selecting machine learning models in other similar prediction applications and research, and that selecting basic original models and parameters is more practical.

Conclusions
In this paper, we applied six machine learning models, including ANN, SVR, CART, BRT, LASSO, and GPR, to predict the bending force in the HSR process. A comparative experiment was carried out based on a real-life dataset, and the prediction performance of the six models was analyzed in terms of prediction accuracy, stability, and computational cost. All the gauges used in this research were calibrated to ensure the validity of the data and the reliability of the results. The prediction performance of the six models was assessed using three evaluation metrics: RMSE, MAE, and R².
(1) The comparison of prediction accuracy shows that the accuracy rankings in the testing dataset differ slightly under the three evaluation metrics. Considered overall, however, GPR performs best, followed by BRT, CART, ANN, SVM, and LASSO. The bending force measured by experiment is 690~890 kN, while the prediction error of GPR is only 8.51 kN (RMSE) and 6.61 kN (MAE).
(2) The stability rankings are inconsistent across the three evaluation metrics. Considered overall, however, SVM shows the most stable performance, with δ_{i,N} values of 2.33% (RMSE), 0.32% (MAE), and 0.35% (R²). Stability then decreases in the order LASSO, ANN, GPR, CART, and BRT. BRT shows the most unstable performance, with δ_{i,N} values of 56.68% (RMSE), 58.93% (MAE), and 2.05% (R²).
(3) The computational costs of the six models fall into three levels. The computational costs of LASSO, CART, BRT, and ANN increase in that order, but all are within ten seconds. The computational cost of the GPR model is higher, at about 63 s. The computational cost of SVM exceeds 300 s.
(4) Considering prediction accuracy, stability, and computational cost together, GPR can be considered the most promising machine learning model for predicting bending force. The prediction accuracy and stability of CART and ANN are slightly lower than those of GPR, but their computational cost is relatively small, so they can also be used as alternatives. In addition, BRT shows a good combination of prediction accuracy and computational cost, but its stability is the worst among the six models. SVM not only performs poorly in prediction accuracy but also has the greatest computational cost. LASSO has good stability and a small computational cost, but it has the worst prediction accuracy.