Controlled Cooling Temperature Prediction of Hot-Rolled Steel Plate Based on Multi-Scale Convolutional Neural Network

Abstract: Controlled cooling technology is widely used in hot-rolled steel plate production lines. The final cooling temperature directly affects the microstructure and properties of steel plates, but cooling and heat transfer constitute a nonlinear process that is difficult to describe accurately with a mathematical model. In order to improve the accuracy of the controlled cooling temperature, a multi-scale convolutional neural network is used to predict the final cooling temperature. Convolution kernels of different sizes are introduced in each layer of the multi-scale convolutional neural network. This structure can simultaneously extract feature information at different scales and improve the perceptual power of the network model. The measured steel plate thickness, speed, header flow, and other variables are taken as the input, and the final cooling temperature is taken as the output and predicted using the multi-scale convolutional neural network. The results show that the multi-scale convolutional neural network prediction model has strong generalization and nonlinear fitting ability. Compared with the traditionally structured BP neural network and convolutional neural network (CNN), the mean square error (MSE) of the multi-scale convolutional neural network decreased by 24.7% and 12.2%, the mean absolute error (MAE) decreased by 19.6% and 7.97%, and the coefficient of determination (R²) improved by 4.26% and 2.65%, respectively. The final cooling temperature predicted by the multi-scale CNN agreed with the actual temperature within ±10% error bands. The improved prediction accuracy provides a reference for the predictive control of the final cooling temperature.


Introduction
With the development of the steel industry, low-alloy, high-strength, high-toughness, and high-weldability steels have been widely used in the shipbuilding, automobile-manufacturing, and construction industries, among others. The requirements for different varieties, quality, and performance of hot-rolled steel plates are constantly increasing. An important index for measuring the quality of steel plates is the cooling temperature control precision. How to improve this precision has become a research hotspot in the steel industry [1].
Controlled cooling technology can be used to give steel plates better microstructural properties [2]. The precipitation behavior of carbides and the phase transformation can be controlled by changing the cooling conditions of steel plates with water flow [3].
In the steelmaking process, converting refined molten steel into billets is known as continuous casting, during which the molten steel solidifies through continued cooling. In the subsequent process, the billet is reheated and maintained at a specific temperature in the reheating furnace. Once discharged from the reheating furnace, the billet undergoes deformation in the roughing mill and the finishing mill, where the geometry of the steel plate is tailored to meet the final specifications. Following this step, the steel plates are cooled to the specified temperature using accelerated controlled cooling (ACC). It is worth mentioning that accelerated cooling control devices play an important role in controlling the cooling temperature and cooling rate of steel plates. This controlled cooling process has a profound effect on the microstructure and mechanical properties of steel plates [4]. Finally, the steel plates are geometrically adjusted in the leveling machine (leveler) before entering the heat treatment furnace [5]. Figure 1 shows a schematic diagram of the plate production process described above. In order to ensure the quality and correct shape of steel plates, the control accuracy of the cooling temperature is particularly important. However, the cooling process is accompanied by complex heat exchange, a nonlinear process with strong coupling and multi-variable characteristics. In actual production, correcting the final cooling temperature using an adaptive function and a prior model performs poorly on thick-gauge steel plates [6].
Many researchers have studied the heat transfer coefficient, which is the core process parameter in controlled cooling. Wang et al. [7] measured the heat transfer coefficient of hot steel in their experiments. Ma et al. [8] studied the heat transfer coefficient of supercritical water in a nuclear reactor based on a neural network. Olivia et al. [9] used neural network analysis to predict the heat transfer coefficient of air/water spray cooling in the forging process. Zheng et al. [10] developed an online modeling method for ACC to calculate the heat transfer coefficient. Although these studies improved the prediction accuracy of the final cooling temperature, the prediction error still required nonlinear compensation.
With the increase in cooling data and computing power in the production process, the advantages of deep learning models have become more prominent [11]. Among deep learning algorithms, convolutional neural networks have achieved a wide range of applications in target detection [12], semantic segmentation [13], natural language processing [14], bioinformatics [15], and other fields. Due to their strong noise-resistant feature extraction ability and complex function expression, deep learning algorithms are especially suitable for complex nonlinear processes and perform stably when the training data are sufficient.
As mentioned, neural networks are widely used in steel processing. Wang et al. [16] used a neural network to predict the cooling temperature, which was a useful approach to final cooling temperature prediction. Lim et al. [17] applied a backpropagation artificial neural network (ANN) to capture the nonlinear tendency of the specific heat during the accelerated controlled cooling process. Ai et al. [18] developed a microstructure prediction model based on controlled rolling and cooling process parameters using an artificial neural network. Bhutada et al. [19] used a convolutional neural network model to correlate microstructures with various components and measures of stress. Artificial neural networks have also been applied elsewhere in intelligent manufacturing, such as flow stress for rolling force [20], tensile performance evaluation [21], strip flatness prediction in the tandem cold rolling process [22], mechanical cooling systems [23], analyzing microstructures [24], heat transfer prediction of supercritical water [25], optimization of process parameters in feed manufacturing [26], and calculating heat flux in nuclear engineering [27].
The convolutional neural network is a feedforward neural network that gains depth by stacking multiple layers of convolutional computation and is a typical deep learning algorithm [28]. Convolutional neural networks use convolutional layers to extract feature information at different levels, use pooling layers to reduce dimensionality, and finally fuse low-level features into high-level features [29]. Unlike traditional BP neural networks, convolutional neural networks can automatically extract features and use weight sharing to significantly reduce network complexity [30].
A controlled cooling temperature prediction model based on a multi-scale convolutional neural network is proposed in this paper. The model avoids the complicated theoretical calculation of the heat exchange coefficient and forecasts the final cooling temperature through the neural network. The accuracy of prediction is thereby improved, which has a certain theoretical relevance for applying convolutional neural network prediction models in practical industrial production.

Convolutional Layer
Convolutional neural networks are usually composed of several convolutional layers, each of which contains several convolutional kernels. The advantage of the convolutional layer is that it adopts local connectivity and weight-sharing operations to reduce the number of computational parameters. Thus, the computational efficiency can be improved effectively.
The convolutional layer parameters are updated via back-propagation over many training iterations. The steps of the convolution operation are as follows: First, the input is multiplied element-wise by the corresponding positions of the convolution kernel and summed to obtain the characteristic information of the input data. Then, the result is transmitted to the next convolutional layer, and the convolution operation is repeated continuously to achieve feature extraction [31]. The convolution formula is shown in Equation (1):

$$Y_j^l = f\left( \sum_{i \in M_j} X_i^{l-1} * W_{ij}^l + B_j^l \right) \tag{1}$$

where $Y_j^l$ is the $j$-th output of the layer-$l$ convolution feature, $f$ is the activation function, $M_j$ is the set of inputs from the previous layer connected to output $j$, $X_i^{l-1}$ is the $i$-th input from the previous layer, $W_{ij}^l$ is the weight matrix, and $B_j^l$ is the bias term.
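The convolution operation in Equation (1) can be sketched in plain NumPy (an illustrative implementation, not the authors' code; the ReLU activation, valid padding, and stride 1 are assumptions):

```python
import numpy as np

def conv1d_layer(x, W, b, f=lambda z: np.maximum(z, 0.0)):
    """One 1-D convolutional layer following Eq. (1):
    Y_j = f(sum_i X_i * W_ij + B_j), valid padding, stride 1."""
    n_in, length = x.shape        # input channels x sequence length
    n_out, _, k = W.shape         # output channels x input channels x kernel size
    out_len = length - k + 1
    y = np.zeros((n_out, out_len))
    for j in range(n_out):
        for t in range(out_len):
            # element-wise product of the window with kernel j, then sum
            y[j, t] = np.sum(x[:, t:t + k] * W[j]) + b[j]
    return f(y)
```

A 1 × 5 input convolved with a 1 × 3 kernel yields a 1 × 3 feature map, matching the valid-padding length formula used later in the network design.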

Pooling Layer
After the convolutional layer, the feature information usually has a very high dimensionality. Passing it on directly would increase the computational burden, so it is sent to the pooling layer for feature dimensionality reduction. Different pooling functions can be set at the pooling layer to sample the obtained feature map data block by block and reduce the number of parameters, alleviating over-fitting. The pooling layer can change the output size by modifying the pooling window size, stride, and padding. Common pooling methods include maximum pooling and mean pooling. The pooling output formula is shown in Equation (2):

$$Y_j^l = f\left( \beta_j^l \, \mathrm{pooling}\left( X_i^{l-1} \right) + B_j^l \right) \tag{2}$$

where $Y_j^l$ is the $j$-th output of the layer-$l$ convolution feature, $f$ is the activation function, $X_i^{l-1}$ is the $i$-th input from the previous layer's convolution feature, $\mathrm{pooling}(\cdot)$ is the pooling operation, $\beta_j^l$ is the pooling factor, and $B_j^l$ is the bias term.
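Maximum pooling, the variant used in the model below, can be sketched as follows (a minimal illustration; the 1 × 2 window with stride 2 matches the downsampling layers described later):

```python
import numpy as np

def max_pool1d(x, size=2, stride=2):
    """1-D max pooling: downsample each channel by taking the
    maximum over each window (the max-pooling case of Eq. (2))."""
    n_ch, length = x.shape
    out_len = (length - size) // stride + 1
    y = np.zeros((n_ch, out_len))
    for t in range(out_len):
        y[:, t] = x[:, t * stride:t * stride + size].max(axis=1)
    return y
```

For example, pooling the sequence [1, 3, 2, 5] with a 1 × 2 window and stride 2 keeps the maxima 3 and 5, halving the feature length.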

Model Input and Output Selection
Input features that have little influence on the output of the neural network model should be removed as much as possible. Reducing the number of useless features can effectively improve the training effect and reduce the training time [32].
In the controlled cooling model, the convective heat transfer coefficient has a complex nonlinear relationship with its associated physical quantities, such as the final rolling temperature, water flow density, plate thickness, and water temperature. It is difficult to determine the specific functional relationship between them.
The convective heat transfer coefficient can affect the final cooling temperature, but it cannot be directly used as the input of the neural network.Therefore, several other variables indirectly affecting the heat transfer coefficient are used as the input [33].
The final input variables of the neural network were the plate width, plate thickness, plate speed, plate surface temperature, water temperature, water pressure, direct quenching (DQ) header opening ratio, ACC header opening ratio, DQ header flow, and ACC header flow.A total of 22 variables were selected as the input of the neural network, and the output variable was the final cooling temperature.

Data Acquisition and Preprocessing
The original data for the neural network were obtained from an iron and steel plant and were collected using industrial equipment. The production process of each steel plate corresponded to one set of data. Steel plates of the same specification are usually mass produced; most specifications were repeated more than 50 times, and each repetition added a new data record.
The original data of the plate production process were measured using different sensors connected to the Level 1 automation system (based on programmable logic controllers). The plate speed was measured using roller-mounted speed encoders, the temperature data were obtained using non-contact infrared pyrometers, and the flow data were measured using valve flow meters. Furthermore, the Level 2 automation system (based on a computer server) analyzed the original data from Level 1 and then stored the average values in a database. During data collection, all signals fluctuate over the production cycle of a steel plate. For this reason, the data used in this paper were averages along the plate length, excluding two meters at the head and tail of each plate, which better reflect the technological characteristics of each steel plate.
The experimental plan was based on the production characteristics of the site. The main objective was to cover as wide a range of steel plate specifications as possible, including different widths, thicknesses, and speeds. Such a plan aimed to better verify the accuracy and generalization performance of the model and provide better support for industrial applications.
Before network training, the original data needed to be preprocessed. After eliminating data with large errors and missing values, 13,513 sets of sample data were finally retained. The data set included the plate speed, plate width, plate thickness, water temperature, water pressure, plate surface temperature, and the different header flows in the steel plate cooling process as the input values of the network, and the final cooling temperature as the output value. Statistics of the original data set are shown in Table 1.
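The cleaning step can be illustrated with a short NumPy sketch. The exact scheme is an assumption: the paper states only that erroneous and missing data were removed, and the z-score standardization shown here is a common choice for neural network inputs rather than a documented part of the authors' pipeline:

```python
import numpy as np

def preprocess(X, y):
    """Illustrative preprocessing: drop samples with missing values,
    then standardize each input feature to zero mean, unit variance."""
    mask = ~np.isnan(X).any(axis=1) & ~np.isnan(y)   # keep complete rows
    X, y = X[mask], y[mask]
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma[sigma == 0] = 1.0                          # guard constant columns
    return (X - mu) / sigma, y
```

In practice the scaling statistics would be computed on the training split only and reused for the test split.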

Multi-Scale Convolutional Neural Network Structure and Parameter Design
The convolutional neural network differs from the fully connected neural network in its local connections, weight sharing, and pooling operations. While maintaining the feature extraction ability, the network size, the number of parameters, and the training time are effectively reduced. Figure 2 is a schematic diagram of the temperature prediction model using a convolutional neural network structure. A one-dimensional input matrix (Input) was formed from influencing factors, such as the plate thickness, plate width, plate speed, water temperature, water pressure, plate surface temperature, DQ header flow, DQ header opening ratio, ACC header flow, and ACC header opening ratio, and the final cooling temperature was used as the output.
The convolutional neural network uses one-dimensional convolution kernels. First, a 1 × 5 convolution kernel (Conv) was used for feature extraction. Second, a 1 × 2 max pooling (Maxpool) layer was used for downsampling to reduce the data dimension. Third, a 1 × 3 convolution kernel and 1 × 2 max pooling were applied again, stacking convolution and pooling layers to improve the feature extraction capability.
After the multi-layer convolution operations, the original low-level features were combined and fused into high-level features. The results were flattened into a one-dimensional vector and connected to the fully connected layer. Finally, the result was transmitted to the output layer to predict the final cooling temperature.
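Assuming valid padding and stride 1 for the convolutions (the padding scheme is not stated in the paper), the feature length produced by this stack of layers can be traced with simple arithmetic, starting from the 22-variable input described earlier:

```python
def conv_out(length, kernel, stride=1):
    """Output length of a valid-padding 1-D convolution."""
    return (length - kernel) // stride + 1

def pool_out(length, size=2, stride=2):
    """Output length of a 1-D pooling layer."""
    return (length - size) // stride + 1

n = 22                  # 22 input variables as a 1 x 22 sequence
n = conv_out(n, 5)      # 1 x 5 Conv     -> 18
n = pool_out(n)         # 1 x 2 Maxpool  -> 9
n = conv_out(n, 3)      # 1 x 3 Conv     -> 7
n = pool_out(n)         # 1 x 2 Maxpool  -> 3
```

The stack reduces a 22-element sequence to 3 positions per channel before flattening, which keeps the fully connected layer small.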

Multi-Scale Convolutional Structure Design
The inception structure was innovatively introduced by GoogLeNet. It uses convolution kernels of different sizes to perform convolution operations on the same input and then concatenates the feature information of different sizes in the depth direction. Compared to convolutional neural networks with a single-size convolution kernel in each layer, the inception structure is more adaptable and can extract more effective features [34]. In this paper, the inception structure was introduced into the controlled cooling temperature prediction model, which is therefore called a multi-scale convolutional neural network.
In a conventional convolutional neural network structure, each layer usually performs only one operation, such as convolution or pooling, and the kernel size of the convolution operation is fixed. In the first layer of the multi-scale convolutional neural network, convolution kernels of different sizes, 1 × 3 Conv and 1 × 5 Conv, are introduced. This structure can simultaneously extract feature information at different scales and improve the perceptual power of the network model. A 1 × 2 Maxpool reduces the data dimension and the amount of calculation while effectively retaining most of the feature information. First, by setting the number of 1 × 1 Conv kernels smaller than the number of feature dimensions, the number of channels can be reduced while keeping the size of the feature map constant, reducing the amount of parameter computation in the convolutional layer. Second, each added 1 × 1 Conv is followed by a nonlinear activation, which also improves the expressive ability of the network. Finally, the network concatenates the features obtained from the four convolutional branches along the depth direction and passes them to the next layer. The structure of the first layer of the network is shown in Figure 3. The second layer of the network continues to use asymmetric convolution kernels to extract the features obtained from the previous layer; kernels of different sizes, 1 × 3 Conv, 1 × 1 Conv, and 1 × 3 Maxpooling, were used in combination. The structure is shown in Figure 4. The complete network model is shown in Figure 5. The factors affecting the cooling temperature were constructed into a one-dimensional input matrix. First, the backbone 1 × 3 Conv was used to roughly extract features. Next, the first-layer network (Figure 3) was combined with the second-layer network (Figure 4) into a block, and two such blocks were stacked to increase the network depth. Then, all the feature information was pooled and downsampled using global average pooling to reduce the data dimension. Finally, the one-dimensional vector obtained by global pooling was fully connected with 50 neurons, and the final cooling temperature was predicted using the regression model in the output layer.

Parameter Design

1. Activation function
The ReLU function was used for the input and hidden layers of the network; its advantage is that the gradient does not saturate, which alleviates the vanishing gradient problem.

2. Optimization algorithm
After comparing different optimizers, this paper chose the adaptive moment estimation method (Adam) to avoid the slow convergence of the stochastic gradient descent algorithm, which maintains a single learning rate.

3. Other parameters
The initial learning rate of the network was 0.001, the maximum number of training epochs was 1500, and the batch size was 30.
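The Adam update behind the optimizer can be written out in NumPy. The moment decay rates 0.9 and 0.999 are the common defaults (e.g., in TensorFlow) and are an assumption here; the paper specifies only the 0.001 learning rate:

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for weights w given gradient g at step t."""
    m = b1 * m + (1 - b1) * g           # first-moment estimate
    v = b2 * v + (1 - b2) * g * g       # second-moment estimate
    m_hat = m / (1 - b1 ** t)           # bias-corrected moments
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

Because the step is scaled per parameter by the second-moment estimate, the first update moves each weight by roughly the learning rate regardless of the raw gradient magnitude, which is what avoids the single-learning-rate limitation of plain SGD.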

4. Evaluation function
The mean square error (MSE), mean absolute error (MAE), and coefficient of determination (R²) were used as the performance evaluation functions for the regression task.
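The three evaluation functions can be computed directly (a straightforward NumPy sketch of the standard definitions):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, and the coefficient of determination R^2."""
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mse, mae, r2
```

A perfect prediction yields MSE = MAE = 0 and R² = 1, which is why the closeness of R² to one is used below as the measure of agreement between predicted and actual temperatures.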

Experimental Environment Configuration
The software environment was the Windows 10 64-bit operating system. The CPU was an Intel i7-10700, the memory was 64 GB, and the graphics card was an RTX 2080 Ti. The program was written using the open-source TensorFlow framework. The pre-configured hyperparameters of the experiments are shown in Table 2.

Experimental Results and Discussion
In order to verify the performance of the multi-scale convolutional neural network, the BP neural network with the optimal structure and the convolutional neural network with the traditional structure were used as comparison baselines. The test set errors of the BP neural network, convolutional neural network, and multi-scale convolutional neural network are shown in Table 3. As can be seen from Table 3, the MSE and MAE of the multi-scale convolutional neural network were smaller than those of the BP neural network and the traditional convolutional neural network, and it had a higher R² and better generalization ability. Compared with the BP neural network and CNN, the MSE of the multi-scale convolutional neural network decreased by 24.7% and 12.2%, the MAE decreased by 19.6% and 7.97%, and R² improved by 4.26% and 2.65%, respectively.
The comparison between partial predicted values and actual values is shown in Table 4. In order to visually show the prediction effect, six samples were randomly selected for prediction. Among the three models, the error between the predicted value and the actual value of the multi-scale convolutional neural network was the smallest. Taking the first set of data as an example, the actual final cooling temperature of the steel plate was 647.55 °C; the BP neural network predicted 599.43 °C, the convolutional neural network 660.24 °C, and the multi-scale convolutional neural network 655.11 °C. The prediction errors of the three neural networks were −48.12 °C, 12.69 °C, and 7.56 °C, respectively; the multi-scale convolutional neural network had the smallest error. Figure 6 shows the correlation between the predicted and actual temperatures for the three neural network models, with test data covering the whole data range. R², the coefficient of determination, indicates the closeness between them: the closer R² is to one, the closer the predicted values are to the actual values. As can be seen from Figure 6, the scatter of the BP neural network model was relatively dispersed, and its R² of 0.891 was the lowest among the three networks. The R² of the convolutional neural network model was 0.905. Most of the scatter points of the multi-scale convolutional neural network were concentrated near the diagonal, and its R² of 0.929 was the highest among the three networks, indicating more accurate prediction results.
In Figure 7, 300 samples were randomly selected for prediction. The proportion of samples with a relative error within ±10% was counted: 89% for the BP neural network, 93% for the CNN, and 97% for the multi-scale CNN. Compared with the other prediction models, the relative prediction error of the multi-scale CNN was the smallest.
Figure 8 is a bar chart comparing the three models. The MSE of the multi-scale convolutional neural network was 1432.72, the MAE was 30.93, and the R² was 0.929; all three indicators were the best among the models.
The fitting of the predicted and actual values is shown in Figure 9, where the horizontal axis is the predicted sample number and the vertical axis is the deviation between the predicted and actual final cooling temperature. The fitting error of the BP neural network was larger, with prediction errors within ±25 °C. The fitting error of the multi-scale CNN was smaller, with prediction errors within ±15 °C. In comparison, the multi-scale CNN shows stronger resistance to fluctuation, stronger robustness, and better generalization ability.
From the above results, the prediction error of the convolutional neural network model was smaller than that of the BP neural network, and its prediction curves fitted better. As the neurons of a BP neural network are fully connected, the nonlinear fitting ability of the network is not strong enough and the prediction effect is poor when the number of hidden layers is small. When the number of hidden layers is too large, there are too many parameters to train; the model overfits and the training time increases.
The convolutional neural network has the characteristics of local connection, parameter sharing, and pooling operations. Compared with the BP neural network, while effectively retaining most of the information, the number of parameters to be trained was greatly reduced, and the relevant features could be effectively learned from the samples. At the same time, the time consumption was reduced and the accuracy was increased.
The error of the multi-scale convolutional neural network was further reduced compared with the traditional convolutional neural network. The multi-scale convolutional neural network performed convolution operations with four branches on the input. In this way, the traditional convolution structure was transformed into a sparse structure, with convolution and pooling operations carried out using kernels of different sizes. The advantage of using convolution kernels of different sizes is that the network obtains receptive fields of different sizes. Subsequently, the feature information extracted by convolution was concatenated along the depth direction, fusing features of different scales. In addition, the use of the 1 × 1 convolution kernel reduced the data dimension, the number of calculation parameters, and the model complexity, and improved the training effect.

Conclusions
The final cooling temperature is an important technological parameter for plate mechanical properties during the controlled cooling process. Good results were obtained using the BP neural network for final cooling temperature prediction; however, as the number of network layers increased, the BP neural network tended to overfit, and the prediction accuracy was difficult to improve. The convolution structure of the traditional CNN adopts local connections, which overcome the disadvantages of the fully connected BP neural network, but its fixed network structure limits the improvement of prediction accuracy. In order to improve the prediction accuracy of the final cooling temperature, a prediction model for hot-rolled steel plate based on a multi-scale CNN was established. The measured steel plate thickness, speed, header flow, and other variables were selected as inputs for final cooling temperature prediction in the multi-scale CNN model. As convolution kernels of different sizes were introduced in the multi-scale CNN to extract different features, the prediction accuracy was higher than that of the BP neural network and the traditional CNN model. Compared with the BP neural network and CNN, the MSE of the multi-scale convolutional neural network decreased by 24.7% and 12.2%, the MAE decreased by 19.6% and 7.97%, and R² improved by 4.26% and 2.65%, respectively. The final cooling temperature predicted by the multi-scale CNN agreed with the actual temperature within ±10% error bands. Therefore, multi-scale CNNs can deal with more complex predictive modeling problems and provide a reference for the predictive control of the final cooling temperature.

Figure 6. (a) Correlation between the predicted and actual values of the BP neural network on the test set; (b) correlation between the predicted and actual values of the convolutional neural network on the test set; (c) correlation between the predicted and actual values of the multi-scale convolutional neural network on the test set.

Table 1 .
Original data set information.

Table 3 .
Test results of each model.

Table 4 .
Comparison of partial predicted values and actual values.