Real-Time Dynamic Carbon Content Prediction Model for Second Blowing Stage in BOF Based on CBR and LSTM

Abstract: The endpoint carbon content is an important control target in converter steelmaking, and its precise prediction is the key to endpoint control. In this study, a real-time dynamic prediction model for the carbon content in the second blowing stage of the converter steelmaking process was proposed. First, a case-based reasoning (CBR) algorithm was used to retrieve similar historical cases and their corresponding process parameters in the second blowing stage, based on the process parameters of the new case in the main blowing stage. Next, a long short-term memory (LSTM) model was trained by using the process parameters of the similar cases at the previous moment as the input and the carbon content at the next moment as the output. Finally, the process parameters of the new case were input into the trained LSTM model to produce a real-time dynamic prediction of the carbon content in the second blowing stage. Actual production data were used for verification, and the results showed that the hit rates of the proposed model for prediction errors within the ranges of (−0.005, 0.005), (−0.010, 0.010), (−0.015, 0.015), and (−0.020, 0.020) were 25%, 54%, 71%, and 91%, respectively, which were higher than those of the traditional carbon integral model, cubic model, and exponential model.


Introduction
Converter steelmaking is a very complex process, involving physical and chemical changes at high temperatures. It features high production efficiency, low energy consumption and low costs, and it is currently the main steelmaking method in China. Endpoint control refers to controlling the composition and temperature of molten steel within a reasonable range. However, existing detection methods cannot continuously monitor the composition and temperature in the converter bath because of its extremely high temperature during smelting. Thus, most steelworks still rely on manual experience to maintain endpoint control, and the control accuracy is low and unstable. Establishing an accurate prediction model would be of great significance for endpoint control.
At present, prediction models for the endpoint control of converter steelmaking can be divided into static and dynamic approaches. The former can be further divided into theoretical and data-driven models. Theoretical models are based on the material balance and heat balance, but they incorporate a large number of assumptions that do not truly reflect the actual production process, so their control accuracy is low. Data-driven models use historical production data for learning and modelling, so they can reflect the production process more effectively. Thus, many authors have used the data-driven approach for in-depth research into the converter steelmaking process. Han et al. used case-based reasoning (CBR) and combined different methods for case retrieval and reuse to establish a control model for oxygen supply [1]. Wang et al. proposed a CBR model for static parametric control based on causal relationships [2]. Park et al. performed a sensitivity analysis to eliminate irrelevant input parameters and established models based on an artificial neural network and least-squares support vector machine for endpoint temperature prediction [3]. Gao et al. proposed an improved twin support vector regression model for endpoint prediction [4,5]. Li et al. established an endpoint prediction model based on a backpropagation neural network and the improved particle swarm optimisation algorithm [6]. Han et al. proposed an endpoint prediction model based on a membrane algorithm for an improved extreme learning machine [7]. Yan et al. established a model for predicting the endpoint carbon content based on a genetic algorithm and kernel partial least-squares regression [8]. Cheng et al. proposed a data-driven multitask learning method for endpoint prediction [9]. Liang et al. proposed a two-step CBR method based on attribute reduction for predicting the endpoint phosphorus content [10]. Wang et al. established an integrated CBR model for predicting endpoint temperature of molten steel in argon oxygen decarburisation [11]. Other endpoint prediction models have been established based on the flame spectrum and flame images taken at the converter mouth [12-14]. Data-driven models are also widely used for other steelmaking processes. Iftikhar et al. integrated the grey-box model and bootstrap filter to establish a prediction model for the molten steel temperature that accounts for uncertainty [15]. Lv established a model for sensing the sulphur content during ladle furnace steel refinement [16]. Okura used the grey-box model to produce high prediction accuracy for the temperature of molten steel in a tundish [17].
Although the above data-driven models offer improved prediction accuracy, the following problems remain. Firstly, since the static models are only built with consideration for the initial conditions and static process data (a small dataset without a time-series feature cannot represent the actual production), their prediction accuracy is limited [18]. Secondly, the static model can only make endpoint predictions, so it cannot provide a reference for operators to adjust the operating parameters in the blowing process.
To solve the problems of the static model, dynamic models were established. Dynamic models can be divided into those based on the sub-lance and those based on off-gas analysis. Among sub-lance-based control models, Yue et al. established prediction models for the endpoint carbon, temperature, and phosphorus and manganese contents based on the exponential model, heat balance and thermodynamic equations, respectively [19]. Min et al. used an adaptive network-based fuzzy inference system and relevance vector machine to establish a dynamic control model [20]. Wang et al. established a data-driven real-time prediction model for the endpoint carbon content [21]. Among the off-gas analysis-based control models, Hu et al. used off-gas analysis techniques to establish a carbon integral model for continuously calculating the carbon content and temperature [22,23]. Liu et al. proposed an algorithm based on off-gas analysis for dynamically calculating the carbon content of the converter bath at the second blowing stage [24]. Dofasco (Canada) stopped using the sub-lance after adopting an off-gas analysis-based blowing control system. All heats were tapped directly, and the prediction accuracy was 100%, whereas the re-blowing rate was less than 1% [25]. Lin et al. proposed an improved exponential model that fits and updates the critical carbon content curve simultaneously to predict the carbon content in the second blowing stage [26].
The above dynamic control methods improve the prediction accuracy of endpoint control to some extent but still face several problems. The sub-lance cannot achieve continuous measurement, and the previous sub-lance-based models can only fit the relationship between the oxygen supply amount or time and the carbon content, without considering the impact of the lance position and bottom blowing flow rate on the carbon content. Furthermore, the off-gas detection equipment is far from the reaction zone in the converter bath, which induces a delay in the off-gas data and prevents real-time prediction of the carbon content.
In this study, CBR and long short-term memory (LSTM) were used to establish a model for the real-time dynamic prediction of the carbon content in the second blowing stage.
The proposed model should offer improved endpoint control of the converter steelmaking process.

Converter Steelmaking Process

Smelting
Figure 1 shows the smelting process of converter steelmaking; the capacity of the converter is 300 t. The process is performed to reduce impurities in the hot metal. The temperature of the molten iron is raised from 1350 °C to about 1650 °C. The smelting process includes the addition of scrap and hot metal, blowing oxygen, tapping, and alloying [27]. The blowing oxygen process can be divided into two stages: the main blowing stage and the second blowing stage. In the main blowing stage, the scrap steel and molten iron are loaded into the converter bath, and oxygen is blown into the bath through the oxygen lance while additives (e.g., lime, dolomite and sintered ore) are added. In the second blowing stage (about 85% oxygen supply), a sub-lance is inserted into the converter bath for temperature, sample and carbon (TSC) measurements. The amounts of the oxygen supply and coolant are adjusted according to the measured carbon content and temperature. After the blowing is completed, the sub-lance is inserted into the bath again for temperature, sample and oxygen (TSO) measurements. If the requirements for endpoint control are met, the tapping and alloying processes are performed. Otherwise, the blowing process is repeated.

Decarburisation
During the converter steelmaking process, the decarburisation rate is slow at the beginning and end and fast in the middle. As shown in Figure 2, the blowing process can be divided into three stages according to the decarburisation rate: initial, intermediate, and final. In the initial stage, oxygen is blown into the bath and oxidises Si and Mn in the hot metal. Only part of the oxygen reacts with the carbon, and the decarburisation reaction is relatively gentle. As the Si and Mn contents in the hot metal decrease and the bath temperature increases, the decarburisation rate gradually increases. In the intermediate stage, the decarburisation reaction becomes violent, and the decarburisation rate remains high for a considerable time. In this stage, the decarburisation rate mainly depends on the oxygen supply. In the final stage, the decarburisation reaction causes the carbon content in the bath to decrease to a critical threshold. The limit on the decarburisation reaction becomes the mass transfer of carbon in the bath. Thus, the decarburisation rate decreases with the carbon content in the bath.
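The decarburisation rate during the blowing process is described by:

$$
-\frac{\mathrm{d}w_C}{\mathrm{d}t} =
\begin{cases}
K_1 t, & 0 < t \le t_A \\
K_2, & t_A < t \le t_B \\
K_3 w_C, & t_B < t \le t_E
\end{cases}
\tag{1}
$$

where $w_C$ is the carbon content in hot metal; $K_1$, $K_2$, and $K_3$ are undetermined coefficients; and $t_A$, $t_B$, and $t_E$ are the ending times of the initial, intermediate, and final stages, respectively.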
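The three-stage behaviour described above can be illustrated numerically. The following is a minimal sketch, assuming the commonly used piecewise form in which the rate grows linearly with time in the initial stage, is constant in the intermediate stage, and is proportional to the carbon content in the final stage. The function names and all coefficient values are hypothetical and chosen only for illustration; they are not fitted to any plant data.

```python
def decarburisation_rate(t, w_c, k1, k2, k3, t_a, t_b):
    """Return -dw_C/dt under the assumed three-stage form."""
    if t <= t_a:
        return k1 * t      # initial stage: rate grows with time
    if t <= t_b:
        return k2          # intermediate stage: constant rate
    return k3 * w_c        # final stage: rate proportional to carbon content


def simulate_carbon(w0, k1, k2, k3, t_a, t_b, t_e, dt=1.0):
    """Explicit Euler integration of the carbon content from t = 0 to t_e."""
    n_steps = int(round(t_e / dt))
    w = w0
    history = [w]
    for i in range(n_steps):
        t = i * dt
        rate = decarburisation_rate(t, w, k1, k2, k3, t_a, t_b)
        w = max(w - rate * dt, 0.0)  # carbon content cannot become negative
        history.append(w)
    return history


# Illustrative (not fitted) coefficients and stage boundaries, in seconds.
curve = simulate_carbon(w0=4.0, k1=1e-5, k2=0.005, k3=0.01,
                        t_a=100, t_b=600, t_e=900)
```

The resulting curve falls slowly at first, roughly linearly in the middle, and then flattens as the carbon content approaches zero, which matches the qualitative description of the three stages.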

Process Parameters for Converter Steelmaking
As shown in Figure 3, the process parameters for converter steelmaking can be divided into two categories according to the data type: single values and time series.
(1) Single-value data
Single-value data include the elemental composition (e.g., C, Si, Mn, P and S) of the hot metal, the weight of the hot metal, the temperature of the hot metal, the type and weight of scrap, the type and weight of the slagging agent (e.g., lime and dolomite), the iron ore, the heating agent, and so on.
(2) Time-series data
According to the delay of automatic acquisition, time-series data can be further divided into non-delayed and delayed time-series data. The former mainly include changes in the lance position, oxygen flow, and bottom blowing gas flow. The latter mainly refer to gas data, including the off-gas components and flow. Because the detection equipment is located away from the converter bath, the off-gas data feature a certain delay. Therefore, off-gas analysis cannot directly participate in the control of the blowing process, and it is mainly used to evaluate the decarburisation stage.

Case-Based Reasoning
CBR is an important method in the field of artificial intelligence. Once a new problem occurs, similar problems that have been solved and their corresponding solutions can be retrieved from the case library. By comparing differences in the backgrounds and times of occurrence of the present and previous problems, solutions to the latter may be adjusted and altered to solve the former, as shown in Figure 4. The CBR process consists of case description, retrieval, reuse, revision, and retention. Case description is also known as case representation, and it is the basis for CBR. A case is generally described in terms of its characteristics and solution:

$$
C = (x_1, x_2, \ldots, x_n;\ s) \tag{2}
$$

where $x_i$ is the i-th characteristic of the case, $n$ is the number of characteristics, and $s$ is the solution.
Case retrieval involves searching the case database for the cases closest to the description of the case in question. When a case has many attributes, the case database usually will not include an exact match. Therefore, a computational method, such as the Euclidean distance, is needed to locate the most similar case. Suppose that a case has $m$ influencing factors, the j-th influencing factor of a case in the case database is $y_j$ and the j-th influencing factor of the new case is $x_j$. The Euclidean distance between the new case and the case from the case database is then described by

$$
d(x, y) = \sqrt{\sum_{j=1}^{m} (x_j - y_j)^2} \tag{3}
$$

The similarity between the cases is then calculated from this distance, with a smaller distance indicating a higher similarity.
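The retrieval step can be sketched in a few lines. The example below is illustrative only: the Euclidean distance follows the description above, whereas the mapping from distance to similarity, 1 / (1 + d), is an assumption made for this sketch and is not taken from the text. The function names and the toy case base are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two cases with the same attributes."""
    return math.sqrt(sum((xj - yj) ** 2 for xj, yj in zip(x, y)))


def similarity(x, y):
    # Assumed mapping from distance to similarity (not from the source):
    # identical cases score 1.0, and the score decreases with distance.
    return 1.0 / (1.0 + euclidean_distance(x, y))


def retrieve_similar(new_case, case_base, m):
    """Return the m (case_id, similarity) pairs most similar to new_case."""
    scored = [(case_id, similarity(new_case, features))
              for case_id, features in case_base.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:m]


# Toy case base with two attributes per case (hypothetical values).
case_base = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [1.0, 0.0]}
top_two = retrieve_similar([0.0, 0.0], case_base, m=2)
```

Any monotonically decreasing function of the distance would give the same ranking, so the choice of similarity mapping affects only the reported scores, not which cases are retrieved.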

Long Short-Term Memory Neural Network
LSTM was first proposed by Hochreiter and Schmidhuber in 1997. In contrast to traditional neural networks, its hidden layers consist of basic units called memory blocks, whose structure is shown in Figure 5. The memory block has three gates (forget gate f, input gate i, and output gate o) and a memory cell c; the symbol ⊕ represents the addition of two vectors, ⊗ represents the element-wise product of two vectors, σ is the sigmoid activation function, and tanh is the hyperbolic tangent activation function. The calculation formulas of the model are as follows:

$$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) && (5) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) && (6) \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) && (7) \\
C_t &= f_t \otimes C_{t-1} \oplus i_t \otimes \tilde{C}_t && (8) \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) && (9) \\
h_t &= o_t \otimes \tanh(C_t) && (10)
\end{aligned}
$$

where $f_t$ is the output of the forget gate at time t, σ is the sigmoid function, $h_{t-1}$ is the final output of the cell unit at the previous time and $x_t$ is the input at the present time t. The value $i_t$ is the output of the input gate at time t, $o_t$ is the output of the output gate at time t, $\tilde{C}_t$ is the input cell state at time t, $C_t$ is the memory cell state at time t and $h_t$ is the final output at time t. The values W and b are the matrix of coefficients and the bias for each gate, respectively.
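To make the gate interactions concrete, the following sketch implements a single memory-block update in plain Python, assuming the standard LSTM formulation with one weight matrix and one bias vector per gate. The function names, the `params` dictionary layout, and the toy weights are hypothetical; a real model would use a deep-learning library rather than lists.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W v + b for a list-of-rows matrix W and vectors b, v."""
    return [sum(w * vi for w, vi in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One memory-block update.

    params maps the gate names 'f', 'i', 'c', 'o' to (W, b), where W has
    shape hidden x (hidden + input) and b has length hidden.
    """
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(*params["f"], z)]        # forget gate
    i_t = [sigmoid(a) for a in affine(*params["i"], z)]        # input gate
    c_tilde = [math.tanh(a) for a in affine(*params["c"], z)]  # candidate state
    o_t = [sigmoid(a) for a in affine(*params["o"], z)]        # output gate
    # Element-wise products and sum give the new memory cell state.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Toy example: hidden size 1, input size 1, all weights and biases zero,
# so every gate outputs 0.5 and the candidate state is 0.
zero_params = {gate: ([[0.0, 0.0]], [0.0]) for gate in "fico"}
h_t, c_t = lstm_step([1.0], [0.0], [2.0], zero_params)
```

With all-zero weights the forget gate halves the previous cell state, which makes the role of each gate easy to check by hand.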

Prediction Model Based on Off-Gas Analysis
(1) Carbon integral model
Based on the principle of mass balance, the carbon integral model first calculates the initial amount of carbon in the molten steel according to the composition of the steelmaking raw materials and then subtracts the amount of carbon leaving with the off-gas in the form of CO and CO2; the remainder is the amount of carbon in the molten steel in the molten pool.
According to the flow of off-gas and the percentage contents of CO and CO2 in the off-gas, the decarburization rate in the molten pool can be calculated by using the carbon balance in the converter smelting process:

$$
V_c = \frac{12}{22.4}\, Q_{smoke}\, \frac{\varphi(\mathrm{CO}) + \varphi(\mathrm{CO_2})}{100} \tag{11}
$$

where $V_c$ is the decarburization rate of the molten pool; $Q_{smoke}$ is the flow of off-gas; $\varphi(\mathrm{CO})$ and $\varphi(\mathrm{CO_2})$ are the percentage contents of CO and CO2 in the off-gas; and the factor 12/22.4 converts the gas volume into carbon mass. The instantaneous carbon content in the molten pool can then be expressed as:

$$
w(C)(t) = \frac{W_C^{ini} - \int_0^t V_c \,\mathrm{d}\tau}{W_{steel}} \times 100\% \tag{12}
$$

where $w(C)(t)$ is the carbon content in the molten pool at time t; $W_C^{ini}$ is the amount of carbon in the molten pool at the initial conditions; and $W_{steel}$ is the weight of the molten steel in the molten pool.
(2) Exponential decay model
The exponential model is the most widely used decarburization characteristic model for the later stage of converter blowing. It assumes that there is an exponential attenuation relationship between the decarburization rate and the carbon content in the molten pool in the later stage of converter blowing.
Equation (14) is obtained from Equation (13), where k 1 and k 2 are the undetermined coefficients of the model.
(3) Cubic model
In addition to the exponential model, the decarburization characteristics of the converter in the later stage can also be described by a cubic equation, where b 0 , b 1 , b 2 , and b 3 are the undetermined coefficients of the model.
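The carbon integral model in item (1) lends itself to a short numerical illustration. The sketch below is a simplified reading of the mass balance, assuming that the off-gas flow is given in Nm³ per unit time, that CO and CO2 are given in volume percent, and that one kmol of gas occupies 22.4 Nm³ and carries 12 kg of carbon. The function names and sample values are hypothetical, and the measurement delay of the off-gas data is ignored.

```python
def offgas_carbon_rate(q_offgas, co_pct, co2_pct):
    """Carbon mass (kg) leaving the bath per unit time.

    q_offgas: off-gas flow in Nm^3 per unit time (assumed unit).
    co_pct, co2_pct: CO and CO2 contents in volume percent.
    """
    return q_offgas * (co_pct + co2_pct) / 100.0 * 12.0 / 22.4


def carbon_content_trace(initial_carbon_kg, steel_weight_kg, samples, dt):
    """Bath carbon content (mass %) after each off-gas sample.

    samples: iterable of (q_offgas, co_pct, co2_pct) tuples.
    dt: sampling interval, in the same time unit as q_offgas.
    """
    carbon = initial_carbon_kg
    trace = []
    for q, co, co2 in samples:
        # Rectangle-rule integration of the carbon removed in this interval.
        carbon = max(carbon - offgas_carbon_rate(q, co, co2) * dt, 0.0)
        trace.append(100.0 * carbon / steel_weight_kg)
    return trace


# Hypothetical readings: two identical samples, one time unit apart.
trace = carbon_content_trace(1000.0, 100000.0, [(22.4, 50.0, 50.0)] * 2, dt=1.0)
```

Because the remaining carbon is obtained by integration, any bias in the off-gas flow or composition accumulates over the heat, which is one reason such models are usually recalibrated against sub-lance measurements.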

Principles of Model
The model framework can be divided into three parts: similar case retrieval, model training, and model validation, as shown in Figure 6. Figure 7 shows the similar case retrieval process. The process parameters of the main blowing stage of the new case (e.g., the composition, temperature, and weight of the hot metal, the type and weight of scrap, and the TSC measurement results) are used to retrieve m similar cases from the historical case database together with their corresponding time-series data in the second blowing stage (e.g., the lance position, the oxygen flow, the bottom-blowing gas flow, and the variations in the off-gas and carbon content). In this paper, m is a hyperparameter of the model that is determined experimentally.
Figure 9 shows the model verification process. The trained LSTM model takes the inputs of the new case at the last moment (i.e., lance position, oxygen flow, bottom blowing gas flow, and carbon content) to obtain the change in carbon content for the next moment. The predicted carbon content is then used as the input for the next moment, and so on. Eventually, a curve is obtained for the variation in the carbon content of the new case.
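The recursive verification procedure, in which each predicted carbon content is fed back as an input for the next moment, can be sketched as follows. This is a minimal illustration: `predict_delta` stands in for the trained LSTM model and is a hypothetical callable, as are the function name `rollout` and the toy control values.

```python
def rollout(predict_delta, initial_carbon, controls):
    """Recursively predict the carbon curve for a new case.

    predict_delta(lance, oxygen, bottom_gas, carbon) returns the predicted
    change in carbon content for the next moment.
    controls is a sequence of (lance, oxygen, bottom_gas) tuples, one per moment.
    """
    carbon = initial_carbon
    curve = [carbon]
    for lance, oxygen, bottom_gas in controls:
        # The previous prediction becomes part of the next input.
        carbon = carbon + predict_delta(lance, oxygen, bottom_gas, carbon)
        curve.append(carbon)
    return curve


# Stand-in for the trained model: removes 10% of the current carbon per step.
def toy_model(lance, oxygen, bottom_gas, carbon):
    return -0.1 * carbon


curve = rollout(toy_model, 1.0, [(1.8, 900.0, 10.0), (1.8, 900.0, 10.0)])
```

Since each step consumes the previous prediction, errors can accumulate along the curve, so the accuracy of the one-step model matters most near the start of the second blowing stage.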

Datasets
To verify the prediction accuracy of the model, actual production data for 1209 heats collected from B steel plant were used. The statistical results of the process parameters of the converter blowing process are shown in Tables 1 and 2, where TSC [C] is the carbon content measured by the TSC detector, TSC [T] is the temperature measured by the TSC detector, and TSO [C] is the carbon content measured by TSO. The length of the time series in Table 2 is the time interval between the starting point and the endpoint of the time-series data. The role of argon blowing is to stir the molten pool to achieve a uniform chemical composition and temperature of the molten steel and to accelerate the chemical reaction.

Similar Case Retrieval
Because the converter steelmaking process is a system that is far from equilibrium, the sub-lance detection results cannot fully reflect the state of the converter bath. Therefore, the conditions of the raw materials and the operating process in the main blowing stage need to be comprehensively considered. The inputs of the CBR model are the carbon, silicon, manganese, and phosphorus contents of the hot metal, the temperature of the hot metal, the weight of the hot metal, the weight of the scrap, the amount of lime, the amount of dolomite, the oxygen consumption in the main blowing stage, the carbon content measured by the TSC detector, and the temperature measured by the TSC detector. The Euclidean distance was used to measure similarity. The data were standardised to the range (0, 1), and the four cases with the highest similarity were selected for reuse. This paper uses m = 4 as an example to show the process of the model. Table 3 presents the similar cases retrieved by CBR for a certain heat. Heat 1 is the new case, and heats 2, 3, 4, and 5 are similar cases retrieved from the case database. Figure 10 shows the lance position, oxygen flow, bottom-blowing gas flow and carbon content variation in the second blowing stage of the similar cases; the change in carbon content is fitted to a curve according to the carbon integral model.
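Scaling matters here because the attributes have very different magnitudes: a hot metal temperature of around 1300 °C would otherwise dominate a carbon content of a few percent in the Euclidean distance. The sketch below shows one possible way to bring all attributes onto a common range, assuming column-wise min-max scaling; the text does not state which standardisation method was used, and the function name and sample rows are hypothetical.

```python
def min_max_scale(cases):
    """Scale every attribute column of a list of equal-length rows to [0, 1]."""
    columns = list(zip(*cases))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in cases:
        scaled.append([
            # A constant column carries no information, so map it to 0.0.
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return scaled


# Hypothetical rows: (hot metal temperature in °C, hot metal carbon in %).
raw_cases = [[1350.0, 4.0], [1300.0, 4.5], [1400.0, 4.25]]
scaled_cases = min_max_scale(raw_cases)
```

In practice the minimum and maximum of each attribute would be taken from the historical case database and then applied unchanged to the new case, so that old and new cases share the same scale.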

Model Training and Validation
The lance position, oxygen flow rate, bottom-blowing gas flow and carbon content at the previous moment were used as the model inputs, and the change in carbon content at the next moment was used as the model output. The model was configured with 10 neurons, a batch size of 1 and 500 epochs, with the mean absolute error as the loss function and Adam as the optimisation solver. Figure 11 shows the error curve of the trained model. As can be seen from the above figure, the endpoint prediction result of the example was 0.0477 and the actual value was 0.046. The prediction error was 0.0011.
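The input and output pairing described above can be made explicit with a small helper that turns the second-blowing time series of one similar case into one-step-ahead training samples. This is an illustrative sketch only; the function name and the toy series are hypothetical, and the LSTM training itself is not shown.

```python
def make_training_pairs(lance, oxygen, bottom_gas, carbon):
    """Build one-step-ahead samples from the time series of one heat.

    Each input holds the four values at moment t; each target is the
    change in carbon content between moment t and moment t + 1.
    """
    inputs, targets = [], []
    for t in range(len(carbon) - 1):
        inputs.append([lance[t], oxygen[t], bottom_gas[t], carbon[t]])
        targets.append(carbon[t + 1] - carbon[t])
    return inputs, targets


# Toy series with three moments (hypothetical values).
inputs, targets = make_training_pairs(
    lance=[1.0, 2.0, 3.0],
    oxygen=[10.0, 20.0, 30.0],
    bottom_gas=[100.0, 200.0, 300.0],
    carbon=[0.5, 0.25, 0.125],
)
```

A series with n moments yields n - 1 samples, and the samples from all m retrieved cases would be pooled to form the training set for the new case.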
To further verify the model, the data for all 1209 heats were randomly divided into a training set of 1109 heats and a test set of 100 heats. The prediction results of the model when m = 4 are shown in Figure 13. The hit rate for prediction errors within the range of (−0.02, 0.02) was 91%. This paper also examines the influence of the hyperparameter m on the model's prediction accuracy. The prediction results are shown in Figure 14. As can be seen from Figure 14, as the number of cases increases, the prediction accuracy of the model first increases and then decreases. The prediction accuracy of the model is highest when m = 4. This shows that increasing the number of cases used to train the model helps to improve its prediction accuracy, but as the similarity of the retrieved cases decreases, their reference value decreases, which in turn reduces the prediction accuracy of the model.
To further verify the prediction accuracy of the proposed model, its performance was compared against those of a carbon integral model, cubic model and exponential model. The prediction results are shown in Figure 15. The prediction accuracy of the proposed model (m = 4) reached 25%, 54%, 71%, and 91% within the ranges of (−0.005, 0.005), (−0.010, 0.010), (−0.015, 0.015), and (−0.020, 0.020), respectively. Compared with the other three models, the proposed model improved the prediction accuracy in the range of (−0.02, 0.02) by 22%, 14%, and 6%, respectively. This demonstrates that the established model can effectively predict the carbon content at the end of the converter steelmaking process and provide guidance to operators.
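The accuracy figures quoted above are hit rates, that is, the share of test heats whose prediction error falls inside a given tolerance band. A minimal sketch of this metric is given below; the function name and the sample values are hypothetical and are not taken from the test set.

```python
def hit_rate(predicted, actual, tolerance):
    """Percentage of heats whose error lies strictly inside (-tolerance, tolerance)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) < tolerance)
    return 100.0 * hits / len(predicted)


# Hypothetical endpoint carbon contents for four heats.
predicted = [0.050, 0.040, 0.080, 0.030]
actual = [0.046, 0.046, 0.050, 0.031]
rates = {tol: hit_rate(predicted, actual, tol) for tol in (0.005, 0.010, 0.020)}
```

By construction the hit rate can only stay the same or rise as the tolerance band widens, which is the pattern seen in the four reported ranges.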

Conclusions
The precise prediction of carbon content is the key to endpoint control in converter steelmaking. In this study, a data-driven model based on CBR and LSTM was established to predict the endpoint carbon content by predicting the carbon content variation in the second blowing stage. The prediction accuracy of the proposed model (m = 4) within the ranges of (−0.005, 0.005), (−0.010, 0.010), (−0.015, 0.015), and (−0.020, 0.020) reached 25%, 54%, 71%, and 91%, respectively. These hit rates were higher than those of the traditional carbon integral model, the cubic model, and the exponential model.
In future studies, we will investigate the following problems in the endpoint control of the converter.
(1) A dynamic prediction model for the temperature of molten steel in the second blowing stage.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Some of the data presented in this study were obtained from a third party and some from the authors' own research. None of the data have been deposited in a database.