Machine Learning-Based Prediction of a BOS Reactor Performance from Operating Parameters

A machine learning-based analysis was applied to process data obtained from a Basic Oxygen Steelmaking (BOS) pilot plant. The first purpose was to identify correlations between operating parameters and reactor performance, defined as the rate of decarburization (dc/dt). Correlation analysis showed, as expected, a strong positive correlation between the rate of decarburization and total oxygen flow. On the other hand, the decarburization rate exhibited a negative correlation with lance height. Less obviously, the decarburization rate also showed a positive correlation with the temperature of the waste gas and the CO2 content of the waste gas. The second purpose was to train a neural network-based regression on the pilot-plant dataset to predict the decarburization rate. The trained model was then used to predict the decarburization rate in a BOS furnace in an actual manufacturing plant based on lance height and total oxygen flow. The performance was satisfactory, with a coefficient of determination of 0.98, confirming that the trained model can adequately predict the variation in the decarburization rate (dc/dt) within BOS reactors.
Record Type: Published Article
Submitted To: LAPSE (Living Archive for Process Systems Engineering)
Citation (overall record, always the latest version): LAPSE:2020.0515
Citation (this specific file, latest version): LAPSE:2020.0515-1
Citation (this specific file, this version): LAPSE:2020.0515-1v1
DOI of Published Version: https://doi.org/10.3390/pr8030371
License: Creative Commons Attribution 4.0 International (CC BY 4.0)


Introduction
The processing of lower grade ores is a topic of particular interest, as fluctuation in raw material cost is a key challenge to sustainability in the steel industry. Raw materials flexibility is to a great extent enabled in the basic oxygen steelmaking (BOS) process, wherein oxidizable impurities, such as phosphorus (P) and silicon (Si), are separated into a slag phase. However, the primary purpose of the BOS process is to convert pig iron into crude steel; impurity removal, being a secondary function of the converter, therefore has to be balanced against the decarburization process.
As a result, processing parameters have to be balanced, and many different chemical reactions compete for the available oxygen. There are also many processing parameters that are interconnected in complicated ways that cannot be readily predicted. The BOS process is in general guided by static mass- and heat-balance models, which predict the resultant end-point from the inputs. The input parameters include at least the quantities of hot metal, steel scrap, iron ore, fluxes and oxygen to blow, while the resultant end-point is the temperature, weight and composition of the steel produced. The process control is supported by various measurements, such as in-blow sampling, end-blow sampling, oxygen flow rate/total oxygen blown, lance height, waste gas flow rate, waste gas pressure, waste gas composition, etc. The operating information is recorded and stored in a plant manufacturing system, normally at intervals of one second. Dynamic process modeling of the BOS process, depending on the researchers, can be based on thermodynamics [1], multizone reactions [2][3][4], empirical or

The Multiphysics of the Basic Oxygen Steelmaking Process
The overall purpose of the BOS process is to refine hot metal (pig iron) to a steel of desired chemistry (C, Si, Mn, P and S) and temperature in a controlled manner in a refractory converter (as shown schematically in Figure 1). In the BOS process, wherein C-rich pig iron is converted to steel, pure oxygen is injected through a water-cooled lance into the molten metal bath, and the bath is covered by molten oxides (slag phase). The role of the oxygen is to selectively react with oxidizable elements, primarily removing the C (from >4% to less than 0.1%) as a gas phase, but also removing other unwanted impurities stemming from the iron ore as oxides that separate and dissolve into the slag phase.
The kinetics of the BOS process is very complex, because it involves multiple simultaneous processes. Simultaneous multiphase interactions, heat and mass transfer, gas-slag-metal chemical reactions in multiple zones and vigorous fluid flow caused by the impingement of the oxygen jet occur in a BOS reactor at high temperatures. In addition, it is a dynamic and transient process, which makes the kinetics involved in a reactor more complex. Direct measurements of temperatures and chemistries are very difficult due to the nature of the process, which involves harsh conditions. That is why many researchers have been trying to address these difficulties through modeling the process.
A multiphysics description of the converter process, with the ultimate aim of predicting carbon content in the melt, will have to involve several sub-models at different scales. These models will have to capture various phenomena occurring in the gas/metal, metal/slag and metal/slag/gas mixtures, as well as transport processes in the bulk metal and bulk slag. The models include, but are not limited to, the following:
(i) Multiphase fluid flow in the various multiphase mixtures. Since this is a complex and computationally intensive task by itself, it is often simplified to 2D and informed by physical water models [5,21,22].
(ii) Reaction kinetics and competition between the injected oxygen and various dissolved elements, such as C, but also P, Si, Mn, etc. [4,23].
(iii) Microscale interactions between the phases, which include the separation of elements and precipitated phases across interfaces and emulsification [4,24,25].
The decarburization process mainly takes place in two reaction zones: namely, the jet impact zone and the gas-slag-metal emulsion zone [4,24]. In the emulsion zone, a sequence of chemical reactions takes place. Firstly, FeO reacts with CO to form CO2 at the slag-gas interface. Secondly, CO2 transfers from the slag-gas interface to the gas-steel droplet interface via the gas phase. Subsequently, CO2 reacts with carbon in an iron droplet at the metal-gas interface, and the CO formed is transferred back to the slag-gas interface via the gas phase. It should be noted that there is disagreement among researchers about the description of the mechanisms and the rate-controlling step(s) under different operating conditions [26]. Some other studies proposed that the reaction rate is controlled by the change in interfacial area caused by material exchanges during the process. The phase field models developed to describe the interfacial instability normally consider only the interfacial tension and fluid flow [24]. The coupling of models across scales is a hard and computationally intensive problem, which can be made tractable only within simplified process windows; prediction under widely ranging process parameters is then not possible. Consequently, all the previous studies, although mechanistic in nature, apply different levels of simplification to address the complex phenomena in the BOS reactor.
The removal of elements (C, Si, Mn, P and S) from pig iron in the industrial BOS process is mainly controlled by two operating parameters: lance height (i.e., the height of the lance tip above the static steel bath) and oxygen blowing rate (or total oxygen flow/blown). The control strategy can vary one or both of these two parameters. The lance profile (i.e., lance height as a function of blowing time) is critically important for the efficient production of high-quality steel.
For example, a high lance position can help stir and oxidize the slag with a high FeO content, which can accelerate the removal of carbon. A low lance height can substantially increase the impingement of the oxygen jet into the metal bath and splash metal droplets into the slag layer, which accelerates the formation of the emulsion zone and the removal of carbon and other elements through interfacial reactions between the large number of metal droplets and the emulsified slag [27]. The refining behavior (e.g., the decarburization rate) has a close relationship with the lance profile and oxygen blowing rate (or total oxygen flow). Thus, in this work, we intend to develop a novel algorithm to predict the refining behavior of the BOS reactor (using the decarburization rate as an example) by using machine learning to analyze the massive dataset of operating parameters, including lance height and total oxygen flow.
Using machine learning, we propose a technique that does not rely on such simplifying assumptions; instead, the algorithm is trained on a real dataset. In addition, the machine learning algorithm does not need to take into account any of the physics involved in the process. It effectively provides a bypass around the complexities involved in a BOS reactor. As shown later, the technique is able to predict the decarburization rate precisely. The aim of the current study is, thus, twofold: (1) Firstly, to identify the correlations and trends from a set of processing parameters on carbon removal. (2) Secondly, to try and circumvent the granularity of multiphysics and develop an artificial intelligence-based predictive model for carbon removal.

Dataset
The pilot-plant data were obtained from a six-ton BOF (Basic Oxygen Furnace) converter located at Swerea MEFOS, Sweden, as part of the European Commission-funded project IMPHOS: Improved Phosphorus Refining [2]. In this reactor, operating conditions, such as oxygen blow rate (total oxygen flow), lance height, the quantities of hot metal and fluxes used and off-gas analysis, are recorded. Gas/slag/metal emulsion samples were taken by robotic delivery from seven different heights at 2-min intervals from the start of blowing. The sampler lances consist of an inner and an outer structure of three mild steel bars, the inner being joined to the sides of inline sample pots and the outer joined to the sides of the disc sample pot lids. The sampler lance is lowered into the converter through an opening in the top (with a slight offset from the oxygen lance), with the lids in the closed position. Once it is lowered into position, the outer three bars are retracted, lifting the lids in situ and allowing the sample pots to fill. A schematic of the converter is given in Figure 1 with dimensions. The converter had a lining of magnesia-carbon (fired and fused) bricks. Gas inlets include a single bath agitation tuyere, blowing nitrogen at a rate of 0.5 Nm³/min, and a water-cooled oxygen lance blowing oxygen at a nominal rate of 17 Nm³/min. During the blow, the following were measured as output data from the reactor: oxygen flow rate/total oxygen flow; lance height and waste gas (temperature; flow rate and composition of CO, CO2, N2 and O2).
There are various models, such as static (mass balance model, thermodynamic model, etc.); empirical (regression based on individual plant data) and dynamic. These models are intended to predict the end-point of steel or the steel/slag compositions as a function of blowing time and to control/optimize BOF steelmaking. In the BOF process, the blown oxygen is used for decarburization; the removal of other impurities of Si, Mn and P; the formation of iron oxides in the slag and post-combustion. A typical example of the information used is the off-gas analysis. The industry has been exploring the use of off-gas information for process control, the most important use being to monitor the decarburization rate in the BOS converter from the off-gas composition. If the decarburization rate can be precisely controlled or predicted, then under known oxygen-blowing conditions, the amount of oxygen in the slag (iron oxides) and the amount of oxygen used to remove other elements, especially phosphorus, can be predicted. The decarburization rate can be calculated from the off-gas analysis (CO, CO2, N2 and O2). We then link the decarburization rate (calculated from the off-gas composition) to the operating parameter(s). This provides the foundation to predict the decarburization rate (dc/dt) and, as a result, dynamically control the converter operation. The features we are investigating from the dataset are presented in Table 1. The other operating conditions are listed in Table 2, including the chemistries of hot metal, steel and slag; temperature; charges (hot metal, scrap and fluxes) and lance height. Note: dC/dt (kg C/s) is calculated from the waste gas composition, which was recorded in the operating system every second, while dc/dt (kg C/min) is converted from dC/dt (kg C/s). The decarburization rate is generally expressed as dc/dt (kg C/min), and therefore, the target of this study is to predict dc/dt (kg C/min) by a machine-learning (ML)-based algorithm from the operating parameters.
It should be pointed out that dC/dt (kg C/s) is still mentioned in this study for the purpose of explanation.
In order to validate the correlation between dc/dt and the operating parameters established from the six-ton BOF converter (i.e., the ML-based methodology for predicting dc/dt from operating parameters, explained in Section 4, Methods), production information was taken from a 330-ton converter at the Tata Steel UK Port Talbot Plant. The operating parameters and the resultant information include oxygen flow rate; total oxygen flow; off-gas analysis (temperature, rate and composition of CO, CO2, N2 and O2) and dC/dt (kg C/s) calculated from the off-gas analysis. A total of 1100 data points, obtained at different times for each feature listed in Table 1, were taken from the 6-ton pilot reactor and used as training and test data. In addition, a total of 1200 points corresponding to known conditions in the 330-ton full-scale reactor were used as additional test data to investigate whether the trained model could predict the decarburization rate in a scaled-up system. It should be noted that the levels/quantities of the parameters differ between the pilot and industrial converters; for example, the heat size (6 t for the pilot converter and 330 t for the industrial converter), blowing time (~18 min for the pilot converter and ~20 min for the industrial converter), lance height (110 to 180 mm for the pilot converter and 2.0 to 2.6 m for the industrial converter), O2 flow rate (14.00 to 17.30 Nm³/min for the pilot converter and 598 to 1025 Nm³/min for the industrial converter) and total oxygen flow (278 Nm³ for the pilot converter and 18863 Nm³ for the industrial converter).
However, the parameters and their features recorded in the two systems are similar, and in this paper, we use the dataset from one heat of each of the 6-ton pilot and the 330-ton industrial converters to develop and demonstrate the machine learning-based model for the prediction of the decarburization rate from operating parameters under different conditions.

Method
While neural networks are mainly known for applications in deep learning, they can be easily used for regression problems and are especially suitable when linear regression does not work. Neural network regression is a supervised learning algorithm available in Microsoft Azure Machine Learning Studio and requires a numerical label column [20]. Since our label column (dc/dt) was numerical, and the results generated by classical linear regression had low accuracy, we used a neural network regression. Linear regression resulted in a coefficient of determination of 0.45 (Table 3, column 2), which was deemed unacceptable and was much lower than that obtained with neural network regression (0.99). The output of a single neuron has the form $g\left(\sum_j w_{ji} x_j\right)$, where $w_{ji}$ are the weights, $x_j$ are the inputs and $g$ is the activation function. A continuous activation function, e.g., a sigmoid function of the form $1/(1 + e^{-x})$, was employed. We used "Parameter Range" in the Create trainer mode and subsequently employed the Tune Model Hyperparameters module to iterate over the possible combinations of parameters to achieve the optimal configuration. In "Hidden layer specification", we selected "fully connected case" to create a model that has exactly one hidden layer, in which the output layer is fully connected to the hidden layer and the hidden layer is fully connected to the input layer. The number of nodes in the hidden layer and the learning rate were used as the hyperparameters. The tuned number of hidden nodes was 100. The learning rate sets the size of the step taken at each iteration. A large learning rate makes the convergence faster; however, it can result in overshooting a local minimum. The tuned learning rate was 0.1. A min-max normalizer was selected to linearly rescale each feature to the [0, 1] interval.
Rescaling to the [0, 1] interval was carried out by shifting the values of each feature so that the minimum value is 0 and then dividing by the new maximum value. The error after each iteration is calculated and back-propagated through the network using the chain rule.
Figure 2 shows the correlation matrix between the different features, which was generated from the available dataset. The correlation between two features varies between −1 and 1, with 0 implying no correlation. A correlation of +1 implies a perfect positive correlation (as x increases, so does y), and −1 implies a perfect negative correlation (as x increases, y decreases). Dark blue implies a strong positive correlation, and lighter pink shows a strong negative correlation. For example, there is a strong positive correlation between total O2 flow and dO/dt and a strong negative correlation between waste gas O2 and dc/dt.
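A correlation matrix of the kind described for Figure 2 can be sketched in a few lines. The snippet below is only an illustration: it assumes the Pearson coefficient (the text does not name the coefficient used), and the column names and values are synthetic stand-ins, not the pilot-plant data.

```python
import numpy as np

def correlation_matrix(columns):
    """Pearson correlation between every pair of named columns.

    `columns` maps a (hypothetical) feature name to a 1-D array of samples.
    """
    names = list(columns)
    data = np.array([columns[n] for n in names], dtype=float)  # (n_features, n_samples)
    return names, np.corrcoef(data)  # rows are treated as variables

# Synthetic stand-in values, NOT the pilot-plant measurements.
total_o2 = np.array([10.0, 50.0, 100.0, 150.0, 200.0, 250.0])
demo = {
    "total_o2_flow": total_o2,
    "dc_dt": 0.05 * total_o2 + 1.0,           # rises linearly with oxygen
    "lance_height": 180.0 - 0.2 * total_o2,   # falls linearly with oxygen
}

names, r = correlation_matrix(demo)
for name, row in zip(names, r):
    print(f"{name:>14s}", np.round(row, 2))
```

Because the synthetic columns are exact linear functions of each other, the off-diagonal entries sit at the +1 and −1 extremes; real process data would fall in between.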

Machine Learning Predictions of the Decarburization Rate dc/dt
Total oxygen flow has a nearly perfect positive correlation (>0.9) with dc/dt (kg C/min) or dC/dt (kg C/s), total C removed, dO/dt and the waste gas CO2 content, and a weaker positive correlation with dOs/dt (0.71). The oxygen blown is mainly consumed in two ways: by decarburization, leaving the reactor as waste gas, and by the oxidation of Si, Mn, Fe and P in the liquid metal, remaining in the slag. The former is directly linked with the parameters dc/dt (or dC/dt), total C removed, dO/dt and waste gas CO2 content, while the latter is represented by dOs/dt. Furthermore, dc/dt (or dC/dt) has a nearly perfect positive correlation with total O2 flow and waste gas CO2 content and a nearly perfect negative correlation with lance height and waste gas O2. Apart from the oxygen from steel scrap and iron ore coolant, the main oxygen supply is the oxygen blown through the lance, which is also the main oxygen source for decarburization. Thus, dc/dt (or dC/dt) has a nearly perfect correlation with the total O2 flow. During the pilot plant experiment, O2 was blown at a fixed flow rate, and the refining performance in the converter was controlled by adjusting the lance height. The decarburization mainly occurred in two zones: a hot spot zone (in the vicinity of the location where the lance releases oxygen to the bath) and a gas-slag-metal droplet emulsification zone (where the available area for slag/metal/gas reaction is high). A lower lance height enlarges the hot spot zone and increases the amount of metal droplets in the emulsification zone, and the latter increases the decarburization in the gas-slag-metal droplet zone. Therefore, the overall decarburization rate increases with decreasing lance height, which explains the observed negative correlation. The decarburization rate is calculated from the waste gas composition according to the equation
$$\frac{dc}{dt} = (\mathrm{CO} + \mathrm{CO_2}) \times \text{waste gas flow rate} \times \frac{12}{22.4}$$
This explains well the nearly perfect positive correlation between dc/dt (or dC/dt) and the waste gas CO2 concentration. From the above analysis, both the total O2 flow and the lance height are controlling parameters for the decarburization, which indicates the possibility of predicting the decarburization rate from the combination of both parameters (see Section 5.3. Prediction of the dc/dt After Excluding Parameters). Finally, dO/dt has a nearly perfect positive correlation with the total O2 flow, dc/dt (or dC/dt) and waste gas CO2 content but a nearly perfect negative correlation with the lance height and waste gas O2 content (the explanation is similar to that for the dependence of dc/dt or dC/dt upon the parameters).
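The off-gas relation for the decarburization rate can be written as a small helper. This is a sketch under stated assumptions: CO and CO2 are taken as volume percentages, the waste gas flow rate is in Nm³/s, 22.4 Nm³/kmol is the molar volume of an ideal gas at normal conditions and 12 kg/kmol is the molar mass of carbon. The function names and the example numbers are illustrative and are not taken from the plant data.

```python
MOLAR_VOLUME_NM3_PER_KMOL = 22.4   # ideal gas at normal conditions
CARBON_KG_PER_KMOL = 12.0          # molar mass of carbon

def decarb_rate_kg_per_s(co_pct, co2_pct, flow_nm3_per_s):
    """dC/dt in kg C/s from off-gas CO and CO2 (vol%) and flow (Nm3/s).

    Assumes every CO or CO2 molecule in the waste gas carries one carbon
    atom that was removed from the bath.
    """
    carbon_gas_fraction = (co_pct + co2_pct) / 100.0
    kmol_carbon_per_s = carbon_gas_fraction * flow_nm3_per_s / MOLAR_VOLUME_NM3_PER_KMOL
    return kmol_carbon_per_s * CARBON_KG_PER_KMOL

def per_second_to_per_minute(rate_kg_per_s):
    """Convert dC/dt (kg C/s) to dc/dt (kg C/min)."""
    return rate_kg_per_s * 60.0

# Illustrative values only: 50 vol% CO, 20 vol% CO2, 0.4 Nm3/s of waste gas.
rate_s = decarb_rate_kg_per_s(50.0, 20.0, 0.4)
rate_min = per_second_to_per_minute(rate_s)
print(round(rate_s, 3), "kg C/s =", round(rate_min, 2), "kg C/min")
```

With these illustrative inputs the rate works out to about 0.15 kg C/s, i.e., about 9 kg C/min. The second helper makes explicit that dc/dt (kg C/min) and dC/dt (kg C/s) are the same quantity in different units.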

Prediction of dc/dt With All the Features Included in the Dataset
Initially, we used all the features present in the dataset (including dC/dt) to predict the value of the dc/dt. Figure 3 shows a statistical comparison between the actual values of the dc/dt (Figure 3a) and the predicted ones (Figure 3b). The blue bars show the histogram of the values. The green shadowed area shows the cumulative distribution function (CDF), and the blue shadowed area shows the probability density function (PDF). As shown in the figure, the predicted functions closely follow those of the actual values of the dc/dt. Both histograms show a frequency of around 700 for dc/dt values close to 0. In addition, both the histogram of the actual values of dc/dt and that of the predicted values show bumps with a frequency of about 100 for dc/dt values close to 9.6 and for dc/dt values around 17. As evident from Figure 3, the CDF of the actual values and that of the predicted values have a very similar shape. This confirms that the algorithm successfully captured the statistics of the dataset.
Table 4 compares the statistical metrics between the actual and predicted values of the dc/dt. Although the statistical metrics for the predicted values are slightly higher, it can be concluded that the machine-learning model successfully captured the statistics of the values of the dc/dt. This is further confirmed by the scatter plot in Figure 4, which shows that the neural network regression model could accurately predict the values of the dc/dt. It should be noted that a scatter plot that compares the predicted values of the test set with the "true" values of the target is one of the main tools for evaluating model performance. As shown in Figure 4, the scored label as a function of the "true" value of the dc/dt follows a y = x line, meaning the model performed very well.
Figure 5 shows the error histogram of the neural network regression model. Errors with a value of 0.000049 had the highest frequency, confirming the excellent performance of the model. Table 3 (column 3) shows the performance metrics of the neural network regression model. The metrics were recorded to be 0.029 for the mean absolute error, 0.043 for the root mean squared error, 0.005 for the relative absolute error and 0.0046 for the relative squared error. The coefficient of determination for the model was calculated to be 0.99, which shows the excellent performance of the neural network regression algorithm.
We used the permutation method to measure the feature importance for the prediction of the dc/dt. The feature importance values are shown in Table 5. It was computed that the dC/dt had the highest importance, with a value of 9.06. In second place stands the dO/dt with a value of 0.16.
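The five performance metrics and the permutation method can be illustrated with a short sketch. The metric formulas below are the standard textbook definitions and are assumed, not confirmed, to match those behind Table 3. The permutation importance is taken here as the drop in the coefficient of determination after shuffling one feature, which is also an assumption, since the scoring metric used for Table 5 is not stated. The data and the least-squares stand-in model are synthetic.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE, RMSE, relative absolute/squared error and R^2 (standard definitions)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    dev = y_true - y_true.mean()          # deviation from the mean of the target
    rse = np.sum(err ** 2) / np.sum(dev ** 2)
    return {
        "mae": np.mean(np.abs(err)),
        "rmse": np.sqrt(np.mean(err ** 2)),
        "rae": np.sum(np.abs(err)) / np.sum(np.abs(dev)),
        "rse": rse,
        "r2": 1.0 - rse,
    }

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when one feature column is shuffled (assumed scoring)."""
    rng = np.random.default_rng(seed)
    baseline = regression_metrics(y, predict(X))["r2"]
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link to y
            drops.append(baseline - regression_metrics(y, predict(X_perm))["r2"])
        importances[j] = np.mean(drops)
    return importances

# Synthetic demo: feature 0 dominates the target, feature 1 barely matters.
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 2))
y = 5.0 * X[:, 0] + 0.1 * X[:, 1]

# Stand-in model: ordinary least squares with an intercept (not the paper's network).
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(Z):
    return np.column_stack([Z, np.ones(len(Z))]) @ coef

metrics = regression_metrics(y, predict(X))
importances = permutation_importance(predict, X, y)
print({k: round(float(v), 4) for k, v in metrics.items()})
print(np.round(importances, 4))
```

Under these definitions the coefficient of determination equals one minus the relative squared error, which agrees with the pair of values quoted above (0.0046 and 0.99).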

Prediction of the dc/dt After Excluding Parameters
In order to establish whether the dc/dt could be predicted with reasonable accuracy from fewer parameters, parameters were successively removed. As discussed in the previous section, the dC/dt had a very high predictive power for the values of the dc/dt. However, it was anticipated that the dC/dt might lead to data leakage, as dc/dt (kg C/min) and dC/dt (kg C/s) are the same quantity expressed in different units. Thus, the feature dC/dt was removed in order to measure the performance of the neural network regression model (Table 3, column 4). It was found that the scored labels (predicted values) have very similar histogram, CDF and PDF graphs to those of the actual values of the dc/dt. Similar to the previous calculations, the statistical parameters of the predicted values were slightly higher; however, the difference is very small, and it can thus be said that the model was able to capture the statistics of the data when the feature dC/dt was removed.
Then, we attempted to predict the value of the dc/dt using only two features: namely, total O2 flow and lance height, because these are the two inputs that are controllable in an industrial reactor. The oxygen blown into the converter is used for decarburization (dO/dt), the oxidization of other elements into the slag (dOs/dt), oxygen in the waste gas, etc.; therefore, in the prediction of the dc/dt, the features dO/dt, dOs/dt, etc. are excluded. Figure 6 shows the scatter plot of the dc/dt versus the predicted values. As is evident from the figure, except for some values of the dc/dt where the predicted values were slightly lower, the neural network regression model was able to predict the dc/dt with good accuracy.
Figure 7 shows the error histogram for this prediction. The most frequent error was 0.0000033. Table 3 (column 5) shows the performance metrics for the prediction of the dc/dt using only two features. For this computation, the mean absolute error was calculated to be 0.034, the root mean squared error to be 0.06, the relative absolute error to be 0.008 and the relative squared error to be 0.0001. The coefficient of determination was computed to be 0.97. These performance metrics showed that we were able to successfully predict the value of the dc/dt using only the two variables of total oxygen flow and lance height.
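To make the two-feature set-up concrete, the sketch below trains a one-hidden-layer sigmoid network on min-max-scaled total O2 flow and lance height using plain gradient descent with back-propagation. It is a minimal NumPy illustration, not the Azure Machine Learning Studio model used in the study. The data are synthetic (only the value ranges echo the pilot converter), the target relation is made up, the target is also rescaled for convenience, and the hidden layer is kept at 8 nodes rather than the tuned 100 so that this simple full-batch update stays stable at a learning rate of 0.1.

```python
import numpy as np

def min_max_scale(values):
    """Shift each column so its minimum is 0, then divide by the new maximum."""
    shifted = values - values.min(axis=0)
    return shifted / shifted.max(axis=0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_network(X, y, hidden=8, learning_rate=0.1, epochs=2000, seed=0):
    """One fully connected hidden layer (sigmoid) and a linear output neuron."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W1 = rng.normal(scale=0.5, size=(n_features, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.1, size=hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        # Forward pass
        H = sigmoid(X @ W1 + b1)           # hidden activations, (n_samples, hidden)
        pred = H @ W2 + b2                 # network output, (n_samples,)
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: chain rule applied to the mean squared error
        grad_pred = 2.0 * err / n_samples
        grad_W2 = H.T @ grad_pred
        grad_b2 = grad_pred.sum()
        grad_Z = np.outer(grad_pred, W2) * H * (1.0 - H)
        grad_W1 = X.T @ grad_Z
        grad_b1 = grad_Z.sum(axis=0)
        # Gradient-descent update
        W1 -= learning_rate * grad_W1
        b1 -= learning_rate * grad_b1
        W2 -= learning_rate * grad_W2
        b2 -= learning_rate * grad_b2

    def predict(X_new):
        return sigmoid(X_new @ W1 + b1) @ W2 + b2

    return predict, losses

# Synthetic stand-in data, NOT the pilot-plant measurements.
rng = np.random.default_rng(42)
total_o2 = rng.uniform(0.0, 278.0, size=200)        # Nm3, pilot-scale range
lance_height = rng.uniform(110.0, 180.0, size=200)  # mm, pilot-scale range
dc_dt = 0.05 * total_o2 - 0.03 * lance_height + 6.0  # made-up relation

X = min_max_scale(np.column_stack([total_o2, lance_height]))
y = min_max_scale(dc_dt)  # target rescaled too (a choice made for this sketch)

predict, losses = train_network(X, y)
print("first loss:", round(losses[0], 4), "last loss:", round(losses[-1], 6))
```

The signs of the made-up relation mirror the reported correlations (dc/dt rising with total oxygen flow and falling with lance height); the magnitudes carry no physical meaning.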
Figure 6. Scatter plot comparing the predicted values of the dc/dt using the neural network method with the actual values of dc/dt by using the two features of total oxygen flow and lance height in the pilot dataset.
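The performance metrics quoted above can be computed from the actual and predicted values as sketched below. The paper does not state the formulas it used, so the conventional definitions are assumed here: the relative errors are normalized by the deviation of the actual values from their mean, and the coefficient of determination is taken as one minus the relative squared error. The sample values are hypothetical.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, RMSE, RAE, RSE and R^2 under the conventional definitions."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    sq_err = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    # Baseline: error of always predicting the mean of the actual values.
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    rse = sum(sq_err) / sq_dev
    return {
        "mae": sum(abs_err) / n,
        "rmse": math.sqrt(sum(sq_err) / n),
        "rae": sum(abs_err) / abs_dev,
        "rse": rse,
        "r2": 1.0 - rse,
    }

# Hypothetical actual and predicted decarburization rates.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
metrics = regression_metrics(actual, predicted)
```

Because the relative errors are dimensionless, they allow a comparison between datasets whose dc/dt values span different ranges, whereas the mean absolute error and root mean squared error carry the units of the target.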

Prediction of the dc/dt for an Industrial Dataset
We used a dataset acquired from an industrial reactor to evaluate the performance of our trained model. Figure 8 compares the predicted values with the actual values of the dc/dt. We emphasize again that the closer the predicted values of the target, plotted against the "true" values of the test samples, lie to the "y = x" line, the better the model performance. Note that only two features were used for this prediction: total O2 flow and lance height. Predicting the target with fewer variables/predictors is a general goal of machine-learning development. With many predictor variables, there is a high chance of hidden relationships between some of them, leading to redundancy; and even if no such relationships exist, a model with a large number of predictor variables can suffer from overfitting. In addition, a model that predicts with fewer predictor variables is more practical with respect to data availability, storage, computing resources, computation time, etc. Thus, one of the achievements of this study is that the developed model can predict the target using only two predictor variables.
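Agreement with the "y = x" line can also be checked numerically. The sketch below fits a least-squares line to predicted versus actual values; a slope near 1 and an intercept near 0 indicate that the points follow the parity line. This is an illustrative check, not a step reported in the study, and the sample values are hypothetical.

```python
def parity_fit(actual, predicted):
    """Least-squares slope and intercept of predicted regressed on actual."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    sxx = sum((a - mean_a) ** 2 for a in actual)
    sxy = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    slope = sxy / sxx
    intercept = mean_p - slope * mean_a
    return slope, intercept

# Hypothetical actual and predicted values of the target.
actual = [0.5, 1.0, 1.5, 2.0, 2.5]
predicted = [0.52, 0.97, 1.55, 1.98, 2.49]

slope, intercept = parity_fit(actual, predicted)
# Largest vertical distance of any point from the y = x line.
max_dev = max(abs(p - a) for a, p in zip(actual, predicted))
```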
The scatter plot demonstrates that our trained model predicted the dc/dt closely. Figure 9 shows the error histogram for this prediction. The most frequent error was 0.000026. The mean absolute error, root mean squared error, relative absolute error and relative squared error (Table 3, column 6) were 0.25, 0.62, 0.04 and 0.009, respectively. The coefficient of determination was computed to be 0.98. These values confirm that our trained model can be used in industry to predict and control the variation of the dc/dt in an actual reactor. Figure 8 shows that the machine-learning algorithm can predict the decarburization rate very accurately without employing any simplifications and without explicitly accounting for all the reactions, interactions, mass and heat transfers and fluid flows. In fact, the effects of all of these phenomena are already embedded in the dataset, and an algorithm trained on the real dataset naturally learns the relationships between the parameters involved in the process without knowing their physical meanings.
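The "most frequent error" read from an error histogram can be approximated as the centre of the most populated bin. The sketch below assumes equal-width bins and an arbitrary bin count; the paper does not state how its histograms were binned, and the error values are hypothetical.

```python
def modal_error_bin(errors, n_bins=10):
    """Return the centre of the most populated equal-width bin and the counts."""
    lo, hi = min(errors), max(errors)
    # Fall back to a unit width if all errors are identical.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for e in errors:
        # Clamp so that the maximum error falls into the last bin.
        idx = min(int((e - lo) / width), n_bins - 1)
        counts[idx] += 1
    best = counts.index(max(counts))
    centre = lo + (best + 0.5) * width
    return centre, counts

# Hypothetical prediction errors (predicted minus actual).
errors = [-0.4, -0.05, 0.03, 0.05, 0.1, 0.6]
centre, counts = modal_error_bin(errors, n_bins=5)
```

The result depends on the chosen bin count, so a modal error is best reported together with the bin width.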

Conclusions
We applied a machine learning-based analysis to a dataset (operating and output data) from a pilot basic oxygen steelmaking (BOS) converter. Correlation analysis showed:
• A strong positive correlation between the rate of decarburization (dc/dt) and total oxygen flow.
• A negative correlation between the dc/dt and lance height.
• Less obviously, a positive correlation between the dc/dt and the temperature of the waste gas, the CO2 content in the waste gas and the O2 in the waste gas.
The pilot plant dataset provided the training and test data used to develop a neural network-based regression to predict the decarburization rate. The developed algorithm was then used successfully to predict the decarburization rate in a BOS furnace in an actual manufacturing plant based on only two operating parameters, total oxygen flow and lance height.
The performance was satisfactory, with a coefficient of determination of 0.98, confirming that the trained model can adequately predict the variation in the dc/dt within BOS reactors.
The method is easily scalable for industrial applications. In addition, the machine-learning model does not simplify the problem and is able to predict the decarburization rate accurately by learning from the real dataset acquired from the BOS pilot plant.