4.1. Model for Predicting Nitrogen Content in Molten Desulphurized Pig Iron
Table 2 in Section 3.1 ranks the significant factors that cause the metal to become saturated with nitrogen during pig iron desulfurization. The nitrogen content of pig iron after desulfurization is most strongly influenced by the amount of sulfur removed. Sulfur is a strong surface-active element in pig iron: it occupies active sites at the metal–gas interface and slows the dissociation of molecular nitrogen {N2} from the carrier gas. During desulfurization, the activity of sulfur in the metal decreases, which frees reaction sites at the phase interface and accelerates the dissolution of atomic nitrogen [N] into the metal. Therefore, the more sulfur is removed from the pig iron, the more nitrogen dissolves in it. This is closely related to the amount of carrier gas used for the desulfurization mixture, which is nitrogen. Nitrogen enters the metal by adsorption at the phase interface, where the molecules dissociate into atoms that then dissolve into the bulk metal. The amount of nitrogen supplied as carrier gas is related to the amount of desulfurization mixture used. Because the amount of desulfurization mixture added per given volume of carrier gas can vary, both factors influence the oversaturation of pig iron with nitrogen. The weight of the pig iron is directly proportional to the amount of desulfurization mixture (for pig iron with a consistent sulfur content) and to the volume of nitrogen used as the carrier gas. The temperature difference between the beginning and end of desulfurization is significant because nitrogen solubility in the metal depends on temperature.
As shown in Table 5, the ordinary least squares (OLS) method produced a valid estimate. The mean of the dependent variable reported there is the average nitrogen content of the metal after desulfurization across the processed dataset.
The sum of squared residuals is the sum of the squared differences between the measured and estimated values of the dependent variable. OLS chooses the regression coefficients so that this sum is minimal.
The coefficient of multiple determination (R²) measures the goodness of fit of the regression. It takes values in the interval [0, 1], and higher values indicate a better fit. How the obtained value should be interpreted depends to a significant extent on the nature of the processed data. The variables processed here are non-stationary, which is characteristic of the metallurgical industry. When such non-stationary variables are obtained from integration systems (metallurgical aggregates, storage tanks, etc.), R² values of only around 0.15 are often achieved. However, this does not necessarily indicate a poorly fitting regression [44].
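The two fit statistics discussed above can be computed directly from measured and fitted values. The following is a minimal sketch, assuming the standard definitions SSR = Σ(y − ŷ)² and R² = 1 − SSR/SST; the function names and the sample values are illustrative and are not taken from the study's dataset.

```python
def sum_squared_residuals(measured, predicted):
    """Sum of squared differences between measured and predicted values."""
    return sum((m - p) ** 2 for m, p in zip(measured, predicted))


def r_squared(measured, predicted):
    """Coefficient of determination, assuming R^2 = 1 - SSR / SST."""
    mean = sum(measured) / len(measured)
    total_ss = sum((m - mean) ** 2 for m in measured)
    return 1.0 - sum_squared_residuals(measured, predicted) / total_ss


# Made-up nitrogen contents in wt.% (illustration only, not study data)
measured = [0.0030, 0.0034, 0.0028, 0.0036]
predicted = [0.0031, 0.0033, 0.0030, 0.0034]

ssr = sum_squared_residuals(measured, predicted)
r2 = r_squared(measured, predicted)
print(f"SSR = {ssr:.3e}, R^2 = {r2:.3f}")
```

Because nitrogen contents are of the order of 10⁻³ wt.%, the sum of squared residuals is small in absolute terms for any reasonable fit, so it is most informative when read together with R².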
The F-test is a statistical procedure used to assess whether the standard deviations of two datasets are equal [68]. Its purpose is to determine whether the spread of the examined datasets differs, i.e., whether the null hypothesis holds [69]. The null hypothesis has the form H₀: σ₁² = σ₂², and the alternative hypothesis has the form H₁: σ₁² ≠ σ₂². The critical value of the F distribution at a significance level of 10% (two-tailed test) is Fcrit(5, 70) = 1.931. Since Fcrit(5, 70) > F(5, 70), i.e., 1.931 > 1.5103, the null hypothesis is not rejected: at the 10% significance level, the standard deviations of the datasets do not differ significantly.
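The decision rule used here, and again for the two later models, reduces to comparing the computed F statistic with the critical value. A minimal sketch follows; the helper name and dictionary keys are hypothetical, and the numbers are the F statistics and critical values quoted in Sections 4.1 to 4.3.

```python
def rejects_null(f_stat, f_crit):
    """Return True when the F statistic exceeds the critical value (H0 rejected)."""
    return f_stat > f_crit


# (F statistic, critical value at the 10% level) as quoted in the text
cases = {
    "NDeS, F(5, 70)": (1.5103, 1.931),
    "NBOF, F(7, 55)": (2.326593, 1.829),
    "NSMB, F(7, 67)": (6.453, 1.808),
}

decisions = {name: rejects_null(f, crit) for name, (f, crit) in cases.items()}

for name, rejected in decisions.items():
    verdict = "H0 rejected" if rejected else "H0 not rejected"
    print(f"{name}: {verdict}")
```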
The Durbin–Watson test was used to check for autocorrelation, i.e., whether the random components influence each other. At the relevant significance level, the null hypothesis (H₀: there is no autocorrelation) is tested against the alternative hypothesis (H₁: autocorrelation is present). The test statistic takes values from 0 to 4, with values around 2 indicating the absence of autocorrelation [70]. With a calculated Durbin–Watson statistic of DW = 1.641496, the null hypothesis (H₀) of no autocorrelation is not rejected, so the random components are considered statistically independent at the relevant significance level (α). The Granger–Newbold comparison uses the Durbin–Watson statistic to flag spurious regression: spurious regression is indicated when the inequality in Equation (28) is fulfilled, i.e., when the coefficient of multiple determination (R² = 0.097373) exceeds the Durbin–Watson statistic [71,72].
For the regression reported in Table 5, this inequality is not satisfied (0.097373 < 1.641496), which indicates the absence of spurious regression. Cointegration analysis of the time series was then used to rule out spurious regression more rigorously [72].
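Both checks can be expressed in a few lines. The sketch below assumes the usual definition of the Durbin–Watson statistic, DW = Σ(e_t − e_{t−1})² / Σ e_t², and takes the Granger–Newbold rule as "R² > DW flags possible spurious regression", as described in the text for Equation (28). The function names are hypothetical and the residual series is made up.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: squared successive differences over squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def flags_spurious_regression(r2, dw):
    """Granger-Newbold rule of thumb: R^2 greater than DW suggests spurious regression."""
    return r2 > dw


# Made-up residuals that alternate in sign (strong negative autocorrelation)
example_dw = durbin_watson([1.0, -1.0, 1.0, -1.0])

# Values reported for the NDeS regression in the text
ndes_flag = flags_spurious_regression(r2=0.097373, dw=1.641496)
print(example_dw, ndes_flag)
```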
The augmented Dickey–Fuller test was used to test for cointegration. It tests the null hypothesis H₀: the variables are not cointegrated, against the alternative hypothesis H₁: the variables are cointegrated. Figure 11 shows the result of the cointegration analysis of the dataset in Gretl 2025a.
As shown in Figure 11, the p-value is 0.0246. Because this value is lower than the significance level α = 0.05, the null hypothesis H₀ is rejected in favor of the alternative hypothesis H₁. The factors are therefore cointegrated: each is non-stationary on its own, but their linear combination is stationary. Consequently, spurious regression can be ruled out for this model. The strongly negative test statistic tau_c(6) = −4.9592 likewise provides strong evidence against the null hypothesis.
The parameters outlined above demonstrate the suitability of the proposed configuration of variables (Table 4). Based on Equation (23), a mathematical model can be formulated to predict the nitrogen content in desulfurized pig iron (NDeS). The resulting model is given by Equation (29):
where:
NDeS: predicted nitrogen content in desulphurized pig iron;
A1: amount of sulfur removed [%];
A2: amount of blown nitrogen as carrier gas for the desulphurization mixture [l];
A3: weight of pig iron after desulfurization [kg];
A4: amount of desulphurization mixture [kg];
A5: temperature difference of pig iron before and after desulfurization [°C].
Validity ranges of the model (29) are listed in
Table 18.
The proposed model was subjected to a diagnostic process that combined formal tests with graphical tools. The tests covered normality, heteroscedasticity, multicollinearity, and autocorrelation, and the graphical tools included scatter plots, line graphs, and histograms. The evaluation of the model is based on the residuals, i.e., the differences between the measured and predicted nitrogen contents of the desulfurized pig iron. Figure 12a presents the residual variance graphically, while Figure 12b shows the residuals over time.
As shown in Figure 12a, the residuals are randomly dispersed around zero, and no trend or pattern can be observed. The model is therefore appropriately specified and meets the regression assumptions. Figure 12b shows that the sign of the residuals changes frequently over time, which suggests that the model does not systematically overestimate or underestimate the nitrogen content of the metal. The sum of squared residuals (Table 5), 0.000036, supports this, as it is close to zero. The normality of the residuals is evident from Figure 13, in which the red dots lie close to the blue line, and is also confirmed by the histogram in Figure 14.
Figure 15 compares the measured values with those predicted by the NDeS model (29). The red curve shows the measured values, the blue curve shows the predicted nitrogen content of pig iron after desulfurization, and the green curves bound the 95% confidence interval. The standard deviation of the residuals is 0.000718051. This low value indicates only small discrepancies between the measured and predicted datasets, so the NDeS model (29) provides accurate results.
Heteroscedasticity tests, which test for non-constant variance of the random components, are used to detect misspecification of the proposed model [73]. They also help verify whether any significant variables have been omitted from the mathematical model [74]. White’s test and the Breusch–Pagan test were used. For both tests, the null hypothesis H₀ (no heteroscedasticity) is tested against the alternative hypothesis H₁ (heteroscedasticity present). The test statistic for White’s test is 15.0079. The null hypothesis is rejected if the test statistic is greater than the critical value χ²(20) at the chosen significance level α. This is not the case, because χ²(20) = 31.41 > 15.0079, so the null hypothesis H₀ of no heteroscedasticity is not rejected. The Breusch–Pagan test statistic is 3.00616. The null hypothesis H₀ is rejected if this statistic is greater than the critical value χ²(5) at the chosen significance level α. This is not the case either, because χ²(5) = 11.07 > 3.00616, so the null hypothesis (H₀) of no heteroscedasticity is again not rejected. Neither test detects heteroscedasticity; the variance of the random components is therefore homoscedastic, which suggests that no significant variable has been omitted from the proposed model.
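The degrees of freedom quoted for the two tests can be reproduced with a short sketch. It assumes that White's auxiliary regression contains the k regressors, their squares, and all pairwise cross-products, and that the Breusch–Pagan test uses the k regressors themselves; which variant the software actually applied is an assumption here, and the function names are hypothetical.

```python
def white_test_df(k):
    """Degrees of freedom for White's test: k levels + k squares + k(k-1)/2 cross-products."""
    return k + k + k * (k - 1) // 2


def breusch_pagan_df(k):
    """Degrees of freedom for the Breusch-Pagan test with all k regressors."""
    return k


def rejects_homoscedasticity(statistic, chi2_critical):
    """H0 (no heteroscedasticity) is rejected when the statistic exceeds the critical value."""
    return statistic > chi2_critical


k = 5  # number of regressors in the NDeS model
white_df = white_test_df(k)
bp_df = breusch_pagan_df(k)

# Statistics and critical values as quoted in the text for the NDeS model
white_rejects = rejects_homoscedasticity(15.0079, 31.41)
bp_rejects = rejects_homoscedasticity(3.00616, 11.07)
print(white_df, bp_df, white_rejects, bp_rejects)
```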
Multicollinearity testing is used to verify the suitability of the factors affecting the amount of nitrogen in the metal. In a multiple regression model, multicollinearity is the extent to which two or more predictors are correlated with each other. If the correlation is high, even a small change in the dataset can substantially change the estimated coefficients. However, multicollinearity does not reduce the model’s overall predictive power and reliability; it only affects the calculations relating to individual predictors [75]. Multicollinearity is assessed using the Variance Inflation Factor (VIF) [76], whose values for the analyzed factors a1–a5 (Table 4) are provided in Table 19. The minimum value is 1, and values above 10 indicate high multicollinearity.
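To illustrate how the VIF relates to correlation between predictors, the sketch below uses the standard definition VIF_j = 1 / (1 − R_j²), where R_j² is the coefficient of determination from regressing predictor j on the remaining predictors. For brevity it covers only the two-predictor case, in which R_j² equals the squared Pearson correlation; the data and function names are illustrative assumptions, not values from Table 19.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_from_r2(r2_j):
    """Variance inflation factor from the auxiliary R^2 of predictor j."""
    return 1.0 / (1.0 - r2_j)


# Made-up predictor values (illustration only)
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]

r = pearson_r(x1, x2)
vif = vif_from_r2(r ** 2)
print(f"r = {r:.2f}, VIF = {vif:.4f}")
```

An auxiliary R² of 0.9 corresponds to a VIF of 10, which is the threshold for high multicollinearity mentioned above.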
Time series extrapolation produces forecasts based on estimates of the parameters of a specific mathematical model whose quality has been confirmed by various statistical tests. Therefore, it can be expected that the resulting forecasts will not differ greatly from reality. The accuracy of the forecasts is assessed using various average characteristics.
The Mean Absolute Error (MAE) measures the average absolute deviation of the actual values from the predicted values. Substituting into Equation (24) gives MAEDeS = 1.3374 × 10⁻¹⁰, so the Mean Absolute Error is negligible.
The Mean Percentage Error (MPE) expresses the degree of bias. Substituting into Equation (25) gives MPEDeS = −3.5785%. The proposed model therefore systematically overestimates reality, with the predicted values being, on average, 3.5785% higher than the actual values.
The Mean Absolute Percentage Error (MAPE) is the average magnitude of the forecast errors, expressed as a percentage of the actual values, over the entire forecast period. Substituting into Equation (26) gives MAPEDeS = 16.5411%.
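The three accuracy measures can be sketched as follows. Equations (24) to (26) are not reproduced in this section, so the code assumes the common textbook definitions, with the error taken as actual minus predicted; under that sign convention a negative MPE means the model overpredicts, which matches the interpretation given above. The sample numbers are made up.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_percentage_error(actual, predicted):
    """Average signed error in percent; negative values indicate overprediction."""
    return 100.0 * sum((a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error in percent of the actual values."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up values (illustration only, not study data)
actual = [100.0, 200.0]
predicted = [110.0, 190.0]

mae = mean_absolute_error(actual, predicted)
mpe = mean_percentage_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
print(mae, mpe, mape)
```

Because positive and negative errors cancel in the MPE but not in the MAPE, a small MPE alongside a much larger MAPE indicates errors that are sizeable but only weakly biased.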
The accuracy of the NDeS model (29) is determined by substituting the model’s parameters into Equation (27). This gives Equation (30), which yields the accuracy of the NDeS model stated in Equation (31).
4.2. Model for Predicting Nitrogen Content in Molten Crude Steel Before Tapping from BOF
Section 3.2 and Table 6 show the ranking of factors affecting the amount of dissolved nitrogen in crude steel before it is tapped from the BOF. The results can be interpreted as follows. The most significant factor causing increased nitrogen content in crude steel is the total oxygen reblow time, which agrees well with operational experience. The nitrogen content of high-purity oxygen (oxygen purity level of 95%) can vary significantly, from 70 to 1250 ppm. A reblow is performed because of either an inadequate chemical composition or a low temperature of the crude steel, and it increases the amount of nitrogen dissolved in the metal. The manganese, phosphorus, and carbon contents relate to the amounts of these elements removed during refining. The more of these elements is removed during the heat, the higher the nitrogen content of the crude steel, which is closely related to the blowing time, the amount of high-purity oxygen supplied, and the intensity of oxygen blowing. The effect of the amount of briquettes added on dissolved nitrogen in the crude steel is related to their binder, molasses, which is used in briquette production at a proportion of up to 10 wt.%. Sugar beet molasses contains nitrogen in its structure and has the chemical formula C6H12NNaO3S [77]. The tapping temperature also affects the amount of nitrogen dissolved in crude steel: more nitrogen dissolves at higher tapping temperatures because the solubility of nitrogen in molten steel increases with temperature.
Table 9 shows that the OLS method produced a valid estimate. The ratio of the standard deviation of the dependent variable (0.000614) to its mean (0.002002) gives a coefficient of variation of approximately 0.31 (SD/Mean). The variability of the dependent variable is therefore moderate, amounting to approximately 31% of its mean value.
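The coefficient of variation quoted above is a simple ratio and can be checked directly. The sketch below uses the standard deviations and means reported in the text for the BOF model and for the secondary metallurgy model in Section 4.3; the function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Ratio of the standard deviation to the mean (SD/Mean)."""
    return std_dev / mean


# Standard deviation and mean of the dependent variable, as quoted in the text
cv_bof = coefficient_of_variation(0.000614, 0.002002)
cv_smb = coefficient_of_variation(0.000888, 0.003160)

print(f"BOF: {cv_bof:.2f}, SMB: {cv_smb:.2f}")
```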
According to Cohen’s classification (Table 1), the coefficient of determination reaches only moderate values. However, the data come from integrated systems and are non-stationary operational data, for which achieving even average correlation coefficients is a substantial accomplishment.
The sum of squares of residuals is 0.000017. A low value indicates that the absolute errors of the model are very small, which is a positive sign for the accuracy of predictions.
The F-test evaluates the null hypothesis H₀: σ₁² = σ₂² against the alternative hypothesis H₁: σ₁² ≠ σ₂². The critical value of the F distribution at a significance level of 10% is Fcrit(7, 55) = 1.829. Since Fcrit(7, 55) < F(7, 55), i.e., 1.829 < 2.326593, the null hypothesis is rejected: the standard deviations of the datasets differ significantly at the 10% significance level.
The Durbin–Watson statistic of 2.000830 is almost ideal. A value close to 2.0 indicates the absence of autocorrelation in the model residuals and thus supports the null hypothesis H₀ of no autocorrelation. This is a highly positive finding, as it fulfils one of the fundamental assumptions of linear regression, namely the independence of errors. In the Granger–Newbold comparison based on Equation (28), there is no indication of spurious regression, because the coefficient of determination R² is lower than the Durbin–Watson statistic.
The augmented Dickey–Fuller test was applied to assess cointegration. This test evaluates the null hypothesis, H₀: the variables are not cointegrated, against the alternative hypothesis, H₁: the variables are cointegrated. Figure 16 presents the results of this cointegration analysis, carried out in Gretl 2025a.
As shown in Figure 16, the p-value is 0.01596, which is below the significance level (α = 0.05), leading us to reject the null hypothesis (H₀) in favor of the alternative (H₁). Consequently, the series are cointegrated: each is non-stationary on its own, but their linear combination is stationary. Accordingly, spurious regression can be ruled out for this model. Moreover, the markedly negative value of tau_c(8) = −5.61958 offers strong evidence for rejecting the null hypothesis.
The test parameters demonstrate the suitability of the configuration of variables listed in Table 8. Based on Equation (23), a mathematical model can be formulated for predicting the nitrogen content of crude steel before tapping from the basic oxygen furnace. The resulting model takes the form of Equation (32).
where:
NBOF: predicted nitrogen content in crude steel before tapping from BOF;
B1: oxygen reblow time [s];
B2: manganese content in crude steel [%];
B3: phosphorus content in crude steel [%];
B4: carbon content in crude steel [%];
B5: briquettes [kg];
B6: temperature of tapping steel [°C];
B7: oxygen blowing time [s].
The validity ranges of the model (32) are listed in Table 20.
The proposed NBOF model (32) underwent the same diagnostic process as the NDeS model (29), combining formal tests with graphical tools. The evaluation of the model is based on the residuals, i.e., the differences between the measured and predicted nitrogen contents of the crude steel before tapping. Figure 17a presents the residual variance graphically, while Figure 17b shows the residuals over time.
As shown in Figure 17a, the residuals are randomly dispersed around zero, and the graph does not exhibit any discernible trend or pattern. The model is therefore appropriately specified and meets the regression assumptions. As shown in Figure 17b, the sign of the residuals changes frequently over time, which suggests that the model does not significantly overestimate or underestimate the nitrogen content of the metal. The sum of squared residuals (Table 9), 0.000017, supports this, as it is close to zero. The normality of the residuals can be seen in Figure 18, where the red dots lie quite close to the blue line, and is also confirmed by the histogram in Figure 19.
Figure 20 compares the measured values with those predicted by the NBOF model (32). The red curve shows the measured values, the blue curve shows the predicted nitrogen content of crude steel before tapping, and the green curves bound the 95% confidence interval. The standard deviation of the residuals is 0.000548932. This low value indicates only small discrepancies between the measured and predicted datasets, so the NBOF model (32) provides accurate results.
The White and Breusch–Pagan tests were used to assess heteroscedasticity by testing the null hypothesis (H₀), which assumes the absence of heteroscedasticity, against the alternative hypothesis (H₁), which assumes its presence. The White test statistic is 34.3269. The null hypothesis is rejected if the test statistic is greater than the critical value χ²(34) at the chosen significance level α. This is not the case, because χ²(34) = 48.602 > 34.3269, so White’s test does not reject the null hypothesis of no heteroscedasticity. The Breusch–Pagan test statistic is 8.66205. The null hypothesis (H₀) is rejected if this statistic is greater than the critical value χ²(7) at the chosen significance level α. This is not the case either, as χ²(7) = 14.067 > 8.66205, so the Breusch–Pagan test likewise does not reject the null hypothesis (H₀) of no heteroscedasticity.
To assess multicollinearity, the variance inflation factor (VIF) is evaluated. The values for the analyzed factors b1–b7 (Table 8) are shown in Table 21. The results indicate low to no multicollinearity (1 is the minimum value). The independent variables are thus well separated, meaning that each contributes unique information to the model. Based on the VIF values, the proposed NBOF model (32) is also stable, with reliable regression coefficients that are unaffected by excessive correlation between variables.
The accuracy of the NBOF model (32) can be evaluated as follows. The Mean Absolute Error (MAE), calculated using Equation (24), is MAEBOF = 1.3606 × 10⁻¹⁰, so the average absolute error is very small.
The Mean Percentage Error (MPE), calculated using Equation (25), is MPEBOF = −6.1515%. The proposed model therefore systematically overestimates reality, with the predicted values being, on average, 6.1515% higher than the actual values.
The Mean Absolute Percentage Error (MAPE) expresses the average size of the forecast errors relative to the actual values across the entire forecast period. Substituting into Equation (26) gives MAPEBOF = 22.7696%. The accuracy of the NBOF model (32) is determined by substituting the model’s parameters into Equation (27). This gives Equation (33), which yields the accuracy of the NBOF model stated in Equation (34).
4.3. Model for Predicting Nitrogen Content in Molten Steel at the Beginning of Secondary Metallurgy
Section 3.3 (Table 10) shows that the nitrogen content in molten steel decreases as the tapping angle increases. This dependency is associated with the length of the tapped steel stream. With a smaller tilt of the BOF vessel, the tapped stream is longer, so more steel comes into contact with the atmosphere and the reaction area is larger. As the converter tilt increases, the stream flowing into the ladle becomes straighter and shorter, resulting in a smaller reaction surface. The tapping time is likewise significant because it determines how long the steel is in contact with air (air is approximately 78 vol.% nitrogen): a longer tapping time prolongs the exposure of the steel to air and thus increases the nitrogen content dissolved in the steel. At the beginning of secondary metallurgy, the silicon in steel comes from the FeSi ferroalloy. This ferroalloy is added to the steel only after a carbonized deoxidizer or carburizing coke has been added. This ensures that the steel boils and generates a large amount of CO bubbles, which subsequently form CO₂. This reduces the amount of active oxygen at the metal–gas interface, enabling the metal to become supersaturated with atmospheric nitrogen during intense boiling. After FeSi is added, the silicon also reacts with the active oxygen in the metal to form SiO₂, which increases the nitrogen transfer coefficient into the metal. Depending on the manufacturer, FeSi contains approximately 80–150 ppm of nitrogen. Moreover, Wagner’s interaction coefficient for the Fe–Si–N system is
. A positive value indicates that silicon increases the activity coefficient of nitrogen and thus also the equilibrium solubility of nitrogen in molten steel. At the beginning of secondary metallurgy (SM), manganese comes from both the crude steel produced in the BOF and the FeMn aff. alloy added during the SM process. Manganese increases the solubility of nitrogen in steel. Similarly, the FeMn aff. ferroalloy contains 40–80 ppm of nitrogen, depending on the supplier. Wagner’s interaction coefficient for the Fe–Mn–N system is
. A positive value indicates that manganese increases the activity coefficient of nitrogen and thus also the equilibrium solubility of nitrogen in molten steel. Oxygen is a highly active element on the surface of metal and occupies active sites at the metal–gas phase interface. Therefore, oxygen in crude steel slows down the dissolution of nitrogen in the metal. Adding a large amount of aluminum in the form of blocks significantly reduces the activity of oxygen in the metal. This reduces the amount of oxygen at the metal–gas interface and increases the nitrogen transfer coefficient, allowing nitrogen to dissolve into the metal and increasing its content. This is the reason why fully-killed steels have a higher nitrogen content than semi-killed steels, as deoxidation removes more oxygen and requires more added aluminum as a deoxidizer. During this process, the metal is mixed intensively and comes into contact with air. This is why the nitrogen that enters the metal during aluminum-based deoxidation comes from the atmosphere. For steel grades that require a very low final nitrogen value, deoxidation using aluminum is performed during processing at SM with chopped aluminum wire rather than aluminum blocks during tapping from BOF. The efficiency of deoxidation using chopped aluminum wire is 85–92%. Wagner’s interaction coefficient for the Fe–Al–N system is
. A negative value indicates that aluminum decreases the activity coefficient of nitrogen, thereby reducing the equilibrium solubility of nitrogen in molten steel. Therefore, adding 0.03% aluminum to the metal reduces the equilibrium nitrogen content by approximately 3–4 ppm at 1600 °C.
The results presented in Table 13 confirm the validity of the ordinary least squares (OLS) estimation. The coefficient of variation, calculated as the ratio of the standard deviation (0.000888) to the mean (0.003160) of the dependent variable, is approximately 0.28 (SD/Mean). The variability of the dependent variable is therefore moderate, amounting to approximately 28% of its mean value.
According to Cohen’s classification (Table 1), the coefficient of determination reaches only moderate values. However, the data come from integrated systems and are non-stationary operational data, for which achieving even average correlation coefficients is a significant achievement.
The sum of the squares of the residuals is 0.000035. This low value indicates that the model’s absolute errors are very small, suggesting that predictions will be accurate.
The F-test evaluates the null hypothesis H₀: σ₁² = σ₂² against the alternative hypothesis H₁: σ₁² ≠ σ₂². The critical value of the F distribution at a significance level of 10% is Fcrit(7, 67) = 1.808. Since Fcrit(7, 67) < F(7, 67), i.e., 1.808 < 6.453, the null hypothesis is rejected: the standard deviations of the datasets differ significantly at the 10% significance level.
The Durbin–Watson statistic of 1.872243 (Table 13) provides evidence for the absence of autocorrelation among the model residuals, thereby supporting the null hypothesis H₀ of independent error terms. This finding is important because it satisfies a fundamental assumption of linear regression analysis, namely the independence of errors. According to the Granger–Newbold criterion in Equation (28), no spurious regression is detected, since R² < DW, which supports the validity of the model.
The augmented Dickey–Fuller test was used to examine cointegration among the variables. The test evaluates the null hypothesis H₀ (absence of cointegration) against the alternative hypothesis H₁ (presence of cointegration). The results of the cointegration analysis, conducted in Gretl 2025a, are presented in Figure 21.
As shown in Figure 21, the p-value is 0.003933, which is well below the significance level (α = 0.05). The null hypothesis (H₀) is therefore rejected in favor of the alternative (H₁). Consequently, the series are cointegrated: each is non-stationary on its own, but their linear combination is stationary, and spurious regression can be ruled out for this model. Furthermore, the markedly negative value of tau_c(8) = −6.0245 provides substantial evidence for rejecting the null hypothesis.
The test parameters demonstrate the suitability of the configuration of variables listed in Table 10. Based on Equation (23), a mathematical model can be formulated for predicting the nitrogen content in steel at the beginning of secondary metallurgy. The resulting model takes the form of Equation (35).
where:
NSMB: predicted nitrogen content in steel at the beginning of secondary metallurgy;
C1: tapping angle [°];
C2: silicon in molten steel prior to argon bubbling [%];
C3: total aluminum prior to argon bubbling [%];
C4: carbon in molten steel prior to argon bubbling [%];
C5: manganese in molten steel prior to argon bubbling [%];
C6: tapping time [s];
C7: added aluminum blocks [kg].
The validity ranges of the model (35) are listed in Table 22.
The proposed NSMB model (35) underwent the same diagnostic process as the NDeS model (29) and the NBOF model (32), combining formal tests with graphical tools. The evaluation of the model is based on the residuals, i.e., the differences between the measured and predicted nitrogen contents of the steel at the beginning of secondary metallurgy. Figure 22a presents the residual variance graphically, while Figure 22b shows the residuals over time.
As shown in Figure 22a, the residuals are randomly distributed around zero without any observable systematic trend or pattern. This confirms that the model is appropriately specified and satisfies the underlying statistical assumptions. As shown in Figure 22b, the sign of the residuals changes frequently over time, which suggests that the model does not significantly overestimate or underestimate the nitrogen content of the metal. The sum of squared residuals (Table 13), 0.000035, supports this, as it is close to zero. The normality of the residuals can be seen in Figure 23, where the red dots lie quite close to the blue line, and is also confirmed by the histogram in Figure 24.
Figure 25 presents a comparative analysis of measured versus predicted values generated by the NSMB model (35). The graphical representation displays measured values (red curve), predicted nitrogen concentrations in steel at the beginning of secondary metallurgy (blue curve), and the 95% confidence interval (green curves). The calculated standard deviation of residuals is 0.000721066, indicating minimal discrepancy between observed and predicted datasets. This small deviation demonstrates that the NSMB model (35) exhibits satisfactory predictive accuracy.
The White and Breusch–Pagan tests were employed to evaluate the presence of heteroscedasticity. The null hypothesis (H0) assumes the absence of heteroscedasticity, and the alternative hypothesis (H1) assumes its presence. The White test yielded a test statistic of 42.766. The null hypothesis is rejected if the test statistic exceeds the corresponding critical value, χ²(35), at the chosen significance level, α = 0.05. This is not the case, because χ²(35) = 49.802 > 42.766. Consequently, White’s test does not reject the null hypothesis of no heteroscedasticity. The Breusch–Pagan test statistic is 4.94841. The null hypothesis (H0) is rejected if this statistic exceeds the corresponding critical value, χ²(7), at the chosen significance level. This is again not the case, as χ²(7) = 14.067 > 4.94841. According to the Breusch–Pagan test, the null hypothesis (H0) of no heteroscedasticity is therefore likewise not rejected.
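The decision rule used for both tests can be sketched in a few lines. This is only an illustration under stated assumptions: the critical value is approximated with the Wilson-Hilferty formula instead of being read from a table, and the function names `chi2_critical_approx` and `rejects_homoscedasticity` are hypothetical. The test statistics and degrees of freedom are those quoted in the text for the NSMB model.

```python
from statistics import NormalDist


def chi2_critical_approx(df: int, alpha: float = 0.05) -> float:
    """Approximate upper-tail chi-square critical value (Wilson-Hilferty).

    This is an approximation, not the exact tabulated quantile.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)  # standard normal quantile
    term = 2.0 / (9.0 * df)
    return df * (1.0 - term + z * term ** 0.5) ** 3


def rejects_homoscedasticity(statistic: float, df: int, alpha: float = 0.05) -> bool:
    """Return True when H0 (no heteroscedasticity) is rejected."""
    return statistic > chi2_critical_approx(df, alpha)


# Statistics and degrees of freedom quoted in the text for the NSMB model.
white_rejected = rejects_homoscedasticity(42.766, df=35)
breusch_pagan_rejected = rejects_homoscedasticity(4.94841, df=7)

print(f"approx. chi2(35) critical value: {chi2_critical_approx(35):.3f}")
print(f"approx. chi2(7) critical value:  {chi2_critical_approx(7):.3f}")
print("White test rejects H0:", white_rejected)
print("Breusch-Pagan test rejects H0:", breusch_pagan_rejected)
```

The approximated critical values differ slightly from the tabulated ones quoted above, but the margins in both tests are wide enough that the conclusion is the same.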
To assess multicollinearity, the variance inflation factor (VIF) was evaluated. The values for the analyzed factors c1–c7 (Table 12) are shown in
Table 23. The VIF values indicate that the regression model is relatively favorable. Those that do not exhibit multicollinearity have values close to the ideal of 1, while variables such as carbon, manganese, and silicon in steel exhibit slight multicollinearity but do not exceed the critical VIF value of 10. The VIF values for carbon, manganese, and silicon indicate their correlated behavior. However, their higher VIF test values are not a shortcoming of the model but rather reflect actual metallurgical relationships. This correlation stems from their shared roles in steelmaking processes as they naturally form part of the chemical composition of both raw iron and steel. Due to their similar affinity for oxygen at high temperatures, they react similarly with oxygen, are subject to similar thermodynamic laws in steel production and processing, and influence each other’s final properties. In the context of steel finishing in secondary metallurgy, this correlation is both expected and technologically justified, confirming the accuracy of the statistical analysis observations. Intensive mixing of the steel during tapping from the Basic Oxygen Furnace (BOF), the boiling of the steel at the bottom of the ladle, and the addition of aluminum to deoxidize the steel significantly reduce the amount of nitrogen dissolved in the metal by transporting nitrogen to the metal–slag interface. In secondary steel metallurgy, the correlation between carbon, manganese, and silicon dissolved in steel is a natural phenomenon with fundamental practical significance. Understanding and utilizing this correlation enables more efficient process control, improves product quality in terms of nitrogen content, and generates economic savings. For modern steel producers, this correlation is an invaluable tool for optimizing production processes and ensuring consistent steel quality. Due to this, each variable provides unique information to the N
SMB model (35).
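The way VIF values are read against the critical value of 10 can be illustrated with a minimal sketch. It assumes the standard definition VIF_j = 1/(1 − R_j²), where R_j² is the coefficient of determination obtained by regressing predictor j on the remaining predictors. The R_j² values below are hypothetical and are not taken from Table 23, and the function names are illustrative only.

```python
def variance_inflation_factor(r_squared_j: float) -> float:
    """VIF_j = 1 / (1 - R_j^2) for the auxiliary regression of predictor j."""
    if not 0.0 <= r_squared_j < 1.0:
        raise ValueError("R_j^2 must lie in the interval [0, 1)")
    return 1.0 / (1.0 - r_squared_j)


def exceeds_critical_vif(vif: float, critical: float = 10.0) -> bool:
    """Return True when the VIF is above the critical value used in the text."""
    return vif > critical


# Hypothetical auxiliary R_j^2 values, chosen only for illustration.
examples = {
    "uncorrelated predictor": 0.0,
    "moderately correlated predictor": 0.8,
}
vifs = {name: variance_inflation_factor(r2) for name, r2 in examples.items()}

for name, vif in vifs.items():
    print(f"{name}: VIF = {vif:.2f}, exceeds 10: {exceeds_critical_vif(vif)}")
```

A predictor that is unrelated to the others gives a VIF of 1, whereas a predictor that is 80% explained by the others gives a VIF of about 5, which is elevated but still below the critical value.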
The accuracy of the NSMB model can be evaluated as follows: The Mean Absolute Error (MAE) was calculated using Equation (24), giving a result of MAESMB = 2.407 × 10⁻¹¹. Therefore, it can be concluded that the average absolute error is negligible.
The mean percentage error (MPE) was computed based on Equation (25). After substituting this into the relationship, the result is MPESMB = −5.3582%. Therefore, it can be concluded that the proposed model systematically overestimates reality, with the predicted values being, on average, 5.3582% higher than the actual values.
The Mean Absolute Percentage Error (MAPE) is used to express the average size of forecast errors compared to actual values across the entire forecast period. Substituting into Equation (26) gives MAPESMB = 20.0341%. The accuracy of the NSMB model can be determined by substituting its parameters into Equation (27). This establishes Equation (36), which determines the accuracy of the NSMB model, as outlined in Equation (37).
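The three error measures discussed above can be sketched as follows. Equations (24)–(26) are not reproduced in this section, so the code assumes the standard textbook definitions. The sign convention for MPE (actual minus predicted) is also an assumption, chosen because it agrees with the reading in the text that a negative MPE means the model overestimates. The nitrogen values are hypothetical and serve only to make the example runnable.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mpe(actual, predicted):
    """Mean percentage error in %; negative when predictions are too high."""
    return 100.0 * sum((a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in %."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical measured and predicted nitrogen contents [%], not from the paper.
actual = [0.0030, 0.0040, 0.0050]
predicted = [0.0033, 0.0038, 0.0055]

print(f"MAE  = {mae(actual, predicted):.6f}")
print(f"MPE  = {mpe(actual, predicted):.4f} %")
print(f"MAPE = {mape(actual, predicted):.4f} %")
```

In this toy dataset, two of the three predictions are too high, so the MPE is negative, while the MAPE ignores the sign and is therefore larger in magnitude.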
4.4. Model for Predicting Nitrogen Content in Molten Steel at the End of Secondary Metallurgy
The most significant factors affecting the amount of nitrogen dissolved in molten steel (see Table 14) can be described as follows: The solubility of nitrogen in liquid steel is governed by Sieverts’ law, and the equilibrium solubility of nitrogen in steel increases with temperature. Although thermodynamics predicts higher nitrogen solubility at higher temperatures, industrial observations show the opposite trend: the final nitrogen content in steel decreases with increasing temperature during secondary metallurgy (SM). This can be attributed to the predominance of kinetic factors over thermodynamic equilibrium. At the beginning of the SM process, a significant number of CO bubbles are generated in the ladle, which helps to mix the melt. At elevated temperatures, the volume of CO bubbles increases and the reaction [C] + [O] = {CO} proceeds at a faster rate [
78]. However, at this stage, the elevated presence of surface-active elements, such as oxygen and sulfur, inhibits rapid nitrogen removal. In the later stages of SM, when the generation of CO is reduced due to the depletion of reagents in the metal, argon assumes the role of the mixing agent. At elevated temperatures, however, the inhibitory effect of surface-active elements is reduced [
79]. It has been demonstrated that, by reducing the amount of oxygen in the metal, it is possible to effectively remove nitrogen from the metal using the residual amount of CO bubbles in combination with argon [
80]. The argon is fed into the metal through a porous plug located at the bottom of the casting ladle. As the temperature of the metal increases, the viscosity of the steel decreases, which facilitates the movement of CO and Ar bubbles. Experimental evidence shows that raising the temperature from 1550 °C to 1620 °C increases the saturation solubility of nitrogen but, at the same time, also increases the rate constant for nitrogen removal and the mass transfer coefficient [81]. During the deoxidation of steel tapped from the BOF, a carbonized deoxidizer or carburizing coke is added. This ensures that the steel boils and a large number of CO bubbles are generated. Subsequently, CO2 is formed. This reduces the amount of active oxygen at the metal–gas interface, enabling the metal to become supersaturated with atmospheric nitrogen during intense steel boiling. Adding FeMn aff. (not nitrogenous FeMnN) increases the nitrogen content of steel. The nitrogen in FeMn comes from atmospheric nitrogen that comes into contact with molten FeMn during the production process. Carousel tapping of FeMn creates a large reaction surface between the ferroalloy and the atmosphere, causing the absorption of large amounts of atmospheric nitrogen into the FeMn. The ferroalloy FeMn aff. has been found to contain nitrogen at concentrations ranging from 40 to 80 ppm, with variations depending on the supplier. The interaction coefficient for the Fe–Mn–N system, as determined by Wagner, is
. A positive value indicates that manganese increases the activity coefficient of nitrogen, thus increasing the equilibrium solubility of nitrogen in molten steel. The final manganese content at the end of secondary metallurgy is closely related to the amount of FeMn added during the secondary metallurgy stage of steel processing.
The results presented in
Table 17 confirm the validity of the ordinary least squares (OLS) estimation approach. The coefficient of variation, calculated as the ratio of the standard deviation (0.000790) to the mean value (0.003270) of the dependent variable, equals approximately 0.24 (SD/Mean). Consequently, the dataset displays low relative dispersion, indicating that variability constitutes only 24% of the central tendency. This supports the ordinary least squares assumptions, thereby confirming the methodological soundness of the OLS estimator and validating the reliability of the inferences derived from
Table 17.
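As a quick check of the arithmetic, the coefficient of variation follows directly from the two values quoted above, where s denotes the standard deviation and ȳ the mean of the dependent variable (the symbols are introduced here only for illustration):

```latex
\mathrm{CV} = \frac{s}{\bar{y}} = \frac{0.000790}{0.003270} \approx 0.2416 \approx 24\%
```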
In the context of analyzing industrial data from steel production, the value of R² = 0.241736 can be considered acceptable. Industrial processes are characterized by high variability and complex interactions between process variables. Consequently, even a low value of R² can be informative and useful, especially if the regression coefficients are statistically significant. Given the nature of the data being processed, the resulting coefficient of determination can therefore be considered meaningful.
The sum of the squares of the residuals is 0.000033. It is evident that the low value indicates that the model’s absolute errors are minimal, thereby suggesting that predictions will be accurate.
The F-test is a statistical procedure used to determine whether the null hypothesis (H0: ) or the alternative hypothesis (H1: ) applies. The critical value of the F-distribution for a 10% significance level is Fcrit(4, 67) = 2.031. As Fcrit(4, 67) < F(4, 67), i.e., 2.031 < 2.682089, the null hypothesis can be rejected at the 10% significance level, i.e., with 90% confidence.
With a value of 2.029951, the Durbin–Watson statistic is almost ideal. A value close to 2.0 indicates an absence of autocorrelation in the model residuals, thus supporting the null hypothesis (H0) of an absence of autocorrelation. This is a highly positive finding as it fulfils one of the fundamental assumptions of linear regression: the independence of errors. According to Granger and Newbold’s comparison of spurious regression, the relation in Equation (28) suggests that there is no indication of spurious regression in this case, as the value of the coefficient of determination R2 is lower than the DW test value.
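The two checks in this paragraph can be sketched as follows. The Durbin–Watson statistic is computed with its standard definition, DW = Σ(e_t − e_{t−1})² / Σ e_t². The spurious-regression check assumes that Equation (28), which is not reproduced here, has the form of the usual rule of thumb attributed to Granger and Newbold (R² > DW suggests a spurious regression). The residual series is hypothetical, and the function names are illustrative.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 indicate no first-order autocorrelation."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def looks_spurious(r_squared: float, dw: float) -> bool:
    """Rule of thumb (assumed form of Equation (28)): R^2 > DW hints at spurious regression."""
    return r_squared > dw


# Hypothetical residual series, used only to exercise the formula.
toy_residuals = [1.0, 0.0, -1.0, 0.0]
print(f"DW of toy residuals: {durbin_watson(toy_residuals):.2f}")

# R^2 and DW values reported in the text for the NSME model.
spurious = looks_spurious(r_squared=0.241736, dw=2.029951)
print("Spurious regression indicated:", spurious)
```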
The augmented Dickey–Fuller test was used to analyze the cointegration relationships between the variables. This test evaluates the null hypothesis (H0: absence of cointegration) against the alternative hypothesis (H1: presence of cointegration). The results of the cointegration analysis, which was conducted using the Gretl 2025a statistical software, are presented in Figure 26.
As shown in Figure 26, the p-value is 0.03379, which is below the significance level α = 0.05. The null hypothesis (H0) is therefore rejected, and the alternative hypothesis (H1) is accepted. The markedly negative value of tau_c(5) = −4.559 provides further evidence for rejecting the null hypothesis. Consequently, the series are cointegrated: while each series is non-stationary on its own, their linear combination is stationary. Spurious regression can therefore be ruled out in the mathematical model.
The test parameters demonstrate the suitability of the configuration of variables listed in
Table 14. Equation (23) can be employed to formulate a mathematical model for predicting the nitrogen content in steel at the conclusion of secondary metallurgy. The resulting model assumes the form of Equation (38).
where:
NSME: predicted nitrogen content in steel at the end of secondary metallurgy;
D1: steel temperature at the end of SM [°C];
D2: final carbon in molten steel [%];
D3: addition of FeMn aff. during SM [%];
D4: final manganese in molten steel [%].
The validity ranges of the model (38) are illustrated in
Table 24.
The proposed NSME model (38) was subjected to the same rigorous diagnostic process as the previous models. The process entailed methodical testing, thorough analysis, and graphical interpretation. The evaluation of the model is based on an analysis of residuals. As illustrated in
Figure 27a, the residual variance is presented graphically, while
Figure 27b provides a residual analysis over time.
The distribution of residuals in Figure 27a suggests that the residuals are randomly scattered with constant variance (homoscedasticity), which supports the model’s reliability. The time plot in Figure 27b hints at possible serial correlation or a systematic pattern around observations 15–20, but the test for autocorrelation of the residuals does not confirm this. It can therefore be concluded that the model is adequately specified.
Both diagnostic plots (
Figure 28 and
Figure 29) suggest that the residuals exhibit an approximately normal distribution, with slight deviations from perfect normality, particularly at extreme values and at the peak of the distribution. For practical purposes, however, we can consider the assumption of residual normality to be sufficiently satisfied.
Figure 30 shows a comparison of the measured and predicted values generated by the NSME model (38). The graph shows the measured values (red curve), the predicted nitrogen concentrations in steel at the end of secondary metallurgy (blue curve), and the 95% confidence interval (green curves). The calculated standard deviation of residuals is 0.00070872, indicating a minimal discrepancy between the observed and predicted datasets. This small deviation shows that the NSME model (38) has satisfactory predictive accuracy.
The White and Breusch–Pagan tests were employed to evaluate the presence of heteroscedasticity. The null hypothesis (H0) assumes the absence of heteroscedasticity, and the alternative hypothesis (H1) assumes its presence. The White test yielded a test statistic of 15.7133. The null hypothesis is rejected if the test statistic exceeds the corresponding critical value, χ²(14), at the chosen significance level, α = 0.05. This is not the case, because χ²(14) = 23.685 > 15.7133. Consequently, White’s test does not reject the null hypothesis of no heteroscedasticity. The Breusch–Pagan test statistic is 3.90746. The null hypothesis (H0) is rejected if this statistic exceeds the corresponding critical value, χ²(4), at the chosen significance level. This is again not the case, as χ²(4) = 9.488 > 3.90746. According to the Breusch–Pagan test, the null hypothesis (H0) of no heteroscedasticity is therefore likewise not rejected.
To assess multicollinearity, the variance inflation factor (VIF) was evaluated. The VIF values for the analyzed factors d1–d4 (Table 16) are shown in
Table 25. These values indicate that the regression model is relatively favorable. The resulting VIF statistics indicate either no multicollinearity (values around 1) or slightly increased multicollinearity (values around 6). However, all values are below the critical threshold of 10, indicating that serious multicollinearity problems do not threaten the model. The analysis of VIF values confirms the statistical robustness of the regression model. Higher values for carbon and manganese are technologically justified by the chemical dependence of these elements in steelmaking processes. Therefore, the model can be considered suitable for further analysis without the need to eliminate variables or make further structural adjustments. Because of this, each variable provides unique information to the N
SME model (38).
The accuracy of the NSME model can be evaluated as follows: The Mean Absolute Error (MAE) was calculated using Equation (24), giving a result of MAESME = 0.00056649. Therefore, it can be concluded that the average absolute error is very small.
The mean percentage error (MPE) was computed based on Equation (25). After substituting this into the relationship, the result is MPESME = −5.6189%. Consequently, it can be deduced that the proposed model systematically overestimates reality, with the predicted values being, on average, 5.6189% higher than the actual values.
The Mean Absolute Percentage Error (MAPE) is used to express the average size of forecast errors compared to actual values across the entire forecast period. Substituting into Equation (26) yields MAPESME = 19.6271%. The accuracy of the NSME model can be determined by substituting its parameters into Equation (27). This establishes Equation (39), which determines the accuracy of the NSME model, as outlined in Equation (40).