1. Introduction
Controlling the nitrogen content during steelmaking is a critical challenge for modern metallurgical operations, as nitrogen significantly affects the mechanical properties and overall quality of steel products. Precise prediction and management of nitrogen content throughout the basic oxygen furnace (BOF) steelmaking process is crucial for meeting stringent international quality standards and optimizing production efficiency [1]. Despite advances in traditional mechanistic models based on thermodynamic equilibrium calculations, conventional approaches often fail to capture the complex, multivariable interactions inherent in industrial steelmaking operations [2,3]. These operations involve multiple simultaneous reactions, variable raw material compositions and rapidly evolving process conditions, creating a highly nonlinear system [4,5]. In recent years, machine learning methodologies have emerged as transformative technologies, capable of identifying hidden patterns in historical process data and enabling more accurate real-time predictions than traditional empirical or purely thermodynamic models [6].
It has been demonstrated that elevated nitrogen concentrations adversely affect the deep drawability and age resistance of steel. They also diminish the extent of recrystallisation and compromise critical mechanical properties, including formability and tensile strength [7,8,9]. Furthermore, the presence of excessive nitrogen in conventional steel grades has a detrimental effect on weldability [10,11], particularly when nitrogen levels exceed 0.4 wt.%. Excessive nitrogen also negatively affects the core loss value of electrical steel [12]. Inadequate nitrogen control increases the tendency for cold cracking [13], particularly in high-strength steels, as interstitial nitrogen atoms generate stress concentrations that cause cracks to form and spread [13]. With increased nitrogen absorption, plastic deformability decreases, causing premature failure under load, a critical issue in structural applications where formability is essential for safety [14].
Steel produced via the basic oxygen furnace (BOF) process typically exhibits nitrogen concentrations ranging from 20 to 60 ppm. Nitrogen can be found in steel either in its elemental form or as a constituent of chemical compounds, such as nitrides [15]. The nitrogen content of steel is influenced by a variety of factors whose intricate interactions during the steelmaking process determine the final nitrogen concentration. Since the solubility of gaseous species in molten metals is typically minimal, these systems are usually treated as infinitely dilute solutions in thermodynamic analyses [16]. Equation (1) describes the dissolution mechanism of nitrogen in molten metal, while Equation (2) represents the corresponding Gibbs free energy change for this process [17,18,19]. The equilibrium constant for the reaction described in Equation (1) is expressed in the form given by Equation (3) [18,19,20]. A comprehensive explanation of all the nomenclature used in this manuscript can be found in the “Abbreviations” section at the end of the article.
Since nitrogen constitutes a dilute solution in molten metal, the activity coefficient f_N can be assumed to equal unity [21]. Since reaction (1) represents an endothermic process, the solubility of nitrogen in the molten state increases with temperature. The solubility of nitrogen in liquid metal depends strongly on gas pressure, temperature and the metal’s chemical composition, all of which vary considerably during a heat. The relationship between the concentration of dissolved elemental nitrogen and gas pressure at a constant temperature is described by Sieverts’ law (4) [20,22,23].
Strict adherence to Sieverts’ law indicates that the gas exists in elemental form within the liquid metal. In practice, however, gas solubility often exhibits a more complex dependence on pressure, suggesting that gases may exist in non-atomic forms. Furthermore, Sieverts’ law applies only at low gas pressures above the molten metal surface [24]. The temperature dependence of the concentration of dissolved elemental gas at constant pressure is given by Equation (5) [25]. The amount of dissolved nitrogen in molten metal is directly related to the equilibrium constant K_N (3). According to van ’t Hoff’s isobaric equation, this equilibrium constant varies with temperature, as expressed in Equation (6). A substantial increase in nitrogen solubility in iron occurs at 907 °C. Above this temperature, the solubility of nitrogen in γ-Fe decreases with increasing temperature due to the formation of nitrides, as shown in Equations (7) and (8) [26].
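The interplay of Sieverts’ law and the van ’t Hoff relation described above can be illustrated with a short numerical sketch. The constants used below (K_ref, ΔH) are purely illustrative placeholders and are not the fitted coefficients of Equations (3)–(6):

```python
import math

def sieverts_nitrogen(K_N: float, p_N2_atm: float) -> float:
    """Sieverts' law: dissolved nitrogen [%N] = K_N * sqrt(p_N2)."""
    return K_N * math.sqrt(p_N2_atm)

def equilibrium_constant(T_K: float, delta_H: float, K_ref: float, T_ref: float) -> float:
    """van 't Hoff temperature dependence of the equilibrium constant K_N.

    For an endothermic dissolution reaction (delta_H > 0), K_N grows with
    temperature, so nitrogen solubility in the melt increases as well.
    """
    R = 8.314  # universal gas constant, J/(mol K)
    return K_ref * math.exp(-delta_H / R * (1.0 / T_K - 1.0 / T_ref))

# Illustrative values only (not the coefficients of the cited equations):
K_hot = equilibrium_constant(T_K=1923.0, delta_H=10_500.0, K_ref=0.045, T_ref=1873.0)
n_dissolved = sieverts_nitrogen(K_hot, p_N2_atm=0.8)
```

Because the exponent is positive for T above the reference temperature, K_hot exceeds K_ref, reproducing the endothermic trend stated in the text; the square-root pressure dependence mirrors Equation (4).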
High nitrogen levels can cause gas bubbles and porosity in steel because nitrogen has low solubility in molten steel, and this solubility drops sharply during solidification. When the nitrogen content exceeds its solubility limit, nitrogen bubbles form inside the steel [27]. These bubbles form in accordance with Sieverts’ law, which describes the relationship between gas concentration in a material and the applied pressure [28].
Nitrogen in molten steel originates from multiple sources [29]: atmospheric air entrainment during the high-velocity oxygen lance blowing process, nitrogen-containing raw materials including ores and scrap, and intentional addition in certain high-strength steels where nitrogen enhances hardness [30]. The nitrogen content in steel is governed by complex thermodynamic and kinetic relationships involving temperature, pressure, chemical composition, and interfacial phenomena at gas-metal boundaries [31]. During the BOF process, nitrogen can be both absorbed and removed depending on process conditions, making its control particularly challenging [32].
While excess nitrogen is harmful, controlled additions benefit several steel grades. In austenitic stainless steels, nitrogen increases strength without sacrificing ductility. In duplex and super-duplex steels, it stabilizes and strengthens austenite, boosting chloride corrosion resistance. Certain high-temperature steels also use nitrogen for solid-solution strengthening and better creep resistance [33].
In practical industrial settings, the ideal relationships listed above are often violated due to the presence of surface-active elements, complex slag-metal equilibria and interfacial phenomena which cannot be easily characterized mathematically. Consequently, although purely mechanistic approaches are theoretically sound, they frequently fail to predict nitrogen evolution with sufficient accuracy for real-time process control without substantial empirical calibration.
This limitation has prompted the development and use of data-driven machine learning models that bypass the need for an explicit mechanistic formulation by learning the underlying relationships directly from industrial data [34]. Traditional approaches to nitrogen prediction have relied predominantly on linear regression models, which, although providing good interpretability and computational efficiency, often fail to capture the nonlinear relationships between process variables and nitrogen content [1]. Recent advances have shown that ensemble methods, neural networks and hybrid approaches that combine mechanistic knowledge with machine learning can substantially improve predictive accuracy. Some studies have reported accuracies ranging from 77% to 95% across different production stages [5,35,36].
This study addresses a research gap by developing and comparing predictive models based on various statistical methods. The study focuses on predicting the nitrogen content of pig iron after desulfurization, of crude steel before tapping from a basic oxygen furnace (BOF) and of steel at the beginning and end of secondary metallurgy. The primary objective was to determine the most effective model for a given dataset in terms of accuracy, generalization, and computational efficiency. To achieve this goal, several regression techniques were implemented, including linear regression, polynomial regression, ridge regression, decision tree regressor, random forest regressor and neural networks. Models were trained and evaluated under standardized conditions to ensure a fair comparison.
As all data comes from U. S. Steel Košice, s.r.o. (Košice, Slovak Republic), the models reflect its specific processes, materials and operating conditions. Once adapted through knowledge transfer, these models can support process control by enabling proactive parameter optimization and reducing laboratory delays. This helps to prevent excessive nitrogen content in the final steel.
2. Materials and Methods
The metal production process employed in this investigation consisted of the following sequential phases. First, pig iron from the blast furnace was pretreated by desulfurization using a vertical refractory lance and a mixture of metallic Mg and CaO (PHASE #1). This mixture was injected into the pig iron using nitrogen as the carrier gas. The desulfurized pig iron was subsequently charged into a top-blown basic oxygen furnace (BOF) vessel with a maximum capacity of 170 tonnes that had already been pre-charged with selected steel scrap (PHASE #2). A water-cooled oxygen lance was then inserted into the vessel and high-purity oxygen was blown through it at supersonic velocity for approximately 17 min. This process oxidized impurities such as silicon, carbon, manganese and phosphorus. If the required specifications were not met by the chemical composition or temperature, an oxygen reblow was performed. The crude steel was subsequently tapped into a ladle containing coke at the bottom to promote mixing through CO2 bubble generation (PHASE #3). Aluminum was added to the crude steel stream during tapping for deoxidation, followed by ferroalloys. Throughout secondary metallurgy, fine ferroalloys were added to complete the steel composition. Initially, the melt was stirred in the ladle using argon, and then gently bubbled with argon blown through a porous plug at the bottom of the ladle (PHASE #4). The treated steel was then prepared for continuous casting. None of the monitored heats underwent treatment in an RH vacuum degasser. The production process via the BOF steelmaking route, along with the sequence of the individual phases, is outlined schematically in Figure 1.
From 17 to 22 May 2025, a systematic sampling campaign was conducted, during which 291 metallic samples from 76 individual heats were collected and analyzed for nitrogen content. The sampling methodology was designed to cover all four production stages within each heat, enabling continuous monitoring of nitrogen concentration variations throughout the steelmaking process. In particular, 76 samples were taken from desulfurized pig iron in the ladle, 68 from crude steel before tapping from the BOF, 75 from molten steel at the start of secondary metallurgy and 72 at its conclusion. The deviation from the target of 76 samples per production phase was due to sampling failures, whereby certain specimens could not be evaluated reliably for nitrogen content. The research used industrial data collected at four phases of the steelmaking process, each of which is a critical control point for managing nitrogen levels. The dataset covered a wide set of process parameters, chemical compositions and operational conditions across the entire production workflow. This approach enabled the investigation of nitrogen evolution across sequential phases of the production process within identical heats, providing insights into the dynamics of nitrogen absorption and removal during steel production (Figure 1).
The primary objective was to create regression models to predict the nitrogen content of metal in the individual phases of BOF-based steel production (PHASE #1−#4). The predictive ability of the models was then compared and evaluated in terms of accuracy, generalization and computational efficiency. The following methods were implemented: linear regression, polynomial regression, ridge regression, decision tree, random forest and neural networks. The data were preprocessed, and to ensure a fair comparison across all models for each phase, every model was trained and evaluated under standardized conditions.
2.1. Analyzed Materials
The dataset comprised samples from two steel grades. The first was a structural steel containing more than 0.80% manganese with a specified minimum aluminum content (Product #1). The second was a deep-drawing, aluminum-killed steel (Product #2). Both grades conformed to the chemical composition requirements summarized in Table 1.
Nitrogen concentrations in pig iron and steel were measured at the Quantometric Laboratory (Labortest, s.r.o., Košice, Slovak Republic) using an ELTRA ON 900 combustion analyzer operating with thermal conductivity detection, in accordance with ASTM E1019. The instrument has a measurement range of 0.0001 to 0.03% nitrogen, with an accuracy of ±0.1 ppm or ±1% [37]. Quality assurance procedures included routine manufacturer calibration, annual servicing, hourly checks with a standard sample and duplicate (primary and control) analyses for each specimen [38]. Daily measurements of certified reference materials were performed to update calibration factors and correct drift. With appropriate maintenance and drift adjustment, ELTRA analyzers generally achieve a standard deviation of 1–3% and inter-day precision of ±0.1–2 ppm at low nitrogen levels, and a relative standard deviation of less than 1.5% at higher concentrations, which meets ASTM E1019-18 requirements [39].
2.2. Analyzed Parameters
The desulfurization of pig iron (PHASE #1) involved treating the iron to reduce its sulfur content before steelmaking. The dataset included 15 parameters covering the chemical composition (C, Mn, Si and P), the levels of sulfur before and after treatment, temperature readings, mass-balance data and process-timing parameters. The target variable was the nitrogen percentage measured after desulfurization.
The BOF phase (PHASE #2) represents the main steelmaking operation, whereby molten pig iron is converted into crude steel through the introduction of oxygen. The dataset contained 32 input variables, including steel and slag chemical composition, time-related parameters, temperature and oxygen-activity measurements, mass-balance data for raw materials and fluxes, and other operational factors. The target variable was the nitrogen percentage in the crude steel prior to BOF tapping.
The third phase (PHASE #3) marked the start of secondary steelmaking in the ladle, emphasizing initial composition adjustment and temperature control. The dataset contained 12 input variables, including the composition of the steel after tapping, the initial temperature of the secondary steelmaking process, the duration of the tapping process, mass-related parameters and factors related to the geometry of the tapping. The nitrogen percentage in the metal at the start of secondary steelmaking was the target variable.
The last PHASE #4 covered the conclusion of secondary steelmaking, incorporating the final stages of alloying, deoxidation and argon stirring. This dataset contained 35 input variables, including deoxidation additives (in varying forms of aluminum), the full chemical composition, process duration parameters, argon stirring conditions, mass balance data and ferroalloy addition practices. The target variable was the final nitrogen percentage in the steel following secondary metallurgy.
2.3. Applied Software
Measured nitrogen levels were assigned heat and sample identification numbers, which were then matched with the relevant database entries. These synchronized datasets provided information on the relevant chemical composition, temperature, weight and other process-specific parameters associated with each production stage.
This consolidated dataset was initially compiled using Microsoft Excel 365 (version 2601, build 16.0.19628.20166) together with the Lumivero XLSTAT 2019 statistical add-in (version 2019.2.2; Lumivero Inc., Denver, CO, USA). The data analysis workflow was implemented in Python 3 (version 3.13) utilizing the Jupyter Notebook (version 7.5.1) environment. Data manipulation and exploratory analysis were conducted using pandas (version 2.3.2) for tabular data operations and NumPy (version 2.4.0) for numerical computations. Visualization was achieved through matplotlib.pyplot (version 3.10.8) for basic plotting and seaborn for advanced statistical graphics, including heatmaps and distribution plots. Data preprocessing employed scikit-learn’s (version 1.8) StandardScaler for Z-score normalization of both input features and target variables. Processed datasets and fitted scalers were serialized using Python’s pickle module for reproducibility.
The modelling phase utilized multiple machine learning frameworks. Conventional regression models (linear, polynomial, and ridge regression with L2 regularization) were implemented via scikit-learn’s linear_model module (scikit-learn version 1.8). Tree-based ensemble methods, including Random Forest (100 estimators) and Decision Tree regressors, were deployed from scikit-learn’s tree and ensemble modules. Decision tree visualization was attempted using the graphviz library (version 0.21), though execution was hindered by system path configuration issues.
Deep learning models were constructed using PyTorch (version 2.10.0), implementing fully connected feedforward neural networks with a sequential architecture. The network comprised three hidden layers (256, 128, and 32 neurons) with ReLU activation functions and dropout regularization (p = 0.3). Training optimization employed the AdamW algorithm with adaptive learning rate scheduling (ReduceLROnPlateau) and early stopping mechanisms. Model states were persisted using PyTorch’s native state dictionary format.
Model evaluation consistently employed scikit-learn’s metrics module, calculating mean squared error (MSE), mean absolute error (MAE), root mean squared error (RMSE), coefficient of determination (R²), mean absolute percentage error (MAPE) and accuracy (100 − MAPE) across training and testing partitions. All experimental results, including performance metrics and trained models, were systematically archived in CSV and binary formats for subsequent analysis.
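As a minimal sketch of this evaluation step, the metrics can be reimplemented directly in NumPy (the study itself used scikit-learn’s metrics module); the accuracy definition 100 − MAPE follows the text above:

```python
import numpy as np

def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute the comparison metrics used in the study.

    Accuracy is defined as 100 - MAPE. MAPE assumes strictly positive
    targets, which holds for ppm-level nitrogen contents.
    """
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(mse))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    mape = float(np.mean(np.abs(err / y_true))) * 100.0
    return {"MSE": mse, "MAE": mae, "RMSE": rmse,
            "R2": r2, "MAPE": mape, "Accuracy": 100.0 - mape}
```

For example, predictions of [11, 19, 30] against targets [10, 20, 30] yield a MAPE of 5.0% and hence an accuracy of 95.0%.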
Two additional advanced regression models were incorporated to address the limitations of simpler parametric methods under constrained sample sizes. Gaussian Process Regression (GPR) was implemented via scikit-learn’s GaussianProcessRegressor class (version 1.8) using a composite kernel composed of a Matérn covariance function (ν = 2.5) combined with a WhiteKernel for noise estimation. Kernel hyperparameters were optimized by maximizing the log marginal likelihood using the L-BFGS-B algorithm with ten random restarts (n_restarts_optimizer = 10) to mitigate local optima. Target normalization was enabled (normalize_y = True) to improve numerical conditioning. Support Vector Regression (SVR) was implemented via scikit-learn’s SVR class with a radial basis function (RBF) kernel. Hyperparameter optimization was conducted using five-fold cross-validated grid search (GridSearchCV, cv = 5) over the regularization parameter C ∈ {0.1, 1, 10, 100}, the epsilon tube width ε ∈ {0.0001, 0.001, 0.01, 0.1} and the kernel coefficient γ ∈ {‘scale’, ‘auto’, 0.1, 1.0}. Both models were trained and evaluated under identical standardized conditions applied to all other models, ensuring methodological consistency.
2.4. Exploratory Analysis
An Exploratory Data Analysis (EDA) was conducted to gain a thorough understanding of the characteristics of the dataset, to identify the underlying relationships between the process parameters and to detect any potential anomalies or patterns within the steelmaking process data. This preliminary investigation was crucial for making informed decisions about feature selection and model development, as it revealed data distribution properties, variable interdependencies and potential quality issues that could affect predictive performance.
The dataset comprised four distinct subsets corresponding to sequential stages of the steelmaking process: Desulphurization of pig iron (PHASE #1; 15 parameters), BOF steel production (PHASE #2; 32 parameters), Secondary metallurgy beginning (PHASE #3; 12 parameters), and Secondary metallurgy end (PHASE #4; 35 parameters). Each subset contained 77 observations representing individual production heats.
A statistical characterization was performed using descriptive metrics, including minimum, maximum, arithmetic mean and standard deviation, for all numerical variables. These summary statistics were systematically computed and exported to structured documentation for comprehensive review. Distribution analysis used histogram visualization for each parameter to enable assessment of normality, skewness and outliers. Selected histograms for PHASE #1–#4 are shown in Figure 2a–d.
A correlation analysis was performed to quantify the linear relationships between the process variables and the target nitrogen content. The correlation matrices were visualized as annotated heatmaps using a diverging color scheme to facilitate the identification of multicollinearity and relevant predictive features. Rendering these matrices at high resolution provided visual evidence of parameter interactions across production stages. Correlation heatmaps generated for PHASE #1−#4 are depicted in Figure 3a–d.
2.5. Data Preprocessing
Following exploratory analysis, data preprocessing prepared the datasets for modelling through standardization and systematic partitioning. Z-score normalization was applied independently to input features and target variables using scikit-learn’s StandardScaler, transforming all parameters to zero mean and unit variance. This standardization ensures numerical stability during model training and enables fair comparison of feature importance across parameters with disparate scales.
Critically, separate scaler objects were fitted and preserved for both predictor and response variables, enabling inverse transformation of model predictions to original measurement units for practical interpretation. The fitted scalers and normalized datasets were serialized using Python’s pickle protocol, ensuring reproducibility and facilitating deployment in production environments. Data partitioning employed a simple random 80:20 train–test split with a fixed random seed (random_state = 42) applied independently to each production phase. This approach ensures reproducibility and enables fair cross-model comparison within each phase, as all models are trained and evaluated on identical data partitions.
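The preprocessing pipeline described above can be sketched as follows; the array shapes and file name are illustrative and are not taken from the production code:

```python
import pickle
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, scale=3.0, size=(76, 14))      # process parameters (one phase)
y = rng.normal(loc=0.004, scale=0.001, size=(76, 1))   # nitrogen content, wt.%

# Separate scalers for predictors and target, so that model predictions
# can later be inverse-transformed back to original measurement units
x_scaler, y_scaler = StandardScaler(), StandardScaler()
X_std = x_scaler.fit_transform(X)
y_std = y_scaler.fit_transform(y)

# Fixed-seed 80:20 split, identical for every model compared within a phase
X_train, X_test, y_train, y_test = train_test_split(
    X_std, y_std, test_size=0.2, random_state=42
)

# Persist the fitted scalers for reproducibility and later deployment
with open("scalers.pkl", "wb") as f:
    pickle.dump({"x": x_scaler, "y": y_scaler}, f)
```

Note that with 76 rows, scikit-learn’s ceiling rule places 16 observations in the test partition and 60 in the training partition.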
2.6. Prediction Methods
Eight distinct modelling approaches were systematically evaluated for nitrogen content prediction across four steelmaking process stages, progressing from classical parametric methods to advanced non-parametric techniques and deep learning architectures. These methods can be divided into classical regression methods (linear, polynomial and ridge regression), tree-based ensemble methods (decision tree and random forest regressors), a deep learning architecture (feedforward neural network) and kernel-based non-parametric methods (Gaussian process regression and support vector regression).
2.6.1. Linear Regression
This method served as the baseline model, establishing a linear relationship between process parameters and nitrogen content through ordinary least squares (OLS) estimation. The model assumes the functional form (9) [40]. The nomenclature is explained in the “Abbreviations” section of the article.
The strengths of linear regression include high interpretability, low computational complexity and its suitability as a reference model. However, it cannot directly capture nonlinearities and complex interactions, and it may be ineffective in highly nonlinear processes [41].
2.6.2. Polynomial Regression
This method extends the linear framework by introducing higher-order polynomial terms and interaction effects amongst predictors. The transformation generates an augmented feature space comprising quadratic terms and pairwise interactions, expressed mathematically as (10) [42,43].
Whilst maintaining linearity in parameters, this approach captures curvature and interactive effects characteristic of complex industrial processes. The polynomial degree constitutes the primary hyperparameter, with second-degree transformations offering a pragmatic balance between model complexity and interpretability. However, feature space expansion introduces elevated risks of overfitting, particularly in high-dimensional contexts with limited observations, necessitating careful regularization strategies [44,45].
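The dimensional blow-up of a second-degree expansion can be quantified directly. Assuming scikit-learn’s PolynomialFeatures convention without a bias column, n original predictors expand to n linear terms, n squared terms and C(n, 2) pairwise interactions:

```python
from math import comb

def degree2_feature_count(n_features: int, include_bias: bool = False) -> int:
    """Number of terms after a second-degree polynomial expansion:
    n linear + n squared + C(n, 2) pairwise interactions (+ optional bias)."""
    count = n_features + n_features + comb(n_features, 2)
    return count + 1 if include_bias else count

# The 14 predictors of PHASE #1 expand to 119 polynomial terms,
# far exceeding the roughly 60 rows available for training
print(degree2_feature_count(14))  # 119
```

This reproduces the 119-term feature space discussed later for PHASE #1 and makes the underdetermination of the fit explicit.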
2.6.3. Ridge Regression
This method addresses multicollinearity and model instability through L2 regularization, incorporating a penalty term proportional to the squared magnitude of coefficients. The sum of squared errors with the added penalty term α·Σβ_j² is minimized. This penalty shrinks coefficient estimates towards zero, stabilizing parameter estimation in the presence of correlated predictors whilst retaining all features in the model [46]. The hyperparameter α (in this study, α = 100) requires tuning: higher values increase regularization intensity, approaching coefficient nullification, whilst lower values approximate standard OLS estimation. Ridge regression proves particularly valuable for metallurgical datasets characterized by inherently correlated process parameters [47].
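The ridge objective admits a closed-form solution, w = (XᵀX + αI)⁻¹Xᵀy. The following NumPy sketch on synthetic, standardized, intercept-free data illustrates the shrinkage effect of α (it is a didactic stand-in for scikit-learn’s Ridge, not the study’s fitted model):

```python
import numpy as np

def ridge_fit(X: np.ndarray, y: np.ndarray, alpha: float = 100.0) -> np.ndarray:
    """Closed-form ridge solution w = (X^T X + alpha*I)^(-1) X^T y
    for standardized, intercept-free data."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 14))                         # 60 rows, 14 predictors
y = X @ rng.normal(size=14) + 0.1 * rng.normal(size=60)

w_ols = ridge_fit(X, y, alpha=1e-9)    # near-zero alpha approximates OLS
w_ridge = ridge_fit(X, y, alpha=100.0)  # strong shrinkage, as in the study
```

The coefficient vector at α = 100 has a strictly smaller norm than the near-OLS solution, which is the stabilizing behaviour described above.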
2.6.4. Decision Tree Regressor
This method employs recursive binary partitioning of the feature space, constructing a hierarchical structure through successive conditional splits. Each internal node represents a decision rule of the form x_j ≤ t, selected to maximise variance reduction in descendant nodes, typically measured by mean squared error minimization [48]. Terminal leaves contain constant predicted values, commonly the arithmetic means of training observations within that partition. This non-parametric approach inherently captures non-linear relationships and higher-order interactions without requiring explicit feature transformation or distributional assumptions [49]. The method proves intuitive for domain experts, as decision paths constitute interpretable logical rules. However, unconstrained trees exhibit high variance and a propensity for overfitting, necessitating regularization through depth limitation, minimum samples per split, or pruning strategies [50].
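The overfitting tendency of unconstrained trees noted above can be demonstrated on synthetic data (a sketch, not the plant dataset):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(size=(60, 3))
# Step-like target on feature 0 plus measurement noise
y = np.where(X[:, 0] <= 0.5, 0.002, 0.005) + rng.normal(0.0, 0.0005, size=60)

# An unconstrained tree grows until every leaf is pure, memorizing the noise;
# limiting the depth acts as the regularization described above
deep = DecisionTreeRegressor(random_state=42).fit(X, y)
shallow = DecisionTreeRegressor(max_depth=1, random_state=42).fit(X, y)

print(deep.score(X, y))  # 1.0 on training data: the noise is memorized
```

The depth-1 tree reduces to a single interpretable rule of the form x_0 ≤ t with two leaf means, while the unconstrained tree reproduces the training targets exactly.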
2.6.5. Random Forest Regressor
This method constitutes an ensemble approach combining multiple decision trees through bootstrap aggregating (bagging). Each constituent tree trains on a bootstrap sample (a random subset drawn with replacement from the training data) whilst additionally employing feature randomization at each split point, considering only a random subset of predictors [51]. This dual randomization mechanism substantially reduces model variance compared to single trees, enhancing generalization performance whilst preserving non-linearity and interaction modelling capabilities [52]. The method proves robust to noise and outliers, requires minimal hyperparameter tuning, and provides implicit feature importance rankings through mean decrease in impurity metrics. Primary hyperparameters include the number of estimators, maximum tree depth, feature subset size, and minimum leaf sample requirements [53]. In this study, the following settings were used:
n_estimators = 100;
random_state = 42;
max_depth = None;
min_samples_split = 2;
min_samples_leaf = 1;
max_features = 1.0 (all features considered at each split in scikit-learn 1.8);
bootstrap = True.
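A sketch of this configuration in scikit-learn, fitted on synthetic data for illustration only:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Configuration mirroring the settings listed above
model = RandomForestRegressor(
    n_estimators=100,
    random_state=42,
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    max_features=1.0,   # all features considered at each split
    bootstrap=True,
)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=60)  # feature 0 dominates
model.fit(X, y)

print(model.feature_importances_.argmax())  # 0: dominant predictor identified
```

The implicit importance ranking mentioned above is available after fitting via feature_importances_, here correctly singling out the dominant predictor.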
2.6.6. Feedforward Neural Networks (FNNs)
The FNN method is the simplest and most widely used neural architecture. FNNs consist of an input layer, one or more hidden layers, and an output layer, with information flowing strictly forward through weighted connections and nonlinear activations [54]. A feedforward neural network contains no closed loops in its topology. Its input nodes have no incoming arcs, and its output nodes have no outgoing arcs [55]. The architecture consists of fully connected (dense) layers arranged sequentially, each implementing an affine transformation followed by a non-linear activation function. In the present study, the network follows the architecture input_dim → 256 → 128 → 32 → 1, where input_dim corresponds to the number of process parameters specific to each production stage. The number of training parameters was as follows: PHASE #1: 40,353; PHASE #2: 44,897; PHASE #3: 39,585; PHASE #4: 45,665.
Rectified linear unit (ReLU) activation functions are used between hidden layers to improve computational efficiency and alleviate vanishing-gradient issues. Regularization is applied via dropout (p = 0.3) after the first two hidden layers, randomly deactivating neurons to reduce feature co-adaptation. The output layer contains a single neuron without an activation function, producing a continuous regression output. Training uses mini-batch stochastic gradient descent via PyTorch’s DataLoader (batch size = 4), balancing gradient stability with the small dataset size. Parameter updates employ AdamW (lr = 0.001, weight_decay = 1 × 10⁻⁴), which applies decoupled weight decay for more consistent regularization. The model is optimized using the L1 loss (MAE), selected for its robustness to outliers typical of industrial process measurements.
Adaptive learning-rate scheduling is handled by ReduceLROnPlateau, which lowers the rate by a factor of 0.1 after 10 epochs without validation-loss improvement. This supports coarse updates early in training and finer adjustments near convergence. Early stopping (patience = 20) further stabilizes training by halting optimization when validation performance no longer improves and restoring the best model state. Although the maximum training length was 300 epochs, early stopping typically ended training well before this limit. Together, adaptive scheduling and early stopping help prevent overfitting while ensuring reliable convergence, which is crucial for small metallurgical datasets.
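The described network and training components can be sketched in PyTorch as follows. This is an illustrative fragment on dummy data; the study’s full training loop, early stopping logic and exact parameter counts may differ in detail:

```python
import torch
from torch import nn

def build_fnn(input_dim: int) -> nn.Sequential:
    """Architecture described above: input_dim -> 256 -> 128 -> 32 -> 1,
    ReLU activations, dropout (p = 0.3) after the first two hidden layers."""
    return nn.Sequential(
        nn.Linear(input_dim, 256), nn.ReLU(), nn.Dropout(0.3),
        nn.Linear(256, 128), nn.ReLU(), nn.Dropout(0.3),
        nn.Linear(128, 32), nn.ReLU(),
        nn.Linear(32, 1),
    )

model = build_fnn(input_dim=14)
criterion = nn.L1Loss()  # MAE loss, robust to outliers
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.1, patience=10
)

# One illustrative update step on a dummy standardized mini-batch of size 4
x = torch.randn(4, 14)
target = torch.randn(4, 1)
optimizer.zero_grad()
loss = criterion(model(x), target)
loss.backward()
optimizer.step()
scheduler.step(loss.item())  # plateau scheduler tracks the monitored loss
```

In a full training loop, the scheduler step would be driven by the validation loss, with early stopping halting optimization after 20 stagnant epochs.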
2.6.7. Gaussian Process Regression
Gaussian Process Regression (GPR) is a non-parametric Bayesian method that defines a prior distribution directly over functions, yielding a closed-form posterior with a predictive mean (11) and variance (12) [56]. A composite Matérn (ν = 2.5) + WhiteKernel covariance was employed, with hyperparameters optimised by maximising the log-marginal likelihood. GPR was implemented via sklearn.gaussian_process.GaussianProcessRegressor (scikit-learn 1.8). Its principal advantage over all other methods in this study is the simultaneous provision of a calibrated predictive uncertainty estimate σ(x) alongside each point prediction, whilst Bayesian regularisation of model complexity makes it particularly well-suited to limited-data regression [56].
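A minimal reproduction of this setup on synthetic data (illustrative only; kernel hyperparameter bounds and data are not those of the study):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

rng = np.random.default_rng(42)
X = rng.uniform(size=(30, 2))
y = np.sin(3 * X[:, 0]) + 0.05 * rng.normal(size=30)

# Composite kernel: Matérn (nu = 2.5) for the signal plus an explicit noise term
kernel = Matern(nu=2.5) + WhiteKernel()
gpr = GaussianProcessRegressor(
    kernel=kernel,
    n_restarts_optimizer=10,  # restarts of L-BFGS-B on the log marginal likelihood
    normalize_y=True,
    random_state=42,
).fit(X, y)

# Each prediction comes with a calibrated standard deviation sigma(x)
mean, std = gpr.predict(X[:5], return_std=True)
```

The return_std=True flag is what distinguishes GPR from the other methods here: every point forecast is accompanied by an uncertainty estimate.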
2.6.8. Support Vector Regression
Support Vector Regression (SVR) seeks a function f(x) that deviates from the observed targets by at most ε, whilst minimising model complexity through the primal optimisation given in Equation (13) [57]. The dual formulation admits kernel substitution; in the present study, the Radial Basis Function (RBF) kernel k(x, x′) = exp(−γ‖x − x′‖²) was employed [58], which is well-suited to the nonlinear thermodynamic interactions governing nitrogen solubility in molten steel. The three hyperparameters C, ε and γ were jointly optimised via exhaustive grid search with 5-fold cross-validation (C ∈ {0.1, 1, 10, 100}; ε ∈ {0.0001, 0.001, 0.01, 0.1}; γ ∈ {scale, auto, 0.1, 1}). SVR was implemented using sklearn.svm.SVR (scikit-learn 1.8) under the same standardized training and evaluation conditions as all other methods (Section 2.5).
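The described grid search can be sketched as follows, using the same hyperparameter grid on synthetic data:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(42)
X = rng.normal(size=(60, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=60)  # nonlinear target

# Grid from the text: 4 x 4 x 4 = 64 combinations, each scored with 5-fold CV
param_grid = {
    "C": [0.1, 1, 10, 100],
    "epsilon": [0.0001, 0.001, 0.01, 0.1],
    "gamma": ["scale", "auto", 0.1, 1.0],
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5).fit(X, y)
best = search.best_estimator_  # refit on all data with the winning combination
```

With refit=True (the default), GridSearchCV retrains the best configuration on the full training set, so best can be used directly for prediction.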
4. Discussion
This section interprets the comparative results obtained from the eight regression models across the four production phases, considering the underlying metallurgical processes, the characteristics of the feature space, and the inherent constraints of the available industrial dataset. Rather than considering model performance in isolation, the analysis aims to clarify the physicochemical and methodological factors that determine the suitability of each modelling approach at a specific stage of the BOF steelmaking process.
4.1. Performance Metrics Interpretation of PHASE #1
The effectiveness of the individual statistical models used to predict the nitrogen content in molten pig iron after desulfurization (PHASE #1) is summarized in
Table 10.
Among the six originally benchmarked models, ridge regression achieved the highest accuracy of 84.59% in PHASE #1, closely followed by linear regression (83.70%) and Random Forest (82.52%). The strong performance of regularized and standard linear models indicates that the relationship between the 14 input features and nitrogen content in desulphurised pig iron is predominantly linear in nature. This finding aligns with the thermodynamic understanding of nitrogen dissolution during desulphurization, where the primary mechanisms—nitrogen carrier gas dissolution through the metal–gas interface, sulfur removal freeing active sites for nitrogen adsorption, and temperature-dependent solubility governed by Sievert’s law—exhibit approximately linear dependencies within the observed operational ranges.
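The near-linear dependencies noted above follow from Sievert’s law, which for diatomic gases dissolving in liquid metal takes the textbook square-root form (the relation below is the standard form, not an equation reproduced from this study; $K_{\mathrm{N}}(T)$ denotes the temperature-dependent equilibrium constant):

```latex
\tfrac{1}{2}\,\mathrm{N}_2(g) \;\rightleftharpoons\; [\mathrm{N}],
\qquad
[\%\mathrm{N}] \;=\; K_{\mathrm{N}}(T)\,\sqrt{p_{\mathrm{N}_2}}
```

At a fixed temperature, the dissolved nitrogen content is therefore linear in $\sqrt{p_{\mathrm{N}_2}}$, and over the narrow temperature and pressure windows of a single process stage this relationship is well approximated by a linear model.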
The superiority of ridge regression over standard linear regression (84.59% versus 83.70%) demonstrates the benefit of L2 regularization when working with a limited sample size of approximately 76 observations and 14 features. The regularization parameter α = 100 effectively constrained coefficient magnitudes, preventing overfitting to noise in the training data while preserving the capacity to capture the dominant linear trends.
Polynomial regression performed notably worse (68.94%), despite achieving near-perfect training accuracy (≈100%). The second-degree polynomial transformation expanded the 14 original features into 119 polynomial terms (including interactions and squared terms), creating a severely underdetermined system given the small training set. This classic overfitting scenario—characterized by negligible training error but substantially elevated test error—confirms that the additional polynomial capacity captures noise rather than genuine nonlinear patterns at this production stage.
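The expansion from 14 features to 119 degree-two terms can be verified directly; this is a quick combinatorial check, not code from the study:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Degree-2 expansion of 14 features (no bias column):
# 14 linear + 14 squared + C(14, 2) = 91 pairwise interactions = 119 terms
X = np.zeros((1, 14))
poly = PolynomialFeatures(degree=2, include_bias=False)
n_terms = poly.fit_transform(X).shape[1]
print(n_terms)  # 119
```

With only ~61 training samples, 119 coefficients leave the least-squares system underdetermined, which is exactly the overfitting mechanism described above.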
This interpretation is fully consistent with the RMSE values in
Table 4 and
Table 8, where ridge regression yields the lowest test RMSE of approximately 0.00080 wt.% N, closely followed by linear regression and random forest, while polynomial regression exhibits the highest RMSE despite its almost perfect training fit. In addition, ridge regression achieves the highest prediction accuracy (100 − MAPE) of 84.59% in PHASE #1, indicating that most relative deviations between predicted and measured nitrogen contents remain well below 20%, which is substantially better than for the non-regularized linear and tree-based models. The combination of low RMSE and high accuracy is consistent with a high coefficient of determination R², confirming that ridge regression explains the dominant share of variance in the nitrogen content after desulfurization without incurring the overfitting behavior observed for the polynomial model.
The Decision Tree and FNN models yielded comparable accuracies of approximately 78.6–78.7%. For the Decision Tree, the absence of pruning likely resulted in overfitting to training data, while the FNN’s moderate performance may reflect the difficulty of optimizing a 256-neuron architecture with only ~61 training samples.
The two kernel-based methods introduced in this revision produced competitive results for PHASE #1. GPR achieved an accuracy of 84.73% with MAE = 6.12 × 10⁻⁴%, slightly surpassing ridge regression (84.59%) and recording the highest accuracy of any model in this phase. SVR attained 84.02%, yielding an R² (0.186) identical to that of GPR. The convergence of both kernel methods to equivalent performance levels in PHASE #1 is attributable to the favorable feature-to-sample ratio (14 predictors, 77 heats), which enables reliable kernel estimation and limits the variance amplification typical of limited-data inference.
4.2. Performance Metrics Interpretation of PHASE #2
Table 11 summarizes the predictive performance of the statistical models applied to estimate nitrogen content in crude steel prior to tapping from the BOF (PHASE #2).
PHASE #2 represents the most challenging prediction environment, as evidenced by the wide performance spread (37.30–79.77%) across models. The FNN achieved a decisively superior accuracy of 79.77% with the lowest MSE (2.250 × 10⁻⁷) and MAE (0.00042) among all models. This result is attributable to the highly nonlinear and complex physicochemical interactions governing nitrogen behavior during BOF steelmaking.
The BOF process involves simultaneous oxidation of carbon, manganese, silicon, and phosphorus, intense CO evolution creating turbulent metal–gas interfaces, slag–metal equilibria across 31 measured variables, and temperature-dependent reaction kinetics. The failure of linear regression (37.30%) and the Decision Tree (37.56%) demonstrates that these models cannot capture the intricate multi-variable interactions characteristic of this process stage. Specifically, linear regression assumes additive, linear feature contributions, which is fundamentally inadequate for modelling the complex interplay between oxygen blowing parameters, slag composition, and resulting nitrogen partition. The Decision Tree’s equally poor performance, despite its capacity for nonlinear partitioning, likely stems from uncontrolled tree depth combined with only ~54 training samples across 31 features, producing extreme overfitting.
Ridge regression (65.59%) provided substantially better results than linear regression, as the L2 penalty with α = 100 effectively handled the high-dimensional feature space (31 features) by constraining unstable coefficient estimates that arise from collinearity among slag and steel composition variables. Nevertheless, the linear model family remains fundamentally limited by its inability to capture interaction effects and nonlinear kinetics.
The superiority of the FNN in PHASE #2 is further supported by the RMSE, which reaches only about 0.00050 wt.% N on the test set, compared with 0.00065 for the random forest and 0.00080 for ridge regression, whereas both linear regression and the decision tree display substantially higher RMSE values of about 0.00130 wt.% N and 0.00146 wt.% N, respectively. In parallel, the FNN attains the highest prediction accuracy (100 − MAPE) of 79.77%, clearly outperforming all other models in this phase and indicating that most relative deviations remain within approximately 20% of the measured nitrogen content. Taken together with the low RMSE, this accuracy level is consistent with a high coefficient of determination R², implying that the neural network captures most of the variance in the BOF nitrogen data while avoiding the severe overfitting observed for the polynomial regression model. The FNN’s architecture (256 → 128 → 32 → 1 with 30% dropout) proved particularly effective for this phase. The network’s capacity to learn hierarchical nonlinear feature representations allowed it to capture complex interactions between oxygen activity, slag basicity, tapping temperature, and reblow parameters that govern nitrogen behavior during oxygen steelmaking. The use of dropout regularization and early stopping (triggered at epoch 21–38; well before the 300-epoch maximum) prevented overfitting despite the limited training set. The Random Forest (70.85%) also performed well, as its ensemble of 100 trees effectively captured nonlinear decision boundaries through bootstrap aggregation, though it underperformed the FNN by approximately 9 percentage points.
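A simplified sketch of an FNN of the topology described above, assuming scikit-learn's MLPRegressor on synthetic data; note that MLPRegressor does not support dropout, so L2 weight decay (`alpha`) stands in for the 30% dropout reported in the study, while the 256 → 128 → 32 → 1 layer sizes, early stopping, and 300-epoch cap mirror the text:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(2)
X = rng.standard_normal((69, 31))                     # ~69 synthetic heats, 31 features
y = np.tanh(X[:, 0] * X[:, 1]) + 0.05 * rng.standard_normal(69)

# Early stopping monitors a held-out validation split and halts when it stalls,
# which is the same regularization mechanism cited for epochs 21-38 in the text
model = Pipeline([
    ("scale", StandardScaler()),
    ("fnn", MLPRegressor(hidden_layer_sizes=(256, 128, 32),
                         alpha=1e-3,               # weight decay instead of dropout
                         early_stopping=True,
                         validation_fraction=0.2,
                         n_iter_no_change=10,
                         max_iter=300,
                         random_state=0)),
])
model.fit(X, y)
print(model.predict(X[:3]).shape)  # (3,)
```

On datasets this small, the validation split used for early stopping is itself tiny, which is one reason aggressive regularization remains necessary.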
In PHASE #2, the performance gap between the two kernel methods was pronounced. GPR attained only 60.02% accuracy with R² = −0.100, confirming that full Bayesian interpolation with a smooth Matérn kernel is ill-suited to the high-dimensional BOF dataset (31 features, 69 heats). In contrast, SVR with cross-validated regularization achieved 72.10% and R² = 0.277, representing the highest R² of any model in this phase and a practically meaningful improvement over ridge regression (65.59%). The superiority of SVR in this phase confirms that explicit margin-based complexity control via the ε-insensitive tube provides better generalization than kernel interpolation under an unfavorable feature-to-sample ratio.
4.3. Performance Metrics Interpretation of PHASE #3
Table 12 presents a summary of the predictive performance of the statistical models used to estimate nitrogen content in steel at the beginning of secondary steelmaking (PHASE #3).
PHASE #3 exhibits a distinctive pattern: linear regression (79.06%) and FNN (79.01%) achieved virtually identical best performance, whilst polynomial regression failed outright with a negative accuracy of −48.16%. The near-identical performance of the simplest and most complex models strongly suggests that the relationship between the 11 input features and nitrogen content at the beginning of secondary metallurgy is predominantly linear in character.
This linearity is physically plausible. At the beginning of the secondary steelmaking phase, the steel composition has been established during BOF processing, and the initial ladle conditions are relatively well-controlled. The compact feature set (11 variables including steel composition before Ar stirring, tapping time, crude steel weight, slag weight, and tapping angle) represents stable, well-defined process parameters without the complex reaction dynamics present during BOF processing.
The failure of polynomial regression (MAPE = 148.16%) constitutes the most extreme overfitting case in the entire study. The second-degree polynomial transformation expanded 11 features into 77 polynomial terms, yet the training set contained only approximately 60 samples. The notebook data confirm that polynomial regression achieved essentially perfect training accuracy, indicating complete memorization of training noise. When applied to unseen test data, the highly unstable polynomial coefficients produced predictions that diverged dramatically from the actual values, with relative errors exceeding 100%.
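How the accuracy measure 100 − MAPE becomes negative can be illustrated in a few lines; the nitrogen values below are illustrative placeholders, not data from the study:

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

y_true = np.array([0.0030, 0.0042, 0.0055])   # illustrative wt.% N measurements
y_good = np.array([0.0031, 0.0040, 0.0057])   # small relative errors
y_bad  = np.array([0.0095, 0.0001, 0.0140])   # relative errors beyond 100%

def accuracy(y_true, y_pred):
    """Prediction accuracy defined as 100 - MAPE (both in percent)."""
    return 100.0 - 100.0 * mean_absolute_percentage_error(y_true, y_pred)

print(round(accuracy(y_true, y_good), 1))   # close to 100
print(round(accuracy(y_true, y_bad), 1))    # negative, since MAPE exceeds 100%
```

Whenever the mean relative deviation exceeds 100% of the measured value, as for the unstable polynomial coefficients described above, the reported accuracy turns negative.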
From the RMSE perspective, linear regression, ridge regression and random forest achieve very similar test errors around 0.00112–0.00113 wt.% N, whereas the polynomial model reaches by far the largest RMSE of approximately 0.00655 wt.% N, which quantitatively confirms its poor generalization performance in this phase. This pattern is mirrored in the accuracy (100 − MAPE), where linear regression and the FNN attain the highest values of 79.06% and 79.01%, respectively, while ridge regression and tree-based methods remain slightly lower and the polynomial model again performs worst with a negative accuracy of −48.16%. The combination of low RMSE and high accuracy for the linear model and the FNN is consistent with comparatively high coefficients of determination R², indicating that in PHASE #3, the underlying relationship between process variables and nitrogen content is predominantly linear and can be captured effectively without resorting to strongly nonlinear or heavily regularized models.
Notably, ridge regression (74.78%) underperformed standard linear regression (79.06%) at this stage. This counterintuitive result suggests that the strong regularization (α = 100) excessively constrained the model coefficients for this particular phase, where the genuine linear relationships are sufficiently strong that regularization-induced bias outweighed the variance reduction benefit.
For PHASE #3, SVR yielded 77.10% accuracy (MAE = 8.02 × 10⁻⁴%, R² = 0.157), closely approaching the best-performing model in this phase (linear regression, 79.06%). The cross-validation-selected ε = 0.1 for PHASE #3 was substantially larger than for the remaining phases, reflecting the wider nitrogen variance at the start of secondary metallurgy and suggesting that the model appropriately applied stronger insensitivity to the increased noise level. GPR underperformed significantly (66.93%, R² = −0.063), confirming that this low-dimensional but structurally variable phase does not provide sufficient support for reliable Matérn kernel estimation.
4.4. Performance Metrics Interpretation of PHASE #4
Table 13 provides a consolidated summary of the predictive performance of the statistical models used to estimate nitrogen content in steel at the conclusion of secondary steelmaking (PHASE #4).
Ridge regression achieved the highest accuracy (84.04%) in PHASE #4, the most feature-rich stage with 34 input variables. This result highlights the critical importance of L2 regularization when the feature-to-sample ratio is unfavorable (34 features versus approximately 58 training samples). The strong regularization (α = 100) effectively addressed the collinearity among chemical composition measurements taken at three distinct time points during secondary metallurgy (after Ar stirring, after alloy addition, and at SM completion), as well as the correlations between ferroalloy addition weights and resulting compositional changes.
The dramatic performance gap between linear regression (69.40%) and ridge regression (84.04%)—a difference of nearly 15 percentage points—provides compelling evidence for the presence of severe multicollinearity in the PHASE #4 feature space (
Figure 3d). Without regularization, the OLS estimator produced highly unstable coefficients that amplified noise in the test predictions, whereas the ridge penalty stabilized the coefficient vector whilst retaining the linear model’s interpretability.
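The stabilizing effect of the L2 penalty on collinear predictors can be demonstrated on synthetic data; the α = 100 setting mirrors the study, but the two nearly duplicate columns below are a constructed toy example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(3)
n = 58                                          # roughly the PHASE #4 training size
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)         # nearly collinear duplicate measurement
X = np.column_stack([x1, x2])
y = x1 + 0.1 * rng.standard_normal(n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=100).fit(X, y)

# OLS splits the shared signal into large offsetting coefficients;
# the L2 penalty shrinks them to a stable, similar-magnitude pair
print(np.abs(ols.coef_).max() > np.abs(ridge.coef_).max())  # True
```

The OLS coefficients remain unbiased but highly variable along the near-null direction of the collinear pair, which is exactly the noise-amplification mechanism described above.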
The FNN achieved a strong 80.35% accuracy with the lowest MSE (8.493 × 10⁻⁷) across all models for this phase. The neural network’s capacity for automatic feature interaction learning proved valuable for capturing the complex relationships between argon stirring parameters, deoxidation chemistry, and nitrogen behavior during final secondary metallurgy. However, its accuracy fell below ridge regression by nearly 4 percentage points, suggesting that the dominant relationships at this stage are sufficiently linear to be captured by a well-regularized linear model, and the FNN’s additional nonlinear capacity provided diminishing returns.
In PHASE #4, ridge regression again attains one of the lowest test RMSE values (approximately 0.00105 wt.% N), undercut only by the neural network (0.00075 wt.% N), while linear regression exhibits the largest RMSE of about 0.00198 wt.% N, highlighting the importance of regularization or non-linear modelling when dealing with a high-dimensional and strongly correlated feature space. This hierarchy is fully consistent with the accuracy (100 − MAPE), where ridge regression reaches the highest value of 84.04%, followed by the random forest with 81.30% and the FNN with 80.35%, whereas linear regression remains at a lower level of 69.40% despite its simplicity. The joint observation of low RMSE and high accuracy for ridge regression implies a comparatively high coefficient of determination R², confirming that this regularized linear model captures most of the variance in the final nitrogen content while avoiding the instability and loss of explanatory power that affect the unregularized linear model in this complex, multicollinear feature space.
Random Forest (81.30%) and polynomial regression (76.67%) occupied intermediate positions. Unlike PHASE #3, polynomial regression performed reasonably at this stage because the larger feature set (34 features) provided sufficient information to support some polynomial terms without significant overfitting, though the performance remained below regularized and ensemble methods.
In PHASE #4, SVR delivered an 83.40% accuracy (MAE = 6.81 × 10⁻⁴%, R² = 0.136), the second-highest accuracy in that phase after ridge regression (84.04%), and outperformed FNNs (82.41%) despite using only 15 training features selected by the optimal SVR kernel. GPR degraded markedly in PHASE #4 (73.33%, R² = −0.759), further reinforcing the conclusion that the Bayesian interpolation mechanism becomes unreliable when the feature space is large (34 predictors) relative to the training set size (64 observations after the 80/20 split).
4.5. Cross-Phase Comparative Analysis
The cross-phase comparison confirms that nitrogen prediction performance is strongly phase-dependent, reflecting the interaction between process complexity, feature dimensionality, and the limited number of industrial observations available per stage. Across all four phases, ridge regression, random forest, and feedforward neural networks (FNNs) consistently provide high accuracies on the test set, with ridge regression reaching 84.59% and 84.04% in PHASE #1 and PHASE #4, respectively, and FNNs achieving 78.27% in PHASE #2 and 82.41% in PHASE #4. Linear regression remains surprisingly competitive in PHASE #3 (79.06%), indicating that the nitrogen evolution at the beginning of secondary metallurgy can be approximated adequately by a predominantly linear relationship under the present data regime. Polynomial regression, by contrast, exhibits pronounced overfitting behavior, with near-zero training errors but severely degraded generalization, especially in PHASE #3, where the accuracy drops to −48.16%.
The inclusion of GPR and SVR extends this benchmark to eight models and allows a more nuanced assessment of nonlinear behavior under limited data. GPR, implemented with a Matérn (ν = 2.5) covariance kernel and WhiteKernel noise component, achieved its strongest performance in PHASE #1 (84.73%, R² = 0.186), closely matching ridge regression and confirming that the desulfurization dataset contains a smooth nonlinear component that can be effectively captured by a Bayesian kernel method when the feature-to-sample ratio remains favorable (14 predictors, 77 heats). However, GPR performance deteriorated in PHASES #2–#4 (60.02–73.33%, R² between −0.100 and −0.759), where the number of predictors (31–34) approaches or exceeds the effective number of training samples. This pattern is consistent with the curse of dimensionality: the high-dimensional Matérn kernel becomes difficult to estimate reliably, leading to near-perfect interpolation of the training data but poor test-set generalization, as reflected by the negative R² values.
SVR with an RBF kernel and five-fold cross-validated hyperparameters (C, ε, γ) produced a distinctly different cross-phase profile. Its test-set accuracies reached 84.02% in PHASE #1, 72.10% in PHASE #2, 77.10% in PHASE #3, and 83.40% in PHASE #4, with positive R² values across all four phases (0.136–0.277). In PHASE #2, SVR delivered the highest R² (0.277) among all models while maintaining competitive accuracy, surpassing both ridge regression (65.59%) and random forest (70.85%) and only trailing FNNs in absolute accuracy. In PHASE #3, SVR nearly matched linear regression in accuracy (77.10% vs. 79.06%) whilst delivering a positive R² (0.157), which suggests that this phase exhibits a modest nonlinear component that benefits from kernel regularization but does not require the full flexibility of a neural network. In PHASE #4, SVR offered an accuracy of 83.40%, slightly below ridge regression but higher than random forest, again with a positive R² (0.136). These results highlight SVR as the most robust of the nonlinear methods when judged simultaneously by accuracy and variance explanation under constrained sample sizes.
The cross-phase evidence shows that no single model is universally optimal across all stages of the BOF route. For PHASE #1 and PHASE #4, ridge regression remains the primary recommendation, with GPR and SVR providing competitive kernel-based alternatives in PHASE #1 and PHASE #4, respectively. For PHASE #2, FNNs continue to deliver the highest accuracy, but SVR emerges as a particularly attractive choice when emphasis is placed on R² and model stability under high dimensionality. For PHASE #3, linear regression retains its leading position, while SVR offers a regularized nonlinear counterpart that improves variance explanation without compromising generalization. The comparative bar chart in
Figure 12 summarizes the test-set accuracy of all eight models across the four phases, visually reinforcing that model selection should be tailored to the phase-specific balance between process nonlinearity, feature dimensionality, and limited data.
The prediction accuracy summarization for all statistical models across the analyzed phases is shown in
Figure 12.
4.6. Interpretation of the Coefficient of Determination (R²)
Unlike absolute error metrics (MAE, MSE) or relative error metrics (MAPE), R2 offers a dimensionless measure of goodness of fit that facilitates direct comparison across models and production phases with different nitrogen concentration ranges.
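The behavior of R² at its three reference points (perfect fit, mean predictor, worse-than-mean predictor) can be checked directly with a toy series; these values are illustrative, not drawn from the study:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Perfect predictions: R^2 = 1
print(r2_score(y_true, y_true))                     # 1.0

# Predicting the target mean for every sample: R^2 = 0
print(r2_score(y_true, np.full(4, y_true.mean())))  # 0.0

# Predictions that scatter more than the mean itself: R^2 < 0
print(r2_score(y_true, np.array([4.0, 1.0, 4.0, 1.0])))  # -3.0
```

Because R² = 1 − SS_res/SS_tot, any model whose squared residuals exceed the variance of the targets yields a negative value, which is the situation diagnosed for GPR and polynomial regression in the following paragraphs.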
Across the four production phases, R² values for the best-performing models ranged from 0.62 to 0.84 (Table 10, Table 11, Table 12 and Table 13), indicating that the models capture a substantial proportion of nitrogen variability. The FNN achieved the highest R² of 0.84 in PHASE #2 and 0.85 in PHASE #4, confirming its ability to represent complex, highly nonlinear interactions in high-dimensional feature spaces associated with BOF steelmaking and secondary metallurgy. Ridge Regression showed strong and stable performance in PHASE #1 (R² = 0.72) and PHASE #4 (R² = 0.71), demonstrating its suitability for predominantly linear relationships under multicollinearity (Figure 3d). Linear Regression reached R² = 0.62 in PHASE #3, which, together with the FNN result (R² = 0.58), indicates that additional model complexity yields only marginal gains when nitrogen evolution is governed mainly by linear effects in a compact feature space. By contrast, Polynomial Regression displayed extreme overfitting, with highly negative R² values (−11.87 in PHASE #3) despite a near-perfect training fit, owing to a dramatic expansion of the feature space relative to the limited sample size. Decision Tree and Random Forest models provided mixed results, with acceptable R² in some phases but clear signs of overfitting or limited generalization in others. Overall, the concordance between R² and error-based metrics (MAE, MSE, MAPE) supports a phase-dependent hybrid strategy: Ridge Regression in PHASES #1 and #4, FNN in PHASE #2, and Linear Regression or FNN in PHASE #3.
The extended benchmark including GPR and SVR further clarifies the limitations of R² under constrained industrial data. GPR with a Matérn (ν = 2.5) kernel attains R² = 1.000 on the training set for all phases but yields strongly negative R² values in PHASES #2–#4, with the most pronounced case in PHASE #4 (R² = −0.759). Despite this, the corresponding MAE (1.06 × 10⁻³ wt.% N) and RMSE (1.57 × 10⁻³ wt.% N) remain of the same order as those of several other models and are comparable to the analytical resolution of the nitrogen measurement. This combination—near-perfect training fit, test errors in an acceptable absolute range, but heavily negative R²—indicates that GPR in PHASE #4 is overfitting the limited and high-dimensional training data (34 predictors, ≈64 training samples) and is extremely sensitive to the particular composition of the small test partition (16 samples). In practical terms, the large negative R² signals that the GPR predictions fluctuate more around the test-set mean than the mean itself, even though the absolute deviations remain small; as a result, GPR cannot be recommended for PHASE #4, where ridge regression, FNNs, and SVR provide comparable or better MAE with non-negative or only mildly positive R².
By contrast, SVR with an RBF kernel and cross-validated hyperparameters achieves modest but consistently positive R² across all four phases (0.136–0.277), together with competitive MAE and MAPE values, thus offering a more stable balance between goodness-of-fit and generalization under the present data limitations. Overall, the concordance between R² and error-based metrics (MAE, MSE, RMSE, MAPE) for the more robust models supports a refined phase-dependent hybrid strategy: Ridge Regression in PHASES #1 and #4, FNN in PHASE #2, Linear Regression or SVR in PHASE #3, and SVR as a regularized nonlinear alternative wherever a positive and stable R² is required in addition to low absolute error.
4.7. Best Model Recommendations by Phase
Based on extensive previous analyses, the results and conclusions can be summarized in the form of recommendations for the individual phases of steel production (PHASE #1–PHASE #4). These recommendations are presented in
Table 14.
4.8. Factors Influencing Model Performance
The relationship between feature count and optimal model type follows a clear pattern across the four phases. Low-dimensional phases (PHASE #1 with 14 features and PHASE #3 with 11 features) favor simpler linear models because the limited feature space is insufficient to support complex nonlinear models without overfitting. High-dimensional phases (PHASE #2 with 31 features and PHASE #4 with 34 features) benefit from regularization or ensemble methods that can manage collinearity and extract relevant signals from redundant features.
The fundamental limitation affecting all models is the small sample size (291 samples in total, distributed across four phases and yielding only 54–76 samples per phase after the test split). This constraint manifests in several ways: major overfitting in polynomial regression when the feature space is expanded beyond the capacity of the training set, vulnerability to noise in the Decision Tree without pruning constraints, and reliance on aggressive regularization (dropout, early stopping and weight decay) in the FNN to prevent memorization.
The variation in model performance across phases reflects the fundamental differences in nitrogen thermodynamics and kinetics at each production stage. During desulphurization (PHASE #1), nitrogen dissolution follows well-characterized kinetic models governed by Sievert’s law, with sulfur removal as the primary driver increasing nitrogen pickup through freed interfacial sites. BOF processing (PHASE #2) involves the simultaneous occurrence of carbon oxidation, intense CO evolution, slag–metal reactions and temperature fluctuations, creating a highly nonlinear system that requires complex modelling approaches. The secondary metallurgy stages (PHASE #3 and #4) represent progressively more controlled environments with established steel compositions, in which nitrogen behavior becomes more predictable and regularized linear models are favored.