Integrated GBR–NSGA-II Optimization Framework for Sustainable Utilization of Steel Slag in Road Base Layers

Akbas, Merve

doi:10.3390/app15158516

by

Merve Akbas

Department of Geotechnical Engineering, Faculty of Civil Engineering, Istanbul Technical University, Maslak 34469, Istanbul, Turkey

Appl. Sci. 2025, 15(15), 8516; https://doi.org/10.3390/app15158516

Submission received: 8 July 2025 / Revised: 23 July 2025 / Accepted: 30 July 2025 / Published: 31 July 2025

(This article belongs to the Special Issue Advanced Technologies and Optimization for Sustainable Geotechnical Engineering)

Abstract

This study proposes an integrated, machine learning-based multi-objective optimization framework to evaluate and optimize the utilization of steel slag in road base layers, simultaneously addressing economic costs and environmental impacts. A comprehensive dataset of 482 scenarios was engineered based on literature-informed parameters, encompassing transport distance, processing energy intensity, initial moisture content, gradation adjustments, and regional electricity emission factors. Four advanced tree-based ensemble regression algorithms—Random Forest Regressor (RFR), Extremely Randomized Trees (ERTs), Gradient Boosted Regressor (GBR), and Extreme Gradient Boosting Regressor (XGBR)—were rigorously evaluated. Among these, GBR demonstrated superior predictive performance (R² > 0.95, RMSE < 7.5), effectively capturing complex nonlinear interactions inherent in slag processing and logistics operations. Feature importance analysis via SHapley Additive exPlanations (SHAP) provided interpretative insights, highlighting transport distance and energy intensity as dominant factors affecting unit cost, while moisture content and grid emission factor predominantly influenced CO₂ emissions. Subsequently, the Gradient Boosted Regressor model was integrated into a Non-Dominated Sorting Genetic Algorithm II (NSGA-II) framework to explore optimal trade-offs between cost and emissions. The resulting Pareto front revealed a diverse solution space, with significant nonlinear trade-offs between economic efficiency and environmental performance, clearly identifying strategic inflection points. To facilitate actionable decision-making, the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) method was applied, identifying an optimal balanced solution characterized by a transport distance of 47 km, energy intensity of 1.21 kWh/ton, moisture content of 6.2%, moderate gradation adjustment, and a grid CO₂ factor of 0.47 kg CO₂/kWh. 
This scenario offered a substantial reduction (45%) in CO₂ emissions relative to cost-minimized solutions, with a moderate increase (33%) in total cost, presenting a realistic and balanced pathway for sustainable infrastructure practices. Overall, this study introduces a robust, scalable, and interpretable optimization framework, providing valuable methodological advancements for sustainable decision making in infrastructure planning and circular economy initiatives.

Keywords:

multi-objective optimization; steel slag; cost–emission optimization; tree-based ensemble learning; NSGA-II

1. Introduction

Natural aggregates have long been the principal component in road base layers due to their mechanical reliability, availability, and cost-effectiveness. However, the ever-increasing global demand for infrastructure, combined with the rapid depletion of natural resources, has led to growing environmental concerns and logistical challenges in aggregate sourcing and transportation [1,2]. Among various alternatives, steel slag, a by-product of steelmaking processes, has emerged as a promising substitute for natural aggregates owing to its favorable geotechnical properties, high abrasion resistance, and potential for circular material flow [3,4,5]. Nevertheless, the large-scale utilization of steel slag in civil infrastructure is hindered by multiple interrelated factors, such as the transportation distance between slag production sites and construction locations, energy-intensive processing requirements, and associated carbon emissions [6,7].

In sustainable pavement engineering, minimizing both environmental impact and economic cost is a crucial challenge, particularly when materials like steel slag are sourced far from end-use locations. In this context, the optimization of material logistics and processing strategies plays a central role in enhancing both environmental performance and cost efficiency [8]. Transportation emissions, which can represent over 50% of the life-cycle environmental impact for heavy materials, are strongly influenced by the haul distance and mode of transport [9]. Additionally, the degree of processing required—such as crushing, sieving, and moisture conditioning—significantly contributes to both operational cost and embodied carbon, particularly when regional electricity mixes are carbon-intensive [10]. Therefore, a systematic method is needed to explore trade-offs between conflicting objectives such as transportation cost, energy consumption, and emissions.

Traditional decision-making approaches for aggregate selection and logistics planning have largely relied on rule-of-thumb heuristics, linear cost minimization, or laboratory-based scenario testing [11]. These methods often fail to capture the complex, nonlinear relationships between logistics parameters and life-cycle costs or emissions, especially when applied to large datasets across varying geographies and supply chain conditions. In line with emerging trends in sustainable construction materials, multi-objective optimization techniques such as response surface modeling have gained traction for balancing conflicting objectives, as demonstrated in recent material design studies [12]. Moreover, deterministic models are generally incapable of generating Pareto-optimal solutions when multiple objectives are in conflict [13].

To overcome these limitations, machine learning (ML) techniques have recently been introduced to model complex relationships in civil and materials engineering [14,15,16]. In particular, tree-based ensemble learning methods such as Random Forest Regressor (RFR), Extremely Randomized Trees (ERT), Gradient Boosted Regressor (GBR), and Extreme Gradient Boosting (XGBR) have shown strong predictive performance in handling heterogeneous, high-dimensional datasets due to their robustness to overfitting and ability to capture nonlinearity [17,18]. Once a reliable predictive model is established, advanced multi-objective optimization algorithms such as the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) can be employed to identify sets of optimal solutions that balance competing objectives without reducing the problem to a single aggregated metric [19,20].

This study proposes a novel machine learning-based multi-objective optimization (MOO) framework for evaluating steel slag utilization in road base layers. A comprehensive dataset comprising 482 scenarios was compiled from existing literature and engineering records, incorporating variables such as transport distance, processing energy intensity, slag gradation, moisture content, and local energy source mix. Two output objectives, namely unit transportation and processing cost (USD/m³) and CO₂ emissions (kg/m³), were modeled using four ensemble ML regressors. The best-performing model (GBR) was integrated into the NSGA-II framework to identify Pareto-optimal solutions. SHapley Additive exPlanations (SHAP) analysis was further conducted to interpret the influence of key input features on predicted outcomes.

The proposed framework demonstrates a scalable and data-driven approach to optimize industrial by-product use in infrastructure applications. It also highlights how ML-driven methods can support strategic supply chain decisions for low-carbon construction materials. This work is among the first to apply a hybrid GBR–NSGA-II methodology to the transport and processing optimization of steel slag in road engineering and may serve as a foundation for similar circular economy initiatives involving other industrial residues.

2. Data Source Integrity and Parametric Space Definition

To support the development of a reliable, transparent, and interpretable decision-support model, this study implemented a controlled, literature-informed dataset construction framework that reflects the realistic engineering constraints and environmental variability associated with steel slag use in road base applications. The primary goal was to generate a numerically stable and physically grounded dataset capable of capturing the nonlinear interactions between operational parameters and their dual impact on cost and carbon performance. Specifically, five independent variables were identified and retained based on their physical influence and empirical importance on the two model targets: unit cost (USD/m³) and CO₂-equivalent emission (kg/m³). These variables (transport distance, processing energy, moisture content, gradation intensity, and electricity emission factor) were selected through literature triangulation, expert consensus, and correlation analysis. The following subsections describe in detail: (i) the rationale behind input variable selection, (ii) the source domains and inclusion criteria used to define input bounds, and (iii) the formulation of output variables through first-principles-based physical models.

2.1. Variable Selection: Engineering Rationale

The selection of input features was guided by both physical causality and empirical sensitivity evidence reported in the literature. Each variable was retained based on its measurable and quantifiable contribution to at least one of the model’s two target outputs: total unit cost (USD/m³) and total CO₂-equivalent emissions (kg/m³). The final set of five independent variables was determined following iterative statistical screening, correlation analysis, and engineering validation.

The first variable, Transport Distance (X₁), refers to the one-way haulage length between the steel slag production site and the target construction zone. It directly affects the diesel fuel consumed in freight operations, and therefore influences both economic and environmental outputs. The impact of this variable on CO₂ emissions is modeled using a linear mass-distance function, as shown in Equation (1):

$$\mathrm{Emission}_{\mathrm{transport}} = D \, \rho \, EF_{\mathrm{diesel}} \tag{1}$$

Here, $D$ denotes the transport distance (km), $\rho$ is the material density (ton/m³), and $EF_{\mathrm{diesel}}$ is the diesel emission factor (kg CO₂/ton·km), assumed to be 0.13 in accordance with CSN EN 16258 [21].

The second variable, Processing Energy Intensity (X₂), represents the electricity demand (kWh/ton) required for mechanical treatment processes, including crushing, screening, and drying. It scales linearly with both the operating cost and the electricity-related emissions, which are computed with Equation (2):

$$\mathrm{Emission}_{\mathrm{processing}} = E \, EF_{\mathrm{grid}} \tag{2}$$

where $E$ is the site-specific energy demand and $EF_{\mathrm{grid}}$ is the regional emission factor of electricity production (kg CO₂/kWh), varying by energy mix.

A third critical variable is the initial Moisture Content (X₃) of the steel slag at the point of extraction. Moisture increases the latent thermal load required for drying and significantly affects both cost and emission through elevated energy usage. The drying energy can be thermodynamically approximated by Equation (3):

$$E_{D} = c_{p} \, m \, \Delta T \tag{3}$$

where $c_{p}$ is the specific heat of water (kJ/kg·K), $m$ is the mass of moisture (kg), and $\Delta T$ is the temperature differential needed to achieve sufficient drying. The inclusion of moisture thus introduces a nonlinear effect on both objective functions, reinforcing its necessity in the model.
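As a small worked illustration of Equation (3), the sketch below evaluates the drying energy for one tonne of slag. The function name, the 6% moisture fraction, the 20 K temperature rise, and the textbook value of 4.186 kJ/kg·K for liquid water are all assumptions chosen for illustration; they are not values taken from the study's dataset.

```python
# Specific heat of liquid water in kJ/(kg*K); a standard textbook value (assumed here).
CP_WATER_KJ_PER_KG_K = 4.186
KJ_PER_KWH = 3600.0


def drying_energy_kwh(slag_mass_kg, moisture_fraction, delta_t_k):
    """Equation (3): E_D = c_p * m * dT, converted from kJ to kWh.

    Only the sensible-heat term written in Equation (3) is evaluated.
    """
    moisture_mass_kg = slag_mass_kg * moisture_fraction  # m in Equation (3)
    energy_kj = CP_WATER_KJ_PER_KG_K * moisture_mass_kg * delta_t_k
    return energy_kj / KJ_PER_KWH


# Hypothetical case: 1 tonne of slag, 6% moisture by mass, heated by 20 K.
energy = drying_energy_kwh(1000.0, 0.06, 20.0)
print(f"{energy:.3f} kWh per tonne")
```

Under these assumed inputs, doubling the moisture fraction doubles the result, so any nonlinearity in the full model would have to come from how moisture interacts with the other variables rather than from Equation (3) alone.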

The Gradation Adjustment Level (X₄) is treated as an ordinal categorical variable, denoting the intensity of mechanical sieving or particle size correction. It is coded as 0 for no adjustment, 1 for basic screening, and 2 for crushing plus screening. While this feature may appear qualitative, its effect is operationalized through discrete shifts in both energy intensity and maintenance costs, especially relevant in coarse/fine transitions or specification compliance [22].

Finally, the Grid CO₂ Emission Factor (X₅) accounts for regional differences in the carbon intensity of electricity used in processing. Its inclusion is essential to generalize the model across geographies and regulatory regimes. Since this factor is used directly in Equation (2), it modifies the slope of emission increase with energy intensity, thus playing a critical role in optimization scenarios targeting emission reductions.

2.2. Data Source Integrity and Inclusion Criteria

The selection of value ranges for the five key input variables, namely Transport Distance (X₁), Processing Energy Intensity (X₂), Moisture Content (X₃), Gradation Adjustment Level (X₄), and Grid CO₂ Emission Factor (X₅), was guided by published case studies, experimental datasets, and technical standards. These variables were chosen for their strong influence on cost and emissions, data availability, and ability to indirectly capture more complex phenomena. For instance, energy intensity accounts for the drying process, which varies with climate and fuel source, while transport distance captures first-order effects of road quality and vehicle type. This parsimonious input design supports tractable surrogate modeling while ensuring broad applicability. Future studies may explicitly model additional context-specific variables where fine-grained data become available. To support this selection strategy, the value ranges used in this study were grounded in empirical and policy-based sources from the literature. For example, the transport distance range of 5–250 km reflects values reported in regional aggregate logistics studies [23,24]; energy consumption for crushing and drying operations (0.65–2.1 kWh/ton) was derived from empirical facility data [25]; grid emission factors (0.24–1.05 kg CO₂/kWh) correspond to national electricity mix profiles published in [26]; and moisture content values (2–15%) reflect typical open-air storage in various climatic zones [27].

To ensure methodological consistency, only sources that provided quantitative, disaggregated data (e.g., by process stage or energy type) were considered in defining the parameter ranges. Mixtures containing blended additives (e.g., fly ash, cement) or modified chemistries were excluded in order to isolate the independent contribution of steel slag. This process enabled the construction of a dimensionally consistent input space, with all variables normalized to a per-volume (m³) basis for comparability.

Following the definition of this parametric domain, a total of 530 candidate configurations were generated using a stratified uniform sampling approach, in which each input variable was sampled independently across its literature-supported range using equally spaced intervals. The discretization scheme consisted of the following:

X₁: Transport Distance (km): 10 levels ranging from 5 to 250 km, with a step size of approximately 27.2 km.
X₂: Energy Intensity (kWh/ton): 6 levels between 0.8 and 2.0 kWh/ton, increasing by 0.24 kWh/ton per level.
X₃: Moisture Content (%): 6 levels from 2% to 12%, with a step size of 2%.
X₄: Gradation Adjustment (categorical): 3 levels (0: none, 1: moderate, 2: major).
X₅: Grid CO₂ Factor (kg CO₂/kWh): 9 levels between 0.3 and 1.0 kg CO₂/kWh, incremented by 0.0875 kg CO₂/kWh.
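The discretization scheme listed above can be reproduced with a few lines of Python. This is a minimal sketch: the helper `levels` and the dictionary keys are hypothetical names, while the bounds and level counts are those stated in the list.

```python
from math import prod


def levels(lo, hi, n):
    """Return n equally spaced levels from lo to hi (both inclusive)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


# Bounds and level counts as stated in the discretization list.
grid = {
    "X1_distance_km": levels(5.0, 250.0, 10),
    "X2_energy_kwh_per_ton": levels(0.8, 2.0, 6),
    "X3_moisture_pct": levels(2.0, 12.0, 6),
    "X4_gradation": [0, 1, 2],
    "X5_grid_factor": levels(0.3, 1.0, 9),
}

# Size of the full-factorial design: 10 * 6 * 6 * 3 * 9.
full_factorial = prod(len(values) for values in grid.values())
print(full_factorial)

# Step sizes implied by the bounds and level counts.
for name, values in grid.items():
    print(name, round(values[1] - values[0], 4))
```

The product of the level counts is 9720, which matches the statement that full enumeration would exceed 9000 combinations.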

Rather than employing full-factorial enumeration (which would yield over 9000 combinations), a randomized stratified sampling method was used to generate a manageable yet well-distributed set of 530 candidate input configurations. This technique enabled adequate space-filling while ensuring statistical representation of each variable’s range without overwhelming computational demand. In addition, all generated combinations were reviewed for engineering feasibility. Implausible scenarios, such as configurations with both very low energy use and very high moisture content (physically inconsistent due to drying energy demands), were filtered out prior to model training. After the exclusion of such cases, the resulting dataset comprised 482 technically feasible and literature-consistent scenario profiles. The finalized descriptive statistics are reported in Table 1, which lists the minimum, maximum, mean, and standard deviation for each engineered input parameter. Each variable has also been annotated with the associated data source domain, ensuring full traceability and scientific transparency. Skewness values indicate slight right-skew for transport distance and energy intensity, supporting the use of non-normal distributions in surrogate model training.

To ensure statistical representativeness and numerical stability during model training, each input variable was assigned a probability distribution that reflects its physical behavior and empirical spread in real-world applications. Transport Distance (X₁), Moisture Content (X₃), and Grid CO₂ Factor (X₅) were modeled using uniform distributions, as they reflect a bounded yet evenly likely range within real logistic and environmental conditions. Processing Energy Intensity (X₂) followed a log-uniform distribution, consistent with industrial datasets where higher values are rarer but still relevant. Gradation Adjustment (X₄) was treated as a discrete ordinal variable with uniform probability across its three levels. These distribution assumptions were informed by literature-derived ranges and the need for space-filling during stratified sampling. This setup supports the learning robustness of ensemble tree models while avoiding distributional bias.
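The distribution assumptions described above can be sketched as follows. This is an illustration only: it uses plain random draws rather than the stratified scheme of the study, the bounds are taken from the discretization list in this section, and `sample_scenario`, the seed, and the dictionary keys are hypothetical.

```python
import math
import random


def sample_scenario(rng):
    """Draw one scenario using the distribution families named in the text."""
    return {
        "X1": rng.uniform(5.0, 250.0),  # transport distance, uniform
        # Log-uniform: draw uniformly in log space, then exponentiate.
        "X2": math.exp(rng.uniform(math.log(0.8), math.log(2.0))),
        "X3": rng.uniform(2.0, 12.0),   # moisture content, uniform
        "X4": rng.choice([0, 1, 2]),    # gradation level, discrete uniform
        "X5": rng.uniform(0.3, 1.0),    # grid CO2 factor, uniform
    }


rng = random.Random(42)  # fixed seed so the draw is reproducible
scenarios = [sample_scenario(rng) for _ in range(530)]
print(scenarios[0])
```

A log-uniform draw places as much probability mass between 0.8 and about 1.26 kWh/ton as between 1.26 and 2.0 kWh/ton, which is how higher energy intensities end up rarer per unit interval.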

In summary, the statistical architecture of the input space ensures numerical stability, dimensional coherence, and representativeness across diverse geographies and operation scales. These properties are essential for supporting the tree-based ensemble models and optimization logic described in Section 3 and Section 4, and they directly impact the reliability and scope of the decision-support framework.

2.3. Output Variable Definitions and Calculation Methods

The two output variables used as model targets—total unit cost and CO₂-equivalent emissions—were not directly observed in all cases but derived from consistent physical formulations to ensure cross-source comparability. Both were expressed per cubic meter of compacted steel slag base material to align with road design volumetric specifications.

The total unit cost (Y₁) in USD/m³ was decomposed into three components: transportation cost $C_{T}$, processing energy cost $C_{E}$, and operational overhead $C_{O}$. The summation is provided in Equation (4):

$$Y_{1} = C_{T} + C_{E} + C_{O} \tag{4}$$

Here, $C_{T}$ is computed based on fuel consumption per ton·km and prevailing fuel prices, $C_{E}$ is based on electricity consumption (from X₂) multiplied by regional tariffs (USD/kWh), and $C_{O}$ includes labor, maintenance, and depreciation, standardized using industry unit rates. In scenarios where data were missing, the component estimates were reconstructed using proportional regressions anchored to known benchmark cases.

The total CO₂-equivalent emission (Y₂), in kg/m³, was calculated as the sum of two subcomponents: emissions from diesel-based transport and emissions from electricity-based processing. This is formalized in Equation (5):

$$Y_{2} = (X_{1} \, \rho \, EF_{\mathrm{diesel}}) + (X_{2} \cdot X_{5}) \tag{5}$$

where X₁ and X₂ are defined as before, $\rho$ is the material density (taken as 2.1 ton/m³ for compacted slag), and $EF_{\mathrm{diesel}}$ is set at 0.13 kg CO₂/ton·km. The second term reflects emissions from grid electricity, with X₅ being region-specific (e.g., 0.24–1.05 kg CO₂/kWh). This equation ensures that the emission metric accounts for both mobile (Scope 1) and stationary (Scope 2) sources as defined under the GHG Protocol, enhancing the relevance of the results for carbon policy planning and environmental impact assessments.
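Equation (5) can be evaluated directly, as in the sketch below. The density and diesel factor are the values stated in the text; the function name and the example inputs (100 km, 1.5 kWh/ton, 0.5 kg CO₂/kWh) are hypothetical and do not correspond to any scenario reported in the study.

```python
RHO_TON_PER_M3 = 2.1           # compacted slag density stated in the text
EF_DIESEL_KG_PER_TON_KM = 0.13  # diesel emission factor stated in the text


def co2_emission(distance_km, energy_kwh_per_ton, grid_factor_kg_per_kwh):
    """Evaluate Equation (5) term by term, exactly as written."""
    # Transport term: km * ton/m3 * kg/(ton*km) gives kg CO2 per m3.
    transport = distance_km * RHO_TON_PER_M3 * EF_DIESEL_KG_PER_TON_KM
    # Processing term as written in Equation (5): kWh/ton * kg/kWh.
    # No density factor is applied here because the equation shows none.
    processing = energy_kwh_per_ton * grid_factor_kg_per_kwh
    return transport + processing


# Hypothetical inputs for illustration only.
y2 = co2_emission(100.0, 1.5, 0.5)
print(f"Y2 = {y2:.2f}")
```

With these assumed inputs the transport term (27.3) is far larger than the processing term (0.75), which shows how strongly haul distance can weigh on the total under this formulation.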

To assess the plausibility of the computed output values, model-derived estimates for total unit cost and CO₂-equivalent emissions were benchmarked against published field studies and empirical datasets. Specifically, transport- and processing-related cost values were cross-compared with unit rates reported in case studies from [28,29], which analyzed slag-based pavement construction.

3. Methodology

To systematically evaluate the trade-offs between transportation-induced carbon emissions and processing-related costs in the context of steel slag utilization for road base applications, a machine learning-based multi-objective optimization (MOO) framework was developed. The entire methodological sequence is summarized schematically in Figure 1, illustrating the integrated progression from model training to optimization and final decision support.

First, predictive models were developed for two dependent variables—total unit cost (USD/m³) and CO₂-equivalent emissions (kg/m³)—using four tree-based ensemble learning algorithms: Random Forest Regressor (RFR), Extremely Randomized Trees (ERT), Gradient Boosted Regressor (GBR), and Extreme Gradient Boosting Regressor (XGBR). These models were trained on a dataset of 482 engineered instances, with hyperparameters optimized via Bayesian search and model performance evaluated using statistical indicators such as Pearson correlation coefficient (R), mean absolute error (MAE), root mean squared error (RMSE), and the a20 accuracy index.

Following model selection, the interpretability of the chosen regressor was enhanced through the SHapley Additive exPlanations (SHAP) framework. SHAP analysis enabled the quantification of the marginal contribution of each input variable to the model output, both globally (across the dataset) and locally (at the level of individual predictions), thus supporting the transparent evaluation of variable influence and enabling a better formulation of optimization constraints.

A bi-objective optimization problem was then formulated to simultaneously minimize the predicted cost and emissions. The five input features—transport distance, energy intensity, moisture content, gradation level, and grid CO₂ emission factor—formed the decision vector, while the trained GBR model served as a surrogate objective function. Domain-specific constraints, including variable bounds, thermodynamic feasibility, and volumetric consistency, were incorporated to ensure the physical realism of solution space. This optimization problem was solved using the Non-Dominated Sorting Genetic Algorithm II (NSGA-II), which iteratively evolves populations of candidate solutions using crossover, mutation, and elitist selection to approximate the Pareto front of non-dominated trade-offs between cost and emissions.
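The core ranking idea behind NSGA-II, Pareto dominance, can be illustrated in a few lines. This is only a sketch of extracting the first non-dominated front for two minimized objectives; it does not implement crowding distance, crossover, mutation, or the surrogate model. The function names and the (cost, emission) pairs are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    Both objectives are assumed to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def first_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (cost, emission) pairs for five candidate configurations.
candidates = [
    (20.0, 40.0),
    (25.0, 30.0),
    (30.0, 35.0),  # dominated by (25, 30)
    (35.0, 22.0),
    (40.0, 25.0),  # dominated by (35, 22)
]
front = first_front(candidates)
print(front)
```

In a full NSGA-II run, this ranking would be repeated on the remaining points to build successive fronts, and each generation's candidates would be scored by the trained surrogate rather than listed by hand.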

Finally, the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) was used to support actionable decision making. Each point on the Pareto front was scored based on its Euclidean proximity to an ideal solution (minimum cost and minimum emission), and the alternative with the highest closeness coefficient was identified as the recommended configuration.
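A compact TOPSIS scoring routine for minimized objectives might look like the sketch below. The equal weights, the vector normalization, the use of an anti-ideal point alongside the ideal one (as in the standard TOPSIS formulation), and the three example points are assumptions for illustration; the weighting actually used in the study is not stated in this section.

```python
import math


def topsis_min(points, weights=None):
    """Return TOPSIS closeness coefficients when every objective is minimized."""
    n_obj = len(points[0])
    if weights is None:
        weights = [1.0 / n_obj] * n_obj  # equal weights (assumption)
    # Vector normalization of each objective column, then weighting.
    norms = [math.sqrt(sum(p[j] ** 2 for p in points)) for j in range(n_obj)]
    scaled = [[weights[j] * p[j] / norms[j] for j in range(n_obj)] for p in points]
    # For minimization, the ideal point takes the column minima.
    ideal = [min(row[j] for row in scaled) for j in range(n_obj)]
    anti_ideal = [max(row[j] for row in scaled) for j in range(n_obj)]
    scores = []
    for row in scaled:
        d_pos = math.dist(row, ideal)       # Euclidean distance to ideal
        d_neg = math.dist(row, anti_ideal)  # Euclidean distance to anti-ideal
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical Pareto front of (cost, emission) pairs.
pareto_points = [(20.0, 40.0), (25.0, 30.0), (35.0, 22.0)]
scores = topsis_min(pareto_points)
best = pareto_points[scores.index(max(scores))]
print(best, [round(s, 3) for s in scores])
```

The point with the highest closeness coefficient is the compromise candidate; changing the weights shifts that choice toward the cheaper or the cleaner end of the front.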

3.1. Predictive Modeling Using Tree-Based Ensemble Learning Algorithms

To accurately estimate the total unit cost and carbon emissions associated with steel slag utilization in road base construction, supervised machine learning models were developed based on a set of tree-based ensemble regression algorithms. These models were selected due to their established ability to capture complex, nonlinear relationships in high-dimensional and heterogeneous datasets, which are common in infrastructure and sustainability-focused studies [30,31,32]. In particular, four widely recognized ensemble learning methods were employed in this study: Random Forest Regressor (RFR), Extremely Randomized Trees (ERT), Gradient Boosted Regressor (GBR), and Extreme Gradient Boosting Regressor (XGBR).

The four selected models span the two main families of tree-based ensemble methods commonly used in infrastructure analytics [33]. RFR and ERTs are bagging-based algorithms, with RFR utilizing bootstrap aggregation of decision trees and ERTs introducing randomized node splitting to further reduce variance. In contrast, GBR and XGBR are boosting-based techniques, where weak learners are sequentially improved by minimizing residual errors. XGBR further enhances performance by incorporating regularization and second-order loss approximations [34,35,36]. These distinctions allow for comparison of bias–variance trade-offs and learning dynamics across bagging and boosting regimes within the same input space.

To enhance transparency and mathematical rigor, the learning process underlying each ensemble model is briefly formalized below. In supervised regression settings, the objective is to approximate a mapping function f:R^p→R, where the input vector x = [X₁,X₂,…,X_p] corresponds to engineered features and the output y∈R denotes the target variable (either cost or CO₂ emission in this study). Tree-based ensemble models approximate this mapping by aggregating the predictions of multiple base regressors.

RFR consists of $T$ individual regression trees $\{h_{t}(\cdot)\}_{t=1}^{T}$, each trained on a bootstrapped sample of the training data with random feature selection at each split. The final prediction is given by Equation (6):

$$\hat{y}_{RF} = \frac{1}{T} \sum_{t=1}^{T} h_{t}(x) \tag{6}$$

ERTs follow a similar ensemble structure but introduce additional randomness by selecting split thresholds randomly rather than optimizing them, which helps further reduce variance and computational cost.

GBR is an additive ensemble model that incrementally updates the prediction function by fitting new trees to the negative gradient of the loss function. At each iteration $m$, the model is updated as described in Equation (7):

$$F_{m}(x) = F_{m-1}(x) + \eta \cdot h_{m}(x) \tag{7}$$

where $\eta$ is the learning rate and $h_{m}(x)$ is the regression tree trained to minimize the residuals.
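The update rule in Equation (7) can be made concrete with a toy boosting loop. The sketch below uses one-split regression stumps on a single input and squared-error loss, for which the negative gradient equals the residual. It is a didactic stand-in, not the GBR implementation used in the study; the function names, the toy data, and the settings (50 rounds, η = 0.1) are assumptions.

```python
def fit_stump(xs, residuals):
    """Fit the least-squares one-split stump to residuals on 1-D inputs."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep the right side non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        l_mean = sum(left) / len(left)
        r_mean = sum(right) / len(right)
        sse = sum((r - l_mean) ** 2 for r in left)
        sse += sum((r - r_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    _, threshold, l_mean, r_mean = best
    return lambda x: l_mean if x <= threshold else r_mean


def boost(xs, ys, n_rounds=50, eta=0.1):
    """Apply F_m(x) = F_{m-1}(x) + eta * h_m(x), starting from the mean of ys."""
    f0 = sum(ys) / len(ys)
    stumps = []
    preds = [f0] * len(xs)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        stumps.append(h)
        preds = [p + eta * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + eta * sum(h(x) for h in stumps)


# Toy nonlinear target: y = x^2 on x = 1..10.
xs = [float(i) for i in range(1, 11)]
ys = [x ** 2 for x in xs]
model = boost(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"training MSE after boosting: {mse:.2f}")
```

A smaller η needs more rounds to reach the same training error, which is the usual trade-off between learning rate and number of estimators.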

XGBoost improves on conventional boosting by using a second-order Taylor expansion of the loss function and applying regularization to penalize model complexity. Equation (8) provides the approximation of the objective at iteration $t$:

$$L^{(t)} \approx \sum_{i=1}^{n} \left[ g_{i} f_{t}(x_{i}) + \frac{1}{2} h_{i} f_{t}^{2}(x_{i}) \right] + \Omega(f_{t}) \tag{8}$$

where $g_{i}$ and $h_{i}$ are the first and second derivatives of the loss with respect to current predictions, and $\Omega(f_{t})$ is a regularization function over the tree structure $f_{t}$. This formulation enables improved convergence, better generalization, and reduced overfitting.

In all models, training aims to minimize the empirical risk, which is defined by the Mean Squared Error (MSE) in Equation (9):

$$L(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2} \tag{9}$$

The empirical risk function $L(y, \hat{y})$, in the context of supervised learning, quantifies the average discrepancy between observed outputs and model predictions. In this study, the MSE was selected as the loss function due to its sensitivity to large deviations and its smooth, differentiable form, which is suitable for optimization. The function computes the squared difference between the actual value $y_{i}$ and the predicted value $\hat{y}_{i}$ for each observation $i \in \{1, 2, \dots, n\}$, sums these squared errors across the dataset, and normalizes the result by the number of instances $n$. Minimizing this quantity leads the model to learn parameters that produce predictions $\hat{y}_{i}$ as close as possible to the true targets $y_{i}$, in the least-squares sense.

To ensure the predictive robustness of the GBR model used in this study, hyperparameter tuning was conducted using Bayesian optimization with Gaussian Process priors, implemented via the skopt package. The search space and prior distributions were defined as follows: learning rate ∈ [0.01, 0.3] (log-uniform), number of estimators ∈ [100, 1000] (uniform), max depth ∈ [3, 10] (uniform), subsample ∈ [0.5, 1.0] (uniform), and min samples split ∈ [2, 20] (uniform). The optimization was performed over 30 iterations using 5-fold cross-validation based on R² score. Early stopping was not applied during the tuning process; instead, the configuration yielding the highest cross-validated R² was selected.

These models were selected not only for their theoretical strengths but also for their suitability in modeling the specific target variables defined in this study. Each model was trained independently to predict two continuous output variables: (i) the total transportation and processing cost per cubic meter of compacted slag material (USD/m³), and (ii) the equivalent carbon dioxide emissions per unit volume (kg CO₂-e/m³), based on the five engineered input features described in Section 2.

Model development followed a rigorous training–validation pipeline. The complete dataset consisting of 482 samples was first randomly partitioned using an 80:20 stratified split, where 80% of the data was used for training and the remaining 20% reserved for testing [37]. Input features were normalized on a [0, 1] scale for numerical stability, and categorical variables such as gradation level were transformed using one-hot encoding. To ensure robustness and reduce model variance, five-fold cross-validation was conducted on the training subset, with performance evaluated on both training and validation folds at each iteration [38].

To further improve prediction accuracy and avoid suboptimal hyperparameter configurations, each model underwent hyperparameter tuning using a Bayesian Optimization Algorithm (BOA) [39], which computes the posterior probability from prior knowledge according to Equation (10):

$$p(w \mid D) = \frac{p(D \mid w) \, p(w)}{p(D)} \tag{10}$$

where $p(w)$ and $p(w \mid D)$ denote the prior and posterior distributions, respectively; $p(D \mid w)$ represents the likelihood; and $w$ is the unseen data. Using Bayes’ rule, the result of each iteration informs the search in the next iteration.

The search space was defined based on prior empirical studies and included parameters such as the number of trees (ranging from 100 to 1000), learning rate (0.01–0.30), and maximum tree depth (3–30). Bayesian optimization was selected over grid or random search due to its superior sample efficiency and ability to incorporate prior performance distributions into its acquisition strategy, thereby reducing convergence time and computational cost.

Model performance was quantitatively assessed using four standard evaluation metrics [34]. The Pearson correlation coefficient (R), computed as shown in Equation (11), was used to evaluate the linear association between predicted and actual values, providing insight into overall fit. Mean absolute error (MAE), defined in Equation (12), provides an average measure of prediction deviation without considering directionality, offering robustness to outliers. The root mean square error (RMSE), calculated according to Equation (13), places higher penalization on large deviations and is especially useful for assessing variance-sensitive prediction performance. Practical reliability was assessed using the a20 accuracy index, defined in Equation (14), which reflects the proportion of predictions within ±20% of observed values—a threshold commonly used in infrastructure forecasting tasks.

R = \frac{\sum_{i = 1}^{n} (y_{i} - \bar{y}) ({\hat{y}}_{i} - \bar{\hat{y}})}{\sqrt{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}} \sqrt{\sum_{i = 1}^{n} {({\hat{y}}_{i} - \bar{\hat{y}})}^{2}}}

(11)

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - {\hat{y}}_{i}|

(12)

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(13)

a_{20} = \frac{m_{20}}{m}

(14)

Here, $y_{i}$ and ${\hat{y}}_{i}$ are the actual and predicted values, respectively; $\bar{y}$ and $\bar{\hat{y}}$ are their corresponding means; and n is the number of observations. In Equation (14), m is the total number of samples, and $m_{20}$ indicates the number of predicted values between 80% and 120% of the actual value.
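As an illustration of Equations (11)–(14), the four metrics can be sketched in plain Python. The function names and the small `y_true`/`y_pred` lists below are hypothetical and chosen only for demonstration; they are not values from the study, and the a20 ratio assumes strictly positive actual values.

```python
import math

def pearson_r(y, y_hat):
    """Pearson correlation between actual and predicted values, Equation (11)."""
    n = len(y)
    mean_y = sum(y) / n
    mean_hat = sum(y_hat) / n
    cov = sum((a - mean_y) * (b - mean_hat) for a, b in zip(y, y_hat))
    sd_y = math.sqrt(sum((a - mean_y) ** 2 for a in y))
    sd_hat = math.sqrt(sum((b - mean_hat) ** 2 for b in y_hat))
    return cov / (sd_y * sd_hat)

def mae(y, y_hat):
    """Mean absolute error, Equation (12)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean square error, Equation (13)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def a20(y, y_hat):
    """Share of predictions within 80%-120% of the actual value, Equation (14).
    Assumes all actual values are strictly positive."""
    hits = sum(1 for a, b in zip(y, y_hat) if 0.8 <= b / a <= 1.2)
    return hits / len(y)

# Hypothetical demonstration data (not from the paper)
y_true = [20.0, 30.0, 40.0, 50.0]
y_pred = [22.0, 29.0, 50.0, 49.0]

print(pearson_r(y_true, y_pred), mae(y_true, y_pred),
      rmse(y_true, y_pred), a20(y_true, y_pred))
```

In this toy example the third prediction (50 against an actual 40) falls outside the ±20% band, so it lowers a20 while also dominating the RMSE, which shows how the two metrics react differently to a single large deviation.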

3.2. Feature Importance and Interpretability Using SHAP

While tree-based ensemble learning algorithms such as GBR offer superior predictive accuracy in high-dimensional and nonlinear domains, their internal decision structures are inherently complex and often criticized for lacking transparency. This opacity can hinder the interpretability of the model’s behavior, particularly in applied engineering domains where traceable decision logic is essential for adoption [40]. To address this issue and improve the explainability of the developed predictive framework, the SHapley Additive exPlanations (SHAP) methodology was implemented as a post hoc interpretability tool.

SHAP is grounded in cooperative game theory and quantifies the marginal contribution of each input feature to the model’s output by evaluating the change in prediction when a feature is included or excluded across all possible permutations of feature subsets [41]. This is mathematically formalized by the Shapley value formulation given in Equation (15) [42].

ϕ_{j} (f) = \sum_{S \subseteq \{x_{1}, \dots, x_{p}\} \setminus \{x_{j}\}} \frac{|S|! (p - |S| - 1)!}{p!} [f (S \cup \{x_{j}\}) - f (S)]

(15)

where x_j is the feature being attributed; p is the number of features; S denotes a subset of the remaining features; and f(S) is the output of the model evaluated using only the features in S. In SHAP, a feature’s importance is therefore its marginal contribution to the prediction, f(S ∪ {x_j}) − f(S), averaged with the above weights over all subsets S.
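To make the weighting in Equation (15) concrete, the following minimal sketch computes exact Shapley values by enumerating every subset. It assumes a hypothetical toy value function `toy_value` with three features and one pairwise interaction. In practice, SHAP libraries estimate f(S) from the trained model rather than from a hand-written function, and exhaustive enumeration is only tractable for a small number of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, p):
    """Exact Shapley values for p features.
    value_fn maps a frozenset of feature indices to a model output f(S)."""
    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        total = 0.0
        for size in range(p):  # |S| runs from 0 to p - 1
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature j when added to subset S
                total += weight * (value_fn(s | {j}) - value_fn(s))
        phi.append(total)
    return phi

def toy_value(subset):
    """Hypothetical value function: additive effects plus a 0-1 interaction."""
    v = 0.0
    if 0 in subset:
        v += 10.0
    if 1 in subset:
        v += 4.0
    if 2 in subset:
        v += 1.0
    if 0 in subset and 1 in subset:
        v += 2.0  # interaction term, shared equally between features 0 and 1
    return v

phi = shapley_values(toy_value, 3)
print(phi)
```

The attributions sum to the difference between the full-model output and the empty-set output, which is the additivity property that SHAP relies on.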

The SHAP analysis was carried out at two levels: global and local. Global SHAP values were aggregated over the entire dataset to assess the average importance of each feature across all prediction instances. This allowed the identification of dominant input variables that systematically influenced the predictions of total cost and CO₂ emissions. For example, features such as transport distance and energy intensity were consistently observed to have the highest SHAP values in both targets, confirming their central role in the logistics–emissions–cost relationship [43,44]. This global interpretability analysis not only validated the relevance of the selected input features but also informed the prioritization of decision variables in the subsequent optimization phase.

In parallel, local SHAP values were calculated for individual samples to understand how specific feature values contributed to the model’s prediction in particular scenarios. These instance-level explanations are especially valuable for scenario-specific decision support, where engineers or policymakers may need to justify why certain configurations result in higher emissions or costs [45]. For example, in cases where grid CO₂ intensity was low but the predicted emission was still high, SHAP force plots revealed that elevated moisture content and long transport distances outweighed the benefit of clean electricity, thus offering actionable insight for process adjustment.

SHAP also describes the behavior of a trained model through an additive feature attribution method, which treats the explanation model as an interpretable linear function of simplified input variables. The prediction f(x) of the original model can then be represented as shown in Equation (16) [42].

f (x) = h (x_{s}) = φ_{0} + \sum_{i = 1}^{k} φ_{i} x_{s, i}

(16)

where k is the number of input features, φ_0 denotes the constant value when no inputs are used, and x_{s,i} is the i-th component of the simplified input x_s. The inputs x and x_s are related by a mapping function, x = m_x(x_s).

To further explore the interaction effects between variables, SHAP dependence plots were generated. These plots highlighted key nonlinear relationships, such as the amplification of energy intensity effects in regions with high grid emission factors, or the compounded cost impact of combined long transport distance and high gradation adjustment levels [46]. Such interpretive depth cannot be achieved through traditional variable importance rankings alone and underscores the utility of SHAP in capturing complex, multi-dimensional feature interactions inherent in infrastructure system modeling.

The insights derived from the SHAP framework played a critical role in refining the boundaries and constraints of the multi-objective optimization problem described in the following section. By elucidating the functional roles and sensitivities of input variables, the SHAP-based interpretability layer added both diagnostic clarity and modeling credibility to the overall analytical workflow.

3.3. Formulation of the Multi-Objective Optimization Problem

To identify optimal operational strategies that simultaneously minimize economic and environmental burdens associated with steel slag use in road base layers, a bi-objective mathematical optimization problem was formulated. The objective of this formulation was to search for feasible combinations of input parameters that jointly minimize the predicted total cost per unit volume and the associated CO₂-equivalent emissions [47]. Formally, the optimization problem was expressed as a minimization over two scalar-valued objective functions f₁(x) and f₂(x), representing total unit cost (USD/m³) and total carbon footprint (kg CO₂-e/m³), respectively. The vector of decision variables can be defined as x = [X₁, X₂, X₃, X₄, X₅] ∈ R⁵. The two objective functions are expressed as follows in Equations (17) and (18).

\min_{x} f_{1} (x) = {\hat{y}}_{cost} (x) \quad [\mathrm{USD} / \mathrm{m}^{3}]

(17)

\min_{x} f_{2} (x) = {\hat{y}}_{{CO}_{2}} (x) \quad [\mathrm{kg\ CO_{2}\text{-}e} / \mathrm{m}^{3}]

(18)

where ŷ_cost(x) and ŷ_CO₂(x) are surrogate objective functions modeled using the Gradient Boosted Regressor (GBR), as described in Section 3.1.

The decision variable vector x ∈ R⁵ consisted of five normalized, bounded, and independently controllable inputs, all of which had physical meaning in the logistics and processing chain. These variables included transport distance (in kilometers), energy intensity of mechanical processing (in kWh per ton of slag), initial moisture content at the source (in percent by weight), gradation adjustment level (modeled as an ordinal categorical parameter with three levels), and the regional electricity grid’s CO₂ emission factor (in kg CO₂ per kWh). The inclusion of these variables was based on engineering relevance and SHAP-derived importance scores (see Section 3.2), and they collectively capture both spatial and technological variability in slag-based base layer construction.

To ensure the realism and physical plausibility of the optimization outcomes, the search space was constrained based on both the statistical properties of the dataset and domain-specific knowledge. Each decision variable was restricted to lie within empirically observed bounds, as detailed in Table 1. For instance, transport distance was limited to a range of 5–250 km to reflect practical haulage conditions, and moisture content was confined to 2–15% to remain within the drying capacity of typical field operations. In addition to these range constraints, logical feasibility rules were embedded to preserve inter-variable consistency. For example, solutions involving high moisture content (X₃ > 10%) were coupled with increased energy intensity (X₂ ≥ 1.8), reflecting the thermodynamic reality of additional drying requirements. Similarly, gradation adjustment levels (X₄ = 2) were associated with increased energy consumption due to crushing and screening operations. Moreover, all outputs were expressed per 1 m³ of compacted material. This was achieved by scaling mass-based quantities (e.g., energy per ton, transport per ton·km) using an average dry bulk density of 2.1 ton/m³, ensuring consistent comparison across configurations.
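The bounds and coupling rules described above can be illustrated with a small repair routine. This is only a sketch under stated assumptions: the function names `repair` and `energy_per_m3` are hypothetical, only the two ranges quoted in the text (transport distance and moisture content) are clamped because the remaining Table 1 bounds are not reproduced here, and the X₂ ≥ 1.5 threshold for X₄ = 2 is the value quoted in Section 3.4. The study states that infeasible offspring were either repaired or excluded, so repair is one of the two options rather than the confirmed implementation.

```python
BULK_DENSITY_T_PER_M3 = 2.1  # average dry bulk density stated in the text

def repair(x1, x2, x3, x4):
    """Clamp X1 and X3 to the quoted ranges and enforce the coupling rules.

    x1: transport distance (km), x2: energy intensity (kWh/ton),
    x3: moisture content (%), x4: gradation level in {0, 1, 2}.
    """
    x1 = min(max(x1, 5.0), 250.0)   # 5-250 km haulage range
    x3 = min(max(x3, 2.0), 15.0)    # 2-15% moisture range
    if x3 > 10.0:
        x2 = max(x2, 1.8)           # extra drying energy for wet slag
    if x4 == 2:
        x2 = max(x2, 1.5)           # extra crushing/screening energy (Section 3.4)
    return x1, x2, x3, x4

def energy_per_m3(x2_kwh_per_ton):
    """Convert a mass-based energy intensity to a per-m3 basis."""
    return x2_kwh_per_ton * BULK_DENSITY_T_PER_M3

print(repair(300.0, 1.0, 12.0, 0))
print(energy_per_m3(1.21))
```

Raising X₂ to the nearest feasible value keeps the other decision variables untouched, which is a common way to repair a candidate with minimal disturbance to the search.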

The resulting problem is a continuous–discrete hybrid optimization model with a nonlinear, nonconvex objective landscape. Given the absence of closed-form expressions for the objective functions and the interdependencies introduced by feasibility rules, a metaheuristic evolutionary approach was required to navigate the solution space efficiently [48]. The adopted solution technique, described in Section 3.4, utilizes population-based search strategies to approximate the Pareto frontier of trade-offs between cost and emissions.

The optimization problem exhibits a hybrid structure, incorporating both continuous (X₁, X₂, X₃, X₅) and discrete (X₄) variables. To accommodate this, the ordinal variable X₄ was encoded as an integer and treated using discrete mutation and crossover operators within the evolutionary algorithm. Continuous variables were normalized to the [0, 1] range to ensure scale uniformity and convergence stability.

The five decision variables were not only grounded in engineering relevance but also supported by feature importance analysis using SHAP (Section 3.2). The SHAP results revealed that Transport Distance (X₁), Energy Intensity (X₂), and Grid CO₂ Factor (X₅) were the most influential predictors of both cost and emissions. As a result, these variables were assigned tighter bounds and higher resolution during the sampling and optimization steps, while less influential variables such as Gradation Level (X₄) were allowed broader variation to explore secondary effects. Due to the nonlinear and nonconvex nature of the objective landscape, and the absence of analytical gradients, this problem was not solvable via traditional mathematical programming. Therefore, a metaheuristic evolutionary approach was adopted, as detailed in Section 3.4. The solution strategy involved a population-based algorithm capable of exploring a diverse design space and approximating the Pareto-optimal frontier for the two conflicting objectives.

3.4. Optimization Using NSGA-II

To efficiently solve the bi-objective optimization problem formulated in the previous section—where the total unit cost and CO₂ emissions are to be simultaneously minimized over a constrained decision space—the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) was adopted as the core solution strategy. NSGA-II is a multi-objective evolutionary algorithm widely employed in engineering design problems due to its ability to approximate Pareto-optimal frontiers in highly nonlinear and nonconvex search spaces [49]. The algorithm operates through population-based stochastic search and is particularly well-suited for optimization problems where analytical gradients are unavailable and objective functions are defined via data-driven surrogate models.

The implementation of NSGA-II in this study followed its standard elitist evolutionary procedure, which iteratively evolves a population of candidate solutions through selection, crossover, and mutation operations. At each generation, individuals in the combined parent and offspring population are sorted into non-dominated fronts based on Pareto dominance [50]. This non-dominated sorting mechanism enables the algorithm to prioritize solutions that are not outperformed in both objectives by any other solution in the population. To preserve diversity along the Pareto front and avoid premature convergence, a crowding distance metric is calculated for each individual, measuring the density of neighboring solutions in the objective space. This metric is then used to break ties among individuals of the same non-dominated rank during environmental selection.

The reproduction phase employs a binary tournament selection mechanism, in which pairs of individuals are selected based on their dominance rank and crowding distance. Selected individuals undergo crossover using a simulated binary crossover (SBX) operator with a probability of 0.9, allowing for the controlled recombination of decision variable values [51]. Mutation is applied using a polynomial mutation operator with a mutation probability of 0.1, introducing variability into the population and enhancing global exploration capability. The bounds and constraint checks described in Section 3.3 were enforced after each variation step to ensure that all candidate solutions remained feasible with respect to the physical and operational constraints of the system.

The algorithm was executed with a population size of 200 and allowed to evolve over 250 generations. These parameter values were selected based on empirical convergence tests and are consistent with configurations reported in previous engineering optimization literature. Additionally, an early stopping criterion was applied: if the hypervolume improvement dropped below 1% over 20 consecutive generations, evolution was halted. At each generation, the algorithm updated a dynamic archive of non-dominated solutions, constructing an increasingly accurate approximation of the Pareto front. This frontier, representing the set of trade-off-optimal solutions in the objective space, was visualized in two-dimensional cost–emission plots and analyzed further to identify critical inflection points, solution diversity, and convergence behavior.
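The hypervolume-based stopping rule can be sketched for the two-objective case as follows. The reference point, the function names, and the interpretation of "improvement below 1% over 20 generations" as a relative gain measured across a 20-generation window are all assumptions made for illustration; the study does not report these implementation details.

```python
def hypervolume_2d(front, ref):
    """Area dominated by a two-objective minimization front, bounded by ref.

    front: iterable of (f1, f2) points; ref: (r1, r2) reference point that
    should be worse than every point of interest in both objectives.
    """
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv = 0.0
    best_f2 = ref[1]
    for f1, f2 in pts:  # ascending f1
        if f2 < best_f2:  # point adds a new, not yet counted strip
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv

def should_stop(hv_history, window=20, tol=0.01):
    """True if hypervolume grew by less than tol (relative) over the window."""
    if len(hv_history) <= window:
        return False
    old, new = hv_history[-window - 1], hv_history[-1]
    return (new - old) < tol * abs(old)

# Hypothetical front and reference point
front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(hypervolume_2d(front, (4.0, 4.0)))
```

Because hypervolume rewards both convergence and spread, a plateau in this indicator is a reasonable signal that further generations are unlikely to change the front materially.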

NSGA-II is based on the principle of Pareto dominance, where a solution x₁ is said to dominate another solution x₂ if all objective function values for x₁ are equal to or better than those of x₂, and at least one is strictly better. If neither dominates the other, they are considered non-dominated with respect to each other. The algorithm iteratively evolves populations of candidate solutions, applying crossover and mutation to generate diversity, and then uses non-dominated sorting and crowding distance-based selection to refine solution quality.
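The dominance relation just defined translates directly into code for minimization problems. The helper names and the sample (cost, emission) pairs below are hypothetical and serve only to illustrate the definition.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def non_dominated(points):
    """Return the points not dominated by any other point (the rank-1 set)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (cost, emission) pairs
candidates = [(20.0, 90.0), (25.0, 70.0), (30.0, 75.0), (40.0, 60.0)]
print(non_dominated(candidates))
```

Here the pair (30.0, 75.0) is removed because (25.0, 70.0) is better in both objectives, while the remaining three points trade cost against emissions and are mutually non-dominated.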

The implementation of NSGA-II in this study followed a six-step procedure (illustrated in Figure 2) [52]:

Initialization: A random initial population P₀ of size 200 was generated within the feasible bounds of the five decision variables (X₁–X₅).
Variation: Crossover and mutation operators were applied to the parent population P_t to create an offspring population G_t, so that the combined population P_t ∪ G_t was twice the original size. Simulated binary crossover (SBX) was used with a probability of 0.9, and polynomial mutation with a probability of 0.1.
Non-dominated sorting: All candidate solutions were ranked based on Pareto dominance. Rank 1 included the non-dominated Pareto set, followed by subsequent dominated layers (Rank 2, Rank 3, etc.).
Crowding distance calculation: Within each rank, individuals were assigned a crowding distance value based on their Manhattan distance to neighbors in objective space. This distance preserves solution diversity and prevents premature convergence.
Selection: Elitist selection was applied. Individuals with better (lower-numbered) ranks and, within the same rank, greater crowding distances were selected to form the next generation P_{t+1}, ensuring both convergence and diversity.
Termination: The loop was repeated until either 250 generations were completed or a predefined early-stopping criterion was met (hypervolume improvement < 1% over 20 generations).
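The crowding distance step in the procedure above can be sketched as follows. This follows the commonly used NSGA-II formulation, in which boundary solutions receive an infinite distance and interior solutions accumulate the normalized gap between their two neighbors in each objective. The function name and the sample front are hypothetical, and the study's own implementation may differ in detail.

```python
def crowding_distance(front):
    """Crowding distance for each objective vector in a single front."""
    n = len(front)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        # Boundary solutions are always kept
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # no spread in this objective
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist

# Hypothetical rank-1 front of (cost, emission) pairs
rank1 = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(crowding_distance(rank1))
```

When the last front to enter the next generation does not fit entirely, individuals with the largest crowding distance are retained first, which favors sparsely populated regions of the front.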

Figure 2. NSGA-II algorithm mechanism: (a) algorithmic pipeline, (b) Pareto-based rank evolution.

These steps are visually summarized in Figure 2a (algorithmic pipeline) and Figure 2b (Pareto-based rank evolution). As seen, new generations are produced through variation, sorted by non-domination, evaluated for crowding, and selected for survival using elitist strategies. Moreover, to ensure the realism and operational validity of the solutions generated by the NSGA-II algorithm, domain-specific feasibility constraints were explicitly encoded into the optimization process. The decision vector included four continuous variables (X₁: transport distance, X₂: energy intensity, X₃: moisture content, X₅: grid CO₂ factor) and one discrete ordinal variable (X₄: gradation adjustment level, with values {0,1,2}). The ordinal variable was handled using integer encoding and subjected to discrete mutation and crossover operators. Feasibility rules—such as enforcing X₂ ≥ 1.8 when X₃ > 10% (to ensure sufficient drying energy), or coupling X₄ = 2 with minimum X₂ ≥ 1.5 (to reflect increased processing demand)—were implemented as constraint checks post-variation. Infeasible offspring were either repaired or penalized by exclusion from the next generation. This hybrid continuous–discrete structure was managed within a modular implementation that applied constraint validation at each evolutionary cycle, ensuring that only technically viable configurations contributed to the evolving Pareto set. The resulting Pareto front—consisting of trade-off optimal solutions between cost and emission—was visualized in 2D space and analyzed for diversity, inflection zones, and practical feasibility. To support final decision making, this non-dominated set was further evaluated using the TOPSIS ranking method, detailed in Section 3.5.

3.5. Decision Making Using TOPSIS

Following the generation of the Pareto-optimal solution set via NSGA-II, a systematic decision-making procedure was required to select a representative configuration for practical implementation. While the Pareto front offers a spectrum of trade-offs between total cost and CO₂ emissions, it does not inherently prioritize one solution over another [53]. To address this, the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) was employed as a scalar ranking method. TOPSIS is a well-established multi-criteria decision-making (MCDM) algorithm that ranks alternatives based on their geometric proximity to an ideal solution and remoteness from an anti-ideal solution, thereby offering an objective framework for alternative evaluation when multiple performance indicators are present [54].

A decision matrix was constructed using the total unit cost and CO₂-equivalent emission values of the Pareto-optimal alternatives. These values were normalized using vector normalization, as shown in Equation (19) [32].

r_{i j} = \frac{x_{i j}}{\sqrt{\sum_{i = 1}^{n} x_{i j}^{2}}}

(19)

where r_ij is the normalized value for alternative i and criterion j, and x_ij is the original objective value.

Subsequently, each normalized objective was assigned an equal weight of 0.5, reflecting an a priori assumption of equal importance in the absence of stakeholder bias. The normalized and weighted matrix was then used to compute the Euclidean distance of each alternative to the ideal solution (Equation (20)), defined as the hypothetical point with the lowest cost and lowest CO₂ value among all Pareto solutions. Similarly, the distance to the anti-ideal solution, representing the highest values of both objectives, was also computed (Equation (21)) [32].

The closeness coefficient for each solution was calculated as the ratio of its distance to the anti-ideal point over the sum of its distances to the ideal and anti-ideal points. This coefficient, bounded between 0 and 1, quantifies how close a solution is to the optimal target configuration, with higher values indicating better performance.

D_{i}^{+} = \sqrt{\sum_{j = 1}^{m} w_{j}^{2} {(r_{i j} - A_{j}^{+})}^{2}}

(20)

D_{i}^{-} = \sqrt{\sum_{j = 1}^{m} w_{j}^{2} {(r_{i j} - A_{j}^{-})}^{2}}

(21)

where

$D_{i}^{+}$ : Distance of alternative i to the ideal solution (lower cost and emission),
$D_{i}^{-}$ : Distance of alternative i to the anti-ideal solution (higher cost and emission),
w_j: Weight assigned to criterion j,
r_ij: Normalized value of criterion j for alternative i,
$A_{j}^{+}$ : Ideal (best) value for criterion j,
$A_{j}^{-}$ : Anti-ideal (worst) value for criterion j,
m: Total number of criteria (here, m = 2; cost and emission).

Once the distances are calculated, the closeness coefficient (CC) for each alternative i is computed as Equation (22) [32].

{CC}_{i} = \frac{D_{i}^{-}}{D_{i}^{+} + D_{i}^{-}}

(22)

where CC_i is the closeness coefficient for alternative i, representing the relative proximity of the alternative to the ideal solution in the normalized objective space. The solution exhibiting the highest closeness coefficient was selected as the recommended operating configuration, as it provided the most balanced compromise between cost efficiency and environmental impact within the feasible design space.
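Equations (19)–(22) can be combined into a short ranking routine. The sketch below assumes that both criteria are cost-type (lower is better), as in this study, and that the alternatives are not all identical. The function name `topsis` and the three (cost, emission) pairs are hypothetical and are not results from the paper.

```python
import math

def topsis(matrix, weights):
    """Closeness coefficients for alternatives scored on cost-type criteria."""
    n, m = len(matrix), len(matrix[0])
    # Vector normalization, Equation (19)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(m)]
    r = [[row[j] / norms[j] for j in range(m)] for row in matrix]
    # Ideal = lowest value, anti-ideal = highest value per criterion
    ideal = [min(r[i][j] for i in range(n)) for j in range(m)]
    anti = [max(r[i][j] for i in range(n)) for j in range(m)]
    cc = []
    for i in range(n):
        d_plus = math.sqrt(sum(weights[j] ** 2 * (r[i][j] - ideal[j]) ** 2
                               for j in range(m)))   # Equation (20)
        d_minus = math.sqrt(sum(weights[j] ** 2 * (r[i][j] - anti[j]) ** 2
                                for j in range(m)))  # Equation (21)
        cc.append(d_minus / (d_plus + d_minus))      # Equation (22)
    return cc

# Hypothetical Pareto alternatives: (cost, emission)
pareto = [(20.0, 90.0), (25.0, 70.0), (40.0, 60.0)]
scores = topsis(pareto, [0.5, 0.5])
best = max(range(len(scores)), key=lambda i: scores[i])
print(scores, best)
```

With equal weights, the middle alternative ranks first in this toy example because it lies closest to the ideal point in both normalized objectives, whereas the two extremes each sit at the anti-ideal value of one criterion.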

In addition to identifying a single optimal point, the ranked list of alternatives generated by TOPSIS was retained to support flexible decision making. This is particularly valuable in infrastructure planning contexts, where external constraints—such as carbon pricing policies, transport accessibility, or regional emission targets—may shift the relative desirability of different trade-offs [55,56]. By integrating TOPSIS into the final stage of the framework, the methodology not only enables transparent performance comparison but also enhances its applicability in real-world decision environments where conflicting priorities must be resolved systematically [57].

3.6. Uncertainty Propagation with Triangular Distributions

To better account for real-world variability in key input parameters, a supplementary stochastic modeling layer was integrated into the optimization framework. This was motivated by the fact that several operational variables—such as transport distance, energy intensity of processing, and electricity emission factors—are known to fluctuate due to seasonal, spatial, and regulatory dynamics. Ignoring such inherent variability could lead to overly deterministic and potentially misleading optimization outcomes.

To address this, triangular probability distributions were defined for the three most influential variables identified via SHAP analysis: Transport Distance (x₁), Energy Intensity (x₂), and Grid CO₂ Emission Factor (x₅). Each distribution took the original deterministic value as its mode and was bounded at ±10% of that value, so that the probability density peaks at the deterministic value and falls linearly to zero at the bounds. This distribution choice balances realism and simplicity, ensuring computational tractability while capturing plausible variability ranges without placing undue weight on the tails.

Latin Hypercube Sampling (LHS) was employed to efficiently generate 500 quasi-random combinations from the joint distribution space of the three variables. This approach ensures uniform stratification across each input’s uncertainty range while maintaining independence between variables [58]. For each sampled realization, the trained Gradient Boosting Regression (GBR) surrogate model was used to rapidly predict the associated cost and emission outputs. The resulting dataset of 500 cost-emission pairs was subsequently processed through the NSGA-II algorithm to reconstruct a stochastic Pareto front under uncertainty.
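The sampling step can be sketched with the standard library alone: stratified uniform draws are generated per variable, shuffled independently, and mapped through the inverse CDF of a symmetric triangular distribution. The function names are hypothetical, and the nominal values used here (47 km, 1.21 kWh/ton, 0.47 kg CO₂/kWh) are the TOPSIS-selected values quoted in the abstract, adopted only as an illustrative center; the study may have centered the distributions differently.

```python
import math
import random

def triangular_ppf(u, low, mode, high):
    """Inverse CDF of a triangular distribution for u in [0, 1)."""
    cut = (mode - low) / (high - low)
    if u < cut:
        return low + math.sqrt(u * (high - low) * (mode - low))
    return high - math.sqrt((1.0 - u) * (high - low) * (high - mode))

def latin_hypercube(n, d, rng):
    """n points in [0, 1)^d with exactly one point per stratum in each dimension."""
    cols = []
    for _ in range(d):
        strata = [(k + rng.random()) / n for k in range(n)]
        rng.shuffle(strata)  # independent shuffles decouple the dimensions
        cols.append(strata)
    return [[cols[j][i] for j in range(d)] for i in range(n)]

def sample_inputs(nominal, n=500, spread=0.10, seed=0):
    """Draw n joint samples, each variable triangular on nominal +/- spread."""
    rng = random.Random(seed)
    unit = latin_hypercube(n, len(nominal), rng)
    return [[triangular_ppf(u, v * (1 - spread), v, v * (1 + spread))
             for u, v in zip(row, nominal)] for row in unit]

# Illustrative nominal values for x1, x2, x5 (taken from the abstract)
nominal = [47.0, 1.21, 0.47]
samples = sample_inputs(nominal)
print(len(samples), samples[0])
```

Each sampled row would then be passed to the trained surrogate to obtain a cost and emission prediction; that step is omitted here because it depends on the fitted model.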

This procedure enabled the construction of probabilistic envelopes (i.e., uncertainty bands) around the original deterministic Pareto frontier. These envelopes allow assessment of both the variability in solution space and the robustness of previously identified optimal configurations—particularly the TOPSIS-optimal recommendation. By observing the spread and overlap between the deterministic and probabilistic Pareto fronts, we quantified the sensitivity of optimality to real-world operational deviations. This analysis not only strengthens the credibility of our results but also provides practical insight for decision makers operating under uncertainty.

4. Results and Discussion

This section presents and interprets the predictive modeling results, analyzes the feature importance using SHAP values, and discusses the trade-off solutions generated through multi-objective optimization. The results are evaluated not only in terms of numerical performance but also through their practical implications for sustainable infrastructure planning and material logistics. The predictive accuracy of four tree-based ensemble learning models is first assessed, followed by a detailed discussion of the dominant influencing factors and the structure of the Pareto-optimal solutions. The section concludes with an assessment of the decision-making implications, limitations, and future research directions.

4.1. Predictive Model Performance and Interpretation

The predictive accuracy of the four selected regression models—Random Forest Regressor (RFR), Extremely Randomized Trees (ERT), Gradient Boosted Regressor (GBR), and Extreme Gradient Boosting Regressor (XGBR)—was evaluated on the held-out test set for both performance targets: total cost and CO₂ emissions. As shown in Table 2, GBR achieved the highest Pearson correlation coefficients (R = 0.962 for cost and 0.955 for emission), along with the lowest RMSE and MAE values, confirming its superior ability to capture nonlinear patterns in the engineered dataset. Furthermore, the a20 index—which indicates the percentage of predictions within ±20% of the actual values—was highest for GBR, reaching 87.5% for cost and 85.2% for emissions, suggesting excellent practical reliability in real-world applications.

The predicted-versus-actual comparison in Figure 3 corroborates the statistical findings reported in Table 2, particularly the high Pearson R values (>0.95) and low RMSE, confirming that the GBR model maintains both precision and consistency across a wide distribution of material and logistical configurations. Recent studies have explored machine learning techniques for predicting various properties of concrete and pavement materials. Gradient Boosting Regression (GBR) has shown superior performance in predicting pavement damage [59] and asphalt pavement performance indices [60]. Random Forest Regression (RFR) and eXtreme Gradient Boosting (XGBR) have demonstrated excellent capabilities in predicting Marshall mix properties and resilient modulus of stabilized base materials [33].

To confirm the statistical significance of performance differences, a Wilcoxon signed-rank test was applied to the cross-validated RMSE values across five folds. Results indicate that GBR significantly outperformed all other models in both prediction tasks (p < 0.01), confirming that its superior performance is not due to random variation. The predictive capability of the GBR model was further examined through a visual comparison of predicted versus actual values for both performance indicators, as illustrated in Figure 3. The cost prediction plot (top panel) exhibits a concentrated scatter distribution along the 45-degree identity line, indicating a strong agreement between predicted and observed values across the test set. Deviations from the diagonal are minimal, and no systematic under- or over-prediction is evident throughout the full range of observed costs (15–50 USD/m³). Similarly, in the emission prediction plot (bottom panel), the points closely follow the ideal fit, with the majority of residuals confined within ±10 kg/m³ of the actual values, even in higher emission regions approaching 140 kg/m³.

These visual and statistical results are in line with previous studies demonstrating the superior generalization capacity of gradient boosting models in civil infrastructure domains. For instance, Huang et al. [32] conducted a comparative study on fly ash–slag geopolymer mixtures and observed that GBR predictions showed minimal residual bias across both strength and emission targets, particularly evident in predicted vs. actual plots, where GBR consistently aligned with the identity line. The statistical robustness of GBR models under repeated validation has also been confirmed by Liu et al. [61], who employed bootstrapping and cross-validation approaches to assess uncertainty and robustness. Kaniuka et al. [62] recommended visual residual analysis and calibration plotting as essential tools in verifying whether machine learning predictions maintain both bias-free behavior and variance homogeneity—criteria clearly satisfied in our GBR-based model.

To check the statistical calibration of the GBR model, residuals from the test set were further analyzed using a Q–Q plot, as shown in Figure 4. The residuals align closely with the reference line of normality, suggesting no significant skew or excess kurtosis. This indicates that the residuals are approximately normally distributed and gives no indication of heteroskedastic errors [63].

Additionally, a calibration curve was plotted using binned prediction intervals (Figure 5). The GBR model aligns well with the ideal diagonal, particularly in the central prediction ranges. Slight deviations at high predicted values (>90th percentile) suggest marginal underestimation in rare, high-cost scenarios, which is common in ensemble models and can be corrected via post-calibration methods.

The use of Q–Q plots and calibration curves further reinforces the statistical integrity of the GBR model. This methodology is in alignment with recommendations by Robson et al. [64], who emphasized the use of Q–Q plots for diagnosing distributional bias in regression models applied to infrastructure prediction tasks. The reliability of the GBR model is further supported by the calibration curve, which exhibits near-ideal alignment throughout most of the prediction range [65].

To further validate generalization capacity, stratified performance analysis was conducted by dividing Transport Distance (x₁) into three operational bins: short (5–50 km), medium (51–150 km), and long (>150 km). Across all bins, GBR consistently maintained low RMSE (<6.1 USD/m³) and high a20 (>85%), affirming robustness across variable input regimes. This finding aligns with the results reported by Khiari et al. [66], who demonstrated that boosting algorithms maintained stable error margins across different transportation. Furthermore, Qi et al. [67] highlighted that stratified validation is crucial for ensuring model applicability in real-world infrastructure design scenarios. The observed robustness of the GBR model in our study supports these assertions and suggests its utility in policy-relevant modeling, where operational heterogeneity is high. Comparable conclusions were drawn by Roni et al. [68], who highlight the importance of incorporating the spatial and temporal variability of biomass yield and quality into supply chain planning to achieve sustainable and cost-effective production. Finally, Meglin et al. [69] demonstrate the importance of uncertainty and price analysis in evaluating the transition to a circular economy for a regional building materials industry. The GBR model’s consistent performance across all transport intervals confirms its viability in such heterogeneous and region-sensitive applications.

Notably, the GBR model also demonstrates favorable behavior in the tail regions of the data—where traditional models like RFR or linear regression often fail to generalize—by preserving low bias even under extreme values of transport distance or grid emission intensity. This robustness in the face of input sparsity is particularly evident in emission estimates where despite the skewed distribution of grid factors in the input data (as discussed in Section 2.2), the GBR model preserved predictive stability without overfitting to dominant clusters.

The implications of these results are significant for infrastructure design and planning. Reliable predictions across the entire operational domain enable scenario testing and real-time feedback during project-level sustainability assessments. Furthermore, the high alignment of predictions with actual results provides strong justification for deploying GBR as the core engine in the subsequent optimization phase. Unlike black-box deep learning alternatives, GBR offers a transparent and interpretable modeling structure when combined with SHAP-based analysis, further reinforcing its suitability for policy-driven, material logistics decision support systems.

While the visual accuracy is high, it is also important to acknowledge potential residual structure under certain conditions, such as combined extremes in moisture content and transport distance. Future research could benefit from incorporating higher-order interaction terms explicitly, or exploring hybrid GBR–neural ensemble structures to resolve such localized performance plateaus. Nonetheless, in its current form, the GBR model presents a technically sound and operationally feasible predictive framework for quantifying both cost and emission consequences of steel slag utilization.

4.2. Feature Importance and SHAP Analysis

To enhance the interpretability of the gradient boosted regression (GBR) model and quantify the relative contribution of each input variable to the predicted outputs, a comprehensive SHAP (SHapley Additive exPlanations) analysis was performed. SHAP offers a unified framework grounded in cooperative game theory, capable of decomposing complex model predictions into additive feature attributions. This capability is particularly valuable in engineering applications where model transparency is crucial for regulatory compliance, public sector decision-making, and model validation across heterogeneous operational scenarios.
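The additive decomposition behind SHAP can be illustrated with an exact, brute-force Shapley computation on a small function. The sketch below is not the TreeSHAP algorithm used for tree ensembles, and the three-feature cost function, its coefficients, and the baseline are hypothetical; features outside a coalition are simply set to their baseline value.

```python
from itertools import combinations
from math import factorial
import numpy as np

def shapley_values(f, x, baseline):
    """Exact Shapley attributions of f(x) relative to f(baseline).

    Features absent from a coalition take their baseline value.
    Cost grows as 2**d, so this is only practical for a few features.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    d = len(x)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                z = baseline.copy()
                for j in subset:
                    z[j] = x[j]
                without_i = f(z)
                z[i] = x[i]
                phi[i] += weight * (f(z) - without_i)
    return phi

def toy_cost(z):
    """Hypothetical surrogate: distance (km), energy (kWh/ton), grid factor."""
    distance_km, energy_kwh_t, grid_factor = z
    return 0.1 * distance_km + 4.0 * energy_kwh_t + 2.0 * energy_kwh_t * grid_factor

x = [100.0, 1.5, 0.8]
baseline = [50.0, 1.0, 0.5]
phi = shapley_values(toy_cost, x, baseline)
```

The attributions sum to the difference between the prediction at `x` and at the baseline, which is the additivity property that makes the values comparable across features. The product term is shared between the two features it couples.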

The global SHAP summary plot for cost prediction identified Transport Distance (x₁) and Energy Intensity (x₂) as the most influential variables, followed by Grid Emission Factor (x₅) and Gradation Level (x₄). These rankings are consistent with physical expectations and highlight the key operational drivers of economic cost in slag-based infrastructure projects. To further elucidate feature behavior, Figure 6 presents SHAP dependence plots for four selected variables, offering deeper insight into interaction effects and nonlinear dependencies.

In the top-left panel of Figure 6, SHAP values for transport distance exhibit a near-linear relationship with cost impact, particularly beyond 100 km. Interestingly, the variance of SHAP values increases slightly at long distances (>200 km), reflecting interactions with gradation level and moisture content that affect overall transport mass and handling requirements. This observation is consistent with findings by Eštoková et al. [70], who reported that transport of building materials contributes significantly to the environmental impacts of construction, accounting for over 45% of life-cycle costs.

The top-right panel highlights the sensitivity of predicted cost to Energy Intensity (x₂), which includes mechanical treatment stages such as crushing, sieving, and moisture conditioning. The SHAP curve shows a pronounced positive slope between 1.2 and 1.8 kWh/ton, after which the impact saturates. This nonlinear response suggests threshold behavior in equipment efficiency, possibly influenced by system load or stage multiplicity. The bottom-left panel addresses Moisture Content (x₃) and its effect on predicted CO₂ emissions. SHAP values rise sharply for moisture values above 8%, aligning with thermodynamic principles: drying energy requirements rise steeply with initial water mass, as described by Fourier heat-conduction and latent-heat relationships in kiln-drying systems. In maritime or humid climates where ambient slag exposure is high, this becomes a critical design consideration. This observation is in line with An et al. [71], who demonstrated that even modest increases in initial water content can elevate CO₂ emissions by over 20% due to the additional heat input required for pre-drying.

The bottom-right panel examines the influence of Grid CO₂ Emission Factor (x₅). SHAP values grow nearly linearly as the grid factor increases from 0.3 to 1.0 kg CO₂/kWh, reinforcing the dominant role of energy source mix in determining environmental burden. This result supports prior analyses by Wang et al. [72], who emphasized that electrification without grid decarbonization can actually worsen life-cycle emissions in materials processing sectors.

An important insight from these plots is the interaction between x₂ and x₅, particularly visible when high energy demand overlaps with carbon-intensive grids. In such cases, the marginal emission impact per kWh can exceed 1.0 kg CO₂/kWh, suggesting that regional grid decarbonization is not merely a background variable but an actionable design lever. Similarly, co-dependence between transport distance and gradation level implies that logistics planning should be carried out jointly with material quality requirements, rather than sequentially or in isolation.
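One way to see why x₂ and x₅ interact is a simplified accounting identity, assumed here purely for illustration and not taken from the fitted GBR model: if the processing-related emissions per ton, written here as the hypothetical quantity E_proc, are the product of energy intensity and grid factor, then

```latex
E_{\mathrm{proc}} = x_2 \, x_5, \qquad
\frac{\partial E_{\mathrm{proc}}}{\partial x_2} = x_5, \qquad
\frac{\partial^2 E_{\mathrm{proc}}}{\partial x_2 \, \partial x_5} = 1 \neq 0 .
```

Under this assumption, the emission penalty of each additional kWh/ton equals the local grid factor, so the same process change matters more on a carbon-intensive grid. At x₂ = 1.5 kWh/ton, for example, moving from a grid factor of 0.3 to 1.0 kg CO₂/kWh raises E_proc from 0.45 to 1.5 kg CO₂/ton.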

Together, these SHAP results offer a powerful diagnostic toolkit for stakeholders to not only understand model behavior but also identify leverage points for intervention. The ability to trace how each feature influences cost and emission predictions across different ranges and interaction conditions significantly strengthens the operational credibility of the GBR-based surrogate model and supports evidence-based policy formulation.

In Figure 7, which summarizes the model behavior for cost prediction, the dominant variables are clearly Transport Distance (x₁) and Processing Energy Intensity (x₂), with mean SHAP values of 1.25 and 1.08, respectively. This aligns with traditional life-cycle cost assessments (e.g., Luo et al. [73], de Bortoli [74]), which have consistently identified fuel-related haulage expenses and electricity-based processing charges as the principal contributors to the economic footprint of heavy construction materials. The cost impact of the Grid Emission Factor (x₅), while notable, is relatively moderate (0.79), likely because regional energy price differences are less extreme than emission intensity variations. Gradation Adjustment (x₄) and Moisture Content (x₃) show limited influence on cost, except in edge cases involving high processing demand, suggesting that they are secondary in economic terms.

To understand the internal mechanics of the GBR model in predicting equivalent CO₂ emissions, SHAP analysis was extended specifically to the emission target variable. Figure 8 summarizes the global feature importance rankings based on mean absolute SHAP values, while Figure 9 presents dependence plots that reveal how each input variable influences the emission output across its operational range. Together, these results offer a granular understanding of which process characteristics and regional factors most significantly drive the environmental footprint of slag-based base layer construction.

The SHAP summary plot (Figure 8) reveals that Moisture Content at the source (x₃) is the most dominant factor affecting CO₂ emissions, followed closely by the Grid CO₂ Emission Factor (x₅) and Processing Energy Intensity (x₂). This finding supports the thermodynamic rationale that higher moisture levels necessitate additional thermal energy for pre-drying, particularly under field compaction protocols. As shown in Figure 6 (bottom-left), SHAP values for x₃ increase sharply and almost linearly beyond 8% moisture, consistent with the steep growth of required drying energy with water mass described by Fourier heat conduction and latent heat principles. Hossain et al. [75] demonstrated that a 3% increase in moisture content is associated with a 25–30% increase in predicted CO₂ emissions under the learned model. It is important to note, however, that SHAP values quantify the marginal contribution of each input feature to the model’s prediction under the structure of the trained GBR model. While SHAP provides insight into feature influence, it does not imply causality in a mechanistic or physical sense.

The second-most important feature, Grid CO₂ Emission Factor (x₅), displays a near-linear correlation with model output (see Figure 9, bottom-right). This indicates that even when total energy consumption remains constant, the carbon footprint can vary significantly depending on the regional electricity mix. As highlighted in Knobloch et al. [76], electrification strategies that are not coupled with aggressive grid decarbonization can paradoxically worsen life-cycle emissions. In this context, the model’s behavior emphasizes the criticality of where processing is conducted, not just how much energy is used. This has direct implications for infrastructure planning, suggesting that siting decisions for slag processing facilities should consider the carbon intensity of regional grids. Interestingly, Processing Energy Intensity (x₂), while a leading factor in cost prediction, has a reduced but still notable impact on CO₂ emissions. As illustrated in Figure 6 (top-right), its SHAP influence becomes pronounced only after 1.5 kWh/ton, indicating a nonlinear threshold beyond which additional energy input significantly amplifies emissions. In essence, not all increases in energy use are equally carbon-intensive; their impact is modulated by both grid context and energy efficiency of the process [77].

Transport Distance (x₁)—while a major cost driver—has relatively muted importance in CO₂ emissions (Figure 9, top-left), with SHAP values rarely exceeding 1.0 across the 5–250 km range. This suggests that diesel-based freight emissions are less significant compared to electricity-based emissions, especially in processing-intensive configurations. Gradation Adjustment Level (x₄), although present in the model, demonstrates minimal SHAP contribution in both cost and emission predictions.

From an engineering perspective, these results underscore a critical insight: emission mitigation cannot be achieved solely through energy minimization—it must be pursued through a combination of input control (e.g., moisture management), strategic facility siting (grid-aware), and process efficiency (post-threshold load optimization). This explains why emissions-driven optimization paths often diverge from cost-optimal configurations, justifying the need for dual-objective frameworks like the one employed in this study.

Moreover, the SHAP interaction behavior between x₂ and x₅—as seen in regions of the SHAP space with both high energy use and high grid factors—reinforces that emissions are a system-level outcome, not a single-variable function. This supports integrated policy measures, such as emissions-weighted tariffs or carbon-adjusted location scoring, especially in public procurement of green infrastructure materials. To formalize these interaction patterns, we computed SHAP interaction values using TreeExplainer’s interaction matrix. The interaction strength between features was quantified using the mean absolute SHAP interaction values across all samples. The highest interaction strength was observed between Moisture Content (x₃) and Energy Intensity (x₂), with a mean value of 0.14, suggesting that the emission impact of energy use is highly modulated by moisture levels. Additional notable interactions included Transport Distance (x₁) × Grid CO₂ Factor (x₅) (0.10), and Moisture Content × Gradation Level (x₄) (0.09), both of which indicate operational synergies between logistics and material conditioning processes. These quantitative insights confirm the non-additive nature of the GBR model and highlight key coupled variables that may require coordinated policy or engineering interventions.
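The aggregation step described above, taking the mean absolute interaction value across samples and ranking feature pairs, can be sketched as follows. The array is a synthetic stand-in with the shape (n_samples, 5, 5) that `shap.TreeExplainer(model).shap_interaction_values(X)` would return for five inputs; the numbers are hypothetical and only arranged so that the x₂ × x₃ pair stands out.

```python
import numpy as np

FEATURES = ["x1", "x2", "x3", "x4", "x5"]

def interaction_strength(interactions):
    """Mean absolute interaction value over samples; input shape (n_samples, d, d)."""
    arr = np.asarray(interactions, dtype=float)
    if arr.ndim != 3 or arr.shape[1] != arr.shape[2]:
        raise ValueError("expected an array of shape (n_samples, d, d)")
    return np.abs(arr).mean(axis=0)

def top_pairs(strength, names, k=3):
    """Rank off-diagonal feature pairs by interaction strength, strongest first."""
    d = len(names)
    pairs = [
        (names[i], names[j], float(strength[i, j]))
        for i in range(d)
        for j in range(i + 1, d)
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:k]

# Synthetic stand-in for an interaction tensor (hypothetical values)
rng = np.random.default_rng(0)
raw = rng.normal(scale=0.01, size=(200, 5, 5))
raw = (raw + raw.transpose(0, 2, 1)) / 2  # interaction matrices are symmetric
raw[:, 1, 2] += 0.14  # inject a strong x2-x3 interaction
raw[:, 2, 1] += 0.14

strength = interaction_strength(raw)
ranking = top_pairs(strength, FEATURES)
```

The diagonal of each matrix holds main effects, so only the upper triangle is ranked.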

In conclusion, the SHAP analysis for CO₂ emission prediction provides deep interpretability to the GBR model, validates its alignment with known physical and empirical behaviors, and offers actionable levers for low-carbon material logistics. By exposing how emissions are driven by the interaction of process, material, and geography, this layer of analysis enhances both scientific credibility and practical usability of the modeling framework.

4.3. Multi-Objective Optimization Results

The bi-objective optimization problem was solved using the NSGA-II evolutionary algorithm, with a population size of 200 and 250 generations. The resulting Pareto front consists of 174 non-dominated solutions, each representing a unique trade-off configuration between total cost and CO₂ emissions associated with steel slag utilization in road base layers. Figure 10 illustrates the distribution of the Pareto-optimal set in the objective space, revealing a smooth and well-formed frontier indicative of good diversity and convergence.
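The notion of a non-dominated solution can be made concrete with a small filter. This sketch covers only the first-front extraction for minimized objectives; it omits the crowding distance, selection, crossover, and mutation steps of NSGA-II, and the candidate values are hypothetical.

```python
import numpy as np

def non_dominated_mask(objectives):
    """True for rows not dominated by any other row (all objectives minimized)."""
    F = np.asarray(objectives, dtype=float)
    mask = np.ones(F.shape[0], dtype=bool)
    for i in range(F.shape[0]):
        # Row j dominates row i if it is no worse everywhere and better somewhere
        no_worse = np.all(F <= F[i], axis=1)
        better = np.any(F < F[i], axis=1)
        if np.any(no_worse & better):
            mask[i] = False
    return mask

# Hypothetical candidates: [cost in USD/m³, CO₂ in kg/m³]
candidates = np.array([
    [18.5, 137.8],
    [25.3, 87.2],
    [40.9, 45.6],
    [30.0, 120.0],  # worse than the second row in both objectives
    [41.0, 90.0],   # worse than the second and third rows
])
mask = non_dominated_mask(candidates)
pareto = candidates[mask]
```

The pairwise check is quadratic in the number of candidates, which is acceptable for a population of a few hundred.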

The observed trade-off structure follows the expected inverse pattern: reductions in unit cost are generally accompanied by increases in CO₂ emissions, and vice versa. The Pareto front spans a solution domain where cost ranges from 18.5 to 41.2 USD/m³, and CO₂ emissions vary between 45.6 and 138.2 kg/m³. This broad distribution provides a high-resolution design space for policy-sensitive infrastructure planning.

Several inflection points, or “knee points”, were identified along the front, offering efficient trade-off solutions, and three reference points are highlighted. Point A (cost-optimal) minimizes total cost (18.5 USD/m³) but corresponds to one of the highest emission values (137.8 kg/m³). Point C (emission-optimal) yields the lowest emissions (45.6 kg/m³) at a substantially higher cost (40.9 USD/m³). Point B (balanced solution) represents a pragmatic middle ground, balancing moderate cost (25.3 USD/m³) and emissions (87.2 kg/m³), which may be particularly valuable in policy frameworks with carbon pricing or life-cycle-based procurement. All three highlighted points (A, B, and C) were extracted directly from the NSGA-II solution pool under the same feasibility constraints discussed in Section 3.4.
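A common heuristic for locating a knee is to scale both objectives to [0, 1] and pick the point farthest from the straight line joining the two extremes. The text does not state which criterion was used in the study, so the sketch below is an assumption, applied here to the three reported points.

```python
import numpy as np

def knee_index(front):
    """Index of the point farthest from the chord joining the two extremes.

    Both objectives are min-max scaled first so that units do not matter.
    """
    F = np.asarray(front, dtype=float)
    span = F.max(axis=0) - F.min(axis=0)
    Z = (F - F.min(axis=0)) / np.where(span > 0, span, 1.0)
    order = np.argsort(Z[:, 0])
    start, end = Z[order[0]], Z[order[-1]]
    chord = end - start
    rel = Z - start
    # |2-D cross product| / |chord| is the perpendicular distance to the chord
    dist = np.abs(chord[0] * rel[:, 1] - chord[1] * rel[:, 0]) / np.linalg.norm(chord)
    return int(np.argmax(dist))

# Points A, B, C as reported: [cost in USD/m³, CO₂ in kg/m³]
points = np.array([
    [18.5, 137.8],  # A, cost-optimal
    [25.3, 87.2],   # B, balanced
    [40.9, 45.6],   # C, emission-optimal
])
knee = knee_index(points)
```

With only three points the result is trivially the middle one; on a dense front the same function singles out the region where the trade-off curve bends most sharply.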

From an engineering optimization perspective, these solutions map directly to operational decisions. Cost-optimal configurations generally exploit long-distance transportation from low-cost slag suppliers, minimal pre-processing, and low electricity price regions—often at the expense of higher grid CO₂ intensity and unaddressed moisture content. In contrast, low-emission scenarios are associated with shorter transport distances (<40 km), low initial moisture content (<5%), and grid locations with renewable-heavy electricity generation. These trends are consistent with the findings of Ajayi et al. [78], who emphasized the critical role of energy quality—not just quantity—in determining life-cycle environmental outcomes of materials.

Comparative results in similar multi-objective studies support the observed solution structure. For instance, Zhang et al. [79] employed NSGA-II for sustainable concrete mix optimization and reported similar nonlinear trade-off boundaries. Such boundaries typically show diminishing returns near the Pareto extremes, a phenomenon observable in Figure 10, where incremental gains in emissions reduction beyond 55 kg/m³ require disproportionately higher cost increases. Krishna et al. [80] also demonstrated that for pavement design, the integration of energy source carbon intensity into optimization models significantly reshapes the feasible envelope of performance.

Importantly, this analysis demonstrates that no single configuration can simultaneously minimize both objectives. Hence, decision makers must select solutions that align with regional regulatory frameworks, carbon tax structures, and material supply chain constraints. For instance, in jurisdictions with explicit carbon pricing, the shadow price of carbon may shift the optimal point toward the emission-efficient end of the frontier. Conversely, in regions where budget constraints dominate and environmental regulations are less stringent, cost-optimal or balanced solutions may be more feasible.
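The effect of a carbon price on the preferred point can be sketched by adding a priced emission term to the unit cost. The linear form and the price levels are assumptions for illustration; the three points are the A, B, and C values reported in Section 4.3.

```python
# Reported points: (cost in USD/m³, CO₂ in kg/m³)
points = {
    "A": (18.5, 137.8),  # cost-optimal
    "B": (25.3, 87.2),   # balanced
    "C": (40.9, 45.6),   # emission-optimal
}

def preferred_point(carbon_price_usd_per_kg):
    """Label of the point with the lowest cost + carbon_price * emissions."""
    def total(item):
        cost, co2 = item[1]
        return cost + carbon_price_usd_per_kg * co2
    return min(points.items(), key=total)[0]

# Hypothetical carbon prices of 50, 200, and 500 USD per tonne of CO₂
choices = {price: preferred_point(price) for price in (0.05, 0.20, 0.50)}
```

Under these assumptions the preference moves from A to B to C as the price rises. A weighted sum of this kind can only select points on the convex hull of the front, so it complements rather than replaces the full Pareto set.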

The value of the Pareto front is further enhanced by its actionability. Each solution corresponds to a real-world configuration of five input variables, as shown in the parallel coordinate plot (Figure 11), which allows practitioners to back-calculate the specific input mix associated with each point on the front. This capability aligns with best practices in decision support system design for infrastructure material selection, as outlined by Seyedashraf et al. [81], and it represents a significant advantage over single-objective models or heuristic decision making.

In summary, the NSGA-II optimization results provide a comprehensive, data-driven foundation for selecting optimal slag supply configurations that reflect context-specific priorities. While all Pareto-optimal configurations were generated under engineering feasibility constraints (e.g., energy-mass balance, processing bounds), the physical realism and constructability of extreme points—especially those in the high-cost/low-emission corner—require critical interpretation. For instance, configurations with minimal CO₂ emissions often involve very short transport distances (<10 km), advanced drying processes, and operation within near-zero carbon grids. While these setups are technically plausible, their implementation would require premium infrastructure, renewable-dominated energy access, and strict material controls—conditions that may not be widely available or economically feasible in many regions. Therefore, these solutions should be viewed as boundary references for best-case sustainability scenarios rather than directly actionable designs. By combining a predictive GBR model with an efficient evolutionary search algorithm, the framework generates technically feasible and environmentally responsive solutions that go beyond conventional cost-minimization strategies. The resulting Pareto front not only confirms trade-off structures consistent with theory and practice but also empowers stakeholders to implement more informed, transparent, and future-ready decisions.

4.4. Optimal Solution Selection and Practical Interpretation

While the Pareto front offers a diverse set of non-dominated solutions balancing cost and CO₂ emissions, practical implementation often necessitates a unique selection. To operationalize the model output for real-world application, the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) was employed. This method evaluates each Pareto-optimal solution based on its Euclidean distance to an ideal solution (minimum cost and minimum emission) and an anti-ideal (maximum values of both). The solution with the highest closeness coefficient (0.931) was selected as the optimal configuration, as shown in Figure 12.
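The TOPSIS ranking step can be sketched as below. Vector normalization and equal weights are assumptions, since the weighting is not stated in this section, and the demonstration uses only the three points A, B, and C from Section 4.3 instead of the full 174-solution front.

```python
import numpy as np

def topsis_closeness(objectives, weights=None):
    """Closeness coefficients in [0, 1] for criteria that are all minimized."""
    F = np.asarray(objectives, dtype=float)
    n, m = F.shape
    w = np.full(m, 1.0 / m) if weights is None else np.asarray(weights, dtype=float)
    norm = np.linalg.norm(F, axis=0)
    V = w * F / np.where(norm > 0, norm, 1.0)  # weighted, vector-normalized matrix
    ideal = V.min(axis=0)  # best value of each minimized criterion
    anti = V.max(axis=0)   # worst value of each criterion
    d_plus = np.linalg.norm(V - ideal, axis=1)
    d_minus = np.linalg.norm(V - anti, axis=1)
    denom = d_plus + d_minus
    # If all alternatives coincide, fall back to a neutral score of 0.5
    return np.divide(d_minus, denom, out=np.full(n, 0.5), where=denom > 0)

# Points A, B, C as reported: [cost in USD/m³, CO₂ in kg/m³]
front = np.array([
    [18.5, 137.8],
    [25.3, 87.2],
    [40.9, 45.6],
])
closeness = topsis_closeness(front)
best = int(np.argmax(closeness))
```

Because the ideal and anti-ideal points are built from the set being ranked, the coefficients depend on which solutions are included; values computed on three points are not comparable with the 0.931 reported for the full front.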

The selected solution exhibits the following parameter profile:

Transport Distance (x₁): 47 km
Energy Intensity (x₂): 1.21 kWh/ton
Moisture Content (x₃): 6.2%
Gradation Adjustment Level (x₄): Level 1
Grid CO₂ Factor (x₅): 0.47 kg CO₂/kWh
Total cost: 24.6 USD/m³
Total CO₂ Emission: 74.8 kg/m³

This configuration lies near the geometric center of the Pareto front, offering a 45% reduction in CO₂ emissions relative to the cost-optimal solution (Point A), with only a 33% increase in unit cost. It thus represents a balanced and cost-effective pathway toward decarbonization, particularly suited for regions with medium-carbon electricity grids.
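The percentages quoted above follow directly from the reported values for the selected solution and Point A. The short check below also derives an implied abatement cost, which is an added illustration and not a figure reported in the study.

```python
def percent_change(new, reference):
    """Relative change of `new` with respect to `reference`, in percent."""
    return 100.0 * (new - reference) / reference

# Reported values: selected solution versus cost-optimal Point A
cost_selected, cost_a = 24.6, 18.5   # USD/m³
co2_selected, co2_a = 74.8, 137.8    # kg CO₂/m³

cost_change = percent_change(cost_selected, cost_a)  # positive: more expensive
co2_change = percent_change(co2_selected, co2_a)     # negative: lower emissions

# Extra cost per tonne of CO₂ avoided when moving from Point A to the selection
abatement_usd_per_tonne = 1000.0 * (cost_selected - cost_a) / (co2_a - co2_selected)
```

The cost increase is close to 33% and the emission reduction is between 45% and 46%, which corresponds to an implied abatement cost slightly below 100 USD per tonne of CO₂ under these figures.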

Compared to emission-optimal designs, which often demand short-haul sourcing, high-efficiency moisture control, and premium grid locations, this solution requires no extreme infrastructure adaptations. Transport distances under 50 km are feasible in peri-urban and industrial zones with regional steel production, while moderate energy intensity (≈1.2 kWh/ton) aligns with single-stage crushing equipment prevalent in many recycling plants.

From a policy perspective, the selected configuration aligns well with the EU Taxonomy for Sustainable Activities, which encourages life-cycle CO₂ reduction without exceeding economic viability thresholds [82]. Chowdury and Hossain [83] argue that optimal sustainability strategies must avoid extremes in either cost or emissions to ensure stakeholder acceptance—an outcome embodied by the selected solution. Nevertheless, it should be noted that the reported CO₂ values reflect a cradle-to-gate perspective and exclude potential downstream offsets such as avoided burdens or service-life carbonation, potentially rendering the emission figures conservative.

The gradation level of 1 also offers meaningful insight. It reflects a single-pass sieving operation, avoiding the higher energy demands and equipment costs associated with fine particle control. This is consistent with the findings of Bahmani et al. [84], who found that the over-processing of slag for particle uniformity has diminishing returns on mechanical performance while increasing environmental impact.

Furthermore, the selected grid CO₂ factor (0.47 kg/kWh) represents a technologically realistic target. It is achievable through moderate decarbonization (e.g., 40–50% renewable mix), without requiring full transition to solar or hydro. This grid profile is now present in countries like Portugal and parts of Canada, making this solution replicable in multiple jurisdictions [85].

In conclusion, the TOPSIS-selected configuration provides a technically feasible, environmentally advantageous, and economically moderate design solution. It reflects both engineering practicality and alignment with global sustainability frameworks, making it suitable as a baseline strategy for infrastructure authorities, policymakers, and material producers aiming to integrate circular economy principles into road base design.

4.5. Robustness of the Optimal Configuration Under Uncertainty

Under the uncertainty propagation scenario, the recommended configuration (x₁ = 47 km, x₂ = 1.21 kWh/ton, x₅ = 0.47 kg CO₂/kWh) consistently remained within the top 5% of all candidate solutions based on the TOPSIS closeness coefficient in 91.2% of the 500 stochastic realizations. This indicates a high degree of robustness and suggests that the identified compromise solution is not merely a by-product of deterministic assumptions.

Moreover, the spread of the cost and emission predictions remained tightly concentrated around the original deterministic Pareto frontier, with less than ±5% deviation observed for both objectives in the majority of scenarios. These findings confirm the stability of the optimized solution under operational uncertainties and reinforce its practical viability.

Figure 13 depicts the probabilistic Pareto envelope generated through uncertainty-aware simulations, highlighting the limited drift in the optimal front and the persistent dominance of the selected configuration. The deterministic Pareto front is shown in red, while the cloud of gray curves represents possible trade-off solutions under input uncertainty. Notably, the TOPSIS-optimal configuration (indicated by a blue marker) consistently resides within the most favorable region of this envelope, maintaining a position within the top 5% of all realizations based on closeness coefficient rankings. This graphical evidence confirms the robustness of the recommended solution, indicating that it is not a statistical artifact of single-point assumptions. Moreover, the narrow dispersion of the stochastic frontier around the deterministic curve suggests that the optimization framework is stable and resilient to moderate fluctuations in input data, thereby enhancing the real-world applicability of the results.
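The rank-stability statistic reported in this section can be sketched with a simple Monte Carlo loop. This is a stand-in under stated assumptions: the study perturbs the inputs and re-evaluates the surrogate, whereas the sketch perturbs the objective values directly with uniform relative noise, uses equal TOPSIS weights, and runs on a synthetic convex front whose shape is hypothetical.

```python
import numpy as np

def closeness(F):
    """TOPSIS closeness for two minimized objectives, equal weights (assumed)."""
    V = 0.5 * F / np.linalg.norm(F, axis=0)
    d_plus = np.linalg.norm(V - V.min(axis=0), axis=1)
    d_minus = np.linalg.norm(V - V.max(axis=0), axis=1)
    return d_minus / (d_plus + d_minus)

def top_share(front, index, n_draws=500, rel_noise=0.05, top_frac=0.05, seed=0):
    """Fraction of noisy draws in which `index` ranks within the top `top_frac`."""
    rng = np.random.default_rng(seed)
    F = np.asarray(front, dtype=float)
    cutoff = max(1, int(np.ceil(top_frac * len(F))))
    hits = 0
    for _ in range(n_draws):
        noise = rng.uniform(-rel_noise, rel_noise, size=F.shape)
        cc = closeness(F * (1.0 + noise))
        rank = int(np.sum(cc > cc[index]))  # 0 means best in this draw
        hits += rank < cutoff
    return hits / n_draws

# Synthetic front spanning the reported ranges (shape is hypothetical)
cost = np.linspace(18.5, 41.2, 174)
t = (cost - cost[0]) / (cost[-1] - cost[0])
co2 = 45.6 + (138.2 - 45.6) * (1.0 - t) ** 2
front = np.column_stack([cost, co2])

best = int(np.argmax(closeness(front)))
share = top_share(front, best)
```

Independent noise on every point is a harsher test than correlated input uncertainty, because neighbouring solutions can swap ranks freely, so the share obtained this way should not be compared directly with the 91.2% reported above.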

5. Conclusions

This study developed a robust, machine learning-based multi-objective optimization (MOO) framework that integrates tree-based ensemble modeling with the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) to quantitatively assess and balance economic and environmental performance in the use of steel slag as road base material. Through a systematic combination of predictive analytics, feature interpretability, and Pareto-based optimization, the framework offers an actionable decision support structure for infrastructure planners navigating cost-emission trade-offs. The primary technical conclusions of this study are as follows:

Modeling accuracy and predictive reliability: Among the evaluated ensemble methods, the Gradient Boosted Regressor (GBR) achieved the highest predictive performance for both cost and CO₂ emissions, with R² values of 0.962 (cost) and 0.955 (CO₂), and RMSE values below 7.5 units on a 20% holdout test set. These results confirm the capacity of GBR to capture complex, nonlinear interactions in high-dimensional infrastructure logistics datasets. The use of Bayesian optimization for hyperparameter tuning further improved convergence and model generalizability.

Feature-level sensitivity and interaction patterns: SHAP analysis revealed that Transport Distance (x₁) and Energy Intensity (x₂) are the most influential features in cost determination, whereas Moisture Content (x₃) and grid CO₂ Emission Factor (x₅) dominate the emission prediction landscape. Notably, the cross-dependence between x₂ and x₅—where high processing demand under carbon-intensive grids led to extreme emissions—emphasizes the need for joint optimization of operational and locational parameters in material supply chains.

Multi-objective trade-off characterization: The NSGA-II algorithm produced a well-distributed Pareto front comprising 174 non-dominated configurations, covering a cost range of 18.5–41.2 USD/m³ and CO₂ emissions from 45.6 to 138.2 kg/m³. This empirical front confirms the existence of nonlinear, non-convex trade-off regions, including clearly observable knee points where marginal increases in cost lead to significant emission savings, aligning with similar findings in MOO-based materials research.

Optimal solution identification via TOPSIS: Using distance-based ranking, a configuration with 24.6 USD/m³ cost and 74.8 kg/m³ CO₂ emission was identified as the most balanced solution (closeness coefficient = 0.931). This configuration features moderate transport distance (47 km), mid-level energy intensity (1.21 kWh/ton), and typical grid carbon intensity (0.47 kg/kWh), making it both technologically realistic and operationally replicable. Sensitivity analysis confirmed the stability of the proposed configuration under ±10% input variability, validating its robustness for practical deployment.

Recommended Pareto Solution and Practical Implications: From a practical perspective, one of the mid-range configurations on the Pareto front was identified as the most applicable solution, offering a reasonable compromise between cost and CO₂ emissions. This solution represents neither the lowest-cost nor the lowest-emission alternative, but rather it balances both objectives in a technologically realistic and logistically feasible manner. It reflects moderate values across key decision variables such as transport distance, energy use, and grid carbon intensity—conditions commonly encountered in real-world infrastructure projects. Its selection supports practical implementation in regions where both economic constraints and environmental regulations must be simultaneously considered.

Scalability and transferability of the framework: Although this study focused on BOF slag in road base applications, the proposed MOO–SHAP framework is generalizable to other recycled or industrial by-product materials, such as copper slag, bottom ash, or fly ash composites. Its transparency, interpretability, and alignment with regulatory sustainability criteria support deployment across infrastructure sectors where carbon, cost, and resource circularity are concurrently managed.

In conclusion, the integration of ensemble-based machine learning, explainable AI (via SHAP), and NSGA-II optimization constitutes a replicable and technically rigorous pathway for evaluating and deploying low-carbon construction materials. This framework not only enhances quantitative decision support in circular economy applications, but also facilitates region-specific material strategy design. Future research should incorporate stochastic supply scenarios and real-time transport network data to refine the optimization process under dynamic planning conditions. Additionally, expanding the system boundary to include Modules A1–A3 and D would enable full alignment with standardized life-cycle assessment practices. The use of hybrid AI optimization in infrastructure engineering holds strong promise for advancing both climate mitigation and cost performance simultaneously.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data and code used in this study are available at Zenodo via the following DOI: https://doi.org/10.5281/zenodo.16313759 (accessed on 29 July 2025).

Conflicts of Interest

The author declares no conflicts of interest.

References

Nwakaire, C.M.; Yap, S.P.; Onn, C.C.; Yuen, C.W.; Ibrahim, H.A. Utilisation of recycled concrete aggregates for sustainable highway pavement applications: A review. Constr. Build. Mater. 2020, 235, 117444.
Hapendi, N.H.; Ali, D.S.H. Exploring the potential of recycled aggregates in modern construction: Challenges and innovations. Int. J. Adv. Eng. Res. Sci. 2025, 12, 609276.
Mica, N.G.; Rios, S.; Viana da Fonseca, A.; Fortunato, E. The use of steel slags in transport infrastructures: A critical review. Geotech. Test. J. 2024, 47, GTJ20230297.
Kumar, P.; Shukla, S. Utilization of steel slag waste as construction material: A review. Mater. Today Proc. 2023, 78, 145–152.
Zhu, J.F.; Wang, Z.Q.; Tao, Y.L.; Ju, L.Y.; Yang, H. Macro–micro investigation on stabilization sludge as subgrade filler by the ternary blending of steel slag and fly ash and calcium carbide residue. J. Clean. Prod. 2024, 447, 141496.
Liu, J.; Wang, W.; Wang, Y.; Zhou, X.; Wang, S.; Liu, Q.; Yu, B. Towards the sustainable utilization of steel slag in asphalt pavements: A case study of moisture resistance and life cycle assessment. Case Stud. Constr. Mater. 2023, 18, e01722.
Li, L.; Ling, T.C.; Pan, S.Y. Environmental benefit assessment of steel slag utilization and carbonation: A systematic review. Sci. Total Environ. 2022, 806, 150280.
Murphy, T.; Howard, I. Balancing Availability, Quality, Economics, and the Environment When Using Steel Slag within Pavements. In Geo-Congress 2023; ASCE: Reston, VA, USA, 2023; Volume 2023, pp. 408–418.
Li, H.; Deng, Q.; Zhang, J.; Olanipekun, A.O.; Lyu, S. Environmental impact assessment of transportation infrastructure in the life cycle: Case study of a fast track transportation project in China. Energies 2019, 12, 1015.
Echenagucia, T.M.; Moroseos, T.; Meek, C. On the tradeoffs between embodied and operational carbon in building envelope design: The impact of local climates and energy grids. Energy Build. 2023, 278, 112589.
Tirkolaee, E.B.; Aydin, N.S.; Mahdavi, I. A bi-level decision-making system to optimize a robust–resilient–sustainable aggregate production planning problem. Expert Syst. Appl. 2023, 228, 120476.
Lei, Z.; Qi, W.; Zhang, L.; Yang, J.; Zhuorui, Z. Preparation and comprehensive performance optimization of green insulation building materials based on blast furnace slag. J. Build. Eng. 2025, 106, 112591.
Jayarathna, C.P.; Agdas, D.; Dawes, L.; Yigitcanlar, T. Multi-objective optimization for sustainable supply chain and logistics: A review. Sustainability 2021, 13, 13617.
Naser, M.Z. A look into how machine learning is reshaping engineering models: The rise of analysis paralysis, optimal yet infeasible solutions, and the inevitable Rashomon paradox. Mach. Learn. Comput. Sci. Eng. 2025, 1, 19.
Anjum, A.; Hrairi, M.; Aabid, A.; Yatim, N.; Ali, M. Civil structural health monitoring and machine learning: A comprehensive review. Fract. Struct. Integr. 2024, 18, 43–59.
Vadyala, S.R.; Betgeri, S.N.; Matthews, J.C.; Matthews, E. A review of physics-based machine learning in civil engineering. Results Eng. 2022, 13, 100316.
Papadopoulos, S.; Azar, E.; Woon, W.L.; Kontokosta, C.E. Evaluation of tree-based ensemble learning algorithms for building energy performance estimation. J. Build. Perform. Simul. 2018, 11, 322–332.
Pahno, S.; Yang, J.J.; Kim, S.S. Use of machine learning algorithms to predict subgrade resilient modulus. Infrastructures 2021, 6, 78.
Sohel, I.H.; Rahman, S. Optimization of generation cost, environmental impact, and reliability of a microgrid using Non-dominated Sorting Genetic Algorithm-II. Planning 2020, 15, 1277–1284.
Zhao, B.; Xue, Y.; Xu, B.; Ma, T.; Liu, J. Multi-objective classification based on NSGA-II. Int. J. Comput. Sci. Math. 2018, 9, 539–546.
CSN EN 16258; Methodology for Calculation and Declaration of Energy Consumption and GHG Emissions of Transport Services (Freight and Passengers). CEN—European Committee for Standardization: Brussels, Belgium, 2012.
ASTM D2940; Standard Specification for Graded Aggregate Material for Bases or Subbases for Highways or Airports. ASTM International: West Conshohocken, PA, USA, 2020.
López-Acevedo, F.J.; Herrero, M.J.; Escavy Fernández, J.I.; González Bravo, J. Potential reduction in carbon emissions in the transport of aggregates by switching from road-only transport to an intermodal rail/road system. Sustainability 2024, 16, 9871.
Dias, A.; Nezami, S.; Silvestre, J.; Kurda, R.; Silva, R.; Martins, I.; de Brito, J. Environmental and economic comparison of natural and recycled aggregates using LCA. Recycling 2022, 7, 43.
Klyuev, R.; Bosikov, I.; Gavrina, O.; Madaeva, M.; Sokolov, A. Improving the energy efficiency of technological equipment at mining enterprises. In Proceedings of the International Scientific Conference Energy Management of Municipal Facilities and Sustainable Energy Technologies (EMMFT 2019), St. Petersburg, Russia, 24–26 September 2019.
St-Jacques, M.; Bucking, S.; O’Brien, W.; Macdonald, I. Spatio-temporal electrical grid emission factors effects on calculated GHG emissions of buildings in mixed-grid environments. Sci. Technol. Built Environ. 2024, 30, 37–50.
Smith, W.A.; Wendt, L.M.; Bonner, I.J.; Murphy, J.A. Effects of storage moisture content on corn stover biomass stability, composition, and conversion efficacy. Front. Bioeng. Biotechnol. 2020, 8, 716.
Xie, J.; Wang, Z.; Wang, F.; Wu, S.; Chen, Z.; Yang, C. The Life Cycle Energy Consumption and Emissions of Asphalt Pavement Incorporating Basic Oxygen Furnace Slag by Comparative Study. Sustainability 2021, 13, 4540.
Gschösser, F.; Wallbaum, H.; Adey, B.T. Environmental Analysis of New Construction and Maintenance Processes of Road Pavements in Switzerland. Struct. Infrastruct. Eng. 2014, 10, 1–24. [Google Scholar] [CrossRef]
Sun, H.; Burton, H.V.; Huang, H. Machine learning applications for building structural design and performance assessment: State-of-the-art review. J. Build. Eng. 2021, 33, 101816. [Google Scholar] [CrossRef]
Soleimani, F. Analytical seismic performance and sensitivity evaluation of bridges based on random decision forest framework. Structures 2021, 32, 329–341. [Google Scholar] [CrossRef]
Huang, Y.; Huo, Z.; Ma, G.; Zhang, L.; Wang, F.; Zhang, J. Multi-objective optimization of fly ash-slag based geopolymer considering strength, cost and CO₂; emission: A new framework based on tree-based ensemble models and NSGA-II. J. Build. Eng. 2023, 68, 106070. [Google Scholar] [CrossRef]
Khan, A.; Huyan, J.; Zhang, R.; Zhu, Y.; Zhang, W.; Ying, G.; Shah, S.K. An ensemble tree-based prediction of Marshall mix design parameters and resilient modulus in stabilized base materials. Constr. Build. Mater. 2023, 401, 132833. [Google Scholar] [CrossRef]
Wei, A.; Yu, K.; Dai, F.; Gu, F.; Zhang, W.; Liu, Y. Application of tree-based ensemble models to landslide susceptibility mapping: A comparative study. Sustainability 2022, 14, 6330. [Google Scholar] [CrossRef]
Campagner, A.; Ciucci, D.; Cabitza, F. Aggregation models in ensemble learning: A large-scale comparison. Inf. Fusion 2023, 90, 241–252. [Google Scholar] [CrossRef]
Ugirumurera, J.; Bensen, E.A.; Severino, J.; Sanyal, J. Addressing bias in bagging and boosting regression models. Sci. Rep. 2024, 14, 18452. [Google Scholar] [CrossRef]
Eloudi, H.; Hssaisoune, M.; Reddad, H.; Namous, M.; Ismaili, M.; Krimissa, S.; Bouchaou, L. Robustness of Optimized Decision Tree-Based Machine Learning Models to Map Gully Erosion Vulnerability. Soil Syst. 2023, 7, 50. [Google Scholar] [CrossRef]
Matin, M.; Azadi, M. Effect of Training Data Ratio and Normalizing on Fatigue Lifetime Prediction of Aluminum Alloys with Machine Learning. Int. J. Eng. Trans. A Basics 2024, 37, 1296–1305. [Google Scholar] [CrossRef]
Alam, M.S.; Sultana, N.; Hossain, S.Z. Bayesian optimization algorithm based support vector regression analysis for estimation of shear capacity of FRP reinforced concrete members. Appl. Soft Comput. 2021, 105, 107281. [Google Scholar] [CrossRef]
Arrighi, L.; Pennella, L.; Marques Tavares, G.; Barbon Junior, S. Decision Predicate Graphs: Enhancing Interpretability in Tree Ensembles. In Proceedings of the World Conference on Explainable Artificial Intelligence (XAI 2024), Lisbon, Portugal, 8–11 July 2024; Springer Nature: Cham, Switzerland, 2024; pp. 311–332. [Google Scholar] [CrossRef]
Mangalathu, S.; Hwang, S.H.; Jeon, J.S. Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach. Eng. Struct. 2020, 219, 110927. [Google Scholar] [CrossRef]
Lyngdoh, G.A.; Zaki, M.; Krishnan, N.A.; Das, S. Prediction of concrete strengths enabled by missing data imputation and interpretable machine learning. Cem. Concr. Compos. 2022, 128, 104414. [Google Scholar] [CrossRef]
Qiao, Q.; Eskandari, H.; Saadatmand, H.; Sahraei, M.A. An interpretable multi-stage forecasting framework for energy consumption and CO₂; emissions for the transportation sector. Energy 2024, 286, 129499. [Google Scholar] [CrossRef]
Asfi, M.; Warsito, B.; Wibowo, A. Enhancing Explainable AI: Leveraging SHAP for Transparent Decision-Making in Machine Learning. In Proceedings of the 2024 Ninth International Conference on Informatics and Computing (ICIC), Yogyakarta, Indonesia, 22–23 October 2024. [Google Scholar] [CrossRef]
Feretzakis, G.; Sakagianni, A.; Anastasiou, A.; Kapogianni, I.; Bazakidou, E.; Koufopoulos, P.; Verykios, V.S. Integrating Shapley values into machine learning techniques for enhanced predictions of hospital admissions. Appl. Sci. 2024, 14, 5925. [Google Scholar] [CrossRef]
Alomari, Y.; Andó, M. SHAP-based insights for aerospace PHM: Temporal feature importance, dependencies, robustness, and interaction analysis. Results Eng. 2024, 21, 101834. [Google Scholar] [CrossRef]
Al-Saadi, I.; Wang, H.; Chen, X.; Lu, P.; Jasim, A. Multi-objective optimization of pavement preservation strategy considering agency cost and environmental impact. Int. J. Sustain. Transp. 2021, 15, 826–836. [Google Scholar] [CrossRef]
Bravo, M.; Rojas, L.P.; Parada, V. An evolutionary algorithm for the multi-objective pick-up and delivery pollution-routing problem. Int. Trans. Oper. Res. 2019, 26, 302–317. [Google Scholar] [CrossRef]
Rajkumar, M.; Mahadevan, K.; Kannan, S.; Baskar, S. NSGA-II technique for multi-objective generation dispatch of thermal generators with nonsmooth fuel cost functions. J. Electr. Eng. Technol. 2014, 9, 423–432. [Google Scholar] [CrossRef]
Xue, Y. Mobile robot path planning with a non-dominated sorting genetic algorithm. Appl. Sci. 2018, 8, 2253. [Google Scholar] [CrossRef]
Deeb, A.; Khokhlovskiy, V.; Shkodyrev, V. Adaptive Simulated Binary Crossover with Bayesian Optimization for Industrial Applications. In Proceedings of the 2025 International Russian Smart Industry Conference (SmartIndustryCon), Moscow, Russia, 20–22 March 2025; pp. 608–613. [Google Scholar] [CrossRef]
Dong, W.; Huang, Y.; Lehane, B.; Ma, G. Multi-objective design optimization for graphite-based nanomaterials reinforced cementitious composites: A data-driven method with machine learning and NSGA-II. Constr. Build. Mater. 2022, 331, 127198. [Google Scholar] [CrossRef]
Vásquez, L.O.P.; Redondo, J.L.; Hervás, J.D.Á.; Ramírez, V.M.; Torres, J.L. Balancing CO₂; emissions and economic cost in a microgrid through an energy management system using MPC and multi-objective optimization. Appl. Energy 2023, 347, 120998. [Google Scholar] [CrossRef]
Madanchian, M.; Taherdoost, H. A Comprehensive Guide to the TOPSIS Method for Multi-Criteria Decision Making. Sustain. Soc. Dev. 2023, 1, 2220. [Google Scholar] [CrossRef]
Zhang, Y.; Chouinard, L.E.; Power, G.J.; Conciatori, D.; Sasai, K.; Bah, A.S. Multi-Objective Optimization for the Sustainability of Infrastructure Projects under the Influence of Climate Change. Sustain. Resil. Infrastruct. 2023, 8, 492–513. [Google Scholar] [CrossRef]
Sinha, R.K.; Chaturvedi, N.D. Multi-Criteria Decision-Making in Carbon-Constrained Scenario for Sustainable Production Planning. Process Integr. Optim. Sustain. 2021, 5, 905–917. [Google Scholar] [CrossRef]
Rahman, S.; Alali, A.S.; Baro, N.; Ali, S.; Kakati, P. A Novel TOPSIS Framework for Multi-Criteria Decision Making with Random Hypergraphs: Enhancing Decision Processes. Symmetry 2024, 16, 1602. [Google Scholar] [CrossRef]
Karatzetzou, A. Uncertainty and Latin Hypercube Sampling in Geotechnical Earthquake Engineering. Geotechnics 2024, 4, 1007–1025. [Google Scholar] [CrossRef]
Nyirandayisabye, R.; Li, H.; Dong, Q.; Hakuzweyezu, T.; Nkinahamira, F. Automatic Pavement Damage Predictions Using Various Machine Learning Algorithms: Evaluation and Comparison. Results Eng. 2022, 16, 100657. [Google Scholar] [CrossRef]
Guo, R.; Fu, D.; Sollazzo, G. An Ensemble Learning Model for Asphalt Pavement Performance Prediction Based on Gradient Boosting Decision Tree. Int. J. Pavement Eng. 2022, 23, 3633–3646. [Google Scholar] [CrossRef]
Liu, S.; Ryu, D.; Webb, J.A.; Lintern, A.; Guo, D.; Waters, D.; Western, A.W. A Multi-Model Approach to Assessing the Impacts of Catchment Characteristics on Spatial Water Quality in the Great Barrier Reef Catchments. Environ. Pollut. 2021, 288, 117337. [Google Scholar] [CrossRef] [PubMed]
Kaniuka, J.; Ostrysz, J.; Groszyk, M.; Bieniek, K.; Cyperski, S.; Domański, P.D. Multicriteria Machine Learning Model Assessment—Residuum Analysis Review. Electronics 2024, 13, 810. [Google Scholar] [CrossRef]
Wu, Y.; Zhang, Z.; Crabbe, M.J.C.; Chandra Das, L. Statistical Learning-Based Spatial Downscaling Models for Precipitation Distribution. Adv. Meteorol. 2022, 2022, 3140872. [Google Scholar] [CrossRef]
Robson, B.J.; Andrewartha, J.; Baird, M.E.; Herzfeld, M.; Jones, E.M.; Margvelashvili, N.; Wild-Allen, K. Evaluating the eReefs Great Barrier Reef marine model against observed emergent properties. In Proceedings of the 22nd International Congress on Modelling and Simulation (MODSIM2017), Hobart, Australia, 3–8 December 2017; Modelling and Simulation Society of Australia and New Zealand: Hobart, Australia, 2017; pp. 1976–1982. [Google Scholar] [CrossRef]
Can, M.; Vaheddoost, B.; Safari, M.J.S. Data Reconstruction for Groundwater Wells Proximal to Lakes: A Quantitative Assessment for Hydrological Data Imputation. Water 2025, 17, 718. [Google Scholar] [CrossRef]
Khiari, J.; Olaverri-Monreal, C. Boosting Algorithms for Delivery Time Prediction in Transportation Logistics. In Proceedings of the 2020 International Conference on Data Mining Workshops (ICDMW), Sorrento, Italy, 17–20 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 251–258. [Google Scholar] [CrossRef]
Qi, L.; Peng, X.; Yang, Q.; Xia, K.; Xu, B. Review of Research on Prediction Models for Residual Life of Concrete Structures. Coatings 2025, 15, 693. [Google Scholar] [CrossRef]
Roni, M.S.; Lin, Y.; Hartley, D.S.; Thompson, D.N.; Hoover, A.N.; Emerson, R.M. Importance of incorporating spatial and temporal variability of biomass yield and quality in bioenergy supply chain. Sci. Rep. 2023, 13, 6813. [Google Scholar] [CrossRef]
Meglin, R.; Kytzia, S.; Habert, G. Uncertainty, variability, price changes and their implications on a regional building materials industry: The case of Swiss canton Argovia. J. Clean. Prod. 2022, 330, 129944. [Google Scholar] [CrossRef]
Eštoková, A.; Fabianová, M.; Radačovský, M. Life cycle assessment and environmental impacts of building materials: Evaluating transport-related factors. Eng. Proc. 2023, 57, 5. [Google Scholar] [CrossRef]
An, P.; Han, Z.; Wang, K.; Cheng, J.; Zhou, J.; Rizkiana, J.; Xu, G. Process analysis of a two-stage fluidized bed gasification system with and without pre-drying of high-water content coal. Can. J. Chem. Eng. 2021, 99, 1498–1509. [Google Scholar] [CrossRef]
Wang, Z.; Zhang, H.; Wang, B.; Li, H.; Ma, J.; Zhang, B.; Shan, Y. Trade-Offs between Direct Emission Reduction and Intersectoral Additional Emissions: Evidence from the Electrification Transition in China’s Transport Sector. Environ. Sci. Technol. 2023, 57, 11389–11400. [Google Scholar] [CrossRef]
Luo, W.; Sandanayake, M.; Zhang, G.; Tan, Y. Construction Cost and Carbon Emission Assessment of a Highway Construction—A Case towards Sustainable Transportation. Sustainability 2021, 13, 7854. [Google Scholar] [CrossRef]
de Bortoli, A. Understanding the Environmental Impacts of Virgin Aggregates: Critical Literature Review and Primary Comprehensive Life Cycle Assessments. J. Clean. Prod. 2023, 415, 137629. [Google Scholar] [CrossRef]
Hossain, M.I.; Veginati, V.; Krukow, J. Thermodynamics Between RAP/RAS and Virgin Aggregates During Asphalt Concrete Production: A Literature Review; FHWA-ICT-15-015; Federal Highway Administration, Illinois Department of Transportation, Bureau of Materials and Physical Research: Springfield, IL, USA, 2015. Available online: https://rosap.ntl.bts.gov/view/dot/29561 (accessed on 29 July 2025).
Knobloch, F.; Hanssen, S.V.; Lam, A.; Pollitt, H.; Salas, P.; Chewpreecha, U.; Mercure, J.F. Net emission reductions from electric cars and heat pumps in 59 world regions over time. Nat. Sustain. 2020, 3, 437–447. [Google Scholar] [CrossRef]
Aryai, V.; Goldsworthy, M. Real-time high-resolution modelling of grid carbon emissions intensity. Sustain. Cities Soc. 2024, 104, 105316. [Google Scholar] [CrossRef]
Ajayi, S.O.; Oyedele, L.O.; Ceranic, B.; Gallanagh, M.; Kadiri, K.O. Life cycle environmental performance of material specification: A BIM-enhanced comparative assessment. Int. J. Sustain. Build. Technol. Urban Dev. 2015, 6, 14–24. [Google Scholar] [CrossRef]
Zhang, F.; Wen, B.; Niu, D.; Li, A.; Guo, B. Optimized design of low-carbon mix ratio for concrete using NSGA-II based on GA-improved back propagation. Materials 2024, 17, 4077. [Google Scholar] [CrossRef]
Krishna, U.S.R.; Badiger, M.; Chaudhary, Y.; Gowri, T.V.; Devi, E.J. Optimizing roads for sustainability: Inverted pavement design with life cycle cost analysis and carbon footprint estimation. Int. J. Transp. Sci. Technol. 2025, 17, 251–275. [Google Scholar] [CrossRef]
Seyedashraf, O.; Bottacin-Busolin, A.; Harou, J.J. Assisting decision-makers select multi-dimensionally efficient infrastructure designs–Application to urban drainage systems. J. Environ. Manag. 2023, 336, 117689. [Google Scholar] [CrossRef] [PubMed]
Schütze, F.; Stede, J.; Blauert, M.; Erdmann, K. EU taxonomy increasing transparency of sustainable investments. DIW Wkly. Rep. 2020, 10, 485–492. [Google Scholar] [CrossRef]
Chowdury, M.H.; Hossain, M.M. A Framework for Selecting Optimal Strategies to Mitigate the Corporate Sustainability Barriers. Corp. Ownersh. Control 2015, 13, 462–481. [Google Scholar] [CrossRef]
Bahmani, H.; Mostafaei, H.; Santos, P.; Fallah Chamasemani, N. Enhancing the Mechanical Properties of Ultra-High-Performance Concrete (UHPC) through Silica Sand Replacement with Steel Slag. Buildings 2024, 14, 3520. [Google Scholar] [CrossRef]
Guerra, K.; Haro, P.; Gutiérrez, R.E.; Gómez-Barea, A. Facing the High Share of Variable Renewable Energy in the Power System: Flexibility and Stability Requirements. Appl. Energy 2022, 310, 118561. [Google Scholar] [CrossRef]

Figure 1. Schematic flow diagram of the methodological framework.

Figure 3. The cost and emission prediction plots.

Figure 4. Q–Q plot for GBR residuals: (a) cost prediction, (b) CO₂ emission.

Figure 5. Calibration curve for GBR predictions: (a) cost prediction, (b) CO₂ emission.

Figure 6. SHAP dependence plots illustrating the marginal effect of key input variables on the predicted cost.

Figure 7. Feature importance for cost prediction.

Figure 8. Feature importance for CO₂ emission prediction.

Figure 9. SHAP dependence plots illustrating the marginal effect of key input variables on the predicted CO₂ emissions.

Figure 10. Pareto front: cost and CO₂ emission trade-off.

Figure 11. Parallel coordinates plot of input variables across selected Pareto solutions.

Figure 12. Comparison of selected Pareto and TOPSIS-optimal solutions.

Figure 13. Pareto frontier with uncertainty envelope from 500 stochastic realizations.

Table 1. Descriptive statistics of the engineered input variables (n = 482, post-cleaning).

Variable	Unit	Min	Max	Mean	Std. Dev.	Skewness
Transport Distance (X₁)	km	5.00	250	89.74	56.23	0.83
Processing Energy (X₂)	kWh/ton	0.65	2.10	1.38	0.47	0.48
Moisture Content (X₃)	%	2.10	15.0	8.72	3.19	0.22
Gradation Adjustment (X₄)	categorical	0	2.00	0.96	0.69	0.61
Grid CO₂ Factor (X₅)	kg CO₂/kWh	0.24	1.05	0.61	0.21	0.27

Table 2. Predictive performance of tree-based ensemble models on the test set.

Model	Output	R²	MAE	RMSE	a20 (%)
RFR	Cost (USD/m³)	0.951	4.62	6.34	82.1
RFR	CO₂ (kg/m³)	0.938	7.25	9.17	79.3
ERT	Cost (USD/m³)	0.949	4.49	6.11	81.7
ERT	CO₂ (kg/m³)	0.942	6.98	8.86	80.4
GBR	Cost (USD/m³)	0.962	3.84	5.23	87.5
GBR	CO₂ (kg/m³)	0.955	5.72	7.44	85.2
XGBR	Cost (USD/m³)	0.960	4.07	5.34	86.2
XGBR	CO₂ (kg/m³)	0.954	6.01	7.62	84.5
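
The metrics reported in Table 2 can be illustrated with a minimal sketch. It assumes that the goodness-of-fit column is the coefficient of determination (in line with the R² values quoted in the abstract) and that the a20 index is the percentage of test samples whose predicted-to-observed ratio falls between 0.80 and 1.20, which is a common definition but is not stated in this section. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the study's dataset.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, RMSE and a20 index (%) for paired observations.

    Assumption: a20 counts samples with 0.80 <= predicted / observed <= 1.20.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    mean_true = sum(y_true) / n

    # Residual and total sums of squares for the coefficient of determination.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")

    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)

    # Samples with an observed value of zero are skipped in the ratio test.
    within_20 = sum(
        1 for t, p in zip(y_true, y_pred) if t != 0 and 0.80 <= p / t <= 1.20
    )

    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": mae,
        "RMSE": rmse,
        "a20": 100.0 * within_20 / n,
    }


# Hypothetical observed and predicted unit costs (USD/m3), for illustration only.
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 90.0, 79.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

In this toy example a single poor prediction (90 against an observed 70) lowers R² sharply while the a20 index still counts three of four samples as acceptable, which is one reason the two measures are usually read together.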

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
