Article

A Data-Driven Hybrid Intelligent Optimization Framework for Sustainable Mineral Resource Extraction

Ziying Xu, Jinshan Sun, Haoyuan Lv and Yang Sun

1 State Key Laboratory of Precision Blasting, Jianghan University, Wuhan 430056, China
2 Hubei (Wuhan) Institute of Explosion Science and Blasting Technology, Jianghan University, Wuhan 430056, China
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(20), 9143; https://doi.org/10.3390/su17209143
Submission received: 11 September 2025 / Revised: 9 October 2025 / Accepted: 13 October 2025 / Published: 15 October 2025
(This article belongs to the Special Issue Data-Driven Sustainable Development: Techniques and Applications)

Abstract

Accurate prediction of mean fragment size is a fundamental requirement for enhancing operational efficiency, reducing ecological disturbances, and fostering the sustainable use of mineral resources. However, traditional empirical and statistical approaches often struggle with high-dimensional variables, limited computational speed, and the challenge of modeling small or sparse datasets. This study proposes a hybrid machine learning optimization framework that integrates Random Forest (RF), Whale Optimization Algorithm (WOA), and Extreme Gradient Boosting (XGBoost). Based on high-dimensional and small-sample data collected from historical blasting operations in open-pit mines, the framework employs a data-driven approach to construct a prediction model for mean fragment size, with the aim of enhancing the sustainability of mineral resource extraction through optimized blast design. The raw blasting fragmentation dataset was first preprocessed using a multi-step procedure to improve data quality. RF was then employed to assess and select 19 input features for dimensionality reduction, while WOA was utilized to optimize the hyperparameters of the predictive model. Finally, XGBoost was applied to model the small-sample blasting fragmentation dataset. Comparative experiments demonstrated that the proposed model achieved superior predictive performance with a coefficient of determination (R2) of 0.93. In addition, the cosine amplitude method was used to analyze the sensitivity of different variables affecting the mean fragment size (MFS), and the SHAP method was applied to quantitatively reveal the marginal contribution of each input variable to the prediction.

1. Introduction

Mineral resources are indispensable to sustainable human development [1,2]. However, when blasting parameters are poorly designed, the operation often yields an excessive amount of oversized fragments and unfavorable muckpile characteristics. Such outcomes reduce the efficiency of loading and hauling processes, elevate the expenses of secondary breakage, and accelerate equipment degradation, ultimately constraining the long-term sustainability of open-pit mining operations [3].
Rock fragmentation after blasting is the most critical indicator of blasting performance as it directly affects subsequent operations and overall production costs. While oversized fragments often require costly secondary blasting, excessively fine fragments can also raise operational expenses. Prediction errors in the mean fragment size are a key factor threatening the sustainability of mining because they result in higher energy consumption, severe dust and vibration problems, and intensified environmental damage, ultimately restricting the long-term sustainable use of mineral resources. Therefore, improving the prediction of blasting fragmentation is vital to advancing environmentally friendly, low-carbon, and energy-efficient mining practices that support sustainable development.
Although historical records of blasting parameters and fragmentation outcomes are available, conventional analytical methods have not fully exploited their potential. Previous research has examined the relationships between blasting parameters and fragmentation outcomes, resulting in the development of several empirical models [4,5,6,7,8]. These empirical models were constructed on the basis of extensive blasting case histories, the rock fracture mechanism in blasting, and mathematical statistical methods. Nevertheless, because fragment size is governed by the combined influence of numerous interdependent factors, constructing a fitting equation that accounts for all variables remains highly challenging [9].
In recent years, artificial intelligence (AI) techniques have advanced rapidly and demonstrated strong capabilities in modeling nonlinear interactions among high-dimensional variables. Consequently, AI-based approaches have been increasingly employed to address prediction tasks in mining engineering [10,11,12,13,14,15,16,17]. Several advanced models have been successfully applied to predict blasting fragmentation, highlighting their potential for practical engineering applications [18,19,20,21,22,23,24]. Miao et al. [25] employed a Support Vector Machine (SVM) to establish a predictive model of blasting fragmentation, using multiple blasting parameters as input variables, and achieved accurate prediction of the mean fragment size. Akyildiz et al. [26] adopted an Adaptive Neuro-Fuzzy Inference System (ANFIS) for modeling, successfully realizing high-accuracy prediction of rock fragmentation characteristics by blasting and demonstrating the advantages of ANFIS in dealing with nonlinear problems. Rong et al. [27] applied a deep learning model, namely a Multi-Layer Perceptron (MLP), using blasting datasets collected from multiple open-pit mines and augmented through data enhancement, to predict the mean blast fragment size. Moreover, machine learning algorithms based on decision trees construct predictive models through recursive partitioning of the data space, thereby achieving predictive functionality. Compared with commonly used algorithms such as SVM and MLP, tree-based models have advantages in handling high-dimensional sparse data and perform particularly well in engineering applications where rapid parameter tuning is required [28]. Although tree models such as Random Forest (RF) and XGBoost have been extensively applied in engineering fields [29,30,31,32,33,34], their application to rock fragmentation prediction has received comparatively less attention.
Nevertheless, hyperparameter selection has a significant influence on model accuracy, and therefore further optimization of XGBoost hyperparameters is required. Swarm intelligence algorithms, inspired by the collective behavior of natural populations, provide an efficient means of exploring the search space to identify optimal solutions. Predictive models that incorporate hyperparameter optimization using the Gray Wolf Optimizer (GWO), Firefly Algorithm (FFA), Whale Optimization Algorithm (WOA), and Particle Swarm Optimization (PSO) have been reported to improve prediction performance by up to 40% [35,36]. Among these, WOA simulates the bubble-net hunting strategy of humpback whales and combines a spiral function with stochastic search mechanisms, which enables it to flexibly address high-dimensional and nonlinear optimization problems. Compared with other optimization algorithms, WOA demonstrates faster convergence speed and is more suitable for practical engineering applications.
Predicting post-blast fragment size usually requires handling high-dimensional input variables. RF provides a natural solution by assessing feature importance, which makes it particularly suitable for selecting key predictors in fragmentation modeling [37]. In addition, machine learning models are often characterized by a black-box nature, which results in poor interpretability. SHapley Additive exPlanations (SHAP) provide an effective approach by quantifying the contribution of each feature to the prediction results, thereby enhancing the understanding of model decisions [38,39].
To enhance the sustainable utilization of open-pit resources, this study introduces a data-driven hybrid framework that combines machine learning with intelligent optimization to improve the prediction of mean blasting fragment size by integrating RF, WOA, and XGBoost algorithms. High-dimensional raw data are first preprocessed, and RF is employed for feature selection, where feature importance is quantified and an adaptive threshold is applied to reduce dimensionality while preserving key predictive information. WOA is then used to automatically optimize the hyperparameters of XGBoost in a data-driven manner. Finally, the RF-WOA-XGBoost hybrid model is constructed and applied to a blasting fragmentation dataset from a sandstone formation in an open-pit mine in India [40]. Experimental results demonstrate that, under high-dimensional data conditions, the proposed framework significantly enhances fragmentation prediction performance by combining feature selection with hyperparameter optimization. Furthermore, the SHAP framework is employed to interpret the model predictions and visualize the decision-making process. Overall, the proposed data-driven hybrid optimization framework not only improves the accuracy of mean blasting fragmentation predictions but also provides a practical methodological reference for promoting sustainable mining practices.

2. Proposed Hybrid Modeling Framework

2.1. Feature Selection Based on RF

The RF model [41] is an ensemble learning algorithm that integrates multiple Decision Trees (DTs) within the Bagging framework. It evaluates feature importance using Gini impurity, and an adaptive thresholding strategy is then applied to retain highly informative variables while discarding redundant or less relevant ones. The calculation of feature importance based on RF is shown in Figure 1. For each decision tree $DT_t$, if a feature $f$ is used for splitting at multiple nodes, the importance value $FI_t(f)$ of feature $f$ is expressed as
$$FI_t(f) = \sum_{n \in N_f} \left[ \mathrm{Gini}(n) - \mathrm{Gini}(n_L) - \mathrm{Gini}(n_R) \right] \tag{1}$$
where $n$, $n_L$, and $n_R$ are the current node, the left child node after splitting, and the right child node, respectively; $\mathrm{Gini}(n)$, $\mathrm{Gini}(n_L)$, and $\mathrm{Gini}(n_R)$ represent the Gini impurities of nodes $n$, $n_L$, and $n_R$, respectively; and $N_f$ denotes the set of all nodes split using feature $f$.
If the RF is composed of $T$ decision trees, the total importance value of feature $f$ is given as
$$FI(f) = \frac{1}{T} \sum_{t=1}^{T} FI_t(f) \tag{2}$$
For comparison and ranking purposes, the importance values of all $K$ features are normalized as follows:
$$\mathrm{Normalized}\ FI(f) = \frac{FI(f)}{\sum_{k=1}^{K} FI(k)} \tag{3}$$
The normalized importance value of each feature represents its relative contribution to the overall model. Under a threshold strategy, only features whose normalized importance exceeds the threshold are retained, eliminating features that are redundant or contribute little to the prediction. This effectively reduces the feature dimensionality while preserving the variables most useful to the predictive model.
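For illustration, this screening step can be sketched with scikit-learn's RandomForestRegressor, whose built-in feature_importances_ attribute implements a Gini-based importance (weighted by node sample counts, a close variant of Equations (1)-(3)). The function below is a minimal sketch, not the authors' released code; the feature matrix X, target y, and threshold value are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def select_features(X, y, feature_names, threshold=0.04):
    """Keep features whose normalized Gini importance exceeds the threshold."""
    rf = RandomForestRegressor(n_estimators=500, random_state=42)
    rf.fit(X, y)
    importance = rf.feature_importances_          # already normalized to sum to 1
    ranked = sorted(zip(feature_names, importance), key=lambda p: -p[1])
    selected = [name for name, imp in ranked if imp > threshold]
    return selected, dict(ranked)
```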

2.2. Parameters Optimization Based on WOA

WOA [42] is a swarm intelligence optimization algorithm inspired by the bubble-net hunting behavior of humpback whales, which are among the more intelligent marine species and have evolved unique communication strategies. WOA simulates the foraging behavior of humpback whales, as shown in Figure 2, which can be summarized into three main phases: encircling prey, the bubble-net feeding maneuver, and searching for prey.
In the iterative search process, the current best solution is treated as the prey. By updating their positions, the whales gradually approach the prey and eventually encircle it. This process is represented as
$$\mathbf{X}_{t+1} = \mathbf{X}_t^* - \mathbf{A} \cdot \mathbf{D} \tag{4}$$
$$\mathbf{D} = \left| \mathbf{C} \cdot \mathbf{X}_t^* - \mathbf{X}_t \right| \tag{5}$$
$$\mathbf{A} = 2a \cdot \mathrm{rand}() - a \tag{6}$$
$$\mathbf{C} = 2 \cdot \mathrm{rand}() \tag{7}$$
$$a = 2 - 2t/t_{\max} \tag{8}$$
where $\mathbf{X}_{t+1}$ is the position of the whale at the next iteration; $\mathbf{X}_t^*$ and $\mathbf{X}_t$ represent the position of the best solution (prey position) and the position of the current whale individual, respectively; $\mathbf{D}$ is the distance between the whale and the prey; $\mathbf{A}$ and $\mathbf{C}$ are coefficient vectors; $\mathrm{rand}()$ is a random number in the range [0, 1] expanded into a vector with identical components for calculation; $a$ is the convergence factor; and $t$ and $t_{\max}$ denote the current iteration number and the maximum number of iterations, respectively.
Two mathematical models are used to simulate the shrinking encircling mechanism and the spiral updating mechanism in the bubble-net feeding behavior. The choice between the two mechanisms is controlled by the probability p, which is a random number within [0, 1], expressed as
$$\mathbf{X}_{t+1} = \begin{cases} \mathbf{X}_t^* - \mathbf{A} \cdot \mathbf{D}, & p < 0.5 \\ \mathbf{D}' \cdot e^{bl} \cos(2\pi l) + \mathbf{X}_t^*, & p \geq 0.5 \end{cases} \tag{9}$$
$$\mathbf{D}' = \left| \mathbf{X}_t^* - \mathbf{X}_t \right| \tag{10}$$
where $\mathbf{D}'$ is the distance between the search agent and the target prey, $b$ is a constant defining the shape of the logarithmic spiral, and $l$ is a random number in $[-1, 1]$.
The large-scale movement of humpback whales to search for prey is designed as a global search strategy, which can be expressed as
$$\mathbf{X}_{t+1} = \mathbf{X}_{\mathrm{rand}} - \mathbf{A} \cdot \mathbf{D} \tag{11}$$
$$\mathbf{D} = \left| \mathbf{C} \cdot \mathbf{X}_{\mathrm{rand}} - \mathbf{X}_t \right| \tag{12}$$
where $\mathbf{X}_{\mathrm{rand}}$ represents the position of a randomly selected whale individual.
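The three phases above translate directly into a compact optimization loop. The following is a minimal NumPy sketch of WOA for minimizing an arbitrary fitness function, written from Equations (4)-(12); the spiral constant b and the exploration test |A| >= 1 follow the original algorithm of Mirjalili and Lewis [42], and all variable names are illustrative.

```python
import numpy as np

def woa_minimize(fitness, dim, bounds, n_whales=20, t_max=40, b=1.0, seed=0):
    """Whale Optimization Algorithm: minimize fitness over a box-bounded space."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    X = rng.uniform(lo, hi, size=(n_whales, dim))          # initial population
    fit = np.array([fitness(x) for x in X])
    best, best_fit = X[fit.argmin()].copy(), fit.min()
    for t in range(t_max):
        a = 2 - 2 * t / t_max                              # Eq. (8): decreases 2 -> 0
        for i in range(n_whales):
            A = 2 * a * rng.random(dim) - a                # Eq. (6)
            C = 2 * rng.random(dim)                        # Eq. (7)
            if rng.random() < 0.5:
                if np.all(np.abs(A) < 1):                  # encircling, Eqs. (4)-(5)
                    X[i] = best - A * np.abs(C * best - X[i])
                else:                                      # global search, Eqs. (11)-(12)
                    x_rand = X[rng.integers(n_whales)]
                    X[i] = x_rand - A * np.abs(C * x_rand - X[i])
            else:                                          # spiral update, Eqs. (9)-(10)
                l = rng.uniform(-1, 1)
                X[i] = np.abs(best - X[i]) * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lo, hi)                   # keep whales inside bounds
        fit = np.array([fitness(x) for x in X])
        if fit.min() < best_fit:
            best, best_fit = X[fit.argmin()].copy(), fit.min()
    return best, best_fit
```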

2.3. Prediction Based on XGBoost

The mean blast fragment size after open-pit blasting is influenced by multiple factors, and the acquisition of relevant data is relatively difficult. XGBoost was therefore selected as the predictive tool for mean blast fragment size, as it is particularly advantageous for small datasets [43]. XGBoost employs the gradient boosting algorithm to train models iteratively, with each new tree compensating for the residual errors of the previous iteration. In each iteration, a new decision tree is generated and combined with the previously generated trees, progressively enhancing model performance, as shown in Figure 3. The objective function of XGBoost consists of a loss function and a regularization term, given as Equation (13). This design enables stable training and good generalization on small-sample datasets.
$$\mathrm{Obj} = \sum_{i=1}^{n} L\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{m} \Omega\left(f_k\right) \tag{13}$$
In Equation (13), the first term $\sum_{i=1}^{n} L(y_i, \hat{y}_i)$ is the loss function, which measures the error between the predicted and actual values, while the second term $\sum_{k=1}^{m} \Omega(f_k)$ is the regularization penalty, which controls model complexity. The incorporation of regularization improves the generalization ability of the model and prevents overfitting.
In addition to the objective function, the prediction function of XGBoost can be expressed as follows:
$$\hat{y}_i = \sum_{k=1}^{K} f_k\left(x_i\right) \tag{14}$$
where $\hat{y}_i$ denotes the predicted value of the $i$-th sample, and $f_k$ represents the $k$-th regression tree in the ensemble. The final prediction is obtained by the additive combination of multiple regression trees, each sequentially trained to minimize the residual errors of its predecessor. This additive modeling strategy enables XGBoost to capture complex nonlinear relationships and progressively enhance predictive accuracy while maintaining robustness on small-sample datasets.
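As a concrete illustration, the XGBoost regressor can be instantiated through its scikit-learn interface. This is a sketch only: the hyperparameter values shown are the tuned values later reported in Table 3, and X_train, y_train, and X_test are assumed to be the preprocessed data arrays.

```python
from xgboost import XGBRegressor

# Hyperparameters taken from the optimized values in Table 3 (illustrative).
model = XGBRegressor(
    n_estimators=274,          # number of regression trees K in Eq. (14)
    max_depth=2,               # tree depth, penalized through Omega(f_k)
    learning_rate=0.39,
    subsample=0.56,
    colsample_bytree=0.32,
    reg_alpha=0.10,            # L1 regularization weight
    reg_lambda=0.45,           # L2 regularization weight
    objective="reg:squarederror",
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```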

2.4. A Hybrid Model Framework RF-WOA-XGBoost

Based on the analysis of the blasting fragmentation dataset from open-pit mines, a tri-model hybrid optimization model, namely RF-WOA-XGBoost, was developed under the Scikit-Learn framework in Python 3.9 by integrating the complementary advantages of RF, WOA, and XGBoost. The pseudocode of the model is presented in Table 1. Following analysis and preprocessing of the original dataset, RF was applied to select the most relevant variables from the 19 input features, thereby reducing dimensionality. The dataset was then divided into training and testing sets with a ratio of 9:1. WOA was subsequently used to optimize the hyperparameters of XGBoost and identify their best combination. The model’s predictive accuracy and stability were assessed using five-fold cross-validation along with the testing dataset. By combining RF’s feature selection capability, WOA’s global optimization power, and XGBoost’s efficient nonlinear modeling, the hybrid framework enables precise prediction of mean blast fragment size in open-pit mines. The overall workflow of the RF-WOA-XGBoost model is illustrated in Figure 4. The training dataset and source code are available upon request.

3. Data Processing and Hyperparameter Optimization

3.1. Data Description and Correlation Analysis

The raw dataset was collected from an open-pit mine in India, where the blasting was carried out in overburden sandstone strata without significant geological anomalies [40]. A total of 76 samples were included, comprising 19 input variables and 1 output variable. The distribution of the variables is shown in Figure 5. According to engineering attributes, the input variables can be classified into three categories: (i) blasting design parameters, including borehole diameter (D), average bench height (H), average subdrilling depth (J), average spacing (S), average burden (B), average stemming length (T), average bench length (L), average bench width (W), spacing-to-burden ratio (S/B), stemming-to-burden ratio (T/B), bench stiffness ratio (H/B), subdrilling-to-burden ratio (J/B), burden-to-diameter ratio (B/D), bench length-to-width ratio (L/W), and number of blast holes (NH); (ii) explosive parameters, including total mass of explosives (Qe), charge density per unit length (De), and powder factor (PF); and (iii) a rock mass parameter, characterized by uniaxial compressive strength (UCS). The output variable is the mean fragment size (MFS). The statistical characteristics of the dataset are provided in Table 2.
A systematic correlation analysis was conducted among all variables in the raw dataset, as illustrated in Figure 6. According to the correlation coefficients, T, W, T/B, NH, and UCS are identified as the five most significant variables affecting the output variable MFS. The explosive parameters Qe, De, and PF also exhibit substantial influence on MFS from a physical perspective. Among them, PF shows relatively weak correlation with other input variables, with the highest correlation coefficient of only 0.23. Variables J/B and L/W not only represent the characteristics of the input variables J, B, L, and W but also demonstrate no strong correlation with other input variables. Variables D, H, and S exhibit strong correlations with multiple other input variables. Specifically, the numbers of input variables showing correlation coefficients greater than 0.7 with D, H, and S are 10, 9, and 10, respectively.
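Such a correlation screen can be reproduced with pandas. The sketch below is illustrative: it assumes the raw data sit in a DataFrame df whose columns carry the variable names listed above, including an "MFS" column (a hypothetical layout).

```python
import pandas as pd

corr = df.corr(method="pearson")                 # full correlation matrix (Figure 6)

# Strength of association of each input with the output MFS
mfs_corr = corr["MFS"].drop("MFS").sort_values(key=abs, ascending=False)

# Number of other inputs with which each variable has |r| > 0.7
inputs = corr.drop(columns="MFS").drop(index="MFS")
strong_links = (inputs.abs() > 0.7).sum() - 1    # minus 1 for self-correlation
```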

3.2. Data Processing

A multi-step data preprocessing procedure was applied in Python to enhance the quality of the modeling dataset, including outlier handling, standardization, and normalization.
Outliers are data points that deviate markedly from the main distribution and can substantially impair the predictive capability of the model. The boxplot method was used to detect and remove them. For each variable, the first quartile ($Q_1$) and third quartile ($Q_3$) were calculated, and the interquartile range (IQR) was obtained using the pandas library as follows
$$\mathrm{IQR} = Q_3 - Q_1 \tag{15}$$
Based on $Q_1$, $Q_3$, and the IQR, the lower and upper bounds were determined as
$$\mathrm{lower\ bound} = Q_1 - 1.5 \times \mathrm{IQR} \tag{16}$$
$$\mathrm{upper\ bound} = Q_3 + 1.5 \times \mathrm{IQR} \tag{17}$$
Data points that fell outside the calculated upper and lower bounds were treated as outliers. Once identified, the corresponding rows were removed to prevent any adverse impact on subsequent model training.
Standardization was then performed to transform the data to zero mean and unit variance. The Z-score method, implemented with the preprocessing module of scikit-learn, was adopted. After applying Equations (18) and (19), the transformed data follow a standard normal distribution with a mean of 0 and a variance of 1.
$$\sigma = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \mu \right)^2} \tag{18}$$
$$z_i = \frac{x_i - \mu}{\sigma} \tag{19}$$
where $x_i$ represents the $i$-th data point, $\mu$ denotes the mean of the feature data, $\sigma$ represents the standard deviation of the feature data, and $z_i$ is the standardized data point.
Normalization was subsequently carried out by linearly scaling the data to a specified range. The min–max normalization method, also implemented in scikit-learn, was applied to map the dataset into the interval [0, 1], as expressed by
$$x_i' = \frac{x_i - \min(X)}{\max(X) - \min(X)} \tag{20}$$
where $\min(X)$ and $\max(X)$ denote the minimum and maximum values of the dataset $X$, respectively, and $x_i'$ is the normalized data point.
To enhance data quality and the stability of model training, 17 outliers were first identified and removed using the boxplot method. The remaining 59 samples were then standardized and normalized in order to eliminate the influence of dimensional inconsistencies among variables. Considering the relatively small sample size of the dataset, the data were divided into training and testing subsets at a ratio of 9:1. A total of 90% of the samples were utilized to establish the predictive model, and the remaining 10% were reserved for independent testing to evaluate the model performance.
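A compact version of this preprocessing chain, mirroring the described order (IQR filtering per Equations (15)-(17), z-score standardization per Equations (18)-(19), min-max normalization per Equation (20), then a 9:1 split), might look as follows; df and the target column name "MFS" are assumptions about the data layout, not the authors' code.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

def preprocess(df, target="MFS", seed=42):
    """IQR outlier removal, z-score standardization, min-max normalization, 9:1 split."""
    q1, q3 = df.quantile(0.25), df.quantile(0.75)
    iqr = q3 - q1
    inlier = ~((df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)).any(axis=1)
    clean = df[inlier]                               # drop rows containing any outlier
    X, y = clean.drop(columns=[target]), clean[target]
    X = StandardScaler().fit_transform(X)            # zero mean, unit variance
    X = MinMaxScaler().fit_transform(X)              # rescale to [0, 1]
    return train_test_split(X, y, test_size=0.1, random_state=seed)

X_train, X_test, y_train, y_test = preprocess(df)
```

As in the procedure described above, the scalers are fitted before the train/test split; a stricter leakage-free variant would fit them on the training portion only.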

3.3. Feature Selection

Based on prior information obtained from correlation analysis, feature importance analysis was conducted using RF to reduce redundancy among features in the preprocessed dataset. The ranking and corresponding importance values of the input features in the RF model are presented in Figure 7. Through the built-in feature importance evaluation mechanism of RF, the contribution of each feature to the output variable MFS was quantified. By adjusting the threshold parameter of feature importance, it was determined that when the threshold was set to 0.04, the input dimensionality of the model was significantly reduced while maintaining high predictive accuracy. A threshold higher than 0.04 resulted in the loss of important features, whereas a threshold lower than 0.04 introduced redundant noisy features, which increased model complexity without notable improvement in performance. Therefore, the threshold was fixed at 0.04. By integrating the feature importance ranking obtained from RF with the variable dependencies revealed by the previous correlation analysis, the following features were selected for model training: T, W, T/B, J/B, L/W, NH, Qe, De, PF, and UCS.

3.4. Hyperparameter Optimization

During the hyperparameter optimization of XGBoost, the choice of the hyperparameter search ranges significantly affected model complexity and convergence efficiency. The ranges of several key hyperparameters are presented in Table 3; because small adjustments to max_depth and learning_rate caused considerable fluctuations in model performance, their intervals were refined. The default ranges for max_depth and learning_rate were [2, 6] and [0.1, 0.5], respectively, and the optimal ranges were determined by gradually adjusting the upper and lower bounds of the intervals. When max_depth exceeded 4, both the mean squared error (MSE) on the testing set and the error bar (defined as the difference between the maximum and minimum MSE within the interval) increased markedly, which led to unsatisfactory and unstable prediction performance. When learning_rate exceeded 0.4, both the MSE and the error bar of the testing set suddenly increased by approximately 180% compared with the preceding interval. Consequently, the value ranges of max_depth and learning_rate were finally set to [2, 4] and [0.1, 0.4], respectively, as illustrated in Figure 8 and Figure 9.
During the hyperparameter optimization of XGBoost using WOA, each candidate solution was encoded as a vector of hyperparameter configurations, and the MSE was employed as the fitness index. In the initial stage, a global search strategy widely sampled the predefined parameter space, ensuring sufficient exploration. A local search mechanism then progressively narrowed the search range, enabling refined exploration of potentially optimal regions. Figure 10 presents the fitness curves for different population sizes. The model converged rapidly within the first ten iterations, corresponding to the global search stage of WOA. During iterations 10-40, the curve slope flattened and displayed a stepwise variation, indicating the transition from global to local search. After 40 iterations, the curves became nearly stable, implying that the algorithm had completed the transition from global exploration to local exploitation and converged under the current parameter configuration. Notably, a whale population size of 20 achieved the lowest fitness value of approximately $0.02 \times 10^{-3}$; larger populations did not reduce the fitness further but slightly increased convergence time and computational cost. Therefore, a population size of 20 and 40 iterations were selected to balance optimization performance and computational efficiency. The optimal combination of hyperparameters is summarized in Table 3.
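Putting the pieces together, the WOA search can drive XGBoost tuning by encoding each whale as a seven-dimensional hyperparameter vector and using the cross-validated MSE as fitness. The sketch below reuses the woa_minimize function sketched in Section 2.2 and the search ranges of Table 3; it is illustrative rather than the authors' exact implementation, and X_train/y_train are assumed to exist.

```python
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

# (low, high) per dimension, matching the ranges in Table 3
bounds = [(150, 300), (2, 4), (0.1, 0.4), (1e-5, 1.0),
          (1e-4, 1.0), (0.1, 0.8), (0.1, 0.9)]

def fitness(v):
    """Mean 5-fold cross-validated MSE of the XGBoost model decoded from vector v."""
    model = XGBRegressor(
        n_estimators=int(round(v[0])), max_depth=int(round(v[1])),
        learning_rate=v[2], subsample=v[3], colsample_bytree=v[4],
        reg_alpha=v[5], reg_lambda=v[6], objective="reg:squarederror",
    )
    scores = cross_val_score(model, X_train, y_train, cv=5,
                             scoring="neg_mean_squared_error")
    return -scores.mean()

best_v, best_mse = woa_minimize(fitness, dim=7, bounds=bounds,
                                n_whales=20, t_max=40)
```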

4. Verification of Prediction Performance and Comparative Analysis

4.1. Cross-Validation

Considering the characteristics of small- to medium-scale datasets, the predictive performance of the model was comprehensively evaluated using three validation strategies: five-fold cross-validation (5-Fold CV), leave-one-out cross-validation (LOOCV), and repeated five-fold cross-validation with 10 repetitions, as summarized in Table 4. In the 5-Fold CV procedure, the original training dataset was uniformly divided into five mutually exclusive subsets, as illustrated in Figure 11. In each iteration, four subsets were used as the training set while the remaining subset served as the validation set. After five iterations, the mean and the standard deviation of the MSE were calculated as the evaluation indicators. The results presented in Table 4 show that 5-Fold CV yielded a mean MSE of 0.004599 with a standard deviation of 0.002167; LOOCV produced a mean MSE of 0.005441 with a standard deviation of 0.012943; and repeated 5-Fold CV with 10 repetitions resulted in a mean MSE of 0.005029 with a standard deviation of 0.003108. Across the different partitioning strategies, the model consistently demonstrated stable predictive performance, and the relatively low variance further indicates its robustness under limited data conditions and its effectiveness in controlling bias.
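The three validation strategies map directly onto scikit-learn splitters. A brief sketch, assuming model holds the tuned XGBoost regressor and X_train/y_train the training data:

```python
from sklearn.model_selection import (KFold, LeaveOneOut, RepeatedKFold,
                                     cross_val_score)

strategies = {
    "5-Fold CV": KFold(n_splits=5, shuffle=True, random_state=42),
    "LOOCV": LeaveOneOut(),
    "Repeated 5-Fold (10x)": RepeatedKFold(n_splits=5, n_repeats=10,
                                           random_state=42),
}
for name, cv in strategies.items():
    mse = -cross_val_score(model, X_train, y_train, cv=cv,
                           scoring="neg_mean_squared_error")
    print(f"{name}: mean MSE = {mse.mean():.6f}, std = {mse.std():.6f}")
```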

4.2. Sensitivity Analysis

The cosine amplitude method was applied to perform sensitivity analysis on the variables affecting MFS, aiming to evaluate the relative influence of different parameters on the target response. The sensitivity coefficients of each variable are presented in Figure 12. Among the explosive parameters, the sensitivity coefficient of PF was the highest at 0.94, followed by De at 0.87. Among the blasting design parameters, W, NH, L/W, J/B, S/B, J, and L all exhibited sensitivity coefficients exceeding 0.87. UCS showed a sensitivity coefficient of 0.84. These results demonstrate that highly sensitive variables exist among the explosive parameters, blasting design parameters, and rock mass parameters with respect to MFS. Moreover, these variables overlap strongly with the input variables selected by the RF feature selection process, which is consistent with the major influencing factors of blasting performance in practical blasting engineering projects. This confirms that the selected input variables possess reasonable physical significance.
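The cosine amplitude method scores each input column $x_j$ against the output $y$ as $r_j = \sum_k x_{jk} y_k / \sqrt{\left( \sum_k x_{jk}^2 \right) \left( \sum_k y_k^2 \right)}$, computed here on the normalized data so that all variables share the [0, 1] scale. A short sketch of this calculation:

```python
import numpy as np

def cosine_amplitude(X, y):
    """Sensitivity of each column of X to y via the cosine amplitude method."""
    X = np.asarray(X, dtype=float)        # shape (n_samples, n_features), normalized
    y = np.asarray(y, dtype=float)
    num = X.T @ y                                          # sum_k x_jk * y_k
    den = np.sqrt((X ** 2).sum(axis=0) * (y ** 2).sum())
    return num / den                                       # one r_j per feature
```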

4.3. Comparison of Prediction Performance

4.3.1. Model Evaluation Metrics

The coefficient of determination (R²), MSE, mean absolute error (MAE), mean absolute percentage error (MAPE), residual standard deviation (RSD), and variance accounted for (VAF) [44] were selected as the evaluation metrics of the model. The formulas of the principal metrics are as follows
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \tag{21}$$
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \tag{22}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{23}$$
$$\mathrm{VAF} = \left( 1 - \frac{\mathrm{Var}\left( y_i - \hat{y}_i \right)}{\mathrm{Var}\left( y_i \right)} \right) \times 100\% \tag{24}$$
where $y_i$, $\hat{y}_i$, and $\bar{y}$ denote the actual value, the predicted value, and the mean of the actual values, respectively; $n$ represents the sample size; and $\mathrm{Var}$ denotes the variance. A value of R² closer to 1 indicates a better goodness of fit. Smaller values of MSE, MAE, and MAPE imply higher prediction accuracy. A smaller RSD indicates lower fluctuation of the residuals (the differences between the actual and predicted values of each sample), suggesting better stability. A higher VAF demonstrates stronger explanatory ability for the variation trend of the data, with a smaller proportion of error between the predictions and the actual values.
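For completeness, all six metrics can be computed with NumPy. MAPE and RSD are given their standard definitions here as an assumption, since the text does not write them out explicitly.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Evaluation metrics of Section 4.3.1 (Eqs. (21)-(24) plus standard MAPE, RSD)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    return {
        "R2":   1 - (resid ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum(),
        "MSE":  (resid ** 2).mean(),
        "MAE":  np.abs(resid).mean(),
        "MAPE": np.abs(resid / y_true).mean(),     # assumed standard definition
        "RSD":  resid.std(ddof=1),                 # assumed standard definition
        "VAF":  (1 - resid.var() / y_true.var()) * 100,
    }
```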

4.3.2. Model Comparison

Since the cross-validation results have already verified the overall robustness of the proposed framework, two simplified versions were further constructed to examine the specific effectiveness of RF-WOA-XGBoost: WOA-XGBoost, which omitted RF-based feature selection, and RF-XGBoost, which did not employ WOA-based hyperparameter optimization. The test-set results are presented in Table 5. Compared with WOA-XGBoost, RF-WOA-XGBoost improved the test-set R2 from 0.670 to 0.930, reduced the MSE from 0.00138 to 0.00029, and decreased the RSD from 0.03089 to 0.01676, indicating that RF effectively removes redundant features and retains the most informative variables, thereby enhancing model performance. In contrast, relative to RF-XGBoost, RF-WOA-XGBoost increased the test-set R2 from 0.611 to 0.930, reduced the MSE from 0.00162 to 0.00029, and lowered the MAPE from 0.1463 to 0.0334, suggesting that WOA improves predictive accuracy by efficiently searching for optimal hyperparameters. These findings clearly demonstrate the complementary contributions of RF and WOA: the former optimizes the feature space by minimizing redundancy, while the latter fine-tunes parameters to boost predictive capability, and their integration enables XGBoost to more accurately capture complex relationships in high-dimensional nonlinear blasting fragmentation data.
After confirming the superiority of RF-WOA-XGBoost over its simplified versions, further comparisons were conducted with other hybrid models, including RF-WOA-ANFIS, RF-WOA-LightGBM, and RF-WOA-CatBoost. The prediction results of each model are illustrated in Figure 13. The closer the data points lie to the 45° diagonal, the better the prediction performance. The evaluation results in Figure 13 indicate that the RF-WOA-XGBoost model achieved superior performance across all evaluation metrics, with a test-set R² of 0.930, which is 35.8% higher than that of the second-ranked RF-WOA-ANFIS model.
A comparison between Figure 13a,b shows that the RF-WOA-XGBoost model exhibits a significant advantage over the RF-WOA-ANFIS model. The R2 of the test set for XGBoost reached 0.930, which is 35.8% higher than that of ANFIS. The MSE of XGBoost was 0.00029, substantially lower than the 0.00131 of ANFIS, representing a reduction of 77.9%. The MAPE was 0.0334, superior to the 0.0449 of ANFIS. These results demonstrate that, compared with neural network systems relying on fuzzy inference rules, tree-based models possess stronger data fitting ability and generalization performance in capturing the complex nonlinear relationships involved in the prediction of blasting fragmentation. A comparison among Figure 13a,c,d further reveals that RF-WOA-XGBoost achieved the best performance among gradient boosting tree algorithms. With respect to R2 on the test set, XGBoost (0.930) was far superior to LightGBM (0.154) and CatBoost (0.435). In terms of MSE, the value of XGBoost (0.00029) was significantly lower than those of LightGBM (0.00353) and CatBoost (0.00236). Regarding MAPE, XGBoost (0.0505) was 75.5% and 70.2% lower than LightGBM and CatBoost, respectively. Finally, from the distribution characteristics of the scatter plots, the data points of the RF-WOA-XGBoost model were concentrated most closely around the 45° diagonal line, while the data points of the other models were more widely scattered. This further confirms that XGBoost, as an optimized gradient boosting decision tree model, by virtue of its unique regularization mechanism, second-order gradient optimization, and refined tree pruning strategy, is able to capture the complex patterns and interactions in high-dimensional nonlinear engineering problems such as blasting fragmentation prediction. Consequently, the tree-based structure of XGBoost enables an appropriate balance between model complexity and generalization capacity, leading to optimal predictive performance and stability.

4.4. Model Interpretation Based on SHAP

The SHAP method was employed to interpret the predictive model, revealing the marginal contribution of each input variable during the prediction process. The marginal contribution refers to the individual effect of the variation in a specific feature value on the prediction result, while keeping all other features unchanged. The SHAP value of each input variable for every sample in the training set was calculated. The mean absolute SHAP value was then adopted as a metric of global feature importance, which allowed identification of the key features that play a dominant role in model decision-making. Subsequently, the SHAP value distribution of each variable across all training samples was presented in the form of scatter plots, in which each point represents the contribution of a certain variable in a single sample, as illustrated in Figure 14. In Figure 14, the horizontal axis corresponds to the SHAP value, reflecting both the direction and magnitude of the marginal effect of the feature on MFS prediction. A positive SHAP value indicates that an increase in the feature value leads to an increase in the predicted MFS, whereas a negative SHAP value signifies that an increase in the feature value results in a decrease in the predicted MFS. The color of the points denotes the magnitude of the original feature values, with the color gradient from green to orange representing the transition from low to high values. For instance, when the value of Qe is relatively high, its SHAP values are mainly concentrated in the negative region. This suggests that a sufficient charge amount provides adequate energy for fragmentation, resulting in smaller fragment sizes and a more uniform fragmentation effect. The SHAP analysis quantitatively described the marginal contributions of all input features to the prediction of MFS.
In addition, the analysis revealed the interaction effects among the features through SHAP. Under high PF values, the variation in other associated features, such as W and UCS, exerted more pronounced effects on the prediction results. As illustrated in Figure 14, this indicates that the model captured the synergistic interactions among features during the internal decision-making process, instead of merely summing the individual contributions of each feature.
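In practice, these SHAP values can be obtained with the shap library's TreeExplainer, which computes exact Shapley values for tree ensembles such as XGBoost. The variable names below (final_model, X_train_sel, selected_features) are placeholders for the trained model and the feature-selected training data.

```python
import shap

explainer = shap.TreeExplainer(final_model)            # exact SHAP for tree models
shap_values = explainer.shap_values(X_train_sel)       # one value per sample/feature

# Beeswarm summary: global importance (mean |SHAP|) plus direction of effect
shap.summary_plot(shap_values, X_train_sel, feature_names=selected_features)
```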

5. Conclusions

This study proposes a hybrid optimization framework that combines RF, WOA, and XGBoost within a machine learning approach to predict MFS. The framework fully exploits historical blasting datasets and combines data-driven machine learning with artificial intelligence techniques to enhance the prediction accuracy of mean fragmentation. With a clear technical workflow and high practical relevance, it provides an effective methodological reference for intelligent and sustainable mining.
(1)
By combining correlation analysis and RF-based feature selection, the primary strategy for determining input feature variables in the predictive model is to select those variables that exert a significant influence on MFS, while simultaneously avoiding the inclusion of variables that exhibit strong intercorrelation.
(2)
A total of ten key input variables were identified through RF-based feature selection, and the hyperparameters of XGBoost were optimized using WOA. On this basis, the RF-WOA-XGBoost tri-model hybrid optimization model was constructed. Compared with WOA-XGBoost, RF-XGBoost, RF-WOA-ANFIS, RF-WOA-LightGBM, and RF-WOA-CatBoost models, the proposed model demonstrated significantly enhanced predictive accuracy. This indicates that the integration of feature selection, hyperparameter optimization, and tree-based algorithms can effectively improve the predictive capability of models when dealing with high-dimensional and small-sample blasting fragmentation datasets.
(3)
The SHAP method quantitatively revealed the marginal contributions and interaction effects of input variables in the predictive model. The SHAP visualization further illustrated the relationships between contribution direction, intensity, and original feature values across different samples, thereby significantly enhancing model interpretability.
In summary, the hybrid optimization model developed in this study demonstrates excellent performance using data from a single mine and has the potential to provide a transparent and transferable modeling framework applicable to other mining contexts. However, the limited dataset size and the single-site data source restrict the current scope of validation. Future research could focus on extending the dataset to multiple mining sites with varying geological conditions and on exploring advanced techniques such as data augmentation, transfer learning, or ensemble learning to further improve adaptability and generalizability. These efforts would strengthen the robustness of the proposed framework and support its broader application in intelligent and sustainable mining.

Author Contributions

Conceptualization, Z.X., J.S. and Y.S.; methodology, Z.X.; formal analysis, Z.X. and H.L.; investigation, J.S. and H.L.; resources, J.S. and Y.S.; data curation, H.L. and Y.S.; writing—original draft preparation, Z.X. and J.S.; writing—review and editing, Z.X., J.S. and Y.S.; visualization, H.L. and Y.S.; supervision, J.S.; project administration, Z.X.; funding acquisition, J.S., Y.S. and Z.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China [2021-008]; the National Natural Science Foundation of China [Grant No. 42107176]; the Doctor initiated Fund of State Key Laboratory of Precision Blasting [Grant No. PBSKL-2023-QD-03]; the Research Fund of Jianghan University [2023KJZX47]; Natural Science Foundation of Hubei Province [2025DJA066]; Natural Science Foundation of Wuhan [2025040601020171] and the Graduate Scientific Research Foundation of Jianghan University [KYCXJJ202438].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study were obtained from the published work [40]. The processed data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors wish to thank the anonymous reviewers for their careful work and thoughtful suggestions that substantially improved this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Monteiro, N.B.R.; Bezerra, A.K.L.; Moita Neto, J.M.; Silva, E.A.d. Mining Law: In Search of Sustainable Mining. Sustainability 2021, 13, 867. [Google Scholar] [CrossRef]
  2. Aghdamigargari, M.; Avane, S.; Anani, A.; Adewuyi, S.O. Sustainability in Long-Term Surface Mine Planning: A Systematic Review of Operations Research Applications. Sustainability 2024, 16, 9769. [Google Scholar] [CrossRef]
  3. Ding, X.; Jamei, M.; Hasanipanah, M.; Abdullah, R.A.; Le, B.N. Optimized Data-Driven Models for Prediction of Flyrock due to Blasting in Surface Mines. Sustainability 2023, 15, 8424. [Google Scholar] [CrossRef]
  4. Hekmat, A.; Munoz, S.; Gomez, R. Prediction of Rock Fragmentation Based on a Modified Kuz–Ram Model. In Proceedings of the 27th International Symposium on Mine Planning and Equipment Selection (MPES 2018), Las Condes, Chile, 19–23 November 2018; Springer: Cham, Switzerland, 2019. [Google Scholar]
  5. Yilmaz, O. Rock factor prediction in the Kuz–Ram model and burden estimation by mean fragment size. Geomech. Energy Environ. 2023, 33, 100415. [Google Scholar] [CrossRef]
  6. Ouchterlony, F.; Sanchidrián, J.A. The Fragmentation-Energy Fan Concept and the Swebrec Function in Modeling Drop Weight Testing. Rock Mech. Rock Eng. 2018, 51, 3129–3156. [Google Scholar] [CrossRef]
  7. Sanchidrián, J.A.; Ouchterlony, F. Blast-Fragmentation Prediction Derived from the Fragment Size-Energy Fan Concept. Rock Mech. Rock Eng. 2023, 56, 8869–8889. [Google Scholar] [CrossRef]
  8. Chung, S.H.; Katsabanis, P.D. Fragmentation Prediction Using Improved Engineering Formulae. Fragblast 2000, 4, 198–207. [Google Scholar] [CrossRef]
  9. Moomivand, H.; Gheybi, M. Novel empirical models to assess rock fragment size by drilling and blasting. Measurement 2024, 238, 115375. [Google Scholar] [CrossRef]
  10. Guo, H.; Nguyen, H.; Bui, X.N.; Armaghani, D.J. A New Technique to Predict Fly-Rock in Bench Blasting Based on an Ensemble of Support Vector Regression and GLMNET. Eng. Comput. 2019, 37, 421–435. [Google Scholar] [CrossRef]
  11. Taiwo, B.O.; Famobuwa, O.V.; Mata, M.M.; Sazid, M.; Fissha, Y.; Jebutu, V.A.; Akinlabi, A.A.; Ogunyemi, O.B.; Ozigi, A. Granite Downstream Production Dependent Size and Profitability Assessment: An Application of Mathematical-Based Artificial Intelligence Model and WipFrag Software. J. Min. Environ. 2024, 15, 497–515. [Google Scholar] [CrossRef]
  12. Nguyen, H.; Bui, X.N.; Bui, H.B.; Mai, N.L. A Comparative Study of Artificial Neural Networks in Predicting Blast-Induced Air-Blast Overpressure at Deo Nai Open-Pit Coal Mine, Vietnam. Neural Comput. Appl. 2020, 32, 3939–3955. [Google Scholar] [CrossRef]
  13. Yu, Z.; Shi, X.; Zhou, J.; Gou, Y.; Huo, X.; Zhang, J.; Armaghani, D.J. A New Multikernel Relevance Vector Machine Based on the HPSOGWO Algorithm for Predicting and Controlling Blast-Induced Ground Vibration. Eng. Comput. 2020, 38, 1905–1920. [Google Scholar] [CrossRef]
  14. Zhou, J.; Li, C.; Arslan, C.A.; Hasanipanah, M.; Bakhshandeh Amnieh, H. Performance Evaluation of Hybrid FFA-ANFIS and GA-ANFIS Models to Predict Particle Size Distribution of a Muck-Pile after Blasting. Eng. Comput. 2021, 37, 265–274. [Google Scholar] [CrossRef]
  15. Zhou, J.; Li, E.; Yang, S.; Wang, M.; Shi, X.; Yao, S.; Mitri, H.S. Slope Stability Prediction for Circular Mode Failure Using Gradient Boosting Machine Approach Based on an Updated Database of Case Histories. Saf. Sci. 2019, 118, 505–518. [Google Scholar] [CrossRef]
  16. Ebrahimi, E.; Monjezi, M.; Khalesi, M.R.; Armaghani, D.J. Prediction and Optimization of Back-Break and Rock Fragmentation Using an Artificial Neural Network and a Bee Colony Algorithm. Bull. Eng. Geol. Environ. 2016, 75, 27–36. [Google Scholar] [CrossRef]
  17. Armaghani, D.J.; Hajihassani, M.; Monjezi, M.; Mohamad, E.T.; Marto, A.; Moghaddam, M.R. Application of Two Intelligent Systems in Predicting Environmental Impacts of Quarry Blasting. Arab. J. Geosci. 2015, 8, 9647–9665. [Google Scholar] [CrossRef]
  18. Sayevand, K.; Arab, H. A Fresh View on Particle Swarm Optimization to Develop a Precise Model for Predicting Rock Fragmentation. Eng. Comput. 2019, 36, 533–550. [Google Scholar] [CrossRef]
  19. Hasanipanah, M.; Amnieh, H.B.; Arab, H.; Zamzam, M.S. Feasibility of PSO–ANFIS Model to Estimate Rock Fragmentation Produced by Mine Blasting. Neural Comput. Appl. 2018, 30, 1015–1024. [Google Scholar] [CrossRef]
  20. Chen, L.; Taiwo, B.O.; Hosseini, S.; Kahraman, E.; Fissha, Y.; Sazid, M.; Famobuwa, O.V.; Faluyi, J.O.; Akinlabi, A.A.; Ikeda, H.; et al. Swarm-Based Metaheuristic and Reptile Search Algorithm for Downstream Operation-Dependent Fragmentation Size Prediction. Neural Comput. Appl. 2025, 37, 25033–25059. [Google Scholar] [CrossRef]
  21. Kulatilake, P.H.S.W.; Hudaverdi, T.; Wu, Q. New Prediction Models for Mean Particle Size in Rock Blast Fragmentation. Geotech. Geol. Eng. 2012, 30, 665–684. [Google Scholar] [CrossRef]
  22. Huang, J.; Asteris, P.G.; Manafi Khajeh Pasha, S.; Mohammed, A.S.; Hasanipanah, M. A New Auto-Tuning Model for Predicting the Rock Fragmentation: A Cat Swarm Optimization Algorithm. Eng. Comput. 2022, 38, 2209–2220. [Google Scholar] [CrossRef]
  23. Mojtahedi, S.F.F.; Ebtehaj, I.; Hasanipanah, M.; Bonakdari, H.; Amnieh, H.B. Proposing a Novel Hybrid Intelligent Model for the Simulation of Particle Size Distribution Resulting from Blasting. Eng. Comput. 2019, 35, 47–56. [Google Scholar] [CrossRef]
  24. Zhang, S.; Bui, X.N.; Trung, N.T.; Nguyen, H.; Bui, H.B. Prediction of Rock Size Distribution in Mine Bench Blasting Using a Novel Ant Colony Optimization-Based Boosted Regression Tree Technique. Nat. Resour. Res. 2020, 29, 867–886. [Google Scholar] [CrossRef]
  25. Miao, Y.; Zhang, Y.; Wu, D.; Li, K.; Yan, X.; Lin, J. Rock Fragmentation Size Distribution Prediction and Blasting Parameter Optimization Based on the Muck-Pile Model. Min. Metall. Explor. 2021, 38, 1071–1080. [Google Scholar] [CrossRef]
  26. Akyildiz, O.; Hudaverdi, T. ANFIS Modelling for Blast Fragmentation and Blast-Induced Vibrations Considering Stiffness Ratio. Arab. J. Geosci. 2020, 13, 1162. [Google Scholar] [CrossRef]
  27. Rong, K.; Xu, X.; Wang, H.; Yang, J. Prediction of the Mean Fragment Size in Mine Blasting Operations by Deep Learning and Grey Wolf Optimization Algorithm. Earth Sci. Inform. 2024, 17, 2903–2919. [Google Scholar] [CrossRef]
  28. Kahraman, E.; Hosseini, S.; Taiwo, B.O.; Fissha, Y.; Jebutu, V.A.; Akinlabi, A.A.; Adachi, T. Fostering Sustainable Mining Practices in Rock Blasting: Assessment of Blast Toe Volume Prediction Using Comparative Analysis of Hybrid Ensemble Machine Learning Techniques. J. Saf. Sustain. 2024, 1, 75–88. [Google Scholar] [CrossRef]
  29. He, H.; Wang, W.; Wang, Z.; Li, S.; Chen, J. Enhancing Seismic Landslide Susceptibility Analysis for Sustainable Disaster Risk Management through Machine Learning. Sustainability 2024, 16, 3828. [Google Scholar] [CrossRef]
  30. Moosavi, S.M.H.; Ma, Z.; Armaghani, D.J.; Aghaabbasi, M.; Ganggayah, M.D.; Wah, Y.C.; Ulrikh, D.V. Understanding and Predicting the Usage of Shared Electric Scooter Services on University Campuses. Appl. Sci. 2022, 12, 9392. [Google Scholar] [CrossRef]
  31. Hong, Z.; Tao, M.; Liu, L.; Zhao, M.; Wu, C. An Intelligent Approach for Predicting Overbreak in Underground Blasting Operation Based on an Optimized XGBoost Model. Eng. Appl. Artif. Intell. 2023, 126, 107097. [Google Scholar] [CrossRef]
  32. Goulet, A.; Grenon, M. Managing Seismic Risk Associated to Development Blasting Using Random Forests Predictive Models Based on Geologic and Structural Rockmass Properties. Rock Mech. Rock Eng. 2024, 57, 9805–9826. [Google Scholar] [CrossRef]
  33. Gu, Z.; Cao, M.; Wang, C.; Yu, N.; Qing, H. Research on Mining Maximum Subsidence Prediction Based on Genetic Algorithm Combined with XGBoost Model. Sustainability 2022, 14, 10421. [Google Scholar] [CrossRef]
  34. Sun, M.; Yang, J.; Yang, C.; Wang, W.; Wang, X.; Li, H. Research on Prediction of PPV in Open-Pit Mine Used RUN-XGBoost Model. Heliyon 2024, 10, e28246. [Google Scholar] [CrossRef]
  35. Li, E.; Yang, F.; Ren, M.; Zhang, X.; Zhou, J.; Khandelwal, M. Prediction of Blasting Mean Fragment Size Using Support Vector Regression Combined with Five Optimization Algorithms. J. Rock Mech. Geotech. Eng. 2021, 13, 1380–1397. [Google Scholar] [CrossRef]
  36. Xie, C.Y.; Nguyen, H.; Bui, X.N.; Choi, Y.; Zhou, J.; Nguyen-Trang, T. Predicting Rock Size Distribution in Mine Blasting Using Various Novel Soft Computing Models Based on Meta-Heuristics and Machine Learning Algorithms. Geosci. Front. 2021, 12, 101108. [Google Scholar] [CrossRef]
  37. Huan, B.; Li, X.; Wang, J.; Hu, T.; Tao, Z. An Interpretable Deep Learning Model for the Accurate Prediction of Mean Fragmentation Size in Blasting Operations. Sci. Rep. 2025, 15, 11515. [Google Scholar] [CrossRef] [PubMed]
  38. Yari, M.; He, B.; Armaghani, D.J.; Abbasi, P.; Mohamad, E.T. A Novel Ensemble Machine Learning Model to Predict Mine Blasting–Induced Rock Fragmentation. Bull. Eng. Geol. Environ. 2023, 82, 187. [Google Scholar] [CrossRef]
  39. Ibrahim, B.; Ahenkorah, I.; Ewusi, A. Explainable Risk Assessment of Rockbolts’ Failure in Underground Coal Mines Based on Categorical Gradient Boosting and SHapley Additive exPlanations (SHAP). Sustainability 2022, 14, 11843. [Google Scholar] [CrossRef]
  40. Sharma, S.K.; Rai, P. Establishment of Blasting Design Parameters Influencing Mean Fragment Size Using State-of-the-Art Statistical Tools and Techniques. Measurement 2017, 96, 34–51. [Google Scholar] [CrossRef]
  41. Ohadi, B.; Sun, X.; Esmaieli, K.; Consens, M.P. Predicting Blast-Induced Outcomes Using Random Forest Models of Multi-Year Blasting Data from an Open-Pit Mine. Bull. Eng. Geol. Environ. 2020, 79, 329–343. [Google Scholar] [CrossRef]
  42. Mirjalili, S.; Lewis, A. The Whale Optimization Algorithm. Adv. Eng. Softw. 2016, 95, 51–67. [Google Scholar] [CrossRef]
  43. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016. [Google Scholar] [CrossRef]
  44. Ma, X.; Chen, Z.; Chen, P.; Zheng, H.; Gao, X.; Xiang, J.; Chen, L.; Huang, Y. Predicting the Utilization Factor of Blasthole in Rock Roadways by Random Forest. Undergr. Space 2023, 11, 232–245. [Google Scholar] [CrossRef]
Figure 1. Random forest-based feature importance calculation.
Figure 2. Bubble-net foraging behavior of humpback whales in WOA.
Figure 3. Flowchart of XGBoost.
Figure 4. Implementation framework of the RF-WOA-XGBoost.
Figure 5. Data distribution plots of all variables.
Figure 6. Correlation analysis among variables.
Figure 7. Results of RF feature importance ranking.
Figure 8. The tuning process of max_depth.
Figure 9. The tuning process of learning_rate.
Figure 10. Fitness curves of different populations.
Figure 11. Schematic diagram of cross-validation.
Figure 12. Sensitivity analysis.
Figure 13. Comparison of predictive performance of different models. (a) RF-WOA-XGBoost; (b) RF-WOA-ANFIS; (c) RF-WOA-LightGBM; (d) RF-WOA-CatBoost.
Figure 14. SHAP plot of model feature contributions and interaction effects.
Table 1. Pseudocode of the RF-WOA-XGBoost.
Algorithm: RF-WOA-XGBoost
Input: Dataset D(x, y), feature selection threshold θ, population size N, maximum iterations T
Output: Optimized model M*, prediction results y_pred, evaluation metrics Metrics
1. Data preprocessing:
  x ← OutlierRemoval(x) // remove outliers based on quartiles and the IQR
  x ← StandardizeAndNormalize(x)
  (x_train, x_test, y_train, y_test) ← SplitDataset(x, y, train:test = 9:1)
2. RF-based feature selection:
  RF model ← TrainRandomForest(x_train, y_train)
  Selected features ← {f | FeatureImportance(f) > θ}
  x_train, x_test ← retain only selected features
3. WOA-based hyperparameter optimization:
  Initialize N whale positions // each position encodes an XGBoost hyperparameter set
  Best parameters ← null, Best score ← ∞
  for t = 1 to T do
    for each whale w do
      Current score ← CrossValidation(XGBoost(w.params), x_train, y_train)
      if Current score < Best score then update Best score and Best parameters
    end for
    Update whale positions // following the WOA updating rules, Eqs. (4)-(12)
  end for
4. Model training and evaluation:
  Final model ← TrainXGBoost(Best parameters, x_train, y_train)
  y_pred ← Final model.predict(x_test)
  Metrics ← Compute(R², MSE, MAE, MAPE, RSD, VAF)
5. Feature analysis:
  Compute SHAP values and rank features by importance
return Final model, y_pred, Metrics
Table 2. Statistical characteristics of the original dataset.

| Type | Parameter | Unit | Min | Max | Mean | Median | Std. Dev. |
|---|---|---|---|---|---|---|---|
| Input parameters | D | m | 0.25 | 0.31 | 0.28 | 0.27 | 0.02 |
| | H | m | 12.0 | 45.5 | 25.1 | 24.0 | 10.38 |
| | J | m | 0.0 | 3.5 | 1.5 | 1.5 | 0.94 |
| | S | m | 9.0 | 12.8 | 10.4 | 10.0 | 1.51 |
| | B | m | 7 | 10 | 8.8 | 9 | 0.90 |
| | T | m | 4.5 | 12.2 | 6.9 | 6.0 | 2.34 |
| | L | m | 26.2 | 133.0 | 75.4 | 72.5 | 27.09 |
| | W | m | 38 | 175 | 73.3 | 71.5 | 22.39 |
| | S/B | - | 1.00 | 1.29 | 1.18 | 1.17 | 0.09 |
| | T/B | - | 0.50 | 1.22 | 0.77 | 0.71 | 0.20 |
| | H/B | - | 1.33 | 4.55 | 2.79 | 2.70 | 0.94 |
| | J/B | - | 0.00 | 0.39 | 0.16 | 0.20 | 0.10 |
| | B/D | - | 28.00 | 35.86 | 31.83 | 32.16 | 1.97 |
| | L/W | - | 0.23 | 2.00 | 1.10 | 1.07 | 0.43 |
| | NH | - | 16 | 145 | 58.2 | 56.5 | 21.93 |
| | Qe | t | 9140 | 282,088 | 90,667.4 | 60,038.5 | 70,900.59 |
| | De | kg/m | 45.00 | 99.22 | 72.21 | 67.62 | 15.85 |
| | PF | kg/m³ | 1.34 | 2.53 | 1.74 | 1.68 | 0.22 |
| | UCS | MPa | 11.8 | 36.0 | 22.4 | 22.0 | 4.77 |
| Output parameter | MFS | m | 0.180 | 0.707 | 0.348 | 0.340 | 0.097 |
Table 3. Key hyperparameter value ranges and optimal values.

| Hyperparameter | Range | Optimal Value |
|---|---|---|
| n_estimators | [150, 300] | 274 |
| max_depth | [2, 4] | 2 |
| learning_rate | [0.1, 0.4] | 0.39 |
| subsample | [0.00001, 1.0] | 0.56 |
| colsample_bytree | [0.0001, 1.0] | 0.32 |
| reg_alpha | [0.1, 0.8] | 0.10 |
| reg_lambda | [0.1, 0.9] | 0.45 |
Table 4. Model validation results under different cross-validation strategies.

| Validation Strategy | Mean MSE | Standard Deviation of MSE |
|---|---|---|
| 5-Fold CV | 0.004599 | 0.002167 |
| LOOCV | 0.005441 | 0.012943 |
| Repeated 5-Fold (10x) | 0.005029 | 0.003108 |
Table 5. Test-set performance comparison of WOA-XGBoost, RF-XGBoost and RF-WOA-XGBoost.

| Model | MSE | R² | MAE | MAPE | RSD |
|---|---|---|---|---|---|
| WOA-XGBoost | 0.00138 | 0.670 | 0.2971 | 0.1097 | 0.03089 |
| RF-XGBoost | 0.00162 | 0.611 | 0.03546 | 0.1463 | 0.02676 |
| RF-WOA-XGBoost | 0.00029 | 0.930 | 0.01386 | 0.0505 | 0.01676 |