Compressive Strength Prediction of Cemented Backfill Containing Phosphate Tailings Using Extreme Gradient Boosting Optimized by Whale Optimization Algorithm

Unconfined compressive strength (UCS) is the most significant mechanical index for cemented backfill, and it is mainly determined by traditional mechanical tests. This study optimized the extreme gradient boosting (XGBoost) model by utilizing the whale optimization algorithm (WOA) to construct a hybrid model for the UCS prediction of cemented backfill. The PT proportion, the OPC proportion, the FA proportion, the solid concentration, and the curing age were selected as input variables, and the UCS of the cemented PT backfill was selected as the output variable. The original XGBoost model, the XGBoost model optimized by particle swarm optimization (PSO-XGBoost), and the decision tree (DT) model were also constructed for comparison with the WOA-XGBoost model. The results showed that the values of the root mean square error (RMSE), coefficient of determination (R2), and mean absolute error (MAE) obtained from the WOA-XGBoost model, XGBoost model, PSO-XGBoost model, and DT model were equal to (0.241, 0.967, 0.184), (0.426, 0.917, 0.336), (0.316, 0.943, 0.258), and (0.464, 0.852, 0.357), respectively. The results show that the proposed WOA-XGBoost has better prediction accuracy than the other machine learning models, confirming the ability of the WOA to enhance XGBoost in cemented PT backfill strength prediction. The WOA-XGBoost model could be a fast and accurate method for the UCS prediction of cemented PT backfill.


Introduction
Cemented backfill utilizes solid waste as aggregate to prepare sustainable construction materials for use in refilling underground voids, which provides an efficient and economical approach to waste management [1,2]. The cemented backfill is prepared by homogenously mixing aggregate, hydraulic binder, and water [3,4]. After mixing, the prepared backfill slurry is transported to underground voids through pipelines, where the backfill slurry gradually solidifies and develops strength [5]. The hardened backfill provides support to the nearby rocks, reduces surface subsidence, and improves ore recovery [6][7][8]. It is reported that an unconfined compressive strength (UCS) of 0.7-2 MPa is required for the cemented backfill in typical underground mining operations [9]. Hence, sufficient UCS of cemented backfill is critical for stope safety. On the other hand, currently, determination of the UCS of cemented backfill is based on a mechanical test using a servo pressure testing machine. The mechanical test requires the preparation of backfill specimens and a rather long curing age, resulting in difficulty in obtaining the UCS values. Therefore, it is crucial for mining to create a quick and accurate method to predict the UCS of cemented backfill.
Recent advances in machine learning (ML) models could offer a method for the UCS prediction of cemented backfill. Previous studies have applied machine learning models for the strength prediction of cement-based materials, such as cement, concrete, and mortar [10][11][12]. To predict the compressive strength of self-compacted concrete, Prado-Gil et al. used a variety of machine learning models, including extreme gradient boosting (XGBoost) gradient boosting (GB), and random forest (RF). The RF model performed best [13]. To predict the compressive strength of high-performance concrete, Mosbeh et al. also created a gradient boosting regression tree model. The model demonstrated good strength prediction potential. [14]. Due to the great advantages of machine learning models, there has been a trend of using machine learning models to predict the UCS of cemented backfill in recent years [15,16]. For instance, Qi et al. combined machine learning models with multi-objective optimization in order to construct an integrated intelligent design framework to predict the UCS and slump of cemented backfill, and the model performed well in the prediction of multiple indexes [17]. In previous studies, part hyperparameters in single machine learning models were set using trial-and-error or grid search method, which could seriously reduce the accuracy of prediction. According to previous studies, optimizing the hyperparameters of a machine learning model using optimization algorithms is an effective method to improve prediction performance, which is known as a hybrid model [18][19][20]. Optimization algorithms usually include the whale optimization algorithm (WOA), the sparrow search algorithm (SSA), the particle swarm optimization algorithm (PSO), and the butterfly optimization algorithm (BOA), etc. In recent years, various hybrid models have been formed and applied in geotechnical fields, such as tunnel extrusion classification and advanced rate prediction of the tunnel [21][22][23]. Previous research has demonstrated that hybrid models outperform solely machine learningmodels. For instance, Qiu et al. applied the WOA, gray wolf optimization (GWO), and Bayesian optimization algorighms (BO) to fine-tune the hyper-parameters of the XGBoost model to predict the peak particle velocity (PPV) induced by blasting. Their results show that the performance of three optimized hybrid models was superior to the original XGBoost model [23]. Moreover, hybrid models also showed good performance in the UCS prediction of cemented backfill. Hu et al. applied the sparrow search algorithm (SSA) to optimize the extreme learning machine (ELM) to predict the UCS of cemented backfill, and the SSA-ELM model had higher prediction accuracy compared to ELM [24]. Qi et al. optimized the adaptive neuro-fuzzy inference system (ANFIS) by using the artificial bee colony (ABC) algorithm to predict the UCS of cemented backfill, and the prediction results were accurate [25]. It appears that using an optimization algorithm and a machine learning model to build a hybrid model is a feasible approach to predict the UCS of cemented backfill. However, to the authors' knowledge, only a few studies have applied a hybrid model to UCS prediction of cemented backfill [26].
Extreme gradient boosting (XGBoost) is an innovative and efficient tree-based machine learning model proposed by Chen and Guestrin [27]. The XGBoost model can prevent the model from overfitting during training and reduce computational costs by automating parallel computation. The performance of the XGBoost model has been widely recognized in the Kaggle competition and the industrial community [28]. The WOA is an optimization algorithm that has a simple structure, few parameters, strong search ability, and simple implementation. The WOA-XGBoost model has been applied in the fields of prediction of daily reference evapotranspiration and prediction of blast-induced ground vibration [23,29]. Previous studies showed that the WOA algorithm can effectively optimize the hyperparameters of the XGBoost model and increase prediction accuracy. The WOA-XGBoost model could be a promising model for the UCS prediction of cemented backfill.
This study prepared cemented backfill using phosphate tailings (PT) as aggregate (cemented PT backfill). The WOA algorithm was used to optimize the XGBoost model in order to construct a hybrid model for the prediction of the UCS of cemented PT backfill. Five influencing factors, including the PT proportion, the OPC proportion, the FA proportion, the solid concentration, and the curing age, were used as input variables. In this paper, the UCS of backfill specimens was first determined by mechanical test. Then, the UCS results were used to build a dataset containing 63 groups. After that, RMSE, MAE, and R2 values were evaluated to determine the performance of the hybrid WOA-XGBoost model. Finally, the UCS-prediction performance of the WOA-XGBoost model was compared with the other three machine learning models (PSO-XGBoost, XGBoosst, and DT). In addition, feature importance analysis was used to obtain the importance of the input variables.

Materials
Phosphate tailings (PT) were gathered from a phosphate mine's (Guiyang, China) tailings pond and utilized as an aggregate. The main phase compositions of PT are dolomite and quartz. Fly ash (FA, collected from a powerplant in Guiyang, China) and 42.5 ordinary Portland cement (OPC) were used as binder. Based on the Chinese standard GB/T 1596-2017, the strength activity index (A = (R/R0) × 100, where A is the strength activity index, R0 is the UCS of hardened type 42.5 ordinary Portland cement, and R is the UCS of the hardened mixture of fly ash and type 42.5 ordinary Portland cement in a 3:7 ratio) of fly ash was more than 70% after 28 days.The particle size distribution of PT (measured by a particle size analyzer, Malvern, UK) is shown in Figure 1, and the components of PT (measured by X-ray fluorescence, Bruker, Switzerland) are listed in Table 1.

Preparation of Backfill Specimens
The experimental design of various ratios of raw materials and solid concentrations is shown in Table 2. The backfill slurry was prepared by mixing the PT, binder, and tap water homogeneously. The aggregate was not washed before mixing. The backfill slurry was sampled at a mixing time of 10 min, and then it was placed into models with dimensions of 40 mm × 40 mm × 40 mm. The backfill samples were taken out of the models after the final set and put into a curing box with a temperature of 20 • C and a relative humidity of 95%.

UCS Test
The backfill specimens were removed from the curing box after curing for 7 days, 14 days, and 28 days. A servo pressure testing machine (Hualong, China) with an accuracy of ±1 N was used to determine the UCS of backfill specimens. The displacement rate was set at 0.5 mm/min. According to Chinese standard JGJ/T 70-2009, the average value of three specimens was tested for each curing age under test [30].

Extreme Gradient Boosting Model
Extreme gradient boosting (XGBoost) is an efficient algorithm proposed by Chen and Guestrin to address supervised learning issues such as classification and regression [27]. The basic idea of the XGBoost is boosting, which uses additive training strategies and integration techniques to combine various weak learners in order to build a powerful learner [31]. To improve the computational precision, the loss function of the XGBoost model is expanded using a second-order Taylor method. Furthermore, XGboost adds a regularization term to avoid over-fitting and reduce the model complexity. The objective function of the XGBoost model consists of a term for the error function L and a term for the model complexity function Ω. The objective function can be formulated as follows: whereŷ i and y i are the predicted and target values, YT is the L1 regular term, 1 2 is the L2 regular term, T is the number of trees, ω is the internal split tree weight, and Y and λ are the regularization coefficients.
In order to minimize the objective function as much as feasible, a new function f was introduced to the model. The objective function can be formulated as follows: is the predicted value of the model at time t − 1, and f i (x i ) is the new function added at time t.
A second-order Taylor expansion is used in the XGBoost technique to minimize the objective function. When the constant component is taken out, it can be seen that the objective function is only related to the error function's first-order and second-order derivatives. The objective function can be formulated as follows: where G j represents first-order gradient statistics on the loss function, and H j represents second-order gradient statistics on the loss function [32].

Whale Optimization Algorithm
The Whale optimization algorithm (WOA) is a metaheuristic optimization algorithm created by Mirjalili et al. [33]. By simulating humpback whale feeding behavior, the algorithm seeks to identify the optimal target parameters. It is known that humpback whales use a special behavior called the "bubble-net feeding method" when hunting. There are three phases in the whole hunting process, namely encircling prey, the bubble-net attacking method (exploitation phase), and searching for prey (exploration phase). In this study, WOA was applied in order to optimize the hyperparameters of the XGBoost model for the UCS prediction of cemented PT backfill.

Encircling Prey
Each humpback whale symbolizes an individual, and the position of each individual in the search area indicates a solution. Humpback whales can identify the location of prey by echolocation, and all humpback whales, as a group, share information about the location of prey. The humpback whale closest to the prey position is chosen as the best candidate, after which the other humpback whales move close to that whale and tighten the envelope. A mathematical model of this behavior is described as follows: where t represents the current number of iterations, → X(t) represents the position vector at the t-th iteration, and → X * (t) represents the position vector of the best solution obtained so far. A and C are the coefficient vectors, which can be formulated as follows: where → r represents a random vector with values ranging from 0 to 1; → a is a convergence factor, linearly decreasing from 2 to 0.

Bubble-Net Attacking Method
When using a bubble net to trap prey, humpback whales assault their prey by spiraling upward and continually contracting the net. [34]. To describe this behavior graphically, two mathematical methods (shrinking encircling mechanism and spiral updating position) are designed. The shrinking encircling mechanism is achieved by decreasing the value of → a . The spiral updating position phase can be formulated as follows: where → D represents the distance from the i-th whale to the target prey, b is a constant that determines the logarithmic spiral's shape, and l is a random number in the range [−1, 1].
To simulate the simultaneous behaviors of the shrinking encircling mechanism and the spiral updating position, a 50% probability of choosing between the two behaviors is assumed. The mathematical model can be expressed as follows: where p is a random number in [0, 1].

Search for Prey
In the exploration phase, WOA updates the position of the individual to a random whale in order to implement a global search. The process can be formulated as follows: where → X rand is a random position vector chosen from the current population [33].

WOA-XGBoost Model
In the XGBoost model, hyperparameters could have considerable influence on the prediction performance of the model [35][36][37]. The grid search method is the most-used method for the adjustment of hyperparameters. However, the search range of the grid search method is too narrow to find the optimal hyperparameters [38]. In this study, the hyperparameters of the XGBoost model were optimized by WOA; thus, the hybrid model WOA-XGBoost was proposed. Figure 2 shows the analysis process of the WOA-XGBoost model. Each whale's position vector was determined by the XGBoost model's parameters (learning rate, number of boosting rounds, and regular lambda). WOA outputs the final parameters of the XGBoost model after iteratively searching for the algorithm's global optimal location. The mean squared error of attrition (MSE) is displayed at its lowest level in the optimal XGBoost model.

The Process of WOA-XGBoost Modeling
To develop the WOA-XGBoost model, an initial XGBoost model was developed. Then, the relevant parameters of the WOA were set. Eight different population sizes (25, 50, 75, 100, 125, 150, 175, and 200) were selected during the model development to find the optimal population size. In this study, 63 sets of data obtained from the UCS test were used as the input database. Fly ash proportion, cement proportion, phosphate tailings proportion, solid concentration, and curing age were selected as input variables. The UCS of cement PT backfill was used as the output variable. According to the Pareto principle, the original data were randomly split into a training set and a test set in the ratio of 8:2. The WOA-XGBoost model was trained using 50 sets of data from the training set, and its prediction performance was assessed using 13 sets of data from the test set. The overcomplication of ML models could result in overfitting. A 5-fold cross-validation method was used to overcome overfitting in this study. The training set was divided into 5 equal subsets, which were selected in turn for tests. The remaining four subsets were used for training to ensure that each subset had the opportunity to train and validate the model. For each training and test cycle, the cross-validation error was calculated and used to incrementally change the model parameters. Potential errors due to uneven data distribution or other unexpected circumstances were eliminated by a 5-fold cross-validation, and the robustness of each integrated learning model was ensured.

Evaluation Methodology
The dependability of the hybrid model is assessed using three statistical performance metrics: coefficient of determination (R 2 ), mean absolute error (MAE), and root mean square error (RMSE). RMSE, MAE, and R 2 are calculated using Equations (13)-(15), respectively, as follows: where N is the number of samples; y j is the actual value; y i is the predicted value, and y is the average of the predicted values. The units of RMSE and MAE are Mpa, and R 2 is expressed as a percentage. A hybrid model with lower RMSE and MAE and higher R 2 could exhibit a better prediction performance. Figures 3 and 4 show the UCS of backfill specimens. As expected, the UCS of all backfill specimens increased with curing age. For instance, the UCS of backfill specimens (FA:OPC:PT = 0:1:2, solid concentration of 70%) increased from 2.30 MPa to 5.15 MPa when the curing age was extended from 7 days to 28 days. The further hydration of the binder with curing age contributed to the development of UCS. Moreover, it appears that the backfill specimens with a higher binder/aggregate ratio gained a higher rate of UCS development over a curing age from 7 to 28 days. It was observed that the UCS is positively related to the binder/aggregate ratio and solid concentration. As shown in Figure 3a, when the solid concentration increased from 70% to 75%, the UCS of backfill specimens with an FA:OC:PT ratio of 0:1:2, 0:1:4, and 0:1:6 increased by 28.7%, 102.6%, and 23.2%, respectively. The refinement of pores by high solid concentration could be the reason for the increase in UCS value [39]. Moreover, when the FA:OPC:PT ratio varied from 0:1:6 to 0:1:2, Figure 3a shows that the UCS of the backfill samples with solid concentrations of 70%, 72%, and 75% increased by 4.1 times, 3.67 times, and 4.3 times, respectively. The greater quantity of hydration products induced by a high binder/aggregate ratio resulted in a higher UCS value. It is worth noting that when the curing age was extended to 28 days, Figures 3c and 4c show that the UCS values of backfill specimens with the highest binder/aggregate ratio (with FA:OPC:PT ratio in 0:1:2 and 1:1:4) were rather stable when subjected to different solid concentrations.

UCS Development
It appeared that the UCS of specimens is more sensitive to the FA:OPC:PT ratio than solid concentration. For instance, for the backfill specimens using a FA:OPC:PT ratio of 0:1:6, Figure 3c shows that when the solid concentration increased from 70% to 75%, the UCS only increased by 27.2%. However, when the FA:OPC:PT ratio varied from 0:1:6 to 0:1:2, the UCS of backfill specimens (solid concentration of 70%) obtained an increase of 294.3%. It seems that the gelling and filling of the pores of hydration products could significantly promote UCS development, which is more effective than the refinement of pores by high solid concentration.

Performance of WOA-XGBoost Model
The variation of fitness value with the number of iterations is shown in Figure 5. It was found that the fitness value of eight population values decreased dramatically as the number of iterations increased. At 100 iterations, eight population values of the WOA-XGBoost model had reached a stable level, and a stable fitness value state had been produced. After the training of the WOA-XGBoost model was complete, the calculation of the performance metrics was conducted, which were given in Table 3. It was observed that the WOA-XGBoost model with a population size of 100 gained the lowest RMSE and MAE values and the highest R 2 value. Therefore, the population size of 100 (Num_boosting_rounds = 167, Learning_rate = 0.5235, Reg_lambda = 0.2167) was selected as the optimal parameter for the WOA-XGBoost model. Figure 6 shows a comparison of the predicted and actual values when the WOA-XGBoost model was subjected to a population size of 100. It was found that the predicted values are close to the actual values, indicating that the WOA-XGBoost model performed well in the UCS prediction of the cemented PT backfill.

Comparison with Machine Learning Models
The WOA-XGBoost model was compared against the PSO-XGBoost model, the original XGBoost model, and the decision tree (DT) model to further assess its performance in UCS prediction. The initial parameters of the XGBoost model and DT model were set according to the WOA-XGBoost model to ensure consistency. The initial parameters of the PSO algorithm were established by previous studies [24,26], as shown in Table 4.  The WOA-XGBoost model shows a better prediction performance than the remaining three models. The performance metrics for the training and test sets of the four models are shown in Table 5. Unsurprisingly, the WOA-XGBoost model had the highest R 2 value and lowest RMSE and MAE values in both the training and test sets. Specifically, compared with PSO-XGBoost, XGBoost, and DT, the performance indexes of WOA-XGBoost on the test set showed a reduction of 37.08%, 47.86%, and 55.39% in RMSE, respectively, and 40.55%, 45.29% and 57.70% in MAE, respectively, and the R 2 increased by 3.39%, 6.20%, and 14.55%, respectively. The actual and predicted results for the training set and test set of the four models are shown in Figure 8. It was observed that the point of the WOA-XGBoost model is located near the perfect-fitting line, indicating that the predicted values and actual values are close. On the other hand, the remaining three models exhibited larger errors (Table 5) and larger discreteness (Figure 7).  The Taylor diagram enables a more intuitive comparison of model performance than a single index of model evaluation. Taylor diagrams can present information about various models, and it has been a popular method for evaluating and testing models in recent years [32]. The RMSE, correlation coefficient, and standard deviation of the actual and predicted values were computed to further compare the prediction performance of the four models. The results are plotted as a Taylor diagram, as shown in Figure 8. Compared with other models, the WOA-XGBoost model is closer to the reference point, indicating that the WOA-XGBoost model performs better in terms of the UCS prediction of cemented PT backfill.

Feature Importance Analysis of Input Variables
Feature importance analysis is a method used to quantify the correlation between input variables and the model [40]. The predicted results are more sensitive to the input variables with higher importance scores. Figure 9 shows the importance scores of five input variables in the WOA-XGBoost model. It can be seen that the PT proportion had the highest importance score, followed by curing age, OPC proportion, FA proportion, and solid concentration.  PT proportion, OPC proportion, and FA proportion belong to the factor of proportions of raw materials. The importance score of PT proportion, OPC proportion, and FA proportion was 0.48, 0.13, and 0.11, respectively. The total importance score of the proportions of raw materials was 0.72, which could be regarded as the most significant factor affecting the UCS of cemented PT backfill. Moreover, the importance score of the PT proportion gained the highest value among these three input variables, indicating that the binder/aggregate ratio had a greater effect on UCS than the OPC/FA ratio.
The importance score of the curing age in WOA-XGBoost obtained a value of 0.18, which meant that the effect of curing age on UCS cannot be ignored in cemented PT backfill. The experiment results in this study (Figures 3 and 4) also showed that the UCS of backfill specimens could gain an increase of 83.4-552.2% over a curing age of 7 days to 28 days.
The solid concentration obtained the lowest value of importance score, which was only 0.09. The result was quite consistent with the UCS results (Figures 3 and 4), which showed that the FA:OPC:PT ratio had a more significant effect on the UCS of backfill specimens than solid concentration. This phenomenon also was noticed by a previous study, which showed that the solid concentration has a smaller effect on UCS compared with the binder/aggregate ratio and curing age [17].

Conclusions
In this study, the XGBoost model was optimized using the WOA algorithm for the UCS prediction of cemented PT backfill. A total of 63 sets of experimental data were obtained through the mechanical test, and 5 variables, including fly ash percentage, cement percentage, phosphate tailings percentage, solid concentration, and curing age were used as input variables. The prediction performance was evaluated and the feature importance of input variables was calculated. The following conclusions can be drawn:

1.
The WOA-XGBoost prediction model had high accuracy for the UCS prediction of cemented PT backfill. Compared with PSO-XGBoost, XGBoost, and DT, the prediction results of WOA-XGBoost showed a 37.08%, 47.86%, and 55.39% reduction in RMSE, 40.55%, 45.29%, and 57.70% reduction in MAE, 3.39%, 6.20%, and 14.55% improvement in R 2 , respectively. The results indicated that the prediction performance of the XGBoost model can be greatly improved by the WOA algorithm.

2.
The results of the feature importance analysis showed that PT proportion was the most important input variable, followed by curing age, OPC proportion, FA proportion, and solid concentration. The importance score of the PT proportion was 0.48, and the total importance score of the proportions of raw materials was 0.72, indicating that the binder/aggregate ratio was the key to obtaining sufficient UCS for cemented PT backfill.

3.
WOA-XGBoost model could provide a promising method for the UCS prediction of cemented PT backfill. Therefore, the model can facilitate mine production. The model achieved better performance than other machine learning models and demonstrated potential for use in other geotechnical applications. In the future, with the addition of more training data, the performance of the WOA-XGBoost model may be more accurate. Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data will be made available on request.

Conflicts of Interest:
The authors declare that they have no known competing financial interest or personal relationships that could have appeared to influence the work reported in this paper.