Developing Hybrid Machine Learning Models for Estimating the Unconfined Compressive Strength of Jet Grouting Composite: A Comparative Study

Coal-grout composites were fabricated in this study using the jet grouting (JG) technique to reinforce coal mass in underground conditions. To evaluate the mechanical properties of the resulting coal-grout composites, their unconfined compressive strength (UCS) must be tested. A mathematical model is required to capture the unknown nonlinear relationship between the UCS and the influencing variables. In this study, six computational intelligence techniques based on machine learning (ML) algorithms were used to develop such models: back-propagation neural network (BPNN), random forest (RF), decision tree (DT), support vector machine (SVM), k-nearest neighbors (KNN), and logistic regression (LR). In addition, the hyper-parameters of these algorithms (e.g., the hidden layers in BPNN, the gamma in SVM, and the number of neighbor samples in KNN) were tuned by the recently developed beetle antennae search algorithm (BAS). To prepare the dataset for these ML models, three types of cementitious grout and three types of chemical grout were mixed with coal powder extracted from the Guobei coal mine, Anhui Province, China, to create coal-grout composites. In total, 405 coal-grout specimens were extracted and tested. Grout type, coal-grout ratio, and curing time were chosen as input parameters, and UCS was the output of the models. The results show that coal-chemical grout composites had higher strength in the short term, while coal-cementitious grout composites achieved stable and high strength in the long term. BPNN, DT, and SVM outperformed the other algorithms in predicting the UCS of the coal-grout composites. The strong performance of the optimal ML algorithms for strength prediction facilitates JG parameter design in practice and could serve as a benchmark for the wider application of ML methods in JG engineering for coal improvement.


Introduction
Jet grouting (JG) is a widely applied approach for stabilizing loose materials such as soil, fragmented rock, and coal [1]. In JG, a stabilizing fluid is injected into the soft material at high velocity and pressure [2]. The injection equipment consists of a JG string with a nozzle at its end, which jets the fluid into the soft material while the string is slowly rotated and raised [3,4]. Cementitious or chemical grouts are usually used as binders to consolidate the loose material and improve its mechanical properties, such as unconfined compressive strength (UCS).
In underground coal mines, coal-grout composites created by the JG technique are used as a new support material. Owing to the cutting and erosion effect of the jet, the mechanical properties of the raw coal are completely changed by the injected grout [5,6]. After the JG process, a rigid coal-grout pile with low deformability and high strength forms within the surrounding coal mass, which can provide sufficient support for roadways in coal mines [7][8][9][10]. In practice, UCS tests are widely used to assess the supporting capacity of support materials [11][12][13]. However, UCS tests are time-consuming and costly because many influencing variables must be considered. An indirect alternative is to estimate the UCS of support materials with empirical equations based on statistical regression [14,15]. Nevertheless, empirical rules used for designing JG parameters are often conservative and have limited applicability [12]. In addition, the nonlinear relationship between the UCS and the multiple influencing variables is unknown, which makes empirical equations harder to develop [16,17]. Therefore, computational models are needed to predict the UCS of coal-grout composites efficiently and accurately.
In recent years, machine learning (ML) based models have been used extensively for rock materials [18,19], composite materials [5,[20][21][22][23][24], and other areas [25][26][27]. Because ML models learn the input-output mapping directly from data, the outputs can be predicted accurately from the inputs without an explicit expression for their relationship, which reduces the need for time- and money-consuming experiments. However, to the authors' knowledge, no study has yet reported ML-based strength prediction for jet grouted coal-grout composites, since the coal-grout composite is a relatively new support material. In addition, whereas the studies mentioned above typically used only one or two conventional ML methods, more advanced ML algorithms, such as decision tree (DT) and random forest (RF), should also be evaluated for their feasibility for coal-grout composites.
In the current paper, six algorithms, namely DT, RF, back-propagation neural network (BPNN), k-nearest neighbors (KNN), support vector machine (SVM), and logistic regression (LR), were used to estimate the UCS of coal-grout composites from the influencing variables (i.e., coal-grout ratio, grout type, and curing time). Experimentally measured UCS values were used to construct a dataset for training and validating the proposed models, and the performances of the algorithms were compared and discussed. Evaluating the coal-grout composite is important before applying JG in the field; the ML methods were therefore proposed to estimate the composite strength accurately and rapidly. Considering several key influencing variables and the UCS of the coal-grout composite, nonlinear relationships were established by the intelligent models. This work provides new and fast approaches to estimating coal-grout composite strength, which can support JG design and contribute to the application of artificial intelligence in this field.

Applied ML Algorithms
Six ML algorithms (i.e., BPNN, SVM, DT, RF, KNN, and LR) were applied to establish the relationship between the UCS of coal-grout composites and the input variables, i.e., grout type, coal-grout ratio, and curing time. Brief descriptions of these algorithms are as follows. BPNN is a type of artificial neural network (ANN); its main advantage is that the weights are iteratively adjusted by propagating the error between the actual values and the calculated outputs backward through the network [28]. SVM uses a hyperplane to separate samples into classes [29]; for regression, the same idea is used to fit a function within an error margin. DT recursively partitions the labeled dataset into increasingly small subsets according to a set stopping criterion until a suitable level of breakdown is reached [30]. RF uses an ensemble of decision trees to perform classification and regression [31]. KNN is a non-parametric, instance-based learning model for classification and regression [32]. LR is a widely applied approach for classifying dependent variables; it seeks the best hypothesis relating the outcome to the variables [33].
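To make the instance-based idea behind KNN concrete, the following is a minimal Python sketch of KNN regression in its standard uniform-weight form, which predicts the mean target of the k nearest training samples. The function name, the feature encoding (a numeric grout-type code, coal-grout ratio, and curing time in days), and all sample values are hypothetical and only illustrate the mechanics; in practice, inputs are usually scaled first so that no single variable dominates the Euclidean distance.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a continuous target as the mean of the k nearest training targets.

    Distances are plain Euclidean distances between feature tuples.
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Pair every training target with its distance to the query point
    ranked = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    # Average the targets of the k closest samples (uniform weights)
    nearest = [y for _, y in ranked[:k]]
    return sum(nearest) / k


# Hypothetical samples: (grout-type code, coal-grout ratio, curing days) -> UCS (MPa)
train_x = [(0, 0.4, 7), (0, 0.6, 7), (1, 0.4, 7), (1, 1.2, 28)]
train_y = [6.0, 5.0, 8.0, 3.0]
print(knn_predict(train_x, train_y, (0, 0.5, 7), k=2))  # mean of 6.0 and 5.0 -> 5.5
```

Here k is exactly the "number of neighbor samples" hyper-parameter that BAS tunes for KNN.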

Beetle Antennae Search Algorithm (BAS)
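The hyper-parameters of the six algorithms were tuned by BAS; their definitions are summarized in Table 1. BAS is a metaheuristic for global optimization problems [34] that simulates the foraging behavior of a beetle, which senses odor with its two antennae and moves toward the side with the stronger signal. The pseudo code is shown in Figure 1.
The searching behavior is expressed as follows:
$$\vec{b} = \frac{\mathrm{rnd}(k,1)}{\lVert \mathrm{rnd}(k,1) \rVert}, \qquad x_r = x^t + d^t \vec{b}, \qquad x_l = x^t - d^t \vec{b}$$
where $\vec{b}$ is a normalized random unit vector, rnd(·) is a random function, k is the dimension of the position, d is the sensing length of the antennae, and $x_r$ and $x_l$ are the positions of the right and left antennae (the right and left searching areas).
The detecting behavior is as follows:
$$x^{t+1} = x^t - \delta^t \vec{b}\,\mathrm{sign}\big(f(x_r) - f(x_l)\big)$$
where δ is the step size of each iteration, f(·) is the objective function to be minimized (here, the validation MSE), and sign(·) is the sign function. The antennae length d and step size δ are updated as follows:
$$d^t = 0.95\,d^{t-1} + 0.01, \qquad \delta^t = 0.95\,\delta^{t-1}$$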
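The searching and detecting rules of BAS can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the random direction is drawn uniformly from [-1, 1] per coordinate and then normalized, the beetle steps against the sign of f(x_r) - f(x_l) (minimization), and d and δ decay with the factors 0.95 and 0.01 of the original BAS formulation. The function name `bas_minimize`, its defaults, and the quadratic test objective are hypothetical; in the paper's setting, f would be the cross-validated MSE as a function of the hyper-parameters.

```python
import math
import random


def bas_minimize(f, x0, d0=1.0, delta0=1.0, n_iter=200, seed=0):
    """Minimal beetle antennae search (BAS) minimizing f over real vectors.

    Returns the best position found and its objective value.
    """
    rng = random.Random(seed)
    x = list(x0)
    d, delta = d0, delta0
    best_x, best_f = list(x), f(x)
    for _ in range(n_iter):
        # Random unit direction: b = rnd(k, 1) / ||rnd(k, 1)||
        b = [rng.uniform(-1.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in b)) or 1.0
        b = [v / norm for v in b]
        # Objective at the right and left antennae: x_r = x + d*b, x_l = x - d*b
        f_r = f([xi + d * bi for xi, bi in zip(x, b)])
        f_l = f([xi - d * bi for xi, bi in zip(x, b)])
        # s = sign(f(x_r) - f(x_l)); moving by -delta*b*s heads toward the lower value
        s = (f_r > f_l) - (f_r < f_l)
        x = [xi - delta * bi * s for xi, bi in zip(x, b)]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
        # Decay antennae length and step size
        d = 0.95 * d + 0.01
        delta = 0.95 * delta
    return best_x, best_f


def sphere(v):
    """Hypothetical objective with its minimum at the origin."""
    return sum(t * t for t in v)


x_best, f_best = bas_minimize(sphere, [3.0, -2.0])
print(x_best, f_best)
```

Because only two extra function evaluations are needed per iteration, BAS is cheap when each evaluation (one CV run) is expensive; the price is that the geometric decay of δ limits how far the search can travel, which motivates the Levy flight modification described below.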

K-Fold Cross-Validation
Several methods have been used to validate regression models, such as the simple substitution method [35], the bootstrap method [36], the holdout method [37], and the bolstered method [38]. In this paper, k-fold cross-validation (CV) was applied to the training data [39], with k set to 10 following common recommendations and considering the size of the dataset. During hyper-parameter tuning, the training data were divided into 10 folds; nine folds were used for training, and the remaining fold was used for validation (see Figure 2). This was repeated so that each fold served once as the validation fold, and the final result of a model was the average of the 10 results from the 10 rounds. This procedure reduces the risk of overfitting the hyper-parameters to a single training-validation split.
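The 10-fold procedure described above can be sketched in Python as below. It is an illustration, not the authors' MATLAB code: the generator name `k_fold_indices`, the helper `cv_score`, and the round-robin fold assignment after a single shuffle are assumptions; the helper simply averages whatever validation score a user-supplied function returns for each fold.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Indices are shuffled once, then dealt round-robin into k folds of
    near-equal size; each fold serves exactly once as the validation fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        # The other k - 1 folds form the training part of this round
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, val_idx


def cv_score(fit_and_score, n_samples, k=10):
    """Average a validation score over the k rounds.

    fit_and_score(train_idx, val_idx) is supplied by the caller: it should fit
    a model on train_idx and return its error (e.g., MSE) on val_idx.
    """
    scores = [fit_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

Inside a hyper-parameter search, `cv_score` would be the objective evaluated for each candidate setting, so the search never sees the held-out test set.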

Hyper-Parameter Tuning
As mentioned above, 10-fold CV and BAS were employed to tune the hyper-parameters of the ML algorithms. The mean squared error (MSE) and the correlation coefficient (R) were used to evaluate the hyper-parameter tuning and model performance. They are defined as follows [40]:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i^{*} - y_i\right)^2$$
$$R = \frac{\sum_{i=1}^{N}\left(y_i^{*} - \bar{y}^{*}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{N}\left(y_i^{*} - \bar{y}^{*}\right)^2}\,\sqrt{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}}$$
where N is the number of specimens; $y_i^{*}$ and $y_i$ are the predicted and actual values, respectively; and $\bar{y}^{*}$ and $\bar{y}$ are the mean predicted and mean actual values, respectively. The hyper-parameters of the six algorithms were first tuned by BAS, and the resulting regression models were then compared. Specifically, following the 10-fold CV approach, the nine training folds were used by BAS to search iteratively for the optimal hyper-parameters of each algorithm (e.g., the hidden layers in BPNN, the gamma in SVM, the number of samples at a leaf node in DT, and the number of neighbor samples in KNN). The hyper-parameters yielding the smallest root mean squared error (RMSE, the square root of MSE) on the validation fold were retained after the iterations. Consequently, the best ML models and their optimal hyper-parameters were selected after all 10 folds.
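The evaluation metrics defined above translate directly into code. The sketch below is a plain-Python illustration with hypothetical function names and values; because RMSE is a monotone function of MSE, minimizing either one selects the same hyper-parameters.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: (1/N) * sum((y*_i - y_i)^2)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, the square root of MSE."""
    return math.sqrt(mse(y_true, y_pred))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient R between predicted and actual values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((p - mean_p) * (t - mean_t) for t, p in zip(y_true, y_pred))
    ss_t = sum((t - mean_t) ** 2 for t in y_true)
    ss_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(ss_t * ss_p)


# Hypothetical actual and predicted UCS values (MPa)
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(mse(actual, predicted), rmse(actual, predicted), pearson_r(actual, predicted))
```

Note that R measures only linear association: predictions that are a perfect linear function of the actual values have R = 1 even when they are biased, which is why R is reported alongside an error measure such as MSE.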
The flowchart for applying the six algorithms tuned by BAS is displayed in Figure 3. Note that to overcome the local minima issue, a key modification to BAS was introduced, namely the Levy flight strategy for adjusting the BAS step size [41][42][43]. To enlarge the step size and escape local optima, the following update is triggered:
$$x^{t+1} = x^t + \alpha \otimes \mathrm{Levy}(\lambda)$$
where α is a randomization parameter, α ∈ [0, 1], ⊗ denotes entrywise multiplication, and Levy denotes a Levy distribution:
$$\mathrm{Levy} \sim u = t^{-\lambda}, \qquad 1 < \lambda \le 3$$
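The text does not specify how the Levy-distributed steps are generated. One common choice, assumed here, is Mantegna's algorithm, which draws heavy-tailed steps with a stability index β; this corresponds to λ = 1 + β in the power-law tail u = t^{-λ}, so the default β = 1.5 gives λ = 2.5, inside the range 1 < λ ≤ 3. The names `levy_step` and `levy_perturb` and the default α = 0.01 are hypothetical.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """One heavy-tailed step via Mantegna's algorithm (intended for 0 < beta < 2)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    # The heavy tail comes from dividing by |v|^(1/beta); guard against v == 0
    return u / max(abs(v), 1e-12) ** (1 / beta)


def levy_perturb(x, alpha=0.01, beta=1.5, rng=random):
    """x_new = x + alpha (entrywise) Levy(beta): an independent step per coordinate."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [xi + alpha * levy_step(beta, rng) for xi in x]


rng = random.Random(1)
print(levy_perturb([0.5, 2.0], alpha=0.1, rng=rng))
```

Most draws are small, but occasional very long jumps let the search leave a basin that the geometrically shrinking BAS step could not escape on its own.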

Materials
Raw coal from the field, one of the component materials of the coal-grout composites, was extracted from the Guobei coal mine. The coal mass had an extremely low compressive strength (0.56 MPa) and was fragmented by previous complex tectonic movements. Hence, the JG technique was first applied to reinforce the coal mass for roadway support. Based on laboratory experiments and a field trial, six frequently used grouting materials were selected as binders in this study: three cementitious grouts (P.O 32.5, P.O 42.5, and superfine cement (SF-C)) and three chemical grouts (MP 364, MP 398, and MP 325). For consistency with field application, the water-cement ratio was set to 1:1. Each chemical grout consisted of components A and B mixed at a ratio of 1:1 according to trial tests. Five coal-grout ratios (0.4:1, 0.6:1, 0.8:1, 1:1, and 1.2:1) were designed to evaluate the influence of this variable [44,45]. The 1-, 7-, 14-, and 28-day UCS was tested for all specimens, and the 4 h UCS was additionally tested for the coal-chemical grout composites owing to their rapid chemical reaction. The experimental design is given in Table 2, and the statistics of the inputs and output are summarized in Table 3.
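As a consistency check on the design described above, the test combinations can be enumerated in Python. This sketch assumes a full factorial design (every grout type at every coal-grout ratio and curing age, with the 4 h age for chemical grouts only) and three replicate specimens per combination, as stated in the UCS test description; Table 2 itself is not reproduced here. Under these assumptions the count agrees with the 405 specimens reported.

```python
from itertools import product

cement_grouts = ["P.O 32.5", "P.O 42.5", "SF-C"]
chemical_grouts = ["MP 364", "MP 398", "MP 325"]
coal_grout_ratios = [0.4, 0.6, 0.8, 1.0, 1.2]   # coal:grout = r:1
cement_ages = ["1 d", "7 d", "14 d", "28 d"]
chemical_ages = ["4 h"] + cement_ages            # 4 h UCS only for chemical grouts
REPLICATES = 3                                   # each test repeated three times

design = list(product(cement_grouts, coal_grout_ratios, cement_ages)) + list(
    product(chemical_grouts, coal_grout_ratios, chemical_ages)
)
n_specimens = len(design) * REPLICATES
print(len(design), n_specimens)  # 135 405
```

That is 3 × 5 × 4 = 60 cementitious combinations plus 3 × 5 × 5 = 75 chemical combinations.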

Specimen Preparation
In practice, to test the UCS of coal-grout composites after grouting, standard samples (normally 50 mm in diameter and 100 mm in height) are drilled from the jet grouted coal-grout piles. To simulate this procedure, the experimental steps were as follows: (1) Preparation of the grout. Cementitious or chemical binders were thoroughly mixed with water and accelerator to obtain the designed cementitious or chemical grout. (2) Mixing of coal-grout composites. The raw coal was mixed with the cementitious grout or chemical grout in a mixer (HJW-60) for about 5 min or 1 min, respectively. (3) Casting of coal-grout composites. The fresh coal-grout mixes were poured into a rectangular mold. After compaction, the mold was placed in a curing chamber for 4 h (coal-chemical grout only) or for 1, 7, 14, or 28 days. The curing conditions were approximately 20 °C and 90% humidity. (4) Fabrication of standard specimens. Cylindrical specimens (50 mm in diameter and 100 mm in height) were cored from the cast blocks with a core-drilling machine (HZ-20), and a grinding machine was used to ensure the flatness and parallelism of the end faces.

UCS Test
The compressive strength data collected from 405 coal-grout composite specimens with different influencing variables were used as the database for the presented models. UCS tests were performed in accordance with ASTM C39 (ASTM, 2001 [46]) on a testing machine (SANS, Shenzhen, China) at a loading displacement rate of 0.5 mm/min. The compressive strength of each coal-grout composite specimen was determined from the peak (maximum) load. To improve reliability, each test was repeated three times, and the average UCS was taken as the final strength of the composite. The resulting dataset was then used for the ML models.
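Since UCS is the peak axial load divided by the specimen's cross-sectional area, the reduction from raw test records to dataset values can be sketched as below. The 50 mm diameter comes from the specimen preparation description; the function names and the peak loads in the example are hypothetical, not measured data.

```python
import math

SPECIMEN_DIAMETER_MM = 50.0  # cored cylinders: 50 mm diameter, 100 mm height


def ucs_mpa(peak_load_kn, diameter_mm=SPECIMEN_DIAMETER_MM):
    """UCS in MPa: peak load (N) / cross-sectional area (mm^2); 1 N/mm^2 = 1 MPa."""
    area_mm2 = math.pi * (diameter_mm / 2.0) ** 2
    return peak_load_kn * 1000.0 / area_mm2


def mean_ucs(peak_loads_kn):
    """Average UCS over replicate specimens tested under the same conditions."""
    values = [ucs_mpa(p) for p in peak_loads_kn]
    return sum(values) / len(values)


# Hypothetical triplicate peak loads (kN) for one test condition
print(round(mean_ucs([9.5, 10.0, 10.5]), 2))  # 5.09
```

With a 50 mm core the loaded area is about 1963.5 mm², so each kilonewton of peak load corresponds to roughly 0.51 MPa.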

Dataset Partition
For supervised regression problems, an algorithm must be both trained and tested. Therefore, a training set and a test set were constructed from the UCS dataset of the coal-grout composites. The former was used for training the algorithms and tuning the hyper-parameters; the latter was used for testing the trained regression models. A training set that is too small may cause under-fitting, whereas one that is too large leaves too few test samples to reveal over-fitting. Moreover, a sufficiently large test set is required to verify the generalization capability of the trained algorithms [47]. In this paper, as described above, the key influencing parameters (grout type, coal-grout ratio, and curing time) were used as inputs, and UCS was the output. MATLAB was used for data processing. The whole database was split into a training dataset (70%) and a test dataset (30%) following [48].
Figure 4 shows the UCS results of the coal-grout composites. The 7-day UCS of the coal-MP 364 composite is higher than that of the other chemical grout composites with the same coal-grout ratio. Among the cementitious grout composites with the same coal-grout ratio and curing time, the coal-SF-C composites had the highest UCS, followed by coal-P.O 42.5, while coal-P.O 32.5 had the lowest. The UCS of the coal-grout composites also decreased with an increasing coal-grout ratio. A possible reason is that the coal particles are bonded together by the grout, so the bonding weakens as the grout proportion decreases [49]. Furthermore, curing time affected the cement-grout and chemical-grout composites differently. The UCS of the coal-chemical grout composites increased, peaked on the seventh day, and then declined. The early increase is due to the rapid chemical reaction of the chemical grout; however, obvious shrinkage of the chemical grout was observed after the seventh day, especially in the MP 364 group (Figure 5a).
The resulting porous structure reduced the overall strength of the chemical grout composites. For the coal-cement grout composites, the UCS increased with curing time at a decreasing rate, which agrees well with previous studies [50][51][52]. In addition, their average early strength (1-day strength) was lower than the 4 h strength of the coal-chemical grout composites. In terms of microstructure, no clear shrinkage gap between coal and cement was observed (Figure 5b), indicating that cementitious grout performs better than chemical grout in the long term.
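The 70%/30% partition described in the dataset partition above can be sketched as a single seeded shuffle followed by a cut. This is an illustration in Python rather than the authors' MATLAB procedure; the function name, the seed, and flooring the training share are assumptions, and splitting 405 integer indices below is only an example, not a statement of how many records the paper's dataset contained.

```python
import random


def train_test_split(samples, train_frac=0.7, seed=0):
    """Shuffle once and split into (train, test); the training share is floored."""
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be in (0, 1)")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


train, test = train_test_split(range(405))
print(len(train), len(test))  # 283 122
```

Fixing the seed makes the split reproducible, so every algorithm is trained and tested on exactly the same partition, which keeps the model comparison fair.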

Results of Hyper-Parameters Tuning
The effectiveness of BAS in tuning hyper-parameters was analyzed by tracing the MSE value at each iteration. Figure 6 shows the average MSE curves versus iteration during hyper-parameter tuning. The ML algorithms show different evolution patterns in terms of MSE values and convergence rates. KNN and SVM took longer to converge than the other three algorithms, and KNN performed the worst, with the largest MSE after convergence. In contrast, the MSE of DT decreased sharply during BAS tuning and reached the lowest value among the algorithms, indicating that BAS tuned DT most efficiently. Overall, BAS found suitable hyper-parameters for these models efficiently and reliably. The tuned hyper-parameters are listed in Table 4.

Comparison of Integrated ML Algorithms
As discussed above, six ML algorithms were employed to examine their predictive performance on the dataset. Figure 7 shows the predictive performance of the six ML algorithms in terms of the most commonly used statistical measures, MSE and R². BPNN, DT, and SVM performed better than the other models, and DT slightly outperformed BPNN and SVM, with a marginally smaller MSE and a slightly higher R². The worst-performing algorithm was LR, with the highest MSE (6.1202) and the lowest R² (0.3089), indicating that this baseline algorithm is not suitable for modeling the UCS data. The differences between the predicted and actual values in the test dataset are compared using boxplots (Figure 8), which conveniently display the distribution of data through their quartiles [53]. The differences for DT were grouped the most tightly (a comparatively short box), slightly better than BPNN and SVM, although no outliers were observed for SVM. This indicates that these three algorithms predicted the UCS values of the composites accurately overall. For RF, KNN, and LR, the boxes are comparatively tall, with larger interquartile ranges and more outliers, indicating that the predicted UCS values were more scattered. LR performed the worst, with the highest median difference and the largest interquartile range.
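The quantities read off a boxplot of prediction differences can also be computed directly. The sketch below uses Python's `statistics.quantiles` (default exclusive method) for the quartiles and flags outliers with Tukey's conventional 1.5 × IQR rule; the source does not state which quartile method or whisker rule its plotting software used, so both are assumptions, and the example values are hypothetical.

```python
import statistics


def boxplot_summary(y_true, y_pred, whisker=1.5):
    """Median, IQR, and Tukey outliers of the differences (predicted - actual)."""
    diffs = [p - t for t, p in zip(y_true, y_pred)]
    q1, median, q3 = statistics.quantiles(diffs, n=4)
    iqr = q3 - q1
    # Points beyond the whisker limits are drawn as individual outliers
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    outliers = [d for d in diffs if d < low or d > high]
    return {"median": median, "iqr": iqr, "outliers": outliers}


# Hypothetical differences: seven moderate errors and one gross miss
actual = [0.0] * 8
predicted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0]
print(boxplot_summary(actual, predicted))
# {'median': 4.5, 'iqr': 4.5, 'outliers': [100.0]}
```

A short box (small IQR) centered near zero is what a good model should show; a tall box or a median far from zero signals scattered or biased predictions.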
How realistically each ML algorithm reproduces the measurements can be shown graphically with a Taylor diagram [54]. The Taylor diagram in Figure 9 summarizes how well the six ML algorithms reproduce the relationship between the predicted and observed UCS. The distance between each algorithm and the point labeled "measured" indicates how realistically that algorithm reproduces the measured values. DT, SVM, and BPNN lie nearest the point marked "measured", with comparatively high correlation and low RMSE, demonstrating that these three algorithms agree well with the actual values. LR has a low pattern correlation, while KNN shows larger variations than the actual values. Notably, although RF has about the same correlation coefficient as SVM, DT, and BPNN, its standard deviation is much larger, resulting in a larger RMSE.
To compare the ML algorithms more comprehensively, additional statistics, including mean absolute error (MAE), mean absolute percentage error (MAPE), minimum iteration, and time cost, were also introduced in this study. A radar chart is a good choice for visualizing data with this many variables. It consists of a series of equi-angular spokes radiating from the center, each representing one statistical measure. The spoke length is proportional to the magnitude of the measure normalized from 0 (worst) to 1 (best) [55]; hence, a better model encloses a worse one. Figure 10 shows that DT, SVM, and BPNN performed the best overall and enclosed RF, KNN, and LR, although SVM scored slightly worse on iteration count, indicating that it required more time and effort during modeling. Among the poorer models, LR was nearly enclosed by KNN owing to lower scores on several metrics, including MSE, MAE, R, R², and difference median.
Among these poorer models, RF performed better on all statistics except minimum iteration, but several of its scores, such as MSE, MAE, MAPE, and difference median, were not high enough for a good prediction model.
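The observation that RF has a correlation similar to DT, SVM, and BPNN but a larger RMSE follows from the geometry of the Taylor diagram: the centered RMS difference E' between predictions and observations satisfies E'^2 = σ_p^2 + σ_o^2 - 2σ_pσ_oR, so for a fixed R, a predicted standard deviation σ_p far from the observed σ_o increases E'. The sketch below computes these Taylor statistics (population standard deviations) and checks the identity numerically; the function name and data are hypothetical.

```python
import math


def taylor_stats(obs, pred):
    """Return (sigma_obs, sigma_pred, R, centered RMS difference) for a Taylor diagram."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    sigma_o = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    sigma_p = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    r = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / (n * sigma_o * sigma_p)
    # Centered RMS difference: RMS of the differences after removing each mean
    e_c = math.sqrt(sum(((p - mp) - (o - mo)) ** 2 for o, p in zip(obs, pred)) / n)
    return sigma_o, sigma_p, r, e_c


obs = [2.0, 4.0, 6.0, 8.0]
cases = {"tight": [2.2, 3.9, 6.1, 7.8], "inflated": [0.0, 4.0, 8.0, 12.0]}
for name, pred in cases.items():
    so, sp, r, e = taylor_stats(obs, pred)
    # Law-of-cosines identity behind the Taylor diagram
    assert math.isclose(e ** 2, so ** 2 + sp ** 2 - 2 * so * sp * r)
    print(name, round(r, 3), round(e, 3))
```

The "inflated" predictions correlate perfectly with the observations (R = 1) yet have twice the observed spread, so their centered RMS difference is much larger than that of the "tight" predictions, which is the same pattern described for RF.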

Analysis of the Variable Importance
The variable importance was calculated through a global sensitivity analysis of the evolutionary SVM model, as shown in Figure 11. Grout type has the most significant influence on the strength of the coal-grout composites, with an influence percentage of 47.6%, followed by curing time (32.8%), while the coal-grout ratio is the least sensitive variable (19.6%). Note that these importance scores were derived from the dataset used in this paper; more accurate results can be obtained if more data samples are added to the dataset in the future.
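The global sensitivity analysis procedure behind Figure 11 is not detailed in this section. As a stand-in illustration only, the sketch below uses a different, widely used technique, permutation importance: each input column is shuffled in turn, the resulting increase in MSE is recorded, and the increases are normalized to percentages that sum to 100%, the same presentation as Figure 11. The function name, the toy model, and the data are hypothetical.

```python
import random


def permutation_importance_pct(model, X, y, seed=0):
    """Percentage importance of each input from the MSE increase when it is shuffled.

    model: callable mapping one feature row (list) to a prediction.
    X: list of feature rows (lists); y: list of targets.
    """
    rng = random.Random(seed)

    def mse(rows):
        return sum((model(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    increases = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the target
        permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        increases.append(max(mse(permuted) - baseline, 0.0))
    total = sum(increases) or 1.0
    return [100.0 * inc / total for inc in increases]


def toy_model(row):
    """Hypothetical model: strong effect of x0, weak effect of x1, none of x2."""
    return 3.0 * row[0] + 1.0 * row[1]


data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [toy_model(row) for row in X]
print([round(p, 1) for p in permutation_importance_pct(toy_model, X, y)])
```

Like the scores in Figure 11, such percentages depend on the dataset and model they are computed from, and they rank inputs rather than quantify causal effects.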

Conclusions
This study systematically applied six hybrid ML algorithms to predict the strength of coal-grout composites, with BAS used to tune their hyper-parameters. A total of 405 specimens covering different curing times, grout types, and coal-grout ratios were tested to construct the dataset used to train and test the six models. The optimal algorithms were selected by assessing different statistical metrics. The main conclusions are as follows. First, coal-chemical grout composites had higher strength in the short term owing to the rapid chemical reaction; however, their long-term strength decreased because the chemical binders shrank considerably. In comparison, the coal-cementitious grout composites achieved stable and high strength in the long term. Second, BAS found the optimal hyper-parameters of the algorithms effectively. Lastly, the optimized DT, BPNN, and SVM models achieved better predictive performance on the test dataset.
In the future, a larger dataset with more variables can be used to further improve the generalization of the presented models. In engineering projects, the strength of real field coal-grout columns could be predicted by these models, which would reduce costs and improve efficiency. This work can support jet grouting formulation design for coal mass improvement in practical applications.
Funding: The study is supported by the program "the Fundamental Research Funds for the Central Universities (2020ZDPYZD02)."
Acknowledgments: The authors thank the reviewers for their comments and the editor for their work. Special thanks to Zuqi Wang for her encouragement and help.

Conflicts of Interest:
The authors declare no conflict of interest.