Development of Prediction Models for Shear Strength of Rockfill Material Using Machine Learning Techniques

Abstract: Supervised machine learning and its algorithms are a developing trend in the prediction of rockfill material (RFM) mechanical properties. This study investigates four supervised learning algorithms, namely the support vector machine (SVM), random forest (RF), AdaBoost, and k-nearest neighbor (KNN), for the prediction of RFM shear strength. A total of 165 RFM case studies with 13 key material properties for rockfill characterization were used to construct and validate the models. The performance of the SVM, RF, AdaBoost, and KNN models is assessed using statistical parameters, including the coefficient of determination (R²), the Nash–Sutcliffe efficiency (NSE) coefficient, the root mean square error (RMSE), and the ratio of the RMSE to the standard deviation of the measured data (RSR). The applications of the abovementioned models for predicting the shear strength of RFM are compared and discussed. The analysis of R² together with NSE, RMSE, and RSR for the RFM shear strength data set demonstrates that the SVM achieved the best prediction performance (R² = 0.9655, NSE = 0.9639, RMSE = 0.1135, and RSR = 0.1899), followed by the RF model (R² = 0.9545, NSE = 0.9542, RMSE = 0.1279, and RSR = 0.2140), the AdaBoost model (R² = 0.9390, NSE = 0.9388, RMSE = 0.1478, and RSR = 0.2474), and the KNN model (R² = 0.6233, NSE = 0.6180, RMSE = 0.3693, and RSR = 0.6181). Furthermore, the sensitivity analysis shows that normal stress was the key parameter affecting the shear strength of RFM.


Introduction
Rockfill materials (RFMs) are commonly used as fill in civil engineering projects such as rockfill dams, slopes, and embankments. The material is obtained either from a river's alluvial deposits or by blasting available rock [1,2]. RFMs are widely used in the construction of rockfill dams to impound river water because of their inherent flexibility, capacity to absorb large seismic energy, and adaptability to various foundation conditions. The behavior of RFMs used in rockfill dams is important for the safe and cost-effective construction of these structures. Generally, rockfill behaves like a Mohr-Coulomb material, albeit without cohesion and with relatively high internal friction angles. Crushed rockfill, loosely layered, can behave like coarse sand. The shear strength of both types of RFM is affected by many factors, such as mineral composition, surface structure, particle size, shape, relative density, and individual particle strength [3][4][5]. Because of the variable jointing, angularity/roundness, and rock particle size distribution, RFM can be considered a most complex material [6]. Extensive field and laboratory research is therefore essential for understanding RFM behavior and determining its shear strength parameters so that safe and cost-effective structures can be designed. An in situ direct shear system has been used to monitor the shear strength of RFM, as well as the variation in shear strength along the fill lift [7]. Linero [8] carried out large-scale shear resistance experiments to simulate the material's original grain size distribution and the expected load level. However, RFM with very large particles (maximum particle size of 1200 mm) cannot be accommodated in laboratory testing [9].
Because particle sizes must be limited to suit the test apparatus, designing representative, realistic large-scale strength tests is difficult. Furthermore, determining the shear strength of RFM directly is a costly and difficult process. Large-scale shear tests are often time-consuming and complex, and estimating the nonlinear shear strength function without an analytical method is difficult. As a result, several researchers have attempted to determine the mechanical properties of RFM using indirect methods based on machine learning (ML) techniques.
In recent years, several researchers have used ML algorithms and achieved notable success in civil engineering and other sectors, such as environmental [10], geotechnical [11][12][13][14][15][16][17][18], and other fields of science [19][20][21][22][23][24][25][26][27][28]. Numerous researchers have documented the behavior of RFM. Marsal [3], Mirachi et al. [4], Venkatachalam [5], Gupta [29], Abbas [30], and Honkanadavar and Sharma [31] carried out laboratory experiments on different rockfill materials and concluded that the stress-strain behavior is nonlinear, inelastic, and dependent on the stress level. They also noted that the angle of internal friction increases with maximum particle size for riverbed rockfill material, whereas the reverse pattern is observed for quarried rockfill material. Frossard et al. [32] proposed a rational approach for assessing rockfill shear strength on the basis of size effects; Honkanadavar and Gupta [9] developed a power law relating the shear strength parameter to some index properties of riverbed RFM. The complexity of the mechanical behavior of rockfill materials and the challenges of large-scale strength tests have motivated several approaches to modeling the behavior of such soils. In this context, the artificial neural network (ANN) approach used by Kaunda [33] needs fewer rockfill parameters and was found to be efficient in predicting RFM shear strength. Zhou et al. [34] recently used cubist and random forest regression algorithms and found that both can deliver better RFM shear strength predictions than ANN and conventional regression models. This field, however, continues to be explored. Considering that large-scale strength tests to characterize the shear strength are challenging, ML models based on the support vector machine (SVM), random forest (RF), AdaBoost, and k-nearest neighbor (KNN) algorithms are proposed here.
Furthermore, the ML algorithms SVM, RF, AdaBoost, and KNN have demonstrated excellent prediction efficiency in a variety of fields [35][36][37][38][39] because of their generalization capability. However, literature surveys show that their application in civil engineering, and in particular to the prediction of RFM shear strength, remains limited.
The main intention of the present study is to explore the capability of the SVM, RF, AdaBoost, and KNN algorithms to establish a more precise and parsimonious behavioral model for predicting RFM shear strength. A critical review of the existing literature suggests that, despite the successful implementation of these techniques in various domains, their use in the prediction of RFM shear strength is scarcely explored. A notable feature of this study is that the division of the data into training and testing sets has been made with due regard to statistical aspects such as the maximum, minimum, mean, and standard deviation. The data sets are split to determine the predictive capability and generalization performance of the established models and later to evaluate them better. Additionally, a sensitivity analysis is carried out to find the main parameter influencing RFM shear strength. Concisely, the present study investigates and expands the scope of machine learning algorithms for the development of RFM shear strength models, providing theoretical support for researchers in selecting optimal machine learning algorithms to improve the predictive performance of RFM shear strength.
The rest of this article is structured as follows: The next section introduces the description of the used database and preliminaries of the algorithms used in the proposed approach and discusses the model evaluation metrics. Development of SVM, RF, AdaBoost, and KNN models are described in Section 3. Section 4 is dedicated to the performances and comparison of proposed models. Finally, Section 5 draws conclusions and outlines promising directions for future work.

Data Set
In this study, 165 samples of rockfill material (RFM) shear strength case histories compiled by Kaunda [33], presented in Table A1 in Appendix A, were used to develop and evaluate the effectiveness of the proposed models. The RFM shear strength data are summarized in Table 1, where D10, D30, D60, and D90 correspond to the 10%, 30%, 60%, and 90% passing sieve sizes; Cu and Cc refer to the coefficients of uniformity and curvature, respectively; FM and GM describe the fineness modulus and gradation modulus, respectively; R represents the ISRM hardness rating; UCSmin and UCSmax (MPa) indicate the minimum and maximum uniaxial compression strengths; γ is the dry unit weight (kN/m³); σn is the normal stress (MPa); and τ is the shear strength of RFM (MPa), the output variable. The output parameter selected to determine shear strength was the shear stress at failure of the test samples and was the single output variable. The database was divided into two sets consisting of 80 percent (132 cases) and 20 percent (33 cases) of the data, used as the training and testing sets, respectively. The testing set was used to determine when training should be stopped in order to avoid overfitting. To achieve a consistent data split, different combinations of training and testing sets were examined, such that the maximum (Max), minimum (Min), mean, and standard deviation of the parameters were consistent between the training and testing data sets (Table 2).

Support Vector Machine
Boser, Guyon, and Vapnik were the first to formulate and introduce the support vector machine (SVM) [40]. In the case of non-separable data, slack variables ξi are introduced to accommodate errors for certain objects i, and the soft-margin problem becomes

min (1/2)‖δ‖² + C Σi ξi, (1)

where C is the penalty parameter; δ and b are, respectively, the normal vector and the bias of the hyperplane; and each ξi is the distance between object i and the respective margin hyperplane [41,42]. Data are implicitly mapped to a higher-dimensional space through Mercer kernels, which can be expressed as a dot product, to learn nonlinearly separable functions [42]. The widely used radial basis function (RBF) kernel is

K(xi, xj) = exp(−‖xi − xj‖²/(2σ²)), (2)

where σ is the kernel parameter.
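As a minimal, hedged sketch (using scikit-learn's SVR rather than the Orange workflow described later in this paper; the data are synthetic stand-ins for the 13 RFM predictors), an ε-SVR with the RBF kernel can be fitted as follows:

```python
# Sketch only: epsilon-SVR with an RBF kernel on synthetic data standing in
# for the 13 RFM predictors; C is the soft-margin penalty parameter.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((100, 13))                                  # 100 samples, 13 predictors
y = X @ rng.random(13) + 0.05 * rng.standard_normal(100)   # synthetic target

svr = SVR(kernel="rbf", C=10.0, gamma="scale", epsilon=0.01)
svr.fit(X, y)
tau_hat = svr.predict(X[:5])   # predicted "shear strength" for 5 samples
```

The C and epsilon values above are illustrative, not the tuned values from this study.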

Random Forest
The use of a large series of low-dimensional regression trees is the basis of the random forest (RF). The theoretical development of RF is described by Breiman [43]. RF is an example of ensemble learning, which requires a large number of decision trees to be built. In general, there are two types of decision trees: regression trees and classification trees. Regression trees were used in the RF model since the main goal of this analysis was to predict the shear strength of RFM. Figure 1 depicts the general architecture of RF analysis. The procedure can be divided into two stages:
Stage 1: To create a sequence of sub-data sets, the bootstrap statistical technique is used to sample randomly from the initial data set (training data). The forest is then built from regression trees based on these sub-data sets. Each tree is trained by choosing a set of variables at random (a fixed number of descriptive variables selected from the random subset). Two important parameters that can be adjusted during the training stage are the number of trees (ntree) and the number of variables (mtry).
Stage 2: Once the model has been trained, a prediction can be made. In the ensemble approach, the input variables are evaluated by all regression trees, and the final output is the average of the individual tree predictions.
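The two stages above can be sketched with scikit-learn (an illustrative stand-in for the Orange implementation; n_estimators plays the role of ntree and max_features the role of mtry, and the data are synthetic):

```python
# Sketch only: RandomForestRegressor with n_estimators as ntree and
# max_features as mtry; synthetic data stands in for the RFM database.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.random((150, 13))
y = 2.0 * X[:, 0] + X[:, 1] + 0.05 * rng.standard_normal(150)

rf = RandomForestRegressor(n_estimators=200, max_features=4, random_state=1)
rf.fit(X, y)                   # Stage 1: bootstrap samples + per-split random features
pred = rf.predict(X)           # Stage 2: average over the 200 tree predictions
```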



AdaBoost Algorithm
The sequential ensemble technique AdaBoost, or adaptive boosting, is based on the concept of developing many weak learners using different training sub-sets drawn at random from the original training data set. Weights are allocated during each training session and are used in learning each hypothesis. The weights are used to calculate the hypothesis error on the data set and measure the relative importance of each instance. After each iteration, the weights are recalculated so that instances predicted wrongly by the previous hypothesis obtain higher weights. This allows the algorithm to concentrate on instances that are harder to predict. The algorithm's most important task is to assign updated weights to wrongly predicted instances. In regression, each instance yields a real-valued error; the AdaBoost technique marks the calculated error as an error or not by comparing it to a predefined threshold prediction error. Instances with greater errors under previous learners are more likely (i.e., have a higher probability) to be chosen for training the next base learner. Finally, an ensemble estimate of the individual base learner predictions is made using a weighted average or median [44].
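A hedged sketch of this boosting scheme, using scikit-learn's AdaBoost.R2-style regressor with shallow trees as the weak learners (synthetic data; hyperparameter values are illustrative):

```python
# Sketch only: AdaBoost regression with shallow trees as weak learners;
# poorly fit samples receive higher sampling weight in each round.
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.random((150, 13))
y = X[:, 0] + 0.5 * X[:, 1]

ada = AdaBoostRegressor(DecisionTreeRegressor(max_depth=3),
                        n_estimators=100, learning_rate=0.5, random_state=2)
ada.fit(X, y)   # final prediction is a weighted median of the weak learners
```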

k-Nearest Neighbor
The supervised ML algorithm k-nearest neighbor (KNN) can be used to solve both classification and regression problems, although it is most commonly used for classification [45]. In regression problems, the prediction for a query point is obtained from the k training samples nearest to it in the feature space. When KNN is used as a regression algorithm, the output is the property value for the object, computed as the mean of the values of its k nearest neighbors. To locate the k nearest neighbors of a data point, a metric such as the Euclidean or Mahalanobis distance can be used [46,47].
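This averaging rule can be sketched as follows (an illustrative scikit-learn stand-in on synthetic data):

```python
# Sketch only: KNN regression returns the mean target value of the k
# nearest training points under the Euclidean metric.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)
X = rng.random((120, 13))
y = X.sum(axis=1)

knn = KNeighborsRegressor(n_neighbors=5, metric="euclidean")
knn.fit(X, y)
pred = knn.predict(X[:1])   # mean of the 5 nearest neighbours' y values
```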

Performance Metric
The coefficient of determination (R²), Nash–Sutcliffe efficiency (NSE) coefficient, root mean square error (RMSE), and the ratio of the RMSE to the standard deviation of the measured data (RSR) were used to examine the predictive capacity of the models, as shown in Equations (3)-(6) [48][49][50]:

R² = [Σi (Oi − Ō)(Pi − P̄)]² / [Σi (Oi − Ō)² Σi (Pi − P̄)²] (3)

NSE = 1 − Σi (Oi − Pi)² / Σi (Oi − Ō)² (4)

RMSE = √[(1/n) Σi (Oi − Pi)²] (5)

RSR = RMSE / √[(1/n) Σi (Oi − Ō)²] (6)

where n is the number of observations under consideration, Oi is the ith observed value, Ō is the mean observed value, Pi is the ith model-predicted value, and P̄ is the mean model-predicted value. R², also called the determination coefficient, describes the degree of fit; its normal range is 0-1, and a model is considered efficient if R² is greater than 0.8 and close to 1 [51]. The NSE is a normalized statistic that compares the residual variance with the variance of the measured data [52]. The NSE varies between −∞ and 1, where NSE = 1 represents a perfect match between observed and predicted values. Model predictive performance with 0.75 < NSE ≤ 1.00, 0.65 < NSE ≤ 0.75, 0.50 < NSE ≤ 0.65, 0.40 < NSE ≤ 0.50, or NSE ≤ 0.40 is graded as very good, good, satisfactory, acceptable, or unsatisfactory, respectively [53,54]. The RMSE is the square root of the mean squared deviation between observed and predicted values over the n observations. The RMSE is greater than or equal to 0, where 0 indicates a statistically perfect fit to the observed data [55][56][57]. The RSR is the ratio of the RMSE to the standard deviation of the measured data. It varies between an optimal value of 0 and a large positive value; a lower RSR corresponds to a lower RMSE and hence greater predictive efficiency. RSR classification ranges of 0.00 ≤ RSR ≤ 0.50, 0.50 < RSR ≤ 0.60, 0.60 < RSR ≤ 0.70, and RSR > 0.70 are described as very good, good, acceptable, and unacceptable, respectively [53].
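The four metrics can be transcribed directly into code (R² taken here as the squared Pearson correlation, matching the symbols listed above):

```python
# Direct numerical transcription of the four performance metrics:
# O = observed values, P = model-predicted values.
import numpy as np

def r2(O, P):
    O, P = np.asarray(O, float), np.asarray(P, float)
    cov = np.sum((O - O.mean()) * (P - P.mean()))
    return cov**2 / (np.sum((O - O.mean())**2) * np.sum((P - P.mean())**2))

def nse(O, P):
    O, P = np.asarray(O, float), np.asarray(P, float)
    return 1.0 - np.sum((O - P)**2) / np.sum((O - O.mean())**2)

def rmse(O, P):
    O, P = np.asarray(O, float), np.asarray(P, float)
    return float(np.sqrt(np.mean((O - P)**2)))

def rsr(O, P):
    # RMSE divided by the standard deviation of the observed data
    return rmse(O, P) / float(np.std(np.asarray(O, float)))
```

For a perfect prediction, R² = NSE = 1 and RMSE = RSR = 0, as the definitions require.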

Model Development to Predict RFM Shear Strength
The models for RFM shear strength prediction were developed using Orange, a popular open-source environment for statistical computing and data visualization. All data processing was carried out with Orange (version 3.13), which provides the most prevalent supervised learning algorithms. More information about the input parameters, implementation, and references can be found in the package documentation.
The structure of the model was based on an input matrix of predictor variables, x = {D10, D30, D60, D90, Cc, Cu, GM, FM, R, UCSmin, UCSmax, γ, σn}, and the output, also called the target variable (y), was the RFM shear strength. In every modeling process, achieving a consistent data division and an appropriate size for the training and testing data sets is the most important task. Statistical features such as the minimum, maximum, mean, and standard deviation of the data sets were therefore taken into account in the splitting process. Statistical consistency between the training and testing data sets optimizes the performance of the models and ultimately helps to evaluate them better. The models were built on 132 data sets and tested on the remaining 33. To assess predictive performance fairly, the same testing data set was used for all models.
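The consistency-checked 132/33 split described above can be sketched as follows (synthetic values stand in for the 165-case database, and the mean-based tolerance is an assumption for illustration):

```python
# Sketch only: repeat random 132/33 splits until the column means of the
# training and testing sets agree within a tolerance.
import numpy as np

data = np.random.default_rng(42).random((165, 14))   # 13 inputs + 1 output

for seed in range(100):
    perm = np.random.default_rng(seed).permutation(165)
    train, test = data[perm[:132]], data[perm[132:]]
    if np.allclose(train.mean(axis=0), test.mean(axis=0), atol=0.1):
        break   # accept this split: summary statistics are consistent
```

A fuller version would also compare the minimum, maximum, and standard deviation of each column, as done in Table 2.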
To optimize the RFM shear strength prediction, all the models (AdaBoost, RF, SVM, and KNN) were tuned by trial and error: initial values were chosen for the tuning parameters and gradually varied until the best fitness measures were achieved. Figure 2 shows the schematic diagram of the proposed methodology. The optimization aims to find the parameters of AdaBoost, RF, SVM, and KNN that give the best prediction accuracy. The critical hyperparameters tuned in this study, together with their definitions, are listed in Table 3.
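The trial-and-error loop can be cast as a small grid search (a hedged sketch over SVR hyperparameters; the grid values are illustrative, not the paper's tuned values):

```python
# Sketch only: grid search over illustrative SVR hyperparameter values,
# scored by cross-validated R2 on synthetic stand-in data.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = rng.random((132, 13))
y = X @ rng.random(13)

grid = {"C": [1.0, 10.0, 100.0], "epsilon": [0.01, 0.1]}
search = GridSearchCV(SVR(kernel="rbf"), grid, cv=5, scoring="r2")
search.fit(X, y)
best = search.best_params_   # parameters with the best cross-validated R2
```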

Results and Discussion
In this study, R², the NSE coefficient, the RMSE, and the ratio of the RMSE to the standard deviation of the measured data (RSR) are chosen as the criteria for assessing model output. The database is split into a training data set and a testing data set to evaluate the performance of the presented models. To make a fair comparison, all the models are developed and applied on the same RFM shear strength training and testing data sets.

Figure 4 plots the predicted RFM shear strength against the actual RFM shear strength data. On the test data set, all models demonstrated very good predictive potential (R² > 0.8) with the exception of KNN, which displayed somewhat worse results (R² = 0.6304). The R² results show that the SVM, RF, and AdaBoost models are all appropriate, with the SVM model performing best with the highest R² value (0.9656), followed by the RF (0.9181) and AdaBoost (0.8951) models. In comparison with the other models, the KNN model produced the worst estimates with the greatest dispersion (Figure 4). In addition, ranking the models by NSE from highest to lowest predictive strength gives SVM (0.9654) > RF (0.9164) > AdaBoost (0.8835) > KNN (0.6076), mirroring the R² ranking.
With regard to the RMSE, the SVM model also had the highest predictive ability, having the lowest RMSE (0.0153), followed by the RF (0.0797), AdaBoost (0.0941), and KNN (0.1727) models.
Finally, the reliability of all applied models was divided into four groups based on RSR values: unsatisfactory, satisfactory, good, and very good, with ranges of RSR > 0.70, 0.60 < RSR ≤ 0.70, 0.50 < RSR ≤ 0.60, and 0.00 ≤ RSR ≤ 0.50, respectively. The RSR values therefore demonstrate very good results for all the established models except KNN, whose performance is considered satisfactory. Figure 5 depicts bar graphs comparing the R², NSE, RMSE, and RSR for the training and testing data sets of all the models. R² quantifies the degree of co-linearity between the predicted and actual data; the RMSE weights large errors more heavily than small ones; and a lower RSR indicates a lower RMSE and thus better predictive efficiency. The SVM model has high R² and NSE and lower RMSE and RSR values, revealing that it is preferable for predicting the RFM shear strength on the testing data. The SVM achieved a better prediction performance (R² = 0.9655, RMSE = 0.0513, and mean absolute error (MAE) = 0.0184) than the cubist method (R² = 0.9645, RMSE = 0.0975, and MAE = 0.0644) and the ANN method (R² = 0.9386, RMSE = 0.1320, and MAE = 0.0841) reported by Zhou et al. [34] and Kaunda [33], respectively, for the test data. Additionally, the accuracy of the linear regression model between measured and calculated shear strength values reported by Andjelkovic et al. [58] (R² = 0.836) was lower than that of the proposed SVM model. In general, the generalization and reliability of the SVM algorithm are good, and larger data sets can yield better prediction results.
In the present research, a sensitivity analysis was also conducted using Yang and Zang's [59] method to evaluate the influence of the input parameters on RFM shear strength. This approach has been used in several studies [60][61][62][63] and is formulated as

rij = Σm (yim × yom) / √(Σm yim² × Σm yom²), (7)

where the sums run over m = 1, …, n, n is the number of data values (132 in this study), and yim and yom are the input and output parameters, respectively. The rij value ranges from zero to one for each input parameter, and the highest rij value indicates the input with the strongest influence on the output (RFM shear strength in this study). The rij values for all input parameters are presented in Figure 6, which shows that σn has the highest value, rij = 0.990. Similar sensitivity analyses of RFM shear strength were also carried out by Kaunda [33] and Zhou et al. [34]. Their findings likewise identified normal stress as the most sensitive factor, in agreement with the present results.
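The rij measure described above is a cosine similarity between an input column and the output column, and can be sketched as:

```python
# Sketch only: rij sensitivity measure = cosine similarity between an input
# parameter column (y_i) and the output column (y_o).
import numpy as np

def r_ij(y_i, y_o):
    y_i, y_o = np.asarray(y_i, float), np.asarray(y_o, float)
    return float(np.sum(y_i * y_o) /
                 np.sqrt(np.sum(y_i**2) * np.sum(y_o**2)))
```

For non-negative data, rij lies between 0 and 1, reaching 1 when the input is proportional to the output.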
Although the proposed models produce desirable prediction results, certain limitations should be addressed in the future.
(1) Like other machine learning methods, the SVM, RF, AdaBoost, and KNN models are sensitive to the size and quality of the data set; generally, a small data set affects the generalization and reliability of a model. The SVM, RF, and AdaBoost algorithms nevertheless performed well with a limited data set of 165 cases, unlike KNN, and prediction performance could improve on a larger data set. Furthermore, the developed models can always be updated to yield better results as new data become available.
(2) Other qualitative indicators, such as the Los Angeles abrasion value and lithology, may also influence the prediction of RFM shear strength. Analyzing the influence of these indicators is therefore important for further improving performance.


Conclusions
This study employed and examined the SVM, RF, AdaBoost, and KNN algorithms for the RFM shear strength prediction problem. To construct and validate models based on these algorithms, a comprehensive database containing 165 RFM case studies was collected from the available literature. Thirteen predictive variables for rockfill characterization were selected as the input variables: D10 (mm), D30 (mm), D60 (mm), D90 (mm), Cc, Cu, GM, FM, R, UCSmin (MPa), γ (kN/m³), UCSmax (MPa), and σn (MPa). The predictive performance of the proposed models was verified and compared. The conclusions can be outlined as follows:

1.
In this study, the SVM model (R² = 0.9656, NSE = 0.9654, RMSE = 0.0153, and RSR = 0.1861) achieved a higher level of prediction efficiency than the RF (R² = 0.9181, NSE = 0.9164, RMSE = 0.0797, and RSR = 0.2891), AdaBoost (R² = 0.8951, NSE = 0.8835, RMSE = 0.0941, and RSR = 0.3414), and KNN (R² = 0.6304, NSE = 0.6076, RMSE = 0.1727, and RSR = 0.6264) models on the test data set. As the same methodology (the same training and test data sets) was used for structuring all models, the SVM model delivered the best performance in this respect, implying that this algorithm is robust in comparison with the others for RFM shear strength prediction.

2.
The performance (in terms of R²) on the test data set for the SVM, RF, and AdaBoost algorithms falls in the range 0.8951-0.9656 across the three models with 13 input variables. The results indicate that it is rational and feasible to estimate the shear strength of RFM from the gradation, particle size, dry unit weight (γ), material hardness, FM, and normal stress (σn).

3.
Sensitivity analysis results revealed that normal stress (σn) was the key parameter affecting the shear strength of RFM.
The findings show that the SVM model is a useful and accurate artificial intelligence technique for predicting RFM shear strength and can be applied in various fields. Furthermore, to improve the generalization of the proposed approach, more experimental data should be collected in future research. Finally, RFM shear strength prediction using advanced machine learning algorithms (e.g., deep learning) is left as a future research topic.

Data Availability Statement:
The data used to support the findings of this study are included within the article.

Conflicts of Interest:
The authors declare no conflict of interest.