New Prediction Model of Rock Cerchar Abrasivity Index Based on Gene Expression Programming

Sun, Jingdong; Fan, Xiaohua; Wang, Hao; Shang, Yong; Sun, Chaoyang

doi:10.3390/app152010901

by Jingdong Sun ¹, Xiaohua Fan ², Hao Wang ³, Yong Shang ¹,³ and Chaoyang Sun ¹,*

¹ School of Mechanical Engineering, University of Science and Technology Beijing, 30 Xueyuan Road, Haidian District, Beijing 100083, China

² Huaneng Xizang the Yarlung Zangbo River Hydropower Development Investment Co., Ltd., Lhasa 850000, China

³ China Railway Engineering Equipment Group Tunnel Equipment Manufacturing Co., Ltd., Xinxiang 453000, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(20), 10901; https://doi.org/10.3390/app152010901

Submission received: 1 September 2025 / Revised: 26 September 2025 / Accepted: 2 October 2025 / Published: 10 October 2025

(This article belongs to the Special Issue New Insights into Digital Rock Physics)

Abstract

In recent years, the rapid development of underground engineering projects has driven a significant increase in the variety and quantity of excavation equipment. The wear of excavation tools significantly increases construction costs and reduces construction efficiency. The wear rate of excavation tools is closely related to the abrasiveness of the rock. The Cerchar abrasivity index (CAI) is the most widely used index for estimating rock abrasiveness. The primary objective of this paper is to develop a novel prediction model for CAI, which is established based on the mechanical properties and petrographic parameters of rocks. These parameters include uniaxial compressive strength, Brazilian splitting strength, quartz content, equivalent quartz content, average quartz size, brittleness indices, rock abrasive index, and Schimazek’s F-abrasiveness. Correlation analysis was used to conduct a preliminary analysis between CAI and single-influence parameters. The results indicated that a single factor is not suitable for directly predicting CAI. In addition, multiple linear regression (MLR) and a non-linear algorithm, gene expression programming (GEP), were used to establish new prediction models for CAI. A statistical comparison was conducted between the prediction accuracy of the GEP-based model and the MLR-based model. In comparison to the MLR-based model, the GEP-based model demonstrates higher accuracy in predicting CAI.

Keywords: Cerchar abrasivity index; multiple linear regression; gene expression programming; rock properties; abrasivity

1. Introduction

With the growing demand for underground space construction projects such as metro tunnels, water conveyance tunnels, and oil extraction tunnels, the types and quantities of tunnel boring machines (TBMs), rock drilling tools, and sawing equipment increase year by year [1,2,3]. During construction, a large number of rock-breaking tools are consumed by wear [4], which significantly increases construction costs and reduces construction efficiency. Tool wear depends in part on the tool material and operating conditions, but the most important factor is the abrasiveness of the rock. Rock abrasiveness is defined as the inherent capacity of the rock to cause material loss in a contacting material [5]. In tunneling and mining engineering, the Cerchar abrasivity index (CAI) test [6], the gouging abrasion test [7], the Laboratoire des Ponts et Chaussées (LCPC) abrasivity test [5], and the Norwegian University of Science and Technology (NTNU) test [8] are four commonly used tests to evaluate rock abrasivity. Among them, the CAI test is the most widely used because its procedure is simple, fast, and low-cost [9].

The CAI test was developed in the 1970s by the Cerchar Institute for evaluating the abrasiveness of coal-bearing rocks [10]. Two types of test apparatus are in use today: the original test device and the West test apparatus [5]. The test consists of sliding a stylus (HRC 55 ± 1) with a 90° conical tip across the rock sample surface for 10 mm under a constant load of 70 N. The diameter of the worn stylus tip is then measured, and the CAI value of the rock sample is 10 times the wear flat diameter [11]. With the promotion and application of the CAI test, scholars have conducted a number of studies on CAI. Some scholars focus on the impact of the test specification on CAI. Yaralı et al. [10] measured CAI values at different scratch lengths and suggested that a 15 mm scratch length is suitable for the CAI test. Aydm [12] analyzed the effect of test parameters on CAI and indicated that the rock sample surface conditions and the hardness of the stylus significantly influence the value of CAI. Michalakopoulos et al. [13] studied in detail the effect of the stylus hardness on CAI and gave a conversion formula between CAI values obtained with different stylus hardnesses. Zhang et al. [5] investigated the relationships between CAI and several abrasive parameters, such as test force and scratching specific energy, based on a new test device. Other scholars explored the relationship between CAI and the intrinsic properties of rock [14]. Ko et al. [15] used single and multiple linear regression (MLR) analysis to study the relationship between CAI and rock geomechanical properties. The results showed that a single parameter alone is not suitable for predicting CAI and that quartz content has little effect on CAI. Yaralı et al. [16] studied the relationships between CAI and petrographic properties. They found that the quartz content and cement type affect the CAI, and that CAI correlates strongly with average quartz grain size. 
Alber et al. [17] proposed a novel test device to study the effect of in situ stresses and demonstrated that the CAI is stress-dependent. Moradizadeh et al. [18] analyzed the effect of the equivalent quartz content (EQC), point load index, and water absorption on CAI with bivariate and multivariate regression methods. The results showed that the main influencing factors differ between rock types: point load index and EQC for sandstone, and EQC alone for igneous rocks. Er et al. [19] studied the empirical relationships between petrographical characteristics and CAI with regression analysis. Zhang et al. [1] indicated that the EQC and uniaxial compressive strength (UCS) significantly affect CAI based on univariate regression analysis results. Most of the aforementioned studies rely primarily on regression analysis methods, which fail to deeply explore the intrinsic relationship between CAI and the inherent parameters of rocks.

With the development of machine learning (ML), various algorithms have been proposed for analyzing the relationships between parameters, such as neural networks, ant colony algorithms, and genetic algorithms. Compared with traditional regression analysis, these algorithms generally describe the relationships between parameters with higher prediction accuracy. Perez et al. [8] developed an artificial intelligence (AI) model to estimate CAI. The results showed that the acoustic emission, UCS, quartz content (QC), Young’s modulus, and pin hardness are the optimum input parameters for predicting CAI. Elbaz et al. [20] developed a hybrid model for predicting the life of disc cutters by coupling a group method of data handling (GMDH)-type neural network (NN) and a genetic algorithm (GA), which has high prediction accuracy. Qi et al. [21] proposed an abrasive index prediction model, and the model was optimized with metaheuristic optimization algorithms (MOAs). The results reveal that the quartz content significantly impacts the abrasive index. Houshmand et al. [22] utilized supervised ML models to predict rock hardness from rock geophysical and geochemical features. Geng et al. [23] compared the accuracy of disc cutter force prediction between four ML models and theoretical and empirical prediction formulas. The results indicated that the four ML models consistently achieved high prediction accuracy. Shin et al. [24] proposed an ensemble model to forecast disc cutter wear, which combines the advantages of random forest (RF) and extreme gradient boosting (XGB). Tripathy et al. [25] described an artificial neural network (ANN) prediction methodology to evaluate the CAI using geomechanical properties as input parameters and confirmed the effectiveness of the ANN-based model. Onifade et al. [26] developed a reliable model which combines a shallow neural network (SNN) and a deep neural network (DNN) to predict the abrasive index of coal samples. 
The results indicated that the ML method demonstrated superior prediction performance, and that the ash content is the most important parameter compared to volatile matter, fixed carbon, and calorific value. Although the abovementioned studies have achieved good prediction accuracy using ML, they do not provide an explicit, simple prediction equation, which would be more convenient to use in the mining industry and in excavation projects.

To obtain a multi-parameter, explicit, and accurate model for predicting the value of CAI, this paper developed a new mathematical prediction model based on gene expression programming (GEP). A comprehensive database of 81 typical samples covering various rock types was collected. Ten rock parameters, namely UCS, Brazilian splitting strength, quartz content, EQC, average quartz size, three brittleness indices, rock abrasive index, and Schimazek’s F-abrasiveness, were selected as input features to estimate CAI. The database is divided into a training group and a testing group for iterative model training and for validating the prediction accuracy. In addition, the MLR method was also applied to the database to propose MLR-based models. The prediction accuracy of the different models was compared based on multiple evaluation metrics.

2. Database Creation and Statistics Analysis

2.1. Database Creation

Many studies have focused on the relationship between CAI and rock properties. They measured CAI and various rock properties parameters, and the summary results are shown in Table 1. According to the results of previous studies, the rock performance parameters that affect the value of CAI are mainly divided into two categories. The first category is the mechanical property parameters of rocks, such as UCS, Brazilian splitting strength (BTS), direct shear strength, and point load index. Compared to other mechanical parameters, UCS and BTS are widely studied and show good correlation with CAI in specific rocks [9,15]. Therefore, they are selected as mechanical input parameters in this study. The other category is the petrographic properties of rocks, such as QC, EQC, and average quartz size (D). A large number of scholars have shown that petrographic parameters significantly affect the value of CAI [16,18,19]. Hence, QC, EQC, and D are selected as petrographic input parameters for predicting CAI. According to the selected input parameters, 81 rock datasets containing all necessary input parameters were selected from the literature, as shown in Appendix A. It is worth noting that the dataset includes different rock types. The different rock types have different mineral compositions, quartz sizes, and mechanical properties. Among the input parameters selected above, EQC and QC can reflect the influence of mineral composition, while UCS and BTS can capture the impact of mechanical properties, and D is able to represent the influence of quartz size. Hence, these input parameters can reflect the influence of rock type to a certain extent, and it is reasonable to treat different rock types as a single group.

In addition, some rock property indices can be calculated from the above input parameters, such as brittleness index, rock abrasive index (RAI), and Schimazek’s F-abrasiveness (SF-a). Brittleness is usually considered an important geomechanical parameter of rock, which significantly influences the CAI in some rock types [15]. There is no standardized calculation formula for rock brittleness [15]. Based on the UCS and BTS of rock, Hucka [36] and Altindag [15] proposed three different formulas to reflect the brittleness index of rock. The three brittleness indices B₁, B₂, and B₃ are calculated as follows:

B_{1} = \frac{σ_{c}}{σ_{t}},

(1)

B_{2} = \frac{σ_{c} - σ_{t}}{σ_{c} + σ_{t}},

(2)

B_{3} = \sqrt{\frac{σ_{c} σ_{t}}{2}},

(3)

where σ_{c} is the uniaxial compressive strength, and σ_{t} is the Brazilian tensile strength.
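Equations (1)–(3) can be evaluated directly from a UCS and BTS pair. The following is a minimal sketch; the function name `brittleness_indices` and the argument names are hypothetical and are not taken from the paper.

```python
import math


def brittleness_indices(ucs_mpa: float, bts_mpa: float) -> tuple[float, float, float]:
    """Return (B1, B2, B3) as written in Equations (1)-(3).

    ucs_mpa is sigma_c and bts_mpa is sigma_t, both assumed to be in MPa.
    """
    if ucs_mpa <= 0 or bts_mpa <= 0:
        raise ValueError("UCS and BTS must be positive")
    b1 = ucs_mpa / bts_mpa                          # Equation (1)
    b2 = (ucs_mpa - bts_mpa) / (ucs_mpa + bts_mpa)  # Equation (2)
    b3 = math.sqrt(ucs_mpa * bts_mpa / 2.0)         # Equation (3)
    return b1, b2, b3
```

For an illustrative (not measured) rock with UCS = 100 MPa and BTS = 10 MPa, `brittleness_indices(100.0, 10.0)` gives B₁ = 10, B₂ = 90/110, and B₃ = √500.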

Similarly, the RAI [1] and SF-a [29] can preliminarily reflect the abrasiveness of rocks. Both parameters combine the influence of mineral content and mechanical parameters, and are defined as follows:

\mathrm{RAI} = \frac{\mathrm{EQC} \times \mathrm{UCS}}{100},

(4)

\text{SF-a} = \frac{\mathrm{EQC} \times D \times \mathrm{BTS}}{100}.

(5)

These derived parameters preliminarily consider the effects of mechanical parameters and petrographic properties, which are intrinsically related to CAI [1,15,29]. Using derived parameters as input parameters during GEP research can decrease the complexity of the proposed model [37]. Hence, it is reasonable to use these derived parameters as input parameters to predict the value of CAI.
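As a small illustration of Equations (4) and (5), the two derived abrasiveness parameters can be computed as below. This is only a sketch: the function names are hypothetical, and the units (EQC in %, UCS and BTS in MPa, D in mm) are assumed from the ranges reported for the database.

```python
def rock_abrasive_index(eqc_percent: float, ucs_mpa: float) -> float:
    """RAI from Equation (4): EQC x UCS / 100."""
    return eqc_percent * ucs_mpa / 100.0


def schimazek_f_abrasiveness(eqc_percent: float, d_mm: float, bts_mpa: float) -> float:
    """SF-a from Equation (5): EQC x D x BTS / 100."""
    return eqc_percent * d_mm * bts_mpa / 100.0
```

With illustrative inputs EQC = 40%, UCS = 100 MPa, D = 0.5 mm, and BTS = 10 MPa, these return RAI = 40 and SF-a = 2.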

2.2. Statistics Analysis

The value of CAI represents the abrasiveness of rocks. A high CAI value indicates that the rock is highly abrasive, which leads to more rapid wear of excavation tools. A low CAI value indicates that the rock is less abrasive, which leads to a longer tool life. According to the classification proposed by the International Society for Rock Mechanics (ISRM) [14,33], the rock abrasiveness is divided into seven levels based on the value of CAI.

The quantity distribution of collected data on 81 rocks in different abrasiveness levels is shown in Figure 1. As can be seen from Figure 1, when the CAI value is between 0.1 and 0.4, the rock abrasiveness is at an extremely low level, and the collected rock data does not include extremely low abrasive rock. In addition, 7 of the collected rocks have very low abrasivity (8.6%), 26 rocks have low-abrasivity (32.1%), 18 rocks have medium abrasivity (22.2%), 13 rocks have high abrasivity (16%), 6 rocks have very high abrasivity (7.4%), and 11 rocks are extremely highly abrasive (13.6%). Except for the extremely low abrasiveness level, the rock data collected basically covers different abrasiveness levels.
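The percentages quoted above follow from the class counts and the total of 81 samples. A short check, using only the counts stated in the text (the dictionary keys are shorthand labels chosen here):

```python
# Number of samples per abrasiveness level, as stated in the text.
counts = {
    "very low": 7,
    "low": 26,
    "medium": 18,
    "high": 13,
    "very high": 6,
    "extremely high": 11,
}

total = sum(counts.values())

# Share of each level in percent, rounded to one decimal place.
shares = {level: round(100.0 * n / total, 1) for level, n in counts.items()}
```

The counts sum to 81, and the rounded shares reproduce the reported values (for example 32.1% for the low-abrasivity class).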

As mentioned in Section 2.1, the CAI is the output parameter used to evaluate the abrasiveness of rock. The UCS, BTS, QC, EQC, and D, which were measured in the literature, are assigned as input parameters. The brittleness indices, RAI, and SF-a, which are calculated from the mechanical and petrographic properties, are also assigned as input parameters. The statistical properties of the database samples are summarized in Table 2. In the database samples, the value of rock UCS ranges from 11.04 MPa to 313.2 MPa, the BTS ranges from 0.48 MPa to 18.65 MPa, the QC ranges from 0.01% to 85%, the EQC ranges from 2.03% to 89.7%, and the average quartz size ranges from 0 mm to 2.5 mm.

The Pearson correlation coefficient reflects the linear relationship [24] between two parameters, where 1 represents a complete positive correlation, −1 represents a complete negative correlation, and 0 indicates that there is no linear relationship between the two parameters. The Pearson correlation coefficient r can be calculated as follows:

r = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2}} \sqrt{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}},

(6)

where n is the sample size of the database, x_{i} and y_{i} are the i-th sample values of the two parameters, and \bar{x} and \bar{y} denote the mean values of the two parameters.
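Equation (6) translates almost line by line into code. The sketch below uses only the standard library; the function name `pearson_r` is an illustrative choice, not something defined in the paper.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient following Equation (6)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations.
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one sample is constant")
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))
```

A perfectly increasing linear pair such as `[1, 2, 3, 4]` and `[2, 4, 6, 8]` gives r = 1, and reversing the second list gives r = −1.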

To make a preliminary assessment of the correlation between input parameters, Pearson correlation analysis was applied to each pair of input parameters, as shown in Figure 2, where p is the value of statistical significance. The Pearson correlation matrix indicates that the value of r for most parameter pairs lies between −0.5 and 0.5 [21]. Hence, most parameters are only weakly linearly correlated and carry little redundant information. It is worth noting that the UCS was significantly positively correlated with the BTS (r = 0.66, p < 0.01), and the QC was significantly positively correlated with the EQC (r = 0.94, p < 0.01). Although there are strong linear relationships between UCS and BTS and between QC and EQC, previous studies have shown that their effects on CAI vary depending on the type of rock [1,15,29].

The linear relationships and 95% fitting confidence bands between CAI and the original input parameters are depicted in Figure 3. The coefficients of determination R² between CAI and QC, UCS, BTS, EQC, and D are 0.028, 0.16, 0.31, 0.004, and 0.24, respectively. The results show that the UCS, BTS, and D are positively correlated with CAI. As the strength increases, the hard particles within the rock are less likely to spall off during the CAI test, resulting in two-body abrasive wear between the stylus and the rock surface. When the strength is low, the detached hard particles can act as three-body abrasive particles. These particles roll between the stylus and the rock, thereby reducing both the relative slip and the CAI value. Likewise, an increase in the average quartz size produces larger and deeper scratches, leading to an increase in CAI. The results also indicate that there is no linear correlation between CAI and either EQC or QC.

The linear relationships and 95% fitting confidence bands between CAI and the calculated input parameters are shown in Figure 4. The results show that B₃, SF-a, and RAI are positively correlated with CAI. The indices RAI and SF-a have both been proposed to describe the abrasiveness of rocks. Hence, the positive correlation of RAI and SF-a with CAI is reasonable. In contrast, B₁ and B₂ are negatively correlated with CAI. As can be seen from Figure 4a–c, the coefficients of determination R² between CAI and the three brittleness indices B₁, B₂, and B₃ are 0.016, 0.021, and 0.26, respectively. Among the brittleness indices, B₃ has the best linear correlation with CAI. Figure 4d,e show that the coefficients of determination R² of CAI with RAI and SF-a are 0.103 and 0.023, respectively.

Overall, UCS, BTS, D, B₃, and RAI had the best predictive effects and were positively correlated with CAI, with coefficients of determination R² of 0.16, 0.31, 0.24, 0.26, and 0.10, respectively. The greatest value of R² between CAI and the original input parameters is 0.31, and the greatest value of R² between CAI and the derived input parameters is 0.26. Hence, it is not suitable to predict CAI with a single input parameter [15].

3. Gene Expression Programming

3.1. Overview of GEP Algorithm

Drawing on the concepts of natural selection and genetics, Ferreira [38] proposed the GEP algorithm as an extension of genetic programming (GP). GEP is a type of genetic algorithm that uses artificial intelligence to automatically generate mathematical models [39]. It screens the population based on individual fitness and applies genetic operators to modify genes, ultimately finding the optimal individual. The population individuals in GEP are chromosomes of fixed length, and the chromosomes are composed of multiple genes (characters representing corresponding input parameters and operators) [40]. The schematic diagram of the GEP algorithm operation process is shown in Figure 5. First, a chromosome population is randomly generated based on the given input parameters and operator symbols. Each chromosome in the initial population is expressed as a corresponding mathematical model using a parse tree. The fitness of each individual is then calculated with a fitness function. Next, individuals of the current generation are selected for inheritance using the roulette wheel method, and the inherited individuals are modified through genetic operators such as mutation, recombination, and transposition. The individuals with the highest fitness are retained and combined with the inherited and mutated individuals to form the next-generation chromosome population. This process of inheritance and mutation removes individuals with low fitness and drives the population towards high fitness. Finally, the algorithm terminates when the termination condition is met.
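The roulette wheel step mentioned above picks individuals with probability proportional to their fitness. The following is a minimal sketch of that idea using Python's standard `random.choices`; the function name `roulette_select` and the toy population are illustrative assumptions, and the paper's own GEP implementation may differ in detail.

```python
import random


def roulette_select(population: list, fitnesses: list[float], k: int,
                    rng: random.Random) -> list:
    """Draw k individuals with probability proportional to fitness."""
    if len(population) != len(fitnesses):
        raise ValueError("population and fitnesses must have the same length")
    if any(f < 0 for f in fitnesses) or sum(fitnesses) <= 0:
        raise ValueError("fitness values must be non-negative with a positive sum")
    # random.choices samples with replacement using the given weights.
    return rng.choices(population, weights=fitnesses, k=k)
```

For example, `roulette_select(["a", "b", "c"], [0.0, 1.0, 3.0], 100, random.Random(0))` never returns `"a"` (zero fitness) and returns `"c"` about three times as often as `"b"` on average.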

The population individual expression method based on the parse tree is shown in Figure 6. The parse tree expresses the mathematical model contained in the individual in a structured and hierarchical way. The original chromosome gene of an individual is +.sqrt.ln.−.+.×.b.c.2.a.c, and the parse tree is shown in Figure 6a. The pink genes in the figure represent mathematical operation symbols, and the yellow genes represent input parameters. After gene mutation, the mutated chromosome genome becomes +.sqrt.ln./.a.×.b.c.2.a.c, and the parse tree is shown in Figure 6b. The red genes in the figure represent mutated genes, and the gray genes represent genes that are not expressed. After the gene mutation, gene c and gene 2 become recessive genes. The mathematical model expressed by the chromosome individual gene before and after the mutation is shown in Equation (7).

\sqrt{a \times c - b} + \ln (c + 2) \Rightarrow \sqrt{a \times c / b} + \ln (a)

(7)
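To illustrate how a linear gene maps to an expression, the sketch below decodes the pre-mutation chromosome under the assumption that symbols are read breadth-first (the Karva convention of GEP), with each operator taking its arguments from the next unread symbols. The function name `evaluate_karva`, the ASCII symbols `-` and `*` (standing in for − and ×), and the sample values of a, b, and c are illustrative choices, not part of the paper.

```python
import math

# Number of arguments each function symbol takes; all other symbols are terminals.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "sqrt": 1, "ln": 1}


def evaluate_karva(symbols: list[str], variables: dict[str, float]) -> float:
    """Evaluate a gene read breadth-first; unused trailing symbols are ignored."""
    children: dict[int, list[int]] = {}
    next_free = 1  # index of the next symbol not yet attached to a parent
    i = 0
    while i < next_free:
        if i >= len(symbols):
            raise ValueError("gene is too short to close the expression")
        n_args = ARITY.get(symbols[i], 0)
        children[i] = list(range(next_free, next_free + n_args))
        next_free += n_args
        i += 1

    def value(idx: int) -> float:
        sym = symbols[idx]
        args = [value(j) for j in children[idx]]
        if sym == "+":
            return args[0] + args[1]
        if sym == "-":
            return args[0] - args[1]
        if sym == "*":
            return args[0] * args[1]
        if sym == "/":
            return args[0] / args[1]
        if sym == "sqrt":
            return math.sqrt(args[0])
        if sym == "ln":
            return math.log(args[0])
        if sym in variables:
            return variables[sym]
        return float(sym)  # numeric constant such as "2"

    return value(0)


# Pre-mutation chromosome from the text: +.sqrt.ln.-.+.*.b.c.2.a.c
ORIGINAL_GENE = ["+", "sqrt", "ln", "-", "+", "*", "b", "c", "2", "a", "c"]
```

With `{"a": 3.0, "b": 2.0, "c": 4.0}`, `evaluate_karva(ORIGINAL_GENE, ...)` returns the same value as √(a × c − b) + ln(c + 2), the left-hand side of Equation (7).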

3.2. GEP Algorithm Parameters

During the GEP process, the dataset needs to be divided into a training set and a test set. The training set is used to learn and iterate the mathematical model using GEP. The test set is used to validate the mathematical model obtained from the training set, which helps to detect overfitting during GEP iterations. For the machine learning process, the optimal ratio between the test set and the training set is 0.4 [40]. Therefore, this paper randomly selected 80% of the data (65 data points) as the training set and the remaining 20% of the data (16 data points) as the test set.
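A random 80/20 partition of 81 samples into 65 training and 16 test points can be sketched as follows. The function name `split_indices` and the seed value are arbitrary illustrative choices; the paper does not state how its random split was generated.

```python
import random


def split_indices(n_samples: int, train_fraction: float,
                  seed: int) -> tuple[list[int], list[int]]:
    """Shuffle sample indices and split them into training and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(81, 0.8, seed=0)
```

Here `round(81 * 0.8)` is 65, so the split yields 65 training indices and 16 test indices with no overlap.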

Previous studies have shown that hyperparameters such as the number of genes per chromosome and the gene length can affect the prediction accuracy of the evolved models [37]. In this study, a trial-and-error approach [37] was used to investigate the impact of these hyperparameters on the model’s performance. When the number of genes per chromosome exceeded 3 and the gene length exceeded 13, the optimal fitness of the GEP model after iteration did not improve significantly. The addition operator (+) was used to link the different genes of a chromosome [37,38]. Furthermore, the algorithm allows genes to randomly select integers from the range (−10, 10) as numerical constants, which expands the model’s search space [37]. The aim of this research is to propose a conceptual prediction model. Hence, in addition to the regular function symbols, exponential and trigonometric functions are also included in the function set [37]. Table 3 shows the parameters and hyperparameters used in the GEP.

3.3. Fitness Evaluation Method

In this study, the GEP algorithm was used to evaluate the relationship between CAI and the mechanical parameters and petrographic properties. To obtain the best mathematical prediction model, the coefficient of determination R² was used as the fitness measure of each individual during the training stage. To further assess the model’s prediction accuracy, the root mean square error (RMSE), mean absolute percentage error (MAPE), and mean absolute error (MAE) were also used to evaluate the model’s prediction results. The four metrics are calculated as follows:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(x_{i} - y_{i})}^{2}}{\sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2}},

(8)

\mathrm{RMSE} = \sqrt{\frac{\sum_{i = 1}^{n} {(x_{i} - y_{i})}^{2}}{n}},

(9)

\mathrm{MAPE} = \frac{1}{n} \left[ \frac{\sum_{i = 1}^{n} |x_{i} - y_{i}|}{\sum_{i = 1}^{n} x_{i}} \cdot 100 \right],

(10)

\mathrm{MAE} = \frac{\sum_{i = 1}^{n} |x_{i} - y_{i}|}{n},

(11)

where n is the sample size of the database, x_{i} is the experimental data, y_{i} is the predicted value, and \bar{x} denotes the mean value of the experimental data.
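The four metrics in Equations (8)–(11) can be computed together. The sketch below implements MAPE exactly as Equation (10) is written, which differs from the more common per-sample definition; the function name `regression_metrics` is a hypothetical choice.

```python
import math


def regression_metrics(measured: list[float], predicted: list[float]) -> dict[str, float]:
    """Return R2, RMSE, MAPE, and MAE following Equations (8)-(11)."""
    n = len(measured)
    if n == 0 or n != len(predicted):
        raise ValueError("need two equally long, non-empty samples")
    mean_x = sum(measured) / n
    ss_res = sum((x - y) ** 2 for x, y in zip(measured, predicted))
    ss_tot = sum((x - mean_x) ** 2 for x in measured)
    abs_err = sum(abs(x - y) for x, y in zip(measured, predicted))
    if ss_tot == 0 or sum(measured) == 0:
        raise ValueError("metrics are undefined for constant or zero-sum data")
    return {
        "R2": 1.0 - ss_res / ss_tot,                       # Equation (8)
        "RMSE": math.sqrt(ss_res / n),                     # Equation (9)
        "MAPE": (abs_err / sum(measured) * 100.0) / n,     # Equation (10) as written
        "MAE": abs_err / n,                                # Equation (11)
    }
```

For the toy data `measured = [1, 2, 3]` and `predicted = [1, 2, 4]`, this gives R² = 0.5, RMSE = √(1/3), MAE = 1/3, and MAPE = (1/6 × 100)/3.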

4. Results and Discussion

4.1. GEP-Based Model

During the iterative training of the GEP, the coefficient of determination R² is used to evaluate the accuracy of the mathematical model. Figure 7a shows the determination coefficient of the optimal mathematical model in each generation on the training sets. At the 1722nd generation, the R² reaches its maximum value and remains constant after this. Figure 7b–d show the prediction performance of the optimal model on the training set, test set, and all data, respectively. The results show that the coefficients of determination of the optimal model prediction results with the training set, test set, and all data are 0.911, 0.907, and 0.906, respectively. The R² values are all greater than 0.9, indicating that the model has good prediction performance. In addition, the coefficient of determination of the training set and the test set is not much different, indicating that the model is not overfitting.

Figure 8 shows the parse tree diagram of the optimal individual iterated by GEP. Each chromosome consists of three genes, and different genes are connected by the ‘+’ function. The corresponding mathematical model is as follows:

\mathrm{CAI} = B_{2} + D + \exp (B_{2}) + \tan (D) + \frac{\mathrm{EQC} + B_{3} - \mathrm{BTS} - \mathrm{QC}}{B_{1} + \mathrm{QC} / B_{3}} - 3.

(12)
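Equation (12) can be evaluated from the five measured inputs, since B₁, B₂, and B₃ follow from UCS and BTS through Equations (1)–(3). The sketch below is illustrative only: the function name `cai_gep` is hypothetical, the units (MPa, %, mm) are assumed to match the database, and tan(D) is assumed to take D as a plain number in radians, which the text does not state explicitly.

```python
import math


def cai_gep(ucs_mpa: float, bts_mpa: float, qc_percent: float,
            eqc_percent: float, d_mm: float) -> float:
    """Evaluate the GEP-based model of Equation (12)."""
    # Brittleness indices from Equations (1)-(3).
    b1 = ucs_mpa / bts_mpa
    b2 = (ucs_mpa - bts_mpa) / (ucs_mpa + bts_mpa)
    b3 = math.sqrt(ucs_mpa * bts_mpa / 2.0)
    # Fractional term of Equation (12).
    ratio = (eqc_percent + b3 - bts_mpa - qc_percent) / (b1 + qc_percent / b3)
    return b2 + d_mm + math.exp(b2) + math.tan(d_mm) + ratio - 3.0
```

For an illustrative input (UCS = 100 MPa, BTS = 10 MPa, QC = 30%, EQC = 40%, D = 0.5 mm; not a sample from the database), the function returns a CAI of about 3.10.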

4.2. Multiple Linear Regression Model

The multiple linear regression method is adopted to analyze the relationships between CAI and the input parameters. Because this study selected a large number of independent variables, stepwise multiple linear regression was used for the analysis. Stepwise multiple linear regression is a statistical method for feature selection. During the analysis, independent variables are gradually added or deleted through an iterative process based on their impact on the dependent variable, thereby constructing a simple and effective regression model [1]. The general form of the multiple linear regression equation is as follows:

Y = α_{0} + α_{1} X_{1} + α_{2} X_{2} + \dots + α_{p} X_{p} + ε,

(13)

According to the principle of multiple linear regression, the CAI was set as the dependent variable, and the QC, UCS, BTS, EQC, D, B₁, B₂, B₃, RAI, and SF-a were used as independent parameters. The results of stepwise multiple linear regression are shown in Table 4 and Table 5.

The results of the analysis of variance (ANOVA) of the different models are shown in Table 4. The F hypothesis test is used to verify the overall utility of the models. The null hypothesis (H0) states that there is no correlation between the dependent variable and the independent variables of a model. The alternative hypothesis (Ha) is the opposite of the null hypothesis. When F(model) > F(critical), the null hypothesis is rejected and the alternative hypothesis is accepted. According to the F distribution table, the critical F values of the four models are F_0.05(1,79) = 3.96, F_0.05(2,78) = 3.114, F_0.05(3,77) = 2.723, and F_0.05(4,76) = 2.492, respectively. As can be seen from Table 5, F(model 1) > F(critical) = 3.96, F(model 2) > F(critical) = 3.114, F(model 3) > F(critical) = 2.723, and F(model 4) > F(critical) = 2.492. Hence, the alternative hypothesis is accepted, which indicates that a correlation exists between the independent and dependent variables in each model. In addition, the p values of the models were all less than the significance level of 0.05. This shows that at the 95% confidence level, all four models are statistically significant.
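The F-test decision can be illustrated with the standard identity F = (R²/p) / ((1 − R²)/(n − p − 1)), where p is the number of independent variables and n = 81. The sketch below applies it to the rounded R² values reported for the four MLR models (0.31, 0.58, 0.62, and 0.75); because those values are rounded, the resulting F statistics are approximations and are not the figures of the ANOVA table. The function name `f_statistic` is a hypothetical choice.

```python
def f_statistic(r2: float, n_samples: int, n_predictors: int) -> float:
    """Overall regression F statistic computed from R2."""
    df_model = n_predictors
    df_residual = n_samples - n_predictors - 1
    return (r2 / df_model) / ((1.0 - r2) / df_residual)


# (rounded R2, number of predictors, critical F at the 0.05 level) per model.
MODELS = [
    (0.31, 1, 3.96),
    (0.58, 2, 3.114),
    (0.62, 3, 2.723),
    (0.75, 4, 2.492),
]

# True where the approximate F exceeds the critical value (H0 rejected).
decisions = [f_statistic(r2, 81, p) > f_crit for r2, p, f_crit in MODELS]
```

For model 4, for instance, the approximate statistic is (0.75/4)/(0.25/76) = 57, far above the critical value of 2.492, and all four entries of `decisions` are `True`.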

Table 5 shows the regression coefficients of the independent variables in the four models. As can be seen in Table 5, the independent variables retained in all four models are initial input variables. The calculated input variables are all derived from the initial input variables and have no significant additional effect on CAI compared to the initial input variables [18]. Therefore, all derived input variables were excluded by the stepwise regression. Similarly, the UCS was excluded by the stepwise regression analysis. The correlation analysis between the UCS and BTS in Figure 2 shows a significant correlation between the two parameters. Rock surface damage in the CAI test is significantly affected by the BTS, so it is reasonable to keep the BTS in the model while excluding the UCS. The coefficients of determination R² of the four regression models were 0.31, 0.58, 0.62, and 0.75, respectively. The coefficient of determination increases as the number of independent variables increases. Model 4 has the largest R² and the best prediction accuracy. The four regression models are as follows:

\mathrm{CAI} = 0.19\,\mathrm{BTS} + 1.13,

(14)

\mathrm{CAI} = 0.2\,\mathrm{BTS} + 2.06 D + 0.26,

(15)

\mathrm{CAI} = 0.18\,\mathrm{BTS} + 2.37 D - 0.01\,\mathrm{QC} + 0.84,

(16)

\mathrm{CAI} = 0.16\,\mathrm{BTS} + 1.57 D - 0.07\,\mathrm{QC} + 0.07\,\mathrm{EQC} + 0.16.

(17)
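The four regression models in Equations (14)–(17) can be stored as coefficient tables and evaluated with one helper. This is a sketch under the assumption that BTS is in MPa, D in mm, and QC and EQC in %; the names `MLR_MODELS` and `cai_mlr` are hypothetical.

```python
# Coefficients of Equations (14)-(17); "const" is the intercept.
MLR_MODELS = {
    1: {"BTS": 0.19, "const": 1.13},
    2: {"BTS": 0.20, "D": 2.06, "const": 0.26},
    3: {"BTS": 0.18, "D": 2.37, "QC": -0.01, "const": 0.84},
    4: {"BTS": 0.16, "D": 1.57, "QC": -0.07, "EQC": 0.07, "const": 0.16},
}


def cai_mlr(model: int, inputs: dict[str, float]) -> float:
    """Evaluate one of the stepwise MLR models for the given inputs."""
    coefficients = MLR_MODELS[model]
    value = coefficients["const"]
    for name, weight in coefficients.items():
        if name != "const":
            value += weight * inputs[name]
    return value
```

For the illustrative input `{"BTS": 10.0, "D": 0.5, "QC": 30.0, "EQC": 40.0}` (not a database sample), model 1 gives 3.03 and model 4 gives 3.245.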

4.3. Models’ Goodness of Fit

The prediction performance of the GEP-based model was compared with that of the MLR-based models. The four performance indices R², RMSE, MAPE, and MAE were calculated for each model. The results are shown in Table 6.

For the GEP-based model, values of R², RMSE, MAPE, and MAE obtained are 0.906, 0.46, 0.18, and 0.37, respectively. For the MLR-based model 1, values of R², RMSE, MAPE, and MAE obtained are 0.31, 1.20, 0.48, and 1.00, respectively. For the MLR-based model 2, the values of R², RMSE, MAPE, and MAE obtained are 0.58, 0.94, 0.38, and 0.79, respectively. For the MLR-based model 3, the values of R², RMSE, MAPE, and MAE obtained are 0.62, 0.90, 0.36, and 0.75, respectively. For the MLR-based model 4, the values of R², RMSE, MAPE, and MAE obtained are 0.75, 0.73, 0.28, and 0.58, respectively. As can be seen from the table, among the MLR-based models, model 4 has higher prediction accuracy. Compared with the R², RMSE, MAPE, and MAE indices values in the MLR-based model 4, the values based on the GEP model changed by 0.16, −0.26, −0.10, and −0.21, respectively. The results indicated that the GEP-based model outperforms the MLR-based model in all indicators. Based on machine learning theory, the optimal mathematical model generated by the GEP algorithm has higher accuracy in predicting CAI. The displayed mathematical model is not only convenient for engineers to use directly in the field, but also facilitates further research by scholars.

4.4. Limitation

Compared with the multiple linear regression model, although the model proposed based on GEP can accurately predict CAI, it is not universal due to its limitations. The limitations of the prediction model developed in this paper are mainly reflected in the following aspects:

(1) Previous studies have shown that factors such as stylus hardness, rock surface, and testing equipment can affect the test value of CAI. The dataset in this paper contains CAI test results from different studies, which may have used different test programs. The data from different studies should be normalized appropriately.

(2) There are only 81 samples in the dataset, which limits the coverage of the data. In addition, the dataset covers different rock types. The dataset should either be divided according to rock type, or categorical variables related to lithology should be included in the model.

(3) Although using derived parameters as input parameters can reduce the complexity of the model, using both direct and derived parameters simultaneously can cause collinearity issues and may lead to biases in the analysis.

5. Conclusions

The Cerchar abrasivity index (CAI) is an important index for assessing rock abrasiveness. Accurately predicting CAI is crucial for reducing tool wear and estimating tool replacement time. This study proposed models for predicting CAI using the gene expression programming (GEP) algorithm and multiple linear regression (MLR). A total of 81 rock samples were collected from the literature. CAI was used as the output variable, and quartz content (QC), uniaxial compressive strength (UCS), Brazilian splitting strength (BTS), equivalent quartz content (EQC), average quartz size (D), brittleness indices, rock abrasive index (RAI), and Schimazek’s F-abrasiveness (SF-a) were used as input variables. After comparing the predictive performance of the different models, the following main conclusions were drawn:

(1) The predictive effect of each input parameter on CAI was studied using univariate linear regression. Among the parameters, UCS, BTS, D, B₃, and RAI had the best predictive effects and were positively correlated with CAI, with coefficients of determination R² of 0.16, 0.31, 0.24, 0.26, and 0.10, respectively. The highest R² obtained with a single parameter is only 0.31 (for BTS). Hence, a single input parameter is not suitable for predicting CAI.

(2) Four prediction models were proposed using stepwise multiple linear regression. The prediction accuracy of the models increased with the number of independent variables. The coefficient of determination of the optimal model was 0.75. The F-test results also showed a statistically significant relationship between the independent variables and CAI.

(3) The prediction performance of the GEP-based model and the MLR-based models was compared using four indicators. For the GEP-based model, R², RMSE, MAPE, and MAE are 0.906, 0.46, 0.18, and 0.37, respectively. For the optimal MLR-based model, R², RMSE, MAPE, and MAE are 0.75, 0.73, 0.28, and 0.58, respectively. The GEP-based model is therefore superior to the MLR-based models.

Author Contributions

J.S.: Original draft, Validation, Software, Data curation, Conceptualization. X.F.: Conceptualization, Investigation, Review and editing. H.W.: Project administration, Supervision, Methodology. Y.S.: Resource, Writing—review and editing. C.S.: Methodology, Conceptualization. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Key R&D Projects in Henan Province, grant number 251111222700. The APC was funded by China Railway Engineering Equipment Group Tunnel Equipment Manufacturing Co., Ltd.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors are grateful to Haifeng Jiang for his help with this paper.

Conflicts of Interest

Author Xiaohua Fan was employed by the company Huaneng Xizang the Yarlung Zangbo River Hydropower Development Investment Co., Ltd. Authors Hao Wang and Yong Shang were employed by the company China Railway Engineering Equipment Group Tunnel Equipment Manufacturing Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The authors declare that this study received funding from China Railway Engineering Equipment Group Tunnel Equipment Manufacturing Co., Ltd. The funder had the following involvement with the study: study design, collection, writing review.

Appendix A

Table A1. Datasets collected from the literature.

No.	Rock Type	QC/%	UCS/MPa	BTS/MPa	EQC/%	D/mm	B1	B2	B3/MPa	SF-a/N/mm	RAI	CAI
1	Granodiorite [19]	25.10	110.00	14.88	44.01	0.56	7.39	0.76	28.61	3.67	48.41	5.22
2	Monzogranite [19]	21.10	95.00	11.28	38.63	0.42	8.42	0.79	23.15	1.83	36.70	5.06
3	Monzogranite [19]	19.30	98.00	11.34	39.15	0.48	8.64	0.79	23.57	2.13	38.37	5.03
4	Monzogranite [19]	28.40	124.00	16.41	47.40	0.72	7.56	0.77	31.90	5.60	58.78	5.67
5	Monzogranite [19]	25.20	115.00	15.64	44.89	0.57	7.35	0.76	29.99	4.00	51.62	5.34
6	Granodiorite [19]	18.90	110.00	14.50	40.73	0.42	7.59	0.77	28.24	2.48	44.80	5.29
7	Monzonite [19]	6.80	105.00	12.19	30.87	0.35	8.61	0.79	25.30	1.32	32.41	5.12
8	Granodiorite [19]	21.70	94.00	11.25	42.61	0.45	8.36	0.79	22.99	2.16	40.06	5.10
9	Monzonite [19]	5.60	102.50	12.02	31.59	0.25	8.53	0.79	24.82	0.95	32.38	4.65
10	Monzogranite [19]	29.10	105.00	12.68	46.73	0.52	8.28	0.78	25.80	3.08	49.07	5.14
11	Granodiorite [19]	34.80	141.50	17.45	53.37	0.75	8.11	0.78	35.14	6.98	75.51	5.82
12	Monzonite [19]	5.20	95.00	10.64	26.29	0.16	8.93	0.80	22.48	0.45	24.97	5.07
13	Dolerite [29]	5.00	214.50	6.76	37.39	0.22	31.73	0.94	26.93	0.57	80.19	3.24
14	Dolerite [29]	7.00	199.30	9.82	40.96	0.23	20.30	0.91	31.28	0.91	81.63	3.54
15	Granite [29]	74.00	83.81	3.37	81.86	1.10	24.87	0.92	11.88	3.04	68.61	4.61
16	Granite [29]	73.00	231.99	18.65	82.06	0.39	12.44	0.85	46.51	5.91	190.38	3.59
17	Granite [29]	24.60	44.80	2.30	58.85	2.50	19.48	0.90	7.18	3.38	26.37	3.90
18	Migmatite [29]	70.01	56.76	2.27	79.58	1.22	25.00	0.92	8.03	2.19	45.17	4.32
19	Andesite [29]	10.01	231.46	14.07	36.42	0.18	16.46	0.89	40.35	0.92	84.30	3.53
20	Diorite [29]	20.01	171.20	5.30	49.16	0.48	32.30	0.94	21.30	1.25	84.16	3.65
21	Granite [29]	67.00	53.90	2.23	78.45	1.19	24.12	0.92	7.76	20.00	42.28	3.907
22	Phyllite [29]	50.00	54.33	4.10	53.95	0.14	13.25	0.86	10.55	22.00	29.31	1.433
23	Dolomite [29]	2.00	144.43	11.96	7.04	0.05	12.08	0.85	29.39	30.00	10.17	2.223
24	Sandstone [16]	75.00	85.54	7.93	75.09	0.58	10.79	0.83	18.42	3.45	64.23	2.45
25	Sandstone [16]	80.01	77.45	6.29	81.69	0.78	12.31	0.85	15.61	4.01	63.27	3.05
26	Sandstone [16]	70.01	87.36	8.71	70.40	0.36	10.03	0.82	19.51	2.21	61.50	1.60
27	Sandstone [16]	60.01	77.05	6.28	60.70	0.33	12.27	0.85	15.55	1.26	46.77	1.50
28	Sandstone [16]	55.00	116.20	8.60	56.01	0.38	13.51	0.86	22.35	1.83	65.08	1.50
29	Siltstone [16]	70.01	61.51	8.63	70.33	0.06	7.13	0.75	16.29	0.38	43.26	1.15
30	Siltstone [16]	30.01	73.20	8.20	31.02	0.06	8.93	0.80	17.32	0.15	22.71	1.00
31	Siltstone [16]	40.01	70.10	7.30	44.25	0.06	9.60	0.81	16.00	0.20	31.02	1.25
32	Siltstone [16]	50.01	62.50	7.18	52.80	0.05	8.70	0.79	14.98	0.19	33.00	0.80
33	Mudstone [16]	10.01	44.65	5.89	10.00	0.04	7.58	0.77	11.47	0.02	4.47	0.80
34	Mudstone [16]	10.01	45.86	5.89	10.00	0.04	7.79	0.77	11.62	0.02	4.59	0.70
35	Sandstone [16]	75.01	123.21	7.42	75.45	0.52	16.61	0.89	21.38	2.91	92.96	1.90
36	Sandstone [16]	70.01	103.40	6.74	71.03	0.54	15.34	0.88	18.67	2.59	73.45	2.00
37	Sandstone [16]	65.01	89.79	9.24	65.12	0.40	9.72	0.81	20.37	2.41	58.47	1.65
38	Sandstone [16]	45.01	78.65	7.84	51.90	0.40	10.03	0.82	17.56	1.63	40.82	1.72
39	Siltstone [16]	40.01	83.20	7.20	41.45	0.06	11.56	0.84	17.31	0.19	34.49	0.70
40	Sandstone [16]	80.01	76.33	8.32	80.35	0.92	9.17	0.80	17.82	6.15	61.33	2.92
41	Sandstone [16]	65.01	56.93	5.73	68.60	0.40	9.94	0.82	12.77	1.57	39.05	2.22
42	Sandstone [16]	65.01	96.40	8.20	68.47	0.42	11.76	0.84	19.88	2.36	66.01	1.50
43	Sandstone [16]	55.01	126.60	10.80	58.70	0.60	11.72	0.84	26.15	3.80	74.31	2.60
44	Sandstone [16]	50.01	66.92	8.70	53.72	0.55	7.69	0.77	17.06	2.57	35.95	2.30
45	Sandstone [16]	70.01	98.64	9.86	71.95	0.58	10.00	0.82	22.05	4.11	70.97	2.44
46	Siltstone [16]	40.01	58.31	7.03	42.45	0.05	8.29	0.78	14.32	0.15	24.75	0.50
47	Siltstone [16]	45.01	64.81	6.84	49.13	0.07	9.48	0.81	14.89	0.24	31.84	1.20
48	Sandstone [16]	40.01	72.14	6.21	42.29	0.25	11.62	0.84	14.97	0.66	30.51	1.10
49	Sandstone [16]	80.01	85.56	8.32	81.75	0.75	10.28	0.82	18.87	5.10	69.95	2.67
50	Siltstone [16]	45.01	56.37	6.05	49.05	0.07	9.32	0.81	13.06	0.21	27.65	0.55
51	Sandstone [16]	85.00	128.40	10.60	85.23	0.75	12.11	0.85	26.09	27.00	109.44	3.1
52	Siltstone [29]	22.01	57.88	9.02	36.31	0.15	6.42	0.73	16.15	0.50	21.02	2.22
53	Sandstone [29]	68.01	39.80	1.85	72.27	0.41	21.56	0.91	6.06	0.55	28.77	1.78
54	Sandstone [29]	67.01	41.55	0.48	77.51	0.24	86.56	0.98	3.16	0.09	32.20	0.62
55	Sandstone [29]	64.01	127.60	6.38	79.36	0.59	20.00	0.90	20.18	2.98	101.26	3.92
56	Sandstone [29]	78.01	26.73	1.45	84.26	0.39	18.47	0.90	4.40	0.48	22.52	1.41
57	Sandstone [29]	62.31	44.00	2.84	76.63	0.51	15.50	0.88	7.90	1.12	33.72	3.04
58	Sandstone [29]	70.11	109.73	6.03	89.70	0.72	18.20	0.90	18.19	3.87	98.43	3.30
59	Sandstone [29]	67.51	61.51	7.32	89.69	0.11	8.40	0.79	15.00	0.70	55.17	2.03
60	Sandstone [29]	55.51	11.04	1.31	69.31	0.41	8.43	0.79	2.69	0.38	7.65	1.43
61	Sandstone [29]	78.01	29.04	1.87	84.47	0.59	15.53	0.88	5.21	0.93	24.53	2.32
62	Sandstone [29]	75.01	16.69	0.70	76.91	0.45	23.84	0.92	2.42	0.24	12.84	1.39
63	Sandstone [29]	73.01	21.18	2.05	82.43	0.58	10.33	0.82	4.66	0.98	17.46	1.95
64	Sandstone [29]	55.01	27.09	1.61	64.62	0.24	16.82	0.89	4.67	0.25	17.50	1.62
65	Sandstone [29]	77.01	46.40	1.60	85.33	0.27	29.00	0.93	6.09	0.37	39.59	1.64
66	Sandstone [29]	72.51	17.07	0.86	83.90	0.43	19.85	0.90	2.71	0.31	14.32	1.26
67	Sandstone [29]	78.01	69.04	6.10	84.26	0.09	11.32	0.84	14.51	0.46	58.17	1.94
68	Dolomite [29]	1.01	61.84	6.54	6.01	0.05	9.46	0.81	14.22	0.02	3.71	2.12
69	Dolomite [29]	2.51	99.93	12.53	7.32	0.35	7.98	0.78	25.02	0.32	7.32	2.45
70	Dolomite [29]	10.01	132.70	6.65	17.77	0.18	19.95	0.90	21.01	0.21	23.57	2.50
71	Limestone [29]	0.01	95.78	4.60	2.60	0.00	20.80	0.91	14.85	0.00	2.49	1.10
72	Limestone [29]	0.01	80.70	5.62	3.44	0.00	14.36	0.87	15.06	0.00	2.77	1.48
73	Limestone [29]	0.01	66.45	5.39	2.03	0.00	12.33	0.85	13.38	0.00	1.35	0.96
74	Limestone [29]	0.01	92.75	7.89	2.22	0.00	11.75	0.84	19.13	0.00	2.06	1.16
75	Tuff [32]	70.00	313.20	16.00	84.34	0.47	19.58	0.90	50.06	5.00	264.15	3
76	Sandstone [32]	48.00	118.80	4.60	50.44	0.10	25.83	0.93	16.53	7.00	59.92	1.62
77	Mudstone [32]	8.00	22.50	1.60	12.75	0.06	14.06	0.87	4.24	10.00	2.87	1.39
78	Sandstone [32]	35.00	105.40	5.10	54.63	0.22	20.67	0.91	16.39	11.00	57.58	2.95
79	Sandstone [32]	25.00	206.70	10.23	28.62	0.09	20.21	0.91	32.52	13.00	59.16	2.43
80	Sandstone [32]	70.00	163.83	7.19	73.31	0.75	22.79	0.92	24.27	14.00	120.10	3.78
81	Sandstone [32]	15.00	119.46	3.56	17.93	0.12	33.56	0.94	14.58	16.00	21.42	1.75

References

1. Zhang, S.-R.; She, L.; Wang, C.; Wang, Y.-J.; Cao, R.-L.; Li, Y.-L.; Cao, K.-L. Investigation on the relationship among the Cerchar abrasivity index, drilling parameters and physical and mechanical properties of the rock. Tunn. Undergr. Space Technol. 2021, 112, 103907.
2. Sun, B.; Zhang, S.; Deng, M.; Wang, C. Nonlinear dynamic analysis and damage evaluation of hydraulic arched tunnels under mainshock–aftershock ground motion sequences. Tunn. Undergr. Space Technol. 2020, 98, 103321.
3. Abu Bakar, M.Z.; Majeed, Y.; Rostami, J. Effects of rock water content on Cerchar abrasivity index. Wear 2016, 368–369, 132–145.
4. Sun, J.; Wang, K.; Wei, J.; Shang, Y.; Sun, C.; Ma, F. A mechanics model of constant cross-section type disc cutter based on dense core forming mechanism. Tunn. Undergr. Space Technol. 2023, 140, 105301.
5. Zhang, G.; Konietzky, H.; Song, Z.; Zhang, M. Study of Cerchar abrasive parameters and their relations to intrinsic properties of rocks for construction. Constr. Build. Mater. 2020, 244, 118327.
6. Kahraman, S.; Alber, M.; Fener, M.; Gunaydin, O. The usability of Cerchar abrasivity index for the prediction of UCS and E of Misis Fault Breccia: Regression and artificial neural networks analysis. Expert Syst. Appl. 2010, 37, 8750–8756.
7. Golovanevskiy, V.A.; Bearman, R.A. Gouging abrasion test for rock abrasiveness testing. Int. J. Miner. Process. 2008, 85, 111–120.
8. Perez, S.; Karakus, M.; Sepulveda, E. A preliminary study on the role of acoustic emission on inferring Cerchar abrasivity index of rocks using artificial neural network. Wear 2015, 344–345, 1–8.
9. Deliormanlı, A.H. Cerchar abrasivity index (CAI) and its relation to strength and abrasion test methods for marble stones. Constr. Build. Mater. 2012, 30, 16–21.
10. Yaralı, O.; Duru, H. Investigation into effect of scratch length and surface condition on Cerchar abrasivity index. Tunn. Undergr. Space Technol. 2016, 60, 111–120.
11. Sun, J.; Shang, Y.; Wang, K.; Wang, C.; Ma, F.; Sun, C. A new prediction model for disc cutter wear based on Cerchar abrasivity index. Wear 2023, 526–527, 204927.
12. Aydın, H. Investigating the effects of various testing parameters on Cerchar abrasivity index and its repeatability. Wear 2019, 418–419, 61–74.
13. Michalakopoulos, T.N.; Anagnostou, V.G.; Bassanou, M.E.; Panagiotou, G.N. The influence of steel styli hardness on the Cerchar abrasiveness index value. Int. J. Rock Mech. Min. Sci. 2006, 43, 321–327.
14. Teymen, A. The usability of Cerchar abrasivity index for the estimation of mechanical rock properties. Int. J. Rock Mech. Min. Sci. 2020, 128, 104258.
15. Ko, T.Y.; Kim, T.K.; Son, Y.; Jeon, S. Effect of geomechanical properties on Cerchar abrasivity index (CAI) and its application to TBM tunnelling. Tunn. Undergr. Space Technol. 2016, 57, 99–111.
16. Yaralı, O.; Yaşar, E.; Bacak, G.; Ranjith, P.G. A study of rock abrasivity and tool wear in coal measures rocks. Int. J. Coal Geol. 2008, 74, 53–66.
17. Alber, M. Stress dependency of the Cerchar abrasivity index (CAI) and its effects on wear of selected rock cutting tools. Tunn. Undergr. Space Technol. 2008, 23, 351–359.
18. Moradizadeh, M.; Cheshomi, A.; Ghafoori, M.; TrighAzali, S. Correlation of equivalent quartz content, Slake durability index and Is50 with Cerchar abrasiveness index for different types of rock. Int. J. Rock Mech. Min. Sci. 2016, 86, 42–47.
19. Er, S.; Tuğrul, A. Correlation of physico-mechanical properties of granitic rocks with Cerchar abrasivity index in Turkey. Measurement 2016, 91, 114–123.
20. Elbaz, K.; Shen, S.-L.; Zhou, A.; Yin, Z.-Y.; Lyu, H.-M. Prediction of disc cutter life during shield tunneling with AI via the incorporation of a genetic algorithm into a GMDH-Type neural network. Engineering 2021, 7, 238–251.
21. Qi, H.; Zhou, J.; Khandelwal, M.; Onifade, M.; Lawal, A.I.; Li, C.; Bada, S.O.; Genc, B. An optimized machine learning framework for prediction of coal abrasive index: Leveraging supervised learning, metaheuristic optimization, and interpretability analysis. Fuel 2026, 403, 136065.
22. Houshmand, N.; Esmaeili, K.; Goodfellow, S.; Carlos Ordóñez-Calderón, J. Predicting rock hardness using Gaussian weighted moving average filter on borehole data and machine learning. Miner. Eng. 2023, 204, 108448.
23. Geng, Q.; Huang, Y.; Chen, J.; Wang, X.; Liu, W.; Luo, Y.; Zhang, Z.; Ye, M. Prediction of rock-breaking forces of tunnel boring machine (TBM) disc cutter based on machine learning methods. Tunn. Undergr. Space Technol. 2025, 163, 106682.
24. Shin, Y.J.; Kwon, K.; Bae, A.; Choi, H.; Kim, D. Machine learning-based prediction model for disc cutter life in TBM excavation through hard rock formations. Tunn. Undergr. Space Technol. 2024, 150, 105826.
25. Tripathy, A.; Singh, T.N.; Kundu, J. Prediction of abrasiveness index of some Indian rocks using soft computing methods. Measurement 2015, 68, 302–309.
26. Onifade, M.; Lawal, A.I.; Bada, S.O.; Khandelwal, M. Predictive modelling for coal abrasive index: Unveiling influential factors through Shallow and Deep Neural Networks. Fuel 2024, 374, 132319.
27. Barzegari, G.; Khodayari, J.; Rostami, J. Evaluation of TBM cutter wear in Naghadeh water conveyance tunnel and developing a new prediction model. Rock Mech. Rock Eng. 2021, 54, 6281–6297.
28. Sun, Z.; Zhao, H.; Hong, K.; Chen, K.; Zhou, J.; Li, F.; Zhang, B.; Song, F.; Yang, Y.; He, R. A practical TBM cutter wear prediction model for disc cutter life and rock wear ability. Tunn. Undergr. Space Technol. 2019, 85, 92–99.
29. Majeed, Y.; Abu Bakar, M.Z. Statistical evaluation of Cerchar abrasivity index (CAI) measurement methods and dependence on petrographic and mechanical properties of selected rocks of Pakistan. Bull. Eng. Geol. Environ. 2016, 75, 1341–1360.
30. Rostami, J.; Ghasemi, A.; Alavi Gharahbagh, E.; Dogruoz, C.; Dahl, F. Study of dominant factors affecting Cerchar abrasivity index. Rock Mech. Rock Eng. 2014, 47, 1905–1919.
31. Capik, M.; Yilmaz, A.O. Correlation between Cerchar abrasivity index, rock properties, and drill bit lifetime. Arab. J. Geosci. 2017, 10, 15.
32. He, J.; Li, S.; Li, X.; Wang, X.; Guo, J. Study on the correlations between abrasiveness and mechanical properties of rocks combining with the microstructure characteristic. Rock Mech. Rock Eng. 2016, 49, 2945–2951.
33. Ozdogan, M.V.; Deliormanli, A.H.; Yenice, H. The correlations between the Cerchar abrasivity index and the geomechanical properties of building stones. Arab. J. Geosci. 2018, 11, 604.
34. Torrijo, F.J.; Garzón-Roca, J.; Company, J.; Cobos, G. Estimation of Cerchar abrasivity index of andesitic rocks in Ecuador from chemical compounds and petrographical properties using regression analyses. Bull. Eng. Geol. Environ. 2019, 78, 2331–2344.
35. Plinninger, R.; Käsling, H.; Thuro, K.; Spaun, G. Testing conditions and geomechanical properties influencing the Cerchar abrasiveness index (CAI) value. Int. J. Rock Mech. Min. Sci. 2003, 40, 259–263.
36. Hucka, V.; Das, B. Brittleness determination of rocks by different methods. Int. J. Rock Mech. Min. Sci. Geomech. Abstr. 1974, 11, 389–392.
37. Shaffiee Haghshenas, S.; Shirani Faradonbeh, R.; Mikaeil, R.; Haghshenas, S.S.; Taheri, A.; Saghatforoush, A.; Dormishi, A. A new conventional criterion for the performance evaluation of gang saw machines. Measurement 2019, 146, 159–170.
38. Sarıdemir, M. Genetic programming approach for prediction of compressive strength of concretes containing rice husk ash. Constr. Build. Mater. 2010, 24, 1911–1919.
39. Korkmaz, E.; Tarım, S. Comprehensive analysis of impeller trimming modifications on submersible pump characteristics using a gene expression programming approach. Measurement 2025, 254, 117876.
40. Abdullah, A.; Saddiqi, H.A.; Qasim, M.; Khitab, A.; Khan, M.; Ahmad, S. Anemia prediction using gene expression programming (GEP) and explainable artificial intelligence approaches. Comput. Biol. Med. 2025, 196, 110856.

Figure 1. The CAI distribution of collected data.

Figure 2. The correlation analysis of the database parameters.

Figure 3. The relationships between CAI and original input parameters: (a) QC, (b) UCS, (c) BTS, (d) EQC, and (e) D.

Figure 4. The relationships between CAI and calculated input parameters: (a) B₁, (b) B₂, (c) B₃, (d) SF-a, and (e) RAI.

Figure 5. Schematic representation of GEP.

Figure 6. GEP chromosome tree: (a) original, (b) mutation.

Figure 7. GEP model results: (a) generation process, (b) R² of train set, (c) R² of test set, (d) R² of all data.

Figure 8. Tree diagram of the optimal individual.

Table 1. Summary of CAI prediction literature.

Researchers	Year	Analysis Method	QC	EQC	UCS	BTS	Is₅₀	D	N
Ko [15]	2016	Regression analysis	√		√	√			√
Barzegari [27]	2021	Statistical analysis			√	√	√
Sun [28]	2019	Experiment analysis		√	√
Yaralı [16]	2008	Regression analysis	√	√	√	√	√	√	√
Deliormanlı [9]	2012	Regression analysis			√				√
Alber [17]	2008	Theoretical analysis	√	√	√			√	√
Perez [8]	2015	Neural network	√		√			√	√
Moradizadeh [18]	2016	Statistical analysis	√	√			√		√
Er [19]	2016	Regression analysis	√	√	√	√		√
Zhang [1]	2021	Regression analysis	√	√	√				√
Majeed [29]	2016	Statistical analysis	√	√	√	√		√
Rostami [30]	2014	Experiment analysis		√	√	√
Capik [31]	2017	Regression analysis		√	√	√	√
He [32]	2016	Regression analysis	√	√	√	√		√	√
Ozdogan [33]	2018	Regression analysis			√				√
Torrijo [34]	2019	Regression analysis		√				√
Plinninger [35]	2003	Regression analysis	√	√
Teymen [14]	2020	Regression analysis			√	√	√		√

QC is quartz content, Is₅₀ is point load index, D is average quartz diameter, N is other parameters.

Table 2. Descriptive statistics of parameters.

Parameter		Minimum	Maximum	Average	Standard Deviation
Input	QC/%	0.01	85.00	43.91	27.32
	UCS/MPa	11.04	313.20	92.28	53.64
	BTS/MPa	0.48	18.65	7.48	4.22
	EQC/%	2.03	89.70	53.23	25.50
	D/mm	0.00	2.50	0.39	0.37
	B1	6.42	86.56	14.90	10.41
	B2	0.73	0.98	0.85	0.06
	B3/MPa	2.42	50.06	18.16	9.53
	SF-a/N/mm	0.00	30.00	3.62	5.89
	RAI	1.35	264.15	47.79	40.08
Output	CAI	0.50	5.82	2.57	1.46

Table 3. The parameters and hyperparameters of GEP.

Type	Parameters/Hyperparameters	Value/Symbol
General setting	Input parameters	QC, UCS, BTS, EQC, D, B₁, B₂, B₃, SF-a, RAI
	Function symbol	+,−,*,/,sqrt,exp,^2,^3,tan,^(1/3)
	Fitness method	R²
	Population size	60
	Iteration number	3000
	Linking function	+
Genetic variation parameters	Mutation	0.2
	Inversion	0.2
	IS transposition	0.15
	RIS transposition	0.15
	One-point recombination	0.15
	Two-point recombination	0.15
	Gene recombination	0.1

* IS is the insertion sequence, RIS is the root insertion sequence.
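
For readers who want to reproduce a comparable setup, the settings in Table 3 can be written down as plain data. The sketch below is not tied to any particular GEP library: the dictionary names, the ASCII "-" key for subtraction, the sign-preserving cube root, and the `link_genes` helper are all assumptions made for illustration, and the paper does not state how invalid arguments (for example, the square root of a negative number) were handled.

```python
import math

# Arity and a Python callable for each function symbol listed in Table 3.
FUNCTION_SET = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, lambda a, b: a / b),
    "sqrt": (1, math.sqrt),
    "exp": (1, math.exp),
    "^2": (1, lambda a: a ** 2),
    "^3": (1, lambda a: a ** 3),
    "tan": (1, math.tan),
    # Sign-preserving real cube root (an assumption for negative inputs).
    "^(1/3)": (1, lambda a: math.copysign(abs(a) ** (1.0 / 3.0), a)),
}

# General settings and genetic operator rates as listed in Table 3.
GEP_SETTINGS = {
    "population_size": 60,
    "iterations": 3000,
    "linking_function": "+",
    "fitness": "R2",
    "rates": {
        "mutation": 0.2,
        "inversion": 0.2,
        "is_transposition": 0.15,
        "ris_transposition": 0.15,
        "one_point_recombination": 0.15,
        "two_point_recombination": 0.15,
        "gene_recombination": 0.1,
    },
}


def link_genes(gene_values):
    """Combine the values of the individual genes with the linking function."""
    link = FUNCTION_SET[GEP_SETTINGS["linking_function"]][1]
    total = gene_values[0]
    for value in gene_values[1:]:
        total = link(total, value)
    return total
```

With "+" as the linking function, the model output is simply the sum of the sub-expressions encoded by the individual genes.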

Table 4. Analysis of variance (ANOVA) for different models.

Model		Sum of Squares	df	Mean Square	F	Sig.
1	Regression	52.51	1	52.51	35.45	0.00
	Residual	117.01	79	1.48
	Total	169.52	80
2	Regression	98.09	2	49.05	53.56	0.00
	Residual	71.43	78	0.916
	Total	169.52	80
3	Regression	105.34	3	35.11	42.13	0.00
	Residual	64.18	77	0.83
	Total	169.52	80
4	Regression	127.27	4	31.82	57.24	0.00
	Residual	42.247	76	0.56
	Total	169.52	80
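
The F statistics in Table 4, and the R² values reported for the same models, follow directly from the sums of squares and degrees of freedom. The sketch below recomputes them for model 4 using the standard ANOVA relations; the function name is hypothetical.

```python
def anova_summary(ss_regression, ss_residual, df_regression, df_residual):
    """Recompute R2 and F from the sums of squares and degrees of freedom."""
    ss_total = ss_regression + ss_residual
    ms_regression = ss_regression / df_regression  # mean square, regression
    ms_residual = ss_residual / df_residual        # mean square, residual
    return {
        "R2": ss_regression / ss_total,
        "F": ms_regression / ms_residual,
    }


# Model 4 of Table 4: regression SS, residual SS, and their degrees of freedom.
model_4 = anova_summary(127.27, 42.247, 4, 76)
print(model_4)
```

The same function applied to models 1 to 3 gives values consistent with the tabulated F statistics, up to the rounding of the printed sums of squares.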

Table 5. Stepwise regression coefficients.

Model	Variable	Unstandardized B	Std. Error	t	Sig.	R²
1	Constant	1.13	0.28	4.09	0.00	0.31
	BTS	0.19	0.03	5.95	0.00
2	Constant	0.26	0.25	1.02	0.31	0.58
	BTS	0.20	0.03	7.90	0.00
	D	2.06	0.29	7.06	0.00
3	Constant	0.84	0.31	2.70	0.01	0.62
	BTS	0.18	0.03	7.06	0.00
	D	2.37	0.30	7.96	0.00
	QC	−0.01	0.004	−2.95	0.00
4	Constant	0.16	0.27	0.60	0.55	0.75
	BTS	0.16	0.02	7.46	0.00
	D	1.57	0.28	5.70	0.00
	QC	−0.07	0.01	−7.14	0.00
	EQC	0.07	0.01	6.28	0.00
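
As a worked example, the coefficients of model 4 can be applied to a sample from Table A1. This sketch assumes the model has the usual linear form CAI = constant + Σ(B × variable) and uses the coefficients exactly as printed in Table 5, which are rounded to two decimals, so the result is only approximate. The function and dictionary names are hypothetical.

```python
# Coefficients of stepwise MLR model 4 as printed (rounded) in Table 5.
MLR_MODEL_4 = {"Constant": 0.16, "BTS": 0.16, "D": 1.57, "QC": -0.07, "EQC": 0.07}


def predict_cai_mlr4(bts, d, qc, eqc):
    """Linear prediction of CAI from BTS (MPa), D (mm), QC (%), and EQC (%)."""
    c = MLR_MODEL_4
    return (
        c["Constant"]
        + c["BTS"] * bts
        + c["D"] * d
        + c["QC"] * qc
        + c["EQC"] * eqc
    )


# Sample 1 of Table A1 (granodiorite); the measured CAI for this sample is 5.22.
sample_1 = predict_cai_mlr4(bts=14.88, d=0.56, qc=25.10, eqc=44.01)
print(round(sample_1, 2))
```

Because QC and EQC enter with coefficients of equal magnitude and opposite sign, their combined contribution in this model depends on the difference EQC − QC, that is, on the abrasive minerals other than quartz.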

Table 6. Performance indices for different models.

Indices	GEP-Based	MLR Model 1	MLR Model 2	MLR Model 3	MLR Model 4
R²	0.906	0.31	0.58	0.62	0.75
RMSE	0.46	1.20	0.94	0.90	0.73
MAPE	0.18	0.48	0.38	0.36	0.28
MAE	0.37	1.00	0.79	0.75	0.58
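
The gap between the GEP-based model and the best MLR-based model can be recomputed from the entries of Table 6. The sketch below simply subtracts the tabulated, rounded values; the names are hypothetical. Computed this way, the RMSE difference comes out as −0.27, so a slightly different figure quoted elsewhere in the text would presumably stem from differencing unrounded values.

```python
# Performance indices of the GEP-based model and MLR model 4 from Table 6.
TABLE_6 = {
    "GEP": {"R2": 0.906, "RMSE": 0.46, "MAPE": 0.18, "MAE": 0.37},
    "MLR4": {"R2": 0.75, "RMSE": 0.73, "MAPE": 0.28, "MAE": 0.58},
}


def index_changes(model, baseline):
    """Difference of each index (model minus baseline), rounded to 3 decimals."""
    return {name: round(model[name] - baseline[name], 3) for name in model}


changes = index_changes(TABLE_6["GEP"], TABLE_6["MLR4"])
print(changes)
```

A positive change in R² together with negative changes in RMSE, MAPE, and MAE means the GEP-based model is better on every index.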

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

