Classification-Based Regression Models for Prediction of the Mechanical Properties of Roller-Compacted Concrete Pavement

Abstract: In pavement engineering, early determination of mechanical characteristics is essential for reliable material design, for road and highway construction and maintenance, and for highway sustainability. Tensile strength (TS), compressive strength (CS), and flexural strength (FS) are crucial characteristics of roller-compacted concrete pavement (RCCP). In this research, the classification-based regression models random forest (RF), M5rule model tree (M5rule), M5prime model tree (M5p), and chi-square automatic interaction detection (CHAID) are used to simulate the mechanical characteristics of RCCP. A comprehensive and reliable dataset comprising 621, 326, and 290 data records for the CS, TS, and FS experimental cases, respectively, was extracted from several open sources in the literature. The mechanical properties are determined from influential input combinations selected using principal component analysis (PCA). The PCA results indicate that the volumetric/weighted contents of the experimental variables (coarse aggregate, fine aggregate, supplementary cementitious materials, water, and binder) together with specimen age are the most effective inputs. Several statistical metrics were used to evaluate the proposed classification-based regression models. The RF model showed better predictive capacity for the CS, TS, and FS of RCCP than the CHAID, M5rule, and M5p models. Monte-Carlo simulation was used to verify the results in terms of the uncertainty and sensitivity of variables. Overall, the proposed methodology yields a reliable soft computing model that can be applied in material engineering, construction, and design.


Introduction
Advances across scientific fields have been accompanied by growth in the concrete industry, which has led to the production of roller-compacted concrete pavement (RCCP). In recent years, the construction and maintenance of road pavements has become an important challenge [1,2]. The high cost of producing bituminous pavement and the quantity of petroleum contaminants released into the environment necessitate alternative technologies for solving road construction problems [3]. The lower cement paste content and higher aggregate volume of RCCP give it a low consistency, which results in greater durability than bituminous asphalt. Higher resistance to temperature rise, lower water absorption, better compressive strength, and less long-term deformation under load are other advantages of RCCP. In cold regions, RCCP is also resistant to frost cycles [4]. In addition, due to the impermeability of its constituent materials, it acts as an environmentally friendly pavement and causes no problems in the regions where it is used. The use of pozzolanic materials to ensure sufficient compaction in mixtures with standard fine-grained aggregates has also attracted interest, because pozzolans cost less to produce than cement and improve strength [5,6]. Therefore, this study explores RCCP mixtures containing pozzolan. Pozzolans combine with the gels produced in the concrete and increase the concrete's hydration, thereby increasing the density of the produced concrete and enhancing the chemical and mechanical properties of RCCP.
The important mechanical characteristics of concrete are highly influenced by the concrete mix design [7]. Parameters such as cement content, water-to-cement ratio, and cement substitutes affect the mechanical properties of concrete, which makes it difficult to predict the mechanical properties of concrete due to the presence of numerous parameters. In the mix design methods, effort has been made to reduce the cost of production. It is time-consuming and costly to use the regulation methods for the calculation of the mix design and it is necessary to comply with the conditions and assumptions of the regulations for all constituent materials of concrete [8][9][10]. Therefore, different researchers have presented valuable models using different mathematical techniques to estimate concrete behavior, which have mainly been based on linear and nonlinear regressions. Nowadays, methods based on machine learning (ML) have been successfully used in this field, and these models have generally stemmed from laboratory experiments and analyses.
To date, various ML techniques have been used to simulate the mechanical characteristics of concretes, including multivariate adaptive regression splines (MARS) [11], genetic expression programming (GEP) [12], artificial neural network (ANN) [13], adaptive neuro-fuzzy inference systems (ANFIS) [14], and support vector machines (SVM) [15]. For instance, Ashrafian et al. developed an evolutionary method based on a MARS-integrated water cycle algorithm to propose a nonlinear relationship between mixture components and the compressive strength of foamed cellular lightweight concrete [16]. Hardened strength estimation of recycled aggregate concrete using a traditional ANN system was considered by Deng et al. [17]. Sun et al. proposed an extended SVM model to estimate the permeability coefficient and unconfined compressive strength [18]. Shahmansouri et al. applied the GEP method to simulate the hardened characteristics and electrical resistivity of zeolite-based eco-friendly concrete [19]. Feng et al. implemented an intelligent ML method, named the adaptive boosting approach, for estimating the compressive strength of concrete [20]. Iqbal et al. focused on comprehensive data to present a simple and robust model to formulate the mechanical characteristics of green concrete using a GEP approach [21]. Asteris et al. used data-driven methods as surrogate models for predicting the hardened properties of self-compacting concrete [22]. Golafshani et al. predicted the compressive strength of normal and high-performance concretes using ANN and ANFIS hybridized with a grey wolf optimizer [23]. Yoon et al. presented a predictive model for the mechanical properties of lightweight aggregate concrete using an ANN method [24]. Dao et al. evaluated artificial intelligence approaches for simulation of the compressive strength of geopolymer concrete [25]. Sun et al. applied an evolutionary algorithm to estimate and optimize the compressive strength of concrete mixtures [26]. Moayedi et al. applied an optimized ANN method in modeling of concrete slump [27].
Although the aforementioned ML methods provide reliable and robust tools for modeling concrete properties, they are complex and computationally costly during the learning phase. By contrast, classification-based regression methods, as extended ensemble ML tools, require few tuning parameters during model development and are robust against overfitting [28]. They are increasingly used for regression problems because they are relatively simple, straightforward, flexible, and computationally inexpensive [29]. Behnood et al. formulated the mechanical properties of poplar concretes based on the tree method [30,31]. Han et al. proposed an improved RF model to simulate the CS of high-performance concrete [32]. Mohamed used the RF technique to approximate the hardened properties of sustainable concrete [33]. Ashrafian et al. evaluated a tree-based heuristic regression model, named the M5p model tree, to predict the properties of fiber-reinforced concrete [34]. Gholampour et al. applied the M5 model tree to estimate the mechanical properties of coarse recycled aggregate concrete, and reported the influential predictor variables [35]. In the present research, classification-based regression is investigated as a means of discovering numerical dependencies in ML approaches. Because classification-based regression models can discover functional dependencies and offer efficient mechanisms for evaluating model significance, they make it possible to overcome the difficulties listed in the introduction [31][32][33]. To assess the characteristics of the presented approach, four benchmarks were applied for modeling the mechanical properties of RCCP.
The main goals of this study are: (1) development and evaluation of nonlinear decision tree-based classification methods, including the M5rule model tree (M5rule), chi-square automatic interaction detector (CHAID), RF, and M5p, to simulate the mechanical characteristics of RCCP (e.g., CS, TS, and FS); (2) improvement of the proposed regression-based models using principal component analysis (PCA) for better selection of predictor variables; (3) comparison of the proposed models and integration of their advantages into the decision tree-based classification methods to build and evaluate the proposed models; (4) presentation of a new ensemble-based method, CHAID, for estimating the mechanical characteristics of RCCP for the first time in concrete technology prediction, which could potentially lead to enhanced estimation capability.
This research is organized into four sections. The introduction (Section 1) describes the relevant research. Section 2 presents the materials and methods, the RCCP background, and the experimental dataset, and describes the investigated methods. Section 3 presents the modeling process, the training and testing phases, and a comparison of the developed models. Finally, Section 4 summarizes the research findings.

Theoretical Background and Data Description
Proper blend design is challenging in the production of high-quality concrete [36]. Mechanical characteristics, economic benefit, and project constructability should be considered when designing RCCP blends [37]. Among the types of concrete, RCCP has become conventional because it has a simple production process and can be supplied quickly when producing a large structure. RCCP blends have a lower cement weight (110-120 kg/m³), utilize natural aggregate, and are specified by the American Concrete Institute (ACI) standard 325.10R-95 as concrete incorporating less water, cement, and supplementary cementitious material [38].
A comprehensive and integrated dataset was utilized for building reliable simulation models based on ML techniques. A database was compiled from the open-source studies available in the literature. From this database, models of the mechanical characteristics of RCCP were developed using 621, 326, and 290 data records for CS, TS, and FS of RCCP, respectively, at ages of 1, 3, 7, 28, 90, and 180 days. The gathered datasets contain information about the mixture components of RCCP in different combinations. For the ML techniques, the originally collected experimental data were randomized and divided into two phases. The training (calibration) phase is used for learning and for constructing the models for CS, TS, and FS. The testing (validation) phase is used to evaluate the capability of the developed models. For the development of the proposed methods, 75% of the data (466, 245, and 218 data records for CS, TS, and FS, respectively) were used for the training phase, while the remainder (155, 81, and 72 data records) were used for the testing phase of the classification-based regression methods. A schematic workflow of the simulation procedure of mechanical characteristics using ML-based models is presented in Figure 1.
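The randomized 75%/25% partition described above can be illustrated with a minimal sketch. The function name `split_records`, the seed, and the use of a ceiling when computing the training size are assumptions made for illustration; the ceiling is chosen only because it reproduces the record counts reported in the text, and the original study may have partitioned the data differently.

```python
import math
import random


def split_records(records, train_fraction=0.75, seed=0):
    """Shuffle a copy of the records and split it into training and testing subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # randomize the order reproducibly
    n_train = math.ceil(len(shuffled) * train_fraction)  # assumption: round up
    return shuffled[:n_train], shuffled[n_train:]


# Dataset sizes reported in the text for CS, TS, and FS.
for name, n_records in {"CS": 621, "TS": 326, "FS": 290}.items():
    train, test = split_records(range(n_records))
    print(name, len(train), len(test))
```

With the dataset sizes quoted above, this partition gives 466/155, 245/81, and 218/72 training/testing records for CS, TS, and FS, respectively.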

Random Forest
Breiman [61] proposed RF, a nonparametric, classification-based regression method [62,63]. Instead of a single parametric model, the RF model incorporates many easy-to-interpret decision trees. Integrating the results of the individual decision trees yields a more comprehensive estimation technique. The objective of the current research is to estimate the mechanical properties of RCCP via the regression approach. The training steps of RF are as follows [61][62][63].
(a) From the dataset, draw a bootstrap instance, i.e., a sample chosen randomly with replacement.
(b) Using the bootstrap instance, grow a tree with the following modification: at each node, select the best split among a random subset of mtry descriptors (i.e., the number of predictors tried at each node). Here, mtry acts as a tuning parameter of the RF algorithm. The tree is grown to its maximum size without pruning.
(c) Stage (b) is iterated until the user-defined number of trees (ntree) has been grown on the basis of the bootstrap instances of the observations.
The final prediction values are determined by combining all individual tree outcomes [61]. After growing K trees {Tk(x)}, the RF regression predictor is given by the following formula:
$$\hat{f}_{RF}^{K}(x) = \frac{1}{K}\sum_{k=1}^{K} T_k(x)$$
A new training set for each constructed RF regression tree is derived by resampling the original calibration set with replacement. Thus, each time a regression tree is constructed from a randomized training sample, the out-of-bag instances are utilized for validating its precision [61].
The validation features improve the robustness of random forests due to the use of independent test data. The random forest algorithm is a feasible method for classification and regression purposes, and has many engineering applications, such as forecasting the properties of concrete [63].
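The bootstrap, random-subset, and averaging steps of RF can be illustrated with a small pure-Python sketch. This is not the WEKA implementation used in the study; the function names, the toy data, and the simplification that a node becomes a leaf when all sampled features are constant are assumptions for illustration only.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow_tree(rows, mtry, rng):
    """Grow an unpruned regression tree on rows of (features, target)."""
    targets = [y for _, y in rows]
    if len(set(targets)) == 1:
        return targets[0]  # pure node: leaf
    best = None
    # Try only a random subset of mtry features at this node.
    for j in rng.sample(range(len(rows[0][0])), mtry):
        for threshold in sorted({x[j] for x, _ in rows})[1:]:
            left = [r for r in rows if r[0][j] < threshold]
            right = [r for r in rows if r[0][j] >= threshold]
            score = sse([y for _, y in left]) + sse([y for _, y in right])
            if best is None or score < best[0]:
                best = (score, j, threshold, left, right)
    if best is None:  # sampled features are constant here: make a leaf
        return mean(targets)
    _, j, threshold, left, right = best
    return (j, threshold, grow_tree(left, mtry, rng), grow_tree(right, mtry, rng))


def predict_tree(tree, x):
    """Follow the splits until a leaf value is reached."""
    while isinstance(tree, tuple):
        j, threshold, left, right = tree
        tree = left if x[j] < threshold else right
    return tree


def random_forest(rows, ntree=50, mtry=1, seed=0):
    """Grow ntree trees, each on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(ntree):
        bootstrap = [rng.choice(rows) for _ in rows]  # sampling with replacement
        forest.append(grow_tree(bootstrap, mtry, rng))
    return forest


def predict_forest(forest, x):
    """Average the individual tree predictions."""
    return mean(predict_tree(tree, x) for tree in forest)


# Toy data (hypothetical): target depends linearly on the first feature.
rows = [((float(i), float(i % 3)), 2.0 * i) for i in range(20)]
forest = random_forest(rows, ntree=25, mtry=1, seed=1)
prediction = predict_forest(forest, (10.0, 1.0))
```

Because every leaf stores a mean of training targets, the averaged prediction always lies within the range of the observed targets, which is one reason tree ensembles extrapolate poorly beyond the training data.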

M5 Rule Model Tree
The complex or hidden information in a dataset can be explored using the IF-THEN rules-based M5rule model tree, a commonly used model in machine learning for classification and regression tasks [64]. The M5rule model can create a single classification tree through repeated data splitting into groups while ensuring the uniformity in the output and applying some decision rules that are applicable to specific explanatory parameters [65]. The uniformity of the output can be estimated as the residual sum of the squares. The first stage involves the selection of the input variable for node splitting, which ensures the maximum uniformity of the resulting child nodes from the original parent nodes. The next step would be devoted to the selection of the other input variables which are the child nodes [66]. Having constructed the optimal regression tree, the next thing is to prune the tree to prevent overfitting, and for this purpose, a cross-validation process is applied for the selection of the model with the least prediction error.

M5 Prime Model Tree
The M5p model, which is based on linear regressions and decision trees, was first developed in 1992. A binary decision tree consists of the primary terminal node with extra leaf nodes, which provide a connection between input (independent) and output (dependent) parameters [67]. It is essential to bear in mind that decision trees are generally applied to categorical data, although they are also appropriate for quantitative data [68]. The M5p model can be summarized in two main steps: (a) splitting the input data to create a decision tree, which is done by evaluating the standard deviation of each subset to find an appropriate primary node (parent node); as a result of this splitting step, the standard deviation of a child node is smaller than that of the parent node; (b) testing each node in the decision tree to diminish the error. The standard deviation reduction (SDR) is calculated as:
$$SDR = sd(T) - \sum_{j}\frac{|T_j|}{|T|}\, sd(T_j)$$
where sd represents the standard deviation, T is the set of examples that reach the primary node, and Tj represents the subset of patterns that possess the jth outcome of the potential set. Thus, among the different candidate splits of the input data, the one with the largest expected error reduction is chosen. To address the overfitting problem in decision trees, pruning techniques are used to omit subtrees. This pruning is based on linear regression functions. One of the strengths of this model over the M5rule model is its efficiency in learning and in treating problems with high complexity. Another feature of this model is that its regression functions do not have many variables. The M5p model has widespread applications in engineering, medical, and agricultural disciplines [69].
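The splitting criterion can be made concrete with a short sketch that evaluates the standard deviation reduction for every candidate threshold of a single input. The function names and the toy values are hypothetical, and the population standard deviation is assumed; model tree implementations may differ in such details.

```python
from statistics import pstdev


def sdr(parent, subsets):
    """Standard deviation reduction achieved by splitting parent into subsets."""
    n = len(parent)
    return pstdev(parent) - sum(len(s) / n * pstdev(s) for s in subsets)


def best_split(xs, ys):
    """Return the (threshold, SDR) pair that maximizes SDR for one input variable."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold between identical input values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
        left = [y for x, y in pairs if x < threshold]
        right = [y for x, y in pairs if x >= threshold]
        gain = sdr(ys, [left, right])
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain


# Toy example (hypothetical values): strength jumps between ages 3 and 7 days.
ages = [1, 3, 7, 28]
strengths = [1.0, 1.0, 5.0, 5.0]
threshold, gain = best_split(ages, strengths)
```

In this toy case the split at an age of 5 days separates the two strength levels completely, so the reduction equals the full standard deviation of the parent node.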

Chi-Square Automatic Interaction Detector
The CHAID model was first introduced by Kass for use with qualitative and classified quantitative variables [70]. As a modeling approach, this algorithm is suitable for establishing the relationship between a dependent parameter and several independent parameters. The CHAID model is mainly characterized by the following: (1) it finds the parameters that influence the final result by applying a chi-square test of independence; (2) it is useful for combining groups of effective variables [71]. This implies that CHAID employs the chi-square independence test to examine the significance of independent parameters within a classification in comparison to the dependent parameters [72]. The chi-square statistic is expressed as follows:
$$\chi^2 = \sum_{i}\sum_{j}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}$$
where Oij is the observed value, and Eij is the expected value. There are three stages in the CHAID model: merging, splitting, and stopping. The merging stage involves the application of the chi-square test to assess the significance of each independent parameter. Each pair of dependent and independent parameters, as well as the corresponding contingency tables, is subjected to this test. The splitting stage starts by comparing the calculated p-values of the independent parameters; the independent parameter with the smallest p-value is selected as the node separator. In situations where no variable has a significant p-value, there is no splitting and the node is declared a final node without further branching [68]. The last stage (the stopping stage) begins with a repeat of the merging and splitting stages for all subsets. The process is terminated after all the subsets have been analyzed [71]. The formation of different parts in the CHAID model is represented by a classification tree diagram, where the dependent parameter is represented by the root, and the independent parameters associated with significant p-values are directly related to the root [72][73][74]. The weakness of this algorithm is that it cannot generate the best feasible divisions from the current parameters. More information on CHAID is provided in [70][71][72][73].
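As an illustration of the chi-square test of independence that drives the merging and splitting stages, the following sketch computes the statistic from a contingency table of observed counts. The function name and the example tables are hypothetical, and the sketch assumes that every row and column total is nonzero.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a contingency table.

    table[i][j] is the observed count O_ij; the expected count E_ij is
    computed from the row and column totals under independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical 2x2 tables: predictor category (rows) vs. binned target (columns).
dependent = [[10, 20], [20, 10]]
independent = [[10, 20], [20, 40]]
stat_dep, dof_dep = chi_square(dependent)
stat_ind, dof_ind = chi_square(independent)
```

For one degree of freedom, a statistic above roughly 3.841 corresponds to a p-value below 0.05, so the first table would qualify as a significant separator at that level while the second would not.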

Principal Component Analysis
Issues such as a high-dimensional input space, correlation among variables, and insufficient training samples can create problems in the learning process, and the conditions might become worse when we want to spatially interpolate values for various locations within a city, but with few observation points [75]. It then becomes necessary to apply dimension reduction methods that transform the correlated variables into a smaller number of uncorrelated ones. Through application of PCA, while maintaining the highest variation and dispersion in the data, one can transform the input variables into a set of new uncorrelated variables called the principal components [76,77]. Equations (5) and (6) are used to provide the linear transformation from the input space to the principal component space. Here, the orthogonal linear transformation matrix is denoted by P, Z represents the original data matrix, in which each row denotes a variable, and Y represents the transformed matrix, in which each row denotes an uncorrelated principal component.
The transformation matrix (P) is obtained from the eigenvalues (λ1, λ2, …, λn) and eigenvectors of the covariance matrix of the original variables by applying PCA. The rows of this matrix are the corresponding eigenvectors. The eigenvectors specify the directions of the new space, and the eigenvalues specify their magnitude [77,78]. To find which eigenvector(s) can be removed without losing much of the information needed for building a subspace with lower dimensions, we should inspect their corresponding eigenvalues. Eigenvectors with smaller corresponding eigenvalues carry less information on the data distribution and can be removed.
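The role of the eigenvalues can be illustrated with a minimal sketch that extracts the leading principal component of a small dataset by power iteration and reports the share of variance it explains. This is not the procedure used to produce Table 1; the function names and toy data are assumptions, samples are stored as rows here (the text stores variables as rows), and power iteration is assumed to start from a vector that is not orthogonal to the leading eigenvector.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of data given as rows of observations."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def leading_component(cov, iterations=500):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # assumed not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    return eigenvalue, v


def explained_variance_ratio(cov, eigenvalue):
    """Share of the total variance (trace of cov) carried by one component."""
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))


# Toy data (hypothetical): the second variable is exactly twice the first.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
cov = covariance_matrix(data)
eigenvalue, direction = leading_component(cov)
ratio = explained_variance_ratio(cov, eigenvalue)
```

For these perfectly correlated toy variables, the first component carries essentially all of the variance, so the second eigenvector could be dropped without losing information.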

Statistical Criteria
In the present research, the following performance metrics (Equations (7)-(10)) were applied: correlation coefficient (R), Nash-Sutcliffe efficiency (NSE), root mean square error (RMSE), and ratio of RMSE to standard deviation (RSD) [62][63][64]:
$$R = \frac{\sum_{i=1}^{N}(O_i-\bar{O})(P_i-\bar{P})}{\sqrt{\sum_{i=1}^{N}(O_i-\bar{O})^2\,\sum_{i=1}^{N}(P_i-\bar{P})^2}} \qquad (7)$$
$$NSE = 1 - \frac{\sum_{i=1}^{N}(O_i-P_i)^2}{\sum_{i=1}^{N}(O_i-\bar{O})^2} \qquad (8)$$
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(O_i-P_i)^2} \qquad (9)$$
$$RSD = \frac{RMSE}{SD_O} \qquad (10)$$
where $O_i$ and $P_i$ denote the experimental and predicted target variable values, respectively, $\bar{O}$ and $\bar{P}$ are the means of the experimental and predicted target variable values, respectively, $SD_O$ is the standard deviation of the experimental values, and N denotes the total number of data. The R index, which ranges over [−1, 1] (with R = 1 as the ideal value), shows the suitability of the selected predictors for predicting the target variable. NSE, with the range of (−∞, 1) and an ideal value of unity, is used for assessing the capability of the proposed methods. A value equal to unity shows a perfect fit between the experimental and predicted target values, and a negative value means that the model performs worse than simply using the arithmetic mean of the experimental values. RMSE and RSD, with the range of (0, +∞) and an ideal value of zero, are used to assess the accuracy.
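A compact sketch of the four metrics is given below. The function name and the toy values are hypothetical, and the population standard deviation of the experimental values is assumed for RSD, since the text does not state which form was used.

```python
import math


def evaluate(observed, predicted):
    """Return R, NSE, RMSE, and RSD for paired observed/predicted values."""
    n = len(observed)
    o_mean = sum(observed) / n
    p_mean = sum(predicted) / n
    s_op = sum((o - o_mean) * (p - p_mean) for o, p in zip(observed, predicted))
    s_oo = sum((o - o_mean) ** 2 for o in observed)
    s_pp = sum((p - p_mean) ** 2 for p in predicted)
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    rmse = math.sqrt(sq_err / n)
    sd_observed = math.sqrt(s_oo / n)  # assumption: population standard deviation
    return {
        "R": s_op / math.sqrt(s_oo * s_pp),
        "NSE": 1.0 - sq_err / s_oo,
        "RMSE": rmse,
        "RSD": rmse / sd_observed,
    }


# Toy example (hypothetical values): every prediction is 1 unit too high.
scores = evaluate([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

The toy case shows why R alone can be misleading: a constant bias leaves R at its ideal value of 1, while NSE drops to 0.2 and RMSE equals the bias of 1. Under the population standard deviation assumed here, RSD equals the square root of (1 − NSE).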

Selection of the Input Variables Using the PCA Technique
In this paper, to identify the most effective combination of inputs for simulating the mechanical characteristics, dimensionality reduction based on principal component analysis (PCA) was performed. The predictor variables affecting the mechanical characteristics of RCCP at different ages are described as below: where CA (kg/m³) […] cementitious material content, binder content, water content, ratio of water to cement, ratio of water to binder, ratio of supplementary cementitious material to binder, and ratio of coarse to fine aggregate, respectively. Table 1 reports the results of the analysis, comprising the contributions of the 10 inputs to the 10 PCs, the explained variance (EV) of each PC, and the cumulative sum (CS) of EV. PC1 represents 51.3% and the first four PCs represent 99.1% of the total variance. The optimal input combinations are shown in bold in the table. The higher the EV, the better the combination of inputs. The optimal combination of mixture proportions is determined from Equation (11) using the PCA technique, as presented in Table 1. Five predictors provided the majority of the explained variance. In Table 1, it can be seen that the volumetric and weighted forms of the experimental variables CA, FA, SCM, W, and B, based on PC1, are the most effective independent predictor variables. Therefore, this combination of simulation variables, along with the age of specimens (AS), is used to construct the models to predict the mechanical characteristics of the concrete. The descriptive measures of the best combination of inputs for simulation of the mechanical characteristics of RCCP are presented in Table 2. The correlation coefficients of the selected independent variables for development of the proposed models are presented in Figure 2. According to the matrix, there are no significant relationships between the developed matrices of CS, TS, and FS.

Estimation of RCCP Mechanical Characteristics Using Classification-Based Regression Methods
Application of the decision tree classification system, which is based on artificial intelligence, is a recent method proposed for solving engineering problems. The final properties of models are recaptured on the basis of network calibration. Then, the network can generalize those learned in a similar condition [67]. In the present study, the modelling methods included are four classification-based regression methods, namely RF, CHAID, M5rule, and M5prime, which were explored for the prediction of the characteristics of RCCP.
In the modeling matrix, the CA, FA, SCM, W, B, and AS datasets were the independent variables, and CS, TS, and FS were the dependent variables used in each decision tree-based regression model. RF, M5rule, and M5p were run in WEKA 3.9, and CHAID was implemented in STATISTICA software, on an AMD A-12 9700, 10-core 2.5 GHz computer system.
To implement the RF model, the default Bagger algorithm was used with the bag size percent set to 200, the leaf number set to eight, and the delta criterion set to 0.1007. No mathematical formulation was utilized to find the optimum number of trees. Commonly, a larger number of trees produces more precise results, but increases computational cost.
The M5tree procedure for simulation of RCCP properties was generated using a set of tuning parameters to initialize the proposed model. A pruning factor of 4.0 and smoothing option were selected to evaluate the performance of the M5 model towards proposing the mathematical linear formulations for RCCP. After classifying, the developed M5p model, consisting of six input variables and three output variables, was used for simulation of CS, TS, and FS of RCCP using 12, 18, and 3 rules, respectively. The proposed models have the optimum number of decision trees (linear models (LMs)) as this value achieves the lowest error in the training stage. These LMs (rules), on the basis of conditional sentences, are illustrated in Figure 3.

Compressive Strength
The observed and simulated compressive strength values estimated by the RF, M5rule, M5p, and CHAID models for RCCP are presented in Figure 4. As shown in Figure 4, the RF predictions lie closer to the 1:1 line (black dotted line) than those of the other tree-based models, indicating better visual agreement between the observed and simulated CS. There were significant statistical correlations between the observed and simulated CS values for the four models under study. The performances of the proposed tree-based models are compared using quantitative measures (i.e., NSE, RSD, R, and RMSE) in Table 3. In the testing phase, the CS values simulated by RF were the most accurate, with the highest NSE (0.931) and lowest RSD (1.181 MPa) in comparison with the other ML methods. Figure 5 plots the observed and simulated CS of RCCP and their relative error using the tree-based techniques. The CS values estimated by the RF and CHAID models were consistent with the observed data points. However, RF could only roughly simulate extreme CS values.

Tensile Strength
The performance indicators of the calibration and validation capability of estimating the tensile strength of RCCP using tree-based methods are reported in Table 4. According to Table 4 […] Figure 6. The presented tree-based models achieved acceptable simulation results for the TS of RCCP, with the data clustered around the ideal line (1:1 line). Although a few data points produced by M5p and M5rule around a TS of 2-5 MPa showed some small divergence from the 1:1 line, the results revealed that all of the tree-based methods simulated the tensile strength with high accuracy. The time series and residual plots for the tree-based simulations and the actual TS are presented in Figure 7. The RF model generated the minimum RMSE and outperformed the M5rule, M5p, and CHAID models for estimation of the TS of RCCP.

Flexural Strength
The applicability of the tree-based models, namely RF, M5rule, M5p, and CHAID, was investigated for estimation of the flexural strength of RCCP. The statistical evaluation of the developed models in the simulation of FS is presented in Table 5. With the 75%-25% data split of this study, the RF model outperformed the other ML methods in both the training (R = 0.988 and RSD = 0.049 MPa) and testing stages (R = 0.970 and RSD = 0.108 MPa). RF has the lowest RMSE (0.197 MPa) and highest NSE (0.939); in terms of the testing-phase NSE, it improved on the M5rule, M5p, and CHAID models by 28%, 27%, and 30%, respectively. Figures 8 and 9 compare the actual results with those of the four tree-based regression models. It can be seen in these figures that the RF model has the highest accuracy in the simulation of FS during the training and testing steps. It is also evident from these plots that RF had a slightly higher precision in estimating the local maximum and minimum FS values compared to the other ML methods.

Model Validity
External validation (EV) is used for comparison of the results of estimated and experimental data. Golbraikh and Tropsha [79] adopted new external validation criteria to assess the estimation precision of models according to the performance of validation data. EV means assessing the model performance with independent samples [80].
where and represent the experimental and estimated target values, respectively.
Furthermore, Roy and Roy [81] used Rm (calculated by Equation (14)), a stabilization criterion, for external predictability of the models. They found that an Rm value greater than 0.5 indicates an appropriate simulation.
The determination coefficients of the regressions through the origin between the estimated and experimental values ($R_o^2$), and conversely ($R_o'^2$), are derived using the following equations:
The validation indicators and the related performance of the CS, TS, and FS predictions obtained by the various models are presented in Table 6. According to this table, the RF models for CS, TS, and FS, which yielded Rm = 0.691, Rm = 0.834, and Rm = 0.716, respectively, satisfy the conditions and provide the best validation compared to the other models. In addition, the CART and M5tree values for CS (Rm = 0.187), TS (Rm = 0.195), and FS were less than the required value for Rm (Rm > 0.5). Thus, RF shows the highest validity for predicting the mechanical characteristics of RCCP. Monte-Carlo simulation (MCS)-based uncertainty analysis is used for determining the randomness of model uncertainty. This method was first used by Ulam and von Neumann [82] in military projects for simulation of probabilistic events. It is well known that CS, TS, and FS contain various uncertainties, such as uncertainty of the input variables and uncertainty of the model parameters.
For this purpose, an investigation of the quantitative uncertainty associated with the output prediction rate (E) was performed using the RF, M5rule, M5p, and CHAID models. The MCS was performed for the CS, TS, and FS values. The individual error of prediction was calculated for all the datasets (Equation (19)). Equations (20) and (21) are utilized for calculation of the mean ($\bar{e}$) and standard deviation ($S_e$) of the estimation error, respectively [76]. In these equations, n is the dataset length, and the error compares the estimated ($P_i$) and experimental target values. A positive mean prediction error denotes an overestimated prediction of the target variable, and a negative one denotes an underestimated value of the target variable compared to the observed values. Thus, a confidence band can be drawn around the predicted error value through application of the Wilson score approach [83,84]. Furthermore, ±1.96 $S_e$ yields a 95% confidence band around the predicted $P_i$. The outputs of this analysis, such as the uncertainty band width and mean absolute deviation (MAD), are given in Table 7. According to this table, the positive mean prediction error indicates that the predicted values calculated by all these methods are higher than the experimental values. The RF and CHAID methods yielded the minimum bandwidth uncertainties for CS (33.065% and 33.240%, respectively). Moreover, for the other developed models, RF had the lowest uncertainty and satisfied the bandwidth criteria.
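The error statistics behind the uncertainty band can be sketched as follows. This is a simplified illustration rather than the procedure used for Table 7: it assumes that the individual error is the plain difference between estimated and experimental values, that the sample standard deviation (n − 1 denominator) is used, and that the band is the mean error ± 1.96 standard deviations; the function name and the toy values are hypothetical.

```python
from statistics import mean, stdev


def error_band(estimated, experimental, z=1.96):
    """Mean error, its standard deviation, and an approximate 95% band."""
    errors = [p - t for p, t in zip(estimated, experimental)]
    e_mean = mean(errors)   # positive: overestimation on average
    e_sd = stdev(errors)    # assumption: sample standard deviation
    return {
        "mean": e_mean,
        "sd": e_sd,
        "lower": e_mean - z * e_sd,
        "upper": e_mean + z * e_sd,
        "width": 2 * z * e_sd,
    }


# Toy example (hypothetical strengths in MPa).
band = error_band([11.0, 9.0, 12.0, 10.0], [10.0, 10.0, 10.0, 10.0])
```

In this toy case the mean error is positive (0.5 MPa), which, in the terms used above, corresponds to an overall overestimation of the target.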

Sensitivity Analysis and Variable Importance
Sensitivity analysis (SA) of variables is a technique used to determine how different values of predictor variables affect an output variable. For each independent variable, the SA% is calculated as follows [7]:
$$N_i = f_{max}(x_i) - f_{min}(x_i)$$
$$SA_i = \frac{N_i}{\sum_{j=1}^{n} N_j} \times 100 \qquad (24)$$
where $f_{max}(x_i)$ and $f_{min}(x_i)$ are the maximum and minimum of the estimated target over the ith input domain, respectively, with the other independent variables held at their average values, and n is the number of independent variables. The variable importance for the simulation of the mechanical characteristics of RCCP, based on the RF model (best model), is shown in Figure 10. The figure shows that the most effective variable in the CS, TS, and FS estimation of RCCP is the fine aggregate content.
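The SA% calculation can be sketched for an arbitrary fitted model as follows. The function name, the number of grid steps, and the linear toy model are assumptions for illustration; any callable that maps an input vector to a predicted strength could be substituted, provided that the prediction varies with at least one input.

```python
def sensitivity(model, samples, steps=50):
    """SA% of each input: share of the output range obtained by varying it alone."""
    d = len(samples[0])
    means = [sum(s[j] for s in samples) / len(samples) for j in range(d)]
    ranges = []
    for j in range(d):
        lo = min(s[j] for s in samples)
        hi = max(s[j] for s in samples)
        outputs = []
        for k in range(steps + 1):
            x = list(means)                       # other inputs at their averages
            x[j] = lo + (hi - lo) * k / steps     # sweep the j-th input domain
            outputs.append(model(x))
        ranges.append(max(outputs) - min(outputs))
    total = sum(ranges)
    return [100.0 * r / total for r in ranges]


# Toy model (hypothetical): the first input has three times the effect of the second.
toy_model = lambda x: 3.0 * x[0] + 1.0 * x[1]
toy_samples = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
sa_percent = sensitivity(toy_model, toy_samples)
```

For the toy model the shares are 75% and 25%, matching the ratio of the two coefficients; with a tree-based model the sweep would capture the step changes of the prediction instead.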

Conclusions
In this research, classification-based regression methods based on the RF, M5rule, M5p, and CHAID techniques were applied as ML tools to develop new predictive models of the mechanical characteristics of RCCP. The models were constructed using comprehensive datasets of RCCP design codes. Before development of the models, PCA was applied to determine the most important input predictors for data dimension reduction. RF and CHAID presented better performance for the training dataset compared to the other methods utilized in this research. The higher rank of RF and CHAID for the training data indicates that their flexibility, a result of combining multiple decision trees, is particularly useful for estimating the mechanical properties of RCCP. The performance of RF was significantly better than that of the other classification-based regression methods. This difference may be due to the larger diversity among the learned trees of RF, which is a consequence of RF's randomized splitting at nodes. Typically, classification-based regression methods function better if there is notable diversity among the models [85,86]. However, the performance of the M5rule and M5p models was inferior to that of both RF and CHAID. This may be because the M5rule and M5p methods are more prone to overfitting, while RF and CHAID focus on variance reduction and consequently avoid overfitting. According to the results of this research, the following conclusions can be drawn: