A New Ensemble Prediction Method for Reclaimed Asphalt Pavement (RAP) Mixtures Containing Different Constituents

: Fatigue and rutting are two common damage types in asphalt pavements. Reclaimed asphalt pavement (RAP), as a sustainable approach in the pavement industry, deals with the foregoing damage. Fatigue and rutting characteristics of asphalt pavement are generally assessed using laboratory tests, taking a long time and consuming significant amounts of raw material. This study aims to propose a novel approach for predicting fatigue and rutting performance of RAP mixtures. A new ensemble prediction method, named COA-KNN, is introduced by combining the coyote optimization algorithm and K-nearest neighbor to increase the accuracy of fatigue and rutting prediction. In order to evaluate the accuracy, the proposed method was compared against robust prediction meth-ods, including random forest (RF), gradient boosting (GB), decision tree regression (DT), and multiple linear regression (MLR). Afterward, the influence of each variable on the mentioned damages is examined, and the variables are ranked based on their relative influence on the mentioned damages. The results suggest that COA-KNN outperformed other prediction techniques when comparing different performance indicators. Total binder content in asphalt mixes and the PG span of the virgin binder added to the recycled asphalt mixture had the highest relative influence on fatigue and rutting performance, respectively.


Introduction
Transportation is a critical system for every communication as most human and goods transmissions are conducted using infrastructure [1,2]. As an enormous consumer of natural resources, the transportation industry contributes 25% and 22% of global energy usage and fossil fuels burned across the world, respectively. Accordingly, 22.7% of global greenhouse gas (GHG) production is generated in this industry. A total of 7% of the mentioned GHG is emitted by pavement which is responsible for transferring 60% of freight and 80% of passenger traffic [3]. Furthermore, the pavement industry consumes a significant amount of energy. For instance, aggregate and binder production and hot mix asphalt (HMA) operation to produce asphalt consume 54, 5810, and 275 MJ/ton energy, respectively [4]. Thus, the pavement sector can be considered a harmful industry to the environment, which is a significant concern.
The concerning environmental impacts and the high cost of crude oil have caused the pavement industry to develop a method to recycle waste pavement materials [5]. Consequently, many methods have been suggested as sustainable approaches, including the application of recycled materials such as reclaimed asphalt pavement (RAP) [6,7] or warm and cold mix asphalt technologies [8]. Warm and cold mix technologies still face problems, such as moisture damage potential and durability due to water in the binder emulsion, long curing time, and uncertainty of long-term performance [9]. Consequently, among these sustainable options, asphalt recycling is the most common method currently used in this industry, as it creates a cycle of reusing natural resources and reduces the need for virgin binder and material [10]. However, RAP application provides the mentioned environmental benefits, increasing its content in asphalt mixtures, and the effect of the degree of blending between virgin and RAP binders may deteriorate some characteristics of asphalt pavements [11], such as cracking resistance, moisture damage, etc. Thus, there has been a dire need to come up with new solutions to increase RAP content without the mentioned problems [11].
To this end, laboratory tests are generally executed to evaluate the mentioned mechanical behavior of asphalt mixes as a conventional technique. Two major damage types of concern in asphalt pavements are fatigue cracking and rutting. Thus, previous studies investigated these failures in mixtures containing RAP using laboratory tests. In order to evaluate fatigue cracking, the resilient modulus, fracture energy, dynamic modulus, etc., have been investigated using various tests such as four-point bending beam, dynamic modulus, semi-circular bending, and dissipated creep strain energy tests [12,13]. Similarly, regarding the rutting resistance of asphalt mixtures, rut depth is a parameter that is used for evaluating this performance. Accordingly, rutting has been investigated in various studies with tests including wheel tracking and asphalt pavement analyzer tests [14][15][16][17][18][19]. These laboratory tests generally showed that increasing the RAP content in mixtures increased the stiffness of mixtures and improved their rutting resistance but clearly did not specify the effects of the mixture components on the behavior of RAP mixes.
All the previous laboratory studies experimentally evaluated the performance of asphalt mixtures containing RAP, though the optimal mixtures are selected between the produced samples. That is, a limited number of specimens have been generally examined in experimental laboratory experiments. Moreover, laboratory testing takes a long time to prepare the specimens, equilibrate the test condition, and perform the tests. Additionally, these complex procedures take expert technicians, expensive equipment, and even more energy and material. Furthermore, laboratory specimens harm the environment because they should be stored in landfills after performing the tests. To diminution these issues, researchers began to apply soft computing (i.e., machine learning and optimization) techniques to predict various asphalt damages [20]. As such, Garbowski and Pożarycki [21] used an optimization framework to determine the thickness and stiffness of the pavement layers through a multi-level inverse approach. Sarkhani Benemaran et al. [22] applied extreme gradient boosting to predict the resilient modulus of flexible pavement foundations. Esmaeili-Falak and Sarkhani Benemaran [23] developed a new prediction model using extreme gradient boosting to predict the resilient modulus of modified base materials.
Ghasemi et al. [24] applied artificial neural networks (ANN) and linear regression to predict the dynamic modulus of asphalt, considering volumetric and particle size gradation as input variables. Only nine mixture proportions were taken as the dataset. The results suggested that ANN outperforms linear regression when comparing the prediction accuracy. Behnood and Daneshvar [20] estimated the dynamic modulus of asphalt mixes using the M5P algorithm and a dataset including 4022 asphalt mixture samples. The dataset included different input variables, including aggregate gradation, volumetric properties, binder characteristics, etc., and one output variable (i.e., dynamic modulus). The mentioned model obtained an R 2 of 0.919. Behnood and Golafshani [25], in another study with the same dataset and variables, used the biogeography-based optimization (BBO) algorithm, which enhanced the accuracy with an R 2 of 0.9601.
Likewise, prediction techniques have been applied to predict the rutting of asphalt pavements. Initially, regression models were used to predict the rutting of asphalt pave-ments [26]. However, recently, researchers have started to apply powerful prediction techniques to predict rutting. As such, Ullah and Zainab [27] studied the rutting performance with a dataset including 0 to 60% RAP. The input variables were loading cycles, RAP percentage, RAP binder content, etc. Using an ANN-based model, they predicted the rutting behavior with a testing data R 2 of 0.997. Majidifard et al. [28] predicted the rut depth of asphalt mixes using genetic expression programming. The dataset consisted of 96 test samples. The used parameters were mix and binder characteristics, aggregate gradation, and the content of additives in the mixtures, which led to an R 2 of 0.84.
Previous studies mainly focused on dynamic modulus and rut depth as indices for fatigue and rutting resistance, respectively, which are the mechanical properties of asphalt mixtures. Since the aim of this study is the evaluation of fatigue and rutting characteristics of these mixes in the field, two general indexes (fatigue life and rutting resistance index) are chosen for the prediction procedure. In addition, the used algorithm is a key element of the prediction procedure, and hence, an accurate and robust algorithm must be developed to achieve reliable results. Moreover, a major part of the dataset used for the prediction procedure was concentrated on low percentages of RAP. The lack of data for mixes with high percentages of RAP affects the prediction procedure resulting in inaccurate models.
In this study, unlike the discussed studies, two general indices were used for fatigue and rutting resistance of asphalt pavement instead of dynamic modulus and rut depth to better indicate the field behavior of asphalt mixtures. Meanwhile, in this study, the database included mixes containing high percentages of RAP. An appropriate prediction algorithm should be applied to the mentioned prediction problem to reach the highest possible accuracy. In this regard, a new robust ensemble method was developed to predict the fatigue and rutting performance of asphalt mixes accurately. The mentioned method was developed by merging a machine learning technique with a metaheuristic optimization algorithm. The introduced prediction method was then compared against conventional prediction methods in order to evaluate its performance and prioritize the performance of all the prediction algorithms applied in this research on fatigue and rutting prediction problems. Further, the impact of each input variable on the outputs of the models was investigated. That is, the features (variables) were ranked based on their relative influence on fatigue and rutting resistance. Therefore, researchers could select the optimal parameters to improve the fatigue and rutting performance of asphalt mixtures containing high RAP content.

Methodology
This study attempts to predict the fatigue and rutting performance of asphalt mixtures containing RAP. To this end, two general indices for fatigue and rutting (i.e., Nf and RRI, which will be further explained in the following sections) are used since these indices better indicate the fatigue and rutting life of asphalt. The Table of abbreviations with their explanations are shown in Appendix A. In order to achieve higher accuracy in the predicted values, a new ensemble method is developed. The performance of the introduced method is further evaluated by comparing its accuracy with random forest, gradient boosting, decision tree, and multiple linear regression. Ultimately, the effect of each input variable (e.g., RAP percentage and total binder percentage) on the fatigue and rutting prediction models were assessed and prioritized. The methodology flowchart is indicated in Figure 1. The following sub-sections are presented in this section:

•
The process of data preparation • The applied algorithms for the prediction procedure • The description of machine learning performance indicators used to evaluate the accuracy of prediction algorithms • The methods applied to analyze the effects of the model's inputs on output variables

The Data Preparation Process
An accurate model needs to contemplate all affecting features on the intended parameters, which are fatigue and rutting indices, in this study. As this study aims at the fatigue and rutting prediction of asphalt mixes containing RAP, two datasets are extracted and generated from authentic international publications. A dataset including 227 asphalt mixture samples is prepared for the fatigue life modeling . The model's input variables (features) include the RAP percentage, the span of the PG of the virgin binder, the intermediate-temperature performance grade (PG) of the virgin binder, the rejuvenator content, the virgin asphalt content, the asphalt content in RAP, the total asphalt content, the percentage of aggregate smaller than 4.75 mm (fine aggregate), the percentage of aggregate larger than 4.75 mm divided by the percentage of aggregate smaller than 4.75 mm (coarse aggregate/fine aggregate ratio), and the dynamic modulus test temperature. It should be noted that since fatigue life (Nf) is very skewed, the natural logarithmic scale of it is preferred to be considered for the model's output variable since the logarithmic transformation increases the fatigue life prediction accuracy [67]. The fatigue life of the mixes is evaluated to assess the fatigue performance of asphalt pavements. Equation (1) is applied to calculate the fatigue life of the asphalt mixture [68].  (1) where Nf is the allowable cycles of fatigue life, εt is the tensile strain at the bottom of asphalt layers under the wheel (strain) which has been considered 0.0002 strains [69], and |E*| is the dynamic modulus of asphalt mixture (MPa) extracted from dynamic modulus test (AASHTO TP62).
The description of the fatigue prediction input and output variables and their descriptive statistics are given in Table 1. The dataset contains 12 input variables and one output variable. Likewise, a dataset including 226 data samples of asphalt mixes containing RAP is employed for rutting resistance performance prediction [14][15][16]19,68,[70][71][72][73][74][75][76][77][78][79][80][81][82][83][84][85][86][87][88]. Different input variables, including the RAP content, the span of PG of the virgin binder, the hightemperature PG of the virgin binder, the rejuvenator content, the virgin binder content, the binder content in RAP, the total binder content, the nominal maximum aggregate size, the percentage of fine aggregate size, and the coarse aggregate to fine aggregate ratio, are taken into consideration as the model's input features. Meanwhile, the rutting resistance index (RRI) is considered the model's output variable since RRI implies the rutting resistance of asphalt pavement based on the rut depth and the loading cycles reported from the laboratory tests. The RRI can be calculated from Equation (2) [89].
where RRI is the rutting resistance index, N signifies the number of cycles at the end of the test, and RD is the rut depth (inch) (AASHTO T324). In this equation, the maximum allowable rut depth is 1 in [89].
The description of the input and output features and their descriptive statistics are presented in Table 2 for rutting. The dataset contains ten input variables and one output variable. Further, the distributions of fatigue and rutting indices are shown in Figure 2.  It is noteworthy that 75% of data samples are used as training data. Training data is used to tune hyperparameters and train the model. That is, K-fold cross-validation (considering K = 5) and grid search methods are simultaneously used to tune the hyperparameters of prediction techniques. Then, the other 25% of (testing data) is used to evaluate the performance of prediction techniques on unseen data samples.

Prediction Algorithms
As described before, in order to increase the accuracy of the fatigue and rutting prediction, a new hybrid algorithm is proposed, and to assess the performance of this newly developed algorithm, four conventional methods are employed. Before running the machine learning algorithms, the input variables are scaled using the standard scaler in the python SKlearn library. Standard scaler applies Equation (3) to scale the range of all variables.
where, Si implies the scale value of the data sample i, xi is the raw data value of the data sample I, and u and std signify the mean and standard deviation of data samples.
In this section, the utilized algorithms are explained.

Decision Tree
The decision tree (DT) is a prediction technique capable of modeling regression and classification problems. DT simulates the structure of trees, particularly their leaves, and nodes, to divide the dataset into categories with similar characteristics. DT is a straightforward prediction method, and its outcomes are easy to interpret. DT has been widely used for prediction purposes since it can be modeled as a white-box prediction technique. It was demonstrated that DT could face large-scale datasets with a significant number of samples. DT generally works based on some decision rules, determining feature cut-off thresholds. Initially, it is assumed that all data samples constitute a tree. Then the algorithm moves forward and splits samples into different sub-samples using a feature threshold in each leaf node. Subsequently, each sub-sample is divided into smaller sub-samples using another feature threshold. This data sample division is kept on until the model obtains pure sub-samples or termination criteria are met. Afterward, DT employs a backforward pruning technique to cut nonessential splits (branches) to enhance the model's efficiency and decrease the computational complexity [90].

Random Forest
Random forest (RF) is an ensemble prediction technique applied for regression, classification, and outlier detection. RF mainly obtains high prediction accuracy, performs well in large-scale datasets, and reduces the likelihood of over-fitting. It was demonstrated that RF is a computationally efficient prediction method and can perform well even if the number of data samples is inadequate. RF generally employs different randomly generated decision trees. Then, the generated models are aggregated using the bagging technique. That is, a powerful ensemble method is created using bootstrap aggregation decision trees [91]. Consequently, decision tree algorithms are run and predict the model's dependent variable. RF considers the prediction value of all decision trees in the prediction process. Hence, the average value of decision trees' outputs is considered the RF prediction value [3].

Gradient Boosting
Gradient boosting regression is an ensemble prediction method that can be used for regression and classification problems. Gradient boosting regression (GB) combines various decision trees to create a powerful prediction model. GB generates basic decision trees iteratively and combines these weak regressions through boosting process. Additional decision trees are added in each iteration of the boosting process. Hence, a new regression model is generated by GB at each iteration, and the combination method is optimized using a loss function (e.g., absolute error or squared error) and training data. Accordingly, the model seeks to add basic trees that can reduce the loss function to the prediction model. It has been demonstrated that GB is a powerful method for databases containing unbalanced data and outliers, and it can guarantee prediction efficiency for these databases [92].

Multiple Linear Regression
Multiple linear regression (MLR) is a conventional technique applied for prediction, data representation, and an indication of relevant feature relationships in cases where data follow a linear pattern. MLR is a quick and straightforward method that is easy to interpret since it is a white-box prediction technique. However, the regression structure in MLR should be determined before running the model, and the predefined structure may reduce the model's flexibility and prediction accuracy [93].

The Proposed Prediction Method (COA-KNN)
As mentioned, a new ensemble regression method is introduced in this study to predict asphalt's fatigue index and rutting index accurately. The proposed model is generated using the coyote optimization algorithm (COA) as a robust metaheuristic algorithm and K-nearest neighbors regression (KNN) as a powerful prediction technique. COA and KNN are described in this section, and afterward, the proposed ensemble regression method is presented.
COA is a novel swarm intelligence optimization algorithm proposed by Pierezan and Coelho. This algorithm mimics the social behaviors and interactions of coyote Canis latrans. COA assumes that each solution vector is a coyote in the feasible region (coyote society). The coyote's behavior is modeled to investigate the feasible region and finds the optimal solution. Each coyote is a vector representing the values for each independent variable of the optimization problem. First, some coyotes are generated by allocating random values to all independent variables. Afterward, the fitness values of coyotes are calculated using the problem's objective function. Consequently, the coyotes are categorized into different herds. That is, solution vectors are divided into various groups to investigate different parts of the feasible region at the same time and avoid sticking to locally optimal solutions. Subsequently, the coyotes are compared based on their fitness value, and the most valuable coyotes (highest fitness value) in each group are called alpha [94].
Then, the transfer culture operation is performed, and all coyotes are attracted by their groupmates and their groups' alpha. Hence, they are transferred to points, which are located near their alpha and their groupmates. The coyotes with higher fitness values have higher attraction than others. Therefore, more coyotes are accumulated in areas with better previous experiences. Meanwhile, some solution vectors are transferred between groups in each iteration. This process reduces the likelihood of sticking to local optimal solutions by scattering some solutions in the problem's feasible region. Ultimately, the death and birth operator is performed to enhance the algorithm's performance. In this operation, the weakest solution vectors are removed from society (death), and new random solution vectors are created (birth) to investigate new areas in the feasible region. These processes are performed in all iterations, and after meeting the termination criteria, the algorithm stops working, and the solution vector with the highest fitness value (best coyote) is reported as the optimization problem's optimal solution [95].
K-nearest neighborhood regression (KNN) is a robust regression model applied for statistical analysis and prediction since the 1970s. KNN is a non-parametric regression model that predicts the output values considering the resembling data points in the data set. That is to say; all training data points are plotted in an F-dimension space according to the prediction problem's input values, where F implies the number of input values in the prediction problem. Subsequently, the distance of each data point to all data samples is computed. Then, the output value of K nearest data samples is employed to predict the output value of the mentioned data point. Among K nearest data samples, the data points with lower distances affect the output value more than the data samples with longer distances. However, the model is highly sensitive to the value of K, and the model may not obtain accurate predictions if appropriate values are not considered for K [96]. To this end, the proposed model combines different KNN models with different K values to prevail in the mentioned deficiency.
As previously mentioned, COA and KNN are combined to propose a new ensemble prediction method. In this regard, an optimization problem is modeled, which is indicated in Equations (4) Where yi pre and yi exp signify the prediction value and real value of validation data sample i. yi KNNk denotes the prediction value of validation data sample i predicted by the KNN model k. c0 and ck are the constant value and the coefficient of the KNN model k. That is, c0 and ck are the optimization decision variables that should be optimized using COA. I is the number of validation data, and K is the number of KNN models applied in the proposed ensemble method. Equation (4) is the objective function of the optimization problem. As can be seen, the introduced model aims to minimize the mean absolute error of validation data by finding the optimal coefficient of each KNN model. Training data are used to run the model. Then, the trained model predicts the response variable (i.e., rutting or fatigue) for the validation data (yi KNNk ). Afterward, the real validation data and their predicted values are used to evaluate the mean absolute error. The optimization model aims to minimize the mean absolute error by finding optimal values of decision variables. Equation (5) is the constraint of the optimization problem, in which the prediction value is calculated using the values predicted by different KNN models, coefficients, and the optimal constant value. In this investigation, ten KNN models with different K values from 1 to 10 are modeled. However, it is recommended that the number of KNN is increased in the cases where the number of data points is higher than that of the current study. Afterward, these ten KNN models are run. Consequently, the validation data are employed in the models, and the validation data output value for all KNN models is estimated. Then, the optimal constant value and the optimal coefficients for each KNN model are obtained by solving the optimization problem (using COA). The KNN models with the coefficient of zero are removed from the ensemble model, and the prediction values are computed using the KNN models with their non-zero coefficients and optimal constant value. This process removes the models with inappropriate K values, and only robust KNN models remain in the proposed prediction method. Ultimately, testing data are applied to assess the prediction accuracy of the proposed model and other conventional prediction techniques. Since COA is a metaheuristic optimization algorithm, it was run five times to find the optimal structure of KNN models [97].

The Evaluation of Algorithms Performance Using the Performance Indicators
The accuracy of prediction algorithms is evaluated using machine learning performance indicators. Consequently, these performance indicators are utilized to compare the performance of prediction algorithms and detect the most accurate model. In this study, coefficient of determination (R 2 ), mean absolute error (MAE), mean absolute percentage error (MAPE), mean squared error (MSE), variance account for (VAF), and A10-index are considered machine learning performance indicators [98]. These indicators are presented in this part. R 2 is the proportion of the variance in the dependent variable that is predictable from the independent variables, calculated using Equation (6)  Mean absolute error (MAE) is an indicator used for measuring the arithmetic average of deviations between predicted and real values. MAE is estimated using Equation (7) [100].

Prioritizing Features on Asphalt Mixture Characteristics
A good practice in modeling a prediction technique is to analyze the sensitivity of potential variables to the prediction model [102]. In ensemble learning techniques, the important weight represents the sensitivity of potential variables to the prediction model. In this study, the effect of each input variable (e.g., RAP content, total binder content, the span of the PG of the virgin binder, etc.) on the fatigue and rutting prediction models is investigated. In this regard, prediction models are run for each dataset. Afterward, the models with the highest accuracy are detected. Subsequently, the average importance weight of the most accurate models is used to prioritize each input feature for each dataset. The ranking of input variables on fatigue and rutting performance can determine the essential parameters in optimizing fatigue and rutting performance of asphalt mixes containing RAP, which is an optimal approach to modify the mixture proportion or mixture conditioning considering the most important input variables.

Results and Discussion
As previously mentioned, this study aims to propose a novel approach to accurately predict two major damages (i.e., fatigue and rutting) in asphalt mixtures containing different RAP contents using two general indices. To this end, various robust prediction techniques, including random forest, gradient boosting, decision tree, and multiple linear regression, are employed. Moreover, a new hybrid machine learning method is introduced using a combination of COA and KNN. In this section, the following steps are described, respectively: • The performance of prediction algorithms on fatigue and rutting prediction is analyzed.

•
The effects of each variable on fatigue and rutting characteristics are presented, and then the mentioned variables are prioritized based on their effects on the prediction performance.

•
The error histogram of the fatigue and rutting models is presented.

Performance of Algorithms
As mentioned, four conventional machine learning algorithms are used for the prediction process. Moreover, a new ensemble method, called COA-KNN, is proposed to increase the prediction accuracy. The developed model and the conventional prediction algorithms are then run on each dataset (i.e., fatigue and rutting) individually in order to predict the fatigue and rutting behavior of asphalt mixes containing RAP. In this section, the performance of prediction techniques on fatigue and rutting prediction problems are discussed. The performance of prediction techniques is compared through the machine learning performance indicators.

Performance of Machine Learning Algorithms-Fatigue
The obtained prediction model for fatigue, using multiple linear regression, is shown in Equation (12). It should be noted that the scaled values of the coefficients are given in this equation, so S refers to the parameters' scaled value. In multiple linear regression, the coefficient of each feature indicates the importance and the effect of the feature on the damage. That is, a greater coefficient implies that the feature has more impact on the model's output (i.e., fatigue). According to Equation (12), since the total binder content coefficient is 0.137, it is the most important feature of fatigue. In addition, the coarse aggregate/fine aggregate, with a coefficient value of 0.010, has the lowest impact on fatigue performance. Based on the results, the most effective feature on fatigue performance of asphalt mixtures containing RAP is the virgin binder content, followed by RAP content, the intermediate-temperature PG of virgin binder, the span of PG of virgin added binder, the nominal maximum aggregate size in gradation, the virgin binder content, the fine aggregate, the rejuvenator content, the dynamic modulus test temperature, the binder content in RAP, and the coarse aggregate/fine aggregate, respectively.
In the COA-KNN algorithm, to achieve the final predicted values, different KNN models with different values for K (the number of neighbors in KNN) are applied simultaneously. The result of each KNN model is multiplied by an optimal coefficient to present the ultimate prediction value. The optimal coefficients of the mentioned ensemble method are determined by an optimization algorithm (i.e., COA).
The predicted values of KNN models for fatigue performance are given in Equation (13). As can be seen, only four KNN models are available in the final equation (Equation (13)), and this is because of allocating a coefficient of zero to other KNN models in the optimization process.
where KNN (i) signifies the KNN model considering the number of neighbors (K) is equal to i. To predict the fatigue performance, the predicted value by KNN (5) is multiplied by 0.079 and is summed up with the predicted value by KNN(8) multiplied by 0.004, then again is summed up with the predicted value by KNN(10) multiplied by 0.918. As mentioned, the coefficient of other KNN models is zero, i.e., they are not chosen by the model. That is, three KNN models predict fatigue performance, and COA-KNN applies these predicted values to predict the ultimate prediction value. The machine learning performance indicators of prediction algorithms on the fatigue index prediction problem for testing and training data are indicated in Figure 3. The machine learning performance indicators were used to evaluate the performance of the developed model in comparison to the conventional methods. Regarding Figure 3, the testing data R 2 of COA-KNN are 0.07 more than the gradient boosting, 0.10 more than the random forest, 0.26 more than the decision tree, and 0.35 more than multiple linear regression. Thus, these testing data R 2 obtained using COA-KNN are more than the other conventional techniques, indicating the proposed method's higher accuracy. Based on these testing data MAE performance indicators, the lowest MAE of test data belongs to COA-KNN, followed by gradient boosting, random forest, decision tree, and multiple linear regression. The MAEs of COA-KNN test data are 0.01, 0.02, 0.04, and 0.07 ln (cycles), lower than the MAE test data in gradient boosting, random forest, decision tree, and multiple linear regression, respectively. Hence, it can be theorized that COA-KNN outperforms other prediction techniques in terms of MAE. Similarly, the MAPE COA-KNN test data are 0.04% lower than gradient boosting, 0.11% lower than random forest, 0.21% lower than decision tree, and 0.38% lower than multiple linear regression. Thus, it can be postulated that the introduced prediction model (COA-KNN) is more accurate than the other methods regarding MAPE. The same trend can be observed from the testing data MSE. That is, the MSE test data obtained using COA-KNN are 0.01, 0.01, 0.03, and 0.03 (ln (cycles)) 2 lower than gradient boosting, random forest, decision tree, and multiple linear regression, respectively. The VAF test data of COA-KNN are 5.77%, 5.97%, 14.85%, and 47.47% higher than gradient boosting, random forest, decision tree, and multiple linear regression, in the order given. However, the highest testing data A10-index is reached by decision tree, which is 5.7% higher than COA-KNN.
Considering all performance indicators, it can be deduced that the introduced ensemble regression method (i.e., COA-KNN) outperforms all conventional prediction techniques. In addition, among the conventional models, random forest and gradient boosting have acceptable performances. Nonetheless, linear regression and decision tree regression may not be qualified to predict fatigue damages.

Performance of Machine Learning Algorithms-Rutting
The multiple linear regression prediction equation for rutting performance is presented in Equation (14).
where Rutting represents the predicted value of the rutting index, SRAP, SPGspan, SPGhigh, SRejuvenator, SACvirgin, SACRAP, SACTotal, SNMAS, SAgg.below#4, and SAgg.above#4/Agg.below#4 imply the scaled value of the RAP content, the span of PG of virgin added binder, the high-temperature PG of virgin binder, the rejuvenator content, the virgin binder content, the binder content in RAP, the total binder content, the nominal maximum aggregate size in gradation, the percentage of fine aggregate, and the coarse aggregate/fine aggregate value.
Based on Equation (14), it can be observed that the rutting performance is highly affected by the coarse aggregate/fine aggregate value with a coefficient of −2684.74, indicating the corresponding impact is negative. Meanwhile, the NMAS coefficient in aggregate gradation shows that aggregate gradation has the lowest effect on the rutting index, with a value of 197.983. Other effective input features on the rutting performance of asphalt mixtures containing RAP are the virgin binder content, the total binder content, the span of PG of virgin added binder, the binder content in RAP, the high-temperature PG of virgin binder, the rejuvenator content, the RAP content, and the percentage of fine aggregate, respectively.
The KNN prediction model for rutting performance is shown in Equation (15 (6) and KNN (7), respectively. The obtained value is summed up with the optimal constant (i.e., 51.562), and the ultimate value is the COA-KNN rutting performance prediction. The machine learning performance indicators of prediction algorithms on the rutting index prediction problem for testing and training data are indicated in Figure 4. As can be seen, the testing data R 2 reaches the maximum value using COA-KNN. The testing data R 2 of COA-KNN is 0.050 more than random forest, 0.091 more than gradient boosting, 0.222 more than decision tree, and 0.806 more than multiple linear regression. Hence, COA-KNN outweighs other prediction algorithms based on testing data R 2 . Considering the MAE test data, the MAE test data of COA-KNN are 361.44 inches lower than random forest, 890.44 inches lower than gradient boosting, 897.21 inches lower than decision tree, and 3123.89 inches lower than multiple linear regression. Accordingly, COA-KNN is the most accurate algorithm in terms of error reduction. Similarly, the MAPE test data of COA-KNN are 1.44%, 5.06%, 5.70%, and 22.60% lower than that of random forest, decision tree, gradient boosting, and multiple linear regression, in the order mentioned. Regarding the MSE, the lowest MSE test data are related to COA-KNN. The MSE test data of COA-KNN are significantly lower than conventional methods, which are 1,297,190 inches 2 less than random forest, 2,506,610 inches 2 less than gradient boosting, 7,206,180 inches 2 less than decision tree, and 19,897,280 inches 2 less than multiple linear regression. The VAF test data of COA-KNN are 6.6%, 11.72%, 28.97%, and 60.87% higher than random forest, gradient boosting, decision tree, and multiple linear regression. Likewise, the highest A10 index is obtained using COA-KNN, with a value of 98.73%. Thus, it can be postulated that COA-KNN performs better than conventional prediction techniques. To this end, the introduced ensemble method predicts the rutting damage with the highest accuracy, while the multiple linear regression has the worst performance. Among the other conventional algorithm, the prediction models of random forest and gradient boosting are more accurate.

Relative Influence of Variables
It is essential to assess the effects of each input variable (e.g., RAP percentage) on the prediction model's output (i.e., fatigue and rutting) to understand better how asphalt performance can be improved. Since the results show that multiple linear regression is the worst algorithm based on performance indicators, the outcomes of Equation (12) and Equation (14) are not trustable, and a practical approach should be considered to analyze model parameters meticulously. To this end, the importance weight values presented by Random Forest and Gradient Boosting (the most accurate conventional techniques) are used to prioritize models' features. Hence, the average value of importance weights presented by the mentioned algorithms is employed in order to consider both algorithms.
That is, the average importance weights values of the two models are calculated, and subsequently, the features are prioritized based on it. The feature with the largest average important weight value is considered a variable that affects the model's output the most. Table 3 shows the important weight values of the random forest and gradient boosting for fatigue performance of asphalt mixtures containing RAP. As can be seen from Table 3, the total binder content in asphalt mixes affects the fatigue performance the most since the binder content is responsible for the stiffness of asphalt mixes which is in line with the results presented by Sreedhar and Coleri [103]. The second important feature in fatigue damage is the virgin binder added to the recycled asphalt mixtures due to the fact that the virgin binder lessens the stiffness of the mixture containing stiff binder in RAP; therefore, it increases the fatigue life of the mixture [17]. The intermediate-temperature PG of virgin binder is the next feature that can affect fatigue cracking since fatigue cracking occurs in the intermediate temperature of the asphalt pavement environment [104]. It is worth noting that RAP content is the fourth important feature of fatigue performance of mixtures containing RAP as it makes the mixture stiffer [17]. Other input features affecting fatigue cracking in asphalt pavements are the dynamic modulus test temperature, the asphalt content in RAP, the span of PG of virgin added binder, the percentage of aggregate larger than 4.75 mm, and the coarse aggregate/fine aggregate value, the nominal maximum aggregate size in gradation, the percentage of aggregate smaller than 4.75 mm and the rejuvenator content, in the order given. Hence, it can be concluded that the total binder content, the virgin binder added to the recycled asphalt mixtures, and the intermediate-temperature PG of the virgin binder are the most important parameters affecting the fatigue index. In this regard, considering these parameters to enhance recycled asphalt's fatigue performance can be an optimal approach. The importance weight values of random forest and gradient boosting and their average value for input variables in the rutting performance of asphalt mixtures containing RAP are in Table 4. As can be perceived, the most influential parameter on rutting performance is the PG span of the virgin binder added to the recycled asphalt mixture. This rheological property of binder (i.e., PG span) plays the main role in asphalt pavement rutting performance, and this result is in harmony with the outcomes of Taher et al. [105].

Fatigue Input Feature Performance
The following important feature in rutting is the total binder content in the mixture followed by the coarse aggregate/fine aggregate value, as the binder in mixes affects the friction between aggregates and causes these deformations [106]. Furthermore, investigations have shown that the coarser gradations in asphalt mixes perform better in rutting than fine gradations [107], which is in accordance with the results of the present study. RAP content in recycled asphalt mixes is the fourth feature affecting the rutting damage, which indicates the improvement in rutting performance due to the stiff binder of RAP [31]. The binder content in RAP, the percentage of fine aggregate, the high-temperature PG of added virgin binder, the virgin binder content, the nominal maximum aggregate size in gradation, and the rejuvenator content, are the following essential parameters on rutting, respectively. Therefore, optimizing the PG span of the virgin binder, the total binder content in the mixture, and the coarse aggregate/fine aggregate value should be considered the priority in the cases where rutting performance enhancement is investigated.

Error Histogram
The performance of prediction algorithms can be evaluated by comparing the predicted values with the measured values at laboratories. The lower distance of data points from the quality lines indicates higher accuracy and less error in the method [25]. It is noteworthy that the data points used for the charts belong to the training and testing datasets. Figure 5 depicts the predicted values versus the experimental values of the natural logarithm of the fatigue index. As can be seen from Figure 5, in COA-KNN, approximately all data points are on or close to the quality line, which means the appropriate correlation between the predicted and experimental values. Random forest and gradient boosting have acceptable performances compared to other conventional algorithms. The data points in decision tree show more distance from the quality line followed by multiple linear regression, indicating a higher error in contrast to other prediction algorithms. Overall, it can be concluded that the introduced ensemble model outperforms the previously developed methods with significantly higher accuracy and lower error.

Accuracy of the Machine Learning Methods-Rutting Prediction
The graphs of the predicted data points versus measured data points for the rutting performance of asphalt mixtures containing RAP are represented in Figure 6. As can be seen, the best prediction performance is for the newly developed algorithm (COA-KNN) due to the accumulation of all data points on or around the quality line except for just two data samples. Additionally, among the conventional algorithms, random forest and gradient boosting perform admissibly as most data points are near the quality line. The data points in the decision tree graph show further distance from the quality line, indicating a higher error in comparison to the foregoing methods. Likewise, except for one point, none of the data points in multiple linear regression are on the quality line, implying the predicted values of the model are not reliable. To this end, it can be concluded that the proposed ensemble regression method exceeds in performance compared to the mentioned techniques.

Conclusions
This study aims to accurately predict the fatigue and rutting performances of asphalt mixtures containing different contents of RAP. Contrary to previous investigations, fatigue and rutting indices are used instead of dynamic modulus and rut depth, as these indices directly indicate the fatigue and rutting life of asphalt pavements. Random forest, gradient boosting, decision tree, and multiple linear regression were utilized as conventional prediction tools. Meanwhile, a new hybrid ensemble method, called COA-KNN, was developed to improve the prediction accuracy, and the results were compared against conventional prediction algorithms. COA-KNN was further applied to two databases of fatigue and rutting performances. The following conclusions can be drawn from the results of this investigation:

•
The R 2 values of COA-KNN test data are 0.07, 0.10, 0.26, and 0.35 more than that of the gradient boosting, random forest, decision tree, and multiple linear regression, respectively, in the fatigue performance prediction.

•
Applying COA-KNN to the fatigue database reduces the MAE test data by 0.01, 0.02, 0.04, and 0.07 ln (cycles) compared to gradient boosting, random forest, decision tree, and multiple linear regression, respectively.

•
The R 2 values of test data attained using COA-KNN for rutting performance are 0.050, 0.091, 0.222, and 0.806 more than random forest, gradient boosting, decision tree, and multiple linear regression, respectively.

•
The MAE COA-KNN test data are 361.44, 890.44, 897.21, and 3123.89 inches lower than random forest, gradient boosting, decision tree, and multiple linear regression, respectively, in rutting performance prediction.

•
Replacing COA-KNN on the fatigue database with random forest, decision tree, gradient boosting, and multiple linear regression can reduce the MAPE test data by 1.44%, 5.06%, 5.70%, and 22.60%, respectively.

•
The MSE test data of COA-KNN are 1297190, 2506610, 7206180, and 19897280 inches 2 less than the random forest, gradient boosting, decision tree, and multiple linear regression, respectively, for the rutting database.

•
Considering the performance indicators, COA-KNN outperforms the other conventional algorithms in fatigue and rutting predictions. Moreover, random forest and gradient boosting methods had an appropriate accuracy, compared to other methods, in both datasets.

•
Based on the importance weights of random forest and gradient boosting, the most effective features on fatigue performance of asphalt mixes containing RAP are the total binder content in the mix, the virgin binder content, and the intermediate-temperature PG of the virgin binder, respectively.

•
According to the ranking of the variables on the rutting characteristic, the PG span of the virgin binder, total binder content, and the coarse-to-fine aggregate ratio are the most effective parameters on the rutting damage of recycled asphalt pavements.

•
One of the limitations of this study is to apply a limited number of machine learning algorithms. Hence, it is recommended that other machine learning techniques, such as artificial neural networks or Gaussian Processes [108], will be applied to predict rutting and fatigue in future studies, and their performance will be compared with the proposed method in the current investigation. Funding: This research received no external funding.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A