Ensemble Machine-Learning-Based Prediction Models for the Compressive Strength of Recycled Powder Mortar

Recycled powder (RP) is a promising substitute for cementitious materials in concrete. The compressive strength of RP mortar is a pivotal factor affecting the mechanical properties of RP concrete. Applying machine learning (ML) approaches to engineering problems, particularly for predicting the mechanical properties of construction materials, offers high prediction accuracy at low experimental cost. In this study, 204 groups of RP mortar compression experimental data are collected from the literature to establish a dataset for ML, with 163 groups in the training set and 41 groups in the test set. Four ensemble ML models, namely eXtreme Gradient-Boosting (XGBoost), Random Forest (RF), Light Gradient-Boosting Machine (LightGBM) and Adaptive Boosting (AdaBoost), were selected to predict the compressive strength of RP mortar. The comparative results demonstrate that XGBoost has the highest prediction accuracy: the a10-index, MAE, RMSE and R2 are 0.926, 1.596, 2.155 and 0.950 on the training set and 0.659, 3.182, 4.285 and 0.842 on the test set, respectively. SHapley Additive exPlanation (SHAP) is adopted to interpret the prediction process of XGBoost and to explain the influence of the input factors on the compressive strength of RP mortar. In order of importance, the influencing factors rank as follows: the mass replacement rate of RP, the particle size of RP, the kind of RP and the water-to-binder ratio. The compressive strength of RP mortar decreases with increasing RP mass replacement rate, and the compressive strength of recycled brick powder (RBP) mortar is slightly higher than that of recycled concrete powder (RCP) mortar. Machine learning technologies will benefit the construction industry by enabling rapid and cost-effective evaluation of RP material properties.


Introduction
With the rapid urbanization of the world, a large amount of construction and demolition (C&D) waste has been generated; for example, the United States and China produce 700 and 1800 million tons of C&D waste each year, respectively [1]. C&D waste is usually disposed of by dumping and landfilling, which not only causes air and soil pollution but also places enormous pressure on limited landfill capacity [2]. Since most C&D waste can serve as a substitute for aggregates and mortar constituents, the recycling of C&D waste is a promising way of conserving natural resources [3].
There has been extensive research on the resourceful reuse of C&D waste, which can be generally divided into two categories: recycled aggregate concrete (RAC) and recycled powder concrete (RPC). At present, there are substantial experimental, numerical and theoretical studies on the strength of RAC. Researchers have proposed methods based on aggregate skeleton theory, gray correlation analysis, finite element analysis and so on to predict the compressive strength of RAC and analyze its influencing factors [4][5][6][7].
Recycled powder (RP) is a by-product of recycling C&D waste. The process of recovering RP from C&D waste can be divided into three stages: collecting the C&D waste, sorting the raw materials and grinding them into powder. Figure 1 shows the RP preparation flow chart. Research on RP is still at an initial stage, and few empirical formulas are available. Spurred by the constantly evolving requirements of the construction industry, researchers have used machine learning (ML) to predict the compressive strength of new types of concrete, such as bio-concrete [15], recycled aggregate concrete [16] and fiber concrete [17]. In addition, ML can be applied to predict the mechanical properties and durability of reinforced concrete structures and to evaluate their service life. Some scholars have used ML models to analyze reinforced concrete beams [18], squat reinforced concrete walls [19], reinforced concrete slabs and other engineering problems [20,21]. Whatever their accuracy, however, the black-box nature of the predictions makes ML models hard to interpret. The emergence of SHapley Additive exPlanation (SHAP) addresses this problem: SHAP can reasonably explain the interaction of variables in an ML model and the influence of feature values on the results. Researchers have applied explainable ML combined with SHAP to predict the mechanical properties, durability and working performance of reinforced concrete beam [22], slab [23] and column members [24] and to explain the prediction process. The application of ensemble learning to concrete structures is of great interest to researchers [25][26][27][28][29]. Ensemble learning combines multiple individual learners into one learner to complete a learning task. According to how the individual learners are generated, current ensemble learning methods can be roughly divided into three categories: boosting, bagging and stacking.
Ensemble learning integrates several learners to achieve better performance than a single learner [30][31][32]. All three kinds of ensemble learning are widely used for concrete structures. Vimal et al. [33] predicted the compressive strength of concrete with a boosting ML algorithm. Ahmad et al. [34] used a bagging model to predict the compressive strength of concrete containing supplementary cementing materials. Gupta et al. [35] estimated the compressive strength of geopolymer composites with a stacking model.
To date, studies on the compressive strength of RP mortar have been conducted only through experiments. To the authors' knowledge, no studies have been carried out (1) using an ensemble ML model to predict the compressive strength of RP mortar, (2) explaining the prediction mechanism of compressive strength based on ML, or (3) on the factors affecting the compressive strength of RP mortar. This paper aims to (1) develop an ensemble ML model to predict the compressive strength of RP mortar and (2) explain the contribution of the input variables to the prediction results. Therefore, four different ensemble ML-based algorithms are developed to predict the compressive strength of RP mortar, and the underlying relationship between the input influential factors and the output strength is also revealed.
This research consists of four stages: firstly, 204 groups of RCP and RBP mortar compressive strength test results are collected from the relevant literature; secondly, the eXtreme Gradient-Boosting (XGBoost), Random Forest (RF), Light Gradient-Boosting Machine (LightGBM) and Adaptive Boosting (AdaBoost) algorithms are applied, and their hyperparameters are optimized to establish the strength prediction models; thirdly, the accuracy and generalization ability of the ML-based models are evaluated with the a10-index, root mean square error, mean absolute error and coefficient of determination; finally, SHAP is used to illustrate the prediction process of the best model and to investigate the interpretability of the influencing factors. The RP mortar compressive strength test requires considerable manpower, materials and time. The ML method needs only data to predict the compressive strength: it can assess the combined effect of the influencing factors on the compressive strength of RP mortar without experimental work, providing a reference for potential users of RP.

Data Collection and Analysis
A large amount of RP mortar test data is required to establish the compressive strength prediction model. To this end, 125 groups of RCP mortar strength data and 79 groups of RBP mortar strength data are collected from the previous literature. The cement used in the datasets is P.O 42.5 or P.O 42.5R. The complete dataset and its sources are listed in Appendix A.
It should be noted that the acceptable number of data points for ML modeling should be greater than 10 times the number of input variables [36]. According to the literature review, four input variables are selected: the mass replacement rate (MRR) of RP (%), the kind of RP, the water-to-binder ratio (W/B) and the particle size of RP (µm); the mortar compressive strength fc (MPa) serves as the output variable [37,38]. Therefore, the dataset size (204) chosen in this paper meets the data-number requirement for ML modeling. Figure 2 depicts the Pearson correlation coefficient plot for each variable. It is found that the compressive strength of RP mortar is strongly negatively correlated with MRR, whereas the other variables show weak or no correlation. In addition, the distribution of all variables is shown in Figure 3, with the cumulative percentage of each variable shown by the orange curve. The distribution of each variable is as follows: 125 groups of RCP, accounting for 61%, are classified as kind 0; 79 groups of RBP, accounting for 39%, are classified as kind 1. The particle size of RP refers to the average particle size; the distribution range is 0-100 µm, mainly within 10-30 µm. The W/B values are 0.35, 0.4, 0.5 and 0.55, with 0.5 accounting for the highest proportion [39,40]. The mass replacement rate of RP ranges from 0 to 60%, with 30% being the most frequently used in experiments [41]. As the output variable, the mortar compressive strength ranges from 16.60 to 56.80 MPa, mostly between 30 and 50 MPa.
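As a sketch of this inspection step, the Pearson correlation between each variable and the strength can be computed with pandas. The column names and the tiny sample below are illustrative placeholders, not the paper's actual dataset.

```python
import pandas as pd

# Four input variables and one output, mirroring the paper's setup
# (values are synthetic, for illustration only).
df = pd.DataFrame({
    "MRR":  [0, 15, 30, 45, 60, 30],                 # mass replacement rate of RP (%)
    "kind": [0, 0, 0, 1, 1, 1],                      # 0 = RCP, 1 = RBP
    "W_B":  [0.5, 0.5, 0.4, 0.5, 0.35, 0.55],        # water-to-binder ratio
    "size": [20, 25, 15, 30, 10, 22],                # mean particle size (um)
    "fc":   [52.0, 45.3, 40.1, 38.6, 31.2, 36.4],    # compressive strength (MPa)
})

# Pearson correlation matrix, as plotted in Figure 2 of the paper.
corr = df.corr(method="pearson")
# corr.loc["fc", "MRR"] is strongly negative for this toy sample,
# matching the trend reported in the text.
```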

Data Preprocessing
After collecting the raw data, a detailed dataset is obtained by filling in the missing values and processing the outliers [42]. However, the variables have different properties (continuous, discrete) and orders of magnitude; training on them directly would weaken the influence of data with lower orders of magnitude, so the data also need to be standardized (target values usually do not need to be scaled). Data standardization turns dimensioned data into dimensionless data and brings data of different magnitudes to the same scale, so that data of different dimensions can be compared, which improves the prediction performance of the ML model. The z-score standardization can be expressed as $z = (x - \mu)/\sigma$, where x is the raw datum, µ is the mean of the data, and σ is the standard deviation [43].
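A minimal sketch of the z-score standardization described above (not the paper's own code):

```python
import numpy as np

def zscore(x):
    """Z-score standardization: z = (x - mu) / sigma."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Example: standardizing the four W/B levels used in the dataset.
raw = np.array([0.35, 0.40, 0.50, 0.55])
z = zscore(raw)
# Standardized data have zero mean and unit standard deviation.
```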

ML Model Algorithm
The models were developed in Python (version 3.6.4) using the PyCharm IDE (version 2017.1), which provides a set of tools that improve productivity when developing in Python.
Four ensemble ML algorithms, XGBoost, RF, LightGBM and AdaBoost, are selected from among the many ML algorithms to predict the compressive strength of RP mortar. Ensemble learning algorithms accomplish learning tasks by building and combining multiple learners, so they can often achieve better generalization performance than a single learner [44]. At present, there are three groups of ensemble learning algorithms: bagging-based, boosting-based and stacking-based algorithms. RF [45,46] is a typical bagging-based algorithm, while representative boosting-based algorithms include AdaBoost, XGBoost, LightGBM, etc. [47]. The stacking algorithm combines multiple base learners to generate final predictions that are more accurate than those of a single model; for example, the SVR, XGBoost and GBDT algorithms can be used to build a stacking model [48]. In this section, the four ML algorithms used in this study are briefly explained, which helps in understanding their principles and comparing their differences. Figure 4 depicts a flowchart of the research method of this paper.
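The stacking idea mentioned above can be sketched with scikit-learn's StackingRegressor. This is an illustrative example on synthetic data, using SVR and a classical GBDT as base learners (the paper cites SVR, XGBoost and GBDT as one possible combination); it is not the paper's implementation.

```python
import numpy as np
from sklearn.ensemble import StackingRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(size=(80, 4))                         # synthetic inputs
y = 50 - 20 * X[:, 0] + rng.normal(scale=1.0, size=80)  # synthetic strength

# Stacking: cross-validated predictions of the base learners become the
# features of a final meta-learner.
stack = StackingRegressor(
    estimators=[("svr", SVR()),
                ("gbdt", GradientBoostingRegressor(random_state=0))],
    final_estimator=LinearRegression(),
)
stack.fit(X, y)
pred = stack.predict(X[:5])
```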

eXtreme Gradient-Boosting (XGBoost)
XGBoost is an efficient, flexible and lightweight gradient-boosting decision tree algorithm [49]. XGBoost shares the same basic idea as GBDT but adds many optimizations and differs from LightGBM [50]. For example, a second-order Taylor expansion is used for the loss function to improve calculation accuracy; a regularization term simplifies the model and helps avoid overfitting; and a block storage structure is adopted to enable parallel computing. As a forward additive model, it follows the boosting idea of integrating multiple weak learners into a strong learner. Figure 5 demonstrates the derivation steps of the XGBoost algorithm.
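The boosting pattern XGBoost follows can be illustrated with scikit-learn's GradientBoostingRegressor on synthetic data; this classical GBDT is used here as a stand-in (XGBoost adds the second-order expansion, regularization and block-parallel refinements described above, and exposes a similar fit/predict API via XGBRegressor). This sketch is not the paper's model.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
X = rng.uniform(size=(120, 4))   # synthetic stand-ins for MRR, kind, W/B, size
y = 50 - 25 * X[:, 0] + 3 * X[:, 2] + rng.normal(scale=1.0, size=120)

# Boosting: each new tree is fit to the residual errors of the
# current ensemble, then added with a shrinkage factor (learning_rate).
model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                  max_depth=3, random_state=42)
model.fit(X, y)
r2 = model.score(X, y)  # training-set R^2
```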


Random Forest (RF)
RF adopts the idea of bagging [51], with the following steps: (1) each time, samples are drawn with replacement from the original data to form a new training set; (2) the new training sets are used to train n sub-models; (3) for regression problems, the predicted value is obtained by simple averaging of the sub-model outputs [52]. The training and modeling process of the RF method is shown in Figure 6. RF takes decision trees as the basic unit; a large number of decision trees are integrated to form a random forest [53,54].
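The three steps above can be sketched with scikit-learn's RandomForestRegressor, which performs the bootstrap resampling and averaging internally. Synthetic data, for illustration only:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(100, 4))                           # synthetic inputs
y = 45 - 20 * X[:, 0] + rng.normal(scale=1.0, size=100)  # synthetic strength

# Bagging: each tree trains on a bootstrap resample of the data
# (bootstrap=True); regression predictions are the average over trees.
rf = RandomForestRegressor(n_estimators=100, bootstrap=True, random_state=1)
rf.fit(X, y)
pred = rf.predict(X[:3])
```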
Materials 2023, 16, x FOR PEER REVIEW


Light Gradient-Boosting Machine (LightGBM)
LightGBM is also an efficient implementation of GBDT. In principle, LightGBM is similar to XGBoost, but XGBoost consumes considerable space and time; to avoid these defects, LightGBM optimizes the traditional GBDT algorithm [55,56]. LightGBM is a histogram-based decision tree algorithm, which reduces the memory footprint and computation time. Gradient-based One-Side Sampling (GOSS) is used to exclude most samples with small gradients, so that only the remaining samples are used to calculate the information gain, balancing the reduction in data volume against accuracy. With Exclusive Feature Bundling (EFB), many mutually exclusive features are bundled into a single feature to reduce dimensionality. The leaf-wise growth algorithm with depth restriction controls the complexity of the model, ensuring high efficiency while preventing overfitting [57]. Figure 7 shows a schematic diagram of the LightGBM optimization methods.


Adaptive Boosting (AdaBoost)
AdaBoost, short for adaptive boosting, was proposed by Freund et al. [58]. Its adaptivity lies in its treatment of the training data: each sample is assigned a weight; wrongly predicted samples are identified and passed to the next base learner with increased weight, while the weights of correctly predicted samples are reduced. In each iteration a new weak learner is added, and the final strong learner is determined only when a predetermined error rate becomes small enough or the predetermined maximum number of iterations is reached. However, AdaBoost is sensitive to abnormal samples, which may receive high weights during iteration and thus affect the prediction accuracy. The expression for AdaBoost is given in Equation (2).
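The reweighting loop described above can be sketched with scikit-learn's AdaBoostRegressor (whose default weak learner is a depth-3 decision tree) on synthetic data; this is an illustrative stand-in, not the paper's implementation.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

rng = np.random.default_rng(3)
X = rng.uniform(size=(100, 4))                           # synthetic inputs
y = 46 - 18 * X[:, 0] + rng.normal(scale=1.0, size=100)  # synthetic strength

# Adaptive boosting: samples mispredicted by the current weak learner
# receive larger weights in the next round; the final prediction combines
# all weak learners according to their accuracy.
ada = AdaBoostRegressor(n_estimators=100, random_state=3)
ada.fit(X, y)
score = ada.score(X, y)  # training-set R^2
```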

Model Tuning: K-Fold Cross-Validation
When conducting ML research, the raw data are divided into training and test sets, and the algorithm can be evaluated by simple cross-validation. However, the data are then used only once and not fully utilized, and the division ratio of the raw data has a large impact on the evaluation indices of the test set. Therefore, k-fold cross-validation is adopted, which alleviates the problem of limited data and also supports parameter tuning [59]. In this study, five-fold cross-validation is used. Firstly, all samples are divided into five subsets with an equal number of samples. Then, the five subsets are traversed in turn: each time, the current subset serves as the validation set and all remaining samples are used as the training set to train and evaluate the model. Finally, the average of the five evaluation indices is taken as the final evaluation index [60]. A schematic five-fold cross-validation diagram is given in Figure 8.
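The five-fold procedure above can be sketched with scikit-learn's KFold and cross_val_score on synthetic data (illustrative only):

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
X = rng.uniform(size=(100, 4))                           # synthetic inputs
y = 44 - 20 * X[:, 0] + rng.normal(scale=1.0, size=100)  # synthetic strength

# Five folds: each subset serves once as the validation set while the
# remaining four train the model; the five scores are then averaged.
cv = KFold(n_splits=5, shuffle=True, random_state=5)
scores = cross_val_score(RandomForestRegressor(random_state=5), X, y,
                         cv=cv, scoring="r2")
mean_score = scores.mean()  # final evaluation index
```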


Model Evaluation

In order to evaluate the predictive performance of the algorithms used in this study, three evaluation metrics are selected: root mean square error (RMSE), mean absolute error (MAE) and the coefficient of determination (R2) [61]. RMSE measures the deviation between predicted and true values and is also known as the standard error; taking the square root of the mean square error puts the error on the same scale as the data, so the data can be described more intuitively. RMSE is sensitive to very large or very small errors in a set of measurements, so it reflects the precision of the prediction well. MAE evaluates how close the predicted results are to the true values and is the most easily understood regression error indicator; the smaller its value, the better the fit. R2 can be understood as the proportion of the variability of the dependent variable that is explained by the independent variables through the regression equation; it measures the degree to which the independent variables explain the changes of the dependent variable. Its value lies between 0 and 1: the closer it is to 1, the better the regression fit, and vice versa. Furthermore, a newer engineering index, the a10-index, evaluates artificial-intelligence models by counting the samples whose predictions deviate by no more than ±10% from the experimental values; it is used to evaluate the reliability of the ML model [62]. Equations (3)-(6) define the four indicators:

$\mathrm{RMSE}=\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(f(x_i)-y_i\right)^2}$ (3)

$\mathrm{MAE}=\frac{1}{m}\sum_{i=1}^{m}\left|f(x_i)-y_i\right|$ (4)

$R^2=1-\frac{\sum_{i=1}^{m}\left(y_i-f(x_i)\right)^2}{\sum_{i=1}^{m}\left(y_i-\bar{y}\right)^2}$ (5)

$\text{a10-index}=\frac{m_{10}}{m}$ (6)

where f(x_i) is the predicted value, y_i is the target value, and ȳ is the average of the target values, i = 1, 2, 3, . . . , m; m is the number of samples in the dataset, and m10 is the number of samples for which the ratio between the experimental and predicted values lies between 0.90 and 1.10.
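The four indicators can be computed directly from their definitions; a minimal sketch with hypothetical strength values (not data from the paper):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """RMSE, MAE, R^2 and a10-index, following the definitions in the text."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    m = len(y_true)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    mae = np.mean(np.abs(y_pred - y_true))
    r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    # a10-index: fraction of samples whose experimental/predicted ratio
    # lies between 0.90 and 1.10 (m10 / m).
    ratio = y_true / y_pred
    a10 = np.sum((ratio >= 0.90) & (ratio <= 1.10)) / m
    return rmse, mae, r2, a10

# Hypothetical strengths (MPa): all four predictions are within +-10%.
rmse, mae, r2, a10 = evaluate([40, 35, 50, 45], [41, 36, 48, 44])
```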

Model Results and Discussion
All the samples are randomly divided into a training set and a test set in a proportion of 8:2; the training set is listed in Appendix B and the test set in Appendix C. In order to ensure the comparability of the performance evaluation, the same training and test sets are used for all ML algorithms, the hyperparameters are tuned by grid search combined with five-fold cross-validation, and the overall performance is evaluated with the four metrics (RMSE, MAE, R2, a10-index).
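The 8:2 split can be sketched as follows; with the 204 collected samples this reproduces the 163/41 split reported in the study. The function name and the fixed seed are illustrative assumptions, not taken from the paper:

```python
import random

def split_dataset(samples, train_fraction=0.8, seed=42):
    """Randomly shuffle the sample indices, then split them 8:2 into
    a training set and a test set (seed fixed for reproducibility)."""
    indices = list(range(len(samples)))
    rng = random.Random(seed)
    rng.shuffle(indices)
    cut = round(len(samples) * train_fraction)
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test
```

Note that 204 × 0.8 = 163.2, so rounding gives 163 training samples and 41 test samples, matching the counts in the abstract.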

Hyperparameter Settings and Optimization
In order to obtain higher prediction accuracy and generalization ability, the parameters and hyperparameters of each ML model are tuned. Model parameters are configuration variables internal to the model; they are usually not set manually by the programmer but are estimated from the data by an optimization algorithm, such as the least squares method or the gradient descent method. Hyperparameters, by contrast, are set manually before machine learning begins; their values cannot be estimated from the data, and they are often used to control how the model parameters are estimated. Common hyperparameter optimization methods include grid search, random search and Bayesian optimization [63,64]. As only a few hyperparameters need to be tuned in this study, grid search combined with five-fold cross-validation is used. Grid search pre-defines a set of candidate values for each hyperparameter of interest; each hyperparameter is then assigned a value from its candidate set to form the different hyperparameter combinations, the machine learning model is trained and evaluated for each combination, and the optimal combination is identified during the tuning process [65]. Table 1 lists the candidate values of the major hyperparameters used by each ML model in the grid search. In order to determine the final prediction model, its prediction performance is verified by five-fold cross-validation [20]. Table 2 presents the selected values of the major hyperparameters of each ML model, together with the mean five-fold cross-validation scores.
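The tuning procedure described above can be sketched in a few lines of pure Python: every combination from a predefined grid is scored by five-fold cross-validation, and the combination with the lowest mean validation error wins. A one-coefficient ridge regressor stands in for the boosted-tree models here purely for illustration (in practice one would use, e.g., scikit-learn's GridSearchCV with the actual estimators); all names are assumptions:

```python
import itertools

def ridge_fit(xs, ys, alpha):
    # Closed-form one-coefficient ridge: w = sum(x*y) / (sum(x^2) + alpha)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

def cv_mse(xs, ys, params, k=5):
    """Mean validation MSE of the model over k contiguous folds."""
    n = len(xs)
    fold_mse = []
    for f in range(k):
        lo, hi = f * n // k, (f + 1) * n // k
        tr_x, tr_y = xs[:lo] + xs[hi:], ys[:lo] + ys[hi:]
        w = ridge_fit(tr_x, tr_y, **params)
        fold_mse.append(sum((w * x - y) ** 2
                            for x, y in zip(xs[lo:hi], ys[lo:hi])) / (hi - lo))
    return sum(fold_mse) / k

def grid_search(xs, ys, grid, k=5):
    """Score every hyperparameter combination in the grid by k-fold CV;
    return the best (params, score) pair."""
    best = None
    for combo in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        score = cv_mse(xs, ys, params, k)
        if best is None or score < best[1]:
            best = (params, score)
    return best
```

The grid in Table 1 plays the role of the `grid` dictionary: each key is a hyperparameter, each value its list of candidate settings, and the Cartesian product enumerates all combinations.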

Model Prediction Results
The prediction results shown in Figure 9 can be used to compare the generalization ability of the four ML models; blue dots represent the training-set results, and orange dots represent the test-set results. The data points in the scatterplots of the four ensemble models are distributed around the baseline (y = x). Figure 9a shows the XGBoost predictions: the difference between the tested and predicted values is small, indicating good performance in predicting the compressive strength of the RP mortar. In order to compare the prediction results of the proposed ML models, Table 3 lists the evaluation metrics (R2, RMSE, MAE, a10-index) of the training and test sets of the four models. Using the same training and test sets for prediction, the R2 value of XGBoost is higher than that of the other three ML models, indicating that XGBoost has the highest prediction accuracy. For RMSE and MAE, smaller values are better; Table 3 shows that the RMSE and MAE of the XGBoost training and test sets are lower than those of the other three ML models, while those of the AdaBoost training and test sets are the largest, indicating that the error of the XGBoost model is the smallest. For the a10-index, the closer the value is to 1, the better the reliability; Table 3 shows that the reliability of XGBoost is higher than that of the other models and that AdaBoost performs the worst. Considering the four evaluation metrics of the training- and test-set predictions together, the accuracy and generalization of the four ML models are ranked, from high to low, as XGBoost, RF, LightGBM and AdaBoost.

Interpretability of the Model
Based on the above evaluation results, the XGBoost model is selected for interpretation and analysis. An ML model by itself yields only the final prediction results and cannot explain how the input features influence those predictions. Therefore, SHAP is applied in the following sections to interpret each variable on a global and an individual scale.

Features Explained: Shapley Additive exPlanations (SHAP)
Interpretation methods for ML models include linear surrogate models, PDP, SHAP, ALE, etc. Among them, SHAP is widely applicable in the interpretability field because it not only reflects the influence of every feature of each sample but also shows whether that influence is positive or negative [66]. Therefore, SHAP is used in this study to analyze the regression prediction models. The SHAP method calculates Shapley values based on coalitional game theory [67]: the feature values of a data instance act as players in a coalition, and the Shapley values reveal how much each feature value contributes to the prediction [68,69]. The SHAP value satisfies the following additive equation:

$f(x_i) = y_{base} + \sum_{k} f(x_{i,k})$
where y_base is the baseline of the whole model (usually the mean of the target variable over all samples), x_{i,k} is the k-th feature of the i-th sample, and f(x_{i,k}) is the SHAP value of x_{i,k}. When f(x_{i,k}) > 0, the feature raises the predicted value and has a positive effect; conversely, the feature lowers the predicted value and has a negative effect. Figure 10 shows a typical individual prediction graph for one sample. The SHAP value quantifies the local interpretation of the model, explaining each prediction as the sum of the effects of the input variables (Figure 10). The base value represents the mean of the predicted values; the length of each bar represents the SHAP value of a feature value; red indicates a positive influence of the feature value on the prediction, and blue a negative influence. As can be seen from Figure 10, the prediction for this sample is obtained by superimposing the SHAP values of the four feature values onto the base value. The mass replacement rate of RP has the greatest positive influence on the predicted compressive strength of the RP mortar. Note that the feature importance for an individual prediction (Figure 10) may differ from the mean SHAP value (Figure 11), since the latter is a global indicator.
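The additive property above can be checked directly. The sketch below computes exact Shapley values for a toy prediction function by enumerating feature coalitions, filling absent features from a background dataset (the shap package does this far more efficiently for tree models via shap.TreeExplainer); the model and data are purely illustrative:

```python
import itertools
from math import factorial

def coalition_value(model, x, background, subset):
    """v(S): average prediction with features in S taken from x
    and the remaining features taken from the background samples."""
    total = 0.0
    for b in background:
        z = [x[j] if j in subset else b[j] for j in range(len(x))]
        total += model(z)
    return total / len(background)

def shapley_values(model, x, background):
    """Exact Shapley values by enumerating all coalitions S of the
    other features, weighted by |S|! (n-|S|-1)! / n!."""
    n = len(x)
    phis = []
    for k in range(n):
        others = [j for j in range(n) if j != k]
        phi = 0.0
        for r in range(n):
            for S in itertools.combinations(others, r):
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi += weight * (coalition_value(model, x, background, set(S) | {k})
                                 - coalition_value(model, x, background, set(S)))
        phis.append(phi)
    return phis
```

The efficiency property of Shapley values guarantees that the base value (the mean background prediction, v of the empty set) plus the sum of all per-feature SHAP values reproduces the model output exactly, which is what the bars in Figure 10 visualize.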


Global Interpretation
Passing the SHAP values to the bar-plot function creates a global feature-importance plot, in which the global importance of each feature is taken as the mean absolute SHAP value of that feature across all given samples, as shown in Figure 11. It shows that the mass replacement rate (MRR) of RP makes the dominant contribution to the prediction of the compressive strength of RP mortar, with a mean SHAP value roughly twice that of particle size, followed by kind. The water-binder ratio values in the dataset are concentrated, so this feature has the least influence on the prediction results.
Each row in Figure 12 represents a feature, sorted by importance from top to bottom, with the SHAP value displayed on the abscissa. Each dot represents a sample, and the change in color from blue to red indicates that the feature value itself changes from small to large. From this overall feature analysis, it can be intuitively seen that the mass replacement rate (MRR) of RP is an important influencing factor, and its feature value is essentially negatively correlated with the SHAP value, indicating that the mass replacement rate of RP has a negative impact on the compressive strength of the mortar. For kind, the feature value is positively correlated with the SHAP value; since the feature value of RCP is 0 and that of RBP is 1, this indicates that the compressive strength of RBP mortar is generally higher than that of RCP mortar.
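The global ranking in Figure 11 is simply the mean absolute SHAP value per feature. A minimal sketch of that aggregation, with made-up SHAP numbers (the matrix values below are illustrative, not from the study):

```python
def global_importance(shap_matrix, feature_names):
    """Mean absolute SHAP value per feature (column), sorted descending.
    shap_matrix: one row per sample, one column per feature."""
    m = len(shap_matrix)
    means = [sum(abs(row[j]) for row in shap_matrix) / m
             for j in range(len(feature_names))]
    return sorted(zip(feature_names, means), key=lambda pair: -pair[1])
```

Applied to the study's four inputs, this aggregation yields the ordering reported in the text: MRR first, then size, kind and the water-binder ratio.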

Figure 13 depicts the relationship between a given variable and the SHAP value, together with frequently interacting variables. Each point in the feature-dependence graph represents a sample, and the color represents the magnitude of the feature value indicated on the right. The experimental data of the water-binder ratio are concentrated, so no feature-interaction analysis is carried out for it. Kind, size and mass replacement rate affect one another. Figure 13a shows that, when kind = 0, the SHAP value of kind decreases as size increases, whereas, when kind = 1, the SHAP value of kind increases with size, indicating that size interacts with kind. Figure 13c,d show that, as the RP mass replacement rate (MRR) increases, the SHAP value decreases to varying degrees; since MRR is a major factor in reducing compressive strength, the reduction in the SHAP value corresponds to a reduction in compressive performance. Experimental evidence supports these findings [70,71].
It is difficult to draw clear conclusions from the figure on the influence of particle size on the compressive strength of the mortar, which needs further study in the future.

Sensitivity of Feature
Sensitivity analysis is a method used to study how changes in the inputs of an ML model affect its output [72]. Because the XGBoost model performs best in predicting the compressive strength of RP mortar, it is selected for the sensitivity analysis. In order to investigate the sensitivity of the selected ML model, one feature value is perturbed at a time for each of the two kinds of RP, while the other two feature values are held at their average values, and the compressive strength of the RP mortar is predicted. Figures 14 and 15 show the feature-sensitivity analysis diagrams of RCP and RBP, respectively. The correlation coefficients of −0.592 and −0.669 indicate that the mass replacement rates of RCP and RBP are moderately negatively correlated with the predicted mortar compressive strength; according to the fitted equations, the decline is steeper for RCP than for RBP. There is no obvious linear relationship between the particle size of either kind of RP and the predicted compressive strength of the mortar. Because the water-binder ratio data are concentrated, no sensitivity analysis of the water-binder ratio is carried out. Comparing the predicted values of the two kinds of RP under the different perturbed feature values, the mass replacement rate of the recycled powder leads to the largest change in the prediction results.
The correlation coefficient is calculated as

$r = \dfrac{\sum_{i}\big(X_i-\bar{X}\big)\big(Y_i-\bar{Y}\big)}{\sqrt{\sum_{i}\big(X_i-\bar{X}\big)^2}\sqrt{\sum_{i}\big(Y_i-\bar{Y}\big)^2}}$

where r is the correlation coefficient, a measure of the linear correlation between variables; X_i is the perturbed feature value, Y_i is the predicted value, $\bar{X}$ is the average of the perturbed feature values, and $\bar{Y}$ is the average of the predicted values.
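The perturbation procedure can be sketched as follows: one feature is swept over a range of values while the others are held at their (mean) baseline, and the Pearson coefficient r between the perturbed feature and the resulting predictions quantifies the sensitivity. The predictor and feature layout below are illustrative assumptions, not the trained XGBoost model:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def sensitivity(predict, feature_index, values, baseline):
    """Perturb one feature over `values`, holding the rest at `baseline`,
    and correlate the perturbed feature with the predictions."""
    preds = []
    for v in values:
        x = list(baseline)
        x[feature_index] = v
        preds.append(predict(x))
    return pearson_r(values, preds)
```

An r near −1 corresponds to the steep, monotone decline of predicted strength with increasing replacement rate seen in Figures 14 and 15, while an r near 0 corresponds to the absence of a linear trend, as observed for particle size.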

Conclusions
In this paper, four ensemble ML models (XGBoost, RF, LightGBM, AdaBoost) are applied to the prediction of the compressive strength of RP mortar. Four variables are used as inputs: W/B, the particle size of RP (µm), the kind of RP and the mass replacement rate of RP (%). The RMSE, MAE, R2 and a10-index are used to compare the performance of the four ML models in predicting the compressive strength of the RP mortar. Finally, the SHAP algorithm is used to explain the model predictions and analyze the influence of the input variables on the output. The following conclusions can be drawn from the study:

1. After comparing the four performance metrics, XGBoost achieves the best results among the four ML models.
2. Among the four ML models used in this paper, AdaBoost performs the worst, with an R2 value in the training set of only 0.708. This is because AdaBoost is sensitive to abnormal samples, which may gain higher weights in the iterations and affect the prediction accuracy of the final strong learner.
3. SHAP is an additive interpreter that sums the contribution values of each influencing factor to obtain the final predicted value. First, SHAP provides a global interpretation of the strength prediction and ranks the feature importance of the four input variables, concluding that the mass replacement rate of RP has the greatest influence on the prediction process, which is consistent with the results of previous experiments [73]. In contrast, W/B has the least effect on the predicted results, because the W/B values used in the experiments are concentrated.
4. In the feature-dependence analysis, the SHAP value decreases with increasing mass replacement rate, and the SHAP value of RBP is slightly higher than that of RCP. These findings provide a reference for future research into recycled powder.