Machine Learning Application to Eco-Friendly Concrete Design for Decarbonisation

Cement replacement materials can not only benefit the workability of concrete but can also improve its compressive strength. Reducing the cement content of concrete can also lower CO2 emissions, mitigating the environmental impact of the construction industry and reducing energy consumption. This paper aims to predict the compressive strength (CS) and embodied carbon (EC) of cement replacement concrete using machine learning (ML) algorithms, i.e., deep neural network (DNN), support vector regression (SVR), gradient boosting regression (GBR), random forest (RF), k-nearest neighbours (kNN), and decision tree regression (DTR). An optimal ML model not only predicts accurate results but also saves time, energy, and cost compared with conducting experiments. Firstly, 367 experimental data records were collected from the open literature, in which cement was partially replaced with one or more cementitious materials. Secondly, the datasets were imported into the ML models, whose parameters were tuned by the grid search algorithm (GSA). Then, the coefficient of determination (R²), which measures prediction performance, and the root mean square error (RMSE), which measures prediction accuracy, were employed to evaluate the prediction ability of the ML models. The results demonstrate that the GBR models give the best predictions of both the CS and EC, with R² values of 0.946 and 0.999, respectively. Thus, the GBR models show promise as design aids for cement replacement concrete. Finally, a sensitivity analysis (SA) was conducted to analyse the effects of the inputs on the CS and EC of cement replacement concrete. Pulverised fuel ash (PFA), ground granulated blast-furnace slag (GGBS), limestone powder (LP), and silica fume (SF) were found to affect the CS and EC of cement replacement concrete significantly.


Introduction
Concrete is the most widely used artificial material, and it is the second most consumed resource in the world after water. More than four billion tonnes of cement are produced each year, accounting for approximately 8% of global CO2 emissions [1]. The most significant mechanical property of concrete is its compressive strength (CS). The CS is a reliable indicator of the overall performance of a concrete mix because many other properties of concrete correlate with it and can be estimated from it. Cement is a vital ingredient in concrete: it acts as the "glue" that binds the fine and coarse aggregates together and gives the concrete its strength. The primary raw materials of cement are limestone and clay, which are pulverised and blended with other materials, such as iron ore. The blend is then fed into a cylindrical kiln and heated to approximately 1450 °C. This process, known as "calcination", generates more than 50% of the total CO2 emissions of cementitious products [1]. Embodied carbon (EC) is the total CO2 emitted during the production of materials, and it can be estimated from the energy used to extract, process, and transport them. The EC stated throughout this paper accounts for the carbon emissions from the extraction, manufacturing, and transportation of the supplementary cementitious materials (SCMs). Replacing cement with other binding materials reduces the EC. In most cases, the CS tends to decrease as the cement content drops. However, studies have demonstrated that some cementitious materials, such as ground granulated blast-furnace slag (GGBS) and metakaolin, slightly increase the CS when they replace small amounts of cement [2]. For this reason, it is not straightforward to predict the CS and EC when reducing the cement content.
This paper aims to assist in concrete design by producing high-quality ML models that can accurately predict the CS and EC of concrete with various cement replacement materials.

Cement Replacement Materials
Pulverised fuel ash (PFA) is a waste material collected from coal-burning power stations. It cannot be considered the best replacement material for cement in the United Kingdom, as few coal-burning power stations remain there and PFA would therefore have to be shipped in from elsewhere [3]. This increases not only construction cost and time but also the total EC of the concrete, because of the shipping emissions. GGBS is a by-product of iron production. It is a cementitious material produced in blast furnaces, and its typical chemical composition is 40% calcium oxide, 35% silica, 13% alumina, and 8% magnesia [4]. It is a well-established cement replacement material that has been researched in detail for many years, and it is often combined with PFA at higher levels of cement replacement [5]. Limestone fines (also known as limestone powder, LP) can reduce the volume of voids between the aggregates and the cement paste. This property makes LP a useful partial cement replacement for reducing the CO2 emissions of concrete, and it can be used at up to 35% replacement under the British standard BS EN 197-1 [6]. Silica fume (SF) is a pozzolanic material. It is a by-product of the ferrosilicon industry that can enhance the mechanical properties of concrete. Using SF increases the water demand of the mix, but it can produce dense and impermeable concrete [7]. Metakaolin is a relatively new cement replacement material; various studies have reported both increases and decreases in the overall CS of concrete containing it [8]. Similarly to SF, metakaolin also increases the water demand of the concrete mix. Perlite is a naturally occurring material formed by the rapid cooling of volcanic lava. Expanded perlite (EP, also known as perlite powder) is perlite that has undergone a heating process that causes it to expand to 4 to 20 times its original size [9].
EP is a widely available building material that, when used to replace part of the cement, has been reported to reduce CO2 emissions significantly and to improve the flowability of the concrete slurry [10,11]. Replacing up to 40% of the cement with EP could produce 28-day compressive strengths comparable to those of traditional concrete [11]. Pumice is a pozzolanic material of volcanic origin, similar to perlite. It can be used in concrete as an aggregate or, when ground into a powder, as an SCM. Studies have demonstrated that adding ground pumice as an SCM at levels of up to 25% could improve the CS of concrete mixes at later ages [12].

Literature Review of ML-Assisted Prediction
An extensive literature search identified many articles that use ML models to predict the mechanical properties of concrete containing cement replacement materials. For instance, Mohammed et al. [13] developed four ML models to predict the CS of mortar containing PFA. A total of 450 experimental CS results, for mortar containing 0% to 70% PFA, were used to develop a linear regression model (LR), a nonlinear regression model (NLR), an M5P-tree model (M5P), and an artificial neural network (ANN). The ANN predicted the CS better than all of the other models, with a correlation coefficient (R) of 0.934. In another study, the CS of high-performance concrete with PFA and SF was modelled using an ANN with nonlinear modelling and a radial basis function (RBF); the results showed a strong correlation between the experimental and predicted CSs, with a correlation coefficient of 0.96 [14]. Furthermore, the CS of geopolymer concrete (GPC) with partial cement replacement by GGBS, SF, and natural zeolite was studied experimentally, with replacement levels from 0% to 30% in 5% increments. Using the results collected in that study, an ANN model was also proposed to predict the CS of GPC containing these materials. To achieve the lowest absolute percentage error, the ANN had two hidden layers, with six and five neurons, respectively. The ANN produced a training mean square error (MSE) of 3.5262 and an R of 0.985, supporting both the experimental results and the model itself [15]. In addition, the CS of concrete with GGBS was modelled using the M5P model and an ANN. The ANN models hybridised with the M5P model produced approximately half the errors of the M5P model alone, in both the training and testing phases.
Sustainability 2021, 13, 13663 3 of 17
The study also states that a predictive model for GGBS is necessary, as GGBS is included in many design codes. This indicates that more cement replacement materials may be included in design codes over time, which suggests that the optimal ML models produced in this paper could become increasingly useful [16]. Moreover, the CS of mortar containing metakaolin was predicted using support vector machine, RF, decision tree, adaptive boosting (AdaBoost), and kNN algorithms. The results show that AdaBoost and RF achieved higher R² values than the other ML models, at 0.9473 and 0.9439, respectively. Furthermore, the CS of high-performance concrete was predicted using GBR, Gaussian process regression, and kernel ridge regression, and the GBR model showed the best prediction performance (R = 0.965) among these models [17].
However, each of these articles considered at most three types of cement replacement material. Employing more cement replacement materials together as variables, such as SF, PFA, pumice, and GGBS, would enhance the practical applicability of an ML model for guiding the design of cement replacement concrete. Additionally, no previous research has addressed EC prediction for cement replacement concrete, which means that EC cannot currently be used as a criterion in concrete design. Furthermore, the amount of cement should be included as an input to improve the prediction ability of ML models. The proportions of coarse and fine aggregates also need to be taken into account in the ML models, even though their EC is not considered in this paper. This paper builds on the ideas addressed in previous literature and fills these research gaps by incorporating seven cement replacement materials into six types of ML models. It proposes, for the first time, ML prediction of the EC of cement replacement concrete as a means of reducing the negative environmental impact of concrete. Firstly, 367 experimental data records containing the CS and EC of cement replacement concrete were collected from the open literature. Secondly, 12 variables of the datasets were considered as the inputs of the ML models. Then, SVR, RF, GBR, DTR, kNN, and DNN were utilised to predict the CS and EC of the cement replacement concrete, with the parameters of the ML models tuned by the GSA. Finally, the R² and RMSE values of the ML models were compared to identify the optimal ML model. As a result, concrete mixes containing at least one of the cement replacement materials considered here can be designed to produce the expected CS while lowering the overall EC of the concrete.

Data Collection
In this paper, 367 experimental data records associated with the CS and EC of concrete containing cement replacement materials were collected from the open literature [2,6,7,18-31]. The data were collected from a large number of credible, published sources to make the database as accurate and robust as possible. Table 1 shows the EC values of each cementitious input material of the ML models, according to British standards and the open literature. The EC of concrete mixes containing SCMs is calculated using Equation (1), in accordance with EN 15978 [32]. The datasets are simplified into 12 input variables and 2 output variables, CS and EC, listed in Table 2. As shown in Table 2, the water-to-binder ratio (WB), the amount of cement (C), the amount of superplasticiser (S), the amount of coarse aggregates (CA), and the amount of fine aggregates (FA) describe the essential concrete components. The remaining seven variables describe the cement replacement materials: PFA, GGBS, LP, SF, metakaolin (M), perlite powder (PP), and ground pumice (GP).
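The cumulative EC of a concrete mix is given by

$$\text{Cumulative EC of a concrete mix} = EC_{\text{process},m} + \sum_{i}\left(EC_{i} + EC_{\text{transportation},i}\right) \qquad (1)$$

where $EC_{\text{process},m}$ indicates the EC produced by the manufacturing of SCMs ($m$ = C, PFA, SF, etc.), and $EC_{i}$ and $EC_{\text{transportation},i}$ represent the EC due to the manufacturing and transportation, respectively, of the intermediate product $i$ used to produce the SCMs.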

Prediction Procedure
Firstly, the 367 data records were normalised to the range from zero to one using Equation (2), to enhance the prediction ability of the ML models:

$$Y_n = \frac{y - y_{\min}}{y_{\max} - y_{\min}} \qquad (2)$$

where $Y_n$ is the normalised experimental data; $y_{\min}$ and $y_{\max}$ are the minimum and maximum experimental data, respectively; and $y$ is the raw experimental data. Secondly, the normalised datasets were input into the ML models introduced above, i.e., GBR, DTR, RF, SVR, kNN, and DNN. Detailed information about these ML models can be found in [17,39-50]. The experimental datasets were randomly split in the ratio 8:2: 20% of the data were randomly selected to test the generalisation ability of the ML models, while the remaining 80% were used to train them. Then, the parameters of the ML models were tuned by the GSA to maximise their prediction ability. The GSA searches a multidimensional grid of candidate parameter values: it exhaustively evaluates every combination of parameter values on the grid and selects the best-performing one [51]. The GSA has been widely employed for ML parameter optimisation; for instance, an R² of 0.952 was obtained for CS prediction using this approach [52]. The reasons for employing the GSA as the optimisation method are as follows [22,53]:
• Multiple parameters can be tuned simultaneously;
• The GSA does not take long to run when only a few parameters are tuned;
• The global optimal solution within the searched grid is guaranteed to be found.
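The pre-processing and tuning steps above can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names, the toy data, and the use of a simple k-nearest-neighbour regressor (with hypothetical parameters `k` and `p`) as the model being tuned are all hypothetical choices made for brevity. The sketch also assumes, as is common practice, that the grid is scored on a validation split carved out of the training data, so that the 20% test set stays untouched during tuning.

```python
import itertools
import math
import random


def min_max_normalise(values):
    """Equation (2): map each value y to (y - y_min) / (y_max - y_min), giving [0, 1]."""
    y_min, y_max = min(values), max(values)
    span = y_max - y_min
    return [(y - y_min) / span if span else 0.0 for y in values]


def random_split(n, test_fraction=0.2, seed=0):
    """Shuffle the row indices and hold out test_fraction of them (an 8:2 split by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]  # (training indices, held-out indices)


def knn_predict(train_X, train_y, x, k, p):
    """Hypothetical stand-in model: kNN regression with a Minkowski distance of order p."""
    neighbours = sorted(
        (sum(abs(a - b) ** p for a, b in zip(row, x)) ** (1 / p), target)
        for row, target in zip(train_X, train_y)
    )
    return sum(target for _, target in neighbours[:k]) / k


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - q) ** 2 for t, q in zip(y_true, y_pred)) / len(y_true))


def grid_search(fit_X, fit_y, val_X, val_y, grid):
    """Evaluate every combination in the grid exhaustively; keep the one with the lowest RMSE."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        preds = [knn_predict(fit_X, fit_y, x, **params) for x in val_X]
        score = rmse(val_y, preds)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy data (illustrative only, not the paper's database): two inputs, one output.
rng = random.Random(42)
raw = [[rng.uniform(0, 10), rng.uniform(0, 5)] for _ in range(50)]
columns = [min_max_normalise(list(col)) for col in zip(*raw)]  # normalise each input column
X = [list(row) for row in zip(*columns)]
y = [a + 2 * b for a, b in X]

train_idx, test_idx = random_split(len(X))  # 40 training rows, 10 test rows
inner_fit, inner_val = random_split(len(train_idx), seed=1)  # validation split inside training
fit_rows = [train_idx[i] for i in inner_fit]
val_rows = [train_idx[i] for i in inner_val]

best_params, best_score = grid_search(
    [X[i] for i in fit_rows], [y[i] for i in fit_rows],
    [X[i] for i in val_rows], [y[i] for i in val_rows],
    {"k": [1, 3, 5, 7], "p": [1, 2]},
)
print(best_params, round(best_score, 4))
```

The grid here has 4 × 2 = 8 combinations, so the exhaustive search is cheap; the cost grows multiplicatively with each added parameter, which is why the GSA is best suited to tuning a few parameters at a time.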
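Finally, the optimal ML models for predicting the CS and EC of concrete containing cement replacement materials were selected according to two prediction ability indicators, the R² and the RMSE, defined in Equations (3) and (4) [54,55]:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \qquad (3)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \qquad (4)$$

where $n$ is the number of samples; $\hat{y}_i$ is the predicted value; $y_i$ is the experimental value; and $\bar{y}$ is the mean of the experimental values.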
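Equations (3) and (4) can be checked with a short standard-library Python sketch. The function names and the four toy values are illustrative assumptions, and Equation (3) is taken to be the usual coefficient of determination, 1 − SS_res/SS_tot.

```python
import math


def r_squared(y_true, y_pred):
    """Equation (3): R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Equation (4): square root of the mean squared prediction error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Toy normalised values (illustrative only, not the paper's data).
experimental = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
print(round(r_squared(experimental, predicted), 3), round(rmse(experimental, predicted), 3))
```

Every toy prediction is off by 0.05, so the RMSE is 0.05; the total sum of squares is 0.2 and the residual sum of squares is 0.01, giving R² = 0.95. Because both indicators are computed on normalised data in this paper, their RMSE values are dimensionless.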

Results
In this section, the prediction ability indicators of the aforementioned ML models are presented. The relationships between the experimental and predicted CS and EC of cement replacement concrete, for both the training and testing datasets, are shown in Figures 1 and 2, respectively. In both figures, the horizontal axis shows the values predicted by the ML models, while the vertical axis shows the experimental values collected from the open literature. The R² and RMSE values of the ML models for predicting the CS and EC are compared in Table 3, and the parameters of the ML models tuned using the GSA are listed in Table 4. The optimal ML model is defined as the model with the highest R² and the lowest RMSE. With regard to the ML-aided prediction of the CS, Figure 1a,b clearly shows that the GBR model achieves the highest R² (0.946) among the ML algorithms, with an RMSE of 0.058. Furthermore, the R² and RMSE values of the SVR and RF models are (0.924, 0.057) and (0.933, 0.062), respectively, which are similar to those of the GBR model (Figure 1g-j).
As shown in Figure 1c-f,k,l, the R² and RMSE values of the DTR, DNN, and kNN models are (0.876, 0.093), (0.892, 0.077), and (0.888, 0.082), respectively, indicating lower prediction performance and accuracy than the other ML models.
With regard to the prediction of the EC, the GBR model shows the best prediction performance (R² = 0.999) and the lowest RMSE (0.012, shared with RF) among the ML models, as shown in Figure 2a,b. As demonstrated in Figure 2c-f,i,j, the R² and RMSE values of the DTR, DNN, and RF models are (0.998, 0.015), (0.995, 0.015), and (0.997, 0.012), respectively, suggesting that the DTR, RF, and DNN models have similar abilities for predicting the EC of cement replacement concrete. The prediction ability of the SVR and kNN models is slightly lower than that of the other ML models, with R² and RMSE values of (0.985, 0.026) and (0.965, 0.039), respectively (Figure 2g,h,k,l).

Discussion
The relationships between the predicted and experimental CS and EC of cement replacement concrete obtained with the GBR models are illustrated in Figures 1 and 2. As demonstrated, the GBR models predict the CS and EC of cement replacement concrete better than the other types of ML models; in other words, the GBR models capture the relationship between the outputs of cement replacement concrete and the 12 input variables most accurately. The better prediction ability of the GBR models can be attributed to GBR being an ensemble learning algorithm whose boosting strategy generally gives it a strong generalisation capacity. In gradient boosting, weak learners (typically shallow decision trees) are added one at a time: each new weak learner is fitted to the residual errors of the current ensemble (more generally, to the negative gradient of the loss function), and its contribution is scaled by a learning rate. The robust prediction ability of the GBR models comes from the resulting strong learner, which is the sum of these weak learners. In contrast, DNN, SVR, kNN, and DTR are individual ML algorithms, which typically have lower generalisation capacities than ensemble ML algorithms.
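To make the boosting strategy concrete, here is a minimal gradient boosting regressor in standard-library Python. It is a sketch under stated assumptions, not the GBR implementation or the tuned hyperparameters used in this paper (those are listed in Table 4): it assumes squared-error loss, uses one-split decision stumps as the weak learners, and the function names `fit_stump` and `fit_gbr` are hypothetical. Each stump is fitted to the current residuals, which for squared-error loss equal the negative gradient, and its output is shrunk by the learning rate before being added to the ensemble.

```python
def fit_stump(X, residuals):
    """Return the one-split regression stump that minimises squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue  # a split must send rows to both sides
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:  # no valid split (all rows identical): predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda row: mean
    _, j, threshold, left_mean, right_mean = best
    return lambda row: left_mean if row[j] <= threshold else right_mean


def fit_gbr(X, y, n_estimators=100, learning_rate=0.1):
    """Gradient boosting with squared-error loss; returns a prediction function."""
    base = sum(y) / len(y)  # initial model: the mean target value
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        # For the loss 0.5 * (y - F)^2, the negative gradient is simply the residual y - F.
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # Shrink each weak learner's contribution by the learning rate.
        preds = [p + learning_rate * stump(row) for p, row in zip(preds, X)]
    return lambda row: base + learning_rate * sum(s(row) for s in stumps)


# Toy step-shaped target (illustrative only).
X = [[float(v)] for v in range(10)]
y = [0.0] * 5 + [1.0] * 5
model = fit_gbr(X, y, n_estimators=100, learning_rate=0.1)
print(round(model([2.0]), 4), round(model([7.0]), 4))
```

On this step-shaped toy target, the residual on every row shrinks by a factor of (1 − learning_rate) per round, so after 100 rounds the predictions are within about 1e-4 of the targets. A smaller learning rate with more estimators is the usual trade-off between accuracy and training time.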

K-Fold Cross Validation
K-fold cross validation is employed in this paper to further investigate the prediction ability and reliability of the optimal ML models for predicting the CS and EC of cement replacement concrete [56]. K-fold cross validation reduces the dependence of the performance estimate on any single choice of training and testing datasets. In this paper, 10-fold cross validation is used to evaluate the prediction ability of the GBR models [57]. Firstly, the datasets are split into ten groups of (approximately) equal size. Secondly, nine groups are used to train the GBR models, while the remaining group is used to validate them. This second step is repeated ten times, so that each group serves as the validation set exactly once. Finally, the prediction ability of the GBR models is obtained by averaging the R² and RMSE values over the ten folds [58]. Figure 3a-d shows the R² and RMSE results of each fold for predicting the CS and EC in the 10-fold cross validation. Figure 3a shows that the R² values for predicting the CS fluctuate only slightly: the minimum R² is 0.939 at Fold 1, while the maximum R² is 0.951 at Fold 5. The RMSE values in Figure 3b decrease slightly, from 0.246 at Fold 1 to 0.223 at Fold 6, and then remain roughly constant at around 0.221 until Fold 10. As shown in Figure 3c, the R² values for predicting the EC remain at approximately 0.997, while the RMSE values fluctuate between 0.012 and 0.014 from Fold 1 to Fold 10 (Figure 3d). Statistical summaries of the cross validation, including the average R² and RMSE values of the CS and EC predictions, are listed in Table 5. The average R² and RMSE values of the CS prediction are 0.9471 and 0.2270, respectively.
Moreover, the standard deviations (SDs) of the R² and RMSE for predicting the CS are 0.0037 and 0.0087, respectively, which means that the coefficients of variation (COVs, defined as SD/mean) of the R² and RMSE for predicting the CS are only 0.4% and 3.8%, respectively. In addition, the average R² and RMSE values of the 10-fold cross validation for the EC prediction are 0.9967 and 0.0125, respectively, while the SDs are 0.0013 and 0.0007, giving COVs of 0.1% and 5.6%, respectively. On the basis of the average R², RMSE, and COVs of the 10-fold cross validation for the CS and EC prediction, it can be concluded that the prediction error of the GBR models is small and varies little between folds; in other words, the strong prediction ability of the GBR models is reliable.
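The 10-fold procedure and the COV arithmetic above can be sketched in standard-library Python. The fold assignment (shuffle, then deal indices round-robin), the function names, and the dummy scoring callback are assumptions for illustration, and `summarise` assumes the SD is the sample standard deviation. With 367 records, ten folds cannot be exactly equal, so this scheme gives seven folds of 37 records and three of 36. The last lines recompute the COVs from the reported means and SDs as COV = SD/mean.

```python
import random
import statistics


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices round-robin into k folds (sizes differ by at most one)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(n, k, train_and_score, seed=0):
    """Use each fold once for validation and the other k - 1 folds for training."""
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(train_and_score(train_idx, val_idx))
    return scores


def summarise(scores):
    """Mean, sample standard deviation, and COV (= SD / mean) of per-fold scores."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return mean, sd, sd / mean


def coefficient_of_variation(mean, sd):
    """COV = SD / mean, returned as a fraction."""
    return sd / mean


# Dummy callback (hypothetical): report the size of each validation fold.
fold_sizes = cross_validate(367, 10, lambda train, val: len(val))
print(sorted(fold_sizes))  # seven folds of 37 and three folds of 36

reported = {  # (mean, SD) values reported in this section
    "CS R2": (0.9471, 0.0037),
    "CS RMSE": (0.2270, 0.0087),
    "EC R2": (0.9967, 0.0013),
    "EC RMSE": (0.0125, 0.0007),
}
for name, (mean, sd) in reported.items():
    print(f"{name}: COV = {100 * coefficient_of_variation(mean, sd):.1f}%")
```

In a real run, `train_and_score` would train a model on `train` and return, for example, its R² on `val`; `summarise` then yields the mean, SD, and COV of the ten fold scores.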

Sensitivity Analysis
Sensitivity analysis is a method used to evaluate how changes in the inputs of ML models affect their outputs [59]. In this paper, the GBR models were selected to conduct the SA because they perform best in predicting the CS and EC of cement replacement concrete. To investigate the sensitivity of the chosen ML models, one type of cement replacement material is perturbed at a time, while the other types of cement replacement materials are kept constant at their mean values. The new datasets are then introduced to the GBR models to predict the CS and EC of the cement replacement concrete [60]. After that, the sensitivity analysis parameter (SAP) of each input is calculated using Equation (5):

$$SA_i = \frac{P_{\max}(I_i) - P_{\min}(I_i)}{\sum_{j=1}^{n}\left[P_{\max}(I_j) - P_{\min}(I_j)\right]} \times 100\% \qquad (5)$$

where $P_{\max}(I_i)$ and $P_{\min}(I_i)$ are the maximum and minimum predicted CS or EC values of the cement replacement concrete corresponding to the input $I_i$; $n$ is the number of perturbed inputs; and $SA_i$ is the SAP of the input $I_i$. Figure 4 presents the SAPs of the inputs, which show a pronounced influence of PFA on the predicted CS and EC, with SAPs of 19.96% and 30.44%, respectively. For predicting the CS, similar SAPs of 15.25, 14.07, 12.37, 13.99, and 15.32% are observed for GGBS, SF, LP, PP, and GP, respectively. Furthermore, Figure 4 shows high SAPs for predicting the EC: 24.05, 16.61, and 21.77% for GGBS, LP, and SF, respectively. The SAPs of M, PP, and GP for the EC are considerably lower, at 4.25, 1.58, and 1.30%, respectively. These SAP results indicate that PFA plays the most significant role in the CS and EC of cement replacement concrete, followed by GGBS, LP, and SF, which should also be thoroughly investigated when predicting the CS and EC with the GBR models. According to these findings, PFA, GGBS, LP, and SF need to be carefully controlled in cement replacement concrete design because of their prominent effect on the CS and EC.
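The one-at-a-time procedure and Equation (5) can be sketched in standard-library Python. In this sketch, `predict` stands in for a trained GBR model, and the linear `toy_model`, its input names, means, and value ranges are hypothetical. Because Equation (5) normalises each input's prediction range by the sum over all perturbed inputs, the SAPs add up to 100%; the EC SAPs reported above (30.44, 24.05, 21.77, 16.61, 4.25, 1.58, and 1.30%) do indeed sum to 100%.

```python
def sensitivity_parameters(predict, means, ranges):
    """One-at-a-time SA following Equation (5).

    predict: callable mapping a dict of input values to a prediction (e.g. a trained model).
    means:   mean value of every input; inputs not being perturbed stay at these values.
    ranges:  for each perturbed input, the list of values to try.
    Returns each perturbed input's SAP in percent.
    """
    spans = {}
    for name, values in ranges.items():
        preds = []
        for value in values:
            x = dict(means)  # every input at its mean ...
            x[name] = value  # ... except the one being perturbed
            preds.append(predict(x))
        spans[name] = max(preds) - min(preds)  # P_max(I_i) - P_min(I_i)
    total = sum(spans.values())
    if total == 0:  # no input changes the prediction
        return {name: 0.0 for name in spans}
    return {name: 100 * span / total for name, span in spans.items()}


# Hypothetical model and inputs, for illustration only.
def toy_model(x):
    return 3 * x["A"] + x["B"] + 0.5 * x["C"]


means = {"A": 0.5, "B": 0.5, "C": 0.5}
sap = sensitivity_parameters(toy_model, means, {"A": [0, 0.5, 1], "B": [0, 0.5, 1]})
print(sap)  # {'A': 75.0, 'B': 25.0}
```

Input `C` is held at its mean and not perturbed, so it receives no SAP; with a nonlinear model such as GBR, the spans would depend on the chosen value ranges and on the mean values of the other inputs.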

Conclusions
This paper set out to predict the CS and EC of cement replacement concrete using ML models based on six algorithms, to aid in cement replacement concrete design. With the ML models produced in this paper, cement replacement concrete can be designed to achieve the expected CS with the lowest possible EC. The ML models are used to capture the relationship between 12 inputs and 2 outputs, the CS and the EC. The GSA hyperparameter tuning method is used to optimise the parameters of the ML models, and 10-fold cross validation is employed to investigate the prediction ability of the optimal ML models. Finally, several key inputs are identified by applying SA.
With regard to the R² and RMSE values of the ML models, the main findings of this paper can be summarised as follows:
• The GBR models have the best ability to predict the CS and EC of concrete containing cement replacement materials, with R² and RMSE values of (0.946, 0.058) for the CS prediction and (0.999, 0.012) for the EC prediction. On the basis of these values, the GBR models have an excellent ability to predict the CS and EC of cement replacement concrete from the 12 inputs;
• The average R² and RMSE values of the 10-fold cross validation are 0.9471 and 0.2270 for predicting the CS, and 0.9967 and 0.0125 for predicting the EC. These results indicate that the prediction error of the GBR models is very low and, hence, that their strong prediction ability is robust;
• The R² and RMSE values of the other five ML models (SVR, RF, DNN, kNN, and DTR) were compared with those of the GBR model. The results reveal that the GBR model, as a boosting-based ensemble ML algorithm, outperforms the other ML algorithms;
• The SAP results indicate that PFA, GGBS, LP, and SF have a stronger influence on the CS and EC predictions of cement replacement concrete than the other inputs. Thus, more attention should be paid to PFA, GGBS, LP, and SF in the ML-aided design of cement replacement concrete in order to reduce the EC.