Comparative Study of Supervised Machine Learning Algorithms for Predicting the Compressive Strength of Concrete at High Temperature

High temperature severely affects the ingredients used to produce concrete, which in turn reduces its strength properties. Achieving the desired compressive strength of concrete is a difficult and time-consuming task. However, supervised machine learning (ML) approaches make it possible to predict the targeted result in advance with high accuracy. This study presents the use of a decision tree (DT), an artificial neural network (ANN), bagging, and gradient boosting (GB) to forecast the compressive strength of concrete at high temperatures on the basis of 207 data points. The selected models were run using Python code in the Anaconda Navigator software. The models require information regarding both the input variables and the output parameter. A total of nine input parameters (water, cement, coarse aggregate, fine aggregate, fly ash, superplasticizers, silica fume, nano silica, and temperature) were used, while one variable (compressive strength) was selected as the output. The performance of the employed ML algorithms was evaluated with regard to statistical indicators, including the correlation coefficient (R²), mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE). The individual DT and ANN models gave R² equal to 0.83 and 0.82, respectively, while the ensemble bagging and gradient boosting models gave R² of 0.90 and 0.88, respectively. This indicates a strong correlation between the actual and predicted outcomes. The k-fold cross-validation, the correlation coefficient (R²), and the lower errors (MAE, MSE, and RMSE) confirmed the better performance of the ensemble algorithms. Sensitivity analyses were also conducted in order to check the contribution of each input variable. The results show that the use of ensemble machine learning algorithms enhances the performance level of the model.


Introduction
Because concrete has a relatively low cost compared to other materials and is commonly used in engineering structures all over the world, its technology is subject to constant innovation and improvement [1]. Rapid urbanization creates a high demand for concrete [2], which possesses many desired properties, including compressive strength, the ability to take any shape, and the capacity to resist environmental conditions [3]. In addition, porosity, impact resistance, fire resistance, durability, and acoustic insulation are also cited as advantages of concrete [4]. These aspects enable it to be applied in the construction of infrastructure, dams, tunnels, bridges, and reservoirs [5]. The local availability of ingredients, such as coarse aggregate, fine aggregate, water, and binding material, significantly influences its economy [6]. Other building materials, such as steel, also possess many useful properties but are not as cheap as concrete. However, in order to make concrete a more advantageous material with improved properties, the addition of other materials, such as fly ash, silica fume, other cementitious materials, and various fibers, is widely adopted [7–9]. The use of waste materials in concrete plays a vital role in minimizing environmental risks, as well as in reducing the cost of the material [10]. High temperature and fire severely affect the properties of concrete in both its fresh and hardened states [11]. Some structures or structural elements are exposed to high temperatures, e.g., chimneys, factories handling chemicals, and structures used in atomic industries.
Moreover, casting and curing concrete in hot areas is a challenging task, and concrete loses its mechanical properties (compressive and flexural strength) at high temperature, which ultimately results in the loss of its durability [12].
The development of new materials and methods for protecting against high temperatures has gained importance in research due to the increased number of incidents caused by fire [13,14]. Fire is considered a high-frequency disaster, which not only causes the deterioration of cement composites, but also plays a role in the spalling of such material [15,16]. The authors of [17] indicated that the resistance of a structure to high temperature caused by fire is one of the critical factors that influence structural safety. This issue requires further research. Concrete is a commonly applied material and is also considered to be one of the best materials for protecting against high temperatures and the effect of fire [18,19]. The components of concrete (at the stage of the hydration of C-S-H and Ca(OH)2, and at the stage of the formation of calcium aluminate gels) can disintegrate due to extended exposure to heat. This can result in the deterioration of the physicochemical properties of concrete. Therefore, scientists concentrate on analyzing the influence of raised temperature on the mechanical properties of hardened concrete. The differences in the flexural and compressive strength of both ordinary and high-performance concrete have also been investigated when cooled in various conditions (air and water) [20]. In cement composite material (concrete), the decomposition reaction is accompanied by a high porosity of the cement matrix and a decrease in strength parameters. Residues of calcium silicate hydrate can be recognized in the cement matrix when the material is exposed to high temperatures of about 600 to 700 °C [21]. The performance of other types of concrete, e.g., lightweight concrete, has also been investigated with regard to the impact of high temperature [22].
Extensive research has been carried out in order to investigate the mechanical properties of concrete heated to temperatures of up to 800 °C [23–25] or higher [26–28]. It was shown that a rise in the natural temperature (which depends on the climate zone) also has a significant effect on the properties of concrete, which is also relevant to several energy projects [29,30].
Although concrete is generally a non-combustible material, its chemical, physical, and mechanical properties are directly affected by excessive temperature [31]. Thermal stresses, decomposition, and dehydration cause the spalling, perforation, and cracking of concrete [32]. Moreover, the strength properties of the ingredients of concrete are reduced at high temperatures. Cement paste requires a standard temperature range in order to work effectively inside the concrete matrix. High temperature does not allow cement paste to contribute positively towards the strength of concrete. This is especially the case for high-strength concrete, as it requires a normal temperature to achieve its desired strength [33]. The failure of concrete due to fire is caused by many factors, such as the heating rate and temperature, or structural element conditions, e.g., the application of loads [34]. Therefore, it is usually difficult to analyze the direct effect of high temperature on concrete, especially with regard to the microstructural changes of the aggregate, hydrated cement paste, and interfacial transition zone [35].
Sammy et al. [36] studied the compressive strength and properties of high-performance concrete at high temperatures of about 800 and 1100 °C, as well as during cooling. They found that the strength properties decreased sharply after both a gradual (26-34%) and a rapid (22-28%) cooling process. Haruin et al. [37] investigated the effect of high temperature on the compressive strength and splitting tensile strength of lightweight concrete with fly ash. The experimental investigation was conducted at 200, 400, and 800 °C. At 800 °C, a decrease of 63.8% and 76.45% in the compressive strength and the splitting tensile strength of concrete, respectively, was noted. Sammy et al. [38] compared normal-strength concrete and high-strength concrete subjected to high temperatures. The 28-day compressive strength of concrete was tested after different exposure times at various temperatures (400, 600, 800, 1000, and 1200 °C). The compressive strength of concrete containing rubber-modified recycled aggregate was also investigated at elevated temperatures [39].
It is clear from past studies that input parameters directly correlate with output results [40,41]. Supervised machine learning approaches also have the capability of incorporating the effect of temperature change, which indicates the positive aspect of these techniques. ML algorithms show a better performance, with a smaller variance, when considering the parameter of temperature change [42]. The performance of ML approaches is associated with several parameters, including the number of parameters and the data that are used to create the model. The novelty of the authors' research approach also includes the addition of another parameter (temperature effect) for predicting the strength of concrete. The ML approaches, and their comparison in terms of their performance, were investigated in this study. This study included the temperature effect, which was used as an input parameter for investigating the performance of the selected ML approaches during the prediction of the compressive strength of concrete.

Research Significance
This study aimed to forecast the compressive strength of concrete exposed to high temperatures by employing individual and ensemble machine learning algorithms. The decision tree (DT) and artificial neural network (ANN) (as individual algorithms), as well as the bagging regressor and gradient boosting regressor (as ensemble machine learning approaches), were used. The novelty of this research involves the investigation of the accuracy level of individual and ensemble ML algorithms, as well as the evaluation of the accuracy level of each approach for predicting the compressive strength of concrete at high temperatures. This study also compares the statistical indicators that are used to evaluate the model's accuracy. It shows that the ensemble algorithms yielded a stronger relationship between the actual and predicted results than the individual machine learning techniques. Furthermore, the validity and accuracy of all the employed models were evaluated by using k-fold cross-validation and by applying statistical checks. In addition, sensitivity analysis provides information regarding the contribution of the temperature parameter to the prediction of compressive strength. The purpose of this research also includes the comparison of the employed machine learning approaches with the techniques adopted in the literature.

Supervised Machine Learning (ML) Techniques
Machine learning algorithms are increasingly applied in civil engineering for predicting the mechanical properties of concrete. Examples of their application are listed in Table 1. The compressive or flexural strength of concrete can be determined by using the trial-and-error method for concrete samples of various ages. To overcome some limitations of this method, we used machine learning algorithms to forecast outcomes from input data. Hao et al. [43] used the support vector machine (SVM) and k-fold cross-validation to predict the compressive strength of concrete in a marine environment, stating that the SVM performs better than the artificial neural network (ANN) and decision tree (DT). Chengyeo et al. [44] predicted the compressive strength of concrete in a wet-dry environment using the backpropagation artificial neural network (BP-ANN). It was shown that the BP-ANN provides good agreement between the actual and predicted results. Hocine et al. [45] applied the ANN model for predicting the compressive strength of limestone filler concrete. The training, testing, and validation of their data provided a strong correlation (exceeding 97%) with the real data. Behfernia et al. [46] used the ANN and the adaptive neuro-fuzzy inference system (ANFIS) to predict the compressive strength of concrete. They concluded that the ANN is a well-suited model for predicting the compressive strength of concrete. Hoang et al. [47] employed efficient machine learning models for predicting the strength of concrete. They reported that the trained gradient boosting regressor (GBR) and extreme gradient boosting (XGBoost) models performed better than the support vector regressor and multilayer perceptron (MLP).

Description of the Obtained Data
The data points used to run the models via machine learning algorithms were obtained from the literature [20,70–77], and can be seen in Appendix A. The data taken from the published articles describe the behavior of concrete in a hot environment. Nine parameters were taken as the input parameters, namely, cement, water, fine aggregate, coarse aggregate, fly ash, superplasticizer, nano silica, silica fume, and temperature, while compressive strength was taken as the output parameter. These parameters were processed in Jupyter (Python) in order to plot their relative frequency distributions, which can be seen in Figure 1. It is clear that the model's performance was significantly affected by the input variables. The descriptive statistics of the variables used to run the models (with their ranges) are listed in Table 2.

Machine Learning Approaches
This section explains the types of algorithms used for predicting the compressive strength of concrete at high temperatures. The strength property (compressive strength) of concrete was forecasted using both ensemble and individual algorithms. The decision tree, artificial neural network, bagging, and gradient boosting techniques were used to run the models. Python coding was used in Anaconda software for all of the employed machine learning approaches. The applied algorithms are illustrated in Figure 2.
A decision tree is a supervised machine learning technique used for both regression and classification problems. The structure of the decision tree is like a flowchart with a root, branches, and nodes. Each internal node represents a test on an attribute, every branch shows an outcome of the test, and each leaf node provides the class label (or, in regression, the predicted value). The classification rule is represented by the path followed from the root to the leaf. Three different types of nodes are distinguished in a decision tree, drawn with three geometric shapes (square, circle, and triangle). It can generally be seen as a simple technique that is easy to understand and interpret.
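As a minimal sketch of how a regression tree chooses the test at a node, the following pure-Python example searches a single feature for the threshold that minimizes the summed squared error of the two resulting leaves, with each leaf predicting the mean of its targets. The function names (`sse`, `best_split`) and the temperature and strength values are hypothetical illustrations; they are not the code or the data of this study.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, left_mean, right_mean) minimizing the total leaf SSE."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, mean(left), mean(right))
    return best[1:]


# Hypothetical data: exposure temperature (deg C) vs. compressive strength (MPa).
temperature = [20, 100, 200, 400, 600, 800]
strength = [60.0, 58.0, 55.0, 40.0, 25.0, 12.0]

threshold, left_mean, right_mean = best_split(temperature, strength)
print(threshold, left_mean, right_mean)
```

A full tree repeats this search recursively on each side of the chosen threshold and over all input features until a stopping rule (for example, a maximum depth) is reached.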
Bagging, also known as bootstrap aggregating, is arranged in such a way that it can improve the stability and accuracy of the machine learning algorithms used in regression and classification. It is normally used to reduce the variance between the actual and predicted results. Bagging can be applied to any type of method but has commonly been applied with decision tree methods. It is also considered to be a special case of the model averaging technique. Bagging is a parallel ensemble machine learning approach that addresses the variance of prediction models by providing supplementary data in the training stage. These data are produced by sampling with replacement from the original dataset, so each element has an equal chance of appearing in the new dataset. Altering the training set in this way does not by itself improve predictive power, but it can reduce the variance. In this study, the decision tree with bagging was tuned with 20 sub-models to obtain an optimized value and, as a result, a robust output.
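The bootstrap-and-average idea can be sketched as follows. This is an illustrative pure-Python example under stated assumptions: the base learner is a depth-1 regression tree (a "stump") on one feature, the 20 sub-models mirror the number mentioned above, and all function names, the seed, and the data are hypothetical rather than taken from this study.

```python
import random
from statistics import mean


def fit_stump(x, y):
    """Depth-1 regression tree on one feature: (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal feature values cannot be separated
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        lm, rm = mean(left), mean(right)
        cost = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:  # the sample contained a single distinct x value
        m = mean(y)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_bagging(x, y, n_models=20, seed=0):
    """Train n_models stumps, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(x)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        models.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models, xi):
    """Average the predictions of all sub-models."""
    return mean([predict_stump(m, xi) for m in models])


# Hypothetical data: exposure temperature (deg C) vs. compressive strength (MPa).
temperature = [20, 100, 200, 400, 600, 800]
strength = [60.0, 58.0, 55.0, 40.0, 25.0, 12.0]

models = fit_bagging(temperature, strength)
pred_low = predict_bagging(models, 20)
pred_high = predict_bagging(models, 800)
print(pred_low, pred_high)
```

Because every sub-model sees a slightly different resampled dataset, their individual errors partly cancel when averaged, which is the variance reduction described above.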
Gradient boosting is generally accepted as one of the most powerful approaches for creating predictive models. It is an ensemble machine learning algorithm that is normally employed for regression and classification problems. It builds a prediction model in the form of an ensemble of weak prediction models, normally decision trees. When a decision tree is used as the weak learner, the resulting algorithm is referred to as gradient-boosted trees. Gradient boosting can also be employed in the field of learning to rank. It is also used for data analysis in high-energy physics.
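A minimal sketch of gradient boosting for squared-error regression is given below. It assumes depth-1 trees as the weak learners, a constant initial prediction equal to the mean target, and hypothetical values for the number of rounds and the learning rate; none of these settings is reported in the source, and the data are invented for illustration.

```python
from statistics import mean


def fit_stump(x, r):
    """Depth-1 regression tree fitted to targets r: (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(x, r))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        lm, rm = mean(left), mean(right)
        cost = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_gradient_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Sequentially fit stumps to the residuals of the current ensemble."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual y - prediction.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, xi)
                for pi, xi in zip(pred, x)]
    return base, learning_rate, stumps


def predict_gradient_boosting(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)


# Hypothetical data: exposure temperature (deg C) vs. compressive strength (MPa).
temperature = [20, 100, 200, 400, 600, 800]
strength = [60.0, 58.0, 55.0, 40.0, 25.0, 12.0]

model = fit_gradient_boosting(temperature, strength)
fitted = [predict_gradient_boosting(model, t) for t in temperature]
mse_baseline = mean([(s - mean(strength)) ** 2 for s in strength])
mse_boosted = mean([(s - f) ** 2 for s, f in zip(strength, fitted)])
print(mse_baseline, mse_boosted)
```

Unlike bagging, where the sub-models are trained independently, each weak learner here depends on the errors left by the previous ones, so the ensemble is built sequentially.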
The artificial neural network (ANN) algorithm has a brain-like structure with connected neurons. The ANN is essentially a collection of connected units or nodes (known as artificial neurons), which loosely model the human brain. These neural networks learn by processing examples. Each example contains a known "input" and "result", from which probability-weighted associations between the input and the result are formed and stored within the data structure of the network itself. The application of the ANN in the field of civil engineering is of great interest nowadays, especially for predicting the mechanical properties of concrete. This is due to its high accuracy in predicting the actual strength properties of concrete.
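To illustrate what "learning by processing examples" means in the simplest case, the sketch below trains a single artificial neuron with a linear activation by gradient descent on known input/result pairs. It is a deliberately reduced, hypothetical example: the network architecture, activation functions, and training settings used in this study are not given in the text, and the data are invented so that the exact relation (result = 2 × input + 1) is known.

```python
def train_neuron(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit output = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error of the current weights on every example.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Move the weights a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical examples generated from result = 2 * input + 1.
inputs = [0.0, 1.0, 2.0, 3.0]
results = [1.0, 3.0, 5.0, 7.0]

w, b = train_neuron(inputs, results)
print(round(w, 4), round(b, 4))
```

A multilayer network applies the same principle to many neurons at once, with the gradients propagated backwards through the layers.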

Statistical Analysis
The statistical results for the actual and predicted (using supervised machine learning algorithms) compressive strength of concrete at high temperature, as well as their error distribution, are shown in Figure 3. The accuracy level of each model was assessed by means of the correlation coefficient (R²). The DT (individual algorithm) model performed well, with an R² value of 0.83, as depicted in Figure 3a. The model's error distribution can be seen in Figure 3b. The minimum and maximum error values of the DT model were 14.5 MPa and 101.4 MPa, respectively. The average value of the errors was 51.2 MPa. However, 50% of the error data lay between 30 and 70 MPa, and only 7.1% of the data showed an error above 100 MPa, as illustrated in Figure 3b.
The predictive performance of the bagging (ensemble algorithm) model indicates a strong relation with the actual outcomes. The highest value of R² (0.90) was obtained in the case of the bagging regressor. In turn, the values of R² for the ANN, DT, and GB were equal to 0.82, 0.83, and 0.88, respectively. These results indicate a high accuracy level of the prediction. The graphical representation of the predicted and actual results of the compressive strength of concrete at high temperatures can be seen in Figure 3c, with its error distribution in Figure 3d. The maximum and minimum error values for the bagging regressor when predicting the strength property of concrete at increased temperatures were equal to 94.1 and 12.95 MPa, respectively. However, 59.92% of the error data lay between 30 and 70 MPa, as shown in Figure 3d.
The gradient boosting (ensemble ML approach) model also indicates a good agreement between the predicted and actual outcomes for the compressive strength of concrete at high temperatures. The performance of gradient boosting was similar to that of the bagging regressor, although slightly lower, with an R² value of 0.88, as shown in Figure 3e. The error distribution is shown in Figure 3f. The average error of the gradient boosting regressor was equal to 50.76 MPa, whereas the maximum and minimum error values were 114.5 and 6 MPa, respectively. In addition, only 4.76% of the error data were above 100 MPa for this regressor. The statistical results for the ANN model indicate a performance comparable to that of the DT algorithm. The ANN model indicated a strong relation, with a small variance between the actual and predicted outcomes, and provided an R² value of 0.82, as shown in Figure 3g. The distribution of the errors for the ANN model can be seen in Figure 3h. The maximum and minimum values of the error were equal to 24.58 and 0.29 MPa, respectively, while the average value was equal to 9.158 MPa. It was also noted that 57.14% of the error data lay between 0 and 10 MPa, and 19.04% of the data lay between 10 and 15 MPa, with only 2.38% of the data being above 20 MPa.
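The kind of error summary reported above (minimum, maximum, and average error, and the share of errors within a given range) can be computed as in the following sketch. The actual and predicted values are hypothetical and serve only to show the calculation; the function name `share_in_range` is likewise an assumption.

```python
from statistics import mean


def share_in_range(errors, low, high):
    """Percentage of absolute errors e with low <= e < high."""
    inside = sum(1 for e in errors if low <= abs(e) < high)
    return 100.0 * inside / len(errors)


# Hypothetical actual and predicted compressive strengths (MPa).
actual = [60.0, 55.0, 42.0, 30.0, 25.0, 18.0, 48.0, 35.0]
predicted = [57.0, 61.0, 40.0, 41.0, 24.0, 33.0, 47.5, 12.0]

abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]

print("min error:", min(abs_errors))
print("max error:", max(abs_errors))
print("mean error:", mean(abs_errors))
print("share in [0, 10) MPa:", share_in_range(abs_errors, 0, 10), "%")
print("share in [10, 15) MPa:", share_in_range(abs_errors, 10, 15), "%")
print("share above 20 MPa:", share_in_range(abs_errors, 20, float("inf")), "%")
```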

k-Fold Cross Validation and Statistical Checks
To evaluate the actual performance of the models, we adopted the k-fold cross-validation approach. This method is normally employed to analyze how models perform on data not used for training. In this test, the data were arranged randomly and divided into 10 groups. Nine groups were allocated for training purposes, and the remaining one was assigned for validation of the model. The average value was obtained by repeating the same process 10 times, so that each group served once for validation. The 10-fold cross-validation test was used to obtain a reliable estimate of the performance of the models. It was also important to apply statistical checks in order to quantify the performance level of the models. This research therefore includes statistical checks of the prediction performance of the models according to Equations (1)-(5), where $ex_i$ = the experimental value; $mo_i$ = the predicted value; $\overline{ex}$ = the mean experimental value; $\overline{mo}$ = the mean predicted value obtained by the model; and $n$ = the number of samples.
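For orientation, the standard definitions of the four indicators named in this study can be written with the symbols listed above as follows. This is a sketch under an assumption: the equations themselves are not reproduced in this text, the Pearson-type form of R is inferred from the fact that both mean values are defined, and the form of the fifth equation cannot be recovered.

```latex
\begin{align*}
R &= \frac{\sum_{i=1}^{n} \left(ex_i - \overline{ex}\right)\left(mo_i - \overline{mo}\right)}
          {\sqrt{\sum_{i=1}^{n} \left(ex_i - \overline{ex}\right)^{2} \, \sum_{i=1}^{n} \left(mo_i - \overline{mo}\right)^{2}}},
  \qquad R^{2} = R \cdot R \\
\mathrm{MAE} &= \frac{1}{n} \sum_{i=1}^{n} \left| ex_i - mo_i \right| \\
\mathrm{MSE} &= \frac{1}{n} \sum_{i=1}^{n} \left( ex_i - mo_i \right)^{2} \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( ex_i - mo_i \right)^{2}}
\end{align*}
```

With strength expressed in MPa, MAE and RMSE carry the unit MPa, whereas MSE carries the unit MPa².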
The correlation coefficient (R²), mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE) were introduced for evaluating the k-fold cross-validation, as depicted in Figure 4. The validation process was performed for all the employed (DT, ANN, bagging, and gradient boosting) ML algorithms. The small values of the errors of the bagging model, and at the same time the increased value of the correlation coefficient (R²), indicated a better accuracy level when compared to the ANN, DT, and GB. The details of the analysis used for the k-fold cross-validation process are included in Table 3.
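The 10-fold procedure described above can be sketched in pure Python as follows. The regressor is a simple least-squares line standing in for any of the four models, and the dataset is synthetic; both are assumptions made only so that the example is self-contained, as are all function names and the random seeds.

```python
import math
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares line y = a + b * x (stand-in for any regressor)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def error_metrics(actual, predicted):
    """MAE, MSE, and RMSE between actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = mean([e * e for e in errors])
    return {"MAE": mean([abs(e) for e in errors]), "MSE": mse, "RMSE": math.sqrt(mse)}


def k_fold_scores(x, y, k=10, seed=0):
    """Train on k-1 groups, validate on the remaining one, for each of the k groups."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)        # arrange the data randomly
    folds = [idx[i::k] for i in range(k)]   # k groups of (nearly) equal size
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        preds = [a + b * x[i] for i in held_out]
        scores.append(error_metrics([y[i] for i in held_out], preds))
    return scores


# Synthetic data: strength decreasing with temperature, plus noise (hypothetical).
rng = random.Random(1)
temperature = [rng.uniform(20, 800) for _ in range(50)]
strength = [60 - 0.05 * t + rng.gauss(0, 2) for t in temperature]

scores = k_fold_scores(temperature, strength)
average = {name: mean([s[name] for s in scores]) for name in scores[0]}
print(average)
```

Reporting the minimum, maximum, and average of each indicator over the ten folds gives a picture of how strongly the performance depends on the particular split of the data.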
In addition, the statistical checks, including mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE), were evaluated for all the machine learning approaches (Table 4). Smaller error values corresponded to a higher value of the correlation coefficient (R²). The bagging regressor provided an MAE value of 5.65 MPa, which was less than the MAE values of the DT (7.54 MPa), ANN (9.15 MPa), and GB (6.93 MPa). Similarly, the MSE and RMSE of the ANN were higher than those of the DT, bagging, and GB, while the R² value of the ANN was lower than that of the other regressors. Moreover, the statistical representation of the k-fold cross-validation, including the correlation coefficient and errors, is presented in Figure 4. The average value of R² for the DT was 0.42, with its minimum and maximum R² values being equal to 0.03 and 0.82, respectively (Figure 4a). The average R² value of the bagging regressor was equal to 0.44, with its minimum and maximum R² values being equal to 0.03 and 0.77, respectively (Figure 4b). Similarly, the average R² value of the gradient boosting was equal to 0.54, with its minimum value being equal to 0.11 (Figure 4c). In addition, the average values of the errors (MAE, MSE, RMSE) for the ANN model were 13.44 MPa, 258.98 MPa², and 15.28 MPa, respectively (as presented in Figure 4d).

Sensitivity Analysis of the Compressive Strength of Concrete at High Temperatures
Sensitivity analysis was conducted in order to check the parameters that have a significant effect on the prediction of the compressive strength of concrete at high temperatures, as shown in Figure 5. Every variable used to run the model plays its role in predicting the strength of concrete. However, cement was the decisive factor that influenced the prediction of the strength of concrete. Its influence on the obtained results was estimated at 32%. In turn, the influence of fly ash, superplasticizers, silica fume, water, temperature, nano silica, fine aggregate, and coarse aggregate was estimated at the levels of 16%, 15%, 14%, 2%, 6%, 3%, 10%, and 2%, respectively. The result of the sensitivity analyses depends on the number of input parameters and the number of data points used to run the model. However, the contribution of each parameter is identified by the employed ML algorithm. The results of these analyses vary due to the different proportions of the concrete mix and the addition of new input parameters.
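The contributions reported above can be checked and ranked with a few lines of Python. The percentages are those stated in the text; the helper `to_percent`, which normalizes raw importance scores so that they sum to 100%, reflects a common convention and is an assumption, since the text does not state how the contributions were computed.

```python
def to_percent(scores):
    """Normalize non-negative importance scores so that they sum to 100%."""
    total = sum(scores.values())
    return {name: 100.0 * value / total for name, value in scores.items()}


# Contributions (%) of the nine input parameters, as reported in the text.
contribution = {
    "cement": 32,
    "fly ash": 16,
    "superplasticizer": 15,
    "silica fume": 14,
    "water": 2,
    "temperature": 6,
    "nano silica": 3,
    "fine aggregate": 10,
    "coarse aggregate": 2,
}

total = sum(contribution.values())
ranked = sorted(contribution.items(), key=lambda item: item[1], reverse=True)

print("total:", total, "%")
for name, value in ranked:
    print(f"{name:>18}: {value}%")
```

The nine reported shares add up to 100%, and the four largest contributors (cement, fly ash, superplasticizer, and silica fume) together account for 77% of the total, while temperature contributes 6%.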

Discussion
This research compares the performance of the various models with the experimental results for the compressive strength of concrete exposed to high temperatures. Ensemble (bagging, gradient boosting) and individual (ANN, DT) supervised machine learning algorithms were used for prediction purposes. The bagging regressor had a better prediction performance when compared to the ANN, DT, and GB. However, it is difficult to recommend a single best machine learning regressor for all prediction problems, because the performance of the models is directly affected by the input parameters and the data points used to run the model. Ensemble machine learning techniques normally exploit weak learners by building sub-models, which are trained on the data and optimized to obtain a maximum value of R². The performance of the 20 sub-models of the bagging and GB regressors, with their correlation coefficient (R²) values, can be seen in Figure 5. Thus, in agreement with the literature, the ensemble models show more accurate results when compared to individual machine learning approaches. Previous studies have also shown that ensemble ML approaches such as bagging, boosting, and AdaBoost give a better response in the prediction of outcomes.
Moreover, it is also important to know how each parameter performs with regard to predicting outcomes. The sensitivity analysis provides information on how an individual parameter contributes to the prediction of outcomes. The result of the sensitivity analysis for this study can be seen in Figure 6. This study was also based on statistical checks, the validation process, and sensitivity analysis in order to verify the performance level of the evaluated ML techniques. This research could be beneficial with regard to reducing costs and minimizing the time consumed by the trial-and-error method for achieving the desired strength of concrete. In addition, the research results can also be used in other fields of engineering for predicting required outcomes. It was shown that ensemble modeling provides a better performance when compared to the other methods. Therefore, this technique is preferred for forecasting results in the case of related issues.

Conclusions and Future Recommendations
This research provides information about the prediction of the compressive strength of concrete at high temperatures using individual and ensemble supervised machine learning approaches. The application of ML techniques for predicting the performance of concrete is an effective approach, as it shows a high level of accuracy when compared to the actual results. It usually takes a long time (28 days) to determine the strength of concrete. ML algorithms play an important role in reducing this time, and also save a large part of the cost and effort associated with experimental work. In this research, the decision tree (DT) and ANN algorithms were selected as the individual techniques, while the bagging and gradient boosting (GB) regressors were used as ensemble algorithms for forecasting the strength of concrete at high temperatures. The bagging technique was the most effective and had the highest correlation coefficient value. The lower error values from the statistical checks for bagging (MAE of 5.65 MPa, MSE of 61.08 MPa², and RMSE of 7.81 MPa) also indicated its better performance compared to the ANN, DT, and GB. In practice, it is not feasible to evaluate experimentally the effect of temperature on the mechanical properties of concrete prepared with every type of mix. However, temperature and other related effects, such as humidity, can be added as input parameters for running the models to obtain the required output. The following conclusions can be drawn from this study: the ensemble algorithms (bagging and GB) performed well when predicting the compressive strength of concrete, not only at normal temperature, but also at high temperatures.
(a) The performance of the models can be affected by the input parameters. Taking into account the thermal aspect (the main consideration of the paper), we found that the ensemble models showed less discrepancy between the actual and predicted results.
(b) The accuracy level of the bagging and GB regressors was also confirmed using the k-fold cross-validation process.
(c) The contribution of each parameter with regard to predicting the outcome was evaluated by means of sensitivity analysis.
(d) This study describes the positive role of supervised ML approaches in the field of civil engineering. These techniques can be successfully adopted to predict the mechanical properties of concrete without spending time on experimental work in the laboratory. It was also observed that the ensemble machine learning algorithms indicate a stronger relation between the actual and forecasted results when compared to individual algorithms.
(e) A higher accuracy of the models can also be achieved by increasing the number of data points, as the number of data points has a strong influence on the model's outcome.
(f) The performance of the models can also be evaluated on the basis of practical work performed in a laboratory in order to understand the level of difference between the actual and predicted results.
(g) The variance can be reduced by using more than 20 sub-models (in the ensemble techniques) for training on the data, and optimization would then give the maximum R² value.
It should be underlined that it is difficult to recommend any single approach as the most accurate on the basis of only a few trials; other techniques (such as the AdaBoost regressor) can also be used for the prediction of outcomes in order to make comparisons.