Supervised Learning Methods for Modeling Concrete Compressive Strength Prediction at High Temperature

Supervised learning algorithms are a recent trend for the prediction of the mechanical properties of concrete. This paper presents AdaBoost, random forest (RF), and decision tree (DT) models for predicting the compressive strength of concrete at high temperature, based on the experimental data of 207 tests. The cement content, water, fine and coarse aggregates, silica fume, nano silica, fly ash, super plasticizer, and temperature were used as inputs for the models' development. The performance of the AdaBoost, RF, and DT models is assessed using statistical indices, including the coefficient of determination (R2), the root mean squared error-observations standard deviation ratio (RSR), the mean absolute percentage error, and the relative root mean square error. The above-mentioned approaches for predicting the compressive strength of concrete at high temperature are compared with each other, and also with the artificial neural network and adaptive neuro-fuzzy inference system models described in the literature, to demonstrate the suitability of supervised learning methods for modeling the compressive strength at high temperature. The results indicated a strong correlation between experimental and predicted values, with R2 above 0.9 and RSR lower than 0.5 during both the learning and testing phases for the AdaBoost model. Moreover, the sensitivity analysis revealed the cement content in the mix as the most sensitive parameter.


Introduction
Concrete is one of the most versatile materials used in the construction of buildings, subway systems, and many other civil engineering structures. With the rapid development of urbanization, the demand for structural concrete is increasing. As a core component of these structures, concrete may be exposed to adverse conditions such as abrasion, freezing, and chemical erosion over the whole life of the structure. Among these adverse conditions are high temperature and fire. Some examples of concrete structures that are vulnerable to high temperature include industrial structures, such as chimneys operating at high temperature, as well as factories dealing with chemicals with a high fire risk [1]. Fire can raise the temperature in a concrete structure to extreme levels: if the concrete surface exceeds 100 °C, heat transfer can increase the internal temperature of the concrete to 300-700 °C [2].
In a literature-based analysis, Javed et al. [18] predicted the compressive strength of sugar cane bagasse ash concrete. Hadzima-Nyarko et al. [19,20] investigated ANN, k-nearest neighbor, regression tree, and random forest models for predicting the compressive strength of concrete. Zhang et al. [21] developed a model that combines beetle antennae search (BAS) and multi-output least squares support vector regression (MOLSSVR) to predict the compressive strength and permeability coefficient of pervious concrete; according to their findings, the proposed model outperforms support vector regression (SVR), MOLSSVR, logistic regression, and a modified ANN with the firefly algorithm. To estimate the compressive strength of concrete with ground granulated blast-furnace slag, Kandiri et al. [22] developed a hybrid ANN with a multi-objective salp swarm algorithm. Golafshani et al. [23], combining ANFIS and ANN with the grey wolf optimizer, showed that the ANN-based model was more effective than the modified ANFIS. Ali Khan et al. 
[24] used gene expression programming (GEP) for the prediction of the compressive strength of geopolymer concrete (GPC), and found that the GEP model possesses a higher predictive capability and is appropriate for use in the preliminary design of fly-ash-based GPC. The results showed that the aforementioned ML models are able to reproduce the experimental observations with acceptable performance. However, this field continues to be explored.
This paper focuses on the use of computational intelligence techniques, especially the AdaBoost, random forest (RF), and decision tree (DT) algorithms, to predict the compressive strength of concrete at high temperature, emphasizing accuracy, efficiency, and each technique's ability to deal with experimental data. This study also aims to contribute to the knowledge of the application of computational models for predicting the compressive strength of concrete at high temperature, using machine learning and comparing the obtained results with other studies in the available literature. The primary significance of this study is that the division of the data into training and testing datasets was made with due regard to statistical aspects such as the maximum, minimum, mean, and standard deviation. The datasets are split so as to determine the predictive capability and generalization performance of the established models, which later helps to evaluate them better. Finally, a sensitivity analysis is also carried out on the input parameters.
The rest of this article is structured as follows: the next section introduces the data catalog and the selection of input variables. Section 3 presents the preliminaries of the algorithms used in the proposed approach and discusses the model evaluation metrics. The development of the proposed AdaBoost, RF, and DT models is described in Section 4. Section 5 presents the results and discussion. Finally, Section 6 draws conclusions and outlines promising directions for future work.

Data Catalog and Input Variables Selection
The data used in the study comprise a total of 207 experimental results on residual compressive strength, synthesized from previously published "source catalogs." The source catalogs are those of Ergün et al. [3], Cülfik and Özturan [25], Behnood and Ziari [26], Bastami et al. [27], Chen et al. [28], Xiong et al. [29], Mousa [30], Fu et al. [31], and Husem [32]. The data catalog is presented in Table 1 (the entire database can be found in Appendix A, Table A1). It should be noted that the samples chosen from the mentioned references were tested at the age of 28 days. In this study, 165 (80%) and 42 (20%) samples were selected, based on statistical consistency in terms of the minimum (Min.), maximum (Max.), and mean, for the training and testing, respectively, of the proposed models. The statistical consistency of the training and testing datasets optimizes the performance of the models and eventually helps to analyze them better. A significant step in predictive modeling in data mining is the selection of the input variables that represent the system to be modeled. The input variables of a data-driven model should contain all relevant information about the target output. At the same time, they depend to a large extent on the information available in the form of input-output data pairs. The data available in the literature on concrete's compressive strength under high temperature are the mix proportions, the temperature, and the compressive strength associated with that temperature. Consequently, the temperature and the mix proportions (the quantities of the different materials, such as water, cement, fine and coarse aggregates, and admixtures) are a natural choice of input variables for predicting the 28-day compressive strength of concrete at any temperature.
In this study, the input variables are cement (C), water (W), fine aggregates (FA), coarse aggregates (CA), silica fume (SF), nano silica (NS), fly ash (F), super plasticizer (SP), and temperature (T); the output variable is the compressive strength of concrete at temperature T (f'c,T). The descriptive statistics of each input and output are listed in Table 2.

Methodology
In this section, each of the algorithms used in the proposed methodology is briefly described.

AdaBoost
AdaBoost is a commonly used boosting algorithm that constructs an ensemble over multiple iterations, each with different instance weights, adapting to the errors returned by the classifiers of previous iterations [33,34]. In each iteration, changing the weights of the training instances forces the learning algorithm to put more emphasis on instances previously classified incorrectly, and less emphasis on instances previously classified correctly. In other words, the weights of misclassified instances are increased, while the weights of correctly classified instances are reduced. This ensures that misclassification errors on those instances count more heavily in the following iterations. AdaBoost combines the predictions of several weak classifiers, and the final prediction is given by a combined weighted vote.
The principal concept of the AdaBoost learning algorithm is to create a strong classifier with high detection performance by joining weak classifiers. The algorithm iterates over two operations: selecting a weak classifier and learning its weight. By repeating this calculation, the overall classification ability is strengthened, because weak classifiers are added, according to their performance, to the stronger combined classifier [35].
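The iterative re-weighting described above can be sketched with scikit-learn's `AdaBoostRegressor` (an assumed stand-in for the paper's Orange workflow; the synthetic data and the parameter values are illustrative, not those of the study). The default weak learner is a depth-3 decision tree.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

# Synthetic placeholder data: 200 samples, 9 features (standing in for
# C, W, FA, CA, SF, NS, F, SP, T) and a noisy target.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 9))
y = X @ rng.uniform(size=9) + 0.1 * rng.normal(size=200)

# Each boosting round re-weights the training samples so that poorly
# predicted instances receive more emphasis in the next round; the final
# prediction combines the weak learners' outputs.
model = AdaBoostRegressor(n_estimators=50, learning_rate=1.0, random_state=0)
model.fit(X, y)
print(round(model.score(X, y), 3))  # training R^2
```

Increasing `n_estimators` adds more weak learners to the combined model, typically improving the training fit at the cost of longer training time.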

Random Forest
Random forest (RF) is a supervised learning algorithm used for both classification and regression, although it is primarily used for classification problems. Breiman presented the theoretical development of RF [36]. The RF algorithm generates decision trees on data samples, obtains a prediction from each of them, and eventually selects the best solution by voting. It is an ensemble approach that performs better than a single decision tree because, by averaging the results, it decreases over-fitting. The working procedure of an RF algorithm consists of the following steps:
Step 1-Start by selecting random samples from a given dataset.
Step 2-Next, for each sample, this algorithm creates a decision tree. Then, from any decision tree, it gets the prediction result.
Step 3-For any predicted outcome, voting is carried out in this step.
Step 4-Eventually, the outcome with the most votes is selected as the final prediction. Figure 1 presents the working architecture of RF.
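The four steps above can be sketched with scikit-learn's `RandomForestRegressor` (an assumed API, not the paper's Orange workflow; the data are synthetic placeholders). For regression, the "voting" of Steps 3-4 amounts to averaging the per-tree predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 9))
y = X.sum(axis=1) + 0.1 * rng.normal(size=200)

# Steps 1-2: bootstrap sampling and growing one decision tree per sample
forest = RandomForestRegressor(n_estimators=100, random_state=1)
forest.fit(X, y)

# Steps 3-4: for regression, the final prediction is the average of the
# individual tree predictions, which is what forest.predict() returns.
per_tree = np.stack([t.predict(X[:5]) for t in forest.estimators_])
assert np.allclose(per_tree.mean(axis=0), forest.predict(X[:5]))
```

The averaging over many bootstrapped trees is what reduces the over-fitting of any single tree, as noted above.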

Decision Tree
The decision tree (DT) constructs classification or regression models in the form of a tree diagram. It divides a dataset into smaller and smaller subsets while simultaneously building an associated decision tree [37]. To predict a class label for a record with a DT, we start at the root of the tree and compare the values of the root attribute to the attributes of the record. Based on the comparison, we follow the branch corresponding to that value and jump to the next node. Decision trees classify instances by sorting them down the tree from the root to a specific leaf or terminal node, and assigning the classification of that leaf node to the instance. Every tree node serves as a test case for a specific attribute, and each edge descending from the node corresponds to one of the possible answers to the test case. This recursive method is replicated for every sub-tree rooted at a new node. The DT parameters include the minimum number of instances per leaf, the minimum subset size for splitting, the maximum depth, and a majority threshold below which nodes stop splitting.
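The stopping parameters listed above map directly onto scikit-learn's `DecisionTreeRegressor` (used here as an assumed stand-in for the paper's Orange DT; the data and parameter values are illustrative).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(size=(200, 9))
y = X[:, 0] * 10 + rng.normal(size=200)

tree = DecisionTreeRegressor(
    max_depth=6,           # maximum number of depth levels
    min_samples_leaf=5,    # minimum number of instances per leaf
    min_samples_split=10,  # do not split subsets smaller than this
    random_state=2,
)
tree.fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
```

Tightening these parameters (shallower depth, larger leaves) restrains the recursive splitting and reduces over-fitting, at the cost of a coarser fit.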

Construction of Prediction Models
The proposed models for the prediction of the compressive strength of concrete at high temperature (f'c,T) were developed using the Orange software. The model structure was based on an input matrix x = [C, W, FA, CA, F, SP, SF, NS, T], which provided the predictor variables, while the target variable y was f'c,T. The most important task in every modeling phase is to find an acceptable split between the training and testing data. Therefore, 80% of the total dataset was used to create the models in this analysis, and the developed models were evaluated on the remaining data; in other words, 165 and 42 samples were used to build and evaluate the models, respectively. Based on a trial-and-error process, all models (AdaBoost, RF, and DT) were tuned in order to optimize the f'c,T prediction. The construction of the prediction models is shown in Figure 2.
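The 165/42 split described above, together with the statistical consistency check mentioned in Section 2, can be sketched as follows (a hedged scikit-learn/NumPy sketch, not the paper's Orange workflow; the data are synthetic placeholders).

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.uniform(size=(207, 9))   # stands in for [C, W, FA, CA, F, SP, SF, NS, T]
y = X @ rng.uniform(size=9)      # stands in for f'c,T

# 80/20 split: 165 training samples and 42 testing samples
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=42, random_state=3)
print(len(X_tr), len(X_te))

# Statistical consistency check (Min., Max., mean) between the subsets
for name, arr in [("train", y_tr), ("test", y_te)]:
    print(name, round(arr.min(), 2), round(arr.max(), 2), round(arr.mean(), 2))
```

In practice, one would retry the split (or stratify) until the subset statistics agree closely, as the paper's statistically consistent division requires.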

Hyperparameter Optimization
Most ML algorithms have hyperparameters that need to be tuned. To obtain the best prediction accuracy, the optimization process aims to determine the best parameters for AdaBoost, RF, and DT. In this research, as shown in Table 3, some critical hyperparameters of the AdaBoost, RF, and DT algorithms were tuned; Table 3 also clarifies the specific meaning of each hyperparameter. First, initial values for the tuning parameters of the models were selected, and then varied over subsequent trials until the best fitness measures reported in Table 3 were achieved.
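The paper tunes these hyperparameters by trial and error in Orange; an automated equivalent can be sketched with scikit-learn's `GridSearchCV` (the parameter grid below is an illustrative assumption, not the grid of Table 3, and the data are synthetic placeholders).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(4)
X = rng.uniform(size=(165, 9))
y = X.sum(axis=1) + 0.1 * rng.normal(size=165)

# Each grid point is one "trial": fit, cross-validate, keep the best R^2.
grid = GridSearchCV(
    RandomForestRegressor(random_state=4),
    param_grid={"n_estimators": [50, 100], "max_depth": [4, 8]},
    cv=3,
    scoring="r2",
)
grid.fit(X, y)
print(grid.best_params_)
```

The same pattern applies to the AdaBoost and DT models by swapping the estimator and its parameter grid.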

Model Evaluation Indexes
The coefficient of determination (R2), the root mean square error (RMSE)-observations standard deviation ratio (RSR), the mean absolute percentage error (MAPE), and the relative root mean square error (RRMSE), as the most common criteria in the literature, are used in this study to evaluate the results of the proposed models.
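The equations for these indexes do not survive in the source text; the standard formulations consistent with the definitions given below are reconstructed here (an assumed reconstruction, not a verbatim reproduction of the paper's equations):

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}

\mathrm{RSR} = \frac{\mathrm{RMSE}}{\sigma_y},
\qquad
\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|,
\qquad
\mathrm{RRMSE} = \frac{\mathrm{RMSE}}{\bar{y}} \times 100
```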
In the equations, n is the number of data; y_i and ŷ_i are the actual and predicted outputs of the ith sample, respectively; and ȳ is the average actual output. The R2 coefficient ranges from 0 to 1, and a model with a higher R2 value is more efficient. The MAPE and RRMSE criteria measure the percentage error of the model in two different forms, ranging from 0 to 100. The model is deemed effective when the value of R2 is greater than 0.8 and close to 1 [38]. The RSR is calculated as the ratio of the RMSE to the standard deviation of the measured data. The RSR varies from an optimal value of 0 upwards. A lower RSR reflects a lower RMSE, indicating a higher predictive efficiency of the model. RSR classifications of very good, good, acceptable, and unacceptable correspond to the ranges 0.00 ≤ RSR ≤ 0.50, 0.50 < RSR ≤ 0.60, 0.60 < RSR ≤ 0.70, and RSR > 0.70, respectively [39]. Clearly, the lower the values of the RSR, MAPE, and RRMSE criteria, the better the model.
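Under the standard definitions assumed above, the three error-based indexes can be computed as follows (a minimal NumPy sketch; `y_true` and `y_pred` are placeholder arrays, not data from the study).

```python
import numpy as np

def rsr(y_true, y_pred):
    """RMSE divided by the standard deviation of the measured data."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / np.std(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rrmse(y_true, y_pred):
    """RMSE relative to the mean of the measured data, in percent."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / np.mean(y_true)

y_true = np.array([40.0, 50.0, 60.0, 70.0])   # placeholder strengths (MPa)
y_pred = np.array([42.0, 49.0, 61.0, 68.0])
print(round(rsr(y_true, y_pred), 3),
      round(mape(y_true, y_pred), 2),
      round(rrmse(y_true, y_pred), 2))
```

All three functions decrease toward 0 as the predictions approach the measurements, matching the "lower is better" reading above.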

Results and Discussion
In this section, using the training and test datasets, the predictive performance of the established AdaBoost, RF, and DT models is assessed. The training dataset is used to fit the model structure and parameters; as a result, the performance of the models on the training dataset indicates how well each model is trained. The test dataset, by contrast, is used only after the model has been determined, to evaluate its quality. Based on the predicted values, the different statistical measures of the models for both the training and test phases are shown in Tables 4 and 5 ("−" indicates that the corresponding performance statistic is not reported in the reference). Figures 3 and 4 display scatter plots of the experimental (actual) and predicted compressive strength of concrete at high temperature for the three supervised learning techniques in the training and testing phases. From these findings, it is clear that, in terms of the statistical performance measures, all models performed effectively in predicting the compressive strength of concrete at high temperature. On the training dataset, R2 was highest for AdaBoost (R2 = 0.999), as compared with the other two models, RF (R2 = 0.965) and DT (R2 = 0.968). Similarly, in the testing phase, R2 was highest for AdaBoost (R2 = 0.938), as compared with RF (R2 = 0.935) and DT (R2 = 0.911).
Furthermore, in terms of the statistical measures in training, the lowest values were found for AdaBoost (RSR = 0.032, MAPE = 1.357%, RRMSE = 1.666%) compared to RF (RSR = 0.190, MAPE = 11.306%, RRMSE = 9.869%) and DT (RSR = 0.178, MAPE = 9.747%, RRMSE = 9.265%). Similarly, regarding the prediction results in testing, the lowest values were found for AdaBoost (RSR = 0.248, MAPE = 12.523%, RRMSE = 11.622%) compared to RF (RSR = 0.256, MAPE = 13.076%, RRMSE = 11.661%) and DT (RSR = 0.324, MAPE = 16.100%, RRMSE = 14.753%). This superiority may be due to the fact that the AdaBoost model excellently captures the nonlinear relationships between the concrete mix proportions and temperature, on the one hand, and the compressive strength, on the other. It can therefore be concluded, based on the statistical analysis checks, that the AdaBoost model achieved the best results. Additionally, the R2, MAPE, and RRMSE of the values predicted by the ANFIS method [40] were 0.94, 14%, and 13%, respectively, on the training dataset, and those of the ANN method [20] were 0.96, 9%, and 10%. Similarly, on the testing dataset, the R2, MAPE, and RRMSE of the ANFIS method [40] were 0.89, 20%, and 15%, and those of the ANN method [40] were 0.92, 12%, and 12%. The AdaBoost model improved on the ANFIS and ANN models in terms of the R2, MAPE, and RRMSE values; in particular, it yielded the best results on both the training and testing datasets.
Finally, it can be seen that the prediction accuracy of the AdaBoost model is higher than that of the RF and DT models. In general, this study may assist engineers in selecting appropriate supervised learning models and parameters for the production of high-temperature concrete. The values obtained from the three models and the experimental values are presented in Figure 5. It can be inferred from this figure that the AdaBoost model, with its nine input variables, may be sufficient and reasonably precise for the estimation of the compressive strength of concrete at high temperature. Based on the findings, using this set of nine input variables is justifiable and useful for practical engineering applications.
Furthermore, a sensitivity analysis was also conducted using Yang and Zang's [41] method to evaluate the influence of the input parameters on f'c,T based on the AdaBoost algorithm. This approach has been used in several studies [42-44]. In its formulation, n is the number of data values (165 in this study), and y_im and y_om are the input and output parameters, respectively. The r_ij value ranges from zero to one for each input parameter, and a higher r_ij value indicates a stronger influence of that input on the output parameter (f'c,T in this study). The r_ij values for all input parameters are presented in Figure 6.
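The r_ij expression itself does not survive in the source text; the formulation commonly attributed to Yang and Zang in the cited studies is the normalized inner product of each input series with the output series, r_ij = Σ(y_im·y_om) / √(Σy_im²·Σy_om²). A sketch under that assumption (with synthetic placeholder data, not the study's 165 samples):

```python
import numpy as np

def sensitivity_rij(x, y):
    # Normalized inner product between an input series x and output y;
    # for non-negative data this lies between 0 and 1.
    return np.sum(x * y) / np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))

# Placeholder data: 165 samples, mimicking two of the nine inputs
rng = np.random.default_rng(5)
inputs = {"C": rng.uniform(300, 500, 165), "T": rng.uniform(20, 800, 165)}
output = rng.uniform(10, 90, 165)   # stands in for f'c,T

for name, series in inputs.items():
    print(name, round(sensitivity_rij(series, output), 3))
```

An input perfectly proportional to the output yields r_ij = 1, which is why higher values mark the more influential parameters.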
The sensitivity analysis revealed the cement (C) content (r_ij = 0.895) in the mix as the most sensitive parameter, followed by FA (r_ij = 0.852), CA (r_ij = 0.846), W (r_ij = 0.805), SP (r_ij = 0.754), SF (r_ij = 0.508), T (r_ij = 0.505), F (r_ij = 0.432), and NS (r_ij = 0.343). Similar to other artificial intelligence techniques, supervised learning models have a limited domain of applicability and are mostly case dependent. Therefore, their generalization is limited, and they are only applicable within the range of the training datasets. Furthermore, the developed AdaBoost model, in comparison to the other models, is suitable for accurately and efficiently predicting the compressive strength of normal-strength concrete (NSC) and high-strength concrete (HSC) at high temperature. However, this model can always be updated to yield better results as new data become available.

Conclusions and Future Prospect
Robustness and sensitivity analyses of three supervised learning models (i.e., AdaBoost, RF, and DT) were performed in this study for the prediction of the compressive strength of concrete at high temperature. Statistical criteria such as R2, RSR, MAPE, and RRMSE were used to test the predictive abilities of the aforementioned models. In addition, the developed models were compared with ANFIS and ANN models from the literature in order to evaluate their robustness. The testing phase results revealed that the supervised learning models built in this study performed well in predicting the concrete compressive strength at high temperature, and that the most effective model was AdaBoost (R2 = 0.938, RSR = 0.248, MAPE = 12.523%, and RRMSE = 11.622%). The statistical analysis checks reveal that the AdaBoost model enhances the model accuracy by minimizing the error between the targeted and predicted values. The results of the sensitivity analysis show that five parameters, namely the cement content, the fine and coarse aggregates, the water, and the super plasticizer, were the most sensitive and important factors for predicting the compressive strength of concrete at high temperature. It can therefore be inferred that the AdaBoost model is a promising method for predicting the concrete compressive strength at high temperature, and that it can be extended to predict other significant concrete properties, such as the elasticity modulus, the flexural strength, or the tensile strength. Thus, the application of AdaBoost for predicting the compressive strength at high temperature, as opposed to destructive testing methods, is appropriate and can be seen as a suitable alternative approach.
Different artificial intelligence (AI) techniques, such as fuzzy logic, response surface methodology (RSM), support vector machine (SVM) analysis, random forest regression (RFR), and recurrent neural networks (RNN), may also be applied for a better understanding and prediction of the compressive strength of concrete at high temperature. Furthermore, to improve the performance of the prediction models, more experimental data should be collected in future work.

Data Availability Statement:
The data used to support the findings of this study are included within the article.

Conflicts of Interest:
The authors declare no conflict of interest.