Random Forest Modeling for Fly Ash-Calcined Clay Geopolymer Composite Strength Detection

Geopolymer is an eco-friendly material used in civil engineering works. In this study, geopolymer concrete (GPC) was prepared from waste fly ash (FA) combined with calcined clay (CC), with CC replacing FA at 5%, 10%, and 15%. Because no systematic mix design methodology has yet been developed for geopolymers, random forest regression (RFR) was used to forecast compressive strength and split tensile strength. The input variables were caustic soda (NaOH) at 12 M, 14 M, and 16 M; sodium silicate; coarse aggregate passing the 20 mm and 10 mm sieves; crushed stone dust; superplasticizer; curing temperature; curing time; added water; and retention time. A total of 35 samples with a target compressive strength of 30 MPa were prepared and tested at the standard age of 28 days. Of the data, 80% were used for training and 20% for testing. Model performance was evaluated using the mean absolute error (MAE), mean squared error (MSE), root mean square error (RMSE), and coefficient of determination (R²). The results demonstrated that, during training, the RFR model can predict GPC compressive strength (MAE = 1.85 MPa, MSE = 6.83 MPa², RMSE = 2.61 MPa, and R² = 0.93) and split tensile strength (MAE = 0.20 MPa, MSE = 0.05 MPa², RMSE = 0.24 MPa, and R² = 0.90).


Introduction
Fly ash (FA) is a byproduct of electricity generation in thermal power plants [1]. It is carried by the flue gases and collected by electrostatic or mechanical separators [2]. Geopolymers were first proposed by the French scientist Davidovits, and geopolymer technology is one of the potential routes to increasing the use of fly ash. With regard to global warming, alkali-activated fly ash geopolymer technology not only has the potential to substantially lower the carbon footprint of ordinary Portland cement concrete but also offers enormous scope as a supplementary binder in composite manufacturing [3]. Efficient waste management is important for maintaining a safe environment [4]. The mechanical properties of geopolymer concrete depend on various factors, including the curing temperature, curing time (h), specimen age (days), total aggregate volume fraction (%), sodium hydroxide solution molarity (M), SiO₂ solids content (%) of the sodium silicate, and superplasticizer dosage (%) [5,6]. Porosity in the geopolymer network lowers compressive strength. Fly ash, however, has a low water demand, and its spherical particles promote dense packing that reduces porosity [7]. Blends of calcined clay and fly ash have been analyzed in terms of the role of each addition, reactivity, compressive strength, structural and microstructural characteristics, and the CC-to-FA ratio; in that work, Na₂SiO₃/NaOH was used as the activator for geopolymer mortars containing 0, 25, 50, and 75% fly ash and calcined clay [8]. In another study, 357 data points were collected from the literature, and the compressive strength of high-strength concrete was predicted using ensemble random forest (RF) and gene expression programming (GEP) algorithms. Conventionally, determining a specific response of a proportioned blend requires trial mixes.
However, engineers now use mathematical models such as linear regression, neural networks (NN), and support vector regression (SVR) to simulate a specific response and verify the predictive performance. Because the relationship between mix attributes and composite properties is strongly nonlinear, such models are well suited to capturing it [9]. In one such study, the dataset included the cement ratio, silicate ratio, grinding (pulverization) time, specimen age, and compressive strength. As the number of trees in random forest regression (RFR) increased, the prediction error on data outside the training set decreased, stabilizing at a low value after about 600 trees. Using input data obtained from laboratory experiments, the random forest model forecasted compressive strength with an R² of 0.89 [10]. For a cement/fly ash-based high-performance composite, a dataset of 56 samples was used with RFR to predict 28-day compressive strength, and the RFR and back-propagation neural network (NN) models were compared on the same dataset with and without feature selection [11]. A database of 453 experimental samples of concrete containing ground granulated blast furnace slag (GGBFS) was used with the RFR model to predict compressive strength [12]. Rubberized concrete (RC) is a cost-effective and eco-sustainable building material. For it, a total of 138 data points were collected from the literature, and the random forest (RF) was coupled with the beetle antennae search algorithm, which was used to tune the key parameters of the RF. The resulting model accurately predicted the compressive strength of rubberized concrete, with a correlation coefficient of 0.96 [13]. Table 1 briefly summarizes previous random forest regression studies.
Fly ash-calcined clay-based geopolymer composites have received much less research attention. In the current study, 35 samples were obtained from experimental work. As FA and CC are available as waste materials, they were blended at varying proportions of 5%, 10%, and 15%, together with other ingredients: coarse aggregate (passing the 10 mm and 20 mm IS sieves), stone dust as fine aggregate, NaOH (12 M, 14 M, and 16 M), Na₂SiO₃, superplasticizer, and added water. Curing was carried out at ambient temperature, 80 °C, and 100 °C for 24 and 48 h. The measured compressive strength was then predicted using random forest regression, with 80% of the samples used for training and 20% for testing. The coefficient of determination (R²) was used to judge the acceptability of the model. This machine learning approach can reduce the cost and time of tedious laboratory work.

Fly Ash
Fly ash conforming to IS 3812-2003 was collected from the Suratgarh thermal power plant, Rajasthan. Its particle density was 2250 kg/m³. The chemical composition of the fly ash (% by mass) is given in Table 2. Fly ash was replaced by calcined clay at 5%, 10%, and 15%. The physical properties determined were fineness (residue retained on the 45 µm sieve) and the strength activity index, which was 80 to 86% at 28 days (specified minimum 75%) and 95 to 103% at 90 days (specified minimum 85%). The particle size distribution of fly ash has a significant impact on geopolymer concrete [14]. Raising the curing temperature increases compressive strength, whereas porosity in the geopolymer network reduces it.

Calcined Clay
The calcined clay was prepared by burning clay at 550 °C for 1 h, followed by mechanical activation by milling for 4 h in a ring mill. It was procured from Alwar, Rajasthan, India; clay is abundantly available as a natural resource in many places. Calcined clay replaced fly ash at 5%, 10%, and 15% in the geopolymer concrete. Calcined clay has previously been used in geopolymer mortar, but no study was found using it in geopolymer concrete [14].

Sodium Silicate Solution
Sodium silicate (Na₂SiO₃), also called water glass, is also available as a gel. The SiO₂-to-Na₂O ratio of the sodium silicate used in this study was 1.95 to 2.3. It was purchased from the market as a liquid solution with a composition of 13.5% Na₂O, 33% SiO₂, and water (Table 3).

Sodium Hydroxide
Sodium hydroxide was purchased in solid flake (chip) form from the local market in Bhiwadi, Rajasthan. It was dissolved in potable water to obtain 12 M, 14 M, and 16 M NaOH solutions. Geopolymer pastes were prepared by blending calcined clay and fly ash with the alkaline activator, whose alkalinity was adjusted by using sodium hydroxide solutions of different molarities.
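Because molarity is moles of solute per litre of solution, the mass of NaOH flakes needed for each concentration follows directly from the molar mass of NaOH (about 40.00 g/mol). The Python sketch below is illustrative only: it assumes each solution is made up to a final volume of 1 L and, by default, that the flakes are pure NaOH. The function name and the `purity` parameter are hypothetical additions for flakes of lower assay.

```python
# Molar mass of NaOH in g/mol (Na 22.990 + O 15.999 + H 1.008).
NAOH_MOLAR_MASS = 39.997


def naoh_solids_per_litre(molarity, purity=1.0):
    """Grams of NaOH flakes per litre of *solution* at the given molarity.

    purity is the mass fraction of NaOH in the flakes (1.0 = pure, an assumption).
    """
    if molarity <= 0 or not 0 < purity <= 1:
        raise ValueError("molarity must be positive and purity in (0, 1]")
    return molarity * NAOH_MOLAR_MASS / purity


for m in (12, 14, 16):
    print(f"{m} M -> {naoh_solids_per_litre(m):.1f} g NaOH per litre of solution")
```

Under these assumptions, the 12 M, 14 M, and 16 M solutions need roughly 480 g, 560 g, and 640 g of NaOH per litre of solution; less pure flakes require proportionally more.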

Superplasticizer
Polycarboxylate ether was used as the superplasticizer, at a dosage of 1% of the fly ash content.

Fine Aggregate/Crushed Stone Dust
Because of a ban on sand, which makes it scarce and costly, sand was replaced with crushed stone dust. Stone dust is abundantly available near the study area in Rajasthan and is widely used in conventional concrete, so the same stone crusher dust was used in the geopolymer concrete. It conforms to IS 383, and its physical properties are given in Table 4. Locally available coarse aggregate of 10 to 20 mm size was used.

Sample Preparation and Testing Method
First, 12 M, 14 M, and 16 M NaOH solutions were prepared one day before use by dissolving solid caustic soda in water. The sodium hydroxide solution was then mixed with sodium silicate solution to prepare the alkaline activator; the alkaline activator ratio was kept at 2.5. To obtain a uniform mix, the dry ingredients (fly ash, calcined clay, stone dust, and coarse aggregate, as listed in Table 5) were mixed for 3 min before the alkaline activator was added. The alkaline activator, superplasticizer, and required amount of water were then added to the dry mixture, and mixing continued in the concrete mixer for 2 min. The fresh geopolymer concrete was cast in 100 mm cube molds for compressive and tensile strength testing. All specimens were left at room temperature for one day, then demolded and cured at different temperatures. The geopolymer concrete was prepared with 5% to 15% replacement of fly ash by calcined clay, and the specimens were cured at ambient temperature, 80 °C, or 100 °C for 24 or 48 h. Heat-cured specimens were removed from the oven after 24 or 48 h and kept at room temperature. The 28-day compressive and split tensile strengths were then determined, and the slump of the fresh concrete was also measured.
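The compressive strength of a cube specimen is its peak failure load divided by the loaded face area. As a hedged illustration, the sketch below converts a load in kN on the 100 mm cubes described here into MPa; the function name and the 300 kN example load are hypothetical, chosen only because that load corresponds to the 30 MPa target strength.

```python
def cube_compressive_strength(peak_load_kn, side_mm=100.0):
    """Compressive strength in MPa (= N/mm^2): peak load divided by loaded face area."""
    if peak_load_kn <= 0 or side_mm <= 0:
        raise ValueError("load and side length must be positive")
    area_mm2 = side_mm * side_mm              # 100 mm cube -> 10 000 mm^2
    return peak_load_kn * 1000.0 / area_mm2   # kN -> N, then N/mm^2 = MPa


# A hypothetical failure load of 300 kN on a 100 mm cube:
print(cube_compressive_strength(300.0))  # 30.0
```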

Modeling Technique
In the current study, the compressive and tensile strengths of the fly ash-calcined clay geopolymer composites were predicted with random forest regression, which is briefly described below.

Random Forest Regression (RFR)
Ho [15] first developed the general method of random decision forests in 1995, and Leo Breiman [16] created an extension of the algorithm. The term "random forest" refers to an ensemble of decision trees used to improve predictive accuracy. As a classifier, the random forest algorithm has two phases: feature selection and classification. Compared with traditional classification algorithms, random forests have low classification errors. The main parameters are the number of trees, the minimum node size, and the number of features used to split each node [17]. Random forest is a non-parametric method derived from classification and regression trees. Each tree is grown on its own bootstrap sample, which leaves out about one-third of the observations (the out-of-bag sample) for validation, and each split is determined from a random subset of the predictors at that node. The forest's output is the average of the individual tree results [18]. Random forest is a relatively recent ensemble machine learning approach [19]. The strength of the individual decision trees and the correlation between them are the key factors that determine the generalization error of a random forest [16]. Accordingly, research has sought to constrain the decision trees and find the best subset of trees in the forest, since pruning the forest can yield an efficient random forest regression in both training and testing. Random forest performs well because of its bootstrap sampling and, in particular, the randomized choice of split candidates at every node of the tree [20]. Figure 1 depicts the methodological approach of the proposed method.
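The mechanics described above (a bootstrap sample per tree, a random subset of features at each split, out-of-bag rows, and averaging of the tree outputs) can be sketched in plain Python. This is a minimal from-scratch illustration under stated assumptions, not the implementation used in this study: trees are grown until nodes are pure, each split minimizes the sum of squared errors, and `max_features` defaults to about one-third of the inputs, a common choice for regression. All function names are hypothetical.

```python
import random
from statistics import mean


def _sse(values):
    """Sum of squared deviations from the mean: the impurity of a regression node."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, max_features, min_leaf, rng):
    """Grow one regression tree; each split only considers a random feature subset."""
    if len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)  # leaf: predict the mean of the samples that reached it
    best = None  # (sse, feature, threshold)
    for f in rng.sample(range(len(X[0])), max_features):
        for t in sorted({row[f] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no admissible split among the sampled features
        return mean(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    left_node = build_tree([X[i] for i in li], [y[i] for i in li], max_features, min_leaf, rng)
    right_node = build_tree([X[i] for i in ri], [y[i] for i in ri], max_features, min_leaf, rng)
    return (f, t, left_node, right_node)


def predict_tree(node, row):
    """Follow the splits from the root down to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=100, max_features=None, min_leaf=1, seed=0):
    """Fit n_trees trees, each on its own bootstrap sample; collect out-of-bag predictions."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    max_features = max_features or max(1, p // 3)  # roughly p/3 features per split
    forest, oob = [], [[] for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx], max_features, min_leaf, rng)
        forest.append(tree)
        for i in set(range(n)) - set(idx):  # rows this tree never saw (~1/3 on average)
            oob[i].append(predict_tree(tree, X[i]))
    return forest, oob


def predict_forest(forest, row):
    """Regression output of the forest: the average of the individual tree predictions."""
    return mean(predict_tree(tree, row) for tree in forest)
```

For example, `forest, oob = fit_forest(X, y, n_trees=100)` followed by `predict_forest(forest, row)` returns the averaged prediction for a new mix; the `oob` lists hold each sample's out-of-bag predictions, which can serve as an internal validation estimate.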

Data Collection
Experimental data on calcined clay and fly ash geopolymer concrete were collected from laboratory work, giving a total of 35 data points. Of these, 28 were used to train the random forest model and the remaining 7 to test it. Figure 2 illustrates the modeling procedure applied to the collected data, and Algorithm 1 outlines the proposed workflow.

Algorithm 1 Random forest modeling.
Input: fly ash-calcined clay (FACC) geopolymer concrete dataset. Output: compressive and tensile strength of the FACC-based geopolymer composite.

Stage-by-stage procedure of RFR modeling:
Stage 1: Load the data.
Stage 2: Apply preprocessing.
Stage 3: Divide the dataset into training and test sets.
Stage 4: Train the random forest on the training set.
Stage 5: Feed the test set to the trained random forest to predict strength.
Stage 6: Calculate the accuracy and error metrics.
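Stage 3 of Algorithm 1, the 28/7 split of the 35 samples, can be sketched as a reproducible shuffle of sample indices. The seed, the function name, and the use of a simple (unstratified) random split are assumptions for illustration; the text does not state how the seven test samples were chosen.

```python
import random


def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Reproducibly shuffle sample indices and hold out a fraction for testing.

    The seed and the simple random (unstratified) split are illustrative assumptions.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


train_idx, test_idx = split_indices(35)
print(len(train_idx), len(test_idx))  # 28 7
```

Fixing the seed keeps the same seven samples in the test set across runs, so results for different model settings remain comparable.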

Analysis of Model Performance
Several metrics were used to assess the feasibility and performance of the model, each quantifying the deviation between predicted and reference values. The metrics used were the root mean square error (RMSE), mean absolute error (MAE), mean squared error (MSE), and coefficient of determination (R²), defined as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_{\mathrm{ref},i} - y_{\mathrm{pred},i}\right|$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{\mathrm{ref},i} - y_{\mathrm{pred},i}\right)^{2}$$

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{\mathrm{ref},i} - y_{\mathrm{pred},i}\right)^{2}}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_{\mathrm{ref},i} - y_{\mathrm{pred},i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{\mathrm{ref},i} - \bar{y}_{\mathrm{ref}}\right)^{2}}$$

where n is the total number of data points, y_ref,i is the i-th reference (experimental) value, y_pred,i is the corresponding model prediction, and ȳ_ref is the mean of the reference values. R² reflects how closely the model's outputs agree with the experimental values [21].
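The metrics listed in this section can be computed directly from paired experimental and predicted strengths. The Python sketch below is one way to do so, assuming R² is taken as one minus the ratio of the residual sum of squares to the total sum of squares about the mean of the reference values (some works instead report the squared Pearson correlation). The function name and the sample values are hypothetical.

```python
from math import sqrt


def regression_metrics(y_ref, y_pred):
    """MAE, MSE, RMSE and R^2 for paired experimental (y_ref) and predicted values."""
    n = len(y_ref)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [r - p for r, p in zip(y_ref, y_pred)]
    sse = sum(e * e for e in residuals)             # residual sum of squares
    mean_ref = sum(y_ref) / n
    sst = sum((r - mean_ref) ** 2 for r in y_ref)   # total sum of squares
    if sst == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    mse = sse / n
    return {
        "MAE": sum(abs(e) for e in residuals) / n,
        "MSE": mse,
        "RMSE": sqrt(mse),
        "R2": 1 - sse / sst,
    }


# Hypothetical strengths in MPa, purely to show the call:
print(regression_metrics([30, 32, 28, 35], [31, 30, 29, 34]))
```

For these made-up values the sketch gives MAE = 1.25 MPa, MSE = 1.75 MPa², and R² of about 0.74. Because RMSE is the square root of MSE, a reported pair can be cross-checked: MSE = 10.41 MPa² corresponds to RMSE of about 3.23 MPa.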

Results and Discussion
The prediction efficiency of the developed random forest regressor models was assessed using the training and testing datasets. The training set was used to determine the model design and parameters, whereas the test set was used only after the regressor had been fitted, to assess the model's quality.
(1) Tables 6-9 present the statistical metrics of the models for the training and testing phases, based on the predicted compressive and split tensile strengths.

(2) For compressive strength, the RFR model gave R² = 0.93 on the training dataset and R² = 0.58 in the testing phase, with testing errors of MSE = 10.41 MPa², RMSE = 3.22 MPa, and MAE = 3.07 MPa. The RFR model's ability to capture the nonlinear interactions of geopolymer mix proportions and curing temperature with compressive strength could explain this performance. Based on these empirical evaluations, it may be inferred that the RFR model produced the desired results [8,22].

(3) The R², MAE, and RMSE of the RFR predictions were also calculated for split tensile strength [23,24]. For the training dataset, the MSE, RMSE, R², and MAE values were 0.88, 0.25, 0.88, and 0.0256, respectively. This research could help engineers choose suitable supervised learning models and parameters for geopolymer concrete manufacturing, and the results suggest that employing the RFR model could be beneficial. The 12 input variables are sufficient to forecast the compressive strength of geopolymer concrete at various curing temperatures with reasonable precision, which justifies their use in practical engineering applications. The fit is regarded as very weak, low, medium, or strong when R² < 0.3, 0.3 < R² < 0.5, 0.5 < R² < 0.7, or R² > 0.7, respectively [25].

(4) A high R² combined with low error values indicates good model performance [26].
Figure 3 shows an R² of 0.93, indicating that the model fits the training data well. The mean MSE of the RFR is 6.35 MPa² for training and 5.803 MPa² for testing. The RFR has high predictive precision and broad potential [11]. The gap between training and testing losses could be reduced by training the model on a much larger dataset. The average MAE is 1.826 MPa for training and 2.288 MPa for testing, so losses are lower on the training data than on the testing data [27].

(5) Figures 4 and 5 compare the experimental (actual) and predicted compressive strengths of the fly ash-calcined clay geopolymer concrete at various curing temperatures, obtained with the RFR supervised learning algorithm, for the training and testing phases. These figures show the statistical performance of the RFR model in forecasting geopolymer concrete compressive strength in both phases.

(6) Like other artificial intelligence systems, supervised learning models have a limited scope and are heavily case-dependent. Their generalizability is therefore constrained, and they can only be applied within the range of the data on which they were trained. Nevertheless, the developed RFR model predicted the compressive strength at varying curing temperatures correctly and effectively, and it can be retrained to perform better as new data become available.
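The qualitative bands quoted from [25] (very weak below 0.3, low from 0.3 to 0.5, medium from 0.5 to 0.7, and strong above 0.7) can be expressed as a small lookup and applied to the reported training and testing R² values. In this sketch, a value falling exactly on a boundary is assigned to the higher band, which is an assumption since the quoted ranges use strict inequalities; the function name is hypothetical.

```python
def r2_band(r2):
    """Qualitative label for R^2 using the bands quoted from [25].

    Boundary values (0.3, 0.5, 0.7) go to the higher band; this is an assumption,
    since the quoted ranges use strict inequalities.
    """
    if r2 < 0.3:
        return "very weak"
    if r2 < 0.5:
        return "low"
    if r2 < 0.7:
        return "medium"
    return "strong"


for phase, value in (("training", 0.93), ("testing", 0.58)):
    print(phase, value, r2_band(value))
```

On this scale, the training R² of 0.93 falls in the strong band and the testing R² of 0.58 in the medium band.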

Conclusions
1. In this work, RFR was used to predict the compressive strength of specimens cured at ambient temperature, 80 °C, and 100 °C for 24 and 48 h. The best result was

2. The RFR model's predictive skill was evaluated using statistical criteria such as R², MAE, and RMSE. The R² of 0.58 for the testing phase is an acceptable coefficient of determination, and the training R² of 0.935 is also good for 28-day compressive strength.

3. The findings of the testing phase demonstrated that the supervised learning model developed in this work was successful in predicting geopolymer concrete compressive strength over a range of curing temperatures. Both the 28-day compressive and split tensile strengths were predicted.

4. The statistical analysis shows that the RFR model is effective: accuracy improves as the gap between actual and forecasted values narrows, and MAE, RMSE, MSE, and R² were the deciding metrics.

5. Therefore, using RFR to forecast compressive strength at various curing temperatures is a reasonable and viable alternative to destructive testing, and the same applies to tensile strength.

6. Because it combines many weak learners (decision trees), random forest is an ensemble strategy that delivers consistent agreement between observed and forecasted values, giving a testing R² of 0.58.