A Machine Learning-Assisted Numerical Predictor for Compressive Strength of Geopolymer Concrete Based on Experimental Data and Sensitivity Analysis

Abstract: Geopolymer concrete offers a favourable alternative to conventional Portland cement concrete due to its reduced embodied carbon dioxide (CO2) content. Engineering properties of geopolymer concrete, such as compressive strength, are commonly characterised through experimental practices requiring large volumes of raw materials, time for sample preparation, and costly equipment. To help address this inefficiency, this study proposes machine learning-assisted numerical methods to predict compressive strength of fly ash-based geopolymer (FAGP) concrete. Methods assessed included artificial neural network (ANN), deep neural network (DNN), and deep residual network (ResNet), based on experimentally collected data. Performance of the proposed approaches was evaluated using various statistical measures including R-squared (R²), root mean square error (RMSE), and mean absolute percentage error (MAPE). Sensitivity analysis was carried out to identify effects of the following six input variables on the compressive strength of FAGP concrete: sodium hydroxide/sodium silicate ratio, fly ash/aggregate ratio, alkali activator/fly ash ratio, concentration of sodium hydroxide, curing time, and temperature. Fly ash/aggregate ratio was found to significantly affect compressive strength of FAGP concrete. Results obtained indicate that the proposed approaches offer reliable methods for FAGP design and optimisation. Of note was ResNet, which demonstrated the highest R² and lowest RMSE and MAPE values.

strength of FAGC was investigated in a study by Nguyen et al. [20]. With a high rate of recognition accuracy within a complex network, the ResNet model showed better performance than DNN models with two main forward and backward passes; it has therefore been used in several advanced engineering problems [26,27]. ResNet and DNN approaches were also employed to predict compressive strength of conventional and foamed concrete in the studies by Jang et al. [28] and Nguyen et al. [29], respectively. Against this background, current solutions to predict compressive strength of FAGP concrete have not been dealt with in depth within existing literature. Although various machine learning approaches including ANN and DNN have been separately introduced in several studies [19,21] as numerical predictors for FAGP strength, a thorough search of relevant published literature yielded few applications of the ResNet approach in FAGP property prediction. The lack of studies on impacts of input parameters (e.g., mix proportion ratios, NaOH concentration, and curing conditions) on geopolymer strength indicates room for improvement in future research. More comprehensive research needs to be carried out to investigate the effectiveness of various machine learning methods in predicting compressive strength of FAGP concrete, considering a wider variety of input parameters and sensitivity analysis.
As such, this study aims to offer advancements to the existing literature by employing ANN, DNN, and ResNet approaches integrated with sensitivity analysis to predict the compressive strength of FAGP concrete. These models were trained on 263 pairs of input/target values obtained from experiments. Performance of FAGP strength prediction of the three proposed approaches was investigated in two phases. In the first phase, the models were trained and validated using randomly shuffled datasets. Additional training and assessment under K-fold cross validation schemes were then carried out to confirm the results obtained from the first phase. Impacts of six input parameters (NaOH/Na2SiO3 ratio, fly ash/aggregate ratio, alkali liquid/fly ash ratio, NaOH concentration, curing time, and temperature) on prediction models were investigated using sensitivity analysis. Outcomes from the sensitivity analysis are expected to identify the critical input parameters in FAGP strength prediction so that they can be carefully controlled during geopolymer production. Three measures including R-squared (R²), root mean square error (RMSE), and mean absolute percentage error (MAPE) were employed to evaluate the accuracy of the proposed machine learning techniques.

Artificial Neural Network (ANN)
Inspired by the biological neuron system, ANN is based on a suite of mutually connected units, known as perceptrons, which replicate the functions of neurons in the human brain. ANN is one of the main models used in machine learning where its structure is formed by three layers of neurons including input, hidden, and output layers. Independent variables enter the system through the input layer and are processed in the hidden layer, while predicted values are generated in the output layer. Figure 1 presents the basic concept of ANN.
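The input, hidden, and output layers described above can be illustrated with a minimal forward pass. This is only a sketch under stated assumptions: the ReLU activation, the hidden layer width of 4, the random weights, and the sample input values are all hypothetical choices for illustration and are not taken from the study.

```python
import random


def relu(values):
    """Element-wise rectified linear activation (assumed; not specified in the text)."""
    return [max(0.0, v) for v in values]


def dense(x, weights, biases):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def ann_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> one hidden layer -> single regression output."""
    hidden = relu(dense(x, w_hidden, b_hidden))
    return dense(hidden, w_out, b_out)[0]


# Hypothetical sizes and random weights, for illustration only.
rng = random.Random(0)
n_inputs, n_hidden = 6, 4
w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [[rng.uniform(-1, 1) for _ in range(n_hidden)]]
b_out = [0.0]

# Six illustrative, already-scaled input values (hypothetical).
x = [0.3, 0.4, 0.5, 0.7, 0.6, 0.6]
print(ann_forward(x, w_hidden, b_hidden, w_out, b_out))
```

In a trained network the weights and biases would be fitted to data rather than drawn at random; the sketch only shows how values flow through the three layers.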

Deep Neural Network (DNN)
DNN consists of more layers and neurons than ANN, which enables it to learn functions with a high degree of complexity. DNN has a strong ability to represent input data and can reduce over-fitting in regression tasks [30]. This representational ability allows DNN to achieve high accuracy in various tasks [31]. A typical DNN structure, including the two main forward and backward phases, is presented in Figure 2.

Deep Residual Network (ResNet)
ResNet was developed to overcome a limitation in training deep networks where training errors can increase as the number of layers increases [20]. Owing to modified architectures, ResNet models have been empirically confirmed to enhance learnability of neural networks with less error on defined tasks using a limited number of layers [32]. ResNet consists of residual blocks with shortcut connections as shown in Figure 3, where the formulation H(x) is the desired mapping output of a specific layer and x is the input data. Given the presence of shortcut connections, gradient-based optimisation algorithms work effectively under ResNet-based architectures and improve the learnability of weight layers representing the function F(x) [33].
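The shortcut connection can be sketched in a few lines, assuming the standard residual formulation H(x) = F(x) + x. In this sketch F(x) is a single dense layer followed by ReLU, and the input and output widths are equal so that the shortcut needs no projection; both are simplifying assumptions and not the architecture used in the study.

```python
def relu(values):
    """Element-wise rectified linear activation (assumed for illustration)."""
    return [max(0.0, v) for v in values]


def dense(x, weights, biases):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def residual_block(x, weights, biases):
    """Return H(x) = F(x) + x, where F is one dense layer followed by ReLU."""
    fx = relu(dense(x, weights, biases))
    return [f + xi for f, xi in zip(fx, x)]


# With all-zero weights F(x) = 0, so the block reduces to the identity mapping.
x = [1.0, -2.0]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
zero_b = [0.0, 0.0]
print(residual_block(x, zero_w, zero_b))  # [1.0, -2.0]
```

The identity case shows why adding layers need not degrade training: a block can fall back to passing its input through unchanged, and the weight layers only have to learn the residual F(x).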

Materials and Mixing Process
Constituent materials of the FAGP concretes considered were fly ash, coarse and fine aggregates, alkali activator, and water. Low-calcium fly ash (class F) with a density of 2500 kg/m³ was used as the main aluminosilicate precursor. The chemical composition of the fly ash used is presented in Table 1, which conforms to requirements from ASTM C618 [ASTM]. The FAGP concrete mix designs and mixing processes were based on a previous study by Nguyen et al. [20]. Geopolymer mix designs were formulated based on various binder and aggregate contents, concentration of sodium hydroxide, and curing conditions. The ratio of fly ash mass to total aggregate mass (fly ash/aggregate) varied from 0.13 to 0.37. Densities of the coarse and fine aggregates were 2700 kg/m³ and 2650 kg/m³, respectively. Sodium silicate solution consisting of 36% Na2O and 38% SiO2 by mass was mixed with sodium hydroxide at a wide range of concentrations (4M, 8M, 11M, 12M, 15M, and 18M) to prepare alkali liquid (AL). The ratios of NaOH/Na2SiO3 and AL/fly ash ranged from 0.4 to 2.5 and from 0.3 to 0.7, respectively.
Fly ash and aggregates were mixed together on a slow setting for about three minutes. Alkali solution was then added and mixed for a further four minutes before casting. Fresh FAGP concrete was cast in standard cylinder moulds (100 mm diameter, 200 mm high), de-moulded after 24 h, and then cured in an oven at temperatures 40, 60, 80, 90, 100, and 120 °C for 2, 4, 6, 8, 10, and 12 h. The processing and testing procedure is represented in Figure 4.

Data Preparation for Machine Learning Approaches
According to previous studies [10,22,34], FAGP concrete properties depend on constituent material proportioning, concentration of sodium hydroxide (CM), and curing conditions. In this study, a total of 263 pairs of input/target values derived from different geopolymer mix proportions, NaOH concentrations, and curing conditions were designed to generate the data for running the machine learning-based models. In these models, the six input variables considered to estimate the compressive strength of FAGP concrete were: NaOH/Na2SiO3 ratio, fly ash/aggregate ratio, AL/fly ash ratio, CM, curing time, and curing temperature. For compressive strength measurement, FAGP concrete cylinders were subjected to axial compression with a loading rate of up to 0.35 MPa/s according to ASTM C39/C39M-18 [14] after seven days. At least three specimens were tested for each mix design of FAGP concrete to obtain the mean value of the compressive strength. The test data from experimental works are given in Table 2.

Research Methodology
In this study, 263 datasets (each comprising six inputs and one output) were used to train and validate ANN, DNN, and ResNet models. In terms of inputs, each dataset comprised a unique combination of the six mix design values considered, as summarised in Table 2. The output for each dataset was the corresponding average 7-day compressive strength obtained from experimental testing. The range of strength values recorded for the 263 combinations considered was 5.55-67.86 MPa.
A data division scheme was applied to reduce possibilities of error and improve the reliability of predicted results. About 90% of the data (235 datasets) were randomly selected from the original data collection to train the network, while the remaining 28 datasets were withheld from training as a validation set to confirm the accuracy of the trained network. The structures of the three machine learning approaches (ANN, DNN, and ResNet) are presented in the schematic flowchart in Figure 5. FAGP concrete compressive strength was predicted by employing ANN, DNN, and ResNet architectures comprising weight, normalisation, and activation layers in regression tasks. For comparative purposes, DNN and ResNet models consisted of the same number of nodes with 128 nodes in Weight Layer 1 and 256 nodes in Weight Layer 2. A layer with 256 nodes, known as Weight Layer 3, was included to enable additional operation at the end of ResNet implementation. The ANN model with one weight layer comprised 384 nodes. One of the stochastic gradient descent methods, known as Adam [35], was used as the optimisation method to update neural network coefficients since it integrates advanced features from different optimisation algorithms, including AdaGrad and RMSProp. The layer normalisation method introduced by Ba et al. [36] was employed to ensure inputs to layers fell within specific ranges since it exhibited efficient training time in neural network architectures compared to traditional batch normalisation. Models without normalisation were also trained to validate the effectiveness of the models integrated with layer normalisation. During the training process, dropout with a keep probability of 0.2 was included in the final models to prevent overfitting. Table 3 presents details of the settings of the six architectures (known as architectures 1-6) implemented in this study.
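The random division into 235 training and 28 validation datasets can be sketched as follows. The function name `split_indices` and the fixed seed are hypothetical, chosen only to make the illustration reproducible; the study does not state how the random selection was implemented.

```python
import random


def split_indices(n_samples, n_train, seed=0):
    """Shuffle sample indices and split them into training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an illustrative assumption
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_indices(263, 235)
print(len(train_idx), len(val_idx))  # 235 28
```

Splitting indices rather than the data itself keeps each input vector paired with its target strength value when both are later selected by index.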
Three statistical measures including R², RMSE, and MAPE were applied to evaluate the accuracy of the proposed machine learning approaches under the K-fold cross validation scheme. These parameters provide insights into differences between original and estimated values. Higher R² value and/or lower MAPE and RMSE values indicate better prediction performance of machine learning approaches [19]. The three statistical measures were calculated using the following equations:
$$R^2 = 1 - \frac{\sum_{j=1}^{n} (y_j - y'_j)^2}{\sum_{j=1}^{n} (y_j - \bar{y})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{j=1}^{n} (y_j - y'_j)^2}$$
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{j=1}^{n} \left| \frac{y_j - y'_j}{y_j} \right|$$
where $y_j$ and $y'_j$ are the compressive strengths obtained from experiments and predictions, respectively; $\bar{y}$ is the mean of the experimental values; and $n$ is the number of datasets.
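The three measures can be computed as in the sketch below, which assumes their standard definitions (coefficient of determination, root mean square error, and mean absolute percentage error in percent). The function names and the three strength values are hypothetical and serve only as a worked example.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target (MPa here)."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    n = len(y_true)
    return 100.0 / n * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))


# Hypothetical measured and predicted strengths (MPa), for illustration only.
measured = [20.0, 30.0, 40.0]
predicted = [22.0, 29.0, 37.0]
print(r_squared(measured, predicted))  # approximately 0.93
print(rmse(measured, predicted))       # approximately 2.16
print(mape(measured, predicted))       # approximately 6.94
```

Note that MAPE divides by the measured value, so it is undefined for a measured strength of zero; this is not an issue for the strength range reported here.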
The K-fold cross validation method divides data into K equal folds and then performs K independent training iterations on the prediction model with (K-1) folds while leaving the remaining fold for validation purposes. In this experiment, the common value K = 10 was used. The performance of the prediction model was judged by averaging the metric measurements (R², MAPE, and RMSE) obtained in the K training and evaluation iterations as follows:
$$M_{K\text{-fold}} = \frac{1}{K} \sum_{k=1}^{K} m_k$$
where $M_{K\text{-fold}}$ denotes a general metric measurement when K-fold cross validation is applied, and $m_k$ is the metric measurement in fold k of the procedure. An important note is that the same training and validation sets in each fold were used to train and validate each model. A hypothesis test (e.g., paired t-test) with a significance level α = 0.05 was then applied to the accuracy measurements of each model on the validation sets in the 10 folds to confirm the statistical significance of the results. The null hypothesis was that these measurements all belong to the same population (or the same model), suggesting there is no difference between the performance of the two evaluated models. From the t-test, a p-value less than the chosen significance level (α = 0.05) statistically confirms the advantage of one prediction model over the other (rejecting the null hypothesis), while a p-value greater than this significance level suggests that the observed difference may have occurred by chance (not rejecting the null hypothesis).
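The fold construction, the averaging of a metric over folds, and the paired comparison can be sketched as follows. Several points are assumptions: the helper names are hypothetical, the per-fold RMSE values are invented for illustration and are not the study's results, and the decision is made by comparing the t statistic with the two-sided critical value for 9 degrees of freedom (2.262 at α = 0.05) rather than by computing a p-value. Because 263 is not divisible by 10, fold sizes in this sketch differ by at most one.

```python
import math
import random
import statistics


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def paired_t_statistic(a, b):
    """t statistic of the paired differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


folds = k_fold_indices(263, k=10)
print([len(fold) for fold in folds])

# Hypothetical per-fold validation RMSE values (MPa) for two models.
rmse_model_a = [2.1, 2.9, 3.4, 2.5, 2.2, 3.1, 2.8, 2.6, 3.5, 2.4]
rmse_model_b = [2.9, 3.6, 4.1, 3.0, 3.1, 3.9, 3.3, 3.5, 4.4, 3.0]

# K-fold metric: the mean of the per-fold measurements.
print(statistics.mean(rmse_model_a), statistics.mean(rmse_model_b))

t_stat = paired_t_statistic(rmse_model_a, rmse_model_b)
T_CRITICAL_DF9 = 2.262  # two-sided critical value, alpha = 0.05, 9 degrees of freedom
print(abs(t_stat) > T_CRITICAL_DF9)  # True means the null hypothesis is rejected
```

The test is paired because both models are evaluated on the same ten validation folds, so fold-to-fold difficulty cancels out in the differences.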

Estimative Performance of ANN, DNN, and ResNet Approaches
In the first phase, six predictive models based on the three proposed machine learning approaches (ANN, DNN, and ResNet) were trained and validated using randomly shuffled datasets obtained from experimental works. Input variables in the dataset consisted of six parameters including mixture proportions (i.e., NaOH/Na2SiO3, fly ash/aggregate, AL/fly ash), NaOH concentration, and curing conditions. Compressive strength of FAGP specimens was regarded as the output variable. The first phase was intended to provide a short list of models for further testing with K-fold cross validation and the t-test method as described in Section 4. R², RMSE, and MAPE values for the ANN (architectures 1 and 2), DNN (architectures 3 and 4), and ResNet (architectures 5 and 6) models are summarised in Table 4, with the bold numbers representing the best predictive model of each approach. As shown, architectures 1, 3, and 6 were found to be the best ANN, DNN, and ResNet models, respectively. Of the six architectures presented in Table 4, ResNet-based architecture 6 was the best model for determining FAGP concrete compressive strength with the highest R² of 0.937 and lowest RMSE and MAPE values (1.987 and 6.6, respectively). Apart from the ResNet models, ANN-based architecture 1 (R² = 0.889; RMSE = 4.711; MAPE = 14.06) and DNN-based architecture 3 (R² = 0.898; RMSE = 2.521; MAPE = 9.496) showed better predictive performance than the other models (architectures 2, 4, and 5). Based on these observations, ANN-based architecture 1, DNN-based architecture 3, and ResNet-based architecture 6 were selected for further investigation.
In the second phase, further investigation into performance of the proposed approaches was carried out using additional training and assessment under a 10-fold scheme with three architectures: 1, 3, and 6. Results of various statistical measures (R², MAPE, and RMSE) for each fold and the average (Avg.) values with standard deviations are presented in Table 5.
The same training and validation sets of each fold were applied for the three models 1, 3, and 6. As shown in Table 5, ResNet-based architecture 6 obtained the best strength prediction performance in terms of R² (0.934 ± 0.021), RMSE (2.750 ± 0.573), and MAPE (8.552 ± 1.333). A further paired t-test with α = 0.05 was applied to assess the statistical significance of this observation. As presented in Table 6, p-values from the comparisons of the ResNet model and the ANN/DNN models were lower than the chosen significance level (α = 0.05), providing statistical confirmation that the ResNet model outperformed the ANN and DNN models in terms of FAGP strength prediction.
Table 6. Paired t-test for statistical significance between ResNet and ANN-based models (architectures 6 and 1) and between ResNet and DNN-based models (architectures 6 and 3).
Relationships between the experimental and predicted strength values from architectures 1, 3, and 6 are illustrated in Figure 6. As shown, compressive strength values predicted by all machine learning models were close to the actual values obtained from compression experiments, indicating that the proposed approaches were successfully trained to predict FAGP compressive strength. The ResNet model outperformed the other models with the strongest relationship existing between actual and predicted values. This observation was confirmed in Figure 7, which presents the correlation coefficient (R) of the three approaches in terms of validation data. Minimal variation existed between actual and predicted values for the ANN, DNN, and ResNet models, albeit with the highest variance being associated with the ANN model (architecture 1).
The relationship between iterations of the three best performing architectures (1, 3, and 6) and validation RMSE is shown in Figure 8. The highest convergence speed was observed in the ResNet-based architecture 6 model, which required only 2000 iterations to reach a validation RMSE of 4.8 MPa.
For the same RMSE, higher iteration numbers of 6000 and 148,000 were required for the DNN and ANN models, respectively. After convergence, lower values of RMSE were observed in the ResNet and DNN models, indicating better performance than the ANN model. For instance, at the same iteration value of 152,000, the ANN model converged at an RMSE of 4.7 MPa while the ResNet and DNN models achieved lower values of RMSE (2.1 MPa and 2.7 MPa, respectively).
Figure 9 presents the distribution of error rates at 5% increments for predicted results obtained from architectures 1, 3, and 6. It is noted that the majority of datasets (61%) from the ResNet model exhibited error levels less than 5%. Corresponding frequencies of errors less than 5% for the ANN and DNN models were significantly lower (approximately 21% and 35%, respectively). In terms of errors less than 20%, frequencies for the ANN, DNN, and ResNet models were 79%, 89%, and 89%, respectively. In terms of ranking, therefore, the ResNet model provided the best estimative performance, followed by the DNN and ANN models.
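An error distribution of the kind described for Figure 9 can be tabulated as in the sketch below. It assumes that the error rate is the absolute percentage error of each prediction, that bins are half-open intervals of 5%, and that a final bin collects all errors of 20% or more; the function name and the sample values are hypothetical.

```python
def error_distribution(y_true, y_pred, bin_width=5.0, n_bins=4):
    """Percentage of samples in each absolute-percentage-error bin.

    Bins are [0, 5), [5, 10), ... ; the last entry collects all larger errors.
    """
    counts = [0] * (n_bins + 1)
    for y, p in zip(y_true, y_pred):
        error = abs(y - p) / y * 100.0
        index = min(int(error // bin_width), n_bins)
        counts[index] += 1
    return [100.0 * c / len(y_true) for c in counts]


# Hypothetical measured and predicted strengths (MPa), for illustration only.
measured = [20.0, 30.0, 40.0, 50.0]
predicted = [20.5, 32.0, 45.0, 38.0]
print(error_distribution(measured, predicted))  # [25.0, 25.0, 25.0, 0.0, 25.0]
```

Summing the first entry gives the share of predictions with errors below 5%, and summing the first four gives the share below 20%, which are the two figures compared across models in the text.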

Sensitivity Analysis
Sensitivity analysis is commonly used to evaluate how input parameters affect output variation derived by machine learning models [37]. As the best performing model, ResNet-based architecture 6 was exclusively selected for this analysis, which involved calculating FAGP concrete compressive strength by changing one input variable at a time while holding the other five constant at their mean values. For example, to assess the importance of the NaOH/Na2SiO3 ratio, this value was varied from 0.4 to 2.5, while fly ash/aggregate, AL/fly ash, NaOH concentration, curing time, and temperature values were kept constant at mean values of 0.23, 0.5, 14 (M), 8 h, and 85.6 °C, respectively. Data derived from this sensitivity analysis were returned to the training process to estimate compressive strength. For each parameter, a corresponding sensitivity analysis factor was given by the expression:
$$N_i = f_{\max}(x_i) - f_{\min}(x_i), \qquad S_i = \frac{N_i}{\sum_{j=1}^{6} N_j} \times 100\%$$
where $f_{\max}(x_i)$ and $f_{\min}(x_i)$ are the maximum and minimum estimated compressive strengths relating to the input variable $x_i$, with all other input parameters kept constant at their mean values, and $S_i$ is the sensitivity factor of $x_i$. Figure 10 shows the results of this sensitivity analysis, from which a pronounced influence (35.5%) of fly ash/aggregate ratio on estimated compressive strength can be seen. A similar effect was observed in the study by Joseph and Mathew [11], and can be explained by the fact that the internal void structure formed by fly ash and aggregates has direct effects on FAGP compressive strength. Also shown in this figure are high sensitivity scores of 16.22%, 16.18%, and 14.93% for curing time, NaOH concentration, and curing temperature, respectively. This confirms the observations from previous studies [22,23], in which curing conditions played significant roles in the compressive strength of FAGP concrete.
As such, various factors such as mix proportions, sodium hydroxide concentration, and curing regimes should be thoroughly considered in the prediction of FAGP mechanical properties using machine learning approaches. In particular, based on these findings, it is recommended that the fly ash/aggregate ratio is carefully determined and controlled in geopolymer manufacturing processes owing to its pronounced effect on FAGP strength.
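The one-at-a-time procedure described in this section can be sketched as follows. The sketch assumes that each sensitivity factor is the output range for one variable divided by the sum of the ranges over all variables, which is consistent with the reported scores summing to approximately 100%. The function names, the 50-step sweep, and the linear `toy_predict` model with its coefficients are hypothetical stand-ins for the trained ResNet model; only the variable ranges and mean values are taken from the text.

```python
def sensitivity_scores(predict, ranges, means, n_steps=50):
    """One-at-a-time sensitivity: vary one input, hold the others at their means.

    Returns, for each input, its output range as a percentage of the summed ranges.
    """
    spans = {}
    for name, (low, high) in ranges.items():
        outputs = []
        for step in range(n_steps + 1):
            inputs = dict(means)  # all other inputs stay at their mean values
            inputs[name] = low + (high - low) * step / n_steps
            outputs.append(predict(inputs))
        spans[name] = max(outputs) - min(outputs)
    total = sum(spans.values())
    return {name: 100.0 * span / total for name, span in spans.items()}


def toy_predict(inputs):
    """Hypothetical stand-in for a trained model (coefficients are invented)."""
    return 100.0 * inputs["fly_ash_aggregate"] + 1.0 * inputs["curing_time_h"]


ranges = {"fly_ash_aggregate": (0.13, 0.37), "curing_time_h": (2.0, 12.0)}
means = {"fly_ash_aggregate": 0.23, "curing_time_h": 8.0}
print(sensitivity_scores(toy_predict, ranges, means))
```

Sweeping each variable over a grid, rather than evaluating only its two end points, allows the maximum and minimum predicted strengths to be found even when the model response is not monotonic.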

Conclusions
This study employed three different machine learning approaches including ANN, DNN, and ResNet to predict compressive strength of fly ash-based geopolymer concrete. Six parameters of mix design and curing conditions (NaOH/Na2SiO3, fly ash/aggregate, AL/fly ash, concentration of sodium hydroxide, curing time, and temperature) and corresponding 7-day compressive strength results were used to generate 263 unique input/output pairs for model training purposes.
While the results indicated that all three machine learning approaches could predict FAGP concrete compressive strength with some degree of accuracy, the ResNet model was the most promising method with the highest R² (0.937) and the lowest RMSE (1.987) and MAPE (6.6) values. This observation was confirmed by additional training and assessment under the K-fold cross validation scheme and paired t-test with α = 0.05, where the highest R² (0.934 ± 0.021) and the lowest RMSE (2.750 ± 0.573) and MAPE (8.552 ± 1.333) were observed in the ResNet-based model. Sensitivity analysis performed for the ResNet model confirmed that the ratio of fly ash/aggregate was the most dominant factor when predicting compressive strength, with a sensitivity analysis score of 35.5%. This was followed in order of importance by curing temperature (16.22%), NaOH concentration (16.18%), curing time (14.93%), NaOH/Na2SiO3 ratio (12.90%), and AL/fly ash ratio (4.22%). This analysis indicates the importance of considering a wide range of input parameters in the prediction of FAGP concrete compressive strength and controlling them carefully during the manufacturing process.
This study provides a detailed understanding of the performance of different machine learning approaches in strength prediction for FAGP concrete. The findings highlight potential uses of the proposed machine learning approaches, such as ResNet and DNN, as effective tools not only to predict mechanical properties of FAGP precisely but also to develop mix designs for geopolymer concrete. In the next phase of work, consideration will be given to how ResNet and DNN models can be applied in FAGP manufacturing industries to predict optimised mix designs and curing regimes based on target compressive strength. This predictive ability will also be linked to mix design evaluations in terms of potential cost and environmental benefits prior to construction stages. In addition, the proposed machine learning approaches adopted a general training scheme for neural networks with standard input and output features, indicating promising potential to be applied to other regression problems in upcoming research works.