Comparison of Machine Learning Approaches with Traditional Methods for Predicting the Compressive Strength of Rice Husk Ash Concrete

Abstract: Efforts are being devoted to reducing the harmful effect of the construction industry around the globe, including the use of rice husk ash as a partial replacement of cement. However, no method is available to date to predict the compressive strength (CS) of rice husk ash blended concrete (RHAC). In this study, advanced machine learning techniques (artificial neural network, artificial neuro-fuzzy inference system) were used to predict the CS of RHAC. Based on the published literature, six inputs, i.e., age of specimen, percentage of rice husk ash, percentage of superplasticizer, aggregates, water, and amount of cement, were selected. Results obtained from machine learning methods were compared with traditional methods such as linear and non-linear regressions. It was observed that the performance of machine learning methods was superior to traditional methods for determining the CS of RHAC. This study will prove beneficial in minimizing the cost and time of executing laboratory experiments for designing the optimum content portions of RHAC.


Introduction
Concrete is considered the most versatile and extensively used human-made material for infrastructure development on the earth. The annual production of concrete is estimated to be around 30 billion tons [1,2]. The global consumption of concrete is increasing day by day due to rapid urbanization in emerging countries. Cement, which is the main constituent of concrete, is a major contributor to greenhouse gases and is responsible for around 7% of total global carbon dioxide emissions [1–3]. In addition, with the increase in fuel prices around the globe, the per unit cost of cement has also increased [4]. In view of these concerns about the environment, the cost of construction materials, the shortage of raw materials, and the high energy demand, the use of alternative materials for construction is becoming common practice across the globe.
Supplementary cementitious materials (SCMs) are materials that can be used as a partial replacement for ordinary Portland cement (OPC). Currently, there are numerous industrial and agricultural waste materials in practice that are being used as SCMs all over the world. The most widely used SCMs are fly ash (FA), sugarcane bagasse ash (SCBA), rice husk ash (RHA), volcanic ash (VA), electric arc furnace slag (EAFS), zeolite (ZLT), metakaolin (MK), and silica fume (SF). The utilization of agricultural waste in the construction industry will be a smart choice as it will reduce the harm caused by agricultural and construction industries to the environment. As cement is the most expensive ingredient of concrete, replacing it with potential agricultural waste materials will be beneficial to both the environment and the construction industry.
The percentage replacement of cement is determined based on the physical and chemical properties of the waste material. Using SCMs in concrete may affect different properties of concrete. Some SCMs may increase the durability of concrete while adversely affecting its CS. Hence, it is necessary to evaluate the physical and chemical properties of SCMs before their usage in concrete. In addition, the calculation of the optimum percentage of SCMs is important to obtain the best quality of concrete. According to past studies, RHA can be utilized as an SCM in concrete because of its high amount of silica (more than 90%) [5]. The chemical composition of RHA is presented in Figure 1 [6]. A vast number of studies were carried out in the past to study the effect of RHA on different properties of concrete. Figure 2a shows the number of RHA-related articles published in each year since 2000. The significant increasing trend during the last five years (Figure 2a) shows the importance and feasibility of RHA as a potential SCM. Furthermore, most of the research on RHA is carried out by agricultural countries in which the construction industry is booming (Figure 2b).
Ameri et al. [7] observed that the early CS of RHAC increased robustly, yet it was limited by the amount of RHA. An increase in the RHA content to 15% resulted in decreased CS due to the excess amount of unreactive silica. The CS of RHAC was 9, 12, 13, and 16% higher than the normal OPC mix. Likewise, Chao-Lung et al. [8] used RHA as an SCM and concluded that RHAC imparted a CS 1.2 to 1.5 times higher than that of the normal OPC mix. Similarly, Chindaprasirt et al. [9] tested RHAC for sulfate attack resistance and concluded that RHAC showed greater resistance to sulfate attack. Moreover, Thomas et al. [10] highlighted that the dense microstructure of RHA can reduce the water absorption of concrete by 30%. Besides these technical benefits, numerous studies were conducted on the environmental impact of RHA. For instance, Gursel et al. [11] conducted research on the utilization of RHA in cement concrete and found it useful in reducing the global warming potential. Similar research was conducted by Moraes et al. [12], who found that the use of RHA in cement mortar aided in reducing the harmful impacts of cement on the environment. Therefore, based on the findings of the above-mentioned studies, RHA can be successfully utilized as an SCM as it has a carbon footprint much smaller than that of OPC. Its applications range from structural concrete to sulfate-resistant concrete. Hence, it can contribute towards the stability as well as durability of modern structures in a sustainable way [13].
Crystals 2021, 11, x FOR PEER REVIEW
A substantial amount of time is required to develop and carry out extensive testing of RHA. The rate at which the environment is constantly degrading does not provide much time to carry out testing and research on RHA. As a result, RHAC cannot be subjected to ample lab work. Moreover, the hygroscopic nature of RHA makes it difficult to devise a certain control mix. Consequently, artificial intelligence (AI) is being used on a large scale to predict the properties of different SCMs. These properties also include the CS of different mixes. Some of the AI techniques (Table 1) used to model and predict the properties of different materials are artificial neural networks (ANNs), artificial neuro-fuzzy inference systems (ANFISs), gene expression programming (GEP), support vector machines (SVMs), backpropagation neural networks (BPNNs), extreme learning machines (ELMs), multiple non-linear regression (MNLR), linear regression (LR), and response surface methodology (RSM) [14,15]. Much research has been conducted on RHAC by using and comparing different techniques. However, no research has been conducted thus far through a comparison with RSM.
In this research, four modeling techniques were used: ANN, ANFIS, RSM, and LR. Since AI modeling is a complex process and considerable optimization is required to attain high accuracy, the results obtained from these four techniques are compared with each other as well as with experimental data obtained from the literature to assess their accuracy. For modeling, a large database was compiled from the peer-reviewed literature.

Data Collection
To accurately predict the CS of RHAC, a dataset of 192 data points from the literature was used to develop mathematical models [7,8,30–34]. The RHAC in the whole dataset consists of the same components, which are as follows: OPC, RHA, aggregates, water, and superplasticizer (SP). The same types of cement and curing periods were used in all the mixes obtained from the literature.
A conversion factor of 0.8 (according to the standard BS 1881: Part 120:1983) was used to convert the CS of cubic specimens to the CS of cylinders. This research was focused on obtaining the CS of different mixes of RHAC through AI. Variables obtained from the literature such as the quantity of OPC (QOPC), percentage of SP, amount of water (W), curing age (CA), amount of aggregates (AGG), and quantity of RHA (QRHA) were utilized as input parameters. Histograms for all the variables used in this study are shown in Figure 3. Furthermore, the statistical description of the obtained data is presented in Table 2.
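The cube-to-cylinder conversion described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing script: the function name, the dictionary keys, and the numeric values of the sample record are hypothetical, and only the factor of 0.8 comes from the text.

```python
def cube_to_cylinder_cs(cube_cs: float, factor: float = 0.8) -> float:
    """Convert a cube compressive strength to an equivalent cylinder strength.

    The default factor of 0.8 is the value quoted in the text; the unit of the
    result is the same as the unit of the input.
    """
    if cube_cs < 0:
        raise ValueError("compressive strength must be non-negative")
    return factor * cube_cs


# Hypothetical mix record (illustrative values only, not taken from the dataset).
# The six inputs follow the variable names used in the text.
record = {"QOPC": 400.0, "QRHA": 40.0, "W": 180.0, "SP": 1.0,
          "AGG": 1700.0, "CA": 28, "cube_cs": 50.0}

# Store the cylinder-equivalent strength as the model target.
record["CS"] = cube_to_cylinder_cs(record["cube_cs"])
```

Applying the conversion once, before any model is trained, keeps all 192 targets on a single specimen basis.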

Methodology
The Methodology section provides brief details about the approaches taken to determine the CS of concrete mathematically. At first, the AI processes used in this research are explained. The results obtained from AI data processing techniques are assessed for validity by different statistical parameters.

Modeling Techniques
Different properties of materials can be assessed by using machine learning modeling techniques [19,35,36]. The mathematical models developed through these techniques suffice for the prediction of different properties without any elementary knowledge about the lab work or experiments. A brief introduction of modeling techniques used in this paper is provided in this section.

Artificial Neural Network (ANN)
As the name suggests, an artificial neural network's structure is based on the structure of the human brain network. It is a computer-based artificial technique for data analysis. There are different mechanisms in ANN which a researcher may use. However, one of the most common mechanisms is feedforward backpropagation (FFBP). An FFBP-type network essentially consists of three layers, namely, the input layer, the hidden layer (black box), and the output layer. These layers are connected to each other in sequence through different weights. The input layer only receives the values of the variables from outside, while the black box is a hidden layer where the data are processed. These processed data are presented to the user through the output layer [37,38].
The FFBP can be further classified into single-layer perceptron (SLP) and multilayer perceptron (MLP). An SLP is simple and easy to use; however, it cannot be utilized to solve non-linear relations between inputs and outputs. On the other hand, an MLP is complex, yet it can solve the non-linear relations between different variables.
The process of an MLP has the following steps:
Step 1: The inputs are weighted and summed. Here, n is the total number of inputs, I_i is the ith input, ω_ij is the weight between the previous layer and the jth neuron, and b is the bias (tolerance) term.
Step 2: An activation function is applied to the weighted sum. Various activation functions such as sigmoid, ramp, and Gaussian functions can be used in this step; this research utilized the sigmoid function.
Step 3: The outcome is computed. The final outputs depend on the calculations made in the hidden layer (black box). Here, ω_jk is the weighted connection between the jth hidden node and the kth output node, and b_k is the bias of the kth output node.
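The three steps above can be sketched in a few lines. The equations are reconstructed here from the symbol definitions under the assumption that the standard MLP form applies: $net_j = \sum_{i=1}^{n} \omega_{ij} I_i + b_j$ for Step 1, $f(x) = 1/(1 + e^{-x})$ for Step 2, and $y_k = \sum_{j} \omega_{jk} f(net_j) + b_k$ for Step 3. The linear output node, the function names, and the weights in the example are assumptions made for illustration; they are not the trained network of this study.

```python
import math


def sigmoid(x: float) -> float:
    """Step 2: sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(inputs, hidden_weights, hidden_biases, output_weights, output_biases):
    """Forward pass of a one-hidden-layer MLP.

    hidden_weights[i][j] is the weight from input i to hidden neuron j.
    output_weights[j][k] is the weight from hidden neuron j to output node k.
    A linear output node is assumed here.
    """
    hidden = []
    for j in range(len(hidden_biases)):
        # Step 1: weighted sum of the inputs plus the bias of neuron j.
        net_j = sum(hidden_weights[i][j] * inputs[i] for i in range(len(inputs)))
        net_j += hidden_biases[j]
        # Step 2: activation.
        hidden.append(sigmoid(net_j))

    outputs = []
    for k in range(len(output_biases)):
        # Step 3: weighted sum of the hidden activations plus the output bias.
        y_k = sum(output_weights[j][k] * hidden[j] for j in range(len(hidden)))
        outputs.append(y_k + output_biases[k])
    return outputs


# Arbitrary illustrative weights: two inputs, two hidden neurons, one output.
prediction = mlp_forward(
    inputs=[0.0, 0.0],
    hidden_weights=[[1.0, 1.0], [1.0, 1.0]],
    hidden_biases=[0.0, 0.0],
    output_weights=[[2.0], [4.0]],
    output_biases=[1.0],
)
```

With zero inputs and zero hidden biases, both hidden neurons output 0.5, so the example reduces to 2(0.5) + 4(0.5) + 1. Training by backpropagation, which adjusts the weights, is not shown.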
In this research, 70% of the data points were selected randomly for training and the remaining 30% were used for validation.
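A random 70/30 split of this kind might be sketched as below. The function name and the fixed seed are assumptions added for reproducibility; the study used MATLAB and does not state how the random selection was performed. With 192 records, truncating 0.7 × 192 gives 134 training and 58 validation points, which matches the counts reported in the Results section.

```python
import random


def train_validation_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and validation sets."""
    rng = random.Random(seed)      # seeded generator (an assumption) for repeatability
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = int(train_fraction * len(shuffled))  # truncates toward zero
    return shuffled[:n_train], shuffled[n_train:]


# Indices 0..191 stand in for the 192 data points.
train, validation = train_validation_split(range(192))
```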

Artificial Neuro-Fuzzy Inference System (ANFIS)
This technique utilizes an ANN as well as fuzzy logic [39]. The probability of error in outputs is minimized by the ANN, while the expert knowledge is expressed by fuzzy logic [19]. Fuzzy logic rules are implemented through conditional (if-then) statements. An ANFIS program has five layers, which correspond to the steps of the technique: (1) fuzzification, (2) set of rules, (3) normalization, (4) defuzzification, and (5) aggregation.
The fuzzification layer is the first layer. It contains the membership functions of all the input variables. The output of this layer is computed with a Gaussian membership function whose parameters are a_i and ε_i.
Weighted nodes are present in the second layer. These weighted nodes multiply the inputs by some weights before forwarding them to layer 3. Fuzzy AND logic is used in this layer.
Data are normalized and smoothened in the third layer. It normalizes the membership functions by calculating the ratios between the different firing strengths.
Defuzzification takes place in the fourth layer. Nodes present in this layer terminate the fuzzy logic rules. The square nodes of this layer are expressed by a function with linear parameters m_i, n_i, and r_i.
Aggregation occurs in the fifth layer. It collects the outputs of all the layers and presents the final outcome.
MATLAB was used for ANFIS in this research. All the data points were used for training the data.
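The five layers can be illustrated with a small first-order Sugeno-type sketch for two inputs and two rules. Several points are assumptions rather than the authors' formulation: the Gaussian membership function is taken as $\exp[-((x - a_i)/\varepsilon_i)^2]$ with $a_i$ as the centre and $\varepsilon_i$ as the width, the AND operation is taken as a product, the rule consequent is taken as $m_i x + n_i y + r_i$, and all parameter values are arbitrary. The MATLAB implementation used in the study is not reproduced.

```python
import math


def gaussian_mf(x, centre, width):
    """Layer 1: Gaussian membership value (assumed form, see lead-in)."""
    return math.exp(-((x - centre) / width) ** 2)


def anfis_forward(x, y, rules):
    """Forward pass through the five layers for two inputs x and y."""
    # Layers 1 and 2: membership values combined with a product (fuzzy AND)
    # give the firing strength of each rule.
    firing = [gaussian_mf(x, *rule["mf_x"]) * gaussian_mf(y, *rule["mf_y"])
              for rule in rules]
    # Layer 3: normalize each firing strength by the sum of all strengths.
    total = sum(firing)
    normalized = [w / total for w in firing]
    # Layer 4: weight each linear consequent by its normalized strength.
    weighted = [w * (rule["m"] * x + rule["n"] * y + rule["r"])
                for w, rule in zip(normalized, rules)]
    # Layer 5: aggregate into a single output.
    return sum(weighted), normalized


# Arbitrary illustrative parameters: (centre, width) pairs and linear coefficients.
example_rules = [
    {"mf_x": (0.0, 1.0), "mf_y": (0.0, 1.0), "m": 1.0, "n": 1.0, "r": 0.0},
    {"mf_x": (1.0, 1.0), "mf_y": (1.0, 1.0), "m": 2.0, "n": 0.0, "r": 1.0},
]
estimate, strengths = anfis_forward(0.5, 0.5, example_rules)
```

At the point (0.5, 0.5) both rules fire equally, so the output is the plain average of the two rule consequents, 1.0 and 2.0. Parameter training, which ANFIS performs on the membership and consequent parameters, is outside this sketch.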

Response Surface Methodology (RSM)
RSM is a statistical technique that is used to model a relationship between the dependent (also known as a response) and independent (also known as factors) variables. RSM works in four steps: (a) designing an experiment, (b) performing the experiment, (c) developing a model based on inputs and outputs of experimental data, and (d) optimization of the model [40].
In the RSM equation, β denotes a coefficient, K is the number of observations, and ε accounts for the error. The equation gives an estimate of the value of Y for each value of X.
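For reference, the second-order polynomial commonly used as the RSM model is given below. This form is an assumption, since the RSM equation of this study is not reproduced here. In the usual formulation the upper summation limit is the number of factors, so the reading of K below may differ from the definition given in the text.

```latex
Y = \beta_0
  + \sum_{i=1}^{K} \beta_i X_i
  + \sum_{i=1}^{K} \beta_{ii} X_i^{2}
  + \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} \beta_{ij} X_i X_j
  + \varepsilon
```

The linear terms capture main effects, the squared terms capture curvature, and the cross terms capture two-factor interactions.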

Linear Regression (LR)
This technique assumes a linear relationship between the dependent and independent variables. The resulting equation can be utilized to find values of Y for each input value of X.
In Equations (10) and (11), Y represents the output, that is, the CS of RHAC, and values of X represent all the inputs such as age, QRHA, QOPC, AGG, W, and SP.
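A linear model of this kind can be fitted by ordinary least squares, as sketched below. The sketch assumes the standard form $Y = \beta_0 + \sum_i \beta_i X_i + \varepsilon$ and solves the normal equations directly. The function names and the toy data are hypothetical; the software and fitting procedure used in the study are not stated in this section.

```python
def fit_linear_regression(X, y):
    """Least-squares coefficients [b0, b1, ..., bn] via the normal equations."""
    rows = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented system [A^T A | A^T y].
    aug = []
    for i in range(p):
        lhs = [sum(row[i] * row[j] for row in rows) for j in range(p)]
        rhs = sum(row[i] * target for row, target in zip(rows, y))
        aug.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda idx: abs(aug[idx][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: inputs are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for other in range(p):
            if other != col:
                factor = aug[other][col] / aug[col][col]
                aug[other] = [a - factor * b for a, b in zip(aug[other], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def predict(coefficients, x):
    """Evaluate b0 + b1*x1 + ... + bn*xn for one input vector."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], x))


# Toy data generated from Y = 1 + 2*X1 + 3*X2 (illustrative only).
X_toy = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y_toy = [1, 3, 4, 6, 8]
coefficients = fit_linear_regression(X_toy, y_toy)
```

For the six inputs of this study, each row of X would hold the values of age, QRHA, QOPC, AGG, W, and SP, and y would hold the measured CS.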

Results
The number of data points was the same for all the models and techniques: a set of 134 data points was used for training, and a set of 58 data points was used for validation.

Artificial Neural Network (ANN)
Parameters were adjusted before utilizing the ANN technique. These parameters were the training function for neural networks, the number of hidden layers, the total number of neurons per hidden layer, and the maximum number of iterations. These parameters were determined through the trial-and-error method in this research. The details of the parametric adjustment are presented in Table 3.
The CS of RHAC through the ANN was predicted by using MATLAB. The results predicted by the ANN are the closest to the experimental results. The similarity in the predicted and experimental results can be further verified by statistical parameters.
It can be observed that the correlation factor for the ANN-predicted CS (R² = 0.98) is quite high. The prediction result for the ANN is shown in Figure 4.
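The R² values reported in this section can be computed from paired experimental and predicted strengths. The sketch below assumes that R² denotes the coefficient of determination, 1 − SS_res/SS_tot; the text does not give the formula, and the squared Pearson correlation is another common reading. The sample values are illustrative and are not taken from the dataset.

```python
def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values."""
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: unexplained variation.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variation of the observations about their mean.
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative strengths only.
experimental_cs = [30.0, 40.0, 50.0]
predicted_cs = [32.0, 40.0, 48.0]
score = r_squared(experimental_cs, predicted_cs)
```

Here SS_res = 8 and SS_tot = 200, which gives a score of 0.96. Computing the score separately on the training and validation sets gives the per-stage values quoted for each technique.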

Artificial Neuro-Fuzzy Inference System (ANFIS)
The adjusted parameters, before using ANFIS, included the total number of iterations and the function used for activation of ANFIS. The parametric adjustments from the ANN and ANFIS are presented in Table 3.
MATLAB was used for ANFIS. The correlation factor for the ANFIS-predicted CS (R² = 0.89) is high as well. Figure 5 illustrates the predicted results, which are quite similar to the experimental ones.

Response Surface Methodology (RSM)
Figure 6 shows that the CS predicted by RSM is not close to the experimental values. The correlation factor for the RSM-predicted CS (R² = 0.70) confirms the deviation in the results. The correlation factors for training and validation are also low (0.75 and 0.69, respectively), as observed by the dispersed points in Figure 6.

Linear Regression (LR)
The results predicted by LR are far from the experimental results. The weakest correlation of all techniques (R² = 0.63) existed between the experimental and predicted results of LR. The correlation factors for LR training and LR validation were also low, at 0.64 and 0.62, respectively. The dispersed points in Figure 7 confirm this.

Sensitivity and Parametric Analysis
The CS of RHAC was predicted by using different variables as input. The relative contribution of these variables to the outcome can be determined by sensitivity analysis (SA). Mathematically, SA is represented by Equation (13), where f_max(x_i) is the maximum and f_min(x_i) is the minimum output of the predictive models over the domain of the ith input, with the other input variables held constant. It can be observed from Figure 8 that the different input variables affected the CS of RHAC in the same manner as that of the experimental method.
Parametric analysis (PA) was also carried out along with SA. The main aim of PA is to determine the influence of input variables on the output parameter. In PA, all the input variables are kept constant at their mean value, except one input, and the trend of CS is noted for that variable input. All the results of PA are shown in Figure 9.
PA indicated that when water is increased beyond a certain amount, it adversely affects the CS of RHAC. This is consistent with the previous experimental studies as well. Sensale [34] conducted research in which two water-to-cement ratios (w/c) were analyzed. It was concluded that a w/c of 0.4 resulted in a higher CS than that of a w/c of 0.5.
It can also be observed from PA that QRHA contributes towards the enhancement of CS. However, when QRHA is increased beyond 15%, it results in a decrease in the CS. This is because of the high silica content (90%) of RHA: an increase in QRHA results in an increment of silica. Therefore, this excessive silica remains unreacted and results in a reduced CS of RHAC [32].
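The SA and PA procedures described above can be sketched together, since both vary one input while holding the others fixed. The sketch assumes the widely used form $N_i = f_{max}(x_i) - f_{min}(x_i)$ and $SA_i = N_i / \sum_j N_j$, which is consistent with the definitions of f_max and f_min in the text but is not confirmed as the exact Equation (13). The function name, the number of sweep steps, and the toy model are hypothetical.

```python
def sensitivity_analysis(model, input_ranges, baseline, steps=20):
    """Relative contribution of each input to the model output.

    model        : callable taking a dict of inputs and returning a number
    input_ranges : {name: (low, high)} domain of each input
    baseline     : {name: value} fixed values (e.g., means) for the other inputs
    """
    spans = {}
    for name, (low, high) in input_ranges.items():
        outputs = []
        for step in range(steps + 1):
            point = dict(baseline)                  # other inputs held constant
            point[name] = low + (high - low) * step / steps
            outputs.append(model(point))            # this sweep is the PA trend
        spans[name] = max(outputs) - min(outputs)   # N_i = f_max - f_min
    total = sum(spans.values())
    return {name: span / total for name, span in spans.items()}


def toy_model(point):
    """Toy stand-in for a trained predictor (illustrative only)."""
    return 3.0 * point["a"] + 1.0 * point["b"]


contributions = sensitivity_analysis(
    toy_model,
    input_ranges={"a": (0.0, 1.0), "b": (0.0, 1.0)},
    baseline={"a": 0.5, "b": 0.5},
)
```

In the toy case the output spans are 3 and 1, so input a accounts for 75% of the total and input b for 25%. Plotting the swept outputs against the varied input, rather than reducing them to a span, gives the kind of trend curves that PA reports.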
From the discussion of the results, it can be concluded that the regression models do not accurately predict the CS as the predicted CS values were far from the experimental CS values, whereas the CS predicted by AI techniques was found to be in close agreement with the experimental results. This can be attributed to the fact that the pre-defined equations were used in a regression model that cannot learn the relationship between input variables and the function properly. However, contrary to the regression models, AI techniques learnt the relationship between inputs and the functions effectively and successfully. Hence, the machine learning techniques produced results closer to the experimental values.

Conclusions
Different models for the prediction of the CS of RHAC were developed in this study. The models developed in this study were based on a wide range of data which consist of different parameters demonstrated by experimental studies that are available in the literature. The models considered the most influential parameters on CS as inputs. The results obtained in this research are in close agreement with the experimental research. The following conclusions can be drawn from the obtained results:

1. It is evident by PA that the CS is efficiently predicted by the input parameters. In addition, the accuracy of data used at different stages such as training, validation, and testing is shown by R².
2. The results demonstrate R² values of 0.98, 0.89, 0.70, and 0.63 for ANN, ANFIS, RSM, and LR, respectively. Therefore, it can be concluded, based on the results and statistical parameters, that the CS predicted by ANN and ANFIS is the most accurate among all AI techniques. Thus, these two AI techniques can be used for the predesign of RHAC.
3. The close agreement between the predicted and experimental results is in strong favor of employing AI techniques to use RHA in producing RHAC rather than disposing of it.
4. Using RHAC would contribute towards a green and sustainable environment by reducing the emission of carbon dioxide, cost, and emission of hazardous gases.
5. Utilization of RHA as a partial replacement of cement ultimately leads to lowering the carbon emissions from the cement industry. Therefore, it may be recommended that extensive research be carried out on RHAC to study the other important mechanical and durability-related properties such as the residual CS, behavior of steel with RHA, resistance to chloride, sulfate resistance, resistance to water penetration, and acid attacks. Many other AI techniques such as GEP, SVM, and ELM can also be used to propose further predictions.