Predicting the Strength Performance of Hydrated-Lime Activated Rice Husk Ash-Treated Soil Using Two Grey-Box Machine Learning Models

Geotechnical engineering relies heavily on predicting soil strength to ensure safe and efficient construction projects. This paper presents a study on the accurate prediction of soil strength properties, focusing on hydrated-lime activated rice husk ash (HARHA) treated soil. To achieve precise predictions, the researchers employed two grey-box machine learning models, classification and regression trees (CART) and genetic programming (GP). These models produce explicit equations and trees that readers can readily apply to new databases. The models were trained and tested using a comprehensive laboratory database consisting of seven input parameters and three output variables. The results indicate that both the proposed CART trees and GP equations exhibited excellent predictive capabilities across all three output variables: California bearing ratio (CBR), unconfined compressive strength (UCS), and resistance value (R-value, according to the in-situ cone penetrometer test). The proposed GP equations, in particular, demonstrated superior performance in predicting the UCS and R-value parameters, while remaining comparable to CART in predicting the CBR. This research highlights the potential of integrating grey-box machine learning models with geotechnical engineering, providing valuable insights to enhance decision-making processes and safety measures in future infrastructural development projects.


Introduction
Soil stabilization techniques play an important role in geotechnical engineering to improve the engineering properties of weak or problematic soils [1]. Traditional methods, such as soil replacement or compaction, have limitations in terms of cost, implementation, and environmental impact [2,3]. As a sustainable and economical alternative, soil stabilization using supplementary materials/additives has attracted considerable attention. One such approach is the addition of hydrated lime and rice husk ash to the soil [4].
Because of its unique properties, hydrated lime is a widely used additive in soil stabilization [4,5]. It is a fine, white powder obtained from the hydration of quicklime (calcium oxide) and consists mainly of calcium hydroxide [4,5]. When hydrated lime is added to soil, it undergoes chemical reactions with clay minerals, which lead to reduced soil compressibility, improved shear strength, and increased durability [4][5][6]. These reactions, known as cation exchange and pozzolanic reactions, lead to the formation of stable compounds that bind soil particles together [4][5][6].
Rice husk ash (RHA), a byproduct of the rice milling industry, is also promising as an additive in soil stabilization [7]. RHA is obtained by burning rice husks at high temperatures, which converts the organic matter into amorphous silica and other inorganic components [8]. When added to soil, RHA acts as a pozzolanic agent and reacts with calcium hydroxide from hydrated lime to form additional cementitious compounds [4,5]. The inclusion of RHA in the soil stabilization process further improves the strength, stiffness, and durability of the stabilized soil [9]. Utilizing rice husk ash in cement and concrete offers numerous advantages, including reduced heat of hydration, improved strength, decreased permeability at higher dosages, enhanced resistance to chloride and sulfate, cost savings through reduced cement usage, and environmental benefits by mitigating waste disposal and lowering carbon dioxide emissions [10][11][12][13][14][15][16][17]. Jafer et al. [18] focused on developing a sustainable ternary blended cementitious binder (TBCB) for soil stabilization. TBCB incorporates waste materials and improves the engineering properties of the stabilized soil. The results showed a reduced plasticity index and increased compressive strength. XRD and SEM analyses confirmed the formation of cementitious products, leading to a solid structure. TBCB offers a promising solution for soil stabilization with a reduced environmental impact [18].
The combination of hydrated lime and rice husk ash has synergistic effects in soil stabilization [19,20]. The pozzolanic reactions between these materials and the soil matrix contribute to the development of cementitious compounds, thereby increasing strength and reducing permeability [21]. In addition, the incorporation of rice husk ash contributes to the utilization of an agricultural waste product and increases sustainability in construction practices [22,23].
AI algorithms aimed at material characterization and design have faced doubt because of concerns about the opacity and reliability of their intricate models [54]. The primary obstacle is the absence of transparency and of methods to extract knowledge from these models [54]. Mathematical modeling techniques fall into different categories: white-box, black-box, and grey-box, each varying in its level of transparency [54,55]. White-box models are grounded in fundamental principles and can elucidate the underlying physical relationships within a system [54][55][56]. On the other hand, black-box models lack a clear structure, making it difficult to understand their inner workings [54][55][56]. Grey-box models fall in between, as they identify patterns in data and offer a mathematical structure for the model [54][55][56]. Artificial neural networks (ANN) are well-known examples of black-box models in engineering [57]. Although they are widely used, they lack comprehensive information about the relationships they establish. In contrast, genetic programming (GP) and classification and regression trees (CART) are newer grey-box modeling techniques that employ an evolutionary process to develop explicit prediction functions and trees, respectively, making them more transparent compared to black-box methods such as ANN. GP and CART models provide valuable insights into system performance as they offer mathematical and tree structures, respectively, that aid in understanding the underlying processes. They have shown promise in terms of accuracy and efficiency across various applications. The benefits of AI-based grey-box models include the following:
- Transparency with structure: Grey-box models, such as genetic programming (GP) and classification and regression trees (CART), strike a balance between white-box and black-box models. They offer transparency by identifying data patterns while providing a clear mathematical or tree-based structure, making it easier to understand the model's operations [58].
- Enhanced insights: The structured nature of grey-box models allows for a deeper understanding of the underlying processes. Unlike black-box models such as artificial neural networks (ANN), GP and CART provide insights into system performance through their explicit prediction functions and tree structures.
- Transparency in evolution: Techniques such as genetic programming (GP) use an evolutionary process to develop prediction functions, making the model's development and evolution more transparent. This transparency aids in tracking the model's progress and understanding its decision making [59].
- Accuracy and efficiency: Grey-box models, including GP and CART, have demonstrated promise across various applications, offering a combination of accuracy and efficiency. Their transparency, coupled with the ability to capture complex relationships in data, makes them valuable tools for mathematical modeling in engineering and other fields.
Based on the literature review provided, this research is groundbreaking in its systematic application of two distinct grey-box artificial intelligence models: genetic programming (GP) and classification and regression trees (CART). These models are utilized for the first time to predict critical soil parameters, namely the California bearing ratio (CBR), unconfined compressive strength (UCS), and resistance value (R-value) determined through in-situ cone penetrometer tests, for expansive soil treated with recycled and activated rice husk ash composites. The study further evaluates the significance of the input parameters, performing a sensitivity analysis on 121 datasets, each consisting of seven inputs: hydrated-lime activated rice husk ash (HARHA), liquid limit (LL), plastic limit (PL), plasticity index (PI), optimum moisture content (wOMC), clay activity (AC), and maximum dry density (MDD). HARHA, a material produced by blending 5% hydrated lime (which acts as an activator) with rice husk ash, is created from the controlled combustion of rice husk waste. Different proportions of HARHA (ranging from 0.1% to 12% by weight in increments of 0.1%) were employed to treat the clayey soil, with the resulting effects on soil properties meticulously examined and documented within the study.

Database Processing
This study uses a database of 121 laboratory records, originally studied by Onyelowe et al. [25], who examined expansive clays by conducting laboratory tests on both untreated and treated soils and observing the effects of stabilization on the predictor parameters. The database is represented as three-dimensional diagrams in Figure 1. Experiments were conducted on expansive clay soil, both untreated and treated with hydrated-lime activated rice husk ash (HARHA). The HARHA, a binder developed by blending rice husk ash with hydrated lime, was tested at varying proportions on the clayey soil. Figure 2 depicts the effect of adding HARHA on three parameters: CBR, UCS, and resistance values. The results indicate that all three parameters increased with the percentage of additive until they reached a peak. Afterward, they showed slight decreases. An approximate value of 11.5% could be considered the optimal amount of HARHA additive.
The existing database contained seven inputs, which were as follows: hydrated-lime activated rice husk ash (HARHA), liquid limit (LL), plastic limit (PL), plasticity index (PI), optimum moisture content (wOMC), clay activity (AC), and maximum dry density (MDD). The plasticity index parameter was derived by subtracting the plastic limit from the liquid limit, while the remaining parameters were independent and lacked a direct correlation.
This database is noteworthy for being one of the largest sources of laboratory-derived UCS, CBR, and R-value measurements documented in the literature, with a considerable number of entries totaling 121.
Table 1 shows the statistical characteristics of the database utilized in this study, presenting the minimum, maximum, and mean values for both inputs and outputs. These descriptive statistics offer valuable insights into the distribution and properties of the data, providing crucial information for model selection and optimization in subsequent analyses.

Outliers
A pivotal concern in database preparation is the detection of outliers. Within a database context, an outlier denotes a data point that exhibits notable deviation from the majority of the data points [60,61]. It is imperative to identify and exclude these data points from the modeling process, as they have the potential to lead the model astray. Recognizing these particular data instances constitutes a pivotal stride in statistical analysis, commencing with the description of normative observations [62][63][64]. This entails an overarching evaluation of the graphed data's configuration, with the identification of extraordinary observations that diverge significantly from the data's central mass, termed outliers.
Two graphical techniques commonly used to identify outliers are scatter plots and box plots [54,65]. The latter utilizes the median, lower quartile, and upper quartile to display the data's behavior in the middle and at the distribution's ends. Furthermore, box plots employ fences to identify extreme values in the tails of the distribution. Points beyond an inner fence are considered mild outliers, while those beyond an outer fence are regarded as extreme outliers [54,66,67].
For this study, a dataset consisting of 121 observations was analyzed using a box plot to detect outliers. The process involved computing the median, lower quartile, upper quartile, and interquartile range, followed by calculating the lower and upper fences. Figure 3 indicates that the first-quartile values of CBR, UCS, and R-value were 14, 141, and 17, respectively, and the third-quartile values were 34, 194, and 24, respectively. The results demonstrate a favorable distribution across the parameter range, from the minimum to the maximum values, with appropriately sized boxes. Additionally, the average values within all three boxes fall towards the center of the box, a positive indicator of data dispersion. According to Figure 3, no points were found to exceed the extreme values defined by the computed fences. Furthermore, Figure 4 shows histograms for the different outputs.
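The fence computation described above can be sketched as follows. This is a minimal illustration, not code from the study; the function name is hypothetical, and the 1.5 and 3.0 multipliers for the inner and outer fences are the conventional box-plot choices, assumed here since the text does not state the multipliers used.

```python
def iqr_fences(q1, q3, k_inner=1.5, k_outer=3.0):
    """Return (inner, outer) box-plot fences from the quartiles.

    Points beyond the inner fence are mild outliers; points beyond
    the outer fence are extreme outliers.
    """
    iqr = q3 - q1
    inner = (q1 - k_inner * iqr, q3 + k_inner * iqr)
    outer = (q1 - k_outer * iqr, q3 + k_outer * iqr)
    return inner, outer

# Reported quartiles: CBR (14, 34), UCS (141, 194), R-value (17, 24)
inner_cbr, outer_cbr = iqr_fences(14, 34)
print(inner_cbr, outer_cbr)  # (-16.0, 64.0) (-46.0, 94.0)
```

Since all CBR observations lie well inside the outer fence (-46.0, 94.0), no extreme outliers are flagged, consistent with the finding reported for Figure 3.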


Testing and Training Databases
The dataset employed in this study was divided into two distinct categories: the training database and the testing database. To achieve this, a random selection process was utilized, assigning 80% of the data to the training database and the remaining 20% to the testing database. This 80/20 allocation is crucial in machine learning for several reasons. The larger training set allows the model to learn patterns and generalize from the data, while the separate testing set provides an accurate evaluation of the model's real-world predictive capabilities, avoiding overfitting and ensuring robustness. This division also permits validation, enabling fine-tuning and parameter optimization, resulting in a more realistic assessment of the model's practical utility, which is essential for guiding decisions on real-world deployment.
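The random 80/20 split described above can be sketched as follows; this is an illustrative reconstruction (the function name, seed, and rounding choice are assumptions, as the study does not specify its splitting procedure):

```python
import random

def train_test_split(data, train_fraction=0.8, seed=42):
    """Randomly assign records to training and testing databases."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    indices = list(range(len(data)))
    rng.shuffle(indices)
    cut = round(len(data) * train_fraction)
    train = [data[i] for i in indices[:cut]]
    test = [data[i] for i in indices[cut:]]
    return train, test

records = list(range(121))             # stand-in for the 121 laboratory records
train, test = train_test_split(records)
print(len(train), len(test))           # 97 24
```

With 121 records, an 80/20 split yields 97 training and 24 testing records.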
Tables 2 and 3 provide a comprehensive overview of the statistical characteristics, including the minimum, maximum, and average values, for the parameters in the training and testing databases, respectively. The results of the statistical analysis reveal that the two databases share similar characteristics, indicating that the data used for training the artificial intelligence model are representative of the data used for testing the model. This similarity in statistical properties between the training and testing databases is likely to enhance the accuracy and robustness of the developed model. The findings from this study underscore the importance of using representative and well-characterized data for the development of effective and reliable artificial intelligence models. In Tables 2 and 3, U95 denotes the 95% confidence interval.

Linear Normalizations
In a database context, each input or output variable is linked to specific units of measurement. To mitigate the impact of units and improve the efficiency of artificial intelligence training, a common approach involves data normalization. This process rescales the data to fit within a standardized range, often between zero and one. The normalization is achieved through the application of a linear transformation function, as described below:

X_norm = (X − X_min) / (X_max − X_min)

The four terms in this equation are X_max, X_min, X, and X_norm, which correspond to the maximum, minimum, actual, and normalized values, respectively.
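The linear normalization above can be sketched in a few lines (an illustrative helper, not code from the study):

```python
def normalize(values):
    """Min-max normalization: X_norm = (X - X_min) / (X_max - X_min)."""
    x_min, x_max = min(values), max(values)
    return [(x - x_min) / (x_max - x_min) for x in values]

# Example with the CBR quartile-like values 14, 24, 34:
print(normalize([14, 24, 34]))  # [0.0, 0.5, 1.0]
```

After this transformation, every variable spans [0, 1] regardless of its original units.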

Classification and Regression Tree (CART)
In the realm of data mining, the decision tree (DT) stands out as a widely used technique known for its simplicity and interpretability [68]. Unlike complex black-box algorithms such as artificial neural networks (ANNs), DT provides a white-box model, making it easier to comprehend and computationally efficient [69]. Among the various types of DT methods, CART (classification and regression trees) has demonstrated high accuracy and performance for predicting engineering problems [70,71].
A specific type of DT, known as the regression tree (RT), employs recursive partitioning to divide the dataset into smaller regions with manageable interactions [70,71]. An RT consists of root nodes, interior nodes, branches, and terminal nodes. It utilizes a binary-splitting procedure based on questions about independent variables to achieve optimal splits and construct a tree with high purity [70,71].
To select the best split in RT algorithms, the Gini index is often employed [72]. The partitioning process continues until a stop condition, determined by parameters such as the minimum number of observations, tree depth, or complexity, is met [73]. Pruning can be applied to enhance the tree's generalization capacity and prevent overfitting [74,75].
An essential capability of CART is its ability to detect and eliminate outliers during the partitioning process [76]. Additionally, CART utilizes principal component analysis (PCA) to identify crucial parameters for modeling [77,78].
Figure 5 shows a typical decision tree, comprising multiple nodes and branches. As depicted in the figure, a combination of nodes and branches forms a leaf. Each node is bifurcated into left and right nodes, guided by specific rules assigned to each branch (for example, if Node A is less than or equal to 'a', then proceed to Node C). Ultimately, the final node within each leaf reveals the predicted output (exemplified by nodes D-G in Figure 5).
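The branch-rule traversal described for Figure 5 can be sketched as follows. The tree, feature names, and thresholds below are purely hypothetical placeholders (the actual trees appear in Figures 10-12); the sketch only illustrates how a prediction is read off a CART structure:

```python
def predict(node, x):
    """Follow branch rules from the root to a terminal node.

    An interior node is a tuple (feature, threshold, left, right);
    a terminal node is the predicted numeric value.
    """
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node

# Hypothetical two-level tree (thresholds and leaf values are illustrative):
tree = ("LL", 45.0,
        ("PI", 20.0, 12.5, 18.0),   # branch taken when LL <= 45
        ("PL", 25.0, 22.0, 30.0))   # branch taken when LL > 45
print(predict(tree, {"LL": 40.0, "PI": 18.0}))  # 12.5
```

Starting at the root and answering each binary question leads to exactly one terminal node, whose value is the model's prediction.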

Genetic Programming (GP)
Genetic programming (GP) is a remarkable field in artificial intelligence and machine learning, utilizing evolutionary algorithms to create computer programs [79]. Proposed by John Koza in the early 1990s [80], GP has become widely researched and applied across various domains, including image recognition, classification, and prediction. One of its key strengths lies in its flexibility, enabling it to address a diverse range of problems in fields such as engineering, finance, and biology [81]. Moreover, GP's ability to automatically generate computer programs without human intervention saves valuable time and effort [82]. Additionally, it excels at optimizing complex functions that may be challenging for traditional methods, and its creative nature often leads to unexpected and innovative solutions, potentially uncovering new discoveries.
However, despite its advantages, GP does come with certain drawbacks. Its computational demands can be time consuming, especially when dealing with large search spaces. Furthermore, the generated programs can be challenging to understand and interpret, which can hinder result validation [54,83]. GP's performance may also be influenced by the choice of parameters, and it may not always achieve the optimal solution. Additionally, there is a risk of overfitting, as GP-generated programs can become overly specialized to the training data, potentially limiting their generalization to new and unseen data [54,83].
Genetic programming manipulates and optimizes a population of computer models (or programs) to find the best-fitting solution for a given problem. It involves creating an initial population of models, each comprising sets of user-defined functions and terminals. These functions may include mathematical operators (e.g., +, −, ×, and /), Boolean logic functions (e.g., AND, OR, and NOT), trigonometric functions (e.g., sin and cos), or other custom functions, while the terminals can consist of numerical constants, logical constants, or variables. These elements are randomly combined in a tree-like structure, forming a computer model that is evolved over generations to improve its performance in solving the problem at hand, as represented in typical GP tree structures. Figure 6 shows an example of a tree for the function [(x1 + x2) × (x3 − 3)]^2.
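The tree evaluation underlying such GP structures can be sketched as follows. The nested-tuple representation and function names are illustrative assumptions, not the study's implementation; the sketch evaluates the Figure 6 example [(x1 + x2) × (x3 − 3)]^2:

```python
import operator

# Function set: binary operators a GP run might draw from
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def evaluate(node, variables):
    """Recursively evaluate a GP expression tree.

    A node is either a terminal (variable name or numeric constant)
    or an interior node (operator, left_subtree, right_subtree).
    """
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op](evaluate(left, variables), evaluate(right, variables))
    return variables.get(node, node)  # look up a variable, else a constant

# Tree for [(x1 + x2) * (x3 - 3)]^2, written as a product of the factor with itself:
factor = ("*", ("+", "x1", "x2"), ("-", "x3", 3))
tree = ("*", factor, factor)
print(evaluate(tree, {"x1": 1, "x2": 2, "x3": 5}))  # ((1+2)*(5-3))^2 = 36
```

Crossover and mutation then act on such trees by swapping or replacing subtrees, which is what makes the evolved prediction functions explicit and inspectable.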


Results
In the development of artificial intelligence (AI) systems, the evaluation of various models is of paramount importance. This assessment process relies heavily on statistical parameters, which are instrumental in gauging the performance of AI models. The essential parameters are outlined by Equations (2)-(7), encompassing the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), mean squared logarithmic error (MSLE), root mean squared logarithmic error (RMSLE), and coefficient of determination (R²). Leveraging these metrics, both researchers and developers can aptly gauge and juxtapose the efficacy of distinct AI models [54].
where N, Xm, and Xp are the number of datasets, actual values, and predicted values, respectively. In addition, X̄m and X̄p are the averages of the actual and predicted values, respectively. For the best model, R² should equal 1, and the MAE, MSE, RMSE, MSLE, and RMSLE should equal 0.
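The metrics of Equations (2)-(7) can be sketched as follows. Since the equations themselves are not reproduced here, the exact forms below (in particular the use of log(1 + x) in the logarithmic errors) are the standard definitions and are assumptions rather than a transcription of the paper:

```python
import math

def metrics(actual, predicted):
    """Compute MAE, MSE, RMSE, MSLE, RMSLE, and R² (standard definitions)."""
    n = len(actual)
    mae = sum(abs(m - p) for m, p in zip(actual, predicted)) / n
    mse = sum((m - p) ** 2 for m, p in zip(actual, predicted)) / n
    msle = sum((math.log1p(m) - math.log1p(p)) ** 2
               for m, p in zip(actual, predicted)) / n
    mean_actual = sum(actual) / n
    ss_res = sum((m - p) ** 2 for m, p in zip(actual, predicted))
    ss_tot = sum((m - mean_actual) ** 2 for m in actual)
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse),
            "MSLE": msle, "RMSLE": math.sqrt(msle), "R2": r2}

perfect = metrics([14, 24, 34], [14, 24, 34])
print(perfect["R2"], perfect["MAE"])  # 1.0 0.0 for a perfect model
```

A perfect model gives R² = 1 and zero for all error metrics, matching the criterion stated above.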

Classification and Regression Tree (CART) Results
In this study, the development of the CART model was carried out using the MATLAB 2020 software package. The validity of a CART model hinges on selecting suitable distance ranges and a maximum tree depth. Several strategies were tested through trial and error to determine the optimal values for these key parameters. Typically, setting a high value for the maximum tree depth may lead to an excessively complex model. Conversely, opting for a small value for the tree depth may result in the removal of certain input parameters, as the algorithm strives to minimize prediction errors. By iterating through trial and error, the most favorable CART model with well-tuned key parameters can be identified.
Figures 7-9 present the performance of the optimal CART model, showcasing the predicted values versus the actual values obtained from experiments for the CBR, UCS, and R-value tests, respectively. The results indicate that the CART method demonstrated satisfactory predictive capabilities in accurately determining the CBR, UCS, and R-value parameters.
Tables 4-6 present a detailed evaluation of the best CART model's overall performance in predicting the CBR, UCS, and R-value parameters. These tables aim to provide comprehensive insights into the accuracy and generalization capabilities of the model on both the training and testing databases. The consistency in the model's performance between the training and testing databases across all three tables indicates its ability to generalize well to unseen data. Generalization is a critical aspect of machine learning models, as it ensures that the model can make accurate predictions on new data, not just the data it was trained on. The fact that the model maintains its accuracy on unseen data suggests that it successfully learned meaningful patterns and relationships from the training database without overfitting. Furthermore, the values of MSLE and RMSLE close to zero in all three tables indicate that the model's predictions were robust and stable, even for cases where the actual values exhibited significant fluctuations. This property is particularly useful in engineering, where data may have wide variations, and the logarithmic metrics provide a more balanced evaluation of the model's performance. Moreover, the low values of the MAE, MSE, and RMSE metrics across all three tables indicate that the CART model performed well in minimizing the discrepancy between predicted and actual values. This suggests that the model's predictions were relatively close to the true values. Small differences between the predicted and actual values are crucial in engineering applications, where precise estimations are vital for design, construction, and safety considerations.
Figures 10-12 illustrate the best CART models for predicting the CBR, UCS, and R-value, respectively. Each model consists of 15 nodes, with the LL, PL, and PI serving as the respective root nodes. The number of nodes is an outcome of the modeling process and can vary based on the initial tree depth chosen. A higher initial tree depth can result in more nodes, leading to increased model complexity and longer processing times. In this context, each optimal model consisted of 15 nodes.
analysis and examination of these models can provide deeper insights into their performance and effectiveness in real-world scenarios.analysis and examination of these models can provide deeper insights into their performance and effectiveness in real-world scenarios.

Genetic Programming (GP) Results
In this study, the second grey-box model under investigation was genetic programming (GP).To attain the highest performance, several iterations of this model were tested, and the most optimal configuration was identified.Table 7 presents a summary of the crucial parameters used in the GP model, encompassing various properties that were adjusted to achieve the best possible predictive performance for the target variable.The parameters listed in Table 7 were derived from a series of deliberate attempts to refine the model, with the goal of finding the combination of values that produces the best balance of accuracy and performance.This iterative process involves making adjustments, experimenting with different settings, and analyzing the results to reach the most effective configuration.The table serves as a summary of these endeavors, showcasing the parameter values that have been identified as the most optimal based on the criteria of prediction accuracy and overall model quality.
Table 7 provides a detailed summary of the properties of the optimum GP model utilized to predict the CBR, UCS, and R value parameters. Each parameter has its unique configuration in the GP model, which plays a crucial role in determining its predictive performance. The table presents key properties such as the population size, probabilities of GP operations (crossover, mutation, and reproduction), tree structure level, random constants, selection method, tour size, maximum initial and operation tree depth, range for initial values, count of node replacements during crossover, and brood size. By meticulously adjusting these properties for each parameter, the authors aimed to optimize the GP model's accuracy in providing reliable CBR, UCS, and R value predictions.

For each parameter, the GP model followed the 'Half-Half' (half-and-half) method for generating individuals, leading to linear trees with a depth of 1. The selection method utilized was 'rank selection' with a tour size of 2, and various probabilities for GP operations were set, including 0.99 for crossover, 0.99 for mutation, and 0.2 for reproduction. The GP model incorporated two random constants, and it defined the maximum initial and operation tree depth based on values such as 7 for the CBR, 5 for the R value , and 6 for the UCS. The range for initial values varied from 0 to 1, and during crossover, a count of 3 nodes was replaced. Furthermore, the brood size was set at 7 for the CBR, 5 for the R value , and 6 for the UCS. The properties listed in Table 7 offer crucial insights into the GP model's tailored configuration for predicting the CBR, UCS, and R value .
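The interplay of these hyperparameters can be illustrated with a deliberately simplified GP loop. The sketch below is a toy stand-in, not the authors' implementation: the operator set, dataset, mutation scheme, and selection scheme are all assumptions chosen to keep the example short.

```python
# Toy genetic-programming loop illustrating the roles of the hyperparameters
# summarized in Table 7 (population size, tree depth, selection, mutation).
# Everything here -- operators, data, settings -- is a simplified assumption.
import operator
import random

random.seed(0)
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 0.32, 0.92]  # the input variable plus two random-style constants

def gen(depth, full=True):
    """Generate a random tree ('full' vs 'grow', as in half-and-half init)."""
    if depth == 0 or (not full and random.random() < 0.3):
        return random.choice(TERMINALS)
    op = random.choice(list(OPS))
    return (op, gen(depth - 1, full), gen(depth - 1, full))

def eval_tree(tree, x):
    """Recursively evaluate a tree for a single input value x."""
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](eval_tree(left, x), eval_tree(right, x))

def fitness(tree, data):
    """Sum of squared errors over the dataset (lower is better)."""
    return sum((eval_tree(tree, x) - y) ** 2 for x, y in data)

data = [(float(x), 2.0 * x + 0.32) for x in range(-3, 4)]  # toy target
pop = [gen(3, full=(i % 2 == 0)) for i in range(50)]        # population size 50
initial_best = min(fitness(t, data) for t in pop)

for generation in range(30):
    pop.sort(key=lambda t: fitness(t, data))   # rank the population by error
    survivors = pop[:10]                       # elitist, rank-based selection
    offspring = [gen(2) if random.random() < 0.2 else random.choice(survivors)
                 for _ in range(40)]           # mutation rate 0.2 (simplified)
    pop = survivors + offspring

final_best = min(fitness(t, data) for t in pop)  # never worse than initial_best
```

Because the best individuals survive unchanged between generations (elitism), the best fitness can only improve or stay flat; a full GP implementation would add subtree crossover and brood recombination on top of this skeleton.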
Figures 13-15 show comparisons between the predicted CBR, UCS, and R value generated by the GP model and the actual values for the training and testing databases. The GP model exhibited a highly satisfactory performance in predicting the CBR, UCS, and R value parameters.
Tables 8-10 present the overall performance of the best GP model in predicting the CBR, UCS, and R value parameters for both the training and testing databases, respectively. Each table consists of various metrics that evaluate the model's predictive capability.
Table 8 focuses on predicting the CBR parameter and contains six evaluation metrics, namely the MAE, MSE, RMSE, MSLE, RMSLE, and R 2 . These metrics provide an assessment of the model's accuracy and goodness of fit. For the training database, the MAE was 1.508, indicating an average absolute error of 1.508 between the predicted and actual CBR values; the MAE for the testing dataset was 1.096. Lower MAE and RMSE values are desirable, as they indicate a better model performance. The high R 2 values (0.978 for training and 0.981 for testing) suggest that the GP model explained a significant portion of the variance in the CBR data.
Table 9 focuses on predicting the UCS and shares the same evaluation metrics as Table 8. The GP model exhibited an impressive performance in predicting the UCS, with lower MAE, MSE, and RMSE values for both the training and testing datasets compared with the CBR predictions reported in Table 8. The R 2 values were also higher (0.990 for training and 0.993 for testing), indicating a stronger fit to the UCS data.
Table 10 presents the evaluation metrics associated with the R value predictions. The GP model's performance for the R value predictions was outstanding, with very low MAE, MSE, and RMSE values for both the training and testing datasets. The R 2 values were exceptionally high (0.998 for both training and testing), demonstrating an excellent fit of the model to the R value data. Below is the recommended equation obtained from the GP model, where X 1 , X 2 , X 3 , X 4 , X 5 , X 6 , and X 7 are HARHA, LL, PL, PI, w opt , A C , and MDD, respectively. In addition, the values of R 1 , R 2 , and R 3 are constants equal to 0.32, 0.92, and 0.09, respectively.

Comparison of CART and GP Models
Tables 11-13 provide the results of the proposed CART and GP models in predicting the CBR, UCS, and R value parameters for both the training and testing databases, respectively. The performance metrics evaluated for each model included the MAE, MSE, RMSE, MSLE, RMSLE, and R 2 . For the CBR predictions (see Table 11), the GP model generally performed slightly worse than CART on the training database across all metrics; for the testing database, however, the performance of the two models was generally on par. For the UCS predictions (see Table 12), the GP model performed better than CART on both the training and testing datasets; that is, the GP model consistently showed lower MAE, MSE, RMSE, MSLE, and RMSLE values compared to those obtained for the CART model. For the R value predictions (see Table 13), the GP model significantly outperformed CART on both the training and testing datasets across all metrics, achieving noticeably lower MAE, MSE, RMSE, MSLE, and RMSLE values than the CART model.
Figures 16 and 17 present convergence curves associated with the output variables (predicted by the proposed CART and GP models) for the R 2 and RMSE parameters, respectively. The findings of this study indicate that the GP model initially achieved superior accuracy across all graphs during the first iteration. However, it required a higher number of iterations to eventually reach its maximum accuracy level, with convergence occurring between the 80th and 120th iterations. The CART method, on the other hand, exhibited a relatively lower accuracy in the first iteration but enhanced its accuracy more quickly over successive iterations, reaching its maximum accuracy level sooner, typically between the 50th and 80th iterations. This suggests that while the GP model initially showed promise in accuracy, the CART method displayed a more efficient improvement trajectory, ultimately reaching its peak accuracy earlier in the iterative process. This insight highlights the dynamic trade-off between initial accuracy and convergence speed for these two modeling approaches.
The difference in accuracy patterns between the GP and CART models can be attributed to the inherent nature of their respective algorithmic structures and optimization processes. The GP model's initial 'lucky' solutions might explain its early higher accuracy, but the CART method's efficient splitting criteria and incremental refinement process allowed it to quickly catch up and, in some cases, surpass the GP model's accuracy by reaching its peak earlier in the iterative process. The trade-off here is that the CART method's accuracy improvement curve might have started slower but accelerated as the iterations progressed, leading to faster convergence towards the optimal solution.

Sensitivity Analysis
In the realm of data-driven modeling, assessing the significance of the input parameters plays a crucial role. This evaluation involves a systematic process: each individual input parameter is intentionally modified by both increasing and decreasing it by 100%, and the resulting errors in the models are then observed. This analysis serves as a tool to gauge the sensitivity of each model to particular parameters. Higher error values indicate that the model is more sensitive to those specific parameters, whereas lower error values suggest that the parameter being examined has a relatively smaller impact on the overall model performance. This methodology allows us to pinpoint which input parameters significantly influence the model's outcomes and helps fine-tune the model for better results.
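The one-at-a-time procedure described above can be sketched as follows; the model, loss, and data here are hypothetical stand-ins used only to demonstrate the ±100% perturbation logic:

```python
def sensitivity_scores(model, X, y, loss):
    """Increase and decrease each input by 100% (i.e., double it and zero it)
    and record how much the model error rises above the baseline."""
    baseline = loss(model(X), y)
    scores = {}
    for j in range(len(X[0])):
        errors = []
        for factor in (2.0, 0.0):  # +100% and -100% perturbations
            X_perturbed = [row[:] for row in X]
            for row in X_perturbed:
                row[j] *= factor
            errors.append(loss(model(X_perturbed), y))
        scores[j] = max(errors) - baseline  # larger rise => more sensitive input
    return scores

# Toy stand-ins: a 'model' that leans heavily on feature 0, and an MAE loss
toy_model = lambda X: [3.0 * row[0] + 0.1 * row[1] for row in X]
mae = lambda pred, true: sum(abs(p - t) for p, t in zip(pred, true)) / len(true)
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
y = toy_model(X)  # baseline error is zero by construction
scores = sensitivity_scores(toy_model, X, y, mae)  # feature 0 dominates
```

The same loop, run against the trained CART or GP predictor with the seven laboratory inputs, yields the per-parameter error changes behind Figures 18-20.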
Figures 18-20 provide a visual depiction of the importance of the input parameters across the proposed CART and GP models for predicting the CBR, UCS, and R value . These figures offer valuable insights into how the models react to changes in various input parameters and aid in identifying the critical factors influencing the predictive performance of the models.

Tables 14-16 provide a ranking of the input parameters (in terms of their importance) for the proposed models used for predicting the CBR, UCS, and R value , respectively. In this ranking, 'Rank 1' corresponds to the highest importance, representing the most critical parameter, while 'Rank 7' indicates the lowest importance.
Based on the results from Table 14, the parameters HARHA, MDD, and PI emerge as the most crucial factors for predicting the CBR. In contrast, the parameters PL and clay activity A C hold the least importance in this prediction. Similarly, the findings from Table 15 indicate that the parameters HARHA and LL play pivotal roles, exerting the most influence on the UCS predictions. Lastly, as shown in Table 16, the parameters HARHA, LL, and w opt hold the highest levels of importance for predicting the R value . Conversely, the parameters PL and A C have the least influence on this prediction.
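Turning sensitivity scores into the Rank 1-7 ordering of Tables 14-16 is a simple sort. In the sketch below, the score values are hypothetical placeholders chosen only to mirror the Table 14 ordering (HARHA, MDD, and PI at the top; A C and PL at the bottom), not the paper's measured values:

```python
# Hypothetical sensitivity scores, chosen only to mirror the Table 14 ordering
scores = {"HARHA": 6.0, "MDD": 4.2, "PI": 3.9, "LL": 2.1,
          "w_opt": 1.5, "A_C": 0.4, "PL": 0.2}
ordered = sorted(scores, key=scores.get, reverse=True)          # most important first
importance_rank = {name: i + 1 for i, name in enumerate(ordered)}
# importance_rank["HARHA"] == 1 (Rank 1); importance_rank["PL"] == 7 (Rank 7)
```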

Limitations
The present work advances the field by specifically focusing on predicting the soil strength properties of hydrated-lime activated rice husk ash-treated soil. It introduces innovative and readily applicable equations and trees derived from grey-box machine learning models, namely CART and GP. The study's findings on the superior predictive capabilities of the GP equations, particularly for the UCS and R value parameters, are a notable contribution. The integration of interpretable yet flexible machine learning models within geotechnical engineering, as practiced in the present study, highlights its potential to enhance decision-making processes and safety measures in future infrastructure development projects.
The study also identifies some limitations and challenges. One limitation is the reliance on a specific laboratory database, which most likely does not represent all possible scenarios encountered in the field; in other words, the generalization of the proposed models to different conditions and materials could be a concern. Additionally, while the GP approach showed promising results, it has considerable computational demands and may be challenging to interpret, which could pose limitations in real-world engineering applications. The risk of overfitting in GP-generated programs also needs to be addressed to ensure their generalization to unseen data.

Conclusions
This paper focused on developing predictive equations and trees using data-driven approaches for three crucial geotechnical properties of hydrated-lime activated rice husk ash-treated soil, namely the California bearing ratio (CBR), unconfined compressive strength (UCS), and resistance value (i.e., R value from an in-situ cone penetrometer test). Two models, namely classification and regression trees (CART) and genetic programming (GP), were employed to predict these properties based on seven input parameters, consisting of the hydrated-lime activated rice husk ash (HARHA) content, liquid limit (LL), plastic limit (PL), plasticity index (PI), optimum moisture content (w opt ), clay activity (A C ), and maximum dry density (MDD).
The proposed CART and GP models both displayed commendable predictive aptitude for the CBR, UCS, and R value parameters. The models were evaluated using various performance metrics, including the MAE, MSE, RMSE, MSLE, RMSLE, and R 2 . The performance of the models was consistent across both the training and testing datasets, indicating their ability to generalize well to unseen data.
Comparing the two models, GP generally outperformed CART in predicting the UCS and R value on both the training and testing databases. For the CBR predictions, CART demonstrated a slightly better performance on the training database, while the two models performed on par on the testing database. Overall, both models proved effective in predicting the geotechnical properties under investigation. Additionally, the study assessed the importance of the individual input parameters in the predictive models. This analysis provided valuable insights into the sensitivity of each model to specific parameters, helping identify the critical factors influencing the models' performance. The findings of this study contribute to the understanding of data-driven approaches for predicting the engineering properties of soils, which can have significant applications in geotechnical engineering and construction. The developed models offer valuable tools for engineers and researchers to make accurate predictions based on specific input conditions, facilitating informed decision making in various geoengineering applications.

Figure 1.
Figure 1. Distribution of the database used based on (a) CBR, (b) UCS, and (c) resistance values.

Figure 2
Figure 2 depicts the effect of adding HARHA on three parameters: the CBR, UCS, and resistance values. The results indicate that all three parameters increased with the percentage of additive until they reached a peak, after which they showed slight decreases. An approximate value of 11.5% could be considered the optimal amount of HARHA additive.

Figure 3.
Figure 3. Box plot of output to find outliers for (a) CBR, (b) UCS, and (c) R value .

Figure 4 .
Figure 4. Histogram plots showing the distribution of (a) CBR, (b) UCS, and (c) R value .

Figure 7.
Figure 7. The results of CART modelling for predicting CBR using (a) training and (b) testing databases.

Figure 8.
Figure 8. The results of CART modelling for predicting UCS using (a) training and (b) testing databases.

Figure 9.
Figure 9. The results of CART modelling for predicting R value using (a) training and (b) testing databases.

Figure 10.
Figure 10. The best developed CART model to predict the CBR.

Figure 11.
Figure 11. The best developed CART model to predict the UCS.

Figure 12.
Figure 12. The best developed CART model to predict the R value .

Figure 13.
Figure 13. The results of GP modelling for predicting the CBR for (a) training and (b) testing databases.

Figure 14.
Figure 14. The results of GP modelling for predicting the UCS for (a) training and (b) testing databases.

Figure 15.
Figure 15. The results of GP modelling for predicting the R value for (a) training and (b) testing databases.

Figure 16.
Figure 16. Convergence curves for the R 2 parameter associated with (a) CBR, (b) UCS, and (c) R value predicted by the proposed CART and GP models.

Figure 17.
Figure 17. Convergence curves for the RMSE parameter associated with (a) CBR, (b) UCS, and (c) R value predicted by the proposed CART and GP models.

Geotechnics 2023, 3, 23
Figure 18.
Figure 18. The importance of input parameters in predicting the CBR for the proposed (a) CART and (b) GP models.

Figure 19.
Figure 19. The importance of input parameters in predicting the UCS for the proposed (a) CART and (b) GP models.

Figure 20.
Figure 20. The importance of input parameters in predicting the R value for the proposed (a) CART and (b) GP models.

Table 1.
Descriptive statistics for the collected database.

Table 2.
Descriptive statistics for the training database.

Table 3.
Descriptive statistics for the testing database.
Table 4 focuses on the CART model's performance in predicting the CBR parameter. The metrics utilized for the evaluation include the MAE, MSE, RMSE, MSLE, RMSLE, and R 2 . The model demonstrates excellent accuracy, with low MAE (0.976 for training and 1.141 for testing) and RMSE (1.195 for training and 1.363 for testing) values. The minimal MSLE and RMSLE values (approximately 0.004) indicate that the model's predictions were closely aligned with the actual CBR values. Additionally, the high R 2 values (0.990 for training and 0.982 for testing) suggest that the model captures a substantial portion of the variance in the data, leading to reliable predictions of the CBR parameter.

Table 5 evaluates the CART model's performance in predicting the UCS. The model performed admirably, with reasonably low MAE (3.262 for training and 3.742 for testing) and RMSE (3.868 for training and 4.300 for testing) values, showcasing its accuracy in predicting the UCS. The MSLE and RMSLE values (both approximately 0.001) further reinforce the strong correlation between the predicted and actual values. The high R 2 values (0.985 for training and 0.980 for testing) indicate that the model captured a significant portion of the variation in the data, resulting in reliable UCS predictions.
Table 6 highlights the CART model's performance in predicting the R value parameter. Once again, the model delivered impressive results, as indicated by the low MAE (0.539 for training and 0.548 for testing) and RMSE (0.672 for training and 0.663 for testing) values. The MSLE and RMSLE values (both approximately 0.001) reinforce the close correspondence between the predicted and actual values. Moreover, the high R 2 values (0.975 for training and 0.983 for testing) demonstrate a strong correlation between the model's predictions and the actual resistance values.

Table 4.
Overall performance of the best CART model to predict the CBR for both the training and testing databases.

Table 5.
Overall performance of the best CART model to predict the UCS for both the training and testing databases.

Table 6.
Overall performance of the best CART model to predict the resistance value for both the training and testing databases.

Table 7.
The properties of the optimum GP model for predicting the CBR, UCS, and R value .

Table 8.
Overall performance of the best GP model to predict the CBR for both the training and testing datasets.

Table 9.
Overall performance of the best GP model to predict the UCS for both the training and testing datasets.

Table 10.
Overall performance of the best GP model to predict the R value for both the training and testing datasets.

Table 11.
Results of the proposed CART and GP models in predicting the CBR for both the training and testing databases.

Table 12.
Results of the proposed CART and GP models in predicting the UCS for both the training and testing databases.

Table 13.
Results of the proposed CART and GP models in predicting the R value for both the training and testing databases.

Table 14.
Ranking results of variable importance for the proposed mathematical models to predict CBR.

Table 15.
Ranking results of variable importance for the proposed mathematical models to predict UCS.

Table 16.
Ranking results of variable importance for the proposed mathematical models to predict resistance values.