Performance Prediction of Cement Stabilized Soil Incorporating Solid Waste and Propylene Fiber

Cement stabilized soil (CSS) is widely applied as a routine cementitious material owing to its cost-effectiveness. However, the limited mechanical strength of CSS impedes its development. This research assesses the feasibility of jointly enhancing the unconfined compressive strength (UCS) and flexural strength (FS) of CSS with construction and demolition (C&D) waste, polypropylene fiber, and sodium sulfate. Moreover, machine learning (ML) techniques including Back Propagation Neural Network (BPNN) and Random Forest (RF) were applied to estimate UCS and FS based on the comprehensive dataset. The laboratory tests were conducted at 7-, 14-, and 28-day curing ages, indicating the positive effect of cement, C&D waste, and sodium sulfate. The improvement in FS caused by polypropylene fiber was also evaluated from the 81 experimental results. In addition, the beetle antennae search (BAS) approach and 10-fold cross-validation were employed to tune the hyperparameters automatically, avoiding tedious manual effort. The resulting correlation coefficients (R) ranged from 0.9295 to 0.9717 for BPNN and from 0.9262 to 0.9877 for RF, indicating accurate and reliable prediction. K-Nearest Neighbor (KNN), logistic regression (LR), and multiple linear regression (MLR) were used as baselines to validate the BPNN and RF algorithms. Furthermore, box and Taylor diagrams showed BAS-RF and BAS-BPNN to be the best-performing models for UCS and FS prediction, respectively. The optimal mixture design was proposed as 30% cement, 20% C&D waste, 4% fiber, and 0.8% sodium sulfate based on the importance score of each variable.


Introduction
Cement stabilized soil (CSS) is a routine cementitious material with wide applications, including leakage stopping, slope reinforcement, and foundation treatment [1]. However, its weak strength and large deformation impede its extensive development. Construction and demolition (C&D) waste mitigates these shortcomings by enhancing physical support and bonding strength. Owing to their higher hardness, C&D waste particles provide mechanical support in the CSS matrix, resulting in better unconfined compressive strength (UCS) performance. The CSS mechanical properties are further improved by grinding-incineration treated C&D waste, which has a positive effect on mortar bonding strength [2,3]. Moreover, C&D waste demonstrates stronger enhancement potential under the excitation of saline
The purpose of this paper is to experimentally investigate the enhancement of CSS performance by C&D waste, polypropylene fiber, and sodium sulfate. C&D waste was incorporated at 10%, 20%, and 30% to substitute cement. The dosing levels of polypropylene fiber and sodium sulfate were 1%, 2%, 4% and 0.2%, 0.4%, 0.8%, respectively. UCS tests, flexural strength (FS) tests, and direct shear tests were conducted to examine the coupled enhancement of CSS mechanical properties. The Back Propagation Neural Network (BPNN) and Random Forest (RF), with hyperparameters tuned by the BAS algorithm, were employed to predict the UCS and FS performance of CSS.

Materials
The soil and C&D waste used in this research were sourced from the construction site of Zhushan Road metro station in Nanjing. All particles were pre-dried and ground to sizes less than 5 mm. The physical and mechanical properties of the soil are summarized in Table 1. Portland cement 42.5 was utilized as a stabilizer and cementitious binder. Polypropylene fiber with a length of 10 mm and a density of 0.91 g/cm³ was employed to enhance mechanical performance; Table 2 lists its detailed mechanical properties. Moreover, an air gun was applied to prevent the fibers from agglomerating. Sodium sulfate was used to provide alkali catalysis.
Table 1. Physical and mechanical properties of soil sample.

Mixture Design
The variables in this research were the contents of Portland cement, C&D waste, polypropylene fiber, and sodium sulfate. Each dosing level was determined as a weight ratio to the pre-dried soil. Specifically, the Portland cement and C&D waste to soil ratios were set to 10%, 20%, and 30%. Polypropylene fiber was incorporated at 1%, 2%, and 4% of the soil weight. The dosing proportions of sodium sulfate were 0.2%, 0.4%, and 0.8%. The water to soil weight ratio was held constant at 80%. The resulting 81 combinations, along with 3 control groups (conventional CSS), were cast for experiments.
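The 81 combinations follow from a full factorial design with four factors at three levels each. A minimal sketch of how such a design can be enumerated is given below; the dictionary keys and the function name are illustrative and do not come from the study.

```python
from itertools import product

# Dosing levels as weight ratios (%) to the pre-dried soil, taken from the text.
# The key names are illustrative labels only.
LEVELS = {
    "cement": (10, 20, 30),
    "cd_waste": (10, 20, 30),
    "fiber": (1, 2, 4),
    "sodium_sulfate": (0.2, 0.4, 0.8),
}
WATER_RATIO = 80  # held constant for every mix


def full_factorial(levels):
    """Return one dict per combination of the given factor levels."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


mixes = full_factorial(LEVELS)
print(len(mixes))  # 3 * 3 * 3 * 3 combinations
print(mixes[0])
```

The three control groups are not part of the factorial grid and would be appended separately.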

Mechanical Tests
UCS, FS, and direct shear tests were conducted to evaluate the CSS mechanical performance. All mechanical tests were prepared strictly in accordance with GB/T50123-1999 [42]. Soil samples for direct shear tests were shaped as cylinders of 61.8 mm × 20 mm (diameter × height). The normal stress was applied vertically at σ_n = 50, 100, 150, and 200 kPa to determine the shear strength parameters. The single-doped specimens used for the direct shear test contained C&D waste at 10%, 20%, and 30% content, while the dosage design for UCS and FS specimens followed the mixture design described above. Cubic (50 mm × 50 mm × 50 mm) and cuboid (40 mm × 40 mm × 160 mm) specimens were cast for UCS and FS tests, respectively. After vibrating, mortar samples were wrapped in a thin membrane and cured under the standard curing condition (20 ± 2 °C and 95% relative humidity) until tests were conducted at 7, 14, and 28 days of curing. On the day before the tests, specimens were removed from the curing chamber and soaked in 24 °C water for 24 h to simulate humid working conditions. YAW-4206 and DY-208JX automatic pressure testers were employed to conduct the UCS and FS tests at a loading rate of 0.04 MPa/s. The average of three replicate specimens, after eliminating erroneous values, was recorded as the final test result, as listed in Appendix A.

Baseline Models
Baseline models including LR, MLR, and KNN were selected for comparison with BPNN and RF to assess the prediction accuracy. Regression models (LR and MLR) identify the relationship between predictors and output, with the benefits of minimal computation and easy implementation. Equations (1) and (2) display the principles of the LR and MLR models.
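Equations (1) and (2) are not reproduced in this text. As a hedged reconstruction, assuming the standard logistic and multiple linear forms and the symbols used in the accompanying definitions (the intercept \beta_0 is an assumption, since only \beta_i for i = 1, ..., n is defined), they may be written as:

```latex
% Eq. (1): logistic regression (assumed standard form)
p = \frac{1}{1 + \exp\!\left[-\left(b_0 + \sum_{k} b_k x_k\right)\right]}

% Eq. (2): multiple linear regression (assumed standard form)
Y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i
```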
Within the proposed equations above, x_k and p represent independent and dependent variables; b_0 and b_k stand for constant coefficients; Y is the predicted strength of CSS; x_i and β_i (where i = 1, 2, 3, ..., n) denote the considered variables in the laboratory test design and the regression coefficients, respectively.
The KNN algorithm estimates mechanical performance through the similarity between input values. Specifically, KNN models find the most similar observations in the dataset and output their average value as the final prediction [43]. The pre-defined function calculates the distance between neighbors as the Euclidean distance and assigns all neighbors the same weight (Equation (3)) [44,45]. KNN models therefore predict effectively on large datasets [29].
where i and j represent the detected points and d is the abbreviation of Euclidean distance.
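The KNN procedure described above can be sketched as follows. This is a minimal illustration with uniform weights and Euclidean distance; the function names, the two-feature inputs, and the strength values are hypothetical and are not data from this study.

```python
import math


def euclidean(a, b):
    """Euclidean distance d between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k nearest training points (uniform weights)."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical (cement %, fiber %) inputs and strengths in MPa.
train_X = [(10, 1), (20, 2), (30, 4), (30, 1)]
train_y = [0.5, 1.0, 2.0, 1.8]
print(knn_predict(train_X, train_y, query=(28, 3), k=2))
```

In practice the features would usually be rescaled first, since variables with larger numeric ranges otherwise dominate the distance.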

Back Propagation Neural Network (BPNN)
BPNN, a type of ANN trained by the Back Propagation (BP) technique, has been successfully employed to predict the mechanical strength of cementitious materials. The ANN model is essentially a neural network consisting of an input layer, an output layer, and one or more hidden layers. As illustrated in Figure 1, each neuron acts as a processing unit, merging information from the former layer and passing the combination to the subsequent nodes [46]. The following equation presents the neuron connection between upper and lower layers in mathematical form.
where y represents the output value from the lower layer; w_i stands for the weight; x_i is the data received from the former layer; and b denotes the bias between neurons. The training loop iterates until the mean squared error (MSE) reaches the pre-set value, ending the training process [47].
where y_i denotes the model prediction, and ŷ_i is the labeled (actual) value.
Figure 2 represents the BP flow, which updates the biases and weights in the neural network by calculating the difference between the predicted output and the actual strength from the dataset [48,49]. The BP technique makes BPNN models sensitive to hyperparameters, which affect the final accuracy.
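As a minimal sketch of the two quantities described above, the snippet below computes a single neuron's output and the MSE stopping metric. The activation function is an assumption (tanh is used here), because the text does not name one, and the function names are illustrative.

```python
import math


def neuron_output(inputs, weights, bias, activation=math.tanh):
    """Weighted sum of the inputs plus bias, passed through an activation.

    The tanh default is an assumption; the source does not specify it.
    """
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def mse(predicted, actual):
    """Mean squared error between predictions and labeled values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


print(neuron_output([1.0, 2.0], [0.5, -0.25], bias=0.1))
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Back propagation would then adjust the weights and bias along the gradient of the MSE; that update step is omitted here.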

Random Forest (RF)
As illustrated in Figure 3, RF generates multiple decision trees (RTs), each of which is built on a new training set derived from the bagging and voting method [50]. The bagging method trains the predictors independently through bootstrap and aggregation. Bootstrapping means that the original dataset is randomly resampled with replacement, so duplicate values are allowed, once for each predictor. Each split is built from a random subset of the input predictor variables, which improves diversity and leads to more accurate estimation. Equation (6) shows the training set as R_n, where X and Y denote the input and output vectors, respectively. The average result of the RTs is output as the final prediction using the aggregation approach [50].
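The bootstrap and aggregation steps can be illustrated with a deliberately simplified sketch. It is not a full RF: each base learner here is a one-split tree (a stump) on one randomly chosen feature, whereas a real RF grows deeper trees and draws a random feature subset at every split. All names and the sample data are hypothetical.

```python
import random


def bootstrap_sample(X, y, rng):
    """Resample the training set with replacement (duplicates allowed)."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_stump(X, y, rng):
    """One-split regression tree on a randomly chosen feature."""
    feature = rng.randrange(len(X[0]))
    best = None
    for threshold in sorted({row[feature] for row in X}):
        left = [t for row, t in zip(X, y) if row[feature] <= threshold]
        right = [t for row, t in zip(X, y) if row[feature] > threshold]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # every sampled row shares one value of this feature
        mean = sum(y) / len(y)
        return feature, float("inf"), mean, mean
    return feature, best[1], best[2], best[3]


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] <= threshold else right_mean


def fit_forest(X, y, n_trees=25, seed=0):
    """Train each stump independently on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(*bootstrap_sample(X, y, rng), rng) for _ in range(n_trees)]


def predict_forest(forest, row):
    """Aggregate by averaging the predictions of all trees."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)


# Hypothetical (cement %, fiber %) inputs and strengths in MPa.
X = [(10, 1), (20, 2), (30, 4), (10, 4), (30, 1), (20, 1)]
y = [0.4, 0.9, 2.1, 0.6, 1.8, 0.8]
forest = fit_forest(X, y)
pred = predict_forest(forest, (30, 2))
print(pred)
```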

Beetle Antennae Search (BAS)
The BAS algorithm is inspired by the foraging behavior of beetles and serves to avoid the tedious effort of optimizing hyperparameters manually. Beetles cannot locate the exact position of food while searching for it. As a result, they move towards the side on which the antenna receives the greater intensity of odor. Following this principle, the BAS algorithm treats the target hyperparameters as the food, giving the ML models the capability of automatic tuning [39]. As explained by Equation (7), the first step of BAS is to generate a random vector as the beetle antennae, where V indicates the direction and k represents the space dimensionality [40].
Secondly, the algorithm determines the antennae coordinates based on the direction vector, where X_l, X_r, and X_i denote the coordinates of the left antenna, the right antenna, and their centroid at the ith iteration, respectively, and D is the distance between the left and right antennae. The concentrations at the two antennae are then compared through a normalized function, where S represents the step length. The comparison iterates 50 times during the model training process to optimize the hyperparameters.
Figure 4 illustrates the 10-fold cross-validation, which was applied in this research to mitigate the overfitting caused by the finite database during the training and testing stages. Firstly, the dataset is randomly split into a training set and a test set, which account for 70% and 30% of the original dataset, respectively. Then, the training set is separated into 10 equal folds. Nine of the folds are used to train the ML models, and the remaining fold validates the prediction performance by calculating the root mean square error (RMSE) [51]. The 10 folds take turns as the validation fold. Specifically, in each cross-validation round, the BAS algorithm is used to optimize the hyperparameters within 50 iterations, and the RMSE is calculated in each iteration for hyperparameter adjustment. Finally, the ML model with the minimum RMSE is saved in each cross-validation round (a total of 10 models). By comparing the RMSE values from each fold, the ML model with the lowest RMSE value and optimal hyperparameters is chosen as the final ML model.
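The search and validation loop described above can be sketched as follows. This is an illustrative implementation of a commonly used BAS update rule, not the authors' code: the decay factors, the function names, and the test objective are assumptions, and the search is continuous, so integer hyperparameters such as neuron counts would need rounding inside the objective.

```python
import math
import random


def bas_minimize(objective, x0, n_iter=50, step=1.0, antenna_dist=2.0,
                 decay=0.95, seed=0):
    """Beetle antennae search for a minimum of `objective` (e.g. an RMSE)."""
    rng = random.Random(seed)
    x = list(x0)
    best_x, best_f = list(x), objective(x)
    for _ in range(n_iter):
        # Random unit vector V giving the antennae orientation.
        v = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(c * c for c in v)) or 1.0
        v = [c / norm for c in v]
        # The antennae sit half the distance D on either side of the centroid.
        x_left = [xi + 0.5 * antenna_dist * vi for xi, vi in zip(x, v)]
        x_right = [xi - 0.5 * antenna_dist * vi for xi, vi in zip(x, v)]
        # Step by length S toward the antenna with the lower objective value.
        direction = 1.0 if objective(x_left) < objective(x_right) else -1.0
        x = [xi + direction * step * vi for xi, vi in zip(x, v)]
        fx = objective(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
        step *= decay                                  # assumed decay schedule
        antenna_dist = decay * antenna_dist + 0.01
    return best_x, best_f


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx); each fold is the validation fold once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val


def sphere(p):
    """Toy objective standing in for a validation RMSE."""
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


best_x, best_f = bas_minimize(sphere, [0.0, 0.0])
splits = list(k_fold_indices(50, k=10))
print(best_x, best_f, len(splits))
```

In the workflow described in the text, the objective passed to the search would train a model on the nine training folds and return the RMSE on the validation fold.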

Performance Evaluation
Two evaluation indicators are applied in this study to assess the precision of the baseline, BPNN, and RF models: the correlation coefficient (R) and the RMSE. The indexes are defined by the following equations: where n represents the number of data groups; y*_i and y_i denote the estimated and actual outputs, respectively; and ȳ* and ȳ are the mean values of y*_i and y_i.
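A minimal sketch of the two indicators is given below, assuming that R is the Pearson correlation coefficient and that the RMSE takes its standard form, which is consistent with the symbols defined above. The function names are illustrative.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between estimated and actual outputs."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def correlation_coefficient(predicted, actual):
    """Pearson correlation coefficient R between estimated and actual outputs."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)


print(rmse([0.0, 0.0], [3.0, 4.0]))
print(correlation_coefficient([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```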

Effect of Portland Cement
The UCS and FS test results for the control groups are illustrated in Figure 5. Figure 5a shows that the CSS compressive strength increased with curing age, as evidenced by an increment of up to 185.73% from 7 to 14 days of curing. This is mainly ascribed to cement hydration, as evidenced in Figure 6. Subfigures a and b show the hardened sample imaged by scanning electron microscope (SEM) and analyzed by energy-dispersive X-ray spectroscopy (EDX). The silicon content reached 41.14%, which was higher than that of conventional soil [8]. The results indicate the presence of hydrate phases such as calcium silicate hydrate (C-S-H). These products strengthen the bonding between soil particles and cement. The hydration slowed down to a steady rate during the late curing age (after 14 days), as reflected by the declining increments ranging from 7.45% to 36.18% in Figure 5. Meanwhile, the UCS test results increased with Portland cement content for the same reason. The maximum increment across all curing ages is 450.34%, indicating that CaSiO₃ and C-S-H significantly promote the CSS compressive strength. Specifically, the colloidal hydrate products filled the pores, helping to reduce entrapped air voids. Figure 5b shows a similar trend, as the FS results at 28 days increased by 91.19% and 176.91%, respectively, as the cement content rose in 10% steps.
On the other hand, increasing the Portland cement dosing level causes an undesired alkali-silica reaction (ASR). Figure 7 shows that brittle failure occurred in the control groups, leading to an evident fracture surface. Portland cement introduces brittleness along with compressive strength enhancement, impairing the sample's ability to deform under external force. This is evidenced by the surrounding debris in Figure 7.

Effect of C&D Waste
The 28-day UCS results are shown in Figure 8, with subfigures separated by Portland cement content. The maximum strengths for the three dosing levels were 0.8048 MPa, 1.5008 MPa, and 2.6572 MPa, respectively. All the results were higher than those of the control groups, indicating that mechanical performance increases with C&D waste incorporation. This is mainly ascribed to the old mortar attached to C&D waste particles, which participates in cement hydration [52,53]. The resulting interfacial transition zone (ITZ) provides bonding strength that anchors the stiff C&D particles in the matrix, where they support the soil and prevent its collapse. Additionally, calcium hydroxide (CH) formed during hydration accelerates the hydration of the old mortar, which generates more calcium silicate hydrates (C-S-H) to promote sample strength [5].
Figure 9 shows the FS results at the 28-day curing age, indicating that C&D waste has a positive effect on FS performance. As shown in Figure 9b,c, C&D waste provided better FS enhancement when the cement content was high. The maximum improvements reached 56.83% and 57.2%, respectively, while the corresponding value in Figure 9a was 21.62%. This can be explained by the increase in cement content enhancing the degree of hydration of the old mortar attached to the C&D waste surface. On the other hand, the enhancement became insignificant when the C&D waste content was high (30%). This is mainly due to the excess of large particles introduced into the matrix, resulting in porosity and entrapped air.
Direct shear tests were also conducted owing to their cost-effectiveness and convenient operation. The relationship between normal stress (σ) and shear stress (τ) for various C&D waste proportions is illustrated in Figure 10. The inclusion of C&D waste significantly enhances the shear performance, as shown by the increasing cohesion (c) and internal friction angle (φ).
Moreover, the average increments were 33.52%, 43.28%, and 26.34% for each 10% increase in C&D waste dosage, indicating that 20% C&D waste content gave the greatest improvement.
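The cohesion and friction angle can be read from the σ-τ relationship by a straight-line fit. The sketch below assumes the Mohr-Coulomb criterion τ = c + σ tan φ, which is the usual interpretation of direct shear data but is not stated explicitly in the text. The shear values are synthetic, generated from c = 20 kPa and φ = 30°, and are not measurements from this study.

```python
import math


def fit_mohr_coulomb(normal_stress, shear_strength):
    """Least-squares fit of tau = c + sigma * tan(phi).

    Returns (cohesion, friction angle in degrees).
    """
    n = len(normal_stress)
    mean_s = sum(normal_stress) / n
    mean_t = sum(shear_strength) / n
    slope = (
        sum((s - mean_s) * (t - mean_t)
            for s, t in zip(normal_stress, shear_strength))
        / sum((s - mean_s) ** 2 for s in normal_stress)
    )
    cohesion = mean_t - slope * mean_s
    return cohesion, math.degrees(math.atan(slope))


sigma_n = [50, 100, 150, 200]  # kPa, the normal stresses used in the tests
# Synthetic shear strengths (kPa), NOT measured data.
tau = [20 + s * math.tan(math.radians(30)) for s in sigma_n]

c, phi = fit_mohr_coulomb(sigma_n, tau)
print(c, phi)
```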

Effect of Polypropylene Fiber
In Figures 8 and 9, UCS and FS increased with the polypropylene fiber proportion, demonstrating its positive effect. The enhancement of UCS can be attributed to the friction between fiber and particles being higher than the friction inside the matrix [54]. Specifically, a rougher fiber surface prevents particle displacement, which impedes the generation of microscopic cracks. However, fiber incorporation introduces undesired porosity and reduces compactness, leading to a strength reduction of 31.52% in Figure 8c. Figure 11 shows an electron microscope image of the sample fracture surface after the UCS test. Several pores exist on the fiber periphery. These entrapped air voids, caused by fiber agglomeration, remarkably weaken the fiber enhancement of compressive strength [55].
Compared with the UCS tests, the fibrous material improved the FS results more evidently. As depicted in Figure 9a, the FS of CSS specimens increased by up to 82.33% and 150.31% when the fiber content doubled. A similar trend can be observed in Figure 9b,c. The dominant enhancement of FS can be attributed to the bridging effect. The randomly distributed fibers effectively inhibit crack generation. In addition, polypropylene fiber is pulled out of the fracture surface when failure occurs, giving the residual section a certain flexural resistance. However, the bridging effect can be hindered by cement inclusion. The peak FS was recorded in the group dosed with 30% C&D waste when the cement content was low (10% and 20%), whereas the 30% cement specimens reached the maximum at a 20% C&D waste proportion. This is mainly ascribed to excessive cement hydration impeding the formation of fibrous bridges.

Effect of Sodium Sulfate
The influence of sodium sulfate can be analyzed in Figures 8 and 9. The UCS and FS results share a similar trend: mechanical performance strengthens with the sodium sulfate proportion. The average increments of UCS and FS from 0.2% to 0.8% sodium sulfate were 16.94% and 16.29%, with maximum increments of 59.61% and 69.96%, respectively. This is attributed to the reaction between sulfate ions and the liquid phase (AlO₂⁻, Ca²⁺, etc.). The main product is ettringite (AFt phase), which enhances early-age strength [56]. Moreover, metal ions (Na⁺, Ca²⁺, etc.) are effective in raising alkalinity, which catalyzes the reaction rate and promotes SiO₂ and Al₂O₃ dissolution. However, samples incorporating 0.4% sodium sulfate failed to follow the positive trend. The UCS of these samples changed irregularly, as evidenced by values fluctuating from −14% to 32.59% relative to 0.2% sodium sulfate. A similar phenomenon was also observed in Figure 9. The discrepancy is probably due to human error and deviation in material composition.
Furthermore, based on Appendix A, the UCS enhancement rate varies with curing age, as evidenced by the average increments of 15.41% and 30.49% for the early (7-day to 14-day) and late (14-day to 28-day) curing periods. The principle can be explained by Equations (14) and (15) [57]. Sulfate ions modified the conventional hydration reaction, resulting in the formation of ettringite, which promotes the CSS mechanical performance. However, C-S-H is capable of absorbing sulfate in the early curing stage and releasing it during the later period, leading to rapid strength gain from 14 to 28 days [58-60].

Hyperparameter Tuning
In total, 252 data points (84 groups of experimental results at three curing ages) constituted the database, which meets the database size requirement for a traditional machine learning task. During the machine learning process, the contents of cement, C&D waste, fiber, and sulfate, together with the curing age, were set as features. The outputs were UCS and FS.
For BPNN models, the hyperparameters that needed to be determined include the number of neurons and layers. BAS and 10-fold CV found the optimal hyperparameters through iteration, as illustrated in Figure 12. It is evident in Figure 12a,b that the third fold and the BPNN network with three hidden layers obtained the lowest RMSE value. Figure 12c shows the BAS algorithm run in fold 3, indicating that the RMSE value decreased with iteration and that the tuned hyperparameters were obtained at the 36th iteration. The resulting BPNN hyperparameters were therefore determined as numHiddenLayers = 3, with numNeuronsInEachLayers = 3, 11, 4, respectively.
In the model setup, the number of trees (ntree) and the minimum number of leaves (minNumleaf) are the fundamental parameters that needed to be adjusted for the RF algorithm. In this research, they were determined by the procedure shown in Figure 13. It is noted that the RMSE value basically converges within 50 iterations for a traditional machine learning task. From Figures 12b,c and 13b, the reduction in RMSE can be clearly observed within the first 10-30 iterations, and the minimum value is maintained after 30 iterations, illustrating that the RMSE reaches a local minimum. Specifically, the minimum RMSE value of 0.1015 was obtained at the 6th fold after dropping significantly as the iterations progressed, giving the desired hyperparameters numTree = 88 and minNumleaf = 1.
Additionally, the predicted and actual results correlated strongly, as evidenced by the R values in Figure 16. As depicted in Figure 16a, the correlation coefficients (R) for the BPNN algorithm were 0.9717 and 0.9594 for the training and test sets, respectively, both lower than those for the RF algorithm (0.9877 and 0.9685). Therefore, BPNN and RF both provided reliable predictions, with RF being the more accurate.
Moreover, the similar R values for the training and test sets indicated that there was no overfitting problem in either algorithm.
Four outliers (red +) were detected in BPNN, more than in the other models. However, as the interquartile range and median affect the accuracy more significantly, BPNN and RF demonstrated relatively similar reliability among all five algorithms. The Taylor diagram was also applied to evaluate the model performance through three assessment criteria, namely R, RMSE, and standard deviation, as shown in Figure 18. The dot denoting RF was the nearest to the actual point, with the minimum standard deviation, the maximum R, and a small RMSE. Table 3 lists the specific values of R and RMSE for each algorithm, showing RF to be the best-performing model in UCS prediction.
A procedure similar to that for UCS estimation was applied to optimize the hyperparameters for FS prediction. The 3rd fold gave the minimum RMSE value during the CV process, as shown in Figure 19a. Moreover, numHiddenLayers was determined as 1, because the RMSE dropped remarkably and reached the minimum after three iterations. This can be ascribed to the effectiveness of BAS in hyperparameter tuning. The other hyperparameter, numNeuronsInEachLayers, was determined as 1. For the RF algorithm, the 9th fold had the minimum RMSE, as evidenced by Figure 20a. The hyperparameters optimized in this fold were therefore applied to predict the FS performance. Figure 20b shows the RMSE scatter plot, indicating that the RMSE value declined until it settled at a minimum at the 41st iteration. The final tuned hyperparameters were numTree = 29 and minNumleaf = 1.

Performance of BAS-BPNN and BAS-RF for FS
After being automatically tuned on the 70% training set, the hyperparameters were applied to the 30% test set to predict the FS of CSS. Figures 21 and 22 present the FS predictions from the BPNN and RF models against the actual strength for the training and test sets, respectively. The figures show that the predicted and actual results fitted well, as the red and blue lines largely coincided. Furthermore, the error bars located on the horizontal line showed that the BPNN and RF algorithms achieved similar accuracy in FS prediction.
Detailed RMSE and R values are illustrated in Figure 23, where subfigure (a) depicts the BPNN model and (b) depicts the RF model. The RMSE values ranged from 0.0841 to 0.1583, indicating that the BPNN and RF models estimated the strength accurately. On the training set, the RF algorithm achieved the highest R and the lowest RMSE. However, its test set gave the worst values, which indicated that the RF model had a higher risk of overfitting than BPNN.

Comparison of BPNN, RF, LR, MLR, and KNN
For FS prediction, MLR demonstrated high accuracy, as evidenced by the narrow interquartile range in Figure 24a. However, BPNN exhibited better overall reliability because of its fewer outliers and lower median value. Figure 24b integrates R, RMSE, and standard deviation into polar coordinates and leads to the same conclusion, since the BPNN point lies closest to the actual FS result. In addition, based on the evaluation metrics listed in Table 4, BAS-BPNN was also considered the most effective algorithm, with the smallest error and the best degree of fit.

Optimal Mixture Design
The RF algorithm quantified the importance of each variable, as depicted in Figure 25, which helped in proposing the optimum mixture design. The resulting importance score for each component was consistent with the trends obtained from the laboratory tests. The water and soil proportions exhibited no influence on the mechanical properties, owing to their constant dosing level in all specimens. C&D waste and sodium sulfate had a similar influence. It is noted that cement content, curing age, and fiber content had the greatest effect on CSS mechanical strength. Combined with the UCS and FS results listed in Appendix A, specimens prepared with 30% cement, 20% C&D waste, 4% polypropylene fiber, and 0.8% sodium sulfate were considered the best performing. This conclusion follows from the high ranking of their 28-day UCS and FS performance among all mixture designs.

Conclusions
In this research, the inclusion effects of Portland cement, construction and demolition (C&D) waste, polypropylene fiber, and sodium sulfate on the mechanical properties were assessed through laboratory tests. However, the increase in compressive properties is not significant.
(4) Higher levels of sodium sulfate increase the UCS and FS of the cement soil by up to 59.61% and 69.96%, respectively. However, 0.4% sodium sulfate fails to change the properties regularly, with changes ranging from −14% to 32.59%.
(5) The influence of each variable on CSS performance is ranked in descending order as: Portland cement, polypropylene fiber, C&D waste, sodium sulfate. The mixture design of 30% cement, 20% C&D waste, 4% fiber, and 0.8% sodium sulfate is considered the best-performing combination.
(6) RF and BPNN achieved the most accurate predictions for UCS and FS, respectively. Baseline models are generally inferior to machine learning models with tuned hyperparameters in mechanical strength prediction.
The research output from this article could lead to a wider application of CSS as an engineering material. Moreover, the concluded enhancement can be treated as a baseline model. Future research can extend the experiments to explore other properties such as slump, or to consider alternative aggregate ratios. Meanwhile, RF and BPNN can be employed to predict whether the designed proportion will achieve the mechanical strength requirements or to optimize the proportioning for a given strength.