Modelling the Effects of Nanomaterial Addition on the Permeability of the Compacted Clay Soil Using Machine Learning-Based Flow Resistance Analysis

Impermeable base layers that are made of materials with low permeability, such as clay soil, are necessary to prevent leachate in landfills from harming the environment. However, the permeability of the clay soil changes over time. Therefore, to minimize the risk, the permeability-related characteristics of the base layers must be improved. This study serves this purpose by experimentally investigating the effects of nanomaterial addition (aluminum oxide, iron oxide) into kaolin samples. The samples are prepared by applying standard compaction, and the permeability of each soil sample is experimentally investigated by passing leachate through the reactors in which the samples are placed. Flow Resistance (FR) analysis is then conducted, and the obtained results show that the Al additives are more successful than the Fe additives in reducing leachate permeability. In addition, the concentration values of some polluting parameters (Chemical Oxygen Demand (COD), Total Kjeldahl Nitrogen (TKN), and Total Phosphorus (TP)) at the inlet and outlet of the reactors are analyzed. Three different models (Artificial Neural Networks (ANN), Multiple Linear Regression (MLR), Support Vector Machine (SVM)) are applied to the data obtained from the experimental study. The results show that the polluting parameters produce high FR regression similarity rates (>75%), that the TKN, TP, and COD features are highly correlated with the FR value (>60%), and that the most successful method is the SVM model.


Introduction
The disposal of solid wastes by landfill is a more widely used disposal method compared to others due to its economic advantages and simplicity [1]. With the effect of water in the composition of the wastes, precipitation, and physical, chemical, and biological reactions, leachate occurs in landfill areas [2]. Leachate is wastewater with a high contaminant content that contains organic substances, monovalent and multivalent ions, microorganisms, and heavy metals [3]. The type of waste in the landfill, waste composition, age of the landfill and climatic conditions are all factors that affect the chemical composition of the leachate [4]. In typical leachate, the chemical oxygen demand (COD), total phosphorus (TP) concentration, and organic nitrogen concentration vary in the range of 140-152,000 mg/L, 0.1-30 mg/L, and 14-2500 mg/L, respectively [5][6][7].
Liners with low hydraulic conductivity (k ≤ 10⁻⁷ cm/s) are used to prevent leachate from reaching the groundwater in landfill areas. Hydraulic conductivity depends on many factors such as the void ratio, medium porosity, pore size, soil type, liquid type, and contaminant concentration [8,9]. As the leachate passes through the clayey soil, it causes changes in the structure and properties of the clay, resulting in an increase or decrease in hydraulic conductivity. The increase in ion concentration [10,11], ionic valence [12,13], and organic matter concentration [14,15] in the leachate contributes to increasing the hydraulic conductivity, while the increase in the number of microorganisms [16] and the suspended solid concentration [17] contributes to the decrease in hydraulic conductivity. The increase in hydraulic conductivity as a result of the leachate-liner interaction is caused by factors such as the thinning of the diffuse double layer (DDL) of the clay, agglomeration, flocculation of the clay particles, the formation of cracks in the soil structure, and the increase in the void ratio [12,14,18,19].
The low permeability of the liner, which can be provided with natural materials such as clay, is of great importance in terms of not polluting the groundwater with the leachate of the landfill area. However, geomembranes are used alongside the low permeability liner when the landfill area is inclined. The use of clay in landfills is more advantageous and preferable due to the cost of geomembranes and the risk of puncture. Therefore, it is important to improve the permeability of clay with nanomaterials [20,21].
The properties of the clays used in the landfill can be modified by adding nanomaterials to the clay. The high specific surface area of nanoparticles can cause a high surface energy between the particles, which can greatly change the engineering properties as well as the physical and chemical properties of the soil [22][23][24]. For instance, Naval et al. state that the addition of nano MgO and nano Al₂O₃ into kaolinite clay resulted in a decrease in the liquid limit, plastic limit, plasticity index, and swelling potential of the soil [25]. In addition, the presence of nanoparticles in the clay, and the increase in nanoparticle concentration, change the bonding and adhesion between the clay particles, increase the connection between the pores and cause the number and size of the pores to decrease [23,24,26]. Increasing the soil integrity of the clayey soil and decreasing the porosity through nanomaterials cause a decrease in hydraulic conductivity. Studies in the literature reveal that nanomaterial type [27][28][29], concentration [28,29], and size [30] affect permeability. Taipodia et al. report that the permeability coefficient decreased from 2.41 × 10⁻⁵ cm/s to 1.25 × 10⁻⁵ cm/s as a result of filling the soil voids with the addition of nano CaO to clayey soil [27]. Ng and Coo mix two different nanomaterials, gamma-aluminum oxide (γ-Al₂O₃) and nano-copper oxide (CuO), with kaolin clay in different proportions. The initial hydraulic conductivity is decreased by 30% and 45%, respectively, with the addition of 2% nano γ-Al₂O₃ and 2% nano CuO [28]. Taha and Alsharef report that the hydraulic conductivity value of the pure soil sample decreased from 2.16 × 10⁻⁹ m/s to 9.46 × 10⁻¹⁰ m/s and 7.44 × 10⁻¹⁰ m/s, respectively, with the addition of multiwall carbon nanotube (CNT) and carbon nanofiber (CNF) [29]. Bahmani et al. investigate the effect of nanosilica particles of two different sizes, 15 nm and 80 nm, on the hydraulic conductivity of the soil. Nanosilica particles with the size of 15 nm reduced the hydraulic conductivity further. Bahmani et al. attribute this to the fact that the packing density of smaller-sized nanoparticles is higher than that of larger-sized nanoparticles [30].
There are many machine learning studies on permeability estimation in the literature. Ahangar-Asr et al. estimate permeability using Evolutionary Polynomial Regression; the obtained results are compared with those of Artificial Neural Networks and better results are obtained [31]. In Boroumand and Baziar's study, permeability estimation is made using ANN and comparative analyses are made with previous Artificial Neural Networks studies [32]. Günaydin et al. calculate the compaction parameters related to permeability using the Support Vector Machine model [33]. In Ozbeyaz and Soylemez's study, Support Vector Machine and decision trees are used to calculate compaction parameters, and the Support Vector Machine model is found to produce better results [34]. Singh et al. use Support Vector Machines, Multi Linear Regression, Genetic Programming, and Random Forest algorithms for permeability estimation, and the Support Vector Machine model is found to produce the best results [35]. Sebastian and Sindhu's study is intended to develop correlations between permeability and different soil properties using the Multi Linear Regression method [36]. Santisukkasaem et al.'s study aims to create a permeability-predicting ANN by comparing different ANN structures. The obtained results are compared with MLR and it is seen that ANN produced better results [37]. In another study, permeability, maximum dry density (MDD), and optimum moisture content are estimated using ANN. The obtained results are compared with statistical equations and it is seen that ANN is a suitable model [38]. Tizpa et al. present artificial neural network prediction models, which relate compaction characteristics, permeability, and soil shear strength to soil index properties. A comparison of the results demonstrates that the developed ANN models provide highly accurate predictions [39]. Finally, Mahdi and Holdich predict the permeability of loosely packed granular materials using ANN and MLR, and ANN is found to give the best results [40].
This study aims to improve the impermeability of the landfill base layer by adding different percentages of aluminum oxide and iron oxide as nanomaterials into the kaolin sample. Three different models (Artificial Neural Networks (ANN), Multiple Linear Regression (MLR), Support Vector Machine (SVM)) are applied to the data obtained from the experimental study. In addition, the Flow Resistance (FR) model is employed for the first time to approach this problem by analogy with the typical traffic flow phenomenon. Finally, the obtained data are evaluated and the drawn conclusions are discussed.

Experimental Setup and Tests
Fixed head permeability experimental methods are used in this study. The reactor has a volume of 15,700 cm³ and is made of plexiglass. The height of the clay soil layer in the reactor is 11 cm. Clay soil specimens are saturated under 0.3 bar pressure [41]. The experimental setup is visualized in Figure 1 [42].

Standard Compaction Test
In this study, the standard compaction test, which characterizes the relationship between the dry unit weight of the soils and the mold water content, is applied. The energy and compression type provided for a given clay volume is standard. First, the clay samples are dried, and then water is added gradually to achieve the desired moisture content. The samples are placed in the mold and compression is applied in three stages. At each stage, 25 drops are made from a height of 30 cm with a 2.5 kg hammer [42]. The samples are prepared at the optimum water content by adding nanomaterials separately into kaolin.

Material Properties
The properties of the leachate, taken from Kemerburgaz-Odayeri Landfill Area located on the European side of İstanbul, are as follows: pH: 8.5, COD: 5025 mg/L, TKN: 3365 mg/L, TP: 7.4 mg/L, and SS: 848 mg/L.
The kaolin consists of kaolinite, illite, and quartz. For more detailed chemical composition, the reader is encouraged to refer to the technical data sheet of Esan Eczacıbaşı Industrial Raw Materials Ind. and Trading Inc. Iron oxide nanopowder of 50-100 nm effective diameter, which is purchased from Sigma-Aldrich, and aluminum oxide nanopowder of 13 nm effective particle diameter are used as nanomaterial additives [42].

Artificial Neural Networks (ANN)
ANN is a modelling approach based on simulating the behavior of the human nervous system. It is an efficient tool for solving various problems thanks to properties such as non-linearity, information processing, and learning. Therefore, ANN is used in various fields, especially for classification, modelling, and prediction. To serve these fields, many types of ANN have been developed in the literature: Multi-Layer Perceptron networks (MLP), Hopfield networks, Cohen-Grossberg networks, Elman networks, bidirectional associative memory networks, etc. [43].
MLP, which is a popular network model and is also used in this study, is a network that consists of an input layer, an output layer, and one or more hidden layers [43]. In this study, there are three neurons in the input layer as COD, TKN, and TP; ten neurons in the hidden layer; and one neuron in the output layer as kaolin (Figure 2).
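To make the 3-10-1 layout concrete, the following is a minimal sketch of a forward pass through such a network in plain Python. The text does not state the activation function, the weight initialization, or the training procedure, so the tanh hidden activation, the random untrained weights, and the scaled input values used here are all assumptions for illustration only.

```python
import math
import random


def init_mlp(n_in=3, n_hidden=10, n_out=1, seed=0):
    """Create random (untrained) weights for a 3-10-1 MLP. Illustrative only."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [0.0] * n_out
    return w1, b1, w2, b2


def forward(params, x):
    """One forward pass: inputs -> tanh hidden layer (assumed) -> linear output."""
    w1, b1, w2, b2 = params
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]


def count_params(params):
    """Total number of trainable weights and biases."""
    w1, b1, w2, b2 = params
    return sum(len(r) for r in w1) + len(b1) + sum(len(r) for r in w2) + len(b2)


params = init_mlp()
# Hypothetical COD, TKN, TP values, assumed to be scaled to the 0-1 range.
output = forward(params, [0.6, 0.4, 0.2])
n_params = count_params(params)
```

With three inputs, ten hidden neurons, and one output, the network has 3 × 10 + 10 + 10 × 1 + 1 = 51 trainable parameters, which is worth keeping in mind given the small dataset described later.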

Multiple Linear Regression (MLR)
Regression is an analysis method used to measure the relationship between two or more quantitative variables, and it is widely used for analyzing multifactor data. It is useful for expressing the relationship between one or more input variables and an output variable. Multiple Linear Regression (MLR) is a type of regression in which there is more than one input parameter. MLR describes the relationship between the independent variables and the dependent variable that they affect. The formulation of the general MLR model is shown below:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \tag{1}$$
where $y$ is the output, $x_i$ ($i = 1, 2, \ldots, k$) are the input values, $\beta_i$ ($i = 1, 2, \ldots, k$) are the coefficients for the corresponding variables, $\beta_0$ is the constant term, and $\varepsilon$ is the associated error term [44].
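As an illustration of how the coefficients of such a model can be estimated, the sketch below fits an MLR by ordinary least squares using the normal equations. The source does not say which solver was used, so this is only one possible approach; the function names and the synthetic data are hypothetical.

```python
def fit_mlr(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend 1 for the constant term b0
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("inputs are collinear; coefficients are not unique")
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * c for v, c in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def predict_mlr(beta, x):
    """Evaluate the fitted model for one input vector x."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


# Synthetic, noise-free data generated from y = 1 + 2*x1 - 3*x2 + 0.5*x3.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1], [2, 1, 3]]
y = [1 + 2 * x1 - 3 * x2 + 0.5 * x3 for x1, x2, x3 in X]
beta = fit_mlr(X, y)
```

In the setting of this study, the three columns would correspond to COD, TKN, and TP, and the target to the FR value.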

Support Vector Machine (SVM)
SVM is a popular supervised learning algorithm used for classification and regression problems thanks to its strength in the non-linear classification process and the ability to solve problems without getting stuck in a local minimum [45]. When using SVM in regression problems, an alternative loss function is given. In this study, the quadratic loss function is used for regression analysis [46].

Flow Resistance (FR)
In the modelling phase, to examine the effects of clay samples with different percentages of nanomaterial content on pollutant permeability, a new model is developed by analogy with classical traffic flow models based on basic parameters, such as the amount of polluted water passing through the clay at different time intervals obtained from the experiment, height, volume, surface area. The development stages of the model are described below.
In the first stage of the model, the mass flow values of the leachate passing through the clay at different time intervals are calculated (Equation (2)). Flow rate information for different clay mixtures is shown in Table 1.
$$\text{Mass Flow Rate (MFR)} = \frac{m_{\text{water}}}{\Delta t} \tag{2}$$
Then, the flow density (s/cm) for the polluted water passing through the clay is calculated for each time interval from which the samples are taken (Equation (3)).
$$\text{Flow Density (FD)} = \frac{\Delta t}{L} \tag{3}$$
Here, $\Delta t$ is the time elapsed between the two measurements, and $L$ is the thickness of the clay ($L = 11$ cm). Since the samples are taken at the same time for all clay types and the thicknesses are equal, the flow density values are the same for all clays (Table 2). In the next step, the flow traffic rate (g/cm) at the time the sample is taken is calculated by multiplying the mass flow values by the flow density (Equation (4)).
$$\text{Flow Traffic Rate (FTR)} = \text{MFR} \times \text{FD} \tag{4}$$
In the last step, the flow traffic rate (FTR) is multiplied by the time difference between the two samples to determine the average flow rate (AFR) (Equation (5)).
$$\text{Average Flow Rate (AFR)} = \text{FTR} \times \Delta t \tag{5}$$
Afterwards, the average flow rate is multiplied by the clay permeability, denoted here by $k$, to determine how much the soil resists the flow (Equation (6)).
$$\text{Flow Resistance (FR)} = \text{AFR} \times k \tag{6}$$
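The chain of calculations described above can be sketched as a single function. This is a minimal illustration that follows the verbal description of Equations (2)-(6); the function name, the dictionary keys, and the example measurement (220 g of leachate in one day, permeability of 1e-7 cm/s) are hypothetical and are not taken from the experimental data.

```python
CLAY_THICKNESS_CM = 11.0  # L, clay layer thickness given in the text


def flow_resistance(mass_g, dt_s, permeability_cm_s, thickness_cm=CLAY_THICKNESS_CM):
    """Compute the intermediate quantities and FR for one sampling interval."""
    mfr = mass_g / dt_s           # Eq. (2): mass flow rate, g/s
    fd = dt_s / thickness_cm      # Eq. (3): flow density, s/cm
    ftr = mfr * fd                # Eq. (4): flow traffic rate, g/cm
    afr = ftr * dt_s              # Eq. (5): FTR multiplied by the time difference
    fr = afr * permeability_cm_s  # Eq. (6): AFR multiplied by the permeability
    return {"MFR": mfr, "FD": fd, "FTR": ftr, "AFR": afr, "FR": fr}


# Hypothetical interval: 220 g collected over one day (86,400 s).
result = flow_resistance(mass_g=220.0, dt_s=86400.0, permeability_cm_s=1e-7)
```

Note that MFR × FD reduces algebraically to the collected mass divided by the clay thickness, so the flow traffic rate for a given interval depends only on the mass collected in that interval and on L.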

Regression Similarity Approach
The other stage of the modelling is to examine how well machine learning methods predict the FR values of pollutants at different time intervals. Regression methods are used instead of classification methods in order to measure the effects of the pollutants, taken as inputs, on the FR value. As the FR values are real-valued variables that do not contain any class information (0 or 1), regression methods enable the determination of both the current and future FR values of the system. If the problem is linear, linear regression methods should be used. Otherwise, non-linear regression methods can be selected for the analysis (Equation (7)).
$$Y_i = \beta_0 + \beta_1 X_i^{n} + \varepsilon_i \tag{7}$$
where $Y_i$ is the estimated regression value of FR, $\beta_0$ is the intercept on the y-axis, $\beta_1$ is the slope of the regression graph, $X_i$ is the pollutant value, $\varepsilon_i$ is the random error term, and $n$ is the degree of the regression line. For $n = 1$, the solution is linear; for $n > 1$, the problem can only be solved non-linearly.
We observed that the literature and many different studies, summarized in the Introduction section, generally focus on three different regression methods for the two types of problems mentioned above. Those are SVM, ANN, and MLR methods.
MLR is chosen as the linear regression method rather than the classical method. The main reason for choosing the MLR method is that the number of pollutant variables is more than one (COD, TKN, and TP). Therefore, classical linear regression is not a suitable method for the problem.
SVM and ANN are two commonly used techniques for non-linear regression. Some studies state that SVM is better than ANN, while others find the opposite. These performance differences are related to the problem type, dataset size, and dataset balance. Since the FR model we propose is the first in the literature, there is no comparable dataset and therefore no definite information about which method would give better performance. However, as discussed in the Regression and Correlation Analysis section, both methods show that the dataset has meaningful information and is highly correlated with the actual FR values.
Thirty-three randomly selected soil samples that contain the COD, TKN, and TP of the leachate at different time intervals are divided into train and test sets using 5-fold cross-validation to generate FR values for each soil. Cross-validation is required to select the different time intervals. The data are separated into train and test sets in order to test the trained regression model without overfitting. What is required here is the similarity between the regression and the actual FR values. In the regression similarity approach, the root mean square error (RMSE) of the two values is subtracted from 1, which converts the error rate into a similarity ratio (Equation (8)).
$$\text{Similarity} = 1 - \text{RMSE} = 1 - \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2} \tag{8}$$
where $n$ is the number of test samples, $y_i$ is the observed value for the $i$th observation, and $\hat{y}_i$ is the predicted value. In addition, $R^2$ similarity analysis, which is frequently used in regression analysis, is also applied in our study. The most general definition of the coefficient of determination ($R^2$) is:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2} \tag{9}$$
where $y_i$ is the observed value for the $i$th observation, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean value of the response variable. We observed that COD, TKN, and TP, as attributes, have different effects on the similarity ratio (in the range of 0 to 1, from non-similar to similar) between the value produced by regression and the actual FR values. This procedure is repeated 100 times and the average of the similarity ratios is taken as the basis for the analysis.
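The two similarity measures and the 5-fold split can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the fold assignment by shuffling and striding is an assumed scheme, and the example values are made up. The 1 − RMSE measure stays within 0-1 only when the FR values are normalized to the 0-1 range, as they are in this study.

```python
import math
import random


def similarity(y_true, y_pred):
    """Regression similarity: 1 - RMSE between observed and predicted values."""
    n = len(y_true)
    mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n
    return 1.0 - math.sqrt(mse)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k disjoint test folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Hypothetical normalized FR values (observed vs. predicted).
y_obs = [0.1, 0.5, 0.9]
y_hat = [0.2, 0.5, 0.8]
sim = similarity(y_obs, y_hat)
r2 = r_squared(y_obs, y_hat)

# 33 samples split into 5 folds, matching the dataset size in the text.
folds = kfold_indices(33, k=5)
```

Repeating the split with different seeds and averaging the resulting similarity ratios would correspond to the 100 repetitions described above.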

Compaction Test Results
The laboratory compaction test experimentally investigates the change in dry unit weight with water content. As a result, the level of packing achieved by the applied compaction energy is also determined. The relationship between the water content and the dry unit weight is given in Figure 3. As a result of the standard compaction test, the optimum water content ($w_{\text{opt}}$) of the soil is found to be 30% and the maximum dry unit weight ($\gamma_{d\max}$) is 12.4 kN/m³ [40].

Flow Resistance Analysis
In the Flow Resistance (FR) analysis, FR values of different clay samples are calculated by using Equation (6). The calculated FR values are normalized to the 0-1 range and shown in Figure 4. In light of these figures, the time taken to reach 80% FR for the clay samples with and without nanomaterials is given in Table 3. According to the results obtained, the clay samples with nanomaterials reach the maximum FR much faster than the clay samples without additives and prevent leachate permeation much more quickly. While kaolin clay without additives reaches this value on the 139th day, the sample with 2% Fe reaches it on the 74th day, the sample with 4% Fe on the 42nd day, the sample with 2% Al on the 39th day, and the sample with 4% Al on the 11th day. The faster inhibition of leachate permeation after the addition of iron and aluminum nanomaterials to kaolin can be attributed to the fact that nanomaterials fill the pores in the clay soil structure and reduce porosity [47,48]. According to these results, we can state that, in the selection of nanomaterial additives for the clay, the Al additives are more successful than the Fe additives in reducing leachate permeability when compared to the pure kaolin clay sample. The smaller effective particle diameter of aluminum oxide compared to iron oxide means that aluminum oxide has a higher specific surface area. Nanomaterials with a higher surface area absorb higher amounts of water and cause a greater increase in the water accumulation capacity of the soil [24,49]. The fact that the maximum FR value is reached in a shorter time when aluminum oxide is added to kaolin than when iron oxide is added, and that the permeability is lower with aluminum oxide, can be explained by the relationship between the surface area of the nanomaterial and its water accumulation capacity.
Another point is that, as the concentration of iron oxide and aluminum oxide added to kaolin increases, the time to reach the maximum FR shortens and the permeability values decrease. This result shows that the nanomaterial concentration changes the flow properties of the soil.
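The "time to reach 80% FR" comparison can be illustrated with a short sketch. The text states only that FR values are normalized to the 0-1 range; min-max normalization is assumed here, and the sampling days and FR values below are invented for illustration and are not the measured data.

```python
def normalize_01(values):
    """Min-max normalization to the 0-1 range (assumed normalization scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def first_day_reaching(days, fr_values, threshold=0.8):
    """Return the first sampling day on which normalized FR >= threshold."""
    for day, fr in zip(days, normalize_01(fr_values)):
        if fr >= threshold:
            return day
    return None  # threshold never reached within the sampled period


# Hypothetical sampling days and raw FR values for one clay sample.
days = [1, 5, 11, 20, 39]
raw_fr = [0.0, 2.0, 8.5, 9.5, 10.0]
day_80 = first_day_reaching(days, raw_fr)
```

Applying the same function to each clay mixture gives one day value per sample, which is the kind of comparison summarized in Table 3.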

Regression and Correlation Analysis
The question underlying the regression analysis is how COD, TKN, and TP of leachate affect the FR value. Thus, the similarity between actual and predicted FR values should be calculated by using COD, TKN, and TP values.
Similarities between the measured FR and the FR predicted using only leachate values give some information about the effects of leachate on FR for the different nanomaterial clay samples. For this reason, the COD, TKN, and TP values are each selected as attributes for the three regression methods. Then, regression similarities are examined between the actual and predicted FR values. This process is repeated 100 times with randomly selected samples each time. The similarity (1 − RMSE), R² regression similarity, and correlation (ρ) analysis results are shown in the tables given below (Tables 4-7). When the regression values of the leachate parameters are examined, we see that the features produce high FR regression similarity rates (>75%). Furthermore, the correlation analysis shows that the TKN, TP, and COD parameters are highly correlated with the FR value (>60%).
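For readers who want to reproduce the correlation step, the sketch below computes a correlation coefficient between one pollutant series and the FR series. The text does not say which correlation coefficient ρ denotes, so the Pearson coefficient is assumed here; the function name and the sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical outlet TKN concentrations (mg/L) and normalized FR values.
tkn = [3100.0, 2800.0, 2500.0, 2300.0, 2200.0]
fr = [0.10, 0.35, 0.60, 0.85, 1.00]
rho = pearson(tkn, fr)
```

A strongly negative value would be as informative as a strongly positive one here, so the magnitude of ρ is what would be compared against the >60% level reported above.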
The FR values produced by Multiple Linear Regression from the features and the actual FR values are shown below for three different clay samples (Normal, 4% Fe, 4% Al) (Figures 5-7).
According to the results obtained, the regression performances and similarities decrease from the sample without nanomaterial additive to the sample with 4% Al. This situation is examined with three different regression methods. In the pure kaolin sample with the highest permeability, the regression similarity performance is 0.95. However, for the 4% Al added clay sample with the lowest permeability, the regression success rate decreases to the range of 78-80% and the similarity value, R², decreases to 0.32, away from the safe range. We can say that this result facilitates the predictability of the FR value, as the pollutants in the leachate regularly flow from the pure clay and form a certain flow pattern.

Conclusions
This study demonstrates a new machine learning-based flow resistance (FR) method for modelling the effects of nanomaterial additions (different percentages of aluminum oxide and iron oxide) on the permeability of compacted clay soil. The new model is developed by analogy with classical traffic flow models based on basic parameters such as the amount of polluted water passing through the clay at different time intervals obtained from the experiment, height, volume, and surface area. Three different machine learning regression algorithms (SVM, ANN, and MLR) are found to be very reliable and promising techniques for revealing the permeability performance of the different clay samples represented by the new model. In terms of flow resistance analysis, we found that the clay samples with nanomaterials, especially those with 4% Al, reach the maximum FR ratio much faster than those without and prevent leachate permeation earlier. Regression and correlation analysis reveals how the COD, TKN, and TP pollutants affect the FR value of the different clay samples. We found that this analysis facilitates the predictability of the FR value, as the pollutants in the leachate regularly flow from the pure clay and form a certain flow pattern. The obtained results validate the new model, which can be used as a reliable and standard model in many different permeability analysis studies.