# Use of Machine Learning Techniques in Soil Classification


## Abstract


## 1. Introduction

Díaz et al. [31] proposed an influence factor (I_a) that considers the effect of a finite elastic half-space limited by inclined bedrock under a foundation. The results obtained through the application of artificial neural networks (ANNs) showed a notable improvement in the predicted influence-factor values in comparison with those of existing analytical equations. Puri et al. [12] suggested that most AI models are reliable in the prediction of missing data. Zhang et al. [32] used artificial intelligence methods to predict the soil compression modulus, an important parameter that governs the compressive deformation of geotechnical systems such as foundations and is difficult and costly to determine. The authors suggested that, by applying ML algorithms, a system can learn the relationship between input and output on its own. A comparison of the performance of empirical formulas and the proposed ML method for predicting foundation settlement indicated the rationality of the proposed ML model. Momeni et al. [33] used machine learning techniques to estimate pile-bearing capacity (PBC). They found that a Gaussian process regression (GPR)-based model is capable of predicting the PBC and outperforms the GA-based ANN model, so the GPR can be utilized as a practical tool for pile-bearing capacity estimation. Nguyen et al. [34] examined the effect of data splitting on the performance of machine learning methods in predicting the shear strength of soil through training/test set validation, using 70% of the dataset for training and 30% for testing. The results of this study showed an effective way to select the best ML model and appropriate dataset ratios to accurately estimate soil shear strength, which will assist in the design and engineering phases of construction projects. Martinelli and Gasser [35] applied machine learning models for predicting soil particle size fractions.
In their study, soil pH, cation exchange capacity, and elements extracted with Mehlich-3 from 8364 soil samples taken from different parts of Canada were used as covariates for the estimation of texture components. The researchers reported that multiple linear regression and neural network models had the weakest prediction performance, while RF, KNN, and XGBoost performed best. Nguyen et al. [4] suggested a new classification method for determining soil classes based on support vector classification (SVC), multilayer perceptron (MLP), and random forest (RF) models. The results indicated that all three models performed well, with the SVC model being the most accurate in classifying soils. Tran [36] used a single machine learning algorithm to predict and investigate the permeability coefficient of soil. The author showed that SHapley Additive exPlanations (SHAP) and one-dimensional Partial Dependence Plots (PDP 1D), supported by a reliable gradient boosting (GB) model, gave the best insight into the factors driving the permeability coefficient.

## 2. Materials and Methods

#### 2.1. Exploratory Data Analysis

#### 2.2. Data Preprocessing

#### 2.2.1. Missing Value Imputation

#### 2.2.2. Dealing with Imbalanced Data

#### 2.3. Classification with Machine Learning

**Foundational Methods:**

- **Decision trees/CART:** A decision tree is a graph that represents choices and their results in a tree shape [55]. Decision trees are applied in many fields because of their simple analysis approach and high success rates in predicting various data forms [53]. Classification and regression trees (CART) are one of the decision tree algorithms and are the default implementation used in the decision tree classifier of the Scikit-learn package.
- **NB:** The Naive Bayes algorithm applies Bayes' theorem under the assumption of predictor independence; that is, it assumes the features within a class are unrelated to each other [53].
- **SVM:** Support vector machines [43] can be used for classification and regression problems [55]. Their main idea is to find a hyperplane in n-dimensional space that distinctly separates the data points of different classes [56].
- **KNN:** The K-nearest neighbor algorithm is easy to implement and can be used for both classification and regression problems. It predicts the class of a new data point from its K nearest data points.
- **MLP:** The multi-layer perceptron is a popular artificial neural network in which multiple layers of neurons are used to predict a value or a class [57].
- **SGD:** Stochastic gradient descent implements gradient descent by randomly picking one data point from the whole dataset at each iteration to reduce computation time [58].
- **LDA:** Linear discriminant analysis is a dimensionality reduction technique [59] that can separate different classes by projecting the features from a higher-dimensional space into a lower-dimensional one.
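As a minimal sketch, the seven foundational classifiers can all be instantiated from Scikit-learn with a common interface; the hyperparameter values below are illustrative defaults, not the settings used in the study:

```python
# Illustrative instantiation of the seven foundational classifiers.
# Hyperparameters are scikit-learn defaults (or near-defaults), not the paper's.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    "CART": DecisionTreeClassifier(random_state=0),
    "NB": GaussianNB(),
    "SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "MLP": MLPClassifier(max_iter=1000, random_state=0),
    "SGD": SGDClassifier(random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
}

# Synthetic 5-feature data stands in for the soil dataset.
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           random_state=0)
for name, clf in classifiers.items():
    print(name, round(clf.fit(X, y).score(X, y), 3))
```

Because every model exposes the same `fit`/`predict`/`score` interface, swapping classifiers in the comparison requires no other code changes.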

**Ensemble Learning Methods:** Two key approaches to ensemble learning are boosting and bagging. Boosting converts multiple weak models (weak learners) into a single composite model (a strong learner) [60]. The two main boosting techniques are adaptive boosting and gradient boosting. Gradient boosting treats boosting as a numerical optimization problem in which the objective is to minimize the loss function of the model by adding weak learners using the gradient descent algorithm [61]. Bagging is an ensemble method that trains each classifier on a random sample of the data and aggregates their predictions [62]. **RF:** Random forest (RF) is one of the most widely used bagging methods and is used for solving problems in both regression and classification. Random forest has two key parameters: the number of trees and the number of randomly selected predictors considered at each node.

#### 2.4. Measuring the Classification Performance: The Metrics

## 3. Results

#### 3.1. Impact of Missing Data Imputation

- (i) A total of 97 rows with missing values were removed from the original dataset, and the resulting 708-row dataset was named the pre-imputation-acc-test dataset.
- (ii) Random sampling (×1000) was then applied to select 708 of the 805 rows of the imputed dataset, and this dataset was named the post-imputation-acc-test dataset.
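The repeated-sampling step can be sketched as below. The classifier and cross-validation scheme (a CART tree, 5-fold CV) are illustrative placeholders, not necessarily the ones used in the study:

```python
# Sketch of the post-imputation accuracy test: repeatedly draw a random
# subsample of the imputed data (708 of 805 rows in the paper), score a
# classifier on each draw, and average the results.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def post_imputation_accuracy(X, y, n_repeats=1000, sample_size=708, seed=0):
    """Mean CV accuracy over repeated random subsamples of the imputed data."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_repeats):
        # Draw `sample_size` rows without replacement.
        idx = rng.choice(len(X), size=sample_size, replace=False)
        fold_scores = cross_val_score(
            DecisionTreeClassifier(random_state=0), X[idx], y[idx], cv=5)
        scores.append(fold_scores.mean())
    return float(np.mean(scores))
```

Averaging over many draws reduces the chance that a single lucky or unlucky subsample drives the before/after comparison.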

#### 3.2. Impact of Data Balancing

#### 3.3. Comparison of Classifier Performance

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Robertson, P.K. Cone penetration test (CPT)-based soil behaviour type (SBT) classification system—An update. Can. Geotech. J. 2016, 53, 1910–1927.
2. Das, B.M.; Sivakugan, N. Fundamentals of Geotechnical Engineering; Cengage Learning: Boston, MA, USA, 2016.
3. Reale, C.; Gavin, K.; Librić, L.; Jurić-Kaćunić, D. Automatic classification of fine-grained soils using CPT measurements and Artificial Neural Networks. Adv. Eng. Inform. 2018, 36, 207–215.
4. Nguyen, M.D.; Costache, R.; Sy, A.H.; Ahmadzadeh, H.; Van Le, H.; Prakash, I.; Pham, B.T. Novel approach for soil classification using machine learning methods. Bull. Eng. Geol. Environ. 2022, 81, 468.
5. Atterberg, A. Die Plastizität der Tone. Int. Mitt. Bodenkd. 1911, 1, 10–43. (In German)
6. Das, B.M. Advanced Soil Mechanics; Taylor & Francis: London, UK, 2019.
7. Das, B.M. Principles of Geotechnical Engineering, 7th ed.; Cengage Learning: Boston, MA, USA, 2009.
8. Casagrande, A. Research on the Atterberg Limits of Soil. Public Roads 1932, 13, 121–136.
9. Konert, M.; Vandenberghe, J.E.F. Comparison of laser grain size analysis with pipette and sieve analysis: A solution for the underestimation of the clay fraction. Sedimentology 1997, 44, 523–535.
10. Bartley, P.C.; Fonteno, W.C.; Jackson, B.E. A Review and Analysis of Horticultural Substrate Characterization by Sieve Analysis. HortScience 2022, 57, 715–725.
11. Moreno-Maroto, J.M.; Alonso-Azcárate, J.; O'Kelly, B.C. Review and critical examination of fine-grained soil classification systems based on plasticity. Appl. Clay Sci. 2021, 200, 105955.
12. Puri, N.; Prasad, H.D.; Jain, A. Prediction of geotechnical parameters using machine learning techniques. Procedia Comput. Sci. 2018, 125, 509–517.
13. Kausel, E. Early history of soil–structure interaction. Soil Dyn. Earthq. Eng. 2010, 30, 822–832.
14. Karki, P.; Pyakurel, S.; Utkarsh, K. Seismic performance evaluation of masonry infill RC frame considering soil-structure interaction. Innov. Infrastruct. Solut. 2023, 8, 5.
15. Mangalathu, S.; Jeon, J.S. Classification of failure mode and prediction of shear strength for reinforced concrete beam-column joints using machine learning techniques. Eng. Struct. 2018, 160, 85–94.
16. Feng, D.C.; Liu, Z.T.; Wang, X.D.; Jiang, Z.M.; Liang, S.X. Failure mode classification and bearing capacity prediction for reinforced concrete columns based on ensemble machine learning algorithm. Adv. Eng. Inform. 2020, 45, 101126.
17. Siam, A.; Ezzeldin, M.; El-Dakhakhni, W. Machine learning algorithms for structural performance classifications and predictions: Application to reinforced masonry shear walls. In Structures; Elsevier: Amsterdam, The Netherlands, 2019; Volume 22, pp. 252–265.
18. Yucel, M.; Bekdaş, G.; Nigdeli, S.M.; Sevgen, S. Estimation of optimum tuned mass damper parameters via machine learning. J. Build. Eng. 2019, 26, 100847.
19. Yucel, M.; Nigdeli, S.M.; Bekdaş, G. Machine Learning-Based Model for Optimum Design of TMDs by Using Artificial Neural Networks. In Optimization of Tuned Mass Dampers; Springer: Cham, Switzerland, 2022; pp. 175–187.
20. Bekdaş, G.; Yucel, M.; Nigdeli, S.M. Estimation of optimum design of structural systems via machine learning. Front. Struct. Civ. Eng. 2021, 15, 1441–1452.
21. Yucel, M.; Kayabekir, A.E.; Nigdeli, S.M.; Bekdaş, G. Optimum design of carbon fiber-reinforced polymer (CFRP) beams for shear capacity via machine learning methods: Optimum prediction methods on advance ensemble algorithms–bagging combinations. In Research Anthology on Machine Learning Techniques, Methods, and Applications; IGI Global: Hershey, PA, USA, 2022; pp. 308–326.
22. Kayabekir, A.E.; Yucel, M.; Bekdaş, G.; Nigdeli, S.M. An artificial neural network model for prediction of optimum amount of carbon fiber reinforced polymer for shear capacity improvement of beams. In Proceedings of the 12th HSTAM International Congress on Mechanics, Thessaloniki, Greece, 22–25 September 2019; pp. 22–25.
23. Cakiroglu, C.; Islam, K.; Bekdaş, G.; Isikdag, U.; Mangalathu, S. Explainable machine learning models for predicting the axial compression capacity of concrete filled steel tubular columns. Constr. Build. Mater. 2022, 356, 129227.
24. Sarothi, S.Z.; Ahmed, K.S.; Khan, N.I.; Ahmed, A.; Nehdi, M.L. Predicting bearing capacity of double shear bolted connections using machine learning. Eng. Struct. 2022, 251, 113497.
25. Cakiroglu, C.; Bekdaş, G.; Kim, S.; Geem, Z.W. Explainable Ensemble Learning Models for the Rheological Properties of Self-Compacting Concrete. Sustainability 2022, 14, 14640.
26. Bekdaş, G.; Cakiroglu, C.; Islam, K.; Kim, S.; Geem, Z.W. Optimum Design of Cylindrical Walls Using Ensemble Learning Methods. Appl. Sci. 2022, 12, 2165.
27. Isik, F.; Ozden, G. Estimating compaction parameters of fine- and coarse-grained soils by means of artificial neural networks. Environ. Earth Sci. 2013, 69, 2287–2297.
28. Momeni, E.; Nazir, R.; Armaghani, D.J.; Maizir, H. Prediction of pile bearing capacity using a hybrid genetic algorithm-based ANN. Measurement 2014, 57, 122–131.
29. Gambill, D.R.; Wall, W.A.; Fulton, A.J.; Howard, H.R. Predicting USCS soil classification from soil property variables using Random Forest. J. Terramech. 2016, 65, 85–92.
30. Pham, B.T.; Hoang, T.A.; Nguyen, D.M.; Bui, D.T. Prediction of shear strength of soft soil using machine learning methods. CATENA 2018, 166, 181–191.
31. Díaz, E.; Brotons, V.; Tomás, R. Use of artificial neural networks to predict 3-D elastic settlement of foundations on soils with inclined bedrock. Soils Found. 2018, 58, 1414–1422.
32. Zhang, D.M.; Zhang, J.Z.; Huang, H.W.; Qi, C.C.; Chang, C.Y. Machine learning-based prediction of soil compression modulus with application of 1D settlement. J. Zhejiang Univ.-Sci. A 2020, 21, 430–444.
33. Momeni, E.; Dowlatshahi, M.B.; Omidinasab, F.; Maizir, H.; Armaghani, D.J. Gaussian process regression technique to estimate the pile bearing capacity. Arab. J. Sci. Eng. 2020, 45, 8255–8267.
34. Nguyen, Q.H.; Ly, H.B.; Ho, L.S.; Al-Ansari, N.; Le, H.V.; Tran, V.Q.; Prakash, I.; Pham, B.T. Influence of data splitting on performance of machine learning models in prediction of shear strength of soil. Math. Probl. Eng. 2021, 2021, 4832864.
35. Martinelli, G.; Gasser, M.O. Machine learning models for predicting soil particle size fractions from routine soil analyses in Quebec. Soil Sci. Soc. Am. J. 2022, 86, 1509–1522.
36. Tran, V.Q. Predicting and Investigating the Permeability Coefficient of Soil with Aided Single Machine Learning Algorithm. Complexity 2022, 2022, 8089428.
37. Bressert, E. SciPy and NumPy; O'Reilly: Beijing, China, 2013; pp. 1–41.
38. About Pandas. Available online: https://pandas.pydata.org/ (accessed on 11 January 2023).
39. Matplotlib—Visualization with Python. Available online: https://matplotlib.org/stable/users/project/mission.html (accessed on 10 January 2023).
40. Scikit-Learn Package. Available online: https://scikit-learn.org/stable/ (accessed on 1 December 2022).
41. Python (3.9) [Computer software]. Available online: http://python.org (accessed on 1 December 2022).
42. Anaconda3 [Computer software]. Available online: https://anaconda.org/ (accessed on 1 December 2022).
43. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
44. ASTM D2487; Standard Test Method for Classification of Soils for Engineering Purposes. 1969 (R1975). ASTM: West Conshohocken, PA, USA, 1975.
45. Performance Tests to Start on High-Speed Metro Line to Istanbul Airport. Available online: https://www.dailysabah.com/business/transportation/performance-tests-to-start-on-high-speed-metro-line-to-istanbul-airport?gallery_image=undefined#big (accessed on 10 January 2023).
46. Suthar, B.; Patel, H.; Goswami, A. A survey: Classification of imputation methods in data mining. Int. J. Emerg. Technol. Adv. Eng. 2012, 2, 309–312.
47. Maniraj, S.P.; Chaudhary, D.; Deep, V.H.; Singh, V.P. Data aggregation and terror group prediction using machine learning algorithms. Int. J. Recent Technol. Eng. 2019, 8, 1467–1469.
48. Scikit-Learn Imputers. Available online: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.impute (accessed on 1 December 2022).
49. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
50. Ayhan, D. Multi-Class Classification Methods Utilizing Mahalanobis Taguchi System and a Re-Sampling Approach for Imbalanced Data Sets. Master's Thesis, Middle East Technical University, Ankara, Turkey, 2009.
51. Yao, Q.; Yang, H.; Bao, B.; Yu, A.; Zhang, J.; Cheriet, M. Core and spectrum allocation based on association rules mining in spectrally and spatially elastic optical networks. IEEE Trans. Commun. 2021, 69, 5299–5311.
52. Aksoy, S. Classification of VOC Vapors Using Machine Learning Algorithm. Master's Thesis, Yildiz Technical University, Istanbul, Turkey, 2022.
53. Mrva, J.; Neupauer, Š.; Hudec, L.; Ševcech, J.; Kapec, P. Decision support in medical data using 3D decision tree visualisation. In Proceedings of the 2019 E-Health and Bioengineering Conference (EHB), Iasi, Romania, 21–23 November 2019; pp. 1–4.
54. Xu, Y.; Shang, L.; Ye, J.; Qian, Q.; Li, Y.F.; Sun, B.; Li, H.; Jin, R. Dash: Semi-supervised learning with dynamic thresholding. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 11525–11536.
55. Mahesh, B. Machine learning algorithms—a review. Int. J. Sci. Res. 2020, 9, 381–386.
56. Zan, Ç.Ö. Prediction of Soil Radon Gas Using Meteorological Parameters with Machine Learning Algorithms. Master's Thesis, Dokuz Eylül University, İzmir, Turkey, 2021.
57. Taud, H.; Mas, J.F. Multilayer perceptron (MLP). In Geomatic Approaches for Modeling Land Change Scenarios; Springer: Cham, Switzerland, 2018; pp. 451–455.
58. Principles and Techniques of Data Science. Available online: https://www.samlau.me/test-textbook/ch/11/gradient_stochastic.html (accessed on 6 December 2022).
59. Sharma, A.; Paliwal, K.K. Linear discriminant analysis for the small sample size problem: An overview. Int. J. Mach. Learn. Cybern. 2015, 6, 443–454.
60. A Comprehensive Guide to Ensemble Learning (with Python Codes). Available online: https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-for-ensemble-models/ (accessed on 4 December 2022).
61. What is Gradient Boosting and How Is It Different from AdaBoost? Available online: https://www.mygreatlearning.com/blog/gradient-boosting/ (accessed on 10 January 2023).
62. Tharwat, A.; Gaber, T.; Awad, Y.M.; Dey, N.; Hassanien, A.E. Plants identification using feature fusion technique and bagging classifier. In Proceedings of the 1st International Conference on Advanced Intelligent System and Informatics (AISI2015), Beni Suef, Egypt, 28–30 November 2015; Springer: Cham, Switzerland, 2016; pp. 461–471.
63. Cao, J.; Kwong, S.; Wang, R. A noise-detection based AdaBoost algorithm for mislabeled data. Pattern Recognit. 2012, 45, 4451–4465.
64. What is LightGBM Algorithm, How to Use It? Available online: https://www.analyticssteps.com/blogs/what-light-gbm-algorithm-how-use-it (accessed on 4 December 2022).
65. Scikit-Learn Documentation. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html (accessed on 6 December 2022).
66. Daldır, I. Machine Learning Based Analysis and Prediction of Flight Delays in Aviation Industry. Ph.D. Thesis, Akdeniz University, Antalya, Turkey, 2021.
67. Yılmaz, E. Higher Education Planning and Decision Support System: A Case of Technology Faculty. Master's Thesis, Marmara University, Istanbul, Turkey, 2022.
68. Balanced Accuracy: When Should You Use It? Available online: https://neptune.ai/blog/balanced-accuracy (accessed on 13 December 2022).
69. Sklearn.tree DecisionTreeClassifier. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html (accessed on 13 December 2022).
70. Harlianto, P.A.; Adji, T.B.; Setiawan, N.A. Comparison of machine learning algorithms for soil type classification. In Proceedings of the 2017 3rd International Conference on Science and Technology-Computer (ICST), Yogyakarta, Indonesia, 11–12 July 2017; pp. 7–10.

**Figure 1.** Consistency of cohesive soils [6].

**Figure 3.** Istanbul city and research points of the study [45].

**Table 1.** US standard sieve sizes [7].

Sieve No. | Opening (mm) | Sieve No. | Opening (mm)
---|---|---|---
4 | 4.75 | 35 | 0.500
5 | 4.00 | 40 | 0.425
6 | 3.35 | 50 | 0.355
7 | 2.80 | 60 | 0.250
8 | 2.36 | 70 | 0.212
10 | 2.00 | 80 | 0.180
12 | 1.70 | 100 | 0.150
14 | 1.40 | 120 | 0.125
16 | 1.18 | 140 | 0.106
18 | 1.00 | 170 | 0.090
20 | 0.85 | 200 | 0.075
25 | 0.710 | 270 | 0.053
30 | 0.600 | |

Features | Min | Max | Mean | Standard Deviation
---|---|---|---|---
Retaining No. 4 sieve | 0 | 29.4 | 0.4365 | 2.4987
Passing No. 200 sieve | 13.0 | 100 | 90.8599 | 12.9337
Liquid limit | 23.1 | 90.0 | 53.4743 | 11.7505
Plastic limit | 3.4 | 36.9 | 23.3158 | 29.6082
Plasticity index | 7.3 | 62.0 | 30.4820 | 11.0532

Features | Min | Max | Mean | Standard Deviation
---|---|---|---|---
Retaining No. 4 sieve | 0 | 29.4 | 0.4836 | 2.6136
Passing No. 200 sieve | 13.0 | 100 | 90.8523 | 12.9319
Liquid limit | 23.1 | 90.0 | 53.4301 | 10.8526
Plastic limit | 3.4 | 36.9 | 23.3168 | 4.1062
Plasticity index | 7.3 | 62.0 | 30.0529 | 8.4294

Predicted \ Actual | True | False
---|---|---
Positive | True positive (TP) | False positive (FP)
Negative | False negative (FN) | True negative (TN)

Metric | Formula
---|---
Accuracy | (TP + TN)/(TP + FP + TN + FN)
Precision | TP/(TP + FP)
Recall | TP/(TP + FN)
F1-Score | (2 × Precision × Recall)/(Precision + Recall)
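The four metrics can be computed directly from the confusion-matrix counts; a minimal binary-case sketch:

```python
# The four metrics above, computed from binary confusion-matrix counts.
def classification_metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of predicted positives, how many are real
    recall = tp / (tp + fn)      # of real positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, `classification_metrics(tp=8, fp=2, fn=1, tn=9)` gives accuracy 0.85 and precision 0.8. In the multi-class setting of this study, the per-class values follow from treating each class as "positive" against the rest.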

Class | CH | CL | MH | CI | SC | Total
---|---|---|---|---|---|---
Pre-SMOTE class distribution (full dataset) | 567 | 169 | 25 | 25 | 19 | 805
Pre-SMOTE class distribution (selective sample) | 60 | 60 | 25 | 25 | 19 | 189
Post-SMOTE class distribution (SMOTE sample) | 60 | 60 | 60 | 60 | 60 | 300
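The balancing step uses SMOTE [49]. As a sketch of the underlying idea (not the imbalanced-learn implementation used in practice), each synthetic sample interpolates between a minority-class point and one of its k nearest minority neighbours:

```python
# Minimal SMOTE-style oversampling sketch: synthetic minority samples are
# linear interpolations between a minority point and a random one of its
# k nearest minority neighbours. Not the imbalanced-learn implementation.
import numpy as np

def smote_like(X_min, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbours)
        gap = rng.random()                       # position along the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Generating 41 synthetic rows this way would bring a 19-sample class up to 60, matching the SC column in the table above.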

Confusion matrix (rows: true class; columns: predicted class):

True \ Predicted | CH | CL | MH | CI | SC
---|---|---|---|---|---
CH | 53 | 4 | 1 | 2 | 0
CL | 1 | 48 | 2 | 0 | 9
MH | 0 | 1 | 18 | 0 | 0
CI | 2 | 1 | 0 | 21 | 1
SC | 0 | 9 | 0 | 0 | 16

Overall metrics: Accuracy 0.8254, Precision 0.8188, Recall 0.8221, F1-Score 0.8193.

Class | Accuracy | Precision | Recall | F1-Score
---|---|---|---|---
CH | 0.8833 | 0.9464 | 0.8833 | 0.9138
CL | 0.8000 | 0.7619 | 0.8000 | 0.7805
MH | 0.9474 | 0.8571 | 0.9474 | 0.9000
CI | 0.8400 | 0.9130 | 0.8400 | 0.8750
SC | 0.6400 | 0.6154 | 0.6400 | 0.6275

Features | Importance | Rank
---|---|---
Retaining No. 4 sieve | 0.0031 | 5
Passing No. 200 sieve | 0.2351 | 3
Liquid limit | 0.3584 | 1
Plastic limit | 0.2396 | 2
Plasticity index | 0.1639 | 4
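An importance table of this form can be produced from the impurity-based feature importances of a fitted tree ensemble. The sketch below uses synthetic data, so the importance values will differ from the study's:

```python
# Sketch: ranking the paper's five input features by impurity-based
# importance from a fitted random forest. Data is synthetic, so the
# resulting numbers are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

features = ["Retaining No. 4 sieve", "Passing No. 200 sieve",
            "Liquid limit", "Plastic limit", "Plasticity index"]

X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importances sum to 1; sort descending to obtain the rank column.
order = np.argsort(model.feature_importances_)[::-1]
for rank, i in enumerate(order, start=1):
    print(rank, features[i], round(model.feature_importances_[i], 4))
```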

Confusion matrix (rows: true class; columns: predicted class):

True \ Predicted | CH | CL | MH | CI | SC
---|---|---|---|---|---
CH | 54 | 4 | 0 | 2 | 0
CL | 1 | 46 | 2 | 0 | 11
MH | 0 | 0 | 60 | 0 | 0
CI | 2 | 2 | 0 | 56 | 0
SC | 0 | 9 | 0 | 0 | 51

Overall metrics: Accuracy 0.8900, Precision 0.8915, Recall 0.8900, F1-Score 0.8904.

Class | Accuracy | Precision | Recall | F1-Score
---|---|---|---|---
CH | 0.9000 | 0.9464 | 0.9000 | 0.9231
CL | 0.7667 | 0.7541 | 0.7667 | 0.7603
MH | 1.0000 | 0.9677 | 1.0000 | 0.9836
CI | 0.9333 | 0.9655 | 0.9333 | 0.9492
SC | 0.8500 | 0.8226 | 0.8500 | 0.8361

Features | Importance | Rank | Rank Change
---|---|---|---
Retaining No. 4 sieve | 0.0012 | 5 | 0
Passing No. 200 sieve | 0.3212 | 1 | +2
Liquid limit | 0.2716 | 3 | −2
Plastic limit | 0.2881 | 2 | 0
Plasticity index | 0.1179 | 4 | 0

Class | Accuracy Change
---|---
CH | +2%
CL | −4%
MH | +6%
CI | +9%
SC | +21%

Classifier | Python Package | Mean Accuracy (10-Fold CV) | Std. Dev. (10-Fold CV)
---|---|---|---
*Foundational:* | | |
DecisionTreeClassifier | Scikit-learn | 0.9066 | 0.0771
MultiLayerPerceptronClassifier * | Scikit-learn | 0.7933 | 0.1624
KNeighborsClassifier | Scikit-learn | 0.7933 | 0.1854
GaussianNaiveBayes | Scikit-learn | 0.7900 | 0.1612
SupportVectorMachineClassifier | Scikit-learn | 0.7666 | 0.2027
LinearDiscriminantAnalysis | Scikit-learn | 0.7333 | 0.1527
StochasticGradientDescentClassifier | Scikit-learn | 0.5366 | 0.3760
*Ensemble:* | | |
XGBClassifier | XGBoost | 0.9033 | 0.0982
LGBMClassifier | LightGBM | 0.9000 | 0.1021
HistGradientBoostingClassifier | Scikit-learn | 0.8933 | 0.1030
GradientBoostingClassifier | Scikit-learn | 0.8866 | 0.1002
CatBoostClassifier | CatBoost | 0.8866 | 0.1056
RandomForestClassifier | Scikit-learn | 0.8766 | 0.1256
BaggingClassifier ** | Scikit-learn | 0.7433 | 0.1414
AdaBoostClassifier | Scikit-learn | 0.6033 | 0.3787

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Aydın, Y.; Işıkdağ, Ü.; Bekdaş, G.; Nigdeli, S.M.; Geem, Z.W. Use of Machine Learning Techniques in Soil Classification. *Sustainability* **2023**, *15*, 2374.
https://doi.org/10.3390/su15032374
