Article

Locally Specified CPT Soil Classification Based on Machine Learning Techniques

1 Department of Civil and Environmental Engineering, Kookmin University, Seoul 02707, Republic of Korea
2 Earthquake Research Center, Korea Institute of Geoscience and Mineral Resources, Daejeon 34132, Republic of Korea
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(4), 2914; https://doi.org/10.3390/su15042914
Submission received: 22 November 2022 / Revised: 17 January 2023 / Accepted: 31 January 2023 / Published: 6 February 2023
(This article belongs to the Special Issue Sustainable Development of Geotechnical Engineering)

Abstract

Cone penetration tests (CPTs) provide highly accurate and detailed information on the stiffness, strength, and consolidation characteristics of the tested geomaterials, but they do not recover soil samples. Thus, when CPT results are used for soil classification, experience-based classification charts or tables are generally applied. However, such charts and tables are inherently derived from the test data used to develop each method, so they may fail to capture the engineering characteristics of soils from other regions. This study proposes a machine learning approach using the C4.5 decision tree algorithm to develop a locally specified CPT-based soil classification system. The findings demonstrate that a locally specified soil classification scheme can be obtained with a simple trained decision tree model, provided that appropriate combinations of training data and input attributes are used. Additionally, for data with highly imbalanced classes, oversampling the minority classes is confirmed to yield more balanced classification accuracy across classes.

1. Introduction

1.1. Data Mining in Geotechnical Engineering

Data mining is the process of using machine learning algorithms to identify systematic and statistical patterns or rules in large amounts of data. Advances in information and computing technology have made vast amounts of data accessible and collectable in practically all fields, which has encouraged the use of data mining techniques for various purposes in business, education, and research. In geotechnical engineering, numerous studies have applied data mining techniques to areas such as the analysis of geotechnical investigations, the design and behavior of foundations, the estimation of soft ground behavior, and countermeasures against soil-related disasters.
Data mining and data-driven techniques can generate mathematically flexible relationships that adequately represent the dynamics of geo-systems [1], starting with the empirical formulation of geotechnical design parameters and site characterization. Although general and local data-driven models exist, site-specific characteristics (stratification, discontinuities, anomalies, physical/mechanical properties, etc.) cannot be determined solely from well-constructed principal components or from models trained on a different case study [2,3]. The primary goal of data-driven site characterization is to classify the soil or site type, or to generate a subsurface map, based on big data and artificial intelligence technology.
Data mining tools are widely used for stratification. With recent developments in data acquisition and processing, data mining techniques such as decision trees, support vector machines, clustering, and artificial neural networks have been proposed as alternatives to traditional soil classification methods. Stratification has also been performed via semi-supervised clustering, which combines supervised learning (learning from labeled examples to make predictions) with unsupervised learning (uncovering hidden structure in unlabeled data) [4,5,6,7,8,9,10,11,12,13,14,15,16]. When a decision tree is applied to stratification, the resulting rules can provide a soil classification scheme that reflects the local engineering characteristics of the in situ soils.
The C4.5 algorithm, a representative decision tree method, is applied in this study to propose a locally relevant soil classification scheme using the results of cone penetration tests conducted at a specific construction site.

1.2. CPT Data for Model Training and Testing

The cone penetration test (CPT) is a typical geotechnical field test that provides continuous, closely spaced, and rapid measurements with depth. However, unlike standard penetration tests (SPTs) or borings, the CPT does not recover soil samples; thus, it is inherently limited in elucidating the actual subsurface engineering characteristics, including soil type identification. Conducting both borings and CPTs at the same location, on the other hand, is costly and time-consuming. Since the development of the friction mantle cone, many researchers have proposed soil classification charts based on CPT results [17,18,19,20,21,22,23,24]. Among these, the classification methods proposed by Robertson are the most widely used worldwide. These CPT classification charts reflect the overall engineering characteristics of each soil type and can classify soils rationally; however, they inherently do not account for the unique features of local geomaterials. Other soil classification methods are likewise prone to misclassification owing to biases in the data used to develop them.
Robertson [21] introduced two parameters that account for the effect of the overburden stress, the normalized cone penetration resistance and the normalized friction ratio, and proposed the soil classification chart shown in Figure 2. Refined over the years, it has become the most representative CPT soil classification method worldwide. Robertson’s chart classifies soils into classes called soil behavior types (SBTs), as listed in Table 1. The normalized cone tip resistance Qt and friction ratio Fr are defined as
$$Q_t = \frac{q_t - \sigma_v}{p_a}$$
$$F_r = \left( \frac{f_s}{q_t - \sigma_v} \right) \times 100\ (\%)$$
where Qt is the normalized cone tip resistance, qt is the corrected cone tip resistance, σv is the total vertical stress, pa is the atmospheric pressure, Fr is the friction ratio, and fs is the sleeve friction resistance.
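The two definitions above can be evaluated directly. The Python sketch below assumes that qt, fs, and σv are all expressed in kPa and that pa = 100 kPa; the source does not state the units or the value of pa used, and the sample reading is hypothetical.

```python
# Minimal sketch of the normalized CPT parameters defined above.
# Assumptions (not stated in the source): all stresses in kPa and p_a = 100 kPa.
P_A_KPA = 100.0  # assumed atmospheric (reference) pressure, kPa


def normalized_tip_resistance(q_t: float, sigma_v: float, p_a: float = P_A_KPA) -> float:
    """Qt = (q_t - sigma_v) / p_a, dimensionless."""
    return (q_t - sigma_v) / p_a


def friction_ratio(f_s: float, q_t: float, sigma_v: float) -> float:
    """Fr = f_s / (q_t - sigma_v) * 100, in percent."""
    net_resistance = q_t - sigma_v
    if net_resistance <= 0:
        raise ValueError("q_t must exceed sigma_v to compute a friction ratio")
    return 100.0 * f_s / net_resistance


# Hypothetical reading: q_t = 1200 kPa, f_s = 30 kPa, sigma_v = 200 kPa
print(normalized_tip_resistance(1200.0, 200.0))  # 10.0
print(friction_ratio(30.0, 1200.0, 200.0))       # 3.0
```

Both parameters use the net resistance qt − σv, so a reading at which qt does not exceed σv is rejected rather than producing a negative or infinite friction ratio.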
The data used in this study were obtained from a construction site in the Hwajeon area of Busan, South Korea, as shown in Figure 1 [25]. Cone penetration tests and standard penetration tests were both conducted at the same locations for comparison and verification. Ten testing locations within the site were selected and tested, as shown on the map. As described in Table 1, the soil classes along depth were determined according to the Unified Soil Classification System (USCS) [26] and grouped into three categories: clays, CH and CL (SBT 1, 2, and 3); mixtures of silt, sand, and clay, MH, ML, SC, and SM (SBT 4 and 5); and sands, SW and SP (SBT 6 and 7).
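To compare Robertson’s chart with the SPT-based classes, both label sources have to be mapped onto the same three groups. A minimal Python sketch of that mapping follows; the dictionary and function names are hypothetical, and SBT 8 and 9 (stiff soils), which fall outside the three groups, are deliberately left unmapped.

```python
# Hypothetical helpers (names are illustrative, not from the source) that map
# both label sources onto the three soil groups used in this study.
USCS_TO_GROUP = {
    "CH": "clay", "CL": "clay",
    "MH": "mixture", "ML": "mixture", "SC": "mixture", "SM": "mixture",
    "SW": "sand", "SP": "sand",
}

SBT_TO_GROUP = {1: "clay", 2: "clay", 3: "clay",
                4: "mixture", 5: "mixture",
                6: "sand", 7: "sand"}


def group_from_uscs(symbol: str) -> str:
    """Group for a USCS symbol from the SPT boring log (reference label)."""
    key = symbol.strip().upper()
    if key not in USCS_TO_GROUP:
        raise ValueError(f"USCS symbol {symbol!r} is outside the three groups")
    return USCS_TO_GROUP[key]


def group_from_sbt(zone: int) -> str:
    """Group for a Robertson SBT zone (chart prediction); SBT 8 and 9 are not mapped."""
    if zone not in SBT_TO_GROUP:
        raise ValueError(f"SBT zone {zone} is outside the three groups")
    return SBT_TO_GROUP[zone]


print(group_from_uscs("sm"), group_from_sbt(6))  # mixture sand
```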
According to the SPT results, the 12,112 CPT measurement points consist of 10,314 clay points (approximately 85% of the total), 1411 mixture points, and 387 sand points, indicating a clay-dominant deposit. As shown in Figure 2 and Table 2, Robertson’s CPT classification chart classified clay with an accuracy of 94%, but mixtures and sand with accuracies of only 27% and 21%, respectively (84% overall).
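The accuracies in Table 2 follow directly from the counts. The Python sketch below recomputes them and, as an illustration that is not part of the original analysis, also reports the mean of the per-class accuracies, which shows how strongly the 84% overall figure is carried by the clay majority.

```python
# Counts from Table 2: SPT-based class sizes and the number of points that
# Robertson's chart assigned to a different group.
totals = {"clay": 10314, "mixture": 1411, "sand": 387}
misclassified = {"clay": 577, "mixture": 1034, "sand": 307}


def accuracy(total: int, wrong: int) -> float:
    """Fraction of points whose predicted group matches the SPT-based group."""
    return (total - wrong) / total


per_class = {soil: accuracy(totals[soil], misclassified[soil]) for soil in totals}
overall = accuracy(sum(totals.values()), sum(misclassified.values()))
# Mean of per-class accuracies (balanced accuracy); shown only to illustrate the
# effect of class imbalance, the source does not report this metric.
balanced = sum(per_class.values()) / len(per_class)

for soil, acc in per_class.items():
    print(f"{soil}: {acc:.0%}")              # clay: 94%, mixture: 27%, sand: 21%
print(f"overall: {overall:.0%}, balanced: {balanced:.0%}")  # overall: 84%, balanced: 47%
```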

2. Data Mining of the CPT Test Results for the Soil Classification

2.1. C4.5 Algorithm

A decision tree model was selected to classify the tested soil layers using the attributes measured and derived from the field tests. A decision tree is a data mining method that assigns classes or estimates values according to a hierarchy of decision rules. At each split, the tree selects the most informative input attribute, so the trained classification process is simple to analyze and interpret. A representative decision tree model, the C4.5 algorithm, was applied to classify the soils using the measured CPT data described in Figure 1 and Figure 2.
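To make the notion of decision rules concrete, the Python sketch below represents a trained tree as nested threshold tests and classifies a CPT point by walking from the root to a leaf. The class names and the thresholds in `example_tree` are purely illustrative assumptions; they are not the tree obtained in this study.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # predicted soil group


@dataclass
class Node:
    attribute: str                 # input attribute tested at this node, e.g. "Qt"
    threshold: float               # samples with attribute <= threshold go left
    left: Union["Leaf", "Node"]
    right: Union["Leaf", "Node"]


Tree = Union[Leaf, Node]


def predict(tree: Tree, sample: Dict[str, float]) -> str:
    """Follow the decision rules from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.left if sample[tree.attribute] <= tree.threshold else tree.right
    return tree.label


# Illustrative thresholds only; this is NOT the tree trained in the study.
example_tree = Node("Qt", 12.0,
                    left=Leaf("clay"),
                    right=Node("Fr", 1.0, left=Leaf("sand"), right=Leaf("mixture")))

print(predict(example_tree, {"Qt": 5.0, "Fr": 3.0}))   # clay
print(predict(example_tree, {"Qt": 80.0, "Fr": 0.5}))  # sand
```

Because each prediction is a short chain of readable comparisons, the learned boundaries can be drawn directly on a Qt–Fr chart, which is what makes a tree attractive for a locally specified classification scheme.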
The C4.5 classification algorithm is an improved version of the ID3 decision tree algorithm developed by Quinlan [27]. Unlike ID3, which can process only nominal (categorical) attributes, C4.5 can also handle continuous attributes. As in ID3, entropy is defined and computed based on information theory. The classification criteria are determined from entropy values, and each split is chosen so that entropy decreases. Entropy measures the degree of decision uncertainty for a particular data classification and is defined as:
$$\mathrm{Entropy}(S) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
where Entropy(S) is the entropy of dataset S, n is the number of classes, and pi is the probability of occurrence (proportion) of class i in S.
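As a worked illustration of the entropy definition, the Python sketch below computes Entropy(S) from class counts. Applied to the clay, mixture, and sand counts of the whole dataset, it gives roughly 0.72 bits, well below the maximum of log2(3) ≈ 1.58 bits for three classes, which quantifies how strongly the deposit is dominated by clay.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def entropy_from_counts(counts: Iterable[int]) -> float:
    """Entropy(S) = -sum(p_i * log2(p_i)) in bits, from per-class counts."""
    positive = [c for c in counts if c > 0]  # classes with zero count contribute 0
    total = sum(positive)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in positive)


def entropy(labels: Iterable[Hashable]) -> float:
    """Entropy of a sequence of class labels."""
    return entropy_from_counts(Counter(labels).values())


# Whole dataset (clay, mixture, sand counts from the SPT results):
print(round(entropy_from_counts([10314, 1411, 387]), 2))  # 0.72
# Upper bound for three equally frequent classes is log2(3):
print(round(entropy(["clay", "mixture", "sand"]), 2))      # 1.58
```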
Information gain, gain ratio, and the Gini index are common quantitative measures of how much a split reduces uncertainty. The C4.5 algorithm uses the gain ratio, which is the ratio of the information gain of a candidate split to its split information (the entropy of the partition induced by the split itself), as shown in Figure 3. The C4.5 algorithm has the advantage of being able to process continuous and discrete data together.
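The gain ratio can be made concrete with a small sketch. The Python code below scores binary splits of the form "attribute ≤ t" on a single continuous attribute, using midpoints between consecutive sorted values as candidate thresholds, and keeps the split with the highest gain ratio. It is a simplified illustration of the criterion described above, not Quinlan’s full C4.5: pruning, missing-value handling, and C4.5’s exact rules for choosing and penalizing continuous thresholds are omitted. The friction ratio values and labels in the usage example are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def _entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(parent: Sequence[str], partitions: List[Sequence[str]]) -> float:
    """Information gain of a split divided by its split information."""
    n = len(parent)
    parts = [p for p in partitions if len(p) > 0]
    weighted_child_entropy = sum(len(p) / n * _entropy(p) for p in parts)
    info_gain = _entropy(parent) - weighted_child_entropy
    split_info = -sum((len(p) / n) * math.log2(len(p) / n) for p in parts)
    return info_gain / split_info if split_info > 0 else 0.0


def best_threshold(values: Sequence[float],
                   labels: Sequence[str]) -> Tuple[Optional[float], float]:
    """Return (threshold, gain_ratio) of the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    all_labels = [y for _, y in pairs]
    best_t, best_gr = None, 0.0
    for (lo, _), (hi, _) in zip(pairs, pairs[1:]):
        if lo == hi:
            continue  # no threshold separates identical values
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        gr = gain_ratio(all_labels, [left, right])
        if gr > best_gr:
            best_t, best_gr = t, gr
    return best_t, best_gr


# Hypothetical friction ratios (%) with SPT-based groups
fr = [1.0, 1.5, 2.0, 4.0, 5.0, 6.0]
groups = ["sand", "sand", "sand", "clay", "clay", "clay"]
print(best_threshold(fr, groups))  # (3.0, 1.0)
```

Dividing by the split information penalizes splits that fragment the data into many small partitions, which is the main reason C4.5 prefers the gain ratio over raw information gain.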

2.2. Input Attribute and Training Data Selection

As the number of input attributes for model training increases, the dimensionality of the data also increases, which can cause overfitting and makes it difficult to derive a generalized classification model: an overfitted model fits the training dataset closely, but its accuracy on the testing dataset can drop. The CPT data consist of 12,112 measurement points with the candidate attributes of cone tip resistance, sleeve friction, total and effective vertical stresses, friction ratio, and normalized cone resistance. Table 3 lists the four attribute combinations (C#1, C#2, C#3, and C#4) used for model training. C#1 contains the direct measurements from the cone penetration tests. C#2 adds the stress-condition attributes to C#1. C#3 consists of the normalized cone tip resistance Qt and friction ratio Fr proposed by Robertson [21]. C#4 adds the product and quotient of Qt and Fr to reflect the diagonal boundaries of Robertson’s chart (Figure 2) for a more accurate and efficient classification. Because this area is mainly composed of clay, it is critical to distinguish non-clayey soils from clays. Thus, among the testing locations in Figure 1, the results from SB-11, SB-18, and SB-22, where clays, sands, and mixed soils were all found, were used to compose the training data sets listed in Table 4.
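The four attribute combinations in Table 3 can be assembled from the raw CPT columns as sketched below in Python. The record type and function names are hypothetical, stresses are assumed to be in kPa with pa = 100 kPa, and Qt as defined in Section 1.2 is used for C#4 where Table 3 writes Qtn. The comment on C#4 spells out why the product and quotient help: an axis-parallel tree split on Qt·Fr or Qt/Fr corresponds to a straight line of slope -1 or +1 on the log-log Qt–Fr chart, which lets the tree mimic diagonal boundaries.

```python
from dataclasses import dataclass
from typing import Dict

P_A_KPA = 100.0  # assumed atmospheric pressure, kPa (not stated in the source)


@dataclass
class CptPoint:
    q_t: float          # corrected cone tip resistance, kPa
    f_s: float          # sleeve friction, kPa
    sigma_v: float      # total vertical stress, kPa
    sigma_v_eff: float  # effective vertical stress, kPa


def attribute_set(p: CptPoint, combination: str) -> Dict[str, float]:
    """Input attributes for one CPT point under combination C#1 to C#4."""
    if combination == "C#1":
        return {"q_t": p.q_t, "f_s": p.f_s}
    if combination == "C#2":
        return {"q_t": p.q_t, "f_s": p.f_s,
                "sigma_v": p.sigma_v, "sigma_v_eff": p.sigma_v_eff}
    net = p.q_t - p.sigma_v
    if net <= 0 or p.f_s <= 0:
        raise ValueError("normalized attributes need q_t > sigma_v and f_s > 0")
    qt_norm = net / P_A_KPA      # Qt as defined in Section 1.2
    fr = 100.0 * p.f_s / net     # Fr in percent
    if combination == "C#3":
        return {"Qt": qt_norm, "Fr": fr}
    if combination == "C#4":
        # A threshold on Qt*Fr (or Qt/Fr) is a line of slope -1 (or +1)
        # in the log-log Qt-Fr plane, i.e. a diagonal chart boundary.
        return {"Qt": qt_norm, "Fr": fr,
                "Qt*Fr": qt_norm * fr, "Qt/Fr": qt_norm / fr}
    raise ValueError(f"unknown attribute combination: {combination}")


point = CptPoint(q_t=1200.0, f_s=30.0, sigma_v=200.0, sigma_v_eff=120.0)  # hypothetical
print(attribute_set(point, "C#4"))
```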

2.3. Analysis Results

Figure 4 shows the classification results of the C4.5 tree models trained with the training data sets listed in Table 4. Model accuracy was computed as the proportion of data points whose predicted class agreed with the class determined from the SPT results. For all the training sets, the models trained with C#2 achieved nearly 100% accuracy, the highest among the tested attribute combinations. However, the models with the other attribute combinations also achieved an average accuracy above 90%.
The model performance was evaluated and compared by applying testing data to each model for verification. The classification results of the testing data are presented in Figure 5, Figure 6, Figure 7 and Figure 8 and summarized in Table 5.
As presented in Figure 5, Figure 6, Figure 7 and Figure 8 and Table 5, the accuracy on the test data varies with the combination of training data and input attributes. Overall, the models using C#3 and C#4, which are based on the normalized parameters proposed by Robertson, are more accurate than those using C#1 and C#2, which use the direct test measurements. This is consistent with the basic soil-mechanics principle that soil strength depends mainly on interparticle friction, which in turn depends on the overburden stress. The models with C#3 and C#4 are also less sensitive to the choice of training data than those with C#1 and C#2, indicating more reliable models. C#4, which includes attributes that reflect the diagonal boundaries of Robertson’s chart, performs slightly better than C#3. These results suggest that, even when the classification criteria are derived through data mining, selecting the input attributes and training data with the conventional approaches in mind helps train a data-driven model with higher reliability, efficiency, and accuracy.
Figure 9 compares Robertson’s chart with the chart from the model trained with the combination of C#4 and D#4, which showed the best performance among the models trained on selected boreholes. Owing to the characteristics of the analyzed data, the clay region is wider than in Robertson’s chart, and the sand and mixture regions are relatively narrow. As shown in Table 6, the classification accuracy is 58% for mixtures and 53% for sand. Compared with 27% for mixtures and 21% for sand using Robertson’s chart, the data-driven chart more than doubles the classification accuracy for these soils. As Table 5 shows, a model trained on all the measured data (D#5) yields the most accurate chart, despite the risk of overfitting. However, when the class distribution is imbalanced, as in the present data, the training data should be selected to keep the class ratio as balanced as possible, favoring locations with more measurements of the under-represented sands and silts.
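The “more than twice” statement follows directly from the per-class accuracies in Table 2 (Robertson’s chart) and Table 6 (data-driven chart):

$$\frac{58\%}{27\%} \approx 2.1 \ \text{(mixtures)}, \qquad \frac{53\%}{21\%} \approx 2.5 \ \text{(sands)}$$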
The gap in classification performance arises because the proportion of clay in the present data is excessively high (greater than 85%), which inevitably lowers the classification performance for the other soils. To address this, oversampling was applied so that the classes with a lower composition ratio carry more weight when the classification criteria are derived through training [28,29]. Using random over-sampling examples (ROSE) [30], synthetic data were generated and added to the training data so that the imbalance ratio (IR) of mixtures and sand relative to clay approached 1. Following Chawla et al. [31], the synthetic data were applied only to the training set, not to the test set.
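The balancing step can be sketched as follows. This Python code is a simplified smoothed bootstrap in the spirit of ROSE [30]: rows of each smaller class are drawn with replacement and jittered with Gaussian noise until every class matches the majority class, so that the IR approaches 1. It is not the ROSE package or its kernel-bandwidth rule; the function name, the `shrink` factor, and the sample rows are assumptions for illustration. SMOTE [31] would instead interpolate between a minority point and one of its nearest minority-class neighbors. As in the study, only the training set is passed in; the test set is left untouched.

```python
import random
import statistics
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def oversample_smoothed_bootstrap(
    X: Sequence[Row], y: Sequence[str], shrink: float = 0.1, seed: int = 0
) -> Tuple[List[Row], List[str]]:
    """Grow every class to the size of the largest class (IR -> 1).

    Each synthetic row is a randomly drawn row of the same class plus Gaussian
    noise whose standard deviation is `shrink` times that class's per-feature
    population standard deviation (`shrink` is an assumed tuning value).
    Apply this to the TRAINING set only.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        stds = [statistics.pstdev(col) for col in zip(*rows)]
        for _ in range(target - n):
            base = rng.choice(rows)
            X_out.append(tuple(v + rng.gauss(0.0, shrink * s) for v, s in zip(base, stds)))
            y_out.append(label)
    return X_out, y_out


# Hypothetical (Qt, Fr) training rows; the test set is never passed in.
X_train = [(5.0, 3.0), (6.0, 4.0), (4.0, 3.5), (7.0, 2.5), (5.5, 3.2), (6.5, 3.8),
           (30.0, 1.8), (25.0, 2.0), (90.0, 0.6)]
y_train = ["clay"] * 6 + ["mixture"] * 2 + ["sand"]
X_bal, y_bal = oversample_smoothed_bootstrap(X_train, y_train)
print(Counter(y_bal))  # each class now has 6 rows
```

A class with a single row has zero spread, so its synthetic rows are plain copies; in practice a larger or better-informed bandwidth would be needed to generate genuinely new points for very small classes.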
As in the previous parametric analysis, the model was trained using the C#4-D#4 combination, which exhibited the best classification performance; the results are shown in Figure 10 and Table 7. Although oversampling carries a risk of overfitting for sands and mixtures, the classification accuracies for sands and mixtures were 84% and 89%, respectively, confirming that the classification performance improved considerably. However, the classification accuracy for clays decreased from 97% to 88%, and the overall classification accuracy decreased from 91% to 88%.
The recommended process for developing a locally specified soil classification scheme for cone penetration tests using the C4.5 decision tree method is summarized schematically in Figure 11.

3. Conclusions

This study proposes a locally specified soil classification scheme for cone penetration tests developed with the C4.5 decision tree method. Field measurements from cone penetration tests and standard penetration tests conducted at a clay-dominant deposit were combined to form the training and testing data. The following conclusions are drawn. First, conventional soil classification charts or tables for CPT results have inherent limitations in representing regional soil characteristics; the low accuracy obtained for the non-clayey soils in this study illustrates this limitation. Second, the results suggest that a more locally adapted stratification can be obtained with a soil classification scheme developed by applying simple data mining techniques to local data. Third, the model test results indicate that the input attributes and training data must be chosen carefully, with sufficient consideration of the conventional classification methods. Fourth, for data with highly imbalanced classes, oversampling the minority classes makes the classification accuracy more balanced across classes.

Author Contributions

Conceptualization, S.C., H.-S.K. and H.K.; methodology, H.K.; software, S.C.; validation, S.C., H.-S.K. and H.K.; formal analysis, S.C.; investigation, S.C., H.-S.K. and H.K.; resources, S.C. and H.K.; data curation, S.C.; writing—original draft preparation, S.C.; writing—review and editing, H.-S.K. and H.K.; visualization, S.C.; supervision, H.K.; project administration, H.K.; funding acquisition, H.K. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2019R1A2C1088527).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ljung, L. Model Validation and Model Error Modeling; Linköping University Electronic Press: Linköping, Sweden, 1999.
  2. Opolot, E.; Yu, Y.Y.; Finke, P.A. Modeling soil genesis at pedon and landscape scales: Achievements and problems. Quat. Int. 2015, 376, 34–46.
  3. Phoon, K.K.; Ching, J.; Shuku, T. Challenges in data-driven site characterization. Georisk Assess. Manag. Risk Eng. Syst. Geohazards 2022, 16, 114–126.
  4. Odeh, I.O.A.; McBratney, A.B.; Chittleborough, D.J. Soil pattern recognition with fuzzy-c-means: Application to classification and soil-landform interrelationships. Soil Sci. Soc. Am. J. 1992, 56, 505–516.
  5. Najjar, Y.M.; Basheer, I.A.; Naouss, W.A. On the identification of compaction characteristics by neuronets. Comput. Geotech. 1996, 18, 167–187.
  6. Juang, T.C.; Wang, M.K.; Chen, H.J.; Tan, C.C. Ammonium fixation by surface soils and clays. Soil Sci. 2001, 166, 345–352.
  7. Bhattacharya, B.; Solomatine, D.P. Machine learning in soil classification. Neural Netw. 2006, 19, 186–195.
  8. Das, S.K.; Basudhar, P.K. Utilization of self-organizing map and fuzzy clustering for site characterization using piezocone data. Comput. Geotech. 2009, 36, 241–248.
  9. Bhargavi, P.; Jyothi, S. Soil classification using data mining techniques: A comparative study. Int. J. Eng. Trends Technol. 2011, 2, 55–59.
  10. Wang, X.; Wang, H.; Liang, R.Y.; Liu, Y. A semi-supervised clustering-based approach for stratification identification using borehole and cone penetration test data. Eng. Geol. 2019, 248, 102–116.
  11. Cao, Z.; Wang, Y. Bayesian approach for probabilistic site characterization using cone penetration tests. J. Geotech. Geoenvironmental Eng. 2013, 139, 267–276.
  12. Carvalho, L.O.; Ribeiro, D.B. A multiple model machine learning approach for soil classification from cone penetration test data. Soils Rocks 2021, 44, 1–14.
  13. Rauter, S.; Tschuchnigg, F. CPT data interpretation employing different machine learning techniques. Geosciences 2021, 11, 265.
  14. Erharter, G.H.; Oberhollenzer, S.; Frankhauser, A.; Marte, R.; Marcher, T. Learning decision boundaries for cone penetration test classification. Comput.-Aided Civ. Infrastruct. Eng. 2021, 36, 489–503.
  15. Taiaousi, D.; Travasarou, T.; Drosos, V.; Ugalde, J.; Chacko, J. Machine learning applications for site characterization based on CPT data. Geotech. Earthq. Eng. Soil Dyn. V 2018, 461–472.
  16. Taluja, C.; Thakur, R. Recent trends of machine learning in soil classification: A review. Int. J. Comput. Eng. Res. 2018, 8, 25–32.
  17. Begemann, H.K.S. The friction jacket cone as an aid in determining the soil profile. In Proceedings of the 6th International Conference on Soil Mechanics and Foundation Engineering, Montréal, QC, Canada, 8–15 September 1965; pp. 17–20.
  18. Douglas, B.J.; Olsen, R.S. Soil Classification Using Electric Cone Penetrometer. In Proceedings of the Symposium on Cone Penetration Testing and Experience, Geotechnical Engineering Division ASCE, St. Louis, MO, USA, October 1981; pp. 209–227.
  19. Robertson, P.K.; Campanella, R.G. SPT-CPT correlations. J. Geotech. Div. 1983, 109, 1449–1460.
  20. Robertson, P.K.; Campanella, R.G.; Gillespie, D.; Greig, J. Use of piezometer cone data. Proc. Am. Soc. Civ. Eng. 1986, 6, 1263–1280.
  21. Robertson, P.K. Soil classification using the cone penetration test. Can. Geotech. J. 1990, 27, 151–158.
  22. Robertson, P.K.; Wride, C.E. Evaluating cyclic liquefaction potential using the cone penetration test. Can. Geotech. J. 1998, 35, 442–459.
  23. Robertson, P.K. Interpretation of cone penetration tests – A unified approach. Can. Geotech. J. 2009, 46, 1337–1355.
  24. Robertson, P.K.; Cabal, K.L. Guide to Cone Penetration Testing, 6th ed.; Gregg Drilling Inc.: Signal Hill, CA, USA, 2014.
  25. Dea Woo E&C. Site Investigation Report for Free Economic Zone Development at Hwajeon; Dea Woo E&C: Busan, Republic of Korea, 2006.
  26. ASTM D2487-17; Standard Practice for Classification of Soils for Engineering Purposes (Unified Soil Classification System). ASTM International: West Conshohocken, PA, USA, 2017.
  27. Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
  28. He, H.; Ma, Y. Imbalanced Learning: Foundations, Algorithms, and Applications; Wiley-IEEE Press: Hoboken, NJ, USA, 2013.
  29. Demir, S.; Şahin, E.K. Liquefaction prediction with robust machine learning algorithms (SVM, RF, and XGBoost) supported by genetic algorithm-based feature selection and parameter optimization from the perspective of data processing. Environ. Earth Sci. 2022, 81, 1–17.
  30. Menardi, G.; Torelli, N. Training and assessing classification rules with imbalanced data. Data Min. Knowl. Discov. 2014, 28, 92–122.
  31. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
Figure 1. The locations of Cone Penetration Tests (Hwajeon area in Busan, South Korea).
Figure 2. Projection of the SPT boring log results to Robertson’s CPT classification chart.
Figure 3. Flow chart for C4.5 decision tree algorithm.
Figure 4. Classification results of the C4.5 model for different training sets with the various input attribute combinations.
Figure 5. Classification results of C4.5 models using the input attribute combination #1 for the testing set with various training data sets.
Figure 6. Classification results of C4.5 models using the input attribute combination #2 for the testing set with various training data sets.
Figure 7. Classification results of C4.5 models using the input attribute combination #3 for the testing set with various training data sets.
Figure 8. Classification results of C4.5 models using the input attribute combination #4 for the testing set with various training data sets.
Figure 9. Graphical representation of the data-driven classification result using the input attribute combination #4 for the testing set with the training data set D#4. (a) Data-driven classification chart; (b) Robertson’s classification chart.
Figure 10. Graphical representation of the data-driven classification result using the input attribute combination #4 for the testing set with the training data set D#4 modified with oversampling.
Figure 11. The schematic diagram of the recommended process for the development of a locally specified soil classification scheme for a cone penetration test using C4.5 decision tree method.
Table 1. Soil type definitions based on Robertson’s soil classification and SBTs.
SBT | Soil group (USCS) | Soil behavior type
SBT1 | Clays (CH, CL) | Sensitive, fine grained
SBT2 | Clays (CH, CL) | Organic soils – clay
SBT3 | Clays (CH, CL) | Clay – silty clay to clay
SBT4 | Silt–Sand–Clay Mixtures (MH, ML, SC, SM) | Silt mixtures – clayey silt to silty clay
SBT5 | Silt–Sand–Clay Mixtures (MH, ML, SC, SM) | Sand mixtures – silty sand to sandy silt
SBT6 | Sands (SW, SP) | Sands – clean sand to silty sand
SBT7 | Sands (SW, SP) | Gravelly sand to dense sand
SBT8 | Stiff soils | Very stiff sand to clayey sand
SBT9 | Stiff soils | Very stiff fine grained
Table 2. Summary of Robertson’s chart classification results.
Measure | Total | Clays | Mixtures | Sands
Total instances | 12,112 | 10,314 | 1411 | 387
Misclassified | 1918 | 577 | 1034 | 307
Accuracy | 84% | 94% | 27% | 21%
Table 3. Combinations of input attributes.
Combination | Input attributes
C#1 | qt, fs
C#2 | qt, fs, σv, σ′v
C#3 | Qt, Fr
C#4 | Qtn, Fr, Qt·Fr, Qt/Fr
Table 4. Combinations of training data sets.
Combination | Selected training data
D#1 | SB11 (1087, 297, 164)
D#2 | SB18 (1605, 70, 40)
D#3 | SB22 (555, 179, 41)
D#4 | SB11 and SB18 and SB22 (3247, 546, 245)
D#5 | All the measurements (10,314, 1411, 387)
Note: The numbers in parentheses indicate the number of measurements of clay, mixture, and sand from the SPT results.
Table 5. Summary of the classification result of C4.5 models from the various combinations of input attributes and training data sets.
Training set | C#1 | C#2 | C#3 | C#4
D#1 | 61.2% | 65.6% | 82.5% | 82.3%
D#2 | 86.7% | 90.4% | 87.1% | 87.2%
D#3 | 52.2% | 56.0% | 86.1% | 88.0%
D#4 | 83.1% | 90.3% | 89.9% | 90.5%
D#5 | 89.9% | 98.6% | 93.0% | 93.1%
Average | 75.5% | 81.4% | 87.7% | 88.2%
Table 6. Summary of the data-driven classification result using the input attribute combination #4 for the testing set with the training data set D#4.
Measure | Total | Clays | Mixtures | Sands
Total instances | 12,112 | 10,314 | 1411 | 387
Misclassified | 1129 | 357 | 591 | 181
Accuracy | 91% | 97% | 58% | 53%
Table 7. Summary of the data-driven classification result using the input attribute combination #4 for the testing set with the training data set D#4 modified with oversampling.
Measure | Total | Clays | Mixtures | Sands
Total instances | 12,112 | 10,314 | 1411 | 387
Misclassified | 1816 | 1237 | 125 | 62
Accuracy | 85% | 88% | 89% | 84%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cho, S.; Kim, H.-S.; Kim, H. Locally Specified CPT Soil Classification Based on Machine Learning Techniques. Sustainability 2023, 15, 2914. https://doi.org/10.3390/su15042914

AMA Style

Cho S, Kim H-S, Kim H. Locally Specified CPT Soil Classification Based on Machine Learning Techniques. Sustainability. 2023; 15(4):2914. https://doi.org/10.3390/su15042914

Chicago/Turabian Style

Cho, Sohyun, Han-Saem Kim, and Hyunki Kim. 2023. "Locally Specified CPT Soil Classification Based on Machine Learning Techniques" Sustainability 15, no. 4: 2914. https://doi.org/10.3390/su15042914
