Machine Learning-Based Approach for CPTu Data Processing and Stratigraphic Analysis
Abstract
1. Introduction
2. Background and State of the Art
2.1. Mining Tailing and Geotechnical Challenge
2.2. Soil Stratigraphic Profiling
- $q_t$ is the corrected cone tip resistance;
- $f_s$ is the sleeve friction;
- $\sigma_{v0}$ is the total vertical overburden stress;
- $\sigma'_{v0}$ is the effective vertical overburden stress.
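To illustrate how the four quantities listed above are typically combined, the following minimal sketch computes the normalized cone resistance, the normalized friction ratio, and the soil behavior type index. It assumes the widely used Robertson formulation with a stress exponent of 1; the function name `soil_behavior_index` and the argument names (`q_t` for corrected cone tip resistance, `f_s` for sleeve friction, `sigma_v0` and `sigma_v0_eff` for total and effective vertical overburden stress) are hypothetical, and the exact variant adopted in this study may differ.

```python
import numpy as np


def soil_behavior_index(q_t, f_s, sigma_v0, sigma_v0_eff):
    """Return (Q_t, F_r, I_c) for CPTu readings given in consistent units (e.g., kPa).

    Assumption: standard Robertson-type normalization with stress exponent n = 1.
    """
    q_t = np.asarray(q_t, dtype=float)
    f_s = np.asarray(f_s, dtype=float)
    sigma_v0 = np.asarray(sigma_v0, dtype=float)
    sigma_v0_eff = np.asarray(sigma_v0_eff, dtype=float)

    q_net = q_t - sigma_v0            # net cone resistance
    Q_t = q_net / sigma_v0_eff        # normalized cone resistance (dimensionless)
    F_r = 100.0 * f_s / q_net         # normalized friction ratio (%)
    I_c = np.sqrt((3.47 - np.log10(Q_t)) ** 2 + (np.log10(F_r) + 1.22) ** 2)
    return Q_t, F_r, I_c
```

For example, a reading with `q_t = 1100`, `f_s = 10`, `sigma_v0 = 100`, and `sigma_v0_eff = 50` (all in kPa) gives a normalized cone resistance of 20, a friction ratio of 1%, and an index of approximately 2.49.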
2.3. Clustering Analysis
2.4. Related Work
3. Materials and Methods
3.1. Site Overview
3.2. Dataset Characterization
3.3. Clustering-Based Stratigraphic Profile
3.4. Implementation Details
- Pandas 2.2.3. An open-source data analysis and manipulation library widely used in machine learning projects [35]. We rely on Pandas for data management and processing.
- Scikit-learn 1.6.1. A library that provides a broad set of machine learning algorithms [36]. We used it to implement the clustering algorithms evaluated in this research.
- Matplotlib 3.10.3. A comprehensive library for creating visualizations in Python [37]. We rely on Matplotlib to generate most of the visualizations presented in this paper, including the stratigraphic profiles.
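The way these three libraries fit together can be sketched as follows. This is a minimal illustration rather than the code used in this study: the data are synthetic, the column names (`depth_m`, `q_t_kPa`, `f_s_kPa`, `u_kPa`) are hypothetical, and the standardization step and the choice of two clusters are assumptions made only for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for one CPTu sounding: two layers with contrasting response.
rng = np.random.default_rng(0)
n = 200
depth = np.linspace(0.0, 20.0, n)
upper = depth < 10.0
df = pd.DataFrame({
    "depth_m": depth,
    "q_t_kPa": np.where(upper, 1000.0, 4000.0) + rng.normal(0.0, 50.0, n),
    "f_s_kPa": np.where(upper, 10.0, 60.0) + rng.normal(0.0, 2.0, n),
    "u_kPa": np.where(upper, 20.0, 150.0) + rng.normal(0.0, 5.0, n),
})

# Pandas holds the readings; scikit-learn scales the features and clusters them.
features = ["q_t_kPa", "f_s_kPa", "u_kPa"]
X = StandardScaler().fit_transform(df[features])
df["cluster"] = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Matplotlib draws the clustered profile with depth increasing downward.
fig, ax = plt.subplots(figsize=(3, 6))
ax.scatter(df["q_t_kPa"], df["depth_m"], c=df["cluster"], s=8)
ax.invert_yaxis()
ax.set_xlabel("q_t (kPa)")
ax.set_ylabel("Depth (m)")
```

Scaling the features before clustering keeps the variable with the largest numeric range from dominating the Euclidean distance, which is why it is included in the sketch.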
4. Results
4.1. Model Selection and Tuning
4.2. Stratigraphic Profile Through Clustering
4.3. Stratigraphic Profile Based on Soil Behavior Index ($I_c$)
5. Discussion
- Model tuning and selection depend on data quality and expertise. In the second step of the conducted method (Figure 4), we removed the variable from the input values of the models, a decision based on the analysis of clustering results generated by the k-means algorithm. As unsupervised learning methods lack an objective approach for feature selection, such decisions depend on the expertise of analysts in interpreting the graphical results, which is subjective.
- The generated stratigraphic profiles reflect the behavior of the specific dam. The set of clusters generated depends on the data used for model training; consequently, it is not possible to infer that a similar behavior will appear in other dams. Thus, reproductions of this study must consider an entire execution of the proposed workflow (Figure 4).
- Pore pressure values reflect conditions at the time of testing, without accounting for seasonal fluctuations or drainage effects during deposition. As long-term hydrogeological monitoring data were not available, measurements were incorporated as instantaneous values, which may limit the interpretation of hydro-mechanical behavior and its temporal variability. Additionally, the analysis did not include a correlation between the identified clusters and specific operational practices (e.g., discharge rate or particle size segregation) during dam construction, as detailed historical operational records were unavailable.
- Lack of validation through laboratory tests to confirm the detected stratigraphic transitions. Ideally, the clustering results would be validated through tension evaluation of undisturbed samples collected while the tailings dam was being raised. However, the mining company did not conduct such tests in the past, and given the large area of the tailings dam and the depth of the samples, collecting undisturbed samples is now impractical.
- k-means and MeanShift exhibit intrinsic algorithmic constraints. k-means assumes Euclidean distance and a user-defined k, favoring compact, spherical clusters, which may split or merge transitional zones with gradual – shifts; MeanShift eliminates the need for k, but its results are highly sensitive to the kernel bandwidth: large bandwidths merge thin depositional veneers, whereas small ones fragment noise into spurious units.
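The bandwidth sensitivity noted in the last item can be reproduced on a toy example. The sketch below is an illustration under stated assumptions, not the analysis performed in this study: the data are three synthetic, well-separated groups of points, and the three bandwidth values are arbitrary choices made to show the effect.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Three synthetic groups: two neighbors (3 units apart) and one distant group.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [20.0, 0.0]])
X = np.vstack([c + rng.uniform(-0.3, 0.3, size=(50, 2)) for c in centers])

# Fit MeanShift with a small, an intermediate, and a large bandwidth.
n_clusters = {}
for bandwidth in (0.1, 1.0, 5.0):
    model = MeanShift(bandwidth=bandwidth).fit(X)
    n_clusters[bandwidth] = model.cluster_centers_.shape[0]
    print(f"bandwidth={bandwidth}: {n_clusters[bandwidth]} clusters")
```

In this toy setting, the intermediate bandwidth recovers the three groups, the large bandwidth merges the two neighboring groups into one, and the small bandwidth splits each group into several spurious units.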
6. Conclusions
- k-means and MeanShift were the most effective methods for detecting geotechnically significant stratigraphic layers;
- DBSCAN and Affinity Propagation showed limitations in dealing with vertical CPTu data, resulting in either under- or over-segmentation;
- The $I_c$ index, although widely used, may overlook internal variations linked to depositional history or consolidation, which clustering methods can identify;
- When aligned by depth and construction phase, clustered profiles from temporally distinct tests revealed consistent stratigraphic patterns.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
CPTu | Cone Penetration Tests with pore pressure measurements |
DBCV | Density-Based Clustering Validation |
DBSCAN | Density-Based Spatial Clustering of Applications with Noise |
$I_c$ | Soil Behavior Type Index |
PCA | Principal Component Analysis |
PDF | Probability Distribution Function |
References
- Lunne, T.; Powell, J.; Robertson, P. Cone Penetration Testing in Geotechnical Practice, 1st ed.; CRC Press: London, UK, 1997.
- White, D. CPT equipment: Recent advances and future perspectives. In Cone Penetration Testing 2022; CRC Press: Boca Raton, FL, USA, 2022; pp. 66–80.
- Adamo, N.; Al-Ansari, N.; Sissakian, V.; Laue, J.; Knutsson, S. Dam safety: The question of tailings dams. J. Earth Sci. Geotech. Eng. 2020, 11, 1–26.
- Robertson, P. Soil classification using the cone penetration test. Can. Geotech. J. 1990, 27, 151–158.
- Ekeanyanwu, C.V.; Obisakin, I.F.; Aduwenye, P.; Dede-Bamfo, N. Merging GIS and machine learning techniques: A paper review. J. Geosci. Environ. Prot. 2022, 10, 61–83.
- Haghshenas, S.S.; Haghshenas, S.S.; Geem, Z.W.; Kim, T.H.; Mikaeil, R.; Pugliese, L.; Troncone, A. Application of harmony search algorithm to slope stability analysis. Land 2021, 10, 1250.
- Zhou, B.; Li, C.; Andrade, J.E. Autonomous particle-shape-based classification and identification of calcareous soils through machine learning. Géotechnique 2024, in press.
- Nierwinski, H.P.; Pfitscher, R.J.; Barra, B.S.; Menegaz, T.; Odebrecht, E. A practical approach for soil unit weight estimation using artificial neural networks. J. S. Am. Earth Sci. 2023, 131, 104648.
- Vick, S. Planning, Design, and Analysis of Tailings Dams; BiTech: Richmond, BC, Canada, 1990.
- Chropeňová, D.; Slávik, I. Raising of Embankment of an Ore Tailings Pond and an Analysis of its Stability. Slovak J. Civ. Eng. 2023, 31, 24–33.
- Lyu, Z.; Chai, J.; Xu, Z.; Qin, Y.; Cao, J. A Comprehensive Review on Reasons for Tailings Dam Failures Based on Case History. Adv. Civ. Eng. 2019, 2019, 4159306.
- Robertson, P.K.; Melo, L.; Williams, D.J.; Wilson, G.W. Report of the Expert Panel on the Technical Causes of the Failure of Feijão Dam I. Technical Report. B1 Technical Investigation Panel. 2019. Available online: https://bdrb1investigationstacc.z15.web.core.windows.net/assets/Feijao-Dam-I-Expert-Panel-Report-ENG.pdf (accessed on 31 July 2025).
- Liu, L.L.; Wang, Y. Quantification of stratigraphic boundary uncertainty from limited boreholes and its effect on slope stability analysis. Eng. Geol. 2022, 306, 106770.
- Jewell, R.J.; Fourie, A.B. Paste and Thickened Tailings: A Guide, 3rd ed.; Jewell, R.J., Fourie, A.B., Eds.; Australian Centre for Geomechanics, University of Western Australia: Nedlands, Australia, 2015.
- Wroth, C.P. The interpretation of in situ soil tests. Géotechnique 1984, 34, 449–489.
- Ezugwu, A.E.; Ikotun, A.M.; Oyelade, O.O.; Abualigah, L.; Agushaka, J.O.; Eke, C.I.; Akinyelu, A.A. A comprehensive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects. Eng. Appl. Artif. Intell. 2022, 110, 104743.
- Nazareth, A.F.D.V.; Lana, M.S. A methodology for the definition of geotechnical mine sectors based on multivariate cluster analysis. Geotech. Geol. Eng. 2021, 39, 4405–4426.
- Frey, B.J.; Dueck, D. Clustering by passing messages between data points. Science 2007, 315, 972–976.
- Min, E.; Guo, X.; Liu, Q.; Zhang, G.; Cui, J.; Long, J. A survey of clustering with deep learning: From the perspective of network architecture. IEEE Access 2018, 6, 39501–39514.
- MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Statistical Laboratory, University of California, 21 June–18 July 1965 and 27 December 1965–7 January 1966; University of California Press: Oakland, CA, USA, 1967; pp. 281–298.
- Celebi, M.E.; Kingravi, H.A.; Vela, P.A. A comparative study of efficient initialization methods for the k-means clustering algorithm. Expert Syst. Appl. 2013, 40, 200–210.
- Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; Volume 96, pp. 226–231.
- Schubert, E.; Sander, J.; Ester, M.; Kriegel, H.P.; Xu, X. DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Trans. Database Syst. 2017, 42, 1–21.
- Comaniciu, D.; Meer, P. Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 603–619.
- Cheng, Y. Mean shift, mode seeking, and clustering. IEEE Trans. Pattern Anal. Mach. Intell. 1995, 17, 790–799.
- Wang, K.; Zhang, J.; Li, D.; Zhang, X.; Guo, T. Adaptive affinity propagation clustering. arXiv 2008, arXiv:0805.1096.
- San Roman Iturbide, O.; Botero Jaramillo, E. Identification of geotechnical units in soil exploration through principal component analysis and clustering. Int. J. Numer. Anal. Methods Geomech. 2024, 48, 1681–1699.
- Sottile, M.G.; Crocker, J.A.; Roldan, L. Interpretation of CPTu data using machine learning techniques to develop the ground model of a dam. In Proceedings of the 7th International Conference on Geotechnical and Geophysical Site Characterization (ISC 24), Barcelona, Spain, 18–21 June 2024.
- Cho, S.; Cho, B.; Kang, S.; Kim, H. Development of locally specified soil stratification method with CPT data based on machine learning techniques. In Proceedings of the Geotechnics for Sustainable Infrastructure Development, Hanoi, Vietnam, 28–29 November 2019; Springer: Singapore, 2020; pp. 1287–1294.
- Shi, C.; Wang, Y. Nonparametric and data-driven interpolation of subsurface soil stratigraphy from limited data using multiple point statistics. Can. Geotech. J. 2021, 58, 261–280.
- Nierwinski, H.P.; Custodio, L.A.; Barbosa, A.S.; Pfitscher, R.J. Use of Artificial Intelligence to Obtain a Stratigraphic Profile of Tailings Dams from CPTu Tests. In Geo-EnvironMeet 2025; ASCE Library: Reston, VA, USA, 2025; pp. 315–323.
- Dauda, U.; Ismail, B. A study of normalization approach on K-means clustering algorithm. Int. J. Appl. Math. Stat. 2013, 45, 439–446.
- Abdulnassar, A.; Nair, L.R. A Comprehensive Study on the Importance of the Elbow and the Silhouette Metrics in Cluster Count Prediction for Partition Cluster Models. Rev.-Geintec-Gest. Inov. Tecnol. 2021, 11, 3792–3806.
- Moulavi, D.; Jaskowiak, P.A.; Campello, R.J.; Zimek, A.; Sander, J. Density-based clustering validation. In Proceedings of the 2014 SIAM International Conference on Data Mining, Philadelphia, PA, USA, 24–26 April 2014; SIAM: Philadelphia, PA, USA, 2014; pp. 839–847.
- McKinney, W. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28 June–3 July 2010; pp. 56–61.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Hunter, J.D. Matplotlib: A 2D graphics environment. Comput. Sci. Eng. 2007, 9, 90–95.
- Vendramin, L.; Campello, R.J.; Hruschka, E.R. Relative clustering validity criteria: A comparative overview. Stat. Anal. Data Min. ASA Data Sci. J. 2010, 3, 209–235.
$I_c$ Range | Soil Behavior Type |
---|---|
<1.31 | Gravelly sand to sand |
1.31–2.05 | Sand to silty sand |
2.05–2.60 | Silty sand to sandy silt |
2.60–2.95 | Clayey silt to silty clay |
>2.95 | Clay |
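The classification in this table can be expressed as a simple lookup. The sketch below is illustrative only: the function name `classify_ic` is hypothetical, the outer boundaries 1.31 and 2.95 are taken from the table, and the intermediate boundaries 2.05 and 2.60 are the commonly cited Robertson values, which are an assumption of this sketch.

```python
import numpy as np

# Class boundaries for the soil behavior type index.
# 1.31 and 2.95 come from the table; 2.05 and 2.60 are assumed (usual Robertson values).
IC_BOUNDS = [1.31, 2.05, 2.60, 2.95]
IC_LABELS = [
    "Gravelly sand to sand",
    "Sand to silty sand",
    "Silty sand to sandy silt",
    "Clayey silt to silty clay",
    "Clay",
]


def classify_ic(i_c):
    """Map index values to soil behavior type labels.

    A value exactly on a boundary is assigned to the class above it.
    """
    idx = np.digitize(np.atleast_1d(np.asarray(i_c, dtype=float)), IC_BOUNDS)
    return [IC_LABELS[i] for i in idx]
```

For instance, `classify_ic([1.0, 2.49, 3.2])` returns the first, third, and fifth labels of the table, respectively.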
Test | Num | Max Depth (m) | $q_t$ (kPa) | $f_s$ (kPa) | u (kPa) |
---|---|---|---|---|---|
CPTU04-21-1-2005 | 753 | −15.040 | 1138.03 | 10.80 | 27.55 |
CPTU05-15-1-2005 | 903 | −18.040 | 1931.26 | 17.11 | 55.72 |
CPTU06-1-2-2005 | 1015 | −20.280 | 1324.05 | 13.26 | 138.00 |
CPT00190-11-01-24 | 6305 | −31.810 | 5441.29 | 69.99 | 203.04 |
CPT00190B-13-01-24 | 5209 | −26.290 | 4185.61 | 50.40 | 37.79 |
CPT00190C-15-01-24 | 6315 | −31.810 | 3644.32 | 49.69 | 79.49 |
CPT00190G-06-02-24 | 6348 | −31.805 | 5157.49 | 43.32 | 113.02 |
CPT00191-31-07-24 | 6347 | −31.805 | 3339.42 | 105.94 | 461.91 |
CPT00191A-03-08-24 | 5158 | −25.860 | 3588.67 | 99.79 | 273.65 |
CPT00191B-17-08-24 | 904 | −4.595 | 3641.19 | 161.87 | 24.48 |
CPT00191C-19-08-24 | 1643 | −8.290 | 4006.88 | 128.28 | 23.00 |
CPT00191D-20-08-24 | 5441 | −27.280 | 3696.87 | 114.33 | 297.06 |
Model | Parameter | Values |
---|---|---|
k-Means | k | |
MeanShift | (bandwidth) | |
DBSCAN | $\varepsilon$; minPts | ; |
Affinity Propagation | preference | |
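A hyperparameter search of the kind summarized in this table can be sketched for k-means as follows. This is a minimal illustration under stated assumptions: the data are synthetic, the search range for k is hypothetical (the values used in this study are not reproduced here), and the silhouette coefficient is used as the selection criterion for the example only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data with three well-separated groups.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.uniform(-1.0, 1.0, size=(60, 2)) for c in centers])
X = StandardScaler().fit_transform(X)

# Evaluate each candidate k with the silhouette coefficient (higher is better).
scores = {}
for k in range(2, 7):  # hypothetical search range
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(f"best k by silhouette: {best_k}")
```

The same loop structure applies to the other models by replacing the estimator and the parameter being varied, for example the bandwidth for MeanShift or the neighborhood radius and minimum number of points for DBSCAN.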
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Nierwinski, H.P.; Gabardo, A.M.P.; Pfitscher, R.J.; Piton, R.; Oliveira, E.; Biondo, M. Machine Learning-Based Approach for CPTu Data Processing and Stratigraphic Analysis. Metrology 2025, 5, 48. https://doi.org/10.3390/metrology5030048