Peer-Review Record

Machine Learning-Based Approach for CPTu Data Processing and Stratigraphic Analysis

by Helena Paula Nierwinski 1,*, Arthur Miguel Pereira Gabardo 1, Ricardo José Pfitscher 1, Rafael Piton 2, Ezequias Oliveira 2 and Marieli Biondo 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4: Anonymous
Reviewer 5: Anonymous
Submission received: 15 May 2025 / Revised: 18 July 2025 / Accepted: 30 July 2025 / Published: 6 August 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors present an interesting study on the implementation of machine learning for CPTu data processing and stratigraphic analysis. The application of machine learning in geotechnical engineering is an important and emerging concept. The manuscript has only a few minor issues, listed below:

(1) Add a flowchart showing the methodology of the work. This will help readers understand the methodology better and may also increase the number of citations the article receives.

(2) The study is conducted on a dam in Brazil. Provide the location details of the dam. If possible, provide a figure showing its longitude and latitude.

(3) In the Discussion, compare your study with published studies. Also state the limitations of the study.

(4) Check that the sources of all datasets are mentioned and cited in the manuscript.

All the best.

Author Response

Please find attached the response letter.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript presents a machine learning-based framework to enhance stratigraphic interpretation from CPTu data and to capture the internal heterogeneity typical of mining tailings deposits. It provides a new method for more informed geomechanical modeling, dam monitoring, and design. I have only a few minor revisions, which I hope will help improve the manuscript.
1. Line 319: A preprocessing step is performed to remove invalid entries. What specific criteria are used to determine whether data are invalid?
2. Figure 3 presents the domain of values. Is there any basis for this range of values?
3. Figure 5 should have subfigure labels, like (a), (b), (c), and (d) in Figure 4.
4. The coordinate axes in Figure 6 should be labeled.
5. The Conclusions state that k-means and MeanShift were the most effective methods for detecting geotechnically significant stratigraphic layers. However, as presented in the Discussion, these methods also have their own shortcomings and conditions of applicability, which should also be noted in the Conclusions.

Author Response

Please find attached the response letter.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

1. Enhance Discussion on Clustering Limitations: While the study effectively demonstrates the utility of k-means and MeanShift, expand the discussion of their limitations in capturing subtle stratigraphic transitions, especially in highly heterogeneous tailings. Compare these limitations with those of traditional Ic methods to strengthen the argument for integrative approaches.

2. Clarify Feature Selection Rationale: The decision to exclude fs from clustering requires more detailed justification. Present statistical analyses (e.g., correlation matrices, variance inflation factors) to demonstrate how fs introduces noise, and verify whether this exclusion affects specific stratigraphic layers.

3. Expand Temporal Analysis: The 19-year dataset offers a unique temporal perspective, but the current analysis focuses on two soundings. Consider adding a broader temporal comparison across multiple CPTu profiles to generalize how clustering captures consolidation and deposition trends over time.

4. Validate with Laboratory Data: Strengthen the validation by integrating laboratory test results (e.g., grain size analysis, compressibility tests) with clustering outputs. This would establish a direct link between machine learning-derived strata and physical soil properties.

5. Address Computational Efficiency: For practical implementation, discuss the computational efficiency of k-means and MeanShift across large CPTu datasets. Compare processing times and memory requirements with other algorithms to highlight scalability.

Author Response

Please find attached the response letter.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

This study presents a commendable effort to apply machine learning (ML) techniques to CPTu data for stratigraphic analysis in tailings dams, addressing a critical gap in geotechnical characterization. The integration of unsupervised clustering algorithms (k-means, MeanShift, DBSCAN, Affinity Propagation) to interpret CPTu profiles demonstrates innovation, particularly in leveraging ML to capture depositional heterogeneity missed by traditional indices like Ic. The use of a 19-year dataset from an iron tailings dam provides valuable temporal context, and the validation against Ic-based profiles offers practical relevance. While the methodology shows promise, clarification of ML model calibration, data normalization, and field validation would strengthen the conclusions. The study contributes to advancing data-driven approaches in geotechnical engineering, though further exploration of spatial variability and real-world application cases is warranted.

Major comments are summarized as:

  1. How were invalid CPTu measurements identified and removed, and what criteria defined "out-of-distribution" values? Were any regional calibration factors applied to adjust for local soil behaviors?
  2. Why was fs excluded from the final ML models despite its relevance to soil-friction behavior? Does excluding fs affect the differentiation between soil types with similar qt and u2 values?
  3. For k-means, the elbow method suggested k=4, but the silhouette index favored k=3. What geotechnical justifications prioritized k=4 over statistical metrics?
  4. When comparing 2005 and 2024 CPTu profiles, how was the vertical offset from dam raises quantified? Were adjustments made for consolidation-induced thickness changes?
  5. Beyond Ic, were ML-derived clusters validated against core samples, lab tests, or independent in-situ measurements (e.g., SPT, vane shear)?
  6. The dataset is from a single Brazilian tailings dam. How transferable are the clustering results to other tailings deposits with different mineralogies or deposition histories?
  7. Affinity Propagation showed over-segmentation. Given its quadratic complexity, was the algorithm optimized for large CPTu datasets (155,000 records)?
  8. Did the ML models account for depth-dependent soil properties (e.g., stress history, overburden pressure) when defining clusters?
  9. Pore pressure u2 was included, but how were seasonal water table fluctuations or drainage effects during tailings deposition incorporated?
  10. Could clusters be linked to specific operational practices (e.g., discharge rate, particle size segregation) during dam construction?
  11. The study focuses on 1D profiles. Are there plans to extend the methodology to 3D stratigraphic modeling using spatial interpolation of ML clusters?

Author Response

Please find attached the response letter.

Author Response File: Author Response.pdf

Reviewer 5 Report

Comments and Suggestions for Authors

Dear authors,

Cone Penetration Tests with pore pressure measurements (CPTu) provide detailed subsurface soil profiles essential for geotechnical investigations. However, traditional interpretation methods can be time-consuming and may not fully capture complex soil stratigraphy, especially in heterogeneous deposits. Recent advances in machine learning offer powerful tools to automate and enhance the processing of CPTu data, enabling more accurate and efficient stratigraphic analysis. By applying unsupervised clustering algorithms to CPTu datasets, this approach improves the identification of soil layers and geotechnical units, supporting better ground modeling, site characterization, and engineering design. This methodology leverages large in-situ datasets and computational techniques to provide a replicable framework for interpreting complex subsurface conditions with greater precision and reduced subjectivity.

The article titled "Machine Learning-Based Approach for CPTu Data Processing and Stratigraphic Analysis" contains a number of unclear points.

The authors are invited to answer the following questions to improve the quality of the article before it is considered for publication:

1-In geotechnical site investigations, what are Cone Penetration Tests with Pore Pressure Measurements (CPTu) and why are they significant?

2-Why do mining tailings deposits not respond well to conventional interpretation techniques such as the Soil Behavior Type Index (Ic)?


3-What is the primary goal of the research that the abstract describes?

 

4-Which machine learning clustering methods were assessed in this work, and what factors led to the selection of these specific algorithms?

5-What are the main features of the dataset used in this study, such as the quantity of soundings, duration, and location, and how was it gathered?


6-What standards were applied to evaluate the clustering algorithms' performance?

7-What particular geological properties might be identified by the clustering algorithms that yielded the most reliable stratigraphic segmentation?

8-In this case, how did DBSCAN and Affinity Propagation perform differently from k-means and MeanShift?


9-Which behavioral patterns, not identifiable by the Soil Behavior Type Index (Ic) alone, were revealed by the clustering techniques?

Author Response

Please find attached the response letter.

Author Response File: Author Response.pdf

Reviewer 6 Report

Comments and Suggestions for Authors

The study makes a significant contribution to advancing geotechnical stratigraphic analysis through machine learning, with strong applied implications for dam safety and subsurface modeling. The design, comparing multiple unsupervised clustering techniques on real, longitudinal CPTu data, is methodologically sound and relevant to the problem. The contribution is useful for specialized geotechnical and mining applications, but lacks broader generalization or breakthrough methodological advancement.

  • Methodology lacks transparency and reproducibility.
  • Data preprocessing steps (e.g., outlier removal, data filtering) are vaguely described.
  • The exclusion of the sleeve friction parameter (fs) due to variability is not adequately justified.

Some references lack full DOI links.

Once these issues are addressed, the manuscript could be publishable.

 

Comments on the Quality of English Language

The manuscript's language is clear overall, though a final technical editing pass is recommended to refine phrasing (especially for complex technical descriptions).

Author Response

Please find attached the response letter.

Author Response File: Author Response.pdf

Round 2

Reviewer 6 Report

Comments and Suggestions for Authors

The methodology is technically sound. The paper thoroughly describes data preprocessing, model tuning, and validation procedures.

Lack of quantitative external validation (e.g., borehole logs, lab tests) limits confidence in the "ground truth" of clusters.

Application of MeanShift to CPTu data is novel and not commonly documented in geotechnical literature.

The multi-step workflow (Figure 4) is logical and replicable. Figures 5–10 effectively illustrate the comparative results.

The stratigraphic consistency across 2005 and 2024 data enhances confidence in model reliability. Visual comparison between cluster profiles and Ic profiles (Figure 10) is thoughtfully executed.

 
