Peer-Review Record

OptiFusionStack: A Physio-Spatial Stacking Framework for Shallow Water Bathymetry Integrating QAA-Derived Priors and Neighborhood Context

Remote Sens. 2025, 17(22), 3712; https://doi.org/10.3390/rs17223712
by Wei Shen 1,2, Jinzhuang Liu 1,2, Xiaojuan Li 3,4, Dongqing Zhao 5, Zhongqiang Wu 6,* and Yibin Xu 1,2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 29 September 2025 / Revised: 1 November 2025 / Accepted: 10 November 2025 / Published: 14 November 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This study proposes the OptiFusionStack framework, which couples QAA-derived inherent optical properties with 9 × 9 neighbourhood spatial statistics under the condition of no in-situ optical data and performs shallow-water bathymetry retrieval through a two-stage “physics-based base-model – stacked MLP” architecture. Compared with pixel-wise approaches and end-to-end deep-learning models, the framework raises R² to 0.92 across three optically contrasting sites and markedly suppresses spatial artefacts, demonstrating the high accuracy and strong generalisability of a physio-spatial strategy in data-scarce scenarios. Nevertheless, several issues in the current version remain and major revision is recommended before publication. Specific comments are as follows:

  1. Significance of Topic

(1) The manuscript focuses on satellite-derived bathymetry (SDB), which is crucial for nautical charting, coastal engineering, habitat monitoring and climate-adaptation planning.

(2) It explicitly tackles the dual bottlenecks of “physical ambiguity” and “spatial incoherence” that plague pixel-wise models in optically complex waters, offering a clear scientific gap.

(3) The “zero in-situ optical data” scenario is common in developing countries and remote reefs; the proposed solution is therefore highly relevant for operational mapping.

  2. Organization and logic

(1) The paper is well organized and progresses logically; the Methods section advances step-by-step from feature engineering, model construction to validation, with a clear technical pipeline.

(2) Results section is overly lengthy: several figures (e.g., scatter plots) are highly repetitive and could be condensed. Figures 6 & 7 contain eight scatter plots per site that differ only marginally.

(3) Discussion lacks “failure-case” analysis: the Discussion is in-depth but does not adequately analyse model failure scenarios; a short subsection “Limitations & failure scenarios” with at least one negative example or sensitivity curve should be added.

  3. Methodological soundness

(1) The two-stage stacking architecture is reasonable: Level-1 consists of physics-guided base learners (RF, XGBoost, SVR, CatBoost), while Level-2 fuses spatial-statistical features via a meta-MLP, offering interpretability.

(2) QAA stability: QAA_v6 is used without local optical calibration. Please insert a paragraph (or supplementary figure) showing QAA-derived Kd(490) vs. field Secchi depth (or published climatology) to demonstrate that IOP magnitudes are at least physically plausible in each site.

(3) Fixed 9 × 9 window: While a sensitivity analysis is provided, a single fixed scale may be sub-optimal for reefs (sharp relief) vs. gentle deltas. Action: discuss adaptive windows (e.g., standard-deviation-weighted, or multi-scale fusion) as future work.

(4) CNN similarity: Reviewers may equate 9 × 9 neighbourhood statistics + MLP to a shallow CNN. Clarify in the Discussion that (i) no learnable convolution kernel exists, (ii) features are hand-crafted statistics, (iii) no weight sharing or gradient back-propagation through the spatial domain is performed—thus fundamentally different from CNNs.
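The distinction drawn in point (4) can be illustrated with a minimal sketch of hand-crafted neighbourhood statistics. This is not the authors' code: the function name `neighbourhood_stats`, the pure-Python grid representation and the truncation of the window at image borders are all assumptions made for illustration. The point is only that every output is a fixed function of the pixel values, with no learnable kernel or shared weights.

```python
from statistics import mean, pstdev

def neighbourhood_stats(grid, row, col, window=9):
    """Return (mean, std, min, max) of the window centred on (row, col).

    Assumption of this sketch: the window is truncated at the image border.
    Nothing here is trainable; the statistics are fixed, hand-crafted features.
    """
    half = window // 2
    n_rows, n_cols = len(grid), len(grid[0])
    values = [
        grid[r][c]
        for r in range(max(0, row - half), min(n_rows, row + half + 1))
        for c in range(max(0, col - half), min(n_cols, col + half + 1))
    ]
    return mean(values), pstdev(values), min(values), max(values)

# Toy 3 x 3 "band" with a 3 x 3 window centred on the middle pixel.
band = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0]]
stats = neighbourhood_stats(band, 1, 1, window=3)
```

In a real pipeline these statistics would more likely be computed with an array library over whole scenes; the loop form is used here only to keep the sketch dependency-free.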

  4. Experimental design and results

(1) Temporal consistency: A time gap exists between some images and field surveys. While mentioned in the Discussion, future work should specify an acceptable temporal window or introduce time-sensitivity analysis.

(2) Depth-stratified error: Currently only global RMSE is reported, but error distribution across depth bins is not analyzed. Provide a small box-plot or line-figure showing RMSE (or relative error) binned into 0–2 m, 2–5 m, 5–10 m, >10 m to reveal possible shallow-water bias or deep-water saturation.
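The depth-stratified analysis requested in point (2) could be computed along the following lines. This is a sketch only: the function name `rmse_by_depth_bin` is hypothetical, bins are assigned from the reference depth (an assumption), and the sample numbers are invented for illustration rather than taken from the manuscript.

```python
import math

# Depth bins suggested in the comment: 0-2 m, 2-5 m, 5-10 m, >10 m.
BIN_EDGES = [0.0, 2.0, 5.0, 10.0, math.inf]

def rmse_by_depth_bin(true_depths, predicted_depths, edges=BIN_EDGES):
    """Return {bin label: RMSE}, assigning each sample by its reference depth."""
    squared_errors = {}
    for truth, pred in zip(true_depths, predicted_depths):
        for lower, upper in zip(edges[:-1], edges[1:]):
            if lower <= truth < upper:
                if math.isinf(upper):
                    label = f">{lower:g} m"
                else:
                    label = f"{lower:g}-{upper:g} m"
                squared_errors.setdefault(label, []).append((pred - truth) ** 2)
                break
    return {label: math.sqrt(sum(v) / len(v)) for label, v in squared_errors.items()}

# Made-up depths (metres) purely for illustration; not results from the paper.
truth = [1.0, 1.5, 3.0, 4.0, 7.0, 12.0]
pred = [1.2, 1.3, 3.5, 3.5, 8.0, 10.0]
binned = rmse_by_depth_bin(truth, pred)
```

The resulting dictionary maps directly onto the box-plot or line figure the comment asks for, with one value per depth bin.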

  5. Writing and presentation

(1) English is generally fluent; however, several sentences exceed 40 words and contain nested clauses. Break these into two shorter sentences to improve readability.

(2) Figure density: Figures 5 & 6 & 7 each contain at least eight sub-panels with identical colour scales and very similar point clouds. Consider: (i) removing grid-lines inside panels, (ii) using only outline scatter points (no fill) to reduce ink, (iii) adding a thin vertical gap between sub-panels to visually separate them.

Author Response

Dear Reviewer,

        I fully agree with and am pleased to accept your suggested revisions to the manuscript. Based on your feedback, I have made detailed modifications to the paper, striving to address your concerns as thoroughly as possible. However, due to the volume of responses, it is not feasible to list all revisions within this email. Please refer to the attached document containing the response to your review comments for a comprehensive overview of the changes. This will facilitate your further review of the updated manuscript.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The paper proposes an OptiFusionStack modeling framework for SDB, addressing physical ambiguity from variable water quality and spatial incoherence from ignoring geographic context. Overall, the paper exhibits disorganized writing and presentation. The methodology section contains significant issues. Specific comments are provided below.

  1. The introduction section has significant issues. Recent research literature on SDB should be incorporated into the introduction. Additionally, acronyms should be spelled out in full upon their first appearance, such as QAA.
  2. It is recommended that Figure 2 be placed at the beginning of the Methods section.
  3. What is the theoretical basis for assigning values to certain parameters in Formulas 1–11? For example, why are g0 and g1 set to 0.089 and 0.1245, respectively?
  4. The OptiFusionStack Modeling Framework should be expanded and clearly presented to facilitate a better understanding of it.
  5. In Section C of Figure 2, what is the Baed Map?
  6. What are the training water depth ground truths for RF, SVR, XGBoost, and CatBoost when predicting water depth?
  7. The manuscript contains some elementary errors, such as the labeling of equation (6). Additionally, the equations are incorrectly ordered.
  8. Line 322 contains an error; it should be (13)-(16).
  9. The resolution of the figures in the experimental section is poor. It is recommended to provide experimental results with higher resolution.
  10. The language and grammar of the paper should be revised and polished by a professional editing service.
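For context on comment 3: in the published Quasi-Analytical Algorithm, g0 and g1 are the coefficients of a quadratic that relates below-surface remote-sensing reflectance rrs to the ratio u = bb / (a + bb). The sketch below assumes the manuscript's formulas follow that standard step, which this record does not confirm. The function names and the input reflectance are illustrative only.

```python
import math

G0 = 0.089   # values quoted in the reviewer's comment
G1 = 0.1245

def below_surface_rrs(Rrs):
    """Convert above-surface Rrs to below-surface rrs (standard QAA step)."""
    return Rrs / (0.52 + 1.7 * Rrs)

def u_from_rrs(rrs, g0=G0, g1=G1):
    """Solve rrs = g0*u + g1*u**2 for the positive root u = bb / (a + bb)."""
    return (-g0 + math.sqrt(g0 ** 2 + 4.0 * g1 * rrs)) / (2.0 * g1)

# Illustrative reflectance only (sr^-1); not a value from the manuscript.
rrs = below_surface_rrs(0.005)
u = u_from_rrs(rrs)

# Substituting u back into the quadratic should recover rrs.
reconstructed = G0 * u + G1 * u ** 2
```

Because g0 and g1 are empirical model coefficients rather than site-calibrated values, the reviewer's request for their theoretical basis amounts to asking which published QAA version and derivation the authors relied on.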

Author Response

Dear Reviewer,
      I fully agree with and am pleased to accept your suggested revisions to the manuscript. Based on your feedback, I have made detailed modifications to the paper, striving to address your concerns as thoroughly as possible. However, due to the volume of responses, it is not feasible to list all revisions within this email. Please refer to the attached document containing the response to your review comments for a comprehensive overview of the changes. This will facilitate your further review of the updated manuscript.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript titled "OptiFusionStack: A Physio-Spatial Stacking Framework for Shallow Water Bathymetry Integrating QAA-Derived Priors and Neighborhood Context" may be accepted pending major revisions.

Introduction

 

The authors should provide a clearer and more detailed explanation of the optical types and models applied in this research.

 

Since optical imagery can sometimes be affected by errors related to image quality and meteorological conditions, the authors are encouraged to elaborate further on how such issues were addressed or minimized.

 

The process of cross-validation also requires better clarification—specifically, how it was implemented and verified within the experimental design.

 

Additionally, the proposed bathymetry analysis methodology appears conceptually similar to the Normalized Difference Snow Index (NDSI). Therefore, the authors should explain more precisely the distinct methodological principles and innovations introduced in this study.

 

In light of the above comments, it is recommended that the authors consult and cite a relevant research paper that could strengthen their methodological approach and provide additional context for the techniques employed.

 

I recommend extending the Introduction.

Fig. 1: Why did the authors use geographical coordinates for these small study areas? Please explain.

The Landsat 8 and 9 missions have an average spatial resolution of 30 m. How were the authors able to analyze data from this imagery successfully? Please explain.

How did the authors address the site dependence of the optimal window size?

How did the authors address the window-size sensitivity?

 

Advances of this paper

Hybrid Physio-Spatial Framework:

The paper introduces OptiFusionStack, a novel two-stage framework that combines physical priors (from QAA-derived Inherent Optical Properties) with spatial context through neighborhood statistics. This fusion enables both physical interpretability and spatial coherence.

 

Superior Accuracy Without In-Situ Calibration:

The model achieves high performance (R² of about 0.92) even without local optical calibration data, addressing a major challenge in real-world bathymetry where in-situ data are often unavailable.

 

Spatial Coherence and Artifact Reduction:

OptiFusionStack suppresses noise and patchiness common in pixel-wise models, generating smooth, geophysically consistent bathymetric surfaces by incorporating multi-scale neighborhood information.

 

Adaptive Generalization Across Diverse Environments:

The framework demonstrates strong transferability across optically clear (Qilian Islands) and turbid (Yellow River) waters, adapting its reliance on spectral versus physical features based on environmental conditions.

 

Robust Cross-Validation Design:

The use of spatial block cross-validation instead of random sampling ensures geographic independence between training and validation data, yielding realistic performance estimates and avoiding spatial leakage.
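A minimal sketch of what spatial block cross-validation means in practice is given below. It is not the authors' implementation: the square-block scheme, the round-robin assignment of blocks to folds, and the names `assign_block` and `spatial_block_folds` are assumptions for illustration. The property being shown is that all samples from one block fall on the same side of each split.

```python
def assign_block(x, y, block_size):
    """Map a coordinate to the index of the square block that contains it."""
    return (int(x // block_size), int(y // block_size))

def spatial_block_folds(points, block_size, n_folds):
    """Yield (train_idx, test_idx) so that no block is split across the two."""
    blocks = sorted({assign_block(x, y, block_size) for x, y in points})
    # Round-robin assignment of whole blocks to folds (an arbitrary choice).
    fold_of_block = {block: i % n_folds for i, block in enumerate(blocks)}
    for fold in range(n_folds):
        test = [i for i, (x, y) in enumerate(points)
                if fold_of_block[assign_block(x, y, block_size)] == fold]
        test_set = set(test)
        train = [i for i in range(len(points)) if i not in test_set]
        yield train, test

# Hypothetical sample coordinates in metres, using 100 m blocks and two folds.
points = [(5, 5), (15, 5), (105, 5), (110, 8), (5, 105), (8, 110)]
folds = list(spatial_block_folds(points, block_size=100, n_folds=2))
```

With random sampling, neighbouring pixels from the same block could appear in both training and validation sets, which is the spatial leakage the comment refers to.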

 

Benchmarking Against Deep Learning Models:

When compared to end-to-end deep architectures (ResMLP, Conv1DNet, Transformer), the proposed hierarchical model consistently outperformed them, validating the superiority of the “divide-and-conquer” stacking design.

 

Model Interpretability via SHAP Analysis:

SHAP value interpretation confirms that spatial context dominates predictive importance, while physical priors and base-model outputs refine predictions—enhancing transparency and scientific credibility.

Limitations that the authors need to address

How did the authors address the differences between true and real colors?

 

Window Size Sensitivity

The method depends heavily on the chosen neighborhood window (e.g., 9×9 pixels). If the window is too small, it fails to capture sufficient spatial context; if too large, it incorporates irrelevant or noisy information from distant, uncorrelated areas. This trade-off is often site-specific and may not generalize well.

 

Spatial Smoothing vs. Local Detail Loss

While computing mean, standard deviation, minimum, and maximum over local neighborhoods improves spatial coherence, it can also smooth out small-scale bathymetric variations. This may lead to an underestimation of depth gradients or loss of fine seabed structures.
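The smoothing concern can be made concrete with a one-dimensional toy profile. This sketch isolates the neighbourhood-mean feature only; the framework described in the reports also uses other statistics and base-model predictions, so the sketch illustrates the reviewer's worry rather than the model's actual output. The depth values and function names are invented.

```python
def moving_mean(profile, window=9):
    """Mean over a centred window, truncated at the ends of the profile."""
    half = window // 2
    out = []
    for i in range(len(profile)):
        segment = profile[max(0, i - half): i + half + 1]
        out.append(sum(segment) / len(segment))
    return out

def max_gradient(profile):
    """Largest absolute depth change between adjacent pixels."""
    return max(abs(b - a) for a, b in zip(profile, profile[1:]))

# Toy profile: a sharp step from 2 m to 10 m depth (e.g. a reef edge).
profile = [2.0] * 15 + [10.0] * 15
smoothed = moving_mean(profile)

original_step = max_gradient(profile)    # 8 m between adjacent pixels
smoothed_step = max_gradient(smoothed)   # the step is spread over 9 pixels
```

With a 9-pixel window the 8 m step is spread across nine pixels, so the steepest adjacent change in the mean feature drops to 8/9 m.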

 

Edge and Boundary Effects

At image borders or near sharp topographic transitions (e.g., reef edges, estuarine boundaries), the fixed-size window cannot be symmetrically applied. This can introduce edge artifacts or discontinuities in the resulting spatial features.
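As a small worked example of the border issue, the sketch below counts how many pixels a 9 × 9 window actually covers when it is truncated at the image boundary. Truncation is only one possible policy (padding or mirroring are alternatives), and the reports do not state which one the manuscript uses; the function name and image size are illustrative.

```python
def window_pixel_count(row, col, n_rows, n_cols, window=9):
    """Number of image pixels inside a centred window truncated at the border."""
    half = window // 2
    rows = min(n_rows, row + half + 1) - max(0, row - half)
    cols = min(n_cols, col + half + 1) - max(0, col - half)
    return rows * cols

# Hypothetical 100 x 100 pixel image.
interior = window_pixel_count(50, 50, 100, 100)  # full window
edge = window_pixel_count(0, 50, 100, 100)       # top edge: 5 rows x 9 columns
corner = window_pixel_count(0, 0, 100, 100)      # corner: 5 rows x 5 columns
```

Statistics at a corner pixel are therefore computed from fewer than a third of the samples available in the interior, which is one way the asymmetry described above can show up as edge artifacts.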

 

Non-stationarity Across Environments

The optimal spatial window and statistical descriptors may differ between clear, turbid, and mixed waters. A single, fixed multi-scale configuration might not adapt effectively across diverse optical conditions.

 

Increased Computational Complexity

The extraction of neighborhood statistics for multiple layers (spectral bands, IOPs, base predictions) across large Sentinel-2 scenes substantially increases processing time and memory requirements, which could limit operational scalability.

 

Potential Multicollinearity in Spatial Features

Many neighborhood statistics (mean, min, max) across correlated bands or base models may introduce redundancy and multicollinearity, complicating feature importance analysis and potentially confusing the meta-learner.
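One simple way to screen for the redundancy described here is to flag feature pairs with a very high pairwise correlation. The sketch below is a generic check, not something the manuscript is known to do; the feature names, values and the 0.95 threshold are all invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def redundant_pairs(features, threshold=0.95):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = list(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs

# Invented neighbourhood features for five pixels; blue_max tracks blue_mean.
features = {
    "blue_mean": [0.10, 0.12, 0.15, 0.20, 0.26],
    "blue_max": [0.12, 0.14, 0.17, 0.22, 0.28],
    "blue_std": [0.03, 0.01, 0.04, 0.01, 0.03],
}
pairs = redundant_pairs(features)
```

Pairwise correlation only catches two-feature redundancy; a variance inflation factor or a grouped importance analysis would be needed to detect collinearity spread across several features.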

 

Lack of Physical Interpretability

Although spatial features improve coherence, their physical meaning is indirect. For example, a high neighborhood standard deviation may indicate rough bathymetry or noise, but the model does not explicitly distinguish between them, reducing interpretability.

 

 

Did the authors use a pixel-swapping methodology?

 

Conclusion

The authors should extend the Conclusion section to provide a more comprehensive synthesis of the study, including a clearer summary of the key findings and their scientific significance.

 

Overall Recommendations

The paper can be accepted after a Major revision to address the specified points.

Reviewer# 2

Comments for author File: Comments.pdf

Author Response

Dear Reviewer,
I fully agree with and am pleased to accept your suggested revisions to the manuscript. Based on your feedback, I have made detailed modifications to the paper, striving to address your concerns as thoroughly as possible. However, due to the volume of responses, it is not feasible to list all revisions within this email. Please refer to the attached document containing the response to your review comments for a comprehensive overview of the changes. This will facilitate your further review of the updated manuscript.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have well addressed all my comments. I have no more concerns.

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript "OptiFusionStack: A Physio-Spatial Stacking Framework for Shallow Water Bathymetry Integrating QAA-Derived Priors and Neighborhood Context" is suitable for acceptance in its current form.

The paper is now entirely suitable for acceptance.

The Abstract is now much better written and is acceptable.

The reasons I accepted this paper after the first round are:

High scientific originality

The paper introduces OptiFusionStack, a novel two-stage framework that unifies physics-informed optical priors (from the Quasi-Analytical Algorithm, QAA) with multi-scale spatial context derived from satellite data.

 

This integration bridges the gap between physics-based semi-analytical models and data-driven machine learning, overcoming the limitations of pixel-wise bathymetric mapping.

 

It enables the framework to produce spatially coherent and physically interpretable bathymetric maps even without in-situ calibration data.

 

OptiFusionStack employs a two-level ensemble learning design:

 

Stage 1: Multiple base learners (Random Forest, XGBoost, SVR, CatBoost) trained with QAA-derived Inherent Optical Properties (IOPs).

 

Stage 2: A StackingMLP meta-learner fuses base predictions with 9×9 neighborhood statistics to model spatial dependencies.

This architecture markedly enhances accuracy and spatial consistency (R² up to 0.9167) compared to conventional pixel-wise and monolithic deep learning models.

 

The framework demonstrates high transferability across optically diverse environments—from clear offshore waters to highly turbid river systems.

 

It works without local in-situ calibration, addressing a major operational limitation in coastal and inland water mapping.

 

Its results show reduced spatial artifacts, enhanced map coherence, and adaptability to varying optical conditions.

The number of references is now sufficient.

I have accepted the manuscript in full

Sincerely,

Reviewer#1

Comments for author File: Comments.docx
