Optical Water Type Guided Benchmarking of Machine Learning Generalization for Secchi Disk Depth Retrieval

Jiang, Bo; Yang, Hanfei; Deng, Lin; Zhao, Jun

doi:10.3390/rs18020287


¹ School of Marine Sciences, Sun Yat-sen University, Zhuhai 519082, China

² Guangdong Research Institute of Water Resources and Hydropower, Guangzhou 510006, China

³ School of Geography and Remote Sensing, Guangzhou University, Guangzhou 510006, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Remote Sens. 2026, 18(2), 287; https://doi.org/10.3390/rs18020287

Submission received: 16 December 2025 / Revised: 9 January 2026 / Accepted: 9 January 2026 / Published: 15 January 2026

(This article belongs to the Special Issue Remote Sensing in Monitoring Coastal and Inland Waters)


Highlights

What are the main findings?

Model bias depended on the split: MDN had the smallest Cross-OWT overestimation (SSPB = 10.7%), and SHAP showed an NSMI threshold around 0.4–0.6.
Removing ratios/indices improved Cross-OWT robustness, especially for KNN (MdSA: from 96% to 40%).

What are the implications of the main findings?

Use deployment-relevant splits to assess SDD models (not only random splits).
Treat feature engineering as model- and scenario-dependent, particularly under optical-regime shifts.

Abstract

Secchi disk depth (SDD) is a widely used indicator of water transparency. However, existing retrieval models often suffer from limited transferability and biased predictions when applied to optically diverse waters. Here, we compiled a dataset of 6218 paired in situ SDD and remote sensing reflectance (R_rs) measurements to evaluate model generalization. We benchmarked nine machine learning (ML) models (RF, KNN, SVM, XGB, LGBM, CAT, RealMLP, BNN-MCD, and MDN) under three validation scenarios with progressively decreasing training-test overlap: Random, Waterbody, and Cross-Optical Water Type (Cross-OWT). Furthermore, SHAP analysis was employed to interpret feature contributions and relate model behaviors to optical properties. Results revealed a distinct scenario-dependent generalization gradient. Random splits yielded minimal bias. In contrast, Waterbody transfer consistently shifted predictions toward underestimation (SSPB: −16.9% to −3.8%). Notably, Cross-OWT extrapolation caused significant error inflation and a bias reversal toward overestimation (SSPB: 10.7% to 88.6%). Among all models, the Mixture Density Network (MDN) demonstrated superior robustness with the lowest overestimation (SSPB = 10.7%) under the Cross-OWT scenario. SHAP interpretation indicated that engineered indices, particularly NSMI, functioned as regime separators, with substantial shifts in feature attribution occurring at NSMI values between 0.4 and 0.6. Accordingly, feature sensitivity analysis showed that removing band ratios and indices improved Cross-OWT robustness for several classical ML models. For instance, the Median Symmetric Accuracy (MdSA; an error metric for which lower is better) of KNN fell from 96% to 40% after feature reduction. These findings highlight that model applicability must be evaluated under scenario-specific conditions, and feature engineering strategies require rigorous testing against optical regime shifts to ensure generalization.

Keywords:

Secchi disk depth (SDD); machine learning (ML); optical water types (OWTs); SHAP interpretability

1. Introduction

Water transparency is a fundamental property of aquatic ecosystems that regulates light penetration into the water column [1,2], thereby constraining the depth distribution of submerged macrophytes and the light environment for primary production [3]. Transparency is primarily governed by the optical interactions of phytoplankton, suspended particles, and colored dissolved organic matter (CDOM) [4]. In practice, Secchi disk depth (SDD) serves as the most common in situ metric for water clarity, providing a simple yet robust indicator of long-term environmental changes [5]. SDD records are essential for assessing eutrophication, sediment resuspension, and browning, thus supporting water resource management and ecosystem protection across lakes, reservoirs, rivers, and coastal waters [1,2,6,7,8].

Remote sensing offers the capacity to monitor SDD over large areas and long temporal periods. Existing retrieval approaches can be broadly categorized into empirical algorithms, semi-analytical algorithms (SAAs), and machine learning (ML) or deep learning (DL) models [3,5,9,10,11,12,13,14,15]. Empirical algorithms establish statistical relationships between SDD and remote sensing reflectance (R_rs), typically utilizing single bands or band ratios [16,17]. While these methods have been successfully applied to sensors such as MODIS, Landsat, and MERIS to reconstruct transparency time series [11,18,19], their transferability is often limited. The relationship between SDD and R_rs is inherently nonlinear, and variations in optical and biogeochemical conditions across waterbodies frequently lead to poor performance when empirical models calibrated in one region are applied to another [15,20,21]. Alternatively, SAAs estimate SDD by inverting inherent optical properties (IOPs) based on underwater visibility theory [4,5]. Although physically robust, SAAs generally require detailed optical information and regional parameter tuning, constraining their broad-scale applicability [21,22,23]. In contrast, ML and DL models learn nonlinear mappings directly from data. These data-driven approaches can integrate multi-region and multi-sensor datasets, showing promise for SDD estimation and trend analysis in diverse aquatic environments [14,15,24].

Despite these advancements, three critical challenges persist in ML-based SDD retrieval. First, training datasets often lack optical diversity and representativeness [22,25,26]. Many models are developed using data from a single lake or a limited set of waterbodies, which restricts the range of optical conditions learned during training [27]. Second, validation strategies frequently fail to reflect realistic generalization goals. Random sample splitting remains a common practice; however, this approach often leads to spatial autocorrelation when samples from the same or nearby stations appear in both training and test sets, resulting in overly optimistic performance estimates. Rigorous scenario-based evaluations, such as testing on unseen waterbodies or distinct optical regimes, remain relatively rare for SDD retrieval [27,28]. Third, model interpretability is often insufficient. High predictive accuracy does not reveal which spectral features drive the predictions or whether feature utilization aligns with bio-optical principles and remains stable under distribution shifts [29,30].

To address these challenges, optical water type (OWT) frameworks classify diverse waters into representative optical regimes based on spectral magnitude and shape [31,32]. This stratification facilitates the development of regime-specific algorithms and dynamic algorithm selection, particularly for parameters like chlorophyll-a and total suspended solids (TSS) [33,34,35]. SHAP (SHapley Additive exPlanations) has been adopted to interpret “black-box” models by quantifying feature contributions and revealing nonlinear effects in water quality retrieval [36,37]. However, the integration of OWT frameworks with interpretability tools to assess SDD generalization across optical regimes has not yet been fully explored.

In this study, we combined a comprehensive in situ dataset with an OWT-guided evaluation framework, multiple ML and neural network (NN) models, and SHAP analysis. Our primary objective was to benchmark the accuracy and generalization capability of SDD retrieval models under distinct validation scenarios and to elucidate the link between model behavior and water optical properties. Specifically, this study aimed to:

(1) Compare the performance of various ML and NN models under three validation scenarios (Random split, Waterbody split, and Cross-OWT testing) to characterize scenario-dependent variations in accuracy, bias, and error patterns;

(2) Identify the spectral bands and indices driving SDD predictions using SHAP analysis, compare feature utilization between tree-based models and NN models, and assess the consistency of dominant predictors with established bio-optical principles, thereby providing guidance for feature selection and model optimization in satellite-based SDD retrieval.

2. Materials and Methods

2.1. Global In Situ Datasets

We utilized two in situ datasets providing coincident water quality measurements and hyperspectral R_rs. The first dataset, GLORIA (Global reflectance community dataset for imaging and optical sensing of aquatic environments), is a global compilation of hyperspectral R_rs and co-located water-quality parameters for inland and coastal waters [38]. GLORIA includes 7572 R_rs spectra (350–900 nm at 1 nm resolution) from 450 inland and coastal waterbodies across 59 institutions. Each spectrum is paired with at least one water quality parameter, including chlorophyll-a (chl-a), total suspended matter (TSM), CDOM and SDD, covering a broad spectrum of optical and biogeochemical conditions.

The second dataset comprises satellite matchups of optical properties and water surface temperature (WST) from 18 lakes in China, collected across 586 sampling stations between 2020 and 2023 [39]. Measurements of R_rs, chl-a, TSM, SDD and WST were synchronized with satellite overpasses within a 1.5 h time window. The spatial distribution of sampling sites for both datasets is illustrated in Figure 1.

Prior to model development, rigorous quality control was applied. Samples exhibiting missing SDD values, significant spectral gaps, or non-physical R_rs values were excluded based on Quality Water Index Polynomial (QWIP) scores and threshold criteria [38,39]. Consequently, 6218 paired SDD–R_rs records were retained for analyses. To evaluate model generalization within a realistic satellite observation framework, hyperspectral R_rs was convolved to multispectral reflectance using the spectral response functions (SRFs) of the Sentinel-2 MultiSpectral Instrument (MSI) provided by the European Space Agency (ESA), Paris, France (Equation (1)). We computed band-averaged R_rs for six bands centered at 490, 560, 660, 705, 740, and 850 nm covering the visible to near-infrared (NIR) spectral range. Sentinel-2 was selected due to its high spatial resolution (10–20 m) and extensive application in inland water color remote sensing [1,22,26,40,41].

\mathrm{Multi}R_{rs}(\lambda) = \frac{\sum_{\lambda_{1}}^{\lambda_{2}} \mathrm{Hyper}R_{rs}(\lambda) \cdot \mathrm{SRF}(\lambda) \, d\lambda}{\sum_{\lambda_{1}}^{\lambda_{2}} \mathrm{SRF}(\lambda) \, d\lambda} \qquad (1)
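As an illustration of Equation (1), the following minimal sketch computes an SRF-weighted band average on a uniform 1 nm grid, where the dλ terms cancel between numerator and denominator. The Gaussian response in `gaussian_srf` and its 35 nm width are hypothetical stand-ins chosen for the example; the study used the tabulated Sentinel-2 MSI SRFs from ESA, which are not reproduced here.

```python
import math


def gaussian_srf(wavelengths, center, fwhm):
    """Hypothetical Gaussian SRF (assumption); real MSI SRFs are tabulated."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [math.exp(-0.5 * ((w - center) / sigma) ** 2) for w in wavelengths]


def band_average(hyper_rrs, srf):
    """SRF-weighted mean of a hyperspectral spectrum on a uniform grid."""
    if len(hyper_rrs) != len(srf):
        raise ValueError("spectrum and SRF must share the same wavelength grid")
    numerator = sum(r * s for r, s in zip(hyper_rrs, srf))
    denominator = sum(srf)
    return numerator / denominator


# 350-900 nm at 1 nm resolution, matching the GLORIA spectral range.
wavelengths = list(range(350, 901))
flat_spectrum = [0.01 for _ in wavelengths]
sloped_spectrum = [1e-5 * w for w in wavelengths]

srf_560 = gaussian_srf(wavelengths, center=560.0, fwhm=35.0)
flat_560 = band_average(flat_spectrum, srf_560)
sloped_560 = band_average(sloped_spectrum, srf_560)
```

A flat spectrum is returned unchanged, and a linear spectrum under a symmetric response returns its value at the band center, which gives two quick sanity checks for any SRF table substituted in.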

2.2. Optical Water Type Classification

The optical variability of natural waters poses significant challenges for developing a universally applicable SDD model. To provide a robust optical context for model evaluation, we classified spectra into OWTs following the scheme proposed by Bi and Hieronymi [32]. This framework defines inherent optical properties (IOPs) for pure water, phytoplankton, detritus, and CDOM, sampling concentrations within realistic ranges under both four-component and two-component modes. Corresponding R_rs spectra were generated via radiative transfer simulations and filtered by spectral shape constraints.

We implemented this classification by mapping each spectrum to three R_rs-derived features: Apparent Visible Wavelength (AVW), RGB trapezoidal area (ARGB), and Normalized Difference Index (NDI). Class membership was then assigned based on the Mahalanobis distance to class centers. While the original scheme defines ten OWTs, several classes contained sparse SDD data in our dataset. To maintain statistical robustness while preserving the primary optical gradients, we merged similar classes (1/2, 3a/3b, 4a/4b, 5a/5b) into four distinct types: type I (clear oceanic), type II (clear inland), type III (moderately eutrophic), and type IV (strongly eutrophic). We retained classes 6 and 7 as type V (turbid) and type VI (organic/CDOM-rich), respectively. A PERMANOVA-based permutation test on standardized feature space confirmed that this six-type scheme effectively preserved the dominant optical gradients (pseudo-F = 1750.37, R² = 0.585, p = 0.001). Sample sizes and SDD statistics for each OWT are summarized in Table 1.
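The nearest-center assignment by Mahalanobis distance can be sketched as follows. This is an illustration only: the class names, means, and covariance matrices in `classes` are hypothetical placeholders, not the published class parameters of the OWT scheme, and the small linear solver is included solely to keep the example free of external dependencies.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes a is square and non-singular (no singularity check is done).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean)."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    return sum(di * si for di, si in zip(d, solve(cov, d)))


def classify(x, classes):
    """Return the class whose center is nearest in Mahalanobis distance."""
    return min(classes, key=lambda name: mahalanobis_sq(x, *classes[name]))


# Hypothetical (AVW, ARGB, NDI) centers and covariances, for illustration only.
shared_cov = [[400.0, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.04]]
classes = {
    "clear": ([480.0, 0.2, -0.3], shared_cov),
    "turbid": ([580.0, 1.5, 0.2], shared_cov),
}
label = classify([490.0, 0.3, -0.25], classes)
```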

2.3. Machine Learning Methods

We benchmarked nine models, comprising six classical machine learning (ML) algorithms and three neural network (NN) or probabilistic models. The classical ML group included Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and three gradient boosting frameworks: XGBoost (XGB), LightGBM (LGBM), and CatBoost (CAT). These algorithms are widely employed for retrieving SDD and optical water quality variables due to their ability to model nonlinear relationships and handle correlated spectral inputs [14,24,36,42,43]. Specifically, RF and gradient boosting models utilize decision tree ensembles to capture complex feature interactions; SVM performs regression in a high-dimensional kernel space; and KNN predicts targets by aggregating local neighbors in the feature space.

The NN group consisted of a standard Multilayer Perceptron (RealMLP) [44], a Monte Carlo Dropout Bayesian NN (BNN-MCD) and a Mixture Density Network (MDN) [42,45]. BNN-MCD incorporated dropout during inference to approximate Bayesian posterior distributions and quantify predictive uncertainty. MDN modeled the conditional probability distribution of SDD as a Gaussian mixture, providing a full probabilistic output rather than a single point estimate.

2.4. Model Training

2.4.1. Data Splitting Strategies

To rigorously assess generalization, we designed three validation scenarios targeting sample, spatial, and optical domain shifts. For all scenarios, feature preprocessing, hyperparameter tuning, and model selection were strictly restricted to the training/validation sets to prevent data leakage.

Scenario 1: Random split (Random). Samples were randomly shuffled and partitioned into training (70%), validation (15%), and testing (15%) sets. This standard approach assesses model performance under overlapping distributions where training and test data may share similar optical characteristics.
Scenario 2: Waterbody-based split (Waterbody). To evaluate transferability to unseen locations, samples were grouped by unique waterbody identifiers. Each waterbody was exclusively assigned to either the training, validation, or test set, ensuring spatial independence.
Scenario 3: Cross-OWT split (Cross-OWT). To test extrapolation to unseen optical regimes, models were trained on intermediate optical types (OWT II–IV) and evaluated on extreme types (OWT I, V, and VI). This scenario challenges the models to generalize from moderate to distinct optical conditions, with the test domain dominated by turbid waters (OWT V).
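The Waterbody and Cross-OWT scenarios above can be sketched with two small helpers. This is a minimal sketch under stated assumptions: the 70/15/15 fractions in `waterbody_split` are borrowed from the Random scenario because the text does not give the proportions used for the Waterbody split, and the function names, seed, and toy labels are hypothetical.

```python
import random


def waterbody_split(waterbody_ids, fractions=(0.70, 0.15, 0.15), seed=0):
    """Assign whole waterbodies to train/val/test and return sample indices.

    The fractions apply to waterbodies, not samples (an assumption).
    """
    unique = sorted(set(waterbody_ids))
    random.Random(seed).shuffle(unique)
    n_train = int(round(fractions[0] * len(unique)))
    n_val = int(round(fractions[1] * len(unique)))
    group_of = {}
    for k, wb in enumerate(unique):
        if k < n_train:
            group_of[wb] = "train"
        elif k < n_train + n_val:
            group_of[wb] = "val"
        else:
            group_of[wb] = "test"
    split = {"train": [], "val": [], "test": []}
    for i, wb in enumerate(waterbody_ids):
        split[group_of[wb]].append(i)
    return split


def cross_owt_split(owt_labels, train_types=("II", "III", "IV"),
                    test_types=("I", "V", "VI")):
    """Train on intermediate OWTs and test on the extreme OWTs."""
    train = [i for i, t in enumerate(owt_labels) if t in train_types]
    test = [i for i, t in enumerate(owt_labels) if t in test_types]
    return train, test


# Toy data: 200 samples spread over 20 hypothetical waterbodies.
waterbody_ids = ["lake%02d" % (i % 20) for i in range(200)]
split = waterbody_split(waterbody_ids)

owt_labels = ["I", "II", "III", "IV", "V", "VI"] * 10
train_idx, test_idx = cross_owt_split(owt_labels)
```

Splitting at the waterbody level rather than the sample level is what prevents stations from the same lake from appearing on both sides of the evaluation.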

2.4.2. Training and Hyperparameters

A standardized training protocol was applied across all scenarios: (1) fixing the test set; (2) tuning hyperparameters via cross-validation within the training domain. Input features varied by model class. NN-based models (RealMLP, BNN-MCD, MDN) utilized only the six simulated Sentinel-2 bands. Classical ML models (RF, KNN, SVM, XGB, LGBM, CAT) employed an expanded feature set, including the six bands, five band ratios (Blue/Green, Blue/Red, Green/Red, Red/NIR, Green/NIR), and three indices: Normalized Difference Chlorophyll Index (NDCI), Normalized Difference Turbidity Index (NDTI), and Normalized Suspended Material Index (NSMI) [46]. All features were standardized using training set statistics.
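A minimal sketch of the expanded feature set and the training-set standardization is given below. The index formulas are commonly used definitions and are assumptions here, since the text names the indices without stating their formulas: NDCI as (red-edge − red)/(red-edge + red), NDTI as (red − green)/(red + green), and NSMI as (red + green − blue)/(red + green + blue). Using the 850 nm band as "NIR" in the ratios is likewise an assumption.

```python
import statistics


def engineered_features(b490, b560, b660, b705, b850):
    """Band ratios and indices from band-averaged R_rs values.

    Index definitions and the choice of 850 nm as NIR are assumptions.
    """
    return {
        "blue_green": b490 / b560,
        "blue_red": b490 / b660,
        "green_red": b560 / b660,
        "red_nir": b660 / b850,
        "green_nir": b560 / b850,
        "NDCI": (b705 - b660) / (b705 + b660),
        "NDTI": (b660 - b560) / (b660 + b560),
        "NSMI": (b660 + b560 - b490) / (b660 + b560 + b490),
    }


def fit_scaler(train_values):
    """Mean and standard deviation estimated from training data only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def apply_scaler(values, mean, std):
    """Standardize any split with the statistics of the training split."""
    return [(v - mean) / std for v in values]


feats = engineered_features(0.004, 0.010, 0.008, 0.006, 0.002)
mean, std = fit_scaler([1.0, 2.0, 3.0])
scaled_test = apply_scaler([2.0, 4.0], mean, std)
```

Fitting the scaler on the training split and reusing it for validation and test data is what keeps the preprocessing free of leakage.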

All key hyperparameters are listed in Table 2. For NN-based models, hyperparameters (architecture, learning rate, weight decay, dropout rate) were optimized via random search and trained using the Adam optimizer. RealMLP and BNN-MCD minimized Mean Squared Error (MSE), with BNN-MCD averaging predictions from multiple stochastic forward passes during inference. MDN minimized the negative log-likelihood of a Gaussian mixture, using the mixture mean as the prediction. For tree-based models, key hyperparameters (e.g., learning rate, tree depth, subsampling ratio) were tuned using grid or random search.
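The MDN objective and point prediction described above can be written out for a single sample as a short sketch. It assumes the network has already produced mixture weights, means, and standard deviations; the function names are hypothetical, and whether the target is SDD or a transform of it is not specified in the text, so `y` is left generic. Practical implementations usually evaluate the likelihood with a log-sum-exp formulation for numerical stability, which is omitted here.

```python
import math


def gaussian_pdf(y, mu, sigma):
    """Density of a univariate normal distribution at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_nll(y, weights, means, sigmas):
    """Negative log-likelihood of y under a Gaussian mixture."""
    density = sum(w * gaussian_pdf(y, m, s)
                  for w, m, s in zip(weights, means, sigmas))
    return -math.log(density)


def mixture_mean(weights, means):
    """Mixture mean, used as the point prediction."""
    return sum(w * m for w, m in zip(weights, means))


def mixture_std(weights, means, sigmas):
    """Standard deviation of the mixture (law of total variance)."""
    mean = mixture_mean(weights, means)
    second_moment = sum(w * (s * s + m * m)
                        for w, m, s in zip(weights, means, sigmas))
    return math.sqrt(second_moment - mean * mean)


nll = mixture_nll(0.0, [1.0], [0.0], [1.0])
point = mixture_mean([0.25, 0.75], [1.0, 3.0])
spread = mixture_std([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```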

2.5. SHAP Interpretation

To elucidate model behavior and its bio-optical basis, we employed SHAP, a game-theoretic approach that attributes each prediction to individual feature contributions [47]. SHAP values satisfy additivity and consistency, facilitating transparent comparisons of feature importance across different model architectures.

Following model training under the Cross-OWT scenario, we computed SHAP values for all nine models. SHAP summary plots were generated to rank feature importance and visualize the directional impact of features on SDD predictions. Additionally, SHAP dependence plots were utilized to examine nonlinear responses, identify threshold effects, and reveal feature interactions critical for optical interpretation.
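The additivity property behind SHAP can be illustrated with a brute-force Shapley computation on a toy model. This sketch is a simplification and not the procedure used in the study: it replaces "absent" features with a single baseline vector rather than averaging over background data, it enumerates all feature subsets (feasible only for a handful of features), and the toy model and inputs are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features outside the subset are set to their baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Toy function with one interaction term between features 0 and 2."""
    return 2.0 * z[0] - z[1] + z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction and the baseline output, and the interaction term is shared equally between the two features involved.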

2.6. Statistical Metrics

Model performance was evaluated using three metrics: the coefficient of determination (R²), median symmetric accuracy (MdSA), and symmetric signed percentage bias (SSPB). R² indicates the proportion of explained variance. MdSA, calculated from the median absolute log-ratio of predicted to observed values, provides a robust measure of typical relative error [14]. SSPB quantifies systematic bias from the median log-ratio of predicted to observed values, with negative values indicating underestimation and positive values indicating overestimation [28,42]. These metrics are formally defined in Equations (2)–(4).

R^{2} = 1 - \frac{\sum_{i=1}^{N} (x_{est,i} - x_{obs,i})^{2}}{\sum_{i=1}^{N} (x_{obs,i} - \bar{x}_{obs})^{2}} \qquad (2)

\mathrm{MdSA} = \left( \exp\left( \mathrm{median}\left( \left| \ln \frac{x_{est,i}}{x_{obs,i}} \right| \right) \right) - 1 \right) \times 100\% \qquad (3)

\mathrm{SSPB} = \mathrm{sign}(m) \times \left( e^{|m|} - 1 \right) \times 100\%, \quad m = \mathrm{median}\left( \ln \frac{x_{est,i}}{x_{obs,i}} \right) \qquad (4)
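Equations (2)–(4) translate directly into a few lines of code. The sketch below is one possible implementation with hypothetical function names; it assumes strictly positive estimated and observed values, as required by the log-ratio.

```python
import math
import statistics


def r_squared(est, obs):
    """Coefficient of determination, Equation (2)."""
    mean_obs = statistics.fmean(obs)
    ss_res = sum((e - o) ** 2 for e, o in zip(est, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def mdsa(est, obs):
    """Median symmetric accuracy in percent, Equation (3)."""
    abs_log_ratio = [abs(math.log(e / o)) for e, o in zip(est, obs)]
    return (math.exp(statistics.median(abs_log_ratio)) - 1.0) * 100.0


def sspb(est, obs):
    """Symmetric signed percentage bias in percent, Equation (4)."""
    m = statistics.median(math.log(e / o) for e, o in zip(est, obs))
    return math.copysign(1.0, m) * (math.exp(abs(m)) - 1.0) * 100.0


obs = [0.5, 1.0, 2.0, 4.0]
doubled = [2.0 * o for o in obs]
halved = [0.5 * o for o in obs]
```

Doubling and halving every observation both give an MdSA of 100%, while SSPB takes opposite signs (+100% and −100%), which is the symmetry the two metrics are designed to have.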

2.7. Uncertainty Metric

The reliability of predictive uncertainty was assessed using the Prediction Interval Coverage Probability at the 90% nominal level (PICP90). For each test sample, a two-sided 90% prediction interval was constructed (Equation (5)), where σ_{i} is the predictive standard deviation and k_{0.90} = Φ^{-1}(0.95) = 1.645.

L_{i} = x_{est,i} - k_{0.90} σ_{i}, \quad U_{i} = x_{est,i} + k_{0.90} σ_{i} \qquad (5)

\mathrm{PICP90} = \frac{1}{N} \sum_{i=1}^{N} I(L_{i} < x_{obs,i} < U_{i}) \times 100\% \qquad (6)

PICP90 represents the proportion of observations falling within this interval (Equation (6)). N is the number of test samples, and I(·) is the indicator function. For probabilistic models (MDN and BNN-MCD), σ_{i} was provided by the model outputs, and we aggregated predictions across the saved checkpoints (25 instances per scenario) to obtain an ensemble mean and standard deviation. For deterministic models, which lack native uncertainty outputs, σ_{i} was estimated via bootstrap ensembling. Specifically, we generated 50 bootstrap resamples of the training data, retrained the model for each, and computed the ensemble mean and spread (σ_{i}) to construct prediction intervals.
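The interval construction and coverage calculation in Equations (5) and (6) can be sketched as follows, together with the per-sample mean and spread of an ensemble of predictions. The function names are hypothetical, and the use of the sample standard deviation for the ensemble spread is an assumption, as the text does not specify the estimator.

```python
from statistics import NormalDist, fmean, stdev

# Two-sided 90% interval: k = inverse normal CDF at 0.95 (about 1.645).
K90 = NormalDist().inv_cdf(0.95)


def ensemble_mean_std(member_predictions):
    """Per-sample mean and spread across ensemble members.

    member_predictions[b][i] is the prediction of member b for sample i.
    """
    n_samples = len(member_predictions[0])
    means, stds = [], []
    for i in range(n_samples):
        column = [member[i] for member in member_predictions]
        means.append(fmean(column))
        stds.append(stdev(column))
    return means, stds


def picp(est, sigma, obs, k=K90):
    """Percentage of observations strictly inside [est - k*sigma, est + k*sigma]."""
    inside = sum(1 for e, s, o in zip(est, sigma, obs)
                 if e - k * s < o < e + k * s)
    return 100.0 * inside / len(obs)


# Toy check: unit sigma around 1.0 gives the interval (-0.645, 2.645).
coverage = picp([1.0] * 4, [1.0] * 4, [1.0, 2.0, 3.0, 0.0])

# Three hypothetical bootstrap members predicting two samples.
means, stds = ensemble_mean_std([[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]])
```

A well-calibrated 90% interval should yield a PICP90 close to 90%; values well below indicate overconfident intervals and values well above indicate conservative ones.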

3. Results

3.1. Optical Water Types Analysis

Figure 2 and Figure 3 illustrate the SDD distributions and representative centroid R_rs spectra for the six merged OWTs. A one-way ANOVA on log10-transformed SDD confirmed significant differences among these types (F = 1883.14, p < 0.001), validating the effective stratification of transparency levels. Type I (Clear Oceanic) exhibited the highest SDD, characterized by high blue reflectance decreasing monotonically toward the red, typical of oligotrophic waters. Type II (Clear Inland) similarly showed high transparency (SDD > 1 m) but featured a smoother visible spectrum with a gentler slope. Type III (Moderately Eutrophic) spanned a broad SDD range with generally lower transparency than Type II, distinguished by a pronounced peak near 560 nm and elevated red/red-edge reflectance. Type IV (Strongly Eutrophic) was dominated by low-transparency waters (SDD < 1 m). Its centroid spectrum displayed a bimodal shape with local maxima near 560 and 705 nm, alongside higher red-edge reflectance than Types I–III, indicative of optically complex, productive waters. Type V (Turbid) recorded the lowest SDD (mostly < 1 m), with reflectance remaining consistently high from the visible to near-infrared regions, reflecting strong particulate scattering. Type VI (Organic/CDOM-rich), though containing fewer samples, was characterized by suppressed blue reflectance and relatively high red-edge reflectance, consistent with CDOM-dominated absorption.

Overall, SDD progressively decreased and narrowed in range from clear (Types I–II) to productive and turbid waters (Types III–V), accompanied by systematic shifts in spectral magnitude and shape. These distinct optical contrasts underscore the necessity of the Cross-OWT evaluation (Section 3.2) to test model robustness when extrapolating from moderate to extreme optical regimes.

3.2. Model Performance Under Three Data-Splitting Strategies

We evaluated nine models under three validation scenarios (Random, Waterbody, and Cross-OWT). As shown in Figure 4, the overlap in SDD distributions between training and test sets varied markedly. The Random split exhibited substantial overlap, whereas the Cross-OWT split introduced a severe distributional mismatch: models trained on moderate-transparency waters (OWT II–IV) were tested on optically extreme regimes (OWT I, V, VI). This distribution shift precipitated scenario-dependent variations in error and bias.

Figure 5 summarizes model performance metrics (R², MdSA, SSPB). Random split: All models achieved their best performance (R²: 0.904–0.969; MdSA: 15.9–31.6%). RF and MDN yielded the lowest errors (MdSA: 15.9% and 16.4%, respectively), while SVM exhibited the highest (31.6%). Systematic bias was minimal, with SSPB ranging from −3.8% to 11.4%. Waterbody split: Performance generally degraded (R²: 0.708–0.911; MdSA: 20.9–33.6%). XGB and RF maintained high R² (>0.90), whereas BNN-MCD dropped to 0.708. Crucially, SSPB shifted to negative values for all models (−16.9% to −3.8%), indicating a consistent underestimation tendency when transferring to unseen waterbodies. Cross-OWT split: Although some models retained high R² (0.856–0.971), MdSA surged (42.4–104.2%) and SSPB became consistently positive (10.7–88.6%), revealing a pronounced overestimation bias under optical regime extrapolation. To quantify scenario effects relative to the Random baseline, we defined generalization loss (ΔMdSA) and bias shift (ΔSSPB). Under the Waterbody split, ΔSSPB was negative across the board (−17.7% to −5.3%), while ΔMdSA showed only moderate changes (−5.4% to +9.5%). Conversely, under Cross-OWT, ΔMdSA spiked (+16.5% to +79.9%) and ΔSSPB became strongly positive (+14.5% to +84.6%), highlighting widespread error inflation and bias reversal.

Figure 6 further elucidates error patterns in log–log space. Cross-OWT deviations were most severe at low SDD values, where predictions consistently exceeded the 1:1 line. Tree-ensemble models followed the diagonal for intermediate-to-high SDD but diverged upward at extremes. KNN and SVM exhibited the largest dispersion, indicating high sensitivity to distribution shifts. Among neural models, MDN aligned most closely with the 1:1 line, demonstrating superior bias stability compared to RealMLP and BNN-MCD, which showed larger scatter at the extremes.

In summary, validation scenarios dictated model performance: Waterbody transfer primarily induced underestimation, whereas Cross-OWT extrapolation caused substantial error inflation and overestimation. MDN proved the most robust to extrapolation, while KNN and SVM were the most sensitive to optical regime shifts.

3.3. Uncertainty Analysis

Figure 7 presents uncertainty reliability via PICP90. Under the Random split, coverage was generally below the nominal 90% level (68–90%), suggesting mild under-coverage even within-distribution. Under the Waterbody split, PICP90 for deterministic models dropped significantly (45–76%), indicating pronounced overconfidence. However, probabilistic neural models (MDN, BNN-MCD) maintained near-nominal coverage (89–92%). Under Cross-OWT extrapolation, PICP90 paradoxically rebounded (69–95% for deterministic models) and became conservative for probabilistic models (BNN-MCD near 100%). This suggests that while point prediction errors increased (as seen in Section 3.2), uncertainty estimates inflated even more rapidly, leading to broader intervals rather than improved accuracy.

3.4. SHAP-Based Model Interpretation

3.4.1. Global Feature Importance

Figure 8 presents SHAP summary plots for the nine models under the Cross-OWT scenario. Across all models, important predictors were concentrated within the 490–705 nm spectral region, signifying its continued relevance for SDD prediction even under optical-regime extrapolation. However, feature importance rankings diverged based on model input types. For RF, XGB, LGBM, CAT, and SVM, NSMI ranked among the most influential predictors, followed by band ratios such as R_rs(560)/R_rs(490) and R_rs(660)/R_rs(490), and then individual red and red-edge bands. KNN drew on a similar feature set but concentrated attribution on fewer, highly ranked ratios/indices. The NN models (RealMLP, BNN-MCD, MDN), despite being restricted to raw spectral bands, also emphasized the 490–705 nm region. RealMLP assigned relatively higher importance to R_rs(705), while BNN-MCD and MDN prioritized R_rs(660) and R_rs(490). This convergence on key spectral regions, despite differences in feature representation (raw bands vs. engineered indices), underscores the robustness of the 490–705 nm range for SDD estimation across diverse optical conditions.

Furthermore, SHAP color-sign patterns provided qualitative consistency checks with established water-color optics principles. In models utilizing NSMI, higher NSMI values were generally associated with negative SHAP contributions, aligning with predictions of lower SDD in more turbid or optically complex waters. For MDN, higher R_rs(490) tended to yield positive contributions (suggesting higher SDD or clearer water), whereas higher R_rs(660) typically contributed negatively, reflecting the competing influences of blue-band light penetration and red/red-edge absorption and scattering. In summary, Figure 8 highlights a compact set of informative spectral features (490–705 nm bands, ratios, and indices) crucial for SDD prediction. It also reveals systematic differences in how tree-based ensembles and neural networks exploit these predictors. Subsequent analysis in Section 3.4.2 further examines response shapes and interaction effects using SHAP dependence plots.

3.4.2. Partial Dependence and Thresholds

Figure 9 shows SHAP dependence plots for NSMI under the Cross-OWT scenario, with points colored by R_rs(560)/R_rs(490) to provide an auxiliary view of the blue–green spectral contrast across the turbidity–clarity gradient. For the tree-ensemble models (RF, XGB, LGBM, and CAT), SHAP (NSMI) showed a highly consistent segmented response: as NSMI increased, SHAP contributions became more negative, with the strongest change occurring within a narrow transition interval (NSMI: 0.4–0.6). At higher NSMI, changes in SHAP (NSMI) weakened and approached a quasi-plateau, suggesting that NSMI mainly separated regimes rather than providing uniformly incremental information across its full range. The colouring by R_rs(560)/R_rs(490) revealed covariance with NSMI, reinforcing that NSMI captures a component of the blue–green spectral contrast. KNN and SVM showed different dependence structures. KNN exhibited a diffuse pattern with limited organization, indicating less stable NSMI attribution under Cross-OWT. SVM showed a steep transition over a narrow NSMI interval and relatively limited variation elsewhere, implying strong sensitivity near the boundary. Overall, Figure 9 complemented the feature ranking in Figure 8 by revealing the direction, nonlinearity, and effective operating range of NSMI under optical-regime extrapolation, providing behavioral context for the scenario-dependent errors and bias shifts reported in Section 3.2.

Figure 10 shows SHAP dependence plots for the three NN models, with R_rs(660) as the focal variable and points colored by R_rs(490). In all three models, the structured color pattern indicated that the SHAP contribution of R_rs(660) depended on the concurrent level of R_rs(490), consistent with an interaction between red-band and blue-band information. RealMLP exhibited a nonlinear response: SHAP (R_rs(660)) was most negative at intermediate R_rs(660) and became less negative at higher values, indicating a range-dependent effect rather than a strictly monotonic trend. BNN-MCD showed substantially greater dispersion, including occasional extreme negative attributions at moderate-to-high R_rs(660), suggesting less stable attribution under Cross-OWT. MDN spanned the widest SHAP range and showed both positive and negative contributions at similar R_rs(660) levels, indicating a strongly regime-dependent representation of the R_rs(660) effect during optical-regime extrapolation. The regime-separating behavior of NSMI and the band-interaction patterns in the NN models provided a mechanistic context for the bias reversal observed under Cross-OWT.

4. Discussion

4.1. Scenario-Dependent Performance and Practical Applicability

Previous Secchi depth (SDD) retrieval studies, spanning empirical, semi-analytical, and ML-based approaches, have typically involved coupled design choices: (i) feature construction, ranging from single-band predictors to engineered band ratios/indices [13,16,26,48,49], and (ii) validation design, most commonly sample-wise random splits or, less frequently, region-/waterbody-based splits [14,28,42,50]. Random splits often result in training and test data sharing the same waterbodies and similar optical conditions. Consequently, reported accuracies predominantly reflect within-distribution interpolation [27], potentially overstating generalization capabilities for real-world deployments in unseen waterbodies or optically extreme regimes. In contrast, optical-property-aware validation (e.g., OWT-based transfer) has been uncommon in SDD modeling, despite its direct relevance to cross-regime applications.

Our benchmark across three scenarios highlights the scenario-dependent nature of practical applicability. Random Split: All models achieved high accuracy and the ranking differences were modest, consistent with an interpolation regime where training and test distributions strongly overlapped. Waterbody Split: Performance declined, and a consistent underestimation bias emerged across all models, underscoring the impact of cross-lake deployment. Prediction interval coverage (PICP90) also dropped notably for most deterministic models, suggesting overconfident intervals in this transfer setting. This scenario provided a more realistic proxy for operational transfer to unseen waterbodies; within it, tree-ensemble methods (RF, boosting models) remained robust baselines for point accuracy, though their uncertainty reliability still degraded compared to the Random split. Cross-OWT Split: This most stringent scenario presented the greatest challenge. PICP90 for several models shifted towards nominal or became conservative, demonstrating that interval coverage and point accuracy capture different facets of generalization under regime shifts. These results emphasize that model superiority could not be inferred from a single split; instead, generalization should be stated explicitly with respect to the intended deployment target. Notably, the probabilistic NN (MDN) showed comparatively better bias stability under Cross-OWT while maintaining near-nominal coverage, suggesting potential advantages when deployment involves regime shift.

SHAP analysis connected these scenario-dependent generalization patterns to feature representation. Across models, influential predictors primarily resided within the visible to red-edge spectrum (approx. 490–705 nm). However, feature usage varied: tree-ensemble models leaned on engineered indices/ratios (especially NSMI and blue–green/red contrasts), whereas neural networks relied more on raw bands, implicitly capturing band interactions. Under Cross-OWT, NSMI acted as a regime-separating predictor for tree-ensemble models. Its SHAP contribution showed a sharp change within a narrow interval (NSMI ≈ 0.4–0.6) and plateaued thereafter. This behavior aligns with semi-analytical models where retrieval sensitivity shifts in turbid regimes, and index responses can saturate beyond optical transitions [5,10]. Such saturation implies that engineered indices may offer limited incremental information in certain feature spaces, potentially amplifying bias during extrapolation to extreme regimes, consistent with the observed bias reversal. Furthermore, Cross-OWT dependence plots revealed regime-conditioned effects and sensitivity to feature-space geometry, explaining the pronounced degradation of KNN/SVM under distribution shifts and motivating our subsequent feature-design experiment. This analysis sought to determine if simplifying input features could mitigate bias and enhance robustness under Cross-OWT extrapolation, particularly for KNN and SVM, without compromising performance in more conventional Random and Waterbody settings.
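
As a rough illustration of where the OWTs sit relative to the NSMI interval discussed above, the sketch below applies a commonly used NSMI form, (red + green − blue) / (red + green + blue), to the median R_rs values in Table 1. Mapping red, green, and blue to R_rs(660), R_rs(560), and R_rs(490) is an assumption, since the band assignment used in the study is not stated in this section. An index computed from median spectra is also not the same as the median of per-sample indices, so the numbers are indicative only.

```python
def nsmi(blue, green, red):
    """NSMI in its common form; the band assignment here is an assumption."""
    return (red + green - blue) / (red + green + blue)

# Median R_rs(490), R_rs(560), R_rs(660) per OWT from Table 1 (x 10^-3 sr^-1)
table1_medians = {
    "I": (4.66, 1.43, 0.16),
    "II": (4.20, 3.13, 0.64),
    "III": (7.23, 10.8, 4.87),
    "IV": (6.70, 13.8, 6.93),
    "V": (18.1, 26.5, 29.2),
    "VI": (0.79, 1.65, 1.75),
}

nsmi_by_owt = {owt: round(nsmi(*bands), 2) for owt, bands in table1_medians.items()}
# I: -0.49, II: -0.05, III: 0.37, IV: 0.51, V: 0.51, VI: 0.62
```

Under these assumptions, the clear-water types (OWT I and II) have negative values, whereas the turbid types (OWT IV to VI) fall within or just above the 0.4–0.6 interval in which the SHAP attribution changed.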

4.2. Feature Design Sensitivity Under Cross-OWT Extrapolation

Motivated by the regime-separating behavior of NSMI identified via SHAP (Section 4.1), we compared model performance using only six raw bands versus an expanded engineered-feature set under the Cross-OWT split. The engineered set included band ratios, NDCI/NDTI/NSMI, hue angle [51,52], NDWI, and OWT descriptors (AVW and ARGB). For tree-ensemble models, engineered features did not deliver consistent gains and often amplified overestimation. As presented in Table 3, for RF, XGB, and CAT, SSPB increased from 31.08%, 30.70%, and 16.73% to 48.21%, 36.45%, and 27.45%, respectively, while MdSA rose from 47.52%, 45.84%, and 41.14% to 50.50%, 46.50%, and 44.86%. LightGBM showed only a marginal increase in SSPB (27.86% to 28.25%), but its MdSA increased more clearly (49.99% to 54.58%). These patterns suggest that tree-ensemble models already capture useful nonlinear interactions from raw bands, whereas additional engineered predictors can become less informative, or even detrimental, when extrapolating to optically extreme regimes, consistent with the NSMI threshold/plateau behavior in the dependence plots.
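
For readers less familiar with the two metrics compared here, the sketch below uses the log-ratio definitions that are common in the aquatic remote sensing literature. It assumes these match the definitions in Section 2.6, which is not reproduced in this part of the text; the toy values are hypothetical.

```python
import math
from statistics import median

def mdsa(pred, obs):
    """Median Symmetric Accuracy (%): typical multiplicative error size."""
    q = [abs(math.log(p / o)) for p, o in zip(pred, obs)]
    return 100.0 * (math.exp(median(q)) - 1.0)

def sspb(pred, obs):
    """Symmetric Signed Percentage Bias (%): positive means overestimation."""
    m = median(math.log(p / o) for p, o in zip(pred, obs))
    return 100.0 * math.copysign(1.0, m) * (math.exp(abs(m)) - 1.0)

# Toy SDD values in metres (hypothetical)
obs = [0.2, 0.3, 0.5, 0.8, 1.0]
pred_over = [1.5 * o for o in obs]   # every prediction too high by a factor of 1.5
pred_under = [o / 1.5 for o in obs]  # every prediction too low by a factor of 1.5

print(round(mdsa(pred_over, obs), 1), round(sspb(pred_over, obs), 1))
print(round(mdsa(pred_under, obs), 1), round(sspb(pred_under, obs), 1))
```

Because both metrics are built on the log accuracy ratio, over- and underestimation by the same factor give the same MdSA and an SSPB of equal magnitude and opposite sign, which is why the sign of SSPB can be read directly as the direction of bias.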

Distance/kernel-based models were more sensitive to feature construction. For KNN, engineered features substantially increased both error and bias, with MdSA rising from 39.99% to 96.40% and SSPB from 21.95% to 72.38% (Table 3), consistent with distorted neighborhood structure under regime shift. SVM improved relative to the raw-band baseline (MdSA 144.93% to 79.23%; SSPB 121.77% to 70.14%), but the remaining large errors indicate limited robustness under Cross-OWT extrapolation. Overall, this comparison indicates that feature engineering should not default to adding more indices for Cross-OWT deployment: raw bands provide a competitive and often more bias-stable baseline for tree ensembles, while distance/kernel methods can be highly vulnerable to ratio/index augmentation.
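
One way in which ratio augmentation can distort neighborhoods is sketched below with the Manhattan distance (p = 1, as listed for KNN in Table 2). The example assumes unscaled features, with bands in sr⁻¹ and dimensionless ratios; whether and how features were scaled in the study is not stated here, so this is an illustration of the mechanism rather than a reproduction of the experiment. The spectra and helper names are hypothetical.

```python
def manhattan(a, b):
    """L1 distance between two feature tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))

def with_ratios(bands):
    """Append two band ratios (green/blue, red/blue) to the raw bands."""
    blue, green, red = bands
    return bands + (green / blue, red / blue)

def nearest(query, train, transform=lambda b: b):
    """Name of the training sample closest to the query in the chosen feature space."""
    return min(train, key=lambda name: manhattan(transform(query), transform(train[name])))

# Hypothetical R_rs(490), R_rs(560), R_rs(660) spectra in sr^-1
train = {"A": (0.014, 0.021, 0.022), "B": (0.017, 0.027, 0.024)}
query = (0.018, 0.026, 0.029)

nn_raw = nearest(query, train)               # "B": closest in raw reflectance
nn_aug = nearest(query, train, with_ratios)  # "A": the ratios dominate the distance
```

Because the ratios are of order 1 while reflectances are of order 0.01, the appended features dominate the distance and the nearest neighbor changes. With standardized inputs the effect would instead depend on each feature's variance in the training set, which itself differs from that of a held-out optical regime.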

4.3. Limitations and Future Work

This study has several limitations. Firstly, the dataset exhibits an uneven distribution of sample numbers across different water types, particularly OWT I and VI. Although Cross-OWT metrics were computed on the combined test domain (OWT I/V/VI) and were therefore driven mainly by the dominant extreme regime (OWT V), the limited samples in OWT I/VI reduce coverage of these rare end-member regimes and increase uncertainty in representing the full spectrum of extreme optical conditions. Secondly, the model evaluation was primarily conducted for transparency, and a comprehensive assessment of model performance across other critical water quality parameters, such as Chl-a and CDOM, is currently lacking.
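
The dominance of OWT V in the Cross-OWT test domain can be checked with simple arithmetic on the sample counts in Table 1. The sketch assumes that all samples of OWT I, V, and VI belong to the test domain.

```python
# Sample counts per OWT from Table 1
table1_counts = {"I": 92, "II": 734, "III": 3042, "IV": 1298, "V": 967, "VI": 85}
test_owts = ("I", "V", "VI")  # Cross-OWT test domain

n_test = sum(table1_counts[o] for o in test_owts)  # 1144 samples
share = {o: round(100 * table1_counts[o] / n_test, 1) for o in test_owts}
# I: 8.0%, V: 84.5%, VI: 7.4%
```

Under this assumption, roughly 85% of the test samples come from OWT V, so median-based metrics such as MdSA and SSPB mostly reflect performance in that regime.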

To address these limitations, future work will expand the in situ dataset with emphasis on optically extreme regimes, including eutrophic inland or turbid estuarine waters. However, obtaining extensive in situ datasets for such waters remains challenging. Therefore, we will increase coverage of optically extreme regimes by constructing satellite–in situ matchups from global water quality monitoring stations. Such matchups introduce spatiotemporal mismatch and atmospheric-correction uncertainty. The next phase of this work will not only evaluate model generalization under these uncertainties within the Waterbody and Cross-OWT scenarios, but also extend performance assessment to other key water-quality parameters, including Chl-a and CDOM.

5. Conclusions

Within an OWT framework, we benchmarked nine models for SDD retrieval across three distinct training/testing scenarios (Random, Waterbody, and Cross-OWT). Feature contributions were analyzed using SHAP. Our main conclusions are:

(1) Generalization depended on the validation scenario: Random splits yielded small bias across models. Waterbody transfer produced systematic underestimation, and XGB showed a strong accuracy–bias trade-off under this setting. Cross-OWT extrapolation produced the largest degradation and a bias reversal toward overestimation; MDN showed the smallest overestimation.
(2) Feature engineering affected Cross-OWT performance: SHAP showed that NSMI acted as a regime-separating predictor, with attribution changes concentrated around NSMI values of about 0.4 to 0.6. Indices did not consistently improve Cross-OWT robustness for tree-ensemble models and could increase bias. KNN was highly feature-sensitive; removing indices markedly improved its Cross-OWT robustness.
(3) Implications for SDD retrieval practice: Model performance should be reported for a specific validation scenario rather than inferred from a random split. Feature sets should be selected based on scenario testing under regime shifts, instead of applying index engineering by default.

Author Contributions

Conceptualization, B.J. and H.Y.; methodology, B.J. and H.Y.; software, B.J.; validation, B.J., H.Y. and J.Z.; formal analysis, B.J. and H.Y.; data curation, B.J. and H.Y.; writing—original draft preparation, B.J. and H.Y.; writing—review and editing, B.J., H.Y., L.D. and J.Z.; supervision, B.J. and L.D.; funding acquisition, B.J. and L.D. B.J. and H.Y. contributed equally to this work. Correspondence: L.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Research and Development Projects in Key Areas of Guangdong Province, grant number 2020B0101130018 and the GuangDong Basic and Applied Basic Research Foundation (No. 202201011275).

Data Availability Statement

Publicly available datasets were analyzed in this study. The GLORIA in situ hyperspectral remote-sensing reflectance and water-quality dataset is available from PANGAEA (https://doi.org/10.1594/PANGAEA.948492). The satellite–ground synchronous in situ dataset of water optical parameters and surface temperature for typical lakes in China is available from Zenodo (https://doi.org/10.5281/zenodo.10942116).

Acknowledgments

The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Jiang, X.; Wang, S.; Li, J.; Spyrakos, E.; Yao, H.; Zhang, F.; Tyler, A.N.; Zhang, B. Water Transparency and Color in Large Rivers Observed by Sentinel-2 MSI and Its Implications for SDG 6.3.2 Monitoring. Int. J. Appl. Earth Obs. Geoinf. 2025, 143, 104826.
2. Pi, X.; Feng, L.; Li, W.; Zhao, D.; Kuang, X.; Li, J. Water Clarity Changes in 64 Large Alpine Lakes on the Tibetan Plateau and the Potential Responses to Lake Expansion. ISPRS J. Photogramm. Remote Sens. 2020, 170, 192–204.
3. Wei, J.; Wang, M.; Jiang, L.; Lee, Z.; Kirby, R.; Mikelsons, K.; Lin, G. Satellite Observations of Water Transparency from VIIRS in Global Aquatic Ecosystems. Remote Sens. Environ. 2025, 330, 114981.
4. Jiang, D.; Matsushita, B.; Setiawan, F.; Vundo, A. An Improved Algorithm for Estimating the Secchi Disk Depth from Remote Sensing Data Based on the New Underwater Visibility Theory. ISPRS J. Photogramm. Remote Sens. 2019, 152, 13–23.
5. Lee, Z.; Shang, S.; Qi, L.; Yan, J.; Lin, G. A Semi-Analytical Scheme to Estimate Secchi-Disk Depth from Landsat-8 Measurements. Remote Sens. Environ. 2016, 177, 101–106.
6. Zhang, Y.; Zhang, Y.; Iestyn Woolway, R.; Cao, Z.; Wang, X.; Zhou, J.; Zhou, Y.; Wang, W.; Li, N.; Qin, B.; et al. Spatiotemporal Variations in Global Lake Clarity and Responses to Climate and Landscape Drivers. Sci. Bull. 2025, 70, 4091–4103.
7. Shen, M.; Duan, H.; Cao, Z.; Xue, K.; Qi, T.; Ma, J.; Liu, D.; Song, K.; Huang, C.; Song, X. Sentinel-3 OLCI Observations of Water Clarity in Large Lakes in Eastern China: Implications for SDG 6.3.2 Evaluation. Remote Sens. Environ. 2020, 247, 111950.
8. Wei, L.; Wang, Z.; Huang, C.; Zhang, Y.; Wang, Z.; Xia, H.; Cao, L. Transparency Estimation of Narrow Rivers by UAV-Borne Hyperspectral Remote Sensing Imagery. IEEE Access 2020, 8, 168137–168153.
9. Tian, S.; Sha, A.; Luo, Y.; Ke, Y.; Spencer, R.; Hu, X.; Ning, M.; Zhao, Y.; Deng, R.; Gao, Y.; et al. A Novel Framework for River Organic Carbon Retrieval through Satellite Data and Machine Learning. ISPRS J. Photogramm. Remote Sens. 2025, 221, 109–123.
10. Msusa, A.D.; Jiang, D.; Matsushita, B. A Semianalytical Algorithm for Estimating Water Transparency in Different Optical Water Types from MERIS Data. Remote Sens. 2022, 14, 868.
11. Deutsch, E.S.; Cardille, J.A.; Koll-Egyed, T.; Fortin, M.-J. Landsat 8 Lake Water Clarity Empirical Algorithms: Large-Scale Calibration and Validation Using Government and Citizen Science Data from across Canada. Remote Sens. 2021, 13, 1257.
12. Yan, N.; Qiu, Z.; Zhang, C.; Liu, J.; Liu, D. Observing Water Turbidity in Chinese Rivers Using Landsat Series Data over the Past 40 Years. J. Clean. Prod. 2025, 494, 145001.
13. Song, K.; Liu, G.; Wang, Q.; Wen, Z.; Lyu, L.; Du, Y.; Sha, L.; Fang, C. Quantification of Lake Clarity in China Using Landsat OLI Imagery Data. Remote Sens. Environ. 2020, 243, 111800.
14. Maciel, D.A.; Pahlevan, N.; Barbosa, C.C.F.; Martins, V.S.; Smith, B.; O’Shea, R.E.; Balasubramanian, S.V.; Saranathan, A.M.; Novo, E.M.L.M. Towards Global Long-Term Water Transparency Products from the Landsat Archive. Remote Sens. Environ. 2023, 299, 113889.
15. He, Y.; Lu, Z.; Wang, W.; Zhang, D.; Zhang, Y.; Qin, B.; Shi, K.; Yang, X. Water Clarity Mapping of Global Lakes Using a Novel Hybrid Deep-Learning-Based Recurrent Model with Landsat OLI Images. Water Res. 2022, 215, 118241.
16. Zhang, Y.; Zhang, Y.; Shi, K.; Zhou, Y.; Li, N. Remote Sensing Estimation of Water Clarity for Various Lakes in China. Water Res. 2021, 192, 116844.
17. Page, B.P.; Olmanson, L.G.; Mishra, D.R. A Harmonized Image Processing Workflow Using Sentinel-2/MSI and Landsat-8/OLI for Mapping Water Clarity in Optically Variable Lake Systems. Remote Sens. Environ. 2019, 231, 111284.
18. Chen, X.; Liu, L.; Zhang, X.; Li, J.; Wang, S.; Gao, Y.; Mi, J. Long-Term Water Clarity Patterns of Lakes across China Using Landsat Series Imagery from 1985 to 2020. Hydrol. Earth Syst. Sci. 2022, 26, 3517–3536.
19. Feng, L.; Hou, X.; Zheng, Y. Monitoring and Understanding the Water Transparency Changes of Fifty Large Lakes on the Yangtze Plain Based on Long-Term MODIS Observations. Remote Sens. Environ. 2019, 221, 675–686.
20. Alikas, K.; Kratzer, S. Improved Retrieval of Secchi Depth for Optically-Complex Waters Using Remote Sensing Data. Ecol. Indic. 2017, 77, 218–227.
21. Rubin, H.J.; Lutz, D.A.; Steele, B.G.; Cottingham, K.L.; Weathers, K.C.; Ducey, M.J.; Palace, M.; Johnson, K.M.; Chipman, J.W. Remote Sensing of Lake Water Clarity: Performance and Transferability of Both Historical Algorithms and Machine Learning. Remote Sens. 2021, 13, 1434.
22. Maciel, D.A.; Barbosa, C.C.F.; Novo, E.M.L.D.M.; Flores Júnior, R.; Begliomini, F.N. Water Clarity in Brazilian Water Assessed Using Sentinel-2 and Machine Learning Methods. ISPRS J. Photogramm. Remote Sens. 2021, 182, 134–152.
23. Qing, S.; Cui, T.; Lai, Q.; Bao, Y.; Diao, R.; Yue, Y.; Hao, Y. Improving Remote Sensing Retrieval of Water Clarity in Complex Coastal and Inland Waters with Modified Absorption Estimation and Optical Water Classification Using Sentinel-2 MSI. Int. J. Appl. Earth Obs. Geoinf. 2021, 102, 102377.
24. Zhang, Y.; Shi, K.; Sun, X.; Zhang, Y.; Li, N.; Wang, W.; Zhou, Y.; Zhi, W.; Liu, M.; Li, Y.; et al. Improving Remote Sensing Estimation of Secchi Disk Depth for Global Lakes and Reservoirs Using Machine Learning Methods. GISci. Remote Sens. 2022, 59, 1367–1383.
25. Zhang, L.; Zhang, C.; Ma, C.; Chen, X.; Li, Q.; Ye, X.; Yu, Z.; Tian, L. Machine Learning-Based Retrieval of Chlorophyll-a and Total Suspended Matter from HY-3A CZI: Model Development, Validation, and Application. ISPRS J. Photogramm. Remote Sens. 2025, 227, 613–631.
26. Li, S.; Song, K.; Wang, S.; Liu, G.; Wen, Z.; Shang, Y.; Lyu, L.; Chen, F.; Xu, S.; Tao, H.; et al. Quantification of Chlorophyll-a in Typical Lakes across China Using Sentinel-2 MSI Imagery with Machine Learning Algorithm. Sci. Total Environ. 2021, 778, 146271.
27. Werther, M.; Burggraaff, O.; Gurlin, D.; Saranathan, A.M.; Balasubramanian, S.V.; Giardino, C.; Braga, F.; Bresciani, M.; Pellegrino, A.; Pinardi, M.; et al. On the Generalization Ability of Probabilistic Neural Networks for Hyperspectral Remote Sensing of Absorption Properties across Optically Complex Waters. Remote Sens. Environ. 2025, 328, 114820.
28. Pahlevan, N.; Smith, B.; Schalles, J.; Binding, C.; Cao, Z.; Ma, R.; Alikas, K.; Kangro, K.; Gurlin, D.; Hà, N.; et al. Seamless Retrievals of Chlorophyll-a from Sentinel-2 (MSI) and Sentinel-3 (OLCI) in Inland and Coastal Waters: A Machine-Learning Approach. Remote Sens. Environ. 2020, 240, 111604.
29. Silva, S.J.; Keller, C.A.; Hardin, J. Using an Explainable Machine Learning Approach to Characterize Earth System Model Errors: Application of SHAP Analysis to Modeling Lightning Flash Occurrence. J. Adv. Model Earth Syst. 2022, 14, e2021MS002881.
30. Nallakaruppan, M.K.; Gangadevi, E.; Shri, M.L.; Balusamy, B.; Bhattacharya, S.; Selvarajan, S. Reliable Water Quality Prediction and Parametric Analysis Using Explainable AI Models. Sci. Rep. 2024, 14, 7520.
31. Spyrakos, E.; O’Donnell, R.; Hunter, P.D.; Miller, C.; Scott, M.; Simis, S.G.H.; Neil, C.; Barbosa, C.C.F.; Binding, C.E.; Bradt, S.; et al. Optical Types of Inland and Coastal Waters. Limnol. Oceanogr. 2018, 63, 846–870.
32. Bi, S.; Hieronymi, M. Holistic Optical Water Type Classification for Ocean, Coastal, and Inland Waters. Limnol. Oceanogr. 2024, 69, 1547–1561.
33. Neil, C.; Spyrakos, E.; Hunter, P.D.; Tyler, A.N. A Global Approach for Chlorophyll-a Retrieval across Optically Complex Inland Waters Based on Optical Water Types. Remote Sens. Environ. 2019, 229, 159–178.
34. Cui, T.W.; Zhang, J.; Wang, K.; Wei, J.W.; Mu, B.; Ma, Y.; Zhu, J.H.; Liu, R.J.; Chen, X.Y. Remote Sensing of Chlorophyll a Concentration in Turbid Coastal Waters Based on a Global Optical Water Classification System. ISPRS J. Photogramm. Remote Sens. 2020, 163, 187–201.
35. Balasubramanian, S.V.; Pahlevan, N.; Smith, B.; Binding, C.; Schalles, J.; Loisel, H.; Gurlin, D.; Greb, S.; Alikas, K.; Randla, M.; et al. Robust Algorithm for Estimating Total Suspended Solids (TSS) in Inland and Nearshore Coastal Waters. Remote Sens. Environ. 2020, 246, 111768.
36. Woo Kim, Y.; Kim, T.; Shin, J.; Lee, D.-S.; Park, Y.-S.; Kim, Y.; Cha, Y. Validity Evaluation of a Machine-Learning Model for Chlorophyll a Retrieval Using Sentinel-2 from Inland and Coastal Waters. Ecol. Indic. 2022, 137, 108737.
37. Singh, K.A.; Ryu, D.; Arora, M.; Tiwari, M.K.; Sahoo, B. Improving the Accuracy of Remotely Sensed TSS and Turbidity Using Quality Enhanced Water Reflectance by a Statistical Resampling Technique. Int. J. Appl. Earth Obs. Geoinf. 2025, 142, 104681.
38. Lehmann, M.K.; Gurlin, D.; Pahlevan, N.; Alikas, K.; Conroy, T.; Anstee, J.; Balasubramanian, S.V.; Barbosa, C.C.F.; Binding, C.; Bracher, A.; et al. GLORIA—A Globally Representative Hyperspectral in Situ Dataset for Optical Sensing of Water Quality. Sci. Data 2023, 10, 100.
39. Zhai, M.; Zhou, X.; Tao, Z.; Xie, Y.; Yang, J.; Shao, W.; Zhang, H.; Lv, T. Satellite-Ground Synchronous in-Situ Dataset of Water Optical Parameters and Surface Temperature for Typical Lakes in China. Sci. Data 2024, 11, 883.
40. Chowdhury, M.; De La Calle, I.; Laiz, I.; Ruescas, A.B. Near-Real-Time Turbidity Monitoring at Global Scale Using Sentinel-2 Data and Machine Learning Techniques. Remote Sens. 2025, 17, 3716.
41. Kuhn, C.; De Matos Valerio, A.; Ward, N.; Loken, L.; Sawakuchi, H.O.; Kampel, M.; Richey, J.; Stadler, P.; Crawford, J.; Striegl, R.; et al. Performance of Landsat-8 and Sentinel-2 Surface Reflectance Products for River Remote Sensing Retrievals of Chlorophyll-a and Turbidity. Remote Sens. Environ. 2019, 224, 104–118.
42. Werther, M.; Odermatt, D.; Simis, S.G.H.; Gurlin, D.; Lehmann, M.K.; Kutser, T.; Gupana, R.; Varley, A.; Hunter, P.D.; Tyler, A.N.; et al. A Bayesian Approach for Remote Sensing of Chlorophyll-a and Associated Retrieval Uncertainty in Oligotrophic and Mesotrophic Lakes. Remote Sens. Environ. 2022, 283, 113295.
43. Cao, Z.; Ma, R.; Duan, H.; Pahlevan, N.; Melack, J.; Shen, M.; Xue, K. A Machine Learning Approach to Estimate Chlorophyll-a from Landsat-8 Measurements in Inland Lakes. Remote Sens. Environ. 2020, 248, 111974.
44. Holzmüller, D.; Grinsztajn, L.; Steinwart, I. Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data. In Proceedings of the 38th International Conference on Neural Information Processing Systems, San Diego, CA, USA, 2–7 December 2025.
45. Smith, B.; Pahlevan, N.; Schalles, J.; Ruberg, S.; Errera, R.; Ma, R.; Giardino, C.; Bresciani, M.; Barbosa, C.; Moore, T.; et al. A Chlorophyll-a Algorithm for Landsat-8 Based on Mixture Density Networks. Front. Remote Sens. 2021, 1, 623678.
46. Sankaran, R.; Al-Khayat, J.A.; J, A.; Chatting, M.E.; Sadooni, F.N.; Al-Kuwari, H.A.-S. Retrieval of Suspended Sediment Concentration (SSC) in the Arabian Gulf Water of Arid Region by Sentinel-2 Data. Sci. Total Environ. 2023, 904, 166875.
47. Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. Explainable AI for Trees: From Local Explanations to Global Understanding. arXiv 2019, arXiv:1905.04610.
48. Yin, Z.; Li, J.; Liu, Y.; Xie, Y.; Zhang, F.; Wang, S.; Sun, X.; Zhang, B. Water Clarity Changes in Lake Taihu over 36 Years Based on Landsat TM and OLI Observations. Int. J. Appl. Earth Obs. Geoinf. 2021, 102, 102457.
49. Shen, M.; Luo, J.; Cao, Z.; Xue, K.; Qi, T.; Ma, J.; Liu, D.; Song, K.; Feng, L.; Duan, H. Random Forest: An Optimal Chlorophyll-a Algorithm for Optically Complex Inland Water Suffering Atmospheric Correction Uncertainties. J. Hydrol. 2022, 615, 128685.
50. Fan, W.; Xu, Z.; Dong, Q.; Chen, W.; Cai, Y. Remote Sensing-Based Spatiotemporal Variation and Driving Factor Assessment of Chlorophyll-a Concentrations in China’s Pearl River Estuary. Front. Mar. Sci. 2023, 10, 1226234.
51. Chen, X.; Liu, L.; Zhang, X.; Li, J.; Wang, S.; Liu, D.; Duan, H.; Song, K. An Assessment of Water Color for Inland Water in China Using a Landsat 8-Derived Forel–Ule Index and the Google Earth Engine Platform. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 5773–5785.
52. Zhao, Y.; Chen, J.; Li, X. Sentinel-2 Observation of Water Color Variations in Inland Water across Guangzhou and Shenzhen after the Establishment of the Guangdong-Hong Kong-Macao Bay Area. Appl. Sci. 2023, 13, 9039.

Figure 1. Spatial distribution of in situ sampling sites.

Figure 2. SDD distributions for the six OWTs (Differences in log10(SDD) among OWTs were tested using one-way ANOVA followed by Tukey’s HSD for pairwise comparisons. The asterisks indicate significant differences (p < 0.001)).

Figure 3. Centroid R_rs spectra for the six OWTs. (Solid lines show the centroid spectra for each OWT, and shaded envelopes indicate the 95% confidence intervals of R_rs at each wavelength).

Figure 4. Train–test SDD distributions under three data-splitting strategies. Histograms show the distributions of in situ Secchi disk depth (SDD, m) in the training (blue) and test (orange) sets under (a) Random, (b) Waterbody, and (c) Cross-OWT scenarios.

Figure 5. Test-set performance of nine models under three validation scenarios. Heatmaps summarize model performance on the held-out test sets for the Random, Waterbody, and Cross-OWT scenarios using R², MdSA (%), and SSPB (%). Higher R² and lower MdSA indicate better performance, while the sign of SSPB indicates underestimation (negative) or overestimation (positive).

Figure 6. Predicted versus measured SDD under the Cross-OWT split. Scatterplots compare model-predicted SDD with in situ SDD for the held-out Cross-OWT test set. The dashed line indicates the 1:1 relationship. Both axes are shown on a log10 scale, and colors denote point density.

Figure 7. PICP90-based uncertainty analysis of nine models under three validation scenarios.

Figure 8. SHAP summary plots for nine models under the Cross-OWT scenario. (Plots show SHAP values for the top features of each model on the Cross-OWT test set. Color indicates relative feature value. Positive (negative) SHAP values increase (decrease) predicted SDD relative to the baseline).

Figure 9. SHAP dependence plots for NSMI under the Cross-OWT scenario (Points represent individual samples and are colored by the band ratio Rrs(560)/Rrs(490)).

Figure 10. SHAP dependence of R_rs660 for NN models under the Cross-OWT scenario (Points represent individual samples and are colored by R_rs(490)).

Table 1. Sample proportion and summary statistics of SDD and simulated R_rs for the six merged OWTs (values are median [Q1–Q3], R_rs490/560/660/705 are reported in ×10⁻³ sr⁻¹).

OWT	Sample, n (%)	SDD (m)	R_rs(490) (×10⁻³ sr⁻¹)	R_rs(560) (×10⁻³ sr⁻¹)	R_rs(660) (×10⁻³ sr⁻¹)	R_rs(705) (×10⁻³ sr⁻¹)
I	92 (1.5)	25.32 [21.64~28.80]	4.66 [3.97~5.12]	1.43 [1.31~1.61]	0.16 [0.13~0.20]	0.07 [0.05~0.09]
II	734 (11.8)	7.25 [4.90~11.10]	4.20 [3.04~6.11]	3.13 [2.11~5.85]	0.64 [0.37~1.11]	0.195 [0.11~0.34]
III	3042 (48.9)	1.60 [0.75~3.02]	7.23 [3.48~14.5]	10.8 [5.59~21.0]	4.87 [2.15~11.8]	3.86 [1.49~10.4]
IV	1298 (20.9)	0.70 [0.40~0.98]	6.70 [3.79~9.28]	13.8 [9.04~20.7]	6.93 [4.66~9.59]	10.6 [6.54~17.6]
V	967 (15.6)	0.28 [0.15~0.49]	18.1 [12.3~23.3]	26.5 [20.3~33.3]	29.2 [20.9~37.2]	28.4 [19.6~38.7]
VI	85 (1.4)	1.20 [0.80~1.50]	0.79 [0.53~1.20]	1.65 [1.05~2.51]	1.75 [1.28~2.61]	1.92 [1.25~3.47]

Table 2. Key hyperparameters used in this study.

Model	Key Hyperparameters
Probabilistic NN (BNN-MCD, MDN)	hidden layers = 5; hidden units = 500; activation = ReLU; batch = 32; optimizer = Adam; lr = 1 × 10⁻⁴; L2 = 1 × 10⁻³; loss = NLL; BNN-MCD: dropout = 0.25, MC samples = 100; MDN: mixture components = 5
RealMLP	Default settings (no tuning)
RF	n_estimators = 200; max_depth = 18; max_features = sqrt; min_samples_split = 4
KNN	n_neighbors = 12; weights = distance; p = 1 (Manhattan)
SVM	kernel = RBF; C = 9.91; epsilon = 0.303
XGB	n_estimators = 400; max_depth = 10; learning_rate = 0.016; gamma = 1.91; min_child_weight = 3.20; colsample_bytree = 0.57
LGBM	num_leaves = 200; max_depth = 9; learning_rate = 0.181; feature_fraction = 0.731; bagging_fraction = 0.967; bagging_freq = 6; min_data_in_leaf = 20
CAT	depth = 8; learning_rate = 0.05; l2_leaf_reg = 3

Table 3. Effects of raw-band versus engineered-feature inputs on model bias and error under Cross-OWT extrapolation.

Model	Metric (%)	6 Bands	Bands + Indices
RF	SSPB	31.08	48.21
RF	MdSA	47.52	50.50
XGB	SSPB	30.70	36.45
XGB	MdSA	45.84	46.50
LGBM	SSPB	27.86	28.25
LGBM	MdSA	49.99	54.58
CAT	SSPB	16.73	27.45
CAT	MdSA	41.14	44.86
SVM	SSPB	121.77	70.14
SVM	MdSA	144.93	79.23
KNN	SSPB	21.95	72.38
KNN	MdSA	39.99	96.40

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
