Self-Supervised Reservoir Water Area Detection Across Multi-Source Optical Imagery

Mo, Guiyan; Yang, Qing; Zhou, Xiaofeng

doi:10.3390/rs18060918


by Guiyan Mo ¹, Qing Yang ²,³,* and Xiaofeng Zhou ¹

¹ College of Computer Science and Software Engineering, Hohai University, Nanjing 211100, China

² Department of Earth System Science, Doerr School of Sustainability, Stanford University, Stanford, CA 94305, USA

³ Department of Earth System Science Interdisciplinary Center, University of Maryland, College Park, MD 20740, USA

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(6), 918; https://doi.org/10.3390/rs18060918

Submission received: 10 February 2026 / Revised: 9 March 2026 / Accepted: 15 March 2026 / Published: 18 March 2026

(This article belongs to the Special Issue Advanced Remote Sensing for Hydro-Climatic Extremes: Modeling, Characterization, and Risk Analysis)


Highlights

What are the main findings?

We develop a label-free Self-Supervised Water Detection (SWD) framework. It automates sample initialization using geo-spectral features and addresses spectral variability and surface complexity through per-scene adaptive learning.
Consistent and transferable performance is demonstrated across 36 test cases (3 sensors × 6 reservoirs × 2 hydrological conditions). SWD achieves high cross-scale consistency (IoU ≥ 0.774), stable cross-region generalization (SD: 0.010), and accurate hydrological tracking (minimal bias variation, ΔRE < 1%), without manual labels.

What are the implications of the main findings?

The high cross-scale consistency of the framework allows for the seamless integration of historical and current satellite archives to reconstruct reliable long-term surface water extent in ungauged basins.
Without the need for site/sensor-specific training and specialized hardware, the proposed framework provides a scalable solution for near-real-time monitoring of hydrological emergencies and large-scale water resource management.

Abstract

Reservoirs are critical infrastructure for water and energy security, and informed decision-making requires accurate and timely monitoring of their water extent. Optical remote sensing provides frequent, large-area observations; however, automated water extraction is often complicated by dam operation and surface heterogeneity, which increase spectral variability. Supervised methods, though widely used, generally require manual labels and often perform poorly when transferred across sensors and regions, limiting operational deployment. In this paper, we develop a geo-spectral feature-guided Self-Supervised Water Detection (SWD) framework, an automated algorithm designed for multi-source optical imagery. SWD consists of two stages: pixel-level classification and object-level refinement. Initially, SWD integrates spatial priors with spectral features to automatically derive high-confidence samples, which are then utilized to parameterize a Gaussian mixture model that represents the multimodal spectral distribution throughout the image. Furthermore, superpixel-constrained region growing is applied to refine the shoreline and ensure object-level consistency. We validated SWD across 36 test cases comprising three sensors, six reservoirs, and two hydrological conditions. Compared with Random Forest and U-Net, SWD achieved the best performance. Specifically, (1) in cross-scale tests, SWD achieved high consistency with IoU ≥ 0.774; (2) in cross-region transfers, SWD maintained stable generalization (SD: 0.010); and (3) in hydrological response assessments, SWD captured water-level fluctuations with minimal bias variation (ΔRE < 1%). In addition, the SWD framework is computationally efficient, with processing times of 0.49–1.29 s/Mpx on a standard CPU. This study demonstrates that SWD effectively addresses spectral variability and surface complexity in reservoir water area detection across multi-source optical imagery. It operates without manual labels or model training, enabling automated, large-scale and multi-temporal reservoir water monitoring.

Keywords:

reservoir monitoring; self-supervised learning; label-free; multi-source remote sensing; geo-spectral characteristics; drawdown zone

1. Introduction

Reservoirs are key components of global water infrastructure, providing essential functions for flood control, irrigation, and hydropower generation [1,2]. Unlike natural lakes, reservoir surface areas are subject to frequent anthropogenic regulation [3]. Consequently, accurate spatiotemporal monitoring of reservoir dynamics is necessary for dam operation [4], water resource management [5], and hydrological risk assessment [6]. Under the influence of climate change and increasing extreme weather events [7], reservoirs exhibit large fluctuations in water levels [8], requiring high-frequency and large-scale monitoring solutions.

Multi-source optical remote sensing, including platforms such as Landsat, Sentinel-2, and PlanetScope, provides the data necessary for frequent surface water observations [9]. These sensors have been extensively used in regional and global hydrological research. For example, Landsat archives have enabled the development of global surface water for long-term change analysis [10], while the higher spatial and temporal resolutions of Sentinel-2 [11] and PlanetScope [12] have supported the monitoring of small-scale inland waters and rapid inundation dynamics [13].

Despite these capabilities, automated water extraction from reservoirs faces several environmental and spectral challenges. First, artificial regulation leads to frequent exposure of the drawdown zones, where saturated soils and sparse vegetation induce mixed-pixel effects and spectral signatures that overlap with turbid water [14]. Second, the complex topography of river valleys often causes terrain shadows that are spectrally similar to deep water [15]. Third, varying sediment concentrations produce high intra-class spectral variability [16] within a single reservoir. These environmental factors, combined with inconsistent spectral responses [17] across sensors, reduce the transferability and reliability of automated reservoir water area detection.

Existing water extraction methods are generally categorized into spectral indices, traditional machine learning, and deep learning [18]. Spectral indices, such as the Normalized Difference Water Index (NDWI) [19] and its variants, are widely used for their computational efficiency and physical basis [20,21]. However, these methods typically rely on thresholding techniques (e.g., Otsu’s method [22]), which are less effective when the bimodal distribution of water and non-water classes is obscured by mixed pixels in the drawdown zones [23].

To improve the classification accuracy in complex environments, machine learning models such as Random Forest and Support Vector Machines have been used [24]. These models can integrate multi-dimensional features [25], but they often exhibit limited generalization when applied to different geographical regions or sensors because of shifts in spectral feature distributions [26]. Recently, deep learning architectures, such as U-Net [27] and Transformers [28], have achieved high performance in semantic segmentation [29]. However, the effectiveness of supervised deep learning models depends on the availability of large-scale, high-quality training datasets. Manual annotation of diverse reservoir conditions, such as optically complex waters (turbid waters), is labor-intensive [30]. In addition, the high computational requirements of deep learning models can limit their operational deployment in resource-limited environments [31].

Self-supervised learning has been proposed to reduce the need for manual labels. However, current deep learning-based self-supervised frameworks in remote sensing often adopt pretext tasks designed for natural images, such as contrastive learning or masked autoencoders [32]. These conventional paradigms require massive pre-training and do not explicitly incorporate the spectral and physical characteristics of water [33]. Furthermore, they still require supervised fine-tuning with labeled data to achieve pixel-level accuracy, which limits their applicability for fully automated, near-real-time monitoring during hydrological emergencies [34]. Distinct from these deep learning pre-training paradigms, self-supervision can also be realized through an automated, heuristic-guided weakly supervised approach. This label-free paradigm directly extracts highly reliable internal supervisory signals from the target image itself.

Building on this concept, this study presents a geo-spectral feature-guided Self-Supervised Water Detection (SWD) framework. In this context, the self-supervised mechanism utilizes historical spatial priors and current spectral features to automatically derive high-confidence water samples. These samples serve as reliable training signals, completely eliminating the need for manual annotations and artificial pretext tasks. The framework adopts a per-scene adaptive learning strategy using Gaussian mixture modeling (GMM) to represent the multimodal spectral distribution of reservoir water. Object-level consistency is maintained through superpixel-constrained region growing.

Overall, SWD is designed to be label-free, computationally efficient, and robust across multi-source optical sensors, providing a practical solution for large-scale, long-term reservoir water area detection. Specifically, extensive validations across 36 scenarios reveal that (1) SWD achieves high cross-scale consistency (IoU ≥ 0.774), exhibiting minimal performance degradation at coarser resolutions compared to supervised models; (2) The per-scene adaptive learning strategy ensures robust cross-region generalization (SD of IoU = 0.010) against heterogeneous backgrounds; (3) The framework provides highly consistent cross-temporal hydrological tracking (ΔRE < 1%) during large water-level fluctuations; and (4) The training-free, CPU-friendly architecture provides high deployment flexibility for on-demand processing (0.49–1.29 s/Mpx). These findings demonstrate that integrating spatial priors with per-scene adaptive learning effectively mitigates the spectral variability and surface complexity in reservoir water area detection across multi-source optical imagery. Accordingly, the remainder of this paper is structured as follows: Section 2.1 details the SWD framework; Section 2.2 and Section 2.3 present the study areas and experimental design; Section 3 analyzes the experimental results; Section 4 discusses the paradigm divergence; and the final section concludes the study.

2. Materials and Methods

2.1. Self-Supervised Water Detection Framework

To address spectral variability and surface complexity in reservoir water area detection across multi-source optical imagery, this study proposes an SWD framework. By integrating geo-spectral features with a per-scene adaptive learning strategy, SWD achieves automated reservoir water area detection across optical sensors without manual labels. As illustrated in Figure 1, the workflow consists of four sequential modules: (1) constructing a reservoir-sensitive feature space using NDWI and NDVI (Section 2.1.1); (2) deriving high-confidence water samples from the permanent pool through a spatio-temporal adaptive approach (Section 2.1.2); (3) utilizing an adaptive Gaussian mixture model (GMM) to represent multi-modal spectral distributions (Section 2.1.3); and (4) applying object-level refinement via superpixel-constrained region growing (Section 2.1.4).

2.1.1. Reservoir-Sensitive Feature Space Construction

Compared to natural lakes, reservoirs often exhibit a longitudinal spectral gradient [35] from the upstream region to the dam region. The riverine backwater zones at the reservoir tail are often affected by inflowing sediment and eutrophication, which can increase reflectance and make water spectrally similar to turbid mudflats [36]. In contrast, the deep lacustrine forebays near the dam show strong absorption and lower reflectance. Furthermore, as artificially regulated water bodies, reservoirs feature expansive drawdown zones that are subject to periodic water level fluctuations [14]. These zones are frequently in an alternating wet-dry state and are often colonized by pioneer vegetation, resulting in complex mixed-pixel characteristics between water and vegetation. Additionally, many reservoirs are located in rugged topographies and can be affected by terrain shadows [15]. Given these conditions, using a single spectral index is often insufficient for the robust separation of water and non-water pixels.

Based on these geo-spectral characteristics, SWD uses the normalized difference water index (NDWI) and normalized difference vegetation index (NDVI) to construct a low-dimensional discriminative space. NDWI utilizes the absorption of water in the near-infrared (NIR) band to enhance the spectral contrast between deep water and the surrounding terrain, and is defined as [19]:

\mathrm{NDWI} = \frac{\rho_{\mathrm{Green}} - \rho_{\mathrm{NIR}}}{\rho_{\mathrm{Green}} + \rho_{\mathrm{NIR}}}.

(1)

To compensate for the reduced sensitivity of NDWI under turbid inflows or mudflats, NDVI is introduced as an additional constraint to exclude pioneer vegetation and reduce confusion with dark shadows. NDVI reflects the absorption of vegetation in the red band and reflectance in the NIR band, and is defined as [37]:

\mathrm{NDVI} = \frac{\rho_{\mathrm{NIR}} - \rho_{\mathrm{Red}}}{\rho_{\mathrm{NIR}} + \rho_{\mathrm{Red}}},

(2)

where ρ denotes the surface reflectance in the respective spectral bands (Green, Red, and NIR).

In reservoir environments, combining these indices forms three clusters in the feature space: reservoir water surface (high NDWI and low NDVI), pioneer vegetation/drawdown zones (low NDWI and high NDVI), and terrain shadow (low NDWI and low NDVI).
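As an illustration of Equations (1) and (2), the following minimal sketch computes the two indices from reflectance arrays and stacks them into the NDWI-NDVI feature space. The function names, the `eps` guard against zero denominators, and the toy reflectance values are assumptions introduced here for illustration; they are not taken from the SWD implementation.

```python
import numpy as np

def normalized_difference(a, b, eps=1e-6):
    """Return (a - b) / (a + b), with NaN where the denominator is ~0."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    valid = np.abs(denom) > eps
    out[valid] = (a[valid] - b[valid]) / denom[valid]
    return out

def geo_spectral_features(green, red, nir):
    """Stack NDWI (Eq. 1) and NDVI (Eq. 2) into an (H, W, 2) feature array."""
    ndwi = normalized_difference(green, nir)
    ndvi = normalized_difference(nir, red)
    return np.stack([ndwi, ndvi], axis=-1)

# Toy surface reflectances (hypothetical): clear water, vegetation, shadow.
green = np.array([[0.06, 0.08, 0.02]])
red = np.array([[0.04, 0.05, 0.02]])
nir = np.array([[0.02, 0.40, 0.03]])
features = geo_spectral_features(green, red, nir)
```

In this toy case the water pixel has a positive NDWI and a negative NDVI, while the vegetated pixel shows the opposite signs, which is the separation the feature space is meant to exploit.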

2.1.2. Spatio-Temporal Automated Sample Initialization

In consideration of the spatial patterns of reservoir water, characterized by a stable permanent pool and fluctuating marginal zone, SWD uses a spatio-temporal approach for automated water sample initialization. This approach couples historical spatial priors with current spectral features to derive high-confidence water samples from the current optical image.

(1): Spatial Priors from Historical Water Occurrence

To constrain the potential surface water extent, we utilize the water occurrence (WO) layer from the global surface water (GSW) dataset [10] as historical spatial priors. After aligning the water occurrence layer with the current optical image, we apply a water-occurrence threshold T_WO (initially set to 80%) to extract the persistent water region (PWR). The persistent water region typically corresponds to the stable permanent pool of the reservoir and the mainstem reaches. These areas have consistently been classified as water across multi-decadal observations, indicating a high probability of being water at the time of observation.

(2): Spectral Filtering and Sample Initialization

Despite spatial priors, the current scene may still be affected by clouds, shadows or high-reflectance features. A filtering procedure (split-based approach, SBA [38]) is applied to partition and remove these confounding non-water pixels, yielding a purer initial water sample set. The principal steps of the split-based approach to determine the optimal global threshold for the current scene are as follows:

1. Adaptive splitting and feature extraction. The scene is partitioned into a grid of blocks, and a 10% overlap is maintained between adjacent blocks to ensure comprehensive coverage and reduce edge effects. Subsequently, NDWI, NDVI and NIR reflectance are calculated for each block.

2. Candidate block filtering. Blocks are screened against three criteria: (i) Spatial priors: the block must overlap with the persistent water region; (ii) Water ratio: the proportion of persistent water within the block is constrained between 20% and 80% to ensure a balanced representation of both water and land; and (iii) NIR constraint: the mean NIR reflectance must remain below the global 5th percentile to minimize interference from built-up surface and terrain shadow.

3. Representative block selection and local thresholding. From the qualified candidates, ten blocks exhibiting the highest spectral variability (quantified by the standard deviation of NDWI) are designated as representative blocks. For each of these blocks, Otsu’s method [22] is employed directly to determine the local thresholds for NDWI and NDVI, yielding a local threshold pair (T_{\mathrm{NDWI}}^{\mathrm{local}}, T_{\mathrm{NDVI}}^{\mathrm{local}}).

4. Global threshold integration. The medians of all local threshold pairs are computed to define the final global threshold pair (T_{\mathrm{NDWI}}^{\mathrm{global}}, T_{\mathrm{NDVI}}^{\mathrm{global}}) for the current scene.
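Steps 3 and 4 can be sketched as follows, assuming the candidate blocks have already passed the filter in step 2. Otsu's method is written out with NumPy so the example is self-contained; the function names, the 256-bin histogram, and the synthetic bimodal blocks are assumptions made for this sketch and do not come from the original split-based approach.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Otsu's threshold: the histogram cut that maximizes between-class variance."""
    values = np.asarray(values, dtype=np.float64).ravel()
    values = values[np.isfinite(values)]
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    weights = hist / hist.sum()
    w0 = np.cumsum(weights)                  # probability of the lower class
    w1 = 1.0 - w0                            # probability of the upper class
    cum_mean = np.cumsum(weights * centers)
    total_mean = cum_mean[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    between[~np.isfinite(between)] = -1.0    # ignore degenerate cuts
    return centers[np.argmax(between)]

def global_thresholds(ndwi_blocks, ndvi_blocks, n_blocks=10):
    """Median of local Otsu thresholds over the blocks with the highest NDWI std."""
    order = np.argsort([np.nanstd(b) for b in ndwi_blocks])[::-1][:n_blocks]
    t_ndwi = [otsu_threshold(ndwi_blocks[i]) for i in order]
    t_ndvi = [otsu_threshold(ndvi_blocks[i]) for i in order]
    return float(np.median(t_ndwi)), float(np.median(t_ndvi))

# Synthetic blocks (hypothetical): half water, half land, well separated.
rng = np.random.default_rng(0)

def toy_block(water_mean, land_mean):
    water = rng.normal(water_mean, 0.05, size=500)
    land = rng.normal(land_mean, 0.05, size=500)
    return np.concatenate([water, land]).reshape(25, 40)

ndwi_blocks = [toy_block(0.5, -0.3) for _ in range(12)]
ndvi_blocks = [toy_block(-0.2, 0.6) for _ in range(12)]
t_ndwi, t_ndvi = global_thresholds(ndwi_blocks, ndvi_blocks)
```

Taking the median rather than the mean of the local thresholds keeps a single atypical block from shifting the scene-level threshold pair.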

Finally, pixels within the persistent water region that satisfy both the NDWI and NDVI global threshold criteria simultaneously are selected as high-confidence initial water samples. Formally, the high-confidence water sample set (S_{\mathrm{water}}) is defined as the intersection (\cap) of these conditions:

S_{\mathrm{water}} = \{ p \in \mathrm{PWR} \mid \mathrm{NDWI}(p) \geqslant T_{\mathrm{NDWI}}^{\mathrm{global}} \cap \mathrm{NDVI}(p) \leqslant T_{\mathrm{NDVI}}^{\mathrm{global}} \}.

(3)

If the initial water sample set contains fewer pixels than the required minimum (initially set to 1000 [38]), the threshold T_WO is adaptively decreased by 5% [38]. The initial T_WO is set to 80% globally as a trade-off between sample purity and statistical representation. A higher threshold (e.g., >90%) favors sample purity but may yield insufficient sample sizes in highly shrunken reservoirs, limiting the GMM’s ability to capture the intra-class spectral variance of the water body. Conversely, a lower threshold (e.g., >60%) risks introducing seasonal mixed pixels (e.g., mudflats). A sensitivity analysis provided in the Supplementary Material (Table S1) confirms that starting at 80% reliably secures pure and statistically sufficient training samples across diverse environments, while the adaptive reduction mechanism serves as a robust failsafe for extreme scenarios where valid samples are scarce.
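The sample selection of Equation (3), together with the adaptive relaxation of T_WO, might be sketched as below. The function name, the use of `>=` for the occurrence test, the repeated 5% steps, and the `t_wo_floor` stopping bound are assumptions made for illustration; the text does not specify how often the threshold may be lowered.

```python
import numpy as np

def initial_water_samples(ndwi, ndvi, occurrence, t_ndwi, t_ndvi,
                          t_wo=80.0, min_samples=1000, step=5.0, t_wo_floor=0.0):
    """Select high-confidence water pixels (Eq. 3), relaxing T_WO if too few.

    occurrence: water-occurrence percentage (0-100) aligned with the image.
    t_wo_floor: hypothetical lower bound that stops the relaxation loop.
    Returns the boolean sample mask and the T_WO value actually used.
    """
    spectral_ok = (ndwi >= t_ndwi) & (ndvi <= t_ndvi)
    while True:
        persistent = occurrence >= t_wo           # persistent water region
        samples = persistent & spectral_ok
        if samples.sum() >= min_samples or t_wo - step < t_wo_floor:
            return samples, t_wo
        t_wo -= step                              # relax the spatial prior

# Toy scene (hypothetical): every pixel looks like water spectrally,
# and occurrence increases linearly from 0% to 100%.
ndwi = np.full((100, 100), 0.5)
ndvi = np.full((100, 100), -0.2)
occurrence = np.linspace(0.0, 100.0, 10_000).reshape(100, 100)
samples, t_wo_used = initial_water_samples(
    ndwi, ndvi, occurrence, t_ndwi=0.2, t_ndvi=0.1, min_samples=3000)
```

In this toy scene only about 2000 pixels exceed 80% occurrence, so the threshold is relaxed twice before the requested 3000 samples are available.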

2.1.3. Multi-Modal Modeling and Pixel Discrimination

Due to upstream sediment transport and dam interception, reservoirs frequently exhibit distinct longitudinal spectral gradients [39,40]. The transition from deep, clear zones near the dam to shallow, turbid zones at the inflow causes water spectral characteristics to vary with suspended matter concentration and water depth [36], resulting in multi-modal distribution features. Consequently, this study employs a per-scene adaptive Gaussian mixture model (GMM) [41] to characterize the multi-modal spectral distribution of water samples within the NDWI-NDVI feature space.

(1): Adaptive GMM Construction

We assume that the spectral characteristics of reservoir water follow a weighted superposition of multiple Gaussian distributions:

p(x \mid \Theta) = \sum_{k=1}^{K} \pi_{k} \, \mathcal{N}(x \mid \mu_{k}, \Sigma_{k}),

(4)

\mathcal{N}(x \mid \mu_{k}, \Sigma_{k}) = \frac{1}{\sqrt{(2\pi)^{D} |\Sigma_{k}|}} \exp\left( -\frac{1}{2} (x - \mu_{k})^{T} \Sigma_{k}^{-1} (x - \mu_{k}) \right),

(5)

where π_k, μ_k, and Σ_k represent the mixing coefficient, mean vector, and covariance matrix of the kth Gaussian component, respectively, and K is the total number of components. Each Gaussian component corresponds to the probability distribution of different hydro-spectral sub-classes within the reservoir water (e.g., deep water, shallow water, turbid water) in the spectral feature space.
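To make Equations (4) and (5) concrete, the sketch below evaluates a mixture density in the two-dimensional NDWI-NDVI space with plain NumPy. The component weights, means, and covariances are hypothetical values chosen to mimic a deep-water and a turbid-water mode; in SWD these parameters would instead be estimated from the high-confidence samples.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density of Eq. (5) for each row of x (shape (N, D))."""
    x = np.atleast_2d(x)
    d = x.shape[1]
    diff = x - mean
    inv = np.linalg.inv(cov)
    mahal = np.einsum("ni,ij,nj->n", diff, inv, diff)   # squared Mahalanobis distance
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * mahal) / norm

def mixture_pdf(x, weights, means, covs):
    """Weighted sum of component densities, Eq. (4)."""
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))

# Hypothetical two-component water model in (NDWI, NDVI) space.
weights = [0.7, 0.3]
means = [np.array([0.6, -0.3]), np.array([0.2, -0.05])]   # deep water, turbid water
covs = [0.01 * np.eye(2), 0.01 * np.eye(2)]

pixels = np.array([[0.6, -0.3],     # deep-water-like pixel
                   [-0.5, 0.7]])    # vegetation-like pixel
density = mixture_pdf(pixels, weights, means, covs)
```

The vegetation-like pixel lies far from both components, so its density under the water model is many orders of magnitude smaller than that of the water-like pixel.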

The training of the Gaussian mixture model relies entirely on the high-confidence water sample set generated in Section 2.1.2, constituting a one-class modeling approach. Parameters are initialized from the water sample set S_{\mathrm{water}}, followed by iterative optimization via the expectation–maximization (EM) algorithm [42] to achieve maximum likelihood estimation. To accommodate the varying spectral features across different reservoirs and avoid a subjective selection of the number of mixture components K, the Bayesian information criterion (BIC) [43] is utilized for adaptive model selection on a per-scene basis:

\mathrm{BIC} = -2 \ln(\hat{L}) + K \ln(N),

(6)

\ln(\hat{L}) = \sum_{i=1}^{N} \ln\left( \sum_{k=1}^{K} \pi_{k} \, \mathcal{N}(x_{i} \mid \hat{\mu}_{k}, \hat{\Sigma}_{k}) \right),

(7)

where N is the number of samples and K in the penalty term denotes the number of free parameters, which grows with the number of mixture components. Given that spectral heterogeneity within the NDWI-NDVI feature space is typically constrained by a finite number of distinct hydro-spectral types, the candidate number of components is set to K \in [2, 5]. The optimal configuration is determined by minimizing the Bayesian information criterion. This procedure allows the model to favor a smaller K for spectrally uniform regions and a larger K in complex areas with stronger water heterogeneity.
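The BIC-based choice of K can be sketched in a few lines. The penalty below counts the free parameters of a full-covariance mixture, which is one common convention and an assumption here, since the text does not state the covariance structure. The log-likelihood values are hypothetical placeholders for what an EM fit would return.

```python
import math

def gmm_free_parameters(n_components, n_features=2):
    """Free parameters of a full-covariance GMM (an assumed parameterization)."""
    weights = n_components - 1
    means = n_components * n_features
    covariances = n_components * n_features * (n_features + 1) // 2
    return weights + means + covariances

def bic(log_likelihood, n_params, n_samples):
    """BIC = -2 ln(L) + (number of free parameters) * ln(N); lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

def select_k(log_likelihoods, n_samples, n_features=2):
    """Return the K with the lowest BIC; log_likelihoods maps K -> ln(L)."""
    scores = {k: bic(ll, gmm_free_parameters(k, n_features), n_samples)
              for k, ll in log_likelihoods.items()}
    return min(scores, key=scores.get), scores

# Hypothetical maximized log-likelihoods for K = 2..5 on N = 1000 samples.
log_likelihoods = {2: 1500.0, 3: 1560.0, 4: 1565.0, 5: 1568.0}
best_k, scores = select_k(log_likelihoods, n_samples=1000)
```

With these placeholder values the gain in likelihood from K = 3 to K = 4 or 5 is smaller than the added penalty, so the three-component model is retained.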

(2): Probability-based Discrimination

After completing the GMM fitting, a quantile-based probability density threshold [44] mechanism is introduced to achieve automatic water pixel discrimination. Pixels with a low probability density relative to the fitted water distribution are classified as non-water (background). The global probability density threshold T_PDF is derived by calculating the quantiles at the 87% confidence level for the NDWI and NDVI features individually. To ensure global generalization without requiring scene-specific tuning, this 87% threshold was determined through an empirical sensitivity analysis conducted on an independent validation subset prior to the main experiments. Specifically, probability density quantiles ranging from 75% to 95% were evaluated. As demonstrated in the Supplementary Material (Table S2), lower thresholds (e.g., 75–80%) introduced excessive commission errors by incorporating dark soils and shadows, whereas higher thresholds (e.g., 90–95%) resulted in the omission of shallow water boundaries and narrow tributaries. The 87% level provided the optimal trade-off to minimize background noise while preserving the geometric integrity of the water surface [44]. For each pixel x_{i}, the discrimination rule is defined as:

\mathrm{Label}(x_{i}) = \begin{cases} 1\ (\text{Water}), & \text{if } P(x_{i} \mid \theta_{\mathrm{optimal}}) \geqslant T_{\mathrm{PDF}} \\ 0\ (\text{Non-water}), & \text{otherwise} \end{cases}.

(8)
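The decision rule of Equation (8) reduces to a comparison against T_PDF. The sketch below derives the threshold under one possible reading of the quantile mechanism, namely that T_PDF is the density value exceeded by 87% of the high-confidence samples. That reading, the function names, and the toy density values are assumptions; the per-feature quantile computation described above is not reproduced here.

```python
import numpy as np

def density_threshold(sample_density, level=0.87):
    """Density value exceeded by a fraction `level` of the training samples."""
    return float(np.quantile(sample_density, 1.0 - level))

def label_pixels(pixel_density, t_pdf):
    """Eq. (8): 1 (water) where density >= T_PDF, otherwise 0 (non-water)."""
    return (np.asarray(pixel_density) >= t_pdf).astype(np.uint8)

# Toy densities (hypothetical): 100 training samples and three image pixels.
sample_density = np.linspace(1.0, 100.0, 100)
t_pdf = density_threshold(sample_density)
labels = label_pixels(np.array([0.5, 50.0, 200.0]), t_pdf)
```

Raising `level` lowers T_PDF and admits more pixels as water, which mirrors the commission and omission trade-off reported for the 75% to 95% range.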

2.1.4. Object-Level Morphological Refinement

Controlled by river channel topography, most reservoirs exhibit elongated and dendritic morphologies inherited from river networks. Pixel-level classification often produces fragmented masks in narrow tributaries owing to mixed pixels or shadow occlusion. To obtain a hydrologically consistent reservoir geometry, we introduce a region growing [45] mechanism guided by the Felzenszwalb algorithm (RG-FELZ), which refines the initial mask through structural and topological constraints.

A topological constraint is established via connected-component labeling and a Euclidean distance transform [46] of the initial water mask, providing a spatial reference that facilitates growth from existing water bodies while suppressing isolated noise pixels. Simultaneously, structural homogeneity is maintained using Felzenszwalb superpixel segmentation [47] on the feature layers (scale = 15, sigma = 0.5, min_size = 5). This partitions the image into discrete units, confining expansion within homogeneous regions and preventing over-expansion into riparian shadows. An iterative region-growing process originates from the initial water mask (the seed region). Candidate pixels identified through morphological dilation are added to the water region only if they satisfy both the topological reference and structural boundaries. Finally, to preserve narrow tributaries and dendritic bays while reducing fragmentation in drawdown zones, the refinement is performed independently on each feature layer, and the results are integrated at the pixel level [48]:

\mathrm{Mask}_{\mathrm{final}} = \mathrm{Mask}_{\mathrm{NDWI}} \cap \mathrm{Mask}_{\mathrm{NDVI}}.

(9)
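A simplified sketch of superpixel-constrained growth and the intersection of Equation (9) is given below. It is not the RG-FELZ implementation: the superpixel labels are supplied as a ready-made integer array, the distance-transform reference is replaced by a cap on the number of dilation steps, and growth is restricted to superpixels that already contain seed pixels. All of these choices, and the function names, are assumptions made to keep the example self-contained.

```python
import numpy as np

def dilate4(mask):
    """One step of 4-neighbour binary dilation."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

def grow_within_superpixels(seed, candidate, segments, max_steps=10):
    """Grow `seed` into `candidate` pixels without leaving seeded superpixels.

    seed:      boolean initial water mask
    candidate: boolean mask of pixels that are plausible water in this layer
    segments:  integer superpixel labels (assumed precomputed)
    max_steps: cap on growth distance in pixels (stand-in for a distance constraint)
    """
    allowed = np.isin(segments, np.unique(segments[seed])) & candidate
    water = seed.copy()
    for _ in range(max_steps):
        grown = water | (dilate4(water) & allowed)
        if np.array_equal(grown, water):
            break                                  # converged
        water = grown
    return water

# Toy 5 x 5 scene (hypothetical): superpixel 1 covers columns 0-2, superpixel 2 the rest.
segments = np.ones((5, 5), dtype=int)
segments[:, 3:] = 2
seed = np.zeros((5, 5), dtype=bool)
seed[2, 0] = True
cand_ndwi = np.ones((5, 5), dtype=bool)
cand_ndvi = np.ones((5, 5), dtype=bool)
cand_ndvi[0, :] = False                            # top row rejected by the NDVI layer

mask_ndwi = grow_within_superpixels(seed, cand_ndwi, segments)
mask_ndvi = grow_within_superpixels(seed, cand_ndvi, segments)
mask_final = mask_ndwi & mask_ndvi                 # Eq. (9)
```

The seed fills its own superpixel in the NDWI layer but never crosses into superpixel 2, and the final mask keeps only pixels accepted by both layers.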

2.2. Study Areas and Datasets

To evaluate the performance of the SWD framework across different reservoirs, we constructed a dataset covering diverse geomorphologies, large-amplitude water-level fluctuations, and multi-sensor spatial scales.

2.2.1. Representative Reservoir Scenarios

Six reservoirs were selected globally to represent four environmental scenarios [49] (Table 1).

Scenario 1: Morphological and topological complexity. Lake Eucumbene (Australia) and Pires Ferreira (Brazil) represent reservoirs with complex dendritic shorelines and narrow, winding tributaries that are particularly susceptible to spatial discontinuities caused by mixed-pixel effects. This scenario evaluates the method’s precision in capturing sub-pixel features and its ability to maintain the topological continuity of complex water bodies.

Scenario 2: Turbidity gradients and complex optical properties of surface water. Gilgel Gibe (Ethiopia) represents reservoirs with a longitudinal spectral gradient caused by seasonal sediment-laden runoff. Water near the inflow region is more turbid than that near the dam zone, creating within-reservoir spectral variability. This scenario evaluates the ability of the per-scene adaptive GMM to represent multi-modal water distributions.

Scenario 3: Strong background radiometric interference. Mosul (Iraq) and Orto-Tokoy (Kyrgyzstan) reservoirs represent cases with challenging backgrounds. Mosul reservoir is surrounded by high-albedo sandy terrain, while Orto-Tokoy includes high-reflectance snow/ice and terrain shadows. These reservoirs test false-positive suppression under low-contrast and heterogeneous illumination conditions.

Scenario 4: Spectral confusion in drawdown zones. Choke Canyon reservoir (USA) represents reservoirs with extensive drawdown zones driven by large-amplitude water-level fluctuations. Wet soils and pioneer vegetation have spectral signatures similar to those of shallow or turbid water. This scenario focuses on the model’s ability to distinguish pure water from water-saturated mixed pixels through geo-spectral feature differentiation.

2.2.2. Hydrological Conditions

Driven by dam operations, reservoir water levels vary between the dead water level and the normal water level, which periodically exposes expansive drawdown zones. To evaluate the framework’s performance under these dynamics, two observation conditions were selected for each reservoir.

The high-water-level (HWL) condition represents the reservoir operating near its normal water level, where most drawdown areas are submerged. This condition provides a relatively uniform background to evaluate the consistency of water surface detection at the maximum spatial extent.

The low-water-level (LWL) condition occurs as water recedes toward the dead water level, where mudflats, bare land, and pioneer vegetation are exposed. This condition evaluates the discrimination between shallow water and wet mudflats under spectrally similar backgrounds, and it also evaluates the shoreline accuracy when the water mask becomes spatially fragmented.

2.2.3. Multi-Source Optical Data

To evaluate the performance of the framework in multi-source, cross-scale reservoir water area detection, we constructed an optical dataset spanning three spatial resolutions. For each reservoir and hydrological condition, imagery was acquired within a 7-day window (cloud cover < 10%) to form a quasi-synchronous baseline.

PlanetScope (3 m) was used as the high-resolution validation benchmark, providing meter-level detail to capture the complex geometric features of narrow tributaries and fragmented water patches, especially near the riverine backwater zones at the reservoir tail.

Sentinel-2 (10 m) served as the primary data source for operational reservoir water area detection, because it provides a balance between spatial resolution and revisit frequency.

Landsat-8/9 (30 m) represented the long-term data source for surface water area detection and was used to assess the ability of the framework to handle mixed pixels at a coarser resolution, and thus to verify its reliability for reconstructing historical archival data.

2.2.4. Data Preprocessing and Normalization

A standardized preprocessing workflow was used to reduce radiometric and geometric discrepancies among multi-source sensors. Initially, a unified four-band dataset was generated by selecting the corresponding spectral bands (Blue, Green, Red, and NIR) from Sentinel-2 and Landsat-8/9 to ensure consistency with PlanetScope’s configuration. Surface reflectance products were used across all platforms to minimize atmospheric effects. This harmonization minimizes spectral inconsistencies across sensors, effectively isolating spatial resolution as the key variable for subsequent comparative analyses.

Subsequently, image co-registration was performed using the Sentinel-2 WGS84/UTM grid as the spatial reference to ensure geometric alignment. The study area was delineated using a 500 m buffer surrounding the maximum historical water extent from the global surface water dataset [10] to cover the potential inundation and shoreline variations. Finally, cloud and cloud shadow masks were generated from the standard quality assessment bands [50] to remove invalid pixels, enabling reliable multi-temporal analysis.

2.3. Experimental Design and Evaluation Metrics

Using the multi-source datasets described in Section 2.2, this section defines a comparative evaluation framework for SWD and state-of-the-art supervised baselines (Random Forest and U-Net). The experiments are designed to benchmark performance across accuracy, robustness, and computational efficiency.

2.3.1. Comparative Baselines

(1): Random Forest Baseline

Random Forest (RF) was selected as a conventional machine-learning baseline owing to its ability to handle nonlinear decision boundaries and its robustness against feature noise. To simulate an optimal operational configuration, separate RF models were trained for PlanetScope, Sentinel-2, and Landsat-8/9 using the harmonized four-band inputs (Blue, Green, Red, and NIR) described in Section 2.2.4.

To ensure model stability, stratified random sampling was employed to construct a balanced training set of 1 × 10^5 samples for each sensor (Water:Non-water = 1:1) from the respective sensor-specific training dataset. To optimize the model performance, Bayesian optimization with 200 iterations was implemented to fine-tune key hyperparameters, including the number of trees (50–600), maximum depth (8–30), maximum features (0.3–1.0), and minimum samples required for a split (2–50) [24].
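The balanced sampling and the hyperparameter ranges can be written down as follows. The sampling function is a minimal sketch of a 1:1 stratified draw. The dictionary keys follow scikit-learn's `RandomForestClassifier` parameter names, which is an assumption because the text does not name the library used; the ranges themselves are those listed above.

```python
import numpy as np

# Search ranges from the text; key names assume scikit-learn conventions.
RF_SEARCH_SPACE = {
    "n_estimators": (50, 600),
    "max_depth": (8, 30),
    "max_features": (0.3, 1.0),
    "min_samples_split": (2, 50),
}

def balanced_sample(labels, n_total, seed=0):
    """Draw n_total pixel indices with a 1:1 water:non-water ratio."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).ravel()
    per_class = n_total // 2
    picks = []
    for cls in (0, 1):                               # 0 = non-water, 1 = water
        idx = np.flatnonzero(labels == cls)
        if idx.size < per_class:
            raise ValueError(f"class {cls} has only {idx.size} pixels")
        picks.append(rng.choice(idx, size=per_class, replace=False))
    return np.concatenate(picks)

# Toy label vector (hypothetical): 10% water, 90% non-water.
toy_labels = np.array([0] * 900 + [1] * 100)
sample_idx = balanced_sample(toy_labels, n_total=100)
```

Sampling without replacement within each class avoids duplicated pixels, at the cost of failing when the minority class is smaller than half the requested sample size.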

(2): Deep Learning Baseline

The U-Net architecture was employed as the deep-learning baseline for water semantic segmentation. In alignment with Wieland et al. (2023) [51], we adopted a lightweight U-Net configuration featuring a MobileNet-V3 encoder pretrained on the ImageNet-1K dataset to optimize the trade-off between the segmentation accuracy and computational overhead.

Similar to the RF approach, separate models were trained for each sensor using the harmonized four-band inputs defined in Section 2.2.4. To enhance the delineation of water boundaries, a weighted composite loss function, integrating Lovász-Softmax and cross-entropy, was utilized. The models were trained using the Adam optimizer, with learning-rate scheduling (ReduceLROnPlateau) and early stopping implemented to facilitate stable convergence and mitigate the risk of overfitting.

(3): Baseline Training Data Independence

To ensure spatial independence and prevent data leakage, the six study reservoirs were strictly excluded from the training and validation sets of all supervised baselines. These models were trained on independent, sensor-specific public datasets (Table 2) using standardized procedures.

For PlanetScope, we used the dataset from Mukherjee et al. (2024) [52], which originally provided RGB visualizations. Accordingly, we acquired the corresponding surface reflectance data (Blue, Green, Red, and NIR) by applying a 3–7-day temporal window. Low-confidence water labels were merged into the water category to create binary training sets. For Sentinel-2, the S1S2-Water global reference dataset [53] was used, covering samples from 29 countries. We extracted four 10 m spectral bands (B2, B3, B4, and B8) and used the provided expert-verified masks. For Landsat-8, the training samples were derived from the LandCoverNet global dataset [54]. To mitigate bias stemming from imbalanced dataset sizes, stratified sampling of approximately 1600 chips was performed to align the total pixel count (1.05 × 10⁸) with the scale of the PlanetScope dataset. Additionally, quality thresholds (consensus score ≥ 80% and cloud cover ≤ 20%) were enforced to ensure high label confidence.

2.3.2. Full-Factorial Experimental Design

To evaluate the method’s performance under different spatial resolutions, reservoir environments, and hydrological states, we constructed a three-dimensional full-factorial experimental matrix spanning the cross-scale, cross-region, and cross-temporal dimensions. This design produced 36 test cases (3 sensors × 6 reservoirs × 2 hydrological conditions), complemented by a separate computational efficiency benchmark.

First, to evaluate cross-scale performance, the 36 test cases were grouped into 12 spatio-temporally consistent sets (6 reservoirs × 2 hydrological conditions). Performance was assessed across 30 m (Landsat-8/9), 10 m (Sentinel-2), and 3 m (PlanetScope) imagery, focusing on the capability to resolve mixed pixels and preserve fine-grained features at coarser spatial resolutions.

Second, to evaluate cross-region robustness, the test cases were reorganized into 6 environmentally comparable sets (3 sensors × 2 hydrological conditions). Within each set, performance was compared across the six reservoirs to assess the sensitivity to background conditions such as sediment turbidity, high-albedo bare soil/snow, and pioneer vegetation.

Third, cross-temporal adaptability was evaluated using 18 hydrological monitoring pairs (3 sensors × 6 reservoirs) by comparing high-water-level and low-water-level results. This analysis focuses on the method’s ability to accurately identify exposed mudflats, drawdown zones, and fragmented water bodies following water level recession.

Finally, an operational efficiency benchmark was conducted using the 18 scenes acquired under high-water-level conditions to measure the inference speed (s/Mpx). To evaluate performance under comparable hardware conditions, SWD, RF, and U-Net were all benchmarked on a standard Intel Core i7-8700 CPU (3.2 GHz). Additionally, considering that deep learning models are typically deployed on specialized hardware in operational scenarios, U-Net was also evaluated on an NVIDIA GTX 1060 GPU (6 GB) to demonstrate its optimal accelerated performance.
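The grouping logic of the factorial design can be checked with a short enumeration. The sketch below uses the sensor and reservoir names that appear in this article; the helper `group_by` and the HWL/LWL string codes are illustrative choices, not part of the authors' pipeline.

```python
from itertools import product

SENSORS = ["PlanetScope", "Sentinel-2", "Landsat-8/9"]
RESERVOIRS = ["Eucumbene", "Pires Ferreira", "Gilgel Gibe",
              "Mosul", "Orto-Tokoy", "Choke Canyon"]
CONDITIONS = ["HWL", "LWL"]  # high / low water level

# Full-factorial matrix: every sensor x reservoir x condition combination.
cases = list(product(SENSORS, RESERVOIRS, CONDITIONS))

def group_by(cases, key):
    """Group test cases by the tuple returned by `key` (hypothetical helper)."""
    groups = {}
    for case in cases:
        groups.setdefault(key(case), []).append(case)
    return groups

# Cross-scale: same reservoir and condition, compared across sensors.
cross_scale = group_by(cases, lambda c: (c[1], c[2]))
# Cross-region: same sensor and condition, compared across reservoirs.
cross_region = group_by(cases, lambda c: (c[0], c[2]))
# Cross-temporal: same sensor and reservoir, HWL versus LWL.
cross_temporal = group_by(cases, lambda c: (c[0], c[1]))
# Efficiency benchmark: high-water-level scenes only.
benchmark = [c for c in cases if c[2] == "HWL"]
```

Each grouping holds two factors fixed and lets the third vary, which yields 12 sets of 3 cases, 6 sets of 6 cases, and 18 pairs, respectively, plus 18 benchmark scenes.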

2.3.3. Comprehensive Evaluation Metrics

To assess the segmentation performance and hydrological relevance, we combined pixel-level accuracy metrics with reservoir-specific hydrological consistency metrics.

(1): Pixel-level Accuracy via Cross-scale Validation

We implemented a cross-scale validation strategy in which higher-resolution data provides reference masks for coarser-resolution evaluation [55]. Using the quasi-synchronous multi-source observation baseline (Section 3.3), reference masks were produced by expert visual interpretation of 3 m PlanetScope imagery. For evaluation at 10 m (Sentinel-2) and 30 m (Landsat), the 3 m masks were aggregated to the corresponding grids using majority voting (i.e., a 50% water fraction threshold). To assess potential biases introduced by the aggregation rule in mixed-pixel regions, we conducted a sensitivity analysis using varying water fraction thresholds (25%, 50%, and 75%). As detailed in the Supplementary Material (Table S3), while the absolute metric values fluctuated slightly depending on the strictness of the threshold, the relative performance ranking among the methods remained consistent. Under the standard 50% majority voting rule, a 10 m or 30 m pixel was labeled as water if more than 50% of the underlying 3 m reference sub-pixels were labeled as water. To evaluate the overlap between the detected water surface and the reference masks, we utilized the Intersection over Union (IoU) [51] as the primary accuracy metric.

\mathrm{IoU} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP} + \mathrm{FN}},

(10)

where TP, FP, and FN represent the true-positive, false-positive, and false-negative pixels, respectively.
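The aggregation rule and the IoU metric can be illustrated with a small pure-Python sketch. It assumes the coarse grid is an integer multiple of the fine grid (as for 3 m to 30 m, a factor of 10); the 3 m to 10 m case would need an additional resampling step that is not shown. The function names and the convention of returning 1.0 when both masks contain no water are assumptions made for this example.

```python
def aggregate_majority(mask, factor, threshold=0.5):
    """Aggregate a fine binary mask (list of rows) to a coarser grid.

    A coarse pixel is labeled water (1) when the water fraction of its
    factor x factor block is strictly greater than `threshold`.
    """
    rows, cols = len(mask), len(mask[0])
    if rows % factor or cols % factor:
        raise ValueError("mask dimensions must be multiples of factor")
    coarse = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            water = sum(mask[r + i][c + j]
                        for i in range(factor) for j in range(factor))
            out_row.append(1 if water / factor ** 2 > threshold else 0)
        coarse.append(out_row)
    return coarse

def iou(pred, ref):
    """Intersection over Union, TP / (TP + FP + FN), for binary masks."""
    tp = fp = fn = 0
    for p_row, r_row in zip(pred, ref):
        for p, r in zip(p_row, r_row):
            if p and r:
                tp += 1
            elif p:
                fp += 1
            elif r:
                fn += 1
    union = tp + fp + fn
    return tp / union if union else 1.0  # assumed convention for empty masks

# Toy 4 x 4 reference mask aggregated by a factor of 2.
fine_ref = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
coarse_ref = aggregate_majority(fine_ref, factor=2)        # 50% rule
coarse_ref_25 = aggregate_majority(fine_ref, factor=2, threshold=0.25)
toy_pred = [[1, 1], [0, 1]]
score = iou(toy_pred, coarse_ref)
```

In the toy mask, the lower-left block is exactly half water, so it is labeled non-water under the 50% rule and water under the 25% rule. This is the kind of mixed-pixel sensitivity the threshold analysis is meant to probe.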

(2): Hydrological Consistency via EAC Curves

To relate the image segmentation results to hydrological observations, we compared the estimated water areas with reservoir-specific elevation-area-capacity (EAC) curves. The reference area (Area_ref) was derived from in situ water level observations on the acquisition date. The absolute error (AE) and relative error (RE) between the extracted area (Area_est) and the reference area were calculated to measure the agreement with the reservoir’s hydraulic state [56]:

\mathrm{AE} = \left| \mathrm{Area}_{\mathrm{est}} - \mathrm{Area}_{\mathrm{ref}} \right|,

(11)

\mathrm{RE} = \frac{\mathrm{Area}_{\mathrm{est}} - \mathrm{Area}_{\mathrm{ref}}}{\mathrm{Area}_{\mathrm{ref}}} \times 100\%.

(12)
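As a worked instance of Equations (11) and (12), the sketch below applies both formulas to the Mosul reservoir areas quoted in Section 3.3 (reference 301.43 km² and 231.59 km²; SWD extraction 312.97 km² and 242.10 km²). The function names are illustrative.

```python
def absolute_error(area_est, area_ref):
    """AE = |Area_est - Area_ref| in the same units as the inputs (km^2)."""
    return abs(area_est - area_ref)

def relative_error(area_est, area_ref):
    """Signed RE in percent; positive values indicate overestimation."""
    return (area_est - area_ref) / area_ref * 100.0

# Mosul reservoir, SWD versus the EAC-derived reference (values from Section 3.3).
ae_hwl = absolute_error(312.97, 301.43)   # high water level
re_hwl = relative_error(312.97, 301.43)
ae_lwl = absolute_error(242.10, 231.59)   # low water level
re_lwl = relative_error(242.10, 231.59)
```

For these inputs, AE is about 11.5 km² at high water and 10.5 km² at low water, while RE stays positive at roughly 3.8% and 4.5%.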

(3): Performance Stability and Variation

The coefficient of variation (CV) was used to quantify the variation of IoU across hydrological conditions, where a lower CV indicated higher performance consistency [57]:

\mathrm{CV} = \frac{\sigma_{\mathrm{IoU}}}{\mu_{\mathrm{IoU}}} \times 100\%,

(13)

where σ_IoU and μ_IoU are the standard deviation (SD) and mean of IoU, respectively. In addition, the variation in relative error (ΔRE, %) was used to quantify the shift in area-estimation bias between high-water-level and low-water-level conditions [56]:

\Delta \mathrm{RE} = \left| \mathrm{RE}_{\mathrm{LWL}} - \mathrm{RE}_{\mathrm{HWL}} \right|.

(14)

A smaller ΔRE indicates more consistent area estimates across different hydrological conditions.
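Equations (13) and (14) can be sketched as below. The ΔRE inputs are the mean RE values per method quoted in Section 3.3. The IoU list passed to the CV function is made up purely for illustration, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was applied.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = SD / mean x 100 (%), using the sample SD (assumed estimator)."""
    return stdev(values) / mean(values) * 100.0

def delta_re(re_lwl, re_hwl):
    """Absolute shift in relative error between LWL and HWL (percentage points)."""
    return abs(re_lwl - re_hwl)

# Mean RE (%) as (HWL, LWL) pairs, quoted in Section 3.3.
re_values = {"SWD": (5.58, 6.56), "U-Net": (-6.31, -8.20), "RF": (10.49, 13.04)}
delta = {m: round(delta_re(lwl, hwl), 2) for m, (hwl, lwl) in re_values.items()}

# Hypothetical IoU values, for illustration only.
cv_example = coefficient_of_variation([0.82, 0.80, 0.78, 0.80])
```

Because RE is signed, ΔRE measures how much the bias moves between conditions rather than how large it is, so a method with a large but constant bias would still score a small ΔRE.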

3. Results

3.1. Assessment of Cross-Scale Consistency

Based on the validation framework in Section 2.3.2, this section evaluates the cross-scale performance as the spatial resolution decreases from 3 m to 30 m.

3.1.1. Qualitative Comparative Analysis

Figure 2 presents a visual comparison of the water extraction results obtained by RF, U-Net, and SWD across multiple sensors (PlanetScope, Sentinel-2, and Landsat-8/9) for the Pires Ferreira reservoir at high water level (HWL) and the Choke Canyon reservoir at low water level (LWL). Under the high-water-level scenario (Figure 2a), the Pires Ferreira reservoir exhibits a complex dendritic structure with numerous narrow tributaries. SWD consistently captured these multi-level branches and maintained intricate shoreline geometries across all three spatial resolutions. Even in the 30 m Landsat-8/9 imagery, SWD preserved the connectivity of narrow inlets and sharp bifurcations, closely matching the reference mask. In contrast, the U-Net results appeared visually smoother, with many narrow branches thinned or completely disconnected, particularly as the resolution decreased. RF identified the main water body but produced significant salt-and-pepper noise and fragmented clusters in the terrestrial background. Visual comparison with the false-color imagery indicates that these fragments primarily correspond to low-albedo terrestrial features (e.g., sparse vegetation and dark soil patches). This localized spectral confusion ultimately leads to broken and artificially expanded boundaries composed of isolated commission errors.

Under the low-water-level scenario for the Choke Canyon reservoir (Figure 2b), the reservoir water body is characterized by narrow meandering channels and small fragmented patches. The SWD results demonstrated high consistency across sensors, accurately tracking the winding channels and maintaining the spatial continuity of the water surface. For U-Net, a noticeable spatial shrinkage was observed; it failed to identify several narrow backwater channels in the Landsat-8/9 image, resulting in visible omission errors. The RF extraction results showed numerous isolated pixels and fragmented boundaries in non-water regions with complex textures. Visually, these fragments are driven by the spectral confusion between shallow water and the saturated mudflats and pioneer vegetation exposed in the drawdown zones. RF thus produces a less contiguous and noisier representation of the water body than the other two methods.

3.1.2. Quantitative Performance Evaluation

Results from the 12 comparison groups (Table 3) confirmed that SWD consistently outperformed the supervised baselines across all resolutions. Its average IoU remained high at 0.822 (3 m), 0.805 (10 m), and 0.774 (30 m). This stable accuracy validated its robust adaptability to varying image scales. Conversely, while U-Net performance was similar to SWD at high resolution (0.816), its accuracy decreased markedly in Sentinel-2 (0.785) and Landsat-8/9 (0.741) imagery. RF performance was notably lower, with IoU decreasing to 0.644 at 30 m, approximately 17% lower than SWD in relative terms.

Regarding area estimation, SWD showed the lowest relative error (RE), with values ranging from 5.12% to 7.15%, and the smallest increase in RE with coarser resolution. In contrast, U-Net exhibited faster area error growth (RE reached 9.32% at 30 m), and RF produced the largest errors (10.03–13.91%), showing a significant positive area bias at coarser resolutions.

Furthermore, SWD exhibited superior cross-scale stability, with cross-scale standard deviation (SD) for both IoU (0.024) and RE (1.02%) being significantly lower than those of U-Net (SD of IoU: 0.038, SD of RE: 1.95%) and RF (SD of IoU: 0.032, SD of RE: 1.97%). This minimized variation indicated that SWD’s extraction performance and area estimation were insensitive to resolution degradation.
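The cross-scale SD values for IoU can be reproduced from the per-resolution means quoted above. The short check below assumes the sample standard deviation (n − 1 denominator), which matches the reported 0.024 and 0.038; the RF series is omitted because only its 30 m value is quoted in the text.

```python
from statistics import stdev

# Mean IoU at 3 m, 10 m, and 30 m, as quoted in the text above (Table 3).
iou_by_scale = {
    "SWD": [0.822, 0.805, 0.774],
    "U-Net": [0.816, 0.785, 0.741],
}

# Sample SD across the three resolutions, rounded as in the article.
cross_scale_sd = {m: round(stdev(v), 3) for m, v in iou_by_scale.items()}

# Relative IoU gap between RF (0.644) and SWD (0.774) at 30 m, in percent.
relative_gap_30m = (0.774 - 0.644) / 0.774 * 100.0
```

The same calculation shows that the "approximately 17%" gap between RF and SWD at 30 m is a relative difference (about 16.8%), corresponding to an absolute IoU difference of 0.130.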

The quantitative metric degradation of the baseline models can be directly linked to their distinct visual error patterns observed in Figure 2. For instance, the loss and disconnection of narrow tributaries (e.g., U-Net at 30 m) increase false negatives (FN) along dendritic inlets. This spatial omission reduces the IoU and manifests as a negative area bias (underestimation). Conversely, the boundary expansion and fragmentation typical of RF introduce false positives (FP) around shorelines and textured drawdown zones. This FP injection decreases the IoU and produces a positive-biased RE. By mitigating both FN and FP, SWD’s per-scene adaptive learning maintains consistent boundary delineation, which translates quantitatively into the aforementioned minimal cross-scale SD.

From a practical perspective, these metric differences carry important hydrological implications. In operational reservoir monitoring based on Elevation-Area-Capacity (EAC) curves, an RE exceeding 9% (as seen in U-Net and RF at 30 m) can lead to notable miscalculations in water storage, potentially affecting dam operations during flood discharges or drought allocations. Furthermore, an approximately 17% reduction in IoU (as observed in RF compared to SWD) indicates a distortion of shoreline geometry. Such inaccuracies can limit the utility of extracted water masks for high-precision environmental applications, such as assessing localized inundation risks in narrow tributaries or monitoring the ecological exposure of complex drawdown zones.

3.2. Evaluation of Cross-Region Robustness

This section assesses the generalization performance across the four environmental scenarios (Section 2.2.1), analyzing regional accuracy and stability based on Table 4 and Figure 3. To systematically evaluate performance, specific quantitative criteria (IoU, AE, or SD) are highlighted for each distinct environmental challenge. Furthermore, spectral analysis of typical misclassification-prone boundary zones (Supplementary Material, Figure S1) reveals severe spectral overlaps between true water and challenging backgrounds (e.g., saturated mudflats or sediment-laden water).

In regions with complex dendritic geometries (Scenario 1: Eucumbene and Pires Ferreira reservoirs), the primary evaluation criterion is spatial overlap (IoU), which reflects the preservation of topological continuity. SWD’s superpixel-constrained region-growing explicitly confines boundary expansion and maintains narrow tributaries at coarse resolutions. Consequently, SWD achieved the highest IoU (0.815) and lowest absolute error (AE: 2.17 km²) for the Pires Ferreira reservoir (Figure 2a), effectively mitigating the spatial fragmentation observed in RF (AE: 4.37 km²) and the omission of fine channels in U-Net (IoU: 0.785).

For reservoirs exhibiting longitudinal sediment variations (Scenario 2: Gilgel Gibe reservoir, Figure 3a), AE serves as the main metric to quantify the misclassification of turbid water. Here, the per-scene adaptive GMM enables SWD to independently model the distinct spectral modes of turbid riverine inflows and clear lacustrine zones. This adaptive multi-modal representation directly translates to a lower AE (8.71 km²) compared to U-Net (9.01 km²) and RF (15.50 km²), preventing the omission or commission of sediment-laden waters.

Under strong background radiometric interference (Scenario 3: Mosul reservoir, Figure 3b and Orto-Tokoy reservoir, Figure 3c), the primary challenge is false-positive suppression, best reflected by the IoU metric. Supervised models exhibited noticeable metric degradation due to spectral confusion with high-albedo sand (e.g., at Mosul reservoir, where RF yielded an AE of 23.92 km²) or mountain shadows (e.g., at Orto-Tokoy reservoir, where U-Net’s IoU dropped to 0.738). By leveraging historical spatial priors to filter out confounding terrestrial signals during sample initialization, SWD maintained higher spatial agreement across these challenging backgrounds (IoU: 0.785 for the Orto-Tokoy reservoir; AE restricted to 11.02 km² for the Mosul reservoir).

In reservoirs with extensive drawdown zones (Scenario 4: Choke Canyon reservoir, Figure 2b), AE indicates the capability to separate pure water from saturated mudflats. The integration of NDVI into SWD’s feature space explicitly penalized the spectral signatures of pioneer vegetation and wet soils. This feature-level differentiation resulted in a tighter boundary delineation and the lowest AE (3.55 km²), compared to the area estimations of U-Net (3.84 km²) and RF (6.21 km²).

Finally, the statistical summary in Table 4 highlights cross-region stability, quantified by the inter-reservoir standard deviation (SD) of IoU. SWD yielded an SD of 0.010, indicating that the per-scene adaptive strategy effectively decouples extraction accuracy from shifting regional backgrounds. Conversely, U-Net and RF exhibited higher inter-reservoir fluctuations (SD = 0.025 and 0.031, respectively), reflecting the performance variations typically encountered when static models are applied to challenging scenarios such as the Mosul and Orto-Tokoy reservoirs.

3.3. Assessment of Cross-Temporal Adaptability

To evaluate the method’s capability to track hydrological dynamic processes, this section assesses trend consistency, response sensitivity, and shoreline topological stability based on the cross-temporal validation framework (Table 5 and Table 6 and Figure 4).

(1): Trend Consistency

The results demonstrated that SWD provided accurate and consistent monitoring of hydrological trends. As evidenced by the relative error (RE) metrics in Table 5, the three methods exhibited distinct area estimation biases. SWD maintained stable RE values across high-water-level (HWL) and low-water-level (LWL) conditions, at +5.58% and +6.56%, respectively, indicating minimal variation as water extent receded. Conversely, U-Net exhibited a significant negative area bias under both conditions, ranging from −6.31% to −8.20%. This underestimation became more pronounced during water level recession, suggesting a systematic omission of shallow or fragmented water patches (Figure 4). The RF baseline demonstrated a consistently positive area bias ranging from +10.49% to +13.04%. This overestimation intensified under low-water-level conditions, likely due to an increased sensitivity to non-water features exposed in the drawdown zones (Figure 4).

Notably, the variation in relative area error (ΔRE) for SWD was only 0.98 percentage points, which was substantially lower than those of U-Net (1.89) and RF (2.55). Taking the Mosul reservoir as a representative example (Table 6), as the reference surface water extent decreased from 301.43 km² to 231.59 km², SWD’s extractions (312.97 km² and 242.10 km²) closely tracked the hydrological reference with a stable bias. In contrast, the supervised baselines suffered from either systemic omission (U-Net) or persistent overestimation (RF) during this hydrological transition (Figure 3b).

(2): Response Sensitivity

Regarding the sensitivity to water level fluctuations, SWD accurately captured surface area dynamics with minimal performance degradation. While the IoU for all methods decreased during the low-water-level period, SWD exhibited the highest resilience with the smallest decline (from 0.810 to 0.791, approximately −2.35%), outperforming U-Net (−4.63%) and RF (−4.18%). In terms of robustness, SWD demonstrated the highest stability with a coefficient of variation of 2.50%, less than half that of U-Net (6.41%) and roughly one-third that of RF (7.37%). This sensitivity was further validated at the Gilgel Gibe reservoir (Table 6), which experienced a reference areal recession of 37 km². SWD quantified a reduction of 36.65 km² (ΔArea closely matching the reference), whereas RF underestimated the recession magnitude (quantifying only 34.58 km²). These results indicated that SWD effectively distinguished shallow water from saturated mud in drawdown zones (Figure 3a), mitigating the underestimation of the recession magnitude caused by spectral confusion.

(3): Shoreline Topological Stability

A qualitative comparison (Figure 4) visually indicates that SWD maintained shoreline smoothness and topological integrity under low-water-level conditions. Quantitatively, the area error relative to the in situ hydrological reference, reported here with its sign (Area_est − Area_ref) to indicate the direction of bias, was used to evaluate the spatial alignment. For the dendritic Eucumbene reservoir, SWD demonstrated the closest alignment during the low-water stage, restraining the AE to approximately +5.45 km². This tightly constrained error indicates the effective preservation of narrow tributaries and fine-scale edge details. In contrast, U-Net exhibited a pronounced negative bias (AE = −5.72 km²), primarily driven by spatial discontinuity and the extensive omission of narrow inlets. Meanwhile, RF produced a severe positive bias (AE = +11.20 km²), owing to the massive misclassification of exposed bare land surrounding the receding water body. Consequently, the smaller AE and stable error direction of SWD confirm its boundary stability, demonstrating its capacity to accurately delineate complex dendritic geometries without severe structural distortion under challenging hydrological conditions.

3.4. Operational Utility and Efficiency

The boxplot analysis (Figure 5) summarizes the IoU accuracy metrics across the 36 experimental scenarios. SWD demonstrated superior performance, characterized by the highest median IoU and the smallest interquartile range (IQR), with no observed outliers (0.801 ± 0.025). This confirmed that SWD maintained high accuracy and consistent stability across diverse cross-scenario conditions. In contrast, although U-Net achieved a median IoU comparable to that of SWD, it exhibited greater dispersion (0.781 ± 0.047). This larger standard deviation indicated that the stability of U-Net fluctuated when applied to different reservoir environments without site-specific training samples. RF showed inferior accuracy and stability (0.679 ± 0.046), with outliers further highlighting its limited generalization capability across heterogeneous reservoir scenarios.

Table 7 compares the computational efficiency and deployment characteristics of the evaluated methods. SWD operated in an unsupervised and training-free manner, requiring no manual labels and running on standard CPU hardware. Its average inference latency was 0.858 ± 0.372 s/Mpx, which is suitable for platforms with limited computational resources. The computational load varied with spatial resolution, with processing times of 1.29 s/Mpx for PlanetScope imagery and 0.49 s/Mpx for Landsat-8/9 data, providing a balance between spatial detail and processing speed. This processing time encompasses the entire end-to-end pipeline, including I/O operations, feature space construction, adaptive GMM fitting, and superpixel-constrained refinement. In comparison, the RF baseline showed a similar inference speed (0.956 ± 0.011 s/Mpx) in the same CPU environment, but required pre-loaded classifiers and was more susceptible to spectral variability across different sensors. For the deep learning baseline, U-Net required 1.864 ± 0.1518 s/Mpx when executed in the identical CPU environment. Although its inference time can be reduced to 0.031 ± 0.0008 s/Mpx via specialized GPU acceleration, this hardware dependency limits its operational flexibility for on-demand monitoring in resource-constrained or emergency response scenarios.
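To make the s/Mpx figures concrete, the sketch below converts the mean latencies quoted above into an estimated wall-clock time for a single scene. The 5000 × 5000 px scene size is a hypothetical value chosen for illustration, and the calculation assumes latency scales linearly with pixel count, which the text does not claim.

```python
def scene_time_seconds(width_px, height_px, latency_s_per_mpx):
    """Estimated processing time, assuming linear scaling with pixel count."""
    megapixels = width_px * height_px / 1e6
    return megapixels * latency_s_per_mpx

# Mean inference latencies in s/Mpx, as quoted above (Table 7).
LATENCY = {
    "SWD (CPU)": 0.858,
    "RF (CPU)": 0.956,
    "U-Net (CPU)": 1.864,
    "U-Net (GPU)": 0.031,
}

# Hypothetical 5000 x 5000 px scene (25 Mpx).
estimates = {m: scene_time_seconds(5000, 5000, s) for m, s in LATENCY.items()}
```

Under these assumptions, SWD would need about 21 s per 25 Mpx scene on the CPU, compared with roughly 47 s for U-Net on the same CPU and under 1 s for U-Net on the GPU.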

In summary, SWD strikes a balance between accuracy, robustness, and efficiency. By maintaining high-speed performance on standard CPUs without specialized hardware dependencies, it provides a practical and highly adaptable solution for large-scale reservoir monitoring tasks.

4. Discussion

4.1. Robustness via Per-Scene Adaptive Learning

The robustness of SWD is attributed to its per-scene adaptive learning paradigm. Unlike supervised models that rely on a fixed training dataset, the proposed framework dynamically constructs an optimal model for each individual scene by combining geo-spectral features, pixel-level classification, and object-level refinement.

Regarding feature space construction and sample initialization, SWD constructs a physically consistent feature space using NDWI and NDVI. By leveraging the specific red-edge effect of pioneer vegetation [58], the framework explicitly enhances the separability between wet soil or pioneer vegetation and pure water, ensuring strong spectral transferability across optical sensors. Furthermore, by integrating historical spatial priors and current spectral information [10], a high-confidence self-supervision signal in the stable permanent pool (deep-water region) is identified. This localization strategy shields the model from radiometric interference originating from complex mixed pixels in drawdown zones or terrain shadows, thereby yielding a purer initial water sample set.

In terms of pixel-level classification and object-level refinement, SWD addresses the longitudinal spectral gradient [39], a typical reservoir phenomenon driven by runoff dynamics, by employing an adaptive GMM. The GMM represents the multi-modal spectral distribution (ranging from turbid riverine inflows to clear lacustrine zones) as distinct hydro-spectral sub-modes [59]. This allows the model to dynamically reconstruct the probability thresholds for each mode, ensuring accurate segmentation even during sudden turbidity events. Finally, by applying topological continuity constraints via superpixel-based refinement, SWD mitigates spatial discontinuities caused by atmospheric residuals. This step preserves the dendritic morphology of reservoirs within topologically connected units, ensuring that the extracted water bodies maintain a coherent structure consistent with the gravitational equipotential surface.
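The feature construction and multi-modal modelling discussed in the two paragraphs above can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it assumes the standard green/NIR form of NDWI, replaces the adaptive mixture over the NDWI and NDVI feature space with a fixed two-component, one-dimensional Gaussian mixture fitted by plain EM, and uses synthetic NDWI samples to stand in for turbid and clear water. All function names are hypothetical.

```python
import math
import random

def ndwi(green, nir):
    """NDWI = (Green - NIR) / (Green + NIR), the green/NIR form (assumed)."""
    return (green - nir) / (green + nir)

def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)

def fit_gmm_1d(x, k=2, n_iter=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM."""
    n = len(x)
    xs = sorted(x)
    # Initialise means at evenly spaced quantiles, with a shared variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mu_all = sum(xs) / n
    var_all = max(sum((v - mu_all) ** 2 for v in xs) / n, 1e-6)
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for v in x:
            p = [w * math.exp(-(v - m) ** 2 / (2 * s)) / math.sqrt(2 * math.pi * s)
                 for w, m, s in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: update weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            variances[j] = max(
                sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nj,
                1e-6)
    return weights, means, variances

# Synthetic NDWI samples: a turbid mode near 0.2 and a clear-water mode near 0.6.
rng = random.Random(0)
samples = ([rng.gauss(0.2, 0.03) for _ in range(300)]
           + [rng.gauss(0.6, 0.03) for _ in range(300)])
weights, means, variances = fit_gmm_1d(samples, k=2)
```

Fitting the mixture per scene is what lets each spectral mode receive its own mean and variance instead of sharing one global threshold. A fuller version would operate on both indices jointly and choose the number of components from the data.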

4.2. Performance Divergence Analysis

A central finding of this research is the inherent advantage of the per-scene adaptive learning paradigm over static global models in addressing the spectral variability and surface complexity of reservoir water area detection across multi-source optical imagery. Supervised models (U-Net, RF) operate as static global models, relying on the independent and identically distributed (i.i.d.) assumption that the training and test data share the same distribution [60]. However, the heterogeneity inherent in the river–lake continuum of reservoirs [36,40] frequently makes this assumption invalid.

In cross-scale tasks, static supervised models exhibited systematic performance degradation as resolution decreased, even though dedicated models were trained independently for each sensor (Landsat, Sentinel-2, and PlanetScope). This indicates that even when the potential interference from spectral differences or model migration is minimized, supervised models are limited by the information density [61] provided by the source training data. For U-Net, the accuracy in extracting spatial textures decreases as shoreline structures that are clear at high resolutions become blurred transition zones in lower-resolution imagery. In such cases, even specialized models can only learn a fuzzy average representation from mixed spectra [62], leading to over-smoothed shorelines and the omission of narrow tributaries (e.g., U-Net’s IoU dropped to 0.741 at 30 m, Table 3). Similarly, for RF, the piecewise linear segmentation rules [24] struggle to resolve the highly complex, nonlinearly entangled distributions of mixed pixels in the drawdown zones, resulting in spatial discontinuity and frequent impulse noise (e.g., producing an RE of +13.91%). In contrast, SWD maintains cross-scale consistency by dynamically recalibrating its probabilistic kernel to the specific spectral characteristics of each resolution.

In cross-region evaluations, the fixed feature representations of supervised models present notable challenges in generalization when encountering out-of-distribution samples [63]. Because supervised parameters are optimized within specific offline datasets, fixed weights (e.g., U-Net convolutional kernels) and discrimination rules [64] often struggle to adapt to the nonlinear deformation of the feature space caused by environmental heterogeneity. This incompatibility leads to commission errors when models encounter high-reflectance sand (e.g., RF produced an AE of +23.92 km² at the Mosul reservoir) or complex snow/ice and terrain shadows (e.g., U-Net’s IoU dropped to 0.738 at the Orto-Tokoy reservoir) not fully represented in the training data. Conversely, the per-scene adaptive learning paradigm of SWD addresses global generalization as a simplified problem of single-scene optimal approximation. By identifying specific spectral–physical boundaries based on immediate observations, SWD ensures high reliability across heterogeneous backgrounds without relying on region-specific training data.

In cross-temporal tasks, the incompatibility between static models and dynamic hydrological cycles is evident. Reservoirs are characterized by rapid water-level fluctuations, which change the coupling relationship between spectral signatures and surface geometry. For example, the fixed parameters in U-Net exhibited reduced sensitivity to the textural variations of expansive mudflats during recession [65], leading to notable boundary deviations (e.g., resulting in an RE of −8.20% under LWL conditions, Table 5). Similarly, static thresholds in RF failed to track water-level-driven spectral shifts (e.g., deepening or shallowing water) and background exposure variations, resulting in significant error accumulation. The stateless architecture of SWD independently models each image, enabling the precise identification of dynamic boundaries without inheriting historical biases, thus ensuring an unbiased estimation of water surface dynamics.

The dynamic adaptation of SWD inherently relies on the quality of initial pseudo-samples derived from spatial priors. In extreme scenarios, such as newly impounded reservoirs lacking historical records, complete desiccation during severe droughts, or complete freezing during harsh winters, this initialization strategy is compromised by the absence of a reliable pure water reference. Under these conditions, purely self-supervised approaches may struggle to establish valid spectral distributions, whereas generalized deep learning models pretrained on highly diverse historical datasets can provide more stable baseline masks. Consequently, while the per-scene adaptive paradigm effectively mitigates local spectral variability, integrating it with robust global models remains necessary for uninterrupted operational monitoring across extreme scenarios.

4.3. Operational Applicability in Hydrology

The experimental results validate that SWD effectively bridges the gap between algorithmic precision and operational hydrology.

Specifically, the demonstrated temporal consistency across multi-source optical sensors (Section 3.1) directly supports the reconstruction of long-term elevation-area-capacity (EAC) relationships [66]. By overcoming the sensor bias inherent in static models, SWD provides a reliable technical pathway for generating unbiased surface area sequences in ungauged basins, serving as a robust baseline for assessing water volume dynamics and sedimentation effects [67].

Furthermore, the label-free capability of this method facilitates immediate emergency response. In urgent scenarios, such as catastrophic floods or barrier lake formation, SWD allows for direct deployment on multi-source imagery without site-specific tuning. This provides near-real-time decision support for formulating flood dispatching instructions and assessing disaster losses without the latency incurred by model retraining.

Finally, the high-confidence water masks generated by SWD can be used as high-quality pseudo-labels to establish a data iteration pathway for weakly supervised learning. This allows researchers to construct specialized deep learning models for specific basins (e.g., cascade hydropower stations) at a low cost, substantially alleviating the sample scarcity bottleneck for model deployment.

4.4. Limitations and Future Directions

Despite its robustness, SWD is subject to certain boundary conditions and limitations inherent to passive optical remote sensing. Typical failure cases primarily stem from severe physical or optical occlusions (Figure S2). Like other optical-based algorithms, the framework is inevitably affected by persistent cloud and cloud shadows (Figure S2a), which can lead to significant temporal data gaps and hinder continuous hydrological monitoring [68,69]. Additionally, it may experience spectral confusion in extremely complex optical conditions, such as dense floating vegetation (Figure S2b) or ice/snow-covered waters (Figure S2c). In these scenarios, the spectral signatures of water closely mimic those of pioneer vegetation or highly reflective surfaces, potentially leading to the misclassification of water extent [70,71]. Furthermore, while historical spatial priors (GSW) effectively initialize the automated sampling, the framework may encounter challenges in regions with abrupt topological alterations (e.g., newly constructed reservoirs) where historical data are obsolete or unavailable. To maintain operational robustness in such ungauged scenarios, the initial spatial constraints can be flexibly adapted by utilizing basic spectral indices (e.g., NIR or NDWI) rather than relying solely on historical records. Meanwhile, the current per-scene independent processing strategy lacks temporal contextual information. Consequently, the framework may not effectively filter out transient noise (e.g., fleeting anomalies or temporary observation errors). Such noise could otherwise be identified and corrected by employing time-series smoothing or hidden Markov models that leverage historical trends and subsequent observations, a well-documented necessity in long-term surface water mapping [72,73].

To address these challenges, future research will focus on three key directions. First, multimodal fusion will be explored by integrating SAR data (e.g., Sentinel-1) to achieve all-weather monitoring capabilities and mitigate cloud-induced failures [69]. Second, hydro-topographic embedding will be incorporated by utilizing DEMs and river network topology to explicitly constrain false positives in complex terrain. Finally, the framework will evolve towards spatio-temporal integration, moving from single-scene modeling to spatio-temporal synergistic perception to better capture long-term evolutionary trends and support intelligent reservoir management.

5. Conclusions

This study develops and validates the SWD framework, designed for robust, label-free reservoir water area monitoring using multi-source optical remote sensing data. By shifting from the traditional static global model paradigm to a per-scene adaptive learning strategy, the proposed framework effectively addresses the challenges of spectral variability and surface complexity.

The experimental evaluation across 36 diverse scenarios led to several key conclusions. (1) High cross-scale and cross-region robustness. SWD maintained high accuracy (Mean IoU = 0.80) across Landsat-8/9, Sentinel-2, and PlanetScope imagery. Per-scene calibration allowed it to outperform supervised models in challenging environments, such as high-albedo arid terrains and rugged mountainous shadows, where static models often suffer from commission or omission errors. (2) Superior hydrological consistency. By leveraging the adaptive GMM and object-level refinement, SWD accurately tracked reservoir area fluctuations and maintained topological integrity. The minimal variation in relative area error (ΔRE < 1%) across high- and low-water levels demonstrated its capability to reconstruct long-term, unbiased hydrological records from heterogeneous archival data. (3) Operational efficiency and flexibility. As a training-free and CPU-efficient method, SWD bridges the gap between algorithmic precision and operational hydrology. It is particularly suited for near-real-time emergency response and large-scale water resource assessment in ungauged or data-scarce basins.

While the current framework demonstrates high reliability, its performance remains subject to the inherent limitations of passive optical sensors, such as cloud obscuration and optically complex waters. Future research will explore the integration of multi-modal data (e.g., SAR) and hydro-topographic constraints (e.g., DEM) to evolve SWD into a fully automated, all-weather spatio-temporal perception system for global reservoir management.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs18060918/s1, Table S1: Sensitivity analysis of the initial water occurrence threshold (T_WO) on sample initialization and final extraction accuracy. (Tested on a dynamic reservoir in the Heihe River Basin, Northwest China); Table S2: Sensitivity analysis of the global probability density threshold for water pixel discrimination; Table S3: Sensitivity analysis of cross-scale aggregation rules on water extraction accuracy (Mean IoU); Figure S1: Frequency distribution histograms of NDWI demonstrating spectral overlaps between true water and typical misclassification areas in (a) Eucumbene reservoir and (b) Gilgel Gibe reservoir; Figure S2: Typical failure cases of the SWD framework: (a) omission errors caused by clouds and shadows; (b) omission errors induced by dense floating vegetation; and (c) commission errors triggered by highly reflective ice/snow. Red rectangle/circle indicates the misclassified regions.

Author Contributions

Conceptualization, G.M. and Q.Y.; methodology, G.M. and Q.Y.; formal analysis, G.M.; resources, Q.Y.; data curation, G.M.; writing—original draft preparation, G.M.; writing—review and editing, Q.Y.; visualization, G.M.; supervision, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The Sentinel-2 and Landsat-8/9 images used in this study are accessible at https://dataspace.copernicus.eu/ (accessed on 10 July 2025) and https://earthexplorer.usgs.gov/ (accessed on 20 July 2025). The PlanetScope images are obtained through https://www.planet.com/ (accessed on 27 July 2025). The EAC Curve can be found at https://dahiti.dgfi.tum.de/en/ (accessed on 10 August 2025).

Acknowledgments

The authors would like to express their gratitude to the European Space Agency (ESA) for providing Sentinel-2 imagery and to NASA/USGS for the Landsat archives. The authors would also like to thank Planet Labs for providing the PlanetScope data. During the preparation of this manuscript, the authors used Gemini to refine the academic language and improve the grammatical flow of the manuscript. The authors have reviewed and edited the output and take full responsibility for the content of this article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Eekhout, J.P.C.; Boix-Fayos, C.; Pérez-Cutillas, P.; De Vente, J. The Impact of Reservoir Construction and Changes in Land Use and Climate on Ecosystem Services in a Large Mediterranean Catchment. J. Hydrol. 2020, 590, 125208.
2. Xu, A.; Yang, L.E.; Yang, W.; Chen, H. Water Conservancy Projects Enhanced Local Resilience to Floods and Droughts over the Past 300 Years at the Erhai Lake Basin, Southwest China. Environ. Res. Lett. 2020, 15, 125009.
3. Cooley, S.W.; Ryan, J.C.; Smith, L.C. Human Alteration of Global Surface Water Storage Variability. Nature 2021, 591, 78–81.
4. Yang, Q.; Shen, X.; He, K.; Zhang, Q.; Helfrich, S.; Straka, W.; Kellndorfer, J.M.; Anagnostou, E.N. Pre-Failure Operational Anomalies of the Kakhovka Dam Revealed by Satellite Data. Commun. Earth Environ. 2024, 5, 230–238.
5. Shah, D.; Gao, H. Global Patterns of Reservoir Fullness and Fluctuations during Droughts. Environ. Res. Lett. 2025, 20, 124060.
6. Ficklin, D.L.; Null, S.E.; Abatzoglou, J.T.; Novick, K.A.; Myers, D.T. Hydrological Intensification Will Increase the Complexity of Water Resource Management. Earth’s Future 2022, 10, e2021EF002487.
7. Li, D.; Lu, X.; Walling, D.E.; Zhang, T.; Steiner, J.F.; Wasson, R.J.; Harrison, S.; Nepal, S.; Nie, Y.; Immerzeel, W.W.; et al. High Mountain Asia Hydropower Systems Threatened by Climate-Driven Landscape Instability. Nat. Geosci. 2022, 15, 520–530.
8. Ye, X.; Zhu, H.-H.; Chang, F.-N.; Xie, T.-C.; Tian, F.; Zhang, W.; Catani, F. Revisiting Spatiotemporal Evolution Process and Mechanism of a Giant Reservoir Landslide during Weather Extremes. Eng. Geol. 2024, 332, 107480.
9. Huang, C.; Chen, Y.; Zhang, S.; Wu, J. Detecting, Extracting, and Monitoring Surface Water from Space Using Optical Sensors: A Review. Rev. Geophys. 2018, 56, 333–360.
10. Pekel, J.-F.; Cottam, A.; Gorelick, N.; Belward, A.S. High-Resolution Mapping of Global Surface Water and Its Long-Term Changes. Nature 2016, 540, 418–422.
11. Liang, X.; Mao, W.; Yang, K.; Ji, L. Automated Small River Mapping (ASRM) for the Qinghai-Tibet Plateau Based on Sentinel-2 Satellite Imagery and MERIT DEM. Remote Sens. 2022, 14, 4693.
12. Perin, V.; Roy, S.; Kington, J.; Harris, T.; Tulbure, M.G.; Stone, N.; Barsballe, T.; Reba, M.; Yaeger, M.A. Monitoring Small Water Bodies Using High Spatial and Temporal Resolution Analysis Ready Datasets. Remote Sens. 2021, 13, 5176.
13. Kimijima, S.; Nagai, M. High Spatiotemporal Flood Monitoring Associated with Rapid Lake Shrinkage Using Planet Smallsat and Sentinel-1 Data. Remote Sens. 2023, 15, 1099.
14. Jiang, W.; Li, W.; Zhou, J.; Wang, P.; Xiao, H. Drone-Based Investigation of Natural Restoration of Vegetation in the Water Level Fluctuation Zone of Cascade Reservoirs in Jinsha River. Sci. Rep. 2022, 12, 12895.
15. Tate, C.G.; Cave, J.; Moyers, R.L. Segmenting Water and Shadow Regions within WorldView Imagery Using Local Binary Patterns. J. Appl. Remote Sens. 2022, 16, 034532.
16. Li, T.; Pasternack, G.B. Water Transfer Redistributes Sediment in Small Mountain Reservoirs. Water Resour. Manag. 2022, 36, 5033–5048.
17. Li, S.; Cheng, L.; Chang, L.; Fu, C.; Guo, Z.; Liu, P. Multi-Factor Weighted Image Fusion Method for High Spatiotemporal Tracking of Reservoir Drawdown Area and Its Vegetation Dynamics. Int. J. Appl. Earth Obs. Geoinf. 2024, 129, 103855.
18. Rajeswari, S.; Rathika, P. Emerging Methodologies in Waterbody Delineation: An in-Depth Review. Int. J. Remote Sens. 2024, 45, 5789–5819.
19. McFeeters, S.K. The Use of the Normalized Difference Water Index (NDWI) in the Delineation of Open Water Features. Int. J. Remote Sens. 1996, 17, 1425–1432.
20. Guo, Q.; Pu, R.; Li, J.; Cheng, J. A Weighted Normalized Difference Water Index for Water Extraction Using Landsat Imagery. Int. J. Remote Sens. 2017, 38, 5430–5445.
21. Khalid, H.W.; Khalil, R.M.Z.; Qureshi, M.A. Evaluating Spectral Indices for Water Bodies Extraction in Western Tibetan Plateau. Egypt. J. Remote Sens. Space Sci. 2021, 24, 619–634.
22. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
23. Zhao, C.; Wei, H.; Feyisa, G.L.; De Castro Tayer, T.; Ma, G.; Wu, H.; Pan, Y. Evaluating Spectral Indices for Water Extraction: Limitations and Contextual Usage Recommendations. Int. J. Appl. Earth Obs. Geoinf. 2025, 139, 104510.
24. Sheykhmousa, M.; Mahdianpari, M.; Ghanbari, H.; Mohammadimanesh, F.; Ghamisi, P.; Homayouni, S. Support Vector Machine Versus Random Forest for Remote Sensing Image Classification: A Meta-Analysis and Systematic Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 6308–6325.
25. Bangira, T.; Alfieri, S.M.; Menenti, M.; van Niekerk, A. Comparing Thresholding with Machine Learning Classifiers for Mapping Complex Water. Remote Sens. 2019, 11, 1351.
26. Tahir, A.; Cheng, L.; Guo, R.; Liu, H. Distributional Shift Adaptation Using Domain-Specific Features. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data); IEEE: New York, NY, USA, 2022; pp. 5593–5597.
27. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; Volume 9351, pp. 234–241.
28. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2023, arXiv:1706.03762.
29. Mienye, I.D.; Swart, T.G. A Comprehensive Review of Deep Learning: Architectures, Recent Advances, and Applications. Information 2024, 15, 755.
30. Song, H.; Kim, M.; Park, D.; Shin, Y.; Lee, J.-G. Learning From Noisy Labels With Deep Neural Networks: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 8135–8153.
31. Chen, C.; Zhang, P.; Zhang, H.; Dai, J.; Yi, Y.; Zhang, H.; Zhang, Y. Deep Learning on Computational-Resource-Limited Platforms: A Survey. Mob. Inf. Syst. 2020, 2020, 8454327.
32. Albelwi, S. Survey on Self-Supervised Learning: Auxiliary Pretext Tasks and Contrastive Learning Methods in Imaging. Entropy 2022, 24, 551.
33. Wang, Y.; Albrecht, C.M.; Braham, N.A.A.; Mou, L.; Zhu, X.X. Self-Supervised Learning in Remote Sensing: A Review. IEEE Geosci. Remote Sens. Mag. 2022, 10, 213–247.
34. Ericsson, L.; Gouk, H.; Loy, C.C.; Hospedales, T.M. Self-Supervised Representation Learning: Introduction, Advances, and Challenges. IEEE Signal Process. Mag. 2022, 39, 42–62.
35. Doubek, J.P.; Carey, C.C. Catchment, Morphometric, and Water Quality Characteristics Differ between Reservoirs and Naturally Formed Lakes on a Latitudinal Gradient in the Conterminous United States. Inland Waters 2017, 7, 171–180.
36. Zhuang, W.-E.; Chen, W.; Yang, L. Coupled Effects of Dam, Hydrology, and Estuarine Filtering on Dissolved Organic Carbon and Optical Properties in the Reservoir-River-Estuary Continuum. J. Hydrol. 2023, 617, 128893.
37. Tucker, C.J. Red and Photographic Infrared Linear Combinations for Monitoring Vegetation. Remote Sens. Environ. 1979, 8, 127–150.
38. Martinis, S.; Twele, A.; Voigt, S. Towards Operational near Real-Time Flood Detection Using a Split-Based Automatic Thresholding Procedure on High Resolution TerraSAR-X Data. Nat. Hazards Earth Syst. Sci. 2009, 9, 303–314.
39. Taylor, N.C.; Kudela, R.M. Spatial Variability of Suspended Sediments in San Francisco Bay, California. Remote Sens. 2021, 13, 4625.
40. Zhou, B.; Shang, M.; Feng, L.; Shan, K.; Feng, L.; Ma, J.; Liu, X.; Wu, L. Long-Term Remote Tracking the Dynamics of Surface Water Turbidity Using a Density Peaks-Based Classification: A Case Study in the Three Gorges Reservoir, China. Ecol. Indic. 2020, 116, 106539.
41. Ni, L.; Wang, D.; Wu, J.; Wang, Y.; Tao, Y.; Zhang, J.; Liu, J. Streamflow Forecasting Using Extreme Gradient Boosting Model Coupled with Gaussian Mixture Model. J. Hydrol. 2020, 586, 124901.
42. You, J.; Li, Z.; Du, J. A New Iterative Initialization of EM Algorithm for Gaussian Mixture Models. PLoS ONE 2023, 18, e0284114.
43. Xiang, J.; Guo, G.; Li, J. Determining the Number of Factors in Constrained Factor Models via Bayesian Information Criterion. Econom. Rev. 2023, 42, 98–122.
44. Zhang, L.; Xie, L.; Han, Q.; Wang, Z.; Huang, C. Probability Density Forecasting of Wind Speed Based on Quantile Regression and Kernel Density Estimation. Energies 2020, 13, 6125.
45. Qingling, L.; Ye, L.; Siddiqui, F.A. Region Growing and Level Set Synergetic Algorithms for Image Segmentation. In Proceedings of the 2020 4th International Conference on Digital Signal Processing; Association for Computing Machinery: New York, NY, USA, 2020; pp. 12–16.
46. Shih, F.Y.; Wu, Y.-T. The Efficient Algorithms for Achieving Euclidean Distance Transformation. IEEE Trans. Image Process. 2004, 13, 1078–1091.
47. Li, H.; Jia, Y.; Cong, R.; Wu, W.; Kwong, S.T.W.; Chen, C. Superpixel Segmentation Based on Spatially Constrained Subspace Clustering. IEEE Trans. Ind. Inform. 2021, 17, 7501–7512.
48. Di, S.; Liao, M.; Zhao, Y.; Li, Y.; Zeng, Y. Image Superpixel Segmentation Based on Hierarchical Multi-Level LI-SLIC. Opt. Laser Technol. 2021, 135, 106703.
49. Lehner, B.; Liermann, C.R.; Revenga, C.; Vörösmarty, C.; Fekete, B.; Crouzet, P.; Döll, P.; Endejan, M.; Frenken, K.; Magome, J.; et al. High-Resolution Mapping of the World’s Reservoirs and Dams for Sustainable River-Flow Management. Front. Ecol. Environ. 2011, 9, 494–502.
50. Zhu, Z.; Woodcock, C.E. Object-Based Cloud and Cloud Shadow Detection in Landsat Imagery. Remote Sens. Environ. 2012, 118, 83–94.
51. Wieland, M.; Martinis, S.; Kiefl, R.; Gstaiger, V. Semantic Segmentation of Water Bodies in Very High-Resolution Satellite and Aerial Images. Remote Sens. Environ. 2023, 287, 113452.
52. Mukherjee, R.; Policelli, F.; Wang, R.; Arellano-Thompson, E.; Tellman, B.; Sharma, P.; Zhang, Z.; Giezendanner, J. A Globally Sampled High-Resolution Hand-Labeled Validation Dataset for Evaluating Surface Water Extent Maps. Earth Syst. Sci. Data 2024, 16, 4311–4323.
53. Wieland, M.; Fichtner, F.; Martinis, S.; Groth, S.; Krullikowski, C.; Plank, S.; Motagh, M. S1S2-Water: A Global Dataset for Semantic Segmentation of Water Bodies from Sentinel-1 and Sentinel-2 Satellite Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 1084–1099.
54. Alemohammad, H.; Booth, K. LandCoverNet: A Global Benchmark Land Cover Classification Training Dataset. arXiv 2020, arXiv:2012.03111.
55. Storvik, G.; Fjortoft, R.; Solberg, A.H.S. A Bayesian Approach to Classification of Multiresolution Remote Sensing Data. IEEE Trans. Geosci. Remote Sens. 2005, 43, 539–547.
56. Schwatke, C.; Dettmering, D.; Bosch, W.; Seitz, F. DAHITI—An Innovative Approach for Estimating Water Level Time Series over Inland Waters Using Multi-Mission Satellite Altimetry. Hydrol. Earth Syst. Sci. 2015, 19, 4345–4364.
57. Patil, S.; Stieglitz, M. Hydrologic Similarity among Catchments under Variable Flow Conditions. Hydrol. Earth Syst. Sci. 2011, 15, 989–997.
58. Chang, G.J.; Oh, Y.; Goldshleger, N.; Shoshany, M. Biomass Estimation of Crops and Natural Shrubs by Combining Red-Edge Ratio with Normalized Difference Vegetation Index. J. Appl. Remote Sens. 2022, 16, 014501.
59. Zhou, Y.; Rangarajan, A.; Gader, P.D. A Gaussian Mixture Model Representation of Endmember Variability for Spectral Unmixing. In Proceedings of the 2016 8th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS); IEEE: New York, NY, USA, 2016; pp. 1–5.
60. Darrell, T.; Kloft, M.; Pontil, M.; Rätsch, G.; Rodner, E. Machine Learning with Interdependent and Non-Identically Distributed Data (Dagstuhl Seminar 15152). Dagstuhl Rep. DagRep 2015, 5, 18–55.
61. Yan, T.; Shi, J.; Li, H.; Luo, Z.; Wang, Z. Discriminative Information Restoration and Extraction for Weakly Supervised Low-Resolution Fine-Grained Image Recognition. Pattern Recognit. 2022, 127, 108629.
62. Saltiel, T.M.; Dennison, P.E.; Campbell, M.J.; Thompson, T.R.; Hambrecht, K.R. Tradeoffs between UAS Spatial Resolution and Accuracy for Deep Learning Semantic Segmentation Applied to Wetland Vegetation Species Mapping. Remote Sens. 2022, 14, 2703.
63. Gawlikowski, J.; Saha, S.; Kruspe, A.; Zhu, X.X. Towards Out-of-Distribution Detection for Remote Sensing. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS; IEEE: New York, NY, USA, 2021; pp. 8676–8679.
64. Zhang, X.; Cui, P.; Xu, R.; Zhou, L.; He, Y.; Shen, Z. Deep Stable Learning for Out-of-Distribution Generalization. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2021; pp. 5368–5378.
65. Zhou, Y.; Li, W.; Cao, X.; He, B.; Feng, Q.; Yang, F.; Liu, H.; Kutser, T.; Xu, M.; Xiao, F.; et al. Spatial-Temporal Distribution of Labeled Set Bias Remote Sensing Estimation: An Implication for Supervised Machine Learning in Water Quality Monitoring. Int. J. Appl. Earth Obs. Geoinf. 2024, 131, 103959.
66. Hao, Z.; Chen, F.; Jia, X.; Cai, X.; Yang, C.; Du, Y.; Ling, F. GRDL: A New Global Reservoir Area-Storage-Depth Data Set Derived Through Deep Learning-Based Bathymetry Reconstruction. Water Resour. Res. 2024, 60, e2023WR035781.
67. Schwatke, C.; Dettmering, D.; Seitz, F. Volume Variations of Small Inland Water Bodies from a Combination of Satellite Altimetry and Optical Imagery. Remote Sens. 2020, 12, 1606.
68. Liu, Z.; Zhu, D.; Wang, L.; Li, D. Enhancing Surface Water Mapping and Monthly Dynamics Monitoring with a Stepwise Gap-Filling Method. Int. J. Digit. Earth 2024, 17, 2413882.
69. Markert, K.N.; Williams, G.P.; Nelson, E.J.; Ames, D.P.; Lee, H.; Griffin, R.E. Dense Time Series Generation of Surface Water Extents through Optical–SAR Sensor Fusion and Gap Filling. Remote Sens. 2024, 16, 1262.
70. Kutser, T.; Hedley, J.; Giardino, C.; Roelfsema, C.; Brando, V.E. Remote Sensing of Shallow Waters—A 50 Year Retrospective and Future Directions. Remote Sens. Environ. 2020, 240, 111619.
71. Uudeberg, K.; Ansko, I.; Põru, G.; Ansper, A.; Reinart, A. Using Optical Water Types to Monitor Changes in Optically Complex Inland and Coastal Waters. Remote Sens. 2019, 11, 2297.
72. Lan, L.; Wang, Y.-G.; Chen, H.-S.; Gao, X.-R.; Wang, X.-K.; Yan, X.-F. Improving on Mapping Long-Term Surface Water with a Novel Framework Based on the Landsat Imagery Series. J. Environ. Manag. 2024, 353, 120202.
73. León-López, K.M.; Mouret, F.; Arguello, H.; Tourneret, J.-Y. Anomaly Detection and Classification in Multispectral Time Series Based on Hidden Markov Models. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5402311.

Figure 1. Overview of the SWD framework.

Figure 2. Water extraction results of RF, U-Net, and SWD for the (a) Pires Ferreira (HWL) and (b) Choke Canyon (LWL) reservoirs across multiple sensors (PlanetScope, Sentinel-2, and Landsat-8/9). For each reservoir, the overview true-color image (RGB) and the corresponding reference mask are shown, with a red rectangle indicating the zoomed-in region. In the reference masks, white pixels indicate water and black pixels indicate non-water. In the detailed comparison panels, rows correspond to sensors (PlanetScope, Sentinel-2, and Landsat-8/9) and columns correspond to methods (RF, U-Net, and SWD, from left to right). To highlight fine-scale differences in water boundary delineation, the extracted results are overlaid as vector outlines on the false-color backgrounds.

Figure 3. Water extraction results for (a) Gilgel Gibe, (b) Mosul, and (c) Orto-Tokoy reservoirs using RF, U-Net, and SWD.

Figure 4. Water extraction results for the Eucumbene Reservoir under (a) high-water-level (HWL) and (b) low-water-level (LWL) conditions.

Figure 5. Boxplots of IoU across the 36 test cases for SWD, U-Net, and RF.

Table 1. Test reservoirs and scenario-specific evaluation objectives.

Reservoir (Location)	Center Coordinates	Environmental Conditions	Key Geo-Spectral Characteristic	Validation Objective
Eucumbene (Australia)	36°05′S, 148°42′E	High-altitude terrain; seasonal snowmelt	Dendritic shoreline with narrow tributaries	Connectivity preservation and delineation of narrow water bodies
Pires Ferreira (Brazil)	04°03′S, 40°37′W	Tropical semi-arid; dense tributary network	Narrow channels and shallow patches prone to fragmentation	Connectivity preservation in fragmented narrow waters
Gilgel Gibe (Ethiopia)	07°49′N, 37°19′E	Tropical basin; high sediment inflow	Longitudinal spectral gradient	Multi-modal spectral modeling under turbidity gradients
Mosul (Iraq)	36°37′N, 42°49′E	Arid plateau; high-albedo bare soil/sand	Bright background near shoreline	False-positive suppression under high-albedo background
Orto-Tokoy (Kyrgyzstan)	42°23′N, 75°51′E	Mountainous; snow/ice and terrain shadows	Snow/ice and terrain shadows	Water discrimination under snow/ice and shadow interference
Choke Canyon (USA)	28°30′N, 98°15′W	Semiarid; extensive drawdown zones	Wet soil and pioneer vegetation in drawdown areas	Separation of open water from mixed pixels in drawdown zones

Table 2. Independent public datasets used to train supervised baselines.

Dataset Name	Target Sensor	Dataset Scale	Key Pre-Processing	Reference
PlanetScope Dataset	PlanetScope (3 m)	100 scenes (1024 × 1024)	Acquisition 4-band surface reflectance data (Blue/Green/Red/NIR; 3–7 days)	Mukherjee et al. (2024) [52]
S1S2-Water	Sentinel-2 (10 m)	65 tiles	Band Selection (B2/B3/B4/B8)	Wieland et al. (2024) [53]
LandCoverNet	Landsat-8/9 (30 m)	1980 image chips (256 × 256)	Stratified Sampling; 4-band inputs (Blue/Green/Red/NIR)	Alemohammad et al. (2020) [54]

Table 3. Performance comparison of SWD, U-Net, and RF across different spatial resolutions.

Method	Metric	PlanetScope (3 m)	Sentinel-2 (10 m)	Landsat-8/9 (30 m)	Cross-Scale Mean	Cross-Scale SD
SWD	IoU	0.822	0.805	0.774	0.800	0.024
SWD	RE (%)	5.12	5.96	7.15	6.08	1.02
U-Net	IoU	0.816	0.785	0.741	0.781	0.038
U-Net	RE (%)	5.46	7.00	9.32	7.26	1.95
RF	IoU	0.707	0.685	0.644	0.679	0.032
RF	RE (%)	10.03	11.37	13.91	11.77	1.97

Note: Cross-scale Mean and cross-scale SD were calculated based on the results from the three evaluated sensors (3 m, 10 m, and 30 m).
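As a worked check of the summary columns, the sketch below recomputes the cross-scale mean and SD of the IoU rows in Table 3. The values are copied from the table; the use of the sample standard deviation (n − 1 denominator) is an inference from the reported numbers, not something the note states.

```python
from statistics import mean, stdev

# IoU per sensor (PlanetScope, Sentinel-2, Landsat-8/9), copied from Table 3.
IOU_BY_METHOD = {
    "SWD": [0.822, 0.805, 0.774],
    "U-Net": [0.816, 0.785, 0.741],
    "RF": [0.707, 0.685, 0.644],
}

# statistics.stdev is the sample standard deviation (n - 1 denominator).
summary = {
    method: (round(mean(values), 3), round(stdev(values), 3))
    for method, values in IOU_BY_METHOD.items()
}

for method, (m, sd) in summary.items():
    print(f"{method}: mean = {m:.3f}, SD = {sd:.3f}")
```

With only three sensors per method, the population formula would give a visibly smaller SD (about 0.020 for SWD), so the choice of estimator matters when comparing these columns against other studies.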

Table 4. Performance comparison of SWD, U-Net, and RF across six reservoirs.

Reservoir	SWD IoU	SWD AE (km²)	U-Net IoU	U-Net AE (km²)	RF IoU	RF AE (km²)
Eucumbene	0.806	5.54	0.810	5.59	0.703	10.75
Pires Ferreira	0.815	2.17	0.785	2.52	0.736	4.37
Gilgel Gibe	0.793	8.71	0.793	9.01	0.698	15.50
Mosul	0.802	11.02	0.757	17.13	0.660	23.92
Orto-Tokoy	0.785	1.21	0.738	1.65	0.617	2.43
Choke Canyon	0.803	3.55	0.800	3.84	0.703	6.21
Inter-reservoir Mean	0.801	--	0.781	--	0.679	--
Inter-reservoir SD	0.010	--	0.025	--	0.031	--

Note: Inter-reservoir Mean and SD were calculated based on the results from the six reservoirs.
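For readers who wish to reproduce the two metrics in Table 4 on their own masks, the following sketch computes IoU and the absolute area error (AE) from a pair of binary rasters. It assumes the standard definitions (intersection over union of the water class, and the absolute difference of mapped areas); the function names, the toy masks, and the 30 m pixel size are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def iou(pred, ref):
    """Intersection over union of the water class for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    union = np.logical_or(pred, ref).sum()
    if union == 0:
        return float("nan")  # undefined when neither mask contains water
    return float(np.logical_and(pred, ref).sum() / union)


def area_error_km2(pred, ref, pixel_size_m):
    """Absolute difference between mapped and reference water area, in km²."""
    pixel_area_km2 = (pixel_size_m / 1000.0) ** 2
    n_pred = int(np.asarray(pred, dtype=bool).sum())
    n_ref = int(np.asarray(ref, dtype=bool).sum())
    return abs(n_pred - n_ref) * pixel_area_km2


# Toy 4 x 4 masks (1 = water): 7 predicted pixels, 6 reference pixels, 5 shared.
pred = np.array([[0, 1, 1, 1],
                 [0, 1, 1, 1],
                 [0, 1, 0, 0],
                 [0, 0, 0, 0]])
ref = np.array([[0, 1, 1, 0],
                [0, 1, 1, 0],
                [0, 1, 1, 0],
                [0, 0, 0, 0]])

score = iou(pred, ref)                     # 5 / 8
ae = area_error_km2(pred, ref, 30.0)       # one 30 m pixel
print(score, ae)
```

Note that AE depends only on pixel counts, so commission and omission errors can cancel; this is why it is reported alongside IoU rather than instead of it.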

Table 5. Accuracy and stability of SWD, U-Net, and RF under HWL and LWL conditions.

Method	IoU (HWL)	IoU (LWL)	ΔIoU (%)	RE (HWL), %	RE (LWL), %	ΔRE (%)	σ_IoU	CV_IoU, %
SWD	0.810	0.791	−2.35	5.58	6.56	0.98	0.02	2.50
U-Net	0.799	0.762	−4.63	−6.31	−8.20	1.89	0.05	6.41
RF	0.693	0.664	−4.18	10.49	13.04	2.55	0.05	7.37

Note: ΔRE is the absolute difference in relative area error (RE) between high-water-level (HWL) and low-water-level (LWL) conditions. σ_IoU and CV_IoU were computed across the 18 hydrological pairs (HWL-LWL).
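The two difference columns in Table 5 can be recomputed from the other columns, as sketched below. ΔRE follows the definition in the note. The ΔIoU formula, the relative change from HWL to LWL expressed in percent, is an inference that happens to reproduce the tabulated values; it is not stated explicitly in this section. The function names are hypothetical.

```python
# (IoU_HWL, IoU_LWL, RE_HWL %, RE_LWL %), copied from Table 5.
ROWS = {
    "SWD": (0.810, 0.791, 5.58, 6.56),
    "U-Net": (0.799, 0.762, -6.31, -8.20),
    "RF": (0.693, 0.664, 10.49, 13.04),
}


def delta_iou_pct(iou_hwl, iou_lwl):
    """Relative IoU change from HWL to LWL, in percent (assumed definition)."""
    return 100.0 * (iou_lwl - iou_hwl) / iou_hwl


def delta_re(re_hwl, re_lwl):
    """Absolute difference of relative area error between LWL and HWL."""
    return abs(re_lwl - re_hwl)


derived = {
    method: (round(delta_iou_pct(i_h, i_l), 2), round(delta_re(r_h, r_l), 2))
    for method, (i_h, i_l, r_h, r_l) in ROWS.items()
}

for method, (d_iou, d_re) in derived.items():
    print(f"{method}: dIoU = {d_iou:+.2f} %, dRE = {d_re:.2f}")
```

The negative RE values for U-Net indicate underestimation of water area at both levels, which is consistent with its area estimates in Table 6.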

Table 6. Estimated and reference water area at HWL and LWL for representative reservoirs (km²).

Reservoir	Method	Area_est (HWL)	Area_est (LWL)	ΔArea_est	ΔArea_ref	Area_ref (HWL)	Area_ref (LWL)
Mosul	SWD	312.97	242.10	70.87	69.84	301.43	231.59
	U-Net	286.56	212.20	74.36	69.84	301.43	231.59
	RF	324.58	256.21	68.37	69.84	301.43	231.59
Gilgel Gibe	SWD	183.55	146.90	36.65	37.00	175.04	138.04
	U-Net	166.70	128.36	38.34	37.00	175.04	138.04
	RF	189.27	154.69	34.58	37.00	175.04	138.04
Eucumbene	SWD	101.29	82.91	18.38	18.20	95.66	77.46
	U-Net	90.20	71.74	18.46	18.20	95.66	77.46
	RF	105.96	88.66	17.30	18.20	95.66	77.46
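The practical meaning of Table 6 is easiest to see by comparing the estimated area change between HWL and LWL with the reference change. The short sketch below does this for the Mosul rows, using values copied from the table; the function name `drawdown_error` is hypothetical.

```python
# Mosul rows of Table 6 (km²): (estimated area at HWL, estimated area at LWL).
ESTIMATES = {
    "SWD": (312.97, 242.10),
    "U-Net": (286.56, 212.20),
    "RF": (324.58, 256.21),
}
REF_HWL, REF_LWL = 301.43, 231.59  # reference areas (km²)


def drawdown_error(est_hwl, est_lwl, ref_hwl, ref_lwl):
    """Return (estimated area change, error of that change) in km²."""
    delta_est = est_hwl - est_lwl
    delta_ref = ref_hwl - ref_lwl
    return delta_est, delta_est - delta_ref


results = {
    method: drawdown_error(hwl, lwl, REF_HWL, REF_LWL)
    for method, (hwl, lwl) in ESTIMATES.items()
}

for method, (delta_est, err) in results.items():
    print(f"{method}: change = {delta_est:.2f} km², error = {err:+.2f} km²")
```

For Mosul, SWD overestimates the absolute area at both levels by roughly 10 to 12 km², yet the error in the estimated change is only about 1 km², because a nearly constant bias cancels when two dates are differenced.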

Table 7. Computational efficiency and deployment requirements of SWD, U-Net, and RF.

Method	Inference Time (Mean ± SD, s/Mpx)	Training Required	Hardware Requirement	Deployment Flexibility
SWD	0.858 ± 0.372	No	CPU only	High
RF	0.956 ± 0.011	Yes	CPU only	Medium
U-Net	1.864 ± 0.1518 (CPU); 0.031 ± 0.0008 (GPU)	Yes	GPU dependent	Low

Note: “s/Mpx” denotes seconds per megapixel, a normalized measure of computational efficiency. “Mpx” is an abbreviation for “megapixels” (i.e., one million pixels).
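To put the normalized rates in Table 7 into operational terms, the sketch below converts them into an approximate processing time for one full Sentinel-2 tile at 10 m resolution (10,980 × 10,980 pixels). This is a back-of-the-envelope extrapolation from the mean rates only: it ignores data loading, the reported variability, and any scene dependence of the per-scene model fitting. The function name is hypothetical.

```python
def estimate_runtime_s(width_px, height_px, rate_s_per_mpx):
    """Approximate processing time in seconds for an image of the given size."""
    megapixels = width_px * height_px / 1e6
    return megapixels * rate_s_per_mpx


# Mean inference rates (s/Mpx), copied from Table 7.
RATES = {
    "SWD (CPU)": 0.858,
    "RF (CPU)": 0.956,
    "U-Net (CPU)": 1.864,
    "U-Net (GPU)": 0.031,
}

TILE = (10980, 10980)  # one Sentinel-2 tile at 10 m resolution

runtimes = {name: estimate_runtime_s(*TILE, rate) for name, rate in RATES.items()}

for name, seconds in runtimes.items():
    print(f"{name}: about {seconds:.1f} s per tile")
```

Under these assumptions a full tile takes on the order of 100 s with SWD on a CPU, which is compatible with the near-real-time use discussed in the text, while U-Net is faster only when a GPU is available.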

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
