1. Introduction
Water is crucial for sustaining life, forming the foundation of ecosystems, and supporting social and economic activities. Inland lakes and reservoirs, as core components of surface freshwater resources, play a vital role in human production and life, profoundly impacting regional economic prosperity and ecological balance. However, evolving climate patterns, specifically global warming-induced shifts in precipitation regimes (e.g., increased frequency of extreme droughts and floods) and rising temperatures that accelerate surface water evaporation, are altering the extent and storage of these water bodies [1,2]. Therefore, continuous monitoring of lake and reservoir areas, water levels, and related storage changes is essential for achieving sustainable water resource management, effective flood prediction, in-depth exploration of hydrological cycles, understanding climate change patterns, and assessing human impact [3,4].
Traditional approaches for investigating lake and reservoir dynamics are typically constrained to small spatial scales and cannot support long-term continuous monitoring [5]. This limitation delays the integration of lake and reservoir change information and regional interconnections, thereby undermining the efficiency of decisions based on water body variations. Additionally, estimates of storage changes remain heavily dependent on field observations, which are time-consuming, labor-intensive, and ill-suited to large-scale monitoring requirements [6]. Existing global datasets (e.g., GLWD, the Global Lake and Wetland Database, and G-REALM, the Global Reservoirs and Lakes Monitor) are hindered by resolution constraints (300 m) and temporal limitations. This study leverages 10 m Sentinel-2 imagery and cloud compositing techniques to substantially enhance extraction accuracy.
With the rapid development of remote sensing technology, its wide observation range, frequent periodic revisits, and low monitoring cost have demonstrated its potential for macro-scale, multi-temporal, multi-spectral, dynamic, and repeated monitoring of surface information, making large-scale monitoring of lake and reservoir water body changes feasible [7,8]. Lake and reservoir areas and water level elevations, as key parameters for calculating storage changes, have been widely incorporated into research on continuous lake and reservoir change monitoring [9]. Automatic and precise extraction of multi-temporal lake and reservoir boundaries from remote sensing imagery, combined with water level information, makes it possible to monitor changes in lake and reservoir surface areas and storage volumes [10].
Current water body extraction methods based on optical remote sensing imagery primarily include object-oriented methods [11,12,13], deep learning methods [12,14,15], and band combination methods [16,17]. Object-oriented methods are sensitive to segmentation thresholds and classification criteria, making them highly empirical; Su et al. (2022) [13], for example, optimized segmentation parameters for karst water bodies using multi-scale object partitioning. Deep learning methods require substantial sample data, with recent advances focusing on small-sample adaptation; Liang et al. (2021) [15], for example, proposed a transfer learning framework for water body extraction in data-scarce karst areas. In contrast, the water body index method, a type of band combination method, extracts water body areas by constructing ratios between bands, offering simplicity, high extraction accuracy, and speed; Wang et al. (2023) [17], for example, developed a multi-temporal index to enhance the discrimination of water from non-water in seasonal karst regions. The index method has been successfully applied to the extraction and dynamic monitoring of various surface water bodies, including lakes, reservoirs, urban landscape water bodies, and rivers [18,19].
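As an illustration of how band-ratio indices work, the following minimal sketch computes NDWI, MNDWI, and AWEIsh from reflectance arrays using their standard published definitions. The function names, the toy reflectance values, and the zero threshold used in the usage note are assumptions for illustration only; they are not the thresholds or implementation used in this study.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b), set to 0 where the denominator is 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndwi(green, nir):
    """NDWI: normalized difference of green and near-infrared."""
    return normalized_difference(green, nir)

def mndwi(green, swir1):
    """MNDWI: normalized difference of green and short-wave infrared 1."""
    return normalized_difference(green, swir1)

def aweish(blue, green, nir, swir1, swir2):
    """AWEIsh = blue + 2.5*green - 1.5*(nir + swir1) - 0.25*swir2."""
    blue, green, nir, swir1, swir2 = (
        np.asarray(x, dtype=float) for x in (blue, green, nir, swir1, swir2)
    )
    return blue + 2.5 * green - 1.5 * (nir + swir1) - 0.25 * swir2

# Toy surface reflectances (hypothetical): pixel 0 is water-like, pixel 1 is not.
green = np.array([0.10, 0.05])
swir1 = np.array([0.02, 0.20])
water_mask = mndwi(green, swir1) > 0.0  # illustrative threshold only
```

Here `water_mask` flags the first pixel only. In practice the threshold varies with scene and terrain, which is the transferability problem that motivates combining several features.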
When applying remote sensing technology to large-scale feature extraction, the choice of data sources is crucial. Landsat series imagery and the increasingly popular Sentinel-2 imagery are important options. Sentinel-2 imagery, with its superior spatial resolution (10 m for visible/near-infrared bands, compared with 30 m for the Landsat series) and shorter revisit cycle (5 days with the two satellites Sentinel-2A/B), has become the preferred data source for scientists worldwide [20]. However, two challenges remain. The first is algorithmic transferability: most existing algorithms are tailored to specific geographic settings (e.g., flat plains or temperate forests), leading to degraded performance in karst landscapes such as Bijie City. For instance, MNDWI is prone to interference from shadows in karst terrain [21], whereas AWEIsh exhibits decreased accuracy in urban areas due to reflections from buildings [22]. The second is cloud occlusion: cloud cover in optical imagery prevents direct water body extraction and causes information loss. Existing methods often fail to reconstruct occluded regions, whereas this study employs a temporal compositing strategy that infers water status from multi-temporal frequency analysis, without implying any ‘cloud-penetrating’ capability. Additionally, as image spatial resolution improves, the texture and other features of ground objects become clearer, but this also increases the number of non-water targets picked up during extraction and the difficulty of removing them. Existing water body data products, such as the JRC Global Water Body Dataset and various land cover products [23], still have deficiencies in resolution and in representing dynamic water body distribution.
Given this, this study focuses on Bijie City, Guizhou Province, and uses the Google Earth Engine (GEE) remote sensing cloud platform and Sentinel-2 imagery to propose a novel multi-feature, multi-level extraction framework, the core contribution distinguishing it from existing studies. Unlike single-index methods (e.g., MNDWI, which is prone to karst topographic shadows, and AWEIsh, which is less effective in urban areas) or deep learning methods requiring massive samples, the framework fuses water indices, spectral bands, and DEM data and adds cloud compositing, specifically targeting the challenges of karst landscapes (surface seepage and rocky desertification). Although validated in Bijie’s karst landscape, the approach is designed to be globally transferable, with the aim of establishing a universal methodology for large-scale lake and reservoir monitoring. By integrating cloud compositing and multi-feature fusion, the framework addresses regional adaptability and cloud resilience, outperforms low-resolution products (e.g., the JRC Global Water Body Dataset, 300 m), and provides a scalable solution for dynamic water body extraction worldwide.
4. Results and Discussion
4.1. Accuracy Evaluation
4.1.1. Accuracy of Coarse Water Body Extraction
This study performs non-water target removal on the water body extraction results from two water indices. The coarse extraction stage must capture all water bodies completely, so the extraction rate is used to verify the accuracy of the coarsely extracted water bodies. The water body types, number of sample points, and accuracy are shown in Table 2. The results indicate that, in the coarse extraction stage, the combination of the two water indices effectively extracts lakes and reservoirs and preserves the integrity of water body information. Although the extraction rates for rivers and small water bodies are low, these do not belong to the category of lakes and reservoirs and therefore do not affect the final lake and reservoir results. Because the coarse extraction stage prioritizes completeness (avoiding omissions), the extraction rate directly reflects how well lakes and reservoirs are covered, while indicators such as overall accuracy are better suited to evaluating the final fine extraction results.
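A minimal sketch of the extraction-rate check follows, assuming the rate is defined as the share of reference sample points of a given water body type that fall inside the coarse water mask. That definition, the function names, and the sample values are assumptions for illustration; the values are not those of Table 2.

```python
def extraction_rate(hits):
    """Fraction of reference sample points captured by the coarse water mask."""
    hits = [bool(h) for h in hits]
    if not hits:
        raise ValueError("at least one sample point is required")
    return sum(hits) / len(hits)

def rates_by_type(samples):
    """Map each water body type to its extraction rate."""
    return {water_type: extraction_rate(hits) for water_type, hits in samples.items()}

# Hypothetical sample points (True = point lies inside the extracted mask).
samples = {
    "lake_reservoir": [True, True, True, True],
    "river": [True, False, True, False],
}
rates = rates_by_type(samples)
```

Because the rate counts only omissions, it says nothing about false positives, which is why it suits the coarse stage and not the final evaluation.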
4.1.2. Accuracy of Lake and Reservoir Water Body Extraction
For the final lake and reservoir extraction results, lakes and reservoirs of different sizes, locations, and geographical environments are selected. The boundaries of lakes and reservoirs are visually interpreted to validate the extraction accuracy. The number of samples and accuracy are presented in Table 3. It can be seen that the extraction accuracy of lakes and reservoirs in this paper is above 96%, demonstrating high accuracy.
4.2. Comparison of Extraction Effects of Different Algorithms
To validate the universality of the proposed water body extraction algorithm, two scenarios with abundant shadows (urban and mountainous areas) are selected as case studies to compare how well different algorithms suppress non-water targets. The results are shown in Figure 5. The proposed multi-feature, multi-level method produces better results in both settings. Single water indices retain varying amounts of non-water targets in urban areas, and these are difficult to remove. Although they retain fewer non-water targets in mountainous areas, they are still affected by villages and some topographic shadows. The difference arises because, in large-scale water body extraction, geographic variation means that water bodies do not share the same spectral characteristics, which makes it difficult for a single index to separate water from non-water. A single index often has no universal threshold, whereas the proposed algorithm relies on multiple features and multiple levels to eliminate non-water targets of various geographic origins while avoiding the limitations of single-index thresholding.
Quantitative analysis based on Table 4 shows that the proposed method outperforms the single MNDWI, AWEIsh, and NDWI algorithms in both urban and mountainous areas. In urban areas, the IoU increases by 9.0%, 6.0%, and 15.0%, respectively, and the F1 score by 13.0%, 10.0%, and 17.0%; in mountainous areas, the IoU increases by 6.0%, 13.0%, and 20.0%, and the F1 score by 16.0%, 13.0%, and 20.0%. This advantage arises from the method’s targeted optimization for different geomorphic features: in urban areas, fusing MNDWI’s short-wave infrared sensitivity with AWEIsh’s vegetation suppression capability reduces misjudgments caused by building shadows and artificial features; in mountainous areas, combining a slope threshold that removes steep shadows with multi-index features minimizes spectral confusion between forest shadows and water bodies. In contrast, NDWI, which relies on the green-near-infrared difference, is susceptible to vegetation and artificial feature reflections and shows the lowest accuracy in both regions. Although MNDWI and AWEIsh each have advantages, neither alone compensates for the other’s weaknesses. These results support the study’s multi-stage processing strategy (coarse extraction for integrity, fine extraction for accuracy) and the key role of “localized” integration of topographic and spectral features in enhancing algorithm universality and accuracy.
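For reference, the two metrics reported above can be computed from a predicted water mask and a reference mask as in the sketch below. It uses the standard definitions (IoU = TP / (TP + FP + FN), F1 = 2TP / (2TP + FP + FN)); the function name and the toy masks are illustrative assumptions, not the study's evaluation code.

```python
import numpy as np

def iou_and_f1(pred, ref):
    """Return (IoU, F1) of the water class for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    tp = np.count_nonzero(pred & ref)    # water in both masks
    fp = np.count_nonzero(pred & ~ref)   # predicted water, reference non-water
    fn = np.count_nonzero(~pred & ref)   # missed reference water
    union = tp + fp + fn
    iou = tp / union if union else float("nan")
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else float("nan")
    return iou, f1

# Toy masks (hypothetical): one true positive, one false positive, one miss.
iou, f1 = iou_and_f1([1, 1, 0, 0], [1, 0, 1, 0])
```

For the same masks F1 is never lower than IoU, so the two scores should not be compared against each other directly.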
4.3. Cloud Occlusion Compositing Results
To validate the results of cloud occlusion compositing, a reservoir area with heavy cloud cover in July 2024 is selected as a case study (Figure 6a). Only a small portion of the water body is visible because of the extensive cloud cover. The occurrence frequency IF within this region is then calculated (Figure 6b); IF values lie in [0, 1] and are displayed on a linear grayscale, with black and white indicating 0 and 1, respectively. The proposed water body extraction algorithm is applied to extract water bodies from the occluded imagery (Figure 6c). By combining the IF image with the extraction result under cloud occlusion, the land feature category of the cloud-occluded region is inferred from the IF values of known water bodies, which yields the composited water body. The final water body compositing result is shown in Figure 6d. The cloud occlusion compositing algorithm restores water bodies that are partially missing due to cloud occlusion, minimizing the impact of clouds and fog on the extraction results. This replaces the previous approach of substituting single-date imagery from different years and yields more accurate extraction results.
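The compositing logic described above can be sketched as follows. This is a simplified illustration under stated assumptions: IF is taken as the fraction of cloud-free observations in which a pixel was classified as water, and a clouded pixel is labeled water when its IF is at least the minimum IF among currently visible water pixels. The threshold rule, function names, and toy arrays are assumptions, not the study's exact algorithm.

```python
import numpy as np

def water_frequency(water_stack, clear_stack):
    """Per-pixel share of cloud-free observations classified as water (0 to 1).

    Both inputs have shape (time, ...) and are interpreted as booleans.
    """
    water_stack = np.asarray(water_stack, dtype=bool)
    clear_stack = np.asarray(clear_stack, dtype=bool)
    clear_count = clear_stack.sum(axis=0)
    water_count = (water_stack & clear_stack).sum(axis=0)
    freq = np.zeros(clear_count.shape, dtype=float)
    np.divide(water_count, clear_count, out=freq, where=clear_count > 0)
    return freq

def fill_clouded_water(water_now, cloud_now, freq):
    """Infer water under cloud from the frequency of visible water pixels."""
    water_now = np.asarray(water_now, dtype=bool)
    cloud_now = np.asarray(cloud_now, dtype=bool)
    visible_water = water_now & ~cloud_now
    if not visible_water.any():
        return visible_water.copy()  # nothing to calibrate the threshold on
    threshold = freq[visible_water].min()  # assumed rule; a percentile is more robust
    return visible_water | (cloud_now & (freq >= threshold))

# Toy history of four cloud-free dates over four pixels (hypothetical).
history = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
freq = water_frequency(history, np.ones((4, 4), dtype=bool))
composite = fill_clouded_water([1, 0, 0, 0], [0, 1, 1, 0], freq)
```

In this toy case the clouded second pixel (frequency 1.0) is restored as water, while the clouded third pixel (frequency 0) stays non-water.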
4.4. Comparison of Different Products
To quantitatively evaluate the final lake and reservoir results, 44 lakes and reservoirs were selected for comparison with the JRC Global Water Bodies dataset (July 2020) and the ESA WorldCover 2020 land cover classification product (Table 5). As shown in the table, the proposed method extracted more water bodies than the JRC product, while yielding smaller lake and reservoir areas than WorldCover. The discrepancy with WorldCover can be attributed to seasonal hydrological dynamics: in July (the end of the rainy season), water levels decline due to flood discharge from reservoirs, whereas in October (the water storage period), water levels are higher as reservoirs replenish. Since the WorldCover product is based on October data, it captures the expanded water bodies of the storage period, leading to larger extracted areas than the July-based results.
To further explore these area differences, sampling analysis is performed on the water body information from the proposed method and the two products, as shown in Figure 7. Among them, a1, a2, and a3 represent the results from the proposed method, JRC, and WorldCover, respectively; a1, b1, and c1 represent large, medium, and small lake and reservoir sample points, respectively. The JRC product (300 m spatial resolution) exhibits smaller extracted areas. This is primarily because its low spatial resolution makes it difficult to identify small water bodies (<0.1 km²), and its long revisit cycle (30 days) renders it susceptible to cloud occlusion, leading to the omission of some water bodies (Pekel et al., 2016 [23]). In contrast, the WorldCover product (10 m spatial resolution) shows larger extracted areas. This is because the product is based on data from the October water storage period, a phase when reservoirs in Bijie City are storing water intensively, which creates a seasonal discrepancy with the July flood discharge data used in this study; additionally, its single-temporal data cannot capture hydrological dynamics. By combining 10 m spatial resolution with multi-temporal compositing, the method in this study not only captures small water bodies accurately but also mitigates the interference of seasonal fluctuations by fusing data from the rainy season and the water storage period.
4.5. Changes in the Spatial Distribution of Lakes and Reservoirs
The spatial distribution of lakes and reservoirs in Bijie City is shown in Figure 8. Using point data to represent their spatial locations, lakes and reservoirs in Bijie City exhibit significant spatial heterogeneity, with a concentration in the central region and fewer lakes and reservoirs in the east and west.
As presented in Table 6, the area of lakes and reservoirs in Bijie City from 2017 to 2021 exhibited an overall decline-then-increase trend, with extraction accuracy remaining consistently high throughout the study period. Notably, the monthly area variations across the five years (focusing on April, July, and October, key periods for hydrological monitoring) also followed a uniform “decrease-then-increase” pattern, which is further supported by annual accuracy metrics, as detailed below:
First, a notable shrinkage occurred in 2018: the most significant area reduction was observed from April to July 2018, when the total area dropped from 7.846 km² to 7.036 km² (a decrease of 0.81 km², equivalent to 10.3%). Even amid this shrinkage, the extraction accuracy remained stable: the Intersection over Union (IoU) was 0.92, the F1 score was 0.91, and the overall accuracy reached 95.8%, confirming the method’s ability to capture area changes reliably during periods of water body shrinkage.
Second, regarding intra-annual cycles (from October of one year to April of the next), most periods showed a decreasing trend, the only exception being October 2019 to April 2020, when the area increased from 8.381 km² to 8.682 km². For this exceptional period, the extraction accuracy was slightly higher: IoU reached 0.93, the F1 score was 0.92, and the overall accuracy was 96.2%, reflecting the method’s adaptability to minor deviations in annual hydrological rhythms.
Third, in terms of the overall trajectory, the area of lakes and reservoirs rebounded significantly after 2020, with the October 2021 measurement reaching 9.324 km², the highest value in the five-year study period. Correspondingly, the extraction accuracy in 2021 also reached the peak of the study period: IoU was 0.96, the F1 score was 0.95, and the overall accuracy was 97.6%, supporting the method’s robustness in capturing recovery-phase area dynamics.
This consistent seasonal fluctuation is closely linked to hydrological cycles: summer evaporation tends to cause water body shrinkage, while autumn-winter precipitation supplements water storage and promotes area recovery. Importantly, the sustained high accuracy (IoU: 0.92–0.96; F1 score: 0.91–0.95) across all five years further confirms the method’s reliability in tracking dynamic changes in lake and reservoir areas.
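The percentage changes quoted above follow from a simple relative-change calculation, reproduced here as a check using the two pairs of area values given in the text. The function and variable names are illustrative.

```python
def relative_change(before_km2, after_km2):
    """Percentage change from before to after (negative means shrinkage)."""
    if before_km2 == 0:
        raise ValueError("the starting area must be non-zero")
    return (after_km2 - before_km2) / before_km2 * 100.0

# April to July 2018 (values from the text).
shrinkage_2018 = relative_change(7.846, 7.036)
# October 2019 to April 2020 (values from the text).
growth_2019_2020 = relative_change(8.381, 8.682)
```

Rounded to one decimal place, these evaluate to -10.3% and +3.6%, the first matching the 10.3% decrease stated for 2018.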
From the perspective of driving factors, hydrological seasonal fluctuations were the dominant cause of area changes from 2017 to 2021, contributing 95% to the total variation; in contrast, land use/cover dynamics had a relatively minor impact (contributing only 5%). Specifically:
In 2018, some reservoirs (e.g., a medium-sized reservoir in Zhijin County) experienced an area reduction of 0.03 km², attributed to the expansion of surrounding farmland; this agricultural activity occupied approximately 100 m of the reservoir’s shoreline, leading to localized shrinkage.
After 2020, with the advancement of the “Grain for Green” project, the vegetation coverage around reservoirs increased from 35% to 52%. This vegetation restoration reduced soil erosion, slowed reservoir siltation, and enabled a steady rebound in reservoir areas.
Overall, however, the annual area impact of land use changes never exceeded 0.1 km², which was insufficient to alter the overall “decline-then-increase” trend of lake and reservoir areas in Bijie City during the study period.
4.6. Methodology Evaluation and Application Value
4.6.1. Methodology Evaluation
The multi-feature fusion method proposed in this study exhibits significant performance advantages in karst areas, which can be summarized as follows:
Accuracy: The final extraction Intersection over Union (IoU) reaches 96–98%, representing an improvement of 6–13% compared with single-index methods (IoU of 82–88% for MNDWI and 74–76% for NDWI). Additionally, this method can effectively suppress topographic shadows (reducing misjudgments by 20%) and building shadows (reducing misjudgments by 15%).
Limitations: When cloud cover exceeds 60% (e.g., during the rainy season in Bijie City from June to July), the cloud restoration accuracy decreases to 88%. Further optimization is required in subsequent studies by integrating Sentinel-1 SAR data (with cloud-penetrating capability).
Reproducibility: Based on the Google Earth Engine (GEE) platform and open-source QGIS 3.40.3 software, the method does not require commercial software support, which facilitates its promotion among grassroots departments.
4.6.2. Application in Water Resource Planning and Protection
This method can provide technical support for water resource management in Bijie City in three aspects:
Dynamic Monitoring: Monthly water body area data can facilitate reservoir operation. For example, during the flood discharge period in July, the area-based early warning threshold can be set as “a 10% decrease compared to the previous month”.
Ecological Restoration: The extraction results of small water bodies (e.g., water accumulation in karst depressions with an area < 0.1 km²) can assist in rocky desertification control and support the analysis of surface water-groundwater recharge relationships.
Policy Formulation: Long-term time-series results from 2017 to 2021 show that climate change has expanded the inter-annual area fluctuation range from 5% to 8%. This finding can provide data support for the Water Resources Plan for Responding to Climate Change in Bijie City.
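The area-based early warning rule mentioned under Dynamic Monitoring (a 10% decrease compared to the previous month) could be implemented along the lines of the sketch below. The function name, the data layout, and the monthly series are hypothetical and serve only to illustrate the rule.

```python
def drop_alerts(monthly_area, threshold_pct=10.0):
    """Return (label, drop_pct) for months whose area fell by at least
    threshold_pct relative to the previous month.

    monthly_area: list of (label, area_km2) tuples in chronological order.
    """
    alerts = []
    for (_, prev_area), (label, area) in zip(monthly_area, monthly_area[1:]):
        if prev_area <= 0:
            continue  # no meaningful baseline for a relative drop
        drop_pct = (prev_area - area) / prev_area * 100.0
        if drop_pct >= threshold_pct:
            alerts.append((label, round(drop_pct, 1)))
    return alerts

# Hypothetical monthly series, not measured values.
series = [("month_1", 8.0), ("month_2", 7.0), ("month_3", 6.9)]
alerts = drop_alerts(series)
```

With this series only `month_2` triggers an alert (a 12.5% drop); the threshold is a parameter so that it can be tuned per reservoir.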
5. Conclusions
This study proposes a multi-feature and multi-level water body extraction method for lakes and reservoirs in Bijie City, China, utilizing Sentinel-2 multispectral imagery on the Google Earth Engine (GEE) remote sensing cloud platform. The key findings and contributions of this study are summarized as follows:
(1) Superiority of the Multi-Feature and Multi-Level Extraction Method:
The proposed method effectively integrates multiple features, including the Modified Normalized Difference Water Index (MNDWI), Automated Water Extraction Index (AWEIsh), Normalized Difference Vegetation Index (NDVI), Normalized Difference Built-up Index (NDBI), and Normalized Difference Red-Edge Index (NDREI), combined with Sentinel-2 B8/B9 bands and Digital Elevation Model (DEM) data to construct a comprehensive water body extraction strategy.
This method overcomes the limitations of single water indices in extracting water bodies at large scales, effectively eliminating vegetation shadows, topographic shadows, and artificial object non-water targets through hierarchical processing, significantly improving the accuracy and completeness of water body extraction.
To address the issue of missing water body information due to cloud occlusion, a temporal flood frequency algorithm is employed for cloud-occluded water body compositing, effectively recovering occluded water bodies and improving the timeliness and accuracy of water body extraction.
(2) Validation of Water Body Extraction Accuracy:
In the coarse extraction stage of water indices, the accuracy of extraction results was verified through visual interpretation, with the extraction rates of lake and reservoir water bodies all above 97.5%, ensuring the integrity of lake and reservoir water body information. For the final lake and reservoir extraction results, the overall extraction accuracy reached more than 96%, demonstrating the reliability and effectiveness of the method in this paper.
(3) Comparison of Extraction Results Using Different Algorithms:
Two scenarios with abundant shadows—urban and mountainous areas—are selected as case studies to compare the non-water targets suppression capabilities of different algorithms. The results show that the proposed multi-feature and multi-level extraction method significantly outperforms single water indices in suppressing non-water targets from various geographic features, producing more accurate extraction results.
Compared to existing water body data products (e.g., JRC Global Water Bodies and ESA WorldCover), the proposed method exhibits improvements in fineness and seasonal distribution, more accurately reflecting the actual changes in lake and reservoir water bodies.
(4) Analysis of Lake and Reservoir Water Body Extraction Results in Bijie City:
The spatial distribution of lakes and reservoirs in Bijie City exhibits significant heterogeneity, with a concentration in the central region and fewer lakes and reservoirs in the east and west directions.
From 2017 to 2021, the area of lakes and reservoirs in Bijie City exhibited an overall trend of first declining and then increasing. Intra-annual changes in lake and reservoir areas also followed a similar trend, with large lakes and reservoirs being the primary contributors to the overall area changes. This variation characteristic reflects the seasonal fluctuations in lake and reservoir areas.
(5) Method Universality and Application Prospects:
The proposed multi-feature and multi-level water body extraction method is not only applicable to Bijie City but also demonstrates universality in extracting water bodies at large scales. The method can automatically, rapidly, and accurately extract water body information, providing important technical support for water resource management, flood prediction, hydrological cycle research, climate change impact assessment, and other fields.
(6) Limitations and Future Directions:
While the proposed method demonstrates high accuracy and adaptability in Bijie’s karst landscape, its claimed universality requires further validation across non-karst geomorphic types. For example, in plain regions (e.g., the Yangtze River Delta) with extensive artificial water bodies (e.g., paddy fields), the current NDVI/NDBI thresholds may misclassify paddy fields as lakes/reservoirs, necessitating the integration of phenological features (e.g., seasonal vegetation dynamics) for optimization. In cold regions (e.g., the Qinghai–Tibet Plateau), ice/snow cover may interfere with water body spectral signals, requiring the addition of thermal infrared bands (e.g., Sentinel-3 SLSTR) to distinguish water from ice.
Additionally, the cloud occlusion restoration algorithm exhibits reduced performance when monthly cloud cover exceeds 60% (e.g., Bijie’s rainy season in June–July), with water body recovery accuracy dropping to 88% (vs. 96% for cloud cover < 30%). Future research will incorporate multi-sensor data fusion (e.g., Sentinel-1 SAR, which is cloud-penetrating) to improve restoration accuracy in extreme cloudy conditions.
Finally, the current validation relies primarily on remote sensing reference datasets (JRC, WorldCover) and limited field surveys. Subsequent work will establish a ground-truthing network at the Jiayan Water Conservancy Project (see Section 2.1) and five other key reservoirs, collecting in situ water level/area data monthly to enhance the rigor of accuracy evaluation.