As a fundamental natural resource sustaining Earth's life systems, water possesses irreplaceable strategic value in agricultural production, industrial manufacturing, ecological balance, and human settlements [1]. Water resource issues are typically regional, and water distribution is strongly influenced by factors such as season and climate change. In recent years, intensifying global climate change and human activity have made ecological vulnerability increasingly prominent, marked by a significant reduction in national wetland area and widespread shrinkage of inland lakes. Against this backdrop, obtaining timely and accurate information on the spatial distribution of water bodies is not only a core element of safeguarding national water security and ecological civilization construction but also a critical pathway toward the United Nations Sustainable Development Goals (SDGs) [2]. To enhance the capability for large-scale dynamic monitoring of water bodies, this study proposes a water extraction method that integrates multi-source remote sensing data, aiming to establish a dynamic water body monitoring system with both high spatiotemporal resolution and strong anti-interference capability. The system serves critical national needs, including quantitative water resource assessment, ecological redline supervision, and flood and drought disaster emergency response. It also provides a reliable data and methodological foundation for scientifically understanding the evolution of the water cycle under global change and supports the localized implementation of the SDGs.
Traditional water body extraction relied primarily on manual visual interpretation, which was inefficient and highly subjective. With the development of remote sensing technology, automated extraction methods have matured considerably. Commonly used remote sensing data fall into two main categories: optical imagery and synthetic aperture radar (SAR) imagery. Optical imagery, such as Landsat and Sentinel-2, offers rich spectral information that is advantageous for distinguishing land cover types, but it is susceptible to interference from weather conditions such as clouds and rain. SAR imagery, including Sentinel-1A/B, ALOS-2, GF-3, and LuTan-1 [3], provides all-weather imaging capability but is sensitive to topographic variation and prone to influences such as mountain shadows. Building on these data, current mainstream automated extraction methods can be broadly classified into object-oriented approaches, water index methods, and deep learning methods [4]. For instance, Wang Qi et al. [5] proposed an object-oriented, multi-feature optimization method for flood extraction from SAR images: the images were segmented at multiple scales, grayscale, texture, and shape features were combined, and an extraction model was built on the Random Forest algorithm, achieving good results. Liu Yibo et al. [6] proposed the Siam-FRNet model, based on a siamese network and an attention mechanism, which effectively improved the extraction accuracy of flood inundation extent in SAR images. However, these methods still face challenges such as strong dependence on threshold settings, large sample requirements, and limited generalization capability [7]. In contrast, water index methods are widely adopted for their simple computation, high extraction efficiency, and good robustness. Since McFeeters [8] constructed the Normalized Difference Water Index (NDWI) from TM imagery, drawing on the principles of vegetation indices, a variety of water indices tailored to different conditions have emerged and become a primary approach for water body extraction. For example, Li Wenkang et al. [7] applied the Automated Water Extraction Index (AWEI) and the Modified Normalized Difference Water Index (MNDWI) to Sentinel-2 imagery and combined auxiliary parameters to repair missing areas, ultimately obtaining relatively accurate extraction results. Wu Qingshuang et al. [9] proposed a vegetation Red-edge Water Index (RWI) algorithm based on Sentinel-2 data; comparative experiments showed that its extraction performance surpassed other indices. For SAR data, Jia Shichao et al. [10] exploited the characteristics of water bodies in microwave imagery and, drawing on the concepts of the Normalized Difference Vegetation Index (NDVI) and NDWI, proposed the SDWI index for Sentinel-1 VV/VH dual-polarization data. The index effectively mitigates interference from soil and vegetation and performs well, although its results can still be affected by mountain shadows caused by topographic variation.
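The water indices discussed above reduce to simple band arithmetic. The following NumPy sketch illustrates the general form of NDWI and MNDWI, and one commonly cited formulation of SDWI, ln(10 · VV · VH); the toy band values and the choice of a zero threshold are illustrative assumptions, not the thresholds used in this study:

```python
import numpy as np

def ndwi(green, nir):
    # McFeeters NDWI: (Green - NIR) / (Green + NIR); water tends toward positive values.
    return (green - nir) / (green + nir + 1e-10)

def mndwi(green, swir):
    # MNDWI replaces NIR with SWIR to better suppress built-up land noise.
    return (green - swir) / (green + swir + 1e-10)

def sdwi(vv, vh):
    # One published form of SDWI for dual-pol backscatter in linear power units:
    # ln(10 * VV * VH); water is separated by a scene-dependent threshold.
    return np.log(10.0 * vv * vh)

# Toy 2x2 scene (reflectance): top row water-like, bottom row land-like.
green = np.array([[0.30, 0.28], [0.10, 0.12]])
nir   = np.array([[0.05, 0.06], [0.30, 0.35]])
water_mask = ndwi(green, nir) > 0.0  # zero threshold, a common starting point
```

In practice the threshold is tuned per scene or derived automatically (e.g., by Otsu's method) rather than fixed at zero.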
While existing methods demonstrate high accuracy and reliability in small-scale studies, they still face significant challenges when extracting water bodies over large, complex environments. Optical images are susceptible to interference from clouds and rain, while SAR images are sensitive to topographic effects such as mountain shadows and bare land; using either type alone often leads to false or missed extractions. Furthermore, most methods suffer from limited regional adaptability and insufficient temporal resolution, making them inadequate for large-scale, high-frequency, long-term dynamic monitoring of water bodies. The fusion of optical and radar imagery has therefore become a critical direction in remote sensing-based water resource surveys, aiming to enhance the accuracy and reliability of information extraction through multi-source data complementarity. Currently, optical-SAR fusion is achieved primarily through weighted averaging, multi-scale feature fusion, and deep learning-based multi-branch networks. These approaches integrate the spectral and textural information of optical images with the structural and deformation features of radar images at the feature or decision level, and have demonstrated significant advantages in applications such as land use monitoring, forest fire warning, and impervious surface extraction [11-14]. However, existing fusion techniques still face challenges such as insufficient robustness of registration methods, complex fusion mechanisms, and low processing efficiency, which hinder their rapid and stable application in operational settings. Breakthroughs in the interpretability of fusion mechanisms, computational efficiency, and environmental adaptability are therefore of great significance for efficient and accurate large-scale, long-term water body extraction.
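Decision-level fusion of optical and SAR classifications can be illustrated with a simple per-pixel rule. The sketch below is a hypothetical minimal example (the rule set and mask names are assumptions, not the fusion scheme proposed in this study): trust the optical classification where the pixel is cloud-free, and fall back to the all-weather SAR classification under cloud:

```python
import numpy as np

def fuse_masks(optical_water, sar_water, cloud_mask):
    # Decision-level fusion sketch: where cloud_mask is True the optical
    # observation is unreliable, so the SAR classification is used instead.
    return np.where(cloud_mask, sar_water, optical_water)

# Toy 2x2 example: boolean water masks plus a per-pixel cloud flag.
optical = np.array([[True, False], [False, False]])
sar     = np.array([[True, False], [True,  False]])
clouds  = np.array([[False, False], [True, False]])  # lower-left pixel is cloudy

fused = fuse_masks(optical, sar, clouds)
```

Feature-level fusion, by contrast, would stack index values and backscatter into a joint feature vector before classification, at the cost of requiring precise co-registration of the two sources.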
This study integrates Sentinel-1 synthetic aperture radar (SAR) data and Sentinel-2 multispectral data to propose a synergistic water extraction method for large-scale areas covering all water body types, including rivers and lakes. By combining the cloud-penetrating, day-and-night imaging capability of radar with the rich spectral information of optical imagery, the method largely overcomes the limitations of single-source data in spatiotemporal continuity and land-cover discrimination accuracy. With high spatiotemporal resolution and strong anti-interference characteristics, it currently supports operational dynamic monitoring at a monthly scale. Using monthly water extraction results for Beijing from 2019 to 2020 as a case study, we compare and validate the results against the JRC Global Surface Water dataset, and further apply the method to analyze water body changes caused by an extreme rainfall event in Beijing in July 2025.