Multitemporal Water Extraction of Dongting Lake and Poyang Lake Based on an Automatic Water Extraction and Dynamic Monitoring Framework

Abstract: Timely and accurate large-scale water body mapping and dynamic monitoring are of great significance for water resource planning, flood control, and disaster reduction applications. Synthetic aperture radar (SAR) systems have the characteristics of strong operability, wide coverage, and all-weather data availability, and play a key role in large-scale water monitoring applications. However, there are still some challenges in the application of highly efficient, high-precision water extraction and dynamic monitoring methods. In this paper, a framework for the automatic extraction and long-term change monitoring of water bodies is proposed. First, a multitemporal water sample dataset is produced based on the bimodal threshold segmentation method. Second, attention block and pyramid module are introduced into the UNet (encoder-decoder) model to construct a robust water extraction network (PA-UNet). Then, GIS modeling is used for the automatic postprocessing of the water extraction results. Finally, the results are mapped and statistically analyzed. The whole process realizes end-to-end input and output. Sentinel-1 data covering Dongting Lake and Poyang Lake are selected for water extraction and dynamic monitoring analysis from 2017 to 2020, and Sentinel-2 images from a similar time frame are selected for verification. The results show that the proposed framework can realize high-precision (the extraction accuracy is higher than 95%), highly efficient automatic water extraction. Multitemporal monitoring results show that Dongting Lake and Poyang Lake fluctuate most in April, July, and November in 2017, 2019, and 2020, and the change trends of the two lakes are the same.


Introduction
Water body extent mapping is an important research topic in the field of lake change and flood disaster monitoring. According to the available statistics, flood disasters are the most serious natural disasters at present, and annual flood disaster losses account for more than 40% of the losses caused by all natural disasters [1]. Timely and accurate access to water change information is of great significance for government decision-making and flood control rescue efforts.
The emergence of remote sensing has provided an advanced technical means for flood information acquisition. With the increase in remote sensing data sources, the research and application of using optical sensor data to obtain surface water information have been increasingly developed [2][3][4][5][6][7][8]. However, the availability of optical data is limited by rainy weather during flood periods. Synthetic aperture radar (SAR) data can be acquired all day and under all weather conditions, and the backscattering value of water bodies in SAR images is low, making them easily separable from other ground objects [9].
The monitoring results show that the water change trends of Dongting Lake and Poyang Lake in April, July, and November from 2017 to 2020 are the same, and both of them are in a flood period in July. Moreover, in 2017, 2019, and 2020, the water area of Dongting Lake is over 3000 km², while the water area of Poyang Lake is over 5000 km² and even exceeds 6000 km² in 2020. There are two peaks of water area in Dongting Lake every year: one occurs around March, and the other occurs around July.
The remainder of this paper is organized as follows: Section 2 provides a description of the study area and experimental data. The proposed methodology is described in Section 3. Section 4 is devoted to the experimental results and analyses. Section 5 presents the discussion. Finally, conclusions are drawn.

Study Area
Dongting Lake and Poyang Lake in the Yangtze River Basin are selected as the study areas. Poyang Lake is the largest freshwater lake in China, and Dongting Lake is the third largest freshwater lake in China. In the middle and lower reaches of the Yangtze River Basin, a large amount of continuous precipitation falls from June to July every year, which makes the region prone to floods. In 2016 and 2017, Dongting Lake and Poyang Lake experienced extraordinary floods. In 2020 especially, continuous heavy rainfall fell in the middle and lower reaches of the Yangtze River, and Dongting Lake and Poyang Lake stayed above the warning water level for a long time. Large areas of farmland were flooded and houses collapsed, posing a major threat to people's lives and property. In this paper, we carry out long time series water monitoring in these two areas to explore the dynamic changes of water bodies. Dongting Lake is located at 112.35-113.19° E and 29.26-30.12° N, and Poyang Lake is located at 115.78-116.75° E and 28.36-29.75° N. The locations of the study areas are shown in Figure 1.

Experimental Data
The main data used in this paper are the Sentinel-1 SAR data released by the ESA, and the auxiliary data include the Sentinel-2 optical data and GF-3 SAR data. A total of 87 scenes are used in this paper, including 39 Sentinel-1 images covering Dongting Lake and 39 Sentinel-1 images covering Poyang Lake. All the Sentinel-1 SAR data are in IW mode and GRD format with VV/VH polarization, and the resolution is 20 m. In addition, three GF-3 images and six Sentinel-2 images were acquired to provide auxiliary information. The GF-3 data are used to supplement the data of Dongting Lake in June 2020, as Sentinel-1 data are not available for this month. The Sentinel-2 data are used to verify the accuracy of the water extraction results based on the proposed method. The details of the data are shown in Tables 1 and 2.

Training and Testing Datasets
The training and testing datasets contain 8727 patches in total, of which 900 are negative sample patches (including mountains). The training data are used to train the PA-UNet, and the test data are used to test the performance of the trained model. The size of the patches is 256 × 256 pixels. The ratio of training data to testing data in this paper is approximately 8:1. Figure 2 shows a few example sample patches.

Methodology
The automatic processing chain is described in Figure 3. It mainly includes four components: (i) data preprocessing; (ii) construction of the water extraction network (PA-UNet) model; (iii) accuracy evaluation; and (iv) multitemporal water monitoring analysis.

Production of the Multitemporal Water Sample Dataset
Water bodies have different representations in SAR images during different periods, such as pre-flood, during-flood, and post-flood periods, so it is necessary to design a multitemporal water sample dataset. The study areas lie in hilly areas, and some areas are affected by mountain shadows, which leads to the misclassification of non-water objects as water; therefore, we add a certain proportion of mountain samples to the water sample dataset. In this paper, the water samples are not annotated by hand; they are produced from the water results extracted with the method of Cao et al. [54]. Specifically, (1) the SAR data of Dongting Lake and Poyang Lake collected during flood and non-flood periods are selected for preprocessing, including radiometric calibration, geocoding, and filtering. (2) Based on the preprocessed data, the water extraction process outlined by Cao et al. [54] is used to extract water bodies and obtain a binary map. (3) The binary map is then transformed from a raster to a vector format in ArcGIS software and combined with Google Earth images for data cleaning to further ensure sample accuracy. (4) The Sentinel-1 images and water vector layer are converted into 8-bit TIF data, which can further improve the training efficiency. (5) Finally, the SAR images and corresponding sample images are clipped into patches of 256 × 256 pixels. Some of the samples obtained from mountainous areas are added to the sample dataset to complete the production of the multitemporal water sample dataset.
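The clipping in step (5) can be illustrated with a minimal sketch. It assumes non-overlapping tiles and drops border regions that do not fill a whole patch; the text does not state how borders or overlap are handled, and the function name `clip_patches` is hypothetical.

```python
def clip_patches(image, patch_size=256):
    """Split a 2-D image (list of rows) into non-overlapping square patches.

    Border regions smaller than one patch are dropped. This is an
    assumption; the paper does not describe its border handling.
    Returns a list of ((top, left), patch) tuples.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            # Slice patch_size rows, then patch_size columns from each row.
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(((top, left), patch))
    return patches
```

Applying the same function to a SAR image and to its label raster yields patch pairs with matching offsets, which is what a segmentation network needs for training.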

Preprocessing of the GF-3 Data
Radiometric correction is carried out for the L1A level products of the GF-3 SAR data to calculate the backscattering coefficient corresponding to each pixel value, following the equation given in [55]. In that equation, $P^I = I^2 + Q^2$, where $I$ is the real part of the pixel value of the L1A level product and $Q$ is the imaginary part; QualifyValue is the maximum value of the scene image before quantization; $K_{dB}$ is a calibration constant; and $NE\sigma^0_{dB}$ is the equivalent noise coefficient. Then, geocoding and filtering are performed.
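As a rough illustration of this calibration step, the sketch below uses the form of the GF-3 L1A calibration that is commonly cited in the literature, $\sigma^0_{dB} = 10\log_{10}\left(P^I (QualifyValue/32767)^2\right) - K_{dB}$. The scaling constant 32767 and the exact arrangement of terms are assumptions not stated in this section, the equivalent noise coefficient is not used, and the function name is hypothetical.

```python
import math


def gf3_sigma0_db(i, q, qualify_value, k_db):
    """Backscattering coefficient (dB) for one GF-3 L1A complex pixel.

    Assumed form (not confirmed by the text above):
        sigma0_dB = 10 * log10(P * (QualifyValue / 32767)^2) - K_dB
    with P = I^2 + Q^2.
    """
    power = i * i + q * q                      # P^I = I^2 + Q^2
    if power <= 0:
        return float("-inf")                   # no signal: log undefined
    scaled = power * (qualify_value / 32767.0) ** 2
    return 10.0 * math.log10(scaled) - k_db
```

In practice, QualifyValue and the calibration constant would be read from the product metadata of each scene.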

Preprocessing of the Sentinel-2 Optical Data
First, the Sentinel-2 data bands are resampled to the resolution of band 3 (10 m) using SNAP software, and then the resampled bands are fused to obtain the 10-m resolution Sentinel-2 image. Then, the modified normalized difference water index (MNDWI) [56] is calculated, and the results are segmented by a reasonable threshold to obtain the water body results.
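The MNDWI step can be sketched as follows. The index is the normalized difference of the green and shortwave-infrared reflectances; for Sentinel-2, using band 3 as green and band 11 as SWIR is a common choice, but that pairing and the default threshold of 0.0 are assumptions here, since the paper selects its threshold by visual inspection.

```python
def mndwi(green, swir):
    """Modified normalized difference water index for one pixel."""
    total = green + swir
    # Guard against division by zero on empty (no-data) pixels.
    return 0.0 if total == 0 else (green - swir) / total


def water_mask(green_band, swir_band, threshold=0.0):
    """Label pixels whose MNDWI exceeds the threshold as water (1), else 0.

    Both bands are 2-D lists of equal shape, already resampled to the
    same resolution. The threshold value is a placeholder assumption.
    """
    return [
        [1 if mndwi(g, s) > threshold else 0 for g, s in zip(g_row, s_row)]
        for g_row, s_row in zip(green_band, swir_band)
    ]
```

Water reflects more in the green band than in the SWIR band, so water pixels tend toward positive index values while soil and built-up surfaces tend toward negative ones.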

Construction and Training of the PA-UNet Model
By combining the attention block and pyramid pooling module with the UNet model, a water extraction network (PA-UNet) for multitemporal SAR is designed. Specifically, an encoder-decoder structure (UNet model) is used for feature extraction and classification. The pyramid pooling module is located between the encoder and decoder and is used for multiscale feature extraction from deep layers. The attention block lies in the decoder part and can focus the network on the detection of water objects. Additionally, in the classifier, dice loss is used to replace the cross-entropy loss function.

PA-UNet
The main structure of PA-UNet (see Figure 4) is divided into two parts: an encoder and a decoder. The encoder is mainly composed of a convolution block (Conv2D block), a max-pooling layer, and a concatenation. The Conv2D block is composed of two groups of convolution layers (Conv2d), batch normalization layers, and activation functions (ReLU Activator). The decoder is mainly composed of a deconvolution structure (Transconv2D), a convolution block, a concatenation, and an attention block; finally, the deconvolution structure upsamples the feature map to achieve pixel-level classification.

Pyramid Pooling Module
The pyramid pooling module [53] can realize multiscale feature extraction from the feature map. In the multitemporal Sentinel-1 images of the two lakes, the lakes and small rivers show different characteristics, such as those observed during the dry season and flood season. Therefore, it is necessary to fully identify the characteristics of water bodies with different scales and forms. Thus, the pyramid pooling module is introduced into the UNet model to extract multiscale features and various morphological characteristics.
The pyramid pooling module fuses features under four-level pyramid scales with bin sizes of 1 × 1, 2 × 2, 3 × 3, and 6 × 6 (see Figure 5). The coarsest level, highlighted in red, represents global pooling, which generates a single bin output. The following pyramid levels separate the feature map into different subregions and form pooled representations of various locations. The outputs of the different levels in the pyramid pooling module contain feature maps with varied sizes. To maintain the weight of the global feature, if the level size of the pyramid is $N$, the 1 × 1 convolution layer after each pyramid level reduces the dimensions of the context representation to $1/N$ of the original. Then, the low-dimensional feature maps are directly upsampled via bilinear interpolation to the same size as the original feature map. Finally, the features of the different levels are concatenated as the final pyramid pooling global feature, which is used for subsequent classification work. The pyramid pooling module collects multilevel information of multiscale water bodies and combines it with the original feature map extracted from the encoder to improve the accuracy of water extraction.
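The pooling stage of this module can be sketched for a single-channel feature map as below. Only the average pooling into 1 × 1, 2 × 2, 3 × 3, and 6 × 6 bins is shown; the 1 × 1 convolutions, bilinear upsampling, and concatenation are omitted. The bin boundary rule (floor for the start, ceiling for the end) is an assumption borrowed from common adaptive pooling implementations, and the function names are hypothetical.

```python
import math


def adaptive_avg_pool(feature, bins):
    """Average-pool a 2-D feature map (list of rows) into bins x bins cells."""
    h, w = len(feature), len(feature[0])
    pooled = []
    for i in range(bins):
        r0 = (i * h) // bins                   # first row of this bin
        r1 = math.ceil((i + 1) * h / bins)     # one past the last row
        row = []
        for j in range(bins):
            c0 = (j * w) // bins
            c1 = math.ceil((j + 1) * w / bins)
            cells = [feature[r][c]
                     for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def pyramid_pool(feature, bin_sizes=(1, 2, 3, 6)):
    """Return one pooled map per pyramid level, keyed by bin size."""
    return {b: adaptive_avg_pool(feature, b) for b in bin_sizes}
```

With four levels, reducing each level to a quarter of the input channels means the concatenated pyramid features carry as many channels as the original feature map.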

Attention Block
The attention mechanism is mainly inspired by the attention gate model [52]. The core idea of the attention mechanism is to suppress the parts of the feature map that are irrelevant to the task while learning the characteristics related to the task (see Figure 6). We add the attention block to the UNet model to make the network focus on water extraction. In the attention block, $g$ is the gating signal output from the downsampling layer, and $X^l$ represents the feature map of the upsampling layer passed by the skip connection. Dconv2d denotes mean pooling implemented with a dilated convolution kernel. The gating signal and the feature map are then combined, and the ReLU activation function, dimension reduction, and sigmoid activation mapping are applied. The resulting attention coefficients are multiplied element-wise with $X^l$ to obtain the attended features.
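A heavily simplified, single-channel sketch of such a gate is given below. It assumes the additive form often used for attention gates, in which the feature map and gating signal are linearly transformed, summed, passed through ReLU, reduced, and squashed by a sigmoid. Scalar weights stand in for the learned convolutions, the gating signal is assumed to be already resampled to the size of the feature map, and all parameter names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def attention_gate(x, g, w_x=1.0, w_g=1.0, psi=1.0, bias=0.0):
    """Weight each pixel of feature map x by an attention coefficient.

    x, g: 2-D lists of equal shape (skip-connection features and gating
    signal). Scalar weights replace the learned convolutions, which is a
    simplification for illustration only.
    """
    gated = []
    for x_row, g_row in zip(x, g):
        out_row = []
        for xv, gv in zip(x_row, g_row):
            hidden = max(0.0, w_x * xv + w_g * gv + bias)  # ReLU
            alpha = sigmoid(psi * hidden)                  # coefficient in (0, 1)
            out_row.append(alpha * xv)                     # element-wise gating
        gated.append(out_row)
    return gated
```

Pixels where the gating signal agrees with the skip features receive coefficients near 1 and pass through almost unchanged, while the remaining pixels are attenuated.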

Loss Function
In the water samples, the number of pixels belonging to small rivers in a patch is small, and the proportions of water and background pixels are imbalanced. Thus, the Dice loss function is used to address the sample imbalance. In the binary classification task, the Dice loss is calculated as follows:
$$L_{dice} = 1 - Dice$$
where $L_{dice}$ represents the Dice loss and $Dice$ represents the Dice coefficient. The Dice coefficient ranges over [0, 1], where 0 indicates that there is no overlap between the predicted result and the ground truth, and 1 indicates that the predicted result and the ground truth completely overlap; accordingly, $L_{dice} = 0$ corresponds to complete overlap.
The Dice coefficient [57] is calculated as follows:
$$Dice = \frac{2|A \cap B|}{|A| + |B|}$$
where $A \cap B$ indicates the intersection of the predicted result and the ground truth, and $|A|$ and $|B|$ represent the pixel numbers of the network segmentation result and the ground truth, respectively. The higher the Dice coefficient is, the better the segmentation performance is.
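A minimal sketch of the Dice loss on flattened masks is shown below. The soft intersection (a sum of products, so that predicted probabilities can be used) and the small smoothing term `eps` are common implementation choices and are assumptions here, not details taken from the paper.

```python
def dice_coefficient(pred, truth, eps=1e-7):
    """Dice coefficient between two flat sequences of values in [0, 1].

    pred may hold probabilities; truth holds 0/1 labels. eps avoids
    division by zero when both masks are empty (an assumed convention).
    """
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return (2.0 * intersection + eps) / (total + eps)


def dice_loss(pred, truth, eps=1e-7):
    """Dice loss: 0 for perfect overlap, close to 1 for no overlap."""
    return 1.0 - dice_coefficient(pred, truth, eps)
```

Because the loss is normalized by the total number of water pixels rather than by the patch size, a patch with only a thin river still yields a meaningful gradient.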

Accuracy Evaluation
The accuracy evaluation process includes two parts: (1) Sentinel-2 images taken on similar dates are used to extract the water bodies and evaluate the accuracy of the extraction results based on the PA-UNet. (2) The predicted results are evaluated against the ground truth (water extraction results based on the bimodal threshold method). The accuracy evaluation indicators include commission error (CE), omission error (OE), kappa coefficient, and overall accuracy (OA) [58]. In their standard forms, the indicators are calculated as follows:
$$CE = \frac{FP}{TP + FP}, \qquad OE = \frac{FN}{TP + FN}, \qquad OA = \frac{TP + TN}{TP + TN + FP + FN}$$
$$Kappa = \frac{OA - p_e}{1 - p_e}, \qquad p_e = \frac{\sum_i a_i b_i}{n^2}$$
where TP means that the classifier recognizes the object correctly and labels the sample as positive; FP means that the classifier labels the sample as positive, but, in fact, the sample is negative; FN means that the classifier labels the sample as negative, but, in fact, it is positive; and TN means that the classifier correctly labels a negative sample as negative.
In the kappa coefficient, $a_i$ is the total number of samples whose true values belong to class $i$, $b_i$ is the total number of samples predicted as class $i$, and $n$ is the total number of samples.
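The four indicators can be computed from a binary confusion matrix as sketched below. The formulas are the standard definitions of commission error, omission error, overall accuracy, and Cohen's kappa; they are assumed to match the paper's, whose equations are not reproduced in this text. Function names are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN for flat 0/1 sequences (1 = water)."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy_metrics(tp, fp, fn, tn):
    """Return CE, OE, OA, and kappa using their standard definitions."""
    n = tp + fp + fn + tn
    ce = fp / (tp + fp) if (tp + fp) else 0.0   # share of predicted water that is wrong
    oe = fn / (tp + fn) if (tp + fn) else 0.0   # share of true water that is missed
    oa = (tp + tn) / n
    # Expected chance agreement from the row and column totals.
    p_e = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    kappa = (oa - p_e) / (1.0 - p_e) if p_e < 1.0 else 1.0
    return {"CE": ce, "OE": oe, "OA": oa, "Kappa": kappa}
```

For example, with 40 true positives, 10 false positives, 5 false negatives, and 45 true negatives, the overall accuracy is 0.85 while kappa is lower (about 0.70) because half of the agreement is expected by chance.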

Multitemporal Water Monitoring Analysis Modeling
This paper uses 82 scenes, and the data volume is large. Relying only on manual processing would be time-consuming and laborious. Therefore, we use GIS modeling to achieve end-to-end input and output. The GIS modeling is carried out on the ArcGIS software platform by chaining the GIS tools into one model. The specific steps include data input, data conversion, vector clipping, projection conversion, and water area calculation. Batch processing is realized for the whole process, which saves much time.
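The final step, water area calculation, reduces to counting water pixels and multiplying by the pixel footprint. The sketch below works directly on a binary raster, which is a simplification: the paper computes areas in ArcGIS after vector clipping and projection conversion. The pixel size must be supplied by the caller, and the function name is hypothetical.

```python
def water_area_km2(mask, pixel_size_m):
    """Water area in km^2 for a binary mask (1 = water) in a projected grid.

    pixel_size_m is the side length of one square pixel in metres. It
    depends on how the imagery was geocoded, so no default is assumed.
    """
    water_pixels = sum(1 for row in mask for value in row if value == 1)
    return water_pixels * pixel_size_m ** 2 / 1e6
```

Running this over every date in the series gives the area statistics that the monitoring analysis plots over time.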

Accuracy Evaluation
To further illustrate the performance of the proposed method, Sentinel-2 optical images of Dongting Lake and Poyang Lake in November 2020 were obtained as references. The cloudy and rainy weather in July made it impossible to obtain cloudless Sentinel-2 images, and for the same reason, Sentinel-2 images and Sentinel-1 images could not be obtained on the same day. In this paper, the acquisition date of the Sentinel-2 image of Dongting Lake differs from that of the Sentinel-1 image by 2 days, while the acquisition dates of the Sentinel-2 image and Sentinel-1 image of Poyang Lake differ by 6 days. The spatial distribution of water bodies in some areas of Poyang Lake is inconsistent due to this time difference; thus, we consider some obvious inconsistencies between the two results to be reasonable. Water extraction using Sentinel-2 images is based on the classic MNDWI method, which extracts water by applying an appropriate threshold. To determine the appropriate threshold, we use the raster color slices and quick stats tools in ENVI 5.3 software to inspect the segmentation effect of different thresholds and combine them with Google Earth images and expert interpretation to determine the best threshold. The comparison results are shown in Figure 7. Figure 7a3, a4 and Figure 7d3, d4 are the global detection results for the two lakes, showing that the results extracted by the proposed method are highly consistent with those based on Sentinel-2 data. In addition, we select two sites in each of the two lakes to compare the extraction effect in detail. Figure 7b1 and Figure 7c1 correspond to the enlarged images in the red boxes of Figure 7a1, and Figure 7e1 and Figure 7f1 correspond to the enlarged images in the red boxes of Figure 7d1. It can be seen from Figure 7b1, b2 that the PA-UNet method detects the water bodies more completely than the MNDWI results using Sentinel-2 data. In the Poyang Lake area, however, the PA-UNet method failed to detect some small-scale water bodies due to the lower spatial resolution.
To quantitatively evaluate the accuracy of the extraction results, we calculated the Pearson correlation coefficient and the RMSE between the Sentinel-1-based results and the Sentinel-2-based results for Dongting Lake and Poyang Lake. Specifically, a 1 km × 1 km grid for Dongting Lake and a 4 km × 4 km grid for Poyang Lake are established based on the common distribution range of the two results, and the water area of the two results in each grid cell is calculated. The correlation between the Sentinel-1-based results and the Sentinel-2-based results is shown in Figure 8, which shows that the two results are highly correlated for both lakes. The Pearson correlation coefficients of Dongting Lake and Poyang Lake are 0.9685 and 0.9472, respectively, and the RMSEs are 0.0768 and 0.3386, respectively. The above results demonstrate that the extraction results based on the proposed method are highly consistent with those based on optical data, which also means that the proposed method has an accurate extraction performance.
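The grid-based comparison can be sketched as follows, assuming the per-cell water areas of the two results have already been extracted into two equal-length lists, one value per grid cell. The function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rmse(xs, ys):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
```

The two measures complement each other: the correlation captures whether the two results rise and fall together across cells, while the RMSE captures the typical size of the per-cell area difference.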

Comparative Experiment of Different Methods
To evaluate the detection performance of PA-UNet, we select the residual UNet and UNet models for comparison with PA-UNet. The three models are trained and tested with the same samples, the same model parameters, and the same environment. Sentinel-1 images acquired during two periods (flood and non-flood periods) are selected to test the robustness of PA-UNet in the two study areas. The results extracted by the three methods are shown in Figure 9. In July 2020, affected by continuous heavy rainfall, floods occurred in the middle and lower reaches of the Yangtze River; the water levels of Dongting Lake and Poyang Lake rose, and the lakes' areas became larger. With the subsequent decrease in rainfall, the lakes stabilized in November 2020. It can be seen from the SAR images in Figure 9 that the extents of Dongting Lake and Poyang Lake change greatly between the two periods. In general, the detection results of the three methods are highly consistent, but in the details, the proposed method detects small-scale rivers more completely.
To further quantitatively evaluate the detection performance of the three models, we evaluated the results against the ground truth, and the results are shown in Table 3. It can be seen from Table 3 that the overall accuracy of PA-UNet is the highest for all four images and exceeds 95%. However, PA-UNet still produces some missed and false detections. For example, on 15 November 2020, the omission error of Dongting Lake reached 1.11%, and on 17 November 2020, the commission error of Poyang Lake reached 2.15%. There are two main reasons for the omission error: on the one hand, the water level of the lake is low in winter, and the backscattering coefficient of the silt on the bank is high in the SAR images; on the other hand, the scattering characteristics of small-scale water flows in the SAR images are not obvious, which leads to incomplete detection. The main reason for the commission error is that during the flood period, precipitation leads to more water in some paddy fields, which then show similar characteristics to those of water bodies in SAR images; thus, these paddy fields are easily extracted as water bodies by the networks.
In general, the proposed method can detect different phases of water bodies, which shows that the proposed method has great potential in the application of rapid and accurate water detection.

Multitemporal Dynamic Monitoring in Dongting Lake and Poyang Lake
The images of Dongting Lake and Poyang Lake before flooding (April), during the flood period (July), and after flooding (November) from 2017 to 2020 were selected to analyze the dynamic changes in the two lakes. The results are shown in Figures 10 and 11, which show that the water areas of Dongting Lake and Poyang Lake increase significantly in July relative to April and November. In November, there was less continuous heavy rainfall in the Yangtze River Basin, and the water area was relatively stable. Figures 10 and 11 also show that the water areas of the two lakes in April 2017-2020 were not significantly different from those in November. To show more clearly the dynamic changes in the areas of Dongting Lake and Poyang Lake during the three time periods from 2017 to 2020, we calculated the water areas of Dongting Lake (the statistical range is 16638.34 km²) and Poyang Lake (the statistical range is 36568.54 km²) during the three time periods, and the statistical results are shown in Figure 12. Figure 12 indicates that the water change trends of Dongting Lake and Poyang Lake in April, July, and November from 2017 to 2020 are the same, and both of them are in a flood period in July. In 2017, 2019, and 2020, the water area of Dongting Lake is over 3000 km², while the water area of Poyang Lake is over 5000 km² and even exceeds 6000 km² in 2020. The water areas of the two lakes fluctuate most among April, July, and November in 2017, 2019, and 2020.
To further explore and analyze the monthly dynamic changes occurring in the two lakes, Dongting Lake is selected as the experimental area. It can be concluded from Figure 12 that the change trends of Dongting Lake and Poyang Lake are the same; however, Poyang Lake covers a larger area. In addition, Poyang Lake spans two Sentinel-1 images, while Dongting Lake is covered by a single Sentinel-1 image. Therefore, Poyang Lake is not analyzed year by year, and we only analyze the flood changes of Poyang Lake from 9 April 2020 to 1 August 2020. Figure 13 shows the month-by-month extraction results of 36 images of Dongting Lake from 2018 to 2020, and Figure 14 shows the corresponding statistical area. Figure 14 shows that there are two peaks of water area in Dongting Lake every year: one occurs around March, and the other occurs around July. The flood season occurs in approximately July every year, when the water area is the largest of the whole year. It can also be concluded from Figure 14 that April is the transitional period between the two water area peaks, and November is a stable period, which also shows that the before-flood period (April), flood period (July), and after-flood period (November) selected in this paper are reasonable. Figure 15 shows the water extraction results of Poyang Lake from 9 April 2020 to 1 August 2020, and Figure 16 shows the corresponding statistical water area. We can see that the water area of Poyang Lake decreased gradually from 9 April to 3 May, fluctuated within a small range from 3 May to 27 May, and then increased from 27 May to 20 July, when it peaked. By 1 August, the water area had begun to decline. These monitoring results can help us intuitively understand the dynamic changes in Dongting Lake and Poyang Lake and can be used as a reference for the application of rapid hydrological monitoring.

Comparison to the Previous Work
As discussed in the Introduction, Li et al. [46], Kang et al. [47], Nemni et al. [48] and Chen et al. [48] all applied deep learning technology for water detection using SAR data. However, these studies focused on a relatively small selection of regions, and they have not yet tested the effectiveness of their methods with time series data from different seasons.
Here, we would like to point out the research of Mizuochi et al. [59]. By combining random forest and conditional generative adversarial network (pix2pix) machine learning (ML) methods, Mizuochi et al. realized accurate water extraction based on the Moderate Resolution Imaging Spectroradiometer (MODIS) and the Advanced Microwave Scanning Radiometer 2, and effectively rendered the seasonality and heterogeneous distribution of the Lena River and the thermokarst lakes. However, in the work of Mizuochi et al., a threshold method was applied to water extraction using Sentinel-1 data, and many water bodies were missed. To test the performance of the PA-UNet method, we carried out water extraction experiments using the same data. The comparison results are shown in Figure 17. Our method extracted the water bodies more accurately than the MODIS-based results. The red and blue blocks indicate that PA-UNet extracted the rivers more completely. The yellow block indicates that the MODIS-based results missed many small-scale water bodies. These results confirm that our method is promising.

The Limitation of PA-UNet Method
Although our method has achieved satisfactory results in the water extraction of Poyang Lake and Dongting Lake, it has the following limitations: (1) The method has not been tested in water extraction experiments in complex environments, such as lake extraction in Tibet and river extraction in mountainous areas. (2) In cold regions, rivers freeze in winter, and the backscattering intensity of frozen rivers in SAR images is high, so SAR images are not suitable for water extraction during the river icing period. (3) Our method has not been tested in urban areas with many high-rise buildings, where building shadows are easily detected as water bodies.

Future Prospects
Given these promising results, the framework proposed in this paper could in the future be ported to the Google Earth Engine platform to implement automatic water mapping. On such a platform, Sentinel-1 data could be downloaded from Copernicus Hub and automatically preprocessed, and the preprocessed images would be fed into the PA-UNet model to produce an accurate water map.
Another direction for future work is to apply this method to urban water mapping with higher-resolution SAR data and combine it with high-precision land use data to analyze building inundation during floods.

Conclusions
Fast and near-real-time water extraction, especially flood dynamic monitoring, is very important for water resources regulation, disaster assessment, and rescue. Some departments still use the classic automatic or semi-automatic methods at present, which are time-consuming in large-scale water monitoring.
In this work, we provide a multitemporal water extraction framework that achieves accurate and automatic water extraction and dynamic analysis. By applying the bimodal threshold segmentation method to create multitemporal sample datasets, considerable manual annotation time is saved. The PA-UNet model, constructed by introducing the attention block and pyramid pooling module into the UNet model, realized accurate water extraction. Specifically, on the one hand, compared with the extraction results in Dongting Lake and Poyang Lake using Sentinel-2 data, high correlations of 0.9685 and 0.9472 are obtained, respectively. On the other hand, compared with the residual UNet and UNet models, PA-UNet shows better detection performance with an extraction accuracy greater than 95%. Then, based on the above research, we carried out time series water extraction in Dongting Lake and Poyang Lake from 2017 to 2020. Finally, the dynamic changes of the two lakes were analyzed using GIS modeling. The main conclusions obtained are as follows: (1) The water change trends of Dongting Lake and Poyang Lake in April, July, and November from 2017 to 2020 are the same, and both of them are in a flood period in July. In 2017, 2019, and 2020, the water area of Dongting Lake is over 3000 km², while the water area of Poyang Lake is over 5000 km² and even exceeds 6000 km² in 2020. (2) There are two peaks of water area in Dongting Lake every year: one occurs around March, and the other occurs around July.
The above conclusions show that the automatic production of water samples based on the bimodal threshold method, combined with the deep learning method, can realize accurate monitoring of large-scale water bodies, which has important value for hydrological monitoring, flood control, disaster reduction, and policy-making applications. In the future, we will try to port the framework proposed in this paper to the Google Earth Engine platform to achieve rapid emergency response services.

Data Availability Statement:
The Sentinel-1 data presented in this study are openly and freely available at https://urs.earthdata.nasa.gov/; Sentinel-2 data presented in this study are openly and freely available at https://scihub.copernicus.eu/; GF-3 data presented in this study are obtained at http://www.cresda.com/CN.