Continuous Dynamics Monitoring of Multi-Lake Water Extent Using a Spatial and Temporal Adaptive Fusion Method Based on Two Sets of MODIS Products

Due to the widespread presence of noise, such as clouds and cloud shadows, continuous, high spatiotemporal-resolution dynamic monitoring of lake water extents is still limited using remote sensing data. This study aims to take an approach to mapping continuous time series of highly-accurate lake water extents. Four lakes from diverse regions of China were selected as cases. In order to reduce the impact of noise and ensure high spatial and temporal resolution of the final results, two sets of MODIS products (including MOD09A1 and MOD13Q1) are used to extract water bodies. This approach mainly comprises preliminary classification, post processing and data fusion. The preliminary classification used the Random Forest (RF) classifier to efficiently and automatically obtain the initial classification results. Post-processing is implemented to repair the classification results affected by noise as much as possible. The processed results of the two sets of products are fused by using the Homologous Data-Based Spatial and Temporal Adaptive Fusion Method (HDSTAFM), which reduces the effect of noise and also improve the temporal and spatial resolution for the final water results. We determined the accuracy using Landsat-based water results, and the values of overall accuracy (OA), user’s accuracy (UA), producer’s accuracy (PA), and kappa coefficients (KC) are mostly greater than 0.9. Good correlation was achieved for a time series of water area and altimetry data, obtained by multiple satellites, and also for water-level data selected from hydrological stations.


Introduction
Lakes are closely related to human life and the natural environment [1]. Accurate recognition of long-term or dynamic changes in lakes is essential [2]. Remote sensing satellites can provide a significant amount of data which is necessary for monitoring. Long-term dynamics of terrestrial water bodies have been mapped [3][4][5]. However, continuous short-term monitoring of terrestrial lake-water

Materials
The data used in this paper includes several MODIS datasets, DEM data, and validation data (including some Landsat images, the water level data from the multiple satellite and the hydrological station).

MODIS Datasets
MOD09A1 is a set of surface reflectance products, containing bands 1-7 with an approximate resolution of 500 m in an eight-day gridded product. Each pixel is related to the best observation during eight days based on high observation coverage, low view angle, absence of clouds or cloud shadows, and aerosol loading. It also provides quality assignments (QA) data that contain MOD35 cloud/snow/ice flag, and the day of the year (DOY) for all pixels. MOD13Q1 is mainly designed to monitor vegetation. It contains NDVI and EVI, as well as red, near-infrared, blue, and mid-infrared bands, corresponding to bands 1, 2, 3, and 7, respectively. The product is available every 16 days at approximately 250 m spatial resolution. Additionally, it also provides QA and DOY related data.
MOD44W product is the global surface water dataset with a spatial resolution of 250 m [27]. The product provides relatively reliable surface water extent and it can be used for training classification models [16,28]. Of note, These MODIS products were downloaded from the Level 1 and Atmosphere Archive and Distribution (LAADS, https://ladsweb.modaps.eosdis.nasa.gov).

ASTER GDEM Data
The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global Digital Elevation Model (GDEM) is probably the most accurate DEM dataset which can be obtained for free on a global scale. ASTER GDEM dataset with 30 m resolution, has been proven to have high vertical and horizontal accuracy, downloaded from https://search.earthdata.nasa.gov.

Validation Data
Some validation data were used as the key to verify the accuracy of the results. As higher-resolution remote sensing images, some cloudless Landsat images in the same period can be used to verify the accuracy. In addition, the satellite altimetry water level data were also collected (  In addition, there are some hydrological stations to monitor the water level in Poyang Lake. In order to better verify the time-series results, we also collected the observed water level data from the Hukou hydrological station that is the most representative for the lake.

Methods
The flowchart of the proposed research algorithm, mainly includes three major parts: Classification using the RF classifier, post-processing, and results fusion using the HDSTAFM (Figure 2). Hukou hydrological station that is the most representative for the lake.

Methods
The flowchart of the proposed research algorithm, mainly includes three major parts: Classification using the RF classifier, post-processing, and results fusion using the HDSTAFM ( Figure  2).

Water Preliminary Classification
The RF classifier belongs to the ensemble learning algorithms. Compared with a single tree classifier, it is more robust, and it has good generalization ability in classification, due to the characteristics of multiple trees and repetitive sampling [19,20]. The main advantages of RF are the following: (a) Robust behavior in handling outliers and noise; (b) efficient and fast classification performance; (c) high level of immunity to overfitting and efficiency in classification of time-series images [29,30].
NDVI, NDWI, and MNDWI were developed by using bands 1-7, and they made up 10 feature variables for the classification of the MOD09A1 product. Regarding MOD13Q1, its own two indices and four bands were considered as feature variables.
To choose accurate samples, we created a buffer using the MOD44W product of each lake, including core water, transition, and core land zones (CLZ). The core water zone (CWZ) was created by selecting the minimum value of 20 pixels × 20 pixels. The transition zone (TZ) is the zone selecting the maximum value of 20 pixels × 20 pixels but removing the CWZ. CLZ is the zone selecting the maximum value of 60 pixels × 60 pixels but removing CWZ and TZ. For different study areas, we randomly selected 2000 water sample points in CWZ and 2000 non-water sample points in CLZ (Figure 3c).

6
including core water, transition, and core land zones (CLZ). The core water zone (CWZ) was created by selecting the minimum value of 20 pixels × 20 pixels. The transition zone (TZ) is the zone selecting the maximum value of 20 pixels × 20 pixels but removing the CWZ. CLZ is the zone selecting the maximum value of 60 pixels × 60 pixels but removing CWZ and TZ. For different study areas, we randomly selected 2000 water sample points in CWZ and 2000 non-water sample points in CLZ (Figure 3c). According to the provided multi-feature variables and sample points, the water and land can be classified using the RF classifier. According to the provided multi-feature variables and sample points, the water and land can be classified using the RF classifier.

Post-Processing
Post-processing includes de-noise and water restoration ( Figure 4). MODIS QA data is used to remove the noise for the preliminary water results, including ice, snow, clouds, and cloud shadows. In addition, some lakes in the mountain or plateau need to remove terrain shadows by slope data. In this study, we considered that no surface water is present in places where the slope is >5 • . 7 the water of the lake, and the water area is generally relatively large. Hence, when the pixels of both two adjacent results correspond to water, the relative pixel of the middle result (located in the same position) that is covered by noise, is also water. Moreover, for each lake, we find the mode of the elevation values for all water pixels in the CWZ after de-noise. Then we convert non-water pixels whose elevation values are less than the mode, into water pixels. If there are multiple modes, we choose the minimum of them as the final value.

The Homologous Data-Based Spatial and Temporal Adaptive Fusion Method
The homologous data-based spatial and temporal adaptive fusion method mainly uses the composite day of the annual data of the MOD13Q1 products.

The principle of the HDSTAFM
The examples are selected from the MOD09A1, between the 20xx001th and the 20xx009th days (0 ≤ xx ≤ 16), and from the MOD13Q1 in the 20xx001th day ( Figure 5). The pixels of the 20xx001th MOD09A1 product are all optimal selected from the 20xx001-20xx008th day, and the pixels of the 20xx009th MOD09A1 product correspond to the 20xx009-20xx016th day. Moreover, the pixels of the Water restoration is firstly restored by using the previous and subsequent results ( Figure 4). Notably, we consider that these kinds of weather events, such as clouds and cloud shadows, increase the water of the lake, and the water area is generally relatively large. Hence, when the pixels of both two adjacent results correspond to water, the relative pixel of the middle result (located in the same position) that is covered by noise, is also water. Moreover, for each lake, we find the mode of the elevation values for all water pixels in the CWZ after de-noise. Then we convert non-water pixels whose elevation values are less than the mode, into water pixels. If there are multiple modes, we choose the minimum of them as the final value.

The Homologous Data-Based Spatial and Temporal Adaptive Fusion Method
The homologous data-based spatial and temporal adaptive fusion method mainly uses the composite day of the annual data of the MOD13Q1 products.

The principle of the HDSTAFM
The examples are selected from the MOD09A1, between the 20xx001th and the 20xx009th days (0 ≤ xx ≤ 16), and from the MOD13Q1 in the 20xx001th day ( Figure 5). The pixels of the 20xx001th MOD09A1 product are all optimal selected from the 20xx001-20xx008th day, and the pixels of the 20xx009th MOD09A1 product correspond to the 20xx009-20xx016th day. Moreover, the pixels of the 20xx001th MOD013Q1 product are also best collected from the 20xx001-20xx016th day. This fusion process can be explained as follows:

Comparison of the different sized windows
Regarding the HDSTAFM process, the first and the second steps depend only on the classification results and process most of the pixels. The third step is mainly performed on the pixels that are related to two inconsistent classification results with different periods. These results are affected by the window size. The primary reference window sizes are 3 × 3, 5 × 5, and 7 × 7. In order to determine the best window size, an illustrative example is shown in the following Figure 6. Pixels A11 and A21 are at the junction of water and land, where A11 = 0 and A21 = 1. The value of the fused pixel A1 is equal to 1 (D1 = A21) for 3 × 3 or 5 × 5 windows, whereas it is equal to 0 (D1 = A11) for the 7 × 7 window. This example illustrates that the water-and-land boundary pixel tends to the "water" result of MOD13Q1 when the window is small; otherwise, it tends to the "non-water" result of the MOD09A1 when exceeding a certain range.
Pixels A12 and A22, are also at the junction of water and land, but A12 = 1 and A22 = 0. The resulting value of the fused pixel A2 is equal to 1 (D2 = A12) for the 3 × 3 window, and it is equal to 0 (A22 = 0) for the 5 × 5 or 7 × 7 windows. This example illustrates that when the window is small, the "water-and-land" boundary pixel tends to the "water" result of MOD09A1; otherwise, it tends to the "non-water" outcome of MOD09A1 when exceeding a certain range. Combining the two examples, the fusion results of A1 and A2 are both biased towards the "water" for a smaller window and towards the "non-water" for a larger window. Moreover, it indicates that 5 × 5 is the best alternative window size compared to 3 × 3 and 7 × 7. A13 and A23 pixels, are relatively far from a wide range of water bodies but A23 is recognized as "water" because of its higher resolution or we have a potential misidentification when the fusion results are biased towards "non-water". First of all, the 20xx001th and the 20xx009th MOD09A1 results data were directly split into 250 m, so one pixel was split into four pixels. The pixel values (0 or 1) were not changed; that is, if the initial pixel value was 1, the values of the four pixels (obtained by the division) were also equal to 1. Their number of pixels after splitting was equal to the number of 20xx001th MOD13Q1. We took some corresponding pixels as the analysis objects ( Figure 5). A1 was any one of these pixels from 20xx001th MOD09A1. Correspondingly in the same position, they were A2 pixel of 20xx009th MOD09A1 and the B pixel of 20xx001th MOD13Q1. The C pixel was the corresponding production date (The day of the year, DOY) of the B pixel, whose value ranged from 1 to 16 at the 20xx001th day.
Let us compare the results of the MOD09A1 and MOD13Q1 on the 20xx001th day. The following three steps exist in the overall process.

•
First step: If 1 ≤ C < 9: We consider B as accurate, and D1 is assigned to B if the case is the following: The result from MOD13Q1 (that is the 16-day optimal value with 250 m resolution) is more accurate than the one obtained from MOD09A1 (that is the eight-day optimal value with 500 m resolution). • Second step: If 9 ≤ C < 17 and A1 = B, this means that the two classification results are consistent, and D1 can be assigned either to A1 or B. • Third step: If 9 ≤ C < 17 and A1 B, we need to compare their correlation with the surrounding pixels to determine the final result. A gradient equation is being proposed in this study (Equation (1)). Comparing the gradient values of A1 and B, if gradient A1 < gradient B , the correlation between A1 and the surrounding pixels in the window can be considered more reasonable than B, so D1 = A1; otherwise D1 = B.
where T is the number of surrounding pixels in the window, dx it is the distance of pixel i and the surrounding pixel t, and dy it is the difference between them. If the pixel window is set to 3 × 3, T = 8, dx it = 1 or √ 2, and dy it = 0 or ±1. These steps can also be expressed as the follows: where D1 is the pixel of fusion results for the 20xx001th day. According to the above process, all pixels of the 20xx001th day results can be calculated. Likewise, using A2, B, and C, all pixels D2 corresponding to the results of the 20xx009th-day can also be produced. The D2 can be expressed as follows:

Comparison of the different sized windows
Regarding the HDSTAFM process, the first and the second steps depend only on the classification results and process most of the pixels. The third step is mainly performed on the pixels that are related to two inconsistent classification results with different periods. These results are affected by the window size. The primary reference window sizes are 3 × 3, 5 × 5, and 7 × 7. In order to determine the best window size, an illustrative example is shown in the following Figure 6. Pixels A11 and A21 are at the junction of water and land, where A11 = 0 and A21 = 1. The value of the fused pixel A1 is equal to 1 (D1 = A21) for 3 × 3 or 5 × 5 windows, whereas it is equal to 0 (D1 = A11) for the 7 × 7 window. This example illustrates that the water-and-land boundary pixel tends to the "water" result of MOD13Q1 when the window is small; otherwise, it tends to the "non-water" result of the MOD09A1 when exceeding a certain range.
Pixels A12 and A22, are also at the junction of water and land, but A12 = 1 and A22 = 0. The resulting value of the fused pixel A2 is equal to 1 (D2 = A12) for the 3 × 3 window, and it is equal to 0 (A22 = 0) for the 5 × 5 or 7 × 7 windows. This example illustrates that when the window is small, the "water-and-land" boundary pixel tends to the "water" result of MOD09A1; otherwise, it tends to the "non-water" outcome of MOD09A1 when exceeding a certain range. Combining the two examples, the fusion results of A1 and A2 are both biased towards the "water" for a smaller window and towards the "non-water" for a larger window. Moreover, it indicates that 5 × 5 is the best alternative window size compared to 3 × 3 and 7 × 7. A13 and A23 pixels, are relatively far from a wide range of water bodies but A23 is recognized as "water" because of its higher resolution or we have a potential misidentification when the fusion results are biased towards "non-water".
In general, the 5 × 5 window is optimal compared to 3 × 3 and 7 × 7, although some separate "water" or "land" results from the MOD13Q1 may be eliminated.

10
In general, the 5 × 5 window is optimal compared to 3 × 3 and 7 × 7, although some separate "water" or "land" results from the MOD13Q1 may be eliminated.

Accuracy Assessment
Some cloudless Landsat images were used to test the results. This study has employed the NDWI to initially extract "water" bodies and to manually correct them. Overall accuracy (OA), user's accuracy (UA), producer's accuracy (PA), and kappa coefficients (KC) have been calculated.

Comparison of the Results of Different Windows
Figure 7b-d shows that the fusion results of the 3 × 3, 5 × 5, and 7 × 7 windows in Poyang Lake have all been feasible for the methodology. However, some details also exist for small water bodies, especially for complex ones, which are characterized by algae, clouds, and their spatial resolution. The "non-water" part of the 3 × 3 window result is relatively large, and some "water" pixels and plaques have not been extracted. The result of the 7 × 7 window is relatively small, and some "nonwater" pixels have been treated as "water".

Accuracy Assessment
Some cloudless Landsat images were used to test the results. This study has employed the NDWI to initially extract "water" bodies and to manually correct them. Overall accuracy (OA), user's accuracy (UA), producer's accuracy (PA), and kappa coefficients (KC) have been calculated.

Comparison of the Results of Different Windows
Figure 7b-d shows that the fusion results of the 3 × 3, 5 × 5, and 7 × 7 windows in Poyang Lake have all been feasible for the methodology. However, some details also exist for small water bodies, especially for complex ones, which are characterized by algae, clouds, and their spatial resolution. The "non-water" part of the 3 × 3 window result is relatively large, and some "water" pixels and plaques have not been extracted. The result of the 7 × 7 window is relatively small, and some "non-water" pixels have been treated as "water". Similarly, Figure 8b-d shows the fusion results of 3 × 3, 5 × 5, and 7 × 7 windows in Hulun Lake, respectively. There are both obvious misclassification in "water" and "non-water" plaques in the result of the 3 × 3 window (Figure 8b). Most of the "water" pixels have been extracted but there are also some subtle "water" pixels or plaques that have been misclassified in the result of the 7 × 7 window (Figure 8c). Compared to the result of the 7 × 7 window, the advantage of the 5 × 5 window is mainly reflected in the better extraction effect on some fine "water" bodies. Similarly, Figure 8b-d shows the fusion results of 3 × 3, 5 × 5, and 7 × 7 windows in Hulun Lake, respectively. There are both obvious misclassification in "water" and "non-water" plaques in the result of the 3 × 3 window (Figure 8b). Most of the "water" pixels have been extracted but there are also some subtle "water" pixels or plaques that have been misclassified in the result of the 7 × 7 window (Figure 8c). Compared to the result of the 7 × 7 window, the advantage of the 5 × 5 window is mainly reflected in the better extraction effect on some fine "water" bodies. Similarly, Figure 8b-d shows the fusion results of 3 × 3, 5 × 5, and 7 × 7 windows in Hulun Lake, respectively. There are both obvious misclassification in "water" and "non-water" plaques in the result of the 3 × 3 window (Figure 8b). Most of the "water" pixels have been extracted but there are also some subtle "water" pixels or plaques that have been misclassified in the result of the 7 × 7 window (Figure 8c). Compared to the result of the 7 × 7 window, the advantage of the 5 × 5 window is mainly reflected in the better extraction effect on some fine "water" bodies.

Accuracy Assessment with Landsat Images
Some classification results generated by the methodology have been validated by comparing with some cloudless Landsat-based results in the same period ( Table 3). The OA, UA, PA, and KC of the four cases with eight images are almost all high. There are some shortcomings for Poyang Lake, where many narrow "water" bodies are not easily detected by lower spatial-resolution MODIS data.

Case Monitoring Results
The results of Bosten, Namco, Hulun, and Poyang lakes for the period 2000-2016 have been acquired.

The Case of Bosten Lake
Water extent variations are mainly located in the north and west of Bosten Lake (Figure 9a). The maximum "water" area occurred in 2003/5/1 (Figure 9b) and the minimum in 2012/1/1 (Figure 9c). The "water" area is consistent with the relative height obtained by the USDA, DAHITI, and LEGOS, respectively (Figure 9d). The number of valid auxiliary data can be seen in Table 2, and the correlation coefficients between "water" area and relative height, are 0.76 (USDA), 0.87 (DAHITI), and 0.88 (LEGOS). The quality of the USDA data is relatively low (Figure 9e).

The Case of Namco Lake
The "water" area of Namco Lake has shown an increasing trend, but it is minimal compared to the entire lake (Figure 10a). The maximum water area occurred in 2016/6/2 (Figure 10b) and the minimum in 2000/7/2 (Figure 10c). The increase of the lake area mainly occurred in 2000-2009, and it is relatively stable, but a slight increase trend occurred in the period 2010-2016 (Figure 10d). The

The Case of Namco Lake
The "water" area of Namco Lake has shown an increasing trend, but it is minimal compared to the entire lake (Figure 10a). The maximum water area occurred in 2016/6/2 ( Figure 10b) and the minimum in 2000/7/2 ( Figure 10c). The increase of the lake area mainly occurred in 2000-2009, and it is relatively stable, but a slight increase trend occurred in the period 2010-2016 (Figure 10d). The results have proven to be reasonable, where the correlation coefficient was found to be R = 0.79 in the case of "water" area and "relative height" for the LEGOS.

The Case of Hulun Lake
The water extent of Hulun Lake has presented large variations in the period 2000-2016, which can be divided into two phases: It has decreased in the period 2000-2010 and increased in the time interval 2011-2016 (Figure 11d). The maximum "water" area occurred in 2003/4/7 (Figure 11b), and the minimum in 2010/11/9 (Figure 11c). The "water" variations mainly occurred in the northeast and southeast of the lake. Figure 11e presents the relationship between the "water" area and the "relative height" in Hulun Lake. They are equal to 0.93, 0.93, and 0.92 for the USDA, DAHITI, and LEGOS, respectively, which indicates the fact that the obtained results are quite reliable (Table 2).

The Case of Hulun Lake
The water extent of Hulun Lake has presented large variations in the period 2000-2016, which can be divided into two phases: It has decreased in the period 2000-2010 and increased in the time interval 2011-2016 (Figure 11d). The maximum "water" area occurred in 2003/4/7 (Figure 11b), and the minimum in 2010/11/9 (Figure 11c). The "water" variations mainly occurred in the northeast and southeast of the lake. Figure 11e presents the relationship between the "water" area and the "relative height" in Hulun Lake. They are equal to 0.93, 0.93, and 0.92 for the USDA, DAHITI, and LEGOS, respectively, which indicates the fact that the obtained results are quite reliable (Table 2).

The Case of Poyang Lake
There are obvious dynamic variations in the water extent for Poyang Lake. The maximum "water" area was as high as 3413.51 km 2 (Figure 12b

The Case of Poyang Lake
There are obvious dynamic variations in the water extent for Poyang Lake. The maximum "water" area was as high as 3413.51 km 2 (Figure 12b) whereas the minimum was 886.03 km 2 ( Figure  12c) and they were recorded on 2016/7/20 and 2004/1/25, respectively.
Seasonal dynamic variation of water extent in Poyang Lake is prominent and regular ( Figure  12d). The correlation coefficients between the "water" area and the "relative height" are 0.83 (DAHITI) and 0.46 (LEGOS) (Figure 12e).
It looks like the "water" extraction results are poor compared with the "relative height" from LEGOS. Therefore, we further compared the relationship between the "water" area and the observed "water" level from the Hukou Station. The results have shown that this correlation is good with R = 0.90 (Figure 13b). It shows that the "water" extraction result is reasonable, and that the LEGOS data may have some defects.   Seasonal dynamic variation of water extent in Poyang Lake is prominent and regular (Figure 12d). The correlation coefficients between the "water" area and the "relative height" are 0.83 (DAHITI) and 0.46 (LEGOS) (Figure 12e).
It looks like the "water" extraction results are poor compared with the "relative height" from LEGOS. Therefore, we further compared the relationship between the "water" area and the observed "water" level from the Hukou Station. The results have shown that this correlation is good with R = 0.90 (Figure 13b). It shows that the "water" extraction result is reasonable, and that the LEGOS data may have some defects.

Advantages of the Approach
The RF classifier, a robust and efficient machine learning method, can perform well in the case of this preliminary classification of "water" and "land". However, noise such as clouds is inevitable

Advantages of the Approach
The RF classifier, a robust and efficient machine learning method, can perform well in the case of this preliminary classification of "water" and "land". However, noise such as clouds is inevitable for the time series of "water" extraction. In order to ensure the reliability and continuity of the results, post-processing including de-noise and "water" restoration has been built based on existing research in this paper. The denoising process mainly used QA layer data and slope data. The "water" restoration included two steps. In past research, the first step commonly used technical means by considering the results of adjacent images. However, many "water" pixels were still not restored, so the second step aimed to further improve the results by using the elevation data. It has to be emphasized that this process mainly relies on the accuracy of the DEM data, where the current ASTER GDEM data basically meet the requirements, but they still need to be improved in the future.
The HDSTAFM provides a good approach for data fusion in remote sensing applications. It has two main advantages: (1) It uses multiple sets of data from the same data source (MODIS), and it takes into account the temporal and spatial information for each pixel; and (2) it performs fusion only for the classification results, which reduces errors caused by the fusion method itself. In fact, compared to the classification results of MOD09A1 or MOD13Q1, the accuracy of the fusion result is higher [18]. The HDSTAFM utilizes more bands information and reduces the impact of noise because of the integration of the two results. In addition, it must be mentioned that the window size is set to 5 × 5, which looks better than 3 × 3 and 7 × 7. The two examples (A1 and A2) have shown that the fusion results of the 5 × 5 window are more robust than the other two, and the example A3 has clearly proven that the fusion results can eliminate some small separate "water" or "land" plaques that may cause noise ( Figure 6).

Challenges for the Four Lakes
Although the classification results have proven to be reliable, several challenges are still existing in mapping time series "water", such as clouds, snow or ice, terrain or cloud shadows, and turbidity. The noise varies significantly for these four lakes due to their different "water" types, climates, and underlying surfaces.
Namco Lake is the only saltwater lake among them. Its "water" type is relatively simple with little sediment or algae, and there are few other features around it that can interfere with the "water" classification. The results are mainly affected by snow, ice, and clouds. The "water" type of Bosten Lake is also relatively simple, but there are some vegetation and artificial buildings as well as the "water" is relatively shallow on the west side of the lake. Other noises such as clouds, cloud shadows, snow and ice are relatively few. The classification results of the lake are good despite some noise effects. Hulun Lake is less affected by the cloud. However, the salinity of the "water" is higher than in the freshest "water" cases. Due to the large fluctuation of the "water" extent and temperature, the classification results have shown some differences.
These problems from the three lakes have been well resolved by the approach, according to the test results (Table 3). The classification results of Poyang Lake are relatively poor. Many types of noise have a severe effect of the "water" extraction, including clouds, clouds shadows, turbidity, vegetation, and algae. Clouds and cloud shadows are common in Poyang Lake, especially in spring and summer. There are no completely reliable methods to remove their effect; however, some serious attempts have been made in this paper (Section 3.2). In addition to the noise, the low spatial resolution of the MODIS data is the main influencing factor for accuracy. There are many small "water" patches and narrow river channels in the Poyang Lake region. They are challenging to be detected using MODIS products, but they can be well detected using Landsat images. Overall, the results of Poyang Lake are reliable (Table 3 and Figures 12 and 13).

Potential Improvement to Remove Noise
Some improvements can still be worth considering, including denoisng and "water" restoration. Zhu et al. [12] detected clouds, cloud shadows and snow for Landsat 4-8 by improving the fmask algorithm. The multi-satellite methodology helps to avoid the noise [31,32]. In addition to optical remote sensing data, radar and microwave remote sensing data are partially suitable for "water" extraction [33,34]. Some optical remote sensing images with much cloudiness can be replaced by radar data during the same period.

Selection of More Efficient Classification Methods
The RF classifier is more efficient and robust than some traditional methods. However, it will also appear obvious misclassification when the samples are poorly representative. Some new classification methods proposed have also been applied successfully towards "water" classification [35,36]. Unfortunately, there are no classification methods that can classify "water" and "land" without being influenced by a certain level of noise so far. Robust and efficient classifiers are still our main pursued goal.

Applications of the Fusion Method for Higher-Resolution Data
The HDSTAFM mainly considers the temporal and spatial relationships of each pixel, where the temporal and spatial features are compared through a gradient value for the adjacent time and the neighboring pixels. The fusion results have proven to be more robust and accurate, and a higher spatiotemporal resolution has been achieved. This method can also be considered to fuse other data sources, such as Landsat time series data. Even different data can still refer to the fusion idea if the temporal and spatial relationship is satisfied.

Conclusions
This study aimed to map continuous lake water extents with high spatiotemporal resolution. It is conducive to more accurately understand the "water" variations of the lakes. Four different types of lakes have been selected and used as application cases. The approach comprises of three basic parts: (1) Preliminary water classification using the RF classifier; (2) post-processing, including de-noise and water restoration; and (3) water result fusion using the HDSTAFM method. Continuous water results at high frequency (every eight days) and moderate spatial resolution (250 m) have been provided for these four lakes.
Some specific conclusions can be conducted: 1.
The classification results are reliably validated using some Landsat-based results. The approach performs efficiently in obtaining time series "water" extent of the lakes based on the MODIS products by comparing against altimetry data and water level data.

2.
The RF classifier is efficient in the long-term sequence classification, and post-processing is also critical in removing noise and restoring water bodies.

3.
Although there are still some shortcoming (e.g., single water pixel information may be eliminated), the HDSTAFM is feasible for the fusion of two sets of water results. Moreover, the 5 × 5 window performs better than the 3 × 3 and 7 × 7 windows.
In general, the approach is effective for continuous long-term monitoring of the lake "water" extent. In the future, we will try to improve it further and mapping dynamic "terrestrial water", including several lakes and rivers.