Large-Scale Surface Water Mapping Based on Landsat and Sentinel-1 Images

Tang, Hailong; Lu, Shanlong; Ali Baig, Muhammad Hasan; Li, Mingyang; Fang, Chun; Wang, Yong

doi:10.3390/w14091454

¹ Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China

² School of Geospatial Engineering and Science, Sun Yat-Sen University, Zhuhai 519082, China

³ College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China

⁴ Institute of Geo-Information & Earth-Observation (IGEO), PMAS Arid Agriculture University, Rawalpindi 46300, Pakistan

⁵ School of Earth Sciences and Spatial Information Engineering, Hunan University of Science and Technology, Xiangtan 411201, China

* Author to whom correspondence should be addressed.

Water 2022, 14(9), 1454; https://doi.org/10.3390/w14091454

Submission received: 8 April 2022 / Revised: 28 April 2022 / Accepted: 29 April 2022 / Published: 2 May 2022

(This article belongs to the Special Issue Application of Remote Sensing Technology to Water-Related Ecosystems)

Abstract:

Surface water is a highly dynamic feature of the earth’s surface. At present, satellite remote sensing is the most effective way to accurately depict the temporal and spatial variation of surface water on a large scale. In this study, a region-adaptive random forest algorithm is designed on the Google Earth Engine (GEE) for automatic surface water mapping, using multi-sensor data (Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-1 SAR images) as source data and China as a case study region. A visual comparison of the mapping results with the original images over different landforms shows that the extracted water body boundaries are consistent with the water extent in the images. Cross-validation against the JRC GSW validation samples shows very high accuracy: the average producer’s accuracy and average user’s accuracy of water are 0.933 and 0.998, respectively, and the average overall accuracy and average kappa are 0.966 and 0.931, respectively. Independent verification with lakes of different sizes also confirms the high accuracy of the method, with a maximum average error of 3.299%. These results show that the method is well suited to large-scale surface water mapping with high spatiotemporal resolution.

Keywords:

surface water mapping; remote sensing; random forest; Google Earth Engine

1. Introduction

Surface water, including lakes, rivers, swamps, and other forms of water bodies, is an essential part of the earth’s water resources and plays a vital role in agriculture, industrial production, and climate regulation. The quick and accurate acquisition of spatial distribution information of surface water is of great significance for water resources monitoring and management, and for flood and drought disaster prevention [1,2]. Due to the influence of climate, seasons, and human activities, surface water is continuously changing [3]. These changes are highly dynamic both spatially and temporally and should be monitored in real or near-real time. Remote sensing offers macroscopic, dynamic, continuous, and low-cost monitoring of surface water distribution patterns and supports a better understanding of their spatiotemporal changes [4].

In the past few decades, many studies have focused on surface water mapping methods based on remote sensing images [5,6,7]. In general, surface water extraction methods can be roughly classified into three categories: the threshold method based on single-band images, the identification method based on spectral index, and the image classification method.

For optical remote sensing images, a series of satellites such as MODIS, Landsat, Sentinel, and the Chinese high-spatial-resolution satellites (Gao Fen, GF) provide a large number of freely available data sources for surface water mapping. For example, Klein et al. [8] extracted surface water in Central Asia from 1986 to 2012 with a single-band threshold method, based on near-infrared (NIR) band images from MODIS (250 m resolution) and AVHRR (1.1 km resolution). Lu et al. [9] used an improved Otsu method to calculate the dynamic segmentation threshold within each water mask to extract surface water, based on the NIR band of the MODIS 8-day synthetic image product. The single-band threshold method is one of the most common water extraction methods because its principle is simple and it is easy to implement. However, the spectral reflectance characteristics of water bodies vary with many factors such as season, geographical location, and depth, and the features of ground objects that a single-band image can reflect are limited. Non-water objects may have spectral reflectance characteristics similar to those of water in a specific band. Therefore, this method has limitations in extracting the spatial distribution of surface water, especially for surface water mapping on a large scale.

The spectral-index-driven water body extraction method exploits the different spectral reflectance characteristics of water in different bands, combining multiple bands to highlight water body information in remote sensing images [10]. For example, the Normalized Difference Water Index (NDWI) proposed by McFeeters [11] highlights water information in the image and suppresses vegetation and soil information, based on the prominent reflection of water in the green band and its strong absorption in the near-infrared band. The Modified Normalized Difference Water Index (MNDWI) proposed by Xu [12] can effectively eliminate the interference of urban building shadows and improve the accuracy of surface water extraction in urban areas. Zou et al. [13] produced a 30 m resolution annual surface water frequency map of the United States based on Landsat 5, 7, and 8 images. Based on Sentinel-2 data, Yang et al. [14] generated a monthly surface water extent covering France with a resolution of 10 m. The water extraction method based on spectral indices is easy to implement and widely used in regional water body extraction. However, it is not easy to find an optimal segmentation threshold that distinguishes water bodies from other ground objects [15,16].

In recent years, image classification methods have been widely used in the automatic recognition and classification of remote sensing images and have been shown to achieve better classification accuracy. For example, Feng et al. [17] mapped surface water extent in Mainland Alaska using random forest regression based on 8-day VIIRS surface reflectance composites. Li et al. [18] used a random forest method to map the annual surface water distribution in Huizhou, China from 1986 to 2020. Duan et al. [19] extracted surface water distribution information in Wuhan, China, using a new lightweight convolutional neural network based on GaoFen-1D data. These studies focus on surface water mapping either over small areas or over large areas at low resolution. Pekel et al. [3] obtained a monthly 30 m resolution Global Surface Water data set based on Landsat series satellite data by constructing an expert system for surface water identification. This data set is characterized by high temporal and spatial resolution and has been widely recognized and adopted. However, the data set is updated once a year, so it is difficult for users to obtain the dynamic changes of surface water distribution in real time for a specific area of interest. Furthermore, the method of producing this data set is very complicated and difficult for users to reproduce. In summary, for the many researchers and users engaged in remote sensing and water resources-related research, there is still a lack of simple methods for surface water mapping with high spatiotemporal resolution from regional to global scales.

Although surface water extraction seems straightforward, a water body is a highly dynamic entity whose spectral characteristics and dielectric properties change with sediment content, water depth, chlorophyll concentration, and other factors. The combination of optical images and SAR data can simultaneously reflect the spectral reflectance of ground objects and their radar backscattering characteristics, and is considered to have unique advantages in ground object extraction [20,21]. How to effectively combine multi-temporal and multi-source remote sensing data to accurately map surface water over a large area with high spatiotemporal resolution still requires further research.

In recent years, cloud computing platforms represented by Google Earth Engine (GEE) have significantly advanced this work. GEE provides strong support for remote sensing calculation and analysis over large areas. Furthermore, the platform provides a large amount of remote sensing data for free, avoiding tedious and time-consuming data downloading and preprocessing; therefore, it has been widely used for large-scale surface water mapping [3,22,23,24]. A new generation of mapping methods that combine artificial-intelligence-based image classification, cloud computing, and remote sensing big data has become common for land surface mapping [25].

However, there is still a lack of remote sensing methods for surface water extraction that offer high spatiotemporal resolution and high precision and that are automatic, reproducible, and suitable for large-scale applications. In this study, a region-adaptive random forest algorithm is designed on the GEE cloud computing platform for automatic surface water mapping, with Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-1 SAR images as multi-sensor source data and China as a case study region. Visual and statistical comparisons are made to verify the accuracy of the method and of the derived surface water data set. The advantages and disadvantages of the method are also discussed.

2. Study Area and Data

2.1. Study Area

China is the third-largest country in the world. Its landforms include plains, platforms, hills, and mountains with small, medium, large, and extreme relief (Figure 1). The terrain rises from sea level in the east toward the west, and is complex and diverse in the central and southern regions. The Qinghai-Tibet Plateau in western China has the highest terrain in the country, with altitudes mostly above 4000 m, and is mainly composed of plateaus and extremely high mountains covered with many glaciers and perennial snow. The eastern part of China is mainly composed of vast plains and hills; it is the main economic hub, with concentrated industrial production, and the most populous belt. The large latitudinal difference between the north and south of China and the complex and diverse terrain make the climate diverse: it includes temperate monsoon, subtropical monsoon, tropical monsoon, tropical rain forest, moderate continental, and plateau mountain climates, with further subdivisions of climatic zones. Such complex and diverse topography and climatic conditions allow different forms of water such as glaciers, snow, salt water, and fresh water to coexist. Therefore, China is an ideal study area for testing the accuracy and stability of our proposed surface water extraction method.

2.2. Data

The data sources used for surface water extraction in China mainly include Landsat 7 ETM+ images, Landsat 8 OLI images, and Sentinel-1 SAR images, and auxiliary data mainly include SRTM 30 m resolution digital elevation model (DEM), artificial impervious surface data from 2015 (https://doi.org/10.5281/zenodo.3505079, accessed on 3 April 2022), and the Global Land Ice Measurements from Space (GLIMS) Glacier Database. Additionally, the JRC GSW data set is used to collect training and accuracy assessment samples.

All top-of-atmosphere (TOA) and surface reflectance (SR) images of Landsat 7 and Landsat 8, and all Sentinel-1 Ground Range Detected (GRD) images, acquired over China from January 2019 to June 2020 are used directly through the GEE platform. Because the reflectance of some deep water in the SR images is negative or greater than one, these invalid data reduce the accuracy of surface water extraction; this issue is discussed in Section 4.5. The TOA images are therefore used as the main data, and the QA band of the SR images is used as auxiliary data to remove clouds, shadows, and snow from the TOA images. In GEE, all Sentinel-1 GRD images are preprocessed with the Sentinel-1 Toolbox, including thermal noise removal, radiometric calibration, and terrain correction. Finally, each pixel value in the GRD images is converted into a backscatter coefficient in decibels (dB).
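The decibel scaling mentioned above is the usual logarithmic transform of linear backscatter. A minimal sketch in plain Python follows, assuming the input is a positive linear-power backscatter value; the function name is illustrative and is not part of GEE or the Sentinel-1 Toolbox.

```python
import math

def to_decibels(sigma0_linear):
    """Convert a linear backscatter coefficient to decibels (10 * log10).

    Assumption: the input is a strictly positive linear-power value.
    """
    if sigma0_linear <= 0:
        raise ValueError("backscatter must be positive to take a logarithm")
    return 10.0 * math.log10(sigma0_linear)

# Low backscatter (for example, smooth open water) maps to strongly negative dB.
example_db = to_decibels(0.01)
```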

The 1 arcsec (approximately 30 m) resolution Shuttle Radar Topography Mission (SRTM) digital elevation model, produced by the NASA Jet Propulsion Laboratory (JPL), is often used to generate slope data as auxiliary data for water body recognition; a fixed slope threshold is chosen to eliminate the interference of non-water ground objects such as terrain shadows [15,24]. In this study, the slope is used to filter out non-water regions.

For the other auxiliary data, the GLIMS glacier dataset is used to mask glacier pixels in the images, and the 30 m artificial impervious surface data are used to eliminate the interference of built-up areas. Two years of the JRC GSW permanent surface water data set are used to obtain sample data for model training and for the accuracy assessment of the final results.

3. Methodology

Dynamic surface water extraction was achieved using a regional adaptive random forest method that automatically obtains training sample points. The whole process consists of four main steps. First, the training sample points were automatically obtained based on the JRC GSW permanent water dataset. Secondly, monthly image compositing was performed by combining Landsat-7/8 optical and Sentinel-1 SAR images, and the DEM data were then merged. Thirdly, classification features were constructed, including spectral reflectance features, terrain features, radar backscattering features, texture features, and spectral indices, and in each 5° × 5° geographical grid cell a random forest classifier was trained on the sample points and these features to extract surface water. Finally, the accuracy of the water body extraction results was evaluated by comparing the surface water distribution data with the JRC GSW data of the same period and with water body boundaries obtained through interpretation of higher resolution images (Figure 2).

3.1. Sample Collection

Samples are one of the critical inputs of machine learning classification. The number, accuracy, and purity of the samples affect the final classification result [26]. Common sample acquisition methods include manual interpretation and automated acquisition. In this study, because the study area is large, the required sample size is also large, so the sample data set is constructed automatically. The JRC GSW dataset was used to extract water and non-water classification samples automatically. The main reasons for choosing this data set are as follows: (1) The spatial resolution of this data set is 30 m, which ensures that the selected samples have high purity and meet the needs of surface water extraction at a resolution of 30 m. (2) The dataset is generated by an expert knowledge analysis system, which eliminates pixels that easily interfere with water body extraction, such as terrain shadows, ice and snow, and building shadows, and therefore has high accuracy. (3) The temporal resolution of the dataset is one month, and the accuracy of the samples can be ensured by selecting water and non-water training samples from twenty-four months of data.

To further minimize the impact of misclassification in the water samples and to ensure high accuracy and purity, only areas that remained unchanged as water or non-water throughout the twenty-four months from January 2018 to December 2019 were used to collect water and non-water training samples. Furthermore, since the complexity of ground object characteristics varies significantly between regions, the entire study area was divided into 62 geographic grids of 5° × 5° (Figure 1) so that the collected training sample points better reflect regional features. The training sample points in each grid were collected separately. Because the number and proportion of training samples affect the accuracy of surface water extraction [27,28,29], a 1:3 ratio was adopted, with 1000 water and 3000 non-water training sample points obtained automatically in each grid. In total, 248,000 training sample points were collected in the 62 grids every month.
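The sampling rule described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: each pixel is represented by a list of monthly 0/1 labels with no missing months, and the names `stable_label` and `draw_training_points` are hypothetical rather than taken from the study's GEE implementation.

```python
import random

def stable_label(monthly_labels):
    """Return 1 (water) or 0 (non-water) if the pixel kept the same label
    in every month; return None for pixels that changed."""
    first = monthly_labels[0]
    return first if all(v == first for v in monthly_labels) else None

def draw_training_points(pixel_series, n_water=1000, n_nonwater=3000, seed=0):
    """Randomly draw water and non-water points from unchanged pixels only.

    pixel_series maps a pixel id to its list of monthly 0/1 labels.
    """
    water, nonwater = [], []
    for pixel_id, series in pixel_series.items():
        label = stable_label(series)
        if label == 1:
            water.append(pixel_id)
        elif label == 0:
            nonwater.append(pixel_id)
    rng = random.Random(seed)
    return (rng.sample(water, min(n_water, len(water))),
            rng.sample(nonwater, min(n_nonwater, len(nonwater))))

# Sample count stated in the text: 62 grids x (1000 water + 3000 non-water).
total_points = 62 * (1000 + 3000)
```

The last line reproduces the 248,000 points quoted in the text.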

3.2. Multi-Source Features Selection

The features used for surface water extraction include the spectrum and texture of the Landsat-7/8 TOA images, the backscatter coefficient and texture of the Sentinel-1 SAR GRD images, and the slope and aspect derived from the SRTM DEM data. The process of integrating these three types of data on the GEE platform can be summarized as follows. First, within the study area, interfering pixels in the TOA images, mainly clouds, cloud shadows, and snow, were removed using the QA band of the Landsat-7/8 SR data. Then, monthly median compositing was performed on the Landsat TOA images and monthly mean compositing on the Sentinel-1 SAR images. In addition, the 10 m Sentinel-1 images were resampled to 30 m resolution.
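As a rough per-pixel illustration of the compositing step, the sketch below reduces one month of observations with a median (optical) or a mean (SAR) while skipping masked values. It assumes that masked observations are represented by `None` and that the SAR mean is taken directly on the dB values; both are assumptions made for the example, and the sample values are invented.

```python
from statistics import mean, median

def composite(observations, reducer):
    """Reduce one pixel's observations for a month, skipping masked (None)
    values; return None when no valid observation remains."""
    valid = [v for v in observations if v is not None]
    return reducer(valid) if valid else None

# Hypothetical values: None marks a cloud, shadow, or snow pixel removed by the QA band.
toa_nir = [0.04, None, 0.05, 0.31]
vv_db = [-21.0, -19.0, -20.0]

nir_month = composite(toa_nir, median)   # median of the three valid values
vv_month = composite(vv_db, mean)        # arithmetic mean of the dB values
```

The median is less sensitive than the mean to a single residual bright observation, such as the 0.31 value here.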

The B1–B5 and B7 bands of Landsat 7 images and the B2–B7 bands of Landsat 8 images record the spectral reflectance of ground targets in the blue, green, red, near-infrared (NIR), shortwave infrared 1 (SWIR1), and shortwave infrared 2 (SWIR2) bands. These bands can distinguish various ground objects and are therefore well suited to surface water extraction. In addition, commonly used indices that enhance water information (NDWI, MNDWI), vegetation information (NDVI), and building information (NDBI) were added to the feature set for water extraction. The NIR band was also used to calculate local texture features of the image based on the gray-level co-occurrence matrix within a window of 7 × 7 pixels [27]. Three texture features, namely variance, dissimilarity, and entropy, were finally used as water extraction features.
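The four indices named above are all normalized differences of two bands. The sketch below uses their standard definitions from the literature, since the article does not print the formulas; the function names and the example reflectances are illustrative.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0

def spectral_indices(green, red, nir, swir1):
    """Standard index definitions (assumed, not quoted from the article)."""
    return {
        "NDWI": normalized_difference(green, nir),     # water vs. vegetation/soil
        "MNDWI": normalized_difference(green, swir1),  # water vs. built-up areas
        "NDVI": normalized_difference(nir, red),       # vegetation
        "NDBI": normalized_difference(swir1, nir),     # built-up areas
    }

# Hypothetical TOA reflectances for a clear-water pixel.
water_pixel = spectral_indices(green=0.10, red=0.08, nir=0.04, swir1=0.02)
```

For this water-like pixel, NDWI and MNDWI are positive while NDVI and NDBI are negative, which is the contrast the classifier can exploit.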

In Sentinel-1 SAR images, the backscatter coefficient of water bodies is usually lower than that of other ground objects, such as construction areas, farmland, and rocks, which makes the backscatter coefficients of the VV and VH bands ideal classification features. The monthly averages of the VV and VH bands were therefore calculated as two features for water classification. Because the texture information of SAR images can improve the accuracy of target object extraction [30], the textures of the Sentinel-1 SAR images were also calculated as features for the extraction.

Additionally, the terrain information of slope and aspect was introduced into the classification process, and the slope data were used to mask out areas with a slope greater than 10°.
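A minimal sketch of the slope mask follows. It assumes a small elevation grid in metres on square 30 m cells and a simple central-difference slope; GEE's own terrain routine may compute slope differently, and the function names are hypothetical.

```python
import math

CELL_SIZE_M = 30.0     # SRTM 1 arcsec, approximately 30 m
MAX_SLOPE_DEG = 10.0   # threshold stated in the text

def slope_degrees(dem, row, col, cell=CELL_SIZE_M):
    """Central-difference slope at an interior cell of a 2-D elevation grid."""
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2 * cell)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))

def keep_pixel(dem, row, col):
    """True when the pixel is gentle enough to remain a water candidate."""
    return slope_degrees(dem, row, col) <= MAX_SLOPE_DEG

flat = [[100.0] * 3 for _ in range(3)]
steep = [[0.0, 30.0, 60.0] for _ in range(3)]   # rises 30 m per 30 m cell
```

Here `keep_pixel(flat, 1, 1)` keeps the pixel, while the centre of `steep` has a slope of 45° and is masked out.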

3.3. Constructing a Regional Adaptive Random Forest Classification Model

Random forest (RF) is an ensemble learning algorithm based on decision trees, which has a fast training speed and is suitable for large-scale classification. Compared with other machine learning models, the RF model can better handle high-dimensional, multicollinear data with higher accuracy and computational efficiency [31]. When constructing a random forest classifier, two key parameters usually need to be determined. One is the number of classification trees (Ntree), and the other is the number of features considered at each split (Mtry). Because previous research has shown that the selection of these two parameters has almost no significant impact on the accuracy of the classification results [32], the default value of 500 was used for Ntree and the square root of the total number of features of the training samples was used for Mtry.

In order to eliminate the influence of large-scale topography and land cover difference on water extraction accuracy, the regional adaptive water extraction strategy was used. Based on the sample data of 62 grids in China, 62 regional classifiers were trained and run separately.
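The region-adaptive strategy amounts to grouping samples by grid cell and configuring one classifier per cell. The sketch below shows only that bookkeeping, not the classifier itself. It assumes cells aligned to multiples of 5° and an integer Mtry obtained by taking the floor of the square root; both conventions, and all names, are assumptions made for illustration.

```python
import math
from collections import defaultdict

CELL_DEG = 5    # grid size stated in the text
N_TREES = 500   # Ntree stated in the text

def grid_key(lon, lat, cell=CELL_DEG):
    """South-west corner (lon, lat) of the grid cell containing the point."""
    return (math.floor(lon / cell) * cell, math.floor(lat / cell) * cell)

def mtry(n_features):
    """Square root of the feature count, floored to an integer (assumed rounding)."""
    return max(1, math.isqrt(n_features))

def group_by_cell(samples):
    """samples: iterable of (lon, lat, features, label) tuples."""
    groups = defaultdict(list)
    for lon, lat, features, label in samples:
        groups[grid_key(lon, lat)].append((features, label))
    return groups

def classifier_configs(groups):
    """One parameter set per grid cell; every cell is trained separately."""
    configs = {}
    for cell, rows in groups.items():
        n_features = len(rows[0][0])
        configs[cell] = {"ntree": N_TREES, "mtry": mtry(n_features), "n_samples": len(rows)}
    return configs
```

For example, a sample at longitude 116.4 and latitude 39.9 falls in the cell whose south-west corner is (115, 35).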

3.4. Accuracy Evaluation

The monthly surface water data for 2019 are compared with the JRC GSW monthly data. The locations of data gaps caused by factors such as clouds and shadows vary from month to month. Therefore, when evaluating the accuracy, a separate set of validation sample points must be constructed every month to ensure that no validation sample point is located in a missing-data area. Moreover, because the strategies for cloud and shadow removal differ, the missing-data areas of this study are not entirely consistent with those of the JRC GSW dataset, so the collected validation sample points must also be located in areas where both datasets are available.

Meanwhile, the uniformity of the spatial distribution of the validation areas should also be considered. Six 5° × 5° validation regions, located in the Northeast (NE), North (NC), West (WC), Northwest (NW), South (SC), and East (EC) of China, were selected (Figure 3). When a validation region has no data in a given month, a neighboring region is used instead. In each validation region, 2000 validation sample points were collected each month, including 1000 water samples and 1000 non-water samples. The validation samples do not overlap with the training samples. In total, 144,000 validation sample points were collected for 2019.

Six indicators were used to evaluate the accuracy of the surface water extraction results: producer’s accuracy of water (PAW), user’s accuracy of water (UAW), producer’s accuracy of non-water (PANW), user’s accuracy of non-water (UANW), overall accuracy (OA), and the Kappa coefficient.
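These six indicators all follow from a two-class confusion matrix. The sketch below uses the standard definitions; the counts in the example are hypothetical values for one balanced set of 1000 water and 1000 non-water points and are not taken from the study's tables.

```python
def accuracy_metrics(tp, fn, fp, tn):
    """Two-class accuracy metrics with water as the positive class.

    tp: reference water mapped as water      fn: reference water mapped as non-water
    fp: reference non-water mapped as water  tn: reference non-water mapped as non-water
    """
    n = tp + fn + fp + tn
    observed = (tp + tn) / n
    # Chance agreement from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return {
        "PAW": tp / (tp + fn),
        "UAW": tp / (tp + fp),
        "PANW": tn / (tn + fp),
        "UANW": tn / (tn + fn),
        "OA": observed,
        "Kappa": (observed - expected) / (1 - expected),
    }

# Hypothetical counts for one month in one validation region.
m = accuracy_metrics(tp=933, fn=67, fp=2, tn=998)
```

With equal numbers of water and non-water reference points, chance agreement is 0.5, so Kappa reduces to 2 × OA − 1.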

Meanwhile, nine large, medium, and small lakes located in different geographical zones were used to evaluate the performance of the proposed method for individual water bodies (Figure 3). The lake areas in different months from January 2019 to June 2020 were obtained by visual interpretation of 10 m resolution Sentinel-2 images. The lake areas in the same month, and the distributions of the lake area series, were then compared quantitatively and qualitatively.

4. Results and Discussion

4.1. Visual Comparison of Water Extraction Results

The comparison of the extraction results of river and lake boundaries with images under different ground cover backgrounds can intuitively reflect the adaptability of our proposed surface water extraction method. Figure 4 indicates that the extracted water boundary is in good agreement with the original image under various complex land cover conditions, such as large rivers in plain areas, lakes in desert areas, large and small lakes in plateau alpine areas, and rivers in urban areas.

4.2. Comparison with JRC GSW

The comparison between the results of this study and the JRC GSW data for 2019 shows that the two surface water datasets agree very closely on all accuracy assessment indicators. Averaged over the six verification areas and twelve months, PAW is 0.933, UAW is 0.998, PANW is 0.998, UANW is 0.938, OA is 0.966, and the Kappa coefficient is 0.931 (Table 1).

From the perspective of PAW, the maximum of the average PAW of all verification areas over the 12 months is 0.979, and the minimum is 0.916. This means that, relative to the JRC GSW data, the probability of water pixels being misclassified as non-water in this study is 2.1–8.4%. The maximum and minimum average UAW are 0.999 and 0.998, indicating a much lower probability (0.1–0.2%) of misclassifying non-water as water. Similarly, the maximum and minimum average PANW over the twelve months are 0.999 and 0.998, and the maximum and minimum average UANW are 0.980 and 0.923. The maximum and minimum average OA over the twelve months are 0.989 and 0.957, respectively, and the maximum and minimum Kappa coefficients are 0.980 and 0.913, respectively. Both indicators demonstrate the very high overall accuracy of this study. From the perspective of spatial distribution, the accuracy in the north is slightly worse than in the south. The overall accuracy and Kappa coefficient of the NE and NW verification areas are the lowest: the overall accuracy is 0.957 and 0.962, and the Kappa coefficient is 0.913 and 0.924, respectively. The EC and SC verification areas achieve the highest accuracy, with overall accuracies of 0.989 and 0.988 and Kappa coefficients of 0.977 and 0.980, respectively. From the perspective of time distribution, the accuracy in winter is slightly worse than in other seasons, which may be due to the influence of winter ice and snow on surface water extraction, especially in northern China.
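The error ranges quoted above are the complements of the reported accuracies. A short check in Python, using only the values given in the text (the helper name is illustrative):

```python
def error_percent(accuracy):
    """Complement of an accuracy value, expressed in percent with one decimal."""
    return round((1 - accuracy) * 100, 1)

# Omission error of water from the maximum and minimum average PAW.
omission = [error_percent(a) for a in (0.979, 0.916)]
# Commission error of water from the maximum and minimum average UAW.
commission = [error_percent(a) for a in (0.999, 0.998)]
```

This gives 2.1% and 8.4% for omission and 0.1% and 0.2% for commission, matching the ranges stated in the paragraph.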

4.3. Comparison with the Different Size of Sample Lakes

The lakes used for validation are divided into three categories: large, medium, and small. Large lakes include Zhaling Lake, Hongze Lake, and Zhari Namco Lake, with areas greater than 500 km². Medium-sized lakes include Changhu Lake, Dalinuoer Lake, and Fuxian Lake, with areas of 100–500 km². Small lakes include Daxi Reservoir, Nanshui Reservoir, and Fengjiashan Reservoir, with areas of less than 100 km² (Table 2). These nine lakes are located in different geographical regions and represent large, medium, and small water bodies. They are used to assess the robustness of the method in extracting different types of water bodies. Due to the influence of clouds, cloud shadows, and similar factors, complete lake images are available for only some of the months. When data are missing on the lake boundary, the data for that month are discarded. When data are missing inside the lake, the gaps are filled manually.

The comparison of lake areas obtained from remote sensing images at the two resolutions shows that for the three large lakes, Zhaling Lake, Hongze Lake, and Zhari Namco Lake, the maximum area differences are 9.128 km², 163.763 km², and 3.181 km², respectively, with average errors of 0.568%, 9.9%, and 0.064%, respectively. The areas of Zhaling Lake and Zhari Namco Lake are highly consistent. Hongze Lake has a larger area difference, mainly because of the large amount of agricultural land around the lake. The farmland plots are usually small, and in the water extraction result from the 30 m resolution Landsat images, the water in the farmland and the lake water are connected, which increases the extracted lake area. In contrast, in the 10 m resolution Sentinel-2 images, farmland water and lake water are easily separated. For the three medium-sized lakes, Changhu Lake, Dalinuoer Lake, and Fuxian Lake, the maximum area differences are 5.698 km², 1.659 km², and 0.994 km², respectively, with average errors of 2.125%, 0.589%, and 0.298%, respectively. The water areas extracted from the images at the two resolutions are highly consistent for these three lakes. For the three small lakes, Daxi Reservoir, Nanshui Reservoir, and Fengjiashan Reservoir, the maximum area differences are 0.601 km², 1.641 km², and 0.425 km², respectively, with average errors of 2.8%, 3.299%, and 1.589%, respectively. The areas of the three small lakes are also highly consistent. In general (Figure 5), for these nine large, medium, and small lakes, the difference between the two area datasets always remains within a small range. This shows that the large-scale surface water extraction method in this study achieves high accuracy both for the overall results and for individual water bodies of different sizes.
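The article does not spell out how the "maximum area difference" and "average error" are computed. One plausible reading, sketched below, takes the largest absolute monthly difference and the mean absolute difference relative to the Sentinel-2 reference area. This definition, the function name, and the sample values are assumptions for illustration.

```python
def area_agreement(mapped_km2, reference_km2):
    """Return (maximum absolute difference in km², mean relative error in %).

    mapped_km2: monthly lake areas from the 30 m water maps
    reference_km2: areas for the same months from the 10 m interpretation
    """
    diffs = [abs(m - r) for m, r in zip(mapped_km2, reference_km2)]
    relative = [d / r * 100 for d, r in zip(diffs, reference_km2)]
    return max(diffs), sum(relative) / len(relative)

# Hypothetical three-month series for one lake.
max_diff, mean_error = area_agreement([101.0, 99.0, 100.0], [100.0, 100.0, 100.0])
```

Under this reading, months discarded because of missing boundary data would simply be left out of both lists.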

4.4. Comparison of Surface Water Products

Poyang Lake is the largest freshwater lake in China, and we compared the water body details portrayed by several surface water mapping products using the Poyang Lake basin as an example (Figure 6). Because of missing data caused by cloud and other factors, only the relatively complete monthly surface water distribution of the Poyang Lake basin for March 2020 was used. The surface water distributions of the Poyang Lake basin in March 2020 extracted from this study and from JRC GSW, both with 30 m resolution, are shown in Figure 6a,b, respectively. In addition, Figure 6c,d are both from land cover classification products for 2020: c is from the ESA WorldCover product, the official global land cover map published by ESA with 10 m resolution (https://esa-worldcover.org/en accessed on 3 April 2022), and d is from the GlobeLand30 product, with a spatial resolution of 30 m, developed by the Chinese Ministry of Natural Resources and the first global geographic information public product provided by China to the United Nations (http://www.globallandcover.com/ accessed on 3 April 2022). Comparing the maps in Figure 6, maps a and b portray the most similar and most detailed surface water distributions, because both depict the water distribution in the Poyang Lake basin in March 2020, which includes seasonal water bodies, whereas maps c and d are annual water distribution maps that exclude seasonal water bodies. In terms of the Poyang Lake area shown by each map, a, b, and c are close to each other, while the water area depicted in d is obviously larger, probably because these data reflect the largest lake area in the year. This local comparison of surface water products shows that the surface water extraction method proposed in this study not only extracts the surface water distribution accurately, but also retains rich detail of water bodies.

4.5. Difference between TOA Image and SR Image in Surface Water Extraction

The TOA image reflects the top of the atmosphere’s spectral reflectance characteristics, while the SR image reflects the spectral reflectance characteristics of the earth’s surface. Additionally, the SR image is generated from the TOA image after being atmospherically corrected. Generally, when the classification of ground objects or the spectral index calculation is required, it is more appropriate to select SR images to reflect the ground surface’s actual spectral reflectance characteristics. However, when there are objects with strong absorption or high reflection on the ground surface (such as deep water), the SR image may generate reflectance values that are negative or greater than 1, resulting in invalid data. There are many deep lakes in the Qinghai-Tibet Plateau in the study area of this paper, and these lakes are often missing data in SR images. As shown in Figure 7, the water bodies extracted from TOA images can well reflect the current distribution of surface water in remote sensing images.

In contrast, the water bodies extracted from SR images have a significant amount of missing data. For example, there are many gaps in the eastern part of Siling Co Lake extracted from SR images. Similar data gaps occur in the west and south of Xuru Co, almost the entire lake of Angzi Co, the south of Nam Co, and the east of Dagze Co. In this study, TOA images of the Landsat 7/8 satellites were therefore used to ensure the high accuracy of surface water extraction from the data source perspective.
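The invalid-data condition described in this section (SR reflectance that is negative or greater than 1) can be expressed as a simple validity test. The sketch below is an illustration only: the band names and pixel values are hypothetical, and reflectance is assumed to be already scaled to the 0 to 1 range.

```python
def is_valid_reflectance(value):
    """Reflectance is physically meaningful only within [0, 1]."""
    return value is not None and 0.0 <= value <= 1.0

def pixel_is_usable(pixel):
    """A pixel is usable only if every band holds a valid reflectance."""
    return all(is_valid_reflectance(v) for v in pixel.values())

# Hypothetical deep-water pixel: atmospheric correction pushed NIR below zero in SR.
sr_pixel = {"green": 0.021, "nir": -0.004, "swir1": 0.001}
toa_pixel = {"green": 0.062, "nir": 0.018, "swir1": 0.006}
```

In this example the SR pixel would be discarded as invalid while the TOA pixel remains usable, which mirrors the gaps seen over the deep plateau lakes.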

4.6. Advantages and Potential Applications of Our Proposed Method

Our proposed method combines surface water features from optical and SAR remote sensing data and employs a relatively simple random forest approach to extract water bodies. Compared with similar studies, it has the advantages of high spatiotemporal resolution and high efficiency. For example, Lu et al. [9] and Han et al. [24] extracted surface water from MODIS images, whose coarse resolution limited the spatial accuracy of the resulting surface water distribution maps. In addition, the method of Lu et al. [9] requires pre-production and screening of water body masks, which takes considerable time and manpower, and the method proposed by Han et al. [24] requires extensive post-processing of the water extraction results to ensure accuracy. Pekel et al. [3] used the Landsat series of satellite data to produce a surface water distribution map with a resolution of 30 m, achieving high resolution and accuracy, but the method was complicated, its computational efficiency was low, and it was difficult for others to reproduce. The relatively simple method proposed here still meets the requirement of high precision, which is of great significance for practical water resources management. Other researchers and policy makers involved in water resources management can easily use our proposed method to obtain monthly surface water distribution maps, providing basic data for related water resources management work. For example, the surface water distribution data of China mapped with the proposed method were further processed to obtain the area changes of natural and artificial water bodies in China over the last 20 years, and the results were selected for the exhibition on big earth data facilitating the monitoring and evaluation of SDGs. Moreover, the large-scale remote sensing analysis approach adopted in this article can serve as a reference for other remote sensing research, such as large-scale land cover classification or the extraction of other types of ground targets.
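As a rough illustration of what combining optical and SAR features can look like for a single pixel, the sketch below builds a small feature vector. The choice of NDWI [11], MNDWI [12], and Sentinel-1 VV backscatter is an assumption made for illustration; the feature set actually used is the one defined in Section 3.2, and the function names and sample values here are hypothetical.

```python
from typing import Dict


def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def water_features(green: float, nir: float, swir1: float,
                   vv_db: float) -> Dict[str, float]:
    """Stack two optical water indices with one SAR backscatter value."""
    return {
        "NDWI": normalized_difference(green, nir),     # McFeeters [11]
        "MNDWI": normalized_difference(green, swir1),  # Xu [12]
        "VV": vv_db,                                   # SAR backscatter in dB
    }


# Hypothetical open-water pixel: low NIR/SWIR reflectance, low backscatter.
features = water_features(green=0.08, nir=0.02, swir1=0.01, vv_db=-22.0)
print(features)
```

A vector of this kind, computed for every pixel, is the sort of input a random forest classifier would be trained on.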

5. Conclusions

In this study, a regional adaptive random forest classifier for large-scale surface water mapping was proposed on the GEE cloud computing platform. Because it uses multi-sensor data from both optical and synthetic aperture radar sensors, our method comprehensively considers the spectral reflectance characteristics, dielectric characteristics, and texture characteristics of surface water. In the sample collection step, JRC GSW data were used to obtain the training sample set automatically. Then, a 5° × 5° regional adaptive random forest classifier was constructed to ensure adaptability to complex surface cover, thereby ensuring the accuracy of surface water mapping. The monthly surface water data extracted in this study were compared with the JRC GSW data at a total of 144,000 verification sample points. Averaged over the six verification areas and twelve months, the PAW is 0.933, the UAW is 0.998, the PANW is 0.998, the UANW is 0.938, the OA is 0.966, and the kappa coefficient is 0.931. Additionally, taking lakes with different land cover and different sizes as examples, the lake areas we extracted were compared with the lake areas obtained from higher-resolution images in the same month. The large lakes Zhaling Lake, Hongze Lake, and Zhari Namco Lake have average errors of 0.568%, 9.9%, and 0.064%, respectively. The medium lakes Changhu Lake, Dalinuoer Lake, and Fuxian Lake have average errors of 2.125%, 0.589%, and 0.298%, respectively. The average errors for the small lakes Daxi Reservoir, Nanshui Reservoir, and Fengjiashan Reservoir are 2.8%, 3.299%, and 1.589%, respectively. These accuracy evaluation results demonstrate the advantages of this method for large-scale surface water extraction at high spatio-temporal resolution through remote sensing.
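The 5° × 5° partitioning can be pictured with a short sketch that maps a coordinate to the tile whose classifier would handle it. It assumes a grid aligned to 0° longitude and 0° latitude, which is an illustrative choice rather than the authors' documented layout; the function names are hypothetical and the Poyang Lake coordinates are approximate.

```python
import math
from typing import Tuple


def tile_index(lon: float, lat: float, size: float = 5.0) -> Tuple[int, int]:
    """Return the (column, row) of the size-degree tile containing a point."""
    return (math.floor(lon / size), math.floor(lat / size))


def tile_bounds(col: int, row: int,
                size: float = 5.0) -> Tuple[float, float, float, float]:
    """Return (west, south, east, north) of a tile in degrees."""
    west, south = col * size, row * size
    return (west, south, west + size, south + size)


# Approximate location of Poyang Lake (about 116.3 E, 29.1 N).
col, row = tile_index(116.3, 29.1)
print((col, row), tile_bounds(col, row))
```

Training one classifier per tile keeps each model's training samples local, which is the idea behind the regional adaptivity described above.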

Although our proposed method can accomplish high-precision surface water mapping over a large area, it still has some limitations. A large amount of third-party thematic product data, such as the JRC GSW data set, the impervious surface data set, and the land glacier data, was used in the steps of sample collection and removal of interfering pixels. The errors in these data products will inevitably affect the accuracy of surface water mapping. Therefore, future improvements will focus on refining the sample selection process iteratively, using the results extracted by this method. At the same time, the processes for eliminating interference from ice, snow, and buildings will be optimized.

Author Contributions

S.L. and H.T. conceived and designed the method; M.H.A.B. and H.T. performed the experiments; M.L. wrote the program in the experiments; C.F. and Y.W. analyzed the data; H.T. and S.L. wrote the original draft; S.L. and M.H.A.B. reviewed and edited the paper. If you need the algorithm code and time series data set in this paper, please contact the corresponding author. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (42171283), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19090120, XDA19030104), the National Key Research and Development Program of China (2017YFC0405802, 2016YFC0503507-03), the Second Tibetan Plateau Scientific Expedition and Research Program (STEP) (2019QZKK0202), and the Key Program of the National Natural Science Foundation of China (91637209).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data sources used in this study came from two parts: the GEE platform and a free-access website. Specifically, the Landsat 7 ETM+ images, Landsat 8 OLI images, Sentinel-1 SAR images, SRTM 30 m resolution digital elevation model (DEM), and JRC GSW data set were available at the GEE platform. The artificial impervious surface data of 2015 was downloaded from a free-access website (https://doi.org/10.5281/zenodo.3505079, accessed on 3 April 2022).

Acknowledgments

We appreciate Google Earth Engine for providing a computing platform. We want to thank the anonymous reviewers and the editor for their constructive comments and suggestions for this paper.

Conflicts of Interest

No potential conflict of interest was reported by the authors.

References

1. Huang, C.; Chen, Y.; Zhang, S.; Wu, J. Detecting, Extracting, and Monitoring Surface Water from Space Using Optical Sensors: A Review. Rev. Geophys. 2018, 56, 333–360.
2. Vörösmarty, C.J.; McIntyre, P.B.; Gessner, M.O.; Dudgeon, D.; Prusevich, A.; Green, P.; Glidden, S.; Bunn, S.E.; Sullivan, C.A.; Liermann, C.R. Global threats to human water security and river biodiversity. Nature 2010, 467, 555–561.
3. Pekel, J.-F.; Cottam, A.; Gorelick, N.; Belward, A.S. High-resolution mapping of global surface water and its long-term changes. Nature 2016, 540, 418–422.
4. Voigt, S.; Kemper, T.; Riedlinger, T.; Kiefl, R.; Scholte, K.; Mehl, H. Satellite image analysis for disaster and crisis-management support. IEEE Trans. Geosci. Remote Sens. 2007, 45, 1520–1528.
5. Brisco, B. Mapping and monitoring surface water and wetlands with synthetic aperture radar. Remote Sens. Wetl. Appl. Adv. 2015, 119–136.
6. Yamazaki, D.; Trigg, M.A.; Ikeshima, D. Development of a global ~90 m water body map using multi-temporal Landsat images. Remote Sens. Environ. 2015, 171, 337–351.
7. Marcus, W.A.; Fonstad, M.A. Optical remote mapping of rivers at sub-meter resolutions and watershed extents. Earth Surf. Process. Landf. Group 2008, 33, 4–24.
8. Klein, I.; Dietz, A.J.; Gessner, U.; Galayeva, A.; Myrzakhmetov, A.; Kuenzer, C. Evaluation of seasonal water body extents in Central Asia over the past 27 years derived from medium-resolution remote sensing data. Int. J. Appl. Earth Obs. Geoinf. 2014, 26, 335–349.
9. Lu, S.; Ma, J.; Ma, X.; Tang, H.; Zhao, H.; Hasan Ali Baig, M. Time series of the Inland Surface Water Dataset in China (ISWDC) for 2000–2016 derived from MODIS archives. Earth Syst. Sci. Data 2019, 11, 1099–1108.
10. Zhou, Y.; Dong, J.; Xiao, X.; Xiao, T.; Yang, Z.; Zhao, G.; Zou, Z.; Qin, Y. Open surface water mapping algorithms: A comparison of water-related spectral indices and sensors. Water 2017, 9, 256.
11. McFeeters, S.K. The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. Int. J. Remote Sens. 1996, 17, 1425–1432.
12. Xu, H. Modification of normalised difference water index (NDWI) to enhance open water features in remotely sensed imagery. Int. J. Remote Sens. 2006, 27, 3025–3033.
13. Zou, Z.; Xiao, X.; Dong, J.; Qin, Y.; Doughty, R.B.; Menarguez, M.A.; Zhang, G.; Wang, J. Divergent trends of open-surface water body area in the contiguous United States from 1984 to 2016. Proc. Natl. Acad. Sci. USA 2018, 115, 3810–3815.
14. Yang, X.; Qin, Q.; Yésou, H.; Ledauphin, T.; Koehl, M.; Grussenmeyer, P.; Zhu, Z. Monthly estimation of the surface water extent in France at a 10-m resolution using Sentinel-2 data. Remote Sens. Environ. 2020, 244, 111803.
15. Shanlong, L.; Gaohuai, X.; Li, J.; Wei, Z.; Haijing, L. Extraction of the spatial-temporal lake water surface dataset in the Tibetan Plateau over the past 10 years. Remote Sens. Land Resour. 2016, 28, 181–187.
16. Du, J.-K.; Huang, Y.-S.; Feng, X.-Z.; Wang, Z.-l. Study on water bodies extraction and classification from SPOT image. J. Remote Sens. 2001, 5, 219–225.
17. Feng, W.; Huiran, J. Mapping Surface Water Extent in Mainland Alaska Using VIIRS Surface Reflectance. 2021 IEEE Int. Geosci. Remote Sens. Symp. IGARSS 2021, 6120–6123.
18. Li, K.W.; Xu, E.Q. High-accuracy continuous mapping of surface water dynamics using automatic update of training samples and temporal consistency modification based on Google Earth Engine: A case study from Huizhou, China. ISPRS J. Photogramm. Remote Sens. 2021, 179, 66–80.
19. Duan, Y.M.; Zhang, W.Y.; Huang, P.; He, G.J.; Guo, H.X. A New Lightweight Convolutional Neural Network for Multi-Scale Land Surface Water Extraction from GaoFen-1D Satellite Images. Remote Sens. 2021, 13, 4576.
20. Shao, Z.; Fu, H.; Fu, P.; Yin, L. Mapping urban impervious surface by fusing optical and SAR data at the decision level. Remote Sens. 2016, 8, 945.
21. Shao, Z.; Wu, W.; Guo, S. IHS-GTF: A fusion method for optical and synthetic aperture radar data. Remote Sens. 2020, 12, 2796.
22. Li, Y.; Niu, Z.; Xu, Z.; Yan, X. Construction of high spatial-temporal water body dataset in China based on Sentinel-1 archives and GEE. Remote Sens. 2020, 12, 2413.
23. Wang, R.; Xia, H.; Qin, Y.; Niu, W.; Pan, L.; Li, R.; Zhao, X.; Bian, X.; Fu, P. Dynamic monitoring of surface water area during 1989–2019 in the Hetao Plain using Landsat data in Google Earth Engine. Water 2020, 12, 3010.
24. Han, Q.; Niu, Z. Construction of the long-term global surface water extent dataset based on water-NDVI spatio-temporal parameter set. Remote Sens. 2020, 12, 2675.
25. Parente, L.; Taquary, E.; Silva, A.P.; Souza, C.; Ferreira, L. Next generation mapping: Combining deep learning, cloud computing, and big remote sensing data. Remote Sens. 2019, 11, 2881.
26. Thanh Noi, P.; Kappas, M. Comparison of random forest, k-nearest neighbor, and support vector machine classifiers for land cover classification using Sentinel-2 imagery. Sensors 2018, 18, 18.
27. Zhang, X.; Liu, L.; Wu, C.; Chen, X.; Gao, Y.; Xie, S.; Zhang, B. Development of a global 30 m impervious surface map using multisource and multitemporal remote sensing datasets with the Google Earth Engine platform. Earth Syst. Sci. Data 2020, 12, 1625–1648.
28. Zhu, Z.; Gallant, A.L.; Woodcock, C.E.; Pengra, B.; Olofsson, P.; Loveland, T.R.; Jin, S.; Dahal, D.; Yang, L.; Auch, R.F. Optimizing selection of training and auxiliary data for operational land cover classification for the LCMAP initiative. ISPRS J. Photogramm. Remote Sens. 2016, 122, 206–221.
29. Li, C.; Wang, J.; Wang, L.; Hu, L.; Gong, P. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat thematic mapper imagery. Remote Sens. 2014, 6, 964–983.
30. Zhang, Y.; Zhang, H.; Lin, H. Improving the impervious surface estimation with combined use of optical and SAR remote sensing images. Remote Sens. Environ. 2014, 141, 155–167.
31. Zhang, X.; Liu, L.; Chen, X.; Xie, S.; Gao, Y. Fine land-cover mapping in China using Landsat datacube and an operational SPECLib-based approach. Remote Sens. 2019, 11, 1056.
32. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.

Figure 1. Spatial distribution of landform types in China. The data were obtained from the Resource and Environmental Science and Data Center (http://www.resdc.cn/Default.aspx, accessed on 3 April 2022).

Figure 2. Workflow of large-scale surface water extraction based on GEE. The light-yellow boxes indicate the raw input data, the orange boxes indicate the third-party auxiliary data, and the blue boxes indicate the processing done to the data.

Figure 3. The selected validation regions for accuracy assessment.

Figure 4. The visual comparison between extracted results and the original remote sensing images under different ground cover backgrounds.

Figure 5. Comparison of lake areas at two spatial resolutions for large lakes (a–c), medium lakes (d–f), and small lakes (g–i).

Figure 6. The comparison of surface water portrayals of the Poyang Lake basin from this study (a), JRC GSW (b), ESA WorldCover (c), and GlobeLand30 (d) products.

Figure 7. Comparison of water extraction results from TOA image and SR image (a1–a5 are Landsat 7/8 images, b1–b5 are water extraction results from TOA images, and c1–c5 are water extraction results from SR images).

Table 1. Accuracy evaluation results between this study and JRC GSW data in each month of 2019.

Region	Metric	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec
NE	PAW	0.877	0.982	0.949	0.902	0.840	0.891	0.996	0.927	0.873	0.912	0.905	0.932
NE	UAW	0.999	0.997	0.999	0.999	0.998	0.999	0.999	0.997	0.998	0.997	0.998	0.993
NE	PANW	0.999	0.997	0.999	0.999	0.998	0.999	0.999	0.997	0.998	0.997	0.998	0.993
NE	UANW	0.890	0.982	0.951	0.911	0.862	0.902	0.996	0.932	0.887	0.919	0.913	0.936
NE	OA	0.938	0.990	0.974	0.951	0.919	0.945	0.998	0.962	0.936	0.955	0.952	0.963
NE	Kappa	0.876	0.979	0.948	0.901	0.838	0.890	0.995	0.924	0.871	0.909	0.903	0.925
NC	PAW	0.882	0.932	0.882	0.937	0.956	0.956	0.962	0.969	0.953	0.968	0.982	0.922
NC	UAW	0.999	0.999	0.998	1.000	0.999	0.998	0.999	0.998	0.999	0.998	1.000	0.999
NC	PANW	0.999	0.999	0.998	1.000	0.999	0.998	0.999	0.998	0.999	0.998	1.000	0.999
NC	UANW	0.894	0.936	0.894	0.941	0.958	0.958	0.963	0.970	0.955	0.969	0.982	0.928
NC	OA	0.941	0.966	0.940	0.969	0.978	0.977	0.980	0.984	0.976	0.983	0.991	0.961
NC	Kappa	0.881	0.931	0.880	0.937	0.955	0.954	0.961	0.967	0.952	0.966	0.982	0.921
WC	PAW	0.894	0.927	0.938	0.977	0.982	0.983	0.989	0.983	0.983	0.979	0.986	0.909
WC	UAW	1.000	1.000	0.999	0.998	0.997	1.000	1.000	1.000	0.999	1.000	0.997	0.999
WC	PANW	1.000	1.000	0.999	0.998	0.997	1.000	1.000	1.000	0.999	1.000	0.997	0.999
WC	UANW	0.904	0.932	0.942	0.977	0.982	0.983	0.989	0.983	0.983	0.979	0.986	0.917
WC	OA	0.947	0.964	0.969	0.988	0.990	0.992	0.995	0.992	0.991	0.990	0.992	0.954
WC	Kappa	0.894	0.927	0.937	0.975	0.979	0.983	0.989	0.983	0.982	0.979	0.983	0.908
NW	PAW	0.959	0.965	0.816	0.905	0.886	0.933	0.958	0.954	0.890	0.915	0.983	0.943
NW	UAW	0.999	1.000	0.996	0.999	0.999	0.999	0.998	0.998	0.999	0.998	0.998	0.998
NW	PANW	0.999	1.000	0.997	0.999	0.999	0.999	0.998	0.998	0.999	0.998	0.998	0.998
NW	UANW	0.961	0.966	0.844	0.913	0.898	0.937	0.960	0.956	0.901	0.922	0.983	0.946
NW	OA	0.979	0.983	0.907	0.952	0.943	0.966	0.978	0.976	0.945	0.957	0.991	0.971
NW	Kappa	0.958	0.965	0.813	0.904	0.885	0.932	0.956	0.952	0.889	0.913	0.981	0.941
EC	PAW	0.972	0.984	0.957	0.987	0.994	0.986	0.987	0.978	0.947	0.993	0.976	0.987
EC	UAW	0.998	0.999	0.997	0.999	0.997	0.999	0.996	0.999	0.997	0.998	0.997	0.997
EC	PANW	0.998	0.999	0.997	0.999	0.997	0.999	0.996	0.999	0.997	0.998	0.997	0.997
EC	UANW	0.973	0.984	0.959	0.987	0.994	0.986	0.987	0.978	0.950	0.993	0.976	0.987
EC	OA	0.985	0.992	0.977	0.993	0.996	0.993	0.992	0.989	0.972	0.996	0.987	0.992
EC	Kappa	0.970	0.983	0.954	0.986	0.991	0.985	0.983	0.977	0.944	0.991	0.973	0.984
SC	PAW	0.981	0.936	0.984	0.985	0.995	0.996	0.991	0.987	0.980	0.994	0.988	0.905
SC	UAW	0.998	0.997	0.998	1.000	0.999	0.997	0.998	0.996	0.998	0.999	0.999	0.999
SC	PANW	0.998	0.997	0.998	1.000	0.999	0.997	0.998	0.996	0.998	0.999	0.999	0.999
SC	UANW	0.981	0.940	0.984	0.985	0.995	0.996	0.991	0.987	0.980	0.994	0.988	0.913
SC	OA	0.990	0.967	0.991	0.993	0.997	0.997	0.995	0.992	0.989	0.997	0.994	0.952
SC	Kappa	0.979	0.993	0.982	0.985	0.994	0.993	0.989	0.983	0.978	0.993	0.987	0.904
Mean	PAW	0.928	0.954	0.921	0.949	0.942	0.958	0.981	0.966	0.938	0.960	0.970	0.933
Mean	UAW	0.999	0.999	0.998	0.999	0.998	0.999	0.998	0.998	0.998	0.998	0.998	0.998
Mean	PANW	0.999	0.999	0.998	0.999	0.998	0.999	0.998	0.998	0.998	0.998	0.998	0.998
Mean	UANW	0.934	0.957	0.929	0.952	0.948	0.960	0.981	0.968	0.943	0.963	0.971	0.938
Mean	OA	0.963	0.977	0.960	0.974	0.971	0.978	0.990	0.983	0.968	0.980	0.985	0.966
Mean	Kappa	0.926	0.963	0.919	0.948	0.940	0.956	0.979	0.964	0.936	0.959	0.968	0.931
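The six metrics in Table 1 all follow from a two-class confusion matrix, which the sketch below makes explicit. The counts are hypothetical: they assume 1000 water and 1000 non-water validation points for one region and month (144,000 points over six regions and twelve months gives 2000 per region-month, but the even split is an assumption), and they were chosen so that the rounded results agree with the NE region's January column.

```python
from typing import Dict


def accuracy_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Compute producer's/user's accuracy, OA, and kappa for water mapping.

    tp: reference water mapped as water      fn: reference water mapped as non-water
    fp: reference non-water mapped as water  tn: reference non-water mapped as non-water
    """
    n = tp + fn + fp + tn
    oa = (tp + tn) / n
    # Chance agreement from the row and column totals of the confusion matrix.
    pe = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    return {
        "PAW": tp / (tp + fn),    # producer's accuracy, water
        "UAW": tp / (tp + fp),    # user's accuracy, water
        "PANW": tn / (tn + fp),   # producer's accuracy, non-water
        "UANW": tn / (tn + fn),   # user's accuracy, non-water
        "OA": oa,
        "Kappa": (oa - pe) / (1 - pe),
    }


# Hypothetical counts consistent with the NE, January entries of Table 1.
metrics = accuracy_metrics(tp=877, fn=123, fp=1, tn=999)
print({name: round(value, 3) for name, value in metrics.items()})
```

With balanced classes the chance agreement is 0.5, so kappa reduces to 2 × OA − 1, which is why the kappa values in the table track OA so closely.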

Table 2. Lake area detail comparison for accuracy verification.

Name	Water Types	Area	Number of Months of Lake Area Acquisition	Maximum Area Difference	Average Error
Zhaling Lake	large	more than 500 km²	6	9.128 km²	0.568%
Hongze Lake	large	more than 500 km²	9	163.763 km²	9.9%
Zhari Namco Lake	large	more than 500 km²	8	3.181 km²	0.064%
Changhu Lake	medium	100–500 km²	8	5.698 km²	2.125%
Dalinuoer Lake	medium	100–500 km²	7	1.659 km²	0.589%
Fuxian Lake	medium	100–500 km²	10	0.994 km²	0.298%
Daxi Reservoir	small	less than 100 km²	9	0.601 km²	2.8%
Nanshui Reservoir	small	less than 100 km²	9	1.641 km²	3.299%
Fengjiashan Reservoir	small	less than 100 km²	12	0.425 km²	1.589%
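The two error columns of Table 2 can be read as simple statistics over the months in which both area estimates exist. The sketch below assumes that the average error is the mean of the monthly absolute relative differences against the higher-resolution reference area; this definition is an assumption for illustration, and the areas used are hypothetical rather than values from the study.

```python
from typing import List


def max_area_difference(estimated: List[float], reference: List[float]) -> float:
    """Largest absolute monthly difference between the two area series (km²)."""
    return max(abs(e - r) for e, r in zip(estimated, reference))


def average_error_percent(estimated: List[float], reference: List[float]) -> float:
    """Mean absolute relative difference, in percent, over all months."""
    if not estimated or len(estimated) != len(reference):
        raise ValueError("area series must be non-empty and of equal length")
    errors = [abs(e - r) / r for e, r in zip(estimated, reference)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical monthly lake areas in km² (30 m result vs. reference).
area_30m = [101.0, 99.0, 102.0]
area_ref = [100.0, 100.0, 100.0]
print(max_area_difference(area_30m, area_ref),
      average_error_percent(area_30m, area_ref))
```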

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
