Open Surface Water Mapping Algorithms: a Comparison of Water-related Spectral Indices and Sensors

Open surface water bodies play an important role in agricultural and industrial production, and are susceptible to climate change and human activities. Remote sensing data has been increasingly used to map open surface water bodies at local, regional, and global scales. In addition to image statistics-based supervised and unsupervised classifiers, spectral index- and threshold-based approaches have also been widely used. Many water indices have been proposed to identify surface water bodies; however, the differences in performance among these water indices, and among different sensors, for water body mapping are not well documented. In this study, we reviewed and compared existing open surface water body mapping approaches based on six widely used water indices, namely the tasseled cap wetness index (TCW), normalized difference water index (NDWI), modified normalized difference water index (mNDWI), sum of near infrared and two shortwave infrared bands (Sum457), automated water extraction index (AWEI), and land surface water index (LSWI), as well as three medium resolution sensors (Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-2 MSI). A case region in the Poyang Lake Basin, China, was selected to examine the accuracies of the open surface water body maps from the 27 combinations of different algorithms and sensors. The results showed that all the algorithms generally had reasonably high accuracies, with Kappa coefficients ranging from 0.77 to 0.92. The NDWI-based algorithms performed slightly better than the algorithms based on other water indices in the study area, which could be related to the dominance of pure water bodies in the region, while the sensitivities of water indices could differ for various water body conditions. The resultant maps from Landsat 8 and Sentinel-2 data had higher overall accuracies than those from Landsat 7. Specifically, all three sensors had similar producer accuracies, while the Landsat 7 based results had a lower user accuracy. This study demonstrates the improved performance of Landsat 8 and Sentinel-2 for open surface water body mapping efforts.


Introduction
Global climate change and increasing human activities have been causing large changes in the water bodies on the Earth's surface [1,2]. Those changes in surface water bodies have been substantially affecting agricultural and industrial production [3,4], as well as ecological and environmental security [5][6][7][8][9][10]. Information about the spatial distribution and area of water bodies is critical for regional economic development and environmental protection [11,12].
With the rapid development of remote sensing technology in recent years, water mapping and change detection based on satellite remote sensing images has become a main approach [13][14][15][16][17][18]. Synthetic Aperture Radar (SAR) data has been used to identify surface water area [19]; however, the relatively limited availability of SAR data has hindered large scale and long term applications in water body mapping. Meanwhile, Landsat data has the longest archive, spanning more than three decades [20][21][22][23], and can track water body changes back to the 1970s. For example, Pekel et al. [24] used all the Landsat images to detect surface water body changes over the last three decades for the entire globe. The methods of water body extraction can generally be divided into two categories: one is the traditional supervised and unsupervised classification using a single band or multiple bands [25][26][27], and the other is the water-related spectral index- (water index for short) and threshold-based approach [28][29][30][31][32]. Generally, supervised classification techniques based on spectral signature analysis can effectively identify and detect large water bodies, but these approaches are constrained when rapid and reproducible large scale water body mapping is required [33]. Hence, the water index-based algorithm has become an important approach for rapid implementation of water body mapping over large regions [34]. The water index- and threshold-based approach has been widely used to identify water bodies because of the unique spectral characteristics of water bodies in the visible and infrared bands.
The water index- and threshold-based water body mapping approaches have evolved through several stages. In the early stage, single spectral bands were used for water body extraction. Then a set of water indices was proposed for water body mapping. For example, in 1985, Crist [35,36] put forward the tasseled cap wetness index (TCW), derived from six bands of surface reflectance data, and set a threshold of 0 to separate water and non-water objects. McFeeters [37] proposed the normalized difference water index (NDWI) in 1996, calculated as the green band minus the near-infrared (NIR) band, divided by the sum of the two bands; water bodies have positive values while non-water features have negative values. Although the NDWI could suppress and remove non-water features to a large degree, it failed to suppress built-up land signals efficiently. As a result, the extracted features could be a mixture of water and built-up land noise [38]. Based on the NDWI, Xu [38] proposed the modified normalized difference water index (mNDWI) in 2006, replacing the NIR band with the shortwave-infrared (SWIR) band, which helped to remove the disturbances from built-up lands. However, the optimal thresholds vary with location and time, and the method could not remove shadow noise effectively in some areas. Beeri et al. [39] combined the indices Sum457, ND5723, and ND571 to form a water extraction model. Among the three indices, Sum457 had been proposed and applied for separating water from the surrounding materials by Al-Khudhairy et al. [40] in 2002, and the two additional indices (ND5723 and ND571) could further minimize the effects of aerosols and other atmospheric interference if the data had been normalized. Feyisa et al. [41] proposed an automated water extraction index (AWEI) in 2013, with different formulations for scenes with shadows (AWEIsh) and without shadows (AWEInsh). This separated and systematic technique improved the water body mapping accuracy. Another scheme was the application of the land surface water index (LSWI) for water body or flooding identification, based on the relationship between the LSWI and a vegetation greenness index such as the normalized difference vegetation index (NDVI) or the enhanced vegetation index (EVI) [42]. Menarguez [43] put forward a new method by combining each of the three water indices (LSWI, mNDWI, and NDWI) with EVI and NDVI, and the results revealed that this integrated method was more sensitive to water bodies, especially to mixed water and vegetation pixels. Despite the improvements of water index- and threshold-based approaches, a systematic assessment among all the water index-based approaches has not yet been conducted.
All the methods described above were originally tested on a particular sensor, and the accuracies of the water body extraction results may vary when different sensors are used. Landsat TM/ETM+ images with medium spatial and moderate temporal resolutions have been used to map land surface water bodies for a long time [44]. With evolutionary technical improvements, Landsat 8 OLI has more bands and an improved bandwidth design [45]. Sentinel-2A was launched on 23 June 2015 and carries a wide swath, high-resolution multispectral imager with 13 spectral bands. It combines high spatial resolution, additional spectral bands, a swath width of 290 km, and frequent revisit times. How the different water index-based approaches respond to different sensors has not yet been documented.
Given the uncertainties in open surface water body mapping, especially the effects of different water indices and sensors, a study area in the Poyang Lake Basin was selected to examine the performances of different water body mapping algorithms in terms of various water indices and sensors. Specifically, nine algorithms based on the six water indices (TCW, NDWI, mNDWI, SNN, AWEI, and LSWI), or on the combination of a water index and a vegetation index, and three medium resolution sensors (Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-2 MSI) were selected to evaluate the performances of different open surface water body mapping strategies, considering not only deep water bodies but also shallow freshwater bodies. This study aims to provide useful experience and knowledge for large scale open surface water body mapping and for the selection of sensors and algorithms.

Study Area
We chose a case region in the Poyang Lake Basin as our study area. Poyang Lake is the largest freshwater lake in China, and is located in the north of Jiangxi Province, near the south-west border of Anhui Province. The study area was selected deliberately to ensure that it remains covered by water throughout the year. Both relatively shallow and deep freshwater bodies exist in the study area, and the water bodies in the northern half are shallower than those in the southern half. We also made sure that the case region had effective observations from Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-2 MSI. The detailed location is shown in Figure 1.

Data Acquisition and Processing
The Landsat 7 satellite is equipped with the Enhanced Thematic Mapper Plus (ETM+), which has six 30 m visible, near-infrared, and shortwave infrared bands, one 60 m thermal band, and one 15 m panchromatic band (Table 1). The Landsat 8 satellite carries two Earth observing sensors, the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS). The OLI collects image data for nine spectral bands with a spatial resolution of 30 m (15 m for the panchromatic band), and the TIRS collects image data for two thermal infrared spectral bands with a spatial resolution of 100 m. Both sensors provide a 15-degree field-of-view covering a 185 km across-track ground swath from an altitude of 705 km [46]. Sentinel-2 carries the Multi Spectral Instrument (MSI) with 13 bands covering the visible, NIR, and SWIR wavelength regions, including four 10 m visible and near-infrared bands, six 20 m vegetation red edge and shortwave infrared bands, and three 60 m coastal aerosol, water vapor, and SWIR-Cirrus bands [47]. The band comparison of the three sensors (Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-2 MSI) is shown in Table 1.
Note: In Table 1, W and R stand for the wavelength and spatial resolution of the bands of each sensor, respectively.
The water storage of lakes may vary over time, even between different days within a month. In order to keep the water level consistent during the study period, the time selection requirement for images was strict in this study. Images from the three sensors (Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-2 MSI) acquired in the same or a close period with low cloud coverage were chosen. To achieve this aim, we checked the availability and cloud percentages of the three datasets during their overlapping period (December 2015 to present, Figure 2). As the surface water body extents may change quickly, we selected a period within which qualified (low cloud coverage) images from all three sensors were available. Finally, we selected the Landsat 7 ETM+ image (8 February 2016, Path/Row = 121/40), the Landsat 8 OLI image (16 February 2016, Path/Row = 121/40), and the Sentinel-2 MSI image (10 February 2016).
The Level 1 Terrain Corrected (L1T) products of Landsat 7 and 8 surface reflectance data were obtained from the U.S. Geological Survey (USGS) [48]. The Sentinel-2 MSI image was downloaded from the Sentinel Scientific Data Hub website as a Level 1C (L1C) product. The Sentinel-2 L1C products were provided in the Standard Archive Format for Europe (SAFE) file format [49] as Top Of Atmosphere (TOA) reflectance data [50,51] with a 12-bit dynamic range, and each image was in the Universal Transverse Mercator (UTM) map projection. We used the Sen2Cor algorithm on the Sentinel Application Platform (SNAP) developed by the European Space Agency (ESA) to perform atmospheric correction, converting the images into Level 2A products, which are Bottom Of Atmosphere (BOA) reflectance data [48,52]. The Sentinel-2 MSI data has multiple bands, and the resolution of the shortwave infrared bands (20 m) differs from that of the visible and near infrared bands (10 m) [53][54][55]. In order to make the Sentinel-2 data and Landsat data comparable, and to make the resolutions of the six bands used in this study consistent, we resampled the visible and NIR bands to 20 m with the nearest neighbor approach before calculating the water indices.
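The resampling step can be illustrated with a minimal sketch. It assumes the 10 m band is already loaded as a 2-D NumPy array aligned with the 20 m grid; the function name `resample_nearest` is hypothetical and is not part of SNAP or Sen2Cor. Because each 20 m cell centre falls exactly on the corner shared by four 10 m pixels, the choice of "nearest" pixel is a tie, and this sketch resolves it toward the lower-right pixel; actual tools may resolve it differently.

```python
import numpy as np

def resample_nearest(band, factor=2):
    """Coarsen a 2-D band by an integer factor using nearest-neighbor sampling.

    Hypothetical helper: for each output cell, take the source pixel whose
    index is floor((i + 0.5) * factor), i.e. the pixel at the cell centre.
    """
    band = np.asarray(band)
    out_rows = band.shape[0] // factor
    out_cols = band.shape[1] // factor
    # Source indices of the pixel nearest to each output cell centre.
    rows = np.floor((np.arange(out_rows) + 0.5) * factor).astype(int)
    cols = np.floor((np.arange(out_cols) + 0.5) * factor).astype(int)
    return band[np.ix_(rows, cols)]

# Toy 4 x 4 "10 m" band resampled to a 2 x 2 "20 m" grid.
toy_band = np.arange(16).reshape(4, 4)
coarse = resample_nearest(toy_band, factor=2)
```

Nearest-neighbor sampling keeps the original reflectance values instead of averaging them, which avoids creating new mixed values along water edges, at the cost of discarding three of every four 10 m pixels.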

Water Index-Based Approaches
The TCW index developed by Crist [35,36] uses all six bands together, with coefficients determined empirically by analyzing both simulated and actual data [36,56]. Water absorbs almost all incident radiant flux in the near- and shortwave-infrared bands, whereas the land surface reflects significant amounts of energy in those bands, and the reflectance in the green band is much higher for water than for the land surface. On this basis, McFeeters [37] put forward the NDWI, calculated as the green band minus the near-infrared (NIR) band, divided by the sum of the two bands. As a result, water features have positive values and are thus enhanced, while the land surface has zero or negative values and is therefore suppressed or eliminated. However, water bodies are often mixed with built-up land noise in NDWI images because water and built-up land have similar reflectance characteristics in the green and NIR bands. Xu [38] proposed the mNDWI by replacing the NIR band with SWIR-1 based on the above theory, achieving a satisfactory result in suppressing built-up land noise. Because water bodies have a relatively low reflectance, especially in the NIR to SWIR bands, Al-Khudhairy et al. [40] developed the Sum457 using the NIR, SWIR-1, and SWIR-2 bands of Landsat Thematic Mapper (TM) data to delineate water bodies. Xiao et al. [57] developed the LSWI from the NIR and SWIR bands to estimate the water content of the land surface. AWEI was developed by Feyisa et al. [41] to extract surface water with improved accuracy; its coefficients and band combinations were determined through critical examination of the reflectance properties of various land cover types.
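The normalized-difference indices described above can be sketched as follows. This is an illustrative sketch only, assuming surface reflectance bands held as NumPy arrays; the function names and the toy reflectance values are hypothetical, and the use of SWIR-1 in the LSWI is an assumption, since the text only says "SWIR". The zero threshold is the one stated for the NDWI; thresholds for the other indices should be taken from Table 2.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndwi(green, nir):
    """NDWI (McFeeters): green versus NIR."""
    return normalized_difference(green, nir)

def mndwi(green, swir1):
    """mNDWI (Xu): NIR replaced by SWIR-1."""
    return normalized_difference(green, swir1)

def lswi(nir, swir1):
    """LSWI (Xiao et al.): NIR versus SWIR (SWIR-1 assumed here)."""
    return normalized_difference(nir, swir1)

def sum457(nir, swir1, swir2):
    """Sum of the NIR, SWIR-1, and SWIR-2 bands (Landsat TM bands 4, 5, 7)."""
    return np.asarray(nir) + np.asarray(swir1) + np.asarray(swir2)

# Two toy pixels: the first water-like, the second vegetated land.
green = np.array([0.08, 0.05])
nir = np.array([0.02, 0.30])
swir1 = np.array([0.01, 0.25])
swir2 = np.array([0.005, 0.15])

water_mask = ndwi(green, nir) > 0  # positive NDWI is treated as water
```

Note that Sum457 works in the opposite direction to the normalized-difference indices: water is expected to have a low sum, so a water mask would keep values below a threshold.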
In this study, we used nine water index-based algorithms, based on six water indices (TCW, NDWI, mNDWI, SNN, AWEI, and LSWI) or on the combination of a water index and a vegetation index, together with data from three different sensors (Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-2 MSI), to extract open surface water bodies. More detailed information about the open surface water body mapping algorithms in this study can be found in Table 2.

Accuracy Assessment
The open surface water body mapping results from the different approaches and sensors were evaluated by comparison against very high resolution (VHR) imagery. In order to guarantee the temporal consistency of the results and the ground truth data, we chose reference VHR images obtained within ±8 days of the study period. A stratified random sampling method was used to collect the reference data. A primary water/non-water map was used to set up the strata, then 150 random points were generated in each stratum, and the sample numbers matched the proportions of the water and non-water areas. Buffers (circles of 100 m radius) around those points were then generated as regions of interest (ROIs) and interpreted for land cover information using the VHR images from DigitalGlobe and Centre National d'Etudes Spatiales (CNES)/Astrium on the Google Earth platform, and from Gaofen-3 (GF-3). We deleted the ROIs with mixed pixels by visual checking on the Google Earth platform. Finally, a total of 98 water and 93 non-water body ROIs were chosen (Figure 1). Confusion matrices for all the results from the combinations of the three sensors and nine algorithms were generated in the ENVI 5.3 software for accuracy assessment.
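The accuracy measures reported in the Results can all be derived from a two-class confusion matrix. The sketch below shows the standard definitions of overall accuracy, Kappa coefficient, producer accuracy, and user accuracy for the water class; the function name `accuracy_metrics` is hypothetical, and the counts in the example are invented for illustration and are not results from this study.

```python
def accuracy_metrics(tp, fn, fp, tn):
    """Accuracy measures for the water class from a 2 x 2 confusion matrix.

    tp: reference water mapped as water
    fn: reference water mapped as non-water (omission)
    fp: reference non-water mapped as water (commission)
    tn: reference non-water mapped as non-water
    """
    n = tp + fn + fp + tn
    overall = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (overall - expected) / (1 - expected)
    producer = tp / (tp + fn)  # 1 - omission error
    user = tp / (tp + fp)      # 1 - commission error
    return {
        "overall_accuracy": overall,
        "kappa": kappa,
        "producer_accuracy": producer,
        "user_accuracy": user,
        "omission_error": 1 - producer,
        "commission_error": 1 - user,
    }

# Invented counts, for illustration only.
metrics = accuracy_metrics(tp=45, fn=5, fp=10, tn=40)
```

With these invented counts the producer accuracy (0.90) exceeds the user accuracy (about 0.82), which is the pattern produced when commission errors outnumber omission errors.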

Results
The resultant open surface water body maps from the 27 combinations of the nine algorithms and three sensors are shown in Figure 3. Visually, all the results showed similar patterns of water bodies, while the Landsat 7 based results showed larger water areas than those from the other two sensors. Overall accuracies and Kappa coefficients are shown in Figure 4. The results had an average overall accuracy of 93.9%, ranging from 88.7% to 95.7%. The two lowest accuracies were from the combination of the TCW-based algorithm and Landsat 7 ETM+ data (abbreviated as TCW + Landsat 7 ETM+) and from mNDWI + Landsat 7 ETM+, while the two highest accuracies were from NDWI + Landsat 8 OLI and NDWI plus VI + Landsat 8 OLI. There was a small difference in the Kappa coefficients of the two methods with the highest overall accuracy (95.7%): NDWI + Landsat 8 OLI had a Kappa coefficient of 0.915, while NDWI plus VI + Landsat 8 OLI had a slightly lower Kappa coefficient of 0.913.

In terms of the performances of the nine algorithms, the NDWI-based algorithm and the NDWI plus VI-based algorithm were stable and shared the highest overall accuracy for each sensor, while the TCW-based algorithm had the lowest accuracy for each sensor. The NDWI-based and NDWI plus VI-based algorithms had the same, highest average overall accuracy across the three sensors (95.0%), followed by the algorithms based on AWEIsh (94.3%), AWEInsh (94.1%), mNDWI plus VI (94.0%), SNN (93.9%), mNDWI (93.4%), and LSWI plus VI (93.4%), while the TCW-based algorithm had the lowest average overall accuracy of 91.8%. The NDWI-based algorithm and the NDWI plus VI-based algorithm had high Kappa coefficients of ~0.90 for the three sensors. Specifically, the two algorithms had the same Kappa coefficient (0.874) when using Landsat 7 ETM+ data, while the Kappa coefficients were higher (~0.91), with a slight difference between the two, when using the Landsat 8 OLI and Sentinel-2 MSI data.
We also found different sensitivities of water body mapping for the three sensors. Landsat 8 OLI performed best with the highest overall accuracy (95.7%), followed by Sentinel-2 MSI (95.6%) and Landsat 7 ETM+ (93.7%). The average overall accuracy of the nine algorithms for Landsat 8 OLI was the same as that for Sentinel-2 MSI (94.9%), while the value for Landsat 7 ETM+ was the lowest (91.9%). The average Kappa coefficient of the nine algorithms was highest for Sentinel-2 MSI (0.899), closely followed by Landsat 8 OLI (0.897), while Landsat 7 ETM+ had a lower value (0.837). Note that the Landsat 8 OLI based results had higher overall accuracies than those of Landsat 7 ETM+ for all the algorithms, which suggests that the improved bandwidth design of Landsat 8 OLI is beneficial.
For each algorithm, the omission error using the Landsat 7 ETM+ data was the lowest among the three kinds of data, and the producer accuracy using the Landsat 7 ETM+ data was slightly higher than those using the Landsat 8 OLI and Sentinel-2 MSI data (Figure 5a). As the commission error of each algorithm using Landsat 7 ETM+ data was much higher than that using the other two kinds of data, the user accuracy of each algorithm using the Landsat 8 OLI and Sentinel-2 MSI data was considerably higher than that using the Landsat 7 ETM+ data (Figure 5b). The difference in user accuracy among the three sensors was therefore the main factor behind the lowest accuracy of each algorithm when using the Landsat 7 ETM+ data. From the perspective of algorithm performance, the producer accuracies of the NDWI-based and NDWI plus VI-based algorithms were not the highest for each sensor, and the differences in producer accuracy among the nine algorithms were subtle. However, the user accuracy of the NDWI-based and NDWI plus VI-based algorithms was much higher than that of the other algorithms, and the differences in user accuracy among the nine algorithms were larger than the differences in producer accuracy. That was the main reason why the NDWI-based and NDWI plus VI-based algorithms performed best, with the same highest overall accuracy among the nine algorithms, while the TCW-based algorithm performed worst, with the lowest accuracy, due to the largest commission error in the study area.

Comparison of Different Water Indices in Water Body Extraction
Among the various water indices, we found that the NDWI-based and NDWI plus VI-based algorithms had slightly higher accuracies in our study area, no matter which sensor was used. Xu [38] proposed the mNDWI by replacing the NIR band in the NDWI with SWIR, and demonstrated that the mNDWI suppresses the noise of built-up lands better than the NDWI, especially in areas where buildings dominate. In this specific study, we did not find a higher accuracy for the mNDWI-based algorithm than for the NDWI-based algorithm for any sensor, probably because there were few mixed water and built-up land pixels in the study area (Figure 1). The user accuracy of the NDWI-based algorithm was higher than that of the mNDWI-based algorithm, so the commission error of the NDWI was lower than that of the mNDWI. This implies that the mNDWI-based algorithm could perform better for water body mapping in complex landscapes, while the NDWI had higher sensitivity for water body extraction in regions of pure open surface water. Menarguez [43] tried to improve the performances of the NDWI- or mNDWI-based algorithms by involving vegetation indices (EVI and NDVI), which could help handle mixed pixels of water and vegetation. That could be particularly meaningful in flood monitoring, e.g., paddy rice transplant monitoring [42,58,59].
TCW achieved the lowest accuracy among the nine algorithms regardless of the data source. Because many non-water pixels were mistakenly classified as water, the commission error of TCW was larger than that of the other eight algorithms, and its user accuracy was therefore the lowest among the nine algorithms (Figure 5b).
To minimize the errors of atmospheric correction, Beeri et al. [39] proposed an integrated indicator combining Sum457, ND5723, and ND571. We found that this integrated indicator performed well for Landsat 8 OLI (OA = 95.7%); however, it showed a lower accuracy than the NDWI-based algorithms with the Landsat 7 ETM+ and Sentinel-2 MSI data. This suggests that an integrated approach that considers more indices is not necessarily more robust than a simple combination of the NIR and green bands for identifying open surface water bodies.
As an initial effort, we selected a simple and ideal case region, largely covered by pure open surface water bodies. There was no shadow because of the flat terrain, and no high-albedo surface such as snow or ice because of the warm weather in the Poyang Lake Basin. Urban areas were also very limited. Under these conditions, both the AWEInsh- and AWEIsh-based algorithms achieved reasonably high accuracies.
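For reference, the two AWEI variants can be sketched with the coefficients as commonly published for the index (the no-shadow form, AWEInsh, and the shadow form, AWEIsh). The threshold of 0 and the reflectance values below are illustrative assumptions and may differ from the thresholds listed in Table 2.

```python
def awei_nsh(green, nir, swir1, swir2):
    """AWEI for scenes where shadow is not a major problem."""
    return 4.0 * (green - swir1) - (0.25 * nir + 2.75 * swir2)


def awei_sh(blue, green, nir, swir1, swir2):
    """AWEI variant intended to suppress shadow and dark surfaces."""
    return blue + 2.5 * green - 1.5 * (nir + swir1) - 0.25 * swir2


def is_water(index_value, threshold=0.0):
    """ASSUMPTION: water when the index exceeds an illustrative threshold of 0."""
    return index_value > threshold


# Hypothetical surface reflectances (not values from the study).
water = {"blue": 0.05, "green": 0.06, "nir": 0.02, "swir1": 0.01, "swir2": 0.01}
vegetation = {"blue": 0.03, "green": 0.06, "nir": 0.40, "swir1": 0.20, "swir2": 0.10}

water_nsh = awei_nsh(water["green"], water["nir"], water["swir1"], water["swir2"])
water_sh = awei_sh(water["blue"], water["green"], water["nir"],
                   water["swir1"], water["swir2"])
veg_nsh = awei_nsh(vegetation["green"], vegetation["nir"],
                   vegetation["swir1"], vegetation["swir2"])
veg_sh = awei_sh(vegetation["blue"], vegetation["green"], vegetation["nir"],
                 vegetation["swir1"], vegetation["swir2"])
```

In a scene without shadow, snow, or extensive built-up land, both variants separate the hypothetical water and vegetation pixels with a wide margin, which is in line with the similar accuracies of the two AWEI-based algorithms in this study area.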

Comparison of Different Sensors in Water Body Extraction
As for the comparison among the three sensors, Landsat 8 OLI and Sentinel-2 MSI performed better than Landsat 7 ETM+. The average overall accuracy of the nine algorithms using the Landsat 8 OLI data was almost the same as that using the Sentinel-2 MSI data (average OA = 94.9%), while the value for Landsat 7 ETM+ was lower (91.9%). The higher overall accuracies of the nine algorithms with Landsat 8 OLI can be attributed to several significant improvements over Landsat 7 ETM+. Landsat 8 OLI has an improved signal-to-noise ratio (SNR) [60] and better captures subtle variability on the ground by quantizing data to twelve bits, compared with the eight-bit data produced by Landsat 7 ETM+ [23,61].
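The effect of the bit depth can be made concrete with a small arithmetic sketch. It assumes, purely for illustration, that the digital numbers span a normalized signal range of 0 to 1; real sensors are calibrated over sensor-specific radiance ranges, so the step sizes below are not actual radiometric resolutions.

```python
def quantization_levels(bits):
    """Number of distinct digital numbers for a given bit depth."""
    return 2 ** bits


def quantization_step(bits, value_range=1.0):
    """Smallest distinguishable increment over an assumed signal range.

    ASSUMPTION: the range is split evenly between the lowest and
    highest digital number (levels - 1 intervals).
    """
    return value_range / (quantization_levels(bits) - 1)


etm_levels = quantization_levels(8)    # 8-bit data (Landsat 7 ETM+)
oli_levels = quantization_levels(12)   # 12-bit data (Landsat 8 OLI)
level_ratio = oli_levels // etm_levels

etm_step = quantization_step(8)
oli_step = quantization_step(12)
```

Twelve-bit data provide 4096 levels against 256 for eight-bit data, a factor of 16, so over the same signal range the quantization step is roughly 16 times finer. This helps to resolve small differences over dark targets such as water.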
Highly reflective surfaces such as snow, clouds, and sun-glint over water bodies may saturate the sensor; in the flat terrain of the Poyang Lake Basin, sun-glint over water bodies is an especially important potential cause of saturation in the reflective wavebands. The visible bands of Landsat 7 ETM+ tend to saturate more frequently than those of Landsat 8 OLI, whose improved dynamic range reduces band saturation over highly reflective objects [62]. Regarding the waveband design, the OLI bands are spectrally narrower and cover different spectral ranges than those of Landsat 7 ETM+, because the band edges were modified to avoid atmospheric absorption features, which can add uncertainty to the signal reflected from water bodies. The most significant change is that the NIR and SWIR-1 bands of Landsat 8 OLI are substantially narrower than those of Landsat 7 ETM+, which avoids a water absorption feature that affects the NIR and SWIR-1 bands of Landsat 7 ETM+. A narrower band generally has a lower SNR, but Landsat 8 OLI overcomes this limitation with a pushbroom imager, which allows longer dwell times and a greater range of the sensed signal. As a result, the SNR of Landsat 8 OLI is 6 to 10 times better than that of Landsat 7 ETM+, depending on the spectral band, which made it possible to narrow the spectral bandwidths and avoid atmospheric absorption of the signal reflected from water bodies [61,62]. An additional cirrus band on Landsat 8 (1.36-1.38 µm) and Sentinel-2 (1.365-1.385 µm) can be used to detect clouds, and is especially helpful for detecting high-altitude clouds [62][63][64][65].
Some water indices are defined on surface reflectance, so the original top-of-atmosphere (TOA) reflectance data must be atmospherically corrected to surface reflectance to remove the effects of factors such as atmospheric gases and aerosols. As the band design of Landsat 8 OLI focused more on minimizing atmospheric impacts, and the atmospheric correction models for Landsat 8 OLI have been improved compared to those for Landsat 7 ETM+ [61,66], the surface reflectance data of Landsat 8 OLI are closer to the true surface reflectance than those of Landsat 7 ETM+. Based on the above points, we concluded that Landsat 8 OLI is better suited to extracting water bodies than Landsat 7 ETM+ [67].
In terms of the performance difference between Landsat 8 OLI and Sentinel-2 MSI, the overall performance was almost the same. The average overall accuracy of the nine algorithms using the Landsat 8 OLI data was the same as that using the Sentinel-2 MSI data (average OA = 94.9%). This could be attributed to the similar band wavelengths of the two sensors; specifically, the central wavelength of the SWIR-1 band in both sensors is 1.61 µm. The radiometric resolution of Sentinel-2 MSI is also the same as that of Landsat 8 OLI (12-bit) [68], although Sentinel-2 MSI has a higher spatial resolution.
This study showed similar performance between the 20 m Sentinel-2 data and the 30 m Landsat 8 data. Note that this comparison used data of different spatial resolutions, so the capability of the Sentinel-2 data could be partly attributed to its higher spatial resolution. Moreover, we carried out additional analyses based on 10 m Sentinel-2 data (resampling the 20 m SWIR bands to 10 m resolution) and found that the 10 m Sentinel-2 results were more accurate than those based on the 20 m Sentinel-2 data and the 30 m Landsat 8 data (Figure 6).
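Resampling a 20 m band onto a 10 m grid can be sketched as below. The resampling method is not stated in this section, so nearest-neighbour replication is assumed here for illustration; the function name and the sample values are hypothetical.

```python
def upsample_nearest(grid, factor=2):
    """Nearest-neighbour upsampling of a 2D grid (list of rows).

    Each input pixel is replicated into a factor x factor block, so a
    20 m band resampled with factor=2 matches a 10 m pixel grid.
    """
    out = []
    for row in grid:
        # Repeat every value horizontally.
        wide = [value for value in row for _ in range(factor)]
        # Repeat the widened row vertically (copies, not aliases).
        out.extend(list(wide) for _ in range(factor))
    return out


# Hypothetical 2 x 2 patch of 20 m SWIR reflectance.
swir_20m = [[0.01, 0.02],
            [0.20, 0.25]]
swir_10m = upsample_nearest(swir_20m, factor=2)
```

Nearest-neighbour replication adds no new spectral information to the SWIR band. Under this assumption, any gain in the 10 m results for indices that use SWIR would come from the natively 10 m visible and NIR bands, which delineate the water boundary more finely.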

Sensitivities of Sensors and Water Body Mapping Algorithms in Different Water Conditions
Different sensors and water mapping algorithms could perform differently under various water conditions. We compared the results in a shallow water body area (Figure 7), where the bottom albedo could contribute to the observed radiance and affect the accuracies of the algorithms. As can be seen in Figure 7, Landsat 7 ETM+ performed the worst: many non-water pixels were classified as water in the shallow water body area, where a large amount of silt sediment exists, which caused a high commission error. In the border region between water and land, Landsat 7 ETM+ also had higher commission errors (Figure 8).
Among the nine algorithms, the NDWI-based and NDWI plus VI-based algorithms performed better than the others in both shallow water conditions (Figure 7) and mixed water and land conditions (Figure 8), regardless of the sensor used. TCW performed worst, with the lowest overall accuracy and user accuracy, because many built-up areas were identified as water bodies (Figure 8), which caused a high commission error. The mNDWI-based algorithm also had higher commission errors than the NDWI-based algorithm (Figure 8). Other variables, such as seasonal and daily differences in the sun angle, atmospheric composition, changes in water body properties, and even the choice of atmospheric correction method, may also affect the accuracy of water body extraction. These are beyond the scope of this study and need to be explored in future work.

Different types of water bodies have different reflectance patterns [56]; for example, reflectance peaks occur in the green band for water bodies with high concentrations of phytoplankton, while water bodies with a high concentration of sediments such as silt have reflectance peaks in the red band. Given that each water index has its own specific combination of spectral bands, each index is likely to perform better for a certain kind of water body. It would therefore not be effective to use only one water index to extract water bodies over a large region where various types of water bodies are mixed; a combination of different water indices could achieve a higher accuracy in large-scale water body mapping. The characteristics of these water indices and their sensitivities to different water cover types should be further explored and compared in the future.

Conclusions
In this study, we evaluated the performance of 27 water body extraction strategies, combining nine water index-based algorithms and three sensors, in an open surface water body region within the Poyang Lake Basin. The overall accuracy, Kappa coefficient, producer accuracy, and user accuracy of each result were calculated to evaluate the performance of each method. All the methods had reasonable overall accuracies, ranging from 88.7% to 95.7%, with Kappa coefficients ranging from 0.77 to 0.92. The two lowest accuracies came from the combinations TCW + Landsat 7 ETM+ and mNDWI + Landsat 7 ETM+, while the two highest came from NDWI + Landsat 8 OLI and NDWI plus VI + Landsat 8 OLI.
Among the three sensors, Landsat 8 OLI and Sentinel-2 MSI achieved higher accuracies than Landsat 7 ETM+ in the extraction of water bodies for almost all of the algorithms, and the accuracies of Landsat 8 OLI and Sentinel-2 MSI were almost the same. As for the nine algorithms, the NDWI- and NDWI plus VI-based algorithms performed best, with the highest average overall accuracy (95.0%) and Kappa coefficient (0.913-0.915), followed by the algorithms based on AWEIsh, AWEInsh, mNDWI plus VI, SNN, mNDWI, LSWI plus VI, and TCW. However, these findings may not be directly applicable to other regions, as they were based on a limited study area dominated by pure open surface water. The performance of these water index-based algorithms in more complex situations, e.g., mixed water and soil or mixed water and vegetation conditions, could be different and should be explored in future studies.
Overall, all three sensors and all nine algorithms based on different water indices performed reasonably well in open surface water body mapping, while Landsat 8 and Sentinel-2, as well as the NDWI, had clear advantages for accurate water body mapping in the study area. This study provides useful guidance for the selection of water indices and sensors in large-scale open surface water body mapping efforts.

Figure 1. The location of the study area and the distribution of the ground truth points for accuracy assessment. The left part is the location of the study area in China, while the right part is a red, green, blue (RGB) composite image using the near-infrared (NIR), red, and green bands of Landsat 8 OLI, together with the distribution of the water and non-water regions of interest (ROIs).

Figure 2. Data availability and cloud contents of Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-2 MSI images from December 2015 to September 2016. The selection of the compared group considered the temporal adjacency among the images and cloud effects.

Figure 3. Comparison of the resultant open surface water body maps from the 27 combinations of the nine algorithms (refer to Table 2) and the three sensors (Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-2 MSI).

Figure 4. Overall accuracies (a) and Kappa coefficients (b) of different combinations of the nine algorithms and the three sensors (Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-2 MSI).

Figure 5. Producer accuracies (a) and user accuracies (b) of different combinations of the nine algorithms and the three sensors (Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-2 MSI).

Figure 6. A comparison of overall accuracies based on different data sources (30 m Landsat 8 OLI, and resampled 20 m and 10 m Sentinel-2 MSI data) and the nine algorithms.

Figure 7. Comparison of the shallow water body mapping results from the 27 combinations of the nine algorithms (refer to Table 2) and the three sensors (Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-2 MSI).

Figure 8. Comparison of the water body mapping results in the border area between water and land from the 27 combinations of the nine algorithms (refer to Table 2) and the three sensors (Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-2 MSI).

Table 1. Band comparison among Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-2 MSI. Only the bands used in this study are included here.

Table 2. A summary of open surface water body mapping algorithms based on different water indices and thresholds.