Automated Extraction of Surface Water Extent from Sentinel-1 Data

Accurately quantifying surface water extent in wetlands is critical to understanding their role in ecosystem processes. However, current regional- to global-scale surface water products lack the spatial or temporal resolution necessary to characterize heterogeneous or variable wetlands. Here, we propose a fully automatic classification tree approach to classify surface water extent using Sentinel-1 synthetic aperture radar (SAR) data and training datasets derived from prior class masks. Prior classes of water and non-water were generated from the Shuttle Radar Topography Mission (SRTM) water body dataset (SWBD) or composited dynamic surface water extent (cDSWE) class probabilities. Classification maps of water and non-water were derived over two distinct wetlandscapes: the Delmarva Peninsula and the Prairie Pothole Region. Overall classification accuracy ranged from 79% to 93% when compared to high-resolution images in the Prairie Pothole Region site. Using cDSWE class probabilities reduced omission errors among water bodies by 10% and commission errors among the non-water class by 4% when compared with results generated using the SWBD water mask. These findings indicate that including prior water masks that reflect the dynamics of surface water extent (i.e., cDSWE) is important for the accurate mapping of water bodies using SAR data.


Introduction
Wetlands are among the world's most productive and ecologically diverse ecosystems; yet, they are being lost at alarming rates [1,2]. Accurately quantifying the spatial and temporal dynamics of surface water in wetlands is critical to understanding ecosystem processes, including land-atmosphere energy balance [3], carbon and nutrient cycles [4,5], wetland modeling [6] and surface-groundwater dynamics [7,8]. Despite this critical need, most regional- to global-scale surface water extent products do not adequately characterize spatially complex or temporally dynamic wetlands due to their limited spectral, spatial or temporal resolutions [9,10].
Products derived from optical sensors such as Landsat [11] and Sentinel-2 [12] normally use a variety of spectral bands ranging from the visible to the shortwave infrared (SWIR) regions of the electromagnetic spectrum. A key limitation of these products is that optical sensors cannot penetrate cloud cover, which often coincides with flood events.
Active sensors, such as synthetic aperture radars (SAR), have advantages over optical sensors for quantifying the spatial and temporal variation of surface water extent [13]. These advantages include 'all-weather' and 'day-and-night' capacity, as well as sensitivity to both open water and below-canopy inundation [14][15][16][17][18][19]. Detection of surface water using SAR backscatter relies on the fact that areas of open, smooth (no or small waves relative to the wavelength of energy employed by the SAR) water bodies typically exhibit lower backscatter coefficients [14]. SAR backscatter coefficients from both spaceborne and airborne platforms have been employed for mapping surface water extent in emergent flood events and a variety of ecosystems, including lakes, rivers, and wetlands [16,19,20]. Many SAR-based studies employed either histogram thresholding methods [16,21,22], classification approaches [14,23] or multi-temporal thresholding [24] to map surface water extent. A number of studies have explored the influence of incidence angle [22,25,26], wave conditions [21,27], and vegetation cover [15,17] on SAR backscatter. Numerous operational mapping algorithms rely on SAR backscatter from single-polarized (e.g., HH, HV, or VV), dual-polarized (HH/HV or VV/VH) [28][29][30], or quad-polarized (HH/HV/VV/VH) data [23,31], where the first and second letters denote the transmit and receive polarizations, respectively.
Despite these advantages, there are few SAR-based sub-hectare (100 m) surface water extent products at continental to global scales. The main reason these products have not been developed is the limited availability of SAR data [16], which results in spatial and temporal discontinuities when quantifying the dynamics of surface water extent across broader geographic regions. Until recently, systematically collected SAR datasets, such as those from the Sentinel-1 satellites, were not available. The European Space Agency currently provides publicly available Sentinel-1 SAR data with potentially global coverage that may greatly improve land surface monitoring [32].
Automated algorithms are essential for surface water mapping at large geographic scales [18,22,33]. Traditionally, the selection of a training dataset has been one of the tedious and subjective steps that impede the automation of either thresholding or classification algorithms. Recently, several studies demonstrated the feasibility of automatically selecting training datasets from existing data products to generate newer and enhanced products [34][35][36]. Among these products, the Shuttle Radar Topography Mission (SRTM)-derived water body dataset (SWBD) is one of the more commonly used datasets, perhaps in part because it has a relatively fine spatial resolution (90 m) compared to other products with near-global coverage [37]. Such products include open permanent water bodies (SAR-WBI) derived from the Envisat advanced synthetic aperture radar (ASAR) instrument [16], and the Moderate Resolution Imaging Spectroradiometer (MODIS) 250 m water mask (MOD44W) data product [37]. In addition, global-scale data and cloud computing are now available for large-scale applications. For example, multi-temporal Landsat data were used recently to generate a global surface water extent product [38] using Google Earth Engine (GEE), a platform that stores multiple sources of global-scale satellite imagery and provides planetary-scale analysis capabilities for non-profit scientists and researchers [39].
In this paper, we describe a fully automated approach for mapping surface water extent using satellite-based Sentinel-1 C-band SAR data. A critical step in the automatic algorithm is the selection of training samples from prior masks generated by existing data products. While existing water masks can be used to automatically train a surface water classification algorithm [22], most static water masks omit small and dynamic wetlands, generating a bias in these classification models. The objective of this study was to determine the importance of including information on surface water dynamics when selecting training data for surface water classification models. Specifically, the surface water extent was mapped using a random forest approach, using water masks from either the SWBD or composites of dynamic surface water extent (DSWE) products (USGS, 2017). This paper is organized as follows. First, the representative study sites and data used for analysis are described, followed by a brief description of the prior masks prepared using SWBD and DSWE datasets. Then, the proposed automated algorithms to classify water and non-water are described and applied to Sentinel-1 data, and accuracy assessments are performed using high-resolution images. Finally, the significance and limitations of the results are discussed and conclusions are drawn.

Study Area
Two sites in North America were selected for this study, including a portion of the Prairie Pothole Region located in North Dakota and a site in the Delmarva Peninsula portion of Maryland, representing inland and coastal wetlandscapes, respectively (Figure 1). The Prairie Pothole Region (PPR) of central North America is dominated by natural grasslands with a relatively high density of small and elliptical open water bodies. The Delmarva Peninsula (DMV) is between the Chesapeake Bay and Atlantic Ocean in the eastern United States and mainly consists of cropland and forest, including many small rivers, streams, swamps in forests and marshes along the Peninsula's edge [40,41]. Both wetlandscapes contain geographically isolated wetlands (wetlands surrounded by uplands) and have a relatively flat topography.
Remote Sens. 2018, 10, x FOR PEER REVIEW


Remote-Sensing Datasets
Sentinel-1 SAR and Landsat optical data were used for algorithm development. Sentinel-1 carries a 5.405 GHz C-band imager, operating over land in three modes with various observation strategies, swath widths, and spatial resolutions [42]. With a swath width of 250 km at fine spatial resolution (5 m × 20 m), the interferometric wide-swath (IW) mode is the main operational mode over land and has the potential to benefit a diverse array of land cover studies. Two Sentinel-1 satellites, Sentinel-1A launched in September 2014 and Sentinel-1B launched in April 2016, jointly provide a nominal 6-day repeat cycle over the equator, 6-day repeat cycle over Europe, and 12-day repeat cycle over North America, allowing for continuous monitoring of surface water extent [42]. The National Aeronautics and Space Administration Alaska Satellite Facility (NASA/ASF) houses a complete archive of Sentinel-1 SAR data processed by the European Space Agency (ESA). Dual-polarized (VV/VH) Sentinel-1 SAR data acquired in IW mode and processed to Level-1 ground range detected (GRD) products were automatically downloaded via the ASF application programming interface (API) [43].
Two datasets were prepared for calibration of the random forest models. First, the SRTM water body dataset (SWBD) comprises worldwide water body outlines in a vector format generated by the National Geospatial-Intelligence Agency (NGA) and published by NASA in 2003 [44]. SWBD represents water body status as of February 2000, as surveyed by the SRTM. The SWBD data is distributed in 1° × 1° tiles, covering the land surface on Earth between 56° southern latitude and 60° northern latitude. Second, a composited dynamic surface water extent (cDSWE) was prepared using a multi-year composite of DSWE products [45]. The DSWE products were developed by the U.S. Geological Survey as part of a suite of Landsat science products that will initially be produced for the entire Landsat archive over the United States and its territories. The DSWE model is composed of independent tests designed to detect pixels composed not only of open water, but also of partial mixtures of water, vegetation and soil. Because it is threshold-based and relies only on Landsat surface reflectance Level-2 science products and digital terrain data as inputs, DSWE may be broadly applied through space and time without the need for scene-based training. DSWE has been rigorously tested over the Florida Everglades [45] and evaluated over other landscapes in North America, showing that DSWE classes provide a good overview of open water bodies and partially inundated surfaces under clear-sky conditions [46,47]. The cDSWE data was prepared by applying the DSWE algorithm to all available data from the Landsat 5 Thematic Mapper (TM) and Landsat 7 Enhanced Thematic Mapper Plus (ETM+) acquired between 2000 and 2015, and calculating the class probabilities of land and water during this time period using Google Earth Engine [39].
Two datasets derived from optical imagery were used to validate the results from our automated algorithms. First, the DSWE algorithm was applied to Landsat-8 Operational Land Imager (OLI) data that had been acquired within 3 days of corresponding Sentinel-1 data over both sites and pre-processed to surface reflectance [45,48]. A summary of the Sentinel-1 and Landsat data used for this study is shown in Table 1, including acquisition date and weather data such as temperature and precipitation. For the PPR site, the weather data were recorded at Robinson, ND, by the North Dakota Agricultural Weather Network (NDAWN) system [49]. Second, we used high-resolution images (1 m) coincident with Sentinel-1 acquisitions and acquired through the National Agriculture Imagery Program (NAIP) 2016 campaign over the Prairie Pothole Region site.

Automated Synthetic Aperture Radar (SAR) Algorithm for Water Extent Mapping
The automatic algorithm that was developed consisted of four steps: (A) pre-processing of Sentinel-1 SAR data to backscatter coefficient (γ⁰); (B) prior class mask preparation from either SWBD or cDSWE products; (C) random forest classification by calibrating models using prior class information and covariates from Sentinel-1 data; and (D) accuracy assessment using coincident NAIP images (Figure 2).

SAR Data Pre-Processing
The Sentinel Application Platform (SNAP) toolbox developed by ESA was used for SAR data pre-processing. Sentinel-1 intensities from high-resolution Level-1 ground range detected products (10 m; GRDH) were speckle-filtered, multi-looked, geocoded, terrain-flattened and terrain-corrected to gamma naught [50] backscatter coefficients (γ⁰). Specifically, the improved Lee-Sigma filter [51] with a combination of 5 × 5 and 9 × 9 kernels was used to reduce speckle noise. The image pixels were then multi-looked (3 × 3 window) to 30 m square pixels and resampled using a bilinear method to match the Landsat Universal Transverse Mercator (UTM) coordinate systems. Terrain flattening and terrain correction were conducted using the recently released SRTM 1 arc-second (approximately 30 m) digital elevation model (DEM) (SRTMGL1).
Sentinel-1 backscatter coefficients, band indices, and incidence angle were prepared as covariates for classification. The Sentinel-1 intensity data available over the North American sites consisted of co-polarized VV (vertically transmitted and received) and cross-polarized VH (vertically transmitted and horizontally received) data. The local incidence angle (LIA) was included as a feature representing the viewing geometry, which previous studies found useful for differentiating water from land [22]. Backscatter intensities were converted to decibel scale (dB) before computing indices and carrying out the classification. A set of polarized indices was used in this study to potentially extract more information from dual-polarized radar data; these are summarized in Table 2. The polarized ratio (VHrVV) was found to separate open water from deep marsh or shallow marsh efficiently [52], and to reduce systematic errors associated with acquisition systems [53]. The normalized difference polarized ratio (NDPI) was adopted from the radar forest degradation index (RFDI) reported in recent studies [54,55] to assess the strength of the double-bounce signal. The NVHI and NVVI indices were adopted from the radar vegetation index (RVI), also described in the literature [56,57], to reflect the level of vegetation growth.
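The conversion to decibels and the computation of dual-polarized indices can be sketched as follows. This is only an illustration: the index formulas in `polarized_indices` are common normalized forms chosen as assumptions, because the equations of Table 2 are not reproduced in the text, and the function names are hypothetical.

```python
import math


def to_db(power):
    """Convert a linear backscatter value (gamma naught) to decibels."""
    return 10.0 * math.log10(power)


def polarized_indices(vv, vh):
    """Return illustrative dual-polarization indices for one pixel.

    ASSUMPTION: these formulas are generic normalized forms, not the
    equations of Table 2. Inputs must satisfy vv != 0 and vv + vh != 0.
    """
    total = vv + vh
    return {
        "VHrVV": vh / vv,           # polarized ratio, VH to VV
        "NDPI": (vv - vh) / total,  # normalized difference of VV and VH
        "NVHI": vh / total,         # VH share of the total
        "NVVI": vv / total,         # VV share of the total
    }


# Following the order described in the text: convert to dB, then compute indices.
vv_db = to_db(0.1)
vh_db = to_db(0.01)
indices = polarized_indices(vv_db, vh_db)
```

Because the indices here are computed on dB values, which are negative for typical backscatter, their sign and range differ from the same formulas applied to linear power.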

Table 2. Polarized indices used in this study.
Index | Abbreviation | Equation | Reference
Polarized Ratio (VH to VV) | VHrVV | |

Training Datasets Preparation
Two training datasets were prepared for Sentinel-1 SAR image classification. These datasets included prior water/non-water masks derived from the SRTM water body dataset (SWBD) [44] and the composited dynamic surface water extent (cDSWE) [45]. The version 2.1 SWBD product, distributed in 1° × 1° vector tiles, was used to build prior masks [18]. Classes of open water, including ocean, lakes and rivers, were extracted and assigned to a 'water' class. The remaining pixels were included in a 'land' class.
As an alternative to the SWBD water mask, we used class probabilities derived from a 15-year DSWE time series to build prior masks. The data processing was completed using Google Earth Engine, and data were exported in 1° × 1° tiles for each site. The probability of each class was calculated as:
$$P_{\mathrm{class}} = 100 \times \frac{N_{\mathrm{occu}}}{N_{\mathrm{clear\text{-}sky}}} \qquad (1)$$
where $N_{\mathrm{occu}}$ is the number of occurrences of a given DSWE class (land, open water, or partial water), and $N_{\mathrm{clear\text{-}sky}}$ is the number of clear-sky acquisitions from Landsat 5 and Landsat 7.
A threshold of 95% was applied to the cDSWE-generated probabilities to derive 'land', 'water', and 'partial water' masks, and the remaining pixels were labeled as 'other'. The DSWE partial surface water class was omitted from the analysis because only a limited area had a probability above 95%.
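Equation (1) and the 95% threshold can be illustrated per pixel with a short sketch. The function names and the dictionary layout are hypothetical, and the strict "greater than" comparison is an assumption taken from the later wording "greater than 95%".

```python
def class_probability(n_occurrences, n_clear_sky):
    """Equation (1): percentage of clear-sky dates assigned to a class."""
    if n_clear_sky <= 0:
        raise ValueError("at least one clear-sky acquisition is required")
    return 100.0 * n_occurrences / n_clear_sky


def prior_label(counts, n_clear_sky, threshold=95.0):
    """Label one pixel from its per-class occurrence counts.

    counts maps a class name ('land', 'water', 'partial water') to the number
    of clear-sky dates on which DSWE assigned that class to the pixel.
    ASSUMPTION: a class is kept only if its probability exceeds the threshold.
    """
    for name, n_occurrences in counts.items():
        if class_probability(n_occurrences, n_clear_sky) > threshold:
            return name
    return "other"


# A pixel seen as water on 98 of 100 clear-sky dates is labeled 'water';
# a pixel that switches between classes falls into 'other'.
stable = prior_label({"land": 2, "water": 98, "partial water": 0}, 100)
variable = prior_label({"land": 50, "water": 40, "partial water": 10}, 100)
```

With a threshold above 50%, at most one class can pass, so the iteration order of the dictionary does not affect the result.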

Random Forest Classification
Random forest models [58] were used to derive water classes from Sentinel-1 data. Specifically, the models described the relationship between the radar-derived covariates (i.e., radar backscattering coefficients, indices and local incidence angle) and the response variable (i.e., water/land classes from the prior masks). Samples for water/land classes were first randomly extracted using the prior masks as references. Sample sizes were proportional to the ratio of classes from the prior mask, ensuring a minimum of 10,000 samples from the smaller of the two classes. For example, if the ratio of water to land calculated from the prior mask was 3, then we sampled 30,000 pixels for the water class and 10,000 for the land class. A typical 30 m resolution Sentinel-1 scene in UTM projection comprises approximately 9498 columns and 6954 rows. An example is shown in Figure 3, which includes an image acquired on 10 August 2016 for the Prairie Pothole Region site. According to summary statistics, the number of water pixels in this image is 1,595,923, and the number of non-water pixels is 41,356,046. Therefore, we sampled 0.62% (10,000/1,595,923) of the water pixels and 0.48% (200,000/41,356,046) of the non-water pixels according to the cDSWE prior mask. A random forest model with 100 trees (estimators) was then trained using the randomly sampled features as explanatory variables and water/land classes from each prior mask as response variables. A map of the probability of the water class was derived by applying the trained model to all image pixels. Simple probability thresholds were selected based on recommendations from the RF model and previous studies [58,59]. Specifically, a map of water classes was derived by applying these threshold rules to the water probability (P_w).
A visual comparison of DSWE and Sentinel-1 derived classification maps and an accuracy assessment of the classification results were conducted, depending on the availability of validation data over each site. For the Delmarva site, we compared the Sentinel-1 derived probability maps to DSWE maps derived from Landsat data acquired within three days of the Sentinel-1 image. Validation using NAIP imagery was not possible for the Delmarva site because no coincident NAIP imagery was available for the time frame of the Sentinel-1 imagery (2015 through 2016). For the Prairie Pothole Region site, we used high-resolution NAIP images and DSWE data as reference data (Figure 3). Specifically, we randomly selected 300 points, 100 points per class, for evaluation over three classes (persistent water, persistent land, and other mixed) defined from Landsat-derived DSWE class probabilities. A class probability greater than 95% (water or land) was considered persistent in that class. The class of each point was visually identified from coincident NAIP imagery acquired on the same day.
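The sampling rule used to train the random forest (sample sizes proportional to the class ratio, with at least 10,000 samples from the smaller class) can be sketched as below. The function name is hypothetical and the rounding is an assumption; strict proportionality would give roughly 259,000 non-water samples for the example scene rather than the 200,000 reported, so the exact rounding or capping applied in the study is not captured here.

```python
def sample_sizes(n_water, n_land, min_samples=10_000):
    """Return (water_samples, land_samples) proportional to the class ratio.

    The smaller class receives min_samples; the larger class is scaled by
    the ratio of pixel counts in the prior mask.
    ASSUMPTION: the scaled size is rounded to the nearest integer.
    """
    if n_water <= 0 or n_land <= 0:
        raise ValueError("both classes must contain at least one pixel")
    if n_water <= n_land:
        return min_samples, round(min_samples * n_land / n_water)
    return round(min_samples * n_water / n_land), min_samples


# The worked example from the text: a water-to-land ratio of 3.
water_n, land_n = sample_sizes(3_000_000, 1_000_000)
```

A production version would also cap each sample size at the number of pixels actually available in that class.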

Automation of Algorithms
Automation of the algorithms was achieved by splitting them into several fully automated sub-processes that rely on open source software and packages. First, data pre-processing was handled using the SNAP graph processing tool (gpt), a command line interface, embedded in bash scripts to execute in batch mode with a composite processing graph written in Extensible Markup Language (XML). Then, several Python packages ('rasterio', 'gdal' and 'numpy') were used to implement the analytical algorithms. This included the preparation of the covariates and prior masks; the extraction of training samples from the SWBD and cDSWE; random forest model construction and execution; output of the classification results; and evaluation of the results.
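As a rough illustration of the batch pre-processing step, the sketch below assembles a gpt command line for one scene. The graph file name and the parameter names `input` and `output` are hypothetical placeholders that would have to match `${input}` and `${output}` variables defined in the processing graph; the study itself used bash scripts, so the Python wrapper is an assumption made for consistency with the other packages.

```python
import shlex


def build_gpt_command(graph_xml, input_path, output_path):
    """Build the argument list for one SNAP gpt run.

    ASSUMPTION: the graph XML declares ${input} and ${output} variables,
    which gpt fills from -P<name>=<value> options.
    """
    return [
        "gpt",
        graph_xml,
        f"-Pinput={input_path}",
        f"-Poutput={output_path}",
    ]


command = build_gpt_command("preprocess_graph.xml", "scene.zip", "scene_gamma0.tif")
# Printable form for a log or a generated batch script.
command_line = shlex.join(command)
```

Passing the list to `subprocess.run(command, check=True)` would execute it on a machine where SNAP is installed; the sketch stops short of that so it can be run anywhere.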


Comparison of Prior Masks
A comparison of the SWBD and cDSWE prior masks for the Delmarva Peninsula is presented in Figure 4. Although the general patterns were similar over major water bodies, more spatial details were preserved in the cDSWE product. Many of the pixels classified as 'land' in the SWBD mask were classified as 'partial' or 'other' in the cDSWE mask, indicating that these areas were variably classified as land or water in the DSWE time series and were thus excluded from the cDSWE mask.

Land/Water Separability from Different Radar Variables
The prior class masks described in Section 2.3.2 were used to derive training samples relating prior water/land classes to radar polarizations, indices and incidence angle. A box plot of these variables by water/land class (Figure 5) shows that VV and VH backscatter provide the highest separability between water and land classes when compared to the other indices. The separability is weak for both local and ellipsoid incidence angles and for the polarized ratio index (VHrVV).

Despite the low separability compared to backscatter and polarization indices, the rationale for including incidence angle when distinguishing water from land is illustrated in Figure 6. The density scatter plot of backscatter coefficients versus local incidence angle indicated two potential classes. A plot of mean backscatter coefficients (Gamma0_VV and Gamma0_VH in dB) for each incidence angle (with 1-σ bar in grey) showed a decreasing backscatter trend with increasing incidence angle over water (in cyan), and a relatively stable trend over land (in red).
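The per-angle averaging behind a plot such as Figure 6 can be sketched as follows. The one-degree bin width and the function name are assumptions, since the text does not state how incidence angles were grouped.

```python
from collections import defaultdict
from statistics import mean


def mean_backscatter_by_angle(angles_deg, backscatter_db, bin_width=1.0):
    """Average backscatter (dB) within incidence-angle bins.

    Returns a dict mapping the lower edge of each bin (degrees) to the mean
    backscatter of the pixels whose angle falls in that bin.
    ASSUMPTION: bins are bin_width degrees wide, starting at multiples of it.
    """
    bins = defaultdict(list)
    for angle, value in zip(angles_deg, backscatter_db):
        bins[int(angle // bin_width)].append(value)
    return {index * bin_width: mean(values) for index, values in sorted(bins.items())}


# Made-up water pixels: backscatter drops as the incidence angle increases.
trend = mean_backscatter_by_angle([30.2, 30.8, 31.5], [-20.0, -22.0, -25.0])
```

Running the same function separately on the water and land samples from a prior mask would give the two curves compared in the text.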

Comparison of Classification Results
Classification results at the Prairie Pothole Region and Delmarva sites are shown in Figure 7. The water/land classes derived from both SWBD and cDSWE prior masks followed reasonable patterns over large water bodies. At the Prairie Pothole Region site, the cDSWE-based results had fewer commission errors resulting from Bragg scattering from waves on open water, and fewer omission errors in the high-probability water class, compared to the results using SWBD (Figure 7A,C and subzoom). At the Delmarva site, more omissions of small and linear objects were observed in the results using SWBD as a prior mask (Figure 7B,D and subzoom).

Figure 7. Differences in classification results between using SWBD and cDSWE are labeled in light to dark orange on the classification maps (A,B) and subzoom maps (a,d).

Comparison with DSWE Product
The classified Sentinel-1 maps and DSWE products from Landsat-8 images acquired within three days of the Sentinel-1 data are presented in Figure 8. Results indicate similar spatial patterns at both sites over water with high probability ("water-high"). For the Prairie Pothole Region site, the comparison between classified maps from Sentinel-1 and Landsat-8 DSWE indicated closer agreement between the two results (Figure 8A,C). For the Delmarva site, the classification map derived from Sentinel-1 showed patterns similar to those from Landsat-8 derived DSWE over the high-probability water class (Figure 8B,D).


Accuracy Assessment by High-Resolution Imagery
The classification results in 2016 from the Prairie Pothole Region site were validated using coincident high-resolution imagery from NAIP. Accuracy assessments based on data collected on two dates (5 July 2016 and 10 August 2016) for classifications derived using the SWBD and cDSWE datasets are presented in Table 3. Use of the cDSWE prior mask resulted in better accuracies (overall accuracy 82% to 93%, kappa 0.64 to 0.84) than use of the SWBD prior mask (overall accuracy 79% to 90%, kappa 0.54 to 0.77). The confusion matrix and accuracy estimates of the classification maps using the cDSWE prior mask are presented in Table 4.
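For readers unfamiliar with the two reported statistics, the sketch below computes overall accuracy and Cohen's kappa from a confusion matrix. The matrix values are invented for illustration and are not taken from Table 3 or Table 4; the function name is hypothetical.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of validation points with reference class i
    that were mapped as class j.
    """
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of points on the diagonal.
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: sum over classes of (row total x column total) / total^2.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / (total * total)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Illustrative two-class matrix (rows: reference water, land; columns: mapped).
accuracy, kappa = overall_accuracy_and_kappa([[45, 5], [10, 40]])
```

Kappa discounts the agreement expected by chance from the class totals, which is why it is lower than overall accuracy for the same matrix.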


Time-Series Classification Results
The time-series extent of surface water extracted using the Sentinel-1 data from April to September 2016 (Figure 9) revealed the dynamic nature of surface inundation in the Prairie Pothole Region site. The percentage of water summarized from the classification results for a subset in the PPR site (labelled 'A' in Figures 9 and 10) showed the response of surface water extent to increasing precipitation at the local weather station. Specifically, higher percentages of surface water were observed on several dates (6 May, 30 May, 11 June, 17 July and 22 August) following rises in precipitation (Figure 10).


Automation of Algorithms
The fully automatic random forest classification approach developed here to classify water and non-water over representative sites can be used to map the dynamics of surface waters at large geographical scales. Historically, preparing training datasets has often been a subjective and time-consuming step in this classification approach. In this paper, surface water extent was classified using training datasets automatically derived from prior class masks, thereby avoiding the time-consuming task of preparing training datasets. The pre-processing of Sentinel-1 data, preparation of prior masks (e.g., SWBD or cDSWE) and implementation of the classification were fully automated using open source software and packages (e.g., the SNAP toolbox gpt and several Python packages) in batch scripts. Only a few parameters (i.e., cDSWE thresholds, training sample size, random forest settings, and thresholds for class labeling) need to be stipulated by the user. However, the default values used in this study were found to be efficient in extracting surface water extent, and these parameter values were encoded within the batch scripts, allowing for full algorithm automation. Our algorithm computes water probabilities over an entire interferometric wide-swath mode Sentinel-1 image (about 9500 by 7000 pixels) in less than 10 minutes (running on 15 parallel processors). Applying the same classification approach to Sentinel-1 images over different wetlandscapes, we tested the automated algorithms and validated the classification results through comparison with coincident high-resolution imagery.

Improved Prior Masks for Classification
Separability of water and land classes in the prior masks is important for the classification model.Classification results using cDSWE were better than those using the SWBD at both study sites.This was expected for two reasons.First, SWBD is based on only one date (year 2000), whereas the 15-year composite DSWE probabilities account for seasonal/ephemeral inundation (Figure 4).Furthermore, the SWBD only maps water bodies that meet a minimum capture criteria of 600 m in length (i.e., about 4 × 4 30 m pixels) and 183 m in width (i.e., about 6 30 m pixels) for rivers and lakes, and 90 m in width (i.e., about 3 30 m pixel) for lake inlets/arms [44].These features were delineated in the SWBD product regardless of whether they are dynamically inundated or not.On the other hand, the

Automation of Algorithms
The fully automatic random forest classification approach developed here to classify water and non-water over representative sites can be used to map the dynamics of surface waters at large geographical scales. Historically, preparing training datasets has often been a subjective and time-consuming step in this classification approach. In this paper, surface water extent was classified using training datasets automatically derived from prior class masks, thereby avoiding the manual preparation of training datasets. The preprocessing of Sentinel-1 data, preparation of prior masks (e.g., SWBD or cDSWE) and implementation of the classification were fully automated in batch scripts using open source software and packages (e.g., the SNAP toolbox gpt and several Python packages). Only a few parameters (i.e., cDSWE thresholds, training sample size, random forest settings, and thresholds for class labeling) need to be stipulated by the user. However, the default values used in this study were found to be efficient in extracting surface water extent, and these parameter values were encoded within the batch scripts, allowing for full algorithm automation. Our algorithm computes water probabilities over an entire interferometric wide mode Sentinel-1 image (about 9500 by 7000 pixels) in less than 10 minutes (running on 15 parallel processors). The automated algorithms were tested by applying the same classification approach to Sentinel-1 images over different wetlandscapes, and the classification results were validated through comparison with coincident high-resolution imagery.
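
As an illustration of how training locations might be drawn automatically from a prior class mask, the sketch below samples a fixed number of pixels per class. It is a minimal example under stated assumptions: the function name `draw_training_samples`, the equal-size sampling per class and the seed are hypothetical choices, not the sampling scheme or default values used in this study.

```python
import random


def draw_training_samples(prior_mask, samples_per_class, seed=0):
    """Randomly pick pixel locations per class from a prior mask.

    prior_mask is a nested list with 1 = water, 0 = land and
    None = pixel excluded from training (assumed encoding).
    """
    rng = random.Random(seed)  # fixed seed keeps batch runs repeatable
    by_class = {0: [], 1: []}
    for r, row in enumerate(prior_mask):
        for c, label in enumerate(row):
            if label in by_class:
                by_class[label].append((r, c))
    samples = {}
    for label, locations in by_class.items():
        # Never request more samples than the class can supply.
        n = min(samples_per_class, len(locations))
        samples[label] = rng.sample(locations, n)
    return samples


# Toy prior mask (made-up values).
prior = [
    [1, 1, 0, 0],
    [1, None, 0, 0],
    [0, 0, 0, None],
]
samples = draw_training_samples(prior, samples_per_class=2, seed=1)
```

The sampled locations would then be used to read the Sentinel-1 backscatter values that serve as predictors for the random forest model.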

Improved Prior Masks for Classification
Separability of water and land classes in the prior masks is important for the classification model. Classification results using cDSWE were better than those using the SWBD at both study sites. This was expected for two reasons. First, SWBD is based on only one date (year 2000), whereas the 15-year composite DSWE probabilities account for seasonal/ephemeral inundation (Figure 4). Furthermore, the SWBD only maps water bodies that meet minimum capture criteria of 600 m in length (i.e., about 4 × 4 30 m pixels) and 183 m in width (i.e., about 6 30 m pixels) for rivers and lakes, and 90 m in width (i.e., about 3 30 m pixels) for lake inlets/arms [44]. These features were delineated in the SWBD product regardless of whether they are dynamically inundated or not. On the other hand, the cDSWE-based training samples represent water and land classes from which any uncertain pixels (i.e., those variably classified as land or water throughout the 15-year time series) were excluded, reducing the commission errors in the water class (Figure 7). Second, the spatial resolutions of these two datasets were different. SWBD has a nominal spatial resolution of 90 m and includes only large rivers and lakes for inland water. Because of the minimum capture criteria, small water bodies were excluded from SWBD. Being derived from Landsat imagery, the cDSWE open water class has a 30 m spatial resolution and reveals sub-hectare inundation. With training samples for dynamic and small water bodies from cDSWE, the random forest models trained using cDSWE were, therefore, more sensitive to small water bodies (Figure 7).
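
The exclusion of uncertain pixels can be sketched as a simple thresholding of the composite class probabilities. The example below is a minimal illustration that assumes per-pixel water and land probabilities between 0 and 1 and uses the 95% threshold described for the cDSWE prior masks; the function names and the `None` encoding for excluded pixels are hypothetical.

```python
WATER, LAND, EXCLUDED = 1, 0, None


def label_from_probabilities(p_water, p_land, threshold=0.95):
    """Label a pixel only when one class probability exceeds the threshold."""
    if p_water > threshold:
        return WATER
    if p_land > threshold:
        return LAND
    return EXCLUDED  # variably classified pixel, left out of training


def build_prior_mask(water_prob, land_prob, threshold=0.95):
    """Apply the labeling rule to two probability grids of equal shape."""
    return [
        [
            label_from_probabilities(pw, pl, threshold)
            for pw, pl in zip(w_row, l_row)
        ]
        for w_row, l_row in zip(water_prob, land_prob)
    ]


# Toy probabilities (made-up values): persistent water, an uncertain
# pixel, and persistent land.
water_prob = [[0.99, 0.50, 0.02]]
land_prob = [[0.01, 0.45, 0.97]]
prior_mask = build_prior_mask(water_prob, land_prob)
```

Only the first and last pixels receive a label here; the middle pixel is dropped, which mirrors how seasonally or ephemerally inundated pixels are kept out of the training samples.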

Validation of Products
Validation of dynamic surface water mapping products can be very challenging and resource-intensive. Other remote sensing data offer a cost-effective alternative to field data for validation. Although validation using coincident high-resolution imagery was only implemented in the Prairie Pothole Region site, this approach could be extended when more coincident high-resolution images are available. Future validation of products from this algorithm at large geographical scales will rely on other high-resolution NAIP, Planet constellation [60], and commercial imagery. A series of factors may contribute to changes in surface water extent. Time-series classification results from the Prairie Pothole Region site provided insights into the response of surface water extent to environmental conditions such as precipitation in an inland wetlandscape.
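
The accuracy measures reported in Table 3 (overall accuracy, kappa coefficient, and commission and omission errors) follow from a two-class confusion matrix. The sketch below uses the standard definitions; the function name and the counts are hypothetical and are not results from this study.

```python
def accuracy_metrics(tp, fn, fp, tn):
    """Two-class accuracy measures, with water as the class of interest.

    tp: reference water mapped as water
    fn: reference water mapped as non-water (omitted water)
    fp: reference non-water mapped as water (committed water)
    tn: reference non-water mapped as non-water
    """
    n = tp + fn + fp + tn
    overall = (tp + tn) / n
    producers = tp / (tp + fn)  # producer's accuracy for water
    users = tp / (tp + fp)      # user's accuracy for water
    # Chance agreement from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (overall - expected) / (1 - expected)
    return {
        "overall_accuracy": overall,
        "kappa": kappa,
        "producers_accuracy": producers,
        "omission_error": 1 - producers,
        "users_accuracy": users,
        "commission_error": 1 - users,
    }


# Made-up validation counts for illustration.
metrics = accuracy_metrics(tp=40, fn=10, fp=5, tn=45)
```

Omission error is the complement of producer's accuracy and commission error is the complement of user's accuracy, so a low producer's accuracy for water corresponds to many missed water pixels.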

Omission from Inundated Vegetation
Omission errors were observed from inundated vegetation along rivers and lakes (Figure 8). This was also found in previous studies, which concluded that fully polarimetric data (e.g., RADARSAT-2) are generally preferred for detecting flooded vegetation [23,31]. However, in the Sentinel-1 false-color backscatter composite (Figure 7), strong double-bounce scattering was observed in the VH backscatter (shown in green). The fact that these features were omitted is not unexpected, since the partial water class, defined as a pixel with mixed water and vegetation or soil reflectance signatures, was not included in the cDSWE prior masks due to the limited area with probabilities above 95% (Figure 7). Future research will test whether compositing DSWE class probabilities on a seasonal basis would increase the number of high-probability partial-water pixels, allowing the inclusion of this class in the random forest model. In addition, for identifying inundated vegetation, future research could also test coherence and phase information from Sentinel-1, and decomposition information from experimental quad-polarization mode data from RADARSAT-2. Additionally, the upcoming RADARSAT Constellation Mission (RCM) [61] will provide operational compact polarimetry mode data, which have recently been demonstrated to be more effective than dual polarization approaches and nearly as effective as full polarimetric data at detecting inundated vegetation [62].

Resolution-Induced Omission Error
Omission errors were also noticed among small water bodies. In this study, we applied a commonly used filter (Lee Sigma) to reduce speckle noise in radar images. Decreased spatial resolution was noticed after applying the filter, resulting in the omission of small water bodies in our classification maps. The lower producer's accuracy (64%) for water in the July 5 PPR results (Table 3) is mainly caused by this resolution-induced omission error. The July 5 PPR validation dataset covered a region (Figure 3, orange polygons in the middle of the scene) that has more small water bodies than the August 10 PPR validation region (Figure 3, red polygons in the west and east of the scene). Recent studies suggest that multi-temporal filters [63] and non-local speckle filters [64] could potentially reduce the speckle noise while maintaining the original spatial resolution of the data.
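
The effect of spatial smoothing on small water bodies can be illustrated with a much simpler filter than the one used here. The sketch below applies a plain 3 × 3 moving average (not the Lee Sigma filter) to a toy image in which a single dark pixel stands for a small water body; the backscatter values and the threshold are made up, and the threshold is only a stand-in for the random forest classifier.

```python
def boxcar_mean(image, size=3):
    """Moving-average filter; windows are clipped at the image border."""
    half = size // 2
    rows, cols = len(image), len(image[0])
    filtered = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            values = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out_row.append(sum(values) / len(values))
        filtered.append(out_row)
    return filtered


# Hypothetical linear-scale backscatter: bright land, one dark water pixel.
land_value, water_value = 0.10, 0.01
image = [[land_value] * 5 for _ in range(5)]
image[2][2] = water_value

threshold = 0.05  # hypothetical: values below this count as water

water_before = image[2][2] < threshold
smoothed = boxcar_mean(image)
water_after = smoothed[2][2] < threshold
```

In this toy case the single water pixel is detected before filtering but not afterwards, because averaging with its eight land neighbors raises its value to about 0.09.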

Commission from Smooth Objects
Commission errors were observed from smooth objects like pavements and sandy surfaces. Recent studies have shown that the use of sand masks or road masks can reduce the overestimations introduced by sand in arid regions [54] or roads in urban areas [45]. Future research will test including external masks to reduce these commission errors.
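
One way such an external mask might be applied is as a post-classification step that resets masked pixels to non-water. The sketch below is a minimal illustration; the function name, the mask encoding (True for road or sand) and the toy arrays are hypothetical, and this step was not part of the workflow in this study.

```python
def apply_exclusion_mask(class_map, exclusion_mask, non_water=0):
    """Set pixels flagged by an external mask (e.g., roads, sand) to non-water."""
    return [
        [
            non_water if excluded else label
            for label, excluded in zip(class_row, mask_row)
        ]
        for class_row, mask_row in zip(class_map, exclusion_mask)
    ]


# Toy classified map (1 = water) with a false water detection along a road.
classified = [
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
]
road_mask = [
    [False, False, False],
    [False, False, False],
    [True, True, True],
]
corrected = apply_exclusion_mask(classified, road_mask)
```

The same mask could also be applied earlier, to keep masked pixels out of the training samples.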

Conclusions
In this study, fully automated algorithms were developed to derive water probability and classified maps of water and non-water over two distinct wetlandscapes using Sentinel-1 SAR data and historical Landsat data. Results indicated that using training data based on composite dynamic surface water extent products (i.e., cDSWE) together with random forests allows for the automated detection of surface water using Sentinel-1 SAR data. The classification maps followed reasonable patterns over the main water bodies, while omission errors increased for small, linear targets and inundated vegetation, and commission errors were noticed from sand and dark pavements. Importantly, the time-series results (Figure 9) revealed seasonal dynamics of inundation and provided insights into the response of surface water extent to changes in precipitation. The fully automatic algorithms developed in this study can be implemented in an operational system to generate continental to global, long-term inundation records, including inundation-monitoring frameworks using satellite systems such as the NASA and Indian Space Research Organization (ISRO) SAR mission (NISAR) and Canada's RADARSAT Constellation Mission (RCM) [61,65].

Figure 1. Map of the study sites, including the Prairie Pothole Region (PPR) and the Delmarva Peninsula (DMV), with Landsat path/row (purple solid line) and Sentinel-1 path/frame (blue solid line). National Agriculture Imagery Program (NAIP) images (right column) are given for each site, representing inland (upper right) and coastal (bottom right) wetlandscapes.

Figure 2. Workflow for mapping surface water extent using Sentinel-1 SAR data.
Remote Sens. 2018, 10, x FOR PEER REVIEW
derived DSWE class probabilities. Greater than 95% in either class probability (water or land) was considered as persistent in that class. Each point was visually identified from coincident NAIP imagery acquired on the same day.

Figure 3. The Prairie Pothole Region site and footprints of remotely sensed data used in this study. The water and land classes are from dynamic surface water extent (DSWE) composite class probabilities. Sentinel-1 data were collected at midnight on 5 July 2016 (UTC) and 10 August 2016 (UTC). NAIP images were collected on the same days (noon to afternoon) as the Sentinel-1 data.

Figure 4. Land (A) and water (B) class probabilities summarized from composited dynamic surface water extent (cDSWE) water/land classes, land/water classes derived from DSWE classes using a 95% threshold (C), and the Shuttle Radar Topography Mission water body dataset (SWBD) water/land mask (D) over a site on the Delmarva Peninsula (see map). The zoom-in window (a-d) in the bottom right shows the difference in spatial detail between the two products. These two prior masks were used to train and calibrate the surface water models.

Figure 6. Gamma0_VV and Gamma0_VH (left column, (A,D)), density scatterplot (middle column, (B,E)), and binned scatterplot (right column, (C,F)) showing the separability between land and water classes defined by the SWBD in the Delmarva Peninsula. In the binned scatterplot, the backscatter coefficients (Gamma0_VV and Gamma0_VH in dB) are shown in cyan for water pixels and in red for land pixels. Grey bars represent 1 standard deviation.

Figure 7. Random forest classification results over the Prairie Pothole Region (PPR, left column, (A,B)) and Delmarva (DMV, right column, (D,E)) sites, using a prior mask from either the SWBD (top row) or composite DSWE (cDSWE) class probabilities (second row). The Sentinel-1 images (third row) are shown as false-color composites of gamma naught (dB) (R: VV, G: VH, B: VHrVV), and were acquired on 10 August 2016 (E) and 9 July 2016 (F). The subzoom windows (bottom row) show small water bodies in PPR (left, (a-c)) and linear streams (right, (d-f)) missing from the result using SWBD as the prior mask. Differences in classification results between using SWBD and cDSWE are labeled in light to dark orange colors on the classification maps (A,B) and subzoom maps (a,d).

Figure 8. Comparison of classification maps derived from near-coincident Sentinel-1 (upper row) and Landsat-8 DSWE (bottom row), over the sites of the Prairie Pothole Region (PPR, left column) and the Delmarva Peninsula (DMV, right column). The two Sentinel-1 classification maps were generated using cDSWE as the prior mask from Sentinel-1 data collected on (B) 9 July 2016 and (D) 10 August 2016. The two DSWE products were generated from Landsat-8 data collected on (A) 11 July 2016 and (C) 11 August 2016. Differences in classification results between this study and DSWE are labeled in light to dark orange colors on the classification maps from this study (A,B).

Figure 9. Classification results based on Sentinel-1 SAR data from April to September 2016, using cDSWE probabilities to derive training data for the Prairie Pothole Region site. The zoom-in window (A) was selected for illustrating change patterns of the surface water extent and weather data in Figure 10.

Figure 10. Time series of Sentinel-1 derived percentage of surface water extent (%) and precipitation (mm) for a subset in the Prairie Pothole Region site. The percentage of water was calculated from the time series of classification maps over the given inset in Figure 9. The daily precipitation data were collected by a nearby North Dakota Agricultural Weather Network (NDAWN) weather station in Robinson, ND [49].

Table 3. Comparison of overall accuracy, kappa coefficient, commission and omission errors between classification results using prior masks from SWBD and cDSWE over the Prairie Pothole Region site (PPR).
