A Submonthly Surface Water Classification Framework via Gap-Fill Imputation and Random Forest Classifiers of Landsat Imagery
Abstract
1. Introduction
2. Materials and Methods
2.1. Gather Landsat Imagery
- Collect image population: Use GEE to collect all Landsat 5, 7, and 8 Surface Reflectance Tier 1 images with at least one pixel located within the ROI, and clip each image to the ROI polygon. In regions with snow or frozen water during the winter months, we exclude images acquired under these conditions.
- Import JRC data: Merge each of the retained images with the JRC water classification layer of the same month and year as each image’s acquisition date.
- Nullify low quality pixels: Use the Landsat quality assurance bands to nullify pixels affected by cloud cover or sensor errors, as described in Appendix A [16].
- Perform feature selection: The spectral channels provided in each of the images may be used directly for classification or functions of these bands may be used to generate other features. The six standard reflectance bands among Landsat mission imagery include Red, Green, Blue, NIR (Near Infrared), SWIR_1 (Shortwave Infrared 1), and SWIR_2 (Shortwave Infrared 2) [6]. Various forms of the Normalized Difference Water Index (NDWI, MNDWI) and other indices related to the infrared Landsat sensor bands have consistently been shown to perform well as features for detecting surface water [10,17,18,19]. We utilize the following quantities as features:
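As a hedged illustration of the index-based features discussed in this step (not necessarily the exact quantities used in this framework), the standard McFeeters NDWI [21] and Xu MNDWI [22] definitions can be computed per pixel as sketched below. The function names and the example reflectance values are hypothetical.

```python
from typing import Optional


def normalized_difference(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Return (a - b) / (a + b), or None if an input is missing or the sum is zero."""
    if a is None or b is None:
        return None
    total = a + b
    if total == 0:
        return None
    return (a - b) / total


def water_indices(green, nir, swir1):
    """Compute NDWI (McFeeters) and MNDWI (Xu) for one pixel's reflectances."""
    return {
        "NDWI": normalized_difference(green, nir),      # (Green - NIR) / (Green + NIR)
        "MNDWI": normalized_difference(green, swir1),   # (Green - SWIR_1) / (Green + SWIR_1)
    }


# Hypothetical reflectance values for a water-like pixel (illustrative only).
pixel = water_indices(green=0.08, nir=0.02, swir1=0.01)
```

Returning `None` for nullified inputs keeps masked pixels masked in the derived features, so they can be handled later by the gap-filling step rather than silently producing a numeric value.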
2.2. Select Training Images
- Filter low quality images: Filter the collected images, requiring each retained image to meet the following thresholds: no more than of the ROI is covered by clouds and at least of the ROI is included in the image (Appendix A). Values of and allow for a sufficient pool of high-quality imagery for the validation study presented in Section 3.1. We select images only from the Landsat 5 archive for training, but Landsat 7 and 8 images can also be used.
- Sort imagery: Calculate the day of year (DOY) quantities from each image acquisition date and sort the images into k sets. In this implementation, we use DOY 120–180, DOY 180–240, and DOY 240–300 to split the growing season into three groups.
- Select representative training imagery: For the images in each group, calculate the total number of pixels labeled as water in the JRC classification layer and sort the images by total JRC water extent in increasing order. Determine which image is the first to fall within the percentile for the group and select this image to represent the associated DOY range. We select for our analyses. At this level, some of the selected images exhibit signal variation due to frequent flooding, which helps the classifier account for a range of surface water conditions.
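The three steps above can be sketched as a small selection routine. This is a minimal illustration under stated assumptions: the dictionary keys, the half-open treatment of the DOY boundaries, and the reading of "first to fall within the percentile" as a rank-based cutoff are choices made for this sketch, and the cloud, coverage, and percentile thresholds are left as parameters rather than fixed values.

```python
from datetime import date

# DOY ranges from the text; treated here as half-open [start, end) so that
# boundary days fall in exactly one group (an assumption of this sketch).
DOY_GROUPS = [(120, 180), (180, 240), (240, 300)]


def select_training_images(images, max_cloud, min_coverage, percentile):
    """Pick one representative image per DOY group.

    images: list of dicts with hypothetical keys 'id', 'date' (datetime.date),
            'cloud_frac', 'roi_frac', and 'jrc_water_pixels'.
    percentile: value in (0, 100]; the first image whose rank (by increasing
                JRC water extent) reaches this percentile is selected.
    """
    # Step 1: drop images that are too cloudy or cover too little of the ROI.
    kept = [im for im in images
            if im["cloud_frac"] <= max_cloud and im["roi_frac"] >= min_coverage]
    selected = {}
    for start, end in DOY_GROUPS:
        # Step 2: assign images to the DOY group.
        group = [im for im in kept
                 if start <= im["date"].timetuple().tm_yday < end]
        if not group:
            continue
        # Step 3: sort by JRC water extent and take the first image at the percentile.
        group.sort(key=lambda im: im["jrc_water_pixels"])
        n = len(group)
        for rank, im in enumerate(group, start=1):
            if 100.0 * rank / n >= percentile:
                selected[(start, end)] = im["id"]
                break
    return selected
```

A group with no qualifying images is simply omitted from the result, which makes it easy to detect DOY ranges that need relaxed thresholds.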
2.3. Cluster-Based Bias Correction Algorithm
- Cluster image: Given an image containing a set of pixels with b bands and the associated “ground truth” labels, use X-Means clustering [24] to assign each pixel to one of X clusters, such that . For this study, we used and in order to ensure at least 10 clusters would be identified in the training images. This specification often results in regions of each waterbody being separated into various clusters based on different depths, bordering terrains, or vegetation. X-means clustering intrinsically determines the optimal number of clusters X by minimizing AIC. Perform clustering on the b bands for each pixel alone, without including the information from the ground truth labels.
- Calculate proportion water: For each cluster , calculate the proportion of pixels with an associated binary label of 1 or with at least one vertically or horizontally adjacent pixel labeled as 1. These adjacent pixels assist in identifying pixels located along the border of waterbodies.
- Threshold at : For cluster x, if the proportion p is greater than a threshold , i.e., , set a temporary classification label to 1 for all pixels belonging to cluster x. If , set the temporary label to 0 for all pixels belonging to cluster x.
- Compare to JRC label: After iterating through all X clusters, compare the temporary label for each pixel to its original label. If the temporary and original labels are both 1 or both 0, include one copy of the pixel data with the original label in the new training dataset. If the temporary and original labels do not match, include two copies of the pixel data in the training dataset, the first labeled 0 and the second labeled 1.
- Sensitivity analysis: The threshold is selected in the validation study to reach desirable rates of sensitivity and specificity. See details in Section 3.1.
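The proportion, threshold, and comparison steps can be sketched as follows. This is an illustrative sketch only: it assumes cluster assignments have already been produced (for example by X-means, which is not implemented here), and the function name, argument layout, and row-major pixel ordering are hypothetical.

```python
def cbc_training_set(pixels, labels, clusters, shape, threshold):
    """Build a bias-corrected training set from clustered pixels.

    pixels:    list of per-pixel feature tuples in row-major order.
    labels:    list of 0/1 reference labels (e.g. from JRC), same order.
    clusters:  list of cluster ids per pixel (clustering done beforehand).
    shape:     (n_rows, n_cols) of the image grid.
    threshold: proportion above which a cluster is temporarily labeled water.
    """
    n_rows, n_cols = shape

    def water_or_adjacent(i):
        # True if the pixel is labeled 1 or has a 4-neighbor labeled 1.
        r, c = divmod(i, n_cols)
        if labels[i] == 1:
            return True
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < n_rows and 0 <= cc < n_cols and labels[rr * n_cols + cc] == 1:
                return True
        return False

    # Proportion of water-or-adjacent pixels in each cluster.
    counts, hits = {}, {}
    for i, k in enumerate(clusters):
        counts[k] = counts.get(k, 0) + 1
        hits[k] = hits.get(k, 0) + (1 if water_or_adjacent(i) else 0)
    temp = {k: 1 if hits[k] / counts[k] > threshold else 0 for k in counts}

    # Agreeing pixels are kept once; disagreeing pixels are added with both labels.
    training = []
    for i, k in enumerate(clusters):
        if temp[k] == labels[i]:
            training.append((pixels[i], labels[i]))
        else:
            training.append((pixels[i], 0))
            training.append((pixels[i], 1))
    return training
```

Adding a disagreeing pixel under both labels keeps it in the training data without committing to either the cluster-level or the original label.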
2.4. Impute Missing Values
- Mean estimation: Group images by every two consecutive years (the last group containing three years if the total number of years is odd), then estimate a pixel-wise annual mean function within each group separately. Compared to using all of the images to estimate one universal mean function in STFIT, the proposed step mean estimation can reflect long-term waterbody extent variation and is more accurate when the annual mean of any pixel is nonstationary.
- Temporal effect estimation: Subtract the estimated mean function from the observed pixels (i.e., calculate the residuals within each group after removing outliers), then pass the residuals to the temporal effect estimation in the STFIT algorithm.
- Spatial effect estimation: The STFIT algorithm is used to estimate the spatial effect for pixels in partially missing images by sparse FPCA techniques [32]. For images with completely missing data, we use a linear interpolation of the spatial effect estimates from the nearest before and after images by date (of those with less than pixels missing) to estimate the spatial effect.
- Imputation: The imputed pixel value is the sum of the mean estimate, temporal effect estimate, and spatial effect estimate.
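The bookkeeping around these steps (grouping years, interpolating the spatial effect for fully missing images, and summing the three components) can be sketched as below. The mean, temporal, and spatial estimators themselves belong to STFIT and are not reproduced; the function names and the use of a numeric day index are assumptions of this sketch.

```python
def group_years(years):
    """Split sorted years into consecutive pairs; an odd total puts three in the last group."""
    years = sorted(years)
    groups = [years[i:i + 2] for i in range(0, len(years) - 1, 2)]
    if len(years) % 2 == 1:
        if groups:
            groups[-1].append(years[-1])
        else:
            groups.append([years[-1]])
    return groups


def interpolate_spatial(day, before, after):
    """Linearly interpolate a spatial effect between the nearest usable images.

    before, after: (day_index, effect) pairs with before_day < day < after_day.
    """
    (d0, e0), (d1, e1) = before, after
    w = (day - d0) / (d1 - d0)
    return e0 + w * (e1 - e0)


def impute(mean, temporal, spatial):
    """Imputed value = mean estimate + temporal effect + spatial effect."""
    return mean + temporal + spatial
```

For example, `group_years(range(2000, 2005))` yields two groups, with 2004 joining the final pair so that no group contains a single year.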
2.5. Classify Imagery
- Train classifier: Train a random forest classifier using t trees, e.g., , and the k training images processed by the CBC algorithm for some , as described in Section 2.2 and Section 2.3 [25,33].
- Gap-fill images: If the data is downloaded for use with R, use the method described in Section 2.4 to gap-fill all the images of the ROI gathered in Section 2.1.
- Perform feature selection: For each pixel in the set of ROI imagery, use the Landsat surface reflectance bands to generate the same features as derived in the training data (Section 2.2).
- Classify images: For each image, use the trained random forest classifier to classify each pixel.
2.6. Detect Outlier Images
2.7. SWIM Classification Framework Summary
3. Results
3.1. Validation Study
3.2. New Orleans Case Study
3.3. Devils Lake Case Study
3.4. Colorado River Case Study
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Masking Landsat Data
- pixel_qa: 66, 68, 130, 132, 322, 324, 386, 388, 834, 836, 898, 900, 1346, 1348
- sr_cloud_qa: 1, 32
- sr_aerosol: 1, 2, 4, 32, 66, 68, 96, 100, 130, 132, 160, 164, 194, 224, 228
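A minimal sketch of how such value lists might be applied is given below. It assumes the listed values denote acceptable pixel states to retain, which the list above does not state explicitly, so that reading should be checked against the Landsat product documentation before use. The function name and the per-pixel dictionary layout are hypothetical.

```python
# QA values copied from the list above. This sketch ASSUMES they denote
# acceptable pixel states to keep; that interpretation is not confirmed here.
KEEP = {
    "pixel_qa": {66, 68, 130, 132, 322, 324, 386, 388, 834, 836, 898, 900, 1346, 1348},
    "sr_cloud_qa": {1, 32},
    "sr_aerosol": {1, 2, 4, 32, 66, 68, 96, 100, 130, 132, 160, 164, 194, 224, 228},
}


def mask_pixel(bands, qa):
    """Return the band values, or None if any supplied QA band has a value outside KEEP."""
    for name, value in qa.items():
        if name in KEEP and value not in KEEP[name]:
            return None  # nullify the pixel
    return bands
```

Only the QA bands actually supplied for a pixel are checked, since the available QA bands differ between Landsat missions.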
References
- NASA. Earth System Science Data Resources; National Aeronautics and Space Administration (NASA): Greenbelt, MD, USA, 2011.
- Friedl, M.; McIver, D.; Hodges, J.; Zhang, X.; Muchoney, D.; Strahler, A.; Woodcock, C.; Gopal, S.; Schneider, A.; Cooper, A.; et al. Global land cover mapping from MODIS: Algorithms and early results. Remote Sens. Environ. 2002, 83, 287–302.
- Homer, C.; Huang, C.; Yang, L.; Wylie, B.; Coan, M. Development of a 2001 National Land-Cover Database for the United States. Photogramm. Eng. Remote Sens. 2004, 70, 829–840.
- Pekel, J.F.; Cottam, A.; Gorelick, N.; Belward, A.S. High-resolution mapping of global surface water and its long-term changes. Nature 2016, 540, 418.
- Guo, M.; Li, J.; Sheng, C.; Xu, J.; Wu, L. A review of wetland remote sensing. Sensors 2017, 17, 777.
- Wulder, M.A.; White, J.C.; Loveland, T.R.; Woodcock, C.E.; Belward, A.S.; Cohen, W.B.; Fosnight, E.A.; Shaw, J.; Masek, J.G.; Roy, D.P. The global Landsat archive: Status, consolidation, and direction. Remote Sens. Environ. 2016, 185, 271–283.
- Huang, C.; Chen, Y.; Zhang, S.; Wu, J. Detecting, extracting, and monitoring surface water from space using optical sensors: A review. Rev. Geophys. 2018, 56, 333–360.
- Feng, M.; Sexton, J.O.; Channan, S.; Townshend, J.R. A global, high-resolution (30-m) inland water body dataset for 2000: First results of a topographic–spectral classification algorithm. Int. J. Digit. Earth 2016, 9, 113–133.
- Mueller, N.; Lewis, A.; Roberts, D.; Ring, S.; Melrose, R.; Sixsmith, J.; Lymburner, L.; McIntyre, A.; Tan, P.; Curnow, S.; et al. Water observations from space: Mapping surface water from 25 years of Landsat imagery across Australia. Remote Sens. Environ. 2016, 174, 341–352.
- Rokni, K.; Ahmad, A.; Selamat, A.; Hazini, S. Water feature extraction and change detection using multitemporal Landsat imagery. Remote Sens. 2014, 6, 4173–4189.
- Verpoorter, C.; Kutser, T.; Tranvik, L. Automated mapping of water bodies using Landsat multispectral data. Limnol. Oceanogr. Methods 2012, 10, 1037–1050.
- Markham, B.; Storey, J.; Williams, D.; Irons, J. Landsat sensor performance: History and current status. IEEE Trans. Geosci. Remote Sens. 2004, 42, 2691–2694.
- Feyisa, G.L.; Meilby, H.; Fensholt, R.; Proud, S.R. Automated Water Extraction Index: A new technique for surface water mapping using Landsat imagery. Remote Sens. Environ. 2014, 140, 23–35.
- Isikdogan, F.; Bovik, A.C.; Passalacqua, P. Surface water mapping by deep learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 4909–4918.
- Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017.
- Foga, S.; Scaramuzza, P.L.; Guo, S.; Zhu, Z.; Dilley, R.D.; Beckmann, T.; Schmidt, G.L.; Dwyer, J.L.; Hughes, M.J.; Laue, B. Cloud detection algorithm comparison and validation for operational Landsat data products. Remote Sens. Environ. 2017, 194, 379–390.
- Du, Z.; Li, W.; Zhou, D.; Tian, L.; Ling, F.; Wang, H.; Gui, Y.; Sun, B. Analysis of Landsat-8 OLI imagery for land surface water mapping. Remote Sens. Lett. 2014, 5, 672–681.
- Fisher, A.; Flood, N.; Danaher, T. Comparing Landsat water index methods for automated water classification in eastern Australia. Remote Sens. Environ. 2016, 175, 167–182.
- Ji, L.; Zhang, L.; Wylie, B. Analysis of dynamic thresholds for the Normalized Difference Water Index. Photogramm. Eng. Remote Sens. 2009, 75, 1307–1317.
- Gao, B. NDWI-A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sens. Environ. 1996, 58, 257–266.
- McFeeters, S.K. The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. Int. J. Remote Sens. 1996, 17, 1425–1432.
- Xu, H. Modification of normalised difference water index (NDWI) to enhance open water features in remotely sensed imagery. Int. J. Remote Sens. 2006, 27, 3025–3033.
- Frazier, P.S.; Page, K.J. Water body detection and delineation with Landsat TM data. Photogramm. Eng. Remote Sens. 2000, 66, 1461–1467.
- Pelleg, D.; Moore, A.W. X-means: Extending K-means with efficient estimation of the number of clusters. In Proceedings of the Seventeenth International Conference on Machine Learning, Stanford, CA, USA, 29 June–2 July 2000; pp. 727–734.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Chen, J.; Zhu, X.; Vogelmann, J.E.; Gao, F.; Jin, S. A simple and effective method for filling gaps in Landsat ETM+ SLC-off images. Remote Sens. Environ. 2011, 115, 1053–1064.
- Yin, G.; Mariethoz, G.; Sun, Y.; McCabe, M.F. A comparison of gap-filling approaches for Landsat-7 satellite data. Int. J. Remote Sens. 2017, 38, 6653–6679.
- Zeng, C.; Shen, H.; Zhang, L. Recovering missing pixels for Landsat ETM + SLC-off imagery using multi-temporal regression analysis and a regularization method. Remote Sens. Environ. 2013, 131, 182–194.
- Zhu, X.; Liu, D.; Chen, J. A new geostatistical approach for filling gaps in Landsat ETM+ SLC-off images. Remote Sens. Environ. 2012, 124, 49–60.
- Zhu, W. Topics in Sparse Functional Data Analysis. Ph.D. Dissertation, Iowa State University, Ames, IA, USA, 2018.
- Yao, F.; Wang, J.; Wang, C.; Cretaux, J. Constructing long-term high-frequency time series of global lake and reservoir areas using Landsat imagery. Remote Sens. Environ. 2019, 232, 111210.
- Yao, F.; Müller, H.G.; Wang, J.L. Functional data analysis for sparse longitudinal data. J. Am. Stat. Assoc. 2005, 100, 577–590.
- Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22.
- Vuolo, F.; Ng, W.T.; Atzberger, C. Smoothing and gap-filling of high resolution multi-spectral time series: Example of Landsat data. Int. J. Appl. Earth Obs. Geoinf. 2017, 57, 202–213.
- Maechler, M.; Rousseeuw, P.; Croux, C.; Todorov, V.; Ruckstuhl, A.; Salibian-Barrera, M.; Verbeke, T.; Koller, M.; Conceicao, E.L.T.; Anna di Palma, M. Robustbase: Basic Robust Statistics. R package version 0.93-7. 2021. Available online: http://robustbase.r-forge.r-project.org/ (accessed on 22 March 2021).
- Geoffrey Gabbott, W. National Agriculture Imagery Program (NAIP); United States Department of Agriculture Farm Service Agency Aerial Photography Field Office: Salt Lake City, UT, USA, 2010.
- USGS. National Hydrography Geodatabase; United States Geological Survey (USGS): Washington, DC, USA, 2013.
- Lane, R.R.; Day, J.W.; Kemp, G.; Demcheck, D.K. The 1994 experimental opening of the Bonnet Carre Spillway to divert Mississippi River water into Lake Pontchartrain, Louisiana. Ecol. Eng. 2001, 17, 411–422.
- U.S. Army Corps of Engineers. Bonnet Carre Spillway; US Army Corps of Engineers New Orleans District: Norco, LA, USA, 2014.
- Todhunter, P.E.; Rundquist, B.C. Terminal lake flooding and wetland expansion in Nelson County, North Dakota. Phys. Geogr. 2004, 25, 68–85.
- Yu, L.; Wang, Z.; Tian, S.; Ye, F.; Ding, J.; Kong, J. Convolutional neural networks for water body extraction from Landsat imagery. Int. J. Comput. Intell. Appl. 2017, 16, 1750001.
- Masek, J.; Vermote, E.; Saleous, N.; Wolfe, R.; Hall, F.; Huemmrich, K.; Gao, F.; Kutler, J.; Lim, T. LEDAPS Landsat Calibration, Reflectance, Atmospheric Correction Preprocessing Code; Oak Ridge National Laboratory Distributed Active Archive Center: Oak Ridge, TN, USA, 2012.
- Vermote, E.; Roger, J.C.; Franch, B.; Skakun, S. LaSRC (Land Surface Reflectance Code): Overview, application and validation using MODIS, VIIRS, Landsat and Sentinel 2 data’s. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 8173–8176.
Method | Mean | Std Dev |
---|---|---|
JRC | −0.04830 | 0.0679 |
SWIM | −0.03960 | 0.0429 |
SWIM | −0.03410 | 0.0456 |
SWIM | −0.02850 | 0.0398 |
SWIM | −0.01970 | 0.0345 |
SWIM | −0.01690 | 0.0357 |
SWIM | −0.00885 | 0.0381 |
SWIM | 0.00317 | 0.0410 |
SWIM | 0.00960 | 0.0502 |
SWIM | 0.02130 | 0.0554 |
SWIM | 0.04550 | 0.0845 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Labuzzetta, C.; Zhu, Z.; Chang, X.; Zhou, Y. A Submonthly Surface Water Classification Framework via Gap-Fill Imputation and Random Forest Classifiers of Landsat Imagery. Remote Sens. 2021, 13, 1742. https://doi.org/10.3390/rs13091742