1. Introduction
Worldwide mass migration to urban areas results in land use/cover changes, climate change, and intensifying anthropogenic modifications to urban environments [1]. This directly brings about more unexpected variations in urban surface water, especially in the external morphological features of its coverage. Changes in urban surface water further affect aquatic biodiversity, human health, and even the urban ecological balance [2]. Urban surface water deficiencies would aggravate the urban heat island effect and disrupt the living environments of urban vegetation; conversely, surface water inundation would result in flooding and even high fatality because of associated waterborne diseases [3]. Therefore, determining the coverage of urban surface water is a crucial issue for urban environments.
Remote sensing is a powerful data source for acquiring prior and comprehensive knowledge of urban surface water [4,5]. It allows synoptic, permanent, and dynamic urban surface water monitoring and is clearly superior to conventional in-situ measurements [6,7]. Among current remote sensing sensors, Landsat sensors have the greatest reputation in urban monitoring because of their free availability and moderate spectral, temporal, and spatial resolutions. Therefore, in our study, we use Landsat imagery to investigate the urban surface water coverage problem.
Many studies have previously reported urban surface water extraction achievements using Landsat images. Regular water extraction methods can be categorized into three main groups [8,9]: (1) thematic classification methods [10,11,12]; (2) single-band thresholding methods [13,14]; and (3) water index methods [15,16,17].
Thematic classification methods formulate urban surface water extraction as a regular binary unsupervised or supervised classification problem on urban land cover types, and select surface water as the exclusive thematic class for mapping [10]. These methods tend to yield low accuracy in areas where the background land cover includes low albedo surfaces, such as asphalt roads and building shadows in urban areas [11]. Moreover, they use a Boolean set to classify each pixel as either water or non-water, and fail to achieve the desired accuracy, especially at the water-land (i.e., non-water) interface [12]. Single-band thresholding methods select a single diagnostic spectral band from Landsat images (e.g., band 5 from TM/ETM+) and delineate the urban surface water coverage with a manually defined threshold [18]. Accordingly, the subjectivity of the threshold selection can lead to an overestimated or underestimated result and, moreover, the extracted surface water is affected by shadow noise [16].
Unlike the above two groups, water index methods combine two or more spectral bands using algebraic operations to enlarge the divergence between water and non-water areas. McFeeters proposed the normalized difference water index (NDWI) to delineate urban surface water. The NDWI is implemented with a ratio model using the green band (i.e., band 2) and the near-infrared band (i.e., band 4) from Landsat TM/ETM+ data [15]. An empirical value of 0 is set as the threshold for extracting surface water from the raw Landsat images, and pixels with positive NDWI values are regarded as surface water. Unfortunately, the NDWI-derived surface water suffers from noise in built-up areas, and the threshold of 0 always results in an over-estimation of the surface water [16]. Subsequently, Xu presented another surface water index called the modified normalized difference water index (MNDWI) [16]. MNDWI improves on NDWI by replacing the near-infrared band (i.e., band 4) with the middle-infrared band (i.e., band 5) from Landsat TM/ETM+ images. MNDWI reduces the built-up area noise in NDWI, and it performs better than NDWI in extracting urban surface water where built-up areas dominate the image scene. Nevertheless, the threshold of MNDWI is difficult to estimate because it is scene-dependent, and this problem adversely affects the practical performance of MNDWI [8]. To address the instability of MNDWI, the automated water extraction index (AWEI) was presented, which combines multiple Landsat bands (i.e., bands 2, 4, 5, and 7 of Landsat TM/ETM+ images) [9]. The AWEI study argues that a threshold of 0 is a good initial value for urban surface water extraction with that index.
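The two normalized-difference indices described above can be illustrated with a minimal sketch. It assumes the standard normalized-difference form, (a - b) / (a + b), for both NDWI (green and near-infrared) and MNDWI (green and middle-infrared), and applies the default threshold of 0. The function names and the three reflectance triples (water, vegetation, built-up) are hypothetical values chosen for illustration only; they are not taken from the Landsat scenes used in this study.

```python
import numpy as np

def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b) per pixel, with 0 where the denominator is 0."""
    a = np.asarray(band_a, dtype=float)
    b = np.asarray(band_b, dtype=float)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndwi(green, nir):
    """NDWI from the green band (TM/ETM+ band 2) and near-infrared band (band 4)."""
    return normalized_difference(green, nir)

def mndwi(green, mir):
    """MNDWI from the green band (band 2) and middle-infrared band (band 5)."""
    return normalized_difference(green, mir)

def water_mask(index, threshold=0.0):
    """Pixels whose index value exceeds the threshold are labeled as water."""
    return index > threshold

# Hypothetical reflectances for three pixels: water, vegetation, built-up.
green = np.array([0.08, 0.06, 0.20])
nir = np.array([0.03, 0.40, 0.18])
mir = np.array([0.01, 0.20, 0.25])

ndwi_mask = water_mask(ndwi(green, nir))
mndwi_mask = water_mask(mndwi(green, mir))
print(ndwi_mask, mndwi_mask)
```

With these illustrative values, the built-up pixel has a slightly positive NDWI and is mislabeled as water at a threshold of 0, whereas its MNDWI is negative. This mirrors the built-up noise that motivated the replacement of the near-infrared band with the middle-infrared band.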
The above three types of methods greatly benefit studies of urban surface water extraction. However, the problem of mixed pixels still exists in the urban surface water extraction procedure when using moderate spatial resolution Landsat images. In particular, the problem becomes more pronounced when extracting accurate boundaries of surface water. A simple cause of this problem is that the scale of urban land cover is often smaller than the field of view of the Landsat TM/ETM+ sensor (30 m) [19,20]. Subsequently, a few sub-pixel classifiers were presented to handle the mixed pixel problem. Sethre proposed a sub-pixel classifier named analysis spectral analytical process (AASAP), which aimed to extend the regular classifier into the sub-pixel field to detect the size and shape of ponds [21]. The classifier focuses on sub-pixel wetlands or ponds and requires careful verification when applied to urban water extraction. Sun optimized the training samples with mixed training samples and then combined them with the support vector machine (SVM) classifier to improve the urban surface water extraction results [22]. However, the scheme suffers from slow computational speed and complicated manual operations, which seriously restricts its real-world application in other urban areas.
Spectral unmixing is an alternative technique that can be used to solve the mixed pixel problem encountered in urban environments. It can be classified into linear spectral unmixing (LSU) and nonlinear spectral unmixing (NLSU), according to the mathematical assumptions about the mixing patterns of urban land covers in the study area [23]. Numerous applications exploit the powerful performance of LSU in converting spectral information into physical abundances of materials on the earth’s surface [23]. Researchers have previously made some attempts to address the surface water extraction problem using spectral unmixing. Zhou integrated a multiscale extraction scheme with spectral mixture analysis techniques to improve water extraction in urban environments from moderate spatial resolution satellite images [24]. The distinguishing feature of this work is its multiscale scheme, which conducts surface water extraction in multiscale local regions in order to refine the result. Xie combined the water index NDWI with LSU and proposed an automatic subpixel water mapping (ASWM) method to map urban surface water at the sub-pixel scale [25]. Pure water extracted from NDWI and water fractions of mixed water-land pixels estimated from LSU constitute the final urban surface water map. In contrast to previous research, we propose a low albedo fraction (LAF) method based on LSU to extract urban surface water from Landsat imagery. In comparison to all of the above methods, our LAF method has three major advantages:
- (1) The LAF method builds on the H-L-V [23] (i.e., high albedo-low albedo-vegetation) spectral mixture analysis of urban surface reflectances, and investigates the urban surface water extraction problem with the low albedo fraction map. Accordingly, our idea differs from the above water extraction methods, especially the sub-pixel classifiers and the spectral unmixing methods by Zhou [24] and Xie [25].
- (2) The LAF method uses a steady initial threshold of 1, which significantly reduces the work of parameter tuning in LAF. By contrast, the current spectral unmixing-based methods by Zhou and Xie cannot provide a stable threshold for fraction segmentation. The water index methods also suffer from the unstable initial threshold problem. Therefore, the LAF is easier to implement in real-world applications than other methods, such as spectral unmixing methods and water index methods.
- (3) The LAF method obtains high extraction accuracies for urban surface water, and it significantly improves the accuracy of sub-pixel surface water extraction when compared against MNDWI and AWEI.
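To make the idea of an H-L-V fraction map concrete, the following is a minimal sketch of linear spectral unmixing with three endmembers. It is not the LAF implementation itself: it assumes a plain least-squares solution with a softly enforced sum-to-one constraint, it does not enforce non-negativity, and it leaves out the endmember selection and threshold segmentation steps. The function name `unmix_hlv`, the `weight` parameter, and the six-band endmember spectra are all hypothetical and chosen only for illustration.

```python
import numpy as np

def unmix_hlv(pixels, endmembers, weight=1e3):
    """Estimate H, L, V fractions per pixel by sum-to-one least squares.

    pixels:     (n_pixels, n_bands) reflectances
    endmembers: (3, n_bands) spectra for high albedo, low albedo, vegetation
    weight:     strength of the sum-to-one constraint (an assumption)
    """
    E = np.asarray(endmembers, dtype=float).T          # (n_bands, 3)
    X = np.asarray(pixels, dtype=float).T              # (n_bands, n_pixels)
    # Append a heavily weighted row that pushes f_H + f_L + f_V toward 1.
    A = np.vstack([E, weight * np.ones((1, E.shape[1]))])
    B = np.vstack([X, weight * np.ones((1, X.shape[1]))])
    fractions, *_ = np.linalg.lstsq(A, B, rcond=None)  # (3, n_pixels)
    return fractions.T                                 # (n_pixels, 3)

# Hypothetical six-band endmember spectra (not measured values).
high = np.array([0.30, 0.32, 0.35, 0.38, 0.40, 0.38])
low = np.array([0.05, 0.04, 0.03, 0.02, 0.01, 0.01])
veg = np.array([0.04, 0.07, 0.05, 0.45, 0.22, 0.10])
endmembers = np.stack([high, low, veg])

# A synthetic mixed pixel (20% high, 70% low, 10% vegetation) and a pure low albedo pixel.
mixed = 0.2 * high + 0.7 * low + 0.1 * veg
fractions = unmix_hlv(np.stack([mixed, low]), endmembers)
low_albedo_fraction = fractions[:, 1]
print(np.round(fractions, 3))
```

Because the synthetic pixels are exact linear mixtures of the assumed endmembers, the sketch recovers their fractions; real Landsat pixels contain noise and endmember variability, so a constrained solver would normally be preferred in practice.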
2. Test Sites and Datasets
The test sites used in the study are located in three representative metropolises of China: Wuhan, Shanghai, and Guangzhou. Different surface features of the urban environments (e.g., different spatial patterns of land covers and different urban backgrounds) of the three sites render them good candidates for testing the proposed LAF method. The Wuhan metropolis lies in one of the fastest-growing regions in central China, and it is becoming a significant strategic center for the rejuvenation of the Chinese nation. Wuhan is centered at the confluence of the Yangtze River and Han River, as shown in Figure 1a. Shanghai is a famous international metropolis, and it is known for its advanced economy, shipping, and finance. The Huangpu River in Figure 1b is very important for the health and wellbeing of people in Shanghai. Guangzhou is an important port in China. The Pearl River in Figure 1c runs around Guangzhou city, and is a vital source of drinking water. Figure 1 illustrates the different surface characteristics of all three metropolises, where it can be seen that they have similar land cover types, including built-up surfaces, tall buildings, rivers, and vegetation.
Landsat images of the three metropolises were acquired from the website of the United States Geological Survey (USGS) (available at http://www.glovis.usgs.gov) [26], and the subsets cover the main urban background types and surface water for extraction. The downloaded Landsat imagery is a Level-1 precision- and terrain-corrected product (L1T). The Landsat images used are free of clouds in order to avoid any negative effects from cloud cover. A reference image was used to determine the ground truth of water pixels in the Landsat images, and it greatly helped in evaluating the accuracy of the extracted surface water at both the pixel level and the sub-pixel level. The original sources of the reference data were high spatial-resolution pan-sharpened QuickBird images from DigitalGlobe, and the JPEG format image at 4 m spatial resolution was exported from Google Earth Pro (available at www.google.com). We selected high spatial-resolution images (HSRI) with acquisition times as close as possible to those of the Landsat images, and tried our best to ensure that the land-cover classes of the Landsat images and the Google Earth images were the same for the same site. Table 1 lists detailed information about the reference data and Landsat images. The HSRI data were geo-referenced to the Landsat images to unify the spatial references of the corresponding pixels in both datasets. The manual co-registration was carefully undertaken with a Root Mean Square Error (RMSE) of no more than 0.3 pixels, and 19 control points were manually selected from each image. The “true” boundaries of urban surface water at the test sites were manually digitized on screen from the reference data, and were then rasterized at 4 m spatial resolution.
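As an illustration of how a co-registration RMSE in pixel units can be checked against a tolerance such as 0.3 pixels, the following sketch computes the root mean square of the planar distances between matched control points. The exact formula used in this study is not stated in the text, so this common definition is an assumption, and the function name and the four control-point coordinates are hypothetical (the study used 19 points per image).

```python
import numpy as np

def registration_rmse(reference_xy, registered_xy):
    """RMSE (in pixels) of the planar offsets between matched control points."""
    ref = np.asarray(reference_xy, dtype=float)
    reg = np.asarray(registered_xy, dtype=float)
    residuals = reg - ref                          # (n_points, 2) offsets in x and y
    squared_dist = np.sum(residuals ** 2, axis=1)  # squared distance per point
    return float(np.sqrt(np.mean(squared_dist)))

# Hypothetical control points (column, row) in Landsat pixel coordinates.
reference = np.array([[10.0, 10.0], [200.0, 15.0], [20.0, 180.0], [190.0, 170.0]])
offsets = np.array([[0.2, 0.1], [-0.1, 0.2], [0.1, -0.2], [-0.2, -0.1]])
registered = reference + offsets

rmse = registration_rmse(reference, registered)
print(f"RMSE = {rmse:.3f} pixels, within tolerance: {rmse <= 0.3}")
```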
5. Discussion
In the above experiments, we implemented LAF to extract urban surface water from Landsat imagery of three metropolises: Wuhan, Shanghai, and Guangzhou. The extraction results were evaluated in terms of per-pixel accuracy and sub-pixel accuracy and were compared with two state-of-the-art methods, AWEI and MNDWI. All the experimental results demonstrate the superiority of LAF over the other two methods.
First, in the per-pixel accuracy estimation experiment on the three test sites, our LAF shows better performance in differentiating urban surface water from other ground objects (e.g., building roofs, roads, and vegetation), especially in the image scenes of Wuhan and Shanghai. In our estimation, the better per-pixel accuracy results from two main causes. The first is that the H-L-V linear mixture model can explain the reflectance features of land covers in Landsat imagery, while also avoiding negative effects from soil. The second is that the multiple selection schemes maximize the divergence of the three endmembers of high albedo, low albedo, and vegetation, which guarantees the three vertices of the triangular topology in the mixing space of all land covers of urban environments.
Second, with regard to the sub-pixel accuracy estimation results on the three test sites, our LAF performs better at recognizing water fractions in boundary mixed pixels. The LSU basis of our method means that it is better able to identify water fractions in boundary mixed pixels, using a fraction threshold on low albedo. In contrast, AWEI and MNDWI cannot avoid the large uncertainty in boundary water pixels that originates from the hard binary classification of water and non-water at the pixel level.
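The limitation of hard binary labels at boundary pixels can be illustrated with simple arithmetic. A 30 m Landsat pixel covers 900 m², so a pixel that is 40% water is misrepresented by at least 360 m² whichever binary label it receives, whereas a fraction estimate is only off by its estimation error. The sketch below is illustrative only: the function names and the true and estimated fractions along a shoreline transect are hypothetical and are not results from this study.

```python
import numpy as np

PIXEL_AREA_M2 = 30.0 * 30.0  # one Landsat TM/ETM+ pixel, 30 m on a side

def best_case_hard_error(true_fraction):
    """Smallest water-area error (m^2) any water/non-water label can reach per pixel."""
    f = np.asarray(true_fraction, dtype=float)
    return np.minimum(f, 1.0 - f) * PIXEL_AREA_M2

def fraction_error(true_fraction, estimated_fraction):
    """Water-area error (m^2) per pixel when a water fraction is estimated."""
    f = np.asarray(true_fraction, dtype=float)
    g = np.asarray(estimated_fraction, dtype=float)
    return np.abs(f - g) * PIXEL_AREA_M2

# Hypothetical transect across a shoreline: pure water, two mixed pixels, pure land.
true_fraction = np.array([1.0, 0.7, 0.4, 0.0])
estimated_fraction = np.array([1.0, 0.65, 0.45, 0.0])

hard_error = best_case_hard_error(true_fraction)
soft_error = fraction_error(true_fraction, estimated_fraction)
print(np.round(hard_error, 1), np.round(soft_error, 1))
```

For pure pixels both errors vanish; the difference arises only in the mixed boundary pixels, which is where the sub-pixel accuracy comparison is most sensitive.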
Finally, the threshold analysis shows that the LAF has a more stable threshold than the other two methods. For many water extraction methods, the threshold value for binary classification is difficult to estimate because of its data-driven nature [8]. Our LAF has the smallest variation in the threshold across the three test sites among all three methods, making the implementation of the method simpler. It is essential to note that the different endmember selection scheme described in [38] would also greatly affect the stability or value of the fraction threshold.
However, our work has several limitations that require further study. The first is that we cannot explain the theoretical reasons why the empirical threshold value of 1 performs well. The fraction relations between water and other urban land covers should be carefully analyzed in further experiments to explain the physical meaning of the recommended initial threshold. The second is that we did not carefully investigate the water extraction problem in the presence of cloud and SLC gaps. Many algorithms, including the multi-temporal linear regression algorithm [39] and the GNSPI algorithm [40], have been proposed to detect thick clouds and fill gap pixels in SLC-off Landsat imagery. Combining these algorithms with our LAF would be a promising direction for extending the LAF to urban water extraction from any archived Landsat image. The third is that the H-L-V linear mixture model restricts the application of LAF to other image scenes. It is not difficult to extend the LAF for the purposes of extracting urban wetlands and identifying water fractions from mixed vegetation-water pixels. Unfortunately, the method would not directly apply to other situations, such as open water or coastal wetlands, because the spectral features of their land covers do not satisfy the H-L-V linear mixture model, especially given the absence of high albedo surfaces such as building roofs and airports. In such cases, other linear mixture models or nonlinear mixture models might be a good addition to the proposed method. The fourth is that the endmember selection scheme involves too many manual operations, which might restrict the application of LAF to very large image scenes. An automatic or intelligent scheme should be further investigated to meet the demands of complicated image scenes in massive Landsat datasets. The last is that recently proposed methods, including the enhanced water index (EWI) [39] and dynamic surface water extent (DSWE) [40], have not been considered in comparisons with the LAF. Further performance comparison with modifications of MNDWI and newly proposed methods on more Landsat images is essential to promote the LAF in real-world applications.