Open Access Article

Mapping Urban Extent at Large Spatial Scales Using Machine Learning Methods with VIIRS Nighttime Light and MODIS Daytime NDVI Data

Center for International Earth Science Information Network (CIESIN), Earth Institute, Columbia University, Palisades, NY 10964, USA
*
Author to whom correspondence should be addressed.
Remote Sens. 2019, 11(10), 1247; https://doi.org/10.3390/rs11101247
Received: 28 April 2019 / Revised: 20 May 2019 / Accepted: 21 May 2019 / Published: 27 May 2019
(This article belongs to the Special Issue Advances in Remote Sensing with Nighttime Lights)

Abstract

Urbanization poses significant challenges to sustainable development, disaster resilience, climate change mitigation, and environmental and resource management. Accurate urban extent datasets at large spatial scales are essential for researchers and policymakers to better understand urbanization dynamics and their socioeconomic drivers and impacts. While high-resolution urban extent data products - including the Global Human Settlements Layer (GHSL), the Global Man-Made Impervious Surface (GMIS), the Global Human Built-Up and Settlement Extent (HBASE), and the Global Urban Footprint (GUF) - have recently become available, intermediate-resolution urban extent data products, including SEDAC’s 1 km Global Rural-Urban Mapping Project (GRUMP), MODIS 1 km, and MODIS 500 m, still have many users, and a recent study has demonstrated that products at around 500 m resolution are more appropriate for urbanization process analysis than those at higher resolutions (30 m). The objective of this study is to improve large-scale urban extent mapping at an intermediate resolution (500 m) using machine learning methods by combining the complementary nighttime Visible Infrared Imaging Radiometer Suite (VIIRS) and daytime Moderate Resolution Imaging Spectroradiometer (MODIS) data, taking the conterminous United States (CONUS) as the study area. The effectiveness of commonly used machine learning methods, including random forest (RF), gradient boosting machine (GBM), neural network (NN), and their ensemble (ESB), has been explored.
Our results show that these machine learning methods can achieve similarly high accuracies across all accuracy metrics (>95% overall accuracy, >98% producer’s accuracy, and >92% user’s accuracy) with Kappa coefficients greater than 0.90, which have not been achieved in the existing data products or by previous studies; the ESB is not able to produce significantly better accuracies than the individual machine learning methods; the total misclassifications generated by GBM are more than those generated by RF, NN, and ESB by 14%, 16%, and 11%, respectively, with NN having the fewest total misclassifications. This indicates that high-accuracy intermediate-resolution urban extent data products at large spatial scales can be achieved by using these machine learning methods, especially NN and RF, with the combination of VIIRS nighttime light and MODIS daytime normalized difference vegetation index (NDVI) data. The methodology has the potential to be applied to annual continental-to-global scale urban extent mapping at intermediate resolutions.
Keywords: urbanization; urban extent; urban land use; remote sensing; machine learning; VIIRS; MODIS

1. Introduction

Over 50% of the global population already lives in urban areas, and two-thirds is expected to live in urban areas by 2050 [1]. Urban population growth and the associated socioeconomic development have caused intensive urban expansion [2,3] and, in turn, greater poverty and environmental degradation [4], which pose significant challenges to sustainable development, disaster resilience, climate change mitigation, and environmental and resource management in urban areas [5,6,7]. As a result, city governments may not be able to provide services for the increased population, elevated city energy use often leads to greater air pollution, and the risk of urban environmental hazards can be magnified [5]. Studies have shown that the global urbanization rate reached about 54% in 2014 [1], and urban land areas will grow by 1.2 million km² by 2030 if the current trend continues [8]. To better understand the patterns, dynamics, drivers, and impacts of urban expansion and to effectively support decision making regarding sustainable urban development, it is fundamental to obtain timely, accurate, and consistent measurements of urban extent at large spatial scales [9,10,11].
Satellite remote sensing has been widely used in urban area mapping since Earth observation satellite data became available [10,12,13,14,15,16]. While high-resolution urban extent data products - including the Global Human Settlements Layer (GHSL), the Global Man-Made Impervious Surface (GMIS), the Global Human Built-Up and Settlement Extent (HBASE), and the Global Urban Footprint (GUF) derived from Landsat and TanDEM-X data - have recently become available [16,17,18], intermediate-resolution urban extent data products are still valuable, especially for large-scale urbanization analysis. The reasons include: (1) intermediate-resolution satellite images have proven effective in urban extent extraction at regional to global scales [10,16,19,20] and are more computationally efficient; (2) intermediate-resolution urban extent data products generated from satellite data, such as the 1 km NASA Socioeconomic Data and Applications Center (SEDAC)’s Global Rural-Urban Mapping Project (GRUMP), MODIS 1 km, and MODIS 500 m [10,21,22], still attract many analysis and modeling users [23,24,25,26,27]; (3) considering that urban and rural areas are not necessarily discrete classes but more of a continuum [28,29], intermediate-resolution data products may better reflect demographic and sociological conditions of urban areas [12,15], which include not just built-up areas but the urban fabric of core urban areas and surrounding hinterlands and commuter-sheds; (4) broader definitions of what constitutes urban areas are useful for studies of urban morphology, energy use, climate change, and sustainability [14], and also for research on rural agricultural systems where one may wish to exclude all but the smallest built-up areas; and (5) a recently published study has demonstrated that urban extent data products at 480 m resolution are more appropriate than those at high resolution (30 m) for urbanization process analysis at large spatial scales [30].
Although several global urban extent data products at intermediate resolutions are currently available (Table 1), significant inconsistencies remain among them [10,11]. For example, the total areas of global urban extent measured by IMPSA, MODIS 1 km, MODIS 500 m, and GRUMP differ by 15% to 516% (572,000, 657,000, 727,000, and 3,524,000 km², respectively). While one of the reasons for these inconsistencies is that different groups and disciplines define urban areas somewhat differently [10,31,32,33,34,35], these definitions are highly correlated. This calls into question the accuracy of each map’s depiction of urban areas. Based on Schneider’s assessment of these data products, their overall accuracies ranged from 73% (GRUMP) to 93% (MODIS 500 m). However, their producer’s accuracy (how often real urban areas on the ground are correctly shown on the classified map) is generally low (IMPSA and GLC2000 < 50%, MODIS 500 m and MODIS 1 km around 75%, and GRUMP nearly 90%), and their user’s accuracy (how often the urban areas on the map are actually present on the ground) is also low (MODIS 500 m around 73%, GLC2000 66%, and IMPSA 65%), with Kappa coefficients ranging from only 0.28 to 0.65 [10]. More recent studies at intermediate resolutions [16,36,37,38,39,40,41,42,43,44,45,46,47,48] reported overall accuracies from 73% to 99% for all urban and non-urban features, with Kappa coefficients from 0.29 to 0.84, and, for urban features only, a producer’s accuracy around 80% and a user’s accuracy close to 90%. Therefore, research is still needed to develop an intermediate-resolution urban extent mapping methodology that can achieve consistently high accuracies (overall accuracy, producer’s accuracy, user’s accuracy, and Kappa), is repeatable for different times, and is scalable to continental-to-global scale applications [7].
Both nighttime light [7,24,34,36,37,38,39,40,41,42,43] and daytime spectral satellite data [10,13,14,15,32,33,35,44,45] have been studied in mapping urban extent, and they tend to be complementary for characterizing urban areas [16]. Therefore, the combination of nighttime light and daytime spectral data has the potential to overcome their individual limitations. However, only a few former studies have combined nighttime light data and daytime spectral data [16,19,20,46,47,48].
Machine learning methods have been demonstrated to perform well in land cover mapping [49,50,51,52], and have been effective in urban area mapping in recent years [11,13,15,16,44,45,48,53]. Schneider et al. employed a supervised decision tree algorithm (C4.5) with a one-year time series of MODIS 8-day composites of the seven land bands and the enhanced vegetation index (EVI), and produced the MODIS 500 m global urban extent data product [10]. Hu and Weng estimated impervious surfaces from medium spatial resolution imagery using multi-layer perceptron neural networks [13]. Zhou et al. developed a cluster-based method to map urban areas from the Defense Meteorological Satellite Program/Operational Linescan System (DMSP/OLS) nighttime light data [43]. Wan et al. relied on the Terra MODIS surface reflectance datasets and a positive and unlabeled learning (PUL) method for mapping US urban extent [45]. Zhang et al. applied the one-class support vector machine (OCSVM) to classify different combinations of the DMSP/OLS stable nighttime light (NTL) data, MODIS normalized difference vegetation index (NDVI) data, and land surface temperature (LST) data for regional urban extent mapping [48]. Wang et al. used a back propagation neural network to identify urban areas in China with VIIRS nighttime light and MODIS NDVI data as inputs [16]. Li et al. experimented with support vector machine (SVM) methods to extract urban extent from LJ1-01 and VIIRS nighttime light data [20].
Random forest (RF), gradient boosting machine (GBM), neural network (NN), and their ensemble (ESB) are commonly used machine learning methods in land cover mapping but have not been fully assessed in urban area mapping, especially at intermediate resolutions. The objective of this study is to explore the effectiveness of these machine learning methods for improving the accuracies of large-scale urban extent mapping at intermediate resolutions (500 m) based on the combination of the complementary VIIRS nighttime light and MODIS daytime NDVI data.

2. Materials and Methods

2.1. Study Area

This study takes the conterminous United States (CONUS) as the study area (Figure 1), for four reasons. First, the United States is one of the most highly urbanized countries, with intensive urbanization in recent decades. Based on statistics from the US Census, four out of five Americans lived in urban areas in the 2000s, and urbanization is not uniform across the country’s vast landscape, with the fastest urbanization occurring in the northeastern region [54]. Lopez’s study in 2014 [55] demonstrated that for 2010, there were 136 US metropolitan areas with a sprawl index ranging from 50 to 70, and 176 US metropolitan areas with a sprawl index greater than 75. The sprawl index values were calculated based on the formula:
$$SI_i = \left(\frac{S\%_i - D\%_i}{100} + 1\right) \times 50,$$
where:
  • $SI_i$ = sprawl index for metropolitan area $i$
  • $S\%_i$ = percentage of total population in low-density census tracts in metropolitan area $i$
  • $D\%_i$ = percentage of total population in high-density census tracts in metropolitan area $i$
Sprawl index values range between 0 and 100, with 100 representing the highest level of sprawl and 0 representing the lowest level of sprawl. In addition, the Joint Research Centre (JRC)’s degree of urbanization calculation indicates that from 1975 to 2015, the United States’ total built-up area increased from 80,417 km² to 161,379 km². Secondly, the application of VIIRS nighttime light data in large-scale urban extent mapping has not been fully studied for the CONUS region, especially regarding the use of machine learning methods [7,32,37,39,56,57]. Thirdly, the CONUS covers an area of about 7.6 million km² and contains various land cover types such as urban built-up areas, water, forests, grasslands, bare lands, croplands, wetlands, and shrubs. Fourthly, the urban extent data products for this region have abundant regional users from both the scientific research community and government agencies [24].
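The sprawl index arithmetic can be illustrated with a short Python sketch. It assumes the form of the index that includes a +1 offset inside the parentheses, which is the form that yields the stated 0 to 100 range; the function name and the input checks are illustrative and are not taken from [55].

```python
def sprawl_index(low_density_pct: float, high_density_pct: float) -> float:
    """Sprawl index for one metropolitan area on a 0-100 scale.

    low_density_pct  (S%): share of population in low-density census tracts.
    high_density_pct (D%): share of population in high-density census tracts.
    Assumes the form ((S% - D%) / 100 + 1) * 50.
    """
    for pct in (low_density_pct, high_density_pct):
        if not 0.0 <= pct <= 100.0:
            raise ValueError("percentages must lie in [0, 100]")
    if low_density_pct + high_density_pct > 100.0:
        raise ValueError("the two population shares cannot exceed 100% combined")
    return ((low_density_pct - high_density_pct) / 100.0 + 1.0) * 50.0


# Everyone in low-density tracts gives the maximum; an even split gives the midpoint.
print(sprawl_index(100.0, 0.0))
print(sprawl_index(40.0, 40.0))
```

Under this form, a metropolitan area whose low-density and high-density population shares are equal scores 50.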

2.2. Definition of Urban Area

The definition of an urban area varies across research perspectives [33,58,59]. For example, census-related urban studies refer mainly to population distributions, while those using nighttime lights or multi-spectral data may be related to economic conditions or “built-up areas” (physical attributes of the land surface) [9,10,23,31,57,60,61]. As the characteristics underlying all these definitions are correlated, most of the urban areas identified by the relevant methods are consistent. However, significant inconsistencies remain among the urban extent data products, especially the large differences in total urban area that are partially caused by the different definitions of an urban area. Physical attribute-based urban extent data products have broader application potential, including population analysis, economic research, disaster modeling, and environmental impact assessment. Therefore, this study employed the definition proposed by Schneider et al. [11]. That is, urban areas are defined by the physical attributes and land cover composition of the land surface: urban areas are places dominated by the built environment with a minimum mapping unit of 1 km by 1 km, which includes all non-vegetated, human-constructed elements such as buildings, roads, runways, etc. Here, ‘dominated’ implies that the coverage of human-constructed elements is greater than 50% in a 1 km by 1 km area. Based on this definition, when vegetation, water, and other non-human-constructed elements cover most of a 1 km by 1 km area, that area will not be considered an urban area, while any 1 km by 1 km area with over 50% built-up cover, whether the built-up elements are contiguous or not, will be considered an urban area in practice (Figure 2). Another reason to adopt this definition of an urban area in this study is that many recent research activities on urban extent mapping with satellite data used this physical attribute-based definition [10,11,16,32].
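As a minimal illustration of the "dominated" rule above, the following Python sketch labels one 1 km by 1 km cell from a finer binary built-up mask. The mask representation (1 for human-constructed, 0 otherwise) and the function name are assumptions made for the example; the study itself labeled cells by visual interpretation.

```python
from typing import Sequence


def is_urban_cell(built_up_mask: Sequence[Sequence[int]], threshold: float = 0.5) -> bool:
    """Return True when built-up cover in one 1 km x 1 km cell exceeds the threshold.

    built_up_mask is a hypothetical fine-resolution grid covering the cell,
    with 1 = human-constructed element and 0 = anything else.
    Contiguity is deliberately ignored: only the covered fraction matters.
    """
    total = sum(len(row) for row in built_up_mask)
    if total == 0:
        raise ValueError("mask must contain at least one pixel")
    built = sum(sum(row) for row in built_up_mask)
    return built / total > threshold


# Scattered (non-contiguous) built-up pixels still count toward the 50% rule.
print(is_urban_cell([[1, 0, 1], [0, 1, 1], [1, 0, 1]]))
```

Because the comparison is strict, a cell that is exactly half built-up is not labeled urban under this reading of "greater than 50%".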

2.3. Data and Preprocessing

2.3.1. Satellite Data

Two types of intermediate-resolution satellite data were utilized in this study: (1) the nighttime light (NTL) data of the Day/Night Band (DNB) from the Visible Infrared Imaging Radiometer Suite on the Suomi National Polar-orbiting Partnership Satellite (NPP-VIIRS), and (2) the Normalized Difference Vegetation Index (NDVI) from the Moderate Resolution Imaging Spectroradiometer (MODIS) (Figure 3). The capability of MODIS daytime spectral reflectance and NDVI for urban extent mapping has been extensively demonstrated by many researchers [10,11,14,35,44,45,62]. However, as urban areas are spectrally similar to non-vegetated or sparsely vegetated non-urban areas, such as uncropped soils or bare lands [6,59], relying solely on MODIS daytime spectral data for urban extent mapping often leads to classification errors [57].
Nighttime lights are straightforward to apply in urban extent mapping, as artificial lights in urban areas are easy to separate from the darker non-urban areas at night [63]. The most widely used nighttime light data are the stable light data products from the Defense Meteorological Satellite Program/Operational Linescan System (DMSP/OLS) [34,42,43,64]. While DMSP nighttime light datasets provide a longer time series of nighttime light observations (1992–2013) than VIIRS and a method exists to deblur the blooming effect [65], their applications are still affected by the coarser 1 km spatial resolution, light intensity saturation in urban areas, and intra-sensor calibration problems [66]. The nighttime light data provided by the VIIRS DNB band are superior to DMSP [67], with significant improvements including increased spatial resolution (15 arc-second, approximately 500 m), lower light imaging detection limits (~2 × 10⁻¹¹ W·cm⁻²·sr⁻¹), and higher radiometric quantization (14 bit), thus providing the potential to delineate and detect urban areas more effectively [1,37,67]. Because nighttime light emissions are related to a number of factors, including energy policies at the country level, levels of access to electricity, and measurable levels of luminosity affected by varied economic conditions, relying solely on nighttime light data may not produce accurate urban extent maps, especially at a continental-to-global scale [16,24,57].
Combination of VIIRS nighttime light data and MODIS daytime spectral data might overcome their individual limitations, thus achieving better performance in mapping urban extent than using MODIS or VIIRS data alone [16,32,46,68].
The Earth Observations Group (EOG) at NOAA/National Centers for Environmental Information (NCEI) is producing a version 1 suite of annual average radiance composite images using nighttime light data from VIIRS DNB, which was available only for 2015 at the time of this study (Figure 3 (Left)). Impacts from stray light, lightning, lunar illumination, and cloud cover have been filtered out of this dataset, and gas flares have also been removed by referencing the gas flare locations that accompany the nighttime light data provided by NCEI [69].
MODIS 16-Day 500 m (MOD13A1) vegetation index time series data are available at a global scale from the open cloud platform Google Earth Engine (GEE) [70], from 18 February 2000 to the present. Removing cloud effects from MODIS images is critical when using the data for land cover mapping. The greenest pixel compositing method builds a composite by selecting, for each pixel, the maximum normalized difference vegetation index (NDVI) value in a MODIS time series. A MODIS NDVI annual composite at 500 m for 2015 was created using the greenest pixel compositing method through the GEE API, downloaded to our local server (Figure 3 (Right)), and co-registered with the VIIRS nighttime light data.
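The idea behind greenest pixel compositing can be sketched in plain Python as a per-pixel maximum over the time series. This is only an illustration of the principle, not the GEE API call used in the study; representing masked (for example, cloudy) observations as None and storing grids as nested lists are assumptions made for the example.

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]


def greenest_pixel_composite(ndvi_series: List[Grid]) -> Grid:
    """Per-pixel maximum NDVI over a time series of equally sized grids.

    None marks a masked observation (an assumption of this sketch); a pixel
    that is masked in every grid stays None in the composite.
    """
    if not ndvi_series:
        raise ValueError("need at least one NDVI grid")
    rows, cols = len(ndvi_series[0]), len(ndvi_series[0][0])
    composite: Grid = []
    for r in range(rows):
        out_row: List[Optional[float]] = []
        for c in range(cols):
            valid = [g[r][c] for g in ndvi_series if g[r][c] is not None]
            out_row.append(max(valid) if valid else None)
        composite.append(out_row)
    return composite


# Three hypothetical 16-day grids for a 1 x 2 pixel area.
series = [[[0.2, None]], [[0.5, None]], [[0.1, 0.3]]]
print(greenest_pixel_composite(series))
```

Taking the maximum tends to suppress cloud-contaminated observations because clouds usually depress NDVI, which is why the composite doubles as a cloud-removal step.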

2.3.2. Reference Sample Data

For training and validating machine learning models, “ground truth” reference samples were collected for both urban and non-urban land cover types (e.g., forest, cropland, wetland, water, grassland, and bare land) based on high-resolution images (e.g., ESRI World Imagery) and visual interpretation for a given 1 km by 1 km area. Data recorded for each reference sample include the location of the sample site and its attribute (urban or non-urban).
Urban areas, unlike other land cover types, have special spatial characteristics. For example, in most urban areas, the core or central portions are covered by more human-constructed features (e.g., buildings and roads) and fewer natural features (e.g., vegetation) than the peripheral portions; relatively smaller urban areas tend to have all human-constructed features distributed uniformly within their boundaries, while the density of human-constructed features in bigger urban areas tends to decrease from the core or central portions to the peripheral portions; and urban areas located in the northeastern US contain more vegetation than those in the southwestern US. Therefore, during the process of reference sample data collection, we applied the stratified random sampling method to reduce sampling bias. That is, in addition to considering randomness when picking a reference sampling site, we also considered (1) the core-periphery spatial structure of urban areas, to balance the number of urban area samples for the core urban areas and peripheral urban areas, (2) balancing the number of urban area samples for bigger urban areas and smaller urban areas, and (3) balancing the number of urban area samples for urban areas located in different geographic regions. Ignoring such characteristics of urban areas when collecting reference samples may make the accuracy assessment falsely high and, thus, untrustworthy (e.g., using urban area samples collected only from the core urban areas achieved a user’s accuracy of 99.77%, a producer’s accuracy of 98.86%, and an overall accuracy of 99.15% with a Kappa coefficient of 0.9820 for the machine learning methods, as core urban areas are easy to identify and almost all the inconsistencies among the available urban extent data products occur in the peripheral portions of urban areas).
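A balanced draw of this kind can be sketched in Python as stratified random sampling over candidate sites. The stratum keys used here (zone, city size, region) mirror the three considerations above, but the candidate records, field names, and equal per-stratum quota are hypothetical; the study's samples were collected through visual interpretation rather than by a script.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List


def stratified_sample(
    candidates: List[dict],
    stratum_of: Callable[[dict], Hashable],
    per_stratum: int,
    seed: int = 0,
) -> List[dict]:
    """Draw up to per_stratum random candidates from every stratum."""
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    by_stratum: Dict[Hashable, List[dict]] = defaultdict(list)
    for item in candidates:
        by_stratum[stratum_of(item)].append(item)
    chosen: List[dict] = []
    for key in sorted(by_stratum, key=repr):  # stable order across runs
        pool = by_stratum[key]
        chosen.extend(rng.sample(pool, min(per_stratum, len(pool))))
    return chosen


# Hypothetical candidate sites: 2 zones x 2 city sizes x 2 regions, 10 sites each.
sites = [
    {"id": i, "zone": z, "size": s, "region": r}
    for z in ("core", "periphery")
    for s in ("large", "small")
    for r in ("northeast", "southwest")
    for i in range(10)
]
picked = stratified_sample(sites, lambda c: (c["zone"], c["size"], c["region"]), per_stratum=3)
print(len(picked))
```

Drawing the same number of sites from every stratum prevents easy core-area samples from dominating the validation set.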
For each reference sample site, as long as the surrounding 1 km by 1 km area contained more than 50% human-constructed features, it was recorded as an urban sample, and vice versa. As a result, small patches of urban forest or parks that account for less than 50% of a 1 km by 1 km area were also considered as urban areas.
A total of 2772 reference samples (1455 urban samples and 1317 non-urban samples) were collected for this study in two steps: (1) reference samples were collected solely for training the machine learning models - in total, 295 training samples (198 urban samples and 97 non-urban samples) were collected in this step; (2) after the urban extent data products were generated by each of the machine learning models, a new set of reference samples was collected solely for independent and rigorous accuracy assessment - in total, 2477 samples were collected in this step, including 1257 urban samples and 1220 non-urban samples (Figure 4).

2.4. Methods

Machine learning methods build mathematical models from training sample data to make predictions or decisions automatically and have been applied in intermediate-resolution remote sensing of urban extent. RF, GBM, and NN are three relatively mature and commonly-used machine learning methods in data analytics and have been increasingly applied to satellite image classifications for land cover mapping at different spatial scales and resolutions in recent years [11,16,49,51,71,72]. Machine learning ensembles are learning algorithms that construct a group of different classifiers and then classify the data by taking a weighted vote of the individual classifier predictions [73]. Such ensembles or model combinations are usually more accurate than a single classifier and were introduced to land cover mapping by Walsh in 2015 [51]. There have been no studies reported yet in the literature regarding the performance of RF, GBM, NN, and ESB in intermediate-resolution urban extent mapping with nighttime light and daytime satellite data as inputs.
To explore and compare the effectiveness of RF, GBM, NN, and ESB in mapping urban extent, exactly the same datasets were used as inputs, which include the VIIRS nighttime light luminosity annual composite, the MODIS NDVI annual composite, and the training reference samples. VIIRS nighttime light and MODIS NDVI were stacked together; therefore, at each 500 m pixel location, there is a two-dimensional vector:
$$Z(i, j) = \big(\mathrm{VIIRS}(i, j),\ \mathrm{MODIS}(i, j)\big),$$
where:
  • $\mathrm{VIIRS}(i, j)$ = VIIRS nighttime light luminosity at pixel $(i, j)$
  • $\mathrm{MODIS}(i, j)$ = MODIS NDVI at pixel $(i, j)$
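The stacking step amounts to pairing the two co-registered grids pixel by pixel. A minimal Python sketch follows, assuming both grids are nested lists of the same shape; the function name is illustrative.

```python
from typing import List, Sequence, Tuple


def stack_features(
    viirs: Sequence[Sequence[float]], ndvi: Sequence[Sequence[float]]
) -> List[List[Tuple[float, float]]]:
    """Build the per-pixel feature vector Z(i, j) = (VIIRS(i, j), MODIS(i, j))."""
    if len(viirs) != len(ndvi) or any(len(a) != len(b) for a, b in zip(viirs, ndvi)):
        raise ValueError("grids must be co-registered to the same shape")
    return [
        [(v, n) for v, n in zip(viirs_row, ndvi_row)]
        for viirs_row, ndvi_row in zip(viirs, ndvi)
    ]


# Hypothetical 2 x 2 grids: radiance on the left, NDVI on the right.
z = stack_features([[35.0, 0.4], [12.5, 0.0]], [[0.15, 0.80], [0.35, 0.65]])
print(z[0][0])
```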
The construction of the ESB is based on the outputs from the individual machine learning models [51,73]. As each of the three individual machine learning models uses the same reference samples and satellite data inputs, their predictions for urban and non-urban land cover types at unknown pixel locations are correlated inherently, which must be considered during the ensemble step to achieve better results. Linear stacking with the elastic net was used to address this issue through both ridge and lasso penalizations [74].
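To make the idea of linear stacking with an elastic net concrete, the sketch below fits stacking weights on the base models' urban probabilities by coordinate descent. It is a pure-Python illustration under stated assumptions: a squared-error loss with the glmnet-style penalty lam * (alpha * |w| + (1 - alpha) / 2 * w^2), an unpenalized intercept, and hypothetical function names and toy data. It is not the R implementation used in the study.

```python
from typing import List, Sequence, Tuple


def soft_threshold(value: float, penalty: float) -> float:
    """Shrink value toward zero by penalty (the lasso part of the update)."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def fit_elastic_net_stack(
    base_probs: Sequence[Sequence[float]],
    labels: Sequence[float],
    lam: float = 0.01,
    alpha: float = 0.5,
    n_sweeps: int = 200,
) -> Tuple[float, List[float]]:
    """Fit an intercept and one weight per base model by coordinate descent.

    base_probs[i] holds the base models' urban probabilities for sample i;
    labels[i] is 1 for urban and 0 for non-urban.
    """
    n, k = len(labels), len(base_probs[0])
    weights = [0.0] * k
    intercept = sum(labels) / n
    for _ in range(n_sweeps):
        for j in range(k):
            rho = 0.0  # correlation of model j with the partial residual
            z = 0.0    # mean squared value of model j's predictions
            for x, y in zip(base_probs, labels):
                others = intercept + sum(weights[m] * x[m] for m in range(k) if m != j)
                rho += x[j] * (y - others)
                z += x[j] * x[j]
            rho /= n
            z /= n
            denom = z + lam * (1.0 - alpha)  # ridge part enlarges the denominator
            weights[j] = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
        intercept = sum(
            y - sum(w * v for w, v in zip(weights, x)) for x, y in zip(base_probs, labels)
        ) / n
    return intercept, weights


def stack_predict(intercept: float, weights: Sequence[float], probs: Sequence[float]) -> float:
    """Ensemble score for one sample from its base-model probabilities."""
    return intercept + sum(w * p for w, p in zip(weights, probs))


# Toy hold-out set: columns are hypothetical RF, GBM, and NN probabilities.
probs = [[0.9, 0.8, 0.95], [0.85, 0.9, 0.9], [0.1, 0.2, 0.05],
         [0.2, 0.1, 0.15], [0.95, 0.9, 0.85], [0.05, 0.15, 0.1]]
labels = [1, 1, 0, 0, 1, 0]
b, w = fit_elastic_net_stack(probs, labels)
print(b, w)
```

The ridge term spreads weight across correlated base models while the lasso term can drop a redundant one entirely, which is the reason an elastic net suits base learners trained on the same inputs.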
Figure 5 shows the entire workflow of this study. First, for each of the reference sample sites, the VIIRS luminosity value and MODIS NDVI value were extracted. Secondly, two-thirds of the 295 training reference samples were randomly selected to train the three individual machine learning models, while the remaining one-third were used for regularizing the ESB weights and finding an optimal set of model weights that would not diminish the predictive performance of the ensemble [51]. Specifically, RF was set up using training parameters including out-of-bag (OOB) error, GBM was set up using training parameters including 10-fold cross-validation repeated 5 times, NN was set up with one hidden layer and training parameters including 10-fold cross-validation, and the ESB was set up with a 10-fold regularized ensemble weighting. All these machine learning algorithms were implemented in R, which can be run on Linux, Windows, and Mac. Thirdly, the urban probability grids output by RF, GBM, NN, and ESB were classified into urban and non-urban using a probability of 0.95 as the threshold. Finally, all the validation reference samples were overlaid with the four urban extent maps corresponding to RF, GBM, NN, and ESB, and their accuracies for correctly identifying urban areas were assessed and compared.
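Two of the workflow steps, the two-thirds/one-third split of the training samples and the 0.95 probability threshold, can be sketched in Python as follows. The function names, the fixed seed, and the choice to treat a probability exactly equal to the threshold as urban are assumptions of this sketch rather than details reported in the study.

```python
import random
from typing import List, Sequence, Tuple


def split_training_samples(
    samples: Sequence, base_fraction: float = 2 / 3, seed: int = 0
) -> Tuple[list, list]:
    """Randomly split samples into a base-model set and an ensemble-weighting set."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * base_fraction)
    return shuffled[:cut], shuffled[cut:]


def classify_probability_grid(
    prob_grid: Sequence[Sequence[float]], threshold: float = 0.95
) -> List[List[int]]:
    """Convert an urban probability grid to 1 (urban) / 0 (non-urban)."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_grid]


# 295 training samples split roughly two-thirds / one-third.
base_set, ensemble_set = split_training_samples(range(295))
print(len(base_set), len(ensemble_set))
print(classify_probability_grid([[0.99, 0.50], [0.95, 0.94]]))
```

A threshold as strict as 0.95 favors user's accuracy (few false urban pixels), so the high producer's accuracy reported later suggests the models assign very high probabilities to most true urban pixels.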

3. Results

3.1. New Urban Extent Maps for CONUS

After around 3 h of program execution, four urban extent maps at 500 m resolution were produced for CONUS 2015 (Figure 6a) corresponding to the four machine learning methods (RF, GBM, NN, and ESB). By visually analyzing these maps, it is observed that all the maps have correctly revealed the spatial patterns of urban area distributions in CONUS. Zoom-in detailed comparisons demonstrate that their differences occur mainly in the areas where urban and non-urban features mix and are located either in the peripheral areas or inner urban areas bordering with vegetation (e.g., big parks inside a city) (Figure 6b).

3.2. Comparing the Four Urban Extent Maps through Quantitative Accuracy Assessment

To rigorously compare and evaluate which urban extent maps generated by the four machine learning algorithms are more accurate, a quantitative accuracy assessment was conducted against the reference samples specifically collected for validation and accuracy assessment. While the overall accuracy (OA) is the commonly-used index for accuracy assessment, because of class sample imbalance and different performance of classification methods for different land cover types, it can be biased towards the majority classes, ignoring the minority classes [16]. Therefore, we generated the confusion matrices and calculated the producer’s accuracy, the user’s accuracy, and the Kappa coefficients for each of the four urban extent maps to fully assess the accuracies (Table 2).
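For reference, the four metrics can be computed from a two-class confusion matrix as in the Python sketch below. The counts in the usage example are made up for illustration and are not the values in Table 2; the function and key names are likewise illustrative.

```python
from typing import Dict


def accuracy_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Accuracy metrics for the urban class from a 2 x 2 confusion matrix.

    tp: urban reference samples mapped as urban
    fn: urban reference samples mapped as non-urban (omission errors)
    fp: non-urban reference samples mapped as urban (commission errors)
    tn: non-urban reference samples mapped as non-urban
    """
    n = tp + fn + fp + tn
    overall = (tp + tn) / n
    producers = tp / (tp + fn)  # how often real urban areas are mapped as urban
    users = tp / (tp + fp)      # how often mapped urban areas are really urban
    # Chance agreement from the row and column totals, as used by Cohen's kappa.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (overall - expected) / (1.0 - expected)
    return {"overall": overall, "producers": producers, "users": users, "kappa": kappa}


# Hypothetical counts, not taken from the study.
print(accuracy_metrics(tp=90, fn=10, fp=5, tn=95))
```

Kappa discounts the agreement expected by chance, which is why it can fall well below the overall accuracy when one class dominates the sample.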
The confusion matrices show that these four machine learning methods can achieve similarly high accuracies across all accuracy metrics (>95% overall accuracy, >98% producer’s accuracy, >92% user’s accuracy, and Kappa coefficients > 0.90), which have not been achieved by existing data products, previous studies, or associated methods; the ESB of RF, GBM, and NN is not able to produce significantly better accuracies than the three individual machine learning methods; the total misclassified validation samples generated by GBM (121) are more than those generated by RF (107), NN (104), and the ESB (109) by 14%, 16%, and 11%, respectively, with NN having the fewest total misclassified validation samples. If we must pick the best and worst among these four machine learning methods, NN performs the best while GBM performs the worst, based on their lowest and highest numbers of misclassifications, respectively, and their similar level of accuracy across all accuracy metrics.
There are several possible reasons why the ESB of RF, GBM, and NN does not outperform the individual machine learning methods: the RF-, GBM-, and NN-based urban extent data products are mostly consistent, so none adds significant new information to the others; each of them has already achieved quite high accuracy (not many errors); or their outputs are correlated because they use the same data inputs, especially the satellite data. Considering the additional computing resources needed for running the ESB and its inability to improve significantly on the accuracies of the individual machine learning methods, constructing an ensemble from RF, GBM, and NN appears unnecessary for such urban extent mapping applications.
The differences in accuracy between NN and RF are negligible, while GBM performs slightly worse than both (with a user’s accuracy about 1% lower than RF and NN, and about 15% more misclassified validation samples than RF and NN). Therefore, NN and RF should be better choices for intermediate-resolution urban extent mapping.
Additionally, the reason for NN and RF outperforming GBM could be that RF can break the correlation between individual base learner predictions, thus hopefully reducing the variance of final predictions, and NN is flexible and capable of effectively representing both structured and non-structured data (pixel values).

3.3. Comparing the New Urban Extent with Existing Data Products

We compared our newly created urban extent (year 2015) with other existing urban extent data products to characterize their differences and confirm whether the new urban extent data perform well in delineating urban areas. Not all the datasets listed in Table 1 were available to us at the time of this study. We chose the available GRUMP urban extent (1 km for the year 1995) and GlobCover artificial surfaces (309 m for the year 2009) datasets and selected US cities for the comparison. As there were no urban extent data products available for the same year (2015), urban extent data for different nominal years were selected.
Figure 7 shows the comparison between our new urban extent and the GRUMP urban extent. These datasets are 20 years apart, so the comparison cannot discriminate between actual urban changes and improvements of the new methodology. Nevertheless, considering that the urban extent of every city in 1995 must be smaller than or the same as that in 2015 (most US cities are in fact sprawling based on the sprawl analysis [55]), and that the base map is dated between 2016 and 2017, the comparison clearly shows that our new urban extent data product delineates the urban areas (the boundaries between urban and non-urban areas) more realistically.
Figure 8 shows the comparison between our new urban extent (2015) and the urban extent extracted from GlobCover (2009) for the class “artificial surfaces.” These two datasets are 6 years apart; a comparison of Landsat images from 2009 and 2015 shows that there was no significant urban expansion in the Baltimore and Philadelphia urban areas during that period. It appears that GlobCover detected only the core urban portions and missed the peripheral urban portions, while our new urban extent data product correctly identified both, even though the two products used similar definitions of urban (urban areas >50%).

4. Conclusions

Urban land changes are related to sustainable development, environmental quality, public health, natural hazards, poverty, climate change adaptation, and other environmental and socioeconomic issues [75,76,77,78,79,80]. Satellite urban extent mapping provides a fundamental dataset for analyzing urban land changes and the relevant environmental or socioeconomic drivers and impacts [81,82].
Our results and analyses have demonstrated that both NN and RF can be used to create high-accuracy intermediate-resolution urban extent data products at large spatial scales when the complementary VIIRS nighttime light and MODIS daytime NDVI data are combined. Such data products can be used to update the existing intermediate-resolution urban extent data products with better accuracy, spatial resolution, and consistency (e.g., our NN-based 500 m urban extent data product for CONUS is being used to update the 23-year-old 1995 GRUMP urban extent at 1 km for the CONUS region at SEDAC). In addition, these machine learning methods can be used to create annual urban extent data products, subject to the availability of VIIRS nighttime light and MODIS daytime spectral data or data from other similar satellites, so that urban land change dynamics can be studied.

5. Discussion

These machine learning methods, especially NN and RF, have the potential to be applied to continental-to-global scale urban extent mapping at intermediate resolutions. However, because the characteristics of urban areas in other parts of the world might differ from those in the CONUS, further study is needed before these methods can be applied to other continents or globally. For example, there are recognized issues in using nighttime light satellite data to capture urban areas in poorly lit countries, such as North Korea or African countries, and in some European countries that have deliberately decided to light urban areas only rarely [67] (American cities are known to have five times more lights per capita than European cities). While incorporating daytime MODIS spectral data with VIIRS nighttime light can help to mitigate these issues, they may still affect the calibration of the machine learning methods and limit their performance when the methods are applied directly to other continents or at a global scale. One possible solution is to train and calibrate the machine learning methods by continent; however, the effectiveness and accuracy of this approach can only be established through further testing in other world regions. At the very least, these methods can be used to effectively map urban areas in the US and other developed or well-lit countries of the world.
Although several recent studies have explored the most suitable sensors and methods for mapping urban extent, and data products derived from high-resolution sensors (e.g., Landsat, Sentinel-1) and very high-resolution sensors (e.g., TerraSAR-X) are currently available, producing these kinds of data products is time-consuming, especially when repeated mapping at large spatial scales is needed. Further, when such high-resolution urban extent data products are applied to continental-to-global scale models and applications, the data usually need to be resampled to lower resolutions because of the huge data volume involved and the computing resources needed. A recent study [30] based on 30 m Landsat-derived urban extent has demonstrated that 480 m resolution performs best for urbanization process analysis. Therefore, improving the accuracy of intermediate-resolution urban extent mapping at 500 m with VIIRS nighttime light, MODIS daytime spectral data or other similar satellite data, and machine learning methods is still needed by data users, even when open global urban extent products at relatively high resolutions are available.
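The resampling of a high-resolution product to an intermediate resolution can be sketched as follows. This is only a minimal illustration under stated assumptions, not the procedure used in [30] or in this study: it assumes a binary 30 m built-up grid, aggregates it in 16 by 16 blocks (16 × 30 m = 480 m), and borrows the >50% built-up rule that this study uses to define urban areas. The function names `aggregate_built_up` and `to_urban_mask` are hypothetical.

```python
from typing import List


def aggregate_built_up(fine: List[List[int]], factor: int = 16) -> List[List[float]]:
    """Return the built-up fraction of each factor x factor block of a binary grid.

    `fine` holds 1 for built-up cells and 0 otherwise. With 30 m cells and
    factor=16, each output cell covers 480 m x 480 m (an assumed setup).
    """
    rows, cols = len(fine), len(fine[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the aggregation factor")
    coarse = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            built = sum(
                fine[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
            )
            coarse_row.append(built / (factor * factor))
        coarse.append(coarse_row)
    return coarse


def to_urban_mask(fractions: List[List[float]], threshold: float = 0.5) -> List[List[int]]:
    """Label a coarse cell urban (1) when its built-up fraction exceeds the threshold."""
    return [[1 if f > threshold else 0 for f in row] for row in fractions]


# Tiny demonstration with factor=2 so the blocks are easy to check by hand.
demo = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
]
demo_fractions = aggregate_built_up(demo, factor=2)  # [[0.75, 0.0], [0.5, 1.0]]
demo_mask = to_urban_mask(demo_fractions)            # [[1, 0], [0, 1]]
```

Keeping the fraction layer separate from the thresholded mask makes the information lost by a binary product explicit: the cell with exactly 50% built-up cover is labeled non-urban under a strict >50% rule.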
As shown in the results and analyses, the major differences or inconsistencies between the urban extent data products generated by different machine learning methods or by other studies are located mainly in the peripheral portions of urban areas or along the borders between built-up areas and vegetated urban areas (e.g., parks), where most mixed pixels are located. Therefore, introducing unmixing image analysis in the future may help to further increase urban extent mapping accuracy or provide another useful data product to represent the degree of urbanization. For example, through unmixing, the percentage of built-up area coverage in each pixel can be extracted and, thus, a non-binary continuous urban extent data layer can be generated. This might be helpful for mapping low-density sprawl settlements and for some urban models or applications.
Urban land changes occurring on small scales may not be detectable at the relatively coarse resolution of VIIRS and MODIS. When such urban changes are important in relevant applications (e.g., local applications), higher-resolution satellite data such as those from Landsat and Sentinel-2 need to be introduced, and the effectiveness of machine learning methods in improving urban extent mapping at such scales can be explored.
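To make the idea of a continuous built-up layer concrete, the following is a minimal sketch of a two-endmember linear mixture model applied to NDVI. It is an assumption for illustration only: the text does not specify an unmixing method, the endmember values (0.8 for pure vegetation, 0.1 for pure built-up) are hypothetical, and NDVI mixes only approximately linearly. The function name `built_up_fraction` is likewise hypothetical.

```python
def built_up_fraction(
    ndvi: float,
    ndvi_vegetation: float = 0.8,  # hypothetical pure-vegetation endmember
    ndvi_built_up: float = 0.1,    # hypothetical pure built-up endmember
) -> float:
    """Estimate the built-up fraction of a pixel from its NDVI.

    Assumes ndvi = f * ndvi_built_up + (1 - f) * ndvi_vegetation and solves
    for f, clipping the result to the physically meaningful range [0, 1].
    """
    if ndvi_vegetation <= ndvi_built_up:
        raise ValueError("vegetation endmember must exceed built-up endmember")
    fraction = (ndvi_vegetation - ndvi) / (ndvi_vegetation - ndvi_built_up)
    return min(1.0, max(0.0, fraction))


# A pixel halfway between the two endmembers is estimated as about 50% built-up.
halfway = built_up_fraction(0.45)
```

A layer of such fractions could be thresholded at 50% to recover a binary urban extent, or kept as is to represent low-density settlements that a binary map would drop.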

Author Contributions

X.L. and A.d.S. conceived and designed the experiments; X.L. and Y.Z. performed the experiments; X.L. and Y.Z. analyzed the data; X.L. wrote the original draft of the paper, which was revised by X.L. and A.d.S.

Funding

The authors would like to acknowledge support under the National Aeronautics and Space Administration (NASA) contract NNG13HQ04C for the continued operation of the Socioeconomic Data and Applications Center (SEDAC).

Acknowledgments

The authors wish to thank Christopher D. Elvidge and Mikhail N. Zhizhin of NOAA National Centers for Environmental Information (NCEI) for use of their VIIRS nighttime light datasets, and Markus Walsh for providing the open R code on GitHub. We would also like to thank the three anonymous reviewers for their constructive comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. United Nations. World Urbanization Prospects. 2014. Available online: https://esa.un.org/unpd/wup/publications/files/wup2014-highlight.pdf (accessed on 20 January 2018).
  2. Grimm, N.B.; Faeth, S.H.; Golubiewski, N.E.; Redman, C.L.; Wu, J.; Bai, X.; Briggs, J.M. Global change and the ecology of cities. Science 2008, 319, 756–760. [Google Scholar] [CrossRef]
  3. Small, C. High spatial resolution spectral mixture analysis of urban reflectance. Remote Sens. Environ. 2003, 88, 170–186. [Google Scholar] [CrossRef]
  4. Mills, G. Cities as agents of global change. Int. J. Climatol. 2007, 27, 1849–1857. [Google Scholar] [CrossRef]
  5. Cohen, B. Urbanization in developing countries: Current trends, future projections, and key challenges for sustainability. Technol. Soc. 2006, 28, 63–80. [Google Scholar] [CrossRef]
  6. Deng, C.; Wu, C. BCI: A biophysical composition index for remote sensing of urban environments. Remote Sens. Environ. 2012, 127, 247–259. [Google Scholar] [CrossRef]
  7. Dou, Y.; Liu, Z.; He, C.; Yue, H. Urban land extraction using VIIRS nighttime light data: An evaluation of three popular methods. Remote Sens. 2017, 9, 175. [Google Scholar] [CrossRef]
  8. Seto, K.C.; Güneralp, B.; Hutyra, L.R. Global forecasts of urban expansion to 2030 and direct impacts on biodiversity and carbon pools. Proc. Natl. Acad. Sci. USA 2012, 109, 16083–16088. [Google Scholar] [CrossRef]
  9. Aubrecht, C.; Gunasekera, R.; Ungar, J.; Ishizawa, O. Consistent yet adaptive global geospatial identification of urban–rural patterns: The iURBAN model. Remote Sens. Environ. 2016, 187, 230–240. [Google Scholar] [CrossRef]
  10. Schneider, A.; Friedl, M.A.; Potere, D. A new map of global urban extent from MODIS satellite data. Environ. Res. Lett. 2009, 4, 44003–44011. [Google Scholar] [CrossRef]
  11. Schneider, A.; Friedl, M.A.; Potere, D. Mapping global urban areas using MODIS 500-m data: New methods and datasets based on ‘urban ecoregions’. Remote Sens. Environ. 2010, 114, 1733–1746. [Google Scholar] [CrossRef]
  12. Arino, O.; Gross, D.; Ranera, F.; Leroy, M.; Bicheron, P.; Brockman, C.; Defourny, P.; Vancutsem, C.; Achard, F.; Durieux, L.; et al. Globcover: ESA service for global land cover from MERIS. In Proceedings of the 2007 IEEE International Geoscience and Remote Sensing Symposium, Barcelona, Spain, 23–28 July 2007. [Google Scholar]
  13. Hu, X.; Weng, Q. Estimating impervious surfaces from medium spatial resolution imagery using the self-organizing map and multi-layer perceptron neural networks. Remote Sens. Environ. 2009, 113, 2089–2102. [Google Scholar] [CrossRef]
  14. Knight, J.; Voth, M. Mapping impervious cover using multi-temporal MODIS NDVI data. IEEE J.-Stars 2011, 4, 303–309. [Google Scholar] [CrossRef]
  15. Liu, X.; Hu, G.; Chen, Y.; Li, X.; Xu, X.; Li, S.; Pei, F.; Wang, S. High-resolution multi-temporal mapping of global urban land using Landsat images based on the Google Earth Engine Platform. Remote Sens. Environ. 2018, 209, 227–239. [Google Scholar] [CrossRef]
  16. Wang, R.; Wan, B.; Guo, Q.; Hu, M.; Zhou, S. Mapping regional urban extent using NPP-VIIRS DNB and MODIS NDVI data. Remote Sens. 2017, 9, 862. [Google Scholar] [CrossRef]
  17. Esch, T.; Heldens, W.; Hirner, A.; Keil, M.; Marconcini, M.; Roth, A.; Zeidler, J. Global Urban Footprint; MUAS 2015; ESA-ESRIN: Frascati, Italy, 2015. [Google Scholar]
  18. Pesaresi, M.; Ehrlich, D.; Ferri, S.; Florczyk, A.J.; Freire, S.; Halkia, S.; Julea, A.M.; Kemper, T.; Soille, P.; Syrris, V. Operating procedure for the production of the Global Human Settlement Layer from Landsat data of the epochs 1975, 1990, 2000, and 2014. Publ. Off. Eur. Union JRC Tech. Rep. 2774, 1–67, EUR 27741 EN. [Google Scholar]
  19. Guo, W.; Li, G.Y.; Ni, W.J.; Zhang, Y.H.; Lu, D.S. Exploring improvement of impervious surface estimation at national scale through integration of nighttime light and proba-v data. Gisci. Remote Sens. 2018, 55, 699–717. [Google Scholar] [CrossRef]
  20. Li, X.; Zhao, L.; Li, D.; Xu, H. Mapping Urban Extent Using Luojia 1-01 Nighttime Light Imagery. Sensors 2018, 18, 3665. [Google Scholar] [CrossRef] [PubMed]
  21. Balk, D.; Pozzi, F.; Yetman, G.; Deichmann, U.; Nelson, A. The distribution of people and the dimension of place: Methodologies to improve the global estimation of urban extents. In Proceedings of the Urban Remote Sensing Conference, International Society for Photogrammetry and Remote Sensing, Tempe, AZ, USA, 14–16 March 2005. [Google Scholar]
  22. Schneider, A.; Friedl, M.; McIver, D.; Woodcock, C. Mapping urban areas by fusing multiple sources of coarse resolution remotely sensed data. Photogramm. Eng. Remote Sens. 2003, 69, 1377–1386. [Google Scholar] [CrossRef]
  23. Balk, D.L.; Deichmann, U.; Yetman, G.; Pozzi, F.; Hay, S.I.; Nelson, A. Determining global population distribution: Methods, Applications, and Data. Adv. Parasitol. 2006, 62, 119–156. [Google Scholar]
  24. Center for International Earth Science Information Network (CIESIN)-Columbia University; CUNY Institute for Demographic Research (CIDR); International Food Policy Research Institute (IFPRI); The World Bank; Centro Internacional de Agricultura Tropical (CIAT). Global Rural Urban Mapping Project, Version 1 (GRUMPv1): Urban Extent Polygons, Revision 01; NASA Socioeconomic Data and Applications Center (SEDAC): Palisades, NY, USA, 2017.
  25. Kabaria, C.W.; Gilbert, M.; Noor, A.M.; Snow, R.W.; Linard, C. The impact of urbanization and population density on childhood Plasmodium falciparum parasite prevalence rates in Africa. Malar. J. 2017, 16, 49. [Google Scholar] [CrossRef]
  26. Lloyd, C.T.; Sorichetta, A.; Tatem, A.J. High resolution global gridded data for use in population studies. Sci. Data 2017, 4, 170001. [Google Scholar] [CrossRef]
  27. McDonald, R.I.; Green, P.; Balk, D.; Fekete, B.M.; Revenga, C.; Todd, M.; Montgomery, M.; Gleick, P. Urban growth, climate change, and freshwater availability. Proc. Natl. Acad. Sci. USA 2011, 108, 6312–6317. [Google Scholar] [CrossRef]
  28. Hugo, G.; Champion, T. New forms of Urbanization: Beyond the Urban-Rural Dichotomy; Routledge: Abingdon, UK, 2003. [Google Scholar]
  29. McIntyre, N.E.; Knowles-Yánez, K.; Hope, D. Urban ecology as an interdisciplinary field: Differences in the use of “urban” between the social and natural sciences. In Urban Ecology; Springer: Boston, MA, USA, 2008; pp. 49–65. [Google Scholar]
  30. Wei, C.; Blaschke, T.; Kazakopoulos, P.; Taubenböck, H.; Tiede, D. Is Spatial Resolution Critical in Urbanization Velocity Analysis? Investigations in the Pearl River Delta. Remote Sens. 2017, 9, 80. [Google Scholar] [CrossRef]
  31. Potere, D.; Schneider, A. A critical look at representations of urban areas in global maps. GeoJournal 2007, 69, 55–80. [Google Scholar] [CrossRef]
  32. Sharma, R.C.; Tateishi, R.; Hara, K.; Gharechelou, S.; Iizuka, K. Global mapping of urban built-up areas of year 2014 by combining MODIS multispectral data with VIIRS nighttime light data. Int. J. Digit. Earth 2016, 9, 1–17. [Google Scholar] [CrossRef]
  33. ESA (European Space Agency). GlobCover 2009. 2011. Available online: http://due.esrin.esa.int/files/GLOBCOVER2009_Validation_Report_2.2.pdf (accessed on 18 January 2018).
  34. Elvidge, C.; Tuttle, B.; Sutton, P.; Baugh, K.; Howard, A.; Milesi, C.; Bhaduri, B.; Nemani, R. Global distribution and density of constructed impervious surfaces. Sensors 2007, 7, 1962–1979. [Google Scholar] [CrossRef]
  35. Bartholome, E.; Belward, A. GLC2000: A new approach to global land cover mapping from Earth observation data. Int. J. Remote Sens. 2005, 26, 1959–1977. [Google Scholar] [CrossRef]
  36. Ma, T.; Zhou, Y.; Zhou, C.; Haynie, S.; Pei, T.; Xu, T. Night-time light derived estimation of spatio-temporal characteristics of urbanization dynamics using DMSP/OLS satellite data. Remote Sens. Environ. 2015, 158, 453–464. [Google Scholar] [CrossRef]
  37. Shi, K.; Huang, C.; Yu, B.; Yin, B.; Huang, Y.; Wu, J. Evaluation of NPP-VIIRS night-time light composite data for extracting built-up urban areas. Remote Sens. Lett. 2014, 5, 358–366. [Google Scholar] [CrossRef]
  38. Xie, Y.; Weng, Q. Updating urban extents with nighttime light imagery by using an object-based thresholding method. Remote Sens. Environ. 2016, 187, 1–13. [Google Scholar] [CrossRef]
  39. Wu, W.; Zhao, H.; Jiang, S. A Zipf’s law-based method for mapping urban areas using NPP-VIIRS nighttime light data. Remote Sens. 2018, 10, 130. [Google Scholar] [CrossRef]
  40. Yao, Y.; Chen, D.; Chen, L.; Wang, H.; Guan, Q. A time series of urban extent in China using DSMP/OLS nighttime light data. PLoS ONE 2018, 13, e0198189. [Google Scholar] [CrossRef]
  41. Yu, B.L.; Tang, M.; Wu, Q.S.; Yang, C.S.; Deng, S.Q.; Shi, K.F.; Peng, C.; Wu, J.P.; Chen, Z.Q. Urban built-up area extraction from log-transformed NPP-VIIRS nighttime light composite data. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1279–1283. [Google Scholar] [CrossRef]
  42. Zhang, Q.; Seto, K.C. Mapping urbanization dynamics at regional and global scales using multi-temporal DMSP/OLS nighttime light data. Remote Sens. Environ. 2011, 115, 2320–2329. [Google Scholar] [CrossRef]
  43. Zhou, Y.; Smith, S.J.; Elvidge, C.D.; Zhao, K.; Thomson, A.; Imhoff, M. A cluster-based method to map urban area from DMSP/OLS nightlight. Remote Sens. Environ. 2014, 147, 173–185. [Google Scholar] [CrossRef]
  44. Salmon, B.P.; Olivier, J.C.; Kleynhans, W.; Wessels, K.J.; Bergh, F.V.D.; Steenkamp, K.C. The use of a multilayer perceptron for detecting new human settlements from a time series of MODIS images. Int. J. Appl. Earth Obs. Geoinf. 2011, 13, 873–883. [Google Scholar] [CrossRef]
  45. Wan, B.; Guo, Q.; Fang, F.; Su, Y.; Wang, R. Mapping US urban extents from MODIS data using one-class classification method. Remote Sens. 2015, 7, 10143–10163. [Google Scholar] [CrossRef]
  46. Guo, W.; Lu, D.; Wu, Y.; Zhang, J. Mapping impervious surface distribution with integration of SNNP VIIRS-DNB and MODIS NDVI data. Remote Sens. 2015, 7, 12459–12477. [Google Scholar] [CrossRef]
  47. Xue, X.Y.; Yu, Z.L.; Zhu, S.C.; Zheng, Q.M.; Weston, M.; Wang, K.; Gan, M.Y.; Xu, H.W. Delineating urban boundaries using Landsat 8 multispectral data and VIIRS nighttime light data. Remote Sens. 2018, 10, 799. [Google Scholar] [CrossRef]
  48. Zhang, X.; Li, P.; Cai, C. Regional Urban Extent Extraction Using Multi-Sensor Data and One-Class Classification. Remote Sens. 2015, 7, 7671–7694. [Google Scholar] [CrossRef]
  49. Godinho, S.; Guiomar, N.; Gil, A. Using a stochastic gradient boosting algorithm to analyse the effectiveness of Landsat 8 data for montado land cover mapping: Application in southern Portugal. Int. J. Appl. Earth Obs. Geoinf. 2016, 49, 151–162. [Google Scholar] [CrossRef]
  50. Ming, D.; Zhou, T.; Wang, M.; Tan, T. Land cover classification using random forest with genetic algorithm-based parameter optimization. J. Appl. Remote Sens. 2016, 10, 035021. [Google Scholar] [CrossRef]
  51. Walsh, G.M. New Cropland and Rural Settlement Maps of Africa. 2015. Available online: http://africasoils.net/2015/06/07/new-cropland-and-rural-settlement-maps-of-africa/ (accessed on 20 July 2017).
  52. Yuan, H.; Wiele, C.F.V.D.; Khorram, S. An automated artificial neural network system for land use/land cover classification from Landsat TM imagery. Remote Sens. 2009, 1, 243–265. [Google Scholar] [CrossRef]
  53. Li, X.; Zhou, Y. Urban mapping using DMSP/OLS stable nighttime light: A review. Int. J. Remote Sens. 2017, 38, 6030–6046. [Google Scholar] [CrossRef]
  54. US Census Bureau. 2010 Census of Population and Housing, Population and Housing Unit Counts; CHP-2-5; U.S. Government Printing Office: Washington, DC, USA, 2012; pp. 20–26.
  55. Lopez, R. Urban Sprawl in the United States: 1970–2010; Cities and the Environment (CATE). Loyola Marymount University: Los Angeles, CA, USA, 2014; Volume 7. Available online: http://digitalcommons.lmu.edu/cate/vol7/iss1/7 (accessed on 24 January 2018).
  56. Goldblatt, R.; Deininger, K.; Hanson, G. Utilizing publicly available satellite data for urban research: Mapping built-up land cover and land use in Ho Chi Minh City, Vietnam. Dev. Eng. 2018, 3, 83–99. [Google Scholar] [CrossRef]
  57. Wang, P.; Huang, C.; Brown de Colstoun, E.C.; Tilton, J.C.; Tan, B. Global Human Built-Up and Settlement Extent (HBASE) Dataset from Landsat; NASA Socioeconomic Data and Applications Center (SEDAC): Palisades, NY, USA, 2017.
  58. Dijkstra, L.; Poelman, H. A Harmonised Definition of Cities and Rural Areas: The New Degree of Urbanization; Directorate-General for Regional and Urban Policy, European Commission: Brussels, Belgium, 2014. [Google Scholar]
  59. Rozenfeld, H.D.; Rybski, D.; Gabaix, X.; Makse, H.A. The area and population of cities: New insights from a different perspective on cities. Am. Econ. Rev. 2011, 101, 2205–2225. [Google Scholar] [CrossRef]
  60. Uchida, H.; Nelson, A. Agglomeration Index: Towards a New Measure of Urban Concentration. In Urbanization and Development: Multidisciplinary Perspectives; Oxford University Press: Oxford, UK, 2010. [Google Scholar]
  61. Weiss, D.J.; Nelson, A.; Gibson, H.S.; Temperley, W.; Peedell, S.; Lieber, A.; Hancher, M.; Poyart, E.; Belchior, S.; Fullman, N.; et al. A global map of travel time to cities to assess inequalities in accessibility in 2015. Nature 2018, 553, 333. [Google Scholar] [CrossRef]
  62. Yang, F.; Matsushita, B.; Fukushima, T.; Yang, W. Temporal mixture analysis for estimating impervious surface area from multi-temporal MODIS NDVI data in Japan. ISPRS J. Photogramm. Remote Sens. 2012, 72, 90–98. [Google Scholar] [CrossRef]
  63. Elvidge, C.D.; Baugh, K.E.; Kihn, E.A.; Kroehl, H.W.; Davis, E.R. Mapping city light with nighttime data from the DMSP operational linescan system. Photogramm. Eng. Remote Sens. 1997, 63, 727–734. [Google Scholar]
  64. Sutton, P.C. A scale-adjusted measure of “urban sprawl” using nighttime satellite imagery. Remote Sens. Environ. 2003, 86, 353–369. [Google Scholar] [CrossRef]
  65. Abrahams, A.; Oram, C.; Lozano-Gracia, N. Deblurring DMSP nighttime light: A new method using Gaussian filters and frequencies of illumination. Remote Sens. Environ. 2018, 210, 242–258. [Google Scholar] [CrossRef]
  66. Elvidge, C.D.; Baugh, K.; Zhizhin, M.; Hsu, F.C. Why VIIRS data are superior to DMSP for mapping nighttime light. Proc. Asia-Pac. Adv. Netw. 2013, 35, 62–69. [Google Scholar] [CrossRef]
  67. Elvidge, C.D.; Baugh, K.; Zhizhin, M.; Hsu, F.C.; Ghosh, T. VIIRS night-time light. Int. J. Remote Sens. 2017, 38, 5860–5879. [Google Scholar] [CrossRef]
  68. Zhang, Q.; Schaaf, C.; Seto, K.C. The vegetation adjusted NTL urban index: A new approach to reduce saturation and increase variation in nighttime luminosity. Remote Sens. Environ. 2013, 129, 32–41. [Google Scholar] [CrossRef]
  69. NOAA NCEI. Version 1 VIIRS Day/Night Band Nighttime Light; 2018. Available online: https://ngdc.noaa.gov/eog/viirs/download_dnb_composites.html (accessed on 10 December 2017).
  70. Google Earth Engine. MOD13A1.005 Vegetation Indices 16-Day L3 Global 500 m. 2017. Available online: https://explorer.earthengine.google.com/#detail/MODIS%2FMOD13A1 (accessed on 6 December 2017).
  71. Nery, T.; Sadler, R.; Solis-Aulestia, M.; White, B.; Polyakov, M.; Chalak, M. Comparing supervised algorithms in land use and land cover classification of a Landsat time series. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016. [Google Scholar]
  72. Shelestov, A.; Lavreniuk, M.; Kussul, N.; Novikov, A.; Skakun, S. Exploring Google Earth Engine platform for Big Data processing: Classification of multi-temporal satellite imagery for crop mapping. Front. Earth Sci. 2017, 5, 1–10. [Google Scholar] [CrossRef]
  73. Zhou, Z.H. Ensemble Methods: Foundations and Algorithms; Chapman and Hall/CRC: Boca Raton, FL, USA, 2012. [Google Scholar]
  74. Gunes, F.; Wolfinger, R.; Tan, P. Stacked Ensemble Models for Improved Prediction Accuracy; SAS Institute, Inc.: Cary, NC, USA, 2017; In Proceedings of SAS Global Forum 2017, April 2–5, Orlando, FL, USA; Available online: http://support.sas.com/resources/papers/proceedings17/SAS0437-2017.pdf (accessed on 20 December 2017).
  75. Kareiva, P.; Watts, S.; McDonald, R.; Boucher, T. Domesticated nature: Shaping landscapes and ecosystems for human welfare. Science 2007, 316, 1866–1869. [Google Scholar] [CrossRef]
  76. Colstoun, E.C.; Huang, B.C.; Wang, P.; Tilton, J.C.; Tan, B.; Phillips, J.; Niemczura, S.; Ling, P.; Wolfe, R. Documentation for the Global Man-Made Impervious Surface (GMIS) Dataset from Landsat; NASA Socioeconomic Data and Applications Center (SEDAC): Palisades, NY, USA, 2017; pp. 1–13.
  77. Doherty, M.; Nakanishi, H.; Bai, X.; Meyers, J. Relationships between Form, Morphology, Density and Energy in Urban Environments; GEA Background Paper; CSIRO Sustainable Ecosystems: Canberra, Australia, 2009; pp. 1–28. [Google Scholar]
  78. Dewey, R. The rural-urban continuum: Real but relatively unimportant. Am. J. Sociol. 1960, 66, 60–66. [Google Scholar] [CrossRef]
  79. Atkinson, R. Atmospheric chemistry of VOCs and NOx. Atmos. Environ. 2000, 34, 2063–2101. [Google Scholar] [CrossRef]
  80. Loveland, T.R.; Reed, B.C.; Brown, J.F.; Ohlen, D.O.; Zhu, Z.; Yang, L.; Merchant, J.W. Development of a global land cover characteristics database and IGBP discover from 1 km AVHRR data. Int. J. Remote Sens. 2000, 21, 1303–1330. [Google Scholar] [CrossRef]
  81. Dorélien, A.; Balk, D.; Todd, M. What is urban? Comparing a satellite view with the demographic and health surveys. Popul. Dev. Rev. 2013, 39, 413–439. [Google Scholar] [CrossRef]
  82. Ogashawara, I.; Bastos, V.D.S.B. A quantitative approach for analyzing the relationship between urban heat islands and land cover. Remote Sens. 2012, 4, 3596–3618. [Google Scholar] [CrossRef]
Figure 1. The study area consisting of the 48 conterminous states in the US, with urban extent data layer from the Global Rural–Urban Mapping Project (GRUMP) overlaid with state boundaries.
Figure 2. Illustration of urban and non-urban areas based on the definition of urban areas employed in this study. The size of the square is 1 km by 1 km: urban area contains more than 50% of built-up areas (left) while a non-urban area contains less than 50% built-up area (right).
Figure 3. (Left) An example of VIIRS nighttime light annual composite for the northeastern United States with stray light, lightning, lunar illumination, cloud-cover, and gas flares removed (urban areas are characterized by brighter pixel clusters); (Right) an example of MODIS NDVI annual composite for the northeastern United States with cloud contamination removed using the greenest pixel method (urban areas are characterized by NDVI pixel clusters with lower positive values).
Figure 4. Urban and non-urban reference samples collected for the conterminous United States: (1) the cross “+” symbols indicate reference sample sites solely for training purposes, (2) the triangle symbols indicate reference sample sites solely for accuracy assessment. Sample sites of (1) and (2) were collected in two separate steps and are, therefore, totally independent.
Figure 5. The workflow for data processing and machine learning prediction of urban extent.
Figure 6. Urban extent maps generated by Random Forest (RF), gradient boosting machine (GBM), neural network (NN), and their ensemble (ESB) for CONUS 2015: (a) whole urban extent maps generated by the four machine learning algorithms (a1,a2,a3,a4), (b) zoom-in detailed comparison of the four urban extent maps (b1,b2,b3).
Figure 7. Comparison between 2015 urban extent generated by NN and GRUMP urban extent 1995: (a1,b1,c1,d1) are GRUMP urban extent 1995 (Yellow) while (a2,b2,c2,d2) are NN-based urban extent 2015 (Red).
Figure 8. Comparison between 2015 urban extent generated by NN (Red) and GlobCover-extracted urban extent 2009 (Yellow), with the background imagery from 2017: (a) Baltimore; (b) Philadelphia.
Table 1. Major intermediate-resolution global urban extent or urban extent-related data products derived from satellite imagery available in the past decades.

| Data Product/Year | Definition of Urban Area | Resolution |
|---|---|---|
| Global Urban Built-Up Areas 2014 [32] | Urban and built-up areas | 500 m |
| Global Rural–Urban Mapping Project 1995 (GRUMP v1) [24] | Urban extent | 927 m |
| GlobCover 2009 (GlobCover) [33] | Artificial surfaces and associated areas (urban areas >50%) | 309 m |
| MODIS Urban Land Cover 500 m (MODIS 500 m) ca 2001 [10] | Areas dominated by a built environment (>50%), including non-vegetated, human-constructed elements, with the minimum mapping unit >1 km by 1 km | 463 m |
| Global Impervious Surface Area 2000–2001 (IMPSA) [34] | Density of impervious surface area | 927 m |
| Global Land Cover 2000 (GLC2000) [35] | Artificial surfaces and associated areas | 988 m |
| MODIS Urban Land Cover 1 km (MODIS 1 km) ca 2001 [22] | Urban and built-up areas | 927 m |
Table 2. Confusion matrices and accuracy assessment using “ground truth” reference samples: (a) random forest (RF)-based urban extent accuracy assessment, (b) gradient boosting machine (GBM)-based urban extent accuracy assessment, (c) neural network (NN)-based urban extent accuracy assessment, and (d) ESB-based urban extent accuracy assessment.

(a)

| Classified Data | Reference Data: Urban | Reference Data: Non-Urban | Total | User’s Accuracy | Kappa Coefficient |
|---|---|---|---|---|---|
| Urban | 1240 | 90 | 1330 | 93.23% | |
| Non-Urban | 17 | 1130 | 1147 | 98.52% | |
| Total | 1257 | 1220 | 2477 | | |
| Producer’s Accuracy | 98.65% | 92.62% | | 95.68% (Overall Accuracy) | 0.9121 |

(b)

| Classified Data | Reference Data: Urban | Reference Data: Non-Urban | Total | User’s Accuracy | Kappa Coefficient |
|---|---|---|---|---|---|
| Urban | 1243 | 107 | 1350 | 92.07% | |
| Non-Urban | 14 | 1113 | 1127 | 98.76% | |
| Total | 1257 | 1220 | 2477 | | |
| Producer’s Accuracy | 98.89% | 91.23% | | 95.12% (Overall Accuracy) | 0.9023 |

(c)

| Classified Data | Reference Data: Urban | Reference Data: Non-Urban | Total | User’s Accuracy | Kappa Coefficient |
|---|---|---|---|---|---|
| Urban | 1235 | 82 | 1317 | 93.77% | |
| Non-Urban | 22 | 1138 | 1160 | 98.10% | |
| Total | 1257 | 1220 | 2477 | | |
| Producer’s Accuracy | 98.25% | 93.28% | | 95.80% (Overall Accuracy) | 0.9159 |

(d)

| Classified Data | Reference Data: Urban | Reference Data: Non-Urban | Total | User’s Accuracy | Kappa Coefficient |
|---|---|---|---|---|---|
| Urban | 1236 | 88 | 1324 | 93.35% | |
| Non-Urban | 21 | 1132 | 1153 | 98.18% | |
| Total | 1257 | 1220 | 2477 | | |
| Producer’s Accuracy | 98.33% | 92.79% | | 95.60% (Overall Accuracy) | 0.9119 |
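
The accuracy measures in Table 2 can be derived from a confusion matrix as sketched below. This is a minimal illustration using the standard definitions of user's accuracy, producer's accuracy, overall accuracy, and Cohen's kappa; it is not the authors' code, and the function name `accuracy_metrics` is hypothetical. The example uses the NN counts from Table 2(c), for which it gives an overall accuracy of about 95.80% and a kappa of about 0.916.

```python
from typing import Dict, List, Union


def accuracy_metrics(matrix: List[List[int]]) -> Dict[str, Union[float, List[float]]]:
    """Compute accuracy measures from a square confusion matrix.

    matrix[i][j] is the number of samples classified as class i whose
    reference ("ground truth") class is j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]  # classified totals
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # reference totals
    diagonal = [matrix[i][i] for i in range(k)]  # correctly classified samples

    overall = sum(diagonal) / n
    users = [diagonal[i] / row_totals[i] for i in range(k)]
    producers = [diagonal[j] / col_totals[j] for j in range(k)]

    # Agreement expected by chance, from the row and column marginals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    kappa = (overall - expected) / (1 - expected)

    return {
        "overall": overall,
        "users": users,
        "producers": producers,
        "kappa": kappa,
    }


# NN confusion matrix from Table 2(c): rows are classified urban/non-urban,
# columns are reference urban/non-urban.
nn_matrix = [[1235, 82], [22, 1138]]
nn_metrics = accuracy_metrics(nn_matrix)
```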