Over 50% of the global population already lives in urban areas, and two-thirds of the world’s population is expected to live in urban areas by 2050 [1]. Urban population growth and the associated socioeconomic development have caused intensive urban expansion [2] and, in turn, greater poverty and environmental degradation [4], which pose significant challenges to sustainable development, disaster resilience, climate change mitigation, and environmental and resource management in urban areas [5]. As a result, city governments may not be able to provide services for the increased population, elevated city energy use often leads to greater air pollution, and the risk of urban environmental hazards can be magnified [5]. Studies have shown that the global urbanization rate reached about 54% in 2014 [1], and urban land areas will grow by 1.2 million km² by 2030 if the current trend continues [8]. To better understand the patterns, dynamics, drivers, and impacts of urban expansion and to effectively support decision making on sustainable urban development, it is fundamental to obtain timely, accurate, and consistent measurements of urban extent at large spatial scales [9
Satellite remote sensing has been widely used in urban area mapping since Earth observation satellite data became available [10
]. While high-resolution urban extent data products - including the Global Human Settlements Layer (GHSL), the Global Man-Made Impervious Surface (GMIS), the Global Human Built-Up and Settlement Extent (HBASE), and the Global Urban Footprint (GUF) derived from Landsat and TanDEM-X data - have recently become available [16
], intermediate-resolution urban extent data products are still valuable, especially for large-scale urbanization analysis. The reasons are as follows: (1) intermediate-resolution satellite images have proven effective in urban extent extraction at regional to global scales [10] and are more computationally efficient, (2) intermediate-resolution urban extent data products generated from satellite data, such as the 1 km Global Rural-Urban Mapping Project (GRUMP) from the NASA Socioeconomic Data and Applications Center (SEDAC), MODIS 1km, and MODIS 500 m [10], still attract many analysis and modeling users [23
], (3) considering that urban and rural areas are not necessarily discrete classes but more of a continuum [28
], intermediate-resolution data products may better reflect demographic and sociological conditions of urban areas [12
], which include not just built-up areas but the urban fabric of core urban areas and surrounding hinterlands and commuter-sheds, (4) broader definitions of what constitutes urban areas are useful for studies of urban morphology, energy use, climate change, and sustainability [14
], and also for research on rural agricultural systems where one may wish to exclude all but the smallest built-up areas, and (5) a recently published study has demonstrated that urban extent data products at 480 m resolution are more appropriate than those at the high resolution (30 m) for urbanization process analysis at large spatial scales [30
Although several global urban extent data products at intermediate resolutions are available currently (Table 1
), significant inconsistencies remain among them [10
]. For example, the total areas of global urban extent measured by IMPSA, MODIS 1km, MODIS 500 m, and GRUMP are 572,000, 657,000, 727,000, and 3,524,000 km², respectively, so the larger estimates exceed the smallest by 15% to 516%. While one of the reasons for these inconsistencies is that different groups and disciplines define urban areas somewhat differently [10], these definitions are highly correlated. This calls into question the accuracy of each map’s depiction of urban areas. Based on Schneider’s assessment of these data products, their overall accuracies ranged from 73% (GRUMP) to 93% (MODIS 500 m). However, their producer’s accuracies (how often real urban areas on the ground are correctly shown on the classified map) are generally low (IMPSA and GLC2000 < 50%, MODIS 500 m and MODIS 1km around 75%, and GRUMP nearly 90%), and their user’s accuracies (how often the urban areas on the map are actually present on the ground) are also low (MODIS 500 m around 73%, GLC2000 and IMPSA 66% and 65%), with Kappa coefficients ranging from only 0.28 to 0.65 [10]. More recent studies at intermediate resolutions [16] reported overall accuracies from 73% to 99% for all urban and non-urban features, with Kappa coefficients from 0.29 to 0.84, and, for urban features only, producer’s accuracies around 80% and user’s accuracies close to 90%. Therefore, research is still needed to develop an intermediate-resolution urban extent mapping methodology that achieves consistently high accuracies (overall accuracy, producer’s accuracy, user’s accuracy, and Kappa), is repeatable for different times, and is scalable to continental-to-global applications [7
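The spread among the four global area totals quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper name `percent_larger` is hypothetical, and it assumes the percentages are measured relative to the smallest estimate (IMPSA), which is the reading that reproduces the 15% and 516% figures.

```python
# Global urban area totals (km^2) as quoted in the text.
areas_km2 = {
    "IMPSA": 572_000,
    "MODIS 1km": 657_000,
    "MODIS 500 m": 727_000,
    "GRUMP": 3_524_000,
}


def percent_larger(value, baseline):
    """Percent by which `value` exceeds `baseline` (hypothetical helper)."""
    return (value - baseline) / baseline * 100.0


# Assumption: differences are expressed relative to the smallest estimate.
baseline = min(areas_km2.values())
differences = {
    name: round(percent_larger(area, baseline))
    for name, area in areas_km2.items()
    if area != baseline
}
print(differences)
```

Under this reading, MODIS 1km exceeds IMPSA by about 15% and GRUMP by about 516%, matching the range given in the text.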
Both nighttime light [7
] and daytime spectral satellite data [10
] have been used to map urban extent, and they tend to be complementary for characterizing urban areas [16]. The combination of nighttime light and daytime spectral data therefore has the potential to overcome their individual limitations. However, only a few previous studies have combined nighttime light data and daytime spectral data [16
Machine learning methods have been demonstrated to perform well in land cover mapping [49
], and have been effective in urban area mapping in recent years [11
]. Schneider et al. employed a supervised decision tree algorithm (C4.5) with a one-year time series of MODIS 8-day composites of the seven land bands and the enhanced vegetation index (EVI), and produced the MODIS 500 m global urban extent data product [10
]. Hu and Weng estimated impervious surfaces from medium spatial resolution imagery using multi-layer perceptron neural networks [13
]. Zhou et al. developed a cluster-based method to map urban areas from the Defense Meteorological Satellite Program Operational Linescan System (DMSP/OLS) nighttime light data [43
]. Wan et al. relied on the Terra MODIS surface reflectance datasets and a positive and unlabeled learning (PUL) method for mapping US urban extent [45
]. Zhang et al. applied the one-class support vector machine (OCSVM) to classify different combinations of the DMSP/OLS stable nighttime light (NTL) data, MODIS normalized difference vegetation index (NDVI) data, and land surface temperature (LST) data for regional urban extent mapping [48
]. Wang et al. used a back propagation neural network to identify urban areas in China with VIIRS nighttime light and MODIS NDVI data as inputs [16
]. Li et al. experimented with support vector machine (SVM) methods to extract urban extent from LJ1-01 and VIIRS nighttime light data [20
Random forest (RF), gradient boosting machine (GBM), neural network (NN), and their ensemble (ESB) are commonly used machine learning methods in land cover mapping but have not been fully assessed in urban area mapping, especially at intermediate resolutions. The objective of this study is to explore the effectiveness of these machine learning methods for improving the accuracies of large-scale urban extent mapping at intermediate resolutions (500 m) based on the combination of the complementary VIIRS nighttime light and MODIS daytime NDVI data.
3.1. New Urban Extent Maps for CONUS
After around 3 h of program execution, four urban extent maps at 500 m resolution were produced for CONUS 2015 (Figure 6a), corresponding to the four machine learning methods (RF, GBM, NN, and ESB). Visual inspection shows that all four maps correctly reveal the spatial patterns of urban area distribution in CONUS. Zoomed-in comparisons show that their differences occur mainly where urban and non-urban features mix, either in peripheral areas or in inner urban areas bordering vegetation (e.g., big parks inside a city) (Figure 6
3.2. Comparing the Four Urban Extent Maps through Quantitative Accuracy Assessment
To rigorously evaluate which of the urban extent maps generated by the four machine learning algorithms are more accurate, a quantitative accuracy assessment was conducted against reference samples collected specifically for validation. While the overall accuracy (OA) is a commonly used index for accuracy assessment, class sample imbalance and the differing performance of classification methods across land cover types can bias it towards the majority classes at the expense of the minority classes [16]. Therefore, we generated the confusion matrices and calculated the producer’s accuracy, the user’s accuracy, and the Kappa coefficient for each of the four urban extent maps to fully assess the accuracies (Table 2
The confusion matrices show three things. First, the four machine learning methods achieve similarly high accuracies across all accuracy metrics (>95% overall accuracy, >98% producer’s accuracy, >92% user’s accuracy, and Kappa coefficients > 0.90), which have not been achieved by existing data products, previous studies, and associated methods. Second, the ESB of RF, GBM, and NN does not produce significantly better accuracies than the three individual machine learning methods. Third, GBM generated more misclassified validation samples in total (121) than RF (107), NN (104), and the ESB (109), by about 13%, 16%, and 11%, respectively, with NN having the fewest. If we must pick the best and worst among these four machine learning methods, NN performs the best and GBM the worst, based on their respective low and high numbers of misclassifications, although all four reach a similar level of accuracy across all metrics.
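For readers less familiar with these metrics, the following minimal sketch computes overall accuracy, producer’s accuracy, user’s accuracy, and Cohen’s Kappa for the urban class from a two-class confusion matrix. The function name and all sample counts are hypothetical and are not taken from Table 2; the second example only illustrates how overall accuracy can stay high under class imbalance while Kappa and producer’s accuracy collapse.

```python
def binary_accuracy_metrics(tp, fn, fp, tn):
    """Accuracy metrics for the urban class from a 2x2 confusion matrix.

    tp: reference urban mapped as urban
    fn: reference urban mapped as non-urban (omission error)
    fp: reference non-urban mapped as urban (commission error)
    tn: reference non-urban mapped as non-urban
    Assumes every row and column of the matrix has at least one sample.
    """
    n = tp + fn + fp + tn
    overall = (tp + tn) / n
    producers = tp / (tp + fn)  # share of real urban samples that were mapped
    users = tp / (tp + fp)      # share of mapped urban samples that are real
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (overall - expected) / (1 - expected)
    return {
        "overall": overall,
        "producers": producers,
        "users": users,
        "kappa": kappa,
    }


# Hypothetical counts: 100 urban and 900 non-urban reference samples.
good_map = binary_accuracy_metrics(tp=90, fn=10, fp=5, tn=895)
# Hypothetical map that misses most urban samples but still scores a high OA.
poor_map = binary_accuracy_metrics(tp=10, fn=90, fp=0, tn=900)

for name, metrics in (("good", good_map), ("poor", poor_map)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

In the second case the overall accuracy is still 91% because the non-urban majority dominates, which is why the assessment reports all four metrics rather than overall accuracy alone.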
The ESB of RF, GBM, and NN may fail to outperform the individual machine learning methods for several reasons: the RF-, GBM-, and NN-based urban extent data products are mostly consistent, so none adds significant new information to the others; each of them has already achieved quite high accuracy (not many errors); or their outputs are correlated because they use the same data inputs, especially the satellite data. Given the additional computing resources needed to run the ESB and its inability to improve significantly on the accuracies of the individual machine learning methods, constructing an ensemble from RF, GBM, and NN appears unnecessary for such urban extent mapping applications.
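The argument that an ensemble adds little when its members already agree can be illustrated with a toy example. The text does not state how the ESB combines its members, so the sketch below assumes simple averaging of per-pixel urban probabilities followed by a 0.5 threshold; the function name and all probability values are hypothetical.

```python
def ensemble_average(prob_by_model, threshold=0.5):
    """Average per-pixel urban probabilities across models, then threshold.

    Assumption: unweighted probability averaging. The ensemble rule used in
    the study is not specified in the text.
    """
    models = list(prob_by_model.values())
    n_pixels = len(models[0])
    if any(len(probs) != n_pixels for probs in models):
        raise ValueError("all models must score the same pixels")
    mean_prob = [
        sum(probs[i] for probs in models) / len(models)
        for i in range(n_pixels)
    ]
    labels = [1 if p >= threshold else 0 for p in mean_prob]
    return mean_prob, labels


# Hypothetical urban probabilities for four pixels from three models.
probabilities = {
    "RF": [0.90, 0.20, 0.60, 0.40],
    "GBM": [0.80, 0.10, 0.40, 0.45],
    "NN": [0.95, 0.30, 0.70, 0.35],
}
mean_prob, esb_labels = ensemble_average(probabilities)
rf_labels = [1 if p >= 0.5 else 0 for p in probabilities["RF"]]
print(esb_labels, rf_labels)
```

Here the ensemble labels are identical to the RF labels: when the members are highly correlated, averaging changes few decisions while still requiring every model to be trained and run.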
The differences in accuracy between NN and RF are negligible, while GBM performs slightly worse than both (its user’s accuracy is about 1% lower than that of RF and NN, and its number of misclassified validation samples is about 15% higher). Therefore, NN and RF should be better choices for intermediate-resolution urban extent mapping.
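The relative gap in misclassified validation samples can be recomputed directly from the counts reported above (RF 107, GBM 121, NN 104, ESB 109). The helper name `excess_percent` is hypothetical, and the sketch assumes each percentage is taken relative to the method GBM is being compared with.

```python
# Total misclassified validation samples per method, as reported in the text.
misclassified = {"RF": 107, "GBM": 121, "NN": 104, "ESB": 109}


def excess_percent(count, reference):
    """Percent by which `count` exceeds `reference` (hypothetical helper)."""
    return (count - reference) / reference * 100.0


# Assumption: each percentage is relative to the method compared against.
gbm_excess = {
    name: round(excess_percent(misclassified["GBM"], count))
    for name, count in misclassified.items()
    if name != "GBM"
}
print(gbm_excess)
```

Under this assumption GBM produces roughly 11% to 16% more misclassified samples than the other three methods.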
Additionally, NN and RF may outperform GBM because RF can break the correlation between the predictions of individual base learners, which tends to reduce the variance of the final predictions, and because NN is flexible and capable of effectively representing both structured and non-structured data (pixel values).
3.3. Comparing the New Urban Extent with Existing Data Products
The comparison between our newly created urban extent (year 2015) and existing urban extent data products is intended to characterize their differences and confirm whether the new urban extent data perform well in delineating urban areas. Not all the datasets listed in Table 1 were available to us at the time of this study. We chose the available GRUMP urban extent (1 km for the year 1995) and GlobCover artificial surfaces (309 m for the year 2009) datasets and selected US cities for the comparison. As no urban extent data products were available for the same year (2015), urban extent data for different nominal years were used.
shows the comparison between our new urban extent and the GRUMP urban extent. These datasets are 20 years apart, so the comparison cannot separate actual urban change from improvements due to the new methodology. Nevertheless, given that the urban extent of each city in 1995 should be no larger than in 2015 (most US cities are in fact sprawling according to the sprawl analysis [55]), the comparison indicates that our new urban extent data product delineates the boundaries between urban and non-urban areas more realistically when judged against the base map, which is dated between 2016 and 2017.
shows the comparison between our new urban extent (2015) and the urban extent extracted from GlobCover (2009) for the class “artificial surfaces.” These two datasets are 6 years apart, and a comparison of Landsat images from 2009 and 2015 showed no significant urban expansion in the Baltimore and Philadelphia urban areas during that period. It appears that GlobCover detected only the core urban portions and missed the peripheral urban portions, while our new urban extent data product correctly identified both, even though the two products use similar definitions of urban (urban areas >50%).
These machine learning methods, especially NN and RF, have the potential to be applied to continental-to-global scale urban extent mapping at intermediate resolutions. However, as the characteristics of urban areas in other parts of the world might differ from those in CONUS, further study is needed before these methods can be applied to other continents or globally. For example, there are recognized issues in using nighttime light satellite data to capture urban areas in poorly lit countries such as North Korea or African countries, or in European countries that have deliberately decided to light urban areas sparingly [67] (American cities are known to have five times more lights per capita than European cities). While incorporating daytime MODIS spectral data with VIIRS nighttime light can help to mitigate these issues, they may still affect the calibration of the machine learning methods and limit their performance when the methods are applied directly to other continents or at a global scale. One possible solution is to train and calibrate the machine learning methods by continent; however, the effectiveness and accuracies achievable can only be determined through further testing in other world regions. At the very least, these methods can be used to effectively map urban areas in the US and other developed or well-lit countries of the world.
Although several recent studies have explored the most suitable sensors and methods for mapping urban extent, and data products derived from high-resolution sensors (e.g., Landsat, Sentinel-1) and very high-resolution sensors (e.g., TerraSAR-X) are currently available, producing these kinds of data products is time-consuming, especially when repeated mapping at large spatial scales is needed. Further, when such high-resolution urban extent data products are applied to continental-to-global scale models and applications, the data usually need to be resampled to lower resolutions because of the huge data volume involved and the computing resources needed. A recent study [30] based on 30 m Landsat-derived urban extent has demonstrated that 480 m resolution performs the best for urbanization process analysis. Therefore, improving the accuracy of intermediate-resolution urban extent mapping at 500 m with VIIRS nighttime light, MODIS daytime spectral data or other similar satellite data, and machine learning methods is still needed by data users, even when open global urban extent products at relatively high resolutions exist.
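The resampling step mentioned above can be sketched as block aggregation of a fine binary urban grid into coarse cells. This is a minimal illustration only, not the procedure used in [30]: the function names and the toy grid are hypothetical, and the majority rule (urban fraction > 50%) is an assumption borrowed from the urban definition quoted earlier. Going from 30 m to 480 m would correspond to `factor=16`; the toy example uses `factor=2` to stay readable.

```python
def aggregate_urban_fraction(grid, factor):
    """Aggregate a binary urban grid (1 = urban) into coarse-cell fractions.

    Each coarse cell covers a factor x factor block of fine pixels and stores
    the share of those pixels that are urban.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            urban_pixels = sum(
                grid[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
            )
            coarse_row.append(urban_pixels / (factor * factor))
        coarse.append(coarse_row)
    return coarse


def to_binary(fraction_grid, threshold=0.5):
    """Label a coarse cell urban when its urban fraction exceeds the threshold."""
    return [[1 if f > threshold else 0 for f in row] for row in fraction_grid]


# Hypothetical 4x4 fine-resolution urban mask.
fine = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
fractions = aggregate_urban_fraction(fine, factor=2)
coarse_urban = to_binary(fractions)
print(fractions, coarse_urban)
```

The intermediate fraction grid is also useful on its own, since it preserves how much of each coarse cell is built up before the binary decision is made.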
As shown in the results and analyses, the major differences or inconsistencies between the urban extent data products generated by different machine learning methods or other studies are located mainly in the peripheral portions of urban areas or along the borders between built-up areas and vegetated urban areas (e.g., parks), where mixed pixels are mainly located. Therefore, introducing unmixing analysis in the future may further increase urban extent mapping accuracies or provide another useful data product representing the degree of urbanization. For example, through unmixing, the percentage of built-up area coverage in each pixel can be extracted and, thus, a non-binary continuous urban extent data layer can be generated. This might be helpful for mapping low-density sprawl settlements and for some urban models or applications.
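To make the unmixing idea concrete, the sketch below applies the simplest possible case: a two-endmember linear mixture on a single NDVI value, where a pixel is treated as a blend of pure built-up and pure vegetated surfaces. The text does not specify an unmixing method, so this formulation, the function name, and the endmember NDVI values (0.1 and 0.8) are illustrative assumptions only.

```python
def built_up_fraction(ndvi_pixel, ndvi_built=0.1, ndvi_veg=0.8):
    """Estimate the built-up share of a pixel with two-endmember unmixing.

    Linear mixing model: ndvi_pixel = f * ndvi_built + (1 - f) * ndvi_veg,
    solved for f and clipped to the physically meaningful range [0, 1].
    The endmember values are hypothetical placeholders.
    """
    fraction = (ndvi_veg - ndvi_pixel) / (ndvi_veg - ndvi_built)
    return max(0.0, min(1.0, fraction))


# Hypothetical NDVI values: urban core, mixed urban fringe, dense vegetation.
samples = [0.10, 0.45, 0.90]
fractions = [built_up_fraction(v) for v in samples]
print([round(f, 2) for f in fractions])
```

A layer of such fractions would be continuous rather than binary, which is the kind of product that could represent low-density settlements along the urban fringe.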
Urban land changes occurring at small scales may not be detectable at the relatively coarse resolution of VIIRS and MODIS. When such urban changes matter for an application (e.g., local applications), higher-resolution satellite data such as those from Landsat and Sentinel-2 need to be introduced, and the effectiveness of machine learning methods in improving urban extent mapping at such scales can then be explored.