Researchers in a range of fields including economics, sociology, and geography have developed urbanization typologies to categorize cities into different classes in order to understand their similarities and differences [1]. One of the most common urbanization typologies is based on population size. However, in addition to a demographic transition, urbanization is a process that simultaneously involves the transition from a rural, agricultural-based economy to an industrial, services-based economy [6], and land cover change from more natural ecological systems to the built environment [8]. Therefore, figures on demographic changes alone are insufficient for providing information about other dimensions of urbanization. For example, urban areas may change their economic base from agriculture to manufacturing or services. In terms of urban form and structure, urbanization may result in the built environment becoming more linear, dispersed, or compact. Furthermore, not all urban areas are growing rapidly: some are stable or shrinking. Thus, multiple dimensions should be considered when constructing typologies of urbanization.
Over the last decade, there has been an increase in the use of night-time light (NTL) data collected by the Defense Meteorological Satellite Program/Operational Linescan System (DMSP/OLS) to study urbanization. NTL data measure lights on Earth’s surface, including street lights, but the signal also captures gas flares, illuminated boats, and lit agricultural fields [10]. NTL data have been shown to be correlated with urban economic activity [11], population [14], and the built environment [17]. Thus, the data offer a unique opportunity to characterize urbanization from multiple dimensions. Furthermore, the long historical archive of NTL data starting in 1992 provides a valuable time series that could be used to explore the temporal dynamics of urbanization.
Although NTL images have been used in many urban applications, there has been no systematic assessment of whether, when, where and under what conditions these data succeed or fail to correctly identify different types of urbanization around the world. The purpose of this paper is to examine these issues explicitly and systematically. We first assess how well a time series of NTL data identifies the process of urbanization, defined as a process of simultaneous changes in land cover and the built environment, economic activity, and population density. Next, we examine how well the NTL data are able to characterize urbanization typologies for different regions of the world. We quantify errors of omission and commission for different urbanization typologies across different world regions. Finally, we discuss opportunities and challenges for using time series NTL data for urbanization studies based on the findings.
The remote sensing community has generated abundant information about urban areas in a single year [21]. Among these studies, efforts to assess the accuracy of NTL data in identifying urban extent show that DMSP/OLS NTL data often overestimate the illuminated area. This overestimation is the result of a combination of factors, including the large NTL pixel size, the capability of NTL to detect sub-pixel light sources, atmospheric effects, and geolocation errors [21]. However, only a few studies monitor urbanization dynamics with time series NTL data [20]. Although differing in their approaches and scopes, these analyses commonly suggest that time series NTL data have enormous potential to characterize urbanization processes at national, regional and global scales. To date, however, there has been no systematic, global, quantitative assessment of how accurately time series NTL data can identify different typologies of urbanization as processes. We ask the following research questions in this study: Does the time series NTL signature accurately capture the urbanization changes underway on the ground? When and where are NTL data able to correctly characterize urbanization? What are the underlying reasons why NTL data succeed or fail to correctly characterize urbanization?
Our study differs from previous efforts in four respects. First, while the main focus of previous studies has been urban land extent, we use a multi-dimensional definition of urbanization and examine it as a simultaneous process of change in land cover and the built environment, economic activity, and population density. Second, we use time series NTL data and multiple Google Earth images to identify urbanization, rather than assessing a snapshot of urban activity in a single year. Third, while previous studies chose only a few select city or metro case studies, we randomly selected sample points distributed across all world regions, which enables us to compare urbanization processes across the world. Fourth, most prior studies used Landsat TM/ETM+ images to compare with NTL. Here, we use multi-temporal images from Google Earth, which combines a mix of different high spatial resolution satellite and aerial data from multiple providers, such as TruEarth® 15 m and GeoEye 0.5 m images, for a given area. Because of their high spatial resolution, Google Earth images provide more detailed information about land cover and land use than do Landsat images [30]. These high resolution images enable us to make inferences on land uses and economic activities that are not possible with Landsat data. For example, detached residential housing, commercial space, and industrial activities are often distinguishable from each other in a Google Earth image but are not in a Landsat scene.
3. Results and Discussion
3.1. Overall Accuracy
Our stratified random sampling scheme, which stratifies first by region and second by intervals of NTL data values in the end year of the study period, covers most types of urbanization in each world region. Moreover, there is a high degree of consistency in the interpretation of the ground truth images and NTL profiles among the six remote sensing experts (Table 3). Among all sampling points, the six analysts disagreed on only 14 points (accounting for 6% of all sampling points).
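A two-level stratification of this kind can be sketched as follows. This is only an illustration of the general technique, not the procedure used in this study: the pixel record layout (`region`, `ntl_end_year`), the interval width of 16 digital numbers, the assumed 0 to 63 value range, and the number of points drawn per stratum are all assumptions introduced for the example.

```python
import random
from collections import defaultdict


def ntl_interval(value, width=16):
    """Map an NTL value (assumed range 0-63) to an interval index.

    The interval width is a hypothetical choice for illustration.
    """
    return min(int(value) // width, 63 // width)


def stratified_sample(pixels, per_stratum, seed=0):
    """Draw up to `per_stratum` pixels from every (region, NTL interval) stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for pixel in pixels:
        key = (pixel["region"], ntl_interval(pixel["ntl_end_year"]))
        strata[key].append(pixel)

    sample = []
    for key in sorted(strata):  # sorted for a reproducible draw order
        members = strata[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Synthetic candidate pixels (made-up values, two regions only).
demo_rng = random.Random(42)
pixels = [
    {"id": i, "region": region, "ntl_end_year": demo_rng.randint(0, 63)}
    for i, region in enumerate(["Oceania", "India"] * 50)
]
sample = stratified_sample(pixels, per_stratum=2, seed=1)
```

Fixing the seed makes the draw reproducible, which helps when several analysts need to interpret the same set of points.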
Based on the sample points around the world, the overall accuracy of time-series NTL data in identifying urbanization typologies is 78.3%. Most of the failures are false positives; these account for 80.8% of the total number of errors. The overall sensitivity of our sample is 0.93, with 130 true positives and 10 false negatives. The high overall sensitivity value indicates that if urbanization is identified from Google Earth imagery, there is a high likelihood (93%) that we can infer urbanization from the NTL profile. The overall specificity of our sample is only 0.58, based on 42 false positives and 58 true negatives. This specificity value indicates that 42% of the time (1 − specificity), an analyst will infer urbanization from an NTL profile when there is an absence of actual urbanization (as identified from Google Earth images). In other words, there is a 42% probability of a false positive. Furthermore, the predictive value for a positive result (PV+) of 0.76 indicates that if the NTL profile has urbanization-related signatures, there is a 76% likelihood that the pixel experienced urbanization. The predictive value for a negative result (PV−) of 0.85 indicates that there is an 85% likelihood that the pixel did not experience urbanization when the NTL profile suggests an absence of urbanization-related signatures.
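The indicators above follow directly from the four counts reported in the text. The short sketch below recomputes them using the standard confusion-matrix definitions; the class and function names are hypothetical, and only the four counts come from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # urbanization on the ground, inferred from the NTL profile
    fn: int  # urbanization on the ground, not inferred from the NTL profile
    fp: int  # no urbanization on the ground, but inferred from the NTL profile
    tn: int  # no urbanization on the ground, none inferred


def accuracy_indicators(c):
    """Standard accuracy indicators; assumes every denominator is non-zero."""
    total = c.tp + c.fn + c.fp + c.tn
    errors = c.fp + c.fn
    return {
        "overall_accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "pv_positive": c.tp / (c.tp + c.fp),
        "pv_negative": c.tn / (c.tn + c.fn),
        "false_positive_share_of_errors": c.fp / errors,
    }


# Counts reported in the text for the global sample.
counts = ConfusionCounts(tp=130, fn=10, fp=42, tn=58)
indicators = accuracy_indicators(counts)
for name, value in indicators.items():
    print(f"{name}: {value:.3f}")
```

Rounded to two decimals, the results match the reported sensitivity (0.93), specificity (0.58), PV+ (0.76), and PV− (0.85), as well as the 78.3% overall accuracy and the 80.8% share of false positives among errors.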
Together, these indicators provide an assessment of the utility of time series NTL data for identifying different types of global urbanization. The key insight from these indicators is that if a location experienced urbanization (as identified by the Google Earth images), an analyst will infer urbanization from the NTL data profile 93% of the time, similar to a producer’s accuracy. However, 42% of the time, urbanization will be incorrectly inferred from the NTL profiles when no urbanization occurred on the ground, similar to a user’s accuracy. In other words, for 42% of the locations where no urbanization occurred, the NTL data profiles yield false positives, thus leading to an overestimation of urbanization.
Next, we examine the underlying conditions (e.g., geography, type of urbanization) under which types of “successes” and “failures” were most likely to occur. This is a contribution to the scientific literature because previous studies did not quantify the conditions under which NTL data could identify urbanization. To do this, we divided the 16 regions into three groups based on the overall accuracy of each region compared with the overall world accuracy (Figure 3). The overall accuracies of groups a, b, and c are respectively below, near, and above the overall world accuracy. The values and occurrences of the four accuracy indexes (sensitivity, specificity, PV+, and PV−) vary by region. The spatial distributions of correctly and incorrectly identified sampling points also vary according to region (Figure 4). Based on the information presented in Figures 3, Oceania has the highest overall accuracy (93.3%), followed by Northern America, Western-Southern-Northern Europe, China, and India (86.7%), although they have different numbers of false positives and false negatives. This indicates that time series NTL data are successful in identifying urbanization in developed regions, such as Oceania, Northern America, and Western-Southern-Northern Europe, where there is greater density of lights in urban and peri-urban areas than in non-urban or rural areas. In contrast, the overall accuracies in Central Asia and Northern Africa are low (60%), suggesting that time series NTL data do a poor job in identifying urbanization in less developed regions. Our hypothesis is that urban activities are less correlated with intensity and density of illumination in less developed regions, resulting in lower accuracies compared with developed regions.
3.2. Successes: True Positives and True Negatives
The more urban-related activities dominate the pixel, the higher the likelihood that the NTL profile correctly identifies it as being urbanized. That is, NTL data are better able to correctly identify urbanization for pixels that contain more urban-related features and activities on the ground than for those that contain fewer such features and activities. All urban core areas with high-intensity urban economic activities and peri-urban areas are correctly identified with the NTL data profile. Compared with NTL data for a single year, time series NTL data provide an account of the process of urbanization and the speed of change over the study period. Consequently, urbanization processes such as urban intensification and de-urbanization, and the trajectories and relative speeds of urbanization, can be determined from time series NTL data profiles. Taken together, time series NTL data, acting as diagnostic “spectral” profiles, can be used to determine urbanization typologies for every pixel.
We took the true positives and further analyzed them by applying a two-step clustering method [37], which categorized the points into eight classes (Figure 5a). Using the clustering result, we examined the NTL data profiles and the Google Earth images of these true positive points and then adjusted their categories according to the similarities of data value ranges and shapes of curves. That is, the more similar the data value ranges and shapes of NTL curves, the more likely they will be grouped into the same category. Based on the profiles of the eight classes, we generated stylized urbanization typologies (Figure 5b). The stylized urbanization typologies include three constant levels of urban economic activity (high, medium, and low), four urbanization classes (urban intensification, rapid, moderate, and slow urbanization), and de-urbanization. Selected NTL data profiles and Google Earth images (Table 4) illustrate the example patterns found. All NTL data curves for constant urban economic activities are essentially smooth, and the NTL data values for these typologies are >60 for high levels of activity, 55–60 for medium, and 40–45 for low. These thresholds are only approximate, depend on the region and the type of economic activities, and may be adjusted to better reflect the urbanization realities on the ground. Urban intensification means that urban economic activities existed in the start year and increased to a higher intensity later in the study period. This intensification usually occurs in the urban core areas and peri-urban areas. NTL profiles for rapid, moderate, and slow urbanization exhibit an increasing trend but differ in their degree and speed of increase, and reflect the range of urbanization trajectories across the world. De-urbanization occurs when the intensity of urban activities decreases over time, and is characterized by declines in NTL data values over the study period.
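To make the stylized typologies concrete, the sketch below assigns a single NTL time series to one of the eight classes using a fitted linear trend. It is a hypothetical rule set, not the visual interpretation used in this study: only the approximate value ranges for constant activity (>60, 55–60, 40–45) come from the text, while the slope cut-offs (`flat_tol`, `rapid`, `moderate`) and the start-year threshold for intensification (`lit_start`) are assumptions chosen purely for illustration.

```python
def linear_slope(values):
    """Least-squares slope of the values against their time index (needs >= 2 values)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def classify_profile(values, flat_tol=0.2, rapid=2.0, moderate=1.0, lit_start=40):
    """Assign one NTL profile to a stylized typology (all cut-offs are assumptions)."""
    slope = linear_slope(values)
    mean = sum(values) / len(values)

    if abs(slope) <= flat_tol:  # essentially flat profile
        if mean > 60:
            return "constant high"
        if 55 <= mean <= 60:
            return "constant medium"
        if 40 <= mean <= 45:
            return "constant low"
        return "constant, outside the approximate ranges"
    if slope < 0:
        return "de-urbanization"
    if values[0] >= lit_start:  # already lit in the start year, then brighter
        return "urban intensification"
    if slope >= rapid:
        return "rapid urbanization"
    if slope >= moderate:
        return "moderate urbanization"
    return "slow urbanization"


# Synthetic ten-year profiles (made-up values).
print(classify_profile([62] * 10))
print(classify_profile([5 + 3 * t for t in range(10)]))
print(classify_profile([50 - 2 * t for t in range(10)]))
```

A single linear slope ignores the shape of the curve, so a rule set like this is at best a starting point for the expert adjustment described above.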
Here we synthesized eight classes through visual interpretation guided by a cluster analysis (Figure 5b). However, this is only one way to generate the clusters. The number of unique urbanization typologies globally requires further study and is beyond the scope of this analysis. Testing different clustering methods, such as the K-means algorithm, the iterative self-organizing data analysis technique (ISODATA), and the fuzzy c-means clustering algorithm (FCM), is a potential way forward to obtain the minimum number of unique classes that maximizes the variation of the NTL dataset. Although our study assessed the utility of time series NTL data for identifying urbanization typologies without using an algorithm, this investigation represents a first step towards understanding and distinguishing different urbanization typologies across different world regions.
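As an illustration of one of the alternatives named above, the following is a minimal sketch of K-means (Lloyd's algorithm) applied to NTL time series profiles, where each profile is treated as a vector of yearly values. It is not the two-step clustering method used in this study; the function names, the random initialization, and the synthetic profiles are assumptions for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(profiles):
    """Element-wise mean of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def kmeans(profiles, k, iterations=50, seed=0):
    """Cluster profiles into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in profiles
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            new_centroids.append(mean_profile(members) if members else centroids[j])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels, centroids


# Synthetic profiles: three nearly constant bright pixels, three rising pixels.
flat = [[60 + d] * 5 for d in (0, 1, 2)]
rising = [[5 + d + 10 * t for t in range(5)] for d in (0, 1, 2)]
labels, centroids = kmeans(flat + rising, k=2)
print(labels)
```

Because plain Euclidean distance mixes brightness level and curve shape, the number of clusters and any normalization of the profiles would need to be chosen with the typologies in mind.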
Bearing in mind the number of correctly identified sample points (188), and taking a conservative approach to our conclusions, we summarized the general pattern of the spatial distribution of successes in identifying urbanization typologies (Table 5 and Figure 6). Constant urbanization (high, medium, and low) took place across all regions of the world but occurred more in some places and less in others. Urban intensification occurred in Southern Africa, Western-Middle-Eastern Africa, China, Eastern Asia, Southeastern Asia, and Western Asia. In contrast, of the 13 true positives in Western-Southern-Northern Europe, there was only one case of urban intensification, and there were none in either Northern America or Oceania. All types of urbanization (rapid, moderate, or slow) took place across the world. China’s urban growth is dominated by high constant urban economic activities, rapid and moderate urbanization, and urban intensification. In particular, one fourth of rapid and moderate urbanization (5 out of 20 points) occurred in China. This corroborates the characteristic of contemporary China’s urbanization: explicit and tremendous dynamics of land cover, infrastructure, and economic activities [38], which can be explicitly represented by NTL profiles.
3.3. Failures: False Positives and False Negatives
All regions have at least one type of failure. False positives and false negatives account for 80.8% and 19.2% of the total failures, respectively. Over-glow, which can be described as dim lighting detected around brightly lit areas through the scattering of light in the atmosphere, is the major challenge of using time series NTL data to generate urbanization typologies: 95% of false positives are due to over-glow from brightly lit surrounding areas.
Examples of NTL data profiles and Google Earth images of false negatives and false positives illustrate the circumstances of these errors (Table 6). Analyzing all false positive points improved our understanding of the locations preferentially affected by over-glow. Topographic effects are evident, as fewer than 20% (8 of 42) of over-glow occurrences took place in mountainous areas. This is not only because mountainous areas have fewer lights, but also because flat, open areas are more affected by over-glow effects than mountainous areas. Also noteworthy is that not only do mega- or big cities cause over-glow, but some major traffic routes also contribute to over-glow in their surrounding areas.
One-fifth of the false negative failures took place in India. This corresponds to a 15% probability of committing an error of omission in this country, and to the highest likelihood (50%) among the 16 world regions that a pixel experienced urbanization when the NTL profile suggests an absence of urbanization-related signatures. This confirms that the character of urbanization in India is different from that in China. One possible reason for false negative errors is that the places have urban-related activities but limited or no access to electric power [40]. India’s power shortage is prominent and has threatened urban and industrial growth in the country [41]. Of the 1.4 billion people in the world who have no access to electricity, India accounts for over 21% [42]. For those in India who have access to electricity, supply is often both intermittent and unreliable. In contrast, there are no false negative failures in developed regions, such as Oceania, Northern America, and Western-Southern-Northern Europe (Figure 4). We hypothesize that in these highly industrialized regions, urban-related activities are strongly correlated with outdoor and street lighting. Consequently, NTL pixels that contain urban-related activities will always have high NTL profiles.
Our analysis shows that there are no false positives in India. This does not mean that over-glow is absent there. False positives might exist beyond these sample points, as peri-urban areas are tightly connected to their urban areas, and these regions are characterized by relatively high densities of human settlements and infrastructure. Therefore, we may count such pixels as having urban-related activities even when they do not actually contain such activities. In contrast, false positives occurred more often in less developed regions, such as Northern Africa (6 out of 15) and Central Asia (5 out of 15), than in other regions. Examining the Google Earth images shows that these failures occurred in wide-open areas that lack infrastructure and human settlements within a 25 km² area.
In this paper, we systematically examined the ability of NTL time series data to characterize urbanization and further analyzed cases of “successes” and “failures”. Our research fills an important knowledge gap by quantitatively assessing the ability of NTL data to characterize different types of urbanization.
First, accuracy assessment indicators provide a comprehensive picture of the utility of time series NTL data for identifying global urbanization typologies. The key insight is that, if urbanization occurred, there is a high likelihood (93%) that NTL time series will accurately identify these transitions in our global sample. This corroborates earlier findings that NTL data are successful at characterizing urbanization when urbanization actually occurred on the ground. We also found that 42% of the time, urbanization is inferred from the NTL data profile when no urbanization occurred, thus leading to an overestimation of urbanization.
Second, through examining the types of errors that were most likely to occur, the results indicate that most of the failures are false positives, accounting for 80.8% of the total errors, and that 95% of the false positives are due to over-glow from lights nearby, which is the major challenge of using time series NTL data to differentiate urbanization typologies. With well-calibrated and finer spatial resolution nighttime lights data, such as the new Visible Infrared Imaging Radiometer Suite (VIIRS) instrument, it may be possible to model, mitigate, and even remove over-glow using an atmospheric radiative transfer model. This would resolve a large percentage of the errors found in this study.
Third, the analysis identifies where and under what conditions the NTL data succeed or fail to characterize urbanization. The results illustrate that the more urban-related activities dominate a pixel, the higher the likelihood that NTL data correctly identify it as urbanized. Notably, this study is the first to show geographically which types of errors are likely. There are no false negatives in developed regions, such as Oceania, Northern America, and Europe, and false positives are prevalent in developing regions, such as Northern Africa and Central Asia. Consequently, Oceania has the highest overall accuracy (93.3%), followed by Northern America, Western-Southern-Northern Europe, China, and India (86.7%), although they have different numbers of false positives and false negatives. These findings suggest that time series NTL data are successful in identifying urbanization in developed regions and do a comparatively poor job in less developed regions, which calls for caution when using or interpreting NTL data in these areas.
Using time series NTL data (rather than single-year data) and treating them as continuous “spectral” profiles can improve the capability to identify trajectories of urbanization, because time series NTL data profiles represent the evolution of urban characteristics as captured by a single pixel through time. For this reason, generating consistent time series NTL data profiles will play a significant role in the study of the development of global urbanization typology over a longer period (1992 to present). After several years of data acquisition, it will be possible to conduct a similar and comparative study on the newly available data from the VIIRS instrument, which continues the low light imaging measurements of the OLS, with substantial improvements in calibration, spatial resolution and levels of quantification.