Global Harmonization of Urbanization Measures: Proceed with Care

By 2050, two-thirds of the world’s population is expected to be living in cities and towns, a marked increase from today’s level of 55 percent. If the general trend is unmistakable, efforts to measure it precisely have been beset with difficulties: the criteria defining urban areas, cities and towns differ from one country to the next and can also change over time for any given country. The past decade has seen great progress toward the long-awaited goal of scientifically comparable urbanization measures, thanks to the combined efforts of multiple disciplines. These efforts have been organized around what is termed the “statistical urbanization” concept, whereby urban areas are defined by population density, contiguity and total population size. Data derived from remote-sensing methods can now supply a variety of spatial proxies for urban areas defined in this way. However, it remains to be understood how such proxies complement, or depart from, meaningful country-specific alternatives. In this paper, we investigate finely resolved population census and satellite-derived data for the United States, Mexico and India, three countries with widely varying conceptions of urban places and long histories of debate and refinement of their national criteria. At the extremes of the urban–rural continuum, we find evidence of generally good agreement between the national and remote sensing-derived measures (albeit with variation by country), but identify significant disagreements in the middle ranges where today’s urban policies are often focused.


Introduction
The United Nations forecasts that, by 2050, two-thirds of the world's population will live in cities and towns, well above today's level of 55 percent, with further increases in store [1]. Rigorous, scientifically justifiable assessments of the changes in urban population and in the extents and forms of the built-up environment will be needed to face the challenges of the coming century, namely, urban and rural economic development [2][3][4][5][6], poverty alleviation and inequality [7][8][9], disease transmission [10,11], health and health disparities [12,13], carbon emissions [14,15] and climate change [16][17][18][19][20]. These issues demand rigorous investigation across countries and over time, but scientific studies of urban change have often been thwarted by the heterogeneities of national urban definitions [1,21]. Even within the "statistical urbanization" camp, in which urban definitions are couched solely in terms of population density, contiguity and size, there exist divergent views on the proper measurement of urban population and land area [22] and, arguably, an over-reliance on population density as the definitive indicator of urban locations. Recognizing that density measures alone are likely to be inadequate for many purposes, researchers are now beginning to explore the prospects for merging satellite-derived data with on-the-ground socioeconomic measures drawn from spatially disaggregated population censuses and surveys [20,22-26]. A necessary first step in this emerging program of research is to assess areas of agreement and disagreement between global remotely-sensed and country-specific urban concepts and measures.
Although many researchers have lamented the non-comparability of country-specific definitions, there has perhaps been an insufficient appreciation of how these definitions have evolved to reflect national priorities and economic development strategies. Taking a comparative approach, this research study examines three countries that vary significantly in their urban criteria: the United States, Mexico and India. We aim to achieve a better understanding of how the definitions adopted by these countries compare with classifications derived from remotely sensed data [27][28][29][30][31]. Such differences have seldom been systematically quantified and evaluated from the shared perspectives of the scientific communities concerned with local governance, demography, land cover and remote sensing. In particular, given its endorsement in March 2020 by the United Nations Statistical
Mexico presents a more complicated case. Especially over the 2000-2012 period, national policies and funding strongly encouraged the development of large-scale, low-density housing in urban peripheries, motivated by the desire to improve access to housing for poor and lower-middle-income groups [41]. These well-meaning policies did not cover the costs of land acquisition, an oversight that led to the development of rural plots of land that were only later converted to urban plots on a fragmentary, case-by-case basis. Consequently, many of Mexico's new low-income communities in the peripheries have lacked essential public services and adequate infrastructure, including transport to more centrally located sites of employment. For much of the time span of our research study, before the initiation of post-2013 policy reforms, the contrasting situations of the U.S. and Mexico are well summarized by [41] (p. 31):
The resulting sprawl of Mexican cities is different from suburbanization in the United States during the 1960s and 1970s, where middle-class households moved to suburbs for more space with better amenities and schools. Instead, urban growth in Mexico has been connected to the fissure between new, peri-urban developments and more central neighborhoods in terms of the provision of infrastructure and services (including health and education), connectivity, access to sources of employment and urban amenities.
Judged in terms of built-up percentages and population density alone, the patterns of urban sprawl in Mexico and the U.S. might appear to be quantitatively similar, but any such similarities are superficial; the inhabitants of urban peripheries in these two countries differ in their socioeconomic standing, access to public services and connections to employment.
India occupies a distinctive position in the set of study countries. According to the country's official criteria, its urban percentage in 2011 was only 31.2 percent, well under half the percentages of Mexico and the U.S. at the time. However, in the Indian context, the meaning of "urban" is jurisdictional: urban populations are defined in terms of the people living within the boundaries of legally urban local governments. (A partial exception is made for census towns and outgrowths, which are legally rural despite having urban-like features in terms of density, size and the extent of non-agricultural employment. A legally rural settlement is eligible to be designated as a census town if it has a population expected to be 5000 or above in the upcoming census, an expected density of 400 persons per km² and if 75% of the male main workforce is likely to be engaged in non-agricultural activities. Outgrowths are areas of high-density, arguably urban settlement that are spatially adjacent to statutory cities and towns and which would thus seem to be poised on the threshold of becoming legally urban. In the meantime, however, they too continue to be governed by rural local governments.) Before approving the transition of a rural settlement to legally urban status, Indian state-level authorities must weigh the fiscal costs (loss of access to dedicated, relatively plentiful rural development funds) against the prospects of securing commensurate urban-dedicated funds, which have proven difficult for smaller cities and towns to obtain [42,43].
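The census-town eligibility rule described above can be written as a simple three-part test. The following is only a minimal sketch: the record type and its field names are hypothetical, and we assume that the density and workforce figures are minimum thresholds, in parallel with the "5000 or above" population criterion.

```python
from dataclasses import dataclass


@dataclass
class RuralSettlement:
    """Hypothetical record; field names are illustrative, not census variables."""
    expected_population: int          # projected for the upcoming census
    expected_density_per_km2: float   # persons per square kilometre
    male_nonfarm_share: float         # share (0-1) of male main workers outside agriculture


def is_census_town_candidate(s: RuralSettlement) -> bool:
    """Apply the three eligibility criteria described in the text.

    Assumption: the density and workforce figures are read as minimum
    thresholds ("at least"), mirroring the "5000 or above" population rule.
    """
    return (
        s.expected_population >= 5000
        and s.expected_density_per_km2 >= 400
        and s.male_nonfarm_share >= 0.75
    )


# Two made-up settlements, for illustration only.
large_village = RuralSettlement(8200, 950.0, 0.40)   # dense but mostly agricultural
market_town = RuralSettlement(6100, 720.0, 0.81)     # meets all three criteria
print(is_census_town_candidate(large_village), is_census_town_candidate(market_town))
```

The first example illustrates why a large, dense village can remain outside the census-town category: density and size alone do not suffice when the workforce criterion is not met.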
Perhaps as a consequence, a substantial proportion of India's rural population lives in large, dense, legally rural villages that elsewhere might be accorded urban status. According to [44][45][46], in 2011 there were 155,732 Indian villages with at least 1000 population and a density of 400 persons per km² or more, accounting for almost 80 percent of the country's total rural population. The high densities of these villages raise concerns about the adequacy of sanitation and other services that would normally be addressed through urban infrastructure programs. High rural population densities also present a challenge to density-centered "statistical" definitions of urban.
Some researchers contend that the tight linkage between India's jurisdictional criteria and its fiscal system causes the country's urban percentages to be seriously understated [47]. A more sympathetic view is that urban entities in India are regarded as being fundamental units of decentralized governance, a position that was formalized in India's constitutional amendments of 1993 [43]. Since then, the country has struggled to find an operationally sustainable model for the local end of the urban governance tier.
As this brief account suggests, in the three study countries much recent policy attention has been directed to urban peripheries, smaller urban centers, and larger rural settlements, all of which are situated in the middle ranges of the urban-rural continuum. In these spaces, remotely-sensed methods may well excel in producing rigorous estimates of built-up land and population densities in fine spatial detail. However, for the foreseeable future, such estimates will need to be supplemented with measures of socioeconomic composition, service provision and governance if the densities are to be properly interpreted. In the near term, accepting that remotely-sensed methods will not soon provide persuasive evidence of socioeconomic variation within small spatial units, such measures can at least identify terrain that is rapidly developing over the years between censuses and prompt on-the-ground investigation by governments into the adequacy of services and transport. In this way, the new methods can help alert local authorities to the fast-paced changes underway in their own and nearby jurisdictions.

Indian Settlements and Within-Town Wards
Settlement-specific summaries of the population and socioeconomic characteristics of all officially defined Indian settlements covered in its 2011 Census (rural villages, statutory (i.e., legally urban) towns and their wards, and the quasi-urban census towns and outgrowths) have been placed in the public domain by the Registrar General of India and are readily accessible via the national census website (https://censusindia.gov.in/2011-Common/CensusData2011.html, accessed on 1 April 2021). Unfortunately, the boundaries of 2011 Indian settlements (and within-town wards) are not yet publicly available. We have relied on a comprehensive collection of settlement boundaries in vector format, originally assembled by the private firm ML Infomap Ltd. (New Delhi, India), whose license allows the data to be displayed in research products but not redistributed (for discussion, see [25]).

Mexican Basic Geographic Areas
Mexico's official definition of urban begins with the identification of distinct settlements termed localities, which are classified as urban localities if the settlement contains 2500 or more inhabitants (see the account of [41], pp. 106-107). Each urban locality is spatially subdivided into basic geostatistical areas (AGEBs), which are groups of blocks delimited by streets, avenues, sidewalks, or other easily identifiable construction, in which land is used mainly for occupational, industrial, service provision, or commercial purposes. They are further constrained in size by the requirement that a single census enumerator should be able to canvass an AGEB. The boundaries of AGEBs are typically drawn so as to extend just beyond the current built-up area. Boundary adjustments are then made after each census to keep pace with local land development. All other land area outside urban localities is defined as rural and is spatially divided into rural AGEBs, within which rural localities (villages) are situated. To locate urban AGEBs spatially, we have relied on detailed boundary files that the National Institute of Statistics, Geography and Informatics (INEGI) has placed in the public domain (http://en.www.inegi.org.mx/temas/mg/#Downloads, accessed on 1 April 2021).

United States Census Blocks
Not unlike Mexican AGEBs, U.S. census blocks are delineated by both man-made and physical characteristics of the landscape, such as roads and rivers as well as legal and administrative boundaries in some instances. They can differ greatly in area and total population, ranging from zero to several hundred people in cities. As [38] explains, in the 2010 U.S. census an urbanized area (UA) was defined as a contiguous set of blocks each having a population density >1000 people per mi² and, when taken together, a total population in excess of 50,000. Urban clusters (UCs) were defined as a core set of contiguous census blocks with densities greater than 1000 people per mi² but a total population across such blocks of 2500-49,999 persons. Any blocks in close proximity (within 2.5 miles) to UAs and UCs were defined to be urban if their population density exceeded 500 people per mi² (a complex algorithm defines proximity; it allows short "hops" and "skips" to connect otherwise discontiguous units). Also defined as urban are some categories of land in industrial and commercial use: non-residential blocks mainly covered by impervious surfaces (pavement, parking lots, and airports) within 0.25 miles of populated urban blocks of UAs and UCs. The boundaries, populations and urban-rural status of all 2010 U.S. census blocks are accessible in the public domain (https://www.census.gov/programs-surveys/geography/technical-documentation/complete-technical-documentation/tiger-geo-line.html, accessed on 1 April 2021).
For this research study, we used the 2014 version, whose imagery was assembled some 4 years after the population censuses of Mexico and the U.S. and 3 years after the Indian census. The GHSL team applies machine-learning methods to identify the presence or absence of structures at a resolution of 30 m². Individual structures are not themselves identified, only the proportion of the grid cell occupied by one or more structures. The model has been trained not to mistake roads for structures (it does not currently identify roads as such) and does not (yet) distinguish residential from non-residential structural coverage [30,37,48-50]. Apart from a 30 m² water mask, we used the 250 m² aggregated version of GHSL-BUILT, which provides the proportion of each grid-cell that is built-up.

Gridded Population of the World and Global Human Settlement Population Layer
Two intermediate products serve to link GHSL-BUILT to the classification system of the Degree of Urbanization model. The Gridded Population of the World (GPW) ([55], Version 4.11) is based exclusively on sub-national administrative unit boundaries and total unit populations. To produce the Global Human Settlement Population Grid (GHS-POP), a dasymetric refinement method is applied to reallocate GPW population counts for administrative units to the grid cells within the unit boundaries, according to the finer-resolution, cell-specific proportions built-up as estimated by GHS-BUILT. As [56] note, "The benefit of GHS-POP is that it restricts population to built-up areas and makes its density directly proportional to the density of built-up areas (Freire et al., 2015). However, …, population may be allocated to 'non-residential' areas such as commercial, industrial and recreational areas". The assumption of direct proportionality between population and built-up densities is possibly too strong, although, admittedly, no compelling alternative is yet in hand. Alternative methodological solutions are explored in [57,58]; see [59] for a review of such approaches. Another concern warranting attention is that, due to the limited detection accuracy of GHS-BUILT in thinly settled rural areas, it is possible that GHS-POP over-concentrates administrative-unit population in the more built-up areas of the unit, which is likely to produce overestimates of urban populations relative to rural population when these data are incorporated into the DoU algorithm. Despite this limitation at the rural end of the continuum, recent studies have shown that in urban and urbanized areas, GHS-POP can provide accurate estimates of pixel-level population [52,53,56,60].
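To make the direct-proportionality assumption concrete, the sketch below reallocates one administrative unit's population to its grid cells in proportion to each cell's built-up share. It is an illustration of the general dasymetric idea only, not the GHS-POP production code; the function name is hypothetical, and the uniform fallback for units with no detected built-up area is our own assumption.

```python
def dasymetric_allocate(unit_population, built_up_shares):
    """Split a unit's population across its cells in proportion to built-up share.

    built_up_shares: one value per grid cell (proportion built-up, 0-1).
    Assumption: if no built-up area is detected anywhere in the unit,
    the population is spread uniformly over the cells.
    """
    n_cells = len(built_up_shares)
    if n_cells == 0:
        return []
    total_built_up = sum(built_up_shares)
    if total_built_up == 0:
        return [unit_population / n_cells] * n_cells
    return [unit_population * share / total_built_up for share in built_up_shares]


# A unit of 1000 people covering four cells, one of which has no detected structures.
cell_populations = dasymetric_allocate(1000, [0.5, 0.25, 0.25, 0.0])
print(cell_populations)
```

Note that the cell with no detected structures receives no population at all. This is the mechanism by which under-detection of structures in thinly settled rural areas can over-concentrate population in the more built-up parts of a unit.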

The Degree of Urbanization Model
The Degree of Urbanization (DoU) model further refines GHS-POP by assigning settlement types on the basis of the spatially located population sizes and densities of GHS-POP. Along with other inputs, these data enable the construction of a seven-fold classification of urban and rural settlements [61]. (An eighth category identifies inland open water.) Table 1 presents a simplified version of these seven classes; see [61] for discussion of additional contiguity criteria not spelled out in the table. Note, in particular, the range of settlement types in the band of population density from 300 to 1500 persons per square kilometer; within this band, some difficulties can be anticipated in any effort to cleanly separate semi-dense urban clusters and suburban or peri-urban settlements from rural clusters. Taken together, the seven classes describe the full range of the urban-rural continuum insofar as population density, contiguity and size are concerned.
Additional grid cells qualify for inclusion in the urban centres of the DoU model if they are at least 50% built-up. As [61] (p. 18) explain, "This assumption is useful for accommodating the presence in the city of large areas with low resident inhabitants but strongly functionally linked with the city, as for example large productive or commercial areas (typical case of cities in Unite(d) States of America". A built-up density threshold of 3% was applied to add grid cells to the other three urban categories (dense urban cluster, semi-dense urban cluster and suburban or peri-urban), with the rationale being given by [61] (p. 19): "Grid cells are included in the urban cluster domain only if some minimal (evidence) of physical built-up structure was recorded by an independent source (with) respect to census data. The purpose of this assumption is to increase the robustness of the GHSL SMOD response by forcing consistency between census-derived sources (population grids) and land cover/land use sources (built-up areas) mitigating the effect of misalignment, thematic bias, scale gaps or other data gaps that may be present in the data". In short, although the principal density criterion employed in the DoU is population density, a role for built-up density is integrated as well.
The implementation of the DoU depends crucially on the spatial resolution of the GPW administrative unit boundaries containing the unit populations. In the DoU global data set, fine-resolution spatial data are used for the United States and Mexico (census blocks and AGEBs), but only moderate-resolution subdistricts for India (https://sedac.ciesin.columbia.edu/downloads/docs/gpw-v4/gpw-v4-documentation-rev11.pdf, accessed on 1 April 2021). Using restricted-distribution, settlement-level data for India in Supplementary Materials, we illustrate how the resolution of the population data can affect DoU classifications.

Data-Processing
We have conducted all data-processing steps using the Python geoprocessing capabilities of ArcGIS 10.6.1. The comparisons of vector boundary data (corresponding to settlement and within-town ward boundaries in India, the boundaries of AGEBs in Mexico and those of census blocks in the U.S.) to raster data were based on the overlap of vector units with raster cell centroids and employed zonal statistics geoprocessing functions. Vector and raster data for India were projected to Asia South Albers Equal Area Conic; for Mexico, the system was North American Albers Equal Area Conic; and for the U.S., USA Contiguous Albers Equal Area Conic.
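The centroid-based overlay of vector units on raster cells can be illustrated without any GIS software. The sketch below is plain Python rather than the ArcGIS tooling used in the study. It assumes a north-up raster with square cells, coordinates already expressed in a common equal-area projection, and a simple polygon without holes; all function and parameter names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as [(x, y), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def zonal_mean(raster, origin_x, origin_y, cell_size, polygon):
    """Mean raster value over cells whose centroids fall inside the polygon.

    raster: list of rows, listed north to south.
    (origin_x, origin_y): upper-left corner of the raster.
    Returns None if no cell centroid falls inside the polygon.
    """
    values = []
    for r, row in enumerate(raster):
        centroid_y = origin_y - (r + 0.5) * cell_size
        for c, value in enumerate(row):
            centroid_x = origin_x + (c + 0.5) * cell_size
            if point_in_polygon(centroid_x, centroid_y, polygon):
                values.append(value)
    return sum(values) / len(values) if values else None


# Made-up 2 x 2 grid of built-up proportions and a census unit covering its western half.
built_up = [[0.8, 0.2],
            [0.6, 0.0]]
census_unit = [(0, 0), (1, 0), (1, 2), (0, 2)]
mean_built_up = zonal_mean(built_up, origin_x=0, origin_y=2, cell_size=1, polygon=census_unit)
print(mean_built_up)
```

Under the centroid rule, a cell belongs to at most one vector unit, so small units that contain no cell centroid receive no raster value at all; this is one reason the relative sizes of census units and grid cells matter in such comparisons.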

Comparing Official Urban-Rural Classes with GHS-BUILT and the DoU
In the final DoU model, the fundamental role of built-up density (that is, the density of structures) is somewhat obscured by the intervening boundary and population layers used in constructing the model. In addition, the dependence of DoU classifications on country-specific administrative boundaries and population counts may limit the application of the DoU method to immediate post-censal periods and to countries that place their boundaries and population counts into the public domain. Hence, there is some value to be gained in a direct assessment of GHSL-BUILT settlement proportions in relation to official urban-rural designations, since the remote-sensing and land classification programs operate independently of census data-collection and, in the future, can be expected to place new estimates in the public domain on a more frequent basis.
To explore an approach based on built-up densities, we have specified a threshold of τ = 50 percent built-up to identify potentially urban land (recall that thresholds of 50% and 3% built-up are applied in classifying grid cells in the DoU model, in addition to population densities). Appendix A provides an extensive discussion of threshold choice in what can be viewed as a simple diagnostic test, whereby built-up densities above and below the threshold in a given grid cell serve as an (imperfect) signal of the official urban-rural status of the cell.
The entries of Table 2 provide the terms we use in what follows to describe the diagnostic test outcomes in relation to the official urban-rural classification (selected results based on alternative τ thresholds are provided in Supplementary Materials).
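The diagnostic test amounts to a two-by-two cross-classification of each grid cell. A minimal sketch follows, using the four category labels that appear in the results and treating a cell as built-up when its built-up percentage is at or above τ; the function names and sample values are hypothetical.

```python
from collections import Counter


def cross_classify(officially_urban, built_up_pct, tau=50.0):
    """Return the diagnostic-test category for one grid cell.

    officially_urban: True if the cell lies in officially urban land.
    built_up_pct: built-up percentage of the cell (0-100).
    tau: threshold, in percent, at or above which the cell counts as built-up.
    """
    built_up = built_up_pct >= tau
    if officially_urban:
        return "urban agreement" if built_up else "urban, not built-up"
    return "rural, but built-up" if built_up else "rural agreement"


def tabulate(cells, tau=50.0):
    """Count cells per category; cells is an iterable of (officially_urban, built_up_pct)."""
    return Counter(cross_classify(urban, pct, tau) for urban, pct in cells)


# Made-up cells for illustration only.
sample_cells = [(True, 82.0), (True, 31.5), (False, 64.0), (False, 2.0), (True, 50.0)]
print(tabulate(sample_cells))
```

Re-running `tabulate` with a different `tau` shows directly how the threshold choice moves cells between the agreement and disagreement categories.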
In extending this approach beyond land classification to address population densities and totals, we have adopted a conventional areal weighting approach [62,63] to distribute population uniformly within the boundaries of each individual census unit of land (settlements or within-city wards in India, AGEBs in Mexico and census blocks in the U.S.). The uniformity assumption may be acceptable within very small census units but has the potential to underestimate population in the highly built-up sections of any given unit and overestimate population in its less built-up sections. However, we do not expect significant errors in the average estimates of population densities, nor do we anticipate systematic biases overall.
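Under the uniformity assumption, each piece of a census unit receives population in proportion to its share of the unit's land area. The sketch below illustrates this areal weighting for a single unit; it is a simplified illustration with hypothetical names and made-up areas, not the geoprocessing workflow itself.

```python
def areal_weight(unit_population, unit_area_km2, overlap_areas_km2):
    """Allocate a unit's population to grid cells by area of overlap.

    overlap_areas_km2: area of the unit falling within each grid cell.
    Population is assumed to be spread uniformly within the unit, so every
    piece inherits the unit's average population density.
    """
    density = unit_population / unit_area_km2  # persons per km², constant within the unit
    return [density * area for area in overlap_areas_km2]


# A made-up unit of 1200 people and 3 km² that overlaps three grid cells.
allocated = areal_weight(1200, 3.0, [1.0, 1.5, 0.5])
print(allocated)
```

Because every piece inherits the same density, the method cannot reveal within-unit concentration of population. That is the source of the underestimation in highly built-up sections and overestimation in less built-up sections noted above, while the unit total is preserved.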
Official Urban-Rural Classifications and the DoU
Similar overlay methods were used to reallocate official population counts to grid cells and thus compare officially classified census units with the more refined seven-fold classification of the DoU. The maps of each of these input layers are shown in Figure 1 for one major city and its surrounding areas in each of the study countries. A visual scan of these examples suggests generally good agreement between the official urban-rural designations and the Degree of Urbanization (DoU) classification. However, on closer inspection, areas of nuanced disagreement and ambiguity come into view. For example, in the northern part of Nassau County (to the east of New York City proper), a relatively small patch of land is officially rural, but, in the DoU classification, the rural extents are considerably larger and are intermixed with suburban or peri-urban areas. Significant discrepancies are also seen to the south of New York City in Monmouth County (New Jersey), most of which is officially urban yet DoU-classified as rural. The DoU classes (and their descriptive labels) could be regarded as helpful refinements of binary urban-rural official designations, or, when they are not in obvious agreement with such designations, could be taken as an invitation to engage in more detailed critical analysis.
Table 3 summarizes the findings on land area by urban-rural class. Note first that, by official urban criteria, India and the U.S. devote approximately equal percentages of habitable land to urban areas (3.4% and 3.6%, respectively); at 1.2%, Mexico's allotment is much lower (for international comparisons, see [64]). Of all urban land in India (whether in statutory towns, or in the outgrowths and census towns that, while legally rural, are treated as urban in India's official tabulations), about 73 percent is occupied by statutory towns, some 3 percent by outgrowths and the remaining 24 percent by census towns (figures not shown).
The rural governments that have census towns in their jurisdictions do not have direct access to the urban development funds that could otherwise be used to improve infrastructure and sanitation, nor, in general, can they impose urban rates of property taxation to raise needed revenues.

Classifications of Land
The rural agreement category (using the terminology of Table 2, estimated from a cross-comparison of official census land and GHSL built-up levels) is nearly identical to the figures for officially rural land. This close correspondence means that the sum of the rural, but built-up, urban, not built-up and urban agreement categories is also quite close to the officially urban land class. Of all officially urban land area, 58% in Mexico and 41% in the U.S. is found in the urban agreement category. However, in India, only 11% of officially urban land is classified as urban agreement; the remainder falls into the urban, not built-up category. In other words, the great majority of officially urban land in India is estimated to be less than half built-up. If this seems surprising and scarcely credible, it may be that our perceptions of high densities in Indian cities are formed mainly by impressions of high population densities, rather than by the settlement proportions being described here.
For India, the DoU classes are distributed differently from what can be seen in Mexico and the United States. The most rural category of the DoU, very low-density rural, occupies a notably smaller share of Indian habitable land (78.1%) than the 94.8% of land in Mexico and 91.9% in the United States. (The DoU category labels refer mainly to population density, not [or at least not directly] to the settlement proportion. In addition, recall that GHSL-BUILT estimates of built-up land in sparsely-settled rural areas, on which the DoU system is based, may well be downwardly biased.) Hence, the remaining rural DoU classes in India account for higher percentages of land than in the other two countries. Another difference worth noting is that, among the three most urban of the DoU land classes (urban centres, dense urban and semi-dense urban), the urban centre class takes a significantly larger share of land in Mexico (63 percent of these three categories of urban land) and especially the U.S. (72 percent) than in India (only 57 percent).
Figure 2 confirms that built-up densities of officially urban land in India fall well below the densities of urban land in Mexico and the United States (panel (a)). As panel (b) shows, the difference is mainly attributable to the low densities of the urban, not built-up areas. There is close agreement among the three countries in the mean built-up percentages of the urban agreement and rural agreement groups and in the rural, but built-up categories; only in the urban, not built-up category does India appreciably diverge from Mexico and the United States in having lower built-up levels on average. Because we define "built-up" in terms of a τ = 50 percent threshold, the averages for urban agreement and rural, but built-up must exceed 50 percent and the averages for urban, not built-up and rural agreement must fall short of that threshold.

Built-Up Density by Urban-Rural Category
What is striking is the extent to which these categories depart from the threshold value.
Perhaps the most surprising results are those for the built-up densities by DoU class. Figure 2c shows a consistent ranking of the three countries across all DoU classes. The land density percentages in the U.S. are the highest in each class, followed by those of Mexico and India, with a clear pattern of decreasing densities in Mexico and the U.S. as one moves down the urban-rural continuum. However, in India, suburban or peri-urban, semi-dense urban and dense urban areas exhibit essentially the same settlement proportions, which are noticeably lower than those found in the urban centres of the country and of course higher than those of the three rural DoU classes.
Figure 3 displays one measure of agreement/disagreement between the official urban designations and those derived from the DoU model: the percentages of all grid cells in a given DoU class that are officially urban. At the most rural end of the DoU spectrum, the two classification schemes are in general agreement. However, even in the rural clusters class, differences begin to emerge (in the United States, over 55% of grid cells in this DoU class were officially urban) and, moving up the urban-rural continuum in India and Mexico from suburban or peri-urban to urban centres, a lack of consistent agreement with the DoU becomes evident.

Official Urban Designations by DoU Class
India presents an array of difficulties for the DoU conception of the urban-rural continuum, not only in defining urban settlements in jurisdictional terms as we have discussed, but also in the percentage of urban-designated land that is less built-up. Figure 4 illustrates the case of New Delhi and its surrounding areas. The areas depicted in yellow and red, when taken together, make up the land officially designated as urban. Evidently, a significant percentage of urban land, even in India's capital, is less than 50 percent built-up. The composition of urban land near Mexico City (see Figure 5) also includes areas of less built-up land, especially in the smaller urban areas to the west of Mexico City; but as can be seen, Mexico City itself is dominated by built-up land. As with Mexico City, Figure 6 shows that New York City proper (i.e., the five counties that jurisdictionally comprise the city) is nearly all built-up, but, as in New Delhi and to a lesser degree Mexico City, the surrounding areas are officially urban but not majority built-up. In these images (20 km is their common scale), the amount of officially urban land is much larger in the New York metro area, although at a range of built-up land densities, than in the other two regions.

Population Shares and Densities
We now turn attention to population and population density, illustrated in Figure 7. This figure shows the spatial distribution of population in New Delhi, Mexico City and New York City and their surrounding areas. For context, the United Nations estimated the 2020 population of the greater urban agglomerations of these cities to be 30.3, 21.8 and 18.8 million, respectively ([1], File 22). The relatively high density of population in India in areas outside the capital, many of which are officially rural (see Figure 1), is clearly evident.
Table 4 displays the total population size and percentage share for each officially designated urban-rural category, the official-GHSL cross-classification and the seven-fold DoU classes. The census-based classifications indicate that 77.8% and 80.8% of the population is officially urban in Mexico and the U.S., respectively. As is often remarked upon, India's official urban share at the time of its 2011 census, based on legal jurisdictional criteria with additional consideration of census towns and outgrowths, was only 31.3 percent. The official-GHSL cross-classification (built-up threshold 50%) shows that 57-58% of the populations of Mexico and the U.S. lived in places of urban agreement in 2010 (areas that were defined as urban by the census and in which built-up density was 50% or greater), whereas in India, only 10.7% of the total population lived in such areas. In all three countries, just over one-fifth of the population (20-24%) lives in areas that were officially urban yet less than 50% built-up. It should be added that, in India, these urban, not built-up areas are home to some 248 million people (20.5% of the population), a total that exceeds the number of urban residents of the United States.
In the U.S., where 22.8% of people live in the urban, not built-up areas, this percentage has been the result of urban expansion (or "sprawl") into areas occupied by relatively few people but which account for large shares of land [28].
The more detailed DoU classification places a greater share of India's population (24.5%) and fewer U.S. residents (47.4%) in urban centres than are found in areas of urban agreement (a detailed analysis of these differences will be presented in Figure 8 with accompanying discussion). The distribution of the population living in the middle-range urban categories of dense urban, semi-dense urban and suburban or peri-urban varies considerably across the three countries in ways that do not seem obviously attributable to differences in national urban percentages overall. With only 3.2% of India's population residing in areas classified as suburban or peri-urban, it seems likely that the country's large, dense villages must be distributed among this class and the rural and, possibly, even the low-density rural classes. Table 5 presents the differences in population density by urban classification and country setting. We observe sizable differences across countries in the official urban census-based estimates; India and Mexico have very high urban population densities (>3300 people per km²) with much lower densities evident in the officially urban areas of the U.S. (only 886 people per km²). As for rural areas, India's are 24 and 33 times denser than the rural areas of Mexico and the United States, respectively. In the census-GHSL cross-classification (τ = 50% built-up), we observe, in areas of urban agreement, very high levels of population density in India at 10,283 persons per km² and high levels in Mexico (5336 persons per km²), but substantially lower levels in the U.S. (less than 1600 persons per km²). Interestingly, population densities in urban agreement areas in the U.S. are less than or similar to those found in areas classified as urban, not built-up in India (2549) and Mexico (1538), respectively. However, the ratios of population densities of these two classes (urban agreement:urban, not built-up) are similar in all three countries, ranging between 3.5 and 4.
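The density ratios mentioned above can be checked directly from the figures quoted in the text. The short sketch below does so for India and Mexico only, because the text gives just an upper bound (under 1600 persons per km²) for U.S. urban agreement areas; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Population densities (persons per km^2) quoted in the text for tau = 50%.
# The U.S. is omitted because the text reports only an upper bound for it.
densities = {
    "India": {"urban_agreement": 10283, "urban_not_built_up": 2549},
    "Mexico": {"urban_agreement": 5336, "urban_not_built_up": 1538},
}


def density_ratio(classes):
    """Ratio of urban-agreement density to urban, not built-up density."""
    return classes["urban_agreement"] / classes["urban_not_built_up"]


ratios = {country: density_ratio(c) for country, c in densities.items()}

for country, ratio in ratios.items():
    print(f"{country}: {ratio:.2f}")
```

The two ratios come out at roughly 4.03 for India and 3.47 for Mexico, in line with the approximate range of 3.5 to 4 stated above.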
Highlighting differences between countries, population densities of the rural, but built-up areas of India, Mexico and the U.S. are 1101, 168 and 57 persons per km², respectively. This suggests two (not mutually exclusive) possibilities: Even small, officially rural settlements may be very densely populated in India; or there may exist many legally rural settlements that might qualify by other criteria to be considered urban. Although the population share of the rural, but built-up areas is quite small, there is good reason to think that these built-up rural locations are likely to transition to census-urban over time; see Conclusions for discussion. Areas classified as rural agreement (that is, officially rural and below the τ = 50 percent threshold) are much less dense in population but still exhibit notable variation across countries: 266, 11 and 8 persons per km² in India, Mexico and the U.S., respectively. These results indicate that, while there is general consistency among countries with regard to ratios of densities by urban class, the density levels of the study countries are quite different.
For the more refined DoU classification (Table 5), we see some curious variations along the urban-rural continuum. Surprisingly, India falls below Mexico in the population densities of urban centres and has lower densities than both Mexico and the U.S. in the dense urban and semi-dense urban categories. This is certainly not expected and may signal some difficulties in applying the DoU approach to urban India. (Recall that the DoU model takes no account of differences in official urban definitions across countries. However, the administrative units used for India are coarser than those employed in the DoU models of the U.S. and Mexico; see the Supplementary Materials.) Further investigation of DoU classification performance is clearly warranted in this case, not least because India accounts for about one-seventh of the world's population.

A Combined Perspective
In this section, we assess how the DoU classes are composed, by tracing their connections to both the official-GHSL cross-classifications and to the binary urban-rural official designations. The "alluvial plots" of Figure 8 illustrate these three-way linkages, which are expressed in terms of population percentages. The colors of the figure correspond to the seven DoU classes that are shown on the far right of each plot; by following (from right to left) the color flow and the lines (and labels) that demarcate classes, one can trace the composition of these DoU classes (the Supplementary Materials provide the detailed percentages). The colors of the DoU classes are as follows: urban centres are shown in red; dense urban in brown; semi-dense urban in a lighter brown; suburban or peri-urban in yellow, marking the divide between urban and rural classes in the DoU scheme; rural population is indicated in dark green; low-density rural in a lighter green; and very low-density rural in the lightest of the green shades (the color scheme suggested in the DoU documentation).
As the figure indicates, urban centres and very low-density rural areas (the extremes of the urban-rural continuum in the DoU model) tend to agree with the official census-designated layers and census-GHSL cross-classifications. The percentage of the population in the low-density rural and rural DoU classes in India aligns fairly well with the rural agreement class, although a small portion of India's DoU-rural, low-density rural and even very low-density rural classes originate in census-designated urban populations of the urban, not built-up type. The origins of the DoU suburban or peri-urban class in India are one part urban and the other part rural. Curiously, the semi-dense urban and dense urban DoU classes derive mainly from census-rural populations. However, the urban centre class largely originates in census-urban populations, although a significant percentage of this DoU class is traceable back to census-rural populations. In short, there are a number of cross-over cases in India (whereby urban DoU classes stem from census-rural populations, or vice-versa) that call for further analysis. On the whole, such cross-overs are less evident in Mexico, although there certainly exist linkages worth investigating between DoU-rural and low-density rural populations having origins in census-urban populations. Much the same can be said for the United States, where both DoU-rural and low-density rural populations can also be traced back to census-urban origins.
Looking across the three countries, it is evident that the urban, not built-up class of officially urban populations living on less than 50% built-up land is highly mixed in terms of its representation in multiple urban and rural DoU classes. In setting priorities for further investigation, we would place these urban, not built-up areas at the top of the priority list.
It is certainly possible that the DoU algorithm successfully refines and correctly labels this important segment of national populations (which, let us recall, accounts for over one-fifth of the total population in each of the study countries), but it is also possible that further tuning of the algorithm will be needed to prevent misclassification.
Likewise, the urban agreement cross-classified GHSL-official category is somewhat mixed, contributing to several urban DoU classes and making small contributions to two or more DoU rural classes. Note that, in India, the population of DoU urban centres is roughly twice that of the urban agreement class; in other words, a substantial share of the urban centre population lives on land that is less than half built-up. This can occur in the DoU algorithm for land areas that meet the population size and population density criteria; the 50% built-up test is only applied to grid cells that do not meet these population criteria. A similar if less pronounced pattern is also evident in Mexico.
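The ordering of tests described above can be illustrated with a small sketch. It is not the DoU implementation: the function name and the default density threshold are hypothetical placeholders, and the contiguity and total-population criteria are left out entirely. It shows only why a cell that passes the population-density criterion is never subjected to the built-up test, so that urban-centre population can sit on land that is less than half built-up.

```python
def cell_qualifies(pop_density, built_up_pct,
                   density_min=1500.0, built_up_min=50.0):
    """Return True if a grid cell is a candidate for an urban cluster.

    density_min is an illustrative placeholder (persons per km^2), not a
    value taken from this section; built_up_min is the 50% built-up test
    mentioned in the text. Contiguity and cluster-size rules are omitted.
    """
    if pop_density >= density_min:
        # Population criterion met: the built-up share is not consulted.
        return True
    # Only cells failing the population criterion face the built-up test.
    return built_up_pct >= built_up_min


# A dense but sparsely built-up cell still qualifies.
dense_low_built = cell_qualifies(pop_density=2000, built_up_pct=20)
# A sparsely populated but heavily built-up cell qualifies via the second test.
sparse_high_built = cell_qualifies(pop_density=500, built_up_pct=60)
# A cell failing both tests does not qualify.
sparse_low_built = cell_qualifies(pop_density=500, built_up_pct=20)
```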
We should again emphasize that apparent inconsistencies in classification do not in themselves cast doubt on the DoU method. Instead, they may reveal that urban forms are simply too varied to be forced into national, binary, urban-rural containers. Nor is it obvious that apparent mis-classifications in India necessarily contain any lessons for Mexico and the United States: what is considered urban in one country may not be regarded as urban in another. Another complicating factor needs to be mentioned: Rural-like areas can be found amid surrounding urban terrain, such as when natural amenities akin to parks lie within otherwise fully urban areas, a situation that may be expressed in what appears to be misclassification. Finally, as shown in the Supplementary Materials, all these results are dependent on the chosen thresholds of built-up percentages used in both the DoU and the census-GHSL cross-classification and are also dependent on the spatial resolution of the input population data.

Discussion
In exploring several ways to represent rural-urban classes from population-based, land-based and combined perspectives, we have conducted what is, in effect, one independent evaluation of the DoU classification system, which can be considered the first global data product that represents the full range of the urban-rural continuum. The DoU system is well conceived and, given its purpose, is appropriately free of country-specific urban-rural designations. From one point of view, it can be seen as a diagnostic tool that advances our understanding of official urban differences among countries, with the potential to produce insights that could eventually prompt revision of the official statistics and improve planning and economic development programs [65,66]. Before such benefits can materialize, however, users must first understand precisely how the DoU model departs from country-specific urban conceptualizations. Only then can such a global schema be taken up by countries with differing urban definitions or, indeed, with no urban-rural designations at all.
Our results reveal some similarities among country-specific and harmonized models at the extremes of the urban-rural continuum. In the middle ranges, however, there exist inconsistencies that need to be resolved. Areas that we term urban, not built-up have an especially high priority for further investigation. If country differences are most pronounced in this heterogeneous span of the continuum (because, say, growth in population and land-convergence is most likely to occur here, or because these spaces present challenges to transportation or climate adaptation policies), then no global classification model is likely to capture all important local variation. Modern technology, data and methods enable the creation of valuable, scientifically comparable new global datasets, but such datasets must inevitably sacrifice some aspects of local knowledge. Researchers and policy-makers need to be aware of the DoU limitations and constraints to fully understand its fitness for use. The point applies more generally: For all such urban-proxy research programs, both data producers and the growing community of users stand to benefit from sustained engagement and critical feedback.
Our research study and findings are subject to several limitations that should be borne in mind. First, it goes almost without saying that the more accurate is the detection of built-up land, the more reliable is the integration of the land perspective in urban definitions. The GHSL-BUILT method is known to face detection and classification challenges in low-density rural areas; improving its performance is the focus of much current research [51,58]. Second, there are concerns about variation in the resolution and units of the population data. As an illustration for India in the Supplementary Materials shows, quite different patterns of DoU classification emerge when the population census data are used at a settlement-level resolution rather than in the coarser subdistrict resolution of the global DoU dataset. Third, as discussed in Appendix A, the appropriate selection of built-up thresholds (which affect not only the GHSL-official cross-classification, but also the composition of the urban classes of the DoU model) needs to be established more rigorously than it has been in the literature. A fourth difficulty has to do with the technique by which population data are allocated spatially; there is potential to further improve the accuracy of the population statistics relative to the dasymetric methods used here. On this point, it has been shown that at least in the global South, the dasymetric refinement of populations using multivariate approaches typically outperforms the simpler GHSL-POP method by integrating additional explanatory variables and implementing different allocation techniques [57,58]. Although the development of advanced methods for population allocation is an active field of research [59,67,68], this on-going research has yet to engage in spatially-explicit studies of place-based characteristics in areas marked by agreement and disagreement between alternative urban classifications.
Therefore, it is not yet known whether disagreement is more likely to be found in the peripheries of large or small cities, or in more or less economically developed subnational regions, etc. Focused analyses would help to improve future urban modeling.

Conclusions
The production of harmonized, globally-comparable statistical measures of urbanization is a remarkable scientific achievement. Even in their first-generation versions, GHSL-BUILT and the Degree of Urbanization model exhibit the potential to revolutionize understanding of country-specific urban definitions; night-time lights and similar remotely-sensed measures might add even further value [32,[69][70][71][72][73][74][75]. In particular, radar-based methods for measuring urban vertical expansion and building volume are likely to complement and extend our understanding of the multiple dimensions of urban growth [76,77]. When they are combined with population and socioeconomic data, such models can produce much-needed, finely-detailed portraits of urbanization along the full urban-rural span. There are excellent near-term prospects for even deeper integration of social and demographic data than the Degree of Urbanization model has achieved. In many countries, data to be released in the 2020 round of national population censuses will be more spatially detailed, with more easily accessible boundaries, than was the case at any point in the past. Valuable demographic resources are coming into the public domain; they must be enlisted in the harmonization effort.
While the globally harmonized approaches are being improved, it is vitally important not to overlook the country-specific systems already in place, however difficult they may be to formally integrate. In countries such as India that have adopted jurisdictional criteria to define urban places, legal boundaries delineate the spaces within which local governments possess the authority to act, whether to improve environmental sustainability, safeguard health, or intervene in any number of areas that are high priorities in the Sustainable Development Goals. In the United States, which has fully embraced statistical conceptions of urbanization and expunged jurisdictional names and boundaries from its urban definitions, jurisdictions nevertheless play a central and enduring role in regional policies and governance (see, for example, [78] on core-city populations and their surrounding metropolitan areas). Jurisdictional spaces may never be detectable in remotely-sensed data, but they retain local and national significance. The harmonized measures derived from such data can, however, indicate where urban-like densities of population and structures spill across jurisdictional boundaries to present multiple local and higher-level governments with coordination challenges. As the balance of sustainable development effort shifts from problem description to action and intervention, both jurisdictional and harmonized measures will need to be kept firmly in view.
As our analysis has indicated, much population and land area is located in places that are not obviously either urban or rural. One study of spatial reclassification of urban and rural areas from 1990-2010 found that such areas (in the U.S. case, rural but built-up areas) are likely to be reclassified as urban in the next census [79]. We suspect that dynamics such as these will be found to characterize other countries and urban-rural classes. If the pattern seen in the U.S. proves to hold more generally, there may be a land-cover basis for devising spatial forecasts of urbanization for applications ranging from city planning to climate modeling.
In closing, we urge that rural and urban localities be studied not in isolation, but as members of systems, with recognition of and appreciation for their multiple interconnections. A recent and otherwise sophisticated literature on the delineation of specific urban areas, focusing on the use of high-resolution satellite data in high-income, data-rich countries, has surprisingly restricted attention to the identification of individual cities rather than zooming out to survey the wider urban-rural context [80,81]. As the other papers in this Special Issue of Remote Sensing demonstrate, the bird's-eye view provided by remote sensing can contribute not only new measures of urban-rural connections and change, but also new insights with the potential to shape a theory encompassing such disparate units [82][83][84]. In the lead-up to the adoption of the Sustainable Development Goals, the High-Level Panel [85] argued forcefully for a holistic view of urban-rural-environmental systems and warned of the risks of narrowly urban-specific or rural-specific perspectives: "The post-2015 agenda must be relevant for urban dwellers. Cities are where the battle for sustainable development will be won or lost. Yet the Panel also believes that it is critical to pay attention to rural areas, where three billion near-poor will still be living in 2030. The most pressing issue is not urban versus rural, but how to foster a local, geographic approach to the post-2015 agenda."
To advance toward this long-term goal, the necessary first steps are to achieve an accurate description of the constituent local units and to situate them along an urban-rural continuum, but these are only the first steps.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.

[...] urban; if $B_i < \tau$, it is diagnosed as officially rural. This test outcome is then checked against the official urban-rural status $U_i$ of the cell (1 for urban, 0 for rural), using an overlay of urban vector data (polygons) on the GHSL built-up raster. Table 2 in the main text provides verbal labels for the joint probability distribution of the built-up binary indicator and the official urban binary indicator (for example, the probability of urban agreement is $\Pr((B_i \geq \tau) \cap (U_i = 1))$). To go deeper in the exploration of τ values, it is helpful to recast the diagnostic problem in terms of conditional probabilities. A good test would have a high conditional probability $\Pr(U_i = 1 \mid B_i \geq \tau)$ and also a high probability $\Pr(U_i = 0 \mid B_i < \tau)$, both of which must depend on the level of the τ threshold that is used.
These are the equivalents in the present case of what would be termed "true positives" and "true negatives" in a medical testing context. As the τ threshold increases, $\Pr(U_i = 1 \mid B_i \geq \tau)$ is likely to increase, but $\Pr(U_i = 0 \mid B_i < \tau)$ is apt to decrease. Thus, as τ is varied over a range of built-up densities, a generally negative relationship is traced out between the two probabilities.
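For reference, the joint and conditional probabilities discussed above can be written out together. This is only a restatement of the definitions already given, using the class labels from the main text; the shorthand symbols $p_{UA}$, $p_{UN}$, $p_{RB}$ and $p_{RA}$ are introduced here for convenience and do not appear in the original.

```latex
\begin{align*}
p_{UA} &= \Pr\big((B_i \geq \tau) \cap (U_i = 1)\big) && \text{(urban agreement)} \\
p_{UN} &= \Pr\big((B_i < \tau) \cap (U_i = 1)\big) && \text{(urban, not built-up)} \\
p_{RB} &= \Pr\big((B_i \geq \tau) \cap (U_i = 0)\big) && \text{(rural, but built-up)} \\
p_{RA} &= \Pr\big((B_i < \tau) \cap (U_i = 0)\big) && \text{(rural agreement)} \\[4pt]
\Pr(U_i = 1 \mid B_i \geq \tau) &= \frac{p_{UA}}{p_{UA} + p_{RB}}, &
\Pr(U_i = 0 \mid B_i < \tau) &= \frac{p_{RA}}{p_{RA} + p_{UN}}
\end{align*}
```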
Two additional probabilities also need consideration, which have to do with the "sensitivities" of the test, i.e., $\Pr(B_i \geq \tau \mid U_i = 1)$ (the likelihood that an officially urban grid cell is diagnosed as such by the threshold criterion) and $\Pr(B_i < \tau \mid U_i = 0)$ (the likelihood that an officially rural cell is diagnosed as rural via the threshold). This pair of probabilities also varies with the τ threshold, with urban sensitivity generally declining with τ (owing to the frequency of relatively low-density, below-the-threshold urban cells that become excluded from the $B_i \geq \tau$ diagnostic) and rural sensitivity increasing.
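A minimal sketch of how the four probabilities might be estimated for a single threshold follows. It assumes that every grid cell carries equal weight (for example, equal area), which is a simplification, and the function name, dictionary keys and toy input values are all hypothetical; they are not drawn from the data used in this paper.

```python
def threshold_diagnostics(built_up, official_urban, tau):
    """Estimate the four conditional probabilities for one threshold tau.

    built_up: built-up percentages B_i, one per grid cell.
    official_urban: official status U_i (1 urban, 0 rural), same order.
    Cells are weighted equally (an assumption of this sketch).
    A probability is None when its conditioning event never occurs.
    """
    n_ua = n_un = n_rb = n_ra = 0
    for b, u in zip(built_up, official_urban):
        if u == 1 and b >= tau:
            n_ua += 1  # urban agreement
        elif u == 1:
            n_un += 1  # urban, not built-up
        elif b >= tau:
            n_rb += 1  # rural, but built-up
        else:
            n_ra += 1  # rural agreement

    def ratio(num, den):
        return num / den if den else None

    return {
        "correct_urban": ratio(n_ua, n_ua + n_rb),      # Pr(U=1 | B>=tau)
        "correct_rural": ratio(n_ra, n_ra + n_un),      # Pr(U=0 | B<tau)
        "urban_sensitivity": ratio(n_ua, n_ua + n_un),  # Pr(B>=tau | U=1)
        "rural_sensitivity": ratio(n_ra, n_ra + n_rb),  # Pr(B<tau | U=0)
    }


# Toy example: four officially urban cells followed by four rural cells.
toy_built_up = [80, 60, 30, 10, 55, 5, 0, 2]
toy_official = [1, 1, 1, 1, 0, 0, 0, 0]
result = threshold_diagnostics(toy_built_up, toy_official, tau=50)
```

Calling the function over a grid of τ values would trace out curves of the kind discussed in the remainder of this appendix.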
For these reasons, a test design (i.e., choice of τ) that exhibits high values for all four of these probabilities is simply unattainable. How, then, would a single best threshold τ be determined? This is not a statistical question as such, but rather a question about decision-making priorities. The optimal τ threshold clearly must depend on the positive weights attached by a decision-maker to each of the four relevant probabilities. There is no particular reason to think that decision-makers in different countries, experiencing different urban-rural contexts and paces of change, would necessarily adopt the same weights and arrive at the same choice for the τ threshold.
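To make the role of the weights concrete, the sketch below scores each candidate threshold by a weighted sum of the four probabilities and picks the highest-scoring one. The weighted-sum rule is only one possible way to formalize a decision-maker's priorities and is an assumption of this example, as are the function names, the toy data, the candidate thresholds and the weight values. An undefined probability (empty conditioning event) is treated as zero for simplicity.

```python
def four_probabilities(built_up, official_urban, tau):
    """Return (correct urban, correct rural, urban sens., rural sens.)."""
    cells = list(zip(built_up, official_urban))
    n_ua = sum(1 for b, u in cells if b >= tau and u == 1)
    n_un = sum(1 for b, u in cells if b < tau and u == 1)
    n_rb = sum(1 for b, u in cells if b >= tau and u == 0)
    n_ra = sum(1 for b, u in cells if b < tau and u == 0)

    def ratio(num, den):
        # Simplification: an undefined probability is scored as 0.0.
        return num / den if den else 0.0

    return (
        ratio(n_ua, n_ua + n_rb),  # Pr(U=1 | B>=tau)
        ratio(n_ra, n_ra + n_un),  # Pr(U=0 | B<tau)
        ratio(n_ua, n_ua + n_un),  # Pr(B>=tau | U=1)
        ratio(n_ra, n_ra + n_rb),  # Pr(B<tau | U=0)
    )


def choose_tau(built_up, official_urban, weights, candidates):
    """Pick the candidate tau with the largest weighted sum of probabilities."""
    def score(tau):
        probs = four_probabilities(built_up, official_urban, tau)
        return sum(w * p for w, p in zip(weights, probs))
    return max(candidates, key=score)


# Hypothetical toy data: four officially urban cells, then four rural cells.
toy_built_up = [90, 70, 40, 20, 60, 10, 5, 1]
toy_official = [1, 1, 1, 1, 0, 0, 0, 0]
candidates = [15, 50, 80]

# One decision-maker stresses correct urban classification,
# another stresses urban sensitivity.
tau_correctness = choose_tau(toy_built_up, toy_official,
                             (0.7, 0.1, 0.1, 0.1), candidates)
tau_sensitivity = choose_tau(toy_built_up, toy_official,
                             (0.1, 0.1, 0.7, 0.1), candidates)
```

With these toy inputs the two weightings select different thresholds (80 and 15, respectively), which illustrates the point that the choice of τ reflects priorities rather than a purely statistical optimum.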
To estimate the four probabilities, we overlay the vector boundaries of officially urban land units on the GHSL-BUILT rasters of built-up density (here, as elsewhere in our analysis, water masks are used to restrict the comparison to habitable units of land). As is readily apparent in Figures A1 and A2, the choice of threshold τ has little perceptible influence on the estimated $\Pr(U_i = 0 \mid B_i < \tau)$, the probability of correct rural classification. This is due to two dominating facts about the distribution of land, namely, the percentages of rural land in the national totals for India and Mexico were extremely high (accounting for well above 90 percent of all habitable land) and the built-up percentages of rural land were very low, on average. Urban land misclassified as rural via the $B_i < \tau$ criterion certainly exists (see the main text for discussion of low built-up densities in India's officially urban settlements) and the total amount of such misclassified land does increase with τ, as would be expected; however, of all grid cells with $B_i < \tau$ the overwhelming majority remain officially rural even at high values for τ. A similar logic explains the lack of a τ effect on test sensitivity, $\Pr(B_i < \tau \mid U_i = 0)$. A large share of rural land has built-up densities below 1 percent, so that raising the threshold above that level has little impact on the percentage of all officially rural land diagnosed as rural by the $B_i < \tau$ criterion. In short, this kind of threshold test proves to be uninformative where rural land is concerned.

[Figure: Effects of varying τ percentage thresholds in the interval (1, 99) on the likelihood of correct urban and rural land classifications and test sensitivity. By "correct", we mean classifications that are in agreement with the official country-specific urban and rural designations. (a) Mexico correct classification percentages by τ threshold; (b) Mexico test sensitivity percentages by τ threshold.]
As these figures also show, however, the implications of the τ threshold for the correctness and sensitivity of urban land classification are far greater. A tradeoff between correctness (in the "true positives" sense) and sensitivity is clearly apparent; higher τ thresholds are associated with higher probabilities of correct classification given density (panel (a) of both figures) but produce lower test sensitivities (panel (b)), because raising τ has the effect of excluding relatively low-density urban cells. This presents would-be decision-makers with a dilemma: How should the two urban probabilities be balanced against each other in selecting one τ threshold?
Much as with the simple single-threshold diagnostic tool we here describe, issues of correct classification and sensitivity also arise in more heavily parameterized models such as the Degree of Urbanization, whose seven categories are based on multiple thresholds for both population density and size, as well as further parameters defining contiguity and built-up density. In this brief treatment, we cannot do justice to the complexities entailed in an analysis of threshold choice in such richer models. These issues must be left to future research.

[Figure: Population densities in selected regions of the study countries. Settlement boundaries and population from the 2011 census year for India (panel (a)) and the 2010 census year for both Mexico (panel (b)) and the United States (panel (c)).]

[Table: Degree of Urbanization (DoU) population density and size criteria. For further detail on contiguity criteria, not described here, see [61].]

[Table: Terms used to summarize comparisons between official urban-rural classifications and GHSL built-up percentages, using a built-up threshold of τ = 50 percent (see Appendix A).]