Three types of data were needed to generate high-resolution grids of India’s urban areas and population: settlement-level demographic data from the 2011 population census; spatial boundaries that delineate these settlements; and remotely sensed data on built-up area. Each is described in turn.
2.1.1. Population Census Abstracts
In a welcome departure from previous practice, India’s Office of the Registrar General and Census Commissioner has placed into the public domain a very large collection of detailed, settlement-specific tabulations of 2011 population census data. For the purposes of this research, the key tabulations come in the form of what are termed primary census abstracts (PCAs)—they cover places ranging in size from tiny rural villages to small- and medium-sized towns and upward to the largest of India’s municipalities, providing information on the population of each settlement, its number of households, and selected additional characteristics. Complementary spatial data—to be described below—are available for a total of 4041 legally-constituted urban areas (statutory towns), 3893 so-called census towns, and 640,930 rural villages. The statutory towns are further subdivided into wards, with an abstract produced for each ward. Some of these wards are additionally designated as outgrowths.
The four urban categories (statutory towns, wards, outgrowths, and census towns) require a few words of explanation. Statutory towns are governed by one of the many forms of urban local governmental authority that exist in India, with the Constitution giving considerable latitude to state governors in decisions about whether and what type of authority to establish. The legal basis is set out in Part IXA, Article 243Q of the Constitution of India, as follows:
“Constitution of Municipalities. (1) There shall be constituted in every State, (a) a Nagar Panchayat (by whatever name called) for a transitional area, that is to say, an area in transition from a rural area to an urban area; (b) a Municipal Council for a smaller urban area; and (c) a Municipal Corporation for a larger urban area, in accordance with the provisions of this Part: Provided that a Municipality under this clause may not be constituted in such urban area or part thereof as the Governor may, having regard to the size of the area and the municipal services being provided or proposed to be provided by an industrial establishment in that area and such other factors as he may deem fit, by public notification, specify to be an industrial township. (2) In this article, ‘a transitional area’, ‘a smaller urban area’ or ‘a larger urban area’ means such area as the Governor may, having regard to the population of the area, the density of the population therein, the revenue generated for local administration, the percentage of employment in non-agricultural activities, the economic importance or such other factors as he may deem fit, specify by public notification for the purposes of this Part.”
Wards are electoral units that are overseen by statutory-urban governing bodies. There is no automatic rule by which statutory towns and their constituent wards come into being on the basis of well-defined demographic and economic criteria. Indeed, a notable and much commented-upon feature of Indian urbanization is the reluctance of some states to allow their large, urban-like villages to be legally declared urban. The public finance problem is that an urban authority may not be eligible for development funds that are earmarked for rural areas, and depending on circumstance, state governors may be unpersuaded of the potential for obtaining commensurate urban development funds [2]. These dueling political–economy considerations may well combine to produce under-estimates of India’s urban percentages, an issue that we will investigate in what follows.
Among all the wards of a statutory town, some can be designated as outgrowths, these being units which hold a type of dual status. An outgrowth is an area of high-density, arguably urban settlement that is spatially adjacent to a statutory town, and which would thus seem to be poised on the threshold of becoming legally urban. However, outgrowths are in fact governed by rural authorities. The ambiguous status of outgrowths is signified in their PCA identifier codes: outgrowths are assigned both a village code and a code defining the outgrowth as a ward of the statutory town. In India’s tabulations of urban population, outgrowths are treated as urban.
Much like outgrowths, census towns are legally rural settlements, but they are designated as urban for the purposes of an upcoming census and grouped with statutory urban areas in the official post-census tabulations. The census-town designation emerges in the course of discussions between census authorities and state government officials in the lead-up to each new census [12]. There are specific demographic and socioeconomic criteria that are meant to guide the discussions, but these criteria are evaluated on the basis of data gathered in the previous census, a practice that leaves ample room for misunderstandings of local trends and variations in judgement [14]. Also, there is no requirement that once classified as urban in this specialized way for a given census, a census town must remain so classified for the next decennial census. It seems that the state-specific discussions effectively begin anew in each census round. The census town–statutory town distinction has been in place for many decades, and India’s system of identifier codes for settlements has long distinguished the two.
Our calculations reveal how important census-town designations are, for example, to the state of Kerala’s overall percentage urban. The 2011 Census put the urban percentage of India as a whole at 31.1 percent, with census towns accounting for only 4.2 percentage points of the total. In Kerala, however, roughly 50.8 percent of the population is urban, a total that is well above the all-India average, with census towns accounting for almost 29 points of this total. Indeed, had the census towns of Kerala been ignored, only 21.9 percent of the state’s residents would have been counted as urban. Since the status of “census town” holds only for a given census, these towns can transition from rural village to census-urban status and then back, or alternatively can go on to become statutory urban, a complication that spawns confusion about the longer-term meaning of India’s reported urban percentages and which obscures the true pace of the country’s urbanization.
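The arithmetic behind these comparisons can be checked with a minimal sketch. It uses only the percentages quoted in the paragraph above; the variable and function names are illustrative assumptions, not part of any census product.

```python
# Urban percentages quoted in the text (2011 Census).
india_urban_pct = 31.1          # all-India urban share
india_census_town_pts = 4.2     # percentage points attributed to census towns
kerala_urban_pct = 50.8         # Kerala urban share, as quoted above
kerala_without_ct_pct = 21.9    # Kerala urban share if census towns are ignored


def census_town_points(total_pct, pct_without_census_towns):
    """Percentage points of the urban share contributed by census towns."""
    return round(total_pct - pct_without_census_towns, 1)


# Kerala: contribution of census towns, in percentage points.
kerala_census_town_pts = census_town_points(kerala_urban_pct, kerala_without_ct_pct)

# India: urban share that would remain if census towns were ignored.
india_without_ct_pct = round(india_urban_pct - india_census_town_pts, 1)

print(kerala_census_town_pts)
print(india_without_ct_pct)
```

The Kerala difference of 28.9 points corresponds to the "almost 29 points" cited in the text, and the same subtraction applied to India as a whole leaves a statutory-only urban share of 26.9 percent.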
2.1.2. Boundary Data
Although the PCA settlement-level tabulations have been placed in the public domain, the government of India has not released digital records of settlement boundaries. Boundary data must be purchased from third-party vendors, who prohibit their redistribution. We use the proprietary “Village Map” data products (one for each state or union territory, in the WGS 1984 geographic coordinate system) from ML Infomap LLC to provide the vector settlement boundary input for our new grids. (It is impossible to reconstitute the original boundaries from our new, gridded data products.) According to the ML Infomap metadata, the “coastal boundaries were aligned with imagery, [with] no gaps in polygon or no topological errors”, and the metadata also assure an “accuracy of the boundaries to 30–50 m”. These spatial data include the key identifiers (indicating state, district, subdistrict, and settlement) which (with some exceptions) allow the boundaries of each settlement to be linked to the corresponding PCA record and thereby to the full set of PCA demographic indicators. In total, over 650,000 spatial units have been used to construct the data collection provided in this research, which represents more than a 100-fold increase in the input resolution over the best publicly-available alternative, the Gridded Population of the World, Version 4 (GPWv4) [15]. Although the settlement boundaries are primarily rendered as vector polygons, in some less populous states, settlements could only be represented in terms of point locations. (Implications of this for our gridding method are discussed below.)
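The linkage between boundaries and PCA records can be illustrated with a minimal sketch of a join on the four identifiers named above (state, district, subdistrict, settlement). The field names, codes, and records below are hypothetical and do not reproduce the actual ML Infomap or PCA schemas; the sketch shows only the general idea of matching on a composite key and setting aside units that fail to match.

```python
# Composite key assumed for illustration; real field names may differ.
KEY_FIELDS = ("state", "district", "subdistrict", "settlement")


def link_boundaries_to_pca(boundaries, pca_records):
    """Attach PCA attributes to each boundary record via the composite key.

    Returns (linked, unmatched): boundary records that found a PCA match,
    merged with the PCA attributes, and boundary records that did not.
    """
    pca_by_key = {tuple(rec[f] for f in KEY_FIELDS): rec for rec in pca_records}
    linked, unmatched = [], []
    for boundary in boundaries:
        key = tuple(boundary[f] for f in KEY_FIELDS)
        pca = pca_by_key.get(key)
        if pca is None:
            unmatched.append(boundary)
        else:
            linked.append({**boundary, **pca})
    return linked, unmatched


# Made-up example records (codes and counts are not real census values).
boundaries = [
    {"state": "S1", "district": "D1", "subdistrict": "T1", "settlement": "V001",
     "geometry": "polygon A"},
    {"state": "S1", "district": "D1", "subdistrict": "T1", "settlement": "V999",
     "geometry": "polygon B"},
]
pca_records = [
    {"state": "S1", "district": "D1", "subdistrict": "T1", "settlement": "V001",
     "population": 1200, "households": 250},
]

linked, unmatched = link_boundaries_to_pca(boundaries, pca_records)
print(len(linked), len(unmatched))
```

Keeping the unmatched units in a separate list mirrors the "with some exceptions" caveat: those are the cases that would need manual inspection.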
The table below lists, by state as well as for India as a whole, the spatial inputs that we have used in this research. For statutory towns, census towns, and villages the table gives the number of whole settlements available in the spatial data. Many of these settlements are sub-divided into components that are specific to administrative district or subdistrict, but here we report only the total number of settlements. Spatial records for outgrowths are also available for the whole of India. However, ward-level spatial data are only available from ML Infomap as separate proprietary products. For a subset of 62 cities, including all cities with a population above 1 million, we collected additional ward-level spatial information, as shown in Figure 1. Indian cities are internally organized in a variety of ways, and as the map in Figure 1 demonstrates, the ratio of settlement-level to ward-level units varies from one city to the next. Mumbai, a city of over 12 million inhabitants, consists of two distinct settlements and 97 wards. Navi Mumbai, a city of just over 1 million in the Greater Mumbai region, is a single settlement with 89 wards, giving it roughly twice Mumbai’s ratio of wards to settlements with about one-tenth of the population. The ward-level detail reveals population and density variations within cities and helps to identify uninhabited areas (e.g., large urban parks or reserved land) that would otherwise skew the density estimates. The distribution of population within cities is a very important factor in assessing the population exposed to spatially-specific environmental risks such as flooding.
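The Mumbai and Navi Mumbai comparison can be verified with a short calculation. The counts are those quoted in the paragraph above; the dictionary layout and names are illustrative assumptions.

```python
# Settlement and ward counts as quoted in the text.
cities = {
    "Mumbai": {"settlements": 2, "wards": 97},
    "Navi Mumbai": {"settlements": 1, "wards": 89},
}

# Wards per settlement for each city.
ratios = {name: c["wards"] / c["settlements"] for name, c in cities.items()}

# How many times larger Navi Mumbai's ratio is than Mumbai's.
relative = ratios["Navi Mumbai"] / ratios["Mumbai"]

print(ratios)
print(round(relative, 2))
```

Mumbai has 48.5 wards per settlement and Navi Mumbai 89, so the second ratio is a little under twice the first.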
The original spatial data from ML Infomap were thoroughly cleaned and subjected to multiple rounds of topological correction. In the course of cross-validating the ML Infomap spatial data with census information and open-source settlement information, we uncovered numerous, although generally minor, flaws in these proprietary data. As described below, we made alterations only where necessary to achieve a match to the PCA records, and only if authoritative boundary information lent support to the changes. (Many additional alterations could have been made to the original boundary data but were not, owing to inconsistencies and deficits in authoritative boundary records in India.) Furthermore, because district-level boundary data from ML Infomap are used in other 1-km gridded data products (such as GPWv4), we decided to make the fewest alterations possible, systematically adjusting only the settlement polygons that were misaligned with district or state borders.
In particular, when we attempted to merge ward and outgrowth data from the PCAs with their corresponding ML Infomap spatial boundaries, it became clear that some wards and outgrowths (as well as some census towns) were either omitted from, or clearly misrepresented by, the spatial data. To achieve adequate linkages, we were required to split and merge ward polygons, and needed to draw new polygons where the omitted units could be identified with confidence. Publicly available data, including the District Census Handbooks of the Indian Census, resources on OpenStreetMap, and the ESRI base-map and Google, were used in this data-cleaning process. In total, over 200 new polygons were either created or substantially altered. These corrections made it possible to achieve a match with the PCA data for nearly all of India’s settlements and outgrowths. Of the approximately 1036 outgrowths identified in the PCAs, only seven have not yet been located in the spatial boundaries data. The populations of these missing outgrowths are known, as are the statutory towns to which they are adjacent; only their precise spatial locations are yet to be established. (Additionally, a small number (n = 57) of outgrowths were aggregated in the ML Infomap spatial data. Maps provided in the Census Atlases and District Census Handbooks were not sufficiently informative to identify the individual outgrowth boundaries, leaving us no option but to aggregate the PCA records to match the ML Infomap spatial units.)
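Aggregating PCA records to match merged spatial units amounts to a grouped sum. The following is a minimal sketch under stated assumptions: the record fields, the outgrowth and unit identifiers, and the lookup from outgrowth to aggregated unit are all hypothetical, and only additive counts (population, households) are summed.

```python
from collections import defaultdict


def aggregate_pca(pca_records, unit_of):
    """Sum count fields of PCA records that fall in the same aggregated unit.

    `unit_of` maps each (hypothetical) outgrowth identifier to the identifier
    of the aggregated spatial unit that contains it.
    """
    totals = defaultdict(lambda: {"population": 0, "households": 0})
    for rec in pca_records:
        unit = unit_of[rec["outgrowth_id"]]
        totals[unit]["population"] += rec["population"]
        totals[unit]["households"] += rec["households"]
    return dict(totals)


# Made-up example: two outgrowths drawn as one polygon, one drawn separately.
pca_records = [
    {"outgrowth_id": "og1", "population": 3000, "households": 600},
    {"outgrowth_id": "og2", "population": 1500, "households": 320},
    {"outgrowth_id": "og3", "population": 800, "households": 170},
]
unit_of = {"og1": "U1", "og2": "U1", "og3": "U2"}

aggregated = aggregate_pca(pca_records, unit_of)
print(aggregated)
```

Ratios or percentages in the abstracts would need to be recomputed from the summed counts rather than added, which is why the sketch is limited to count fields.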
The ward-level spatial data supplied by ML Infomap for Delhi, New Delhi, and certain cities in Andhra Pradesh do not always respect the fine partitions by administrative sub-district that are found in the PCA census records. We hope to resolve this problem in future research, but in the present data collection we have chosen to represent such problematic statutory towns spatially by their outer settlement boundaries and use whole-settlement PCA summaries to account for their populations.
2.1.3. Global Human Settlement Layer (GHSL) Data
Where spatial boundaries for urban settlements are lacking, out of date, or subject to conflicting interpretations, satellite data can be invaluable in identifying areas of human activity, whether by indicating built structures or night-time lights. Such data have been used in recent decades to serve as proxies for urban areas [18]. Here we employ the Global Human Settlement Layer (GHSL) produced by the Joint Research Centre (JRC) of the European Commission. These data represent a new generation of global built-up land data products, ranging over 40 years of historic change (1975, 1990, 2000, and 2014; these are the ‘epochs’, or years, in which the satellite observations were made) at fine spatial resolution (approximately 30 m in original form, aggregated to 250 m). The GHSL rasters were released in the World Mollweide projection (datum: D_WGS_1984). Our research makes use of the 2014 built-up areas for India, as other studies have also done [19], rather than interpolating data from 2000–2014 to match the 2011 census, despite the possibility that additional built-up areas may have emerged in the three years since the census was conducted.
In their original form, the GHSL data are binary, indicating either the presence or absence of a built structure in each 30 m grid cell [6]. A cell is coded as built-up if it overlaps with a built structure or impervious surface (but not roads). In the version of GHSL used here, the 30 m cells were aggregated to a resolution of 250 m and assigned the proportion of built-up land as the raster value. Recent research has generally confirmed acceptable levels of accuracy of the GHSL except perhaps in very thinly settled rural regions; for details, see studies of omission errors in the rural United States [20]. While similar validation studies for India have not been undertaken, Corbane and colleagues [18] report that errors of omission in the newest GHSL product (the one we use here, which is based on Sentinel-1 data in addition to Landsat imagery) are substantially reduced in Asia relative to the first-generation, Landsat-only version.
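The idea of turning a binary built-up grid into a coarser grid of built-up proportions can be sketched as simple block averaging. This is an illustrative simplification, not the JRC procedure: it assumes an integer aggregation factor and grid dimensions that are exact multiples of it, whereas 250 m is not an integer multiple of 30 m, so the actual GHSL resampling must handle cell alignment differently. The function name and the toy grid are assumptions.

```python
def built_up_fraction(binary_grid, factor):
    """Aggregate a binary built-up grid into blocks of factor x factor cells.

    Each output cell holds the share of fine cells in its block that are
    coded 1 (built-up). Grid dimensions must be exact multiples of `factor`.
    """
    rows, cols = len(binary_grid), len(binary_grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            built = sum(
                binary_grid[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
            )
            coarse_row.append(built / (factor * factor))
        coarse.append(coarse_row)
    return coarse


# Toy 4 x 4 binary grid (1 = built-up), aggregated in 2 x 2 blocks.
fine = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
coarse = built_up_fraction(fine, 2)
print(coarse)
```

In the toy grid the upper-left block is three-quarters built-up and the lower-right block is fully built-up, which is the kind of proportional value the aggregated 250 m cells carry.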