UvA-DARE (Digital Academic Repository)
Capturing the Diversity of Deprived Areas with Image-Based Features
Remote Sens. 2017, 9, 384

Many cities in the Global South are facing rapid population and slum growth, but lack detailed information to target these issues. Frequently, municipal datasets on such areas do not keep up with such dynamics, with data that are incomplete, inconsistent, and outdated. Aggregated census-based statistics refer to large and heterogeneous areas, hiding internal spatial differences. In recent years, several remote sensing studies developed methods for mapping slums; however, few studies focused on their diversity. To address this shortcoming, this study analyzes the capacity of very high resolution (VHR) imagery and image processing methods to map locally specific types of deprived areas, applied to the city of Mumbai, India. We analyze spatial, spectral, and textural characteristics of deprived areas, using WorldView-2 imagery combined with auxiliary spatial data, a random forest classifier, and logistic regression modeling. In addition, image segmentation is used to aggregate results to homogenous urban patches (HUPs). The resulting typology of deprived areas obtains a classification accuracy of 79% for four deprived types and one formal built-up class. The research demonstrates how image-based proxies from VHR imagery can help extract spatial information on the diversity and cross-boundary clusters of deprivation to inform strategic urban management.


Introduction
Official maps often omit the existence of deprived areas [1] or declare them to be homogeneous [2,3]. However, deprived areas generally differ in their histories, morphologies, services, socioeconomic conditions, and tenure (ranging from pavement dwellers and large slum areas to deprived resettlement colonies). Finding reliable information on deprived areas is a complex problem, as illustrated by population estimates in the large Mumbai slum Dharavi, which, according to [4], range from 300,000 to 900,000 inhabitants. Furthermore, deprivation mapping is often carried out at the administrative ward level (cf. [5,6]), hiding spatial differences within wards and clustering across ward boundaries. This is a particular problem if wards are rather large, as is the case of the health wards in Mumbai (of which there were 88 at the time of the 2001 Census, with an average population of 136,000). In the 2011 Census data, the metropolitan area of Mumbai is divided into 24 administrative wards, with populations ranging from 127,290 (city [7]) to 941,366 people (suburban [8]). Linking and integrating spatially detailed information on slums to such large and aggregated spatial units is a problem [9]; thus, even when data on slums are available, they are often not used because useful spatial relationships cannot be built.

Conceptualizing Deprivation
Unlike classical concepts of poverty that focus on income and consumption, the term "deprivation" is often used in understanding poverty as a multi-dimensional phenomenon, applied for example in the index of multiple deprivation [34]. Deprived areas, similar to 'slums' or 'informal settlements', refer to areas with sub-standard housing conditions and poor physical and environmental conditions offering housing to predominantly poor people [23]. They may also include areas that have been formally developed (e.g., resettlement colonies) but have slum-like living conditions [5]. Inhabitants of such areas are commonly deprived of access to basic services and live in overcrowded and unsafe environments. The official definitions of deprived areas and the terminology used to refer to such areas vary by country, but also within countries or even localities, where various definitions and interests are commonly found [35]. Official slum definitions can be very political. The Indian census has three types of slums: 'notified' (by the government under any Act), 'recognized' (areas not formally notified but recognized by the government), and 'identified' (areas of at least 300 people or 60-70 households that live in congested and unhygienic environments, lack basic services, and need to be visited and registered by a Charge Officer) [36]. Notified and non-notified slums differ in the level of service provision, as for notified slums the local government has the obligation of basic service provision and upgrading [37]. Because of this, in India, often a large proportion of deprived areas are excluded from basic service provision and upgrading as they lack notification (e.g., pavement dwellers), or city governments have simply stopped notification processes [38].
According to the census of 2001 and 2011, the urban slum population in India decreased from 26.3% [39] to 17.4% [36], suggesting the success of policy initiatives such as Basic Services for the Urban Poor (part of the Jawaharlal Nehru National Urban Renewal Mission; [40]). However, the real extent of deprived areas might be concealed by such statistics as they exclude several types of deprived areas (e.g., pavement dwellers and resettlement colonies) [36,38]. Administrative boundaries of wards have also often changed, and, furthermore, the average deprivation per ward (e.g., via a deprivation index [6]) may mask slum areas.
Slums, and deprived areas in general, are "more heterogeneous than is often assumed" [38] (p. 60). UN-Habitat [41] analyzed the vast diversity of slum types for 30 cities in the Global South and North. We categorize their typology into a concept of deprived areas around three main determinants (Figure 1).
As shown in Figure 1, first, deprived areas differ in terms of object types, for example, housing types range from pavement dwellings (using locally available material) to multi-story housing or the occupation of dilapidated (historic) buildings. Second, land and site characteristics such as reserves on public land (e.g., along roads or railways), small encroachments between formal areas or illegal subdivisions on agricultural land, which can have very regular patterns, result in different types of deprived areas differing in location, size, densities and access to services. Third, temporal dynamics and the history of areas determine the typology (e.g., chawls in Mumbai developed mainly in the early 20th century as 3-5-story housing for textile and other industrial workers). Patterns of settlements differ when areas are developed by collective and organized occupation, such as the organized land invasion in Latin America (e.g., in Lima, where several thousand people invaded land within one day [42]), compared to areas incrementally developed by individual households.
Having conceptualized the determinants that produce the diversity of deprived areas, we explore how this diversity was recognized in previous remote sensing studies (Table 1). Table 1 provides examples of studies in the field of remote sensing, differentiating types of deprived areas, which we extracted from a literature search (using Science Direct, Web of Science, and Scopus). All these initiatives departed from the idea of deprived areas being homogenous, stressing that such areas differ among themselves as well as differ from formal urban areas. The identified typologies range from two to seven categories and reflect the complexity and diversity of such areas across the globe. In some cities, the land/site characteristics have a strong determining influence, as noted for Cairo, where differences between developments on former agricultural land are structured by the farm boundaries, while those on desert land have a less orderly morphology. Several typologies also include fuzzy or transition classes between informal and formal areas (i.e., semi-formal low-cost housing, hybrid, or ex-formal on public or private land) or formal but deprived areas (i.e., basic formal). By contrast, [43] showed that informal areas might not be deprived (i.e., affluent informal settlements), and might not be relevant targets for pro-poor policies. Thus, having spatial data on the combinations of such characteristics would allow for a better understanding of the spatial diversity of deprivation and would offer specific information useful for the development of upgrading programs.

Deprived Areas in Mumbai: A Typology
Deprived areas in Mumbai are diverse in terms of their physical characteristics [2,41], which we conceptualize utilizing the dimensions geometry, density, pattern, and environment (Figure 2). In an earlier study [14], we developed a typology of deprived areas for Mumbai using VHR imagery. That typology included five types (Table 1), established through fieldwork surveys and discussions with local experts. Their morphological characteristics can be associated with information extracted from spectral image analysis, texture analysis, and spatial metrics. This research utilizes this typology for mapping deprivation, but since the transition between the types 'deprived areas with larger buildings/chawls' and 'basic formal areas' is ambiguous, these two types are combined. This results in four deprived and one formal type of area (Figure 2) whose specific dimensions are translated into image features, allowing them to be mapped.

Our typology (Figure 2) ranges from slum pockets/encroachments (e.g., pavement dwellings) along physical infrastructure (Figure 3) such as highways, pipelines, or the airport area to more regular and well-maintained areas with houses of several floors with proper paths and open spaces between the houses. Type 1, slum pockets, are often temporary areas along the transport network or 'islands' within formal areas, displaying poor housing structures, very high densities, and a lack of access to basic service provision. The second type concerns long-established and often large settlements with very high densities, small houses, and narrow lanes between them. Such areas commonly lack access to basic services such as piped water or a closed drainage system. The third type (Figure 4) has mixed housing sizes including slightly larger houses of 1-2 floors, though often in irregular arrangements with somewhat larger paths between the houses, still leaving very little space between the houses. Despite high densities and few open spaces, houses and spaces are often rather clean and well-maintained. Frequently, some basic infrastructure is present. The fourth type consists of a gradual transition of deprived informal to formal areas with medium-sized or larger buildings (e.g., chawls built for textile workers), and settlements with wider paths and streets as well as open but limited green spaces (e.g., resettlement colonies). These areas mostly have access to basic infrastructure.
Besides the four types of deprived areas, formal areas are classified separately to analyze whether deprived areas differ from this type. Formal built-up areas are rather heterogeneous, but display a relatively regular building layout, larger building sizes, more vegetation cover, and commonly have lower built-up densities. The four deprived types can be relatively well distinguished from formal areas via the GLCM variance [17].

Materials and Methods
The main approach used in this study to map the diversity of physical deprivation (Figure 2) at the spatial level of HUPs combines VHR imagery with available spatial data. The spatial aggregation to HUPs is done via image segmentation, creating homogenous areas. For producing a typology of deprivation, both a random forest classifier and logistic regression (LR) modeling are employed. Details on the study context, data, and methods are described in the following sub-sections.

Study Area Context and Available Dataset
The study area is the city of Mumbai, India, which has a present population of about 12.4 million, with about 41.8% living in slums [52]. An index of multiple deprivation (IMD) is available from [6]. It maps multiple deprivation experienced within Mumbai at the level of health wards. For the city of Mumbai, the IMD ranges from 0.22 for the least deprived ward up to 0.44 for the most deprived ward (range 0-1). A ward with a hypothetical value of 0 would imply a fully planned upper-middle class area, without deprived households, while a ward with a value of 1 would mean that all households in the entire ward are deprived in all aspects (i.e., have no access to sanitation, water, electricity, education, bank accounts, or scooters; live in overcrowded dwellings; are unemployed; and are all members of a scheduled caste) [6]. The temporal inconsistency of the index and imagery is discussed in Section 3.4.3. Although only 17 out of 88 health wards are covered by the images, the area has a good mix of the full range of the IMD (Figure 5).
For developing the LR model, aimed at mapping the typology of deprivation, 94 ground-truth points (training sample) are available from fieldwork undertaken in 2011 and 2013. For accuracy assessment, an additional 170 ground-truth points were collected within three subunits through fieldwork in 2015 for another study [53], which we could use for this study as reference (test sample). The two ground-truth sets are not combined because the training set covers the entire study area, while the test set focuses on three subunits. All ground-truth data were collected as point data and recorded the dominant built-up type in the immediate surroundings. To overcome problems with temporal inconsistency between image and ground-truth (test) data, a visual inspection of the points was performed, comparing the imagery from 2009 with Google Earth images from 2015; as a result, four points were removed for having obvious land cover/use changes. The random selection of points within deprived areas led to an unequal distribution of points across the types of deprived areas. As a result, the types 'slum small' and 'formal' are overrepresented.

Methodology-Mapping the Diversity of Deprived Areas
The methodology to map the diversity of deprived areas consists of (1) extracting image features, (2) analyzing the significance of image features, and (3) extracting different types of deprived areas, as presented in Figure 6. First, to extract image features, the basic land cover/use classes (built-up (deprived and formal), vegetation, water, soil, road, and shadow) are mapped using a random forest (RF) classifier, employing the parameter setting of our previous study on Mumbai [17]. The NDVI (normalized difference vegetation index), edges, and GLCM texture measures (variance, contrast, homogeneity, entropy, dissimilarity, and second-moment mean) are extracted using the WorldView images. For the extraction of the GLCM we used a window size of 21 × 21 pixels, which was optimized in a previous study on the same image [17].
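As an illustration of the feature extraction step, the following is a minimal sketch of how NDVI and one GLCM texture measure (variance) could be computed with NumPy. It is not the implementation used in the study: the function names, the number of grey levels, the single horizontal pixel offset, and the quantization of the band are all assumptions made for the example.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    # Return 0 where both bands are 0 to avoid division by zero.
    return np.divide(nir - red, denom, out=np.zeros_like(denom), where=denom != 0)

def glcm(window, levels, offset=(0, 1)):
    """Symmetric, normalized grey-level co-occurrence matrix for one offset.

    `window` must hold integer grey levels in the range [0, levels).
    """
    dr, dc = offset
    rows, cols = window.shape
    # Reference pixels and their neighbours at the given (row, col) offset.
    ref = window[max(0, -dr):rows - max(0, dr), max(0, -dc):cols - max(0, dc)]
    nb = window[max(0, dr):rows - max(0, -dr), max(0, dc):cols - max(0, -dc)]
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (ref.ravel(), nb.ravel()), 1.0)
    counts += counts.T  # count every pair in both directions
    return counts / counts.sum()

def glcm_variance(window, levels):
    """GLCM variance: sum over (i, j) of P(i, j) * (i - mu)^2."""
    p = glcm(window, levels)
    i = np.arange(levels)[:, None]
    mu = (i * p).sum()
    return float((((i - mu) ** 2) * p).sum())

# Hypothetical 21 x 21 window (the window size named in the text),
# quantized to 8 grey levels: left half level 1, right half level 6.
window = np.ones((21, 21), dtype=int)
window[:, 10:] = 6
texture = glcm_variance(window, levels=8)
```

In a full workflow the window would slide over the panchromatic band and the value would be written to the central pixel; that loop is omitted here for brevity.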
The result of the land cover/use classification, which shows the built-up classes (deprived and formal) and has an overall pixel-based accuracy of 90% with a Kappa of 0.87 (for details see [17]), is used to calculate several spatial metrics with the potential to describe aggregation (AI), shape (FRAC, SHAPE), density (PD), and homogeneity (SHDI and SHEI) conditions in deprived areas. The rationale for the selection of features to map deprivation is provided in Section 3.3. The selected set of metrics [54] consists of:
• Aggregation index (AI), where $g_{ii}$ = number of like adjacencies and max→$g_{ii}$ = maximum number of like adjacencies
• Fractal dimension index (FRAC), where $p_{ij}$ = perimeter (m) of patch ij and $a_{ij}$ = area (m²) of patch ij
• Patch density (PD), where $m$ = number of patch types and $A$ = total landscape area (m²)
• Shape index (SHAPE), where $p_{ij}$ = perimeter of patch ij and min $p_{ij}$ = minimum perimeter of patch ij
• Shannon's diversity index (SHDI) and Shannon's evenness index (SHEI), where $P_i$ = proportion of the landscape of class i and $m$ = number of classes
In this study, HUPs are the main spatial analysis unit for aggregating pixel-based information. They are areas of both homogenous textural and spectral characteristics, e.g., representing formal areas or deprived neighborhood types. HUPs, as defined by Liu, Clarke and Herold [24], (1) have homogenous texture; (2) consist of several land-cover types; (3) have matching physical boundaries; and (4) do not contain single objects and are sufficiently large. Thus, HUPs are extracted via multi-resolution image segmentation, employing the road network as thematic layer (to refine boundaries), with a scale parameter of 200, following our previous study in Mumbai [17]. However, the OpenStreetMap (OSM) road data have limitations in terms of consistent coverage in countries of the Global South [9]. In Mumbai, such inconsistencies exist in particular in slums. As a consequence, we did not use footpaths, which are only available for some slum areas (e.g., Dharavi).
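For orientation, the spatial metrics named above can be sketched as below. The formulas follow the standard FRAGSTATS definitions, which is an assumption: the exact form and scaling used in the study may differ. In particular, the shape index here takes the minimum perimeter to be that of a square of equal area, and patch density is reported per 100 hectares from a patch count. All function names are hypothetical.

```python
import math

def shdi(proportions):
    """Shannon's diversity index: -sum(P_i * ln P_i) over classes present."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

def shei(proportions):
    """Shannon's evenness index: SHDI / ln(m), m = number of classes present."""
    m = sum(1 for p in proportions if p > 0)
    if m < 2:
        return 0.0  # evenness is set to 0 when only one class is present
    return shdi(proportions) / math.log(m)

def shape_index(perimeter, area):
    """Perimeter divided by the perimeter of a square with the same area."""
    return 0.25 * perimeter / math.sqrt(area)

def frac(perimeter, area):
    """Fractal dimension index: 2 ln(0.25 p) / ln(a); undefined for a == 1."""
    return 2.0 * math.log(0.25 * perimeter) / math.log(area)

def patch_density(n_patches, landscape_area_m2):
    """Number of patches per 100 hectares (1 ha = 10,000 m^2)."""
    return n_patches / landscape_area_m2 * 10_000 * 100

def aggregation_index(like_adjacencies, max_like_adjacencies):
    """Share of like adjacencies relative to the maximum possible, in percent."""
    return 100.0 * like_adjacencies / max_like_adjacencies

# A square patch of 10 m x 10 m is the most compact case: SHAPE = FRAC = 1.
square_shape = shape_index(perimeter=40.0, area=100.0)
square_frac = frac(perimeter=40.0, area=100.0)
```

Two equally frequent classes give `shdi([0.5, 0.5])` equal to ln 2 and `shei([0.5, 0.5])` equal to 1, which is the maximum evenness.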
Second, the significance of the derived image features is analyzed. To this end, all image features (e.g., based on spatial metrics, GLCM) are aggregated at HUPs and the training set of 94 ground-truth points is used to derive significant features that differentiate types of deprived areas (details are given in Section 3.4.1). Third, to extract the typology of deprivation, multiple regression modeling is used and the accuracy is assessed by a set of 166 test samples (details are given in Section 3.4.2).

Extraction of Features to Map the Diversity of Deprivation
Based on the four morphological dimensions of deprived areas in Mumbai, i.e., environment, density, geometry, and texture pattern (building on the earlier work of [4,15,22,25,55-57]), image features are created with the potential to capture the diversity of such areas (Figure 2). This list of image features (Figure 7) is generated based on distinguishing features reported in slum mapping studies (e.g., [4,10,22,23,56,58-60]), as well as by considering the local characteristics of deprived areas in Mumbai.
Earlier studies [15,25] showed that deprived areas display diversity in terms of environmental (environ) features such as location on steep slopes. Furthermore, land cover/use characteristics often vary among deprived areas; e.g., large and very densely built-up areas have little land cover/use heterogeneity (measured, e.g., by SHDI and SHEI), while small slum pockets are often surrounded by vegetation or other land cover/use types. The patterns of deprived and formal areas show distinct differences: deprived areas commonly have more organic layouts and formal areas more regular ones. Yet texture pattern differences also exist among deprived areas, which can be measured by GLCM features. The geometry features explore object shape variations and arrangements (e.g., via AI, SHAPE, FRAC) [61]. Building layouts in deprived areas are often less complex and object sizes are small compared to formal areas. However, these features show variations, e.g., very small objects in areas of slum pockets compared to larger buildings in chawls or resettlement colonies, which can be very densely built-up. Thus, density features also show variations among deprived areas, e.g., lower densities in areas of the type 'slum mix' compared to 'slum small'. For a large number of features (e.g., GLCM, edge features) the panchromatic band of the WorldView-2 imagery is used, while some density, geometry, and environment features are derived from the random forest classification (land cover/use) of the WorldView-2 images (e.g., shadow, built-up). For calculating some of the geometry and environment features, spatial metrics are used. For line density, road network data from OSM are used, while topography features are derived from the SRTM DEM. The features are either calculated using a 21 × 21 window (e.g., GLCM) or are directly captured per HUP (e.g., slope). The features extracted via a window are then aggregated to HUPs using the mean feature values (each HUP receives 34 feature values). To allow comparability of the features, they are normalized employing the method '0-1 scaling'.
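The aggregation and normalization described above can be sketched as follows, assuming a per-pixel feature raster and a raster of integer HUP identifiers of the same shape. The function names and the toy arrays are hypothetical, and '0-1 scaling' is interpreted here as min-max scaling, which is an assumption.

```python
import numpy as np

def hup_means(feature, hup_labels, n_hups):
    """Mean of a per-pixel feature within each HUP (segment).

    `hup_labels` holds one integer HUP id per pixel, in [0, n_hups).
    """
    labels = np.asarray(hup_labels).ravel()
    values = np.asarray(feature, dtype=float).ravel()
    sums = np.bincount(labels, weights=values, minlength=n_hups)
    counts = np.bincount(labels, minlength=n_hups)
    # HUPs without pixels keep a mean of 0 instead of dividing by zero.
    return sums / np.maximum(counts, 1)

def scale_01(values):
    """Min-max scaling of one feature to the range [0, 1]."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        return np.zeros_like(values)  # constant feature carries no contrast
    return (values - lo) / (hi - lo)

# Toy example: a 2 x 3 feature raster covering three HUPs.
feature = np.array([[1.0, 3.0, 10.0],
                    [5.0, 7.0, 10.0]])
hup_labels = np.array([[0, 0, 2],
                       [1, 1, 2]])
means = hup_means(feature, hup_labels, n_hups=3)   # per-HUP means: 2, 6, 10
normalized = scale_01(means)                       # scaled to 0, 0.5, 1
```

Scaling each feature separately keeps features with large numeric ranges (e.g., elevation) from dominating features with small ranges (e.g., proportions) in the regression.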

Modeling the Typology of Deprivation
To model the diversity of deprivation, two major steps are necessary (details are provided in Sections 3.4.1 and 3.4.2): first, the significant features are extracted; and second, they are used within a regression model to classify the HUPs. This is done in a stepwise process (Table 2) using the normalized features per HUP (Figure 7) to model the typology of deprivation. Thus first, a binary backward LR model (modeling the class probability), second, a multinomial LR model (assessing the separability of deprived types), and third, four binary LR models (extracting the probability values of HUPs belonging to one of the four deprived types) are set up. For all models, features at the 95% confidence interval are considered significant (p < 0.05); features below this level are considered not significant and are therefore not included. For building the LR models, the first set of ground truth (training) data is used. The result is a fuzzy classification; each HUP obtains probability values of all built-up types. However, for the final classification the class with the highest probability is selected. The steps to arrive at this final typology are detailed in the following sub-sections.

Significance of Image Features
Before employing the features within a regression analysis, their multi-collinearity is analyzed using the VIF (variance inflation factor). The commonly used threshold of 10 is applied, below which serious problems of multi-collinearity are avoided [62,63] (other sources suggest a value of 5 [64]). Considering that morphological features have a general tendency to correlate, this maximal threshold is selected, so that only very highly correlating features are identified and excluded. Besides analyzing multi-collinearity, the means of all features of the built-up types (deprived and formal) are plotted. This allows us to analyze whether features show differences for the deprived types. Both the ability to differentiate between deprived types and multi-collinearity are used to arrive at a pre-selection of features to be entered into the regression model. This step is necessary as the number of training data points would not support the use of a very large set of variables (features). For the first model, a binary LR model, all deprived types are merged and tested to see whether deprived and formal HUPs can be easily differentiated. The model eliminates all non-significant features, thereby simplifying the calculation of the HUP probability. In addition, it provides classification accuracy and probability values of class memberships (Equation (7)), allowing us to classify all HUPs (also HUPs where no ground truth data are available).

$$P(y) = \frac{1}{1 + e^{-(b_0 + b_1 x_1)}} \qquad (7)$$

where $P(y)$ is the probability of $y$ occurring, $e$ is the base of the natural logarithm, $b_0$ is the intercept on the y-axis, $b_1$ is the line gradient, and $x_1$ is the value of the predictor (image feature); each additional feature adds a further term $b_i x_i$ to the exponent.
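The multi-collinearity screening described in this sub-section can be illustrated with a small NumPy sketch. It computes the VIF of each feature as 1 / (1 - R²), where R² comes from regressing that feature on all other features. The function name and the toy data are hypothetical, and the sketch is not the statistical software workflow used in the study.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for every column of the feature matrix X.

    X has one row per HUP and one column per feature.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Ordinary least squares of feature j on the others, with intercept.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

# Two uncorrelated features: both VIF values are 1.
independent = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)

# Two nearly identical features: both VIF values far exceed the threshold of 10.
x = np.arange(10.0)
wiggle = 0.01 * np.array([1, -1, 1, -1, 1, -1, 1, -1, 1, -1])
collinear = np.column_stack([x, x + wiggle])

keep = vif(independent) < 10    # both features are retained
drop = vif(collinear) >= 10     # both features are flagged as redundant
```

When a pair is flagged, only one of the two features needs to be removed; which one to keep is a judgment call that the text bases on how well each feature separates the deprived types.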
The result provides the classification of all formal HUPs. To avoid formal areas with larger vegetation cover being classified as vegetation HUPs, the classification rule allocates HUPs with a mix of vegetation and formal areas to formal areas when the vegetation cover is less than 60%. HUPs with more than 60% vegetation cover are classified as vegetation; however, such HUPs might still contain individual buildings.
The second model, a multinomial LR model, assesses whether the features (Figure 7) are able to distinguish the different types of deprived areas. The result shows the significant features to be used for the third LR model and the resulting classification accuracies for the various types of deprived areas.

Extracting the Typology of Deprivation
Employing a third LR model, the most significant features per deprived type are extracted by four binary backward LR models. The obtained coefficients and constants for the four deprived types are used to calculate the probability of each HUP belonging to a specific type using Equation (7). Each built-up HUP (using the result of the land cover/use classification) is classified in a vector environment according to the highest probability among the five built-up types. All other non-built-up HUPs are also classified using the result of the land cover/use classification; only roads and water bodies are derived from OSM. The strength of the model is assessed via the classification accuracy and Nagelkerke R². In a final step, the accuracy of the classification is assessed with the second set of 166 ground truth (test) data, using the overall accuracy and Kappa.
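The final classification and accuracy assessment can be sketched as below, assuming that the coefficients and constants of the binary models are already available. All names, coefficients, and labels in the example are hypothetical placeholders rather than values from the study.

```python
import numpy as np

def logistic_probability(features, coefficients, constant):
    """Class probability per HUP: 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = constant + np.asarray(features, dtype=float) @ np.asarray(coefficients, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

def classify_by_highest_probability(prob_by_type):
    """Assign each HUP the type whose model gives the highest probability."""
    names = list(prob_by_type)
    stacked = np.column_stack([prob_by_type[name] for name in names])
    return [names[i] for i in stacked.argmax(axis=1)]

def overall_accuracy_and_kappa(reference, predicted):
    """Overall accuracy and Cohen's Kappa from two label sequences."""
    labels = sorted(set(reference) | set(predicted))
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)))
    for r, p in zip(reference, predicted):
        cm[index[r], index[p]] += 1
    n = cm.sum()
    observed = np.trace(cm) / n
    # Agreement expected by chance, from row and column totals.
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    # Kappa is undefined when expected agreement equals 1 (single class only).
    return observed, (observed - expected) / (1.0 - expected)

# Hypothetical probabilities of two HUPs for two deprived types.
probabilities = {
    "slum pocket": np.array([0.2, 0.7]),
    "slum small": np.array([0.6, 0.1]),
}
assigned = classify_by_highest_probability(probabilities)

# Hypothetical reference (test) labels compared with predicted labels.
accuracy, kappa = overall_accuracy_and_kappa(
    ["a", "a", "b", "b"], ["a", "a", "b", "a"]
)
```

Kappa corrects the overall accuracy for agreement expected by chance, which matters here because the test points are unevenly distributed across the types.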

Cross-Boundary Health Ward Clusters of Deprivation
To illustrate the application potential of mapping the typology of deprivation, the results of the HUP-based deprivation map and the ward boundaries (including the index of multiple deprivation) are superimposed. Despite the temporal inconsistency of the data, this comparison illustrates the different aggregation levels of the datasets for a central area of Mumbai, where large areas were relatively stable between 2001 and 2009 (the center was already very densely built up in 2001, leaving little room for horizontal building dynamics). This comparison focuses on the problems of using aggregated administrative units for analyzing aspects of the urban morphology, as also illustrated in [12].

Results
In this section, we present the results of the stepwise process to extract the typology of deprived areas based on the most significant features. We also illustrate how such data can visualize clusters of deprivation across ward boundaries and show their diversity.

Analyzing the Correlation of Potential Features
Both the ability to distinguish the five built-up types and the correlation of all 34 image features (Figure 7) are analyzed, with all features aggregated at the level of built-up HUPs. Many of the features correlate highly with several others. Therefore, the most strongly correlated and least differentiating features are excluded from the selection. Mean feature values per built-up class are shown in Figure 8. For several features, formal areas show large differences from the deprived area types, e.g., 'GLCM variance', 'built-up PD' (patch density), 'shadow and line density', 'shape index' and 'vegetation percentage'. However, for 'GLCM entropy STD', 'built-up density' and 'mean built-up area', formal areas and slum pockets have rather similar values. This seems rather surprising, but is caused by small slum pockets often being part of a larger HUP, which also contains non-built-up classes (e.g., soil) or in-between formal areas, while formal HUPs are often rather small because the surrounding vegetation cover is part of a different HUP. This is also confirmed by the high 'land cover/use (lc/u) evenness' value of slum pockets, indicating that they have the highest mix of land cover/use classes. The largest 'mean area' is displayed by slum areas with small buildings; this type covers large areas across the study area. In general, slum areas with mixed building sizes are located at higher elevation ('DEM min') and on steeper slopes ('slope mean'). Basic formal areas and chawls have the highest 'built-up density' and 'GLCM second moment' values, while having the lowest 'shadow density'.
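The multi-collinearity screening can be illustrated with a minimal sketch that computes the VIF of each feature as the corresponding diagonal element of the inverse correlation matrix, which is equivalent to 1/(1 − R²) from regressing that feature on all others. The feature names and values are hypothetical, the helper functions are written for this illustration only, and the threshold of 10 follows the text.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(features):
    """Map each feature name to its variance inflation factor."""
    names = list(features)
    corr = [[1.0 if a == b else pearson(features[a], features[b])
             for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}

# Hypothetical feature values for six HUPs (illustrative only).
hup_features = {
    "glcm_variance": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "builtup_density": [2.1, 3.9, 5.9, 8.1, 10.0, 12.0],
    "slope_mean": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}
vif_scores = vif(hup_features)
flagged = [name for name, value in vif_scores.items() if value > 10]
```

In practice, one of each pair of flagged features would be dropped and the VIF values recomputed, since removing one feature changes the VIF of the others.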

Features Used to Distinguish between Formal and Deprived Areas
None of the 15 remaining features has a VIF value of more than 10 (critical value), but several have values above 5, which still signals relatively high collinearity (Table 3). Nevertheless, all 15 features are entered into the LR model. A first binary backward LR model to distinguish between formal and deprived areas using all 15 features shows that GLCM variance alone is sufficient to distinguish them, with a classification accuracy of 98.9% and a Nagelkerke R² of 0.93. The coefficients and constants displayed in Table 4 are used to calculate the probability of a HUP being formally built-up. To simplify the calculation, the HUPs are stored as vector data and the probability values are stored as attributes.
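For readers unfamiliar with the Nagelkerke R², the following sketch shows how it is commonly computed from the log-likelihoods of the intercept-only (null) model and the fitted model. The function names and the sample counts are hypothetical, and the sketch does not reproduce the values reported in this study.

```python
import math

def null_log_likelihood(n_positive, n_negative):
    """Log-likelihood of an intercept-only binary model (both counts must be > 0)."""
    n = n_positive + n_negative
    p = n_positive / n
    return n_positive * math.log(p) + n_negative * math.log(1.0 - p)

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox-Snell R² rescaled by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell

# Hypothetical example: 50 deprived and 50 formal training HUPs.
ll0 = null_log_likelihood(50, 50)
r2_example = nagelkerke_r2(ll0, ll_model=-10.0, n=100)
```

The rescaling ensures that a model with a perfect fit (log-likelihood of zero) reaches a value of 1, which the unscaled Cox-Snell measure cannot.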

Analyzing the Separability of Deprived Areas
The second LR model analyzes whether deprived area types can be distinguished based on the selected set of 15 features within a multinomial LR model. Table 5 shows the classification result for all types using the training data, with an overall accuracy of 83%. The lowest accuracy is obtained for the 'slum mix' category, with only 61.5% correctly predicted HUPs. This was to be expected, as these deprived areas contain a mixture of small and large buildings, illustrating the complexity of slum typologies. In addition, the type 'basic/chawl' shows some incorrect predictions, which relates to the diversity within this type, ranging from chawls to resettlement colonies. Problems within the type 'slum small' often relate to the definition of HUPs, which sometimes include smaller areas of other types, and to the fact that ground truth was collected as point data, which do not necessarily represent the dominant type of a larger HUP. Very stable predictions are obtained for the type 'slum pocket' and the type 'formal area' (Table 5). However, the training samples for the type 'slum pocket' are rather few.

Features to Classify Deprived HUPs
To calculate the probability of a built-up HUP belonging to a specific deprived type, four binary backward LR models are employed. After eliminating all non-significant features (via the second model), the coefficients and constants for the significant features of the four types of deprived areas are obtained (Table 6). Out of the 15 features, only seven significant features are finally used within the four LR models, i.e., 'built-up mean area', 'GLCM second moment mean', 'GLCM entropy mean', 'built-up patch density (PD)', 'GLCM variance', 'land cover/use evenness (SHEI)', and 'DEM mean'. The most frequently recurring feature is 'GLCM variance'. The features 'built-up mean area', 'land cover/use evenness', and 'DEM mean' are significant for two types, while others are only significant for a specific type, e.g., 'GLCM entropy mean'. The coefficients and constants are used (Equation (7)) to calculate the HUP probabilities for all types. All models have a high Nagelkerke R² (ranging from 0.88 to 0.98), showing that they have very good explanatory power, even though there were only a few samples for some deprived classes. For the classification of the formal areas, the results of Table 4 are employed. Non-built-up HUPs (soil and vegetation) are classified using the land cover/use classification and OSM layers representing water bodies and roads. The classified HUPs (Figure 9) show the distribution of deprived areas. At the center of the mosaic is the international airport of Mumbai, with several deprived areas in its environs. Most large areas consist of the type 'slum small', whereas the type 'slum pocket' is scattered throughout the entire study area. The type 'basic/chawl' is found more towards the edges of the study area, while the type 'slum mix' is often found adjacent to areas of the type 'slum small'.
The statistics (Figure 9) show that 60.3% of the built-up area is 'formal', followed by 27.3% 'slum small', 5.6% 'slum pockets', and 3.4% each for 'slum mix' and 'basic/chawls'. Deprived areas in this part of Mumbai thus represent almost 40% of the built-up area and are diverse, with 'slum small' as the most commonly occurring type.
Some problems exist with the dominant land cover/use type per HUP. For example, formal HUPs that are dominated by more than 60% vegetation cover are classified as vegetation HUPs (see the example in Figure 10a). Also, smaller areas that lie within a larger HUP, e.g., small formal areas within larger deprived areas (see the example in Figure 10b,c), are omitted. Slum pockets are most prone to being completely or partially omitted (see the example in Figure 10d) due to their size. The transition between deprived types is very much influenced by the selected scale. Thus, HUPs sometimes include a mix of formal and slum areas (see the example in Figure 10b), while the transition zones between deprived types are often not entirely crisp (see the example in Figure 10c).
The overall classification accuracy for the typology of deprived areas is 79%, with a Kappa value of 0.67 (Table 7). The types with the best performance (considering producer and user accuracy) are 'formal' and 'slum small', followed by 'basic formal/chawl' and 'slum pocket'. The type 'slum mix', which has in its morphological definition some degree of fuzziness, has the lowest accuracy. However, the results show that the employed features allow for the extraction of a complex typology of deprived areas, with some limitations.
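The two accuracy measures reported here can be derived from a confusion matrix as sketched below. The matrix values are hypothetical and do not reproduce Table 7; rows are taken to be reference classes and columns predicted classes.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's Kappa from a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

# Hypothetical two-class confusion matrix (illustrative values only).
example_matrix = [[40, 10],
                  [10, 40]]
example_accuracy, example_kappa = accuracy_and_kappa(example_matrix)
```

Kappa is lower than the overall accuracy because it discounts the agreement that would be expected by chance given the class frequencies.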

Cross-Boundary Health Ward Clusters of Deprivation
In order to generate information that has societal relevance and can inform the development of pro-poor policies, which is often based on census data and commonly aggregated at large administrative units, the study examines whether such units are meaningful for mapping the diversity of deprivation. The results show that deprived areas do not match the boundaries of health wards, nor do health wards necessarily contain homogeneous types of deprivation. As illustrated in Figure 11, health ward boundaries crosscut large deprived areas. Furthermore, deprived areas within wards differ, sometimes showing adjacent areas of different deprivation types as well as large clusters of the same type. Analyzing deprivation based on such administrative spatial units obscures the real spatial extent of deprivation and could prevent the efficient targeting of pro-poor policies, e.g., [65] showed "considerable spatial variability" (p. 15) of deprivation (in form of a slum index) within administrative units (neighborhoods). Combining our classification results with the index of multiple deprivation reveals that wards with lower census-based deprivation values may also have large and cross-boundary clusters of deprivation (see ward A with an IMD of 0.29 in Figure 11), while more deprived wards may also have larger formal built-up areas (see ward B with an IMD of 0.39 in Figure 11).
When calculating the percentage of deprived areas from the total built-up area per ward (based on our classification results), the Pearson correlation coefficient with the multiple deprivation index (IMD) is 0.83, showing that image features are helpful indicators for mapping deprivation. When comparing the percentage of deprived areas with the percentage of people living in slums per ward (census-based), the correlation, at 0.65, is much lower. This indicates that census statistics do not fully cover deprivation in a complex mega-city like Mumbai, and shows the high potential of VHR imagery, which is capable of mapping cross-ward clusters and the diversity of deprivation. However, this finding is limited by the temporal difference of the two datasets, as the census and image data have a time gap of eight years.
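The ward-level comparison can be sketched as a Pearson correlation between the image-based share of deprived built-up area and the census-based index per ward. All ward values below are hypothetical and serve only to show the calculation; they do not reproduce the reported coefficients.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def deprived_share(deprived_area, builtup_area):
    """Percentage of a ward's built-up area classified as deprived."""
    return 100.0 * deprived_area / builtup_area

# Hypothetical wards: (deprived area, total built-up area, IMD value).
wards = [(12.0, 40.0, 0.29), (30.0, 50.0, 0.39), (5.0, 45.0, 0.21), (22.0, 44.0, 0.35)]
shares = [deprived_share(d, b) for d, b, _ in wards]
imd_values = [imd for _, _, imd in wards]
r_example = pearson(shares, imd_values)
```

Because the census and image data are eight years apart, such a coefficient should be read as an indication of agreement rather than as a validation of either dataset.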

Discussion
The aim of the study was to analyze the capability of image processing methods to spatially distinguish different deprived areas in Mumbai from VHR imagery. Deprived areas in Mumbai have diverse and complex morphological characteristics, often overlooked in previous studies, e.g., [17,18]. The morphological characteristics were conceptualized into four dimensions, i.e., environment, texture pattern, density, and geometry, and further utilized in the image-based analysis to extract spatial information about their morphological differences. This not only improved our understanding of how to extract such information, but also has practical value. For instance, [51] stressed that deprived areas with a more regular pattern offer a better "base for subsequent improvements and installation of infrastructure" (p. 7) than areas with more irregular patterns, which often require more investment for upgrading. Thus, if different morphologies require different actions for upgrading, detailed knowledge of the morphology of deprivation will support planning and decision-making for implementing upgrading policies [66]. However, the employed dimensions and their features have an inherent challenge, which relates to the spatial unit used for their measurement [12]; for instance, density measures vary considerably depending on the reference unit used. Thus, utilizing a different spatial aggregation level, e.g., via smaller or larger HUPs or more regularly outlined blocks, will give different feature values and affect the final mapping results. Nevertheless, we argue that HUPs optimized for the local context are much better adapted to reflect the urban morphology than administrative units, which are often not suitable due to the modifiable areal unit problem (MAUP) [12] and their overly large and variable size.
The extracted morphological features allowed us to capture the diversity of four deprived and one formal built-up area type. The significance of these image features was analyzed within an LR model, resulting in a set of coefficients and constants for the most significant features (i.e., GLCM variance, built-up mean area, land cover/use evenness (SHEI), DEM mean, GLCM second moment mean, GLCM entropy mean, and built-up patch density). This allowed us to calculate class probabilities for all HUPs, which resulted in a fuzzy probability layer at the HUP level. The final typology of deprived areas was based on the highest class probability. Due to the logistical challenges of collecting a large set of ground-truth data spread over a large urban area, the number of training points was relatively small. Collecting such data based on visual image interpretation, as is often done, would introduce considerable uncertainty, as experts often disagree on the delineation of deprived areas in VHR imagery [33,67]. The increasing availability of crowdsourced data and Google Street View (e.g., in Indonesian cities) combined with visual image interpretation might, in the future, facilitate the extraction of suitable training data. Therefore, it would be interesting to repeat the approach for other cities using a larger set of training data.
Through this study we distinguished different types of deprivation with an overall classification accuracy of 79%. The obtained accuracy levels differed by type: slums with small buildings had the highest classification accuracy, while slums with mixed building sizes and the transition type between chawls and basic formal areas had the lowest. The aggregation of deprived areas to HUPs allowed for mapping the dominant type of entire neighborhoods. However, this aggregation often led to very small clusters of slum pockets (e.g., small pavement dwellings) being omitted, as they are frequently part of a larger (e.g., formal) HUP. Employing an LR model helped to reduce the computational demand, because all feature values were aggregated to HUPs stored as vector data (in a raster data structure, image features would consume several GB). HUPs are also a more meaningful spatial unit for informing pro-poor policies. Furthermore, LR modeling allowed the extraction of the most significant features per type, while the fuzzy classification facilitated a better optimization of class threshold (probability) values compared to standard image classification methods.
The presented approach to capture the diversity of deprivation in a large and complex megacity was tailored to the local morphology of deprivation (in Mumbai) via the selected image features. However, the conceptual level of the four dimensions of the diversity of deprivation has the potential of being transferable (for concepts on measuring transferability and robustness, see [59,67–69]) to other cities in the Global South. Further studies are recommended to better understand and analyze the diversity of deprivation across the globe, as well as to decide which image features are relevant for specific regional conditions.
The application potential of mapping the diversity and clustering of deprived areas was illustrated by overlaying the result with the health ward boundaries. This showed that large administrative units have limited use in mapping fine-grained patterns of deprivation in a complex megacity [15]. The ward boundaries sometimes cut across larger clusters of deprivation, splitting them into smaller subunits. For informing pro-poor policy, ward-based information hides the spatial heterogeneity of deprivation within wards and across boundaries, hampering effective planning and service provision [66]. Thus, more disaggregated and clustered information on deprivation that also measures its diversity could improve planning and decision-making in complex and dynamic megacities. It also points to the possible benefit of coordinating anti-deprivation action across ward jurisdictions, so that spatially coherent investments and improvements are made. Thus, VHR imagery, with its potential for covering larger areas with high temporal frequency, is fit for capturing details of the urban morphology beyond the aggregated view of administrative units.

Conclusions
Deprived areas are not homogeneous in their dimensions, and considering them as one class ignores their vast diversity. We have shown that their morphological differences can be captured from space via image-based features, used as inputs for modeling the morphological dimensions of deprivation, i.e., geometry, density, texture pattern, and environment, while other aspects of their diversity, such as economic activities, are not easily captured from space. Employing image-based features within logistic regression models allowed for the selection of the most significant features to build a typology of deprivation in a very complex Indian megacity. The resulting fuzzy probability vector layer allowed for optimizing probability thresholds for the different types of deprivation and other land cover/use types. Comparing the results with aggregated deprivation maps revealed the internal diversity of wards as well as the existence of cross-ward clusters of deprivation. Such disaggregated spatial and semantically meaningful information on deprivation from VHR imagery has the potential to provide relevant information for strategic urban planning and management in a complex and dynamic megacity. Further research could address the transferability of image features for mapping locally specific types of deprivation to other cities in the Global South, aiming at employing a larger set of training data, which would allow for using larger feature sets. This would address one of the identified limitations of this research and also illustrate variations in the typology of deprivation across the globe.