Exploring the Potential of Machine Learning for Automatic Slum Identification from VHR Imagery

Slum identification in urban settlements is a crucial step in formulating pro-poor policies. However, conventional methods for slum detection, such as field surveys, can be time-consuming and costly. This paper explores the possibility of implementing a low-cost, standardized method for slum detection. We use spectral, texture and structural features extracted from very high spatial resolution imagery as input data and evaluate the capability of three machine learning algorithms (Logistic Regression, Support Vector Machine and Random Forest) to classify urban areas as slum or no-slum. Using data from Buenos Aires (Argentina), Medellin (Colombia) and Recife (Brazil), we found that a Support Vector Machine with a radial basis function kernel delivers the best performance (with F2-scores over 0.81). We also found that the particularities of each city preclude the use of a unified classification model.


Introduction
According to [1], slums are the most deprived and excluded form of informal settlements. Slums are characterized by poverty and agglomerations of inadequate housing and are often located in hazardous urban areas. In 2016, approximately one in eight individuals lived in a slum. Although the share of the urban population living in slums decreased from 39% to 30% between 2000 and 2014, the absolute number of people living in urban slums continues to grow and is a critical factor in the persistence of poverty in the world [2]. Moreover, the urban population of the world's two poorest regions, South Asia and Sub-Saharan Africa, is expected to double over the next 20 years, which suggests that the number of slum dwellers in those regions will grow significantly [1].
There has been a significant increase in the number of studies on the usefulness of remote sensing imagery for measuring socioeconomic variables [3][4][5][6]. This trend is partly due to the increasing availability of satellite platforms, methodological advances and the decreasing cost of imagery [7,8]. Remote sensing imagery may become an alternative source of information in urban settings where survey data are scarce. In addition, this imagery may complement data obtained from socioeconomic surveys [3]. The use of remote sensing data to estimate socioeconomic variables rests on two ideas. The first is the premise that the physical appearance of a human settlement reflects the society that created it. The second is the assumption that individuals who live in urban areas with similar physical housing conditions have similar social and demographic characteristics [9,10].
Slum detection, or slum mapping, is one of the most frequent applications in this field of study; scholars have published at least 87 papers on it in scientific journals over the last 15 years [8].
These studies have demonstrated that the physical characteristics of slums can be distinguished from those of formal settlements using remote sensing data [11][12][13]. This is an important area of study because numerous local governments do not fully acknowledge the existence of slums or informal settlements [1], which hinders the formulation of policies that benefit the poor citizens of cities [8].
Numerous methods that use remote sensing imagery can identify slum areas. Until recently, object-based image analysis (OBIA) was the most widely used method. Other methods include visual interpretation, texture/morphology analysis and machine learning; [8] reports machine learning to be more accurate, and it is often combined with OBIA. Machine learning (ML) approaches generally combine textural, spectral and structural features [12]. The Random Forest (RF) classifier is one of the most popular ML methods for slum extraction from very high spatial resolution (VHR) imagery [12,14]. Support Vector Machines (SVM) and Neural Networks (NN) are also used for slum identification [8]. However, most of these ML algorithms are implemented at the pixel level and, in contrast to OBIA, have limited viability when working with VHR imagery [15]. The choice of an appropriate ML method is generally left to the intuition of the researcher.
According to [8], most published studies that use remote sensing to map slums relied on expensive commercial imagery with near-infrared (NIR) information [16] or on three-dimensional data such as LiDAR [13]. Numerous small cities in developing countries lack the funds to purchase full satellite imagery and often extract information from RGB data through visual interpretation [15,17]. Google Earth (GE) imagery may be the only source of aerial imagery available to small local governments because these images are free to the public [18,19]. In addition, Google Earth provides historical VHR imagery for many locations, which may be useful for spatio-temporal urban analysis. According to the Google Earth terms of service [20], GE imagery can be used for non-commercial purposes, and its use is specifically allowed in research papers and related documents.
The purpose of this study is threefold. First, we explore the possibility of detecting slums within cities using very high spatial resolution (VHR) RGB GE imagery, image feature extraction and OBIA techniques, without ancillary data. Second, using identical input data, we compare the performance of different ML algorithms in identifying slums and determine which algorithm provides the best results. Third, we seek a low-cost, standardized method for detecting slums that is also flexible, easy to automate and applicable to other urban settings with scarce data. We use data for three Latin American cities with different physical and climatic conditions and different urban layouts: Buenos Aires (Argentina), Medellin (Colombia), and Recife (Brazil).
The structure of this paper is as follows. Section 2 describes the methodology, including a description of the data and the three classification models used in this study. Section 3 presents the results and a discussion of the implemented approach. Section 4 presents the main conclusions, suggestions for future research and policy-making implications for local governments and authorities.

Methods
Our goal is to design an algorithm that can automatically identify the areas of a city that possess the urban characteristics of a slum. This can be framed as a binary classification problem. The inputs are features extracted from GE images, and the output is a binary variable that takes the value 1 if a particular area of the city is a slum and 0 otherwise. Figure 1 summarizes the proposed approach for detecting slums. The process begins with collecting the input data: the administrative boundaries, obtained from OpenStreetMap (OSM), and GE images for two different time instances (upper portion of the figure). The second stage (middle portion of the figure) consists of calculating spectral, textural and structural variables from the GE images (i.e., image feature extraction). During this stage, the images are discretized by overlaying a regular grid whose outer border is defined by the OSM boundary. This procedure generates spatial datasets (one per year, per city) composed of regular polygons with their corresponding spectral, textural and structural variables. Finally, the third stage (lower portion of the figure) consists of the classification analysis. The data for the most recent year are used to train the classification models and to identify the best-performing model for slum identification. This model is then applied to images from prior years to identify urban changes in the most important areas of each city.
Remote Sens. 2017, 9, 895

The Data
We selected three Latin American cities to test the transferability of this approach: Buenos Aires, Argentina; Medellin, Colombia; and Recife, Brazil (Figure 2). These cities differ in climate, environmental conditions, culture and building materials. Buenos Aires is located at 34°35′59″S, 58°22′55″W at sea level. It lies on plain lands bordering the outlet of the La Plata river to the ocean and has a dry climate with marked seasons. Medellin is located at 6°14′41″N, 75°34′29″W in an intermountain valley at 1460 m above mean sea level and has a tropical, wet climate. Recife is located at 8°03′14″S, 34°52′51″W at sea level in hilly terrain and has a tropical, wet climate. Table 1 provides general descriptions of these cities.
We downloaded the most recent GE images (up to March 2016) for each city, using a zoom level comparable to VHR imagery with a sub-meter pixel size. Google Earth imagery with very high spatial resolution is available for almost all urban areas worldwide. The VHR images come from a number of providers and satellite platforms (e.g., DigitalGlobe, GeoEye, and CNES/Astrium, among others). Images are captured by different sensors on different dates at different spatial resolutions. However, most of the images have a sub-meter pixel size and are natural-color images with three bands: red, green and blue (RGB). Because platforms and acquisition dates differ, images of the same location captured on different dates will differ in illumination conditions and color intensities. The GE images were georeferenced and rescaled to the range 0 to 255. We kept image preprocessing to a minimum to speed up the workflow and to keep the whole approach easy to automate.
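The rescaling step can be illustrated as a per-band linear (min-max) stretch. The text does not state how the rescaling was done. The min-max formula and the function name `rescale_to_uint8` are therefore assumptions for illustration, not the authors' procedure.

```python
import numpy as np


def rescale_to_uint8(image: np.ndarray) -> np.ndarray:
    """Linearly stretch each band of an (H, W, B) image to the 0..255 range.

    Assumption: a per-band min-max stretch; the study's preprocessing may differ.
    """
    img = image.astype(np.float64)
    out = np.zeros_like(img)
    for b in range(img.shape[-1]):
        band = img[..., b]
        lo, hi = band.min(), band.max()
        if hi > lo:  # constant bands are left at 0 (no contrast to stretch)
            out[..., b] = (band - lo) / (hi - lo) * 255.0
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

For example, a band whose values span 100 to 300 is mapped so that 100 becomes 0 and 300 becomes 255.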
Prior studies state that block-level spatial units of analysis are the most useful for urban planning purposes [13,24]. OpenStreetMap (OSM) street and road layers are useful for delineating urban blocks. However, street networks in cities of developing countries are often incomplete. This happens either because of the high density and complexity of slum areas [13] or because recently occupied areas have not yet been registered in the OSM datasets, as in the northeastern section of Medellin. In these instances, delineating urban blocks would add considerable processing time to the approach because it would require visual interpretation and manual digitization of roads and pedestrian paths.
A simple alternative that can be automated is to use a regular grid to detect slums from remote sensing imagery. Prior studies have used regular grids to extract, aggregate and classify image data [8,25,26]. A regular grid in vector format, or fishnet, can be drawn using any GIS software; the only necessary input is the boundary of the study area. This avoids manual block delineation and speeds up processing. We tested two fishnets with different cell sizes for image feature extraction and classification: one with square cells of 100 m on each side and one with square cells of 50 m on each side. The 100 m grid outperformed the 50 m grid in correctly classifying slum-like areas. The 100 m square cells are similar in size to actual urban blocks. They have been recognized as an appropriate spatial unit of analysis for studying intra-urban poverty for urban planning and policy making [24]. We downloaded the administrative boundaries of each city from OSM using QGIS [27] to define the extent of the study areas. We then created a regular grid of 100 m square cells over the urban areas of each city to extract the image features. Using administrative boundaries to select the study areas could introduce bias in the identification of slums, as areas located just outside the boundary are not included in the analysis. However, the focus of this work is to test whether slums can be identified from GE imagery in three different Latin American cities using the same approach, rather than to identify all the slum areas in a particular city. We therefore used the administrative boundaries to select the study areas in the same way for all three cases.
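A fishnet like the one described above can be sketched in plain Python. This is not the QGIS procedure used in the study, and it rests on three assumptions:

- boundary coordinates are in a projected coordinate system measured in metres;
- the boundary is a simple polygon given as a list of vertices;
- a cell is kept only if its centre falls inside the boundary, which is one of several possible clipping rules.

The names `Cell`, `point_in_polygon` and `fishnet_cells` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: int
    col: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def centre(self) -> tuple[float, float]:
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)


def point_in_polygon(x: float, y: float, poly: list[tuple[float, float]]) -> bool:
    """Even-odd ray-casting test for a simple polygon given as (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray from (x, y)
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def fishnet_cells(boundary: list[tuple[float, float]], size: float = 100.0) -> list[Cell]:
    """Square cells of `size` metres covering the boundary's bounding box,
    kept only when the cell centre lies inside the boundary (assumed rule)."""
    xs = [p[0] for p in boundary]
    ys = [p[1] for p in boundary]
    xmin, ymin = min(xs), min(ys)
    ncols = math.ceil((max(xs) - xmin) / size)
    nrows = math.ceil((max(ys) - ymin) / size)
    cells = []
    for r in range(nrows):
        for c in range(ncols):
            x0, y0 = xmin + c * size, ymin + r * size
            cell = Cell(r, c, x0, y0, x0 + size, y0 + size)
            if point_in_polygon(*cell.centre, boundary):
                cells.append(cell)
    return cells
```

A 1000 m by 500 m rectangular boundary yields 50 cells of 100 m, or 200 cells of 50 m. This illustrates how quickly the number of units grows as the cell size shrinks.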
In addition, we selected well-known slum areas in each city and downloaded cloud-free GE images of each sector from approximately a decade earlier. This allowed us to test the approach's ability to analyze changes in slum areas. We attempted to capture images of the same city at two points in time roughly a decade apart to determine whether the proposed approach could identify the changes that had taken place between the two dates. This time span was restricted by the availability of Google Earth's VHR images for each city and by the quality of the available images, which can be affected by clouds and shadows. Historical VHR imagery provided by GE is also restricted by the availability of commercial VHR data, which became available after the launch of the Ikonos satellite in 1999. The good-quality historical VHR images available for Buenos Aires, Medellin, and Recife date from 2006, 2008, and 2008, respectively. Images from other dates are also available for these cities in GE. However, they were captured by medium spatial resolution platforms and are not suitable for extracting spatial pattern descriptors at the intra-urban scale.
The historical GE images were resampled to the same pixel size as the 2016 images of each city. We then performed radiometric normalization of the historical images, using the 2016 images as a reference. The aim was to obtain historical images with the same pixel size and similar color intensities (i.e., pixel values in each RGB band) as the 2016 images. This preprocessing simplifies change identification. It also helps distinguish actual changes on the ground from intensity changes caused by differences in illumination and atmospheric conditions.
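The text does not specify which radiometric normalization technique was applied. As an illustration only, the sketch below uses a simple linear relative normalization: it matches each band's mean and standard deviation in the historical image to those of the 2016 reference. Histogram matching or regression on pseudo-invariant features are common alternatives. Resampling is assumed to have been done beforehand, and the function name is hypothetical.

```python
import numpy as np


def normalize_to_reference(target: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Match the per-band mean and standard deviation of `target` to `reference`.

    Both are (H, W, 3) RGB images of the same area; the historical image is the
    target and the 2016 image the reference (mean/std matching is an assumption).
    """
    t = target.astype(np.float64)
    r = reference.astype(np.float64)
    out = np.empty_like(t)
    for b in range(t.shape[-1]):
        t_band, r_band = t[..., b], r[..., b]
        t_std = t_band.std()
        gain = r_band.std() / t_std if t_std > 0 else 1.0
        offset = r_band.mean() - gain * t_band.mean()
        out[..., b] = gain * t_band + offset  # linear gain/offset per band
    return np.clip(out, 0, 255)
```

After normalization, a historical band with values 0 and 10 matched to a reference band with values 100 and 120 takes the values 100 and 120.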

Feature Extraction
Different image texture measures and spatial pattern descriptors (structure measures) have been used to differentiate slum areas from formal ones in several cities of developing countries around the world [3,12,24,28,29]. We used current GE images (obtained in March 2016) and the regular grid of each city to extract image information using FETEX 2.0. Figure 3 shows the outline of the urban areas of each city, with selected sectors (500 by 500 m) illustrating the regular grid over the 2016 GE images. FETEX is an interactive software package for image and object-oriented feature extraction [30] and is available on the Geo-Environmental Cartography and Remote Sensing Research Group website [31]. We calculated three sets of variables: spectral, textural and structural features. The image features are extracted by processing the pixels located within each polygon, without changing the image resolution or pixel values. Spectral features provide information about color, whereas texture and structural features describe the spatial arrangement of the elements within the image. The urban layout of slum-like neighborhoods often displays a more organic, crowded and cluttered pattern than that of more formal and wealthy neighborhoods. Texture and structural features may therefore help to differentiate between slum and no-slum areas [3,12,32,33].
Spectral features: Spectral features include the summary statistics of pixel values inside each polygon. These features describe the spectral response of objects, which differs across land cover types, states of vegetation, soil composition, building materials, etc. [30]. From this group, we selected the mean and standard deviation of each RGB band and the majority statistic. These features are easy to understand and capture the spectral differences across the cities better than other summary statistics (minimum, maximum, range, and sum).
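As an illustration of the spectral group, the function below computes the mean, standard deviation and majority (most frequent value) of each RGB band for the pixels inside one grid cell. It is not FETEX code. The feature names are hypothetical, and treating the majority statistic as the per-band mode of 8-bit values is an assumption.

```python
import numpy as np


def spectral_features(pixels: np.ndarray) -> dict:
    """Per-band mean, standard deviation and majority for an (N, 3) uint8 pixel array.

    `pixels` holds the RGB values of the N pixels falling inside one grid cell.
    """
    feats = {}
    for b, name in enumerate(("R", "G", "B")):
        band = pixels[:, b].astype(np.int64)
        feats[f"MEAN_{name}"] = float(band.mean())
        feats[f"STDEV_{name}"] = float(band.std())
        # Majority = most frequent 8-bit value; ties resolve to the smallest value.
        feats[f"MAJORITY_{name}"] = int(np.bincount(band, minlength=256).argmax())
    return feats
```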
Texture features: Textural features characterize the spatial distribution of intensity values in an image and provide information about contrast, uniformity, rugosity, etc. [30]. FETEX 2.0 performs texture feature extraction based on the Grey Level Co-occurrence Matrix (GLCM) and on a histogram of the pixel values inside each polygon. The kurtosis and skewness features are derived from this histogram. The GLCM describes the co-occurrences of pixel values separated by a distance of one pixel inside the polygon. It is averaged over four principal orientations, 0°, 45°, 90° and 135°, to avoid any effect of the orientation of the elements inside the polygon [30]. The GLCM in FETEX 2.0 was used to calculate a set of variables proposed by [34] that are widely used in image processing, including uniformity, entropy, contrast, inverse difference moment (IDM), covariance, variance, and correlation. The edgeness factor is another useful feature that represents the density of edges in a neighborhood. The mean and standard deviation of the edgeness factor (MEAN EDG and STDEV EDG) are also computed within this set of texture features in FETEX 2.0 [30].
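The orientation-averaged GLCM and several of the statistics listed above can be sketched with NumPy. Several details are assumptions rather than FETEX 2.0 specifics:

- the input is a single grey-level band (how RGB is reduced to grey is not stated here);
- grey values are quantized to 8 levels;
- the matrix is made symmetric;
- entropy uses the natural logarithm.

Kurtosis, skewness and the edgeness factor are omitted.

```python
import numpy as np

# (row, col) offsets at a distance of one pixel for 0, 45, 90 and 135 degrees
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]


def glcm(gray: np.ndarray, levels: int = 8) -> np.ndarray:
    """Orientation-averaged, symmetric, normalized GLCM of an 8-bit grey image."""
    q = gray.astype(np.int64) * levels // 256  # quantize 0..255 to 0..levels-1
    h, w = q.shape
    mats = []
    for dr, dc in OFFSETS:
        r0, r1 = max(0, -dr), h - max(0, dr)
        c0, c1 = max(0, -dc), w - max(0, dc)
        a = q[r0:r1, c0:c1].ravel()
        b = q[r0 + dr:r1 + dr, c0 + dc:c1 + dc].ravel()
        m = np.zeros((levels, levels))
        np.add.at(m, (a, b), 1.0)  # count co-occurring grey-level pairs
        m = m + m.T  # count each pair in both directions (symmetric GLCM)
        mats.append(m / m.sum())
    return np.mean(mats, axis=0)


def haralick_features(p: np.ndarray) -> dict:
    """A subset of Haralick-style statistics computed from a normalized GLCM."""
    i, j = np.indices(p.shape)
    mu = (i * p).sum()  # equals the column mean because p is symmetric
    var = ((i - mu) ** 2 * p).sum()
    cov = ((i - mu) * (j - mu) * p).sum()
    nz = p[p > 0]
    return {
        "UNIFORMITY": float((p ** 2).sum()),
        "ENTROPY": float(-(nz * np.log(nz)).sum()),
        "CONTRAST": float(((i - j) ** 2 * p).sum()),
        "IDM": float((p / (1.0 + (i - j) ** 2)).sum()),
        "COVARIANCE": float(cov),
        "VARIANCE": float(var),
        "CORRELATION": float(cov / var) if var > 0 else 0.0,
    }
```

A perfectly flat cell gives a uniformity of 1 and zero contrast and entropy. A 0/255 checkerboard has maximal contrast horizontally and vertically but none diagonally, which shows why the orientations are averaged.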
Structural features: These features describe the spatial arrangement of elements inside the polygons in terms of the randomness or regularity of their distribution [30,35,36]. FETEX calculates structural features using the experimental semivariogram. According to [30], the semivariogram quantifies the spatial association of the values of a variable and measures the degree of spatial correlation between the pixels of an image, which makes it a suitable tool for identifying regular patterns. FETEX 2.0 obtains the experimental semivariogram of each polygon as the mean of the semivariograms calculated in six directions, from 0° to 150° in increments of 30°. Each semivariogram curve is then smoothed using a Gaussian filter to reduce experimental fluctuations [30]. Structural features are then extracted through a zonal analysis defined by a set of singular points on the semivariogram, such as the first maximum, the first minimum, and the second maximum [30]. For a full description of these features, see [30,35,36]. Table 2 lists the remote sensing variables used in this analysis.
After the image features are extracted, the next step is to create the dataset. This process includes selecting a ground truth sample for each city. Each polygon of the sample is manually labeled as one of two categories: slum or no-slum. Ancillary information and prior studies were used as references to identify slum areas in each city for sampling. A slum area can be considered a homogeneous zone with specific characteristics, but it can look different depending on the context [37]. However, most slum definitions relate to physical aspects of the built environment, which makes them comparable across settings. Each city could have its own definition of a slum, but as Taubenböck and Kraff point out, "the term slum is difficult to define, but if we see one, we know it" ([13], p. 15). The locations of slum areas in Buenos Aires were identified on the "Caminos de la Villa" website [38], which provides an interactive map of the city and the locations of recognized "villas" (slums). For Medellin, we used the delineation of urban slums from [3,39], which is based on survey data and the UN-Habitat global definition of a slum [40]. The benchmark slum areas in Recife were identified using the work of [41], which delimits widely recognized slum areas, or "favelas", in that city. We visually checked the selected slum areas in each case to ensure that we were picking similar slum-like areas in all three cities. We then labeled as "slum" all the 100 m cells that overlapped with the slum areas identified in the benchmarks. The sample of no-slum areas in each city covered different formal urban layouts. These included high- and low-rise residential areas, parks, urban forests and green spaces, as well as commercial and industrial areas such as malls, transport facilities and factories. This binary classification scheme is common practice in remote sensing object-oriented approaches for identifying slum areas [13,28,29]. When benchmark information on slum areas is not available to construct a ground truth sample, practitioners must obtain reference information from local authorities or rely on an experienced interpreter who can visually distinguish slum from no-slum areas. Figure 4 shows the spatial distribution of the samples for each city.
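The structural features described earlier in this section rest on a direction-averaged experimental semivariogram, γ(h) = 1/(2N(h)) Σ [z(x) − z(x + h)]². The sketch below computes it for the six directions listed in the text, averages them, applies Gaussian smoothing and locates the first maximum. It is an illustration under assumptions, not FETEX 2.0's implementation:

- oblique lags are rounded to the nearest whole-pixel offset;
- the smoothing width `sigma` is arbitrary;
- only one singular point is extracted.

All function names are hypothetical.

```python
import numpy as np

DIRECTIONS = (0, 30, 60, 90, 120, 150)  # degrees, as described in the text


def directional_semivariogram(img: np.ndarray, angle_deg: float, max_lag: int) -> np.ndarray:
    """gamma(h) for lags 1..max_lag along one direction (offsets rounded to pixels)."""
    z = img.astype(np.float64)
    h, w = z.shape
    if max_lag >= min(h, w):
        raise ValueError("max_lag must be smaller than both image dimensions")
    theta = np.deg2rad(angle_deg)
    gammas = np.empty(max_lag)
    for k, lag in enumerate(range(1, max_lag + 1)):
        dc = int(round(lag * np.cos(theta)))
        dr = -int(round(lag * np.sin(theta)))  # image rows grow downwards
        r0, r1 = max(0, -dr), h - max(0, dr)
        c0, c1 = max(0, -dc), w - max(0, dc)
        a = z[r0:r1, c0:c1]
        b = z[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
        gammas[k] = 0.5 * np.mean((a - b) ** 2)
    return gammas


def mean_semivariogram(img: np.ndarray, max_lag: int) -> np.ndarray:
    """Average of the directional semivariograms over the six directions."""
    return np.mean([directional_semivariogram(img, a, max_lag) for a in DIRECTIONS], axis=0)


def gaussian_smooth(values: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian filter with edge padding, so the output keeps the input length."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(values, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def first_maximum(curve: np.ndarray):
    """(lag, value) of the first local maximum; lags start at 1. None if absent."""
    for k in range(1, len(curve) - 1):
        if curve[k] > curve[k - 1] and curve[k] >= curve[k + 1]:
            return k + 1, float(curve[k])
    return None
```

For a cell with regularly spaced elements, the first maximum falls near half the spacing and the curve drops again at the full spacing. In an organic, cluttered layout such regular peaks are weaker, which is why these singular points help separate slum from no-slum cells.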
The final step in this stage is to divide the dataset into two sets. The training set contains 60% of the sampled polygons and is used to train and tune the classification models. The testing set contains the remaining 40% and is used to evaluate the predictive capability of the classification models. Table 3 summarizes the composition of the datasets.
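The 60/40 split can be sketched with scikit-learn's `train_test_split`. Two details are assumptions, since the text does not say how polygons were assigned to each set: stratifying by class, so that both sets keep the same slum/no-slum proportions, and the fixed random seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_dataset(X: np.ndarray, y: np.ndarray, seed: int = 0):
    """Return X_train, X_test, y_train, y_test with 60% / 40% of the polygons.

    Stratification by label (slum = 1, no-slum = -1) is an assumption.
    """
    return train_test_split(X, y, test_size=0.4, stratify=y, random_state=seed)
```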
After the ground truth sampling was complete, we applied the Kolmogorov-Smirnov (KS) test [42], implemented in the R package "kolmin" [43,44], to better understand how well the image-derived variables discriminate slum areas from no-slum areas in each city.
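The study used the R package cited above; the sketch below is a Python stand-in, not that package's API. It computes the two-sample KS statistic D and ranks features by it. D is the maximum distance between the empirical cumulative distributions of a feature in slum and no-slum cells, so a larger D suggests better separation. P-values are omitted, and the ±1 label coding follows the classification section.

```python
import numpy as np


def ks_statistic(a, b) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])  # the ECDF difference peaks at an observed value
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def rank_features(X: np.ndarray, y: np.ndarray, names: list[str]) -> list[tuple[str, float]]:
    """Rank the columns of X by how strongly they separate slum (1) from no-slum (-1)."""
    slum, no_slum = X[y == 1], X[y == -1]
    scores = [(n, ks_statistic(slum[:, k], no_slum[:, k])) for k, n in enumerate(names)]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```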

Classification Model
The classification literature is broad, and many methods and algorithms have been proposed over recent decades [45]. In general, the primary goal is to develop a quantitative classification method capable of determining and generalizing the relationships between a set of variables (X) and a categorical variable (Y). In our classification problem, X is a matrix containing the spectral, textural, and structural values of each polygon in the grid. Y is a categorical variable that takes the value 1 if a polygon is a slum and −1 if it is not. The capability of a classification method is determined by two factors: (i) the theoretical definition of the classifier's decision boundary (e.g., linear or nonlinear); and (ii) the complexity of the data.
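With Y coded as ±1, a linear classifier and the logistic regression model can be written as follows. The penalized objective in the last line is the L2-regularized form used by scikit-learn's `LogisticRegression`, given here as one illustration. The inverse regularization strength C is not specified in the text.

```latex
\begin{aligned}
\hat{y}(\mathbf{x}) &= \operatorname{sign}\!\left(\mathbf{w}^{\top}\mathbf{x} + b\right), \qquad y \in \{-1, +1\},\\
P(y \mid \mathbf{x}) &= \frac{1}{1 + \exp\!\left(-y\,(\mathbf{w}^{\top}\mathbf{x} + b)\right)},\\
\min_{\mathbf{w},\,b}\; &\tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \log\!\left(1 + \exp\!\left(-y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b)\right)\right).
\end{aligned}
```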
Based on the classification boundary, classifiers are commonly designated as either linear or nonlinear.Linear classifiers, such as logistic regression and linear SVMs, assume that the categorical variable (Y) can be obtained by exploiting a linear combination of the input features (X).Nonlinear classifiers generalize the boundary by adjusting polynomial boundaries, Gaussian kernels, or algorithmic criteria based on feature thresholding.Figure 5 illustrates a linear and nonlinear decision boundary.Nonlinear classifiers can capture more complex patterns from the data, but as a

Classification Model
Classification literature is broad and multiple methods and algorithms have been proposed over recent decades [45].In general, the primary goal is to develop a quantitative classification method that is capable of determining and generalizing the relationships between a set of variables (X) and a categorical variable (Y).For our specific classification problem, X is a matrix that includes the spectral, textural, and structural values of each polygon in the grid and Y is a categorical variable that assumes the value of either 1 or −1 if a polygon is a slum or no-slum, respectively.The capability of the classification method is determined by two factors: (i) the theoretical definition of the classification boundary of the classifier (e.g., linear and nonlinear); and (ii) the complexity of the data.
Based on the classification boundary, classifiers are commonly designated as either linear or nonlinear. Linear classifiers, such as logistic regression and linear SVMs, assume that the categorical variable (Y) can be obtained from a linear combination of the input features (X). Nonlinear classifiers generalize the boundary by fitting polynomial boundaries, Gaussian kernels, or algorithmic criteria based on feature thresholding. Figure 5 illustrates a linear and a nonlinear decision boundary. Nonlinear classifiers can capture more complex patterns in the data but, as a consequence, are more computationally expensive than their linear counterparts and are more prone to memorizing the training data (overfitting).
The intrinsic complexity of the data cannot be easily understood or described, particularly for high-dimensional datasets. The most intuitive way to understand data complexity is to visualize the features and their respective classes. This approach is generally restricted to low-dimensional data (2D or 3D) or to simplified versions of the feature space obtained with dimensionality-reduction or manifold-learning algorithms such as Principal Components Analysis (PCA), Isomap, or Self-Organizing Maps [46]. A common approach for high-dimensional data is therefore to assess its complexity by comparing the capability of different classification algorithms to capture known patterns. To clarify, a simple (linear) classifier will perform poorly on complex (nonlinear) data, whereas complex (nonlinear) classifiers can model more complex data but carry a greater risk of overfitting. This trade-off is referred to as the bias-variance tradeoff [47], and it is commonly addressed through a strategy known as regularization. The regularization strategy depends on the classification method and ranges from the inclusion of additional terms in the error function (e.g., Logistic Regression, SVM) to random perturbations of the training step and/or training data (e.g., Deep Neural Networks).
Regarding the size of the training sets, there is no definitive number of observations required to train the models; the required number depends on the complexity of the problem to be solved. Recent advances in data science and deep learning frequently refer to the benefits of large datasets; however, when data collection is expensive and time-consuming, a common practice is to sequentially increase the number of observations used to train the models while observing the changes in the evaluation criteria. If the evaluation criteria do not improve (i.e., they converge) as the number of training samples increases, it is not necessary to collect additional training data.
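The stopping rule described above can be sketched as follows. This is a minimal illustration, not the procedure used in this study: `evaluate` is a hypothetical callable that trains a model on `n` observations and returns a validation score (e.g., an F2-score), and the tolerance `tol` and the number of consecutive small gains `patience` are assumed values.

```python
def has_converged(scores, tol=0.005, patience=2):
    """True if each of the last `patience` increases in training-set size
    improved the score by less than `tol` (hypothetical convergence rule)."""
    if len(scores) <= patience:
        return False
    recent = scores[-(patience + 1):]
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(gain < tol for gain in gains)


def required_training_size(evaluate, sizes, tol=0.005, patience=2):
    """Evaluate increasing training-set sizes until the score converges.

    Returns (size_at_convergence, scores); the size is None when the score
    never converged within `sizes`, i.e. more data may still help.
    """
    scores = []
    for n in sizes:
        scores.append(evaluate(n))
        if has_converged(scores, tol, patience):
            return n, scores
    return None, scores


# Illustrative (made-up) validation scores for increasing sample sizes.
toy_scores = {100: 0.60, 200: 0.70, 400: 0.76, 800: 0.78, 1600: 0.782, 3200: 0.783}
size, history = required_training_size(toy_scores.get, sorted(toy_scores))
print(size)  # 3200
```

A patience greater than one guards against stopping on a single noisy plateau; with small, expensive datasets the tolerance would be chosen relative to the variability of the cross-validated score.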
Because our data have high dimensionality (30 features extracted per polygon) with unknown distributions and include data for three different cities, we explored two approaches to training our model to identify slums: (i) training a single classifier on a unified dataset (i.e., without differentiating the cities) and then evaluating whether the detected slums are reliable; and (ii) a multi-model approach in which a separate classifier is trained for each city. Given the geographic and cultural differences between these cities, as well as the differences in the appearance of their slums, fitting one slum-identification model to all the cities is a considerable challenge. However, it is important to test its feasibility in the search for robust tools for rapid urban slum detection that perform well in different settings. We analyze the performance of linear (Logistic Regression, linear SVM) and nonlinear classifiers (Polynomial and Radial Basis Kernel SVMs and Random Forests), all of which are available in the Python library scikit-learn [48].
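The two training strategies differ only in how the training data are grouped before fitting. The sketch below illustrates this with a deliberately simple nearest-centroid classifier standing in for the scikit-learn models used in the study; the record layout (`city`, `features`, `label`) and all function names are hypothetical. In the toy data used to exercise it, the feature-label relationship reverses between two cities, which is the kind of city-specific singularity that defeats a unified model.

```python
from collections import defaultdict


def fit_centroids(X, y):
    """Stand-in classifier (not one of the paper's models): per-class feature means."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroids(model, X):
    """Assign each row to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(model, key=lambda label: sq_dist(row, model[label])) for row in X]


def train_unified(records, fit=fit_centroids):
    """Approach (i): a single model fitted on the pooled data of all cities."""
    return fit([r["features"] for r in records], [r["label"] for r in records])


def train_per_city(records, fit=fit_centroids):
    """Approach (ii): one model per city, fitted only on that city's records."""
    grouped = defaultdict(lambda: ([], []))
    for r in records:
        X, y = grouped[r["city"]]
        X.append(r["features"])
        y.append(r["label"])
    return {city: fit(X, y) for city, (X, y) in grouped.items()}


def accuracy(y_true, y_pred):
    """Share of cells whose predicted label matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

At prediction time, approach (ii) routes each cell to the model of its own city, so it needs labeled training cells in every city where it is applied.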
Logistic Regression (LR) is the most common linear classifier and is frequently used by policy makers in the econometric literature. This classifier uses the logistic function to estimate the probability of a categorical value, Y, given the input features, X. For this classifier we used Ridge regularization (known as L2), which, compared with Lasso regularization (known as L1), is less computationally expensive, provides a unique combination of coefficients, and, in the case of correlated features, shrinks the parameter estimates but not to 0 [49][50][51].
The Support Vector Machine (SVM) is a popular non-probabilistic classification algorithm and is commonly recognized for its capability to maximize the margin between a decision boundary and the observations belonging to each category. Like logistic regression, the SVM relies on a mathematical formulation that expresses the classification task as an optimization problem. This algorithm is highly popular in the machine learning literature because its theoretical formulation accommodates nonlinear boundaries (kernels) and because of its explicit goal of locating the boundary as far as possible from the training data. In the experiment section, we use the polynomial kernel (SVMk), with degree k ranging from 1 to 5, and the radial basis kernel (SVMrbk). See [45] for a complete overview of the optimization procedure and more detailed information on the kernel functions. In the case of the SVM, the regularization is defined by a constant that can be tuned to reduce overfitting.
Finally, the Random Forest (RF), unlike Logistic Regression and the SVM, makes a decision based on a sequential set of thresholding rules on the input space. Theoretically, an RF is an ensemble method formed by multiple decision trees. The RF decision is the average of the individual decisions of its trees, each of which is trained on a bootstrap sample drawn from the complete training data [52]. A decision tree is an algorithmic strategy that sequentially divides the feature space to fit the output variable [53]. For the results section, we use the Least Squared Error (LSE) as the optimization function, the maximum depth of the trees is set to 10, and each random forest includes 10 decision trees. Using the average to obtain the final decision endows RF, and ensemble methods in general, with an intrinsic robustness to overfitting, which is frequently pointed out in the machine learning literature as one of their most significant advantages.
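The kernels named above can be written out directly. The sketch below follows the common parameterization that scikit-learn's `SVC` also uses, a polynomial kernel (γ⟨x, z⟩ + r)^d and an RBF kernel exp(−γ‖x − z‖²); the default values of `gamma` and `coef0` here are illustrative, not the settings of this study. With `degree=1` and `coef0=0` the polynomial kernel reduces to a scaled linear kernel, which is why SVM1 behaves as a linear model. The decision function shows how a trained kernel SVM scores a cell; `dual_coefs` stands for the products αᵢyᵢ attached to the support vectors, and the helper names are hypothetical.

```python
import math


def linear_kernel(x, z):
    """Plain dot product <x, z>."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree, gamma=1.0, coef0=0.0):
    """(gamma * <x, z> + coef0) ** degree, as in the SVMk models (k = degree)."""
    return (gamma * linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    """Radial basis (Gaussian) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_decision(x, support_vectors, dual_coefs, bias, kernel):
    """Kernel SVM score f(x) = sum_i alpha_i * y_i * K(s_i, x) + b.

    With a threshold of 0, f(x) > 0 reads as slum (+1) and f(x) < 0 as
    no-slum (-1); the study tunes this threshold rather than fixing it at 0.
    """
    return sum(c * kernel(s, x) for s, c in zip(support_vectors, dual_coefs)) + bias
```

Because every kernel enters only through `kernel(s, x)`, switching from SVM1 to SVM5 or SVMrbk changes the shape of the boundary without changing the rest of the formulation.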

Model Performance Assessment
Our comparison of the classifiers is based on the F-beta score (Fβ), a numeric performance measure defined by Equation (1), where precision and recall are defined by Equations (2) and (3), respectively. In general, precision measures the reliability of the detected slums (the purity of the regions detected as slum areas), and recall measures how completely the classifier retrieves the areas defined as slum areas (the proportion of slum areas that are detected). The Fβ score, precision, and recall are bounded between 0 and 1, where 1 represents a perfect classifier. The value of β must be selected according to the problem to be solved and is generally set to 0.5, 1, or 2: β = 0.5 gives more weight to precision, whereas β = 2 prioritizes recall. In the remaining sections of this paper, β is set to 2 (i.e., Fβ=2, or F2) to give more importance to recall. This implies that, when classifying areas as slum or no-slum, we prefer type I errors (no-slum areas classified as slum) over type II errors (slum areas classified as no-slum), so that vulnerable populations are not overlooked.
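Equations (1) to (3) are not reproduced in this text; the sketch below uses the standard definitions, which we assume match them: precision P = TP/(TP + FP), recall R = TP/(TP + FN), and Fβ = (1 + β²)·P·R/(β²·P + R), with slum as the positive class. The toy counts are hypothetical and only illustrate why F2 rewards a classifier that over-detects slums more than one that misses them.

```python
def precision_recall_fbeta(tp, fp, fn, beta=2.0):
    """Precision, recall and F-beta from confusion-matrix counts (slum = positive).

    Undefined ratios (zero denominators) are reported as 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


# Hypothetical counts: two classifiers with swapped precision and recall.
over = precision_recall_fbeta(tp=8, fp=8, fn=0)   # P = 0.5, R = 1.0
under = precision_recall_fbeta(tp=4, fp=0, fn=4)  # P = 1.0, R = 0.5
print(round(over[2], 3), round(under[2], 3))      # 0.833 0.556
```

With β = 0.5 the ranking of these two hypothetical classifiers reverses (0.556 versus 0.833), which is the practical meaning of choosing β according to the problem.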
Once we have identified the best-performing approach (unified or multi-model) and the best classifier, the next step is to tune the regularization constant to avoid overfitting the data and to fine-tune the decision threshold to obtain the final F2-scores. The regularization constant is tuned by an exhaustive search: the F2-score is evaluated while the regularization constant is varied, and the constant that yields the highest F2-score is selected. The decision threshold is the value at which the classifier decides whether a particular observation is classified as slum or no-slum. It is selected using the Receiver Operating Characteristic (ROC) curve, which plots the false-positive rate (x-axis) against the true-positive rate (y-axis) as the decision threshold varies. A common criterion in the machine learning literature is to set the threshold at the point of the ROC curve closest to its upper-left corner. Note that the decision thresholds of the logistic regression reported in Section 3 are not bounded between 0 and 1, because they are applied on the x-axis of the logistic function (i.e., to the decision score) rather than to the predicted probability.
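The threshold-selection criterion can be sketched as follows, assuming decision scores for which higher values mean "more slum-like" and labels coded as 1 (slum) and -1 (no-slum). The function names and the toy scores are hypothetical. Because the scores are not probabilities, the selected threshold can be any real number, including a negative one.

```python
import math


def roc_points(scores, labels):
    """(threshold, false-positive rate, true-positive rate) for each distinct
    score, predicting slum (+1) whenever score >= threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = []
    for th in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= th and y != 1)
        points.append((th, fp / n_neg, tp / n_pos))
    return points


def closest_to_upper_left(scores, labels):
    """Threshold whose ROC point is nearest to the ideal corner (FPR = 0, TPR = 1)."""
    best = min(roc_points(scores, labels), key=lambda p: math.hypot(p[1], 1.0 - p[2]))
    return best[0]


scores = [-2.0, -1.0, -0.5, 0.3, 1.2, 2.0]
labels = [-1, -1, 1, 1, -1, 1]
print(closest_to_upper_left(scores, labels))  # -0.5
```

Lowering the threshold moves along the curve towards higher recall, so a negative threshold like the one selected here makes the classifier more prone to label cells as slum.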
To ensure that the tuning process (regularization constant and decision threshold) is fair, only observations in the training dataset are used, which is accomplished by using cross-validation F2-scores. To obtain the cross-validation F2-scores, the training dataset is first divided into k equal-sized parts. In a single iteration, a classifier (with a specific regularization constant and decision threshold) is trained on k − 1 parts and tested on the remaining part, and the resulting F2-score is recorded. This process is repeated k times so that each part is used once for testing. The final cross-validation F2-score is the average of the F2-scores obtained in the k iterations. Our parameter selection is based on 10-fold cross-validation.
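The k-fold procedure can be sketched in plain Python under stated assumptions: folds are formed by assigning observations round-robin (in practice the data would usually be shuffled or stratified first), `fit` and `predict` are hypothetical callables standing in for a classifier with a fixed regularization constant and decision threshold, and an undefined F2 (no slum cells and no predicted slum cells in a fold) is scored as 0. The one-feature "stump" classifier is illustrative only.

```python
def f2_score(y_true, y_pred):
    """F2 with slum (+1) as the positive class: 5TP / (5TP + 4FN + FP); 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p != 1)
    denom = 5 * tp + 4 * fn + fp
    return 5 * tp / denom if denom else 0.0


def k_folds(n, k):
    """Round-robin assignment of indices 0..n-1 to k folds of (nearly) equal size."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_val_f2(X, y, fit, predict, k=10):
    """Average F2 over k folds: train on k-1 folds, test on the held-out fold."""
    scores = []
    for test_idx in k_folds(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = predict(model, [X[i] for i in test_idx])
        scores.append(f2_score([y[i] for i in test_idx], y_pred))
    return sum(scores) / len(scores)


# Hypothetical one-feature classifier: threshold halfway between the class means.
def fit_stump(X, y):
    pos = [x[0] for x, t in zip(X, y) if t == 1]
    neg = [x[0] for x, t in zip(X, y) if t != 1]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_stump(threshold, X):
    return [1 if x[0] >= threshold else -1 for x in X]


X = [[float(i if i < 10 else i + 10)] for i in range(20)]
y = [1 if i >= 10 else -1 for i in range(20)]
print(cross_val_f2(X, y, fit_stump, predict_stump, k=5))  # 1.0
```

Repeating `cross_val_f2` for each candidate regularization constant and threshold keeps every tuning decision inside the training data, as the paragraph above requires.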

Slum Changes in Time
As stated above, we downloaded historical GE images for specific sectors of each city from roughly a decade ago (period t − 1) to perform a change analysis. We selected identified slum sectors of 1 km² in the recent (2016) GE images and downloaded historical images of those sectors using the historical imagery functionality in Google Earth. We applied relative radiometric normalization between the t − 1 image and the most recent image of each city. This process minimizes differences in image data caused by changes in atmospheric conditions, solar illumination, and view angles between images acquired on different dates. We extracted image features using the same regular grid of square cells and used the classifier model trained with the data extracted from the 2016 image (period t) to classify each cell within the sector as either slum or no-slum. Then, cell by cell, we compared the results of the two dates (t vs. t − 1) and assigned different colors to distinguish four cases: areas classified as slum at both dates; areas classified as no-slum at both dates; areas classified as no-slum at t − 1 but as slum at t; and areas classified as slum at t − 1 but as no-slum at t.
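The cell-by-cell comparison can be sketched as below, assuming both dates have been classified on the same regular grid and coded as 1 (slum) and -1 (no-slum). The category strings and function names are hypothetical labels for the four cases listed above; assigning map colors is left out.

```python
from collections import Counter

# The four cases compared cell by cell: (label at t-1, label at t) -> category.
CHANGE_CATEGORIES = {
    (1, 1): "slum at both dates",
    (-1, -1): "no-slum at both dates",
    (-1, 1): "no-slum at t-1, slum at t",
    (1, -1): "slum at t-1, no-slum at t",
}


def change_map(grid_prev, grid_curr):
    """Compare two classified grids (lists of rows of +1/-1) cell by cell."""
    if len(grid_prev) != len(grid_curr) or any(
        len(a) != len(b) for a, b in zip(grid_prev, grid_curr)
    ):
        raise ValueError("both dates must be classified on the same grid")
    return [
        [CHANGE_CATEGORIES[(p, c)] for p, c in zip(row_prev, row_curr)]
        for row_prev, row_curr in zip(grid_prev, grid_curr)
    ]


def summarize(cmap):
    """Number of grid cells in each change category."""
    return Counter(category for row in cmap for category in row)


prev = [[1, -1], [-1, 1]]
curr = [[1, 1], [-1, -1]]
print(summarize(change_map(prev, curr)))
```

Requiring identical grids mirrors the workflow above, in which the historical image is processed with the same regular grid of square cells as the 2016 image.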
Following this rationale, we tested whether the proposed approach could be used to analyze slum dynamics over time by detecting areas that became slum areas, stable areas (no change), and slum areas that became no-slum areas through upgrading or urban renovation processes.

Discriminating Image Features
The results of the Kolmogorov-Smirnov test indicate that the distributions of all image-derived variables differ significantly between slum and no-slum areas. Figure 6 provides the boxplots of the five most discriminating image-extracted variables for each city (the results for the other variables are available upon request). Notably, two of these five most discriminating variables are shared by all three cities (SDF and CONTRAS), and the five variables are identical for Buenos Aires and Medellin (SDF, CONTRAS, IDM, MEAN EDG, and FDO). These variables comprise textural and structural features, with the exception of MEAN1, which belongs to the spectral group and provides information about the mean of the intensity values in band 1, which corresponds to the red channel.
SDF is a structural variable that provides information about homogeneity at short distances [35,36]. Slum areas exhibit lower homogeneity than no-slum areas because they often include a variety of small dwelling units with different roof colors in close proximity to each other [3]. CONTRAS is a texture variable that provides information about the differences in color and intensity of the objects present in the image [34]; in all three cities, slum areas had higher values for this variable than no-slum areas. MEAN EDG is an aggregated measure of the density of edges present in an image [30]; the slum areas of these cities had higher values for this variable than no-slum areas because of the smaller dwelling units, narrower roads, and shadows between housing units and their surroundings. IDM is a texture measure that provides information about general homogeneity [30,34]; slum areas are characterized by lower values of this feature than no-slum areas [3]. FDO is a structural feature that provides information about the variability of changes at short distances [35]; slum areas had higher values for this variable than no-slum areas because pixel values can change abruptly over short distances. AFM and DMF are structural features that are also related to the variability of the pixel values in the image [35]; in Recife, slum areas had higher values for both features than no-slum areas, which implies that slum areas there display more variability and less homogeneity than no-slum areas.
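The two-sample Kolmogorov-Smirnov statistic behind this comparison is the largest vertical gap between the empirical cumulative distribution functions of a feature in slum and no-slum cells, D = sup |F_slum(x) − F_no-slum(x)|. The sketch below computes D only; the significance reported above also requires a p-value, which library routines such as `scipy.stats.ks_2samp` provide. Ranking features by D is one plausible way to order their discriminating power, but that is an assumption here, not necessarily how the five variables in Figure 6 were chosen. The feature values are made up.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic D = max_x |F_a(x) - F_b(x)| using empirical CDFs.

    The supremum of the difference of two step functions is reached at one
    of the observed values, so it suffices to check the pooled sample.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    return max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)


# Made-up values of one feature (e.g. a contrast measure) per grid cell.
slum = [0.62, 0.70, 0.75, 0.81, 0.90]
no_slum = [0.30, 0.41, 0.55, 0.63, 0.72]
print(ks_statistic(slum, no_slum))
```

D ranges from 0 (identical empirical distributions) to 1 (no overlap at all), so features whose slum and no-slum values barely overlap, like the toy example, score high.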

Classification into Slum and No-Slum
For the first step of our experimental analysis, we use the data for the three cities to build a unified model. Table 4 provides the F2-score of each type of classifier on the testing set for each city. It is evident from the table that SVMrbk is the best-performing model. The polynomial SVMs (SVM2, ..., SVM5) showed signs of underfitting, particularly the higher-order models. The linear models (Logistic Regression and SVM1) obtained good classification scores and did not show any signs of overfitting or underfitting. However, the Gaussian kernel of SVMrbk performs considerably better in all the cities. Finally, the poor performance of the Random Forest is noteworthy. The low classification scores of certain algorithms suggest the existence of singularities within cities that may complicate the identification of slums with a unified model. Performance can be further improved by carefully tuning each classifier; for simplicity, this tuning is applied only to the best-performing algorithms in the final part of this section.
Because of the differences in the cities' urban structures, we then train a classification model for each city. Table 5 provides the testing F2-score for each model and each city. In this case, the best classification scores are obtained by Logistic Regression and SVMrbk; both models achieved F2 improvements of between 2 and 5 points with respect to the unified model. The remaining models showed some improvement over their unified counterparts; however, their performance is still poor compared with Logistic Regression and SVMrbk. These results confirm the intuition that there are structural differences in the features of the slums of each city that preclude the implementation of a unified model, in line with [33], who found morphological differences in the spatial, spectral, and textural characteristics of deprived areas in Mumbai.
Figure 7 shows the distribution of the time required by each algorithm to classify a cell. As expected, Logistic Regression is the fastest approach. The SVMs have comparable speeds, even those with complex kernels. Finally, the Random Forest is the slowest of the proposed algorithms. The results do not show significant differences between cities.
The next step is to remove signs of overfitting/underfitting from the best-performing models and to tune the decision threshold (th). This step only includes Logistic Regression and SVMrbk. As explained in Section 2.2, the regularization term is selected by an exhaustive incremental search, and the best threshold is selected using the ROC curve. Table 6 provides the F2-scores for the default configuration (default), using only the tuned regularization term (Reg.), and using both the tuned regularization term and the best threshold (Reg + th). The table confirms the benefits of the final tuning and allows us to conclude that the best strategy is to use a single model per city, include the regularization parameter, and tune the decision threshold.
Figure 8 provides the F2-score of each SVMrbk as the regularization term changes; for visualization purposes, the x-axis is reported on a logarithmic scale. The regularization value that maximizes the F2-score is set as the regularization term of the model, and the final regularization term is reported in Table 7. Figure 9 illustrates the ROC curves given the best regularization term. As previously explained, the decision threshold is set to the value that generates the point closest to the upper-left corner of the curve. Using the data from Medellin, we found that the area under the ROC curve when using the 100 m fishnet was about 2% greater than when using the 50 m fishnet (i.e., the ROC curve is slightly closer to the upper-left corner when using the 100 m fishnet). Finally, Table 7 provides the parameters that were selected for Table 6.
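The exhaustive search over the regularization term on a logarithmic grid can be sketched as follows. `cv_f2` is a hypothetical callable that returns the cross-validated F2-score for a given constant (e.g., the SVM's C); the grid bounds, the grid resolution, and the toy scoring curve are assumptions, not values from this study.

```python
import math


def log_grid(low_exp, high_exp, per_decade=1):
    """Candidate constants 10**e for e from low_exp to high_exp, `per_decade` values per decade."""
    steps = (high_exp - low_exp) * per_decade
    return [10 ** (low_exp + i / per_decade) for i in range(steps + 1)]


def best_regularization(cv_f2, candidates):
    """Evaluate every candidate and keep the one with the highest F2-score."""
    scores = {c: cv_f2(c) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-in for a cross-validated F2 curve that peaks at C = 10.
def toy_cv_f2(c):
    return 0.85 - 0.02 * (math.log10(c) - 1) ** 2


best_c, all_scores = best_regularization(toy_cv_f2, log_grid(-3, 3))
print(best_c)  # 10.0
```

A logarithmic grid spends the same number of evaluations on each order of magnitude, which suits regularization constants whose useful range can span several decades.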
Table 7. Final parameters of the logistic regression and the SVMrbk. The decision thresholds of the logistic regression are reported on the x-axis of the logistic function (i.e., as decision scores) and are therefore not bounded between 0 and 1. For both classifiers (Logistic Regression and SVM), more negative thresholds indicate that the classifier is more prone to classify an observation as slum.
Figure 10 provides the maps of slum areas detected by classifying the 2016 GE images of each city. The percentage of urban area covered by slums is 24% for Buenos Aires and 36% for both Medellin and Recife. The spatial pattern of slum areas differs across the cities. In Buenos Aires, the slums are dispersed in small pockets throughout the territory; they emerge in intra-urban vacant lots and even on the periphery of industrial areas. In Medellin, slums are located in peripheral green areas adjacent to existing slums. This pattern is one of the consequences of the armed conflict with guerrillas in rural areas, which forced large groups of individuals to move to the periphery of major cities in Colombia. In Recife, the slums are distributed in large clusters throughout the city and near highways (e.g., Highway BR-101) and rivers (e.g., the Capibaribe River).

Accuracy Assessment of Classification Results
The confusion matrix of the classification results presented in Table 8 illustrates the magnitude of the overestimation (no-slum areas classified as slum areas) and underestimation (slum areas classified as no-slum areas) for the testing dataset of each city. Buenos Aires had less than 3% overestimation and approximately 1% underestimation; Medellin was the best case, with 2% overestimation and no underestimation; and Recife had 4.5% overestimation and less than 1% underestimation. Figure 11 shows known slum sectors of each city at an identical spatial scale: Villa Zavaleta in Buenos Aires, Comuna Santa Cruz in Medellin, and Chao de Estrelas in Recife. Table 9 provides a general characterization of the slum areas of each city in terms of the image-derived features to help interpret the classification results.
The slums in all three cities are composed of clusters of small dwelling units with very few vegetated areas. However, as expected from the boxplots in Figure 6, the slums in Buenos Aires and Medellin are similar to each other when compared with the slums in Recife. The slum areas in Buenos Aires and Medellin are characterized by high heterogeneity at short distances, high homogeneity at large distances, and similar organic patterns. This means that different objects lie in close proximity (centimeters or a few meters), but the same pattern is observed at larger distances across the territory (tens of meters); e.g., settlements made up of small dwellings with different building and roofing materials, located very close to each other, with the same general pattern across the settlement or neighborhood. However, the slums in Buenos Aires are more cluttered than those in Medellin. The slums in Recife are more homogeneous in color because most of the roofs are made of clay tiles or similar products. This explains the high discriminating power of the variable MEAN1 for this city: band 1 records the intensity values of the red channel in the visible spectrum, and slum areas in Recife contain many pixels with the same red tone. Furthermore, the slums in Recife have a more regular spatial pattern in their urban layout than the slums in Buenos Aires and Medellin.
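The overestimation and underestimation figures correspond to the off-diagonal cells of the confusion matrix. The sketch below assumes both are expressed as a share of all test cells; the text does not state the denominator, and normalizing by the number of reference no-slum or slum cells instead would give different percentages. The function name is hypothetical, and the counts in the example are invented, not the values of Table 8.

```python
def over_under_estimation(tp, fp, fn, tn):
    """Shares of misclassified test cells, with slum as the positive class.

    overestimation  = no-slum cells classified as slum (false positives)
    underestimation = slum cells classified as no-slum (false negatives)
    Both are divided by the total number of test cells (an assumption).
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("empty confusion matrix")
    return fp / total, fn / total


# Hypothetical confusion-matrix counts for one city's test set.
over, under = over_under_estimation(tp=20, fp=4, fn=2, tn=174)
print(f"overestimation {over:.1%}, underestimation {under:.1%}")
# overestimation 2.0%, underestimation 1.0%
```

Because F2 favors recall, a tuned classifier is expected to show more overestimation than underestimation, which is the pattern the reported figures follow.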
The lower score obtained for Buenos Aires may be explained by the quality of the input GE image and by the similarity between the city's no-slum areas and its slum areas. The Buenos Aires GE image has low contrast (differences in color intensity tend to be small across the image), which could reduce the quantifiable differences between slum and no-slum areas. Figure 12 shows Villa Zavaleta next to a no-slum area. Both areas have very few vegetated areas between buildings and high heterogeneity at short distances, but the no-slum sector has a more regular spatial pattern in its urban layout, with large homogeneous surfaces interspersed with clusters of smaller buildings.

Temporal Analysis
Figure 13 provides the results of the temporal analysis for the selected one-square-kilometer sectors of each city. This approach is useful for providing a global overview of areas that changed from no-slum to slum, and vice versa, between the analyzed dates. However, because the implemented algorithm gives priority to recall over precision in order to identify the most problematic regions within each city, we expected false positives in the classification results. These false positives adversely affect interpretation at a detailed scale, i.e., on a cell-by-cell basis (cells of the regular grid), and can mask changes that an interpreter might see when comparing the two GE images of each sector. Because of the lack of reference data, we assessed the classification results of the historical images by on-screen interpretation. The resulting overall accuracies were 92% for Buenos Aires, 90% for Medellin, and 72% for Recife. Despite these limitations, the results reveal some interesting general trends. In Buenos Aires, slum areas tend to grow by occupying available space adjacent to existing slum areas (e.g., vacant spaces between existing structures, zones adjacent to railroad tracks, and even parking lots in industrial areas). In Medellin, slum areas grow by occupying undeveloped land on the edge of the urban perimeter: the 2008 image shows the first part of the slum, and by 2016 the slum areas extend southwards over the adjacent "green" or "free" areas. Finally, in Recife, some slum areas disappeared between 2008 and 2016; some of them were located adjacent to the river and were removed to make way for a road on the riverbank. Other areas with vegetation or bare soil were occupied by either slum areas or formal developments.
The proposed approach is optimal for identifying recently informally occupied urban areas that have slum characteristics versus changes due to slum upgrading processes.When slum areas are upgraded, this process often includes improving dwelling units and offering public services [2]; this process less often includes the modification of an urban layout because that implies relocation of a population and many slum residents fear that redevelopment will leave them homeless [54].In this regard, upgrading processes that do not significantly change the spatial pattern and texture of the urban areas cannot be determined by this approach because most of the image-derived features quantify aspects of the urban scene that are related to the spatial pattern and texture of the urban layout.
This workflow worked well for slum detection for a single date, but it did not work as well for spatio-temporal analysis.Although the historical images were resampled to match the pixel size of

Temporal Analysis
Figure 13 shows the results of the temporal analysis for the selected one-square-kilometer sectors of each city. This approach provides a global view of the areas that changed from no-slum to slum, and vice versa, between the analyzed dates. However, because the implemented algorithm gives priority to recall over precision in order to identify the most problematic regions of each city, false positives were expected in the classification results. These false positives hinder interpretation at a detailed scale, that is, cell by cell over the regular grid, and can mask changes that an interpreter would see when comparing the two GE images of each sector. Because reference data were lacking, we assessed the classification of the historical images by on-screen interpretation. The resulting overall accuracies were 92% for Buenos Aires, 90% for Medellin and 72% for Recife. Nevertheless, the results reveal some interesting general trends. In Buenos Aires, slum areas tend to grow into available space adjacent to existing slum areas (e.g., vacant spaces between existing structures, zones adjacent to railroad tracks, and even parking lots in industrial areas). In Medellin, slum areas grow by occupying undeveloped land on the edge of the urban perimeter: in 2008, only the first part of the slum is visible, and by 2016 the slum areas extend southwards over the adjacent "green" or "free" areas. Finally, in Recife, some slum areas disappeared between 2008 and 2016; some of them were located next to the river and were removed to make way for a road on the riverbank. Other areas, previously covered by vegetation or bare soil, were occupied by either slums or formal developments.
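The cell-by-cell comparison described above amounts to cross-tabulating each grid cell's label at the two dates. The sketch below does this with a `Counter`; the dictionaries keyed by hypothetical cell ids stand in for whatever grid structure the workflow actually uses. Every false positive at either date appears as a spurious transition, which is why the recall-oriented threshold complicates this kind of change analysis.

```python
from collections import Counter


def transition_counts(labels_t1, labels_t2):
    """Cross-tabulate per-cell labels from two dates.

    Both arguments map a grid-cell id to "slum" or "no-slum"; only cells present
    at both dates are compared. Keys of the result are (label_t1, label_t2) pairs.
    """
    common = labels_t1.keys() & labels_t2.keys()
    return Counter((labels_t1[c], labels_t2[c]) for c in sorted(common))


# Hypothetical labels for four fishnet cells in 2008 and 2016.
y2008 = {"c1": "no-slum", "c2": "slum", "c3": "no-slum", "c4": "slum"}
y2016 = {"c1": "slum", "c2": "slum", "c3": "no-slum", "c4": "no-slum"}

changes = transition_counts(y2008, y2016)
print(changes[("no-slum", "slum")])  # 1 (newly occupied cell)
print(changes[("slum", "no-slum")])  # 1 (removed, or a false positive in 2008)
```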
The proposed approach is better suited to identifying urban areas that have recently been informally occupied and have slum characteristics than to detecting changes due to slum upgrading. Slum upgrading often involves improving dwelling units and providing public services [2]; it less often involves modifying the urban layout, because that implies relocating part of the population, and many slum residents fear that redevelopment will leave them homeless [54]. Consequently, upgrading processes that do not significantly change the spatial pattern and texture of an urban area cannot be detected by this approach, because most of the image-derived features quantify precisely those aspects of the urban layout.
This workflow worked well for slum detection at a single date but less well for spatio-temporal analysis. Although the historical images were resampled to the pixel size of the updated reference images and normalized to match their color intensity, differences in view angle, lighting, and vegetation phenology remain between images; these affect vegetation appearance and shadow extent and, in turn, the values of the image-derived features and the classification results. To minimize differences in view angle and vegetation phenology, a practitioner would need historical images captured on the same day of the year as the reference or updated image; however, this is nearly impossible to control with the data available from Google Earth. Commercial VHR satellite imagery is more appropriate for this purpose because it can be acquired for specific dates that match the date of the original image, minimizing these differences. Transfer learning methods, recently introduced to remote sensing classification problems, could also help overcome these issues [55].
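The paper does not specify how the historical images were normalized to match the color intensity of the reference images. One simple option, assumed here purely for illustration, is per-band linear matching of the mean and standard deviation (histogram matching is a common alternative). Such a correction aligns global brightness and contrast but cannot remove differences in view angle, shadows, or phenology.

```python
import statistics


def match_mean_std(source, reference):
    """Linearly rescale one band of `source` so its mean and standard deviation match `reference`.

    Both arguments are flat sequences of pixel intensities for the same band.
    Values are not clipped to the valid range (e.g. 0-255); do that afterwards if needed.
    """
    mu_s, sd_s = statistics.fmean(source), statistics.pstdev(source)
    mu_r, sd_r = statistics.fmean(reference), statistics.pstdev(reference)
    if sd_s == 0:
        # A constant band has no contrast to stretch; map it to the reference mean.
        return [mu_r for _ in source]
    gain = sd_r / sd_s
    return [(v - mu_s) * gain + mu_r for v in source]


# Hypothetical red-band samples from a historical (2008) image and the 2016 reference image.
historical = [10, 20, 30]
reference = [100, 140, 180]
print([round(v, 6) for v in match_mean_std(historical, reference)])  # [100.0, 140.0, 180.0]
```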

Conclusions
This study explored a low-cost, standardized method for slum detection that uses spectral, texture and structural features extracted from VHR GE imagery as input data, and assessed the capability of three ML algorithms to classify urban areas as either slum or no-slum. Using data from Buenos Aires (Argentina), Medellin (Colombia), and Recife (Brazil), we found that the Support Vector Machine with radial basis kernel (SVMrbk) performed best, with an F2-score over 0.81.
We also found that the specific characteristics of each city are important to consider and preclude the use of a unified classification model. The ML algorithms performed best for Medellin and Recife, with F2-scores of 0.98 and 0.87, respectively. The image-derived features were more effective for slum detection in these cities because their slum areas differ from no-slum areas in spatial pattern and texture and show significant variation in building and roofing materials.
The proposed workflow needs further refinement to track changes over time properly. Because the implemented ML algorithms gave recall priority over precision in order to identify the most problematic regions of each city, the classification results contain false positives that hinder change analysis between dates. Nevertheless, the approach did identify recently and informally occupied urban areas with slum characteristics, where changes in local heterogeneity and spatial pattern are clearly visible and differ from those of newly occupied formal areas. Changes in the slum status of an area due to upgrading would still be difficult to identify, because upgrading does not significantly change the spatial pattern and texture of urban areas, which are the aspects quantified by the image-derived variables.
A suggestion for future studies is to apply object and scene recognition algorithms to Google Street View images to generate new features that could improve the performance of our classification models. Combining street views and satellite imagery for slum identification could also be an important tool for supporting programs such as the Trust Fund for the Improvement of Family Housing, led by the Development Bank of Latin America and the Foundation in Favor of Social Housing.

Figure 1. Flow diagram of the proposed approach for slum detection and change analysis.

Figure 2. Cities included in the study.

Figure 3. Urban areas and selected sectors showing the regular grid over the 2016 GE images of each city. From left to right: Buenos Aires, Medellin, and Recife.

Figure 4. Sampling scheme of slum and no-slum areas for each city. From left to right: Buenos Aires, Medellin, Recife.

Figure 5. Linear and nonlinear classification boundaries in 2D: (a) linear boundary on separable data; and (b) nonlinear boundary on separable data.
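The nonlinear boundary in panel (b) is the kind a kernel SVM such as the SVMrbk can produce. The sketch below writes out the radial basis function kernel, K(x, z) = exp(−γ‖x − z‖²), and the kernel decision function f(x) = Σᵢ cᵢ K(svᵢ, x) + b, where each cᵢ is a support vector's label-weighted dual coefficient. The single support vector, its coefficient, the intercept and γ are hypothetical values, not parameters fitted in this study.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, coefs, intercept, gamma):
    """Kernel SVM decision value; a positive value means the positive class (here, slum)."""
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, coefs)) + intercept


# Hypothetical model: one positive support vector at the origin.
svs, coefs, b, gamma = [(0.0, 0.0)], [1.0], -0.5, 1.0

# f(x) = exp(-||x||^2) - 0.5 is positive only inside the circle ||x||^2 < ln 2,
# so the boundary is a circle rather than a straight line.
print(svm_decision((0.2, 0.3), svs, coefs, b, gamma) > 0)  # True
print(svm_decision((1.0, 0.0), svs, coefs, b, gamma) > 0)  # False
```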

Figure 6. Boxplots of the distributions of the five most discriminant image-derived variables for each city. No-slum distributions are shown in yellow and slum distributions in red. Variables are ordered from left to right by decreasing Kolmogorov-Smirnov test value.
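Figure 6 orders the variables by their Kolmogorov-Smirnov test value. A minimal sketch of that ranking is shown below, assuming the value used is the two-sample KS statistic D, the largest gap between the empirical cumulative distribution functions of the slum and no-slum samples; `scipy.stats.ks_2samp` computes the same statistic together with a p-value. The feature values are made up.

```python
import bisect


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions that only jump at observed values,
    # so checking every pooled observation is enough to find the supremum.
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def rank_features(slum, no_slum):
    """Return feature names sorted from most to least discriminant (largest D first)."""
    scores = {name: ks_statistic(slum[name], no_slum[name]) for name in slum}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical values of two features for a few slum and no-slum cells.
slum = {"MFM": [5.0, 6.0, 7.0, 8.0], "VFM": [0.10, 0.20, 0.30, 0.40]}
no_slum = {"MFM": [1.0, 2.0, 3.0, 4.0], "VFM": [0.15, 0.25, 0.35, 0.45]}
print(rank_features(slum, no_slum))  # ['MFM', 'VFM']
```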

Figure 7. Time (s) taken by each method to classify a fishnet cell.

Figure 9. ROC curves used to select the best decision threshold: (a) Buenos Aires; (b) Medellin; (c) Recife. For visualization purposes, the x-axis is shown on a logarithmic scale. The final regularization term is reported in Table 7.
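The sketch below shows how an ROC curve can be traced from decision-function values and how a threshold can be read off it. The selection rule used here (the lowest false-positive rate among thresholds that reach a target true-positive rate, reflecting the priority given to recall) is an assumption for illustration; the exact criterion is not stated in this section. Scores and labels are invented, with 1 = slum, and both classes must be present.

```python
def roc_points(scores, labels):
    """Return (threshold, fpr, tpr) for each distinct score used as a cut-off.

    A cell is predicted slum (label 1) when its score is >= the threshold, so lower
    thresholds label more cells as slum.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def pick_threshold(scores, labels, min_tpr):
    """Hypothetical rule: lowest-FPR threshold whose TPR reaches `min_tpr` (ties: higher threshold)."""
    candidates = [(fpr, -t, t) for t, fpr, tpr in roc_points(scores, labels) if tpr >= min_tpr]
    return min(candidates)[2] if candidates else None


scores = [2.0, 1.0, 0.5, -0.5, -1.0, -2.0]
labels = [1, 1, 0, 1, 0, 0]
print(pick_threshold(scores, labels, min_tpr=1.0))  # -0.5
print(pick_threshold(scores, labels, min_tpr=0.6))  # 1.0
```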

Figure 10. Classification of the 2016 GE images of each city into slum and no-slum areas.

Figure 11. Sectors of known slum areas in each city. From left to right: Buenos Aires, Medellin and Recife.

Figure 12. Slum sector in Buenos Aires (Villa Zavaleta) compared to the adjacent no-slum area. The red line indicates the slum boundary as mapped in [38].

Remote Sens. 2017, 9, 895

Figure 13. Classification results for historical GE images of the selected sectors in each city.

MFM: Mean of the semivariogram values up to the first maximum
VFM: Variance of the semivariogram values up to the first maximum
DMF: Difference between the mean of the semivariogram values up to the first maximum and the semivariance at the first lag
RMM: Ratio between the semivariance at the first local maximum and the mean semivariogram values up to this maximum
SDF: Second-order difference between the first lag and the first maximum
AFM: Area between the semivariogram value at the first lag and the semivariogram function up to the first maximum
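A minimal sketch of how these descriptors could be computed from an experimental semivariogram is given below. Several details are assumptions rather than the paper's procedure: the semivariogram is computed on a 1-D intensity profile instead of over 2-D image regions; the first maximum is taken as the first lag whose value is not exceeded by the next lag; "up to the first maximum" is read as inclusive, starting at the first lag; variances are population variances; and AFM uses the trapezoidal rule with unit lag spacing. SDF is omitted because its exact formula cannot be recovered from the definition alone.

```python
import statistics


def semivariogram(profile, max_lag):
    """Experimental semivariogram gamma(h) = sum((z[i+h] - z[i])**2) / (2 * N(h)), h = 1..max_lag."""
    if not 0 < max_lag < len(profile):
        raise ValueError("max_lag must be between 1 and len(profile) - 1")
    gammas = []
    for h in range(1, max_lag + 1):
        diffs = [profile[i + h] - profile[i] for i in range(len(profile) - h)]
        gammas.append(sum(d * d for d in diffs) / (2 * len(diffs)))
    return gammas


def first_max_index(gammas):
    """Index of the first local maximum; the last lag if the curve never turns down."""
    for i in range(len(gammas) - 1):
        if gammas[i] >= gammas[i + 1]:
            return i
    return len(gammas) - 1


def semivariogram_features(gammas):
    """MFM, VFM, DMF, RMM and AFM under assumed readings of the definitions (SDF omitted)."""
    m = first_max_index(gammas)
    head = gammas[: m + 1]  # semivariogram values from the first lag up to the first maximum
    mfm = statistics.fmean(head)
    return {
        "MFM": mfm,
        "VFM": statistics.pvariance(head),
        "DMF": mfm - gammas[0],
        "RMM": gammas[m] / mfm if mfm else float("nan"),
        # Area between the curve and the horizontal line at gamma(first lag), trapezoidal rule.
        "AFM": sum((head[i] + head[i + 1]) / 2 - gammas[0] for i in range(len(head) - 1)),
    }


# Hypothetical grey-level profile across a periodic roof pattern (period of 4 pixels).
profile = [0, 10, 20, 10, 0, 10, 20, 10, 0]
g = semivariogram(profile, 4)
print(first_max_index(g))                    # 1 (the maximum is at lag 2)
print(semivariogram_features(g)["DMF"] > 0)  # True
```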

Table 3. Composition of the dataset (number of fishnet cells).

Table 7. Final parameters of the logistic regression and the SVMrbk. The decision thresholds for the logistic regression are those reported on the x-axis and are not bounded between 0 and 1. For both classifiers (logistic regression and SVM), more negative thresholds make the classifier more likely to label an observation as slum.
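One reading of this caption is that the logistic-regression thresholds are expressed on the scale of its decision function, the log-odds, rather than as probabilities, which is why they are unbounded. Under that assumption, a threshold t corresponds to a probability of 1 / (1 + e^(−t)), and lowering t labels more cells as slum. The decision values below are hypothetical.

```python
import math


def logodds_to_probability(t):
    """Map a threshold on the logistic-regression decision function (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def classify(decision_values, threshold):
    """Label a cell 'slum' when its decision value is at or above the threshold."""
    return ["slum" if v >= threshold else "no-slum" for v in decision_values]


values = [-1.8, -0.7, 0.3, 1.2]  # hypothetical decision-function outputs for four cells
print(logodds_to_probability(0.0))           # 0.5
print(classify(values, 0.0).count("slum"))   # 2
print(classify(values, -1.0).count("slum"))  # 3 (a more negative threshold flags more slum cells)
```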

Table 8. Confusion matrix of the SVMrbk with the parameters reported in Table 7.
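The F2-scores reported in this study follow from confusion-matrix counts such as those in Table 8. The sketch below tallies the counts and applies the standard F-beta formula, F_β = (1 + β²)·TP / ((1 + β²)·TP + β²·FN + FP). With β = 2, a missed slum cell (FN) weighs four times as much as a false alarm (FP), which matches the priority given to recall. The labels and counts are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="slum"):
    """Return (TP, FP, FN, TN) for a binary labelling."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


y_true = ["slum", "slum", "no-slum", "no-slum", "slum"]
y_pred = ["slum", "no-slum", "slum", "no-slum", "slum"]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 1)

# Same number of errors, different kind: F2 penalizes false negatives more than false positives.
print(round(f_beta(8, 4, 2), 3), round(f_beta(8, 2, 4), 3))  # 0.769 0.69
```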

Table 9. General characteristics of slums in the analyzed cities.
