Mapping Urban Land Use at Street Block Level Using OpenStreetMap, Remote Sensing Data, and Spatial Metrics

Up-to-date and reliable land-use information is essential for a variety of applications such as planning or monitoring of the urban environment. This research presents a workflow for mapping urban land use at the street block level, with a focus on residential use, using very-high resolution satellite imagery and derived land-cover maps as input. We develop a processing chain for the automated creation of street block polygons from OpenStreetMap and ancillary data. Spatial metrics and other street block features are computed, followed by feature selection that reduces the initial datasets by more than 80%, providing a parsimonious, discriminative, and redundancy-free set of features. A random forest (RF) classifier is used for the classification of street blocks, which results in accuracies of 84% and 79% for five and six land-use classes, respectively. We exploit the probabilistic output of RF to identify and relabel blocks that have a high degree of uncertainty. Finally, the thematic precision of the residential blocks is refined according to the proportion of the built-up area. The output data and processing chains are made freely available. The proposed framework is able to process large datasets, given that the cities in the case studies, Dakar and Ouagadougou, cover more than 1000 km² in total, with a spatial resolution of 0.5 m.


Introduction
As reported by the United Nations, urban areas currently contain more than 50% of the world's population. According to the latest estimates, this proportion will reach 60% by 2030 [1]. In developing countries, high urbanization rates and uncontrolled urban sprawl often lead to challenges such as inefficiency of transport systems, degradation of the environment, growth of informal settlements, and a proportion of the population living in deprived conditions. Availability of accurate and up-to-date information about the current situation of a city could help in defining and setting up adapted urban policies.
Among the set of potential geospatial information related to urban areas, population density and land use are probably the most important to an urban planner [2]. Unfortunately, they are limited or not available at all in developing countries, as these lag behind the most developed countries in the adoption and use of geographic information systems (GIS) [3,4]. This is especially the case for Africa, which faces a critical need for geographic information [5–7]. For instance, a study showed that several important geographic datasets were still either unavailable or difficult to access in Africa [7]. Notwithstanding recent initiatives to alleviate this issue [8] and a stronger interest in alternative data, such as volunteered geographic information (VGI) [9], more progress needs to be made.
In urban areas, land-use information can be mapped at different scales that range from cadastral plots to large neighborhoods. In this study, we chose to work at the street block level, as was the case in previous studies [2,10–12]. The street block, sometimes referred to as a "city block" or "land parcel", provides sufficient spatial detail to urban planners and has been depicted as the most fundamental and appropriate unit in which to map the urban structure [13–15]. Unfortunately, reference street block datasets were not accessible for our case studies, from either the local authorities and national mapping agencies or any other reliable source. We overcame this challenge by developing a semiautomated processing chain for the creation of street block geometries using OpenStreetMap (OSM) data [16]. OSM is open data, meaning it can be accessed and used at no cost by anyone and for any purpose, which makes it an alternative source of data when the availability of and access to geoinformation are limited. Although disparaged during its early stages of development, OSM data have been improving rapidly in quality, both in terms of completeness and of thematic accuracy. For that reason, OSM could become a key player in the coming decade for production of and access to high-quality geoinformation in developing countries. As an example, a recent study demonstrated the potential of OSM data for increasing the thematic level of land-use/land-cover maps where there is a lack of official data [17].
To the best of our knowledge, few works [18,19] have proposed a methodology for the creation of street block geometries using OSM data. Long and Liu [18] proposed a method to automatically identify "land parcels" from OSM roads. They operated in the Chinese geographic context and developed a framework to address outdated, nonexistent, or unavailable reference data. Their approach consists of using geometric operations to clean up the road network. Subsequently, land parcels are automatically created and defined as the remaining space when buffered roads are removed. Their approach proved to be a good approximation of the results obtained from conventional methods but suffered from incompleteness of the OSM road network, leading to the creation of large parcels in smaller cities. Their framework was used recently in other studies [20,21]. However, Long and Liu [18] and Fan et al. [19] provided a theoretical framework without ready-to-use computer code, which limits the easy reproduction of their methods.
Studies aiming at mapping urban land use often make use of land-cover and/or ancillary reference geographic datasets, e.g., detailed cadastral datasets, socioeconomic datasets, or datasets that contain the location of urban facilities (schools, hospitals, shops, etc.) [11,20–22]. Despite their great potential for mapping land use at a fine scale, such exhaustive and detailed datasets are rarely available, especially in developing countries. Furthermore, the initial production and the process of keeping them updated are both costly and labor-intensive. Remote sensing solutions can be used as an alternative for creating and updating reliable land-use information on urban areas. The land use can be mapped directly from satellite imagery and/or from land-cover maps.
The latter approach usually relies on the computation of spatial metrics, also named "landscape metrics" [23]. These metrics have been widely used for the classification and characterization of urban or rural areas. They were first mainly used in the field of landscape ecology [24,25] for their ability to characterize landscapes as ecosystems according to the composition and spatial organization of the land cover classes they contain. Their use in urban areas dates back to the 2000s [26] for studying urban sprawl [27], urbanization gradient [28], or land-use changes [29].
More broadly, this study is part of two research projects, namely, MAUPP (maupp.ulb.ac.be) and REACT (react.ulb.be), aiming at improving urban population distribution models and urban malaria risk models, respectively. In these projects, the land-use and land-cover information will be used for disaggregating population counts available for administrative units, using dasymetric modeling [30,31]. Consequently, emphasis is placed on having sufficient thematic details for residential use to allow for adequate reallocation of population counts and modeling of population density at the intraurban level. These projects focus on sub-Saharan African cities, which implies the development of solutions that consider the scarcity of ancillary reference data.
The present research proposes a complete, mostly automated, framework for mapping land use at the street block level, using only very-high resolution (VHR) land-cover maps and remote-sensing-derived data. It includes the extraction of the street blocks from OSM and their subsequent characterization using spatial, spectral, and morphological metrics, a feature selection step for discarding highly correlated and redundant information, and supervised classification using random forest.
This research places strong emphasis on reproducibility and open access to data and products. Consequently, the implemented computer codes and resulting datasets are made available at no cost to any interested users (see Appendix B).

Study Areas
The methodology presented here was applied to two cities in Western Africa, namely Ouagadougou and Dakar, the capitals of Burkina Faso and Senegal, respectively. The areas of interest (AOIs) were selected to cover both the core of the city and the peri-urban areas, as there is a lack of a well-established consensus for the definition and delineation of urban areas [32]. AOIs were selected through visual interpretation of VHR imagery and were not restricted to administrative units. This allowed for a wide capture of economic activities and urban sprawl. Figures 1 and 2 illustrate the extents of the AOIs, covering 615 km² for Ouagadougou and 418 km² for Dakar, superimposed with the administrative units.

[Figure: Land-cover map of Ouagadougou superimposed with administrative units. HB: High buildings; LB: Low buildings; SW: Swimming pools; AS: Asphalt surfaces; BS: Bare soils; TR: Trees; LV: Low vegetation; WB: Water bodies; SH: Shadows.]

Input Data
The primary input data consisted of land-cover (LC) maps (Figures 1 and 2) derived from very-high resolution (VHR) satellite imagery, i.e., WorldView-3 and Pléiades for Ouagadougou and Dakar, respectively, with a spatial resolution of 0.5 m. These were produced using a semiautomated object-based image analysis (OBIA) [33] framework based on open-source solutions [34–37]. The overall accuracy (OA) of the LC products was 93.4% and 89.5% for Ouagadougou and Dakar, respectively. Their legends are presented in Table 1. Additionally, normalized digital surface models (nDSM), i.e., datasets that contain the height of above-ground objects, were used. The nDSMs were derived from photogrammetric digital surface models (DSM) generated from a stereo triplet for Dakar and a stereo couple for Ouagadougou. Vegetation and water indices, i.e., normalized difference vegetation index (NDVI) and normalized difference water index (NDWI), respectively, were also used.

Extraction of Street Block Geometries Using OpenStreetMap
In OSM data, roads are the map features with the highest completeness. A recent study [38] estimates that OSM roads have reached more than 80% completeness at a global scale. Although this high score hides important variations at regional or national scales, it encourages the use of this global dataset to develop solutions that can be applied worldwide.
In this research, we propose an approach similar to [19]. Our method is a semiautomated workflow exploiting OpenStreetMap data for the creation of urban street block geometries, to be used as a fundamental urban landscape unit to map land use [16]. Unlike the proprietary solution (ESRI ArcGIS) used in [19], it takes advantage of the open-source software PostGIS for storage, management, and processing of large vector datasets. The programming language is Python and the code is implemented in a "Jupyter notebook" [39] accessible under an open license on a dedicated online repository (Appendix B). It can be easily adapted to suit further research needs. The main steps are illustrated in Figure 3.
Mapping land use at the street block level implies that blocks should be highly homogeneous in terms of urban function. Indeed, it is important to get meaningful spatial units, according to the process investigated (here, the land use). Otherwise, the spatial metrics will be meaningless [40] and the classification task will be more complex, with more confusion between classes and lower confidence in the land-use maps produced.
The OSM road network alone could not adequately meet our needs. Indeed, in some situations, the edges of the blocks could be defined better using line segments of a river, hill, or other manmade structures [19]. Actual land use is often a mix of uses, and thus it is difficult to reach a situation where all street blocks extracted would be homogeneous in terms of land use. However, incorporating extra map features (e.g., rivers, water bodies, railways, military camps, cemeteries, residential areas, farmlands, etc.) allowed for these problems to be reduced. Consequently, the blocks that were produced were not street blocks stricto sensu, but they met the needs of our analysis. Moreover, vector data such as administrative city sectors or functional zones could be used as ancillary datasets in addition to OSM data.
The script starts by taking as input a polygon shapefile corresponding to the AOI and optionally some ancillary vector layers. Then, the bounding box of the AOI is created and subdivided into tiles, and OSM data are automatically downloaded using the OSM extended Overpass API [41]. Next, map features of interest are filtered according to their "key = value" pairs in the OSM tagging scheme [42,43]. The map features (i.e., lines and polygons) are then intersected with the extent of the AOI and the polygons are converted into linear features. At this point, some lines that cross each other without being connected, e.g., because they do not share a common node at their intersection, are processed to obtain a stack of fully connected lines. Owing to coregistration inaccuracies and/or nearly redundant road geometries in OSM [44] or between OSM and ancillary data, many sliver polygons are created. This is overcome by using the PostGIS topological functions to merge neighboring nodes according to a user-provided snapping tolerance. The snapping tolerance should not be too large because it is likely to distort the accurately digitized road sections and make further steps more difficult [44]. After this procedure, the street block polygons are extracted from the stack of lines. Similar to [19], two kinds of polygons are generated: (i) urban blocks and (ii) undesirable polygons (sliver polygons) resulting from multilane roads, functional roads near crossroads, or highway ramps. These sliver polygons are usually easily identifiable based on criteria of shape and size since they are thin and small. The user is responsible for adapting the preset criteria used to identify probable sliver polygons.
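The tiling and tag-filtering steps can be illustrated with a minimal sketch that only builds Overpass QL query strings, without sending any request. The function names, the tag filters, the timeout, and the bounding box coordinates are illustrative assumptions and are not taken from the published notebook.

```python
def split_bbox(south, west, north, east, n_rows, n_cols):
    """Split a bounding box into n_rows x n_cols tiles.

    Each tile is returned as (south, west, north, east), the order
    expected by Overpass QL bounding box filters.
    """
    d_lat = (north - south) / n_rows
    d_lon = (east - west) / n_cols
    tiles = []
    for i in range(n_rows):
        for j in range(n_cols):
            tiles.append((south + i * d_lat, west + j * d_lon,
                          south + (i + 1) * d_lat, west + (j + 1) * d_lon))
    return tiles


def overpass_query(tile, tag_filters):
    """Build an Overpass QL query selecting ways by key or key=value."""
    bbox = ",".join(f"{v:.6f}" for v in tile)
    parts = []
    for key, value in tag_filters:
        # value=None means "any value for this key"
        selector = f'["{key}"]' if value is None else f'["{key}"="{value}"]'
        parts.append(f"way{selector}({bbox});")
    return "[out:json][timeout:180];(" + "".join(parts) + ");out geom;"


# Hypothetical selection of map features (not the paper's exact list)
FILTERS = [("highway", None), ("railway", "rail"),
           ("waterway", "river"), ("landuse", "cemetery")]

# Illustrative bounding box, not the AOI used in the study
tiles = split_bbox(12.25, -1.70, 12.50, -1.40, n_rows=2, n_cols=2)
queries = [overpass_query(tile, FILTERS) for tile in tiles]
```

Splitting the request into tiles keeps each download small, which is one plausible reason for the tiling step described above.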
The sliver polygons are then eliminated by merging each of them with the neighboring nonsliver polygon with which it shares the longest border. This last step iterates until no sliver polygons remain, resulting in the final block geometries.
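A minimal sketch of the sliver test and of the choice of merge target is given below. It is written in pure Python for illustration, whereas the actual chain relies on PostGIS. The Polsby-Popper compactness ratio, the thresholds `max_area` and `max_compactness`, and the function names are assumptions, since the text only states that slivers are identified by user-adaptable criteria of shape and size.

```python
import math


def area_perimeter(ring):
    """Shoelace area and perimeter of a ring [(x, y), ...] in metres.

    The first vertex is not repeated at the end of the list.
    """
    twice_area = 0.0
    perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def is_sliver(ring, max_area=500.0, max_compactness=0.2):
    """Flag polygons that are very small or very thin (hypothetical thresholds)."""
    area, perimeter = area_perimeter(ring)
    compactness = 4 * math.pi * area / perimeter ** 2  # 1.0 for a circle
    return area < max_area or compactness < max_compactness


def merge_target(shared_borders, sliver_ids):
    """Pick the nonsliver neighbour sharing the longest border.

    shared_borders maps neighbour id -> shared border length (m).
    Returns None when every neighbour is itself a sliver.
    """
    candidates = {k: v for k, v in shared_borders.items() if k not in sliver_ids}
    return max(candidates, key=candidates.get) if candidates else None


block = [(0, 0), (100, 0), (100, 100), (0, 100)]      # 100 m x 100 m block
median_strip = [(0, 0), (200, 0), (200, 3), (0, 3)]   # thin strip between lanes

print(is_sliver(block), is_sliver(median_strip))
print(merge_target({"A": 200.0, "B": 3.0, "C": 150.0}, sliver_ids={"C"}))
```

A sliver whose neighbours are all slivers gets no target in a given pass, which is consistent with the iterative nature of the elimination step.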

Computing Street Block Features
In this research, street block features used to classify land use can be separated into two groups. The first relates to spatial metrics computed based on the land-cover maps available. The other group includes additional information, such as block morphology or features derived directly from the spectral values. In total, 116 and 97 features were computed for Ouagadougou and Dakar, respectively. All metrics were computed in GRASS GIS, using an automated script coded in Python [45] which is available on a dedicated repository (see Appendix B).

Street Blocks' Spatial Metrics (Patch-Based Metrics)
In this paper, the spatial metrics used are all related to the "patch mosaic" paradigm [40,46], whereby the landscape is viewed as a mosaic of land-cover patches. A patch can be defined as a group of neighboring pixels that belong to the same class. In that way, it acts as an abstraction level that masks some information of the actual landscape. For instance, in urban areas, a coalescence of hundreds of small individual buildings can form one single patch and could have the same size as a patch corresponding to a single large building, such as a commercial center. Amongst other things, this paradigm makes the use and interpretation of patch-based metrics difficult for nonexperts. According to [40], the behavior of spatial metrics is theoretically not well understood and their interpretation can be very challenging. There is a profusion of different patch-based metrics, but all aim at describing a landscape in terms of either its composition/diversity or the spatial configuration of the patches it contains.
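The patch abstraction can be made concrete with a toy raster. The sketch below labels the patches of one class with a breadth-first search and derives a few simple composition and configuration measures. The 4-neighbour connectivity rule, the toy grid, and the function name are assumptions for illustration; the study itself computes its metrics with the GRASS GIS "r.li" modules.

```python
from collections import deque


def patch_sizes(grid, target):
    """Return the size in pixels of each patch of class `target`.

    A patch is a group of 4-connected pixels of the same class.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or seen[r][c]:
                continue
            # Breadth-first search over the patch containing (r, c)
            queue = deque([(r, c)])
            seen[r][c] = True
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


# Toy street block: B = building pixel, S = bare soil pixel
block = [
    "BBSSB",
    "BBSSB",
    "SSSSS",
    "BSSBB",
]

sizes = patch_sizes(block, "B")
n_pixels = len(block) * len(block[0])
proportion = sum(sizes) / n_pixels        # composition: share of the class
n_patches = len(sizes)                    # configuration: fragmentation
mean_patch_size = sum(sizes) / n_patches  # in pixels
print(sizes, proportion, n_patches, mean_patch_size)
```

Note that the 2 x 2 patch in the upper-left corner could equally be one large building or four adjoining small ones: the patch representation cannot tell them apart, which is the information loss discussed above.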
Different software can be used for computing spatial metrics, and the best known is probably FRAGSTATS [23]. Unfortunately, its use is limited by the size of the dataset that can be handled [40], and it offers limited automation. As an alternative, we used the "r.li" suite of modules, available in GRASS GIS [47]. These modules provide a set of landscape indices that can be found in FRAGSTATS and are designed not to overload the computer memory (i.e., the RAM), thus having the capacity to process large datasets [48]. Besides, GRASS GIS is built as a collection of hundreds of small programs, enabling all common GIS operations to be handled in the same environment in a computationally efficient manner. Importantly, the process could be automated thanks to the Python application programming interface (API) [49]. The list of metrics computed is presented in Table A1 (see Appendix A).

Additional Street Block Features
In addition to the spatial metrics described above, features related to the shape of the street blocks were computed, as well as key features aggregated from spectral data, e.g., the median and standard deviation of NDVI and NDWI, for their ability to characterize nonbuilt landscapes. Those additional features were computed using the "i.segment.stats" add-on of GRASS GIS [50]. Moreover, as information on the height of above-ground objects was available from the nDSMs, we computed the mean height of the building pixels. Table A2 (see Appendix A) summarizes the additional block features that are used to complement the spatial metrics.
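As a simple illustration of these aggregated features, the sketch below summarizes the pixels of one street block in pure Python. The pixel record layout, the feature names, and the use of the population standard deviation are assumptions; the study computes these statistics with GRASS GIS. The class codes HB and LB follow the land-cover legend (high and low buildings).

```python
import statistics


def block_features(pixels, building_classes=("HB", "LB")):
    """Aggregate per-pixel values into block-level features.

    pixels: list of dicts with keys 'lc' (land-cover code),
    'ndvi' and 'height' (nDSM value in metres) for one street block.
    """
    ndvi = [p["ndvi"] for p in pixels]
    # Only pixels classified as buildings contribute to the mean height
    heights = [p["height"] for p in pixels if p["lc"] in building_classes]
    return {
        "ndvi_median": statistics.median(ndvi),
        "ndvi_stddev": statistics.pstdev(ndvi),
        "building_mean_height": statistics.fmean(heights) if heights else None,
    }


# Hypothetical pixels of a single block
pixels = [
    {"lc": "LB", "ndvi": 0.1, "height": 3.0},
    {"lc": "HB", "ndvi": 0.2, "height": 9.0},
    {"lc": "TR", "ndvi": 0.7, "height": 6.5},
    {"lc": "LV", "ndvi": 0.6, "height": 0.2},
]

features = block_features(pixels)
print(features)
```

Restricting the height statistic to building pixels prevents trees from inflating the estimate in vegetated blocks.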

Land Use Scheme and Sampling
The choice of the land-use classes constituting the legend scheme was made after a visual interpretation of the different types of urban structures and uses. Both cities are characterized by several types of land use such as industrial, commercial and services, administrative, or residential. In the land-use legend scheme (see Table 2), a clear focus is made on having a better thematic precision for residential areas than for other classes. It includes two residential classes enabling the distinction between planned (usually richer and with lower density) and unplanned/deprived (usually poorer and with higher density). The nonresidential built-up land uses, such as commercial, administrative, or services, were all grouped together in one single class. This was done because we intend to utilize the land-use information for further research regarding fine-scale modeling of population density.
Moreover, as we aimed at mapping the whole extent of the AOI, which encompasses peri-urban areas, we also included classes related to natural elements, e.g., vegetated or bare areas. Urban land use is often mixed because of the presence of multiple urban activities on the same block. However, our aim here was to map the dominant activity in the block. This explains the absence of "mixed" classes in the legends.
While urban patterns in Ouagadougou present a clear distinction between planned and unplanned neighborhoods (as visible in Figure 4a), in Dakar, the difference is less straightforward. There, some neighborhoods look more deprived than most of the residential areas, even if they present a semblance of regular street pattern (see Figure 4b). Previous research, integrating remote sensing and socioeconomic census data, proved that they are inhabited by a poorer population [51].
First, 1648 and 1500 street blocks were randomly sampled for Dakar and Ouagadougou, respectively, for training a supervised classification algorithm and for validation. Each sampled block was then assigned a label by visual interpretation according to its supposed dominant land-use class. In the case of Dakar, the resulting training/test set was highly imbalanced between "Planned residential" and "Deprived residential". The same was true for "Agricultural vegetation" and "Natural vegetation". For that reason, we manually sampled an extra 344 blocks to obtain a more balanced training/validation set. Next, for both case studies, a split in a 75%/25% ratio was made to get a training set and an independent validation set. During the labeling process, the interpreters were asked to indicate whether they could assign the label without any doubt. Finally, samples for which the interpretation decision was not certain, i.e., the experts were undecided about the land-use class to be attributed, were removed from the validation set (41 and 76 blocks removed for Dakar and Ouagadougou, respectively). This explains why the number of validation samples does not reach the 25% previously mentioned for some classes (see Table 1).
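The splitting and filtering logic can be sketched as follows. The text does not state whether the 75%/25% split was stratified by class, so the per-class split used here is an assumption, as are the function name, the random seed, and the toy sample.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_fraction=0.75, seed=42):
    """Split labelled blocks into training and validation sets per class.

    samples: list of (block_id, label) tuples.
    Returns (train, validation), each a list of (block_id, label).
    """
    by_class = defaultdict(list)
    for block_id, label in samples:
        by_class[label].append(block_id)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, validation = [], []
    for label in sorted(by_class):
        ids = by_class[label]
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train += [(i, label) for i in ids[:cut]]
        validation += [(i, label) for i in ids[cut:]]
    return train, validation


# Toy sample: 8 planned and 4 deprived residential blocks
samples = ([(i, "Planned residential") for i in range(8)]
           + [(i, "Deprived residential") for i in range(8, 12)])
train, validation = stratified_split(samples)

# Blocks flagged as doubtful by the interpreters (hypothetical ids)
# are dropped from the validation set only
uncertain_ids = {3, 9}
clean_validation = [(i, lab) for i, lab in validation if i not in uncertain_ids]
print(len(train), len(validation), len(clean_validation))
```

Because doubtful blocks are removed after the split, the validation share of a class can fall below 25%, as noted in the text.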

Feature Selection and Classification Using Machine Learning
A supervised random forest (RF) classifier was used for the classification step. RF is an ensemble of Classification and Regression Decision Trees (CARTs) [52], where each tree is trained on a random bootstrapped sample of the training data (about two-thirds of the data). The final label is derived from the combined predictions (majority voting) of all trees. Because RF aggregates many individual trees, it offers high prediction accuracy and relative immunity to overfitting, which explains why it is very commonly used in remote sensing (RS) studies [53]. To maximize performance, two parameters are usually fine-tuned in RF: the number of trees to grow and the number of randomly selected features at each decision point (split) within a tree. The former is commonly suggested to be set as high as computationally feasible [52], while the value of the latter is identified by evaluating the error on the out-of-sample training data, known as the out-of-bag (OOB) error.
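The "about two-thirds" figure can be recovered with a short calculation, assuming (as in the standard RF formulation, which the text does not spell out) that each bootstrap sample draws $n$ items with replacement from the $n$ training items. The probability that a given training item appears at least once in the sample is:

```latex
\[
P(\text{item } i \text{ drawn at least once})
  = 1 - \left(1 - \frac{1}{n}\right)^{n}
  \;\xrightarrow[\;n \to \infty\;]{}\; 1 - e^{-1} \approx 0.632 .
\]
```

The remaining share of roughly 36.8% of the training items is out-of-bag for that tree, and it is on these items that the OOB error is evaluated.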
As already mentioned, many features were computed for both case studies. A large proportion were spatial metrics, which are inherently highly correlated and redundant since they all depend on a small number of basic patch metrics for their computation, e.g., area, perimeter, patch, and neighboring patch type [54]. This kind of dataset could result in an underperforming and unnecessarily complex classification model. Consequently, we performed a feature selection (FS) procedure prior to the classification step with the aim of constructing smaller, more predictive, and parsimonious models [55]. The "Variable Selection Using Random Forest" (VSURF) algorithm, a popular automated FS method developed by [56], was used. VSURF proceeds in three steps: (i) removing useless features, (ii) finding the most predictive set of features, which may contain a great amount of redundancy, and (iii) retaining the accuracy while removing redundant features through a stepwise search.
Feature selection and classification were performed using the R software, version 3.5.0 [57]. The R code has been made available in R markdown format [58] on a dedicated repository (see Appendix B).

Extraction of Street Block Geometries
Our processing chain was used to create the street block geometries using a large amount of input data thanks to the capabilities of PostGIS. To give an order of magnitude, in Ouagadougou, more than 47,000 blocks were extracted from a set of more than 180,000 segments. The proportion of sliver polygons present after this initial extraction was considerable: 32.6% and 31.5% for Ouagadougou and Dakar, respectively. Sliver polygons were removed to produce a final layer containing nearly 32,000 street block geometries for Ouagadougou and 23,000 for Dakar. In Ouagadougou, an existing ancillary layer produced in a previous study [35], whereby the city had been delineated into local morphological zones, was used.
Figure 5 illustrates the results from the main steps of the processing chain. The initial stack of linear elements coming from OSM and ancillary data is quite chaotic (see Figure 5a). Snapping all nodes (here, with a snapping threshold of 7 m) enables efficient cleaning of the initial errors, but some sliver polygons remain (see Figure 5b). The final geometries after the removal of sliver polygons are presented in Figure 5c.

Automated Feature Selection
Feature selection was performed on the initial set of features computed and resulted in a substantial reduction of 81.9% (from an initial set of 116 features to 21 remaining features) and 86.6% (from 97 initial features to 13 remaining) for Ouagadougou and Dakar, respectively. The list of selected features is presented in Table 3. Globally, spatial metrics relative to almost all land-cover [...] between the residential classes in Ouagadougou, they were of a larger magnitude in Dakar. In both cases, most of the confusion occurred between the "Planned residential" and "Nonresidential built-up" classes. Moreover, misclassifications appeared between "Bare soils" and "Low vegetation", as was expected, since many nonbuilt street blocks present a mix of vegetated and bare soil elements.
The analysis of the RF feature importance reveals that, for both case studies, the most important features are those related to the built environment (see Tables 6 and 7). They are in the top five features in Ouagadougou and in the top four in Dakar (assuming shadows are a proxy of the built-up patterns). For the built-up classes, height is an important element, as witnessed by the selection of proportions of high and low buildings. It is interesting to notice the importance of shadow patch density as a top feature in Dakar for "Planned residential", which is not the case in Ouagadougou. This could be explained by the fact that residential buildings are more often multistory in Dakar than in Ouagadougou. Thereby, this shadow-related feature could be considered as a proxy of the presence of highly elevated built-up structures. Unsurprisingly, the vegetation index (NDVI) is the best feature for the vegetated land-use classes. Bare soils also present a feature related to the built land-cover classes. We assume it should be an inverse relation, i.e., characterizing the blocks as having no presence of built-up.

Introduction of Uncertainty and Thematic Improvement of Final Products
Errors and uncertainty are inherent in any classification problem. Even if the classifier provides a class label for each item, predictions could be affected by a high level of uncertainty. RF natively provides the class probability for each street block [60]. We use this essential information to reclassify street blocks for which the prediction was highly uncertain. We compute the difference between the probabilities of the most probable and the second most probable class. Street blocks having a difference of less than 5 percentage points were then relabeled as "Uncertain" (see Figure 6c). This concerns 3.7% and 4.1% of the available street blocks for Ouagadougou and Dakar, respectively. For the convenience of the users, all class probabilities are included in the product releases.
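The relabeling rule can be expressed in a few lines. The sketch below assumes class probabilities stored as fractions between 0 and 1 in a dictionary per block; the function name and the example probabilities are hypothetical, while the 5-percentage-point margin comes from the text.

```python
def label_with_uncertainty(probabilities, min_margin=0.05):
    """Return the most probable class, or "Uncertain" when the gap between
    the two most probable classes is below min_margin.

    probabilities: dict mapping class name -> probability (0 to 1).
    """
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    (best_class, p_first), (_, p_second) = ranked[0], ranked[1]
    return "Uncertain" if (p_first - p_second) < min_margin else best_class


# Hypothetical RF outputs for two street blocks
ambiguous = {"Planned residential": 0.46,
             "Nonresidential built-up": 0.43,
             "Bare soils": 0.11}
clear_cut = {"Planned residential": 0.70,
             "Nonresidential built-up": 0.20,
             "Bare soils": 0.10}

print(label_with_uncertainty(ambiguous))
print(label_with_uncertainty(clear_cut))
```

Using the margin between the two leading classes, rather than the top probability alone, flags blocks where two classes compete closely even when both probabilities are fairly high.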
Residential built-up density is usually a good indicator of population densities. For that reason, we use the information about blocks' percentage of built-up patches to discriminate between different densities of built-up (Figure 6d). In both case studies, street blocks classified as "Planned residential" were relabeled as "Planned residential (low density)" if their built-up percentage was lower than 30% and 40% for Ouagadougou and Dakar, respectively. In Ouagadougou, the same approach was used to distinguish two classes of built-up density for the "Unplanned residential" class, with a threshold fixed at 15% of built-up, and to enable a split between peri-urban settlements and slum-like patterns. The choice of these thresholds was made through trial and error, relying on visual assessment of the land-cover map. The final land-use maps are visible in Figures 7 and 8. For the convenience of the reader, they can be visualized online along with the land-cover information (https://tgrippa.github.io/Landuse_from_landcover_webmap/).
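The threshold-based refinement can be sketched as a simple lookup. The thresholds are those quoted in the text; the function name is an assumption, and the label "Unplanned residential (low density)" is a hypothetical name, since the text does not give the name of the low-density unplanned class.

```python
# Built-up percentage thresholds quoted in the text
PLANNED_THRESHOLDS = {"Ouagadougou": 30.0, "Dakar": 40.0}
UNPLANNED_THRESHOLD_OUAGADOUGOU = 15.0


def refine_residential(label, built_up_pct, city):
    """Refine residential labels according to the block's built-up percentage."""
    if label == "Planned residential" and built_up_pct < PLANNED_THRESHOLDS[city]:
        return "Planned residential (low density)"
    if (city == "Ouagadougou" and label == "Unplanned residential"
            and built_up_pct < UNPLANNED_THRESHOLD_OUAGADOUGOU):
        # Hypothetical label name, not given in the text
        return "Unplanned residential (low density)"
    return label


print(refine_residential("Planned residential", 35.0, "Dakar"))
print(refine_residential("Planned residential", 35.0, "Ouagadougou"))
print(refine_residential("Unplanned residential", 10.0, "Ouagadougou"))
```

The same 35% built-up block is relabeled in Dakar but not in Ouagadougou, which shows how the city-specific thresholds change the outcome.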

Discussion
The solution proposed in this paper proved to be operational for processing very large areas, as our case study datasets cover more than 1000 km² in total, with a spatial resolution of 0.5 m. However, some limitations can be highlighted.
The first limitation relates to the completeness of OSM data. A quantitative evaluation of the geometric and semantic quality of the street blocks is out of the scope of this article, but some aspects can be discussed. A qualitative visual assessment shows that the consistency is more evident in the core urban areas, where the street network is denser and OSM data generally more complete. From several tests that were carried out, we concluded that the resulting street blocks may not be as detailed as expected, e.g., presence of polygons that are too large and encompass multiple distinct land uses. This is mostly related to the fact that the OSM database is not complete enough for certain locations, especially in peri-urban areas. To solve this issue, time was dedicated to the digitization of additional map features in OSM (e.g., roads, tracks, natural elements, etc.) at the periphery of our AOIs (peri-urban areas) to meet our requirements. This also contributed to the completion of the OSM database, which is a positive outcome. Since the OSM data completeness is increasing, it is likely that such issues will become less prevalent in the future. However, the performance of the proposed framework is likely to decrease as the landscape becomes more rural. Further research could look for other strategies for the automated extraction of meaningful landscape units for mapping the land use in rural and peri-urban areas.
The second limitation is linked to the spatial metrics. The selection of relevant spatial metrics for the phenomenon under investigation and the interpretation of their behaviors can be a challenging task in itself [40]. Moreover, it is likely that some metrics that perform well in one case study are less discriminant for another. This was the case in our results and can be attributed to differences in urban landscapes. As a solution, computing many metrics and feeding them into a feature selection procedure allows for the automated selection of a parsimonious set of features.
Thirdly, the labeling procedure for creating the training and validation sets may clearly be a bottleneck if automation is mandatory. Further research could explore the possibility of taking advantage of the OSM database for the automatic selection and labeling of these samples, as OSM contains some information on land use and points of interest (POIs).
Next, future studies aiming at implementing the same kind of workflow that we present here should consider the possibility of improving efficiency by computing the metrics for the street blocks belonging to the training samples only. Since they are sufficient for performing the feature selection step, this would save processing time and storage space [61]. Only the most discriminant features could then be computed for the whole AOI. This approach would allow for computing a very large number of features without creating computational and storage issues.
Finally, as previously mentioned (see Section 2.4.1), the "patch mosaic" paradigm hides some aspects of the urban structures, which is likely to limit the ability of spatial metrics to adequately characterize urban land use. Possible future work should investigate a broader workflow that would include explicit information derived from the OBIA segmentation process. For example, information on individual segments could be computed, e.g., area, compactness, and fractal dimension, and then summarized either at the class or at the landscape level.
Prediction errors and the corollary uncertainty of the produced maps are important points that any classification framework should consider. In this study, we used the class-probability output from the RF model to identify street blocks for which the prediction was affected by an important level of uncertainty. In addition to the land-use maps where labels correspond to the most probable class, we also provide the class-probability values for each street block. This information is useful especially when classification products are used as input data to other classification or modeling tasks since it is well known that errors propagate to the derived products. In the future, we plan to carry out sensitivity analysis to assess how errors and uncertainty of land-cover maps affect the derived land use and the models of spatial distribution of population densities.

Conclusions
While up-to-date and reliable geographical information on urban areas is sorely lacking in developing countries, new sources of information such as VGI can help overcome existing challenges. This research presented a mostly automated workflow for mapping urban land use at the street block level, with a focus on residential use. The proposed framework proved its ability to efficiently handle large datasets, since the two case studies, Ouagadougou and Dakar, cover more than 1000 km² in total. The classifications achieved accuracies of 84% and 79%, respectively. All of the computer code developed and the resulting datasets have been released as open access for any interested users.
Author Contributions: T.G. is the main author of the study, wrote the manuscript, analyzed the results, and developed the code for street block extraction and the final version of the code for spatial metrics computation. S.G. extracted the street blocks and performed the classification using R software. S.Z. initiated an exploratory analysis in the early stages of this study and assessed the ability of GRASS GIS to meet the needs. M.L. provided support in programming. Y.F. developed the online map for visualization of the results. T.G., S.G., N.M., P.B., and E.D. contributed to the visual interpretation for creation of training/validation sets. S.G., M.L., S.V., N.M., and E.W. revised the manuscript and helped to improve it.
The results of the land-use classification and the extracted street blocks:
• Ouagadougou land-use map [64] is referenced and available at https://doi.org/10.5281/zenodo.1291384. The version produced in this research is referred to as v1.0 (10.5281/zenodo.1291385).
The R code used for the feature selection and RF classification steps, together with the dataset of features used and the training/test sets, is available in the following GitHub repository: https://github.com/ANAGEO/R_stuff/tree/master/VSURF_FeatureSelection_RF_Optimization.
The semiautomated processing chain for extraction of street blocks from OSM using PostGIS is available in the following GitHub repository: https://github.com/ANAGEO/OSM_Streetblocks_extraction.
The semiautomated processing chain for computation of spatial metrics using GRASS GIS is available in the following GitHub repository: https://github.com/tgrippa/Street_blocks_features_computation.
The Python code used for computing uncertainty from the probabilistic output of RF is available at: https://github.com/ANAGEO/RFprob_to_uncertainty.

Figure 3. Flowchart of the semiautomated processing chain for the extraction of street blocks from OpenStreetMap and ancillary vector data.

Figure 4. (a) Contrast between planned and unplanned residential neighborhoods in Ouagadougou. (b) Contrast between planned residential areas and deprived (poorer) neighborhoods in Dakar.

Figure 5. Extraction of street blocks from OSM data and ancillary vector data. (a) Lines and polygons coming from OSM and ancillary vector layer; (b) Street blocks that contain several undesired polygons (sliver polygons); (c) Final street blocks extracted.

Figure 6. Addition of uncertainty and built-up density to refine the thematic precision of the maps for Ouagadougou. (a) Land-cover map for comparison purposes; (b) Most probable class from the random forest classifier; (c) Introducing the "Uncertain" class; (d) Thematic refinement of residential classes according to the computed proportion of buildings. Land-cover classes (a) HB: High buildings; LB: Low buildings; SW: Swimming pools; AS: Asphalt surfaces; BS: Bare soils; TR: Trees; LV: Low vegetation; WB: Water bodies; SH: Shadows. Land-use classes (b-d) VEG: Vegetation; BARE: Bare soils; ACS: Nonresidential built-up (administrative, commercial, services, etc.); PLAN: Planned residential; PLAN LD: Planned residential (low density); UNPLAN: Unplanned residential; UNPLAN LD: Unplanned residential (low density); UNCERT: Uncertain prediction.

Table 1. Legend of the land-cover maps used as input to compute spatial metrics.

Table 2. Legend scheme of land use for Ouagadougou and Dakar and size of training and test sets (number of street block polygons).

Table 6. Ouagadougou: per-class feature importance from the random forest classifier (mean decrease in accuracy). Only the 10 most important features are presented. "SD" refers to standard deviation. The color ramp indicates the feature importance for each land-use class, with darker green corresponding to the top feature for each class (number in bold). ACS: Nonresidential built-up (administrative, commercial, services, etc.); BARE: Bare soils; PLAN: Planned residential; UNPLAN: Unplanned residential; VEG: Vegetation.

Table 7. Dakar: per-class feature importance from the random forest classifier (mean decrease in accuracy). Only the 10 most important features are presented. The color ramp indicates the feature importance for each land-use class, with darker green corresponding to the top feature for each class (number in bold). ACS: Nonresidential built-up (administrative, commercial, services, etc.); AGRI: Agricultural vegetation; BARE: Bare soils; DEPR: Deprived residential; PLAN: Planned residential; VEG: Natural vegetation.