An Open-Source Semi-Automated Processing Chain for Urban Object-Based Classiﬁcation

: This study presents the development of a semi-automated processing chain for urban object-based land-cover and land-use classiﬁcation. The processing chain is implemented in Python and relies on existing open-source software GRASS GIS and R. The complete tool chain is available in open access and is adaptable to speciﬁc user needs. For automation purposes, we developed two GRASS GIS add-ons enabling users (1) to optimize segmentation parameters in an unsupervised manner and (2) to classify remote sensing data using several individual machine learning classiﬁers or their prediction combinations through voting-schemes. We tested the performance of the processing chain using sub-metric multispectral and height data on two very different urban environments: Ouagadougou, Burkina Faso in sub-Saharan Africa and Li è ge, Belgium in Western Europe. Using a hierarchical classiﬁcation scheme, the overall accuracy reached 93% at the ﬁrst level (5 classes) and about 80% at the second level (11 and 9 classes, respectively).


Introduction
Land-use/land-cover (LULC) information extraction is one of the main use cases of remote sensing imagery.The advent of sub-meter resolution data brought about the revolution of methods from pixel-based to object-based image analysis (OBIA) involving image segmentation.The latter provides many new opportunities and highly increases the quality of the output, but there remains a number of challenges to address.
First of all, segmentation parameters are often selected after a tedious and time-consuming trial-and-error refinement [1,2].This method consists of a manual step-by-step segmentation parameters adjustment, relying on subjective visual human interpretation.Despite such efforts, the validity of the selected parameters is usually restricted to the specific scene under study, or even to specific areas within this scene, and they have to be adapted for each dataset.Unsupervised optimization methods meet the requirements for automation in the OBIA process, as they can be used to automatically adjust the segmentation parameters [1].
Second, during the classification step, many authors use rule-based approaches, which can be efficient on a specific dataset (e.g., [3,4]).However, their transferability remains an issue [5,6] as they also generally rely on manual intervention by the authors, with many choices guided by scene specificities.As an alternative, machine-learning classifiers, e.g., random forest or support vector machines (see [7,8] for a review of applications in remote sensing), have proven their efficiency for remote sensing data classification.While identification of the best performing classifier cannot rely on a priori knowledge, the combination of the results of multiple classifiers through an ensemble or voting schemes is a solution towards the development of more automated classification processes, as it "[...] makes the performance of the system more robust against the difficulties that each individual classifier may have on each particular data set".[9] (p.705).
Third, much of the work presented on OBIA tool chains is black box.First, the specific decisions of authors concerning parameter settings in the manual processes described above are based on their subjective evaluation, which is not always easy to reproduce.Moreover, even if their procedures are well documented, algorithms implemented in proprietary software cannot be properly reviewed as their code is distributed as closed source.This concerns the core software and also, in some cases, extensions of that software (e.g., the 'Estimation of Scale Parameter' (ESP) tool published in [10]).Furthermore, only those who have access to the software can attempt the replication of the results.In times when the reproducibility of research is high on the discussion agenda [11], the use of free and open-source solutions, including access to the code developed by researchers in their work, becomes paramount.
Linked to the previous point, the question of access to the necessary tools is of great importance, especially for many researchers in poorer countries where the lack of resources reduces their options [12], and especially for research using remote sensing [13].Again, free and open-source solutions provide an answer to this issue by creating common-pool resources that all researchers can use, but also contribute to.Licensing costs can also be an obstacle to the upscaling of processes, especially in times of big data with ever-increasing spatial, spectral, and temporal resolutions [13].Free and open-source software can help researchers surmount this challenge by letting them run their programs on as many different cores or machines as necessary without having to worry about software costs.
In this paper, we present a complete semi-automated processing chain for urban LULC mapping from earth observation data, which responds at least partly to the above issues.This chain was initially presented at the GEOBIA 2016 conference [14].Freely available to any potential user, it should be seen as a framework that can be reused, modified, or enhanced for further studies.The chain was developed in a completely free and open-source environment, using GRASS GIS (Geographical Resources Analysis Support System) [15] and R [16], and was immediately reinjected into the wider open-source community.It contains tools for unsupervised segmentation parameter optimization, statistical characterization of objects, and machine-learning techniques combined through a majority-voting scheme.Care was taken to make the use of this processing chain accessible even to novice programmers.The proposed framework was tested with similar datasets on two very different urban environments to assess its transportability, i.e., the ability to achieve accurate classification when applying the same generic framework to different scenes with similar datasets [17].

Methods and Tools
The processing chain mainly relies on the open-source software GRASS GIS, that has been in continuous development since the 1980s and is now one of the core components of the Open Source Geospatial software stack [18].This multipurpose Geographical Information System is made of hundreds of small programs [19], called 'modules' or 'add-ons', enabling users to carry out a large variety of geospatial processes [18].Thanks to its continuous review mechanism and to its active community that has strong links with academia, GRASS GIS is increasingly being used by researchers [20][21][22][23][24][25].Since 2012, GRASS GIS has had major advances in object-based image analysis (OBIA).
The proposed chain is made of the core Python code linking GRASS GIS functions thanks to the GRASS Python scripting library.It is implemented in a 'Jupyter notebook' that enables researchers to easily share the computer code that they developed for their studies and that often remains unpublished [26].This programming environment allows users to mix both explanatory text sections with the related computer code that can be executed in the same document (see Figure 1).Care was taken to clearly document the code and to refer to the official help and/or scientific references.
The Jupyter notebook is subdivided into several parts corresponding to the different processing steps (see Figure 1) which are summarized in the flowchart presented in Figure 2.
The GRASS GIS add-ons used in the processing chain are briefly presented below.For a more detailed description of those add-ons, interested readers may refer to the presentation made during the FOSS4G 2016 conference [27].

Segmentation and Unsupervised Segmentation Parameter Optimization (USPO) Tools
The segmentation was performed using the i.segment module of GRASS GIS [28].This module implements image segmentation with a region-growing algorithm or an experimental mean-shift algorithm which was added recently.The region-growing algorithm, which is used in this study, requires a standardized 'threshold' parameter below which regions are merged, and a 'minsize' parameter defining the minimum size of regions.As with most GRASS GIS modules, the i.segment module is designed to handle very large datasets while keeping a low memory footprint.As an example of the orders of magnitudes, we encountered an issue when exceeding 2 billion objects, and this issue was solved quite quickly by the responsive GRASS Development Team.Most of the elements in the processing chain offer the option of using parallel computing to accelerate the analyses.Scaling is thus possible across all available cores, within the limits of available memory and input-output restrictions.
Care was taken to clearly document the code and to refer to the official help and/or scientific references.The Jupyter notebook is subdivided into several parts corresponding to the different processing steps (see Figure 1) which are summarized in the flowchart presented in Figure 2.
The GRASS GIS add-ons used in the processing chain are briefly presented below.For a more detailed description of those add-ons, interested readers may refer to the presentation made during the FOSS4G 2016 conference [27].

Segmentation and Unsupervised Segmentation Parameter Optimization (USPO) Tools
The segmentation was performed using the i.segment module of GRASS GIS [28].This module implements image segmentation with a region-growing algorithm or an experimental mean-shift algorithm which was added recently.The region-growing algorithm, which is used in this study, requires a standardized 'threshold' parameter below which regions are merged, and a 'minsize' parameter defining the minimum size of regions.As with most GRASS GIS modules, the i.segment module is designed to handle very large datasets while keeping a low memory footprint.As an example of the orders of magnitudes, we encountered an issue when exceeding 2 billion objects, and this issue was solved quite quickly by the responsive GRASS Development Team.Most of the elements in the processing chain offer the option of using parallel computing to accelerate the analyses.Scaling is thus possible across all available cores, within the limits of available memory and input-output restrictions.The choice of segmentation parameters is an important step in OBIA.Indeed, the ultimate goal of segmentation is to cluster individual pixels into meaningful objects, i.e., objects that correspond as much as possible to the geographical objects of interest in the scene.Moreover, the impact of segmentation quality on the accuracy of the classification seems obvious, even though a recent study [29] argues that this link is not so straightforward.
Usually, the selection of segmentation parameters is carried out using a 'trial-and-error' approach that relies on the visual assessment of several naïve segmentation results, and gradual adjustment of the segmentation parameters.This method presents the disadvantages of being subjective and requiring a tedious and time-consuming effort.
When objectivity is required in the evaluation of the segmentation results, several empirical methods can be used.Among them, a distinction can be made between the supervised (empirical discrepancy methods) and the unsupervised approaches (empirical goodness methods), depending on the requirement of a reference object delineation [1,30].Both supervised and unsupervised methods allow the comparison of different segmentation algorithms or of different parameters used in a single algorithm (segmentation parameter optimization).The choice of segmentation parameters is an important step in OBIA.Indeed, the ultimate goal of segmentation is to cluster individual pixels into meaningful objects, i.e., objects that correspond as much as possible to the geographical objects of interest in the scene.Moreover, the impact of segmentation quality on the accuracy of the classification seems obvious, even though a recent study [29] argues that this link is not so straightforward.
Usually, the selection of segmentation parameters is carried out using a 'trial-and-error' approach that relies on the visual assessment of several naïve segmentation results, and gradual adjustment of the segmentation parameters.This method presents the disadvantages of being subjective and requiring a tedious and time-consuming effort.
When objectivity is required in the evaluation of the segmentation results, several empirical methods can be used.Among them, a distinction can be made between the supervised (empirical discrepancy methods) and the unsupervised approaches (empirical goodness methods), depending on the requirement of a reference object delineation [1,30].Both supervised and unsupervised methods allow the comparison of different segmentation algorithms or of different parameters used in a single algorithm (segmentation parameter optimization).
Supervised evaluation methods assess the divergence between a segmented image and a reference segmentation layer using 'discrepancy measures'.Usually, the reference layer is created by delineating objects manually, thus requiring a time-consuming and highly subjective task.
In contrast, unsupervised evaluation methods assess the quality of a segmented image without the need of a reference or prior knowledge.This is the major advantage of these methods that can be used for automated segmentation parameter optimization [31].Moreover, a recent study shows that they can achieve similar classification accuracy [32].The evaluation relies on 'goodness measures' computed directly on the segmented image that represent the characteristics of a good segmentation.The uniformity of single objects (intra-segment homogeneity) and a significant difference between adjacent objects (inter-segment heterogeneity), firstly presented in [33] as desired characteristics of created objects, are now widely used in unsupervised evaluation methods.Several unsupervised approaches have been proposed in the literature, with different goodness measures and methods for combining them into a synthetic metric (see [1] for a review).
As we looked for automation, we elaborated a new GRASS GIS add-on for unsupervised segmentation parameter optimization (USPO) named i.segment.uspo[34].Its working principle is illustrated on Figure 3.This tool is an implementation of the methods proposed by [31,35].It relies on optimization functions combining measures of intra-object variance weighted by object size [32] (WV) as an intra-segment homogeneity quality measure, and spatial autocorrelation (SA) as an inter-segment heterogeneity quality measure [35].For the latter, the user can choose between Moran's I [36] or Geary's C [37].As the measure should be comparable for different segmentation results, both intra-segment homogeneity and inter-segment heterogeneity measures are normalized using the following function [35]: where F(x) is the normalized value of either WV or SA, X is the WV (or SA) value of the current segmentation result, and X max and X min are the maximum and minimum values of WV (or SA) for the whole stack of segmentation results to be evaluated.A high value for normalized WV (WVnorm) indicates higher undersegmentation, while a high value for normalized SA (SAnorm) highlights a higher oversegmentation.
The GRASS GIS add-on i.segment.uspoenables the combination of these WV and SA measures using two different optimization functions: a simple sum of the normalized criteria values as proposed by [35] or the F-function proposed by [31] that permits us to weight the two optimization criteria.The F-function is calculated as follows: where F is the 'overall goodness', ranging from 0 (poor quality) to 1 (high quality) [31], to be used as a synthetic measure of the quality of the segmentation and α is a parameter that can be modified to give more weight to WV or to SA.This overall goodness metric was designed in order to perform unsupervised segmentation parameter optimization for multi-scale OBIA (MS-OBIA) [31] (i.e., a process where different levels of segmentation are used together in the classification).In the semi-automated processing chain that was developed, the classification is performed using a single segmentation level.However, the chain could very easily be modified to enable MS-OBIA.
As highlighted in [38], the ability of USPO approaches to produce a good segmentation for specific features of interest in the scene is not straightforward, especially if those features are small-sized.Regarding this issue, we clearly recommend a visual check of the segmentation results to ensure that they are consistent with the objects of interest in the scene, as illustrated in the flowchart in Figure 2. If this is not the case, the α parameter in the Johnsons' optimization function can be adapted to give more importance either to intra-segment homogeneity (set the α parameter higher than 1 to avoid residual undersegmentation) or to inter-segment heterogeneity (set the α parameter lower than 1 to avoid residual oversegmentation) [31].More generally, it is clear that the 'perfect' segmentation does not exist [1,39,40], even if optimization methods are used.In their conclusion, Räsänen et al. argue that "[...] different segmentation evaluation methods should be used with care [...].When segmentation evaluation is rigorously used, however, it can assist in finding a more optimal segmentation".[29] (p.8623).
Based on a range of parameter values provided by the user, the i.segment.uspotool creates a set of segmentation results that are then assessed using the optimization function (see Figure 3).We suggest setting the range of segmentation parameter values to be tested by identifying values resulting in clearly under-segmented and over-segmented results, and using them as extremes.In order to reduce computation time during the optimization process, the tool provides the possibility to optimize the segmentation parameters on several spatial subsets of the scene (i.e., several zones limited in terms of area).
Care is recommended during the selection of those spatial subsets to ensure that they represent the diversity of the landscape that can be found in the whole scene.Detailed results are available (WV and AS measures and optimization scores for each segmentation parameter combination and each spatial subset), enabling the user to make an informed choice.Provided that there are no extreme outliers among the distribution of segmentation parameters from the different spatial subsets, the choice amongst the results can be completely automated by, for example, selecting the lowest value of the threshold parameter (as illustrated in Figure 2).Even though this approach could result in oversegmentation in some parts of the scene, some studies [39,41] argue that oversegmentation is preferable to undersegmentation, as the former can be corrected during classification, contrary to the latter.Furthermore, some recent studies [32,42] highlight that oversegmentation, as long as it remains at an admissible level, could be a minor issue in regard to the final classification result.Insofar that the different spatial subsets were well chosen to ensure that they represent the diversity of landscapes in the whole scene, the presence of extreme outliers among the optimized segmentation parameter is an indication that segmentation using a single parameter for the whole scene is not recommended.In this case, the whole scene could be subdivided into several more homogeneous areas according to some specific criteria.These areas could then be used as tiles in the segmentation workflow to perform local optimizations of the segmentation parameters [43].Based on a range of parameter values provided by the user, the i.segment.uspotool creates a set of segmentation results that are then assessed using the optimization function (see Figure 3).We suggest setting the range of segmentation parameter values to be tested by identifying values resulting in clearly under-segmented and over-segmented results, and using them as extremes.In order to reduce computation time during the optimization process, the tool provides the possibility to optimize the segmentation parameters on several spatial subsets of the scene (i.e., several zones limited in terms of area).
Care is recommended during the selection of those spatial subsets to ensure that they represent the diversity of the landscape that can be found in the whole scene.Detailed results are available (WV and AS measures and optimization scores for each segmentation parameter combination and each spatial subset), enabling the user to make an informed choice.Provided that there are no extreme outliers among the distribution of segmentation parameters from the different spatial subsets, the choice amongst the results can be completely automated by, for example, selecting the lowest value of the threshold parameter (as illustrated in Figure 2).Even though this approach could result in oversegmentation in some parts of the scene, some studies [39,41] argue that oversegmentation is preferable to undersegmentation, as the former can be corrected during classification, contrary to the latter.Furthermore, some recent studies [32,42] highlight that oversegmentation, as long as it remains at an admissible level, could be a minor issue in regard to the final classification result.Insofar that the different spatial subsets were well chosen to ensure that they represent the diversity of landscapes in the whole scene, the presence of extreme outliers among the optimized segmentation parameter is an indication that segmentation using a single parameter for the whole scene is not recommended.In this case, the whole scene could be subdivided into several more homogeneous areas according to some specific criteria.These areas could then be used as tiles in the segmentation workflow to perform local optimizations of the segmentation parameters [43].The chain was designed to perform the segmentation process by dividing the scene using a vector layer provided by the user.This layer can consist of, e.g., arbitrary tiles or existing administrative boundaries.This implementation also allows users to manage very large datasets.

Object Statistics Computation
Object statistics were computed using the i.segment.statsGRASS GIS add-on [44] and were used as features in the classification process.This tool computes both the spectral statistics (e.g., min, max, median, stddev) and morphological statistics of objects (e.g., area, perimeter, compactness, fractal dimension).In order to speed up the calculation of the latter, another add-on, r.object.geometry[45], was developed.This add-on eliminates the need for vectorizing segments when computing morphological statistics, resulting in a significant gain in time.

Classification by the Combination of Multiple Machine Learning Classifiers
The classification stage of the processing chain uses the v.class.mlRGRASS GIS add-on [46].It relies on the utilization of the "Caret" library of the R software [47], and enables the classification of data using Support Vector Machine (currently only with a radial kernel) (SVMradial), Random Forest (RF), Recursive partitioning (Rpart), and k-Nearest Neighbors (kNN) classifiers.This add-on automatically tunes classifiers' parameters using repeated cross-validation with, by default, 10 iterations of 5-fold cross-validation on the training data set.Predictions of individual classifiers are then combined using several types of majority vote.
Four voting systems are provided: "Simple Majority Vote" (SMV), "Simple Weighted Vote" (SWV), "Best Worst Weighted Vote" (BWWV), and "Quadratic Best Worst Weighted Vote" (QBWWV).SMV simply consists of retaining the most frequent prediction.In the other votes, the predictions of individual classifiers are weighted.In SWV, the weight used is strictly the accuracy of individual classifiers estimated through cross-validation.In BWWV, the worst classifier is assigned a zero weight and is thereby not taken into account, and the best classifier is assigned a unit weight.The remaining classifiers are weighted linearly between 0 and 1.The last vote, QBWWV, is designed similarly to the former but the remaining classifiers are weighted using a squared function, amplifying the importance of more accurate classifiers.Interested readers can refer to [9] for the votes presented here and to [48] for more advanced methods used in remote sensing field.
A noticeable advantage of GRASS GIS is that it can be connected directly to R [16,49], allowing the exploitation of several advanced statistics methods (e.g., deep learning methods) implemented in this open-source software.

Study Areas and Data
In order to evaluate the transportability of the proposed processing chain, we applied it to two very different urban environments: Ouagadougou (Burkina Faso, in Sub-Saharan Africa) and Liège (Belgium, in Western Europe).More broadly, this work is linked with two research projects dealing with the production (Modelling and forecasting African Urban Population Patterns for vulnerability and health assessments project (MAUPP, http://maupp.ulb.ac.be/), focusing on African Sub-Saharan cities) and the update (SmartPop project, focusing on the Walloon region in Belgium, http://www.issep.be/smartpop/) of LULC maps.These maps will be used later as inputs in census population data disaggregation models.
The processing chain was first developed on Ouagadougou, the capital of Burkina Faso in Western Africa.Covering more than 615 km 2 , this city has been facing intensive urban sprawl during the last few decades similar to most sub-Saharan African cities and is characterized by very different urban patterns, such as planned versus unplanned residential areas, among others.Then, the processing chain was applied to the Liège area (261 km 2 ), a Western European city located in Belgium which shows strong land artificialization (more than 55% of the territory).Urban morphologies are more diversified (from isolated houses to 10+ storey buildings), but urban sprawl is limited and controlled in comparison with Africa.
The datasets consist of multi-spectral and height data.For Ouagadougou, a pan sharpened stereo WorldView-3 imagery (Visible and Near-InfraRed bands (VNIR), spatial resolution of 0.5 m) acquired during the wet season (October 2015) and a normalized digital surface model (nDSM) (spatial resolution of 0.5 m) produced by stereophotogrammetry from WorldView-3 stereo-pairs were used.For Liège, the data consisted of leaf-on VNIR aerial orthophotos with a spatial resolution of 0.25 m acquired in May 2012 and a leaf-off nDSM extracted from Light Detection And Ranging data (LiDAR) (with a point density between 1 and 3 points per square meter) that was acquired in the winter of 2013-14.
As our processing chain is under development, we focused the classification effort on a 25 km 2 subset for both cities (see Figure 4), representative of the diversity of landscapes and urban forms.As our processing chain is under development, we focused the classification effort on a 25 km 2 subset for both cities (see Figure 4), representative of the diversity of landscapes and urban forms.

Legend/Classification Scheme
The classification scheme is organized in two hierarchical levels (see Table 1).The first level contains only land-cover (LC) classes, while the second level is a LULC mix of classes.At both levels,

Legend/Classification Scheme
The classification scheme is organized in two hierarchical levels (see Table 1).The first level contains only land-cover (LC) classes, while the second level is a LULC mix of classes.At both levels, an extra class is dedicated to shadows; their post-processing is out of the scope of this article.The classification was made based on the second-level classes, which were aggregated to match the first-level classes.

Sampling Scheme
Sampling was conducted outside the processing chain, by generating random points and labelling them by hand, through visual image photo-interpretation.Although existing geodatabases were used for stratification, visual interpretation was needed to bypass thematic or spatial accuracy issues.In order to ensure a clear spatial independence, the training set was generated for the whole area excluding the 25 km 2 subset where the classification was produced.An independent test set was generated inside this subset for performance evaluation purposes (see Figure 4).This procedure avoids potential spatial autocorrelation between the training and test sets.
For Ouagadougou, the OpenStreetMap (OSM) dataset was used as far as possible according to the availability.These data were used only for stratification purposes and only for some specific classes, i.e., for second-level classes of 'buildings', 'asphalt surfaces', and 'water bodies'.When OSM datasets consisted of lines, as it is the case for asphalt roads and watercourses, buffers were created.Manual sampling was required for 'swimming pools' and 'shadow' classes.Intensive visual interpretation was needed for labelling each sampled point individually and to bypass mislabelled (cases where OSM attributes were false) and spatial inaccurate issues coming from the OSM data.
For Liège, existing official geodatabases from the national administration, i.e., 'TOP10V' (Institut Géographique National (IGN), 2010), and from the regional administration, i.e., 'Projet Informatique de Cartographie Continue' (PICC) (Service Public de Wallonie (SPW), 2007), were used for the stratification of the majority of second-level classes.Manual sampling was needed for the class 'shadow'.Given the production date of the geodatabases used, a visual validation of the samples was needed to match the 2012/2013 land-cover status.In total, 1352 training points and 369 test points were created for Ouagadougou and 549 training points and 388 test points for Liège.The smaller size of the training set for Liège is explained by the reduced number of classes, their higher spectral consistency, and the intensive use of reference geodatabases.The class-distribution details are presented in Table 1.
Training and test points were used to automatically select intersecting segments and create the training and test sets.Although there is risk that some imperfect segments are used, the advantage of this strategy is that the same labelled set of points can be used with different segmentation results.

Segmentation
The segmentation and unsupervised segmentation parameter (USPO) steps were carried out using multispectral information.For Ouagadougou, NDVI was also used as an additional layer.The nDSM layer was not used for the segmentation because of its insufficient geometric precision.The "minsize" parameter was set in order to match a chosen minimum mapping unit.The latter was defined according to the geographical context based on the smallest house/shelter: 2 m 2 for Ouagadougou and 15 m 2 for Liège.The intervention of the operator in the USPO process was limited to identification of the range of "threshold" parameters to be tested (minimum, maximum, and intervals), by manually looking for the thresholds resulting in clearly over-segmented or under-segmented objects.The optimized threshold was then automatically determined via the i.segment.uspoadd-on.When giving the same weight to both intra-object homogeneity and inter-object heterogeneity measures (with the Johnson's α parameter set to 1), objects of interest like small houses or trees were undersegmented.To avoid this issue, Johnson's α parameter was then set to 1.25 for both Ouagadougou and Liège, in order to give more importance to intra-object homogeneity in the optimization function.

Classification Feature
For both case studies, the minimum, maximum, range, standard deviation, sum, and median statistics were computed for segments on the multispectral bands, NDVI and nDSM.These spectral statistics were completed with the morphological attributes of the objects (area, perimeter, and compactness).

Results
The classifications were performed at the second level of the legend scheme (see Table 1) using four individual machine learning classifiers that were combined using four voting systems.For each classification, the second-level classes were then aggregated to obtain the classes of the first level.The overall accuracy as well as Cohen's Kappa metric of individual classifiers and vote combinations are presented in Table 2.The ranking of individual classifiers' performance is the same for Ouagadougou and Liège, with Random Forest (RF) performing best (overall accuracy (OA) of 81% and 79%, respectively), followed by Support Vector Machine with radial kernel (SVMradial) (75% and 74% OA, respectively), then Recursive partitioning (Rpart) (72% and 74% OA, respectively), and finally K-Nearest Neighbors classifier (kNN) (both 50% OA).
In the proposed processing chain, the training set is created by selecting the objects that contain the manually labelled point (see Figure 2), without any visual check.This design could result in the presence of mis-segmented objects in the training set which could perturb the classifiers.This explains why RF outperformed SVM, as studies show that RF is very robust when trained with imperfect data [50], while SVM is very sensitive to the presence of noise in the training set [51].
The user's and producer's accuracy computed on second-level classes are provided for each classification in Tables A1 and A2.As assessing the performance with these measures can become very confusing, the F-score (harmonic mean of the user's and producer's accuracy) is used as a synthetic accuracy metric [52,53] in order to compare the classifiers' performance on a class basis.The 'buildings' class is of particular importance in the context of the MAUPP and SmartPop projects since their final objective is to disaggregate census population data using LULC maps in order to model the spatial distribution of population densities.Using RF, this class reached a high accuracy for both case studies, with an F-score of 0.93 for Ouagadougou and 0.91 for Liège (see Table 3).For Ouagadougou, RF impressively outperformed Rpart and SVM for the class 'buildings' (both reaching an F-score of 0.78).This is also true for asphalt surfaces, with an F-score of 0.83 for RF, 0.61 for Rpart, and 0.55 for SVM.Again, those observations can be explained by the robustness of RF when dealing with imperfect data.While satisfactory F-scores were obtained for specific classes such as 'buildings', 'asphalt surfaces', or 'water bodies', the accuracy is quite low for the other classes.It is also interesting to note that SVM and Rpart outperformed RF for specific classes in Ouagadougou ('dry vegetation' and 'mixed bare soil/vegetation', respectively).
The analysis of individual classifiers' confusion matrices (see Tables 4 and 5) revealed that, for both case studies, confusions occurred mainly between the different vegetation classes (46% and 61% of the whole confusions in Ouagadougou and Liège, respectively).In Ouagadougou, confusion also appeared between the bare soils classes (Brown/red bare soils; White/grey bare soils) and asphalt surfaces, shown in Table 4. Thanks to the hierarchical design of the legend, those confusions were greatly reduced when aggregating the second level classes to reach the first level of the legend.For this level, the overall accuracy is 93% for both case studies when considering the best performing voting scheme, i.e., the Simple Weighted Vote (SWV), shown in Table 2. Despite the combination of individual predictions, majority votes do not perform better than the best individual classifier for the classes with high confusion, i.e., vegetation and bare soil classes.Conversely, the accuracy of other classes was improved by the votes.For example, it can be observed in Table 3 that the classes of 'buildings' and 'bare soil' benefit from the votes in Liège.The improvement resulting from the vote is more noticeable for the 'other vegetation' class in Ouagadougou, where the best-performing individual classifier (RF) reached an F-score of 0.77 while weighted votes (BWWV, QBWWV) reached 0.81.These balanced results, with votes outperforming individual classifiers for some classes and underperforming for others are consistent with the previous research [9].The current method of attributing weight during the vote, using the overall accuracy of individual classifiers, is quite simple.Other methods might be implemented in order to take into account the performance of each classifier for specific classes (see [54] for a review of decision level fusion methods used in remote sensing).
Regarding segmentation, the use of an optimized segmentation parameter provided by i.segment.uspoachieved satisfactory results in our case studies.Even if the quantitative assessment of the segmentation's quality is not in the scope of this paper, a visual check of Figure 5 reveals that the images are segmented into meaningful objects.
Even though a rigorous comparison of the results using different datasets and training/test sets could not be performed, the results obtained by applying the proposed semi-automated processing chain on two very different urban contexts are similar and attest the transportability of the proposed framework.

Discussion and Perspectives
The entire semi-automated processing chain for urban OBIA classification, relying on open-source solutions, is available on a dedicated Github repository (https://github.com/tgrippa/Opensource_OBIA_processing_chain).As it is shared under the CC-BY 4.0 Creative Common Licence, anyone interested can use and/or adapt it to match different project-specific needs, by integrating additional steps (e.g., automated image pre-processing, computation of spectral or textural indices, automated sampling based on existing reference geodatasets).
Other frameworks relying on open-source solutions have already been proposed for the extraction of valuable geographical information from remote sensing data [53,[55][56][57][58].Some of them are distributed as a plug-in or a toolbox for existing geographical information systems, mainly for QGIS [55,56,58], and present the advantage of providing a comforting environment for users, making their use quite simple.
For most of them, pixel-based image classification is their core task.However, some include basic object-based capabilities.For example, in the context of a pixel-based supervised classification, the Semi Automated Classification Plug-in (SACP) [55] enables the user to save time when creating regions of interest (ROI), as these are created using a region growing segmentation, starting from pixel seeds defined by the user.Another example is the 'Twinned Object and Pixel-based Automated Classification chain' (TWOPAC) plug-in that enables performing classification using object-based derived features, but considers the segmentation as well as the object features computation as pre-processing steps to be performed outside of the tool [59].
After our investigations, we found only one existing open-source framework that allowed us to perform a complete object-based image analysis from segmentation to classification [57].It relies fully on Python libraries and is a highly modular solution for object-based image analysis, as it can be linked with a lot of existing functions and software.Unfortunately, it could be very difficult for researchers without strong programming skills to handle this kind of framework.
In this paper, we propose a contribution toward the development of a fully automated processing chain for object-based image analysis.The advantage of the framework we propose is that it relies on the open-source software GRASS GIS, which has had recent enhancements for object-based image analysis, enabling the development of more automated procedures.As GRASS GIS offers a graphical user interface (GUI), the different commands can be tested in the GUI during the script development stage, and can then be included in the processing chain thanks to the GRASS Python scripting library.Another key advantage of GRASS GIS is its users' and developers' community, which is usually helpful and responsive.
Even though enhancements are desirable, the semi-automated processing chain currently achieved interesting results, as shown through the two case studies presented in this paper.Perspectives on further developments are discussed below in this section.
The generation of training and validation samples still requires strong manual expert intervention.This remains a challenge to be overcome by future research looking for automation, especially when highly accurate reference geodatabases are not available.In that case, alternative data such as OpenStreetMap data could be used, but their quality is often inconsistent and they should therefore be assessed prior to any automated use.Issues of co-registration with VHR imagery might also arise when using such data sets.The practical implementation of an active learning strategy [60], which could help in building efficient training sets more rapidly, is currently under development in GRASS GIS.
In order to improve the segmentation and hence the resulting classification, we intend to implement a multi-scale segmentation strategy, which has proved its ability to enhance the classification performance in a previous study [31].Segmentation strategies using superpixels could also be investigated for further enhancements, since a new add-on [61] implementing SLIC superpixels method has been developed recently.This approach has provided interesting results in recent research [42].
Another improvement method concerns the features used as inputs in the classification process.Currently, only relatively simple object statistics are used.Band ratios and several textural indices will be added.They will be automatically computed and submitted to a feature selection procedure for those classifiers that do not include feature selection inherently.
During parameter tuning, spatial autocorrelation between the training and test sets created in cross-validation can lead to undetected overfitting and an overvaluation of the accuracy [62,63].To reduce this potential bias and obtain better bootstrap error estimates, we will investigate the possibility of implementing spatial cross-validation, i.e., a spatially-constrained partitioning of the training and test sets created in cross-validation [64].In addition, more classifiers will also be included.
Moreover, we will explore the possibility of implementing other strategies for the combination of multiple classifiers (see [48,54,65] for a review).The voting systems currently used to combine predictions are based on weights derived from the overall accuracy or kappa of individual classifiers, but in some cases, the non-best classifiers outperformed the best classifier's performance for specific classes (see Table 3).
Since the performance of different LULC mapping methods is currently being assessed in the SmartPop project, our open-source semi-automated approach is being compared to a rule-based approach, developed in a proprietary software.The latter integrates existing ancillary vector layers (buildings, roads, rails, and water bodies) in the segmentation.Constrained segmentation using ancillary vector layers in GRASS GIS will be investigated in future studies.
In the near future, the processing chain will be tested on different datasets and/or cities.For the MAUPP project, Synthetic Aperture Radar (SAR) data will be added as an input in order to improve the accuracy and the chain will be applied to Ouagadougou (Burkina Faso), Dakar, and Saint-Louis (Senegal).For the SmartPop project, Pléiades imagery will be used instead of orthophotos in order to assess the comparative advantage of each dataset.Thereafter, the efficiency of the processing chain will be tested for the automated processing of a very large area (i.e., the Walloon Region in Belgium), taking advantage of the parallel computing options in the different modules.

Conclusions
In times when the reproducibility of research and the sharing of existing solutions is high on the discussion agenda, the development of free and open-source solutions becomes paramount.In this paper, a semi-automated processing chain for urban object-based classification is proposed as a contribution towards the development of a transparent and open-source fully automated processing chain for urban land-use/land-cover mapping from earth observation data.This processing chain, relying on existing open-source geospatial software, is very adaptable and transportable to similar datasets.It proved its ability of being quickly customizable in order to match the requirements of different projects, with very different urban morphologies and different datasets.Freely available for anyone interested, it should be seen as a framework to be reused and enhanced for further studies.The results achieved on our case studies are very interesting, taking into account the complexity of the urban environments and the detail of the legend.

Figure 1 .
Figure1.Excerpt of the "Jupyter notebook" consisting of a sequence of descriptive text parts that document the different processing steps and cells of the Python script that can be executed directly from the notebook.

Figure 1 .
Figure1.Excerpt of the "Jupyter notebook" consisting of a sequence of descriptive text parts that document the different processing steps and cells of the Python script that can be executed directly from the notebook.

Figure 2 .
Figure 2. Flowchart of the processing chain.

Figure 2 .
Figure 2. Flowchart of the processing chain.

Figure 3 .
Figure 3. Selection of the optimized thresholds by i.segment.uspo.

Figure 3 .
Figure 3. Selection of the optimized thresholds by i.segment.uspo.
diversified (from isolated houses to 10+ storey buildings), but urban sprawl is limited and controlled in comparison with Africa.The datasets consist of multi-spectral and height data.For Ouagadougou, a pan sharpened stereo WorldView-3 imagery (Visible and Near-InfraRed bands (VNIR), spatial resolution of 0.5 m) acquired during the wet season (October 2015) and a normalized digital surface model (nDSM) (spatial resolution of 0.5 m) produced by stereophotogrammetry from WorldView-3 stereo-pairs were used.For Liège, the data consisted of leaf-on VNIR aerial orthophotos with a spatial resolution of 0.25 m acquired in May 2012 and a leaf-off nDSM extracted from Light Detection And Ranging data (LiDAR) (with a point density between 1 and 3 points per square meter) that was acquired in the winter of 2013-14.

Figure 4 .
Figure 4. Ouagadougou and Liège case studies.True color composite is used as the background.For both cities, the classification was made on the white-squared subset.Training samples are in yellow while the test samples are in red.

Figure 4 .
Figure 4. Ouagadougou and Liège case studies.True color composite is used as the background.For both cities, the classification was made on the white-squared subset.Training samples are in yellow while the test samples are in red.

Table 1 .
Classification scheme and size of the training and test sets for Ouagadougou and Liège.

Table 2 .
Performance evaluation of individual classifiers and the four different voting systems.For each line, the highest value is in bold.OA: Overall accuracy.L1 and L2: Levels of the classification scheme.kNN: k-Nearest Neighbors.Rpart: Recursive partitioning.SVMradial: Support Vector Machine with radial kernel.RF: Random Forest.SMV: Simple Majority Vote.SWV: Simple Weighted Vote.BWWV: Best Worst Weighted Vote.QBWWV: Quadratic Best Worst Weighted Vote.

Table 3 .
F-score for individual classes for the second level (L2) of the classification.For each line, the highest value is in bold.kNN: k-Nearest Neighbors.Rpart: Recursive partitioning.SVMradial: Support Vector Machine with radial kernel.RF: Random Forest.SMV: Simple Majority Vote.SWV: Simple Weighted Vote.BWWV: Best Worst Weighted Vote.QBWWV: Quadratic Best Worst Weighted Vote.

Table 4 .
Confusion matrix for the Simple Weighted Vote on Ouagadougou, Burkina Faso.Values are given as a percentage of the reference test set (column-based normalization).Diagonal values correspond to the producer accuracy.BU: Buildings, SW: Swimming pools, AS: Asphalt surfaces, RBS: Brown/red bare soil, GBS: White/grey bare soil, TR: Tree, MBV: Mixed bare soil/vegetation, DV: Dry vegetation, OV: Other vegetation, WB: Water bodies, SH: Shadow.

Table 5 .
Confusion matrix for the Simple Weighted Vote on Liège, Belgium.Values are given as a percentage of the reference test set (column-based normalization).Diagonal values correspond to the producer accuracy.BU: Buildings, AS: Asphalt surfaces, LV: Low vegetation, MV: Medium vegetation, HVD: High vegetation deciduous, HVD: High vegetation coniferous, BS: Bare soil, WB: Water bodies, SH: Shadow.

Table A1 .
Performance evaluation of the level-2 classification of Ouagadougou, Burkina Faso.Producer accuracy (PA) and User accuracy (UA) for each class of the second level of classification.For each line, the highest value is in bold.BU: Buildings.SW: Swimming pools.AS: Asphalt surfaces.RBS: Brown/red bare soil.GBS: White/grey bare soil.TR: Trees.MBV: Mixed bare soil/vegetation.DV: Dry vegetation.OV: Other vegetation.WB: Water bodies.SH: Shadow.