Analysis of OpenStreetMap Data Quality at Different Stages of a Participatory Mapping Process: Evidence from Slums in Africa and Asia

Abstract: This paper examines OpenStreetMap data quality at different stages of a participatory mapping process in seven slums in Africa and Asia. Data were drawn from an OpenStreetMap-based participatory mapping process developed as part of a research project focusing on understanding inequalities in healthcare access of slum residents in the Global South. Descriptive statistics and qualitative analysis were employed to examine the following research question: What is the spatial data quality of collaborative remote mapping achieved by volunteer mappers in morphologically complex urban areas? Findings show that the completeness achieved by remote mapping largely depends on the morphology and characteristics of slums such as building density and rooftop architecture, varying from 84% in the best case to zero in the most difficult site. The major scientific contribution of this study is to provide evidence on the spatial data quality of remotely mapped data through volunteer mapping efforts in morphologically complex urban areas such as slums; the results could provide insights into how much fieldwork would be needed at what level of complexity and to what extent the involvement of local volunteers in these efforts is required.


Introduction
Slums are areas deprived of durable housing, acceptable sanitation, and safe water, and are generally characterized by insecure land tenure and overcrowding; they hold about one-quarter of the world's urban population [1]. Slum neighborhoods are frequently characterized by a complex morphology of their physical characteristics [2], which include the geometry of buildings and routes, density, and roofing material, among others [3,4]. In recent years, the lack of high-quality spatial data of slums has received renewed socio-political and academic interest [3,5,6]. One potential data source for making spatial data available is Volunteered Geographic Information (VGI), which has opened up new possibilities of data production in recent years and facilitated the emergence of a global humanitarian mapping community [7] with several initiatives aimed at "putting the most vulnerable people on the map" [8]. A key concern that arises from the use of volunteered and crowdsourced geographic information to map vulnerable areas is the quality of the data generated by non-experts. This has led to a plethora of scientific studies evaluating the quality of data in crowdsourced platforms such as OpenStreetMap (OSM) [9–19]. According to the OSM Foundation (OSMF) board, which exists to protect the project, the overall goal of the OSM project is the creation of a free-to-use, editable world map [20]. Recent studies have investigated OSM data quality without using any external data, the so-called intrinsic approach [9,10]. In contrast, other studies commonly used what is referred to as the extrinsic approach, where the OSM data are compared with external datasets such as the UK Ordnance Survey data or National Park Service lists [21,22].
A major difficulty arises in evaluating the quality of OSM data for slums: since these neighborhoods are frequently not present in official maps, there is seldom external reference data to be used for extrinsic quality studies. Furthermore, intrinsic quality analyses of the mapping in slums often offer limited insights about data quality since most of these communities do not have organic growth in mapping activities. Implementing participatory mapping activities within slum environments offers the potential to assess the cumulative improvements of the data over time, as in the OSM communities in the Global North.
Due to these challenges, very few studies have assessed the quality achieved by collaborative mapping efforts of the global humanitarian OSM community (e.g., [11]), particularly in slums, whose complex morphology raises particular challenges for mapping from satellite imagery. To fill this gap, we examined OSM data quality at different stages of a participatory mapping process leading to the final update of the OSM database. The following research question is addressed in this study: What is the spatial data quality of collaborative remote mapping achieved by volunteer mappers in morphologically complex urban areas? To address this question, we present a multi-country case study associated with an ongoing research project of the National Institute for Health Research (NIHR) Global Health Research Unit on Improving Health in Slums. The Unit focuses on health services in slums through the study of seven slum sites across two continents (Asia and Africa) with the ultimate aim of finding optimal ways to deliver health services to slum dwellers [23]. This context enables us to overcome the challenges of previous studies by analyzing the results of the same mapping procedures systematically applied in seven slums in four countries (Bangladesh, Kenya, Nigeria, and Pakistan). In this paper, we present results from a spatial data quality assessment along various stages of the mapping process workflow used to map and update the OSM database in these sites and seek to learn lessons for future humanitarian mapping initiatives.
The next section, Section 2, discusses related work covering intrinsic approaches, extrinsic approaches, and analytical platforms, and identifies knowledge gaps which inform our research question. Section 3 presents the research question as well as the materials and methods of the study. Section 4 presents results, and Section 5 presents the discussion and conclusion, including limitations of the study and potential directions for future work.

Related Work
There are several quality categories that one can study when considering the data quality of geographic datasets. According to the International Organization for Standardization (ISO), there are five categories of geographic data quality: completeness, logical consistency, positional accuracy, thematic accuracy, and temporal quality. Completeness, for example, is defined as "[...] the presence and absence of features, their attributes and relationships" [12]. Among these quality categories, completeness is considered a fundamental measure of geographic data quality in OSM research [13], as it is the foundation on which the remaining quality elements build. Some studies examined OSM feature attributes such as speed [24], or focused only on the main aspect of completeness (i.e., "presence and absence of features", where OSM feature extraction or determination was based on primary feature tags, or attributes, such as building or highway), sometimes relating the outcome to the density of features but in a non-slum context [13–15]. Several studies have used intrinsic approaches, where only OSM data are used, to investigate OSM data quality in recent years [9,10,16,21,25]. Some studies have analyzed activity stages based on the history of OSM data and heuristic rules for stage transitions (e.g., no data, start, growth, and saturation), covering 12 representative metropolitan areas [12] as well as globally at the regional scale [25]. Other studies have developed tools to examine the data quality of OSM line features, such as a plugin for Quantum Geographic Information System (QGIS) [10] and Python-based tools [16]. Some of these studies have led to a plethora of analytical platforms such as iOSMAnalyzer [9], OSM History Data Analytics Platform [26], OSMStats [27], and OSM Analytics Tool [28], among others. Although these platforms do show historical profiles of intrinsic quality indicators, they are unable to clearly define the underlying mapping stages that led to the data produced for visualization or the descriptive statistics presented; this problem is partly due to a lack of information about the data production process that led to the historical data. Additionally, most of the services offered by these platforms do not provide up-to-date analyses based on the most recent historical data. Making data production processes visible to researchers can help to identify whether the available historical data were produced via online mapping processes alone or in combination with field validation. Understanding the data production stages alongside the historical data in the database can be a very useful way to systematically explore the challenges of quality data production as well as relative completeness (intrinsically) and absolute completeness (extrinsically).
In terms of extrinsic approaches, where OSM data are compared with other reliable datasets to examine data quality, previous studies have mostly used authoritative data as a reference, and these studies were usually based on developed urban areas, mostly in the UK, France, Germany, Ireland, United States, New Zealand, and Canada, among others [13–15,17–19,21,22,29–32]. Where authoritative data are available, access and license restrictions can make the study impossible. Additionally, the financial cost and ethical appropriateness of the data for a given study area can also be problematic. Finally, the data production stages leading to the final version of the data being shared are not transparent enough to allow systematic review in relation to identified stages of OSM data production. For extrinsic analysis, completeness measures for routes are usually length metrics, and completeness is defined as the ratio of the total length of routes in OSM to the total length of routes in the reference data (e.g., [22]). Similarly, completeness measures for buildings are usually defined in terms of total number and/or area (e.g., [15]).
Related studies have used either unit-based or object-based comparative analytical approaches for examining the completeness of buildings and routes [14,15,17,19,22]. For example, the total number of OSM buildings as a percentage of the total number of reference buildings per unit is a unit-based comparison. An example of an object-based comparison is the percentage of buildings in the reference data that are present in OSM, where a reference building counts as present if its centroid intersects an OSM building [15]. These comparative approaches are normally implemented extrinsically due to the need for reference data to compare with the OSM data. Therefore, the historical OSM data are not used. In an object-based approach, matching of corresponding elements in both the OSM data and the reference data is required prior to determining the percentage of reference buildings that are represented in OSM [15]. Because external and different reference datasets are compared with OSM data, it is impossible to establish a correspondence between objects of the two datasets without using either centroid proximity or overlapping area. As object-based matching is sensitive to positional mismatches of objects and can be complicated and time-consuming, recent studies normally use the much simpler unit-based approach [14,19]. The unit-based approach does not require any form of object-based matching and relies on the total number (or area) of OSM buildings, or the total length of OSM routes, as a percentage of the reference buildings (or routes) per unit [14,15,17,19,22]. However, in the case of building completeness estimation, the unit-based approach is reported to yield disparate estimates depending upon whether the total number or the total area is used, and therefore an object-based approach is recommended [15]. Although completeness and other quality indicators are examined in the aforementioned cases, the emphasis is rarely on slum areas. The lack of suitable data in such areas often impedes systematic analysis [17]. This is partly because authoritative or reliable data are usually not available for slums, and it is consequently impossible to undertake extrinsic studies. Moreover, in cases where reliable secondary data are available, the completeness of OSM data could be so low that it would not make sense to conduct either an extrinsic or an intrinsic study. Collecting data in slums is time-consuming, and the data production process can be even more complex than in a non-slum area. For example, identification of building footprints and footpaths in slums can be daunting. This study addresses this gap and investigates the stages of the mapping process leading to the final update of the OSM database. Because historical data from different time points and data sources exist in OSM, lessons can be learned by examining the impact of different data production processes on data quality. Such work can help advance transparency and inform future work on OSM. Mapping stages undertaken by remote or local mappers need to be visible to inform OSM data quality assessment and decisions. In a situation where both intrinsic and extrinsic quality assessments are difficult to undertake, a research-based experimental approach might be the best option. Very few studies adopt a research-based experimental approach. For example, Eckle and Albuquerque [11] designed an experimental approach to the assessment of OSM data quality in crisis mapping but recommended a bigger study. Until now, no systematic OSM-based experimental studies exist for mapping and surveying of slums across multiple countries using the same open-source-based methodological framework.
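The difference between the unit-based and object-based comparisons described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited studies: the function names, the toy coordinates, and the use of planar coordinates with simple (non-self-intersecting) polygons are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Polygon = Sequence[Point]  # ring of (x, y) vertices, first vertex not repeated


def centroid(poly: Polygon) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula).

    Assumes a non-degenerate polygon (non-zero area).
    """
    area2 = cx = cy = 0.0
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def contains(poly: Polygon, pt: Point) -> bool:
    """Ray-casting point-in-polygon test."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge crosses the horizontal ray
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def unit_based(osm: Sequence[Polygon], ref: Sequence[Polygon]) -> float:
    """OSM building count as a percentage of the reference count."""
    return 100.0 * len(osm) / len(ref)


def object_based(osm: Sequence[Polygon], ref: Sequence[Polygon]) -> float:
    """Percentage of reference buildings whose centroid falls in an OSM building."""
    matched = sum(
        1 for r in ref if any(contains(o, centroid(r)) for o in osm)
    )
    return 100.0 * matched / len(ref)


def square(x: float, y: float) -> Polygon:
    """Hypothetical 1 x 1 building footprint with lower-left corner (x, y)."""
    return [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]


# Toy data: three reference buildings; OSM has three buildings, but only
# one of them coincides with a reference building.
reference = [square(0, 0), square(2, 0), square(4, 0)]
osm_buildings = [square(0, 0), square(10, 0), square(12, 0)]

unit_pct = unit_based(osm_buildings, reference)
object_pct = object_based(osm_buildings, reference)
```

In this toy case the unit-based figure is 100% because the counts agree, while the object-based figure is about 33% because only one reference building is actually matched, which is the kind of disparity that motivates object-based matching.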

Materials and Methods
The following research question guided this study, which covers seven slums across four countries: What is the spatial data quality of collaborative remote mapping achieved by volunteer mappers in morphologically complex urban areas? Within this overarching question, we focus on the following measures: (1.1) completeness of remote mapping (Stage 1) based on additional field data collected from fieldwork (Stage 2); (1.2) growth in data completeness during remote mapping of slums based on field data; and (1.3) completeness contributions per mapper during remote mapping and fieldwork.

OSM-Based Mapping Process for Slums
The overall methodological framework for mapping urban slum communities in the project has been published elsewhere [33,34]. In this section, an overview of the participatory method and process is provided along with details of the mapping process workflow and the stages at which datasets were captured for the analysis in this study. Table 1 gives an overview of the methodological framework of the mapping process, which builds upon the typology of tasks of geographic crowdsourcing presented in Albuquerque et al. [35].
Table 1. Overview of the main steps of the mapping process.

| Main Steps | Brief Description |
| --- | --- |
| Preparation | Preparing materials, engaging stakeholders, and defining responsibilities for the subsequent steps. This involves clarifying who will fulfil the roles for each local partner team, as well as preparing materials for the digitization and mapping phases. |
| Digitization | To produce detailed base maps of the slum locations by tracing all streets and building structures from high-resolution optical satellite imagery. It involves digitization by remote and local teams and validation by experts. |
| Participatory mapping | To validate and enrich the digital maps obtained in the previous steps with the local communities by correcting potential inaccuracies and, most importantly, conflating the map with local knowledge of residents. |
| Analysis | To consolidate the geospatial data obtained in the previous steps into data products and visualizations that will be useful for end-users and researchers. |

Figure 1 depicts the activities that were part of the different mapping stages of our OSM-based participatory mapping process. Stage 0 consisted of preparatory activities and was useful for setting the agenda prior to the start of the online mapping. The preparation period mainly covered creating training materials, defining responsibilities with local core teams in each partner country, training local teams, procuring high-resolution satellite imagery, identifying neighborhood boundaries with local core teams at partner institutions, and setting up the online mapping platforms. Access to the slum sites was also negotiated during this period. All data collectors who took part in the mapping process but were not familiar with the tools were trained. Stage 1 was for online mapping. We used the Humanitarian OSM Team (HOT) Tasking Manager (TM), which served as an interface for coordination of mapping tasks [36]. The TM provided links to OSM editors (e.g., iD Editor/JOSM), which in turn directed all edits by mappers to be recorded in the OSM database [37]. The online mapping and validation activities ensured the capture of geometrically valid vector data from optical satellite imagery for fieldwork in Stage 2. Beyond online mapping and validation, field mapping was undertaken to verify the digitized features. The Stage 2 period covered fieldwork up to the time at which the OSM data were extracted and prepared as a sampling frame. The sampling frame was made up of building structure geometry and names of household heads or representatives. Stage 2 involved using portable global positioning system (GPS) devices to track routes (roads and footpaths); uploading the data to the OSM database; generating quick response (QR) coded field paper maps based on Fieldpapers.org technology; annotating the paper maps (Fieldpapers) in the field after checking building structure geometry, along with a tablet-based structured questionnaire for building geometry verification and enumeration; scanning the annotated paper maps; and conflating the scanned annotated maps into the OSM database to obtain the final field data (reference data). Two key open-source technologies, OpenDataKit (ODK) and OpenMapKit (OMK), were integrated into the questionnaire template development for the household-heads listing survey to inform sampling frame generation [38,39]. ODK server and client tools provided the means for mobile a-spatial data collection and management, while OMK tools allowed the collection of, and linking to, spatial data as part of the questionnaire administration.
ISPRS Int. J. Geo-Inf. 2021, 10, x FOR PEER REVIEW

Figure 2 shows example photographs of an online remote mapping event held during Stage 1 and fieldwork activities in Stage 2 (building geometry verification and enumeration) of the mapping process. An important variable in the definition of the stages is the actual dates for the start and end of the stages. Without the time intervals of the mapping stages, it is impossible to undertake systematic analyses such as those presented in this study. With careful planning, it might be possible to determine the Stage 1 period from the new version of the Tasking Manager, but not that of the fieldwork. Identification of stage dates independently from the OSM database, without any input from the mapping team, can be inaccurate. For example, the use of annual stages by Gröchenig et al. [12] to estimate OSM completeness is not applicable to this study. Using the raw data, we defined the following indicators for analysis. Three measures were constructed, comprising (1) completeness of building structures and routes, (2) completeness growth at each mapping stage, and (3) completeness growth per mapper per stage. The next section presents the study sites, data, and analytical approach used in this study.

Study Sites
The seven study sites were as follows (see Figures 3 and 4): a slum in Pakistan, city of Karachi, anonymized as Karachi site; a slum in Bangladesh, city of Dhaka, anonymized as Dhaka site; three slums in Nigeria, cities of Lagos and Ibadan, anonymized as Lagos site and Ibadan sites 1 and 2; and two slums in Kenya, city of Nairobi, anonymized as Nairobi sites 1 and 2 [33,40]. Karachi site is centrally located in a well-established area with permanent and multi-story buildings undergoing new vertical construction. Dhaka site is centrally located in a well-established area with semi-permanent structures undergoing regular demolitions and reconstructions. Ibadan site 1 is centrally located within a historical area, along an old, tarred road, with permanent structures in poor condition. Ibadan site 2 is a resettled community at the edge of the city with a well-spaced, clear layout and mostly permanent structures. Nairobi site 1 has a settled community and is located about 12 km from the Central Business District (CBD) of the city; the slum structures are made of mud, timber, or tin-roof materials and are mostly in rows. Nairobi site 2 is located about 7 km from the CBD; the slum structures are made of either iron sheet or tin walls with iron sheet roofs. Figure 4 shows the qualitative sample characteristics of the seven slums in terms of geographical location, satellite imagery, buildings, and routes. The satellite imagery and the photographs also give a qualitative impression of the layout of structures as well as rooftop architecture and height.


Data
We used the full history dump of OSM data that is normally referred to as Planet OSM [41]. There are three types of elements in the database: Nodes, which define points in space; Ways, which define linear features and area boundaries; and Relations, which are usually used to explain how other elements work together [42,43]. Another dimension of the history data is the information about contributors. Any changes made by contributors, or mappers, during an editing session, such as geometry changes, deletions, or the creation of new elements, are saved into the OSM database; all this editing information is saved in what is called a changeset [16]. All edits in the OSM history data were extracted using a computational framework for spatio-temporal analysis of the OSM history database (OSHDB) in combination with the application programming interface of the "ohsome" big data analytics platform [26,44]. The scripts for the data extraction are available on GitLab [45].
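As a rough illustration of how a time series of OSM element counts might be requested from the ohsome API, the Python sketch below only builds a request URL and makes no network call. It is not the authors' extraction script (which is in the cited repository [45]). The base URL, the `elements/count` endpoint, and the `bboxes`, `time`, and `filter` parameter names reflect our understanding of the public ohsome API and should be checked against its documentation; the bounding box and dates are hypothetical placeholders, not the study sites or stage dates.

```python
from typing import Tuple
from urllib.parse import urlencode

# Assumed public base URL of the ohsome API; verify before use.
OHSOME_BASE = "https://api.ohsome.org/v1"


def ohsome_count_url(
    bbox: Tuple[float, float, float, float],
    start: str,
    end: str,
    interval: str,
    osm_filter: str,
) -> str:
    """Build a URL asking for element counts over a time range.

    bbox is (min_lon, min_lat, max_lon, max_lat); start and end are ISO dates;
    interval is an ISO 8601 period such as "P1M" (one month).
    """
    params = {
        "bboxes": ",".join(str(c) for c in bbox),
        "time": f"{start}/{end}/{interval}",
        "filter": osm_filter,
    }
    return f"{OHSOME_BASE}/elements/count?{urlencode(params)}"


# Hypothetical area and period, chosen only to show the shape of a request.
url = ohsome_count_url(
    bbox=(3.90, 7.30, 3.95, 7.35),
    start="2018-01-01",
    end="2019-01-01",
    interval="P1M",
    osm_filter="building=* and type:way",
)
```

A monthly count series of this kind only shows how many elements exist at each timestamp; the object-level matching used in this study additionally requires the full element history with identifiers and versions.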

Analytical Approach
In this study, an object-based approach was used with the updated OSM history file. It relies upon using OSM object identifiers (OSM-IDs) of buildings and routes to match OSM objects at timestamp k with OSM objects at the last timestamp of the final stage of the mapping process (i.e., the timestamp at the end of Stage 2). We examined the "true sense" of completeness at each timestamp during the mapping process for seven slums in multiple countries, across Africa and Asia, by exploring four novel completeness definitions shown in Equations (1)-(4) (Equations (2) and (4) are used for sensitivity analyses to provide additional information; we expect the same conclusion in this study). The definitions made it possible to obtain the completeness of buildings and routes retrospectively at any time during the mapping process. The final stage in the equations refers to the end of Stage 2 (fieldwork). To our knowledge, this is the first time building and route completeness have been studied at the level of urban slum settlements in multiple countries and in such detail simultaneously.
$$C_{bc\_k} = \frac{B_{ck}}{B_{cf}} \times 100 \tag{1}$$

where $C_{bc\_k}$ is building count completeness at timestamp $k$; $B_{ck}$ is the total number of buildings at timestamp $k$ which were also present at the final stage (i.e., end of fieldwork) and were never edited between timestamp $k$ and the final stage; and $B_{cf}$ is the total number of buildings at the final stage.

$$C_{ba\_k} = \frac{B_{ak}}{B_{af}} \times 100 \tag{2}$$

where $C_{ba\_k}$ is building area completeness at timestamp $k$; $B_{ak}$ is the total area of buildings at timestamp $k$ which were also present at the final stage (i.e., end of fieldwork) and were never edited between timestamp $k$ and the final stage; and $B_{af}$ is the total area of buildings at the final stage.

$$C_{rc\_k} = \frac{R_{ck}}{R_{cf}} \times 100 \tag{3}$$

where $C_{rc\_k}$ is road count completeness at timestamp $k$; $R_{ck}$ is the total number of roads at timestamp $k$ which were also present at the final stage (i.e., end of fieldwork) and were never edited between timestamp $k$ and the final stage; and $R_{cf}$ is the total number of roads at the final stage.
$$C_{rl\_k} = \frac{R_{lk}}{R_{lf}} \times 100 \tag{4}$$

where $C_{rl\_k}$ is road length completeness at timestamp $k$; $R_{lk}$ is the total length of roads at timestamp $k$ which were also present at the final stage (i.e., end of fieldwork) and were never edited between timestamp $k$ and the final stage; and $R_{lf}$ is the total length of roads at the final stage.

These four definitions of completeness, or completeness ratio, partly conform to the definition of completeness offered by Gröchenig et al. [12], who suggested that "the completeness measure of the geographical dataset D, where D is defined by geographical region R [slum area] and for purpose P [slum health mapping], depends on the degree of correspondence between the existence of objects and properties in the real world and the presence of their representing features in dataset D." Using these equations based on the three metrics (total number, length, and area) allows estimation of disparities in completeness and helps to situate the results in a better context. The different mapping stages are defined in Figure 1 and Table A1 (see Appendix A). The estimated completeness is presented using descriptive statistical tables and graphs. Understanding the growth of OSM elements has the potential to serve as a foundation for data quality assessment [12,46]. Using the resulting data from the completeness analysis, we calculate the completeness growth (CG) of a stage, which is defined as the difference, or gap, obtained by subtracting the completeness estimate at the start of the stage from the estimate at the end of the stage, expressed in percentage terms (see Equation (5)). Additionally, the number of mappers per stage together with the completeness growth estimates are used to compute completeness growth per mapper per stage (see Equation (6)). This information is used to compare the densities (i.e., number of elements per square kilometer) of OSM elements across the sites. The OSM completeness measure is one of the most important components of quality and is pertinent to our study; according to ISO, the completeness measure indicates the presence or absence of real-world features in the database [47]. The presence or absence of real-world features in the slums during the different stages of the participatory mapping process is what we sought to explore in this study, without considering feature attributes (such as speed on routes [24]) beyond the building and highway primary tags used to identify buildings and routes.
$$C_{ge\_s} = C_{e\_stage\_end} - C_{e\_stage\_start} \qquad (5)$$
where $C_{ge\_s}$ is the completeness growth of OSM element E (i.e., building or route) at Stage S in percentage form, $C_{e\_stage\_end}$ is the completeness of E at the end of S, and $C_{e\_stage\_start}$ is the completeness of E at the start of S.
$$C_{ge\_m\_s} = \frac{C_{ge\_s}}{M_s} \qquad (6)$$
where $C_{ge\_m\_s}$ is the completeness growth of OSM element E per mapper at Stage S in percentage form, $C_{ge\_s}$ is the completeness growth of OSM element E at Stage S in percentage form, and $M_s$ is the total number of active mappers in Stage S.
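As a hedged illustration of the two quantities defined above, the following Python sketch computes completeness growth as the end-of-stage estimate minus the start-of-stage estimate, and divides it by the number of active mappers. The function names are hypothetical. The start value of 18% is only what is implied by the reported 84% completeness and 66% growth for Ibadan site 2, and the mapper count of 17 is a placeholder, not a figure taken from Table 2.

```python
def completeness_growth(c_stage_start, c_stage_end):
    """Completeness growth of a stage: end estimate minus start estimate (percent)."""
    return c_stage_end - c_stage_start


def growth_per_mapper(growth, active_mappers):
    """Completeness growth per mapper: stage growth divided by active mappers."""
    if active_mappers <= 0:
        raise ValueError("a stage needs at least one active mapper")
    return growth / active_mappers


# Building completeness for one site during Stage 1 (remote mapping).
# 18.0 is inferred as 84 - 66; 17 mappers is a placeholder value.
stage_growth = completeness_growth(18.0, 84.0)
per_mapper = growth_per_mapper(stage_growth, 17)
```

Only mappers who actually edited during the stage should be passed as `active_mappers`, in line with the counting rule used for Table 4.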

Completeness of Buildings and Routes
Figure 5 shows the completeness of buildings during the mapping stages in all seven slums. The results provide empirical evidence that up to 84% building completeness can be achieved during remote mapping of some slums. In contrast, for the two slums in Asia (Karachi site and Dhaka site), no building completeness was achieved, which means that none of the remotely mapped buildings was used in the updated map after fieldwork (or all of them had to be corrected). At the time of remote mapping, Karachi site was characterized by complex rooftop architecture, making it difficult for mappers to interpret satellite imagery and digitize building footprints. The roofs were mainly concrete and made up of other small structures. This meant that almost all buildings had to be edited during Stage 2 (fieldwork). Dhaka site had "extreme" building density, which meant that buildings were close to each other, impeding satellite imagery interpretation and digitization of building footprints. Except for Karachi site, the slums had mostly sheet roofing and relatively well-defined building footprints for interpretation, with Dhaka site being the worst case. Ibadan site 2, unsurprisingly, achieved the highest completeness during remote mapping due to its clear layout, which had not changed substantially from its resettlement layout over the past few decades. Overall, during remote mapping, Ibadan sites 2 and 1 achieved the highest completeness of 84% and 59%, respectively. In the case of the two slums in Asia, Stage 2 mapping was essential in achieving 100% completeness, as shown in Figure 5b.
Figure 6 shows the completeness of routes during the mapping stages in all seven slums. The results provide empirical evidence that up to 73% route completeness can be achieved during remote mapping of slums. Graphs showing how route count completeness compares with route length completeness are presented in Appendix A (i.e., Figures A1-A7). The same figures also show how building count completeness compares with building area completeness. The next section presents completeness growth per mapper at each mapping stage, showing disparities in completeness growth using all four completeness definitions outlined in Section 3.4.
Figure 7 shows sample completeness maps of Ibadan site 2, which achieved the highest building and route completeness. Figure 7a shows that prior to remote mapping (Stage 1) in this study, there was some level of completeness for buildings and routes, although not at the level required for any serious work or decision-making. A contrasting sample completeness map is shown in Figure 8 using Karachi site, which achieved nearly zero building completeness in Stage 1.

Completeness Growth per Mapper at Each Mapping Stage
We used the number of mappers per stage shown in Table 2 together with the completeness growth estimates in Table 3 to compute completeness growth per mapper per stage in Table 4. Only mappers who edited during a stage were counted and used for the calculation. Slum residents mapped Nairobi sites after prior training by the project team. Mostly experienced OSM mappers mapped Dhaka site; these mappers were already using OSM tools to update the database for other areas prior to remote mapping as part of our study. A mix of local and remote mappers including some slum residents mapped Karachi site. Postgraduate students mostly mapped slums in Nigeria. All inexperienced mappers who had not been exposed to OSM mapping techniques received training prior to remote mapping.
Ibadan site 2, with the lowest building density of 1706 buildings per sq. km, achieved the maximum building completeness growth of 66% during remote mapping (Stage 1). Conversely, Dhaka site, which has the highest building density of 22,407 buildings per sq. km, achieved zero building completeness growth during remote mapping (Stage 1). This trend also applies to completeness growth contribution per mapper at Stage 1. Building completeness growth contribution per mapper in Ibadan site 2 during remote mapping (Stage 1) was nearly 4% (maximum) and zero in the case of Dhaka site. Building completeness growth per mapper during remote mapping was zero percent for both slums in Asia. These results suggest that there are contextual factors that influence the extent to which mappers can contribute to completeness growth. The differences across the study areas were partly due to the degree of complexity of morphological features (i.e., building density) and the inability to interpret satellite imagery (due to complex rooftop architecture), but there may be other factors. For example, regarding the relationship between building density and building completeness growth, the two identified "extreme" conditions (i.e., the two outliers: complex rooftop architecture and high density) are less impactful at Stage 2.
As shown in Table 2, the total number of mappers differed across stages in all the sites, and the experience that mappers gathered in the course of mapping may also differ, which in turn may influence their contributions to building and route completeness growth (future work should look into this). Additional analysis showed a stronger linear relationship between building density and building completeness growth when the number of mappers was not considered (Figure 10). The same was not found for routes, and future work should investigate these differences further. The relationship between route count completeness growth per mapper and route density in both stages does not show a linear trend (Figure 11). However, it is important to note that, given that the sample sizes (number of data points) used to generate Figures 9, 10, and 11 are small, the relationships described are only indicative; we do not claim that the relationship between the variables is statistically significant, which should be investigated in future work.
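One way to inspect the kind of relationship discussed above is a simple least-squares fit together with the Pearson correlation coefficient. The sketch below is illustrative only: the `linear_fit` helper is hypothetical, the input values are placeholders rather than the study's data, and with only seven sites any such fit would remain indicative rather than statistically conclusive.

```python
from math import sqrt


def linear_fit(x, y):
    """Return (slope, intercept, pearson_r) of an ordinary least-squares line."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / sqrt(sxx * syy)
    return slope, intercept, pearson_r


# Placeholder values only (NOT the study's data): building density in
# thousands of buildings per sq. km and completeness growth in percent.
density = [1.0, 2.0, 3.0, 4.0]
growth = [8.0, 6.0, 4.0, 2.0]
slope, intercept, r = linear_fit(density, growth)
```

A coefficient close to -1 or 1 would indicate a near-linear trend in the plotted points, whereas values near zero would correspond to the absence of a linear trend.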

Discussion and Conclusions
This study has, for the first time, presented empirical evidence on the completeness of data on buildings and routes of seven different slums in four countries across Africa and Asia at different stages of a systematic OSM-based participatory mapping process. The following research question was explored: What is the spatial data quality of collaborative remote mapping achieved by volunteer mappers in morphologically complex urban areas? In addressing this question, we focused on the extent to which completeness can be achieved during remote mapping of slums based on field data; completeness growth during remote (and field) mapping of slums; and completeness growth contributions per mapper during remote mapping and fieldwork, while providing additional perspective on how they relate to the density of buildings and routes. This section frames the discussion and conclusion in terms of collaborative remote mapping and spatial data quality of morphologically complex urban areas, lessons learnt from the mapping process, and limitations of the study and future outlook.

Collaborative Remote Mapping and Spatial Data Quality
The results presented in this study could provide insights into how much fieldwork would be needed for what kind of complexity and to what extent the involvement of local volunteers in these efforts is required. The major scientific contribution of this study concerns the spatial data quality of remotely mapped data produced through volunteer mapping efforts in morphologically complex areas. This study advances our understanding of the spatial data quality dimension in humanitarian remote mapping collaboration, providing a foundation for the improvement and use of OSM in future transdisciplinary studies in health and other fields. Humanitarian OSM-based collaborative mapping projects are exclusively based on digital imagery of the mapped areas, but this type of remote mapping may produce data of uncertain quality [11], since the mappers may not have tacit knowledge of the local spatial context. There is an emergence of multiscale crowdsourced digital maps to help inform equitable urban planning where local knowledge is paramount for critical decision-making [48]. In this study, it has been shown that even with the local knowledge of local mapping teams, one hundred percent data quality is still not achievable in the remote mapping stages. This finding raises further questions, such as to what extent data emerging from remote mapping should be trusted. In some cases, it is impossible to trust the data generated from remote mapping, such as in areas with complex rooftop architectures (e.g., Karachi site) and extreme building density (e.g., Dhaka site). The influence of rooftop architectures and density in this study is in line with other studies suggesting that roof characteristics (e.g., surfaces and densities) can pose a problem when mapping complex morphological features, for which systematic studies on slum mapping are encouraged [3]. The findings in this study show that it is possible to achieve completeness during remote mapping of slums for buildings (up to 84%) and routes (up to 73%) for sites with more regular morphology and lower building density. The contribution to spatial data quality per mapper varied considerably across sites, reaching a maximum of 6% at the remote mapping stage and a maximum of 10% at the fieldwork stage. This finding is relevant within the context of humanitarian remote mapping of morphologically complex urban areas, like slum environments, where the completeness of the generated map is generally unknown. This study may therefore be used as a guide for future investigations on the expected contribution of individual mappers.

Lessons Learnt from the Mapping Process
This study employed a systematic OSM-based mapping approach for the production, curation, and analysis of volunteered geographic information (VGI) on urban communities, based on a combination of collaborative satellite-imagery digitization and participatory mapping that relies upon geospatial open-source technologies and the collaborative mapping platform OSM. Findings across Stages 1 and 2 show that our method generated promising completeness results, particularly showing the heterogeneous nature of completeness growth during remote mapping. The participatory mapping process is reproducible given that the same mapping workflow and open-source technologies were used across all the study sites with different mapping teams. However, the process still requires technical expertise, and future work should focus on optimizing the integration of the tools used to make it easier to implement for survey research. It is important to note that the overall goal of the mapping approach that we designed and implemented was to produce a high-quality spatial data sampling frame for a health survey and research in slums. Some of the lessons learnt are as follows.
Careful training of volunteer mappers on mapping tools is essential for the success of implementing OSM-based mapping for slum health surveys and research. The use of portable Global Positioning System (GPS) devices alone did not work at the household level during field mapping. In this study, GPS location functionality in the tablet was used as a guide along with the FieldPapers for orientation and identification of the actual building structure pre-loaded in the tablet as tiles. The use of FieldPapers played a key role in data collection and cleaning. FieldPapers served as a reference for solving any building identification disputes. Using the FieldPapers technology requires careful planning but ensures that buildings are identified and coded correctly as well as scanned properly to make it easier to upload and link to OSM editors for conflation. In this study, a 13-digit code was used to ensure unique structure codes. Routes are easy to interpret, useful in the field, and must always be mapped during remote mapping to facilitate fieldwork. Where the interpretation of satellite imagery is difficult during remote mapping (Stage 1), it is best not to attempt to map the individual buildings but to focus on well-known landmarks (e.g., churches, mosques, and police stations) for orientation together with the mapping of routes. Online mapping of road networks and well-known landmarks proved very useful for orientation in the field in this study. The rooftop architecture of building structures can create difficulties during mapping, and it is essential that consensus is reached regarding their interpretability prior to setup for remote mapping on Tasking Manager. Another important consideration is the security of mappers in the field; this could be ameliorated by working with the slum residents and community leaders. Although it is not unexpected that experienced OSM mappers in developed urban areas can generate high-quality data [49], our observations of the mapping process (as well as results from the quantitative analysis in this study) suggest that inexperienced but trained OSM mappers in slums can also produce high-quality data.

Limitations of the Study and Future Work
The study is limited by its scope, which is restricted to slums. There is likely to be a plethora of other potential open-source technologies that could have been extensively tested and used; familiarity with the technologies influenced the design and choices. Future work should examine how mappers (especially slum residents) perceive such mapping processes, the sociodemographic profiles of mappers, and attribute accuracy, to deepen our understanding of slum mapping for health research. Future research should look at methods for auto-defining the different stages of participatory mapping by using online mapping platforms, to provide the basis for conducting a comparative systematic study of data quality in different geographic regions. An initial consideration could be to use the remote mapping period on HOT Tasking Manager as Stage 1 and explore OSM Changesets to identify data-source declarations and determine a Stage 2 timeframe. Such an endeavor will require a careful and systematic approach throughout the mapping stages to ensure explicit identification of objects and their validation. Although this study partly contributed to temporal quality (another quality element looking at the validity of changes in the database in relation to real-world changes and also the rate of updates [22]), based on the completeness growth estimates, future work could examine the actual rate of modifications (e.g., deletes) and how they are reflected in space. Another possibility for future studies is to explore the remaining quality elements by considering their evolution across the two stages. Another emerging research area is the use of participatory mapping and automated methods (e.g., machine learning) for structure detection and population estimation within slum regions. It is important to note that the participatory mapping approach used in this study is one of many approaches for slum mapping, which should ideally be combined towards an integrated deprived area "Slum" mapping system in the Global South [2]. Additionally, we see the current work as a step towards the improvement of OSM-based workflows and mapping tools in support of a methodological framework for geospatial mapping of health and wellbeing in urban poor areas. Such improvement can facilitate impactful future collaboration with local partners, the OSM community, and other researchers for data production, usage, and analytics.
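The idea of auto-defining mapping stages from changeset metadata, mentioned above as future work, could be prototyped along the following lines. This is a speculative sketch, not a method used in this study: the record layout, the `assign_stage` function, the keyword list for field-survey source declarations, and all dates are hypothetical, and real changesets would first need to be retrieved and parsed from OSM.

```python
from datetime import datetime

# Hypothetical keywords that might indicate a field-survey data source
# in a changeset's "source" tag.
FIELD_SOURCE_KEYWORDS = ("survey", "gps", "fieldpapers")


def assign_stage(changeset, stage1_start, stage1_end):
    """Assign a changeset to a mapping stage.

    Stage 1 is taken to be the remote mapping period (e.g., the period a
    project was open on a tasking platform). Stage 2 is assumed to be any
    later changeset whose source declaration suggests fieldwork.
    """
    created = changeset["created_at"]
    source = changeset.get("tags", {}).get("source", "").lower()
    if stage1_start <= created <= stage1_end:
        return "stage_1_remote"
    if created > stage1_end and any(word in source for word in FIELD_SOURCE_KEYWORDS):
        return "stage_2_field"
    return "unassigned"


# Hypothetical remote mapping period and changesets.
stage1_start = datetime(2018, 1, 1)
stage1_end = datetime(2018, 1, 31)
changesets = [
    {"id": 1, "created_at": datetime(2018, 1, 10), "tags": {"source": "Bing"}},
    {"id": 2, "created_at": datetime(2018, 3, 5), "tags": {"source": "survey; FieldPapers"}},
    {"id": 3, "created_at": datetime(2018, 3, 6), "tags": {}},
]
stages = {cs["id"]: assign_stage(cs, stage1_start, stage1_end) for cs in changesets}
```

Changesets left as "unassigned" would need manual review, which reflects the point that explicit identification and validation of objects remains necessary throughout the mapping stages.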

Figure 1. Participatory mapping process workflow with defined stages in grey background.

Figure 3. Spatial location of study sites and country (a-d) and overview map (e).

Figure 4. Photographic characterization of study sites showing samples of satellite imagery (a,d,g,j,m,p,s), structures of buildings (b,e,h,k,n,q) and routes (c,f,i,l,o,r,u).

Figure 9. Comparison of building completeness growth per mapper and density. (a) During remote mapping stage; (b) During fieldwork stage.

Figure 11. Comparison of route completeness growth per mapper and density. (a) During remote mapping stage; (b) During fieldwork stage.

Figure A1. Completeness of buildings and routes during mapping in Karachi site.

Figure A2. Completeness of buildings and routes during mapping in Ibadan site 2.

Figure A3. Completeness of buildings and routes during mapping in Ibadan site 1.

Figure A4. Completeness of buildings and routes during mapping in Lagos site.

Figure A5. Completeness of buildings and routes during mapping in Dhaka site.

Figure A6. Completeness of buildings and routes during mapping in Nairobi site 2.

Figure A7. Completeness of buildings and routes during mapping in Nairobi site 1.

Table 4. Completeness growth per mapper per mapping stage.