Are Crowdsourced Datasets Suitable for Specialized Routing Services? Case Study of OpenStreetMap for Routing of People with Limited Mobility

Abstract: Nowadays, Volunteered Geographic Information (VGI) has become increasingly attractive to both amateur users and professionals. Using data generated by the crowd has become a hot topic in several application domains, including transportation. However, there are concerns regarding the quality of such datasets. We analyze the fitness for use of the database of OpenStreetMap (OSM), one of the most famous crowdsourced mapping platforms, for routing and navigation of people with limited mobility. We assess the completeness of OSM data regarding sidewalk information. Relevant attributes for sidewalk information such as sidewalk width, incline, and surface texture are considered, and, through both extrinsic and intrinsic quality analysis methods, we present the results of fitness for use of OSM data for routing services for disabled persons. Based on empirical results, it is concluded that OSM data of relatively large spatial extents inside all studied cities could be an acceptable region of interest to test and evaluate wheelchair routing and navigation services, as long as other data quality parameters such as positional accuracy and logical consistency are checked and proved to be acceptable. We present an extended version of the OSMatrix web service and explore how it is employed to perform spatial and temporal analysis of sidewalk data completeness in OSM. The tool is beneficial for piloting activities, where pilot site planners can query OpenStreetMap and visualize the degree of sidewalk data availability in a certain region of interest. This allows identifying the areas where data are mostly missing and planning data collection events. Furthermore, empirical results of data completeness for several OSM data indicators and their potential relation to sidewalk data completeness are presented and discussed. Finally, the article ends with an outlook for future research in this area.


Introduction
Several research projects with sustainable applications, varying from routing and navigation systems [1][2][3][4][5][6] and traffic and transportation [7] to energy modeling [8] and population estimation [9], have used OpenStreetMap (OSM) as their primary data source because it is up-to-date, contains more detailed information than official datasets (in densely populated areas), and is open and free to use. Since the launch of the OSM project in 2004 [10], the quality of its data has always been a concern for both research and industrial communities [11]. The reason behind this concern might be that data are collected by volunteers who are not necessarily familiar with data collection procedures and who have different levels of competency.
Numerous research studies (see Section 2) have explicitly stated the importance and value of OSM data quality evaluation in their projects. Hence, it is crucial to evaluate the quality of the dataset to see whether it fits the purpose of use. Several research studies have been conducted to understand and evaluate the quality of OSM data based on different data quality elements and for different application purposes [12][13][14][15][16][17][18]. Some studies have only focused on assessing the completeness (as one of the geo-data quality elements) of OSM regarding certain objects of interest such as the road network [19][20][21][22], building footprints [23], bicycle trails [24], and land use information [25].
The quality of routing and navigation systems relies heavily on the quality and availability of input datasets. Nowadays, there are several data solutions available for routing and navigation systems, of which OpenStreetMap is one of the most up-to-date (in densely populated areas) and is free to use. However, specialized routing and navigation systems, such as systems for people with limited mobility, need special consideration in terms of the fitness for use of the data for their requirements. Existing commercial or governmental geo-data sources are not suitable for this purpose because they lack detailed attribute information about features such as sidewalks, surface condition, height, and slope, and because reference datasets are often limited in terms of covered area and are not up-to-date [26]. In recent years, the research community has experienced a strong increase in studies related to routing applications tailored to people with disabilities, in which the lack of a sophisticated dataset played a major role [3].
Within the CAP4Access European project [27], we aim to use OpenStreetMap data for routing and navigation of people with limited mobility. Research that focuses on routing specifications for disabled people, such as wheelchair users and blind, deaf or elderly people, has experienced a strong increase in recent years [28][29][30], and some of these studies use crowdsourced geographic information as their data source [29][30][31][32][33]. An important issue that needs to be considered for such studies is that geo-data requirements may vary significantly depending on the project's purpose. Routing applications for pedestrians have different geo-data requirements than those for motorized traffic, and vice versa [34]. Therefore, research on the quality of OSM data with respect to wheelchair routing requirements in particular seems necessary. To the best of our knowledge, no study has assessed the completeness and fitness for use of the OpenStreetMap dataset for wheelchair routing services. Studies that have used OSM data for this purpose assumed that the dataset is good enough to be used. In order to properly assess the fitness for use of OSM in this regard, it is necessary to understand the data requirements of such a routing service and to check whether the OSM database fulfills those requirements.
Several studies have highlighted the data requirements for a potential routing and navigation system for wheelchair users [31,35]. Oftentimes, the system and its corresponding data are created through surveys. As a reference in this study, the specification by the German Institute for Standardization provides a foundation for this particular type of information. DIN 18024-1 [36] describes the accessibility requirements for disabled people. The standard includes recommendations for different handicap types, which also help to define the target user group for which our study is conducted: wheelchair users [36]. Based on the specification, some of the recommended parameters that need to be implemented in the final dataset are surface information, incline and width of the sidewalk segment. However, based on a number of other studies [29,37,38,39], additional parameters for a disabled-friendly routing network have been determined, which are introduced later in Section 3.
The main hypothesis behind this research is that patterns between geo-data implemented in such widely used applications and the geo-data requirements for applications tailored to disabled people need to be evaluated. Therefore, in order to evaluate the fitness for use of OSM data for this project, this study investigates the suitability and quality of OpenStreetMap data regarding the specific attributes that are relevant to wheelchair routing services. For this, we study the completeness of OpenStreetMap sidewalk data in Germany at three levels of completeness: objects, attributes and values. The results show the statistics of data inconsistencies in the OpenStreetMap dataset of several German cities for wheelchair routing applications.
The remainder of the paper is structured as follows. In the next section, we introduce the methodology for evaluating the completeness of OSM regarding sidewalk information in two stages: extrinsic and intrinsic. Section 3 presents the results of applying the methods to selected use cases, with a detailed discussion of the results. Finally, in Section 4, we conclude and point out some ideas for future work on this topic.

Methods
Several research studies have been conducted to understand and evaluate the quality of OSM data [12] based on different data quality elements and for different application purposes [13][14][15][16][17][18]. Some studies have focused on validating the positional accuracy of OpenStreetMap data by comparing it with reference datasets [15,16] or by using photogrammetric approaches [40]. Other studies have evaluated the quality of OpenStreetMap regarding thematic accuracy [41][42][43] and topological consistency [44][45][46]. In addition, several studies have assessed the completeness (the relevant data quality element for our study) of OSM regarding certain objects of interest such as the road network [19][20][21][22], building footprints [23], bicycle trails [24], and land use information [25]. No particular study on the completeness of sidewalk information has been done so far.
Completeness is defined as a measure of the lack of data. It can be divided into three types/levels: (a) object; (b) attribute; and (c) value. In the first case, it is an assessment of how many objects are expected to be found in the dataset but are missing, as well as an assessment of excess data that should not be included [47]. The same definition is valid for the other two types, where missing or excess attributes and/or values in a dataset for a certain object are counted as inconsistencies for measuring the completeness of data. Since the aim of this study is to assess the completeness of sidewalk information, and because sidewalk information in OSM is attached to road features, an overview of the literature on the completeness of road features is necessary.
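As a minimal sketch of the object-level definition above, the following Python snippet counts missing and excess objects by comparing two sets of identifiers. The function name `object_completeness` and the identifiers are hypothetical, and the sketch assumes that features in the tested and reference datasets have already been matched to common identifiers, which in practice is the difficult step.

```python
def object_completeness(tested_ids, reference_ids):
    """Count missing (omission) and excess (commission) objects between two ID sets."""
    tested, reference = set(tested_ids), set(reference_ids)
    missing = reference - tested  # expected objects that are absent from the tested data
    excess = tested - reference   # objects present in the tested data but not expected
    return {
        "missing": len(missing),
        "excess": len(excess),
        "omission_rate": len(missing) / len(reference) if reference else 0.0,
        "commission_rate": len(excess) / len(tested) if tested else 0.0,
    }


# Hypothetical matched identifiers, for illustration only.
result = object_completeness(
    tested_ids=["a", "b", "x"],
    reference_ids=["a", "b", "c", "d"],
)
print(result)
```

The same counting could be repeated at the attribute and value levels by replacing object identifiers with (object, attribute) or (object, attribute, value) tuples.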
Several research attempts have been made to define a quality indicator for the completeness of OpenStreetMap data with regard to the road network. In one of the very first attempts, Haklay [15] studied the completeness of road features in OpenStreetMap for London. As the first indicator, the total length of road features in OSM was compared to that of the corresponding features in the reference data. It was assumed that, since the official reference dataset is generalized, it excludes minor roads as well as foot and cycle paths. Therefore, it was expected that, in areas where OSM has good coverage, the total length of OSM roads must be longer than the total length of features in the reference dataset. The results showed that, at a macro level, the OSM dataset total length was 60% of the reference dataset. However, since the reference dataset was an incomplete and generalized coverage, there was an underestimation of the total length of roads for London. As a result, the author argues that, in 70% of the area, reference data provide a better, more comprehensive coverage than OSM [15]. It has also been discussed that the incompleteness of data is more evident at the boundary between the city and the rural areas that surround it. The research study was furthermore extended by applying a visual inspection of the two datasets against each other, which gave valuable insights into differences in mapping urban areas in OSM data in terms of missing or excess object features.
This visual comparison approach was also performed in other research studies, where the completeness of OSM data for Ireland was evaluated by comparing it with Google and Bing maps [48]. In this study, Ciepluch et al. took similar approaches for creating quality indicators of completeness of OpenStreetMap data by defining four indicators. First, they calculated the total length (number of kilometers) of roads in Ireland for both OpenStreetMap and the reference dataset, compared the results, and concluded that the availability of road features in OpenStreetMap was good. There were some differences in the length of roads in different road categories compared to the reference dataset, but, depending on the purpose of application (e.g., navigation), it was concluded that the OSM data for Ireland are a complete dataset. As another quality indicator for checking the completeness of data, the authors performed a grid-based analysis where the total presence of road features per category for each dataset is counted and visualized. The third indicator was buffer analysis, which computes the number of km of OSM road (for each road class) that lies outside a buffer of a certain threshold size around the corresponding road feature in the reference data. Finally, the fourth indicator measured the completeness of road features by visualizing the places (e.g., grid bounding boxes) where one of the datasets (OSM or reference data) contained more road length in km. This was used as an indicator of where roads were better mapped [48].
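The buffer analysis indicator described above can be illustrated with a simplified Python sketch. It is not the procedure used in [48]: instead of constructing buffer polygons, it approximates the test by checking whether the midpoint of each OSM segment lies farther than the buffer size from the reference polyline. Planar coordinates in meters are assumed, and all names and coordinates are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polyline(p, line):
    """Smallest distance from point p to any segment of the polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def length_outside_buffer(osm_line, ref_line, buffer_m):
    """Sum the length of OSM segments whose midpoint is farther than buffer_m from ref_line."""
    outside = 0.0
    for a, b in zip(osm_line, osm_line[1:]):
        mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
        if distance_to_polyline(mid, ref_line) > buffer_m:
            outside += math.hypot(b[0] - a[0], b[1] - a[1])
    return outside


# Hypothetical geometries in meters, for illustration only.
ref = [(0, 0), (100, 0)]
osm = [(0, 2), (50, 2), (50, 30), (100, 30)]
outside_m = length_outside_buffer(osm, ref, buffer_m=10)
print(outside_m)
```

The midpoint test is only reliable when segments are short relative to the buffer size; a production analysis would more likely rely on a GIS library or spatial database for exact buffer and difference operations.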
Tenney [49] has also performed a completeness check for Canadian OpenStreetMap data. In his research, road names (attribute level) in OSM were compared to the corresponding values in a reference dataset using semantic-similarity and character-to-character comparisons similar to those offered in [16,50,51,52]. The results indicate that more attention should be paid to the role of imports to OSM data in terms of timeliness and systematic error propagation. A data import occurs, for example, when information from an existing source is integrated and uploaded in bulk into OSM. This kind of enrichment seems to have both positive and negative impacts on the overall quality of the OSM map, where the completeness of data increases with data imports from authoritative datasets. Examples of negative impacts of bulk data imports include problems with misclassifications and/or data integration inefficiencies [53].
Furthermore, an analysis of similarities and differences in data contributions and community development in OSM between 12 selected urban areas of the world has shown significantly different results in data collection efforts as well as in local OSM community sizes [54]. The results showed that European cities provide quantitatively larger amounts of geo-data and numbers of contributions in OSM, which in turn results in a better representation of the real world in the database. Based on this research, one could conclude that the level of completeness of OSM data in Europe is generally higher than on other continents, with the exception of densely populated cities.
It is important to note that the quality of Volunteered Geographic Information (VGI) in general strongly depends on users' mapping behavior and their understanding of the concepts of data contribution [55]. There have been some studies on this topic specifically for OSM data [55][56][57][58]. Following this idea, we believe that there is a potentially strong interrelation between intrinsic OSM data indicators (e.g., user experience, number of edits for a certain object, etc.) and data quality indicators. However, in this article, we examine the completeness of OSM data regardless of such a relation.
The completeness check could be performed at the object level as well as at the attribute and value levels. In order to assess the absence of sidewalks in OSM, we would need to compare OSM with a reference dataset, such as a map of all sidewalks in the selected city. The procedure for such a data comparison is called extrinsic quality analysis. In cases where reference datasets are not available, counting the number of objects or their attributes and values in OSM data could give us an understanding of the level of completeness of data for that object. Counting objects is useful for estimating completeness because one could measure the absence of data relative to the total number of objects of interest. This type of evaluation is called intrinsic quality analysis.
In terms of relation to previous works, the initial part of our study is similar to the works of [15,48], in that we have conducted an extrinsic quality analysis of OSM data for the city of Heidelberg. In terms of indicators, this is done by comparing the total length of roads (highway tags in OSM) that have a sidewalk tag assigned to them with the total length of sidewalk objects in the reference dataset. Furthermore, since our project is not limited to the city of Heidelberg and, in the future, our quality analysis should be applied to other cities where we do not have a reference dataset at hand, we perform an intrinsic quality analysis in the second phase of our study. This is accomplished by extending an open-source tool for area-aggregated analysis of such information.

Results and Discussion
In our study, we perform both extrinsic and intrinsic analysis in order to acquire a solid understanding of sidewalk completeness in OSM. However, the extrinsic quality analysis is only carried out for Heidelberg, Germany, since we have access to the sidewalk reference dataset of this city. For the intrinsic quality analysis, we have selected more cities so that our results could be better generalized to the country level. The selected cities for intrinsic analysis include the capital of Germany (Berlin); two large and densely populated cities, Munich and Hamburg; and two smaller cities, Heidelberg and Freiburg. The cities were selected to cover both large and smaller cities with different population densities. In addition, Freiburg is selected specifically because previous projects and efforts have been completed regarding the definition and collection of sidewalk information in OSM for this city. Therefore, we would like to see in our statistical results whether, and to what level, this affects the completeness of data for this city compared to others.

Extrinsic Quality Analysis
As the first step for checking the completeness of sidewalk information in OSM, we must have an estimate of the total number of sidewalk features in the selected regions. Therefore, an extrinsic quality analysis has been done by checking the OSM road features that have a sidewalk tag against a reference dataset of sidewalks in the city of Heidelberg. For this, we follow the methodology presented by Koukoletsos et al. [20] and Ciepluch et al. [48], where the total length of linear features (in kilometers) is considered as the index for comparison. Our approach for matching OSM road features with the sidewalk reference dataset is similar to the map matching approach presented by Fan et al. [59]. Figure 1 shows the flowchart of the tasks that were carried out for the extrinsic quality analysis.
Sustainability 2017, 9, 997
We have used OSM2PGSQL tool [60] in order to extract and transfer relevant data from OpenStreetMap into our database. The total linear features in the OSM database that have a tag of sidewalk = yes assigned to them are filtered and considered as sidewalk features available in OSM data. As depicted in Figure 1, urban blocks, as the smallest areas surrounded by roads, could be derived from road network data through three steps: (1) splitting road line segments into small line segments where they intersect with other roads; (2) forming polygons as areas enclosed by the line segments; and (3) extracting urban blocks from these polygons (considering different classes). For more information regarding preprocessing and extraction of urban blocks from road networks, refer to Fan et al. [59]. After extracting the relevant features and preprocessing the data according to Figure 1, the overlapping and non-overlapping features of the resulting OSM dataset with the adjusted sidewalk reference dataset were identified. Next, the total length of features was calculated and compared (Table 1).
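The length-based comparison can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' PostGIS-based workflow: ways are represented as hypothetical dictionaries with `tags` and `coords` (latitude, longitude) entries, lengths are approximated with the haversine formula, and the feature-matching step against the reference data is omitted. The reference length used here is a made-up placeholder.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def way_length_km(coords):
    """Length of a polyline given as a list of (lat, lon) vertices."""
    return sum(haversine_km(a, b) for a, b in zip(coords, coords[1:]))


def tagged_sidewalk_length_km(ways):
    """Total length of ways carrying sidewalk=yes, as filtered in the text."""
    return sum(way_length_km(w["coords"]) for w in ways
               if w["tags"].get("sidewalk") == "yes")


def length_ratio(osm_km, reference_km):
    """Share of the reference length that is covered by tagged OSM length."""
    if reference_km <= 0:
        raise ValueError("reference length must be positive")
    return osm_km / reference_km


# Hypothetical ways and reference length, for illustration only.
ways = [
    {"tags": {"highway": "residential", "sidewalk": "yes"},
     "coords": [(49.410, 8.690), (49.410, 8.700)]},
    {"tags": {"highway": "residential"},
     "coords": [(49.420, 8.690), (49.420, 8.700)]},
]
osm_km = tagged_sidewalk_length_km(ways)
print(round(length_ratio(osm_km, reference_km=1.5), 3))
```

Note that one road with sidewalks on both sides corresponds to two sidewalk geometries in a reference dataset, so a real comparison would need to account for the side information before lengths are compared.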

Intrinsic Quality Analysis
In addition to the extrinsic analysis, we have performed an intrinsic analysis of OSM sidewalk data completeness by looking only at the OSM data. This analysis has been carried out for five selected cities in Germany (Table 2). As stated earlier, the completeness check could be performed at the object level as well as at the attribute and value levels. For the intrinsic analysis of OSM, we need to perform an analysis of data completeness at the attribute level. This is because sidewalk information is modeled as tags (attributes) of road objects in OSM. There could be other options for modeling sidewalks, such as mapping sidewalk geometries together with their attributes. This would be especially beneficial for pedestrian/wheelchair routing purposes. However, such proposals are controversially discussed within the OSM community, and there have been several good reasons for not selecting this modeling option. The main reasons include simplicity of data collection as well as reducing the size of the dataset. Therefore, in this step, we have analyzed the OSM dataset at the attribute level. For assessing the number of sidewalk objects, we have counted the number of road segments (e.g., highways in OSM; please note that the term "highway" has a different meaning in the OSM community, where it is used for labelling all linear passable routes) that have a sidewalk tag attached to them (e.g., highway = footway tag with assigned footway = sidewalk tag). This means that the route object also has sidewalk information attached to it. However, four possibilities could occur. In the first case, the value of the sidewalk tag is equal to "none" or "no", which means that the specific way does not have sidewalk(s) beside it. The second value that this tag could carry is "both", meaning that the specific road has sidewalks on both the left and right sides. The third case is a value of "left" or "right", meaning that the road has only one sidewalk attached to it and that it is known on which side the sidewalk is available. The fourth case is that the value only contains "yes". This means that the way has sidewalks on one or both sides. In the case of one sidewalk, it is unknown on which side of the way it exists.
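The four cases above can be expressed as a small Python helper. The function name and the returned labels are hypothetical; the tag values are the ones listed in the text, with additional "untagged" and "other" outcomes for ways that carry no sidewalk tag or an unexpected value.

```python
def classify_sidewalk_tag(tags):
    """Map the value of a way's sidewalk tag to one of the cases described in the text."""
    value = tags.get("sidewalk")
    if value is None:
        return "untagged"                # no sidewalk information at all
    value = value.strip().lower()
    if value in {"no", "none"}:
        return "no sidewalk"             # case 1: way has no sidewalk beside it
    if value == "both":
        return "both sides"              # case 2: sidewalks on the left and right
    if value in {"left", "right"}:
        return "one side, known"         # case 3: one sidewalk, side is known
    if value == "yes":
        return "present, side unknown"   # case 4: one or two sidewalks, side unknown
    return "other"


# Hypothetical tag dictionaries, for illustration only.
for tags in ({"highway": "residential", "sidewalk": "both"},
             {"highway": "residential", "sidewalk": "yes"},
             {"highway": "service"}):
    print(tags, "->", classify_sidewalk_tag(tags))
```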
Different values of this tag do not make a difference for our completeness check. What is important is that the road segment has a sidewalk tag assigned to it, which means that we could gain information for a routing and navigation service from this feature. In cases where no sidewalk tag is defined for a road segment, the object is counted as an incomplete feature with respect to sidewalk information. Therefore, for the sidewalk object completeness check, we count all road segments (streets, highways, and footways) that have a sidewalk tag attached to them and compare them to the total number of road segments to estimate the number of inconsistencies. Figure 2 gives a visual impression of road segments having sidewalk information in a small part of Heidelberg. As depicted, for this part of Heidelberg, it can be visually seen that the dataset is lacking sidewalk information for about 40% of the region. It is important to note that there are several examples of regions in the city where no sidewalk information exists. Furthermore, Table 2 gives the statistics relating to the number of ways with sidewalk objects in all selected cities. It is important to note that all the features with a highway tag in OSM have been included in the analysis. However, in reality, certain highway features do not and should not have sidewalks, and it is logically expected to have sidewalk information tagged to residential highway features. Therefore, we have later improved this analysis by splitting the statistics of highway types into two categories, major and minor.
For the second level of completeness check, we aimed to assess the number of additional attributes relating to sidewalks (e.g., width, incline, etc.) that were present for the roads that have the sidewalk tag provided. For this purpose, we used relevant sidewalk attributes for routing and navigation of people with limited mobility. The attributes selected were adopted from another study [3], except for the attributes lit, crossing and general access, as they seemed irrelevant to our project (Table 3).
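The object-level count described above, split into major and minor highway types, might be sketched as follows. The grouping of highway values follows the major and minor types named among the OSMatrix indicators later in the article; the function names and the sample ways are hypothetical.

```python
from collections import Counter

# Highway values grouped as in the article's major/minor indicator definitions.
MAJOR = {"motorway", "primary", "secondary", "tertiary", "trunk"}
MINOR = {"residential", "track", "service"}


def highway_category(tags):
    """Return 'major', 'minor' or 'other' for a way's highway value."""
    value = tags.get("highway")
    if value in MAJOR:
        return "major"
    if value in MINOR:
        return "minor"
    return "other"


def sidewalk_share_by_category(ways):
    """Share of ways per category that carry any sidewalk tag, whatever its value."""
    total, tagged = Counter(), Counter()
    for tags in ways:
        if "highway" not in tags:
            continue  # only road segments are counted
        category = highway_category(tags)
        total[category] += 1
        if "sidewalk" in tags:
            tagged[category] += 1
    return {category: tagged[category] / total[category] for category in total}


# Hypothetical ways, for illustration only.
ways = [
    {"highway": "residential", "sidewalk": "both"},
    {"highway": "residential"},
    {"highway": "service"},
    {"highway": "residential", "sidewalk": "no"},
    {"highway": "primary", "sidewalk": "yes"},
    {"highway": "motorway"},
]
shares = sidewalk_share_by_category(ways)
print(shares)
```

A value of sidewalk = no is deliberately counted as complete here, in line with the statement that different values of the tag do not affect the completeness check.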
Results of checking the completeness of attributes are given in Table 4. The assessment shows that for larger cities with high population density, such as Berlin, the numbers of sidewalk objects and their attributes are higher, and hence the OSM data for these regions are more complete compared to smaller cities. This might be due to the higher number of OSM volunteers in these regions, which is directly related to the higher population. This issue has already been addressed in several research studies [22,54,61,62,63]. However, this is not always the case, and there exist other primary cities with high population where such information is predominantly missing (e.g., Munich).
Table 3. The relevant attributes of sidewalk for routing and navigation of disabled people, adopted from [3]. Columns: OSM Tag; Scale.
Notes: *: The percentage of coverage is calculated by considering the total number of ways tagged as sidewalk in the respective city (see Table 2).
Furthermore, the results show that, although the OSM community has made some contributions to mapping sidewalks and their attributes, the total coverage of this information remains very low. This is problematic when the data are to be used for a wheelchair routing and navigation service in the city. This status could, however, change by raising awareness of the importance of such information within the OSM community and by attempting to complete OSM with respect to this information. Despite the fact that OSM data are not complete enough for a whole city to be used by a wheelchair routing and navigation service, through a fine-scale inspection one could still find several large regions inside cities that might have an acceptable level of quality in terms of sidewalk data availability. Therefore, it is suggested to use OSM data of these regions for testing and implementing routing and navigation services.
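The attribute-level coverage discussed above, expressed as a percentage of the ways tagged with sidewalk information, can be sketched in Python as follows. The attribute keys used here (`width`, `incline`, `surface`, `smoothness`) are an assumption based on the sidewalk measures named in the text and may not match the exact tag keys of Table 3; the sample ways are hypothetical.

```python
# Assumed attribute keys; the exact OSM tag keys of Table 3 may differ.
ATTRIBUTE_KEYS = ("width", "incline", "surface", "smoothness")


def attribute_coverage(ways, keys=ATTRIBUTE_KEYS):
    """Percentage of sidewalk-tagged ways that also carry each attribute key."""
    tagged = [tags for tags in ways if "sidewalk" in tags]
    if not tagged:
        return {key: 0.0 for key in keys}
    return {
        key: 100.0 * sum(1 for tags in tagged if key in tags) / len(tagged)
        for key in keys
    }


# Hypothetical ways, for illustration only.
ways = [
    {"highway": "residential", "sidewalk": "both", "surface": "asphalt", "width": "2"},
    {"highway": "residential", "sidewalk": "left", "surface": "paving_stones"},
    {"highway": "residential", "sidewalk": "yes"},
    {"highway": "residential", "sidewalk": "right", "incline": "3%"},
    {"highway": "service", "surface": "gravel"},  # no sidewalk tag, so not in the denominator
]
coverage = attribute_coverage(ways)
print(coverage)
```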
Furthermore, within the CAP4Access project, there is a need for an area-aggregated quality assessment approach in order to understand the process of sidewalk data enrichment in OSM over different time periods. For this reason, we have extended a web service called OSMatrix [64] for the completeness check of sidewalk information in OpenStreetMap data. The OSMatrix tool [65] is a web-based service for visual exploration and analysis of OSM data. It allows visualizing information on user contributions and the existence of certain features and attributes. The tool contains a graph-based visualization of a measure or attribute over time. In addition, it allows producing comparison maps for a certain attribute or measure for different pre-selected timestamps. The relevant attributes and measures are aggregated into hexagonal cells with an edge length of 1 km. For the intrinsic evaluation of sidewalk data completeness, we examined the following attributes with regard to availability in the OSM dataset for five different timestamps: 2008, 2010, 2012, 2015 and 2016. The sidewalk measures include: total amount of sidewalk information, sidewalk width, sidewalk surface texture, sidewalk incline, and sidewalk smoothness. Figure 3 shows an example of the service portraying the total amount of sidewalk information (tags in OpenStreetMap related to sidewalk information) for an area of Heidelberg city center in April 2017. Each cell shows a value representing the aggregation of the total number of tags in OSM related to sidewalk attributes in that hexagon. Moreover, Figure 4 demonstrates the temporal functionality of OSMatrix in terms of comparing the data availability of a certain feature (e.g., sidewalk information) for two different timestamps. This functionality is useful for understanding and controlling the enrichment of sidewalk data and could be used by mapping party organizers to collect data for certain regions where no sidewalk information is currently available [66], hence leading to sidewalk data enrichment.
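The cell-based aggregation and the comparison of two timestamps can be illustrated with the following Python sketch. It is not the OSMatrix implementation: it assumes planar coordinates in meters, pointy-top hexagons indexed by axial coordinates, and one point per sidewalk-related tag; the orientation and indexing of the actual OSMatrix cells may differ, and all names and sample points are hypothetical.

```python
import math
from collections import Counter


def hex_cell(x, y, edge_m=1000.0):
    """Axial (q, r) index of the pointy-top hexagon that contains the planar point (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / edge_m
    r = (2 / 3 * y) / edge_m
    # Round fractional cube coordinates to the nearest hexagon centre.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return (rx, rz)


def aggregate(points, edge_m=1000.0):
    """Count points (e.g., one per sidewalk-related tag) per hexagonal cell."""
    return Counter(hex_cell(x, y, edge_m) for x, y in points)


def change_between(old, new):
    """Per-cell difference in counts between two timestamps."""
    return {cell: new.get(cell, 0) - old.get(cell, 0) for cell in set(old) | set(new)}


# Hypothetical tag locations at two timestamps, for illustration only.
earlier = aggregate([(0, 0), (10, 10)])
later = aggregate([(0, 0), (10, 10), (20, -5), (1732, 0)])
change = change_between(earlier, later)
print(change)
```

Cells with a difference of zero or with no entries at either timestamp are the ones a mapping party organizer would presumably target first.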
Sustainability 2017, 9, 997 9 of 17 intrinsic evaluation of sidewalk data completeness, we examined the following attributes with regard to availability in the OSM dataset for five different timestamps: 2008, 2010, 2012, 2015 and 2016.The sidewalk measures include: total amount of sidewalk information, sidewalk width, sidewalk surface texture, sidewalk incline, and sidewalk smoothness.Figure 3 shows an example of the service portraying the total amount of sidewalk information (tags in OpenStreetMap related to sidewalk information) for an area of Heidelberg city center in April 2017.Each cell shows a value representing the aggregation of the total number of tags in OSM related to sidewalk attributes in that hexagon.Moreover, Figure 4 demonstrates the temporal functionality of OSMatrix in terms of comparing the data availability of certain feature (e.g., sidewalk information) for two different timestamps.This functionality is useful for understanding and controlling the enrichment of sidewalk data and could be used by mapping party organizers to collect data for certain regions where no sidewalk information is currently available [66]; hence, leading to sidewalk data enrichment.5), April 2017.Furthermore, to better understand the reasons behind the incompleteness of sidewalk information and the differences of statistics between various regions, we have used OSMatrix for the following data indicators:


• Total number of highway features;
• Total number of residential highway features only (i.e., residential streets);
• Length of highway with major type, including highways mapped as motorway, primary, secondary, tertiary and trunk;
• Length of highway with minor type, including highways mapped as residential, track, and service;
• The total area covered by residential facilities;
• Total number of buildings;
• Total number of OSM users that have contributed data in a given area.
Figure 3 shows a screenshot of OSMatrix for the city of Heidelberg in which the hexagonal cells covering the map are numbered. The values of all above-mentioned data indicators have been calculated for these cells and are presented in Table 5. An interesting point is that, although the total amount of sidewalk information seems to always have a direct positive relation with the total number of mapped highway features (as well as with the length of major and minor highways), this is not the case for the total number of residential streets, where one would expect more sidewalk information to be available (Figure 5a,b). An example is cell ID #1, where 172 residential streets have been mapped but no sidewalk information is available. On the other hand, only four residential streets are mapped in cell ID #1157086, while the total amount of sidewalk information in this area is 25. This means that highway features and residential streets are only two of the relevant features that might affect the mapping of sidewalk information. Another important feature is building information. However, our analysis shows that it is rather difficult to identify a relation between sidewalk information and the total number of buildings (Figure 5c).
Notes (Table 5): *: TN: Total Number; ^: TA: Total Area.
Furthermore, in cell ID #1158888, where the most sidewalk information is available, the level of completeness of all other features also seems to be high. Notably, this cell has the highest number of users who have edited the OSM database compared to the other cells (Figure 5d), which could suggest that the total number of users has a direct influence on the total amount of sidewalk information. However, this is not the case for several cells, including cell IDs #115887 and #1160692, where the total number of users is high but their contribution to mapping sidewalk data is extremely low (Figure 5d). This suggests that the type of user might also be very important to consider in this scenario. It might be the case that sidewalk information has been mapped by professional users whose activity was focused on parts of the city where the information was required for pedestrian/wheelchair accessibility programs.
Another important data indicator that could have a strong influence in this analysis is land use information. For instance, in cell ID #2, where no sidewalk information has been mapped, checking the land use shows that the area is mostly covered by forest and hills. Therefore, there are fewer actual sidewalks to be mapped in that specific area. This also applies to other residential indicators such as the total number of residential streets, the total area of residential facilities, and the total number of buildings (Table 5). However, cell ID #1 covers a residential area where a fairly large number of buildings and residential streets have been mapped, but no attempt has been made to map sidewalks. This might be because the contributors in that area are less aware of the importance of and need for sidewalk information and therefore have not considered mapping sidewalks.
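The relations discussed above were assessed by comparing line charts. One possible way to quantify them, which is not the method used in this study, is a correlation coefficient computed across cells. The following minimal sketch computes the Pearson coefficient between the sidewalk measure and each indicator; all per-cell values in it are hypothetical placeholders and are not taken from Table 5.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-cell values for illustration only; NOT taken from Table 5.
cells = [
    {"sidewalk": 0, "highways": 35, "buildings": 410, "users": 9},
    {"sidewalk": 12, "highways": 80, "buildings": 150, "users": 21},
    {"sidewalk": 25, "highways": 140, "buildings": 600, "users": 15},
    {"sidewalk": 3, "highways": 50, "buildings": 90, "users": 30},
]

sidewalk = [c["sidewalk"] for c in cells]
for indicator in ("highways", "buildings", "users"):
    r = pearson(sidewalk, [c[indicator] for c in cells])
    print(f"sidewalk vs. {indicator}: r = {r:.2f}")
```

A coefficient of this kind captures only linear association and says nothing about land use or contributor type, so it would complement rather than replace the cell-by-cell inspection.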

Conclusions and Future Work
Thanks to the openness and wide availability of OSM data, there is increasing interest in using them in projects across different application domains. Nevertheless, people are often skeptical about the usability of VGI because of its quality issues (e.g., heterogeneity, unpredictability, credibility, ambiguity, inaccuracy, and incompleteness), which arise because the data are collected through crowdsourcing. OpenStreetMap data have heterogeneous characteristics because contributors use different tools and technologies, have different backgrounds and knowledge, and have different motivations for their activities. In fact, this heterogeneity is the main reason for the varying quality of the data. For this reason, the quality of OSM data needs to be evaluated before the data are used in projects.
In this paper, we have studied the completeness of sidewalk information in OSM in order to check its fitness for use in routing and navigation applications for people with limited mobility. First, through an extrinsic quality analysis, we evaluated the completeness of sidewalk data in Heidelberg by comparing it to a reference dataset. The results showed that about 22.5% and 17.6% of sidewalks have been mapped in the OSM database with regard to the total number of sidewalks and the total length of sidewalks, respectively (Table 1). The research continued with an intrinsic quality analysis of OSM data for five selected cities in Germany. In this respect, we assessed the number of highway objects in OSM that have a sidewalk tag assigned and compared it to the total number of highways in OSM in the respective city (Table 2).
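The two extrinsic figures reported above differ because one is based on the number of sidewalks and the other on their length. A minimal sketch of both ratios follows. It assumes that the matching step has already flagged each reference sidewalk segment as found or not found in OSM; the function name and the segment values are hypothetical and do not come from Table 1.

```python
def extrinsic_completeness(reference_segments):
    """Return (count-based %, length-based %) completeness.

    `reference_segments` is a list of (length_m, matched) pairs, where
    `matched` is True if a corresponding OSM sidewalk was found for that
    reference segment (the matching itself is assumed to be done upstream).
    """
    if not reference_segments:
        raise ValueError("reference dataset is empty")
    total_count = len(reference_segments)
    total_length = sum(length for length, _ in reference_segments)
    if total_length <= 0:
        raise ValueError("total reference length must be positive")
    matched_count = sum(1 for _, matched in reference_segments if matched)
    matched_length = sum(length for length, matched in reference_segments
                         if matched)
    return (100.0 * matched_count / total_count,
            100.0 * matched_length / total_length)


# Hypothetical reference segments: (length in metres, matched in OSM?).
segments = [(100.0, True), (300.0, False), (50.0, True), (150.0, False)]
by_count, by_length = extrinsic_completeness(segments)
```

Here half of the segments are matched, but they account for only a quarter of the total length, which shows how the count-based value can exceed the length-based one when the mapped sidewalks tend to be shorter.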
The percentage of highway features covered with sidewalk information gives us an understanding of the level of completeness of sidewalk objects in OSM data. It is important to note that the completeness of highway objects in general is unknown, and reference datasets would be needed to estimate it through an extrinsic quality check. However, as discussed in Section 2, other studies have shown that the OSM dataset seems to be complete with respect to road network data, especially in major cities [15]. In this study, we performed an extrinsic analysis for Heidelberg, and the results showed that the total number of sidewalks (6398 in Heidelberg) is much lower than the total number of highways in OSM (10,178 in Heidelberg); more than half in this special case. The same could be expected for other cities as well. We then introduced a list of sidewalk attributes that are relevant for our study and assessed the amount of this information (number of available tags) for each city. Again, the percentage of coverage of such attribute information was calculated and presented. The results show that the OSM dataset is complete for neither sidewalk objects nor their attributes. This status could, however, change by raising awareness in the OSM community of the importance of such information and by attempting to complete OSM in this respect. Raising awareness and public engagement for sidewalk data collection is one of the aims of the CAP4Access project.
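The intrinsic coverage percentages described above can be sketched as the share of highway objects that carry a given tag key. The code below is a minimal illustration under stated assumptions: the input is a list of tag dictionaries for highway ways, and the keys in the example (`sidewalk`, `sidewalk:both:surface`, `sidewalk:both:width`) are illustrative choices rather than the attribute list used in this study. The sample ways are hypothetical.

```python
def tag_coverage(ways, keys):
    """Percentage of ways that carry each of the given tag keys.

    `ways` is a list of dicts of OSM key/value pairs, one per highway way.
    """
    if not ways:
        raise ValueError("no highway ways given")
    return {
        key: 100.0 * sum(1 for tags in ways if key in tags) / len(ways)
        for key in keys
    }


# Hypothetical highway ways for one city.
ways = [
    {"highway": "residential", "sidewalk": "both",
     "sidewalk:both:surface": "asphalt"},
    {"highway": "residential"},
    {"highway": "tertiary", "sidewalk": "right"},
    {"highway": "service"},
]

coverage = tag_coverage(
    ways, ("sidewalk", "sidewalk:both:surface", "sidewalk:both:width")
)
```

Because the denominator is the number of highways present in OSM, such a figure measures coverage relative to what has been mapped and not relative to ground truth, which is the limitation noted above.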
Furthermore, we measured several other OSM data indicators, such as the total number of highways, the total number of residential highways, the total length of major and minor highways, the total number of buildings, and the total number of users who have contributed data in a given region of interest. Through a one-by-one comparison of these indicators, we discussed the potential relation between the completeness of sidewalk information and the completeness of other features. We also discussed that other indicators, such as land use information and/or the type of users (e.g., professional, naïve, etc.), could potentially affect the completeness of certain information (i.e., sidewalks).
It is concluded that information on sidewalk attributes is sparse in each city. This makes it possible to conclude that OSM data of relatively large spatial extents inside all studied cities could still be an acceptable region of interest to test and evaluate wheelchair routing and navigation, as long as other data quality parameters such as positional accuracy and logical consistency are checked and proved to be acceptable. For example, the city of Heidelberg (one of the pilot sites in the CAP4Access project) is an interesting case. Although the data are not at a good level of completeness in general, the figures indicate that values for each attribute are available to some degree (Table 4). This shows that it is possible to select certain region(s) inside the city where data completeness is acceptable.
Furthermore, the OSMatrix tool could be beneficial for piloting activities, as pilot site planners can query OpenStreetMap data and visualize the degree to which sidewalk information exists in a certain region of interest. This would allow them to identify the areas where data are mostly missing and to plan data collection events. On the other hand, as shown in this study, OSMatrix would also allow users to identify regions where the sidewalk information relevant for routing and navigation is sufficiently available, because at the same time it provides an understanding of data availability for other features such as road network and/or building information.
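As a minimal sketch of how such a query could support the planning of data collection events, the code below lists cells that contain residential streets but no sidewalk information, with the most street-rich cells first. The function and field names are hypothetical. The values for cells #1 and #1157086 are those quoted in the text, while the third cell is a made-up stand-in for an area without residential streets.

```python
def mapping_priorities(cells, min_streets=1):
    """Cell ids lacking sidewalk information, most residential streets first.

    `cells` maps a cell id to a dict with the (hypothetical) fields
    "residential_streets" and "sidewalk_tags".
    """
    candidates = [
        (cell_id, values["residential_streets"])
        for cell_id, values in cells.items()
        if values["sidewalk_tags"] == 0
        and values["residential_streets"] >= min_streets
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [cell_id for cell_id, _ in candidates]


cells = {
    # Values for these two cells are quoted in the text.
    "1": {"residential_streets": 172, "sidewalk_tags": 0},
    "1157086": {"residential_streets": 4, "sidewalk_tags": 25},
    # Hypothetical cell without residential streets (e.g., mostly forest).
    "forest_example": {"residential_streets": 0, "sidewalk_tags": 0},
}

priorities = mapping_priorities(cells)
```

The `min_streets` threshold keeps cells without residential streets out of the list, reflecting the observation that missing sidewalk tags in forested areas do not necessarily indicate a data gap.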
For future work within the CAP4Access project, research is needed to extract and derive sidewalk geometries from the information available in OSM for use in sidewalk network construction. The topological consistency of the derived sidewalk geometries would, however, need to be evaluated. A second important quality check is to evaluate the positional accuracy of the derived sidewalk geometries, as well as to provide a method for assigning the GPS-captured position of an end user (e.g., a person in a wheelchair) to the correct sidewalk on which the person is traveling. Last but not least, it is recommended to develop a collective tagging system dedicated to the project that allows wheelchair users to tag and update OSM data wherever they visit, as such information is not currently available. This would lead to the enrichment of OpenStreetMap data regarding sidewalk information.
In terms of research on OpenStreetMap quality analysis, it is understood that the results of an extrinsic analysis for a certain city cannot be generalized to other cities. Therefore, it is necessary to develop new methods and approaches for the intrinsic analysis of OSM datasets that consider the OSM data indicators themselves and their potential relationships with the results of extrinsic quality evaluation. Hence, for future work it is planned to develop a framework that systematically provides the methods and measures to evaluate the fitness for purpose of OSM data. The main objective is that this framework should also be usable when no reference dataset is available for comparison, making it applicable in a wide range of situations and extending the traditional approaches to spatial data quality evaluation. Such a framework should ideally benefit from a mathematical model representing the interrelationships between intrinsic OSM data indicators and quality indicators.

Figure 1. Workflow for matching and comparison of OpenStreetMap and reference dataset.

Figure 3. OSMatrix showing total amount of sidewalk information in Heidelberg, April 2017 (note that two cells numbered #1 and #2 in red have been manually inserted into the figure and are later addressed in Table 5).

Figure 4. Total amount of sidewalk information around Heidelberg for different timestamps.

Figure 5. Comparison of line charts of sidewalk information with respect to other indicators ((a) highways; (b) residential streets; (c) buildings; (d) users).

Table 1. The statistical comparison of sidewalk data in OSM and Reference data for Heidelberg, Germany.

Table 2. The statistics of sidewalk data in OSM for selected cities.

Table 4. Statistics for completeness of sidewalk information at attribute level in OpenStreetMap.

Table 5. Statistics for various data indicators in OpenStreetMap for Heidelberg.