In our study, we perform both extrinsic and intrinsic analyses in order to gain a solid understanding of sidewalk completeness in OSM. However, the extrinsic quality analysis is carried out only for Heidelberg, Germany, since we have access to a sidewalk reference dataset for this city. For the intrinsic quality analysis, we selected more cities so that our results could be generalized more readily to the country level. The cities selected for intrinsic analysis are the capital of Germany (Berlin); two large and densely populated cities, Munich and Hamburg; and two smaller cities, Heidelberg and Freiburg. The selection thus covers both large and small cities with different population densities. In addition, Freiburg was selected specifically because previous projects have already addressed the definition and collection of sidewalk information in OSM for this city. We would therefore like to see in our statistical results whether, and to what extent, these efforts affect the completeness of the data for this city compared to the others.
3.2. Intrinsic Quality Analysis
In addition to the extrinsic analysis, we performed an intrinsic analysis of OSM sidewalk data completeness by looking only at the OSM data itself. This analysis was carried out for five selected cities in Germany (Table 2). As stated earlier, a completeness check can be performed at the object level as well as at the attribute and value level. For the intrinsic analysis of OSM, we need to analyze data completeness at the attribute level, because sidewalk information is modeled in OSM as tags (attributes) of road objects. Other modeling options exist, such as mapping sidewalk geometries together with their attributes, which would be especially beneficial for pedestrian/wheelchair routing. However, such proposals are controversial within the OSM community, and there have been several good reasons for not selecting this modeling option, mainly simplicity of data collection and a smaller dataset size. Therefore, in this step, we analyzed the OSM dataset at the attribute level. To assess the number of sidewalk objects, we counted the road segments (e.g., highways in OSM; please note that the term “highway” has a different meaning in the OSM community, where it is used for labelling all linear passable routes) that have a sidewalk tag attached to them (e.g., a highway = footway tag with an assigned footway = sidewalk tag). This means that the route object also has sidewalk information attached to it. Four cases can occur. In the first case, the value of the sidewalk tag is “none” or “no”, which means that the way has no sidewalk beside it. In the second case, the value is “both”, meaning that the road has sidewalks on both its left and right sides. In the third case, the value is “left” or “right”, meaning that the road has only one sidewalk and that its side is known. In the fourth case, the value is simply “yes”, meaning that the way has a sidewalk on one or both sides; if there is only one sidewalk, the side on which it lies is unknown.
The particular value of this tag makes no difference for our completeness check. What matters is that the road segment has a sidewalk tag assigned to it, which means that information for a routing and navigation service can be derived from this feature. Where no sidewalk tag is defined for a road segment, the object is counted as incomplete with respect to sidewalk information. Therefore, for the sidewalk object completeness check, we count all road segments (streets, highways, and footways) that have a sidewalk tag attached to them and compare this count to the total number of road segments to estimate the number of inconsistencies. Figure 2 gives a visual impression of road segments having sidewalk information in a small part of Heidelberg. As depicted, the dataset lacks sidewalk information for about 40% of this part of the city. It is important to note that there are several regions in the city where no sidewalk information exists at all. Furthermore, Table 2 gives the statistics on the number of ways with sidewalk objects in all selected cities. Note that all features with a highway tag in OSM have been included in the analysis. In reality, however, certain highway features do not and should not have sidewalks, whereas sidewalk information is logically expected for residential highway features. Therefore, we later refined this analysis by separating the statistics for highway types into two categories, major and minor.
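The object-level check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used to produce Table 2: each way is assumed to be available as a plain dictionary of its OSM tags, and the function name `sidewalk_tag_completeness` and the sample data are hypothetical.

```python
from typing import Iterable, Mapping


def sidewalk_tag_completeness(ways: Iterable[Mapping[str, str]]) -> float:
    """Return the share of highway ways that carry sidewalk information."""
    total = 0
    tagged = 0
    for tags in ways:
        if "highway" not in tags:
            continue  # only road segments (OSM "highway" features) are counted
        total += 1
        # Any value of the sidewalk tag (no/none, both, left, right, yes)
        # counts as information being present; footway=sidewalk counts too.
        if "sidewalk" in tags or tags.get("footway") == "sidewalk":
            tagged += 1
    return tagged / total if total else 0.0


# Hypothetical sample: five highway ways, three of them with sidewalk information.
sample_ways = [
    {"highway": "residential", "sidewalk": "both"},
    {"highway": "residential"},
    {"highway": "primary", "sidewalk": "no"},
    {"highway": "footway", "footway": "sidewalk"},
    {"highway": "service"},
    {"building": "yes"},  # not a road segment, ignored
]
completeness = sidewalk_tag_completeness(sample_ways)
```

Note that a way tagged sidewalk = no counts as complete here, in line with the text: the check measures whether sidewalk information is present, not whether a sidewalk exists.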
For the second level of the completeness check, we assessed the number of additional sidewalk-related attributes (e.g., width, incline, etc.) present on the roads that carry a sidewalk tag. For this purpose, we used the sidewalk attributes relevant for routing and navigation of people with limited mobility. The attributes were adopted from another study [3], except for lit, crossing and general access, which seemed irrelevant to our project (Table 3).
Results of the attribute completeness check are given in Table 4. The assessment shows that for larger cities with high population density, such as Berlin, the numbers of sidewalk objects and their attributes are higher, and hence the OSM data for these regions are more complete than for smaller cities. This might be due to the higher number of OSM volunteers in these regions, which is directly related to the higher population. This issue has already been addressed in several research studies [22]. However, this is not always the case: there are other major cities with large populations where such information is predominantly missing (e.g., Munich).
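The attribute-level check can be illustrated in the same way. The sketch below assumes that ways are given as tag dictionaries and that the attributes are stored under the plain keys `width`, `incline`, `surface` and `smoothness`; these key names and the function `attribute_completeness` are assumptions for illustration, and the actual attribute list is the one given in Table 3.

```python
from collections import Counter

# Assumed tag keys for the sidewalk attributes (illustrative only).
ATTRIBUTE_KEYS = ("width", "incline", "surface", "smoothness")


def attribute_completeness(ways, keys=ATTRIBUTE_KEYS):
    """For ways that carry a sidewalk tag, return the share having each attribute."""
    tagged = [tags for tags in ways if "sidewalk" in tags]
    counts = Counter()
    for tags in tagged:
        for key in keys:
            if key in tags:
                counts[key] += 1
    n = len(tagged)
    return {key: (counts[key] / n if n else 0.0) for key in keys}


# Hypothetical sample: two ways with a sidewalk tag, one without.
sample = [
    {"highway": "residential", "sidewalk": "both", "width": "2", "surface": "asphalt"},
    {"highway": "residential", "sidewalk": "left", "surface": "paving_stones"},
    {"highway": "primary"},
]
shares = attribute_completeness(sample)
```

The denominator is the number of ways with a sidewalk tag, not the number of all ways, which matches the second-level check described above.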
Furthermore, the results show that although the OSM community has made some contribution to mapping sidewalks and their attributes, the total coverage of this information remains very low. This is problematic when the data are to be used for a wheelchair routing and navigation service in the city. This situation could change, however, by raising awareness of the importance of such information within the OSM community and by attempting to complete OSM with respect to this information. Although OSM data are not complete enough for a whole city to be used by a wheelchair routing and navigation service, a fine-scale inspection can still reveal several large regions inside cities that might have an acceptable level of quality in terms of sidewalk data availability. Therefore, it is suggested to use the OSM data of these regions for testing and implementing routing and navigation services.
Furthermore, within the CAP4Access project there is a need for an area-aggregated quality assessment approach in order to understand the process of sidewalk data enrichment in OSM over different time periods. For this reason, we have extended a web service called OSMatrix [64] to support the completeness check of sidewalk information in OpenStreetMap data. The OSMatrix tool [65] is a web-based service for visual exploration and analysis of OSM data. It allows visualizing information on user contributions and the existence of certain features and attributes. The tool contains a graph-based visualization of a measure or attribute over time. In addition, it allows producing comparison maps for a certain attribute or measure for different pre-selected timestamps. The relevant attributes and measures are aggregated into hexagonal cells with an edge length of 1 km. For the intrinsic evaluation of sidewalk data completeness, we examined the availability of the following attributes in the OSM dataset for five different timestamps: 2008, 2010, 2012, 2015 and 2016. The sidewalk measures include: total amount of sidewalk information, sidewalk width, sidewalk surface texture, sidewalk incline, and sidewalk smoothness.
shows an example of the service portraying the total amount of sidewalk information (tags in OpenStreetMap related to sidewalk information) for an area of Heidelberg city center in April 2017. Each cell shows a value representing the total number of OSM tags related to sidewalk attributes in that hexagon. Moreover, Figure 4 demonstrates the temporal functionality of OSMatrix for comparing the data availability of a certain feature (e.g., sidewalk information) at two different timestamps. This functionality is useful for understanding and monitoring the enrichment of sidewalk data and could be used by mapping party organizers to collect data for regions where no sidewalk information is currently available [66], hence leading to sidewalk data enrichment.
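The two-timestamp comparison can be thought of as a per-cell difference of aggregated tag counts. The following Python sketch is not the OSMatrix implementation; it assumes the per-cell counts have already been exported as dictionaries keyed by cell ID, and the function names and the sample values are hypothetical.

```python
def sidewalk_enrichment(counts_then, counts_now):
    """Per-cell change in the number of sidewalk tags between two timestamps."""
    cells = set(counts_then) | set(counts_now)
    return {cell: counts_now.get(cell, 0) - counts_then.get(cell, 0) for cell in cells}


def cells_without_sidewalk_info(counts_now, all_cells):
    """Cells with no sidewalk tags at the later timestamp (candidate mapping targets)."""
    return sorted(cell for cell in all_cells if counts_now.get(cell, 0) == 0)


# Hypothetical counts for three cells at an earlier and a later timestamp.
earlier = {"A": 0, "B": 5}
later = {"A": 12, "B": 5, "C": 0}

change = sidewalk_enrichment(earlier, later)
targets = cells_without_sidewalk_info(later, ["A", "B", "C"])
```

Cells that still have a count of zero at the later timestamp are the ones a mapping party organizer might prioritize.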
Furthermore, to better understand the reasons behind the incompleteness of sidewalk information and the differences of statistics between various regions, we have used OSMatrix for the following data indicators:
Total number of highway features;
Total number of residential highway features only (i.e., residential streets);
Length of highway with major type including highways mapped as motorway, primary, secondary, tertiary and trunk;
Length of highway with minor type including highways mapped as residential, track, and service;
The total area covered by residential facilities;
Total number of buildings;
Total number of OSM users that have contributed data in a given area.
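The major/minor split used in the indicators above can be expressed as a simple lookup on the value of the highway tag. The sketch below assumes each way is given as a pair of highway value and length in metres; the function names, the "other" bucket for unlisted types, and the sample lengths are illustrative assumptions.

```python
# Highway values listed in the text for each category.
MAJOR_TYPES = {"motorway", "primary", "secondary", "tertiary", "trunk"}
MINOR_TYPES = {"residential", "track", "service"}


def highway_class(highway_value):
    """Classify a highway tag value as 'major', 'minor' or 'other'."""
    if highway_value in MAJOR_TYPES:
        return "major"
    if highway_value in MINOR_TYPES:
        return "minor"
    return "other"  # types not listed in either category (assumption)


def length_by_class(ways):
    """Sum way lengths (in metres) per class; ways are (highway_value, length) pairs."""
    totals = {"major": 0.0, "minor": 0.0, "other": 0.0}
    for value, length_m in ways:
        totals[highway_class(value)] += length_m
    return totals


# Hypothetical ways inside one hexagonal cell.
cell_ways = [("primary", 1200.0), ("residential", 300.0), ("service", 50.0), ("footway", 80.0)]
totals = length_by_class(cell_ways)
```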
demonstrates a screenshot of OSMatrix for Heidelberg with the hexagonal cells that cover the map numbered. The values of all above-mentioned data indicators have been calculated for these cells and are presented in Table 5. An interesting point is that although the total amount of sidewalk information always seems to be positively related to the total number of mapped highway features (as well as to the length of major and minor highways), this is not the case for the total number of residential streets, where one would expect more sidewalk information to be available (Figure 5a,b). An example is cell ID #1, where 172 residential streets have been mapped but no sidewalk information is available. On the other hand, only four residential streets are mapped in cell ID #1157086, while the total amount of sidewalk information in this area is 25. This means that highway features and residential streets are only two of the relevant features that might affect the mapping of sidewalk information. Another important feature is building information. However, our analysis shows that it is rather difficult to understand the relation between sidewalk information and the total number of buildings (Figure 5).
Furthermore, in cell ID #1158888, where the most sidewalk information is available, the level of completeness of all other features also appears to be high. Notably, this cell has the largest number of users who have edited the OSM database of all cells (Figure 5d), which could suggest that the total number of users has a direct influence on the total amount of sidewalk information. However, this is not the case for various cells, including cell IDs #115887 and #1160692, where the total number of users is high but their contribution to mapping sidewalk data is extremely low (Figure 5d). This suggests that the type of user might also be important to consider. It may be that sidewalk information has been mapped by professional users whose activity was focused on parts of the city where the information was required for pedestrian/wheelchair accessibility programs.
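One way to put a number on the relations discussed above would be a correlation coefficient computed over the cells, for example between the number of contributing users and the amount of sidewalk information per cell. The paper itself relies on visual comparison, so the Pearson coefficient below is a swapped-in technique offered only as a sketch; the function name and the input values are hypothetical and are not taken from Table 5.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences of per-cell values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one indicator is constant
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-cell values: number of users and amount of sidewalk information.
users_per_cell = [10, 20, 30, 40]
sidewalk_info_per_cell = [1, 2, 3, 4]
r = pearson(users_per_cell, sidewalk_info_per_cell)
```

A coefficient near 1 would support a direct relation, while cells such as those named above, with many users but little sidewalk data, would pull it down.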
Another data indicator that could have a strong influence in this analysis is land use information. For instance, for cell ID #2, where no sidewalk information has been mapped, checking the land use shows that the area is mostly covered by forest and hills. Therefore, there are fewer actual sidewalks to be mapped in that area. This also applies to other residential indicators such as the total number of residential streets, the total area of residential facilities and the total number of buildings (Table 5). However, cell ID #1 covers a residential area where quite a large number of buildings and residential streets have been mapped, but no attempt has been made to map sidewalks. This might be because the users in that area are less aware of the importance of and need for sidewalk information and therefore have not considered mapping sidewalks.