The widespread availability of advanced web technologies, improvements in mapping devices and positioning systems, and users' growing need for spatial information have ushered in an era of massive citizen-collected geographic data. OpenStreetMap (OSM) and Wikimapia [1] are well-known examples, both of which evolve through users' contributions [2]. OSM has become one of the largest and most successful volunteered geographic information (VGI) projects by creating and maintaining a free online map with global coverage that any user can edit [3]. The growth in the number of users and in the volume of data compiled in recent years reflects the expanding community behind these contributions [4]. Notably, all OSM data, including ways, nodes, and relations together with their previous versions, are kept in the OSM data history, which contains all the spatial and semantic information of each object along with information about the users who edited it [5]. Unlike other VGI projects, OSM not only gives users access to the latest version of an object but also allows them to download its previous versions and the full edit history [6]. Because OSM is open to contributors with any level of mapping experience, its quality must be carefully investigated before the data are adopted for any application [7]. Investigating data quality, and mechanisms to improve and enrich it, has therefore been on the research agenda of the VGI community [9].
Research on VGI quality, and on OSM in particular, can be divided into three categories, as illustrated in Figure 1. In the first category, OSM data are compared against a reference dataset produced by a governmental or private organization, with a focus on quantitative analysis of data quality [10]. This category accounts for a considerable share of the research on VGI quality. For instance, Haklay [12] compared OSM data for London with the official dataset produced by Ordnance Survey. Mohammadi and Malek [17] compared OSM data with the dataset of the Iranian National Cartographic Center and estimated the accuracy of the OSM data by extracting a model from the corresponding reference data. Lyu et al. [18] proposed an evaluation approach based on symmetric arc similarity and assessed the geometric quality of a VGI road network by its deviation from the corresponding reference road network. Brovelli et al. [19] presented an approach for evaluating two data quality elements, completeness and positional accuracy, based on differences from the reference data, in which the parameters of the quality assessment can be set according to users' needs. Studies of this kind have been conducted in countries with diverse contribution patterns, e.g., Germany [13], Iran [15], England [12], France [11], and Greece [10]. Nonetheless, these methods suffer from the high cost of, and restricted access to, highly accurate proprietary data, and the quality of the reference data itself may be questionable.
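To make the reference-based comparison concrete, the sketch below computes two simple measures for road polylines in a projected coordinate system (assumed here to be in metres): a length-ratio completeness (total OSM road length divided by total reference road length) and a mean positional offset (average distance of OSM vertices to the nearest reference segment). These are generic, simplified measures chosen for illustration, not the specific procedures of the studies cited above, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # projected (x, y) in metres (assumption)
Line = List[Point]           # a road polyline


def line_length(line: Line) -> float:
    """Sum of the segment lengths of a polyline."""
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment
        return math.dist(p, a)
    # Project p onto the segment's line and clamp to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def distance_to_network(p: Point, network: List[Line]) -> float:
    """Distance from p to the nearest segment of any line in the network."""
    return min(
        point_segment_distance(p, a, b)
        for line in network
        for a, b in zip(line, line[1:])
    )


def completeness_ratio(osm: List[Line], reference: List[Line]) -> float:
    """Total OSM road length divided by total reference road length."""
    return sum(map(line_length, osm)) / sum(map(line_length, reference))


def mean_positional_offset(osm: List[Line], reference: List[Line]) -> float:
    """Mean distance of all OSM vertices to the reference network."""
    vertices = [p for line in osm for p in line]
    return sum(distance_to_network(p, reference) for p in vertices) / len(vertices)


if __name__ == "__main__":
    reference = [[(0.0, 0.0), (100.0, 0.0)]]
    osm = [[(0.0, 2.0), (80.0, 2.0)]]  # shorter and shifted by 2 m
    print(completeness_ratio(osm, reference))      # 0.8
    print(mean_positional_offset(osm, reference))  # 2.0
```

A ratio above 1 can signal over-completeness (e.g., duplicated ways) rather than better coverage, so in practice the length ratio is usually read together with a positional measure and computed per grid cell rather than for a whole city.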
In the second category, researchers examined data quality by linking it to users' behavior. For example, Arsanjani et al. [21] investigated several quality elements, including positional accuracy, completeness, and semantic accuracy, in relation to the users who contributed the data. Researchers in this category have also explored the impact of the number of contributors on data quality [22].
Research in the third category does not use a reference dataset; instead, recent studies have proposed methods that assess VGI quality on the basis of data history [25], using the information captured in the history of OSM data as a quality assessment tool [27]. For instance, Keßler and De Groot [27] proposed a set of indicators derived from historical data, namely, the numbers of versions, contributors, confirmations, tag corrections, and rollbacks, for quality evaluation. Sehra et al. [30] used history file information for India to assess the completeness of spatial data with intrinsic indicators. Comparing the data history files from January 2016 and February 2017, they determined road length, the completeness of road-name and maximum-speed attributes, and semantic accuracy. D'Antonio et al. [31] introduced a model to evaluate VGI feature trustworthiness and user reputation, assessing data validity on the basis of the information history, e.g., creation, modification, deletion, and topology. In another study, Touya et al. [32] analyzed and interpreted VGI quality and examined its evolution over time by proposing a combined method that uses reference data, object histories, and spatial relationships. Studies in the third category also investigate information quality directly, taking into account components other than geometry. In an investigation of the OSM history databases of England and Ireland in 2012, the authors reviewed the number of users and their information, the edits, the number of geometric changes to each object, and the tags attributed to each object. Their results showed that 11% of the users created or edited 87% of the spatial data more than 15 times [33].
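Indicators of the kind proposed by Keßler and De Groot [27] can be approximated from the ordered versions of a single feature. The sketch below uses simplified definitions that are our own assumptions: a "confirmation" is taken to be a new version by a different user that leaves the tags unchanged, and a "rollback" a version that restores the tags and nodes of an earlier, non-adjacent version. The original indicator definitions may differ, and the `Version` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Version:
    """One version of an OSM way (hypothetical container)."""
    version: int
    uid: int
    tags: Dict[str, str]
    nodes: Tuple[int, ...] = ()  # ordered node references


def history_indicators(versions: List[Version]) -> Dict[str, int]:
    """Simplified history-based indicators for one feature.

    Assumed definitions (illustrative, not the original ones):
    - tag_corrections: versions whose tags differ from the previous version
    - confirmations: versions by a different user that keep the tags unchanged
    - rollbacks: versions that differ from the previous version but restore
      the tags and nodes of some earlier version
    """
    vs = sorted(versions, key=lambda v: v.version)
    tag_corrections = confirmations = rollbacks = 0
    for i in range(1, len(vs)):
        prev, cur = vs[i - 1], vs[i]
        if cur.tags != prev.tags:
            tag_corrections += 1
        elif cur.uid != prev.uid:
            confirmations += 1
        state = (cur.tags, cur.nodes)
        if state != (prev.tags, prev.nodes) and any(
            state == (old.tags, old.nodes) for old in vs[: i - 1]
        ):
            rollbacks += 1
    return {
        "versions": len(vs),
        "contributors": len({v.uid for v in vs}),
        "tag_corrections": tag_corrections,
        "confirmations": confirmations,
        "rollbacks": rollbacks,
    }


if __name__ == "__main__":
    history = [  # hypothetical versions of one residential street
        Version(1, 10, {"highway": "residential"}, (1, 2)),
        Version(2, 11, {"highway": "residential", "name": "A"}, (1, 2)),
        Version(3, 12, {"highway": "residential"}, (1, 2)),  # restores v1
        Version(4, 13, {"highway": "residential"}, (1, 2)),  # confirms v3
    ]
    print(history_indicators(history))
    # {'versions': 4, 'contributors': 4, 'tag_corrections': 2, 'confirmations': 1, 'rollbacks': 1}
```

In this simplified scheme, a rollback that also changes tags is counted as a tag correction as well; keeping the two counts disjoint would be an equally valid design choice.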
Although a number of VGI studies have addressed data quality, quality enhancement has so far received little attention. The current research proposes an approach to enhance the positional accuracy of the OSM road network by producing new data from the data history and to minimize the uncertainty in the version provided to users. To this end, it presents a five-stage approach, based on a Voronoi diagram, that improves the quality of the linear road network in OSM using long-term historical edits. The proposed method is implemented in a case study of Tehran, Iran, and the results are presented.
The remainder of this paper is organized as follows. Following the introduction in Section 1, Section 2 introduces the data history file. Section 3 presents the theoretical background of this research. Section 4 explains the proposed method in detail. Section 5 presents the case study, the proposed framework, and the obtained results. Section 6 draws conclusions and offers suggestions for future study.
2. OSM History File
The OSM history file records every edit made over the lifetime of OSM, including contributions that were later deleted by other contributors. It also stores changes to the attributes and geometry of OSM objects, including edits whose uid (i.e., user ID) and username are unknown [5]. The complete OSM data are publicly available at http://planet.openstreetmap.org/ in XML and PBF formats. Because it holds the complete edit history, the history file, which is updated weekly, is very large; for example, the history file updated on 29 June 2017 is roughly 37 GB [35].
For users who need data for a specific area, analyzing and processing such a file is time-consuming and complicated. To save processing time, subsets of the history file covering smaller areas can be obtained from other repositories, for example, http://osm.personalwerk.de/full-history-extracts/ and Geofabrik.de [5]. The information in the history file is organized by object type (node, way, relation), object ID, and object version.
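As a sketch of how such a file can be read, the snippet below uses Python's standard `xml.etree.ElementTree` module to group `node`, `way`, and `relation` elements by type and ID and to order each group by version. The sample XML is a hypothetical, heavily shortened excerpt in the OSM XML format, and the dictionary layout and function name are assumptions for illustration. For full-size history files, a streaming parser such as `ElementTree.iterparse` (or a PBF reader) would be needed instead of loading the whole document into memory.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical, shortened excerpt in the OSM XML format.
SAMPLE = """<osm version="0.6">
  <way id="1" version="2" timestamp="2012-05-01T10:00:00Z" uid="20" user="b">
    <nd ref="101"/><nd ref="102"/><nd ref="103"/>
    <tag k="highway" v="residential"/><tag k="name" v="Example Street"/>
  </way>
  <way id="1" version="1" timestamp="2010-03-01T10:00:00Z" uid="10" user="a">
    <nd ref="101"/><nd ref="103"/>
    <tag k="highway" v="residential"/>
  </way>
  <node id="101" version="1" timestamp="2010-03-01T09:59:00Z" uid="10" user="a" lat="35.7" lon="51.4"/>
</osm>"""


def group_versions(xml_text: str) -> Dict[Tuple[str, int], List[dict]]:
    """Group OSM elements by (type, id) and sort each group by version."""
    groups: Dict[Tuple[str, int], List[dict]] = defaultdict(list)
    root = ET.fromstring(xml_text)
    for elem in root:
        if elem.tag not in ("node", "way", "relation"):
            continue
        uid = elem.get("uid")  # may be absent for anonymous edits
        groups[(elem.tag, int(elem.get("id")))].append({
            "version": int(elem.get("version")),
            "timestamp": elem.get("timestamp"),
            "uid": int(uid) if uid is not None else None,
            "user": elem.get("user"),
            # Way geometry: ordered node references.
            "nodes": [int(nd.get("ref")) for nd in elem.findall("nd")],
            # Semantic information: key/value tags.
            "tags": {t.get("k"): t.get("v") for t in elem.findall("tag")},
        })
    for versions in groups.values():
        versions.sort(key=lambda v: v["version"])
    return dict(groups)


if __name__ == "__main__":
    for (kind, oid), versions in group_versions(SAMPLE).items():
        print(kind, oid, [v["version"] for v in versions])
    # way 1 [1, 2]
    # node 101 [1]
```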
As illustrated in Figure 2, the information related to an object can be categorized into (a) geometric information; (b) semantic information; and (c) information about the user who produced the object. The geometric information includes the IDs of the nodes forming a linear object, while the semantic information comprises the object's ID, version, timestamp, road type, and road name. To illustrate this, Figure 3 shows the information recorded in the OSM history file for two revisions of Sepand Street (ID = 85479570), made in 2010 and 2012.
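The three categories of information, and the kind of revision-to-revision comparison shown in Figure 3, can be sketched as follows. The functions below split one way version into geometric, semantic, and user information and summarize what changed between two revisions. The dictionary keys, function names, and sample values are hypothetical (they do not reproduce the Sepand Street records), and road type is assumed to be stored in the `highway` tag. The node comparison is set-based, so a mere reordering of nodes is not reported.

```python
from typing import Dict


def split_version(v: dict) -> Dict[str, dict]:
    """Split one way version into geometric, semantic and user information."""
    return {
        "geometric": {"nodes": list(v["nodes"])},
        "semantic": {
            "id": v["id"],
            "version": v["version"],
            "timestamp": v["timestamp"],
            "road_type": v["tags"].get("highway"),
            "road_name": v["tags"].get("name"),
        },
        "user": {"uid": v.get("uid"), "user": v.get("user")},
    }


def diff_revisions(old: dict, new: dict) -> dict:
    """Summarize what changed between two versions of the same way."""
    old_nodes, new_nodes = set(old["nodes"]), set(new["nodes"])
    keys = old["tags"].keys() | new["tags"].keys()
    return {
        "nodes_added": sorted(new_nodes - old_nodes),
        "nodes_removed": sorted(old_nodes - new_nodes),
        # Tag key -> (old value, new value); None means the tag was absent.
        "tags_changed": {
            k: (old["tags"].get(k), new["tags"].get(k))
            for k in sorted(keys)
            if old["tags"].get(k) != new["tags"].get(k)
        },
        "same_user": old.get("uid") == new.get("uid"),
    }


if __name__ == "__main__":
    # Hypothetical revisions, not the real Sepand Street records.
    v1 = {"id": 1, "version": 1, "timestamp": "2010-03-01T10:00:00Z",
          "uid": 10, "user": "a", "nodes": [101, 103],
          "tags": {"highway": "residential"}}
    v2 = {"id": 1, "version": 2, "timestamp": "2012-05-01T10:00:00Z",
          "uid": 20, "user": "b", "nodes": [101, 102, 103],
          "tags": {"highway": "residential", "name": "Example Street"}}
    print(split_version(v2)["semantic"])
    print(diff_revisions(v1, v2))
```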
Because it contains all the spatial data of each object together with the users' information, the OSM history file enables statistical and quantitative analyses. By retaining the lineage of the historical data supplied by users, it also makes it possible to identify changes and evaluate the stability of the spatial information over time. To date, this file has been used to evaluate the accuracy of the dataset, to replace the reference dataset, and to perform statistical analyses, mostly of the spatial aspects or of the information about individual contributors [33].
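A simple example of such a statistical analysis is measuring how concentrated contributions are, in the spirit of findings that a small share of users produces most of the data. The sketch below computes the share of edits made by the most active fraction of users from a list of per-edit user IDs; the function name and the choice to count edits (rather than, say, edited objects) are assumptions.

```python
import math
from collections import Counter
from typing import Iterable


def top_user_share(edit_uids: Iterable[int], top_fraction: float) -> float:
    """Share of all edits made by the most active `top_fraction` of users.

    The number of top users is rounded up (at least one user is kept).
    """
    counts = sorted(Counter(edit_uids).values(), reverse=True)
    if not counts:
        return 0.0
    # Small tolerance so float error cannot push an exact integer just above it.
    k = max(1, math.ceil(top_fraction * len(counts) - 1e-9))
    return sum(counts[:k]) / sum(counts)


if __name__ == "__main__":
    # Hypothetical per-edit user IDs: user 1 made 8 of 10 edits.
    uids = [1] * 8 + [2, 3]
    print(top_user_share(uids, 1 / 3))  # 0.8
```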
6. Discussion and Conclusions
OSM, as the most successful VGI project, offers numerous unique merits, in particular the retrieval of historical contributions. Studies have shown that a larger number of contributors tends to result in better data quality in VGI. However, some contributors may also reduce the quality of a given feature, for example by deleting information about it. The historical contributions are therefore likely to contain useful information for improving data quality. Hence, the current research proposed an approach to improve the quality of OSM contributions using long-term historical edits. In this study, all historical edits from 2007 to 2017 were used to extract representative features for the street network of the study area. Tehran was chosen as the case study to explore the feasibility of the proposed approach in areas with few contributors and without a strong mapping community. Finally, the results were evaluated against a high-quality reference dataset. The resulting maps and quality measures reveal that our approach considerably enhanced the positional accuracy and completeness of the OSM data while resolving topological errors.
The results demonstrate the capability of the proposed method to improve the quality of VGI data. It should be noted, however, that the results are sensitive to the temporality of the contributions: in a dynamic landscape, some features may have changed within the chosen timeframe, while the reference data may not have been updated accordingly. The results of our approach, as well as the quality of the reference data, may therefore have been biased by data temporality. Future studies should focus on weighting OSM features by their currency so that the latest contributions receive higher weights.
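One possible currency weighting, offered here only as an illustration and not as the method of this paper, is exponential decay with a half-life: a contribution that is `half_life_days` older than a reference date receives half the weight of one made on that date. The half-life parameter, its default value, and the function name below are hypothetical choices; a linear or step-wise weighting would be equally plausible.

```python
from datetime import date
from typing import List


def currency_weights(edit_dates: List[date], reference_date: date,
                     half_life_days: float = 365.0) -> List[float]:
    """Normalized weights that halve for every `half_life_days` of age."""
    # Age in days relative to the reference date (e.g. the latest snapshot).
    raw = [0.5 ** ((reference_date - d).days / half_life_days) for d in edit_dates]
    total = sum(raw)
    return [w / total for w in raw]


if __name__ == "__main__":
    dates = [date(2017, 1, 1), date(2016, 12, 2)]  # hypothetical edit dates
    print(currency_weights(dates, date(2017, 1, 1), half_life_days=30))
```

Here the second edit is 30 days older than the reference date, so with a 30-day half-life its raw weight is half that of the first, giving normalized weights of about 2/3 and 1/3.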
One of the unique strengths of OSM is the availability of historical edits, which allows us to look behind the scenes of contributions and see how they have evolved over time. The history of OSM contributions is therefore a rich resource for a number of purposes, including (a) tracking the quality of contributions over time in order to enhance the latest version; (b) monitoring changes across the landscape, since OSM has been running for more than 14 years; (c) consequently, generating multi-temporal datasets rather than representing a single point in time; (d) mining users' mapping behavior, i.e., what they map correctly and incorrectly; (e) analyzing how persistently an object remains unchanged or is edited over time; (f) completing the semantic/attribute information of objects from their history; (g) cross-linking data quality with user behavior and users' socioeconomic background; and (h) ranking users' activities in OSM. Future studies should focus on exploiting object histories to address these issues.
Finally, the role of the number of contributors is evident when historical edits are used for quality enhancement. This factor may remain a challenge in areas with too few contributors mapping the region, which is especially critical in developing countries, where mapping communities are scarce compared with the active communities elsewhere in the world. OSM mapping communities should therefore extend mapping events and strengthen incentive mechanisms to encourage further user engagement worldwide.