Assessing OpenStreetMap Data Using Intrinsic Quality Indicators: An Extension to the QGIS Processing Toolbox

OpenStreetMap (OSM) is an emerging area of research in computational science, and several issues in the quality assessment of OSM data remain unexplored. Firstly, researchers typically apply established assessment methods that compare OSM with an authoritative dataset. These methods are unsuitable when authoritative data are not available; in such a scenario, intrinsic quality indicators can be used to assess the quality instead. Secondly, a framework for data assessment specific to different geographic information system (GIS) domains is not available. In this light, the current study presents an extension of the Quantum GIS (QGIS) processing toolbox that uses existing functionalities and new scripts to handle spatial data. This enables researchers to assess the completeness of spatial data using intrinsic indicators. The study also proposes a heuristic approach to test the road navigability of OSM data. The developed models are applied to OSM data for Punjab (India). The results suggest that the OSM project in Punjab (India) is progressing at a slow pace and that contributors need to be motivated to enhance the fitness of the data. It is concluded that the developed scripts provide an intuitive method to assess OSM data based on quality indicators and can be easily utilized for evaluating the fitness-of-use of the data of any region.


Introduction
Volunteered geographic information (VGI) offers an intuitive method for the collection of geographic information (GI) through volunteering at a very low cost [1][2][3][4][5]. Probably one of the best-known examples of VGI is the OSM project, to which users of varying mapping experience contribute [6]. This has become possible due to the availability of geo-location devices and more user-friendly software. The major driving forces behind the establishment and growth of OSM are restrictions on use, non-availability of data from various map providers, the high cost of procurement and the recency of data. Further, the OSM data are available for reuse under the Open Data Commons Open Database License (ODbL), which allows for sharing, creating and adapting the data provided that reuse is attributed [7]. OSM has encouraged 3,581,079 enthusiastic contributors across the globe to create 3,802,465,092 nodes, 401,428,744 ways and 4,885,160 relations [8]. It uses a topological data structure (XML format) that contains blocks of nodes, ways and relations to represent spatial features. Further, it contains tags (key-value pairs) for storing metadata about features.
The OSM project produces a huge amount of labeled geographical data contributed by users. However, the data suffer from significant limitations that come naturally with VGI data. The first and foremost limitation is uncertainty about data availability, i.e., when, which, how, from where and how much data will be contributed [9][10][11][12]. This is normally defined as the contribution patterns and contributors' motives. Further, the lack of knowledge about the contributors, their skills and their motivation leads to heterogeneous quality. Furthermore, the lack of unified standards and the absence of a top-down quality assurance model create the risk that the contributed data have a certain degree of error [13]. In addition, OSM provides an open classification tagging scheme (key-value pairs) that can lead to misclassification and a reduction in data quality [14]. The issues of vandalism and data imputation further deepen the quality concern. Moreover, the lack of clarity in license terms and their definitions also deters organizations from incorporating VGI into their datasets. This is due to the lack of longitudinal research on the legal aspects of utilizing VGI datasets [15]. These issues have raised a deep concern about the quality, reliability and fitness of data for GI application domains [16][17][18][19][20]. Many of these issues have been discussed in the vast literature. It is concluded that, despite being affected by many limitations, VGI data are still reasonably credible [1]. Hence, for creating trust in OSM, the data need to be assessed. Various established quality standards (http://ncl.sbs.ohio-state.edu/ica/3_spatial.html, accessed on 13 September 2016) exist for the assessment of data, but ISO TC211 [21] is primarily responsible for drafting international standards for quality assessment procedures. The data quality indicators identified by TC211 for the assessment of spatial data are completeness, logical or topological consistency, positional accuracy, semantic accuracy, attribute accuracy, temporal accuracy and lineage, which have been used in various studies [18,21,22,23,24,25]. For spatial data quality grading of OSM, all of these elements must be carefully assessed [4,26].
The lack of a suitable quality evaluation framework makes it difficult for potential users to assess OSM data [6,13]. The established quality evaluation procedures demand authoritative data against which to evaluate VGI. This is not always possible, because acquiring authoritative data is not an easy task owing to the non-availability of data, licensing terms and the high cost of acquisition [5]. It is suggested that under such a scenario, the existing quality measures become unsuitable for assessment [4,14]. Thus, for the analysis of VGI data when authoritative datasets are scarce, researchers tend to identify intrinsic quality indicators, such as the development of nodes, contributors and their behavior [6,27,28,29,30], by considering 'Linus's law' as a benchmark [6].
To fill this research gap, the current study has extended the capabilities of the processing toolbox of QGIS in alignment with previous researchers [19,26,31]. The intrinsic indicators used in this study are network length completeness, attribute completeness, semantic accuracy and a heuristic indicator to assess route navigability. The developed models utilize scripts written with the Python bindings for QGIS (PYQGIS) and are simply plug-and-play workflows. To demonstrate the developed models, the OSM dataset of Punjab, India, has been assessed as a case study using intrinsic quality indicators, and the results are presented.
The paper is divided into six sections. The next section discusses the data quality parameters and related research work. Data preparation and the methodology to extend the QGIS processing toolbox by developing PYQGIS scripts are described in the third section. The fourth section elaborates the results obtained from the different models, and the limitations of the current study are discussed in the fifth section. The conclusions drawn from the findings of the study are provided in the last section.

Data Quality Parameters and Related Work in OSM
The quality issue in VGI data is a clear challenge. Rigorous and longitudinal studies are required for creating trust in OSM data. There have been numerous research studies in recent years. Figure 1 presents the data of 485 papers collected from various bibliographic databases, such as the IEEE Xplore, Springer, MDPI, Mendeley and Zotero repositories. A review of these articles shows that many of them reported on the assessment of OSM data. In these studies, the quality of data has been judged based on the context of their application, which is a function of intangible properties represented using quality indicators [32,33]. Quantitative dimensions shared with GI review the completeness, logical consistency, positional, temporal and semantic consistency. Non-quantitative dimensions review lineage, purpose, usage and constraints. Further, dimensions exclusive to VGI review the believability, compliance and convergence [34]. The following sections discuss the related work by researchers to assess the completeness, attribute accuracy and semantic accuracy using intrinsic data quality indicators.

Completeness
The completeness of map data is the most important data quality element next to positional accuracy. Completeness of spatial data describes the existence of features [21,35,36,37] as compared to ground reality. Brassel et al. [35] have stated that the completeness of data focuses on the errors of omission and commission, whereas model completeness is an aspect of the fitness-of-use described under 'semantic accuracy' [38]. Further, completeness is divided into two parts: feature and attribute completeness [32,37]. Feature completeness is the percentage of features present in comparison to a reference dataset. Attribute completeness measures the relative completeness of attributes in OSM against the reference dataset (e.g., by comparing the road names in OSM to those in a reference dataset). Furthermore, the incompleteness of attributes has an impact on attribute accuracy, positional accuracy and logical consistency depending on their domains (geometric, topological, thematic, temporal, etc.).
For assessing the completeness, an established procedure is used to compare test data with reference data. The method requires external data that can be used as the ground truth, and for this reason, its measurement tends to be extrinsic. Here, the relative completeness of a road network is determined by calculating the total length of the roads of the test dataset within a predefined area, which is then compared to a reference dataset for the same area. However, due to the issues discussed earlier in Section 1, such comparisons are limited [18]. Therefore, suitable alternatives are necessary to assess completeness based on intrinsic indicators, such as the OSM tags and the history of the data [6,63]. In the context of internal completeness quality analysis, Devillers et al. [64] have discussed metadata and encouraged its development and analysis [64,65]. Metadata in OSM include the count of tags, the current maximum version for each tag, etc. [29]. Further, information about the contributors who upload and edit the map is also metadata. Researchers [6,27,39,40,66,67,68,69] have used various methods to assess the completeness of OSM using various intrinsic indicators, as discussed in Table 1.
Table 1. Studies on intrinsic completeness assessment of OpenStreetMap (OSM) data.

Researcher [Reference] | Dataset | Description
Kounadi [39], Ather [40] | OSM (Heathrow, U.K.) | The study analyzed road features without names in the attribute tables, and the total length of these roads was calculated and presented as a percentage.
Keßler and de Groot [27] | OSM (Altstadt, Heidelberg, Germany) | The research employed the term frequency-inverse document frequency (tf-idf) measure to evaluate the importance of tags related to the feature type.
Bégin et al. [67] | OSM (Canada) | The study used concave hulls for defining contributors' editing sessions to produce an image of contribution.
Razniewski and Nutt [66] | OSM (Lübbenau, Germany) | In this study, spatial operations, e.g., star join, were applied on the metadata of the spatial dataset to extract the "data completeness" of the area.
Gröchenig et al. [68] | OSM (London, U.K.) | The study used a methodology to assess regional data completeness by analyzing changes in community activity over time periods.
Forghani and Delavar [70] | OSM (Tehran, Iran) | The authors assessed OSM based on metrics such as minimum bounding geometry area and directional distribution (standard deviational ellipse) and applied fuzzy logic to identify the completeness of OSM data in gridded cells.
Ballatore and Zipf [69] | OSM (Selected regions of Germany and U.K.) | The study developed a conceptual framework for analyzing the completeness and other quality attributes of data based on intrinsic indicators.

Attribute Accuracy
Attribute accuracy describes the quality of quantitative attributes and the correctness of non-quantitative attributes [71]. The quantitative (or numeric) data can easily be compared, whereas qualitative (or text) data are more difficult to handle. The established metrics can be used to measure quantitative data. The non-quantitative attributes are not always correctly spelled and may contain abbreviations, especially in the OSM dataset, as contributors do not follow naming conventions. For measuring the similarity or lexicographical error between data, the text similarity approach is used. The results obtained for attribute accuracy are presented as the percentage correctness of numerical or text-based values associated with an attribute [34,46].
Various algorithms exist for text comparison, such as 'Soundex' and 'Metaphone'. However, these need to be customized for different languages. The 'Levenshtein' algorithm [72] measures the similarity between two strings by calculating the least number of edits needed to transform one string into the other [46,49,51,55,71], whereas the 'similar_text' function is a much simpler and faster one, which returns the number of similar characters between the two strings [73]. Barron et al. [74] have assessed the attributes and quantitative information of data. A kappa index method [75,76] has been used for quantitative attribute assessment based on the classification of land use cover classes [57,62].
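As a minimal sketch of the text similarity approach described above, the following Python code computes the Levenshtein distance with the standard dynamic programming scheme and derives a normalized similarity score from it. The function names and the normalization by the longer string are assumptions made for illustration; the `difflib`-based ratio is only a stand-in for a 'similar characters' style measure and is not equivalent to the 'similar_text' function cited in the text.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; normalizing by the longer name is an assumption."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def matching_ratio(a: str, b: str) -> float:
    """Stand-in for a 'similar characters' measure, based on difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()
```

A road name in the test data could then be compared with a candidate name, e.g., `name_similarity("G.T. Road", "GT Road")`, and accepted as a match above a threshold chosen by the analyst.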

Semantic Accuracy
Semantic accuracy is the quality with which spatial features are described in accordance with the selected model [38]. This information is defined by features, their attributes, values and relationships between features. The quality assessment process of a dataset allows measuring the 'distance' between the data and the perceived reality. Here, distance implies both geometric distance and semantic distance. In OSM, the semantic accuracy of map entities is described through tags [1]. Further, the quality indicators for semantic accuracy can be categorized as data-centric, user-centric and context-centric [4].
The data-centric assessment is based on extrinsic comparison with authoritative data and on intrinsic assessment methods. Further, the types of features and the attributes of features are used to measure the semantic similarity [57,71,77]. Al-Bakri and Fairbairn [78] have assessed semantic similarity for the possible integration of Ordnance Survey (OS) and OSM data. Ballatore and Zipf [69] have presented a conceptual model to assess the data quality elements using intrinsic indicators. Further, Jilani et al. [79] have used a machine learning approach to assess the semantic quality of the London OSM dataset. The accuracy has been obtained as the ratio of the number of correctly predicted instances to the total number of actual instances. Mülligann et al. [80] have introduced a semantic similarity measure based on the changes in the history of OSM elements.
The user contribution-centric approach for assessing the semantic accuracy relies on trusted contributors. Flanagin and Metzger [81] and Mooney and Corcoran [29] have identified and analyzed the motivations of contributors and their feature editing. Rehrl et al. [82] have elaborated the methods used by researchers to evaluate VGI data quality based on contributors' motivations, experience, etc. There have been many studies that have examined the motivation of trusted individuals [2,48,83,84], but such studies are considered generic for evaluation. Further, Quattrone et al. [10] have presented a methodology to quantitatively measure the content bias in contributed data and the maintenance efforts by users to provide a deeper analysis of data. They have found that no content bias exists between two segments of users (power users and crowd users), whereas significant geographic bias can occur with changes in culture. Quattrone et al. [11] have revealed that the maintenance of GI varies country-wise in terms of the spread of maintenance efforts, data, users involved, places of involvement and the motive that triggers users.
The third category to assess the semantic information is context-centric. It relies on the first law of geography as stated by Tobler [85]: "everything is related to everything else, but near things are more related than distant things". For example, an off-ramp is required to depart from a freeway at a small angle in the direction of traffic flow, and freeways have strict limits for the radius of curvature based on design speed [1,80]. Further, the LinkedGeoData ontology represents information in the form of tree structures with objects being connected by relationships (e.g., 'is_a') [16,86,87]. OSMonto, an ontology-based tool, was designed for use in a navigation web service by Codescu et al. [88] to provide context-centric assessment. Hopf et al. [89] have developed a methodology for the semantic assessment of prohibition signs in OSM data. Indeed, context-centric assessment rules are essential to provide an automated triage of a very large number of updates and corrections by contributors.
To summarize, researchers have used various established assessment methods by comparing OSM with authoritative datasets. However, very few studies have reported on the fitness-of-use of the dataset in developing countries [56,90,91,92,93] using established quality indicators. The established methods are unsuitable to assess OSM data quality in the case of the non-availability of authoritative data [14]. Hence, assessment through intrinsic quality indicators would encourage researchers to develop a deeper understanding of datasets. Recent developments in OSM research have introduced intrinsic quality indicators that assess the data using history files [29,69,74,94]. Three frameworks have been developed along these lines by Barron et al. [74], Ballatore and Zipf [69] and Rehrl and Gröchenig [94]. However, these frameworks suffer from various issues, which challenges researchers to extend existing tools and identify more indicators to assess datasets. This study offers an extension to the processing toolbox of the QGIS framework for assessing the data quality based on intrinsic parameters by providing easy-to-use workflow models.

Methodology: Extending the Processing Toolbox
QGIS (Version 2.8.6) [95] is open source GIS software that provides an object-oriented Python framework called the processing toolbox. This toolbox provides the ability to extend the capabilities of existing algorithms [96]. The PYQGIS bindings allow comfortable integration of Python code with the Qt libraries and eventually with QGIS. Further, the processing toolbox provides a graphical modeler for automating the complex workflow management of algorithms. The methodology used for the assessment of OSM data was developed in Python as processing scripts. These scripts were further used as components in the graphical modeler. The following sections discuss the processing models developed to perform intrinsic quality assessment.

Network Length Completeness Model
As discussed in Section 2, the completeness of the data is vital to justify the fitness of data for various GI application domains. The network length assessment is an important intrinsic indicator to assess the completeness of road networks [31]. The processing model developed (Figure 2) requires two types of input: (1) the network graph; (2) the polygon layer of smaller regions with the region ID field. This model was used to calculate the length of roads class-wise as suggested by Graser et al. [31]. Firstly, the input graph was re-projected to a suitable coordinate system. Thereafter, the class-wise road network was extracted and passed to the "sum line lengths" processing algorithm to find road network statistics. OSM data contain many classes for "highway" tags (http://wiki.openstreetmap.org/wiki/India:Tags/Highway, accessed on 2 March 2017), with a few classes representing links that connect a higher order road to or from a lower order road. For example, trunk_link connects the trunk to/from the primary road. In the current study, for calculating the network length class-wise, the "links" were considered part of the class to which they belong, i.e., trunk_link was treated the same as the trunk class [31].
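The class-wise length calculation, including the folding of "link" classes into their parent class, can be sketched in plain Python as follows. This is only an illustration outside QGIS: the tuple layout and the function names are hypothetical, and the lengths are assumed to have been measured already in a metric coordinate system.

```python
from collections import defaultdict


def base_class(highway: str) -> str:
    """Fold '*_link' classes into their parent class, e.g. trunk_link -> trunk."""
    suffix = "_link"
    return highway[:-len(suffix)] if highway.endswith(suffix) else highway


def length_by_class(segments):
    """Sum segment lengths (metres) per region and folded road class.

    `segments` is an iterable of (region_id, highway_tag, length_m) tuples
    (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(float)
    for region_id, highway, length_m in segments:
        totals[(region_id, base_class(highway))] += length_m
    return dict(totals)


segments = [
    ("R1", "trunk", 1000.0),
    ("R1", "trunk_link", 500.0),
    ("R1", "primary", 250.0),
    ("R2", "primary_link", 125.0),
]
totals = length_by_class(segments)
```

In this toy input, the trunk_link segment is counted towards the trunk class of region R1, mirroring the rule applied in the model.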

Attribute Completeness Model
OSM tags represent important information about the data. Hence, the assessment of OSM tags provides useful information about its fitness for various GI application domains. The processing model developed was used to assess the name tag, e.g., the name of ways. Hence, the model was able to determine the attribute completeness (Figure 3) of the input data [31]. This model used the "extract by attribute" processing algorithm to identify the features based on an "expression". Further, it classified the "field" in the "expression" by the NULL or NOT NULL rule. The model took an input layer, a field to select the class, a road class as a text string, an expression to extract the required features and the size of the vector grid, and provided the sum of the length of the features as the output. For example, to find the length of the tertiary road class with a name, the expression passed would be ("name" IS NOT NULL), whereas the complementary expression ("name" IS NULL) selects the roads without a name.
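The NULL / NOT NULL rule can be illustrated with a small Python sketch that reports the tagged length, the total length and the percentage of a road class carrying a given attribute. The dictionary layout and function name are hypothetical, and treating a blank string as NULL is an additional assumption of this sketch rather than part of the model.

```python
def attribute_completeness(features, road_class, field="name"):
    """Return (tagged_length, total_length, percent) for one road class.

    `features` is an iterable of dicts with 'highway', 'length_m' and the
    attribute of interest; a missing key, None or a blank string counts as
    NULL (the blank-string rule is an assumption of this sketch).
    """
    total = tagged = 0.0
    for feature in features:
        if feature.get("highway") != road_class:
            continue
        total += feature["length_m"]
        value = feature.get(field)
        if value is not None and str(value).strip():
            tagged += feature["length_m"]
    percent = 100.0 * tagged / total if total else 0.0
    return tagged, total, percent


roads = [
    {"highway": "tertiary", "name": "Mall Road", "length_m": 300.0},
    {"highway": "tertiary", "name": None, "length_m": 100.0},
    {"highway": "primary", "name": "NH 44", "length_m": 900.0},
]
result = attribute_completeness(roads, "tertiary")
```

The same function can be pointed at another tag by changing `field`, e.g., `field="maxspeed"`.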

Semantic Accuracy Assessment Model
The semantic accuracy of data in the current study was measured by assessing tags. The model was developed to analyze user contribution patterns, e.g., the number of edits and the number of contributors. A processing model (Figure 4) was developed based on intrinsic indicators [18,74] to answer questions about the contributors and their contributions. The history data used for assessment were prepared as discussed in Section 4.1. The Python script "development of contributors" was developed to find the number of contributors. This script read the history table of line features in the PostgreSQL database and returned a date-wise list of contributors. The contributors were classified based on the number of edits made by them; hence, the script provided more in-depth information about each user. The results were grouped into three types of contributors: "senior-mappers", "junior-mappers" and "non-recurring-mappers", as suggested by Neis and Zipf [48]. It should be noted that this classification may be biased, as one mapper could be less active in a certain area while being very active in other areas. Thus, the classification provided only a rough estimate of the active members in the region. The script "contributors and their cumulated percentage of contributions" calculated the total amount and percentage of node contributions by each contributor.
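The classification of contributors and the cumulated percentage of contributions can be sketched as follows. This is a minimal illustration, not the developed PYQGIS script: the function names are hypothetical, the edit counts are supplied as a plain dictionary instead of being read from PostgreSQL, and the two thresholds are placeholder parameters that should be set to the values adopted from the cited classification.

```python
from collections import Counter


def classify_contributors(edit_counts, senior_min=1000, junior_min=10):
    """Map each user to a mapper class from the number of edits.

    The default thresholds are illustrative assumptions of this sketch.
    """
    classes = {}
    for user, edits in edit_counts.items():
        if edits >= senior_min:
            classes[user] = "senior-mapper"
        elif edits >= junior_min:
            classes[user] = "junior-mapper"
        else:
            classes[user] = "non-recurring-mapper"
    return classes


def class_shares(classes):
    """Percentage of contributors falling into each class."""
    counts = Counter(classes.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


def cumulated_contributions(edit_counts):
    """List (user, edits, cumulated percent), most active contributor first."""
    total = sum(edit_counts.values())
    if total == 0:
        return []
    rows, running = [], 0
    ranked = sorted(edit_counts.items(), key=lambda item: item[1], reverse=True)
    for user, edits in ranked:
        running += edits
        rows.append((user, edits, 100.0 * running / total))
    return rows


edits = {"a": 1500, "b": 40, "c": 3, "d": 457}
classes = classify_contributors(edits)
shares = class_shares(classes)
cumulated = cumulated_contributions(edits)
```

Sorting by edit count before accumulating makes it easy to read off how many contributors account for a given share of the data.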
Another important indicator is the number of distinct users who have contributed to developing a feature. This indicator provides the status of correctness (positional accuracy) of the feature as per Linus's law. The script "number of contributors per feature" was developed to fetch each feature and list all of the contributors who edited the respective feature. The latest edit was recorded with the date based on the timestamp. Further, the script "maximum version per feature" was added to the model for identifying the development of the feature over time and its valid maximum version. Mooney and Corcoran [29] and Haklay et al. [6] have suggested that the greater the number of edits per feature (at least 15 edits per feature), the better the accuracy of that feature. Hence, this script helped to identify the features that are likely to be more complete and correct.
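The two per-feature indicators can be illustrated together in a short Python sketch. The row layout, the function name and the use of ISO 8601 timestamp strings are assumptions made for this example; the threshold of 15 versions follows the figure quoted above and is exposed as a parameter.

```python
from collections import defaultdict


def feature_indicators(history_rows, min_versions=15):
    """Summarize a feature history table.

    `history_rows` is an iterable of (feature_id, version, user, timestamp)
    tuples (a hypothetical layout); timestamps are ISO 8601 strings in one
    time zone, so string comparison orders them chronologically.
    """
    users = defaultdict(set)
    max_version = defaultdict(int)
    last_edit = {}
    for feature_id, version, user, timestamp in history_rows:
        users[feature_id].add(user)
        max_version[feature_id] = max(max_version[feature_id], version)
        if feature_id not in last_edit or timestamp > last_edit[feature_id]:
            last_edit[feature_id] = timestamp
    return {
        fid: {
            "contributors": len(users[fid]),
            "max_version": max_version[fid],
            "last_edit": last_edit[fid],
            "heavily_edited": max_version[fid] >= min_versions,
        }
        for fid in users
    }


rows = [
    (1, 1, "u1", "2010-01-01T00:00:00Z"),
    (1, 2, "u2", "2012-05-01T00:00:00Z"),
    (1, 15, "u1", "2011-03-01T00:00:00Z"),
    (2, 1, "u3", "2015-07-09T00:00:00Z"),
]
summary = feature_indicators(rows)
```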

Route Navigability Assessment Model
The route navigability assessment model (Figure 5) was developed to assess road network feature completeness by comparing point-to-point distances with the actual OSM routing distance based on a heuristic method. This method depends on a sanity check that the routing distance along a road network should never be shorter than the direct distance between the two points. Hence, this approach was used to compute the relative difference between the shortest map distance and the direct distance for each origin-destination point pair. The shortest routing function was designed using inbuilt functions of pgRouting (http://pgrouting.org, accessed on 10 October 2016) (Version 2.2.3). In the current study, the geo-location of each random point was obtained by the geocoding plugin in QGIS. The origin-destination points table was used as the source of potential start and end points for calculating the shortest path routes. Further, it is possible that the random points fetched may not be on the roads or connected to them. Hence, the KNN function from PostGIS (http://postgis.net, accessed on 10 October 2016) was used to select the nearest navigable point on the road dataset. This model (Figure 5) took two inputs: (1) a routable graph; (2) a points (origin-destination) data table. The first component created a matrix of origin-destination direct distances using the haversine algorithm, whereas the second component created a matrix of the shortest routable distances between the origin-destination points using the Dijkstra algorithm. The outcomes of the point-to-point and shortest distance algorithms were given to the "joining attribute table" processing algorithm for joining these tables based on the common parameter. Further, a field calculator was used to find the ratio of the shortest distance to the point-to-point distance. The last component, based on the heuristic parameter, calculated the number of non-navigable routes due to the error of omission or incompleteness.
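The heuristic can be sketched in plain Python as a haversine distance, a Dijkstra search on a small adjacency list and a ratio check. This is an illustration only and does not reproduce the pgRouting-based implementation: the function names, the graph layout, the Earth radius constant and the default ratio limit of 1.2 (i.e., a route more than 20% longer than the direct distance) are assumptions of this sketch.

```python
import heapq
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def dijkstra_km(graph, source, target):
    """Shortest path length on {node: [(neighbour, km), ...]}, or None."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, km in graph.get(node, []):
            candidate = dist + km
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return None  # target not reachable from source


def assess_route(direct_km, routed_km, max_ratio=1.2):
    """Classify one origin-destination pair with the heuristic ratio."""
    if routed_km is None:
        return "not navigable"
    ratio = routed_km / direct_km
    if ratio < 1.0:
        return "failed sanity check"  # a route cannot beat the direct line
    return "detour" if ratio > max_ratio else "ok"


graph = {"A": [("B", 5.0), ("C", 2.0)], "C": [("B", 1.0)], "B": []}
routed = dijkstra_km(graph, "A", "B")
```

Pairs classified as "not navigable" or "detour" are the candidates for errors of omission; a pair failing the sanity check points to a problem in the input points or the graph rather than to missing roads.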

Case Study of Punjab OSM Data
This section elaborates the application of developed models, discussed in Section 3, to assess OSM data of Punjab (India) based on intrinsic quality indicators.

Data Preparation and Tools
During this study, two types of datasets were used: (1) history dump; (2) routing network data. The repositories were explored, and the required data were downloaded, i.e., the history planet dump (http://planet.openstreetmap.org/) and regular region-wise extracts (http://download.geofabrik.de/). Further, in this study, OSM data of the Punjab (India) region were used for analysis. The following sections elaborate the methods and tools for extracting the test data:

Routing Graph Preparation
Neis and Zipf [48] have suggested that the preparation of a routable OSM graph requires thorough preprocessing and conversion to a real topology. Shapefiles of proprietary datasets normally carry such information, but a database prepared from OSM shapefiles using the shp2pgsql tool should be handled cautiously (http://pgrouting.org/docs/howto/shapefiles.html, accessed on 27 June 2016). In this study, India data in .osm.pbf format were downloaded from (http://download.geofabrik.de/osm/), and Punjab data were extracted using osmosis based on the polygon. A tool, osm2pgrouting (https://github.com/pgRouting/osm2pgrouting, accessed on 26 June 2016), was used to load the data into the PostgreSQL database. It parsed OSM XML data and created a topologically-correct routing graph. Thereafter, it generated SQL files for PostGIS, which were compatible with pgRouting and QGIS. Further, pgRouting functions, i.e., pgr_analyzeGraph() and pgr_nodeNetwork(), were used to assess the quality of the routing graph data and to fix issues such as gaps, self-intersections, dead nodes and missing noding, where noding ensures that intersections in the network are represented as the start or end of an edge.
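A minimal sketch of how the two pgRouting checks might be issued from Python is given below. It only builds parameterized SQL statements and runs them through any DB-API cursor that uses `%s` placeholders (e.g., a psycopg2 cursor, which is not imported here). The edge table name `ways`, the identifier column `gid`, the tolerance value and the helper names are assumptions of this sketch and should be adapted to the actual database.

```python
def graph_check_queries(edge_table="ways", tolerance=0.00001, id_column="gid"):
    """Return (sql, params) pairs for a DB-API cursor using %s placeholders.

    Table name, id column and tolerance are assumed defaults, not values
    taken from the study.
    """
    args = (edge_table, tolerance, id_column)
    return [
        ("SELECT pgr_analyzeGraph(%s, %s, id := %s)", args),
        ("SELECT pgr_nodeNetwork(%s, %s, id := %s)", args),
    ]


def run_graph_checks(cursor, **kwargs):
    """Execute each query and collect the status value returned by pgRouting."""
    results = []
    for sql, params in graph_check_queries(**kwargs):
        cursor.execute(sql, params)
        results.append(cursor.fetchone()[0])
    return results
```

The detailed diagnostics of the analysis (isolated segments, dead ends, potential gaps) are reported by pgRouting as database notices, so they need to be read from the connection in addition to the returned status.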

Network Length Completeness
In this study, two datasets (OSM datasets downloaded in January 2016 and February 2017) were evaluated using the network model (Figure 2) for road network completeness. The statistical analysis of the OSM road network showed that the total mapped length of highway type features is 33,813,720.6453 m (old) and 38,147,564.3035 m (recent). Further, it was found that 98.27% (old) and 92% (recent) of the data could be used for the navigation of cars, whereas the rest of the data is suitable for other domains, including pedestrian ways and cycleways. Thus, about 4,333,843.6582 m of road network data were added during the period January 2016 to February 2017.
The analysis showed that the old data contain a total of 23 classes of ways, which increased to 26 in the recent data. Further, a few classes of ways disappeared, and some new classes emerged in the recent data, e.g., unclassified ways, present in the earlier data, have now been properly assigned tags and classified. The statistics of the resultant analysis are shown in Figure 6. Further, the two datasets were analyzed to find where geographic changes took place, and it was found that during the period January 2016 to February 2017, the rural area (in particular the areas around the place name Raikot; geographic area, bounding box: 30.4842, 75.4376, 30.7961, 75.7892) developed more than the urban area. In addition, it was observed that the road network of the Malwa region of Punjab [97] is more complete than that of the Doaba and Majha regions (Figure 9c).

Attribute Completeness
The model (Figure 3) processed the two datasets to identify the developments and current status of attribute completeness. Two tag attributes, "name" and "maxspeed", were assessed. The old and recent data of OSM have been compared, and the results are presented in Table 2. This table depicts statistical information about the length of each feature type and the percentage of features with the "name" attribute. This revealed that the community in India is growing and focused on contributing to OSM. In spite of the increase in the total length of the OSM network by 4,333,843.6582 m, the length of certain network classes, such as living_street, primary and secondary, has decreased. In addition, the road and unclassified classes have vanished from the recent network, whereas new classes have emerged in it, e.g., motorways, track grades and an unknown class type. A thorough analysis suggested that the motorway class was wrongly allocated to the network.
The second parameter analyzed was the "maxspeed" tag, and it was found that only 4 to 5% of features have "maxspeed" tag information, which makes the data unsuitable for navigation without pre-processing.

Semantic Accuracy Assessment
The model (Figure 4) was used to assess the history file of the area under examination. The analysis showed that a total of 611 users (Figure 7a) contributed during the period 2007 to 2016. They contributed 634,173 nodes, 86,176 ways and 393 relations, with the very first node added in September 2007. Figure 8 presents the historical development of the three components of OSM data, whereas Figure 9 depicts the changes in OSM data over the years. Further, by analyzing the user contribution count, the users were classified into categories as suggested by Neis and Zipf [48]. Figure 7b depicts the count of each class of contributor: 45.6%, 11.9% and 42.5% of the contributors to the data under investigation are junior, senior and non-recurring mappers, respectively. Figure 7f presents the number of active distinct contributors and provides a graphical analysis of the user activity in the region. Further, Figure 7e reveals the relationship between user feature edits and their percentage contributions.
The investigation of the number of distinct contributors who developed each feature (Figure 7c), out of a total of 70,332 line features, revealed that 8.45% of line features have been edited by more than 10 distinct users. Furthermore, the analysis of the maximum versions of features showed that only 215 features are heavily edited, i.e., have a version greater than or equal to 15, that 71.13% of features still have Version 1 status and that 4.6% of features have five or more versions but fewer than 15 (Figure 7d).

Route Navigability Assessment
The model (Figure 5) used for the assessment of road navigability accepted two inputs: (1) routing network; (2) origin-destination points table. The origin-destination points dataset was prepared using the geocode plugin in QGIS, which uses Nominatim geocode services, to acquire the POIs for all districts of Punjab (India). The two inputs were given to the two components of the model for finding the shortest OSM map distances (Figure 10) and the direct distances between the points. In order to validate the results, a Python script was developed using the Google Maps Distance Matrix API to find the shortest routing distance between the same set of origin-destination points. Figure 11a,b present the plots of the heuristic relative error used to measure the variation of the shortest distances from the direct distances obtained on OSM and Google maps, respectively. It was found that in the case of the district-to-district origin-destination matrix, nearly 69 routes were more than 20% longer than the corresponding direct distances. A few of the longer routes were attributed to data preparation, as the neighboring states' network was not included. Therefore, particularly for the district Pathankot, the routes were approximately 30% to 40% longer. As per the outcome of the model, it is concluded that the completeness of the primary roads was quite satisfactory, but other types of roads are less complete.
Further, Figure 11b shows that the variation in the Google data is greater than in the OSM data. Three factors contribute to this difference. Firstly, a lazy computation was used to find the approximate closest node to each location of interest, and whole street lengths were taken instead of the fractional length of the street. Secondly, Google optimizes for time (cost based on time), which also considers traffic conditions, whereas our algorithm was optimized for distance only. Lastly, the turn restriction attribute was missing in nearly the whole of the OSM dataset.
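The comparison of routed and direct distances can be illustrated with a short sketch. It assumes that the heuristic relative error is the excess of the routed distance over the direct distance, divided by the direct distance, and that the direct distance is a great-circle (haversine) distance; both are assumptions made here for illustration, as are the function names and the sample values. Only the 20% threshold comes from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def relative_error(routed_km, direct_km):
    """Assumed definition: (routed - direct) / direct."""
    if direct_km <= 0:
        raise ValueError("direct distance must be positive")
    return (routed_km - direct_km) / direct_km


def flag_long_routes(pairs, threshold=0.20):
    """Return labels of routes whose relative error exceeds the threshold.

    pairs: iterable of (label, routed_km, direct_km) tuples (hypothetical layout).
    """
    return [label for label, routed, direct in pairs
            if relative_error(routed, direct) > threshold]


# Hypothetical origin-destination pairs, not taken from the study data.
routes = [("A-B", 130.0, 100.0), ("A-C", 110.0, 100.0)]
flagged = flag_long_routes(routes)
```

A route flagged this way is not necessarily incomplete in OSM; as noted above, terrain, missing neighboring networks and node snapping can all lengthen the routed distance.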

Limitations of the Study
The current study focuses on analyzing OSM data based on intrinsic quality indicators. However, only line type features have been assessed, although these models can be easily applied to other types of features (e.g., ways and relations) to assess their development, completeness and fitness-of-use for various GI domains. Further, we have compared the OSM dataset of Punjab between two points in time, but the models can be easily adapted to assess and compare urban and rural developments by preparing appropriate spatial datasets. Furthermore, the data were prepared using open source tools, and their weaknesses would certainly be reflected in the data used for the study; e.g., Barron et al. [74] have reported that the OSM history importer carries a few bugs, and sometimes, deleted ways and polygons are imported as if they had not been deleted. Another limitation of the current study is that the shortest routes are compared only at the district-to-district level. Finer-grained origin-destination points data, e.g., village-to-village, may have revealed a clearer picture of the completeness of the data. In addition, the routing algorithm computes the shortest path based on the availability of the route rather than on other attributes, e.g., the maxspeed and the traffic situation. Hence, the heuristic metrics defined for analyzing the completeness of data are proxy measures, and longitudinal studies are required for the generalization of such heuristic measures.

Conclusions and Future Work
OSM provides the community with a cost-effective way to map regions through crowdsourcing. Further, the OSM platform can work as a reliable, fast and efficient framework where the spatial data acquisition process of official mapping agencies is slow. Hence, OSM can enable quick visualization of the expanding road network in developing countries like India. Furthermore, it provides an opportunity for researchers to process a huge amount of labeled data for quality assessment. The thorough review of the literature revealed the need for easy-to-use procedures for data assessment specific to different GI domains. To fill this gap, the current study has extended the functionality of the processing toolbox of QGIS. The developed models pave the way to analyze the contributed data and identify its deficiencies, so that the community can organize a collaborative effort (a mapping party) to rectify those issues. The models can be used to analyze street networks worldwide and can be easily adapted to check the completeness of other features. Further, easy-to-use models can encourage researchers to explore regions in developing countries using intrinsic indicators. Hence, the models would help correct the skew in the distribution of studies toward developed countries.
The intrinsic indicators used to assess the spatial dataset are network length completeness, attribute completeness, user contribution assessment and heuristic road navigability assessment. The case study of Punjab (India) has been used to evaluate the models. During the period January 2016 to February 2017, more contributions were observed in rural areas than in urban areas, and some already existing road classes were renamed to other classes. Hence, a few old classes have diminished, and new classes have emerged during this period. The user contribution analysis model has revealed that although the count of users has increased over the period, only 11.2% of mappers are senior and active members. This count is in alignment with the observation that fewer than 20% of contributors provide more than 80% of the data. The results have shown that OSM in the region is undergoing slow development, and there is a need to motivate local contributors by organizing mapping parties.
The results of the heuristic road navigability analysis showed that the length of the shortest paths generated for navigation is affected by the completeness of the underlying dataset. Further, it has been observed that some street networks have a significant number of missing roads. Turn restrictions are almost entirely missing, and maxspeed attributes are present for only a small percentage of features. The sample of origin-destination points used in this study was too small to make definite statements about completeness. Nevertheless, the model can be extended to any number of computations.
Further, computer science and geospatial researchers should view OSM as an opportunity to investigate computational research challenges. Future research will focus on identifying further heuristic intrinsic quality indicators for the assessment of OSM data and on adding more components to the existing scripts, with the aim of developing a QGIS plugin for assessing data specific to different GI domains. Furthermore, an effort will be made to generalize such heuristic methods through longitudinal studies.

Figure 2. Processing model for calculating the network length class-wise.

Figure 3. Processing model to compute the class-wise length of features with "name" attributes.

• How many contributors have made contributions to OSM data?
• What classes of contributors have edited the area under examination?
• How many distinct users have contributed chronologically to develop a feature?
• How has a feature developed over time, and what is its latest version number?
• How many distinct active contributors edited OSM data?
• Contributor-distribution of created OSM-features and OSM-feature-edits.

Figure 4. Processing model for identifying user contribution using history data.

Figure 5. The route navigability assessment model contains two submodels: "point-to-point direct distance model" and "origin-destination routable shortest distance model".

Figure 7. Pictorial presentation of statistics obtained from the user contribution processing model. (a) Development of contributors; (b) classification of contributors; (c) number of contributors per feature; (d) maximum version per feature; (e) contributors and their accumulated percentage of contributions; (f) development of active distinct contributors.

Table 2. Class-wise statistics of the attribute completeness of two datasets.