Recent Developments and Future Trends in Volunteered Geographic Information Research: The Case of OpenStreetMap

Abstract: User-generated content (UGC) platforms on the Internet have experienced a steep increase in data contributions in recent years. The ubiquitous usage of location-enabled devices, such as smartphones, allows contributors to share their geographic information on a number of selected online portals. The collected information is oftentimes referred to as volunteered geographic information (VGI). One of the most utilized, analyzed and cited VGI-platforms, with an increasing popularity over the past few years, is OpenStreetMap (OSM), whose main goal is to create a freely available geographic database of the world. This paper presents a comprehensive overview of the latest developments in VGI research, focusing on its collaboratively collected geodata and corresponding contributor patterns. Additionally, trends in the realm of OSM research are discussed, highlighting which aspects need to be investigated more closely in the near future.


Introduction
Since the discontinuation of the selective availability of the Global Positioning System (GPS) in 2000, allowing users to receive a non-degraded signal, the increase of GPS-enabled devices and new technological developments allowed more and more people to use their location information for designated location-based services (LBS), to position their photographs or other information on a world map or to use it for spare-time activities, such as geocaching [1][2][3][4]. In a similar timeframe and as a consequence of the Web 2.0 phenomenon [5], Internet users began not only to passively consume information, but also started to create or edit web content based on their individual requirements, preferences or interests. Tapscott [6] described these participants as "prosumers", a portmanteau of producer and consumer, whereas Coleman [7] called them "produsers". As a result of this development, projects, such as Wikipedia, or image and video sharing platforms, such as Flickr or YouTube, experienced a significant increase in user numbers and contributors over the past few years.
Several terms were introduced in mainstream media and research alike to describe this new pattern in data development. General data contributions, such as Wikipedia entries or blog posts, are oftentimes summarized as user-generated content (UGC; [8]) or user-created content [9]. The additional geographic component, usually represented by latitude and longitude values, separates a special data type from these contributions, oftentimes termed crowd-sourced geodata [10,11], collaborative GI [12,13] or more commonly known as volunteered geographic information (VGI; [1]). Related concepts that do not solely focus on the collection of information have also been termed in many different ways, such as collaborative mapping [14], wikification of geographic information systems (GIS) [15], participatory GIS [16], public participation GIS [17] or web mapping 2.0 [18].
VGI has become a widespread phenomenon in media and academia alike. A number of research projects in recent years have analyzed the advantages and disadvantages related to VGI. In many cases, the researchers investigated the data and contributor information of the OpenStreetMap (OSM) project, one of the most successful VGI projects in recent years [19][20][21][22], which has also been frequently cited in the GIS community [1,23].
The objective of this paper is to provide an overview of current developments in VGI research, focusing on the different methods that were applied to analyze the members and their corresponding data contributions. After discussing the most essential results from the selected studies, different lessons that can be drawn from the presented research and potential future trends that lie in the empirical analysis of VGI datasets will be discussed.
The remainder of this article is structured as follows: The next section briefly introduces VGI, followed by a comparison of different VGI projects. Section 3 provides an overview and summarizes the findings of previously conducted OSM research. Future trends and questions are discussed in Section 4, followed by a conclusion in Section 5.

Volunteered Geographic Information
The term volunteered geographic information (VGI), coined by Goodchild [1] in 2007, describes the process of collecting spatial data by individuals, mostly on a voluntary basis. In most cases, the contributed VGI is collected in a database or file system structure and is sometimes freely available to other interested Internet users. To be able to contribute to one of the VGI platforms, the information has to match a geographic position. This can either be achieved by tracing it from georeferenced aerial imagery or by actively collecting GPS tracks with a designated device. Furthermore, a broadband Internet connection and additional hardware in the form of a smartphone or personal computer are needed. Although these prerequisites seem to be trivial in most modern societies, we will discuss at a later point that they tend to explain certain patterns in the global distribution of VGI projects.
The steep increases in VGI contributions led to a number of diverse platforms and projects utilizing the data and technologies in spatial decision making, participatory planning and citizen science [24]. VGI data also received more attention, due to its successful implementation for humanitarian or crisis mapping purposes [4]. In this context, the Humanitarian OpenStreetMap Team (HOT) [25] has taken on an important role. Since 2009, it has coordinated the creation, production and distribution of free mapping resources to support humanitarian relief efforts in many places around the world. Ushahidi [26], a different platform that collects humanitarian crisis information, was initially developed to map reports of violence in Kenya in 2008 and has evolved into a valuable tool during a number of projects in recent years. The potential of VGI has also been proven for urban management purposes [27], flood damage estimation [28], wildfire evacuation [29] or other important cases of risk, crisis and natural disaster management [30][31][32] or responses [33][34][35].
The reason why VGI is implemented in many of these scenarios is its open approach to data collection. VGI is sometimes the cheapest and oftentimes the only source of geo-information, particularly in areas where access to geographic information is considered an issue of national security [1]. The aforementioned Internet connection, essential for data contributions to VGI platforms, can be a serious caveat in developing countries. In combination with language issues (many VGI services only support the Roman alphabet in the English language) and high illiteracy rates, this can hinder the contributions to a VGI project in these areas [1]. However, despite these facts, the OSM project has developed into one of the largest and best-known VGI projects of the past seven years. Many different OSM-based maps have been created in recent years, tailored to different purposes, such as skiing, hiking or public transportation, by rendering the collected information in a particular way. More advanced projects, such as OpenRouteService [36,37] or OSRM [38], have shown that geo-information collaboratively collected by volunteers can be a reliable data source for car, bicycle, pedestrian and possibly wheelchair or haptic-feedback routing or navigation applications. Based on the increasing success of the OSM project, several other companies implemented the idea of collaboratively collected or corrected geographic information for their own business solutions or data sources.

Comparison of Recent VGI Projects
In this section, we compare six different VGI projects that collect geodata in a collaborative way (Table 1). The comparison focuses on platforms that are providing more advanced geographical information, such as real-world features, in comparison to other VGI portals that solely share geolocated tweets or images. Furthermore, the comparison does not include other widely used LBS platforms, such as Google Maps, Telenav or Apple Maps, due to their limited VGI functionality.
The oldest VGI project in our comparison is the aforementioned OSM project. A similar project, Wikimapia [39], was founded in 2006 and is not related to Wikipedia in any way. The four other projects listed in Table 1 (Map Maker [40], Here Map Creator [41], Map Share [42] and Waze [43]) were established several years after OSM gained popularity and are partially owned by well-known proprietary geodata providers. In the past, these platforms had a limited functionality that only allowed volunteers to report an error in the map data in the form of a note (e.g., Map Share). Nowadays, registered members can create or make corrections to existing street data on almost all of the aforementioned commercial platforms. The data license and the availability of the collected information is a major difference between the compared VGI projects. In contrast to the aforementioned proprietary data providers, the OSM dataset is available under the Open Data Commons Open Database License (ODbL) [44], which allows users to copy, distribute, transmit and adapt the data, as long as OSM and its contributors are credited. More importantly, if the user alters or builds upon the OSM dataset, the results can only be distributed under the same license. The collected data of the OSM and Wikimapia projects can be downloaded through a designated application programming interface (API) or as a complete database dump file.
Notes to Table 1: 1 the number of enabled devices; 2 [45]; 3 [39]; 4 [46]; 5 [47]; 6 the number of countries for which map data is available; 7 only via a web-API.
Although the Waze project claims to have the largest contributor base, we will discuss, at a later point, that a large number of registered members does not always imply better data quantity or quality.

The OSM Project
The OSM project was initiated in 2004. Its main database and web services are hosted on a number of servers at University College London. Additional server infrastructure was established through several donation rounds. All servers and interfaces to create and share OSM data are mainly developed and administered by volunteers [48]. The project's goal is sometimes described as "building a global map" [49]; however, the main aim of the project is to build a freely available database with geographic information, which can, of course, be used for mapping purposes, but also for navigation or other applications. The OSM Foundation (OSMF), an international not-for-profit organization, has been established to encourage the growth, development and distribution of free geospatial data and to provide geospatial data for anyone to use and share [50]. Additionally, the OSMF is divided into several working groups [51] that support the project in specific areas of interest. For instance, the Operations Working Group plans and maintains the OSM API and servers.
At the time of writing, the OSM project had almost 1.4 million registered members [45], who contributed almost 2.1 billion points and 220 million lines, which are partially based on 3.6 billion GPS points that have been uploaded. Similar to other online communities, such as Wikipedia, only a small percentage of those volunteers actively contribute data on a regular basis, as we will discuss in more detail at a later point in this article. To be able to contribute data to the OSM project, the potential member has to register and create an account. In contrast to other VGI projects, the newly registered member can add, modify or delete geographic objects in the OSM database right after the registration process, whereas, for instance, in Google Map Maker, the edits made by new members are reviewed first. This relatively open approach to data contributions in OSM is described as one of the major benefits of the project [52].
The collection of geo-information by online communities is oftentimes described as a bottom-up approach [12]. In the case of the OSM project, however, different data contribution types need to be distinguished. In the first few years of the project, most contributors collected the geo-information by utilizing GPS-enabled handhelds. However, between 2007 and 2011, the Internet company Yahoo! [53] allowed the OSM project to trace data from their satellite imagery, and a similar agreement could be established with Microsoft Bing Aerial Imagery [54] in November 2010. The availability of both imagery platforms had a large impact on the collection of new objects in OSM. Particularly the release of the Bing imagery datasets resulted in a strong increase in building information [55]. Additionally, several countries achieved a large data collection in OSM by importing commercial or governmental road network datasets that comply with the OSM license. Examples can be found in the Netherlands, Austria and the United States. For Spain and France, cadastral building information was successfully imported to the OSM database.
OSM contributors can use multiple ways to communicate with each other. Most of the project-related information is collected and shared in the official OSM wiki, which covers a plethora of subjects, such as detailed information about tutorials for beginners, usable software or documentation about how objects should be mapped. Additionally, contributors use a variety of Internet relay chats (IRCs) [56] or mailing lists [57] to ask questions regarding tagging conventions in OSM, software development, data imports or other topics. Figure 1 illustrates the digital infrastructure of the OSM project and its community.
Many active OSM contributors also participate in so-called "Mapping Parties", during which the contributors meet at a certain location, get to know each other, share experiences about OSM and spend some time exploring and mapping the community [23]. Sometimes, these events can also take place at previously unmapped areas to improve the data collection efforts in those regions [49]. The main events in the OSM community that attract most participants are the yearly "State of the Map" conferences, which are held at different locations in the world.
To make modifications to the OSM database, the contributors can use multiple editor types. The newly developed iD editor makes it easier for new contributors to add information to the map, while the long-established editors Potlatch or JOSM (Java OpenStreetMap Editor) are preferably used by more advanced members [58]. There are also a large number of different editors available for multiple mobile devices, such as smartphones, and different operating systems. The data contribution patterns of the active OSM community have changed over the past few years for different world regions. During the first few years of the project, most volunteers focused their data collection efforts on road network data. Nowadays, other real-world features, such as buildings, land use or public transportation information, are being added in many regions to provide more details to the users. When a volunteer creates an object in the OSM database, representing a real-world feature, she/he can use three different object types [59]. Point information is represented by a Node object in OSM, whereas a Way object is utilized when mapping lines or polygons (the latter in the form of a closed line feature). If a number of Node and/or Way objects are related to each other, the Relation object can be utilized to map this particular information (e.g., turn restrictions of the street network or tram/bus lines for public transportation). Any modifications or contributions made to the OSM database by a single member are stored in a changeset, and its extent covers the entire area within which a contributor made her or his changes. Each object in OSM can be annotated by a variety of attribute information, also referred to as tags, each of which consists of a key-value pair. Any contributor is free to propose and discuss new tags to describe real-world features [60], resulting in a bottom-up tagging approach, meaning that there is no traditional, enforced tagging limitation with which mappers have to comply. However, a large number of suggested key-value combinations that are widely used in the community are provided in a wiki [61] that helps to standardize certain objects in OSM. It also needs to be noted that a variety of map render-engines influences the creation of map "standards", due to their specific rendering functions that only allow certain features with particular tags to be shown on the map.
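The object model described above can be illustrated with a minimal Python sketch. It is not the schema of the OSM database or of any OSM library: the class names, the field names (`node_refs`, `members`, `role`) and the helper `changeset_extent` are assumptions chosen for illustration, and the coordinates are invented sample values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Point information: one coordinate pair plus free-form tags."""
    id: int
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """Line or polygon: an ordered list of references to Node ids."""
    id: int
    node_refs: List[int]
    tags: Dict[str, str] = field(default_factory=dict)

    def is_closed(self) -> bool:
        # A polygon is modelled as a closed line: first and last reference
        # are identical. At least four references are needed for a triangle.
        return len(self.node_refs) >= 4 and self.node_refs[0] == self.node_refs[-1]


@dataclass
class Member:
    """One participant of a Relation (hypothetical helper type)."""
    type: str  # "node", "way" or "relation"
    ref: int
    role: str = ""


@dataclass
class Relation:
    """Groups related Node and/or Way objects, e.g. a bus line."""
    id: int
    members: List[Member]
    tags: Dict[str, str] = field(default_factory=dict)


def changeset_extent(nodes: List[Node]) -> Tuple[float, float, float, float]:
    """Bounding box (min_lon, min_lat, max_lon, max_lat) of the edited nodes."""
    lats = [n.lat for n in nodes]
    lons = [n.lon for n in nodes]
    return (min(lons), min(lats), max(lons), max(lats))


# Invented sample data: a small building outline and a tagged point.
corners = [Node(1, 49.0, 8.0), Node(2, 49.0, 8.1), Node(3, 49.1, 8.1)]
building = Way(10, [1, 2, 3, 1], {"building": "yes"})
cafe = Node(4, 49.05, 8.05, {"amenity": "cafe"})
```

Because tags are plain key-value pairs without an enforced schema, the sketch stores them in an unrestricted dictionary, which mirrors the bottom-up tagging approach described in the text.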
The collected data and created changesets can be retrieved via the OSM web-API (only for a limited extent) or as a complete database dump for the entire world. Websites, such as Geofabrik's OSM Data Extracts, also provide data downloads for a specific area of interest. It needs to be noted that these pre-processed downloads only include the latest object versions, representing the current state of the features in the map. For analytical purposes, full history dump files are available [62], which include all versions of all features and allow for more advanced methods to test for potential changes between different versions in the dataset. Traditionally, the OSM datasets were provided in Extensible Markup Language (XML) format. To improve performance and to allow for faster processing, a binary data format, i.e., Protocol Buffer Binary Format (PBF), has become more common in recent years.
The success of the OSM project has been increasing in recent years, and several companies, such as Apple [63], Flickr [64] and foursquare [65], switched their mapping applications entirely or partially to OSM. Others created start-up companies that are building their solutions around OSM.
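To make the difference between a latest-version extract and a full history file more concrete, the following sketch groups all versions of each object found in a small OSM XML snippet. The element and attribute names (`node`, `way`, `nd`, `tag`, `id`, `version`, `k`, `v`) follow the OSM XML format as it is commonly documented; the sample content and the function name `versions_per_object` are invented for illustration. Real history dumps are far too large to load into memory like this and would call for a streaming parser.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample in the style of a full history file: node 1 appears twice.
SAMPLE = """<osm version="0.6">
  <node id="1" version="1" lat="49.41" lon="8.69"/>
  <node id="1" version="2" lat="49.41" lon="8.69">
    <tag k="amenity" v="cafe"/>
  </node>
  <way id="10" version="1">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""


def versions_per_object(xml_text):
    """Map (object type, id) to a version-sorted list of (version, tags)."""
    root = ET.fromstring(xml_text)
    history = defaultdict(list)
    for elem in root:
        if elem.tag not in ("node", "way", "relation"):
            continue  # skip bounds or other metadata elements
        tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
        key = (elem.tag, int(elem.get("id")))
        history[key].append((int(elem.get("version")), tags))
    for versions in history.values():
        versions.sort(key=lambda item: item[0])
    return dict(history)


history = versions_per_object(SAMPLE)
# The last list entry per object corresponds to what a latest-version
# extract would contain; earlier entries only exist in history files.
latest_cafe_tags = history[("node", 1)][-1][1]
```

Comparing consecutive entries of one object's list is the basic operation behind analyses of how features change between versions.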

Current Developments
In 2007, early questions were raised about the usefulness of VGI in science [66]. While a number of publications highlighted that the credibility and reliability of VGI could be questionable [48,67], more recently, researchers focused on the actual quality analyses of the VGI datasets [19,21,68]. These first research contributions were followed by studies about trust [69,70], contributor behavior [23,71] and gender distributions in VGI projects [72,73]. The goal of the following three sections is to give a comprehensive and detailed overview about VGI research progress in recent years with the focus on OSM.

Data Quality Analysis
The quality assessment of geographical information follows a predefined set of quality measures and criteria. A variety of publications are available that are related to the definition of these characteristics [74,75]. In 2002, the International Organization for Standardization (known as ISO) released a standard that defines the quality attributes of geodata in ISO 19113 [76] (principles for describing the quality of geographic data) and ISO 19114 [77] (framework for procedures for determining and evaluating quality). According to ISO 19113:2002, the following five parameters define the quality of geodata: completeness, logical consistency, positional accuracy, temporal accuracy, and thematic accuracy. By the end of 2013, both ISO standards (19113 and 19114) had been merged into one single standard: ISO 19157 [78] (geographic information data quality). In the next sections, we will present several OSM research projects dedicated to one or more of the aforementioned spatial data quality parameters.

Road Network Evaluation
Over the past five years, most OSM quality analyses evaluated the OSM road network in comparison to administrative or commercial datasets. In the first analysis, Haklay [19] compared the 2008 OSM data for Great Britain with the Ordnance Survey (OS) dataset, Meridian 2. To evaluate the positional accuracy, he used a buffer comparison method previously introduced by Goodchild and Hunter [79] and Hunter [80]. The completeness of the dataset was evaluated by conducting a grid-based length comparison of the road networks. The result revealed that the OSM dataset can provide an adequate coverage for 29.3% of the area of England. One year later, in 2009, the analysis was repeated, and OSM improved its coverage to 65% [81]. The quality and coverage for OSM in England showed a heterogeneous pattern, with stronger road network concentrations in urban areas, but a lack of details, such as street names, whereas rural areas at times showed a complete lack of coverage. Zielstra and Zipf [82] utilized a similar methodology to compare the commercial TomTom Multinet dataset with OSM for Germany. They concluded that the OSM data in Germany have a similar heterogeneity as previously found between OSM and OS data in the UK in terms of its completeness, highlighting that this particular VGI source can be an alternative to commercial providers in densely populated urban areas. Rural areas, however, tend to show less coverage in OSM and are not sufficient for the creation of more advanced applications, such as route planners. A different study for Germany compared OSM data with the commercial road network dataset distributed by Navteq [83]. The study implemented a feature matching method, which was previously introduced by Devogele et al. [84] and Walter and Fritsch [85]. Similar conclusions as the ones previously found by Zielstra and Zipf [82] for the OSM dataset in Germany could be made. Girres and Touya [68] conducted a study for France by extending the previously introduced analysis by Haklay [19], which focused on the road network, to other features, such as points of interest (POIs), waterways and coastlines. The study showed a similar heterogeneity of the OSM dataset for France as previously revealed by other researchers for other countries. However, in this particular case, some of the discrepancies can be explained by imports of different datasets, a variety in data collection methods and the participation of contributors in designated projects focusing on selected features or a predefined area. Koukoletsos et al. [86] introduced additional methods, similar to Ludwig et al. [83], to match a VGI source to a reference dataset, improving the data quality evaluation. The results showed that their matching procedure for the two tested areas, utilizing OSM and OS transportation network information, is efficient and that the mismatch error was below 4%.
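The idea behind a buffer comparison can be sketched in a few lines of Python. The sketch assumes that the method amounts to placing a buffer of a given width around the reference line and reporting the share of the tested line that falls inside it; this is a simplified reading, not the procedure used in the cited studies. The share is approximated by sampling points along the tested line, coordinates are assumed to be planar (e.g., metres in a projected coordinate system), and all function names are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polyline(p, line):
    """Shortest distance from p to any segment of the polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def sample_polyline(line, step):
    """Points spaced at most `step` apart along the polyline."""
    points = []
    for a, b in zip(line, line[1:]):
        seg_len = math.hypot(b[0] - a[0], b[1] - a[1])
        n = max(1, math.ceil(seg_len / step))
        for i in range(n):
            t = i / n
            points.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    points.append(line[-1])
    return points


def share_within_buffer(test_line, reference_line, buffer_width, step=1.0):
    """Approximate share of test_line lying within buffer_width of reference_line."""
    samples = sample_polyline(test_line, step)
    inside = sum(
        1 for p in samples if distance_to_polyline(p, reference_line) <= buffer_width
    )
    return inside / len(samples)


# Invented example: a tested road running 5 m beside a straight reference road.
reference = [(0.0, 0.0), (100.0, 0.0)]
tested = [(0.0, 5.0), (100.0, 5.0)]
share_wide = share_within_buffer(tested, reference, buffer_width=10.0)
share_narrow = share_within_buffer(tested, reference, buffer_width=2.0)
```

A production analysis would normally rely on a GIS or geometry library for exact buffer and intersection operations; the sampling approach is used here only to keep the example free of external dependencies.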
All aforementioned studies had a strong focus on geometrical accuracy and completeness. In the years following these initial studies, different research projects shifted this focus to other geodata quality measures. The evaluation of attribute information revealed that the removal of topological errors in OSM for Great Britain was not keeping up with newly introduced data errors in the database [87]. Canavosio-Zuzelski et al. [88] introduced a photogrammetric approach to assess and enhance the positional accuracy of the OSM street network data using stereo imagery and a vector adjustment model. In their method, they compared the road centerlines with referenced satellite imagery in the U.S. Based on several test areas, their proposed approach was able to improve the positional point accuracy and to recover the positional street displacement of OSM data. In a different study [89], a variety of methods were applied to evaluate positional and linear geometric accuracy and area shape similarity among datasets for integration purposes in different study areas for the UK and Iraq. The researchers concluded that the integration of OSM into the official dataset caused several issues from the geometrical matching perspective. Major differences can be attributed to the varying data collection procedures in OSM. In their test areas, some of the data was remotely mapped by contributors from different countries with little to no local knowledge. Hagenauer and Helbich [90] presented an algorithm that allowed the mining of land-use patterns from the OSM street network. This was the first approach that actively enhanced the existing or generated a new dataset based on the collected VGI data. Additionally, Helbich et al. [91] presented a spatial statistical method to compute the positional accuracy of road junctions by extracting and comparing these particular features in OSM to a proprietary dataset.
The temporal development of the OSM dataset in Germany was analyzed in a comprehensive study for the years 2007 to 2011 [21]. The results showed that the total difference between the OSM street network for motorized traffic and a comparable proprietary dataset, i.e., TomTom, was only 9%, indicating missing data in OSM. However, when considering the entire German OSM street network, including pedestrian paths and small trails, the VGI data source exceeded the proprietary information by 27%. The same study and Scheider and Possin [92] revealed that other important information for navigation purposes, such as turn restrictions, included in most proprietary datasets, is oftentimes missing in the OSM dataset.
In 2011, Zielstra and Hochmair [93] conducted one of the first OSM studies outside of Europe. Based on similar methods introduced in prior studies [19], they compared the OSM dataset with proprietary data from TomTom and Navteq for the entire state of Florida (USA). In contrast to prior findings in Europe, the results of this study showed an opposite trend with stronger coverage in OSM for rural areas in Florida, whereas urban areas showed better coverage in TomTom and Navteq. However, the researchers attributed this pattern to the U.S. Census TIGER/Line street data import for OSM in 2008/2009. Data imports are a highly discussed topic within the OSM community, with strong opinions for and against the import of license-compliant datasets. Zielstra et al. [94] analyzed the import of the TIGER/Line dataset for the entire U.S. in more detail and summarized that the community is not focusing on improving the imported dataset. This statement could be made for rural, as well as urban areas. Instead, the OSM community tends to focus on adding more detailed information, such as pedestrian trails, after a data import of all major roads took place.
More detailed analyses for the OSM US dataset were conducted with the focus on the road networks for motorized and non-motorized traffic [95,96]. The analyses included computations of pedestrian routes for different data sources in selected cities in Europe and the U.S., as well as the assessment of pedestrian accessibility to transit stations. Based on the total length comparisons of the generated routes, errors of commission and omission were identified in the datasets. Due to the dense coverage of pedestrian data in German cities, better results were found for European cities in comparison to U.S. cities [95,96]. In a different study for the U.S., the development of cycling-related features in OSM and Google was investigated and compared. Results revealed a high heterogeneity with regard to completeness between analyzed cities and showed that off-road trails were more completely mapped than on-street bicycle lanes [97].

Evaluation of Points of Interest (POI) and Other Features
When considering OSM for navigation and routing purposes, it is important to implement an exact transformation of an address or textual description of a place into a geographic location. This process, referred to as geocoding, was investigated in more detail by Amelunxen [98], who compared the results of the geocoding functionality in Google Maps with the results gathered from OSM. Nearly all requests on the municipal, street and, in particular, house number level were classified as not sufficient for detailed spatial analysis purposes in OSM, highlighting one of the most profound caveats of the project. Jackson et al. [99] showed similar results with regard to address information in their data comparison analysis of point features in OSM and other datasets.
Other studies compared OSM land cover features, such as land use or natural areas, for several countries to pseudo ground-truth data [20,100]. The analysis investigated spatial sample point characteristics of the polygons to retrieve results about cases where features are under- or over-represented (in terms of the number of points used to represent a polygon feature). Furthermore, the distance between the polygons' adjacent points was computed for quality measurements. Ciepluch et al. [101] manually compared the spatial coverage currency and ground-truth positional accuracy of OSM in comparison to Google Maps and Microsoft Bing Maps, revealing no clear pattern in favor of any of the tested sources.
POI features play another major role in VGI data sources, such as OSM. The collection of untraditional places of interest by volunteers, not available in governmental or proprietary datasets, drives the interest of many researchers in this domain. Mashhadi et al. [102] compared POI from Navteq and Yelp with the data collected in OSM for London (UK) and Rome (Italy) and found a highly accurate correlation in terms of geographic position. Hristova et al. [103] used a different approach to POI analysis and tried to map different community engagements based on the contributed POI data in OSM. The results showed that spatially clustered communities produce a higher quality of coverage than those with looser geographic affinity. In a different study, the spatial-semantic interaction of OSM POI was investigated in more detail [104]. The authors presented a semantic similarity measure that can be used to support tools and contributors in collecting and cleaning up POI data. Lastly, a study that investigated the completeness of POI in OSM in the U.S. showed that, in contrast to prior findings about the road network, the imported data from the Geographic Names Information System (GNIS) database was subsequently updated by the OSM community [105].
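As an illustration of the adjacent-point measure mentioned above, the following sketch computes the distances between consecutive vertices of a polygon ring given as latitude/longitude pairs. It uses the haversine formula on a sphere with an assumed mean Earth radius of 6,371 km; the cited studies may have used a different distance computation, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def vertex_spacings(ring):
    """Distances between consecutive vertices of a ring of (lat, lon) pairs."""
    return [
        haversine_m(lat1, lon1, lat2, lon2)
        for (lat1, lon1), (lat2, lon2) in zip(ring, ring[1:])
    ]


def mean_vertex_spacing(ring):
    """Average spacing; large values suggest a coarsely digitised polygon."""
    spacings = vertex_spacings(ring)
    return sum(spacings) / len(spacings)


# Invented example ring (closed: first vertex repeated at the end).
ring = [(0.0, 0.0), (0.0, 0.001), (0.001, 0.001), (0.001, 0.0), (0.0, 0.0)]
spacing = mean_vertex_spacing(ring)
```

A polygon whose mean vertex spacing is large relative to its perimeter is represented by few points, which corresponds to the under-representation case described in the text.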

Data Trust and Vandalism
As in other open-source projects, Linus' Law [106] can have a major impact on the success of VGI. The assumption behind Linus' Law is that with the number of contributors, the quality of the product increases, an assumption which has been proven for Wikipedia, where the quality of an article increases with the number of contributors who work on it [107][108][109]. While Haklay et al. [110] found that the law generally applies to OSM positional accuracy, Mooney and Corcoran [111,112] could not identify a similar pattern when analyzing heavily edited objects in OSM.
Nearly all presented studies discussed thus far showed indications of similar data completeness or improved data quality for densely populated areas in OSM in comparison to proprietary and governmental datasets [19,21,113]. Mooney et al. [3] summarize that the OSM project proves to be heterogeneous with an urban bias and chances are that: "When one moves away from large urban centers the major issue for quality becomes one of coverage - in many rural areas there is little or no OSM coverage at all" [111]. While most conducted analyses focused on different areas in Europe and the U.S., Neis et al. [114] investigated the aforementioned general pattern of OSM data on a larger scale. The study revealed that when comparing selected world regions, the data quality and contributor activity does not necessarily always show the same pattern in all urban areas, particularly outside of Europe. In prior research, it had already been shown that the number of contributors can strongly influence the geometric data quality and spatial concentration of OSM data in different areas [19,68]; additionally, it was also determined that the temporal dataset quality is highly affected by the same criteria [114].
It has been criticized that most prior studies about OSM with the objective of a data quality analysis only consider certain object types (e.g., roads) for descriptive measurements [90]. However, other studies also highlighted the lack of attribute information, such as turn restrictions, speed limits or street names [19,21,83], the lack of a well-defined data standard [12,68,115] or some formal quality control process [99] in OSM and VGI data in general. All of these factors lead to the questionable statement by Fairbairn and Al-Bakri [89] that "it is probably better to have no mapping at all, rather than some inaccurate, possibly incomplete, user generated content". The best approach to answer whether OSM or any other VGI source should be utilized or not is to assess the OSM dataset quality for the selected area of interest and its particular role or purpose in the project [60,116,117]. Therefore, it is important not to look only at the completeness of the map data, but also to review the collected information in more detail, especially in areas where data imports or automated scripts took place and no active contributor community is available. Furthermore, it needs to be noted that the availability of aerial imagery in OSM editors introduces "armchair-mapping" patterns, in which case, the contributors of the project only trace objects from the satellite images and no local knowledge is needed [114]. In most cases, areas with lower OSM community member numbers tend to have higher contributions based on armchair mapping, which stands in contrast to the "local knowledge" most people identify with when they refer to VGI [1,118].
To simplify the evaluation of the OSM dataset for the users' areas of interest, many free and online quality assessment and assurance tools are available to get detailed quality information. Interested users are also able to report errors in the map by using OSM Notes or OpenStreetBugs. Other tools, such as Keep Right, Osmose or OSM Inspector, can be used to visualize detected errors in the map data. However, establishing some sort of trust in the collected VGI dataset is an important factor. Several researchers presented the first approaches on how the volunteers could act as a sort of quality measure. Bishr and Janowicz [69] discussed a possible solution for VGI projects based on trust ratings in a social network that acts as an indicator for user reputation. In the case of the OSM project, this approach would not be suitable, due to the lack of a social network structure [119]. Kessler et al. [70] specified different patterns that can be used to determine trust values based on the history of contributed objects. In a second study, it was shown that for a test area in Germany, the approach can provide useful information for potential data users, even without a reference dataset, and the researchers pointed out that for further analysis, the reputation of each contributor should be considered as an important factor [120].
Although the OSM project showed some promising developments in recent years, the increasing popularity also comes with a number of caveats, especially in the form of cases of vandalism, similar to developments seen in Wikipedia [121]. While Coleman [122] summarized some of the first methods on how to validate contributors and their spatial information, Neis et al. [52] developed the first prototype to automatically detect vandalism in VGI projects and revealed that within a timeframe of one week, at least one case of vandalism or accidentally destroyed objects by new or inexperienced members can be detected in the OSM database each day.

Contributor Analysis
A second large spectrum of VGI research that experienced a strong increase of interest in the research community in recent years is dedicated to the contributor behavior of projects, such as Wikipedia or OSM. In many publications, the voluntary members are referred to as "users" [72,118]. However, in the context of this paper, we want to distinguish between users (who use the data or online information), registered members (who have an account with the VGI project) and contributors (who actively contribute to a VGI project). The reason for this precise classification lies in the fact that the number of users does not reflect the actual number of active contributors and neither does the number of passive members that are only registered to the project, but never actively contribute. It is nearly impossible to determine the actual number of OSM users, since not every user that implements the OSM data in a project or uses it in an application on his or her handheld device needs to be registered with the project. Only the number of registered members can be determined from the OSM database and further processed for analysis.

Participation Inequality
Similarly to other open source-related or online community-based projects, VGI platforms experience a so-called "participation inequality". Nielsen [123] describes this phenomenon with a 90-9-1 rule, representing the 90% of users who never contribute to the project and merely function as "lurkers", the 9% of contributors that add information on an irregular basis and the 1% of contributors that account for almost all the collected information of the project. This phenomenon has been identified for Wikipedia [108,124], as well as for OSM [71,125] in similar ways. The activity of the project's community has a major impact not only on the collection of new geographic data, but also on the timeliness of existing datasets. In the context of UGC platforms, the widely used online encyclopedia, Wikipedia, had almost 20 million registered members at the beginning of October, 2013, of which a total of 1.7 million members (9%) had edited at least one article, but only 125,000 members (0.7%) had performed an action on articles in September, 2013 [126]. It needs to be noted, however, that these numbers do not include changes made by unregistered, anonymous contributors.
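As a minimal sketch of how such a participation inequality could be measured, the following Python snippet computes the share of members who never contributed and the share of all edits made by the most active 1% of members. The function name, the input format (one edit count per registered member) and the toy numbers are assumptions made for illustration; they are not taken from the cited studies.

```python
def participation_summary(edit_counts):
    """Summarize participation inequality.

    edit_counts: list with the total number of edits per registered member
    (0 means the member never contributed). Hypothetical input format.
    """
    n = len(edit_counts)
    total = sum(edit_counts)
    ranked = sorted(edit_counts, reverse=True)
    top_n = max(1, round(n * 0.01))  # size of the most active 1% of members
    return {
        "lurker_share": sum(1 for c in edit_counts if c == 0) / n,
        "top1_edit_share": sum(ranked[:top_n]) / total if total else 0.0,
    }


# Toy community of 100 members: 90 lurkers, 9 occasional contributors
# and one heavy contributor (illustrative values only).
counts = [0] * 90 + [5] * 9 + [955]
summary = participation_summary(counts)
print(summary)
```

In this toy example the single most active member accounts for the vast majority of edits, which mirrors the shape of the 90-9-1 rule without claiming anything about real OSM or Wikipedia numbers.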
Similar patterns can be found in VGI projects, such as OSM. In 2008, about 10% of the 30,000 registered members actively contributed to OSM [127]. This positive trend continued in the following year (2009), for which a study had shown that, in total, about 33,400 (28%) of the 120,000 registered members edited data for the project [125]. In 2010, the number of registered members increased to 300,000, of which almost 5% (16,500) actively contributed to the project on a monthly basis and only 3.5% of all members (12,000) accounted for 98% of the data volume [21]. In a more recent study, it was shown that 38% of the 500,000 registered OSM members edited at least one object of the project's dataset in 2011 [71]. Figure 2 illustrates the growth of registered members and their corresponding activity over the past eight years. It also highlights the strong discrepancy between the number of registered members and the active contributors who created a changeset. Additionally, Figure 2 illustrates the significant difference between the number of "one-time-only" contributors [7] who created only one changeset and contributors who performed several edits in the OSM database.
The research conducted by Neis and Zipf [71] also analyzed the contributor activities by day of week and time of day. Almost all weekdays showed similar contribution patterns; only on Sundays did the project prove to have a slightly larger number of data edits, while the afternoon and evening hours were identified as time ranges with the highest activity in OSM for each day. Overall, the discussed results of the OSM project match similar patterns previously found in Wikipedia [128] or mobile phone communication behavior [129]. Neis and Zipf [71] also identified different member groups in OSM based on the number of contributions the members made to the project and revealed that only around 5% of all members contributed in a significant way. Although the absolute number of registered members is still increasing, the relative number of active members has been decreasing in the past two years. In 2012, only around 3% of all registered members made a contribution each month; at the end of 2012, only 18,000 (1.8%) of the one million registered members actively contributed any data. This negative trend continued in 2013. As of October, 2013, the OSM project had almost 1.4 million registered members and the number of active contributors in that month was only around 22,000 (1.6%) [130]. The negative trend in the relative contribution share is mainly influenced by the high number of newly registered members in 2013 (Figure 2). Based on these prior findings, a number of studies questioned the long-term motivation of the contributors of the project [7,21,118]. When analyzing all created changesets of the OSM project, the increase in monthly volunteer numbers over the past few years and the consistencies in data contributions can be visualized as shown in Figure 3. Only half of the active members in OSM that contributed in the month of October (2013) are also long-term contributors who registered before or during 2012. A clear pattern can also be seen for the years 2008 to 2012, with almost 70% of contributors being lost over the following years as they stopped making significant data edits and object changes in OSM.
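The relative shares quoted above follow directly from the absolute member numbers. As a small worked example, the Python snippet below recomputes the share of active contributors from the rounded figures given in the text; the dictionary layout and the function name are illustrative assumptions.

```python
# (registered members, active contributors) as quoted in the text;
# both values are rounded figures, so the shares are approximate.
figures = {
    "end of 2012": (1_000_000, 18_000),
    "October 2013": (1_400_000, 22_000),
}


def active_share(registered, active):
    """Return the percentage of registered members who actively contributed."""
    return round(100 * active / registered, 1)


shares = {label: active_share(*pair) for label, pair in figures.items()}
for label, share in shares.items():
    print(f"{label}: {share}% active")
```

The decrease from 1.8% to 1.6% occurs even though the absolute number of active contributors grew, because registrations grew faster.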

Areal Distribution
The areal distribution of the active OSM community members shows a similar heterogeneous pattern as the aforementioned data quality and quantity analyses.Since an OSM member does not have to provide his or her location information when registering to the project, Budhathoki [125] and Neis and Zipf [71] localized the members by utilizing different approaches.Budhathoki [125] analyzed the number of added Nodes per country for each contributor, whereas Neis and Zipf [71] focused on the first edit of a contributor or the area which shows the most activity, to identify the origin of the project members.Both studies showed similar results for the years 2010 and 2012 in which three-quarters of the contributors were located in Europe.The remaining quarter was distributed over North America and Asia.South America, Africa and Oceania proved to have only a small contributor number.When considering the population density of the different countries, it is surprising to see that the USA, China or India only show relatively small project contributor numbers, which can be caused by a number of reasons, such as other freely available datasets, such as the TIGER/Line data for the U.S., or governmental restrictions that make the collection of geodata illegal in certain countries.Additionally, Neis et al. [114] illustrated that other factors next to population density or income must have an influence on contributor growth.The highest concentration of active contributors in OSM can be found in Germany.Out of the 2500 daily contributors, around 550 members (22%) edited data in Germany [130].Thus, it is not surprising to see that the German OSM dataset also shows a higher quality.Figure 4 illustrates the distribution of active OSM contributors per day related to population in millions (a) and area (b) for each country, highlighting the strong concentration of OSM contributors in Europe (b).
The OSM contributors, however, do not limit their data collection efforts to their home regions. Budhathoki [125] and Neis and Zipf [71] revealed that a smaller number of highly active OSM members collect data in two or more countries.
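The localization heuristics described in this section can be sketched in a few lines of Python. The example below is a simplified illustration and not the implementation used in the cited studies: it assumes that each edit of a contributor has already been assigned to a country, and it derives the country of the first edit, the country with the most edits and the number of distinct countries edited. The function name and the input format are hypothetical.

```python
from collections import Counter


def locate_contributor(edits):
    """Estimate the origin of one contributor.

    edits: list of (ISO timestamp, country code) tuples (assumed format).
    Returns (country of first edit, most active country, number of countries).
    """
    ordered = sorted(edits)  # ISO timestamps sort chronologically as strings
    first_edit_country = ordered[0][1]
    per_country = Counter(country for _, country in edits)
    most_active_country = per_country.most_common(1)[0][0]
    return first_edit_country, most_active_country, len(per_country)


# Illustrative edit history of a single contributor.
edits = [
    ("2010-03-01", "DE"),
    ("2010-03-05", "DE"),
    ("2010-07-12", "FR"),
    ("2011-01-20", "DE"),
]
result = locate_contributor(edits)
print(result)
```

The two heuristics can disagree, for example when a contributor's first edit was made while traveling, which is one reason why the studies report using more than one approach.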

Motivation, Behavior and Gender Dimensions
A number of studies in recent years also provided more insight about the discrepancies in contributor motivation, behavior or gender dimensions in VGI projects, such as OSM. In most cases, an extensive survey was conducted to gather detailed information about the contributors of the project. Budhathoki [125] and Budhathoki and Haythornthwaite [23] showed in a comprehensive study which criteria increase the contributor motivation in VGI projects. Similar studies were conducted with a focus on UGC platforms, such as Wikipedia or other open source software development communities [7], whereas Steinmann et al. [131] compared the motivating factors for several online portals, such as OSM, Google Map Maker, Foursquare, Panoramio, Facebook and Wikipedia. Furthermore, the different factors have been classified by the authors into intrinsic and extrinsic or constructive and negative classes. Table 2 provides an overview of the classification schemas based on the aforementioned studies. The motivation of the different project members was one of the main criteria the researchers investigated in their studies. However, demographic factors, such as age, gender or educational background of the project's participants, were also analyzed in more detail. The results showed that the majority of OSM contributors, i.e., more than 97%, were males [23,132,133]. For other online portals, such as Foursquare or Facebook, the participant data did not show this biased gender distribution; however, the female participation rate dropped substantially when geographic information was introduced, for instance during the geotagging procedure for images or posts on the social networking platforms [72,73]. The comparison of contributor gender distribution between different mapping platforms showed that women and men contribute to Google Map Maker at equal rates, whereas the number of female contributors significantly dropped when considering contributions to OSM [72]. However, these findings differ from the results by Steinmann et al. [73], where OSM and Google Map Maker showed equally low female participation values.
The analysis of the OSM contributor age distribution revealed that the majority of contributors (>60%) are between 20 and 40 years old, and about 20% of the mappers are 40 years or older [23,72].The contributors also provided information about their educational background during the surveys, and the results showed that 63%-78% had a college, university or higher education degree [23,72,133].
Many research articles stated in the past that VGI is mostly contributed by non-experts [12] or volunteers who are untrained and unqualified [19,67,134]. Janowicz and Hitzler [135] summarized that VGI is collected and edited by a heterogeneous online community with different backgrounds and application areas in mind. However, the conducted surveys did not support these statements entirely [72,125,133]. Almost 50% of the respondents of each survey had degrees or worked in the fields of geography, geomatics, urban planning or computer/information sciences, highlighting that the OSM community does not necessarily consist only of GIS amateurs, as is oftentimes speculated [125].
Next to the aforementioned socioeconomic factors, the social interaction between contributors in OSM and their individual contribution patterns played a major role in a number of research articles.Oftentimes, only a few contributors collect the majority of data volume in a predefined area [112] or entire country [21].Mooney and Corcoran [119,136] attempted to identify explicit social networks between contributors based on this assumption.The results showed that most collaboration in OSM is purely incidental and that data contributions are mainly done in isolation.
In 2009, a first attempt was made to classify VGI project members into different overlapping contributor categories, such as neophyte, interested amateur, expert amateur, expert professional and expert authority [7].For OSM, the contributors were oftentimes classified or sorted based on their contribution patterns.Mooney and Corcoran [112] and Neis and Zipf [72] grouped the members of the OSM project based on the number of contributions or created objects to analyze the different groups in more detail.Mooney and Corcoran [119] classified the OSM contributors of the London (UK) area into four distinct groups, highlighting that the majority of the contributors edited the geometry of objects or their corresponding attributes, but in many cases not both.Steinmann et al. [131] utilized a clustering method based on the contribution and feature types of each contributor to identify different patterns in mapping behavior.As a result, contribution profiles, such as "Premium Creator", "Highway Mapper" or "All-Rounder" were created.

Additional Developments
Many published research articles in recent years did not intrinsically analyze OSM data quality or contributor patterns, but utilized the available dataset in a number of applications to investigate the fitness for purpose of the contributions. With the increasing popularity of 3D applications, researchers tested the applicability of OSM for 3D applications or 3D location-based services [113]. In the following years, the first publications suggested how the OSM schema could be extended to indoor environments [137]. Others suggested methods on how to transform OSM data to the standardized City Geography Markup Language (CityGML) models [138] or how to use it for indoor evacuation simulations [139].
In particular, the potential of OSM data for routing or trip-planner applications attracted a high interest in the research community. Neis and Zipf [36] presented a first approach on how OSM data can be utilized for routing and address or POI allocations. Luxen and Vetter [38] improved the OSM routing performance with an open source mobile and server route planning application utilizing a contraction hierarchy method that enabled faster route computations [140]. OSM data was also used in robot tasks and autonomous vehicle applications [141], whereas others augmented OSM route network data with Shuttle Radar Topography Mission (SRTM) height information to compute the optimal path for electric vehicles [142].
The open approach to data contributions in OSM allows for the development of a plethora of applications, online or printable maps tailored to particular needs, such as hiking, biking, skiing or public transportation. The detailed data requirements for routing applications tailored to disabled people, such as pavement width or surface conditions, can be added to OSM and annotated with a selected number of tags. First studies introduced how OSM and its tagging schema can be utilized for applications tailored to wheelchair users [143,144] or visually impaired pedestrians that utilize haptic feedback [145]. The Wheelmap project [146] is a good example for this particular case. Any volunteer can mark locations with wheelchair-friendly environments or accessibility. The results and detailed information of the Wheelmap project are then saved to the OSM database.
A second large potential of OSM and other VGI-related projects lies in their support function in decision-making processes during disaster management. Up-to-date geodata that includes detailed accessibility information for particular crisis regions can be of crucial importance during the relief response operations of organizations, such as the Red Cross. The data of the OSM project can be utilized in many ways during these events, due to its fast data processing methods and timely map updates. Additionally, the conversion of OSM data to Shapefiles, or other source files for handheld GPS devices, helps to develop tailored LBS applications and run spatial analysis tasks, for instance during natural disaster events. The success of OSM during these events has been proven during a devastating earthquake in Haiti in 2010, a tsunami in Japan in 2011 and after Typhoon Yolanda in the Philippines in 2013. Contributors of the OSM project helped to collaboratively collect geodata for the crisis areas. At the latter event, more than 1500 contributors from 80 countries made more than 3.8 million map changes within 15 days [147]. This was a significant increase in member activity in comparison to the prior events, where almost 700 (Haiti) and 350 (Japan) contributors collected information for the affected regions. Auer and Zipf [148] also demonstrated that open data and open standards can help reduce costs, whereas Goetz et al. [149] presented a workflow to develop an online map completely based on OSM data.

Future Trends
This extensive review of previous research articles has shown that VGI and, especially, OSM have experienced a strong increase of interest within the research community over the past few years.Although an impressive number of research projects focused on the quality assessment and contributor analysis of VGI, questions remain about different research domains when considering the voluntarily contributed datasets.
Most VGI data quality analyses in the past were conducted using a commercial or governmental reference dataset of high quality. While the general question remains whether these proprietary datasets can really be considered as more accurate than VGI, or if the opposite situation should be considered during the analysis, others focus on intrinsic data evaluations if no reference dataset exists or is not available, due to high costs or other factors. In the case of OSM, these intrinsic approaches included the evaluation of the number of edits or the number of contributors in a predefined area [21,110,114]. However, more research needs to be conducted to evaluate the effectiveness of these studies. If a number of contributors in OSM stop collecting a certain feature type in a predefined area and start collecting other information, does this imply that this particular object type is completely mapped in the area of interest or are there other criteria that can play a role? Additionally, particularly for quality assessment analyses, it would be necessary to have more detailed information about how the data was collected. Did the contributor use a GPS device, the available aerial imagery or is her/his contribution based on a data import? Although the OSM project provides several tags to specify the source or contribution type, the overall usage of those key/value pairs is limited within the community.
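How widely such provenance tags are used in a given area can be checked with a simple count. The following Python sketch assumes that the objects of an area of interest are already available as plain tag dictionaries (a hypothetical input format) and reports the share of objects that carry a `source` key; the sample objects are invented for illustration.

```python
def source_tag_share(objects):
    """Return the fraction of objects that carry a 'source' tag.

    objects: list of tag dictionaries, one per OSM object (assumed format).
    """
    if not objects:
        return 0.0
    tagged = sum(1 for tags in objects if "source" in tags)
    return tagged / len(objects)


# Illustrative sample; real data would come from an OSM extract.
sample = [
    {"highway": "residential", "source": "GPS"},
    {"building": "yes"},
    {"highway": "footway", "source": "Bing"},
    {"amenity": "cafe", "name": "Example"},
]
share = source_tag_share(sample)
print(f"{share:.0%} of the objects specify a source")
```

A low share would indicate that the collection method cannot be reconstructed from object tags alone for that area, so other evidence, such as changeset metadata, would be needed.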
Similarly, the discussion about trust in VGI is ongoing. Several studies have shown that VGI data can be used as an alternative to commercial or proprietary datasets and in the case of OSM, different companies already switched their mapping products to the freely available dataset. Credibility and trust in VGI play a major role in these cases, but how can a VGI project, such as OSM, with no strict data specification or quality control, establish some type of trust? None of the previously conducted research projects considered one of the most important components of a VGI project: the individual contributor reputation. How can this reputation of a contributor be computed to provide a better understanding about the quality and trust of the data? What parameters are necessary to assess the quality of the contributions? Some first quantitative parameters, such as the number of created or edited objects, have been investigated in prior studies to give some first insights. However, for a sophisticated trust estimation of a collaboratively collected dataset, other factors, such as the home location of a contributor, the mapping behavior, especially with regard to the usage of tags that represent a special area of interest of the contributor, or her/his acceptance and reputation within the community can play major roles. Would a contributor rating and reputation system, as discussed by Bishr and Kuhn [12] and Flanagin and Metzger [67], be useful to calculate the credibility and trust values as a proxy for VGI quality assessments?
In one of the aforementioned studies, a method was introduced that created contributor profiles, such as "Highway Mapper" or "Building Mapper", based on the added information of the contributor [131].A study about Wikipedia contributors revealed, however, a relation between the quality of an article and its authors and concluded that it is more important who contributes to the article and not as much what type of information was added [150].Similar methods that separate contributors by their reputation and experience rather than the type of information they contribute are required for VGI projects.
Neis et al. [114] proved in their study that for some areas, the majority of OSM data was collected by external contributors, whose home location was more than 1000 km away from the area in which the data was contributed.It can be assumed that these data contributions are made through tracing aerial imagery, and prior studies have shown that this mapping behavior can lead to an overrepresentation in the geometry of a feature or to missing feature descriptions, such as street names or other information [20,100,114].Thus, more detailed analyses are needed to determine whether external or remote members provide data with a better, equal or worse quality when contributing to the project.Additionally, Comber et al. [151] revealed that contributors oftentimes add information on different scales due to the resolution of the aerial imagery that is available.Similar to the work by Touya and Brando-Escobar [152], who presented a first approach to calculate the level-of-detail of different OSM features, it needs to be investigated how the scale of the contributed geo-information can potentially be used for VGI quality assessment.
The geographic scope of prior VGI analyses had a strong focus on areas in Europe and, to a lesser extent, the United States.One of the aforementioned studies [114] highlighted in a first comparison of selected urban areas of the world that factors, such as population density and income, can have an influence on data contributions and community efforts in the different selected regions.Goodchild [116] also pointed out that the digital divide could highly influence the mapping activities in less developed countries.Thus, a strict distinction between the developments of VGI in different geographic world regions and the analysis of potential socio-economic or cultural influential factors could give a better understanding about the individual motivations to contribute to a VGI project for each world region.
Due to the discrepancies in contributor concentration around the world in VGI projects, such as OSM, sometimes, the particular area of interest does not contain the required data or data types that the user would like to implement in a desired project.Hagenauer and Helbich [90] demonstrated that based on the availability of different feature types in OSM, other missing geographic objects, such as land use information, can be derived.Future research could investigate this process in more detail to see what type of additional geographic features can be derived based on existing objects in the dataset.A different approach that combines and enriches current datasets through data conflation of multiple sources has been the focus of many researchers in recent years.It needs to be noted that licensing conventions, such as the ODbL license in OSM, can be a hindrance in these cases, due to the limitations of the license, whereas other VGI data sources with fewer restrictions, such as geolocated images on Flickr or Panoramio, or tweets on Twitter, could help to improve proprietary or governmental datasets [4].
One of the major concerns that arose in recent years about the OSM project is the lack of detailed and complete address information. Adding this information is a tedious process that does not give contributors the same type of satisfaction as the collection of roads and buildings that are visualized in the project's map. However, how can this issue of missing or incomplete data in VGI projects be solved? Several companies have been supporting the development of a number of tools that display incorrectly mapped information in OSM or that help contributors to easily collect the required data types. A study has shown that services operating on OSM have a regulative and quality assuring effect [37]. The study, conducted in 2008, showed that the number of topology network errors was reduced after an online route planner based on OSM data was available. Similar approaches have been discussed under the term "gamification" to engage new or established contributors of the OSM project to solve errors in the collected dataset in the next few years. A popular example in this domain is the Kort Game, a mobile web-app to repair OSM data, which already showed that OSM contributors and other volunteers are willing to enter missing information, such as the name of a POI, to the dataset.
Due to several data imports, automated data edits and software issues in OSM, it is very important that researchers consider who created, modified or deleted the data in the area of interest during their analyses.Features and objects that were created during a data import, modified by an automated script (bot) or deleted by accident or as an act of vandalism do not represent the general pattern of the dataset and need to be highlighted or excluded during the analysis.Neis and Zipf [71] stated that before 2011, a software error in one of the OSM editors increased the version number of each object, which falls into the extent of a certain changeset, although it was not changed by the contributor, a major error that especially needs consideration in studies that utilize the OSM full history dump files.Zielstra et al. [94] showed in their study that the majority of motorized traffic-related data contributions in the U.S. are based on data imports or were changed by automated edits.Unless it is the purpose of the study to identify these patterns, researchers need to be aware of the potential errors that are caused by these procedures.
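A minimal sketch of such a pre-filtering step is shown below. It is not a method from the cited studies: the changeset dictionaries are a hypothetical input format, and the heuristics (a `bot=yes` or `import=yes` changeset tag, the word "import" in the `source` tag, or a user name on a manually maintained list) are assumptions that would need to be adapted and validated for a real analysis.

```python
def is_automated(changeset, known_bot_accounts=frozenset()):
    """Heuristically flag a changeset as an import or automated edit.

    changeset: dict with 'user' and 'tags' keys (assumed format).
    known_bot_accounts: manually maintained set of account names (assumption).
    """
    tags = changeset.get("tags", {})
    if tags.get("bot") == "yes" or tags.get("import") == "yes":
        return True
    if "import" in tags.get("source", "").lower():
        return True
    return changeset.get("user") in known_bot_accounts


# Illustrative changesets; user names and tags are invented.
changesets = [
    {"id": 1, "user": "alice", "tags": {"comment": "added shops"}},
    {"id": 2, "user": "tiger_import", "tags": {"source": "TIGER import"}},
    {"id": 3, "user": "fixbot", "tags": {"bot": "yes"}},
    {"id": 4, "user": "cleanup_script", "tags": {}},
]
manual = [c["id"] for c in changesets if not is_automated(c, {"cleanup_script"})]
print(manual)
```

Tag-based heuristics of this kind only catch edits that were declared by their authors, so untagged imports or scripts would still pass the filter and may require additional checks, such as unusually large changesets.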
The detection of the aforementioned vandalism cases in VGI projects has been previously investigated for other UGC-related projects, such as Wikipedia [121].Although only a small number of vandalism cases were detected in OSM in recent years, a study [52] revealed that the number of vandalism cases correlates with the popularity increase of the OSM project.Thus, the need for methods and designated tools that provide secure VGI vandalism detection will spark some of the OSM-related future research.Goodchild and Li [153] proposed that VGI communities should implement data control methods, which could be based on a review system similar to the "edit-reviewer" function in Google Map Maker, in which contributions of newly active members are checked for their eligibility [154].In the case of OSM, the question remains if there are enough volunteers available that are willing to work on manual data validation tasks to approve data edits made by unknown contributors [52].Based on the findings of prior studies, this could be a difficult task, since most contributors restrict their edits and updates to their own collected data [119,136].
A number of significant questions about the long-term motivation and the future of VGI platforms and their contributors have been asked by many in recent years.Based on the presented results in this article about the OSM community, we can say that at least every third contributor, of all the active contributors that ever added information, will continue to contribute over several years.Future research could reveal what type of information long-term members contribute.Do they collect new data in different areas or do they start collecting more detailed information, such as trees or sidewalk and surface information, in their home area?On the other hand, it would be interesting to know what demotivates contributors and makes them stop contributing data to the project.One possible answer could be that there is "no-more interesting work" left to do [118].Besides the motivation of current contributors, others raise questions on how new volunteers can be attracted to VGI projects [2].In the past three months (August to October, 2013), OSM increased by 1,000 new members each day, of which 150 actively started contributing.Compared with prior findings, these numbers reveal a decreasing pattern in the number of registered members per day [21].However, the number of newly active members is identical between the years 2011 and 2013.

Conclusions
UGC and VGI online platforms have developed into a well-known phenomenon on the Internet in recent years.The review of previously conducted research in the realm of VGI in this article has clearly shown that the research community sees a lot of potential in the freely available data sources.OSM, with its exceptionally large community of registered and active contributors, demonstrates that collaboratively collected geographic information by volunteers around the world can lead to an impressive data source for multiple applications.In our study, we provided a comprehensive overview about the recent research projects in the field of VGI with a strong focus on the OSM project.The research efforts have been separated into two main domains: data quality and contributor analysis.A detailed discussion of the latest literature for each domain highlights the methodologies and findings for each study.VGI data quality analyses sparked the interest of the research community in the first few years after the platforms attracted more attention, resulting in a large body of studies in this domain.In more recent years, the analysis of contributor behavior, motivation and gender distribution in VGI projects has experienced more attention in the research community.
Many VGI quality analyses have demonstrated in the past that the freely available data can be used for a variety of applications.However, it is still an important and major task to evaluate whether the data is acceptable for each use case.The quality of VGI datasets, as has been proven for OSM, can be heterogeneous when considering different countries or discrepancies between rural and urban areas.There is no reliable estimation if a certain object or other detailed attribute information is included in a VGI dataset, unless the potential user investigates the data in more detail and compares it to ground truth information or a reference dataset of choice.The long-term motivation of the volunteers that contribute to VGI projects, which has been questioned in recent years, has been proven, at least for the OSM project in this study.
Based on the most recent developments in VGI research, we were able to discuss potential future trends in research and development, especially for the case of OSM. The intrinsic data assessment approach can be utilized for countries where no reference dataset is available or potential costs are too high to acquire the datasets needed. However, new methods need to be developed that utilize this approach and potentially include multiple VGI datasets. Similarly, conflation methods that utilize several VGI sources or combine VGI with other license-conforming datasets could be helpful in the near future. To make VGI more respected and eligible for these tasks, however, questions about credibility and trust in VGI datasets, with a focus on the contributor, need to be answered. Thus, the development of new methods that compute a trust factor, contributor reputation or individual contributor data quality is required. Finally, additional surveys are needed to gather more information about the differences in contributor motivation and behavior, especially when considering different continents and cultures.

Figure 1. OpenStreetMap (OSM) project digital infrastructure and its community. IRC, Internet relay chat.

Table 1. Comparison of volunteered geographic information (VGI) projects. ODbL, Open Database License; API, application programming interface; N/A, not available; CC BY-SA, Creative Commons Attribution-Share Alike.