Crowdsourcing, Citizen Science or Volunteered Geographic Information? The Current State of Crowdsourced Geographic Information

Abstract: Citizens are increasingly becoming an important source of geographic information, sometimes entering domains that had until recently been the exclusive realm of authoritative agencies. This activity has a very diverse character as it can, amongst other things, be active or passive, involve spatial or aspatial data and the data provided can be variable in terms of key attributes such as format, description and quality. Unsurprisingly, therefore, there are a variety of terms used to describe data arising from citizens. In this article, the expressions used to describe citizen sensing of geographic information are reviewed and their use over time explored, prior to categorizing them and highlighting key issues in the current state of the subject. The latter involved a review of ~100 Internet sites with particular focus on their thematic topic, the nature of the data and issues such as incentives for contributors. This review suggests that most sites involve active rather than passive contribution, with citizens typically motivated by the desire to aid a worthy cause, often receiving little training. As such, this article provides a snapshot of the role of citizens in crowdsourcing geographic information and a guide to the current status of this rapidly emerging and evolving subject.

ISPRS Int. J. Geo-Inf. 2016, 5, 55

[...] were added, which were then evaluated using a much broader set of criteria than used in Reference [15]. These include: the thematic area in which the initiative fell (Section 3.1); the nature of the spatial data collected (Section 3.2); the level of expertise and training needed (Section 3.3); access to the data and metadata (Section 3.4); measures of quality assurance and use of the data in research (Section 3.5); information about the participants (Section 3.6) and what incentives there were for participation (Section 3.7). Based on these findings, Section 4 provides a discussion of the main issues raised in Sections 2 and 3 and provides some suggestions for areas where further research is needed.

Definitions

Table 1 is a compilation of the different terms that have appeared in the literature to represent the general subject of citizen-derived geographical information, along with definitions and attributions. Underneath each term is the year in which it appeared. These terms were then divided into types as indicated in the third column of the table. If a term refers to data and/or information collected, then it is labelled with the letter "I" for information. For example, ambient geographic information is actual data collected from users and so the type is "I". The second type reflects whether a term refers to a process or mechanism that can result in the generation of information, for example, a citizen science initiative. If so, this is denoted by the letter "P" for process in the last column. Figure 1 is an attempt to place all these terms into a single representation that separates out the different terminology for information from the process that can be used to generate it. It must be stressed that Figure 1 is a simplification, aiming to provide a summary of the broad general nature of the topic. Thus, while some of the content extracted from Wikipedia and data from social media can, for example, be georeferenced, the majority of this user-generated content is aspatial. However, some data from social media have been used as passive crowdsourced geographical information. Examples include the use of photographs from Flickr for assessing the accuracy of Corine land cover [9] or the use of Twitter for determining whether earthquakes were felt [8]. We refer to the righthand side of Figure 1 as "crowdsourced geographic information" as this term covers both [...]

Table 1. Terminology and definitions found in the literature arranged alphabetically. Type I refers to information generated, while P refers to a process-based term.

Terminology Definition Type
Ambient geographic information (AGI)
This term first appeared in Stefanidis et al. [16] in relation to the analysis of Twitter data. AGI, in contrast to VGI, is passively contributed data in which the people themselves may be seen as the observable phenomena, rather than only as sensors. These observations can therefore help us to better understand human behavior and patterns in social systems. However, the focus can also be on the content of the data.
Type: I

Citizen-contributed geographic information (CCGI)
CCGI was introduced in Spyratos et al. [13], where the definition is based on the purpose of the data collection exercise. CCGI therefore has two main components: information generated for scientific-oriented voluntary activities, i.e., VGI, or from social media, which they refer to as social geographic data (SGD).

Citizen science
Citizen science was the name of a book written by Alan Irwin in 1995 which discussed the complementary nature of knowledge from citizens with that of science [17]. Rick Bonney of Cornell's Laboratory of Ornithology first referred to citizen science in the mid-nineties [18] as an alternative term for public participation in scientific research, although citizens have had a long history of involvement in science [19]. A more recent definition from the Green Paper on Citizen Science for Europe [20] reads as follows: "the general public engagement in scientific research activities when citizens actively contribute to science either with their intellectual effort or surrounding knowledge or with their tools and resources. Participants provide experimental data and facilities for researchers, raise new questions and co-create a new scientific culture. While adding value, volunteers acquire new learning and skills, and deeper understanding of the scientific work in an appealing way. As a result of this open, networked and trans-disciplinary scenario, science-society-policy interactions are improved leading to a more democratic research, based on evidence-informed decision making as is scientific research conducted, in whole or in part, by amateur or non-professional scientists." The idea of more "democratic research" and the democratization of GIS and geographic knowledge has recently been challenged in Reference [21], which argues that neogeography (see below for a definition) has opened up access to geographic information to only a small part of society (technologically literate, educated, etc.).

Collaborative mapping (2003)
Collaborative mapping is the collective creation of online maps (as representations of real-world phenomena) that can be accessed, modified and annotated online by multiple contributors as outlined in MacGillavry [22].
Type: P

Collaboratively contributed geospatial information (CCGI)
CCGI is a precursor to the term VGI, meaning user contributed geospatial information, which appeared in Bishr and Kuhn [23] and again in Keßler et al. [24]. CCGI implies collaboration between individuals while VGI has more of an individual component based on the views of Goodchild (see the definition of VGI below).

Contributed geographic information (CGI)
Harvey [10] distinguishes between CGI and VGI where CGI refers to geographic information "that has been collected without the immediate knowledge and explicit decision of a person using mobile technology that records location" whereas VGI refers to geographic information collected with the knowledge and explicit decision of a person. In VGI, data are collected using an "opt-in" agreement (e.g., OpenStreetMap and Geocaching where users choose to actively participate) in contrast to CGI, where data are collected via an "opt-out" agreement (e.g., cell phone tracking, RFID-enabled transport cards, other sensor data). Since opt-out agreements are more open-ended and offer few possibilities to control the data collection, this has implications for quality, bias assessment and fitness-for-use of the data in later analyses or in visualization. Harvey [10] raises issues such as data provenance, potential reuse of the data, privacy (both of the data and the location of the individual) and liability as key concerns for CGI.

Crowdsourcing (2006)
Crowdsourcing first appeared in Howe [4] where it was defined as a business practice in which an activity is outsourced to the crowd. The word crowdsourcing also implies a low cost solution, the involvement of large numbers of people and the fact that it has value as a business model. A classic example of a business-oriented crowdsourcing site is Amazon Mechanical Turk, which provides micro-payments to participants for undertaking small tasks, e.g., classification and transcription tasks [25]. More recently, Estellés-Arolas and González-Ladrón-de-Guevara [26] examined 32 definitions of crowdsourcing in the literature to produce a single definition as follows: "Crowdsourcing is a type of participative online activity in which an individual, an institution, a non-profit organization, or company proposes to a group of individuals of varying knowledge, heterogeneity, and number, via a flexible open call, the voluntary undertaking of a task. The undertaking of the task, of variable complexity and modularity, and in which the crowd should participate bringing their work, money, knowledge and/or experience, always entails mutual benefit. The user will receive the satisfaction of a given type of need, be it economic, social recognition, self-esteem, or the development of individual skills, while the crowdsourcer will obtain and utilize to their advantage what the user has brought to the venture, whose form will depend on the type of activity undertaken." This definition emphasizes the online nature of the activity, which makes it narrower than other definitions in this table. Data collection in citizen science projects can be undertaken in the field or using paper forms. Moreover, not all crowdsourcing need be open to all but could be restricted geographically or to groups with certain expertise. Digital and educational divides also impose barriers on participation. Finally, crowdsourcing may not always entail mutual benefit if the data collected are then used for another purpose that differs from the one for which they were originally intended.
Type: P

Extreme citizen science (2011)
Extreme citizen science can be attributed to Muki Haklay and his team at UCL (ExCiteS). Extreme citizen science is at level 4 (or the highest level) of participation in the typology presented in Haklay [27]. Level 4 refers to collaborative science where the citizens participate heavily in, or lead on, problem definition, data collection and analysis. It conveys the idea of a "completely integrated activity ... where professional and non-professional scientists are involved in deciding on which scientific problems to work and on the nature of the data collection so that it is valid and answers the needs of scientific protocols while matching the motivations and interests of the participants. The participants can choose their level of engagement and can be potentially involved in the analysis and publication or utilisation of results." Scientists have more of a role as facilitators or the project could be entirely driven and run by citizens.

Geocollaboration (2004)
First defined by MacEachren and Brewer [28] as "visually-enabled collaboration with geospatial information through geospatial technologies." Geocollaboration involves two or more people solving a problem or undertaking a task together involving geographic information and a computer-supported environment. Tomaszewski [29] emphasizes that geocollaboration is multidisciplinary in nature, drawing upon human-computer interaction, computer science and psychology, and that it is a subset of the more general computer-supported collaborative work.

Geographic citizen science
Citizen science with a geographic or spatial context. The term appears in Haklay's [27] chapter on typology of participation in citizen science and VGI.
Type: P

GeoWeb (or GeoSpatialWeb) (or Geographic World Wide Web) (1994/2006)
The GeoWeb is the merging of spatial information with non-spatial attribute data on the web, which allows for spatial searching of the Internet. The concept (but not the actual term) was first outlined by Herring [30]. MacGuire [31] describes the GeoWeb 2.0 as the next step in the publishing, discovery and use of geographic data. It is a system of systems (GIS clients and servers, service providers, GIS portals, standards, collaboration agreements, etc.), which is very much in line with the idea of GEOSS (Global Earth Observation System of Systems).

Involuntary geographic information (iVGI)
This term first appeared in a paper by Fischer [32]. iVGI is defined as georeferenced data that have not been voluntarily provided by the individual and could be used for many purposes including mapping but also for more commercial applications such as geodemographic profiling. These types of data are usually generated in real-time from various kinds of social media.

Map hacking
The term "hacker" has been used to refer to someone who tries to break into a computer system. A more positive use of the term is someone who can devise a clever solution to a programming problem; someone who generally enjoys programming; or someone who can appreciate good "hacks" [33]. The term "Map hacking" has been used quite specifically in relation to computer/video games in which a player executes a program that allows them to bypass obstacles or see more than they should actually be allowed to see, which is essentially a type of cheating [34]. However, a positive usage relates to creating creative and useful solutions with digital maps, e.g., see the book called "Hacking Google Maps and Google Earth" or "Google Maps Hack" or "Mapping Hacks: Tips & Tools for Electronic Cartography". Hackathons such as "Random Hacks of Kindness" have resulted in geospatial solutions in the area of post-disaster response. Appathons are now appearing with a particular emphasis on developing mobile applications.
Type: P

Mashup (1999 or around time of Web 2.0)
The term mashup was borrowed from the music industry where it originally denoted a piece of music that had been created by blending two or more songs. In a geographic context, a mashup is the integration of geographic information from sources that are distributed across the Internet to create a new application or service [35]. Mashup can also refer to a digital media file that contains a combination of elements including text, maps, audio, video and animation, to effectively create a new, derivative work from the existing pieces.

Neogeography
Neogeography has been defined by Turner [3] as the making and sharing of maps by individuals, using the increasing number of tools and resources that are freely available. Implicit in this definition is the movement away from traditional map making by professionals. The definition of neogeography by Szott [36,37] [...]

Public participation in scientific research (PPSR) ([...] [38] but is most likely older)
PPSR was reviewed by Bonney et al. [40] in relation to informal science education. PPSR is defined as "public involvement in science including choosing or defining questions for the study; gathering information and resources; developing hypotheses; designing data collection and methodologies; collecting data; analyzing data; interpreting data and drawing conclusions; disseminating results; discussing results and asking new questions." Bonney et al. [40] categorize PPSR projects into three main types: contributory (mostly data collection); collaborative (data collection and refining project design, analyzing data, disseminating results); and co-created (designed together by scientists and the general public where the public inputs to most or all of the steps in the scientific process). PPSR appears to be equivalent to citizen science, with the typology defined by Bonney et al. [40] mapping fairly closely onto that of Haklay [27].
Type: P

Public Participation Geographic Information Systems (PPGIS) (1996)
The term PPGIS (Public Participation Geographic Information Systems) has its origins in a workshop organized by the National Center for Geographic Information and Analysis (NCGIA) in Orono, Maine, USA, on 10-13 July 1996. PPGIS are a set of GIS applications that facilitate wider public involvement in planning and decision making processes [41]. PPGIS has been identified as relevant in processes of urban planning, nature conservation and rural development, among others.
Type: P

Volunteered geographic information (VGI)
First coined by Goodchild (2007), VGI is defined as "the harnessing of tools to create, assemble, and disseminate geographic data provided voluntarily by individuals". In Schuurman (2009), Goodchild argues that crowdsourcing implies a kind of consensus-producing process and the assumption that several people will provide information about the same thing so it will be more accurate than VGI. VGI, on the other hand, is produced by individuals without any such opportunity for convergence. Elwood et al. (2012) define VGI as spatial information that is voluntarily made available, with an aim to provide information about the world.
Type: I

Web mapping (Mid-nineties)
A term used in parallel with the development of web-based GIS solutions, which has recently evolved to mean "the study of cartographic representation using the web as the medium, with an emphasis on user-centered design (including user interfaces, dynamic map contents, and mapping functions), user-generated content, and ubiquitous access" and appears in Tsou [46].
Type: P

Wikinomics (2006)
The name of a book by Tapscott and Williams [47], wikinomics embodies the idea of mass collaboration in a business environment. It is based on four principles: (a) openness; (b) peering (or a collaborative approach); (c) sharing; and (d) acting globally. The book itself is meant to be a collaborative and living document that everyone can contribute to.
Type: P
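The information/process typing used in Table 1 can be illustrated as a small lookup structure. The following Python sketch is only an illustration: the names `TermType`, `TERM_TYPES` and `terms_of_type` are hypothetical, and only a subset of terms whose type is stated in the table is included.

```python
from enum import Enum


class TermType(Enum):
    """The two types used in Table 1."""
    INFORMATION = "I"  # the term refers to data/information collected
    PROCESS = "P"      # the term refers to a process that can generate information


# Illustrative subset of Table 1 (not the full table).
TERM_TYPES = {
    "ambient geographic information": TermType.INFORMATION,
    "volunteered geographic information": TermType.INFORMATION,
    "collaborative mapping": TermType.PROCESS,
    "crowdsourcing": TermType.PROCESS,
    "web mapping": TermType.PROCESS,
    "wikinomics": TermType.PROCESS,
}


def terms_of_type(term_type):
    """Return the terms labelled with the given type, in alphabetical order."""
    return sorted(term for term, kind in TERM_TYPES.items() if kind is term_type)


if __name__ == "__main__":
    print(terms_of_type(TermType.INFORMATION))
```

Separating the label from the term in this way mirrors the split that Figure 1 draws between terminology for information and terminology for the processes that generate it.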

Temporal Analysis of the Literature
The abstracts of 25,338 scientific papers, published between 1990 and 2015, which contained any of the terms listed in Table 1 in their title, keywords or abstract were downloaded from Scopus. The data were cleaned to remove English stopwords (conjunctions, pronouns, etc.), numbers, punctuation, whitespace and any words fewer than three characters long. The words were then stemmed, which is the process of establishing common etymological roots. For example, propose and proposal have the same stem of propos. The cleaned and stemmed abstracts were then organized into a corpus of 24 documents based on the year of publication. Figure 2 summarizes the frequency of their use, updating an initial analysis of such trends in Reference [48]. As expected, terms that describe more general crowdsourcing activities are more frequently used in contrast to GI-specific ones, but a number of specific temporal trends are evident: the steady rise of User-generated Content and Citizen Science, the long-term, steady increase of Swarm Intelligence, the rise and perhaps fall of Mashups and the recent and intense rise of Crowdsourcing. Here, the analysis of temporal trends in the use of key terms is extended with a focus on their relative search volumes with Google Trends.
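The cleaning, stemming and per-year grouping steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the stopword list is a tiny placeholder, the suffix stripper is a toy stand-in for a proper stemmer (such as a Porter stemmer), and the names `STOPWORDS`, `SUFFIXES`, `stem`, `clean` and `build_corpus` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Placeholder stopword list; a real analysis would use a full English list.
STOPWORDS = {"the", "and", "for", "that", "with", "this", "are", "was", "has", "have"}

# Toy suffix list standing in for a real stemming algorithm (assumption).
SUFFIXES = ("ing", "al", "es", "s", "e")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean(abstract):
    """Lowercase, drop numbers/punctuation/whitespace, short words and stopwords, then stem."""
    words = re.findall(r"[a-z]+", abstract.lower())
    return [stem(w) for w in words if len(w) >= 3 and w not in STOPWORDS]


def build_corpus(records):
    """Group (year, abstract) pairs into one token-count 'document' per year."""
    corpus = defaultdict(Counter)
    for year, abstract in records:
        corpus[year].update(clean(abstract))
    return corpus


if __name__ == "__main__":
    sample = [
        (2014, "We propose 3 crowdsourcing tools."),
        (2014, "A proposal for crowdsourcing!"),
        (2015, "Citizen science and the crowd."),
    ]
    for year, counts in sorted(build_corpus(sample).items()):
        print(year, counts.most_common(3))
```

With this toy stemmer, "propose" and "proposal" both reduce to "propos", matching the example in the text; term frequencies per year can then be read directly from each year's counter.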

Google Trends Analysis
The Google Trends website [49] allows for the examination of relative search volumes of terms over time. This analysis serves to illustrate trends in popularity of terms that are more mainstream than academic and is an indicator of movements from the academic literature to more layman outlets, e.g., through media and into popular science. Figure 3 shows the trends for the terms crowdsourcing and citizen science together. Both terms were first searched with sufficient volume using Google's search engine during 2006, and both show an increase in search volume over time to reflect an increasing interest in these subjects. Crowdsourcing, compared to citizen science, has much larger search volumes, but this is unsurprising given the commercial interest in crowdsourcing as a business model. The large peak in the term crowdsourcing coincides with large-scale efforts by citizens to search for the missing Malaysian airplane (flight MH370); over two million people helped search for the missing aircraft by analyzing satellite images [50]. The rest of the search terms from Table 1 were then put into the Google Trends application. Some search terms do not register a trend with sufficient search volume to generate a graph, including neogeography or terms that are generally more restricted to the academic literature. The term mashup(s) shows considerable search volume but is not displayed here, since mashup is a generic term for integrating data from different sources and can apply to non-geographic applications such as those connected with music or video and, therefore, goes beyond the realm of just spatial mashups that are relevant to this article.

Terms such as GeoWeb and web mapping pre-date 2005, which is around the time when Google Trends started. The search volume for the term GeoWeb was much higher than that for crowdsourcing until the last few years, when the search volumes became similar (Figure 4). Web mapping shows a steady decline over time and since 2010 has been searched much less frequently than the other two terms.

Figure 4. Trend in the term "GeoWeb" (in blue) compared with "crowdsourcing" (in red) and "web mapping" (in yellow) over time. The y-axis is a relative volume expressed between 0 and 100 where the maximum search volume is set to 100.

The terms VGI, collaborative mapping and participatory sensing show very small search volumes with minor peaks of activity. These peaks might be linked to times of year when students have searched for references to complete course work or the occurrence of conferences and workshops at specific times of the year. However, when compared with terms such as citizen science, the search volumes of these terms are an order of magnitude lower (Figure 5). This is similar to what was found in the semantic analysis, with low frequencies registered for VGI, collaborative mapping and participatory sensing.

Figure 5. Trend in the term "citizen science" (in blue) and the phrases "collaborative mapping" (in red), VGI (in yellow) and participatory sensing (in green) over time. The y-axis is a relative volume expressed between 0 and 100 where the maximum search volume is set to 100.
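The 0 to 100 relative scale described in the figure captions can be illustrated with a short sketch. This is only an assumption about how such a scale could be computed from raw counts (Google Trends does not expose raw volumes, and its exact normalization is not described in the source); the function name `to_relative_volume` and the input numbers are hypothetical.

```python
def to_relative_volume(series_by_term):
    """Rescale several time series so the largest value across all terms becomes 100."""
    peak = max(max(values) for values in series_by_term.values())
    if peak == 0:
        # No searches at all: every point stays at zero.
        return {term: [0] * len(values) for term, values in series_by_term.items()}
    return {
        term: [round(100 * value / peak) for value in values]
        for term, values in series_by_term.items()
    }


if __name__ == "__main__":
    # Made-up raw counts for two terms over three time steps.
    raw = {"citizen science": [20, 40, 80], "VGI": [4, 8, 16]}
    print(to_relative_volume(raw))
```

Because all terms in one comparison share the same peak, a term with consistently small raw counts stays near the bottom of the axis, which is why low-volume terms appear almost flat when plotted alongside popular ones.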

The Current State of Crowdsourced Geographic Information
To evaluate the current state of crowdsourced geographic information (which we use here as an umbrella term to include the different terms available), a review was undertaken of existing websites and mobile applications that involve the collection of any type of georeferenced information. The starting point for this review was VGI-Net [14], which was compiled by researchers at the University of California, Santa Barbara, the Ohio State University and the University of Washington in 2011. VGI-Net has not been maintained regularly, so the first task was to eliminate sites from the inventory that were no longer in operation (roughly half of the original sites on VGI-Net), keep those sites that were still operating, and then add new sites that have emerged since 2011. This resulted in approximately 100 sites and/or applications being reviewed. This review is not intended to be comprehensive, since sites and applications are changing all the time. Rather, it is intended to provide a large enough sample from which to draw general conclusions about the current state of crowdsourced geographic information. These sites were then evaluated based on a series of criteria, as described below.

Theme
At the highest level, the sites and applications can be divided into three main types: (i) those that allow users to create and share a map; (ii) those that collect georeferenced data; and (iii) high level data sharing websites contributed by experts but which may include citizen-collected data. Of the roughly 100 sites reviewed, 12 were of the first type and four were of the third type. Therefore, the majority of sites/applications were focused on data collection, and these were further categorized by subject as outlined in Table 2.
The most frequent category of website was in the area of ecology (e.g., species identification), even though the websites and applications reviewed here represent only a small proportion of all the citizen science and crowdsourcing projects that are currently within the fields of ecology, biology and nature conservation. Meta-sites maintained by Cornell University and SciStarter list many more that have not been reviewed here. This large number is unsurprising given the very long history of citizen science in these fields, stretching back decades and even centuries before the advent of the Internet [51,52].

Table 2. Subject area of crowdsourced geographic information sites in the review.

Subject: Description
Communications: Providing IP addresses, mobile cell IDs, wireless networks
Crime/Public Safety: Map showing reported crimes
Disasters (natural and man-made): Mapping after a natural or man-made disaster

The Current State of Crowdsourced Geographic Information
To evaluate the current state of crowdsourced geographic information (which we use here as an umbrella term to include the different terms available), a review was undertaken of existing websites and mobile applications that involve the collection of any type of georeferenced information. The starting point for this review was VGI-Net [14], which was compiled by researchers at the University of California, Santa Barbara, the Ohio State University and the University of Washington in 2011. VGI-Net has not been maintained regularly, so hence the first task was to eliminate sites from the inventory that were no longer in operation (which was roughly half of the original sites on VGI-Net), keep those sites that were still operating, and then add new sites that have emerged since 2011. This resulted in approximately 100 sites and/or applications that have been reviewed. This review is not intended to be comprehensive, since sites and applications are changing all the time. Rather it is intended to provide a large enough sample from which to draw general conclusions about the current state of crowdsourced geographic information. These sites were then evaluated based on a series of criteria, as described below.

Theme
At the highest level, the sites and applications can be divided into three main types: (i) those that allow users to create and share a map; (ii) those that collect georeferenced data; and (iii) high level data sharing websites contributed by experts but which may include citizen-collected data. Of the roughly 100 sites reviewed, 12 were of the first type and four were of the third type. Therefore, the majority of sites/applications were focused on data collection, and these were further categorized by subject as outlined in Table 2.  The most frequent category of website was in the area of ecology (e.g., species identification), even though the websites and applications reviewed here represent only a small proportion of all the citizen science and crowdsourcing projects that are currently within the field of ecology, biology and nature conservation. Meta-sites maintained by Cornell University and SciStarter list many more that have not been reviewed here. This large number is unsurprising given the very long history of citizen science in these fields, stretching back decades and even centuries before the advent of the Internet [51,52].
Other categories with multiple sites (i.e., greater than 5) include environmental monitoring; location-based social media, where location plays a pivotal role in the social networking function such as sites for connecting people based on proximity; sites of interest/travel with sharing of geo-tagged photographs, videos and travel stories; transport including sites like OpenStreetMap for digitizing roads; and weather data, which covers amateur weather stations, snow depth and avalanche reporting. Disaster mapping is another category that is probably under-represented in this review, since sites tend to appear during events and then disappear post-event, or because contributors are often recruited on the ground and mapping takes place internal to organizations. However, there are at least three permanent sites that are noteworthy, i.e., Ushahidi [53], which is a platform to allow people in affected areas to upload and view georeferenced information online, Tomnod [54] for crowdsourced damage mapping, and the humanitarian arm of OpenStreetMap [55].
Although it is not possible to readily characterize sites by data volumes or number of transactions, there is a relationship between the way that the data are subsequently provided to the public (if access is open) and the amount of data collected. For example, the largest data volumes are often served using APIs (Application Programming Interfaces), as evidenced by sites such as OpenStreetMap, Geograph, Flickr and Twitter. Data volumes also tend to be higher for passively collected geographic information, notably in relation to communications and location-based social media, and where sensors are used to collect data, as with transport, weather and hiking, or on any site where a mobile application facilitates data collection using mobile phones or tablets, which is common in many ecology-related applications.

Nature and Types of Crowdsourced Geographic Information
If crowdsourced geographic information is taken to mean any data contributed by the crowd with a geographical reference that could potentially be mapped, the nature of the data can be characterized based on whether it falls into the territory of mapping agencies (or framework data) in the first dimension or axis as shown in Figure 6. Framework data are typically data that are collected by government agencies, which can be organized into the following themes: geodetic control, orthoimagery, elevation, transportation, hydrography, governmental units and cadaster, and which comprise the basic components of a spatial data infrastructure (SDI) [15]. Depending on the country, these datasets may vary (e.g., some countries do not have cadasters, while others may include a gazetteer as part of their SDI). In the second dimension, crowdsourced geographic information can be classified according to whether the data are contributed actively as part of a crowdsourcing system/campaign (hereafter referred to as active crowdsourced geographic information), or whether the data were collected for another purpose and were then mapped (hereafter referred to as passive crowdsourced geographic information).
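
As an illustrative sketch only, and not part of the review itself, the two dimensions described above can be expressed as a small classification routine. The `Source` record, its field names and the example entries are hypothetical; the framework themes are those listed in the text [15], and a national SDI may define a different set.

```python
from dataclasses import dataclass

# Framework themes as listed in the text [15]; individual countries may differ.
FRAMEWORK_THEMES = {
    "geodetic control", "orthoimagery", "elevation", "transportation",
    "hydrography", "governmental units", "cadaster",
}


@dataclass(frozen=True)
class Source:
    """Hypothetical description of one crowdsourced data source."""
    name: str
    theme: str
    # True if the data were contributed as part of a crowdsourcing
    # system/campaign; False if collected for another purpose and mapped later.
    contributed_for_campaign: bool


def quadrant(source: Source) -> tuple[str, str]:
    """Place a source on the framework/non-framework and active/passive axes."""
    first = "framework" if source.theme.lower() in FRAMEWORK_THEMES else "non-framework"
    second = "active" if source.contributed_for_campaign else "passive"
    return (first, second)


# Hypothetical examples, chosen only to exercise both axes.
roads = Source("volunteer road digitizing", "transportation", True)
posts = Source("geotagged social media posts", "social media", False)
print(quadrant(roads))
print(quadrant(posts))
```

Treating each axis as a binary choice is a simplification; a real source may sit between categories, as the weather-related projects mentioned in the text suggest.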

Figure 6 summarizes the current types of crowdsourced geographic information from the review by category, based on where they fall within the quadrants of these two dimensions. Types of crowdsourced geographic information that were not encountered in the review but which come from other sources such as from academic publications were also added in blue. Figure 6 aims only to provide a simple generalization of the situation. There are, for example, a wide variety of weather related citizen science projects which could occupy different locations in the space depicted in Figure 6; see for example, [56].

Figure 6. Types of crowdsourced geographic information from the review characterized by framework/non-framework and active/passive. Crowdsourced geographic information in blue comes from other sources, e.g., academic publications.

Expertise and Training
The sites that were reviewed were then evaluated based on the amount of expertise required of the participants and the amount of training available. As this applies primarily to active crowdsourced geographic information, only those sites belonging to the categories on the right-hand side of Figure 6 were considered.
In general, most sites were found to require very little expertise to participate beyond Internet and mobile phone literacy. Many sites involve filling in a simple online form, where location is indicated on a map interface or latitude and longitude coordinates are entered manually. Note that even though the form may be easy to fill out, collecting the actual information may not be.
Other sites involve capturing the information using a mobile application, and uploading photographs and comments, so that spatial coordinates are automatically captured. These sites are at the most basic level and, therefore, often provide little in the way of training materials.
At the next level are sites where users must become familiar with how to characterize different phenomena (e.g., different types of weather, recognizing different features in satellite sensor imagery, etc.), and these sites tended to have some form of training material such as online instructions, videos and/or FAQs, which users could consult. Although the expertise required is minimal, involvement still requires a small learning effort on the part of the participants. Some sites supported this learning more effectively than others.
At the highest level are some of the sites in the hiking/trails, ecology and weather categories. For the hiking/trails category, familiarity with the use of a global positioning system (GPS) is required. While one site had good training materials, others did not. In the case of ecological sites, greater expertise is required for those sites that follow strict protocols in data collection, and in a few cases, these require physical attendance at a training session. In the case of the weather category, amateur weather stations require knowledge about installation, which must be in accordance with certain principles in order to satisfy quality concerns. The availability of training materials was, therefore, generally found to be a function of the difficulty of the task and/or whether the data collected were used subsequently for research or other authoritative purposes such as assimilation of weather data into a numerical weather prediction model. Training materials for these higher-level sites were either extensive or designed to ensure minimum standards in data quality.

Crowdsourced Geographic Information Availability and Metadata
Data availability varied across the sites: data were unavailable (used only internally), available only to those people who contributed, available only to those who had registered and logged in, or open to everyone. Within these different levels of access, data were available for viewing on a map interface, for downloading in a variety of formats (notably CSV, KML, KMZ, XML, Atom, GPX), and via an API. For some sites, the data available were the raw data contributed by individuals, while for others only data aggregated from multiple contributors were available. Some of the sites in the communications, feature mapping, geocaching, location-based social media, sites of interest/travel and transport categories made data available via APIs, which reflects those sites with considerable data volumes and demand for the data.
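
To illustrate what working with a flat-file download can involve, the following minimal sketch converts a CSV export of point observations into a GeoJSON feature collection. It is an assumption-laden example: GeoJSON is not among the formats listed above and is used here only as a convenient target, and the column names `lat` and `lon` and the sample row are hypothetical, since each site documents (or fails to document) its own headers.

```python
import csv
import io


def csv_to_geojson(csv_text: str, lat_field: str = "lat", lon_field: str = "lon") -> dict:
    """Convert CSV rows with coordinate columns into a GeoJSON FeatureCollection.

    The coordinate column names are assumptions; real downloads vary by site.
    """
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat = float(row.pop(lat_field))
        lon = float(row.pop(lon_field))
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            # All remaining columns are kept as untyped string attributes.
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample of a contributed observation.
sample = "species,lat,lon\nrobin,51.5,-0.12\n"
collection = csv_to_geojson(sample)
print(collection["features"][0]["geometry"])
```

Note that the sketch relies entirely on the file headers to interpret the data, which is the situation described for sites that provide little documentation.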
Metadata, in the sense of standards such as those associated with the European Infrastructure for Spatial Information in the European Community (INSPIRE) directive (which requires that member states of the European Union comply with implementing specific rules for metadata), were not mentioned in any of the sites with the exception of one map creation and sharing site called Geocommons. The latter provides the option of sharing the contributed data with metadata that are compliant with ISO19115, a metadata standard for describing geographic information and services.
Metadata, in the sense of documentation of the data, are provided to some degree by all sites that offer access via an API, and to various degrees for other sites that offer the data in other downloadable formats. Some of the downloaded data files were well documented, while others expected users to interpret the headers of the data or the data themselves. Moreover, sites with strong data collection protocols were well documented in terms of metadata, and higher level data sharing sites require detailed metadata with each data set shared via the site.

Quality and Use of the Data for Research
Given that citizens may vary greatly in expertise and often collect data without regard to established protocols or standards, there is often considerable concern about the quality and usability of the data. The quality of citizen-derived data can be viewed in a variety of ways [57]. Many comparative studies have shown that crowdsourced geographic information can be as good as, if not better than, data from authoritative sources [58,59]. A comprehensive literature overview of the latest developments in crowdsourced geographic information research is presented in Reference [60], with a focus on trends related to OpenStreetMap, while many others have discussed the quality of this volunteer data source [61–63]. Among the topics that the authors of [60] select for future research, they emphasize intrinsic data quality assessment; conflation methods, which combine crowdsourced geographic information and other data sources; and the development of credibility, reputation and trust methodologies for crowdsourced geographic information. Data quality remains a topic of great interest and importance in this domain. In their work, the authors of [64] conclude that there is a trade-off between potentially improved data quality of crowdsourced geographic information and the requirement of facilitation and oversight, which is resource intensive. Introducing overly burdensome structures to ensure quality could damage the potential contributions from related socially-conscious and citizen-focused data collection and mapping efforts. Reference [65] reviews the distinct types of citizen science projects and the expectations on the quality of the information they deal with, in particular the quality of crowdsourced geographic information in those projects. Its authors go on to propose an innovative model based on linguistic decision making for assessing the quality of a crowdsourced geographic information database created in citizen science projects. They build this model from the understanding that quality depends on several factors, both extrinsic and intrinsic, but also pragmatic, depending on the intended purpose and user needs, and so a flexible quality assessment method is necessary.
For the majority of sites, it is difficult to establish whether there is any quality control. In other words, quality control may be occurring in the background but may not be apparent from viewing the site alone. Thus, based on a review of what was apparent from the sites alone, most would appear to have no quality control. For those sites where some quality control is in place, this included one or more of the following: automated methods of checking (e.g., answers that fall outside an acceptable range); peer review, which could include comments, actual involvement in the validation process or ranking of the participant (see next item); ranking of participants, whether through an automated procedure or by other users, which may then influence the level of confidence in the contributions provided by the users; use of multiple observations at the same site as a cross-checking mechanism; and review by experts.
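
Two of the mechanisms listed above, automated range checking and cross-checking of multiple observations at the same site, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of any reviewed site: the function names, the 60% agreement threshold and the example labels are all hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def in_range(value: float, low: float, high: float) -> bool:
    """Automated check: does a reported value fall inside an acceptable range?"""
    return low <= value <= high


def consensus(labels: Sequence[str], min_agreement: float = 0.6) -> Optional[str]:
    """Cross-check multiple observations of the same site.

    Returns the most common label if its share of all observations reaches
    min_agreement (an assumed threshold), otherwise None so that the site
    can be passed on to peer or expert review.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None


# A latitude outside [-90, 90] is rejected by the range check.
print(in_range(-95.0, -90.0, 90.0))
# Two of three hypothetical contributors agree, so the majority label is accepted.
print(consensus(["forest", "forest", "cropland"]))
# An even split yields no consensus.
print(consensus(["forest", "cropland"]))
```

Returning no value when agreement is low, rather than forcing a decision, mirrors the way the listed mechanisms can be combined, with automated checks applied first and human review reserved for unresolved cases.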
There are examples where some minimal qualifications are required (e.g., some disaster mapping sites such as GEOCAN require a minimum number of years of remote sensing experience), which are checked during the registration process. However, assigning a reliability score to a user based on his/her experience, or double-checking any submission by a relative novice, does not seem to be commonly undertaken [66].
The greatest evidence of quality control, however, was in sites related to ecology and weather, although map creation sites such as OpenStreetMap and Google MapMaker have a range of quality assurance measures including automated checking, peer review and use of multiple observations. Greater attention to quality was apparent for sites where the data are used for scientific research, with evidence of publications. However, publications using the data were also listed on websites where no quality control was explicitly mentioned.
It is important to note that data quality is traditionally constrained to precise and accurate locations. For some applications and even scientific studies, the data quality issue may not be a problem at all; in other words, the fitness-for-use of the data will depend on the context, which must be well-defined by the potential user. Data quality is an issue when the data are scarce, but some authors argue that it will become less of an issue in the era of big data [67]. For example, the authors of [68] took entire country street networks of France, Germany and the UK and found that while the street networks are incomplete, especially in rural regions, this constituted only a minor problem for their particular study, which aimed to identify scaling patterns in street blocks, because the available data offered millions of street blocks for the countries under study.
Data quality is likely to remain an important topic in crowdsourced geographic information research for some time. A diverse range of approaches exists for quality assessment and control, e.g., [69–71], and guidelines for some applications are emerging [72].

Information about Participants
The sites can be categorized into three types based on the information they obtain about contributors: (i) no registration required, so observations cannot be linked to an individual; (ii) registration required but only name and email entered; and (iii) registration required with additional information collected such as address, organization, age, level of expertise and motivation behind participation, or registration via a social networking site such as Facebook, which implies that additional information about participants is retrievable. The majority of sites reviewed fell into the first two categories, which implies that very little analysis of the crowdsourced geographic information can be undertaken in relation to the background of individuals. Some exceptions include research on contributors to OpenStreetMap [73] and Geo-Wiki [58].

Incentives for Participation
Understanding the motivations of citizen participants in the crowdsourcing of geospatial data remains one of the principal topics in current and future research. What are the ingredients for a successful crowdsourcing project and how are they achieved and maintained? As some research results are demonstrating, crowdsourcing of geospatial data is sometimes best seen as complementary to professional approaches rather than being considered as a direct competitor or replacement to these traditional approaches. Hence, motivation may be to enhance authoritative data sets rather than replace them, although it has the potential to be a competitor to established public and commercial sources of geographic information [74].
Looking at the sites of active crowdsourced geographic information, two generic incentives for participation can be identified: (i) being part of a good cause or contributing to the greater good, which often involves a one-way information flow (e.g., damage mapping) and (ii) gaining something tangible from the site such as information about traffic problems, evidence of response to reporting of waste/environmental problems, different kinds of advice, access to data, or geocached treasures, which often involves a two-way information flow. In both cases, but evident in only a much smaller number of the sites reviewed, additional incentives are integrated into the contribution process, such as social elements like discussion forums, gamification (e.g., through leaderboards and prizes), recognition of effort through achievement levels and interaction with experts. Sites that appear to be less successful, evidenced by a lack of recent contributions, are those that offer only the first type of incentive. Obvious exceptions to this generalization are OpenStreetMap and Google MapMaker, both of which are very successful, but where motivations for participation are not so easily explained. More studies into participant motivation are needed, as suggested by the authors of [75], so that we can understand which crowdsourcing managerial control features such as reward systems, different levels of collaboration, voting and commenting or trust-building systems are required to deliver innovative, problem-solving types of crowdsourcing.

Discussion and Conclusions
A range of terms to describe the general subject area of citizen-derived geographic information exist and have been used variably over time. Similarly, there are a wide range of Internet sites that, in one way or another, use citizen-derived geographic information. Based on the review of sites, it is clear that most of the crowdsourced geographic information is actively contributed, which implies that motivation, incentives and community building are important considerations in terms of sustainability; a bias to active involvement would be expected given the set of sites selected and passive activity is less apparent. The majority of sites rely on participation because of the desire to aid a greater cause or for a worthy reason as the overarching incentive rather than more tangible incentives. This may have implications for successful and sustainable crowdsourced geographic information collection. Some sites operate for a finite period of time, such as the site on fracking which completed the work for a given state in the USA, or some disaster related sites where the focus is on campaign(s) associated with specific events that eventually end.
The majority of sites do not collect very much information about participants. This may make participation easier but it means that very little research can be undertaken on the relationships between participation, data quality and demographics, or on the understanding of motivational factors. The lack of information on participants may also make it difficult to develop and target training activity. Training provision varied from site to site but the amount of training material provided is a function of the difficulty of the task and the end use of the data (e.g., if for research or other authoritative uses). The lack of information on participants may also hamper some approaches to quality assurance as the background and expertise of the contributors and hence inferred quality of their data is unknown.
Very few sites are focused on the collection of framework data, which is of relevance to national mapping agencies, yet the latter have a strong and growing interest in the use of crowdsourced geographic information [76]. In addition, metadata standards are only mentioned as an option on one site. Research into metadata for crowdsourced geographic information is required, which could build on work such as that undertaken in Reference [77]. Sites that use the data for research or assimilation into models, on the other hand, are strongly driven by data collection protocols. The literature review also indicates that the crowdsourcing of geospatial data is often most suited to complementing professional approaches and that research into conflation methods should be a key area of future research. Minimum data collection protocols that would facilitate the use of crowdsourced geographic information in a way that could complement authoritative data should be developed. Although the free tagging of OpenStreetMap could be viewed as an inclusive approach, it also means that there are inconsistencies in the way that the data are tagged, which limits their use for some applications (e.g., land cover and land use mapping). Further work should also consider what conflationary methods have already been tested and where further developments need to take place. Data collection protocols could be built into the incentivisation strategy of crowdsourced geographic information sites, which is one further area of potential research. The literature also emphasizes the idea of using crowdsourced geographic information for change detection and/or low specification mapping tasks, leaving the more static baseline data to professional data collection.
Information about data quality is lacking on the majority of sites reviewed, although quality control may still be happening behind the scenes. Some data quality measures were evident when the data were then used for further scientific research, although this was not always the case. The literature highlights quality assurance as a key area for research, e.g., [23,78–80], as well as the development of credibility, reputation and trust methodologies for crowdsourced geographic information such as discussed in Reference [81]. Fortunately, a variety of broad approaches for quality assessment and control exist [70], although further work is required if the full potential of citizen-derived data is to be achieved.
Most citizen science and crowdsourcing projects are focused on growing the number of users and the volume of the data since these factors can significantly influence the value of the service. This may bring with it challenges, not least associated with the size and heterogeneity of the data obtained. Indeed, the size, volume, and quantity of data from citizen-orientated approaches to geospatial data generation and collection pose major problems for the extraction of knowledge from these data streams and their subsequent storage, visualization and interpretation [82]. Aspects such as visualization could, however, also play a key role in helping to make sense of increasing data streams of crowdsourced geographic information. The actual usefulness of the geographic data collected was rarely addressed explicitly; one aspect of the research agenda set out in Reference [83] calls for a greater understanding of the different use cases for crowdsourced geographic information, particularly to better understand those applications/domains to which it could contribute the most.
Most of the sites focus only on collecting the data, which are then most often available through their website. It is rare that the projects also provide tools to easily and indiscriminately share the data. Even in those examples where services for sharing the data are provided, they are mostly in the form of "widgets" or "snippets" to integrate the data in a predefined form into another website, severely limiting the possibilities of data reuse (and thus the real power of crowdsourcing). The example of "real" sharing is obviously OpenStreetMap. Examples of predefined and limited sharing are Google MapMaker (only possible with the integration of Google Maps, which has server limitations) and Panoramio (only possible to include specific images). For those sites with large data volumes and demands for data, APIs are used to provide access. This represents the most ideal solution from a database point of view, yet the majority of sites serve the data in flat file formats (or not at all).
This review has not addressed all of the issues relevant to citizen sensing in the geographical information domain. Below is a list of areas that require further research beyond those already mentioned in the discussion above:

• There is a need to gain a better understanding of the currency of the data. This issue is critical for integration of crowdsourced geographic information with authoritative sources, particularly if crowdsourced geographic information is to be used for change detection. Crowdsourced geographic information is often assumed to be more current than the framework equivalent, but it is not always clear whether this is the case, and the question requires further study.
• Investigation of how the interrelationships between terms in Table 1 have changed and evolved over time could be undertaken, since phrases come in and out of fashion or become synonyms for related but different activities. This would require an extension of this research into the domain of semiotics, for example, to develop a semantic or text mining analysis of the similarity of the changing contexts within which terms are used, and is a research topic in its own right.
• More research into incentives for participation and citizen motivations is required. More use of online surveys, see for example, [84,85], may help to better meet the needs of citizens in the future. For example, how can citizens be encouraged to map an area that has already been mapped in the last few years or be more actively engaged in change detection mapping?
• Issues of copyright, ownership, data privacy and licensing will become much more prevalent in the future as data contributed by citizens are integrated with base layers that are created by third parties. Saunders et al. [86] consider the licensing and copyright issues from a Canadian legal perspective when using a range of current online mapping tools. Data privacy laws vary from country to country but generally require the protection of personal information, i.e., information that could allow people to be identified [87]. However, location-based information can reveal personal information that could be disclosed without consent if the users of the data are not careful in how the data are subsequently employed [88]. Ethical issues surrounding the use of crowdsourced geographic information with respect to health and disease surveillance have been raised by Blatt [89], so this is a growing area where further research is needed.
• Data interoperability was not considered in the above review of websites, but if the data are to be used in future projects or for purposes other than those for which they were originally collected, more research into data standards for crowdsourced geographic information is required. Work is ongoing in this area within the COBWEB citizen observatory project [90], while the authors of [91] have presented a unified model for semantic interoperability of sensor data and VGI.

In summary, this article has clarified the meaning of various terms used in discussion of what we have collectively referred to as crowdsourced geographic information. We have shown the variation in the usage of terms and provided a snapshot of some of the issues connected with contemporary geographical information sites on the Internet. The subject is expected to grow and evolve in the future. Developments in data mining and knowledge discovery may increase the role of passive contributions.
Related to this, the volume and diversity of data sets may grow, requiring developments in relation to issues such as data quality assessment, visualization, data harmonization and metadata. The citizens themselves may also feature more prominently with increased attention to their motivation, training and general involvement in tasks increased by activities such as feedback on contributions and on how the data are used. Linked to this, a suite of legal and ethical issues such as those connected to data ownership, responsibility and privacy will require increasing attention.