Designing a Volunteered Geographic Information System for Road Data Validation

Abstract: The objective of this work is to build a Volunteered Geographic Information System (VGIS) using a methodological design process. The VGIS design focuses on coordinating its three main components (project or organization, participants or community, and technological infrastructure) by aligning the project goal, the crowdsourcing strategy and participation environment, the drivers and mechanisms that motivate volunteers, and the technological and data management tools that facilitate engaged participation. Following this process helped to design a solution based on the project's information requirements to handle a road data tagging task, while offering an experience that meets the interests and needs of potential participants.


Introduction
Web mapping and location-based services and tools, together with mobile networked devices equipped with several sensors and public access to global navigation satellite systems, allow almost anyone to collect and process geospatial data online. This has given rise to a new type of citizen-generated information on the Geoweb: volunteered geographic information (VGI) [1]. By following a crowdsourcing process to aggregate and spatially converge data, it is possible to build information systems that coordinate the public participation of online communities to collectively create VGI while contributing to a common goal [2,3]. Crowdsourcing, the voluntary completion of tasks by individuals online, takes advantage of distributed human intelligence and collective knowledge to create value that benefits participants and organizations [4][5][6]. Obtaining VGI through an information system implies following a methodological design approach that evaluates several interdependent aspects [3]. This article aims to define the high-level characteristics of the components needed to build a VGIS that discovers changes in the territory through a data tagging task. It focuses on how to design a VGIS following a methodological process to create a solution based on the project's information requirements, while offering an experience that meets the interests and needs of potential participants. Section 2 introduces the first considerations of the VGIS design solution, Section 3 characterizes the system's functional components, and Section 4 presents some general conclusions and future work.

Project
By executing a project, the organization establishes a clear goal and defines the data requirements and the participants' actions necessary to obtain or process data, thereby designing a crowdsourcing process. This project (by the Technical University of Madrid; UPM in Spanish) aims to discover changes in the territory using aerial images (orthophotos) from the National Plan for Aerial Orthophotography (PNOA in Spanish) and digital geographic information from the National Topographic Map (MTN in Spanish) at a scale of 1:25,000 (MTN25). It seeks to involve an online community in visually identifying and comparing the information available in these resources using web map services (WMS). The goal is to collect a large set of input data (i.e., attribute data tags) to train machine learning algorithms to automatically update a geographic feature of the MTN25 vector files; the first case is basic road cartography. The community creates attribute data to characterize a geographic area using tags such as the presence or absence of a road, the type of road (if any), and tags to validate the accuracy of the existing road cartography.

Crowdsourcing Strategy
The crowdsourcing process design takes the participation and level of involvement or engagement that a project requires as its first criterion [3]. Volunteers' contributions are generally classified as passive, active, or even pro-active, according to how the community is organized and the characteristics of the task. Community involvement can be based on contributory, collaborative, or participatory levels of participation [3]. The mode of organization for this VGIS is contributory, consisting of crowd-based participation with an active task that does not require community-driven collaboration. To contribute data, volunteers perform tasks individually and independently; thus, tools to coordinate collaboration and communication among them are not necessary. The task-cognitive engagement (the second criterion in [3]) is generally low, with low cognitive demand or effort: the task relies on common abilities and basic cognitive skills associated with identifying, recognizing, or describing an image characteristic, without further analysis or inference of new information.

Participants
Depending on the crowdsourcing strategy, different engagement mechanisms to incentivize and ease contribution can be directed towards specific types of volunteers. The crowd-based participation in this VGIS makes it necessary to focus more on participants' individual needs and interests than on motivations based on the fulfillment of social needs, while also considering the altruistic motives that drive participants to help achieve a goal with a broader social purpose [7]. Several types of VGI participants were characterized in [8]. The main targets of this project are the 'Adventurers' or 'Discoverers', who are driven by the mental or intellectual stimulation of tasks, curiosity, and the exploration of new things. It is therefore necessary to design and provide elements that satisfy their cognitive needs, such as the need to obtain and apply knowledge. Exploring geography and discovering its diverse characteristics through aerial images plays an important role in securing their involvement, generating intellectual entertainment through the task design features and offering the opportunity to apply their knowledge and creativity to recognize terrain characteristics. The motivation of 'Altruists' or 'Helpers' is also important; it is addressed through the possibility of contributing to a goal that benefits others, namely helping to create basic, up-to-date reference geo-information.

VGIS Functional Components
The crowdsourcing strategy guides the design of several functional components of the VGIS that are part of its technological infrastructure and fulfill particular system objectives [9].

Data Modelling
To enable volunteers' distributed work, the WMS with the PNOA orthophotos and the corresponding MTN25 cartographic sheet are subdivided into tiles (i.e., tiled image units at a certain scale). The VGIS links the collected attribute data to the specific geographic area in which the volunteer is working by registering the geographic coordinate (i.e., a point-type geometric object) of the image tile center. The URL of each WMS request for the tile that has been worked on is also stored. The five categories used to tag an image tile are: Tag 1: No Road/Unregistered Road; Tag 2: Road Exists/Road (type) Registered Correctly; Tag 3: Road Exists/Unregistered Road; Tag 4: Road Exists/Road (type) Incorrectly Registered; and Tag 5: No Road/Road Incorrectly Registered. Each tag corresponds to a combination that describes the scenario observed in the image and in the road base maps. Additionally, for Tags 3 and 4, the system allows the volunteer to select the observed road type: highway, route, street, or track.
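The record described above could be modelled roughly as follows. This is only a minimal sketch in Python, not the project's actual schema: the class and field names (`Tag`, `RoadType`, `TileContribution`, `image_wms_url`, `roads_wms_url`) are hypothetical, and only the five tags, the four road types, the tile-center point, and the stored WMS URLs come from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tag(Enum):
    """The five scenarios: (observed in image) / (state of road cartography)."""
    NO_ROAD_UNREGISTERED = 1            # No Road / Unregistered Road
    ROAD_REGISTERED_CORRECTLY = 2       # Road Exists / Road (type) Registered Correctly
    ROAD_UNREGISTERED = 3               # Road Exists / Unregistered Road
    ROAD_REGISTERED_INCORRECTLY = 4     # Road Exists / Road (type) Incorrectly Registered
    NO_ROAD_REGISTERED_INCORRECTLY = 5  # No Road / Road Incorrectly Registered


class RoadType(Enum):
    HIGHWAY = "highway"
    ROUTE = "route"
    STREET = "street"
    TRACK = "track"


# Tags for which the text says an observed road type can be selected.
TAGS_WITH_ROAD_TYPE = {Tag.ROAD_UNREGISTERED, Tag.ROAD_REGISTERED_INCORRECTLY}


@dataclass(frozen=True)
class TileContribution:
    """One volunteer contribution, anchored at the tile-center point."""
    lon: float            # tile-center longitude
    lat: float            # tile-center latitude
    image_wms_url: str    # WMS request used for the orthophoto tile
    roads_wms_url: str    # WMS request used for the road base map tile
    tag: Tag
    road_type: Optional[RoadType] = None

    def __post_init__(self):
        # Reject a road type on tags where the text does not offer one.
        if self.road_type is not None and self.tag not in TAGS_WITH_ROAD_TYPE:
            raise ValueError(f"road_type is not applicable to {self.tag.name}")
```

Treating the road type as optional and validating it against the tag keeps the five scenarios and the road-type detail in a single record, which is one of several reasonable designs.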

Data Collection
The data tagging task consists of a manual process in which volunteers select and assign attributes to the different characteristics of the existing geographic information (i.e., aerial images and road base maps). The community defines which tiles and parts of the data set belong to which category by assigning a tag, helping to organize the data for further processing. In addition to a web form and a set of buttons to select a tag, it is necessary to use Geoweb technologies such as a web mapping interface (e.g., the OpenLayers library) to retrieve both WMS for a random geolocation within the cartographic sheet, which acts as a training zone. The web map uses an 18-level zoom to display several tiles of the image on one side of the interface, with the road information on the other. The volunteer then selects one of the tiles to inspect, labels whether or not a road exists, and compares the tile with the roads layer to determine whether the current cartography corresponds to the reality visible in the image. Finally, the volunteer registers the tag in the system. Figure 1 shows an example of the current system interface.
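As an illustration of the retrieval step, the sketch below picks a random tile inside a sheet's bounding box and builds a standard WMS 1.1.1 GetMap request for it. It is written in Python for brevity, whereas the text describes a browser-side OpenLayers client. The endpoint, layer name, tile size in degrees, and function names are all placeholders assumed for the example; only the GetMap parameter names follow the WMS standard.

```python
import random
from urllib.parse import urlencode


def random_tile_bbox(sheet_bbox, tile_size_deg, rng=random):
    """Pick a random tile (min_lon, min_lat, max_lon, max_lat) inside a sheet."""
    min_lon, min_lat, max_lon, max_lat = sheet_bbox
    # Draw the lower-left corner so the whole tile stays inside the sheet.
    lon = rng.uniform(min_lon, max_lon - tile_size_deg)
    lat = rng.uniform(min_lat, max_lat - tile_size_deg)
    return (lon, lat, lon + tile_size_deg, lat + tile_size_deg)


def tile_center(bbox):
    """Center point (lon, lat) of a tile, i.e. the coordinate stored per tag."""
    return ((bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2)


def getmap_url(base_url, layer, bbox, size_px=256):
    """Build a WMS 1.1.1 GetMap URL (SRS EPSG:4326, BBOX in lon/lat order)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",
        "BBOX": ",".join(f"{v:.6f}" for v in bbox),
        "WIDTH": size_px,
        "HEIGHT": size_px,
        "FORMAT": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical sheet extent and endpoints, for illustration only.
sheet = (-4.0, 40.0, -3.5, 40.5)
bbox = random_tile_bbox(sheet, tile_size_deg=0.01)
image_url = getmap_url("https://example.org/orthophoto/wms", "orthophoto", bbox)
roads_url = getmap_url("https://example.org/roads/wms", "roads", bbox)
```

Requesting both layers with the same bounding box keeps the orthophoto and the road base map aligned, so the volunteer compares exactly the same area in both views.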

Data Processing and Data Access
The random presentation of image tiles is intended to provide adequate coverage of most of the area corresponding to a cartographic sheet. The VGIS gathers and converges all data within the geographic area of interest to build a database with multiple contributions for each scenario represented by the tags. The value created by crowdsourcing arises from the accumulation and collective processing of complementary individual contributions. By facilitating the repeated characterization of tag scenarios, the VGIS supports the training of machine learning models for pattern detection in aerial images. For now, data access will only be reflected in future updates to the MTN25 as verified open data.
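One way to converge multiple contributions is to count the tags received per tile and keep only those tiles where volunteers sufficiently agree. The text does not specify an aggregation rule, so the majority rule, the `min_votes` and `min_share` thresholds, and the notion of a `tile_id` (for example, a tile center snapped to a grid) are assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def aggregate_by_tile(contributions):
    """Group (tile_id, tag) pairs into per-tile tag counts."""
    counts = defaultdict(Counter)
    for tile_id, tag in contributions:
        counts[tile_id][tag] += 1
    return counts


def consensus(tag_counts, min_votes=3, min_share=0.6):
    """Return the most frequent tag if it is well enough supported, else None."""
    total = sum(tag_counts.values())
    if total < min_votes:
        return None  # too few contributions to trust this tile yet
    tag, votes = tag_counts.most_common(1)[0]
    return tag if votes / total >= min_share else None


# Hypothetical contributions: (tile_id, tag number 1..5).
data = [("t1", 2), ("t1", 2), ("t1", 4), ("t2", 1)]
counts = aggregate_by_tile(data)
training_labels = {
    tile: consensus(c) for tile, c in counts.items() if consensus(c) is not None
}
```

Tiles without consensus can simply be shown again to other volunteers, which fits the random presentation of tiles described above.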

Participant Engagement
The initial engagement mechanisms or tactics use some of the drivers of VGI presented in [8]:
• 'Self-transcendent Purpose': a call to action to attract potential volunteers to the mission of achieving a better cartography of Spain, indicating how valuable their help is in reducing the time needed to update base maps.
• 'Cognitive Arousal': offering variety, novelty, and uncertainty, since participants do not know in advance which cartographic sheet and random tile will be shown, or what characteristics it will have. For each tile to be validated, the system queries and displays a pop-up window with additional geographic and statistical information about the land use, vegetation types, and biodiversity of fauna and flora found within it. This tool seeks to motivate participation by appealing to curiosity about the environment and knowledge of its characteristics, providing fun and entertainment as participants explore and tour the Spanish geography.
• 'Sense of Progress and Accomplishment': using gamification elements for participation feedback, the system records the spatial distribution of each volunteer's contributions, indicating their progress and the percentage of territory explored relative to the unexplored land. 'Exploration Points' are granted as a reward for each square kilometer explored and tagged, and participants are assigned different 'Adventurer Ranks' based on their activity.
• 'Sense of Competence and Mastery': participant leaderboards organized by rank tiers can boost competition to rise through the levels. They also facilitate social comparison and participant reputation, opening the possibility of creating a 'Sense of Relatedness and Community' through future tactics.
• 'Sense of Identity': badges are offered for each species of fauna and flora found by a participant, as well as for the vegetation and land use types explored. Volunteers have access to their own library with their badges and the geographic knowledge they have accumulated.

• Elementary school students in Spain are a special target group. Once a student has explored the territory linked to a tile, the next time the student finds a tile with already known geographic features, a questionnaire can be presented to reinforce their learning. Correct answers add 'Exploration Points' that raise the student's ranking in a school competition, facilitating classroom fun and intellectual entertainment.
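The progress feedback described in the list above can be sketched as a few small functions. The text only states that 'Exploration Points' are granted per square kilometer explored and tagged and that 'Adventurer Ranks' depend on activity; the rank names, point thresholds, points-per-kilometer value, and tile area used here are invented for illustration.

```python
import bisect

# Hypothetical rank tiers: minimum Exploration Points and rank names.
RANK_THRESHOLDS = [0, 10, 50, 200]
RANK_NAMES = ["Novice", "Scout", "Pathfinder", "Cartographer"]
POINTS_PER_KM2 = 1  # assumed reward per whole square kilometer


def exploration_points(tagged_tiles, tile_area_km2):
    """Points for the whole square kilometers covered by distinct tagged tiles."""
    explored_km2 = len(set(tagged_tiles)) * tile_area_km2
    return int(explored_km2) * POINTS_PER_KM2


def explored_share(tagged_tiles, total_tiles):
    """Percentage of the sheet's tiles that the volunteer has tagged."""
    return 100.0 * len(set(tagged_tiles)) / total_tiles


def adventurer_rank(points):
    """Highest rank whose threshold does not exceed the given points."""
    return RANK_NAMES[bisect.bisect_right(RANK_THRESHOLDS, points) - 1]


# Example: 40 distinct tiles of 0.25 km2 each; a repeated tile counts once.
tiles = [f"t{i}" for i in range(40)] + ["t0"]
points = exploration_points(tiles, tile_area_km2=0.25)
rank = adventurer_rank(points)
```

Counting distinct tiles rather than raw submissions rewards exploring new territory instead of re-tagging the same area, which matches the exploration framing of the ranks.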

Conclusions
This work serves as input and a roadmap for prototype development, accelerating technological implementation, testing, and refinement based on a holistic and theoretical understanding of VGIS. The design methodology followed has been an effective guide, resulting in a system that uses volunteers' intelligence and skills to create a data management loop through community-system interactions. A key point in VGIS design is the need to engage participants by offering benefits and functional value aligned with volunteers' needs; the Participant Engagement section presents tools to motivate volunteers and make participation more appealing. Finally, the system can be extended to include other geodata layers to continue training different machine learning algorithms.