The Role of Citizen Science in Earth Observation

Citizen Science (CS) and crowdsourcing are two potentially valuable sources of data for Earth Observation (EO), which have yet to be fully exploited. Research in this area has increased rapidly during the last two decades, and there are now many examples of CS projects that could provide valuable calibration and validation data for EO, yet are not integrated into operational monitoring systems. A special issue on the role of CS in EO has revealed continued trends in applications, covering a diverse set of fields from disaster response to environmental monitoring (land cover, forests, biodiversity and phenology). These papers touch upon many key challenges of CS including data quality and citizen engagement as well as the added value of CS including lower costs, higher temporal frequency and use of the data for calibration and validation of remotely-sensed imagery. Although still in the early stages of development, CS for EO clearly has a promising role to play in the future.


Introduction
Earth Observation (EO) is the collection of information about the Earth's surface using remote sensing and in situ surveying on the ground [1]. Hence, it encompasses imagery from a range of satellite sensors, aerial imagery from airplanes and increasingly unmanned aerial vehicles, as well as permanent ground-based sensors and field-based measurements collected using handheld sensors, digital questionnaires or paper-based formats. Once the domain of only professionals, the field of EO has seen new inputs coming from Citizen Science (CS), which is the involvement of citizens in scientific research, from data collection through to hypothesis generation [2]. This is clearly reflected in the increasing number of publications that have appeared in the area of EO and CS as shown in Figure 1, which is based on a search of the terms 'Earth Observation' and 'Citizen Science' in both Scopus and Google Scholar. Citizens generally provide inputs to EO in two main ways, i.e., through image interpretation and through collection of in situ data, both of which are useful for the calibration and validation of remotely-sensed imagery or products derived from EO [3]. Table 1 lists a number of different CS projects that are currently providing data useful for EO, classified by the type of data collected. Although the list is not exhaustive since this field is changing rapidly, it does serve to illustrate the vast breadth of projects that have emerged over more than four decades. Despite the long history of such initiatives, the majority of projects listed in Table 1 are much more recent and were started in the current decade.
We have also indicated whether the data collection is carried out outdoors, i.e., field-based, or online, i.e., usually indoors. Some projects combine both data collection options. The summary of projects in Table 1 shows that around 58% are field-based only, 25% are online only, while 17% have both a field and online component. This shows that the majority of projects (around 75%, i.e., the field-based only projects plus those with both components) are collecting in situ data, which can help to fill a much needed data gap [4]. CS is used here in the widest sense since most of the CS projects listed in Table 1 involve citizens mainly in data collection rather than in scientific analysis of the data or project design. However, there are documented examples where citizens are involved in the full sequence or workflow of a CS project, from project design, data collection and methodology development to data analysis and interpretation of results [6].
The increased role that citizens are playing in the field of EO has been driven by a number of factors. The first is the availability of very high resolution satellite imagery through initiatives like Google Maps and Bing Maps, which have brought such imagery much closer to citizens and made it part of their daily lives. The second is technological, as advances in mobile technology and Web 2.0 have resulted in an environment where citizens can literally map the world, e.g., through OpenStreetMap (OSM) or Google MapMaker, or collect georeferenced data as they move throughout their physical space. At the same time, new satellite sensors have been launched, resulting in new big data streams from the Copernicus Sentinel satellites and through Planet, which will become enablers for many new EO applications in CS in the future.
However, CS remains a challenge in itself due to numerous issues such as quality [7][8][9][10], data interoperability [11,12], the engagement and motivation of citizens [13,14], strategies for retention and sustainability of participation [15,16], and increasingly, legal issues related to privacy, ethics and licensing [17], among others. This special issue includes papers that address some of these issues in relation to the use of CS for EO, while other aspects associated with crowdsourcing, in particular the value of crowdsourced data, are also considered within the different application-oriented papers. In the next section, a summary of the findings presented in the special issue papers is given, which represent some of the latest advances in CS and EO. In the final section, some ideas about the possible future directions for CS and EO are presented.

Latest Advances in Citizen Science and Earth Observation
This special issue contains 11 publications that include 10 research papers and one review paper. One of the 10 research papers focuses on the status of CS and is therefore summarized together with the review paper in Section 2.1. The other nine papers of this special issue deal with applications in the fields of disasters, the marine environment, biodiversity (species occurrence and phenology), land cover and forest monitoring, which are summarized in Section 2.2.

Citizen Observatories and the Status of CS for EO
In the paper by Grainger [18], the author provides a review and analysis of citizen observatories, which are described as a recent development facilitated through top-down funding by the European Commission. The author further points out that citizen observation systems are much more complex than traditional satellite-based observation systems and stresses that communication, the needs of the end user, and their understanding and acceptance of the data are critical. Grainger [18] then proposes a new framework that extends the current remote sensing framework to encompass all EO systems, including citizen observatories, and describes how data collected in these systems can be converted into usable information for decision makers. Finally, Grainger [18] divides the types of citizen contributions into three different categories, namely data collectors, participatory science and co-creation. It should be noted that the application papers in Section 2.2 fall primarily into the category of data collectors rather than having citizens actively participating in or co-creating the scientific project. This is a similar finding to the projects listed in Table 1.
Linking closely to [18], the paper by Mazumdar et al. [19] outlines a stakeholder analysis that was undertaken as part of the 'Crowdsourcing for observations from Satellites (Crowd4Sat)' project, which is one of the European Space Agency's (ESA) current demonstration projects on CS for EO. The stakeholder analysis investigated how CS and crowdsourcing impact the validation, use and enhancement of observations from satellites. Some key findings of the analysis include: (1) the first adopters that make use of CS are academia followed by civil protection agencies, tourism organizations and local authorities; (2) communication to citizens on how and for what their data are used is essential as well as continued feedback, where social media can play an important role; (3) CS and crowdsourcing can potentially verify and improve the spatial and temporal resolution of satellite observations, in particular in the field of disaster response and environmental monitoring, and will play an increasing role in the future; (4) the remaining challenges are data protection, privacy, standardization, clear policies, quality and understanding the true value of CS data; and (5) user experience and the look and feel of applications play an essential role in engaging citizens.

Disaster Response
The paper by Albuquerque et al. [20] explores the value of crowdsourced data for humanitarian mapping. The authors identify a typology of the tasks used in geographic information crowdsourcing, including classification and digitization (both performed by citizens) and conflation (performed in this article by the research team). These were applied to a case study in the area of humanitarian aid, namely the 'Map South Kivu' sub-project of the 'Missing Maps' project. The difficulty of the tasks was assessed as well as the quality of the results obtained, which were compared with the data available in OSM. The main findings were that: (1) 25% of the volunteers contributed 80% of the data collected, similar to findings in other CS and crowdsourcing projects [16]; (2) tasks that involve the creation of objects proved to be more difficult to execute than tasks that do not involve objects; (3) the agreement between volunteers was shown to be a very good indicator of task difficulty and of the reliability of the results; (4) task design was shown to be important for the quality of the results; and (5) the crowdsourced classification of satellite imagery can produce geographic information about human settlements with a high level of quality, achieving an accuracy of 89%, a sensitivity of 73% and a precision of 89%, results that are comparable to those of automated approaches. The advantages of crowdsourced data over automated approaches were larger when mapping small buildings, which were not detected automatically and which, in some cases, can represent a large percentage of the features. The authors suggest that crowdsourcing may be used in the future to identify training sets and to complement automated classification, tackling difficult situations in particular.
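The three quality measures quoted above all follow from the counts of a binary confusion matrix (here, settlement versus no settlement). The following is a minimal sketch in Python, assuming made-up counts rather than the figures from [20]; the helper name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, sensitivity, precision) from confusion-matrix counts.

    tp: settlement tiles correctly flagged by the crowd
    fp: tiles flagged as settlement that are not
    fn: settlement tiles the crowd missed
    tn: non-settlement tiles correctly left unflagged
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total      # share of all tiles classified correctly
    sensitivity = tp / (tp + fn)      # share of true settlements that were found
    precision = tp / (tp + fp)        # share of flagged tiles that are real
    return accuracy, sensitivity, precision


# Hypothetical counts for 1000 image tiles (illustrative only, not from [20]).
acc, sens, prec = classification_metrics(tp=80, fp=10, fn=20, tn=890)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} precision={prec:.2f}")
```

A sensitivity that is lower than the precision, as in the values reported above, indicates that missed settlements are more common than false alarms.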

The Marine Environment
The paper by Busch et al. [21] illustrates the added value that citizens can bring when they monitor ocean color, in particular along coastlines. They can fill observational gaps since remote sensing methods are not accurate along coastal zones due to the mixed pixel problem of having land and water. This paper presents tools and methods derived from the EU-funded FP7 Citclops citizen observatory. The tools are attractive to a wide set of different users since they range from stand-alone smartphone apps to devices with Arduino and 3D printing. From the different methods used, the highest number of measurements collected by citizens was 1600 via the EyeOn Water-Color app, which is the simplest method developed. The authors stressed that offering easy-to-use tools with different levels of complexity is fundamental for engaging citizens. The methodologies implemented in the paper enabled high quality data to be collected when compared with standard in situ or laboratory measurements. Automatic quality control procedures were also implemented to guarantee the validity of the contributions.

Biodiversity: Species Occurrence
The article by Heigl et al. [22] compares roadkill data collected by citizens with data collected by hunters for hares killed in Lower Austria. The authors used road network data from OSM, Corine Land Cover products and aerial photography to manually map structural landscape elements. The results showed that hunters and citizens report roadkill in different areas, i.e., hunters report more often in rural areas, which have a greater length of secondary roads and more agricultural land, while citizens report more often in urban land cover, where there is higher coverage of motorways and residential roads. No significant differences in reporting between hunters and citizens were found in relation to the amount of landscape structural elements. Thus, these results showed that hunters and citizens cover different regions and so should be viewed as complementary sources of data. The authors plan to scale up the exercise to all of Austria and they recommend the collection of data on the demographic and social background of the participants, since different types of people appear to survey different locations.

Land Cover
The aim of the paper by Laso Bayas et al. [23] was to present results from the FotoQuest Austria campaign, which ran during the summer of 2015 for several months. Data were collected via the FotoQuest Austria mobile app at the same sample locations where Eurostat commissions the collection of ground data via the Land Use and Cover Area frame Survey (LUCAS) every three years. The mobile app was designed to follow the LUCAS protocol as closely as possible and to make the data collection as simple as possible. When comparing the official land cover and land-use classes collected by LUCAS with those collected by volunteers, there was generally good agreement for level 1 land cover classes (e.g., artificial surfaces and cropland) and at the most aggregated level for land-use (e.g., agriculture), but performance decreased at levels 2 and 3 (e.g., less common crop types). In general, homogeneous areas also showed higher agreement than heterogeneous ones. The first results from this initial campaign show that CS-based land cover collection can complement but not replace the official survey. Other advantages are reduced costs and potentially a more frequent inventory.
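One simple way to quantify agreement at each level of a hierarchical legend is to compare code prefixes. The sketch below assumes hypothetical three-character, LUCAS-style codes in which the first one, two and three characters identify levels 1, 2 and 3 respectively; the function name and the sample pairs are illustrative and are not taken from [23].

```python
def agreement_by_level(pairs, levels=(1, 2, 3)):
    """Return {level: share of pairs whose codes match up to that level}.

    pairs: list of (official_code, volunteer_code) tuples, one per location.
    A pair agrees at level k when the first k characters of both codes match.
    """
    result = {}
    for k in levels:
        matches = sum(1 for official, volunteer in pairs
                      if official[:k] == volunteer[:k])
        result[k] = matches / len(pairs)
    return result


# Hypothetical observations: (official survey code, volunteer code).
sample = [
    ("B11", "B13"),  # same level 1 and 2, different level 3
    ("B16", "B16"),  # identical
    ("C10", "C10"),  # identical
    ("E20", "B11"),  # disagreement already at level 1
    ("B11", "B21"),  # same level 1 only
]
print(agreement_by_level(sample))
```

With these made-up pairs the share of matching codes falls from level 1 to level 3, mirroring the qualitative pattern described above, where agreement is highest for the most aggregated classes.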
The paper by Salk et al. [24] analyzed data from a Geo-Wiki campaign, which was run over the summer with 30 students, and a second campaign using a Geo-Wiki game called Cropland Capture. Cropland Capture is available on desktop or mobile devices where users are asked to identify evidence of cropland from very high-resolution satellite imagery. During the Cropland Capture campaign, more than 4.5 million classifications of cropland and non-cropland were acquired by over 3300 volunteers. The paper focused on examining whether people who lived closer to an image performed better than volunteers who lived further away, i.e., does local knowledge influence performance? The results from the analysis showed that distance from a person's hometown to the image being classified had little or no influence on the actual performance of the volunteers or students. For example, one difficulty volunteers faced was to differentiate between managed pasture and cropland, but the results showed little difference in performance between volunteers who lived in the country from which the images were sourced and those living outside that country. One reason why there was little difference might be that most volunteers were living in cities rather than the countryside and hence there was little local knowledge of the surrounding rural landscapes.

Forests
The article by Molinier et al. [25] presents the Relasphone mobile application (version 1.5, VTT Technical Research Centre of Finland, Espoo, Finland), developed to collect in situ data for forest inventories, namely, basal area and tree species, height, diameter and age, among other data of interest. These data are expensive to obtain with traditional methods, and thus data collection is usually limited to only a small number of samples. The authors describe the application and the data gathered with the app in two study areas (with mixed forest from different biomes) located, respectively, in Finland and Mexico, and then assess the accuracy of the data collected using reference forest inventory data. The results showed a good agreement between the data collected by the Relasphone mobile application and the reference data and also consistency between data collected by different observers. The data collected regarding basal area and stem volume were used as training sets to classify optical satellite imagery in order to create maps of basal area and stem volume. The maps obtained for the two study areas were comparable to the maps obtained using other training areas. The authors also analyze the relevance of the application for CS, as well as its potential for EO and mapping biomass worldwide, in particular for monitoring tropical forests.

Biodiversity: Phenology
The first paper on phenology is by Wallace et al. [26] and provides an example of the use of CS to monitor invasive buffelgrass in Southern Arizona. Since treatment of buffelgrass is only efficient after a 50% greening, both remote sensing and observations on the ground are needed to optimize any interventions. CS was used to gather data on the phenological development of buffelgrass in two different locations and in two different ways: observations made by one observer over several years and observations made at the same place by 10 different citizen recruits for a shorter time period. The crowdsourced data were then used to establish a relationship between greenness and cumulative precipitation from remote sensing in order to determine the cumulative precipitation threshold at which treatment can begin, which differed between the two locations. Moreover, CS was used to validate the MODIS NDVI product, demonstrating a link between field observations and the NDVI profile.
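For readers unfamiliar with the index, NDVI is computed from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). The sketch below shows this formula together with a deliberately simplified way of reading off a precipitation threshold from paired field observations. It is an illustration under stated assumptions, not the procedure used in [26]: the function names and the sample values are hypothetical, and the observations are assumed to be sorted by increasing cumulative precipitation.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    if nir + red == 0:
        raise ValueError("NIR + Red must be non-zero")
    return (nir - red) / (nir + red)


def precipitation_threshold(cum_precip_mm, greenness_pct, target=50.0):
    """Return the first cumulative precipitation (mm) at which the observed
    greenness reaches the target percentage, or None if it never does.

    Assumes both sequences are aligned and sorted by increasing precipitation.
    """
    for precip, green in zip(cum_precip_mm, greenness_pct):
        if green >= target:
            return precip
    return None


# Hypothetical season at one site (values are illustrative only).
precip = [5, 12, 20, 33, 41]   # cumulative precipitation in mm
green = [0, 10, 35, 55, 80]    # observer-reported greenness in percent
print(precipitation_threshold(precip, green))
print(round(ndvi(nir=0.5, red=0.1), 3))
```

Running the same lookup separately for each site would yield site-specific thresholds, which is consistent with the observation above that the threshold differed between the two locations.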
The second paper on phenology [27] describes how citizens contributed to identifying features from images taken from a network of 300 phenology cameras as part of the CS project called 'Season Spotter'. Zooniverse was used as the platform to run the exercises. The Zooniverse network of volunteers is set up to undertake visual interpretation of images that cannot be done at all or not very well via automatic classification. In three exercises, citizens were asked to identify information on snow and image quality, to delineate trees and to identify phenological change between pairs of images taken a number of days apart. In general, citizens were able to identify the plant phases correctly (except for grass seedheads), they could spot snow and poor quality images, and they were able to demarcate trees as well as phenological changes. The information collected helped to identify the start and end of the growing season, which could be compared with remotely sensed data and serve as a calibration dataset. CS data in this application are particularly useful in complementing remotely sensed data as well as traditional ground-based data collection, in order to validate remotely-sensed vegetation indices and to help improve the phenophase detection algorithms.
The last paper on phenology by Elmore et al. [28] focuses on trees and examines whether MODIS resolution data can be used to characterize the phenology of individual trees. The study taps into the Nature's Notebook project to obtain ground observations on phenological stages of different tree species including poplar and lilac. When processing the crowdsourced data, a number of quality control rules were applied, which effectively filtered the data and helped to improve the correlation between the observations on the ground and the time series of the MODIS indices. The findings showed that some species are better suited for phenotyping using MODIS data than others since the MODIS signal always represents the average over the pixel extent. Recommendations were given to target homogeneous forest areas since heterogeneous landscapes influence the overall average tree signal and affect the quality of the phenotyping. The paper showed that the data from citizens can clearly complement data from remote sensing.

Summary of the Applications
All applications in the special issue show that citizens do have a clear role in EO. They cover a wide range of fields from marine applications to disasters, as well as land monitoring applications in the field of land cover, forestry, phenology and biodiversity. Depending on the application, the added value of citizen observations includes cost savings (S), making data available at a higher frequency (F), contributing to calibration and validation activities (P) and complementing traditional or RS-based methods (C). Table 2 provides a summary of the nine application papers in terms of this added value as well as the type of data used (CS-based and other types of data) and the maturity level of the application. Six applications involve outdoor activities while three could be performed indoors, involving the visual interpretation of very high-resolution images from space or images derived from a ground-based camera. All applications involve citizens in performing data collection tasks, eight use remote sensing data in one way or another and five make use of traditional or authoritative data and compare them to the crowdsourced data. Three applications directly compare remote sensing data with CS data. Most papers are in the initial research phase and at a low level of maturity. All applications clearly demonstrate an added value of citizen observations, where the majority complement RS data or traditional in situ data.

The Future Outlook for Citizen Science and Earth Observation
In the last decade, we have seen a massive increase in research on CS and EO (see Figure 1) and this trend is expected to continue in the future. Moreover, as we showed in Table 1, there are already many different ongoing CS projects that have relevance for EO, but most are not embedding their data streams into operational EO applications, while others suffer from insufficient citizen participation. One important element needed to make citizen observations an official data stream is legal recognition by governments and local authorities that citizen-based data are a valid source of information and that citizens are able to deliver data of sufficient quality that can potentially complement, but not necessarily replace, existing observation networks. For example, in the USA, the Crowdsourcing and CS Act, which came into force in January 2017, gives federal agencies clear authorization to use CS and crowdsourced data. This recognition paves the way for the development of truly integrated environmental monitoring systems involving citizens as a key contributor. No such legislation exists in the EU or other countries around the world.
The ubiquitous dispersion of smartphone technology has acted as an enabler and allows citizens, with little effort, using well-designed and user friendly apps, to collect observations of the environment, which can then be stored in a data repository for curation and further analysis. A second enabler of CS and EO is the sheer amount of new and freely available remotely-sensed data. When fully operational, the Sentinel satellites of the European Copernicus program will collect several TB of data per day, where Sentinel-2 images, for example, will be acquired every five days at a 10 m resolution. There are also new commercial providers such as Planet, which will eventually provide daily observations of the globe at a 3 m resolution, improving the chances of obtaining cloud free imagery. This will enable near real-time monitoring of changes occurring on the Earth; such data are becoming increasingly attractive since they can be used to build more applications for citizens that will potentially stimulate their engagement and further mainstream the use of satellite data by citizens in their daily lives.
The two biggest challenges for CS at present are related to the quality of the data and how to engage and retain citizen participation in the longer term. Quality can be improved via training; through the continuous monitoring of data quality with embedded feedback provided to citizens, the volunteers can improve over time. In terms of engagement, Budhathoki and Haythornthwaite [29] showed that there are several reasons why citizens participate, including a range of motivations such as altruism, self-achievement and personal interest, among others. Successful CS or crowdsourcing projects have been those where citizens were able to derive a direct benefit from the data collected by others. One typical example is mountain bike tracks that are recorded at some locations in great detail. For example, a new route entered by one mountain biker can be used directly by another one living close by. Another increasingly popular way of engaging citizens is via gamification and making data collection a fun experience. For the first time, Pokémon Go has managed to engage a massive crowd of players to go outside their homes, sometimes to remote places, to collectively catch 88 billion monsters (as of 1 March 2017). Had Pokémon Go also collected information about real objects found in the environment, e.g., taking a picture of a tree while simultaneously catching a virtual monster, such data could have been very valuable and an initial global database on tree species, health and location could have been derived from such a game. Although the majority of observations would not have been in remote or inaccessible places, this could have been used to document tree health in cities and towns and benefited the CS applications of urban tree monitoring (see Table 1). The data would also have been provided at an incredibly high temporal resolution, i.e., possibly seconds between observations, which would allow for rapid infestations to be clearly identified.
This special issue dedicated to CS and EO includes some review-oriented publications but mainly consists of research papers demonstrating applications in many diverse environmental domains. As with much of CS today, the citizen contributions in the application papers are mainly in the form of data collection and not in project co-design or data analysis. Moreover, many of the most relevant aspects related to CS have been addressed in some way in this compilation of papers, such as the role of citizens in collecting data, the quality of crowdsourced data, data conflation, and the combination of CS with other technologies and methods applied by experts, to name just a few. Even though the use of CS for EO is still at an early stage, the huge potential arising from the combination of both data streams is already very clear.

Figure 1. Number of publications in Scopus and Google Scholar over time addressing 'Earth Observation' and 'Citizen Science'.


Table 1. Examples of crowdsourcing and citizen science initiatives relevant to Earth Observation.

Type of Data Collected | Crowdsourcing and CS Projects and Initiatives | Year Project Started, Launch Date or Earliest Date Found on the Website or in Publications | Field-Based or Online Participation
Water quality and biodiversity | FieldScope (http://education.nationalgeographic.com/education/programs/fieldscope) | | Field-based
Weather | Weather Underground (http://www.wunderground.com/) | | Field-based
Weather | WOW (http://wow.metoffice.gov.uk/) | | Field-based
Weather | Citizen Weather Observer Program (http://www.wxqa.com/) | 1990s | Field-based

Table 2. Summary of applications presented in the special issue.
Legend: *** high; ** medium; * low; + applies; - does not apply; S = cost savings; F = higher frequency; P = potential Cal/Val dataset; C = complements RS data or traditional in situ observations.