Investigating the Feasibility of Geo-Tagged Photographs as Sources of Land Cover Input Data

Antoniou, Vyron; Fonte, Cidália Costa; See, Linda; Estima, Jacinto; Arsanjani, Jamal Jokar; Lupia, Flavio; Minghini, Marco; Foody, Giles; Fritz, Steffen

doi:10.3390/ijgi5050064

by Vyron Antoniou ¹,*, Cidália Costa Fonte ², Linda See ³, Jacinto Estima ⁴, Jamal Jokar Arsanjani ⁵,⁶, Flavio Lupia ⁷, Marco Minghini ⁸, Giles Foody ⁹ and Steffen Fritz ³

¹ Hellenic Military Academy, Leof. Varis-Koropiou, 16673 Athens, Greece
² Department of Mathematics, University of Coimbra - INESC Coimbra, Coimbra, Rua Antero de Quental, n° 199 Coimbra, Portugal
³ International Institute for Applied Systems Analysis (IIASA), Schlossplatz 1, 2361 Laxenburg, Austria
⁴ NOVA IMS, Universidade Nova de Lisboa (UNL), 1070-312 Lisboa, Portugal
⁵ GIScience Research Group, Institute of Geography, University of Heidelberg, Berliner Strasse 48, 69120 Heidelberg, Germany
⁶ Department of Planning and Development, Aalborg University Copenhagen, A.C. Meyers Vænge 15, DK-2450 Copenhagen, Denmark
⁷ Council for Agricultural Research and Economics (CREA), Via Po, 14 00198 Roma, Italy
⁸ Politecnico di Milano, Department of Civil and Environmental Engineering, Como Campus, Via Valleggio 11, 22100 Como, Italy
⁹ School of Geography, University of Nottingham, NG7 2RD Nottingham, UK
* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2016, 5(5), 64; https://doi.org/10.3390/ijgi5050064

Submission received: 2 February 2016 / Revised: 8 April 2016 / Accepted: 9 May 2016 / Published: 13 May 2016

(This article belongs to the Special Issue Volunteered Geographic Information)

Abstract:

Geo-tagged photographs are used increasingly as a source of Volunteered Geographic Information (VGI), which could potentially be used for land use and land cover applications. The purpose of this paper is to analyze the feasibility of using this source of spatial information for three use cases related to land cover: calibration, validation and verification. We first provide an inventory of the metadata that are collected with geo-tagged photographs and then consider which elements would be essential, desirable, or unnecessary for these use cases. Geo-tagged photographs were then extracted from Flickr, Panoramio and Geograph for an area of London, UK, and classified based on their usefulness for land cover mapping, including an analysis of the accompanying metadata. Finally, we discuss protocols for geo-tagged photographs when VGI is used in land cover applications.

Keywords:

geo-tagged photographs; Volunteered Geographic Information; Flickr; Panoramio; Geograph; land cover; land use; fitness-for-use

1. Introduction

Volunteered Geographic Information (VGI) refers to the provision of location-based information by volunteers [1], where online collaborative mapping has made OpenStreetMap (OSM) one of the most successful examples of spatial data collection by citizens [2]. Other examples include VGI for disaster response [3], satellite image interpretation for land cover mapping and validation [4], and environmental monitoring [5]. There is also considerable interest in examining how VGI can be used to complement authoritative data or highlight areas of change faster than the mapping cycle of national mapping agencies. Much of this work has focused on comparing data from OSM with authoritative data [6,7,8,9,10]. Much less work has been focused on the use of geo-tagged photographs as a source of VGI [11], yet they represent a substantial and growing source of volunteered information, e.g., from online repositories such as Flickr, Panoramio and Geograph, as well as public social media platforms, such as Instagram. There are currently estimated to be around 90 million photographs in Panoramio [12] and just under 60 million photographs are uploaded per day on Instagram [13]. According to Michel [14], in Flickr, around two million photographs are uploaded per day and over 5.26 billion photographs were available by the end of 2014. In 2010, Kisilevich et al. [15] downloaded around 87 million geo-tagged photographs for their analysis and geovisualization of the activities and behavior of people around the world. The results showed that the attractiveness of places can be studied with photographs and that geo-visualization is of great value in analyzing the temporal and spatial characteristics of these datasets. 
Similar studies have appeared using geo-tagged images to find areas of touristic interest in Australia, Hong Kong, Italy and Switzerland, as well as extracting the behavioral patterns of tourists by identifying the main spatio-temporal trajectories taken by visitors [16,17,18,19,20]. Four types of spatio-temporal clusters (stationary, reappearing, occasional and regular moving) were found in Flickr photographs from Switzerland [19], where the authors combined these patterns with semantic interpretation of the text (i.e., tags from the photographs) and information from external data sources (i.e., geo-tagged Wiki pages). Geo-tagged photographs have also been used to provide personalized recommendations regarding which tourist attractions to visit [21] and to identify events [22,23,24,25]. For example, a novel method was developed by Rattenbury et al. [22] to detect event and place semantics from the distribution and frequency of tags from Flickr images in the San Francisco Bay area, outperforming other similar methods, while Chen and Roy [23] used geo-tagged photographs to detect aperiodic events, using wavelets to first filter out noise. Geo-tagged photographs have also been used to detect social events such as soccer matches, concerts, parades and festivals [24,25,26] while tourist photographs have also been used to reconstruct urban areas [27,28].

Another growing area of interest is in the use of geo-tagged photographs for the development, verification and validation of land use and land cover (LULC) maps [29]. For example, Li et al. [30] proposed an automated method to generate a map of the main roads in Beijing using geo-tagged images from Flickr. Road locations and classes were inferred from user-contributed trajectories while road names were derived from the tags. Estima et al. [31,32] evaluated whether photographs from Flickr could be used to support LULC classification for the city of Coimbra and for comparison with Corine Land Cover (CLC) classes (level 1 and 2) for continental Portugal, respectively. For Coimbra, the results showed an uneven distribution of geo-tagged photographs with a strong concentration in touristic places while the comparison with CLC indicated the highest density of photographs in artificial areas. The authors concluded that Flickr photographs were not suitable for LULC classification or comparison with CLC if used alone, but improvement might be obtained by combining them with other VGI sources. Leung and Newsam [33,34,35] investigated whether geo-tagged images could be used as inputs to a land cover classification. Using geo-tagged photographs from Flickr and Geograph, the authors created a binary land cover classification (developed/undeveloped) for an area of 100 km × 100 km in Great Britain. The accuracy achieved was around 75%, with higher results achieved for Geograph compared to Flickr [33,34]. The authors also claim that classification performance improves when images are acquired with mobile phones and images taken with flash are discarded. A similar approach was used by the authors in [35] for land use classification using geo-tagged images from Flickr.
The algorithm was validated on land use maps from two university campuses and the overall results suggested there is good potential for generalizing this approach to areas where land use maps are not available or are out-of-date.

Moving to a global scale, Tsendbazar et al. [36] examined which global land cover products have used geo-tagged photographs in their training or validation, which includes GlobCover 2005 [37], GlobCover 2009 [38] and the GLCNMO [39], where one source of photographs has been the Degree Confluence Project (DCP). The DCP is a voluntary-based initiative aimed at collecting photographs and narratives at each latitude and longitude intersection point around the world. Foody and Boyd [40] used photographs from the DCP to validate forest cover in West Africa from the GlobCover map, finding that the data from the volunteers showed accuracy values similar to those of professional validators once a latent class model was applied. Geo-tagged photographs from DCP were also used for validating land cover maps by Iwao et al. [41]. The authors used the photographs and descriptions corresponding to 749 DCP points worldwide for validating MODIS Land Cover (MOD12), the University of Maryland’s Global Land Cover (UMD), Global Land Cover 2000 (GLC2000) and the Global Land Cover Characteristics Database (GLCC). The results showed that the DCP data have the same or a higher level of reliability than the validation data derived from visual interpretation of Landsat imagery. A larger validation dataset (4211 DCP data of worldwide locations) was used by Iwao et al. [42] for validating a new global land cover map created by combining three existing land cover maps (MOD12, GLC2000 and UMD). The rate of agreement of each land cover map was computed for six major climatic zones. The results showed that the overall agreement was higher for the new land cover map compared to the three individual products.

Although many of the aforementioned studies have shown promising results with respect to the use of geo-tagged photographs as inputs to land cover and land use mapping, this source of VGI comes with a number of caveats. Problems of terminological consistency and a lack of interoperability with authoritative data have been highlighted by Kinley [43], who explored the adequacy of geo-tagged photographs from Geograph for enhancing authoritative data on land cover, and Purves et al. [44], who examined whether place can be described from user-contributed tags in Geograph. Other caveats include various types of bias in the data, e.g., a spatial bias toward urban and touristic areas [32,45], concerns over the quality of VGI more generally [46,47], specific issues such as the positional accuracy of geo-tagged photographs [48,49], and legal issues around privacy and ownership [50,51]. Some of these concerns, such as the need to improve data quality and the trustworthiness of the data, could be addressed through the use of data collection protocols. Data collection protocols for VGI are either non-existent or not strictly enforced, e.g., with OSM data. In contrast, many citizen science projects in the area of biodiversity monitoring and conservation have well-defined data collection protocols that require user training and/or strong interaction with experts in the field [52]. In an attempt to fill this gap, the World Wide Web Consortium (W3C) and the Open Geospatial Consortium (OGC) have jointly published a draft document [53] on the best practices that should be followed when publishing spatial data on the web, but there is still a general lack of protocols in VGI.

The overall aim of this paper is to examine the usability of geo-tagged photographs for land cover applications, and to provide recommendations regarding the minimum information required, including considerations of the quality of this information. As a starting point we examine the protocols that are associated with current applications and initiatives that collect geo-tagged photographs. We then provide an inventory of the metadata that are collected from geo-tagged photographs, e.g., from EXIF files and forms filled in when uploading photographs. This list also includes other information that could be provided by volunteers that may be of value to different applications. We then consider three use cases related to land cover and consider what metadata would be essential or desirable from the metadata list compiled. Geo-tagged photographs were then extracted from Flickr, Panoramio and Geograph to examine the metadata available and the images were classified based on their usefulness for land cover mapping. Finally, we provide guidance on the minimum data requirements for geo-tagged photographs with respect to land cover applications and discuss protocols for geo-tagged photographs in new applications of VGI in relation to land cover.

2. Protocols for Existing Geo-Tagged Photograph Sites and Inventory of Metadata

2.1. Current Protocols for Geo-Tagged Photographs

Many sites now exist that allow users to upload photographs, with purposes ranging from pure social networking, through photograph sharing, to documenting the landscape. A selection of sites is listed in Table 1. Each of these sites was examined to determine the minimum requirements or protocols for uploading photographs as well as the information that can optionally be added. This survey was undertaken in order to understand the current protocols, which can be contrasted with the protocol for the Land Use/Cover Area frame Survey (LUCAS, shown at the bottom of Table 1), a professional protocol for the collection of land cover and land use data at regularly spaced sample locations across Europe every three years for the purpose of change detection [54].

It is clear from Table 1 that there are few protocols associated with social media and photo sharing sites compared to sites concerned with documenting landscapes. The detailed LUCAS protocol, on the other hand, is outlined in a 109-page manual [54]. Tagging and comments are completely freeform in social media and photograph sharing sites, while categories of land cover and land use are provided for some but not all of the landscape documentation sites and applications. In contrast, a detailed three-tier land cover and land use categorization is part of the LUCAS protocol [55].

2.2. Inventory of Metadata for Geo-Tagged Photographs

Table 2 provides an inventory of the types of metadata that are associated with photographs uploaded to collaborative photograph repositories and projects, as well as metadata that could be useful for LULC applications. In addition, the presence or absence of these metadata in Flickr, Panoramio and Geograph is indicated, since the photographs used in this study were downloaded from these initiatives.

Photographs may lack location information altogether, in which case they are not useful for LULC applications unless they can be accurately georeferenced through accompanying metadata or visual analysis. Geo-tagged photographs can have no or differing amounts of accompanying metadata, some of which might be generated automatically (e.g., from mobile devices or cameras), while others might be added manually by users when the photographs are uploaded. Finally, some types of metadata could be useful for quality assessment (e.g., information about the method of positioning). However, these are rarely, if ever, available. It is also noteworthy that photographs store a rich variety of metadata in their EXIF headers. However, the way that websites and applications treat this source of metadata is inconsistent. For example, the International Press Telecommunications Council (IPTC) examined the different ways that social media handle the metadata accompanying a photograph and the information ignored or dropped as a photograph passes through the workflow of a repository [56].

Having established the lack of protocols associated with the majority of geo-tagged photograph repositories and an inventory of metadata that could accompany geo-tagged photographs, we now consider three use cases and examine the usability of geo-tagged photographs for each of these applications.

3. Methodology

3.1. Study Area

The study area (80.93 km²) corresponds to a region of London within the bounding box with limits 0.217°W, 51.466°N to 0.043°W, 51.526°N in the WGS84 reference system (Figure 1).

Geo-tagged photographs were downloaded for this region using the public Application Programming Interface (API) from Flickr, Panoramio and Geograph for May 2015 for the bounding box shown in Figure 1. In total there were 573,281 photographs from Flickr, 35,707 from Panoramio and 75,378 from Geograph. Flickr clearly has a much larger number of geo-tagged photographs compared to the other two initiatives.
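The spatial selection described above can be illustrated with a minimal sketch that keeps only the records falling inside the study-area bounding box. The `Photo` record and its field names are hypothetical and do not correspond to the response format of any of the three APIs; only the bounding box limits are taken from the text.

```python
from dataclasses import dataclass

# Bounding box of the study area as given in the text (WGS84, degrees).
# West longitudes are expressed as negative numbers.
WEST, EAST = -0.217, -0.043
SOUTH, NORTH = 51.466, 51.526


@dataclass
class Photo:
    photo_id: str  # hypothetical identifier, for illustration only
    lon: float     # longitude in decimal degrees
    lat: float     # latitude in decimal degrees


def in_study_area(photo: Photo) -> bool:
    """Return True if the photograph lies inside the bounding box."""
    return WEST <= photo.lon <= EAST and SOUTH <= photo.lat <= NORTH


def filter_to_study_area(photos):
    """Keep only the photographs located inside the study area."""
    return [p for p in photos if in_study_area(p)]


# Illustrative records (invented coordinates, not data from the study).
sample = [
    Photo("a", -0.1276, 51.5072),  # inside the box
    Photo("b", -0.3000, 51.5000),  # too far west
    Photo("c", -0.1000, 51.6000),  # too far north
]
inside = filter_to_study_area(sample)
```

In practice the bounding box would normally be passed to the repository API so that filtering happens on the server; a local check such as this one is still useful for verifying the returned coordinates.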

3.2. Use Cases

Geo-tagged photographs might be useful in three use cases with respect to land cover and land use mapping, which for ease are referred to here as calibration, validation and verification. The first is in the calibration of land cover and land use maps (i.e., input data that could be used to train classification algorithms). As the spatial and temporal resolution of sensors continues to improve, the value of additional information collected at point locations also increases. The second use case is the validation of land cover and land use maps. These use cases differ because the sampling required for validation of remotely-sensed products is more stringent than calibration [57]. Finally, we consider the use of geo-tagged photographs to augment the verification process of remotely sensed products. For example, photographs may provide additional context when checking the quality of a classification and help to investigate areas of classification confusion.

3.2.1. Metadata Requirements of the Use Cases

Although the minimum data and quality requirements of the three use cases depend on the spatial resolution of the images and the nomenclature used, the first step of the analysis was to categorize the metadata listed in Table 2 as “essential”, “desirable” or “unnecessary” for each use case (see Section 4.1).

3.2.2. Analysis of Metadata

Although the scope of this paper is to qualitatively evaluate the metadata available for geo-tagged photographs against the use case requirements, an extra step was taken to count the tags and descriptions associated with the photographs. As discussed in the Introduction, a great deal of previous work has been based on the analysis of the tags so as to enhance the usability of the photographs. The results of the metadata analysis are presented in Section 4.2.

3.2.3. Analysis of Content Usability

The third part of the analysis considered the usefulness of the photographs based on their content, since the metadata are only one aspect of usefulness. Of the photographs available from the study area, 1000 were randomly selected from each of the three repositories and then interpreted by volunteers (who were among the authors of this paper) in terms of usefulness. Usefulness was defined as whether the photograph could be used to identify land cover from among nine basic land cover types: tree cover, shrub cover, grassland/herbaceous, cropland, wetland, artificial surfaces, bare rock/barren surface, snow/ice and water. These high-level land cover types are used by Geo-Wiki to collect land cover information [4] and are based on the land cover harmonization efforts of Herold et al. [58]. A simple interface was devised for the classification of the photographs, in which an answer of “Yes” indicated usefulness (i.e., only one land cover type could be clearly seen in the photograph), “Maybe” was used when more than one type could be identified, and “No” was used when no useful evidence of land cover was available. An initial test showed that there were difficulties in performing the described classification. After classifying 100 photographs each, the authors compared their experiences and devised a series of rules, which are listed in Table 3; the content evaluation was performed using this set of rules (see Section 4.3).
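As a rough illustration of the sampling and of the basic three-way decision described above, the following sketch draws a fixed-size random sample and maps the set of land cover types visible in a photograph to “Yes”, “Maybe” or “No”. The function names and the seed are assumptions made for this example, and the additional rules of Table 3 are not modelled.

```python
import random

# The nine basic land cover types named in the text.
LAND_COVER_TYPES = {
    "tree cover", "shrub cover", "grassland/herbaceous", "cropland",
    "wetland", "artificial surfaces", "bare rock/barren surface",
    "snow/ice", "water",
}


def draw_sample(photo_ids, n=1000, seed=0):
    """Draw n distinct photograph identifiers at random (seed is arbitrary)."""
    rng = random.Random(seed)
    return rng.sample(list(photo_ids), n)


def usefulness(visible_types):
    """Map the land cover types visible in a photograph to a usefulness class.

    Exactly one type  -> "Yes"
    More than one     -> "Maybe"
    None              -> "No"
    """
    visible = set(visible_types)
    unknown = visible - LAND_COVER_TYPES
    if unknown:
        raise ValueError(f"unknown land cover types: {sorted(unknown)}")
    if len(visible) == 1:
        return "Yes"
    if len(visible) > 1:
        return "Maybe"
    return "No"


sample_ids = draw_sample(range(5000), n=1000)
```

In the study the judgement of what is "clearly seen" was made by human interpreters; the function only encodes the mapping from that judgement to the three classes.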

4. Results

4.1. Metadata Requirements of the Use Cases

Table 4 categorizes the metadata from Table 2 into “essential”, “desirable” and “unnecessary” for the three use cases. It is clearly “essential” to have location information for all use cases, i.e., the photographs must be geo-tagged.

For use case 2 (LULC map validation), the date when the photograph was taken is also “essential” since validation has much more stringent requirements than calibration or verification. Thus for use cases 1 and 3, the date that the photograph was taken is “desirable” but not “essential” if the date uploaded is available. According to Antoniou et al. [11], who studied the time difference between capturing and uploading a geo-tagged photograph in Flickr and Geograph, a small percentage of photographs (8.4% for Flickr and 9.2% for Geograph) have a time difference greater than one year. Moreover, Büttner et al. [59] state that the yearly average change value of land cover in Europe is very small (around 0.23%) and therefore some temporal discrepancies may not affect the use of geo-tagged photographs for calibration and verification of LULC mapping. Regardless, some type of temporal information is necessary.
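The minimum requirements stated so far (location for every use case, the date taken for validation, and at least some temporal information otherwise) can be written as a simple check. This is a sketch under assumed field names (`lat`, `lon`, `date_taken`, `date_uploaded`), which are hypothetical and not taken from any repository's schema; it also computes the lag between capture and upload discussed above.

```python
from datetime import date


def meets_minimum_metadata(photo: dict, use_case: str) -> bool:
    """Check the "essential" metadata for one of the three use cases."""
    has_location = photo.get("lat") is not None and photo.get("lon") is not None
    if not has_location:
        return False  # location is essential for all use cases
    has_taken = photo.get("date_taken") is not None
    has_uploaded = photo.get("date_uploaded") is not None
    if use_case == "validation":
        return has_taken  # date taken is essential for validation
    if use_case in ("calibration", "verification"):
        return has_taken or has_uploaded  # some temporal information is needed
    raise ValueError(f"unknown use case: {use_case}")


def upload_lag_days(photo: dict):
    """Days between capture and upload, or None if either date is missing."""
    taken, uploaded = photo.get("date_taken"), photo.get("date_uploaded")
    if taken is None or uploaded is None:
        return None
    return (uploaded - taken).days


# Illustrative record (invented values).
photo = {"lat": 51.5, "lon": -0.1, "date_uploaded": date(2015, 5, 1)}
ok_calibration = meets_minimum_metadata(photo, "calibration")
ok_validation = meets_minimum_metadata(photo, "validation")
lag = upload_lag_days({**photo, "date_taken": date(2014, 1, 1)})
```

A lag above 365 days would place a photograph in the small group with more than one year between capture and upload.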

The majority of the metadata listed in Table 2 would be “desirable” but not “essential” for all the use cases. One exception might be the licensing of the photographs, which may not allow the creation of derivative products under a commercial licence; if the photographs were to be used for creating commercial map products, licensing information would be “essential”. Meaningful tags would be desirable, since LULC information could be extracted from them (e.g., the landscape-relevant tags that users can select for geo-tagged photographs uploaded to Geograph) or they could be used to exclude photographs that have no meaningful LULC content. A number of the “desirable” metadata could serve for assessing the positional quality of the photographs, i.e., information about the method of georeferencing and the type and accuracy of the GPS-enabled device. The orientation of the scene, tilt, offset, focal length, and reference length or area may also help in determining the extent of the land cover shown in the photograph rather than only using it as point-based information. Having photographs in multiple directions from the same location would also be highly desirable as this provides additional context regarding the homogeneity of the land cover at a given location. The tilt might also be useful for helping to exclude photographs that are not useful from a content perspective (e.g., photographs pointing toward the zenith that do not reflect the land cover or land use on the ground). Although rarely present, information about photographer behavior (types of photographs usually taken) could be useful for categorizing photographs by usefulness or for additional quality control, particularly for use case 2, but is otherwise deemed “unnecessary”. Weather information could be useful for some applications but is deemed “unnecessary” for the land cover use cases.

The three use cases differ not so much in their metadata requirements as in the number of photographs available and their spatial distribution. For training, photographs are mainly needed in locations that are representative of LULC classes, while for validation purposes, a representative sample of the population under analysis should be used, which would be based on the LULC map to be validated. The sample units may be either points or areas and several sampling strategies may be considered for this purpose [57]. Regarding the third use case, since the photographs are only used to assist in the verification process, they are only required in locations where there is uncertainty in the classification or validation. Therefore the requirements regarding the spatial distribution are different from the two previous use cases. Since the aim of this paper is to analyze the usefulness of the photographs in terms of metadata and content, consideration of the spatial distribution of the photographs is not developed further in this study.

4.2. Analysis of Metadata

The 3000 photographs that were extracted randomly from the bounding box for the usability assessment (see Section 3.2.3) were used in the analysis of the metadata. Despite this small sample, it is nevertheless interesting to examine how productive the contributors were in terms of the number of tags found in Geograph, Flickr and Panoramio and the number of words in the descriptions and titles associated with photographs in Geograph and Flickr, respectively. These different metadata elements contribute towards better documentation of the photograph’s content, although considerable variations exist. Table 5 provides the mean, median, standard deviation, minimum, maximum and total number of tags and words in descriptions and titles as well as the number of photographs that had tags, descriptions and titles. The photographs in Panoramio have no separate description or title in the metadata. Flickr has the highest mean number of tags, followed by Panoramio and Geograph, with one photograph in Flickr having as many as 60 tags. Of the 1000 photographs analyzed for each source, the number of photographs with tags was highest for Panoramio and lowest for Geograph. Examining the results for the descriptions and titles, Geograph had longer titles on average than Flickr, which indicates a greater potential information content, while titles were present in almost all of the photographs in Flickr (i.e., 927 out of 1000 photographs) compared to Geograph, where just over two-thirds had titles.
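Summary statistics of the kind reported in Table 5 could be computed along the following lines. This is a sketch, not the authors' procedure: it assumes one count per photograph with zero for photographs lacking the element, and it uses the sample standard deviation, neither of which is stated in the text. The input values are invented.

```python
import statistics


def summarize(counts):
    """Summarize per-photograph counts (e.g., tags, or words in a title).

    counts: one non-negative integer per photograph, 0 when the element
    is absent (an assumption; the text does not say whether photographs
    without the element were included in the statistics).
    """
    return {
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "stdev": statistics.stdev(counts),  # sample standard deviation
        "min": min(counts),
        "max": max(counts),
        "total": sum(counts),
        "n_with_element": sum(1 for c in counts if c > 0),
    }


# Illustrative tag counts for five photographs (not data from the study).
tags_per_photo = [0, 3, 5, 0, 12]
summary = summarize(tags_per_photo)
```

Running the same function separately on tags, description word counts and title word counts for each repository would yield one row of statistics per element and source.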

4.3. Usability of Photographs Based on Content Analysis

The content of the 3000 photographs downloaded from Flickr, Panoramio and Geograph was analyzed independently by seven volunteers and classified, as explained in Section 3.2.3, into the classes “Yes”, “Maybe” and “No” with regard to the usefulness of the content for land cover applications.

From the results obtained, the first aspect analyzed was the variability of the answers given by the volunteers. Figure 2 shows the percentage of photographs where all volunteers chose the same class (“Yes”, “Maybe” or “No”), the percentage of photographs where the volunteers chose two different classes, and the percentage of photographs to which all three classes were assigned by the different volunteers.
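The three percentages plotted in Figure 2 depend only on how many distinct classes each photograph received. A minimal sketch of that computation follows, using a hypothetical mapping from photograph identifier to the list of volunteer labels; the example labels are invented.

```python
from collections import Counter


def agreement_profile(labels_by_photo):
    """Percentage of photographs that received 1, 2 or 3 distinct classes.

    labels_by_photo: dict mapping a photograph id to the list of labels
    ("Yes", "Maybe" or "No") assigned by the volunteers.
    """
    total = len(labels_by_photo)
    if total == 0:
        raise ValueError("no photographs to analyze")
    n_distinct = Counter(len(set(labels)) for labels in labels_by_photo.values())
    return {k: 100.0 * n_distinct.get(k, 0) / total for k in (1, 2, 3)}


# Illustrative labels from three volunteers for four photographs.
labels = {
    "p1": ["Yes", "Yes", "Yes"],     # full agreement
    "p2": ["Yes", "Maybe", "Yes"],   # two classes
    "p3": ["Yes", "Maybe", "No"],    # all three classes
    "p4": ["No", "No", "No"],        # full agreement
}
profile = agreement_profile(labels)
```

Key 1 corresponds to photographs on which all volunteers agreed, key 2 to photographs with two different classes, and key 3 to photographs that received all three classes.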

Due to the large variability in the results, a more detailed analysis of the variation of outputs was undertaken. Figure 3 shows the results obtained for the seven volunteers, indicated as Vi (i = 1, …, 7), for the three initiatives. The results show that, in spite of the rules used, some volunteers determined whether the photographs were useful for land cover in a considerably different way than the majority. This shows that even though rules were developed to decrease the variability and subjectivity of the classification process, subjectivity may still have an important influence, and that it is therefore useful to have multiple classifications undertaken by different individuals. To exclude the effect of the volunteers whose outputs were considerably different, the mean and the standard deviation of the number of photographs assigned to each of the three classes were computed across all volunteers, and the volunteers whose per-class counts were larger than the mean plus one standard deviation or smaller than the mean minus one standard deviation were excluded from the subsequent analysis. This procedure excluded volunteers V4 and V6 (Figure 3).
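The exclusion rule can be sketched as below. The text leaves two details open, so both are treated as assumptions here: whether a volunteer is excluded when the count for any one class falls outside the band or only when all classes do (exposed as the `mode` parameter, defaulting to "any"), and whether the sample or population standard deviation was used (the sample form is used here). The counts are invented and do not reproduce Figure 3.

```python
import statistics

CLASSES = ("Yes", "Maybe", "No")


def outlying_volunteers(counts, mode="any"):
    """Flag volunteers whose class counts fall outside mean +/- one std.

    counts: dict volunteer -> dict class -> number of photographs assigned.
    mode:   "any" flags a volunteer if any class is outside the band,
            "all" only if every class is outside (both are assumptions).
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    bounds = {}
    for cls in CLASSES:
        values = [c[cls] for c in counts.values()]
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation (assumed)
        bounds[cls] = (mean - sd, mean + sd)
    combine = any if mode == "any" else all
    flagged = []
    for volunteer, c in counts.items():
        outside = [not (bounds[cls][0] <= c[cls] <= bounds[cls][1])
                   for cls in CLASSES]
        if combine(outside):
            flagged.append(volunteer)
    return flagged


# Illustrative counts for five hypothetical volunteers.
example = {
    "V1": {"Yes": 600, "Maybe": 200, "No": 200},
    "V2": {"Yes": 610, "Maybe": 190, "No": 200},
    "V3": {"Yes": 590, "Maybe": 210, "No": 200},
    "V4": {"Yes": 300, "Maybe": 500, "No": 200},
    "V5": {"Yes": 600, "Maybe": 200, "No": 200},
}
flagged_any = outlying_volunteers(example, mode="any")
flagged_all = outlying_volunteers(example, mode="all")
```

With a band of only one standard deviation, the "any" reading can flag a large share of a small group, which is one reason the choice between the two readings matters when reproducing the procedure.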

Figure 4 shows the variability of the results obtained with the remaining five volunteers. It can be seen that the variability decreased for all initiatives. The percentage of photographs to which three classes were assigned decreased to only 3% for Flickr and Geograph and 5% for Panoramio.

The mean number of photographs assigned to the classes “Yes”, “Maybe” or “No” is shown in Figure 5 and the standard deviation in Figure 6. The results show that for all initiatives, more than 50% of the photographs are considered useful, and in the case of Geograph, this figure rises to 72%. This is similar to the findings of Leung and Newsam when they used both Flickr and Geograph for land cover classification [33,34]. The mean percentage of photographs considered not useful is below 20% for Panoramio and Geograph, but is close to 40% for Flickr. The mean percentage of photographs that were considered maybe useful is 9% for Flickr, 16% for Geograph and 28% for Panoramio. The largest standard deviation was obtained for Flickr and the class “No” (a value of 46), while the smallest was also obtained for Flickr, but for the class “Yes”.

5. Discussion

This paper explores the usability of geo-tagged photographs as sources of land cover data through the examination of the metadata available and the content in the photographs. While there are distinct differences between the three photograph repositories considered here, e.g., in the data collected and the scope (i.e., Flickr is a photograph sharing and social networking site, Panoramio is a landscape photograph sharing site while the aim of Geograph is landscape documentation), there is clearly potential for using geo-tagged photographs for the three identified land cover use cases.

With respect to the metadata analysis, the land cover use cases selected (i.e., calibration, validation and verification of LC maps) need only two metadata elements (namely location and the date taken) as a minimum. This is important as it means that many well-known photograph-sharing repositories can be potential sources of input data to LULC mapping. This is further enhanced by the automatic recording of a number of metadata elements by the applications using their own built-in protocols (e.g., username, date submitted, filetype, etc.), which are then publicly shared through their respective APIs. For digital photographs, the EXIF header, which contains various metadata elements that are common to all modern photo-capturing devices, is of particular interest. While the consistent presence of an EXIF header could provide a rich source of information, since it can cover many of the metadata elements listed in Table 2, each website has a different policy on how to handle, store and share these data. These automatically captured metadata elements are consistently recorded, in contrast to the user-dependent elements such as tags and descriptions. Yet, these latter metadata elements could be extremely helpful for further automated analyses of the photographs. The presence of meaningful tags and comprehensive descriptions could be used to filter out noise and thus concentrate on those photographs that are relevant to the use case in question; see e.g., [17,18,19,20,21].

Examining the content of the geo-tagged photographs from a land cover point of view, it is interesting to note that more than 71% of the photographs retrieved from Geograph (Figure 5), which aims to document landscapes, can be useful for land cover mapping, compared to Flickr at 52%. Similarly, Leung and Newsam [33,34] found that Geograph was more useful for land cover mapping than Flickr. On the other hand, Flickr is the photograph sharing site with the most photographs in the study area because it leverages the social networking factor. Thus, existing and future landscape initiatives should combine the strengths of both sites, i.e., attract the highest possible number of contributions while at the same time guiding volunteers to follow a set of simple protocols that will make these geo-tagged photographs as useful as possible for LULC applications. Gamification techniques (e.g., through applications such as FotoQuest Austria that have built-in protocols [60]) might also help volunteers provide more useful metadata. Although the 71% usability achieved by Geograph might seem high, there is room for improvement, especially given that the primary aim of the website is to describe the landscape. Thus, apart from the quantity of photographs available, it is equally important to improve the quality of the metadata and the content of the photographs.

To this end, we provide a set of suggestions that equally target contributors to geo-tagged photograph repositories and administrators of LULC applications, which could take the form of protocols that need to be adopted if these photographs are to be useful for future land-based applications. The introduction of protocols would be similar to that found in many successful citizen science projects in the areas of conservation and biodiversity [61,62]. An important point here is that these protocols should not add unnecessary burden to the process or deter contributors from uploading images or diminish their spontaneous behavior while doing so. These protocols are organized into two main areas (i.e., improvements to the metadata collected and improvements to the content of the photographs).

Although different types of metadata are available, which vary across different photograph repositories (and even within each repository), it would be useful to standardize the choice of tags available and require minimum tagging of landscape photographs with LULC elements. A useful starting point would be the EAGLE framework [63], which breaks land cover and land use down into basic components. The framework has been developed for harmonizing classes of different land cover and land use products but could also provide a basic set of tags for geo-tagged photographs. Another “desirable” metadata element that should become “essential” is the GPS accuracy of the device. This provides critical information on the spatial accuracy of the photographs.

Regarding the content, in the context of environmental studies, photographs should be treated like measurements. Usually in crowdsourced or citizen science projects, photographs have simply been used as another way to document a measurement or an observation of a phenomenon. Now the focus should be on the photographs themselves, and the contributors should use other metadata elements (such as titles, tags and descriptions) to document them. In that sense, what is portrayed in the photograph plays a very important role. One step to increase the usability of the photographs for LULC studies is to ensure that enough ground is captured to enable its unequivocal categorization. To this end, photographers should refrain from taking photos from positions or under circumstances that degrade the quality of the content. For example, photographs taken from buildings or with increased zoom level create a false impression and discrepancies in terms of the location data (especially when there is no relevant information available in the metadata). Also, provision should be made so that there are no discrepancies between the LULC shown in the picture and the standing position of the photographer. Moreover, photographs in which various LULC classes are mixed (i.e., heterogeneous landscapes) should be avoided, because it is difficult to infer from them what the real situation on the ground is. This could be solved by capturing multiple photographs from the same point towards known (e.g., cardinal) directions. In any case, an important aspect to consider when extracting LULC classes from the photographs, either for training or validation, is the minimum mapping unit (MMU) of the map to be created or validated, because that will determine the relevance of considering objects or characteristics shown in the photographs for the classification. For example, if a small group of trees can be identified within a region of buildings and impervious surfaces, the need to report the trees as an independent class will depend on the MMU under consideration. If the desired MMU is identified prior to the collection of photographs, then it can also be taken into consideration when the photographs are captured, in the sense that they can show only what is relevant and/or be taken in a way that makes clear whether certain characteristics are different from the surrounding region. The area captured by the photograph should also, whenever possible, depend on the MMU. If the MMU is in the order of some hundreds of meters, showing only what is present within a few meters will not provide appropriate or reliable information.
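The dependence on the MMU can be made concrete with a small sketch. It assumes, purely for illustration, that the MMU is expressed as an area threshold and that approximate patch areas are known. The function name, the area values and the "tree cover" label are hypothetical, and the rule is a common simplification, not a procedure used in this study.

```python
def classes_to_report(patch_areas_m2: dict, mmu_m2: float) -> list:
    """Return the classes whose patch area reaches the MMU.

    If no patch reaches the MMU, fall back to the dominant (largest) class.
    An empty input yields an empty list.
    """
    if not patch_areas_m2:
        return []
    kept = [cls for cls, area in patch_areas_m2.items() if area >= mmu_m2]
    if kept:
        return sorted(kept)
    # Nothing is large enough on its own: report only the dominant class.
    return [max(patch_areas_m2, key=patch_areas_m2.get)]


# A small group of trees inside a built-up region (made-up areas).
patches = {"artificial surface": 9000.0, "tree cover": 400.0}

fine_mmu = classes_to_report(patches, mmu_m2=100.0)
coarse_mmu = classes_to_report(patches, mmu_m2=2500.0)
```

With the fine MMU both classes are reported, whereas with the coarse MMU the trees are absorbed into the surrounding "artificial surface".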

In our study, while the “Yes” and “No” categories clearly measure the usability of each photograph’s content, the “Maybe” category remains vague. In this category were photographs that either showed more than one class or it was unclear where the photographer was standing relative to water bodies. For the latter case, a topological check using vector VGI data (e.g., OSM) can disambiguate these cases and be used to assign the photographs to a single class. In fact, the use of additional sources (e.g., OSM, Google Maps, etc.) during the visual evaluation of photographs can help disambiguate a number of cases. Finally, in line with crowdsourcing principles, the use of multiple interpreters for the photographs can increase the overall accuracy of the classification. Indeed the use of multiple interpreters may be useful in enabling refined estimation, as well as the provision of information on the quality of the labelling from each individual interpreter [64].
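Combining the labels of several interpreters can be sketched as a simple majority vote. This is an illustrative assumption rather than the method of [64], which estimates interpreter quality more formally; the function name and the tie-handling rule are hypothetical.

```python
from collections import Counter


def consensus_label(labels: list) -> tuple:
    """Return (majority label or None on a tie, number of distinct labels)."""
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    (top, n_top), *rest = counts.most_common()
    if rest and rest[0][1] == n_top:
        # Two or more labels share the highest count: no consensus.
        return None, len(counts)
    return top, len(counts)


# Labels from five hypothetical interpreters for one photograph.
label, n_distinct = consensus_label(["Yes", "Yes", "Maybe", "Yes", "No"])
tie_label, tie_distinct = consensus_label(["Yes", "Maybe"])
```

The count of distinct labels corresponds to the quantity summarized in Figures 2 and 4 (one, two or three classes assigned to a photograph), and a tie could be used to flag a photograph for further review.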

6. Conclusions

Land use and land cover mapping programs need novel and flexible methods and approaches for collecting a large number of in situ observations (i.e., ground-based observations). While this is still a challenging task, especially in the context of environmental studies, the emergence of Web 2.0 and crowdsourcing has created huge volumes of data that could provide the breadth of information needed for this purpose. The availability of online data repositories, both explicit (e.g., OSM) and implicit (e.g., Flickr), novel geographic information retrieval (GIR) methods and the phenomenon of VGI offer a promising way forward despite the existing biases and caveats that accompany such datasets. The integration of citizens and crowdsourced data into what have, up to now, been authoritative methods for data collection can help reduce costs, increase accuracy, reinforce existing environmental monitoring systems and improve the periodicity of land-monitoring products.

The research presented in this article examines whether the photographs extracted from Panoramio, Flickr and Geograph for an area located in the London region may be suitable for extracting LULC information for three possible use cases, namely training classifiers, LULC map validation (provided that the minimum metadata requirements are met and the sampling design is satisfied, which is a different problem and outside the scope of this paper) and qualitatively checking confusion that arises from the classification process. The types of metadata necessary for each use case were analyzed and compared to the metadata available for the photographs downloaded from the three initiatives considered. The content of the photographs was analyzed to determine if they actually provided any information about LULC. Since this aspect may vary with the type of LULC map to be generated, and in particular with the considered nomenclature and MMU, we adopted the nomenclature used in the Geo-Wiki project and a resolution of about 10 m.

The results show that, for the study area considered, more than half of the photographs collected from the three photograph repositories may be useful for extracting LULC information. Geograph was the initiative showing the best results, with a mean of only 12% of the photographs considered as unusable. Flickr was the site showing the worst results (with 40% of the photographs considered as unusable). Moreover, most of the Flickr photographs that were classified as useful were taken inside buildings, which, when considering the nomenclature indicated in Section 3.2.3, correspond to the class “artificial surface”. For a different nomenclature or MMU, however, these photographs might not provide reliable information, since they do not provide a sufficient representation of different land cover classes.

In this research the geo-tagged photographs were analyzed per se, without considering their geographic location. In the case of “Yes”, we assumed the area of view was homogeneous in terms of land cover, which compensates for any positional accuracy errors. In some cases the class “Maybe” was assigned because, even though it was clear where the photograph was taken from, such as a path in a park, there were other classes in the immediate vicinity, which made it difficult to be sure that the geolocation of the photograph would be exactly on the path and not on the class nearby, especially if GNSS positioning was used to georeference the photograph. To decrease this type of difficulty, it is advised to have additional information to help classify the photograph (e.g., location on a satellite image or an OSM map).

The results show that there is indeed a lot of useful information in the collected photographs. However, the extraction of the information from the photographs was time consuming and in some cases difficult and subjective, even when rules were used to make the classification easier for the interpreter. Therefore, the use of automatic methods, at least to exclude unusable photographs, would be an area for future work. Moreover, Geograph is currently limited to Great Britain and Ireland, although initiatives have begun in Germany and Corsica, while Flickr and Panoramio are available globally.

Another area of future work concerns the open research and methodological questions of how the “desirable” metadata elements can be turned into required data that provide additional valuable LULC information. In other words, environmental studies could gain much from concrete methods that leverage data such as direction, orientation, tilt, etc., so as to provide a more accurate description of a photograph’s content.

Acknowledgments

The authors would like to acknowledge the support and contribution of COST Actions TD1202 “Mapping and Citizen Sensor” http://www.citizen-sensor-cost.eu and IC1203 “ENERGIC” http://vgibox.eu/, the ERC CrowdLand project (No. 617754) and FCT (project UID/MULTI/00308/2013).

Author Contributions

Vyron Antoniou, Cidália Costa Fonte, Linda See, Jacinto Estima, Jamal Jokar Arsanjani, Flavio Lupia and Marco Minghini wrote the paper and determined the usability of the photographs for land cover classification. Vyron Antoniou, Cidália Costa Fonte and Jacinto Estima extracted the information from Flickr, Panoramio and Geograph and analyzed the data. Giles Foody and Steffen Fritz provided useful ideas during the development of the paper and helpful comments to improve the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

API	Application Programming Interface
CLC	CORINE land cover
DCP	Degree Confluence Project
IPTC	International Press Telecommunications Council
LULC	Land use land cover
MMU	Minimum Mapping Unit
OGC	Open Geospatial Consortium
OSM	OpenStreetMap
VGI	Volunteered Geographic Information
W3C	World Wide Web Consortium

References

1. Goodchild, M.F. Citizens as sensors: The world of volunteered geography. GeoJournal 2007, 69, 211–221.
2. Jokar Arsanjani, J.; Zipf, A.; Mooney, P.; Helbich, M. An introduction to OpenStreetMap in Geographic Information Science: Experiences, research, and applications. In OpenStreetMap in GIScience; Jokar Arsanjani, J., Zipf, A., Mooney, P., Helbich, M., Eds.; Lecture Notes in Geoinformation and Cartography; Springer International Publishing: Berlin, Germany, 2015; pp. 1–18.
3. Goodchild, M.F.; Glennon, J.A. Crowdsourcing geographic information for disaster response: A research frontier. Int. J. Digit. Earth 2010, 3, 231–241.
4. Fritz, S.; McCallum, I.; Schill, C.; Perger, C.; See, L.; Schepaschenko, D.; van der Velde, M.; Kraxner, F.; Obersteiner, M. Geo-Wiki: An online platform for improving global land cover. Environ. Model. Softw. 2012, 31, 110–123.
5. Connors, J.P.; Lei, S.; Kelly, M. Citizen science in the age of neogeography: Utilizing volunteered geographic information for environmental monitoring. Ann. Assoc. Am. Geogr. 2012, 102, 1267–1289.
6. Ciepłuch, B.; Jacob, R.; Mooney, P.; Winstanley, A. Comparison of the accuracy of OpenStreetMap for Ireland with Google Maps and Bing Maps. In Proceedings of the Ninth International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Leicester, UK, 20–23 July 2010.
7. Girres, J.-F.; Touya, G. Quality assessment of the French OpenStreetMap dataset. Trans. GIS 2010, 14, 435–459.
8. Haklay, M. How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. Environ. Plan. B Plan. Des. 2010, 37, 682–703.
9. Jackson, S.; Mullen, W.; Agouris, P.; Crooks, A.; Croitoru, A.; Stefanidis, A. Assessing completeness and spatial error of features in Volunteered Geographic Information. ISPRS Int. J. Geo-Inform. 2013, 2, 507–530.
10. Brovelli, M.A.; Minghini, M.; Molinari, M.; Mooney, P. Towards an automated comparison of OpenStreetMap with authoritative road datasets. Trans. GIS 2016.
11. Antoniou, V.; Morley, J.; Haklay, M. Web 2.0 geotagged photos: Assessing the spatial dimension of the phenomenon. Geomatica 2010, 64, 99–110.
12. Panoramio Panorank: The Panoramio Rankings Web. Available online: http://www.panorank.com/ (accessed on 13 December 2015).
13. Statistics Brain: Instagram Company Statistics. Available online: http://www.statisticbrain.com/instagram-company-statistics (accessed on 13 December 2015).
14. Franck, M. How many photos are uploaded to Flickr every day, month, year? Available online: https://www.flickr.com/photos/franckmichel/6855169886 (accessed on 15 December 2015).
15. Kisilevich, S.; Krstajic, M.; Keim, D.; Andrienko, N.; Andrienko, G. Event-based analysis of people’s activities and behavior using Flickr and Panoramio geotagged photo collections. In Proceedings of the 14th International Conference on Information Visualisation (IV), London, UK, 26–29 July 2010; pp. 289–296.
16. Andrienko, G.; Andrienko, N.; Bak, P.; Kisilevich, S.; Keim, D. Analysis of community-contributed space- and time-referenced data (example of Flickr and Panoramio photos). In Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (VAST 2009), Atlantic City, NJ, USA, 12–13 October 2009; pp. 213–214.
17. Vu, H.Q.; Li, G.; Law, R.; Ye, B.H. Exploring the travel behaviors of inbound tourists to Hong Kong using geotagged photos. Tour. Manag. 2015, 46, 222–232.
18. Bermingham, L.; Lee, I. Spatio-temporal sequential pattern mining for tourism sciences. Procedia Comput. Sci. 2014, 29, 379–389.
19. Kisilevich, S.; Keim, D.; Andrienko, N.; Andrienko, G. Towards acquisition of semantics of places and events by multi-perspective analysis of geotagged photo collections. In Geospatial Visualisation; Moore, A., Drecki, I., Eds.; Lecture Notes in Geoinformation and Cartography; Springer: Heidelberg, Germany, 2012; pp. 211–233.
20. Lee, I.; Cai, G.; Lee, K. Exploration of geo-tagged photos through data mining approaches. Exp. Syst. Appl. 2014, 41, 397–405.
21. Majid, A.; Chen, L.; Chen, G.; Mirza, H.T.; Hussain, I.; Woodward, J. A context-aware personalized travel recommendation system based on geotagged social media data mining. Int. J. Geogr. Inform. Sci. 2013, 27, 662–684.
22. Rattenbury, T.; Good, N.; Naaman, M. Towards automatic extraction of event and place semantics from flickr tags. In Proceedings of the 30th Annual International SIGIR Conference (SIGIR ’07), Amsterdam, Netherlands, 23–27 July 2007; ACM Press: New York, NY, USA, 2007; pp. 103–110.
23. Chen, L.; Roy, A. Event detection from flickr data through wavelet-based spatial analysis. In Proceedings of the 18th ACM conference on Information and knowledge management (CIKM ’09), Hong Kong, China, 2–6 November 2009; pp. 523–532.
24. Brenner, M.; Izquierdo, E. Social event detection and retrieval in collaborative photo collections. In Proceedings of the 2nd ACM International Conference on Multimedia Retrieval (ICMR ’12), Hong Kong, China, 5–8 June 2012; p. 21.
25. Sun, Y.; Fan, H. Event identification from georeferenced images. In Connecting A Digital Europe through Location and Place; Huerta, J., Schade, S., Granell, C., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 73–88.
26. Hu, Y.; Gao, S.; Janowicz, K.; Yu, B.; Li, W.; Prasad, S. Extracting and understanding urban areas of interest using geotagged photos. Comput. Environ. Urban Syst. 2015, 54, 240–254.
27. Jankowski, P.; Andrienko, N.; Andrienko, G.; Kisilevich, S. Discovering landmark preferences and movement patterns from photo postings. Trans. GIS 2010, 14, 833–852.
28. Snavely, N.; Seitz, S.M.; Szeliski, R. Modeling the world from internet photo collections. Int. J. Comput. Vis. 2007, 80, 189–210.
29. Fonte, C.C.; Bastin, L.; See, L.; Foody, G.; Lupia, F. Usability of VGI for validation of land cover maps. Int. J. Geogr. Inform. Sci. 2015, 29, 1–23.
30. Li, J.; Qin, Q.; Han, J.; Tang, L.-A.; Lei, K.H. Mining trajectory data and geotagged data in social media for road map inference: Mining social media for road map inference. Trans. GIS 2015, 19, 1–18.
31. Estima, J.; Fonte, C.C.; Painho, M. Comparative study of Land Use/Cover classification using Flickr photos, satellite imagery and Corine Land Cover database. In Proceedings of the 17th AGILE International Conference on Geographic Information Science, Castellon, Spain, 1–6 June 2014.
32. Estima, J.; Painho, M. Photo based Volunteered Geographic Information initiatives: A comparative study of their suitability for helping quality control of Corine Land Cover. Int. J. Agric. Environ. Inform. Syst. 2014, 5, 73–89.
33. Leung, D.; Newsam, S. Proximate sensing: Inferring what-is-where from georeferenced photo collections. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010.
34. Leung, D.; Newsam, S. Exploring geotagged images for land-use classification. In Proceedings of the ACM multimedia 2012 workshop on Geotagging and its applications in multimedia (GeoMM ’12), Nara, Japan, 29 October–2 November 2012; ACM Press: New York, NY, USA, 2012; pp. 3–8.
35. Leung, D.; Newsam, S. Land cover classification using geo-referenced photos. Multimed. Tools Appl. 2014, 74, 1–21.
36. Tsendbazar, N.E.; de Bruin, S.; Herold, M. Assessing global land cover reference datasets for different user communities. ISPRS J. Photogramm. Remote Sens. 2015, 103, 93–114.
37. Bicheron, P.; Defourny, P.; Brockmann, C.; Schouten, L.; Vancutsem, C.; Huc, M.; Bontemps, S.; Leroy, M.; Achard, F.; Herold, M.; et al. Globcover: Products Description and Validation Report. Available online: http://postel.obs-mip.fr/IMG/pdf/GLOBCOVER_Products_Description_Validation_Report_I2.1.pdf (accessed on 10 December 2015).
38. Bontemps, S.; Defourny, P.; van Bogaert, E.; Arino, O.; Kalogirou, V.; Perez, J.R. GLOBCOVER 2009: Products Description and Validation Report 2011. Available online: http://due.esrin.esa.int/files/GLOBCOVER2009_Validation_Report_2.2.pdf (accessed on 10 December 2015).
39. Tateishi, R.; Uriyangqai, B.; Al-Bilbisi, H.; Ghar, M.A.; Tsend-Ayush, J.; Kobayashi, T.; Kasimu, A.; Hoan, N.T.; Shalaby, A.; Alsaaideh, B.; et al. Production of global land cover data—GLCNMO. Int. J. Digit. Earth 2011, 4, 22–49.
40. Foody, G.M.; Boyd, D.S. Using volunteered data in land cover map validation: Mapping tropical forests across West Africa. In Proceedings of the 2012 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Munich, Germany, 22–27 July 2012; pp. 6207–6208.
41. Iwao, K.; Nishida, K.; Kinoshita, T.; Yamagata, Y. Validating land cover maps with Degree Confluence Project information. Geophys. Res. Lett. 2006, 33, L23404.
42. Iwao, K.; Nasahara, K.N.; Kinoshita, T.; Yamagata, Y.; Patton, D.; Tsuchida, S. Creation of new global land cover map with map integration. J. Geogr. Inform. Syst. 2011, 3, 160–165.
43. Kinley, L. Assessing the potential for crowdsourced geospatial content to enhance the quality of authoritative land cover mapping. In Proceedings of the AGI GeoCommunity’13 Open for Business, Nottingham, UK, 17–18 September 2013.
44. Purves, R.S.; Edwardes, A.; Fan, X.; Hall, M.; Tomko, M. Automatically generating keywords for georeferenced images. In Proceedings of GISRUK’2008, Manchester, UK, 2–4 April 2008.
45. Hecht, B.; Stephens, M. A tale of cities: Urban biases in volunteered geographic information. In Proceedings of the ICWSM 2014, Ann Arbor, MI, USA, 1–4 June 2014.
46. Bishr, M.; Kuhn, W. Geospatial information bottom-up: A matter of trust and semantics. In The European Information Society; Fabrikant, S.I., Wachowicz, M., Eds.; Springer: Heidelberg, Germany, 2007; pp. 365–387.
47. Flanagin, A.; Metzger, M. The credibility of volunteered geographic information. GeoJournal 2008, 72, 137–148.
48. Hochmair, H.H.; Zielstra, D. Positional accuracy of Flickr and Panoramio images in Europe. In Proceedings of the Geoinformatics Forum, Salzburg, Austria, 3–6 June 2012; pp. 14–23.
49. Zielstra, D.; Hochmair, H.H. Positional accuracy analysis of Flickr and Panoramio images for selected world regions. J. Spat. Sci. 2013, 58, 251–273.
50. Scassa, T. Legal issues with volunteered geographic information. Can. Geogr./Le Géogr. Can. 2013, 57, 1–10.
51. Cho, G. Some legal concerns with the use of crowd-sourced Geospatial Information. IOP Conf. Ser. Earth Environ. Sci. 2014, 20, 012040.
52. Munson, M.A.; Caruana, R.; Fink, D.; Hochachka, W.M.; Iliff, M.; Rosenberg, K.V.; Sheldon, D.; Sullivan, B.L.; Wood, C.; Kelling, S. A method for measuring the relative information content of data from different monitoring protocols: Measuring relative data quality. Methods Ecol. Evol. 2010, 1, 263–273.
53. The Open Geospatial Consortium (OGC): W3C Spatial Data on the Web Use Cases & Requirements. Available online: https://www.w3.org/TR/sdw-ucr/ (accessed on 19 December 2015).
54. Eurostat LUCAS 2015 (Land Use/Cover Area Frame Survey). Technical Reference Document C1 Instructions for Surveyors. Available online: http://ec.europa.eu/eurostat/documents/205002/6786255/LUCAS2015-C1-Instructions-20150227.pdf/bbc63453-568f-44fc-a149-8ef6b04626d7 (accessed on 19 December 2015).
55. Eurostat LUCAS 2015 (Land Use/Cover Area Frame Survey). Technical Reference Document C3 Classification (Land Cover & Land Use). Available online: http://ec.europa.eu/eurostat/documents/205002/6786255/LUCAS2015-C3-Classification-20150227.pdf/969ca853-e325-48b3-9d59-7e86023b2b27 (accessed on 19 December 2015).
56. International Press Telecommunications Council: Social Media Sites Photo Metadata Test Results. Available online: http://www.embeddedmetadata.org/social-media-test-results.php (accessed on 11 December 2015).
57. Stehman, S.V. Sampling designs for accuracy assessment of land cover. Int. J. Remote Sens. 2009, 30, 5243–5272.
58. Herold, M.; Mayaux, P.; Woodcock, C.E.; Baccini, A.; Schmullius, C. Some challenges in global land cover mapping: An assessment of agreement and accuracy in existing 1 km datasets. Remote Sens. Environ. 2008, 112, 2538–2556.
59. Büttner, G.; Kosztra, B.; Maucha, G.; Pataki, R. Implementation and Achievements of CLC2006; European Environment Agency: Copenhagen, Denmark, 2012.
60. McCallum, I.; See, L.; Sturn, T.; Salk, C.; Perger, C.; Duerauer, M.; Karner, M.; Moorthy, I.; Domian, D.; Fritz, S. Engaging citizens in environmental monitoring via gaming. In Proceedings of ENVIP, Barcelona, Spain, 28–30 October 2015.
61. Wiggins, A.; Newman, G.; Stevenson, R.D.; Crowston, K. Mechanisms for data quality and validation in citizen science. In Proceedings of the IEEE Seventh International Conference on e-Science Workshops (eScienceW), Stockholm, Sweden, 5–8 December 2011; pp. 14–19.
62. Sheppard, S.A.; Terveen, L. Quality is a verb: The operationalization of data quality in a citizen science community. In Proceedings of the 7th International Symposium on Wikis and Open Collaboration, Mountain View, CA, USA, 3–5 October 2011; ACM Press: New York, NY, USA, 2011; pp. 29–38.
63. Arnold, S.; Kosztra, B.; Banko, G.; Smith, G.; Hazeu, G.; Bock, M.; Valcarcel Sanz, N. The EAGLE concept—A vision of a future European Land Monitoring Framework. In Proceedings 33th EARSeL Symposium towards Horizon 2020, Matera, Italy, 3–6 June 2013; pp. 551–568.
64. Foody, G.M.; See, L.; Fritz, S.; Van der Velde, M.; Perger, C.; Schill, C.; Boyd, D.S. Assessing the accuracy of volunteered geographic information arising from multiple contributors to an internet based collaborative project. Trans. GIS 2013, 17, 847–860.

Figure 1. (a) OpenStreetMap, credit © OpenStreetMap contributors and (b) satellite image (credit © ESRI, provided base map) of the study area for examination of geo-tagged photographs from Flickr, Panoramio and Geograph.

Figure 2. Percentage of photographs to which one class, two classes or three classes were assigned by the seven volunteers, considering the classes “Yes”, “Maybe” or “No”, indicating whether the photographs are useful for extracting land cover information.

Figure 3. Number of photographs assigned to each class “Yes”, “Maybe” or “No” by the seven volunteers Vi (i = 1, …, 7) for (a) Panoramio; (b) Flickr and (c) Geograph.

Figure 4. Percentage of photographs to which one class, two classes or three classes were assigned by the five volunteers remaining after outliers were excluded from the analysis, considering the classes “Yes”, “Maybe” or “No”, indicating whether the photographs are useful for extracting land cover information.

Figure 5. Mean number of photographs assigned to the classes “Yes”, “Maybe” and “No” by the five volunteers.

Figure 6. Standard deviation of number of photographs assigned to the classes “Yes”, “Maybe” and “No” by the five volunteers.

Table 1. Protocols associated with different sites where photographs are shared/uploaded.

Primary Aim	Site	Protocols
Social networking/sharing of all kinds of information	Facebook/Google+	Minimum: None. Optional: Tag friends; add comments; add location automatically from the photographs, if present.
	Foursquare	Minimum: None other than the photograph must be linked to one of four objects, e.g., the venue or a tip provided by the venue. Optional: Coordinates (and accuracy), altitude (and vertical accuracy), up to 200 characters of text to accompany the photograph.
	Pinterest	Minimum: Need to upload photographs to a “board” and add a description to the photograph. Optional: Add location on a map; tag friends.
	Twitter	Minimum: Text from the tweet, which might describe the photograph. Optional: Tag users, enhance and filter photographs, location.
Photograph sharing sites	Flickr	Minimum: None. Optional: Title, description, tags, location. Assign photograph to a Flickr Group where there is internal moderation of the photographs’ theme.
	Instagram	Minimum: None but can only upload from mobile devices; EXIF data are removed from the photographs before upload unless saved to a person’s Photo Map. Optional: Effects or filters can be added; a caption; location information.
	Panoramio	Minimum: None. Optional: Title, comment, tags and location.
	Picasa	Minimum: None. Optional: Captions, manually locate photos using Google Earth.
Documenting Landscapes	Degree Confluence Project	Minimum: X, Y location, resolution 600 × 400 pixels 16-bit; single view shots; 2 pictures from the confluence (within 100 meters of the confluence); brief description of the confluence and the surrounding area. Optional: Photograph documenting the GPS acquisition (WGS84 position, altitude, reported error, and date/time); 4 pictures taken in the cardinal directions (N, S, E, W), or one or more panoramic views from the confluence; 1 picture of the general area of the confluence.
	Geograph	Minimum: 480 pixel long edge as jpg only but optimal is 640 pixels long edge; grid reference of 1 km square; position of the photographer; position of the subject; title for the photograph; geographical context (must click one category); accept the terms and conditions of the CC license. The date taken is read from the photograph along with the date uploaded, and the view direction is calculated. Optional: A more detailed description/comment can be added after the title; as many of the geographical context tags as desired can be selected; optional tags can be added.
	Oklahoma Field Photo library	Minimum: X, Y location. Optional: LULC category from a dropdown list; orientation (from 8 categories); a description field of more detailed LULC or other ancillary information about the photograph.
	Pictures Geo-Wiki	Minimum: X, Y location, direction/orientation, tilt, offset in meters (adjusted manually in user settings), date taken, accuracy of GPS, some limited information about the photographer from the initial registration on Geo-Wiki, land cover tags (or pre-set tags from user constructed legends). Optional: Additional comments/tags can be added.
Professional in-situ data collection	Land Use/Cover Area frame Survey (LUCAS) (Eurostat)	The protocol is very detailed but this description deals only with the photographs. Minimum: A photograph of the LUCAS point is taken and should contain a stable landmark. A marker is placed on the point if the point is reachable. Four photographs are then taken in the mandatory order of N, E, S and W. Land cover and land cover percentage are specified from a predefined legend. Additional: Additional photographs are taken of irrigation systems, transects, soil, reasons why the point could not be reached, or photos that complement the required ones (if relevant).

Table 2. An inventory of the types of metadata that are and could be associated with geo-tagged photographs.

Data	Explanation	Flickr	Panoramio	Geograph
Location information	Photograph with location information, a place name or x,y coordinates	√	√	√
Direction/Orientation	Compass direction and precision	X	X	√
Tilt	Applicable if taken with a smartphone/tablet	X	X	X
Offset (in meters) of the subject of the photograph	How far in meters is the object being photographed away from the photographer	X	X	X
Date uploaded	The date the photograph was uploaded to the application	√	√	√
Date taken	The date the photograph was taken	√	X	√
Weather	Information about the weather conditions when the photograph was taken	X	X	X
Method of georeferencing	Manually located on a map, automatically matched using image processing or device-enabled positioning (e.g., GPS-enabled device, wifi/IP positioning)	X	X	X
Type of GPS-enabled device	Make and model of: GPS, smartphone, tablet, camera with built-in GPS	X	X	X
Accuracy of GPS-enabled device	Accuracy in meters	X	X	X
Focal length	In mm, which provides an indication of the zoom level	√ *	√ *	X
Reference length or area	Presence of a reference length or area on the photograph, e.g., a measuring stick	X	X	X
Type of tags present	None, fixed categories, freeform or mixed	Freeform	Freeform	Mixed
Tags	The tags that accompany the photograph	√	√	√
Description	Text describing the content portrayed in the photograph.	√	√	√
Title	Title of photograph	√	√	√
Different directions	Requirement to take the photograph in either four cardinal directions or panoramic	X	X	May be present
Information about the photographer	Age, gender, expertise, relationship to other photographers, home location of photographer (country, place, XY coordinate), if part of an online photo sharing website/system then perhaps the number of photographs the photographer has, the number of groups he/she is involved in, status and kudos indicators	Some may be present	Some may be present	Some may be present
Licensing	Openly available with no license, type of license, e.g., Creative Commons and the level of use, for private use only	√	√	√

* Information that may be in the EXIF file and will be available or not depending on the policy of the repository.

Table 3. Rules used to assist in the classification of the photographs as useful for land cover applications.

Rule Number	Rule Description
1	Land cover is only considered when it is within about 10 m of the photographer, to take into account positioning errors of the photograph. Thus, land cover types in the far distance should not be considered.
2	If it is possible to see or infer with reasonable certainty what is at the photographer’s footprint (even when the footprint is not visible), and there is only one possible class from the list indicated in Section 3.2.3, choose “Yes”.
3	If more than one of the classes above can be assigned to the photographer’s footprint vicinity (using the 10 m limit defined in 1), choose “maybe”.
4	If there is no information about what may be at the photographer’s footprint, e.g., an aerial or panoramic view, then choose “no”.
5	Individual trees are discounted regarding the dominant land cover (e.g., a tree in a grass field) unless one can infer from the photograph that there are many trees around.
6	For vintage photographs, the answer is “no”, since the land cover may have changed (or the photograph may be incorrectly geo-tagged).
7	If snow completely covers the surface, so that the underlying land cover is unclear, the answer is “no”. Here context is used in addition to the photograph: the study area is in London, where it is known that no permanent snow cover exists.
8	For photographs taken underground, i.e., in a metro station, the answer is “no”. If the station is clearly above ground and there is no other land cover type within 10 m, then the answer is “Yes” (artificial surfaces).
9	Water frequently causes difficulties because in many cases it is not possible to determine unequivocally whether the photograph was taken from a boat (in which case the answer should be “Yes”), from a bridge, or in the vicinity of the water. Therefore, if water is identified within 10 m of the photographer, the answer is “maybe”.
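
The rules in Table 3 were applied by human interpreters, but their logic can be sketched as a small decision function. The following is a minimal illustration only, not the authors' procedure: the `PhotoAssessment` fields, the function name `usability`, the lowercase return values and the class names other than "artificial surfaces" are all assumptions introduced here, and the interpreter is assumed to have already limited the class set to about 10 m (Rule 1) and discounted isolated trees (Rule 5).

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PhotoAssessment:
    """Hypothetical record of what an interpreter observed in one photograph."""
    # Land cover classes within about 10 m of the photographer (Rule 1),
    # with isolated trees already discounted (Rule 5).
    classes_within_10m: Set[str] = field(default_factory=set)
    footprint_inferable: bool = True   # False for aerial/panoramic views (Rule 4)
    is_vintage: bool = False           # Rule 6
    snow_covered: bool = False         # Rule 7
    underground: bool = False          # Rule 8
    taken_from_boat: bool = False      # Rule 9


def usability(photo: PhotoAssessment) -> str:
    """Return 'yes', 'maybe' or 'no' following the spirit of Table 3."""
    # Rules 6, 7 and 8: vintage, fully snow-covered or underground photographs.
    if photo.is_vintage or photo.snow_covered or photo.underground:
        return "no"
    # Rule 4: nothing can be said about the photographer's footprint.
    if not photo.footprint_inferable or not photo.classes_within_10m:
        return "no"
    # Rule 9: water nearby is ambiguous unless the photograph was taken from a boat.
    if "water" in photo.classes_within_10m and not photo.taken_from_boat:
        return "maybe"
    # Rule 3: more than one candidate class near the footprint.
    if len(photo.classes_within_10m) > 1:
        return "maybe"
    # Rule 2: exactly one plausible class.
    return "yes"


if __name__ == "__main__":
    station = PhotoAssessment(classes_within_10m={"artificial surfaces"})
    print(usability(station))
```

The order of the checks is a design choice made for this sketch: the exclusion rules are tested first so that a vintage or underground photograph is rejected regardless of how clearly its land cover can be seen.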

Table 4. Categorization of the metadata into essential, desirable and unnecessary by use case.

Metadata Requirements	Use Case 1 (Classifiers Training)	Use Case 2 (LU/LC Map Validation)	Use Case 3 (Complement Validation)
Essential	Location information, Date (uploaded or taken)	Location information, Date taken, Method of georeferencing (i.e., by GPS-enabled device)	Location information, Date (uploaded or taken)
Desirable	Tilt, Direction/Orientation, Offset (in meters) of the subject of the photograph, Method of georeferencing (any method for Use Cases 1 and 3), Type of GPS-enabled device, Accuracy of GPS-enabled device, Focal length, Reference length or area, Type of tags present, Tags, Title, Description, Different directions, Licensing, Information about the photographer (for Use Case 2)
Unnecessary	Weather, Information about the photographer (for Use Cases 1 & 3)
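
The "Essential" row of Table 4 can be read as a filter on photograph metadata. The sketch below shows one way such a filter might look. The dictionary keys (`location`, `date_taken`, `date_uploaded`, `georef_method`) and the value `"gps"` are hypothetical names chosen for this example; they are not fields of the Flickr, Panoramio or Geograph APIs.

```python
from typing import Any, Dict


def meets_essential(meta: Dict[str, Any], use_case: int) -> bool:
    """Check the essential metadata of Table 4 for use case 1, 2 or 3.

    `meta` is a hypothetical metadata dictionary; missing keys or None
    values are treated as "not available".
    """
    has_location = meta.get("location") is not None
    has_date_taken = meta.get("date_taken") is not None
    has_date_uploaded = meta.get("date_uploaded") is not None

    if use_case in (1, 3):
        # Location plus either date (uploaded or taken).
        return has_location and (has_date_taken or has_date_uploaded)
    if use_case == 2:
        # Location, date taken, and georeferencing by a GPS-enabled device.
        return (
            has_location
            and has_date_taken
            and meta.get("georef_method") == "gps"
        )
    raise ValueError("use_case must be 1, 2 or 3")


if __name__ == "__main__":
    # A record with only an upload date, which is what Table 2 reports for Panoramio.
    upload_only = {"location": (51.5, -0.12), "date_uploaded": "2015-06-01"}
    print(meets_essential(upload_only, 1))
    print(meets_essential(upload_only, 2))
```

A record that carries only an upload date passes the check for Use Cases 1 and 3 but fails it for Use Case 2, which additionally requires the date taken and a GPS-based georeferencing method.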

Table 5. Analysis of the tags associated with the geo-tagged photographs in Geograph, Flickr and Panoramio and the descriptions and titles for Geograph and Flickr.

		Tags (Geograph)	Tags (Flickr)	Tags (Panoramio)	Descriptions (Geograph)	Titles (Flickr)
Number of tags/words in descriptions/words in titles	Mean	4.5	9.8	2.8	18.9	3.9
	Median	4	7	2	15	3
	St. Dev.	2.3	8.2	2.6	13.6	4.5
	Minimum	1	1	1	1	1
	Maximum	16	60	20	56	34
	Total	1543	6809	2834	11,841	3653
Number of photographs		344	696	787	628	927
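
Summary statistics of the kind reported in Table 5 can be derived from per-photograph tag lists along the following lines. This is a sketch under stated assumptions: the function name `summarize_counts` and the sample tags are invented for illustration, only photographs with at least one tag are counted (consistent with the minimum of 1 in the table), and the sample standard deviation is used because the source does not say which estimator was applied.

```python
import statistics
from typing import Dict, List, Union

Number = Union[int, float]


def summarize_counts(tags_per_photo: List[List[str]]) -> Dict[str, Number]:
    """Summarize the number of tags per photograph, as in the rows of Table 5."""
    # Assumption: photographs without any tag are excluded from the statistics.
    counts = [len(tags) for tags in tags_per_photo if tags]
    return {
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "st_dev": statistics.stdev(counts),  # sample standard deviation (assumed)
        "minimum": min(counts),
        "maximum": max(counts),
        "total": sum(counts),
        "n_photographs": len(counts),
    }


if __name__ == "__main__":
    # Invented example data, not taken from the study.
    example = [["london", "bridge"], ["park"], ["thames", "river", "boat"], []]
    for name, value in summarize_counts(example).items():
        print(name, value)
```

The same function could be applied to word counts of descriptions or titles by passing lists of words instead of lists of tags.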

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

