
ISPRS Int. J. Geo-Inf. 2018, 7(6), 205; https://doi.org/10.3390/ijgi7060205

Article
The Implications of Field Worker Characteristics and Landscape Heterogeneity for Classification Correctness and the Completeness of Topographical Mapping
Department of Geography, University of Tartu, Vanemuise 46, 51014 Tartu, Estonia
Received: 6 April 2018 / Accepted: 27 May 2018 / Published: 29 May 2018

Abstract

The quality of spatial data may vary spatially. If mapping (interpretation of orthophotos) is done during fieldwork, this variation in quality may occur as a result of differences in the complexity of the landscape, differences in the characteristics of individual field workers, and differences in their perception of the landscape. In this study, we explored the interaction between the characteristics of these workers, including their gender and years of experience (as a proxy for their mapping skills), and landscape heterogeneity. There was no significant difference between male and female workers. Although field workers with more years of experience generally had higher mapping quality, the relationship was not statistically significant. We found differences in the rates of misclassification, omission, and commission errors between workers in different landscape types. We conclude that the error rates due to misclassification, omission, and commission were the lowest in more diverse landscapes (high number of different land use types) with a relatively high amount of buildings, whereas the error rates were the highest in mainly forested landscapes with larger and more complex shaped patches.
Keywords: classification correctness; commission; landscape metrics; omission; spatial data quality; thematic accuracy; topographical mapping; volunteered geographic information

1. Introduction

Spatial data quality has been the subject of discussions for more than 30 years [1,2,3,4,5,6]. During this period, researchers have conducted several academic studies on error or uncertainty modelling [7,8,9] and on how to communicate data quality information [10,11,12]. In addition, several international standards related to data quality have been established [13]. Standardisation work has been completed under the ISO 19157 standard [14], which clarifies the scope of data quality, defines the elements and the measures of quality, describes quality assessment procedures, provides guidelines for reporting the results of a quality evaluation, and introduces the concept of metaquality. Metaquality provides information on the evaluation of quality and its results in order to describe the suitability of the evaluation method, the measure or measures that were applied, and the results [14]. Nevertheless, confusion in using quality terms may still occur. For example, completeness and thematic accuracy have been defined as quality measures by Senaratne et al. [15], whereas ISO 19157 [14] defines them as quality elements.
In order to describe spatial data quality, ISO 19157 provides 21 quality elements (Figure 1), which have been organized into six categories: completeness, thematic accuracy, logical consistency, temporal quality, positional accuracy, and usability. However, the standard allows for expansion of the list, so that one can always add more quality elements. Talhofer et al. [16] suggested using five essential criteria for the evaluation of spatial data quality based on value-analysis theory: database content, database technical quality, database timeliness, area importance, and user friendliness; the latter is intended to consider data quality from the user’s point of view. Fonte et al. [17] proposed additional quality indicators for volunteered geographic information (VGI).
In the present study, we focused on three data quality elements: classification correctness, commission, and omission. Hereafter, we will refer to these as misclassification, commission, and omission (MCO) errors. To make our discussion less abstract, we will refer to the topographical data produced by the Estonian Land Board (Estonia’s national mapping agency) [18]. The Estonian topographic dataset includes not only information about relief, but also information about infrastructure (e.g., roads and electric power lines), settlements, hydrography, and land use [19]. In the current study, we did not consider relief and focused only on the other map elements. To assess MCO errors, we used direct evaluation methods [14]. In this approach, data are compared with ground-based data obtained by means of field-mapping, or with reference data such as highly accurate maps or imagery [20,21]. Using reference data for assessment of spatial data quality is common practice in remote sensing [22] and in VGI [23,24,25,26]. However, large-scale topographical maps (typically at a scale of ≥1:50,000 [27]) focus on small areas, and in habitat mapping, more accurate maps are rarely available. Moreover, the large-scale topographical datasets produced by national mapping agencies are very often used as reference data themselves [28]. Therefore, ground-based data is often used for an assessment of MCO errors in topographical or habitat data [29,30].
The quality of spatial data may vary spatially [31]. If mapping is done by fieldwork, the quality may be variable as a result of differences in the complexity of the landscape and in the mapping skills of individual workers, as well as in their perception of the landscape. Smith et al. [32], van Oort et al. [22], and Tran et al. [33] found that the probability of correct classification of satellite images depended on landscape complexity. The probability of correct classification was higher in more homogeneous landscapes and lower in heterogeneous landscapes. The effect of the mapping skills of the individual mappers on habitat mapping quality was tested by Cherrill and McClean [34]. In their study, six field workers independently surveyed the same area. Only 7.9% of the total study area was classified as the same land cover type by all six surveyors. This indicates that potentially large inconsistencies in mapping may result from differences in mapping skills among field surveyors. Hearn et al. [35] found that increasing years of experience and experience with mapping certain landscape types improved mapping quality, although mapping time, cost, and length of route did not correlate with the mean level of agreement among surveyors. Moreover, several studies [36,37,38,39] highlighted the possibility of gender differences in spatial ability (i.e., men and women interpret spaces differently). Reilly et al. [40] noted that gender gaps in spatial ability were the largest of all gender differences in cognitive abilities.
However, to our knowledge, the interaction between the impact of gender, years of experience, and landscape heterogeneity has not yet been investigated.
The main aims of the present study were to determine whether and how MCO errors differed among field workers and whether any differences were influenced by landscape heterogeneity. We hypothesised that mapping quality would decrease in heterogeneous landscapes and landscapes with relatively closed viewsheds, and that it would be affected by the worker’s characteristics.

2. Materials and Methods

2.1. Study Area and Data

We used quality control results (MCO errors) to validate the Estonian Basic Map, which is a national topographic vector database (1:10,000) that was produced by the Estonian Land Board after Estonia regained its independence in 1991 [41]. Because of the poor quality of Soviet maps [42], the Estonian Basic Map was created from scratch by means of stereo-photogrammetry [43] supported by extensive fieldwork, and the results were subsequently inspected by quality controllers who also worked in the field. The quality controllers inspected mapping quality along linear routes [18]. These routes and the quality control results were used in the current study.
To define the size of the controlled area, which was used for spatial analysis and the calculation of quality measures, we generated buffers around the field inspection routes. We used buffer widths of 50 m on either side of the route if the surrounding landscape had a “closed” viewshed (e.g., was located in forests, shrubs, or built-up areas) and 100 m in landscapes with “open” viewsheds. The routes were 11 to 15 km long. The quality control results were obtained from 2003 to 2006, and were recorded in an error database that comprised 5100 records. In total, the work of 21 workers was inspected by 6 quality controllers at 93 sites (Figure 2), which covered all main landscape types in Estonia [44]. The field workers were trained in the field classification of a landscape and supported by a detailed mapping specification [45] that included detailed descriptions of each category. Moreover, in each spring before the mapping season, a joint two-day seminar for all workers and quality controllers was held in order to harmonise their feature classifications. At the time of the survey, workers had 2 to 11 years of fieldwork experience. For each field worker, we also obtained data about their gender and years of experience. Because no formal skill assessments were obtained for the field workers, we used their years of experience as a proxy for their skill level.

2.2. Quality Elements and Measures

We described fieldwork quality in terms of completeness and thematic accuracy (Figure 1). Completeness was subdivided into two quality elements: omission and commission. Omission represents a case in which a landscape feature that must be mapped is missing, whereas commission represents a case in which a feature exists on the map, but not in the landscape. For thematic accuracy, we used only the classification correctness (expressed as the proportion of the total objects that were misclassified), which represented the agreement between map objects and the corresponding features in the landscape.
The ISO 19157 standard notes that errors can be classified differently into quality elements. For example, the misclassification of a local road as a path could be considered an error of omission of a local road or alternatively an error of commission in defining a path; therefore, it is possible to define two errors for the same problem [18]. For efficient quality control, it is important to classify all errors in the same way. For example, one rule that was used in quality reporting for the Estonian Basic Map concerns areal features; it states that polygons can only be recorded as misclassified, not as commission or omission errors [18]. Therefore, in our study, areal features do not have commission or omission errors (Table 1).
Each quality element is described by a quality measure. To offer quality results in a comparable way, ISO 19157 [14] provides a list of data quality measures. However, the list is not complete and users can define their own measures according to the structure given by the standard. In the present study, we used error rate as a quality measure for all quality elements. This rate is expressed as the total number, length or area of erroneous items in a geometrical type (e.g., lines) divided by the total number, length or area of items in that geometrical type, multiplied by 100. As there are different numbers of points, lines, and polygons in the landscape, each of these geometrical types comprises a different total number of features. Therefore, we calculated a weighted average for the error rate of these three geometrical types for every quality element, and summarised these values across all types to obtain a single combined error rate (Equations (1)–(3)). The weights equalled the proportion of the total numbers of point, line, and polygon features in the total number of features (based on the total number from the assessments by the expert quality controllers). Based on this assessment, 30% of the features were points in the present study, so the weight for points was 0.30. For lines, the weight was 0.48, and for polygons, the weight was 0.22.
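$$M_{\mathrm{WA}} = 0.22\,M_{\mathrm{poly}} + 0.48\,M_{\mathrm{line}} + 0.30\,M_{\mathrm{point}} \tag{1}$$
$$C_{\mathrm{WA}} = \frac{0.48\,C_{\mathrm{line}} + 0.30\,C_{\mathrm{point}}}{0.48 + 0.30} \tag{2}$$
$$O_{\mathrm{WA}} = \frac{0.48\,O_{\mathrm{line}} + 0.30\,O_{\mathrm{point}}}{0.48 + 0.30} \tag{3}$$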
where M, C, and O are the rates of misclassification, commission, and omission errors, respectively; WA indicates the weighted average; and poly, line, and point represent the corresponding geometrical types.
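The error rate and the weighted averages in Equations (1)–(3) can be illustrated with a minimal Python sketch. The function names and the example rates are hypothetical and are not taken from the study; only the weights (0.22, 0.48, and 0.30) come from the text.

```python
# Weights from the text: share of polygon, line, and point features.
WEIGHTS = {"poly": 0.22, "line": 0.48, "point": 0.30}


def error_rate(erroneous, total):
    """Error rate in percent: erroneous items / all items * 100.

    Both arguments may be counts, lengths, or areas, as long as they
    use the same unit.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * erroneous / total


def weighted_error_rate(rates):
    """Weighted average over the geometry types present in `rates`.

    Dividing by the sum of the weights actually used reproduces
    Equations (1)-(3): for misclassification the three weights sum to 1,
    and for commission and omission (no polygons) the divisor is
    0.48 + 0.30.
    """
    used = sum(WEIGHTS[g] for g in rates)
    return sum(WEIGHTS[g] * r for g, r in rates.items()) / used


# Hypothetical error rates (percent) for one site; not values from the study.
m_wa = weighted_error_rate({"poly": 10.0, "line": 5.0, "point": 2.0})
c_wa = weighted_error_rate({"line": 1.0, "point": 3.0})
print(round(m_wa, 2), round(c_wa, 2))
```

Normalising by the sum of the weights in use keeps the commission and omission rates on the same 0–100 scale as the misclassification rate, even though polygons contribute no commission or omission errors.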

2.3. Landscape Indicators

The topographical vector database of the Estonian Basic Map consists of lines, points, and polygons. Landscape indicators can be calculated only for polygons. Therefore, we integrated lines (e.g., hedgerows, watercourses, and fences) and points (e.g., groves, trees, and heaps of stones) by overlapping them to create areal features (e.g., fields, forests, and buildings) using buffers with an average width equal to that of the feature in reality [46].
A landscape structure can be characterized by its composition (the amount of each patch type within the landscape), which is measured by the proportions of different land use types and by diversity metrics, although these metrics are not spatially explicit. The structure can instead be measured based on its configuration (the spatial distribution or spatial characteristics of patches within the landscape), which is measured by edge shape and size metrics [47]. From a mapping perspective, both aspects are important. In this study, we used the Patch Analyst 5.1 software [48] to calculate 18 landscape indicators for each field-inspection site (Table 2). In addition to classical landscape metrics, we also calculated the proportion of land use that can be considered open areas (e.g., field and grassland), closed areas (e.g., forest, bush, and orchard), and built-up areas (e.g., yards with buildings) of the landscape to describe the land use composition.

2.4. Statistical Analyses

Landscape indicators have different units and scales. To rescale the data so that it could be analysed together, we standardised all landscape indicators before the statistical analysis to have the properties of a standard normal distribution with μ = 0 and σ = 1. Many of the landscape metrics are very strongly correlated. Therefore, we eliminated some of them by means of factor analysis using the varimax rotation. To find similar landscapes among the field inspection sites (Figure 2), we used k-means clustering [49] based on the factor scores for the landscape indicators and the proportion of built-up areas in the landscape. We identified similar landscape types to see if there were differences in error rates within similar landscape types. According to the Kolmogorov–Smirnov test for normality, none of the quality measures under consideration were normally distributed. Therefore, we used Kruskal–Wallis analysis of variance (ANOVA) to calculate mean rank values of error rates for each landscape type and used post-hoc multiple comparisons to identify significant differences between landscapes.
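Two of the steps described above, standardisation and the Kruskal–Wallis comparison of mean ranks, can be sketched in plain Python. This is an illustration only: the analyses in the study were run in Statistica, the function names are hypothetical, the example values are invented, the population standard deviation is an assumed choice, and the H statistic is shown without the tie correction or a p-value.

```python
from statistics import mean, pstdev


def standardise(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu, sigma = mean(values), pstdev(values)  # population SD: an assumption
    if sigma == 0:
        raise ValueError("cannot standardise a constant indicator")
    return [(v - mu) / sigma for v in values]


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the H statistic (no tie correction) and the mean rank per group."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, mean_ranks, start = 0.0, [], 0
    for g in groups:
        r = ranks[start:start + len(g)]
        start += len(g)
        mean_ranks.append(sum(r) / len(g))
        total += sum(r) ** 2 / len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, mean_ranks


# Hypothetical error rates (percent) for three landscape types; not study data.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h, mean_ranks = kruskal_wallis_h(groups)
z = standardise([2.0, 4.0, 6.0])
print(h, mean_ranks, z)
```

Because the test works on ranks rather than raw values, it does not require the error rates to be normally distributed, which is why it suits data that fail the Kolmogorov–Smirnov test.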
To detect differences in spatial data quality among the workers by gender and years of experience, we used box-plots and the Mann–Whitney U test. We also performed Spearman rank-order correlation to test the relationship between years of experience and error rates. All analyses were performed in Statistica 12 software [50].

3. Results

3.1. Landscape Metrics

The factor analysis showed that the first two factors together explained 62.9% of the total variation in the landscape indicators, and the first four factors together explained 82.3% of the variation (Table 3). A rule-of-thumb for retaining factors is that the associated eigenvalue must be greater than 1 [51]. All four factors met this criterion, although the eigenvalue of the fourth factor was close to the threshold; we retained the fourth factor because it appeared to be uniquely and strongly associated with one of the landscape metrics (MSI). The first axis was significantly positively correlated with diversity (SDI, SEI), edge (ED), shape (AWMPFD), and patch density (PD) metrics; it was also significantly negatively correlated with patch size (MPS). We described this axis as diversity. The second axis was most strongly correlated with patch size metrics (PSCoV, PSSD), but also with shape metrics (MPFD, MPAR) and patch richness density (PRD). As MPAR and MPFD are both based on patch area, we described this axis as the patch size distribution. The third axis was clearly related to landscape openness and closure, and we therefore named it closure.
The fourth axis was only significantly related to MSI (shape), but the loading was strong. Metrics that evaluate the shape of the patches have been one of the important factors in most previous landscape studies [52,53,54]. Moreover, MSI is a good indicator of human influence on the landscape, because its value is significantly lower for areas with strong human influence, as humans tend to create patches with a regular shape. Therefore, we named this axis patch shape complexity. BU only contributed significantly to axis 3, but as built-up areas are important features in field mapping and the loading was strong, we included BU as an additional parameter in our cluster analysis.

3.2. Cluster Analysis of the Landscapes

We used clustering based on the factor scores for the landscape indicators and the proportion of built-up areas in the landscape to reveal clusters that represent different landscape types (Table 4, Figure 3). There were 17 sites in the first cluster and these landscapes had high diversity, mainly because of the relatively high amount of buildings between fields and grasslands. Patch shape complexity was very low, mainly because of the simple shape of buildings and other characteristics of built-up areas. We named this cluster built-up-diverse. The second landscape cluster had 37 sites. The diversity and the proportion of built-up areas in those landscapes were very low, patch shape complexity was relatively simple, and they were mainly dominated by cultivated and grassland areas. Therefore, we named this landscape cluster open-simple. The third landscape cluster had the opposite of the characteristics in the second cluster, with higher diversity, closure, and patch shape complexity, so we named this landscape cluster closed-complex. The 39 sites belonging to this landscape cluster were mainly forested.

3.3. Characteristics of Field Workers

Of the 21 field workers, 67% were male and 33% female (Table 5). Female field workers had slightly lower error rates than men. However, the Mann–Whitney U test showed that the difference in error rates between the male and female workers was not statistically significant (Figure 4a).
The experience in field mapping ranged from 2 to 11 years. One-third of the field workers had five or fewer years of experience, and two-thirds had more than five years of experience (Table 5). According to the Spearman rank-order correlation, the negative relationship between the years of experience and MCO error rates was not statistically significant (ρ = −0.38; p = 0.09) (Figure 5). Nonetheless, Figure 5 indicates an overall decreasing trend in error rates with increasing years of experience. There was only one field worker with two years of experience, and he had one of the lowest error rates (Figure 4b). Workers with three to four years of experience had significantly higher error rates, but the error rate decreased thereafter.

3.4. Misclassification, Commission, and Omission (MCO) Error Rates in Different Landscapes

The Kruskal–Wallis H test showed a statistically significant difference between the error rates in different landscapes. All three types of error rates were lowest in the built-up-diverse landscapes (Figure 6). The variation of the error rate was also the lowest for the built-up-diverse landscapes. The highest error rate occurred in closed-complex landscapes, which also had the highest variation. The commission category had the lowest error rate across all landscapes, and there was only a statistically significant difference for commission error rates between the built-up-diverse and open-simple landscapes. Open-simple landscapes had slightly higher commission error rates than built-up-diverse areas. Misclassification error rates varied the most across landscapes, with the highest values in closed-complex landscapes and the lowest values in built-up-diverse areas.

3.5. Misclassification, Commission, and Omission Error Rates among Field Workers in Different Landscapes

The number of sites mapped by a given field worker was unevenly distributed, and ranged from 1 to 11 (Table 5). Six field workers had mapped only one site, but nine field workers had mapped at least five sites (Table 5). In the built-up-diverse landscapes, the error rates were relatively low for all field workers (Figure 7). In addition, 13 field workers mapped open-simple and closed-complex landscapes, and nine of them made fewer mistakes in open-simple landscapes. There were five field workers (2, 6, 10, 16, and 18) who worked in all three landscape types, which let us evaluate the effect of landscape pattern on mapping quality independently from the characteristics of the individuals (Figure 7). Four out of five had the lowest error rates in built-up-diverse landscapes, and three out of five exhibited the highest error rates in closed-complex landscapes. All of them had higher error rates in closed-complex landscapes than in built-up-diverse landscapes. In general, there seems to be a trend that the more closed the landscape and the more complex its shape, the higher the error rate.

4. Discussion and Conclusions

Based on the factor and cluster analysis, we divided the study sites into three different landscape types: built-up-diverse, open-simple, and closed-complex. We found that the rates of MCO errors in these different landscape types differed significantly. The lowest error rates were in the built-up-diverse landscapes. This is likely a result of the fact that buildings are very distinctive features of the landscape and are very easy to recognize in the field. At the same time, buildings increase the landscape diversity, which might increase the attention of the field worker, leading to fewer mistakes. In addition, built-up-diverse landscapes had high openness, which increases the visibility of features such as buildings and thus eases mapping. The variation of error rates was also lowest in this landscape type, which in turn might be caused by the lower number of sites in this landscape type (i.e., fewer workers are required to map a smaller number of sites, leading to decreased variation).
In contrast, the closed-complex landscape types were mainly forested areas, and the error rates and variation of the error rates were highest. This can probably be explained by the mapping technology. Stereophotogrammetry was used for mapping all of the objects that could be recognized from the aerial photos, but the class of the object often could not be determined clearly by this method. In addition, many objects could be hidden by dense forest cover and not detected at all; this may mean that the field worker will not look for those features. Mõisja et al. [18] found that the features with the highest error frequency were culverts, paths, forest cutlines (8-m-wide areas in which trees had been removed), and ditches, which are often not visible from the aerial photos. These features were common in the forested areas and at the edges of forested areas. Therefore, their presence would lead to a high error rate in this landscape type. Moreover, the closed-complex landscapes have low visibility, which requires more work to cover the landscape on foot.
In terms of the quality elements, misclassification was the most common error type, regardless of the landscape type and field worker. This is also likely to be because of the mapping method, in which all features that are recognizable from the aerial photos have been put on the map based on stereophotogrammetry, followed by subsequent verification of these features during the field work [18]. Misclassification mainly results from uncertainty in classifying some objects, because some natural features are hard to consistently place in the same class, and it can be difficult to draw clear borders between certain features [35,55]. Some misclassification mistakes may also occur when the field worker has not actually checked the feature in the field. The misclassification errors are most frequent between relatively similar classes, in which case the errors may be unimportant from a practical perspective, whereas other errors may be highly significant [20]. In contrast with the classification of remote sensing images, in which commission errors are more common than omission errors, we found that commission errors were least common. This is likely because the field workers mostly checked the features detected from the aerial photos; in contrast, field checks are often not performed in the classification of remote sensing images.
When we explored each landscape type separately, we found large variation in error rates within each landscape type. This indicates that the individual characteristics of field workers have some effect on the mapping quality. Mõisja et al. [18] reached the same conclusion in their investigation of the distribution of errors among feature types and field workers. For example, they found that more than half the errors related to grasslands and narrow ditches were made by just two field workers; similarly, most of the mistakes related to culverts and open spaces were made by three field workers. This means that some field workers are not successful in correctly mapping specific features. This agrees with studies that explored the accuracy of vegetation mapping, which found that individual skills affected the values of MCO error rates [34,35,56]. Moreover, our results indicated that in some landscapes, individual characteristics may have had a bigger effect on mapping quality than in other landscapes. For example, in the built-up-diverse landscapes, the variation of error rates and overall error rates were lower than in closed-complex and open-simple landscapes, which suggests that built-up areas were easier to map than natural areas. In contrast, different vegetation communities that are similar in species composition and appearance can be easily confused in fieldwork in natural areas [35,56].
However, when we compared the five field workers who mapped all three landscape types, we found that four out of the five field workers had the lowest error rates in built-up-diverse landscapes and that most of them also had the highest error rates in closed-complex landscapes. We could not identify a statistically significant relationship, but there was still a visible trend that more open and simple landscapes are mapped with higher quality, most likely because they often are easier to access and because their better visibility enables higher mapping quality.
Traditionally, mapping has been conducted by trained and experienced professionals. However, in the modern age, many volunteers are mapping the world (i.e., VGI). VGI has become an increasingly common source of spatial data; thus, there is an increasing need to assess the accuracy of VGI data [15,57]. In this context, it is interesting to observe the differences between professionals and volunteers. Girres and Touya [24], Haklay [23], and Dorn et al. [26] found that in urbanized areas, the completeness and classification correctness were higher than in rural areas, which agrees with the present results, in which the built-up areas were mapped with better quality. Girres and Touya [24] found that differences in mapping completeness were caused by the fact that VGI contributors are more focused on capturing attractive objects. They found that the main effect of landscape was that it biased the area and features that volunteers preferred to map. Moreover, Haklay [23] showed that mapping quality varied among the different parts of London; it was lower in poorer regions. Therefore, the quality of VGI spatial data can exhibit high spatial variation.
Several studies that investigated the differences in spatial ability and orientation between men and women found that men and women interpret space differently, and most of the studies suggested that men had better spatial orientation abilities [37,38,39]. Coluccia et al. [38] also pointed out that in their study of volunteer classifiers, men approached maps from a global perspective (the pattern of routes), whereas women focused on local features (landmarks). This might give women an advantage in mapping. Our results showed that women generally had lower error rates, although the difference from men was not statistically significant. This might be because all field workers in our study were trained before mapping and were professionals, whereas the participants in previous studies were mostly volunteers with no previous training or professional mapping experience. This demonstrates the importance of training, particularly if a worker’s classifications can be assessed quickly to detect errors so that they can be trained to avoid these errors.
In addition, several studies have shown that people navigate their environment better when they feel safe [36,58,59]. In our study, however, field workers could choose their preferred landscapes; this means that if a worker did not feel confident or safe in the forested areas, then they were assigned to map open and built-up areas. Only 23% of the closed-complex landscapes were mapped by women, whereas 43% of the open-simple landscapes were mapped by women, who indicated a preference for open landscapes. This might also partially explain why there was no significant difference in mapping quality between genders.
Our results suggest that mapping quality may be influenced by the field workers’ years of experience. There was a general decreasing trend in error rates with increasing years of experience; however, the trend was not statistically significant. The field worker with the fewest years of experience had one of the lowest error rates, which was an unexpected result; we expected that longer mapping experience would result in better mapping quality. Instead, we saw a large increase in mapping errors during the third and fourth years of work. We hypothesize that this is because the work had become routine, possibly leading to excessive self-confidence and a failure to consult the mapping guidelines, leading to mistakes. Subsequently, the mapping quality again improved, which indicated the influence of experience on mapping quality. Therefore, the relationship between years of experience and mapping quality might not be linear, but rather U-shaped. However, our study did not provide enough data to confirm this hypothesis because there were too few field workers with short experience. Hearn et al. [35] found similar results, with years of experience not significantly correlated with classification correctness in habitat mapping. This is probably because vegetation mapping is also more subjective, and it is more complicated to determine clear vegetation borders and vegetation types; even after many years of experience, this remains a complex task.
In conclusion, we found that the quality of mapping varied among the landscape types. Some landscape types show higher correctness (built-up-diverse) than others (open-simple and closed-complex). This is most likely because man-made objects are easier to identify than natural vegetation, where the similarity of species composition between different vegetation types might be confusing and where drawing clear borders between different vegetation types is harder. The mapping quality was also generally higher in more open landscapes because better visibility decreases the risk of MCO errors. Interestingly, there was no statistically significant difference in mapping quality between men and women. However, although there was a trend of decreasing error rates with increasing years of experience, it was also not statistically significant because the field worker with the fewest years of experience had among the lowest error rates, whereas field workers with average experience showed the poorest results, and field workers with the most extensive experience showed improved mapping quality. In the current study, it was impossible to clearly differentiate the effect of the individual characteristics of field workers on the mapping quality from the effect of landscape, partially because the number of field workers was limited, and because these effects are interrelated and it is inherently hard to separate them. Our results suggest that mapping quality can be improved if field workers can choose their preferred landscape. In addition, it will be beneficial if the mapping guidelines are improved for forested areas to reduce potential errors that can be avoided by proper fieldwork. Monitoring fieldwork to detect errors, so that workers can be trained to avoid such errors in the future, would also improve mapping accuracy.

Author Contributions

K.M. and E.U. conceived and designed the study; K.M. performed the spatial and statistical analysis; E.U., K.M., and T.O. wrote the paper.

Acknowledgments

The study was supported by funding from the Marie Skłodowska-Curie Actions individual fellowships offered by the Horizon 2020 Programme under Research Executive Agency grant agreement number 660391, by the Ernst Jaakson memorial stipend by the University of Tartu Foundation, and by institutional grant No. IUT 2-16 funded by the Estonian Ministry of Education and Research. The field-mapping quality control was supported by the Estonian Land Board. We also thank Kalle Remm, Ants Kaasik, and Ain Kull for their advice on our statistical analyses, and Geoff Hart for language editing and useful tips. We thank all the anonymous reviewers for insightful comments on the paper, as these comments led us to an improvement of the work.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Goodchild, M.F.; Gopal, S. Accuracy of Spatial Databases; Taylor and Francis: London, UK, 1989; ISBN 0-203-49023-1.
  2. Guptill, S.C.; Morrison, J.L.; Association, I.C. Elements of Spatial Data Quality; Guptill, S.C., Morrison, J.L., Eds.; Elsevier Science: New York, NY, USA, 1995; Volume 1, ISBN 9780080424323.
  3. Veregin, H. Data quality parameters. In Geographical Information Systems; Longley, P.A., Goodchild, M.F., Maguire, D.J., Rhind, D.W., Eds.; John Wiley & Sons: New York, NY, USA, 1999; pp. 177–190. ISBN 0471-33132-5.
  4. Shi, W.; Fisher, P.F.; Goodchild, M.F. Spatial Data Quality; Taylor & Francis: New York, NY, USA, 2002; ISBN 9780415258357.
  5. Devillers, R.; Jeansoulin, R. Fundamentals of Spatial Data Quality; ISTE: London, UK, 2006; ISBN 9781905209569.
  6. Shi, W.; Wu, B.; Stein, A. Uncertainty Modelling and Quality Control for Spatial Data; CRC Press: Boca Raton, FL, USA, 2016; ISBN 9781498733281.
  7. Hunter, G.J.; Beard, K. Understanding error in spatial databases. Aust. Surv. 1992, 37, 108–119.
  8. Collins, F.C.; Smith, J.L. Taxonomy for error in GIS. In International Symposium on Spatial Accuracy in Natural Resource Data Bases: Unlocking the Puzzle; Congalton, R.G., Ed.; American Society for Photogrammetry and Remote Sensing: Williamsburg, VA, USA, 1994; pp. 1–7.
  9. Fisher, P.F. Models of uncertainty in spatial data. In Geographical Information Systems; Longley, P.A., Goodchild, M.F., Maguire, D.J., Rhind, D.W., Eds.; John Wiley & Sons: New York, NY, USA, 1999; pp. 191–205. ISBN 9780470870013.
  10. MacEachren, A.M. Visualizing uncertain information. Cartogr. Perspect. 1992, 13, 10–19.
  11. Goodchild, M.F.; Clark, K.C. Data quality in massive data sets. In Handbook of Massive Data Sets; Abello, J., Pardalos, P.M., Resende, M.G.C., Eds.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 2002; pp. 643–659. ISBN 978-1-4020-0489-6.
  12. Devillers, R.; Beard, K. Communication and use of spatial data quality information in GIS. In Fundamentals of Spatial Data Quality; Devillers, R., Jeansoulin, R., Eds.; ISTE: London, UK, 2006; pp. 237–253. ISBN 9781905209569.
  13. Kresse, W.; Danko, D.M.; Fadaie, K. Standardization. In Springer Handbook of Geographical Information; Kresse, W., Danko, D.M., Eds.; Springer: New York, NY, USA, 2012; pp. 393–566. ISBN 978-3-540-72680-7.
  14. International Organization for Standardization. ISO 19157:2013 Geographic Information—Data Quality; ISO: Geneva, Switzerland, 2013.
  15. Senaratne, H.; Mobasheri, A.; Ali, A.L.; Capineri, C.; Haklay, M. A review of volunteered geographic information quality assessment methods. Int. J. Geogr. Inf. Sci. 2017, 31, 139–167.
  16. Talhofer, V.; Hošková-Mayerová, S.; Hofmann, A. Improvement of digital geographic data quality. Int. J. Prod. Res. 2012, 50, 4846–4859.
  17. Fonte, C.C.; Antoniou, V.; Bastin, L.; Estima, J.; Arsanjani, J.J.; Bayas, J.-C.L.; See, L.; Vatseva, R. Assessing VGI data quality. In Mapping and the Citizen Sensor; Foody, G., See, L., Fritz, S., Mooney, P., Olteanu-Raimond, A.-M., Fonte, C.C., Antoniou, V., Eds.; Ubiquity Press: London, UK, 2017; pp. 137–163. ISBN 978-1-911529-16-3.
  18. Mõisja, K.; Oja, T.; Uuemaa, E.; Hastings, J.T. Completeness and classification correctness of features on topographic maps: An analysis of the Estonian Basic Map. Trans. GIS 2017, 21, 954–968.
  19. Estonian Land Board. Estonian Basic Map. Available online: https://geoportaal.maaamet.ee/index.php?page_id=306&lang_id=2 (accessed on 20 March 2018).
  20. Foody, G.M. Status of land cover classification accuracy assessment. Remote Sens. Environ. 2002, 80, 185–201.
  21. Stehman, S.V. Sampling designs for accuracy assessment of land cover. Int. J. Remote Sens. 2009, 30, 5243–5272.
  22. Van Oort, P.A.J.; Bregt, A.K.; De Bruin, S.; De Wit, A.J.W.; Stein, A. Spatial variability in classification accuracy of agricultural crops in the Dutch national land-cover database. Int. J. Geogr. Inf. Sci. 2004, 18, 611–626.
  23. Haklay, M. How good is volunteered geographical information? A comparative study of OpenStreetMap and ordnance survey datasets. Environ. Plan. B Plan. Des. 2010, 37, 682–703.
  24. Girres, J.F.; Touya, G. Quality assessment of the French OpenStreetMap dataset. Trans. GIS 2010, 14, 435–459.
  25. Jackson, S.; Mullen, W.; Agouris, P.; Crooks, A.; Croitoru, A.; Stefanidis, A. Assessing completeness and spatial error of features in volunteered geographic information. ISPRS Int. J. Geo-Inf. 2013, 2, 507–530.
  26. Dorn, H.; Törnros, T.; Zipf, A. Quality evaluation of VGI using authoritative data—A Comparison with land use data in southern Germany. ISPRS Int. J. Geo-Inf. 2015, 4, 1657–1671.
  27. Robinson, A.H.; Morrison, J.L.; Muehrcke, P.C.; Kimerling, A.J.; Guptill, S.C. Elements of Cartography, 6th ed.; John Wiley & Sons: New York, NY, USA, 1995; ISBN 0471555797.
  28. Jakobsson, A.; Giversen, J. Guidelines for Implementing the ISO 19100 Geographic Information Quality Standards in National Mapping and Cadastral Agencies. Available online: http://www.eurogeographics.org (accessed on 20 March 2018).
  29. Pätynen, V.; Kemppainen, I.; Ronkainen, R. Testing for completeness and thematic accuracy of the national topographic data system in Finland. In Proceedings of the 18th International Cartographic Conference, Stockholm, Sweden, 23–27 June 1997; Ottoson, L., Ed.; Gävle Offset AB: Gävle, Sweden, 1997; pp. 1360–1367.
  30. Stevens, J.P.; Blackstock, T.H.; Howe, E.A.; Stevens, D.P. Repeatability of Phase 1 habitat survey. J. Environ. Manag. 2004, 73, 53–59.
  31. Kyriakidis, P.C.; Dungan, J.L. A geostatistical approach for mapping thematic classification accuracy and evaluating the impact of inaccurate spatial data on ecological model predictions. Environ. Ecol. Stat. 2001, 8, 311–330.
  32. Smith, J.H.; Wickham, J.D.; Stehman, S.V.; Yang, L. Impacts of patch size and land-cover heterogeneity on thematic image classification accuracy. Photogramm. Eng. Remote Sens. 2002, 68, 65–70.
  33. Tran, T.; Julian, J.; de Beurs, K. Land cover heterogeneity effects on sub-pixel and per-pixel classifications. ISPRS Int. J. Geo-Inf. 2014, 3, 540–553.
  34. Cherrill, A.; Mcclean, C. Between-observer variation in the application of a standard method of habitat mapping by environmental consultants in the UK. J. Appl. Ecol. 1999, 36, 989–1008.
  35. Hearn, S.M.; Healey, J.R.; McDonald, M.A.; Turner, A.J.; Wong, J.L.G.; Stewart, G.B. The repeatability of vegetation classification and mapping. J. Environ. Manag. 2011, 92, 1174–1184.
  36. Lawton, C.A. Gender differences in way-finding strategies: Relationship to spatial ability and spatial anxiety. Sex Roles 1994, 30, 765–779.
  37. Coluccia, E.; Louse, G. Gender differences in spatial orientation: A review. J. Environ. Psychol. 2004, 24, 329–340.
  38. Coluccia, E.; Iosue, G.; Brandimonte, M.A. The relationship between map drawing and spatial orientation abilities: A study of gender differences. J. Environ. Psychol. 2007, 27, 135–144.
  39. Matthews, M.H. The influence of gender on the environmental cognition of young boys and girls. J. Genet. Psychol. 1986, 147, 295–302.
  40. Reilly, D.; Neumann, D.L.; Andrews, G. Gender Differences in Spatial Ability: Implications for STEM Education and Approaches to Reducing the Gender Gap for Parents and Educators. In Visual-Spatial Ability: Transforming Research into Practice; Khine, M.S., Ed.; Springer International: Basel, Switzerland, 2017; pp. 195–224.
  41. Mõisja, K. Estonian Basic Map and its quality management. Trans. Estonia Agric. Univ. 2016 Balt. Surv. ’03 2003, 216, 135–142.
  42. Mardiste, H. Consequences of the Soviet map secrecy to national cartography in Estonia. In Geheimhaltung und Staatssicherheit. Zur Kartographie des Kaltes Krieges. Archiv zur DDR-Staatssicherheit; Unverhau, D., Ed.; LIT Verlag 9.1: Münster, Germany, 2009; pp. 107–118. ISBN 9783643100702.
  43. Li, D.; Zhang, J.; Wu, H. Spatial data quality and beyond. Int. J. Geogr. Inf. Sci. 2012, 26, 2277–2290.
  44. Mander, Ü.; Uuemaa, E.; Kull, A.; Kanal, A.; Maddison, M.; Soosaar, K.; Salm, J.-O.; Lesta, M.; Hansen, R.; Kuller, R.; et al. Assessment of methane and nitrous oxide fluxes in rural landscapes. Landsc. Urban Plan. 2010, 98, 172–181.
  45. Estonian Land Board. Eesti Põhikaardi 1:10,000 Digitaalkaardistuse Juhend. Available online: http://geoportaal.maaamet.ee/est/Andmed-ja-kaardid/Topograafilised-andmed/Eesti-pohikaart-110-000/Juhendid-ja-abifailid-p130.html (accessed on 2 April 2018).
  46. Mõisja, K.; Uuemaa, E.; Oja, T. Integrating small-scale landscape elements into land use/cover: The impact on landscape metrics’ values. Ecol. Indic. 2016, 67, 714–722.
  47. McGarigal, K.; Cushman, S.A.; Neel, M.C.; Ene, E. FRAGSTATS v4: Spatial Pattern Analysis Program for Categorical and Continuous Maps; Umass Landscape Ecology Lab: Amherst, MA, USA, 2012.
  48. Rempel, R.S.; Kaukinen, D.; Carr, A.P. Patch Analyst and Patch Grid; Ontario Ministry of Natural Resources, Centre for Northern Forest Ecosystem Research: Thunder Bay, ON, Canada, 2012.
  49. Bishop, C.M. Neural networks for pattern recognition. J. Am. Stat. Assoc. 1995, 92, 1642–1645.
  50. StataCorp LP. StataCorp LP Stata Statistical Software: Release 12; StataCorp LP: College Station, TX, USA, 2011.
  51. Riitters, K.H.; O’Neill, R.V.; Hunsaker, C.T.; Wickham, J.D.; Yankee, D.H.; Timmins, S.P.; Jones, K.B.; Jackson, B.L. A factor analysis of landscape pattern and structure metrics. Landsc. Ecol. 1995, 10, 23–39.
  52. Lausch, A.; Herzog, F. Applicability of landscape metrics for the monitoring of landscape change: Issues of scale, resolution and interpretability. Ecol. Indic. 2002, 2, 3–15.
  53. Cushman, S.A.; McGarigal, K.; Neel, M.C. Parsimony in landscape metrics: Strength, universality, and consistency. Ecol. Indic. 2008, 8, 691–703.
  54. Schindler, S.; Poirazidis, K.; Wrbka, T. Towards a core set of landscape metrics for biodiversity assessments: A case study from Dadia National Park, Greece. Ecol. Indic. 2008, 8, 502–514.
  55. Fisher, P.; Comber, A.; Wadsworth, R. Approaches to uncertainty in spatial data. In Fundamentals of Spatial Data Quality; Devillers, R., Jeansoulin, R., Eds.; ISTE: London, UK, 2006; pp. 43–59. ISBN 9781905209569.
  56. Cherrill, A. Inter-observer variation in habitat survey data: Investigating the consequences for professional practice. J. Environ. Plan. Manag. 2016, 59, 1813–1832.
  57. Antoniou, V.; Skopeliti, A. Measures and indicators of VGI quality: An overview. In Proceedings of the ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, La Grande Motte, France, 28 September–3 October 2015; Volume II-3/W5, pp. 345–351.
  58. Schmitz, S. Gender-related strategies in environmental development: Effects of anxiety on wayfinding in and representation of a three-dimensional maze. J. Environ. Psychol. 1997, 17, 215–228.
  59. Lawton, C.A.; Kallai, J. Gender differences in wayfinding strategies and anxiety about wayfinding: A cross-cultural comparison. Sex Roles 2002, 47, 389–401.
Figure 1. Overview of the ISO 19157:2013 data quality elements (modified from [14]). The focus of the present study is highlighted.
Figure 2. Locations of the sites (red dots) where quality control was performed.
Figure 3. Plot of the mean factor values of landscape factors and built-up areas for the three landscape clusters (types) and examples of maps for those landscape clusters: (1) built-up-diverse landscape; (2) open-simple landscape; and (3) closed-complex landscape.
Figure 4. Box plots of the error rates by field workers based on (a) gender (M—male; F—female) and (b) years of experience. For each fieldworker, we calculated the median value across the sites they examined.
Figure 5. Relationship between the field workers’ (n = 21) years of experience and their median misclassification, commission, and omission (MCO) error rate across sites (all error types summed for one site).
Figure 6. Box plots for the rates of misclassification, commission, and omission errors in the different landscapes defined in Table 4. For a given error type, based on the Kruskal–Wallis multiple comparison of mean ranks for all groups: 1 = statistically significant difference from built-up-diverse, 2 = statistically significant difference from open-simple, 3 = statistically significant difference from closed-complex.
Figure 7. Box plots of the summed values of MCO error rates (all three categories combined) by field workers in the three landscape types defined in Table 4. Field workers who mapped all three landscape types are shaded grey.
Table 1. Quality elements and measures of each element according to the ISO 19157 [14] standard used in the study.

| Quality Element | Quality Measure |
|---|---|
| Misclassification | Error rate of lines |
| | Error rate of points |
| | Error rate of polygons |
| Omission | Error rate of lines |
| | Error rate of points |
| Commission | Error rate of lines |
| | Error rate of points |
Table 2. Landscape indicators used in the study. For a more detailed description, see Rempel et al. [48].

| Landscape Indicator Type | Landscape Indicator |
|---|---|
| Diversity metrics | SDI: Shannon’s diversity index |
| | SEI: Shannon’s evenness index |
| Shape metrics | AWMSI: area-weighted mean shape index |
| | MSI: mean shape index |
| | MPAR: mean perimeter–area ratio |
| | MPFD: mean patch fractal dimension |
| | AWMPFD: area-weighted mean patch fractal dimension |
| Edge metrics | ED: edge density |
| | MPE: mean patch edge |
| Patch density and size metrics | MPS: mean patch size |
| | PD: patch density |
| | PRD: patch richness density |
| | MedPS: median patch size |
| | PSCoV: patch size coefficient of variance |
| | PSSD: patch size standard deviation |
| Land use composition | OV: proportion of land use creating open viewsheds in the landscape |
| | CV: proportion of land use creating closed viewsheds in the landscape |
| | BU: proportion of built-up areas in the landscape |
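As an illustration of the two diversity metrics listed in Table 2, the following minimal sketch computes Shannon's diversity and evenness indices from land-use proportions. It assumes the standard definitions (SDI = −Σ p_i ln p_i, and SEI = SDI divided by the natural logarithm of the number of classes present); the exact implementation in Patch Analyst [48] may differ in detail, and the function names and example proportions are hypothetical, not taken from the study.

```python
import math


def shannon_diversity(proportions):
    """Shannon's diversity index: -sum(p_i * ln(p_i)) over classes with p_i > 0."""
    return -sum(p * math.log(p) for p in proportions if p > 0)


def shannon_evenness(proportions):
    """Shannon's evenness index: SDI divided by its maximum, ln(number of classes)."""
    present = [p for p in proportions if p > 0]
    if len(present) < 2:
        # With a single class, evenness is undefined; return 0 by convention here.
        return 0.0
    return shannon_diversity(present) / math.log(len(present))


# Hypothetical land-use shares for two sites (illustrative values, not study data)
even_site = [0.25, 0.25, 0.25, 0.25]    # four equally common land-use types
uneven_site = [0.85, 0.05, 0.05, 0.05]  # one dominant land-use type

for name, shares in [("even", even_site), ("uneven", uneven_site)]:
    print(name, round(shannon_diversity(shares), 3), round(shannon_evenness(shares), 3))
```

A site dominated by one land-use type yields lower values of both indices than a site with the same number of types in equal shares, which is the sense in which the built-up-diverse cluster is "diverse".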
Table 3. Results of the factor analysis using the varimax rotation. Significant values are in bold (p < 0.05).

| | Factor 1: Diversity | Factor 2: Patch Size Distribution | Factor 3: Closure | Factor 4: Patch Shape Complexity |
|---|---|---|---|---|
| Eigenvalue | 6.93 | 4.39 | 2.43 | 1.06 |
| Cumulative % of variance | 38.52 | 62.92 | 76.41 | 82.32 |
| % Total variance | 38.52 | 24.40 | 13.49 | 5.91 |
| Factor Loadings (after Varimax Rotation) | | | | |
| SDI | 0.92 | 0.12 | −0.07 | 0.11 |
| SEI | 0.90 | 0.04 | 0.01 | 0.17 |
| AWMSI | 0.41 | 0.56 | 0.10 | −0.04 |
| MSI | −0.07 | 0.09 | 0.30 | 0.90 |
| MPAR | −0.10 | −0.78 | 0.00 | 0.51 |
| MPFD | 0.22 | 0.86 | 0.29 | −0.06 |
| AWMPFD | 0.83 | −0.34 | 0.14 | −0.22 |
| ED | 0.94 | 0.17 | 0.15 | −0.05 |
| MPE | −0.36 | 0.71 | 0.40 | 0.37 |
| MPS | −0.76 | 0.43 | 0.10 | 0.26 |
| MedPS | −0.32 | 0.38 | 0.46 | 0.21 |
| PSCoV | −0.03 | 0.92 | −0.04 | 0.07 |
| PSSD | −0.51 | 0.75 | 0.06 | 0.20 |
| PRD | 0.15 | −0.74 | −0.03 | −0.31 |
| PD | 0.92 | −0.14 | −0.13 | −0.23 |
| OV | −0.15 | −0.10 | −0.94 | −0.08 |
| CV | 0.65 | −0.02 | −0.45 | −0.33 |
| BU | −0.10 | 0.08 | 0.92 | 0.19 |
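The variance rows of Table 3 can be related to the eigenvalues with a short calculation. The sketch below assumes that the factor analysis was run on standardized indicators (a correlation matrix), so that the total variance equals the number of indicators (18 rows of loadings); that assumption is ours and is not stated in the table. Under it, each factor's share of variance is its eigenvalue divided by 18, and the results agree with the published percentages to within rounding of the eigenvalues.

```python
# Eigenvalues of the four retained factors, as reported in Table 3
eigenvalues = [6.93, 4.39, 2.43, 1.06]

# Number of landscape indicators entering the analysis (rows of loadings in Table 3).
# Assumption: indicators were standardized, so total variance = number of indicators.
n_indicators = 18

# Share of total variance explained by each factor, in percent
percent_variance = [100 * ev / n_indicators for ev in eigenvalues]

# Running total of explained variance
cumulative_percent = []
running_total = 0.0
for pct in percent_variance:
    running_total += pct
    cumulative_percent.append(running_total)

# Values published in Table 3, for comparison
reported_percent = [38.52, 24.40, 13.49, 5.91]
reported_cumulative = [38.52, 62.92, 76.41, 82.32]

for i, (pct, cum) in enumerate(zip(percent_variance, cumulative_percent), start=1):
    print(f"Factor {i}: {pct:.2f}% of variance, {cum:.2f}% cumulative")
```

The small differences from the published values (a few hundredths of a percentage point) are consistent with the eigenvalues having been rounded to two decimals.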
Table 4. Means, standard deviations, and variances of factor values for the landscape clusters. N is number of sites, and FW is the number of field workers who mapped sites in that cluster. Landscape factors were derived from the factor analysis (Table 3). Cluster 1: Built-Up-Diverse (N = 17; FW = 7); Cluster 2: Open-Simple (N = 37; FW = 14); Cluster 3: Closed-Complex (N = 39; FW = 19).

| Landscape Indicator or Factor | Cluster 1 Mean | Cluster 1 St.Dev. | Cluster 1 Variance | Cluster 2 Mean | Cluster 2 St.Dev. | Cluster 2 Variance | Cluster 3 Mean | Cluster 3 St.Dev. | Cluster 3 Variance |
|---|---|---|---|---|---|---|---|---|---|
| Built-up area | 1.63 | 1.19 | 1.41 | −0.34 | 0.44 | 0.2 | −0.39 | 0.42 | 0.18 |
| Diversity | 1.3 | 1.01 | 1.02 | −0.68 | 0.67 | 0.45 | 0.08 | 0.58 | 0.34 |
| Patch size distribution | 0.07 | 0.98 | 0.96 | 0.63 | 0.99 | 0.97 | −0.63 | 0.54 | 0.29 |
| Closure | −0.59 | 0.51 | 0.26 | −0.37 | 0.81 | 0.66 | 0.6 | 1.01 | 1.03 |
| Patch complexity | −0.44 | 0.84 | 0.71 | −0.06 | 0.74 | 0.54 | 0.25 | 1.21 | 1.46 |
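One way to read the cluster means in Table 4 is as deviations from the overall average across all 93 sites. The sketch below assumes that the factor values (and the built-up area indicator) were standardized to a mean of zero across sites; this is our assumption, although it is what the reported numbers suggest. If it holds, the site-weighted mean of the three cluster means should be close to zero for every row. The helper name `weighted_mean` is hypothetical.

```python
# Number of sites in clusters 1-3 (N in Table 4)
n_sites = [17, 37, 39]

# Cluster means from Table 4, ordered as clusters 1, 2, 3
cluster_means = {
    "Built-up area": [1.63, -0.34, -0.39],
    "Diversity": [1.30, -0.68, 0.08],
    "Patch size distribution": [0.07, 0.63, -0.63],
    "Closure": [-0.59, -0.37, 0.60],
    "Patch complexity": [-0.44, -0.06, 0.25],
}


def weighted_mean(means, weights):
    """Mean of group means, weighted by group size."""
    return sum(m * w for m, w in zip(means, weights)) / sum(weights)


# Overall mean of each indicator or factor across all sites
overall_means = {
    name: weighted_mean(means, n_sites) for name, means in cluster_means.items()
}

for name, value in overall_means.items():
    print(f"{name}: {value:+.3f}")
```

Because every overall mean is near zero, a positive cluster mean (for example, built-up area in cluster 1) indicates an above-average value for that cluster, and a negative one a below-average value.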
Table 5. Field workers’ gender, years of experience in field mapping, and number of mapped sites they investigated in the different landscape types described in Table 4. M = male; F = female.

| Field Worker ID | Gender | Years of Experience | Total Number of Inspected Sites | Sites in Built-Up-Diverse | Sites in Open-Simple | Sites in Closed-Complex |
|---|---|---|---|---|---|---|
| 1 | M | 6 | 3 | 0 | 1 | 2 |
| 2 | F | 6 | 8 | 2 | 4 | 2 |
| 3 | M | 5 | 1 | 0 | 1 | 0 |
| 4 | M | 2 | 6 | 0 | 5 | 1 |
| 5 | M | 4 | 4 | 0 | 2 | 2 |
| 6 | M | 7 | 11 | 6 | 2 | 3 |
| 7 | M | 7 | 5 | 0 | 4 | 1 |
| 8 | F | 11 | 6 | 0 | 4 | 2 |
| 9 | M | 7 | 10 | 0 | 4 | 6 |
| 10 | M | 7 | 9 | 4 | 2 | 3 |
| 11 | M | 5 | 3 | 0 | 1 | 2 |
| 12 | M | 5 | 2 | 0 | 0 | 2 |
| 13 | M | 6 | 8 | 0 | 4 | 4 |
| 14 | M | 7 | 3 | 1 | 0 | 2 |
| 15 | F | 7 | 1 | 1 | 0 | 0 |
| 16 | F | 8 | 3 | 1 | 1 | 1 |
| 17 | M | 3 | 1 | 0 | 0 | 1 |
| 18 | F | 8 | 6 | 2 | 2 | 2 |
| 19 | M | 5 | 1 | 0 | 0 | 1 |
| 20 | F | 8 | 1 | 0 | 0 | 1 |
| 21 | M | 7 | 1 | 0 | 0 | 1 |
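The rows of Table 5 can be cross-checked against the cluster sizes in Table 4 with a few lines of code. The following sketch encodes the table as tuples and recomputes the number of sites and field workers per landscape type, the workers who mapped all three types (those shaded grey in Figure 7), and the workers with short experience. The four-year cutoff for "short experience" is an illustrative assumption of ours, not a threshold defined in the study.

```python
# Table 5 rows: (worker ID, gender, years of experience,
#                sites in built-up-diverse, open-simple, closed-complex)
workers = [
    (1, "M", 6, 0, 1, 2), (2, "F", 6, 2, 4, 2), (3, "M", 5, 0, 1, 0),
    (4, "M", 2, 0, 5, 1), (5, "M", 4, 0, 2, 2), (6, "M", 7, 6, 2, 3),
    (7, "M", 7, 0, 4, 1), (8, "F", 11, 0, 4, 2), (9, "M", 7, 0, 4, 6),
    (10, "M", 7, 4, 2, 3), (11, "M", 5, 0, 1, 2), (12, "M", 5, 0, 0, 2),
    (13, "M", 6, 0, 4, 4), (14, "M", 7, 1, 0, 2), (15, "F", 7, 1, 0, 0),
    (16, "F", 8, 1, 1, 1), (17, "M", 3, 0, 0, 1), (18, "F", 8, 2, 2, 2),
    (19, "M", 5, 0, 0, 1), (20, "F", 8, 0, 0, 1), (21, "M", 7, 0, 0, 1),
]

TYPE_COLUMNS = (3, 4, 5)  # tuple positions of the three landscape types

# Sites and field workers per landscape type (compare with N and FW in Table 4)
sites_per_type = [sum(w[i] for w in workers) for i in TYPE_COLUMNS]
workers_per_type = [sum(1 for w in workers if w[i] > 0) for i in TYPE_COLUMNS]

# Workers who mapped at least one site in every landscape type
mapped_all_types = [w[0] for w in workers if all(w[i] > 0 for i in TYPE_COLUMNS)]

# Workers with short experience; the 4-year cutoff is an illustrative assumption
SHORT_EXPERIENCE_YEARS = 4
short_experience = [w[0] for w in workers if w[2] <= SHORT_EXPERIENCE_YEARS]

gender_counts = {g: sum(1 for w in workers if w[1] == g) for g in ("M", "F")}

print("Sites per type:", sites_per_type)
print("Workers per type:", workers_per_type)
print("Mapped all three types:", mapped_all_types)
print("Short experience:", short_experience)
print("Gender counts:", gender_counts)
```

Only five of the 21 field workers mapped sites in all three landscape types, and only three had four or fewer years of experience, which illustrates why the effects of worker characteristics and landscape type were hard to separate in this data set.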

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).