Spatial data quality has been the subject of discussions for more than 30 years [1]. During this period, researchers have conducted several academic studies on error or uncertainty modelling [7] and on how to communicate data quality information [10]. In addition, several international standards related to data quality have been established [13
]. Standardisation work has been completed under the ISO 19157 standard [14
], which clarifies the scope of data quality, defines the elements and measures of quality, describes quality assessment procedures, provides guidelines for reporting the results of a quality evaluation, and introduces the concept of metaquality. Metaquality provides information about a quality evaluation and its results, describing the suitability of the evaluation method, the measure or measures that were applied, and the results obtained [14
]. Nevertheless, confusion in using quality terms may still occur. For example, completeness and thematic accuracy have been defined as quality measures by Senaratne et al. [15], whereas ISO 19157 [14] defines them as quality elements.
In order to describe spatial data quality, ISO 19157 provides 21 quality elements (Figure 1
), which have been organized into six categories: completeness, thematic accuracy, logical consistency, temporal quality, positional accuracy, and usability. However, the standard allows for expansion of the list, so that one can always add more quality elements. Talhofer et al. [16
] suggested using five essential criteria for the evaluation of spatial data quality based on value-analysis theory: database content, database technical quality, database timeliness, area importance, and user friendliness; the latter is intended to consider data quality from the user’s point of view. Fonte et al. [17
] proposed additional quality indicators for volunteered geographic information (VGI).
In the present study, we focused on three data quality elements: classification correctness, commission, and omission. Hereafter, we will refer to these as misclassification, commission, and omission (MCO) errors. To make our discussion less abstract, we will refer to the topographical data produced by the Estonian Land Board (Estonia’s national mapping agency) [18
]. The Estonian topographic dataset includes not only information about relief, but also information about infrastructure (e.g., roads and electric power lines), settlements, hydrography, and land use [19
]. In the current study, we did not consider relief and focused only on the other map elements. To assess MCO errors, we used direct evaluation methods [14
]. In this approach, data are compared with ground-based data obtained by means of field-mapping, or with reference data such as highly accurate maps or imagery [20
]. Using reference data for assessment of spatial data quality is common practice in remote sensing [22
] and in VGI [23
]. However, large-scale topographical maps (typically at a scale of ≥1:50,000 [27
]) focus on small areas, and in habitat mapping, more accurate maps are rarely available. Moreover, the large-scale topographical datasets produced by national mapping agencies are very often used as reference data themselves [28
]. Therefore, ground-based data are often used to assess MCO errors in topographical or habitat data [29].
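To make the direct evaluation approach concrete, the following minimal sketch compares a mapped dataset with ground-based reference data and derives the three MCO error rates. It is an illustration only, not the procedure used in this study: the function name `mco_error_rates`, the feature IDs, the class labels, and the choice to express all three rates relative to the number of reference features are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McoRates:
    misclassification: float
    commission: float
    omission: float


def mco_error_rates(mapped: dict[str, str], reference: dict[str, str]) -> McoRates:
    """Compare mapped feature classes with ground-reference classes.

    Both arguments map a feature ID to its class label. All rates are
    expressed relative to the number of reference features, which is an
    assumption of this sketch rather than a rule taken from the source.
    """
    if not reference:
        raise ValueError("reference must contain at least one feature")

    mapped_ids = set(mapped)
    reference_ids = set(reference)

    # Commission: features on the map that do not exist on the ground.
    commission = len(mapped_ids - reference_ids)
    # Omission: features on the ground that are missing from the map.
    omission = len(reference_ids - mapped_ids)
    # Misclassification: features present in both, but with different classes.
    misclassified = sum(
        1 for fid in mapped_ids & reference_ids if mapped[fid] != reference[fid]
    )

    n = len(reference_ids)
    return McoRates(misclassified / n, commission / n, omission / n)


# Hypothetical example data: four ground features, four mapped features.
reference = {"f1": "ditch", "f2": "path", "f3": "culvert", "f4": "grassland"}
mapped = {"f1": "ditch", "f2": "cutline", "f4": "grassland", "f5": "building"}
rates = mco_error_rates(mapped, reference)
print(rates)
```

In this hypothetical example, one feature is misclassified (`f2`), one is a commission error (`f5`), and one is an omission error (`f3`). Other denominators, such as the number of mapped features for commission, are equally defensible and would change the reported rates.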
The quality of spatial data may vary spatially [31
]. If mapping is done by fieldwork, the quality may be variable as a result of differences in the complexity of the landscape and in the mapping skills of individual workers, as well as in their perception of the landscape. Smith et al. [32
], van Oort et al. [22
], and Tran et al. [33
] found that the probability of correct classification of satellite images depended on landscape complexity. The probability of correct classification was higher in more homogeneous landscapes and lower in heterogeneous landscapes. The effect of the mapping skills of the individual mappers on habitat mapping quality was tested by Cherrill and McClean [34
]. In their study, six field workers independently surveyed the same area. Only 7.9% of the total study area was classified as the same land cover type by all six surveyors. This indicates that potentially large inconsistencies in mapping may result from differences in mapping skills among field surveyors. Hearn et al. [35
] found that increasing years of experience and experience with mapping certain landscape types improved mapping quality, although mapping time, cost, and length of route did not correlate with the mean level of agreement among surveyors. Moreover, several studies [36
] highlighted the possibility of gender differences in spatial ability (i.e., men and women interpret spaces differently). Reilly et al. [40
] noted that gender gaps in spatial ability were the largest of all gender differences in cognitive abilities.
However, to our knowledge, the interaction between the impact of gender, years of experience, and landscape heterogeneity has not yet been investigated.
The main aims of the present study were to determine whether and how MCO errors differed among field workers and whether any differences were influenced by landscape heterogeneity. We hypothesised that mapping quality would decrease in heterogeneous landscapes and landscapes with relatively closed viewsheds, and that it would be affected by the worker’s characteristics.
4. Discussion and Conclusions
Based on the factor and cluster analysis, we divided the study sites into three different landscape types: built-up-diverse, open-simple, and closed-complex. We found that the rates of MCO errors in these different landscape types differed significantly. The lowest error rates were in the built-up-diverse landscapes. This is likely a result of the fact that buildings are very distinctive features of the landscape and are very easy to recognize in the field. At the same time, buildings increase the landscape diversity, which might increase the attention of the field worker, leading to fewer mistakes. In addition, built-up-diverse landscapes had high openness, which increases the visibility of features such as buildings and thus eases mapping. The variation of error rates was also lowest in this landscape type, which in turn might be caused by the lower number of sites in this landscape type (i.e., fewer workers are required to map a smaller number of sites, leading to decreased variation).
In contrast, the closed-complex landscape types were mainly forested areas, and the error rates and variation of the error rates were highest. This can probably be explained by the mapping technology. Stereophotogrammetry was used for mapping all of the objects that could be recognized from the aerial photos, but the class of the object often could not be determined clearly by this method. In addition, many objects could be hidden by dense forest cover and not detected at all; this may mean that the field worker will not look for those features. Mõisja et al. [18
] found that the features with the highest error frequency were culverts, paths, forest cutlines (8-m-wide areas in which trees had been removed), and ditches, which are often not visible from the aerial photos. These features were common in the forested areas and at the edges of forested areas. Therefore, their presence would lead to a high error rate in this landscape type. Moreover, the closed-complex landscapes have low visibility, which requires more work to cover the landscape on foot.
In terms of the quality elements, misclassification was the most common error type, regardless of the landscape type and field worker. This is also likely to be because of the mapping method, in which all features that are recognizable from the aerial photos have been put on the map based on stereophotogrammetry, followed by subsequent verification of these features during the field work [18
]. Misclassification mainly results from uncertainty in classifying some objects, because some natural features are hard to consistently place in the same class, and it can be difficult to draw clear borders between certain features [35
]. Some misclassification mistakes may also occur when the field worker has not actually checked the feature in the field. The misclassification errors are most frequent between relatively similar classes, in which case the errors may be unimportant from a practical perspective, whereas other errors may be highly significant [20
]. In contrast with the classification of remote sensing images, in which commission errors are more common than omission errors, we found that commission errors were least common. This is likely because the field workers mostly checked the features detected from the aerial photos; in contrast, field checks are often not performed in the classification of remote sensing images.
When we explored each landscape type separately, we found large variation in error rates within each landscape type. This indicates that the individual characteristics of field workers have some effect on the mapping quality. Mõisja et al. [18
] reached the same conclusion in their investigation of the distribution of errors among feature types and field workers. For example, they found that more than half the errors related to grasslands and narrow ditches were made by just two field workers; similarly, most of the mistakes related to culverts and open spaces were made by three field workers. This means that some field workers are not successful in correctly mapping specific features. This agrees with studies that explored the accuracy of vegetation mapping, which found that individual skills affected the values of MCO error rates [34
]. Moreover, our results indicated that individual characteristics may have had a larger effect on mapping quality in some landscapes than in others. For example, in the built-up-diverse landscapes, both the overall error rates and their variation were lower than in the closed-complex and open-simple landscapes, which suggests that built-up areas were easier to map than natural areas. In contrast, vegetation communities that are similar in species composition and appearance can easily be confused during fieldwork in natural areas [35].
However, when we compared the five field workers who mapped all three landscape types, we found that four out of the five field workers had the lowest error rates in built-up-diverse landscapes and that most of them also had the highest error rates in closed-complex landscapes. We could not identify a statistically significant relationship, but there was still a visible trend that more open and simple landscapes are mapped with higher quality, most likely because they often are easier to access and because their better visibility enables higher mapping quality.
Traditionally, mapping has been conducted by trained and experienced professionals. However, in the modern age, many volunteers are mapping the world (i.e., VGI). VGI has become an increasingly common source of spatial data; thus, there is a growing need to assess the accuracy of VGI data [15
]. In this context, it is interesting to observe the differences between professionals and volunteers. Girres and Touya [24
], Haklay [23
], and Dorn et al. [26
] found that in urbanized areas, the completeness and classification correctness were higher than in rural areas, which agrees with the present results, in which the built-up areas were mapped with better quality. Girres and Touya [24
] found that differences in mapping completeness were caused by the fact that VGI contributors are more focused on capturing attractive objects. They found that the main effect of landscape was that it biased the area and features that volunteers preferred to map. Moreover, Haklay [23
] showed that mapping quality varied among the different parts of London; it was lower in poorer regions. Therefore, the quality of VGI spatial data can exhibit high spatial variation.
Several studies that investigated the differences in spatial ability and orientation between men and women found that men and women interpret space differently, and most of the studies suggested that men had better spatial orientation abilities [37
]. Coluccia et al. [38
] also pointed out that in their study of volunteer classifiers, men approached maps from a global perspective (the pattern of routes), whereas women focused on local features (landmarks). This might give women an advantage in mapping. Our results showed that women generally had lower error rates, although the difference from men was not statistically significant. This might be because all field workers in our study were trained before mapping and were professionals, whereas the participants in previous studies were mostly volunteers with no previous training or professional mapping experience. This demonstrates the importance of training, particularly if a worker’s classifications can be assessed quickly to detect errors so that they can be trained to avoid these errors.
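A comparison of error rates between two groups of field workers can be checked for significance in several ways. The sketch below uses a two-sided permutation test on the difference in mean error rate, which suits small samples because it makes no distributional assumptions. It is not necessarily the test applied in this study; the function name `permutation_test` and the error rates in the example are hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean error rate.

    Returns an estimated p-value: the share of random relabellings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign workers to the two groups, keeping group sizes.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical per-worker error rates (errors per mapped feature).
women_rates = [0.031, 0.042, 0.027, 0.038]
men_rates = [0.045, 0.029, 0.051, 0.040, 0.036, 0.048]
p_value = permutation_test(women_rates, men_rates, n_permutations=2_000)
print(f"estimated p-value: {p_value:.3f}")
```

With so few workers per group, such a test has little power, so a non-significant result should be read as an absence of evidence for a difference rather than as evidence that no difference exists.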
In addition, several studies have shown that people navigate their environment better when they feel safe [36
]. In our study, however, field workers could choose their preferred landscapes; this means that if a worker did not feel confident or safe in the forested areas, then they were assigned to map open and built-up areas. Only 23% of the closed-complex landscapes were mapped by women, whereas 43% of the open-simple landscapes were mapped by women, who indicated a preference for open landscapes. This might also partially explain why there was no significant difference in mapping quality between genders.
We found that the mapping quality appeared to be influenced by the field workers’ years of experience. There was a general decreasing trend in error rates with increasing years of experience; however, the trend was not statistically significant. The field worker with the fewest years of experience had one of the lowest error rates, which was an unexpected result; we expected that longer mapping experience would result in better mapping quality. Instead, we saw a large increase in mapping errors during the third and fourth years of work. We hypothesize that this is because the work had become routine, possibly leading to excessive self-confidence and a failure to consult the mapping guidelines, and thus to mistakes. Subsequently, the mapping quality improved again, which indicated the influence of experience on mapping quality. Therefore, the relationship between years of experience and mapping quality might not be linear, but rather U-shaped. However, our study did not provide enough data to confirm this hypothesis because there were too few field workers with short experience. Hearn et al. [35
] found similar results, with years of experience not significantly correlated with classification correctness in habitat mapping. This is probably because vegetation mapping is also more subjective, and it is more complicated to determine clear vegetation borders and vegetation types; even after many years of experience, this remains a complex task.
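One way to examine a possible non-linear relationship between experience and mapping quality is to fit a second-order polynomial to the error rates. The sketch below does this with ordinary least squares in plain Python. The function name `fit_quadratic` and the data are hypothetical and are not values from this study. Note that a U-shaped curve for mapping quality corresponds to an inverted U for error rates, that is, a negative quadratic coefficient.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    if len(xs) != len(ys) or len(set(xs)) < 3:
        raise ValueError("need paired data with at least three distinct x values")

    # Power sums for the normal equations (X^T X) beta = X^T y.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x4 matrix of the normal equations.
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]

    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]

    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        known = sum(m[i][j] * beta[j] for j in range(i + 1, 3))
        beta[i] = (m[i][3] - known) / m[i][i]
    return tuple(beta)


# Hypothetical data: error rate (%) by years of experience.
years = [1, 2, 3, 4, 5, 6, 7, 8]
error_rates = [2.0, 3.5, 5.0, 5.2, 4.0, 3.0, 2.2, 1.5]
a, b, c = fit_quadratic(years, error_rates)

# A negative c means error rates rise and then fall with experience.
print(f"quadratic coefficient: {c:.3f}")
if c != 0:
    print(f"turning point at about {-b / (2 * c):.1f} years")
```

With only a handful of field workers, such a fit is descriptive at best; confirming the shape of the relationship would require more observations, particularly for workers with little experience.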
In conclusion, we found that the quality of mapping varied among the landscape types. Some landscape types showed higher correctness (built-up-diverse) than others (open-simple and closed-complex). This is most likely because man-made objects are easier to identify than natural vegetation, where the similarity of species composition between different vegetation types might be confusing and where drawing clear borders between different vegetation types is harder. The mapping quality was also generally higher in more open landscapes because better visibility decreases the risk of MCO errors. Interestingly, there was no statistically significant difference in mapping quality between men and women. Similarly, although there was a trend of decreasing error rates with increasing years of experience, it was not statistically significant: the field worker with the fewest years of experience had among the lowest error rates, field workers with average experience showed the poorest results, and field workers with the most extensive experience showed improved mapping quality. In the current study, it was impossible to clearly differentiate the effect of the individual characteristics of field workers on the mapping quality from the effect of landscape, partly because the number of field workers was limited and partly because these effects are interrelated and inherently hard to separate. Our results suggest that mapping quality may be improved if field workers can choose their preferred landscape. In addition, improving the mapping guidelines for forested areas would help to reduce errors that can be avoided by proper fieldwork. Monitoring fieldwork to detect errors, so that workers can be trained to avoid such errors in the future, would also improve mapping accuracy.