Article

Comparison of Land Cover Categorical Data Stored in OSM and Authoritative Topographic Data

by Sylwia Borkowska *, Elzbieta Bielecka and Krzysztof Pokonieczny
Faculty of Civil Engineering and Geodesy, Military University of Technology, 00-908 Warsaw, Poland
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(13), 7525; https://doi.org/10.3390/app13137525
Submission received: 11 May 2023 / Revised: 18 June 2023 / Accepted: 23 June 2023 / Published: 26 June 2023

Abstract

This study aims at a comparative analysis of quantitative data, namely, OSM and BDOT10k. Analyses were conducted in a 1 km² hexagonal grid, in seven test counties located in different regions of Poland, differing in the degree of urbanization, land cover and natural environment. It is assumed that the authors’ consolidated regional classification of the Compound Correspondence Index (CCIRn), attributed to the geometric mapping unit based on TOPSIS values, and their statistical measure of dispersion enable the comparison of datasets for individual geographically disjoint areas according to uniform criteria, e.g., the number of topographic features stored in the analyzed datasets, both polygonal (buildings, forests, surface water) and linear (roads, watercourses, railroads). The final results of the regional assessment outperform the local classification, giving a higher level of data compliance. Overestimation of regional concordance ranges from 9 to 20% of the county area, with an average of 3% reduction in the area where the two datasets (BDOT10k and OSM) have comparable information ranges. Areas of medium and high nonconformity are decreased by an average of 2.4%.

1. Introduction

Information is a key asset in the 21st century, referred to as the information age. The information age is, in turn, inextricably linked to the information society, a society based on information and knowledge. Furthermore, geospatial (geographical) information (GI) plays, as noted by [1,2], a crucial role among available information, as it is the basis for 60–80% of decisions made by public administrations [3]. In the third decade of the 21st century, geospatial information forms a foundation of the information-based society. Geographic information and GI technology are present in almost every sector of science, the economy and industry. Nowadays, in addition to traditional maps and databases, they offer a plethora of web applications that can solve real-world problems. Furthermore, these analyses are presented in a form understandable to the end user and the consumer of information, and add value to the economy, science and society.
The large amount and variety of data not only create new challenges in effective management and analysis, but also create opportunities to explore the potential value of data. Data, including geospatial data, are not error-free, and the results of their analysis are burdened with uncertainty. Data entry errors lead to inaccurate geospatial data, which in turn has wide implications and, as discussed by Bielecka and Burek [4], significantly affects the results of the final analysis, and can lead to wrong decisions and financial losses. Therefore, the GI society is tirelessly striving to improve the reliability of geospatial data, proposing methods for assessing quality and reliability and testing the suitability of data for a specific task (their fitness for purpose). Data quality assessment is particularly important in the context of selecting a dataset that best meets the user’s needs. The selection of fit-for-purpose data is usually supported by an analysis of their quality or completeness, often understood as information capacity. The analysis is carried out mainly by comparison with reference data considered to be very reliable. The motivation for this study is therefore to develop a composite measure that allows a comparative assessment of the completeness of two datasets. Contrary to studies conducted so far, described in Section 2, the assessment of completeness refers to the basic analytical unit, facilitating a detailed assessment of completeness in particular geographical locations. The comprehensive methodology developed is based on a compensatory comparative analysis (TOPSIS), not previously used in categorical data evaluation, using the linear ranking of the authoritative Compound Correspondence Index (CCI) and statistical measures of dispersion for its explicit spatial visualization.
Therefore, this study aims to compare the volume of information (capacity) of two open spatial datasets: (1) Database of Topographic Objects (BDOT10k), made available on an open basis by the Head Office of Geodesy and Cartography, and (2) OpenStreetMap, created by volunteers from all over the world. This methodology contributes to both academics and practitioners by helping a user select the best (fit-for-purpose) spatial dataset available for the task at hand. The decision behind the choice of topographic data, both authoritative and voluntary, lies in their wide use in a plethora of applications, e.g., environmental analysis [5,6], protecting and validating landscapes [7,8], and sustainable rural and urban planning [9,10]. The elaborated methodological framework is intended to facilitate cartographic modeling in the analysis of selecting the best geospatial dataset, namely, the data that are most fit for purpose.

2. Related Works

2.1. Categorical Data Comparison—Literature Review

The literature on geospatial data quality is extremely rich, and studies on the implementation of machine learning solutions as part of data quality improvement strategies are gaining popularity. This brief review will thus cover only one aspect, namely, the comparative analysis of maps and datasets containing categorical data. Categorical data typically result from mapping, classification, or modeling; hence most of the literature in the field of qualitative data comparison deals with land cover/land use or landscape data [11]. Differences between categorical maps can be characterized and measured in a variety of ways [12], from descriptive or inferential statistics to advanced data mining. However, two approaches dominate among comparative studies of categorical data: the first is based on a cross-tabulation matrix that summarizes the degree of data association, and the second uses spatial and statistical analysis to compare data in locational and qualitative dimensions.
The Cohen kappa coefficient of agreement derived from a cross-tabulation matrix, i.e., the relative rating of two or more classifications based on the proportion of correctly allocated cases, dominates most of the literature [13,14]. The Kappa coefficient can also be used with missing data, as pointed out by De Raadt et al. [15]. The authors analyzed three kappa coefficients: Gwet’s kappa, regular kappa, and listwise deletion kappa. They found that both Gwet’s kappa and listwise deletion kappa outperform regular category kappa in terms of bias, and generally have a very small mean squared error. It is even possible to use weighted kappa, introduced by Cohen in 1968 [16], which is of the utmost importance when disagreements between datasets are not equally important. Vanbelle and Albert [17] remarked that, under certain conditions, “the weighted kappa coefficient is equivalent to the product-moment correlation coefficient”. In 2000, Pontius [18] introduced some extension of kappa, namely, kappa with random change agreement and kappa for location (Klocation).
The kappa coefficient, however, has some disadvantages. Foody [13] pointed out that the samples used to evaluate maps should be independent. However, in practice, this assumption is almost impossible to satisfy, since the same sample of ground data sites is often used for each case of map elaboration. In conclusion, in [13], the author also suggests using a measure of the proportion of correctly assigned cases. A similar opinion is shared by Pontius and Millones [18], who summed up more than a decade of research on the kappa coefficient and concluded that the simple measures of quantity and allocation disagreement are even more useful in showing differences in categorical data. Moreover, Pontius [19] also suggests that in data agreement analysis, statistical measures such as mean deviation, mean absolute deviation, correlation and slope are even more helpful in interpreting data dissimilarity than indices of agreement.
In view of the aforementioned limitations of kappa, researchers use some qualitative–quantitative approaches that deserve attention due to their combination of evaluation and analysis. Multi-criteria analytical techniques predominantly rely on statistical methods or a combination with data mining techniques. Li and Reynolds [20] quantified spatial heterogeneity in categorical maps using ANOVA statistics. Among scientists, a very popular method for comparing categorical data is clustering. However, as noted by Lex et al. [21], the clustering of multi-dimension data can conceal some important relations between object classes, and the final results strongly depend on the algorithm used. Promising results were obtained by Hagen [22], using the fuzzy set theory, especially for ambiguities in determining the “location of the category (fuzziness of location) and in the definition of the category (fuzziness of category)”. The simultaneous analysis of location and quantity was also the subject of research by Pontius and Suedmeyer [23], who developed a new technique of budgeting agreement and disagreement between two categorical maps. Their methodology also includes stratification, hard and soft classification and multiple resolutions to compare maps by quantity and location. Wabiński et al. [24] adapted the structural information measure, introduced by the Russian cartographer Salistchev [25], to compare tactile thematic maps, which extended the use of comparative measures to maps developed for the needs of visually impaired people. Comber et al. [26] recall that different methods of data comparison require many disparate processing steps, and may lead to different results and conclusions. This assessment was also supported by Boots and Csillag [27], based on the results of an expert workshop conducted in March 2004.

2.2. OpenStreetMap Data Quality

With the advent of OSM, due to the lack of control over the VGI data that characterizes most user-provided web resources, many questions have been raised about the quality and reliability of the data. However, as summarized in Arsanjani et al.’s [28] study on OpenStreetMap, in some regions, OSM geodata are more complete and geometrically and semantically more accurate than the corresponding proprietary datasets. Haklay [29] was the first to compare the quality of OpenStreetMap data with the Meridian2 data maintained by the Ordnance Survey (OS). The research results published in the fourth year of the project indicate the comparable quality of OS and OSM in terms of accuracy and completeness. The most comprehensive assessment of OSM quality was presented by Girres and Touya [30], covering such elements of spatial data quality as geometric, attribute, semantic and temporal accuracy, logical consistency, completeness, lineage and usage. The outcome of their research shows heterogeneous quality across regions and land use elements, indicating relatively high positioning accuracy and very different completeness depending on the density of OSM volunteers. Mondzech and Sester [31] analyzed OpenStreetMap, focusing on determining the optimal routes for pedestrian traffic. Nevertheless, as demonstrated by Ciepluch et al. [32] and Zielstra and Zipf [33], other topographic data with which OSM data are compared are not always more accurate.
Contrary to the extensive analysis of buildings and roads in OSM [31,34], the first attempt at analyzing land use features was carried out by Hagenauer and Helbich [35]. The authors compared the land use polygons and land use patterns of OSM and Urban Atlas data, and found that the location agreements expressed by kappa were 91, 79 and 76% across the three classification levels, while the attributes of both datasets matched at 81, 67 and 65%. Significantly worse completeness and accuracy results were observed by Zhou et al. [36] when comparing OSM land cover with global CCI-LC (Climate Change Initiative Land Cover) data in 168 countries. In 129 countries, completeness did not exceed 40%, and in only 17 European countries was it higher than 60%. Much better results were obtained for accuracy, which was higher than 60% in 149 countries.

3. Materials and Methods

3.1. Study Area and Data Used

The study covers the territory of seven Polish counties (an administrative unit corresponding to the European Territorial Units for Statistics NUTS 4), which are located in different physical–geographical mesoregions and reflect the diversity of both the natural and anthropogenic environment, making them representative (Table 1). Due to various threats, including the violation of state borders’ integrity, they are of strategic importance for security. The area of interest (see Figure 1) covers 3.1% of Poland. Piaseczno and Otwocki counties are located in the central part of Poland, adjacent to the capital city of Warsaw, and Ostrowski county lies in Greater Poland. The next two study areas are situated along the eastern border of Poland, namely, Sokólski (in the north) along the border with Belarus, and Sanocki (in the south) on the Polish–Ukrainian and Polish–Slovak border. The sixth county, Słupski, is located in the Baltic Sea coastal zone. Last but not least, Międzyrzecki county is located in western Poland. Five of the regions of interest have previously been the subject of OSM data quality assessments [37,38].
Two topographic datasets were investigated, namely, the National Database of Topographic Objects (BDOT10k), managed by the Head Office of Geodesy and Cartography, and OpenStreetMap, created on a voluntary basis. Six thematic layers were analyzed in detail: three polygon layers, such as buildings, forests and water bodies, and three linear ones—streams and canals, paved roads and railway lines. Topographic data were chosen because they reflect the complex relationships between components of the geographical environment, related to morphology, geology, hydrology, vegetation, and microclimate.

3.2. Method Applied

3.2.1. Main Methodological Assumptions and Research Question

The basic research problem to which this study refers concerns the definition of a Compound Correspondence Index (CCI), at local and regional scales, and the determination of the number and optimal class ranges that explicitly indicate the spatial location of differences in the information capacity of two investigated datasets. The WLC (Weighted Linear Combination) method for comparative, multi-criteria analysis was used on the basis of such criteria as differences in the area covered by buildings, forest and water bodies, and the lengths of roads, railways and rivers. The minimum difference indicates a very similar information volume, while the maximum implies large differences between the two sets. It is assumed that the consolidated regional classification of the Compound Correspondence Index (regional CCI, hereinafter referred to as CCIRn), attributed to the 1 km² hexagonal grid, and their statistical measure of dispersion enable the comparison of datasets for individual geographically disjoint areas according to uniform criteria. Hexagons have some advantages as a mapping unit: they are closer in shape to circles than squares are, potentially reducing bias due to edge effects [40].
Therefore, the priority of this study is to answer two research questions:
Q1: Is there a difference between the CCI value calculated for grid cells of all research areas together (CCIRn) and the CCI value calculated in grid cells separately for each area (local CCI; hereinafter referred to as CCILn), and if so, how big is it?
Q2: How does the value of regional CCI (CCIRn) change if we include another research area in the analysis, i.e., how sensitive is the CCIRn?
The answers to the questions allow us to verify the hypothesis that CCIRn underestimates the dissimilarity between analyzed datasets, indicating slightly higher compliance than the local CCI.
This approach is innovative as it allows the comparison of two sets containing qualitative data for geographically disjoint areas, thus enabling the user to choose one of them consciously and responsibly. It also shows the differences between CCI classifications at the local and regional levels.

3.2.2. Research Schema

The research was conducted in four consecutive phases: (1) preparatory; (2) computational; (3) sensitivity analysis; and (4) visualization. The preparatory phase comprised data acquisition, checking and preprocessing, including coordinate transformation, hexagonal grid creation, and the assignment of appropriate attributes to grid cells, e.g., the area covered by buildings, forests and water reservoirs and the length of roads, streams and railways in both datasets (BDOT10k and OSM). This stage is described in detail in an earlier publication by Borkowska and Pokonieczny [37]. Phase 2 was the calculation of the CCI based on TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution). Phase 3 was the sensitivity assessment, which aimed to analyze the variability of regional CCI values when the research area is extended to other regions (disjoint or adjacent), based on descriptive statistics (measures of dispersion and centrality) and a spatial inferential statistic, Global Moran’s I. Phase 4 was the visualization, which portrays the CCI value in the form of a choropleth map. The five class intervals reflect the conformity of BDOT10k and OSM data. Contrary to the classic Likert scale [41], reverse ordering was used, consistent with the CCI values, so class 1, starting with the value 0, meant full compliance, and class 5 meant maximum non-compliance. Class ranges are based on one standard deviation interval. All research phases and stages are shown in Figure 2.
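The five-class scheme of the visualization phase can be illustrated with a minimal Python sketch. The text states only that class ranges are one standard deviation wide; the break positions used here (mean − 0.5σ, mean + 0.5σ, mean + 1.5σ, mean + 2.5σ), the use of the population standard deviation, and the function name `classify_by_std` are assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev

def classify_by_std(values):
    """Assign each value to one of five classes, each one standard deviation wide.

    Assumed breaks (not given explicitly in the text):
    mean - 0.5*sigma, mean + 0.5*sigma, mean + 1.5*sigma, mean + 2.5*sigma.
    Class 1 holds the lowest values, class 5 the highest.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    breaks = [mu + k * sigma for k in (-0.5, 0.5, 1.5, 2.5)]
    # Class number = 1 + number of breaks the value exceeds.
    return [1 + sum(v > b for b in breaks) for v in values]

# Hypothetical index values for a handful of grid cells.
sample = [0.00, 0.01, 0.02, 0.02, 0.03, 0.05, 0.08, 0.20]
print(classify_by_std(sample))
```

With this ordering, low index values fall into class 1 and high values into class 5, which matches the reverse ordering described above.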

3.2.3. Compound Correspondence Index Calculation

TOPSIS is perceived as the most widely used approach among multi-criteria decision analysis (MCDA) methods [42,43]. This classical MCDA method, originally developed by Hwang and Yoon in 1981 [44], addresses complex decision problems involving conflicting goals, uncertainty and different data formats. TOPSIS has gained popularity in a plethora of applications because it evaluates real-world problems based on a variety of criteria, both quantitative and qualitative. Simultaneously, it takes into account the mutual distances to positive and negative ideal solutions, and finally orders the rank of preferences based on their relative proximity (a combination of these two distance measures). This study uses the classical TOPSIS method to portray differences in the thematic scope (understood as the area covered by buildings, forests and water reservoirs and the lengths of roads, rivers and railways) of two topographic datasets (BDOT10k and OSM) in the form of a composite index to compare and rank the obtained alternatives.
The seven standardized steps of the TOPSIS method, described in detail in many papers, e.g., Pavic and Novoselac [45] and Zavadskas et al. [46], are presented in general terms below.
1. Problem description
The decision relies on the selection of BDOT10k or OSM sets (two alternatives; m = 2) based on the minimal value of the k criteria (k = 6), as shown in Equation (1):
$$x_1 = |BT_B - OSM_B|,\quad x_2 = |BT_F - OSM_F|,\quad x_3 = |BT_W - OSM_W|,\quad x_4 = |BT_{Ro} - OSM_{Ro}|,\quad x_5 = |BT_S - OSM_S|,\quad x_6 = |BT_{Ra} - OSM_{Ra}|, \tag{1}$$
where BT denotes the BDOT10k dataset; OSM—OpenStreetMap data; subscripts B, F, W—total area covered by buildings, forest and water bodies in the grid cell, and Ro, S, Ra—total length of paved roads, streams, and railways, respectively.
2. Calculation of the normalized decision matrix ($n_{ij}$) using the quotient method (Equation (2)):
$$n_{ij} = \frac{x_{ij}}{\sqrt{\sum_{i=1}^{m} x_{ij}^{2}}},\quad i = 1, \ldots, m;\; j = 1, \ldots, n. \tag{2}$$
where m is the number of alternatives, and n is the number of hexagonal grid cells.
3. Calculation of the weighted normalized decision matrix (Equation (3)):
$$r_{ij} = w_j \times n_{ij},\quad \sum_{j} w_j = 1,\quad i = 1, \ldots, m;\; j = 1, \ldots, n. \tag{3}$$
The criterion for weighting objects was their recognizability on satellite and aerial imagery, which is the main source for their acquisition [34,47,48]. Thus, buildings and forests were given a weight of $w_j = 0.25$, paved roads and railways $w_j = 0.15$, while water bodies and streams just $w_j = 0.10$. Furthermore, these weighting rules are also used in accessibility and passability analyses and, as noted by Pokonieczny [49], are extremely important in crisis management.
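4. Determine the positive (PIS) and negative (NIS) ideal solutions, as shown by Equations (4) and (5).
$$PIS = \{r_1^{+}, \ldots, r_j^{+}, \ldots, r_n^{+}\},\quad \text{where } r_j^{+} = \begin{cases} \max_i r_{ij} & \text{if } j \in B \\ \min_i r_{ij} & \text{if } j \in J \end{cases} \tag{4}$$
$$NIS = \{r_1^{-}, \ldots, r_j^{-}, \ldots, r_n^{-}\},\quad \text{where } r_j^{-} = \begin{cases} \min_i r_{ij} & \text{if } j \in B \\ \max_i r_{ij} & \text{if } j \in J \end{cases} \tag{5}$$
As all criteria used are destimulants, the positive ideal solution was computed as $\min_i r_{ij}$, while the negative ideal was $\max_i r_{ij}$.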
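5. Calculate the separation measure $S_i$ of each alternative (relative closeness to the positive ideal solution) as (Equations (6) and (7)):
$$S_i^{+} = \sqrt{\sum_{j=1}^{n} \left(r_{ij} - r_j^{+}\right)^{2}} \tag{6}$$
$$S_i^{-} = \sqrt{\sum_{j=1}^{n} \left(r_{ij} - r_j^{-}\right)^{2}},\quad i = 1, \ldots, m \tag{7}$$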
6. Calculate the closeness coefficient of the alternatives ($CC_i$) as:
$$CC_i = \frac{S_i^{-}}{S_i^{+} + S_i^{-}} \tag{8}$$
7. Sort alternatives in descending order, whereby the highest $CC_i$ indicates the best performance in relation to the evaluation criteria.
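The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: each row of the input matrix is treated as one unit to be ranked (for instance, one hexagonal grid cell), the six columns are the absolute differences x1 to x6 in the order of Equation (1), all criteria are treated as destimulants, and the function name `topsis_cost` and the toy matrix are hypothetical.

```python
import math

# Weights from the text, in the order of Equation (1):
# buildings, forests, water bodies, paved roads, streams, railways.
WEIGHTS = [0.25, 0.25, 0.10, 0.15, 0.10, 0.15]

def topsis_cost(matrix, weights=WEIGHTS):
    """Closeness coefficients for the rows of `matrix`, all criteria being costs.

    matrix[i][j] is criterion j (an absolute BDOT10k-OSM difference) for row i.
    """
    n_crit = len(weights)
    # Step 2: vector normalization of each criterion column.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    # Step 3: weighted normalized matrix (an all-zero column stays zero).
    r = [[weights[j] * (row[j] / norms[j] if norms[j] else 0.0)
          for j in range(n_crit)] for row in matrix]
    # Step 4: destimulants, so PIS = column minima and NIS = column maxima.
    pis = [min(row[j] for row in r) for j in range(n_crit)]
    nis = [max(row[j] for row in r) for j in range(n_crit)]
    scores = []
    for row in r:
        # Step 5: Euclidean distances to both ideal solutions.
        s_plus = math.sqrt(sum((row[j] - pis[j]) ** 2 for j in range(n_crit)))
        s_minus = math.sqrt(sum((row[j] - nis[j]) ** 2 for j in range(n_crit)))
        # Step 6: closeness coefficient; 0.0 when all rows are identical.
        total = s_plus + s_minus
        scores.append(s_minus / total if total else 0.0)
    return scores

# Toy data: no difference, the largest difference, and half of the largest.
toy = [[0.0] * 6, [1.0] * 6, [0.5] * 6]
scores = topsis_cost(toy)
# Step 7: rank rows by descending closeness coefficient.
ranking = sorted(range(len(toy)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

In the toy matrix, the first row coincides with the positive ideal solution and the second with the negative ideal solution, so they receive the extreme coefficients. Because the normalization and the ideal solutions are computed over all rows passed in, the coefficient of a given row depends on which other rows are evaluated together with it.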

4. Results

4.1. Local CCI Diversity

The CCIL differs between regions considered, taking the highest values in Ostrowski and Międzyrzecki (see Table 2), which are both located in Greater Poland. The counties are characterized by a similar urbanization level (53.7 and 52.3, respectively) and land cover structure (see Table 1). Very high statistical dispersion of CCIL, as measured by the coefficient of variation, is observed in Słupski and Międzyrzecki counties, where population density, afforestation and percentage of area covered by agricultural land are similar. However, both counties differ significantly in terms of area, with Słupski county being twice as large.
The values of the local Compound Correspondence Index in Otwocki and Piaseczno counties are characterized by the highest value of IQR, as well as standard deviation, and thus indicate a high dispersion of local CCI values. Furthermore, mean CCIL takes a value greater than σ, which indicates that for most of the hexagonal cells, the CCIL value is lower than the mean value, i.e., the data consistency is relatively high here. The standard deviation of 0.068 and 0.072 in Otwocki and Piaseczno counties is almost twice as high as the lowest value recorded in Sokólski county (0.039). Furthermore, the relatively high diversity of analyzed data is proven by variance, which takes the value of 0.0052, 0.0046 and 0.0039 in Piaseczno, Otwocki, and Sanocki counties, respectively.
The CCIL values in all analyzed counties reveal clustering, as indicated by the Global Moran’s I spatial autocorrelation statistic, with z-scores ranging from 21.80 in Piaseczno county to 14.62 in Słupski county (p-values < 0.001).
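For readers unfamiliar with the statistic, the following Python sketch computes Global Moran's I from its textbook definition. It assumes binary contiguity weights (1 for adjacent cells, 0 otherwise); the weighting scheme actually used in the study is not stated here, and the function name `morans_i` and the four-cell example are hypothetical. The z-scores reported above additionally require the variance of I under a null hypothesis, which this sketch does not compute.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary weights.

    values: list of per-cell values.
    neighbors: dict mapping a cell index to the indices of its adjacent cells
               (w_ij = 1 for each listed pair, 0 otherwise).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0: sum of all spatial weights.
    s0 = sum(len(adj) for adj in neighbors.values())
    # Cross-products of deviations over all neighbouring pairs.
    cross = sum(dev[i] * dev[j] for i, adj in neighbors.items() for j in adj)
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)

# Four cells in a row; each cell is adjacent to the next one.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
clustered = morans_i([1, 1, 5, 5], chain)    # similar values side by side
alternating = morans_i([1, 5, 1, 5], chain)  # dissimilar values side by side
print(clustered, alternating)
```

Values of I above the expectation under no spatial autocorrelation, −1/(n − 1), point to clustering of similar values, as in the first example; values below it point to dispersion, as in the second.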
A predominance of areas with low and very low differentiation between BDOT10k and OSM (CCIL first and second class) ranging from 83.5% to 85.3% was observed in Słupski and Międzyrzecki counties. The relative lack of congruence, defined as semi-compliance, with the highest values of 13.1% to 13.3% characterizes Otwocki, Piaseczno and Ostrowski regions. A great diversity in the analyzed datasets (CCIL fourth and fifth class) is noted in Otwocki (9.3%) and Piaseczno (9%) counties. The remaining districts stand out with relatively small noncompliance, amounting in Ostrowski to 7.1%, in Sanocki to 6.6% and in Słupski to 6.3%, with the least in Sokólski (5.6%) and Międzyrzecki (5.1%) (see Table 3). The level of compliance between the BDOT10k and OSM datasets is shown in Figure 3.

4.2. Regional CCI Diversity and Sensitivity Analysis

As the region’s area expands, the range of regional CCI values increases, and the local differences between the data become blurred. The variance, standard deviation and interquartile range decrease, which proves that CCIR7 values are less diverse than CCIR6, CCIR5 and CCIR4 (Table 4). Regardless of the region’s extent, the CCIRn mean value is greater than the median and slightly lower than the standard deviation, indicating a predominance of values smaller than the mean, i.e., maximum and moderate compliance between BDOT10k and OSM. Nevertheless, CCIRn is characterized by a large disparity, based on cv, which takes values above 100%.
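The descriptive measures referred to in this section can be reproduced with the Python standard library. The sketch below is illustrative only: it uses population formulas and the default quartile method of `statistics.quantiles`, which may differ from the software used in the study, and the function name `dispersion_summary` and the sample values are hypothetical.

```python
from statistics import mean, median, pstdev, pvariance, quantiles

def dispersion_summary(values):
    """Measures of centrality and dispersion for a list of index values."""
    mu = mean(values)
    sigma = pstdev(values)              # population standard deviation
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    return {
        "mean": mu,
        "median": median(values),
        "std": sigma,
        "variance": pvariance(values),  # equals std squared
        "iqr": q3 - q1,
        # Coefficient of variation in percent; exceeds 100 when std > mean.
        "cv_percent": 100.0 * sigma / mu if mu else float("nan"),
    }

# Hypothetical right-skewed sample: most cells near zero, one large value.
summary = dispersion_summary([0.0] * 9 + [10.0])
print(summary)
```

Two relationships are worth keeping in mind when reading the tables. The variance is the square of the standard deviation, so the values reported for Piaseczno county (σ ≈ 0.072, σ² = 0.0052) are mutually consistent. The coefficient of variation exceeds 100% exactly when the standard deviation is larger than the mean, which is the situation described above for CCIRn.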
The standard deviation of CCIR4 (0.0343) is smaller than the lowest local value, recorded in Sokólski county (CCIL σ = 0.0386), which in turn is roughly half of the highest local value, recorded in Piaseczno county (CCIL σ = 0.0723). Additionally, the variance of CCIR4 (0.0012) is noticeably smaller than that of CCIL; in Piaseczno county, the local variance is about four times higher (CCIL σ² = 0.0052) (see also Table 2). The maximum compliance between BDOT10k and OSM oscillates around 32% of the entire region area. In contrast, moderate compliance, with CCIRn values in the range of −0.5σ to 0.5σ, represents just over 50% (Table 5). The maximum disagreement between the data is basically negligible. The area where both BDOT10k and OSM data have the lowest correspondence between geospatial objects varies between 3.6 and 6.3%. Figure 4 presents a cartographic visualization of the regional CCI (CCIR7).
The IQR indicates that the spread of the CCIRn values declines when the region’s area expands, with the highest value in Piaseczno (CCIR4–IQR = 0.045; CCIR5–7–IQR = 0.035) and the lowest in Sokólski (CCIR4–IQR = 0.020; CCIR5–7–IQR = 0.017). In the remaining counties, the IQR amounted to about 0.025 (see Table 6). In Sanocki, Słupski and Otwocki counties, the standard deviation of CCIRn is lower than the mean CCIRn (µ > σ), and more than 50% of the grid cells take values below the mean, indicating the predominance of high agreement between BDOT10k and OSM data. In each county, regardless of the spatial extent of the region, the variance does not change, and the standard deviation decreases slightly, as does the achieved maximum value of CCIRn.
Figure 5 presents a cartographic visualization of CCIRn in two counties: Piaseczno, with the highest spread of CCI, and Sokólski with the lowest.

4.3. Local vs. Regional CCI

Each of the counties analyzed is characterized by an overestimation of the area of maximum compliance by regional CCI compared to local CCI. The largest growth of maximum data compliance takes place in Otwocki and Piaseczno counties, amounting to 50.9% and 40.1%, respectively, and the lowest in Międzyrzecki (16%) and Słupski counties (14.7%). The overestimation of a very good match between BDOT10k and OSM does not depend on the spatial extent of the region, i.e., it does not change when counties are added and the extent of the region increases. The upsurge in maximum data compliance was at the expense of a moderate compliance class, showing an average of 17.2% (the highest, 29.8%, in Otwocki and the lowest, 6.8%, in Międzyrzecki counties), and that in comparable data agreement (semi-conformance class) showed an average of 8.2% (the highest, 12.2%, in Otwocki and the lowest, 4.1%, in Słupski); low and very low compliance classes (moderate and maximum non-compliance) increased by 3.5% and 1.9%, respectively. The maximum decline in the area considered non-compliant occurred in Piaseczno and Otwocki counties. However, their extents do not exceed 9%.

5. Discussion

5.1. Semantic Uncertainty

The comparative analysis of data requires semantic similarity in the investigated concepts. As emphasized by many researchers [34,50,51], this assumption is a prerequisite for proper comparative data quality assessment. Semantic similarity in GIScience enables not only data comparison, but also the integration of data from different sources and their further analysis. If the semantic similarity between concepts is ignored, the evaluation or data comparison will be inaccurate [52]. The semantic consistency of concepts in GI can be analyzed at the general level, i.e., only the definition of the object and its geometrical representation are given [53], or at the detailed level, which also comprises definitions of attributes [34,54]. In general, semantic inconsistency results from the classification used, as well as scaling [53]. Scientists use numerous methods to quantify and visualize geodata uncertainties, many of which rely on mathematical models. Semantic uncertainty results from incomplete knowledge of spatial features and phenomena. Traditionally, the boundaries of geographical features are treated as discrete and mutually exclusive, and often, the true location of the boundary is unknown and subject to positional and semantic errors. Zhao et al. [52] introduced a conceptual framework of ontology that includes the definition, semantic relationship, nature, and attributes of geographic features (concept). The authors defined semantic similarity as “calculated on the basis of the feature similarity of ontology concept and the semantic distance of concept to measure the semantic similarity of ontology concept. The smaller the semantic distance between concepts, the higher the similarity between concepts”.
Based on OSMwiki [55] and Ministry Regulation [56], it is seen that in both datasets, a “building” is a man-made structure with a roof, standing (more or less) permanently in one place and geometrically represented by a polygon. Differences in definition only arise when “tag:building use” is considered. Forests are mapped in OSM with two tags, “natural = wood” and “landuse = forest”, which represent forests or other areas of trees. Both tags together are compliant with the forest definition in BDOT10k. Water bodies are represented in OSM by the following tags: “natural = water”, “landuse = reservoir” and “water = reservoir”; these are equivalent to “surface water”, i.e., areas occupied by rivers, canals and reservoirs. “Highway = {motorway, trunk, primary, secondary, tertiary, unclassified, residential}” is the principal tag for the road network and corresponds to “road”, while “tag:railway = rail” matches “track or set of tracks” in BDOT10k. Finally, streams, rivers and other watercourses in OSM have the tag “waterway = {river, stream, tidal_channel}”, and in BDOT10k are named “river and stream/channel” [37].
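The correspondence described above can be written down as a simple lookup structure. The following Python sketch is illustrative: the key and value pairs are those listed in the text, whereas the layer names, the treatment of any value of the “building” key as a building, and the function name `layers_for` are assumptions made for the example.

```python
# OSM key/value pairs per thematic layer, as listed in the text.
# `None` means "any value of this key" (an assumption for the building key).
OSM_TAGS = {
    "buildings": {"building": None},
    "forests": {"natural": {"wood"}, "landuse": {"forest"}},
    "water bodies": {"natural": {"water"}, "landuse": {"reservoir"},
                     "water": {"reservoir"}},
    "roads": {"highway": {"motorway", "trunk", "primary", "secondary",
                          "tertiary", "unclassified", "residential"}},
    "railways": {"railway": {"rail"}},
    "streams": {"waterway": {"river", "stream", "tidal_channel"}},
}

def layers_for(tags):
    """Return the thematic layers that an OSM feature's tags fall into."""
    matched = []
    for layer, rules in OSM_TAGS.items():
        for key, allowed in rules.items():
            if key in tags and (allowed is None or tags[key] in allowed):
                matched.append(layer)
                break  # one matching rule is enough for this layer
    return matched

print(layers_for({"highway": "primary"}))
print(layers_for({"natural": "wood", "name": "Example Forest"}))
print(layers_for({"highway": "footway"}))
```

A feature whose tags match none of the rules, such as a footway, is left out of the comparison, which mirrors the restriction of the road layer to the highway classes listed above.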

5.2. Validity and Applicability of Data Comparison

Decisions to use fit-for-purpose geospatial datasets are heavily based on data quality, and in particular, information volume, which is understood as the number of geographical features captured. OSM data are rich and heterogeneous, and their quality strongly depends on the degree of urbanization, as mentioned by [34,57]. On the contrary, BDOT10k is perceived as very reliable as authoritative data [58]. It is updated every 2–3 years [56]. Both datasets are used in many applications in Poland by commercial companies and administrations, and in science and education.
Risk assessment and risk management, spatial planning, and environmental protection and monitoring applications often require detailed data when every object is important. In loss assessment (e.g., due to flood or fire), indicating access roads and planning the locations of investments and many other applications, every building, road, railway and object that constitute some kind of barrier, such as water or forest, is important. When the analyzed sets are characterized by high data agreement, the choice of the set is definitely less important than when the two sets vary from each other. It is then necessary to analyze the quality of the datasets based on the indicators described, inter alia, by Borkowska and Pokonieczny [37], and to make a choice supported by the cartographic visualization of data quality [38].
It is worth mentioning that OSM data are made available alongside official data in the form of the WMS (Web Map Service) in the national web portals (Geoportals), providing access to geographic information and spatial data services, e.g., in Poland, Germany, France and Greece [59,60,61,62].

6. Conclusions

It was assumed that the selection of a set of geospatial data for a specific application (fit-for-purpose data) could be completed on the basis of the local or regional CCI. Geospatial datasets are assessed by clustering hexagonal grid cells based on ordered CCI similarity. Like any data classification, it is an exploratory procedure because it leads to an understanding of objects and processes. The research is theoretical and methodological in nature, but the close connection to a specific problem situation also gives it a cognitive aspect.
Assuming that the study area consists of several spatially disconnected areas (e.g., counties, cities), the regional CCI makes it possible to assess the suitability of the datasets according to a common scale based on the standard deviation. However, the regional assessment yields more favorable results than the local classification, i.e., a higher level of data compliance. The overestimation of regional compliance ranges from 9 to 20% of the county’s area, with an average 3% reduction in the area over which the two datasets (BDOT10k and OSM) have a comparable information scope. Areas of medium and large incompatibility are reduced by an average of 2.4%. Sensitivity analysis shows that neither the size of the region nor the spatial location of the counties had a significant impact on the values of the regional CCI.
The CCI values in all analyzed districts revealed clustering. The greatest variation between BDOT10k and OSM data was observed in areas with a high degree of urbanization (e.g., Piaseczno city, Otwock city) and along major transportation routes. The analyses carried out did not reveal statistically significant correlations between the CCI coefficients and the land cover elements studied (buildings, roads, rivers, railways, forests and water bodies).
OSM is a valuable source of up-to-date geographic data in emergency mapping, with capacities including, but not limited to, identifying infrastructure at risk of destruction, collapsed buildings, fires and accessibility, which can be important inputs for the orientation of rescuers on the ground. The use of these data is of particular importance in areas where the official spatial data vary in completeness and timeliness. Nevertheless, volunteered data may raise quality concerns. Analyzing the compliance of the OSM dataset with official data allows the selection of a dataset whose characteristics suit the intended purpose.
Analyzing and comparing several counties within a common analytical framework makes it possible to synthesize how complete the analyzed geodatasets are and to identify potential commonalities and differences across places.
The method proposed in this paper has several limitations. Among them are TOPSIS constraints, such as the way that variables are weighted, correlations between variables, and the possibility that an alternative is close to the ideal point and the nadir point simultaneously. An additional limitation is that the CCI is based solely on criteria (variables) such as differences in the area and length of the geographic features analyzed in the OSM and BDOT10k datasets. In future work, we will address at least some of the limitations mentioned above. Thus, we will use a different WLC approach to assess the correspondence between OSM and authoritative topographic data. Hellwig’s information capacity method, a method of optimal predictor selection, is being considered for selecting the explanatory variables to be used in a model to evaluate geodatasets. Another research question concerns the impact of the scope and methods of weighting variables.
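The TOPSIS limitations named above (weighting of variables, distance to the ideal and nadir points) are easier to locate in a compact version of the classical procedure of Hwang and Yoon [44]. The sketch below is a generic textbook TOPSIS, not the authors' configuration: the function name `topsis`, the vector normalization, and the example weights and criterion directions are assumptions chosen for illustration.

```python
import math


def topsis(matrix, weights, benefit):
    """Relative closeness to the ideal solution for each alternative.

    matrix  -- rows are alternatives (e.g. grid cells), columns are criteria
    weights -- one non-negative weight per criterion
    benefit -- True where larger is better, False where smaller is better
    """
    n_crit = len(weights)
    total = sum(weights)
    w = [x / total for x in weights]  # weights rescaled to sum to 1

    # Vector normalization of each column, then weighting.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_crit)]
    v = [[w[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]

    # Ideal and nadir points depend on the direction of each criterion.
    cols = list(zip(*v))
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    nadir = [min(c) if b else max(c) for c, b in zip(cols, benefit)]

    scores = []
    for row in v:
        d_plus = math.dist(row, ideal)    # distance to the ideal point
        d_minus = math.dist(row, nadir)   # distance to the nadir point
        denom = d_plus + d_minus
        scores.append(d_minus / denom if denom else 0.0)
    return scores


# Hypothetical cells described by two cost-type criteria (smaller is better),
# e.g. an area difference and a length difference between two datasets.
cells = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
scores = topsis(cells, weights=[1, 1], benefit=[False, False])
```

Because the score depends only on the ratio of the two distances, changing `weights` rescales every coordinate and can reorder the alternatives, which is where the weighting limitation enters.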

Author Contributions

Conceptualization, S.B. and E.B.; methodology, S.B. and K.P.; formal analysis, S.B.; investigation, S.B.; resources, S.B.; data curation, S.B.; writing (original draft preparation), S.B.; writing (review and editing), S.B., E.B. and K.P.; visualization, S.B.; supervision, K.P. and E.B.; project administration, S.B.; funding acquisition, K.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Institute of Geospatial Engineering and Geodesy, Faculty of Civil Engineering and Geodesy, Military University of Technology under the statutory research UGB-IG 531-4000-22-816.

Institutional Review Board Statement

Not relevant to this study.

Informed Consent Statement

Not applicable.

Data Availability Statement

The research data are available on request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Goodchild, M.F. Geographical information science. Int. J. Geogr. Inf. Syst. 1992, 6, 31–45.
  2. Sui, D.; Goodchild, M. The convergence of GIS and social media: Challenges for GIScience. Int. J. Geogr. Inf. Sci. 2011, 25, 1737–1748.
  3. Pachelski, W. Aktualny stan europejskich i krajowych prac normalizacyjnych w dziedzinie Informacji Geograficznej [Present status of European and National Standardization in Geographic Information]. Rocz. Geomatyki Ann. Geomat. 2004, 2, 96–105.
  4. Bielecka, E.; Burek, E. Spatial data quality and uncertainty publication patterns and trends by bibliometric analysis. Open Geosci. 2019, 11, 219–235.
  5. Najwer, A.; Jankowski, P.; Niesterowicz, J.; Zwoliński, Z. Geodiversity assessment with global and local spatial multicriteria analysis. Int. J. Appl. Earth Obs. Geoinf. 2022, 107, 102665.
  6. Pluto-Kossakowska, J.; Tulkowska, W.; Władyka, M. GIS technology in green and blue infrastructure analysis. Ann. Geomat. 2020, XVIII, 33–50.
  7. Bober, A.; Calka, B.; Bielecka, E. Application of state survey and mapping resources for selecting sites suitable for solar farms. In Proceedings of the 16th International Multidisciplinary Scientific Geoconference (SGEM 2016), Albena, Bulgaria, 29 June–5 July 2016; Volume 1, pp. 593–600.
  8. Mierzwiak, M.; Calka, B. Multi-Criteria Analysis for Solar Farm Location Suitability. Rep. Geod. Geoinform. 2017, 104, 20–32.
  9. Giove, S.; Brancia, A.; Satterstrom, F.K.; Linkov, I. Decision Support Systems and Environment: Role of MCDA. In Decision Support Systems for Risk Based Management of Contaminated Sites; Marcomini, A., Suter, G.W., Critto, A., Eds.; Springer: New York, NY, USA, 2009.
  10. Zykwinska-Rauba, K. Optymalizacja wielokryterialna w procesach decyzyjnych i jej wykorzystanie w zarządzaniu środowiskiem w zrównoważonych miastach. In Zarządzanie Przedsiębiorstwem Wobec Współczesnych Wyzwań Technologicznych, Społecznych i Środowiskowych; Walaszczyk, A., Koszewska, M., Eds.; Wydawnictwo Politechniki Łódzkiej: Łódź, Poland, 2021; pp. 174–186.
  11. Remmel, T.K.; Fortin, M.-J. Categorical, Class-focused map patterns: Characterization and comparison. Landsc. Ecol. 2013, 28, 1587–1599.
  12. Boots, B.; Csillag, F. Categorical maps, comparisons, and confidence. J. Geogr. Syst. 2006, 8, 109–118.
  13. Foody, G.M. Thematic Map Comparison: Evaluating the Statistical Significance of Differences in Classification Accuracy. Photogramm. Eng. Remote Sens. 2004, 70, 627–633.
  14. Pontius, R.G., Jr. Comparison of categorical maps. Photogramm. Eng. Remote Sens. 2000, 66, 1011–1016. Available online: http://www2.clarku.edu/~rpontius/pontius_2000_pers.pdf (accessed on 13 January 2023).
  15. De Raadt, A.; Warrens, M.J.; Bosker, R.J.; Kiers, H.A. Kappa Coefficients for Missing Data. Educ. Psychol. Meas. 2019, 79, 558–576.
  16. Cohen, J. Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychol. Bull. 1968, 70, 213–220.
  17. Vanbelle, S.; Albert, A. A note on the linearly weighted kappa coefficient for ordinal scales. Stat. Methodol. 2009, 6, 157–163.
  18. Pontius, R.G.; Millones, M. Death to Kappa: Birth of quantity disagreement and allocation disagreement for accuracy assessment. Int. J. Remote Sens. 2011, 32, 4407–4429.
  19. Pontius, R.G. Indices of Agreement. In Metrics that Make a Difference. Advances in Geographic Information Science; Springer: Cham, Switzerland, 2022.
  20. Li, H.; Reynolds, J.F. A simulation experiment to quantify spatial heterogeneity in categorical maps. Ecology 1994, 75, 2446–2455.
  21. Lex, A.; Streit, M.; Partl, C.; Kashofer, K.; Schmalstieg, D. Comparative Analysis of Multidimensional, Quantitative Data. IEEE Trans. Vis. Comput. Graph. 2010, 16, 1027–1035.
  22. Hagen, A. Fuzzy set approach to assessing similarity of categorical maps. Int. J. Geogr. Inf. Sci. 2003, 17, 235–249.
  23. Pontius, R.G., Jr.; Suedmeyer, B. Components of agreement between categorical maps at multiple resolutions. In Remote Sensing and GIS Accuracy Assessment; Routledge: Oxfordshire, UK, 2004; Volume 2004, pp. 233–251. Available online: http://www2.clarku.edu/faculty/rpontius/pontius_suedmeyer_2004_rsgisaa.pdf (accessed on 16 January 2023).
  24. Wabiński, J.; Mościcka, A.; Kuźma, M. The Information Value of Tactile Maps: A Comparison of Maps Printed with the Use of Different Techniques. Cartogr. J. 2021, 58, 930.
  25. Salistchev, K.A. Kartografia Ogólna, 2nd ed.; Wydawnictwo Naukowe PWN: Warsaw, Poland, 1998.
  26. Comber, A.; Fisher, P.; Wadsworth, R. What is Land Cover? Environ. Plan. B Plan. Des. 2005, 32, 199–209.
  27. Csillag, F.; Boots, B. Comparing maps as spatial processes. In Developments in Spatial Data Handling; Fisher, P., Ed.; Springer: Berlin/Heidelberg, Germany; New York, NY, USA, 2004; pp. 641–652.
  28. Arsanjani, J.J.; Zipf, A.; Mooney, P.; Helbich, M. An introduction to OpenStreetMap in Geographic Information Science: Experiences, research, and applications. In OpenStreetMap in GIScience: Experiences, Research, and Applications. Lecture Notes in Geoinformation and Cartography; Springer International Publishing: Cham, Switzerland, 2015; pp. 1–15.
  29. Haklay, M. How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets. Environ. Plan. B Plan. Des. 2008, 37, 682–703.
  30. Girres, J.F.; Touya, G. Quality Assessment of the French OpenStreetMap Dataset. Trans. GIS 2010, 14, 435–459.
  31. Mondzech, J.; Sester, M. Quality Analysis of OpenStreetMap Data Based on Application Needs. Cartographica 2011, 46, 115–125.
  32. Ciepłuch, B.; Jacob, R.; Mooney, P.; Winstanley, A. Comparison of the accuracy of OpenStreetMap for Ireland with Google Maps and Bing Maps. In Proceedings of the Ninth International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Leicester, UK, 20–23 July 2010.
  33. Zielstra, D.; Zipf, A. A Comparative Study of Proprietary Geodata and Volunteered Geographic Information for Germany. In Proceedings of the 13th AGILE International Conference on Geographic Information Science, Guimarães, Portugal, 10–14 May 2010.
  34. Nowak Da Costa, J. Novel tool for examination of data completeness based on a comparative study of VGI data and official building datasets. Geod. Vestn. 2016, 60, 495–508.
  35. Hagenauer, J.; Helbich, M. Mining urban land-use patterns from volunteered geographic information by means of genetic algorithms and artificial neural networks. Int. J. Geogr. Inf. Sci. 2012, 26, 963–982.
  36. Zhou, Q.; Wang, S.; Liu, Y. Exploring the accuracy and completeness patterns of global land-cover/land-use data in OpenStreetMap. Appl. Geogr. 2022, 145, 102742.
  37. Borkowska, S.; Pokonieczny, K. Analysis of OpenStreetMap Data Quality for Selected Counties in Poland in Terms of Sustainable Development. Sustainability 2022, 14, 3728.
  38. Borkowska, S.; Bielecka, E.; Pokonieczny, K. OpenStreetMap—Building data completeness visualization in terms of “Fitness for purpose”. Adv. Geod. Geoinf. 2023, 72, e35.
  39. Solon, J.; Borzyszkowski, J.; Bidłasik, M.; Richling, A.; Badora, K.; Balon, J.; Brzezinska-Wójcik, T.; Chabudzinski, Ł.; Dobrowolski, R.; Grzegorczyk, I.; et al. Physico-geographical mesoregions of Poland: Verification and adjustment of boundaries on the basis of contemporary spatial data. Geogr. Pol. 2018, 91, 143–170.
  40. Krebs, C.J. Ecological Methodology; Harper Collins: New York, NY, USA, 1989.
  41. Song, Z.; Roth, R.E.; Houtman, L.; Prestby, T.; Iverson, A.; Gao, S. Visual storytelling with maps: An empirical study on story map themes and narrative elements, visual storytelling genres and tropes, and individual audience differences. Cartogr. Perspect. 2022, 100, 10–44.
  42. Ren, L.; Zhang, Y.; Wang, Y.; Sun, Z. Comparative analysis of a novel M-TOPSIS method and TOPSIS. Appl. Math. Res. Express 2007, 2007, abm005.
  43. Zyoud, S.H.; Fuchs-Hanusch, D. A bibliometric-based survey on AHP and TOPSIS techniques. Expert Syst. Appl. 2017, 78, 158–181.
  44. Hwang, C.L.; Yoon, K. Methods for multiple attribute decision making. In Multiple Attribute Decision Making; Beckmann, M., Künzi, H.P., Eds.; Springer: Berlin, Germany, 1981; pp. 58–191.
  45. Pavić, Z.; Novoselac, V. Notes on TOPSIS Method. Int. J. Res. Eng. Sci. 2013, 1, 5–12.
  46. Zavadskas, E.-K.; Mardani, A.; Turskis, A.; Jusoh, A.; Khalil, M.D. Development of TOPSIS Method to Solve Complicated Decision-Making Problems—An Overview on Developments from 2000 to 2015. Int. J. Inf. Technol. Decis. Mak. 2016, 15, 645–682.
  47. Goodchild, M.F. Citizens as Voluntary Sensors: Spatial Data Infrastructure in the World of Web 2.0. Int. J. Spat. Data Infrastruct. Res. 2007, 2, 24–32.
  48. Cichociński, P. A study on the usability of open spatial data for road network-based analysis—Using OpenStreetMap as an example. Geoinformatica Pol. 2021, 20, 89–96.
  49. Pokonieczny, K.J. Comparison of land passability maps created with use of different spatial data bases. Geografie Prague 2018, 123, 317–352.
  50. Ballatore, A.; Bertolotto, M.; Wilson, D.C. Geographic knowledge extraction and semantic similarity in OpenStreetMap. Knowl. Inf. Syst. 2013, 37, 61–81.
  51. Fonte, C.C.; Antoniou, V.; Bastin, L.; Estima, J.; Arsanjani, J.J.; Bayas, J.-C.L.; See, L.; Vatseva, R. Assessing VGI Data Quality. In Mapping and the Citizen Sensor; Foody, G., See, L., Fritz, S., Mooney, P., Olteanu-Raimond, A.-M., Fonte, C.C., Antoniou, V., Eds.; Ubiquity Press: London, UK, 2017; pp. 137–163.
  52. Zhao, Y.; Wei, X.; Liu, Y.; Liao, Z. A Reputation Model of OSM Contributor Based on Semantic Similarity of Ontology Concepts. Appl. Sci. 2022, 12, 11363.
  53. Calka, B.; Orych, A.; Bielecka, E.; Mozuriunaite, S. The Ratio of the Land Consumption Rate to the Population Growth Rate: A Framework for the Achievement of the Spatiotemporal Pattern in Poland and Lithuania. Remote Sens. 2022, 14, 1074.
  54. Majic, I.; Winter, S.; Tomko, M. Finding equivalent keys in OpenStreetMap: Semantic similarity computation based on extensional definitions. In Proceedings of the 1st Workshop on Artificial Intelligence and Deep Learning for Geographic Knowledge Discovery GeoAI’17, Los Angeles, CA, USA, 7–10 November 2017; pp. 24–32.
  55. OSM. Available online: https://wiki.openstreetmap.org/wiki/ (accessed on 19 January 2023).
  56. Regulation of the Minister of Development, Labour and Technology of July 27, 2021 on the Database of Topographic Objects and the Database of General Geographic Objects, as well as Standard Cartographic Studies, Dz.U. 2021, nr 30, poz. 1412. Available online: https://isap.sejm.gov.pl/isap.nsf/DocDetails.xsp?id=WDU20210001412 (accessed on 26 March 2023).
  57. Ribeiro, A.; Fonte, C.C. A Methodology for Assessing Openstreetmap Degree of Coverage for Purposes of Land Cover Mapping. ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci. 2015, II-3/W5, 297–303.
  58. Bielecka, E. Geographical Data Sets Fitness of Use Evaluation. Geod. Vestn. 2018, 59, 335–348.
  59. Geoportal Krajowy Service. Available online: https://mapy.geoportal.gov.pl/imap/Imgp_2.html?gpmap=gp0 (accessed on 26 March 2023).
  60. Geoportal.de Service. Available online: https://www.geoportal.de/map.html (accessed on 26 March 2023).
  61. Geoportail Service. Available online: https://www.geoportail.gouv.fr/donnees/openstreetmap-monde (accessed on 26 March 2023).
  62. Geodata.gr Service. Available online: http://geodata.gov.gr/maps/?locale=en (accessed on 26 March 2023).
Figure 1. Study area.
Figure 2. The workflow schema.
Figure 3. CCIL in analyzed counties: (a) Piaseczno, (b) Sokólski, (c) Sanocki, (d) Słupski, (e) Ostrowski, (f) Otwocki, (g) Międzyrzecki.
Figure 4. Regional CCI7 in analyzed counties: (a) Piaseczno, (b) Sokólski, (c) Sanocki, (d) Słupski, (e) Ostrowski, (f) Otwocki, (g) Międzyrzecki.
Figure 5. CCIRn choropleth map: (a) Piaseczno; (b) Sokólski.
Table 1. General characteristics of the counties under consideration.
| Description | Piaseczno | Sokólski | Sanocki | Słupski | Ostrowski | Otwocki | Międzyrzecki |
|---|---|---|---|---|---|---|---|
| Geographical subprovinces (1) | Central Polish Lowlands | Podlasie-Bialystok Upland | Eastern Beskids | South Baltic Coast | Central Polish Lowlands | Central Polish Lowlands | Greater Poland Lake District |
| Area (km²) (2) | 621.12 | 2054.34 | 1223.62 | 2347.59 | 1159.92 | 616.46 | 1387.61 |
| People | 190,607 | 64,902 | 92,900 | 98,761 | 161,581 | 124,283 | 57,100 |
| Population density (people/km²) | 311 | 32 | 81 | 43 | 139 | 202 | 41.5 |
| Number of cities | 4 | 4 | 2 | 2 | 2 | 3 | 3 |
| Urbanization level (%) | 47.8 | 41.7 | 47.2 | 20.7 | 53.7 | 61.8 | 52.3 |
| Land use (2) (km²): built-up and artificial | 82.51 | 72.93 | 43.65 | 85.15 | 55.62 | 58.91 | 24.08 |
| Land use (2) (km²): forest | 132.88 | 547.91 | 586.67 | 864.13 | 347.59 | 250.15 | 735.43 |
| Land use (2) (km²): agriculture | 387.37 | 1426.28 | 512.14 | 1234.97 | 728.53 | 276.51 | 513.42 |
| Land use (2) (km²): water bodies | 16.44 | 6.71 | 13.60 | 110.65 | 13.31 | 11.14 | 38.39 |
| Protected area | Chojnów Landscape Park, protected landscape areas | Knyszyn Forest, Biebrza National Park | Słonne Mountains Landscape Park, protected landscape areas | Słowiński National Park | Landscape Park Dolina Baryczy, protected landscape area | Masovian Landscape Park, Landscape Park Dolina Środkowego Świdra | Notecka Forest, Pszczewska Landscape Park |

(1) Solon et al. [39]; (2) Based on data from the National Register of Boundaries (PRG); Based on Cadastral data 2021 from geoportal.gov.pl.
Table 2. Descriptive CCIL statistics.
| Statistics | Piaseczno | Sokólski | Sanocki | Słupski | Ostrowski | Otwocki | Międzyrzecki |
|---|---|---|---|---|---|---|---|
| Mean | 0.0915 | 0.0390 | 0.0678 | 0.0414 | 0.0427 | 0.0989 | 0.0353 |
| Median | 0.0754 | 0.0304 | 0.0533 | 0.0313 | 0.0314 | 0.0832 | 0.0271 |
| Minimum | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Maximum | 0.4828 | 0.5329 | 0.4995 | 0.4764 | 0.5765 | 0.5069 | 0.5899 |
| Q1 | 0.0420 | 0.0159 | 0.0315 | 0.0165 | 0.0181 | 0.0542 | 0.0129 |
| Q3 | 0.1196 | 0.0499 | 0.0790 | 0.0507 | 0.0538 | 0.1246 | 0.0427 |
| Variance (σ²) | 0.0052 | 0.0015 | 0.0039 | 0.0018 | 0.0017 | 0.0046 | 0.0016 |
| Std. Dev. (σ) | 0.0723 | 0.0386 | 0.0627 | 0.0426 | 0.0406 | 0.0680 | 0.0405 |
| Coeff. of variation | 78.9755 | 98.8908 | 92.3641 | 103.0427 | 95.1072 | 68.7311 | 114.7669 |
| Interquartile range (IQR) | 0.0775 | 0.0340 | 0.0475 | 0.0342 | 0.0357 | 0.0704 | 0.0298 |
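The statistics reported in Table 2 can be recomputed from per-cell CCIL values along the following lines. This is a minimal sketch rather than the authors' code: the function name `describe`, the use of the population variance and the inclusive quartile method are assumptions, since the text does not state which estimators were used.

```python
import statistics


def describe(values):
    """Descriptive statistics of a list of CCI values, in the layout of Table 2."""
    # Assumption: quartiles use the 'inclusive' interpolation method.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values)  # assumption: population variance
    sigma = statistics.pstdev(values)
    return {
        "mean": mean,
        "median": median,
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
        "variance": variance,
        "std": sigma,
        # Coefficient of variation expressed in percent of the mean.
        "cv_percent": 100 * sigma / mean if mean else float("nan"),
        "iqr": q3 - q1,
    }
```

The tabulated coefficients of variation are consistent with 100·σ/mean; for Piaseczno, 100 × 0.0723 / 0.0915 is approximately 79, which agrees with the reported 78.9755 up to rounding of the inputs.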
Table 3. The percentage of a county’s area in CCIL classes.
| Class | Description | Range | Piaseczno | Sokólski | Sanocki | Słupski | Ostrowski | Otwocki | Międzyrzecki |
|---|---|---|---|---|---|---|---|---|---|
| 1 | maximum compliance | −0.50 σ < CCIL | 35.5 | 32.1 | 30.6 | 30.6 | 33.2 | 34.1 | 29.1 |
| 2 | moderate compliance | −0.5 σ < CCIL ≤ 0.5 σ | 42.3 | 49.6 | 52.2 | 52.9 | 46.5 | 43.5 | 56.2 |
| 3 | semi-compliance | 0.5 σ ≤ CCIL ≤ 1.5 σ | 13.1 | 12.7 | 10.6 | 10.3 | 13.3 | 13.1 | 9.6 |
| 4 | moderate noncompliance | 1.5 σ ≤ CCIL ≤ 2.5 σ | 5.6 | 2.7 | 3.1 | 3.1 | 4.6 | 9.3 | 2.5 |
| 5 | maximum noncompliance | CCIL > 2.5 σ | 3.4 | 2.9 | 3.5 | 3.2 | 2.5 | - | 2.6 |

County columns give the percentage of the county’s area (%).
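The class boundaries in Table 3 are multiples of the standard deviation σ. A sketch of how grid cells might be assigned to the five classes follows. It assumes that the boundaries are deviations from the mean CCI (the tabulated CCI minimum is 0, so a bound of −0.5 σ is only meaningful relative to the mean), that a value lying exactly on a boundary falls into the lower class, and that the population standard deviation is used. None of these choices is confirmed by the text, and the name `classify_cci` is hypothetical.

```python
import bisect
import statistics

# Upper bounds of classes 1-4 in units of sigma, as listed in Table 3.
BOUNDS = [-0.5, 0.5, 1.5, 2.5]


def classify_cci(values, mean=None, sigma=None):
    """Assign each CCI value to a compliance class from 1 to 5.

    With mean and sigma omitted, the scale is local (computed from `values`);
    passing pooled statistics of several counties gives a common scale.
    """
    if mean is None:
        mean = statistics.fmean(values)
    if sigma is None:
        sigma = statistics.pstdev(values)  # assumption: population std. dev.
    if sigma == 0:
        # No dispersion: every value equals the mean, i.e. the middle band.
        return [2] * len(values)
    # bisect_left counts the bounds strictly below the standardized value.
    return [bisect.bisect_left(BOUNDS, (v - mean) / sigma) + 1 for v in values]
```

Calling `classify_cci(cells)` classifies one county on its own scale, whereas `classify_cci(cells, mean=pooled_mean, sigma=pooled_sigma)` applies statistics pooled over several counties, which mirrors the difference between the local and the regional assessment.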
Table 4. Regional CCI descriptive statistics.
| Statistics | CCIR4 (1) | CCIR5 | CCIR6 | CCIR7 |
|---|---|---|---|---|
| Mean | 0.0327 | 0.0272 | 0.0281 | 0.0263 |
| Median | 0.0241 | 0.0201 | 0.0208 | 0.0193 |
| Minimum | 0 | 0 | 0 | 0 |
| Maximum | 0.4971 | 0.5289 | 0.5299 | 0.5389 |
| Q1 | 0.0123 | 0.0104 | 0.0108 | 0.01 |
| Q3 | 0.0410 | 0.0340 | 0.0352 | 0.0327 |
| Variance (σ²) | 0.0012 | 0.0008 | 0.0008 | 0.0008 |
| Std. Dev. (σ) | 0.0343 | 0.0287 | 0.0289 | 0.0278 |
| Coefficient of variation (cv) | 104.9797 | 105.3271 | 102.7925 | 105.6453 |
| Interquartile range (IQR) | 0.0287 | 0.0236 | 0.0244 | 0.0227 |

(1) CCIR4 denotes the regional CCI computed for four counties, namely, Piaseczno, Sokólski, Sanocki and Słupski; CCIR5: five counties, Ostrowski added; CCIR6: six counties, Otwocki added; CCIR7: seven counties, Międzyrzecki added.
Table 5. The percentage of a county’s area belonging to regional CCI class of compliance.
| Class | Description | Interval Size | Regional CCIR4 | Regional CCIR5 | Regional CCIR6 | Regional CCIR7 |
|---|---|---|---|---|---|---|
| 1 | maximum compliance | −0.50 σ < CCIRn | 31.9 | 31.5 | 32.3 | 32.1 |
| 2 | moderate compliance | −0.5 σ < CCIRn ≤ 0.5 σ | 50.9 | 51.3 | 50.1 | 51.2 |
| 3 | semi-compliance | 0.5 σ ≤ CCIRn ≤ 1.5 σ | 11.1 | 10.9 | 11.0 | 10.4 |
| 4 | moderate noncompliance | 1.5 σ ≤ CCIRn ≤ 2.5 σ | 6.1 | 3.6 | 3.9 | 6.3 |
| 5 | maximum noncompliance | CCIRn > 2.5 σ | - | 2.7 | 2.7 | - |

CCIR columns give the percentage of the counties’ area (%).
Table 6. CCIRn descriptive statistics in each county.
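| County | CCIRn | Mean | Median | Min | Max | Q1 | Q3 | IQR | σ² | σ |
|---|---|---|---|---|---|---|---|---|---|---|
| Piaseczno | CCIR4 | 0.0500 | 0.0327 | 0.0000 | 0.4356 | 0.0170 | 0.0620 | 0.0450 | 0.0028 | 0.0527 |
| Piaseczno | CCIR5 | 0.0403 | 0.0270 | 0.0000 | 0.3778 | 0.0141 | 0.0495 | 0.0354 | 0.0018 | 0.0425 |
| Piaseczno | CCIR6 | 0.0403 | 0.0271 | 0.0000 | 0.3778 | 0.0142 | 0.0495 | 0.0353 | 0.0018 | 0.0425 |
| Piaseczno | CCIR7 | 0.0398 | 0.0261 | 0.0000 | 0.3845 | 0.0137 | 0.0491 | 0.0354 | 0.0018 | 0.0429 |
| Sokólski | CCIR4 | 0.0249 | 0.0174 | 0.0000 | 0.4971 | 0.0091 | 0.0300 | 0.0209 | 0.0009 | 0.0304 |
| Sokólski | CCIR5 | 0.0208 | 0.0147 | 0.0000 | 0.3949 | 0.0078 | 0.0249 | 0.0172 | 0.0006 | 0.0254 |
| Sokólski | CCIR6 | 0.0209 | 0.0147 | 0.0000 | 0.3912 | 0.0077 | 0.0250 | 0.0172 | 0.0007 | 0.0256 |
| Sokólski | CCIR7 | 0.0201 | 0.0138 | 0.0000 | 0.3909 | 0.0072 | 0.0238 | 0.0166 | 0.0006 | 0.0255 |
| Sanocki | CCIR4 | 0.0290 | 0.0263 | 0.0000 | 0.1741 | 0.0148 | 0.0382 | 0.0234 | 0.0004 | 0.0204 |
| Sanocki | CCIR5 | 0.0348 | 0.0315 | 0.0000 | 0.1975 | 0.0176 | 0.0458 | 0.0282 | 0.0006 | 0.0243 |
| Sanocki | CCIR6 | 0.0291 | 0.0264 | 0.0000 | 0.1731 | 0.0149 | 0.0384 | 0.0235 | 0.0004 | 0.0204 |
| Sanocki | CCIR7 | 0.0279 | 0.0249 | 0.0000 | 0.1619 | 0.0141 | 0.0372 | 0.0231 | 0.0004 | 0.0196 |
| Słupski | CCIR4 | 0.0336 | 0.0257 | 0.0000 | 0.3280 | 0.0133 | 0.0422 | 0.0290 | 0.0011 | 0.0331 |
| Słupski | CCIR5 | 0.0283 | 0.0215 | 0.0000 | 0.2699 | 0.0110 | 0.0355 | 0.0245 | 0.0008 | 0.0280 |
| Słupski | CCIR6 | 0.0283 | 0.0215 | 0.0000 | 0.2684 | 0.0111 | 0.0354 | 0.0243 | 0.0008 | 0.0279 |
| Słupski | CCIR7 | 0.0269 | 0.0203 | 0.0000 | 0.2639 | 0.0105 | 0.0336 | 0.0232 | 0.0007 | 0.0269 |
| Ostrowski | CCIR5 | 0.0272 | 0.0191 | 0.0000 | 0.5289 | 0.0111 | 0.0328 | 0.0217 | 0.0009 | 0.0298 |
| Ostrowski | CCIR6 | 0.0272 | 0.0193 | 0.0000 | 0.5299 | 0.0111 | 0.0329 | 0.0218 | 0.0009 | 0.0298 |
| Ostrowski | CCIR7 | 0.0268 | 0.0188 | 0.0000 | 0.5389 | 0.0106 | 0.0320 | 0.0213 | 0.0009 | 0.0300 |
| Otwocki | CCIR6 | 0.0384 | 0.0310 | 0.0000 | 0.2429 | 0.0193 | 0.0479 | 0.0286 | 0.0009 | 0.0292 |
| Otwocki | CCIR7 | 0.0369 | 0.0295 | 0.0000 | 0.2472 | 0.0183 | 0.0460 | 0.0277 | 0.0008 | 0.0289 |
| Międzyrzecki | CCIR7 | 0.0217 | 0.0169 | 0.0000 | 0.2373 | 0.0084 | 0.0276 | 0.0192 | 0.0005 | 0.0223 |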
