Article

A Reputation Model of OSM Contributor Based on Semantic Similarity of Ontology Concepts

1 School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
2 Hunan Key Laboratory for Service Computing and Novel Software Technology, Hunan University of Science and Technology, Xiangtan 411201, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(22), 11363; https://doi.org/10.3390/app122211363
Submission received: 29 July 2022 / Revised: 25 October 2022 / Accepted: 7 November 2022 / Published: 9 November 2022
(This article belongs to the Section Earth Sciences)

Abstract

Because they are non-professionals, the reputation of Volunteered Geographic Information (VGI) contributors has an important impact on data quality. In the process of contributor reputation evaluation in OpenStreetMap (OSM), it is difficult to calculate the semantic similarity between the object versions contributed by volunteers. To address this issue, this paper proposes a model of contributor reputation based on the semantic similarity of ontology concepts. Firstly, contributors are classified into three categories using an improved WPCA-based feature dimension reduction and classification method, and an initial reputation is set for every OSM user in each class according to these categories and related research. Secondly, a concept ontology is constructed for OSM entities; the semantic similarity between object versions is then calculated from the similarity of concept attributes and the semantic distance between concepts. The contributor's evaluation reputation is computed by synthesizing the semantic, geometric, and topological similarity of object versions. Thirdly, the contributor's evaluation reputation and initial reputation are aggregated to obtain the contributor's reputation. Finally, the OSM data of Rutland, England, is used as an example to verify the validity of our model. The experimental results show that the proposed model obtains a more comprehensive contributor evaluation by incorporating the semantic similarity of ontology concepts, and the evaluation bias caused by purely semantic changes between versions is eliminated. Moreover, the obtained user reputation is positively correlated with data quality. The contributor reputation evaluation method proposed in this paper is an effective method for evaluating contributor reputation in OSM-like systems.

1. Introduction

Volunteered Geographic Information (VGI), proposed by Goodchild (2007) [1], provides a platform for geographic knowledge sharing and collaborative work [2] and has changed the traditional mode of acquiring and applying geographic information. Benefiting from its low cost and its ability to generate diversified and refined spatial data, VGI has gradually become an important supplement to traditional geographic information [3]. As the most representative VGI project, OSM has been one of the research hotspots in the field of geographic information for many years.
OSM is an open-source map project that provides free and accessible global geographic data [4]. Because OSM contributors vary widely in their level of geographic information knowledge and their contribution habits, they may introduce a large quantity of low-quality and false data, which leads to a certain degree of uncertainty in the description of the objective world. Academia and industry are concerned about data quality, which affects the application and promotion of OSM in projects with high quality requirements [5].
At present, many scholars have studied the quality of OSM data. Because the quality of crowdsourced geographic data is heterogeneous, Zhou et al. evaluated it from the perspectives of overall evaluation and individual evaluation [6]. Overall evaluation mainly compares the data with authoritative data [7,8], but this approach has limitations: authoritative data are difficult to obtain and are updated slowly. To overcome these limitations, many scholars began to explore evaluation methods for the individual quality of crowdsourced objects. This type of approach evaluates data quality by analyzing the evolution of the dataset itself rather than comparing it with external datasets. In view of the importance of contributors in VGI, many methods have been proposed to evaluate their reputation based on either direct or indirect scoring, where a direct-scoring-based evaluation method requires contributors to explicitly score the object. For example, Bishr et al. proposed an explicit evaluation mechanism for crowdsourced geographic information, stressing the notion of person–object transitivity: the degree of reliability associated with a person is reflected in the related entity [9]. This method needs information such as the social relations among contributors, which adds extra workload. Indirect scoring methods, in contrast, do not require explicit scores and have therefore attracted many scholars as an important approach to quality assessment. Fogliaroni et al. argued that data credibility can be used as a measure of contributor reputation (a contributor's reputation is the average credibility of their contributed data) and proposed a model that measures data credibility and contributor reputation by analyzing geometric and topological attributes [10]. Zhao et al. argued that the credibility of data is related to the reputation of the current and previous contributors and proposed a calculation model based on OSM user reputation to evaluate volunteered geographic information from the perspective of credibility [11]. To sum up, many scholars have proposed that the semantic similarity of objects should be considered in the process of user reputation calculation [10,11]. However, the above evaluation methods of contributor reputation have their advantages and disadvantages. Fogliaroni et al. proposed a reputation model that mentions three similarities (geometry, topology, and semantics), but semantics is not used in the actual calculation model and experiments. In the editing process, contributors frequently modify only the tags of an object; in this case, only the semantic properties of the object change, while the geometric and topological properties remain unchanged. If the semantic similarity between object versions is ignored, the evaluation of the volunteer's reputation will be inaccurate.
Many studies have shown that contributors' reputations have an important impact on the quality of the data they produce, so reputation can be used to evaluate data quality. However, there is still a lack of research on contributor reputation that considers the semantic similarity of an object's versions. To address these problems, this paper proposes a reputation model of OSM contributors based on the semantic similarity of ontology concepts. Firstly, the model analyzes the composition of a contributor's reputation and divides it into two parts: evaluation reputation and initial reputation [12]. The initial reputation is determined by the contributor's classified category, and the evaluation reputation is based on the semantic similarity of ontology concepts, geometric similarity, and topological similarity. Then, the initial reputation and the evaluation reputation are combined to obtain the contributor's reputation. Finally, the reputation evaluation model proposed in this paper is verified using the historical OSM data of Rutland County, England, as an example. The experimental results show that the proposed evaluation model can more accurately evaluate the quality of users' contributed objects, verifying the effectiveness of the model. The main contributions of this paper are as follows:
(1)
This study draws on ontology-related research in the knowledge domain, constructing a volunteered geographic information ontology and establishing a semantic similarity evaluation model for volunteer-contributed objects, then combining the semantic similarity of ontology concepts with geometric similarity and topological similarity to obtain the contributor's evaluation reputation;
(2)
An evaluation method of the contributor’s initial reputation is proposed. Firstly, contributors are classified by the improved WPCA-based feature dimension reduction and classification method. Then, an initial reputation is set for every OSM user in each class according to these categories and related research results. The effectiveness of this method is verified by experiments;
(3)
A comprehensive evaluation method of the contributor's reputation is proposed. This method effectively combines initial reputation and evaluation reputation: as the number of evaluations a contributor receives increases, the weight of the initial reputation decreases;
(4)
The validity of the contributor’s reputation model proposed in this paper is verified by experiments using the real historical data of OSM. The experiment shows that the contributor’s reputation is essentially positively correlated with the contributor’s initial reputation. Because the semantic similarity between object versions is considered, the quality of contributors’ tagged data can be evaluated more accurately.
This paper is divided into six sections. Related work is presented in Section 2. Then, in Section 3, we introduce the general idea of the model and the calculation method of contributor’s reputation. The experimental procedure and analysis results are presented in Section 4. Section 5 discusses the advantages, limitations, and future research. Finally, Section 6 provides a conclusion.

2. Related Work

2.1. The Quality of Crowdsourcing Geographic Data

Although VGI is a low-cost geographic information collection method and has been applied successfully, its long-standing disadvantage is that contributors provide data in an unconstrained way, so data quality cannot be guaranteed. According to recent studies on VGI data quality, overall evaluation is carried out mainly through comparison with authoritative datasets [13,14]. This line of research focuses mainly on the quality evaluation of road networks in several developed European countries [15,16,17]. The accuracy and coverage of these studies are high, but authoritative datasets are difficult to obtain and costly, which restricts the development of such methods. Individual evaluation analyzes the evolution of the data itself to provide indicators for evaluating data quality. For example, Barron et al. proposed a comprehensive framework for assessing the intrinsic quality of OSM based on historical data [5]. However, in the process of analyzing data evolution, Mooney et al. found that the way contributors marked or annotated objects could lead to serious problems, caused mainly by contributors manually selecting and misspelling object values from the ontology [18]. Therefore, semantic change is one of the factors that affect the quality of OSM data. To explore the impact of semantics on data quality [19], Ballatore et al. proposed a co-reference algorithm to calculate conceptual similarity in semantic networks [20]. Zhao et al. reduced semantic heterogeneity and improved semantic quality from an objective point of view by studying the modification patterns of feature classes in OSM [21].

2.2. The Contributor’s Reputation of Crowdsourcing Geographic Data

On the basis of the credibility principle (as user editing increases, the authenticity of the object also increases), the quality of VGI can be measured according to the reliability of contributors [22]. For example, when an object is edited by several reliable contributors, the data quality of the object is higher. Adler et al. proposed a content-driven reputation system in which a user's reputation depends on how long their content remains intact and how often it is modified. The results show that users with low reputation provide data that are of lower quality and are modified more frequently [23]. It can be seen from the above literature that a contributor's reputation affects the quality of the data they produce, and it can be used as a basis for evaluating and guaranteeing data quality [24]. The contributor's reputation is thus one of the common means of evaluating the quality of crowdsourced geographic data [25].
Flanagin et al. proposed the construction of contributor indicators, such as contributor reputation, which can evaluate VGI quality in most cases [26]. Goodchild et al. proposed ensuring the quality of VGI through highly reliable contributors [27]. Lodigiani et al. adapted the PageRank algorithm used for web pages to obtain contributor reputation, evaluating reputation through personalized vectors according to contributors' behaviors [28]. Zhang et al. proposed an evaluation-based weighted page ranking (EWPR) algorithm based on PageRank to provide a ranking measure of reputation [2]. Ghasemi Nejad et al. proposed a comprehensive reputation system based on various implicit evaluations of volunteers in the contribution process [29]. D'Antonio et al. studied the factors that affect data quality evaluation and proposed a method to evaluate the quality of VGI based on direct, indirect, and temporal effects, from which credibility and contributor reputation can be calculated automatically [30]. Keßler et al. used five indicators, namely contributor, version, confirmation, correction, and modification, as credibility measures and calculated the object credibility score and contributor reputation value through the evolution process of the object, which is helpful for evaluating data quality [31]. Zhao et al. proposed a spatio-temporal VGI model considering contributor reputation and object credibility by analyzing the elements that affect VGI, such as geographic entities, state versions, contributors, reputation, and geographic events, as well as the interaction mechanisms between them [32].

2.3. The Ontology of Crowdsourcing Geographic Data

In recent years, many studies have introduced the concept of ontology into the GIS field and proposed the concept of geographic ontology. The main goal of geographic ontology is to form a hierarchical system of the various objects and relationships that constitute geographic space [33]. The concept of ontology originates from the field of philosophy, where it refers mainly to the nature of the existence of things and the description of the laws of existence. With the development of science and technology, the concept has been applied in the computer field: a certain domain of the real world is abstracted into a set of concepts and the relationships between them, and the ontology of that domain is constructed. At present, semantic similarity measures based on ontology concepts can be divided into three categories. The feature-based similarity measure is based on the set theory proposed by Tversky [34]; it focuses mainly on the similarity between the attributes of two concepts and has advantages in calculating the semantic similarity of hierarchical classifications with complex structures, but it needs to assign weights to each independent item. The IC-based similarity measure evaluates semantic similarity on the basis of the information content of concepts, which relies on a large amount of prepared data to discover the heterogeneous meaning of each concept [35]. The edge-based similarity measure calculates the links or depth between concepts; this method is relatively simple and computationally cheap [36].

3. Methods

3.1. Model Overview

To fill the semantic gap in contributor reputation evaluation, this paper proposes a reputation model of OSM contributors based on the semantic similarity of ontology concepts. The specific process is shown in Figure 1.
The contributor's reputation is divided into two main parts: initial reputation and evaluation reputation. Initial reputation refers to the reputation of a contributor as a new user before they participate in contribution, which depends mainly on contribution habits, registration time, and other factors. Contributors are classified by the improved WPCA-based feature dimension reduction and classification method [37], from which the different types of contributors and their initial reputations are obtained. Evaluation reputation refers to the evaluation of a user by other users in the process of contributing data, which depends mainly on factors such as the quality and quantity of their contributed data. The model compares the similarities between two adjacent versions and continually aggregates the evaluations of contributors during the editing process to obtain the contributor's evaluation reputation. Finally, the model aggregates the initial reputation and evaluation reputation of a contributor to obtain the contributor's reputation.

3.2. Contributor’s Initial Reputation

The contributor's initial reputation is their reputation before they participate in contributions. Different types of users provide data of different quality; for example, the data quality provided by professional contributors is higher than that provided by novice or unskilled contributors [38]. Therefore, the method adopted in this paper is to divide the contributors in the region into different types and to determine a different initial reputation for each type of contributor, so as to obtain more accurate evaluation results of contributor reputation. In existing research, Boakes et al. grouped volunteers according to three indicators: activity, contribution duration, and contribution cycle change [39]. Jacobs et al. proposed a method combining PCA and K-means to classify contributors into four categories [38]. However, contributor classification is implemented on the basis of various contributor characteristics; because these characteristics differ in type and dimension, a unified normalization method leads to classification features that are not obvious.
In response to the above problems, Zhao et al. proposed the improved WPCA-based feature dimension reduction and classification method to cluster contributors with similar features into four types, namely novices, unskilled contributors, major contributors, and professional contributors [37]. On the basis of that study, this paper determines different initial reputations according to the data quality provided by three types of contributors. In current research on OSM user reputation, such as Zhao and Van Exel [32,40], the initial value of user reputation is mostly set to a fixed value. The research of Kealeboga Moreri et al. shows that the reliable dataset in OSM is provided mainly by three-quarters of the contributors [41]. Therefore, according to this statistic, this paper classifies contributors into three categories [37]: novice and unskilled contributors, major contributors, and professional contributors, and sets the initial reputation values of the three categories to 0.75, 0.9, and 1, respectively.
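As a concrete illustration, the mapping from contributor category to initial reputation can be expressed in a few lines of Python. This is a minimal sketch, not the authors' implementation; the category labels and the lookup function are illustrative, and the actual classification is performed by the improved WPCA-based method [37].

```python
# A minimal sketch of assigning initial reputation by contributor category,
# using the values adopted in this paper (0.75, 0.9, 1.0). The category labels
# are illustrative placeholders; the real classification uses the improved
# WPCA-based feature dimension reduction and classification method [37].

INITIAL_REPUTATION = {
    "novice_or_unskilled": 0.75,
    "major": 0.90,
    "professional": 1.00,
}

def initial_reputation(category: str) -> float:
    """Return the initial reputation R_o(u) for a contributor category."""
    return INITIAL_REPUTATION[category]

if __name__ == "__main__":
    print(initial_reputation("major"))  # 0.9
```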

3.3. Contributor’s Evaluation Reputation

Since each OSM object is edited by contributors with different levels of cartographic experience [42], each OSM object contains multiple versions. During the editing of each object, the factors that affect data quality change continually, including semantic, geometric, and topological changes. In this process, evaluations of the current contributor by the next contributor are generated continuously. For example, user Ui generates version Vi, and user Ui+1 modifies version Vi to generate version Vi+1. By comparing the similarities between Vi+1 and Vi, the evaluation of Ui by Ui+1 can be obtained. The greater the similarity between Vi+1 and Vi, the smaller the modification made by Ui+1 to Ui's version, which means that Ui+1 has a high degree of recognition of Ui's work and Ui's evaluation reputation is also high, and vice versa.

3.3.1. Constructing Geographic Ontology

Ontology is used to express the semantics of volunteered geographic information, and concepts play two roles: organizing concepts and describing concepts. Organizing concepts define the concepts and the architecture of the ontology and organize the concepts into the architecture of the geographic information ontology. First, the relationships among the concepts are defined by referring to the OpenStreetMap Data in Layered GIS Format [43] and the description of features on the OSM Wiki [44], as shown in Table 1. Then, the tree architecture of the volunteered geographic information ontology is constructed with a top-down classification, which can reflect the different abstraction levels of volunteered geographic information. The classification tree structure is currently the most commonly used method to construct a hierarchical system. Its characteristic is that each concept can be placed at a different level according to its characteristics, with exactly one parent class and multiple subclasses, so the relationships between nodes (parent–child nodes, sibling nodes, etc.) can be clearly distinguished. The distance between concepts can then be calculated from the node relationships for the calculation of semantic similarity. Kuhn discussed the semantic modeling of geographic categories from the idea of conceptual integration [45] and argued that the advantage of an ontology organized as a tree structure or concept hierarchy is that semantic similarity can be measured on the basis of the distance between concepts [46]. This paper first defines the broadest concepts in the volunteered geographic information field, such as geographic information entities at the top, followed by roads, main roads, and highways, and then refines them; finally, each concept is organized into a classification tree. As shown in Figure 2, the nodes in the classification tree represent volunteered geographic information concepts, and the edges represent IS-A semantic relations. The description of a volunteered geographic information entity starts from the top layer and becomes more detailed as it proceeds downward.
Describing concepts describe the definition, semantic relationships, nature, and attributes of volunteered geographic information concepts through an ontology conceptual framework. This study first defines an ontology conceptual framework to describe concepts, forming a conceptual framework system, and then determines the nature and attributes of each concept. There are two ways to define the properties and attributes of a concept in the ontology: one is to inherit the properties and attributes of its parent class through an inheritance mechanism, and the other is to define the properties and attributes unique to the concept. In defining the properties and attributes of concepts, this study pays attention to the standardization of naming and avoids situations where the same property or attribute is named differently. The ontology conceptual framework includes the definition, semantic relationships, nature, and attributes of a concept, taking trunk as an example:
trunk IS-A highway
{Definition: Important roads, typically divided;
Semantic relationship: IS-A relationship with highway;
Nature: roads, such as elevated expressways, airport inbound expressways, river-crossing tunnels, and expressways on bridges;
Attribute: road width; road speed limit.}
In the above description, the definition is the specific description of the concept, and the semantic relationship is the relationship between concepts. The nature is the essential feature shared by all instances of a concept, which is used to distinguish different concepts. An attribute is a non-essential feature of a concept, which is used to distinguish different instances of the same concept. After the organizing and describing steps, the relationships between concepts and the semantics of each concept are described, and the volunteered geographic information ontology with a classification tree structure is constructed, as shown in Figure 3.
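As an illustration of how such an ontology could be held in memory, the following is a minimal sketch, not the authors' implementation: each describing-concept entry stores the definition, nature, and attributes from the trunk example above, and the IS-A relationship is recorded as a reference to the parent concept. All field values are illustrative.

```python
# A minimal sketch of a describing-concept entry and its IS-A link, following
# the trunk example above. This is an illustrative representation, not the
# authors' implementation.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Concept:
    name: str
    parent: Optional[str]                 # IS-A relationship to the parent concept
    definition: str = ""
    nature: str = ""
    attributes: List[str] = field(default_factory=list)

ONTOLOGY = {
    "highway": Concept("highway", parent="geographic information entity"),
    "trunk": Concept(
        "trunk",
        parent="highway",
        definition="Important roads, typically divided",
        nature="roads such as elevated expressways and river-crossing tunnels",
        attributes=["road width", "road speed limit"],
    ),
}

print(ONTOLOGY["trunk"].parent)  # "highway": trunk IS-A highway
```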

3.3.2. Semantic Similarity of Object Version

Semantics mainly describe the nongeometric characteristics of features, and semantic change between versions is measured by semantic similarity. In this paper, each entity is classified according to the classification rules, and the classified object concepts form a classification tree structure, as shown in Figure 3. Because the hierarchical classification is simple, this paper chooses the edge-based similarity measure: the semantic similarity between two concepts is obtained by calculating the links or depths between them. Concept attributes and the concept tree structure are the main factors that affect ontology-based semantic similarity. In this paper, the similarity of concept attributes and the semantic distance in the concept tree structure are used as the indicators to calculate the semantic similarity between concepts.
(1)
Similarity of concept attribute.
Entities are described by concept attributes, and different concepts can be distinguished to a certain extent by the differences between their attributes. It is generally believed that the more attributes two concepts share, the higher the similarity between them. In this paper, the tag repetition rate is only one aspect of the semantic similarity between two concepts. The literature [30,47] notes that each OSM element has a set of tags, and each tag is a key–value pair describing its attributes in the specific expression of OSM, as shown in Table 1. Because concept attributes are expressed by tags, it is appropriate to use the tag repetition rate as one of the metrics for the semantic similarity between two concepts (a short code sketch of both indicators in this subsection is given after the formulas below). The similarity of concept attributes between versions x and y is expressed by the tag repetition rate, which is calculated as follows:
$Sim_{tag} = \dfrac{\left| tag_x \cap tag_y \right|}{\left| tag_x \cup tag_y \right|}$
(2)
Semantic distance of concept tree structure.
Semantic distance is another important indicator for measuring the semantic similarity of ontology concepts: the smaller the semantic distance between concepts, the higher their similarity. Therefore, this paper uses the edge-based similarity measurement method and takes the shortest path length between features in the ontology structure graph as the semantic distance indicator. The semantic distance is defined as the sum of the shortest path lengths from the lowest common ancestor node LCA(X,Y) of two concepts to X and Y, respectively [48]:
$distance = length(LCA(X,Y), X) + length(LCA(X,Y), Y)$
To sum up, the similarity of concept tree structure can be obtained, and the specific calculation is as follows [48]:
$Sim_{structure} = \dfrac{\mu}{distance + \mu}$
where μ is determined according to the maximum value of the shortest path length between two concepts in the concept tree structure. Combining the similarity of concept attribute and the similarity of concept tree structure, the semantic similarity based on ontology concept is calculated by weighted summation [48]:
$Sim_{semantic} = \omega \cdot Sim_{tag} + (1 - \omega) \cdot Sim_{structure}$
where ω is the weight used to calculate the ontology-based semantic similarity of concepts, which is a real number in the interval (0, 1).
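The following is a minimal Python sketch, not the authors' implementation, of the two indicators above and their weighted combination: the tag repetition rate as a Jaccard similarity over key–value pairs, the edge-based semantic distance obtained via the lowest common ancestor in a small illustrative concept tree, and the combined semantic similarity. The concept names in PARENT and the convention of matching tags by full key–value pairs are assumptions; μ = 5 and ω = 0.8 follow the experimental settings reported in Section 4.

```python
# Minimal sketch of Sim_tag, the edge-based semantic distance, Sim_structure,
# and Sim_semantic. PARENT is a toy IS-A tree; the real concepts come from the
# ontology in Figure 3.

from typing import Dict, List, Optional

PARENT: Dict[str, Optional[str]] = {
    "geographic entity": None,
    "highway": "geographic entity",
    "trunk": "highway",
    "motorway": "highway",
    "building": "geographic entity",
    "school": "building",
}

def sim_tag(tags_x: Dict[str, str], tags_y: Dict[str, str]) -> float:
    """Sim_tag = |tag_x ∩ tag_y| / |tag_x ∪ tag_y| over key=value pairs."""
    set_x, set_y = set(tags_x.items()), set(tags_y.items())
    union = set_x | set_y
    return 1.0 if not union else len(set_x & set_y) / len(union)

def ancestors(node: str) -> List[str]:
    """Path from a concept up to the root, including the concept itself."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def semantic_distance(x: str, y: str) -> int:
    """Sum of the shortest path lengths from LCA(X, Y) to X and to Y."""
    path_x, path_y = ancestors(x), ancestors(y)
    lca = next(a for a in path_x if a in path_y)
    return path_x.index(lca) + path_y.index(lca)

def sim_structure(x: str, y: str, mu: float = 5.0) -> float:
    return mu / (semantic_distance(x, y) + mu)

def sim_semantic(tags_x, tags_y, x, y, omega: float = 0.8) -> float:
    return omega * sim_tag(tags_x, tags_y) + (1 - omega) * sim_structure(x, y)

# Example: two versions of the same object where only the name tag differs.
v1 = {"highway": "trunk", "name": "A606"}
v2 = {"highway": "trunk", "name": "A606 road"}
print(round(sim_semantic(v1, v2, "trunk", "trunk"), 3))  # 0.8*(1/3) + 0.2*1.0 ≈ 0.467
```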

3.3.3. Geometric Similarity of Object Version

Geometric features are used mainly to describe the shape and position of features, and geometric similarity is evaluated mainly by calculating a series of geometric features. The method based on the turning function distance directly uses the shape points of the object boundary to calculate the shape similarity [49] to represent the rotation change of the object shape more intuitively. Therefore, the geometric similarity in this paper is determined mainly by calculating the shape similarity between adjacent versions based on the turning function distance.
As shown in Figure 4, θ is the azimuth angle of the starting edge, α is the turning angle, and Θ(l) is the turning angle function of curve A. The x-axis represents the length of the curve, and the y-axis represents the angle between the curve and the left turn tangent. If an edge of the curve rotates counterclockwise, the turning angle is added to the current angle; otherwise, it is subtracted. The turning function distance between object versions A and B can be defined as [49]:
$TF(A,B) = \left( \int \left| \Theta_A(l) - \Theta_B(l) \right|^{p} \, dl \right)^{1/p}$
where the value of p is usually 2. When calculating the shape similarity by the turning function distance, the smaller the turning function distance, the greater the shape similarity. Therefore, the shape similarity of object versions A and B is defined as [49]:
$Sim_{shape} = 1 - \dfrac{TF(A,B)}{maxA_{A,B} - minA_{A,B}}$
where $maxA_{A,B}$ and $minA_{A,B}$ represent the maximum and minimum turning angles of A and B, respectively.
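A simplified discrete sketch of this computation is given below (with p = 2). It samples each polygon's cumulative turning angle as a step function of normalized arc length and normalizes the distance by the spread of turning angles; the alignment of starting points and rotation normalization used in a full turning-function comparison are omitted for brevity, so this is an illustrative approximation under those assumptions rather than the authors' exact procedure.

```python
import math

def turning_function(polygon, samples=256):
    """Cumulative turning angle sampled at `samples` evenly spaced points of
    normalized arc length, for a closed polygon given as (x, y) vertices."""
    n = len(polygon)
    edges = [(polygon[(i + 1) % n][0] - polygon[i][0],
              polygon[(i + 1) % n][1] - polygon[i][1]) for i in range(n)]
    lengths = [math.hypot(dx, dy) for dx, dy in edges]
    total = sum(lengths)
    headings = [math.atan2(dy, dx) for dx, dy in edges]

    # Cumulative turning angle at the start of each edge (a step function).
    angles = [headings[0]]
    for i in range(1, n):
        turn = headings[i] - headings[i - 1]
        # Wrap the turn into [-pi, pi): counterclockwise adds, clockwise subtracts.
        turn = (turn + math.pi) % (2 * math.pi) - math.pi
        angles.append(angles[-1] + turn)

    # Normalized arc-length position at which each edge starts.
    starts = [sum(lengths[:i]) / total for i in range(n)]

    values = []
    for k in range(samples):
        s = k / samples
        i = max(j for j in range(n) if starts[j] <= s)  # edge containing s
        values.append(angles[i])
    return values

def sim_shape(poly_a, poly_b, samples=256):
    """Sim_shape = 1 - TF(A, B) / (max turning angle - min turning angle)."""
    ta, tb = turning_function(poly_a, samples), turning_function(poly_b, samples)
    tf = math.sqrt(sum((a - b) ** 2 for a, b in zip(ta, tb)) / samples)
    spread = max(ta + tb) - min(ta + tb)
    return 1.0 if spread == 0 else 1.0 - tf / spread

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
near_square = [(0, 0), (1.05, 0), (1, 1), (0, 0.95)]
print(round(sim_shape(square, near_square), 3))  # high (close to 1) for similar shapes
```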

3.3.4. Topological Similarity of Object Version

Topology is used mainly to describe the context of spatial changes between features and their adjacent versions. When both geometric changes and topology changes are considered, the spatial changes between object versions can be described more comprehensively. If the geometric characteristics of the object change, this may affect its spatial position, and then its topology will also change. For example, as shown in Figure 5, the two object versions Ai and Bi are close in space and overlap in topology, but when the geometric characteristics of the object version Bi+1 change, its object versions Ai+1 and Bi+1 change from overlap to disjoint in topology.
On the OSM platform, modifications of geographic objects always occur within the same type of object, such as line or area objects. The most important topological relationships between two area objects are overlap, disjoint, meet, equal, etc. This study mainly calculates the topological similarity between different versions of the same object. The area overlap rate can quantify the topological relationship between versions to a certain extent, so this study considers only the area overlap rate as the measure of topological similarity. For a line object, a buffer is set around the line, and the area overlap rate of the buffer is calculated. The centroid distance and the area size between two adjacent versions of an object are taken as the influencing factors of the area overlap rate. If the centroid positions of the two object versions are A(Lat_A, Lon_A) and B(Lat_B, Lon_B), respectively, the distance L_{A,B} between the two centroids is:
$L_{A,B} = 2R \cdot \arcsin \left( \sqrt{ \sin^2 \left( \frac{Lat_A - Lat_B}{2} \right) + \cos(Lat_A) \cos(Lat_B) \sin^2 \left( \frac{Lon_A - Lon_B}{2} \right) } \right)$
where R is the radius of the earth (approximately 6400 km), Lat_A and Lon_A are the latitude and longitude of centroid A, and Lat_B and Lon_B are the latitude and longitude of centroid B. In addition to the centroid distance, the area size is also used as a measure of the area overlap rate, which mainly considers the overlap of the object area between two adjacent versions. The specific calculation is as follows [49]:
$Sim_{topology} = \dfrac{Area_{a \cap b}}{Area_{max}(Area_a, Area_b)}$
where $Area_{a \cap b}$ is the size of the intersection area between version a and version b, and $Area_{max}(Area_a, Area_b)$ is the larger of the two version areas. The factors that affect a single evaluation value are the ontology-concept-based semantic similarity, the geometric similarity, and the topological similarity; these three factors are important indicators of the quality of the data contributed by contributors. The one-time evaluation $R_c(u)$ is calculated as:
$R_c(u) = \dfrac{Sim_{semantic} + Sim_{shape} + Sim_{topology}}{3}$
The one-time evaluation is a single evaluation of one object contributed by a user, but an OSM user will contribute multiple objects and will therefore be evaluated many times, so the evaluation reputation is the aggregation of the single evaluations. For example, if a user has been evaluated m times, the evaluation reputation $R_e(u)$ of the contributor is calculated as follows [32]:
$R_e(u) = \dfrac{\sum_{i=1}^{m} R_c(u)}{m}$
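The sketch below illustrates these last steps under the same caveats as before: the haversine centroid distance, the area overlap rate, the one-time evaluation R_c(u), and the aggregated evaluation reputation R_e(u). Computing the actual polygon intersection area (e.g., with a geometry library) is outside the scope of the sketch, so the intersection area is passed in directly; all numeric values in the example are illustrative.

```python
# A minimal sketch of the topological similarity (area overlap rate), the
# single evaluation R_c(u), and the aggregated evaluation reputation R_e(u).
# The haversine centroid distance is included for completeness.

import math

EARTH_RADIUS_KM = 6400.0  # approximation used in the paper

def centroid_distance(lat_a, lon_a, lat_b, lon_b):
    """Haversine distance L_{A,B} (km) between two centroids given in degrees."""
    lat_a, lon_a, lat_b, lon_b = map(math.radians, (lat_a, lon_a, lat_b, lon_b))
    h = (math.sin((lat_a - lat_b) / 2) ** 2
         + math.cos(lat_a) * math.cos(lat_b) * math.sin((lon_a - lon_b) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def sim_topology(area_a, area_b, area_intersection):
    """Sim_topology = Area_{a∩b} / max(Area_a, Area_b)."""
    return area_intersection / max(area_a, area_b)

def single_evaluation(sim_semantic, sim_shape, sim_topology_value):
    """One-time evaluation R_c(u): mean of the three similarities."""
    return (sim_semantic + sim_shape + sim_topology_value) / 3

def evaluation_reputation(single_evaluations):
    """R_e(u): mean of all single evaluations received by the contributor."""
    return sum(single_evaluations) / len(single_evaluations)

print(round(single_evaluation(0.54, 0.92, 0.85), 3))       # 0.77
print(round(evaluation_reputation([0.77, 0.9, 0.81]), 3))  # 0.827
```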

3.4. Contributor’s Reputation

A contributor's reputation is composed of the initial reputation and the evaluation reputation. As the amount of data provided by a contributor increases, the main component of the contributor's reputation gradually shifts from the initial reputation to the evaluation reputation. If a user has not yet contributed data, the user has not been evaluated by other users, and the contributor's reputation is determined by the initial reputation. If a user contributes little data, the user receives few evaluations, and the impact of the evaluation reputation on the contributor's reputation is small, which may not truthfully reflect the contributor's reputation; in this case, the reputation is based mainly on the initial reputation, supplemented by the evaluation reputation. As the amount of contributed data increases, the user receives more and more evaluations; the impact of the initial reputation on the contributor's reputation becomes smaller, the evaluation reputation approaches the contributor's reputation, and the composition of the contributor's reputation becomes dominated by the evaluation reputation, supplemented by the initial reputation. The contributor's reputation is obtained by a weighted combination of the initial reputation and the evaluation reputation [32]:
$R(u) = R_o(u) \cdot e^{-m/M} + R_e(u) \cdot \left( 1 - e^{-m/M} \right)$
where $R_o(u)$ is the initial reputation of the user, m is the number of times the user has been evaluated, and M is a positive natural number constant that controls the weights of the initial reputation and the evaluation reputation in the contributor's reputation; M is determined by the average number of times a user has been evaluated.
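A minimal sketch of this weighting is shown below. The exponential decay e^{-m/M} is taken from the formula above; M = 23 follows the experimental setting reported in Section 4, and the example reputation values are illustrative.

```python
# A minimal sketch of combining initial reputation and evaluation reputation.

import math

def contributor_reputation(r_initial, r_evaluation, m, M=23):
    """R(u) = R_o(u) * e^(-m/M) + R_e(u) * (1 - e^(-m/M)).

    As the number of evaluations m grows, the weight e^(-m/M) of the initial
    reputation decays and the evaluation reputation dominates.
    """
    w = math.exp(-m / M)
    return r_initial * w + r_evaluation * (1 - w)

# A major contributor (initial reputation 0.9) with few vs. many evaluations.
print(round(contributor_reputation(0.9, 0.8, m=2), 3))    # still close to 0.9
print(round(contributor_reputation(0.9, 0.8, m=100), 3))  # close to R_e = 0.8
```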

4. Experiment and Analysis

In this paper, the assessments among contributors are derived from modifications of OSM objects, so the dataset must contain the edit history, in both spatial and attribute aspects, of all objects. The experimental data are therefore the OSM historical data of Rutland, England (obtained from https://download.geofabrik.de/), as of 12 April 2022. The satellite map data are from Google Maps, as shown in Figure 6a. In this experiment, 724 volunteers participated in the contribution. Because line objects are processed in a similar way to polygon objects, only polygon objects are used in the experiment, and 16,794 different object versions were parsed from the OSM historical data. The dataset contains all the main data types, such as roads and buildings, and is rich enough to meet the basic requirements of our study; the version distribution is shown in Figure 6b.
According to the contributor reputation evaluation model based on the semantic similarity of ontology concepts, the semantic similarity is calculated from the tag repetition rate and the shortest path, the geometric similarity is calculated mainly from the shape similarity based on the turning function distance, and the topological similarity mainly considers the area overlap rate; the reputation of every contributor who participated is then calculated. Through many experiments, the influence of each factor on the model was analyzed, and the parameters of the model were finally set to μ = 5, ω = 0.8, and M = 23.
By parsing the historical data of OSM in Rutland, England, a total of 724 users participated in this reputation evaluation model. The users are distributed in the range of 0.5–1 according to their reputation, and they are divided into five intervals in 0.1 increments. The number of people in each interval is 5, 22, 124, 549, and 24, respectively. It can be seen from the number of people in each interval that the distribution of contributors’ reputation essentially obeys the normal distribution.
As shown in Table 2, the initial reputation of the five users in the range of 0.5–0.6 is 0.75, which means that all five users are novice or unskilled contributors. In the range of 0.6–0.7, most users are novice or unskilled contributors (initial reputation 0.75), although a few major contributors (initial reputation 0.9) begin to appear. In the range of 0.7–0.8, most users are still novice or unskilled contributors (initial reputation 0.75), but the number of major contributors (initial reputation 0.9) starts to increase. In the range of 0.8–0.9, most users are major contributors (initial reputation 0.9), with few novice or unskilled contributors (initial reputation 0.75) and professional contributors (initial reputation 1). Within the range of 0.9–1, all users are professional contributors (initial reputation 1). On the whole, the contributor's reputation is essentially positively correlated with the user's initial reputation: the higher the initial reputation of a contributor, the greater the probability of a high reputation.
In order to verify the experimental results, this experiment attached the contributor’s reputation to their contributed way objects and displayed way objects in different colors on ArcGIS according to the distribution interval of the contributor’s reputation. The effect is shown in Figure 7. Since it is difficult to verify the accuracy of each way object one by one, the sampling analysis method is selected in each reputation interval in this experiment. Firstly, five contributors are selected in each reputation interval with the interval of 0.1 to analyze their contributions, and a total of 25 contributors are sampled to participate in the evaluation, including 3417 valid versions. Finally, according to the sampling situation, the contribution data effect can be divided into three categories: error (single evaluation value less than 0.5), poor (single evaluation value between 0.5 and 0.75), and good (single evaluation value greater than 0.75). Among the sampled data, there are 17 error data cases, 337 poor data cases, and 3063 good data cases.
According to the sampled data results, as shown in Table 3, the OSM data quality in this region is generally good, with only a few error data cases. Among the data provided by contributors with low reputation, the proportion of error and poor data is high; for example, when the reputation of contributors is lower than 0.8, there are 16 error data cases, accounting for approximately 94.1% of the error data. As contributors' reputation improves, the proportion of high-quality contributions gradually increases and the proportion of error data shows a downward trend; for example, when the contributor's reputation is higher than 0.8, there are 2948 contributed data objects, accounting for 96.2% of the good data. Within the reputation interval of 0.9–1, there is no error data. It can be seen that data quality increases with the reputation of contributors, and there is a positive correlation between data quality and contributor reputation.
Three examples are selected to further verify the effectiveness of the proposed model. Figure 8a shows the object with id 4406271, contributed by the user with id 280348. It can be clearly seen from the image that this is error data: the object lies below the real reference dataset and does not match it in geometry or topology. Figure 8b shows the object with id 43407756, contributed by the user with id 12355. This object matches the real dataset well in terms of geometric and topological similarity, is more in line with the real situation on the ground, and has high data quality. Figure 8c shows that the object with id 3700148, contributed by the user with id 1377, matches the actual object well in geometry and topology, but the object tag provided by the contributor is amenity = school, which has a semantic deviation. Therefore, combining semantic similarity in the evaluation of contributor reputation can improve the accuracy of the evaluation. The scores of the specific evaluation indicators (semantic similarity, geometric similarity, and topological similarity) and the contributor's reputation are shown in Table 4 and Table 5, respectively. According to the superposition example in Figure 8c, when the geometric and topological similarities are high but the semantic features deviate, the data quality of the object may still not be good, so this model can evaluate the data quality provided by contributors more accurately.
According to the data of Rutland, England, the results show that the higher the initial reputation of a contributor, the greater the probability of a high reputation, which indicates that the initial reputation evaluation method proposed in this paper is reasonable. In general, data quality increases with contributors' reputation, and there is a positive correlation between the two. The experiment proves that the model proposed in this paper can evaluate the data quality provided by contributors more accurately, especially when the semantic features change. Considering the semantic similarity of versions makes the calculation of the contributor's evaluation more comprehensive and eliminates the evaluation bias caused by purely semantic changes between versions. Generally speaking, the contributor reputation evaluation method proposed in this paper is an effective method for evaluating contributor reputation in OSM-like systems.

5. Discussion

Contributor’s reputation in VGI has received the attention of many scholars in recent years [29,30,31,32]. These studies show that contributor’s reputation is highly correlated with data quality. However, most studies did not consider the impact of the semantics of volunteer contribution objects on contributor’s reputation and data quality. In this study, the ontology of volunteered geographic information is constructed, and the semantic similarity evaluation model for evaluating volunteer contribution objects is established. The evaluation reputation of a contributor is calculated by combining the semantic similarity, geometric similarity, and topological similarity between object versions. In this study, the relationship between contributor’s reputation and data quality was further verified by sampling experiments.
Many scholars consider initial reputation and evaluation reputation as two important components of contributor’s reputation [49]. User’s initial reputation has an important impact on a contributor’s reputation, and how to determine a user’s initial reputation is very critical to this study.
It can be seen from the reputation distribution of contributors in Rutland County that contributors with high reputation (reputation greater than 0.8) account for the majority (79%) of contributors. Most of the contributors with high reputation also have a high initial reputation, which is generally positively correlated. As shown in Table 2, this result shows that the OSM data in this study area are relatively mature, which is in line with expectations. Because this study area is located in the United Kingdom, one of the most developed OSM regions [27], the reputation of OSM contributors is generally good [2]. This paper determines the initial reputation of novice and unskilled contributors as the same value. Although this method is proved to be effective, it needs further experimental verification to determine whether the effect is better after subdivision.
In this study, the semantic similarity of ontology concepts, the geometric similarity, and the topological similarity between object versions are combined to calculate contributors' evaluation reputations; in this process, the data selection and weight setting are sensitive. The assessments among contributors are derived from modifications of OSM objects, which imposes some restrictions on the dataset: this model mainly deals with OSM-like datasets, which must contain a history of modifications to the location and properties of objects. Regarding the determination of the variables, the values selected in this paper are verified by experiments. For example, μ is determined on the basis of the maximum value of the shortest path length between two concepts in the concept tree structure, and M is determined by the average number of times a user has been evaluated; different systems will require different values of M. Sampling analysis is designed to verify the accuracy of the statistical results. Because it is difficult to verify the accuracy of each way object one by one, random sampling analysis is used in our experiments, and each reputation interval is sampled to reduce bias and verify the accuracy of the experimental results. This study mainly calculates the similarity between different versions of the same object. The area overlap rate can quantify the topological relationship between versions to a certain extent, so this study considers only the area overlap rate as the metric of topological similarity for the time being. This method cannot distinguish between the topological relationships of disjoint and meet, and quantifying these two cases can be considered in future work.
The results of the sampling experiment show that users with higher reputation tend to provide higher-quality data, as shown in Table 3. The overall high quality of data in the region verifies the conclusion that the reputation of OSM contributors in the region is generally high. A contributor's evaluation depends mainly on factors such as semantic similarity, geometric similarity, and topological similarity; as shown in Figure 8c, when a way object matches the actual object well in terms of geometric and topological relationships but has a semantic deviation, the contributor's reputation is lowered. The model proposed in this paper considers only the semantic changes of feature-class tags and the linguistic changes of other tags of OSM objects, rather than the semantic changes of non-feature-class tags, which needs further improvement.

6. Conclusions

At present, volunteered geographic information is widely used in many fields, but it still contains some malicious, erroneous, and low-quality data because contributors are non-professional and their contributions are non-standardized. Contributor reputation has an important impact on geospatial data quality. However, there is a lack of research on contributor reputation based on the semantic similarity of contributions. To solve these problems and calculate contributor reputation more effectively, this paper draws on ontology-related research in the knowledge domain and proposes a contributor reputation evaluation model based on the semantic similarity of ontology concepts.
The reputation model is divided into two parts: initial reputation and evaluation reputation. The initial reputation is determined by the user category obtained with the improved WPCA-based feature dimension reduction and classification method. The evaluation relationships among users are based on their modifications of OSM objects, and a quantitative method for this evaluation relationship is designed. Firstly, by constructing a geographic ontology, the semantic similarity is calculated on the basis of the attribute similarity of ontology concepts and the semantic distance between concepts. Then, the contributor's evaluation reputation is obtained by further integrating the geometric similarity and topological similarity. Next, the initial reputation and evaluation reputation of a user are combined to obtain their reputation. Finally, the historical OSM data of Rutland, England, is used as an example to calculate the reputation of every user, and the experimental results are assessed by sampling analysis. The analysis shows that there is a positive correlation between data quality and contributors' reputation. In addition, the model is proved to be a more effective method for evaluating data quality when semantic features change, and it provides an effective metric for evaluating the semantic similarity between object versions in contributor reputation evaluation.

Author Contributions

Conceptualization, Y.Z. and X.W.; methodology, Y.Z. and Y.L.; software, X.W.; validation, X.W.; formal analysis, Y.Z.; investigation, Y.Z.; resources, Y.Z.; data curation, X.W.; writing—original draft preparation, Y.Z and X.W.; writing—review and editing, Y.Z., Y.L. and Z.L.; visualization, X.W.; supervision, Y.Z.; project administration, Y.Z.; funding acquisition, Y.Z., Y.L. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China grant number 41871320; Key Project of Hunan Provincial Education Department grant number 19A172; and Hunan Provincial Natural Science Foundation of China grant number 2021JJ30276.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in Geofabrik at https://download.geofabrik.de/ (accessed on 28 July 2022).

Acknowledgments

The authors would like to thank the editors and the reviewers for their contributions.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Goodchild, M.F. Citizens as sensors: The world of volunteered geography. GeoJournal 2007, 69, 211–221.
2. Zhang, D.; Ge, Y.; Stein, A.; Zhang, W. Ranking of VGI contributor reputation using an evaluation-based weighted pagerank. Trans. GIS 2021, 25, 1439–1459.
3. Sehra, S.S.; Singh, J.; Rai, H.; Anand, S.S. Extending Processing Toolbox for assessing the logical consistency of OpenStreetMap data. Trans. GIS 2019, 24, 44–71.
4. Al-Bakri, M.; Fairbairn, D. Assessing similarity matching for possible integration of feature classifications of geospatial data from official and informal sources. Int. J. Geogr. Inf. Sci. 2012, 26, 1437–1456.
5. Barron, C.; Neis, P.; Zipf, A. A Comprehensive Framework for Intrinsic OpenStreetMap Quality Analysis. Trans. GIS 2013, 18, 877–895.
6. Zhou, X.; Zhao, Y. Issues and Advances of Crowdsourcing Geographic Data Quality. Geomat. World 2020, 27, 9–15. (In Chinese)
7. Fan, H.; Zipf, A.; Fu, Q.; Neis, P. Quality assessment for building footprints data on OpenStreetMap. Int. J. Geogr. Inf. Sci. 2014, 28, 700–719.
8. Zhang, W.-B.; Leung, Y.; Ma, J.-H. Analysis of positional uncertainty of road networks in volunteered geographic information with a statistically defined buffer-zone method. Int. J. Geogr. Inf. Sci. 2019, 33, 1807–1828.
9. Bishr, M.; Janowicz, K. Can we trust information? The case of volunteered geographic information. In Proceedings of the Towards Digital Earth Search Discover and Share Geospatial Data Workshop at Future Internet Symposium, Berlin, Germany, 20 September 2010.
10. Fogliaroni, P.; D'Antonio, F.; Clementini, E. Data trustworthiness and user reputation as indicators of VGI quality. Geo-Spat. Inf. Sci. 2018, 21, 213–233.
11. Zhao, Y.; Zhou, X.; Huang, M. Computing Model of Volunteered Geographic Information Trustworthiness Based on User Reputation. Geomat. Inf. Sci. Wuhan Univ. 2016, 41, 1530–1536.
12. Malik, Z.; Bouguettaya, A. Reputation Bootstrapping for Trust Establishment among Web Services. IEEE Internet Comput. 2009, 13, 40–47.
13. Agapiou, A. Estimating proportion of vegetation cover at the vicinity of archaeological sites using sentinel-1 and -2 data, supplemented by crowdsourced openstreetmap geodata. Appl. Sci. 2020, 10, 4764.
14. Costantino, D.; Vozza, G.; Alfio, V.S.; Pepe, M. Strategies for 3D Modelling of Buildings from Airborne Laser Scanner and Photogrammetric Data Based on Free-Form and Model-Driven Methods: The Case Study of the Old Town Centre of Bordeaux (France). Appl. Sci. 2021, 11, 10993.
15. Haklay, M. How Good is Volunteered Geographical Information? A Comparative Study of OpenStreetMap and Ordnance Survey Datasets. Environ. Plan. B Plan. Des. 2010, 37, 682–703.
16. Girres, J.-F.; Touya, G. Quality Assessment of the French OpenStreetMap Dataset. Trans. GIS 2010, 14, 435–459.
17. Zielstra, D.; Zipf, A. A comparative study of proprietary geodata and volunteered geographic information for Germany. In Proceedings of the 13th AGILE International Conference on Geographic Information Science, Guimarães, Portugal, 10–14 May 2010.
18. Mooney, P.; Corcoran, P. The Annotation Process in OpenStreetMap. Trans. GIS 2012, 16, 561–579.
19. Lai, C.-M.; Chen, M.-H.; Kristiani, E.; Verma, V.K.; Yang, C.-T. Fake News Classification Based on Content Level Features. Appl. Sci. 2022, 12, 1116.
20. Ballatore, A.; Bertolotto, M.; Wilson, D.C. Geographic knowledge extraction and semantic similarity in OpenStreetMap. Knowl. Inf. Syst. 2013, 37, 61–81.
21. Zhao, Y.; Yang, W.; Liu, Y.; Liao, Z. Discovering transition patterns among OpenStreetMap feature classes based on the Louvain method. Trans. GIS 2021, 26, 236–258.
22. Forati, A.M.; Ghose, R. Volunteered Geographic Information Users Contributions Pattern and its Impact on Information Quality. Preprints 2020, 2020070270.
23. Adler, B.T.; De Alfaro, L. A content-driven reputation system for the Wikipedia. In Proceedings of the 16th International Conference on World Wide Web, Banff, AB, Canada, 8–12 May 2007; pp. 261–270.
24. Degrossi, L.C.; De Albuquerque, J.P.; Rocha, R.D.S.; Zipf, A. A taxonomy of quality assessment methods for volunteered and crowdsourced geographic information. Trans. GIS 2018, 22, 542–560.
25. Yang, H.; Zhang, J.; Roe, P. Reputation modelling in Citizen Science for environmental acoustic data analysis. Soc. Netw. Anal. Min. 2012, 3, 419–435.
26. Flanagin, A.J.; Metzger, M.J. The credibility of volunteered geographic information. GeoJournal 2008, 72, 137–148.
27. Goodchild, M.F.; Li, L. Assuring the quality of volunteered geographic information. Spat. Stat. 2012, 1, 110–120.
28. Lodigiani, C.; Melchiori, M. A pagerank-based reputation model for VGI data. Procedia Comput. Sci. 2016, 98, 566–571.
29. Nejad, R.G.; Abbaspour, R.A.; Chehreghan, A. Spatiotemporal VGI contributor reputation system based on implicit evaluation relations. Geocarto Int. 2022, 1–28.
30. D'Antonio, F.; Fogliaroni, P.; Kauppinen, T. VGI edit history reveals data trustworthiness and user reputation. In Proceedings of the AGILE 2014 International Conference on Geographic Information Science, Castellón, Spain, 3–6 June 2014.
31. Keßler, C.; Trame, J.; Kauppinen, T. Tracking editing processes in volunteered geographic information: The case of OpenStreetMap. Identifying objects, processes and events in spatio-temporally distributed data (IOPE). In Proceedings of the Workshop at Conference on Spatial Information Theory, Belfast, ME, USA, 12–16 September 2011; Volume 12, pp. 6–8.
32. Zhao, Y.; Zhou, X.; Li, G.; Xing, H. A Spatio-Temporal VGI Model Considering Trust-Related Information. ISPRS Int. J. Geo-Inf. 2016, 5, 10.
33. Chen, J.; Zhou, C.H.; Wang, J.G. Advances in the study of the geo-ontology. Earth Sci. Front. 2006, 13, 81–90. (In Chinese)
34. Tversky, A. Features of similarity. Psychol. Rev. 1977, 84, 327.
35. Taieb, M.A.H.; Ben Aouicha, M.; Ben Hamadou, A. Ontology-based approach for measuring semantic similarity. Eng. Appl. Artif. Intell. 2014, 36, 238–261.
36. Li, Y.; Bandar, Z.; McLean, D. An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans. Knowl. Data Eng. 2003, 15, 871–882.
37. Zhao, Y.; Wei, X.; Liu, Y.; Liao, Z. An OSM Contributors Classification Method Based on WPCA and GMM. J. Phys. Conf. Ser. 2021, 2025, 012040.
38. Jacobs, K.T.; Mitchell, S.W. OpenStreetMap quality assessment using unsupervised machine learning methods. Trans. GIS 2020, 24, 1280–1298.
39. Boakes, E.H.; Gliozzo, G.; Seymour, V.; Harvey, M.; Smith, C.; Roy, D.B.; Haklay, M. Patterns of contribution to citizen science biodiversity projects increase understanding of volunteers' recording behaviour. Sci. Rep. 2016, 6, 33051.
40. Van Exel, M.; Dias, E.; Fruijtier, S. The impact of crowdsourcing on spatial data quality indicators. In Proceedings of the GIScience 2010 Doctoral Colloquium, Zurich, Switzerland, 14–17 September 2010.
41. Moreri, K. Volunteer reputation determination in crowdsourcing projects using latent class analysis. Trans. GIS 2020, 25, 968–984.
42. Vargas-Munoz, J.E.; Srivastava, S.; Tuia, D.; Falcao, A.X. OpenStreetMap: Challenges and Opportunities in Machine Learning and Remote Sensing. IEEE Geosci. Remote Sens. Mag. 2020, 9, 184–199.
43. Ramm, F. OpenStreetMap Data in Layered GIS Format; Version 0.7.12; 2022. Available online: https://www.geofabrik.de/data/geofabrik-osm-gis-standard-0.7.pdf (accessed on 25 June 2022).
44. Features on OSM Wiki. Available online: https://wiki.openstreetmap.org/wiki/Zh-hans:Map_Features (accessed on 25 June 2022).
45. Kuhn, W. Modeling the semantics of geographic categories through conceptual integration. In Proceedings of the International Conference on Geographic Information Science, Boulder, CO, USA, 25–28 September 2002.
46. Muttaqien, B.I.; Ostermann, F.O.; Lemmens, R.L.G. Modeling aggregated expertise of user contributions to assess the credibility of OpenStreetMap features. Trans. GIS 2018, 22, 823–841.
47. Zhao, Y.; Guo, X.; Liu, Y.; Liao, Z.; Liu, M. A Tag Recommendation Method for OpenStreetMap Based on FP-Growth and Improved Markov Process. In Advances in Artificial Intelligence and Security. ICAIS 2021. Communications in Computer and Information Science; Springer: Cham, Switzerland, 2021; pp. 407–419.
  48. Yang, N.; Zhang, Q.; Niu, J. Computational model of geospatial semantic similarity based on ontology structure. Sci. Surv. Mapp. 2015, 40, 6. (In Chinese) [Google Scholar]
  49. Zhao, Y.; Zhou, X. Version Similarity-based Model for Volunteers’ Reputation of Volunteered Geographic Information: A case of Polygon. Acta Geod. Cartogr. Sin. 2015, 44, 578–584. (In Chinese) [Google Scholar]
Figure 1. Flow chart of contributor’s reputation evaluation.
Figure 2. The architecture diagram of volunteered geographic information ontology.
Figure 3. Part of volunteered geographic information ontology.
Figure 4. Polylines expressed by the turning function.
Figure 5. Illustration of how geometric changes affect topological relations.
Figure 6. Map of Rutland, England: (a) satellite map data in Rutland; (b) OSM map in Rutland.
Figure 7. Distribution of contributors’ reputations.
Figure 8. Overlay examples of objects: (a) object id 4406271; (b) object id 43407756; (c) object id 3700148.
Table 1. Part of the OpenStreetMap Data in Layered GIS Format.

Code | Layer | Class | Description | OSM Tag
5111 | roads | motorway | Motorway/freeway | highway = motorway
5112 | roads | trunk | Important roads, typically divided | highway = trunk
5122 | roads | residential | Roads in residential areas | highway = residential
5131 | roads | motorway_link | Roads that connect from one road to another of the same or lower category | highway = motorway_link
2401 | buildings | hotel | A building designed with separate rooms available for overnight accommodation | tourism = hotel
7201 | landuse | forest | A forest or woodland | landuse = forest, natural = wood
7202 | landuse | park | A park | leisure = park, leisure = common
7204 | landuse | industrial | An industrial area | landuse = industrial
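For readers who want to reuse the mapping in Table 1 when pre-processing OSM extracts, the rows can be stored as a small lookup table from Geofabrik feature code to layer, class, and OSM tag(s). The Python sketch below is purely illustrative; the names FEATURE_CODES and osm_tags_for_code are ours, not part of the paper's implementation or of any Geofabrik tool.

```python
# Illustrative lookup built directly from the rows of Table 1.
# FEATURE_CODES and osm_tags_for_code are hypothetical names.
FEATURE_CODES = {
    5111: ("roads", "motorway", ["highway=motorway"]),
    5112: ("roads", "trunk", ["highway=trunk"]),
    5122: ("roads", "residential", ["highway=residential"]),
    5131: ("roads", "motorway_link", ["highway=motorway_link"]),
    2401: ("buildings", "hotel", ["tourism=hotel"]),
    7201: ("landuse", "forest", ["landuse=forest", "natural=wood"]),
    7202: ("landuse", "park", ["leisure=park", "leisure=common"]),
    7204: ("landuse", "industrial", ["landuse=industrial"]),
}

def osm_tags_for_code(code: int) -> list[str]:
    """Return the OSM tag(s) that the layered GIS format associates with a feature code."""
    _layer, _cls, tags = FEATURE_CODES[code]
    return tags

print(osm_tags_for_code(7201))  # ['landuse=forest', 'natural=wood']
```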
Table 2. Sample table of the relationship between contributor reputation and initial reputation.

Initial Reputation | 0.5–0.6 | 0.6–0.7 | 0.7–0.8 | 0.8–0.9 | 0.9–1 | Total/Person
0.75 (Novice or unskilled contributors) | 5 | 21 | 116 | 19 | 0 | 161
0.9 (Major contributors) | 0 | 1 | 7 | 523 | 0 | 531
1 (Professional contributors) | 0 | 0 | 1 | 7 | 24 | 32
Total/person | 5 | 22 | 124 | 549 | 24 | 724
Table 3. Sample table of the relationship between contributor reputation and data quality.

Effect | 0.5–0.6 | 0.6–0.7 | 0.7–0.8 | 0.8–0.9 | 0.9–1 | Total/Version
Error | 10 | 5 | 1 | 1 | 0 | 17
Poor | 3 | 10 | 78 | 175 | 71 | 337
Good | 9 | 14 | 92 | 580 | 2368 | 3063
Total/version | 22 | 29 | 171 | 756 | 2439 | 3417
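A quick way to see the positive correlation between reputation and data quality is to compute, from the counts in Table 3, the share of versions rated "Good" in each reputation interval: it rises from roughly 41% in the 0.5–0.6 interval to about 97% in the 0.9–1 interval. A minimal sketch (illustrative only, using the counts above):

```python
# Share of "Good" versions per reputation interval, computed from Table 3.
intervals = ["0.5-0.6", "0.6-0.7", "0.7-0.8", "0.8-0.9", "0.9-1"]
error = [10, 5, 1, 1, 0]
poor = [3, 10, 78, 175, 71]
good = [9, 14, 92, 580, 2368]

for name, e, p, g in zip(intervals, error, poor, good):
    total = e + p + g
    print(f"{name}: {g / total:.1%} Good ({g}/{total})")
# 0.5-0.6: 40.9% Good (9/22)
# ...
# 0.9-1:   97.1% Good (2368/2439)
```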
Table 4. Evaluation index scores of example objects.

Object Id | Semantic Similarity | Geometric Similarity | Topological Similarity | One-Time Evaluation
4406271 | 0.333 | 0.333 | 0.228 | 0.298
43407756 | 1 | 0.871 | 0.812 | 0.894
3700148 | 0.235 | 1 | 1 | 0.745
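The "One-Time Evaluation" column in Table 4 is consistent with an unweighted mean of the three similarity scores. Equal weighting is our assumption rather than a statement of the paper's exact aggregation, but the sketch below shows that it reproduces the reported values to three decimal places.

```python
# Minimal sketch: one-time evaluation as a weighted sum of the three similarity
# scores. Equal weights are an assumption; they happen to reproduce Table 4.
def one_time_evaluation(semantic, geometric, topological, weights=(1/3, 1/3, 1/3)):
    ws, wg, wt = weights
    return ws * semantic + wg * geometric + wt * topological

examples = {
    4406271: (0.333, 0.333, 0.228),   # reported: 0.298
    43407756: (1.000, 0.871, 0.812),  # reported: 0.894
    3700148: (0.235, 1.000, 1.000),   # reported: 0.745
}
for obj_id, sims in examples.items():
    print(obj_id, round(one_time_evaluation(*sims), 3))
```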
Table 5. Reputation of user ids 280348, 12355, and 1377.

User Id | Evaluation Reputation | Initial Reputation | Reputation
280348 | 0.316 | 0.75 | 0.526
12355 | 0.884 | 1 | 0.948
1377 | 0.627 | 0.9 | 0.736
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
