1. Introduction
In recent years, crowdsourced geospatial platforms have evolved into complex socio-technical systems, in which geospatial data are continuously produced, modified, and validated through decentralized human–technology interactions. Within such systems, data quality and reliability do not result from centralized control mechanisms but rather emerge from the collective behavior of contributors over time. Ensuring the trustworthiness of data generated in these open, volunteered, and participatory systems has thus become a critical challenge for data-driven decision-making processes, including urban planning, infrastructure management, and public services.
The integration of Web 2.0 technologies with geospatial tools has fundamentally transformed the generation, modification, and utilization of geographic information. This technological convergence enables users to actively contribute spatial data through participatory mapping platforms, resulting in dynamic and continuously evolving datasets [1,2,3]. In this context, geographic information is no longer produced through a linear data collection process but through an interactive system in which individual contributions collectively shape the structure and quality of the dataset. A pivotal development in this domain is the concept of Volunteered Geographic Information (VGI), introduced by Goodchild [4], which represents a shift from traditional top-down data production toward decentralized, user-driven geospatial systems.
The significance of VGI lies in its ability to overcome the limitations of traditional geospatial data collection methods, which are often costly and time-consuming. By leveraging the power of crowdsourcing, VGI offers a scalable alternative that provides timely and diverse geographic information across various domains, including disaster management, transportation planning, and urban development [4,5,6,7]. This shift has effectively transformed passive data consumers into active contributors, thereby democratizing the production of geographic information.
However, the characteristics that make VGI systems open and scalable also introduce significant challenges related to data quality and reliability. The diversity of contributor backgrounds, the absence of formal training, and the lack of enforced standards transform data quality into a system-level property that varies across space and time. Unlike authoritative datasets governed by centralized validation procedures, crowdsourced geospatial systems rely on decentralized and heterogeneous contributor behavior, resulting in varying levels of consistency, accuracy, and temporal relevance [7,8,9,10]. These characteristics make conventional quality assurance mechanisms insufficient for such systems and necessitate alternative approaches that can account for their intrinsic dynamics.
To address these quality concerns, researchers have developed two primary assessment approaches: extrinsic and intrinsic. Extrinsic evaluation involves comparing VGI datasets with authoritative reference data to identify discrepancies and assess positional accuracy [11,12]. However, this approach is fundamentally limited by the scarcity of high-quality reference datasets with global coverage [13,14]. In response to these limitations, intrinsic assessment methods have gained prominence. These techniques evaluate data quality based on internal characteristics such as contributor reputation, edit history, and spatiotemporal consistency, without requiring external benchmarks [15,16,17,18,19]. While intrinsic methods offer a more scalable solution, existing research has predominantly focused on linear features [12,20,21], leaving a significant gap in the assessment of polygonal VGI data—despite their critical importance in applications such as 3D building reconstruction [22,23], urban studies [24], and disaster management [25]. Furthermore, current intrinsic assessment methodologies typically rely on expert-driven weighting of quality indicators through multi-criteria decision-making processes, which are inherently subjective and time-consuming [26].
Among various crowdsourced geospatial platforms, OpenStreetMap (OSM) has become a prominent example due to its global coverage, continuous updates, and large volunteer community. The platform's expansion has enabled the rapid collection of vast amounts of geospatial data, yet the quality and reliability of these data often remain uncertain because of the decentralized and heterogeneous nature of the underlying contribution processes; this uncertainty limits broader adoption in analytical and operational contexts. Without systematic assessment and labeling mechanisms, using such data for scientific research, urban planning, or crisis management may lead to unreliable or biased outcomes. Moreover, most existing studies evaluate data quality at aggregated spatial or thematic levels, overlooking the fact that reliability in participatory geospatial platforms emerges from the interaction of individual contributors, editing histories, and feature-level characteristics. As a result, there is a growing need for system-level approaches capable of modeling reliability as a dynamic property of the data production process, rather than as a static attribute of the final dataset.
In response to these challenges, this study proposes a system-oriented framework for estimating the reliability of crowdsourced building polygons using intrinsic quality indicators and unsupervised machine learning techniques. By analyzing contributor behavior, temporal metadata, and geometric evolution extracted from the OpenStreetMap history, the proposed framework models reliability as an emergent property of the underlying participatory system. The framework eliminates reliance on authoritative reference datasets and expert-based weighting, thereby enhancing objectivity, scalability, and reproducibility. The main contributions of this research are (1) the development of intrinsic reliability indicators tailored to polygonal geospatial data; (2) the introduction of an unsupervised, data-driven approach for criterion weighting; (3) feature-level reliability classification within a large-scale crowdsourced system; and (4) the provision of a transferable methodological foundation for trust-aware analysis in open geospatial systems.
Our findings have significant implications for both academic research and practical applications, offering urban planners, emergency responders, and GIS professionals robust tools to assess the suitability of VGI data for critical decision-making processes. Unlike previous studies, which depended heavily on reference data or expert evaluations, this research utilizes unsupervised learning and data mining techniques to deliver a scalable, data-driven solution for trust assessment in VGI environments. Ultimately, this work contributes to the broader objective of establishing VGI as a reliable complement to authoritative geospatial data sources.
This paper is organized as follows. The remainder of this section provides an introductory assessment of intrinsic quality. Section 2 provides a detailed discussion of the materials and the proposed approach. Section 3 examines and evaluates the outcomes derived from implementing the proposed method. Finally, Section 4 and Section 5 present the discussion and conclusion, respectively.
Intrinsic Assessment of Quality
Intrinsic assessment of quality refers to the evaluation of data trustworthiness based on the internal characteristics of the dataset itself, without relying on external reference datasets. This approach is particularly valuable in the context of VGI, where authoritative ground-truth data may be unavailable, inconsistent, or impractical to obtain at scale. One of the key concepts employed in intrinsic assessment is reliability. The concept of reliability in geospatial data was first introduced by Azouzi [27], who emphasized the role of user trust in assessing the credibility of geospatial information. Reliability represents the degree of confidence or trust that users can place in a dataset. It relates directly to data quality: more dependable data are generally associated with higher quality. A dataset that consistently produces accurate and valid results can be relied upon for analysis, decision-making, and other purposes, and is therefore considered reliable [28].
Quality and reliability are two closely related concepts in the context of data quality analysis. Given the importance of geospatial data sharing and the increasing use of crowdsourced geospatial data in geospatial information systems, evaluating the reliability of geospatial data has become a significant area of research [29]. Another motivation for investigating the reliability of crowdsourced geospatial information is its potential feasibility as a substitute for established geospatial data sources. Data reliability is influenced by the volunteer nature of contributions, the concept of developing crowdsourced geographic information infrastructures based on open data input, and the characteristics of contributors [30,31].
This potential is reflected in the increasing use of VGI by established mapping organizations. VGI is a substantial and frequently updated source of publicly available geographic data, making it highly valuable. However, the reliability of VGI depends on the quality of data contributed by individuals with varying levels of expertise. Despite this, VGI offers organizations significant cost-saving opportunities by reducing the need for expensive data collection methods and helping to improve the accuracy of maps and geographic information. Thus, the advantages of using VGI must be balanced against potential risks to data validity and quality.
To ensure integration and consistency in assessing spatial accuracy, attribute accuracy, logical consistency, completeness, data generation processes, and metadata, a comprehensive set of geospatial data quality control standards has been established [32]. The most recent version of these standards is ISO 19157:2013 [33].
One of the primary challenges in applying geographic information quality standards to crowdsourced data arises from the diverse processes involved. This issue poses a significant barrier to extending the application of spatial information quality standards to the domain of crowdsourced geospatial data. However, certain quality criteria for crowdsourced geospatial data—such as positional accuracy, completeness, and logical consistency—have been implemented based on related concepts from the aforementioned standards. Teimoory et al. [28] defined the concept of reliability using the data history file and quantitative measures, including the number of versions, the number of contributors, temporal changes, and the number of tag edits. To evaluate reliability, the results of the proposed model were compared with existing official data from the study area, and the role of each criterion in data conformity and reliability was examined.
Intrinsic evaluation approaches focus on documented occurrences throughout the data lifecycle, fundamental data attributes, and the development of estimation models to predict quantifiable metrics. These methodologies address the problem of data quality [8,13,16,28,34,35,36,37,38,39].
Reliability in volunteered geographic information is defined as the degree of confidence that users have in the information associated with a given feature. Haklay et al. [39] emphasized the importance of reliability in the use of VGI, which has contributed to its increasing adoption.
In this study, after identifying and evaluating relevant criteria from the literature review, six criteria were extracted from the data source to assess the reliability of polygon data: the number of contributors, the number of versions, the creation date, the last edit date, the number of tag edits, and cumulative area change. Each of these criteria is described below.
Number of Contributors: This metric reflects the total number of distinct users who have contributed to creating or editing a feature, such as a building, in OSM. According to the “many-eyes” principle, the reliability of a feature increases with the number of contributors involved in its editing. In other words, the more users who validate a feature, the higher its reliability [40]. Therefore, this criterion is regarded as a positive indicator.
Number of Versions: The number of versions refers to how many times a feature—such as a building, road, or park—has been edited, updated, or corrected by users over time. A higher number of versions indicates that the feature has undergone more revisions, suggesting ongoing quality improvement. Therefore, a greater number of versions in the history file implies enhanced quality and increased reliability of the feature [16]. This criterion is considered a positive factor.
Creation Date: In geospatial data, the creation date of a feature refers to the date when a volunteer user first created that feature. In the OSM Full History file, each feature—such as a building—has a temporal series of versions, with the first record in this series representing the creation date of the feature. The earlier the creation date, the greater the likelihood of subsequent edits. Since the recorded value represents the time difference (in days) between a reference date (1 January 1970) and the feature’s creation date, a lower value indicates an earlier creation. Therefore, this is considered a negative criterion.
Last Edit Date: This refers to the most recent modification made to a feature, which may include changes to its geometry (such as location or shape) or its descriptive attributes (tags). The more recent the last edit date of a feature, the higher its reliability [16]. It is important to note that the value obtained for this criterion represents the number of days between the reference date and the feature’s last edit date. A higher value indicates more up-to-date data. Therefore, this criterion is considered a positive indicator.
Number of Tag Edits: This refers to the number of times the descriptive tags of a feature have been modified throughout its history. The frequency of edits and corrections made to a feature’s tags (e.g., a building’s land use tag) indicates a degree of ambiguity [28,40]. A higher number of tag edits corresponds to lower reliability of the feature. Therefore, this criterion is considered a negative indicator.
Cumulative Area Change: The cumulative area change is a metric used to quantify the extent of geometric changes in the area of a feature (such as a building) over time and across its different versions. It is calculated for each version relative to the previous one, and the cumulative sum is recorded for the corresponding polygon ID in the latest version. This metric indicates how strongly the area has increased or decreased over the feature's history (see Equation (1)). Essentially, the trend of polygon area changes in the history file is analyzed by calculating the area change indicator. This metric is considered a positive criterion.
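As a rough illustration of the last criterion, the value can be computed from the successive versions of a polygon. The sketch below (plain Python, shoelace areas in projected metres, illustrative coordinates) assumes Equation (1) sums the absolute area differences between consecutive versions; the history shown is hypothetical.

```python
def shoelace_area(coords):
    """Planar polygon area via the shoelace formula (coords in metres)."""
    n = len(coords)
    s = 0.0
    for k in range(n):
        x1, y1 = coords[k]
        x2, y2 = coords[(k + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def cumulative_area_change(version_coords):
    """Sum of |area(v) - area(v-1)| over consecutive versions of a polygon."""
    areas = [shoelace_area(c) for c in version_coords]
    return sum(abs(a2 - a1) for a1, a2 in zip(areas, areas[1:]))

# Hypothetical edit history of one building footprint (UTM metres):
history = [
    [(0, 0), (10, 0), (10, 10), (0, 10)],   # v1: 100 m^2
    [(0, 0), (12, 0), (12, 10), (0, 10)],   # v2: 120 m^2
    [(0, 0), (12, 0), (12, 11), (0, 11)],   # v3: 132 m^2
]
print(cumulative_area_change(history))  # 20 + 12 = 32.0
```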
2. Materials and Methods
This research introduces a novel approach for evaluating the reliability of volunteered geospatial polygon data without relying on reference data. Instead, the method utilizes intrinsic criteria—fundamental characteristics inherent within the data itself—to assess reliability systematically. By analyzing these criteria, the study derives key insights into data quality and employs machine learning to identify the criteria that most significantly influence reliability.
The approach integrates two primary sources of OpenStreetMap (OSM) data: historical records and the most recent OSM dataset, ensuring a comprehensive assessment. Subsequently, the study evaluates the effectiveness of the proposed method, with a specific focus on its performance in measuring spatial accuracy.
Figure 1 outlines the overall approach, and the subsequent sections provide a detailed exploration of each step in the process. This structured methodology not only improves the understanding of the reliability of volunteered geospatial data but also offers a scalable tool for future quality assessments.
2.1. Data Collection
The required data were collected from two sources: the OSM history data and the latest OSM version. Both files were downloaded from the official OpenStreetMap repository (https://planet.openstreetmap.org (accessed on 12 February 2025)); the downloads are 214 GB and 75 GB, respectively, in ZIP format. The history file contains the complete edit history of all features (points, lines, and polygons) and documents every user contribution. This comprehensive record enables detailed temporal and structural analysis. To prepare the building polygon data for the study area, the relevant subsets were extracted from the comprehensive history file and from the latest file and converted to the XML (Extensible Markup Language) format.
XML is a hierarchical, tag-based format. In this structure, each geographic feature—such as a node (point), way (line or polygon), or relation—is represented as a discrete XML element, accompanied by metadata including the version number, timestamp, user ID, and associated tags.
Figure 2 shows an example of an OSM data history file in XML format, corresponding to a polygon feature.
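A minimal sketch of how such a history fragment can be read with the standard library follows; the XML excerpt is hypothetical but follows the element and attribute layout described above (two versions of one way, with version, timestamp, uid, and tag metadata).

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of an OSM history file: two versions of the same way.
osm_xml = """<osm version="0.6">
  <way id="123" version="1" timestamp="2019-03-01T10:00:00Z" uid="42" user="alice">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/><nd ref="1"/>
    <tag k="building" v="yes"/>
  </way>
  <way id="123" version="2" timestamp="2021-07-15T08:30:00Z" uid="77" user="bob">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/><nd ref="1"/>
    <tag k="building" v="residential"/>
  </way>
</osm>"""

root = ET.fromstring(osm_xml)
versions = [
    {
        "version": int(w.get("version")),
        "timestamp": w.get("timestamp"),
        "uid": int(w.get("uid")),
        "tags": {t.get("k"): t.get("v") for t in w.findall("tag")},
    }
    for w in root.findall("way")
    if w.get("id") == "123"
]
# Two of the intrinsic criteria fall out directly:
n_versions = len(versions)
n_contributors = len({v["uid"] for v in versions})
print(n_versions, n_contributors)  # 2 2
```

In practice a streaming parser would be used for the multi-gigabyte planet files, but the element structure read here is the same.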
2.2. Preprocessing and Data Preparation
To ensure spatial consistency across the dataset, the OSM-derived data were projected into a common coordinate system, specifically Universal Transverse Mercator (UTM) Zone 38 on the WGS84 reference ellipsoid. Polygon features of the study area were then extracted from the OSM history file. During preprocessing, geometric inconsistencies—including duplicate polygons, overlapping features, and cross-layer overlaps—were identified and corrected to improve geometric accuracy and data integrity.
2.3. Criteria Scaling
To perform criteria scaling, the data were converted into a standardized and comparable scale. Fuzzy membership function normalization, a commonly used method in fuzzy set theory, was employed. This approach is particularly effective when dealing with uncertainty in the data [41,42,43].
In fuzzy membership normalization, data values are transformed into membership degrees ranging from 0 to 1, indicating the extent to which each data point belongs to a specific fuzzy set. This transformation provides a continuous representation of uncertainty, rather than confining data within rigid categorical boundaries. The commonly used sigmoid function is employed in this research [28]. This method standardizes heterogeneous data, making it more suitable for further analysis while enhancing comparability and decision-making across different attributes. For positive criteria whose influence increases with higher values, Equation (2) is used. For negative criteria whose influence decreases as their values increase, Equation (3) is applied.
In these two equations, $x_{ij}$ represents the feature value of the $i$-th instance for the $j$-th criterion, $u_j$ is the upper threshold for positive criteria, $v_j$ is the upper threshold for negative criteria, and the membership degree values $\mu_{ij}$ range between 0 and 1. For each criterion, two threshold values—upper and lower—are defined. For positive criteria, values equal to or greater than the upper threshold are normalized to one, while values equal to or less than the lower threshold are normalized to zero. Conversely, for negative criteria, values equal to or greater than the upper threshold are normalized to zero, and those equal to or less than the lower threshold are normalized to one. These thresholds are applied to eliminate outliers and to incorporate expert judgment in constraining the data range. For example, from an expert perspective, a feature modified 10 times is conceptually similar to one modified 27 times; thus, values equal to or greater than 10 are normalized to one, and the lowest values, which conceptually represent zero, are normalized to zero. The empirical values for the lower and upper thresholds in Table 1 were defined through expert group consultation to minimize individual bias and enhance the reliability of the decisions, an approach recognized as effective for reducing subjectivity and improving validity [44].
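Since the exact sigmoid forms of Equations (2) and (3) are not reproduced in the text, the following sketch uses a generic sigmoid centred between the two thresholds and clamped to 0/1 outside them; the steepness parameter is an assumed tuning value, not the study's.

```python
import math

def fuzzy_membership(x, lower, upper, positive=True):
    """Map a raw criterion value to a membership degree in [0, 1]."""
    # Clamp outside the expert-defined thresholds, as described above.
    if positive:
        if x >= upper:
            return 1.0
        if x <= lower:
            return 0.0
    else:
        if x >= upper:
            return 0.0
        if x <= lower:
            return 1.0
    # Sigmoid between the thresholds (assumed shape and steepness).
    mid = (lower + upper) / 2.0
    spread = (upper - lower) / 10.0
    mu = 1.0 / (1.0 + math.exp(-(x - mid) / spread))
    return mu if positive else 1.0 - mu

# Positive criterion (e.g. number of versions), thresholds 0 and 10:
print(fuzzy_membership(27, 0, 10))   # 1.0 (capped: 27 edits ~ 10 edits)
print(fuzzy_membership(0, 0, 10))    # 0.0
# Negative criterion (e.g. number of tag edits), same thresholds:
print(fuzzy_membership(27, 0, 10, positive=False))  # 0.0
```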
2.4. Clustering of Building Polygons Based on Reliability-Related Criteria
In the context of geospatial data quality assessment and the automation of this process, the application of machine learning—specifically unsupervised learning methods such as clustering—has shown significant potential for identifying structural patterns and grouping geospatial features based on intrinsic geometric and attribute characteristics. Building polygons can vary considerably in terms of their quality and geometric regularity. Reliability-related criteria, such as the inherent criteria discussed in the Section Intrinsic Assessment of Quality, serve as meaningful descriptors for assessing the quality of these polygons. To effectively group these polygons based on quality, the K-means clustering algorithm, a widely used unsupervised learning method, has been employed [45]. This method partitions an input dataset into K disjoint clusters by minimizing the within-cluster sum of squared errors (SSE) [46]. The algorithm uses distance as a similarity metric, which determines how objects are assigned to the nearest centroid. The most widely used metric in K-means is Euclidean distance [47]. This distance reflects the geometric distance between two data points in an n-dimensional feature space. Equation (4) presents its squared form, which is preferred in K-means due to computational efficiency.
Each building polygon is encoded as a feature vector composed of its reliability-based geometric descriptors. K-means iteratively assigns polygons to the nearest cluster centroid, updating the centroids until convergence. This approach facilitates a data-driven classification of buildings into groups with similar reliability criteria. A critical aspect of implementing K-means is determining the optimal number of clusters k, which greatly influences the clustering results. To address this, the Elbow Method is commonly applied and remains highly effective in practice, particularly for spatial data, where it has successfully identified optimal clusters [48]. This widely used technique evaluates how the sum of squared errors (SSE) varies with increasing values of k. As more clusters are introduced, the SSE generally decreases because polygons are grouped into smaller, more homogeneous clusters, thus reducing intra-cluster variability. This combination of K-means clustering and the Elbow Method enables a robust unsupervised classification framework that can divide building polygons into an optimal number of reliability categories. The quality of a cluster $C_k$ can be measured using intra-cluster variability, calculated as the sum of squared errors (SSE) of the distances between all objects within $C_k$ and its centroid $c_k$. This objective function is formally defined in Equation (5) [46]:

$$E = \sum_{k=1}^{K} \sum_{p \in C_k} \lVert p - c_k \rVert^2$$

In this equation, $E$ represents the sum of squared errors (SSE) for all entities in the dataset, $p$ signifies a point representing a data object, and $c_k$ refers to the centroid of the cluster $C_k$. Both $p$ and $c_k$ are multidimensional; specifically, for each object within each cluster, the squared distance to its cluster centroid is calculated, and these values are aggregated across all objects and clusters. The objective function seeks to maximize the density of each cluster while maintaining its distinctiveness. Mathematically, the K-means algorithm steps are as follows:
- Step 1: Initialization. Randomly initialize $k$ centroids $c_1, c_2, \ldots, c_k$.
- Step 2: Assignment. Assign each data point to the nearest cluster centroid using the squared Euclidean distance, based on Equation (6).
- Step 3: Update. Recalculate the centroid $c_j$ of each cluster $C_j$ as the mean of all points assigned to that cluster, based on Equation (7), where $\lvert C_j \rvert$ is the number of points in cluster $C_j$.
- Step 4: Convergence check. Repeat Steps 2 and 3 until no change occurs in the cluster assignments, based on Equation (8).
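The four steps can be sketched compactly in NumPy; the data below are synthetic stand-ins for the normalized criterion vectors, not values from the study.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-means following Steps 1-4 above."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize k centroids from randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid
        # (squared Euclidean distance, as in Equation (6)).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids (hence assignments) no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sse = d2.min(axis=1).sum()  # within-cluster SSE, Equation (5)
    return labels, centroids, sse

# Two well-separated synthetic groups in a 6-criteria feature space:
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.2, 0.02, (50, 6)), rng.normal(0.8, 0.02, (50, 6))])
labels, centroids, sse = kmeans(X, k=2)
print(round(float(sse), 3))
```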
2.5. Criteria Importance Assessment
Measuring the importance of criteria is a fundamental step in multi-criteria decision-making (MCDM) processes. Traditionally, expert judgment has been used to determine these measurements, which can sometimes lead to subjectivity, inconsistency, and time consumption. Recent studies indicate that objective weighting techniques and machine learning provide a consistent alternative by automatically computing criteria weights based on data patterns, thereby reducing the need for extensive expert involvement. Odu [49] emphasized that although subjective methods are simple, they are prone to bias, whereas objective, data-driven approaches produce more reliable and reproducible weighted outcomes. Further research by Van Dua et al. [50] demonstrated that combining multiple objective weighting methods with machine learning-based ranking strategies enhances the consistency and stability of decision-making models. These findings confirm that employing machine learning for weighting criteria not only accelerates the decision-making process but also improves transparency, reduces errors associated with human judgment, and ensures reproducibility in complex multi-criteria contexts. In the absence of predefined reliability labels, a machine learning approach was implemented to derive the criteria weights objectively.
In the K-means clustering algorithm, the importance of the criteria can be objectively determined by analyzing the distance between cluster centroids and the global mean of each feature. In this approach, features that cause greater deviations of cluster centroids from the overall mean values are considered more discriminative and, therefore, more influential in the clustering process [51,52]. This approach provides a transparent, data-driven alternative to traditional expert-based weighting, aligning with the findings of Manzali et al. [53], where machine learning methods demonstrated high efficiency in evaluating feature relevance.
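A minimal sketch of this centroid-deviation weighting follows; the normalization to unit sum and the use of the mean absolute deviation are assumptions, and the centroid values are illustrative rather than the study's results.

```python
import numpy as np

def criterion_weights(centroids, global_mean):
    """Weight of criterion j ∝ mean |centroid_j - global_mean_j| over clusters."""
    dev = np.abs(centroids - global_mean).mean(axis=0)  # per-criterion deviation
    return dev / dev.sum()                              # normalize to sum to 1

# 3 clusters x 4 criteria (normalized values); criterion 1 separates the
# clusters strongly, criteria 2 and 4 not at all:
centroids = np.array([
    [0.9, 0.5, 0.2, 0.5],
    [0.5, 0.5, 0.5, 0.5],
    [0.1, 0.5, 0.8, 0.5],
])
global_mean = np.array([0.5, 0.5, 0.5, 0.5])
w = criterion_weights(centroids, global_mean)
print(w)  # criterion 1 gets the largest weight; criteria 2 and 4 get weight 0
```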
2.6. Reliability Class Estimation for Each Cluster
To evaluate the quality class associated with each of the five clusters, a scoring mechanism was developed. After the importance (weight) of each criterion was objectively calculated by analyzing its influence on cluster separation, as discussed in Section 2.5, a weighted reliability score was computed for each polygon within each cluster. This was achieved by Equation (9):

$$R_i = \sum_{j} w_j \, \mu_{ij}$$

where $R_i$ is the reliability score for polygon $i$, $\mu_{ij}$ represents the normalized value of criterion $j$ for polygon $i$, and $w_j$ is the calculated importance (weight) of criterion $j$.
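Under the assumption that Equation (9) is the usual weighted sum of normalized criterion values, the score reduces to a dot product; the weights and membership values below are illustrative only.

```python
import numpy as np

# Assumed weights for the six criteria (sum to 1) and normalized
# membership values for two hypothetical polygons:
weights = np.array([0.30, 0.25, 0.15, 0.10, 0.10, 0.10])
mu = np.array([
    [0.9, 0.8, 0.7, 1.0, 0.6, 0.5],   # polygon 1: consistently high values
    [0.2, 0.1, 0.4, 0.3, 0.2, 0.1],   # polygon 2: consistently low values
])
reliability = mu @ weights  # R_i = sum_j w_j * mu_ij
print(reliability)  # polygon 1 scores markedly higher than polygon 2
```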
2.7. Clustering Quality Assessment
Evaluating the quality of clustering is a critical step in ensuring that the identified groups are both meaningful and reliable. A valid clustering solution should ideally satisfy two conditions: (1) internal consistency, where objects within the same cluster are highly similar to one another and clearly distinct from objects in other clusters, and (2) spatial coherence, where clusters exhibit recognizable and interpretable patterns across the study area. To address these complementary dimensions, clustering quality was assessed using both an intrinsic validity index (the silhouette coefficient) and a spatial autocorrelation measure (Global Moran’s I).
2.7.1. Silhouette Coefficient
The silhouette coefficient is an internal metric that simultaneously evaluates intra-cluster cohesion and inter-cluster separation [54,55,56]. This measure is widely used in clustering evaluation due to its intuitive interpretation and demonstrated reliability across various partitioning algorithms, including K-means. For each polygon $i$, the silhouette value is defined by Equation (10):

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$

where $a(i)$ is the intra-cluster distance, computed as the average distance between object $i$ and all other objects within the same cluster. This term measures cluster cohesion (Equation (11)):

$$a(i) = \frac{1}{\lvert C_i \rvert - 1} \sum_{j \in C_i,\; j \neq i} d(i, j)$$

in which $C_i$ denotes the set of objects in the same cluster as $i$, and $d(i, j)$ denotes the distance between objects $i$ and $j$. $b(i)$ is the inter-cluster distance, defined as the minimum average distance between object $i$ and all objects in any other cluster $C_k$. This term measures separation (Equation (12)):

$$b(i) = \min_{C_k \neq C_i} \frac{1}{\lvert C_k \rvert} \sum_{j \in C_k} d(i, j)$$

A silhouette coefficient close to +1 indicates well-separated and cohesive clusters, values around 0 suggest that objects lie near boundaries between clusters, and negative values (between 0 and −1) imply possible misclassification. The average silhouette coefficient across all polygons provides an overall measure of clustering validity, serving as a theoretical basis for the practical evaluation in the implementation section.
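For reference, the silhouette computation can be sketched directly from Equations (10)-(12); `sklearn.metrics.silhouette_score` implements the same definition. The points below are synthetic.

```python
import numpy as np

def silhouette_values(X, labels):
    """Per-object silhouette values s(i) from Equations (10)-(12)."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # pairwise distances
    s = np.empty(len(X))
    for i in range(len(X)):
        same = (labels == labels[i])
        # a(i): mean distance to the other members of i's own cluster
        a = D[i, same & (np.arange(len(X)) != i)].mean()
        # b(i): smallest mean distance to any other cluster
        b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s

# Two tight, well-separated synthetic clusters:
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = np.array([0, 0, 0, 1, 1, 1])
sil = silhouette_values(X, labels)
print(round(float(sil.mean()), 3))  # close to 1 for well-separated clusters
```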
2.7.2. Moran’s I
While the silhouette coefficient assesses the structural integrity of clusters in feature space, it does not account for their spatial arrangement. To complement this perspective, the Global Moran’s I statistic [57] was applied to evaluate the degree of spatial autocorrelation among cluster assignments. Moran’s I is defined according to Equations (13) and (14):

$$I = \frac{n}{S_0} \cdot \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad S_0 = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}$$

where $n$ is the total number of spatial units (building polygons); $x_i$ and $x_j$ are the attribute values of spatial units $i$ and $j$, while $\bar{x}$ is the mean of the attribute values across all units; $w_{ij}$ is the spatial weight between units $i$ and $j$; and $S_0$ represents the sum of all spatial weights.
The spatial weight matrix was generated using a first-order rook contiguity approach, in which two spatial units were considered neighbors only if they shared a common boundary edge. To ensure consistency across spatial units with differing numbers of neighbors, the spatial weights matrix was row-standardized, ensuring that the total weight distributed among a unit’s neighbors equals one. This measure serves as a global indicator of the overall spatial pattern of a variable. Values approaching +1 indicate strong positive spatial autocorrelation, meaning that similar values tend to cluster near each other. Values close to 0 suggest a random spatial distribution without a discernible pattern, whereas values approaching −1 reflect negative spatial autocorrelation, where dissimilar values are systematically interspersed throughout the space. Thus, this measure provides a spatial perspective that complements the silhouette coefficient and enables the assessment of how clusters ranging from “very low” to “very high” building reliability are distributed across the study area.
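A small sketch of Global Moran's I with row-standardized rook-contiguity weights on a regular grid follows (the study instead derives neighbours from shared polygon boundaries); the two test patterns illustrate the interpretation given above.

```python
import numpy as np

def morans_i(grid):
    """Global Moran's I with row-standardized rook-contiguity weights."""
    n_rows, n_cols = grid.shape
    x = grid.ravel().astype(float)
    n = x.size
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # rook neighbours
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[i, rr * n_cols + cc] = 1.0
    W /= W.sum(axis=1, keepdims=True)  # row standardization
    z = x - x.mean()
    return (n / W.sum()) * (z @ W @ z) / (z @ z)

checkerboard = np.indices((6, 6)).sum(axis=0) % 2   # every neighbour dissimilar
halves = np.zeros((6, 6)); halves[:, 3:] = 1        # two homogeneous blocks
print(morans_i(checkerboard), morans_i(halves))  # ≈ -1 (dispersed), ≈ 0.815 (clustered)
```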
3. Results
The study area for this research is Tehran, the capital city of Iran (Figure 3). The latest version and complete historical dataset of OSM for the study area were extracted in XML format from the comprehensive latest and historical files, as described in Section 2.1. The data volumes of these two subsets are approximately 818 MB for the latest version and around 123 GB for the historical file.
Before preprocessing, the raw OSM subset datasets included all feature types (points, lines, and polygons), containing a total of 1,137,139 features in the latest version and 3,181,423 features in the full history dataset. Within these two subsets, the number of polygon features (including all land use types) was 77,447 in the latest version and 154,307 in the history file. After filtering for building polygons, 58,550 features were identified in the latest version and 92,170 features in the history dataset. These building polygons were then used as the basis for further analysis and quality assessment within the scope of this study.
3.1. Extraction of the Criteria Value
Based on the description provided in the Section Intrinsic Assessment of Quality, the quantitative values for each criterion—including the number of versions, creation date in days, cumulative area change, number of contributors, last edit date in days, and number of tag edits—were extracted from both the latest version and the historical OSM file.
Table 2 presents example values of each criterion for the building polygons (total number of building polygons = 58,550).
After collecting the data related to the criteria, the values were standardized and transformed into a comparable format by converting them to a common scale ranging from 0 to 1 using a fuzzy membership function, as described in
Section 2.3.
Table 3 presents an example of normalized values for each criterion applied to building polygons.
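A minimal sketch of the sigmoid-type fuzzy membership normalization follows; the `midpoint` and `spread` parameters are illustrative placeholders, not the calibrated values used in the study.

```python
import numpy as np

def fuzzy_sigmoid(x, midpoint, spread):
    """Large-type sigmoid fuzzy membership: maps raw criterion values to
    the (0, 1) range, with 0.5 reached at `midpoint`."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-(x - midpoint) / spread))

# Example: normalizing a hypothetical "number of versions" column
versions = np.array([1, 3, 5, 10, 25])
normalized = fuzzy_sigmoid(versions, midpoint=5, spread=2)
```

The same transformation, with criterion-specific parameters, brings all six criteria onto a common 0–1 scale so that they are directly comparable.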
3.2. Clustering of Building Polygons Based on Reliability
In this implementation phase, the K-means clustering algorithm was applied to classify building polygons into distinct groups based on their reliability criteria. K-means operates by minimizing the intra-cluster variance, specifically the sum of squared distances between data points and their corresponding cluster centroids. However, to effectively apply this method, it is essential to determine the optimal number of clusters,
k, which governs the granularity of classification and directly affects the interpretability of the results. As shown in
Figure 4, the sum of squared errors (SSE) exhibits a sharp change in slope at
k = 5. This elbow point suggests that the building polygons can be effectively grouped into five distinct reliability classes: very high, high, moderate, low, and very low. Accordingly, the clustering process was executed with
k = 5, and each polygon was assigned to one of the resulting clusters based on its feature vector.
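The elbow procedure can be sketched as follows. This is a minimal Lloyd’s-algorithm k-means on synthetic feature vectors, not the tuned clustering used in the study; the SSE-versus-k scan mirrors the kind of curve shown in Figure 4.

```python
import numpy as np

def kmeans_sse(X, k, iters=50, seed=0):
    """Minimal Lloyd's k-means: returns labels and the sum of squared
    errors (SSE) to the assigned centroids. An illustrative sketch only."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    sse = ((X - centers[labels]) ** 2).sum()
    return labels, sse

# Elbow scan: synthetic 6-dimensional "criterion" vectors with 5 true groups
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=m, scale=0.1, size=(40, 6)) for m in range(5)])
sse_by_k = {k: kmeans_sse(X, k)[1] for k in range(2, 9)}
# SSE drops steeply up to the true number of groups, then flattens (the "elbow")
```

In the study, the elbow at k = 5 in Figure 4 motivated the choice of five reliability classes.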
After determining the optimal number of clusters (k), the K-means algorithm was applied to partition the building polygons into distinct groups, assigning each polygon to one of the k clusters based on its similarity across the selected features. Because the clustering process was guided by intrinsic criteria reflecting the reliability of VGI, the resulting clusters can be interpreted as representing different classes of data reliability.
To illustrate the outcome of the clustering process,
Table 4 provides a sample showing building polygons alongside their corresponding cluster labels.
3.3. Quantitative Class of Reliability for Each Cluster
To assess the qualitative class corresponding to each of the five identified clusters, a systematic scoring framework was implemented, as previously detailed in
Section 2.6. First, the relative importance of each criterion was obtained as shown in
Table 5.
The average reliability for each cluster was then calculated to rank the five clusters in ascending order. Clusters with higher mean scores were interpreted as representing building polygons with greater reliability, while those with lower scores indicated less reliability. The five clusters (
Table 6) were subsequently classified into five qualitative categories: very high, high, moderate, low, and very low.
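The scoring-and-ranking step can be sketched as below. The per-polygon scores, the equal weights, and the cluster labels are synthetic placeholders (the study’s criterion weights come from Table 5); only the ranking logic mirrors the framework.

```python
import numpy as np

# Synthetic stand-ins: normalized criterion values (rows: polygons,
# cols: 6 criteria), placeholder weights, and k-means cluster labels.
rng = np.random.default_rng(0)
scores = rng.random((12, 6))
weights = np.full(6, 1 / 6)  # placeholder: equal weights
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 0, 1])

reliability = scores @ weights  # per-polygon weighted reliability score
cluster_means = {c: reliability[labels == c].mean() for c in np.unique(labels)}

# Rank clusters by mean reliability and map ranks to qualitative classes
classes = ["very low", "low", "moderate", "high", "very high"]
order = sorted(cluster_means, key=cluster_means.get)
cluster_class = {c: classes[rank] for rank, c in enumerate(order)}
```

The cluster with the highest mean score is labeled “very high” and the lowest “very low”, exactly as the ascending ranking in the text prescribes.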
Figure 5 shows the distribution of these clusters across the case study area. This classification provides an interpretable and quantitative basis for the overall and automated evaluation of building polygons within each cluster.
3.4. Correlation Matrix Between Criteria
To assess the independence and interrelationships among the criteria, pairwise correlations were calculated using the Pearson method.
Figure 6 presents a heatmap of the Pearson correlation matrix, illustrating the degree of association between the variables.
The interpretation of correlation coefficients in this study follows commonly accepted guidelines, where values below 0.10 are regarded as negligible, 0.10–0.39 as weak, 0.40–0.69 as moderate, 0.70–0.89 as strong, and 0.90–1.00 as very strong. Two correlations exceed 0.70 in absolute value. As presented in
Figure 6, there is a strong positive correlation (r = 0.86) between the number of versions and the number of contributors, and a strong negative correlation (r = −0.76) between the creation date and the last edit date.
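The pairwise Pearson computation can be sketched on synthetic criterion columns whose correlations are built in by construction (the real values come from the extracted OSM criteria); `np.corrcoef` on an observations-by-criteria array yields the kind of matrix visualized in Figure 6.

```python
import numpy as np

# Synthetic criterion columns with correlations induced by construction
rng = np.random.default_rng(0)
versions = rng.poisson(5, 500).astype(float)
contributors = 0.6 * versions + rng.normal(0, 1, 500)        # positively related
creation_days = rng.uniform(0, 4000, 500)
last_edit_days = 4000 - 0.8 * creation_days + rng.normal(0, 300, 500)  # negatively related

X = np.column_stack([versions, contributors, creation_days, last_edit_days])
corr = np.corrcoef(X, rowvar=False)  # 4 x 4 Pearson correlation matrix
```

With `rowvar=False`, each column is treated as one variable, so `corr[i, j]` is the Pearson coefficient between criteria i and j.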
3.5. Clustering Quality Assessment
The quality of the K-means clustering solution was assessed using an approach that integrates both intrinsic and spatial perspectives. First, the silhouette coefficient was calculated to provide an internal validity measure of intra-cluster cohesion and inter-cluster separation, as described in
Section 2.7.1. The average silhouette score across all building polygons was 0.58, indicating reasonable cohesion within clusters and fairly good separation between them. This result supports the internal validity of the five-cluster partition derived from the reliability-related criteria.
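The silhouette coefficient can be computed directly from pairwise distances. The following is a minimal sketch (suitable only for small inputs, since it builds the full distance matrix), not the library routine used in the study.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient: for each point, (b - a) / max(a, b),
    where a is the mean intra-cluster distance and b the mean distance
    to the nearest other cluster. Assumes every cluster has >= 2 points."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    idx = np.arange(len(X))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        a = D[i, same & (idx != i)].mean()
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

# Two well-separated toy clusters: score approaches 1
X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
labels = np.array([0] * 5 + [1] * 5)
score = silhouette(X, labels)
```

Scores near 1 indicate tight, well-separated clusters; scores near 0 or below indicate overlapping or misassigned points.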
To complement this intrinsic evaluation, Global Moran’s I was applied as a spatial statistical measure to determine whether clusters exhibited significant spatial autocorrelation or were randomly distributed across the study area, as described in
Section 2.7.2. When calculated using the categorical cluster labels from the K-means clustering, Moran’s I was 0.85 with a z-score of 107.9 and a
p-value < 0.001, indicating a very strong and highly significant clustered spatial pattern. This demonstrates that polygons belonging to the same reliability class are spatially concentrated rather than randomly distributed.
To further validate these findings, Moran’s I was recalculated using the continuous reliability score derived from the reliability criteria. In this case, Moran’s I was 0.59 with a z-score of 74.74 and a p-value < 0.001, again confirming a significant tendency toward spatial clustering. Although the magnitude of Moran’s I is lower for the continuous score than for the categorical clusters, both results consistently show that building reliability values are spatially structured rather than randomly dispersed.
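For illustration, the significance of Moran’s I can also be assessed with a permutation test. The study reports analytical z-scores, whereas this sketch estimates the z-score empirically by randomly relabeling values across locations, on a toy one-dimensional chain of units.

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I (W assumed row-standardized)."""
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

def morans_z(x, W, n_perm=999, seed=0):
    """Permutation-based z-score: compare the observed Moran's I to its
    distribution under random shuffles of the values across locations."""
    rng = np.random.default_rng(seed)
    obs = morans_i(x, W)
    sims = np.array([morans_i(rng.permutation(x), W) for _ in range(n_perm)])
    return (obs - sims.mean()) / sims.std()

# Toy example: a chain of 30 units with strongly clustered values
n = 30
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W = W / W.sum(axis=1, keepdims=True)   # row-standardize
x = np.array([0.0] * 15 + [1.0] * 15)  # clustered pattern
z_score = morans_z(x, W)               # large positive -> significant clustering
```

A z-score well above ~1.96 rejects spatial randomness at the 5% level, which is the same logic behind the z-scores of 107.9 and 74.74 reported above.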
The results of these assessments are summarized in
Table 7, which consolidates the numerical values for the silhouette coefficient and Moran’s I under different input fields. Further interpretation of these results is presented in the discussion section.
This analysis serves as a validation step, confirming that the five clusters identified by the K-means algorithm and labeled with qualitative categories (very low, low, moderate, high, very high) correspond closely with the underlying reliability scores. In practical terms, if a cluster is labeled as Moderate, the majority of polygons within that cluster exhibit reliability values concentrated in the moderate range. This consistency indicates that the categorical classification not only reflects algorithmic grouping but also accurately represents the spatial distribution of building reliability.
4. Discussion
The result of the scoring framework reveals a clear differentiation in the reliability of the five clusters. Cluster 5 achieved the highest mean reliability (60.2%), representing polygons with very high reliability, while cluster 2 showed the lowest average score (22.2%), corresponding to the very low reliability class. Clusters 1 and 4 were placed in the moderate and low categories, respectively, whereas cluster 3 was assigned to the high reliability group. Some samples of polygons from each class and their calculated reliability are shown in
Figure 7.
This classification framework not only provides a systematic means of quantifying polygon reliability but also offers an interpretable basis for understanding how intrinsic criteria derived from individual editing behavior influence the trustworthiness of building polygons across the study area.
The correlation analysis demonstrates a strong positive association between the number of versions and the number of contributors (r = 0.86), indicating that these two measures capture largely the same editing activity. Depending on data availability, it is therefore reasonable to retain only one of them or to combine them into a composite criterion. Future research could compute a Collaboration Rate, defined as the ratio of contributors to versions, to quantify how collaborative the editing of a feature is. Similarly, a strong negative correlation was found between the creation date and the last edit date (r = −0.76), reflecting the temporal dynamics of the data. Rather than excluding one of these criteria, future work could combine them into a composite measure, Data Activity Age, defined as the time interval between the creation date and the last edit.
The results further validate that the five K-means clusters, which were qualitatively labeled as very low, low, moderate, high, and very high, are consistent with the underlying reliability scores. For instance, polygons grouped in the cluster labeled as moderate predominantly exhibit reliability values concentrated in the moderate range. By combining both the silhouette analysis and Moran’s I, the clustering framework is validated from both intrinsic and spatial perspectives, ensuring robust and interpretable classification results.
5. Conclusions
This research proposes a machine learning-based approach to assess the reliability of polygonal building data in OSM by leveraging historical edit information and intrinsic feature attributes. Six key criteria were extracted for each polygon: the number of versions, creation date, last edit date, number of tag edits, cumulative area change, and number of contributors. These criteria were subsequently normalized using fuzzy sigmoid functions to ensure comparability across different value ranges. To evaluate the reliability class of VGI data and assign it to individual data records, a computational algorithm and framework were developed. This system provides a valuable supplementary mechanism for enhancing data quality and enables the implementation of quality control filters within a VGI-based geospatial information system.
In this study, we propose an analytical framework based on data mining and machine learning—specifically, unsupervised learning methods—to estimate the reliability of VGI and assign an appropriate reliability category. By extracting influential features from the data, including spatial, temporal, and other relevant attributes, the reliability of each data record was classified into one of five categories: Very High, High, Moderate, Low, and Very Low.
The spatial distribution of building polygon reliability classes, as illustrated in
Figure 5, reveals distinct clustering patterns across the study area. The Moderate class, which constitutes the largest proportion (52.1%), is predominantly concentrated in the northwestern and southwestern sectors, while appearing scattered across other parts of the city. Apart from these moderate-class concentrations, the remaining areas mainly consist of buildings with Low reliability (28.9%), notably concentrated in the northwestern sector, with no clear or consistent spatial pattern elsewhere in the urban landscape. In contrast, the Very Low reliability class (12.4%) is primarily concentrated in the central part of the study area, where it constitutes the dominant class, indicating a strong spatial focus rather than a dispersed distribution. The High reliability class (4.0%) is sparsely distributed, with clusters in certain planned urban districts. The Very High reliability class, representing the smallest proportion (2.6%), is scattered throughout the study area in small, contiguous groups of buildings. These clusters typically consist of several adjoining structures that collectively exhibit high geometric consistency and precise spatial data capture. Overall, the spatial configuration of building polygon reliability demonstrates clear differentiation across reliability classes.
Future research could extend the applicability of the proposed framework to other geographic regions and VGI platforms. Additionally, incorporating a broader range of quality indicators may further improve the reliability estimation of polygon features by capturing more diverse aspects of data quality.