Reasoning over Heterogeneous Geospatial Schemas: Aligning Authoritative Taxonomies and Collaborative Folksonomies Through Large Language Models

Souza, Fabíola Andrade; Camboim, Silvana Philippi

doi:10.3390/ijgi15020087

by Fabíola Andrade Souza ¹,²,* and Silvana Philippi Camboim ²

¹ Polytechnic School, Federal University of Bahia (UFBA), Salvador 40231-300, BA, Brazil
² Polytechnic Centre, Federal University of Paraná (UFPR), Curitiba CEP 81531-970, PR, Brazil
* Author to whom correspondence should be addressed.

ISPRS Int. J. Geo-Inf. 2026, 15(2), 87; https://doi.org/10.3390/ijgi15020087

Submission received: 13 December 2025 / Revised: 28 January 2026 / Accepted: 12 February 2026 / Published: 18 February 2026

(This article belongs to the Special Issue LLM4GIS: Large Language Models for GIS)

Abstract

Semantic interoperability remains a critical challenge in Spatial Data Infrastructures (SDIs), particularly when aligning authoritative taxonomies with collaborative folksonomies. Traditional alignment tools often fail to bridge the semantic and structural asymmetry between these schemas. This paper evaluates the capability of Large Language Models (LLMs), specifically distinguishing between traditional architectures and emerging Large Reasoning Models (LRMs), to perform semantic alignment between the Brazilian national topographic data model standard (EDGV) and OpenStreetMap (OSM). Using a formal ontology as a prompting scaffold, we tested seven model versions (including ChatGPT 5, DeepSeek R1, and Gemini 2.5) on their ability to bridge the gap between rigid hierarchical classes and the dynamic, ‘long-tail’ vocabulary of the folksonomy. Results reveal a distinct trade-off: while traditional LLMs exhibited ‘lexical rigidity’ and popularity bias—failing to map low-frequency tags—Reasoning Models demonstrated significantly improved capacity for semantic expansion, correctly identifying complex many-to-one (n:1) relationships across linguistic barriers. However, this reasoning depth often came at the cost of ‘hallucination by over-specification’ and syntactic instability in generating OWL code. We conclude that a neuro-symbolic approach, positioning LRMs as ‘Semantic Catalysts’ within a Human-in-the-Loop (HITL) workflow, provides a viable pathway for interoperability, balancing generative power with the need for logical rigor and spatial validation.

Keywords:

semantic interoperability; Large Language Models (LLMs); neuro-symbolic AI; geospatial ontologies; Spatial Data Infrastructure (SDI); OpenStreetMap

1. Introduction

The representation of geospatial phenomena has long relied on conceptual models that formalize how entities, attributes, and relationships should be interpreted within a given domain. In Spatial Data Infrastructures (SDIs), where data from multiple institutions and scales converge, the lack of shared conceptualizations frequently leads to semantic heterogeneity and obstructs interoperability. This challenge becomes particularly significant when integrating authoritative geospatial datasets—such as Brazil’s official topographic specification (EDGV)—with collaboratively produced structures, such as OpenStreetMap (OSM).

Semantic discrepancies between these models arise from linguistic conventions, cultural abstraction, and domain-specific categorization. Human cognition adds another layer of complexity: conceptualizations reflect hierarchical categorization, prototype structures, and culturally embedded meanings [1,2,3,4]. As a result, aligning heterogeneous schemas requires more than lexical matching; it demands understanding hierarchical structures, domain constraints, and context-dependent meanings.

Discussions of geospatial data interoperability highlight criteria for defining heterogeneity, with lexical and geometric variations being the most common [5,6]. Some studies [6,7,8,9] focused on data integration based on geometric matching; this study instead prioritizes lexical variability in the conceptual definition of geospatial objects and the understanding of hierarchical structures, in line with the theoretical argument for concept-level alignment discussed by [10]. Ontologies, therefore, play a central role in structuring conceptual knowledge and mitigating semantic inconsistencies across geospatial datasets [11,12].

Advances in Natural Language Processing (NLP) and Large Language Models (LLMs) have recently created new opportunities for supporting semantic interoperability. Because map legends, data dictionaries, and model definitions are primarily expressed in natural language, LLMs can help infer semantic correspondences across schemas, including those expressed in different languages or at different abstraction levels. Preliminary studies indicate that LLMs can recognize similarities between geospatial concepts when these are presented as structured text or formal ontologies [13,14,15]. However, the extent to which LLMs—and more recent reasoning-oriented variants (Large Reasoning Models, LRMs)—can understand and reason over formalized geospatial ontologies remains insufficiently explored.

Discussions regarding geospatial interoperability traditionally categorize heterogeneity into three levels: syntactic, schematic, and semantic [5,11]. While recent integration efforts often prioritize geometric matching (positional accuracy) as the primary anchor for alignment [7,16,17], this approach relies on the assumption that the objects being matched share a compatible conceptual definition. However, in the integration of authoritative SDIs with collaborative folksonomies, this assumption rarely holds. A geometric overlap between an EDGV feature and an OSM polygon is meaningless if the systems disagree on whether that feature is a ‘Commercial Building’ or a ‘Service Point’.

Therefore, this study prioritizes resolving schematic and semantic heterogeneity [5,18]—specifically, aligning hierarchical taxonomies—as a foundational prerequisite for geospatial integration. We argue that ensuring Thematic Accuracy [19] through ontology-driven reasoning is not merely a linguistic exercise, but a structural requirement to enable valid geometric comparisons. Without first establishing that a generic OSM tag (e.g., building = yes) corresponds conceptually to a specific EDGV class (e.g., ‘Health Building’), geometric algorithms lack the domain constraints needed to validate topological relationships correctly.

Existing research has extensively employed ontology-driven methods for semantic alignment using traditional similarity metrics [6,7,8,20]. While recent studies have evaluated the quality of ontological abstractions [21], there is still a gap in assessing how LLMs handle hierarchical and relational reasoning within these structured representations. Simultaneously, current literature highlights critical limitations in LLM reasoning, including weaknesses in logical consistency and spatial understanding [22,23,24,25]. These findings raise important questions about the suitability of current geospatial alignment architectures.

To address this gap, this study evaluates whether LLMs, both traditional and reasoning-oriented, can interpret an OWL ontology derived from the EDGV schema and establish semantic correspondences with OSM tags despite their lexical variability, following a neuro-symbolic AI approach [26]. Specifically, this study examines the models’ ability to: (i) comprehend hierarchical structures; (ii) generate semantically coherent alignments across multilingual schemas; and (iii) reproduce the ontology in valid OWL syntax. By assessing seven versions of three model families (ChatGPT, DeepSeek, and Gemini), this paper provides an empirical evaluation of their capabilities and limitations for supporting semantic interoperability in the conceptual integration of geospatial data.

Consequently, this study investigates the ‘neuro-symbolic hypothesis’: the premise that grounding the probabilistic generative power of Large Language Models (Neural) within the rigid logical structure of formal ontologies (Symbolic) can bridge the semantic gap between authoritative schemas and collaborative folksonomies, overcoming the limitations of using either approach in isolation.

2. Related Work and Conceptual Background

2.1. Authoritative Geospatial Models: EDGV

Brazil’s ‘Especificação Técnica para Estruturação de Dados Geoespaciais Vetoriais’ (EDGV) is a prescriptive, hierarchical, and strongly typed national specification designed to promote semantic consistency and interoperability across federal geospatial producers [27]. Its conceptual model is organized into domains, categories, and classes, each defined by controlled vocabularies and attribute domains. This structure was built using the Object Modeling Technique for Geographic Applications—OMT-G [28], which extends the semantic representation of OMT by including geospatial data and is based on UML class diagrams, whose documentation is textual [27].

This structure reflects a top-down governance model aligned with symbolic data-modelling traditions, where each feature class is supported by explicit constraints that guide both data production and validation. For example, a “hotel” is not stored as a free-text label but as an instance of the class Commerce and/or Services Building, associated with a specific, predefined attribute value indicating its subtype [12]. Such granularity enables rigorous, automated quality assurance and supports national-scale interoperability, but also produces rigid conceptual boundaries that complicate integration with more flexible datasets.

2.2. OpenStreetMap: A Collaborative Folksonomy

In contrast to EDGV’s rigidity, OpenStreetMap (OSM) constitutes a decentralized, volunteer-driven mapping initiative governed by community consensus rather than formal standards, acting as a collaborative folksonomy. Unlike formal taxonomies, a folksonomy arises from the aggregation of user-generated tags, reflecting the community’s vocabulary and conceptualizations rather than a top-down hierarchy [29]. Its data model is intentionally lightweight, comprising nodes, ways, and relations whose semantics are expressed entirely through an open-ended system of tags.

These key=value pairs allow contributors to describe features with remarkable flexibility, resulting in a long-tail distribution of data: while a small core of standardized tags (the ‘head’, e.g., building=yes) covers the majority of features, a vast number of low-frequency tags (the ‘tail’) emerge to capture highly specific attributes and local variations [30,31,32,33]. The system supports unlimited multilingual tag values, and its semantics evolve dynamically through use and documentation on the OSM Wiki. As a result, OSM often offers more up-to-date, fine-grained, and context-sensitive information than official maps, particularly in urban centers. However, this flexibility comes at the cost of structural and semantic uniformity, generating inconsistencies and ambiguities that pose substantial challenges for alignment with authoritative standards such as EDGV [12].
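The head/tail split described above can be illustrated with a short sketch. The tag counts below are invented for illustration and are not OSM statistics or results from this study; head_coverage is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical tag counts, invented for illustration only
# (not real OSM statistics).
tag_counts = Counter({
    "building=yes": 9000,
    "building=house": 600,
    "building=church": 250,
    "building=hut": 100,
    "building=mosque": 50,
})


def head_coverage(counts: Counter, k: int) -> float:
    """Share of all tagged features covered by the k most frequent tags."""
    total = sum(counts.values())
    head = sum(n for _, n in counts.most_common(k))
    return head / total


# With these made-up numbers, one tag covers 90% of the features,
# while the four remaining tags share the last 10% (the 'tail').
print(f"{head_coverage(tag_counts, 1):.2f}")
```

A model that only learns the head of such a distribution would look accurate on most features while still missing nearly every specific tag, which is the popularity bias discussed later in the paper.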

2.3. The Semantic Alignment Challenge

Aligning EDGV and OSM involves more than translating terms or matching database structures. It requires reconciling two fundamentally different conceptual models driven by distinct philosophies of data creation. EDGV embodies a prescriptive, Portuguese-language, hierarchical worldview designed for institutional consistency, whereas OSM reflects an emergent, largely English-dominant folksonomy grounded in collaborative, bottom-up practices. This divergence introduces linguistic, structural, and conceptual mismatches.

A single EDGV class may correspond to multiple highly specific OSM tags. Conversely, a detailed OSM feature may need to be decomposed and reinterpreted to fit into EDGV’s rigid schema. More deeply, the two models categorize and describe the world through different epistemological lenses: EDGV defines fixed categories sanctioned by national agencies, whereas OSM represents how contributors perceive and describe features in situ. Achieving meaningful interoperability, therefore, requires navigating not only naming differences but also conflicting modelling assumptions about how geographic phenomena are structured and understood.

This challenge has significant practical implications for Brazilian Spatial Data Infrastructures. OSM can complement gaps in official mapping, particularly in fast-changing areas, but only if robust semantic interoperability mechanisms bridge the conceptual chasm between these two models.

2.4. Approaches to Geospatial Semantic Alignment

The challenge of reconciling heterogeneous schemas has been addressed through a lineage of distinct methodological approaches. Early efforts focused primarily on Lexical and Syntactic Matching, employing string similarity metrics (e.g., Levenshtein distance) or thesauri like WordNet to align feature names [6,18]. However, these methods often fail when dealing with the ambiguity of folksonomies, where tags like highway=unclassified carry specific community meanings unrelated to the dictionary definition of “unclassified” [12].
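To make the limitation of string similarity concrete, the following is a minimal sketch of a normalized Levenshtein score. It is not the implementation used in the cited studies; the function names are illustrative, and the Portuguese label "igreja" (church) is an assumed example rather than a label quoted from the EDGV ontology.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to a score between 0 and 1."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


# Identical spellings across languages score perfectly...
print(similarity("hospital", "hospital"))
# ...but a Portuguese class name and a conceptually matching OSM value
# share few characters, so the score stays low despite the shared meaning.
print(similarity("igreja", "place_of_worship"))
```

The second pair shows why purely lexical methods struggle with cross-lingual, many-to-one correspondences: the score depends on shared characters, not on shared concepts.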

To overcome lexical limitations, Structural and Geometric Matching emerged as a robust alternative. Studies such as [7,16] utilized spatial signatures and graph centrality to infer semantic similarity, operating on the premise that “objects with similar geometries and spatial contexts likely share the same concept”. While effective for matching instances, these approaches struggle to align abstract schemas (T-Box) where no geometry is yet present.

The evolution towards Semantic Web technologies introduced Ontology-Based Alignment, where formal logic defines relationships. Works such as [34] and the logic-based approaches reviewed by [11] demonstrated that formalizing tags as OWL classes enables consistency checking. However, these symbolic methods are historically rigid and brittle when facing the noise and variability of collaborative data.

This study positions itself at the intersection of these lineages—a Neuro-Symbolic approach. By utilizing Large Language Models to handle the linguistic variability (lexical lineage) within a scaffold of formal rules (ontological lineage), we aim to address the limitations of purely symbolic or purely statistical methods.

2.5. AI Paradigms and Ontological Reasoning

Artificial Intelligence provides several pathways for supporting semantic alignment, each grounded in distinct theoretical traditions. Historically, AI has evolved through symbolic, connectionist, and, more recently, neuro-symbolic paradigms [26,35]. Symbolic methods rely on formal representations, logical reasoning, and explicit rules, making them suitable for modelling geospatial concepts whose structure must be verifiable and auditable [26]. However, symbolic systems depend heavily on manually encoded knowledge and scale poorly in the face of linguistic variability.

The rise of connectionist (sub-symbolic) approaches, particularly neural networks and Transformer-based language models, shifted the focus to learning patterns from large corpora [36,37,38]. Large Language Models (LLMs) excel at capturing linguistic regularities and generating plausible text but often struggle with hierarchical reasoning, logical consistency, and systematic generalization [23,25,39]. This limitation frequently manifests as ‘hallucination’, that is, the generation of content that is grammatically correct and contextually confident but factually ungrounded [40]. In geospatial domains, hallucinations are particularly risky as models may fabricate non-existent map tags or spatial relationships that appear valid but violate the underlying data schema. These weaknesses are especially pronounced in tasks involving structured knowledge, such as ontologies.

Ontologies became central during the Semantic Web era as a means to formalize domain knowledge through classes, properties, and axioms [41,42]. While they enable explicit reasoning and consistency checking, their interpretative scope is limited to what is explicitly modelled. Geospatial data, typically expressed through hybrid notations that combine text, geometry types, and hierarchical relationships, presents challenges for both purely statistical and purely symbolic systems [22,43,44]. This has renewed interest in neuro-symbolic approaches that combine the linguistic breadth of LLMs with the logical rigor of formal ontologies [26].

This dichotomy has led to the emergence of Neuro-Symbolic AI, a paradigm that seeks to combine the learning and generalization capabilities of neural networks (connectionist) with the reasoning and interpretability of logic-based systems (symbolic) [45]. In the context of geospatial interoperability, the neuro-symbolic approach proposes using ontology not merely as a target format but as a semantic constraint that guides the LLM, theoretically mitigating hallucinations while leveraging the model’s linguistic flexibility.

3. Methodology

This study investigates the capacity of Large Language Models (LLMs) to reason about formal ontologies. Specifically, we evaluate whether these tools can identify semantically equivalent concepts across two structurally divergent schemas: the prescriptive Brazilian EDGV and the community-driven OSM folksonomy. The experimental workflow (Figure 1) comprises three stages: (i) formalizing the EDGV schema into an ontology; (ii) prompting diverse LLMs to align this ontology with OSM tags; and (iii) evaluating the semantic and structural quality of the outputs.

3.1. Ontology Construction

The ‘Buildings’ category of the EDGV specification was selected as the reference domain, comprising 14 geospatial classes, attributes, and domains (see Table 1). This choice is particularly relevant in the context of developing economies like Brazil, where rapid urbanization dynamics challenge the maintenance of authoritative maps. According to the 2022 Census, 87.4% of the Brazilian population resides in urban areas [46], creating a constant demand for updating the ‘Buildings’ class in Spatial Data Infrastructures. Conversely, this is the most active category in collaborative mapping: as of January 2026, the building key was the most frequent tag in the Brazilian OSM dataset, with approximately 8.8 million features [47], offering a rich and heterogeneous vocabulary (folksonomy) to test the reasoning capabilities of the models.

The original EDGV specification is formalized using OMT-G [28], which relies on UML class diagrams supported by extensive textual documentation. As no official machine-readable ontology for EDGV is available, the conversion to OWL was performed manually in Protégé (version 5.6.4). This manual modeling was necessary to adapt the schema for neuro-symbolic reasoning. While the original OMT-G model treats characteristics as attributes constrained by domains, our OWL implementation re-engineered these elements into a rigid class hierarchy. For instance, attribute domains were converted into nested subclasses (e.g., Level of Care became a subclass of Healthcare Building, containing the subclasses Primary, Secondary, and Tertiary; see Figure 2). Additionally, the original textual definitions from the technical specifications were embedded as annotation properties (rdfs:comment) to provide semantic context for the models. These processes resulted in a hierarchy of 87 class elements.
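The re-engineering step described above (attributes and their domain values becoming nested subclasses, with definitions attached as rdfs:comment) can be sketched as follows. The authors performed this conversion manually in the ontology editor; the script below is only an illustration of the resulting pattern. The English class identifiers, the example.org namespace, and the placeholder definition text are assumptions introduced here.

```python
# Hypothetical, simplified slice of the schema: class -> attribute -> domain values.
# English identifiers are illustrative; the published ontology uses Portuguese labels.
schema = {
    "HealthcareBuilding": {
        "LevelOfCare": ["Primary", "Secondary", "Tertiary"],
    },
}
# Placeholder text standing in for a definition from the technical specification.
definitions = {
    "HealthcareBuilding": "Placeholder for the textual definition of this class.",
}


def to_turtle(schema: dict, definitions: dict, root: str = "EDGV_Edificações") -> str:
    """Emit Turtle where each attribute and each domain value becomes a subclass."""
    lines = [
        "@prefix : <http://example.org/edgv#> .",  # assumed namespace
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
        f":{root} a owl:Class .",
    ]

    def add_class(name: str, parent: str) -> None:
        lines.append(f":{name} a owl:Class ; rdfs:subClassOf :{parent} .")
        if name in definitions:
            text = definitions[name].replace('"', '\\"')
            lines.append(f':{name} rdfs:comment "{text}" .')

    for cls, attributes in schema.items():
        add_class(cls, root)
        for attribute, values in attributes.items():
            add_class(attribute, cls)          # attribute -> subclass of the class
            for value in values:
                add_class(value, attribute)    # domain value -> subclass of the attribute
    return "\n".join(lines)


print(to_turtle(schema, definitions))
```

Turning domain values into classes gives the language model an explicit hierarchy to reason over, at the cost of departing from the attribute-based structure of the original OMT-G model.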

Consequently, the ontology prioritizes the formalization of descriptive semantics (taxonomies, definitions, and property constraints) over geometric primitives. This aligns with the OMT-G modeling principle where geometric shape is a secondary attribute constrained by the class definition (e.g., a ‘Hospital’ must be a polygon), rather than an intrinsic property that defines the concept itself. By focusing on the hierarchical structure, the experiment isolates reasoning about the features, effectively decoupling the semantic alignment challenge from the geometric matching problem.

Additionally, an ‘OSM_Tags’ superclass was established to house the alignment targets. Instances representing real-world locations in Salvador, Brazil, were populated to test instance-level reasoning. The resulting EDGV ontology serves as the ground truth for the alignment experiments and is available in the project repository (see Data Availability Statement). To inspect the resulting taxonomy and class relationships, the ontology was visualized using the OntoGraf plugin for Protégé, as illustrated in Figure 3. The figure legend uses Portuguese EDGV labels; for reference, the translations are: está_operacional (“is operational”), tem_altura_aprox (“has approximate height”), tem_geom_aprox (“has approximate geometry”), tem_nome (“has name”), é_cultural (“is cultural”), and é_turística (“is touristic”). Structural links are represented by has subclass (class → class) and has individual (class → instance).

3.2. Language Models and Dialogue Prompt

The NLP tools selected for the analysis were ChatGPT (Access to ChatGPT: https://openai.com/index/chatgpt/ on 30 September 2025), DeepSeek (Access to DeepSeek: https://www.deepseek.com/ on 30 September 2025), and Gemini (Access to Gemini: https://gemini.google.com/app on 30 September 2025). All three tools are proprietary; however, DeepSeek provides openly accessible free versions, while Gemini and ChatGPT offer both free and paid tiers.

Different versions of each tool were employed, including ChatGPT 4o, o1-preview, and 5.0; DeepSeek V3 and R1; and Gemini 2.0 Flash Thinking Experimental and 2.5 Pro. Strictly speaking, all evaluated models are Large Language Models (LLMs) based on Transformer architectures; however, we distinguish between General-Purpose LLMs (e.g., ChatGPT 4o, DeepSeek V3) and Large Reasoning Models (LRMs) (e.g., ChatGPT 5.0, DeepSeek R1, Gemini 2.5 Pro). The latter are specifically optimized via reinforcement learning to simulate methodical cognitive processes (Chain-of-Thought) before generating an output.

The selection of these tools was based on several criteria. ChatGPT was included due to its strong performance in earlier studies on semantic reasoning for geospatial ontologies [14,15]. DeepSeek and Gemini were selected primarily because they represent distinct model architectures relevant for comparison.

For the evaluation, all seven model versions received the same task prompt and the same ontology code. The prompt instructed the models to identify the semantic associations between subclasses of the EDGV_Edificações class and the corresponding OpenStreetMap (OSM) tags, and to generate new subclasses within the OSM_Tags class using appropriate OWL notation. This prompt (see Listing 1) was used verbatim in all experiments to ensure full reproducibility.

Listing 1. Prompt used in all experiments.

Considering that the instances in the ontology belong to classes and subclasses of the EDGV_Edificações class, for example, the UPA instance is of the Primário and Edificação de Saúde types.
Considering also that all the classes and subclasses presented in the ontology inherit from EDGV_Edificações and can have OSM tags semantically associated with them (according to existing tags on https://wiki.openstreetmap.org/wiki/Map_features, accessed on 30 September 2025).
Please make the semantic association of all the subclasses of EDGV_Edificações in this ontology with the OSM tags on the website, if compatible.
As a result, create new subclasses for the OSM_Tags class with the name of the OSM tags that should be used to complement the ontology, indicating their semantic association with the corresponding subclass(es) of EDGV_Edificações. Use the appropriate OWL notation.

3.3. Evaluation Protocol

Generated ontologies were analyzed structurally using the Protégé software and visualized using the OntoGraf tool, with findings detailed in Section 4 and the discussion in Section 5. Each LLM’s performance against the dialogue prompt (Listing 1) was evaluated using a set of metrics covering multiple aspects:

Completeness in Semantic Alignment: This metric measures the number of classes accurately associated relative to the total of 87 classes, attributes, and domains of the EDGV. The validation of ‘accuracy’ was not based on simple lexical matching but benchmarked against the expert-validated semantic alignment established by [12]. In that reference study, the correspondence between EDGV and OSM was derived through a rigorous manual analysis of technical legislation and conceptual definitions. Therefore, an association was considered a ‘True Positive’ only if it aligned with this Gold Standard or offered a logically valid alternative justified by the cross-lingual definitions.
Syntactic Conformity: This metric assesses the ability to generate valid OWL code while maintaining structural integrity, for example, by preserving the existing class hierarchy, consistently creating the new classes that represent OSM tags as subclasses of OSM_Tags, maintaining the original object properties and instances, and using the ‘EquivalentTo’ notation to express semantic associations.
Complex Reasoning: This metric tests the LLM’s ability to infer domain rules by making associations at different hierarchical levels (e.g., all religious buildings correspond to ‘amenity=place_of_worship’). It also covers the ability to classify relationships by cardinality (1:1, 1:n, and n:1) and to identify the classes involved in many-to-one relationships.
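The completeness metric can be sketched as a set comparison against the gold standard. The pairs below are tiny hypothetical samples, not the study's data, and the sketch covers only the automatic gold-standard match; accepting a "logically valid alternative" remains a manual, expert judgment that this code does not attempt.

```python
# Gold standard and model output as sets of (EDGV class, OSM tag) pairs.
# Both sets are small hypothetical samples for illustration.
gold = {
    ("Bank", "amenity=bank"),
    ("Pharmacy", "amenity=pharmacy"),
    ("Church", "amenity=place_of_worship"),
}
predicted = {
    ("Bank", "amenity=bank"),
    ("Church", "amenity=place_of_worship"),
    ("Mosque", "amenity=place_of_worship_muslim"),  # not in the gold standard
}

TOTAL_CLASSES = 87  # size of the EDGV hierarchy reported in Section 3.1


def completeness(predicted: set, gold: set, total: int = TOTAL_CLASSES) -> float:
    """Distinct EDGV classes with at least one gold-validated association, over the total."""
    true_positives = predicted & gold
    covered = {edgv_class for edgv_class, _ in true_positives}
    return len(covered) / total


print(f"{completeness(predicted, gold):.3f}")
```

Counting distinct classes rather than raw pairs keeps a class with several valid tags (a 1:n relationship) from inflating the score.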

This process evaluates model evolution (between versions of the same LLM) and variation (between different LLMs), as well as their typical failure modes. The goal was to isolate the impact of incremental training and each LLM’s architecture on the ability to understand the ontology and generate semantic alignment, as measured by the applied metrics. Ultimately, the study seeks to understand the advantages and limitations of integrating symbolic AI (ontology) with sub-symbolic AI (direct text dialogue) in a neuro-symbolic approach.

4. Results

The resulting ontologies for each model are available in the Supplementary Material (see Data Availability Statement). This section details the structural and semantic outcomes categorized by model architecture.

4.1. OpenAI (ChatGPT)

The traditional model (ChatGPT 4o) produced minimal alignment, identifying only four OSM tags (e.g., healthcare=hospital, tourism=hotel). Structurally, it failed to preserve the input ontology’s hierarchy, properties, or instances, resulting in a flat list of simple 1:1 associations using the ‘EquivalentTo’ notation (Figure 4).

A clear evolution was observed in the reasoning models. ChatGPT o1-preview increased the alignment to 14 associations but still discarded the original hierarchy and annotations (Figure 5). Conversely, the most advanced version (ChatGPT 5.0) achieved 31 associations and successfully maintained the original class hierarchy, object properties, and instances (Figure 6). Notably, it began to demonstrate complex reasoning, identifying 1:n relationships (e.g., relating the ‘Hotel’ domain to both amenity=hotel and building=hotel), although it duplicated the structural tree in the output.

4.2. DeepSeek

DeepSeek V3 (Traditional) struggled with OWL syntax, generating only 11 associations and inverting the hierarchy, placing the original EDGV classes as subclasses of OSM tags (Figure 7). Furthermore, it associated concepts only at the key level (e.g., educational building linked simply to amenity), ignoring specific values.
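A structural failure of this kind can be detected by comparing subclass edges before and after generation. The sketch below is an assumption about how such a check might be automated (the study inspected the outputs in the ontology editor); the edge sets and the English class names are hypothetical.

```python
# Subclass edges as (child, parent) pairs. These are hypothetical examples;
# in practice they would be extracted from the input and generated OWL files.
original = {
    ("HealthcareBuilding", "EDGV_Edificações"),
    ("EducationalBuilding", "EDGV_Edificações"),
}
generated = {
    ("HealthcareBuilding", "EDGV_Edificações"),
    ("EducationalBuilding", "amenity"),  # EDGV class placed under an OSM key
    ("amenity", "OSM_Tags"),
}


def check_structure(original: set, generated: set, osm_root: str = "OSM_Tags"):
    """Return (missing original edges, EDGV classes placed under non-EDGV parents)."""
    edgv_classes = {c for c, _ in original} | {p for _, p in original}
    edgv_classes.discard(osm_root)
    missing = original - generated
    misplaced = {(c, p) for c, p in generated
                 if c in edgv_classes and p not in edgv_classes}
    return missing, misplaced


missing, misplaced = check_structure(original, generated)
print("missing:", missing)
print("misplaced:", misplaced)
```

An empty result for both sets would indicate that the input hierarchy survived and that no EDGV class was re-parented under an OSM tag class.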

Figure 7. Ontological alignment generated by DeepSeek V3 (Traditional LLM). The alignment connects the EDGV building module (EDGV_Edificação and its subclasses) to high-level OSM tag groupings through the hasOSMTag property. Blue arrowed links represent has_subclass relations. Yellow dashed links denote asserted hasOSMTag (Domain > Range) connections, while orange dashed links represent hasOSMTag restrictions used in Subclass (∃ …) axioms. Full English translations and a high-resolution version of this figure are provided in the Supplementary Material, available at: https://github.com/LabgeolivreUFPR/reasoning_ontology/ (accessed on 30 September 2025).

The reasoning model (DeepSeek R1) significantly improved semantic recall, identifying 34 associations, including complex n:1 relationships (e.g., grouping multiple healthcare levels under amenity=hospital), and consistently applied the ‘EquivalentTo’ axioms directly to class names (Figure 8). However, structural integrity remained poor; the model failed to preserve the original hierarchy and properties.

Figure 8. Ontological alignment generated by DeepSeek R1 (Reasoning Model). Yellow-circle nodes represent regular classes, while yellow-circle nodes with three horizontal bars indicate classes that received an EquivalentClass axiom in the generated alignment. Blue arrowed links indicate has subclass relations. Full English translations and a high-resolution version of this figure are provided in the Supplementary Material, available at: https://github.com/LabgeolivreUFPR/reasoning_ontology/ (accessed on 30 September 2025).

4.3. Google (Gemini)

Gemini models achieved the highest recall, identifying 62 semantic associations in both versions. The preview model (2.0 Flash Thinking) exhibited structural flaws similar to DeepSeek, inverting the hierarchy and merging class names (e.g., OSM_Tags_amenity_hospital) (Figure 9).

Figure 9. Ontological alignment generated by Gemini 2.0 Flash Thinking (Reasoning Model). Yellow-circle nodes represent regular classes, and blue arrowed links indicate has_subclass relations, highlighting the taxonomy produced from the model output. Full English translations and a high-resolution version of this figure are provided in the Supplementary Material, available at: https://github.com/LabgeolivreUFPR/reasoning_ontology/ (accessed on 30 September 2025).

The advanced reasoning model (2.5 Pro) delivered the most robust structural performance. It preserved the complete EDGV ontology, including the hierarchy, instances, and properties, while generating 50 domain-level and 10 class-level associations. It also demonstrated advanced inference capabilities by establishing n:1 relationships for complex categories, such as mapping multiple religious domains (‘Church’, ‘Mosque’, ‘Synagogue’) to a single amenity=place_of_worship tag. However, it altered original class names by prepending the associated OSM keys (Figure 10). The figure legend uses Portuguese EDGV labels; for reference, the translations are: está_operacional (“is operational”), tem_altura_aprox (“has approximate height”), tem_geom_aprox (“has approximate geometry”), tem_nome (“has name”), é_cultural (“is cultural”), and é_turística (“is touristic”). Structural links are represented by has subclass (class → class) and has individual (class → instance).

Figure 10. Ontological alignment generated by Gemini 2.5 Pro (Reasoning Model). Arrowed links represent structural relations (has subclass: class → class; has individual: class → instance), whereas the remaining EDGV-derived relations are shown as dashed links (see legend). Yellow-circle nodes represent regular classes, while yellow-circle nodes with three horizontal bars indicate classes that received an EquivalentClass axiom in the generated alignment. Purple diamonds denote individuals (instances). A high-resolution version of this figure and the Portuguese–English label translations are provided in the Supplementary Material; available at: https://github.com/LabgeolivreUFPR/reasoning_ontology/ (accessed on 30 September 2025).

4.4. Qualitative Analysis

Given the quantitative results presented in the previous sections, we highlight several noteworthy examples to qualitatively assess the reasoning depth. This analysis accounts for both cross-lingual inference (Portuguese-English) and hierarchical categorization. The model outputs were benchmarked against the findings of [12], who employed a traditional manual matching method supported by extensive documentary analysis (e.g., Brazilian Traffic Code and DNIT manuals).

For instance, while the traditional manual method [12] required consulting external legal documents to distinguish between ‘Road’ and ‘Highway’, the Reasoning Models (LRMs) demonstrated an ability to infer these nuances directly from the ontology’s context. Regarding specific mappings, ChatGPT performed most associations at the domain level—the most concrete—using a 1:1 mapping. This indicates a direct translation in cases such as ‘Bank’ with the amenity=bank tag, ‘Pharmacy’ with amenity=pharmacy, or ‘Supermarket’ with shop=supermarket and amenity=supermarket.

However, particularly in the reasoning version, some associations diverge from this direct relationship, aggregating concepts into a more abstract level. Examples include ‘Religious Building’, which, despite being directly associated with the tag building=religious, was also linked to amenity=place_of_worship; and ‘Educational Building’, associated with the tags amenity=school and amenity=university. A noteworthy occurrence in the o1-preview version involves certain religious buildings that were associated through direct translation but with a contextual adjustment—for instance, ‘Mosque’ with amenity=place_of_worship_muslim and ‘Temple’ with amenity=place_of_worship_generic. In these cases, the value qualifies the type of religion, even though such tags are not officially recorded in OSM.

In contrast, DeepSeek, despite its poor performance in maintaining OWL code, exhibited more appropriate levels of generalization, even in its standard version. This version considered only the keys rather than the values, correlating EDGV classes such as ‘Farming Building’ and ‘Mineral Extraction Building’ with the landuse key, or ‘Building of Leisure’ with leisure and tourism. Its reasoning version, meanwhile, yielded more complex groupings, such as ‘Secondary’ and ‘Tertiary’ (under ‘Healthcare Building’/‘Level of Care’) associated with the amenity=hospital; ‘Indigenous Building’ with building=hut; or ‘Church’, ‘Mosque’, ‘Synagogue’, ‘Temple’, and ‘Afro-Brazilian religious temple’ with the amenity=place_of_worship. This latter association was also observed in Gemini’s reasoning version.

Finally, a systematic analysis of the failure cases revealed two distinct patterns related to the folksonomy structure:

1. Frequency Bias (The ‘Long Tail’ Problem): Cross-referencing the results with the tag counts provided in the Supplementary Material (Table S1), we observed a strict cutoff in traditional models. ChatGPT 4o and DeepSeek V3 only identified high-frequency global tags (minimum instance count >6000 and >12,000, respectively), effectively ignoring the ‘long tail’ of the collaborative vocabulary. In contrast, Reasoning Models (e.g., DeepSeek R1 and Gemini 2.5) successfully retrieved rare tags with as few as 13 instances (e.g., building=cowshed), demonstrating a capacity to navigate the full breadth of the folksonomy.

2. Hallucination by Over-specification: A failure mode specific to the reasoning models was the generation of non-existent tags (zero frequency) to force a precise semantic fit. For instance, to distinguish between religious denominations defined in the ontology, ChatGPT o1 invented tags such as amenity=place_of_worship_muslim and _christian (0 instances), rather than accepting the broader valid tag amenity=place_of_worship. Similarly, Gemini 2.5 proposed man_made=beehive (0 instances) instead of the existing man_made=apiary. This suggests that while LRMs possess superior semantic inference, they prioritize logical precision over adherence to the existing controlled vocabulary.
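Both failure patterns above can be screened mechanically once tag counts are available. The following is a minimal sketch, not the procedure used in the study: `TAG_COUNTS` is a hypothetical stand-in for a Taginfo export, `classify_tag` is an illustrative helper, and only the count for building=cowshed (13) and the zero-frequency tags are taken from the text. The other counts are placeholders.

```python
# Hypothetical tag-count table standing in for a Taginfo export.
# Only building=cowshed (13) and the zero-count tags come from the text;
# the other figures are placeholders chosen for illustration.
TAG_COUNTS = {
    "amenity=place_of_worship": 50000,  # placeholder count
    "amenity=bank": 20000,              # placeholder count
    "building=cowshed": 13,
}

# Assumed cutoff, borrowed from the minimum count reported for ChatGPT 4o.
LONG_TAIL_THRESHOLD = 6000


def classify_tag(tag: str, counts: dict[str, int] = TAG_COUNTS) -> str:
    """Label a proposed OSM tag by its usage frequency."""
    n = counts.get(tag, 0)
    if n == 0:
        return "non-existent"   # candidate hallucination, needs human review
    if n < LONG_TAIL_THRESHOLD:
        return "long-tail"      # valid but rarely used tag
    return "high-frequency"


proposals = [
    "amenity=place_of_worship",
    "building=cowshed",
    "amenity=place_of_worship_muslim",
    "man_made=beehive",
]
report = {tag: classify_tag(tag) for tag in proposals}
for tag, label in report.items():
    print(f"{tag}: {label}")
```

A filter of this kind only flags candidates. Whether a zero-frequency tag is a hallucination or a legitimate new proposal remains a decision for the human expert.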

5. Discussion

We hypothesized that structuring schemas within an ontology enables an LLM to simulate human cognitive categorization more effectively, moving from concrete objects to abstract hierarchies [1,2], thereby simulating a neuro-symbolic AI [26]. By constraining the LLM’s probabilistic analysis with a structured input, we aimed to limit the randomness inherent in ‘sub-symbolic’ AI. The results confirm that while LLMs have evolved significantly in interpreting these structures, a gap remains between semantic recognition and logical syntactic generation.

5.1. Completeness in Semantic Alignment

As detailed in Table 2 and Figure 11, reasoning models (LRMs) consistently outperformed traditional architectures in semantic recall. Gemini (versions 2.0 and 2.5) achieved the highest completeness (~60% of classes associated), a substantial improvement over ChatGPT 4o’s conservative performance (~4.6%). Crucially, the models successfully navigated the cross-lingual barrier (Portuguese-English) without explicit translation steps, corroborating the findings of [13]. While Table 2 summarizes the quantitative results, the complete list of semantic associations derived by each model—including the specific OSM tags and logical axioms employed—is detailed in the Supplementary Material.

Omission rates were high, mainly in traditional models. False positives fell into two types. The first comprises semantically inconsistent associations, such as the tag ‘historic’ for ‘Phenomenon Mediation Station’, as suggested by DeepSeek V3. These associations were rare, not exceeding one or two tags per model. The few other errors observed involved generic mappings (e.g., associating ‘Mineral Extraction’ broadly with building=industrial), which, while technically imprecise, are not semantically incoherent.

The second type occurs when a tag, even if it makes semantic sense, shows zero or very low usage on the Taginfo platform. These are mainly non-existent tags or tags that the OpenStreetMap Wiki advises against, as with building=pigsty, whose wiki page instructs mappers, in natural language, to merge it with the tag building=sty. This type of error was not observed in DeepSeek. Thus, even though the quantity and quality of associations improved in the LRM versions, the models still rely on a fairly direct translation of terms and neither avoid hallucinated tags nor heed the natural-language instructions on the platform, at least when such concerns are not mentioned in the original prompt.

Although Gemini 2.5 achieved the highest recall, finding low-frequency concepts that traditional models missed, its raw generation numbers must be interpreted with caution given the 5.8% hallucination rate (non-existent tags) identified.

5.2. Syntactic Conformity and Structural Integrity

While semantic retrieval improved, the ability to generate valid, structurally sound OWL code remains a significant bottleneck (Table 3). DeepSeek and the early Gemini version failed to preserve the input hierarchy, frequently inverting relationships by making the original EDGV classes subclasses of OSM tags. This fundamental logical error highlights the distinction between linguistic understanding (identifying that ‘Hospital’ relates to ‘Health’) and formal reasoning (understanding that a Class cannot logically be a subclass of an Attribute).

ChatGPT demonstrated the highest syntactic stability, creating valid ‘EquivalentTo’ axioms and preserving object properties. However, it exhibited a ‘Structural Flattening’ failure mode: despite valid syntax, it systematically discarded the nested hierarchy of the input ontology, reducing the complex schema to a linear list of properties. This structure suggests an inability to maintain hierarchical depth during translation. Conversely, Gemini 2.5 Pro, despite its superior semantic recall, altered original class names (e.g., merging keys into names such as OSM_key_value), thereby compromising the ontology’s integrity. These results align with those of [48], who observed that LLMs struggle to self-correct in code-generation tasks, often prioritizing content over syntax.
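The three structural problems described in this section (inverted hierarchies, structural flattening, and altered class names) can be illustrated with a simple comparison of subclass axioms. The sketch below is an assumption-laden illustration rather than the evaluation method of the study: axioms are reduced to plain (child, parent) pairs, and every class name, the `OSM_TERMS` set, and the `check_structure` helper are hypothetical.

```python
# Subclass axioms as (child, parent) pairs; all names are illustrative.
REFERENCE = {
    ("Healthcare_Building", "Building"),
    ("Religious_Building", "Building"),
    ("Church", "Religious_Building"),
}
# Hypothetical identifiers for classes that represent OSM tags.
OSM_TERMS = {"amenity_place_of_worship", "amenity_hospital"}


def classes(axioms):
    """Collect every class name that appears in a set of subclass axioms."""
    return {name for pair in axioms for name in pair}


def check_structure(reference, generated, osm_terms):
    """Report three kinds of structural deviation from the reference."""
    ref_classes = classes(reference)
    return {
        # Reference class placed under an OSM tag (inverted hierarchy)
        "inverted": sorted(
            (c, p) for c, p in generated if c in ref_classes and p in osm_terms
        ),
        # Reference subclass links that disappeared (structural flattening)
        "missing_links": sorted(reference - generated),
        # Reference classes that no longer appear under their original name
        "renamed_or_dropped": sorted(ref_classes - classes(generated)),
    }


generated = {
    ("Healthcare_Building", "Building"),
    ("Church", "amenity_place_of_worship"),          # inverted link
    ("OSM_amenity_Religious_Building", "Building"),  # renamed class
}
issues = check_structure(REFERENCE, generated, OSM_TERMS)
for kind, found in issues.items():
    print(kind, found)
```

In a real pipeline the pairs would be extracted from the OWL files with an ontology library. The set-based comparison is kept deliberately simple here so that each failure mode maps to one line of logic.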

5.3. Reasoning Depth: Hierarchical Levels and Multiplicity

Cognitive theory posits that matching concepts at higher levels of abstraction (Classes) is more cognitively demanding than matching direct instances or domains [1,49]. The data in Table 4 and Figure 12 support this: traditional models predominantly formed associations at the simpler Domain level (direct term translation). In contrast, reasoning models (DeepSeek R1, Gemini 2.5) increasingly operated at the Class level, demonstrating an ability to infer broader categorical equivalence.

This reasoning capacity is further evidenced by the handling of multiplicity (Table 5 and Figure 13). Semantic alignment is rarely one-to-one; it often requires aggregating multiple source concepts into a single target tag (n:1).

While traditional models defaulted to simple 1:1 mappings, reasoning models successfully identified complex n:1 relationships. For instance, Gemini 2.5 correctly aggregated diverse EDGV religious domains (‘Church’, ‘Mosque’, ‘Temple’) into a single OSM tag, amenity=place_of_worship. This capability indicates a shift from mere lexical matching to genuine semantic inference, addressing a core challenge in aligning authoritative taxonomies with collaborative folksonomies [12].
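The distinction between 1:1 and n:1 mappings can be made concrete by grouping source concepts by their target tag. This is a minimal sketch under stated assumptions: the religious examples follow the text, ‘Bank’ is included as a 1:1 contrast, and the `multiplicity` helper is hypothetical. It does not handle the 1:n case, in which one concept maps to several tags.

```python
from collections import defaultdict

# (EDGV concept, OSM tag) pairs taken from the examples in the text.
alignments = [
    ("Church", "amenity=place_of_worship"),
    ("Mosque", "amenity=place_of_worship"),
    ("Temple", "amenity=place_of_worship"),
    ("Bank", "amenity=bank"),
]


def multiplicity(pairs):
    """Group source concepts by target tag and label each group 1:1 or n:1."""
    by_tag = defaultdict(list)
    for concept, tag in pairs:
        by_tag[tag].append(concept)
    return {
        tag: ("n:1" if len(concepts) > 1 else "1:1", concepts)
        for tag, concepts in by_tag.items()
    }


result = multiplicity(alignments)
for tag, (kind, concepts) in result.items():
    print(f"{tag}: {kind} <- {concepts}")
```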

5.4. Impact on Geospatial Interoperability

The findings reveal a distinct trade-off between syntactic stability and semantic reasoning depth, with no single architecture achieving simultaneous success across all dimensions. Traditional models (e.g., ChatGPT 4o) demonstrated superior syntactic adherence and low hallucination rates (zero non-existent tags, as shown in Table 2). However, this came at the cost of ‘lexical rigidity’: these models failed to perform the complex n:1 generalizations required to bridge the folksonomy, resulting in high omission rates (over 80%).

In contrast, Reasoning Models (LRMs) prioritized semantic expansion. As shown in Table 2, models such as Gemini 2.0 and 2.5 drastically reduced omissions and successfully identified complex associations. Nevertheless, this generative capability introduced a specific failure mode: ‘Hallucination by Over-specification’. Gemini 2.0, for instance, generated 13 plausible but non-existent tags, effectively prioritizing semantic fit over adherence to the strict OSM vocabulary. Notably, the evolution to Gemini 2.5 Pro showed a significant correction of this behavior, increasing valid associations to 55 while reducing hallucinations to 5, suggesting that newer LRMs are beginning to balance reasoning with factual grounding.

This marks a paradigm shift from traditional ontology-matching systems. Well-known tools such as LogMap [50] and AgreementMakerLight (AML) [51] typically rely on string-similarity metrics, such as the Levenshtein distance, and on structural matching. They perform well on formal ontologies in a single language, but their dependence on lexical similarity and structural isomorphism means they typically fail in cross-lingual, asymmetric scenarios without extensive preprocessing. SDIs pose exactly these challenges: cross-lingual barriers (e.g., Portuguese vs. English) and the unstructured nature of folksonomies. Unlike these tools, LRMs demonstrated a native capacity to bridge semantic gaps ‘zero-shot’, internalizing the translation and disambiguation steps.

However, the transition from unstructured text to formal OWL ontologies introduces a ‘structural gap’. Our findings align with broader AI research, indicating that while LLMs excel at natural language reasoning, they face constraints with logical consistency [23,24,25]. The syntactic instability observed in DeepSeek and Gemini 2.0 (e.g., inverting hierarchies) mirrors documented weaknesses in formal code generation. Even so, the superior performance of LRMs in preserving subclass structures supports the ‘neuro-symbolic’ proposition. A hybrid workflow that uses the ontology as a formal scaffold to constrain the model’s generative tendencies, combining LRM semantic suggestions with human validation of ‘non-existent’ tags, represents the most viable path for operational interoperability.
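To see why purely lexical metrics struggle across languages, consider a minimal sketch of the Levenshtein distance with a normalised similarity score. This is an illustration of the metric only, not a reproduction of how LogMap or AML combine it with other evidence. The label ‘igreja’ (Portuguese for ‘church’) is an assumed example and is not taken from the EDGV files.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalised similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


# Same-language, near-identical labels score high.
print(similarity("pharmacy", "pharmacies"))
# A Portuguese label and an English OSM value score low,
# even though they can denote the same concept.
print(similarity("igreja", "place_of_worship"))
```

A threshold on such a score would accept the first pair and reject the second, which is the cross-lingual blind spot described above.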

5.5. Limitations and Directions for Future Research

This study presents constraints that outline clear avenues for future investigation. Firstly, the evaluation focused on a single EDGV category (‘Buildings’), selected for its high volume of collaborative tags. Future work should expand to domains with distinct topological characteristics, such as ‘Hydrography’ (connected networks) or ‘Administrative Boundaries’ (partitioned spaces), and test alignment against other standards, such as CityGML or INSPIRE, to verify generalizability across different modeling philosophies.

Secondly, while manual evaluation was necessary to establish a semantic ground truth, it limits scalability. Subsequent studies should integrate automated reasoning engines (e.g., HermiT or Pellet) into the pipeline to validate the logical consistency of the generated OWL code in real time. Furthermore, establishing a larger, community-validated “Gold Standard” dataset for English-Portuguese geospatial mapping would enable the use of quantitative metrics (F1-score) without relying solely on expert inspection.
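Once a gold-standard alignment exists, precision, recall, and the F1-score follow directly from set operations. The sketch below assumes alignments are represented as (concept, tag) pairs. The `gold` set is entirely hypothetical, since no such community-validated dataset is reported in this study, and the `predicted` set simply reuses tags mentioned in the text.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for a set of predicted alignments."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold standard, for illustration only.
gold = {
    ("Church", "amenity=place_of_worship"),
    ("Bank", "amenity=bank"),
    ("Pharmacy", "amenity=pharmacy"),
    ("Supermarket", "shop=supermarket"),
}
# Illustrative model output: two correct pairs and one non-existent tag.
predicted = {
    ("Church", "amenity=place_of_worship"),
    ("Bank", "amenity=bank"),
    ("Mosque", "amenity=place_of_worship_muslim"),
}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

With two of three predictions correct and two of four gold pairs recovered, precision is 2/3, recall is 1/2, and F1 is 4/7.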

Thirdly, regarding the interaction model, we employed a uniform zero-shot prompt to ensure comparability. Future research must explore advanced strategies, such as Chain-of-Thought (CoT) or “Interactive Prompting”, where the model explicitly asks the human expert for clarification on ambiguous concepts before finalizing the alignment, aligning with the proposed Human-in-the-Loop (HITL) paradigm.

Finally, to transcend generic natural language processing and achieve true geospatial reasoning, future work must integrate spatial constraints directly into the prompting framework. We propose a ‘Topology-Check’ mechanism based on the Dimensionally Extended nine-Intersection Model (DE-9IM) [52]. In this workflow, the LLM would not only suggest a semantic match (e.g., ‘School’ equals ‘Education Building’) but also generate a spatial verification query (e.g., ‘Do instances of School spatially OVERLAP instances of Education Building?’). If the topological relationship contradicts the semantic hypothesis, the model would autonomously revise the alignment. Incorporating these topological axioms as ‘contextual constraints’ would ground the linguistic inference in physical reality, distinguishing geospatial AI from standard text-based tasks.
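One way to picture the proposed ‘Topology-Check’ is as a test of a DE-9IM matrix against named patterns. The sketch below is only an assumption about how such a check might look and is not an implemented component of this study. The pattern strings are the standard DE-9IM patterns for area/area relations, while `matches`, `topology_check`, and the set of relations treated as compatible with a semantic match are hypothetical choices. In practice the nine-character matrix would be produced by a spatial engine (for example, a `relate` operation in a GIS library) rather than written by hand.

```python
# Standard DE-9IM patterns for comparisons between two areas.
PATTERNS = {
    "equals": "T*F**FFF*",
    "within": "T*F**F***",
    "contains": "T*****FF*",
    "overlaps": "T*T***T**",
    "disjoint": "FF*FF****",
}


def matches(matrix: str, pattern: str) -> bool:
    """Test a 9-character DE-9IM matrix against a pattern.

    'T' accepts any non-empty intersection (0, 1, or 2), 'F' requires an
    empty one, '*' accepts anything, and a digit requires that dimension.
    """
    if len(matrix) != 9 or len(pattern) != 9:
        raise ValueError("DE-9IM strings must have exactly 9 characters")
    for cell, expected in zip(matrix, pattern):
        if expected == "*":
            continue
        if expected == "T":
            if cell not in "012":
                return False
        elif cell != expected:
            return False
    return True


def topology_check(matrix: str,
                   allowed=("equals", "within", "contains", "overlaps")) -> bool:
    """Return True when the spatial relation is compatible with a match."""
    return any(matches(matrix, PATTERNS[name]) for name in allowed)


overlapping = "212101212"  # two partially overlapping polygons
separate = "FF2FF1212"     # two disjoint polygons
print(topology_check(overlapping))  # semantic hypothesis is supported
print(topology_check(separate))     # semantic hypothesis should be revised
```

Treating any of equals, within, contains, or overlaps as supporting evidence is a design choice. A stricter workflow could require a minimum share of instances to satisfy the relation before accepting the alignment.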

6. Conclusions

This study reveals that Large Language Models, when grounded by formal ontologies, possess the capability to bridge the semantic gap between authoritative schemas (EDGV) and collaborative folksonomies (OSM), validating the ‘neuro-symbolic’ hypothesis. In this context, the ontology served as a crucial semantic and structural scaffold, guiding the models beyond purely linguistic similarity and enabling partial reasoning over class hierarchies.

Our results demonstrate a clear divergence in performance: while traditional LLMs were limited by a ‘popularity bias’—relying on superficial pattern matching of high-frequency tags—Reasoning Models (LRMs) guided by the ontological structure successfully simulated complex semantic inference, effectively identifying equivalences across linguistic barriers and navigating the ‘long tail’ of the folksonomy.

However, significant limitations persist. The models struggled with syntactic generation, highlighting a disconnect between semantic comprehension and logical structuring. Cognitively, this underscores the challenge of automating ‘tacit knowledge’ without explicit spatial grounding. While LRMs inferred complex conceptual relationships, they lack the genuine spatial perception required to fully understand cartographic generalization and scale. Enhancing LLMs to process not only feature names but also their shapes and spatial contexts will be essential to bridging the remaining gap in the next generation of Geospatial AI.

Therefore, we conclude that the optimal role for current Large Reasoning Models in SDIs is not that of an autonomous mapper, but of a ‘Semantic Catalyst’ within a HITL workflow. The experiments demonstrate that while LRMs cannot fully automate the process due to risks of hallucination and syntactic errors, they excel at the labor-intensive task of discovering potential associations across linguistic and conceptual barriers (high recall).

By positioning the LRM as a ‘First-Draft Assistant’, national mapping agencies can use the model’s reasoning capabilities to rapidly generate alignment proposals, shifting the human expert’s effort from creation to validation. This hybrid architecture addresses the challenges of the start-up phase of semantic integration without compromising the rigorous authority required by official cartographic standards.

Supplementary Materials

The data and materials that support the findings of this study are openly available. The repository includes: (1) the reference EDGV ontology (OWL); (2) the complete set of ontologies generated by the evaluated LLMs/LRMs; (3) all figures in high resolution; and (4) a detailed spreadsheet containing the full list of semantic associations, the data used in the graphs, and the English–Portuguese translations. These resources can be accessed at: https://github.com/LabgeolivreUFPR/reasoning_ontology (accessed on 30 September 2025).

Author Contributions

Conceptualization, Fabíola Andrade Souza and Silvana Philippi Camboim; Formal Analysis, Fabíola Andrade Souza; Investigation, Fabíola Andrade Souza; Methodology, Fabíola Andrade Souza and Silvana Philippi Camboim; Supervision, Silvana Philippi Camboim; Validation, Silvana Philippi Camboim; Visualization, Fabíola Andrade Souza; Drafting—Initial Draft, Fabíola Andrade Souza; Drafting—Revision and Editing, Fabíola Andrade Souza and Silvana Philippi Camboim. All authors read and agreed with the published version of the manuscript.

Funding

This research received no external funding and the APC was funded by the Brazilian National Council for Scientific and Technological Development (CNPq), grant number 316103/2021-7.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors report there are no competing interests to declare.

Abbreviations

The following abbreviations are used in this manuscript:

EDGV	Estruturação de Dados Geoespaciais Vetoriais (Brazilian Portuguese)
LLM	Large Language Model
LRM	Large Reasoning Model
OSM	OpenStreetMap
OWL	Web Ontology Language

References

1. Rosch, E. Cognitive representations of semantic categories. J. Exp. Psychol. Gen. 1975, 104, 192–233.
2. Rosch, E. Natural categories. Cogn. Psychol. 1973, 4, 328–350.
3. Petchenik, B.B. Cognition in Cartography. Cartogr. Int. J. Geogr. Inf. Geovis. 1977, 14, 117–128.
4. Fremlin, G.; Robinson, A.H. What Is It That Is Represented on a Topographical Map? Cartogr. Int. J. Geogr. Inf. Geovis. 1998, 35, 13–19.
5. Bishr, Y. Overcoming the semantic and other barriers to GIS interoperability. Int. J. Geogr. Inf. Sci. 1998, 12, 299–314.
6. Yu, L.; Qiu, P.; Liu, X.; Lu, F.; Wan, B. A holistic approach to aligning geospatial data with multidimensional similarity measuring. Int. J. Digit. Earth 2018, 11, 845–862.
7. Anand, S.; Morley, J.; Jiang, W.; Du, H.; Hart, G. When worlds collide: Combining Ordnance Survey and Open Street Map data. In AGI Geocommunity ’10; University of Nottingham: London, UK, 2010.
8. Du, H.; Alechina, N.; Jackson, M.; Hart, G. Matching Formal and Informal Geospatial Ontologies. In Geographic Information Science at the Heart of Europe; Vandenbroucke, D., Bucher, B., Crompvoets, J., Eds.; Springer International Publishing: Cham, Switzerland, 2013; pp. 155–171.
9. Silva, L.S.L. Integração de Dados Provenientes de Mapeamento Colaborativo na Cartografia de Referência do Brasil. Ph.D. Thesis, Federal University of Paraná, Curitiba, Brazil, 2022.
10. Noy, N.F.; Musen, M.A. SMART: Automated Support for Ontology Merging and Alignment. In Workshop on Knowledge Acquisition; Modeling and Management: Banff, AL, Canada, 1999.
11. Janowicz, K.; Scheider, S.; Adams, B. A Geo-semantics Flyby. In Reasoning Web: Semantic Technologies for Intelligent Data Access; Rudolph, S., Gottlob, G., Horrocks, I., Van Harmelen, F., Eds.; Reasoning Web Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 8067, pp. 230–250.
12. Machado, A.A.; Camboim, S.P. Semantic Alignment of Official and Collaborative Geospatial Data: A Case Study in Brazil. Rev. Bras. Cartogr. 2024, 76.
13. Souza, F.A.; da Silva, E.D.B.; Camboim, S.P. Explorando o Uso de Large Language Model (ChatGPT) para Alinhamento Semântico entre Esquemas Conceituais de Dados Geoespaciais. Rev. Bras. Cartogr. 2025, 77, 1.
14. Souza, F.A.; Camboim, S.P. Advancing Geospatial Data Integration: The Role of Prompt Engineering in Semantic Association with chatGPT. In Proceedings of the Free and Open-Source Software for Geospatial 2024 (FOSS4G 2024), Belém, Brazil, 2–8 December 2024; Session Academic Track, Part Full Papers. pp. 87–92. Available online: https://zenodo.org/records/14250739 (accessed on 16 September 2025).
15. Souza, F.A.; Camboim, S.P. Semantic Alignment of Geospatial Data Models using chatGPT: Preliminary studies. In Proceedings of the Brazilian Symposium on GeoInformatics; da Fonseca Feitosa, F., Vinhas, L., Eds.; National Institute for Space Research (INPE): São José dos Campos, Brazil, 2023; pp. 399–404. Available online: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85181118913&partnerID=40&md5=45de9b24f4242bc1e4306f46b84a1ed0 (accessed on 16 September 2025).
16. Novack, T.; Peters, R.; Zipf, A. Graph-Based Matching of Points-of-Interest from Collaborative Geo-Datasets. ISPRS Int. J. Geo-Inf. 2018, 7, 117.
17. Brovelli, M.A.; Minghini, M.; Molinari, M.E.; Zamboni, G. Positional Accuracy Assessment of the OpenStreetMap Buildings Layer Through Automatic Homologous Pairs Detection: The Method and a Case Study. ISPRS Int. Arch. Photogramm. 2016, 41, 615–620.
18. Al-Bakri, M.; Fairbairn, D. Assessing similarity matching for possible integration of feature classifications of geospatial data from official and informal sources. Int. J. Geogr. Inf. Sci. 2012, 26, 1437–1456.
19. ISO Standard No. 19157-1:2023; Geographic Information—Data Quality: Part 1: General Requirements. International Organization for Standardization: Geneva, Switzerland, 2023.
20. Ballatore, A.; Bertolotto, M.; Wilson, D.C. Geographic knowledge extraction and semantic similarity in OpenStreetMap. Knowl. Inf. Syst. 2013, 37, 61–81.
21. Romanenko, E.; Calvanese, D.; Guizzardi, G. Evaluating quality of ontology-driven conceptual models abstractions. Data Knowl. Eng. 2024, 153, 102342.
22. Kang, Y.; Gao, S.; Roth, R. Artificial intelligence studies in cartography: A review and synthesis of methods, applications, and ethics. Cartogr. Geogr. Inf. Sci. 2024, 51, 599–630.
23. Mirzadeh, I.; Alizadeh, K.; Shahrokhi, H.; Tuzel, O.; Bengio, S.; Farajtabar, M. GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models. arXiv 2024, arXiv:2410.05229.
24. Tucker, S. A systematic review of geospatial location embedding approaches in large language models: A path to spatial AI systems. arXiv 2024, arXiv:2401.10279.
25. Valmeekam, K.; Stechly, K.; Kambhampati, S. LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI’s o1 on PlanBench. arXiv 2024, arXiv:2409.13373.
26. Liang, B.; Wang, Y.; Tong, C. AI Reasoning in Deep Learning Era: From Symbolic AI to Neural–Symbolic AI. Mathematics 2025, 13, 1707.
27. NCB-CC/E 0001B08; Especificações Técnicas para Estruturação de Dados Geoespaciais Vetoriais (ET-EDGV 3.0). Brazilian National Cartography System: Rio de Janeiro, Brazil, 2017.
28. Borges, K.A.V.; Davis, C.A., Jr.; Laender, A.H.F. OMT-G: An Object-Oriented Data Model for Geographic Applications. GeoInformatica 2001, 5, 221–260.
29. Trant, J. Studying Social Tagging and Folksonomy: A Review and Framework. J. Digit. Inf. 2009, 10. Available online: https://jodi-ojs-tdl.tdl.org/jodi/article/view/269 (accessed on 16 September 2025).
30. Mocnik, F.-B.; Zipf, A.; Raifer, M. The OpenStreetMap folksonomy and its evolution. Geo-Spat. Inf. Sci. 2017, 20, 219–230.
31. Grinberger, A.Y.; Minghini, M.; Juhász, L.; Yeboah, G.; Mooney, P. OSM Science—The Academic Study of the OpenStreetMap Project, Data, Contributors, Community, and Applications. ISPRS Int. J. Geo-Inf. 2022, 11, 230.
32. Kaur, J.; Singh, J.; Sehra, S.S.; Rai, H.S. Systematic Literature Review of Data Quality Within OpenStreetMap. In Proceedings of the 2017 International Conference on Next Generation Computing and Information Systems (ICNGCIS), Jammu, India, 11–12 December 2017; pp. 177–182.
33. OSM. OpenStreetMap: Map Features. 2025. Available online: https://wiki.openstreetmap.org/wiki/Map_features (accessed on 30 September 2025).
34. Codescu, M.; Horsinka, G.; Kutz, O.; Mossakowski, T.; Rau, R. Osmonto—An ontology of OpenStreetMap tags. In State of the Map Europe (SOTM-EU); DFKI GmbH: Bremen, Germany, 2011; pp. 23–24.
35. Mira, J.M. Symbols versus connections: 50 years of artificial intelligence. Neurocomputing 2008, 71, 671–680.
36. Santhanam, S.; Shaikh, S. A Survey of Natural Language Generation Techniques with a Focus on Dialogue Systems—Past, Present and Future Directions. arXiv 2019, arXiv:1906.00500.
37. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All you Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
38. Jozefowicz, R.; Vinyals, O.; Schuster, M.; Shazeer, N.; Wu, Y. Exploring the Limits of Language Modeling. arXiv 2016, arXiv:1602.02410.
39. Prince, S.J.D. Understanding Deep Learning. 2023. Available online: http://udlbook.com (accessed on 25 November 2024).
40. Ji, Z.; Lee, N.; Frieske, R.; Yu, T.; Su, D.; Xu, Y.; Ishii, E.; Bang, Y.J.; Madotto, A.; Fung, P. Survey of Hallucination in Natural Language Generation. ACM Comput. Surv. 2023, 55, 1–38.
41. Berners-Lee, T.; Hendler, J.; Lassila, O. The Semantic Web: A New Form of Web Content That Is Meaningful to Computers Will Unleash a Revolution of New Possibilities; Scientific American: New York, NY, USA, 2021; Volume 284, pp. 34–43.
42. Gruber, T.R. A translation approach to portable ontology specifications. Knowl. Acquis. 1993, 5, 199–220.
43. Zhang, Y.; Wei, C.; He, Z.; Yu, W. GeoGPT: An assistant for understanding and processing geospatial tasks. Int. J. Appl. Earth Obs. Geoinf. 2024, 131, 103976.
44. Mooney, P.; Cui, W.; Guan, B.; Juhász, L. Towards Understanding the Geospatial Skills of ChatGPT Taking a Geographic Information Systems (GIS) Exam; Newsam, S., Yang, L., Mai, G., Martins, B., Lunga, D., Gao, S., Eds.; Maynooth University: Maynooth, Ireland, 2023; pp. 85–94.
45. Hitzler, P.; Sarker, M.K.; Eberhart, A. Compendium of Neurosymbolic Artificial Intelligence (V. 369); IOS Press: Amsterdam, The Netherlands, 2023; Available online: https://books.google.com/books?hl=en&lr=&id=MAjXEAAAQBAJ&oi=fnd&pg=PR1&dq=Compendium+of+Neurosymbolic+Artificial+Intelligence&ots=Bg9U4wdLwA&sig=4hE9S_yD8nV4DqP5AUMACCebdgg (accessed on 14 January 2026).
46. IBGE. Panorama do Censo 2022; IBGE: Rio de Janeiro, Brazil, 2022. Available online: https://censo2022.ibge.gov.br/panorama/?localidade=BR (accessed on 21 January 2026).
47. OSM. OpenStreetMap: tagInfo Brazil. 2026. Available online: https://taginfo.geofabrik.de/south-america:brazil/search (accessed on 27 January 2026).
48. Zhang, Q.; Zhang, T.; Zhai, J.; Fang, C.; Yu, B.; Sun, W.; Chen, Z. A Critical Review of Large Language Model on Software Engineering: An Example from ChatGPT and Automated Program Repair. arXiv 2024, arXiv:2310.08879.
49. Bravo, J.V.M. A Confiabilidade Semântica das Informações Geográficas Voluntárias como Função da Organização Mental do Conhecimento Espacial. Master’s Thesis, Universidade Federal do Paraná, Curitiba, Brazil, 2014; p. 139.
50. Jiménez-Ruiz, E.; Cuenca Grau, B. LogMap: Logic-Based and Scalable Ontology Matching. In The Semantic Web—ISWC 2011; Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2011; Volume 7031, pp. 273–288.
51. Faria, D.; Pesquita, C.; Santos, E.; Palmonari, M.; Cruz, I.F.; Couto, F.M. The AgreementMakerLight Ontology Matching System. In On the Move to Meaningful Internet Systems: OTM 2013 Conferences; Meersman, R., Panetto, H., Dillon, T., Eder, J., Bellahsene, Z., Ritter, N., De Leenheer, P., Dou, D., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 8185, pp. 527–541.
52. Strobl, C. Dimensionally Extended Nine-Intersection Model (DE-9IM). In Encyclopedia of GIS; Springer: Boston, MA, USA, 2008; pp. 240–245.

Figure 1. Flowchart of the methodology. The process begins at the green start node and ends at the red end node. Blue rounded rectangles represent the main processing steps: selecting classes and attributes from EDGV, developing a reference ontology in Protégé, and—for each LLM (grey dashed container)—selecting the model, executing the prompt, and retrieving the generated ontology, followed by analysis of the results. Document icons denote artefacts used or produced, while solid arrows indicate the primary workflow and dotted connections indicate artefact dependencies/outputs. A high-resolution version of this figure is available at: https://github.com/LabgeolivreUFPR/reasoning_ontology/ (accessed on 30 September 2025).

Figure 2. Screenshot of the class hierarchy and annotations in Protégé for the EDGV building theme. Class labels follow the original EDGV terminology in Portuguese; for clarity, key non-English terms shown in the figure are defined here: “Edificação de saúde” = healthcare building, “Nível de atenção” = level of care, and “Primário/Secundário/Terciário” = primary/secondary/tertiary (care levels). Full English translations are provided in the Supplementary Material, and a high-resolution version of this figure is available at: https://github.com/LabgeolivreUFPR/reasoning_ontology/ (accessed on 30 September 2025).

Figure 3. Visualization of the reference EDGV ontology structure using the OntoGraf plugin for Protégé. Circles denote classes and purple diamonds denote individuals (instances). Arrowed links represent structural relations (has subclass: class → class; has individual: class → instance), whereas the remaining EDGV-derived relations are shown as dashed links (see legend). A high-resolution version of this figure and the Portuguese–English label translations are provided in the Supplementary Material; available at: https://github.com/LabgeolivreUFPR/reasoning_ontology/ (accessed on 30 September 2025).

Figure 4. Ontological alignment generated by ChatGPT 4o (Traditional LLM). The alignment links EDGV classes to corresponding OSM tags through the hasOSMTag relation. In the OntoGraf visualization, yellow-circle nodes represent “regular” classes, while yellow-circle nodes with three horizontal bars indicate classes that received an EquivalentClass axiom, and orange dashed links represent hasOSMTag restrictions used in equivalent-class axioms (∃ hasOSMTag). Full English translations are provided in the Supplementary Material, and a high-resolution version of this figure is available at: https://github.com/LabgeolivreUFPR/reasoning_ontology/ (accessed on 30 September 2025).

Figure 5. Ontological alignment generated by ChatGPT o1-preview (Reasoning Model). The graph links OSM tag entities to EDGV classes via “maps to EDGV class” restrictions, formalized as EquivalentClass (∃ …) axioms. Yellow-circle nodes represent regular classes, while yellow-circle nodes with three horizontal bars indicate classes that received an EquivalentClass axiom as part of the alignment. Full English translations and a high-resolution version of this figure are provided in the Supplementary Material, available at: https://github.com/LabgeolivreUFPR/reasoning_ontology/ (accessed on 30 September 2025).

Figure 6. Ontological alignment generated by ChatGPT 5.0 (Reasoning Model). Yellow-circle nodes represent ontology classes. Blue arrowed links indicate has_subclass relations. The mapping between EDGV classes and OSM tag entities is expressed via the associated_with property in two ways: orange dashed arrows represent asserted associated_with (Domain > Range) links, whereas yellow dashed arrows represent associated_with restrictions used in EquivalentClass (∃ …) axioms. Full English translations and a high-resolution version of this figure are provided in the Supplementary Material, available at: https://github.com/LabgeolivreUFPR/reasoning_ontology/ (accessed on 30 September 2025).

Figure 11. Comparative performance of semantic recall. The visual analysis highlights the significant reduction in omission errors (grey bars) achieved by reasoning-oriented models compared to traditional architectures. A high-resolution version of this figure is available at: https://github.com/LabgeolivreUFPR/reasoning_ontology/ (accessed on 30 September 2025).

Figure 12. Analysis of reasoning depth by taxonomic level. While traditional models rely predominantly on lexical matching at the ‘Domain’ level, Large Reasoning Models (LRMs) demonstrate advanced inference capabilities, identifying a significantly higher number of associations at the abstract ‘Class’ level. A high-resolution version of this figure is available at: https://github.com/LabgeolivreUFPR/reasoning_ontology/ (accessed on 30 September 2025).

Figure 13. Complexity of semantic mappings. The chart details the cardinality of the associations identified by each model. While traditional models yield predominantly simple one-to-one (1:1) pairings, reasoning models can infer complex many-to-one (n:1) relationships, aggregating multiple specific domains into broader OSM tags (counts in Table 5). A high-resolution version of this figure is available at: https://github.com/LabgeolivreUFPR/reasoning_ontology/ (accessed on 30 September 2025).

Table 1. Classes, attributes, and domains from the Buildings Category of EDGV used.

EDGV Class	Attribute	Domain Values (Enumeration)
Building (General/Root)	Operational; Touristic; Cultural	Yes, No, Unknown
Building (General/Root)	Name; Approx. geometry; Approx. height	(Free text/Numeric)
Farming, plant extraction and/or fishing	Building type	Apiary, Aviary, Barn, Pigsty, Farm operational headquarters, Plant nursery, Aquaculture nursery
Commerce and/or Services	Finality	Commercial, Residential, Services
Commerce and/or Services	Building type	Newsstand, Bank, Shopping center, Convention/Exhibition center, Butcher shop, Pharmacy, Hotel, Convenience store, Building materials/hardware store, Furniture store, Clothing store, Public marketplace, Motel, Car Repair, Inn, Greengrocer, Restaurant, Supermarket, Dealership, Other businesses/services
Healthcare	Level of care	Primary, Secondary, Tertiary
Leisure	Building type	Amphitheater, Library, Cultural Center, Documentation center, Circus, Acoustic concert hall, Conservatory, Bandstand, Event/cultural space, Film screening space, Stadium, Gallery, Gymnasium, Museum, Fishing platform, Theater, Public Records, Various cultural facilities
Religious	Building type	Mortuary chapel, Center, Convent, Church, Mosque, Monastery, Synagogue, Temple, Afro-Brazilian religious temple (‘Terreiro’)
Religious	Specific attributes	Christian, Teaching, Religion type
Indigenous	Type	Collective, Isolated
Other Classes	(No specific domains)	Fuel station, Public toilets, Educational Building, Mineral extraction Building, Housing Construction, Residential Building, Measurement station

Source: Adapted from [27]—original in Brazilian Portuguese.

Table 2. Number of EDGV ontology classes semantically associated with OSM tags.

Metric/Category	ChatGPT 4o	ChatGPT o1-preview	ChatGPT 5.0	DeepSeek V3	DeepSeek R1	Gemini 2.0	Gemini 2.5
Unassociated classes (Omission)	83 (95.4%)	73 (83.9%)	60 (69.0%)	77 (88.5%)	53 (60.9%)	25 (28.7%)	25 (28.7%)
Total classes associated	4	14	27	10	34	62	62
– Appropriate associations (True Positive)	4 (4.6%)	10 (11.5%)	23 (26.4%)	9 (10.4%)	32 (36.8%)	49 (56.4%)	55 (63.2%)
– Inappropriate associations (False Positive)—semantically incoherent	0 (0.0%)	0 (0.0%)	1 (1.1%)	1 (1.1%)	2 (2.3%)	0 (0.0%)	2 (2.3%)
– Inappropriate associations (False Positive)—non-existent tags	0 (0.0%)	4 (4.6%)	3 (3.5%)	0 (0.0%)	0 (0.0%)	13 (14.9%)	5 (5.8%)

The rows “Unassociated classes (Omission)” and “Total classes associated” together account for all 87 EDGV classes. The rows prefixed with “–” subdivide the associated classes into appropriate and inappropriate associations; percentages are relative to the 87 classes.
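
The percentages in Table 2 follow from dividing each count by the 87 EDGV classes. As a minimal sketch, the omission rate can be recomputed from the counts, together with the share of associated classes whose association is appropriate. This second quantity is not reported in the table and is derived here only for illustration; the dictionary layout and function names are assumptions, not part of the authors' pipeline.

```python
# Counts copied from Table 2; names and layout are illustrative assumptions.
TOTAL_CLASSES = 87

# model: (appropriate, semantically incoherent, non-existent tags)
TABLE_2 = {
    "ChatGPT 4o": (4, 0, 0),
    "ChatGPT o1-preview": (10, 0, 4),
    "ChatGPT 5.0": (23, 1, 3),
    "DeepSeek V3": (9, 1, 0),
    "DeepSeek R1": (32, 2, 0),
    "Gemini 2.0": (49, 0, 13),
    "Gemini 2.5": (55, 2, 5),
}


def omission_rate(counts):
    """Fraction of the 87 EDGV classes left without any OSM association."""
    associated = sum(counts)
    return (TOTAL_CLASSES - associated) / TOTAL_CLASSES


def appropriate_share(counts):
    """Appropriate associations divided by all associated classes (TP / (TP + FP))."""
    return counts[0] / sum(counts)


for model, counts in TABLE_2.items():
    print(
        f"{model}: omission {omission_rate(counts):.1%}, "
        f"appropriate share {appropriate_share(counts):.1%}"
    )
```

Reading the two quantities side by side separates coverage from reliability: a model can associate many classes while still producing a noticeable number of non-existent tags.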

Table 3. Syntactic conformity and structural integrity of generated ontologies.

Model Family & Version	Generated Tags (n)	Structural Integrity (Hierarchy & Classes)	Naming Convention	Ontology Components Preservation (Props/Instances)	Mapping Logic (Association Method)
OpenAI
ChatGPT 4o	4	Failed. Lost hierarchy; retained only associated classes (flat structure).	key–value	Low. Lost properties and instances.	EquivalentTo + ObjectProperty (hasOSMTag)
ChatGPT o1-preview	14	Partial. Retained associated classes but lost global hierarchy.	key_value	Medium. Retained properties; lost annotations/instances.	EquivalentTo + ObjectProperty (mapsTo…)
ChatGPT 5.0	31	High. Preserved full hierarchy and original classes.	OSM_key_value	High. Retained annotations and properties. Instances unlinked.	EquivalentTo + ObjectProperty (associated_with)
DeepSeek
DeepSeek V3	11	Failed (Inverted). Created OSM superclasses containing EDGV subclasses.	key	Low. Lost all components.	SubclassOf + ObjectProperty
DeepSeek R1	34	Failed. Treated original classes as subclasses of OSM tags.	OSM_key_value	Low. Lost all components.	EquivalentTo (direct naming association)
Google
Gemini 2.0 Flash	62	Failed (Inverted). Similar to V3; inverted hierarchy structure.	OSM_Tags_key_value	Low. Lost all components.	SubclassOf (direct)
Gemini 2.5 Pro	62	High (Renamed). Preserved hierarchy but renamed original classes.	key_value = Class	High. Preserved annotations, properties, and instances.	Mixed: EquivalentTo and SubclassOf (Inconsistent)

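The first column identifies the LLM used; the leading term in the structural-integrity and component-preservation columns (e.g., Failed, Partial, High, Low) summarizes its level of success or failure in maintaining the structural integrity and components of the ontology.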
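
To make the “Mapping Logic” column concrete, the sketch below renders two of the association patterns named in Table 3 in OWL Manchester syntax: an EquivalentTo axiom with an existential restriction on an object property (the pattern listed for ChatGPT 5.0, using associated_with and the OSM_key_value naming convention) and a direct SubClassOf axiom. The class name `Pharmacy`, the tag class `OSM_amenity_pharmacy`, and the helper functions are hypothetical illustrations and are not taken from the models' actual outputs.

```python
def osm_tag_class(key: str, value: str, prefix: str = "OSM") -> str:
    """Build a tag class name following the OSM_key_value convention in Table 3."""
    return f"{prefix}_{key}_{value}"


def equivalent_to_axiom(edgv_class: str, prop: str, tag_class: str) -> str:
    """EquivalentTo + ObjectProperty pattern: the EDGV class is defined as
    'anything linked through prop to some member of tag_class'."""
    return f"Class: {edgv_class}\n    EquivalentTo: {prop} some {tag_class}"


def subclass_of_axiom(edgv_class: str, tag_class: str) -> str:
    """Direct SubClassOf pattern: the EDGV class is placed under the tag class."""
    return f"Class: {edgv_class}\n    SubClassOf: {tag_class}"


# Hypothetical example pair: an EDGV domain value and an OSM tag.
tag = osm_tag_class("amenity", "pharmacy")
print(equivalent_to_axiom("Pharmacy", "associated_with", tag))
print(subclass_of_axiom("Pharmacy", tag))
```

The two patterns have different structural consequences. The EquivalentTo form leaves the EDGV class where it is in the hierarchy and attaches a definition to it, whereas the SubClassOf form makes the OSM tag class a parent of the EDGV class, which matches the inverted hierarchies described in the table.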

Table 4. EDGV taxonomy level with semantic associations.

Taxonomic Level	ChatGPT 4o	ChatGPT o1-preview	ChatGPT 5.0	DeepSeek V3	DeepSeek R1	Gemini 2.0	Gemini 2.5
Unassociated classes (Omission)	83	73	60	77	53	25	25
Appropriate associations (Total)	4	10	23	9	32	49	55
– Superclass level	0	0	0	0	0	1	0
– Class level	3	2	7	9	5	2	10
– Attribute level	0	0	0	0	2	0	0
– Domain level	1	8	16	0	25	46	45

The rows “Unassociated classes (Omission)” and “Appropriate associations (Total)” repeat the counts from Table 2. The rows prefixed with “–” subdivide the appropriately associated classes by association level.

Table 5. Semantic associations categorized by mapping multiplicity.

Mapping Multiplicity	ChatGPT 4o	ChatGPT o1-preview	ChatGPT 5.0	DeepSeek V3	DeepSeek R1	Gemini 2.0	Gemini 2.5
Unassociated classes (Omission)	83	73	60	77	53	25	25
Appropriate associations (Total)	4	10	23	9	32	49	55
– 1:1 mapping (One-to-One)	4	10	20	6	19	45	40
– 1:n mapping (One-to-Many)	0	0	3	1	0	0	0
– n:1 mapping (Many-to-One)	0	0	0	2	13	4	15

The rows “Unassociated classes (Omission)” and “Appropriate associations (Total)” repeat the counts from Table 2. The rows prefixed with “–” subdivide the appropriately associated classes by mapping multiplicity.
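
The multiplicity rows of Table 5 should add up to the “Appropriate associations (Total)” row for every model. The following sketch checks this and expresses each multiplicity as a share of a model's appropriate associations; the data structures and function name are illustrative assumptions rather than the authors' code.

```python
# Counts copied from Table 5: (1:1, 1:n, n:1) appropriate associations per model.
TABLE_5 = {
    "ChatGPT 4o": (4, 0, 0),
    "ChatGPT o1-preview": (10, 0, 0),
    "ChatGPT 5.0": (20, 3, 0),
    "DeepSeek V3": (6, 1, 2),
    "DeepSeek R1": (19, 0, 13),
    "Gemini 2.0": (45, 0, 4),
    "Gemini 2.5": (40, 0, 15),
}

# "Appropriate associations (Total)" row of the same table.
APPROPRIATE_TOTAL = {
    "ChatGPT 4o": 4,
    "ChatGPT o1-preview": 10,
    "ChatGPT 5.0": 23,
    "DeepSeek V3": 9,
    "DeepSeek R1": 32,
    "Gemini 2.0": 49,
    "Gemini 2.5": 55,
}


def multiplicity_shares(counts):
    """Return the (1:1, 1:n, n:1) counts as fractions of their sum."""
    total = sum(counts)
    return tuple(c / total for c in counts)


for model, counts in TABLE_5.items():
    # Consistency check: the subdivision must reproduce the reported total.
    assert sum(counts) == APPROPRIATE_TOTAL[model], model
    one_to_one, one_to_many, many_to_one = multiplicity_shares(counts)
    print(f"{model}: 1:1 {one_to_one:.0%}, 1:n {one_to_many:.0%}, n:1 {many_to_one:.0%}")
```

On these counts, n:1 mappings account for 13 of 32 appropriate associations for DeepSeek R1 and 15 of 55 for Gemini 2.5.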

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

Share and Cite

MDPI and ACS Style

Souza, F.A.; Camboim, S.P. Reasoning over Heterogeneous Geospatial Schemas: Aligning Authoritative Taxonomies and Collaborative Folksonomies Through Large Language Models. ISPRS Int. J. Geo-Inf. 2026, 15, 87. https://doi.org/10.3390/ijgi15020087

