Article

A Mobile Augmented Reality Integrating KCHDM-Based Ontologies with LLMs for Adaptive Q&A and Knowledge Testing in Urban Heritage

1 Department of Computer Science, Sangmyung University, 20 Hongjimoon-2gil, Jongno-gu, Seoul 03016, Republic of Korea
2 Department of Computer Engineering, Dankook University, 152 Jukjeon-ro, Suji-gu, Yongin-si 16890, Gyeonggi-do, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2026, 15(2), 336; https://doi.org/10.3390/electronics15020336
Submission received: 28 December 2025 / Revised: 7 January 2026 / Accepted: 8 January 2026 / Published: 12 January 2026

Abstract

A cultural heritage augmented reality system overlays virtual information onto real-world heritage sites, enabling intuitive exploration and interpretation within spatial and temporal contexts. This study presents the design and implementation of a cognitive Mobile Augmented Reality (MAR) system that integrates KCHDM-based ontologies with large language models (LLMs) to facilitate intelligent exploration of urban heritage. While conventional AR guides often rely on static data, our system introduces a Semantic Retrieval-Augmented Generation (RAG) pipeline anchored in a structured knowledge base modeled after the Korean Cultural Heritage Data Model (KCHDM). This architecture enables the LLM to perform dynamic contextual reasoning, transforming heritage data into adaptive question-answering (Q&A) and interactive knowledge-testing quizzes that are precisely grounded in both historical and spatial contexts. The system supports on-site AR exploration and map-based remote exploration to ensure robust usability and precise spatial alignment of virtual content. To deliver a rich, multisensory experience, the system provides multimodal outputs, integrating text, images, 3D models, and audio narration. Furthermore, the integration of a knowledge sharing repository allows users to review and learn from others’ inquiries. This ontology-driven LLM-integrated MAR design enhances semantic accuracy and contextual relevance, demonstrating the potential of MAR for socially enriched urban heritage experiences.

1. Introduction

Unlike Virtual Reality (VR), which immerses users in a fully synthetic environment by decoupling them from physical reality, Augmented Reality (AR) enhances the perception of the real world by seamlessly integrating digital content—such as 3D models, text, and animations—into the user’s immediate physical context. In the cultural heritage domain, AR has emerged as a transformative technology that transcends traditional viewing experiences by facilitating situated learning and historical reconstruction. While early AR applications primarily focused on the visual restoration of lost artifacts, recent advancements in Mobile Augmented Reality (MAR) have enabled more sophisticated interactions. By leveraging the ubiquity of smartphones and tablets, MAR provides a platform for real-time contextual interpretation, where historical data is not merely overlaid but intelligently synchronized with the visitor’s precise location and orientation.
Consequently, MAR applications are being widely adopted in museums, tourism, and heritage education to enhance immersion, storytelling, and learning engagement. Previous studies have demonstrated that MAR promotes sensory immersion and behavioral guidance through gamification and historical reenactment [1,2,3,4,5,6,7]. Research has also shown that user satisfaction and learning outcomes can be optimized through interface designs that manage cognitive load while maximizing enjoyment [3,6]. Furthermore, to ensure seamless spatial alignment, active efforts are being made to leverage location-based and hybrid tracking technologies [4,8,9], with evidence suggesting that hybrid interfaces combining map-based navigation and AR overlays are more effective for exploration efficiency than AR-only approaches [8,9].
Despite these technological advancements, significant challenges remain. Most existing MAR-based cultural heritage systems are limited to static or pre-authored content delivery, offering little support for active user participation, adaptive interpretation, or knowledge construction [2]. These systems often rely on one-way information overlays and flat data structures, which restrict the depth of user engagement and fail to account for the complex, relational nature of historical facts. Moreover, while hybrid interfaces combining map-based navigation and AR visualization have improved exploration efficiency, their integration with intelligent, context-aware reasoning mechanisms—capable of generating dynamic, explanatory responses based on a structured knowledge base—remains largely underexplored. There is a critical need for systems that move beyond simple information provision toward a cognitive framework that can interpret and respond to the user’s spatial and semantic context in real time.
Recently, the integration of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) within Extended Reality (XR) environments has gained significant attention for its ability to enable natural language interaction, personalized feedback, and adaptive learning experiences. Current research trends indicate that the educational domain is a primary driver for LLM applications, where real-time analysis of learner input facilitates customized feedback in virtual classrooms and simulation-based training [10]. In cultural heritage contexts, LLM-based systems have been explored for interactive storytelling, conversational guidance, and contextual interpretation. However, LLMs operating over unstructured or loosely curated data are prone to factual inaccuracies and hallucination, which pose critical challenges in heritage interpretation. Retrieval-augmented generation (RAG) has emerged as a promising approach to address these limitations by grounding LLM responses in curated knowledge sources, yet its integration with mobile AR for in situ, spatially grounded heritage exploration remains limited.
To address these challenges, this study proposes a cognitive Mobile Augmented Reality (MAR) system that provides real-time, context-aware interpretation of urban cultural heritage sites. The system integrates map-based navigation and AR-based interaction with an LLM-driven semantic retrieval framework grounded in KCHDM (Korean Cultural Heritage Data Model) ontologies [11]. By leveraging the user’s geolocation and spatial context with a semantic RAG pipeline, the system transcends passive data delivery to support adaptive question answering and interactive knowledge-testing quizzes directly in situ. This approach ensures high factual reliability and explanatory depth, enabling the system to generate dynamic responses that reflect the complex relational history of artifacts. Ultimately, the proposed framework transforms cultural heritage exploration into an explanatory and exploratory learning experience, grounding intelligent, and natural language interactions within the user’s immediate physical environment.
This paper is structured as follows. Section 2 reviews the previous research on large language models (LLMs) and retrieval-augmented generation (RAG) in Extended Reality (XR) environments. Section 3 describes the design and implementation of the proposed RAG-based mobile augmented reality system for context-aware cultural heritage exploration and learning. Section 4 presents a real-world application case of the system, namely the Jongno-gu cognitive cultural heritage MAR guide. Section 5 reports the evaluation results, including quantitative performance analysis and expert-based qualitative assessment. Finally, Section 6 discusses the implications of the findings, limitations of the current approach, and directions for future research.

2. Related Work on Large Language Models in Extended Reality

Recent studies have explored the integration of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) in Extended Reality (XR) environments, including Augmented Reality (AR), Virtual Reality (VR), and Mixed Reality (MR), to enhance contextual understanding and user interaction. A comprehensive review by Tang et al. [10] reports that educational applications constitute the most active research area, in which LLMs analyze user input in real time to provide personalized feedback for learning and exploration. In cultural heritage contexts, LLMs have been shown to interpret users’ questions and actions in exhibitions, historical site tours, and museum experiences, generating natural language-based commentary and narratives that enhance situational awareness and social interaction compared to static audio guides.
Within XR-based cultural and educational experiences, LLMs have been applied to generate natural language-based commentary, narratives, and conversational interactions. For example, CulturAI [12] demonstrated personalized interpretation in MR art exhibitions by analyzing visitors’ location, gaze, and conversation history, highlighting the potential of LLMs to support situational awareness and social interaction beyond static audio guides. Similarly, MAGICAL [13] and Virtual Albert Einstein [14] illustrated how LLM-driven storytelling and virtual characters can enhance engagement in museum and heritage contexts, while also identifying challenges related to hallucination, bias, and content verification.
LLM-based conversational agents have also been integrated into immersive VR and AR systems for education and well-being. Studies such as ARELE-bot [15] and AR-chatbot [16] reported improved engagement and learning motivation through the combination of AR visualization and chatbot feedback. In VR contexts, systems such as AI-based chatbot in VR [17], VRChat NPCs [18], Digital-SAT [19], and GPT-VR Nexus [20] explored emotional support, memory-based dialog, and natural language interaction, demonstrating that LLM-powered agents can significantly enhance immersion and personalization. However, these studies also noted technical limitations, including response appropriateness, contextual grounding, and interaction robustness.
More recent work has investigated multimodal and vision-language approaches in XR. Vision LLM-based interfaces have been proposed to support more intuitive search and interaction in immersive environments [21], suggesting the potential of visual–linguistic paradigms for future XR systems. Despite these advances, many existing LLM–XR applications rely primarily on unstructured data or generic language models, which can result in factual inaccuracies and weak grounding in domain-specific knowledge—particularly problematic in cultural heritage contexts where historical accuracy and spatial specificity are essential.
In summary, prior research demonstrates the strong potential of LLMs to enhance interaction, personalization, and immersion in XR environments. However, limitations remain in ensuring semantic accuracy, contextual grounding, and reliable knowledge generation. These challenges motivate the need for ontology-driven and retrieval-supported approaches that tightly integrate structured domain knowledge with LLM-based reasoning, which the proposed system addresses in the context of mobile augmented reality for urban cultural heritage exploration.

3. System Design and Implementation

This section describes the architecture and implementation of the proposed cognitive Mobile Augmented Reality (MAR) system, designed for context-aware cultural heritage exploration and learning. As illustrated in the system data flow block diagram (Figure 1), the framework integrates a structured knowledge base with an intelligent retrieval and generation pipeline to bridge the gap between physical heritage sites and digital historical data. The Cultural Heritage Knowledge Base (CHKB) provides ontology-based, structured heritage data. A context encoding and semantic retrieval module performs semantic searches based on user context. An LLM-based question answering and quiz generation module generates adaptive Q&A and quiz content for heritage sites. The MAR client collects user interactions and spatial context and communicates with the backend, delivering multimodal outputs including text, images, audio narration, 3D models, and AR overlays. The backend service orchestrates the end-to-end pipeline by processing user queries and spatial context in real time, ensuring that the delivered multimodal content is both factually accurate and spatially relevant.

3.1. KCHDM Ontology-Based Cultural Heritage Knowledge Base (CHKB)

To ensure reliable contextual reasoning in urban heritage sites, we constructed a Cultural Heritage Knowledge Base (CHKB) structured according to the Korea Cultural Heritage Data Model (KCHDM) ontology. This design aims to ensure metadata consistency and enable semantic interoperability with ontology-based inference systems, allowing the system to explicitly represent relational keys—such as temporal period, functional purposes, related events, and spatial place relationships—that are essential for nuanced contextual reasoning.
The cultural heritage knowledge base (CHKB) was constructed by aggregating raw cultural heritage descriptions from the National Heritage Administration’s (NHA) portal [22], which were subsequently restructured into an ontology-based information model aligned with the Korea Cultural Heritage Data Model (KCHDM) [11]. While the original NHA data provided fundamental attributes (e.g., name, type, classification, and detailed description), our schema transforms these into a multifaceted JSON-LD compatible structure. As shown in Table 1, each point-of-interest (POI) is defined as a structured entity that captures the comprehensive dimensions of a heritage site—specifically answering the Who, What, When, and How of its historical existence. Each field functions as an ontology property, enabling consistent subsequent retrieval and generation.
The POI entities in the CHKB are categorized into the following functional contexts:
  • Identity and Classification: Unique identifiers and heritage categories (e.g., heritage id, site name, type).
  • Temporal Context: Defined timeframes and significant historical events, such as construction or reconstruction periods.
  • Spatial Context: Precise geographical coordinates and hierarchical place relationships.
  • Interpretive Context: Symbolic meanings, original functions, and aesthetic evaluations.
  • Relational Context: Comparative analysis across different objects and events to support cross-referencing.
  • Resources and Provenance: Multimodal assets (images/URLs) to ensure data traceability and rich content delivery.
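The functional contexts above can be sketched as a single JSON-LD-compatible POI entry. The field names and values below are illustrative assumptions (the exact KCHDM property names follow Table 1); the functional description of Hyangwonjeong Pavilion is taken from the application case in Section 4.

```python
# A minimal sketch of one CHKB point-of-interest (POI) entity.
# Field names and values are illustrative; the real schema follows
# the KCHDM ontology properties listed in Table 1.
poi = {
    "heritage_id": "JN-0001",                 # Identity and Classification
    "site_name": "Hyangwonjeong Pavilion",
    "type": "Pavilion",
    "temporal": {                             # Temporal Context
        "period": "Joseon Dynasty",
        "events": ["construction", "reconstruction"],
    },
    "spatial": {                              # Spatial Context
        "lat": 37.5796, "lon": 126.9770,
        "within": "Gyeongbokgung Palace",
    },
    "interpretive": {                         # Interpretive Context
        "function": "hexagonal pavilion for enjoying the scenic pond",
    },
    "relational": {                           # Relational Context
        "related_sites": ["Gyeongbokgung Palace"],
    },
    "resources": {                            # Resources and Provenance
        "image_url": "https://example.org/hyangwonjeong.jpg",
    },
}
```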
This ontology-driven approach is critical for adaptive Q&A and quiz generation. By constraining the LLM to verified, semantically consistent metadata units rather than relying on unstructured narrative text, the system significantly mitigates the risk of factual hallucination and ensures high-fidelity knowledge dissemination in the Mobile AR environment.

3.2. Context Encoding and Semantic Retrieval

The system performs context-aware retrieval by combining user-provided queries with real-time location tracking. Given that cultural heritage interpretation is inherently situational, retrieval is conditioned on both semantic similarity and context filters derived from user location, interaction mode, and query intent. At each interaction, the backend constructs a structured context object:
  • Spatial context: User location and heading, POI ID (nearest or selected), distance to POI
  • Interaction context: Mode (QA vs. Quiz), UI state (Map vs. AR)
  • Intent features: Question type such as what/when/why/how (used to shape retrieval emphasis and prompting)
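A structured context object of this shape might look as follows; the key names are assumptions based on the three categories above, not the backend's literal schema.

```python
# Illustrative context object assembled by the backend at each interaction.
# Key names are assumptions mirroring the spatial/interaction/intent
# categories described above.
context = {
    "spatial": {
        "location": {"lat": 37.5796, "lon": 126.9770},
        "heading_deg": 42.0,
        "poi_id": "JN-0001",      # nearest or user-selected POI
        "distance_m": 18.5,
    },
    "interaction": {"mode": "qa", "ui_state": "ar"},  # qa|quiz, map|ar
    "intent": {"question_type": "when"},              # what/when/why/how
}
```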
This context object is passed to retrieval and generation modules and is also stored for analysis (e.g., contextual relevance assessment). Then, the semantic retrieval pipeline processes as follows:
  • Query embedding: The user query is embedded using a multilingual embedding model (e.g., text-multilingual-embedding-002).
  • Vector search: Cosine similarity is computed against the precomputed POI embedding stored in a vector index (e.g., FAISS).
  • Top-k selection: The system retrieves top-k evidence units (default k = 5).
  • Thresholding and metadata filtering: Retrieved items must exceed a similarity threshold (default 0.3) and may be filtered or re-ranked by metadata constraints such as POI match and spatial proximity.
  • Context block construction: Retrieved fields and snippets are assembled into a structured context block that preserves provenance (POI ID, field names, and source URLs) for use as grounding evidence.
All retrieval parameters (embedding model, index type, similarity metric, top-k, and threshold) are configurable and explicitly reported to support reproducibility.
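The core of the retrieval step (cosine similarity, top-k selection, thresholding) can be sketched as below. The production system uses a FAISS index over text-multilingual-embedding-002 vectors; plain NumPy stands in here as a minimal, self-contained substitute.

```python
import numpy as np

def retrieve(query_vec, poi_vecs, poi_ids, k=5, threshold=0.3):
    """Cosine-similarity top-k retrieval with a similarity threshold.

    A sketch of the pipeline above: FAISS and the multilingual
    embedding model are replaced with precomputed NumPy vectors.
    Defaults match the reported parameters (k = 5, threshold = 0.3).
    """
    # Normalize so that dot products equal cosine similarities.
    q = query_vec / np.linalg.norm(query_vec)
    P = poi_vecs / np.linalg.norm(poi_vecs, axis=1, keepdims=True)
    sims = P @ q
    # Top-k by descending similarity, then drop items below threshold.
    order = np.argsort(-sims)[:k]
    return [(poi_ids[i], float(sims[i])) for i in order if sims[i] >= threshold]
```

Metadata filtering (POI match, spatial proximity) would then be applied to the returned evidence units before the context block is assembled.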

3.3. LLM Question Answering and Quiz Generation with Validation

LLM-based QA and Quiz generation is implemented using Google Gemini (gemini-2.5-pro) accessed through the LangChain framework within the Flask backend. The system distinguishes between two interaction types: question answering and quiz. To ensure high-fidelity heritage interpretation, the system employs an evidence-grounded generation strategy that distinguishes between open-ended inquiry and structured knowledge testing.
The QA module focuses on transforming retrieved CHKB metadata into natural, explanatory narratives:
  • Prompt construction: The backend constructs a system prompt that includes the retrieved evidence block and instructs the LLM to act as a Korean cultural heritage expert, providing concise, factual, non-speculative answers derived solely from the CHKB; restricting the output to this factual grounding prevents external hallucinations.
  • The QA system prompt is: “You are an expert cultural heritage guide. Users interact with real-world heritage sites through a mobile augmented reality application and pose contextual questions. Generate accurate, fact-based responses grounded strictly in the provided cultural heritage knowledge. Present explanations in a concise, accessible, and reader-friendly manner, suitable for both children and general audiences. Where applicable, incorporate historical period, material characteristics, functional aspects, and symbolic significance.”
  • Parameter tuning: LangChain’s ChatGoogleGenerativeAI interface is configured with a low temperature (T = 0.1) and specific safety settings to prioritize factual precision over creative generation.
  • Response handling: The generated answer is returned as text and optionally enriched with evidence identifiers for logging.
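The prompt-construction step can be sketched as plain message assembly. The evidence-unit keys (`poi_id`, `field`, `text`, `url`) are assumptions based on the context block described in Section 3.2; in the real system, these messages are passed to LangChain's ChatGoogleGenerativeAI with T = 0.1.

```python
# Abridged from the QA system prompt quoted above.
SYSTEM_PROMPT = (
    "You are an expert cultural heritage guide. Generate accurate, "
    "fact-based responses grounded strictly in the provided cultural "
    "heritage knowledge."
)

def build_qa_prompt(evidence_units, question):
    """Assemble the grounding evidence block and the user question into
    chat messages (a sketch; evidence-unit field names are assumed)."""
    evidence_block = "\n".join(
        f"[{u['poi_id']}::{u['field']}] {u['text']} (source: {u['url']})"
        for u in evidence_units
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user",
         "content": f"Evidence:\n{evidence_block}\n\nQuestion: {question}"},
    ]
```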
The quiz module transforms the cultural heritage data into interactive gamified learning content, enabling users to naturally learn and immerse themselves in cultural heritage within an AR environment:
  • Prompt constraints: The quiz prompt enforces a strict JSON schema output, requiring four distinct fields: question, choices (four answer choices), correct index, and a brief explanation based on the historical context.
  • The quiz system prompt is: “Output only a single-line JSON following the schema {{“items”:[{{“question”:str,”choices”:[str,str,str,str],”correctIndex”:int,”explanation”:str}}]}}. Generate exactly {count} quiz items based strictly on the provided cultural heritage information. All text must be written in Korean. Each question should be fact-based, unambiguous, and suitable for stable cultural heritage knowledge. Avoid speculative or time-sensitive information.”
  • Heuristic normalization: Upon receiving the LLM response, the backend executes a post-processing pipeline that removes duplicate distractors, validates the correct index range, and ensures the explanation aligns with the CHKB’s interpretive fields.
  • Output format: Validated quiz items are returned as structured JSON objects ready for direct rendering in the MAR client.
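The heuristic normalization step can be sketched as follows — a minimal validator that enforces the JSON schema above, rejects duplicate distractors, and checks the correct-index range. It is an assumption-level sketch of the post-processing pipeline, not the backend's literal code.

```python
import json

def normalize_quiz(raw_json: str):
    """Validate one LLM quiz response against the schema above;
    return only the items that pass all checks."""
    items = json.loads(raw_json).get("items", [])
    valid = []
    for it in items:
        choices = it.get("choices", [])
        # Reject items with a wrong choice count or duplicate distractors.
        if len(choices) != 4 or len(set(choices)) != 4:
            continue
        # Reject out-of-range correct indices and missing explanations.
        idx = it.get("correctIndex", -1)
        if not (0 <= idx < 4) or not it.get("explanation"):
            continue
        valid.append(it)
    return valid
```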
To address the reliability concerns inherent in LLM-based systems, a multi-layer validation protocol is implemented:
  • Format validation: The system ensures syntactic correctness of QA and quiz outputs.
  • Evidence consistency checks: The system verifies that key factual elements (e.g., period, function) appear in the retrieved ontology fields.
  • Abstention policy: If evidence is insufficient or inconsistent, the system returns a fallback response rather than speculative output.
  • Logging: For each request, the backend logs retrieval latency, end-to-end response time, selected evidence IDs, and interaction mode to a MySQL database via SQLAlchemy.
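The abstention policy can be sketched as a simple gate in front of generation: if retrieval yields no sufficiently similar evidence, a fallback is returned instead of a speculative answer. The fallback wording and return fields below are illustrative assumptions.

```python
# Illustrative fallback; the deployed system's wording may differ.
FALLBACK = ("Sorry, I do not have enough verified information "
            "about this site to answer that question.")

def answer_or_abstain(retrieved, generate):
    """Abstention-policy sketch: call the generator only when grounded
    evidence exists; otherwise return a fallback response.

    `retrieved` is a list of (poi_id, similarity) evidence tuples;
    `generate` is the LLM call (injected here for testability).
    """
    if not retrieved:
        return {"answer": FALLBACK, "grounded": False, "evidence_ids": []}
    return {
        "answer": generate(retrieved),
        "grounded": True,
        "evidence_ids": [poi_id for poi_id, _ in retrieved],  # for logging
    }
```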

3.4. Mobile Augmented Reality Client and Interaction Flow

Figure 2 illustrates the context-aware interaction flow of the proposed system, detailing how user inputs in physical space are transformed into intelligent MAR outputs through the context encoding and reasoning pipeline. The interaction process is categorized into three primary stages:
  • User in physical space: The system continuously monitors the user’s real-world context, including location updates (via GPS, VPS, and Compass) and proximity to POIs. Users initiate interactions by selecting a mode (QA or Quiz) and submitting natural language queries through Voice or Text.
  • Context encoding and reasoning: The backend translates raw spatial data and queries into a structured reasoning task. It identifies the specific POI ID based on proximity, assesses the Query Intent (e.g., historical “Why” or temporal “When”), and applies the corresponding Interaction Mode. These parameters drive the KCHDM Ontology-grounded Reasoning, where semantic retrieval (FAISS) and LLM inference (Gemini) generate contextually accurate content.
  • MAR output: The generated results are delivered back to the client as Multimodal Content, including generated text, TTS audio, images, and 3D models. The system also supports interactive Quiz sessions and logs all User QA and Interaction data to the database for performance evaluation and personalized learning history.
Figure 2. Context-aware interaction flow.
To execute this flow, the MAR client, developed in Unity with the ARCore Extensions for Geospatial SDK, serves as the primary interface for spatially grounded interaction.
  • Spatial tracking and localization: The Location Manager continuously monitors the user’s position and orientation. The AR Session and XR Origin synchronize the real-world and virtual coordinate spaces. The Geospatial SDK provides Earth-anchored localization, combining GPS, VPS, and sensor data to precisely align cultural heritage POIs with their physical locations.
  • Context and proximity control: The User Proximity Manager determines when a user enters a POI radius, triggering events such as loading the POI detail view or activating AR overlays.
  • Knowledge delivery and interaction: The QA Manager handles natural language inputs (such as “When was this building built?”) via text or voice and displays LLM-generated responses. The Quiz Provider parses structured JSON data from the backend into interactive AR learning objects.
  • Navigation interface: The Map Manager integrates the Google Static Map API to display 2D navigation with POI markers, while the Mode Toggle Manager enables smooth transitions between Map and AR modes.

4. Jongno-Gu Cognitive Cultural Heritage MAR Guide

This study implemented the proposed system as the Jongno-gu Cognitive Cultural Heritage MAR Guide, focusing on the historical center of Seoul, Republic of Korea. Jongno-gu was selected due to its high density of national heritage sites, including Gyeongbokgung Palace and Changdeokgung Palace. Out of 511 registered sites in the district, we curated 34 major POIs into the KCHDM-based CHKB, prioritizing sites with rich temporal and architectural metadata to validate the RAG-based Q&A and quiz generation.

4.1. Implementation of Multi-Modal Navigation and Activation

Figure 3 shows the Map View and POI Detail View interface of the Jongno-gu Cognitive Cultural Heritage MAR Guide application. The application manages exploration through a transition-based state machine that ensures seamless shifts between global navigation and local immersion.
  • Proximity-based discovery: Upon user authentication, the Map View displays nearby POIs based on real-time GPS coordinates, Google Elevation, and compass. As shown in the right image of Figure 3, when a user enters a 50 m radius of a POI, the User Proximity Manager automatically triggers the POI Detail View, integrating GPS and VPS data for precise site identification.
  • Multisensory detail delivery: The POI Detail View serves as a pre-exploration phase, delivering historical images, descriptive text, and automated audio narration via the text-to-speech (TTS) Manager. This allows users to consume foundational knowledge hands-free while visually observing the physical artifact.
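The 50 m proximity trigger reduces to a great-circle distance check. The sketch below uses the standard haversine formula; the client-side implementation in Unity (which also fuses VPS data) may differ.

```python
import math

def within_radius(user, poi, radius_m=50.0):
    """Return True when the user is inside the POI trigger radius.

    `user` and `poi` are (lat, lon) tuples in degrees. Haversine
    great-circle distance on a spherical Earth (mean radius 6371 km).
    """
    R = 6371000.0  # mean Earth radius in meters
    lat1, lon1 = map(math.radians, user)
    lat2, lon2 = map(math.radians, poi)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    distance = 2 * R * math.asin(math.sqrt(a))
    return distance <= radius_m
```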
Figure 3. Map view around the Gyeongbokgung Palace (left) and POI detail view of Changuimun Gate (middle) and Seokpajeong Annex (right).

4.2. In-Site Learning via RAG-Driven Interaction

Once at a POI, the system enables deep engagement through the AR Mode, which is activated only when the user is within the spatial threshold.
  • Location-specific AR quizzes: As shown in the left image of Figure 4, the Quiz Provider overlays three sequence-based multiple-choice questions onto the live camera view. These quizzes are dynamically generated to reflect the specific historical context of the site, providing immediate feedback and explanatory clarifications for both correct and incorrect answers.
  • Context-aware semantic Q&A: Users can perform open-ended historical inquiries through voice or text input. For example, user query such as “What was this building used for?” at Hyangwonjeong Pavilion is processed through the RAG pipeline. The system retrieves specific CHKB fields—noting its function as a hexagonal pavilion for enjoying the scenic pond—and generates an expert-level response in real-time.

4.3. Multi-User Exploration and Knowledge Repository

To validate the system’s robustness in multi-user environments, we observed concurrent users at Gyeongbokgung Palace as shown in the right image of Figure 4.
  • Asynchronous knowledge sharing: The system maintains a knowledge sharing repository where users can review questions and answers previously generated by others. This feature allows the system to evolve from a static guide into a growing collective knowledge base.
  • Operational Resilience and Perceived Responsiveness: During on-site testing, the end-to-end RAG pipeline exhibited significant generation latency. Despite this, the system maintained user engagement by leveraging location-based context and multimodal feedback, which effectively masked processing gaps and ensured that response delays did not disrupt the overall navigation flow.

5. System Evaluation and Results

5.1. Network Condition Evaluation for MAR-LLM Interaction

To assess whether network conditions significantly affect MAR–LLM interaction latency, we analyze and compare system performance across WiFi and 4G networks. Specifically, QA interactions under WiFi and 4G are contrasted with quiz-based interactions under an identical 4G condition to isolate network-related effects. Table 2 summarizes the distribution of experimental requests across all conditions, including the total number of requests, error cases, valid requests used for latency analysis, and corresponding error rates. A total of 129 requests were issued during the experiment, among which 10 requests resulted in network or system errors and were excluded from latency analysis. Consequently, the quantitative performance results reported in subsequent tables are based on 119 valid requests. Error rates varied across conditions, reflecting differences in interaction structure and response validation requirements.
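The request accounting above follows directly from the totals; as a worked example under the reported numbers (129 total, 10 errors):

```python
def summarize_requests(total, errors):
    """Compute valid-request count and percentage error rate,
    reproducing the accounting reported in Table 2."""
    valid = total - errors
    return {"valid": valid, "error_rate": round(100 * errors / total, 1)}
```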
Latency measurements were computed using only valid requests after excluding erroneous cases, and the results are summarized in Table 3, Table 4 and Table 5. Error cases were excluded from latency analysis to ensure a fair comparison of system performance under successful interaction conditions. Table 3 reports the end-to-end latency of MAR–LLM interactions under different network and interaction conditions, based exclusively on valid requests. The results show that question answering exhibits similar latency under WiFi and 4G networks, with median values of 17.56 s and 17.78 s, respectively, indicating that network conditions have a limited effect on overall response time. In contrast, quiz-based interaction under the same 4G network achieves substantially lower latency, with a median of 15.14 s, corresponding to a reduction of more than 2.6 s compared to QA over 4G. This result suggests that the observed performance improvement is not attributable to network transmission, but rather to differences in interaction design and response generation.
Table 4 summarizes network latency measured at the client side for successful MAR–LLM interactions. Consistent with the end-to-end results, QA interactions show comparable network latency under WiFi and 4G environments, with nearly identical median values. Notably, quiz-based interaction under 4G demonstrates lower network latency than QA under the same condition. Since both QA (4G) and Quiz (4G) share an identical network environment, this difference indicates that reduced network latency in the quiz condition reflects shorter overall response cycles rather than improved network quality. These findings further support the interpretation that network conditions are not the primary determinant of latency differences observed across interaction types.
Table 5 presents server-side LLM inference latency for valid requests across all conditions. The inference latency for QA remains almost identical under WiFi and 4G networks, confirming that server-side processing time is independent of client network conditions. In contrast, quiz-based interaction consistently requires less inference time, with a median latency of 14.91 s compared to approximately 17.2 s for QA. This reduction of over 2 s at the inference stage accounts for the majority of the end-to-end latency improvement observed in Table 3, demonstrating that the structured nature of quiz generation reduces the computational burden of LLM inference.
Figure 5 presents the distribution of latency measurements for three interaction conditions—QA (WiFi), QA (4G), and Quiz (4G)—based on valid requests only, using combined boxplot and violin plot visualizations. Each panel illustrates the latency decomposition of the MAR–LLM interaction pipeline, including end-to-end latency, network latency, and server-side LLM latency. Across all three latency metrics, question answering interactions under WiFi and 4G exhibit highly similar distributional patterns, with comparable medians and interquartile ranges. This observation indicates that, for QA-based interaction, network conditions have a limited impact on overall system latency, as latency is predominantly governed by server-side LLM inference. The server LLM latency distributions for QA (WiFi) and QA (4G) almost entirely overlap, reinforcing the interpretation that LLM processing time constitutes the dominant bottleneck in QA interactions.
In contrast, the quiz-based interaction under 4G shows a clear shift toward lower latency ranges across all three metrics. The violin plots reveal a denser concentration of samples at lower latency values, accompanied by a reduced median and fewer extreme delays compared to QA conditions. Notably, this reduction is consistently observed not only in end-to-end latency but also in server-side LLM latency, suggesting that the observed performance gain cannot be attributed solely to network factors. Instead, it reflects the reduced inference complexity and constrained response structure inherent to quiz-based interaction design.
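The three metrics plotted in Figure 5 can be derived from a single per-request log record. The sketch below shows one plausible decomposition; the field names (`ui_request_ms`, `http_start_ms`, `server_llm_ms`, etc.) are assumptions about the logging schema, which the paper does not specify.

```python
def decompose_latency(req_log: dict) -> dict:
    """Derive the three Figure 5 latency metrics from one request record.
    All timestamps are epoch milliseconds; key names are illustrative
    assumptions, not the system's actual logging schema."""
    # End-to-end: from the user action in the MAR client to the rendered response.
    end_to_end = req_log["ui_response_ms"] - req_log["ui_request_ms"]
    # Network: the client-observed HTTP round trip, excluding app-side overhead.
    network = req_log["http_done_ms"] - req_log["http_start_ms"]
    # Server LLM: inference duration as reported back by the server.
    server_llm = req_log["server_llm_ms"]
    return {
        "end_to_end": end_to_end,
        "network": network,
        "server_llm": server_llm,
        # Everything that is not LLM inference (client + transport + server overhead).
        "non_llm_overhead": end_to_end - server_llm,
    }
```

Under this decomposition, the small gaps between Tables 3, 4, and 5 (a few hundred milliseconds) correspond to client-side and transport overhead, while the bulk of each response cycle is LLM inference.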

5.2. Expert-Based Qualitative Evaluation

We also conducted an expert evaluation with twelve specialists in HCI, XR, and cultural heritage, most of whom possessed over a decade of professional experience (10–20+ years). In terms of primary specialization, XR (n = 5) accounted for the largest proportion, followed by HCI (n = 3), software-related fields (n = 3), and cultural heritage (n = 1), reflecting a balanced representation of both domain knowledge and interaction technology expertise. In addition, half of the evaluators (n = 6) reported prior experience in assessing digital cultural heritage content, while the remaining participants had not previously engaged in formal evaluation of such systems. This mix of evaluators with and without direct cultural heritage evaluation experience enabled the study to capture both domain-oriented expert perspectives and fresh usability-focused viewpoints, contributing to a more comprehensive assessment of the proposed system.
Table 6 summarizes the results of the expert evaluation across five assessment categories—semantic accuracy, contextual appropriateness, spatial context awareness, explanation quality, and comparative effectiveness. Each item was rated on a 5-point Likert scale (1 = strongly disagree, 5 = strongly agree), and mean scores are reported based on valid responses only. Overall, the results indicate consistently high expert ratings across all categories, reflecting strong factual reliability, contextual relevance, spatial awareness, and explanatory quality of the proposed system.
In addition to the quantitative Likert-scale results (Table 6), experts provided open-ended qualitative feedback on the strengths, limitations, and practical applicability of the proposed system. Overall, experts agreed that the system delivers highly accurate and reliable responses, particularly due to its strict grounding in structured cultural heritage knowledge. At the same time, they identified several areas for improvement related to knowledge coverage, response flexibility, and experiential guidance.
  • Perceived accuracy and reliability of LLM responses: Experts consistently reported that the system’s answers were factually accurate and trustworthy, especially for questions concerning the construction period, function, and historical role of major heritage sites (e.g., Geunjeongjeon, Gyeonghoeru). Several experts highlighted that when information was not available in the knowledge base, the system explicitly acknowledged this limitation rather than generating speculative content, which was regarded as a strong indicator of effective hallucination suppression.
  • Limitations in knowledge coverage and abstention behavior: While experts appreciated the system’s conservative response strategy, they noted that the frequent use of phrases such as “information unavailable” could negatively affect user experience in real-world scenarios. In particular, general inquiries (e.g., etymological meanings, modern restoration history, or supplementary historical context) were sometimes left unanswered due to the restricted scope of the CHKB. Experts suggested that selectively incorporating verified external sources (e.g., Wikipedia or official archival materials) through controlled prompting could mitigate this issue without compromising factual reliability.
  • Need for expanded and authoritative training data: A recurring theme was the necessity of broader and higher-quality historical corpora. Experts emphasized that integrating datasets from authoritative institutions (e.g., national historical archives) would reduce cases where broad questions about large sites (such as Gyeongbokgung) are unintentionally narrowed to specific substructures. This expansion was viewed as essential for improving contextual completeness and interpretive depth.
  • Bridging facts to on-site experience: Beyond factual correctness, experts underscored the importance of connecting explanations to physical, on-site experience. Experts proposed a Fact → Meaning → Experience structure, where factual descriptions are followed by interpretive insights and concrete guidance on what visitors should observe in their immediate surroundings. Such “actionable interpretation” was considered crucial for maximizing the educational value of mobile AR-based heritage exploration.
  • Applicability as a real-world heritage guide system: Experts generally agreed that the system is well suited for deployment as an on-site cultural heritage guide, particularly due to its location-aware interaction, flexible pacing, and ability to support personalized inquiry. Compared to traditional docent-led tours or static signboards, the system was perceived as more accommodating to individual exploration styles and diverse user needs. However, experts noted that adaptive explanation levels and improved readability (e.g., structured formatting) would further enhance usability and long-term engagement.
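The "Fact → Meaning → Experience" structure proposed by the experts can be sketched as a simple response-assembly step. The helper below is hypothetical, not part of the implemented system; it only illustrates how a grounded fact could be extended with interpretation and on-site guidance.

```python
def format_interpretation(fact: str, meaning: str, experience: str) -> str:
    """Assemble an explanation in the Fact -> Meaning -> Experience order
    suggested by the evaluators (hypothetical helper, not the system's API)."""
    return (
        f"Fact: {fact}\n"
        f"Meaning: {meaning}\n"
        f"What to look for: {experience}"
    )

text = format_interpretation(
    "Gyeonghoeru served as a venue for state banquets.",
    "Its pond setting expressed royal authority and ceremonial order.",
    "From the east bank, note the stone pillars supporting the upper floor.",
)
```

In a deployed system, the three slots would be filled from the retrieved CHKB passages (fact), the LLM's interpretive reasoning (meaning), and the user's current POI context (experience).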

6. Discussion and Conclusions

This study investigated the design and evaluation of a RAG-based MAR system for cultural heritage exploration and learning, integrating a structured cultural heritage knowledge base, map–AR hybrid navigation, and LLM-driven question answering and quiz interaction. By combining spatial context, ontology-based cultural heritage metadata, and LLM-based semantic reasoning, the proposed system moves beyond conventional AR guides that primarily deliver static, pre-scripted information.

6.1. Discussion

The evaluation results indicate that the performance and perceived effectiveness of the proposed RAG-based MAR system are shaped more by interaction design and knowledge grounding than by network conditions. Quantitative latency analysis revealed that system responsiveness is primarily constrained by server-side LLM inference rather than network transmission, with question answering interactions showing nearly identical latency distributions under WiFi and 4G environments. In contrast, quiz-based interaction achieved consistently lower latency, demonstrating that constraining generative complexity can significantly enhance system responsiveness.
Comparative analysis further confirms that while the network environment's impact on total latency was limited, the end-to-end response time, with a median exceeding 17 s, remains far above what real-time MAR applications require. According to prior XR studies, response times within 1–3 s are ideal for maintaining interaction flow, whereas delays exceeding 10 s typically disrupt immersion and user engagement [23]. Thus, the recorded latency highlights a significant gap between current RAG implementations and the requirements of seamless real-time interaction.
Despite these technical constraints, the system sustained immersion by leveraging physical movement and continuous location-based feedback to mask these processing gaps. By providing persistent sensory cues during navigation, the system prevented the perception of a stalled state, effectively maintaining user engagement throughout the temporal delay. Consequently, this work serves as a research prototype to validate the methodological feasibility of location-aware RAG architectures rather than a latency-optimized production service.
Expert-based quantitative evaluation further confirmed the system’s strengths, with consistently high ratings across semantic accuracy, contextual appropriateness, spatial awareness, explanation quality, and comparative effectiveness, highlighting the reliability of RAG-grounded responses and effective hallucination suppression. At the same time, qualitative expert commentary identified a key trade-off: while strict reliance on curated knowledge enhances trustworthiness, it can limit information coverage and lead to frequent abstention responses when expected background knowledge is missing. Experts also emphasized the need to extend explanations beyond factual delivery toward a “Fact → Meaning → Experience” structure, as well as to support adaptive explanation levels and expand authoritative cultural heritage corpora.
It is important to note that the present evaluation constitutes an expert-driven formative assessment, rather than a comprehensive user study. Although the participating experts provided valuable insights into semantic accuracy, contextual appropriateness, and system reliability, this evaluation does not directly measure usability, learning outcomes, or behavioral changes among end users such as tourists, students, or the general public. Consequently, our claims regarding “enhanced learning” or “social engagement” should be interpreted as design implications and preliminary potential, not as empirically validated outcomes. Controlled user studies and in situ field trials are required to rigorously assess learning effectiveness, engagement dynamics, and long-term usability, and are identified as a critical direction for future work.

6.2. Conclusions and Future Directions

Overall, this research demonstrates that a RAG-based MAR system can provide accurate, context-aware, and engaging cultural heritage interpretation when grounded in structured knowledge and spatial context. The proposed system transforms cultural heritage exploration from passive information consumption into an interactive, conversational, and exploratory learning experience, supporting both individual inquiry and active knowledge verification through collective knowledge sharing in real-world environments.
At the same time, the study reveals important design implications for future RAG-based MAR systems. First, latency should be addressed not only through infrastructure-level optimization but also through interaction design strategies that reduce unnecessary generative complexity. Second, hallucination suppression through strict grounding is essential for cultural heritage applications, where factual reliability is critical. However, the tension between factual rigor and information coverage remains a major practical hurdle. While abstention responses such as "information unavailable" enhance trustworthiness, they can diminish user satisfaction when commonly expected background information is absent from the knowledge base. This suggests the need for carefully controlled integration of verified external sources and transparent citation mechanisms.
Finally, expert commentary emphasized the importance of moving beyond fact delivery toward experience-oriented interpretation. In particular, experts suggested that explanations should adopt an actionable “Fact → Meaning → Experience” structure, in which historical facts are contextualized with interpretive meaning and explicitly connected to what users should observe or attend to in the physical environment. Such an approach can better bridge semantic understanding and embodied spatial experience, which is central to effective cultural heritage interpretation in mobile AR.
Future work will focus on improving system performance and practical scalability. To address latency constraints in real-time MAR interaction, we will investigate response optimization strategies such as semantic caching, embedding optimization, and the adoption of lightweight or task-specialized LLMs. To overcome limitations in knowledge coverage while preserving factual reliability, we plan to employ a tiered knowledge sourcing approach in which the ontology-based CHKB remains the primary trusted source, while selectively incorporating verified external repositories with explicit provenance control. Incremental CHKB expansion supported by expert-in-the-loop validation will further ensure long-term accuracy and maintainability.
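The tiered knowledge sourcing strategy described above can be sketched as a retrieval fallback with explicit provenance tags. The function name and retriever interfaces below are assumptions for illustration, not the system's actual API.

```python
def retrieve_with_provenance(query, chkb_search, external_search=None):
    """Tiered retrieval sketch (illustrative, not the implemented system).
    Tier 1: the ontology-based CHKB is always consulted first as the
    primary trusted source.
    Tier 2: verified external repositories serve only as a fallback, and
    their passages carry a provenance tag for transparent citation.
    Returns (passages, provenance); provenance is None on abstention."""
    passages = chkb_search(query)
    if passages:
        return passages, "CHKB"
    if external_search is not None:
        passages = external_search(query)
        if passages:
            return passages, "external-verified"
    # Abstain instead of letting the LLM speculate.
    return [], None
```

Because every answer carries a provenance tag, the MAR client can render external-sourced content with an explicit citation, preserving trustworthiness while reducing "information unavailable" responses.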
Furthermore, building upon this expert-driven formative assessment, we will conduct controlled field trials with diverse end-user groups, including tourists and students. These studies will rigorously evaluate usability, learning effectiveness, and social engagement dynamics in authentic settings. Such evaluations are expected to provide deeper insight into how context-aware MAR-LLM interaction influences sustained use and learning behavior in authentic cultural heritage settings, and to inform design guidelines for deploying AI-augmented XR platforms in real-world heritage tourism and education contexts.

Author Contributions

All authors contributed equally to the manuscript—literature search, figures, system design, development, and system evaluation, as well as article writing and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets generated and analyzed during this study (including system log data and expert evaluation results) are not publicly available due to privacy and ethical considerations, but can be made available by the corresponding author upon reasonable request. Source codes used in this study can also be provided upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, C.; Zhu, Y. A survey of museum applied research based on mobile augmented reality. Comput. Intell. Neurosci. 2022, 2022, 2926241. [Google Scholar] [CrossRef] [PubMed]
  2. Martínez-Carrillo, L.; del Moral-Pérez, M.E.; Villalustre-Martínez, L. Augmented reality to foster participatory cultural heritage activities: A systematic review. Heritage 2023, 6, 2161–2187. [Google Scholar] [CrossRef]
  3. De Paolis, L.T.; Gatto, C.; Corchia, L.; De Luca, V. Usability, user experience and mental workload in a mobile Augmented Reality application for digital storytelling in cultural heritage. Virtual Real. 2023, 27, 1117–1143. [Google Scholar] [CrossRef]
  4. Sprung, G.; Haxha, A. iFAR: MobileAR for Cultural Heritage. XCR 2020, 2618, 1–4. [Google Scholar]
  5. Veas, E.; Kleftodimos, A.; Evagelou, A.; Triantafyllidou, A.; Grigoriou, M.; Lappas, G. Location-based augmented reality for cultural heritage communication and education: The Doltso district application. Sensors 2023, 23, 4963. [Google Scholar] [CrossRef] [PubMed]
  6. Xu, N.; Li, Y.; Lin, J.; Yu, L.; Liang, H.N. User retention of mobile augmented reality for cultural heritage learning. In Proceedings of the 2022 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), Singapore, 17–21 October 2022; pp. 447–452. [Google Scholar]
  7. Pollalis, C.; Minor, E.J.; Westendorf, L.; Fahnbulleh, W.; Virgilio, I.; Kun, A.L.; Shaer, O. Evaluating learning with tangible and virtual representations of archaeological artifacts. In Proceedings of the Twelfth International Conference on Tangible, Embedded, and Embodied Interaction, Stockholm, Sweden, 18–21 March 2018; pp. 626–637. [Google Scholar]
  8. Dünser, A.; Billinghurst, M.; Wen, J.; Lehtinen, V.; Nurminen, A. Exploring the use of handheld AR for outdoor navigation. Comput. Graph. 2012, 36, 1084–1095. [Google Scholar] [CrossRef]
  9. Matviienko, A.; Günther, S.; Ritzenhofen, S.; Mühlhäuser, M. AR sightseeing: Comparing information placements at outdoor historical heritage sites using augmented reality. Proc. ACM Hum.-Comput. Interact. 2022, 6, 1–17. [Google Scholar] [CrossRef]
  10. Tang, Y.; Situ, J.; Cui, A.Y.; Wu, M.; Huang, Y. LLM Integration in Extended Reality: A Comprehensive Review of Current Trends, Challenges, and Future Perspectives. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 26 April–1 May 2025; pp. 1–24. [Google Scholar]
  11. Kim, S.; Ahn, J.; Suh, J.; Kim, H.; Kim, J. Towards a semantic data infrastructure for heterogeneous Cultural Heritage data-challenges of Korean Cultural Heritage Data Model (KCHDM). In Proceedings of the 2015 Digital Heritage, Granada, Spain, 28 September–2 October 2015; Volume 2, pp. 275–282. [Google Scholar]
  12. Constantinides, N.; Constantinides, A.; Koukopoulos, D.; Fidas, C.; Belk, M. CulturAI: Exploring mixed reality art exhibitions with large language models for personalized immersive experiences. In Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization, Cagliari, Italy, 1–4 July 2024; pp. 102–105. [Google Scholar]
  13. Trichopoulos, G. Large language models for cultural heritage. In Proceedings of the 2nd International Conference of the ACM Greek SIGCHI Chapter, Athens, Greece, 27–28 September 2023; pp. 1–5. [Google Scholar]
  14. Shoa, A.; Oliva, R.; Slater, M.; Friedman, D. Sushi with Einstein: Enhancing hybrid live events with LLM-based virtual humans. In Proceedings of the 23rd ACM International Conference on Intelligent Virtual Agents, Würzburg, Germany, 19–22 September 2023; pp. 1–6. [Google Scholar]
  15. Hajahmadi, S.; Clementi, L.; López, M.D.J.; Marfia, G. Arele-bot: Inclusive learning of spanish as a foreign language through a mobile app integrating augmented reality and chatgpt. In Proceedings of the 2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), Orlando, FL, USA, 16–21 March 2024; pp. 335–340. [Google Scholar]
  16. Chuang, C.H.; Lo, J.H.; Wu, Y.K. Integrating chatbot and augmented reality technology into biology learning during COVID-19. Electronics 2023, 12, 222. [Google Scholar] [CrossRef]
  17. Ren, X. Artificial intelligence and depression: How AI powered chatbots in virtual reality games may reduce anxiety and depression levels. J. Artif. Intell. Pract. 2020, 3, 48–58. [Google Scholar]
  18. Wan, H.; Zhang, J.; Suria, A.A.; Yao, B.; Wang, D.; Coady, Y.; Prpa, M. Building llm-based ai agents in social virtual reality. In Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 11–16 May 2024; pp. 1–7. [Google Scholar]
  19. Matsumoto, A.; Kamita, T.; Tawaratsumida, Y.; Nakamura, A.; Fukuchimoto, H.; Mitamura, Y.; Inoue, T. Combined Use of Virtual Reality and a Chatbot Reduces Emotional Stress More Than Using Them Separately. J. Univers. Comput. Sci. 2021, 27, 1371–1389. [Google Scholar] [CrossRef]
  20. Chen, J.; Lan, T.; Li, B. GPT-VR Nexus: Chatgpt-powered immersive virtual reality experience. In Proceedings of the 2024 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), Orlando, FL, USA, 16–21 March 2024; pp. 1–2. [Google Scholar]
  21. Liu, C.; Cheung, C.S.; Xu, M.; Zhang, Z.; Su, M.; Fan, M. Toward Facilitating Search in VR With the Assistance of Vision Large Language Models. In Proceedings of the 30th ACM Symposium on Virtual Reality Software and Technology, Trier, Germany, 9–11 October 2024; pp. 1–14. [Google Scholar]
  22. National Heritage Administration Search Database. Available online: https://www.heritage.go.kr/heri/cul/culSelectView.do?pageNo=1_1_1_0 (accessed on 1 December 2025).
  23. Maslych, M.; Katebi, M.; Lee, C.; Hmaiti, Y.; Ghasemaghaei, A.; Pumarada, C.; Palmer, J.; Martinez, E.S.; Emporio, M.; Snipes, W.; et al. Mitigating Response Delays in Free-Form Conversations with LLM-powered Intelligent Virtual Agents. In Proceedings of the 7th ACM Conference on Conversational User Interfaces (CUI), Waterloo, ON, Canada, 8–10 July 2025; pp. 1–15. [Google Scholar]
Figure 1. Overall system data flow block diagram of the proposed RAG-based MAR.
Figure 4. Quiz view (left), and QA view (right).
Figure 5. Box and violin plots of latency decomposition for MAR-LLM interaction.
Table 1. Structured Data Schema for Point-of-Interest (POI).

| Field | Role and Meaning | Connection to Ontology (KCHDM-Based Inference) |
|---|---|---|
| heritage_id | Unique identifier for the cultural heritage item | Instance ID of Object |
| site_name | Name of the Point-of-Interest (POI) | Object name |
| type, category | Classification and categorization (e.g., National Treasure) | Properties of Object |
| period | Time frame of construction or major event (temporal context) | Linked to Time or Event |
| location | Geographical location (spatial context) | Place |
| historical_context | Key historical events (e.g., construction, destruction, reconstruction) | Event, Activity |
| religious_context | Presence or absence of religious significance/use | Contextual properties |
| architectural_structure | Description of the building's form, dimensions, and composition | Physical properties |
| material_and_technique | Materials used and construction techniques employed | Physical properties |
| symbolism_and_function | Symbolic meaning and original purpose/role (e.g., banquet hall, ritual site) | Contextual properties |
| aesthetic_evaluation | Interpretation and aesthetic value assessment | Interpretive information properties |
| comparative_analysis | Information comparing the item across different periods or types | Relations between Objects and Events |
| image_resources | File path or URL for associated image content | Linked to web resource (has representation) |
| url_resources | URL of the original data source (e.g., National Heritage portal) | Linked to web resource (has URL) |
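An illustrative POI record following the Table 1 schema might look as follows. All values are abbreviated examples for Gyeonghoeru, not verbatim CHKB content; the identifier, file paths, and coordinates are hypothetical or approximate.

```python
# Illustrative POI record per the Table 1 schema (not actual CHKB data).
poi_record = {
    "heritage_id": "NT-0224",            # hypothetical instance ID of Object
    "site_name": "Gyeonghoeru",          # Object name
    "type": "National Treasure",         # Properties of Object
    "category": "Palace pavilion",
    "period": "Joseon Dynasty",          # temporal context (Time / Event)
    "location": {"lat": 37.5797, "lng": 126.9754},  # approximate (Place)
    "historical_context": "Destroyed in 1592; reconstructed in 1867.",
    "religious_context": "None",
    "architectural_structure": "Two-story pavilion raised on stone pillars.",
    "material_and_technique": "Wooden superstructure on granite columns.",
    "symbolism_and_function": "Royal banquet hall for state occasions.",
    "aesthetic_evaluation": "Noted for its reflection over the palace pond.",
    "comparative_analysis": "Among the largest elevated pavilions of the Joseon palaces.",
    "image_resources": ["images/gyeonghoeru_01.jpg"],     # has representation
    "url_resources": ["https://www.heritage.go.kr/..."],  # has URL (truncated)
}
```

A record in this shape can be serialized from the KCHDM-based ontology and passed directly into the RAG pipeline as retrieval context.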
Table 2. Experimental Request Summary.

| Condition | Total Requests | Errors | Valid Requests | Error Rate (%) |
|---|---|---|---|---|
| Question Answering (WiFi) | 37 | 4 | 33 | 10.81 |
| Question Answering (4G) | 71 | 1 | 70 | 1.41 |
| Quiz (4G) | 21 | 5 | 16 | 23.81 |
| Total | 129 | 10 | 119 | 7.75 |
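The error rates in Table 2 follow directly from the request counts; the sketch below reproduces them as a consistency check (for verification only, not part of the system).

```python
def error_rate_pct(total: int, errors: int) -> float:
    """Error rate as reported in Table 2, rounded to two decimals."""
    return round(100.0 * errors / total, 2)

# (total, errors, valid) per condition, as in Table 2
conditions = {
    "QA (WiFi)": (37, 4, 33),
    "QA (4G)":   (71, 1, 70),
    "Quiz (4G)": (21, 5, 16),
}
# Each row's counts must be internally consistent.
for total, errors, valid in conditions.values():
    assert total == errors + valid
```

Summing the three conditions also reproduces the Total row: 129 requests, 10 errors, 119 valid, 7.75% error rate.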
Table 3. MAR End-to-End Latency (Errors Excluded) (ms).

| Condition | Average | Std. Dev | Median | Min | Max |
|---|---|---|---|---|---|
| Question Answering (WiFi) | 17,459.2 | 1349.8 | 17,564.0 | 14,886.0 | 19,706.0 |
| Question Answering (4G) | 17,697.6 | 1957.5 | 17,779.0 | 13,874.0 | 23,578.0 |
| Quiz (4G) | 15,029.2 | 2270.5 | 15,144.0 | 11,864.0 | 19,186.0 |
Table 4. MAR Network Latency (Errors Excluded) (ms).

| Condition | Average | Std. Dev | Median | Min | Max |
|---|---|---|---|---|---|
| Question Answering (WiFi) | 17,382.8 | 1359.9 | 17,399.0 | 14,882.0 | 19,657.0 |
| Question Answering (4G) | 17,534.9 | 1958.8 | 17,689.0 | 13,807.0 | 23,535.0 |
| Quiz (4G) | 14,997.0 | 2264.9 | 15,109.0 | 11,827.0 | 19,134.0 |
Table 5. Server LLM Latency (Errors Excluded) (ms).

| Condition | Average | Std. Dev | Median | Min | Max |
|---|---|---|---|---|---|
| Question Answering (WiFi) | 17,181.0 | 1332.7 | 17,298.0 | 14,741.0 | 19,525.0 |
| Question Answering (4G) | 17,182.9 | 1875.1 | 17,143.5 | 13,663.0 | 23,015.0 |
| Quiz (4G) | 14,778.8 | 2258.0 | 14,908.0 | 11,409.0 | 18,936.0 |
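The descriptive statistics reported in Tables 3–5 (average, standard deviation, median, min, max) can be computed per condition from the valid-request samples with the Python standard library. The sample values below are illustrative only, not the study's raw measurements.

```python
import statistics

def summarize_latency(samples_ms: list) -> dict:
    """Compute the five columns of Tables 3-5 for one latency metric
    of one condition (all values in milliseconds)."""
    return {
        "average": round(statistics.mean(samples_ms), 1),
        "std_dev": round(statistics.stdev(samples_ms), 1),  # sample std. dev.
        "median": statistics.median(samples_ms),
        "min": min(samples_ms),
        "max": max(samples_ms),
    }

# Illustrative samples only (not the study's data):
demo = summarize_latency([11864.0, 15144.0, 19186.0])
```

Note that `statistics.stdev` computes the sample (n − 1) standard deviation; whether the paper's "Std. Dev" column uses the sample or population estimator is not stated, so this choice is an assumption.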
Table 6. Results of Expert Evaluation using a 5-point Likert Scale.

| Category | Statement | Mean |
|---|---|---|
| Semantic Accuracy | The responses are grounded in historical and cultural facts. | 4.92 |
| | The responses contain no obvious errors or incorrect information. | 4.67 |
| | The system accurately understands and addresses the core intent of the question. | 4.67 |
| | No speculative or hallucinatory information is observed in the responses. | 4.67 |
| | From an expert perspective, the explanations are considered trustworthy. | 4.67 |
| Contextual Appropriateness | The responses are appropriate for the question context (on-site exploration and viewing situation). | 4.67 |
| | The level of explanation (difficulty and length) is appropriate for the question. | 4.58 |
| | The responses provide meaningful interpretation rather than simple factual listing. | 4.42 |
| | The explanations appropriately reflect cultural heritage context (period, function, symbolism). | 4.50 |
| | From an expert perspective, the explanations are considered reliable. | 4.67 |
| Spatial Context Awareness | The responses accurately reflect the user’s current location (POI). | 4.67 |
| | The responses consider the architectural elements currently viewed by the user. | 4.83 |
| | Direction, distance, and spatial references are consistent with the real environment. | 4.50 |
| Explanation Quality | The structure of the explanation is logically well organized. | 4.50 |
| | The explanation is delivered clearly without unnecessary repetition. | 4.42 |
| | The explanation uses expressions understandable to general users. | 4.58 |
| | The content is considered suitable for use as cultural heritage interpretation material. | 4.75 |
| Comparative Effectiveness | Accuracy of information delivery compared to traditional signboards or static explanations. | 4.50 |
| | Superiority of context-aware explanation compared to conventional methods. | 4.50 |
| | Contribution to improving understanding of cultural heritage. | 4.75 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cho, Y.; Park, K.S. A Mobile Augmented Reality Integrating KCHDM-Based Ontologies with LLMs for Adaptive Q&A and Knowledge Testing in Urban Heritage. Electronics 2026, 15, 336. https://doi.org/10.3390/electronics15020336
