A Mobile Augmented Reality Integrating KCHDM-Based Ontologies with LLMs for Adaptive Q&A and Knowledge Testing in Urban Heritage
Abstract
1. Introduction
2. Related Work on Large Language Models in Extended Reality
3. System Design and Implementation
3.1. KCHDM Ontology-Based Cultural Heritage Knowledge Base (CHKB)
- Identity and Classification: Unique identifiers and heritage categories (e.g., heritage_id, site_name, type).
- Temporal Context: Defined timeframes and significant historical events, such as construction or reconstruction periods.
- Spatial Context: Precise geographical coordinates and hierarchical place relationships.
- Interpretive Context: Symbolic meanings, original functions, and aesthetic evaluations.
- Relational Context: Comparative analysis across different objects and events to support cross-referencing.
- Resources and Provenance: Multimodal assets (images/URLs) to ensure data traceability and rich content delivery.
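The six context groups above map onto a flat record per POI. As a sketch, one CHKB entry might look like the following Python dictionary; the field names follow the KCHDM-based schema table later in the article, while all values are placeholders rather than actual CHKB content.

```python
# Illustrative CHKB record for one POI, grouped by the six context categories.
# Field names follow the KCHDM-based schema; values are placeholders only.
poi_record = {
    # Identity and Classification
    "heritage_id": "POI-0001",                     # hypothetical identifier
    "site_name": "Hyangwonjeong Pavilion",
    "type": "pavilion",
    "category": "<heritage designation>",
    # Temporal Context
    "period": "<construction period>",
    "historical_context": "<construction / reconstruction events>",
    # Spatial Context
    "location": {"lat": 0.0, "lon": 0.0},          # placeholder coordinates
    # Interpretive Context
    "symbolism_and_function": "<original purpose and symbolic meaning>",
    "aesthetic_evaluation": "<interpretive assessment>",
    "religious_context": "<religious significance, if any>",
    "architectural_structure": "<form, dimensions, composition>",
    "material_and_technique": "<materials and construction techniques>",
    # Relational Context
    "comparative_analysis": "<cross-references to related objects/events>",
    # Resources and Provenance
    "image_resources": ["<image URL>"],
    "url_resources": "<source URL, e.g., National Heritage portal>",
}
```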
3.2. Context Encoding and Semantic Retrieval
- Spatial context: User location and heading, POI ID (nearest or selected), distance to POI
- Interaction context: Mode (QA vs. Quiz), UI state (Map vs. AR)
- Intent features: Question type such as what/when/why/how (used to shape retrieval emphasis and prompting)
- Query embedding: The user query is embedded using a multilingual embedding model (e.g., text-multilingual-embedding-002).
- Vector search: Cosine similarity is computed against the precomputed POI embedding stored in a vector index (e.g., FAISS).
- Top-k selection: The system retrieves top-k evidence units (default k = 5).
- Thresholding and metadata filtering: Retrieved items must exceed a similarity threshold (default 0.3) and may be filtered or re-ranked by metadata constraints such as POI match and spatial proximity.
- Context block construction: Retrieved fields and snippets are assembled into a structured context block that preserves provenance (POI ID, field names, and source URLs) for use as grounding evidence.
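The retrieval steps above can be sketched in a few lines of Python. This is an illustrative re-implementation using plain NumPy rather than the FAISS index used in the actual system; the defaults (k = 5, similarity threshold 0.3) are the values stated above.

```python
import numpy as np

def retrieve(query_vec, index_vecs, metadata, k=5, threshold=0.3):
    """Cosine-similarity top-k retrieval with threshold filtering.

    query_vec:  embedding of the user query
    index_vecs: precomputed POI evidence embeddings (one row per unit)
    metadata:   provenance record per evidence unit (POI ID, field, URL)
    """
    q = query_vec / np.linalg.norm(query_vec)
    m = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    sims = m @ q                     # cosine similarity per evidence unit
    order = np.argsort(-sims)[:k]    # top-k by descending similarity
    # keep only items above the similarity threshold
    return [(metadata[i], float(sims[i])) for i in order if sims[i] >= threshold]
```

Metadata-based filtering (POI match, spatial proximity) would be applied to the returned list before the context block is assembled.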
3.3. LLM Question Answering and Quiz Generation with Validation
- Prompt construction: The backend constructs a system prompt that embeds the retrieved evidence block and instructs the LLM to act as a Korean cultural heritage expert, producing concise, factual, non-speculative answers derived solely from the CHKB. Restricting the model's output to this factual grounding suppresses hallucinated external content.
- QA system prompt: “You are an expert cultural heritage guide. Users interact with real-world heritage sites through a mobile augmented reality application and pose contextual questions. Generate accurate, fact-based responses grounded strictly in the provided cultural heritage knowledge. Present explanations in a concise, accessible, and reader-friendly manner, suitable for both children and general audiences. Where applicable, incorporate historical period, material characteristics, functional aspects, and symbolic significance.”
- Parameter tuning: LangChain’s ChatGoogleGenerativeAI interface is configured with a low temperature (T = 0.1) and specific safety settings to prioritize factual precision over creative generation.
- Response handling: The generated answer is returned as text and optionally enriched with evidence identifiers for logging.
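As a sketch of the prompt-construction step, the following assembles the system instructions, a provenance-preserving evidence block, and the user question into a single grounded prompt. The exact field layout is an assumption; the paper specifies the content of the prompt, not its formatting.

```python
def build_qa_prompt(system_prompt: str, evidence: list, question: str) -> str:
    """Assemble the grounded QA prompt: system instructions, then the
    retrieved evidence with provenance (POI ID and field name), then the
    user question. Layout is illustrative, not the system's exact format."""
    lines = [system_prompt, "", "[Cultural heritage knowledge]"]
    for ev in evidence:
        # each evidence unit keeps its provenance for logging/traceability
        lines.append(f"- ({ev['poi_id']}/{ev['field']}) {ev['text']}")
    lines += ["", f"[Question] {question}"]
    return "\n".join(lines)
```

The resulting string would be passed to the LangChain `ChatGoogleGenerativeAI` interface configured with T = 0.1, as described above.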
- Prompt constraints: The quiz prompt enforces a strict JSON schema output, requiring four distinct fields: question, choices (four answer choices), correct index, and a brief explanation based on the historical context.
- Quiz system prompt: “Output only a single-line JSON following the schema {{“items”:[{{“question”:str,”choices”:[str,str,str,str],”correctIndex”:int,”explanation”:str}}]}}. Generate exactly {count} quiz items based strictly on the provided cultural heritage information. All text must be written in Korean. Each question should be fact-based, unambiguous, and suitable for stable cultural heritage knowledge. Avoid speculative or time-sensitive information.”
- Heuristic normalization: Upon receiving the LLM response, the backend executes a post-processing pipeline that removes duplicate distractors, validates the correct index range, and ensures the explanation aligns with the CHKB’s interpretive fields.
- Output format: Validated quiz items are returned as structured JSON objects ready for direct rendering in the MAR client.
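The heuristic normalization step can be sketched as follows, assuming the single-line JSON schema given in the quiz prompt. The specific checks (duplicate distractors, correctIndex range, required string fields) are those listed above; the implementation details are illustrative.

```python
import json

def validate_quiz(raw: str) -> list:
    """Parse the LLM's single-line JSON quiz output and keep only items
    that pass the normalization checks described in the paper."""
    items = json.loads(raw)["items"]
    valid = []
    for it in items:
        choices = it.get("choices", [])
        ok = (
            isinstance(it.get("question"), str)
            and len(choices) == 4
            and len(set(choices)) == 4                 # no duplicate distractors
            and isinstance(it.get("correctIndex"), int)
            and 0 <= it["correctIndex"] < 4            # index within range
            and isinstance(it.get("explanation"), str)
        )
        if ok:
            valid.append(it)
    return valid
```

Items that fail validation are dropped before the structured JSON is returned to the MAR client.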
- Format validation: The system ensures syntactic correctness of QA and quiz outputs.
- Evidence consistency checks: The system verifies that key factual elements (e.g., period, function) appear in the retrieved ontology fields.
- Abstention policy: If evidence is insufficient or inconsistent, the system returns a fallback response rather than speculative output.
- Logging: For each request, the backend logs retrieval latency, end-to-end response time, selected evidence IDs, and interaction mode to a MySQL database via SQLAlchemy.
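A minimal sketch of the abstention policy: if retrieval produced too little evidence, or expected factual terms are absent from the retrieved fields, the backend returns a fallback message instead of letting the LLM speculate. The function signature, thresholds, and fallback wording are illustrative assumptions, not the system's exact values.

```python
FALLBACK = "No verified information is available for this question."

def answer_or_abstain(evidence, min_items=1, required_terms=()):
    """Return the fallback string when evidence is insufficient or
    inconsistent; return None to signal that generation may proceed.
    Thresholds and wording are illustrative, not the paper's exact values."""
    text = " ".join(ev["text"] for ev in evidence).lower()
    if len(evidence) < min_items:
        return FALLBACK                    # too little retrieved evidence
    if any(term.lower() not in text for term in required_terms):
        return FALLBACK                    # key factual element not grounded
    return None                            # None -> proceed to LLM generation
```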
3.4. Mobile Augmented Reality Client and Interaction Flow
- User in physical space: The system continuously monitors the user’s real-world context, including location updates (via GPS, VPS, and Compass) and proximity to POIs. Users initiate interactions by selecting a mode (QA or Quiz) and submitting natural language queries through Voice or Text.
- Context encoding and reasoning: The backend translates raw spatial data and queries into a structured reasoning task. It identifies the specific POI ID based on proximity, assesses the Query Intent (e.g., historical “Why” or temporal “When”), and applies the corresponding Interaction Mode. These parameters drive the KCHDM Ontology-grounded Reasoning, where semantic retrieval (FAISS) and LLM inference (Gemini) generate contextually accurate content.
- MAR output: The generated results are delivered back to the client as Multimodal Content, including generated text, TTS audio, images, and 3D models. The system also supports interactive Quiz sessions and logs all User QA and Interaction data to the database for performance evaluation and personalized learning history.

- Spatial tracking and localization: The Location Manager continuously monitors the user’s position and orientation. The AR Session and XR Origin synchronize the real-world and virtual coordinate spaces. The Geospatial SDK provides Earth-anchored localization, combining GPS, VPS, and sensor data to precisely align cultural heritage POIs with their physical locations.
- Context and proximity control: The User Proximity Manager determines when a user enters a POI radius, triggering events such as loading the POI detail view or activating AR overlays.
- Knowledge delivery and interaction: The QA Manager handles natural language inputs (such as “When was this building built?”) via text or voice and displays LLM-generated responses. The Quiz Provider parses structured JSON data from the backend into interactive AR learning objects.
- Navigation interface: The Map Manager integrates the Google Static Map API to display 2D navigation with POI markers, while the Mode Toggle Manager enables smooth transitions between Map and AR modes.
4. Jongno-Gu Cognitive Cultural Heritage MAR Guide
4.1. Implementation of Multi-Modal Navigation and Activation
- Proximity-based discovery: Upon user authentication, the Map View displays nearby POIs based on real-time GPS coordinates, Google Elevation, and compass. As shown in the right image of Figure 3, when a user enters a 50 m radius of a POI, the User Proximity Manager automatically triggers the POI Detail View, integrating GPS and VPS data for precise site identification.
- Multisensory detail delivery: The POI Detail View serves as a pre-exploration phase, delivering historical images, descriptive text, and automated audio narration via the text-to-speech (TTS) Manager. This allows users to consume foundational knowledge hands-free while visually observing the physical artifact.
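The 50 m trigger used by the User Proximity Manager reduces to a great-circle distance test against each POI's coordinates. A Python sketch of that test (the client itself runs on a mobile AR stack, so this is illustrative only):

```python
from math import radians, sin, cos, asin, sqrt

def within_poi_radius(user, poi, radius_m=50.0):
    """Haversine distance check: True if the user is inside the POI's
    trigger radius. Coordinates are (lat, lon) tuples in degrees;
    Earth radius approximated as 6,371 km."""
    lat1, lon1, lat2, lon2 = map(radians, (*user, *poi))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a)) <= radius_m
```

In the deployed system, a positive result would trigger the POI Detail View and activate the AR overlays, with VPS refining the GPS fix for precise site identification.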

4.2. In-Site Learning via RAG-Driven Interaction
- Location-specific AR quizzes: As shown in the left image of Figure 4, the Quiz Provider overlays three sequence-based multiple-choice questions onto the live camera view. These quizzes are dynamically generated to reflect the specific historical context of the site, providing immediate feedback and explanatory clarifications for both correct and incorrect answers.
- Context-aware semantic Q&A: Users can perform open-ended historical inquiries through voice or text input. For example, a user query such as “What was this building used for?” posed at Hyangwonjeong Pavilion is processed through the RAG pipeline. The system retrieves the relevant CHKB fields—noting its function as a hexagonal pavilion for enjoying the scenic pond—and generates an expert-level response in real time.
4.3. Multi-User Exploration and Knowledge Repository
- Asynchronous knowledge sharing: The system maintains a knowledge sharing repository where users can review questions and answers previously generated by others. This feature allows the system to evolve from a static guide into a growing collective knowledge base.
- Operational resilience and perceived responsiveness: During on-site testing, the end-to-end RAG pipeline exhibited significant generation latency. The system nevertheless sustained user engagement through location-based context and multimodal feedback, which masked the processing gaps so that waiting for a response did not break the overall navigation flow.
5. System Evaluation and Results
5.1. Network Condition Evaluation for MAR-LLM Interaction
5.2. Expert-Based Qualitative Evaluation
- Perceived accuracy and reliability of LLM responses: Experts consistently reported that the system’s answers were factually accurate and trustworthy, especially for questions concerning the construction period, function, and historical role of major heritage sites (e.g., Geunjeongjeon, Gyeonghoeru). Several experts highlighted that when information was not available in the knowledge base, the system explicitly acknowledged this limitation rather than generating speculative content, which was regarded as a strong indicator of effective hallucination suppression.
- Limitations in knowledge coverage and abstention behavior: While experts appreciated the system’s conservative response strategy, they noted that the frequent use of phrases such as “information unavailable” could negatively affect user experience in real-world scenarios. In particular, general inquiries (e.g., etymological meanings, modern restoration history, or supplementary historical context) were sometimes left unanswered due to the restricted scope of the CHKB. Experts suggested that selectively incorporating verified external sources (e.g., Wikipedia or official archival materials) through controlled prompting could mitigate this issue without compromising factual reliability.
- Need for expanded and authoritative training data: A recurring theme was the necessity of broader and higher-quality historical corpora. Experts emphasized that integrating datasets from authoritative institutions (e.g., national historical archives) would reduce cases where broad questions about large sites (such as Gyeongbokgung) are unintentionally narrowed to specific substructures. This expansion was viewed as essential for improving contextual completeness and interpretive depth.
- Bridging facts to on-site experience: Beyond factual correctness, experts underscored the importance of connecting explanations to physical, on-site experience. Experts proposed a Fact → Meaning → Experience structure, where factual descriptions are followed by interpretive insights and concrete guidance on what visitors should observe in their immediate surroundings. Such “actionable interpretation” was considered crucial for maximizing the educational value of mobile AR-based heritage exploration.
- Applicability as a real-world heritage guide system: Experts generally agreed that the system is well suited for deployment as an on-site cultural heritage guide, particularly due to its location-aware interaction, flexible pacing, and ability to support personalized inquiry. Compared to traditional docent-led tours or static signboards, the system was perceived as more accommodating to individual exploration styles and diverse user needs. However, experts noted that adaptive explanation levels and improved readability (e.g., structured formatting) would further enhance usability and long-term engagement.
6. Discussion and Conclusions
6.1. Discussion
6.2. Conclusions and Future Directions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest



| Field | Role and Meaning | Connection to Ontology (KCHDM-Based Inference) |
|---|---|---|
| heritage_id | Unique Identifier for the cultural heritage item | Instance ID of Object |
| site_name | Name of the Point-of-Interest (POI) | Object name |
| type, category | Classification and categorization (e.g., National Treasure) | Properties of Object |
| period | Time frame of construction or major event (Temporal context) | Linked to Time or Event |
| location | Geographical location (Spatial context) | Place |
| historical_context | Key historical events (e.g., construction, destruction, reconstruction) | Event, Activity |
| religious_context | Presence or absence of religious significance/use | Contextual properties |
| architectural_structure | Description of the building’s form, dimensions, and composition | Physical properties |
| material_and_technique | Materials used and construction techniques employed | Physical properties |
| symbolism_and_function | Symbolic meaning and original purpose/role (e.g., banquet hall, ritual site) | Contextual properties |
| aesthetic_evaluation | Interpretation and aesthetic value assessment | Interpretive information properties |
| comparative_analysis | Information comparing the item across different periods or types | Relations between Objects and Events |
| image_resources | File path or URL for associated image content | Linked to web resource (has representation) |
| url_resources | URL of the original data source (e.g., National Heritage portal) | Linked to web resource (has URL) |
| Condition | Total Requests | Errors | Valid Requests | Error Rate (%) |
|---|---|---|---|---|
| Question Answering (WIFI) | 37 | 4 | 33 | 10.81 |
| Question Answering (4G) | 71 | 1 | 70 | 1.41 |
| Quiz (4G) | 21 | 5 | 16 | 23.81 |
| Total | 129 | 10 | 119 | 7.75 |
| Condition | Average (ms) | Std. Dev (ms) | Median (ms) | Min (ms) | Max (ms) |
|---|---|---|---|---|---|
| Question Answering (WIFI) | 17,459.2 | 1349.8 | 17,564.0 | 14,886.0 | 19,706.0 |
| Question Answering (4G) | 17,697.6 | 1957.5 | 17,779.0 | 13,874.0 | 23,578.0 |
| Quiz (4G) | 15,029.2 | 2270.5 | 15,144.0 | 11,864.0 | 19,186.0 |
| Condition | Average (ms) | Std. Dev (ms) | Median (ms) | Min (ms) | Max (ms) |
|---|---|---|---|---|---|
| Question Answering (WIFI) | 17,382.8 | 1359.9 | 17,399.0 | 14,882.0 | 19,657.0 |
| Question Answering (4G) | 17,534.9 | 1958.8 | 17,689.0 | 13,807.0 | 23,535.0 |
| Quiz (4G) | 14,997.0 | 2264.9 | 15,109.0 | 11,827.0 | 19,134.0 |
| Condition | Average (ms) | Std. Dev (ms) | Median (ms) | Min (ms) | Max (ms) |
|---|---|---|---|---|---|
| Question Answering (WIFI) | 17,181.0 | 1332.7 | 17,298.0 | 14,741.0 | 19,525.0 |
| Question Answering (4G) | 17,182.9 | 1875.1 | 17,143.5 | 13,663.0 | 23,015.0 |
| Quiz (4G) | 14,778.8 | 2258.0 | 14,908.0 | 11,409.0 | 18,936.0 |
| Category | Statement | Mean |
|---|---|---|
| Semantic Accuracy | The responses are grounded in historical and cultural facts. | 4.92 |
| | The responses contain no obvious errors or incorrect information. | 4.67 |
| | The system accurately understands and addresses the core intent of the question. | 4.67 |
| | No speculative or hallucinatory information is observed in the responses. | 4.67 |
| | From an expert perspective, the explanations are considered trustworthy. | 4.67 |
| Contextual Appropriateness | The responses are appropriate for the question context (on-site exploration and viewing situation). | 4.67 |
| | The level of explanation (difficulty and length) is appropriate for the question. | 4.58 |
| | The responses provide meaningful interpretation rather than simple factual listing. | 4.42 |
| | The explanations appropriately reflect cultural heritage context (period, function, symbolism). | 4.50 |
| | From an expert perspective, the explanations are considered reliable. | 4.67 |
| Spatial Context Awareness | The responses accurately reflect the user’s current location (POI). | 4.67 |
| | The responses consider the architectural elements currently viewed by the user. | 4.83 |
| | Direction, distance, and spatial references are consistent with the real environment. | 4.50 |
| Explanation Quality | The structure of the explanation is logically well organized. | 4.50 |
| | The explanation is delivered clearly without unnecessary repetition. | 4.42 |
| | The explanation uses expressions understandable to general users. | 4.58 |
| | The content is considered suitable for use as cultural heritage interpretation material. | 4.75 |
| Comparative Effectiveness | Accuracy of information delivery compared to traditional signboards or static explanations. | 4.50 |
| | Superiority of context-aware explanation compared to conventional methods. | 4.50 |
| | Contribution to improving understanding of cultural heritage. | 4.75 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Cho, Y.; Park, K.S. A Mobile Augmented Reality Integrating KCHDM-Based Ontologies with LLMs for Adaptive Q&A and Knowledge Testing in Urban Heritage. Electronics 2026, 15, 336. https://doi.org/10.3390/electronics15020336
