Natural Interaction in Virtual Heritage: Enhancing User Experience with Large Language Models

Sánchez-Berriel, Isabel; Pérez-Nava, Fernando; Pérez-Rosario, Lucas

doi:10.3390/electronics14122478

Open AccessArticle

Natural Interaction in Virtual Heritage: Enhancing User Experience with Large Language Models

by

Isabel Sánchez-Berriel

^1,*

,

Fernando Pérez-Nava

¹

and

Lucas Pérez-Rosario

²

¹

Department of Ingeniería Informática y de Sistemas, Escuela Técnica Superior de Ingeniería Informatica, Universidad de La Laguna, 38200 San Cristóbal de La Laguna, Spain

²

Escuela Técnica Superior de Ingeniería Informática, Universidad de La Laguna, 38200 San Cristóbal de La Laguna, Spain

^*

Author to whom correspondence should be addressed.

Electronics 2025, 14(12), 2478; https://doi.org/10.3390/electronics14122478

Submission received: 1 April 2025 / Revised: 6 June 2025 / Accepted: 10 June 2025 / Published: 18 June 2025

(This article belongs to the Special Issue Recent Advances in Virtual Reality and Computer Vision Based on Deep Learning)

Download

Browse Figures

Versions Notes

Abstract

In recent years, Virtual Reality (VR) has emerged as a powerful tool for disseminating Cultural Heritage (CH), often incorporating Virtual Humans (VHs) to guide users through historical recreations. The advent of Large Language Models (LLMs) now enables natural, unscripted communication with these VHs, even on limited devices. This paper details a natural interaction system for VHs within a VR application of San Cristóbal de La Laguna, a UNESCO World Heritage Site. Our system integrates Speech-to-Text, LLM-based dialogue generation, and Text-to-Speech synthesis. Adhering to user-centered design (UCD) principles, we conducted two studies: a preliminary study revealing user interest in historically adapted language, and a qualitative test that identified key user experience improvements, such as incorporating feedback mechanisms and gender selection for VHs. The project successfully developed a prioritized user experience, focusing on usability evaluation, immersion, and dialogue quality. We propose a generalist methodology and recommendations for integrating unscripted VH dialogue in VR. However, limitations include dialogue generation latency and reduced quality in non-English languages. While a formative usability test evaluated the process, the small sample size restricts broad generalizations about user behavior.

Keywords:

natural interaction; virtual humans; user experience; virtual reality; prompt engineering; Large Language Models; cultural heritage

1. Introduction

There is a growing commitment to preserving and restoring Cultural Heritage (CH) by promoting improved management, study, and dissemination. Given that the passage of time sometimes irreversibly degrades CH, it is essential to recover the original memory of historic buildings as well as urban and landscape environments. Information and communication technologies are fundamental for the preservation of CH; their use has been proven to be essential in the conservation, preservation, and dissemination of Cultural Heritage. The types of technologies that have been used for this purpose globally between 2018 and 2022 are mainly 3D digital technologies, including AR/VR (immersive and non-immersive) [1]. From the point of view of CH dissemination, the use of these technologies has many advantages, including the ability to create digital replicas that can be used for educational and exhibition purposes [2]. The widespread use of mobile devices has influenced the dissemination of knowledge through Virtual Reality, augmented reality, and mixed reality, allowing users to enjoy these works in a much more immersive way. The computational capabilities of modern devices enable the creation, exploration, and interaction with virtual content. With these materials, the public gains access to otherwise inaccessible environments. Users can interact with the virtual world, opening new ways of experiencing CH.

In 3D virtual spaces, there are more possibilities for interaction because users can explore, move, change their point of view, or even manipulate objects [3]. In VR experiences, users travel to different landscapes, even to past eras, creating a feeling of being part of the virtual environment. Users perceive greater immersion when additional audio or tactile stimuli are incorporated [4]. Furthermore, this type of application in CH has been used together with storytelling, increasing immersion and enhancing heritage appreciation [5]. The use of digital storytelling combined with the virtual reconstruction of CH reaches a broader audience than conventional dissemination tools, improving the communication of cultural content and increasing interest in learning about these environments [6]. Digital resources, such as 3D reconstructions of cultural monuments and sites, must be presented appropriately to provide a meaningful experience. In this context, exploring new methods to increase immersion, the sense of presence, and the user’s ability to act within the VR or AR experience is particularly interesting. In this regard, the use of virtual characters recreating the past enhances the believability of the representation. These characters can be interactive or non-interactive, acting as virtual assistants or narrators and playing an essential role in guiding the user through the transmission of knowledge [7].

One of the most significant challenges in further enhancing immersion lies in the interaction with virtual narrators within these virtual environments. Their dialogue is often predictable, repetitive, or unnatural, which affects the player’s experience and limits the sense of immersion. The ability to interact with the Virtual Human (VH) in natural language through verbal communication enhances immersion and increases user engagement [8]. Nowadays, language models have reached a level of maturity that enables the use of these technologies to incorporate virtual characters capable of interacting naturally with users. A generative-based chatbot is an approach for generating answers in the interaction between the player and the Virtual Human.

This work focuses on the integration of Virtual Humans within the 3D reconstruction of the World Heritage Site San Cristóbal de La Laguna in the Canary Islands. Among the key factors contributing to its designation are its status as the first unfortified Spanish city and the largely unaltered nature of its urban layout, as evident in the map created by engineer Leonardo Torriani in 1588. Within a World Heritage context like La Laguna, heritage is an exceptionally significant cultural, social, and economic resource; therefore, techniques are required to facilitate a better understanding of its historical evolution and to promote its dissemination. The 3D model, based on Torriani’s map, balances fidelity to the original with the integration of supplementary historical sources, providing a reconstruction of the city in the 16th century. This model has enabled the development of Virtual Reality (VR) applications for immersive exploration of the ancient city and several serious VR video games incorporating narratives based on its history. All interactions within these applications follow conventional approaches without the use of generative AI. Based on prior research findings, this study aims to investigate the potential application of generative Artificial Intelligence (AI) within VR environments, following this research trajectory.

Different experiments are conducted to evaluate how the responsiveness and adaptability closest to that of a human interlocutor are affected, as well as the influence of communication through a multimodal interface that allows users to communicate with the characters through their microphones. In the first phase, a comparison is made between the responses generated using LLMs, providing different contexts. In the second phase, the application is modified by incorporating voice interaction. Static characters of the environment will be brought to life through user interactions and relevant responses based on the historical context. This will involve designing and developing the necessary interactions for the VHs to respond coherently and contextually, in line with the historical theme of the project. To carry out this study, an application is developed in which conventional interaction is replaced by an advanced system that simulates realistic and fluid conversational interactions between users and the VHs in the virtual scenario corresponding to the plot of Plaza del Adelantado. Finally, user tests are carried out.

The main contributions of our work are as follows:

The development of an enhanced immersive VR application for the dissemination of CH in San Cristóbal de La Laguna, featuring a natural interaction system for communicating with the VH in the role of a guide.
Improved engagement, presence, and user participation in the VR application.
The implementation of pre- and post-development studies to incorporate user preferences into the interaction system design, thereby enhancing user engagement and attention.

The remainder of this work is structured as follows: Section 2 offers an overview of related works in the field of natural interaction in VR applications. Section 3 details the characteristics of the 3D reconstruction and historical sources used in this research. Section 4 presents the methodology employed in adapting the VR environment to recreate San Cristóbal de La Laguna utilizing Large Language Model (LLM)-based interaction, including a comprehensive discussion of each step. Section 5 presents the results, while Section 6 discusses the findings, and proposes potential future research directions. Finally, Section 7 summarizes the conclusions of this study.

2. Related Works

Recent work [9] explores the integration of LLM-generated dialogue in video games. From an interaction perspective, insights in this area are particularly relevant to this work. The interaction with Non-Player Characters (NPCs) in video games presents the same challenges as interacting with VHs in 3D CH reconstructions.

NPCs in games constructed using LLMs demonstrate great ability when interacting with human players. Dynamic NPC dialogue generation can be developed using a generative-based chatbot. If the NPCs are given control similar to a chatbot, the player’s experience will be more immersive [10].

The study in [11] evaluates the use of LLMs in a commercial video game for generating NPC responses. The analysis is conducted through player comments on Discord. A guideline is established for the use of LLMs in dialogue generation to mitigate the main issues identified, which are categorized as hallucinations, content quality, memory, style, directionality and conversation tracking, NPC evasiveness, social norms, input modality, design changes, and handling of technical failures. In summary, discrepancies between LLM-generated content and the player’s experience should be avoided, while features such as model memory and open-ended dialogues should be leveraged to enhance immersion.

The development of a mystery VR game was used to compare voice-based communication with NPCs against interaction through conventional dialogues. In both cases, NPC responses were generated using ChatGPT-4. Two groups of users participated in this study, playing the game and subsequently answering a series of questionnaires on presence, gaming experience, and open-ended feedback. Voice interaction was considered simple and immersive, but the freedom to talk about any topic with the NPC caused disorientation. In contrast, dialogue-based interaction was found to be more complex but allowed for successful game completion, likely due to the user’s greater familiarity with this interaction style in VR [12]. In [13], the use of LLMs and Retrieval-Augmented Generation (RAG) for NPCs is analyzed using various metrics to evaluate dialogues, as well as computational and financial costs and latency. For dialogue evaluation, problem-specific metrics are defined using the G-Eval method. The measured response attributes include relevance, ability to foster interaction, engagement or entertainment value versus being generic or dull, objective correctness in relation to the context, avoidance of misinformation, contextual specificity, inclusion of details, coherence with the context, consistency with personality traits, logical continuity of the dialogue, and avoidance of contradictions with prior facts. Once again, the findings highlight concerns regarding latency and hallucinations while demonstrating good performance in terms of narrative quality and credibility.

Additionally, ref. [14] examines the influence of emotions on players based on their interactions with NPCs that express emotions generated through LLMs. The authors use dialogues with NPCs to study players’ emotions throughout the game. They experiment with different Large Language Models (LLMs) and find that emotional responses vary depending on the NPC’s mood and evolve alongside the game’s progression. It is important to note that, due to the unpredictable nature of generative AI, the NPC may produce responses that do not align with the game’s rules, which can affect player interactions. The authors of [15] compare LLM-based NPCs for generating content aligned with human personality traits and find that they can emulate characteristics similar to those of humans. In this case, they analyze different language models, highlighting GPT-4.0 over the others for its ability to exhibit behavior patterns consistent with human personality traits.

In [16], the enhancement of user experience within Cultural Heritage applications is proposed through the integration of VR and LLMs. Two groups of subjects are compared: one using a predefined chatbot and the other using an LLM-based chatbot, utilizing OpenAI’s API in a VR environment. Both experiments are analyzed based on usability variables, including accuracy, communication, intuitiveness, overall experience, response time, and satisfaction. Additionally, other factors such as user engagement, learning outcomes, and cognitive task load were considered. This study concludes that VR experiences supported by LLMs are of significant interest, as they lead to improvements in usability, offer a more interactive approach to heritage dissemination, and increase user engagement, particularly among younger audiences. Furthermore, such experiences are recommended in contexts where on-site learning is not feasible. In addition, the detection of different interaction styles among user profiles leads the authors to recommend personalized VR learning experiences that can be adapted to users’ diverse preferences. The authors of [17] demonstrate the potential of using GPT as a museum guide and recommendation system, enabling the creation of personalized and interactive guides while enhancing user engagement. However, there are ethical considerations that must be addressed: inclusion, preservation, and the safeguarding of cultural integrity. Similarly, ref. [18] highlights the importance of ethics due to the necessity of honoring the privacy, dignity, and historical rigor of the character. The advantages of using XR + LLMs in Cultural Heritage education are also presented in [19], combining immersive visualization with AI-driven information tailored to each student’s needs. A usability comparison is conducted in [20] between a mixed reality (MR) application for an art exhibition using LLM-based interaction and a control group using the application with conventional interaction. This study assesses presence, flow, interaction naturalness, user experience (evaluated with the UEQ questionnaire), perceived reliability, perceived technical competence, and perceived understandability. It concludes that the use of LLMs positively impacts user experience and perceived reliability when interacting with virtual art exhibitions. In [21], the use of eye-tracking metrics is proposed to measure user attention and interaction in a VR environment. The main objective of this study is to explore the impact of different levels of personalized storytelling on cognitive load, visual attention, and user engagement. The narratives are generated using generative AI, either provided by an intelligent virtual assistant or through instruction generation. This study concludes that AI in VR environments enhances sustained engagement and attention in Cultural Heritage learning.

3. Materials

In this Section, we explain the 3D reconstruction of San Cristóbal de La Laguna. This 3D model is the environment where users can walk and maintain dialogues with the VH. Furthermore, we describe all the documental fonts that we use to develop the dialogue system.

3.1. 16th Century VR Application of San Cristóbal De La Laguna

The reconstruction of San Cristóbal de La Laguna in the 16th century was obtained by integrating several models from Torriani’s map [22]. The resulting 3D model includes a Digital Elevation Model around the city, a 3D model of houses, a procedural generation of the city, 3D models of singular buildings, a 3D model of city inhabitants, and 3D models of other elements. Each component has been developed using the most appropriate technique according to the available information to ensure the reconstruction is as accurate as possible. In the following lines, we describe them.

Around the city, a Digital Elevation Model (DEM) was used, taking into account the terrain elevation. We used an up-to-date elevation model with a resolution of 1 m, covering an area of approximately 3.7 × 4.6 km. This model was obtained from CityEngine [23] and has been processed to reflect the terrain situation in 1588. Furthermore, the city’s location on the island allows for the observation of elements such as the Teide volcano, which is more than 50 km away. Therefore, we have also used a second DEM for the rest of the island that surrounds the first DEM.

Five constructive house types were developed by civil architecture in the 16th century in La Laguna: terrera (single-story house), granero (house–barn), sobradada (multi-story house), commercial (commercial house), and armera (armory house) [24]. For each typology, different distributions of windows, doors, and roofs were determined. To optimize the 3D model, the geometries of the houses were designed to be simple and built from three basic elements identified from the analysis of the façades in Torriani’s map: a façade fragment with only a door, a façade fragment with only a window, or a façade fragment with both a door and a window. With these characteristics, along with information about the distribution of the different typologies according to social strata, shape grammar was defined, allowing for the procedural creation of a city model with approximately 1300 buildings. The realistic finish was achieved with high-quality textures.

The remaining buildings in the city reconstruction correspond to convents, churches, the Cabildo House, and the Palace of the Adelantado. All of them were created following the parametric modeling paradigm using Edificius (Version 2022) [25] software. It is worth mentioning that the generated models were post-processed in Blender (Version 4.2 LTS) [26] to simplify geometries and efficiently assign textures. The result is a collection of singular buildings whose structures and ornamental elements are based on the architectural style prevailing in the 16th century in the Canary Islands.

The characters inhabiting the city were determined according to the different social strata in Spain during the recreated period. In the design of their models, the book [27] on clothing and social types, which includes patterns of Spanish fashion, was fundamental. Based on this source, and combining the use of specialized clothing modeling software, Marvelous Designer (Version 7) [28], along with MakeHuman (Version 1.2.0) [29] and Mixamo [30] for the creation of animated avatars, at least one male and one female model were created for the wealthy class, middle social sector, lower social class, religious figures, and military personnel.

Other elements present on the map, such as water conduits, fountains, walls, etc., were designed in collaboration with historians and artists through a process of successive improvements to the initial model based on expert knowledge.

The result is a 3D model of the city in which distinct relevant areas are differentiated around one or more singular buildings. Figure 1a,c show the map fragment and 3D model of Adelantado Square, while Figure 1b,d depict the surroundings of Concepción Church with the lagoon in the background.

Using the reconstruction, an application was developed for virtual visits to the historical city, where the characters, as VHs, provide information about the city’s heritage and history. The Meta Quest platform was selected for development, supporting both Quest 2 HMD and Quest 3. In the initial development, information is provided through pre-established dialogues, either through panels in the model as we can see in Figure 2 or via Unity’s audio playback component. The process is triggered when the player selects the VH using the device controllers, which generates an event that displays an interface in the world where the panels with the characters’ and users’ dialogues will appear along with the characters. Movement through the map is achieved by controlled movement using the joystick of one of the controllers. This application serves as the foundation for building the dialogue system based on LLMs. It should be noted that in this application, the characters are animated. For a fluid visualization of the model, it was necessary to apply optimization techniques [22]. As a result, it was possible to develop a VR application for exploring the Adelantado Square environment with a refresh rate of about 90 FPS, memory requirements of 350 Mb, and without containing any file larger than 50 Mb.

3.2. Historical Documentation

The development of personalized and non-deterministic dialogues using LLMs should consider the possibility of mitigating errors caused by model hallucinations. Additionally, efforts have been made to give the characters 16th-century personalities. Both issues have been addressed under the hypothesis of using historical textual corpora to provide context for the generation of dialogues. A similar approach can be found in MonadGPT [31], an LLM fine-tuned on early modern texts in English, French, and Latin, which answers questions about the solar system in a manner consistent with 17th-century European knowledge and provides medical advice based on the four humors model predominant at the time [32].

A collection of literary texts from the period was used, but in modern Spanish, composed of works by Miguel de Cervantes, including Don Quixote, Novelas Ejemplares, plays, and other novels with an approximate total of 735,000 words (Figure 3). The original format of these texts was PDF, so the PyPDF 2 library in Python was used to extract the text and store the collection in a JSON file in key–value format, where the key was the book’s title, and the value was the string containing the entire text of the work.

On the other hand, the Biblioteca Virtual Viera y Clavijo of the Instituto de Estudios Canarios houses the Fontes rerum Canarium, the largest digitized collection of historical texts from the Canary Islands, which has been analyzed and commented on by specialized historians. Among other compilations, it includes council records, notarial protocols, wills, the earliest chronicles of the conquest, and editions of previously unpublished Canary Island chronicles. For the experiments, document [33] from the collection was used, which contains the “juicio de residencia” (residency trial) to which Adelantado Alonso Fernández de Lugo was subjected in 1508, conducted by Lope de Sosa on behalf of the Catholic Monarchs [34]. The digital document is in image format, requiring text extraction processing. Pages 8–48 were used, as they contain the analysis conducted by the authors regarding the trial proceedings, providing valuable information about the island’s history in the post-conquest period. In Figure 4, the excerpt from the historical document written in the 16th century by Lope de Sosa is highlighted, while the regular text corresponds to the authors of the work.

4. Methods

Several tasks have been addressed in the implementation of a natural interaction system with VHs within the historical reconstruction of San Cristóbal de La Laguna. These tasks encompass, on the one hand, the development of a voice-based communication system, and on the other hand, the design of a dialogue generation strategy leveraging LLMs. This Section outlines the interaction system architecture and relevant aspects of the Virtual Reality (VR) development. Subsequently, it details the analysis conducted using various prompt templates and provides recommendations for achieving the project objectives.

4.1. Interaction System Architecture

The current version of the application features a single interactive Virtual Human (VH) who plays the role of a guide, providing information about Adelantado Alonso Fernández de Lugo and the square where the scene takes place. The character chosen for interaction is a wealthy man. The remaining characters are included to enhance the atmosphere, similar to NPCs in video games. Users can freely ask the VH any question. The entire interaction takes place in Spanish.

LLMs have been trained on vast amounts of text data and have learned to model the structure and meaning of language in a more accurate and contextual way. These models are capable of generating coherent and human-like text and can even exhibit a degree of creativity in their output. Furthermore, their architecture enables knowledge transfer across related tasks, which has led to the development of pre-trained language models that can be adjusted and fine-tuned for specific tasks with smaller datasets, through the fine-tuning technique.

When referring to the execution of this type of pre-trained model, it refers to the process of performing inference on the model, that is, giving it an input and obtaining an output. This input is known as prompt and the output as completion. The output is influenced not only by the configuration parameters of each model but also by the input string. The design and structuring of the prompt to obtain results in a specific format or with specific characteristics is known as prompt engineering. This technique can be exploited to obtain responses that fit different styles or registers, depending on the context in which the dialogue occurs. Some prompting strategies, such as Zero-Shot, Role-Play, Few-Shot, Generated-Knowledge, or Reflexion are more useful for obtaining texts in literary registers [35]. To meet the requirements of our application, it was necessary to adjust the structure of the question formulated to the LLM to produce textual structures and vocabulary typical of the 16th century. The strategy followed was to use the works of Miguel de Cervantes described in Section 3.2 as an example.

Retrieval Augmented Generation (RAG) is a technique based on LLMs that allows the construction of dialogues in a chatbot by restricting the responses to the content of a collection of relevant documents, using the text generation quality learned by a general LLM. To determine the ideal templates for the prompts, the interest in using the corpus compiled using this technique was assessed, using the historical scientific article in Section 3.2 to generate information, reducing the risk of hallucination

In addition to historical information, the prompt includes instructions regarding the VH’s tone, social status, historical period, and place of origin. Regarding the length of the response, an upper limit of 12 sentences was established for conveying historical content, and a limit of 3 sentences for conversational links or farewells. A single API prompt template was defined and specifically configured to produce language contextualized to the historical period. This register was chosen based on preferences identified in prior user research, as detailed in Section 4.2. Furthermore, the interaction is designed to resemble a conversation between humans, avoiding an assistant-like mode (Figure 5). All of these constraints are aimed at achieving a natural interaction with a 16th-century VH.

Users interact with the VHs via voice, and the application reproduces the textual response generated by the language model using the designated voice for the VH. Interaction initiates when a user selects a VH by intersecting it with the ray from the right controller, which is part of the VR device’s input system. Subsequently, the user activates voice recording by pressing the right controller’s A button, if recording is not already in progress. The application’s user interface (UI) provides real-time feedback on the recording status, API request, and response, including a transcription display. To conclude their query, users select the VH with the B button. Users can cancel an ongoing recording by pressing the A button again. The recording duration is limited, and the system will timeout if the limit is reached. Figure 6 illustrates this process.

The audio captured between the start and stop recording points is stored as a WAV file for transmission to the Speech-to-Text API. Users receive visual feedback through a UI animation. The acquired audio is transcribed into text using a sequence-to-sequence model, accessible within Unity via the Hugging Face Unity API plugin for the Speech-to-Text task. The request is configured to use the whisper-tiny model from Hugging Face [36]. In the absence of errors, a prompt is formulated using the transcribed text and sent to the LLM service. The prompt template adheres to the recommendations derived from the analysis presented in Section 5. The resulting textual response triggers the Text-to-Speech process, generating an audio clip played by the VH.

Given the resource and computational demands of Large Language Models (LLMs), text generation is delegated to a language model via a web API. Language models accessible through HTTP requests offer immediate access to robust language processing and generation capabilities, facilitating seamless integration into web, mobile, and Virtual Reality (VR) applications without the requirement for local infrastructure deployment and maintenance. Prominent examples include the OpenAI API, featuring GPT-4 and ChatGPT models, as well as commercial solutions such as the Google Cloud Natural Language API and Microsoft Azure Cognitive Services. The pre-trained model employed is GPT-3.5-turbo, which is specifically trained for dialogue tasks and incorporates conversation history for contextually relevant response generation. At the time of development, the cost of GPT-3.5-turbo was USD 0.002 per thousand tokens processed. The entire process is summarized in Figure 7.

4.2. Dialogue Generation Adapted to Historical Context Using LLMs

The comfort and satisfaction that the user feels with the generated dialogues is a crucial element for the interaction system. The way to influence and achieve the desired effects will be through the prompts that are sent to the API. To determine the preferences regarding the communication style with the VH, a survey was conducted with potential users. The survey included five questions specifically focused on the historical figure of the Adelantado, Alonso Fernández de Lugo, a key figure in the conquest and colonization of the Canary Islands during the late 15th and early 16th centuries. The template for the prompt used is in Figure 8. The language style varies between current language or examples extracted from the literary works described in Section 3.2. In the preliminary tests carried out during development, it was decided to place limitations on the number of generated tokens, since otherwise excessively long explanations are generated. This results in cognitive overload and loss of user interest and participation.

The formulated questions are listed below, along with example responses based on the following approaches: RAG strategy with contemporary language (RAG), RAG strategy with 16th-century literary style, and direct query using only the prompt as input (Table 1). These responses were used to generate the following selection options that respondents must choose from:

What role did the 1508 residencia trial play in controlling Alonso de Lugo’s administration?
What were the main accusations presented against Alonso de Lugo during his tenure as governor?
How did the Crown’s decisions influence the limitation of the Adelantado’s powers in Tenerife and La Palma?
What conflicts did Alonso de Lugo have with other governors of the Canary Islands, such as Lope de Sosa?
How did Alonso de Lugo manage the distribution of lands and waters in Tenerife, and what consequences did his administration have?

As observed in Table 1, the responses generated by the general model are less precise compared to those generated using RAG strategies and provide erroneous information (highlighted in red). Given the informative nature intended for this application, the RAG modalities are recommended. However, we present examples of all three styles to the users to determine which style best fosters engagement and user participation based on their preferences. The complete table with all questions and the responses from all three LLMs is provided in Appendix A.

For the dialogue generation task, when providing the prompt, the AI generates a response influenced not only by the model used but also by the structure of the text itself. To obtain good results, an effective design of the prompts is required. The survey presented in this Section focuses on user preferences regarding the question–answer results generated by the chatbot.

The survey included a Section with questions about age, as well as interest in and knowledge of the Cultural Heritage (CH) of San Cristóbal de La Laguna. The second Section introduced users to the Virtual Reality (VR) application, providing images for context. Each question in Appendix A was presented alongside three stylistic variations of responses, and users were asked to indicate their preferences.

4.3. User Experience of the LLM-Based Interaction System

The usability of the application was evaluated through a qualitative user test. It is worth noting that qualitative testing is an effective method for studying usability and user experience. When used for exploratory purposes, this type of test can uncover most usability issues in an application without requiring a large number of participants. Therefore, critical usability issues can be identified quickly with a modest number of participants. According to [37], it has been empirically established that five users detect approximately 80% of the problems, and eight users detect 90%.

To evaluate the UX with the interaction system, user tests were conducted with 9 participants who had no prior knowledge of the system. They were divided into two distinct groups: one group familiar with the use of new technologies and video games, and another group with limited exposure to such technologies. The task assigned during the test consisted of navigating through La Plaza del Adelantado with the goal of reaching the Virtual Human (VH) and requesting information about the square, the conquistador, and the city of La Laguna in the 16th century. The tests were conducted using an Oculus Quest 3 device, on which both the application and Cognitive3D software (Web application and SDK for Unity 1.4.3) were installed. Additionally, an external camera was set up to record the user experience from a third-person perspective. A laptop was used to monitor both the external camera feed and the view seen through HMD.

According to Nielsen [38], users who are unfamiliar with augmented reality technologies may require initial guidance. This also applies to Virtual Reality, given that it remains unfamiliar to most users; therefore, we took this recommendation into account. Moreover, tasks should be clearly defined and easy to understand in order to minimize potential confusion or misunderstandings. For this reason, the first few minutes of the session (approximately three) were dedicated to explaining how to use the VR device while allowing users to experiment with the controllers and movement to become familiar with the system. Participants were also informed of the main objective and were encouraged to discuss specific topics with the VH. Some of the proposed questions included: “Tell me about the history of La Plaza del Adelantado,” “How was the conquest of Tenerife financed?” and “What role did Valle de Guerra and Tejina play in the conquest of Tenerife?”

The tests were carried out in a room of approximately 7 m², where users could move freely. The furniture was removed, and the area was cleared to prevent potential collisions, ensuring good lighting throughout the session. The application runs natively on Quest 3. A camera connected to the computer, along with the Meta Quest 3 device’s transmission function, recorded the user during the session. Before recording began, it was important to guide the user on how to use the Virtual Reality headset, explain the test procedure, and clarify that the objective was to maintain a dialogue with the Virtual Human (VH). The test facilitator resolved user queries while the application was running, ensuring that the reactions captured were those from the first interaction, without the need for repetition.

The data was complemented with user responses to a post-test questionnaire in which participants were asked to evaluate their experience with the application. The questions explored users’ perceptions of the application’s usability, as well as key factors relevant to usability in Virtual Reality environments and in the use of a conversational avatar. The questionnaire addressed aspects such as perceived comfort or discomfort during use, the degree of immersion experienced, and the overall efficiency of the interaction. The complete set of questions is provided in Appendix B. Participants rated their experience from 1 (not at all) to 5. Although the responses are not statistically significant, they provide valuable feedback regarding their experience with the application. Additionally, Cognitive 3D software was used to record user interaction with the VH through the dynamic object functionality of this software.

5. Results

This Section presents the results regarding user style preferences, the interaction system of the VR application, and the user experience.

5.1. Dialogue Style Survey

The survey results on users’ preferences regarding dialogue style from a university environment, aged between 18 and 58 years, showed two main age groups emerging: 18–23 years and 48–58 years. Additionally, one respondent was 15 years old. A total of 77.3% of participants expressed a high interest in the history of San Cristóbal de La Laguna, though most reported limited knowledge of the city’s history: 50% had a medium level of knowledge, 31.8% a low level, and 9.5% a very low level. Users preferred period-appropriate language in all questions, with a preference rate ranging from 45.5% to 59.1% (Figure 9).

Among the responses in contemporary language, those generated using the RAG strategy were preferred, with the older age group predominantly selecting one of the two options in contemporary language (Figure 10). Notably, preferences for period-style language were more pronounced among younger respondents.

5.2. User Experience of the LLM-Based Interaction System

This work analyzes user interaction through qualitative user testing, focusing on dialogues between users and a Virtual Human (VH) within a Virtual Reality (VR) environment. The sample size was determined based on the objective of conducting a formative usability evaluation. The aim of this study was to understand natural interaction patterns in VR applications powered by generative AI and to identify potential usability issues early in the development process.

Figure 11 presents the statistics provided by Cognitive 3D regarding the test sessions, which allow us to characterize the type of interaction users had with our application. It is observed that the Engagement variable reaches its maximum value (100%), confirming that the participants interacted with the dynamic object in every instance, meaning they successfully established interaction with the VH. On the other hand, 90% of users remained within the designated limits, so we can affirm that the application does not induce movements that could put them at risk. Finally, the Controller Engagement value is low, indicating that users did not attempt interactions beyond initiating the dialogue. If this value had been high, it would have suggested that users tried other styles of more abrupt movements, which are not programmed into the application. Regarding the Orientation variable, a value of 63% was obtained, showing moderate variability in orientation relative to the VH, which can be attributed to the continuous movement users made to receive feedback from the information pane.

Moreover, the analysis of the session videos reveals a certain delay between the information displayed on the panel and the start of the audio response playback from the VH. On average, it takes 5.2 s to display the response visually and 10.8 s to play the audio, as shown in Figure 12. The videos also reveal that nearly all users tend to humanize the VH by using courtesy expressions such as “please” or “thank you” and common conversational phrases when addressing another person. Additionally, it was observed that in only two out of the nine tests, the VH understood the first user’s question correctly, with the remaining instances requiring the user to repeat the question. In all cases of failure, the user reformulated the question more slowly and enunciated better in the next attempt, which led to the VH understanding the question and providing a response. In cases where proper names of locations on the island or individuals were asked, the VH often did not understand them, leading to responses that did not match the query, resulting in user frustration. Finally, this, along with the intonation from the voice model, highlights the need for further refinement of these models for the Spanish language.

After completing the test, users responded to a questionnaire that included items assessing immersion, enjoyment, and user comfort [39] as key aspects in the evaluation of VR applications. The application received positive ratings from the test participants (Figure 13). Similar to other studies [40], the typical issues faced by users in VR applications have remained at a low level, with only one case of motion sickness reported (Figure 13a). Despite latency delays in the playback of the VH’s audio messages, users have perceived a high degree of fluency in dialogue generation, and they also consider the application’s performance to be adequate as we can see in Figure 13b. These aspects are positive, as the additional computational complexity has not negatively impacted the user experience and allows for a new form of natural communication in such applications. This positively influences presence and attention, with high or very high ratings given to questions regarding the overall experience and more specific aspects, such as response quality and sense of immersion (Figure 13c).

6. Discussion

Our study analyzes user interaction through qualitative user testing in dialogues between users and a VH within a VR. Given the informative nature of the application, an important objective has been the reliability of the responses obtained by users. The introduction of LLMs for generating coherent and creative dialogues through the use of models, along with the strategy of conditioning responses via RAG techniques, is essential for achieving greater comprehension and coherence [14], reaffirming the importance of context in such applications [11]. The use of documentary studies for this purpose has been fundamental to the system. To determine the dialogue style that captures users’ attention, we have adapted the approach of conducting online surveys with users to develop prototypes aligned with their interests, as proposed in [41]. In our case, it was found that using a language register appropriate to the historical period generates the most interest. However, it is also worth noting that, according to the results, age may influence the preferred style, as it was predominantly chosen by individuals between 15 and 23 years old.

Likewise, complementing the virtual environment with a VH contributes to realism and authenticity, which are key aspects of XR Cultural Heritage experiences. Generating historically contextualized dialogues ensures that the virtual character’s behavior is as faithful as possible to its historical counterparts [42], also contributing to creativity.

While other studies such as [6,12,16,41,43] involve larger samples, these are typically summative evaluations, designed to compare different systems or to validate usability questionnaires. Prior research such as [44,45,46] has shown that with a relatively small sample (n between 5 and 10), it is possible to identify the majority of usability problems and obtain representative insights into users’ overall perception of usability. Additionally, the error detection model proposed by Nielsen and Landauer (1993) [47] predicts that, assuming a 31% detection rate, a group of nine participants is sufficient to uncover approximately 95% of usability issues. This reinforces the value of formative studies as an effective and methodologically appropriate approach for early-stage development, with the added advantage of being cost-efficient.

On the other hand, when post-test questionnaires are used for comparative or summative purposes, a larger sample size is generally required than that used in this study. Nevertheless, a post-test questionnaire was included to gather users’ subjective impressions of the application. This allows for a descriptive analysis of key usability-related aspects in VR and natural interaction, complementing the logged interaction data with indicative, though not conclusive, information.

The questionnaire includes items related to positive aspects such as presence and the system’s ability to sustain user attention, as well as negative factors such as simulator sickness, cognitive overload, and stress—factors commonly considered in usability evaluations of VR applications as in [43] or in the Virtual Reality system usability questionnaire (VRSUQ) [48]. Appendix B presents a table with the full set of post-test questions, along with the items from VRSUQ that measure similar attributes.

On the other hand, using these questionnaires for comparative purposes requires a larger sample size than the one considered in our study. Nevertheless, we included a post-test questionnaire to capture users’ perceptions of the application. This enables a descriptive analysis of key usability aspects in VR applications and natural interaction. The results complement the recorded user interaction data, serving as indicative but not conclusive evidence.

The questionnaire included both positive aspects—such as presence and the system’s ability to sustain user attention—and negative aspects such as motion sickness, cognitive overload, and stress, as recommended in VR usability research [43,48]. Appendix B presents a table listing the questions included, along with the corresponding VRSUQ items that target similar attributes. Among the reviewed literature, the study most closely related to our work focuses on comparing pre-scripted dialogues with those generated by Large Language Models (LLMs) [16]. Due to the nature of that study, a sample size large enough to ensure statistical significance is required. In contrast, our work is framed as a formative study and therefore places fewer demands in terms of the number of participants. In our case, the authors employ an adaptation of a validated usability questionnaire—the System Usability Scale (SUS)—for chatbot evaluation. However, while their instrument remains more general, our adaptation incorporates elements specific to Virtual Reality environments, along with items addressing natural interaction.

The experience gained from the design, development, and evaluation of the application enabled the formulation of initial guidelines for advancing this line of research. The introduction of conversational dialogue in Virtual Reality applications for Cultural Heritage dissemination has proven to be viable, although its complexity extends beyond technical challenges.

First, in order to enhance users’ sense of presence, we propose that the language used by the Virtual Human (VH) be aligned with the historical period being represented. Previous research has emphasized the importance of user preferences in selecting the appropriate linguistic register for chatbot interactions. For example, ref. [49] analyzes register use in tourism-related dialogue systems, while [50] examines the impact of chatbot communication style on user satisfaction and trust in customer service. In our own preliminary survey on preferred language style for the VH, we observed potential differences in preferences depending on user profiles. This highlights the need to define the target audience for the application and to conduct a deeper study of how users prefer to receive information from the VH.

Second, the Cultural Heritage dissemination goal demands historical accuracy. The virtual reconstruction process was guided by historians specializing in the history of San Cristóbal de La Laguna. When integrating open-ended dialogue with the VH, it becomes crucial to ensure the accuracy and historical fidelity of its responses, in contrast to entertainment-focused systems that may generate misleading or incorrect content [18]. This concern has been addressed in our work through the implementation of Retrieval-Augmented Generation (RAG) techniques, restricting the chatbot’s output to content extracted from a validated historical corpus.

Third, several ethical issues must also be addressed, particularly regarding information accuracy, privacy, and bias. Accuracy was ensured using the RAG-based approach mentioned above. As for privacy, challenges inherent in LLM-based systems are compounded in XR environments by the use of body tracking and spatial sensors. In future stages of development, it will be necessary to include questions regarding users’ privacy preferences, as suggested in [51]. In our case, user testing revealed a gender bias in the VH’s assumptions, which defaulted to addressing the user as male. This was mitigated through prompt design, allowing users to specify how they wished to be addressed. We also confirmed the importance of user preferences regarding the VH’s linguistic style. For this reason, we believe that adapting dialogue through prompt engineering—and even customizing the RAG document corpus to include different levels of complexity—are essential strategies to ensure inclusivity.

Finally, and directly related to the results of the user interaction logs from the formative study, we observed a drop in engagement associated with overly long responses and noticeable latency in the VH’s speech generation. To address this, we implemented a visual feedback panel in the interface to indicate the system’s processing status. Despite the statistical limitations of this study, the overall process of design, development, and user testing helped us contextualize and understand challenges specific to developing a VR-based Cultural Heritage application featuring conversational VHs. This enabled us to compile a set of preliminary guidelines that may be useful for similar projects and also helped define the following future directions of this development: (1) conducting a thorough analysis of reliable textual sources for use with RAG techniques, (2) defining the target audience, (3) testing different prompt styles with user involvement, (4) limiting response length to maintain user attention and engagement, (5) employing LLM-based generation services to ensure dialogue fluency, and (6) incorporating user interfaces to provide feedback on the dialogue generation process.

7. Conclusions

This paper presents a detailed description of the development of an LLM-based interaction system for a VR application that allows users to explore a 3D reconstruction of the historical center of San Cristóbal de La Laguna. Users can engage in conversation with a virtual character designed in the aesthetic style of the time, enhancing immersion. The system leverages expert knowledge about the city’s history to generate dialogues using Retrieval-Augmented Generation (RAG) techniques. Additionally, the prompt design incorporates examples from Cervantes’ novels to achieve a dialogue style that aligns with the language of the 16th century.

A preliminary study was conducted to assess user preferences regarding the responses provided by the virtual character. The findings led to the adoption of a RAG-based dialogue generation strategy with a historical linguistic style. Finally, a qualitative user study was carried out to evaluate the system’s feasibility, demonstrating its potential to enhance the sense of presence and user engagement in the virtual environment.

Based on our findings, we propose a set of recommendations for developing Cultural Heritage dissemination applications featuring virtual characters as guides.

For future research, we propose exploring models optimized for resource-constrained devices to improve response latency, allowing users to select their preferred dialogue style, and training voice models tailored to the language used in the generated dialogues, as well as conducting a summative usability study at a more advanced stage of development.

Author Contributions

Conceptualization, F.P.-N. and I.S.-B.; methodology, F.P.-N. and I.S.-B.; software, L.P.-R.; validation, L.P.-R. and I.S.-B.; investigation, F.P.-N. and I.S.-B.; resources, F.P.-N.; data curation, I.S.-B. and L.P.-R.; writing—original draft preparation, F.P.-N. and I.S.-B.; writing—review and editing, F.P.-N. and I.S.-B.; visualization, I.S.-B. and L.P.-R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The test participants agreed with the use of the anonymously collected data for the user test. Data will be made available upon reasonable request to the corresponding author. The data are not publicly available due to privacy issues.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Question	RAG	RAG-Literary	Open
1	Alonso de Lugo’s residencia trial was a control mechanism imposed by the Crown to evaluate his administration. Lope de Sosa led the investigation, compiling accusations of abuse of power. However, despite the denunciations, Lugo managed to remain in his position until his death.	Oh, the misfortune of power ill-exercised! The Crown, suspicious of Lugo, sent Lope de Sosa to investigate his deeds. An inquiry was opened, witnesses were interrogated, and the parchments burned with accusations. But the Adelantado, with cunning and favor, maintained command like a helmsman in a storm.	The residencia trial of 1508 was crucial in overseeing Fernández de Lugo. It allowed the Crown, specifically the Council of the Indies, to examine his actions in Hispanic America. This judicial process established Lugo’s accountability and served as a supervisory mechanism to prevent abuses and ensure compliance with royal laws.
2	The denunciations against Lugo included misappropriation of land, the sale of public offices to relatives, the illegal enslavement of indigenous people, and the use of justice for his own benefit. He was accused of governing with authoritarianism and prioritizing his personal enrichment over the well-being of the people.	The main accusations revolved around his financial management, including embezzlement of funds and excessive personal expenditures. He was also charged with failing to control the encomenderos, which led to abuses against the indigenous population and issues with army supplies.	The main accusations revolved around his financial management, including embezzlement of funds and excessive personal expenditures. He was also charged with failing to control the encomenderos, which led to abuses against the indigenous population and issues with army supplies.
3	To curb Lugo’s excesses, the Crown restricted his judicial powers, prohibiting him from intervening in appeals. It also ordered him to replace his lieutenant in La Palma and dispatched officials to review his distribution of land and water, thus attempting to limit his absolute power over the islands.	The monarchs, seeing his excessive power, restrained his will with royal decrees: they removed appeals from his jurisdiction, forbade him from seizing more land, and even forced him to dismiss his lieutenant. Thus, little by little, the Crown tamed his ambition, like breaking in a wild colt.	The Crown, particularly the Council of the Indies, sought to exert greater direct control over the Canary Islands. They significantly reduced Lugo’s powers by appointing royal officials to oversee the administration and limiting his ability to make important decisions, favoring a more centralized model.
4	Lope de Sosa, governor of Gran Canaria, was a key opponent of Lugo. Tasked by the Crown with investigating Lugo’s administration, Sosa’s efforts fueled a fierce rivalry between them. Sosa gathered evidence against Lugo, who skillfully defended himself and maintained his position despite the accusations.	If two suns cannot share the same sky, neither could two governors coexist in the islands. Sosa, the royal envoy, relentlessly pursued his abuses, uncovering witnesses and secrets. Yet Lugo, with his cunning, wove his defenses and held onto his position, though the enmity between them smoldered like a hidden ember.	Lugo and Sosa clashed over supremacy in Tenerife. Supported by the Crown, Sosa challenged Lugo’s dominance, leading to tensions and disputes over resource control and political authority on the island, further exacerbating rivalries between governors.
5	Lugo oversaw the distribution of land and water in Tenerife, favoring his allies and excluding perceived opponents. This sparked numerous complaints to the Crown, which dispatched officials to review his administration. His decisions incited conflict and resentment among the island’s inhabitants.	As master and lord, Lugo distributed lands at his will, rewarding friends and punishing others with dispossession. Many cried out to the heavens and the Court, leading to inquiry after inquiry. Yet, with cunning and patience, he always managed to evade the blows of discontent.	Lugo implemented a land distribution system based on the “obra,” which led to conflicts with the peasants. His administration provoked discontent, revolts, and social tensions due to the exploitation of the indigenous workforce and the lack of protection for settlers, worsening economic problems.

Appendix B

The table below includes the questions as they were presented to users, alongside the corresponding item from the VRSUQ that addresses an equivalent usability attribute, where applicable. Questions specifically related to the experience of natural interaction are also indicated.

Question	Category
Do you understand how you are using the application?	VRSUQ-4
Do you think the task was difficult to complete?	VRSUQ-5
Do you feel that movements were smooth?	VRSUQ-1
Do you consider the conversation with the character to have been fluid?
Are you satisfied with the responses provided by the character?	Natural interaction
Did you enjoy the application and the experience?	Natural interaction
Would you recommend the app or use it again?	VRSUQ-6
Did you experience nausea or dizziness at any point?	VRSUQ-7
Was the experience comfortable?	VRSUQ-8
Did you feel immersed in the virtual environment?
Additional comments or suggestions for improvement	Open feedback

References

Mendoza, M.A.D.; De La Hoz Franco, E.; Gómez, J.E.G. Technologies for the Preservation of Cultural Heritage—A Systematic Review of the Literature. Sustainability 2023, 15, 1059. [Google Scholar] [CrossRef]
Kantaros, A.; Ganetsos, T.; Petrescu, F.I.T. Three-Dimensional Printing and 3D Scanning: Emerging Technologies Exhibiting High Potential in the Field of Cultural Heritage. Appl. Sci. 2023, 13, 4777. [Google Scholar] [CrossRef]
Pietroni, E.; Ferdani, D. Virtual Restoration and Virtual Reconstruction in Cultural Heritage: Terminology, Methodologies, Visual Representation Techniques and Cognitive Models. Information 2021, 12, 167. [Google Scholar] [CrossRef]
Innocente, C.; Ulrich, L.; Moos, S.; Vezzetti, E. A Framework Study on the Use of Immersive XR Technologies in the Cultural Heritage Domain. J. Cult. Herit. 2023, 62, 268–283. [Google Scholar] [CrossRef]
Theodoropoulos, A.; Antoniou, A. VR Games in Cultural Heritage: A Systematic Review of the Emerging Fields of Virtual Reality and Culture Games. Appl. Sci. 2022, 12, 8476. [Google Scholar] [CrossRef]
De Paolis, L.T.; Chiarello, S.; Gatto, C.; Liaci, S.; De Luca, V. Virtual Reality for the Enhancement of Cultural Tangible and Intangible Heritage: The Case Study of the Castle of Corsano. Digit. Appl. Archaeol. Cult. Herit. 2022, 27, e00238. [Google Scholar] [CrossRef]
Papaefthymiou, M.; Kanakis, M.E.; Geronikolakis, E.; Nochos, A.; Zikas, P.; Papagiannakis, G. Rapid Reconstruction and Simulation of Real Characters in Mixed Reality Environments. In Digital Cultural Heritage; Ioannides, M., Ed.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 10605, pp. 267–276. ISBN 978-3-319-75825-1. [Google Scholar]
Machidon, O.M.; Duguleana, M.; Carrozzino, M. Virtual Humans in Cultural Heritage ICT Applications: A Review. J. Cult. Herit. 2018, 33, 249–260. [Google Scholar] [CrossRef]
Shao, Y.; Li, L.; Dai, J.; Qiu, X. Character-LLM: A Trainable Agent for Role-Playing. arXiv 2023, arXiv:2310.10158. [Google Scholar]
Hasani, M.F.; Udjaja, Y. Immersive Experience with Non-Player Characters Dynamic Dialogue. In Proceedings of the 2021 1st International Conference on Computer Science and Artificial Intelligence (ICCSAI), Jakarta, Indonesia, 28 October 2021; IEEE: Piscataway, NJ, USA, 2021; Volume 1, pp. 418–421. [Google Scholar]
Cox, S.R.; Ooi, W.T. Conversational Interactions with NPCs in LLM-Driven Gaming: Guidelines from a Content Analysis of Player Feedback. In Chatbot Research and Design; Følstad, A., Araujo, T., Papadopoulos, S., Law, E.L.-C., Luger, E., Goodwin, M., Hobert, S., Brandtzaeg, P.B., Eds.; Lecture Notes in Computer Science; Springer Nature: Cham, Switzerland, 2024; Volume 14524, pp. 167–184. ISBN 978-3-031-54974-8. [Google Scholar]
Christiansen, F.R.; Hollensberg, L.N.; Jensen, N.B.; Julsgaard, K.; Jespersen, K.N.; Nikolov, I. Exploring Presence in Interactions with LLM-Driven NPCs: A Comparative Study of Speech Recognition and Dialogue Options. In Proceedings of the 30th ACM Symposium on Virtual Reality Software and Technology, Trier, Germany, 9–11 October 2024; ACM: New York, NY, USA, 2024; pp. 1–11. [Google Scholar]
Kostilainen, S. Next Generation of NPC Dialogue: Creating Responsive NPCs (Non-Player Characters) with Retrieval-Augmented Generation and Real-Time Player Data. 2024. Available online: https://urn.fi/URN:NBN:fi:amk-2024053119434 (accessed on 25 March 2025).
Marincioni, A.; Miltiadous, M.; Zacharia, K.; Heemskerk, R.; Doukeris, G.; Preuss, M.; Barbero, G. The Effect of LLM-Based NPC Emotional States on Player Emotions: An Analysis of Interactive Game Play. In Proceedings of the 2024 IEEE Conference on Games (CoG), Milan, Italy, 5–8 August 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6. [Google Scholar]
Klinkert, L.J.; Buongiorno, S.; Clark, C. Evaluating the Efficacy of LLMs to Emulate Realistic Human Personalities. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, Lexington, KY, USA, 18–22 November 2024; Volume 20, pp. 65–75. [Google Scholar]
Lau, K.H.C.; Bozkir, E.; Gao, H.; Kasneci, E. Evaluating Usability and Engagement of Large Language Models in Virtual Reality for Traditional Scottish Curling. arXiv 2024, arXiv:2408.09285. [Google Scholar]
Trichopoulos, G. Large Language Models for Cultural Heritage. In Proceedings of the 2nd International Conference of the ACM Greek SIGCHI Chapter, Athens, Greece, 27–28 September 2023; ACM: New York, NY, USA, 2023; pp. 1–5. [Google Scholar]
Hutson, J. Combining Large Language Models and Immersive Technologies to Represent Cultural Heritage in the Metaverse Context. In Augmented and Virtual Reality in the Metaverse; Geroimenko, V., Ed.; Springer Series on Cultural Computing; Springer Nature: Cham, Switzerland, 2024; pp. 265–281. ISBN 978-3-031-57745-1. [Google Scholar]
Garzarella, S.; Vallasciani, G.; Cascarano, P.; Hajahmadi, S.; Cervellati, E.; Marfia, G. An Extended Reality Platform Powered by Large Language Models: A Case Study on Teaching Dance Costumes. In Proceedings of the 2025 IEEE International Conference on Artificial Intelligence and eXtended and Virtual Reality (AIxVR), Lisbon, Portugal, 27–29 January 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 369–375. [Google Scholar]
Constantinides, N.; Constantinides, A.; Koukopoulos, D.; Fidas, C.; Belk, M. CulturAI: Exploring Mixed Reality Art Exhibitions with Large Language Models for Personalized Immersive Experiences. In Proceedings of the Adjunct Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization, Cagliari, Italy, 1–4 July 2024; ACM: New York, NY, USA, 2024; pp. 102–105. [Google Scholar]
Lau, K.H.C.; Sen, S.; Stark, P.; Bozkir, E.; Kasneci, E. Personalized Generative AI in VR for Enhanced Engagement: Eye-Tracking Insights into Cultural Heritage Learning through Neapolitan Pizza Making. arXiv 2024, arXiv:2411.18438v1. [Google Scholar]
Pérez Nava, F.; Sánchez Berriel, I.; Pérez Morera, J.; Martín Dorta, N.; Meier, C.; Hernández Rodríguez, J. From Maps to 3D Models: Reconstructing the Urban Landscape of San Cristóbal de La Laguna in the 16th Century. Appl. Sci. 2023, 13, 4293. [Google Scholar] [CrossRef]
Esri. ArcGIS CityEngine; Version 2022.0; Esri: Redlands, CA, USA, 2024; Available online: https://www.esri.com/en-us/arcgis/products/arcgis-cityengine/overview (accessed on 25 March 2025).
Navarro Segura, M. L La Laguna 1500: La Ciudad-República. Una Utopía Insular Según “Las Leyes” de Platón; Ayuntamiento de San Cristóbal de La Laguna: San Cristóbal de La Laguna, Spain, 1999. [Google Scholar]
ACCA Software S.p.A. Edificius, Version 2022; ACCA Software S.p.A.: Bagnoli Irpino, Italy, 2022. Available online: https://www.acca.it/software-architecture-3D-BIM-Edificius (accessed on 10 October 2022).
Blender Foundation. Blender; Version 4.2 LTS; Blender Foundation: Amsterdam, The Netherlands, 2024; Available online: https://www.blender.org/ (accessed on 25 March 2020).
Bernis, C. El Traje y Los Tipos Sociales En El Quijote; Ediciones El Viso: Madrid, Spain, 2001; ISBN 84-95241-17-X. [Google Scholar]
CLO Virtual Fashion Inc. Marvelous Designer 9; Version 5.1.472; CLO Virtual Fashion Inc.: Seoul, Republic of Korea, 2019; Available online: https://www.marvelousdesigner.com/ (accessed on 25 March 2020).
MakeHuman Team MakeHuman Community. MakeHuman; Version 1.2.0; MakeHuman Community: Italy, 2020; Available online: http://www.makehumancommunity.org/ (accessed on 25 March 2020).
Adobe. Mixamo; Adobe Inc.: San Francisco, CA, USA, 2020; Available online: https://www.mixamo.com/ (accessed on 25 March 2020).
Pierre-Carl Langlais MonadGPT, Fine-Tuned Mistral-Hermes 2 Model. Available online: https://huggingface.co/pclanglais/MonadGPT (accessed on 25 March 2025).
Varnum, M.E.W.; Baumard, N.; Atari, M.; Gray, K. Large Language Models Based on Historical Text Could Offer Informative Tools for Behavioral Science. Proc. Natl. Acad. Sci. USA 2024, 121, e2407639121. [Google Scholar] [CrossRef] [PubMed]
De la Rosa Olivera, L. El Adelantado D. Alonso de Lugo y Su Residencia Por Lope de Sosa; Consejo Superior de Investigaciones Científicas, Instituto de Estudios: Madrid, Spain, 1949. [Google Scholar]
Gambín García, M. El Juicio de Residencia de Lope de Sosa a Alonso de Lugo En 1508: Una Visión de Conjunto. 2002. Available online: http://riull.ull.es/xmlui/handle/915/22374 (accessed on 25 March 2020).
de la Cruz Fernández, E. Inteligencia Artificial y Poesía Sintética: Una Metodología Para La Escritura Creativa Usando Grandes Modelos de Lenguaje. Master’s Thesis, Universidad Politécnica de Madrid, Madrid, Spain, 2024. [Google Scholar]
OpenAI. Whisper Tiny. Available online: https://huggingface.co/openai/whisper-tiny (accessed on 25 March 2025).
Turner, C.W.; Lewis, J.R.; Nielsen, J. Determining Usability Test Sample Size. Int. Encycl. Ergon. Hum. Factors 2006, 3, 3084–3088. [Google Scholar]
Guidelines for Testing Mobile Augmented-Reality Apps, Nielsen; Nielsen Norman Group: Dover, DE, USA, 2023.
Wang, Z.; Yuan, L.-P.; Wang, L.; Jiang, B.; Zeng, W. VirtuWander: Enhancing Multi-Modal Interaction for Virtual Tour Guidance through Large Language Models. In Proceedings of the CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 11–16 May 2024; ACM: New York, NY, USA, 2024; pp. 1–20. [Google Scholar]
Škola, F.; Rizvić, S.; Cozza, M.; Barbieri, L.; Bruno, F.; Skarlatos, D.; Liarokapis, F. Virtual Reality with 360-Video Storytelling in Cultural Heritage: Study of Presence, Engagement, and Immersion. Sensors 2020, 20, 5851. [Google Scholar] [CrossRef]
Yu, J.; Wang, Z.; Cao, Y.; Cui, H.; Zeng, W. Centennial Drama Reimagined: An Immersive Experience of Intangible Cultural Heritage through Contextual Storytelling in Virtual Reality. J. Comput. Cult. Herit. 2025, 18, 1–22. [Google Scholar] [CrossRef]
Okanovic, V.; Ivkovic-Kihic, I.; Boskovic, D.; Mijatovic, B.; Prazina, I.; Skaljo, E.; Rizvic, S. Interaction in Extended Reality Applications for Cultural Heritage. Appl. Sci. 2022, 12, 1241. [Google Scholar] [CrossRef]
Li, J.; Nie, J.-W.; Ye, J. Evaluation of Virtual Tour in an Online Museum: Exhibition of Architecture of the Forbidden City. PLoS ONE 2022, 17, e0261607. [Google Scholar] [CrossRef]
Lewis, J.R. Sample Sizes for Usability Studies: Additional Considerations. Hum. Factors 1994, 36, 368–378. [Google Scholar] [CrossRef]
Virzi, R.A. Refining the Test Phase of Usability Evaluation: How Many Subjects Is Enough? Hum. Factors 1992, 34, 457–468. [Google Scholar] [CrossRef]
Faulkner, L. Beyond the Five-User Assumption: Benefits of Increased Sample Sizes in Usability Testing. Behav. Res. Methods Instrum. Comput. 2003, 35, 379–383. [Google Scholar] [CrossRef]
Nielsen, J.; Landauer, T.K. A mathematical model of the finding of usability problems. In Proceedings of the INTERACT’93 and CHI’93 Conference on Human Factors in Computing Systems, Amsterdam, The Netherlands, 24–29 April 1993; pp. 206–213. [Google Scholar]
Kim, Y.M.; Rhiu, I. Development of a Virtual Reality System Usability Questionnaire (VRSUQ). Appl. Ergon. 2024, 119, 104319. [Google Scholar] [CrossRef]
Chaves, A.P.; Doerry, E.; Egbert, J.; Gerosa, M. It’s How You Say It: Identifying Appropriate Register for Chatbot Language Design. In Proceedings of the 7th International Conference on Human-Agent Interaction, Kyoto, Japan, 6–10 October 2019; ACM: New York, NY, USA, 2019; pp. 102–109. [Google Scholar]
Cai, N.; Gao, S.; Yan, J. How the Communication Style of Chatbots Influences Consumers’ Satisfaction, Trust, and Engagement in the Context of Service Failure. Humanit. Soc. Sci. Commun. 2024, 11, 687. [Google Scholar] [CrossRef]
Bozkir, E.; Özdel, S.; Lau, K.H.C.; Wang, M.; Gao, H.; Kasneci, E. Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and Privacy. In Proceedings of the ACM Conversational User Interfaces 2024, Luxembourg, 8–10 July 2024; ACM: New York, NY, USA, 2024; pp. 1–7. [Google Scholar]

Figure 1. (a) Fragment of Adelantado Square in Torriani’s map; (b) fragment of Concepción Church in Torriani’s map; (c) Adelantado Square area in the 3D model of the city; and (d) Concepción Church area in the 3D model of the city.

Figure 2. Screen in the Meta Quest 3 VR application, Adelantado Square zone. (a) Entry point to the Adelantado Square with UI; (b) Cabildo House; (c) interactive VH and UI; and (d) Adelantado Square view, showing Cabildo House, Palace of the Adelantado, and characters.

Figure 3. Fragment from the work The Generous Lover, written in the 16th century.

Figure 4. Text fragment from Ref. [33] containing expert knowledge about the Adelantado.

Figure 5. Prompt template used for dialogues between the user and a VH.

Figure 6. Interaction sequence for asking the VH a question.

Figure 7. Pipeline for creating VH audio responses to users’ voice questions.

Figure 8. Prompt template for generating responses using different strategies for the user survey.

Figure 9. Distribution of user preferences for the three dialogue styles (RAG, RAG-literary, and Open) for each of the five questions (P1–P5).

Figure 10. Dialogue preferences by question (P1–P5) and user age group.

Figure 11. Spatial and Engagement metrics of the interaction with VH metrics recorded using the software Cognitive 3D during user tests.

Figure 12. The delay between text and voice generation in seconds.

Figure 13. User responses regarding their perception of the application were rated from 1 (strongly disagree) to 5 (strongly agree) for: (a) comfort, (b) fluidity, and (c) experience.

Table 1. Responses generated by the LLMs, according to the prompt strategy.

Question	RAG	RAG-Literary	Open
1	Alonso de Lugo’s residencia trial was a control mechanism imposed by the Crown to evaluate his administration. Lope de Sosa led the investigation, compiling accusations of abuse of power. However, despite the denunciations, Lugo managed to remain in his position until his death.	Oh, the misfortune of power ill-exercised! The Crown, suspicious of Lugo, sent Lope de Sosa to investigate his deeds. An inquiry was opened, witnesses were interrogated, and the parchments burned with accusations. But the Adelantado, with cunning and favor, maintained command like a helmsman in a storm.	The residencia trial of 1508 was crucial in overseeing Fernández de Lugo. It allowed the Crown, specifically the Council of the Indies, to examine his actions in Hispanic America. This judicial process established Lugo’s accountability and served as a supervisory mechanism to prevent abuses and ensure compliance with royal laws.

Example created by GPT-3.5-turbo; the complete set is in Appendix A.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sánchez-Berriel, I.; Pérez-Nava, F.; Pérez-Rosario, L. Natural Interaction in Virtual Heritage: Enhancing User Experience with Large Language Models. Electronics 2025, 14, 2478. https://doi.org/10.3390/electronics14122478

AMA Style

Sánchez-Berriel I, Pérez-Nava F, Pérez-Rosario L. Natural Interaction in Virtual Heritage: Enhancing User Experience with Large Language Models. Electronics. 2025; 14(12):2478. https://doi.org/10.3390/electronics14122478

Chicago/Turabian Style

Sánchez-Berriel, Isabel, Fernando Pérez-Nava, and Lucas Pérez-Rosario. 2025. "Natural Interaction in Virtual Heritage: Enhancing User Experience with Large Language Models" Electronics 14, no. 12: 2478. https://doi.org/10.3390/electronics14122478

APA Style

Sánchez-Berriel, I., Pérez-Nava, F., & Pérez-Rosario, L. (2025). Natural Interaction in Virtual Heritage: Enhancing User Experience with Large Language Models. Electronics, 14(12), 2478. https://doi.org/10.3390/electronics14122478

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Natural Interaction in Virtual Heritage: Enhancing User Experience with Large Language Models

Abstract

1. Introduction

2. Related Works

3. Materials

3.1. 16th Century VR Application of San Cristóbal De La Laguna

3.2. Historical Documentation

4. Methods

4.1. Interaction System Architecture

4.2. Dialogue Generation Adapted to Historical Context Using LLMs

4.3. User Experience of the LLM-Based Interaction System

5. Results

5.1. Dialogue Style Survey

5.2. User Experience of the LLM-Based Interaction System

6. Discussion

7. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Appendix A

Appendix B

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI