Voice, Text, or Embodied AI Avatar? Effects of Generative AI Interface Modalities in VR Museums
Abstract
1. Introduction
2. Related Work
2.1. Virtual Museums and Information Interaction in Immersive Environments
2.2. Generative AI and Intelligent Agents in VR and XR Systems
2.3. Multimodal and Embodied Interfaces for Information Delivery
2.4. Interface Modality, Cognitive Demand, and Engagement in Immersive VR
3. Generative AI Interface Design in a Virtual Museum
3.1. Virtual Museum Context
3.2. System Architecture Design
3.3. Interface Modalities
3.4. AI Configuration and Runtime Control
4. Research Method
4.1. Research Design
4.2. Participants
4.3. Instruments
4.3.1. User Engagement Scale
4.3.2. Perceived Information Quality
4.3.3. NASA Task Load Index
4.3.4. Information-Seeking Behavior
4.4. Research Procedures
4.5. Data Collection and Data Analysis
5. Results
5.1. User Engagement
5.2. Results of Perceived Information Quality
5.3. Results of NASA Task Load Index
5.4. Results of System Log Data
6. Discussion
6.1. Effects of AI Interface Modality on User Engagement
6.2. Multimodal Effects on Perceived Information Quality
6.3. Interface Modality and Cognitive Workload Stability
6.4. Design Implications for Generative AI-Driven Virtual Museums
6.5. Limitations and Future Work
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix A.1
| Dimension | Item |
|---|---|
| Focused attention | I was fully focused on the interaction with the AI assistant during the VR task. |
| | I lost track of time while interacting with the AI assistant. |
| | I was deeply concentrated on the information provided by the AI assistant. |
| Perceived usability | The AI assistant was easy to interact with during the VR task. |
| | The interaction with the AI assistant felt smooth and well-organized. |
| | I could interact with the AI assistant without confusion or difficulty. |
| Aesthetic appeal | The presentation of the AI assistant was visually appealing in the VR environment. |
| | The AI interface enhanced my overall VR experience. |
| Reward | Interacting with the AI assistant was enjoyable. |
| | I felt motivated to continue interacting with the AI assistant. |
| | The AI assistant made the museum experience more interesting. |
Appendix A.2
| Dimension | Item |
|---|---|
| Accuracy | The information provided by the AI assistant was accurate. |
| Clarity | The information was clear and easy to understand. |
| Relevance | The information was relevant to the exhibit I was viewing. |
| Completeness | The explanations provided by the AI assistant were sufficiently detailed. |
| Appropriateness | The AI assistant addressed my questions appropriately. |
| Usefulness | The information helped me understand the exhibit better. |
| Trustworthiness | I trusted the information provided by the AI assistant. |
| Confidence | I felt confident relying on the information presented by the AI assistant. |
Appendix A.3
| Dimension | Item |
|---|---|
| Mental Demand | How mentally demanding was the task while interacting with the AI assistant in the VR museum? |
| Physical Demand | How physically demanding was the interaction with the AI assistant during the VR task? |
| Temporal Demand | How hurried or rushed was the pace of the task? |
| Performance | How successful were you in accomplishing the task goals? |
| Effort | How hard did you have to work to accomplish your level of performance? |
| Frustration | How insecure, discouraged, irritated, stressed, or annoyed did you feel during the task? |
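The six NASA-TLX dimensions above are each rated on a single scale (conventionally 0–100). As a minimal sketch, assuming the common unweighted "Raw TLX" scoring variant (the study may or may not apply pairwise weighting), an overall workload score is the mean of the six subscale ratings:

```python
def raw_tlx(ratings):
    """Unweighted Raw-TLX score: the mean of the six subscale ratings (0-100).

    `ratings` maps subscale name -> rating; all six subscales must be present.
    """
    subscales = ("mental", "physical", "temporal",
                 "performance", "effort", "frustration")
    missing = [s for s in subscales if s not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    return sum(ratings[s] for s in subscales) / len(subscales)

# Hypothetical ratings for one participant:
score = raw_tlx({"mental": 55, "physical": 20, "temporal": 40,
                 "performance": 25, "effort": 50, "frustration": 30})
```

The weighted NASA-TLX variant would additionally collect 15 pairwise importance comparisons and average the ratings with those weights; the raw variant shown here is the simpler and now more widely used form.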
| Gender | Voice-Only (n = 25) | Voice + Text (n = 25) | Voice + Avatar (n = 25) | Total (n = 75) |
|---|---|---|---|---|
| Male | 13 | 12 | 14 | 39 |
| Female | 12 | 13 | 11 | 36 |
| Task No. | Task Description | Purpose |
|---|---|---|
| T0 | Participants familiarized themselves with basic VR controls and movement within the virtual museum. No data from this phase were included in the analysis. | Reduce novelty effects and ensure basic VR proficiency. |
| T1 | Participants activated the generative AI assistant at a predefined exhibit using the standard interaction method provided by the system. | Ensure consistent initiation of AI interaction across participants. |
| T2 | Participants asked a fixed set of predefined questions related to the exhibit (e.g., object identity, function, materials, and cultural or historical significance) in a prescribed order. | Isolate and compare AI interaction across interface modalities. |
| T3 | Participants requested a brief clarification or one-sentence summary from the AI assistant following the initial responses. | Increase cognitive processing demands and assess perceived information quality. |
| T4 | Participants ended the AI interaction using the system’s standard exit mechanism. | Ensure consistent completion of the AI interaction session. |
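The information-seeking measures reported from system logs (Section 5.4) can be derived from timestamped events recorded during phases T1–T4. The sketch below is an illustrative assumption about such a logging scheme — the event names and fields are hypothetical, not the study's actual schema:

```python
import time
from dataclasses import dataclass, field

@dataclass
class SessionLog:
    """Collects timestamped AI-interaction events per task phase (T1-T4)."""
    events: list = field(default_factory=list)

    def record(self, task, event, t=None):
        # `task` is the phase label (e.g. "T2"); `event` is a marker such as
        # "question_asked" or "response_received". `t` defaults to a monotonic
        # clock reading so durations between events are well-defined.
        self.events.append({"task": task, "event": event,
                            "t": time.monotonic() if t is None else t})

    def question_count(self, task):
        # Number of questions the participant asked during one phase.
        return sum(1 for e in self.events
                   if e["task"] == task and e["event"] == "question_asked")

log = SessionLog()
log.record("T1", "ai_activated")
log.record("T2", "question_asked")
log.record("T2", "question_asked")
log.record("T4", "ai_exited")
```

Per-phase counts and inter-event intervals from such a log are sufficient to compute the query-frequency and dwell-time style measures a log-based analysis typically reports.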
| Questionnaire | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| UES | Between Groups | 2.487 | 2 | 1.243 | 7.36 | <0.001 |
| | Within Groups | 12.158 | 72 | 0.169 | | |
| | Total | 14.645 | 74 | | | |
| PIQ | Between Groups | 1.772 | 2 | 0.886 | 54.68 | <0.001 |
| | Within Groups | 1.166 | 72 | 0.016 | | |
| | Total | 2.938 | 74 | | | |
| NASA-TLX | Between Groups | 0.082 | 2 | 0.041 | 1.28 | 0.28 |
| | Within Groups | 2.30 | 72 | 0.032 | | |
| | Total | 2.38 | 74 | | | |
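The F values in the one-way ANOVA table follow directly from the reported sums of squares and degrees of freedom: each mean square is SS/df, and F is the ratio of the between-groups to the within-groups mean square. A small self-check using only the figures in the table (the small residual discrepancy for PIQ reflects rounding of the reported sums of squares):

```python
def f_statistic(ss_between, df_between, ss_within, df_within):
    """One-way ANOVA F statistic: (SS_between/df_between) / (SS_within/df_within)."""
    return (ss_between / df_between) / (ss_within / df_within)

f_ues = f_statistic(2.487, 2, 12.158, 72)  # ~7.36, matching the table
f_piq = f_statistic(1.772, 2, 1.166, 72)   # ~54.7 (table reports 54.68)
f_tlx = f_statistic(0.082, 2, 2.30, 72)    # ~1.28, matching the table
```

The significance values in the table would then come from the F distribution with (2, 72) degrees of freedom, which requires a statistics library rather than this arithmetic alone.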
| Questionnaire | (I) Group | (J) Group | Mean Difference (I–J) | Sig. |
|---|---|---|---|---|
| UES (Post hoc: Games–Howell) | Voice only | Voice + Text | −0.08 | 0.81 |
| | Voice only | Voice + Avatar | −0.42 | 0.003 |
| | Voice + Text | Voice + Avatar | −0.34 | 0.004 |
| PIQ (Post hoc: Tukey HSD) | Voice only | Voice + Text | −0.34 | <0.001 |
| | Voice only | Voice + Avatar | −0.03 | 0.74 |
| | Voice + Avatar | Voice + Text | −0.31 | <0.001 |
| NASA-TLX (Post hoc: Tukey HSD) | Voice only | Voice + Text | 0.08 | 0.31 |
| | Voice only | Voice + Avatar | 0.02 | 0.88 |
| | Voice + Avatar | Voice + Text | 0.06 | 0.41 |
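In the pairwise table, Mean Difference (I–J) is simply mean(I) − mean(J), so a negative value indicates that group J scored higher. A sketch of how these differences are formed from group means — the UES means below are hypothetical values chosen only to be consistent with the reported differences, not figures from the study:

```python
from itertools import combinations

def pairwise_mean_differences(group_means):
    """Return (I, J, mean_I - mean_J) for every ordered pair of groups,
    rounded to two decimals as in the table."""
    return [(i, j, round(group_means[i] - group_means[j], 2))
            for i, j in combinations(group_means, 2)]

# Hypothetical UES group means on a 1-5 scale:
means = {"Voice only": 3.60, "Voice + Text": 3.68, "Voice + Avatar": 4.02}
diffs = pairwise_mean_differences(means)
# ("Voice only", "Voice + Avatar", -0.42): the avatar group scored higher
```

The significance column depends on the chosen post hoc test (Games–Howell when group variances are unequal, Tukey HSD otherwise), which is why the table labels each block with its test.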
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Ariya, P.; Worragin, P.; Khanchai, S.; Poollapalin, D.; Julrode, P. Voice, Text, or Embodied AI Avatar? Effects of Generative AI Interface Modalities in VR Museums. Informatics 2026, 13, 42. https://doi.org/10.3390/informatics13030042

