Designing Augmented Virtuality: Impact of Audio and Video Features on User Experience in a Virtual Opera Performance

Palige, Selina; Legler, Franziska; Dittrich, Frank; Bullinger, Angelika C.

doi:10.3390/electronics15030577

Chair of Ergonomics and Innovation, Department of Mechanical Engineering, Chemnitz University of Technology, 09125 Chemnitz, Germany

* Author to whom correspondence should be addressed.

Electronics 2026, 15(3), 577; https://doi.org/10.3390/electronics15030577

Submission received: 23 December 2025 / Revised: 22 January 2026 / Accepted: 26 January 2026 / Published: 28 January 2026

(This article belongs to the Special Issue Emerging Technologies in Augmented, Virtual, and Mixed Reality: Advancing Human Experience in Digital and Physical Environments)

Abstract

Emerging technologies offer cultural institutions opportunities to expand their audiences and make their content more accessible. Augmented Virtuality (AV), which integrates real-world content into virtual environments, shows particular potential. It enables the transmission of live or pre-recorded stage performances, such as concerts and theater productions, from the stage to virtual audiences via increasingly affordable head-mounted displays (HMDs). However, as AV remains under-researched, little is known about which recording features enable an immersive user experience while maintaining cost-efficiency—an essential requirement for resource-constrained cultural institutions. In this context, we investigate the influence of features of audio and video recordings on key dimensions of user experience in the use case of a real opera performance integrated into a virtual opera house. We conducted a 2 × 2 within-subjects study with 30 participants to measure the effects of 2D versus 3D videos and stereophonic versus spatial audio rendering on several key dimensions of user experience. The results show that spatial audio has a positive impact on Place Illusion, whereas video dimensionality had no significant effect. Recommendations for the design of AV applications are derived from study results, aiming at balancing immersive user experience and cost-efficiency for virtual cultural participation.

Keywords:

user experience; spatial audio; stereoscopic video; mixed reality; augmented virtuality; immersive cultural experience; virtual environments

1. Introduction

Cultural stage events like operas or theater performances not only offer social elements that enhance the live experience, but also convey an ambience, a unique atmosphere and a sense of being “live” that have proven difficult to reproduce via traditional screen-based streaming [1,2]. With the ever-increasing accessibility of low-cost head-mounted displays (HMDs), new possibilities arise that might change this situation.

Users of HMDs are able to experience an extension of the real world from home by accessing three-dimensional artificial worlds. Virtual environments (VEs) are based on powerful computing technology and algorithms to create immersive experiences [3]. These immersive experiences can be defined as “the acceptance of one’s involvement in the moment that is conceived through multiple senses, creating fluent and uninterrupted physical, mental, and/or emotional engagements with a present experience” [4] (p. 633). Depending on the purpose, VEs may involve the simultaneous participation of multiple users who are in different physical locations, which enables social interaction [5,6]. Due to their immersive features, VEs can create a sense of presence, colloquially described as a “sense of being there” [3].

This opens up great opportunities for both cultural creators, such as theater and opera professionals, and cultural enthusiasts: On the one hand, VEs provide new business models for the provision of and access to entertainment and cultural content, e.g., allowing real-world regional events to be shared with a wider audience. Therefore, it is a solution to help cultural stage formats to find their way out of their fixed locations and into the homes of their audiences [7]. On the other hand, cultural VE formats enable expanded access opportunities for audiences—referred to as “users” in the context of virtual events—who have previously been unable or unwilling to experience traditional cultural events, such as individuals with physical disabilities, neurodivergent individuals, or those who simply prefer to experience cultural content from the comfort of their own homes. Immersive technologies, therefore, have the potential to foster greater diversity and inclusion in cultural participation [8].

The original Reality-Virtuality Continuum by Milgram and Kishino [9] characterizes the degrees of extending the real world with virtual content. In their original taxonomy, Virtual Reality (VR) is presented as the endpoint of the continuum, fully immersing the user in an artificial virtual world and blocking out the real world, allowing them to immerse themselves entirely in the virtual one. So far, research on VR in the cultural context has primarily focused on musical performances [10,11,12] (e.g., under the keyword “Music 3.0” [13]) and cinema [14,15]. However, as the technology continues to evolve, new areas of application are emerging. While VR has been argued to show potential for innovative theatrical performances, including enhanced artistic creativity [8], concerns about preserving the “performance’s ephemeral nature” [16] of live performances make fully artificial virtual environments challenging for cultural stage events. Moreover, developing immersive VR applications requires significant financial resources, which cultural institutions rarely have at their disposal.

Between the endpoints of pure reality and virtuality, Mixed Reality (MR) describes technologies that blend real and virtual elements to varying degrees, including Augmented Reality (AR) and Augmented Virtuality (AV) [9]. To overcome the limitations of VR in the specific case of virtual stage events, AV has particular potential as it allows real-time transmission of cultural performances like concerts or theater from the physical stage to a virtual environment. AV comprises two elements: a three-dimensional virtual environment and real-world content embedded within it [9,17]. In a virtual opera, this can be achieved, for example, by recording the real-world stage performance and integrating the recordings in a multiplayer application that consists of a virtual 3D model of a concert hall or theatre, and includes a virtual stage as well as a virtual auditorium. Although the implementation of pre-recorded video content naturally implies a lower level of direct interaction between performers and audience, this aligns well with the traditionally passive nature of opera and theatre experiences, making AV particularly well-suited for these cultural formats. As the recordings are embedded in an interactive, immersive VE, AV also holds significant advantages over 360-degree video streaming. While the latter merely enables content to be streamed, AV enables multiplayer setups and, therefore, social interaction and connection over distance.

Despite being the most underexplored research area within the Reality-Virtuality Continuum [9,18], AV therefore offers several advantages for the implementation of stage performances: (i) The virtual environment only needs to be designed and developed once, therefore complex methods for digitalizing real spaces, e.g., photogrammetry [19], can be applied to ensure a high level of immersion. (ii) Recording performers’ emotions is beneficial, as even by applying advanced AI algorithms, replicating emotional nuances in performers’ facial expression in VR based on avatar representations remains challenging [20]. (iii) Further, the technical implementation based on real-time recordings enables hybrid formats, extending the performance’s reach beyond the real auditorium, encompassing the virtual audience as well.

When designing an AV application, the perennial challenge is to determine which elements and features contribute to an immersive user experience and should therefore be implemented. A recent systematic review on attendance motives has shown that ‘artistic content’ is a main motivator for both real-world and virtual art performances [21]. This attendance motive category highlights ‘aesthetics’, which results from sensory appeal as well as the visual and auditory design of the performance. Therefore, the representation of artistic content must be carefully considered to meet the needs and expectations of the audience in virtual stage performances. In the special case of AV applications, the representation of the artistic content is highly dependent on the recording methods used. As the event environment (e.g., theater, concert hall) is based on a multiplayer, three-dimensional environment, empirical findings have shown that increasing users’ feeling of presence requires the quality of various implemented elements to be matched [22]. Consequently, audio and video recordings would require spatial features to match the three-dimensional event environment. Both features show promise in creating immersive audio-visual experiences that match users’ real-world perception, from which they could potentially benefit.

However, implementing these spatial features introduces significant technical constraints for real-time, distributed AV applications. The increased technical requirements can reduce the number of users who can simultaneously access a distributed AV application or negatively impact system performance, e.g., through increased latency or reduced bandwidth, which are crucial factors for user experience [23]. Furthermore, empirical findings on the effects of stereoscopic video and spatial audio in virtual environments show contradictory results: Some studies demonstrate positive effects on presence [24,25,26,27,28], while others show no effects or even negative effects such as motion sickness [29,30].

To the best of our knowledge, there are no empirical studies that examine the relationship between specific recording features and the user experience in cultural AV environments. Although prior work has examined the effects of video and audio features in VR and 360-degree video applications, these results cannot be easily transferred to AV applications, which, as mentioned above, differ in their levels of virtuality and interactivity and, crucially, in the integration of spatial audio and video. The context-dependency of the effects [31] and the lack of studies specifically investigating AV applications [18,32] make it unclear whether the benefits of spatial features justify their technical costs. Given the need to balance user experience and technical performance, such as system stability and accessibility for distributed users, we explore the relation of audio and video features to the perceived user experience in an AV application. We chose the context of an opera performance for this research, as it places demands on both the video and audio features. Our study is the first to investigate the absolute and relative effects of video and audio recording features for the design of AV applications for cultural institutions. The study results are not only an empirical contribution but also provide practical recommendations that act as a resource allocation framework. This enables designers and developers to make informed decisions about effective and efficient AV applications, which is important for cultural institutions with limited resources at their disposal.

As theoretical models and constructs considering user experience in virtual environments are highly diverse and debated in the VR community, in the following, this paper first briefly summarizes key dimensions of user experience to define terms and establish a common understanding of constructs relevant for this paper.

1.1. Key Dimensions of User Experience in Virtual Environments

There is no consensus in MR research on how to operationalize user experience, and different theoretical models subsume different constructs under this umbrella term [33]. Within the field of human factors research, user experience can be defined as “a person’s perceptions and reactions resulting from the usage or expected use of a product, system, or service” (ISO 9241-210, 2019; [34]). These perceptions and reactions include all emotions, ideas, preferences, perceptions, feelings of well-being or discomfort, behaviors, and performance that occur throughout the entire usage cycle. The selection and measurement of relevant constructs typically depend on the specific application context and research objectives. For the present study of a virtual opera performance, we identified several constructs that capture different key dimensions of user experience in an AV application for cultural event experiences. Reference [35] suggests that several constructs are relevant for capturing key dimensions of user experience in VR, including presence, VR sickness, emotions, aesthetics, and usability. Our study’s primary objective was to examine the audio and video features of embedded recordings in AV. Given that these recordings do not affect the application’s usability, usability is not a reasonable metric for assessing their effects. The same argument applies to aesthetics, as this dimension is operationalized as aesthetic appreciation of the application design. It is measured using adjectives such as ‘beautiful’, ‘attractive’, and ‘elegant’, and therefore represents an overall impression and evaluation of the entire VR experience. It is not assumed to be sensitive to the spatial features of embedded recordings in an AV application. However, the three remaining dimensions of user experience seem applicable to an AV application and are described further below.

The dimension of (tele)presence is one of the most widely used and established concepts in VR research, often considered a key indicator of the overall “quality” of a virtual experience [22]. It has gained growing attention as researchers and practitioners from different fields seek to understand the social and psychological effects of immersive, interactive technologies [36]. According to a meta-analysis by Skarbez et al. (2017) [22] presence can be broadly understood as “the perceived realness of a mediated or virtual experience” (p. 16). This differs from the commonly used definition of presence, which describes the subjective ‘sense of being there’ [3], an experience in which users react to the mediated space as if it were real, even though they are cognitively aware of its artificial nature. However, as [22] argue, this particular sensation of being located in a remote or virtual place is more accurately captured by the term place illusion. Place illusion—rather than presence in its broader sense—refers specifically to the sensation of being physically situated in a space that is not actually real. For this paper, we follow the model of Skarbez et al. (2017) [22] that defines place illusion as closely related to pure spatial presence, resulting from the sensory immersive characteristics of the application and being a subdimension of the broader and higher-order construct of presence. From this, it can be assumed that improvements in sensory immersive characteristics, such as spatial features in an AV application, directly affect the place illusion and therefore contribute positively to overall presence. Still, Skarbez et al. (2017) [22] do not address MR applications, thereby limiting the transfer of relationships and hierarchical orders of constructs to AV to an exploratory basis. 
As the overall presence as well as the place illusion are relevant for the user experience of a virtual opera experience, both constructs should be considered separately as outcomes in empirical studies.

Compared to live performances, the dimension VR sickness or motion sickness represents a potential negative consequence when using an HMD, like eye-strain, blurred vision, or nausea, that can significantly impair user experience [37]. Given that various characteristics of content in VEs, including visual features such as stereoscopy, can influence these sources of discomfort, it is essential to monitor motion sickness when evaluating different media integration approaches in an AV application.

The emotions dimension is characterized by the occurrence of basic feelings that positively or negatively contribute to the overall user experience. In VR research, enjoyment [38] is an important outcome measure, as it represents an overarching goal of the experience of cultural stage events. Enjoyment captures the affective or emotional dimension of user experience, reflecting the pleasure and emotional engagement users derive from the virtual experience. As enjoyment was found to be a central attendance motive for performing arts events, both real-world and virtual [21], the concept of enjoyment serves as a critical indicator of whether the application successfully delivers emotionally meaningful content. Enjoyment and presence were found to be positively related [39], and the findings of [35] even suggest that emotions partially mediate the relationship between presence and user experience. Study results of [40] indicate that presence increases enjoyment, and [41] even defined enjoyment to be a ‘presence effect’. This suggests that enjoyment is a rather distal outcome, while presence might be a rather direct and proximal outcome resulting from VE design.

Additionally, we assume Quality of Experience (QoE) to be a relevant criterion. In contrast to enjoyment, QoE represents the cognitive-evaluative dimension of user experience. It refers to users’ overall evaluation and judgment of the quality of their experience, encompassing factors such as perceived technical quality, satisfaction with the application, and the degree to which the experience meets their expectations [42]. Therefore, while enjoyment focuses on emotional responses, QoE captures users’ reflective assessment of the experience.

Summing up, a substantial amount of MR literature highlights the relevance of the above-mentioned key dimensions of user experience. While there is, to our knowledge, no model or framework that explains the relationships of these dimensions in AV, we still assume these dimensions to be a useful guide to select relevant outcome criteria for evaluating design features in AV.

1.2. Video Features

The visual sense is the most valued of the five human senses [43]. Therefore, it is not surprising that a major effort in the development of a VE is put into the visual design. The majority of humans can perceive depth through natural stereoscopic vision, which is based on binocular perception. This enables them to view the world in three dimensions. Stereopsis contributes to a vivid experience of the world and improves perceptual quality [44]. From this, it can be deduced that depth perception is crucial in VEs, as it enhances the sense of realism (presence) by creating an experience that visually resembles the real world, and simultaneously increases place illusion. To conclude, the format of video recording transferred to an AV application should be designed carefully. Stereoscopic videos, recorded with two cameras and an appropriate stereo base [45], can mirror this natural perception and potentially enhance presence in VEs.

As discussed above, findings from TV, VR, or 360-degree video applications may not be directly applicable to AV applications. Nevertheless, we provide a brief review of the current state of research in these related fields below, as the reported findings can offer insights into the potential effects of spatial audio and video rendering features within AV applications. Regarding the effect of stereoscopy, contradictory results can be found in the literature, with some studies showing positive effects, while others show no or even negative effects. Consistent with expectations based on natural human stereoscopic vision, several studies on TV screens [25,26,27] as well as one study on HMDs [28] have shown positive effects of stereoscopy on the sense of presence. Furthermore, a meta-analysis on immersive system features [24] emphasizes stereoscopic vision as a major factor in enhancing presence, recommending its prioritization in immersive system development. On the other hand, [31] shows that stereoscopy does not automatically improve the sense of presence and that the context of use has to be taken into account. Using 360-degree videos, the comparison of 2D and 3D videos showed significant differences neither for presence nor, surprisingly, for the subscale realness [46]. Furthermore, it was shown that 3D visuals might lead to motion sickness [29,30]. This effect is amplified further at low update rates [23] resulting from an insufficient available data rate. The risk of insufficient data rates is higher when streaming stereoscopic videos into the virtual environment than with 2D videos. To the best of our knowledge, no study has investigated the effects of video features in an actual AV application. In summary, the effects of video features in VEs in general and for AV applications in particular remain unclear as a result of limited research in this field.
However, since (i) depth information may enhance an immersive experience, (ii) stereoscopic recordings mirror familiar real-world perception, and (iii) stereoscopic recordings create visual quality that matches the three-dimensional virtual environment [22], it can be assumed that stereoscopic recordings positively affect both place illusion and overall presence in AV applications. As a result of a vivid experience and with reference to the above effects of presence (cf. Section 1.1), it can further be assumed that spatial video features enhance enjoyment and QoE but simultaneously increase the risk of motion sickness.

1.3. Audio Features

Audio features represent the second characteristic that can be intentionally influenced when integrating recordings in an AV application.

Sound is likewise an important aspect of an immersive and realistic experience, and the increasing demand for immersive virtual environments has given the topic of spatial sound new relevance [47,48,49]. Humans localize sound through binaural hearing by comparing the signals received by the two ears [50]. Spatial audio rendering in VEs can recreate this three-dimensional auditory experience [47,48,49]. Therefore, binaural hearing, like stereoscopic vision, should lead to a higher perception of vividness and mirror the real-world experiences (presence) of people in virtual environments. Additionally, auditory stimuli from different spatial directions increase the impression of being in a three-dimensional space. As audio contributes to the overall immersion of a system [24] and immersion is the precondition for the development of place illusion [22], spatial audio features need to be considered in the design of an AV application.

In virtual environments, binaural audio rendering is created by estimating the spatial position of the user relative to the sound source(s) and accordingly adapting the auditory information to both ears based on direction, distance or sound barriers [51]. Achieving spatial audio in virtual environments requires a complex rendering process [47]. Ideally, each sound source is recorded separately by microphones, which further increases technical requirements.
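The direction- and distance-dependent adaptation described above can be illustrated with a minimal sketch. It is not the rendering pipeline of any particular engine: it assumes a simple spherical-head model (Woodworth's approximation) for the interaural time difference and plain inverse-distance attenuation. The head radius, the reference distance, the coordinate convention, and all function names are assumptions made for this illustration; the 7 m listener-to-stage distance is taken from the study setup.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius (not from the source)
SPEED_OF_SOUND_M_S = 343.0  # speed of sound in air at roughly 20 degrees Celsius
REFERENCE_DISTANCE_M = 1.0  # assumed distance at which the gain equals 1


def relative_azimuth(listener_xy, head_yaw_rad, source_xy):
    """Return the source direction relative to the facing direction, in [-pi, pi].

    Assumed convention: yaw 0 faces +y, positive angles point to the right.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    world_angle = math.atan2(dx, dy)
    rel = world_angle - head_yaw_rad
    # Wrap the difference back into [-pi, pi].
    return math.atan2(math.sin(rel), math.cos(rel))


def interaural_time_difference(azimuth_rad):
    """Woodworth's spherical-head approximation; positive means the right ear leads."""
    # Fold rear directions onto the front hemisphere, where the formula is defined.
    if abs(azimuth_rad) > math.pi / 2:
        azimuth_rad = math.copysign(math.pi - abs(azimuth_rad), azimuth_rad)
    return HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (azimuth_rad + math.sin(azimuth_rad))


def distance_gain(listener_xy, source_xy):
    """Inverse-distance attenuation, clamped so nearby sources never exceed gain 1."""
    d = math.dist(listener_xy, source_xy)
    return REFERENCE_DISTANCE_M / max(d, REFERENCE_DISTANCE_M)


listener = (0.0, 0.0)
singer = (0.0, 7.0)  # a source straight ahead, 7 m away

itd_facing_stage = interaural_time_difference(relative_azimuth(listener, 0.0, singer))
itd_head_turned = interaural_time_difference(
    relative_azimuth(listener, math.radians(30), singer)
)
gain = distance_gain(listener, singer)
```

With the head facing the stage, both ears receive the signal at the same time; turning the head to the right moves the source to the listener's left, so the left ear leads. A head-locked stereo mix, by contrast, would ignore the head yaw entirely.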

In an empirical study by Shin et al. (2019) [11] including 360-degree videos of a live concert, spatial audio was found to increase presence, which in turn positively affected enjoyment. The effect was higher than the effect of stereoscopic vision. However, spatial sound can lead to motion sickness and thus negatively affect the user experience [52]. In contrast to Shin et al. (2019) [11], Cummings and Bailenson (2016) [24] argue that stereoscopic vision should be preferred over auditory stimuli in the development of immersive applications from a cost–benefit perspective, as the positive effect of audio on presence measured in their meta-analysis was comparably low. To the best of our knowledge, no study has investigated audio effects in an actual AV application.

In summary, the effects of audio features in VEs in general and for AV applications in particular remain unclear as a result of limited research in this field. Still, it can be assumed that spatial audio positively affects place illusion, presence, enjoyment, and QoE for the following reasons: (i) spatial audio has the potential to induce a vivid experience, (ii) binaural rendering mirrors familiar real-world experiences, and (iii) spatial sound matches the three-dimensional virtual environment regarding the level of audio quality. At the same time, the risk of motion sickness was found to increase when spatial audio is applied.

1.4. Scope of Study

Facing inconsistent results from VR research and insufficient empirical data from AV research, the present study investigates the relevance of the spatial aspects of video and audio features for key dimensions of user experience. We evaluate which components of an AV application actually enhance the user experience and exploratively compare the relative importance of video and audio features, especially taking into account the limited system performance resources of AV applications. Therefore, our research question is as follows:

How do video dimensionality and audio rendering features of an opera performance recording affect the user experience of an AV opera experience?

We operationalize the user experience based on specific key dimensions derived from the literature (cf. Section 1.1). Due to the lack of specific theoretical models and frameworks for AV, we explore the derived key dimensions as separate outcome criteria. Based on the theoretical considerations on the effects of spatial features of recordings in AV applications outlined above, we derive the following hypotheses, which will be tested in the present study:

H1.

Compared to 2D videos, integrating 3D videos into a virtual environment will result in higher levels of presence (H1a), place illusion (H1b), enjoyment (H1c), perceived QoE (H1d), but will also lead to increased motion sickness (H1e).

H2.

Compared to stereophonic audio, the integration of spatial audio recordings into a virtual environment will result in higher levels of presence (H2a), place illusion (H2b), enjoyment (H2c), perceived QoE (H2d), but will also lead to increased motion sickness (H2e).

To answer the research question and test the hypotheses, a quantitative laboratory study was conducted with participants experiencing opera scenes in an AV application, applying a within-subjects design. The following sections describe the study method in detail, followed by the results and implications for the design of AV applications.

2. Materials and Methods

2.1. Participants

A total of 30 participants (17 female and 13 male) aged between 19 and 43 years (M = 26.27; SD = 7.66) took part in the experiment. The sample consisted of interdisciplinary researchers related to the Chemnitz University of Technology, as well as interested participants from the surrounding area. They had no involvement in the project or prior knowledge of the study. Participants were all informed about the possible occurrence of motion sickness due to the use of an HMD. Diagnosed epilepsy was an exclusion criterion.

Of the participants, 25 (83.3%) had used an HMD before; however, only two (6.7%) reported using one regularly (at least once a month). Only five of the 30 participants (16.7%) regularly attended real-world stage performances (such as theater and opera), while eight (26.7%) reported visiting cultural stage events once a year or less. Eleven participants (36.7%) wore visual aids during the trial, each reporting full visual acuity.

2.2. Virtual Environment

Using photogrammetry techniques, the VE with six degrees of freedom was constructed as a digital replica of a German opera house. Participants were seated in the center of a row, positioned seven meters from the stage, the same distance at which the videos were recorded. This seating position remained unchanged throughout the experiment. No non-player character avatars or a characteristic soundscape of the opera hall were integrated into the seating area to avoid distraction of participants and ensure full focus on the video and audio recordings. The digital replica of the stage was extended by a virtual screen object. On this screen, video recordings from the operetta “The Merry Widow”, shot at the real opera house, were seamlessly integrated to make the stage content look like part of the VE, rather than on a cinema screen. Figure 1 shows a schematic visualization of the AV application. Participants’ (visualized by the avatar) ego perspective included both the virtual model of the opera and the real-world video recording [53].

2.3. Technical Setup

The prototype AV application was developed using Unity 2021.3.0f1 (Unity Technologies, San Francisco, CA, USA). It ran on a Fujitsu Celsius H980 laptop (Fujitsu Limited, Tokyo, Japan) equipped with an Intel i7-8850H processor (Intel Corporation, Santa Clara, CA, USA) and an Nvidia Quadro P4200 graphics card (Nvidia Corporation, Santa Clara, CA, USA). The experiment was conducted in 2023 using the Meta Quest 2 HMD (Meta Platforms Inc., Menlo Park, CA, USA), as it was the most widely used VR headset for home use at that time. It provided a render resolution of 1832 × 1920 pixels per eye and was connected to the laptop via an Oculus Link cable. The VE was rendered at 72 frames per second. Given the field of view of the Meta Quest 2, the entire stage and part of the surrounding VE were visible without the need to turn one’s head.

The video was recorded using eight Sony RX0 II cameras (Sony Corporation, Tokyo, Japan), enabling recordings with 4K resolution, arranged in a stereoscopic camera array at a height of 140 cm. Afterward, the recordings were rendered to create a 3D video (stereo base = 12 cm). Due to the use case of an opera experience, the videos were shot at a large distance from the stage (seven meters), requiring hyperstereoscopic video to create a sufficient depth effect. Therefore, the stereo base of 12 cm, which exceeds the natural human stereo base, was chosen to ensure appropriate depth perception.
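To give a rough sense of why a wider stereo base matters at this distance, the following sketch compares the angle that the two viewpoints subtend at a point on the stage for the 12 cm camera base and for a typical human eye separation. The 7 m distance and the 12 cm base come from the text; the 6.5 cm interpupillary distance is an assumed typical value, and the function name is hypothetical. This is a geometric illustration only, not the authors' calibration procedure.

```python
import math


def convergence_angle_deg(stereo_base_m, distance_m):
    """Angle (in degrees) subtended at a point by two viewpoints stereo_base_m apart."""
    return math.degrees(2.0 * math.atan(stereo_base_m / (2.0 * distance_m)))


DISTANCE_M = 7.0       # camera-to-stage distance reported in the text
CAMERA_BASE_M = 0.12   # stereo base reported in the text
HUMAN_BASE_M = 0.065   # assumed typical interpupillary distance (not from the source)

camera_angle = convergence_angle_deg(CAMERA_BASE_M, DISTANCE_M)
human_angle = convergence_angle_deg(HUMAN_BASE_M, DISTANCE_M)
ratio = camera_angle / human_angle

print(f"camera base: {camera_angle:.3f} deg, human base: {human_angle:.3f} deg")
print(f"ratio: {ratio:.2f}")
```

At seven meters both angles are below about one degree, so the small-angle approximation holds and the ratio is close to the ratio of the two bases. Under the assumed eye separation, the 12 cm base yields a bit less than twice the angular separation of natural viewing.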

Both videos—2D and 3D—were then integrated into the VE by mapping them to the texture of the virtual screen object with a resolution of 1920 × 1080 pixels. The frame rate of the video rendering was set to 30 frames per second. Both settings—screen resolution and frame rate—were based on pre-tests that showed a subjectively satisfactory level of video detail and no apparent restrictions on the fluidity of the video, while ensuring manageable file sizes and good overall VE performance.

With regard to the audio recordings, all performers were recorded individually. In addition, the orchestra and the hall were picked up via stereo miking at the orchestra pit. The final stereo mix was created via Samplitude Pro X7 (Magix Software GmbH, Berlin, Germany) and then rendered in Unity in stereo and in spatial audio, resulting in two audio rendering versions. For the stereo condition, a consistent auditory experience was created regardless of head movement, whereas in the spatial audio condition, sound sources were rendered with head tracking, allowing for realistic spatial localization and direction-dependent sound intensity. To ensure high audio quality despite the relatively limited audio capabilities of the Meta Quest 2, external Bluetooth headphones (JBL Live 660; JBL, Stamford, CT, USA) were used for audio playback within the experiment.

2.4. Experimental Design

To test the hypotheses, a 2 × 2 within-subjects design was applied. We varied the recordings of an opera that were integrated on the virtual stage by two independent variables: video dimensionality (2D (V_2D) vs. 3D (V_3D)) and audio rendering (stereophonic (A_ST) vs. spatial (A_SP)). This experimental design resulted in four scenarios that all participants experienced in randomized order after a baseline condition (cf. Section 2.6). To counterbalance the order, a Latin square was applied, resulting in 24 sequences of the four experimental scenarios. With 30 participants, six sequences therefore had to be used twice; these were selected to minimize sequence overlap. Apart from the recording features, neither the VE nor the content of the integrated real-world recordings changed between the scenarios. The study design was reviewed and approved by the ethics committee of Chemnitz under the reference number #101558791.
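The counterbalancing arithmetic can be sketched in a few lines. Four scenarios allow 4! = 24 distinct orders, so 30 participants require six orders to be reused. The sketch below is a minimal illustration under the assumption that reused orders are taken by simple wrap-around; the study instead selected the six repeated sequences to minimize overlap, and the function name and scenario labels are hypothetical.

```python
from itertools import permutations

# Scenario labels follow the condition codes used in the text.
SCENARIOS = ("V_2D+A_ST", "V_2D+A_SP", "V_3D+A_ST", "V_3D+A_SP")

def assign_sequences(n_participants, scenarios=SCENARIOS):
    """Return one presentation order per participant.

    Every possible order is used once before any order is reused.
    ASSUMPTION: reuse wraps around in enumeration order, which is
    not the overlap-minimizing selection described in the study.
    """
    orders = list(permutations(scenarios))
    return [orders[i % len(orders)] for i in range(n_participants)]

schedule = assign_sequences(30)
```

With 30 participants, `schedule` contains all 24 orders, six of which occur twice.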

2.5. Data Collection

Several dependent variables were applied. In addition to the subjective quantitative measurement tools, qualitative data was collected at the end of each experiment in the form of a short-guided interview with open-ended questions.

2.5.1. Quantitative Measures

To measure the dependent variables, and therefore the relevant key dimensions of user experience as presented in Section 1.1, we aimed to use well-established, validated scales. However, due to the specific context of our AV application, we occasionally had to delete individual items or develop items on our own for content-related reasons. Through reliability analysis, we ensured internal consistency of the factors. Since we used a 2 × 2 within-subjects design, all participants had to complete four questionnaires. To minimize the burden and effort for the participants, we therefore included several single-item measures. Across all scales, participants answered 14 items following each of the four scenarios. Responses were given on a 7-point Likert scale, with varying anchor labels depending on the specific question. A complete list of the items is provided in Appendix A. German translations of the scales were used.

Presence: To measure the overarching sense of presence, we used the slightly adapted single-item measure by [54] as it has demonstrated good reliability, validity, and sensitivity across different levels of presence. We therefore followed the recommendation of a meta-analysis [22], which recommended this item as an efficient direct measure of presence due to its brevity and strong correlations with established questionnaires.

Place Illusion: To measure Place Illusion, we used three items from the TPI [36] subscale Spatial Presence. Given the specific characteristics of our AV environment, where real video recordings of an opera performance are integrated into a virtual opera house, we identified a need for additional items that specifically address the spatial integration of this video content. To address this specific aspect of Place Illusion, we developed two additional items. Both items directly addressed whether participants perceived the stage performance as being located within the same spatial environment as themselves: “How much did it seem as if you and the actors were together in the same place?” and “How much did it seem as if the actors and requisites were in the virtual space?” As the third item of the TPI subscale already addresses the impression of a shared space based on auditory integration, the strong emphasis on audio design is offset by the two additional items. To ensure adequate scale quality, we undertook reliability and validity assessments as recommended by [55]. Confirmatory factor analysis (CFA) was used to evaluate the factor structure of the scale (cf. Section 2.7). As an example, we report the results for scenario ‘V_3D + A_SP’. The suitability of the data for factor analysis was checked using the Kaiser-Meyer-Olkin (KMO) test. Following Kaiser [56], the KMO value indicated ‘meritorious’ suitability of the data for factor analysis (KMO = 0.81). Analysis of the eigenvalues and the scree plot supported a one-factor structure of the five items. The specified measurement model for one latent factor showed excellent fit for the data (p(χ²) = 0.347, CFI = 0.992, TLI = 0.983, RMSEA = 0.063). We performed a reliability analysis to assess the internal consistency of the scale by calculating Cronbach’s Alpha and McDonald’s Omega [57] for each of the scenarios. The lowest internal consistency measured was α = 0.83 and ω = 0.81, both demonstrating good internal consistency [58].
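For readers unfamiliar with the reliability coefficient reported here, the following minimal sketch shows how Cronbach’s Alpha is computed from item-level responses. It uses the standard formula based on item variances and the variance of the sum score; the function name and the demo ratings are hypothetical and are not study data.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item-score lists."""
    k = len(responses[0])                 # number of items in the scale
    item_columns = list(zip(*responses))  # transpose: one column per item
    item_var_sum = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical 7-point ratings (4 participants x 3 items), not study data.
demo = [[5, 6, 5], [3, 3, 4], [6, 7, 6], [2, 3, 2]]
alpha = cronbach_alpha(demo)
```

The coefficient approaches 1 when the items rise and fall together across participants, which is the pattern expected for items measuring a single latent factor.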

Enjoyment: We included the scale of Enjoyment by [11] that was developed based on [38] to measure the overall affinity to the experiences. Reliability analysis demonstrated high internal consistency, with the lowest measure being α = 0.92 and ω = 0.92.

QoE: To measure the direct effect of different audio and video representations on the perceived quality of experience, we used a single-item: “How would you rate the overall quality of the experienced stage show?” The item was designed based on the work of [59,60], who both used single-measures to examine QoE.

Motion Sickness: Motion Sickness was measured by a single-item capturing general physical discomfort, which allows for a concise yet comprehensive assessment of motion-related unease across all scenarios. It was developed based on the first Item of the cybersickness scale by [61].

2.5.2. Qualitative Evaluation

After the experiment, short semi-structured interviews were conducted. As we aimed for unaffected opinions to achieve a better understanding of user requirements, we did not ask directly about stereoscopy or spatial audio. The participants answered, amongst others, the following questions (translated here from German to English) verbally:

How would the application have to change to give you added value?
How important was the video quality of the performance to you?
How important was the audio quality of the performance to you?

When respondents indicated that video or audio quality was relevant to their user experience, follow-up questions were asked to better understand which aspects of video and audio design were particularly relevant.

2.6. Procedure

After arriving at the laboratory, participants received written and oral instructions describing the study procedure and were asked to sign consent forms. Subsequently, they completed a pre-survey that collected demographic data and asked about prior experience with XR technologies. Afterwards, spatial-hearing and stereo-vision tests were conducted to ensure that participants were able to experience the variations between scenarios. For testing stereo-vision, we used the Lang Stereo Test II [62]. To our knowledge, there is no standardized test for measuring spatial-hearing. We therefore tested participants’ spatial-hearing by presenting them with three different 3D audio recordings in which the position of the sound source changed in three-dimensional space. The test was passed if the participants were able to identify the direction of the moving audio source. JBL Live 660 headphones were used for the spatial-hearing test. Participants were then instructed on the use of HMDs and were given the opportunity to explore and familiarize themselves with the VE in a baseline scenario. In this scenario, participants were immersed in the virtual opera environment, including an empty virtual stage (Figure 2). In this condition, the brightness of the virtual opera hall was higher than in the later experimental scenarios, allowing participants to familiarize themselves with the virtual space. The primary goal of this baseline scenario was to mitigate the novelty effect and allow novice XR users to adapt to the new experience. Following this introduction, which lasted a minimum of 90 s but could be extended at the participants’ request, participants completed a brief follow-up questionnaire regarding motion sickness on a tablet.

They were then asked to put on the HMD again and watch the first of four video clips of the opera scene in randomized order. As mentioned above, all four video clips showed the same scene of the opera and differed only in their recording modalities (video dimensionality and audio rendering; cf. Section 2.4). Each video lasted three minutes, a time span consistent with the results of [63], which suggest that participants need at least three minutes for a sense of presence to emerge. Participants were reminded that they were free to move and look around during the performance while remaining seated.

After each of the four scenarios, participants completed a follow-up questionnaire assessing their subjective impressions (cf. Section 2.5.1) and were able to take a break to reduce the possibility of motion sickness. Towards the end of the experiment, a short semi-structured interview was conducted in which participants answered open-ended questions about their perception of the audio and video recordings and possibilities for enhancing the experience. In total, the experiment lasted approximately 50 min. Participants were paid 15 euros for their participation.

2.7. Data Analysis

Statistical analyses were conducted using IBM SPSS Statistics (version 30.0; IBM Corporation, Armonk, NY, USA). Prior to the main analyses, the dataset was screened for serious outliers and unexpected patterns within individual items. Although the Shapiro–Wilk test indicated that the data for the variables Presence, QoE and Motion Sickness deviated from normality, two-way repeated measures ANOVA was performed. This approach is justified by simulation-based evidence suggesting that repeated-measures ANOVA is robust to violations of the normality assumption as long as the sphericity assumption is met [64]. A priori power analysis for a two-way repeated measures ANOVA (effect size f = 0.25, α = 0.05, power of 1 − β = 0.90) indicated a minimum sample size of 30 participants, which was achieved. The effect size used for the calculation was based on a meta-analysis [24], which showed a medium-sized effect of both stereoscopy and audio stimuli on the user’s presence in virtual environments. Two-way repeated measures ANOVAs with the factors ‘video’ (2D (V_2D) vs. 3D (V_3D)) and ‘audio’ (stereophonic (A_ST) vs. spatial (A_SP)) were conducted for all dependent variables (see Section 2.5.1) to test for differences between the audio and video formats as described in our hypotheses as well as to compare relative effects of both independent variables. Significant effects were further examined using paired-samples t-tests to evaluate simple effects.
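The structure of this analysis can be illustrated without statistical software. In a fully within-subjects 2 × 2 design, each main effect and the interaction has one numerator degree of freedom, so each F statistic equals the squared one-sample t statistic of a per-participant contrast, with (1, n − 1) degrees of freedom. The sketch below relies on this equivalence; it is not the SPSS procedure used in the study, and the function names and demo ratings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def contrast_f(contrast_scores):
    """F(1, n-1) for a one-df within-subjects effect, from per-participant contrasts."""
    n = len(contrast_scores)
    t = mean(contrast_scores) / (stdev(contrast_scores) / sqrt(n))
    return t ** 2, (1, n - 1)

def rm_anova_2x2(ratings):
    """ratings: one (V_2D+A_ST, V_2D+A_SP, V_3D+A_ST, V_3D+A_SP) tuple per participant."""
    video = [(c + d) / 2 - (a + b) / 2 for a, b, c, d in ratings]
    audio = [(b + d) / 2 - (a + c) / 2 for a, b, c, d in ratings]
    interaction = [(d - c) - (b - a) for a, b, c, d in ratings]
    return {
        "video": contrast_f(video),
        "audio": contrast_f(audio),
        "video x audio": contrast_f(interaction),
    }

# Hypothetical 7-point ratings for four participants (not study data).
demo = [(3, 4, 3, 5), (4, 5, 4, 5), (2, 4, 3, 4), (5, 5, 4, 6)]
result = rm_anova_2x2(demo)
```

Because each factor has only two levels, sphericity holds trivially for every effect. The sketch does not handle the degenerate case in which all contrast scores are identical, where the standard deviation is zero.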

The statistical software R (version 4.5.0; R Foundation for Statistical Computing, Vienna, Austria) was used to specify a measurement model by structural equation modeling. The dimensionality of the scale was determined by the eigenvalue > 1 criterion and scree plot analysis. The package “lavaan” [65] was applied using maximum likelihood estimation.

The qualitative data were transcribed using MAXQDA 2020 analysis software (VERBI Software, Berlin, Germany) and afterwards coded and categorized using qualitative content analysis [66]. The categories were initially created deductively and are based on the research and interview questions. These formed the main categories of the analysis and the basis for the first coding run. In a second step, all text passages in the main category were assigned to subcategories according to their content. The subcategories were therefore created inductively, as they were derived from the collected material and thus contribute to the differentiation of the category catalog. For example, the different aspects of video and audio features mentioned by the participants served as subcategories. Finally, the interviews were coded in their entirety using the differentiated category system. Text passages were assigned to several categories if it seemed appropriate based on their content. As the interviews also included questions relevant to the user-centered development process of the AV application, only answers to the relevant questions are reported in the results.

3. Results

3.1. Quantitative Data

3.1.1. Descriptive Statistics

Table 1 shows means and standard deviations for all dependent variables. Across all four experimental conditions, the condition with 2D video and stereophonic audio rendering (V_2D + A_ST) consistently resulted in the lowest mean values on all experience-related measures (Presence, Place Illusion, Enjoyment and QoE). Motion Sickness scores were low in all four conditions.

3.1.2. Effects of Video and Audio

The two-way repeated-measures ANOVA showed no significant main effect of ‘video’ for Presence (F(1, 29) = 0.40, p = 0.531, partial η² = 0.014), Place Illusion (F(1, 29) = 0.05, p = 0.824, partial η² = 0.002), Enjoyment (F(1, 29) = 3.365, p = 0.081, partial η² = 0.101), QoE (F(1, 29) = 1.481, p = 0.233, partial η² = 0.049) or Motion Sickness (F(1, 29) = 0.01, p = 0.914, partial η² < 0.001).

Also, no significant main effect of ‘audio’ was found for Presence (F(1, 29) = 0.11, p = 0.745, partial η² = 0.004), Enjoyment (F(1, 29) = 1.51, p = 0.229, partial η² = 0.050), QoE (F(1, 29) = 1.71, p = 0.202, partial η² = 0.056) or Motion Sickness (F(1, 29) = 0.70, p = 0.409, partial η² = 0.024). However, for Place Illusion, a significant main effect of ‘audio’ was found (F(1, 29) = 14.20, p < 0.001, partial η² = 0.329), as can be seen in Figure 3.
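The reported effect sizes follow directly from the F statistics and their degrees of freedom, which makes them easy to check. The sketch below applies the standard conversion, partial η² = (F · df_effect) / (F · df_effect + df_error), to the main effect of ‘audio’ on Place Illusion. The function names are hypothetical, and the verbal labels assume Cohen’s conventional benchmarks of 0.01, 0.06, and 0.14, which the text does not state explicitly.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Convert an F statistic and its degrees of freedom to partial eta squared."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

def cohen_label(eta_p2):
    """Verbal label; ASSUMPTION: Cohen's conventional 0.01 / 0.06 / 0.14 benchmarks."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"

# Main effect of 'audio' on Place Illusion: F(1, 29) = 14.20 as reported above.
place_illusion_audio = partial_eta_squared(14.20, 1, 29)
label = cohen_label(place_illusion_audio)
```

Rounded to three decimals, the computed value matches the reported partial η² of 0.329.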

We found a significant interaction effect between the two factors ‘audio’ and ‘video’ on the perceived QoE of the experienced stage performance (F(1, 29) = 4.27, p = 0.048, partial η² = 0.128). This indicates that the different audio features only influence the perceived QoE when combined with the 2D video (see Figure 4). Bonferroni-adjusted post hoc analysis revealed significantly higher QoE in the A_SP conditions than in the A_ST conditions for V_2D (M_Diff = 0.47, 95% CI [0.054, 0.880], p = 0.028). In condition V_3D, the analysis revealed no significant effect for ‘audio’ on QoE (M_Diff = −0.100, 95% CI [−0.52, 0.32], p = 0.630). This is consistent with the descriptive data, which indicates that the combination of stereophonic audio and 2D video was perceived as a particularly poor combination regarding QoE. There were no significant interaction effects for the other dependent variables.

3.2. Qualitative Results

In the following, we report relevant results of the analysis of the qualitative interview data. Numbers in square brackets refer to the individual ID of participants.

When asked about important aspects of video quality, only a few participants mentioned stereoscopy (n = 5). One person [P12] referred to the experiment and said, “The thing that I thought added a lot of value was the impression of depth on stage.” Most participants (n = 22) wished for better video quality and resolution. Because they were seated seven meters away from the stage, they could not see all the performers’ facial expressions, which was mentioned as a shortcoming. The video dimensionality was of secondary importance to many, as long as the quality itself was perceived as sufficient. Participants justified the relevance of good video quality by noting that it is crucial for a pleasant and less strenuous viewing experience, especially during prolonged use.

The interviews revealed a variety of opinions regarding the importance of audio quality. Some users found audio quality to be “extremely important” [P10] or even “more relevant than video quality” [P1; P23], while others considered it less important or even “significantly less important” [P12], especially when compared to spatial vision. The diversity of opinions highlights the varying priorities of users regarding audio and video quality in a virtual environment for theater and opera experiences. Some participants (n = 8) placed significant value on spatial audio in AV applications for theater and opera experiences: “It definitely enhanced the experience” [P9]. Aspects such as the spatial distribution of the orchestra and stage, clear sound reproduction, appropriate volume, and a balanced ratio between vocals and music were particularly emphasized. Furthermore, it was mentioned that spatial audio rendering should not only affect the sound of the stage performance but also the interaction with other avatars in a multiplayer immersive virtual environment, as well as the soundscape of an opera.

When asked about general improvements to the user experience, many participants expressed a desire for social interactions and embodiment through avatars as well as increased interactivity with the environment through more actions and options. Regarding the integration of real recordings, here as well, many participants (n = 15) expressed the desire for more clarity in the video recordings, so that the performers’ facial expressions and props can be perceived even at a greater distance and on larger stages. Table 2 provides an overview of the themes that were identified (n > 4) through qualitative content analysis. Besides these aspects, four participants perceived the spatial effect of the audio as too strong which was further explained by two of these participants mentioning that they missed the characteristic soundscape of an opera. This absence was notable in A_SP when they turned their heads, as there were no audio signals from the environment: “Then you noticed very clearly that you weren’t there. The empty hall definitely has its disadvantages. I found that very strange” [P13]. Finally, concerns regarding the physical comfort of current head-mounted displays were articulated (n = 3): “The glasses were very heavy, so wearing them for three hours straight would be difficult. Yes, I think I would use them more for shorter periods, maybe half an hour at a time” [P29].

4. Discussion

This study explored the effects of video dimensionality and audio rendering on user experience in an augmented virtuality (AV) application that integrates real-life recordings in a virtual environment. For this purpose, a virtual opera house was created using photogrammetry techniques. Within a 2 × 2 within-subjects design, we measured the impact of video dimensionality (2D vs. 3D videos) and audio rendering (stereophonic vs. spatial) on relevant key dimensions of the user experience. We found no significant differences between the 2D and 3D videos regarding their effects on any dimension of users’ subjective experience, though a medium effect size was observed for Enjoyment. As this effect did not reach significance, it needs to be replicated in further studies. Overall, the results did not support H1a–e, which supposed that stereoscopic visual features significantly improve key dimensions of user experience. The video dimensionality of integrated recordings thus had no significant effect on the user experience in the AV stage event setting, where users were seated at a distance of seven meters from the implemented video recordings. This result is supported by the analysis of the qualitative data.

It is conceivable that due to the relatively large distance to the integrated videos in the VE (seven meters), which is a realistic seating position within an opera hall, the positive effects of stereoscopy were not sufficiently effective due to reduced parallax cues, even though we already applied a hyper-stereoscopy condition that should have resulted in a higher perception of depth compared to the natural human stereoscopic vision. Another reason why we did not find an effect of video dimensionality on presence may be explained by the already high immersion of the VE that could be experienced through the HMD. The VE used in our study was developed by photogrammetry, giving participants the feeling of being in a real opera house. This could explain the difference in our study results from those of previous studies, which were conducted in front of TV screens [24,25,26,27]. Within the VE, the dimensionality of integrated real-life video recordings does not appear to be as influential as it is on TV screens. This is in line with results from [31], who did not find a significant difference between monoscopic and stereoscopic visualization on the feeling of presence within VEs.

To answer H2a–e, we compared stereophonic audio (A_ST) with spatial audio (A_SP). In scenarios with A_SP, participants could experience spatial hearing as the sound adapted to their (head) movements. Results of our study showed that spatial sound resulted in higher Place Illusion compared to stereophonic sound alone. The partial eta squared indicated a large effect of spatial sound on Place Illusion. This supported hypothesis H2b, demonstrating the theoretically derived importance of spatial audio for creating a sense of “being there” in the virtual opera house. This aligned with theoretical predictions that binaural rendering mirrors real-world auditory experiences and that sound should match the three-dimensional characteristics of the virtual environment [50]. The qualitative data further emphasized the importance of spatial audio for participants’ user experience. The remaining hypotheses regarding spatial sound were, however, not supported by data, as there were no other significant differences between stereophonic and spatial sound found in our study. Therefore, our results empirically underline the theoretical considerations of [22], indicating that Place Illusion is directly affected by immersive features such as spatial audio, while overall presence, as a higher-order construct, is affected by additional subdimensions such as social presence illusion and plausibility illusion. This probably lowers the direct effect of a single predictor, such as immersion, on overall Presence. Following this, unsurprisingly, the impact of a single immersive feature on even more distal and broader constructs like Enjoyment or QoE was also not significant.

Comparing the main effects of video dimensionality and audio rendering on user experience in an AV application, the effect of audio rendering exceeded the effect of video dimensionality regarding Place Illusion.

The analysis of the interviews suggested that personal preferences are a key factor in how users differ in their perceptions of content consumption in VE. The overall picture that emerges is that the design choice of appropriate audio and video features should be based on both context and requirements, as needs may vary. Overall, the participants’ responses clearly indicate a demand for high video quality to ensure an optimal and immersive opera experience in the AV application. In the use case of opera, where both musical performance and performers’ acting contribute to the experience, high-resolution recordings may be more relevant than stereoscopic video features. Due to the distance to the stage from a typical seating position within an opera concert hall, visual details of performers appear relatively small, reducing the perceptual salience of 3D video effects. In contrast, the results show that spatial audio cues are perceptually effective even at large viewing distances, as they dynamically respond to head movements and provide directional information that scales with the listener’s position. This suggests that, in distributed AV applications with fixed seating, auditory spatialization has a greater impact on Place Illusion than visual dimensionality, emphasizing the comparatively stronger contribution of spatial audio in this context.

4.1. Design Recommendations

Our findings have implications for the resource-efficient design of AV applications in cultural settings. It is important to emphasize that these recommendations apply specifically to the opera use case examined in this study, as well as to applications with a similar technical viewing geometry. The integration of real-world content into virtual environments must be tailored to the specific use case and user requirements of each cultural application. Different performing arts contexts, such as music performances, dance productions, or experimental theater, may place different demands on video and audio features, and audience needs and expectations may vary accordingly [21].

The lack of significant effects for stereoscopic video suggests that the considerable effort and resources required for 3D video production and post-processing may not be justified within this context. High-quality 2D recordings appear to be sufficient for delivering an immersive opera experience in AV, particularly when combined with a highly immersive virtual environment. This was also emphasized by the qualitative analysis, in which participants expressed a greater demand for video quality than for stereoscopic effects. However, these findings contradict those of a meta-analysis [24], which found exactly the opposite. We assume that these findings cannot be transferred to the specific context of cultural AV applications, as the majority of studies included in the meta-analysis used large-screen displays, projection systems, or desktop monitors rather than HMDs.

Therefore, several practical advantages can be derived from our findings: First, 2D production is substantially less resource-intensive in terms of recording equipment, post-production workflow, and computational requirements for real-time rendering, which offers even smaller cultural institutions with little budget the chance to benefit from this technology solution. Second, and perhaps most importantly, 2D video enables live streaming capabilities, allowing for real-time transmission of performances into the virtual event venues. This would be significantly more challenging with stereoscopic 3D content due to the necessary data rates. Third, this approach enables a sustainable production model where resources can be invested once in creating a high-quality, interactive virtual environment (e.g., through photogrammetry of the venue), which can then be reused for multiple performing arts productions over time.

For audio content, our findings present a more nuanced picture. From a practical significance perspective, spatial audio demonstrated a large effect on Place Illusion (partial η² = 0.329), representing a substantial, user-perceptible enhancement. However, it did not significantly affect other key dimensions of user experience, such as Presence, Enjoyment, or QoE. This suggests that while spatial audio contributes to enabling an immersive experience, its impact may be more specific than previously assumed. From a resource allocation perspective, we recommend implementing spatial audio for the performance content, as it was demonstrated to enhance Place Illusion, a core component of a virtual experience. The investment in spatial audio rendering appears justified, given its positive effect, particularly as spatial audio can simultaneously serve multiple purposes within the application: enhancing the performance content, supporting social features in multiplayer environments through spatially accurate voice communication, and enhancing the impression of being in an actual event location by implementing characteristic soundscapes. This multi-purpose functionality maximizes the return on development investment.

In summary, our findings suggest that for opera AV applications, resources should be prioritized as follows: (i) invest in creating a high-quality, reusable VE (e.g., through photogrammetry), (ii) implement high-resolution monoscopic video recording with live streaming capability, and (iii) incorporate spatial audio rendering to enhance Place Illusion and, where applicable, to support social interaction in multi-user settings, thereby fostering both cultural and social participation.

Beyond the audio and video features focused on in this paper, there are, of course, other factors that can influence and enhance the experience of a virtual cultural application. As the qualitative analysis (cf. Section 3.2) and the literature [4,67] show, these include opportunities for interaction with the VE as well as social interaction and embodiment through avatars, and, last but not least, the control of navigation. We therefore strongly recommend adopting a user-centered design approach [34] when developing AV applications for cultural contexts. Through iterative user research methods such as interviews, focus groups, and usability testing with target audiences, developers can identify the specific requirements and preferences that should guide design decisions. This approach ensures that technical choices regarding media integration are grounded in empirical evidence about what matters most to users in a given context, rather than being solely technology-driven.

4.2. Limitations and Future Research

Our study has some limitations that should be considered when interpreting the results and planning future research on the topic: In this study, we examined the use of AV in the context of cultural stage events by integrating real-world opera recordings into a virtual environment. The results of the study are therefore not applicable to other formats, such as virtual concerts, where users assume a standing position and/or dance. Future research, therefore, may investigate how different audio and video features affect user experience in AV settings where participants can move freely and therefore change their position to the visual and auditory cues. In these interactive contexts, it may be useful to additionally integrate objective measures—such as position tracking and interaction patterns—to complement self-report assessments. In this study, we also did not integrate social interactivity and an environmental soundscape. However, this may have influenced the results as these aspects would generate new sound sources located in different locations within the virtual room, which in turn could further increase the impact of spatial audio [48]. Further research should therefore use multi-user social AV applications.

We used hyper-stereoscopic videos with a stereo base of 12 cm to compensate for the relatively large distance to the stage, and thus to the integrated video recordings, of seven meters. Data from the conducted ANOVA showed that these hyper-stereoscopic videos did not influence motion sickness. The observed effects of 2D vs. 3D videos may nonetheless have been smaller than in comparable studies, where stereoscopic video content filled the entire field of view [25,26,27]. In addition, qualitative interview data indicated that the quality of the embedded videos was not satisfactory, both in the monoscopic and stereoscopic video conditions, with a resolution of 1920 × 1080, when viewed over the Meta Quest 2. We chose these technical specifications as they represent a realistic implementation scenario for the use case, reflecting the hardware capabilities commonly available for private home use at the time of the study. In this context, it is noteworthy that while the video format showed no significant main effect on Enjoyment (F(1, 29) = 3.365, p = 0.081, partial η² = 0.101), the medium effect size observed in this exploratory comparison suggests a potential influence of 3D video that may become detectable with advanced HMDs or larger sample sizes. To replicate and validate the effects of video dimensionality and audio rendering found in this study, therefore, more advanced HMDs with better lenses and correspondingly higher image and video quality should be used. Future research could investigate the effects of higher-resolution recordings to determine whether improved video quality might reveal effects that were not detected in this study under the current technical constraints. When using the Meta Quest 2 with the external over-ear headphones, some participants reported a sense of confinement. Therefore, the hardware may have caused some discomfort. For future studies, we plan on using HMDs with superior sound integration and reduced weight.

With regard to the measurement of constructs, single items were used for some constructs in the present study. Single items may have reduced content validity when used to represent a multidimensional construct [68]. We therefore used single items that queried the overall impression, rather than specific indicators of the respective constructs. Given the within-subjects design with repeated measurement, this approach avoids several drawbacks of multi-item scales, e.g., it reduces the effort and time participants need to answer the items [68]. However, finer-grained assessment might be useful to detect small effects and should be used in future research when the trade-off with data minimization is deemed acceptable.

Finally, although participants with special needs are expected to benefit from the accessibility afforded by AV opera performances, the current study did not include this group in its sample. Future research should systematically include this user group to understand its specific requirements and to ensure that the potential benefits of the technology can be fully realized.

5. Conclusions

This study examined the integration of real-world recordings into an AV application for consuming cultural content, as well as the potential variations in audio and video features and their impact on key dimensions of the user experience. The findings highlight the importance of spatial audio for the perception of Place Illusion, while Enjoyment, Presence, and perceived QoE were not significantly affected. The 3D video did not yield significant improvements in user experience in comparison to the 2D video, while both formats were perceived as insufficient in visual quality. The results indicate that when developing AV applications, it is essential to have a clear understanding of the intended use case and the user group. Although immersive features are important in the context of virtual environments, the quality of the audio and video material is particularly important for delivering a satisfying user experience when implementing virtual stage events.

Author Contributions

Conceptualization, S.P. and F.D.; methodology, S.P.; formal analysis, S.P.; data curation, S.P.; writing—original draft preparation, S.P. and F.L.; writing—review and editing, A.C.B. and F.D.; visualization, S.P. and F.D.; supervision, A.C.B. and F.L.; project administration, F.D.; funding acquisition, F.D. and F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research took place within the scope of the project “SocialSTAGE-VR” (funding code 16SV8772) supported by the German Federal Ministry of Research, Technology and Space. The funding agency had no influence on study design, data acquisition, data analysis and interpretation, or the authoring and submission of this paper.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

We thank our research partners from the Fraunhofer Institute for Integrated Circuits IIS and Die Etagen GmbH for the implementation of the technical setup. Furthermore, we thank Sebastian Meisner for his valuable support in data collection and study preparation and the ethics committee of Chemnitz for the ethical approval.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

AR	Augmented reality
AV	Augmented virtuality
HMD	Head-mounted display
MR	Mixed reality
VE	Virtual environment
VR	Virtual reality
QoE	Quality of Experience

Appendix A

Dependent Variables

Table A1. Quantitative Measures.

Variable	Source	Item	Anchors (1–7)
Presence	[54]	To what extent did you feel present in the virtual environment?	1–7
Place Illusion	TPI [36]	How much did it seem as if the objects and people you saw/heard had come to the place you were?	not at all–very much
	TPI	To what extent did you experience a sense of being there inside the environment you saw/heard?	not at all–very much
	TPI	To what extent did it seem that sounds came from specific different locations?	not at all–very much
	own	How much did it seem as if you and the actors were together in the same place?	not at all–very much
	own	How much did it seem as if the actors and requisites were in the virtual space?	not at all–very much
Enjoyment	[11]	The experience I just had was for me … entertaining	not at all–extremely
		… interesting	not at all–extremely
		… enjoyable	not at all–extremely
		… fun	not at all–extremely
		… exciting	not at all–extremely
		… satisfying	not at all–extremely
QoE	own	How would you rate the overall quality of the experienced stage event you just saw?	unpleasant–pleasant
Motion Sickness	own	To what extent did you feel physically uncomfortable during the media experience?	not at all–extremely

References

1. Sullivan, E. Live to Your Living Room: Streamed Theatre, Audience Experience, and the Globe’s A Midsummer Night’s Dream. J. Audience Recept. Stud. 2020, 17, 92–119.
2. Walmsley, B. Why People Go to the Theatre: A Qualitative Study of Audience Motivation. J. Cust. Behav. 2011, 10, 335–351.
3. Slater, M. Place Illusion and Plausibility Can Lead to Realistic Behaviour in Immersive Virtual Environments. Philos. Trans. R. Soc. B Biol. Sci. 2009, 364, 3549–3557.
4. Han, D.-I.D.; Melissen, F.; Haggis-Burridge, M. Immersive Experience Framework: A Delphi Approach. Behav. Inf. Technol. 2024, 43, 623–639.
5. Li, J.; Subramanyam, S.; Jansen, J.; Mei, Y.; Reimat, I.; Ławicka, K.; Cesar, P. Evaluating the User Experience of a Photorealistic Social VR Movie. In Proceedings of the 2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR); IEEE: New York, NY, USA, 2021; pp. 284–293.
6. Pidel, C.; Ackermann, P. Collaboration in Virtual and Augmented Reality: A Systematic Overview. In Augmented Reality, Virtual Reality, and Computer Graphics; De Paolis, L.T., Bourdot, P., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2020; Volume 12242, pp. 141–156.
7. Montagud, M.; Segura-Garcia, J.; De Rus, J.A.; Jordán, R.F. Towards an Immersive and Accessible Virtual Reconstruction of Theaters from the Early Modern: Bringing Back Cultural Heritage from the Past. In Proceedings of the 2020 ACM International Conference on Interactive Media Experiences; Association for Computing Machinery: New York, NY, USA, 2020; pp. 143–147.
8. Wang, F.; Zhang, Z.; Li, L.; Long, S. Virtual Reality and Augmented Reality in Artistic Expression: A Comprehensive Study of Innovative Technologies. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 2024, 15, 641–649.
9. Milgram, P.; Kishino, F. A Taxonomy of Mixed Reality Visual Displays. IEICE Trans. Inf. 1994, E77-D, 1321–1329.
10. Janer, J.; Gomez, E.; Martorell, A.; Miron, M.; de Wit, B. Immersive Orchestras: Audio Processing for Orchestral Music VR Content. In Proceedings of the 2016 8th International Conference on Games and Virtual Worlds for Serious Applications (VS-GAMES); IEEE: New York, NY, USA, 2016; pp. 1–2.
11. Shin, M.; Song, S.W.; Kim, S.J.; Biocca, F. The Effects of 3D Sound in a 360-Degree Live Concert Video on Social Presence, Parasocial Interaction, Enjoyment, and Intent of Financial Supportive Action. Int. J. Hum.-Comput. Stud. 2019, 126, 81–93.
12. Park, J.; Choi, Y.; Lee, K.M. Research Trends in Virtual Reality Music Concert Technology: A Systematic Literature Review. IEEE Trans. Vis. Comput. Graph. 2024, 30, 2195–2205.
13. Charron, J.-P. Music Audiences 3.0: Concert-Goers’ Psychological Motivations at the Dawn of Virtual Reality. Front. Psychol. 2017, 8, 800.
14. Cao, R.; Zou-Williams, L.; Cunningham, A.; Walsh, J.; Kohler, M.; Thomas, B.H. Comparing the Neuro-Physiological Effects of Cinematic Virtual Reality with 2D Monitors. In Proceedings of the 2021 IEEE Virtual Reality and 3D User Interfaces (VR); IEEE: New York, NY, USA, 2021; pp. 729–738.
15. Zhang, Y.; Zhou, H.; Jiang, Z.; Tang, Z.; Luo, T.; Lei, Q. Exploring Viewing Modalities in Cinematic Virtual Reality: A Systematic Review and Meta-Analysis of Challenges in Evaluating User Experience. Proc. ACM Hum.-Comput. Interact. 2025, 9, CSCW082:1–CSCW082:30.
16. Simpson, J. An Industry in Crisis: Virtual Mediums for Theatre and Live Performance. In Proceedings of the Extended Reality and Metaverse; Jung, T., Tom Dieck, M.C., Correia Loureiro, S.M., Eds.; Springer International Publishing: Cham, Switzerland, 2023; pp. 282–293.
17. Skarbez, R.; Smith, M.; Whitton, M.C. Revisiting Milgram and Kishino’s Reality-Virtuality Continuum. Front. Virtual Real. 2021, 2, 647997.
18. Zuniga Gonzalez, D.A.; Richards, D.; Bilgin, A.A. Making It Real: A Study of Augmented Virtuality on Presence and Enhanced Benefits of Study Stress Reduction Sessions. Int. J. Hum.-Comput. Stud. 2021, 147, 102579.
19. Shan, J.; Hu, Z.; Tao, P.; Wang, L.; Zhang, S.; Ji, S. Toward a Unified Theoretical Framework for Photogrammetry. Geo-Spat. Inf. Sci. 2020, 23, 75–86.
20. Kong, L.; Guo, X.; Liu, Y. The Impact of Digital Media, Virtual Reality, and Computer-Generated Art on Traditional Art Forms. SHS Web Conf. 2024, 183, 01004.
21. Winkler, L.-M.; Palige, S.; Zeiler, A.; Legler, F.; Bullinger, A.C. Why People Attend and How to Design for It: A Systematic Review of Attendance Motives to Inform User-Centred Design of Virtual Performing Arts Events. In Proceedings of the Mensch und Computer 2025; Association for Computing Machinery: New York, NY, USA, 2025; pp. 796–803.
22. Skarbez, R.; Brooks, F.P., Jr.; Whitton, M.C. A Survey of Presence and Related Concepts. ACM Comput. Surv. 2017, 50, 1–39.
23. Geris, A.; Cukurbasi, B.; Kilinc, M.; Teke, O. Balancing Performance and Comfort in Virtual Reality: A Study of FPS, Latency, and Batch Values. Softw. Pract. Exp. 2024, 54, 2336–2348.
24. Cummings, J.J.; Bailenson, J.N. How Immersive Is Enough? A Meta-Analysis of the Effect of Immersive Technology on User Presence. Media Psychol. 2016, 19, 272–309.
25. Freeman, J.; Avons, S.; Pearson, D.; IJsselsteijn, W. Effects of Sensory Information and Prior Experience on Direct Subjective Ratings of Presence. Presence 1999, 8, 1–13.
26. IJsselsteijn, W.; de Ridder, H.; Hamberg, R.; Bouwhuis, D.; Freeman, J. Perceived Depth and the Feeling of Presence in 3DTV. Displays 1998, 18, 207–214.
27. Rooney, B.; Benson, C.; Hennessy, E. The Apparent Reality of Movies and Emotional Arousal: A Study Using Physiological and Self-Report Measures. Poetics 2012, 40, 405–422.
28. Ling, Y.; Brinkman, W.-P.; Nefs, H.; Qu, C.; Heynderickx, I. Effects of Stereoscopic Viewing on Presence, Anxiety, and Cybersickness in a Virtual Reality Environment for Public Speaking. Presence Teleoperators Virtual Environ. 2012, 21, 254–267.
29. Keshavarz, B.; Hecht, H. Stereoscopic Viewing Enhances Visually Induced Motion Sickness but Sound Does Not. Presence Teleoperators Virtual Environ. 2012, 21, 213–228.
30. Kim, J.; Charbel-Salloum, A.; Perry, S.; Palmisano, S. Effects of Display Lag on Vection and Presence in the Oculus Rift HMD. Virtual Real. 2022, 26, 425–436.
31. Baños, R.; Botella, C.; Rubió, I.; Quero, S.; Garcia-Palacios, A.; Alcañiz Raya, M. Presence and Emotions in Virtual Environments: The Influence of Stereoscopy. Cyberpsychology Behav. Impact Internet Multimed. Virtual Real. Behav. Soc. 2008, 11, 1–8.
32. Ali, A.A.; Augusto, J.C.; Dafoulas, G. Investigating the Role Augmented Virtuality Has in Assessing Collaboration in Educational Environments. Educ. Media Int. 2025, 2025, 1–20.
33. Kim, Y.M.; Rhiu, I.; Yun, M.H. A Systematic Review of a Virtual Reality System from the Perspective of User Experience. Int. J. Hum.–Comput. Interact. 2020, 36, 893–910.
34. ISO 9241-210:2019; Ergonomics of Human-System Interaction—Part 210: Human-Centred Design for Interactive Systems. International Organization for Standardization: Geneva, Switzerland, 2019.
35. Cheiran, J.F.P.; Bandeira, D.R.; Pimenta, M.S. Measuring the Key Components of the User Experience in Immersive Virtual Reality Environments. Front. Virtual Real. 2025, 6, 1585614.
36. Lombard, M.; Ditton, T.; Weinstein, L. Measuring Presence: The Temple Presence Inventory. In Proceedings of the 12th International Workshop on Presence (PRESENCE’09), Los Angeles, CA, USA, 11–13 November 2009.
37. Ames, S.L.; Wolffsohn, J.S.; McBrien, N.A. The Development of a Symptom Questionnaire for Assessing Virtual Reality Viewing Using a Head-Mounted Display. Optom. Vis. Sci. 2005, 82, 168.
38. Tamborini, R.; Bowman, N.D.; Eden, A.; Grizzard, M.; Organ, A. Defining Media Enjoyment as the Satisfaction of Intrinsic Needs. J. Commun. 2010, 60, 758–777.
39. Sylaiou, S.; Mania, K.; Karoulis, A.; White, M. Exploring the Relationship between Presence and Enjoyment in a Virtual Museum. Int. J. Hum.-Comput. Stud. 2010, 68, 243–253.
40. Tussyadiah, I.P.; Wang, D.; Jung, T.H.; tom Dieck, M.C. Virtual Reality, Presence, and Attitude Change: Empirical Evidence from Tourism. Tour. Manag. 2018, 66, 140–154.
41. Kim, S.J.; Laine, T.H.; Suk, H.J. Presence Effects in Virtual Reality Based on User Characteristics: Attention, Enjoyment, and Memory. Electronics 2021, 10, 1051.
42. Vlahovic, S.; Suznjevic, M.; Skorin-Kapov, L. A Survey of Challenges and Methods for Quality of Experience Assessment of Interactive VR Applications. J. Multimodal User Interfaces 2022, 16, 257–291.
43. Enoch, J.; McDonald, L.; Jones, L.; Jones, P.R.; Crabb, D.P. Evaluating Whether Sight Is the Most Valued Sense. JAMA Ophthalmol. 2019, 137, 1317–1320.
44. Levi, D.M. Learning to See in Depth. Vis. Res. 2022, 200, 108082.
45. Schmidt, U. Digitale Film- Und Videotechnik. In Digitale Film- und Videotechnik; Carl Hanser Verlag GmbH & Co. KG: Hamburg, Germany, 2010; ISBN 978-3-446-42477-7.
46. Narciso, D.; Bessa, M.; Melo, M.; Coelho, A.; Vasconcelos-Raposo, J. Immersive 360° Video User Experience: Impact of Different Variables in the Sense of Presence and Cybersickness. Univers. Access Inf. Soc. 2019, 18, 77–87.
47. Beig, M.; Kapralos, B.; Collins, K.; Mirza-Babaei, P. An Introduction to Spatial Sound Rendering in Virtual Environments and Games. Comput. Game J. 2019, 8, 199–214.
48. de Haas, E.H.A.; Lee, L.-H. Deceiving Audio Design in Augmented Environments: A Systematic Review of Audio Effects in Augmented Reality. In Proceedings of the 2022 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct); IEEE: New York, NY, USA, 2022; pp. 36–43.
49. Serafin, S.; Geronazzo, M.; Erkut, C.; Nilsson, N.C.; Nordahl, R. Sonic Interactions in Virtual Reality: State of the Art, Current Challenges, and Future Directions. IEEE Comput. Graph. Appl. 2018, 38, 31–43.
50. Moore, D.R. Anatomy and Physiology of Binaural Hearing. Audiology 1991, 30, 125–134.
51. Assenmacher, I.; Kuhlen, T.; Lentz, T.; Vorländer, M. Integrating Real-Time Binaural Acoustics into VR Applications; The Eurographics Association: Eindhoven, The Netherlands, 2004.
52. Dicke, C.; Aaltonen, V.; Billinghurst, M. Simulator Sickness in Mobile Spatial Sound Spaces. In Proceedings of the Auditory Display; Ystad, S., Aramaki, M., Kronland-Martinet, R., Jensen, K., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 287–305.
53. Dittrich, F.; Legler, F.; Palige, S.; Bullinger, A.C.; Keinert, J.; Jaschke, T.; Josef, A.; Zeiler, A.; Krabbe, M. Kulturelle Und Soziale Teilhabe Aus Der Ferne–Gestaltung Und Entwicklung Eines Immersiven Und Sozialen Virtuellen Theatererlebnisses–SocialSTAGE-VR. In Virtuelle Beteiligung, Reale Teilhabe: Transformative Technologien für eine Inklusivere Gesellschaft; Tröge, J., Stepczynski, J., Wiesner, H., Runde, C., Eds.; Campus Verlag: Frankfurt, Germany; New York, NY, USA, 2025; pp. 315–331.
54. Bouchard, S.; Robillard, G.; St-Jacques, J.; Dumoulin, S.; Patry, M.J.; Renaud, P. Reliability and Validity of a Single-Item Measure of Presence in VR. In Proceedings of the Second International Conference on Creating, Connecting and Collaborating Through Computing; IEEE: New York, NY, USA, 2004; pp. 59–61.
55. Perrig, S.A.C.; Aeschbach, L.F.; Scharowski, N.; von Felten, N.; Opwis, K.; Brühlmann, F. Measurement Practices in User Experience (UX) Research: A Systematic Quantitative Literature Review. Front. Comput. Sci. 2024, 6, 1368860.
56. Kaiser, H.F. An Index of Factorial Simplicity. Psychometrika 1974, 39, 31–36.
57. McDonald, R.P. Test Theory: A Unified Treatment; Psychology Press: New York, NY, USA, 2013.
58. Gliem, J.A.; Gliem, R.R. Calculating, Interpreting, and Reporting Cronbach’s Alpha Reliability Coefficient for Likert-Type Scales. In Proceedings of the Conference in Adult, Continuing, and Community Education; The Ohio State University: Columbus, OH, USA, 2003.
59. van Kasteren, A.; Brunnström, K.; Hedlund, J.; Snijders, C. Quality of Experience of 360 Video–Subjective and Eye-Tracking Assessment of Encoding and Freezing Distortions. Multimed. Tools Appl. 2022, 81, 9771–9802.
60. Brunnström, K.; Dima, E.; Qureshi, T.; Johanson, M.; Andersson, M.; Sjöström, M. Latency Impact on Quality of Experience in a Virtual Reality Simulator for Remote Control of Machines. Signal Process. Image Commun. 2020, 89, 116005.
61. Ma, Z. Persuasive Effects of Narratives in Immersive Mediated Environments. Ph.D. Thesis, University of Maryland, College Park, MD, USA, 2018.
62. Lang, J. A New Stereotest. J. Pediatr. Ophthalmol. Strabismus 1983, 20, 72–74.
63. Melo, M.; Sampaio, S.; Barbosa, L.; Vasconcelos-Raposo, J.; Bessa, M. The Impact of Different Exposure Times to 360° Video Experience on the Sense of Presence. In Proceedings of the 2016 23rd Portuguese Meeting on Computer Graphics and Interaction (EPCGI); IEEE: New York, NY, USA, 2016; pp. 1–5.
64. Blanca, M.J.; Arnau, J.; García-Castro, F.J.; Alarcón, R.; Bono, R. Non-Normal Data in Repeated Measures ANOVA: Impact on Type I Error and Power. Psicothema 2023, 35, 21–29.
65. Rosseel, Y. Lavaan: An R Package for Structural Equation Modeling. J. Stat. Softw. 2012, 48, 1–36.
66. Kuckartz, U. Qualitative Inhaltsanalyse: Methoden, Praxis, Computerunterstützung; Beltz Juventa: Weinheim, Germany; Basel, Switzerland, 2012.
67. Balakrishnan, B.; Sundar, S.S. Where Am I? How Can I Get There? Impact of Navigability and Narrative Transportation on Spatial Presence. Hum.-Comput. Interact. 2011, 26, 161–204.
68. Allen, M.S.; Iliescu, D.; Greiff, S. Single Item Measures in Psychological Science. Eur. J. Psychol. Assess. 2022, 38, 1–5.

Figure 1. Schematic visualization of the prototype AV application, including a computer-generated opera hall, real-world video recordings of the opera performance embedded on the virtual screen, and a participant seated in the audience hall. The white border around the embedded video recording was added to the screenshot for visualization purposes.

Figure 2. Participants’ viewpoint of the stage during baseline scenario (a) and during an opera performance recording implemented in the AV application in experimental conditions (b).

Figure 3. Effects of video dimensionality and audio rendering on the perception of Place Illusion.

Figure 4. Effects of video format and audio rendering on the subjective QoE; 95% CI; circles show mean values.

Table 1. Means and standard deviations for the dependent variables. Scale range 1 to 7 for all variables.

Condition *	V_2D + A_ST (V↓ + A↓)		V_2D + A_SP (V↓ + A↑)		V_3D + A_ST (V↑ + A↓)		V_3D + A_SP (V↑ + A↑)
Variable	Mean	SD	Mean	SD	Mean	SD	Mean	SD
Presence	4.70	1.21	4.63	1.38	4.77	1.13	4.73	1.31
Place Illusion	3.94	1.17	4.63	1.16	4.12	1.25	4.51	1.23
Enjoyment	4.03	1.43	4.42	1.38	4.50	1.17	4.45	1.40
QoE	3.83	1.34	4.36	1.37	4.30	1.29	4.26	1.33
Motion Sickness	1.77	1.25	1.66	1.21	1.56	1.19	1.90	1.49

* The arrows refer to the immersive features of the video dimensionality and audio rendering; ↓ refers to 2D video/stereophonic audio and ↑ to 3D video/spatial rendering.
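As a purely descriptive reading aid for Table 1, the sketch below averages the Place Illusion cell means over each factor to show how the audio and video contrasts compare in size. The cell means are copied from Table 1; the helper name `marginal_mean` and the dictionary layout are illustrative assumptions. Because every participant completed all four conditions, equal weighting of the cells is reasonable, but this is not a substitute for the repeated-measures ANOVA reported in the study.

```python
# Cell means for Place Illusion copied from Table 1 (scale 1 to 7).
place_illusion = {
    ("2D", "stereo"): 3.94,
    ("2D", "spatial"): 4.63,
    ("3D", "stereo"): 4.12,
    ("3D", "spatial"): 4.51,
}


def marginal_mean(cells, factor_index, level):
    """Average the cell means that share one level of a factor."""
    values = [mean for key, mean in cells.items() if key[factor_index] == level]
    return sum(values) / len(values)


# Difference between marginal means for each factor.
audio_gain = (marginal_mean(place_illusion, 1, "spatial")
              - marginal_mean(place_illusion, 1, "stereo"))
video_gain = (marginal_mean(place_illusion, 0, "3D")
              - marginal_mean(place_illusion, 0, "2D"))

print(f"spatial minus stereo: {audio_gain:.2f}")  # 0.54
print(f"3D minus 2D: {video_gain:.2f}")           # 0.03
```

The audio contrast is an order of magnitude larger than the video contrast on this scale, which is in line with the reported pattern that only audio rendering affected Place Illusion.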

Table 2. Categories and illustrative quotes of user-recommended improvements for the AV application.

Category	n	User Recommendations	Illustrative Quotes
Social Presence	18	Shared viewing experiences with others in the virtual space, using avatars to enhance social presence and co-presence	“That’s what makes it an experience […] when you do it together.” [P17]; “For other people to be there virtually, so that you really feel like you are sitting in a community.” [P22]
Enhanced Video Quality	15	Higher video resolution and clarity to enhance perception of visual details	“It would be more relaxing if the video was a little sharper.” [P23]
Personalization and Control	5	User-controlled customization options including seat selection, lighting adjustment, audio settings, and viewing angle control	“Free choice of seating is also of greater value.” [P4]
Experiential Authenticity	5	Physical venue features and rituals associated with attending live performances	“When I have the whole experience. Even little things like handing in my jacket or my coat […].” [P30]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
