1. Introduction
In the past two decades, numerous studies have demonstrated the benefits of virtual reality (VR) across multiple levels of education [1,2] and in many application domains [3,4]. The term VR encompasses the array of software and hardware that creates a digital representation of a 3D object or environment, with which a user can interact and feel a sense of immersion.
Immersion is one of the primary motivations for educators employing VR, because a sense of immersion in students can foster engagement with the learning content [5,6,7].
The definition of immersion is widely debated. In this paper, we use an integrative definition from Mütterlein, describing immersion as “the subjective experience of feeling totally involved in and absorbed by the activities conducted in a place or environment, even when one is physically situated in another” [8]. This activity-centric definition allows immersion to be differentiated from the related concept of presence, which describes “how realistically participants respond to the environment, as well as their subjective sense of being in the place depicted by the Virtual Environment” [9]. A heightened sense of presence in a virtual environment is associated with increased immersion in the learning activities performed there [4].
VR has encompassed experiences on many different devices with different levels of immersion and interactivity, from low-immersion 2D computer screens to fully immersive VR head-mounted displays (HMDs). However, there has been a recent surge in consumer interest in VR and the concurrent release of a new generation of affordable HMDs and peripherals, greatly reducing the barrier of entry to immersive VR [10,11,12,13]. This has transformed the landscape of VR research and has led the educational literature to focus on immersive VR as a newly accessible option.
Immersive VR provides a complete simulation of a new reality by tracking the user’s position and creating multiple sensory outputs through a VR HMD [14]. It has been successfully applied to education in many fields, including engineering [15], architecture [16], medicine [17], history [18], and music [19]. Educators using immersive VR often rely on constructivist pedagogy, because the medium facilitates the experiential and environment-driven learning at the centre of this methodology [20,21]. Studies have consistently found positive learner attitudes towards immersive VR [22], and it has been shown to improve time on task [23], motivation [6], and knowledge acquisition [24].
Despite these findings, immersive VR has failed to achieve widespread adoption in education. Traditionally, this failure was attributed to the high cost and questionable usability of the technology [2,3,20,25]. With modern VR systems available, this hesitance now arguably comes from educators weighing two other factors: whether there is evidence that VR has enough of an impact on students’ learning to justify its costs; and whether the availability of VR content can be guaranteed in the long term [22].
Currently, if educators wish to create their own VR content, they either have to learn how to program 3D graphical applications themselves or hire content developers to create it for them [26,27]. Novel technologies such as neural radiance fields [28] and photogrammetry [29] have made it easier to create customised 3D content, but content creation remains a major barrier to wider use of VR in education.
To address these barriers to VR adoption, we investigated 360° video as an alternative for educational VR content. 360° video, when viewed inside an HMD, has been considered by some authors as a subset of immersive VR [30,31], but it is more commonly defined as different from VR [32]. We consider 360° video a simplified form of VR, since it shares some properties of VR but lacks essential functionalities. 360° video provides a movable, user-centric viewpoint inside an immersive, digitally projected environment. However, standard 360° video only allows passive viewing: users cannot interact with the 3D environment, the viewing position is fixed, and movement is restricted to the prerecorded camera movements. This differentiates it from VR using 3D environments, where users can interact with the environment and actively change it. More complex interactions are possible, e.g., gaze-based point-and-click mechanics or possibilities to change the storyline [33], but these usually require post-processing of the recorded videos and specialised applications.
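As a rough illustration of how such gaze-based point-and-click mechanics can be implemented on top of a 360° video player, the following Python sketch (our own example, not taken from the cited work; the frame size, hotspot coordinates, and yaw/pitch conventions are assumptions) maps a gaze direction onto equirectangular pixel coordinates and tests whether it falls inside a predefined hotspot.

```python
def gaze_to_equirect(yaw_deg, pitch_deg, width, height):
    """Map a gaze direction (yaw in [-180, 180], pitch in [-90, 90] degrees)
    to pixel coordinates in an equirectangular 360° frame."""
    u = (yaw_deg + 180.0) / 360.0 * width
    v = (90.0 - pitch_deg) / 180.0 * height
    return u, v

def gaze_hits_hotspot(yaw_deg, pitch_deg, hotspot, width=3840, height=1920):
    """Return True if the gaze falls inside a rectangular hotspot
    given in pixel coordinates (x_min, y_min, x_max, y_max)."""
    u, v = gaze_to_equirect(yaw_deg, pitch_deg, width, height)
    x_min, y_min, x_max, y_max = hotspot
    return x_min <= u <= x_max and y_min <= v <= y_max

# Example: a hypothetical "open door" hotspot roughly in front of the viewer.
door = (1800, 800, 2100, 1200)
print(gaze_hits_hotspot(yaw_deg=5.0, pitch_deg=-3.0, hotspot=door))  # True
```

A player built this way would trigger an action (e.g., branching the storyline) whenever the viewer’s gaze dwells inside a hotspot for a set duration.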
360° video is much easier to create than interactive VR. Spherical cameras that capture 360° video have followed a similar trajectory to VR HMDs, with a recent generation of consumer-focused models greatly reducing the cost barrier [34,35]. In response to this, the popular video sharing platforms YouTube and Facebook added support for 360° video in 2015, followed by Vimeo in 2017 [36]. Once captured, 360° video can be edited using traditional video editing software and stored in the same file formats as standard videos [37,38]. This means that 360° video is cheaper to make than immersive VR, more easily shareable, and requires less upskilling of current content creators. Recent research has looked into making 360° video more useful by proposing more powerful editing algorithms [39] and mixing it with other media [40,41].
360° videos can also be combined with VR environments, e.g., improving collaboration by sharing remote environments via 360° videos [42,43,44], improving the realism of 3D environments by fusing them with (potentially real-time) video data [45], creating immersive virtual field trips [40], or creating gamified learning experiences by combining 360° technologies with VR [46].
Finally, 360° videos can be used to create immersive VR environments by reconstructing 3D scenes using photogrammetry techniques [47,48,49] or by training Neural Radiance Field (NeRF) models [50,51]. The advantages of using 360° video for 3D reconstruction are comprehensive (360°) coverage of the environment, efficiency, smooth data acquisition, and the fact that the 360° video itself can be viewed immersively (e.g., as a fast preview of the reconstructed scene). The disadvantages are the data size, which requires high processing power, limited resolution, and potential calibration and stitching issues.
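To make this reconstruction workflow concrete, the following sketch (illustrative only; the file names and sampling interval are assumptions, and real pipelines typically add frame selection and calibration steps) extracts every N-th frame of an equirectangular 360° video with OpenCV, producing stills that could be passed to a photogrammetry or NeRF tool.

```python
import cv2  # pip install opencv-python

def extract_frames(video_path, out_dir, every_n=30):
    """Save every `every_n`-th frame of an equirectangular 360° video as a JPEG.
    The resulting stills can serve as input to photogrammetry or NeRF training."""
    cap = cv2.VideoCapture(video_path)
    saved, index = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n == 0:
            cv2.imwrite(f"{out_dir}/frame_{saved:05d}.jpg", frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# Hypothetical usage: one still per second for a 30 fps recording.
# extract_frames("scene_360.mp4", "frames", every_n=30)
```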
Research into educational 360° video is relatively new, but it has already been applied to many fields, including medicine [52,53], language [30], sports [54], business [55], collaborative design [56], education and training [46,57,58,59], sustainability [60], and marine biology [61]. The literature suggests many of the same benefits as immersive VR, such as enjoyment, motivation, and improved learning outcomes [33], though more research is needed in this area [36,62,63].
Many of these studies were applied to domains that the researchers believed would particularly benefit from 360° video [30,54]. The evaluated videos were often hard to compare to standard videos because they were embellished with features that did not translate between the two formats [52,61]. Additionally, the studies directly comparing 360° video to standard video only evaluated learning through immediate knowledge tests [52,54,55].
In this paper, we describe a randomised, crossover study in which we sought to compare the effects of 360° video and standard video. Twelve videos were recorded and then produced in both 360° and standard video formats.
To minimise the effect of domain and prior knowledge, the content presented was believable but fictitious. To ensure a comparable learning experience from the same video presented in either format, we placed certain restrictions on the production of the videos, which incidentally led to parallel production processes for the two video types. Finally, to measure learning outcomes that had not yet been investigated, we included long-term retention and special cases for active and passive visual recollection in our study design.
This paper reports on both the results of a user evaluation surveying participants’ experience and the comparative short- and long-term learning outcomes of participants who viewed the same videos in 360° and standard formats. This is, to the best of our knowledge, the first time long-term learning retention has been investigated for 360° video.
Research Questions
- RQ1:
What differences in user experience exist when presenting educational content in 360° video on a virtual reality head-mounted display, compared to standard video on a desktop PC?
- RQ2:
What differences in short- and long-term learning retention exist when presenting educational content in 360° video on a virtual reality head-mounted display, compared to standard video on a desktop PC?
2. Materials and Methods
This is a randomised, crossover study intended to compare the learning experience of 360° video, viewed on a VR HMD, to that of standard video, viewed on a desktop PC. We chose an HMD because we wanted to test whether the additional immersion it provides affects learning outcomes, e.g., no distraction from the real environment and more intuitive navigation (moving the head instead of a mouse). The study was spread over a six-week period and collected results on the user experience and short- and long-term learning retention of participants.
2.1. Participants
In total, 20 tertiary students participated in this study, 16 of whom were undergraduate and 4 postgraduate. Of these students, 18 were male and 2 were female. The mean age of participants was 25; however, the ages varied greatly (standard deviation = 9.02, minimum age = 18, maximum age = 55). No participants dropped out of this study, and all successfully completed all activities.
2.2. Educational Content
There were two primary considerations when designing the educational content used in this study. Firstly, we wanted to limit the effect that the application domain might have on the learning outcomes of students. Participants may have prior knowledge in different domain areas, affecting the study, and we wanted to confirm that the benefits of 360° video were not constrained to especially well-suited subjects. Secondly, we needed to create videos covering exactly the same learning content in 360° and standard formats.
As a result of these considerations, we created a series of 12 short lectures in both video formats, filmed at 12 visually distinctive locations (e.g., a forest, a lecture theatre, a playground, inside a car). In 10 of these locations, a teacher would present a series of facts about that particular location, sometimes referring to visible features around them. The other two locations are discussed in Section 2.3.
To address the effect of domain and prior knowledge, the location names and information presented in the videos were designed to be believable but were actually fictitious. For example, in the “Whiterock Bush” video recording, participants were informed that the area had been the site of “the frequent illegal dumping of rubbish”. “Whiterock Bush” is not a real location, nor is the presented fact true. Participants were not informed that the information was fictitious until after the study was complete, as that knowledge could have an effect on learning outcomes.
Ensuring the two video formats provided the same information placed two limitations on video development. Firstly, as extra information is visible when viewing a 360° video, we needed to ensure learning did not rely on content outside of the frame captured in the standard video. Secondly, the quality of accessible 360° cameras varies considerably, and while the pixel resolution may seem high compared to standard video, it is stretched over the full sphere when viewed. This is compounded by re-projection through VR HMDs of varying resolutions, so 360° video experiences are often of low quality with poor text legibility. It is for this reason that we chose verbally delivered lectures in visually distinct locations as the educational content.
We also decided to reduce the resolution of the standard videos to match that of the 360° videos. Although the current technology results in a difference in resolution between typical 360° and standard desktop videos, we anticipate that, in the future, this gap will close and the experienced resolution of the two technologies will become more similar. We adjusted the resolution to ensure our results are more robust to changes in technology and not significantly influenced by current technological limits.
Figure 1 and Figure 2 show screenshots of the 360° videos participants watched and various questions about these videos. The questions covered specific locations (e.g., which location is shown in this image?), objects in these locations (e.g., which painting did you see in the building?), positions of objects (e.g., where was the swing located within the playground?), and information about these locations (e.g., which bus numbers stopped at this bus stop?).
2.3. Special Case Videos
There were 3 additional factors of interest we wanted to measure, for which we designed 2 special case videos and modified one of the location-based lectures.
Firstly, as arguably the most common setting for tertiary educational videos, we included a lecture theatre as one of the locations. Rather than presenting facts about the location in this recording, we instead performed a traditional lecture where information was presented both verbally and via an overhead projector. This was performed to evaluate whether increasing the feeling of presence in a lecture theatre would affect learning outcomes.
We were also interested in whether viewing a video in 360° would affect the active and passive recollection of visual information. To investigate active visual recollection, we included a recording located inside a car. Participants were not presented with any facts but were instead asked to memorise as many details as they could about the interior of the vehicle. They were later tested on this information.
To investigate passive visual recollection, a bright red ribbon was tied around a tree in one of the existing locations. The ribbon was clearly visible in the recordings; however, the ribbon was never explicitly mentioned by the speaker at this location. The participants were later asked to recall where they had seen this ribbon among 3 visually similar locations.
2.4. Technologies
The 360° videos were captured using a Ricoh Theta S spherical camera. The camera features two fish-eye lenses, each capturing roughly a 180° hemisphere. We imported the videos into the Ricoh Theta software, which stitches the two streams together into a single 360° video and exports it as a standard MP4 file, which we then edited with Adobe Premiere Pro.
We displayed the 360° videos on a Samsung Gear VR HMD (used in conjunction with a Samsung Galaxy Note 5). While more modern HMDs, such as the Meta Quest 2, offer higher comfort and display quality, very few students own such HMDs; hence, we decided to use a mobile phone-based HMD, which we considered more accessible for educational applications.
The standard videos were obtained using the free and open-source OBS Studio to take screen recordings of the 360° videos playing in the free Ricoh Theta desktop video player software. They were recorded at 60 fps and adjusted to a resolution of 1920 × 1080 to match the resolution of the 360° videos. Standard videos were played on a 23″ PC monitor with a resolution of 1920 × 1080.
In most of the standard videos, all necessary information was contained within the initial field of view. However, in some locations, small rotations of the camera view were needed to include other details. These rotations were baked into the standard video recordings; they were not controlled by or performed in front of the participants.
In total, 24 video clips were created, comprising both the 360° and standard versions of each location and lecture. For a detailed description of the content creation process, see [64].
2.5. Challenges in Creating 360° Videos
We encountered several challenges when creating 360° videos for practical applications. Finding a suitable position for the camera was non-trivial. Placing the camera on a table using a normal tripod resulted in an unnatural experience when viewing the video with an HMD. Ideally, the camera had to be located at the eye height of a participant. Holding the camera by hand or using a large tripod resulted in the hand or tripod being visible when looking down. In the end, we used a large monopod (a one-legged camera stand). This single-legged stand was almost invisible, hidden under the camera, and provided the most natural viewing experience.
Another problem we encountered was the low video quality caused by the use of a cheaper, older camera model. While the device is capable of recording in 1920 × 1080 pixels, this resolution is spread across the entire 360° view, resulting in a display resolution more similar to a 240p video than a 1080p video when viewed on a desktop screen. This limited the educational content we could display; e.g., we avoided situations requiring the recording of text.
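A back-of-envelope calculation (our illustration; the assumed field of view is approximate) shows why the effective resolution drops so sharply:

```python
# Horizontal pixels of the equirectangular recording and an assumed viewing FOV.
recording_width = 1920          # pixels across the full 360° panorama
fov_deg = 90                    # approx. horizontal field of view shown at once

pixels_per_degree = recording_width / 360.0      # ≈ 5.3 px per degree
visible_pixels = pixels_per_degree * fov_deg     # ≈ 480 px across the visible view

print(f"{pixels_per_degree:.1f} px/degree, ~{visible_pixels:.0f} px visible horizontally")
# ~480 visible pixels is far closer to a 240p/360p frame (426/640 px wide)
# than to the 1920 px of a true 1080p frame.
```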
2.6. Study Design
Participants were randomly divided into Groups 1 and 2. Both groups were shown all 12 recordings; however, Group 1 was shown locations 1–6 in 360° video on a VR HMD and locations 7–12 as standard video on a desktop PC. Group 2 was shown the opposite assignment of videos to the HMD and PC (see Table 1). This is a randomised, crossover design to evaluate user experience across video formats, with both groups experiencing both treatments and subsequently compared through a user evaluation.
This study is additionally a randomised design to evaluate the learning outcomes between the video formats, achieved by comparing the test scores of participants experiencing the same recordings in different formats. These scores were obtained from both a Short-Term Retention Test and Long-Term Retention Test.
The groups compared in this second design are not Groups 1 and 2, but rather the pseudo-groups HMD and PC. This is because the scores associated with experiencing 360° video include question scores from both Groups 1 and 2, depending on the recording being assessed by each individual question. In fact, all participants may appear in both pseudo-groups, depending on the specific question, since every participant experienced both 360° and standard videos.
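The following minimal Python sketch (our illustration; the data layout and the location split are simplified assumptions based on the design in Table 1) shows how individual question scores can be pooled into the HMD and PC pseudo-groups.

```python
# Group 1 watched locations 1-6 on the HMD and 7-12 on the PC; Group 2 the reverse.
HMD_LOCATIONS = {1: set(range(1, 7)), 2: set(range(7, 13))}

def pseudo_group(participant_group, question_location):
    """Return which pseudo-group a single question score belongs to."""
    return "HMD" if question_location in HMD_LOCATIONS[participant_group] else "PC"

# Hypothetical per-question scores: (participant_group, question_location, score)
scores = [(1, 3, 1), (1, 9, 0), (2, 3, 1), (2, 9, 1)]

pooled = {"HMD": [], "PC": []}
for group, location, score in scores:
    pooled[pseudo_group(group, location)].append(score)

print(pooled)  # {'HMD': [1, 1], 'PC': [0, 1]}
```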
2.7. Assessment Instruments
Three assessment instruments were used in this study: a user evaluation, a Short-Term Retention Test, and a Long-Term Retention Test.
The user evaluation was 12 questions long, comprising 4 Likert-scale questions, 1 short-answer question, and 7 open-ended questions. In all questions, participants were asked to compare the 360° and standard video experiences, except for the final question, which directly asked participants for their opinion on 360° video as an educational tool.
The user evaluation included questions about participants’ senses of enjoyment, immersion, and engagement, and their level of distraction while watching the different video formats. We also asked about their preferred format for lecture recordings and any issues caused by video and screen quality, motion sickness, or other factors. These are some of the demonstrated benefits of immersive VR, and we wanted to validate that they were also evident in 360° video.
The Short-Term Retention Test was 20 questions long, comprising 14 multiple-choice questions and 6 short-answer questions about the content of the video recordings. This test also included questions about the active and passive visual recollection from the special case videos.
The Long-Term Retention Test was 21 questions long, comprising 15 multiple-choice questions and 6 short-answer questions. It is identical to the Short-Term Retention Test except for 4 questions, which are slight simplifications of 3 of the original questions. Two of these asked the participants to recall information from an image instead of a location name, and one short-answer question asking for two pieces of information was split into both a short-answer and multiple-choice question.
The Short- and Long-Term Retention Tests were divided into 2 sections, “Content Retention” and “Location Recognition”. Questions about content retention focused on the recall of the information participants were taught during the lectures, while the location recognition questions targeted visual information about the recording locations. The questions were designed to be unambiguous and binary in nature, allowing for a rigid marking rubric.
2.8. Study Procedure
Participants were first asked to fill out a Demographic Questionnaire. This questionnaire collected basic information, including the participants’ age, area of study, experience with VR HMDs, and whether they had any issues affecting their vision (so the HMD could be adjusted accordingly).
Next, participants were asked to put on the HMD, complete a pre-installed Oculus tutorial, and watch a 2-minute-long introductory lecture on the HMD. This was to familiarise the participants with 360° video.
After the familiarisation protocols, participants were asked to watch the first set of videos on either the HMD or the PC, depending on the group they had been assigned to (see Table 1). These video sets each comprised 6 location recordings stitched together, forming one continuous recording approximately 5 min long.
Participants were then asked to perform an unrelated reading task for 3 min, designed to act as a distractor between the 2 video sessions. After the distractor task, they were shown the second set of videos in the other video format.
When they had finished viewing the video content, participants were asked to complete both the user evaluation and Short-Term Retention Test. Six weeks later, participants were asked to complete the online Long-Term Retention Test.
4. Discussion
The results of our user evaluation analysis highlighted many of the key differences between using standard and 360° video.
In terms of general preference, participants were found to enjoy 360° video more, feel more engaged by it, and would prefer to use it as additional material for their learning, provided they could access it as easily as standard video. This was largely attributed to the increased sense of immersion and interaction with the environment in 360° video, amplified by feelings of presence and realism. These results are consistent with existing research on immersive VR and validate our motivations for using 360° video as a learning tool.
Looking further into the thematic analysis, we see a more complex relationship between immersion and engagement in the two video formats. While 360° video was more engaging, participants commonly reported being more distracted while using it, primarily by the interesting, immersive environment itself. The extra visual information pulled attention away from the speaker in the video, who was delivering the learning content verbally.
This illuminated a distinction in attention and focus between the video formats. Participants watching standard video were more likely to get bored and be distracted by elements of the real world; however, they paid more attention to the core learning elements of the video. Viewers of 360° video were more engaged, but that engagement did not translate directly into engagement with the learning content itself.
While a couple of participants took this as evidence that standard video was the better teaching tool, more noted the potential strength of harnessing distractions in 360° video. They noted that if the learning content is reinforced by the visual environment, instead of competing with it for attention, 360° video has the potential to create greater engagement with learning content than standard video.
This relates directly to another major emergent theme: content appropriateness. Participants stated that 360° video would be better suited to topics that rely on visual and environmental information, rather than those taught verbally or through text.
Our study design should be taken into account when discussing distractions and content appropriateness. We included videos that were heavy in visual information (e.g., memorising details inside a car) and heavy in verbal information (e.g., a traditional lecture recording), incidentally allowing participants to compare the two kinds of content. We also tried to create as comparable an experience as possible between the two video formats by restricting the learning content in the videos to a frame that could also be captured as standard video. This meant most of the environmental information in the 360° videos was, by design, not part of the learning content, which would have increased the effect of distractions.
Interestingly, the sub-theme of intimacy suggested that 360° video may have an advantage with some verbal content. Some participants felt an increased sense of connection to the speaker due to the presence and realism of the video format, which translated into a sense of obligation to listen to them. This kind of authentic connection can be particularly useful in situations like language learning [30].
Enhanced intimacy could even be harnessed to create better learning resources for students from cultural backgrounds that emphasise the relationship between student and teacher as a part of learning. For instance, Reynolds suggests that for students of Pacific Island background, “teu le va [the nurturing and valuing of a relationship] between a teacher and student is crucial because a student’s identification with a subject can come through a positive connection with a teacher” [66]. With the increasing prevalence of digital learning resources, consciously maintaining an element of human intimacy as part of their design will benefit some students.
The final major difference between the video formats to mention here is usability. Watching 360° video was reportedly less comfortable, with a few participants noting the weight of the HMD or a sense of disorientation from using it. These drawbacks of watching 360° videos with HMDs have also been reported in previous research using different types of HMDs [67]. Participants also noted that written or typed notes cannot be taken while in immersive VR. Overall, however, these limitations did not appear to outweigh the benefits of 360° video, as participants also claimed they could study from both formats for similar amounts of time and would prefer to use 360° video to do so.
Our analysis of the results of the Short- and Long-Term Retention Tests found no statistically significant differences in learning retention between the HMD and PC pseudo-groups. This is true of the overall scores, the separate scores for the “content retention” and “location recognition” subsections of the tests, and the scores associated with the three “special case” videos.
This means that 360° video was as effective as standard video in conveying learning content, even with the limitations we placed on production. There were no major differences in short- and long-term retention in the context of this study.
It should be noted that the small effective sample size (n = 10) limited the statistical power of this study. When looking at the results in aggregate, the HMD group outperformed the PC group in every measure except two (short-term content retention and long-term passive visual recollection) and scored better on over twice as many total questions. As mentioned, none of these results were statistically significant at the chosen threshold, but there is an indication that a study with greater statistical power may be able to detect an effect.
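As a hedged illustration of this limited power (not an analysis performed in the study; the independent two-sample model and the effect sizes are assumptions), the following snippet estimates the probability of detecting medium and large effects with 10 participants per pseudo-group:

```python
from statsmodels.stats.power import TTestIndPower  # pip install statsmodels

analysis = TTestIndPower()

# Assumed medium (0.5) and large (0.8) standardised effect sizes (Cohen's d).
for d in (0.5, 0.8):
    power = analysis.power(effect_size=d, nobs1=10, alpha=0.05, ratio=1.0)
    print(f"d = {d}: power ≈ {power:.2f}")

# With 10 participants per group, even a large effect would be detected well
# under half the time, so a null result is only weak evidence of no effect.
```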
Our results are interesting in the context of previous research, which has focused on evaluating the use of 360° video for specific applications and/or different VR technologies in education.
Schroeder et al. [63] conducted a review investigating how 360° video influences cognitive learning outcomes. The authors identified 26 studies and report that, overall, there was no evidence of benefits or detriments to learning. The authors also looked at specific properties of 360° video, such as interactivity and contextual information, but neither property had a significant effect on learning. These results are in line with our results for short-term and long-term learning retention.
Baysan et al. [67] reviewed the use of 360° video for nursing education. The authors included 12 studies in their review and found that 360° video can improve motivation, confidence, and task performance. A total of 4 of the 12 reviewed studies used a smartphone-based HMD for better accessibility, as we did in our study. Only one study compared display options and found that viewing 360° video with an HMD was preferred to using a touch screen, but the authors did not investigate the effect of these options on learning [68].
Rosendahl and Wagner [59] reviewed application areas of 360° video in education. The authors found 44 papers and reported that 360° videos are mainly used for three teaching–learning purposes: presentation and observation of teaching–learning content, immersive and interactive theory–practice mediation, and external and self-reflection. Our research focused on learning retention, which is relevant for the first two of these purposes. The authors cite multiple studies reporting that viewer engagement increases with the immersion level (e.g., [69,70]), but only Rupp et al. investigated the effect on knowledge retention, claiming that 360° videos, viewed with the most immersive display, enabled users to remember more verbally presented information [69]. However, the authors did not investigate long-term retention.
Atal et al. reviewed the use of 360° video in teacher education [71]. The authors analysed 17 papers and found that 360° video is a preferred method for overcoming the limitations of standard video, as it offers viewers multiple perspectives and levels of decision-making and enables viewers to notice more details faster due to the ability to look around freely. Furthermore, the authors cite a few studies investigating different display technologies for 360° video. Two studies suggested a potential usefulness of HMDs for perceptual capacity, reflection, and teacher noticing [72,73], and one study showed no added usefulness of HMDs [74].
Muzata et al. [75] reviewed 66 publications and reported that 360° video constitutes one of the most beneficial learning environments due to (1) the low cost of the required equipment; (2) the ability of viewers to employ their expected sensory–motor contingencies, such as head movements; and (3) the encouragement for viewers to use more immersive technologies. VR is preferred when users need to interact with objects and explore them in detail, e.g., in engineering and medicine.
In summary, our study supports and builds upon previous research. Learners enjoy using 360° video for education. There are few usability and accessibility issues. Given a choice, most students prefer using an HMD over a desktop display for viewing. In our study, the display technology had no significant effect on learning, although we did observe higher (non-significant) values when using an HMD. This supports the results of Rupp et al., who, in a larger study, found that participants using an HMD remembered more verbally presented information [69]. In contrast to Rupp et al., we also investigated long-term retention, where we again found (non-significant) higher values. This indicates that, while no evidence was found to suggest that using an HMD is better, a larger study may yet reveal such an effect. Similar to previous research, we found that HMDs result in increased enjoyment and engagement, which might have a positive long-term effect on learning [76].
5. Limitations
Novelty is a commonly observed confounding effect in studies involving new learning technologies [77,78], including immersive VR [22]. It can lead to increased motivation or perceived usability of a technology, which may translate into increased attention and engagement in learning activities [7]. The responses to the user evaluation showed evidence of the novelty effect among participants, which should be taken into account when interpreting these results. The novelty effect also manifested itself as leniency towards issues with 360° video.
Connected to this novelty effect is the potential for anchoring bias in our study results. Participants will likely have watched many standard videos in the past, giving them high-quality reference points or “anchors” against which to evaluate the study videos. However, they are much less likely to have encountered 360° videos before and, therefore, may not have any expectations of quality in that format. The anchoring effect can lead to more severe criticism of experiences that can be compared, as well as more lenient evaluations of novel experiences [79].
Another factor that could generate leniency is social desirability bias, where survey responses are informed by a participant’s desire to project a favourable image to others [80,81]. In this study, participants may have been less critical of 360° video, aware that we (the researchers) were studying the format and mistakenly believing we might perceive them undesirably for answering negatively.
As noted previously, the other major limitation of this study is the small sample size. This caused our statistical analysis to have limited power to detect differences in learning outcomes between the HMD and PC groups. In addition, most of our participants were students or staff from Computer Science, and all of them were at least 18 years old.
Future studies comparing 360° and standard video will need larger sample sizes and more diverse study populations, including children, to conclusively comment on any effects on learning retention in different learning applications. Furthermore, there is a need for more elaborate longitudinal studies assessing the long-term impact on motivation and academic performance.
6. Conclusions
In this paper, we presented and analysed the results of a user study evaluating differences in user experience and short- and long-term learning retention in tertiary students viewing educational videos in both 360° and standard desktop formats. We found that participants retained the same amount of learning from both types of video but engaged with and enjoyed 360° video more. We also found that participants believed they could study using either format for a similar amount of time and would generally prefer to use 360° video as additional learning material for their coursework, provided it was accessible.
To create a more comparable experience between the 360° and standard videos used in this study, we placed restrictions on the production of the 360° videos. This means these results were obtained using 360° videos produced in an accessible way, parallel to standard video production. This indicates that 360° video is a viable way for institutions to continue generating value from an investment in immersive VR technology, without having to upskill their current learning content creators or pay for expensive bespoke VR development. We believe that 360° video also offers an attractive alternative to other new learning technologies such as game-based learning, which, while highly engaging, suffers from high content creation costs, vendor lock-in, limited diversity in teaching, and often too much focus on entertainment [82].
User responses also suggested that 360° video has much greater potential for learning engagement if additional effort is put into incorporating the learning content into the surrounding environment. Our results show that, at its most accessible level, 360° video is as effective as, and more engaging than, standard video. However, it is a different medium with a different user experience, creating engagement and enjoyment through a sense of immersion in the wider environment. To unlock the potential of 360° video for education, this environment should be harnessed to reinforce learning objectives.
We see significant potential in combining 360° videos with VR technologies, as demonstrated in previous research [40,42,43,44,45,46]. In an educational context, 360° videos can be used in VR to provide realistic backgrounds (skyboxes) and instructor information, or to display large-scale environments that are difficult to model or reconstruct manually (e.g., archaeological sites), whereas VR content should be used for objects users need to interact with.
Furthermore, 360° video could be useful in learning applications where cognitive load must be measured. Cognitive load measurements can be used for adjusting the difficulty of learning content [83]. However, measuring cognitive load in interactive VR applications is difficult since body movements can interfere with physiological measures of cognitive load [84,85,86]. In such applications, 360° video could be more useful since users usually do not move much while watching. However, the fact that students have limited interactions with the video can also be a limitation, since it makes it difficult to measure users’ behaviour, which is an important parameter in intelligent tutoring systems [87,88].