1. Introduction
The conventional flight simulator, particularly in terms of its technologies, has developed from primitive beginnings to the highly complex, high-fidelity digital simulators of today. The use of such simulators, particularly for the transfer of flight-critical skills, has long been viewed and understood alongside the limitations of such training [
1]. The usefulness of visually based flight simulators for training may be affected by various phenomena, including simulator sickness [
2]. Simulator sickness, which has been widely researched in both academia and industry since at least the 1980s [
3]—and acknowledged since the 1950s [
4,
5]—involves symptoms typically associated with motion sickness. The subtle differences between simulator sickness, motion sickness, and cyber sickness form a complex literature unto themselves [
6]. However, the common symptoms of simulator sickness, such as nausea, dizziness, and spatial disorientation, are well documented. Simulator sickness is both polygenic and polysymptomatic, making precise classification and identification difficult [
7]. That is, sickness may arise as a result of unique individual factors or individual responses to improper simulation from the device, hardware, or software. However, not all symptoms, or a specific pattern of symptoms, will be identifiable in each individual. Notably, simulator sickness is considered to result from incorrect aspects (i.e., fidelity) of the simulator or simulation, rather than sickness caused by the accurate simulation of a nauseating or visually disorienting event, such as aerobatic flight [
1]. Despite the lack of specificity around simulator sickness—both in origins and symptoms—the measurement of the experience is achieved consistently by tools such as the Simulator Sickness Questionnaire [
8]. The Simulator Sickness Questionnaire is widely used in the measurement of simulator sickness and is well validated [
9] and generally considered the standard [
10]. Decades of research on simulator sickness in conventional simulators have, of course, yet to be matched for newer and more immersive simulator technologies, such as virtual reality (VR).
The growth in the awareness and experience of simulator sickness paralleled the growth of simulation for pilot training in the 1980s [
10]. As newer display technologies emerged—including the head-mounted display (HMD) in the 1990s—the problems associated with simulator sickness increased further. The prevalence of simulator sickness is, of course, not limited to aviation, much as VR for education is not limited to aviation [
11,
12]. Simulator sickness has been noted in other fields, including consumer entertainment [
13], where such display technologies have continued to grow in popularity [
14]. For the training of pilots, and indeed for the training of any safety-critical activities, the prevalence of simulator sickness can offset many of the advantages of ground-based training, including cost and safety [
15,
16]. The experience of sickness can compromise training through distraction and decreased motivation, particularly due to behavioural adaptations made by trainees to avoid symptoms, such as not looking out of windows [
17]. These adaptations may become compensatory skills carried into real aircraft, which could be dangerous. The risk that this issue poses was noted early in the development of (what would become) conventional simulators [
5] and has been subsequently suggested as an issue with flight simulators based on consumer-grade VR-HMDs [
18]. In addition to the more direct threats to safety posed by simulator sickness in a VR simulator, the practical implications of using the technology must also be considered. In particular, the immersion often claimed for virtual reality [
19]—which appears to offer benefits for training that relies on the visuospatial understanding of relationships [
20,
21,
22,
23]—may be undermined by the symptoms of simulator sickness, disrupting this experience [
24]. That is, engagement within the virtual environment may be significantly reduced despite the greater immersion of virtual reality [
25,
26]. The potential limitations of the technology must be considered in the specific contexts in which the technology will be used, such as for the flight simulator training of pilots [
27].
The growing body of literature on modern VR for flight simulation and other extended reality (xR) technologies [
28] is rather unsophisticated and inconclusive [
29]. VR simulation will, however, undoubtedly be an increasing part of aviation training [
19,
30]. The intentions for the use of xR flight simulators for pilot training, for various reasons, are primarily focused on the training of ab initio and general aircraft handling skills [
29]. These skills would include the most basic tasks required of the pilot—such as maintaining the aircraft in straight-and-level flight (Aircraft Rating Standard—Aeroplane Category—A3.1 [
31])—through to the conduct of circuits (Aircraft Rating Standard—Aeroplane Category—A3.6 [
31]). Learning these skills is the foundation of all future development of the flying technique [
32] and, of course, occurs from the beginning (i.e., ab initio) of flight training. These skills, and indeed the majority of the initial flight training, are based upon the visual recognition of the situation outside of the aircraft and the visual relationship between the instrument coaming and the natural horizon [
33]. The interactivity and immersion of VR appear to work synergistically to enable the greater visuospatial awareness required [
20,
21,
22], as compared to the necessarily truncated visuals of simulators based on two-dimensional display technology [
34]. Presently, simulators in general aviation (GA) are used primarily for the training of instrument flying skills [
35], while the VR simulator would enable better cognitive priming for broader ab initio skills. That is, the inexpensive consumer-grade VR-HMD does not replace the aircraft as a training device, but rather prepares the trainee better for the learning that will occur airborne. The promises, and likely advantages, of the VR simulator must be carefully considered alongside the risks to training from simulator sickness.
The clear intention for the use of VR simulators for ab initio training, along with the acknowledged benefits of simulator training before real-world flight [
15,
36], offers huge potential benefits. It is unclear precisely where the VR simulator, or indeed the xR simulator, will exist within the existing hierarchy of flight simulators, due primarily to the dominance of commercial factors in shaping GA. The existing literature, with its GA focus (i.e., ab initio and general handling), would likely mean the augmentation of, or replacement of, simpler flight training devices (FTD) and aviation training devices (ATD). However, the greater accessibility that VR-HMDs enable [
29], like the greater accessibility once enabled by the low-cost personal computer aviation training device (PCATD) [
37], would suggest the increasing use of unsupervised simulator flying, whether voluntary or sanctioned by instructors. While this may enable greater practice, the potential of simulator sickness to alter behaviour (i.e., compensatory skills) must be considered. These compensatory skills could also develop unchallenged in the absence of the guidance and “ingenuity” [
15] of a qualified flight instructor. Such was the case—although due to different causes—following the release of consumer-accessible flight simulator software as early as 2001, where Prensky found that many trainee pilots arrived “…fully training up on [Microsoft] flight simulator…” ([
38] p. 310). However, the compensatory skills developed outside of formal instruction must then be unlearned at a financial and temporal cost [
37,
39,
40].
Means of reducing simulator sickness in VR have been proposed, but these solutions generally require the modification or addition of hardware or software components [
41,
42,
43]. Modifications to software and hardware can either directly reduce sensory conflicts or mitigate their effects through indirect means. Modifications to reduce sensory conflicts include the reduction of latency, field-of-view changes, or the use of alternative display technologies [
44]. Certain other means, such as the use of galvanic vestibular stimulation, olfactory stimulation, or auditory distraction, have also shown some promise [
45,
46,
47]. However, the polygenic nature of the susceptibility to simulator sickness means that no particular solution is guaranteed in all circumstances [
48]. The use of these modifications also adds both complexity and cost to the system. The additional cost and complexity would seem to partially negate the advantages of consumer-grade VR-HMDs, and they are unlikely to permeate voluntary student use. Techniques based upon acclimatisation [
49] and repeated exposure have shown promise [
50], although these results are also not guaranteed under all circumstances [
18]. A valuable contribution would be a simple method to reduce the symptoms of simulator sickness, while fitting within the existing educational context.
As previously mentioned, there are overlapping definitions around motion sickness, simulator sickness, and cyber sickness. Importantly, within the aviation industry, the preferred term is simulator sickness, and this applies even to synthetic flight training devices that do not incorporate any physical motion. This is a taxonomical issue in the aviation industry as all relevant training devices are colloquially referred to as simulators. There is then the specific subset of flight simulators that utilise motion, as well as the subset of synthetic flight trainers that do not utilise motion. To compound these issues further, the software utilised across training devices is collectively called flight simulators. As such, the term utilised in this work will be simulator sickness. It is worth noting that the established symptoms of cyber sickness align exactly with those of simulator sickness as used in this work, such as those of Kourtesis et al. [
44], being “nausea, disorientation, and oculomotor disturbances”.
Of relevance is the measurement tool used to assess simulator sickness. The specifically developed [
8] and hence widely used tool in the aviation industry is the Simulator Sickness Questionnaire (SSQ) [
51,
52,
53,
54]. The SSQ has also previously been applied to VR flight simulation [
55]. To enable this work to be directly comparable for those working in the aviation industry, the use of the SSQ is essential. There are other measurement tools, such as the Cyber Sickness Questionnaire for Virtual Reality (CSQ-VR) [
56]. In that work, the CSQ-VR was compared to the SSQ, as well as the VR version of the SSQ called the VRSQ, developed by Sevinc and Berkman [
57]. Kourtesis et al. [
58] have also utilised the Virtual Reality Neuroscience Questionnaire (VRNQ), which assesses VR-induced symptoms and effects.
VR is compelling for simulation because of the immersion, engagement, greater access, and better understanding of visuospatial relationships. Such simulators are likely to be adopted in place of, or as augmentation to, simpler ATDs and PCATDs. However, simulator sickness appears to pose a significant risk to such training, particularly due to the development of compensatory skills and the lasting effects of symptoms long after the initial exposure. Despite these risks, the technology is clearly seen as attractive by industry and academia for ab initio training. There has been insufficient research on simulator sickness in the ab initio cohort and on mechanisms to address this efficiently. Although there are attempts in the literature to reduce simulator sickness through various hardware and software extensions, these solutions are unlikely to be practical in the context of flight training syllabi. Thus, this research seeks to evaluate two practical, and directly incorporable, interventions for the reduction of simulator sickness. It may be possible to alter syllabi to make use of existing resources (e.g., initial use of an existing PCATD) or, alternatively, adjust the stage at which these technologies are introduced (e.g., after some initial training airborne).
1.1. Background
1.1.1. Flight Simulators
The conventional flight simulator, particularly in terms of its technologies, has developed from primitive beginnings to the highly complex, high-fidelity digital simulators of today. The use of such simulators, particularly for the transfer of flight-critical skills, has long been viewed and understood alongside the limitations of such training [
1]. The increasing literature on the use of modern virtual reality (VR) for flight simulation, as well as the various other extended reality (xR) technologies [
28], is rather unsophisticated and inconclusive [
29]. VR simulation will, however, undoubtedly be an increasing part of aviation training [
19,
30]. The intentions for the use of xR flight simulators for pilot training, for various reasons, are primarily focused on the training of ab initio and general aircraft handling skills [
29]. These skills would include the most basic tasks required of the pilot—such as maintaining the aircraft in straight-and-level flight (Aircraft Rating Standard—Aeroplane Category—A3.1 [
31])—through to the conduct of circuits (Aircraft Rating Standard—Aeroplane Category—A3.6 [
31]). Learning these skills is the foundation of all future development of the flying technique [
32] and, of course, occurs from the beginning (i.e., ab initio) of flight training. These skills, and indeed the majority of initial flight training, are based upon the visual recognition of the situation outside of the aircraft and the visual relationship between the instrument coaming and the natural horizon [
33]. The interactivity and immersion of VR appear to work synergistically to enable the greater visuospatial awareness required [
20,
21,
22], as compared to the necessarily truncated visuals of simulators based on two-dimensional display technology [
34]. Presently, simulators in general aviation (GA) are used primarily for the training of instrument flying skills [
35], while the VR simulator would enable better cognitive priming for broader ab initio skills. That is, the inexpensive consumer-grade VR-HMD does not replace the aircraft as a training device, but rather prepares the trainee better for the learning that will occur airborne. The clear intention for the use of VR simulators for ab initio training, along with the acknowledged benefits of simulator training before real-world flight [
15,
36], offers huge potential benefits. It is unclear precisely where the VR simulator, or indeed the xR simulator, will exist within the existing hierarchy of flight simulators, due primarily to the dominance of commercial factors in shaping GA. The existing literature, with its GA focus (i.e., ab initio and general handling), would likely mean the augmentation of, or replacement of, simpler flight training devices (FTD) and aviation training devices (ATD). However, the greater accessibility that VR-HMDs enable [
29], like the greater accessibility once enabled by the low-cost personal computer aviation training device (PCATD) [
37], would suggest the increasing use of unsupervised simulator flying, whether voluntary or sanctioned by instructors.
1.1.2. Simulator Sickness
The usefulness of visually based flight simulators for training may be affected by various phenomena, including simulator sickness [
2]. Simulator sickness, which has been widely researched in both academia and industry since at least the 1980s [
3]—and acknowledged since the 1950s [
4,
5]—involves symptoms typically associated with motion sickness. The subtle differences between simulator sickness, motion sickness, and cyber sickness form a complex literature unto themselves [
6]. However, the common symptoms of simulator sickness, such as nausea, dizziness, and spatial disorientation, are well documented. Simulator sickness is both polygenic and polysymptomatic, making precise classification and identification difficult [
7]. That is, sickness may arise as a result of unique individual factors or individual responses to improper simulation from the device, hardware, or software. However, not all symptoms, or a specific pattern of symptoms, will be identifiable in each individual. Notably, simulator sickness is considered to result from incorrect aspects (i.e., fidelity) of the simulator or simulation, rather than sickness caused by the accurate simulation of a nauseating or visually disorienting event, such as aerobatic flight [
1]. Despite the lack of specificity around simulator sickness—both in origins and symptoms—the measurement of the experience is achieved consistently by tools such as the Simulator Sickness Questionnaire [
8]. The Simulator Sickness Questionnaire is widely used in the measurement of simulator sickness and is well validated [
9] and generally considered the standard [
10].
1.1.3. Cyber Sickness
The prevalence of simulator sickness is, of course, not limited to aviation, much as VR for education is not limited to aviation [
11,
12]. Simulator sickness has been noted in other fields, including consumer entertainment [
13], where such display technologies have continued to grow in popularity [
14]. As previously mentioned, there are overlapping definitions around motion sickness, simulator sickness, and cyber sickness. Importantly, within the aviation industry, the preferred term is simulator sickness, and this applies even to synthetic flight training devices that do not incorporate any physical motion. This is a taxonomical issue in the aviation industry as all relevant training devices are colloquially referred to as simulators. There is then the specific subset of flight simulators that utilise motion, as well as the subset of synthetic flight trainers that do not utilise motion. To compound these issues further, the software utilised across training devices is collectively called flight simulators. As such, the term utilised in this work will be simulator sickness. It is worth noting that the established symptoms of cyber sickness align exactly with those of simulator sickness as used in this work, such as those of Kourtesis et al. [
44], being “nausea, disorientation, and oculomotor disturbances”.
Of relevance is the measurement tool used to assess simulator sickness. The specifically developed [
8] and hence widely used tool in the aviation industry is the Simulator Sickness Questionnaire (SSQ) [
51,
52,
53,
54]. The SSQ has also previously been applied to VR flight simulation [
55]. To enable this work to be directly comparable for those working in the aviation industry, the use of the SSQ is essential. There are other measurement tools, such as the Cyber Sickness Questionnaire for Virtual Reality (CSQ-VR) [
56]. In that work, the CSQ-VR was compared to the SSQ, as well as the VR version of the SSQ called the VRSQ, developed by Sevinc and Berkman [
57]. Kourtesis et al. [
58] have also utilised the Virtual Reality Neuroscience Questionnaire (VRNQ), which assesses VR-induced symptoms and effects.
1.2. Significance
VR is compelling for simulation because of the immersion, engagement, greater access, and better understanding of visuospatial relationships. Such simulators are likely to be adopted in place of, or as augmentation to, simpler ATDs and PCATDs. However, simulator sickness appears to pose a significant risk to such training, particularly due to the development of compensatory skills and the lasting effects of symptoms long after the initial exposure.
For the training of pilots, and indeed for the training of any safety-critical activities, the prevalence of simulator sickness can offset many of the advantages of ground-based training, including cost and safety [
15,
16]. The experience of sickness can compromise training through distraction and decreased motivation, particularly due to behavioural adaptations made by trainees to avoid symptoms, such as not looking out of windows [
17]. These adaptations may become compensatory skills carried into real aircraft, which could be dangerous.
While the previously noted increased use of unsupervised simulator flying may enable greater practice, the potential of simulator sickness to alter behaviour (i.e., compensatory skills) must be considered. These compensatory skills could also develop unchallenged in the absence of the guidance and “ingenuity” [
15] of a qualified flight instructor. Such was the case—although due to different causes—following the release of consumer-accessible flight simulator software as early as 2001, where Prensky found that many trainee pilots arrived “…fully training up on [Microsoft] flight simulator…” ([
38] p. 310). However, the compensatory skills developed outside of formal instruction must then be unlearned at a financial and temporal cost [
37,
39,
40].
The risk that this issue poses was noted early in the development of (what would become) conventional simulators [
5] and has been subsequently suggested as an issue with flight simulators based on consumer-grade VR-HMDs [
18]. In addition to the more direct threats to safety posed by simulator sickness in a VR simulator, the practical implications of using the technology must also be considered. In particular, the immersion often claimed for virtual reality [
19]—which appears to offer benefits for training that relies on the visuospatial understanding of relationships [
20,
21,
22,
23]—may be undermined by the symptoms of simulator sickness, disrupting this experience [
24]. That is, engagement within the virtual environment may be significantly reduced despite the greater immersion of virtual reality [
25,
26]. The potential limitations of the technology must be considered in the specific contexts in which the technology will be used, such as for the flight simulator training of pilots [
27].
Despite these risks, the technology is clearly seen as attractive by industry and academia for ab initio training. There has been insufficient research on simulator sickness in the ab initio cohort and mechanisms to address this efficiently. Although there are attempts in the literature to reduce simulator sickness through various hardware and software components [
41,
42,
43], these solutions are unlikely to be practical in the context of flight training syllabi. The additional cost and complexity would seem to partially negate the advantages of consumer-grade VR-HMDs, and they are unlikely to permeate voluntary student use. Techniques based upon acclimatisation [
49] and repeated exposure have shown promise [
50], although these results are also not guaranteed under all circumstances [
18]. A valuable contribution would be a simple method to reduce the symptoms of simulator sickness, while fitting within the existing educational context. Thus, this research seeks to evaluate two practical, and directly incorporable, interventions for the reduction of simulator sickness. It may be possible to alter syllabi to make use of existing resources (e.g., initial use of an existing PCATD) or, alternatively, adjust the stage at which these technologies are introduced (e.g., after some initial training airborne). The clear intention for the use of VR simulators for ab initio training, along with the acknowledged benefits of simulator training before real-world flight [
15,
36], offers huge potential benefits.
1.3. Aims and Objectives
The question guiding this research is “Does prior flight experience or conventional simulator exposure reduce simulator sickness in ab initio trainees after virtual reality flight simulation, based upon the Total Severity (TS) of the Simulator Sickness Questionnaire?” In order for this question to be answered convincingly, empirical evidence will be provided on the effectiveness of the interventions. This research does not rely on the conception of simulator sickness as either a perceptual [
59] or physiological [
60] phenomenon; rather, it is solely concerned with the practical implications and its reduction. No other research addressing this question could be identified, and the answer is important for all future pilot training with VR technology.
2. Materials and Methods
This research employs a quasi-experimental, pseudo-randomised, non-equivalent pre-test–post-test control group design with one direct intervention (prior PCATD-based simulator exposure) and one nuisance factor (prior flight experience). The research design uses both a pre-VR SSQ and a post-VR SSQ, which together form the basis of the primary change score data. As this design is a quasi-experimental approach, there are sources of invalidity that must be considered. Data were available for 85 participants (n = 85), of whom 33 were in the PCATD exposure group, 25 were in the flight experience group, and the remaining 27 were in the control group. The flight experience group and the control group differed only in their prior airborne flight training as a nuisance factor. Demographic filtering was applied solely for the purpose of managing this nuisance factor, ensuring that the groups could be analysed based on prior flight experience. The original data, and the educational activities that produced these data, were solely for the purpose of an undergraduate laboratory course, which was conducted twice in a twelve-month period. The dataset therefore comprised two separate course iterations, each lasting a single semester (i.e., six months). These data were anonymised prior to their provision to the researchers, as per the research ethics approval.
The order of use of the hardware (i.e., PCATD or VR simulator first or second) resulted from the limited availability of VR-equipped PCATDs. As such, the use of both simulator setups occurred simultaneously, to ensure that the time required did not exceed that timetabled. The control group, therefore, received the same PCATD-based simulator experience as the treatment group, but after the VR simulator, as shown in
Figure 1. Participants arrived sequentially within a scheduled timeframe for the lab (i.e., class) and were assigned to a simulator based on their arrival order. Those who arrived first were assigned to VR until all VR simulators were occupied, after which later arrivals were assigned to the PCATD. The allocation of a participant to use one or the other simulator first, and therefore the group allocation and randomisation, then resulted from the order in which they arrived (i.e., random to an extent). The pre-VR SSQ was administered immediately before VR exposure for all participants, regardless of whether they had first used the PCATD or not. Those assigned to VR first completed the post-VR SSQ immediately after exposure and then proceeded to PCATD use, whereas those initially assigned to the PCATD completed their pre-VR SSQ only after their PCATD session, before experiencing VR, and then completed their post-VR SSQ.
SSQ data were only collected before and after the VR simulator use; therefore, the control group had no additional data. This sequencing ensured that all participants completed SSQ assessments exclusively for VR exposure, with no post-PCATD SSQ collected to avoid confounding effects from multiple exposures. The provided SSQ data were from immediately prior to the use of the VR simulator and immediately after the use of the VR simulator, regardless of the group. The data took the form of extracted, but unprocessed, SSQ Likert-scale values for each dimension of the SSQ. The only demographic data that were available, and which had apparently served some purpose in the original educational activity, were whether the participants had (or did not have) any flight experience—the nuisance factor. The flight experience data were provided in the three group datasets, with each being for the specific exposure. No additional demographic data were made available to the researchers. Data for participants who had both prior PCATD exposure and flight experience were simply not provided and thus were unavailable for inclusion in the analysis. Given the risk that simulator sickness would occur at problematical levels, participants were instructed to immediately remove the VR-HMD if they experienced any uncomfortable sensation.
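The allocation and grouping described above can be illustrated with a minimal sketch. It is not the procedure or code used in the original laboratory course: the `Participant` fields, the function names, and the `vr_seats` parameter (the number of VR-equipped simulators, which is not stated here) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    arrival_rank: int            # 1 = first to arrive at the lab (hypothetical field)
    has_flight_experience: bool  # the only demographic item available


def first_device(arrival_rank: int, vr_seats: int) -> str:
    """Early arrivals fill the VR simulators; later arrivals start on the PCATD."""
    return "VR" if arrival_rank <= vr_seats else "PCATD"


def analysis_group(p: Participant, vr_seats: int) -> Optional[str]:
    """Map a participant to one of the three analysed groups.

    Returns None for the combination (PCATD first and flight experience)
    whose data were not provided and so could not be analysed.
    """
    pcatd_first = first_device(p.arrival_rank, vr_seats) == "PCATD"
    if pcatd_first and p.has_flight_experience:
        return None
    if pcatd_first:
        return "pcatd_exposure"
    if p.has_flight_experience:
        return "flight_experience"
    return "control"


# Example with an assumed two VR simulators.
for person in (Participant(1, False), Participant(2, True), Participant(3, False)):
    print(person.arrival_rank, analysis_group(person, vr_seats=2))
```

In this sketch the grouping depends only on arrival order and the single flight-experience flag, which is why the allocation can be described as random only to an extent.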
The simulator for the VR activity, shown in
Figure 2, was based on a consumer-grade VR-HMD tethered to an existing uncertified PCATD. This PCATD was the system used for the associated intervention group. The computer that was used for the simulator ran the Windows 11 operating system, with an Intel i7 (2.5 GHz) central processing unit, 16 GB of DDR4 random-access memory, and an Nvidia RTX 3060 (16 GB) graphics card. The VR-HMD used was a Meta Quest 2 (formerly Oculus Quest 2), with two 1832 × 1920 p displays (one per eye) with a maximum refresh rate of 90 Hz and a lateral field of view of approximately one hundred degrees. The displays are arranged such that each eye receives slightly different images, resulting in the experience of depth and immersion within the computer-generated scene [
61]. The VR-HMD was running a relatively recent version of its operating system (version: v61) and was compatible with the graphics card [
62]. The VR-HMD was tethered to the PCATD via a supported USB Type-C cable (USB3.2 gen one—signalling 5 Gbps). The PCATD visual system consisted of a 27-inch flatscreen computer monitor with a resolution of 1920 × 1080 p, at a 60 Hz refresh rate. The X-Plane 11 flight simulator software ([
63], version 11.20, Laminar Research, Columbia, SC, USA) was used, with graphics settings at medium. The aircraft used for the VR and the PCATD intervention groups was the Cessna 172, equipped with a Garmin G1000-style flightdeck. Identical flight controllers were also used, specifically the Thrustmaster T.16000M FCS HOTAS (Guillemot Corporation, Chantepie, France) [
64].
The flight pattern to be flown in the VR simulator, as well as for the PCATD, was a “mid-air square circuit”. During the circuit, the participant initially maintains the aircraft straight-and-level (Aircraft Rating Standard—Aeroplane Category—A3.2) for 30 s, with the aircraft maintained at a constant altitude, constant heading, and constant airspeed. Next, the participant performs a level turn (Aircraft Rating Standard—Aeroplane Category—A3.4) to the right, with an adequate bank angle and rudder input to achieve a balanced turn at three degrees of heading change per second (rate one) and at a constant altitude, for 30 s. Each of these is repeated a further three times, so as to form a “squircle” pattern, as shown in
Figure 3. This pattern is a highly streamlined version of that used by Ortiz [
65]. In the VR simulator, the pattern was flown only once, resulting in a total exposure period of approximately four minutes. This constituted a slightly shorter exposure time than is generally used to induce significant symptoms [
66,
67]. For the PCATD-based group, the same pattern was flown five times for a total exposure time of approximately 20 min. Both parts of the mid-air circuit (i.e., straight-and-level and turning) are ab initio manoeuvres that should be performed based on the visual relationship between the instrument coaming and the natural horizon. That is, for a large portion of time during each, the pilot should be maintaining their view outside the aircraft, with occasional, systematic scans of the internal instruments [
33,
68,
69].
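The nominal timing of the pattern described above can be checked with a short calculation. This is a sketch of the arithmetic only, under the assumption that each segment lasts exactly 30 s and that transitions between segments take no time; the constant and function names are illustrative.

```python
STRAIGHT_LEG_S = 30        # straight-and-level segment, seconds
TURN_LEG_S = 30            # level turn segment, seconds
RATE_ONE_DEG_PER_S = 3.0   # rate-one turn, degrees of heading change per second
SIDES = 4                  # the initial pair of segments plus three repeats


def heading_change_per_turn_deg() -> float:
    """Heading change produced by one 30 s rate-one turn."""
    return RATE_ONE_DEG_PER_S * TURN_LEG_S


def circuit_duration_s() -> int:
    """Nominal duration of one complete mid-air square circuit."""
    return SIDES * (STRAIGHT_LEG_S + TURN_LEG_S)


def exposure_min(circuits: int) -> float:
    """Nominal exposure time in minutes for a number of circuits."""
    return circuits * circuit_duration_s() / 60


print(heading_change_per_turn_deg())          # degrees per turn
print(SIDES * heading_change_per_turn_deg())  # total heading change per circuit
print(exposure_min(1), exposure_min(5))       # VR and PCATD exposure, minutes
```

Each turn therefore covers 90 degrees, four turns close the square, and the nominal exposure times agree with the approximately four minutes (VR) and 20 min (PCATD) stated above.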
The participants’ raw SSQ data for the two timepoints, before exposure and after exposure to the VR simulator, were each computed into a Total Severity (TS) score by the normal method [
8]. Handling the dimensions of the SSQ in this way—each of which is measured on a Likert-style scale and therefore produces ordinal data—results in interval data in population-level analyses [
70,
71]. Because a zero baseline score for the TS of the SSQ cannot be assumed [
72], the change score between timepoints was calculated. This change score of the TS of the SSQ is treated as presenting the experience of simulator sickness that is attributable to the VR simulator exposure. Considering the SSQ in terms of a change score between pre-exposure and post-exposure experience is not a new approach [
73,
74] and has been recommended for better interpretability [
75]. However, this approach must be studied in consideration of Lord’s paradox [
76]—as will be shown in the Discussion. Prior to the selection of statistical tests, the SSQ data (i.e., the change score of the TS of the SSQ) were checked for conformance with assumptions of normality and homogeneity of variance. Normality was checked by the Shapiro–Wilk test and homogeneity of variance by Levene’s test.
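The scoring and change-score steps described above can be sketched as follows. This assumes that the "normal method" is the commonly published SSQ scoring, in which the TS is the sum of the three raw subscale sums (nausea, oculomotor, disorientation) multiplied by 3.74; the function names and the example values are hypothetical and are not study data.

```python
# Sketch of SSQ Total Severity (TS) scoring and the pre/post change score.
# Assumption: TS = (nausea_raw + oculomotor_raw + disorientation_raw) * 3.74.
# The inputs are raw subscale sums, not individual item ratings.

SSQ_TS_WEIGHT = 3.74  # total-severity scaling constant (assumed standard value)


def total_severity(nausea_raw, oculomotor_raw, disorientation_raw):
    """Total Severity score from the three raw subscale sums."""
    return (nausea_raw + oculomotor_raw + disorientation_raw) * SSQ_TS_WEIGHT


def change_score(pre, post):
    """Post-exposure TS minus pre-exposure TS for one participant.

    `pre` and `post` are (nausea_raw, oculomotor_raw, disorientation_raw) tuples.
    """
    return total_severity(*post) - total_severity(*pre)


if __name__ == "__main__":
    # Hypothetical participant: one raw point before exposure, two after.
    print(change_score(pre=(1, 0, 0), post=(1, 1, 0)))
```

Under this scoring every TS value, and therefore every change score, is a multiple of 3.74, which helps explain why change-score data of this kind are coarse and heavily tied.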
In order to assess the differences in the central tendency of the three independent groups, having identified non-normality and heteroskedasticity, the Kruskal–Wallis test was employed. The null hypothesis was that there would be no difference in the central tendency of the total severity of simulator sickness among the pre-treatment groups: real-world flight experience, PCATD-based simulator exposure, and a control group that did not receive pre-treatment. The global null hypothesis ($H_{G0}$) was, therefore,
$$H_{G0}: M_1 = M_2 = M_3$$
where $M_1$, $M_2$, and $M_3$ represent the median rank total severity of the SSQ for the real-world, PCATD, and control groups, respectively. The alternate hypothesis was
$$H_{G1}: \text{not all } M_i \text{ are equal}, \quad i \in \{1, 2, 3\}$$
where $i$ represents each of the groups. If the Kruskal–Wallis test was significant, requiring the rejection of the global null hypothesis, a pairwise comparison to identify specific group differences was performed by Dunn post hoc testing. Thus, the conditional pairwise null hypothesis ($H_{P0}$) for each pair was
$$H_{P0}: M_x = M_y$$
where $x$ and $y$ represent each of the pairwise group comparisons. That is, the median of group $x$ is equal to the median of group $y$. The rejection of the global null hypothesis ($H_{G0}$) only indicates at least one pairwise difference, without confirming which pair[s] differ. The conditional alternate hypothesis was thus
$$H_{P1}: M_x \neq M_y$$
where each pairwise comparison is independently evaluated. Therefore, not all pairwise null hypotheses need to be rejected, only those for the pairs that the Dunn test identifies as significantly different. In order to control the family-wise error rate (FWER), and thus the Type I error rate across the pairwise comparisons, a Bonferroni correction was applied [77]. The effect size for the global hypotheses was assessed by the use of epsilon squared ($\varepsilon^2$), so as to quantify the strength of the difference among the groups. For the conditional pairwise hypotheses, the effect size ($r_{rb}$) quantifies the magnitude of difference for each significant pairwise comparison. These measures, in combination, assess the significance of the interventions, the relationship amongst the groups, and the practical implications of the change[s].
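For readers unfamiliar with the global test, the following is a minimal, dependency-free sketch of the Kruskal–Wallis statistic, assuming the usual tie-corrected form. The software and settings actually used in the study are not stated in this section, and the function names and example data are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of independent groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted_rank_sums = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted_rank_sums += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted_rank_sums - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tied sets.
    tie_counts = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in tie_counts) / (n ** 3 - n)
    return h / correction


if __name__ == "__main__":
    # Hypothetical change scores for three small groups (not study data).
    print(kruskal_wallis_h([[0.0, 3.74, 7.48], [0.0, 0.0, 3.74], [3.74, 7.48, 11.22]]))
```

With $k$ groups, $H$ is compared against a chi-square distribution with $k - 1$ degrees of freedom. The tie correction matters here because the change scores contain many repeated values.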
3. Results
The change score of the TS of the SSQ for the control ($M = 12.882$, $SD = 20.096$, $Md = 3.740$), real-world flight experience ($M = 3.142$, $SD = 5.142$, $Md = 0.000$), and PCATD-based simulator exposure ($M = 10.427$, $SD = 11.518$, $Md = 7.480$) groups required assumption checking prior to primary statistical testing. The change score of the TS for each group is shown in Figure 4. Testing by Shapiro–Wilk showed that the distributions of the data for the control group ($W = 0.669$, $p < 0.001$), flight experience group ($W = 0.656$, $p < 0.001$), and PCATD group ($W = 0.843$, $p < 0.001$) all differed significantly from normality, as is common and expected for data of this nature [78,79,80]. Levene’s test ($F_{\text{Levene}}(2, 82) = 6.635$, $p = 0.002$) was used to check for homogeneity of variance, with the data found to be heteroskedastic. Given the results of the assumption testing, statistical testing proceeded with the non-parametric Kruskal–Wallis test and associated Dunn post hoc test.
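As an illustration of the homogeneity check, the sketch below computes Levene's statistic from absolute deviations about each group mean. Whether the study centred on the mean or the median is not stated, so the centring is an assumption, and the function name and example data are hypothetical.

```python
def levene_statistic(groups):
    """Levene's statistic and its degrees of freedom, centred on group means."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of every observation from its own group mean.
    deviations = []
    for g in groups:
        mean = sum(g) / len(g)
        deviations.append([abs(v - mean) for v in g])

    group_means = [sum(z) / len(z) for z in deviations]
    grand_mean = sum(sum(z) for z in deviations) / n_total

    # One-way ANOVA on the absolute deviations.
    between = sum(len(z) * (m - grand_mean) ** 2
                  for z, m in zip(deviations, group_means))
    within = sum((v - m) ** 2
                 for z, m in zip(deviations, group_means) for v in z)

    df_between, df_within = k - 1, n_total - k
    statistic = (between / df_between) / (within / df_within)
    return statistic, (df_between, df_within)


if __name__ == "__main__":
    # Two hypothetical groups with visibly different spread (not study data).
    print(levene_statistic([[1, 2, 3], [2, 4, 6]]))
```

The degrees of freedom are $(k - 1, N - k)$, so the reported $(2, 82)$ for three groups is consistent with 85 participants in total.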
The results of the Kruskal–Wallis test ($H(2) = 8.374$, $p = 0.015$), shown in Table 1, comparing the change scores of the TS of the SSQ for the three groups, indicate that at least one group’s median rank differs from the others. The effect size ($\varepsilon^2 = 0.0997$) suggests that approximately 9.97% of the variance in the ranks is attributable to group differences, indicative of a small to medium effect. These results require the rejection of the global null hypothesis ($H_{G0}$) and proceeding with testing of the conditional pairwise hypotheses.
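The reported global statistics can be cross-checked with a few lines of arithmetic. This sketch assumes that epsilon squared was computed as $H / (N - 1)$ and that $N = 85$, a total inferred from the degrees of freedom of Levene's test rather than stated in this section. With two degrees of freedom, the chi-square survival function reduces to $e^{-H/2}$.

```python
import math

H_REPORTED = 8.374   # Kruskal-Wallis statistic from the text
N_TOTAL = 82 + 3     # assumption: inferred from Levene's df (2, 82) with three groups


def chi_square_sf_two_df(x):
    """Survival function of the chi-square distribution with 2 df: exp(-x / 2)."""
    return math.exp(-x / 2)


def epsilon_squared(h, n_total):
    """Epsilon-squared effect size, assuming the form H / (N - 1)."""
    return h / (n_total - 1)


if __name__ == "__main__":
    print(round(chi_square_sf_two_df(H_REPORTED), 3))
    print(round(epsilon_squared(H_REPORTED, N_TOTAL), 4))
```

Under these assumptions both values agree with those reported ($p = 0.015$, $\varepsilon^2 = 0.0997$).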
The Dunn post hoc pairwise comparisons, shown in Table 2, reveal significant differences in TS for two of the pairwise comparisons. The pairwise comparison between the flight experience group and the control group ($z = -2.429$, $p = 0.015$) and the pairwise comparison between the flight experience group and the PCATD-based simulator group ($z = 2.640$, $p = 0.008$) were statistically significant. However, the pairwise comparison between the PCATD-based simulator group and the control group ($z = 0.099$, $p = 0.921$) was not significant. These results remained consistent after the application of the Bonferroni correction. The effect sizes of the control–experience comparison ($r_{rb} = 0.375$) and the experience–PCATD comparison ($r_{rb} = 0.385$) each indicated a medium effect. The effect size for the control–PCATD comparison ($r_{rb} = 0.016$) was trivial, and the comparison was non-significant, suggesting that the difference likely occurred by chance and did not represent a meaningful effect. For the pairwise comparison between the control group and the PCATD-based simulator group, the results provide insufficient evidence to reject the conditional pairwise null hypothesis ($H_{P0}$). Conversely, for the pairwise comparisons between the flight experience group and the control group and between the flight experience group and the PCATD-based simulator group, the results require the rejection of the conditional pairwise null hypothesis in favour of the conditional pairwise alternate hypothesis ($H_{P1}$).
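The statement that the pairwise results hold after correction can be checked from the reported $z$ statistics. The sketch below assumes two-sided p-values from the standard normal distribution and a Bonferroni adjustment that multiplies each p-value by the number of comparisons (three), capped at 1. The dictionary keys are illustrative labels only.

```python
import math

# Reported Dunn z statistics (values from the text; keys are illustrative labels).
DUNN_Z = {
    "experience_vs_control": -2.429,
    "experience_vs_pcatd": 2.640,
    "pcatd_vs_control": 0.099,
}


def two_sided_p(z):
    """Two-sided p-value of a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}


raw_p = {name: two_sided_p(z) for name, z in DUNN_Z.items()}
adjusted_p = bonferroni(raw_p)

if __name__ == "__main__":
    for name in DUNN_Z:
        print(name, round(raw_p[name], 3), round(adjusted_p[name], 3))
```

Under these assumptions the unadjusted values match those reported, and the two comparisons involving the flight experience group remain below 0.05 after adjustment.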
4. Discussion
The use of flight simulators, including simulators based on consumer-grade VR-HMD technologies, for the training of flight-critical skills must be done in consideration of the secondary effects of such on the trainee. If the VR simulator is to be widely adopted for ab initio training, simulator sickness symptoms must be reduced to non-problematical levels. The method[s] by which a reduction is effected must be achievable without undue interference with the existing training syllabi—or else the conventional flight simulator may still be a better option. The change score of the total severity (TS) of the Simulator Sickness Questionnaire (SSQ) is here taken to adequately represent the change in the experience of simulator sickness symptoms attributable to the VR simulator experience.
Examining first the TS from the control group ($M = 12.882$, $SD = 20.096$, $Md = 3.740$), it is notable that both the mean and median SSQ scores were below the level (median TS < 20) at which simulator sickness is considered problematical [
81]. Interestingly, those conducting the original educational activity noted that none of the participants in the control group—or indeed in the other two groups—removed the VR-HMD, despite instructions to do so if they experienced simulator sickness symptoms. As such, the participants also self-assessed the severity as unproblematic. It is notable that the median values of the TS of the SSQ are somewhat lower for all three groups than in previous research [
82,
83,
84,
85]. It is unclear from the available data why this would be the case, although the short exposure time is likely a factor.
The results of the post hoc pairwise comparison indicate that the flight experience group differed significantly from both the control and the PCATD-based simulator groups, whereas the control and PCATD groups did not differ significantly from each other. These results, in combination with the medium effect size for the flight experience group compared to the control ($r_{rb} = 0.375$), indicate that flight experience, as an “intervention”, was associated with a significantly lower experience of simulator sickness. Prior flight experience appears to influence simulator sickness; however, it is not an intervention in the true sense, as it was neither assigned nor controlled. Within the experiment, as a nuisance factor, it introduced variability into the analysis, rather than serving as a deliberate treatment, and this necessitates the careful interpretation of its effects.
The development of compensatory skills during flight simulation, whether in VR or conventional simulators, presents a significant risk to aviation safety when these skills are taken into the aircraft and are found inappropriate for flight [
16]. As simulator sickness appears to be a factor likely to result in compensatory skills [
17]—due to learned adaptive behaviours that students use to minimise the occurrence of simulator sickness [
86]—its minimisation is paramount. This minimisation becomes even more critical as newer technologies, such as advancing display technologies (i.e., VR and xR), are likely to result in higher levels of simulator sickness—as has been the case historically. As has previously been observed, VR simulators are likely to emerge at the lower end of the fidelity hierarchy of flight simulators. Nonetheless, conventional simulators, such as PCATDs, are likely to remain available for some time—if not indefinitely. Earlier research on the temporal aspects of simulator sickness by Kennedy, Stanney and Dunlap [
50] suggests that longer exposure durations increase simulator sickness, whereas repeated exposures are known to decrease it. The use of existing PCATDs to acclimatise trainee pilots would therefore appear to be an efficient solution; however, the analysis herein suggests that those receiving PCATD exposure experienced symptoms of simulator sickness, as measured by the TS of the SSQ, that were not statistically different from those of the control group. The earlier research was based on repetition within the same virtual environment and not across environments [
18]. It is likely that the PCATD-based group’s acclimatisation was specific to the two-dimensional display used on this simulator and that this did not adequately prepare them for the three-dimensional display of the VR simulator. In other words, the display technologies between the two simulator modalities were sufficiently different to necessitate a new period of acclimatisation on the VR simulator.
A second option to mitigate the risk of simulator sickness—and thus reduce the likelihood of developing compensatory skills or otherwise negating the benefits of VR—is to limit the use of the VR simulator to a later phase of training, following an initial period of training airborne. The existing literature on the role of flight experience on simulator sickness is largely inconclusive in conventional simulators [
87]—with increased experience sometimes shown to increase incidents [
51,
88,
89]. Notwithstanding the previous literature, the present research supports the conclusion that flight experience (i.e., the flight experience group) results in lower levels of simulator sickness. The lower incidence of simulator sickness in the flight experience group may have resulted from this group developing a tolerance to sickness-inducing stimuli through repeated exposure to a provoking environment. That is, repeated exposure to environments with sickness-inducing stimuli, such as aircraft or even roller coasters, may help to develop tolerance to these stimuli [
86]. Those training organisations intending to adopt VR simulators would do well to delay the use of such until trainees have some flight experience. Instead, they could continue to rely on conventional PCATDs, or ‘chair flying’ [
90], as needed. Regardless, given the nature of simulator sickness [
7], the careful monitoring of trainees for symptoms and adjustments to individual training syllabi should be considered. This approach would require the honest admission of issues from the student when this might risk their standing, empathy from overworked flight instructors, and finally the willingness by flight schools to adjust the training to the needs of the individual student—it is difficult to know which is less likely.
In considering the results of the present analysis, the potential sources of invalidity, particularly arising from the design and analysis, ought to be considered. The non-equivalent control group design, which is a quasi-experimental compromise of the pre-test–post-test control group design where true randomisation has not occurred, is subject to potentially inherent sources of invalidity [
91]. As regards the internal sources of invalidity, unlike the true-experimental equivalent, this design is subject to
regression toward the mean. That is, in this context, extreme values in the pre-test—whether high or low—will tend to moderate toward the mean on subsequent measurements. The use of a change score, as was the case in this analysis, will partially address this potential source of invalidity by minimising spurious effects of the initial measurement and focussing on the relative change [
92].
Maturation must also be considered here, in the broader conception presented by Onwuegbuzie [
93] as including all of the processes that operate in a participant due, in part or in whole, to the passage of time. Although
maturation is not an inherent threat to the validity of this design, the differential intervention times and the temporal proximity of the intervention to testing may introduce this source. The interventions herein (i.e., prior flight experience and PCATD exposure) did not each occur immediately prior to the test; flight experience may have occurred years earlier, whereas the use of the PCATD occurred very shortly beforehand and could conceivably have fatigued the participants. Further, as the originating event was a purely educational activity, no particular concern was given to ensuring consistent and sufficient washout. Therefore,
maturation cannot be disregarded as a source of internal invalidity.
The non-equivalent control group design is potentially subject to all of the common sources of external invalidity, except for multiple-treatment interference [
91].
Interaction of testing and X (i.e., testing–treatment interaction), wherein the pre-testing of participants changes their behaviour or awareness—and, in this research, imposed the expectation of symptoms—can reduce the generalisability of findings [
94]. In the case of simulator sickness, which cannot be assumed to be zero at baseline [
72], it is not certain how this source can be addressed. A judgment must, therefore, be made as to whether the assumption of a zero baseline TS score or a testing–treatment interaction poses a greater threat to the research validity.
Interaction of selection and X (i.e., selection–treatment interaction), which reduces the generalisability of results where the participants are not sufficiently representative of the population, cannot be ruled out in this research. That the participants were undergraduate university students does not, necessarily, differentiate them from the population of concern (i.e., aviation professionals) [
95]. However, the lack of demographic data makes controlling for this source impractical. It could also reasonably be argued that the flight experience group must, in some way, differ from the other two groups; however, this difference is precisely why such experience is treated as a nuisance factor. Nonetheless, this may also introduce natural variation in the outcomes that is independent of any controlled manipulation. The final potential source of external invalidity—reactive arrangements—is the source most fully controlled for in this research. The origins of the data virtually excluded the possibility of the Hawthorne effect (see [
96]), as neither the laboratory demonstrators nor the students were aware, at the time of the original educational activity, that the data would later be used for research. Furthermore, the relative chaos of the undergraduate laboratory would seem an ideal analogue for the areas assigned by flight schools for their basic simulators—thereby improving the generalisability.
The use of a change score of the TS, although helpful in partially addressing
regression toward the mean, must be considered in light of Lord’s paradox. That is, the same data used to compute a change score (i.e., post-test minus pre-test) could alternatively be analysed using an ANCOVA, with the pre-test score serving as a covariate to predict the dependent post-test score. However, this approach is not reasonable for the present data, due to the presence of both non-normality and heteroskedasticity. ANCOVA is robust to a single violation (of either normality or homogeneity of variance) but not to both [
97], rendering ANCOVA unviable for this analysis. Notably, had ANCOVA been employed, the prior flight experience of some participants could have been treated as a covariate. As change scores were used instead, it was necessary to treat such flight experience as a nuisance factor rather than a controlled variable. The statistical approach employed would not otherwise permit a structured adjustment, as would be possible with ANCOVA. Although the transformation of the data was (theoretically) possible, this is both extremely difficult in this design and risks a significant reduction in the interpretability and comparability of the results.
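The contrast at the heart of Lord's paradox can be made explicit. As a sketch only, with notation that is assumed here rather than taken from the original analysis ($G_i$ is a group indicator, and $Y_i^{\text{pre}}$ and $Y_i^{\text{post}}$ are the pre-exposure and post-exposure TS of participant $i$), the two candidate models for a two-group comparison are:

```latex
\begin{align}
  \text{Change score:}\quad
    & Y_i^{\text{post}} - Y_i^{\text{pre}} = \alpha + \tau\, G_i + \varepsilon_i \\
  \text{ANCOVA:}\quad
    & Y_i^{\text{post}} = \alpha' + \tau'\, G_i + \beta\, Y_i^{\text{pre}} + \varepsilon_i' \\
  \text{Relationship:}\quad
    & \hat{\tau} - \hat{\tau}' = (\hat{\beta} - 1)
      \bigl(\bar{Y}^{\text{pre}}_{G=1} - \bar{Y}^{\text{pre}}_{G=0}\bigr)
\end{align}
```

The change-score model is the ANCOVA model with $\beta$ fixed at 1. The two estimates of the group effect therefore agree only when the fitted slope equals 1 or the groups share the same mean baseline; otherwise, the same data can support different conclusions.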
The results of the present research, although partially positive and a contribution to the literature on simulator sickness in VR flight simulation, represent only an initial evaluation. Much further research on VR flight simulators, as regards simulator sickness and in general, is needed to reach the depth and quality that exists for conventional simulators. Future research on the occurrence of simulator sickness in such simulators, and other extended reality simulators, would benefit from additional data and design. The lack of detailed demographic data, as previously outlined, limits the deeper analysis of underlying trends—for example, determining whether the level of flight experience (i.e., high or low) in ab initio students, which has a negligible impact in conventional simulators [
98], is a factor in VR. That the original activity was intended only as a classroom activity, and was not designed as research, precluded the incorporation of certain types of data capture and procedures. In addition to capturing more detailed demographic data, the capture of workload (e.g., NASA-TLX [
99]) and flight performance data could provide useful covariates. Future researchers should also incorporate true randomisation, consider the required washout period between the PCATD and VR, and increase the exposure time in VR to ensure that the experience is representative. Notwithstanding the limitations and ambitions for future research, the present research advances the adoption of VR simulators through the evaluation of methods that can be practically incorporated, within flight training syllabi, to reduce simulator sickness. It is the only known research on this important topic concerning the effect of prior experiences on the likelihood and severity of simulator sickness when using VR simulation.