Virtual and Augmented Reality Direct Ophthalmoscopy Tool: A Comparison between Interactions Methods

Abstract: Direct ophthalmoscopy (DO) is a medical procedure whereby a health professional, using a direct ophthalmoscope, examines the eye fundus. DO skills are in decline due to the use of interactive diagnostic equipment and insufficient practice with the direct ophthalmoscope. To address the loss of DO skills, physical and computer-based simulators have been developed to offer additional training. Among the computer-based simulations, virtual and augmented reality (VR and AR, respectively) allow simulated immersive and interactive scenarios with eye fundus conditions that are difficult to replicate in the classroom. VR and AR require employing 3D user interfaces (3DUIs) to perform the virtual eye examination. Using a combination of a between-subjects and within-subjects paradigm with two groups of five participants, this paper builds upon a previous preliminary usability study that compared the use of the HTC Vive controller, the Valve Index controller, and the Microsoft HoloLens 1 hand gesticulation interaction methods when performing a virtual direct ophthalmoscopy eye examination. The work described in this paper extends our prior work by considering the interactions with the Oculus Quest controller and Oculus Quest hand-tracking system to perform a virtual direct ophthalmoscopy eye examination while allowing us to compare these methods with our prior interaction techniques. Ultimately, this helps us develop a greater understanding of usability effects for virtual DO examinations and virtual reality in general.
Although the number of participants was limited, n = 5 for Stage 1 (including the HTC Vive controller, the Valve Index controller, and the Microsoft HoloLens hand gesticulations), and n = 13 for Stage 2 (including the Oculus Quest controller and the Oculus Quest hand tracking), given the COVID-19 restrictions, our initial results comparing VR and AR 3D user interactions for direct ophthalmoscopy are consistent with our previous preliminary study where the physical controllers resulted in higher usability scores, while the Oculus Quest’s more accurate hand motion capture resulted in higher usability when compared to the Microsoft HoloLens hand gesticulation.


Introduction
In the context of medical education and training, simulation is defined as "an artificial, yet faithful, representation of clinical situations through the use of analog and digital apparatuses" [1]. Simulation relies on a number of physical assets including manikins, cadavers, standardized patients, animals, devices, and computer-based simulation (CBS), among other methods of imitating real-world systems [2]. CBS has been gaining momentum due to the availability of low-, mid-, and high-end immersive technologies used as complementary training and educational tools where learners are able to develop procedural and declarative knowledge under a controlled and safe environment [3]. The use of CBS in medical education allows for the safe exposure to hazardous and life-threatening situations otherwise impossible in the real world [4]. The level of immersion and presence possible with CBS, particularly with virtual reality (VR) (e.g., virtual experiences that are typically viewed through a head-mounted display (HMD)) and augmented reality (AR) (e.g., experiences where computer content is overlaid on the real world and visualized through a handheld device or HMD), in addition to cross-sensory cues from spatial audio and haptic feedback, can provide a highly immersive and engaging learning environment.
The eye fundus examination is a standard medical examination [5], which allows for the early identification of conditions associated with high blood pressure and diabetes mellitus, among others [6]. Within the eye fundus examination, the direct ophthalmoscopy (DO) examination requires the health professional to use a direct ophthalmoscope, to search for anomalies within the eye. Relative to other eye examination methods, including those employing slit lamps for eyelid examination or the tonometer for gauging the intraocular pressure of the eye, direct ophthalmoscopy remains the most cost-effective method and is widely available in urban and rural healthcare facilities [7]. Teaching and evaluating DO examination competency is particularly challenging; DO training requires one ophthalmoscope per student and a set of eye fundus samples to evaluate. Due to the limitations associated with the use of live patients, images and digital renderings are often used. Instructors provide verbal descriptions during training, walking trainees through the anatomical landmarks and possible eye fundus-related conditions [8]. The use of images lacks the proper representation of the volumetric shape of the eye fundus, thus limiting the development of spatial awareness and the patient interaction skills needed for live examinations [9].
Typically, DO simulator-related studies focus on validating the effectiveness of the simulation with respect to other simulators or traditional practice [10]. However, this is not the only question regarding VR- and AR-based simulation technologies. VR- and AR-based DO simulators require input devices that are representative of the direct ophthalmoscope, and the choice of input device can have a significant impact on the ease of use of the entire simulation. The potential impact of interaction techniques on VR- and AR-based DO simulators has led us to explore usability issues associated with mobile AR used in combination with a Styrofoam head and a 3D-printed direct ophthalmoscope replica as alternative tangible user interfaces [11], tabletop displays for multiuser visualization, interaction and augmentation [12], and early prototyping of VR DO eye examination [4]. Despite recent advances in custom-made user interfaces employing 3D printing and open electronics, virtual simulators continue to employ off-the-shelf VR controllers and gestural inputs that may impact usability.
In this paper, we build upon our previous preliminary work that compared the usability between current VR controllers and hand-tracking interaction methods employed in VR- and AR-based DO simulators. Our aim is to develop a greater understanding of how controller and hand-tracking interaction methods, including the widely used HTC Vive controller, Valve Index controller, Microsoft HoloLens 1 hand gesticulation, Oculus Quest controller, and Oculus Quest hand tracking, affect usability when performing a virtual eye examination. To address these combinations, we conducted a within-subjects study in two stages. The first stage saw five participants randomly exposed to the HTC Vive, Valve Index, and Microsoft HoloLens hand gesticulation 3D user inputs. The second stage saw 13 participants randomly exposed to the Oculus Quest controller and the Oculus Quest hand-tracking system.

Background
Eye fundus examination training typically focuses on reproducing and mimicking the procedure employing interchangeable images (e.g., printed or digital pictures) [10]. Such an approach has proven to be cost-effective as instructors and trainees can access pictures representing numerous eye conditions that are otherwise difficult to reproduce in the classroom. Ricci et al. [10] and Chan et al. [13] conducted reviews of various approaches to DO simulation and training, indicating growth in the adoption of high- and low-fidelity simulators, with a particular increase in low-cost and cost-effective solutions for providing more training opportunities to students.

High-End Simulators
High-fidelity simulators include those that combine photorealistic depictions of the eye fundus in addition to patient behavior and learning management system features for facilitating learning assessment. For example, the EYEsi Direct Ophthalmoscope Simulator (EYEsi DOS) [10] offers a realistic training experience featuring a touchscreen interface that shows the procedure for the instructor to observe while the trainee interacts with an artificial human face that is examined with a special ophthalmoscope replica operated by the trainee. Currently, the EYE Exam Simulator (Kyoto Kagaku Co. Ltd., Kyoto, Japan) and the EYEsi direct ophthalmoscope simulator (VRmagic, GmbH, Mannheim, Germany) appear to be the two most representative commercially available high-end simulators.
Tso et al. [14] conducted a study to examine the role of the EYEsi DOS on trainees' confidence when performing DO in comparison to traditional lecture-based teaching and to determine which teaching method was superior and why. Results from 31 participants indicate that confidence ratings (mean = 3.58/5.0) were significantly higher for those that used the EYEsi DOS than those following the traditional lecture-based teaching session (mean = 2.19/5.0). Additionally, four-fifths of respondents felt that the EYEsi DOS was superior to lecture-based teaching given the guidance provided to the trainee while using the EYEsi DOS during the examination through the use of labels to facilitate pathological findings, while also providing trainees the opportunity to study conditions not possible in traditional lectures. Furthermore, the EYEsi DOS library includes hands-on documentation that is available to the trainee during the simulation.
Boden et al. [15] evaluated the effectiveness and acceptance of learning direct ophthalmoscopy on a simulator in comparison to lecture-based teaching. The study had 34 participants divided into two groups, one receiving the instruction employing the EYEsi DOS and the other receiving lecture-based instruction. After completing the training sessions, the two groups were assessed employing the objective structured clinical examination (OSCE), a type of examination often used in health sciences. Those who used the simulator were 91% more successful than those with the traditional method, with a success of 78% when evaluated with the OSCE.

Physical Simulators
The elevated costs and requirements associated with high-fidelity simulators have sparked research and development toward cost-effective solutions that can be deployed to trainees without the availability or access to specialized equipment. While high-definition photographs [16], video demonstrations [17], mobile applications [18], multimedia websites [17], and most recently, 3D computer-generated models [19], with various levels of realism, have been gaining momentum, such solutions share a lack of tangible user interfaces. To address this limitation, Chung and Watzke [20] proposed a simple and inexpensive device to aid in ophthalmoscopy teaching. This device employs a cylindrical plastic canister that was altered to have an artificial pupil and a replaceable fundus photograph for simulating various eye fundus conditions. The canister was employed in laboratories with medical students, family practice residents, and physician assistant students who reported finding the tool easy to use and helpful for finding morphology and diseases in the pictures.
Wu et al. [21] present the FAK-I model designed using low-cost materials including an opaque, plastic bottle cap that replicates the anterior-posterior dimension of the human eye. High-resolution, digital retinal images were created and printed on matte paper to minimize glare. A retention study with 12 participants who were randomly divided into two equal groups was conducted. After 45 days of no further intervention, those who trained with the FAK-I model had a higher percentage of correct responses (44%) and reported deliberately practicing more than the control group. However, the results did not reach statistical significance. Furthermore, Kelly et al. [22] conducted the Teaching Ophthalmoscopy To Medical Students (TOTeMS) study, where 138 first-year medical student participants were randomly divided into three groups depending on the type of direct ophthalmoscopy training they received. One group received training in direct ophthalmoscopy using the FAK-I simulator, another group received training with human patient volunteers, and another group received training employing digital photographs. Results show that 71% of participants preferred human patients to simulators, and 77% preferred photographs for learning relevant features of the ocular fundus since they found them to be easier and less frustrating than using the ophthalmoscope on simulators or human patients. Kim and Humphries [23] proposed a simple eye model using inexpensive materials including a Styrofoam head, a hard-boiled egg, and a contact lens depicting the iris. Residents at the University of Kentucky expressed feeling more confident after using this inexpensive training tool.

Mobile, AR, and VR Simulation
DO simulation with consumer-level devices has been gaining momentum as immersive technologies such as VR and AR, in particular, become widely available. Furthermore, computer-based simulation (CBS) overcomes some of the limitations associated with traditional eye fundus examination.
Acosta et al. [11] propose an AR solution that digitally augments a Styrofoam head without relying on physical photographs by employing markers and overlaying the eye fundus on the eye sockets when viewed with a mobile phone [24]. However, this approach was modified to a 3D-printed ophthalmoscope replica that overlays the eye fundus since the manipulation of the Styrofoam head and the mobile device was difficult. A preliminary usability study with 17 participants compared virtual DO examination employing the 3D-printed ophthalmoscope and a mobile VR headset that provided stereoscopic viewing, a mobile phone with a marker placed on a wall, and the EYEsi DOS. Results showed a trend whereby the participants perceived the mobile AR solution to be more usable than the one with the ophthalmoscope replica but less usable when compared to the EYEsi DOS. Another example employs a tangible user interface and an AR-based approach that allows multiple trainees to interact with an eye fundus displayed on both the screen and mobile devices [12].
With respect to VR, Chan et al. [25] conducted a usability and cognitive load within-subjects study with 10 participants that compared the system usability scale (SUS) and the NASA Task Load Index (TLX) scores focusing on developing an understanding of the differences between three modes of 3DUIs, and more specifically, the HTC Vive controller, the Valve Index controller, and the Microsoft HoloLens 1 hand gesticulation when performing a virtual DO examination. The results of this preliminary study found that the Valve Index controller had the highest SUS score with 92.5/100, followed by the HTC Vive controller with a score of 88.5/100, and lastly, a score of 41.3/100 for the Microsoft HoloLens 1 hand gesticulation. With respect to TLX scores, results were consistent with the SUS results. More specifically, the Valve Index controller resulted in a score of 92.5, followed by the HTC Vive controller resulting in a score of 89.16, and lastly, the Microsoft HoloLens 1 hand gesticulation which resulted in a score of 60.83. Overall, these preliminary results indicate that physical controllers that capture finger movement are perceived as more usable with less cognitive load when performing the virtual DO examination.

Materials and Methods
The virtual examination scenario used here was developed following current eye fundus examination practices conducted at the Clinical Simulation Center in Universidad Militar Nueva Granada, Bogota, Colombia. The scenario also builds on previous research conducting preliminary usability and cognitive load testing [25].

The Virtual DO Procedure
The virtual DO examination begins with the trainee holding the ophthalmoscope and aligning the aperture of the device with the patient's eye. Trainees are expected to examine the patient's right eye with their own right eye and the patient's left eye with their own left eye to avoid nose-to-nose contact with the patient [26]. This process requires approaching the patient while maintaining the red reflex in focus until the ophthalmoscope is as close to the patient as possible. The red reflex is caused by light being reflected on the pupil, and by approaching it, a small portion of the retina becomes visible through the ophthalmoscope [26]. The first anatomical landmark to be located is the optic disc, or optic nerve head, which presents a yellow-orange hue and is located approximately 15 degrees from the patient's nose. Once the retina is in focus, blood vessels are localized and traced back against the branching pattern to the optic disc and the macula next to it. To further explore the eye fundus, the examiner orients the ophthalmoscope and asks the patient to look in different directions. In addition to the optic nerve, the "optic cup" is the second-most notable anatomical landmark, characterized as a pale depression at the center of the optic disc. In summary, the virtual DO examination procedure requires locating and holding the direct ophthalmoscope, approaching the patient, locating the red reflex, and observing the retina, the optic disc, and the macula for abnormalities.

DO Virtual Examination Scenario
Building on our previous work [25], the virtual examination scenario was updated after conducting a within-subjects usability testing session with five participants (Section 5). The updated scenario includes a virtual static ophthalmoscope that allows for the examination of the eye fundus when tracking issues cause the virtual ophthalmoscope to jitter and thus increase the difficulty of the task. The previous and current scenarios are presented in Figure 1.

The Virtual Eye Model
The virtual eye was modeled after the human eye and included eyelid animations and subtle eye movements found during the eye fundus examination. These animations were implemented to increase realism and to avoid an unrealistic and uncanny virtual patient. Additionally, the eye model comprised a hollow interior depicting the anatomical landmarks, with invisible colliders that trigger their detection during the virtual examination. Figure 2 presents the virtual examination scene with a close-up of the floating standalone virtual eye model and its integration into two virtual avatars, 'Lyette' and 'Jimothy'. To facilitate the location of anatomical landmarks, as shown in Figure 2, the following conventions were used: (i) a transparent white circle indicates where the optic disc is located, (ii) a transparent blue circle indicates where the optic cup is located, and (iii) a transparent green circle indicates where the macula is located.
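The text does not state how the invisible colliders are queried. One common approach, assumed here purely for illustration, is to cast a ray along the ophthalmoscope's viewing axis and test it against a spherical collider placed on each landmark. The following minimal Python sketch shows that geometric test; the function name, the coordinates, and the collider radius are hypothetical and are not taken from the simulator.

```python
import math


def ray_hits_sphere(origin, direction, center, radius):
    """Return True if a ray starting at `origin` along `direction`
    intersects a sphere (hypothetical landmark collider)."""
    length = math.sqrt(sum(d * d for d in direction))
    if length == 0:
        raise ValueError("direction must be a non-zero vector")
    unit = [d / length for d in direction]

    # Vector from the ray origin to the sphere center.
    to_center = [c - o for o, c in zip(origin, center)]
    dist_sq = sum(v * v for v in to_center)

    # Distance along the ray to the point closest to the center.
    along = sum(v * u for v, u in zip(to_center, unit))
    if along < 0:
        # Sphere is behind the ray; it only counts if the origin is inside it.
        return dist_sq <= radius * radius

    # Squared distance from the center to the closest point on the ray.
    closest_sq = dist_sq - along * along
    return closest_sq <= radius * radius


# Hypothetical values (metres): viewing axis along +z, landmark 0.5 m ahead.
print(ray_hits_sphere((0, 0, 0), (0, 0, 1), (0, 0, 0.5), 0.05))    # on axis
print(ray_hits_sphere((0, 0, 0), (0, 0, 1), (0.2, 0, 0.5), 0.05))  # off axis
```

In a game engine the same check would normally be delegated to the built-in physics queries rather than written by hand.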

Virtual Avatars
The examination scenario was designed to introduce users to the virtual eye examination employing three virtual avatars positioned at examination stations containing information about the tasks to be performed (Figure 1). The first station (Figure 2a

3D User Interactions
Our previous work focused on the HTC Vive and the Valve Index controllers for the VR interactions and the Microsoft HoloLens 1 gesticulation system for the AR experience. The Unity game engine was used to create the VR and AR experiences. The VR version of the environment was developed in Unity 2019.3.0f5 with the SteamVR plug-in, while the AR version was developed using Unity 2017.2.5f1 compatible with the Microsoft Academy Mixed Reality Toolkit. To enable compatibility with current consumer-level VR and AR hardware such as the Oculus Quest series of HMDs, we used the Air Light VR (ALVR) software, an open-source remote VR display software. ALVR allows streaming content from a computer running the VR simulation into a standalone VR headset [27]. ALVR allows for the addition of the Oculus Quest's hand tracking as an alternative to the controller inputs.
In the VR simulation, the user is required to wear a headset and navigate the virtual environment by physically walking within a pre-established room-scale boundary. When using the HTC Vive, users are required to set their tracking space to room-scale with a minimum size of 2.0 m × 1.5 m, while when using the Oculus Quest, the minimum tracking space is 2 m². Depending on the size of the VR interactive area, the simulation places the examination scene within the proximity of the user. It is important to note that this version does not support stationary VR modes; requiring physical walking avoids locomotion through the controllers, hand gestures, or teleportation, and thereby reduces possible simulator sickness effects.
During the virtual eye fundus examination, the virtual direct ophthalmoscope (VDO) is operated by employing the HTC Vive or the Valve Index controllers when using the HTC Vive headset or employing the Oculus Quest controller or hand tracking when using the Oculus Quest 1 or Oculus Quest 2 in the VR simulation. Pinching, supported by the Microsoft HoloLens 1 hand gesticulation, is employed with the AR simulation (see Figure 3). By operating the VDO, the learner can identify the anatomical landmarks on an eye fundus. To perform the virtual eye fundus examination, first, the VDO has to be located, then reached, and secured. The VDO is placed on a table to the left of the virtual environment with an introduction and instructions. In the VR simulation, when using the HTC Vive controller, the VDO can be secured and gripped by reaching out and pressing the grip buttons on each side of the HTC Vive controller. When the Valve Index controllers are being used, the VDO can be secured by closing the hand around the controller simulating the feeling of grasping the VDO due to the finger proximity sensors and haptic feedback. When using the Oculus Quest controllers, the VDO can be secured by pressing and holding the grip button, while, when using hand tracking, the VDO can be grasped and held by closing the hand around it. When using the AR simulation, the VDO is secured by pointing the Microsoft HoloLens 1 reticle (i.e., a circular marking built into a holographic projection on the screen to enable interactions with the pinch gesture during AR experiences), at it and selecting it by performing a pinch gesture employing the index and thumb fingers. The same pinch gesture is used to hold and move the VDO during the virtual examination. Figure 3 presents the virtual examination with the 3D input methods in addition to an ophthalmoscope view of the eye fundus.
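The grasping rules described above can be summarized in a short Python sketch. It is an illustration only: the method identifiers, the `InputState` fields, and the closure threshold are hypothetical, and the text does not state whether the HTC Vive grip is held or toggled, so a held grip is assumed.

```python
from dataclasses import dataclass


@dataclass
class InputState:
    """Hypothetical per-frame snapshot of the user's input."""
    grip_pressed: bool = False   # grip button on the Vive or Quest controller
    hand_closure: float = 0.0    # 0.0 (open hand) to 1.0 (closed hand)
    pinching: bool = False       # HoloLens 1 index-thumb pinch on the reticle
    hand_tracked: bool = True    # False when the hand leaves the sensor view


CLOSURE_THRESHOLD = 0.7  # assumed value, not taken from the simulator


def is_vdo_held(method, state):
    """Return True if the virtual direct ophthalmoscope counts as held."""
    if method in ("vive_controller", "quest_controller"):
        # Assumption: the VDO is held only while the grip button is pressed.
        return state.grip_pressed
    if method == "index_controller":
        # Finger proximity sensors report how far the hand is closed.
        return state.hand_closure >= CLOSURE_THRESHOLD
    if method == "quest_hand_tracking":
        # The hand must be visible to the headset cameras and closed.
        return state.hand_tracked and state.hand_closure >= CLOSURE_THRESHOLD
    if method == "hololens_pinch":
        return state.pinching
    raise ValueError(f"unknown input method: {method}")


print(is_vdo_held("quest_controller", InputState(grip_pressed=True)))
print(is_vdo_held("quest_hand_tracking",
                  InputState(hand_closure=1.0, hand_tracked=False)))
```

Under these assumptions, releasing the grip button or losing hand tracking immediately releases the VDO.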

Study Design
The study was conducted in two stages. The first stage (interrupted by government-implemented COVID-19 lockdowns to minimize the virus spread) was a within-subjects study focused on analyzing the usability, workload, and difficulty after having added the static ophthalmoscope when employing the HTC Vive controller, the Valve Index controller, and the Microsoft HoloLens 1 hand gesticulation for conducting the virtual eye examination. The interventions were randomized to ensure all participants were exposed to the three user inputs in a different order to minimize carry-over effects. This study was reviewed by the Ontario Tech University Research Ethics Board (REB# 15526) and approved on 7 November 2019. The second stage was an online within-subjects study (conducted during the COVID-19 pandemic) focused on usability and engagement. This study was reviewed by the Ontario Tech University Research Ethics Board (REB# 15128) and approved on 22 January 2021. The virtual examination procedure was the same for both stages, requiring the participants to grasp the VDO and approach the three virtual patients in sequential order starting with the floating eye, then 'Lyette', and finally 'Jimothy'. Each virtual patient was accompanied by a set of floating instructions in text format indicating what the participant was required to do when examining the virtual eye fundus.
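The Stage 1 ordering constraint, where every participant receives the three inputs in a different order, can be sketched as follows. This is one possible procedure under stated assumptions, not the one documented for the study: the function name, participant identifiers, and seed are hypothetical.

```python
import itertools
import random


def assign_orders(participants, conditions, seed=None):
    """Give each participant a distinct random ordering of the conditions."""
    rng = random.Random(seed)
    orders = list(itertools.permutations(conditions))
    if len(participants) > len(orders):
        raise ValueError("not enough distinct orders for every participant")
    # Sampling without replacement guarantees no two participants share an order.
    chosen = rng.sample(orders, k=len(participants))
    return dict(zip(participants, chosen))


conditions = ["HTC Vive controller", "Valve Index controller", "HoloLens 1 gestures"]
participants = ["P1", "P2", "P3", "P4", "P5"]  # hypothetical identifiers

for participant, order in assign_orders(participants, conditions, seed=1).items():
    print(participant, "->", ", ".join(order))
```

With three conditions there are six possible orders, so five participants can each receive a unique one.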

Participants
Five participants from Ontario Tech University in Oshawa, Ontario, Canada, were recruited for Stage 1. It is important to highlight that additional participants were not recruited for Stage 1, given the difficulties associated with the restrictions imposed by the COVID-19 pandemic. In total, 13 participants, 9 from Ontario Tech University and 4 from undisclosed affiliations, were recruited for Stage 2. Stage 2 was conducted remotely, and therefore, all participants were required to have access to an Oculus Quest 2 headset, a VR-ready computer, and 5 GHz local area network connectivity for completing the VR tasks by streaming content from the computer to the headset using ALVR. The inclusion criteria and the COVID-19 pandemic made it difficult to recruit a larger number of participants. All participants completed the study, and all participants reported being familiar with VR and AR during the introduction to the study and not having any condition that would prevent them from performing the virtual eye fundus examination. Participant background with eye fundus examination was not considered as an exclusion factor since the information presented in the procedure was sufficient for novice trainees.

Evaluation Criteria

Usability-Stage 1 and Stage 2
The system usability scale (SUS) questionnaire [28] is regarded as a quick method for measuring system usability. The questionnaire asks users to rate levels of agreement through a 5-point Likert scale with statements that cover a variety of usability characteristics such as the system's complexity, ease of use, and need for assistance, amongst others. After calculating the SUS score according to [28], a score above 68/100 indicates that the system is usable.
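As a minimal sketch of the standard SUS scoring procedure, the calculation can be written in Python as follows. The function name and the sample responses are illustrative assumptions and are not data from this study.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for ten 1-5 Likert responses.

    Standard scoring: odd-numbered items are positively worded and
    contribute (response - 1); even-numbered items are negatively worded
    and contribute (5 - response). The sum is multiplied by 2.5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:      # items 1, 3, 5, 7, 9
            total += response - 1
        else:                   # items 2, 4, 6, 8, 10
            total += 5 - response
    return total * 2.5


# Hypothetical responses from a single participant (not study data).
example = [4, 2, 5, 1, 4, 2, 4, 2, 5, 1]
print(sus_score(example))  # 85.0, above the 68/100 usability threshold
```

Scores for individual participants are then averaged per interaction method before being compared with the 68/100 threshold.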

Task Workload-Stage 1
The NASA Task Load Index (TLX) [29] provides a method of measuring mental workload for each user as they complete the tasks. The Raw NASA TLX (RTLX) was chosen to derive an overall workload score based on an unweighted average of the ratings associated with mental demand, physical demand, temporal demand, performance, effort, and frustration [30].
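A minimal Python sketch of the RTLX computation is given below, assuming each subscale is rated on a 0-100 scale and that, as is usual for the raw variant, the six ratings are combined with equal weights. The function name, dictionary keys, and ratings are hypothetical and are not study data.

```python
SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def rtlx_score(ratings):
    """Return the Raw TLX workload score as the mean of the six subscales."""
    missing = [name for name in SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscale ratings: {missing}")
    if any(not 0 <= ratings[name] <= 100 for name in SUBSCALES):
        raise ValueError("ratings are assumed to lie between 0 and 100")
    return sum(ratings[name] for name in SUBSCALES) / len(SUBSCALES)


# Hypothetical ratings from a single participant (not study data).
example_ratings = {
    "mental_demand": 40,
    "physical_demand": 30,
    "temporal_demand": 20,
    "performance": 25,
    "effort": 45,
    "frustration": 20,
}
print(rtlx_score(example_ratings))  # 30.0
```

Higher scores indicate a higher perceived workload.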

Task Difficulty-Stage 1
User-perceived task difficulty is captured employing a five-point Likert scale question, where "Strongly Disagree" is "1", "Disagree" is "2", "Neutral" is "3", "Agree" is "4", and "Strongly Agree" is "5". This question focuses on how difficult each method of interaction is with respect to locating the eye fundus landmarks when examining the virtual patients.

Stage 1
After agreeing to be part of the study and coordinating a participation date, the participants met with a facilitator in the GAMER Laboratory at Ontario Tech University in Oshawa, Ontario, Canada. Upon completion of the informed consent form, the participants were introduced to the HTC Vive controller, the Valve Index controller, and Microsoft HoloLens 1 hand gesticulation. Then, the participants were randomly assigned to each input device and required to perform an eye fundus virtual examination. After completing each examination with the designated user input device, the participants completed the SUS questionnaire, the NASA RTLX questionnaire, and the task difficulty question. After completing the study, participants received a verbal thank you from the facilitator.

Stage 2
After agreeing to be part of the study and coordinating a date for the study, the participants met over Discord and Google Meet with a facilitator. Upon completion of the informed consent form, the participants were introduced to the virtual scenario and were asked if they had any issues running ALVR with their Oculus VR headsets. Three of the participants had technical issues, and the facilitator helped them troubleshoot the problem until the simulation was running. Then, the participants were randomly assigned to either the Oculus Quest controller or the Oculus Quest hand tracking first and after completing the virtual examination, the participants completed the SUS questionnaire. Once finalized, the participants received a verbal thank you from the facilitator.

Results
Stage 1 results are limited due to the small sample size, although five participants are on average sufficient to find most of the usability problems that affect one-third or more of the users [31]. Furthermore, Nielsen and Landauer [31] indicate that adding more participants results in observing the same usability issues again and again. However, the small number of participants can limit the identification of significant effects associated with the data. To determine the statistical power of the repeated measures ANOVA SUS results, we conducted a post hoc power analysis employing G*Power [32]. The effect size f(U) = 0.62 (SPSS style) indicates a medium effect with an α error probability of 0.05 and produces a statistical power of 51%.
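The five-participant argument can be illustrated with a simple probability sketch. It assumes, as a simplification, that each participant independently encounters a given usability problem with the same probability p; the function name is hypothetical.

```python
def discovery_probability(p, n):
    """Probability that at least one of n participants encounters a problem
    that affects a proportion p of users (independence assumed)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


# A problem affecting one-third of users, for several sample sizes.
for n in (1, 3, 5, 13):
    print(n, round(discovery_probability(1 / 3, n), 3))
```

Under this assumption, a problem affecting one-third of users is observed at least once by five participants with probability 1 - (2/3)^5, or about 0.87, and the gain from each additional participant shrinks quickly.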

Task Workload-Stage 1
The NASA RTLX scores are shown in Figure 5 and are consistent with the findings reported in [25], where the physical controllers required less workload than the hand gesticulation: HTC Vive controller (M = 30.66, sd = 4.42), Valve Index controller (M = 43.33, sd = 11.73), and Microsoft HoloLens 1 hand gesticulation (M = 47.5, sd = 15.04). The responses from the NASA RTLX questionnaire indicate that the participants found the Microsoft HoloLens 1 hand gesticulation more mentally and physically demanding, thus making them feel more insecure while requiring more effort to complete the task than with the HTC Vive controller or the Valve Index controller.
Figure 5. Task workload indices distribution using the NASA RTLX index for the HTC Vive, Valve Index, and Microsoft HoloLens 1 hand gesticulation. The box plot indicates the maximum and minimum values, first quartile (q1-darker grey), and third quartile (q3-lighter grey), data points (blue dots), outliers (blue points above and below the whiskers), and the median (i.e., the line between the dark and light grey boxes) within the interquartile range.

Task Difficulty-Stage 1
The task difficulty response distribution associated with locating the eye fundus anatomical landmarks while employing the HTC Vive controller, the Valve Index controller, and the Microsoft HoloLens 1 hand gesticulation is presented in Figure 6. Since Likert data are ordinal, the median is used as the middle value of the responses to determine the difficulty rating. Results indicate that the participants found locating the virtual eye landmarks least difficult when using the HTC Vive controller, with a Median = 2 equivalent to "Disagree", followed by the Valve Index with a Median = 3 equivalent to "Neutral", and then the Microsoft HoloLens 1 hand gesticulation with a Median = 4 equivalent to "Agree", making it the most difficult user input device for locating the virtual eye anatomical landmarks.
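As a minimal sketch of how a median difficulty rating can be derived from ordinal responses, the following Python example maps the median back to its Likert label. The function name and the responses are hypothetical and are not study data; `statistics.median_low` is used so that the result is always one of the actual response categories.

```python
import statistics

LABELS = {
    1: "Strongly Disagree",
    2: "Disagree",
    3: "Neutral",
    4: "Agree",
    5: "Strongly Agree",
}


def likert_median(responses):
    """Return the median response and its label for 1-5 Likert data."""
    if not responses:
        raise ValueError("at least one response is required")
    if any(r not in LABELS for r in responses):
        raise ValueError("responses must be integers from 1 to 5")
    median = statistics.median_low(responses)
    return median, LABELS[median]


# Hypothetical responses from five participants (not study data).
hypothetical = [1, 2, 2, 3, 4]
print(likert_median(hypothetical))  # (2, 'Disagree')
```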

Discussion
The COVID-19 pandemic has had a negative impact on this study. For example, we were only able to recruit five participants for Stage 1 due to mandatory lockdowns and 13 online participants for Stage 2. While the study was initially designed for in-person testing, we had to implement adjustments to make it suitable for online testing. Most significantly, we now use the Oculus Quest 1 and 2 headsets as both headsets support hand tracking and are capable of wireless desktop VR compatible with SteamVR. However, it is important to highlight that the scope of the results is limited due to the small sample size and that these initial findings allow us to highlight some considerations that require further investigation.

Usability
The physical VR controllers resulted in higher SUS scores than the hand gesticulation required for the AR simulation during Stage 1, in line with the findings presented in [25]. The HTC Vive and Valve Index controllers had a SUS score above 68/100 while the Microsoft HoloLens 1 hand gesticulation did not. Participants who struggled with the hand gesticulation required several attempts before they were able to interact with the funduscope using the pinching gesture. In comparison, the HTC Vive controller and the Valve Index were easier to use. Interestingly, we anticipated that the finger tracking available in the Valve Index controller would lead to higher usability than the HTC Vive controller because it allows grasping objects by closing the hand around the controller. However, the participants found using one button for gripping the VDO more convenient than using finger tracking. From the usability testing results, we identified a problem associated with jittering when holding the VR controller in front of the virtual patient, since shaking due to tracking issues made it difficult to observe the eye fundus. To address this issue, we implemented a floating static VDO for the participants to use if they experienced any form of jittering.
The addition of the Oculus Quest controller and hand tracking in Stage 2 in combination with a larger sample size (n = 13) allowed us to obtain a statistical power of %51 indicating |hlthat there is a probability of having a difference between the Oculus Quest controller and the Oculus Quest hand tracking. Stage 2 results are consistent with the findings presented in [25], where the SUS scores were higher for the Oculus Quest controller than the Oculus Quest hand tracking. Interestingly, the Oculus Quest hand tracking resulted in a higher usability score than the Microsoft HoloLens 1 hand gesticulation. From the study observations in Stage 1, the participants moved within the virtual eye fundus examination room more naturally and operated the VDO with ease, as they were able to move their arms and position the VR controllers at the correct height and distance from the virtual patient. Furthermore, when securing and holding the VDO, Stage 1 participants expressed their preference for the Valve Index controller, while this was not reflected in their SUS scores. For Stage 2, due to the online nature of the study, no observations were made. However, at the end of the study, the participants shared difficulties with their experience. For example, when using the Oculus Quest controller and hand tracking, all participants reported dropping the VDO during the simulation, in some cases due to stopping pressing the grip button or when the hands were out of the camera sensor's field of view when using hand tracking. Dropping the VDO when using the controllers resulted given that the participants expected the VDO to stay snapped to the hand after grabbing it, a relevant feature to consider for future work. While the Oculus Quest controller and hand tracking were well received, the hand tracking was not found usable when compared to the physical controller with a SUS score below the accepted minimum of 68.00/100. 
Some of the main issues reported by participants include concerns with respect to the stability and accuracy of the tracking as well as the limited working space due to the Oculus Quest's 100 degrees horizontal and 75 degrees vertical sensor field of view.
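For readers unfamiliar with how a SUS score on the 0 to 100 scale is derived, the following is a minimal sketch of the standard scoring rule. It assumes the standard ten-item questionnaire with responses from 1 to 5, where odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5. The function name and the example responses are hypothetical and are not the study's data.

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) from ten 1-5 responses."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten item responses")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("each response must be between 1 and 5")
        if item_number % 2 == 1:
            # Odd-numbered items are positively worded.
            total += response - 1
        else:
            # Even-numbered items are negatively worded.
            total += 5 - response
    return total * 2.5


# Hypothetical single-participant responses (not the study's data).
participant = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]
score = sus_score(participant)
print(score, "meets the 68 threshold" if score >= 68 else "below the 68 threshold")
```

Per-participant scores computed this way would then be averaged per input method before being compared against the 68/100 threshold.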

Task Workload
The RTLX indices show that utilizing physical controllers required lower effort than hand gesticulation when conducting the virtual eye examination. We believe that prior experience with VR and AR may have influenced the amount of perceived workload, thus requiring additional investigation in future work. While the participants struggled with hand gesticulation for the virtual examination from a usability perspective, this did not increase the workload. It is worth noting that task workload may be higher with novice users who are not familiar with VR or AR; in which case, practice over time facilitates employing the controllers and hand gestures.
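As a point of reference, the Raw TLX (RTLX) index is commonly computed as the unweighted mean of the six NASA-TLX subscale ratings. The sketch below assumes each subscale is rated from 0 to 100. The subscale keys, function name, and example ratings are illustrative assumptions and are not the study's data.

```python
# The six NASA-TLX subscales; the key names are illustrative.
SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def rtlx_index(ratings):
    """Return the Raw TLX index: the unweighted mean of the six subscale ratings."""
    missing = [name for name in SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscale ratings: {missing}")
    return sum(ratings[name] for name in SUBSCALES) / len(SUBSCALES)


# Hypothetical ratings for one participant and one input method.
example_ratings = {
    "mental_demand": 40,
    "physical_demand": 20,
    "temporal_demand": 30,
    "performance": 25,
    "effort": 35,
    "frustration": 30,
}
print(rtlx_index(example_ratings))
```

A lower index corresponds to a lower perceived workload.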

Task Difficulty
The virtual DO eye examination task difficulty indicates a preference toward using the HTC Vive and Valve Index controllers. The Microsoft HoloLens 1 hand gesticulation was perceived to be more challenging due to the hand-tracking difficulties experienced by the participants. Participants also expressed difficulties associated with the field of view when employing the Microsoft HoloLens 1 as the clipping planes required them to maintain a certain distance from the patient, while in VR they were able to properly approach the patient for the virtual examination.
Finally, it is worth mentioning that, as input devices, the VR controllers and the Microsoft HoloLens 1 hand gesticulation differed. While the Microsoft HoloLens 1 hand-tracking system performed poorly with respect to usability, this device was released at a stage when head-mounted hand tracking was novel. Although preliminary, lessons from this study indicate that a physical device is better suited for manipulating a device such as the VDO as a result of the challenges experienced with hand tracking. While hand gestures are more natural, current tracking technologies require further development to properly capture the complexity and dexterity of the human hand at a consumer level for effective user input devices.
While task difficulty ratings were not requested in Stage 2, the participants reported experiencing difficulties when examining some of the eyes and, more specifically, while trying to visualize the eye fundi. On some rare occasions, when the VDO was dropped, the participants were unable to recover it, thus requiring them to restart the application in conjunction with ALVR on both the computer and the Oculus Quest headset. Additionally, some participants reported having hand-tracking issues with either the Oculus Quest 1 or the Oculus Quest 2. Due to the online nature of the test, it was difficult to ensure the same conditions across all participants, particularly with respect to lighting conditions, which can affect the Oculus Quest tracking performance.
Participants had different opinions regarding which avatar was more difficult to examine. One participant expressed that the floating eye was the most difficult because it lacked the patient's face, while others pointed to 'Lyette's' blinking and head movements as the reason, and those who found 'Jimothy' the most difficult claimed it was because of the size of his eyes. Overall, all participants expressed that the floating eye did not provide relevant information for the practice and suggested redefining the static VDO so that, when tracking issues occur while using hand tracking, the hand snaps to it to better facilitate the examination.

Conclusions
This paper presented a comparative study between VR and AR interaction methods for a direct ophthalmoscopy training tool employing five user input devices. Our work, which is part of a larger research project focused on developing a greater understanding of immersive technologies in virtual eye examination, builds upon and expands on previous research. The results of the current study have expanded our understanding of user input interaction techniques within a virtual (VR and AR) DO eye examination simulation. We compared several user-interaction techniques including a traditional VR controller such as the HTC Vive, a finger-tracking VR controller such as the Valve Index, the more recent Oculus Quest controller, and hand tracking available in the Microsoft HoloLens 1 and the Oculus Quest headsets, with respect to usability, task workload, and difficulty.
The usability results indicate that physical VR controllers are regarded as a more practical and functional choice for virtual interactions. However, due to the small number of participants, no statistical difference between the Oculus Quest controller and the Oculus Quest hand tracking was observed. A larger sample size is needed to increase the 51% statistical power achieved with the current sample before more concrete conclusions can be made. With respect to the results obtained in Stage 1, the Microsoft HoloLens 1 hand tracking presented the participants with difficulties when utilizing hand gesticulation, where inaccurate gesture recognition and registration induced frustration, leading to higher SUS scores in favor of the HTC Vive and Valve Index controllers. These results are consistent with the findings presented in [25]. Interestingly, while the Microsoft HoloLens hand gesticulation and the Oculus Quest hand tracking presented a simpler interaction, gesture-tracking issues that caused jittering, together with problems in properly detecting the hands, prevented the users from examining the eye without problems.
The difficulty associated with locating the anatomical landmarks inside the virtual eye was consistent with the usability and workload results. Here, the input interactions provided by the HTC Vive and Valve Index were perceived as less difficult than the Microsoft HoloLens 1 hand gesticulation. However, in some cases, the hand tracking with the Oculus Quest 2 introduced jittering into the examination, leading participants to use the static ophthalmoscope in the scene.
As a result of our findings, we suggest that physical user input devices be employed when possible for DO eye examination since they can provide ease of use when performing 3D user interactions.
Future work will focus on the refinement of the virtual eye examination simulation to realistically simulate patient interactions with diverse eye-fundus-related conditions. In this case, adjustments will be made to the functionality of the virtual ophthalmoscope to allow for a more detailed inspection of the fundus, as well as providing additional means to interact with the eye model for compatibility across diverse platforms to enable remote participation if COVID-19 restrictions remain. Furthermore, development toward supplementary 3D-printed peripherals and mixed interactions employing physical controllers and hand tracking will be explored to enhance user experience and increase the realism of the virtual procedure. Finally, future work will also see a larger study that includes a greater number of participants and also examines retention, to further develop our understanding regarding the effects of 3D input interaction techniques within a virtual DO eye examination.