Controller-Free Hand Tracking for Grab-and-Place Tasks in Immersive Virtual Reality: Design Elements and Their Empirical Study

Abstract: Hand tracking enables controller-free interaction with virtual environments, which can, compared to traditional handheld controllers, make virtual reality (VR) experiences more natural and immersive. As naturalness hinges on both technological and user-based features, fine-tuning the former while assessing the latter can be used to increase usability. For a grab-and-place use case in immersive VR, we compared a prototype of a camera-based hand tracking interface (Leap Motion) with customized design elements to the standard Leap Motion application programming interface (API) and a traditional controller solution (Oculus Touch). Usability was tested in 32 young healthy participants, whose performance was analyzed in terms of accuracy, speed and errors as well as subjective experience. We found higher performance and overall usability as well as overall preference for the handheld controller compared to both controller-free solutions. While most measures did not differ between the two controller-free solutions, the modifications made to the Leap API to form our prototype led to a significant decrease in accidental drops. Our results do not support the assumption of higher naturalness for hand tracking but suggest design elements to improve the robustness of controller-free object interaction in a grab-and-place scenario.


Introduction
Immersive Virtual Reality (VR) enables the user to "dive into" a computer-generated 3D environment. Scenarios in immersive VR are typically presented on a head-mounted display (HMD) that blocks out vision of the outside world and adapts the virtual environment based on the user's head movements, thereby enabling a sense of presence (i.e., a feeling of actually being in the virtual world despite knowing it to be an illusion [1,2]). The popularity of immersive VR has been increasing for a wide range of applications; for example, for entertainment, education and industry but also research and clinical purposes [3]. Commercial VR systems (e.g., Oculus Rift, HTC Vive) enable interaction with virtual environments and objects (beyond head movements) typically by way of handheld controllers (e.g., the Oculus Touch). Such setups have been used, for example, for safety and equipment training in mining (e.g., [4]) or manufacturing scenarios (e.g., [5]). In basic research, VR enables naturalistic (i.e., dynamic, multisensory, interactive) experiments while maintaining full experimental control and allowing precise measurements of the participant's behavior [6][7][8][9]. Effects found in such naturalistic

Task and Prototype
The Leap Motion is a common, affordable, camera-based sensor for hand tracking. Leveraging its modifiability, we designed a prototype using the basic Leap API. We optimized the interface for the gestures of grabbing and placing objects on a table: gestures commonly used in neurological testing [10] and industry (e.g., [4]). Grab-and-place tasks are also common in VR interaction research [5,15,16]. The following design elements were introduced, based on our own experiences and previous studies:

1. Colour: Smart object colouring (a green "spotlight" emitted from the virtual hand; see Figure 1) to indicate when an object is within grabbing distance. Colour indicators on the virtual hand, the virtual object or both have been shown to improve time on task, accuracy of placement and subjective user experience [5,15,16].
2. Grab restriction: The user can only grab the object after first making an open-hand gesture within grabbing distance of the object, in order to prevent accidental grabs.
3. Transparency: Semi-transparent hand representation as long as no object is grabbed, to allow the user to see the object even if it is occluded by the hand (Figure 1).
4. Grabbing area: The grabbing area is extended so that near misses (following an open-hand gesture, see above) are still able to grab the object.
5. Velocity restriction: If the hand is moving above a certain velocity, grabbing cannot take place, in order to prevent uncontrolled grabs and accidental drops.
6. Trajectory ensurance: Once the object is released from the hand, rogue finger placement cannot alter the trajectory of the falling object.
7. Acoustic support: Audio feedback occurs when an object is grabbed and when an object touches the table surface after release (pre-installed sounds available in the Unity library).
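How the grab restriction, the extended grabbing area and the velocity restriction might interact can be illustrated with a minimal sketch. This is not the prototype's implementation (which was built on the Leap API in Unity); the class and field names are hypothetical, and all thresholds are placeholder values, since the text does not report the distances or the velocity limit that were used. The sketch also assumes that the open-hand gesture must occur within the nominal grabbing distance, while the subsequent grab is accepted anywhere within the extended area.

```python
import math
from dataclasses import dataclass

# Placeholder thresholds (assumptions, not values reported in the study).
GRAB_RADIUS_M = 0.05       # nominal grabbing distance
EXTENDED_RADIUS_M = 0.08   # extended grabbing area for near misses
MAX_GRAB_SPEED_M_S = 1.0   # velocity restriction


@dataclass
class HandState:
    position: tuple   # (x, y, z) of the palm in metres
    speed: float      # palm speed in m/s
    is_open: bool     # True while an open-hand pose is detected
    is_closed: bool   # True while a grabbing (closed-hand) pose is detected


class GrabGate:
    """Decides, frame by frame, whether a grab of one object is accepted."""

    def __init__(self, object_position):
        self.object_position = object_position
        # Becomes True once an open hand was seen within grabbing distance.
        self.armed = False

    def update(self, hand: HandState) -> bool:
        distance = math.dist(hand.position, self.object_position)
        if distance > EXTENDED_RADIUS_M:
            # Hand left the object: a new open-hand gesture is required.
            self.armed = False
            return False
        if hand.is_open and distance <= GRAB_RADIUS_M:
            # Grab restriction satisfied; the grab itself happens later.
            self.armed = True
            return False
        if hand.speed > MAX_GRAB_SPEED_M_S:
            # Velocity restriction: fast hands cannot grab.
            return False
        # Near misses inside the extended area still grab once armed.
        return self.armed and hand.is_closed
```

In this sketch, a closed hand near the object is ignored until an open hand has been registered first, and a fast-moving hand is ignored even when armed.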

Present Study and Hypotheses
In this study, we assessed the usability of camera-based object interaction for a grab-and-place use case in immersive VR, which is common in VR-based rehabilitation and industry applications. We modified the basic implementation of Leap Motion-based hand tracking and created the HHI_Leap prototype (described above). We then compared it to the basic Leap Motion (B_Leap) and the Oculus Touch controller. Performance was evaluated using measures of accuracy, speed and errors (accidental drops) as well as subjective experience (self-report questionnaires).

Hypotheses
Based on the literature and our own experiences, we had the following hypotheses:

Study Design
In a repeated-measures design with interface (B_Leap, HHI_Leap, Oculus controller) as the independent variable (Figure 2), all participants completed a grab-and-place task with all interfaces in randomized order (for task details, see Section 2.3). Performance was measured during the grab-and-place task, and participants answered questions about their experience after each interface (Figure 3).

Sample
Participants were recruited from staff at the Fraunhofer Heinrich Hertz Institute who were not involved in this research project. Exclusion criteria were injuries or vision impairments that would prevent them from completing the task. The sample comprised 32 participants (23 males, 8 females, 1 undisclosed) between 22 and 36 years of age (M = 27.8, SD = 3.34). The majority of participants (n = 20) reported "no" (n = 10) or "some" (n = 10) prior experience with VR. All participants (n = 32) reported at least "some" prior experience with video game controllers and half of the participants (n = 16) reported "frequent" current use of electronic games (including mobile phone games).

Task
Participants completed a virtual grab-and-place task (Figure 4), in which they had to place virtual cubes, one at a time, onto a target area on a virtual table (Figure 3). In 30 trials per interface (B_Leap, HHI_Leap, Oculus), 10 cubes of 3 different sizes (small: 3 cm; medium: 6 cm; large: 10 cm) were presented in random order. In each trial, the cube appeared randomly at one of 10 positions at 30 cm from the target (Figure 5). Participants used their dominant hand to place each cube as quickly and accurately as possible onto a target, which consisted of a 2D square the size and shape of one face of the cube (Figure 2). After grabbing the cube, the participant had one chance to place it on the target; they could not adjust the cube once it made contact with the table.

Accidental drops: Prematurely terminated trials due to mistakenly dropping the cube (for details see below); used both for cleaning the data and as an additional outcome measure to quantify interface performance.
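The trial structure of the task (three interfaces in randomized order, 30 trials per interface made up of 10 cubes of each of the 3 sizes, and a start position drawn from 10 candidates) can be sketched as follows. The function name and the dictionary keys are hypothetical, and the sketch assumes simple shuffling and independent uniform sampling of start positions; the text does not state whether the interface order or the positions were counterbalanced.

```python
import random

INTERFACES = ["B_Leap", "HHI_Leap", "Oculus"]
CUBE_SIZES_CM = [3, 6, 10]   # small, medium, large
CUBES_PER_SIZE = 10          # 10 cubes of each size = 30 trials per interface
N_START_POSITIONS = 10       # candidate start positions, each 30 cm from the target


def make_schedule(rng):
    """Return the list of trials for one participant."""
    interfaces = INTERFACES[:]
    rng.shuffle(interfaces)  # randomized interface order (assumed: plain shuffle)
    schedule = []
    for interface in interfaces:
        sizes = CUBE_SIZES_CM * CUBES_PER_SIZE
        rng.shuffle(sizes)   # cube sizes presented in random order
        for size in sizes:
            schedule.append({
                "interface": interface,
                "size_cm": size,
                # Assumed: start position sampled uniformly on every trial.
                "position": rng.randrange(N_START_POSITIONS),
            })
    return schedule


trials = make_schedule(random.Random(0))
print(len(trials))  # 90 trials per participant
```

With 32 participants, this structure yields 32 x 90 = 2880 trials in total.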

1. System Usability Scale (SUS): A "quick and dirty" [21] questionnaire to assess the usability (primarily "ease of use") of any product (e.g., websites, cell phones, kitchen appliances). It contains 10 items with 5-point Likert-scale response options from 1 (strongly disagree) to 5 (strongly agree). Responses are transformed by a scoring rubric, resulting in a score out of 100.
2. Single subjective questions: 8 questions assessing user experience (e.g., comfort, ease of gripping, likelihood to recommend to friends) with Likert-scale response options ranging from 1 (strongly disagree) to 5 (strongly agree).
3. Agency: The feeling of control over and connectedness to (a part of) one's own body or a representation thereof [22] was measured with the question "I felt like I controlled the virtual representation of the hand as if it was part of my own body" [16]. Response options ranged from 1 (strongly disagree) to 7 (strongly agree).
4. Overall satisfaction on a Kunin scale: Response options ranged from 1 (least satisfied) to 7 (most satisfied) with smiley faces representing degree of overall satisfaction [23] (Figure A1).
5. Overall preference: After the participants completed all 3 interfaces, they answered the question "Of the three interfaces you used, which did you like best?" It was left up to the participants to define "best" for themselves. This was not meant to be a single definitive data point to gauge overall subjective preference, but one measure among others (including overall satisfaction and the SUS).
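The SUS scoring rubric mentioned in item 1 can be made concrete with a short sketch. It assumes the standard SUS rubric was applied: odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5. The function name is hypothetical.

```python
def sus_score(responses):
    """Convert ten SUS responses (each 1-5, in item order) to a 0-100 score."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be integers from 1 to 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # odd items: positively worded
        else:
            total += 5 - response   # even items: negatively worded
    return total * 2.5


print(sus_score([3] * 10))  # neutral answers on every item -> 50.0
```

Because of the alternating item wording, a respondent who answers "strongly agree" to every item does not obtain the maximum score; the maximum requires agreeing with the odd items and disagreeing with the even ones.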

Data Cleaning/Pre-Processing
Accidental drops were analyzed separately from the calculation of the other performance metrics (accuracy, total time, grab time and release time). Therefore, trials that were flagged as accidental drops were removed from the set of trials analyzed for performance.
A trial was considered an accidental drop if it fulfilled at least one of three criteria: (1) the experimenter noted that the participant accidentally dropped the cube before getting a chance to place it on the target; (2) release time below 0.5 s and accuracy above 10 cm; (3) accuracy above 20 cm.
Trials with accidental drops (8%: 233 of 2880 trials in total) were removed from the analysis (199 according to criterion 1, of which 168 also fulfilled criterion 2 or 3; 34 fulfilled criterion 2 or 3 but not criterion 1). Hence, 2647 trials entered the performance analyses.
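The three criteria can be expressed as a single predicate. The sketch below assumes that "accuracy" denotes the placement error, i.e., the distance between the placed cube and the target in cm, so that larger values mean worse placement; the function and parameter names are hypothetical.

```python
def is_accidental_drop(experimenter_flag, release_time_s, placement_error_cm):
    """Return True if a trial meets at least one accidental-drop criterion."""
    # Criterion 1: the experimenter noted an accidental drop.
    noted = bool(experimenter_flag)
    # Criterion 2: released very quickly and landed far from the target.
    fast_and_far = release_time_s < 0.5 and placement_error_cm > 10
    # Criterion 3: landed very far from the target, regardless of timing.
    very_far = placement_error_cm > 20
    return noted or fast_and_far or very_far


# Bookkeeping check using the counts reported above.
total_trials = 2880
flagged = 199 + 34          # criterion 1, plus criterion 2 or 3 only
print(flagged, total_trials - flagged)  # 233 flagged, 2647 analysed
```

Trials for which the predicate returns True would be excluded from the accuracy and timing analyses and counted towards the accidental-drop measure instead.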

Analysis
For each metric, a one-way repeated-measures analysis of variance (ANOVA) with the three-level factor "interface" (B_Leap, HHI_Leap, Oculus) was conducted. For all statistical tests, a two-sided alpha of 0.05 was used to determine significance. In case of significance in the ANOVAs, paired t-tests were conducted between the HHI_Leap and B_Leap and between the HHI_Leap and Oculus (as the HHI_Leap was designed to improve upon the B_Leap, and we had no research questions regarding the differences between the B_Leap and Oculus, we omitted that comparison). Multiple-comparison correction (for the two comparisons) was performed using the Holm-Bonferroni method [24]. For measures of central tendency, we used means and 95% confidence intervals. To enable quantitative (parametric) summary and inferential statistics, we deliberately added qualitative labels (i.e., strongly agree, strongly disagree) to only the extreme values of the scales. Despite some controversy, parametric tests can be a viable method for analyzing Likert-scale data, even when the data violate assumptions of those analyses (e.g., normal distribution) [25][26][27].
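For readers unfamiliar with the Holm-Bonferroni procedure, a minimal sketch of the adjustment is given here. It is an illustration of the general method, not the analysis script used in the study; the function name is hypothetical and the p-values in the example are invented for demonstration. With m comparisons, the smallest p-value is multiplied by m, the next by m - 1, and so on, and the adjusted values are forced to be non-decreasing in that order.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # Step-down multiplier: m for the smallest p, then m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[index])
        # Enforce monotonicity so a larger raw p never gets a smaller adjusted p.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted


# Two planned comparisons, as in the design above (illustrative p-values only).
print(holm_adjust([0.01, 0.04]))
```

With two comparisons, the smaller p-value is doubled and the larger one is left unchanged unless the doubled value exceeds it; an adjusted value below 0.05 is then treated as significant.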

Results
We conducted repeated-measures ANOVAs to analyze differences between the interfaces on each measure. In case of statistical significance, pairwise comparisons were used to determine the origin of the effect (see Section 2.6: Analysis for details). The results for all performance measures are presented first, followed by subjective measures.

Performance Measures
There were significant main effects of interface for all performance measures (Table 1), which were driven by the Oculus controller performing significantly better than the HHI_Leap on all performance metrics ( Table 2). No significant differences were found between the HHI_Leap and B_Leap for accuracy or total time per trial. The grab time with the HHI_Leap was significantly longer than with the B_Leap by 0.22 s on average ( Figure 6). Release time with the HHI_Leap was significantly shorter than with the B_Leap by 0.27 s on average (Figure 7). With the HHI_Leap, significantly fewer accidental drop errors occurred than with the B_Leap, with a difference of 2.34 drops on average (Figure 8).

System Usability Scale (SUS)
There was a significant main effect of interface for SUS scores (Table 1). This effect was driven by the Oculus controller, as the paired-samples t-test comparing the HHI_Leap and the B_Leap showed that their SUS scores were not significantly different (Table 2, Figure 9). Scoring guidelines for the SUS [28] put both Leap interfaces at the border (score = 70) of "acceptable" and "marginal." SUS scores for the HHI_Leap were significantly lower than for the Oculus controller (Table 2).

Figure 9. Usability scores (System Usability Scale, SUS) by interface, with means and 95% confidence intervals, individual participant scores (colored dots), smoothed density distributions, results of paired-samples t-tests and scoring categories [28]. * = p < 0.05.

5-Point Likert Scale Questionnaire Items
ANOVA found significant main effects for Interface on ratings of comfort, precision, gripping, releasing and likelihood to recommend (Table 1). These effects were driven by differences between the HHI_Leap and Oculus, as post-hoc paired-samples t-tests showed no significant difference between the HHI_Leap and the B_Leap (Table 2, Figure 10). All 5-point Likert question ratings were lower for the HHI_Leap than for the Oculus controller, and paired-samples t-tests showed that ratings were significantly lower for the HHI_Leap than the Oculus on comfort, gripping, releasing, precision and likelihood to recommend (Table 2, Figure 10).

Agency
ANOVA found no significant effect of interface on ratings of Agency (Table 1, Figure 11).

Overall Satisfaction
ANOVA found a significant main effect of interface on ratings of Overall Satisfaction (Table 1). Post-hoc paired-samples t-tests showed a significant difference between the Oculus and HHI_Leap (Table 2, Figure 12).

Figure 12. Mean scores for Overall Satisfaction, converted to a scale of −3 to 3, with 95% CI. * = p < 0.05. Significant difference is between Oculus and HHI_Leap (see Table 2).

Overall Preference
Out of the 32 participants, 18 selected the Oculus controller as their overall preferred interface, 7 selected the HHI_Leap and 7 selected the B_Leap.

Comparison of Hand Tracking to the Traditional Controller
For a grab-and-place task in immersive VR, we systematically compared a prototype of a camera-based hand tracking interface optimized for this use case, to the standard (non-optimized) Leap Motion API, and a traditional controller. We measured performance (accuracy, speed, errors) and subjective experience (e.g., comfort, naturalness, precision). The traditional controller outperformed hand tracking on all performance metrics. Subjective ratings were significantly higher for the traditional controller for ease of use (as measured by the SUS), gripping, releasing, precision, comfort, likelihood to recommend, and overall satisfaction. For the remaining subjective dimensions, there were no significant differences between the traditional controller and hand tracking.
Because hand tracking enables more naturalistic interaction gestures than the traditional controller, we hypothesized participants would rate hand tracking higher on naturalness and intuitiveness. The absence of significant differences on these ratings was surprising. This could be because of unfulfilled user expectations about the capabilities of hand tracking (see [29], and [30] for the case involving voice interfaces). Technological limitations with mapping the virtual to the real hand and the responsiveness of the Leap Motion system may have lowered feelings of naturalness [17] as well as the related dimension of agency: even small differences between the user's intended hand gesture and how the virtual hand actually behaves can lower the sense of agency [22,31,32].
The combination of naturalistic gestures, sub-optimal responsiveness and the lack of haptic feedback may have placed hand tracking in its current state (as represented by our prototype) in the dip of the U-shaped curve described by McMahan et al. [13]. This close-but-not-quite-there representation of reality in VR may create an uncanny valley effect (originally discussed in the context of lifelike robots [33,34]), in which the similarity to reality results in the VR experience feeling unsettling. Consistent with this possibility, Argelaguet et al. [16] found that less realistic virtual hand avatars created higher feelings of agency.
The haptic element is important: One study found that combining the Leap Motion with a haptic feedback device can create stronger feelings of presence and immersion than the Leap Motion on its own [1]. Though the Oculus controller did not map haptic feedback to interaction with virtual objects in our study, it is itself a physical object, with buttons to press to engage in interaction. Participants may have responded positively to this tactile sensation when using the traditional controller to interact with virtual objects.

Comparison of Our Hand Tracking Prototype (HHI_Leap) to the Basic Leap API (B_Leap)
Performance metrics showed that the HHI_Leap improved upon the B_Leap by reducing errors (accidental drops), while being slightly slower for grabbing (grab time), slightly faster for placing (release time), and not significantly different in accuracy or total time per trial. Increased grab time can be explained by the modification of the HHI_Leap, which required users to hover with the hand over the object before grabbing was possible (see Section 1.4 for a list of the specifications of our prototype). Although the measure to prevent unintentional gripping led to an increase in grab time, the total time for the task remained the same, as the release time was correspondingly shorter. The additional features of the HHI_Leap (compared to the B_Leap) against unintentional gripping and unintentional dropping during fast movements may explain the reduced errors. Specifically, these features were the prevention of grab recognition when the hand was moving too fast and the prevention of rogue fingers from altering the trajectory of the cube after release. While subjective scores were slightly higher overall for the HHI_Leap on ease of use (as measured by the SUS, Figure 9) and individual questions (Figure 10), we found no significant differences between the HHI_Leap and the B_Leap on any subjective experience measure.

Limitations and Recommendations for Future Research
As our sample was relatively homogeneous (mostly young males, see Section 2.2), the results may not generalize, for example, to particular populations that might use VR in clinical settings, such as older adults or patients. While previous VR experience among our sample was low, video game experience was high, which might have increased familiarity with video game controllers and prepared our participants for use of the Oculus controller. Familiarity generally increases affinity [35] and has been shown to increase user satisfaction and performance for interfaces [30].
The sample size was based on comparable within-subjects studies, which tested around 30 participants (e.g., [5,15,16]). Testing a larger sample could determine whether non-significant differences between conditions are a power issue or because of an absence of such differences.
In addition, users may have had expectations for hand tracking based on cultural phenomena, such as science-fiction films (e.g., Minority Report (2002)) [29], which may have been the only source of their mental models [20,36] for such an interface. If they expected a highly fluid experience like in the movies, participants would have been disappointed by the performance of the Leap Motion, which may have led to lower ratings. We recommend that future research assess expectations and prior experience, which may be factors in observed usability.

Conclusions
Our study forms a basis for determining the state of the art of a flexible, common implementation of a hand tracking VR interface by comparing it to a traditional VR controller, on a task representative of the types of motions found in VR applications. The traditional controller provided better overall usability than the hand tracking interface, as evidenced by performance and subjective metrics. On individual questions, however, the only significant differences were on questions related to performance, and the hand tracking interface was rated as acceptable (within 1.1 standard deviations from the middle rating; see Table 2, Figure 10) on all subjective items except for those assessing performance. For tasks that involve grabbing and placing, hand tracking can be a viable alternative, especially given the flexibility of the Leap API. Our modifications improved the performance of the Leap Motion, particularly by reducing accidental drops.
Our results do not support the hypothesis of higher naturalness for hand tracking in its current state over traditional controllers. However, as familiarity with hand tracking in the general population increases and technical issues are progressively overcome, this may change. An analogous technology may be touch interfaces, which were inferior to mouse pointing and clicking for many years, until very recently. Similarly, as hand tracking interfaces iteratively improve and become progressively more commonplace in daily life, they may come to surpass traditional controllers in user experience.