1. Introduction
Virtual reality (VR) devices, e.g., head-mounted displays (HMDs), are becoming mainstream tools in a wide range of domains, for example, entertainment, marketing, education, and training. They make it easy to deliver immersive experiences in which the user is situated in a realistic three-dimensional (3D) environment and can, for example, observe, learn from and act on virtual, dynamic constructions that could not be arranged in reality.
VR devices have also been introduced into the medical field, especially in visualization and training applications [1]. For example, Gunn et al. [2] studied the impact of a VR learning environment on the development of technical proficiency for medical imaging (diagnostic radiography). In addition, Reymus et al. [3] found that students appreciated the VR simulation more than two-dimensional radiographs in learning root canal anatomy, and Ryan et al. [4] reported that virtual learning significantly improved students’ satisfaction/engagement and recall. Almiyad et al. [5] presented an augmented reality (AR) tool to help doctoral students learn percutaneous radiology procedures.
Currently, in medical diagnosis and operation planning, clinicians observe 3D models on two-dimensional (2D) screens using 2D software. They use 2D interfaces, such as a mouse and keyboard, to translate and rotate the 3D model and observe it from different angles. However, performing the same task in a 3D environment has been reported to provide better understanding [6,7]. Because of the medical risks involved, clinicians require a reliable interaction technique and interface to precisely manipulate 3D objects in VR.
Rigid object manipulation (such as the model observation described above) is one of the fundamental tasks in virtual reality, besides navigation and wayfinding, according to Bowman and Hodges [8]. Previous research (e.g., [9,10,11]) indicates that hand-tracking-based systems are not necessarily as accurate or efficient as controller-based systems for direct object manipulation in VR. However, if hand tracking is a viable interaction method, the user would be able to avoid learning new control methods. Two direct manipulation interaction techniques, HandOnly (pinch gesture) and Controller+Trigger, differ in terms of two design parameters: the tracking method and the action for object pickup. To evaluate these design parameters, we included a third interaction technique, which combines controller tracking with hand gestures. This hybrid interaction technique, called Controller+Grab, is based on the functionality of the Valve Index controller [12] with finger tracking and makes use of the grab gesture.
A controlled experiment was carried out with 12 participants using commercially available interaction technologies. The experiment used a within-subject design, and the conditions were presented in balanced order. We collected objective measures (task completion time, position error and orientation error) and subjective measures: confidence, ease of use, hand tiredness, naturalness (defined by Navarro and Sundstedt [13] as an interaction concept of “using your own body”), and daily use. The results demonstrate that the hybrid interaction technique was the most liked, as it was intuitive, easy to use, fast and reliable, and it provided haptic feedback resembling a real-world object grab.
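The balanced presentation order and the two placement-error measures can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the procedure used in the experiment: the function names are hypothetical, the order assignment simply cycles through all six permutations of the three conditions, position error is taken as Euclidean distance, and orientation error is taken as the smallest rotation angle between two unit quaternions, ignoring any symmetry of the manipulated object.

```python
import itertools
import math

CONDITIONS = ("HandOnly", "Controller+Trigger", "Controller+Grab")


def balanced_orders(n_participants):
    """Assign each participant one of the 3! = 6 condition orders.

    Cycling through the permutations uses every order equally often
    when n_participants is a multiple of 6 (assumed scheme).
    """
    orders = list(itertools.permutations(CONDITIONS))
    return [orders[i % len(orders)] for i in range(n_participants)]


def position_error(placed, target):
    """Euclidean distance between placed and target positions (input units)."""
    return math.dist(placed, target)


def orientation_error_deg(q_placed, q_target):
    """Smallest rotation angle in degrees between unit quaternions (w, x, y, z)."""
    dot = abs(sum(a * b for a, b in zip(q_placed, q_target)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))
```

With 12 participants, `balanced_orders(12)` uses each of the six orders twice, so each technique appears equally often in the first, second and third position.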
Wagner et al. [14] compared hand interaction, controller interaction and their combination for a data manipulation task in a virtual reality environment. Similar to our experiment, their results indicate that while there were no significant differences between the methods in task performance or workload, the participants preferred the mixed mode.
Earlier studies (e.g., [9,10,11,13,15,16]) compared hand-tracking- and controller-based interaction techniques in terms of task accuracy and task completion time and used these two factors to choose the optimal interaction technique for object manipulation. The present experiment demonstrates that there is a trade-off between naturalness, task accuracy and task completion time when using these direct manipulation interaction techniques, so naturalness should also be considered when choosing an interaction technique for direct object manipulation. This implies that future direct manipulation interaction technique designs should consider all three factors: task accuracy, task completion time and naturalness.
In summary, the contributions of this paper are (1) an evaluation of three interaction techniques for direct manipulation of a rigid object and (2) the finding of a trade-off between different factors for direct manipulation techniques.
6. Discussion
Looking back to the research questions RQ1 and RQ2 (Section 3.1), we found two cases where there were statistically significant differences in the completion times and placement accuracy (research question RQ1). The HandOnly method was found to be statistically significantly slower than the two other methods, Controller+Trigger and Controller+Grab. However, for the test of positioning accuracy, the statistical power was too low, and the statistically significant difference in the test needs to be discarded. No other significant differences were found in the objective measures. The results also confirm Hypothesis H2 (task completion time is shorter for Controller+Trigger than for HandOnly; Section 3.1).
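To illustrate how completion times from three within-subject conditions can be compared, the following Python sketch computes a Friedman test statistic. This is an assumption made for illustration only: this section does not state which statistical test was used, and the function names are hypothetical. The sketch omits the tie correction, and the p-value relies on the chi-square approximation with two degrees of freedom, which is only approximate for small samples.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_three_conditions(rows):
    """Friedman test for three related conditions.

    rows holds one (time_a, time_b, time_c) tuple per participant.
    Returns (chi_square, p_value), without tie correction.
    """
    n, k = len(rows), 3
    rank_sums = [0.0] * k
    for row in rows:
        if len(row) != k:
            raise ValueError("each row needs exactly three values")
        for j, rank in enumerate(average_ranks(list(row))):
            rank_sums[j] += rank
    chi_sq = (12.0 / (n * k * (k + 1))) * sum(r * r for r in rank_sums) \
        - 3.0 * n * (k + 1)
    # For two degrees of freedom the chi-square survival function
    # reduces to exp(-x / 2).
    p_value = math.exp(-chi_sq / 2.0)
    return chi_sq, p_value
```

A significant result from such an omnibus test would still need pairwise post hoc comparisons to show which methods differ, for example that one method is slower than the other two.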
The results are in line with related earlier studies (e.g., [9,10,11,13,38]), where hand tracking compared unfavorably against regular controllers for object manipulation tasks. For example, Caggianese et al. [9] reported that completion times were significantly shorter with the controller than with the hand interface, and Gusai et al. [10] concluded that controllers give better performance in accuracy and in usability.
The slow pace of the HandOnly method is at least partially explained by hand pose recognition issues. The current technology cannot always reliably recognize when the user is picking up the object or releasing it. This sometimes led to repeated attempts at object manipulation and, therefore, more time spent on the task. Additionally, releasing the object was sometimes difficult, as the participant could not tell exactly when the system would recognize the release, and the object might “jump” from the intended position if the hand pose changed slightly at the recognized release time (reported as a “sticky hand” effect by Navarro and Sundstedt [13]). The release recognition problem may also have been the reason for the positioning accuracy difference between the Controller+Trigger method and the Controller+Grab method (participants commented “…the most difficult to discern when I had successfully released an object”, “Releasing accurately was difficult”, and “Releasing object felt uncertain”).
For the participant preferences (research question RQ2), we found three cases (two related to subjective attributes and one related to ranking) where there were statistically significant differences between the interaction methods. However, the results for subjective attributes had very low statistical power.
The lower values of user confidence for HandOnly are most probably caused by the hand pose recognition problems. The participants had difficulties picking up the cubes, which naturally leads to lower confidence. The HandOnly method may have been rated as more tiring because it took more time and (sometimes several) repeated efforts to get the task done. When using HandOnly, the arm movements and poses do not obviously extend farther or wider than for the other methods (for the extent of hand tracking, see [61]).
The interaction methods were also ranked by the participants in three dimensions: the most liked, the easiest to adopt and having the most development potential (see Table 3). In these rankings, the only statistically significant difference was that the Controller+Grab method was more liked (R1) than the HandOnly method. The results also show that Hypothesis H1 (HandOnly would be preferred over Controller+Trigger) was not confirmed. For the other ranking questions, there were no significant differences between the interaction methods. It is noteworthy that Figueiredo et al. [44] reported that hand tracking was liked more than the controller, even though hands were not as precise as the controllers. Their study showed that some participants liked hands, as they were intuitive to use and did not require any learning, while some participants with more experience with the controller did not feel that there was a steep learning curve for the controllers. In addition, some participants in [44] reported feeling a higher level of immersion with hand tracking.
The Controller+Grab method was the most liked for many reasons: the method was easy and accurate, and it offered a good balance between naturalness, reliability, and speed. The HandOnly method was liked because it felt natural, and the Controller+Trigger because it was easy to use; the Controller+Grab had the good qualities of both. In the comments, there were no common ideas on what feature would make an interaction method easy to adopt. The HandOnly was seen as having the most opportunities for improvement, despite the many problems with its use.
HandOnly is felt to be a natural way of interaction, which Navarro and Sundstedt [13] described as an interaction concept of “using your own body”. The participants commented: “Very much enjoyed manipulating objects without the controller”, and “The HandOnly felt still the more difficult to control …but I would see [HandOnly] has the most potential, …”.