The Effects of Augmented Reality Interaction Techniques on Egocentric Distance Estimation Accuracy

Abstract: Recent developments in virtual environment applications allow users to interact with three-dimensional (3D) objects in virtual environments. As interaction with 3D objects in virtual environments becomes more established, it is important to investigate user performance with such interaction techniques within a specific task. This study investigated two interaction modes, direct and indirect, defined by how users interacted with the 3D objects, by measuring the accuracy of egocentric distance estimation in a stereoscopic environment. Fourteen participants were recruited to perform an acquisition task with both direct pointing and indirect cursor techniques at three egocentric distances and three task difficulty levels. The accuracy of egocentric distance estimation, throughput, and task completion time were analyzed for each interaction technique. The indirect cursor technique was found to be more accurate than the direct pointing technique. On the other hand, a higher throughput was observed with the direct pointing technique than with the indirect cursor technique. However, there were no significant differences in task completion time between the two interaction techniques. The results also showed accuracy to be higher at the greatest distance (150 cm from the participant) than at the closer distances of 90 cm and 120 cm. Furthermore, task difficulty also significantly affected accuracy, which was lower in the highest difficulty condition than in the medium and low difficulty conditions. The findings of this study contribute to the understanding of user-interaction techniques in a stereoscopic environment. Furthermore, developers of virtual environments may refer to these findings in designing effective user interactions, especially those in which performance relies on accuracy.
The repeated-measures ANOVA results showed that task completion time differed significantly across egocentric distances (F [1.709, 22.217] = 37.382, p < 0.001). Tukey post-hoc tests further showed that completion time differed significantly (p < 0.05) among all three distances. Completion time increased significantly when targets were displayed at the farthest distance of 150 cm (Mean = 18.342 s, SD = 0.930 s) as compared to 120 cm (Mean = 15.318 s, SD = 0.831 s) and 90 cm (Mean = 13.836 s, SD = 0.751 s). Completion time increased with ID for both the direct pointing and indirect cursor techniques. The ANOVA confirmed that ID significantly affected completion time (F [2, 26] = 25.882, p < 0.001). Post-hoc Tukey analysis classified the independent variables into two groups: completion time was significantly higher (p < 0.01) when targets were displayed at high ID (mean = 17.020 s, SD = 0.893 s) than at medium (mean = 15.848 s, SD = 0.852 s) and low IDs (mean = 14.628 s, SD = 0.650 s), while no significant difference in task completion time was found between the low and medium IDs (p > 0.05). We found no significant main effect of interaction technique (F [1, 13] = 1.775, p > 0.05) on task completion time, and no significant (p > 0.05) two-way interaction effects involving egocentric distance.


Introduction
Virtual environment (VE) applications are increasing rapidly. Since the early 1990s, such applications have been developed in diverse domains, such as surgery [1], safety training for cabin safety procedures [2], advanced manufacturing systems integrating robots and humans to execute tasks [3], and usability evaluations [4]. In addition to being applicable in ever more areas, VEs have recently received attention from computing technology providers such as Microsoft, Google, NVIDIA, HTC, and Samsung, all of which are seeking even more exciting applications. Many recent efforts have focused on applications that allow users to observe and interact in three-dimensional (3D) environments rather than just viewing 3D modeled images. In the development of these applications, immersive virtual environments (IVEs) featuring head-mounted displays (HMDs) are most commonly used.

Direct vs. Indirect Interaction Methods
Since many applications in VEs require selecting a target object among other objects, prior studies have used an acquisition task as their basis of study. To acquire the object (target), the user performs an action to position the selection tool (e.g., a finger or cursor) over the target. A tracking system also needs to acquire all the necessary information from the user through his/her actions or other movements [9,10]. Lin and Woldegiorgis [11] investigated the performance of a direct pointing technique, wherein participants directly moved a hand to reach for a real/virtual target shown continuously in a VE with a projection display. Similar studies by Bruder et al. [12] and Swan et al. [13] employed direct reaching by hand as a reporting method. Bruder, Steinicke, and Stuerzlinger [12] compared direct 3D mid-air interaction with direct 2D touch-screen interaction and found no significant difference in error rates in a selection task performed in a stereoscopic multi-touch tabletop setup. In contrast, Swan, Singh, and Ellis [13] conducted a comparative study of direct matching and blind reaching in estimating the position of real/virtual objects in a stereoscopic display, and they found that perceptual direct matching was more accurate than blind reaching in estimating egocentric distance. Overall, distances were underestimated by up to 4 cm in blind reaching, but the disparity was reduced to 2.6 cm in perceptual direct matching. In our study, these techniques (direct pointing, reaching, and direct matching) are considered direct interaction methods.
Unlike direct interaction methods, which utilize the human body, including gestures and gaze direction, to complete a task [14], indirect interaction methods let the user observe representations in a display. The user controls the representation, or an icon, to perform a specific task. The icon can represent a virtual hand, a virtual cursor, or anything imaginable that can be used as a virtual control [14]. The user directly controls an intermediary, such as a mouse, gamepad, stylus, or other physical control, to manipulate the icons. Previous studies have applied virtual representations to perform tasks in IVEs. Poupyrev and Ichikawa [15] presented the possibility of a virtual pointer and a virtual hand as interaction metaphors for object selection and repositioning tasks in an IVE. In their study, the participants interacted with objects via a ray-casting pointing technique, wherein an object could be manipulated (selected or repositioned) when the virtual pointer or the virtual hand intersected it. The comparison showed that the virtual pointer was more accurate than the virtual hand for selecting objects within the user's reaching distance. Werkhoven and Groen [16] evaluated manipulation (positioning, grasping) performance with 3D controllers (i.e., virtual hand control and 3D cursor control) in a stereoscopic environment. The virtual hand control was set up such that a participant manipulated a virtual object by contacting a virtual hand created by wearing a DataGlove on the right hand. The position and orientation of the glove were tracked in real time while the participant was manipulating the object. In the 3D cursor control setting, the participant controlled a SpaceMouse to move the 3D cursor (i.e., a virtual arrow) in the VE. The results showed that in positioning tasks, virtual hand control was more accurate than the 3D cursor. A recent study by Deng et al. [17] asked participants to position a ball-shaped object in a spherical area in a virtual space using a handheld controller in a ray-casting setup, in which a virtual light ray emitted from the controller moves an object remotely in the VE. Based on our definition of how users interact with objects, we consider those settings indirect interaction methods.

Related Work
As user interactions with virtual objects become more popular, it is crucial to determine an appropriate interaction technique and to design effective interactions with the 3D target in VEs. To design an effective and efficient user interface, it is also essential to understand the factors affecting users' performance. Moreover, the design of the interface also needs to encompass an understanding of human behaviors, particularly when people are interacting in 3D space. Researchers have explored 3D spatial input techniques [18][19][20] as alternatives to traditional 2D input devices (e.g., mouse, trackball, etc.) and touch screens [21]. Generally, these techniques utilize 3D pointing hand movements as an input function for devices that require free-hand or touchless computer interaction. The question is whether such hand-pointing interactions perform as well as traditional 2D input devices. Several studies have successfully investigated user performance in human-computer tasks in 2D environments [22,23] by utilizing direct-touch screens, as well as in 3D environments [17,24,25]. These are considered to be of the same type as the indirect interactions used in the present study. Nevertheless, to the best of our knowledge, comparative studies of what we consider direct and indirect interactions with virtual targets are limited in number. Although previous studies have examined the performance of target acquisition in VE, the two techniques were considered separately (see Table 1 for an overview of studies on interaction techniques). This study, therefore, investigated the effects of direct and indirect interaction techniques on the accuracy of egocentric distance estimation in a VE. We employed direct pointing for the direct interaction condition and a gamepad-controlled virtual cursor for the indirect interaction condition.
Another important factor in a VE is space and the perception of it. Spatial information such as distance and size is fundamentally crucial in both the real world and VEs, and it should be provided accurately [26]. However, biased perception of distance in VEs often causes users to over- or underestimate it. Space in VEs can be divided into two measured distances: egocentric and exocentric. Egocentric distance is the distance between an observer and an object; usually, the observer estimates a depth toward the object. Exocentric distance is the distance between two objects. Cutting [27] divided the depth from observer to target into peripersonal space, a distance extending a little beyond arm's reach (a radius of about one meter), and extrapersonal space, which is all space beyond 1 m. Distance underestimation has been reported in the majority of studies of VEs [28][29][30][31]. A systematic review of previous studies by Lin and Woldegiorgis [32] found that the accuracy of distance estimates in VEs was only about 80%, compared to about 94% in the real world. Distance estimation in VEs can also suffer from worse inaccuracy, sometimes by as much as 50% of the intended distance [33]. Another study of egocentric distance perception in an IVE system by Willemsen et al. [34] showed that distance accuracy was degraded by 45% (true distance underestimated within the range of 5 to 15 m). The study hypothesized that inaccuracies and cue conflicts involving stereo viewing conditions in HMDs resulted in inaccurate absolute scaling of the virtual world. However, the results indicated that compressed egocentric distance judgment was not caused by the unnatural stereo viewing conditions commonly found in IVE systems. The study suggested further investigation of other factors as a source of the compression, e.g., the ergonomics of wearing the VR headset or the sense of presence.
Regarding the inaccuracy of distance judgment, previous research provides a variety of possible causes. An extensive review by Renner et al. [35] identified four classes of factors that possibly cause underestimation: measurement methods, technical factors, compositional factors, and human factors. However, it is still not clear why actions in VE are smaller than intended. Recent studies on the perception of target sizes in VE also revealed a compression issue [36,37], as it is dominantly observed in egocentric distance. One reported cause was the variation of the judgment techniques used to respond. To date, studies of distance estimation have mostly employed verbal reports or variations of walking (blind walking, triangulated walking) to specify the perceived distance. As VR applications become more interactive, users are enabled to physically interact with 3D objects (e.g., touching or manipulating them) rather than just viewing 3D images. This advancement has drawn the attention of scholars researching interaction techniques to measure the accuracy of distance estimation. Table 1 summarizes studies on interaction techniques (direct and indirect interactions) under different experimental conditions and the extent to which this factor affects the accuracy of egocentric distance estimation. Generally, egocentric distance estimation can be evaluated in one of three planes (frontal, sagittal/lateral, and horizontal/transversal), depending on where the effective motion or movement is performed. Although the majority of studies have investigated the accuracy of egocentric distance in the frontal plane, as it is important for many applications in VEs [38], the results have consistently shown that egocentric distance judgments in VEs are not as accurate as real-world estimates.
In this experiment, the accuracy of distance judgment using direct pointing and indirect cursor techniques was investigated. Participants estimated distances by reaching for/pointing at the virtual target at three egocentric distance levels: 90, 120, and 150 cm from the participant. Various applications in augmented reality and virtual environments require reaching by hand or holding other objects to touch a virtual target. For example, many mixed reality-assisted medical procedures involve positioning a medical device at a depth indicated by a (virtual) marker. Distance judgment by reaching/pointing has been studied previously [13,45-47]. Based on some of the results of previous studies [12,40,46], it was expected that distance estimation would be less accurate with the direct pointing technique than with the indirect cursor technique. In particular, a recent study compared exocentric distance judgment and spatial perception between direct and indirect interactions and found that accuracy in the frontal space differed significantly [46]. In the current study, the experiment was carried out to determine whether such differences also exist for egocentric distance judgment. We expected a more accurate distance estimation for longer distances between the participant and the target [11]. We also expected that underestimations would be less severe with the direct pointing technique [13] than with the indirect cursor technique.

Methods
In this study, we investigated the effects of two interaction techniques, i.e., the direct pointing and indirect cursor techniques, on egocentric distance estimation in a stereoscopic environment. We examined the two techniques by considering a tapping (pointing) task wherein virtual targets were projected at three different levels of egocentric distance along the frontal plane. The interaction techniques were developed on the basis of Mine's [14] terminology for interactions between users and targets in VE.

Direct Interaction with Virtual Objects
Direct interaction gives an impression of direct involvement with an object rather than of communicating through an intermediary [48]. With this interaction, a more natural type of interaction is expected to be easy to achieve because the user's hand or other body parts are utilized to interact with the objects [49]. Direct 3D interaction has been the focus of many works in VEs over the last few decades. However, given that estimates are less accurate in virtual scenes than in the real world, direct interaction with a physical object (e.g., hand, pointing stick, etc.) could also create confusion and lead to inaccuracy because of the need to "touch" intangible 3D objects [50]. Although direct interaction could lead to a number of errors, most results from similar studies agree that this interaction could improve the performance of object manipulation when the visual and motor spaces (e.g., hand positions) are coupled to allow better interactions [51][52][53].
In our study, in the direct interaction condition, participants directly pointed (i.e., the direct pointing technique) at the center of the target surface using a pointing stick (Figure 1a). This reporting method is similar to that employed by Singh, Swan, Jones, and Ellis [45], Lubos et al. [54], and Bruder, Steinicke, and Stuerzlinger [39], who used a physical arm to reach a target at an egocentric distance in a near-field VE. Since the distances considered in their experiments were within arm's reach, direct pointing by hand was sufficient; in our study, in which the targets lay beyond arm's reach, we used three pointing sticks of different lengths with reflective markers attached at their tips.


Indirect Interaction with Virtual Objects
Alternatively, to select an object that is not located within arm's reach, the indirect interaction method offers the possibility of selecting distant objects, but it is limited by the decoupling of the user's motor and visual spaces. Indirect interaction also often requires more cognitive effort to map input to output [55]. For example, a user who controls an intermediary, such as a slider on a panel, to adjust the intensity of a light must put thought into controlling the slider, wait for a response, and then interpret the results. Different indirect interaction techniques have been proposed, one example being the Go-Go technique [56], which gives users the ability to interact with objects in extrapersonal space by nonlinearly scaling hand positions within arm's reach.
In the present study, for the indirect interaction technique, participants used a gamepad to control a virtual cursor (i.e., a hand cursor) to acquire the target in VE. Participants had to place the cursor at the center of the target surface (Figure 1b). Since the virtual cursor and the targets were all displayed virtually, there were no conflicts between real (physical) and virtual objects. This was expected to reduce the confusion caused by virtually touching an intangible 3D object. However, it is not clear whether the reduction of visual conflicts can improve the overall estimation performance.


Experimental Design and Variables
The experimental task is illustrated in Figure 2. We used a standard ISO 9241-9 tapping (pointing) task setup [57], with stereoscopic targets projected (on a frontal plane to participants) at different egocentric distances in front of the projection screen, i.e., with different negative parallaxes. The parallax itself defined the position of the targets relative to a fixed screen display position while the participant was seated 210 cm from the screen; the targets could be perceived at distances of 90, 120, and 150 cm from the participant. Eight spherical targets, rendered in red, were arranged in a circle and displayed one by one in a defined order. Similarly, the arrangement of experimental facilities and their setup is shown in Figure 3. The experimental space was 4 m × 3 m × 2.5 m and partitioned by black curtains to create an excellent stereoscopic environment with no unwanted light. The room was equipped with tables and an adjustable chair. The background of the stereoscopic environment was a uniform dark blue scene, which was projected onto a 130 cm wide × 100 cm high projection screen placed at a fixed position 210 cm from the participant; the projector was positioned below the participant's eyes, under the table.


Independent Variables
This experiment had three independent variables: two interaction techniques (direct pointing and indirect cursor), three egocentric distances (targets displayed at 90 cm, 120 cm, and 150 cm from the participant), and six indices of task difficulty (2.7, 3.5, 4.1, 4.5, 6.05, and 6.15 bits). Therefore, the experiment was designed as a 2 (interaction technique) × 3 (egocentric distance) × 6 (index of difficulty) within-subject design, and repeated-measures ANOVA was used as the analysis method. The levels of task difficulty were further classified into low (easy task), medium (moderate task), and high (difficult task) indices of difficulty, indicating an index of difficulty (ID) less than or equal to four, greater than four but less than or equal to six, and greater than six, respectively [58]. The ID was computed using Equation (1):

ID = log2(A/W + 1), (1)

where A is the amplitude of inter-target distance and W is the target width, as shown in Figure 2.
A farther inter-target distance or smaller target results in increased difficulty.
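As a concrete illustration, the ID computation and the low/medium/high classification above can be sketched in a few lines of Python; the Shannon formulation of Fitts' index of difficulty is assumed, and the amplitude and width values in the example are hypothetical, chosen only to exercise the formula:

```python
import math

def index_of_difficulty(amplitude_cm: float, width_cm: float) -> float:
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return math.log2(amplitude_cm / width_cm + 1)

def difficulty_level(id_bits: float) -> str:
    """Classify an ID into the study's low/medium/high bands."""
    if id_bits <= 4:
        return "low"
    if id_bits <= 6:
        return "medium"
    return "high"

# Example: a 20 cm inter-target distance with a 2.2 cm wide target
id_bits = index_of_difficulty(20, 2.2)  # ≈ 3.34 bits, classified "low"
```

Note how the ratio A/W drives difficulty in both directions: doubling the amplitude or halving the width raises the ID by roughly one bit.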

Dependent Variables
The major dependent variable considered in this study was accuracy. Accuracy measures how close the estimation is to the reference value and can be defined as a fraction of the actual egocentric distance. Accuracy was previously used by Armbrüster et al. [59], Dey et al. [60], and Lin and Woldegiorgis [11] for evaluating egocentric distance perception, using Equation (2):

Accuracy = 1 − |De − Da| / Da, (2)

where De is the observer's perceived distance and Da is the corresponding actual egocentric distance. An accuracy value closer to one indicates a more accurate egocentric distance estimation.
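Assuming accuracy is computed as one minus the relative absolute error of the estimate (so that over- and underestimates of equal magnitude score the same), a minimal Python sketch is as follows; the 96.8 cm estimate is purely illustrative:

```python
def accuracy(perceived_cm: float, actual_cm: float) -> float:
    """Accuracy as one minus the relative absolute estimation error.

    Returns 1.0 for a perfect estimate; values fall toward 0 as the
    estimate deviates from the actual egocentric distance.
    """
    return 1 - abs(perceived_cm - actual_cm) / actual_cm

# e.g., a 96.8 cm estimate of a target actually at 90 cm
acc = accuracy(96.8, 90)  # ≈ 0.924
```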
Task completion time was also measured in seconds, from the moment the participant judged the first target to the moment they completed the trial (with eight targets). Movement time was the time taken by the participant to move the tip of the pointing stick or virtual hand cursor from one target to the next. As displayed in Equation (3), the movement time (MT) required to point at a target is affected by the distance moved (A) and the width of the target (W) measured with respect to the direction of movement [61]:

MT = a + b log2(A/W + 1), (3)

where a and b are regression coefficients, and the logarithmic term is the index of difficulty (ID). The movement time was used to calculate the throughput of each trial. Throughput represents the rate of information transmission of responses [57,62] and can be calculated using the following equation:

Throughput = ID / MT. (4)
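The two quantities can be sketched in Python as below. The regression coefficients a and b in the example are hypothetical (in a real analysis they would be fitted to the observed movement times), and throughput is taken simply as ID divided by movement time; ISO 9241-9 analyses often substitute an effective ID computed from the observed endpoint spread:

```python
import math

def movement_time(a: float, b: float, amplitude: float, width: float) -> float:
    """Fitts'-law movement time: MT = a + b * ID, ID = log2(A/W + 1)."""
    return a + b * math.log2(amplitude / width + 1)

def throughput(id_bits: float, mt_seconds: float) -> float:
    """Rate of information transmission in bits per second."""
    return id_bits / mt_seconds

# Illustrative coefficients: a = 0.3 s intercept, b = 0.25 s/bit slope
mt = movement_time(0.3, 0.25, 20, 2.2)
tp = throughput(4.5, 1.8)  # 2.5 bits/s
```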

Experimental Settings and Stimuli
The VE, the 3D stereoscopic targets, and the experimental task were developed in Unity 3D (version 4.3.4f1). The VE was projected by a ViewSonic 3D projector, and the task ran on a high-speed computer supporting stereoscopic vision (Intel Core i7-7700 CPU @ 3.60 GHz, 8 GB RAM, and NVIDIA GeForce GTX 1060 6 GB graphics), so that the latency of the virtual system was minimized and should not have affected interaction performance. To perceive the 3D stereoscopic targets at three different egocentric distances, participants wore NVIDIA 3D glasses integrated with a 3D vision emitter. The parallax was adjusted based on the required egocentric distance (90 cm, 120 cm, or 150 cm from the participant) and the interpupillary distance (IPD) of each participant. The interaction by direct pointing was performed using light wooden sticks of three different lengths, 80 cm, 110 cm, and 140 cm, with 0.6 cm reflective markers attached to their tips. The material of the pointing stick was carefully chosen to minimize the adverse effects of its weight on the participants' pointing postures or performance. The positions of the reflective markers were recorded and tracked by a 6D motion capture system (OptiTrack) at a frame rate of 120 frames per second. To confirm the pointing actions, participants had to click a Logitech Spotlight wireless remote attached at the lower end of the stick. The interaction by indirect cursor was performed using a dual-analog gamepad (Sun-Yes R-0011). The analog sticks enabled movement of the cursor along the x-, y-, and z-axes within the VE: the left stick controlled the virtual cursor in two degrees of freedom (DoF) of translation (up and down, left and right), including diagonal movement, while the right stick controlled the depth (forward and backward) of the virtual cursor.
The size of the virtual cursor was scaled to approximately the width of each target to provide good visualization when touching the target. Four programmable buttons were located on the gamepad grip, but for this study, only the primary button "X" was activated (to confirm the pointing). Greater force on the gamepad analog stick increased the velocity of the virtual cursor. We optimized the sensitivity of the gamepad, i.e., the mapping between the exerted force and the cursor speed: a preliminary experiment was conducted to obtain an optimum sensitivity value at which the cursor could be controlled comfortably for precise and fast movement. The average value was approximately 2 m/s, and we used this value for the subsequent experiments. The starting points of the stick tips and virtual cursors were kept fixed at three different positions, P1, P2, and P3, marked on the table (as shown in Figure 3). Taking the projector as the reference of measurement, the initial positions of the tips and cursors were P1 (82, 0, 82), P2 (82, 0, 90), and P3 (82, 0, 120) for targets at egocentric distances of 90, 120, and 150 cm, respectively. For each trial, both the stick tip and the cursor position were restarted at the initial positions. The data streams from the marker position, virtual cursor, and virtual target were collected and processed by the PC to analyze how close the estimates of the egocentric distance were to the theoretical reference positions. To minimize differences in perceived egocentric distance due to head motion, participants rested their chins on a fixed chinrest on the tabletop and sat on an adjustable chair to ensure consistency in height.

Procedure
Prior to testing, participants read and completed a consent form, which also described the purposes, tasks, and procedures of the study. After reading the instructions, participants received an equivalent verbal description while their IPDs were measured. The IPD values were used to determine the separation of the horizontal images (in calculating the parallax) to create the 3D stereoscopic target at the required egocentric distance. Greater parallax brought the virtual target closer to the participant's eyes.
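The geometry behind this adjustment follows from similar triangles: for a target perceived in front of the screen, the on-screen separation of the left- and right-eye images grows as the intended target distance shrinks, and vanishes at the screen plane. A minimal Python sketch of that relationship, assuming idealized symmetric viewing geometry (not the authors' actual calibration code):

```python
def crossed_parallax_cm(ipd_cm: float, screen_cm: float, target_cm: float) -> float:
    """On-screen separation of the stereo half-images (similar triangles).

    A target perceived in front of the screen (target_cm < screen_cm)
    requires crossed, i.e., negative, parallax; the separation grows as
    the target approaches the viewer and is zero at the screen plane.
    """
    return ipd_cm * (screen_cm - target_cm) / target_cm

# Mean IPD of 6.5 cm, screen fixed at 210 cm, target perceived at 90 cm
sep = crossed_parallax_cm(6.5, 210, 90)  # ≈ 8.67 cm of separation
```

Consistent with the text, the nearest target (90 cm) demands the greatest separation, and a larger IPD scales the separation proportionally.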
Each participant trained for an average of 3 min in each interaction to become familiar with the experimental setting and apparatus. During the training session, the participant sat on a chair at a designated distance, wore the 3D glasses, and was instructed to view the VE and to obtain a good image of the virtual targets. Participants were also advised to quit at any time if they felt discomfort associated with the VE.
When each trial began, the spherical virtual targets were projected one after the other, each disappearing just after the participant clicked it, and participants were instructed to point at a target as quickly as possible while maintaining accuracy. No visual feedback (e.g., a color change) was given for either interaction technique when a target was chosen, other than the next target appearing.
We randomly divided the participants equally into two groups that started the experiment in either the direct or the indirect interaction condition; the two sessions were separated by at least two days to minimize learning effects and fatigue. Each group was further divided into three subgroups to counterbalance the order of the three egocentric distance conditions (90 cm, 120 cm, and 150 cm from the participant). Once an egocentric distance was allocated, the participant had to complete all six IDs within the same depth condition in a completely randomized order. Each participant completed all combinations of the 2 (interaction techniques) × 3 (egocentric distances) × 6 (indices of difficulty) design. Therefore, every participant completed 144 trials (3 egocentric distances × 6 indices of difficulty × 8 targets) for each interaction technique. The total time for each participant to finish all combinations of tasks, including the training session, was approximately 90 min.
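The scheduling just described, i.e., a counterbalanced distance order with the six IDs fully randomized within each depth condition, can be sketched as follows; the function and its seed parameter are illustrative, not the software actually used in the experiment:

```python
import random

DISTANCES_CM = [90, 120, 150]
IDS_BITS = [2.7, 3.5, 4.1, 4.5, 6.05, 6.15]

def session_order(distance_order, seed=None):
    """One interaction-technique session: for each distance in the
    (counterbalanced) order given, run all six IDs in a completely
    randomized order within that depth condition."""
    rng = random.Random(seed)
    trials = []
    for d in distance_order:
        block = IDS_BITS[:]
        rng.shuffle(block)  # randomize IDs within the same depth condition
        trials.extend((d, i) for i in block)
    return trials

# 18 (distance, ID) conditions per technique; with 8 targets each,
# a participant performs 144 pointing movements per technique.
trials = session_order([120, 90, 150], seed=1)
```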

Participants
Fourteen graduate students from various departments at the National Taiwan University of Science and Technology were recruited through advertisements on social media: ten males and four females aged 23 to 28 years (Mean = 24.36, SD = 1.45). All participants were right-handed and reported normal or corrected-to-normal vision (with glasses or contact lenses), and their IPDs ranged from 6.5 cm to 7 cm (Mean = 6.57, SD = 0.18). The majority of participants had no prior experience with VR. All participants gave written informed consent prior to the experiment. Before the experiment started, each participant's stereo vision was tested by viewing a target at the nearest distance (90 cm from the participant); none of the participants failed this test. The participants received neither payment nor academic credit. The experiment was approved by the research ethics committee of National Taiwan University (NTU-REC No: 201209HS002).

Results
Repeated-measures ANOVA with three independent variables was performed on each dependent variable, i.e., accuracy of estimation, task completion time, and throughput. Post hoc tests were conducted using the Tukey HSD test (α = 0.05) when applicable. Degrees of freedom (DoF) were corrected using Greenhouse-Geisser correction when Mauchly's test indicated that the assumption of sphericity had been violated.
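The Greenhouse-Geisser correction multiplies the degrees of freedom by an epsilon estimated from the covariance of the repeated measures (e.g., the corrected DoF 1.340 and 17.419 reported below are epsilon times the nominal 2 and 26). A minimal NumPy sketch of the epsilon computation, not the full ANOVA pipeline:

```python
import numpy as np

def gg_epsilon(data: np.ndarray) -> float:
    """Greenhouse-Geisser epsilon for one within-subject factor.

    data: array of shape (n_subjects, k_conditions). Corrected DoF are
    obtained by multiplying the nominal DoF by epsilon, which is bounded
    by 1/(k - 1) (maximal violation) and 1 (perfect sphericity).
    """
    k = data.shape[1]
    s = np.cov(data, rowvar=False)           # k x k condition covariance
    c = np.eye(k) - np.ones((k, k)) / k      # centering matrix
    s_c = c @ s @ c                          # double-centered covariance
    return float(np.trace(s_c) ** 2 / ((k - 1) * np.trace(s_c @ s_c)))
```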

Accuracy of Estimation
As shown in Figure 4, the mean accuracy of the indirect cursor technique (mean = 0.95, SD = 0.01) was higher than that of the direct pointing technique (mean = 0.90, SD = 0.01) at all tested egocentric distances and IDs. The ANOVA revealed a significant difference in accuracy between the two techniques (F [1, 13] = 12.26, p < 0.05). We also found a significant main effect of egocentric distance (F [1.340, 17.419] = 6.719, p < 0.05) on accuracy. The Tukey post-hoc test revealed that participants made significantly more accurate judgments when the targets were at 150 cm (p < 0.05) than when they were at 120 cm or 90 cm. We found a two-way interaction between technique and egocentric distance (F [2, 26] = 8.759, p < 0.05) on accuracy. The post-hoc test showed that participants using direct pointing were significantly (p < 0.05) more precise when the targets were displayed at 150 cm than when they were displayed at the other distances. For the indirect cursor technique, we found no significant effect of distance on accuracy. In the direct pointing technique, the overall egocentric distance estimations were 96.797 cm (SD = 2.569 cm), 127.492 cm (SD = 2.151 cm), and 156.867 cm (SD = 1.543 cm) at distances of 90 cm, 120 cm, and 150 cm from the participant, respectively; the overall egocentric distances were thus overestimated. In the indirect cursor technique, the corresponding estimations were 86.015 cm (SD = 1.169 cm), 115.612 cm (SD = 1.295 cm), and 143.281 cm (SD = 1.794 cm) at 90 cm, 120 cm, and 150 cm, respectively; unlike with the direct pointing technique, the overall egocentric distances were underestimated.
The effect of ID on egocentric distance estimation accuracy was also examined. The results showed a significant main effect of ID on the accuracy of distance judgment (F [1.35, 17.58] = 8.24, p < 0.05). The highest mean accuracy was 0.934 (SD = 0.007) at the low level of ID, followed by 0.927 (SD = 0.008) at the medium level and 0.919 (SD = 0.009) at the high level. However, the interaction between ID and interaction technique was not significant (F [1.35, 17.49] = 2.94, p > 0.05).
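The excerpt does not state how the accuracy values were computed. One plausible definition, consistent with the range of values reported, treats accuracy as 1 minus the relative absolute error of the distance estimate; this is an assumption, not the paper's confirmed metric:

```python
def estimation_accuracy(estimated_cm: float, actual_cm: float) -> float:
    """Accuracy as 1 minus relative absolute error.

    Assumed definition for illustration only; the study's exact formula
    is not given in this excerpt.
    """
    return 1.0 - abs(estimated_cm - actual_cm) / actual_cm

# e.g., a 96.797 cm estimate of a 90 cm target gives roughly 0.92
```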
Throughput

The results for throughput are illustrated in Figure 6. We found a significant main effect of interaction technique (F [1, 13] = 5.257, p = 0.039) on throughput. The overall throughput of the direct pointing technique (mean = 2.314 bps, SD = 0.136 bps) was higher than that of the indirect cursor technique (mean = 2.190 bps, SD = 0.095 bps). The ANOVA also revealed a significant main effect of egocentric distance (F [1.989, 25.859] = 34.876, p < 0.001) on throughput. The post-hoc tests indicated that throughput was significantly lower for targets at 150 cm (p < 0.05) than for those at 120 cm and 90 cm. Similarly, we found a significant main effect of ID (F [1.548, 20.126] = 116.952, p < 0.001) on throughput. Tukey post-hoc tests further indicated that the throughputs differed significantly (p < 0.001) across all three IDs: the average throughput was highest at the highest ID (Mean = 2.805 bps, SD = 0.157 bps), followed by the medium ID (Mean = 2.172 bps, SD = 0.110 bps) and the lowest ID (Mean = 1.779 bps, SD = 0.082 bps). We also found a marginally significant interaction between interaction technique and index of difficulty (F [1.987, 25.832] = 3.574, p = 0.043) on throughput; however, the interaction between interaction technique and egocentric distance was not significant (F [1.795, 23.329] = 2.819, p > 0.05).
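Throughput in Fitts-style pointing studies is commonly computed as the effective index of difficulty divided by mean movement time, following ISO 9241-9. Whether this study used exactly the effective-width variant is an assumption; a sketch:

```python
import math
from statistics import mean, stdev

def throughput_bps(amplitudes_cm, endpoint_dev_cm, times_s):
    """ISO 9241-9 style throughput (bits/s).

    IDe = log2(De / We + 1), where De is the mean movement amplitude and
    We = 4.133 * SD of the selection-endpoint deviations along the task
    axis. Throughput is IDe over the mean movement time. Assumed variant;
    the study's exact computation is not given in this excerpt.
    """
    de = mean(amplitudes_cm)              # effective amplitude
    we = 4.133 * stdev(endpoint_dev_cm)   # effective width from endpoint spread
    ide = math.log2(de / we + 1)
    return ide / mean(times_s)
```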

Discussion
This study investigated the effects of two interaction techniques, direct pointing and indirect cursor, when a stereoscopic target was displayed in a projection-based display. The results revealed evident effects of the interaction technique, the egocentric distance, and the index of difficulty on the accuracy of interaction performance. Estimation of egocentric distance was less accurate with the direct pointing technique (90%) than with the indirect cursor technique (95%). This finding supports our experimental hypothesis that egocentric distance estimation is more accurate with the indirect cursor technique than with the direct pointing one. A plausible explanation is the visual conflict that may occur when a virtual object is selected with the direct pointing technique [63]. In particular, when a participant tries to aim a pointing stick at a virtual object in a stereoscopic environment, either the virtual object or the pointing stick will appear blurred. With the indirect cursor, by contrast, both the virtual targets and the virtual cursor were displayed stereoscopically in this study, so such visual conflicts might have been reduced [40]. Another possible explanation is the occlusion cue present in the indirect cursor condition: when the virtual cursor passed through the virtual target, the cursor occluded the spherical target, so the participant could become aware that the cursor was passing behind the target. Although no visual feedback (e.g., color change) was provided when the cursor touched the target, the presence of the hand cursor could serve as a visual cue, leading to higher accuracy than with the direct pointing technique.
In addition, since the virtual cursor was scaled to approximately the width of each target, the cursor's apparent size changed as it moved toward the target until the two were almost the same size, which could indicate that they were at the same depth. In other words, when the hand cursor appeared smaller than the target, the size difference indicated that the cursor was farther from the target. It is therefore possible that the participants' judgments benefited from this relative size cue.
This study also found that underestimations consistently occurred at each egocentric distance in the indirect cursor condition. On average, participants underestimated target positions by 4, 5, and 7 cm when the targets were displayed at distances of 90 cm, 120 cm, and 150 cm, respectively. This phenomenon has been observed in other studies: a comprehensive review by Lin and Woldegiorgis [32] of studies on egocentric distance perception in VR mostly reported underestimations. This result may indicate compression of the intended size of the virtual space [34,59]. Some researchers have attributed such compression to measurement methods and technical issues [35], the low quality of graphics [64], experience in VR [65], and distance cues [66]; however, no consensus has been reached on the causes. In contrast to the indirect cursor technique, egocentric distance was overestimated with the direct pointing technique in all target distance conditions in this study. On average, distances were overestimated by 6 to 8 cm across all tested target distances (90 cm, 120 cm, and 150 cm from the participant). These results are in line with previous studies showing that egocentric distance estimations in stereoscopic displays up to a distance of 1.5 m were less precise and overestimated [11,13,67]. Using a technique similar to the direct pointing technique in this study, Swan, Singh, and Ellis [13], who utilized direct matching and reaching for depth judgments, also observed slightly overestimated judgments of 0.5 to 1.9 cm over reaching distances of up to 50 cm in a mixed environment of VR and real targets displayed by an nVisor ST60 head-mounted display (HMD) with a 60° diagonal field of view (FOV).
Likewise, using a widescreen display system and a triangulated pointing task, Bruder, Sanz, Olivier, and Lecuyer [67] found overestimations of up to 50% when a target was projected at a distance of 1 m from the observer. In a recent study by Lin and Woldegiorgis [11], vision-based and memory-based pointing methods were used to estimate the egocentric distances of 3D stereoscopic and real targets. Their results showed that egocentric distances were overestimated by 10 to 11 cm at distances of 100 cm and 150 cm in the stereoscopic environment, consistent with their previous study [32]. Although these earlier studies drew their conclusions from separate response methods, those approaches can also be categorized as direct interaction methods. The overall estimations at the three tested egocentric distances showed underestimation with the indirect cursor technique and overestimation with the direct pointing technique. This contrast could be attributed to the difference in the methods of distance judgment, which has been interpreted in terms of two representations of visual space associated with the respective interaction techniques [68,69]: a cognitive/perception representation for the indirect cursor technique and a sensorimotor representation for the direct pointing technique. Direct pointing involves vision-for-action, reaching directly to a reference target: the participant's attention (perception) is focused on the target to be selected while the hand is moved closer to the goal through motor (action) control. The indirect cursor technique, on the other hand, involves only vision-for-perception, matching a target object (i.e., the virtual cursor) until it perceptually matches the reference target, with little or no motor control (i.e., no direct reaching to the target).
These two different representations of visual space, a cognitive representation driving perception and a sensorimotor representation leading to action, could have influenced the estimations. At present, however, it is difficult to conclude how visual perception differs across participants' responses or actions with respect to direct and indirect interaction techniques, as only a few related comparative studies are available. Further study with a greater focus on this variable is needed to clarify its effect on egocentric distance estimation. This study provides important information on how interaction techniques (direct pointing and indirect cursor) affect the accuracy of egocentric distance estimation for interaction with objects displayed stereoscopically at different parallaxes. It is also evident from the present study that interaction with the direct pointing technique could mitigate the underestimation problems commonly reported in VE studies.
In this experiment, we found that the accuracy of egocentric distance estimation with the direct pointing technique increased as the egocentric distance increased (as shown in Figure 4). This result was also supported by statistical evidence that egocentric distance estimation with the direct pointing technique was significantly more accurate (p < 0.05) at the farthest distance of 150 cm than at the other distances of 90 cm and 120 cm. The result is consistent with previous studies [11,35], which reported that the accuracy of stereoscopic target selection was lower for targets closer to the eyes. This finding can be explained by the occurrence of accommodation-converge mismatch [70]: 3D objects displayed closer to a participant generate higher accommodation-converge mismatch than do objects displayed farther away. This factor reduces the accuracy of target selection in stereoscopic environments.
Throughput was significantly higher with the direct pointing technique than with the indirect cursor technique. This difference probably resulted from the faster target selection times of the direct pointing technique, whose arm movements are simple and efficient to perform. In addition, participants informally reported that they felt the direct pointing technique was easier to perform than the indirect cursor technique. A common pattern was found in movement time (MT): MT increased linearly with the target's ID. However, an unanticipated finding was that throughput also increased with ID, contrary to the intuitive expectation that a higher ID would result in lower throughput. Similar results have been reported in other studies; for instance, Lin, Abreham, and Woldegiorgis [41] showed that throughput was higher at the highest ID than at the lowest ID. A possible explanation for this unexpected result is that the increase between successive IDs was not proportional to the increase observed in the corresponding MT; ID rose more sharply than MT. Further work is required to explain this issue comprehensively.
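The seemingly paradoxical rise of throughput with ID falls out of Fitts' law algebraically: if MT = a + b·ID with a positive intercept a, then TP = ID/MT = ID/(a + b·ID) increases monotonically with ID, approaching 1/b. A quick numeric illustration; the coefficients are illustrative, not fitted to this study's data:

```python
def fitts_mt(task_id: float, a: float = 0.35, b: float = 0.25) -> float:
    """Fitts' law movement time in seconds (MT = a + b * ID).

    a and b are hypothetical regression coefficients chosen for
    illustration, not values estimated from this study.
    """
    return a + b * task_id

# Throughput ID/MT for three increasing IDs:
tps = [task_id / fitts_mt(task_id) for task_id in (2.0, 3.0, 4.0)]
# Throughput rises with ID because the positive intercept a penalizes
# low-ID trials proportionally more, so ID/MT grows toward 1/b.
```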

Conclusions
In this study, the effects of interaction techniques on the interaction performance of accuracy in performing a pointing/selection task within a stereoscopic environment were investigated. The influences of indices of difficulty and different egocentric distances on the accuracy of interaction performance were also examined. The results showed that accuracy in the selection of stereoscopic targets was significantly affected by the interaction technique. Moreover, we also found substantial effects of the distance at which a target was projected at a different level of difficulty within the negative parallax of the stereoscopic display.
The indirect cursor technique was more accurate than the direct pointing technique. However, the results also showed that the throughput of the direct pointing technique was higher than that of the indirect cursor technique. This may have been caused by the ease of use and naturalness of the direct pointing technique allowing faster target selection times, although the mean task completion times of the direct pointing and indirect cursor techniques were not significantly different. Direct pointing could have interesting implications in VR: the results here suggest that the direct pointing technique may alleviate the underestimation of distances in VEs, despite its slight overestimation. Consequently, developers of VEs in which user task performance depends on accuracy (for instance, surgical training and simulations) should consider the relative advantages of the direct and indirect interaction techniques.
In addition, the experiment revealed that targets displayed at 150 cm from the participant yielded the most accurate distance judgments of the three egocentric distances. This implies that the accuracy of target selection in a stereoscopic environment improves as egocentric distance increases. This result could also enhance the understanding of the effective space during interaction with stereoscopic targets and inform the choice of appropriate interaction distances with stereoscopic targets and displays. However, future research is needed to determine how much distance judgment improves with changes in distance by considering targets farther from participants.

The present study examined task performance in the frontal plane under three egocentric distance conditions; performance in the lateral and transversal planes could be considered in future studies. Moreover, the virtual target was displayed in a projection-based stereoscopic environment, and a particular tracking system and input device were employed. Future studies may use different virtual environment systems, such as head-mounted and wide-angle displays, to assess the generalizability of the results. Finally, although a simple 3D pointing/selection task is commonly used to determine the effects of interaction techniques on egocentric distance accuracy, the growing interest in advanced forms of 3D virtual and augmented reality interaction suggests that more complex tasks may be needed to characterize the performance of these interaction techniques; further experimental investigations with such tasks are therefore recommended. The results reported in this study provide useful information on how accuracy and throughput vary with different interaction techniques (direct and indirect), egocentric distances, and indices of difficulty in a stereoscopic display.
Future studies on interaction in stereoscopic environments might be required to investigate user behaviors and kinematics by evaluating usability and analyzing the movements when different interaction techniques are employed.