Article

Empirical Validation of Fitts’ Law in Virtual Reality: Modeling, Prediction, and Modality Comparison

1
Faculty of Engineering, University of Rijeka, Vukovarska 58, HR-51000 Rijeka, Croatia
2
Faculty of Maritime Studies, University of Rijeka, Studentska ulica 2, HR-51000 Rijeka, Croatia
3
Center for Artificial Intelligence and Cybersecurity, University of Rijeka, R. Matejcic 2, HR-51000 Rijeka, Croatia
*
Author to whom correspondence should be addressed.
Multimodal Technol. Interact. 2026, 10(5), 49; https://doi.org/10.3390/mti10050049
Submission received: 2 April 2026 / Revised: 24 April 2026 / Accepted: 27 April 2026 / Published: 1 May 2026

Abstract

Fitts’ law is a foundational model for predicting pointing performance and has been increasingly explored in immersive virtual reality (VR) environments. This paper presents a controlled experimental framework for deriving modality-specific Fitts’ law models in VR and evaluating their predictive transfer to applied interaction tasks. The framework comprises two scenarios. The first replicates a standardized ISO 9241 pointing task in a 3D virtual environment to derive predictive movement time models by systematically varying target distance (20–50 cm), target size (2.5–5 cm), and spatial configuration (0°, 45°, 90°, 135°). The second simulates an applied warehouse-inspired task involving tool sorting and structured placement actions to evaluate the generalizability of the derived models in more ecologically valid VR interactions. Thirty-two participants completed all tasks using the Meta Quest 3 headset and two interaction modalities: a handheld controller and hand tracking with gesture recognition. Results show that Fitts’ law remains a strong predictor of movement time for 3D pointing in VR, with high linear fits for both the controller (R² = 0.9615) and hand tracking (R² = 0.9668). However, models derived from standardized pointing tasks showed limited transferability to applied object-manipulation scenarios, producing prediction errors of approximately 27–35% and systematically underestimating movement times. Additionally, both objective metrics and subjective evaluations indicated that controller-based interaction outperformed hand tracking in efficiency, accuracy, perceived workload, and usability. These findings highlight both the robustness and limitations of Fitts-based performance modeling in realistic VR interaction contexts.

1. Introduction

Understanding and modeling human motor performance is a fundamental objective in human–computer interaction (HCI) research. One of the most influential and widely adopted models in this domain is Fitts’ law, originally proposed by Paul M. Fitts in 1954 [1]. Fitts’ law describes the relationship between movement time and the difficulty of a target acquisition task. The model predicts that the time required to rapidly move to a target is a logarithmic function of the ratio between the distance to the target and its width. Fitts’ law has become a standard tool for evaluating input devices, interaction techniques, and user interface designs because of its simplicity and strong empirical support.
Over the decades, the original formulation of Fitts’ law has been extended beyond the 1D tasks used in early motor control studies. Subsequent work adapted the model to 2D pointing tasks commonly encountered in graphical user interfaces, demonstrating that the relationship between movement time and index of difficulty remains robust across a wide range of interaction contexts [2]. These findings led to the development of standardized multi-directional pointing tasks for evaluating input devices, which were later formalized in the ISO 9241-9 standard [3] (subsequently updated to ISO 9241-411) [4,5]. More recently, researchers have begun to investigate the applicability of Fitts’ law in 3D interaction spaces. Early work examined spatial pointing tasks in 3D environments, such as stereoscopic displays and motion-tracked interfaces, while more recent studies focus specifically on immersive virtual reality (VR) systems in which users perform object manipulation tasks in virtual 3D spaces [6,7,8,9].
The rapid development and adoption of VR technologies have driven the need to better understand human motor performance in 3D interaction spaces. Today, virtual reality systems are increasingly being used in domains such as education, professional training, medical simulation, and remote collaboration. In these contexts, users often interact with virtual objects using handheld controllers or hand tracking systems. Therefore, evaluating the efficiency, accuracy, and usability of these interaction modalities becomes essential for designing effective VR interfaces. Predictive models such as Fitts’ law can provide valuable insights into user performance, enabling researchers and designers to systematically assess interaction techniques and optimize task designs and interfaces within VR.
Although Fitts’ law has been used to study interaction performance in VR, several important challenges remain. In particular, it is not yet fully established how standardized pointing tasks, such as those defined by ISO 9241, can be consistently adapted to immersive 3D environments. Furthermore, it remains unclear how well models derived from such controlled tasks generalize to more realistic object-manipulation scenarios commonly encountered in VR applications. Finally, the increasing diversity of interaction modalities, including controller-based and hand-tracking techniques, raises additional questions regarding performance differences and user experience within such experimental frameworks.

1.1. Related Work

Research on Fitts’ law in three-dimensional and virtual environments dates back several decades. Early foundational work, such as the “fish tank” virtual environment study by Ware and Lowther [10], demonstrated that Fitts’ law remains applicable in stereoscopic 3D interaction, while also highlighting the influence of depth perception and visual feedback on pointing performance. Around the same period, Murata and Iwase [11] extended Fitts’ law to 3D input devices and spatial tasks, showing that while the model generally holds, factors such as movement direction and depth can influence performance and affect model fit.
Building on these early efforts, a large body of work has examined the validity of Fitts’ law in immersive VR, with many studies extending the model to better account for 3D-specific properties. Preikstas and Schofield [12] showed that discrete target-selection tasks, such as navigating the Oculus Quest 2 menu, exhibit a strong linear relationship between movement time and index of difficulty, with R² values up to 0.8951 per participant. Extending beyond planar layouts, Cha and Myung [6] introduced azimuth and inclination angles in a physical wall-mounted 3D target arrangement, demonstrating that including these angle-based factors improved model fit compared to traditional formulations based only on distance and target width. Similarly, Kim et al. [13] investigated the effect of target depth in a multi-directional tapping task by varying the distance of a planar target panel along the z-axis and comparing ray-casting and virtual hand interaction techniques. Their results indicated that model accuracy declined when targets were very close to or very far from the user, emphasizing the importance of depth in 3D interaction modeling. Clark et al. [7] further explored three-dimensional Fitts’ law using cubic targets arranged by azimuth and inclination in 45° increments, showing that angular factors significantly influenced movement time and could explain up to 64.5% of the variance in predictive models. Collectively, these studies confirm that while Fitts’ law generalizes well to VR, its classical formulation often requires augmentation to fully capture 3D interaction dynamics.
In parallel, extensive research has compared different interaction modalities in VR, particularly controller-based input and hand tracking. Johnson et al. [14] examined controller-based and hand tracking interaction in object manipulation tasks and reported lower perceived naturalness and feelings of agency with hand tracking, primarily due to tracking accuracy issues. Schafer et al. [15] also found that handheld controllers outperformed hand gesture techniques in accuracy and task completion time during pick-and-place tasks. In a study using an ISO multi-directional Fitts’ law task, Babu et al. [16] compared Leap Motion-based gesture pointing with HTC Vive controller input and observed higher throughput values with controller-based interaction. Collectively, these findings suggest that traditional controllers provide more stable and efficient interaction than hand-tracking techniques. However, users often perceive hand-based methods as more natural.
In contrast, research on the predictive transferability of Fitts’ law models across tasks remains limited. Most studies validate the model within a single experimental context, typically standardized pointing tasks, without examining whether derived models generalize to more complex interaction scenarios. Lane et al. [17] proposed a KLM-based additive model to predict task completion time in augmented reality, incorporating Fitts-based movement time formulations as operators. They evaluated existing 3D movement time models from the literature across two interaction tasks: a simple button-selection task and a more complex object-manipulation task, in which participants moved and rescaled virtual cylinders. They reported prediction errors below 20% in both tasks when using additive models. However, the models were not explicitly derived from a single task and transferred unchanged to another task within the same experimental framework. Garbaya and Hugel [18] examined Fitts’ law in a haptic-enabled virtual assembly environment involving cylindrical parts, which resembles the tool sorting task used in this study. Their results show that Fitts’ law can model movement time in object-manipulation tasks, with an R² of 0.972. However, they emphasize that factors such as the assembly part diameter significantly influence performance, which motivates an “inverted Fitts’ law” interpretation. Still, their study does not explicitly investigate cross-task predictive transfer from standardized pointing tasks to distinct assembly tasks within a unified framework.
Finally, it is worth noting that some transfer attempts exist outside of VR-specific research. For example, Wan et al. [19] introduced an extended Fitts-based model with a mission-difficulty index to predict performance in mobile-manipulator teleoperation tasks, incorporating additional factors such as reachability and target orientation. While effective in real-world scenarios, the model was not evaluated within a standardized calibrate-then-transfer setup across tasks in a controlled VR-like framework.

1.2. Research Gap, Contributions, and Structure

While previous research has examined the applicability of Fitts’ law in 3D and VR environments and compared different interaction modalities, two important aspects remain relatively underexplored.
First, despite growing interest in 3D interaction modeling, a clear translation of the standardized ISO 9241 multi-directional pointing task to immersive VR environments has not been fully established. Existing studies often introduce variations in depth or angular layouts, but frequently deviate from the original task structure, use alternative target geometries, or adopt modified modeling formulations. A recent systematic review by Amini et al. [20] highlights that this lack of consistency in task design and experimental setups significantly hinders the comparability of findings across studies. This underscores the need for standardized evaluation methodologies and motivates the ISO-based task extension proposed in this study.
Second, whether predictive models derived from standardized pointing tasks can be reliably transferred to more realistic object manipulation, which involves additional spatial coordination and interaction constraints, remains largely unexplored. To the best of the authors’ knowledge, no study has systematically investigated this calibrate-then-transfer paradigm within a unified VR framework, which motivates the present work.
To address these gaps, this study proposes a principled adaptation of the standardized ISO-based pointing task to immersive VR. The design introduces spherical targets arranged along a rotating target plane, enabling the task to be performed within a fully 3D spatial configuration while preserving the core characteristics of the standardized paradigm. The study maintains the standard formulation to compute task difficulty and uses linear regression modeling to ensure comparability with classical Fitts’ law studies. Furthermore, the study implements a calibrate-then-transfer paradigm, where models derived from the standardized ISO-based pointing task are applied to a more realistic object-manipulation scenario within the same VR framework, enabling a controlled evaluation of the cross-task generalizability.
To complement these contributions, the study also includes supporting evaluations through empirical validation of the model in VR and comparison of interaction modalities. These demonstrate the applicability and robustness of the proposed framework across common input techniques.
In summary, the main contributions of this work are as follows:
  • Primary contributions
    A principled extension of the standardized ISO-based Fitts-style pointing task to VR environments using spherical targets and rotating spatial configurations.
    An investigation of the predictive transferability of models derived from standardized Fitts-style tasks to more realistic object-manipulation scenarios.
  • Supporting evaluations
    An empirical validation of Fitts’ law in a 3D VR context using the proposed standardized framework.
    A comparative analysis of controller-based and hand tracking interaction techniques, demonstrating the framework’s applicability across interaction modalities.
The remainder of this paper is structured as follows. Section 2 describes the experimental setup, task design, and methodology used in the study. Section 3 presents the results of the empirical evaluation. These findings are then discussed in Section 4, including the limitations of the study and directions for future work. Finally, Section 5 summarizes the key findings and conclusions of this research.

2. Materials and Methods

2.1. Participants

Participants for this study were recruited via university communication channels as well as through personal contacts outside the university. This recruitment method resulted in a diverse range of ages and professional backgrounds. The inclusion criteria for participation were normal or corrected-to-normal vision and the absence of any known neurological or musculoskeletal disorders.
A total of 32 healthy adults, consisting of 16 females and 16 males, agreed to participate in the study. Their ages ranged from 18 to 58, with a mean of 32.4 years (SD = 12) and a median of 27 years. Out of 32 participants, 29 (or 90.6%) were right-handed. The sample size of 32 is in line with prior controlled studies on Fitts’ law in VR and 3D interaction, which typically include between 12 and 33 participants (e.g., [7,15,16,17,21]). It also aligns with recent recommendations suggesting a minimum of 18 participants for XR studies [20]. The within-subject design, in which each participant contributed data across all conditions, further supports reliable detection of meaningful effects with the chosen sample size.
To assess prior familiarity with relevant technologies, participants were asked to rate their experience with video gaming and virtual reality (VR) on a 5-point Likert scale, ranging from 1 (no prior experience) to 5 (extensive experience). Video gaming experience was included as it may indicate familiarity with interactive 3D environments, spatial navigation, and hand–eye coordination, which can influence how users interact with VR systems even without prior VR exposure. The corresponding results are presented in Table 1. As shown, most participants reported limited or no prior experience with VR, indicating that the majority of the sample consisted of novice VR users. In contrast, prior video gaming experience was more evenly distributed across levels, with over half of participants reporting moderate to extensive familiarity. This suggests that while many participants had general experience with interactive digital environments and input devices, their direct exposure to VR systems was limited.

2.2. Apparatus

This study was conducted using a Meta Quest 3 VR headset (Meta Platforms, Inc., Menlo Park, CA, USA) and its native controllers. All tasks and virtual environments were developed in Unity 6.2, using the OpenXR plugin as the XR backend. Unity’s XR Interaction Toolkit (v3.2.1) and XR Hands package (v1.6.2) were used to implement two interaction modalities: controller-based interaction and hand tracking.
During experimental sessions, the headset was connected via the official Meta Quest Link USB-C cable to a laptop with an Intel Core i7 processor and an NVIDIA GeForce RTX 2060 GPU. The virtual environment was executed in the Unity Editor’s Play Mode, with rendering performed on the host computer and streamed to the headset via Quest Link. This setup allowed real-time supervision of experimental logs, monitoring for unexpected behavior, and rapid intervention in case of technical difficulties.
Experimental data, including task performance metrics and event timestamps, were logged using custom C# scripts within Unity. Data were recorded at the application frame rate, with timestamps obtained using Unity’s internal timing functions. Survey data and participant feedback were collected using digital forms immediately after each experimental session and subsequently analyzed using Python 3.14 scripts.

2.3. Experiment Design

The study employed a within-subjects design, with input modality (controller-based interaction vs. hand tracking) as the independent variable. Throughput and error rates served as the primary dependent variables, while subjective ratings collected through post-experiment questionnaires were treated as secondary outcome measures. The experiment consisted of two tasks:
  • Task A—a 3D extension of the ISO 9241-411 multi-directional tapping task;
  • Task B—a simulation of an everyday tool-sorting activity.
Before the experimental trials, all participants completed a training session for each interaction modality to familiarize themselves with the upcoming task. Further details on the experimental and training procedures are provided in Section 2.5 and Section 2.6.

2.4. Interaction Modality Implementation

Each task used distinct interaction techniques tailored to its objectives. In Task A, the controller-based condition required participants to use a handheld controller with a small spherical pointer (r = 8 mm) rendered at its tip to indicate the pointing position. Targets were selected by placing the spherical pointer within the target bounds and pressing the controller’s trigger button. In the hand tracking condition, an identical pointing sphere was rendered between the thumb and index finger, while selections were performed using a pinch gesture. Specifically, a selection was triggered when the pinch strength (0–1 range) exceeded a threshold of 0.8.
In Task B, controller-based interaction relied on the controller’s side grip button for grasping and releasing the object. In the hand tracking condition, objects were manipulated using a pinch-based gesture, with an activation threshold of 0.8 for both grasp and release actions. No additional pointing sphere was rendered in Task B, as it was designed to involve direct object manipulation. The hand tracking condition in Task B was accompanied by visual feedback when the virtual fingers were within a 3 cm radius of the tool, as well as upon a successful grasp. In the controller condition, haptic feedback was provided when the controller entered the same proximity zone, as well as upon grasp and release.
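The threshold rule described above can be illustrated with a small sketch. It assumes that the pinch strength arrives as one value per frame and that grasp and release are triggered on threshold crossings; the function name `pinch_events` and the edge-triggered behavior are illustrative assumptions, not the XR Hands API or the study's implementation.

```python
PINCH_THRESHOLD = 0.8  # activation threshold reported in the text


def pinch_events(strengths, threshold=PINCH_THRESHOLD):
    """Yield (frame_index, event) pairs for threshold crossings.

    Assumption of this sketch: a grasp fires when the strength first
    exceeds the threshold, and a release fires when it next drops back
    to or below the threshold.
    """
    pinching = False
    for frame, strength in enumerate(strengths):
        if not pinching and strength > threshold:
            pinching = True
            yield frame, "grasp"
        elif pinching and strength <= threshold:
            pinching = False
            yield frame, "release"


# Hypothetical per-frame pinch strengths in the 0-1 range.
print(list(pinch_events([0.1, 0.5, 0.85, 0.9, 0.7, 0.2])))
# [(2, 'grasp'), (4, 'release')]
```

Triggering on the crossing rather than on every frame above the threshold avoids registering one sustained pinch as several selections.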
The visual representations and interaction settings for both controllers and hand tracking were based on the default configurations provided by the XR Interaction Toolkit, using the XR Origin Hands (XR Rig) setup. Only minor adjustments were made, including disabling non-essential interaction modes and adapting the default grab interaction to a pinch-based selection. This enabled object manipulation whenever the index finger and thumb were brought together, making it independent of the overall hand pose.

2.5. Task A: Multi-Directional Tapping Task in 3D

2.5.1. Task Design

The first task was a 3D adaptation of a classic 2D multi-directional tapping test specified in ISO 9241-411 [5]. In the standard test, circular targets of equal width W are evenly arranged around the circumference of an imaginary circle, forming a sequence (Figure 1). The distances between successive targets are defined as amplitude A and remain constant within the sequence. The user must sequentially select highlighted targets by moving a pointer across the circle.
The difficulty of each movement is quantified using the index of difficulty (ID), defined according to the Shannon formulation as follows:
$$ID = \log_2\left(\frac{A}{W} + 1\right) \tag{1}$$
In the equation above, A denotes the movement amplitude and W the target width. This measure represents the information-theoretic difficulty of the pointing task and is commonly used to model the relationship between movement time and task difficulty in Fitts’ law studies.
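As a brief worked example of the Shannon formulation, the sketch below computes the index of difficulty for one amplitude–width pair from the ranges used in this study. The function name `index_of_difficulty` is illustrative.

```python
import math


def index_of_difficulty(amplitude: float, width: float) -> float:
    """Shannon formulation: ID = log2(A / W + 1), in bits.

    Amplitude and width must be given in the same unit.
    """
    return math.log2(amplitude / width + 1)


# A = 50 cm, W = 5 cm gives log2(11).
print(round(index_of_difficulty(50, 5), 4))  # 3.4594
```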
This study extends the standard test by using spherical instead of circular targets and arranging them along the great circle of an imaginary sphere. The great circle is then rotated about the sphere’s x-axis (the global right-pointing axis) by 0°, 45°, 90°, and 135°, resulting in four target-plane arrangements with the same ID, covering the full range of motion within the available 3D space (Figure 2).
Each sequence comprised 9 spherical targets evenly arranged in the standard multi-directional circular layout. An odd number of targets was used throughout the task to ensure the same movement amplitude for all trials in a sequence, as emphasized and suggested by Roig-Maimó et al. [22]. The diameter d of the imaginary great circle was derived from the desired movement amplitude A and the number of targets n using the formula proposed by Roig-Maimó:
$$d = \frac{A}{\cos\left(90^\circ / n\right)} \tag{2}$$
The initial selectable target was positioned at 90° (at the top of the layout). Consecutive selections (trials) alternated in a clockwise direction between opposite targets to maintain a constant movement amplitude, in accordance with the standard 2D task.
Each sequence had to be completed in a single continuous run (without breaks), as selection time was recorded for every click. The sequence was completed when the initial target was selected for the second time. For a 9-target sequence, this required a total of 10 target selections to collect 9 valid trials. A valid trial requires knowing both the “from” and “to” selection coordinates, as well as the measured movement time between the two consecutive selections.
Each sequence was repeated four times, with the circular layout rotated clockwise around the x-axis by an additional 45° each time. The four repetitions formed a block of sequences with the same ID (i.e., the same width–amplitude combination). After the block was completed, the layout reset to 0° for the next ID (the next block of sequences). Examples of different spatial configurations in Task A are shown in Figure 3.
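The layout geometry described above can be sketched as follows. The sketch assumes center-relative coordinates, a selection step of (n + 1)/2 targets to alternate between near-opposite targets, and a particular sign convention for the x-axis rotation; these details and all function names are illustrative assumptions rather than the study's Unity implementation.

```python
import math


def circle_diameter(amplitude: float, n_targets: int) -> float:
    """Diameter d = A / cos(90 deg / n) for an odd number of targets."""
    return amplitude / math.cos(math.radians(90.0 / n_targets))


def target_positions(amplitude, n_targets, plane_rotation_deg):
    """Center-relative (x, y, z) target positions for one sequence.

    Target 0 sits at 90 deg (top of the layout) and later targets follow
    clockwise. The plane is then rotated about the x-axis; the rotation
    sign convention is an assumption of this sketch.
    """
    radius = circle_diameter(amplitude, n_targets) / 2.0
    phi = math.radians(plane_rotation_deg)
    positions = []
    for k in range(n_targets):
        theta = math.radians(90.0 - k * 360.0 / n_targets)
        x, y = radius * math.cos(theta), radius * math.sin(theta)
        positions.append((x, y * math.cos(phi), y * math.sin(phi)))
    return positions


def selection_order(n_targets):
    """Target indices for the n + 1 selections (first target hit twice)."""
    step = (n_targets + 1) // 2  # assumed step between near-opposite targets
    return [(k * step) % n_targets for k in range(n_targets + 1)]


print(selection_order(9))  # [0, 5, 1, 6, 2, 7, 3, 8, 4, 0]
```

With an odd number of targets, every consecutive pair in this order is separated by the same chord length, which equals the requested amplitude for any plane rotation.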
Target arrangements varied across three amplitude values (50 cm, 35 cm and 20 cm), and two target widths (5 cm and 2.5 cm), yielding six unique ID conditions (Table 2), according to Equation (1).
Furthermore, the order of appearance of the ID blocks was randomized for each participant using their participant ID number as the seed. The order of x-axis rotations within the ID block was the same for all participants (starting from 0° and ending at 135°). Finally, a total of 240 target selections (10 selections per sequence × 4 rotation angles × 6 ID conditions) were required to complete the experimental session for one interaction modality.
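The block structure and seeded ordering can be sketched as below. The study implemented its logic in Unity C# scripts, so the use of Python's `random.Random` here is only an illustrative stand-in for a participant-seeded shuffle, and the function names are hypothetical.

```python
import itertools
import math
import random

AMPLITUDES_CM = (50, 35, 20)
WIDTHS_CM = (5, 2.5)
ROTATIONS_DEG = (0, 45, 90, 135)
SELECTIONS_PER_SEQUENCE = 10  # 9 targets, with the first one selected twice


def id_blocks():
    """All (amplitude, width, ID) combinations, one per block."""
    return [
        (a, w, math.log2(a / w + 1))
        for a, w in itertools.product(AMPLITUDES_CM, WIDTHS_CM)
    ]


def session_plan(participant_id: int):
    """Sequences for one modality: shuffled blocks, fixed rotation order."""
    blocks = id_blocks()
    # Seeding with the participant ID makes the order reproducible.
    random.Random(participant_id).shuffle(blocks)
    return [(a, w, rot) for a, w, _ in blocks for rot in ROTATIONS_DEG]


plan = session_plan(participant_id=7)
print(len(plan) * SELECTIONS_PER_SEQUENCE)  # 240
```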

2.5.2. Training Session Design

The training session that preceded the actual experimental task consisted of a single sequence (one ID condition) presented at each of the four rotation angles. The layout included 7 spherical targets (instead of 9) with a target width of W = 5 cm and a between-target amplitude of A = 20 cm (ID = 2.3219 bits). These combinations give a total of 32 target selections (8 selections per sequence × 4 rotation angles) needed to complete the training session. Both the training session and the actual task were repeated twice (once per interaction modality).

2.5.3. Error Definition and Handling

An error was recorded when the pointer sphere’s volume did not overlap with the target’s bounds at the moment of selection. To indicate a mistake, a short 262 Hz sine-wave beep (250 ms duration) was played as auditory feedback, with the headset’s master volume set to a comfortable medium level.
Error-free trials were not enforced in this task; when participants misclicked or selected an incorrect target, the sequence immediately proceeded to the next target rather than requiring a retry. This design choice follows the recommendations of Amini et al. [20], who argue that Fitts’ law tasks should approximate single ballistic movements, as repeated corrective actions may artificially inflate measured movement times. The only exception was the initial target in each sequence, which had to be successfully selected within its bounds to ensure a valid starting position and a consistent onset of movement time measurement.
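One way to express the error criterion is a sphere–sphere overlap test, sketched below. It assumes that both the pointer and the target are treated as spheres and that touching counts as a hit; the actual collider test used in the Unity implementation is not specified in the text, and the function name `is_hit` is illustrative.

```python
import math

POINTER_RADIUS_M = 0.008  # 8 mm pointer sphere used in Task A


def is_hit(pointer_center, target_center, target_width_m,
           pointer_radius_m=POINTER_RADIUS_M):
    """Return True when the pointer sphere overlaps the spherical target.

    Two spheres overlap when the distance between their centers does not
    exceed the sum of their radii (touching is counted as a hit here,
    which is an assumption of this sketch).
    """
    distance = math.dist(pointer_center, target_center)
    return distance <= pointer_radius_m + target_width_m / 2.0


# 5 cm target: a pointer 3 cm from the target center still overlaps it,
# a pointer 4 cm away does not and would be logged as an error.
print(is_hit((0, 0, 0), (0.03, 0, 0), 0.05))  # True
print(is_hit((0, 0, 0), (0.04, 0, 0), 0.05))  # False
```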

2.5.4. Virtual Environment Implementation

The virtual environment for this task featured a calming light-blue background, with contrasting red selectable targets positioned in front of the viewer.
The XR Origin (the viewer) was positioned at the center of Unity’s global coordinate system (0, 0, 0), with the camera facing the z-axis. The camera’s y-offset was set to 1.36144 m and remained the same for all participants. The center of the main sphere’s great circle was set to a height of y = 1.25 m, and its z-offset was 0.35 m from the camera. This setup allowed all target arrangements and rotations to remain comfortably within arm’s reach and within the participant’s field of view at all times.
A directional light was positioned 4.33 m above the participant’s head along the y-axis, and 3.88 m behind the participant along the z-axis, with an x-axis rotation of 37.65°, uniformly illuminating the target plane. Shadows were disabled for this task to prevent visual occlusion and ensure an unobstructed view of the targets.

2.6. Task B: Tool Sorting Task

2.6.1. Task Design

The second task was designed as an applied interaction task to evaluate the transferability of predictive performance models derived from Task A. Unlike the multi-directional tapping used in Task A, which represents a standardized serial pointing task, Task B simulates a simple everyday warehouse scenario involving object manipulation. Participants were required to sort virtual tools into their corresponding containers (targets) as quickly and as accurately as possible. This scenario was chosen to represent a structured, ecologically meaningful task involving goal-directed placement actions, spatial coordination, and depth variation. At the same time, it maintains a controlled and repeatable interaction, while introducing a sequential task structure in which the order of subtasks is not predetermined. This design allows for more natural user behavior while still supporting systematic comparison with predictive models derived from the standardized task.
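The calibrate-then-transfer idea can be sketched as a least-squares fit of the linear Fitts model MT = a + b · ID on one task, followed by an error measure on another. The mean absolute percentage error used here is an assumed metric, since the exact error definition is not given in this part of the text, and the numbers in the usage lines are placeholders rather than data from the study.

```python
def fit_fitts(ids, movement_times):
    """Ordinary least squares fit of MT = a + b * ID; returns (a, b, r2)."""
    n = len(ids)
    mean_x = sum(ids) / n
    mean_y = sum(movement_times) / n
    sxx = sum((x - mean_x) ** 2 for x in ids)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(ids, movement_times))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2
                 for x, y in zip(ids, movement_times))
    ss_tot = sum((y - mean_y) ** 2 for y in movement_times)
    return a, b, 1.0 - ss_res / ss_tot


def mean_abs_pct_error(a, b, ids, observed_times):
    """Mean absolute percentage error of the predictions a + b * ID."""
    errors = [abs((a + b * x) - y) / y
              for x, y in zip(ids, observed_times)]
    return 100.0 * sum(errors) / len(errors)


# Placeholder values only (not measurements from this study).
calibration_ids = [1.0, 2.0, 3.0, 4.0]
calibration_mt = [0.35, 0.50, 0.65, 0.80]   # seconds, exactly linear
a, b, r2 = fit_fitts(calibration_ids, calibration_mt)
transfer_error = mean_abs_pct_error(a, b, [2.0], [1.0])
```

In this placeholder case the model predicts 0.5 s for a movement observed at 1.0 s, so the predicted time underestimates the observed one, which is the direction of error a transfer evaluation would need to detect.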
Three distinct task scenarios were designed to approximate the spatial arrangements and layout rotations from the standard task:
  • Table scenario: Tools were placed on a flat horizontal surface in front of the viewer, mimicking the 90° target layout used in Task A. Participants were required to pick up the tools from the center of the table and sort them into the appropriate containers (Figure 4a,c,d).
  • Stepped Shelves scenario: Tools were arranged on an angled surface with four stepped shelves, forming an incline relative to the viewer. This approximated the 45° target layout used in Task A. The goal was the same as in the table task (Figure 4b).
  • Vertical Tool Board scenario: Target hooks were mounted on a vertical central board and placed on the wall in front of the viewer, providing a spatial configuration that mirrors the 0° target layout used in Task A. Participants were required to move the tools from the surrounding shelves and snap them onto the empty hooks to complete the sequence of three (Figure 5).
Several design patterns were implemented to minimize cognitive load and reduce reaction time, as advised in prior works [20,23]. When a tool was grasped, the correct target was visually highlighted, while the remaining tools and targets were temporarily dimmed to reduce visual clutter and emphasize the correct interaction (Figure 4d and Figure 5c). To further reduce reaction time, individual tools were assigned distinct colors and shapes within scenarios.
The tools used in Task B varied in shape and visual size, as shown in Table 3. However, to ensure consistent interaction behavior across all movements and scenarios, the effective collision region used for target detection was standardized. Specifically, larger tools contained an inner collider with dimensions equivalent to those of the smallest tools, ensuring that the interaction area with the target remained constant across all tool types.
All scenarios in Task B were designed as discrete pointing tasks, meaning that each movement was performed independently, with a deterministic starting position (the tool’s center) and a clear endpoint (target acquisition). Participants were therefore not required to proceed immediately to the next target and were free to choose the order in which they sorted the tools. This contrasts with serial tasks such as Task A, where selections are made in a continuous sequence. The distinction between discrete and serial tasks was introduced and discussed by Soukoreff and MacKenzie [4]. By employing a discrete rather than a serial target selection paradigm, Task B aimed to approximate a more natural interaction scenario while still allowing for reliable performance measurement.
Target arrangements for Task B varied across three amplitude values (30 cm, 20 cm and 10 cm), and two target widths (7.4 cm and 3.7 cm), yielding six ID conditions, five of which were unique (Table 4).
In this context, the amplitude A represents the distance between the center of the tool and the center of the target along the task axis. In the Vertical Board scenario, the target center was defined as the center of the circular hook, whereas in the remaining scenarios, it corresponded to the bottom center of the cylindrical container. The target width W was defined as the inner diameter of the cylindrical container or, in the Vertical Board scenario, as the diameter of the circular hook. This definition follows the standard Shannon formulation of Fitts’ law, ensuring consistency with the models derived in Task A and supporting their transferability to Task B, without explicitly incorporating object dimensions into the ID formulation. The height of the cylindrical container used in the Table and Stepped Shelves scenarios was constant (h = 2.9 cm), while its width varied depending on the ID condition.
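A short sketch shows why the six amplitude–width combinations of Task B yield only five distinct ID values: the pairs (20 cm, 7.4 cm) and (10 cm, 3.7 cm) share the same A/W ratio. The function name `task_b_ids` is illustrative, and the values are rounded so that floating-point noise does not separate the two equal conditions.

```python
import math
from itertools import product

AMPLITUDES_CM = (30, 20, 10)
WIDTHS_CM = (7.4, 3.7)


def task_b_ids(decimals=4):
    """Map each (A, W) pair to its Shannon ID, rounded for comparison."""
    return {
        (a, w): round(math.log2(a / w + 1), decimals)
        for a, w in product(AMPLITUDES_CM, WIDTHS_CM)
    }


ids = task_b_ids()
print(len(ids), len(set(ids.values())))  # 6 5
```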
The task was designed to cover 6 ID conditions and 8 approach angles (0°–315° in 45° increments) within each individual scenario. These angles correspond to four flexor and four extensor movements that would be performed in a comparable 8-target multi-dimensional serial task. The placement of each tool–target pair was deterministically defined to ensure full coverage of conditions, while accommodating the physical constraints of each movement. Specifically, angle coverage within each scenario was achieved through separate orthogonal (0°, 90°, 180°, 270°) and diagonal (45°, 135°, 225°, 315°) subtasks, as shown in Figure 5a,b. For the Table scenario, the angles were grouped into upper-hemisphere directions (0°, 45°, 90°, 135°, 180°), as shown in Figure 4a, and lower-hemisphere directions (225°, 270°, 315°). The main motivation for the deterministic arrangement was to display all tools and targets within the user’s reachable workspace and field of view, while ensuring no spatial overlap or interference between objects. Although the spatial configurations were fixed across participants, they were arranged to appear varied and non-repetitive to reduce ordering effects, increase realism, and maintain engagement during repetitive movements. Since the order of individual tool sorting within scenarios was not enforced, an additional layer of execution variability was introduced among participants. Additionally, the order in which scenarios appeared was randomized for each participant using their participant ID number as the seed.
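The seeded randomization of scenario order can be sketched as follows. This is only an illustration of the idea in Python: the study's own implementation is not described at this level of detail, and the function name and scenario labels used here are assumptions.

```python
import random

SCENARIOS = ["Table", "Vertical Board", "Stepped Shelves"]

def scenario_order(participant_id: int) -> list[str]:
    """Return a reproducible scenario order for one participant."""
    rng = random.Random(participant_id)  # participant ID used as the seed
    order = SCENARIOS.copy()
    rng.shuffle(order)
    return order

for pid in (1, 2, 3):
    print(pid, scenario_order(pid))
```

Seeding a dedicated generator with the participant ID makes each participant's order reproducible when a session has to be restarted or re-analyzed, while still varying the order between participants.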
A total of 144 target acquisitions (8 approach angles × 6 ID conditions × 3 scenarios) were required to complete the experimental session for one interaction modality in Task B.

2.6.2. Training Session Design

The training session for Task B consisted of eight targets each for the Table and Stepped Shelves scenarios, and nine for the Vertical Board scenario, covering all approach angles. This amounts to a total of 25 target acquisitions needed to complete the training session. Both the training session and the actual task were performed twice (once per interaction modality).

2.6.3. Error Definition and Handling

An error was recorded when a tool was released outside the designated target bounds, causing it to fall next to the container or away from the hook. Unlike the first task, Task B required error-free completion, meaning that each tool had to be placed correctly before proceeding. The motivation behind this approach was to reflect the behavior of real-life object manipulation tasks.
However, to maintain consistency with Task A, any corrective actions after an error, such as regrasping or repositioning the dropped tool, were not included in the recorded movement time. Therefore, the movement time for a trial was defined as the interval between the first successful grasp of the tool and its first release, regardless of whether that release resulted in correct placement.
Visual feedback about incorrect placements was provided naturally through the physics simulation; when released outside the target, the tool visibly fell outside the intended placement location. Since this response clearly indicated an error, no additional auditory feedback was provided.

2.6.4. Virtual Environment Implementation

Consistent with Task A, the viewer was positioned at the origin of the global coordinate system (0, 0, 0), with the camera facing the z-axis and its y-offset set to 1.36144 m. In the Table scenario, the top center of the table was 0.35 m away from the camera and 1 m above the ground. In the Vertical Board scenario, the central board was positioned 1.318 m above the ground and slightly farther from the camera at z = 0.4 m, ensuring that the entire setup remained fully within the participants’ field of view. In the Stepped Shelves scenario, the central board (located between the second and third shelf) was positioned 0.35 m from the camera and 1.105 m above the ground.
The directional light was positioned relative to the viewer in the same manner as in Task A. Shadows cast by the controller and the virtual hand were disabled to prevent visual occlusion of the tools and containers, while shadows cast by the tools were retained to enhance realism and support depth perception.
The tool models, containers, and shelving elements were obtained from the Unity Asset Store under standard asset licensing terms, and were modified to fit the requirements of the experimental task design. Supporting visual elements (e.g., table, walls, hooks, and central boards) were created directly in Unity using basic geometric primitives and simple materials.

2.7. Post-Experiment Survey

After the experiment, perceived workload was assessed using the rating component of the NASA Task Load Index (Raw NASA-TLX) [24], a tool developed by Hart and Staveland [25] and commonly used for subjective workload evaluation. The Raw NASA-TLX includes six subscales: Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration. In this study, the Temporal Demand subscale was omitted because the experimental tasks were performed without explicit time constraints or externally imposed pacing. Participants were not subjected to time pressure at any point during task execution; therefore, perceived temporal demand was not expected to meaningfully differentiate between conditions. This decision aligns with the use of the Raw NASA-TLX, which allows flexible consideration of workload dimensions based on task characteristics. The remaining five subscales were rated on a standard 1–21 scale, which was subsequently normalized to a 0–100 scale for analysis.
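The rescaling of ratings can be illustrated with a short sketch. The text does not state the exact mapping, so a linear transformation with 1 mapped to 0 and 21 mapped to 100 (steps of 5 points) is assumed here; the function name is illustrative.

```python
def normalize_tlx(rating: int) -> float:
    """Map a 1-21 rating linearly onto 0-100 (assumed: 1 -> 0, 21 -> 100)."""
    if not 1 <= rating <= 21:
        raise ValueError("rating must be between 1 and 21")
    return (rating - 1) * 100 / 20

# Example: the lowest, middle, and highest possible ratings.
for raw in (1, 11, 21):
    print(raw, "->", normalize_tlx(raw))
```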
In addition to workload assessment, participants completed a custom questionnaire designed to evaluate their subjective perceptions of interaction quality when using the controller versus hand tracking. Participants rated a series of statements related to control, efficiency, learnability, and other custom usability attributes. Responses were recorded on a 7-point Likert scale ranging from 1 (strongly disagree) to 7 (strongly agree) for both interaction types. The following statements were presented:
  • Control: “Using this interaction type, I felt control over what I was doing.”
  • Speed: “Using this interaction type, I was able to complete the task quickly.”
  • Precision: “Using this interaction type, I was able to complete the task accurately.”
  • Learnability: “I quickly learned how to use this interaction type.”
  • Intuitiveness: “This interaction type was intuitive to use.”
  • Need for help: “I required additional guidance or help when using this interaction type.”
  • Fatigue: “I quickly became fatigued when using this interaction type.”
  • System reliability: “Using this interaction type, I felt confident that all of my movements and actions will be correctly registered.”
  • Engagement: “I would gladly use this interaction type the next time I have to perform similar tasks in VR.”

2.8. Experimental Procedure

The study was conducted over a three-week period, during which participants attended pre-scheduled individual sessions. Testing took place in a neutral, controlled room that was kept distraction-free and comfortable for all participants. The experiment included three phases: initial instructions, testing, and a post-experiment survey. The maximum duration was 40 min per participant. At the start of the experiment, all participants signed an informed consent form and were briefed on the study’s main goals.
In the initial phase, participants received verbal instructions, video examples, and snapshots of the upcoming tasks. They were instructed to select targets and sort tools as quickly and accurately as possible while maintaining a comfortable pace. They were told that missing an occasional target was acceptable, but if many mistakes occurred, they should try to slow down. To prevent unwanted non-rectilinear movements observed during the pilot study, an additional instruction was given for the tool sorting task: participants were to pay extra attention and locate the tool’s final target (a box or a hook) before reaching for the tool itself. This instruction was important because it aimed to separate the user’s reaction time from their actual motor movement time.
At the start of the testing phase, participants were seated and given the VR headset and the controller for their dominant hand. All participants were instructed to use only their dominant hand throughout the experiment. Participants who wore corrective eyeglasses were allowed to keep them on while wearing the headset. All equipment, including the headset and the controller, was disinfected before each individual used it.
To ensure equal conditions for all participants, particularly regarding camera height and eye distance from the task panel, participants first recalibrated their view position using a specialized button on the controller. To mitigate order effects, modality counterbalancing was implemented: half of the participants used the controller as their first input method, while the other half began with hand tracking.
Menu navigation and selection were primarily handled using the controller’s raycast, except when a participant expressed a specific interest in using the hand tracking raycast for navigation (Figure 6).
The study began with Task A (the multi-directional tapping task). After completing the training session using both interaction modalities, participants proceeded to the actual experimental trial. The same procedure was then repeated for Task B (the tool sorting task).
Participants were allowed to take breaks between sequences, tasks, and when switching modalities. If needed, they were also allowed to remove the headset during breaks. Occasional spontaneous dialogue between the study administrator and the examinees was not discouraged, as it often created a more relaxed atmosphere for the participants.
Figure 7 shows a laboratory session in which the participant performs the assigned tasks in the VR environment.
After completing the testing phase, participants filled out the NASA-TLX questionnaire to assess their perceived workload during the tasks. They also answered several demographic questions and a custom usability questionnaire designed to gather feedback on their experience with the system and the experimental setup. After all study-related activities, participants received a small sweet treat as a token of appreciation for their time and participation.

2.9. Evaluation Protocol

The objective performance analysis consisted of three main phases:
  • Predictive modeling;
  • Model validation using predictive transfer;
  • Modality comparison.
Before the start of the first phase, the dataset was cleansed of any temporal and spatial outliers, consistent with prior work [26,27]. In both Task A and Task B, a temporal outlier was removed if the recorded MT of the trial exceeded the mean MT of all trials with the same ID condition by more than three standard deviations.
A spatial outlier was removed in Task A if the selection endpoint of a trial was more than twice the target width away from the target center. In Task B, however, a more lenient restriction was applied: a selection had to be made within three times the target width from the target center to be considered valid. This choice was motivated by the fact that participants exhibited a higher tendency to overshoot the intended placement due to the more realistic and physically constrained nature of the object manipulation task.
Moreover, due to the serial nature of selections in Task A, all trials immediately following spatial outliers were also marked as outliers, as the effective amplitudes for those trials would have been artificially inflated or deflated because of the misselection in the previous trial. This was not the case for Task B, which used a discrete selection task design. Table 5 shows the total percentage of outliers removed during preprocessing.
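The three outlier rules described above can be sketched as a single flagging function. This is a simplified illustration rather than the authors' preprocessing code: the dictionary keys are hypothetical, the sample standard deviation is an assumption, and sequence boundaries in the serial task are ignored.

```python
from collections import defaultdict
from statistics import mean, stdev

def flag_outliers(trials, width_factor, serial):
    """Return a list of booleans, True where a trial is treated as an outlier.

    trials: list of dicts in execution order with keys 'cond' (ID condition
            label), 'mt' (movement time), 'dist' (endpoint-to-centre distance)
            and 'width' (target width, same unit as 'dist').
    width_factor: 2 for Task A, 3 for Task B (thresholds from the text).
    serial: True if the trial following a spatial outlier is also dropped.
    """
    by_cond = defaultdict(list)
    for t in trials:
        by_cond[t["cond"]].append(t["mt"])
    # Upper MT limit per condition: mean + 3 SD (sample SD is an assumption).
    limit = {c: mean(v) + 3 * stdev(v) for c, v in by_cond.items() if len(v) > 1}

    spatial = [t["dist"] > width_factor * t["width"] for t in trials]
    flags = []
    for i, t in enumerate(trials):
        temporal = t["mt"] > limit.get(t["cond"], float("inf"))
        follows_miss = serial and i > 0 and spatial[i - 1]
        flags.append(temporal or spatial[i] or follows_miss)
    return flags
```

Under these assumptions, Task A would be processed with `flag_outliers(trials, 2, True)` and Task B with `flag_outliers(trials, 3, False)`.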
After outlier removal, the data from Task A (including the nominal ID values and the obtained movement times) were used to build a predictive temporal model using linear regression. All participants’ mean movement times for the corresponding ID condition were averaged into a single MT value per condition, which was then used to fit the model. Movement time was modeled as a linear function of ID, as shown in Equation (3), where a is the intercept and b is the slope of the regression. This is the standard formulation used in predictive modeling tasks for Fitts’ law [28,29]. The coefficient of determination R² was used to assess the goodness of fit of the derived models.
$$MT = a + b \cdot ID \tag{3}$$
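A minimal sketch of such a fit is given below, using ordinary least squares written out by hand. The per-condition values in the example are invented for illustration and are not data from this study.

```python
def fit_fitts(ids, mts):
    """Ordinary least-squares fit of MT = a + b * ID; returns (a, b, r2)."""
    n = len(ids)
    mean_id = sum(ids) / n
    mean_mt = sum(mts) / n
    sxx = sum((x - mean_id) ** 2 for x in ids)
    sxy = sum((x - mean_id) * (y - mean_mt) for x, y in zip(ids, mts))
    b = sxy / sxx                      # slope (s/bit)
    a = mean_mt - b * mean_id          # intercept (s)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(ids, mts))
    ss_tot = sum((y - mean_mt) ** 2 for y in mts)
    return a, b, 1 - ss_res / ss_tot

# Made-up per-condition mean movement times, NOT data from the study.
ids = [2.32, 2.81, 3.17, 3.46, 3.91, 4.39]
mts = [0.62, 0.71, 0.80, 0.84, 0.95, 1.03]
a, b, r2 = fit_fitts(ids, mts)
print(f"MT = {a:.3f} + {b:.3f} * ID, R^2 = {r2:.4f}")
```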
In the validation phase, the predictive models derived from Task A were applied to estimate the movement times for each of the three angle-based scenarios in Task B. The predictive accuracy of the cross-task transfer was evaluated using the Root Mean Squared Percentage Error (RMSPE). The formulation of RMSPE used in this study is given in Equation (4), where $MT_i$ denotes the observed movement time for observation i, $\widehat{MT}_i$ represents the movement time predicted by the regression model, and n is the total number of observations:
$$RMSPE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{MT_i - \widehat{MT}_i}{MT_i}\right)^{2}} \times 100. \tag{4}$$
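The RMSPE computation can be sketched in a few lines. The regression coefficients and observed movement times in the example are placeholders chosen for illustration, not values from the study.

```python
import math

def rmspe(observed, predicted):
    """Root mean squared percentage error, in percent."""
    ratios = [((o - p) / o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(ratios) / len(ratios)) * 100

# Hypothetical model coefficients and observations (illustration only).
a, b = 0.15, 0.20                          # intercept (s), slope (s/bit)
ids = [1.233, 1.889, 2.337, 2.679, 3.187]  # unique Task B ID values
observed = [0.55, 0.70, 0.85, 0.90, 1.05]  # made-up observed MT (s)
predicted = [a + b * i for i in ids]
print(f"RMSPE = {rmspe(observed, predicted):.2f}%")
```

Because each error is divided by the observed value, the metric is scale-free, and predictions that fall below the observed times contribute in the same way as predictions that exceed them.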
It should be noted that the MT values observed in Task B correspond to the five unique ID conditions (as presented in Table 4), since trials with the same A/W ratios were processed under the same ID condition.
Furthermore, throughput (TP) and error rates were used to compare the objective performance of the two interaction modalities. Throughput provides a combined measure of speed and accuracy, and was calculated according to the ISO 9241 recommendation using the “mean-of-means” method [5,20,23]. The following equation (Equation (5)) was used to calculate the overall throughput for each modality:
$$TP = \frac{1}{N}\sum_{p=1}^{N}\left(\frac{1}{C}\sum_{c=1}^{C}\frac{ID_{e,pc}}{MT_{pc}}\right), \tag{5}$$
where N denotes the number of participants, C the number of ID conditions, $ID_{e,pc}$ the effective index of difficulty for participant p under condition c, and $MT_{pc}$ the corresponding mean movement time. The effective index of difficulty ($ID_e$) was computed using Equation (6), which involves the effective amplitude $A_e$ and the effective target width $W_e$ [4]:
$$ID_e = \log_2\left(\frac{A_e}{W_e} + 1\right) \tag{6}$$
The effective target width $W_e$ was derived from the standard deviation of selection endpoints, according to Equation (7):
$$W_e = 4.133 \times SD_x, \tag{7}$$
where $SD_x$ represents the standard deviation of the selection coordinates projected onto the task axis. The effective amplitude $A_e$ was computed as the mean of the projected movement distances along the task axis [4,23].
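The mean-of-means throughput calculation can be sketched as follows. The data layout and function names are assumptions made for this illustration, and the sample standard deviation is assumed for the endpoint spread.

```python
import math
from statistics import mean, stdev

def effective_id(amplitudes, endpoints):
    """Effective ID, log2(A_e / W_e + 1), for one participant and condition.

    amplitudes: movement distances projected onto the task axis.
    endpoints:  selection coordinates projected onto the task axis.
    """
    a_e = mean(amplitudes)            # effective amplitude
    w_e = 4.133 * stdev(endpoints)    # effective width (sample SD assumed)
    return math.log2(a_e / w_e + 1)

def throughput(data):
    """Mean-of-means throughput in bits/s.

    data: {participant: [(amplitudes, endpoints, mean_mt), ...]} with one
          tuple per ID condition (hypothetical layout).
    """
    per_participant = [
        mean(effective_id(a, x) / mt for a, x, mt in conditions)
        for conditions in data.values()
    ]
    return mean(per_participant)
```

Throughput is first averaged over conditions within each participant and only then across participants, so every participant contributes equally regardless of how many trials survived outlier removal.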
Error rates were calculated as the percentage of unsuccessful selections (misses) relative to the total number of attempts for each participant, modality, and task, providing a complementary measure of interaction accuracy. After calculation, throughput values and error rates were subjected to further statistical analysis to assess differences in performance and accuracy between the interaction modalities.
In addition to the objective metrics, subjective workload and usability for each interaction modality were assessed using the NASA-TLX and a custom questionnaire, with responses analyzed using non-parametric statistical tests.

3. Results

The following sections present the results of the empirical evaluation conducted to assess the applicability of Fitts’ law in VR interaction tasks, the predictive transfer of the derived models to a more applied interaction scenario, as well as the comparative performance of the two interaction modalities: controller-based interaction and hand tracking.

3.1. Validity of Fitts’ Law in VR

Figure 8 shows the linear regression models fitted to the data from Task A for each interaction modality, following the standard formulation of Fitts’ law (Equation (3)). Both models demonstrate a strong linear relationship between movement time and ID, with R² = 0.9615 for the controller and R² = 0.9668 for the hand tracking modality.
Moreover, the regression line for the hand tracking modality consistently lies above that of the controller for the observed ID range, indicating higher movement times for comparable task difficulties. While this trend suggests that participants required more time to complete the task using hand tracking, further analysis of objective performance metrics, such as throughput, is necessary to draw definitive conclusions about differences in modality performance.

3.2. Predictive Transfer to Applied Tasks

The predictive performance of the models derived from Task A was evaluated by comparing the observed and predicted movement times from Task B. When averaging data across all three scenarios, the models demonstrated modest predictive accuracy for both modalities, with an overall RMSPE of 29.95% for the controller modality and 27.36% for hand tracking.
To better understand this trend, the RMSPE was further analyzed separately for each angle-based scenario. Figure 9 illustrates the relationship between the obtained and predicted MT values for each modality across all three observed scenarios in Task B. The RMSPE results are summarized in Table 6. As observed, the predicted values generally underestimate the observed movement times in most scenarios, indicating a systematic bias in the model’s predictions.
Overall, the models demonstrate limited predictive accuracy across all scenarios, with RMSPE values ranging from 26.64% to 34.18%. The hand tracking modality yielded slightly lower RMSPE values than the controller in two of the three tested scenarios, suggesting marginally better cross-task transferability. The lowest prediction error was observed in the Vertical Board (0°) scenario for both modalities, while the Stepped Shelves (45°) configuration produced the highest RMSPE values. The implications of these results and possible explanations will be presented in Section 4.

3.3. Modality Comparison (Controller vs. Hand Tracking)

3.3.1. Objective Metrics

Figure 10 shows the mean throughput values derived from Task A for both interaction modalities, with error bars indicating the standard deviation. A Shapiro–Wilk test did not indicate a departure from normality (W = 0.9662, p = 0.4025), supporting the use of a parametric test for further analysis. A paired-samples t-test revealed a statistically significant difference between the two modalities, t(31) = 18.95, p < 0.001, with the controller (M = 4.85 ± 0.63 bits/s) yielding substantially higher throughput than hand tracking (M = 3.02 ± 0.49 bits/s). The effect size was very large (d = 3.27), far exceeding conventional benchmarks for Cohen’s d (small = 0.2, medium = 0.5, large = 0.8), indicating a strong performance advantage for the controller in terms of information throughput.
Figure 11a shows the mean error rates and their standard deviations for Task A using both interaction modalities. The normality of the paired differences was assessed using the Shapiro–Wilk test, which indicated a violation of the normality assumption (W = 0.8918, p = 0.0039 < 0.05), motivating the use of the non-parametric Wilcoxon signed-rank test for further analysis. The Wilcoxon test revealed a statistically significant difference between the two modalities (Z = 4.937, p < 0.001) with a maximal rank-biserial effect size of r = 1.0, indicating that participants made significantly fewer errors when using the controller (M = 2.27 ± 2.00%) compared to hand tracking (M = 7.16 ± 3.37%).
For Task B, the mean error rates for the controller and hand tracking are shown in Figure 11b. The Shapiro–Wilk test did not indicate a departure from normality (W = 0.9733, p = 0.5944), supporting the use of a parametric test for further analysis. A paired-samples t-test revealed a statistically significant difference between the two interaction modalities (t(31) = 3.227, p = 0.00295 < 0.05), with a medium effect size (d = 0.667). Consistent with Task A, these results indicate that participants made significantly fewer errors when using the controller (M = 2.96 ± 2.86%) compared to hand tracking (M = 5.31 ± 4.07%).
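As a rough plausibility check on the reported effect sizes, Cohen's d can be recomputed from the summary statistics quoted above. The text does not state which variant of d was used; the sketch below assumes the form that divides the mean difference by the root mean square of the two standard deviations, and it works from rounded values, so small deviations from the reported figures are expected.

```python
import math

def cohens_d_pooled(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d with the root mean square of the two SDs as denominator."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Summary statistics quoted in the text (rounded as reported).
d_throughput = cohens_d_pooled(4.85, 0.63, 3.02, 0.49)  # controller vs. hand tracking
d_error_b = cohens_d_pooled(5.31, 4.07, 2.96, 2.86)     # Task B error rates
print(f"throughput: d = {d_throughput:.2f}; Task B error rate: d = {d_error_b:.2f}")
```

Under this assumption, the recomputed values come out at roughly 3.24 and 0.67, close to the reported d = 3.27 and d = 0.667.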

3.3.2. Survey Results

Figure 12 presents box plots of the observed Raw NASA-TLX scores, providing a visual overview of the differences between the two interaction modalities. Note that higher values for all factors, including the “Performance” question, indicate higher perceived workload and worse subjective evaluation.
It is evident that, across all TLX factors, the median scores for the hand tracking modality are generally higher than those for the controller. This pattern suggests that participants may have experienced a greater perceived workload when using hand tracking. However, statistical analysis is needed to determine whether these differences are statistically significant.
Table 7 summarizes the Wilcoxon signed-rank test results for the five NASA-TLX factors. Across all dimensions, participants reported lower perceived workload when using the controller compared to hand tracking. The differences were statistically significant for all factors (p < 0.001), with large effect sizes, confirming the visual trends suggested by the box plots. These results indicate that, from the participants’ perspective, the controller required less effort, induced less frustration, and allowed for better performance than hand tracking.
The results of the custom usability questionnaire further corroborate the preference for the controller, as shown in Figure 13 and described in Table 8. Note that higher values for all factors, except for the “Need for help” and “Fatigue” questions, indicate a higher perceived usability and better subjective evaluation.
The Wilcoxon signed-rank test indicates significant differences across all assessed features. Notably, the intuitiveness feature, while still showing a statistically significant difference, exhibited a medium effect size, whereas all other features showed large effects. Overall, these findings suggest that participants perceived the controller as more efficient, precise, and reliable, while imposing lower cognitive and physical demands compared to hand tracking.
Taken together, the results from the NASA-TLX assessment and the custom usability questionnaire show a consistent pattern that aligns with the objective performance analysis. These findings indicate agreement between subjective evaluations and measured performance, both of which favor the controller modality.

4. Discussion

4.1. Validity of Fitts’ Law in VR

The results from Task A confirm that Fitts’ law robustly models movement time in VR pointing tasks for both interaction modalities, with particular emphasis on the 3D extension of the standard multi-directional pointing task proposed in this study. These results are consistent with prior work examining the applicability of Fitts’ law in both 1D and 2D contexts [1,2,29], as the R² metric exhibited similarly high values in those studies. In the 3D context, the derived models showed higher R² values than in some previous studies [7,12,13,21], reaching 0.9615 and 0.9668 in our case, compared to the range of 0.645–0.95 reported in those works. Notably, direct quantitative comparison with prior VR studies remains inherently challenging due to the lack of standardized experimental tasks, as well as differences in interaction techniques, spatial configurations, and reported performance metrics. This limitation directly motivates the proposed ISO-based task adaptation, which aims to establish a more consistent basis for future comparisons.

4.2. Predictive Transfer to Applied Tasks

The predictive models derived from the standardized ISO-based point-and-select task showed limited cross-task transferability when applied to more realistic object manipulation tasks.
Among the three task configurations, the Vertical Board (0°) scenario yielded the lowest prediction errors. One possible explanation is that movements in the 0° configuration are largely constrained to a single vertical plane, consistently limiting the z-axis movements that require depth-based coordination. This configuration may more closely resemble the planar pointing motions observed in the standardized tests typically modeled by Fitts’ law.
In contrast, the Stepped Shelves (45°) arrangement required frequent depth adjustments and changes in movement direction, potentially increasing motor coordination demands and introducing additional variability in execution that the original predictive models do not capture. Participant feedback indicated challenges with depth perception and instances in which the tool interacted with the edges of the cylindrical containers, which may have further contributed to variability. Additionally, the hand trajectories were often curved rather than straight during tool insertion into the containers, a factor not accounted for in the linear predictive models.
It should be noted that, following the experiment, six participants (18.8%) reported perceiving the rendered image as blurred or out of focus; however, only one participant indicated that this negatively affected their perceived performance. While this may have contributed to depth perception issues, the precise factors underlying these discrepancies remain to be investigated in future work. In particular, factors such as visual fidelity, depth cue perception, and display characteristics in immersive VR environments warrant further investigation.
Moreover, as illustrated in Figure 9, the majority of observed movement times exceed the predicted values for both interaction modalities. This indicates that the movements performed in Task B generally took longer than the models derived from Task A predicted. In other words, the interaction scenarios in Task B appear to impose a systematic additional difficulty, or bias, beyond what is captured by the ID alone. This may be due to object manipulation constraints, increased motor coordination demands, and differences in interaction technique, since Task B required a grab-and-release manipulation action rather than the point-and-select interaction used to derive the predictive models.
Collectively, these findings suggest that ISO-based Fitts’ law models capture the relative trend in task difficulty well, but they underestimate absolute movement times for tasks requiring more complex 3D manipulation. Future work could explore model adjustments that account for depth-dependent constraints, interaction techniques, and curved hand trajectories to improve predictive accuracy in realistic VR tasks. In this context, direct quantitative comparison of model transferability across studies remains challenging, as relatively few works explicitly investigate cross-task prediction. Moreover, those that do typically focus on task-specific scenarios and often do not report standardized error metrics such as RMSPE. This highlights the need for more consistent evaluation frameworks to enable meaningful comparisons in future research.

4.3. Modality Comparison

The results from both the objective metrics (throughput and error rates) and subjective surveys (NASA-TLX and custom usability questionnaires) consistently indicate that the controller outperforms hand tracking in terms of efficiency, accuracy, and perceived workload.
Several factors likely contribute to these differences. Participant feedback indicated that in Task A, the pointer sphere rendered between the fingers during hand tracking moved inconsistently during pinch selection, making the selection method feel less stable than the controller’s trigger input. Participants reported that the hand tracking modality often resulted in less consistent hand positioning, whereas the controller provided a more stable, ergonomically comfortable hand posture. Some participants reported experiencing discomfort consistent with carpal tunnel strain when using hand tracking for extended periods.
In Task B, participants reported that the haptic feedback from the controller gave them a clear sensation of successful object acquisition. This tactile feedback, coupled with the stable hand posture supported by the controller, likely contributed to the lower perceived workload and higher performance scores. Although hand tracking theoretically allows for more naturalistic interactions, the lack of physical feedback appears to limit performance and user satisfaction in these tasks, as previously discussed by Kourtesis et al. [30] and Johnson et al. [14]. Additionally, the hand tracking system occasionally failed to register pinches when the thumb occluded the index finger, further introducing inconsistencies in interaction.
Finally, it should be noted that the observed effect sizes were generally large across all test cases, indicating robust effects. This further supports the adequacy of the chosen sample size (N = 32) for detecting meaningful differences in this study.

4.4. Limitations

Several limitations of this study should be noted, primarily due to the challenges of applying Fitts’ law to realistic 3D object manipulation tasks. In particular, three aspects may have influenced the observed predictive performance:
  • The limited range of ID values in the applied task scenario.
  • Constraints in designing ecologically valid manipulation tasks that reflect all orientations used in the standardized experiment.
  • The use of the classic Shannon formulation of Fitts’ law without explicitly accounting for directional effects in 3D movement.
First, the range of ID values differed between the two experimental scenarios. In Task A, the ID values ranged from approximately 2.322 to 4.392 bits, while Task B used lower values ranging from 1.233 to 3.187 bits. This lower range in Task B was necessary due to physical constraints: targets needed to remain within arm’s reach, be clearly visible within the field of view, and be large enough for the tool to be comfortably grasped. Increasing the ID values would have required either smaller targets or greater target spacing, which would have made the task impractical given the physical size of the manipulated objects. Additionally, low ID values (between 1 and 2 bits) are known to sometimes cause a breakdown of the linear relationship predicted by Fitts’ law, producing a “hook” in the regression line [4]. This nonlinearity may have contributed to the relatively high RMSPE values observed during cross-task validation, since the predictive models were derived from higher ID ranges. Future efforts should seek greater overlap in ID ranges between calibration and applied tasks to enhance the robustness and predictive accuracy of the derived models.
Second, the design of the applied manipulation tasks imposed additional constraints. A counterpart to the 135°-rotated configuration used in Task A could not be implemented in Task B, as no realistic object manipulation scenario could be devised for this orientation. Replicating this configuration would have required participants to move tools into physically awkward positions, such as under a table or above head level, which would neither be safe nor ecologically valid. Hence, the applied task scenarios did not fully replicate the spatial configurations used in the standardized experiment.
Third, the study used the classic Shannon formulation to calculate ID, without explicitly accounting for target orientation angles in 3D space. While this approach allowed straightforward comparison with established Fitts’ law paradigms, it does not capture potential directional effects in 3D movement, which have been discussed in prior work [6]. As such, the ID was calculated without an effective target width formulation, although prior work has indicated that object size can influence movement time [31]. Interaction behavior was nevertheless controlled by applying consistent collision boundaries across most tools, which reduced variability in the target acquisition process; however, this approach does not explicitly account for object size in the task difficulty formulation. Alternative formulations that incorporate effective width may better reflect object manipulation tasks. Including these factors, along with angular dependencies and other spatial characteristics, in future models may further enhance the accuracy of modeling 3D pointing and manipulation tasks.
Furthermore, while the chosen sample size aligns with previous controlled VR studies, future research could improve generalizability by including larger and more diverse participant groups, especially those with varying levels of VR experience. Although the controlled experimental setup ensured reproducibility and isolation of key factors, it does not fully represent real-world conditions. Future studies could evaluate the proposed framework in more ecologically valid environments, incorporating less constrained tasks and potential edge cases such as occlusions, varying lighting conditions, or tracking instability to further assess the robustness and applicability of the findings.

4.5. Educational Implications

Beyond its methodological and empirical contributions, the system developed in this study may also serve as a valuable tool in higher education, particularly in programs that teach HCI. Traditionally, Fitts’ law is introduced and demonstrated using 2D pointing tasks, typically involving mouse-based interaction on a desktop interface and, sometimes, on a smartphone touchscreen via direct tapping. While these exercises effectively illustrate the relationship between movement time and ID, they offer limited exposure to the complexities of spatial interaction that arise in immersive environments.
The experimental framework proposed in this work could be readily adopted in laboratory settings equipped with consumer-grade VR hardware such as Meta Quest headsets. In this context, students could replicate the workflow presented in this study, with two main benefits: experiencing immersive interaction firsthand and engaging in empirical data collection, model construction, and quantitative evaluation.
From a pedagogical perspective, this setup supports several forms of higher-order learning. Students can collaborate in groups to collect experimental data, participate in hands-on modeling and regression analysis, and critically examine the limitations of predictive models when applied to more complex tasks. Thus, in addition to its primary research contributions, this study also illustrates how immersive VR can be effectively integrated into HCI education to support experiential and data-driven learning.
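As one possible starting point for such a classroom exercise, the sketch below fits the linear model MT = a + b·ID by ordinary least squares and reports R². The ID values are those listed for Task A (Table 2); the movement times are made-up placeholders, not measurements from this study, and the function name `fit_fitts_model` is hypothetical.

```python
def fit_fitts_model(ids, mts):
    """Ordinary least-squares fit of MT = a + b * ID; returns (a, b, r_squared)."""
    n = len(ids)
    mean_id = sum(ids) / n
    mean_mt = sum(mts) / n
    sxx = sum((x - mean_id) ** 2 for x in ids)
    sxy = sum((x - mean_id) * (y - mean_mt) for x, y in zip(ids, mts))
    slope = sxy / sxx
    intercept = mean_mt - slope * mean_id
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ids, mts))
    ss_tot = sum((y - mean_mt) ** 2 for y in mts)
    return intercept, slope, 1 - ss_res / ss_tot


# ID values (bits) as listed for Task A.
ids = [2.322, 3.000, 3.170, 3.459, 3.907, 4.392]
# Placeholder mean movement times (s) that students would replace with
# their own measurements; these numbers are invented for illustration.
mts = [0.62, 0.74, 0.79, 0.83, 0.93, 1.01]

a, b, r2 = fit_fitts_model(ids, mts)
print(f"MT = {a:.3f} + {b:.3f} * ID  (R^2 = {r2:.4f})")
```

Students could then apply the fitted coefficients to the ID values of a second task and compare predicted with observed times, mirroring the cross-task validation discussed in this paper.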

5. Conclusions

This study examined the applicability of Fitts’ law in a 3D virtual reality environment, the predictive transfer of classic point-and-select models to more realistic object manipulation tasks, and the performance differences between controller and hand tracking modalities.
The results confirm that Fitts’ law remains a robust predictor of movement time for 3D pointing tasks in VR, extending prior findings from 1D and 2D interaction contexts. However, predictive models derived from standardized pointing tasks exhibited limited transferability when applied to more realistic grab-and-release object manipulation scenarios. While the models preserved the overall trend of task difficulty across scenarios, they consistently underestimated the absolute movement times required for these more complex interactions.
Both objective performance metrics and subjective user evaluations further indicated that controller-based interaction outperformed hand tracking in terms of efficiency, accuracy, and perceived workload. These differences appear to be influenced by factors such as more stable input mechanisms, reliable haptic feedback, and fewer tracking inconsistencies compared to hand tracking.
Overall, the findings suggest that although traditional Fitts’ law models remain useful for characterizing pointing behavior in immersive environments, additional factors such as interaction technique, depth-dependent coordination, and manipulation constraints should be considered when modeling realistic 3D object manipulation tasks. Beyond its primary research contributions, this study also suggests the potential of immersive VR environments as platforms for enhanced experiential learning in HCI.

Author Contributions

Conceptualization, S.L.; methodology, N.R., D.O., L.B. and S.L.; software, N.R.; validation, N.R. and S.L.; formal analysis, N.R., D.O., L.B. and S.L.; investigation, N.R., D.O., L.B. and S.L.; resources, D.O. and S.L.; data curation, N.R.; writing—original draft preparation, N.R.; writing—review and editing, N.R., D.O., L.B. and S.L.; visualization, N.R.; supervision, D.O. and S.L.; project administration, L.B. and S.L.; funding acquisition, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the EU under the Erasmus+ project TRAINEE (grant 2024-1-MT01-KA220-HED-000246701), and by the University of Rijeka grant uniri-iz-25-198.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of the Faculty of Engineering, University of Rijeka (Reference No.: 2170-1-43-39-26-1; date of approval: 13 February 2026).

Informed Consent Statement

Informed consent was obtained from all participants involved in the study.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
HCI: Human–Computer Interaction
VR: Virtual Reality
MT: Movement Time
ID: Index of Difficulty
NASA-TLX: NASA Task Load Index
RMSPE: Root Mean Square Percentage Error

References

  1. Fitts, P.M. The information capacity of the human motor system in controlling the amplitude of movement. J. Exp. Psychol. 1954, 47, 381–391.
  2. MacKenzie, I.S.; Buxton, W. Extending Fitts’ law to two-dimensional tasks. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’92, Monterey, CA, USA, 3–7 May 1992; pp. 219–226.
  3. ISO 9241-9:2000; Ergonomic Requirements for Office Work with Visual Display Terminals (VDTs)—Part 9: Requirements for Non-Keyboard Input Devices. International Organization for Standardization: Geneva, Switzerland, 2000.
  4. Soukoreff, R.W.; MacKenzie, I.S. Towards a standard for pointing device evaluation, perspectives on 27 years of Fitts’ law research in HCI. Int. J. Hum.-Comput. Stud. 2004, 61, 751–789.
  5. ISO/TS 9241-411:2012; Ergonomics of Human-System Interaction—Part 411: Evaluation Methods for the Design of Physical Input Devices. International Organization for Standardization: Geneva, Switzerland, 2012. Available online: https://www.iso.org/standard/54106.html (accessed on 2 April 2026).
  6. Cha, Y.; Myung, R. Extended Fitts’ law for 3D pointing tasks using 3D target arrangements. Int. J. Ind. Ergon. 2013, 43, 350–355.
  7. Clark, L.D.; Bhagat, A.B.; Riggs, S.L. Extending Fitts’ law in three-dimensional virtual environments with current low-cost virtual reality technology. Int. J. Hum.-Comput. Stud. 2020, 139, 102413.
  8. Triantafyllidis, E.; Li, Z. The Challenges in Modeling Human Performance in 3D Space with Fitts’ Law. In Proceedings of the Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, CHI EA ’21, Yokohama, Japan, 8–13 May 2021.
  9. Lou, X.; Song, X.; Hu, X.; Ma, M.; Fu, L.; Zhuang, X.; Fan, X. An Extended Fitts’ Law Model for Hand Interaction Evaluation in Virtual Reality: Accounting for Spatial Variables in 3D Space and the Arm Fatigue Influence. Int. J. Hum. Comput. Interact. 2025, 41, 11611–11637.
  10. Ware, C.; Lowther, K. Selection using a one-eyed cursor in a fish tank VR environment. ACM Trans. Comput.-Hum. Interact. 1997, 4, 309–322.
  11. Murata, A.; Iwase, H. Extending Fitts’ law to a three-dimensional pointing task. Hum. Mov. Sci. 2001, 20, 791–805.
  12. Preikstas, V.; Schofield, D. Exploring Fitts’ Law in Virtual Reality Applications. SSRG Int. J. Comput. Sci. Eng. 2025, 12, 1–18.
  13. Kim, H.; Hong, Y.; Yu, J.; Xiong, S.; Kim, W. Toward a More Standardized Multi-Directional Tapping Task in VR: The Effect of Target Depth. In Proceedings of the 2025 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Daejeon, Republic of Korea, 8–12 October 2025; pp. 359–369.
  14. Johnson, C.I.; Fraulini, N.W.; Peterson, E.K.; Entinger, J.; Whitmer, D.E. Exploring Hand Tracking and Controller-Based Interactions in a VR Object Manipulation Task. In Proceedings of the HCI International 2023—Late Breaking Papers; Chen, J.Y.C., Fragomeni, G., Fang, X., Eds.; Springer: Cham, Switzerland, 2023; pp. 64–81.
  15. Schäfer, A.; Reis, G.; Stricker, D. Comparing Controller with the Hand Gestures Pinch and Grab for Picking Up and Placing Virtual Objects. In Proceedings of the 2022 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), Christchurch, New Zealand, 12–16 March 2022; pp. 738–739.
  16. Babu, S.; Tsai, M.H.; Hsu, T.W.; Chuang, J.H. An Evaluation of the Efficiency of Popular Personal Space Pointing versus Controller based Spatial Selection in VR. In Proceedings of the ACM Symposium on Applied Perception 2020, SAP ’20, Virtual, 12–13 September 2020.
  17. Lane, L.; Tahmid, I.; Lu, F.; Bowman, D. Evaluating the Viability of Additive Models to Predict Task Completion Time for 3D Interactions in Augmented Reality. arXiv 2026, arXiv:2601.23209.
  18. Garbaya, S.; Hugel, V. Modelling Movement Time for Haptic-enabled Virtual Assembly. In Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2020)—HUCAPP; INSTICC, SciTePress: Setúbal, Portugal, 2020; pp. 53–63.
  19. Wan, Y.; Sun, J.; Peers, C.; Humphreys, J.; Kanoulas, D.; Zhou, C. Performance and Usability Evaluation Scheme for Mobile Manipulator Teleoperation. IEEE Trans. Hum.-Mach. Syst. 2023, 53, 844–854.
  20. Amini, M.; Stuerzlinger, W.; Teather, R.J.; Batmaz, A.U. A Systematic Review of Fitts’ Law in 3D Extended Reality. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, CHI ’25, Yokohama, Japan, 26 April–1 May 2025.
  21. Teather, R.J.; Stuerzlinger, W. Pointing at 3D targets in a stereo head-tracked virtual environment. In Proceedings of the 2011 IEEE Symposium on 3D User Interfaces (3DUI), Singapore, 19–20 March 2011; pp. 87–94.
  22. Roig-Maimó, M.F.; Mas-Sansó, R.; MacKenzie, I.S. Design of 2D Fitts’ law experiments: An odd thing about targets. Appl. Ergon. 2026, 131, 104680.
  23. Roig-Maimó, M.F.; MacKenzie, I.; Manresa-Yee, C.; Varona, J. Fitts’ Law: On Calculating Throughput and Non-ISO Tasks. Rev. Colomb. Comput. 2018, 19, 7–28.
  24. National Aeronautics and Space Administration. NASA Task Load Index (TLX). 2006. Available online: https://humansystems.arc.nasa.gov/groups/tlx/ (accessed on 10 March 2026).
  25. Hart, S.G.; Staveland, L.E. Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research. In Human Mental Workload; Hancock, P.A., Meshkati, N., Eds.; Elsevier: Amsterdam, The Netherlands, 1988; Volume 52, pp. 139–183.
  26. Wobbrock, J.O.; Cutrell, E.; Harada, S.; MacKenzie, I.S. An error model for pointing based on Fitts’ law. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’08, Florence, Italy, 5–10 April 2008; pp. 1613–1622.
  27. MacKenzie, I.S.; Isokoski, P. Fitts’ throughput and the speed-accuracy tradeoff. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’08, Florence, Italy, 5–10 April 2008; pp. 1633–1636.
  28. Fitts, P.; Peterson, J. Information capacity of discrete motor responses. J. Exp. Psychol. 1964, 67, 103–112.
  29. MacKenzie, I.S. Fitts’ Law. In The Wiley Handbook of Human Computer Interaction; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2018; Chapter 17; pp. 347–370.
  30. Kourtesis, P.; Vizcay, S.; Marchal, M.; Pacchierotti, C.; Argelaguet, F. Action-Specific Perception & Performance on a Fitts’s Law Task in Virtual Reality: The Role of Haptic Feedback. IEEE Trans. Vis. Comput. Graph. 2022, 28, 3715–3726.
  31. Deng, C.L.; Geng, P.; Hu, Y.F.; Kuai, S.G. Beyond Fitts’s Law: A Three-Phase Model Predicts Movement Time to Position an Object in an Immersive 3D Virtual Environment. Hum. Factors J. Hum. Factors Ergon. Soc. 2019, 61, 879–894.
Figure 1. A schematic representation of the 2D multi-directional tapping test using 9 targets.
Figure 2. 3D extension of the standardized multi-directional task used in this study.
Figure 3. Targets in different spatial configurations for Task A: 0° rotation using the controller, resembling the classic 2D task (a), 135° rotation using the controller (b), 45° rotation using hand tracking (c), 90° rotation using hand tracking (d).
Figure 4. Targets in different spatial configurations for Task B: Table scenario (90°) (a), Stepped Shelves scenario (45°) (b), Table scenario with the hand tracking modality showing visual feedback (c), Table scenario showing target highlighting upon successful grasp (d).
Figure 5. Targets in different spatial configurations for Task B: Vertical Tool Board scenario (0°) (a), Vertical Board scenario with diagonal approach angles (b), Vertical Board scenario showing target highlighting upon successful grasp by the controller (c), Vertical Board scenario showing a hover effect when the tool is in the target’s proximity (d).
Figure 6. Menus for the initial experiment setup (a) and post-task navigation (b).
Figure 7. An example of a participant performing the tasks using the controller (a) and the hand tracking modality (b).
Figure 8. Linear regression model for predicting MT using the controller and hand tracking based on the data collected in Task A.
Figure 9. Observed vs. predicted movement times broken down by angle-based scenarios for the controller modality (a) and hand tracking (b).
Figure 10. Throughput results per modality derived from Task A.
Figure 11. Error rate results per modality for Task A (a) and Task B (b).
Figure 12. Box plots showing the results per modality for five NASA-TLX factors observed in this study.
Figure 13. Box plots showing the results per modality for nine custom usability features observed in this study.
Table 1. Participants’ prior experience with video gaming and virtual reality.

| Experience Level | Video Gaming (%) | VR (%) |
|---|---|---|
| None | 28.1 | 40.6 |
| Limited | 18.8 | 59.4 |
| Moderate | 21.9 | 0.0 |
| Good | 21.9 | 0.0 |
| Extensive | 9.4 | 0.0 |

Table 2. The range of ID values used in Task A.

| Width (W) (cm) | Amplitude (A) (cm) | Index of Difficulty (ID) (bits) |
|---|---|---|
| 5.0 | 20.0 | 2.322 |
| 5.0 | 35.0 | 3.000 |
| 2.5 | 20.0 | 3.170 |
| 5.0 | 50.0 | 3.459 |
| 2.5 | 35.0 | 3.907 |
| 2.5 | 50.0 | 4.392 |

Table 3. Tools used in Task B and their visual and effective interaction dimensions.

| Scenario | Tool | Longest Visual Dimension (cm) | Effective Collider Length (cm) |
|---|---|---|---|
| Table, Stepped Shelves | Screw | 2.3 | 2.3 |
| Table, Stepped Shelves | Wooden dowel | 2.3 | 2.3 |
| Table, Stepped Shelves | Plastic dowel | 2.3 | 2.3 |
| Table, Stepped Shelves | Nut | 1.15 | 1.15 |
| Table, Stepped Shelves | Bolt | 2.3 | 2.3 |
| Vertical Tool Board | Wrench | 7.4 | 2.3 |
| Vertical Tool Board | Pipe wrench | 7.4 | 2.3 |
| Vertical Tool Board | Hammer | 7.4 | 2.3 |

Table 4. The range of ID values used in Task B.

| Width (W) (cm) | Amplitude (A) (cm) | Index of Difficulty (ID) (bits) |
|---|---|---|
| 7.4 | 10.0 | 1.233 |
| 3.7 | 10.0 | 1.889 |
| 7.4 | 20.0 | 1.889 |
| 7.4 | 30.0 | 2.337 |
| 3.7 | 20.0 | 2.679 |
| 3.7 | 30.0 | 3.187 |

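
The ID values listed for Task A (Table 2) and Task B (Table 4) agree, to three decimals, with the Shannon formulation ID = log2(A/W + 1). The short sketch below recomputes them from the tabulated widths and amplitudes; the function name `index_of_difficulty` is a hypothetical helper introduced here for illustration.

```python
import math


def index_of_difficulty(amplitude_cm, width_cm):
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return math.log2(amplitude_cm / width_cm + 1)


# (width W, amplitude A) pairs in cm, as listed for each task.
task_a = [(5.0, 20.0), (5.0, 35.0), (2.5, 20.0),
          (5.0, 50.0), (2.5, 35.0), (2.5, 50.0)]
task_b = [(7.4, 10.0), (3.7, 10.0), (7.4, 20.0),
          (7.4, 30.0), (3.7, 20.0), (3.7, 30.0)]

for label, conditions in (("Task A", task_a), ("Task B", task_b)):
    for width, amplitude in conditions:
        id_bits = index_of_difficulty(amplitude, width)
        print(f"{label}: W = {width} cm, A = {amplitude} cm, ID = {id_bits:.3f} bits")
```

Running the same helper over candidate layouts is one way to check, before data collection, how much the ID ranges of a calibration task and an applied task would overlap.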
Table 5. Outlier percentages for each interaction modality.

| Task | Modality | Percentage of Outliers (%) |
|---|---|---|
| Task A | Controller | 1.29 |
| Task A | Hand tracking | 2.68 |
| Task B | Controller | 4.95 |
| Task B | Hand tracking | 6.32 |

Table 6. RMSPE values for each interaction modality and angle-based scenario from Task B.

| Modality | Table (90°), RMSPE (%) | Stepped Shelves (45°), RMSPE (%) | Vertical Board (0°), RMSPE (%) |
|---|---|---|---|
| Controller | 30.15 | 34.18 | 26.64 |
| Hand tracking | 26.90 | 29.62 | 26.68 |

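
For readers who want to reproduce this kind of cross-task check, the sketch below computes RMSPE under the usual definition, in which each prediction error is expressed relative to the observed value: RMSPE = 100 · sqrt(mean(((observed − predicted) / observed)²)). That this matches the exact variant used for Table 6 is an assumption; the function name and the movement times are hypothetical placeholders, not study data.

```python
import math


def rmspe(observed, predicted):
    """Root mean square percentage error, relative to the observed values."""
    squared_relative_errors = [
        ((obs - pred) / obs) ** 2 for obs, pred in zip(observed, predicted)
    ]
    return 100.0 * math.sqrt(sum(squared_relative_errors) / len(squared_relative_errors))


# Placeholder movement times in seconds (invented for illustration).
observed_mt = [1.10, 1.25, 1.40, 1.62]
predicted_mt = [0.80, 0.92, 1.05, 1.18]

print(f"RMSPE = {rmspe(observed_mt, predicted_mt):.2f} %")

# A positive mean signed error indicates that the model underestimates
# the observed movement times on average.
mean_signed_error = sum(o - p for o, p in zip(observed_mt, predicted_mt)) / len(observed_mt)
print(f"Mean signed error = {mean_signed_error:.3f} s")
```

Reporting the signed error alongside RMSPE helps separate the size of the prediction error from its direction.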
Table 7. Wilcoxon signed-rank test results for NASA-TLX factors comparing the controller and hand tracking.

| TLX Factor | Z-Score | p-Value | Effect Size r | Effect Interpretation |
|---|---|---|---|---|
| Mental Demand | −3.586 | <0.001 | 0.634 | Large |
| Physical Demand | −3.888 | <0.001 | 0.687 | Large |
| Performance | −3.851 | <0.001 | 0.681 | Large |
| Effort | −3.542 | <0.001 | 0.626 | Large |
| Frustration | −4.052 | <0.001 | 0.716 | Large |

Table 8. Wilcoxon signed-rank test results for custom usability features comparing the controller and hand tracking.

| Usability Feature | Z-Score | p-Value | Effect Size r | Effect Interpretation |
|---|---|---|---|---|
| Control | −4.825 | <0.001 | 0.853 | Large |
| Speed | −4.995 | <0.001 | 0.883 | Large |
| Precision | −4.830 | <0.001 | 0.854 | Large |
| Learnability | −4.335 | <0.001 | 0.766 | Large |
| Intuitiveness | −2.631 | 0.0085 | 0.465 | Medium |
| Need for help | −3.100 | 0.0019 | 0.548 | Large |
| Fatigue | −3.782 | <0.001 | 0.669 | Large |
| System reliability | −4.599 | <0.001 | 0.813 | Large |
| Engagement | −4.278 | <0.001 | 0.756 | Large |

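
The effect sizes in Tables 7 and 8 appear consistent with the common conversion r = |Z| / sqrt(N) with N = 32 participants, and the interpretation labels with Cohen's conventional thresholds (0.1 small, 0.3 medium, 0.5 large). Both the conversion and the thresholds are assumptions inferred from the tabulated values rather than from a stated formula; the function names below are hypothetical.

```python
import math


def effect_size_r(z_score, n_participants):
    """Effect size for a Wilcoxon signed-rank test: r = |Z| / sqrt(N)."""
    return abs(z_score) / math.sqrt(n_participants)


def interpret(r):
    """Label r using Cohen's conventional thresholds (assumed here)."""
    if r >= 0.5:
        return "Large"
    if r >= 0.3:
        return "Medium"
    if r >= 0.1:
        return "Small"
    return "Negligible"


N = 32  # number of participants reported in the study
z_scores = {
    "Mental Demand": -3.586,
    "Intuitiveness": -2.631,
    "Need for help": -3.100,
}

for name, z in z_scores.items():
    r = effect_size_r(z, N)
    print(f"{name}: r = {r:.3f} ({interpret(r)})")
```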
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Rodin, N.; Ogrizović, D.; Batistić, L.; Ljubic, S. Empirical Validation of Fitts’ Law in Virtual Reality: Modeling, Prediction, and Modality Comparison. Multimodal Technol. Interact. 2026, 10, 49. https://doi.org/10.3390/mti10050049

AMA Style

Rodin N, Ogrizović D, Batistić L, Ljubic S. Empirical Validation of Fitts’ Law in Virtual Reality: Modeling, Prediction, and Modality Comparison. Multimodal Technologies and Interaction. 2026; 10(5):49. https://doi.org/10.3390/mti10050049

Chicago/Turabian Style

Rodin, Nikolina, Dario Ogrizović, Luka Batistić, and Sandi Ljubic. 2026. "Empirical Validation of Fitts’ Law in Virtual Reality: Modeling, Prediction, and Modality Comparison" Multimodal Technologies and Interaction 10, no. 5: 49. https://doi.org/10.3390/mti10050049

APA Style

Rodin, N., Ogrizović, D., Batistić, L., & Ljubic, S. (2026). Empirical Validation of Fitts’ Law in Virtual Reality: Modeling, Prediction, and Modality Comparison. Multimodal Technologies and Interaction, 10(5), 49. https://doi.org/10.3390/mti10050049
