Evaluation of HoloLens 2 for Hand Tracking and Kinematic Features Assessment

Bertolasi, Jessica; Garcia-Hernandez, Nadia Vanessa; Memeo, Mariacarla; Guarischi, Marta; Gori, Monica

doi:10.3390/virtualworlds4030031

by Jessica Bertolasi ^1,2,†, Nadia Vanessa Garcia-Hernandez ^3,4,*,†, Mariacarla Memeo ^5, Marta Guarischi ^1,2 and Monica Gori

^1 U-VIP: Unit for Visually Impaired People, Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16152 Genoa, Italy

^2 Department of Informatics, Bioengineering, Robotics and Systems Engineering, University of Genoa, Via All’Opera Pia 13, 16145 Genoa, Italy

^3 Department of Robotics and Advanced Manufacturing, Center for Research and Advanced Studies (CINVESTAV), Zona Industrial, Ramos Arizpe, Saltillo 25900, Mexico

^4 Ministry of Science, Humanities, Technology and Innovation (SECIHTI), Avenida Insurgentes Sur 1582, Mexico City 03940, Mexico

^5 Electronic Design Laboratory (EDL), Fondazione Istituto Italiano di Tecnologia, Via Enrico Melen 83, 16152 Genoa, Italy

^* Author to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Virtual Worlds 2025, 4(3), 31; https://doi.org/10.3390/virtualworlds4030031

Submission received: 15 May 2025 / Revised: 22 June 2025 / Accepted: 30 June 2025 / Published: 3 July 2025

Abstract

The advent of mixed reality (MR) systems has revolutionized human–computer interaction by seamlessly integrating virtual elements with the real world. Devices like the HoloLens 2 (HL2) enable intuitive, hands-free interactions through advanced hand-tracking technology, making them valuable in fields such as education, healthcare, engineering, and training simulations. However, despite the growing adoption of MR, comprehensive comparisons between the hand-tracking accuracy of the HL2 and high-precision benchmarks such as motion capture systems are lacking. Such evaluations are essential to assess the reliability of MR interactions, identify potential tracking limitations, and improve the overall precision of hand-based input in immersive applications. This study aims to assess the accuracy of HL2 in tracking hand position and measuring kinematic hand parameters, including joint angles and lateral pinch span (distance between thumb and index fingertips), using its tracking data. To achieve this, the Vicon motion capture system (VM) was used as a gold-standard reference. Three tasks were designed: (1) finger tracing of a 2D pattern in 3D space, (2) grasping various common objects, and (3) lateral pinching of objects with varying sizes. Task 1 tests fingertip tracking, Task 2 evaluates joint angle accuracy, and Task 3 examines the accuracy of pinch span measurement. In all tasks, HL2 and VM simultaneously recorded hand positions and movements. The data captured in Task 1 were analyzed to evaluate HL2’s hand-tracking capabilities against VM. Finger rotation angles from Task 2 and lateral pinch span from Task 3 were then used to assess HL2’s accuracy compared to VM. The results indicate that, in Task 1, the HL2 exhibits millimeter-level errors relative to the Vicon tracking system, ranging from 2 mm to 4 mm, suggesting that HL2’s hand-tracking system has good accuracy. Additionally, the grasping postures reconstructed in Task 2 by both systems show a strong correlation and an average error of 5°, while in Task 3 the accuracy of the HL2 is comparable to that of VM, with performance improving as the object thickness increases.

Keywords:

motion capture; HoloLens 2; hand tracking; grasping; evaluation; movement; Vicon

1. Introduction

The advent of mixed reality (MR) technology has revolutionized how we interact with digital information, blending virtual elements with the physical world. Unlike virtual reality, which immerses users in a completely digital environment, MR realistically and seamlessly merges virtual objects with the real world, enhancing users’ sensory experience with a high level of local presence [1,2]. This integration has led to significant advancements across various sectors. For instance, in education, MR facilitates immersive learning experiences [3,4,5], allowing people to visualize complex concepts through interactive 3D models. In healthcare, MR assists surgeons by overlaying critical data during procedures, thereby improving precision and outcomes [6], and motivates users to perform rehabilitation exercises [7]. The entertainment engineering industry has also embraced MR to create engaging experiences, such as interactive gaming environments that merge virtual characters with real-world settings [8].

Among the leading devices for MR experiences, the Microsoft HoloLens 2 (HL2) (Microsoft, Redmond, WA, USA) stands out as an optical see-through head-mounted display [9]. HL2 enables intuitive, hands-free interaction through gesture, voice, and eye tracking; indeed, it is equipped with advanced sensors, spatial mapping, and holographic processing. In this context, HL2 is noteworthy for its sophisticated hand-tracking system, allowing users to naturally and seamlessly interact with MR content. In recent years, various applications using HL2 have been developed in different fields, including manufacturing [10], medical simulations [11], and human–computer interaction research [12]. However, the effectiveness of these applications heavily relies on the accuracy and reliability of the hand-tracking system [13]. While HL2 promises robust hand-tracking features, it is crucial to validate these claims through rigorous experimental testing [14] and comparisons with established motion capture technologies. Since each head-mounted display (HMD) utilizes different tracking algorithms, sensor placements, and depth perception techniques, variations in tracking accuracy, latency, and stability are common. To date, only one recent study has used the Vicon Motion system (Vicon, Oxford, UK), a highly accurate motion capture technology, as a gold-standard reference. However, it provided only a graphical and descriptive comparison between HL2 and Vicon, without quantitative analysis [15]. Moreover, no studies have evaluated the HL2’s accuracy in estimating finger rotation angles during functional tasks, despite the relevance of this data for medical and sports applications. Finally, HL2’s ability to track hand movements during real-object interactions, such as pinching or grasping, common in daily activities and mixed reality, remains unexamined [16,17,18].

To help address this gap in knowledge, this study aims to evaluate the hand-tracking accuracy of the HL2 and assess its reliability in extracting kinematic measurements from its tracking data. Unlike previous studies, the VM capture system is used as a benchmark, and the evaluation of the HL2 tracking has been conducted not only during free-hand movement but also during grasping and pinching tasks with real objects. To achieve this, three primary tasks were designed: (1) finger tracing of a 2D pattern in 3D space, (2) grasping various common objects, and (3) lateral pinching of objects with varying sizes. Task 1 assesses HL2’s capabilities in fingertip tracking, Task 2 evaluates its accuracy in measuring joint angles, and Task 3 examines its accuracy in measuring lateral pinch span. These tasks were selected because they represent common interactions in mixed reality applications and require precise hand tracking to be effective. The data collected and processed from both HL2 and VM were then compared, using VM as the benchmark, to evaluate the spatial accuracy of HL2 in tracking hand position and measuring kinematic hand parameters. If HL2 demonstrates strong agreement with VM data, it could be a viable alternative in situations where optical tracking systems are impractical. Furthermore, the findings of this study can aid researchers and developers in selecting appropriate uses of the system, based on its tracking accuracy and kinematic data measurement capabilities, and in exploring future applications of this technology in fields such as education, healthcare, and industry.

2. Related Works

Previous works have been conducted to analyze and compare the hand-tracking precision of advanced HMDs in detecting hand movements, finger positions, and gesture recognition. In some studies, high-precision optical motion capture systems, such as Vicon Motion Capture System (VM) (Vicon, Oxford, UK) or OptiTrack (Corvallis, OR, USA), have been used as benchmarks for the validation process. For instance, the Quest 2 (Meta Platforms Inc., Menlo Park, CA, USA) HMD has been evaluated for its hand-tracking performance in various settings, using external cameras as marker-based ground-truth tracking systems [19,20], highlighting its strengths and weaknesses in precision and robustness [18]. Additionally, the tracking performance of the Oculus Rift S (Oculus VR, Menlo Park, CA, USA) [21] and HTC VIVE (HTC Corporation in partnership with Valve Corporation, HTC: New Taipei City, Taiwan; Valve: Bellevue, WA, USA) [22] devices has been assessed using the OptiTrack and VM tracking systems, respectively, as gold-standard references. Furthermore, other studies have compared hand-tracking performance across different HMDs. For example, a previous study [23] compared the accuracy, jitter, and working area of the Oculus Rift (Oculus VR, Menlo Park, CA, USA) and HTC VIVE, suggesting various applications for these devices.

Regarding the performance of HoloLens (Microsoft Corporation, Redmond, WA, USA), it has also been studied in different contexts. For instance, a work from Chaconas and Hollerer [24] designed and evaluated various two-handed gestures for rotating and scaling objects with HL2. The results indicate that, despite challenges related to field-of-view limitations, certain two-handed techniques perform comparably to the one-handed baseline technique in terms of accuracy and time. Additionally, the study by Vassallo and colleagues [11] evaluated the HL2’s ability to accurately maintain holograms in their intended real-world locations, and the results demonstrated the device’s potential for clinical applications. Meanwhile, the study from Schneider et al. [25] compared the spatial accuracy of the HL2 and Magic Leap One (Magic Leap Inc., Plantation, FL, USA) in tracking index finger movements along a 2D shape displayed on a touchscreen positioned either vertically or horizontally on a table. The results revealed that HL2 achieved an accuracy of approximately 15 mm, while Magic Leap One had an accuracy of around 40 mm. In another study [14], researchers first evaluated the performance of the HL2’s depth sensor and tracking system, followed by its capability to map indoor environments. The tracking evaluation results showed that HL2 achieved an accuracy of approximately 17 mm. A similar result was reported in Soares et al. [17], which found an accuracy of approximately 20 mm in experiments involving active and passive hand/head movements, using the OptiTrack system as the ground-truth measurement system. A more recent study validated the capabilities of HL2’s RGB cameras, eye tracking, microphone array, and IMU sensors through various scenario-based tests, though without the use of a ground-truth tracking system [26].

Overall, previous studies have consistently reported that the accuracy of HL2 in finger tracking ranges from 15 mm to 20 mm. This variation and relatively low accuracy may be attributed to the tracking method or validation system used. Notably, only one recent study has used the VM as a gold-standard reference—one of the most precise motion capture systems, capable of achieving sub-millimeter accuracy under optimal conditions. However, this study conducted only a graphical and descriptive comparison of hand movements captured by HL2 and Vicon, without providing a quantitative analysis [15]. Moreover, no study has evaluated the accuracy of HL2 in calculating finger rotation angles during functional tasks using its tracking system, despite the usefulness of this data in medical and sports applications for assessing hand movements. Lastly, none have examined HL2’s tracking capabilities in tasks involving pinching or grasping real objects, which are commonly performed in activities of daily living and also mixed reality applications [16,17,18].

3. Methods

3.1. Participants

Subjects were recruited from the general population. Inclusion criteria required neurologically healthy subjects with no history of significant injury to their upper limb. Informed written consent was obtained for each subject recruited into the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Liguria Region Ethics Committee (2/2020—DB id 10213) on 18 December 2023.

Twelve participants (mean age 32.3, sd 5.6), five men and seven women, took part in this experimental procedure. All subjects completed the assigned protocol without incident, and no subjects were excluded. Ten participants (mean age 32.1, sd 5.7) took part in Task 1 (tracing of a 2D pattern in 3D space) and Task 2 (grasping common objects), while four participants (mean age 35, sd 7.2) participated in Task 3 (lateral pinching of an object with varying sizes).

3.2. Microsoft HoloLens 2 System

HL2 (Microsoft, Redmond, WA, USA) is an advanced augmented reality headset designed primarily for enterprise applications, offering significant improvements over its predecessor. It is equipped with a Qualcomm Snapdragon 850 Compute Platform and a custom-built Holographic Processing Unit (HPU 2.0) for efficient mixed reality computing. It features 4 GB LPDDR4x RAM and 64 GB UFS 2.1 storage. The headset’s display system consists of two MEMS-based waveguide displays, each providing a resolution of 2048 × 1080 per eye with a 52-degree diagonal field of view, enhancing visual clarity and immersion. The graphics processing is handled by an Adreno 630 GPU, ensuring smooth holographic rendering. For audio, it includes built-in spatial sound speakers and a five-microphone array for voice recognition and communication. Input methods include fully articulated hand tracking for natural and intuitive interaction with digital content, eye tracking technology that facilitates user-specific calibration and gaze-based navigation, voice commands, and external Bluetooth peripherals. The front-facing camera is an 8 MP RGB sensor capable of 1080p video recording at 30 FPS, complemented by depth and IR sensors for environment mapping. The ergonomic design includes a lightweight, balanced structure (approximately 566 g) with a flip-up visor, allowing seamless transitions between virtual and physical environments.

Hand gestures are recognized within a vertical range of +10° to −60° relative to the eye line, with optimal tracking occurring between 0° and −35°. Horizontal movement tracking is similarly constrained but dynamically adjusted based on the user’s head movements.

Hand-tracking data were acquired at a frequency of 100 Hz, matching the acquisition frequency of the VM to facilitate synchronization between the two systems. The data, comprising timestamps and finger joint positions, were stored in lists during task execution and saved to a CSV file at the end.

3.3. VICON Nexus Motion Capture System

Seven motion capture infrared cameras from VM (Vicon, Oxford, UK) were placed in a laboratory to create a capture volume of approximately 2 m × 1 m × 1 m. Data were recorded with version v2.2 of Vero cameras (2048 × 1088 pixel resolution) at a 100 Hz acquisition rate. The VM allows for highly accurate motion capture data to be collected, which can subsequently be converted to speed, acceleration, and jerk estimations.

Using standard calibration practices outlined by the VM user’s manual, we completed system calibration prior to data acquisition. Calibration was accepted if the average 3D residuals were estimated at under 0.8 mm, and data acquisition was only attempted after an adequate calibration was achieved. The average calibration error was 0.5 ± 0.2 mm.

Each subject wore lightweight, retroreflective, spherical, 6.5 mm diameter markers on their hand. Marker placement was performed according to the hand joints tracked by HL2 using the Mixed Reality Toolkit version 2.8.3.0 (MRTK2) and following the recommendations provided in Cook et al. [27]. In total, we had 6 tracking points, as shown in Figure 1a, for the tracking of a 2D pattern (Task 1) and 14 points, as seen in Figure 1b, for the grasping (Task 2) and lateral pinching (Task 3) of objects. Each point was assigned a specific label, and its position in 3D coordinates was recorded.

3.4. Tasks and Experimental Procedures

Task 1 involved tracing a 2D pattern in 3D space with the right index fingertip by moving the right arm. It was designed to assess HL2’s fingertip tracking capabilities. The pattern was an infinity symbol, measuring 20 cm in height and 50 cm in width, displayed through HL2 at a distance of 60 cm from the participant’s head and centered at the height of the right shoulder. Participants were instructed to trace the shape with their right index finger while moving their arm and keeping their hand open. They were explicitly instructed to keep their heads still and were informed that this was crucial for accurate recordings. They could start at any point and trace in any direction at a normal speed but were required to complete the entire pattern. The task was performed only once.

Task 2 involved grasping various common objects with the right hand while maintaining a static position. Some objects were transparent to facilitate marker tracking, as opaque or colored objects could occlude parts of the hand, obstructing the markers. They included a cylinder (13 cm long × 3 cm in diameter), two spheres (8 cm and 12 cm in diameter), a cube (3 cm × 3 cm × 1 cm) and a plug (3 cm in diameter), allowing us to capture the most common types of grasps: cylindrical, spherical, and pinch [28,29]. Specifically, the cube is for precision grip with opposite fingers on flat surfaces; the small sphere provides a spherical grasp with flexed fingers encircling the object, while the big sphere requires a power grasp; the plug allows us a pinch grasp with thumb and index finger; the cylinder enables us to study cylindrical grasp with fingers wrapped around the object. The procedure for this task involved grasping each object from the table, positioning it approximately 20 cm in front of the participant’s face for 5 s, and then returning it to its original position. This process was repeated five times for each object, always starting from the same initial position. The objects were always presented and grasped in the same order across repetitions, as the primary objective was to record stable hand configurations. A single data file was generated for each object, containing recordings of all five grasping attempts. Figure 2a illustrates how the objects were grasped and positioned in front of the participant’s face.

Task 3 involved laterally pinching rectangular objects with both hands while maintaining a static position. Five objects of varying thickness were used, as shown in Figure 2b, ranging from 2 to 10 cm in increments of 2 cm. The use of both hands in this task was necessary due to the difficulty HL2 faces in accurately detecting the hand’s configuration when markers are present. As a result, markers were placed on only one hand, allowing HL2 to track the hand without markers and VM to track the hand with markers. The markers were placed on the index and thumb fingers, randomly on either the right or left hand across participants, who were instructed and trained to maintain a natural pose and the same symmetrical pinch-posture hand configuration with both hands. The procedure for this task involved pinching each object on the table and keeping a static position for 10 s in front of the participant’s head to allow both the HL2 and VM to correctly capture the hands. The sequence of object sizes was not randomized or counterbalanced, as the fixed order facilitated consistent data capture across repetitions to collect accurate static pinching configurations.

To ensure valid HL2 recording of the hand in Tasks 2 and 3, participants could see a hand mesh through HL2, which overlapped their actual hand during the tasks. A recording was considered valid when participants confirmed that the hand mesh configuration matched their real hand while holding the object and maintaining a static position. Otherwise, the task was repeated.

In all tasks, participants were seated on a chair positioned so that the backrest was 40 cm from the table. The right armrest was aligned with a designated cross mark, indicating the resting position of the right hand, which was used for performing the calibration procedures for both the HL2 and VM. The operational workspace was located directly in front of the participants and was constrained by the field of view of the HL2 device.

Data recording for each task began with the remote initiation of the HL2 recording application. Once the HL2 application started, the VM recording was immediately activated. To facilitate offline synchronization of data from the VM and HL2, participants were instructed to make a quick upward movement with their index finger before beginning each task. This movement was clearly visible and served as a reference point for aligning the two data streams and synchronizing the recorded tracks.

3.5. Data Analysis

3.5.1. Data Preprocessing and Alignment

All data were analyzed using MATLAB r2024b [30] (the procedural pipeline followed throughout all tasks is illustrated in Supplementary Figure S1). Initially, the data were stored in various formats; therefore, we standardized the data formats. Data from HL2 were first filtered using a 4th-order Butterworth low-pass filter with a 6 Hz cut-off frequency to reduce noise while preserving useful signal components, avoiding significant distortions, and ensuring accurate and reliable movement analysis. Data from VM were used as stored by its acquisition software, which had already filtered the data with a 4th-order Butterworth low-pass filter and a 6 Hz cut-off frequency before storing it. As a reference for signal filtering, we followed the guidelines described by Carpinella and colleagues [31], who proposed a standardized protocol for hand kinematic analysis, including recommendations for filtering parameters. Specifically, we applied a low-pass Butterworth filter with the same cut-off frequency as reported by Carpinella, but with a lower filter order to minimize signal distortion.
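As an illustration of the filtering step, the following Python sketch applies a 4th-order low-pass Butterworth filter with a 6 Hz cut-off to a signal sampled at 100 Hz. It is not the MATLAB code used in the study. The use of SciPy, the zero-phase `filtfilt` call (forward-backward filtering, which effectively doubles the filter order), and the synthetic test signal are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS_HZ = 100.0      # acquisition rate reported for HL2 and VM
CUTOFF_HZ = 6.0    # cut-off frequency reported in the text
ORDER = 4          # filter order reported in the text


def lowpass(signal, fs=FS_HZ, cutoff=CUTOFF_HZ, order=ORDER):
    """Low-pass filter a 1D signal (or each column of a 2D array)."""
    b, a = butter(order, cutoff, btype="low", fs=fs)
    # Assumption: zero-phase filtering. The source does not state whether
    # the filter was applied forward only or forward-backward.
    return filtfilt(b, a, np.asarray(signal, dtype=float), axis=0)


# Synthetic example (made-up data): a slow 1 Hz movement plus 30 Hz noise.
t = np.arange(0, 5, 1 / FS_HZ)
clean = np.sin(2 * np.pi * 1.0 * t)
noisy = clean + 0.2 * np.sin(2 * np.pi * 30.0 * t)
filtered = lowpass(noisy)
```

In this toy signal the 1 Hz component lies well inside the pass band and the 30 Hz component well inside the stop band, so `filtered` should closely follow `clean` away from the signal edges.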

After filtering the data acquired with the HL2 and VM, both data sets were aligned using a time-shifting procedure based on local maximum detection on the first 5 s of the tracks, using the initial movement of the index finger as a reference.
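The time-shifting idea can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it takes the single highest sample in the first 5 s of each track as the synchronization event (rather than a general local-maximum detector), and the function names and the example traces are hypothetical.

```python
FS_HZ = 100      # acquisition rate reported for both systems
WINDOW_S = 5     # only the first 5 s are searched, as in the text


def peak_index(height, fs=FS_HZ, window_s=WINDOW_S):
    """Index of the highest sample within the first `window_s` seconds."""
    window = height[: int(fs * window_s)]
    return max(range(len(window)), key=window.__getitem__)


def align_by_peak(hl2_height, vm_height):
    """Trim the leading samples of whichever track peaks later.

    Returns both tracks cut to a common start and a common length,
    plus the lag in samples (positive when HL2 peaks later than VM).
    """
    lag = peak_index(hl2_height) - peak_index(vm_height)
    hl2 = hl2_height[max(lag, 0):]
    vm = vm_height[max(-lag, 0):]
    n = min(len(hl2), len(vm))
    return hl2[:n], vm[:n], lag


# Made-up vertical index-finger traces: the same "flick" seen 30 samples apart.
vm_track = [0.0] * 50 + [1.0, 3.0, 1.0] + [0.0] * 100
hl2_track = [0.0] * 80 + [1.0, 3.0, 1.0] + [0.0] * 100
hl2_aligned, vm_aligned, lag = align_by_peak(hl2_track, vm_track)
```

After trimming, the flick occupies the same sample index in both tracks, so subsequent sample-by-sample comparisons refer to the same instant.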

3.5.2. Task 1 Data Analysis

After the two data streams were aligned in time, the y and z coordinates of the HL2 data were swapped to match the reference frame of the VM data. Because the VM and HL2 coordinate systems differ, the data had to be expressed in a common frame before the two systems could be compared. One option was to calculate the transformation (rotation R and translation T) that maps the HL2 data to the reference frame of VM; however, the reference frame of HL2 is unknown, as it is located inside the device. Instead, an ICP (Iterative Closest Point) algorithm was used to iteratively find the transformation that aligns the 3D data from the two systems, minimizing the distance between them [32]. The ICP algorithm has two steps. In the first step, the algorithm finds, for each point H_i in the HL2 data set, its closest point in the VM data set V by calculating the Euclidean distance between pairs of points. Since both systems had the same acquisition frequency, and therefore the same number of points, each data point H_i in HL2 corresponds to the point V_i in VM with the same index.

d(H_i, V) = \min_{j \in \{1, \dots, N\}} d(H_i, V_j)    (1)

The second step consists of applying a rotation R and translation T to minimize the distance E(R, T) between the corresponding pairs, as follows:

E(R, T) = \frac{1}{N} \sum_{i=1}^{N} \| V_i - R H_i - T \|^2    (2)

where N is the total number of points.

These two steps are repeated until the error is below a threshold of 0.5 mm. Finally, the rotation R and translation T were applied to the HL2 data to align them with the VM data. Once both datasets were aligned, the average distance error between the HL2 and VM data was computed as

Dist_{error} = \frac{1}{N} \sum_{i=1}^{N} \| V_i - H_i \|    (3)

where H_i here denotes the HL2 points after alignment. This metric provided an estimate of the positional discrepancy between the two systems. A lower value indicates a higher degree of accuracy in HL2 tracking, while larger values suggest greater deviations from the VM reference system.
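When the correspondences are fixed by sample index, as described above, the minimization in Eq. (2) has a closed-form least-squares solution based on the singular value decomposition (often called the Kabsch method). The Python sketch below shows that single alignment step followed by the average distance error of Eq. (3), computed as a mean Euclidean distance. It is an illustration under stated assumptions, not the ICP routine used in the study: the function names, the synthetic trajectory, and the known test transformation are all hypothetical.

```python
import numpy as np


def rigid_fit(H, V):
    """Least-squares R, T such that R @ H[i] + T approximates V[i].

    H, V: (N, 3) arrays of index-matched points (HL2 and VM).
    """
    H = np.asarray(H, dtype=float)
    V = np.asarray(V, dtype=float)
    h_mean, v_mean = H.mean(axis=0), V.mean(axis=0)
    # Cross-covariance of the centred point sets.
    C = (H - h_mean).T @ (V - v_mean)
    U, _, Vt = np.linalg.svd(C)
    # Guard against a reflection (determinant of -1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    T = v_mean - R @ h_mean
    return R, T


def mean_distance_error(H_aligned, V):
    """Average Euclidean distance between matched points."""
    diff = np.asarray(V, dtype=float) - np.asarray(H_aligned, dtype=float)
    return float(np.linalg.norm(diff, axis=1).mean())


# Made-up data: a figure-eight path (metres) seen in two different frames.
t = np.linspace(0, 2 * np.pi, 200)
V = np.column_stack([0.25 * np.sin(t),
                     0.60 + 0.02 * np.cos(t),
                     0.10 * np.sin(2 * t)])
angle = 0.3  # rad, rotation about the z-axis
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
T_true = np.array([0.10, -0.20, 0.05])
H = (V - T_true) @ R_true          # so that V[i] = R_true @ H[i] + T_true

R, T = rigid_fit(H, V)
H_aligned = H @ R.T + T
error = mean_distance_error(H_aligned, V)
```

With noise-free synthetic data the recovered `R` and `T` match the known transformation and `error` is numerically zero; with real recordings the residual is the quantity reported as the distance error.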

3.5.3. Task 2 Data Analysis

In this task, the joint angles of the index, middle, and thumb fingers were calculated from the distal, proximal, and metacarpophalangeal joints tracked by the HL2 and VM. For HL2, joint angles were computed from the tracked positions of the finger joints, whereas for VM, they were derived from the tracked marker positions, as shown in Figure 1c.

After the data alignment process and before calculating the angles, the data for each object were segmented into five sub-tracks, each corresponding to one of the five grasping attempts. To achieve this, we calculated the velocity of one of the markers (specifically, the index fingertip). The beginning of each movement was identified as the instant at which the velocity exceeded 3% of the maximum peak velocity, and the end of the movement as the instant at which the velocity returned below this threshold. This method allowed us to segment each grasping movement for further analysis. Since the sub-tracks could have slightly different durations, each was resampled to 200 samples to make the traces comparable across repetitions and subjects. This was achieved using MATLAB’s interp1 function with the cubic spline interpolation method.
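The segmentation and time normalization can be sketched in Python as below. This is a minimal illustration under several assumptions: the function names are hypothetical, a movement is taken to be any contiguous run of samples above the 3% threshold, and linear interpolation is used in place of the cubic spline mentioned in the text to keep the example dependency-free.

```python
def speed(points, fs=100):
    """Speed between consecutive 3D samples (position units per second)."""
    out = []
    for p, q in zip(points, points[1:]):
        out.append(fs * sum((b - a) ** 2 for a, b in zip(p, q)) ** 0.5)
    return out


def segment(speeds, fraction=0.03):
    """(start, end) index pairs of runs above `fraction` of the peak speed."""
    threshold = fraction * max(speeds)
    segments, start = [], None
    for i, s in enumerate(speeds):
        if s > threshold and start is None:
            start = i                      # movement onset
        elif s <= threshold and start is not None:
            segments.append((start, i))    # movement end (exclusive)
            start = None
    if start is not None:                  # track ended while still moving
        segments.append((start, len(speeds)))
    return segments


def resample(values, n=200):
    """Linearly resample a 1D sequence (at least 2 samples) to n samples."""
    last = len(values) - 1
    out = []
    for k in range(n):
        x = k * last / (n - 1)             # fractional source index
        i = min(int(x), last - 1)
        frac = x - i
        out.append(values[i] * (1 - frac) + values[i + 1] * frac)
    return out


# Made-up speed profile containing two movements.
example_speeds = [0, 0, 1, 5, 10, 5, 1, 0, 0, 2, 8, 2, 0]
example_segments = segment(example_speeds)
```

Each `(start, end)` pair can then be used to slice the joint trajectories, and `resample` brings every slice to the same length regardless of how long the movement lasted.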

After normalizing the data in time, the angles of the three joints of the index, middle, and thumb fingers were calculated for each sample within each sub-track. For both HL2 and VM data, joint angles were computed using vector geometry between tracked points [33,34]:

\theta = \cos^{-1}\left( \frac{u \cdot v}{\|u\| \, \|v\|} \right)    (4)

where u and v are the 3D vectors, built from the points recorded by HL2 and VM, that correspond to the segments between specific anatomical landmarks. The choice of reference points followed standard anatomical definitions of finger joint angles to ensure consistency between systems.

Three joint angles per finger and sub-track were calculated by averaging the angle values for each joint and finger within a sub-track. Finally, the angular error for each joint and finger was calculated as the difference between the average joint angle obtained from the HL2 data and the corresponding angle from the VM data.
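A small Python sketch of Eq. (4) and of the angular error is given below. The choice of vectors is an assumption made for illustration: u and v are taken from the joint to its proximal and distal neighbours, so a fully extended finger yields 180°. The source does not specify this convention, and the function names are hypothetical.

```python
import math


def joint_angle_deg(proximal, joint, distal):
    """Angle in degrees at `joint` between the two adjacent segments."""
    u = [p - j for p, j in zip(proximal, joint)]
    v = [d - j for d, j in zip(distal, joint)]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to absorb floating-point round-off before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


def angular_error_deg(hl2_angles, vm_angles):
    """Difference between the mean HL2 angle and the mean VM angle."""
    mean_hl2 = sum(hl2_angles) / len(hl2_angles)
    mean_vm = sum(vm_angles) / len(vm_angles)
    return mean_hl2 - mean_vm


# Made-up landmark positions: a right angle and a straight finger.
right_angle = joint_angle_deg((1, 0, 0), (0, 0, 0), (0, 1, 0))
straight = joint_angle_deg((-1, 0, 0), (0, 0, 0), (1, 0, 0))
```

Applying `joint_angle_deg` to every sample of a sub-track and passing the resulting lists for HL2 and VM to `angular_error_deg` mirrors the averaging and differencing described above.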

3.5.4. Task 3 Data Analysis

The pinch span (distance between thumb and index finger) in Task 3 was calculated by subtracting from the measured distance the size of a marker (only for data recorded with VM) and half the thickness of the thumb and index finger pads, ensuring that the span was measured from the same anatomical location in both systems. The finger pad thickness of each participant was measured using a digital caliper. Note that a correct pinch span measurement should correspond to the thickness of the object being laterally pinched.
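The correction can be written as a short Python sketch. The exact arithmetic is an assumption, since the text does not spell it out: here one full marker diameter (6.5 mm, as reported in Section 3.3) is removed for VM data, and half of each finger pad thickness is removed for both systems. The function name and the numbers in the example are hypothetical.

```python
import math

MARKER_DIAMETER_MM = 6.5   # marker size reported in Section 3.3


def pinch_span_mm(thumb_tip, index_tip, thumb_pad_mm, index_pad_mm,
                  from_vm):
    """Corrected thumb-index distance in millimetres.

    Assumption: one marker diameter is subtracted for VM data only,
    plus half of each finger pad thickness for both systems.
    """
    raw = math.sqrt(sum((a - b) ** 2 for a, b in zip(thumb_tip, index_tip)))
    marker = MARKER_DIAMETER_MM if from_vm else 0.0
    return raw - marker - 0.5 * (thumb_pad_mm + index_pad_mm)


# Made-up example: a 40 mm object, 5 mm finger pads.
vm_span = pinch_span_mm((0, 0, 0), (51.5, 0, 0), 5.0, 5.0, from_vm=True)
hl2_span = pinch_span_mm((0, 0, 0), (45.0, 0, 0), 5.0, 5.0, from_vm=False)
```

Under these assumptions both calls return the object thickness, which is the value a correct pinch span measurement should reproduce.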

3.5.5. Statistical Analysis

First, for all the measures computed, the normality of the data distribution was assessed with the Lilliefors normality test [35]. Then, to evaluate the consistency between the reconstructions obtained from the HL2 and the VM, the Pearson correlation coefficient [36] (for normally distributed data) or the Spearman correlation coefficient [37] (for non-normally distributed data) was calculated for the first and second tasks. These coefficients quantify the degree of linear (Pearson) or monotonic (Spearman) relationship between corresponding data from the two systems. Specifically, correlation was used in Task 1 to assess whether the reconstructed movement traces from HL2 and VM were correlated and in Task 2 to evaluate the correlation between the reconstructed joint angles. Using correlation in both tasks provided a standardized way to compare the outputs and validate the similarity of the reconstructions between the two systems. For Task 3, we conducted a two-way repeated measures ANOVA to test whether there is an interaction between measurement method and size, i.e., whether the difference between HL2 and VM measurements varies with the size of the object. The null hypothesis for the ANOVA was that there is no interaction between the method of measurement (HL2 vs. VM) and object thickness; the alternative hypothesis was that an interaction exists. Finally, we computed paired-sample t-tests to assess the significance of the differences between the two systems: in this case, the null hypothesis was that no difference exists between the HL2 and VM pinch span for a given thickness, with the alternative being that a difference exists. For all analyses, the significance level was set at α = 0.05 (MATLAB R2024b default). To complement the correlation coefficients, we report a Bland–Altman analysis in the Supplementary Materials, including the mean difference (bias) with its 95% confidence interval, as well as the limits of agreement (LoA) with their corresponding 95% confidence intervals, to provide a more comprehensive statistical picture of the agreement between the HL2 and the VM.
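For readers unfamiliar with the Bland–Altman quantities, the sketch below computes the bias and the 95% limits of agreement (bias ± 1.96 standard deviations of the paired differences) in Python. It does not reproduce the confidence intervals reported in the Supplementary Materials, the function name is hypothetical, and the paired measurements are made-up numbers, not study data.

```python
from statistics import mean, stdev


def bland_altman(hl2, vm):
    """Return (bias, lower LoA, upper LoA) for paired measurements."""
    diffs = [h - v for h, v in zip(hl2, vm)]
    bias = mean(diffs)              # mean difference between the systems
    sd = stdev(diffs)               # sample standard deviation of differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up paired pinch spans in millimetres (illustration only).
hl2_values = [21.0, 39.5, 61.0, 79.0, 101.5]
vm_values = [20.0, 40.0, 60.0, 80.0, 100.0]
bias, loa_low, loa_high = bland_altman(hl2_values, vm_values)
```

A bias close to zero with narrow limits of agreement indicates that the two systems can be used interchangeably for that measure, which correlation coefficients alone cannot show.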

4. Results

4.1. Task 1: Finger Tracing of a 2D Pattern in 3D Space

In this task, participants were asked to trace an infinity-shaped path in 3D space with the right index fingertip while moving the arm, as viewed through HL2. This task was designed to evaluate the finger tracking capability of the HL2 compared to that of the VM. By calculating the average distance error between HL2 and VM data tracks across all participants, we assessed the finger tracking capability of the HL2.

Before calculating the distance error, the ICP algorithm reported by Besl and McKay [32] was used to iteratively find the transformation that aligns the 3D data from the two systems, minimizing the distance between them. Figure 3 (left panel) illustrates the trajectories recorded by the VM and HL2 systems before alignment. They exhibit a noticeable discrepancy along the depth axis (y-axis), which arises from the different coordinate system origins of the two tracking systems. Figure 3 (right panel) shows the trajectories recorded by the VM together with the HL2 trajectories after applying the rotation and translation found by the ICP algorithm. The two sets of trajectories become closely aligned, indicating that the discrepancy introduced by the differing reference frames has been effectively corrected.

Table 1 presents the average distance error across participants between HL2 and VM tracking data for each fingertip of the hand. The distance errors for the thumb, middle, ring, and little fingertips are consistent, averaging around 2.8 mm. The distance error for the index fingertip is higher (3.9 mm), but remains below 5 mm.

This correspondence is further supported by Pearson’s correlation coefficient, which was used because the data were normally distributed. The correlation coefficients were: RTH3 0.99 ± 0.002, RIF3 0.99 ± 0.008, RTF3 0.99 ± 0.001, RRF4 0.99 ± 0.001, RPF3 0.99 ± 0.001. All p-values were reported as 0 (i.e., below the displayed numerical precision), indicating statistically significant correlations.

4.2. Task 2: Grasping Common Objects

In this task, participants grasped various common objects, one by one, with the right hand while maintaining a static position (see Figure 4 for the reconstructed grasp across all objects for a representative subject, shown for both HL2 (red) and VM (black)). The accuracy of HL2 in measuring joint angles was evaluated by calculating the angular error between the angles derived from the data collected by HL2 and VM for each object. All objects were grasped using the thumb, index, and middle fingers, except for the cube and plug, which involved only the thumb and index fingers. Consequently, only the angles of the fingers involved in each grasp were analyzed. The angles for the index and middle fingers were measured at the distal (DIP), proximal (PIP), and metacarpophalangeal (MCP) joints, while for the thumb, the angles were measured at the interphalangeal (IP) and metacarpophalangeal (MCP) joints.
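A joint angle of this kind can be computed from three tracked 3D points: the point proximal to the joint, the joint itself, and the point distal to it. The sketch below is an assumption about the geometry, not the authors' exact definition: it returns the angle between the two adjacent segments, so a fully extended finger gives 0°. Because the angle depends only on the relative positions of the three points, it can be computed in each system's own reference frame, without spatial alignment.

```python
import math

def joint_angle_deg(proximal, joint, distal):
    """Angle (degrees) between segment proximal->joint and segment joint->distal."""
    u = [j - p for p, j in zip(proximal, joint)]
    v = [d - j for j, d in zip(joint, distal)]
    cross = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    dot = sum(a * b for a, b in zip(u, v))
    # atan2 of |u x v| and u.v is numerically stable for small and large angles.
    return math.degrees(math.atan2(math.hypot(*cross), dot))

def angular_error_deg(angle_hl2, angle_vm):
    """Absolute difference between the angles measured by the two systems."""
    return abs(angle_hl2 - angle_vm)
```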

The mean angular joint errors calculated for each grasp and finger are presented in Table 2. The errors range from 0.3° to 22°, with the largest error occurring in the thumb IP joint when grasping the small sphere (21.87° ± 0.62°) and the smallest in the thumb MCP joint when grasping the big sphere (0.28° ± 0.56°). On average, across objects and finger joints, the angular error is 5.36° ± 5.31°. The correlation analysis between the HL2 and VM angle distributions showed that, for the small sphere, only the thumb MCP joint angles were not correlated (r = 0.04, p = 0.61), while all other joint angles were correlated (p < 0.01 for thumb IP; p < 0.0001 for all the others). For the big sphere, all joint angles were correlated (p < 0.01 for thumb IP and index DIP; p < 0.005 for all the others) except for the middle PIP (r = −0.12, p = 0.82). The cylindrical grasp showed no correlation for either the thumb IP angle (r = 0.09, p = 0.21) or the index PIP angle (r = −0.05, p = 0.45). Finally, the plug grasp showed no correlation for the thumb IP angle (r = 0.01, p = 0.89) or the index MCP angle (r = 0.10, p = 0.16), whereas the cube grasp showed a significant correlation for all five angles (p < 0.0001). Because not all distributions were normal, the correlation type was selected accordingly: Pearson’s correlation was applied when normality was satisfied and Spearman’s correlation otherwise.
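The rule for choosing between the two correlation types can be sketched as follows. The normality decision is passed in as a flag rather than computed here; in Python one option for that step would be the `lilliefors` function in `statsmodels.stats.diagnostic`. The function names below are illustrative assumptions, and Spearman's coefficient is computed as Pearson's coefficient on average ranks.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def agreement_correlation(hl2_angles, vm_angles, both_normal):
    """Pearson if both samples passed the normality test, Spearman otherwise."""
    if both_normal:
        return "pearson", pearson_r(hl2_angles, vm_angles)
    return "spearman", spearman_rho(hl2_angles, vm_angles)
```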

4.3. Task 3: Lateral Pinching of Objects

In this task, participants gripped objects of varying heights laterally using their index and thumb fingers. The pinch span (distance between the tip of the index finger and the thumb) was calculated from data collected by both HL2 and VM, and the results were compared to evaluate HL2’s accuracy in measuring lateral pinch span.

As shown in Figure 5, the pinch span measured by HL2 is close to that measured by the VM, with the largest difference observed for the object with 6 cm thickness (6.77 ± 0.38 mm for HL2 and 6.36 ± 0.03 mm for VM) and the smallest for the object with 10 cm thickness (10.75 ± 0.34 mm for HL2 and 10.74 ± 0.01 mm for VM). Notably, the agreement between the HL2 and VM pinch spans improves as object thickness increases, suggesting that HL2 performs better with larger objects (see Table 3). For thinner objects (2–4 cm), the error was small and consistently positive, as indicated by narrow confidence intervals that do not include zero. For thicker objects (6–10 cm), the increased variability resulted in wide confidence intervals crossing zero, suggesting no clear systematic error at these thicknesses.

The Lilliefors test showed that all the data were normally distributed. The two-way ANOVA with repeated measures confirmed a significant interaction between the type of measure (HL2 vs. VM) and the size of the grasped object: F(1, 3) = 644.23, p = 0.00013, η_p² = 0.995. No statistically significant difference (h = 0, i.e., the null hypothesis of equal means was not rejected) was observed between the pinch span measurements from HL2 and VM recordings for any object thickness. Specifically, the paired-sample t-test results for each object thickness were as follows: 2 cm (p = 0.32, t(3) = −1.16 ± 0.004, CI [−0.009, 0.004]), 4 cm (p = 0.27, t(3) = −1.37 ± 0.006, CI [−0.013, 0.005]), 6 cm (p = 0.39, t(3) = −1.01 ± 0.008, CI [−0.017, 0.009]), 8 cm (p = 0.06, t(3) = −2.97 ± 0.002, CI [−0.006, 0.0002]), and 10 cm (p = 0.99, t(3) = −0.02 ± 0.004, CI [−0.007, 0.007]).
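The quantities used in this task can be illustrated with a short sketch. It is written under stated assumptions rather than taken from the authors' analysis: the function names are hypothetical, the critical value 3.182 is the two-sided 95% value of Student's t for 3 degrees of freedom (matching the reported t(3)), and the last function uses the standard relation between an F statistic and partial eta squared.

```python
import math
import statistics

T_CRIT_DF3 = 3.182  # two-sided 95% critical value of Student's t, 3 degrees of freedom

def pinch_span(thumb_tip, index_tip):
    """Euclidean distance between the thumb and index fingertip positions."""
    return math.dist(thumb_tip, index_tip)

def paired_t(hl2_spans, vm_spans, t_crit):
    """Paired-sample t statistic, degrees of freedom, and CI of the mean difference."""
    diffs = [a - b for a, b in zip(hl2_spans, vm_spans)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    t_stat = mean_d / se
    return t_stat, n - 1, (mean_d - t_crit * se, mean_d + t_crit * se)

def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)
```

With the reported F(1, 3) = 644.23, `partial_eta_squared(644.23, 1, 3)` evaluates to 644.23 / 647.23 ≈ 0.995, which agrees with the effect size given above.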

5. Discussion

The findings of the present study offer valuable insights into the hand-tracking performance of the HL2, demonstrating its accuracy and reliability in capturing hand kinematics by benchmarking it against the high-precision VM. By assessing both general hand movements and object-grasping tasks, we aimed to capture the strengths and limitations of the HL2’s tracking capabilities in real-world augmented reality (AR) interactions.

Regarding the HL2’s fingertip tracking capabilities during hand movement tracing, the results revealed a strong correspondence between the HL2 and VM traces, as indicated by the Euclidean distance measurements, suggesting that the HL2 hand-tracking system provides relatively precise positional data. As shown in Table 1, the tracking errors for all fingertips remain within a few millimeters, with mean values consistently below 4 mm. Additionally, the high Pearson correlation coefficients (approximately 0.99 for all fingers) confirm a near-perfect correspondence between the two systems. The statistical significance of these correlations (p < 0.001) further reinforces the reliability of HL2’s tracking performance. The slight differences observed in finger tracking errors may result from factors such as inconsistent sensor coverage, occlusion effects, or variations in hand morphology among participants. As with all tracking systems, the similarity between the participant’s hand and the underlying model or mesh used by the system plays a crucial role in determining measurement accuracy. Therefore, individual differences in hand morphology, such as hand size and finger length, are likely to influence the precision of tracking. These anatomical variations may lead to suboptimal mesh fitting and contribute to measurement errors. While these findings highlight the high spatial accuracy of HL2, they do not account for potential limitations related to temporal resolution or tracking latency, which could impact real-time applications. Notably, the spatial accuracy observed in the present study is considerably higher than that reported in previous works, such as [17,25], where tracking errors were approximately 15 and 20 mm, respectively. These differences may be attributed to variations in the nature of the spatial tasks, with the current study focusing on 3D hand movement tracing, in contrast to the 2D surface tracking used in [25]. Additionally, differences in the reference tracking systems may have contributed, as this study used a Vicon motion capture system, while [25] employed a touchscreen-based system and [17] relied on an OptiTrack setup.

The results obtained for angular joint errors during the grasping of various common objects highlight the overall reliability of the HL2 tracking data for calculating hand kinematic parameters. The average angular joint errors generally remained moderate (about 5°), indicating a reasonable level of accuracy for most grasped objects. The correlation coefficients further support this, showing strong correlations for most joint angles, particularly for the small and big spheres. However, discrepancies arise in specific cases, such as the lack of correlation for the thumb MCP angle in the small sphere grasp and for the middle PIP angle in the big sphere grasp. These inconsistencies could be attributed to variations in how participants positioned their fingers and to differences in hand dimensions, which may shift the tracked position of the joints. In addition, for joint angle measurements that showed no correlation (e.g., thumb IP), we attribute this to the presence of reflective markers on participants’ hands, which may have interfered with the HL2’s ability to correctly fit the hand mesh and thereby reduced tracking accuracy.

The cylindrical grasp proved to be the most challenging for the HL2 system, likely due to finger occlusion issues. Indeed, the tracking system struggled to differentiate individual finger movements when they overlapped, leading to a lack of correlation for the thumb IP and index PIP joints. Similarly, the plug grasp exhibited no correlation for the thumb IP and index MCP angles, further demonstrating the HL2’s limitations in handling complex hand postures where multiple fingers are closely positioned. In contrast, the cube grasp yielded strong correlations across all angles, suggesting that the HL2 system performs better in situations where finger positions are more distinct and easily trackable.

The results from measuring the lateral pinch span while holding objects of varying thicknesses show that HL2 performance is comparable to that of the VM, with accuracy in calculating the pinch span improving as the object thickness increases. This suggests that the HL2 performs better with larger objects, likely because such objects provide clearer spatial references for hand tracking and reduce occlusion-related errors. This further supports the notion that while HL2’s hand-tracking system is reliable in many scenarios, its performance is highly dependent on the nature of the grasp task and on the object being manipulated. These findings highlight the need for improved tracking algorithms that can better handle overlapping fingers and account for individual hand size differences to enhance MR interaction fidelity.

The high accuracy of HL2 in tracking hand position and measuring kinematic parameters, such as joint angles and lateral pinch span, carries significant implications. As a portable system, HL2 shows potential as a tool for evaluating hand movement during functional activities [38,39] and analyzing human grasping capabilities under various conditions [40]. From a practical perspective, the errors identified in this study suggest that HL2 can be effectively employed in realistic scenarios. Specifically, concerning object grasping, an angular error of approximately 5° falls within a functional tolerance for coarse grasping tasks, allowing successful execution without significant impact on performance. In such cases, the HL2 offers notable advantages in terms of portability and usability compared to traditional motion capture systems such as VM. However, for tasks that demand higher precision (such as microsurgery), such errors may be unacceptable. In these contexts, the HL2 alone (as well as similar commercial systems) may not provide sufficient tracking accuracy, and the integration of additional correction algorithms or complementary sensors becomes necessary. Our results support this conclusion: in Task 3, we quantitatively observed that grasping accuracy improves as the size of the object increases, confirming that large-object grasping remains functional, while fine grasping presents greater challenges. This highlights that, depending on the application, a gold-standard system like Vicon may be preferable over a more portable but less precise solution. Furthermore, the spatial error observed (approximately 3 mm) indicates that HL2 hand tracking performs comparably to the VM in tasks involving large movements or interactions with sizeable virtual objects but loses accuracy in fine movements or with small objects. 
Consequently, the HL2 proves to be a suitable option for mixed-reality interactions involving gross motor tasks, whereas systems like the VM remain the preferred choice when higher precision in fine motor tracking is required.

To our knowledge, this is the first study that assesses the accuracy of HL2 in tracking hand position and measuring kinematic hand parameters using the VM as a benchmark. This comparison was necessary due to the growing development of applications and research that make use of HL2. Our findings suggest that the measurements obtained from these two motion detection systems are correlated, although the HL2 is the less precise of the two.

6. Conclusions and Future Directions

The results of the present study indicate that the HL2 exhibits millimeter-level errors compared to the VM in both Task 1 and Task 3. In addition, the grasping postures reconstructed by the two systems in Task 2 show a strong correlation and an average angular error of about 5°, suggesting that the HL2’s hand-tracking system demonstrates good accuracy. These findings highlight the potential of HL2 for precise hand tracking in MR applications while also emphasizing the need for further refinements to minimize small positional errors and ensure optimal tracking across different hand configurations and motion patterns.

In conclusion, the HL2 system offers markerless tracking, which, although slightly less precise, is still comparable to motion capture systems. The pace of technological advancements in MR is expected to accelerate in the coming years [41]. Future research will focus on creating more ecologically valid scenarios within experimental procedures, leveraging the possibility to integrate perceptual stimuli into mixed reality environments. Additionally, future work could explore the integration of advanced optimization algorithms and embedded system technologies to further enhance precision and efficiency in applications such as hand tracking and mixed-reality interaction [42,43]. Similarly, real-time processing on embedded platforms may support the development of more reliable and accurate tracking systems, for example, to refine real-time hand pose estimation and gesture recognition [44].

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/virtualworlds4030031/s1, Figure S1: Procedural pipeline outlining the data pre-processing and analysis steps performed for each task; Figure S2: Bland-Altman plots comparing HL2 and VM measurements across each finger in Task 1; Figure S3: Bland-Altman plots comparing HL2 and VM measurements across five dimensions (2 cm, 4 cm, 6 cm, 8 cm, and 10 cm) in Task 3; Table S1: Bland-Altman analysis results for each finger tracked in Task 1, specifically thumb (RTH3), index (RIF3), middle (RTF3), ring (RRF4), and pinky (RPF3); Table S2: Results of the Bland-Altman analysis on angle data collected by two acquisition systems (HL2 and VM) for different grasp types and finger joints from Task 2; Table S3: Bland-Altman analysis results showing the mean difference (bias) with 95% confidence intervals (CI) and limits of agreement (LoA) with their respective 95% CIs for the comparison between HL2 and VM measurements across five dimensions (2 cm, 4 cm, 6 cm, 8 cm, and 10 cm) in Task 3.

Author Contributions

Conceptualization, N.V.G.-H., J.B. and M.G. (Monica Gori); Methodology, N.V.G.-H., J.B. and M.M.; Software, N.V.G.-H. and M.M.; Validation, N.V.G.-H., J.B., M.M. and M.G. (Marta Guarischi); Formal Analysis, N.V.G.-H., J.B. and M.M.; Investigation, N.V.G.-H., J.B. and M.M.; Resources, N.V.G.-H., J.B. and M.M.; Data Curation, N.V.G.-H., J.B., M.M. and M.G. (Marta Guarischi); Writing—Original Draft Preparation, N.V.G.-H., J.B., M.M. and M.G. (Marta Guarischi); Writing—Review and Editing, N.V.G.-H., J.B., M.M., M.G. (Marta Guarischi) and M.G. (Monica Gori); Visualization, N.V.G.-H., J.B., M.M. and M.G. (Marta Guarischi); Supervision, M.G. (Monica Gori); Project Administration, M.G. (Monica Gori); Funding Acquisition, M.G. (Monica Gori). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Territorial Ethics Committee of the Liguria Region on 18 December 2023.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original data presented in the study are openly available in OSF at the following DOI link: https://doi.org/10.17605/OSF.IO/6YZQJ [45].

Acknowledgments

The authors would like to thank all the participants who took part in our experiments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wagner, I.; Broll, W.; Jacucci, G.; Kuutii, K.; McCall, R.; Morrison, A.; Schmalstieg, D.; Terrin, J.-J. On the Role of Presence in Mixed Reality. Presence Teleoper. Virtual Augment. Real. 2009, 18, 249–276.
2. Rauschnabel, P.A.; Felix, R.; Hinsch, C.; Shahab, H.; Alt, F. What is XR? Towards a Framework for Augmented and Virtual Reality. Comput. Hum. Behav. 2022, 133, 107289.
3. Moro, C.; Phelps, C.; Redmond, P.; Stromberga, Z. HoloLens and mobile augmented reality in medical and health science education: A randomised controlled trial. Br. J. Educ. Technol. 2021, 52, 680–694.
4. Chen, C.; Zhang, L.; Luczak, T.; Smith, E.; Burch, R.F. Using Microsoft HoloLens to improve memory recall in anatomy and physiology: A pilot study to examine the efficacy of using augmented reality in education. J. Educ. Technol. Dev. Exch. 2019, 12, 2.
5. Moro, C.; Birt, J.; Stromberga, Z.; Phelps, C.; Clark, J.; Glasziou, P.; Scott, A.M. Virtual and Augmented Reality Enhancements to Medical and Science Student Physiology and Anatomy Test Performance: A Systematic Review and Meta-Analysis. Anat. Sci. Educ. 2021, 14, 368–376.
6. Al Janabi, H.F.; Aydin, A.; Palaneer, S.; Macchione, N.; Al-Jabir, A.; Khan, M.S.; Dasgupta, P.; Ahmed, K. Effectiveness of the HoloLens mixed-reality headset in minimally invasive surgery: A simulation-based feasibility study. Surg. Endosc. 2020, 34, 1143–1149.
7. Hernandez, N.V.G.; Buccelli, S.; Laffranchi, M.; de Michieli, L. Mixed Reality-based Exergames for Upper Limb Robotic Rehabilitation. In Proceedings of the Companion of the 2023 ACM/IEEE International Conference on Human-Robot Interaction, Stockholm, Sweden, 13–16 March 2023; ACM: New York, NY, USA; pp. 447–451.
8. Cheok, A.D.; Haller, M.; Fernando, O.N.N.; Wijesena, J.P. Mixed Reality Entertainment and Art. Int. J. Virtual Real. 2009, 8, 83–90.
9. Park, S.; Bokijonov, S.; Choi, Y. Review of Microsoft HoloLens Applications over the Past Five Years. Appl. Sci. 2021, 11, 7259.
10. Kolla, S.S.V.K.; Sanchez, A.; Plapper, P. Comparing software frameworks of Augmented Reality solutions for manufacturing. Procedia Manuf. 2021, 55, 312–318.
11. Vassallo, R.; Rankin, A.; Chen, E.C.S.; Peters, T.M. Hologram Stability Evaluation for Microsoft HoloLens; Kupinski, M.A., Nishikawa, R.M., Eds.; SPIE: Bellingham, WA, USA, 2017; p. 1013614.
12. Xu, C.; Wang, Y.; Quan, W.; Yang, H. Multi-Person Collaborative Interaction Algorithm and Application Based on HoloLens; Springer: Singapore, 2020; pp. 303–315.
13. Kern, F.; Keser, T.; Niebling, F.; Latoschik, M.E. Using Hand Tracking and Voice Commands to Physically Align Virtual Surfaces in AR for Handwriting and Sketching with HoloLens 2. In Proceedings of the 27th ACM Symposium on Virtual Reality Software and Technology, Osaka, Japan, 8–10 December 2021; ACM: New York, NY, USA, 2021; pp. 1–3.
14. Hübner, P.; Clintworth, K.; Liu, Q.; Weinmann, M.; Wursthorn, S. Evaluation of HoloLens Tracking and Depth Sensing for Indoor Mapping Applications. Sensors 2020, 20, 1021.
15. Garcia-Hernandez, N.; Buccelli, S.; De Angelis, A.; Taglione, E.; Laffranchi, M.; De Michieli, L. Assessment of Gamified Mixed Reality Environments for Upper Limb Robotic Rehabilitation: Pilot Study on Healthy Adults. Virtual Real. 2024, 28, 164.
16. Guinet, A.L.; Bouyer, G.; Otmane, S.; Desailly, E. Reliability of the head tracking measured by Microsoft Hololens during different walking conditions. Comput. Methods Biomech. Biomed. Eng. 2019, 22 (Suppl. S1), S169–S171.
17. Soares, I.; Sousa, R.B.; Petry, M.; Moreira, A.P. Accuracy and Repeatability Tests on HoloLens 2 and HTC Vive. Multimodal Technol. Interact. 2021, 5, 47.
18. Smeragliuolo, A.H.; Hill, N.J.; Disla, L.; Putrino, D. Validation of the Leap Motion Controller using markered motion capture technology. J. Biomech. 2016, 49, 1742–1750.
19. Carnevale, A.; Mannocchi, I.; Sassi, M.S.H.; Carli, M.; De Luca, G.; Longo, U.G.; Denaro, V.; Schena, E. Virtual Reality for Shoulder Rehabilitation: Accuracy Evaluation of Oculus Quest 2. Sensors 2022, 22, 5511.
20. Abdlkarim, D.; Di Luca, M.; Aves, P.; Maaroufi, M.; Yeo, S.-H.; Miall, R.C.; Holland, P.; Galea, J.M. A methodological framework to assess the accuracy of virtual reality hand-tracking systems: A case study with the Meta Quest 2. Behav. Res. Methods 2023, 56, 1052–1063.
21. Jost, T.A.; Nelson, B.; Rylander, J. Quantitative analysis of the Oculus Rift S in controlled movement. Disabil. Rehabil. Assist. Technol. 2021, 16, 632–636.
22. Ikbal, M.S.; Ramadoss, V.; Zoppi, M. Dynamic Pose Tracking Performance Evaluation of HTC Vive Virtual Reality System. IEEE Access 2021, 9, 3798–3815.
23. Borrego, A.; Latorre, J.; Alcañiz, M.; Llorens, R. Comparison of Oculus Rift and HTC Vive: Feasibility for Virtual Reality-Based Exploration, Navigation, Exergaming, and Rehabilitation. Games Health J. 2018, 7, 151–156.
24. Chaconas, N.; Hollerer, T. An Evaluation of Bimanual Gestures on the Microsoft HoloLens. In Proceedings of the 2018 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), Tuebingen/Reutlingen, Germany, 18–22 March 2018; pp. 1–8.
25. Schneider, D.; Biener, V.; Otte, A.; Gesslein, T.; Gagel, P.; Campos, C.; Pucihar, K.Č.; Kljun, M.; Ofek, E.; Pahud, M.; et al. Accuracy Evaluation of Touch Tasks in Commodity Virtual and Augmented Reality Head-Mounted Displays. In Proceedings of the 2021 ACM Symposium on Spatial User Interaction, Virtual, 9–10 November 2021; Association for Computing Machinery: New York, NY, USA, 2021; Art. No. 7; pp. 1–11.
26. Balakrishnan, P.; Guo, H.-J. HoloLens 2 Technical Evaluation as Mixed Reality Guide; Springer: Cham, Switzerland, 2024; pp. 145–165.
27. Cook, J.R.; Baker, N.A.; Cham, R.; Hale, E.; Redfern, M.S. Measurements of Wrist and Finger Postures: A Comparison of Goniometric and Motion Capture Techniques. J. Appl. Biomech. 2007, 23, 70–78.
28. Yang, Y.; Fermuller, C.; Li, Y.; Aloimonos, Y. Grasp type revisited: A modern perspective on a classical feature for vision. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 400–408.
29. Hamon, P.; Chablat, D.; Plestan, F. A New Robotic Hand Based on the Design of Fingers with Spatial Motions. In Volume 8B, Proceedings of the 45th Mechanisms and Robotics Conference (MR), Virtual, 17–19 August 2021; American Society of Mechanical Engineers: New York, NY, USA, 2021.
30. The MathWorks Inc. MATLAB Version: 24.2.0 (R2024b); The MathWorks Inc.: Natick, MA, USA, 2024.
31. Carpinella, I.; Mazzoleni, P.; Rabuffetti, M.; Thorsen, R.; Ferrarin, M. Experimental protocol for the kinematic analysis of the hand: Definition and repeatability. Gait Posture 2006, 23, 445–454.
32. Besl, P.J.; McKay, N.D. A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14, 239–256.
33. Wu, G.; Van Der Helm, F.C.; Veeger, H.E.J.; Makhsous, M.; Van Roy, P.; Anglin, C.; Nagels, J.; Karduna, A.R.; McQuade, K.; Wang, X.; et al. ISB recommendation on definitions of joint coordinate systems of various joints for the reporting of human joint motion—Part II: Shoulder, elbow, wrist and hand. J. Biomech. 2005, 38, 981–992.
34. Cappozzo, A.; Cappello, A.; Croce, U.; Pensalfini, F. Surface-marker cluster design criteria for 3-D bone movement reconstruction. IEEE Trans. Biomed. Eng. 1997, 44, 1165–1174.
35. Lilliefors, H.W. On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown. J. Am. Stat. Assoc. 1967, 62, 399–402.
36. Pearson, K. Note on regression and inheritance in the case of two parents. Proc. R. Soc. Lond. 1895, 58, 240–242.
37. Spearman, C. The Proof and Measurement of Association between Two Things. Am. J. Psychol. 1904, 15, 72–101.
38. Lang, C.E.; DeJong, S.L.; Beebe, J.A. Recovery of Thumb and Finger Extension and Its Relation to Grasp Performance After Stroke. J. Neurophysiol. 2009, 102, 451–459.
39. Gracia-Ibáñez, V.; Agost, M.-J.; Bayarri-Porcar, V.; Granell, P.; Vergara, M.; Sancho-Bru, J.L. Hand kinematics in osteoarthritis patients while performing functional activities. Disabil. Rehabil. 2022, 45, 1124–1130.
40. Shivers, C.L.; Mirka, G.A.; Kaber, D.B. Effect of Grip Span on Lateral Pinch Grip Strength. Hum. Factors J. Hum. Factors Ergon. Soc. 2002, 44, 569–577.
41. Tepper, O.M.; Rudy, H.L.; Lefkowitz, A.; Weimer, K.A.; Marks, S.M.; Stern, C.S.; Garfein, E.S. Mixed Reality with HoloLens: Where Virtual Reality Meets Augmented Reality in the Operating Room. Plast. Reconstr. Surg. 2017, 140, 1066–1070.
42. Mansouri, H.; Elkhanchouli, K.; Elghouate, N.; Bencherqui, A.; Tahiri, M.A.; Karmouni, H.; Sayyouri, M.; Moustabchir, H.; Askar, S.; Abouhawwash, M. A modified black-winged kite optimizer based on chaotic maps for global optimization of real-world applications. Knowledge-Based Syst. 2025, 318, 113558.
43. El Ghouate, N.; Bencherqui, A.; Mansouri, H.; El Maloufy, A.; Tahiri, M.A.; Karmouni, H.; Sayyouri, M.; Askar, S.S.; Abouhawwash, M. Improving the Kepler optimization algorithm with chaotic maps: Comprehensive performance evaluation and engineering applications. Artif. Intell. Rev. 2024, 57, 313.
44. Mchichou, I.; Ouchker, R.; Tahiri, M.A.; Amakdouf, H.; Jamil, M.O.; Qjidaa, H. Enhanced Hybrid Approach for Real-Time Emotion Recognition Using Canny Edge Detection and Deep Learning on Embedded Systems. In Proceedings of the 2024 3rd International Conference on Embedded Systems and Artificial Intelligence (ESAI), Fez, Morocco, 19–20 December 2024; pp. 1–8.
45. Bertolasi, J. Evaluation of HoloLens 2 for Hand Tracking and Kinematic Features Assessment. OSF. https://doi.org/10.17605/OSF.IO/6YZQJ

Figure 1. Tracking points and landmarks for marker placement in (a) Task 1 (2D pattern tracing with a finger in 3D space), (b) Task 2 (grasping common objects), and Task 3 (lateral pinching of an object with varying sizes). Each panel includes both a schematic representation of marker locations and a corresponding real-world photo of the actual placement on the hand. For Task 1, the RP marker was adopted only to guarantee a good reconstruction for the VM, but was not considered in the analysis. For Task 2, all the markers were applied on the hand, while for Task 3, only the thumb and index finger were tracked, so we did not position markers on the middle finger. (c) Joint angles calculated from the distal, proximal, and metacarpophalangeal joints.

Figure 2. (a) Hand grasping of different common objects during Task 2. The hand is marked with reflective markers. Objects include a cube, a plug, a cylinder, and two different spheres, involving different grasping techniques. The colors of the objects are arbitrary and have no experimental significance. The cylinder is transparent to allow visibility for HL2 of anatomical markers during grasping. (b) Top-down view of laterally pinching a rectangular object of varying thickness with both hands during Task 3, though only one hand is tracked. The object’s thickness increases by 2 cm each time, with a new segment stacked. The objects are also shown in different colors, which have no experimental meaning.

Figure 3. One-participant comparison between the trajectories recorded by the VM (black) and HL2 (red) tracking systems for the RIF3 (index tip) marker. Both plots include labeled spatial axes (X, Y, Z in cm) to clarify orientation and position. The left graph displays the unaligned trajectories, showing an apparent spatial misalignment between the two systems. The right graph shows the same trajectories after applying the transformation (R and T) obtained with the ICP algorithm, demonstrating an improved correspondence between the VM and HL2 traces. The enhanced alignment in the fitted trajectories supports the high correlation values reported in the results.

Figure 4. Three-dimensional reconstruction for a representative subject of marker positions and connecting segments representing the three fingers involved in grasping different objects: (a) cylinder, (b) big sphere, (c) small sphere, (d) cube, and (e) plug. Each panel shows both the reconstruction obtained from HL2 (in red) and the corresponding reconstruction from the VM (in black). The two reconstructions are not spatially aligned, as HL2 and VM operate in different reference frames. As a result, the point clouds may appear rotated or translated with respect to one another. However, since the analysis focused solely on angular measurements, spatial alignment was not necessary.

Figure 5. Comparison of pinch span across object thicknesses for HL2 (dark blue) and VM (orange) systems. Bars represent mean ± standard deviation.

Table 1. Distance error between HL2 and VM tracking data. The values represent the Euclidean distances between the trajectories recorded by the VM and HL2 systems (mean ± standard deviation), with relative variance and 95% confidence interval. A lower value indicates a higher degree of accuracy in HL2 tracking, whereas larger values suggest greater deviations from the VM reference system. All fingers showed low variance and narrow confidence intervals, indicating stable performance across participants.

Finger	Mean ± SD (mm)	Variance (mm²)	95% CI (mm)
RTH3 (Thumb)	2.77 ± 0.63	0.39	[2.32, 3.22]
RIF3 (Index)	3.94 ± 0.69	0.48	[3.45, 4.44]
RTF3 (Middle)	2.86 ± 0.61	0.37	[2.43, 3.30]
RRF4 (Ring)	2.87 ± 0.59	0.35	[2.44, 3.29]
RPF3 (Little)	2.88 ± 0.58	0.34	[2.46, 3.29]

Table 2. Mean angular joint errors calculated between HL2 and VM for each finger and object (mean ± standard deviation in degrees).

	Thumb		Index			Middle
	IP	MCP	DIP	PIP	MCP	DIP	PIP	MCP
Small Sphere	21.87° ± 0.62	7.39° ± 0.35	0.92° ± 0.39	0.65° ± 0.42	3.15° ± 0.36	9.91° ± 1.45	4.47° ± 0.51	1.46° ± 0.53
Big Sphere	18.11° ± 0.58	0.28° ± 0.56	2.64° ± 0.52	0.79° ± 0.54	1.37° ± 0.29	4.43° ± 0.48	1.38° ± 0.50	2.39° ± 0.48
Cylinder	7.18° ± 0.31	8.03° ± 0.22	2.72° ± 0.31	1.14° ± 0.26	0.53° ± 0.18	5.35° ± 0.28	0.68° ± 0.29	3.91° ± 0.35
Plug	14.46° ± 0.53	2.96° ± 0.47	7.72° ± 0.57	2.44° ± 0.44	5.88° ± 0.29	-	-	-
Cube	12.39° ± 0.29	2.47° ± 0.31	14.61° ± 0.41	1.98° ± 0.19	6.44° ± 0.31	-	-	-

Table 3. Mean pinch span error (±standard deviation), variance, and 95% confidence interval between HL2 and VM tracking data across different object thicknesses.

Object Thickness	Mean ± SD (mm)	Variance (mm²)	95% CI (mm)
2 cm	0.23 ± 0.09	0.01	[0.0931, 0.3723]
4 cm	0.40 ± 0.19	0.04	[0.1015, 0.6933]
6 cm	0.40 ± 0.35	0.12	[−0.1493, 0.9533]
8 cm	0.28 ± 0.30	0.09	[−0.2017, 0.7523]
10 cm	0.004 ± 0.33	0.11	[−0.5200, 0.5275]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

