Wearable Augmented Reality Platform for Aiding Complex 3D Trajectory Tracing

Augmented reality (AR) head-mounted displays (HMDs) are emerging as the most efficient output medium to support manual tasks performed under direct vision. Despite that, technological and human-factor limitations still hinder their routine use for aiding high-precision manual tasks in the peripersonal space. To overcome such limitations, in this work we show the results of a user study aimed at validating, qualitatively and quantitatively, a recently developed AR platform specifically conceived for guiding complex 3D trajectory tracing tasks. The AR platform comprises a new-concept AR video see-through (VST) HMD and a dedicated software framework for the effective deployment of the AR application. In the experiments, the subjects were asked to perform 3D trajectory tracing tasks on 3D-printed replicas of planar structures or more elaborate bony anatomies. The accuracy of the trajectories traced by the subjects was evaluated by using templates designed ad hoc to match the surfaces of the phantoms. The quantitative results suggest that the AR platform could be used to guide high-precision tasks: on average, more than 94% of the traced trajectories stayed within an error margin lower than 1 mm. The results confirm that the proposed AR platform will boost the profitable adoption of AR HMDs to guide high-precision manual tasks in the peripersonal space.


Introduction
Visual Augmented Reality (AR) technology supplements the user's perception of the surrounding environment by overlaying contextually relevant computer-generated elements on it so that the real world and the digital elements appear to coexist [1,2]. Particularly in visual AR, the locational coherence between the real and the virtual elements is paramount to supplementing the user's perception of and interaction with the surrounding space [3]. Published research provides glimpses of how AR could dramatically change the way we learn and work, allowing the development of new training paradigms and efficient means to assist/guide manual tasks.
AR has proven to be a key asset and an enabling technology within the fourth industrial revolution (i.e., Industry 4.0) [4]. A large number of successful demonstrations have been reported in maintenance

Materials and Methods
This section provides a detailed description of the hardware and software components. All components are depicted in Figure 1.

Figure 1. Overview of the hardware and software components of the wearable Augmented Reality (AR) platform for aiding high-precision manual tasks in the peripersonal space. The AR framework runs on a single workstation (i.e., a laptop) and can implement both the optical see-through (OST) and the video see-through (VST) mechanisms. (Labels in the figure: AR application with tracking and rendering modules; augmented scene; acquisition of the stereo camera frames, 2560×720 @ 60 fps; 3D model reference system; markers for inside-out tracking.)

Custom-Made Head-Mounted Display
The custom-made hybrid video-optical see-through HMD fulfills strict technological and human-factor requirements towards the realization of a functional and reliable aiding tool for high-precision manual tasks. The HMD was assembled by re-working and re-engineering a commercial OST visor (ARS.30 by Trivisio [25]).
As described in [23], the key features of the HMD were established with the aim of mitigating relevant perceptual conflicts typical of commercial AR headsets for close-up activities.
Notably, the collimation optics of the display were re-engineered to offer a focal length of about 45 cm, which constitutes, for close-up work, a defining and original feature that mitigates the vergence-accommodation conflict and focus rivalry. The HMD was incorporated in a 3D-printed plastic frame together with a pair of liquid crystal shutters and a pair of front-facing USB 3.0 RGB cameras [26]. The stereo camera pair is composed of two LI-OV4689 cameras by Leopard Imaging, both equipped with a 1/3" OmniVision CMOS 4 Mpixel sensor (pixel size of 2 µm). The cameras were mounted with an anthropometric interaxial distance (∼6.3 cm) and with a fixed convergence angle. In this way, we could ensure sufficient stereo overlap at about 40 cm (i.e., an average working distance for manual tasks). Both cameras are equipped with an M12 lens mount whose focal length (f = 6 mm) was chosen to compensate for the zoom factor due to the eye-to-camera parallax along the display optical axis (at ≈40 cm).
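For illustration, the fixed convergence (toe-in) angle implied by the stated interaxial distance and working distance can be computed from simple geometry. This is a sketch, not the authors' calibration procedure; the function name and structure are ours:

```python
import math

# Illustrative sketch (not the platform's code): the per-camera toe-in angle
# needed so that the two optical axes of a symmetric stereo rig converge at
# a given working distance.
def convergence_angle_deg(interaxial_m: float, distance_m: float) -> float:
    """Per-camera toe-in angle (degrees) for symmetric convergence."""
    half_baseline = interaxial_m / 2.0
    return math.degrees(math.atan2(half_baseline, distance_m))

# Values from the text: ~6.3 cm interaxial distance, ~40 cm working distance.
angle = convergence_angle_deg(0.063, 0.40)  # roughly 4.5 degrees per camera
```

With the paper's values, each camera is toed in by about 4.5 degrees, which is consistent with a rig designed for close-up manual work.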
The computing unit is a laptop PC with the following specifications: Intel Core i7-8750H CPU @ 2.20 GHz (6 cores, 12 threads) and 16 GB RAM (Intel Corp., Santa Clara, CA, USA). The graphics processing unit (GPU) is an Nvidia GeForce RTX 2060 (6 GB) with 1920 CUDA cores (Nvidia Corp., Santa Clara, CA, USA).

AR Software Framework
The software framework is conceived for the deployment of VST and OST AR applications able to support in situ visualization of medical imaging data and specifically suited for stereoscopic AR headsets. Under the VST modality, the key function of the software is to process and augment the images grabbed by the stereo pair of RGB cameras before they are rendered to the two microdisplays of the visor. The grabbed frames of the real scene are processed to perform marker-based optical tracking, which requires the identification of the 3D positions of the markers both in the target reference frame and in the camera reference frame.
The augmented scene is generated by merging the real camera frames with the virtual content (e.g., in our application, the planned trajectories) ensuring the proper locational realism. To accomplish this task, the projection parameters of the virtual viewpoints are set equal to those of the real cameras and the pose of the tracked object defines the pose of the virtual content in the scene [27].
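The pose-recovery step described above can be sketched as follows. This is an illustrative reconstruction, not the platform's actual implementation: given the three marker positions in the target (CAD) reference frame and their stereo-triangulated positions in the camera frame, the rigid transform aligning the two point sets can be obtained with the standard Kabsch/absolute-orientation method. The function and the example values are assumptions for illustration:

```python
import numpy as np

# Illustrative sketch: recover the rigid transform (rotation R, translation t)
# mapping marker positions from the target frame to the camera frame, which
# then defines the pose of the virtual content in the scene.
def rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Least-squares rigid transform mapping src (Nx3) onto dst (Nx3) (Kabsch)."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t

# Three non-collinear marker positions in the target frame (arbitrary units).
markers_target = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0]])
# The same markers as seen by a simulated stereo tracker (known ground truth).
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.05, 0.02, 0.40])
markers_camera = markers_target @ R_true.T + t_true
R, t = rigid_transform(markers_target, markers_camera)
```

Once `R` and `t` are known, setting the virtual camera's projection parameters equal to those of the real cameras (as the text describes) makes the virtual trajectory overlay locationally coherent with the tracked phantom.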
The main features of the software framework can be summarized as follows [23]:
• The software is capable of supporting the deployment of AR applications on different commercial and custom-made headsets.
• The CUDA-based architecture of the software framework makes it computationally efficient.
• The software provides in situ visualization of task-oriented digital content.
• The software framework is highly configurable in terms of rendering and tracking capabilities.
• The software can deliver both optical and video see-through-based augmentations.
• The software features a robust optical self-tracking mechanism (i.e., inside-out tracking) that relies on the stereo localization of a set of spherical markers.
• The AR application achieves an average frame rate of ≈30 fps.

AR Task: Design of Virtual and Real Content
Three trajectories with different degrees of complexity were implemented to test the system accuracy:

1.
A curve describing a trajectory on a planar surface (T1).

2.
A 3D curve (130 mm in length) describing a closed trajectory on a convex surface (T2).

3.
A 3D curve (223 mm in length) describing a closed trajectory consisting of a series of four curves on concave and convex surfaces (T3).
T1 was designed to test the system on a simple planar phantom simulating, for instance, an industrial manufacturing process that requires cutting flat parts to specific shapes; T2 and T3 were drawn on two anatomical surfaces (i.e., a portion of a skull and of an acetabulum), and they simulate complex surgical incision tasks.
Creo Parametric software was used to design the three trajectories (Figure 2). T1 was drawn on the top side of a rectangular plate (size 10 × 5 mm); T2 and T3 were modeled with spline curves by selecting 3D points on a portion of the selected anatomical surface (dark grey portions in Figure 2). The 3D models of the skull and of the acetabulum were generated from real computed tomography datasets, segmented with a semi-automatic segmentation pipeline [28] to extract the cranial and acetabular bones. A 3D printer (Dimension Elite) was used to turn the phantom virtual models into tangible replicas made of acrylonitrile butadiene styrene (ABS).
The three trajectories were represented with dashed curves (0.5 mm thickness) and saved as .wrl models to be imported by the software framework and displayed as the virtual content of the AR scene.
As previously mentioned, the accurate AR overlay of the virtual trajectory onto the physical 3D-printed models is achieved by means of a tracking modality that relies on the real-time localization of three reference markers; for this reason, three spherical markers (11 mm in diameter) were embedded in the CAD models of the phantoms, as shown in Figure 2. The markers were dyed fluorescent green to boost the response of the camera sensor and improve the robustness of the blob detection under uncontrolled lighting conditions [23,29].
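A minimal sketch of such a color-based blob-detection step is given below. This is illustrative only: the platform's actual pipeline is CUDA-based and calibrated for its own cameras, and the threshold values and synthetic image here are assumptions:

```python
import numpy as np
from scipy import ndimage

# Illustrative sketch: segment bright-green marker blobs by color thresholding
# and return their image centroids, as a stand-in for the stereo blob detector.
def detect_green_blobs(rgb: np.ndarray, min_area: int = 20):
    """Return (row, col) centroids of bright-green blobs in an RGB image."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    mask = (g > 180) & (g - r > 60) & (g - b > 60)  # simple green-dominance test
    labels, n = ndimage.label(mask)                 # connected components
    return [ndimage.center_of_mass(mask, labels, i + 1)
            for i in range(n)
            if (labels == i + 1).sum() >= min_area]

# Synthetic frame with three green discs on a dark background.
img = np.zeros((120, 160, 3), dtype=np.uint8)
yy, xx = np.mgrid[0:120, 0:160]
for cy, cx in [(30, 40), (60, 100), (90, 50)]:
    disc = (yy - cy) ** 2 + (xx - cx) ** 2 <= 36
    img[disc] = (40, 220, 40)
blobs = detect_green_blobs(img)
```

In the real system, the centroids detected in the left and right camera frames would then be triangulated to obtain the 3D marker positions used by the inside-out tracker.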

Subjects
Ten subjects, 3 males and 7 females, with normal or corrected-to-normal visual acuity (with the aid of contact lenses), were recruited from technical employees and university students. Table 1 reports the demographics of the participants included in this study, who were aged between 25 and 42.
Participants were asked to rate their experience with AR technologies, HMDs, and VST-HMDs so that these ratings could be correlated with their performance and with their subjective evaluation of the AR platform.

Protocol of the Study
The experimental setting is shown in Figure 3. During the performance of the task, each subject was seated in a height-adjustable chair at a comfortable distance from the three phantoms and was free to move. The subject was asked to perform the "trajectory tracing" task three times for each trajectory and to report any perceptible spatial jitter or drift of the AR content. The trajectories were administered in random order.
The accuracy of the trajectories traced by the subjects was evaluated by using templates designed ad hoc to match the surface of the phantoms (Figure 4). The templates were provided with inspection windows shaped as the ideal trajectories (dotted blue line in Figure 4), and with engagement holes to ensure a unique and stable placement of the template over the corresponding phantoms.
Three templates were designed for each phantom, with inspection windows of different widths, to evaluate three different levels of accuracy: given that the virtual trajectory, as well as the pencil line, has a 0.5 mm thickness, inspection windows measuring 1.5 mm, 2.5 mm, and 4.5 mm in width were designed to test accuracy levels of 0.5 mm, 1 mm, and 2 mm, respectively. We considered as successful only those trials in which the accuracy was ≤2 mm. Indeed, 1–2 mm accuracy is regarded as an acceptable range in many complex manual tasks, such as in the context of image-guided surgery [30]. When the traced trajectory fell outside the widest inspection window (i.e., the 4.5 mm window), the test was considered failed.
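The mapping between the tolerated error and the window width follows directly from the template geometry: a window must accommodate the 0.5 mm line plus the tolerated deviation on each side. A one-line sanity check (illustrative Python, not part of the platform):

```python
# Sanity check of the template geometry: an inspection window testing an
# accuracy level of `tol` mm on each side of a line of the given thickness
# must be line_thickness + 2 * tol wide.
def window_width_mm(tolerance_mm: float, line_thickness_mm: float = 0.5) -> float:
    return line_thickness_mm + 2.0 * tolerance_mm

# The three accuracy levels from the study give the three window widths used.
widths = [window_width_mm(t) for t in (0.5, 1.0, 2.0)]  # 1.5, 2.5, 4.5 mm
```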
In the experiments, strips of graph paper were used to estimate the cumulative length of the traced trajectory within the inspection windows (Figure 5). In this way, we could estimate the percentage of the traced trajectory staying within the specific accuracy level dictated by the template (Figure 5).
A 0.5 mm pencil was used to draw the perceived trajectory on a masking tape applied over the phantom surface; the tape was removed and replaced at the end of each trial after the evaluation of the user performance.
Subjects were instructed that the primary goal of the test was to accurately trace the trajectories as indicated by the AR guidance; time to completion in tracing the trajectory was recorded using a stopwatch. At the end of the experimental session, subjects were administered a 5-point Likert questionnaire to qualitatively evaluate the AR experience (Table 2).

Statistical Analysis
The SPSS Statistics Base 19 software was used to perform the statistical analysis of the data. Results of the Likert questionnaire were summarized in terms of median, with dispersion measured by the interquartile range (i.e., IQR, 25th–75th percentile), while quantitative results were reported in terms of mean and standard deviation of the accuracy in "trajectory tracing" and of the normalized completion time (i.e., the average velocity to complete the task).
The Kruskal-Wallis test was performed to compare qualitative and quantitative data among groups with different levels of "Experience with AR"/"Experience with HMDs"/"Experience with VST-HMDs". A p-value < 0.05 was considered statistically significant.
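As an illustrative sketch of this analysis, the Kruskal–Wallis H-test can be run with SciPy. The accuracy values below are made-up placeholders, not the study's measurements:

```python
from scipy import stats

# Illustrative only: Kruskal-Wallis test comparing a performance metric
# (e.g., percentage of the traced line within the 1 mm accuracy level)
# across experience groups. The values are fabricated for the example.
novice = [93.1, 94.5, 92.8]
intermediate = [94.0, 95.2, 93.9]
expert = [94.8, 95.5, 94.2]

h_stat, p_value = stats.kruskal(novice, intermediate, expert)
significant = p_value < 0.05  # the study's significance threshold
```

The Kruskal–Wallis test is a sensible choice here because it is non-parametric and therefore does not assume normally distributed data, which is hard to verify with only ten participants split across small groups.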

Qualitative Evaluation
Results of the Likert questionnaire are reported in Table 2. Overall, the participants agreed/strongly agreed with all the statements addressing the ergonomics of the platform, the trustability of the proposed AR modality to successfully guide manual tasks, and their confidence in accurately performing the tasks guided by the AR platform. For all questionnaire items, except for item 2 ("I perceived VR trajectory as clear and sharp") and item 7 ("The latency of the camera mediated view does not compromise the task execution"), there was no statistically significant difference (p > 0.05) in answering tendencies among subjects with different levels of experience with AR, HMDs, and VST-HMDs (see Table 3 for p-values). For items 2 and 7, the agreement level varied with the expertise of the subject: for item 2, the less experienced participants (namely, those with no or limited experience with AR and HMDs) and, for item 7, the participants with limited experience with HMDs and VST-HMDs expressed their agreement, whereas the remaining, more experienced subjects strongly agreed with both items.

Quantitative Evaluation
Figure 6 shows an example of a traced trajectory for the T3 task. The zoomed detail of the image shows the traced trajectory within the inspection window of the 0.5 mm accuracy-level template. Table 4 summarizes mean and standard deviation values of the accuracy results: for each trajectory (T1, T2, and T3), the subject performance is reported as the percentage of the length of the traced line staying within the 0.5 mm and 1 mm accuracy levels. The table also reports the success ratio in completing the tasks without committing errors greater than 2 mm: all the subjects successfully completed all the T1 tasks (30/30 success ratio), 9 out of 10 subjects successfully completed all the T2 tasks (29/30 success ratio), and 7 out of 10 subjects successfully completed all the T3 tasks (24/30 success ratio).
Overall, all subjects were able to successfully trace the trajectories in at least one of the three trials. Notably, in unsuccessful trials, more than 85% of the traced line was within the 1 mm accuracy level (mean 92 ± 6%). Table 5 reports performance results in terms of normalized completion time (i.e., the average velocity to complete each task) and shows that, on average, subjects were slower in completing the T3 trajectory. Mean and standard deviation of the duration of the experiments are reported in the last two columns. Finally, as shown in Table 3, the Kruskal-Wallis test revealed no significant differences (p > 0.05) in accuracy performance and normalized completion time between participants with different levels of experience with AR, HMDs, and VST-HMDs.

Discussion and Conclusions
Recent literature shows that HMDs are emerging as the most efficient output medium to support complex manual tasks performed under direct vision. Despite that, technological and human-factor limitations still hinder their routine use for aiding high-precision tasks in the peripersonal space. In this work, we show the results of a user study aimed to validate a new wearable VST AR platform for guiding complex 3D trajectory tracing tasks.
The quantitative results suggest that the AR platform could be used to guide high-precision tasks: on average, more than 94% of the traced trajectories stayed within an error margin lower than 1 mm, and more than 82% stayed within an error margin lower than 0.5 mm. Only in 5% of the trials did the users fail in tracing the line, committing an error greater than 2 mm. We can argue that such failures may be due to different reasons, not all of them attributable to the AR platform per se; some relate to the user's ability. As for the possible sources of error not strictly associated with the AR platform, we noticed that most inaccuracies happened around discontinuities of the phantom surfaces. This may be related not only to a sub-optimal perception of relative depths when viewing through the VST HMD, but also to a more practical difficulty for the user in ensuring a firm stroke while following the trajectory over such discontinuities. This is also confirmed by the generally lower velocities observed in completing the T3 trajectory, which is the one traced on a non-uniform surface.
In this study, the main criterion adopted to select the participants was to include subjects with different levels of experience with AR, HMDs, and VST-HMDs, as the 3D trajectory tracing task was general purpose. To apply the proposed AR platform to a more specific industrial or medical application, usability tests with the final users should be performed after having defined, for each specific trajectory tracing task, the most appropriate tracking and AR registration strategy for the target 3D surface. In the field of image-guided surgery, we are currently designing the most appropriate tracking/registration strategy to perform AR-guided orthognathic surgery. In order to properly register the planned 3D osteotomy to the actual patient in the surgical room, we have adopted an innovative patient-specific occlusal splint that embeds the three spherical markers for the inside-out tracking mechanism. For this specific application, we are planning to perform an in vitro study recruiting several maxillofacial surgeons with different levels of expertise in orthognathic surgery to test, on patient-specific replicas of the skull, an AR-guided osteotomy of the maxillary bone.
As regards the display (i.e., photon-to-photon) latency caused by the VST mechanism, we have a direct measure of the frame rate of the tracking-rendering mechanism (i.e., ≈30 fps, yielding a latency of ≈33 ms). For a thorough evaluation of the perceived latency, we must also consider the tracking camera frame rate (in our system, the camera frame rate is 60 Hz, which produces a latency of 17 ms). Finally, we must also consider the latency caused by the OLED display, which contributes another 17 ms (our HMD runs at 60 Hz). These considerations lead to an overall estimate of the photon-to-photon latency of at least 33 + 17 + 17 = 67 ms. Such latency is undoubtedly perceivable by the human visual system. In this study, only a qualitative assessment of the latency and of the spatial jitter/drift due to inaccuracies in the inside-out tracking and to the VST mechanism was performed. However, considering the results obtained in this and previous studies [23,31,32], we can reasonably argue that the proposed wearable VST approach is adequate to ensure a stable VST AR guidance for manual tasks that demand high accuracy and for which the subject can compensate for the display latency by working more slowly.
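The latency budget above can be reproduced with simple arithmetic, treating each pipeline stage as contributing one frame period at its own rate (a sketch using only the rates stated in the text):

```python
# Back-of-the-envelope reproduction of the photon-to-photon latency budget:
# each stage contributes (at least) one frame period at its own update rate.
def frame_period_ms(rate_hz: float) -> float:
    return 1000.0 / rate_hz

stages = {
    "tracking/rendering": frame_period_ms(30),  # ~33 ms at ~30 fps
    "camera acquisition": frame_period_ms(60),  # ~17 ms at 60 Hz
    "OLED display": frame_period_ms(60),        # ~17 ms at 60 Hz
}
total_ms = sum(stages.values())  # ~67 ms, matching the estimate in the text
```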
Even if these results should be confirmed on a larger sample of subjects drawn from the end users of each specific application, we believe that the proposed wearable AR platform will pave the way for the profitable use of AR HMDs to guide high-precision manual tasks in the peripersonal space.

Acknowledgments:
The authors would like to thank S. Mascioli and C. Freschi for their support on the preliminary selection of the most appropriate software libraries and tools and R. D'Amato for his support in designing and assembling the AR visor.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:
HMD	Head-mounted display
AR	Augmented reality
VST	Video see-through
OST	Optical see-through