Article

Mixed-Reality (MR) Enhanced Human–Robot Collaboration: Communicating Robot Intentions to Humans

Department of Automotive Engineering, International Center for Automotive Research at Clemson University, Greenville, SC 29607, USA
*
Author to whom correspondence should be addressed.
Robotics 2025, 14(10), 133; https://doi.org/10.3390/robotics14100133
Submission received: 30 July 2025 / Revised: 18 September 2025 / Accepted: 20 September 2025 / Published: 24 September 2025
(This article belongs to the Section Humanoid and Human Robotics)

Abstract

Advancements in collaborative robotics have significantly enhanced the potential for human–robot collaboration in manufacturing. To achieve efficient and user-friendly collaboration, prior research has predominantly focused on the robot’s perspective, including aspects such as planning, control, and adaptation. A key approach in this domain has been the recognition of human intentions to inform robot actions. However, true collaboration necessitates bidirectional communication, where both human and robot are aware of each other’s intentions. A lack of transparency in robot actions can lead to discomfort, reduced safety, and inefficiencies in the collaborative process. This study investigates the communication of robot intentions to human operators through mixed reality (MR) and evaluates its impact on human–robot collaboration. A laboratory-based physical human–robot assembly framework is developed, integrating multiple MR-based intention communication strategies. Experimental evaluations are conducted to assess the effectiveness of these strategies. The results demonstrate that conveying robot intentions via MR enhances work efficiency, trust, and user comfort in human–robot collaborative manufacturing. Furthermore, a comparative analysis of different MR-based communication designs provides insights into the optimal approaches for improving collaboration quality.

1. Introduction

Human–robot collaboration (HRC) is defined as “the state of a purposely designed robotic system and operator working in a collaborative workspace” [1] and has garnered significant attention in recent years. Traditional industrial robots, which remain prevalent in modern production lines, are typically enclosed within protective fencing and equipped with extensive safety peripherals; while these measures ensure worker safety, they are also costly, inflexible, and cumbersome, leading to inefficiencies in manufacturing processes [2]. The growing demand for more intelligent and adaptable solutions has driven the development and adoption of human–robot collaboration and collaborative robots (COBOTs), offering a transformative approach to industrial automation.
HRC is an interdisciplinary field that aims to integrate human and robotic capabilities to achieve shared objectives [3]. COBOTs present promising solutions for complex hybrid assembly tasks, particularly within the context of smart manufacturing [4]. By leveraging human–robot interactions, tasks can be allocated based on the respective strengths of humans and robots, optimizing efficiency and flexibility [5,6]. Despite the growing research interest in COBOTs and human–robot interaction, their industrial adoption remains limited compared to conventional robotic systems. A key barrier to widespread implementation is user acceptance, which is strongly influenced by the perceived comfort of human workers [7]. The level of human comfort not only impacts the acceptance of COBOTs but also plays a crucial role in overall manufacturing efficiency [8].
Extensive research has been conducted in the field of human–robot collaboration (HRC), with a significant focus on predicting human intentions. However, the predominant approach has been to utilize these predictions solely to guide robot actions, rather than facilitating mutual adaptation between humans and robots. The existing research in this area can be categorized into three primary groups of focus: gesture-based recognition, human body action analysis, and multimodal intention prediction. Firstly, gesture-based interaction is a key focus in human–robot interaction (HRI) research. Chan et al. [9] combined augmented reality with eye gaze input to create a joint-action framework where humans convey intent by natural gaze and gesture cues. This approach improved fluency in collaboration, demonstrating how combining AR with gaze tracking enables robots to better anticipate human actions. The ARROCH framework [10] introduced an AR interface for bidirectional communication where humans used gestures and visual cues to signal their intended actions. By linking gesture recognition with AR overlays, the system improved transparency and coordination between humans and robots. Kemp et al. [11] further advanced this approach by training robots to recognize and respond to human gestures using Support Vector Regression (SVR). Their experiments demonstrated robots’ ability to take and give objects to humans based on learned gesture patterns. Secondly, in human body action analysis, the focus has been on leveraging full-body or contextual motion to predict human intent. Moore et al. [12] developed an AR-based method to visualize shared affordances, highlighting regions in the environment where human actions are likely to occur. This mapping of body actions to actionable zones enabled the robot to proactively coordinate with human partners. Macciò et al. [13] proposed anticipatory MR visualizations that reveal forthcoming robot motion trajectories to human collaborators. These visualizations enhanced situational awareness and reduced coordination delays by making robot actions more predictable. The movement trajectories of human body joints contain rich 3D action information, which can be analyzed to infer human intentions [14,15]. Thirdly, in multimodal fusion-based HRI, various perception models are integrated to predict human intent more accurately. For example, Christophe et al. [16] utilized multimodal sensory inputs to train a domestic assistant robot capable of responding to human actions and engaging in verbal communication. Another significant area of research is robot imitation of human actions. Dillmann [17] developed a method to capture human demonstrations—such as hand positions and object movements—and transfer them to robotic systems for task replication. Beyond physical imitation, some studies have explored emotional expression in robots to enhance collaborative interactions. Macciò et al. [18] presented RICO-MR, an open-source architecture that integrates multimodal inputs such as speech, gestures, and spatial interaction for intent communication. The framework offers a flexible and extensible way to transmit human intent, supporting reproducibility and further research. Lunding et al. [19] introduced ARTHUR, an MR authoring tool designed to support customizable HRC scenarios with multimodal feedback channels. By incorporating visual, auditory, and haptic cues, the system allows human intent to be conveyed more clearly and intuitively. 
Despite these advancements, much of the research in HRC remains unidirectional, focusing primarily on adapting robot behavior based on human actions. A more effective collaboration framework should emphasize bidirectional communication, where both human and robot intentions are transparently conveyed to enhance efficiency, safety, and user acceptance in human–robot interactions.
Beyond the immediate scope of MR-based intent communication, this line of work also relates to the broader field of human–machine interaction (HMI). Prior research on HMI has examined both low-level control interfaces and high-level task coordination frameworks, emphasizing transparency, predictability, and shared situational awareness between humans and machines. Positioning our study within this broader perspective highlights the relevance of MR guidance not only as a visualization tool, but also as a means to strengthen mutual understanding and coordination in collaborative systems.
Despite significant advancements in human–robot collaboration (HRC), most research has focused on the robot’s perspective, particularly in recognizing human intentions to guide robot actions. However, effective collaboration requires bidirectional communication, where both human and robot understand each other’s intentions. A lack of transparency in robot behavior can lead to uncertainty, discomfort, reduced safety, and inefficiencies. This study investigates how mixed reality (MR) can address this limitation by visually conveying robot intentions to human operators. MR technologies have been increasingly integrated into manufacturing, including HRC applications [20,21,22,23]; while mixed reality (MR) is commonly used for visual assistance and task guidance, this study employs MR as a direct communication tool for robot intent. A series of MR-based communication strategies were designed and tested through real-time experimental data collection and post-experiment surveys, evaluating their impact on task efficiency, safety, and user experience. The findings provide insights into the effectiveness of MR in enhancing human–robot collaboration in manufacturing settings.

2. Enhanced Human–Robot Collaborative Manufacturing Using MR

2.1. The Framework of the MR-Based Human–Robot Collaboration

The MR-based experimental framework, as illustrated in Figure 1, integrates an ABB Yumi dual-arm collaborative robot, controlled via the Robot Operating System (ROS), to facilitate toy assembly tasks with human participants. A Microsoft HoloLens mixed-reality device [24] projects holographic visualizations, enabling participants to understand the robot’s intended actions. The MR environment is developed in Unity [25], with UDP- and WebSocket-based communication channels facilitating bidirectional motion and voice command exchange between the HoloLens and ROS. The experimental platform setup, shown in Figure 2, positions the Yumi robot at the rear of the workbench, with participants standing in front, aligning their body center with the mid-line of the cube group. To accurately track human movements, a Vicon motion tracking system, consisting of six Vero cameras and two sets of tracking plates with spherical markers, records real-time hand and elbow movement trajectories. These trajectories, along with the robot gripper’s motion, are analyzed to compute human–robot separation distances, a critical metric for evaluating collaboration efficiency and safety.

2.1.1. Mixed-Reality Device and Application Development

A key component of the experimental platform is the Microsoft HoloLens, which serves as the primary interface for displaying holographic visualizations of the robot’s intended motions. Throughout the experiment, all robot action cues are projected via HoloLens to assist participants in understanding the robot’s next steps. Participants can interact with the system using virtual buttons on the user interface (UI) to pause or resume the visual guidance programs. Additionally, voice commands enable participants to dynamically adjust the hologram display strategy, allowing for a more flexible experiment flow and more intuitive user experience.
The mixed reality environment is developed within Unity Engine using its Mixed Reality Toolkit (MRTK), which provides an efficient framework for creating MR applications optimized for deployment on HoloLens. The MR scene, as illustrated in Figure 3, comprises detailed 3D models, including two toy planes (featuring components such as the fuselage, wings, tail, and wheels), the Yumi collaborative robot, and several interactive virtual buttons programmed for user interaction. To ensure precise spatial alignment, Blender 3.3 LTS [26] was used to design and refine the 3D models, with necessary adjustments made to compensate for coordinate offsets between Unity and Blender. Notably, the headset views provided in Figure 3 are direct screenshots captured on the HoloLens during operation (not simulated renders), shown to illustrate what the user sees for each MR strategy.
The experiments were conducted using a Microsoft HoloLens 2 headset as the MR display device. The robot platform was an ABB YuMi dual-arm collaborative robot, connected to a workstation equipped with an Intel Core i7-10700K CPU @ 3.8 GHz, 32 GB RAM, and an NVIDIA RTX 3070 GPU running Windows 10. Robot control and data exchange were implemented via ROS Noetic (Ubuntu 20.04) using the rosbridge_server package to establish communication with Unity. The MR environment was developed in Unity Editor 2021.3 (LTS) with the Microsoft Mixed Reality Toolkit (MRTK) v2.7.3. Human hand positions were tracked using a Vicon motion capture system. These specifications ensure the reproducibility of the MR–robot integration pipeline described in this paper.
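To make the communication pipeline more concrete, the minimal sketch below shows how a client could subscribe to the robot's joint states through rosbridge_server, analogous to what the Unity/HoloLens side does to drive the digital twin. It uses the Python roslibpy client rather than the authors' C# implementation, and the host address and topic name are assumptions.

```python
import time
import roslibpy

# Hypothetical rosbridge endpoint; the actual host/port depend on the lab network.
client = roslibpy.Ros(host='192.168.1.10', port=9090)
client.run()

# Subscribe to the robot joint states, as the MR digital twin would.
listener = roslibpy.Topic(client, '/joint_states', 'sensor_msgs/JointState')
listener.subscribe(lambda msg: print(dict(zip(msg['name'], msg['position']))))

try:
    while client.is_connected:
        time.sleep(1.0)
except KeyboardInterrupt:
    listener.unsubscribe()
    client.terminate()
```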

2.1.2. Robot’s Path Planning

The robot’s motion is controlled using ROS MoveIt packages, with joint coordinates recorded at a sampling rate of 30 Hz to ensure precise motion tracking. In the Unity-based virtual environment, a digital twin of the Yumi robot replicates the real robot’s movements in real time by continuously reading joint status data from the physical Yumi system. To facilitate seamless communication, the ABB motion control program is integrated with the HoloLens via a UDP communication socket, enabling the real-time exchange of motion commands and status updates.
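The sketch below illustrates, under stated assumptions, how joint states could be sampled at 30 Hz on the ROS side and streamed over a UDP socket to the headset, mirroring the digital-twin update described above. The topic name, HoloLens IP address, and JSON message format are illustrative rather than taken from the authors' implementation.

```python
import json
import socket
import rospy
from sensor_msgs.msg import JointState

HOLOLENS_ADDR = ('192.168.1.20', 5005)   # hypothetical HoloLens UDP endpoint
udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
latest = {}

def joint_cb(msg):
    # Keep only the most recent joint configuration.
    latest.update(zip(msg.name, msg.position))

rospy.init_node('yumi_joint_streamer')
rospy.Subscriber('/joint_states', JointState, joint_cb)

rate = rospy.Rate(30)  # 30 Hz sampling, matching the recording rate used in the paper
while not rospy.is_shutdown():
    if latest:
        udp_sock.sendto(json.dumps(latest).encode('utf-8'), HOLOLENS_ADDR)
    rate.sleep()
```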

2.2. Mixed-Reality Environment Construction

In modern human–robot collaboration, mixed reality (MR) plays a pivotal role in enhancing task execution by overlaying virtual instructions onto laboratory-based environments. This is achieved by anchoring virtual scenes to the real-world scenes, ensuring precise alignment between digital content and physical components. The theoretical framework presented in Figure 4 illustrates the transformation chain that bridges the virtual and real worlds, enabling seamless projection of virtual instructions onto physical objects.

2.2.1. Real–Virtual Synchronization in MR Systems

MR systems, such as the one used in this study, rely on a series of coordinate transformations to align virtual elements with real-world objects. The process begins in the virtual environment, pre-designed in platforms like Unity. Here, the virtual scene is constructed using relative location information from the real-world scene to ensure perfect alignment with real-world objects. In the MR headset, the virtual scene exists within its own headset coordinate system, which serves as the zero point for the virtual objects.
Without an anchor to the real-world scene, the virtual elements initially “float” in arbitrary locations, disconnected from the physical space. To solve this, an April Tag is introduced as a bridge between the virtual and real-world coordinate systems. The April Tag serves as an anchor point, enabling the transformation of the virtual scene into the real-world coordinate system and ensuring precise overlap of virtual and physical components within the MR headset.

2.2.2. Coordinate Transformation

The synchronization process involves a chain of coordinate transformations, as shown in Figure 4.
Virtual–Headset Transformation $T_{vh}$
The virtual scene is initially designed with its own coordinate system ($CS_v$). The MR headset transforms this virtual coordinate system into the headset coordinate system ($CS_h$) using the transformation matrix $T_{vh}$:
$CS_h = T_{vh} \cdot CS_v$
Headset–April Tag Transformation $T_{ht}$
The headset detects the April Tag in the lab environment, which acts as the anchor point. The transformation matrix $T_{ht}$ maps the headset coordinate system to the April Tag’s coordinate system ($CS_t$):
$CS_t = T_{ht} \cdot CS_h$
Anchor–Real-World Transformation $T_{tr}$
Finally, the April Tag’s coordinate system is mapped to the real-world coordinate system ($CS_r$) using the transformation matrix $T_{tr}$:
$CS_r = T_{tr} \cdot CS_t$
By applying this transformation chain, the virtual scene is theoretically anchored to the April Tag and can be precisely aligned with the real objects. In practice, however, hardware limitations (e.g., HoloLens sensors) and software factors (e.g., Vuforia tracking stability) introduce small deviations. In our setup, the alignment accuracy varied across objects, with most virtual overlays appearing visually well matched and the maximum noticeable deviation being approximately 5 mm. Despite these imperfections, the alignment was sufficient to ensure that the virtual objects remained stable and consistently registered with the physical environment as the user moved.
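A minimal numpy sketch of this chain is given below. The transforms here are placeholders (identity rotations with small translations); in the real system, $T_{vh}$, $T_{ht}$, and $T_{tr}$ come from HoloLens tracking and Vuforia marker detection.

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Placeholder transforms standing in for T_vh (virtual -> headset),
# T_ht (headset -> tag), and T_tr (tag -> real world).
T_vh = make_transform(np.eye(3), [0.10, 0.00, 0.00])
T_ht = make_transform(np.eye(3), [0.00, 0.05, 0.00])
T_tr = make_transform(np.eye(3), [0.50, 0.30, 0.75])

# Full chain: CS_r = T_tr * T_ht * T_vh * CS_v
T_vr = T_tr @ T_ht @ T_vh

p_virtual = np.array([0.2, 0.1, 0.0, 1.0])   # a point on a virtual component
p_real = T_vr @ p_virtual
print(p_real[:3])  # where the hologram should appear in the real-world frame
```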
In this framework, virtual objects are “anchored” to lab-environment markers via April Tag. Once the system detects the position of a real-world marker, the corresponding virtual object is rendered in the correct location within the user’s field of view. By maintaining consistent synchronization between the real and virtual worlds, the MR system enables a seamless overlap of digital information with physical objects, allowing users to interact with both realms simultaneously. This process works iteratively, updating at each time step based on new sensor data and recalculating object positions and orientations.
In our study, coordinate transformation is achieved using the Vuforia engine for marker recognition. As shown in Figure 2, a marker is placed in front of the subject to ensure precise position and rotation detection by the HoloLens. All virtual components are anchored to a virtual marker, which aligns with the lab environment-based marker upon detection. Before the experiment, alignment is verified and adjusted if needed. Once all holograms are precisely aligned, the Vuforia engine is disabled to optimize HoloLens performance.

2.3. Robot Intention Communication Strategies

This study proposes and evaluates four MR-based strategies for communicating robot intentions to human operators during collaboration and examines their impact on human–robot collaboration. The strategies are described below.
Component Highlighting Strategy (MR Strategy 1): This strategy highlights only the components that Yumi will pick up in each step. As shown in Figure 3 (‘MR strategy 1’), red holograms indicate the plane tail and body, signaling that Yumi will assemble these parts, while participants choose additional components to complete the task. If assembling a blue plane, the holograms shift to blue. This serves as the baseline strategy, forming the foundation for the subsequent approaches.
Highlighting with Connection Lines Strategy (MR Strategy 2): Expanding on Strategy 1, this method introduces straight holographic lines connecting the gripper’s tool center point (TCP) to the corresponding target components, as shown in Figure 3 (‘MR strategy 2’). These guiding lines reinforce spatial awareness and allow for direct comparison between MR Strategies 1 and 2.
Trajectory Animation Strategy (MR Strategy 3): Instead of static indicators, this approach displays a dynamic animation of the TCP’s trajectory before the robot moves. The animation outlines the gripper’s complete movement from its starting position to the final assembly point, including grasping, delivering, and assembling (Figure 3, ‘MR strategy 3’). The trajectory is color-coded to match the target components, mitigating the potential distraction of full-arm holograms (analyzed in Strategy 4).
Full-Arm Animation Strategy (MR Strategy 4): This method animates Yumi’s entire arms, visually representing each joint movement throughout the task (Figure 3, ‘MR strategy 4’). The model color matches the real robot, and target components remain highlighted. This approach provides the most comprehensive visualization, allowing participants to anticipate the full grasping, delivering, and assembling process.
The four strategies are implemented in realistic human–robot collaborative manufacturing tasks to study their impact on the collaboration. All four MR visualizations are displayed before the real robot moves, ensuring that participants can anticipate the robot’s actions and coordinate their own. Once the assembly motion begins, the holograms disappear to avoid obstructing the participant’s view during the physical human–robot collaboration.

3. Design of Experimental Evaluations with Humans

The experimental evaluation consists of five scenarios organized into two stages. The first stage includes a single lab-based physical baseline scenario, where participants perform the task without any MR visualization support. The second stage comprises the remaining four scenarios, each implementing a different MR display strategy. The sequence of the four strategy-based scenarios is randomized using the Latin square method, ensuring that ordering effects are minimized.
Throughout the experiment, two identical sets of model planes serve as interactive objects, distinguished by red and blue coloring and referred to as the red set and blue set. The primary objective for participants is to collaborate with the Yumi robot to assemble both plane sets as efficiently as possible. To ensure consistency, a set of strict procedural rules is established, and participants are fully briefed before the experiment begins, as illustrated in Figure 5.
  • Participants must pick up two components before the robot initiates any movement. This marks the official start of the experiment.
  • Participants must fully assemble one color set before proceeding to the next.
  • While participants are encouraged to complete the task as efficiently as possible, they are not required to do so. The use of both hands is recommended to enhance collaboration, provided the participant feels comfortable.
Figure 5. Assembled and disassembled model plane.
To replicate a realistic manufacturing scenario, predefined assembly duration periods are assigned to each component. Participants must hold each part in its designated position for the specified duration before proceeding. The assembly times for each component are detailed in Table 1. Upon completing the assembly of a plane, participants perform a final inspection lasting 15 s, referred to as the “fine-tuning” process, to ensure proper alignment and structural integrity.
Figure 6 illustrates the task timeline for the case in which the red set is assembled first. At the beginning of each trial, the robot grasped the airplane fuselage and held it in a hovering position, thereby providing a stable reference for subsequent assembly. The human participant then sequentially grasped and placed parts onto the fuselage, sometimes holding two parts simultaneously in position for a specified dwell time. This holding phase was designed to mimic the time-consuming nature of complex industrial assembly actions. Once all five parts of a set were attached, a final “fine-tuning” step was performed to simulate the inspection and adjustment process prior to completion. Importantly, participants were required to fully complete all tasks for one color set (red or blue) before proceeding to the other, ensuring clear separation of the two assembly stages.
A total of 10 participants were recruited for the experiment, with ages ranging from 22 to 31. Among them, five had prior experience with MR devices and applications, while another five were familiar with the Vicon motion tracking system. Before the experiment, all participants underwent a training session to familiarize themselves with the basic operations of the HoloLens, the movement patterns of the Yumi robot, and the wearable Vicon tracking devices. Participants were required to confirm their understanding of hologram interactions and voice commands on the HoloLens. If a participant expressed uncertainty, the training was repeated until both the experimenter and participant agreed that the participant was sufficiently proficient. During the experiment, the task completion time for each trial was recorded, while the Vicon system continuously tracked hand and elbow positions. The human–robot separation distance was calculated as the standard Euclidean distance in 3D space between the robot end-effector position obtained from the ROS system and the human hand position captured by the Vicon tracking system. Following each experimental scenario, participants completed a post-experiment survey to provide subjective evaluations of their experience. The survey consisted of five Likert-scale questions, assessing task fluency, perceived safety, task efficiency improvement (compared to the non-MR scenario), the effectiveness of MR in conveying robot actions, and overall comfort in human–robot collaboration (HRC). Subjective rating questions were phrased in plain language, for example: “Please rate the perceived comfort of collaboration with the robot in this trial.” The metrics—comfort, safety, fluency, and predictability—were adapted from prior HRI studies [27,28], while perceived efficiency improvement was newly introduced in this study.
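As a small illustration of this metric, the sketch below computes the per-sample Euclidean distance between the gripper and a hand trajectory, assuming both have already been expressed in a common world frame and resampled onto a shared timeline (the array names and the use of numpy are assumptions, not the authors' analysis code).

```python
import numpy as np

def separation_distance(gripper_xyz, hand_xyz):
    """Per-sample 3D Euclidean distance between robot gripper and human hand.

    Both inputs are (N, 3) arrays expressed in the same world frame.
    """
    return np.linalg.norm(gripper_xyz - hand_xyz, axis=1)

# Example with synthetic, time-aligned trajectories
gripper = np.random.rand(1000, 3)
hand = np.random.rand(1000, 3)
d = separation_distance(gripper, hand)
print(d.min(), d.mean())
```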

4. Experimental Results and Analysis

This section presents the experimental results and analysis, categorized into two subsections: objective metrics and subjective metrics.

4.1. Objective Metric Results

Based on the recorded experimental videos, we analyzed each participant’s decision-making process and the first movement execution time across all MR-based and non-MR scenarios. Since the virtual guidance primarily influences the participant’s initial choice and first movement, these factors were selected for evaluation. To quantify decision-making effectiveness, a scoring system was implemented:
  • Optimal choice (fastest option): +1.
  • Suboptimal choice (faster than the slowest option but not optimal): 0.
  • Least efficient choice (slowest option): −1.
The results are presented in Table 2. Additionally, the exceeding time—calculated as the difference between the actual first movement execution time and the theoretical minimum time for each case—was evaluated. The theoretical minimum times for the non-MR method and MR methods 1–4 were predetermined as 14, 12, 12, 11, and 11 s, respectively. The exceeding time values are listed in Table 3.
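For clarity, a minimal sketch of the exceeding-time computation is given below, using the per-condition theoretical minima quoted above; the function and condition labels are illustrative rather than the authors' analysis code.

```python
# Exceeding time = actual first-movement execution time - theoretical minimum
# for the given condition (values in seconds, as stated above).
THEORETICAL_MIN_S = {'non_mr': 14.0, 'mr1': 12.0, 'mr2': 12.0, 'mr3': 11.0, 'mr4': 11.0}

def exceeding_time(condition, first_move_time_s):
    """Return how far the measured first-movement time exceeds the minimum."""
    return first_move_time_s - THEORETICAL_MIN_S[condition]

print(exceeding_time('mr4', 13.0))  # -> 2.0 s over the theoretical minimum
```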

4.1.1. Task Completion Time Analysis

As shown in Table 2, all participants consistently selected the optimal decision under MR guidance (methods 1–4), while decision making varied in the non-MR condition. This indicates that virtual guidance effectively conveys robotic intentions, guiding participants toward more efficient decisions.
Table 3 categorizes decision making under three non-MR subcases:
  • Optimal case (choosing the wheels and tail).
  • Suboptimal case (choosing the wing and plane body).
  • Least efficient case (selecting components from the incorrect group).
The theoretical minimum execution times for the three non-MR cases were 14, 68, and 73 s, respectively. For MR methods 1–4, the theoretical minimum times were 11, 11, 12, and 12 s, respectively. From Table 3, it is evident that, on average, MR-guided methods consistently result in lower exceeding times compared to the non-MR condition, regardless of the participant’s initial choice. For exceeding time (Table 3), we conducted paired exact binomial sign tests comparing method 4 against methods 1–3 on a per-participant basis (two-sided α = 0.05). This non-parametric test uses only the direction of paired differences and is appropriate for small-sample within-subject designs without distributional assumptions. We additionally report medians and interquartile ranges (IQR). Method 4 yielded the lowest median exceeding time (2.0 s [2.0, 3.8]). Compared with M2 and M3, M4 was significantly lower (9/10, p = 0.021; 8/8 non-tied, p = 0.0078). Compared with M1, M4 showed the same direction but did not reach significance (7/10, p = 0.344). Accordingly, we describe M4’s advantage over M1 as a trend with N = 10. Across MR strategies, M4 had the smallest mean (2.6 ± 1.35 s) and median (2.0 s) exceeding times. Taken together, these results indicate that, among the MR strategies, method 4 provides the strongest and statistically supported reduction in exceeding time when compared with M2 and M3, with a consistent but non-significant trend against M1.
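A sketch of this sign-test procedure is shown below using SciPy's exact binomial test; the input arrays are the per-participant exceeding times from Table 3, while the use of SciPy and the helper function are assumptions about tooling rather than the authors' actual scripts.

```python
import numpy as np
from scipy.stats import binomtest

def paired_sign_test(a, b):
    """Two-sided exact sign test on paired samples a vs. b (tied pairs dropped)."""
    diff = np.asarray(a) - np.asarray(b)
    diff = diff[diff != 0]                     # discard ties
    n_lower = int(np.sum(diff < 0))            # pairs where a < b
    result = binomtest(n_lower, n=len(diff), p=0.5, alternative='two-sided')
    return n_lower, len(diff), result.pvalue

m4 = [5, 1, 2, 1, 3, 2, 3, 4, 2, 2]            # method 4 exceeding times (Table 3)
m2 = [8, 5, 4, 4, 4, 7, 4, 7, 7, 1]            # method 2 exceeding times (Table 3)
print(paired_sign_test(m4, m2))                # -> (9, 10, ~0.021)
```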
Within the experiment, participants 1, 2, 7, and 10 made the optimal choice, while participants 3, 4, 5, 6, 8, and 9 made suboptimal choices. Among those who made suboptimal choices, participants 4, 6, and 9 selected the least efficient case, leading to the longest exceeding times. One notable exception was observed in participant 1, where the exceeding time for MR method 3 was higher than in the non-MR condition. This anomaly occurred because the participant delayed movement initiation after the robot reached its assembly position during the first case. Apart from this exception, all other participants followed the expected trend, where MR-based methods resulted in lower exceeding times than the non-MR condition. In conclusion, these findings confirm that virtual guidance effectively conveys the robot’s intentions to human participants, enabling them to make more efficient decisions. Among the four MR strategies, method 4 demonstrates the greatest improvement in reducing execution time, suggesting that the full-arm animation with motion playback provides the most intuitive and effective visualization of the robot’s intended actions.

4.1.2. Human–Robot Proximity Analysis

In this study, the Vicon motion tracking system recorded participants’ hand and elbow positions, allowing a comparative analysis with the robot arm’s motion. This enabled precise measurement of human–robot separation distances between the participant’s hands/elbows and the robot’s gripper/arm. With a sampling frequency of 100 Hz, we quantified close-proximity interactions by counting the data points where hands or elbows remained within a predefined threshold distance from the robot. A trajectory plot of human hand and robot gripper movements is shown in Figure 7.
To further analyze human–robot spatial interactions, the dynamic positions of both hands and elbows were compared to the robot’s grippers during each task. Consequently, four minimum distances (one for each tracked body part) were calculated per subject for every experimental scenario. The average minimum separation distance, derived from these values, was then computed to provide a holistic measure of human–robot proximity. The results, detailing the average minimum separation distances across all five methods, are illustrated in Figure 8.
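The sketch below illustrates this aggregation, assuming one gripper trajectory and four tracked body-point trajectories per trial; the trajectories shown are synthetic and the function name is hypothetical.

```python
import numpy as np

def average_min_separation(gripper_xyz, body_points):
    """Average of the per-body-part minimum distances to the gripper.

    gripper_xyz: (N, 3) gripper trajectory; body_points: dict of (N, 3) arrays.
    """
    minima = [np.linalg.norm(traj - gripper_xyz, axis=1).min()
              for traj in body_points.values()]
    return float(np.mean(minima))

# Synthetic example with the four tracked points
N = 2000
gripper = np.random.rand(N, 3)
parts = {p: np.random.rand(N, 3) for p in
         ['left_hand', 'right_hand', 'left_elbow', 'right_elbow']}
print(average_min_separation(gripper, parts))
```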
As shown in Figure 8, the average minimum separation distances for MR methods 1, 3, and 4 were larger than those in the non-MR condition for most participants. This suggests that virtual guidance increases human–robot separation distance rather than reducing it; this is likely because the HoloLens limited participants’ peripheral vision, making them less confident in close-proximity interactions with the moving robot. Several participants also reported that the headset negatively affected their peripheral awareness. Statistical tests further support these observations: MR method 1 and method 4 produced the largest mean distances (49.9 cm and 42.8 cm, respectively), while MR method 2 yielded the smallest (23.1 cm). Paired Wilcoxon tests confirmed that method 4 was significantly higher than method 2 ( p = 0.0059 ), but not significantly different from method 1 ( p = 0.375 ) or method 3 ( p = 0.846 ). We therefore describe method 4’s advantage over method 2 as statistically significant, while differences relative to method 1 and method 3 are reported as trends. The smaller distances in method 2 can be attributed to its unique delivery motion, which required participants to remain in closer proximity to the robot for longer periods, thereby reducing the minimum separation distance.

4.1.3. Time-Based Proximity Evaluation

Average minimum distance alone does not fully capture human–robot proximity over the entire experiment. To provide a more comprehensive measure, we also considered time spent within close range. Specifically, we counted the number of samples where hand or elbow tracking points fell below a 150 mm threshold, a distance generally considered uncomfortable for close human–robot interaction [29]. Since four body parts were tracked, four values were calculated per trial, and their average was used to represent the final proximity measure for each method. The results are presented in Figure 9.
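A minimal sketch of this time-based measure follows, with the 150 mm threshold and 100 Hz sampling rate taken from the text and the trajectories generated synthetically for illustration.

```python
import numpy as np

THRESHOLD_M = 0.150      # 150 mm discomfort threshold
SAMPLE_RATE_HZ = 100     # Vicon sampling frequency

def close_proximity_count(gripper_xyz, body_points):
    """Average number of samples (across tracked points) within the threshold."""
    counts = [int(np.sum(np.linalg.norm(traj - gripper_xyz, axis=1) < THRESHOLD_M))
              for traj in body_points.values()]
    return float(np.mean(counts))

# Example with synthetic trajectories for two hands and two elbows
N = 3000
gripper = np.random.rand(N, 3)
parts = {p: np.random.rand(N, 3) for p in
         ['left_hand', 'right_hand', 'left_elbow', 'right_elbow']}
avg_count = close_proximity_count(gripper, parts)
print(avg_count, 'samples ->', avg_count / SAMPLE_RATE_HZ, 's in close proximity')
```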
As shown in Figure 9, the number of points represents the duration that participants remained within close proximity to the robot. For most participants, MR methods resulted in longer close-range interaction times compared to the non-MR condition. Among all methods, method 2 exhibited the longest interaction duration for most participants, consistently exceeding method 1.

4.2. Subjective Metric Results

In addition to objective metrics, subjective feedback is a crucial component of this study. As outlined in the previous section, participants completed a Likert-scale survey assessing their subjective experience with the task. The survey consisted of three questions for the non-MR condition and five questions for the MR conditions. Participants rated their level of agreement on each question using a 5-point scale (1–5).
The collected responses were visualized through several plots, presented in Figure 10. Each plot depicts the distribution of ratings for a single question across four or five experimental conditions, depending on the question’s applicability.
Question 1 evaluates task fluency. As shown in Figure 10, MR method 4 consistently received the highest or equal ratings across all subjects, whereas the non-MR method generally received the lowest ratings. This indicates that participants perceived MR method 4, which displays the entire virtual robot arms, as the most fluent interaction, while the non-MR condition was perceived as the least fluent. Statistical analysis: a Friedman test showed a significant effect of condition on fluency ratings, χ²(4) = 22.94, p < 0.001.
Question 2 assesses perceived safety. MR method 4 received the highest safety ratings for most subjects. However, MR methods 1, 2, and 3 often received equal or lower safety ratings compared to the non-MR condition, suggesting that MR visual guides do not always enhance perceived safety. Notably, when participants could clearly see the robot’s upcoming movement in detail, as in MR method 4, their perceived safety significantly increased. Statistical analysis: the Friedman test was significant for safety, χ²(4) = 9.67, p = 0.046, indicating differences across conditions.
Question 3 measures perceived task efficiency improvement. Most subjects rated this metric highest for MR method 4, while ratings for other conditions remained consistently lower. Perceived efficiency ratings for method 4 were aligned with the objective performance results: it achieved both the highest subjective efficiency scores and the lowest exceeding times, reinforcing its role as the most effective strategy. In contrast, MR method 3 was often rated around 2–4, indicating that showing only the gripper’s trajectory did not provide users with a strong sense of efficiency improvement. Nevertheless, objective metrics showed that method 3 did enhance task performance, highlighting a discrepancy between subjective perception and actual efficiency. Statistical analysis: the Friedman test did not detect a significant overall effect across conditions for perceived efficiency, χ²(4) = 6.42, p = 0.170; thus, method 4’s advantage is described as a clear trend rather than a statistically confirmed effect.
Question 4 evaluates the predictability of the robot’s next movement. MR method 4 received the highest ratings, suggesting that displaying the full virtual robot arm is the most effective way to communicate the robot’s intent. Statistical analysis: the Friedman test indicated a significant effect of condition on predictability, χ²(4) = 8.58, p = 0.035.
Question 5 assesses general perceived comfort. Most subjects rated MR method 4 significantly higher than the other conditions. Notably, five participants assigned a rating of 4 or higher to MR method 4, a trend that did not occur in any other condition. In contrast, ratings for other methods were inconsistent and irregular, indicating a lack of consensus among participants regarding their comfort levels. However, MR method 4 was widely regarded as the most comfortable display method. Statistical analysis: the Friedman test was significant for comfort, χ²(4) = 14.82, p = 0.0051, demonstrating differences across conditions.
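For reference, the sketch below shows how a Friedman test over the 10 × 5 rating matrix could be run with SciPy; the ratings here are random placeholders rather than the collected survey data, and SciPy is an assumed tool choice.

```python
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(0)
# Placeholder ratings: rows = 10 participants, columns = [non-MR, M1, M2, M3, M4]
ratings = rng.integers(1, 6, size=(10, 5))

# Friedman test across the five within-subject conditions (df = 4)
stat, p = friedmanchisquare(*[ratings[:, c] for c in range(ratings.shape[1])])
print(f"Friedman chi2(4) = {stat:.2f}, p = {p:.4f}")
```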
The average score of each case for each question is presented in Table 4 and Figure 11. Across all subjective metrics, MR method 4 consistently received the highest scores, indicating that it provided the most satisfying experience for participants.

4.3. Summary and Discussion

The objective results indicate that MR-based guidance consistently improves task efficiency, with MR method 4 demonstrating the lowest exceeding time over the theoretical minimum execution time. Additionally, MR methods 1, 3, and 4 increased human–robot separation distance, likely due to MR-induced visibility limitations, while MR method 2 resulted in the closest human–robot interaction due to its unique delivery motion. Subjective evaluations further confirm that MR method 4 is the most effective in enhancing task fluency, perceived efficiency, predictability, and comfort, as it provides full-arm visualization of robot movements. However, perceived safety did not always improve across all MR conditions, suggesting that not all virtual guidance methods equally enhance user confidence. Despite some variations, the findings confirm that MR-based visualization effectively conveys robot intentions, with MR method 4 offering the most intuitive and satisfying user experience.
Beyond summarizing the results, several implications can be drawn. From a research perspective, the findings extend prior work on MR intent communication by providing the first systematic within-subjects comparison of four visualization strategies under identical task conditions; while earlier studies have demonstrated the benefits of individual visualization techniques such as volume rendering or trajectory overlays [30], our results show that full-arm animation (method 4) consistently offers the most effective support across both objective and subjective measures. This positions method 4 as a particularly promising baseline for future HRC studies.
From a practical standpoint, the results suggest that MR-based intent visualization can be directly applied to collaborative assembly scenarios to reduce human uncertainty and improve comfort. In particular, method 4 provides operators with a clear understanding of forthcoming robot actions, enabling smoother coordination. At the same time, the results caution that not all MR cues enhance safety perceptions, and designers should consider the trade-off between increasing predictability and maintaining user confidence.
Looking forward, future studies should validate these findings with larger and more diverse participant pools, including professional workers, and in more complex industrial environments. Technical improvements in MR hardware, such as expanding the field of view and increasing tracking accuracy, will also be critical to further enhance alignment quality and user trust. Finally, integrating MR visualization with complementary modalities, such as auditory or haptic cues, represents an exciting direction to create richer, multi-channel communication of robot intentions in collaborative manufacturing.

4.4. Limitations

This study has several limitations. First, the non-MR baseline condition was always administered first; while this design ensured that participants became familiar with the assembly task before experiencing MR guidance, it may also introduce a potential order effect. Although the four MR conditions were counterbalanced, the lack of counterbalancing for the non-MR condition may reduce the strength of direct comparisons between MR and non-MR. Second, although we administered extensive subjective questionnaires and report those results, qualitative feedback and video-based observations were only briefly mentioned. A more systematic integration of these sources would enrich the analysis and provide additional insights. Finally, the participant pool in this study was modest ( N = 10 ). Although the within-subjects design increased the number of observations and enabled paired comparisons, future studies should recruit larger and more diverse participant groups to improve statistical power, strengthen external validity, and allow validation in more realistic industrial environments.

5. Conclusions

This study proposed and evaluated four mixed reality (MR)-based visualization prompting strategies to enhance human–robot collaboration (HRC) by effectively communicating robot intentions to human operators. Through a controlled feasibility experiment, we systematically compared the strategies under identical task conditions. The results demonstrated that MR guidance consistently improved task efficiency, decision making, and user experience compared to the non-MR baseline. Among the four methods, full-arm animation (method 4) emerged as the most effective strategy, significantly enhancing fluency, predictability, perceived efficiency, and comfort. At the same time, we observed that, while MR generally increased human–robot separation distances, the visibility constraints of the headset influenced perceived safety in close-proximity interactions. These findings extend prior work by providing one of the first within-subjects comparisons of multiple MR intention communication strategies in collaborative assembly.
This study also has limitations. The participant pool was modest ( N = 10 ), which, although sufficient for a feasibility study with a within-subjects design, limits the statistical power and generalizability of the findings. In addition, the non-MR baseline condition was always administered first, potentially introducing order effects, and qualitative feedback or video observations were only briefly integrated. These factors should be taken into account when interpreting the results.
Looking forward, future work will expand the participant pool and include professional workers to improve statistical robustness and ecological validity. Testing in more realistic industrial environments will help bridge the gap between laboratory studies and practice. Technical improvements in MR hardware, such as increasing field of view and tracking accuracy, will be essential, as will refining visualization designs to balance predictability, efficiency, and user confidence. Finally, integrating MR visualization with complementary modalities such as auditory or haptic cues could further enrich robot intention communication. Together, these directions will advance the deployment of MR-supported intention communication in real-world industrial HRC applications.

Author Contributions

Conceptualization, all authors; methodology, all authors; software, K.Z. and Y.Y.; validation, not applicable; formal analysis, K.Z.; investigation, all authors; resources, K.Z.; writing—original draft preparation, K.Z. and Y.Y.; writing—review and editing, all authors; visualization, all authors; supervision, Y.J.; project administration, Y.J.; funding acquisition, Y.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Science Foundation under Grant IIS-1845779.

Institutional Review Board Statement

The study was conducted with the approval of the Institutional Review Board at Clemson University.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. ISO 10218-1:2011; Robots and Robotic Devices—Safety Requirements for Industrial Robots. ISO: Geneva, Switzerland, 2011. Available online: https://www.iso.org/standard/51330.html (accessed on 19 September 2025).
  2. Shi, J.; Jimmerson, G.; Pearson, T.; Menassa, R. Levels of human and robot collaboration for automotive manufacturing. In Proceedings of the Workshop on Performance Metrics for Intelligent Systems, College Park, MD, USA, 20–22 March 2012; pp. 95–100. [Google Scholar]
  3. Villani, V.; Pini, F.; Leali, F.; Secchi, C. Survey on Human–robot Collaboration in Industrial Settings: Safety, Intuitive Interfaces and Applications. Mechatronics 2018, 55, 248–266. [Google Scholar] [CrossRef]
  4. Thoben, K.; Wiesner, S.; Wuest, T. “Industrie 4.0” and smart manufacturing—A review of research issues and application examples. Int. J. Autom. Technol. 2017, 11, 4–16. [Google Scholar] [CrossRef]
  5. Krüger, J.; Lien, T.; Verl, A. Cooperation of human and machines in assembly lines. CIRP Ann. 2009, 58, 628–646. [Google Scholar] [CrossRef]
  6. Wang, W.; Chen, Y.; Diekel, Z.; Jia, Y. Cost functions based dynamic optimization for robot action planning. In Proceedings of the Companion of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, Chicago, IL, USA, 5–8 March 2018; pp. 277–278. [Google Scholar]
  7. Yan, Y.; Jia, Y. A review on human comfort factors, measurements, and improvements in human–robot collaboration. Sensors 2022, 22, 7431. [Google Scholar] [CrossRef] [PubMed]
  8. Wang, H.; Xu, M.; Bian, C. Experimental comparison of local direct heating to improve thermal comfort of workers. Build. Environ. 2020, 177, 106884. [Google Scholar] [CrossRef]
  9. Chan, W.; Crouch, M.; Hoang, K.; Chen, C.; Robinson, N.; Croft, E. Design and implementation of a human–robot joint action framework using augmented reality and eye gaze. arXiv 2022, arXiv:2208.11856. [Google Scholar]
  10. Chan, K.; Kudalkar, V.; Li, X.; Zhang, S. ARROCH: Augmented reality for robots collaborating with a human. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 3787–3793. [Google Scholar]
  11. Edsinger, A.; Kemp, C. Human–robot interaction for cooperative manipulation: Handing objects to one another. In Proceedings of the RO-MAN 2007—The 16th IEEE International Symposium on Robot and Human Interactive Communication, Jeju, Republic of Korea, 26–29 August 2007; pp. 1167–1172. [Google Scholar]
  12. Moore, D.; Zolotas, M.; Padir, T. Shared affordance-awareness via augmented reality for proactive assistance in human–robot collaboration. arXiv 2023, arXiv:2312.13410. [Google Scholar]
  13. Macciò, S.; Carfì, A.; Mastrogiovanni, F. Mixed reality as communication medium for human–robot collaboration. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 2796–2802. [Google Scholar]
  14. Thomaz, A.; Hoffman, G.; Cakmak, M. Computational human–robot interaction. Found. Trends® Robot. 2016, 4, 105–223. [Google Scholar]
  15. Mead, R.; Atrash, A.; Matarić, M. Automated proxemic feature extraction and behavior recognition: Applications in human–robot interaction. Int. J. Soc. Robot. 2013, 5, 367–378. [Google Scholar] [CrossRef]
  16. Mollaret, C.; Mekonnen, A.; Pinquier, J.; Lerasle, F.; Ferrané, I. A multi-modal perception based architecture for a non-intrusive domestic assistant robot. In Proceedings of the 2016 11th ACM/IEEE International Conference on Human-Robot Interaction (HRI), Christchurch, New Zealand, 7–10 March 2016; pp. 481–482. [Google Scholar]
  17. Dillmann, R. Teaching and learning of robot tasks via observation of human performance. Robot. Auton. Syst. 2004, 47, 109–116. [Google Scholar] [CrossRef]
  18. Macciò, S.; Shaaban, M.; Carfì, A.; Zaccaria, R.; Mastrogiovanni, F. RICO-MR: An open-source architecture for robot intent communication through mixed reality. In Proceedings of the 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Busan, Republic of Korea, 28–31 August 2023; pp. 1176–1181. [Google Scholar]
  19. Lunding, R.; Hubenschmid, S.; Feuchtner, T.; Grønbæk, K. ARTHUR: Authoring human–robot collaboration processes with augmented reality using hybrid user interfaces. Virtual Real. 2025, 29, 73. [Google Scholar] [CrossRef]
  20. Gkournelos, C.; Karagiannis, P.; Kousi, N.; Michalos, G.; Koukas, S.; Makris, S. Application of wearable devices for supporting operators in human–robot cooperative assembly tasks. Procedia CIRP 2018, 76, 177–182. [Google Scholar] [CrossRef]
  21. Kousi, N.; Stoubos, C.; Gkournelos, C.; Michalos, G.; Makris, S. Enabling human robot interaction in flexible robotic assembly lines: An augmented reality based software suite. Procedia CIRP 2019, 81, 1429–1434. [Google Scholar] [CrossRef]
  22. Luxenburger, A.; Mohr, J.; Spieldenner, T.; Merkel, D.; Espinosa, F.; Schwartz, T.; Reinicke, F.; Ahlers, J.; Stoyke, M. Augmented reality for human–robot cooperation in aircraft assembly. In Proceedings of the 2019 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR), San Diego, CA, USA, 9–11 December 2019; pp. 263–2633. [Google Scholar]
  23. Kyjanek, O.; Al Bahar, B.; Vasey, L.; Wannemacher, B.; Menges, A. Implementation of an augmented reality AR workflow for human robot collaboration in timber prefabrication. In Proceedings of the 36th International Symposium on Automation and Robotics in Construction, ISARC, Banff, AB, Canada, 21 May–24 May 2019; pp. 1223–1230. [Google Scholar]
  24. Microsoft HoloLens. Available online: https://www.microsoft.com/en-us/hololens (accessed on 19 September 2025).
  25. Haas, J. A History of the Unity Game Engine; Worcester Polytechnic Institute: Worcester, MA, USA, 2014. [Google Scholar]
  26. Hess, R. Blender Foundations: The Essential Guide to Learning Blender 2.6; Focal Press: Waltham, MA, USA, 2010. [Google Scholar]
  27. Lasota, P.; Shah, J. Analyzing the effects of human-aware motion planning on close-proximity human–robot collaboration. Hum. Factors 2015, 57, 21–33. [Google Scholar] [CrossRef] [PubMed]
  28. Dragan, A.; Bauman, S.; Forlizzi, J.; Srinivasa, S. Effects of robot motion on human–robot collaboration. In Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction, Portland, OR, USA, 2–5 March 2015; pp. 51–58. [Google Scholar]
  29. Yan, Y.; Jia, Y. Design and Evaluation of a Human-Comfort-Aware Robot Behavior Controller. In Proceedings of the 2025 American Control Conference (ACC), Denver, CO, USA, 8–10 July 2025; pp. 2709–2714. [Google Scholar]
  30. Gruenefeld, U.; Prädel, L.; Illing, J.; Stratmann, T.; Drolshagen, S.; Pfingsthorn, M. Mind the arm: Realtime visualization of robot motion intent in head-mounted augmented reality. In Proceedings of the Mensch Und Computer 2020, Magdeburg, Germany, 6–9 September 2020; pp. 259–266. [Google Scholar]
Figure 1. The framework of the MR-based human–robot collaboration.
Figure 2. Experiment platform.
Figure 3. Four display strategies of MR scenes (HoloLens device-captured views).
Figure 4. Coordinate transformation in real–virtual alignment.
Figure 6. Interaction process timeline.
Figure 7. Three-dimensional trajectories of human hands and robot grippers.
Figure 8. Average minimum human–robot separation distance.
Figure 9. Number of points under threshold.
Figure 10. Likert scale ratings of subjective questions.
Figure 11. Average scores of cases of questions.
Table 1. Assembly durations for each component.

Component | Assembly Time (s)
Plane wing | 25
Plane tail | 10
Front wheels | 10
Rear wheels | 10
Table 2. Decisions made by subjects.

Subject | Non-MR | M1 | M2 | M3 | M4
1 | 1 | 1 | 1 | 1 | 1
2 | 1 | 1 | 1 | 1 | 1
3 | 0 | 1 | 1 | 1 | 1
4 | −1 | 1 | 1 | 1 | 1
5 | 0 | 1 | 1 | 1 | 1
6 | −1 | 1 | 1 | 1 | 1
7 | 1 | 1 | 1 | 1 | 1
8 | 0 | 1 | 1 | 1 | 1
9 | −1 | 1 | 1 | 1 | 1
10 | 1 | 1 | 1 | 1 | 1
Table 3. Exceeding time over theoretical minimum time (s).

Subject | Non-MR | M1 | M2 | M3 | M4
1 | 13 | 6 | 8 | 19 | 5
2 | 5 | 4 | 5 | 3 | 1
3 | 5 | 3 | 4 | 3 | 2
4 | 24 | 4 | 4 | 5 | 1
5 | 21 | 1 | 4 | 4 | 3
6 | 8 | 5 | 7 | 2 | 2
7 | 16 | 4 | 4 | 5 | 3
8 | 39 | 8 | 7 | 7 | 4
9 | 18 | 1 | 17 | 2 | 2
10 | 4 | 1 | 1 | 3 | 2
Average | 15.3 | 3.7 | 6.1 | 5.3 | 2.5
Table 4. Subjective metric ratings.

Questions | Non-MR | M1 | M2 | M3 | M4
Fluency | 3.2 | 3.8 | 3.8 | 4.0 | 4.6
Safety | 3.7 | 3.8 | 3.9 | 3.9 | 4.5
Efficiency | / | 3.5 | 3.4 | 3.2 | 3.9
Predictability | / | 3.8 | 3.6 | 3.9 | 4.5
Comfort | 3.3 | 3.7 | 3.6 | 3.7 | 4.3