IMPAct: A Holistic Framework for Mixed Reality Robotic User Interface Classification and Design

Abstract: The number of scientific publications combining robotic user interfaces and mixed reality has increased sharply during the 21st century. Counting the number of yearly added publications containing the keywords “mixed reality” and “robot” listed on Google Scholar indicates exponential growth. The interdisciplinary nature of mixed reality robotic user interfaces (MRRUI) makes them very interesting and powerful, but also very challenging to design and analyze. Many single aspects have already been successfully provided with theoretical structure, but to the best of our knowledge, there is no contribution combining everything into an MRRUI taxonomy. In this article, we present the results of an extensive investigation of relevant aspects from prominent classifications and taxonomies in the scientific literature. During a card sorting experiment with professionals from the field of human–computer interaction, these aspects were clustered into named groups to provide a new structure. Further categorization of these groups into four categories followed naturally and yielded a memorable structure. Thus, this article provides a framework of objective, technical factors, which finds its application in a precise description of MRRUIs. An example shows the effective use of the proposed framework for precise system description, thereby contributing to a better understanding, design, and comparison of MRRUIs in this growing field of research.


Introduction
Currently, robots are becoming a part of our daily life in households, for example as vacuum cleaners, lawn mowers, and personal drones, but also in the area of intelligent toys. Industrial machines have started to collaborate with human co-workers, and assistive robots find application in health-care. The development of accessible and secure robots is a major challenge for current and future applications. Burdea [1] already pointed out how virtual reality (VR) and robotics are beneficial to each other. Intuitive, natural, and easy-to-use user interfaces for human-robot interaction have the potential to fulfill the needs in this context. While a fully-automated robotic companion, which is capable of performing almost every task for humans, still remains a vision of the future, it is essential to also focus research on the interaction between humans and robots. Mixed reality (MR) technology provides novel vistas, which have enormous potential to improve the interaction with robots.
"The implementation of a mobile and easy deployable tracking system may trigger the use of trackers in the area of HRI. Once this is done, the robot community may benefit from the accumulated knowledge of the VR community on using this input device." [2]
The enormous increase in research interest in MR combined with robotics is demonstrated by counting the number of new scientific publications during the past few years (cf. Figure 1). Many contributions involving robotic user interfaces (RUI) and the field of MR are already published, and the number of new publications per year regarding mixed reality robotic user interfaces (MRRUI) shows a trend of exponential growth. To the best of our knowledge, there is no classification integrating all relevant aspects of this highly multi-disciplinary field of research available in the literature. However, a taxonomy of MRRUIs would represent a valuable tool for researchers, developers, and system designers for a better understanding, more detailed descriptions, and easier discrimination of newly-developed user interfaces (UI). Therefore, we propose a holistic approach that incorporates the relevant aspects of MRRUIs for system design, while supporting a comprehensible overview and an understanding of the interconnections and mutual influences of the different aspects.
Figure 1. Yearly number of new publications listed on Google Scholar containing the keywords "mixed reality" and "robots" (date of acquisition: 13 March 2019).
The main aim of this contribution is to provide an appropriate structure for the classification of robotic user interfaces involving mixed reality. The authors believe that good and successful design starts with understanding all relevant fields by performing a holistic analysis. The proposed structure is generated by unfolding prominent and relevant taxonomies, continua, and classifications of the most important aspects into a list of highly-relevant factors. Only factors under the direct control of system designers are taken into account, which makes it possible to select or determine their actual values without the need to be concerned with the mental models of probable users of the system. This is beneficial for both purposes: system design and classification. Due to the strong interdisciplinary nature of the topic of MRRUI, a new taxonomy is needed that summarizes all relevant factors and provides the necessary connections, explaining the interplay between them.
In this article, the "IMPAct framework" is introduced for the classification of mixed reality-based robotic user interfaces. As explained in Section 3.2, it identifies different factors from the categories Interaction, Mediation, Perception, and Acting as active contributors to actual MRRUI implementations. The factors are grouped into these categories in order to provide a memorable structure for application. The purpose of the framework is to categorize existing MR-based robotic systems and to guide the design of new solutions for specific problems involving interaction with robots in MR, so that all relevant aspects enter the decision process. Single factors are explained as an aid for their application. The authors selected the included factors carefully to avoid dependency on the mental models of individuals. Thus, only technical, measurable, or identifiable factors were taken into account. As a result, the IMPAct framework serves as an effective tool for discriminating different MR-based robotic systems with a focus on the UI. Additionally, it can be consulted for designing or improving MRRUIs by making use of the holistic view in order to create effective and specialized UIs.
The remainder of this article is structured as follows: Section 2 introduces the general structure of human-robot interaction, starting with teleoperation, and extends the model to a more detailed version with the aim of describing MR systems for human-robot interaction more specifically. The section concludes with an overview of prominent theory papers regarding the relevant fields of 3D interaction, mixed reality (MR), robot autonomy, interaction with virtual environments (VE), and many more. In Section 3, the experimental procedure is explained and the results are summarized. The extraction of relevant factors from theoretical papers, the card sorting experiment, and the second clustering into major categories are described in Section 3.1. Section 3.2 presents the results from the card sorting experiment and the subsequent refinement in tabular and graphical representations. A validation of the results is presented in Section 4 by applying the taxonomy to an MRRUI described in a publication. The validation is discussed in detail for every separate group and also provides additional information beyond what is given in the tables. Section 5 summarizes the article and discusses the results, their limitations, and future work. The Appendix contains the complete table of factors for reference (cf. Table A1), including each factor's publication source. Table A2 represents the short profile of the system categorized in Section 4, and Table A3 is the empty template, ready for use to classify other MRRUIs.

Related Work
In the following, the main components involved in MRRUI are identified and relevant taxonomies, classifications and definitions from scientific publications are listed and discussed very briefly.

The Structure of Information Flow in Human-Robot Interaction
The main components in a human-robot system and how they are connected were already described in Crandall et al. [3]. Figure 2 shows an adapted version. This basic scheme of a robot remote operation identifies human, robot, and world as the essential entities to perform a human-robot interaction task. Required components on the side of the interface between human and robot are responsible for transferring control data to the robot and some information from the robot to the human as feedback. In such setups, the robot is able to manipulate the world and to interact with the world physically, or at least is co-located with other real-world entities. To fulfil this task, the robot needs to gather information about the world it is acting on by utilizing sensors. It can influence the actual state of the world by utilizing actuators. To a certain degree, the robot can act autonomously. The knowledge of the robot about the world and its own state is accessible to the human operator through the user interface and provides the operator with information helpful for deciding on the next control action. In a scenario where human and robot share the same physical space (cf. Figure 3), there can be additional direct and physical interactions between the human and the robot, and between the human and the world.
Both of these schemes explain how humans can interact with a distant or nearby robot in order to manipulate the real world, but they give no detail about how VEs might be embedded. Enhancing the basic scheme of remote operation with the component of a virtual environment containing a virtual robot model and a model of a virtual world (cf. Figure 4) reveals new potential interactions. Even though a remote operation scenario is described, the operator can influence the virtual world directly by virtual physical interaction, or implicitly by utilizing some linkage between the VE and the real robot. This is the case if we use VR as an additional component in our human-robot system. By including a virtual model of the robot and the world in the system, the same control means used for the real robot are applicable to the VE. However, interaction with components of a VE removes the physical boundaries of our real world. For the system designer, the implementation of VR-based user interfaces means dealing with a much higher complexity in creating an appropriate means of interaction.
Extending the scheme of VR robot operation by integrating MR technology means that the operator is no longer separated from the real world. The multitude of interconnections grows further, and direct interaction with the real world is additionally possible. Enhancing the classical co-located robot operation scheme with a VE by using MR technology (cf. Figure 5) couples the operator directly with the real robot and the real world, while all relationships of the VR robot operation are maintained. We can conclude that mixed reality robot operation is the most flexible and complex case we can currently imagine for implementing a robotic user interface. In particular, how the interaction with the VE and the VE itself are connected to the real robot is a challenging question for MRRUI design and implementation.
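The growth in interconnections across the four operation schemes can be made concrete by writing the information links down as sets of directed edges. The following Python sketch is one possible reading of the prose above, not a reproduction of Figures 2-5; the entity names and the exact edge sets are illustrative assumptions.

```python
# Directed information links (source, target). All names are illustrative
# assumptions derived from the prose, not taken from the figures.
REMOTE = {
    ("human", "interface"), ("interface", "human"),   # control and feedback
    ("interface", "robot"), ("robot", "interface"),   # commands and robot state
    ("robot", "world"), ("world", "robot"),           # actuators and sensors
}

# Co-located operation adds direct physical interaction.
CO_LOCATED = REMOTE | {
    ("human", "robot"), ("robot", "human"),
    ("human", "world"), ("world", "human"),
}

# VR operation adds a virtual environment (VE) linked to operator and robot.
VR = REMOTE | {
    ("human", "ve"), ("ve", "human"),
    ("ve", "robot"), ("robot", "ve"),
}

# MR operation keeps all VR relationships and restores the direct links.
MR = CO_LOCATED | VR

for name, scheme in [("remote", REMOTE), ("co-located", CO_LOCATED),
                     ("VR", VR), ("MR", MR)]:
    print(f"{name:>10}: {len(scheme)} links")
```

Under these assumptions, the MR scheme is a superset of both the co-located and the VR scheme, which mirrors the statement that it is the most flexible and most complex case.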

On Classifying Mixed Reality Robotic User Interfaces
Much effort went into understanding and classifying a multitude of aspects relevant to the large fields of human-computer interaction (HCI) and human-robot interaction (HRI). These taxonomies, continua, rules, and guidelines each serve a special purpose. Although they contribute valuable terms and knowledge, combining them into a composed taxonomy reveals overlapping and conflicting models and risks distracting from the actual focus of MRRUI analysis.
Robinett [4] discusses the topic of synthetic experiences. He clarifies many aspects of how modalities are captured and transferred from the world to the human. The other direction, from the human to the world, concerns how to manipulate the world using technical devices. The bidirectional mediation discussed there is highly relevant for robotic UIs involving MR.
A taxonomy with VR as one of the extrema is the AIP-Cube proposed by Zeltzer [5]. Zeltzer identified three categories, Autonomy, Interaction, and Presence, as important and combined these as orthogonal axes in a cube-like taxonomy. Autonomy in particular is very relevant to robots in general, but in the context of VEs, the question arises where autonomy should reside: at the robot side, at the human side, or at the VE side. If we agree not to specify the location of an MRRUI in detail, but to define the UI as the interplay of different aspects and components involved in enabling the interaction, we can find a subset of suitable elements for classifying MRRUIs.
Presence, as defined by Slater and Wilbur [6] in the FIVE framework ("A Framework for Immersive Virtual Environments"), depends on individual mental models of different humans, and thus, the presence level of a certain implementation cannot be specified. Instead, immersion is found to be an appropriate technical factor, which can be determined objectively and also contributes to the resulting level of presence.
Strongly related to immersion is the definition of the reality-virtuality continuum by Milgram et al. [7] and display classes in the context of the continuum, thus contributing to the level of immersion [7][8][9]. Major structural properties of MRRUI are specified by these topics.
One very large field that is highly relevant to MR technology is 3D interaction. Bowman et al. [10] provided a very general discrimination of 3D user interfaces (3DUI) into three different applications: selection and manipulation, travel, and system state.
The classification of hand gestures by Sturman et al. [11] provided a very complete taxonomy, which takes the dynamics into account, as well as up to six degrees of freedom (DoF). The proposed classification is also applicable to gestures other than hand gestures. Interaction techniques using up to 6 DoF and their relevant properties were addressed by Zhai and Milgram [12] with the 6-DOF-Input-Taxonomy. Here, a strong focus lay on the mapping properties of the implementation of an actual input method. The interactive virtual environments (IVE) Interaction Taxonomy proposed by Lindeman [13] formalizes interaction methods with arbitrary DoF as a combination of an action type and a parameter manipulation type.
Robot classification is a difficult task, due to the polymorphic characteristic of robots. Some books, serving as an introduction to robotics, list a classification of robots based on their intelligence level, defined by the Japanese Industrial Robot Association (JIRA) [14,15]. Another classification from the Association Francaise de Robotique (AFR) describes their capabilities of interaction. Murphy and Arkin [16] introduced classical paradigms of robot behavior. However, there is no single general definition of a robot. Across the history of robotics, many accepted definitions appeared, but in the end, the choice depends on the specific community. Therefore, the question of how a user interface for robots is defined is almost untouched in the literature. In [15,17], the term robotic user interface (RUI) seems to be used for the first time. There, the authors defined four categories for classifying RUIs with regard to HRI-relevant factors. They took into account the level of autonomy, the purpose of the robot, the level of anthropomorphism, and the control paradigm resulting in a certain type of communication.
Regarding autonomy, there is a long history of research addressing the design and analysis question of what should be automated and to what extent. The oldest and most prominent conceptualization goes back to 1978 by Sheridan and Verplank [18]. The authors introduced levels of autonomy (LOA), a classification for rating the interaction with a computer along a discrete scale of autonomy. A refined version of LOA, published by Parasuraman et al. [19], offers a finer level of granularity by dividing tasks that can be automated into four different stages: acquisition, analysis, decision, and action. The next step involves dynamic assignments of autonomy levels, which are not fixed by design, but depend on the current state of the world [20]. Miller and Parasuraman [21] further improved the model by explaining how tasks at the four different processing stages consist of several subtasks, each with its own LOA. Thus, the resulting autonomy level at each stage is the result of a process of aggregation. Discussions of autonomy were not especially focused on issues of intelligent robots; rather, they targeted questions of industrial automation, e.g., the question of how to replace human workers by machines on specific tasks. This resulted in taxonomies that did not include specific demands and abilities of different kinds of robots. Beer et al. [22] proposed levels of robot autonomy (LORA). Schilling et al. [23] provided a very recent perspective on shared autonomy by taking multiple dimensions of relevance for robot interaction into account. A very interesting explanation of the term autonomy and its connection to intelligence and capabilities was provided by Gunderson and Gunderson [24]. The human-centered perspective of the effects of interaction with autonomous systems has been explained in detail by several authors [25][26][27].
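The idea that a stage-level autonomy value results from aggregating the LOAs of its subtasks can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: the text above does not specify the aggregation rule, so the arithmetic mean is used here purely as a placeholder, and the subtask values on a 1-10 scale are hypothetical.

```python
from statistics import mean

# The four processing stages named in the text.
STAGES = ("acquisition", "analysis", "decision", "action")


def aggregate_loa(subtask_loas):
    """Aggregate the subtask LOAs of one stage.

    The mean is an assumed aggregation rule, chosen only for illustration.
    """
    if not subtask_loas:
        raise ValueError("a stage needs at least one subtask")
    return mean(subtask_loas)


def stage_profile(subtasks_by_stage):
    """Return one aggregated LOA per processing stage."""
    return {stage: aggregate_loa(subtasks_by_stage[stage]) for stage in STAGES}


# Hypothetical subtask levels on an assumed 1-10 scale.
example = {
    "acquisition": [8, 6],
    "analysis": [5],
    "decision": [2, 2, 5],
    "action": [9],
}
profile = stage_profile(example)
print(profile)
```

Other aggregation rules (for instance the minimum, or a weighted mean) would fit the same structure; which one is appropriate depends on the model being applied.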
Important aspects in human-robot collaboration are the roles of the interaction partners, the structure of the interaction, and how the initiative and autonomy are distributed among the interacting partners [2,28].

Materials and Methods
As demonstrated by Adamides et al. [29], it is possible to develop a taxonomy from an extensive literature investigation of scientific publications, followed by a session of card sorting in order to cluster the resulting data. Further processing of the intermediate results can lead to a reasonable structure for the classification of an actual topic. Our aim was to create an objective taxonomy for specifying technical factors of MRRUIs. Thus, every aspect involving qualitative evaluation of the system is excluded from this work. Even though we believe that qualitative measures of MRRUIs are of importance, we argue that these factors belong in a separate work on MRRUI evaluation.

Procedure
In order to assess important factors for MRRUI classification, we collected high impact publications containing classifications from the fields of 3D interaction, mixed reality, user interfaces, immersive virtual environments, robot autonomy, automation, and many more. The decision to include or neglect a certain publication was governed by its citation count or the reputation of the main authors. In general, the most important authors were found by reading survey and review papers, followed by a combination of systematic and cumulative literature investigation on Google Scholar, ResearchGate, the IEEE Digital Library, and the ACM Digital Library. Then, we extracted factors that directly impact the nature of the underlying MRRUI from the investigated papers by unrolling the described taxonomies, definitions, and classifications. Typically, authors describe classifications of a certain aspect using a discrete list or a continuum, which allows relative ordering of actual instances. In this article, a single property of this kind is called a factor. After determining factors in this way, we tried to find aspects not covered so far and identified three more factors that were important in our previous work: "interaction reality", "location", and "system extent". The resulting list contained 62 different, mostly technical factors (cf. Table A1), all being useful to discriminate different approaches of robotic user interface design involving MR. Owing to the technical nature of these factors, their actual values are easy to determine without the need to be experienced with the actual system and without the interference of individual mental models of different specific users or operators. The achieved objectivity of the selected factors makes them useful for other researchers and designers in classification tasks.
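The notion of a factor as either a discrete list or a continuum that allows relative ordering of instances can be sketched as a small data structure. The Python example below is illustrative only; the class, its method names, and the level names are hypothetical and are not taken from Table A1.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Factor:
    """A single classification property (hypothetical representation)."""
    name: str
    # Ordered discrete levels, or None if the factor is a continuum.
    levels: Optional[Tuple[str, ...]] = None

    def is_discrete(self) -> bool:
        return self.levels is not None

    def rank(self, value) -> float:
        """Position of a value, usable for relative ordering of instances."""
        if self.levels is None:
            return float(value)          # continuum: the value is its own position
        return float(self.levels.index(value))


# Level names are hypothetical placeholders.
location = Factor("location", levels=("remote", "co-located"))
# Treated as a continuum in [0, 1] for this sketch (assumption).
vividness = Factor("vividness")

print(location.rank("remote") < location.rank("co-located"))
print(vividness.rank(0.4) < vividness.rank(0.7))
```

Both kinds of factor expose the same `rank` operation, so two systems can be ordered along any factor without knowing whether it is discrete or continuous.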
After extracting these factors, we applied open card sorting in order to cluster them into named groups. Due to the high number of relevant factors, we were especially interested in these groups in order to create a more granular taxonomy for a better understanding of the categories relevant to the special case of MRRUI design. The card sorting was prepared by noting down every factor from Table A1 on the front side of a card. The description of each factor (cf. Table A1) was written on the back of the card. To avoid strong bias by the authors, the participants were instructed not to look at the back of the cards until every card was assigned to a named category. For the experiment, 9 experts (3 female, 6 male) from HCI research with a focus on virtual reality, augmented reality (AR), and interaction with 3D user interfaces were recruited to perform open card sorting on a shared set of cards in an open group form. Participants were allowed to enter and leave the session as they liked during a 3-h time slot. The active times of the participants were logged. The session took 2 h and 40 min, and as a result, the 62 factors were clustered into 13 groups in an iterative approach. Each participant spent on average 35 min on the task (SD ≈ 18.5 min). The log revealed a total temporal effort of 5 h and 17 min. During the session, audio recordings were made, and the participants were instructed to think aloud. This was intended to identify misunderstandings of the tasks and to better understand the decisions of the participants. Participants were allowed to note down comments on the cards themselves and to create new cards or to remove already existing cards if an explanation was provided. No comments were made, and no card was removed. One card, DoF, was duplicated by the participants and integrated into two different categories, interaction parameters and input.
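The reported times are mutually consistent, which can be checked with simple arithmetic. The short Python sketch below uses only the figures given above; the per-participant times are not available, so the standard deviation cannot be recomputed, and the derived average number of simultaneously active participants is an observation added here, not a result reported by the study.

```python
# Figures reported in the text.
participants = 9
total_effort_min = 5 * 60 + 17      # 5 h 17 min of summed active time
session_min = 2 * 60 + 40           # 2 h 40 min session duration

mean_per_participant = total_effort_min / participants
mean_concurrency = total_effort_min / session_min

print(f"mean active time per participant: {mean_per_participant:.1f} min")  # 35.2 min
print(f"mean number of active participants: {mean_concurrency:.2f}")        # 1.98
```

The mean of about 35.2 min agrees with the reported average of 35 min, and the ratio of total effort to session length suggests that roughly two participants were active at any given time.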
Following the card sorting session, the resulting groups were further clustered by the authors into four major categories in order to create a more memorable structure. The session log was considered during the decision process to reduce the influence of the authors' own view. This step of further categorization is important to increase the likelihood that the resulting taxonomy is adopted by other researchers. The resulting categories were further discussed with two researchers from our department, who did not coauthor this contribution but took part in the card sorting experiment, until the result revealed a reasonable structure without logical flaws (cf. Table 1).

Card Sorting Results
Based on a literature investigation, a card sorting experiment with HCI professionals was performed. Further analysis of the card sorting results revealed a general structure for the classification of MRRUIs with four major categories, as shown in Figure 6. The detailed results of the two clustering steps are summarized in Table 1, listing each of the factors according to its associated group and category. The main categories provide a plausible and memorable structure with the aim of improving the workflow during classification or design tasks. The groups of the next detail level resulted directly from the iterative approach of the card sorting experiment, in which all identified factors from the literature investigation (cf. Table A1) were clustered into groups. Many of these sources were listed in the related work (cf. Section 2).
Table A3 summarizes all factors on a single page, which serves as a template for profiling arbitrary MRRUI designs. Figure 6 together with Table 1, containing the results of the clustering, Table A1, the alphabetical list for reference and explanation, and the empty template in Appendix A (cf. Table A3) serve as a framework for classification and design of MRRUIs. As demonstrated in the following section, using the IMPAct framework enables very precise and extensive analysis and description of relevant systems. Since we included only objective, mostly technical factors that are under the control of the system designer, we conclude that the results represent a useful framework for fostering diversity and effectiveness in MRRUI research and development.
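The two-level structure of categories and groups lends itself to a nested template that can be filled in while profiling a system. The Python sketch below lists only the groups that are named in the validation section of this article; the groups of the Acting category are not named in this part of the text and are therefore left empty. The helper names are hypothetical, and the sketch is not a substitute for Table 1 or Table A3.

```python
# Categories and the groups named in this article's validation section.
FRAMEWORK = {
    "Interaction": ("interaction parameter", "paradigm"),
    "Mediation": ("image/display/vision", "input", "output"),
    "Perception": ("embodiment", "immersion/user perception",
                   "modalities", "space and time"),
    "Acting": (),  # groups not named in this part of the article
}


def empty_profile(framework):
    """Create an empty category -> group -> {factor: value} template."""
    return {
        category: {group: {} for group in groups}
        for category, groups in framework.items()
    }


def set_value(profile, category, group, factor, value):
    """Record the value of one factor, rejecting unknown groups."""
    if group not in profile[category]:
        raise KeyError(f"unknown group {group!r} in category {category!r}")
    profile[category][group][factor] = value


profile = empty_profile(FRAMEWORK)
set_value(profile, "Perception", "embodiment", "Extent of Body Matching", "no VB")
print("".join(category[0] for category in FRAMEWORK))  # IMPA
```

Keeping the group level explicit preserves the structure that emerged from the card sorting, while the category level provides the mnemonic behind the framework's name.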

Validation
In this section, the results are applied to an MRRUI from a recent publication.
Example: Application of the Framework for System Classification
In the following, an MRRUI for a pick-and-place task [30] is classified using the IMPAct framework. A detailed analysis of the technical properties in the categories interaction, mediation, perception, and acting, including their subgroups, is listed and, if necessary, further explained to give readers a complete understanding of the setup and implementation.

Interaction
Interaction is represented by the two groups interaction parameter (cf. Table 2) and paradigm (cf. Table 3). The actual values of the classification of the analyzed system are noted in the tables. Further explanations are given in the following paragraphs.
Some characteristics of the interaction depend on the interaction level of the robot. The AFR describes Class D as an extension of Class C, adding to the robot the capability of acquiring information from its environment. Class C itself is defined as "programmable, servo controlled robots with continuous or point-to-point trajectories" [14]. The degree of spatial matching, expressed by the factor Directness, is probably lowered by sensor error. This involves the detection of the position of the targets for grasping by the robotic system and the registration process within the implementation of marker detection at the main UI device for aligning the virtual world components with their actual alter egos. The interaction method for selecting targets requires the human to be at an appropriate position and to look at a certain point on the surface of the target. One could argue that this method involves 6 DoF, 3 for defining a position of the human's head in the real world and another 3 for rotating it as desired. Mathematically, the rolling rotation around the forward vector of the head is not used, so 5 DoF is correct as well.
The interaction with real-world objects is mediated through the robotic hardware after selection using virtual counterparts of the actual targets. Thus, the interaction reality is rated as virtual. Nevertheless, there is a kind of passive interaction as well. By utilizing a see-through HMD, the actual targets are observable in the real environment while planned trajectories are displayed, which are visualized by actuating the virtual model in a loop with the very same trajectories. Even if selection is the most prominent active interaction concept, system control is identifiable here as well, when voice commands are used to start a specific action. Then, the state machine of the system moves over to a different state, changing the way of interaction in the next state; e.g., when a picking trajectory is confirmed with a voice command, the system proceeds to the execution state, and after finishing the grasping, it is ready to receive a command for placing the grasped object on the table.
The current system design in the experiment described by Krupke et al. [30] is for single users, single robots, and performing a single pick-and-place task. However, the implementation would also allow multiple operators to be logged into the system simultaneously. The robot itself had integrated joint controllers, which were accessed by the dedicated industrial computer provided by the manufacturer of the robot. The control computer sent goal positions to the joint controllers. The low-level joint controllers themselves used a feedback control loop to keep joints at a desired position and to alert the control computer about position mismatch, unreasonably high currents, or other issues. During virtual or actual robot movements, the operator served as a supervisor, but during the planning steps of selecting a pick position or selecting a place position, he/she fulfilled the role of a leader. The RUI-type is classifiable as interactive, but with a tendency towards very high-level programming.
Regarding the history of user interfaces, the system may be called a spatial or supernatural user interface: spatial because the user is situated in the very same place as the operation and benefits from exploring the RE by natural walking and looking around; supernatural because he/she can see the probable future. Despite this focus of interaction, there were also classical graphical user interface (GUI) elements realized in the implementation. A head-up display showed the system state in textual form and gave feedback about the success of a given command. Regarding the way commands were given by the operator, the term natural user interface (NUI) is applicable because voice commands triggered the main functions of the system.
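The state-dependent interaction described above can be sketched as a small finite state machine. The following Python example is a minimal illustration, not the implementation of the analyzed system: the state names, event names, and the return to pick selection after placing are assumptions made for this sketch. The mapping of operator roles follows the text (leader while selecting, supervisor while the robot moves).

```python
# (state, event) -> next state. State and event names are hypothetical.
TRANSITIONS = {
    ("select_pick", "voice_confirm"): "execute_pick",
    ("execute_pick", "motion_finished"): "select_place",
    ("select_place", "voice_confirm"): "execute_place",
    # Returning to pick selection after placing is an assumption.
    ("execute_place", "motion_finished"): "select_pick",
}

# Operator role per state, following the description in the text.
OPERATOR_ROLE = {
    "select_pick": "leader",
    "select_place": "leader",
    "execute_pick": "supervisor",
    "execute_place": "supervisor",
}


def next_state(state, event):
    """Return the follow-up state; events that do not apply are ignored."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="select_pick"):
    """Feed a sequence of events and return the visited states."""
    trace = [state]
    for event in events:
        state = next_state(state, event)
        trace.append(state)
    return trace


trace = run(["voice_confirm", "motion_finished", "voice_confirm"])
print(trace)
print([OPERATOR_ROLE[s] for s in trace])
```

Making the states explicit shows why system control is a separate interaction concept here: the same voice command has a different effect depending on the current state.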

Mediation
Mediation consists of three groups: image/display/vision (cf. Table 4), input (cf. Table 5), and output (cf. Table 6). The actual values of the various factors are collected in the tables. The following paragraphs contain further explanations.
The level of vividness is mainly limited by the hardware and the software selected for rendering. The rating takes into account that the first version of the Microsoft HoloLens, with its small field of view, was utilized, as well as the general quality of the see-through display regarding resolution, color range, and brightness. In addition, the quality of the real environment (RE) content was negatively influenced by looking through the head-mounted display (HMD), causing overall mid-to-low vividness. The action measurement of the operator was mainly provided by the integrated self-localization feature of the Microsoft HoloLens. Since the relevant information consists of three-dimensional coordinates in the reference frame of the real world, marker detection provided a reference anchor for the transformation of local device coordinates to RE coordinates. The directness of sensation can principally be regarded as direct. The RE was directly viewed, and the virtual components were supposed to be anchored at corresponding points of their alter egos. Without sensor errors, this would be perfectly direct as well, but in our implementation, there was some minor perceptible error.
From the view of the robot middleware, the system input was just a single 3 DoF position within the reference frame of the target object, together with the name of that object. On the lower level, during user input, walking was used to reach a certain position in the RE, then the posture of the body might be altered, and finally, the head was turned around the pitch and yaw axes to move the cursor along the surface of the target object until the desired position was reached. Typically, the system used static posture as the gesture type for defining pick and place positions and confirmed the current selection with a voice command. The second condition used the pointing gesture of the index finger; thus, a motionless finger posture was used in this case.
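Moving a cursor over a target surface by head position, pitch, and yaw amounts to intersecting a gaze ray with that surface. The Python sketch below illustrates this with a planar surface; the planar target, the axis convention, and all function names are simplifying assumptions and do not reflect the HoloLens API or the implementation of the analyzed system. It also shows why roll does not matter: the viewing direction depends on yaw and pitch only, which gives 3 + 2 = 5 DoF.

```python
import math


def forward_vector(yaw, pitch):
    """Unit viewing direction from yaw and pitch in radians.

    Assumed convention: x right, y up, z forward; roll is not needed.
    """
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gaze_cursor(head_position, yaw, pitch, plane_point, plane_normal):
    """Intersect the gaze ray with a planar surface; None if there is no hit."""
    direction = forward_vector(yaw, pitch)
    denominator = dot(direction, plane_normal)
    if abs(denominator) < 1e-9:          # gaze runs parallel to the surface
        return None
    offset = tuple(p - h for p, h in zip(plane_point, head_position))
    distance = dot(offset, plane_normal) / denominator
    if distance < 0:                     # surface lies behind the operator
        return None
    return tuple(h + distance * d for h, d in zip(head_position, direction))


# Operator at eye height 1.5 m looking straight ahead at a wall 2 m away.
cursor = gaze_cursor((0.0, 1.5, 0.0), 0.0, 0.0, (0.0, 0.0, 2.0), (0.0, 0.0, -1.0))
print(cursor)
```

The resulting point is the single 3 DoF position that, in the terms used above, would be handed over as system input, even though five degrees of freedom of the operator's head contributed to it.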

Factor            Value
Actuator Type     robot arm and gripper
Display Type      see-through HMD

Perception
In the category perception, four groups, embodiment (cf. Table 7), immersion/user perception (cf. Table 8), modalities (cf. Table 9), and space and time (cf. Table 10), were analyzed. Relevant factors and their actual values are collected in the tables. Further explanations are provided in this section.

Factor                               Value
Extent of Body Matching              no VB
Extent of Proprioceptive Matching    virtual cursor and viewing direction

In the analyzed system, there was no virtual body (VB) involved. One could argue that a cursor, which was augmented into the view, matched the definition of a virtual body at least to a small degree. In this case, it should be mentioned that there was no matching between the moving bodies, cursor, and head. However, the proprioceptive matching was appropriate in terms of visualizing the viewing direction and its spatial cues when projecting the cursor on top of the surface of viewed objects.
The extent of presence metaphor (EPM) is rated very high since the human was mostly looking at the RE, but through a see-through HMD. This fact can potentially lower the EPM in comparison to looking directly at the RE. Only some elements like the robot, a table, and some objects for grasping were virtual. However, these elements were anchored with reference to RE coordinates in a very stable way, resulting in a very good integration into the RE. The integration of the VE by superimposing the virtual robot on top of the actual robot caused a logical mismatch with natural perception, since human perception usually does not allow the existence of two objects at the very same place. Regarding the level of MR, since only a few objects had virtual counterparts, we can conclude that the RE was very dominant. Artificial-looking objects, which were perfectly integrated into the real environment, caused a moderate reproduction fidelity since their level of detail was quite low and simple shaders were utilized. The causality of modalities was based on a mixture of data: the actual posture of the robot was read, transmitted, and finally applied to the virtual model, which matched the real robot almost perfectly. Some displayed trajectories were computed and represented only options for the future, which could become real. Regarding inclusiveness, it should be mentioned that very bright virtual content on the see-through display can sometimes occlude or distract from parts of the real world. Except for this side-effect in the analyzed system, operators were not shut off from the real world.
The user experience was classified by three different concepts of data flow (cf. Figures 7-9), which were all present during system use and describe different aspects of the system. Transmitted experience (cf. Figure 7) represents especially how information like the current pose of the robot was transferred to the human. Simulated experience (cf. Figure 8) describes what happens when a new target position is selected and the generated trajectory from the robot middleware is sent to the HMD. Robot supervised by human (cf. Figure 9) represents the case when the robot started to manipulate the world and the human was interacting with a VE.
Regarding location, it should be mentioned that the analyzed system is intended to be used side-by-side with the actual robot, but technically, it also works as a remote system. Unfortunately, the cues provided by the RE are then missing, and the potential for recognizing problematic situations with probable collisions is reduced. Even if the real-time level of the system is classifiable as real time, it is not designed for continuous real-time control from the interaction point of view. Evaluating the factor surrounding reveals that, despite the low FoV, the tetherless operation and the inside-out 6 DoF tracking of the HMD resulted in a spatial experience. The VE can be viewed from arbitrary positions and explored by walking around. Thus, even with the visual experience of looking through a looking glass, spatial presence was generated and improved the resulting experience of the surrounding factor in comparison to simple 2D video glasses with the same FoV. For the system extent, it should be mentioned that the HMD was self-contained, making it a head-mounted computer (HMC). It was connected by WLAN with the computer controlling the robot and, if necessary, extendable by additional network-based processing nodes.

Acting
Acting consists of the four groups autonomy (cf. Table 11), behavior (cf. Table 12), robot appearance (cf. Table 13), and application (cf. Table 14). The actual values of the classification of the analyzed system are noted in the tables. Further explanations are given in the following paragraph. The level of intelligence was divided between two devices. The robot control computer calculated collision-free trajectories according to its dynamic planning world. The input and output device of the operator, the Microsoft HoloLens, assisted in the selection of grasp points, according to its programming. The transparency of the system state was limited by the implementation. All modeled states were displayed during system use in the head-up display element of the UI.

Level of Robot Anthropomorphism | very low
The robot incorporated only a tendency towards anthropomorphism since the gripper was three-fingered and had some similarities to a humanoid hand, e.g., bending of joints is only possible in one direction when starting from a configuration in which all fingers are completely straight.

Robot Purpose | tool
Robot Type | industrial
In the proposed setup, an industrial robot was used as a tool for manipulation tasks in a tabletop scenario.

Discussion
This section discusses the experimental results and explains their potential impact on MR robotics. Furthermore, the limitations of this contribution and future ideas are explained.

Conclusions and Discussion
In this article, the IMPAct framework for classification and analytical design of mixed reality robotic user interfaces was presented and applied to a robotic pick-and-place system utilizing the Microsoft HoloLens. Relevant factors were carefully investigated from the scientific literature. Prominent taxonomies from relevant fields were decomposed to find important factors that directly influence the system and can be determined regarding their actual values. Since many of these factors represent a position on a continuum and are not exactly measurable, determining their values is, in general, a difficult task. Nevertheless, it was demonstrated that even with these inaccuracies, very detailed descriptions of existing systems are possible. The framework can be regarded as an important and necessary step towards holistic system design and description of MRRUIs.
The authors believe that using the proposed IMPAct framework for exhaustive system descriptions helps to remove ambiguity, as demonstrated in Section 4. By taking an exhaustive list of relevant factors into account, differences from other systems, as well as unique features, become clear. Additionally, the framework provides a methodology for improving actual system designs by providing a standardized template, which enforces intentional decision making.
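As a rough illustration of how a standardized description can make differences between systems visible, the following Python sketch compares two factor-value descriptions. The first dictionary reuses values reported above for the analyzed system; the second system, its values, and the helper name `differing_factors` are hypothetical and exist only for this example. It is a minimal sketch, not part of the framework itself.

```python
from typing import Dict, List, Tuple

# Factor values reported above for the analyzed HoloLens pick-and-place system.
analyzed_system: Dict[str, str] = {
    "Extent of Body Matching": "no VB",
    "Level of Robot Anthropomorphism": "very low",
    "Robot Purpose": "tool",
    "Robot Type": "industrial",
}

# Hypothetical second system; these values are invented for illustration only.
hypothetical_system: Dict[str, str] = {
    "Extent of Body Matching": "no VB",
    "Level of Robot Anthropomorphism": "high",
    "Robot Purpose": "companion",
    "Robot Type": "industrial",
}


def differing_factors(
    a: Dict[str, str], b: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (factor, value_a, value_b) for every factor whose values differ.

    Factors present in only one description are reported as not assessed
    in the other one.
    """
    missing = "(not assessed)"
    result = []
    for factor in sorted(set(a) | set(b)):
        value_a = a.get(factor, missing)
        value_b = b.get(factor, missing)
        if value_a != value_b:
            result.append((factor, value_a, value_b))
    return result


for factor, value_a, value_b in differing_factors(analyzed_system, hypothetical_system):
    print(f"{factor}: {value_a} vs. {value_b}")
```

Because both descriptions use the same factor names, the comparison reduces to a key-wise lookup, which is the practical benefit of a shared template.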

Limitations and Future Work
Even though the authors performed an extensive literature investigation, it is likely that some relevant aspects are missing in the list of factors and thus are not regarded in the proposed framework. Especially, if certain topics are further developed and new taxonomies arise, the proposed taxonomy should be adapted to these changes and then incorporate new relevant factors of the given topic. In general, the authors believe that a description of MRRUI, covering as many relevant aspects as possible, is desirable. The proposed framework would further benefit from efficient means to assess the values of the listed factors in a more comfortable and time-saving way. An implementation as a wizard-based GUI tool with prepared options, tooltips, and web links for explanations of the single factors would reduce the current workload.
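The core loop of such a wizard could look roughly like the following Python sketch. It is not an existing tool: the names `PREPARED_OPTIONS`, `run_wizard`, and `pick_first` are hypothetical, the three factors and their options are taken from the factor table of this article, and the GUI is replaced by a callback so that the logic stays independent of any toolkit.

```python
from typing import Callable, Dict, List

# Prepared options for three categorical factors, copied from the factor table.
PREPARED_OPTIONS: Dict[str, List[str]] = {
    "Gesture Type": [
        "static posture",
        "static oriented posture",
        "dynamic/moving gesture",
        "moving oriented gesture",
    ],
    "Intelligent Robot Control Paradigm": ["hierarchical", "reactive", "hybrid"],
    "Interaction Type": ["Selection and Manipulation", "Travel", "System Control"],
}


def run_wizard(
    options: Dict[str, List[str]],
    choose: Callable[[str, List[str]], int],
) -> Dict[str, str]:
    """Ask for one option per factor and return the resulting description.

    `choose` receives the factor name and its options and returns the index
    of the selected option; a GUI dialog or a console prompt could be plugged
    in here (assumption: one answer per factor).
    """
    description: Dict[str, str] = {}
    for factor, choices in options.items():
        index = choose(factor, choices)
        if not 0 <= index < len(choices):
            raise ValueError(f"Invalid selection {index} for factor '{factor}'")
        description[factor] = choices[index]
    return description


def pick_first(factor: str, choices: List[str]) -> int:
    """Scripted stand-in for user input: always selects the first option."""
    return 0


description = run_wizard(PREPARED_OPTIONS, pick_first)
for factor, value in description.items():
    print(f"{factor}: {value}")
```

The scripted `pick_first` selection is arbitrary and does not describe the analyzed system; in an interactive tool, the callback would show the options together with tooltips and links.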
The proposed work was limited to objective and technical measures, which are assessable without any practice with the analyzed or described MRRUI. Qualitative aspects were neglected in this contribution, but are valuable feedback for further improvements in system design. An extension of the IMPAct framework would explain how the objective factors are related to perceptive and qualitative factors. Then, the framework would provide means for holistic re-design as a part of an iterative design process of MRRUI. Furthermore, some robotic web archives are emerging (cf. https://robots.ieee.org/robots/). Currently, there is no such site for MRRUI. The framework proposed in this article could serve as a basis for creating an MRRUI archive.

Factor | Values | Description
Extent of Body Matching [6] | no correlation–perfect matching | In case of the existence of a virtual body: How strongly does the virtual body match the corresponding real body?
Extent of Proprioceptive Matching [6] | no correlation–perfect matching | In case of the existence of a virtual body: How strong can the proprioceptive matching be?
Extent of Presence Metaphor (EPM) [7,8] | WoW/monoscopic imaging vs. HMD/real-time imaging | How strongly is the display technology likely to induce a sense of presence? Or better: How immersive is the technology?
Extent of World Knowledge (EWK)/where and what [7,8] | unmodeled–modeled | How much of the real world is known in detail?
Gesture Type [11] | static posture, static oriented posture, dynamic/moving gesture, moving oriented gesture | If so, what category of gesture is used?
Human-Robot Communication [27] | single dialog–two (or more) monologues | Is the communication between human and robot based on equal rights?
Inclusiveness [6] | none–complete | To which degree are you shut off from the modalities of the real world around you?
Initiative [31] | fixed–mixed | How is the initiative of acting or communicating distributed between human, robot, and the rest of the system? Fixed at a single actor or distributed between several partners?
Integration of VE [12] | separated–integrated | Spatio-temporal relationship between real environment and virtual environment.
Intelligent Robot Control Paradigm [15] | hierarchical, reactive, hybrid | How is the intelligent behavior of the robot organized?
Interaction Reality 1 | physical–virtual | Is the interaction mainly with real or virtual objects?
Interaction Type [10] | Selection and Manipulation, Travel, System Control | Which of Bowman's 3D interaction categories is mainly present?
JIRA Intelligence Level of the Robot [14] | Class 1–Class 6 | Which class of intelligence best fits the robot involved in the system?
Level of Capabilities (cf. [24]) | System skills are not enough to solve/act out any task–system is capable of acting out at least any task perfectly | How (potentially) skilled is the system, independent of the operator's skills?
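Many of the factors above are continua between two named endpoints rather than fixed categories. A minimal Python sketch of one possible machine-readable encoding is given below. The numeric position in [0, 1], the class name `ContinuumFactor`, and the example rating of 0.1 are assumptions made for illustration; the framework itself does not prescribe a numeric scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContinuumFactor:
    """A factor whose value is a position between two named endpoints."""

    name: str
    low: str   # endpoint label at position 0.0
    high: str  # endpoint label at position 1.0

    def describe(self, position: float) -> str:
        """Render a rating; `position` is an assumed normalized scale in [0, 1]."""
        if not 0.0 <= position <= 1.0:
            raise ValueError(f"{self.name}: position must lie in [0, 1]")
        nearer = self.low if position < 0.5 else self.high
        return (
            f"{self.name}: {position:.2f} on '{self.low}' to '{self.high}' "
            f"(nearer '{nearer}')"
        )


# Endpoints copied from the factor table above.
INCLUSIVENESS = ContinuumFactor("Inclusiveness", "none", "complete")
INTEGRATION_OF_VE = ContinuumFactor("Integration of VE", "separated", "integrated")

# Illustrative rating only: 0.1 is an assumed number, not a measured value.
print(INCLUSIVENESS.describe(0.1))
```

Keeping the endpoint labels next to the number preserves the meaning of the scale, which matters because such positions are estimates rather than exact measurements.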

Figure 4. Virtual reality robot operation (inspired by the remote robot operation by Crandall et al. [3]).

Figure 5. Mixed reality robot operation (inspired by the remote robot operation by Crandall et al. [3]).

Figure 6. The overall structure of the mixed reality robotic user interface taxonomy. MRRUIs are directly shaped by the aspects of interaction, mediation, perception, and acting. These groups, including their relevant subcategories and the factors they contain, form the IMPAct framework for MRRUI classification and design.

Table 1. Results of the card sorting.

Table 10. Space and time.