Systematic Review of Multimodal Human–Computer Interaction

Abstract: This document presents a systematic review of Multimodal Human–Computer Interaction. It shows how different types of interaction technologies (virtual reality (VR) and augmented reality (AR), force and vibration feedback devices (haptics), and tracking) are used in different domains (concepts, medicine, physics, human factors/user experience design, transportation, cultural heritage, and industry). A systematic literature search initially identified 406 articles. From these articles, we selected 112 research works that we consider most relevant for the content of this article. The articles were analyzed in depth from the viewpoint of temporal patterns, frequency of use of technology types in different domains, and cluster analysis. The analysis allowed us to answer relevant questions in searching for the next steps in work related to multimodal HCI. We looked at the typical technology type, how the technology type and frequency have changed over time in each domain, and how papers are grouped across metrics given their similarities. This analysis determined that VR and haptics are the most widely used in all domains. While VR is the most used, haptic interaction is presented in an increasing number of applications, suggesting future work on applications that combine VR and haptics.


Introduction
We present a systematic review of multimodal human-computer interaction (HCI) with the primary objective of showing how different types of technologies are used in different subject areas, herein called domains. Various domain-specific surveys and reviews have recently been published. In particular, studies have focused on VR use in manufacturing [1,2], VR in education [3], haptic interaction in medicine [4], orthopedic surgery [5], medical training [6], wearable haptics [7], VR systems in museums [8], and cultural heritage [9].
We are not aware of a systematic review covering multiple technologies used in different application areas. Most of the above-mentioned studies analyze, in-depth, a small number of domains and types of technology (e.g., VR in manufacturing, haptics in medicine, etc.). There is a need for a more extensive study covering various domains that does not provide a detailed review but paints a larger horizontal picture. Thus, a critical research endeavor would be to identify the combination of multiple domains and various technologies for the development and research of HCI. To make advances toward this research gap, this study addresses the research questions described in Section 2.
Multimodal interfaces offer efficient and expressive human-computer interaction. The term "multimodal" focuses on the combination of human perception channels (vision, touch, hearing, taste, and smell) to involve various human abilities (communication, cognition, and perception) in order to improve the user's understanding of what is being presented computationally [10]. This is achieved by including sensory technologies such as haptic displays, virtual reality, and augmented reality. This systematic review covers multimodal human-computer interactions based on the use of types of technologies such as haptic displays, virtual reality, or augmented reality, and the use of devices that allow specific one-directional and bidirectional interactions. This review also presents the bases to guide researchers towards possible intersections that appear mainly in the domains and technologies mentioned in this work. Furthermore, this study also identifies how research and technology developments have been carried out over the past few years. The primary contribution of this study is in providing an overview of what has been researched and developed to date and serves as a guide to identify and develop future research by working with multimodal interactions in several domains.

Types of Technologies
We base the categorization used on Displays and Interactions in previous works by Anthes et al. [11] and Hornbaek et al. [12]. In addition to the information provided by the works reviewed for this systematic review, this survey focuses on various types of displays (visual 2D, immersive (VR), augmented, and haptics), types of interaction (touch, vibration, wind, temperature, audio, and gizmo), and types of tracking (object, eye, hand, head, and body). We do not focus on the type of graphics primitives and underlying data (points, meshes, voxels, etc.), whether temporal, or the kind of visualization used; for these topics, we refer the reader to surveys [13][14][15]. Despite recent advances in computer hardware, this survey also does not focus on the specific hardware used. The following sections describe the types of technologies that are the focus of our research in order to define the scope of our study.
Displays are devices that convey information to the user. They can be categorized as dimensionless (such as audio), 1D (for example, displays that present messages in the Braille alphabet), 2D, and 3D, the latter also called virtual reality (VR) or immersive displays [16]. While 2D displays do not require any knowledge of the user's position, immersive systems need to track the user's viewing position and direction to synchronize the display with the user's motion.
Augmented reality enables an interactive experience in which the real world is enhanced with computer-generated perceptual information [17], allowing users to combine the real world with various computer-generated content [18].
Haptic displays represent the sense of touch through force-feedback devices, which generate forces that allow the user to apply pressure to explore virtual objects [19]. This type of display provides kinesthetic and tactile information about the virtual environment via sensors, control circuits, and actuators that vibrate or exert force [20]. For example, a button that triggers a dangerous action within a virtual environment can be made difficult to press [19].
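As a minimal sketch of how a force-feedback display could make such a button harder to press, the following Python snippet uses a penalty (spring) model, in which the resisting force grows linearly with how far the button is pushed in. This model is an assumption chosen for illustration and is not taken from the works cited above; the function name `button_force`, the stiffness value, and the `danger_scale` parameter are hypothetical.

```python
def button_force(depth_m, stiffness_n_per_m=200.0, dangerous=False, danger_scale=4.0):
    """Return the resisting force (N) for a virtual button pressed depth_m metres.

    Penalty (spring) model: force = stiffness * penetration depth.
    All numeric values are illustrative assumptions, not measured parameters.
    """
    if depth_m <= 0.0:
        return 0.0  # the finger is not pressing the button
    # Buttons that guard dangerous actions get a stiffer virtual spring.
    k = stiffness_n_per_m * (danger_scale if dangerous else 1.0)
    return k * depth_m


safe = button_force(0.005)                    # ordinary button, 5 mm press
risky = button_force(0.005, dangerous=True)   # button guarding a dangerous action
print(f"safe: {safe:.2f} N, risky: {risky:.2f} N")
```

In a real device this force would be recomputed inside the haptic control loop at each update; the sketch only shows the force law itself.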
Interactions include one-directional communication with the computer device, such as using a mouse or touch-based devices that scan the applied pressure.
Bidirectional interactions include devices that provide vibration, haptic devices that apply forces, and even wind and temperature.
A special kind of interaction uses various devices that we call gizmos. Some authors define a gizmo as a mechanical device or gadget that is used to perform a mechanical procedure [21]. These interactions are related to the use of controls or command devices to interact with the system by using ad hoc hardware components.
The last kind of interaction is tracking, which monitors objects, humans, or parts of human bodies in space and encodes this information into electric signals that are interpreted by the computer. This systematic review considers object, eye, hand, head, and body tracking.

Domains
We based the definition of domains on the subject areas proposed by Freina and Ott [3] (e.g., medicine, physics, computer science, social science, materials science, and engineering), Vera-Baceta et al. [22] (e.g., arts and humanities, life sciences and biomedicine, physical sciences, social sciences, and technology), Garcia et al. [23] (e.g., arts and humanities, computer science, medicine, physics and astronomy, and social sciences), and on the information provided by the research publications analyzed for this systematic review. We link the use and the advances of these categories to the following domains: Concepts and overviews, Medicine, Physics, Transportation, Cultural heritage, Industry, and Human factors/User experience design (UX). It is important to mention that since UX can be viewed as transversal to all disciplines and we consider that multimodal interaction enriches and improves user experience, it was added to the study. The selection of the domains is a result of careful analysis of a vast body of papers and is detailed in Section 2. This survey shows how each discipline utilizes various modalities and inspires future work.

Methods
This systematic review follows the structure and methods according to the guidelines for performing Systematic Literature Reviews described by Page et al. [24], the guidelines of Kitchenham and Charters [25], Xiao and Watson [26], and Torres-Carrión et al. [27]. We developed a review protocol to guide our research. In particular, the search strategy was conducted by using the following steps: (1) definition of research questions; (2) search method for the identification of studies; (3) quality assessment; (4) paper inclusion and exclusion criteria; (5) data collection; and (6) data analysis first presented as a synthesis of the manuscripts identified, followed by the response to the research questions.

Research Questions
In order to guide our study towards the objective of determining the effort made in multimodal interaction and the next steps in multimodal HCI work, we identified the following four research questions:

RQ1: How has the type of technology changed over time in each domain?
RQ2: What is the typical technology type by domain?
RQ3: How has the frequency of research publications changed over time by domain?
RQ4: How are research publications grouped across metrics given their similarities?

Keyword Identification
We first identified keywords related to human-computer interaction, virtual reality, augmented reality, and haptic devices.
The selection process started from general keywords related to the scope of this work, such as human-computer interaction, virtual reality, augmented reality, and haptics. Then, combinations of these words were used to find results where more than one of the terms were combined (e.g., haptic virtual reality, human-computer interaction in virtual reality). Finally, the keyword combinations were paired with the domain name and terms directly related to each domain to identify works related to each specific domain (e.g., virtual reality in transport, virtual reality haptics in museums, human-computer interaction in manufacturing). Table 1 shows the keywords used to construct the combination of queries.

Study Identification Search Method
We searched for research works in which multimodal human-computer interactions were presented based on the use of haptic displays, virtual reality, or augmented reality. We used the Purdue University Library to identify relevant studies related to the objective of this systematic review. The library provides complete access to 676 databases and most existing sites and publishers. We also used Google Scholar to find studies related to the objective of this systematic review.

Domains | Keywords
Concepts and overviews | Human-computer interaction, virtual reality, augmented reality, haptic, visualization, and behavioral theories
Medicine | Human-computer interaction, virtual reality, augmented reality, haptic, medicine, surgery, training, rehabilitation, and dentistry
Physics | Human-computer interaction, virtual reality, augmented reality, haptic, physics, surfaces, object grasping, fluid mechanics, electromagnetism, dynamic systems, astrophysics, and molecular physics
Transportation | Human-computer interaction, virtual reality, augmented reality, haptic, transportation, driving, and flight
Cultural heritage | Human-computer interaction, virtual reality, augmented reality, haptic, cultural heritage, museum, archaeology, and tourism
Industry | Human-computer interaction, virtual reality, augmented reality, haptic, industry, and manufacturing
Human factors/User experience design | Human-computer interaction, virtual reality, augmented reality, haptic, user experience, user factors, and product factors
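The three-step construction of queries described in the keyword identification (general keywords, combinations of keywords, and keywords paired with domain terms) can be sketched as follows. This is only an illustration under stated assumptions: the function `build_queries` is hypothetical, only two domains from Table 1 are included, and the plain strings produced here do not reflect the query syntax of any particular library database.

```python
from itertools import combinations

# Core technology terms taken from the keyword list above.
CORE_TERMS = ["human-computer interaction", "virtual reality", "augmented reality", "haptic"]

# A subset of the domain-specific terms from Table 1 (illustrative, not exhaustive).
DOMAIN_TERMS = {
    "Transportation": ["transportation", "driving", "flight"],
    "Cultural heritage": ["cultural heritage", "museum", "archaeology", "tourism"],
}


def build_queries(core_terms, domain_terms):
    """Return query strings: single terms, pairs of terms, and term + domain word."""
    queries = list(core_terms)                                        # step 1: general keywords
    queries += [f"{a} {b}" for a, b in combinations(core_terms, 2)]   # step 2: pairwise combinations
    for words in domain_terms.values():                               # step 3: pair with domain words
        queries += [f"{term} {word}" for term in core_terms for word in words]
    return queries


queries = build_queries(CORE_TERMS, DOMAIN_TERMS)
print(len(queries), "queries, for example:", queries[4], "/", queries[-1])
```

Extending `DOMAIN_TERMS` with the remaining rows of Table 1 would cover all seven domains in the same way.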

Quality Assessment
Purdue University Library and Google Scholar provide journal ratings and the event's importance (their h-index). Based on these ratings, we identified research works published during the last eight years in the top-ranked 20 conferences and journals in Computer Graphics, HCI, haptics, and VR/AR up to July 2021 related to multimodal human-computer interactions that use haptic displays, VR/AR, and devices that allow specific one-directional and bidirectional interactions that present tracking.

Data Collection
We focused on types of interactions depending on the display type used in each domain. In particular, we reviewed the following: IEEE Transactions on Visualization and Computer Graphics; IEEE Transactions on Haptics; ACM CHI; IEEE Computer Graphics and Applications; IEEE Symposium on Visual Analytics Science and Technology; Joint EuroHaptics Conference; Visualization and Data Analysis; ACM Transactions on Graphics; Virtual Reality; HCI International; ACM UIST; IEEE Virtual Reality Conference; International Conference on Haptics perception devices mobility and communication; Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems; and ACM TOCHI, among other related conferences and journals. We found an initial set of 406 articles. After an exhaustive review, 112 articles were selected; we kept only the documents with the most relevant content and characteristics for analysis (see Figure 1).

Inclusion and Exclusion Criteria
While subjective, the key criteria for inclusion were user interaction and display technology. Special attention was paid to works related to multimodal human-computer interactions based on the use of haptic displays, virtual reality or augmented reality, and the use of devices that allow specific one-directional and bidirectional interactions, which could present tracking.
The exclusion process began with an identification phase in which works related to multimodal human-computer interaction were searched for in databases based on the keywords of human-computer interaction, VR/AR, and haptics, following the parameters mentioned in Section 2.4. We excluded works dealing with basic computer graphics algorithms, as well as works dealing with advances in computer hardware. Duplicate works were eliminated, as were those not contributing to or not aligned with the scope of this work. Then, we read all works that still presented a possible contribution to the objective of this study. Several were excluded because they did not have clear information or were not related to the scope of the work. Finally, the remaining works were reread and analyzed in depth; we realized that some would not provide as much input as expected, and therefore excluded them. In the end, 112 works were selected.
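The mechanical part of the exclusion process above (removing duplicates, then dropping works on basic graphics algorithms, hardware advances, or out-of-scope topics) can be sketched as a simple filter. The `Record` fields, the duplicate rule based on normalized titles, and the example records are all hypothetical; the reading and rereading stages rely on reviewer judgment and are represented here only by the `in_scope` flag.

```python
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    in_scope: bool        # reviewer judgment: multimodal HCI with VR/AR or haptics
    hardware_only: bool   # deals only with advances in computer hardware
    graphics_only: bool   # deals only with basic computer graphics algorithms


def screen(records):
    """Drop duplicates (by normalized title), then apply the exclusion criteria."""
    seen, kept = set(), []
    for rec in records:
        key = " ".join(rec.title.lower().split())  # ignore case and extra whitespace
        if key in seen:
            continue  # duplicate of a record already processed
        seen.add(key)
        if rec.in_scope and not (rec.hardware_only or rec.graphics_only):
            kept.append(rec)
    return kept


# Hypothetical records, for illustration only.
candidates = [
    Record("Visuo-haptic surgery simulator", True, False, False),
    Record("visuo-haptic  surgery simulator", True, False, False),  # duplicate
    Record("Faster GPU rasterization", False, False, True),
    Record("New actuator design", True, True, False),
]
selected = screen(candidates)
print(len(selected), "of", len(candidates), "records kept")

# Share of articles retained in this review: 112 of the initial 406.
print(f"retained: {112 / 406:.1%}")
```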

Data Analysis
According to the methodology selected for our literature review [24][25][26][27], data analysis was performed in two stages. First, all manuscripts identified were synthesized. For this process, we organized the manuscripts by domain. Then, we searched for the response to the research questions from Section 2.1.
For the synthesis of manuscripts, the initial set was classified according to the technologies and domains used. We based the subdivision into different categories on the recent survey by Freina and Ott [3] that discusses the usage of VR in education. We also used the works of Anthes et al. [11] and Hornbaek et al. [12] to define the initial interaction domains. To balance the number of papers per domain, we merged domains with very few papers (for example, geography and transportation) and split domains with too many papers. This analysis results in a set of domains of all papers according to our criteria.
The classification result is a map that relates the application domain to the technology used. While we could have listed the documents according to each technology, such as according to the display, it would result in disproportional classes and more importantly, would not be interesting for people from different areas of application. Therefore, we decided to use the domain as the primary classification criterion, hoping researchers from different fields will be able to learn something about their area.
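One possible way to represent such a map, with the domain as the primary key and technology counts underneath, is sketched below. The data structure and the example entries are assumptions made for illustration; they are not the actual classification data of this review.

```python
from collections import Counter, defaultdict

# Hypothetical (domain, technology) pairs, one per technology used in a paper.
classified = [
    ("Medicine", "VR"), ("Medicine", "haptics"), ("Medicine", "AR"),
    ("Medicine", "haptics"),
    ("Physics", "haptics"), ("Physics", "VR"), ("Physics", "haptics"),
]

# Domain is the primary classification criterion; technologies are counted per domain.
domain_map = defaultdict(Counter)
for domain, tech in classified:
    domain_map[domain][tech] += 1

for domain, counts in domain_map.items():
    top_tech, n = counts.most_common(1)[0]
    print(f"{domain}: {dict(counts)} (most frequent: {top_tech}, {n} uses)")
```

Keying by domain first means a reader interested in one application area can read off its technology profile directly, which matches the motivation given above for choosing the domain as the primary criterion.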

Results
This section is organized into two main parts. First, we synthesize the results of the literature review, characterizing the most relevant papers identified by following the above-specified criteria and organized by each application domain. Then, we present the analysis describing the main tendencies of the research regarding (a) the number of papers by domain and technology type published over time, (b) the number of papers in each application domain, cross-measuring them with the kinds of applications, and (c) the proximity and similarity of papers from the viewpoint of domain and type.
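To make the notion of proximity and similarity between papers in item (c) concrete, the following sketch compares papers by the overlap of their domain and technology tags using the Jaccard index. The choice of the Jaccard index is an assumption made for illustration and is not necessarily the measure used in the cluster analysis of this review; the paper labels and tag sets are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two tag sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty tag sets are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical tag sets for three papers (domain plus technology types).
papers = {
    "P1": {"Medicine", "VR", "haptics", "hand tracking"},
    "P2": {"Medicine", "VR", "haptics", "audio"},
    "P3": {"Physics", "haptics"},
}

names = sorted(papers)
for i, p in enumerate(names):
    for q in names[i + 1:]:
        print(p, q, round(jaccard(papers[p], papers[q]), 2))
```

Pairwise similarities of this kind could then be passed to a clustering method to group papers, which is the kind of grouping that RQ4 asks about.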

Synthesis
The following presents each application domain and the most relevant papers according to the above-specified criteria. In particular, we discuss the following domains: (1) Concepts and overviews include algorithms and methods that can be applied across different categories, (2) Medicine, (3) Physics, (4) Human factors/User experience design (UX), (5) Transportation, (6) Cultural heritage, and (7) Industry. Papers are further grouped into smaller blocks, and each block is organized chronologically in ascending order.

Concepts and Overviews
This section describes contemporary methods that are not domain specific and can be applied in multiple areas. It also includes reviews from all categories discussed in this paper summarized in Table 2.

Visualization
Visualization is a large domain by itself, and we refer the reader to [28], which presents a survey of interaction and data representation in scientific visualization. Reda et al. [29] introduced a hybrid reality system in which they study 2D and 3D data visualization in large-scale settings using immersive technology. Olshannikova et al. [30] give an overview of methods for processing Big Data, their visualization, and integration with AR and VR. They conclude that visualized data can significantly improve understanding of the pre-selected information and create a new interactive system to operate with the visualized objects. Moreover, human perception and cognition must be considered, and virtual and physical objects should be well-integrated.

VR and AR
Freina and Ott [3] survey the use of VR in education, and the state-of-the-art of Slater and Sanchez-Vives [31] reviews VR in domains focused on applications with some level of research support. Mihelj et al. [32] introduced characterization and definitions of manipulation, interaction, and navigation in multiuser VR. Critical elements of VR experience using a CAVE environment and a taxonomy based on the interaction (vibration, gizmo, hand, head, and body tracking) were discussed in [33]. Another work that deals with the taxonomy for VR is [11], and it also focuses on the hardware and the different technical characteristics of the existing types of devices. Chavan [34] studied a comparison of VR and AR in various contexts, including price, differences, similarities, and application areas. They discussed future work options, including screen resolution, eye and face tracking, and more advanced controls. The work of Rubio-Tamayo et al. [35] discusses basic concepts of VR and the relationship between the virtual and the real world and presents the ideas of representation, expressiveness, and interaction in VR.

VR and Haptics
Achibet et al. [36] introduced a virtual glove as a VR approach to tactile experience in an immersive environment. It is a visuo-haptic system for grip force interaction; it uses pressure sensors and cameras for hand tracking, allowing interaction with virtual objects. Deng et al. reviewed the use of haptic devices and eye-tracking in [37]. Pacchierotti et al. [7] discuss taxonomy, design, perspectives, and review of wearable haptic systems for hand and fingertip, and they also discuss the role of wearable haptics in the cutaneous stimulus.

Behavioral Theories of HCI
Hekler et al. [38] presented a guide to interpreting, using, and contributing to the behavioral theories of HCI. They acknowledge that this guide is superficial and that future work could go further on topics such as the best methods to evaluate behavior change technologies in HCI research, a full understanding of the knowledge that each field requires before committing to the other, the possibility of distortions arising from mistranslations of concepts between areas, and the impact of socio-cultural differences related to the origin of theories on the interpretability, utility, and generalization of different behavioral approaches within an HCI context. Their ultimate goal is to call on behavioral scientists and HCI researchers to work more closely together, both in designing behavior change technologies and in developing better theory.
Vines et al. [39] studied the concepts that should be taken into account when users participate in developing an HCI system. Their goal has been to draw attention to the plurality of participation in HCI, and the problems and possibilities that this brings to future research. They seek to present a more nuanced understanding of shared control between researchers and participants. Finally, they indicate that these are exciting times because new technologies come to new audiences and new perspectives arise on what design could and should be.
A review on the introduction of hedonism in HCI was presented by Diefenbach et al. [40]. An important aspect is the conceptualization of the value of experience in terms of the product's attributes.

Medicine
Human-computer interaction in medical applications is closely related to robotics and, compared to other areas, has specific requirements such as high precision, fast feedback, intuitiveness, and realism. Ruthenbeck and Reynolds [41] presented state-of-the-art use of VR and haptic devices for medical training, and Vaughan et al. [5] provided a review of visuo-haptic training simulators for orthopedic surgery (see the summary in Table 3).

Surgery
An early work of Talasaz and Patel [42] discusses haptic teleoperation for locating tumors in robot-assisted minimally invasive surgery. They use an on-screen display to see the location and a hand-controlled haptic interface to operate the robot. Díaz et al. [43] presented a haptic system for surgical drilling assistance, which is composed of a haptic pedal and a visual interface. The system transmits vibrations and audio feedback to the user during the interaction. Jeon and Harders [44] introduced a system for palpating tumors through the use of AR and haptic devices. Esteban et al. [45] compared visuo-haptic environments that aid in learning to perform surgeries with traditional forms of training. The authors conclude that introducing the sense of touch in surgery simulators through haptic devices is an essential addition. Ruffaldi et al. [46] introduced a visuo-haptic system to perform ultrasonography by using VR and a handheld haptic device together with hand and head tracking.

Training
Several works addressed simulations and training for medical purposes (see also the review [4]). One of them is the work of Fortmeier et al. [47], which presents a visuo-haptic simulation framework for needle insertion capable of simulating patients' breathing movements. A cardiac life support VR training simulator was studied by Khanal et al. [48]. In addition to the immersive visualization, their system uses haptic devices and audio to provide timely feedback for error detection and corrective feedback for proper technique. They concluded that VR-based training could effectively complement the conventional training method. Hamza-Lup et al. [49] surveyed the use of visuo-haptic simulation in the surgical training process, along with the features, APIs, and frameworks of the haptic devices used in this type of training. They described the methodology for simulating a laparoscopic surgical procedure using a visuo-haptic interactive application. Finally, Pan et al. [50] demonstrated a VR system combined with haptic devices for laparoscopic rectal surgery.

Rehabilitation
Rose et al. reviewed VR for rehabilitation in [51]. They report that immersion improves performance and accuracy in navigation tasks but introduces instability in posture. They mention that the potential of VR for rehabilitation is not fully explored. Andaluz et al. [52] presented a system for upper limb rehabilitation with the use of VR coupled with a haptic device for force and vibration feedback, in addition to a method for tracking the hand and fingers. Won et al. [53] reviewed immersive VR systems for rehabilitation of pediatric pain by categorizing the qualities and practical aspects of VR. They concluded that, beyond the applications and effectiveness of VR for the treatment of pediatric pain, it is necessary to understand its impact on the quality of life of pediatric patients.

Dentistry
Wang et al. [54] surveyed virtual multisensory feedback systems for dental training. They summarize the components, functions, and unique characteristics of several methods and discuss the technical challenges behind these systems. Wang et al. [55] evaluated a VR dental simulator for the drilling operation. By using haptics, audio, and VR, they proposed adding extra haptic support that simulates the fingers to perform complex tasks and improving the graphic representation of the virtual environment.

Physics
This area includes several works dealing with HCI systems that use physics and related fields. Kucukyilmaz et al. [56] presented an experimental study on a system that recognizes intuitive communication between partners during remote haptic collaboration in physics. The results suggest that human-computer communication can be improved by adding a decision-making process in which the computer infers the intentions of the human operator and dynamically adjusts the controls of the interacting parts to ensure more intuitive interactions. Table 4 summarizes this section.

Surfaces
Donalek et al. [57] presented a platform for data visualization using collaborative VR, where a VR headset and a tracking device were used to place the hand in the virtual environment. Kim and Kwon [58] proposed a geometry-based haptic rendering method for 2D images. Their focus is to estimate the haptic information of the structure of objects contained in 2D images while preserving the image. Kokubun et al. [59] described a visuo-haptic system to represent normal and shear forces in a mobile device through pseudo-haptic interaction and a rear touch interface to evoke the haptic sensation without using haptic devices. Nakamura and Yamamoto [60] described a prototype of a visuo-haptic system with multitouch surface interaction that uses direct electrostatic stimulation as feedback. They use haptic gloves on a multitouch screen and report that the rendering experiments for dynamic objects revealed a problem known as "object stiction", which is exclusive to multitouch haptic systems and is caused by the non-directional nature of electrostatic stimulation that appears when an object is pinched and dragged at the same time. Visuo-haptic simulation of friction has been studied by Yuksel et al. [61], Yuksel et al. [62], and Neri et al. [63]. Visual and visuo-haptic simulations were compared to physical simulation, and the learning gain was considered.

Object Grasping
Prachyabrued and Borst [64] researched visual feedback to understand the cues that improve behavior when manipulating virtual objects with fingers. Eight visual feedback techniques were compared and evaluated to improve performance or subjective experience. Among them, audio, gizmos, and hand tracking were used. The authors concluded that future work should combine other techniques such as haptic or heuristic release feedback. Madan et al. [65] presented work to obtain a more in-depth understanding of recognizing patterns of collaborative haptic interaction in dyadic joint object manipulation.

Fluid Mechanics
Wang and Wang [66] proposed a hybrid model for haptic interaction with fluids based on solid-fluid interaction. In addition to evaluating the efficiency of the hybrid model, comparative experiments and result analysis are presented. The authors mention that future work should model in more detail the special effects that accompany haptic interaction with the fluid, such as turbulence, filtration, bubbles, and acoustic phenomena, to improve telepresence.

Electromagnetism
Walsh et al. [67] compared physical simulations of systems with pulleys to visuo-haptic simulations, and the same group in [68] used visuo-haptic simulations to study the learning of electric charges and magnetic fields. This work was also considered by Shaikh et al. [69]. The authors conclude that visuo-haptic simulation has at least the same learning gain as a simulation without haptic feedback.

Dynamic Systems
Amirkhani and Nahvi [70] designed and implemented an interactive visuo-haptic laboratory for students to experience the theory discussed in class. They found that the interactive virtual laboratory compensates for the lack of a real laboratory. They note that further experiments related to those presented, which involve dynamics and vibration, can be included.

Astrophysics
Another example of usage of simulations in education is the work of Lindgren et al. [71] that studied gravity and planetary movement in a mixed reality system of immersive interactive simulation. They compared learning and perceptions about science with students who used a desktop version of the same simulation, concluding that the students who used the immersive interactive full-body simulation showed higher learning and more positive attitudes towards the simulation experience and towards the learning environment.

Molecular Physics
Edwards et al. [72] showcased an immersive visuo-haptic system for learning organic chemistry. In particular, the learners manipulate hydrocarbon molecules by using vibrations, a haptic glove for hand tracking, and a VR headset for head tracking. They show how an immersive learning experience integrates several learning approaches that include multimedia and multisensory instruction.

Human Factors/User Experience Design
This area includes several works dealing with HCI that address human factors/user experience design and related fields, as summarized in Table 5.

User Factors
Okamoto et al. [73] reviewed psycho-physical dimensions related to tactile perception, concluding that the tactile perception of materials is composed of five dimensions. They also mentioned that promoting studies on the perceptual mechanisms of each tactile dimension will confirm the classification. Kober and Neuper [74] analyzed how personality and presence in VR depend on the level of immersion by having participants carry out VR navigation tasks and answer questionnaires. Cavrag et al. [75] used a visuo-haptic interaction system for the treatment of arachnophobia (fear of spiders). Their work compares and discusses the modeling of 3D objects and test scenarios. An analysis of placelessness, spacelessness, and formlessness in the perception of virtual possessions was presented in [76]. The authors reflected on and synthesized findings from five field studies investigating people's practices in digital environments and their attitudes toward virtual possessions.
Social interaction in immersive environments was studied by Bombari et al. [77]. The perceived realism of virtual humans can be improved by adding features that make participants feel that virtual humans better understand them, including theory of mind, verbal behavior, and a physical appearance congruent with their characteristics. Ahmed et al. [78] evaluated haptic technologies that simulate affective touch in VR. They conclude that, regardless of the agent's expression, force feedback feels more natural than vibration alone. Kyriakou et al. [79] examined the attributes of virtual human behavior that can increase the plausibility of crowd simulation and can affect user experience in the virtual environment.

Product Factors
The role of haptic feedback for the integration of intentions in shared execution tasks has been studied by Groten et al. [80]. Several experiments showed two users moving an object together with audio and a haptic device with haptic hand tracking. They conclude that mutual haptic feedback is a valuable channel for integrating haptic tasks if shared decision making is required. Aras et al. [81] presented a quantitative evaluation of effectiveness using two visualization techniques with haptic interaction to manipulate 3D objects in virtual environments. Their study serves as the basis for more advanced studies of visuo-haptic coupling and its impact on the mental/cognitive workload.
Hamam et al. [82] introduced a taxonomy to classify the experience quality parameters for visuo-haptic environments. Next, an experiment with a visuo-haptic system to balance a ball is presented to test the model. Achibet et al. [83] offer a device for haptic feedback using an elastic arm to increase interaction and perception in virtual environments, and they showed cases to illustrate the capabilities of the system. Fittkau et al. [84] introduced a model to explore a visualization metaphor called software cities through VR. An evaluation of gestures created for the use of the system is presented, and the possible assessment of the system with other VR devices and tracking is proposed.
Moran et al. [85] showed a tool to improve the visual analysis of Big Data with interactive VR, which allows visualization and interaction that can facilitate the understanding and representation of extensive data. They conclude that VR also serves as a data visualization platform enabling the most efficient user interaction with patterns and visual analysis when working in a geospatial domain. The work of Atienza et al. [86] presented a VR interaction technique using head gaze. They found that using hands and feet to navigate and control the environment degrades the level of immersion. In addition, users prefer reliability over intuitiveness in the system, and intelligent navigation guides may be the next significant improvement. Carvalheiro et al. [87] proposed a haptic interaction system for VR based on a combination of tracking devices and a real-to-virtual mapping system for the redirection of users. Y.-S. Chen et al. [88] used an augmented and connectable haptic device to improve control in immersive virtual situations in which the user can receive audio, visual, wind, thermal, and force feedback. Matsumoto et al. [89] showed a system that efficiently directs a user within a visuo-haptic environment by using tactile signals to actively modify spatial perception. M. Kim et al. [90] proposed a system that uses a tracking device as a haptic interface to interact immersively in VR by using a hand-held haptic device with vibration and heat feedback. Lee et al. [91] presented an immersive virtual environment based on mazes that seeks to provide users with a greater sense of presence and experience through the virtual scene and immersive interaction with the use of a VR headset and a tracking device for the feet. The work of Maereg et al. [92] presents a vibrotactile haptic device to perceive rigidity during interaction with virtual objects. Piumsomboon et al. [93] presented three gaze-based interaction techniques inspired by natural eye movements for VR immersion.
More recently, Reski and Alissandrakis [94] presented a comparison of various input technologies for interactive VR environments. They identified trends in the preference for visual representations, but their results on physical controls in scenarios that stimulate exploration without a time limit were inconclusive.

Transportation
This section describes systems that use multimodal interaction and visualization for transport, and the papers are summarized in Table 6.

Driving
Grane and Bengtsson [95] studied how visual and tactile interfaces affect drivers' performance and how visual-haptic feedback could reduce the effects of driver distraction. They discovered that haptic support could reduce the impact of visual load without adding a cognitive load. Kemeny [96] analyzed the challenges of driving simulation through the use of VR and provided the main points that must be taken into account when carrying out this type of simulation with the perception of movement, distance, acceleration, and speed. Driving simulation has been a standard technology since the advent of VR for high-end 3D vision and tracking.
A haptic-multimodal interaction system with cooperative guidance, control, and cognitive automation was presented by Altendorf et al. [97]. Their case study compared the haptic device and haptic hand tracking in a virtual driving simulator. Mars et al. [98] studied human-machine cooperation when driving in a simulated environment with different degrees of shared control and haptic support. The authors stated that more studies should be carried out to determine how their results can be generalized to other shared control designs and different situations. Wang et al. [99] presented a shared control model for lane tracking through driver interaction with a haptic guide. Their results suggested that the more the driver relies on the haptic guidance, the lower the effort. Stamer et al. [100] presented a glove-based study of tactile and force feedback to support car driving in virtual simulations. They showed that visuo-haptic feedback brings essential advantages to virtual interactions.

Flight
Aslandere et al. [101] used a flight simulator interaction system in VR. The immersive system was equipped with audio, manual control with a virtual button, hand and head tracking, and simulated scenarios. They tested various configurations and concluded that the virtual interaction of a manual button depends heavily on the avatar of the hand, that the participants presented more efficient interaction with a less abstract virtual hand, and that the collision of a button was equal to its visual volume.
Li and Zhou [102] showed a VR flight simulator that supports real-time multiuser interaction. This exhibition is exciting and attractive as a new efficient form of scientific dissemination. Marayong et al. [103] presented a modification of the volumetric status display of the cockpit of NASA's next-generation air transport system, which is an advanced software tool for managing flights in real-time from the cabin. This study integrates force feedback into the cabin visualization framework and its effectiveness in performing two tasks: object selection and route manipulation. Oberhauser and Dreyer [104] presented a VR flight simulator that combines the flexibility of a desktop flight simulator with the level of immersion close to a full flight simulator. Their results show that the system provides reliable information on interaction with the human-machine interface making it a low-cost, trustworthy addition to the early development process of in-cockpit interaction technologies. When it comes to human function evaluations, Valentino et al. [105] developed a VR flight simulator with simple flight dynamics, limited terrain, and objects. This simulator provides a great perspective of flying. They also mention that the flight simulator was not complete and powerful because of its limited flight dynamics.

Here, we discuss the use of multimodal interaction and visualization in the context of cultural heritage. See the summary in Table 7.

Museum
Chen et al. [106] presented an AR multimedia system that does not require the user to operate any designated hardware devices such as a keyboard, mouse, or touch screen. Computer vision retrieves the user's input signal by using an aerial camera, enabling various tasks with virtual objects such as mapping textures, text, and audio. Dima et al. [107] developed a haptic interaction system providing the illusion of touching museum artifacts. They concluded that non-digital prototypes presented more crucial sensory information. In contrast, digital prototypes offer the possibility of adding additional interactive elements that could improve interaction. Papaefthymiou et al. [108] presented a virtual museum environment that can be observed through a cardboard-style headset and controlled with different devices. Jung et al. [8] investigated the impact of VR and AR on the visitor's broad experience in the museum context. Only a few studies have been conducted in AR environments compared to VR environments. They indicate that VR and AR can be valuable tools for improving tourists' experience by motivating the intent to visit the actual destination. Kersten et al. [109] developed a virtual museum with two options: (a) interactive software and (b) a VR system, HTC Vive. They collected data about interaction in the exhibition and showed different animations, explaining the changes in the building's construction over the centuries. Tsai et al. [110] presented an AR museum information guide application. Their results show that usage conforms to usability standards and provides a positive experience during a visit.
Carrozzino et al. [111] investigated the possible positive effects that the use of avatars can provide to a virtual cultural experience, proposing a virtual museum with three different alternatives (panel, audio, and virtual guide), and comparing the results in terms of engagement and understanding of the proposed content.

Archaeology
Gaugne et al. [112] contributed to the multidomain research of archaeology and VR, and they concluded that VR could improve archaeology by supporting the modeling and collaborative analysis of archaeological objects. Pietroni and Adami [113] discussed fundamental concepts about the potential of virtual reconstructions of cultural sites. They mention that a virtual reconstruction should combine 3D models, narration, behaviors, visualization, and interaction tools. They concluded that it is necessary to design the content of virtual applications and ensure excellent communication. Barbieri et al. [114] described the development of a VR exhibit for interactive exploration of archaeological artifacts. Moreover, they address various technical issues related to the design of virtual museum exhibits based on standard technologies.

Tourism
Younes et al. [115], with the use of VR and AR, showcase the Roman theater of Byblos and discuss potential strategies for implementing this approach in other scenarios. Bekele et al. [9] mention that the use of technologies, such as AR and VR, enables a user-centered presentation and makes cultural heritage digitally accessible, mainly when physical access is restricted. Finally, this work mentions future research directions for AR and VR, focused on interaction interfaces, and suggests the implications for cultural heritage.

The last content section describes the usage of multimodal interaction and visualization in industrial domains. Berg and Vance [2] surveyed VR in the industry, concluding that VR has grown over the past twenty years, and that its knowledge base has expanded significantly in this domain (see the summary in Table 8).

Manufacturing
Perret et al. [116] present a work dealing with implementing an interactive simulation with haptic feedback. They describe the challenges of movement integration, collision detection, and change to assembly constraints. They presented an idea of the maturity of the technology and concluded that one of the main future challenges will be introducing deformable objects since modeling this type of object is essential for the simulation of gripping tasks. Qiu et al. [117] presented a real-time model of a virtual human to perform assembly tasks. They also analyzed driving errors to help users choose a suitable motion capture system.
Xia et al. [118] presented a comprehensive review of VR and tactile issues for product assembly based on rigid pieces of soft wire; they researched new ideas and recent advancements in the area. Gonzales-Badillo et al. [119] studied the development and key features of a visuo-haptic system for planning and evaluating assemblies, which is intended to be used as a tool for training, design analysis, and route planning. The results demonstrated that it could be effectively used to simulate, evaluate, plan, and automatically formalize the assembly of complex models naturally and intuitively. Hamid et al. [120] studied advances in computer modeling, visualization, simulation, and management of product data through the use of VR. They mention that these technologies are a viable alternative for product manufacturing. They also concluded that it is essential to realize that VR is not solely for visualization purposes. Vélaz et al. [121] focused on the use of VR systems to teach industrial assembly tasks and studied the influence of interaction technologies on the learning process.
Abidi et al. [122] developed a VR haptic platform that allows management and interaction of virtual components for assembly. The use of haptics is an effective method to improve the sense of presence in virtual environments and benefits tasks such as virtual assembly. Choi et al. [1] surveyed the use of VR in manufacturing and evaluated the application of VR element technologies in the context of developing new processes. They concluded that more research is needed to improve manufacturing competitiveness based on the dynamic integration of components, which requires the extension and constant development of related standards for dynamic integration, VR element technologies, and standards. Gavish et al. [123] assessed VR and AR training platforms for maintenance and assembly tasks, concluding that these platforms provide new forms of interaction, and that users need time to learn how to use them efficiently. Grajewski et al. [124] tested different approaches to creating realistic, immersive educational simulations of workplace conditions for assembly operations with the help of haptic and VR systems. The level of realism was a crucial factor when performing immersive simulations of workstations for training purposes. This type of simulation was identified as an excellent tool for performing training tasks. Different types of views and AR features to show assembly instructions using AR applications were studied by Radkowski et al. [125]. This was demonstrated to improve user confidence while performing tasks, and it also allowed the transfer of the learned skills to other tasks. Al-Ahmari et al. [126] presented a manufacturing assembly simulation system that uses a virtual environment to create an interactive workbench that can be used to evaluate assembly decisions and to train assembly operations. It is a comprehensive system that provides visual, auditory, tactile, and force feedback. Future work includes a series of user-based evaluation studies to assess the effectiveness of their system for training. In addition, they mention that other haptic feedback mechanisms, such as friction and gravity, will be added to the environment. Wang et al. [127] presented an AR simulator that assists in completing assembly tasks and facilitates assembly, planning, and product design. They conclude that it is crucial to improve depth detection to facilitate the construction of an assembly, allowing a greater fusion of real and virtual components. Xia [128] surveyed the use of haptics for product design and manufacturing simulation. They observed that many researchers have developed their own haptic interfaces to simulate the design and manufacture of products, but that most of these devices are still in the laboratory stage. Finally, Ho et al. [129] proposed and evaluated a VR training system for the assembly of hybrid medical devices. Their system integrates Artificial Intelligence, VR, and game concepts, and their results showed that the proposed training has significant advantages over standard VR training and conventional training.
Roldán et al. [130] propose a system to transfer knowledge in the context of Industry 4.0. The system provides an immersive VR-based interface for expert operators and trainees. The aim is to focus on applying the proposed system to more realistic assemblies as future work. Finally, they mention that a comparison of VR and AR in the industry context would be of interest to determine the future of immersive training systems.

Maintenance
Loch et al. [131] proposed a concept of haptic interaction in a virtual training system for maintenance procedures. They report that one benefit of virtual training systems is the attractiveness for students, as well as their flexibility in presentation and interaction; therefore, they are enhanced by the possibilities of haptic interaction.

Analysis
This section presents the analysis of literature guided by our four research questions from Section 2.1. We analyzed the collected articles by focusing on three main tendencies.
(1) Temporal: we compared the number of papers in each area over the last seven years. This analysis focused on responding to the first research question (RQ1) and question two (RQ2).
(2) Frequency: we calculated the number of papers in each application domain and cross-measured them with the types of displays, interaction, tracking, and applications. This analysis focused on responding to research question three (RQ3).
(3) Cluster: each paper was considered to be a point in a multidimensional space, and we performed a cluster analysis to show the proximity and similarity of papers from the viewpoint of domain and type. This analysis focused on responding to research question four (RQ4).

Temporal Analysis
We can only speculate about the temporal tendencies from Figure 2. One observation is that the number of papers is substantially divided per domain or technology, which may exacerbate fluctuations. Another impact may be market behavior. There were several sizable purchases (e.g., Facebook acquired Oculus in 2014 [132]) that may have a future effect on these technologies. Still, there may be a lack of widespread adoption by users. Moreover, device price may be a substantial factor (Oculus cost about USD 400 in 2020, but the force-based haptic devices range from USD 150 to USD 20,000).

Frequency Analysis
Frequency analysis focuses on addressing RQ3 and shows the absolute histogram of domain and type of interaction used in Figure 3. The maximum value of 20 corresponds to papers at the intersection of VR and UX. Moreover, many papers studied UX in interaction with gizmos and various kinds of tracking. VR has also been used frequently in Industry, Physics, and Medicine. Another frequently used technology is haptic interaction, which has been studied in the context of Medicine, Physics, UX, and Industry. We found that certain types of interaction, such as temperature and wind, are not commonly used.
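The cross-counting behind such a histogram can be sketched in a few lines. The following is a minimal illustration rather than the script used for Figure 3: the `papers` records, their labels, and the function name `frequency_table` are hypothetical and do not come from the reviewed data set.

```python
from collections import Counter

# Hypothetical records: (domain, technology types used by the paper).
# These entries are illustrative and are not taken from the review.
papers = [
    ("UX", ("VR", "tracking")),
    ("UX", ("VR", "haptics")),
    ("UX", ("VR",)),
    ("Medicine", ("haptics",)),
    ("Medicine", ("haptics", "VR")),
    ("Industry", ("VR", "AR")),
]


def frequency_table(records):
    """Count how many papers fall on each (domain, technology) pair."""
    counts = Counter()
    for domain, technologies in records:
        for tech in technologies:
            counts[(domain, tech)] += 1
    return counts


table = frequency_table(papers)

# The most frequent pair plays the role of the histogram's maximum.
peak_pair, peak_count = max(table.items(), key=lambda item: item[1])
```

A paper that uses several technology types contributes to several cells, so the cell totals can exceed the number of papers.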

Cluster Analysis
This step investigates the proximity and similarity of the papers from the viewpoint of the domain and application described in Section 1, thus addressing RQ4. Let us recall that we have the following domains: Concepts and overviews (basic algorithms) (Section 3.1.1), Medicine (Section 3.1.2), Physics (Section 3.1.3), Human factors/User experience design (UX) (Section 3.1.4), Transportation (Section 3.1.5), Cultural heritage (Section 3.1.6), and Industry (Section 3.1.7). We will denote the domains by $d$, and we will use the lower index to identify each domain as follows:
$$D = \{ d_k : k \in \{d, m, p, u, c, t, i\} \}.$$
Each domain $d \in D$ can use zero, one, or more types of interactions. In order to compare papers from the domains in $D$, we define a metric that assigns each paper a value depending on the type of interactions used. In particular, we assign an integer number defining how many types of interaction the paper uses. Having a paper $p_k$ from domain $d_k$, $k \in \{d, m, p, u, c, t, i\}$, the value of the paper is set as follows:
$$p_k = (p_{k,1}, p_{k,2}, \ldots, p_{k,n}),$$
where each value $p_{k,j}$ is either one or zero depending on whether the paper uses the $j$-th type of interaction. The distance between two papers $p_a$ and $p_b$ in the $n$-dimensional space is then calculated by using the Euclidean $L_2$ norm as follows:
$$\lVert p_a - p_b \rVert_2 = \sqrt{\sum_{j=1}^{n} \left( p_{a,j} - p_{b,j} \right)^2}.$$
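The 0/1 encoding of a paper and the Euclidean distance between two papers can be sketched as follows. This is a minimal illustration under stated assumptions: the tuple `INTERACTION_TYPES` and the helper names `encode` and `euclidean` are hypothetical, and the actual dimensions used in the analysis may differ.

```python
import math

# Illustrative interaction types; the dimensions actually used in the
# analysis may differ (assumption).
INTERACTION_TYPES = ("VR", "AR", "haptics", "tracking")


def encode(used):
    """Return a 0/1 vector with one component per interaction type."""
    return [1 if t in used else 0 for t in INTERACTION_TYPES]


def euclidean(a, b):
    """Euclidean (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


p_a = encode({"VR", "haptics"})
p_b = encode({"VR", "tracking"})
distance = euclidean(p_a, p_b)
```

For 0/1 vectors, the squared distance equals the number of interaction types on which the two papers differ, so papers with identical technology configurations have distance zero.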
With a defined distance between two papers, we can perform cluster analysis in n-dimensional space. We applied the k-means algorithm, which clusters data into a predefined number of clusters. While the number of clusters is unknown, it can be determined by using the elbow method, as shown in Figure 4. The idea is to run the algorithm with an increasing number of clusters and measure the compactness of each cluster. Initially, all papers are in a single sparse cluster. As the number of clusters increases, the clusters become more compact, eventually resulting in each paper being alone in its own cluster. A good number of clusters is in the "elbow" of the graph, which shows a compromise between the number of papers in each cluster (higher is better) and the compactness of each cluster (more compact is better). The ratio is expressed as the distortion score. We found the number of clusters to be k = 3. Figure 4 also shows the choice of clusters for k = 2 and k = 4, demonstrating that k = 3 is a good choice.
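The elbow procedure can be illustrated with a small, dependency-free sketch. Several assumptions apply: the source does not state which k-means implementation or initialization was used, so this example uses plain Lloyd iterations with a deterministic farthest-first seeding for reproducibility; the distortion is taken here as the sum of squared distances from each paper to its nearest centroid, which is one common definition; and the `points` data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding (assumption): start at the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm; returns (centroids, distortion)."""
    centroids = farthest_first(points, k)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        updated = []
        for c, group in enumerate(groups):
            if group:
                updated.append([sum(col) / len(group) for col in zip(*group)])
            else:
                updated.append(centroids[c])  # an empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    distortion = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, distortion


# Hypothetical 0/1 paper vectors forming three obvious groups.
points = [[1, 0, 0, 0]] * 4 + [[0, 1, 1, 0]] * 4 + [[0, 0, 1, 1]] * 4

# Distortion for k = 1..5; the "elbow" is where the curve stops dropping sharply.
distortions = {k: kmeans(points, k)[1] for k in range(1, 6)}
```

On this toy data the distortion falls steeply up to k = 3 and is flat afterwards, which is the shape the elbow method looks for. Real data rarely produce such a clean break, so the choice of k usually remains a judgment call.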
The k-means algorithm clusters the papers according to their distance, but it is not obvious what each cluster includes and why.
The initial space is five-dimensional, and we used the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm to project it to 2D. The algorithm attempts to keep points that are close in the higher-dimensional space also close in 2D. We used the following parameters: perplexity = 100, exaggeration = 1, and 29 PCA components. The results of the t-SNE algorithm are shown in Figure 5, where the three clusters from k-means are identified with different colors; each data point corresponds to one paper, and we also show each paper's authors and year of publication. Moreover, we visualized the cluster subdivision as a dendrogram in Figure 6.

Discussion
Rubio-Tamayo et al. [35] stated that VR and technologies associated with the virtuality continuum are "emerging media" referring to VR as a concept and proposing models to link it to other domains, such as UX. While the same authors follow the "bidirectional communication theory" approach by Marko [133], we focus on the link of multimodal HCI technology in the particular domains of Concepts and overviews, Medicine, Physics, Transportation, Cultural heritage, and Industry, in addition to UX, in order to highlight specific technology configurations that are typical for each domain, suggesting future lines of work from the characteristics of these configurations. These possible future work options, in turn, stem from the answer to RQ1 and RQ2:

Research Question 1-How has the type of technology changed over time in each domain?
As can be observed throughout Section 3, in general, the main types of technology used by all domains over time are haptics and VR. In addition, AR is used more in the domains of Industry and Cultural Heritage than in the other domains. In contrast, in domains such as Transportation or UX, the use of AR is almost nil. In 2013, the Medicine and UX domains had higher use of haptics, while the Physics domain had higher use of VR. Some domains such as Industry and Transportation had equal use of haptics and VR, and the Concepts and Overviews and Cultural Heritage domains do not present works. For 2014, the domain with the highest use of haptics is Medicine, while the domains with the highest use of VR are Physics, UX, Transportation, and Industry, and the Concepts and Overviews and Cultural Heritage domains present an equal use of haptics and VR. It was also shown that the domains of Medicine, Industry, and, for the most part, Cultural Heritage work with AR for that year. In 2015, the Physics domain presents a greater use of haptics; the Concepts and Overviews, UX, Transportation, Cultural Heritage, and Industry domains present a greater use of VR; and the Medicine domain presents an equal use of haptic and VR technologies. For AR, the Concepts and Overviews and Industry domains are the only ones that present works. Then, in 2016, the dominant technology is VR in the domains of Concepts and Overviews, Physics, UX, and Transportation, while in the domain of Medicine there is an equal use of haptics and VR, and in the domain of Industry there is an equal use of haptics, VR, and AR. AR is also used that year in the domains of Concepts and Overviews, Medicine, Physics, and Cultural Heritage.
In 2017, there is greater use of VR in the domains of Medicine, UX, and Transportation, while the domains of Concepts and Overviews, Physics, and Industry present an equal use of haptics and VR, and the Cultural Heritage domain presents an equal use of VR and AR. In subsequent years, there is a decrease in the number of works developed in all domains, but VR remains the leading technology, supported in some cases by haptics or AR.
The synthesis aims to provide an overview of each domain's technological state by noting changes in technology type for each. An example of configuration change for technology type over time is found in the work of Pacchierotti et al. [7], which reviews the progression of haptic systems for the fingertip and the hand from stationary to wearable devices over ten years. Typical technology type configurations for a specific domain change not only as new applications appear but also with the technology types available to approach a particular domain problem. In this review, we find a difference in the frequency of works related to their domain and technology type, as shown in Section 2.

Research Question 2-What is the typical technology type by domain?
Rubio-Tamayo et al. [35] affirm that "it is necessary to determine what we want to develop as an experience and how to connect it in a more multisensorial experience". Aiming to develop this concept, we propose an updated review to report how experts working in each domain design applications, choosing specific types of technology when they develop the kind of experience that is appropriate for their target users and, hence, the relevance of the selected technology type. We found that, for the works reviewed in Section 3.2.2, the most used technology type is VR across all domains (Concepts and overviews, Physics, Transportation, Cultural heritage, Industry, and UX), except in Medicine where haptics is the most used, followed by VR.
Examples of VR and technologies associated with the virtuality continuum as emerging media are found in specific configurations according to the studied domain and time of application. For instance, Rose et al. [51] reviewed several studies aiming to understand the impact of VR and haptic feedback on the application of healthcare, noting that viewing mediums acquire immersive properties as technology advances: from computer monitors to panoramic TVs to Head-Mounted Displays (HMD). Similarly, Hamza-Lup et al. [49] surveyed visuo-haptic systems for surgical training. When the studied domain changes, so does the configuration of technological tools. For example, Stamer et al. [100] explore the use of VR with a haptic glove. They report the design of an application that uses a visuo-haptic system but only in a configuration that responds to its context; thus, it is different in technology type from the typical healthcare visuo-haptic application. We aim to show that the term visuo-haptic has a tool set with a different technology type, depending on the domain and time. The applications cited in Medicine and Transportation research show a different visuo-haptic conceptualization. This temporal analysis provides data to answer RQ3:

Research Question 3-How has the frequency of research publications changed over time by domain?
For the set of reviewed works between 2013 and 2021, Figure 2 shows a decreasing frequency in all domains since 2017. Some authors [134,135] mention that such a decrease in the development of this type of work may be due to various causes. Among those mentioned by said authors, increasingly complex applications can have greater barriers to adoption. In addition, a higher resolution is needed, which implies greater development and, therefore, greater investment. Despite this, it is expected that these barriers will decrease over time and that there will be a positive evolution in the coming years. Furthermore, we noted that the lack of widespread adoption of the haptic, AR, and VR technologies could be the cause for this decrease, as shown in Section 3.2.1. Specifically, the following appears to apply:

• The domains of Concepts and overviews and UX do not clearly present new works that have the characteristics to be part of this analysis since 2018, and the domain of Cultural heritage does not clearly present such works since 2019.
• One work in the Transportation domain from 2020 has the characteristics to form part of this analysis, while no new work with these characteristics was found for the other domains.
Finally, the frequency of works by domain does not provide enough insight into the relation between technology type and domain. Section 3.2.3 and Figure 5 show the three clusters identified by k-means and the distribution of papers from each domain, revealing that, for the works reviewed, the domains share a significant number of technology and interaction types. Specifically, for the works reviewed, the following was observed:
• In all domains, we found works with applications of haptics and VR.
This information is the reference to answer our RQ4:

Research Question 4-How are research publications grouped across metrics given their similarities?
Figure 5 and Table 9 show the existence of three clusters, and they also show how each domain contributes to each cluster. The smallest is the green cluster, which shows works from closely related domains since there is a relationship between the domains Concepts and overviews, Medicine, and to a more significant extent Physics, UX, and Transportation, while leaving out the domains of Cultural heritage and Industry. The medium-sized cluster is the red cluster. It is the cluster with the fewest related domains since a more significant relationship is presented only in the domains Concepts and overviews, Medicine, and to a lesser extent Transportation, excluding the other four domains studied in this work.
Finally, the largest cluster is shown in blue. It includes Physics and, to a greater extent UX, Cultural heritage, and Industry, leaving out the Concepts and overviews, Medicine, and Transportation domains. Table 10 shows the total number of works by domain. There is a greater number of works related to UX, Industry, Physics, and Medicine and, to a lesser extent, Concepts and overviews, Cultural Heritage, and Transportation. This table also shows how the maximum number of works in each domain has been published within our studied period of 2013-2017. This table also shows the types of display most commonly used among those analyzed (with prevailing VR and Haptic) and the most commonly used interactions (prevailing gizmo and tracking). These findings possibly open the door to novel works that would exploit the combinations of different types of displays and interactions within any domain or combination of domains, with a solid knowledge base.

Conclusions, Limitations, and Future Work
We presented a systematic review of multimodal human-computer interaction showing how different technologies are used in various domains. We defined the initial set of domains calibrated to provide balanced numbers of papers in each area. We then studied the works from the viewpoint of temporal patterns, frequency of usage in technology types in different domains, and cluster analysis by using paper metrics. This analysis allowed us to answer relevant questions when searching for the next steps in work related to multimodal HCI.
We studied the typical technology type, how the technology type and frequency have changed over time in each domain, and how papers are grouped across metrics given their similarities. We determined that VR and haptics are the most widely used technologies in all domains. While VR is the most used, haptic interaction appears in an increasing number of applications, suggesting future work on applications that combine VR and haptics. We can only speculate about the implications for future development. Still, it seems that VR will become more common in many areas, and haptic technologies should follow with a logical expansion.
We found the following limitations in the technology type and domain approach. (1) The clustering process operates over seven domains and four types of interaction, resulting in three clusters with low density. (2) The clustering uses domains and types of interaction but not the application type. (3) This review reports the applications used in a particular domain but does not indicate how well they are used. We also recognize that not all training technologies were included in this review. For example, technologies such as type and screen for medical training were not included in the medical domain, nor was task-specific virtual reality training for rehabilitation. These omissions result from the key search terms and the defined inclusion and exclusion criteria.
There are many possible avenues for future work: (1) designing and proposing a metric to report the effectiveness of the technology used in each domain; (2) analyzing and describing the process behind the frequency fluctuations shown in Section 3.2.1 (Temporal Analysis) to explain the decrease in works per year in every domain; (3) further exploring the clustering by domains to obtain more compact clusters.
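As one possible starting point for item (3), cluster compactness can be quantified with the mean silhouette coefficient. The sketch below is our illustrative assumption and not a metric used in this review: the silhouette function and the toy points are hypothetical, and Euclidean distance is assumed. Scores close to 1 indicate compact, well-separated clusters, while scores near or below 0 indicate overlapping or poorly assigned ones.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette(points, labels):
    """Mean silhouette coefficient over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so divide by len(own) - 1).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy points: two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette(points, [0, 0, 1, 1])  # labels follow the two groups
bad = silhouette(points, [0, 1, 0, 1])   # labels cut across the groups
print(f"good={good:.3f} bad={bad:.3f}")
```

Comparing this score across different numbers of clusters or different feature encodings would be one way to check whether a revised clustering is more compact than the current one.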

Data Availability Statement:
The data supporting reported results of the analysis can be found in: https://github.com/JoseDanielAzofeifa/Files-for-analysis-of-Systematic-Review-of-Multimodal-Human-Computer-Interaction-.git.