Article

Enhancing Multi-Modal Perception and Interaction: An Augmented Reality Visualization System for Complex Decision Making

1 School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
2 Purple Mountain Laboratories, Nanjing 211189, China
3 School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
4 School of Transportation, Southeast University, Nanjing 211189, China
* Author to whom correspondence should be addressed.
Systems 2024, 12(1), 7; https://doi.org/10.3390/systems12010007
Submission received: 15 November 2023 / Revised: 21 December 2023 / Accepted: 22 December 2023 / Published: 25 December 2023
(This article belongs to the Section Systems Engineering)

Abstract

Visualization systems play a crucial role in industry, education, and research domains by offering valuable insights and enhancing decision making. These systems enable the representation of complex workflows and data in a visually intuitive manner, facilitating better understanding, analysis, and communication of information. This paper explores the potential of augmented reality (AR) visualization systems that enhance multi-modal perception and interaction for complex decision making. The proposed system combines the physicality and intuitiveness of the real world with the immersive and interactive capabilities of AR systems. By integrating physical objects and virtual elements, users can engage in natural and intuitive interactions, leveraging multiple sensory modalities. Specifically, the system incorporates vision, touch, eye-tracking, and sound as multi-modal interaction methods to further improve the user experience. This multi-modal nature enables users to perceive and interact in a more holistic and immersive manner. The software and hardware engineering of the proposed system are elaborated in detail, and the system’s architecture and preliminary function testing results are also included in the manuscript. The findings aim to aid visualization system designers, researchers, and practitioners in exploring and harnessing the capabilities of this integrated approach, ultimately leading to more engaging and immersive user experiences in various application domains.

1. Introduction

Visualization systems can provide valuable insights and aid in decision-making processes [1]; therefore, they have become indispensable tools in various domains, including industry, education, and research. These systems enable the representation of complex workflows and data in a visually intuitive manner, enhancing the understanding, analysis, and communication of information [2]. However, traditional visualization systems rely mainly on visual perception, which limits users’ ability to fully engage with the data and constrains the potential of such systems.
To overcome these limitations, researchers have explored the integration of augmented reality and visualization systems to enhance the user experience and improve multi-modal perception and interaction. AR systems overlay virtual elements onto the real-world environment, creating an immersive interactive experience [3]. Tangible interaction, on the other hand, involves physical objects that the user can directly observe and touch; coupling visualization with such objects lets users adjust real-world artifacts directly, improving their ability to handle complex decisions. By combining these two approaches, users can draw on multiple sensory modalities for natural and intuitive interaction. A previous tangible user interface (TUI) study explored the “tangible virtual interaction” between tangible Earth instruments and virtual data visualization and proposed that head-worn AR displays allow seamless integration between virtual visualization and contextual tangible references such as physical Earth instruments [4]. In addition to enhancing the user experience, the integration of AR and visualization systems also brings benefits in terms of accessibility and inclusivity. Users with motion impairments can use their body posture and movements to manipulate virtual objects, enabling them to interact more effectively with virtual elements and thereby overcoming the limitations of traditional input devices.
This study proposes an advanced augmented reality visualization system that incorporates multi-modal perception and interaction methods. The system seamlessly integrates virtual elements into the real-world environment, enhancing users’ interaction with their surroundings. By employing various multi-modal interaction methods, including visual, tactile, and auditory channels, users can easily identify and engage with virtual elements superimposed onto their physical reality. The system also provides interactive feedback, allowing users to physically interact with virtual objects and enhancing the overall sense of realism. In addition, our system incorporates eye-tracking technology, which enables more intuitive and natural interactive visualization and adds a degree of convenience.
The rest of this article is organized as follows. First, we review the literature on visualization systems, AR technology, and virtual user interfaces. We then introduce the AR visualization system for improving the user experience, including its architecture and functional design. Subsequently, we present the results of the AR interaction and eye-tracking experiments conducted with the system. Finally, we discuss the data analysis and the system’s functionality, as well as its limitations and future research prospects. Overall, the system described in this article provides a deep understanding of the integration aspects of AR visualization systems, showcasing their functionality and potential applications. By exploring and harnessing the capabilities of this integrated approach, we can unlock new possibilities for enhancing multi-modal perception and interaction, ultimately revolutionizing the way we interact with visualized data and workflows.

2. Literature Review

2.1. Visualization Systems

Visualization systems play a crucial role in aiding the comprehension and analysis of data [5]. These systems allow users to transform raw data into visual representations, providing a more intuitive and interactive way to explore and understand information. Visualization systems offer numerous benefits that contribute to their widespread adoption in various domains. One of the primary advantages is the ability to uncover patterns and relationships that may not be apparent in raw data. By presenting data in a visual form, users can easily identify trends, outliers, and correlations, leading to more informed decision-making processes [6].
Visualization systems are applied in a wide range of domains, including scientific research, business analytics, and healthcare. In scientific research, visualization systems have been instrumental in understanding complex phenomena, such as environmental monitoring [7]. In business analytics, visualization systems are used for communication, information seeking, analysis, and decision support [8]. In healthcare, visualization systems aid in electronic medical records and medical decision making, enhancing patient care and outcomes [9].
As technology continues to advance, visualization systems are expected to evolve in various ways. One emerging trend is the integration of virtual reality (VR) and augmented reality technologies into visualization systems [10]. In the context of augmented reality, visualization systems can leverage the capabilities of AR technology to present data in a more intuitive and context-aware manner. Various visualization techniques, such as 3D models, graphs, charts, and spatial layouts, have been explored to enhance data exploration and understanding. Martins [11] proposed a visualization framework for AR that enhances data exploration and analysis. The framework leverages the capabilities of AR to provide interactive visualizations in real-time, allowing users to manipulate and explore data from different perspectives. By combining AR with visualization techniques, users can gain deeper insights and make more informed decisions. Additionally, there has been a growing interest in collaborative visualization systems using AR. Chen [12] developed a collaborative AR visualization system that enables multiple users to interact and visualize data simultaneously. The system supports co-located and remote collaborations, enhancing communication and understanding among users.
In recent years, there has been a growing interest in incorporating multi-modal feedback, including visual and tactile cues, to create more immersive and intuitive experiences. Haptic feedback has been explored as an essential component of visualization systems to provide users with a tactile sense of virtual objects. Haptic feedback can enhance the user’s perception of shape, texture, and force, allowing for a more realistic and immersive experience. Several studies have investigated the integration of haptic feedback into visualization systems, such as the use of force feedback devices [13] or vibrotactile feedback [14]. These approaches enable users to feel and manipulate virtual objects, enhancing their understanding and engagement with the data. In summary, the integration of multi-modal perception and interaction in visualization systems, particularly through the use of augmented reality and user interfaces, has been an active area of research. Previous studies have demonstrated the benefits of combining visual and tactile feedback to create more immersive and intuitive experiences [14]. The visualization system proposed in this study presents information that is difficult to convey in text or everyday settings, such as a sensor’s sensing range, in a 3D format anchored to the associated equipment, in contrast to earlier approaches in which the visualized information is entirely virtual or detached from the physical devices. By mapping virtual information onto tangible physical objects and providing multi-modal feedback, including visual and auditory cues, the system offers timely and reliable information support for user decision making.

2.2. Augmented Reality Technology

Augmented reality is a technology that overlays virtual information onto the real world, enhancing the user’s perception and interaction with the environment [15]. This integration is achieved through the use of computer vision techniques, tracking technologies, and display devices. Azuma [16] introduced the concept of AR as a combination of real and virtual environments, where virtual objects are seamlessly integrated into the physical world. AR enables users to perceive and manipulate virtual objects in a real-world context, leading to improved spatial understanding and an enhanced user experience.
AR has gained significant attention in various domains, including education [17], healthcare [15], and entertainment [18]. Several studies have explored the benefits and challenges of AR in different applications [19,20]. In recent years, advancements in hardware, such as smartphones and head-mounted displays (HMDs), have made AR more accessible and widely adopted. HMDs, like Microsoft HoloLens and Magic Leap, provide immersive experiences by overlaying virtual objects directly into the user’s field of view [21]. These devices offer a wide range of possibilities for visualization and interaction in AR systems. Researchers have explored different AR techniques [22,23,24] to improve the user’s visual perception and engagement. Studies have shown that AR can provide a more immersive and interactive experience by combining virtual objects with real-world surroundings, offering opportunities for enhanced learning, training, and decision-making processes [25]. However, some challenges need to be addressed for the successful implementation of AR. One major challenge is the accurate and robust tracking of the user’s position and orientation in real time [26]. Various tracking techniques, such as marker-based [27], sensor-based [28], and Simultaneous Localization and Mapping (SLAM) [29], have been developed to overcome this challenge. Our system is designed for versatility and scalability, applicable across various fields such as industry, education, and healthcare. This universal design enhances the practical value of our system, addressing diverse user needs in different domains and increasing its utility and potential for widespread application.
Another challenge is the design and development of intuitive and natural user interfaces for AR systems. Traditional input devices, such as keyboards and mice, may not be suitable for AR interactions. Therefore, researchers have explored alternative input methods, including gesture recognition and voice commands, to enhance user engagement and interaction [30,31,32]. However, relying on gesture recognition or voice commands alone provides only limited interaction options, constrains the user’s operating methods, and raises reliability and accuracy issues, while lacking diversity and flexibility. Therefore, multiple interaction methods should be provided in AR applications to ensure the widespread adoption and successful implementation of AR technology in various applications.
To address these challenges and provide users with a richer and more immersive experience, this study combines AR and visualization systems. By integrating visual, auditory, and tactile multi-modal perception and interaction, AR applications can offer a more comprehensive and engaging user experience. This approach expands the possibilities for interaction and enhances the user’s ability to manipulate and explore virtual objects in the real world.

2.3. Virtual User Interfaces

Virtual User Interfaces, including tangible user interfaces, provide a physical and tangible means for users to interact with digital information [33]. TUIs enable users to manipulate virtual objects or control digital systems through physical artifacts or objects [34]. This interaction paradigm enables users to leverage their existing knowledge and skills to manipulate and participate in digital content in a more natural and meaningful way, making technology easier to use and user-friendly [35].
Unlike traditional graphical user interfaces (GUIs), TUIs provide a more embodied and tangible interaction experience by utilizing physical objects as input and output devices. These physical objects, also known as “affordances”, are designed to represent and convey digital information in a perceptible and manipulable form [36]. TUIs offer several benefits over traditional interaction methods. One of the key advantages is their ability to leverage humans’ innate physical and sensorimotor skills, enabling a more natural and intuitive interaction. By providing physical objects that users can grasp, touch, and move, TUIs engage multiple senses and enhance the user’s spatial awareness and cognitive engagement [37]. Additionally, TUIs facilitate a more tangible and embodied understanding of digital information, as users can directly manipulate and explore physical objects that represent abstract data [34].
TUIs offer multi-modal feedback and intuitive manipulation of virtual objects, making them suitable for AR environments. Research has shown that TUIs in AR visualization systems can enhance user collaboration, spatial cognition, and overall user experience. Sketched Reality [32] combines AR technology and TUI technology to achieve bidirectional interaction through tactile feedback and physical interaction. This bidirectional interaction method enables users to feel the existence of virtual objects more realistically, enhancing the immersion and interactivity of AR applications. Ubi Edge is an edge-based augmented reality tangible user interface authoring tool. This system allows users to control augmented reality elements by sliding or clicking on the edges of physical objects. For example, users can change the color of virtual light bulbs by sliding on the edge of a coffee cup or activate AR shooting animations by clicking on the edge of a toy airplane. These examples demonstrate the potential and application of multi-modal perception TUIs in augmented reality environments [38]. The combination of eye-tracking and user interaction feedback is a promising development direction. Using the fixation point of the eyes, the system can discern the user’s intention and offer corresponding feedback. By embedding interactive objects in the TUI, when the user gazes at a specific object, the system can detect the user’s fixation point through eye-tracking technology and provide relevant feedback, enabling control that approximates thought-driven (“pseudo-ideation”) interaction. This allows users to tailor interaction methods to their preferences, fostering a tighter connection between the system and users, ultimately enhancing user satisfaction and improving the overall experience.

3. Materials and Methods

3.1. System Framework

This study proposes an augmented reality visualization system with multi-modal perception and interaction, aiming to elevate the capabilities of this integrated approach. The system is developed by combining Unity, HoloLens, and the Augmented Reality Toolkit. By leveraging these technologies, we aim to provide more intuitive, accurate, and comprehensive support for complex decisions. The system consists of several modules, each playing a crucial role in achieving our goals. These modules include the member management module, the augmented reality interface module, the user behavior interaction module, the eye-tracking data acquisition module, and the AR experiment process management module, as shown in Figure 1.
  • Member management module: This module is a comprehensive system that includes system tutorials, system experiments, data recording, and data processing and analysis. Participants can familiarize themselves with augmented reality systems through this module, conduct interactive experiments, and record real-time data. The module provides a foundation for analyzing the behavior and attention distribution of participants, ensuring the accuracy and reliability of experimental results.
  • Augmented reality interface module: This module provides researchers with a user-friendly and reliable platform for conducting experiments and refining AR experiences. It utilizes Unity, the HoloLens device, the Vuforia platform, and the Mixed Reality Toolkit to create immersive AR scenes, seamlessly integrating virtual objects into real environments and enabling device locomotion-based virtual content tracking, specific image recognition, and various interaction modalities. This integration establishes a unified framework, enhancing the overall cohesion and functionality of the augmented reality system.
  • User behavior interaction module: This module enables users to interact with the augmented reality environment through various input methods, including voice commands, gestures, and user interfaces. It provides a flexible and intuitive way for users to manipulate virtual objects and navigate the system.
  • Eye-tracking data acquisition module: This module stores the aggregated gaze data locally, providing spatial location and timing information for subsequent statistical analyses. This accurate and convenient platform offers researchers valuable insights into users’ visual behavior patterns and interface design issues in virtual environments.
  • AR experiment process management module: This module ensures the smooth execution and management of augmented reality experiments. It provides tools for designing and conducting experiments, collecting data, and managing experimental processes, helping to improve the efficiency and accuracy of experiments while also promoting the work of researchers.
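To make this decomposition concrete, the sketch below shows one way the five modules could be composed in a Unity scene. The interface and class names are illustrative assumptions for exposition; they do not reproduce the system’s actual code.

```csharp
using UnityEngine;

// Minimal stand-ins for the five modules described above; the paper does not publish the
// system's source, so these class names and members are illustrative assumptions only.
public interface ISystemModule { void Initialize(); }

public class MemberManagementModule : MonoBehaviour, ISystemModule { public void Initialize() { /* tutorial, experiment, data recording, analysis */ } }
public class ARInterfaceModule : MonoBehaviour, ISystemModule { public void Initialize() { /* Unity + MRTK + Vuforia scene setup */ } }
public class UserBehaviorInteractionModule : MonoBehaviour, ISystemModule { public void Initialize() { /* voice, gesture, UI input */ } }
public class EyeTrackingAcquisitionModule : MonoBehaviour, ISystemModule { public void Initialize() { /* gaze data logging */ } }
public class ExperimentProcessModule : MonoBehaviour, ISystemModule { public void Initialize() { /* experiment flow control */ } }

// Composition root: finds the module components in the scene hierarchy and initializes them.
public class ARVisualizationSystem : MonoBehaviour
{
    private void Start()
    {
        foreach (ISystemModule module in GetComponentsInChildren<ISystemModule>())
        {
            module.Initialize();
        }
    }
}
```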
To implement this system, the following software and hardware configurations are required: the Unity development platform, HoloLens headset, and Augmented Reality Toolkit. Unity is a powerful game engine and development platform that enables the creation of interactive and immersive experiences, serving as the foundation for developing the augmented reality visualization system. The HoloLens is a wearable mixed reality device developed by Microsoft that combines virtual reality and augmented reality capabilities, allowing users to interact with virtual objects in the real world. The Augmented Reality Toolkit is a software library that provides tools and resources for developing augmented reality applications, including features for 3D object recognition, tracking, and interaction, which are essential for our system’s functionality.
After the scene data and system settings are completed, the system enhances multi-modal perception and interaction by incorporating various modes of interaction. Users wear HoloLens glasses to access the AR scene, and the system recognizes device information through Vuforia scanning. Users can navigate and interact using voice commands, and hand gestures are detected for precise manipulation. Physical props can also be used to interact with virtual objects. Eye-tracking data are recorded for analysis, and an experimental process management module streamlines the evaluation and improvement of the system. This comprehensive approach improves the user experience and usability.

3.2. Member Management Module

The member management module is the foundation of the business process control mechanism. This module can provide clear guidance and assistance to the participating members, leading them to fully engage in the experimental environment of this augmented reality visualization system. The member management module mainly includes four key parts: system tutorial, system experiment, data recording, and data processing and analysis, as shown in Figure 2.
  • System tutorial: Before conducting the visualization system experiment, participating members first undergo system tutorial learning. Through the system tutorial, participants can gain a detailed understanding of the operational steps and processes in the augmented reality experimental environment using the HoloLens device. The system tutorial aims to provide necessary guidance, enabling participants to familiarize themselves with the system’s functionality and interaction methods, and ensuring their correct usage of the system for subsequent experiments.
  • System experiment: After completing the system tutorial, participants enter the formal system experiment phase. Participating members interact with the augmented reality system scenario through actions such as clicking, gazing, and voice commands. The experiment design allows participants to freely explore the system’s features and characteristics, collecting data during the experiment.
  • Data recording: During the experiment, the system can record real-time experimental data of participating members. This includes recording system interaction videos, eye gaze coordinates, gaze duration, and eye gaze trajectory heatmaps. Accurate recording of participants’ behavior and attention focus provides a necessary foundation for subsequent data analysis.
  • Data processing and analysis: After the experiment, the experimental data for each participant are processed and analyzed. This includes experiment replays, analysis of participants’ gaze data, and plotting scatter diagrams representing participants’ eye gaze ranges. Through statistical analysis and visualization techniques, the behavior patterns and attention distributions of participants during the experiment can be revealed, supporting further analysis and conclusions.
The member management module of this system ensures the controllability and repeatability of the augmented reality visualization system experiment, ensuring the accuracy and reliability of the experimental results. Additionally, valuable empirical data and references are provided for future research work and improvements in system performance.

3.3. Augmented Reality Interface Module

The AR interface module acts as a conduit between the system and the HoloLens device, augmenting multi-modal perception and interactivity for a more immersive user experience. It brings virtual scenes into users’ actual field of view by overlaying AR content onto the HoloLens headset display. The module offers researchers a reliable, user-friendly HoloLens research platform, enabling them to concentrate on experimental design and AR experience refinement without contending with convoluted technical intricacies. To achieve this, the module capitalizes on Unity’s integration with HoloLens to render and showcase captivating AR scenes. As shown in Figure 3, the AR interface module serves as a bridge between the system and the device.
Unity, as a prevalent cross-platform game engine, serves as the primary development framework. Its abundant tools and engine support facilitate the crafting of 3D scenes and user interfaces. Unity is leveraged to create and render AR scenes encompassing virtual objects, 3D models, and user interface elements. Its robust graphics engine renders virtual content to the HoloLens headset, while the device’s innate spatial mapping and gesture recognition integrate virtual objects seamlessly into real environments for remarkably authentic AR experiences.
The module’s development harnessed the Mixed Reality Toolkit (MRTK), an open-source toolkit furnishing fundamental components and features to streamline cross-platform AR application development. The module supports diverse interaction modalities, including gesture control, air tap, voice commands, and eye-tracking. Catering to varied research requirements, it offers flexible customization capabilities to introduce novel virtual objects and adjust scene layouts while facilitating the storage and visualization of user behavior data for analysis.
In order to enrich the AR function of the system and device, we also added device locomotion features to the AR Interface Module. The device locomotion features in AR systems can track real objects and overlay virtual content on them to enhance interactivity and tangibility through recognition and positioning methods. In this system, we primarily utilize Vuforia’s scanning capability to implement device locomotion. Vuforia is a cross-platform AR application development platform with robust tracking and performance on various hardware, including mobile devices and mixed-reality head-mounted displays (HMDs), such as Microsoft HoloLens [39].
In this system, the objects to be recognized are imported into the Vuforia recognition library to generate a corresponding Unity package, with star ratings reflecting recognition quality. The Unity package is then imported into Unity. In Unity, Vuforia is configured, and the AR camera is used to detect and interact with the objects to be recognized. When a real-world object is recognized, virtual content is bound to it, and users can interact through touch, rotation, tilt, or other gestures. Specifically, Vuforia’s scanning function first performs image recognition. The user uses the camera on the mobile device to scan and recognize specific images, logos, objects, or scenes from the real world. These images are usually specific patterns or markers used to determine the user’s position and orientation. Next, Vuforia extracts visual feature points such as corners and edges from the recognized images. These feature points are used to build a feature database for matching virtual content to physical objects. By matching the real-time image against feature points in the feature database, Vuforia tracks the position and orientation of the user’s device in real time. This ensures the alignment of virtual content with the physical world. Finally, once the user’s position and orientation are determined, Vuforia overlays the virtual content in the user’s view aligned with the physical object, using rendering techniques to ensure consistency of lighting, perspective, and scale between the virtual and real worlds.
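As an illustration of how virtual content can be bound to a recognized target, the following minimal sketch assumes Vuforia Engine’s default observer event handler, whose “On Target Found” and “On Target Lost” events can be wired to these methods in the Unity Inspector; the field names and offset are illustrative, not the system’s actual code.

```csharp
using UnityEngine;

// Attach to a Vuforia image/object target and wire HandleTargetFound / HandleTargetLost
// to the target's "On Target Found" / "On Target Lost" events in the Inspector
// (DefaultObserverEventHandler in Vuforia Engine 10; names may differ across versions).
public class DeviceOverlayBinder : MonoBehaviour
{
    [Tooltip("Virtual status panel that should follow the recognized physical device.")]
    [SerializeField] private GameObject statusOverlay;

    [Tooltip("Offset so the panel floats above the device instead of intersecting it (illustrative value).")]
    [SerializeField] private Vector3 localOffset = new Vector3(0f, 0.15f, 0f);

    public void HandleTargetFound()
    {
        // Parent the virtual content to the tracked target so it follows the physical object.
        statusOverlay.transform.SetParent(transform, worldPositionStays: false);
        statusOverlay.transform.localPosition = localOffset;
        statusOverlay.SetActive(true);
    }

    public void HandleTargetLost()
    {
        // Hide the overlay when tracking is lost to avoid stale, misplaced content.
        statusOverlay.SetActive(false);
    }
}
```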

3.4. User Behavior Interaction Module

User behavior involves multi-modal perception and interaction, including clicking, voice, gesture recognition, physical manipulation, etc. It needs to be integrated with the AR module to enable users to observe virtual information and interact with it on mobile devices. This provides users with a more intuitive and efficient interactive experience. The module consists of three main parts: multi-modal perception, interactive objects, and a feedback mechanism, as shown in Figure 4.
In multi-modal perception, the module recognizes user input methods such as voice, gestures, and touch to capture real-time behavior and needs. This allows users to select 3D content by clicking or using a ray emitted from their hand. Interactive objects in the 3D world can trigger events, such as touching buttons and 3D objects, allowing users to directly interact with the system through wearable devices. The feedback mechanism provides users with timely feedback on their operations. This can be visual, such as highlighting and finger cursor feedback, or auditory, with sound effects at different user selection statuses (including observation, hovering, touch start, touch end, etc.).
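The sketch below illustrates how such per-object feedback could be implemented with MRTK 2.x input events; the highlight color and sound clips are illustrative assumptions, and it presumes a material exposing a color property. It is a minimal example rather than the system’s actual implementation.

```csharp
using Microsoft.MixedReality.Toolkit.Input;
using UnityEngine;

// Example feedback behaviour for an interactive 3D object, using MRTK 2.x focus and pointer events.
public class InteractableFeedback : MonoBehaviour, IMixedRealityFocusHandler, IMixedRealityPointerHandler
{
    [SerializeField] private Color highlightColor = Color.cyan; // illustrative highlight
    [SerializeField] private AudioClip hoverSound;              // played on observation/hover
    [SerializeField] private AudioClip selectSound;             // played at touch start

    private Renderer objectRenderer;
    private AudioSource audioSource;
    private Color originalColor;

    private void Awake()
    {
        objectRenderer = GetComponent<Renderer>();
        audioSource = GetComponent<AudioSource>();
        originalColor = objectRenderer.material.color; // assumes a shader with a color property
    }

    // Visual + auditory feedback when the object gains focus (gaze or hand ray).
    public void OnFocusEnter(FocusEventData eventData)
    {
        objectRenderer.material.color = highlightColor;
        if (audioSource != null && hoverSound != null) audioSource.PlayOneShot(hoverSound);
    }

    public void OnFocusExit(FocusEventData eventData)
    {
        objectRenderer.material.color = originalColor;
    }

    // Feedback at touch start (air tap or direct touch).
    public void OnPointerDown(MixedRealityPointerEventData eventData)
    {
        if (audioSource != null && selectSound != null) audioSource.PlayOneShot(selectSound);
    }

    public void OnPointerUp(MixedRealityPointerEventData eventData) { }
    public void OnPointerClicked(MixedRealityPointerEventData eventData) { }
    public void OnPointerDragged(MixedRealityPointerEventData eventData) { }
}
```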
By combining these modules, the system offers users a highly intelligent and personalized interactive experience. By integrating various input methods, users can interact with virtual information more intuitively and efficiently. Additionally, the inclusion of interactive objects and a feedback mechanism ensures that users receive timely and informative feedback on their actions, further enhancing their understanding and control of the system. Overall, the multi-modal perception and interaction module greatly enhances the interactive experience and enables users to effectively collaborate and innovate in real-world scenarios.

3.5. Eye-Tracking Data Acquisition Module

The eye-tracking data acquisition module capitalizes on the integrated eye-tracking system of the HoloLens augmented reality headset to gather gaze data required by researchers through customized eye-tracking scripts. The system encompasses dedicated eye-tracking cameras and sensors that enable high-fidelity, low-latency tracking, along with automated pupil finding and head movement compensation. This module consists of a data collection sub-module and a data processing sub-module.

3.5.1. Data Collection Methods

The data collection sub-module activates when users interact with the augmented reality environment, producing real-time heat maps based on user gaze patterns. It overlays these patterns on augmented reality objects and UI elements to reflect user interactions. It continuously captures the 3D spatial coordinates of users’ gaze points in the augmented reality scene to dissect visual exploration behavior. The module pinpoints specific elements and areas attended to by users during interactions, gauging the time users spend looking at them to evaluate appeal and cognitive load, with longer gaze durations typically indicating greater interest or cognitive load. Furthermore, the module investigates gaze point sequences to gain insights into users’ information processing tactics and attention distribution in the augmented reality environment.
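A minimal sketch of this per-frame gaze sampling is given below, assuming MRTK 2.x’s eye gaze provider on HoloLens 2; the GazeSample structure and the unbounded in-memory buffer are our own illustrative choices rather than the module’s actual design.

```csharp
using System.Collections.Generic;
using Microsoft.MixedReality.Toolkit;
using UnityEngine;

// One gaze sample: where the user was looking, at what, and when.
public struct GazeSample
{
    public Vector3 hitPosition;   // 3D gaze hit point in world space
    public string targetName;     // AR object or UI element being looked at
    public float timestamp;       // seconds since application start
}

// Per-frame gaze sampling via MRTK's eye gaze provider.
public class GazeSampler : MonoBehaviour
{
    private readonly List<GazeSample> samples = new List<GazeSample>();

    private void Update()
    {
        var eyeGaze = CoreServices.InputSystem?.EyeGazeProvider;
        if (eyeGaze == null || !eyeGaze.IsEyeTrackingEnabledAndValid)
        {
            return; // eye tracking unavailable or not calibrated
        }

        // Raycast along the gaze direction to find the attended object.
        if (Physics.Raycast(eyeGaze.GazeOrigin, eyeGaze.GazeDirection, out RaycastHit hit))
        {
            samples.Add(new GazeSample
            {
                hitPosition = hit.point,
                targetName = hit.collider.gameObject.name,
                timestamp = Time.time
            });
        }
    }

    public IReadOnlyList<GazeSample> Samples => samples;
}
```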

3.5.2. Data Processing Methods

The data processing sub-module stores the aggregated gaze data locally on the HoloLens device, containing spatial location and timing information. Location information logs the x, y, and z coordinates of users’ gaze points during all interactions with augmented reality objects and UI elements, along with corresponding timestamps. This enables the generation of scatter plots, scan paths, and areas of interest based on aggregated gaze points over time. Timing information records the duration users spent looking at various AR interface components and augmented reality objects throughout the interaction. Comparing total gaze times on different interface elements can identify areas needing optimization to refine the user experience. Gaze duration furnishes quantitative temporal insights into visual information processing during AR interactions. The preserved eye-tracking data facilitate subsequent statistical analyses to uncover users’ visual behavior patterns, interface design issues, and more. This HoloLens-based eye-tracking approach offers researchers a convenient and accurate platform for gathering virtual environment interaction data.
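Building on the GazeSample records from the previous sketch, the following example shows how total gaze duration per interface element could be aggregated and how raw samples could be written to a local CSV file. The attribution rule (charging each frame interval to the previously fixated target) and the file name are assumptions, not the system’s documented behavior.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;
using UnityEngine;

// Processing sketch for the stored gaze samples: per-target total gaze duration and a CSV export.
public static class GazeDataProcessor
{
    // Sum the time spent on each interface element by attributing each frame interval
    // to the target that was being looked at at the start of that interval.
    public static Dictionary<string, float> TotalGazeDurations(IReadOnlyList<GazeSample> samples)
    {
        var durations = new Dictionary<string, float>();
        for (int i = 1; i < samples.Count; i++)
        {
            float dt = samples[i].timestamp - samples[i - 1].timestamp;
            string target = samples[i - 1].targetName;
            durations[target] = durations.TryGetValue(target, out float t) ? t + dt : dt;
        }
        return durations;
    }

    // Persist raw samples locally (x, y, z, target, timestamp) for later scatter plots and scan paths.
    public static void ExportCsv(IReadOnlyList<GazeSample> samples, string fileName = "gaze_log.csv")
    {
        var sb = new StringBuilder("x,y,z,target,timestamp\n");
        foreach (var s in samples)
        {
            sb.AppendLine(string.Format(CultureInfo.InvariantCulture, "{0},{1},{2},{3},{4}",
                s.hitPosition.x, s.hitPosition.y, s.hitPosition.z, s.targetName, s.timestamp));
        }
        File.WriteAllText(Path.Combine(Application.persistentDataPath, fileName), sb.ToString());
    }
}
```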

3.6. AR Experiment Process Management Module

The augmented reality experiment process management module is designed for administrators, including the following functions, as shown in Figure 5. Administrators can create experiment projects by inputting basic information such as the name, description, and objectives of the experiment. This allows for better organization and tracking of different experiments, providing a clear understanding of each project’s purpose and goals. In addition, administrators have full control over the design of the experiment process. They can add, edit, and delete experiment steps, allowing for customization and tailoring of the experiment process to meet specific requirements. Detailed information, such as step names, descriptions, keywords, images, and videos, can be provided for each step, ensuring clarity and accuracy in the experiment design.
Once the experiment is designed, users can execute it using the module’s interface. The interface provides real-time information on the progress and results of the experiment, allowing users to stay on track and monitor the experiment’s execution. This ensures that the experiment is carried out smoothly and effectively. Administrators can also manage the overall experiment process. They can create, edit, and delete experiment processes, setting the sequence and steps for the experiment. This ensures a logical and efficient flow of the experiment, improving organization and management.
The module automatically collects data during the experiment process. This includes user operations, eye-tracking data, and user feedback. The collected data can be used for further analysis and evaluation of the experiment, providing valuable insights for administrators and researchers. The module offers a range of functions that empower administrators to create, design, execute, and analyze augmented reality experiments. Its goal is to enhance the efficiency and quality of experiments, benefiting both users and researchers.
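A possible data model for experiment projects and steps is sketched below as Unity ScriptableObjects; the field names simply mirror the information listed above (name, description, objectives, keywords, images, videos) and are assumptions rather than the system’s actual schema.

```csharp
using System;
using System.Collections.Generic;
using UnityEngine;

// One step of an experiment process, holding the descriptive material administrators can attach.
[Serializable]
public class ExperimentStep
{
    public string stepName;
    public string description;
    public List<string> keywords = new List<string>();
    public List<Texture2D> images = new List<Texture2D>();
    public List<string> videoPaths = new List<string>();
}

// An experiment project: basic information plus the ordered sequence of steps to execute.
[CreateAssetMenu(menuName = "AR Experiments/Experiment Project")]
public class ExperimentProject : ScriptableObject
{
    public string projectName;
    public string projectDescription;
    public string objectives;

    // Ordered sequence of steps executed during the experiment.
    public List<ExperimentStep> steps = new List<ExperimentStep>();
}
```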

4. Results

4.1. AR Interactive Experiment

4.1.1. Interactive Experiment Design

To test the system’s performance, we conducted a user study experiment to examine the usability of the different modules. A total of 25 people were recruited through social networks and student organizations on the university campus. We integrated the system into a smart home scenario, as shown in Figure 6, and designed a series of experiments for participants to test the system’s functionality and user experience. First, the real-time status of the smart device was visualized and displayed above the device, and second, the user could set up the smart device using the user interface. In order to enhance the user’s perception of the environment, the system also visualizes the sensor sensing range in relation to data communication. In addition, to simplify the interaction, the system is designed with a scene-switching function, which realizes the rapid transformation of device configuration in different scenes. During the experiment, participants’ interactive actions were recorded, and time markers were used. After the experiment, participants were asked to rate the performance of the system. All subjects gave their informed consent for inclusion before they participated in the study. The protocol was approved by the Ethics Committee of the affiliated university (2023ZDSYLL354-P01).
All participants were first required to complete basic AR operation tutorials to help them learn and become familiar with the AR system and its functions. Firstly, users were asked to observe the visualization state of the device and perform the interaction test of the user interface; secondly, users were asked to observe the sensing perception range and remove the physical objects that we had placed in the range in advance; and lastly, users were asked to switch and observe the communication relationship of the device as well as the status information during different scenes. This was followed by a formal experimental session in which participants were required to configure their smart devices according to different scenario descriptions and prompts in conjunction with their personal needs, as shown in Table 1. They were also asked to explain the reasons for their settings to the experimenter after completing the tasks. In the formal experiment, participants made corresponding smart device configuration decisions by observing the smart device status, sensing range, and communication relationships in the scenarios and were prompted by the experiment descriptions. The participants were tested separately. Firstly, the experimenter introduced the research background, and the participants read and signed the informed consent form. Subsequently, the participants in the AR group wore AR glasses and underwent glasses calibration. After familiarizing themselves with the basic operations, they completed the above scenario, setting tasks in sequence. Instructions for each task were given throughout the experiment, and participants were instructed by the experimenter to proceed to the next task after completing the previous one.

4.1.2. Experiment Results

Multi-modal interaction data were collected from users in interactive experiments, including click counts, consumed time, and other operational data. The counts of clicks and the time consumed by users in different task sessions are shown in Figure 7. Since these two factors are non-normally distributed, we conducted a Wilcoxon analysis and the results of the correlation analysis are shown in Table 2. The results showed that in different scenario setting tasks, the counts of clicks were positively correlated with the consumed time (p < 0.0001, r = 0.73).
The users also demonstrated their exploration of and adaptability to the experimental environment. During the experimental process, users could click to switch scenes, select devices, and set operations to complete the experimental tasks. During the experiment, users performed multiple click operations, and the specific distribution is shown in Figure 8.
The details of time and clicks consumed by different users to complete the tasks are shown in Figure 9. In summary, by analyzing data on user click operations and consumption time, the functional performance of the system can be evaluated, and improvement suggestions can be proposed based on the evaluation results to optimize system performance and user experience.
In addition, during the testing of the sensing perception range, it was found that there were errors in the users’ perception of the boundary of the virtual 3D range. In the corresponding test session, users were asked to move the physical objects placed in the sensing range along different distances and angles while moving them the smallest possible distance, and we recorded the position of the final object, measured it, and obtained the distribution of the perception error, as shown in Figure 10. Since the boundary perception errors are non-normally distributed, we performed a Wilcoxon analysis to examine the users’ straight surface perception errors (angle error) and curved surface perception errors (distance error). The data show that the straight surface error is significantly different from the curved surface error (p < 0.0001, r = 0.09).
User evaluations of the ease of learning and interactivity of the system were collected in different experimental sessions and the results are shown in Figure 11. Since the user evaluation scores were non-normally distributed, we conducted a Wilcoxon analysis to test this. Participants rated both the ease of learning and interactivity of the system highly, with no significant difference between the two (p < 0.01, r = 0.38), indicating that their effects are largely independent of each other. The experimental results suggest that there may be limitations in how well people can perceive boundaries within virtual objects, but they also demonstrate that the system is highly usable and engaging.

4.2. Eye-Tracking Experiment

4.2.1. Eye-Tracking Experiment Design

The visual modality is one of the most intuitive and important interaction channels for humans. It utilizes human visual perception and attention to guide users’ focus and their interactions. By leveraging visual cues, such as highlighting, animation, and visual hierarchy, important information can be emphasized to capture users’ attention within the interface.
To investigate relevant information about user attention and cognitive processes, we designed a comprehensive eye-tracking data collection method in the context of this augmented reality visualization system. During the experimental process, the system records real-time gaze focus data on the graphical interface for each participant based on their interactions with the system. This aids in identifying specific elements and areas that users focus on during the interaction. Additionally, we designed the recording of the duration participants gazed at each graphical interface to assess their levels of attention. Longer gaze duration typically indicates greater interest or cognitive load.
In conducting eye-tracking experiments for augmented reality visualization systems, we invited six participants to ensure breadth and reliability of the experiment. Initially, participants underwent eye-tracking calibration on the AR device to accurately track and record their gaze points during the experiment, accounting for variations in individuals’ eye characteristics. Following eye-tracking calibration, participants engaged in the formal eye-tracking experiment integrated within the smart home scenario. In this experiment, users first activated the real-time status visualization interface of the actual devices using the AR device, which, in turn, triggered the eye-tracking module of the system. Building on the aforementioned experimental steps, participants were free to explore the scene based on their interests and carry out exploratory trials. Additionally, when users activated the real-time status visualization interfaces of multiple devices through eye-tracking, we prompted them to activate the system’s status management interface by fixating on either the left or right palm, allowing them to view the real-time statuses of all devices in the system. Throughout the eye-tracking interactions and experimental process described above, the system recorded each user’s eye-tracking behavior, focus, and gaze data using the eye-tracking data recording method we designed. Furthermore, to visually demonstrate participants’ usage and manipulation of the visualization control panel within the system, a heatmap of their eye gaze, generated based on their gaze duration and frequency on each visualization interface, was overlaid on the interface, as shown in Figure 12.
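The heatmap overlay can be produced by binning gaze hits in a panel’s local coordinates and weighting each bin by gaze duration. The sketch below shows one such accumulation scheme; the grid resolution, panel size, and color ramp are arbitrary illustrative choices, and the paper does not describe the exact method used.

```csharp
using UnityEngine;

// Sketch of per-panel heatmap accumulation: gaze hits are binned in the panel's local 2D space
// and written into a texture that can be overlaid on the visualization interface.
public class GazeHeatmap : MonoBehaviour
{
    [SerializeField] private int resolution = 64;                          // bins per axis (illustrative)
    [SerializeField] private Vector2 panelSize = new Vector2(0.4f, 0.3f);  // panel extent in meters (illustrative)

    private float[,] counts;
    private Texture2D heatmapTexture;

    private void Awake()
    {
        counts = new float[resolution, resolution];
        heatmapTexture = new Texture2D(resolution, resolution);
    }

    // Call once per frame with the current gaze hit point on this panel (world space).
    public void AddGazeHit(Vector3 worldHit, float deltaTime)
    {
        Vector3 local = transform.InverseTransformPoint(worldHit);
        int x = Mathf.Clamp(Mathf.RoundToInt((local.x / panelSize.x + 0.5f) * (resolution - 1)), 0, resolution - 1);
        int y = Mathf.Clamp(Mathf.RoundToInt((local.y / panelSize.y + 0.5f) * (resolution - 1)), 0, resolution - 1);
        counts[x, y] += deltaTime; // weight bins by gaze duration
    }

    // Convert accumulated durations into a red-intensity texture for overlaying on the panel.
    public Texture2D BuildTexture()
    {
        float max = 0f;
        foreach (float c in counts) max = Mathf.Max(max, c);

        for (int x = 0; x < resolution; x++)
            for (int y = 0; y < resolution; y++)
            {
                float intensity = max > 0f ? counts[x, y] / max : 0f;
                heatmapTexture.SetPixel(x, y, new Color(1f, 0f, 0f, intensity));
            }

        heatmapTexture.Apply();
        return heatmapTexture;
    }
}
```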
In summary, in this experiment, participants were allowed to engage in exploratory attempts within the scenario, in addition to the tasks they were prompted to complete. This open design provided participants with greater autonomy, encouraging more natural interaction with the system in the augmented reality scenario. This design approach facilitated the collection of more authentic eye-tracking behavior data, allowing us to better understand user needs and improve the interaction design of the augmented reality system. We evaluated the usability of the interaction interfaces in the AR system and captured users’ most authentic eye-tracking data in the augmented reality scenario. By analyzing users’ fixation points, fixation duration, and gaze areas, we assessed the interaction effectiveness and efficiency between users and the interface. This evaluation helped us identify potential issues in the interface design and provide improvement suggestions to enhance user interaction experience and task completion efficiency.

4.2.2. Experiment Results

Through the comprehensive and detailed eye-tracking data collection method mentioned above, we collected experimental data for each participant. The collected eye-tracking data serve as the basis for subsequent analysis. By using relevant data processing tools and techniques, researchers can gain insights into users’ attention distribution, information processing patterns, and cognitive load. These insights are crucial for improving system interface design, optimizing user experiences, and conducting user research.
We processed the gaze point data obtained from participants. Since augmented reality visualization systems involve overlaying virtual content onto the real world, the gaze coordinates for each user focusing on the visualization graphical interface within the system may vary based on their position. Therefore, we normalize the coordinates of the gaze points for each user by recording the central coordinate point of the scene. The processed data are then plotted on the same scatter plot to visually depict the eye’s visual behavior and points of attention for each user. As the system enhances the real-world three-dimensional environment, users’ graphical interface gaze data may present different results in a two-dimensional plane compared to the three-dimensional environment. Thus, we normalized participants’ eye attention data and plotted both the scatter plot in the two-dimensional plane and the scatter plot in the three-dimensional environment. We selected gaze data from each participant on two visual interfaces during the experiment, as shown in Figure 13.
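A minimal sketch of this normalization step is given below: gaze points are re-expressed relative to the recorded scene-center coordinate, and a 2D projection is kept alongside the 3D point for the planar scatter plots. The choice of the x–y projection plane is an assumption.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Normalization sketch for cross-participant comparison of gaze points.
public static class GazeNormalization
{
    // Express every gaze point relative to the recorded scene center, giving all participants a common origin.
    public static List<Vector3> ToSceneRelative(IEnumerable<Vector3> gazePoints, Vector3 sceneCenter)
    {
        var normalized = new List<Vector3>();
        foreach (var p in gazePoints)
        {
            normalized.Add(p - sceneCenter);
        }
        return normalized;
    }

    // Drop the depth component to obtain points for the two-dimensional scatter plot.
    public static List<Vector2> ProjectTo2D(IEnumerable<Vector3> normalizedPoints)
    {
        var planar = new List<Vector2>();
        foreach (var p in normalizedPoints)
        {
            planar.Add(new Vector2(p.x, p.y));
        }
        return planar;
    }
}
```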
Different participants exhibited varying levels of interest in different scenes or interfaces during the experiment. To evaluate users’ attention intensity towards different elements, we recorded the duration of each participant’s gaze on each graphical interface. Additionally, since participants may have multiple discontinuous fixations on the same interface, we calculated the total gaze duration for each participant by summing the gaze times on that particular interface. Longer gaze durations typically indicate higher interest or cognitive load. The results are shown in Figure 14.
Through the eye-tracking experiments conducted in this system, we have collected a substantial amount of user gaze data. Analyzing and discussing these data allows us to gain a deeper understanding of the participants’ attention distribution and usage preferences while using the system, providing valuable insights for optimizing the system’s user experience and interface design.

5. Discussion

5.1. Analysis of System Advantages and Usability Experiment Results

Previous research has highlighted that interactive scene visualization in immersive virtual environments can offer decision support [40]. By utilizing AR visualization systems for complex decisions, a more intuitive, real-time, multimodal, and collaborative decision-making environment is provided, thus improving the quality and efficiency of decision making. In the context of smart homes, visualizing privacy-invasive devices around the user can assist the user in recognizing the presence of privacy devices and making necessary adjustments [41]. Moreover, some researchers have implemented data type visualization for common privacy-invasive devices, such as cameras and smart assistants, to aid user decision making [42]. The system proposed in this paper introduces a visualization system that can be applied to various scenarios, simplifying complex situations and helping users make decisions in an immersive manner. The system provides users with a more immersive and engaging experience by superimposing virtual objects onto the real world. This enables users to perceive information more intuitively and naturally, thus deepening their understanding and memory of the content. In addition, augmented reality visualization systems can combine multiple sensory technologies, such as vision, hearing, and touch, to provide information from multiple perspectives. This can enhance users’ perception of the environment and help them better understand complex situations. This will enable decision-makers to better understand and analyze data, consider factors and variables more comprehensively, and make more informed and accurate decisions.
The results of the user experiments provide valuable insights into the effectiveness of the system. The finding that users tended to click on the menu for device selection indicates that the system successfully provided users with the flexibility to choose and switch between different devices for interaction. This demonstrates the system’s capability to support multi-modal perception and interaction, allowing users to utilize different input devices based on their preferences or specific task requirements. Furthermore, during the experiment, the number of user interactions and the time consumed in the subsequent scene-setting tasks were significantly lower than in the first scene, suggesting that users quickly became familiar with the system after learning. This indicates that the system has a learning curve, and with practice, users can become more adept at navigating and interacting with the augmented reality environment. Users’ ratings of the ease of learning the system’s different functions also increased, which suggests that the system is user-friendly and easy to learn. Users found it relatively easy to grasp the system’s functionalities and felt comfortable interacting with the virtual objects. The higher ratings of interactivity compared to ease of learning suggest that users perceived the system as highly interactive and engaging, even though it might have required some initial effort to learn. In summary, the experimental data and analysis provided valuable insights into the effectiveness and usability of the system. Experimental participants indicated that the system provided a user-friendly and engaging experience and that the smart device-based visualization system provided an important reference for decision making in their scenarios.

5.2. System Modules’ Usability Analysis

The member management module of this augmented reality visualization system comprises four parts: system tutorials, system experiments, data recording, and data analysis. At the current stage, it demonstrates good system robustness. Considering participants’ backgrounds and skill levels, the system tutorials aim to help experiment participants quickly grasp the operational procedures within the augmented reality interaction system environment. They provide clear guidance and instructions, using concise and understandable language and illustrations, ensuring accurate execution of experimental tasks and reducing misunderstandings about the experimental process.
Additionally, the system’s experimental design aligns with participants’ actual needs and interaction styles. Employing multi-modal perceptual interaction experiments greatly reduces the difficulty of participants’ experiments and improves their user experience. The comprehensive experimental design allows precise recording of each participant’s focus on specific elements and areas during the interaction experiment. By analyzing the recorded user attention and gaze points, we can further refine the design of the system’s visual graphical interface, thereby elevating users’ experience and immersion. Moreover, the comprehensive data recording module captures participants’ real-time interaction data using various formats such as images, text, and videos. The analysis of obtained eye-tracking data enables a prompt understanding of participants’ actual usage patterns, providing valuable empirical data and user recommendations for future research and system performance improvements.
Furthermore, the system integrates a comprehensive data processing and analysis module for a thorough examination of participants’ experimental data. Continuous enhancements to the user management module and the overall interaction experience are derived from insights gained through participants’ data analysis results. These improvements aim to meet the varied usage and exploration needs of participants with different backgrounds. The visual representation of each participant’s attention distribution, information processing patterns, and cognitive load, based on eye-tracking data analysis results, guides further optimization of the system’s interaction modes and visual graphical interface distribution. Overall, this data-driven approach furnishes accurate evidence for system optimization and future research.
In summary, while the member management module of the system performs well at this stage, our goal is to further optimize and enhance the augmented reality visualization system. Designing personalized member management is our next research direction, considering the specific needs and interaction preferences of different participants, for example, providing system tutorials and experiment difficulty levels tailored to participants’ skill levels and proficiency to meet their learning and exploration needs.

5.3. Multi-Modal User Data Acquisition Method

The proposed AR visualization system aims to enhance multi-modal perception and interaction by incorporating diverse sensory modalities. In this study, we conducted two pivotal experiments, namely the AR interactive experiment and the eye-tracking experiment, to comprehensively assess the functionality of the overall system. Through these two experiments, a diverse set of multi-modal data, including gestures, air taps, perception, and eye-tracking data, was collected to understand the cognitive processes and experiences of users during the interaction, facilitating a better overall evaluation of the system. In the AR interactive experiment, the obtained data reflected the system’s good learnability and interactivity and laid the foundation for improvements in boundary perception and user adaptability. Traditional multi-modal interaction research has predominantly concentrated on visual, auditory, and tactile aspects [14,32], often overlooking eye-tracking technology. In response, our system introduces eye-tracking technology to capture the user’s gaze, facilitating a more natural and intuitive interaction. A multi-modal system that incorporates sophisticated eye-tracking enables users to engage through gaze positioning, gesture control, and tactile feedback, thereby enhancing user participation and immersive experiences. The eye-tracking experiment, built on the system’s eye-tracking data acquisition module, collects data such as gaze trajectories and heatmaps on the AR panels, providing valuable insights for analysts to examine users’ visual behavior patterns and to evaluate and enhance the system’s UI design. Through multi-user eye-tracking experiments, we discovered the system’s sensitivity to eye movement calibration, providing significant guidance for subsequent system improvements.
Overall, the system showed promise in capturing diverse user data modalities. However, the testing highlighted opportunities to improve multi-modal data collection accuracy and reliability through adaptive calibrations, spatial reference standardization, and multi-modal input fusion. Enhancing the system’s capabilities to seamlessly integrate these modalities into natural interactions would further augment users’ sense of immersion and engagement. The user data provide valuable insights to inform the iterative refinement of the system’s multi-modal interaction design.

5.4. System Limitations and Future Expansion Points

The results of the user experiments indicate several limitations and areas for future expansion in the augmented reality visualization system. One limitation is the potential difficulty in accurately perceiving the depth and spatial relationships between virtual and real-world objects. The lack of depth perception in the augmented reality overlays can hinder users’ ability to interact effectively with the virtual objects. To address this, future development could explore the integration of tangible user interfaces combined with depth sensing technologies such as depth cameras or sensors, to provide users with more accurate depth perception in the augmented reality environment. This would enable more precise interaction with virtual objects and enhance the system’s multi-modal perception capabilities.
Another limiting factor is that eye-tracking in the current system is used only for data visualization. In terms of future expansion, the system could benefit from gaze-based interaction techniques: by coupling eye-tracking with the user interface, users could perform actions such as object selection, navigation, and menu control through their gaze. This would provide a more natural and intuitive interaction modality, reduce the reliance on physical manipulation, and further enhance the system's multi-modal perception and interaction capabilities. Additionally, future development could incorporate advanced visualizations and overlays that take advantage of eye-tracking data; for example, the system could dynamically adjust the size, position, or content of augmented reality overlays based on users' gaze patterns and visual attention, enabling a more personalized and context-aware augmented reality experience. Furthermore, the system could benefit from machine learning algorithms that analyze and interpret users' gaze data. By leveraging machine learning, the system could learn and adapt to individual users' gaze patterns, preferences, and behavior, further enhancing the personalized and adaptive nature of the augmented reality experience.
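As a hedged illustration of the machine-learning direction outlined above, the sketch below clusters normalized gaze fixations with k-means to identify attention hotspots whose centers could inform adaptive overlay placement or sizing. The synthetic fixation data and the choice of three clusters are assumptions made for the example and do not reflect the system's current implementation.

```python
# Minimal sketch: cluster normalized gaze fixations into attention
# hotspots with k-means; hotspot centers could inform where (or how
# large) AR overlays are rendered. Data and cluster count are assumed.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical normalized fixation points on the AR interface plane.
fixations = rng.random((200, 2))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(fixations)
hotspot_centers = kmeans.cluster_centers_

# An adaptive UI policy could, for example, move secondary panels away
# from the densest hotspot or enlarge content near it.
densest = np.bincount(kmeans.labels_).argmax()
print("Densest attention hotspot (normalized x, y):", hotspot_centers[densest])
```

In a deployed system, such hotspot centers would be recomputed per user and session so that overlay adjustments track individual gaze behavior.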
In conclusion, while the augmented reality visualization system has the potential to enhance multi-modal perception and interaction and to improve complex decision making, there remain limitations and areas for future expansion. Improving eye-tracking accuracy and depth perception, incorporating gaze-based interaction techniques, and leveraging machine learning algorithms are the key areas to address. By expanding the system's capabilities in these areas, it can provide users with a more immersive, intuitive, and personalized augmented reality experience and thus offer better support for complex decision making.

6. Conclusions

This study proposes an augmented reality visualization system and examines the potential of augmented reality visualization technologies to improve human decision making through enhanced multi-modal perception and interaction. We conducted a series of experiments, including a multi-modal interaction experiment and an eye-tracking experiment, within a smart home scenario to evaluate the system's performance. The visualization system provides decision-aiding information, and the multi-modal perception and interaction methods under AR, especially eye-tracking, create an immersive decision-making environment within the scene, comprehensively improving users' ability to understand the information required for decision making. Our study contributes to the advancement of augmented reality and human–computer interaction, presenting new possibilities for interactive visualization systems. The experimental results indicate that integrating eye-tracking enhances the user experience, provides immersive interaction, and allows for a broader analysis of user behavior. However, there are limitations to consider, such as boundary perception errors and the currently limited application of eye-tracking, which restrict the system's usability in certain scenarios. To further advance this field, future research should focus on improving reality perception, target recognition, and tracking to achieve diverse and natural interactions, thereby enhancing the quality and efficiency of complex decision making.

Author Contributions

Conceptualization, L.C., Z.Z., H.Z. and X.S.; methodology, L.C., H.Z., Z.Z. and X.S.; software, L.C., C.S., Y.W., X.Y., W.R. and Z.Z.; validation, L.C. and H.Z.; formal analysis, L.C., Z.Z., C.S. and Y.W.; writing, L.C., H.Z., C.S., Y.W., X.Y., W.R. and X.S.; visualization, L.C., C.S., Y.W., X.Y. and W.R.; supervision, H.Z. and X.S.; project administration, H.Z. and X.S.; funding acquisition, H.Z. and X.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (No. 2021YFB1600500) and the Marine Science and Technology Innovation Program of Jiangsu Province (No. JSZRHYKJ202308).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of the affiliated university (No. 2023ZDSYLL354-P01).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AR      Augmented reality
UI      User interface
TUIs    Tangible user interfaces
VR      Virtual reality
GUIs    Graphical user interfaces
HMDs    Head-mounted displays
MRTK    Mixed Reality Toolkit

References

1. Cui, W. Visual Analytics: A Comprehensive Overview. IEEE Access 2019, 7, 81555–81573.
2. Chen, C. Information visualization. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 387–403.
3. Zhan, T.; Yin, K.; Xiong, J.; He, Z.; Wu, S.T. Augmented Reality and Virtual Reality Displays: Perspectives and Challenges. iScience 2020, 23, 101397.
4. Satriadi, K.A.; Smiley, J.; Ens, B.; Cordeil, M.; Czauderna, T.; Lee, B.; Yang, Y.; Dwyer, T.; Jenny, B. Tangible Globes for Data Visualisation in Augmented Reality. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, New York, NY, USA, 29 April–5 May 2022.
5. Sadiku, M.; Shadare, A.; Musa, S.; Akujuobi, C.; Perry, R. Data Visualization. Int. J. Eng. Res. Adv. Technol. (IJERAT) 2016, 12, 2454–6135.
6. Keim, D. Information visualization and visual data mining. IEEE Trans. Vis. Comput. Graph. 2002, 8, 1–8.
7. Xu, H.; Berres, A.; Liu, Y.; Allen-Dumas, M.R.; Sanyal, J. An overview of visualization and visual analytics applications in water resources management. Environ. Model. Softw. 2022, 153, 105396.
8. Zheng, J.G. Data visualization for business intelligence. In Global Business Intelligence; Routledge: London, UK, 2017; pp. 67–82.
9. Preim, B.; Lawonn, K. A Survey of Visual Analytics for Public Health. Comput. Graph. Forum 2020, 39, 543–580.
10. White, S.; Kalkofen, D.; Sandor, C. Visualization in mixed reality environments. In Proceedings of the 2011 10th IEEE International Symposium on Mixed and Augmented Reality, Basel, Switzerland, 26–29 October 2011; p. 1.
11. Martins, N.C.; Marques, B.; Alves, J.; Araújo, T.; Dias, P.; Santos, B.S. Augmented Reality Situated Visualization in Decision-Making. Multimed. Tools Appl. 2022, 81, 14749–14772.
12. Chen, K.; Chen, W.; Li, C.; Cheng, J. A BIM-based location aware AR collaborative framework for facility maintenance management. Electron. J. Inf. Technol. Constr. 2019, 24, 360–380.
13. Ma, N.; Liu, Y.; Qiao, A.; Du, J. Design of Three-Dimensional Interactive Visualization System Based on Force Feedback Device. In Proceedings of the 2008 2nd International Conference on Bioinformatics and Biomedical Engineering, Shanghai, China, 16–18 May 2008; pp. 1780–1783.
14. Han, W.; Schulz, H.J. Exploring Vibrotactile Cues for Interactive Guidance in Data Visualization. In Proceedings of the 13th International Symposium on Visual Information Communication and Interaction, VINCI ’20, New York, NY, USA, 8–10 December 2020.
15. Su, Y.P.; Chen, X.Q.; Zhou, C.; Pearson, L.H.; Pretty, C.G.; Chase, J.G. Integrating Virtual, Mixed, and Augmented Reality into Remote Robotic Applications: A Brief Review of Extended Reality-Enhanced Robotic Systems for Intuitive Telemanipulation and Telemanufacturing Tasks in Hazardous Conditions. Appl. Sci. 2023, 13, 12129.
16. Azuma, R.T. A Survey of Augmented Reality. Presence Teleoper. Virtual Environ. 1997, 6, 355–385.
17. Tarng, W.; Tseng, Y.C.; Ou, K.L. Application of Augmented Reality for Learning Material Structures and Chemical Equilibrium in High School Chemistry. Systems 2022, 10, 141.
18. Gavish, N. The Dark Side of Using Augmented Reality (AR) Training Systems in Industry. In Systems Engineering in the Fourth Industrial Revolution: Big Data, Novel Technologies, and Modern Systems Engineering; Wiley Online Library: Hoboken, NJ, USA, 2020; pp. 191–201.
19. Wu, H.K.; Lee, S.W.Y.; Chang, H.Y.; Liang, J.C. Current status, opportunities and challenges of augmented reality in education. Comput. Educ. 2013, 62, 41–49.
20. Akcayır, M.; Akcayır, G. Advantages and challenges associated with augmented reality for education: A systematic review of the literature. Educ. Res. Rev. 2017, 20, 1–11.
21. Nishimoto, A.; Johnson, A.E. Extending Virtual Reality Display Wall Environments Using Augmented Reality. In Proceedings of the Symposium on Spatial User Interaction, SUI ’19, New York, NY, USA, 19–20 October 2019.
22. Liu, B.; Tanaka, J. Virtual Marker Technique to Enhance User Interactions in a Marker-Based AR System. Appl. Sci. 2021, 11, 4379.
23. Gao, Q.H.; Wan, T.R.; Tang, W.; Chen, L. A Stable and Accurate Marker-Less Augmented Reality Registration Method. In Proceedings of the 2017 International Conference on Cyberworlds (CW), Chester, UK, 20–22 September 2017; pp. 41–47.
24. Ye, H.; Leng, J.; Xiao, C.; Wang, L.; Fu, H. ProObjAR: Prototyping Spatially-Aware Interactions of Smart Objects with AR-HMD. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, New York, NY, USA, 23–28 April 2023.
25. Al-Ansi, A.M.; Jaboob, M.; Garad, A.; Al-Ansi, A. Analyzing augmented reality (AR) and virtual reality (VR) recent development in education. Soc. Sci. Humanit. Open 2023, 8, 100532.
26. Goh, E.S.; Sunar, M.S.; Ismail, A.W. Tracking Techniques in Augmented Reality for Handheld Interfaces. In Encyclopedia of Computer Graphics and Games; Lee, N., Ed.; Springer International Publishing: Cham, Switzerland, 2019; pp. 1–10.
27. Moro, M.; Marchesi, G.; Hesse, F.; Odone, F.; Casadio, M. Markerless vs. Marker-Based Gait Analysis: A Proof of Concept Study. Sensors 2022, 22, 2011.
28. Zhang, Z.; Wen, F.; Sun, Z.; Guo, X.; He, T.; Lee, C. Artificial Intelligence-Enabled Sensing Technologies in the 5G/Internet of Things Era: From Virtual Reality/Augmented Reality to the Digital Twin. Adv. Intell. Syst. 2022, 4, 2100228.
29. Syed, T.A.; Siddiqui, M.S.; Abdullah, H.B.; Jan, S.; Namoun, A.; Alzahrani, A.; Nadeem, A.; Alkhodre, A.B. In-Depth Review of Augmented Reality: Tracking Technologies, Development Tools, AR Displays, Collaborative AR, and Security Concerns. Sensors 2023, 23, 146.
30. Khurshid, A.; Grunitzki, R.; Estrada Leyva, R.G.; Marinho, F.; Matthaus Maia Souto Orlando, B. Hand Gesture Recognition for User Interaction in Augmented Reality (AR) Experience. In Virtual, Augmented and Mixed Reality: Design and Development; Chen, J.Y.C., Fragomeni, G., Eds.; Springer: Cham, Switzerland, 2022; pp. 306–316.
31. Aouam, D.; Benbelkacem, S.; Zenati, N.; Zakaria, S.; Meftah, Z. Voice-based Augmented Reality Interactive System for Car’s Components Assembly. In Proceedings of the 2018 3rd International Conference on Pattern Analysis and Intelligent Systems (PAIS), Tebessa, Algeria, 24–25 October 2018; pp. 1–5.
32. Kaimoto, H.; Monteiro, K.; Faridan, M.; Li, J.; Farajian, S.; Kakehi, Y.; Nakagaki, K.; Suzuki, R. Sketched Reality: Sketching Bi-Directional Interactions Between Virtual and Physical Worlds with AR and Actuated Tangible UI. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, UIST ’22, New York, NY, USA, 29 October–2 November 2022.
33. Ishii, H.; Ullmer, B. Tangible Bits: Towards Seamless Interfaces between People, Bits and Atoms. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, CHI ’97, New York, NY, USA, 22–27 March 1997; pp. 234–241.
34. Löffler, D.; Tscharn, R.; Hurtienne, J. Multimodal Effects of Color and Haptics on Intuitive Interaction with Tangible User Interfaces. In Proceedings of the Twelfth International Conference on Tangible, Embedded, and Embodied Interaction, TEI ’18, New York, NY, USA, 18–21 March 2018; pp. 647–655.
35. Shaer, O.; Hornecker, E. Tangible User Interfaces: Past, Present, and Future Directions. Found. Trends Hum.-Comput. Interact. 2010, 3, 1–137.
36. Zuckerman, O.; Gal-Oz, A. To TUI or not to TUI: Evaluating performance and preference in tangible vs. graphical user interfaces. Int. J. Hum.-Comput. Stud. 2013, 71, 803–820.
37. Baykal, G.; Alaca, I.V.; Yantaç, A.; Göksun, T. A review on complementary natures of tangible user interfaces (TUIs) and early spatial learning. Int. J. Child-Comput. Interact. 2018, 16, 104–113.
38. He, F.; Hu, X.; Shi, J.; Qian, X.; Wang, T.; Ramani, K. Ubi Edge: Authoring Edge-Based Opportunistic Tangible User Interfaces in Augmented Reality. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, New York, NY, USA, 23–28 April 2023.
39. Unity. Vuforia SDK Overview. Available online: https://docs.unity3d.com/2018.4/Documentation/Manual/vuforia-sdk-overview.html (accessed on 13 November 2023).
40. Filonik, D.; Buchan, A.; Ogden-Doyle, L.; Bednarz, T. Interactive Scenario Visualisation in Immersive Virtual Environments for Decision Making Support. In Proceedings of the 16th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and Its Applications in Industry, VRCAI ’18, New York, NY, USA, 2–3 December 2018.
41. Prange, S.; Shams, A.; Piening, R.; Abdelrahman, Y.; Alt, F. PriView—Exploring Visualisations to Support Users’ Privacy Awareness. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, CHI ’21, New York, NY, USA, 8–13 May 2021.
42. Bermejo Fernandez, C.; Lee, L.H.; Nurmi, P.; Hui, P. PARA: Privacy Management and Control in Emerging IoT Ecosystems Using Augmented Reality. In Proceedings of the 2021 International Conference on Multimodal Interaction, ICMI ’21, New York, NY, USA, 18–22 October 2021; pp. 478–486.
Figure 1. The main design and implementation modules of the system. The illustration integrates three components: the performance layer, business layer, and data layer.
Figure 2. Member management module. The illustration integrates four components: system tutorial, system experiment, data recording, and data processing and analysis.
Figure 3. Composition of the augmented reality interface module. Unity and MRTK provide technical support for the AR interface to achieve scene construction and multi-modal interaction, connecting systems and devices through the AR interface and offering a reliable and convenient tool for AR research and development.
Figure 4. Composition of the user behavior interaction module. The illustration integrates three components: multi-modal perception, interactive objects, and a feedback mechanism.
Figure 5. Experimental process management flowchart. This flowchart outlines the process of designing, executing, and monitoring augmented reality experiments. It enables customization, real-time monitoring, overall process management, and data collection for in-depth analysis, ensuring efficient and insightful experimental administration.
Figure 6. AR-based smart home scene setting environment.
Figure 7. Results of various tasks. (a) Distribution of consumed time for different tasks. (b) Distribution of click counts for different tasks.
Figure 8. Distribution of click types among users in different tasks. This figure shows users' operations on different tasks during the experiment.
Figure 9. Time consumed and click counts by different users. This figure indicates the level of user participation in these tasks during the experiment.
Figure 10. Errors in the user's perception of the boundaries of the virtual 3D range. (a) Distribution frequency of user distance perception error. (b) Distribution frequency of user angle perception error.
Figure 11. User evaluation of the ease of learning and interactivity of different functions.
Figure 12. Eye-tracking heatmap of participants in the augmented reality system. It provides a visual representation of the specific elements and areas that users focus on during eye-tracking experiments in the interaction process.
Figure 13. Scatter plots of participants' eye-tracking gaze data on the visual interface, generated during the eye-tracking experiment. They showcase the eye movement data from two different graphical interfaces within the system; the data have been normalized and standardized, aligning them to a consistent coordinate system. The left scatter plot displays the eye-tracking data of the system's graphical interface in a two-dimensional plane, and the right plot presents users' three-dimensional eye-tracking data on the graphical interface in an augmented reality setting. The chart reflects each user's focal points during the eye-tracking experiment.
Figure 14. Total gaze duration of different participants on five different graphical interfaces in the eye-tracking experiment. By computing each participant's total gaze duration on each interface, we can estimate the attention intensity of each user towards different elements; longer gaze durations typically indicate higher interest or cognitive load.
Table 1. Task scene and description.

Scene | Description
Privacy Scene | Participants were asked to imagine setting up corresponding settings in privacy scenarios to minimize the risk of privacy exposure.
Leaving Scene | Participants were asked to imagine setting up energy-saving, home-cleaning, and house safety functions when leaving home for work.
Parlor Scene | Participants were asked to imagine having friends as guests at home and to provide a light and comfortable environment. They were also asked to make corresponding settings while confidently chatting.
Sleeping Scene | Participants were asked to imagine preparing to sleep at night and needing a quiet environment. They were also asked to make corresponding settings to avoid exposing their privacy.
Table 2. Statistical data of consumed time and click counts for different tasks. We applied the Shapiro test to users' consumed time and click counts in the different scenarios and found that they do not follow a normal distribution; we therefore used a nonparametric two-sample Wilcoxon rank test to check whether the results were significant and calculated the correlation coefficient between the two measures. A minimal analysis sketch in Python is given after the table.

Scene | Click Counts M (SD) | Time Consumed (s) M (SD) | p | r
Privacy Scene | 34.08 (17.04) | 89.2 (53.69) | p < 0.0001 | 0.78
Leaving Scene | 13.72 (7.84) | 56.24 (37.66) | p < 0.0001 | 0.83
Parlor Scene | 12.12 (8.27) | 58.40 (39.46) | p < 0.0001 | 0.71
Sleeping Scene | 13.28 (7.93) | 54.64 (39.61) | p < 0.0001 | 0.56
Total | 73.2 (41.08) | 258.48 (170.42) | p < 0.0001 | 0.73
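The following minimal Python sketch shows the SciPy calls corresponding to the analysis described in the Table 2 caption: a Shapiro–Wilk normality check, a nonparametric Wilcoxon-type rank test, and a rank correlation. The per-user arrays are hypothetical placeholders, and the exact test variant and correlation coefficient should be chosen to match the actual experimental design, which the caption does not fully specify.

```python
# Minimal sketch of a Table 2-style analysis pipeline: Shapiro-Wilk
# normality check, a nonparametric Wilcoxon-type rank test, and a rank
# correlation. The arrays below are hypothetical placeholders.
from scipy import stats

click_counts = [34, 41, 22, 57, 30, 28, 45, 39, 26, 33]  # per-user clicks in one scene
time_consumed = [89.2, 101.5, 60.3, 140.8, 75.0, 70.2, 120.4, 95.6, 64.1, 82.7]  # seconds

# 1. Normality check for each measure.
print(stats.shapiro(click_counts), stats.shapiro(time_consumed))

# 2. Nonparametric rank test. Use the paired signed-rank test if both
#    measures come from the same participants; use stats.mannwhitneyu
#    for independent samples instead.
print(stats.wilcoxon(click_counts, time_consumed))

# 3. Rank correlation between clicks and time (Spearman assumed here,
#    given the non-normal data).
rho, p = stats.spearmanr(click_counts, time_consumed)
print(f"rho = {rho:.2f}, p = {p:.4g}")
```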