Comparison of AI-Based HCI Modalities for Selecting Interaction Systems in Sustainable Manufacturing

Muchova, Patricia; Saderova, Janka; Ondov, Marek

doi:10.3390/su18104638

Open Access Article


by Patricia Muchova, Janka Saderova * and Marek Ondov

Faculty of Mining, Ecology, Process Control and Geotechnologies, Technical University of Kosice, Letna 9, 04200 Kosice, Slovakia

* Author to whom correspondence should be addressed.

Sustainability 2026, 18(10), 4638; https://doi.org/10.3390/su18104638

Submission received: 26 February 2026 / Revised: 28 April 2026 / Accepted: 4 May 2026 / Published: 7 May 2026

(This article belongs to the Special Issue Recent Advances in Modern Technologies for Sustainable Manufacturing)


Abstract

Human–computer interaction (HCI) has evolved from traditional command-based interfaces to adaptive systems powered by artificial intelligence (AI). In industrial environments, particularly manufacturing and logistics, selecting the appropriate interaction modality is crucial for efficiency, safety, and user acceptance. This study presents a conceptual decision support framework that analyzes three modalities (visual, voice, and multimodal) based on a systematic literature review covering the period from 2003 to early 2026. The analysis evaluates differences between traditional HCI and AI-based HCI in usability, cognitive workload, implementation complexity, and operational benefits. To address the selection challenge, a multi-criteria decision analysis (MCDA) model was developed on the basis of a structured literature analysis and expert-informed evaluation; the resulting expert-based ranking is context-dependent and grounded in the reviewed literature. The results indicate that multimodal HCI shows the highest potential in manufacturing scenarios, offering advantages in safety, robustness, flexibility, and potential contributions to sustainability. However, the results also indicate that multimodal HCI entails more demanding implementation, greater training requirements, and higher costs. The proposed decision support framework is intended to serve as a methodological tool for the structured evaluation of HCI modality suitability in sustainable manufacturing environments.

Keywords:

human–computer interaction; artificial intelligence; multimodal interaction; multi-criteria evaluation; sustainable manufacturing

1. Introduction

Human–computer interaction (HCI) has evolved from early interface-based communication toward increasingly adaptive and intelligent forms of interaction. Early HCI research conceptualized interaction as a dialog between the user and the computer, mediated through physical or software interfaces [1]. The design of HCI was initially dominated by direct manipulation and later by delegation [2]. Over time, HCI has developed into an interdisciplinary field encompassing computer science, cognitive sciences, ergonomics, design, economics, and social sciences [3]. Its goal is to design and evaluate interactive systems that account for user needs and enable more effective human–technology collaboration [4]. To ensure conceptual clarity throughout this study, a fundamental distinction is made between two paradigms. Traditional HCI is understood as a reactive, rule-based system in which the user provides explicit commands (e.g., clicks or manual inputs) to a largely static interface, with emphasis placed on usability and ergonomics. In contrast, AI-based HCI is characterized as a proactive and adaptive system that uses machine learning and pattern recognition to interpret user intent, process natural inputs (e.g., speech or gestures), and adjust its behavior according to the operational context.

The development of artificial intelligence has significantly expanded the HCI paradigm. AI integration enables adaptive and predictive interactions that reduce the need for manual control and support more personalized user experiences [4,5]. According to Sharma [6], it is essential to incorporate HCI principles into AI design through interdisciplinary collaboration involving computer science and AI, human factors and ergonomics, psychology, HCI, information science, and education [7]. At the same time, the growing role of AI in HCI should not be understood as a replacement for established human-centered principles. On the contrary, AI-based systems still rely on knowledge derived from cognitive psychology, ergonomics, and user-centered design, particularly in relation to trust, usability, and user acceptance. Such an approach enables the development of adaptive and personalized interactions that improve user experience and support more efficient human–technology collaboration [8].

Research on human–AI interaction also highlights the importance of active user participation in system design and operation, which contributes to greater engagement and better understanding of AI functioning [9]. AI has opened new application scenarios for HCI, including smart homes, autonomous vehicles, and virtual assistance systems [10]. At the same time, it raises important questions related to transparency, privacy protection, and user trust [11].

While many studies address HCI at a general level, this article focuses on interaction modalities that are particularly relevant in industrial and logistics contexts. In warehouse processes, picking and inventory handling operations are supported by solutions such as pick-by-voice, pick-by-light, pick-by-vision, and augmented reality (AR)-based visual guidance. In transport and intralogistics, HCI is reflected in operator interaction with automated guided vehicles (AGVs), intelligent transportation systems, and adaptive navigation interfaces.

In industrial practice, the selection of an appropriate HCI modality is closely linked to specific operational decision scenarios. These include, for example, the choice of guidance modalities for warehouse picking tasks, the design of maintenance and warning interactions in high-noise production environments, the selection of suitable interfaces for human–machine collaboration at operator workstations, and the definition of hands-free interaction requirements in settings where manual input is limited or undesirable. Framing the problem through such representative scenarios helps position the evaluation criteria used later in the paper as decision-relevant factors rather than as general descriptive attributes.

In this study, we analyze traditional HCI and AI-based HCI across three modalities: visual, voice, and multimodal interaction. The primary objective is to develop a scientifically grounded decision support framework for selecting appropriate HCI modalities in representative manufacturing and logistics scenarios from the perspective of industrial stakeholders. Because no cooperation with industrial partners could be established, industrial validation data are unavailable. The study is therefore intentionally framed as a conceptual framework rather than an empirically validated investigation: its objective is not to provide final empirical proof, but to structure the HCI modality selection problem and identify context-sensitive decision priorities derived from the literature.

To compare and rank these modalities, a multi-criteria decision analysis (MCDA) approach is applied. The proposed framework is intended as a reusable analytical tool for future validation and scenario testing, supporting the design of efficient, safe, and user-centered systems in sustainable industrial environments. The analysis is based on literature published between 2003 and 2026 and is structured around three aspects:

Visual interaction—from graphical user interfaces to AI-supported computer vision and augmented reality;

Voice interaction—from simple spoken commands to advanced speech recognition and intelligent voice assistants;

Multimodal interaction—the combination of multiple input channels, where AI enables more adaptive and natural communication.

2. Literature Review

2.1. Literature Search and Selection Strategy

To ensure the transparency and validity of the comparative analysis, a comprehensive literature search was conducted. The primary objective was to establish a relevant evidence base for evaluating AI-based HCI modalities within manufacturing and industrial contexts.

2.1.1. Search Strategy and Databases

The search was performed across major academic databases, including Scopus, Web of Science, ScienceDirect, and SpringerLink. In addition, Google Scholar and arXiv were used to identify recent Early Access, In Press, and preprint publications available up to early 2026. The retrieval strategy utilized keywords grouped into three domains:

Core concepts: “Human–Computer Interaction (HCI)”, “Human-Centered Intelligent Interaction (HCII)”;
Interaction modalities: “Voice interaction”, “Visual interaction”, “Touch interaction”, “Multimodal interaction”;
Application context: “Manufacturing”, “Industry 4.0”, “Logistics”.
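The three keyword domains can be combined into a single Boolean search string. The sketch below assumes that terms within a domain are joined with OR and that the domains are joined with AND; the exact query syntax submitted to each database is not reported, so the operators, quoting, and the `build_query` helper are illustrative assumptions rather than the authors' actual query.

```python
def build_query(domains: list[list[str]]) -> str:
    """Hypothetical helper: OR-join quoted terms inside each domain, AND-join the domains."""
    groups = []
    for terms in domains:
        quoted = " OR ".join(f'"{t}"' for t in terms)
        groups.append(f"({quoted})")
    return " AND ".join(groups)


# Keyword domains as listed in the search strategy (acronyms omitted for brevity).
core = ["Human–Computer Interaction", "Human-Centered Intelligent Interaction"]
modalities = ["Voice interaction", "Visual interaction", "Touch interaction", "Multimodal interaction"]
context = ["Manufacturing", "Industry 4.0", "Logistics"]

query = build_query([core, modalities, context])
print(query)
```

Each database has its own field tags and operator syntax, so a string like this would still need to be adapted per database.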

To refine the results, specific inclusion criteria were applied. We focused on peer-reviewed journal articles, high-quality conference proceedings, and relevant English-language preprints that provide functional characteristics, operational requirements, or qualitative benchmarks for AI-based HCI. Conversely, the selection excluded non-peer-reviewed sources, duplicate records, and papers where AI or HCI were mentioned only peripherally without technical depth or direct relevance to industrial applications.

2.1.2. Selection Process and Data Synthesis

The selection followed a multi-stage screening process based on the predefined criteria. Initially, duplicate records were removed, followed by a screening of titles and abstracts to filter out irrelevant studies. The remaining full-text articles were then thoroughly assessed for eligibility.

The final analytical sample consists of core sources that directly provide the qualitative findings and functional characteristics required for the comparative analysis. In addition to these primary sources, a secondary set of literature was utilized to support the background theoretical and methodological framework. Data extracted from the final selection of core sources served as the fundamental evidence base for the subsequent analysis, ensuring that the synthesis is grounded in current and contextually relevant industrial research.
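The three screening stages (duplicate removal, title/abstract screening, full-text eligibility) can be sketched as successive filters over a list of records. This is a minimal sketch under stated assumptions: the record fields, the keyword-based title/abstract check, the `full_text_eligible` flag, and the sample records are all hypothetical stand-ins for what were, in the study, reviewer judgments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    doi: str
    title: str
    abstract: str
    full_text_eligible: bool  # hypothetical: outcome of a manual full-text assessment


def deduplicate(records):
    """Stage 1: keep the first record for each DOI (case-insensitive)."""
    seen, unique = set(), []
    for r in records:
        key = r.doi.lower()
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def screen_title_abstract(records, required_terms):
    """Stage 2: keep records whose title or abstract mentions at least one required term."""
    kept = []
    for r in records:
        text = f"{r.title} {r.abstract}".lower()
        if any(term.lower() in text for term in required_terms):
            kept.append(r)
    return kept


def assess_full_text(records):
    """Stage 3: keep records judged eligible after full-text review."""
    return [r for r in records if r.full_text_eligible]


# Hypothetical records used only to exercise the pipeline.
records = [
    Record("10.1/a", "Pick-by-voice in warehouses", "Voice interaction for order picking", True),
    Record("10.1/A", "Pick-by-voice in warehouses", "Duplicate entry", True),
    Record("10.1/b", "Crop yield forecasting", "Remote sensing study", True),
    Record("10.1/c", "AR guidance in assembly", "Visual interaction in manufacturing", False),
]
terms = ["voice interaction", "visual interaction", "multimodal interaction", "manufacturing", "logistics"]

stage1 = deduplicate(records)
stage2 = screen_title_abstract(stage1, terms)
final = assess_full_text(stage2)
print(len(stage1), len(stage2), len(final))  # 3 2 1
```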

2.2. Functional Overview and Application Trends of AI-Based HCI Modalities

Based on the systematic selection process described in Section 2.1, this section synthesizes the identified trends, operational characteristics, and practical applications of the evaluated HCI modalities within industrial and logistics environments.

Human–computer interaction (HCI) is an interdisciplinary field focused on the interactive collaboration between humans and computational systems via an interface [12]. Its primary goal is to develop technologies that enhance intuitiveness, efficiency, and user-friendliness in daily activities [13,14]. Core principles include user-centered design, which ensures that systems reflect user needs, and accessibility [15]. Key design elements such as signifiers, consistency, and feedback are essential for intuitive system control and effective adoption of new applications [15]. While some research (e.g., in China) has traditionally focused on back-end hardware and software engineering to improve task speed and accuracy [16], modern approaches emphasize the categorization of visualization methods. Research has identified five primary dimensions for categorizing visualization approaches: recipient, primary purpose, visual archetype, interaction type, and design process [17]. However, because no single selection system currently exists, future methods will likely require a combination of multiple visualization techniques [17].

HCI has transformed from command-line and graphical interfaces to natural multimodal inputs combining gestures, voice, touch, and visual perception [18]. Modern sensing technologies, such as 3D imaging and time-of-flight measurement, enable bidirectional interaction and immersive experiences like augmented reality (AR) [18]. Currently, artificial intelligence (AI) is one of the fastest-evolving technologies in robotics, computer vision, and natural language processing (NLP) [19,20]. AI enables the creation of predictive models and pattern recognition systems that mimic human thinking and decision-making processes [21,22]. The integration of HCI principles into AI—often termed Intelligent HCI (HCII)—is crucial for developing user-friendly interfaces, particularly in the post-pandemic era where contactless interaction is prioritized [6,23].

Unlike traditional HCI, HCII focuses on the naturalization of interaction through emotion and gesture recognition across various domains [20,24]. AI improves the prediction of user behavior, creates adaptive interfaces, and significantly reduces cognitive load [24]. This multidisciplinary link utilizes AI methods, including machine learning and computer vision, to enhance system performance and user experience [25,26,27,28,29]. Systematic reviews of research between 2010 and 2021 indicate a fundamental shift from static visual elements toward the intelligent interpretation of sensory inputs, such as emotions, gestures, and facial expressions [30]. AI techniques are increasingly utilized to enable dynamic interactions; in the visual domain, computer vision and AR now respond to real-time user behavior [4], while in the auditory domain, machine learning enables natural speech recognition and emotional nuance detection, significantly expanding the capabilities of voice assistants [5].

2.3. Visual Interaction

Visual interaction represents a key element in human–computer interaction, as facial expressions carry significant emotional information that influences interpersonal communication. With the development of artificial intelligence, new types of sensors and interactive devices have emerged, introducing advanced forms of communication between humans and systems. These technologies enable interaction based on biometric elements, such as facial recognition, fingerprint scanning, and body posture analysis, thereby expanding the possibilities for personalized and contactless interaction [31].

Devices in human–computer interaction that utilize sensor technologies [32] include wearable devices, which are often equipped with various built-in sensors—mechanical, physiological, and biochemical. These sensors enable the acquisition of data on the user’s physical and psychological state [33]. Currently, such wearable sensors are increasingly used to measure biological signals, such as heart rate or skin conductance [34].

According to Govindaraju and Thangam [35], facial expressions are crucial for interpreting emotions such as joy, sadness, or anger, with facial expression recognition technologies employing computer vision and machine learning methods to analyze facial features and categorize emotions. Such systems can adapt responses to the user’s emotional state, thereby enhancing the interactive experience. Challenges like lighting variations, image occlusions, and individual differences in facial morphology, however, pose significant hurdles to ensuring the accuracy of these systems.

Modern approaches in visual interaction also incorporate eye-tracking technologies [36] and 3D visual sensors [37], which enable more detailed mapping of users' visual attention and improve the accuracy of interpreting their reactions. In healthcare, artificial intelligence is utilized for processing visual data, for example, through remote photoplethysmography (rPPG) technology to estimate heart rate from video [38], as well as in the analysis of radiographic and intraoral images in dental medicine, where AI algorithms support diagnosis and treatment planning [39].

Other significant areas of AI-supported visual interaction include emotion recognition [40] and real-time emotion detection [41]. In these ways, visual interaction advances from the traditional HCI model, which focused primarily on user responses to visual stimuli, toward intelligent systems leveraging AI methods for adaptive and emotionally aware communication.

Augmented reality in logistics goes beyond traditional HCI approaches by delivering digital information directly into workers' field of view during tasks (e.g., warehousing, handling, or training), thereby improving performance, reducing the need for external interfaces, and lowering cognitive load. At the same time, AR interactions require user adaptation, and technical barriers such as the ergonomic limitations of devices, as well as social factors in workplace practice, must be addressed [42]. Pick-by-light is a visual HCI system used in warehouses, where light indicators on shelves guide workers to the correct items, facilitating order-picking tasks. This simplifies navigation, reduces the need for physical documentation, and enhances picking efficiency and accuracy [43].

Hand gesture recognition via computer vision represents a visual form of HCI, in which hand movements are interpreted as commands for computer applications, enabling natural and contactless user–system interaction [44]. Touch-based interaction is closely associated with visual HCI because it is typically performed through touchscreens and graphical interfaces supported by immediate visual feedback. The recent literature indicates that touch remains an important interaction modality in contemporary human–machine systems, particularly in adaptive and human-centric industrial environments. At the same time, current research shows that touch is increasingly integrated with other input channels within broader multimodal interaction frameworks, which further strengthens its role in modern AI-supported interface design [45,46]. Recent trends in visual interaction increasingly emphasize AR/VR-supported visualization, real-time information overlays, and adaptive visual interfaces for human-centric manufacturing environments [47].

2.4. Voice Interaction

Speech recognition technology has significantly evolved over the years, with early systems capable of identifying only isolated words, requiring slow and deliberate speech, and operating on the principle of comparing spoken words to pre-recorded templates [48]. Advances in AI-driven human–computer communication focus on voice emotion recognition through deep learning, enabling the capture of subtle emotional cues such as tone and pitch, thereby supporting more natural and contextually sensitive interactions with virtual assistants, customer chatbots, and similar applications [49].
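The template-matching principle of early isolated-word recognizers can be illustrated with dynamic time warping (DTW), a classic technique for aligning an utterance with a stored template spoken at a different speed. This is a toy sketch under stated assumptions: each "utterance" is a short one-dimensional feature sequence rather than real acoustic features, and the two-word template vocabulary is hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or diagonal match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the template word with the smallest DTW distance to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))


# Hypothetical pre-recorded templates (toy feature sequences, not real audio features).
templates = {
    "start": [0.0, 1.0, 3.0, 1.0, 0.0],
    "stop": [0.0, 2.0, 2.0, 2.0, 0.0],
}

# A slower rendition of "start": same shape, stretched in time.
spoken = [0.0, 0.0, 1.0, 1.0, 3.0, 3.0, 1.0, 0.0]
print(recognize(spoken, templates))  # start
```

The time warping is what allows a slowly spoken word to still match its template, which is why such systems tolerated deliberate speech but struggled with continuous, connected speech.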

Voice represents one of the most widely used output modalities for artificial intelligence agents, defined as computational systems capable of perceiving the environment, processing information, and generating contextually sensitive and adaptive responses to user stimuli [50,51]. Conversational agents (CAs) or chatbots are software systems designed to simulate text- or voice-based communication through natural language processing techniques [52]. These agents communicate with users via spoken language, which is a natural and intuitive interaction medium for most people [53]. Voice interaction is rapidly expanding across various industries, particularly due to improved natural language processing accuracy and real-time responses. Recent research also indicates a shift from simple command-based voice systems toward more conversational voice agents, with increasing emphasis on usability and practical design guidelines for voice user interfaces [54].

In practice, voice interaction is applied in customer support, where voice assistants conduct conversations with customers [55]. In business operations, virtual voice assistants such as Amazon's Alexa provide customer service and shopping functions [56].

In healthcare, voice-controlled artificial intelligence can significantly enhance routine physician–patient interactions by overcoming language barriers [57], while also serving as a tool for voice analysis and the diagnosis of conditions such as Parkinson's disease, Alzheimer's disease, and schizophrenia [58]. However, the adoption of voice interaction in medicine must currently address challenges such as patient privacy, security, ethical concerns, reliability, and trust in artificial intelligence. In the automotive industry, Toyota Europe has integrated the “eCare” voice assistant, based on conversational AI and integrated with dealer systems, which contacts drivers when a warning light is activated [59].

In the Arab region, a 2025 study presented a voice-controlled smart home system based on human–computer interaction, artificial intelligence, and IoT technology that supports Arabic commands for seniors and visually impaired users; in real-world tests, the system functioned seamlessly and achieved 94.4% recognition accuracy [60]. Hands-free home control, such as that provided by Amazon's Alexa voice assistant, streamlines household tasks like lighting adjustment, temperature regulation, and home access security. Across these domains, voice interaction contributes to more natural, intuitive, and efficient communication between humans and intelligent technologies. AI-powered voice interaction technologies significantly reduce the need for manual, time-consuming tasks and help simplify and optimize various aspects of users' daily lives [25].

Voice user interfaces in traditional HCI are applied in logistics, for example, to control robots, machines, and transport vehicles via voice commands, adjust manufacturing machine controls and processes, provide machine status information, or assist in warehouse order picking [61]. Pick-by-voice (PbV) is a voice-guided system used in warehouses that allows workers to perform hands- and eyes-free tasks, thereby reducing cognitive and physical strain and shortening training time. PbV implementation leads to higher productivity, accuracy, and workplace safety, while optimizing the order-picking process and reducing costs. This system represents an effective HCI solution in logistics, supporting direct human–machine interaction in warehouse operations [62].

2.5. Multimodal Interaction

Multimodal interaction enables more natural communication between the user and an automated system in both directions—for input submission and output reception. Such a system must be capable of recognizing and processing inputs from diverse modalities, integrating them based on temporal and contextual relationships to correctly interpret the meaning of the communication [63]. Alejandro Jaimes and Nicu Sebe [64], in their definition, adopt a human-centered approach, viewing a modality as a communication method corresponding to human senses or the type of computer input device.
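The requirement to integrate inputs based on their temporal relationships can be sketched as a simple late-fusion rule: a spoken command is paired with a pointing gesture only when the two events occur close enough in time. The event types, the 1.5 s window, and the nearest-in-time pairing rule are hypothetical assumptions chosen for illustration; practical systems may rely on richer probabilistic or learned fusion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    modality: str  # "voice" or "gesture"
    value: str     # recognized command or pointed-at target
    t: float       # timestamp in seconds


FUSION_WINDOW_S = 1.5  # hypothetical maximum gap between related voice and gesture inputs


def fuse(events, window=FUSION_WINDOW_S):
    """Pair each voice command with the nearest-in-time gesture inside the window."""
    gestures = [e for e in events if e.modality == "gesture"]
    commands = []
    for v in (e for e in events if e.modality == "voice"):
        nearby = [g for g in gestures if abs(g.t - v.t) <= window]
        if nearby:
            target = min(nearby, key=lambda g: abs(g.t - v.t)).value
        else:
            target = None  # command arrived without a matching gesture
        commands.append((v.value, target))
    return commands


# Hypothetical event stream: the operator points at a bin, then says "pick".
events = [
    Event("gesture", "bin_A3", 10.2),
    Event("voice", "pick", 10.9),
    Event("voice", "confirm", 15.0),
]
print(fuse(events))  # [('pick', 'bin_A3'), ('confirm', None)]
```

Returning `None` for an unresolved target leaves it to the dialog layer to ask for clarification instead of guessing, which is one way a multimodal interface can support error recovery.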

Multimodal interfaces minimize error occurrence while enhancing system robustness. They allow users to more easily identify and correct errors or recover from them more quickly. Additionally, they expand communication possibilities between humans and systems by increasing information bandwidth and offering alternative interaction methods tailored to diverse situations and environments [65]. Figure 1 illustrates the general architecture of a multimodal HCII system, which integrates sensory inputs, data analysis and fusion processes, as well as the generation of interactive commands and responses within an intelligent computing environment.

The practical application of multimodal systems is also evident in experiments integrating visual and spatial interaction. As noted by the authors of a comparative HCI and AI analysis [66], multimodal prototypes such as the Stanchion computer and Tbo the Tablebot demonstrate that visual and spatial interaction can be embedded into built environments and everyday objects, thereby blurring the boundaries between the digital and physical worlds.

Multimodal interaction [67] enables operators to control robots, select items in AR/VR environments, and combine gestures with voice commands, thereby increasing the accuracy and efficiency of warehouse tasks. At the same time, it improves ergonomics and safety, minimizing physical strain and injury risks during goods handling. In the study “Assessing the Value of Multimodal Interfaces: A Study on Human–Machine Interaction in Weld Inspection Workstations,” the authors examined multimodal interfaces combining projection, gestures, and voice control at weld seam inspection workstations in manufacturing. The results showed that multimodal interaction enhances efficiency, reduces physical strain, and improves work intuitiveness. This approach demonstrates the practical value of HCI in industrial production, where flexible and ergonomic control improves workflows and worker satisfaction [68]. Recent research in manufacturing highlights a shift toward multimodal language models and spatial intelligence as promising directions for more natural, context-aware, and collaborative human–robot interaction in smart manufacturing environments [69].

3. Materials and Methods

To compare traditional human–computer interaction approaches with intelligent AI-based approaches across individual modalities, we applied comparative analysis and multi-criteria evaluation.

Comparative analysis is a set of analytical methods that enable the evaluation and comparison of diverse objects and datasets [70] on both quantitative and qualitative levels [71], even in cases where human intelligence cannot perform the comparison directly [70]. The method has been applied to various topics in logistics and manufacturing, such as comparing decision-making methods in transport logistics [72], examining the relationship between macroeconomic indicators and logistics performance [73], comparing multi-criteria decision-making methodologies for warehouse location selection problems [74], and reviewing the latest thermography-based defect detection techniques in manufacturing [75]. Within comparative analysis, three basic approaches are distinguished [76]:

Quantitative approach—focused on variables,
Qualitative approach—centered on specific cases,
Fuzzy approach—based on fuzzy set theory.

The goal of comparative research is to systematically describe, analyze, and explain similarities and differences between the examined cases [77,78,79]. The results of comparative studies contribute significantly to identifying gaps in the current state of knowledge and create opportunities for new research directions in areas that may not have been sufficiently addressed in previous research [79]. Comparative research also provides opportunities to adapt more efficient and/or innovative procedures, processes, or guidelines borrowed from other contexts [77]. In this study, comparative analysis was used to systematically compare traditional HCI and AI-based HCI systems across visual, voice, and multimodal interaction.

Multi-criteria analysis (MCA), also referred to in the literature as multi-criteria decision making (MCDM), multi-criteria decision analysis (MCDA), multi-objective decision analysis (MODA), multi-attribute decision making (MADM), or multi-dimensional decision making (MDDM), represents a set of methods, techniques, and analytical tools of varying complexity. A common feature of these approaches is the explicit consideration of multiple objectives and evaluation criteria in solving decision problems. Although MCA serves as an effective tool for structuring and framing complex decision tasks, its application does not automatically guarantee higher-quality or more comprehensive decisions [80].

The selection of a specific MCA method is often made quite arbitrarily, motivated primarily by analysts' and decision-makers' familiarity with the method, the availability of required software and tools, or the existence of examples and similar studies that can be replicated relatively easily [81,82]. Each multi-criteria decision-making approach is based on a decision matrix that includes a set of evaluated alternatives $A_i$, a set of evaluation criteria $C_j$, weights $w_j$ expressing the relative importance of the criteria, and values $u_{ij}$ representing the degree to which each alternative meets each criterion [83].

In this study, the Simple Additive Weighting (SAW) method, also known as the Weighted Sum Method (WSM), is applied for multi-criteria decision making. In the SAW method, the ranking of individual alternatives is determined based on calculating the overall evaluation score, which arises as the weighted sum of normalized values of the evaluation criteria. The overall score of an alternative is calculated according to the following equation [84]:

$$A_{i} = \sum_{j=1}^{n} w_{j} u_{ij} \qquad (1)$$

where

$A_{i}$ represents the resulting score of the $i$-th alternative;
$u_{ij}$ denotes the suitability of the $i$-th alternative according to the $j$-th criterion;
$w_{j}$ expresses the weight of the $j$-th criterion;
$n$ is the number of evaluation criteria.
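Equation (1) can be illustrated with a small decision matrix. In this sketch the alternatives A1 to A3, the three criteria, the normalized values $u_{ij}$, and the weights $w_j$ are hypothetical placeholders, not the expert values elicited in the study. The weights are assumed to sum to 1.

```python
# Hypothetical decision matrix: rows are alternatives A_i, columns are criteria C_j.
# All numbers are illustrative placeholders, not the study's elicited values.
weights = [0.5, 0.3, 0.2]  # w_j, assumed to sum to 1
scores = {                 # u_ij, already normalized to [0, 1]
    "A1": [0.6, 0.8, 0.5],
    "A2": [0.7, 0.5, 0.6],
    "A3": [0.9, 0.4, 0.7],
}


def saw_score(u_i, w):
    """Equation (1): weighted sum of the normalized criterion values of one alternative."""
    return sum(w_j * u_ij for w_j, u_ij in zip(w, u_i))


assert abs(sum(weights) - 1.0) < 1e-9
totals = {alt: saw_score(u, weights) for alt, u in scores.items()}
ranking = sorted(totals, key=totals.get, reverse=True)
for alt in ranking:
    print(f"{alt}: {totals[alt]:.2f}")
```

For these placeholder values the scores are 0.71 for A3, 0.64 for A1, and 0.62 for A2, so the ranking is A3, A1, A2.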

In practice, the criteria for multi-criteria decision problems are often expressed in different units, making direct comparison impossible. For this reason, data preprocessing in the form of normalization is essential, allowing the transformation of heterogeneous data into a comparable form [85].
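The text does not specify which normalization formula was used. As an assumption, the sketch below uses linear min-max normalization, a common choice for SAW. It also shows the inverted form for cost-type criteria, where lower raw values are preferable. The example criteria and their raw values are hypothetical.

```python
def min_max_normalize(column, benefit=True):
    """Rescale raw criterion values to [0, 1].

    benefit=True:  higher raw values are better -> (x - min) / (max - min)
    benefit=False: lower raw values are better  -> (max - x) / (max - min)
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        # Design choice: a criterion that does not discriminate gives every alternative 1.0.
        return [1.0 for _ in column]
    if benefit:
        return [(x - lo) / (hi - lo) for x in column]
    return [(hi - x) / (hi - lo) for x in column]


# Hypothetical raw values for two criteria measured in different units.
training_hours = [2.0, 6.0, 10.0]  # cost-type: fewer hours is better
accuracy_pct = [95.0, 97.0, 99.0]  # benefit-type: higher is better

print(min_max_normalize(training_hours, benefit=False))  # [1.0, 0.5, 0.0]
print(min_max_normalize(accuracy_pct))                   # [0.0, 0.5, 1.0]
```

After this step, hours and percentages sit on the same dimensionless 0 to 1 scale, so they can enter the weighted sum of Equation (1) directly.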

We propose a methodological procedure for selecting the appropriate modality, carried out in six consecutive steps and presented visually in Figure 2.

Within the proposed research framework, the first step involves defining the primary objective of the solution. The research seeks to develop a decision support framework for selecting an appropriate HCI modality in manufacturing and logistics processes. A key limitation is the inability to verify the conclusions in real-world industrial environments and specific operational scenarios. Therefore, the study focuses on a scientifically grounded framework based on data obtained through a comprehensive literature analysis in the field of HCI, addressing the manufacturing context.

Based on the literature review, a comparative analysis of interactions is applied at the level of individual modalities. Comparative analysis is crucial for identifying key differences and functional characteristics of each modality.

The core part of the study consists of multi-criteria analysis. An important component is the definition of criteria and the assignment of weights to criteria. The criteria were defined by the authors based on the literature review and comparative analysis of the modalities.

To determine the scores and weights of individual criteria, an expert-based approach was adopted. The expert-based assessment was used as a structured elicitation technique in the absence of direct industrial measurements. To preserve the scientific character of the framework, three experts from the academic sphere were selected. One of the experts was an author of this contribution who conducted the literature review. The involvement of an author as one of the experts is acknowledged as a limitation. This limitation may be partially offset by the author's comprehensive, scientifically grounded knowledge of the topic, gained through conducting the structured literature review presented in Section 2 and Section 4.

Two additional experts were included: the first from the fields of manufacturing, logistics, and information systems in production, and the second from the field of industrial engineering, with a focus on the application of AI in manufacturing processes. The expert panel was limited to three because of the limited availability of experts from the different domains. Larger panels (n ≥ 7) are recommended for future applications of the framework. To reduce subjectivity, all experts were provided with identical scoring bases and standardized evaluation rules to ensure assessment consistency:

Interactions for a given criterion were to be evaluated on a maximization basis; i.e., the better an interaction satisfies the criterion, the higher its score on the assigned scale.
Interactions for a given criterion were to be scored on an interval from 1 to 9 (1—very low satisfaction of the criterion, 2—low to very low, 3—low, 4—low to medium, 5—medium, 6—medium to high, 7—high, 8—high to very high, 9—very high).
For criteria with an adverse impact (such as cognitive load, training complexity, operational costs, or implementation demands), an inverse scale was to be applied, so that a more favorable outcome still receives a higher score.
During the evaluation, the experts were to remain neutral and base their judgments solely on facts recognized by the academic community.
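The third rule can be read as mirroring the 1 to 9 scale so that a higher score always means a more favorable outcome. The sketch below is a minimal illustration under that assumption; the function name `to_maximization_score` and the idea of first recording a raw rating of how strongly the adverse property is present are hypothetical, since the experts in the study applied the inverse scale directly while scoring.

```python
SCALE_MIN, SCALE_MAX = 1, 9  # the 1-9 rating interval used by the experts


def to_maximization_score(rating: int, cost_type: bool) -> int:
    """Return a 1-9 score where higher always means better (hypothetical helper).

    For a cost-type criterion (e.g., operational costs), `rating` is assumed to
    express how strongly the adverse property is present, so it is mirrored:
    1 <-> 9, 2 <-> 8, ..., 5 stays 5.
    """
    if not isinstance(rating, int) or not SCALE_MIN <= rating <= SCALE_MAX:
        raise ValueError(f"rating must be an integer in [{SCALE_MIN}, {SCALE_MAX}]")
    return SCALE_MIN + SCALE_MAX - rating if cost_type else rating


print(to_maximization_score(7, cost_type=True))   # 3: high cost -> low score
print(to_maximization_score(7, cost_type=False))  # 7: benefit criterion unchanged
```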

Subsequently, the experts assigned weights to the individual criteria, and the individual weights were aggregated using the arithmetic mean. Because the weights were assigned by multiple experts, their consistency was checked with a supplementary rank-correlation approach, namely Spearman’s rank correlation among the three experts.

The final phase involves the actual implementation of the multi-criteria evaluation. The aim of the study is to rank HCI modalities in a clear and practically interpretable manner. The Simple Additive Weighting (SAW) method is therefore applied as a compensatory aggregation method that evaluates alternatives via the weighted sum of normalized criterion values. It is suitable for situations where fast and transparent decision making based on multiple criteria is required. The input values for the SAW method were established using an indirect pairwise comparison approach [84].

In this context, SAW represents a suitable choice, as it aggregates the weighted scores of the criteria into a single overall value for each alternative, enabling direct comparison and transparent interpretation of the results.
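The weighted-sum aggregation can be sketched in a few lines. This is a minimal illustration assuming that the weights w_j sum to 1 and that the partial utilities u_ij are already on a common scale; the function name `saw_scores` and the numbers in the example are hypothetical and do not come from the study's tables.

```python
def saw_scores(weights, utilities):
    """Simple Additive Weighting: A_i = sum over j of w_j * u_ij.

    weights   -- dict: criterion -> weight w_j (assumed to sum to 1)
    utilities -- dict: alternative -> dict of criterion -> partial utility u_ij
    Returns (alternative, A_i) pairs sorted from best to worst.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    scores = {
        alt: sum(w * u[crit] for crit, w in weights.items())
        for alt, u in utilities.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical example: two criteria, two alternatives (not study data).
example = saw_scores(
    {"X": 0.6, "Y": 0.4},
    {"A": {"X": 1.0, "Y": 0.0}, "B": {"X": 0.0, "Y": 1.0}},
)
print(example)  # [('A', 0.6), ('B', 0.4)]
```

Because the score is a plain weighted sum, a weak value on one criterion can be offset by strong values elsewhere, which is what makes SAW compensatory.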

4. Results and Discussion

We conducted a comparative analysis and an MCDA based on scientific publications detailing applications of visual, voice, and multimodal interaction within traditional HCI systems and AI-utilizing systems in manufacturing, warehousing, logistics, and other fields. The analyses consider three modalities because we classify the touch modality as part of the visual modality, the two being closely related. The touch modality is also covered within multimodal HCI.

Interfaces used in the social domain and healthcare, such as emotion recognition or brain–computer interaction, were not assigned a separate category because their use in manufacturing is not yet as widespread as that of the other modalities. Some of these applications are addressed within multimodal interaction.

4.1. Comparative Analysis

To enhance the robustness of the comparison, we provide additional examples of HCI and HCII applications across industries. In an experiment using Google Glass, one study compares traditional HCI methods (handheld scanners) with AR interfaces in the parcel sorting process. The results indicate that although intelligent AR applications with automatic verification may be slower, they significantly increase subjective user satisfaction and reduce error rates [86].

In smart manufacturing and Industry 4.0, research focuses on integrating augmented reality as a key tool for connecting workers with digital systems. The analysis emphasizes that successful multi-criteria assessment of the technology requires considering factors such as data processing speed, hardware ergonomics, and real-time information display accuracy. These intelligent systems enable more efficient decision making and reduce operators’ cognitive load, thereby defining new standards for human–machine interaction. A comprehensive review of barriers and opportunities for deploying these approaches in industrial practice was provided by the authors in [87].

In warehousing, one study focuses on various types of automated order-picking systems. One highlighted type is parts-to-picker, for which the study examines how the technical design of automated devices affects performance and error rates in logistics. Here, the voice modality is represented by the worker confirming picks vocally, and the visual modality by pick-to-light signaling or AI cameras verifying item correctness in the warehouse. Another type is robot-to-parts, where mobile robots (e.g., autonomous AGVs/AMRs) navigate to shelves; the robot must detect humans in aisles (computer vision) and communicate with them (e.g., audio signals or displays) to prevent collisions. The article also describes parts-to-robot systems, in which fully automated arms replace humans and the human role shifts to supervision, with human–computer interaction centered on monitoring and control [88].

The study by Ziaee and Hamedi [89] systematically analyzes the implementation of augmented reality as a key pillar of Industry 4.0, focusing on its ability to integrate digital information directly into the physical work environment. The research addresses the transition from conventional methods to intelligent assistance systems that leverage visual and multimodal interaction to support operators’ real-time decision making. The authors delve into the technical and functional aspects of human–machine interaction, identifying key factors for successful adoption, such as ergonomics, object tracking accuracy, and cognitive load reduction. The contribution demonstrates measurable increases in productivity and accuracy when replacing traditional HCI interfaces with intelligent AR solutions.

The authors of [90] investigate multimodal human–robot interaction (HRI) in smart manufacturing, bridging the technical aspects of robotics with cognitive science. The publication emphasizes the shift from “robot-centric” to “human-centric” (human-oriented) manufacturing, in which the robot is not merely a machine but an intelligent partner. For reliable HCI, redundancy of modes is essential: if the robot fails to hear a voice command because of machine noise, it must confirm the command via visual gesture recognition (e.g., enabling a smooth tool handoff without button presses).

Within the framework of multimodal interaction, the authors of [91] examined the use of artificial intelligence and machine learning methods to analyze EEG and eye-tracking data for detecting users’ cognitive and error states. The contribution of the work lies in the design and validation of AI models capable of adaptively interpreting human responses during system interaction. Although the results have implications for human–system interaction, the study is primarily AI-oriented.

In the medical field, one study focuses on traditional HCI, examining directional eye-tracking control using a mobile device’s front camera and an application. The presented system can control the power supply of any electrical device or direct the movement of mobile objects, such as cars, robots, and wheelchairs. The author’s goal was to improve the quality of life of elderly individuals who lack physical control of their limbs [92].

The research “Model-based adaptive user interface based on context and user experience evaluation” [32] focuses on a model-based adaptive user interface that automatically adjusts according to the usage context and user experience. The system leverages environmental information, user capabilities, device type, and feedback to modify the interface in real time. Implementation and empirical testing demonstrated that this approach enhances usability and user satisfaction compared to standard interfaces. Such an approach is typical of the HCI field, as it emphasizes optimizing human–machine interaction, evaluating user experience, and developing adaptive, user-friendly interfaces.

In their contribution [33], “A CNN-LSTM Deep Learning Classifier for Motor Imagery EEG Detection Using a Low-invasive and Low-Cost BCI Headband”, the authors represent the HCI aspect through the design of a physical interface between human and system via Electroencephalography (EEG) sensors and a low-cost Brain–Computer Interface (BCI) headband for capturing brain signals. The artificial intelligence domain is addressed by employing deep learning, where the combination of models enables automatic extraction and classification of spatiotemporal features from EEG signals.

Velagaleti S. B. et al. [49] focus on the concept of empathetic algorithms as AI systems capable of recognizing, interpreting, and responding to human emotions to foster emotional intelligence across various application domains. The core of the approach leverages artificial intelligence methods, particularly machine learning, natural language processing, and computer vision, enabling autonomous analysis of emotional states and adaptive decision making. Although the outcome improves human–system interaction, the work is conceptually oriented toward AI as a cognitive and decision-making mechanism rather than toward traditional human–computer interaction, which focuses primarily on interface design.

The study by Jo, H. [93] examines factors influencing the long-term use of AI-based voice assistants such as Siri or Alexa. It investigates how technology interaction, novelty, voice quality, and discomfort shape users’ attitudes and willingness to continue using these assistants. Structural equation modeling of data from 256 participants revealed that positive attitudes, quality interactions, and attractive voices promote continued usage. The findings offer valuable insights for developing and optimizing AI voice assistants to make them more pleasant and effective for users.

In the publication “Enhancing User Experience in AI-Powered Human-Computer Communication with Vocal Emotions Identification Using a Novel Deep Learning Method” (Computers), Alhussen, A. et al. [48] focus on improving user experience in AI-supported human–computer communication through automatic vocal emotion identification using a novel deep learning model. The authors use a public dataset, process the audio signals, and then optimize the network for effective emotion detection. The results demonstrate high accuracy in emotion recognition, indicating that the model reliably identifies emotional states in speech. This approach is directly relevant to the HCI field, as it enables more empathetic and context-sensitive interactions between users and AI systems.

De Carvalho, D. et al. [94] in the article “Enhancing mechanical ventilation management with AI: Computer vision for automated detection of ventilatory modes, parameters and asynchrony” present an AI-supported decision system utilizing computer vision to automatically analyze mechanical ventilator screens without requiring direct device integration. The solution achieves high accuracy in recognizing ventilation parameters and detecting multiple types of patient–ventilator asynchrony, thereby supporting personalized, evidence-based decision making in mechanical ventilation. Despite visual interaction with the user, the system is conceptually grounded primarily in artificial intelligence as a decision-making mechanism, rather than principles of traditional human–computer interaction.

In the comparative analysis, we systematically address visual, voice, and multimodal interaction, distinguishing between traditional HCI approaches and modern AI-based solutions. The overview summarizes the functional characteristics of technologies across industrial, logistics, and medical domains. The processed scientific findings are summarized in Table 1.

The initial phase of traditional human–computer interaction focused primarily on direct physical interaction between humans and computers or devices, and industrial environments were dominated by technologies such as handheld scanners and pick-to-light systems. These solutions were static, requiring humans to adapt to machine logic, which resulted in a higher cognitive load for operators. With the advent and integration of AI, this paradigm has shifted toward technologies such as deep learning, AI voice assistants, and intelligent AR applications, where the system learns and autonomously adapts to the human.

The literature review integrates findings from the healthcare and smart home domains to ensure comprehensive coverage of HCI modalities. Healthcare evidence demonstrates the limitations of the visual modality in low-light conditions [38] and of error detection amid acoustic noise [49,58]. Smart home applications [60] confirm the robustness of multimodal interaction in variable contexts. Regarding BCI, we acknowledge its significance in medical contexts [91]. However, BCI was not included as a separate analytical category because it is applied primarily outside manufacturing; we therefore regard BCI as a promising direction for future research.

Based on the comparative analysis (Table 1), we identified the key differences between traditional HCI and AI-based systems, summarized in Table 2.

Based on the outlined comparative framework and key differences in interaction approaches, we conclude that for effective manufacturing, it is essential to incorporate elements that directly impact production efficiency, accuracy, and reliability. In modern manufacturing, implementation of sensory modalities such as voice, vision, and their combinations predominates, while the emotional dimension of interaction recedes into the background. From a manufacturing process standpoint, this dimension finds limited application compared to other HCI and HCII applications, such as those in healthcare.

The visual modality is suitable for working with complex data, for tasks that require precision and control, and for tasks of longer duration. The voice modality is appropriate when the hands or eyes are occupied, for simple, short commands, or in mobile and dynamic situations. The visual modality maximizes precision and control, while the voice modality maximizes accessibility and speed. AI integrates them into multimodal interaction: in the visual modality, AI optimizes attention, and in the voice modality, it optimizes comprehension.

4.2. Multi-Criteria Evaluation of Interaction Modalities

In applying MCDA, we considered two perspectives: that of the production manager and that of the operator. A production manager does not evaluate visual, voice, and multimodal interaction “theoretically” but according to whether it, for example, reduces risk, saves time, and maintains stable, efficient production. An operator evaluates these interactions primarily by their impact on task performance, such as workload, usability, cognitive demand, and overall ease of use.

For the multi-criteria comparison, we employed the complete pairwise comparison method, which determines both the criterion weights and the scores of the individual interactions. The comparison criteria were defined based on the literature review and the comparative analysis. The selected criteria primarily represent qualitative aspects of interaction that directly affect operator performance and comfort, as well as the managerial assessment:

Criterion A: Noise levels;
Criterion B: Dustiness;
Criterion C: Response speed;
Criterion D: Cognitive load;
Criterion E: Training complexity;
Criterion F: Operational costs;
Criterion G: Implementation ease;
Criterion H: Robustness to disturbances.

Pairwise comparison of interactions for each criterion was evaluated on a maximization scale by three experts following the rules described in the previous section. Although no formal statistical test of inter-rater consistency was performed for these scores, consistency was supported by a shared evaluation framework, predefined criteria, and uniform scoring bases.

Table 3 summarizes the evaluation of each criterion as the median of the three expert assessments. The median was chosen to retain integer values for subsequent calculations, since the median of three integer scores is always one of the submitted scores. The reasoning for each criterion was as follows:

Noise Level
The reliability of visual interaction is very high in such environments because it is independent of acoustic conditions: noise has no direct impact on input (touch, buttons). In noisy settings, visual interaction is the most stable. The reliability of voice interaction is low to very low in this case, as various sounds in the production hall and the use of protective gear reduce voice recognition accuracy. The reliability of multimodal interaction is higher than that of voice alone. Voice can serve as a supplementary channel, and if it fails, redundant command confirmation (visual, touch) and adaptive modality switching remain available.
Ratings for the noise criterion: visual—9, voice—2, multimodal—6.
Dustiness
Dust adversely affects the reliability of visual interaction: it can reduce touchscreen sensitivity, clog mechanical buttons, or impair display readability. The reliability of visual interaction in such operations is moderate to high, depending on how well the device is protected against dust particles. For voice interaction, dust does not affect acoustics as noise does; the issues here include microphone clogging, protective masks that muffle the voice, reduced speech intelligibility through respirators, and the need for frequent device maintenance. The reliability of multimodal interaction is high if the system is robust, because adaptive modality switching is available: if the display is dirty, voice input can be used, and if the microphone is clogged, touch control can be employed. Regular cleaning is still required. In dusty environments, multimodal interaction is a very strong solution.
Ratings for the dust criterion: visual—6, voice—5, multimodal—8.
Response Speed
For visual interaction, response speed is high if the operator is at the visualization panel: upon a fault, the operator registers the visual alarm and performs the intervention, and the display provides clear signaling and detailed information. Response speed depends on operator attention and other human factors. For voice interaction, response speed is very high provided the voice announcement is clearly intelligible: the operator hears it immediately and can respond with a voice command without visual contact. In noisy environments, however, response speed drops. Response speed for multimodal interaction is very high, as the operator is alerted through multiple means (display, sound, device light beacon, etc.), which reduces the likelihood of missing an alarm.
Ratings for the response speed criterion: visual—7, voice—8, multimodal—9.
Cognitive Load
Cognitive load represents the amount of mental resources required for perceiving and comprehending information, making decisions, and executing actions. For this criterion, a higher operator cognitive load corresponds to fewer points on the rating scale. In visual interaction, the operator must monitor multiple parameters, filter alarms, and interpret graphs and numbers; the load is manageable but requires sustained attention, with a risk of mental fatigue. Voice interaction leverages natural communication, reducing visual overload and memory load (no need to memorize commands). Multimodal interaction distributes information across senses, reducing mental fatigue compared with visual interaction, and the operator can select the most suitable interaction channel. With a well-designed system, the cognitive load of multimodal interaction can be rated as low.
Ratings for the cognitive load criterion: visual—4, voice—6, multimodal—7.
Training Complexity
For this criterion, note that more demanding training—with longer duration and requiring multiple operator skills—receives fewer points on the rating scale. In visual interaction, the operator must learn menu structures, icon and alarm meanings, and response procedures. It requires basic technical thinking from the operator. For voice interaction, training emphasizes precise command phrasing, correct pronunciation, and disciplined practical use. For multimodal interaction, the operator must learn multiple control options, when to use which channel, how redundancy works, how to respond to single-modality failures, etc. Post-training, system use is intuitive, but initial demands are higher.
Ratings for the training complexity criterion: visual—6, voice—8, multimodal—4.
Operational Costs
Visual interaction has low operational costs, including minimal maintenance and occasional software updates, given its long lifespan. Compared to visual interaction, voice interaction incurs higher operational costs, covering device maintenance, headset replacements, language model updates, and hygiene costs. Multimodal interaction has the highest operational costs (maintenance of multiple devices, spare parts, and update management). For this criterion, note that higher costs receive fewer points on the rating scale.
Ratings for the operational costs criterion: visual—8, voice—5, multimodal—3.
Implementation
Visual modality can be considered the simplest and lowest-risk implementation. Introducing voice modality is more time-consuming, requiring a pilot phase and thorough testing in real conditions. Among the modalities, multimodal has the highest implementation demands in operations, as it involves integrating multiple inputs, iterative testing, and training. For this criterion, note that a more demanding implementation receives fewer points on the rating scale.
Ratings for the implementation criterion: visual—8, voice—4, multimodal—3.
Robustness to Disturbances
Visual interaction is sensitive to dust, lighting, and vibrations but unaffected by noise and temperature. Of the two single-modality options, it is the more stable in changing environments. Voice interaction is the most sensitive to environmental changes (noise, hall echoes, respirators/masks, microphone vibrations, network latency from cloud ASR); its advantage is independence from lighting. Multimodal interaction is highly resilient to environmental changes: it adapts to the situation and benefits from redundancy by switching from one channel to another.
Ratings for the robustness criterion: visual—6, voice—2, multimodal—8.

Table 3. Input table for MCDA—comparison of interactions.

Criterion	Visual interaction	Voice interaction	Multimodal interaction
Noise level	9	2	6
Dustiness	6	5	8
Response speed	7	8	9
Cognitive load	4	6	7
Training complexity	6	8	4
Operational costs	8	5	3
Implementation	8	4	3
Robustness to disturbances	6	2	8
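The choice of the median for Table 3 can be checked quickly: with an odd number of experts, the median is one of the submitted integer scores, whereas the mean generally is not. The helper `panel_score` below is a hypothetical sketch, and the three scores in the example are illustrative values, not the experts' actual ratings.

```python
from statistics import mean, median


def panel_score(expert_scores):
    """Aggregate integer 1-9 expert scores by the median (hypothetical helper).

    With an odd-sized panel the median is the middle submitted score, so it
    stays an integer on the original scale; an even-sized panel could yield
    a half-point value, so it is rejected here by assumption.
    """
    if len(expert_scores) % 2 == 0:
        raise ValueError("use an odd-sized panel to keep integer medians")
    return median(expert_scores)


scores = [6, 8, 9]          # illustrative ratings from three experts
print(panel_score(scores))  # 8, one of the submitted scores
print(mean(scores))         # about 7.67, which is not on the 1-9 integer scale
```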

Three experts determined criterion weights using an indirect method via pairwise comparison of individual criteria. A key advantage of this approach is that it forces the decision-maker to compare only two criteria at a time, which is cognitively simpler than ranking all criteria simultaneously. The weight-determination procedure is identical to that defined in [84]. Table 4 presents the pairwise comparisons and the weights established by Expert 1. A comparison of the weights from all experts is shown in Figure 3.

Figure 3 shows that all experts assigned the highest weight to criterion H (robustness to disturbances). The lowest weight was assigned to criterion F (operational costs) by Experts 1 and 2 and to criterion E (training complexity) by Expert 3. For each criterion, the MCDA uses the arithmetic mean of the weights determined by the experts (Table 5).
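The indirect weighting procedure and the averaging step can be sketched together. This assumes a simple counting scheme in which each criterion's weight is its number of pairwise wins divided by the number of pairs n(n-1)/2; whether this matches the exact variant defined in [84] is an assumption, and the function names and the two hypothetical experts are illustrative only.

```python
from itertools import combinations


def pairwise_weights(criteria, prefer):
    """One expert's weights from all pairwise comparisons of criteria.

    criteria -- list of criterion labels
    prefer   -- callable (a, b) -> whichever label of the pair is preferred
    Weight = number of wins / number of pairs n(n-1)/2, so weights sum to 1.
    """
    wins = {c: 0 for c in criteria}
    pairs = list(combinations(criteria, 2))  # the upper triangle of the matrix
    for a, b in pairs:
        winner = prefer(a, b)
        if winner not in (a, b):
            raise ValueError(f"preference for ({a}, {b}) must be one of the pair")
        wins[winner] += 1
    return {c: wins[c] / len(pairs) for c in criteria}


def mean_weights(expert_weights):
    """Arithmetic mean of several experts' weights, criterion by criterion."""
    criteria = list(expert_weights[0])
    if any(set(w) != set(criteria) for w in expert_weights):
        raise ValueError("all experts must weight the same criteria")
    n = len(expert_weights)
    return {c: sum(w[c] for w in expert_weights) / n for c in criteria}


def ranking_preference(order):
    """Hypothetical expert who prefers whichever criterion appears earlier in `order`."""
    return lambda a, b: a if order.index(a) < order.index(b) else b


# Hypothetical example with three criteria and two experts (not study data).
crit = ["X", "Y", "Z"]
w1 = pairwise_weights(crit, ranking_preference(["X", "Y", "Z"]))  # X 2/3, Y 1/3, Z 0
w2 = pairwise_weights(crit, ranking_preference(["Y", "X", "Z"]))  # X 1/3, Y 2/3, Z 0
avg = mean_weights([w1, w2])  # X and Y about 0.5 each, Z 0.0
```

With plain counting, the least preferred criterion receives a weight of zero; some variants add one to every count to avoid this. The example does not reproduce Table 4 or Table 5.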

In the SAW method, checking the consistency of the weights is important to ensure that the decision-making process is neither illogical nor arbitrary. Because SAW has no built-in consistency mechanism, additional approaches are needed; for weights derived from multiple experts, a supplementary rank-correlation approach can be applied. The weights were therefore compared using Spearman’s rank correlation among the three experts.

The Spearman correlation coefficient was calculated between pairs of experts. The highest agreement was observed between Expert 1 and Expert 3, with r_s = 0.80, which indicates a very similar weighting pattern. The lowest agreement was found between Expert 1 and Expert 2, with r_s = 0.54, reflecting a moderate correlation. All correlations were positive, indicating that the experts generally agree in the direction of their weight assessments.
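Spearman's r_s is the Pearson correlation computed on ranks; without ties it reduces to 1 − 6Σd_i²/(n(n² − 1)), where d_i is the rank difference for criterion i. The sketch below assigns average ranks to tied weights, which pairwise counting can easily produce. It is a standalone illustration with hypothetical function names and toy inputs; `scipy.stats.spearmanr` provides the same statistic if SciPy is available.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's r_s as the Pearson correlation of the (average) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (sx * sy)


# Toy check without ties: d = (-1, 1, -1, 1, 0), so r_s = 1 - 6*4/(5*24) = 0.8
print(spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```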

The partial utilities u_ij of the interactions for individual criteria were determined by pairwise comparisons based on Table 3, using a procedure analogous to that used for the criterion weights: each cell of the upper triangular matrix is filled with whichever of the two compared interactions better satisfies the criterion [84]. The partial utilities u_ij are presented in Table 6, and the calculation of the overall score A_i is shown in Table 7.
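Applying the same counting idea to the alternatives, the sketch below turns the Table 3 medians into partial utilities u_ij for each criterion: an interaction earns one point for each pairwise comparison it wins, and the points are divided by the number of pairs. This normalization and the half-point rule for ties (Table 3 contains no ties) are assumptions, not necessarily the exact procedure of [84], so the values may differ from Table 6.

```python
from itertools import combinations

# Median expert scores from Table 3 (maximization scale, 1-9).
TABLE_3 = {
    "A": {"Visual": 9, "Voice": 2, "Multimodal": 6},  # noise level
    "B": {"Visual": 6, "Voice": 5, "Multimodal": 8},  # dustiness
    "C": {"Visual": 7, "Voice": 8, "Multimodal": 9},  # response speed
    "D": {"Visual": 4, "Voice": 6, "Multimodal": 7},  # cognitive load
    "E": {"Visual": 6, "Voice": 8, "Multimodal": 4},  # training complexity
    "F": {"Visual": 8, "Voice": 5, "Multimodal": 3},  # operational costs
    "G": {"Visual": 8, "Voice": 4, "Multimodal": 3},  # implementation
    "H": {"Visual": 6, "Voice": 2, "Multimodal": 8},  # robustness to disturbances
}


def partial_utilities(scores):
    """Pairwise-comparison utilities of the alternatives for one criterion.

    One point per pairwise win (half a point each on a tie, by assumption),
    divided by the number of pairs, so the utilities sum to 1.
    """
    points = {alt: 0.0 for alt in scores}
    pairs = list(combinations(scores, 2))
    for a, b in pairs:
        if scores[a] > scores[b]:
            points[a] += 1
        elif scores[b] > scores[a]:
            points[b] += 1
        else:
            points[a] += 0.5
            points[b] += 0.5
    return {alt: p / len(pairs) for alt, p in points.items()}


u = {crit: partial_utilities(s) for crit, s in TABLE_3.items()}
print(u["H"])  # Visual 1/3, Voice 0, Multimodal 2/3
```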

To verify the stability of the resulting ranking, a sensitivity analysis of the criterion weights was performed for the SAW model, including both one-factor and multi-parameter (scenario-based) changes in the weights; both are standard sensitivity analyses for linear MCDA models such as SAW.

The ranking is driven primarily by the heavily weighted criteria H, C, B, and G, and multimodal interaction (A₃) was identified as the winning alternative. Criterion H carries the highest weight; for this criterion, A₃ (0.154077) has a clear advantage over A₁ (0.076923), so if its weight is reduced, A₃ may lose its leading position. The same applies to criterion C. Criterion G is the only strong criterion favoring A₁, and criterion A also supports A₁ (0.065366 > 0.032634). Several sensitivity scenarios were conducted; they are presented in Table 8.
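A one-factor sensitivity check can be sketched as follows: one criterion's weight is set to a trial value, the remaining weights are rescaled proportionally so that they still sum to 1, and the SAW ranking is recomputed. The proportional rescaling rule, the function names, and the two-criterion example are assumptions for illustration; they do not reproduce the scenarios in Table 8.

```python
def rescale_weights(weights, criterion, new_value):
    """One-factor change: fix one criterion's weight at `new_value` and scale
    the remaining weights proportionally so the total stays 1."""
    if not 0.0 <= new_value <= 1.0:
        raise ValueError("new weight must lie in [0, 1]")
    rest = 1.0 - weights[criterion]
    if rest <= 0.0:
        raise ValueError("the other criteria have no weight to rescale")
    factor = (1.0 - new_value) / rest
    return {c: new_value if c == criterion else w * factor
            for c, w in weights.items()}


def saw_ranking(weights, utilities):
    """Alternatives ordered by SAW score sum_j w_j * u_ij, best first."""
    scores = {alt: sum(weights[c] * u[c] for c in weights)
              for alt, u in utilities.items()}
    return sorted(scores, key=scores.get, reverse=True)


def one_factor_sweep(weights, utilities, criterion, trial_values):
    """Ranking obtained for each trial weight of a single criterion."""
    return {v: saw_ranking(rescale_weights(weights, criterion, v), utilities)
            for v in trial_values}


# Hypothetical two-criterion example (not the study's weights or utilities).
weights = {"H": 0.6, "G": 0.4}
utilities = {"Multimodal": {"H": 1.0, "G": 0.0},
             "Visual": {"H": 0.0, "G": 1.0}}
sweep = one_factor_sweep(weights, utilities, "H", [0.6, 0.3])
print(sweep)  # {0.6: ['Multimodal', 'Visual'], 0.3: ['Visual', 'Multimodal']}
```

A multi-parameter (scenario-based) analysis can be built the same way by fixing several weights at once and rescaling the rest.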

Following the sensitivity analysis results presented in Table 8, multimodal interaction (A₃) maintains its leading position across most evaluated scenarios. Even when the weight of criterion H (robustness) is significantly reduced, A₃ remains dominant, though its lead over A₁ narrows. The ranking changes only under specific combined adjustments of several criterion weights, in which case visual interaction (A₁) outperforms A₃.

The ranking should be interpreted as contingent on the selected criteria and weights, reflecting the assumed decision profile rather than universal modality superiority. Sensitivity analysis demonstrates the stability of the framework under the literature-informed criteria structure and expert-based evaluation used in this study. These findings indicate that multimodal interaction appears most suitable under the analyzed scenarios.

Multimodal interaction’s relative advantage stems from its ability to integrate multiple channels, such as voice, visual cues, and gestures, which enhances flexibility across diverse operational conditions. This hybrid approach supports rapid fault detection through combined auditory and visual alarms and performs reliably in noisy or dusty environments. From a sustainability perspective, multimodal HCI may reduce waste, energy overuse, and cognitive load, aligning with sustainable manufacturing ecosystems [95,96].

Visual interaction ranks second and represents the current widespread practice with low implementation complexity and rapid operator training. Voice interaction ranks lowest due to vulnerability in noisy industrial environments but remains viable in clean, quiet settings.

Despite these relative preferences, all modalities involve trade-offs. Multimodal systems require higher implementation costs and training, while single-modality systems sacrifice robustness in harsh conditions. The framework facilitates structured trade-off analysis rather than prescriptive modality selection.

The framework can be applied as part of an iterative evaluation process, where interaction comparisons and expert assessments provide feedback for repeated decision cycles. This approach enables continuous refinement of the evaluation as conditions, technologies, and user requirements evolve, thereby enhancing the dynamic adaptability and real-world applicability of the framework.

4.3. Study Limitations

Several limitations were identified in this study. The study does not include empirical validation in real manufacturing environments, relying instead on literature-informed analysis and expert judgment. This restricts the contribution to a methodological decision support framework developed in an academic setting.

The expert panel comprised only three participants with different expertise. This limited panel size may affect ranking generalizability; thus, larger panels (n ≥ 7) are recommended for future framework applications. One expert was an author of the contribution, which may introduce potential bias despite standardized evaluation. Therefore, independent experts from different domains of expertise would strengthen subsequent framework applications.

The ranking is dependent on the chosen criteria set and weighting structure, and therefore should not be generalized beyond the studied manufacturing context. Different stakeholder priorities or scenarios may yield alternative rankings. Overall, with the ongoing advancement of AI, the main limiting factor may gradually shift from technological constraints to human cognitive capacity, attention, and willingness to adopt more complex interaction systems.

These limitations position the study as a conceptual decision support framework rather than conclusive empirical evidence. The results provide a structured indication of modality suitability under the assumed decision profile, serving as a foundation for future application.

5. Conclusions

This study developed a conceptual decision support framework for evaluating HCI modalities in manufacturing environments using MCDA. The framework structures the HCI modality selection problem based on literature-informed criteria and expert judgment, enabling reproducible comparison of visual, voice, and multimodal interaction in manufacturing scenarios.

The MCDA ranking indicates multimodal interaction as the most suitable option under the analyzed conditions, offering advantages in robustness, flexibility, and sustainability potential through multi-channel integration. Visual interaction ranks second as an established practice, while voice interaction appears least suitable for noisy industrial settings. These results should be interpreted as context-dependent indications rather than universal recommendations, reflecting the specific criteria weights and expert preferences applied.

The framework’s value lies in structured trade-off analysis rather than prescriptive modality selection. It supports manufacturing stakeholders in systematically evaluating HCI options according to operational priorities, facilitating informed decision making where empirical data may be unavailable. The framework provides a methodological basis for iterative application in real decision scenarios.

This study is based solely on the available literature and on expert opinions from the academic sector; therefore, several limitations must be acknowledged. The lack of real-world manufacturing data limits the evaluation to a literature-based framework. The limited number and composition of experts, including author participation, may influence the results and introduce subjectivity. The final ranking reflects specific criteria choices and a specific weighting structure. The findings apply to manufacturing environments and should not be generalized without caution.

Future research and framework applications should address these limitations by incorporating larger independent expert panels (n ≥ 7) to enhance the robustness of the results, as well as real-world industrial case studies and pilot implementations. Additionally, exploring and including emerging modalities, such as brain–computer interfaces, may extend the framework’s applicability.

Author Contributions

Each author (P.M., J.S. and M.O.) has contributed to this publication. Conceptualization, J.S. and M.O.; methodology, P.M. and M.O.; software, M.O.; validation, P.M. and J.S.; formal analysis, P.M.; investigation, P.M.; resources, P.M.; data curation, P.M. and J.S.; writing—original draft preparation, P.M. and J.S.; writing—review and editing, M.O.; visualization, M.O.; supervision, J.S. and M.O.; project administration, J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments, which improved the quality of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Card, S.K.; Moran, T.P.; Newell, A. The Psychology of Human–Computer Interaction; CRC Press: Boca Raton, FL, USA, 1983.
2. Pantic, M.; Pentland, A.; Nijholt, A.; Huang, T.S. Human computing and machine understanding of human behavior: A survey. In Artificial Intelligence for Human Computing; Huang, T.S., Nijholt, A., Pantic, M., Pentland, A., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2007; Volume 4451, pp. 47–71.
3. Rogers, Y. HCI theory: Classical, modern, and contemporary. Synth. Lect. Hum.-Centered Inform. 2012, 5, 1–129.
4. Khan, S.B.; Chandna, S. Introduction to human–computer interaction using artificial intelligence. In Innovations in Artificial Intelligence and Human–Computer Interaction in the Digital Era; Academic Press: Cambridge, MA, USA, 2023; pp. 1–6.
5. Jacobs, C.; Johnson, H.; Rennie, T.; Lambert, J.; Joiner, R. Human–computer interaction and artificial intelligence: Advancing care through extended mind theory. Cureus 2024, 16, e74968.
6. Sharma, S.; Shrestha, S. Integrating HCI principles in AI: A review of human-centered artificial intelligence applications and challenges. J. Future Artif. Intell. Technol. 2024, 1, 44–56.
7. Mazarakis, A.; Bernhard-Skala, C.; Braun, M.; Peters, I. What is critical for human-centered AI at work? Toward an interdisciplinary theory. Front. Artif. Intell. 2023, 6, 1257057.
8. Schmager, S.; Pappas, I.O.; Vassilakopoulou, P. Understanding Human-Centred AI: A Review of Its Defining Elements and a Research Agenda. Behav. Inf. Technol. 2025, 44, 3771–3810.
9. Raees, M.; Meijerink, I.; Lykourentzou, I.; Khan, V.-J.; Papangelis, K. From Explainable to Interactive AI: A Literature Review on Current Trends in Human-AI Interaction. arXiv 2024, arXiv:2405.15051.
10. Acharjya, P.; Joardar, S.; Koley, S. Artificial intelligence-based intelligent human–computer interaction. In Handbook of Research on AI Methods and Applications in Computer Engineering; IGI Global: Hershey, PA, USA, 2023; pp. 58–79.
11. Ding, Z.; Ji, Y.; Gan, Y.; Wang, Y.; Xia, Y. Current status and trends of technology, methods, and applications of human–computer intelligent interaction: A bibliometric research. Multimed. Tools Appl. 2024, 83, 69111–69144.
12. Womser-Hacker, C. Accessible human–computer interaction. In Handbook of Accessible Communication; Maaß, C., Rink, I., Eds.; Frank & Timme: Berlin, Germany, 2024; pp. 453–472.
13. Liu, B.H.; Pham, V.T.; Nguyen, T.N.; Luo, Y.S. A heuristic for maximizing the lifetime of data aggregation in wireless sensor networks. arXiv 2019, arXiv:1910.05310.
14. Panda, S.; Roy, S.T. Reflections on emerging HCI–AI research. AI Soc. 2024, 39, 407–409.
15. Ramadevi, P. Human–computer interaction: Bridging the gap between humans and technology. Int. Sci. J. Eng. Manag. 2025, 4, 1–5.
16. Bi, T.; Zhang, Y.; Wang, C.; Ayobi, A. Characterizing HCI research in China: Streams, methodologies and future directions. In Proceedings of the CHI 2019 Conference, Glasgow, UK, 4–9 May 2019.
17. Li, K.; Tiwari, A.; Alcock, J.; Bermell-Garcia, P. Categorisation of visualisation methods to support the design of human–computer interaction systems. Appl. Ergon. 2016, 55, 85–107.
18. Bhowmik, A.K. Natural and intuitive user interfaces with perceptual computing technologies. Inf. Disp. 2013, 29, 6–10.
19. Cheng, X.; Lin, X.; Shen, X.L.; Zarifis, A.; Mou, J. The dark sides of AI. Electron. Mark. 2022, 32, 11–15.
20. Yang, S.J.H.; Ogata, T.; Matsui, N.; Chen, N.S. Human-centered artificial intelligence in education: Seeing the invisible through the visible. Comput. Educ. Artif. Intell. 2021, 2, 100008.
21. Curchoe, C.L.; Bormann, C.L. Artificial intelligence and machine learning for human reproduction and embryology. J. Assist. Reprod. Genet. 2019, 36, 591–600.
22. Pisoni, G.; Díaz-Rodríguez, N.; Gijlers, H.; Tonolli, L. Human-centred artificial intelligence for designing accessible cultural heritage. Appl. Sci. 2021, 11, 870.
23. Liao, H.; Zhou, Z.; Zhao, X.; Zhang, L.; Mumtaz, S.; Jolfaei, A.; Ahmed, S.H.; Bashir, A.K. Learning-based context-aware resource allocation for edge-computing-empowered industrial IoT. IEEE Internet Things J. 2020, 7, 4260–4277.
24. Alkatheiri, M.S. Artificial intelligence assisted improved human–computer interactions for computer systems: A systematic review. Comput. Electr. Eng. 2022, 101, 107950.
25. Lyu, Z.; Poiesi, F.; Dong, Q.; Lloret, J.; Song, H. Deep learning for intelligent human–computer interaction. Appl. Sci. 2022, 12, 11457.
26. Grigsby, S.S. Artificial intelligence for advanced human–machine symbiosis. In Augmented Cognition: Intelligent Technologies; Springer: Cham, Switzerland, 2018; pp. 255–266.
27. Gomes, C.C.; Preto, S. Artificial intelligence and interaction design for a positive emotional user experience. In Intelligent Human Systems Integration; Springer: Cham, Switzerland, 2018; pp. 321–327.
28. Zhang, C.; Lu, Y. Study on artificial intelligence: The state of the art and future prospects. J. Ind. Inf. Integr. 2021, 23, 100224.
29. Ahamed, M.M. Analysis of human–machine interaction design perspective: A comprehensive literature review. Int. J. Contemp. Comput. Res. 2017, 1, 31–42.
30. Šumak, B.; Brdnik, S.; Pušnik, M. Sensors and artificial intelligence methods and algorithms for human–computer intelligent interaction: A systematic mapping study. Sensors 2022, 22, 20.
31. Lin, L.; Qiu, J.; Lao, J. Intelligent human–computer interaction: A perspective on software engineering. In Proceedings of the 14th International Conference on Computer Science & Education (ICCSE 2019), Toronto, ON, Canada, 19–21 August 2019; pp. 488–492.
32. Hussain, J.; Ul Hassan, A.; Bilal, H.S.M.; Ali, R.; Afzal, M.; Hussain, S.; Bang, J.; Banos, O.; Lee, S. Model-based adaptive user interface based on context and user experience evaluation. J. Multimodal User Interfaces 2018, 12, 1–16.
33. Garcia-Moreno, F.M.; Bermudez-Edo, M.; Rodriguez-Fortiz, M.J.; Garrido, J.L. A CNN–LSTM deep learning classifier for motor imagery EEG detection using a low-invasive and low-cost BCI headband. In Proceedings of the 16th International Conference on Intelligent Environments (IE 2020), Madrid, Spain, 20–23 July 2020; pp. 84–91.
34. Oviatt, S.; Schuller, B.; Cohen, P.R.; Sonntag, D.; Potamianos, G.; Krüger, A. (Eds.) The Handbook of Multimodal–Multisensor Interfaces: Foundations, User Modeling, and Common Modality Combinations; Association for Computing Machinery: New York, NY, USA, 2017; Volume 1.
35. Govindaraju, D.; Thangam, D. Emotion recognition in human–machine interaction and a review in interpersonal communication perspective. In Handbook of Research on AI and Human Interaction; IGI Global: Hershey, PA, USA, 2024; pp. 312–330.
36. Jacob, R.; Karn, K. Eye tracking in human–computer interaction and usability research: Ready to deliver the promises. In The Mind’s Eye; Hyönä, J., Radach, R., Deubel, H., Eds.; Elsevier: Amsterdam, The Netherlands, 2003; pp. 573–605.
37. Thalmann, D. Sensors and actuators for HCI and VR: A few case studies. In Frontiers in Electronic Technologies; Prabaharan, S., Thalmann, N., Kanchana Bhaaskaran, V., Eds.; Lecture Notes in Electrical Engineering; Springer: Singapore, 2017; Volume 433, pp. 41–56.
38. Zhang, S.; Song, R.; Cheng, J.; Zhang, Y.; Chen, X. A feasibility study of a video-based heart rate estimation method with convolutional neural networks. In Proceedings of the IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA 2019), Tianjin, China, 14–16 June 2019; pp. 1–5.
39. Panahi, O.; Ezzati, A. AI in dental medicine: Current applications and future directions. Open Access J. Clin. Images 2025, 2, 1–5.
40. Salloum, S.A.; Alomari, K.M.; Alfaisal, A.M.; Aljanada, R.A.; Basiouni, A. Emotion recognition for enhanced learning: Using AI to detect students’ emotions and adjust teaching methods. Smart Learn. Environ. 2025, 12, 21.
41. Telceken, M.; Akgun, D.; Kacar, S.; Yesin, K.; Yıldız, M. Can artificial intelligence understand our emotions? Deep learning applications with face recognition. Curr. Psychol. 2025, 44, 7946–7956.
42. Lagorio, A.; Di Pasquale, V.; Cimini, C.; Miranda, S.; Pinto, R. Augmented reality in logistics 4.0: Implications for the human work. IFAC-PapersOnLine 2022, 55, 329–334.
43. Stockinger, C.; Steinebach, T.; Petrat, D.; Bruns, R.; Zöller, I. The effect of pick-by-light systems on situation awareness in order picking activities. Procedia Manuf. 2020, 45, 96–101.
44. Sharma, R.P.; Verma, G.K. Human–computer interaction using hand gesture. Procedia Comput. Sci. 2015, 54, 721–727.
45. Yang, J.; Liu, Y.; Morgan, P.L. Human–machine interaction towards Industry 5.0: Human-centric smart manufacturing. Digit. Eng. 2024, 2, 100013.
46. Dritsas, E.; Trigka, M.; Troussas, C.; Mylonas, P. Multimodal Interaction, Interfaces, and Communication: A Survey. Multimodal Technol. Interact. 2025, 9, 6.
47. Saha, N.; Gadow, V.; Harik, R. Emerging Technologies in Augmented Reality (AR) and Virtual Reality (VR) for Manufacturing Applications: A Comprehensive Review. J. Manuf. Mater. Process. 2025, 9, 297.
48. Alhussen, A.; Ansari, A.S.; Mohammadi, M.S. Enhancing user experience in AI-powered human–computer communication with vocal emotions identification using a novel deep learning method. Comput. Mater. Contin. 2025, 82, 2909–2929.
49. Velagaleti, S.B.; Choukaier, D.; Nuthakki, R.; Lamba, V.; Sharma, V.; Rahul, S. Empathetic algorithms: The role of AI in understanding and enhancing human emotional intelligence. J. Electr. Syst. 2024, 20, 2051–2060.
50. Huang, M.-H.; Rust, R.T. The GenAI future of consumer research. J. Consum. Res. 2025, 52, 4–17.
51. Wirtz, J.; Stock-Homburg, R. Generative AI meets service robots: The promise of LLMs, LBMs, and agentic AI in physical service encounters. J. Serv. Res. 2025, 28, 527–543.
52. Meshram, S.; Naik, N.; More, T.; Kharche, S. Conversational AI: Chatbots. In Proceedings of the International Conference on Intelligent Technologies (CONIT 2021), Hubli, India, 14–16 May 2021; pp. 1–6.
53. Hu, P.; Gong, Y.; Lu, Y.; Ding, A.W. Speaking vs. listening? Balance conversation attributes of voice assistants for better voice marketing. Int. J. Res. Mark. 2023, 40, 109–127.
54. Park, D.; Kim, E. Method of interacting between humans and conversational voice agent systems. Heliyon 2024, 10, e23573.
55. Blut, M.; Wünderlich, N.V.; Brock, C. Facilitating retail customers’ use of AI-based virtual assistants: A meta-analysis. J. Retail. 2024, 100, 293–315.
56. Guha, A.; Grewal, D.; Kopalle, P.K.; Haenlein, M.; Schneider, M.J.; Jung, H.; Moustafa, R.; Hegde, D.R.; Hawkins, G. How artificial intelligence will affect the future of retailing. J. Retail. 2021, 97, 28–41.
57. Loeffler, C.M.L.; Muti, H.; Kather, J.; Truhn, D. Bridging communication gaps: The role of voice-enabled AI in medicine. ESMO Real World Data Digit. Oncol. 2025, 8, 100138.
58. Muddaloor, P.; Baraskar, B.; Shah, H.; Gopalakrishnan, K.; Sood, D.; Pasupuleti, P.C.; Singh, A.; Mitra, D.; Hoskote, S.S.; Iyer, V.N.; et al. The human voice as a digital health solution leveraging artificial intelligence. Sensors 2025, 25, 3424.
59. Chirita, R.; Ciobanescu, S.A.; Ungureanu, C.; Sbircea, I. Impact of artificial intelligence in the automotive industry. FAIMA Bus. Manag. J. 2025, 13, 49–58.
60. Alghlayini, S.; Deriche, M. A personalized smart home control system for the elderly and people with disabilities using Arabic voice commands. In Proceedings of the IEEE 22nd International Multi-Conference on Systems, Signals & Devices (SSD 2025), Monastir, Tunisia, 17–20 February 2025; pp. 1346–1350.
61. Ludwig, H.; Schmidt, T.; Kühn, M. Voice user interfaces in manufacturing logistics: A literature review. Int. J. Speech Technol. 2023, 26, 627–639.
62. Dujmešić, N.; Bajor, I.; Rožić, T. Warehouse processes improvement by pick by voice technology. Teh. Vjesn. 2018, 25, 1227–1233.
63. Li, Y.; Huang, J.; Tian, F.; Wang, H.-A.; Dai, G.-Z. Gesture interaction in virtual reality. Virtual Real. Intell. Hardw. 2019, 1, 84–112.
64. Jaimes, A.; Sebe, N. Multimodal human–computer interaction: A survey. Comput. Vis. Image Underst. 2007, 108, 116–134.
65. Cohen, P.R.; McGee, D.R. Tangible multimodal interfaces for safety-critical applications. Commun. ACM 2004, 47, 41–46.
66. Gonsher, I. Beyond the keyboard, mouse, and screen: New paradigms in interface design. In Proceedings of the Future Technologies Conference (FTC 2021); Arai, K., Ed.; Lecture Notes in Networks and Systems; Springer: Cham, Switzerland, 2022; Volume 358, pp. 115–125.
67. Wang, Z.; Chen, M.; Liu, Q. A review on multimodal communications for human–robot collaboration in 5G: From visual to tactile. Intell. Robot. 2025, 5, 579–606.
68. Chojecki, P.; Strazdas, D.; Przewozny, D.; Gard, N.; Runde, D.; Hoerner, N.; Al-Hamadi, A.; Eisert, P.; Bosse, S. Assessing the value of multimodal interfaces: A study on human–machine interaction in weld inspection workstations. Sensors 2023, 23, 5043.
69. Wu, D.; Zheng, P.; Zhao, Q.; Zhang, S.; Qi, J.; Hu, J.; Zhu, G.-N.; Wang, L. Empowering natural human–robot collaboration through multimodal language models and spatial intelligence: Pathways and perspectives. Robot. Comput.-Integr. Manuf. 2026, 97, 103064.
70. Bolbakov, R.G.; Morgunov, V.S.; Solovyev, I.V.; Tsvetkov, V.Y. Methods of comparative analysis. J. Phys. Conf. Ser. 2020, 1679, 052047.
71. Mills, M.; Van de Bunt, G.G.; De Bruijn, J. Comparative research: Persistent problems and promising solutions. Int. Sociol. 2006, 21, 619–631.
72. Kondratenko, Y.P.; Klymenko, L.P.; Sidenko, I.V. Comparative analysis of evaluation algorithms for decision-making in transport logistics. In Advance Trends in Soft Computing: Proceedings of WCSC 2013; Springer: Heidelberg, Germany, 2013; Volume 312, pp. 203–216.
73. Ohakwe, C.R.; Wu, J. The impact of macroeconomic indicators on logistics performance: A comparative analysis using simulated scenarios. Sustain. Futures 2025, 9, 100567.
74. Özcan, T.; Çelebi, N.; Esnaf, Ş. Comparative analysis of multi-criteria decision making methodologies and implementation of a warehouse location selection problem. Expert Syst. Appl. 2011, 38, 9773–9779.
75. Shah, S.; Suraj, D.; Reza, S.M.; Salam, M.A.R.B.A.; Ashraf, A.; Ferdous, S.F. Comparative analysis of deep learning models for defect detection in additive manufacturing using thermal imaging. Results Eng. 2025, 28, 108359.
76. Sa’ei, A. Comparative Research Method: Quantitative, Historical and Fuzzy Analysis; Samt Publications: Tehran, Iran, 2013; pp. 10–50.
77. Øvretveit, J. Comparative and Cross-Cultural Health Research: A Practical Guide; Radcliffe Medical Press: Oxford, UK, 1998; pp. 1–187.
78. Livingstone, S. On the challenges of cross-national comparative media research. Eur. J. Commun. 2003, 18, 477–500.
79. Gharawi, M.A.; Pardo, T.A.; Guerrero, S. Issues and strategies for conducting cross-national e-government comparative research. In Proceedings of the 3rd International Conference on Theory and Practice of Electronic Governance, Bogota, Colombia, 10–13 November 2009; pp. 163–170.
80. Dean, M. Multi-criteria analysis. In Advances in Transport Policy and Planning; Mouter, N., Ed.; Academic Press: Cambridge, MA, USA, 2020; Volume 6, pp. 165–224.
81. Triantaphyllou, E.; Mann, H. An examination of the effectiveness of multi-dimensional decision-making methods: A decision-making paradox. Decis. Support Syst. 1989, 5, 303–312.
82. Wątróbski, J.; Jankowski, J.; Ziemba, P.; Karczmarczyk, A.; Ziolo, M. Generalised framework for multi-criteria method selection. Omega 2018, 86, 107–124.
83. Jahan, A.; Edwards, K.L. A state-of-the-art survey on the influence of normalization techniques in ranking: Improving the materials selection process in engineering design. Mater. Des. 2015, 65, 335–342.
84. Straka, M. Logistika Distribúcie. Ako Efektívne Dostať Výrobok Na Trh [Distribution Logistics: How to Effectively Get a Product to Market]; Epos: Bratislava, Slovakia, 2013; pp. 1–399.
85. Vafaei, N.; Ribeiro, R.A.; Camarinha-Matos, L.M. Assessing normalization techniques for simple additive weighting method. Procedia Comput. Sci. 2022, 199, 1229–1236.
86. Stoltz, M.-H.; Giannikas, V.; McFarlane, D.; Strachan, J.; Um, J.; Srinivasan, R. Augmented reality in warehouse operations: Opportunities and challenges. IFAC-PapersOnLine 2017, 50, 12979–12984.
87. Egger, J.; Masood, T. Augmented reality in support of intelligent manufacturing: A systematic literature review. Comput. Ind. Eng. 2020, 140, 106195.
88. Jaghbeer, Y.; Hanson, R.; Johansson, M.I. Automated order picking systems and the links between design and performance: A systematic literature review. Int. J. Prod. Res. 2020, 58, 4489–4505.
89. Ziaee, O.; Hamedi, M. Augmented reality applications in manufacturing and its future scope in Industry 4.0. arXiv 2021, arXiv:2112.11190.
90. Wang, T.; Zheng, P.; Li, S.; Wang, L. Multimodal human–robot interaction for human-centric smart manufacturing: A survey. Adv. Intell. Syst. 2023, 5, 2300359.
91. Lee, H.; Jiang, N.; Samuel, S. Detection of error in static and dynamic visual stimulation via electroencephalogram and eye-tracking systems. Eng. Appl. Artif. Intell. 2025, 159, 111688.
92. Taban, R.A.; Croock, M.S. Eye tracking based directional control system using mobile applications. Int. J. Comput. Digit. Syst. 2018, 7, 365–374.
93. Jo, H. Interaction, novelty, voice, and discomfort in the use of artificial intelligence voice assistant. Univers. Access Inf. Soc. 2025, 24, 2419–2432.
94. De Carvalho, D.; Hoffmann, K.; Nunes Filho, J.R.; Baptistella, A.R. Enhancing mechanical ventilation management with AI: Computer vision for automated detection of ventilatory modes, parameters and asynchrony. J. Crit. Care 2026, 91, 155238.
95. Hamdani, R.; Chihi, I. Adaptive human-computer interaction for industry 5.0: A novel concept, with comprehensive review and empirical validation. Comput. Ind. 2025, 168, 104268.
96. Centre for Sustainable Human-Machine Interaction in Eco-Innovative Manufacturing. Centra Doskonałości Naukowej i Technologicznej Uniwersytetu Zielonogórskiego [Centres of Scientific and Technological Excellence of the University of Zielona Góra]. 2024. Available online: https://cdnit.uz.zgora.pl/en/centre-for-sustainable-human-machine-interaction-in-eco-innovative-manufacturing/ (accessed on 26 February 2026).

Figure 1. A general architecture of a multimodal HCII system [30].

Figure 2. Methodological procedure for selecting the appropriate modality.

Figure 3. Comparison of the weights from all experts.

Table 1. Comparative analysis of HCI and AI-based HCI across modalities.

HCI Type	Modality	Interaction Form	Source
Traditional	Visual	Manual input devices (handheld scanners) and their comparison with AR in parcel sorting processes.	Stoltz et al., 2017 [86]
		Pick-to-light visual signaling for item pick confirmation in warehouses.	Jaghbeer et al., 2020 [88]
		Directional eye-tracking control using a mobile camera to control the movement of mobile objects.	Taban et al., 2018 [92]
	Voice/Audio	Voice confirmation of picks in automated parts-to-picker warehouse systems.	Jaghbeer et al., 2020 [88]
	Multimodal	Model-based adaptive user interfaces (UIs) adapting to context and user feedback.	Hussain et al., 2018 [32]
		Physical interface (BCI headband) for EEG signal capture in motor imagery.	Garcia-Moreno et al., 2020 [33]
		Monitoring and control of fully automated arms (human as supervisor).	Jaghbeer et al., 2020 [88]
AI-based	Visual	Intelligent AR applications with automatic inspection and real-time digital data integration.	Stoltz et al., 2017; Ziaee & Hamedi, 2021 [86,89]
		Augmented reality (AR) in Industry 4.0 focused on reducing cognitive load.	Egger & Masood, 2020; Ziaee & Hamedi, 2021 [87,89]
		AI cameras for item verification and computer vision for personnel detection near robots (AGV/AMR).	Jaghbeer et al., 2020 [88]
		Computer vision for automatic analysis of ventilators without integration.	De Carvalho et al., 2026 [94]
	Voice/Audio	Optimization and factors for long-term use of intelligent AI voice assistants.	Jo, 2025 [93]
		Automatic vocal emotion identification using deep learning models for empathetic interaction.	Alhussen et al., 2025 [48]
	Multimodal	Human-centric robotics with mode redundancy (voice + gestures) for smart manufacturing.	Wang et al., 2023 [90]
		AI models for adaptive interpretation of EEG and eye-tracking data to detect error states.	Lee et al., 2025 [91]
		Empathetic algorithms using NLP and computer vision for autonomous emotion analysis.	Velagaleti et al., 2024 [49]

Table 2. Key differences between traditional HCI and AI-based HCI approaches.

Aspect	Traditional HCI	AI-Based HCI
System Adaptation	Manual parameter and sensor setup for data collection.	Adaptive interface that autonomously adjusts to the user.
Interaction	Requirement for direct physical contact (keyboard, mouse, scanner).	Contactless interaction leveraging biometrics, facial, or posture recognition.
Cognitive Load	User adapts to the system’s logic and structure.	System adapts to the user’s cognitive states and needs.
Control Complexity	High dependence on operator skill level and experience.	Simplified control is accessible even to less experienced users.
Output Presentation	Static information display (fixed menus, icons, windows).	Dynamic, context-sensitive visualizations and adaptive UI.
Interface Learning	Time-intensive training and mastery of manual procedures.	Intuitive interaction requires minimal prior training.

Table 4. Pairwise comparisons and the weights established by Expert 1.

Criterion	A	B	C	D	E	F	G	H	Frequency of Occurrences	Normalized Weight
A	-	B	C	A	E	A	G	H	2	0.071
B	-	-	C	B	E	B	G	H	3	0.107
C	-	-	-	C	C	C	C	H	6	0.215
D	-	-	-	-	D	D	G	H	2	0.071
E	-	-	-	-	-	E	G	H	3	0.107
F	-	-	-	-	-	-	F	H	1	0.036
G	-	-	-	-	-	-	-	H	4	0.143
H	-	-	-	-	-	-	-	-	7	0.250
Sum									28	1
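The weights in Table 4 follow from counting how often each criterion is preferred across the 28 pairwise comparisons (8 × 7 / 2) and dividing each count by 28. A minimal Python sketch of that count, using Expert 1's judgments as transcribed from the table; the dictionary layout and the function name `pairwise_weights` are illustrative assumptions, not the authors' software:

```python
from itertools import combinations

CRITERIA = "ABCDEFGH"

# Preferred criterion in each of Expert 1's pairwise comparisons, read off Table 4.
# Key: (row criterion, column criterion); value: the criterion judged more important.
EXPERT1_WINNERS = {
    ("A", "B"): "B", ("A", "C"): "C", ("A", "D"): "A", ("A", "E"): "E",
    ("A", "F"): "A", ("A", "G"): "G", ("A", "H"): "H",
    ("B", "C"): "C", ("B", "D"): "B", ("B", "E"): "E", ("B", "F"): "B",
    ("B", "G"): "G", ("B", "H"): "H",
    ("C", "D"): "C", ("C", "E"): "C", ("C", "F"): "C", ("C", "G"): "C",
    ("C", "H"): "H",
    ("D", "E"): "D", ("D", "F"): "D", ("D", "G"): "G", ("D", "H"): "H",
    ("E", "F"): "E", ("E", "G"): "G", ("E", "H"): "H",
    ("F", "G"): "F", ("F", "H"): "H",
    ("G", "H"): "H",
}


def pairwise_weights(criteria, winners):
    """Return (win counts, weights), where weight = wins / number of comparisons."""
    pairs = list(combinations(criteria, 2))  # n(n-1)/2 pairs, here 28
    if set(pairs) != set(winners):
        raise ValueError("each pair of criteria must be judged exactly once")
    counts = {c: 0 for c in criteria}
    for pair in pairs:
        winner = winners[pair]
        if winner not in pair:
            raise ValueError(f"winner {winner!r} is not part of pair {pair}")
        counts[winner] += 1
    return counts, {c: counts[c] / len(pairs) for c in criteria}


counts, weights = pairwise_weights(CRITERIA, EXPERT1_WINNERS)
for c in CRITERIA:
    print(c, counts[c], round(weights[c], 3))
```

The counts match the "Frequency of Occurrences" column. Rounded to three decimals, the sketch gives 0.214 for criterion C (6/28 ≈ 0.2143); Table 4 lists 0.215, which keeps the rounded column summing to exactly 1.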

Table 5. Arithmetic means of the weights determined by the experts.

Criterion	Expert 1 Evaluation	Expert 2 Evaluation	Expert 3 Evaluation	Weight w_j
A	0.071	0.167	0.056	0.098
B	0.107	0.194	0.139	0.147
C	0.215	0.139	0.167	0.174
D	0.071	0.083	0.083	0.079
E	0.107	0.056	0.028	0.064
F	0.036	0.028	0.138	0.067
G	0.143	0.111	0.167	0.140
H	0.250	0.222	0.222	0.231
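Table 5 aggregates the three experts' weight vectors by taking the arithmetic mean for each criterion. A short Python sketch of that step, with the individual weights copied from the table; the variable and function names are illustrative:

```python
CRITERIA = "ABCDEFGH"

# Individual weight vectors for criteria A..H, copied from Table 5.
EXPERT_WEIGHTS = {
    "Expert 1": [0.071, 0.107, 0.215, 0.071, 0.107, 0.036, 0.143, 0.250],
    "Expert 2": [0.167, 0.194, 0.139, 0.083, 0.056, 0.028, 0.111, 0.222],
    "Expert 3": [0.056, 0.139, 0.167, 0.083, 0.028, 0.138, 0.167, 0.222],
}


def mean_weights(expert_weights):
    """Aggregate the experts' weight vectors by the arithmetic mean per criterion."""
    vectors = list(expert_weights.values())
    if any(len(v) != len(CRITERIA) for v in vectors):
        raise ValueError("every expert must weight all criteria")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


w = mean_weights(EXPERT_WEIGHTS)
for criterion, wj in zip(CRITERIA, w):
    print(criterion, round(wj, 3))
print("sum of w_j:", round(sum(w), 3))
```

Rounded to three decimals, the means reproduce the w_j column of Table 5. Because each expert's weights already sum to 1, their mean does as well, so no further normalization step is needed.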

Table 6. Determination of the partial utilities u_ij of the interaction modalities.

Criterion A	Visual	Voice	Multimodal	Frequency of Occurrences	u_ij
Visual	-	Visual	Visual	2	0.667
Voice	-	-	Multimodal	0	0
Multimodal	-	-	-	1	0.333
Sum				3	1
Criterion B	Visual	Voice	Multimodal	Frequency of Occurrences	u_ij
Visual	-	Visual	Multimodal	1	0.333
Voice	-	-	Multimodal	0	0
Multimodal	-	-	-	2	0.667
Sum				3	1
Criterion C	Visual	Voice	Multimodal	Frequency of Occurrences	u_ij
Visual	-	Voice	Multimodal	0	0
Voice	-	-	Multimodal	1	0.333
Multimodal	-	-	-	2	0.667
Sum				3	1
Criterion D	Visual	Voice	Multimodal	Frequency of Occurrences	u_ij
Visual	-	Voice	Multimodal	0	0
Voice	-	-	Multimodal	1	0.333
Multimodal	-	-	-	2	0.667
Sum				3	1
Criterion E	Visual	Voice	Multimodal	Frequency of Occurrences	u_ij
Visual	-	Voice	Visual	1	0.333
Voice	-	-	Voice	2	0.667
Multimodal	-	-	-	0	0
Sum				3	1
Criterion F	Visual	Voice	Multimodal	Frequency of Occurrences	u_ij
Visual	-	Visual	Visual	2	0.667
Voice	-	-	Voice	1	0.333
Multimodal	-	-	-	0	0
Sum				3	1
Criterion G	Visual	Voice	Multimodal	Frequency of Occurrences	u_ij
Visual	-	Visual	Visual	2	0.667
Voice	-	-	Voice	1	0.333
Multimodal	-	-	-	0	0
Sum				3	1
Criterion H	Visual	Voice	Multimodal	Frequency of Occurrences	u_ij
Visual	-	Visual	Multimodal	1	0.333
Voice	-	-	Multimodal	0	0
Multimodal	-	-	-	2	0.667
Sum				3	1
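Table 6 applies the same counting idea to the three modalities: for each criterion, every pair of modalities is compared once, and a modality's partial utility u_ij is its number of wins divided by the three comparisons. A Python sketch, assuming the preferences are listed in the fixed pair order (Visual vs. Voice, Visual vs. Multimodal, Voice vs. Multimodal) as transcribed from Table 6; the data layout is an illustrative choice:

```python
from itertools import combinations

MODALITIES = ("Visual", "Voice", "Multimodal")
# Pair order: (Visual, Voice), (Visual, Multimodal), (Voice, Multimodal)
PAIRS = list(combinations(MODALITIES, 2))

# Preferred modality for each pair in PAIRS, per criterion, as recorded in Table 6.
PREFERENCES = {
    "A": ("Visual", "Visual", "Multimodal"),
    "B": ("Visual", "Multimodal", "Multimodal"),
    "C": ("Voice", "Multimodal", "Multimodal"),
    "D": ("Voice", "Multimodal", "Multimodal"),
    "E": ("Voice", "Visual", "Voice"),
    "F": ("Visual", "Visual", "Voice"),
    "G": ("Visual", "Visual", "Voice"),
    "H": ("Visual", "Multimodal", "Multimodal"),
}


def partial_utilities(winners):
    """u_ij = number of pairwise wins of modality i / number of comparisons."""
    if len(winners) != len(PAIRS):
        raise ValueError("one winner is needed per pair of modalities")
    wins = {m: 0 for m in MODALITIES}
    for pair, winner in zip(PAIRS, winners):
        if winner not in pair:
            raise ValueError(f"{winner!r} is not part of pair {pair}")
        wins[winner] += 1
    return {m: wins[m] / len(PAIRS) for m in MODALITIES}


for criterion, winners in PREFERENCES.items():
    u = partial_utilities(winners)
    print(criterion, {m: round(v, 3) for m, v in u.items()})
```

The sketch keeps exact thirds (1/3, 2/3); Table 6 reports them rounded to 0.333 and 0.667.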

Table 7. The calculation of the overall score A_i.

Criterion	w_j	Visual u_ij	Visual w_j × u_ij	Voice u_ij	Voice w_j × u_ij	Multimodal u_ij	Multimodal w_j × u_ij
A	0.098	0.667	0.065366	0	0	0.333	0.032634
B	0.147	0.333	0.048951	0	0	0.667	0.098049
C	0.174	0	0	0.333	0.057942	0.667	0.116058
D	0.079	0	0	0.333	0.026307	0.667	0.052693
E	0.064	0.333	0.021312	0.667	0.042688	0	0
F	0.067	0.667	0.044689	0.333	0.022311	0	0
G	0.14	0.667	0.09338	0.333	0.04662	0	0
H	0.231	0.333	0.076923	0	0	0.667	0.154077
Sum	1	A₁	0.3506	A₂	0.1959	A₃	0.4535
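The overall score in Table 7 is a weighted sum of the partial utilities, A_i = Σ_j w_j × u_ij. The Python sketch below recomputes it from the rounded w_j (Table 5) and u_ij (Table 6) values that Table 7 uses; the data layout and the function name `overall_scores` are illustrative:

```python
MODALITIES = ("Visual", "Voice", "Multimodal")

# Criterion weights w_j (Table 5).
WEIGHTS = {"A": 0.098, "B": 0.147, "C": 0.174, "D": 0.079,
           "E": 0.064, "F": 0.067, "G": 0.140, "H": 0.231}

# Partial utilities u_ij as rounded in Table 7, in MODALITIES order.
UTILITIES = {
    "A": (0.667, 0.0, 0.333),
    "B": (0.333, 0.0, 0.667),
    "C": (0.0, 0.333, 0.667),
    "D": (0.0, 0.333, 0.667),
    "E": (0.333, 0.667, 0.0),
    "F": (0.667, 0.333, 0.0),
    "G": (0.667, 0.333, 0.0),
    "H": (0.333, 0.0, 0.667),
}


def overall_scores(weights, utilities):
    """Weighted sum A_i = sum over criteria j of w_j * u_ij."""
    totals = [0.0] * len(MODALITIES)
    for criterion, w in weights.items():
        for i, u in enumerate(utilities[criterion]):
            totals[i] += w * u
    return dict(zip(MODALITIES, totals))


scores = overall_scores(WEIGHTS, UTILITIES)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.4f}")
```

It prints Multimodal: 0.4535, Visual: 0.3506 and Voice: 0.1959, the same values and ranking as Table 7.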

Table 8. Evaluation of the sensitivity of MCDA results.

Scenario	Outcome	Modalities Sequence
Significant reduction in weight H (H → 0)	A₃ loses value (0.2994); A₁ approaches the value of A₃ (0.2737).	A₃ > A₁ > A₂
Reduction in weights H and C by 50%	A₃ loses value (0.3184); A₁ approaches the value of A₃ (0.3122).	A₃ > A₁ > A₂
Increase in weight G by 50%	A₃ does not change its value; A₁ approaches the value of A₃ (0.3973).	A₃ > A₁ > A₂
Increase in weights A and G by 50%	A₁ is favored (0.3980); A₃ is disadvantaged (0.9530).	A₁ > A₃ > A₂
Increase in weights of criteria (C, D, E) by 100%	A₃ remains the winner (0.6222); A₂ (0.3228) approaches A₁ (0.3719).	A₃ > A₁ > A₂
Extreme scenario (all weights equal = 0.125)	A₁ would benefit significantly (0.3750); A₃ would lose its dominance (0.3751), leaving the two almost tied.	A₃ > A₁ > A₂
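Table 8 does not state whether the modified weights were renormalized to sum to 1. The Python sketch below assumes they were not: each scenario simply rescales the affected w_j and recomputes A_i = Σ_j w_j × u_ij with the rounded Table 7 utilities. The scenario labels and helper names (`scores`, `scaled`) are illustrative:

```python
# Criterion weights w_j (Table 5).
WEIGHTS = {"A": 0.098, "B": 0.147, "C": 0.174, "D": 0.079,
           "E": 0.064, "F": 0.067, "G": 0.140, "H": 0.231}
# Partial utilities (A1 visual, A2 voice, A3 multimodal) as rounded in Table 7.
UTILITIES = {
    "A": (0.667, 0.0, 0.333), "B": (0.333, 0.0, 0.667),
    "C": (0.0, 0.333, 0.667), "D": (0.0, 0.333, 0.667),
    "E": (0.333, 0.667, 0.0), "F": (0.667, 0.333, 0.0),
    "G": (0.667, 0.333, 0.0), "H": (0.333, 0.0, 0.667),
}


def scores(weights):
    """Weighted sum A_i = sum_j w_j * u_ij for the three alternatives."""
    return [sum(w * UTILITIES[c][i] for c, w in weights.items()) for i in range(3)]


def scaled(factors):
    """Multiply selected weights by a factor; by assumption, no renormalization."""
    return {c: w * factors.get(c, 1.0) for c, w in WEIGHTS.items()}


SCENARIOS = {
    "w_H -> 0": scaled({"H": 0.0}),
    "w_H, w_C -50%": scaled({"H": 0.5, "C": 0.5}),
    "w_G +50%": scaled({"G": 1.5}),
    "w_A, w_G +50%": scaled({"A": 1.5, "G": 1.5}),
    "w_C, w_D, w_E +100%": scaled({"C": 2.0, "D": 2.0, "E": 2.0}),
    "all w_j = 0.125": {c: 0.125 for c in WEIGHTS},
}

results = {}
for name, weights in SCENARIOS.items():
    a = scores(weights)
    order = sorted(range(3), key=lambda i: a[i], reverse=True)
    results[name] = (a, " > ".join(f"A{i + 1}" for i in order))
    print(f"{name}: A1={a[0]:.4f} A2={a[1]:.4f} A3={a[2]:.4f}  {results[name][1]}")
```

Under this assumption the sketch matches the first, second, third and fifth rows of Table 8 to within one unit in the fourth decimal, and for equal weights it gives A₁ = 0.3750 and A₃ = 0.3751. For the scenario that raises w_A and w_G by 50%, it returns A₁ ≈ 0.4300 and A₃ ≈ 0.4698 (A₃ > A₁ > A₂), so the values listed in that row are not reproduced by this simple scaling.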

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Muchova, P.; Saderova, J.; Ondov, M. Comparison of AI-Based HCI Modalities for Selecting Interaction Systems in Sustainable Manufacturing. Sustainability 2026, 18, 4638. https://doi.org/10.3390/su18104638

AMA Style

Muchova P, Saderova J, Ondov M. Comparison of AI-Based HCI Modalities for Selecting Interaction Systems in Sustainable Manufacturing. Sustainability. 2026; 18(10):4638. https://doi.org/10.3390/su18104638

Chicago/Turabian Style

Muchova, Patricia, Janka Saderova, and Marek Ondov. 2026. "Comparison of AI-Based HCI Modalities for Selecting Interaction Systems in Sustainable Manufacturing" Sustainability 18, no. 10: 4638. https://doi.org/10.3390/su18104638

APA Style

Muchova, P., Saderova, J., & Ondov, M. (2026). Comparison of AI-Based HCI Modalities for Selecting Interaction Systems in Sustainable Manufacturing. Sustainability, 18(10), 4638. https://doi.org/10.3390/su18104638

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
