Comparison of AI-Based HCI Modalities for Selecting Interaction Systems in Sustainable Manufacturing
Abstract
1. Introduction
2. Literature Review
2.1. Literature Search and Selection Strategy
2.1.1. Search Strategy and Databases
- Core concepts: “Human–Computer Interaction (HCI)”, “Human-Centered Intelligent Interaction (HCII)”;
- Interaction modalities: “Voice interaction”, “Visual interaction”, “Touch interaction”, “Multimodal interaction”;
- Application context: “Manufacturing”, “Industry 4.0”, “Logistics”.
2.1.2. Selection Process and Data Synthesis
2.2. Functional Overview and Application Trends of AI-Based HCI Modalities
2.3. Visual Interaction
2.4. Voice Interaction
2.5. Multimodal Interaction
3. Materials and Methods
- Quantitative approach—focused on variables,
- Qualitative approach—centered on specific cases,
- Fuzzy approach—based on fuzzy set theory.
- $U_i$ represents the resulting score of the $i$-th alternative;
- $u_{ij}$ denotes the suitability of the $i$-th alternative according to the $j$-th criterion;
- $w_j$ expresses the weight of the $j$-th criterion.
- Interactions for a given criterion were to be evaluated on a maximizing basis; i.e., the better an interaction satisfies the criterion, the higher its score within the assigned scale.
- Interactions for a given criterion were to be scored on an interval from 1 to 9 (1—very low satisfaction of the criterion, 2—low to very low, 3—low, 4—low to medium, 5—medium, 6—medium to high, 7—high, 8—high to very high, 9—very high).
- For criteria whose impact is opposite, an inverse scale was to be applied.
- During the evaluation, the experts were to remain neutral and base their judgments solely on facts recognized by the academic community.
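The symbol definitions in the list above are consistent with a weighted-sum aggregation. A compact statement of that rule is sketched here, assuming $n$ criteria, weights that sum to one, and $U_i$ as an assumed name for the resulting score (the original symbol is not recoverable from this text):

```latex
U_i = \sum_{j=1}^{n} w_j \, u_{ij},
\qquad \sum_{j=1}^{n} w_j = 1
```

Under this reading, the alternative with the largest $U_i$ would be preferred.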
4. Results and Discussion
4.1. Comparative Analysis
4.2. Multi-Criteria Evaluation of Interaction Modalities
- Criterion A: Noise levels;
- Criterion B: Dustiness;
- Criterion C: Response speed;
- Criterion D: Cognitive load;
- Criterion E: Training complexity;
- Criterion F: Operational costs;
- Criterion G: Implementation ease;
- Criterion H: Robustness to disturbances.
- Noise level: The reliability of visual interaction is very high in noisy environments because it is independent of acoustic conditions; noise has no direct impact on input (touch, buttons). In noisy settings, visual interaction is the most stable. The reliability of voice interaction is low to very low: various sounds in the production hall and the use of protective gear reduce voice recognition accuracy. The reliability of multimodal interaction is higher than that of pure voice interaction. Voice can serve as a supplementary channel, and if it fails, redundant command confirmations (visual, touch) and adaptive modality switching remain available. Ratings for the noise criterion: visual 9, voice 2, multimodal 6.
- Dustiness: Dust adversely affects the reliability of visual interaction; it can reduce touchscreen sensitivity, clog mechanical buttons, or impair display readability. The reliability of visual interaction in dusty operations is moderate to high, depending on how well the device is protected against dust particles. For voice interaction, dust does not affect acoustics the way noise does. The issues here are microphone clogging, protective masks that muffle the voice, reduced speech intelligibility through respirators, and the need for frequent device maintenance. The reliability of multimodal interaction is high if the system is robust. Adaptive modality switching is available: if the display is dirty, voice input can be used, and if the microphone is clogged, touch control can be employed. Regular cleaning is still required. In dusty environments, multimodal interaction is a very strong solution. Ratings for the dust criterion: visual 6, voice 5, multimodal 8.
- Response speed: Response speed for visual interaction is high if the operator is at the visualization panel. When a fault occurs, the operator registers the visual alarm and intervenes. The display provides clear signaling and detailed information, but response speed depends on operator attention and other human factors. Response speed for voice interaction is very high, provided the voice announcement is intelligible. The operator hears it immediately and can respond with a voice command; no visual contact is required. In noisy environments, the response slows down. Response speed for multimodal interaction is very high. The operator is alerted through multiple channels (display, sound, device light beacon, etc.), which reduces the likelihood of missing an alert. Ratings for the response speed criterion: visual 7, voice 8, multimodal 9.
- Cognitive load: Cognitive load represents the amount of mental resources required for information perception, comprehension, decision making, and action execution. For this criterion, a higher operator cognitive load corresponds to fewer points on the rating scale. In visual interaction, the operator must monitor multiple parameters, filter alarms, and interpret graphs and numbers. The load is manageable but requires sustained attention, with a risk of mental fatigue. Voice interaction leverages natural communication, reducing visual overload and memory load (no need to memorize commands). Multimodal interaction distributes information across the senses, reducing mental fatigue compared to visual interaction, and the operator selects the most suitable interaction channel. With a well-designed system, cognitive load for multimodal interaction can be rated as low. Ratings for the cognitive load criterion: visual 4, voice 6, multimodal 7.
- Training complexity: For this criterion, more demanding training (longer in duration and requiring multiple operator skills) receives fewer points on the rating scale. In visual interaction, the operator must learn menu structures, the meanings of icons and alarms, and response procedures, which requires basic technical thinking. For voice interaction, training emphasizes precise command phrasing, correct pronunciation, and disciplined practical use. For multimodal interaction, the operator must learn multiple control options, when to use which channel, how redundancy works, how to respond to single-modality failures, etc. After training, system use is intuitive, but the initial demands are higher. Ratings for the training complexity criterion: visual 6, voice 8, multimodal 4.
- Operational costs: For this criterion, higher costs receive fewer points on the rating scale. Visual interaction has low operational costs, limited to minimal maintenance and occasional software updates, given its long lifespan. Voice interaction incurs higher operational costs than visual interaction, covering device maintenance, headset replacements, language model updates, and hygiene costs. Multimodal interaction has the highest operational costs (maintenance of multiple devices, spare parts, and update management). Ratings for the operational costs criterion: visual 8, voice 5, multimodal 3.
- Implementation ease: For this criterion, a more demanding implementation receives fewer points on the rating scale. The visual modality can be considered the simplest and lowest-risk to implement. Introducing the voice modality is more time-consuming, requiring a pilot phase and thorough testing in real conditions. The multimodal option has the highest implementation demands in operations, as it involves integrating multiple inputs, iterative testing, and training. Ratings for the implementation criterion: visual 8, voice 4, multimodal 3.
- Robustness to disturbances: Visual interaction is sensitive to dust, lighting, and vibrations but unaffected by noise and temperature. It is the most stable form of interaction in changing environments. Voice interaction is the most sensitive to environmental changes (noise, hall echoes, respirators/masks, microphone vibrations, network latency from cloud ASR). Its advantage is independence from lighting. Multimodal interaction is highly resilient to environmental changes. It adapts to the situation and benefits from redundancy, since one channel can be substituted for another. Ratings for the robustness criterion: visual 6, voice 2, multimodal 8.
| Criterion | Visual | Voice | Multimodal |
|---|---|---|---|
| Noise level | 9 | 2 | 6 |
| Dustiness | 6 | 5 | 8 |
| Response speed | 7 | 8 | 9 |
| Cognitive load | 4 | 6 | 7 |
| Training complexity | 6 | 8 | 4 |
| Operational costs | 8 | 5 | 3 |
| Implementation | 8 | 4 | 3 |
| Robustness to disturbances | 6 | 2 | 8 |
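The ratings above can be turned into normalized pairwise preference shares per criterion. The following is a minimal sketch, assuming that within each criterion the higher-rated modality wins every pairwise comparison and that each modality's share is its number of wins divided by the number of pairs. Whether the authors derived their values this way is an assumption; the names `RATINGS` and `pairwise_shares` are hypothetical. For the noise, dustiness, and response speed criteria, the result agrees with the pairwise tables given for Criteria A to C.

```python
from itertools import combinations

# Ratings copied from the table above (scale 1-9, higher is better).
RATINGS = {
    "Noise level": {"Visual": 9, "Voice": 2, "Multimodal": 6},
    "Dustiness": {"Visual": 6, "Voice": 5, "Multimodal": 8},
    "Response speed": {"Visual": 7, "Voice": 8, "Multimodal": 9},
    "Cognitive load": {"Visual": 4, "Voice": 6, "Multimodal": 7},
    "Training complexity": {"Visual": 6, "Voice": 8, "Multimodal": 4},
    "Operational costs": {"Visual": 8, "Voice": 5, "Multimodal": 3},
    "Implementation": {"Visual": 8, "Voice": 4, "Multimodal": 3},
    "Robustness to disturbances": {"Visual": 6, "Voice": 2, "Multimodal": 8},
}


def pairwise_shares(ratings):
    """Return each modality's share of pairwise wins for one criterion.

    Assumption: the modality with the higher rating wins the pair.
    Ties are rejected because the source gives no rule for them.
    """
    wins = {name: 0 for name in ratings}
    pairs = list(combinations(ratings, 2))
    for a, b in pairs:
        if ratings[a] == ratings[b]:
            raise ValueError(f"tie between {a} and {b}: no rule defined")
        winner = a if ratings[a] > ratings[b] else b
        wins[winner] += 1
    return {name: count / len(pairs) for name, count in wins.items()}


shares = {criterion: pairwise_shares(r) for criterion, r in RATINGS.items()}

for criterion, row in shares.items():
    print(criterion, {name: round(value, 3) for name, value in row.items()})
```

Rejecting ties keeps the sketch honest: a tie would need an explicit rule (for example, splitting the win), and none is stated in the text.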
4.3. Study Limitations
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Card, S.K.; Moran, T.P.; Newell, A. The Psychology of Human–Computer Interaction; CRC Press: Boca Raton, FL, USA, 1983.
2. Pantic, M.; Pentland, A.; Nijholt, A.; Huang, T.S. Human computing and machine understanding of human behavior: A survey. In Artificial Intelligence for Human Computing; Huang, T.S., Nijholt, A., Pantic, M., Pentland, A., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2007; Volume 4451, pp. 47–71.
3. Rogers, Y. HCI theory: Classical, modern, and contemporary. Synth. Lect. Hum.-Centered Inform. 2012, 5, 1–129.
4. Khan, S.B.; Chandna, S. Introduction to human–computer interaction using artificial intelligence. In Innovations in Artificial Intelligence and Human–Computer Interaction in the Digital Era; Academic Press: Cambridge, MA, USA, 2023; pp. 1–6.
5. Jacobs, C.; Johnson, H.; Rennie, T.; Lambert, J.; Joiner, R. Human–computer interaction and artificial intelligence: Advancing care through extended mind theory. Cureus 2024, 16, e74968.
6. Sharma, S.; Shrestha, S. Integrating HCI principles in AI: A review of human-centered artificial intelligence applications and challenges. J. Future Artif. Intell. Technol. 2024, 1, 44–56.
7. Mazarakis, A.; Bernhard-Skala, C.; Braun, M.; Peters, I. What is critical for human-centered AI at work? Toward an interdisciplinary theory. Front. Artif. Intell. 2023, 6, 1257057.
8. Schmager, S.; Pappas, I.O.; Vassilakopoulou, P. Understanding Human-Centred AI: A Review of Its Defining Elements and a Research Agenda. Behav. Inf. Technol. 2025, 44, 3771–3810.
9. Raees, M.; Meijerink, I.; Lykourentzou, I.; Khan, V.-J.; Papangelis, K. From Explainable to Interactive AI: A Literature Review on Current Trends in Human-AI Interaction. arXiv 2024, arXiv:2405.15051.
10. Acharjya, P.; Joardar, S.; Koley, S. Artificial intelligence-based intelligent human–computer interaction. In Handbook of Research on AI Methods and Applications in Computer Engineering; IGI Global: Hershey, PA, USA, 2023; pp. 58–79.
11. Ding, Z.; Ji, Y.; Gan, Y.; Wang, Y.; Xia, Y. Current status and trends of technology, methods, and applications of human–computer intelligent interaction: A bibliometric research. Multimed. Tools Appl. 2024, 83, 69111–69144.
12. Womser-Hacker, C. Accessible human–computer interaction. In Handbook of Accessible Communication; Maaß, C., Rink, I., Eds.; Frank & Timme: Berlin, Germany, 2024; pp. 453–472.
13. Liu, B.H.; Pham, V.T.; Nguyen, T.N.; Luo, Y.S. A heuristic for maximizing the lifetime of data aggregation in wireless sensor networks. arXiv 2019, arXiv:1910.05310.
14. Panda, S.; Roy, S.T. Reflections on emerging HCI–AI research. AI Soc. 2024, 39, 407–409.
15. Ramadevi, P. Human–computer interaction: Bridging the gap between humans and technology. Int. Sci. J. Eng. Manag. 2025, 4, 1–5.
16. Bi, T.; Zhang, Y.; Wang, C.; Ayobi, A. Characterizing HCI research in China: Streams, methodologies and future directions. In Proceedings of the CHI 2019 Conference, Glasgow, UK, 4–9 May 2019.
17. Li, K.; Tiwari, A.; Alcock, J.; Bermell-Garcia, P. Categorisation of visualisation methods to support the design of human–computer interaction systems. Appl. Ergon. 2016, 55, 85–107.
18. Bhowmik, A.K. Natural and intuitive user interfaces with perceptual computing technologies. Inf. Disp. 2013, 29, 6–10.
19. Cheng, X.; Lin, X.; Shen, X.L.; Zarifis, A.; Mou, J. The dark sides of AI. Electron. Mark. 2022, 32, 11–15.
20. Yang, S.J.H.; Ogata, T.; Matsui, N.; Chen, N.S. Human-centered artificial intelligence in education: Seeing the invisible through the visible. Comput. Educ. Artif. Intell. 2021, 2, 100008.
21. Curchoe, C.L.; Bormann, C.L. Artificial intelligence and machine learning for human reproduction and embryology. J. Assist. Reprod. Genet. 2019, 36, 591–600.
22. Pisoni, G.; Díaz-Rodríguez, N.; Gijlers, H.; Tonolli, L. Human-centred artificial intelligence for designing accessible cultural heritage. Appl. Sci. 2021, 11, 870.
23. Liao, H.; Zhou, Z.; Zhao, X.; Zhang, L.; Mumtaz, S.; Jolfaei, A.; Ahmed, S.H.; Bashir, A.K. Learning-based context-aware resource allocation for edge-computing-empowered industrial IoT. IEEE Internet Things J. 2020, 7, 4260–4277.
24. Alkatheiri, M.S. Artificial intelligence assisted improved human–computer interactions for computer systems: A systematic review. Comput. Electr. Eng. 2022, 101, 107950.
25. Lyu, Z.; Poiesi, F.; Dong, Q.; Lloret, J.; Song, H. Deep learning for intelligent human–computer interaction. Appl. Sci. 2022, 12, 11457.
26. Grigsby, S.S. Artificial intelligence for advanced human–machine symbiosis. In Augmented Cognition: Intelligent Technologies; Springer: Cham, Switzerland, 2018; pp. 255–266.
27. Gomes, C.C.; Preto, S. Artificial intelligence and interaction design for a positive emotional user experience. In Intelligent Human Systems Integration; Springer: Cham, Switzerland, 2018; pp. 321–327.
28. Zhang, C.; Lu, Y. Study on artificial intelligence: The state of the art and future prospects. J. Ind. Inf. Integr. 2021, 23, 100224.
29. Ahamed, M.M. Analysis of human–machine interaction design perspective: A comprehensive literature review. Int. J. Contemp. Comput. Res. 2017, 1, 31–42.
30. Šumak, B.; Brdnik, S.; Pušnik, M. Sensors and artificial intelligence methods and algorithms for human–computer intelligent interaction: A systematic mapping study. Sensors 2022, 22, 20.
31. Lin, L.; Qiu, J.; Lao, J. Intelligent human–computer interaction: A perspective on software engineering. In Proceedings of the 14th International Conference on Computer Science & Education (ICCSE 2019), Toronto, ON, Canada, 19–21 August 2019; pp. 488–492.
32. Hussain, J.; Ul Hassan, A.; Bilal, H.S.M.; Ali, R.; Afzal, M.; Hussain, S.; Bang, J.; Banos, O.; Lee, S. Model-based adaptive user interface based on context and user experience evaluation. J. Multimodal User Interfaces 2018, 12, 1–16.
33. Garcia-Moreno, F.M.; Bermudez-Edo, M.; Rodriguez-Fortiz, M.J.; Garrido, J.L. A CNN–LSTM deep learning classifier for motor imagery EEG detection using a low-invasive and low-cost BCI headband. In Proceedings of the 16th International Conference on Intelligent Environments (IE 2020), Madrid, Spain, 20–23 July 2020; pp. 84–91.
34. Oviatt, S.; Schuller, B.; Cohen, P.R.; Sonntag, D.; Potamianos, G.; Krüger, A. (Eds.) The Handbook of Multimodal–Multisensor Interfaces: Foundations, User Modeling, and Common Modality Combinations; Association for Computing Machinery: New York, NY, USA, 2017; Volume 1.
35. Govindaraju, D.; Thangam, D. Emotion recognition in human–machine interaction and a review in interpersonal communication perspective. In Handbook of Research on AI and Human Interaction; IGI Global: Hershey, PA, USA, 2024; pp. 312–330.
36. Jacob, R.; Karn, K. Eye tracking in human–computer interaction and usability research: Ready to deliver the promises. In The Mind’s Eye; Hyönä, J., Radach, R., Deubel, H., Eds.; Elsevier: Amsterdam, The Netherlands, 2003; pp. 573–605.
37. Thalmann, D. Sensors and actuators for HCI and VR: A few case studies. In Frontiers in Electronic Technologies; Prabaharan, S., Thalmann, N., Kanchana Bhaaskaran, V., Eds.; Lecture Notes in Electrical Engineering; Springer: Singapore, 2017; Volume 433, pp. 41–56.
38. Zhang, S.; Song, R.; Cheng, J.; Zhang, Y.; Chen, X. A feasibility study of a video-based heart rate estimation method with convolutional neural networks. In Proceedings of the IEEE International Conference on Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA 2019), Tianjin, China, 14–16 June 2019; pp. 1–5.
39. Panahi, O.; Ezzati, A. AI in dental medicine: Current applications and future directions. Open Access J. Clin. Images 2025, 2, 1–5.
40. Salloum, S.A.; Alomari, K.M.; Alfaisal, A.M.; Aljanada, R.A.; Basiouni, A. Emotion recognition for enhanced learning: Using AI to detect students’ emotions and adjust teaching methods. Smart Learn. Environ. 2025, 12, 21.
41. Telceken, M.; Akgun, D.; Kacar, S.; Yesin, K.; Yıldız, M. Can artificial intelligence understand our emotions? Deep learning applications with face recognition. Curr. Psychol. 2025, 44, 7946–7956.
42. Lagorio, A.; Di Pasquale, V.; Cimini, C.; Miranda, S.; Pinto, R. Augmented reality in logistics 4.0: Implications for the human work. IFAC-PapersOnLine 2022, 55, 329–334.
43. Stockinger, C.; Steinebach, T.; Petrat, D.; Bruns, R.; Zöller, I. The effect of pick-by-light systems on situation awareness in order picking activities. Procedia Manuf. 2020, 45, 96–101.
44. Sharma, R.P.; Verma, G.K. Human–computer interaction using hand gesture. Procedia Comput. Sci. 2015, 54, 721–727.
45. Yang, J.; Liu, Y.; Morgan, P.L. Human–machine interaction towards Industry 5.0: Human-centric smart manufacturing. Digit. Eng. 2024, 2, 100013.
46. Dritsas, E.; Trigka, M.; Troussas, C.; Mylonas, P. Multimodal Interaction, Interfaces, and Communication: A Survey. Multimodal Technol. Interact. 2025, 9, 6.
47. Saha, N.; Gadow, V.; Harik, R. Emerging Technologies in Augmented Reality (AR) and Virtual Reality (VR) for Manufacturing Applications: A Comprehensive Review. J. Manuf. Mater. Process. 2025, 9, 297.
48. Alhussen, A.; Ansari, A.S.; Mohammadi, M.S. Enhancing user experience in AI-powered human–computer communication with vocal emotions identification using a novel deep learning method. Comput. Mater. Contin. 2025, 82, 2909–2929.
49. Velagaleti, S.B.; Choukaier, D.; Nuthakki, R.; Lamba, V.; Sharma, V.; Rahul, S. Empathetic algorithms: The role of AI in understanding and enhancing human emotional intelligence. J. Electr. Syst. 2024, 20, 2051–2060.
50. Huang, M.-H.; Rust, R.T. The GenAI future of consumer research. J. Consum. Res. 2025, 52, 4–17.
51. Wirtz, J.; Stock-Homburg, R. Generative AI meets service robots: The promise of LLMs, LBMs, and agentic AI in physical service encounters. J. Serv. Res. 2025, 28, 527–543.
52. Meshram, S.; Naik, N.; More, T.; Kharche, S. Conversational AI: Chatbots. In Proceedings of the International Conference on Intelligent Technologies (CONIT 2021), Hubli, India, 14–16 May 2021; pp. 1–6.
53. Hu, P.; Gong, Y.; Lu, Y.; Ding, A.W. Speaking vs. listening? Balance conversation attributes of voice assistants for better voice marketing. Int. J. Res. Mark. 2023, 40, 109–127.
54. Park, D.; Kim, E. Method of interacting between humans and conversational voice agent systems. Heliyon 2024, 10, e23573.
55. Blut, M.; Wünderlich, N.V.; Brock, C. Facilitating retail customers’ use of AI-based virtual assistants: A meta-analysis. J. Retail. 2024, 100, 293–315.
56. Guha, A.; Grewal, D.; Kopalle, P.K.; Haenlein, M.; Schneider, M.J.; Jung, H.; Moustafa, R.; Hegde, D.R.; Hawkins, G. How artificial intelligence will affect the future of retailing. J. Retail. 2021, 97, 28–41.
57. Loeffler, C.M.L.; Muti, H.; Kather, J.; Truhn, D. Bridging communication gaps: The role of voice-enabled AI in medicine. ESMO Real World Data Digit. Oncol. 2025, 8, 100138.
58. Muddaloor, P.; Baraskar, B.; Shah, H.; Gopalakrishnan, K.; Sood, D.; Pasupuleti, P.C.; Singh, A.; Mitra, D.; Hoskote, S.S.; Iyer, V.N.; et al. The human voice as a digital health solution leveraging artificial intelligence. Sensors 2025, 25, 3424.
59. Chirita, R.; Ciobanescu, S.A.; Ungureanu, C.; Sbircea, I. Impact of artificial intelligence in the automotive industry. FAIMA Bus. Manag. J. 2025, 13, 49–58.
60. Alghlayini, S.; Deriche, M. A personalized smart home control system for the elderly and people with disabilities using Arabic voice commands. In Proceedings of the IEEE 22nd International Multi-Conference on Systems, Signals & Devices (SSD 2025), Monastir, Tunisia, 17–20 February 2025; pp. 1346–1350.
61. Ludwig, H.; Schmidt, T.; Kühn, M. Voice user interfaces in manufacturing logistics: A literature review. Int. J. Speech Technol. 2023, 26, 627–639.
62. Dujmešić, N.; Bajor, I.; Rožić, T. Warehouse processes improvement by pick by voice technology. Teh. Vjesn. 2018, 25, 1227–1233.
63. Li, Y.; Huang, J.; Tian, F.; Wang, H.-A.; Dai, G.-Z. Gesture interaction in virtual reality. Virtual Real. Intell. Hardw. 2019, 1, 84–112.
64. Jaimes, A.; Sebe, N. Multimodal human–computer interaction: A survey. Comput. Vis. Image Underst. 2007, 108, 116–134.
65. Cohen, P.R.; McGee, D.R. Tangible multimodal interfaces for safety-critical applications. Commun. ACM 2004, 47, 41–46.
66. Gonsher, I. Beyond the keyboard, mouse, and screen: New paradigms in interface design. In Proceedings of the Future Technologies Conference (FTC 2021); Arai, K., Ed.; Lecture Notes in Networks and Systems; Springer: Cham, Switzerland, 2022; Volume 358, pp. 115–125.
67. Wang, Z.; Chen, M.; Liu, Q. A review on multimodal communications for human–robot collaboration in 5G: From visual to tactile. Intell. Robot. 2025, 5, 579–606.
68. Chojecki, P.; Strazdas, D.; Przewozny, D.; Gard, N.; Runde, D.; Hoerner, N.; Al-Hamadi, A.; Eisert, P.; Bosse, S. Assessing the value of multimodal interfaces: A study on human–machine interaction in weld inspection workstations. Sensors 2023, 23, 5043.
69. Wu, D.; Zheng, P.; Zhao, Q.; Zhang, S.; Qi, J.; Hu, J.; Zhu, G.-N.; Wang, L. Empowering natural human–robot collaboration through multimodal language models and spatial intelligence: Pathways and perspectives. Robot. Comput.-Integr. Manuf. 2026, 97, 103064.
70. Bolbakov, R.G.; Morgunov, V.S.; Solovyev, I.V.; Tsvetkov, V.Y. Methods of comparative analysis. J. Phys. Conf. Ser. 2020, 1679, 052047.
71. Mills, M.; Van de Bunt, G.G.; De Bruijn, J. Comparative research: Persistent problems and promising solutions. Int. Sociol. 2006, 21, 619–631.
72. Kondratenko, Y.P.; Klymenko, L.P.; Sidenko, I.V. Comparative analysis of evaluation algorithms for decision-making in transport logistics. In Advance Trends in Soft Computing: Proceedings of WCSC 2013; Springer: Heidelberg, Germany, 2013; Volume 312, pp. 203–216.
73. Ohakwe, C.R.; Wu, J. The impact of macroeconomic indicators on logistics performance: A comparative analysis using simulated scenarios. Sustain. Futures 2025, 9, 100567.
74. Özcan, T.; Çelebi, N.; Esnaf, Ş. Comparative analysis of multi-criteria decision making methodologies and implementation of a warehouse location selection problem. Expert Syst. Appl. 2011, 38, 9773–9779.
75. Shah, S.; Suraj, D.; Reza, S.M.; Salam, M.A.R.B.A.; Ashraf, A.; Ferdous, S.F. Comparative analysis of deep learning models for defect detection in additive manufacturing using thermal imaging. Results Eng. 2025, 28, 108359.
76. Sa’ei, A. Comparative Research Method: Quantitative, Historical and Fuzzy Analysis; Samt Publications: Tehran, Iran, 2013; pp. 10–50.
77. Øvretveit, J. Comparative and Cross-Cultural Health Research: A Practical Guide; Radcliffe Medical Press: Oxford, UK, 1998; pp. 1–187.
78. Livingstone, S. On the challenges of cross-national comparative media research. Eur. J. Commun. 2003, 18, 477–500.
79. Gharawi, M.A.; Pardo, T.A.; Guerrero, S. Issues and strategies for conducting cross-national e-government comparative research. In Proceedings of the 3rd International Conference on Theory and Practice of Electronic Governance, Bogota, Colombia, 10–13 November 2009; pp. 163–170.
80. Dean, M. Multi-criteria analysis. In Advances in Transport Policy and Planning; Mouter, N., Ed.; Academic Press: Cambridge, MA, USA, 2020; Volume 6, pp. 165–224.
81. Triantaphyllou, E.; Mann, H. An examination of the effectiveness of multi-dimensional decision-making methods: A decision-making paradox. Decis. Support Syst. 1989, 5, 303–312.
82. Watróbski, J.; Jankowski, J.; Ziemba, P.; Karczmarczyk, A.; Ziolo, M. Generalised framework for multi-criteria method selection. Omega 2018, 86, 107–124.
83. Jahan, A.; Edwards, K.L. A state-of-the-art survey on the influence of normalization techniques in ranking: Improving the materials selection process in engineering design. Mater. Des. 2015, 65, 335–342.
84. Straka, M. Logistika Distribúcie. Ako Efektívne Dostať Výrobok Na Trh; Epos: Bratislava, Slovakia, 2013; pp. 1–399.
85. Vafaei, N.; Ribeiro, R.A.; Camarinha-Matos, L.M. Assessing normalization techniques for simple additive weighting method. Procedia Comput. Sci. 2022, 199, 1229–1236.
86. Stoltz, M.-H.; Giannikas, V.; McFarlane, D.; Strachan, J.; Um, J.; Srinivasan, R. Augmented reality in warehouse operations: Opportunities and challenges. IFAC-PapersOnLine 2017, 50, 12979–12984.
87. Egger, J.; Masood, T. Augmented reality in support of intelligent manufacturing: A systematic literature review. Comput. Ind. Eng. 2020, 140, 106195.
88. Jaghbeer, Y.; Hanson, R.; Johansson, M.I. Automated order picking systems and the links between design and performance: A systematic literature review. Int. J. Prod. Res. 2020, 58, 4489–4505.
89. Ziaee, O.; Hamedi, M. Augmented reality applications in manufacturing and its future scope in Industry 4.0. arXiv 2021, arXiv:2112.11190.
90. Wang, T.; Zheng, P.; Li, S.; Wang, L. Multimodal human–robot interaction for human-centric smart manufacturing: A survey. Adv. Intell. Syst. 2023, 5, 2300359.
91. Lee, H.; Jiang, N.; Samuel, S. Detection of error in static and dynamic visual stimulation via electroencephalogram and eye-tracking systems. Eng. Appl. Artif. Intell. 2025, 159, 111688.
92. Taban, R.A.; Croock, M.S. Eye tracking based directional control system using mobile applications. Int. J. Comput. Digit. Syst. 2018, 7, 365–374.
93. Jo, H. Interaction, novelty, voice, and discomfort in the use of artificial intelligence voice assistant. Univers. Access Inf. Soc. 2025, 24, 2419–2432.
94. De Carvalho, D.; Hoffmann, K.; Nunes Filho, J.R.; Baptistella, A.R. Enhancing mechanical ventilation management with AI: Computer vision for automated detection of ventilatory modes, parameters and asynchrony. J. Crit. Care 2026, 91, 155238.
95. Hamdani, R.; Chihi, I. Adaptive human-computer interaction for industry 5.0: A novel concept, with comprehensive review and empirical validation. Comput. Ind. 2025, 168, 104268.
96. Centre for Sustainable Human-Machine Interaction in Eco-Innovative Manufacturing. Centra Doskonałości Naukowej i Technologicznej Uniwersytetu Zielonogórskiego. 2024. Available online: https://cdnit.uz.zgora.pl/en/centre-for-sustainable-human-machine-interaction-in-eco-innovative-manufacturing/ (accessed on 26 February 2026).



| HCI Type | Modality | Interaction Form | Source |
|---|---|---|---|
| Traditional | Visual | Manual input devices (handheld scanners) and their comparison with AR in parcel sorting processes. | Stoltz et al., 2017 [86] |
| Traditional | Visual | Pick-to-light visual signaling for item pick confirmation in warehouses. | Jaghbeer et al., 2020 [88] |
| Traditional | Visual | Directional eye-tracking control using a mobile camera for movement control of mobile objects. | Taban & Croock, 2018 [92] |
| Traditional | Voice/Audio | Voice confirmation of picks in automated parts-to-picker warehouse systems. | Jaghbeer et al., 2020 [88] |
| Traditional | Multimodal | Model-based adaptive user interfaces (UIs) adapting to context and feedback. | Hussain et al., 2018 [32] |
| Traditional | Multimodal | Physical interface (BCI headband) for EEG signal capture in motor imagery. | Garcia-Moreno et al., 2020 [33] |
| Traditional | Multimodal | Monitoring and control of fully automated arms (human as supervisor). | Jaghbeer et al., 2020 [88] |
| AI-based | Visual | Intelligent AR applications with automatic inspection and real-time digital data integration. | Stoltz et al., 2017; Ziaee & Hamedi, 2021 [86,89] |
| AI-based | Visual | Augmented reality (AR) in Industry 4.0 focused on reducing cognitive load. | Egger & Masood, 2020; Ziaee & Hamedi, 2021 [87,89] |
| AI-based | Visual | AI cameras for item verification and computer vision for personnel detection near robots (AGV/AMR). | Jaghbeer et al., 2020 [88] |
| AI-based | Visual | Computer vision for automatic analysis of ventilators without integration. | De Carvalho et al., 2026 [94] |
| AI-based | Voice/Audio | Optimization and factors for long-term use of intelligent AI voice assistants. | Jo, 2025 [93] |
| AI-based | Voice/Audio | Automatic vocal emotion identification using deep learning models for empathetic interaction. | Alhussen et al., 2025 [48] |
| AI-based | Multimodal | Human-centric robotics with mode redundancy (voice + gestures) for smart manufacturing. | Wang et al., 2023 [90] |
| AI-based | Multimodal | AI models for adaptive interpretation of EEG and eye-tracking data to detect error states. | Lee et al., 2025 [91] |
| AI-based | Multimodal | Empathetic algorithms using NLP and computer vision for autonomous emotion analysis. | Velagaleti et al., 2024 [49] |
| Aspect | Traditional HCI | AI-Based HCI |
|---|---|---|
| System Adaptation | Manual parameter and sensor setup for data collection. | Adaptive interface that autonomously adjusts to the user. |
| Interaction | Requirement for direct physical contact (keyboard, mouse, scanner). | Contactless interaction leveraging biometrics, facial, or posture recognition. |
| Cognitive Load | User adapts to the system’s logic and structure. | System adapts to the user’s cognitive states and needs. |
| Control Complexity | High dependence on operator skill level and experience. | Simplified control is accessible even to less experienced users. |
| Output Presentation | Static information display (fixed menus, icons, windows). | Dynamic, context-sensitive visualizations and adaptive UI. |
| Interface Learning | Time-intensive training and mastery of manual procedures. | Intuitive interaction requires minimal prior training. |
| Criterion | A | B | C | D | E | F | G | H | Frequency of Occurrences | Normalized Weight |
|---|---|---|---|---|---|---|---|---|---|---|
| A | - | B | C | A | E | A | G | H | 2 | 0.071 |
| B | - | - | C | B | E | B | G | H | 3 | 0.107 |
| C | - | - | - | C | C | C | C | H | 6 | 0.215 |
| D | - | - | - | - | D | D | G | H | 2 | 0.071 |
| E | - | - | - | - | - | E | G | H | 3 | 0.107 |
| F | - | - | - | - | - | - | F | H | 1 | 0.036 |
| G | - | - | - | - | - | - | - | H | 4 | 0.143 |
| H | - | - | - | - | - | - | - | - | 7 | 0.250 |
| Sum | | | | | | | | | 28 | 1 |
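
The frequency and weight columns of the pairwise comparison table above can be recomputed with a few lines of Python. This is a minimal sketch, not the authors' tooling: the `triangle` encoding and all variable names are assumptions introduced here, and the preferences are transcribed row by row from the table.

```python
from collections import Counter

criteria = "ABCDEFGH"

# Upper triangle of the pairwise comparison, transcribed from the table:
# for each row criterion, the string lists the preferred criterion in each
# comparison against the criteria that follow it.
triangle = {
    "A": "BCAEAGH",
    "B": "CBEBGH",
    "C": "CCCCH",
    "D": "DDGH",
    "E": "EGH",
    "F": "FH",
    "G": "H",
}

# Count how often each criterion is preferred (frequency of occurrences).
wins = Counter({c: 0 for c in criteria})
for preferred in triangle.values():
    wins.update(preferred)

# Normalize by the number of comparisons, n * (n - 1) / 2 = 28 for n = 8.
total = sum(wins.values())
weights = {c: wins[c] / total for c in criteria}

for c in criteria:
    print(c, wins[c], round(weights[c], 3))
```

Under this encoding, criterion C receives 6/28, which rounds to 0.214; the table reports 0.215, possibly a rounding adjustment so that the weight column sums to exactly 1.
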
| Criterion | Expert 1 Evaluation | Expert 2 Evaluation | Expert 3 Evaluation | Weight wj |
|---|---|---|---|---|
| A | 0.071 | 0.167 | 0.056 | 0.098 |
| B | 0.107 | 0.194 | 0.139 | 0.147 |
| C | 0.215 | 0.139 | 0.167 | 0.174 |
| D | 0.071 | 0.083 | 0.083 | 0.079 |
| E | 0.107 | 0.056 | 0.028 | 0.064 |
| F | 0.036 | 0.028 | 0.138 | 0.067 |
| G | 0.143 | 0.111 | 0.167 | 0.140 |
| H | 0.250 | 0.222 | 0.222 | 0.231 |
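
The final column of the table above is consistent with the arithmetic mean of the three expert columns. The sketch below checks this under that assumption; the aggregation rule is inferred from the numbers rather than quoted from the method description, and the variable names are illustrative.

```python
# Expert weights per criterion, transcribed from the table above.
expert_weights = {
    "A": (0.071, 0.167, 0.056),
    "B": (0.107, 0.194, 0.139),
    "C": (0.215, 0.139, 0.167),
    "D": (0.071, 0.083, 0.083),
    "E": (0.107, 0.056, 0.028),
    "F": (0.036, 0.028, 0.138),
    "G": (0.143, 0.111, 0.167),
    "H": (0.250, 0.222, 0.222),
}

# Assumed aggregation: arithmetic mean over experts, rounded to 3 decimals.
mean_weights = {
    criterion: round(sum(values) / len(values), 3)
    for criterion, values in expert_weights.items()
}

for criterion, w in mean_weights.items():
    print(criterion, w)
print("sum of weights:", round(sum(mean_weights.values()), 3))
```
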
| Criterion A | Visual | Voice | Multimodal | Frequency of Occurrences | uij |
|---|---|---|---|---|---|
| Visual | - | Visual | Visual | 2 | 0.667 |
| Voice | - | - | Multimodal | 0 | 0 |
| Multimodal | - | - | - | 1 | 0.333 |
| Sum | | | | 3 | 1 |
| Criterion B | Visual | Voice | Multimodal | | |
| Visual | - | Visual | Multimodal | 1 | 0.333 |
| Voice | - | - | Multimodal | 0 | 0 |
| Multimodal | - | - | - | 2 | 0.667 |
| Sum | | | | 3 | 1 |
| Criterion C | Visual | Voice | Multimodal | | |
| Visual | - | Voice | Multimodal | 0 | 0 |
| Voice | - | - | Multimodal | 1 | 0.333 |
| Multimodal | - | - | - | 2 | 0.667 |
| Sum | | | | 3 | 1 |
| Criterion D | Visual | Voice | Multimodal | | |
| Visual | - | Voice | Multimodal | 0 | 0 |
| Voice | - | - | Multimodal | 1 | 0.333 |
| Multimodal | - | - | - | 2 | 0.667 |
| Sum | | | | 3 | 1 |
| Criterion E | Visual | Voice | Multimodal | | |
| Visual | - | Voice | Visual | 1 | 0.333 |
| Voice | - | - | Voice | 2 | 0.667 |
| Multimodal | - | - | - | 0 | 0 |
| Sum | | | | 3 | 1 |
| Criterion F | Visual | Voice | Multimodal | | |
| Visual | - | Visual | Visual | 2 | 0.667 |
| Voice | - | - | Voice | 1 | 0.333 |
| Multimodal | - | - | - | 0 | 0 |
| Sum | | | | 3 | 1 |
| Criterion G | Visual | Voice | Multimodal | | |
| Visual | - | Visual | Visual | 2 | 0.667 |
| Voice | - | - | Voice | 1 | 0.333 |
| Multimodal | - | - | - | 0 | 0 |
| Sum | | | | 3 | 1 |
| Criterion H | Visual | Voice | Multimodal | | |
| Visual | - | Visual | Multimodal | 1 | 0.333 |
| Voice | - | - | Multimodal | 0 | 0 |
| Multimodal | - | - | - | 2 | 0.667 |
| Sum | | | | 3 | 1 |
| Criterion | Weight wj | Visual Interaction uij | Visual Interaction wj × uij | Voice Interaction uij | Voice Interaction wj × uij | Multimodal Interaction uij | Multimodal Interaction wj × uij |
|---|---|---|---|---|---|---|---|
| A | 0.098 | 0.667 | 0.065366 | 0 | 0 | 0.333 | 0.032634 |
| B | 0.147 | 0.333 | 0.048951 | 0 | 0 | 0.667 | 0.098049 |
| C | 0.174 | 0 | 0 | 0.333 | 0.057942 | 0.667 | 0.116058 |
| D | 0.079 | 0 | 0 | 0.333 | 0.026307 | 0.667 | 0.052693 |
| E | 0.064 | 0.333 | 0.021312 | 0.667 | 0.042688 | 0 | 0 |
| F | 0.067 | 0.667 | 0.044689 | 0.333 | 0.022311 | 0 | 0 |
| G | 0.14 | 0.667 | 0.09338 | 0.333 | 0.04662 | 0 | 0 |
| H | 0.231 | 0.333 | 0.076923 | 0 | 0 | 0.667 | 0.154077 |
| Sum | 1 | A1 | 0.3506 | A2 | 0.1959 | A3 | 0.4535 |
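
The totals A1, A2, and A3 in the table above follow from the weighted sum of the suitability values uij with the criterion weights wj. The sketch below reproduces that arithmetic from the tabulated inputs; the function and variable names are illustrative assumptions, not taken from the study.

```python
# Criterion weights w_j, transcribed from the table above.
weights = {
    "A": 0.098, "B": 0.147, "C": 0.174, "D": 0.079,
    "E": 0.064, "F": 0.067, "G": 0.140, "H": 0.231,
}

# Suitability values u_ij of each modality for criteria A..H.
suitability = {
    "Visual":     dict(zip("ABCDEFGH", (0.667, 0.333, 0, 0, 0.333, 0.667, 0.667, 0.333))),
    "Voice":      dict(zip("ABCDEFGH", (0, 0, 0.333, 0.333, 0.667, 0.333, 0.333, 0))),
    "Multimodal": dict(zip("ABCDEFGH", (0.333, 0.667, 0.667, 0.667, 0, 0, 0, 0.667))),
}


def weighted_sum(w, u):
    """Score of one alternative: sum over criteria of w_j * u_ij."""
    return sum(w[c] * u[c] for c in w)


scores = {name: weighted_sum(weights, u) for name, u in suitability.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.4f}")
```

Rounded to four decimals, the computed scores agree with the Sum row of the table, and the resulting order is Multimodal, Visual, Voice.
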
| Scenario | Outcome | Modalities Sequence |
|---|---|---|
| Significant reduction in weight H (H → 0) | A3 loses value (0.2994); A1 approaches the value of A3 (0.2737). | A3 > A1 > A2 |
| Reduction in weights H and C by 50% | A3 loses value (0.3184); A1 approaches the value of A3 (0.3122). | A3 > A1 > A2 |
| Increase in weight G by 50% | A3 does not change its value; A1 approaches the value of A3 (0.3973). | A3 > A1 > A2 |
| Increase in weights A and G by 50% | A1 is favored (0.3980); A3 is disadvantaged (0.9530). | A1 > A3 > A2 |
| Increase in weights of criteria (C, D, E) by 100% | A3 remains the winner (0.6222); A2 (0.3228) approaches A1 (0.3719). | A3 > A1 > A2 |
| Extreme scenario (all weights equal = 0.125) | A1 would significantly benefit (0.3750); A3 would lose dominance (0.3751). | A3 > A1 > A2 |
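
Several of the sensitivity scenarios above can be re-derived by rescaling individual weights and recomputing the weighted sums. The sketch below assumes that the modified weights are not renormalized to sum to 1; this assumption is inferred from the reported values rather than stated in the table, and the helper names `scaled` and `scores` are hypothetical.

```python
# Baseline weights w_j and suitability values u_ij from the preceding tables.
WEIGHTS = {
    "A": 0.098, "B": 0.147, "C": 0.174, "D": 0.079,
    "E": 0.064, "F": 0.067, "G": 0.140, "H": 0.231,
}
U = {
    "A1 Visual":     dict(zip("ABCDEFGH", (0.667, 0.333, 0, 0, 0.333, 0.667, 0.667, 0.333))),
    "A2 Voice":      dict(zip("ABCDEFGH", (0, 0, 0.333, 0.333, 0.667, 0.333, 0.333, 0))),
    "A3 Multimodal": dict(zip("ABCDEFGH", (0.333, 0.667, 0.667, 0.667, 0, 0, 0, 0.667))),
}


def scaled(weights, factors):
    """Copy of weights with selected criteria multiplied by a factor (no renormalization)."""
    return {c: w * factors.get(c, 1.0) for c, w in weights.items()}


def scores(weights):
    """Weighted-sum score of every alternative under the given weights."""
    return {alt: sum(weights[c] * u[c] for c in weights) for alt, u in U.items()}


scenarios = {
    "H -> 0": scaled(WEIGHTS, {"H": 0.0}),
    "H and C -50%": scaled(WEIGHTS, {"H": 0.5, "C": 0.5}),
    "G +50%": scaled(WEIGHTS, {"G": 1.5}),
    "C, D, E +100%": scaled(WEIGHTS, {"C": 2.0, "D": 2.0, "E": 2.0}),
    "all weights 0.125": {c: 0.125 for c in WEIGHTS},
}

results = {name: scores(w) for name, w in scenarios.items()}
for name, s in results.items():
    order = " > ".join(sorted(s, key=s.get, reverse=True))
    values = ", ".join(f"{alt}={v:.4f}" for alt, v in s.items())
    print(f"{name}: {values} | {order}")
```

For these scenarios the recomputed scores match the tabulated values to within rounding in the last digit. In the equal-weight case A3 and A1 differ only through the rounding of uij to three decimals, so that ordering should be read as a near tie.
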
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Muchova, P.; Saderova, J.; Ondov, M. Comparison of AI-Based HCI Modalities for Selecting Interaction Systems in Sustainable Manufacturing. Sustainability 2026, 18, 4638. https://doi.org/10.3390/su18104638

