Editorial

From Human–Machine Interaction to Human–Machine Cooperation: Status and Progress

by Tomislav Stipancic 1,* and Duska Rosenberg 2
1 Faculty of Mechanical Engineering and Naval Architecture, University of Zagreb, Ivana Lucica 5, 10000 Zagreb, Croatia
2 iCOM Research, Royal Holloway, University of London, 11 Bedford Square, London WC1B 3DP, UK
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(17), 9475; https://doi.org/10.3390/app15179475
Submission received: 25 August 2025 / Accepted: 26 August 2025 / Published: 29 August 2025

1. Introduction

Advances in artificial intelligence (AI) and cyber–physical systems are transforming human–machine interaction (HMI) into human–machine cooperation (HMC), where humans and machines collaborate, adapt, and share goals within virtual, augmented, or real environments. This Special Issue, titled “From Human–Machine Interaction to Human–Machine Cooperation: Status and Progress”, was conceived to capture this shift, bringing together contributions on multimodal interaction, cooperative robotics, affective systems, and emerging applications in domains such as the Internet of Things (IoT), the Metaverse, and extended reality (XR).
HMI addresses how people and automated systems communicate and cooperate in virtual, augmented, and real environments [1]. With the rise of AI and cyber–physical systems, the research focus has shifted from basic interaction toward more advanced cooperation [2]. This transition underpins the development of (1) collaborative, social, and industrial robots; (2) bioinspired and digital systems; and (3) devices for the IoT, the Metaverse, and beyond [3]. Research in this area is inherently interdisciplinary, combining insights from robotics, computer science, psychology, and engineering. It requires innovations in behavior modeling, task and motion planning, learning, activity recognition, intention prediction, multimodal interaction, and affective systems [4].
Within this context, human–robot collaboration (HRC) has emerged as a key paradigm for Industry 5.0 [5]. Efficient and safe HRC relies heavily on sensors, with machine vision playing a central role in contextual modeling [6]. Context-awareness, the capacity to use information to characterize an entity’s situation, enables flexible production lines that dynamically adapt to shifting requirements [7,8,9]. For example, hybrid industrial assembly stations monitor human behavior to predict collaboration needs [10], while probabilistic models predict human motion and intention in real time, improving both interaction safety and fluency [11]. Collision-free collaboration frameworks extend this approach by exploiting sensor data to predict hazards and adapt robot trajectories accordingly [12].
The integration of AI and machine learning further enhances HRC by supporting real-time decision-making [13]. Advanced reviews highlight the importance of combining sensors with AI algorithms to create intelligent and adaptive collaboration systems [14]. Digital twins are emerging as powerful tools for predictive analytics, providing real-time feedback for safer and more efficient environments [15]. Still, research gaps remain, particularly in handling complex multimodal data and improving robustness under uncertainty.
Datasets are essential for training reliable recognition models. General-purpose collections, such as Something-Something [16] or EPIC-KITCHENS [17], focus on everyday activities but lack multimodal inputs suited for industrial contexts. By contrast, specialized datasets such as InHARD [18] provide RGB-D and skeletal motion data, directly supporting the recognition of industrial human actions. These enable the development of robust models for safety-critical tasks, including tool handling and collaborative assembly.
Recent methods apply deep learning, including graph convolutional networks [19], LSTMs [20], transfer learning [21], and hierarchical networks [22]. More recently, transformers leveraging self-attention have shown strong potential for robust and adaptive action recognition [23,24,25,26].
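To make the central mechanism of these transformer-based methods concrete, the following minimal sketch computes scaled dot-product self-attention over a short sequence of pose feature vectors. All shapes and values are illustrative assumptions for exposition, not taken from the cited works; a real model would also learn query, key, and value projection matrices, which are omitted here for brevity.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # pairwise similarity between time steps
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

# Toy sequence: 4 time steps of 8-dimensional pose features.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))

# In self-attention, queries, keys, and values all derive from the same
# sequence; identity projections are used here for simplicity.
context, attn = scaled_dot_product_attention(X, X, X)
print(context.shape)       # (4, 8): each step becomes a weighted mix of all steps
print(attn.sum(axis=-1))   # each row of attention weights sums to 1
```

Because every output step attends to every input step, the model can relate distant phases of an action (e.g., reaching and placing) in a single layer, which is one reason self-attention suits action recognition.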

2. An Overview of Published Articles

The studies collected in this Special Issue reflect several important developments in the field. There is an increasing emphasis on integrating AI-driven perception with decision-making to create systems that respond proactively rather than reactively. Human-centric design principles are being adopted to ensure usability, trust, and adaptability in real-world deployments, while multimodal approaches that combine data from diverse sources are proving essential for achieving robust and context-aware cooperation. The breadth of topics, from deep learning and biosignal processing to industrial planning and social-scientific analyses, underscores the interdisciplinary nature of human–machine cooperation and the need for cross-domain collaboration.
This first edition comprises six high-quality papers spanning technical, applied, and analytical perspectives. The first paper investigates deep neural networks for detecting selected object classes in digital images, comparing supervised, self-supervised, and transfer learning methods. The results identify YOLOv8 with transfer learning as the most effective configuration, offering practical insights for image-based recognition systems in diverse domains [27].
The second contribution addresses the challenge of detecting subtle variations in children’s breath sounds, which are often missed by caregivers. Using clinical data from patients, the authors propose an AI-based diagnostic platform that integrates advanced signal processing and multiple classification algorithms, enabling real-time respiratory condition assessment and holding strong potential for early diagnosis and cost reduction in healthcare [28].
The third paper proposes a simulation-based framework to holistically assess performance in algorithmic decision-making environments, where human and AI rationalities intersect. Using a bike-sharing case study with New York Citi Bike data, the study demonstrates how misalignments between incentives and operational needs can arise, showing the value of simulation in preemptively identifying inefficiencies [29].
The fourth contribution focuses on EEG-based emotion recognition, evaluating features across time, frequency, time–frequency, and spatial domains. The authors demonstrate that hybrid feature sets combined with simple classifiers achieve high accuracy, with the best result obtained using an artificial neural network and four-domain hybrid features [30].
The fifth paper presents the Integrated Multilevel Planning Solution (IMPS), a human-centric Industry 4.0/5.0 planning tool for SMEs engaged in design-to-order manufacturing. By integrating multiple software platforms into a cohesive architecture, the IMPS enables multivariant, multiuser planning without major system overhauls, addressing common barriers to digital transformation such as resistance to change and resource limitations [31].
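The hybrid-feature idea behind the EEG study above can be sketched in a few lines: combine a time-domain feature (signal variance) with a frequency-domain feature (alpha-band power) and feed them to a very simple classifier. The sampling rate, feature choices, nearest-centroid rule, and synthetic signals below are illustrative assumptions, not the pipeline of the cited work.

```python
import numpy as np

FS = 128  # assumed sampling rate in Hz

def features(sig):
    """Hybrid feature vector: time-domain variance + alpha-band (8-13 Hz) power."""
    spec = np.abs(np.fft.rfft(sig)) ** 2
    freqs = np.fft.rfftfreq(sig.size, d=1 / FS)
    alpha = spec[(freqs >= 8) & (freqs <= 13)].sum()
    return np.array([sig.var(), alpha])

def nearest_centroid(train_X, train_y, x):
    """Classify x by the closest class-mean feature vector."""
    centroids = {c: train_X[train_y == c].mean(axis=0) for c in np.unique(train_y)}
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# Synthetic two-class data: class 1 carries a strong 10 Hz (alpha) oscillation.
rng = np.random.default_rng(1)
t = np.arange(FS) / FS

def make(label):
    noise = rng.standard_normal(FS) * 0.5
    return noise + (np.sin(2 * np.pi * 10 * t) if label == 1 else 0.0)

y = np.array([0, 0, 0, 1, 1, 1])
X = np.array([features(make(c)) for c in y])
print(nearest_centroid(X, y, features(make(1))))  # expected: 1
```

Even this toy pipeline separates the classes because the alpha-band power of the oscillatory class dwarfs that of pure noise, mirroring the study's finding that well-chosen hybrid features let simple classifiers perform well.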
The final paper offers a longitudinal content analysis of AI research in communication scholarship from 2006 to 2022. The findings reveal a steady growth in publications and emphasize the need for evolving theoretical frameworks to address the cultural, political, and societal implications of AI [32].

3. Conclusions

The first edition of this Special Issue demonstrates that the transition from interaction to cooperation is not merely incremental but represents a fundamental rethinking of how humans and machines can operate together. The contributions presented here mark significant progress in the field while also highlighting challenges that remain to be addressed. Looking ahead, recent advances in AI, particularly in the areas of agentic AI and foundation models, are poised to further transform human–robot interaction. Agentic AI introduces autonomous reasoning and goal-driven behavior, enabling robots to make context-aware decisions and coordinate complex tasks over extended periods without continuous human oversight. Foundation models, with their broad multimodal capabilities, provide powerful new tools for perception, language understanding, and adaptive dialog, supporting richer and more natural cooperation between humans and machines. The integration of these technologies into HMC systems will be a major driver of the next generation of adaptive, trustworthy, and reality-agnostic collaborative systems. Building on the success of the first edition, the second edition will continue to explore theoretical, methodological, and applied advances in this evolving field, and we warmly invite the research community to contribute to its ongoing development.

Author Contributions

T.S.: writing—original draft preparation; T.S. and D.R.: writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

ChatGPT 5 (OpenAI, San Francisco, CA, USA) was used during the preparation of this closing editorial to assist with English proofreading and reference format checking. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ren, M.; Chen, N.; Qiu, H. Human–machine collaborative decision-making: An evolutionary roadmap based on cognitive intelligence. Int. J. Soc. Robot. 2023, 15, 1101–1114. [Google Scholar] [CrossRef]
  2. Semeraro, F.; Griffiths, A.; Cangelosi, A. Human–robot collaboration and machine learning: A systematic review of recent research. Rob. Comput.-Integr. Manuf. 2023, 79, 102432. [Google Scholar] [CrossRef]
  3. Lodhi, S.K.; Zeb, S. AI-driven robotics and automation: The evolution of human–machine collaboration. J. World Sci. 2025, 4, 422–437. [Google Scholar] [CrossRef]
  4. Vijay, R.; Kumar, A. The collaboration between humans and machines. In Advanced Digital Technologies in Financial and Business Management; Apple Academic Press: Waretown, NJ, USA, 2025; pp. 275–290. [Google Scholar]
  5. Pizoń, J.; Gola, A. Human–machine relationship—Perspective and future roadmap for Industry 5.0 solutions. Machines 2023, 11, 203. [Google Scholar] [CrossRef]
  6. Zhang, Z.; Wang, H.; Geng, J.; Jiang, W.; Deng, X.; Miao, W. An information fusion method based on deep learning and fuzzy discount-weighting for target intention recognition. Eng. Appl. Artif. Intell. 2022, 109, 104610. [Google Scholar] [CrossRef]
  7. Krumm, J. (Ed.) Ubiquitous Computing Fundamentals; CRC Press: Boca Raton, FL, USA, 2018. [Google Scholar] [CrossRef]
  8. Gomez Cubero, C.; Rehm, M. Intention recognition in human–robot interaction based on eye tracking. In Proceedings of the IFIP Conference on Human–Computer Interaction (INTERACT 2021), Bari, Italy, 30 August–3 September 2021; Springer: Cham, Switzerland, 2021; pp. 428–437. [Google Scholar] [CrossRef]
  9. Lindblom, J.; Alenljung, B. The ANEMONE: Theoretical foundations for UX evaluation of action and intention recognition in human–robot interaction. Sensors 2020, 20, 4284. [Google Scholar] [CrossRef]
  10. Awais, M.; Saeed, M.Y.; Malik, M.S.A.; Younas, M.; Asif, S.R.I. Intention-based comparative analysis of human–robot interaction. IEEE Access 2020, 8, 205821–205835. [Google Scholar] [CrossRef]
  11. Fan, J.; Zheng, P.; Li, S. Vision-based holistic scene understanding towards proactive human–robot collaboration. Rob. Comput.-Integr. Manuf. 2022, 75, 102304. [Google Scholar] [CrossRef]
  12. Conte, D.; Furukawa, T. Autonomous robotic escort incorporating motion prediction and human intention. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 3480–3486. [Google Scholar] [CrossRef]
  13. Liu, H.; Wang, L. Collision-free human–robot collaboration based on context awareness. Rob. Comput.-Integr. Manuf. 2021, 67, 102022. [Google Scholar] [CrossRef]
  14. Schmitz, A. Human–robot collaboration in industrial automation: Sensors and algorithms. Sensors 2022, 22, 5848. [Google Scholar] [CrossRef] [PubMed]
  15. Bi, Z.M.; Luo, C.; Miao, Z.; Zhang, B.; Zhang, W.J.; Wang, L. Safety assurance mechanisms of collaborative robotic systems in manufacturing. Rob. Comput.-Integr. Manuf. 2021, 67, 102022. [Google Scholar] [CrossRef]
  16. Goyal, R.; Kahou, S.E.; Michalski, V.; Materzynska, J.; Westphal, S.; Kim, H.; Haenel, V.; Fruend, I.; Yianilos, P.; Mueller-Freitag, M.; et al. The “Something Something” video database for learning and evaluating visual common sense. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5843–5851. [Google Scholar] [CrossRef]
  17. Damen, D.; Doughty, H.; Farinella, G.M.; Fidler, S.; Furnari, A.; Kazakos, E.; Moltisanti, D.; Munro, J.; Perrett, T.; Price, W.; et al. Scaling egocentric vision: The EPIC-KITCHENS dataset. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 720–736. [Google Scholar] [CrossRef]
  18. Dallel, M.; Havard, V.; Baudry, D.; Savatier, X. InHARD—Industrial human action recognition dataset in the context of industrial collaborative robotics. In Proceedings of the 2020 IEEE International Conference on Human–Machine Systems (ICHMS), Rome, Italy, 7–9 September 2020; pp. 1–6. [Google Scholar] [CrossRef]
  19. Yan, S.; Xiong, Y.; Lin, D. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018), New Orleans, LA, USA, 2–7 February 2018. [Google Scholar] [CrossRef]
  20. Ullah, A.; Muhammad, K.; Del Ser, J.; Baik, S.W.; de Albuquerque, V.H.C. Activity recognition using temporal optical flow convolutional features and multilayer LSTM. IEEE Trans. Ind. Electron. 2019, 66, 9692–9702. [Google Scholar] [CrossRef]
  21. Li, S.; Fan, J.; Zheng, P.; Wang, L. Transfer learning-enabled action recognition for human–robot collaborative assembly. Procedia CIRP 2021, 104, 1795–1800. [Google Scholar] [CrossRef]
  22. Fazli, M.; Kowsari, K.; Gharavi, E.; Barnes, L.; Doryab, A. HHAR-net: Hierarchical human activity recognition using neural networks. In Proceedings of the International Conference on Intelligent Human Computer Interaction (IHCI 2020), Cham, Switzerland, 24–26 November 2020; Springer: Cham, Switzerland, 2020; pp. 48–58. [Google Scholar] [CrossRef]
  23. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar] [CrossRef]
  24. Mazzia, V.; Angarano, S.; Salvetti, F.; Angelini, F.; Chiaberge, M. Action Transformer: A self-attention model for short-time pose-based human action recognition. Pattern Recognit. 2022, 124, 107760. [Google Scholar] [CrossRef]
  25. Ul-Haq, A.; Akhtar, N.; Pogrebna, G.; Mian, A. Vision transformers for action recognition: A survey. arXiv 2022, arXiv:2209.05700. [Google Scholar] [CrossRef]
  26. Kaseris, M.; Kostavelis, I.; Malassiotis, S. A comprehensive survey on deep learning methods in human activity recognition. Mach. Learn. Knowl. Extr. 2024, 6, 842–876. [Google Scholar] [CrossRef]
  27. Babiarz, A.; Bugaj, M. Application of deep neural networks in recognition of selected types of objects in digital images. Appl. Sci. 2025, 15, 7931. [Google Scholar] [CrossRef]
  28. Liu, L.; Li, W.; Moxley, B. AI-based classification of pediatric breath sounds: Toward a tool for early respiratory screening. Appl. Sci. 2025, 15, 7145. [Google Scholar] [CrossRef]
  29. Sankaran, G.; Palomino, M.A.; Knahl, M.; Siestrup, G. Towards a system dynamics framework for human–machine learning decisions: A case study of New York Citi Bike. Appl. Sci. 2024, 14, 10647. [Google Scholar] [CrossRef]
  30. Álvarez-Jiménez, M.; Calle-Jimenez, T.; Hernández-Álvarez, M. A comprehensive evaluation of features and simple machine learning algorithms for electroencephalographic-based emotion recognition. Appl. Sci. 2024, 14, 2228. [Google Scholar] [CrossRef]
  31. Trstenjak, M.; Gregurić, P.; Janić, Ž.; Salaj, D. Integrated multilevel production planning solution according to Industry 5.0 principles. Appl. Sci. 2023, 14, 160. [Google Scholar] [CrossRef]
  32. Ertem-Eray, T.; Cheng, Y. A review of artificial intelligence research in peer-reviewed communication journals. Appl. Sci. 2025, 15, 1058. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
