Human-AI Collaborative Interaction Design: Rethinking Human-Computer Symbiosis in the Age of Intelligent Systems

Special Issue Editor


Guest Editor
School of Design, Jiangnan University, Wuxi 214122, China
Interests: human-computer interaction; AI-assisted design; user perception and preference; design research methods; smart services and system usability; digital literacy and ethics

Special Issue Information

Dear Colleagues,

With the rapid advancement of generative AI, adaptive systems, and intelligent interfaces, the paradigm of human-computer interaction is shifting toward a model of greater collaboration between humans and artificial intelligence. This Special Issue focuses on human-AI collaborative interaction design, with the aim of exploring the ways in which design methods, system architectures, and user experiences are being redefined in this new era of human-machine co-evolution.

We welcome research that investigates AI-assisted creativity, user perceptions of algorithmic agency, ethical dimensions of human-AI collaboration, and system transparency. Interdisciplinary approaches that combine design research, behavioral science, and computational methods are particularly encouraged. Topics may include adaptive interfaces, co-creative tools, human-in-the-loop systems, and the role of AI literacy in shaping user empowerment and interaction effectiveness.

Dr. Qianling Jiang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Multimodal Technologies and Interaction is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • human-AI collaboration
  • co-creation systems
  • AI-augmented design
  • human-in-the-loop interaction
  • adaptive interfaces
  • generative AI
  • algorithmic agency
  • interaction transparency
  • user trust and ethics
  • AI literacy

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (7 papers)


Research


17 pages, 1791 KB  
Article
AI-Enhanced Motion Capture for Multimodal Interaction in Chinese Shadow Puppetry Heritage
by Gaihua Wang, Hengchao Yun, Lixin Yang, Qingyuan Zheng and Tianmuran Liu
Multimodal Technol. Interact. 2026, 10(5), 46; https://doi.org/10.3390/mti10050046 - 28 Apr 2026
Abstract
This study examines how AI-enhanced motion capture (AI-MoCap) mediates the preservation, transmission, and re-creation of Chinese shadow puppetry as performative intangible cultural heritage. Through a state-of-the-art review and comparative analysis of three representative application models—technology-driven, culturally integrated, and entertainment-oriented—the paper explores how AI-MoCap supports the digitization of performative techniques while reshaping modes of cultural presentation and interaction. Cross-case comparison highlights recurring tensions between technical standardization and cultural authenticity while also indicating possibilities for symbolic reconstruction, contextual continuity, and ethically grounded design. Based on this comparison, the paper develops a dual-channel inheritance framework—“perception–symbol” and “design–performance”—and treats cultural resolution and digital ethics as analytical and normative principles for resisting algorithmic homogenization. Rather than functioning only as a digitization tool, AI-MoCap can be understood as a mediating mechanism whose cultural value depends on how it remains embedded in community-based performative logics, symbolic systems, and ethical boundaries. The resulting framework offers transferable guidance for future research, curation, training, and policy discussion in the digital safeguarding of performance-based heritage.

23 pages, 2386 KB  
Article
Beyond the Classroom: Technology-Enabled Acceleration Models for Gifted Learners in the Digital Era
by Yusra Zaki Aboud
Multimodal Technol. Interact. 2026, 10(2), 17; https://doi.org/10.3390/mti10020017 - 4 Feb 2026
Viewed by 1236
Abstract
The digital era represents a paradigm shift in gifted education, moving at an accelerating pace away from traditional models toward flexible and personalized technology-based pathways. This study investigates the impact of a model implemented via the FutureX platform in Saudi Arabia on the autonomy and self-regulated learning (SRL) of 63 gifted high school students. Using a quasi-experimental design, the study integrated quantitative measures (paired t-tests) with phenomenological analysis of interviews. The quantitative results showed statistically significant improvements (p < 0.001) in the dimensions of autonomy and self-regulated learning, with large Cohen’s d effect sizes for planning (d = 1.05), monitoring (d = 1.05), and cognitive control (d = 1.30). These gains were supported by a pedagogical design intentionally embedded within the platform to scaffold self-regulation. These findings were reinforced by qualitative results, with 88% of gifted students reporting that the platform provided appropriately challenging content and promoted self-learning and goal-setting behaviors.
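The effect sizes reported in this abstract follow a standard paired-design formula. As a minimal sketch, here is one common paired-samples convention (often called d_z: the mean pre/post difference divided by the standard deviation of the differences); the scores below are made up for illustration and are not the study's data:

```python
import math

def paired_cohens_d(pre, post):
    """Cohen's d for paired samples (d_z convention):
    mean of the pre/post differences divided by their sample SD."""
    diffs = [b - a for a, b in zip(pre, post)]
    mean = sum(diffs) / len(diffs)
    # Sample variance of the differences (n - 1 denominator)
    var = sum((x - mean) ** 2 for x in diffs) / (len(diffs) - 1)
    return mean / math.sqrt(var)

# Hypothetical pre/post self-regulation scores for four students
d = paired_cohens_d([3.1, 2.8, 3.5, 3.0], [4.0, 3.9, 4.1, 4.2])
```

Note that other conventions exist for paired designs (e.g., dividing by the average of the two group SDs), so the abstract's values depend on which formula the authors used.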

11 pages, 555 KB  
Article
Human–AI Feedback Loop for Pronunciation Training: A Mobile Application with Phoneme-Level Error Highlighting
by Aleksei Demin, Georgii Vorontsov and Dmitrii Chaikovskii
Multimodal Technol. Interact. 2026, 10(1), 2; https://doi.org/10.3390/mti10010002 - 26 Dec 2025
Viewed by 1634
Abstract
This paper presents an AI-augmented pronunciation training approach for Russian language learners through a mobile application that supports an interactive learner–system feedback loop. The system combines a pre-trained Wav2Vec2Phoneme neural network with Needleman–Wunsch global sequence alignment to convert reference and learner speech into aligned phoneme sequences. Rather than producing an overall pronunciation score, the application provides localized, interpretable feedback by highlighting phoneme-level matches and mismatches in a red/green transcription, enabling learners to see where sounds were substituted, omitted, or added. Implemented as a WeChat Mini Program with a WebSocket-based backend, the design illustrates how speech-to-phoneme models and alignment procedures can be integrated into a lightweight mobile interface for autonomous pronunciation practice. We further provide a feature-level comparison with widely used commercial applications (Duolingo, HelloChinese, Babbel), emphasizing differences in feedback granularity and interpretability rather than unvalidated accuracy claims. Overall, the work demonstrates the feasibility of alignment-based phoneme-level feedback for mobile pronunciation training and motivates future evaluation of recognition reliability, latency, and learning outcomes on representative learner data.
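The alignment step this abstract describes is a classical dynamic-programming algorithm. A minimal sketch of Needleman–Wunsch over phoneme sequences, labeling each position as a match, substitution, deletion (omitted sound), or insertion (added sound); the scoring values are arbitrary assumptions and this is not the app's actual implementation:

```python
def needleman_wunsch(ref, hyp, match=1, mismatch=-1, gap=-1):
    """Globally align two phoneme sequences; return (op, ref_ph, hyp_ph) tuples."""
    n, m = len(ref), len(hyp)
    # Fill the DP score matrix
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == hyp[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the edit operations
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if ref[i - 1] == hyp[j - 1] else mismatch):
            op = "match" if ref[i - 1] == hyp[j - 1] else "substitution"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ops.append(("deletion", ref[i - 1], None))   # learner omitted this sound
            i -= 1
        else:
            ops.append(("insertion", None, hyp[j - 1]))  # learner added an extra sound
            j -= 1
    return ops[::-1]

# Hypothetical reference vs. learner phonemes: 'r' mispronounced as 'l'
alignment = needleman_wunsch(["p", "r", "i", "v", "e", "t"],
                             ["p", "l", "i", "v", "e", "t"])
```

A red/green transcription like the one described can then be rendered directly from these operations: "match" positions green, everything else red.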

12 pages, 1597 KB  
Article
Cognitive Workload Assessment in Aerospace Scenarios: A Cross-Modal Transformer Framework for Multimodal Physiological Signal Fusion
by Pengbo Wang, Hongxi Wang and Heming Zhang
Multimodal Technol. Interact. 2025, 9(9), 89; https://doi.org/10.3390/mti9090089 - 26 Aug 2025
Cited by 2 | Viewed by 1885
Abstract
In the field of cognitive workload assessment for aerospace training, existing methods exhibit significant limitations in unimodal feature extraction and in leveraging complementary synergy among multimodal signals, while current fusion paradigms struggle to effectively capture nonlinear dynamic coupling characteristics across modalities. This study proposes DST-Net (Cross-Modal Downsampling Transformer Network), which synergistically integrates pilots’ multimodal physiological signals (electromyography, electrooculography, electrodermal activity) with flight dynamics data through an Anti-Aliasing and Average Pooling LSTM (AAL-LSTM) data fusion strategy combined with cross-modal attention mechanisms. Evaluation on the “CogPilot” dataset for flight task difficulty prediction demonstrates that AAL-LSTM achieves substantial performance improvements over existing approaches (AUC = 0.97, F1 Score = 94.55). Given the dataset’s frequent sensor data missingness, the study further conducts enhanced simulated flight experiments. By incorporating eye-tracking features via cross-modal attention mechanisms, the upgraded DST-Net framework achieves even higher performance (AUC = 0.998, F1 Score = 97.95) and reduces the root mean square error (RMSE) of cumulative flight error prediction to 1750. These advancements provide critical support for safety-critical aviation training systems.
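The cross-modal attention mechanism mentioned in this abstract can be illustrated in its generic single-head form: features from one modality form the queries, features from another form the keys and values, and a softmax over similarity scores produces the fused representation. The shapes and single-head formulation below are illustrative assumptions; the abstract does not specify DST-Net's internals:

```python
import numpy as np

def cross_modal_attention(query_feats, key_feats):
    """query_feats: (Tq, d) from one modality (e.g., flight dynamics);
    key_feats: (Tk, d) from another (e.g., physiological signals).
    Returns (Tq, d): each query timestep as a weighted mix of the other modality."""
    d = query_feats.shape[-1]
    scores = query_feats @ key_feats.T / np.sqrt(d)        # (Tq, Tk) similarities
    # Numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ key_feats                             # convex combination of keys

rng = np.random.default_rng(0)
dyn = rng.standard_normal((10, 8))    # 10 flight-dynamics timesteps, 8-dim features
phys = rng.standard_normal((25, 8))   # 25 physiological timesteps, 8-dim features
fused = cross_modal_attention(dyn, phys)
```

Because the attention weights sum to one per query, each fused timestep is a convex combination of the physiological features, which is what lets the two modalities be aligned despite different sampling rates.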

25 pages, 19135 KB  
Article
Development of a Multi-Platform AI-Based Software Interface for the Accompaniment of Children
by Isaac León, Camila Reyes, Iesus Davila, Bryan Puruncajas, Dennys Paillacho, Nayeth Solorzano, Marcelo Fajardo-Pruna, Hyungpil Moon and Francisco Yumbla
Multimodal Technol. Interact. 2025, 9(9), 88; https://doi.org/10.3390/mti9090088 - 26 Aug 2025
Viewed by 2321
Abstract
The absence of parental presence has a direct impact on the emotional stability and social routines of children, especially during extended periods of separation from their family environment, as in the case of daycare centers, hospitals, or when they remain alone at home. At the same time, the technology currently available to provide emotional support in these contexts remains limited. In response to the growing need for emotional support and companionship in child care, this project proposes the development of a multi-platform software architecture based on artificial intelligence (AI), designed to be integrated into humanoid robots that assist children between the ages of 6 and 14. The system enables daily verbal and non-verbal interactions intended to foster a sense of presence and personalized connection through conversations, games, and empathetic gestures. Built on the Robot Operating System (ROS), the software incorporates modular components for voice command processing, real-time facial expression generation, and joint movement control. These modules allow the robot to hold natural conversations, display dynamic facial expressions on its LCD (Liquid Crystal Display) screen, and synchronize gestures with spoken responses. Additionally, a graphical interface enhances the coherence between dialogue and movement, thereby improving the quality of human–robot interaction. Initial evaluations conducted in controlled environments assessed the system’s fluency, responsiveness, and expressive behavior. Subsequently, it was implemented in a pediatric hospital in Guayaquil, Ecuador, where it accompanied children during their recovery. It was observed that this type of artificial intelligence-based software can significantly enhance the experience of children, opening promising opportunities for its application in clinical, educational, recreational, and other child-centered settings.

14 pages, 412 KB  
Article
Do Novices Struggle with AI Web Design? An Eye-Tracking Study of Full-Site Generation Tools
by Chen Chu, Jianan Zhao and Zhanxun Dong
Multimodal Technol. Interact. 2025, 9(9), 85; https://doi.org/10.3390/mti9090085 - 22 Aug 2025
Viewed by 1887
Abstract
AI-powered full-site web generation tools promise to democratize website creation for novice users. However, their actual usability and accessibility for novice users remain insufficiently studied. This study examines interaction barriers faced by novice users when using Wix ADI to complete three tasks: Task 1 (onboarding), Task 2 (template customization), and Task 3 (product page creation). Twelve participants with no web design background were recruited to perform these tasks while their behavior was recorded via screen capture and eye-tracking (Tobii Glasses 2), supplemented by post-task interviews. Task completion rates declined significantly in Tasks 2 (66.67%) and 3 (33.33%). Help-seeking behaviors increased significantly, particularly during template customization and product page creation. Eye-tracking data indicated elevated cognitive load in later tasks, with fixation count and saccade count peaking in Task 2 and pupil diameter peaking in Task 3. Qualitative feedback identified core challenges such as interface ambiguity, limited transparency in AI control, and disrupted task logic. These findings reveal a gap between AI tool affordances and novice user needs, underscoring the importance of interface clarity, editable transparency, and adaptive guidance. As full-site generators increasingly target general users, lowering barriers for novice audiences is essential for equitable access to web creation.
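Fixation counts like those reported in this abstract are commonly derived from raw gaze samples with a dispersion-threshold (I-DT) style algorithm; everything between fixations is then treated as saccadic movement. The sketch below uses assumed thresholds and pixel-free coordinates, and is not the study's actual Tobii processing pipeline:

```python
def detect_fixations(samples, max_dispersion=1.0, min_duration=0.1):
    """samples: time-ordered list of (t, x, y) gaze points.
    Returns fixations as (start_t, end_t, centroid_x, centroid_y) tuples."""
    def dispersion(window):
        xs = [p[1] for p in window]
        ys = [p[2] for p in window]
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    fixations = []
    i = 0
    while i < len(samples):
        # Grow an initial window to cover the minimum fixation duration
        j = i
        while j < len(samples) and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= len(samples):
            break
        if dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend while the gaze stays within the dispersion threshold
            while j + 1 < len(samples) and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            w = samples[i:j + 1]
            cx = sum(p[1] for p in w) / len(w)
            cy = sum(p[2] for p in w) / len(w)
            fixations.append((w[0][0], w[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1  # slide the window forward past noisy samples

    return fixations

# Synthetic gaze: dwell near (10, 10), saccade, then dwell near (50, 50)
gaze = [(i * 0.02, 10.0, 10.0) for i in range(11)] + \
       [(0.22 + i * 0.02, 50.0, 50.0) for i in range(11)]
fixations = detect_fixations(gaze)
```

The fixation count is simply `len(fixations)`; higher counts over the same task duration are one conventional proxy for increased visual search and cognitive load.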

Review


28 pages, 515 KB  
Review
From Cues to Engagement: A Comprehensive Survey and Holistic Architecture for Computer Vision-Based Audience Analysis in Live Events
by Marco Lemos, Pedro J. S. Cardoso and João M. F. Rodrigues
Multimodal Technol. Interact. 2026, 10(1), 8; https://doi.org/10.3390/mti10010008 - 8 Jan 2026
Viewed by 1296
Abstract
The accurate measurement of audience engagement in real-world live events remains a significant challenge, with the majority of existing research confined to controlled environments like classrooms. This paper presents a comprehensive survey of Computer Vision AI-driven methods for real-time audience engagement monitoring and proposes a novel, holistic architecture to address this gap; this architecture is the paper’s main contribution. The paper identifies and defines five core constructs essential for a robust analysis: Attention, Emotion and Sentiment, Body Language, Scene Dynamics, and Behaviours. Through a selective review of state-of-the-art techniques for each construct, the necessity of a multimodal approach that surpasses the limitations of isolated indicators is highlighted. The work synthesises a fragmented field into a unified taxonomy and introduces a modular architecture that integrates these constructs with practical, business-oriented metrics such as Commitment, Conversion, and Retention. Finally, by integrating cognitive, affective, and behavioural signals, this work provides a roadmap for developing operational systems that can transform live event experience and management through data-driven, real-time analytics.
