Multimodal Human Behavior Understanding in Human–AI Interaction: Sensor-Based Signal Processing and Interaction Techniques

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensors and Robotics".

Deadline for manuscript submissions: 31 December 2025

Special Issue Editors


Dr. Changzeng Fu
Guest Editor
1. Sydney Smart Technology College, Northeastern University, Shenyang, China
2. Graduate School of Engineering Science, Osaka University, Toyonaka, Japan
Interests: affective computing; deep learning; human–robot interaction

Dr. Shiqi Zhao
Guest Editor
School of Computer Science and Engineering, Northeastern University, Shenyang, China
Interests: brain–computer interface; AI accelerator; customized chip design

Dr. Yuliang Zhao
Guest Editor
School of Information Science and Engineering, Northeastern University, Shenyang, China
Interests: big data; Internet of Things; deep learning; smart sensors; BioMEMS

Special Issue Information

Dear Colleagues,

In the contemporary era of human–AI interaction, the capacity to interpret human behavior through multimodal signals has emerged as a pivotal research frontier. This Special Issue, titled “Multimodal Human Behavior Understanding in Human–AI Interaction: Sensor-Based Signal Processing and Interaction Techniques”, is dedicated to advancing the scientific and technological paradigms that underpin the comprehension of human actions and emotions within interactive AI systems. By leveraging sophisticated signal processing techniques and innovative interaction methods, it seeks to elucidate the complex interplay between humans and AI, thereby enhancing the efficacy and intuitiveness of these interactions. Contributions will explore affective computing, multimodal human behavior understanding, and the application of wearable sensors and brain–computer interfaces, collectively aiming to foster a more seamless and responsive human–AI collaborative framework.

Dr. Changzeng Fu
Dr. Shiqi Zhao
Dr. Yuliang Zhao
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • multimodal signal processing
  • human behavior understanding
  • affective computing
  • cognitive computing
  • wearable sensors
  • brain–computer interfaces
  • human–AI interaction
  • signal processing techniques
  • interaction methods
  • real-time systems
  • user experience

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (3 papers)


Research

28 pages, 3441 KiB  
Article
Which AI Sees Like Us? Investigating the Cognitive Plausibility of Language and Vision Models via Eye-Tracking in Human-Robot Interaction
by Khashayar Ghamati, Maryam Banitalebi Dehkordi and Abolfazl Zaraki
Sensors 2025, 25(15), 4687; https://doi.org/10.3390/s25154687 - 29 Jul 2025
Abstract
As large language models (LLMs) and vision–language models (VLMs) become increasingly used in robotics, a crucial question arises: to what extent do these models replicate human-like cognitive processes, particularly within socially interactive contexts? Whilst these models demonstrate impressive multimodal reasoning and perception capabilities, their cognitive plausibility remains underexplored. In this study, we address this gap by using human visual attention as a behavioural proxy for cognition in a naturalistic human-robot interaction (HRI) scenario. Eye-tracking data were previously collected from participants engaging in social human-human interactions, providing frame-level gaze fixations as a human attentional ground truth. We then prompted a state-of-the-art VLM (LLaVA) to generate scene descriptions, which were processed by four LLMs (DeepSeek-R1-Distill-Qwen-7B, Qwen1.5-7B-Chat, LLaMA-3.1-8b-instruct, and Gemma-7b-it) to infer saliency points. Critically, we evaluated each model in both stateless and memory-augmented (short-term memory, STM) modes to assess the influence of temporal context on saliency prediction. Our results showed that whilst stateless LLaVA most closely replicates human gaze patterns, STM confers measurable benefits only for DeepSeek, whose lexical anchoring mirrors human rehearsal mechanisms. Other models exhibited degraded performance with memory due to prompt interference or limited contextual integration. This work introduces a novel, empirically grounded framework for assessing cognitive plausibility in generative models and underscores the role of short-term memory in shaping human-like visual attention in robotic systems.
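
To make the evaluation step concrete, here is a minimal Python sketch comparing model-predicted saliency points against human gaze fixations frame by frame. The hit-rate metric, the pixel radius, and all names are illustrative assumptions, not the authors' published evaluation code.

    # Hypothetical agreement metric between eye-tracking ground truth and
    # LLM-inferred saliency points; all names and thresholds are assumptions.
    import numpy as np

    def saliency_hit_rate(fixations, predictions, radius=50.0):
        """Fraction of frames where the predicted saliency point lies
        within `radius` pixels of the human fixation for that frame.
        Both inputs have shape (n_frames, 2), holding (x, y) coordinates."""
        dists = np.linalg.norm(fixations - predictions, axis=1)
        return float(np.mean(dists <= radius))

    # Toy usage with random data standing in for eye-tracker and model output:
    rng = np.random.default_rng(0)
    gaze = rng.uniform(0, 1080, size=(100, 2))
    pred = gaze + rng.normal(0, 30, size=(100, 2))   # a well-aligned model
    print(f"hit rate: {saliency_hit_rate(gaze, pred):.2f}")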

24 pages, 3409 KiB  
Article
DepressionMIGNN: A Multiple-Instance Learning-Based Depression Detection Model with Graph Neural Networks
by Shiwen Zhao, Yunze Zhang, Yikai Su, Kaifeng Su, Jiemin Liu, Tao Wang and Shiqi Yu
Sensors 2025, 25(14), 4520; https://doi.org/10.3390/s25144520 - 21 Jul 2025
Abstract
The global prevalence of depression necessitates the application of technological solutions, particularly sensor-based systems, to augment scarce resources for early diagnostic purposes. In this study, we use benchmark datasets that contain multimodal data including video, audio, and transcribed text. To address depression detection as a chronic, long-term disorder reflected in temporal behavioral patterns, we propose a novel framework that segments videos into utterance-level instances, using a GRU for contextual representation, and then constructs graphs where utterance embeddings serve as nodes connected through dual relationships capturing both chronological development and intermittent relevant information. Graph neural networks are employed to learn multi-dimensional edge relationships and align multimodal representations across different temporal dependencies. Our approach achieves superior performance with an MAE of 5.25 and RMSE of 6.75 on AVEC2014, and a CCC of 0.554 and RMSE of 4.61 on AVEC2019, demonstrating significant improvements over existing methods that focus primarily on momentary expressions.
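
The framework description maps onto a compact PyTorch sketch, shown below under loud assumptions: utterance features are pre-extracted vectors, the "dual relationships" are reduced to chronological edges plus cosine-similarity edges, and a single normalized-adjacency convolution stands in for the paper's multi-dimensional edge learning.

    # A minimal sketch of the GRU-plus-graph pipeline; the edge construction
    # and single convolution step are simplifying assumptions, not the
    # published DepressionMIGNN architecture.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class UtteranceGraphRegressor(nn.Module):
        def __init__(self, feat_dim=64, hidden=32, sim_threshold=0.8):
            super().__init__()
            self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
            self.gconv = nn.Linear(hidden, hidden)   # shared node transform
            self.readout = nn.Linear(hidden, 1)      # depression score
            self.sim_threshold = sim_threshold

        def forward(self, utterances):                # (1, n_utt, feat_dim)
            h, _ = self.gru(utterances)               # contextual embeddings
            h = h.squeeze(0)                          # (n_utt, hidden)
            n = h.size(0)
            # Chronological edges t -> t+1, plus self-loops.
            adj = torch.eye(n) + torch.diag(torch.ones(n - 1), 1)
            # "Intermittent relevance" edges from cosine similarity.
            sim = F.cosine_similarity(h.unsqueeze(1), h.unsqueeze(0), dim=-1)
            adj = torch.clamp(adj + (sim > self.sim_threshold).float(), max=1.0)
            adj = adj / adj.sum(dim=1, keepdim=True)  # row-normalize
            h = F.relu(self.gconv(adj @ h))           # one message-passing step
            return self.readout(h.mean(dim=0))        # graph-level score

    model = UtteranceGraphRegressor()
    score = model(torch.randn(1, 10, 64))             # 10 utterances
    print(score.shape)                                # torch.Size([1])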

17 pages, 1691 KiB  
Article
Towards Explainable Graph Embeddings for Gait Assessment Using Per-Cluster Dimensional Weighting
by Chris Lochhead and Robert B. Fisher
Sensors 2025, 25(13), 4106; https://doi.org/10.3390/s25134106 - 30 Jun 2025
Abstract
As gait pathology assessment systems improve in both accuracy and efficiency, the prospect of using these systems in real healthcare applications is becoming more realistic. Although gait analysis systems have proven capable of detecting gait abnormalities in supervised tasks in laboratories and clinics, there is comparatively little investigation into making such systems explainable to the healthcare professionals who would use gait analysis in practice in home-based settings. There is a “black box” problem with existing machine learning models, where healthcare professionals are expected to “trust” the model making diagnoses without understanding its underlying reasoning. To address this barrier to application, an end-to-end pipeline is introduced here for creating graph feature embeddings, generated using a bespoke Spatio-temporal Graph Convolutional Network and per-joint Principal Component Analysis. The latent graph embeddings produced by this framework feed a novel semi-supervised weighting function which quantifies and ranks the most important joint features, used to provide a description of each pathology. Using these embeddings with a K-means clustering approach, the proposed method also outperforms the state of the art by between 4.53% and 16% in classification accuracy across three datasets with a total of 14 different simulated gait pathologies, from minor limping to ataxic gait. The resulting system provides a workable improvement to at-home gait assessment by providing accurate and explainable descriptions of the nature of detected gait abnormalities without the need for prior labeled descriptions of detected pathologies.
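
As a rough illustration of the explainability idea, the sketch below reduces per-joint features with PCA, clusters the embeddings with K-means, and ranks each latent dimension per cluster by the standardized deviation of its cluster mean from the global mean. This weighting function, and all dimensions and names, are stand-in assumptions, not the paper's exact formulation.

    # Illustrative per-joint PCA + K-means + per-cluster dimensional
    # weighting; data shapes and the weighting rule are assumptions.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 17 * 6))   # 200 gait samples, 17 joints x 6 feats

    # Per-joint PCA: compress each joint's 6 features to 2 components.
    parts = [PCA(n_components=2).fit_transform(X[:, j*6:(j+1)*6])
             for j in range(17)]
    Z = np.hstack(parts)                 # (200, 34) latent graph embedding

    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)

    # Per-cluster dimensional weighting: standardized deviation of the
    # cluster mean from the global mean, one weight per latent dimension.
    mu, sigma = Z.mean(axis=0), Z.std(axis=0) + 1e-8
    for c in range(4):
        w = np.abs(Z[labels == c].mean(axis=0) - mu) / sigma
        top = np.argsort(w)[::-1][:3]
        print(f"cluster {c}: top dims {top}, joints {top // 2}")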
