

Multimodal Human Behavior Understanding in Human–AI Interaction: Sensor-Based Signal Processing and Interaction Techniques

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensors and Robotics".

Deadline for manuscript submissions: 31 August 2026 | Viewed by 4989

Special Issue Editors


Dr. Changzeng Fu
Guest Editor
1. Sydney Smart Technology College, Northeastern University, Shenyang, China
2. Graduate School of Engineering Science, Osaka University, Toyonaka, Japan
Interests: affective computing; deep learning; human–robot interaction

Dr. Shiqi Zhao
Guest Editor
School of Computer Science and Engineering, Northeastern University, Shenyang, China
Interests: brain–computer interface; AI accelerator; customized chip design

Dr. Yuliang Zhao
Guest Editor
School of Information Science and Engineering, Northeastern University, Shenyang, China
Interests: big data; Internet of Things; deep learning; smart sensors; BioMEMS

Special Issue Information

Dear Colleagues,

In the contemporary era of human–AI interaction, the capacity to interpret human behavior through multimodal signals has emerged as a pivotal research frontier. This Special Issue, titled “Multimodal Human Behavior Understanding in Human–AI Interaction: Sensor-Based Signal Processing and Interaction Techniques”, is dedicated to advancing the scientific and technological paradigms that underpin the comprehension of human actions and emotions within interactive AI systems. By leveraging sophisticated signal processing techniques and innovative interaction methods, it seeks to elucidate the complex interplay between humans and AI, thereby enhancing the efficacy and intuitiveness of these interactions. Contributions will explore the integration of affective computing, multimodal human behavior understanding, and the application of wearable sensors and brain–computer interfaces, collectively aiming to foster a more seamless and responsive human–AI collaborative framework.

Dr. Changzeng Fu
Dr. Shiqi Zhao
Dr. Yuliang Zhao
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • multimodal signal processing
  • human behavior understanding
  • affective computing
  • cognitive computing
  • wearable sensors
  • brain–computer interfaces
  • human–AI interaction
  • signal processing techniques
  • interaction methods
  • real-time systems
  • user experience

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (4 papers)


Research

27 pages, 1460 KB  
Article
Multimodal Cognitive Architecture with Local Generative AI for Industrial Control of Concrete Plants on Edge Devices
by Fernando Hidalgo-Castelo, Antonio Guerrero-González, Francisco García-Córdova, Francisco Lloret-Abrisqueta and Carlos Torregrosa Bonet
Sensors 2025, 25(24), 7540; https://doi.org/10.3390/s25247540 - 11 Dec 2025
Viewed by 681
Abstract
Accessing operational information across industrial systems (ERP, MES, SCADA, PLC) in concrete plants requires 15–30 min and specialized knowledge. This work addresses this accessibility gap by developing a conversational AI system that democratizes industrial information access through natural language. A five-layer cognitive architecture was implemented integrating the Mistral-7B model quantized in GGUF Q4_0 format (3.82 GB) on a Raspberry Pi 5, Spanish speech recognition/synthesis, and heterogeneous industrial protocols (OPC UA, MQTT, REST API) across all automation pyramid levels. Experimental validation at Frumecar S.L. (Murcia, Spain) characterized performance, thermal stability, and reliability. Results show response times of 14.19 s (simple queries, SD = 7.56 s), 16.45 s (moderate, SD = 6.40 s), and 23.24 s (complex multilevel, SD = 6.59 s), representing a 26–77× improvement over manual methods. The system maintained an average temperature of 69.3 °C (peak 79.6 °C), preserving a 5.4 °C margin below the throttling threshold. Communication latencies averaged 8.93 ms across 10,163 readings (<1% of total latency). During 30 min of autonomous operation, 100% reliability was achieved with 39 successful queries. These findings demonstrate the viability of deploying quantized LLMs on low-cost edge hardware, enabling cognitive democratization of industrial information while ensuring data privacy and cloud independence.
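The paper ships no code here, but its core deployment step — serving a GGUF Q4_0-quantized Mistral-7B on a Raspberry Pi 5 — can be sketched with the llama-cpp-python bindings. The model filename, thread count, context size, and example plant query below are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch: local inference with a Q4_0-quantized Mistral-7B GGUF file
# via llama-cpp-python. Paths and parameters are assumptions for illustration.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct.Q4_0.gguf",  # hypothetical local file (~3.8 GB)
    n_ctx=2048,    # context window; sized to fit comfortably in the Pi 5's RAM
    n_threads=4,   # one thread per Cortex-A76 core
)

# A hypothetical operator query; in the paper, answers are grounded in live
# ERP/MES/SCADA/PLC data fetched over OPC UA, MQTT, or REST before prompting.
resp = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "What was the cement silo level at the last reading?"}],
    max_tokens=128,
    temperature=0.2,
)
print(resp["choices"][0]["message"]["content"])
```

On this class of hardware, latency is dominated by token generation rather than I/O, which is consistent with the sub-1% share of total latency the authors report for industrial communication.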

28 pages, 3441 KB  
Article
Which AI Sees Like Us? Investigating the Cognitive Plausibility of Language and Vision Models via Eye-Tracking in Human-Robot Interaction
by Khashayar Ghamati, Maryam Banitalebi Dehkordi and Abolfazl Zaraki
Sensors 2025, 25(15), 4687; https://doi.org/10.3390/s25154687 - 29 Jul 2025
Viewed by 1555
Abstract
As large language models (LLMs) and vision–language models (VLMs) become increasingly used in robotics, a crucial question arises: to what extent do these models replicate human-like cognitive processes, particularly within socially interactive contexts? Whilst these models demonstrate impressive multimodal reasoning and perception capabilities, their cognitive plausibility remains underexplored. In this study, we address this gap by using human visual attention as a behavioural proxy for cognition in a naturalistic human–robot interaction (HRI) scenario. Eye-tracking data were previously collected from participants engaging in social human–human interactions, providing frame-level gaze fixations as a human attentional ground truth. We then prompted a state-of-the-art VLM (LLaVA) to generate scene descriptions, which were processed by four LLMs (DeepSeek-R1-Distill-Qwen-7B, Qwen1.5-7B-Chat, LLaMA-3.1-8b-instruct, and Gemma-7b-it) to infer saliency points. Critically, we evaluated each model in both stateless and memory-augmented (short-term memory, STM) modes to assess the influence of temporal context on saliency prediction. Our results show that whilst stateless LLaVA most closely replicates human gaze patterns, STM confers measurable benefits only for DeepSeek, whose lexical anchoring mirrors human rehearsal mechanisms. Other models exhibited degraded performance with memory due to prompt interference or limited contextual integration. This work introduces a novel, empirically grounded framework for assessing cognitive plausibility in generative models and underscores the role of short-term memory in shaping human-like visual attention in robotic systems.
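As a concrete illustration of the evaluation idea — scoring a model's inferred saliency point against frame-level human fixations — a simple per-frame hit-rate can be computed. The pixel radius and toy coordinates below are assumptions for illustration; the paper's own comparison protocol may differ.

```python
import numpy as np

def fixation_hit_rate(pred_xy, gaze_xy, radius=50.0):
    """Fraction of frames in which the model's predicted saliency point
    lands within `radius` pixels of the recorded human gaze fixation.
    (Illustrative metric; the radius is an assumed tolerance.)"""
    d = np.linalg.norm(np.asarray(pred_xy, float) - np.asarray(gaze_xy, float), axis=1)
    return float((d <= radius).mean())

# Toy usage: per-frame (x, y) model predictions vs. eye-tracker fixations.
pred = [(320, 240), (400, 260), (310, 250)]
gaze = [(330, 235), (180, 400), (300, 255)]
print(fixation_hit_rate(pred, gaze))  # 2 of 3 frames within tolerance -> 0.667
```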

24 pages, 3409 KB  
Article
DepressionMIGNN: A Multiple-Instance Learning-Based Depression Detection Model with Graph Neural Networks
by Shiwen Zhao, Yunze Zhang, Yikai Su, Kaifeng Su, Jiemin Liu, Tao Wang and Shiqi Yu
Sensors 2025, 25(14), 4520; https://doi.org/10.3390/s25144520 - 21 Jul 2025
Viewed by 1567
Abstract
The global prevalence of depression necessitates the application of technological solutions, particularly sensor-based systems, to augment scarce resources for early diagnostic purposes. In this study, we use benchmark datasets that contain multimodal data including video, audio, and transcribed text. To address depression detection as a chronic long-term disorder reflected by temporal behavioral patterns, we propose a novel framework that segments videos into utterance-level instances using a GRU for contextual representation, and then constructs graphs where utterance embeddings serve as nodes connected through dual relationships capturing both chronological development and intermittent relevant information. Graph neural networks are employed to learn multi-dimensional edge relationships and align multimodal representations across different temporal dependencies. Our approach achieves superior performance with an MAE of 5.25 and an RMSE of 6.75 on AVEC2014, and a CCC of 0.554 and an RMSE of 4.61 on AVEC2019, demonstrating significant improvements over existing methods that focus primarily on momentary expressions.
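The graph construction the abstract describes can be pictured as follows: each utterance embedding is a node, consecutive utterances are linked chronologically, and non-adjacent but similar utterances are linked as intermittent-relevance edges. The sketch below illustrates that idea with a single round of neighbor averaging; the dimensions, similarity threshold, and one-layer GNN are simplifying assumptions, not the DepressionMIGNN architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_utterance_graph(emb, sim_threshold=0.8):
    """Nodes: utterance embeddings (T, D). Edges: (i) chronological links
    between consecutive utterances, (ii) links between non-adjacent
    utterances whose cosine similarity exceeds an assumed threshold."""
    T = emb.size(0)
    adj = torch.eye(T)
    for i in range(T - 1):                    # chronological edges
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    sim = F.cosine_similarity(emb.unsqueeze(1), emb.unsqueeze(0), dim=-1)
    adj = torch.maximum(adj, (sim > sim_threshold).float())  # relevance edges
    return adj / adj.sum(1, keepdim=True)     # row-normalize for averaging

class UtteranceGraphRegressor(nn.Module):
    """Illustrative pipeline: GRU for contextual utterance embeddings, one
    round of message passing over the graph, mean-pool to a severity score."""
    def __init__(self, dim):
        super().__init__()
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.gnn = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, 1)

    def forward(self, feats):                 # feats: (1, T, D) utterance features
        emb, _ = self.gru(feats)
        emb = emb.squeeze(0)                  # (T, D)
        adj = build_utterance_graph(emb.detach())
        h = torch.relu(self.gnn(adj @ emb))   # neighbor averaging + transform
        return self.head(h.mean(0))           # pooled depression-severity score

# Toy usage: 12 utterances with 32-dimensional features.
model = UtteranceGraphRegressor(dim=32)
print(model(torch.randn(1, 12, 32)))
```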

17 pages, 1691 KB  
Article
Towards Explainable Graph Embeddings for Gait Assessment Using Per-Cluster Dimensional Weighting
by Chris Lochhead and Robert B. Fisher
Sensors 2025, 25(13), 4106; https://doi.org/10.3390/s25134106 - 30 Jun 2025
Viewed by 618
Abstract
As gait pathology assessment systems improve both in accuracy and efficiency, the prospect of using these systems in real healthcare applications is becoming more realistic. Although gait analysis systems have proven capable of detecting gait abnormalities in supervised tasks in laboratories and clinics, there is comparatively little investigation into making such systems explainable to healthcare professionals who would use gait analysis in practice in home-based settings. There is a “black box” problem with existing machine learning models, where healthcare professionals are expected to “trust” the model making diagnoses without understanding its underlying reasoning. To address this applicational barrier, an end-to-end pipeline is introduced here for creating graph feature embeddings, generated using a bespoke Spatio-temporal Graph Convolutional Network and per-joint Principal Component Analysis. The latent graph embeddings produced by this framework feed a novel semi-supervised weighting function which quantifies and ranks the most important joint features, which are used to provide a description of each pathology. Using these embeddings with a K-means clustering approach, the proposed method also outperforms the state of the art by between 4.53% and 16% in classification accuracy across three datasets with a total of 14 different simulated gait pathologies, from minor limping to ataxic gait. The resulting system provides a workable improvement to at-home gait assessment applications by providing accurate and explainable descriptions of the nature of detected gait abnormalities without the need for prior labeled descriptions of detected pathologies.
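The paper's per-cluster dimensional weighting is not reproduced here, but the general idea — after clustering latent gait embeddings, ranking each embedding dimension by how strongly it separates one cluster (pathology) from the rest — can be sketched as follows. The separation score, cluster count, and toy data are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def per_cluster_dimension_weights(X, n_clusters=3):
    """For each K-means cluster, score every embedding dimension by an
    assumed separation statistic: |mean_in - mean_out| / pooled std.
    Higher weight = dimension is more diagnostic for that cluster."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    weights = np.zeros((n_clusters, X.shape[1]))
    for c in range(n_clusters):
        inside, outside = X[labels == c], X[labels != c]
        sep = np.abs(inside.mean(0) - outside.mean(0))
        pooled = np.sqrt(0.5 * (inside.var(0) + outside.var(0))) + 1e-8
        weights[c] = sep / pooled
    return labels, weights / weights.sum(1, keepdims=True)

# Toy usage: 100 gait embeddings with 8 latent dimensions; one dimension
# artificially separates a subgroup, mimicking a pathology-specific feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
X[:30, 2] += 3.0
labels, W = per_cluster_dimension_weights(X)
print(W.round(2))  # dimension 2 should dominate one cluster's weight row
```

Ranked per-cluster weights of this kind are what make a plain-language description per pathology possible: the top-weighted dimensions can be traced back through the per-joint PCA to specific joints.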
