MDPI - Publisher of Open Access Journals

20 pages, 4288 KB

Open Access Article

A Prompt-Driven Vision-Language Framework for Deictic Interpretation in Human-Robot Handover

by Jimin Byeon, Song Min Ryu and Kyu Min Park

Actuators 2026, 15(6), 345; https://doi.org/10.3390/act15060345 - 18 Jun 2026

Viewed by 144

Recent advancements in Vision-Language Models (VLMs) have enabled robotic systems to leverage model-based understanding and reasoning over visual and linguistic inputs, offering a promising approach for interpreting user intent in human–robot interaction (HRI). In particular, deictic expressions commonly used in object handovers, such as “take this” and “give me that”, cannot be fully interpreted through language alone and require a comprehensive understanding of the speaker’s perspective and the environment. This study proposes a prompt-driven vision-language framework for deictic interpretation in human–robot handover. The system integrates a pre-trained VLM with a hierarchical prompt that decomposes reasoning into intent classification, spatio-temporal grounding, and output self-validation, enabling accurate identification of target objects and goal locations without model fine-tuning. Experimental results demonstrate 100% command interpretation accuracy across multiple interaction scenarios, including pick-and-place tasks, robot-to-human and human-to-robot handovers, and temporal deictic commands. Notably, the system operates under a prompt–command language mismatch, accurately interpreting Korean commands while being guided by English-based prompts. Analysis across progressive system configurations further demonstrates that structured prompting plays a critical role in reasoning performance. These results highlight the effectiveness of a prompt-driven approach for deictic interpretation and spatio-temporal grounding, providing a practical training-free framework for HRI. Full article

(This article belongs to the Special Issue Human-Centered Actuation: Algorithms, Design, and Robotic Applications)

14 pages, 1171 KB

Open Access Article

Healthcare Resource Utilisation in Patients with Upper Tract Urothelial Carcinoma

by Sanjib Saha, Tove Sundström, Johannes Bobjer, Fredrik Liedberg and Elin Ståhl

Healthcare 2026, 14(12), 1729; https://doi.org/10.3390/healthcare14121729 - 16 Jun 2026

Viewed by 146

Abstract

Background: Upper tract urothelial carcinoma (UTUC) is rare, and contemporary data on real-world healthcare resource utilisation and costs are limited. This study aimed to describe long-term healthcare resource utilisation among patients with UTUC and to identify clinical and treatment-related drivers of costs. Methods: We conducted a retrospective, population-based cohort study including all patients diagnosed with UTUC between 2019 and 2023 in Region Skåne, Sweden. Patients were identified through the Swedish National Register for Urinary Bladder Cancer (SNRUBC) and linked to regional healthcare databases covering primary, secondary, and tertiary care. The primary outcome was annual direct healthcare cost per patient, derived from Diagnosis-Related Group (DRG) cost data and expressed in 2023 international dollars (Int$). Secondary outcomes were cost patterns and predictors stratified by treatment modality: robot-assisted nephroureterectomy (RANU), open nephroureterectomy (ONU), segmental ureterectomy (SU), and endourological treatment (ET). Results: Among 278 included patients, most were older adults and/or had substantial comorbidity, and over half underwent radical nephroureterectomy. The adjusted mean annual cost was Int$36,870 in 2019, decreasing to Int$30,004 in 2023. In the ONU subgroup, systemic treatment was associated with a higher adjusted cost ratio; in the SU subgroup, female sex was associated with a higher adjusted cost ratio. Comorbidity was a cost driver in the ET subgroup. Conclusions: UTUC care in this Swedish region has become less resource-intensive over a short period. These results can provide a basis for planning UTUC services and highlight targets for cost-conscious, patient-centred optimisation of care.

27 pages, 5312 KB

Open Access Article

MEGNet: A Multi-Scale Edge Geometry-Aware Network for Green Plum Detection in Picking Orchard Environment

by Wanqiang Huang, Jing Wang, Shuo Zhang, Tianhua Chen, Chen Zhao, Guoyu Huang and Yang Zhou

Horticulturae 2026, 12(6), 682; https://doi.org/10.3390/horticulturae12060682 - 31 May 2026

Viewed by 843

Abstract

In response to the challenges of large fruit-scale variation, dense target distribution, severe leaf occlusion, and complex backgrounds in green plum detection within orchards, this paper proposes a lightweight multi-scale edge geometry-aware network (MEGNet). First, the Green Plum Detection Dataset (GPD) is constructed to provide realistic orchard scene data for the task. Next, building on YOLO11n, we design an efficient multi-scale feature fusion attention module (EMFFA) to improve the expression of multi-scale fruit features. We also introduce a color-edge guided dual-discriminator feature enhancement module (CED) to strengthen feature discrimination in complex backgrounds. A coordinate attention ghost detection head (CAGDetect) is proposed to reduce model parameters and computational complexity. Additionally, a geometry-consistency modulated CIoU loss function (GC-CIoU) is introduced to improve target localization stability in occluded and dense scenes by incorporating a geometric consistency modulation mechanism. Experimental results show that on the GPD, MEGNet achieves a Precision of 93.9%, Recall of 86.2%, mAP₅₀ of 93.2%, and mAP₅₀:₉₅ of 76.1%. The model has only 2.13 M parameters and requires 4.7 GFLOPs. Compared to the baseline YOLO11n model, Precision, Recall, mAP₅₀, and mAP₅₀:₉₅ are improved by 2.5%, 5.2%, 4.4%, and 4.6%, respectively. Additionally, deployment experiments on the Jetson Orin Nano embedded device demonstrate real-time detection speeds of 31–33 FPS. The proposed method provides an efficient and reliable solution for intelligent harvesting systems, orchard monitoring platforms, and agricultural robot vision perception.
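
For orientation, the standard CIoU loss that GC-CIoU modulates can be sketched as below. This is only the widely used baseline formulation (IoU term, normalized center distance, aspect-ratio penalty); the abstract does not describe the geometric consistency modulation, so none is attempted here. The function name and the `(x1, y1, x2, y2)` box convention are assumptions made for illustration.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """Standard CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Baseline formulation only; the paper's GC-CIoU modulation is not
    described in the abstract and is not reproduced here.
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between box centers.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both.
    enc_w = max(x2, gx2) - min(x1, gx1)
    enc_h = max(y2, gy2) - min(y1, gy1)
    c2 = enc_w ** 2 + enc_h ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

Identical boxes give a loss close to zero, while disjoint boxes are still penalized through the center-distance term, which is what makes CIoU useful when predictions do not yet overlap the target.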

(This article belongs to the Section Fruit Production Systems)

49 pages, 3542 KB

Open Access Perspective

The DIME Architecture: A Unified Operational Algorithm for Neural Representation, Dynamics, Control and Integration

by Ionel Cristian Vladu, Nicu George Bîzdoacă, Ionica Pirici, Tudor-Adrian Bălșeanu and Eduard Nicușor Bondoc

Appl. Sci. 2026, 16(11), 5380; https://doi.org/10.3390/app16115380 - 27 May 2026

Viewed by 476

Abstract

Contemporary neuroscience has generated extensive empirical insights into perception, memory, prediction, valuation, and consciousness. However, it still lacks an explicit operational architecture capable of explaining how these processes emerge from a unified computational mechanism. This work introduces DIME (Detect–Integrate–Mark–Execute), a unified operational architecture in which perception, memory, valuation, and conscious access are treated as components of a single recurrent computational cycle. The framework is organized around four core elements: engrams, defined as distributed recurrent neural structures that support multiple activation trajectories rather than static memory traces; execution threads, representing temporally extended, causally coherent trajectories of neural activity; marker systems, corresponding to neuromodulatory and limbic mechanisms that regulate value, selection, plasticity, and trajectory competition; and hyperengrams, large-scale integrative states associated with global coordination and conscious access. Within this formulation, DIME provides a mapping between local neural assemblies, temporal sequence dynamics, value-based modulation, and large-scale network integration. Rather than treating perception, memory, and decision-making as partially independent processes, the framework interprets them as different expressions of a single operational loop acting across multiple spatial and temporal scales. The proposed architecture is consistent with empirical findings on hippocampal indexing, recurrent cortical processing, neuromodulatory control, and large-scale network dynamics, while remaining sufficiently general to support applications in artificial intelligence and robotics. 
Unlike frameworks centered on prediction, memory storage, or global broadcasting, DIME proposes that cognition arises from the recurrent interaction between executable representational structures, trajectory-based processing, value-guided selection, and dynamic large-scale integration. The framework generates explicit and falsifiable predictions regarding context-dependent neural trajectories, marker-mediated state transitions, and large-scale network reconfiguration. In this sense, DIME is not intended as a metaphorical synthesis, but as a testable architectural hypothesis for neuroscience and biologically inspired cognitive systems. Beyond theoretical neuroscience, the framework is also positioned as a transferable design-level reference model for adaptive AI systems, autonomous robotics, and cognitively informed engineering architectures operating in dynamic environments. Full article

(This article belongs to the Special Issue Neural Networks and Brain Science: Structural Modeling, Functional Dynamics and Applied Perspectives)

27 pages, 12201 KB

Open Access Article

LLM-Orchestrated Framework for Multifunctional Robotic Health Attendant (RHA) in Healthcare Environments

by Kyungki Kim, Irfan Gazi, John Windle, Christian Haas, Melissa Christian, Tom Windle, Nicholas Armstrong, Logan Doorlag and Tuankhanh Dao

Appl. Sci. 2026, 16(11), 5320; https://doi.org/10.3390/app16115320 - 26 May 2026

Viewed by 364

Abstract

Despite recent advances in healthcare robotics, most existing systems remain limited to single-purpose functions and lack the flexibility to collaborate dynamically with clinicians and facility systems. To address these limitations, this study presents an LLM-orchestrated framework for a multifunctional Robotic Health Attendant (RHA) that enables robot actions and environment interactions to be coordinated in healthcare environments. Within this framework, the RHA functions as a multifunctional nursing assistant capable of performing physical, communicative, and informational tasks through natural-language interaction. Tasks are expressed in natural language and decomposed into coordinated behaviors across three functional branches: physical, for navigation, object manipulation, and delivering medication; communicational, for dialog with patients and clinicians; and informational, for retrieving and summarizing clinical knowledge, such as patient education on complex heart transplant procedures. The framework integrates multiple Large Language Models (LLMs) and sensing nodes to combine facility data, patient information, and clinician commands, enabling robots and building systems to act in a context-aware manner through coordinated task execution across robotic and environmental components. Implemented in a simulated environment, the framework demonstrates the feasibility of executing representative tasks through LLM-based orchestration, serving as a proof-of-concept toward integrated robotic assistance in healthcare settings. Full article

(This article belongs to the Special Issue Autonomous Systems in Cyber-Physical Systems and Smart Industry: Innovations and Challenges, 3rd Edition)

28 pages, 4216 KB

Open Access Article

Context-Awareness and Biologically Inspired Behaviour Based on Attention Mechanisms for Natural Human-Robot Interaction

by Jesús García-Martínez, Marcos Maroto-Gómez, Arecia Segura-Bencomo, José Carlos Castillo and María Malfaz

Biomimetics 2026, 11(5), 341; https://doi.org/10.3390/biomimetics11050341 - 14 May 2026

Viewed by 565

Abstract

The way robots represent the environment, make decisions, and express themselves can positively influence human–robot interaction if they clearly communicate their intentions and needs. To improve human–robot communication, biologically inspired models that mimic human communication skills, including task and scenario-specific contextual information, can facilitate mutual understanding and successful task execution. This paper presents a Context-Awareness and Biologically Inspired Behaviour system to generate a more natural human–robot interaction. The architecture combines sensory information processed by a Joint Attention System that prioritises stimuli based on internal processes with task-related motivations to generate context- and goal-adapted verbal and non-verbal interaction. We evaluate the system through a video-based user study that compares two robots with similar appearances but different behaviours, one using the proposed approach and the other not using the internal state and joint attention mechanisms, to make verbal and non-verbal responses. The results show that participants rated the robot endowed with the proposed system as significantly more sociable, agentic, and animated than the robot without it. Additionally, the robot not showing the responses developed in this work was perceived as more disturbing than the robot integrating the proposed system. Full article

(This article belongs to the Special Issue Intelligent Human–Robot Interaction: 5th Edition)

33 pages, 22507 KB

Open Access Article

A Lightweight Vision-Based Emotion Sensing Framework for Assistive Healthcare Robotics

by Hosam Zolfonoon, Helder Jesus Araújo and Lino Marques

Sensors 2026, 26(9), 2865; https://doi.org/10.3390/s26092865 - 3 May 2026

Viewed by 1601

Abstract

Facial expression recognition (FER) for assistive and telepresence robotics remains challenging under resource-constrained conditions because landmark normalization is often unstable, many datasets have limited variability, and full facial landmark sets introduce redundancy. This paper proposes a lightweight, privacy-preserving FER framework for assistive healthcare robotics based on geometric facial landmarks rather than raw RGB images. The objective is to improve recognition robustness and deployment suitability on low-power edge devices through two complementary contributions: a revised nose-centered landmark normalization method and an optimized Facial Feature Mapping, FFM-L03. The proposed normalization replaces the expression-sensitive upper-lip reference with a geometrically stable nose-center anchor, while FFM-L03 combines FACS-guided anatomical priors with ANOVA F-score, LASSO, PCA, and t-SNE/UMAP to retain 60 informative landmarks. In addition, a heterogeneous Freepik dataset was constructed to increase variability in lighting, background, resolution, and subject appearance. Experimental evaluation across 15 landmark groups, four datasets, and four classifiers shows that the proposed method consistently improves performance over prior landmark configurations, achieving gains of up to 22.4 percentage points over the Ciraolo baseline and 22.1 percentage points over the full-landmark baseline in accuracy, precision, recall, and F1-score, while maintaining lightweight operation. These results demonstrate that principled normalization and targeted landmark selection can substantially improve FER for real-time, privacy-aware assistive robotic systems. Full article
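
The idea of anchoring landmark normalization at the nose center can be illustrated with a minimal sketch. The abstract names the anchor but not the scale factor, so the max-distance scaling used here, the function name `normalize_landmarks`, and the `anchor_index` parameter are all assumptions for illustration, not the authors' method.

```python
import math


def normalize_landmarks(landmarks, anchor_index):
    """Translate 2D landmarks so the anchor (e.g. a nose-center point) is the
    origin, then scale so the farthest landmark lies at distance 1.

    The scaling rule is a hypothetical choice; the abstract does not state one.
    """
    ax, ay = landmarks[anchor_index]
    centered = [(x - ax, y - ay) for x, y in landmarks]
    scale = max(math.hypot(x, y) for x, y in centered)
    if scale == 0:
        # Degenerate case: every landmark coincides with the anchor.
        return centered
    return [(x / scale, y / scale) for x, y in centered]
```

With this kind of normalization, shifting the face in the image or changing its apparent size leaves the normalized coordinates unchanged, so only the anchor's stability under expression changes affects the result.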

(This article belongs to the Section Sensors and Robotics)

20 pages, 19496 KB

Open Access Article

A Hierarchical Attention Synergetic Network for Facial Expression Recognition in Service Robots

by Dengpan Zhang, Qingping Ma, Zhihao Shen, Wenwen Ma, Yonggang Yan and Song Kong

Appl. Sci. 2026, 16(9), 4417; https://doi.org/10.3390/app16094417 - 30 Apr 2026

Viewed by 258

Abstract

Facial expression recognition (FER) is crucial for endowing service robots with emotional perception capabilities. Achieving high-performance facial expression recognition hinges on effectively balancing the capture of subtle local textures with the understanding of overall facial configurations. However, coordinating local feature variations with global semantic dependencies in unconstrained environments while maintaining semantic alignment remains a challenge. To address this issue, we propose FER-SDAM, a network architecture based on hierarchical attention collaboration. Through a dual-attention hierarchical collaboration mechanism, this architecture introduces an Attention Consistency Loss (ACL) to explicitly align shallow structural awareness with deep global dependencies. It simultaneously captures structural sensitivity and cross-regional correlations, facilitating the effective fusion of local structural information with global semantics, thereby balancing accuracy, robustness, and computational efficiency. We conducted extensive experiments on AffectNet, RAF-DB, and their subsets containing occlusion and pose variations, achieving accuracy rates of 68.12%, 66.68%, and 88.87% on the AffectNet-7, AffectNet-8, and RAF-DB datasets, respectively. The experimental results demonstrate that FER-SDAM achieves a critical balance between accuracy and efficiency, delivering highly competitive recognition performance while maintaining low computational overhead, making it an ideal solution for real-time deployment in service robots. Full article

21 pages, 5104 KB

Open Access Article

Trust Isn’t Binary: Analysis of User Sentiment for Assistive Human–Robot Interaction

by Randyll Pandohie, Edgard M. Maboudou-Tchao, Nihad Habizada, Morris Beato and Aman Behal

Machines 2026, 14(5), 488; https://doi.org/10.3390/machines14050488 - 27 Apr 2026

Viewed by 471

Abstract

Understanding how users perceive assistive robotic systems is critical for their successful adoption, particularly in rehabilitation settings where both patients and clinicians influence decision-making. While prior work has focused on technical performance and overall usability, affective responses such as trust, control, and perceived independence are often captured using coarse, single-score measures that overlook important nuances. This study analyzes focus group discussions with individuals with spinal cord injury to examine how users evaluate different aspects of assistive robot design. A hybrid aspect-based sentiment analysis approach is applied, combining lexicon-based and transformer-based methods to capture both interpretable and context-sensitive sentiment. The analysis separates sentiment across key dimensions, including independence, functionality, safety, control, cost, and data sharing. Participants expressed consistently positive views toward independence and functional support, while responses related to safety, control, and data sharing were more conditional. In particular, trust emerged as something that depends on transparency, user control, and the ability to override system behavior, rather than a fixed attitude toward the technology. These findings suggest that successful assistive robotic systems must balance autonomy with user authority and provide clear, adaptable mechanisms for control and data governance. Full article

(This article belongs to the Special Issue Advanced Human–Machine Interaction and Assistive Robotics for Rehabilitation)

24 pages, 6361 KB

Open Access Article

A Novel Type of Pneumatic Rotary Positioner Using Three-Phase Pressure Commutation

by Valentin Ciupe, Robert Kristof and Ghadeer Ismael

Actuators 2026, 15(4), 192; https://doi.org/10.3390/act15040192 - 31 Mar 2026

Viewed by 555

Abstract

This paper presents the design, simulation, and experimental validation of a novel type of pneumatic rotary positioner that is based on a three-cylinder radial mechanism driven by independently controlled pressures. The system uses standard off-the-shelf industrial components, including pneumatic cylinders, proportional pressure regulators, and a programmable logic controller. To obtain angular positioning, a three-phase sinusoidal pressure commutation scheme is adopted, similar to the commutation of three-phase electric motors. Analytical expressions for piston kinematics and torque generation are derived and used to design direct open-loop, open-loop with friction compensation, and closed-loop position control strategies. Tested unloaded, the prototype achieves accurate positioning (from ±3° in open-loop mode with feedforward to ±0.3° in closed-loop mode with a PD controller), with an average repeatability better than 0.5° and smooth theoretical torque (average 1.4 Nm, with 0.51% ripple) at low speeds (<60 rpm). The experimental prototype was designed as a compact device, approximately 94 mm in diameter and 110 mm deep. When used in open-loop mode, the actuator is connected to the control system using just three pneumatic tubes and is thus completely free of electromagnetic fields, making it suitable for some environment-critical applications. These advantages promote the proposed positioner as a practical rotary actuator in specialized automation and robotics applications where established electrical servomotors cannot be used.
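
A three-phase sinusoidal commutation of this kind can be sketched as three pressure setpoints shifted by 120° from one another. The abstract does not give the commutation law, so the offset-plus-sine form, the names `p_mean` and `p_amp`, and the values in bar are illustrative assumptions rather than the authors' controller.

```python
import math


def phase_pressures(theta_rad, p_mean=3.0, p_amp=2.0):
    """Pressure setpoints (bar, hypothetical values) for three cylinders.

    Each cylinder follows a sine of the commanded angle, shifted by
    120 degrees from its neighbor, around a common mean pressure.
    """
    return [
        p_mean + p_amp * math.sin(theta_rad - 2 * math.pi * k / 3)
        for k in range(3)
    ]
```

Because three sines spaced 120° apart always sum to zero, the total of the three setpoints stays at `3 * p_mean` for every commanded angle, which mirrors the balanced behavior of three-phase electrical drives.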

(This article belongs to the Special Issue Actuation and Sensing of Intelligent Soft Robots—2nd Edition)

32 pages, 7928 KB

Open Access Article

eXCube2: Explainable Brain-Inspired Spiking Neural Network Framework for Emotion Recognition from Audio, Visual and Multimodal Audio–Visual Data

by N. K. Kasabov, A. Yang, Z. Wang, I. Abouhassan, A. Kassabova and T. Lappas

Biomimetics 2026, 11(3), 208; https://doi.org/10.3390/biomimetics11030208 - 14 Mar 2026

Viewed by 934

Abstract

This paper introduces a biomimetic framework and novel brain-inspired AI (BIAI) models based on spiking neural networks (SNNs) for emotional state recognition from audio (speech), visual (face), and integrated multimodal audio–visual data. The developed framework, named eXCube2, uses NeuCube, a three-dimensional SNN architecture that is spatially structured according to a human brain template. The BIAI models developed in eXCube2 are trainable on spatio- and spectro-temporal data using brain-inspired learning rules. Such models are explainable, in that they reveal patterns in the data, and are adaptable to new data. The eXCube2 models are implemented as software systems and tested on speech and video data of subjects expressing emotional states. The use of a brain template for the SNN structure enables brain-inspired tonotopic and stereo mapping of audio inputs, topographic mapping of visual data, and the combined use of both modalities. This novel approach brings AI-based emotional state recognition closer to human perception and provides better explainability and adaptability than existing AI systems. It also results in higher or competitive accuracy, even though this was not the main goal here. This is demonstrated through experiments on benchmark datasets, achieving classification accuracy above 80% on single-modality data and 88.9% when multimodal audio–visual data are used and a “don’t know” output is introduced. The paper further discusses possible applications of the proposed eXCube2 framework to other audio, visual, and audio–visual data for solving challenging problems, such as recognizing emotional states of people from different origins; brain state diagnosis (e.g., Parkinson’s disease, Alzheimer’s disease, ADHD, dementia); measuring response to treatment over time; evaluating satisfaction responses from online clients; cognitive robotics; human–robot interaction; chatbots; and interactive computer games. The SNN-based implementation of BIAI also enables the use of neuromorphic chips and platforms, leading to reduced power consumption, smaller device size, higher performance accuracy, and improved adaptability and explainability. This research is a step toward building brain-inspired AI systems.

22 pages, 1747 KB

Open Access Review

Talking Head Generation Through Generative Models and Cross-Modal Synthesis Techniques

by Hira Nisar, Salman Masood, Zaki Malik and Adnan Abid

J. Imaging 2026, 12(3), 119; https://doi.org/10.3390/jimaging12030119 - 10 Mar 2026

Viewed by 1426

Abstract

Talking Head Generation (THG) is a rapidly advancing field at the intersection of computer vision, deep learning, and speech synthesis, enabling the creation of animated human-like heads that can produce speech and express emotions with high visual realism. The core objective of THG systems is to synthesize coherent and natural audio–visual outputs by modeling the intricate relationship between speech signals, facial dynamics, and emotional cues. These systems find widespread applications in virtual assistants, interactive avatars, video dubbing for multilingual content, educational technologies, and immersive virtual and augmented reality environments. Moreover, the development of THG has significant implications for accessibility technologies, cultural preservation, and remote healthcare interfaces. This survey paper presents a comprehensive and systematic overview of the technological landscape of Talking Head Generation. We begin by outlining the foundational methodologies that underpin the synthesis process, including generative adversarial networks (GANs), motion-aware recurrent architectures, and attention-based models. A taxonomy is introduced to organize the diverse approaches based on the nature of input modalities and generation goals. We further examine the contributions of various domains such as computer vision, speech processing, and human–robot interaction, each of which plays a critical role in advancing the capabilities of THG systems. The paper also provides a detailed review of datasets used for training and evaluating THG models, highlighting their coverage, structure, and relevance. In parallel, we analyze widely adopted evaluation metrics, categorized by their focus on image quality, motion accuracy, synchronization, and semantic fidelity. Operating parameters such as latency, frame rate, resolution, and real-time capability are also discussed to assess deployment feasibility. 
Special emphasis is placed on the integration of generative artificial intelligence (GenAI), which has significantly enhanced the adaptability and realism of talking head systems through more powerful and generalizable learning frameworks.

(This article belongs to the Special Issue AI-Driven Multimodal Image and Video Processing: Advances and Applications)

18 pages, 2012 KB

Open Access Article

Electromechanical Coupling and Piezoelectric Behaviour of (PDMS)–Graphene Elastomer Nanocomposites

by Murat Çelik, Miguel A. Lopez-Manchado and Raquel Verdejo

Polymers 2026, 18(5), 623; https://doi.org/10.3390/polym18050623 - 2 Mar 2026

Viewed by 803

Abstract

Elastomer-based nanocomposites combining polymer flexibility with conductive nanofillers provide lightweight, stretchable systems with tunable electromechanical properties for wearable electronics, soft robotics, and self-powered sensors. However, predicting their nonlinear response remains challenging because the observed piezoelectric-like response arises from strain-dependent interfacial polarization and evolving piezoresistive conduction pathways within heterogeneous microstructures. We introduce a continuum electro-hyperelastic framework combining the Mooney–Rivlin model for large-strain elasticity with a Helmholtz free-energy approach for electrostatic coupling. Analytical expressions for stress, electric displacement, and apparent piezoelectric coefficients are derived and implemented in finite element simulations. The model accurately reproduces the experimental mechanical, dielectric, and electromechanical behaviour of polydimethylsiloxane (PDMS) nanocomposites with 0.1–1 wt% graphene. These composites show increased stiffness, relative permittivity (from 3.4 to 4.0, ≈18%), and quasi-static d₃₃ coefficients (from −5.6 to −10.0 pC N⁻¹, ≈80% enhancement). Analytical and finite element method (FEM) results show consistent trends across the full deformation range, with Maxwell stress agreement within 10% at lower deformation levels, while deviations of 33–40% for coupled electromechanical quantities at an axial displacement u_z ≈ −1 mm (≈16.7% compressive strain) are attributable to three-dimensional shear effects absent from the uniaxial analytical assumption. Simulations reveal that graphene boosts Maxwell stress, yielding a four-fold increase at lower stretch ratios. This reframes PDMS–graphene composites as electro-hyperelastic materials, offering a predictive, extensible framework. It highlights apparent piezoelectricity as an emergent, tunable effect from charge redistribution in a compliant hyperelastic matrix, guiding the design of next-generation flexible devices that leverage field-induced coupling over intrinsic polarization.
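
For orientation, the ingredients named in this abstract can be written in their standard textbook forms. This is only a sketch under stated assumptions: the incompressible two-parameter Mooney–Rivlin energy, the uniaxial Cauchy stress that follows from it, and the Maxwell stress of a linear dielectric are generic expressions, and the article's own formulation may differ. The symbols C_{10}, C_{01}, \lambda, \mathbf{E}, and h_0 are notation introduced here, and the sample height h_0 is inferred from the quoted displacement and strain rather than stated in the abstract.

```latex
% Incompressible two-parameter Mooney-Rivlin strain energy (assumed form)
W = C_{10}\,(I_1 - 3) + C_{01}\,(I_2 - 3)

% Uniaxial Cauchy stress at stretch \lambda with traction-free lateral faces
\sigma = 2\left(\lambda^{2} - \lambda^{-1}\right)\left(C_{10} + C_{01}\,\lambda^{-1}\right)

% Maxwell stress of a linear dielectric with relative permittivity \varepsilon_r
\boldsymbol{\sigma}^{M} = \varepsilon_0 \varepsilon_r
  \left(\mathbf{E}\otimes\mathbf{E} - \tfrac{1}{2}\,(\mathbf{E}\cdot\mathbf{E})\,\mathbf{I}\right)

% Arithmetic behind the quoted percentages
\frac{4.0 - 3.4}{3.4} \approx 0.176, \qquad
\frac{10.0 - 5.6}{5.6} \approx 0.786, \qquad
\frac{|u_z|}{h_0} = 0.167 \;\Rightarrow\; h_0 \approx 6\ \text{mm (inferred)}
```

In this form the Maxwell stress scales linearly with \varepsilon_r at a fixed field, so the permittivity rise from 3.4 to 4.0 would account for only about an 18% increase on its own.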

(This article belongs to the Section Smart and Functional Polymers)

21 pages, 1469 KB

Open Access Article

Development of Surveillance Robots Based on Face Recognition Using High-Order Statistical Features and Evidence Theory

by Slim Ben Chaabane, Rafika Harrabi, Anas Bushnag and Hassene Seddik

J. Imaging 2026, 12(3), 107; https://doi.org/10.3390/jimaging12030107 - 28 Feb 2026

Viewed by 931

Abstract

Recent advancements in technologies such as artificial intelligence (AI), computer vision (CV), and the Internet of Things (IoT) have significantly advanced various fields, particularly surveillance systems. These innovations enable real-time facial recognition processing, enhancing security and ensuring safety. Mobile robots, in turn, are commonly employed in surveillance systems to handle risky tasks that are beyond human capability. In this paper, we present a prototype of a cost-effective mobile surveillance robot built on the Raspberry Pi 4, designed for integration into various industrial environments. This smart robot detects intruders using IoT and face recognition technology. The proposed system is equipped with a passive infrared (PIR) sensor and a camera for capturing live-streaming video and photos, which are sent to the control room through IoT technology. Additionally, the system uses face recognition algorithms to differentiate between company staff and potential intruders. The face recognition method combines high-order statistical features and evidence theory to improve facial recognition accuracy and robustness. High-order statistical features are used to capture complex patterns in facial images, enhancing discrimination between individuals. Evidence theory is employed to integrate multiple information sources, allowing for better decision-making under uncertainty. This approach effectively addresses challenges such as variations in lighting, facial expressions, and occlusions, resulting in a more reliable and accurate face recognition system. When the system detects an unfamiliar individual, it sends alert notifications and emails containing the captured picture to the control room using IoT. A web interface has also been set up to control the robot remotely over a Wi-Fi connection. The proposed face recognition method is evaluated, and a comparative analysis with existing techniques is conducted. Experiments with 400 test images of 40 individuals demonstrate the effectiveness of combining various attribute images in improving human face recognition performance, and the algorithm identifies human faces with an accuracy of 98.63%.
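
The evidence-theory step mentioned in this abstract can be illustrated with Dempster's rule of combination, the usual fusion operator in Dempster–Shafer theory. The sketch below is a generic illustration and not the authors' implementation: the two-hypothesis frame (staff versus intruder), the function name `dempster_combine`, and the mass values are all hypothetical, chosen only to show how two feature-based sources might be fused.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Fuse two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence: the product mass goes to the intersection.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Contradictory evidence: accumulate it as conflict K.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    # Renormalise by 1 - K so the fused masses sum to one.
    return {hyp: mass / (1.0 - conflict) for hyp, mass in combined.items()}


# Hypothetical frame of discernment and made-up masses (illustration only).
STAFF = frozenset({"staff"})
INTRUDER = frozenset({"intruder"})
EITHER = STAFF | INTRUDER  # mass left on the whole frame expresses ignorance

m_feature_a = {STAFF: 0.6, INTRUDER: 0.1, EITHER: 0.3}
m_feature_b = {STAFF: 0.7, INTRUDER: 0.2, EITHER: 0.1}

fused = dempster_combine(m_feature_a, m_feature_b)
decision = max((STAFF, INTRUDER), key=lambda hyp: fused.get(hyp, 0.0))
```

Assigning mass to the whole frame lets each source state how unsure it is, which is what makes this kind of fusion suited to decisions under uncertainty such as poor lighting or occlusion.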

(This article belongs to the Topic Applied Computer Vision and Pattern Recognition: 2nd Edition)

35 pages, 1070 KB

Open Access Editor’s Choice Article

Adaptive Deep Learning Framework for Emotion Recognition in Social Robots: Toward Inclusive Human–Robot Interaction for Users with Special Needs

by Eryka Probierz and Adam Gałuszka

Electronics 2026, 15(5), 924; https://doi.org/10.3390/electronics15050924 - 25 Feb 2026

Viewed by 838

Abstract

Emotion recognition is a key capability of social robots operating in real-world human-centered environments, especially when interacting with users with special needs. Such users may express emotions in atypical, subtle, or strongly context-dependent ways. These characteristics pose significant challenges for conventional emotion recognition systems. This paper proposes an adaptive deep learning framework for emotion recognition in social robots. The framework is designed to support inclusive and accessible human–robot interaction. It combines region-based convolutional neural networks with adaptive learning mechanisms. These mechanisms explicitly model individual variability, contextual information, and interaction dynamics. Multiple deep architectures are evaluated to assess robustness across diverse emotional expressions, including those influenced by cognitive, sensory, or developmental differences. Rather than relying on fixed emotion models, the proposed approach emphasizes adaptability. The system dynamically adjusts its perception strategies to user-specific expressive patterns. Experimental validation is conducted using context-aware emotion datasets. Performance is evaluated in terms of detection accuracy, robustness to variability, and generalization across emotion categories. The results show that adaptive mechanisms improve recognition performance in scenarios characterized by non-standard or low-intensity expressions, compared to static baseline models. This study highlights the importance of flexible, context-sensitive perception for inclusive social robotics. It also discusses design implications for deploying emotion-aware robots in assistive, educational, and therapeutic settings. Overall, the proposed framework represents a step toward socially intelligent robots capable of engaging more effectively with users with special needs.

(This article belongs to the Special Issue Research on Deep Learning and Human-Robot Collaboration)

Search Results (185)
