MDPI - Publisher of Open Access Journals

30 pages, 3772 KB

Open Access Article

Bayesian Multi-Task Facial Emotion Recognition with Reliability-Aware Uncertainty Under Controlled Facial Masking

by Qiyuan Xiao and Changqin Quan

Mach. Learn. Knowl. Extr. 2026, 8(7), 175; https://doi.org/10.3390/make8070175 (registering DOI) - 25 Jun 2026

Facial emotion recognition (FER) in real-world settings is limited by two factors: the semantic mismatch between discrete emotion categories and continuous Valence–Arousal–Dominance (V-A-D) dimensions, and the lack of reliable uncertainty estimates under incomplete facial evidence. Existing uncertainty-aware FER studies mainly address annotation ambiguity or training-time reliability, leaving the behavior of predictive uncertainty under progressive input degradation insufficiently examined. This paper proposes BGDC (Bayesian Gaussian-mixture Distributional Consistency), a multi-task FER framework that integrates a GMM-based soft consistency module with a context-conditioned Bayesian regression head and explicitly models aleatoric and epistemic uncertainty. To evaluate predictive reliability, a controlled masking protocol is introduced that removes facial information under different spatial configurations. On FER2013-VAD, BGDC attains the highest classification accuracy (0.6943) and the highest mean V-A-D concordance correlation coefficient (CCC, 0.6079) among the compared configurations, and it yields a stronger correspondence between epistemic uncertainty and error than MC Dropout in a single-model setting. Controlled masking further shows that the epistemic uncertainty of BGDC tracks the loss of task-relevant facial information rather than the masking ratio alone: it rises with regression error when diagnostically important regions are removed, and it contracts when the masked region is largely task-irrelevant. Combining Bayesian uncertainty with the GMM-based distributional prior thus enables reliability-aware multi-task FER, in which controlled masking serves as a diagnostic intervention rather than merely as a benchmark of accuracy degradation.
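
The mean V-A-D CCC is the average of Lin's concordance correlation coefficient, $\rho_c = \frac{2\sigma_{xy}}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}$, over the three dimensions. A minimal pure-Python sketch of the metric follows; it assumes population (1/n) moments, one common convention, since the abstract does not say which one the evaluation uses.

```python
from statistics import fmean


def ccc(pred: list[float], gold: list[float]) -> float:
    """Lin's concordance correlation coefficient with population (1/n) moments."""
    if len(pred) != len(gold) or not pred:
        raise ValueError("pred and gold must be non-empty and equally long")
    mp, mg = fmean(pred), fmean(gold)
    var_p = fmean([(p - mp) ** 2 for p in pred])
    var_g = fmean([(g - mg) ** 2 for g in gold])
    cov = fmean([(p - mp) * (g - mg) for p, g in zip(pred, gold)])
    denom = var_p + var_g + (mp - mg) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant series")
    # Unlike Pearson's r, the (mp - mg) ** 2 term penalises a systematic offset.
    return 2 * cov / denom


def mean_vad_ccc(pred: list[tuple[float, float, float]],
                 gold: list[tuple[float, float, float]]) -> float:
    """Average the per-dimension CCC over valence, arousal and dominance."""
    return fmean([ccc([p[d] for p in pred], [g[d] for g in gold]) for d in range(3)])
```

For example, `ccc([1, 2, 3], [2, 3, 4])` gives 4/7 (about 0.571) even though the Pearson correlation of the two series is 1, because the constant offset of 1 is penalised.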

(This article belongs to the Section Visualization)

17 pages, 1028 KB

Open Access Article

Optimized Deep Learning Framework for Emotion Recognition Using Multimodal Physiological Signals and Temporal Convolutional Networks

by Mohsen Golafrouz, Houshyar Asadi, Mohammad Reza Chalak Qazani, Anwar Hosen, Zoran Najdovski, Lei Wei, Sam Oladazimi and Saeid Nahavandi

Computers 2026, 15(6), 381; https://doi.org/10.3390/computers15060381 - 11 Jun 2026

Viewed by 216

Abstract

Emotion recognition plays a crucial role in human–computer interaction, health monitoring, and affective computing by analysing physiological signals. Despite recent advancements, current research still faces challenges, including the lack of effective fusion strategies for diverse physiological modalities, difficulties in handling high-dimensional feature representations, and limited use of efficient temporal modelling techniques to capture complex emotional patterns. This study proposes a deep learning-based approach that fuses multiple physiological modalities, including Electroencephalography (EEG), Electrooculography (EOG), Electromyography (EMG), Galvanic Skin Response (GSR), Respiratory Rate (RR), Skin Temperature (SKT), and Photoplethysmography (PPG), to improve emotion recognition. Arousal and valence ratings were binarized into two classes (low/high) using a threshold of 4.5, formulating a binary classification problem. In addition to utilising Bidirectional Long Short-Term Memory (Bi-LSTM), the study employs Temporal Convolutional Networks (TCN), a widely used approach for time-series analysis, to efficiently capture temporal dependencies. The proposed model optimises feature selection through channel-wise strategies, incorporates advanced learning rate scheduling, and reduces computational overhead. Furthermore, window-wise, block-wise, and trial-wise evaluation protocols were investigated to assess the impact of temporal information leakage on emotion recognition performance. Using the DEAP dataset for validation, the proposed TCN-based approach achieved classification accuracies of 88.42% for valence and 86.35% for arousal under an overlapping block-wise evaluation protocol, demonstrating improved performance in binary emotion recognition and highlighting the importance of leakage-aware model assessment.
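
The binarisation and the leakage-aware trial-wise protocol can be sketched as follows. The 4.5 threshold comes from the abstract; treating a rating of exactly 4.5 as low, the `(trial_id, features, rating)` window layout and the `trial_wise_split` helper are assumptions for illustration, not the authors' code. Keeping every window of a trial on one side of the split is what separates the trial-wise protocol from a window-wise one, where overlapping windows cut from the same trial can land in both train and test.

```python
THRESHOLD = 4.5  # binarisation threshold stated in the abstract


def binarize(rating: float, threshold: float = THRESHOLD) -> int:
    """Map a self-assessment rating to 1 (high) or 0 (low).

    Assumption: only ratings strictly above the threshold count as high.
    """
    return 1 if rating > threshold else 0


def trial_wise_split(windows, test_trials):
    """Split (trial_id, features, rating) windows so that each trial lands
    entirely in train or entirely in test (hypothetical helper)."""
    train, test = [], []
    for trial_id, features, rating in windows:
        target = test if trial_id in test_trials else train
        target.append((trial_id, features, binarize(rating)))
    return train, test
```

With `windows = [(1, [0.1], 7.0), (1, [0.2], 7.0), (2, [0.3], 3.0), (2, [0.4], 3.0)]`, calling `trial_wise_split(windows, {2})` puts both windows of trial 1 (label 1) in train and both windows of trial 2 (label 0) in test.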

(This article belongs to the Topic Applications of Machine Learning in Large-Scale Optimization and High-Dimensional Learning)

24 pages, 5968 KB

Open Access Article

Emotion Recognition Based on Fusion of Topological Features and Trajectory Images Derived from EEG Phase Space Reconstruction

by Tianyue Liang, Xuanpeng Zhu and Yu Song

Sensors 2026, 26(10), 3102; https://doi.org/10.3390/s26103102 - 14 May 2026

Viewed by 512

Abstract

Electroencephalogram (EEG) signals, as a direct measure of the brain’s cortical electrophysiological activity, can objectively capture emotion-induced neural changes. Phase space reconstruction is an effective method for processing nonlinear time series. It maps time series to a high-dimensional phase space, thereby better preserving subtle dynamic information in the signal. This paper proposes a method for emotion recognition in EEG signals based on phase space reconstruction. First, the macro-topological features of the trajectories are constructed via phase space reconstruction. The time delay and embedding dimension are optimized using the minimum cross-prediction error and the G-P (Grassberger–Procaccia) method, followed by dimensionality reduction to a two-dimensional plane via locally linear embedding. Building on this foundation, and in response to the limitations of manually designed features, we further propose a deep learning-based method for extracting multiscale dynamic features from trajectory images. The designed GN-MVXXS framework uses a granularity-adaptive module to adaptively switch the receptive field and a noise-filtering module to suppress isolated noise points, thereby effectively uncovering microscopic evolutionary features at the image level. Finally, to leverage the complementary strengths of macro- and micro-level information, we propose a fusion method based on dynamic attention. This approach aligns the dual representational dimensions through global average pooling and nonlinear dimension expansion, and uses a dynamic attention mechanism to adaptively assign feature weights, enabling the model to jointly enhance both overall dynamic patterns and local details based on sample characteristics.
The experimental results show that the model achieved an accuracy of 96.11% in the three-class classification task on SEED, 86.33% in the four-class classification task on HIED, and 83.67% in classification across normal-hearing and hearing-impaired individuals, significantly outperforming single-feature models and traditional fusion methods.
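
Phase space reconstruction by time-delay embedding can be sketched as follows: each reconstructed state stacks m samples spaced τ apart. Here `m` and `tau` are simply passed in as assumptions; the paper selects them with the minimum cross-prediction error and the G-P method, which this sketch does not reproduce, and the subsequent reduction to a 2-D trajectory image is also omitted.

```python
def delay_embed(x: list[float], m: int, tau: int) -> list[list[float]]:
    """Time-delay embedding: row i is [x[i], x[i + tau], ..., x[i + (m - 1) * tau]]."""
    if m < 1 or tau < 1:
        raise ValueError("m and tau must be positive integers")
    n_states = len(x) - (m - 1) * tau  # number of complete delay vectors
    if n_states <= 0:
        raise ValueError("series too short for this (m, tau)")
    return [[x[i + k * tau] for k in range(m)] for i in range(n_states)]
```

For instance, `delay_embed([0, 1, 2, 3, 4, 5], m=2, tau=2)` returns `[[0, 2], [1, 3], [2, 4], [3, 5]]`: a 6-sample series yields 6 − (2 − 1)·2 = 4 reconstructed states.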

(This article belongs to the Special Issue EEG Signal Processing Techniques and Applications—3rd Edition)

20 pages, 3850 KB

Open Access Article

Dimensional Emotion-Guided Conditional Modulation for Context-Aware Multimodal Driver Affect Recognition

by Wei Shen, Xingang Mou, Jing Yi and Songqing Le

Appl. Sci. 2026, 16(9), 4312; https://doi.org/10.3390/app16094312 - 28 Apr 2026

Viewed by 377

Abstract

Driver emotion recognition constitutes a fundamental pillar of intelligent cockpit systems, playing a pivotal role in enhancing driving safety and optimizing human–machine interaction. Despite the integration of vehicle sensor data in recent multimodal approaches, conventional fusion paradigms frequently encounter performance degradation due to the inherent noise and weak semantic correlation between vehicle telemetry and emotional states. To address these challenges, this study introduces a Dimensional Emotion-Guided Multi-task (DEGM) framework, a novel architecture designed to explicitly formalize the asymmetric roles of visual and vehicular modalities. Rather than employing simplistic feature concatenation, the proposed method maps multivariate vehicle data into a continuous Valence–Arousal–Dominance (VAD) space to characterize latent emotional tendencies within specific driving contexts. These predicted dimensions subsequently serve as semantic priors to conditionally modulate global facial representations through a Feature-wise Linear Modulation (FiLM) mechanism, facilitating robust and interpretable cross-modal interaction. Furthermore, the framework adopts a multi-task learning strategy that jointly optimizes discrete emotion classification and continuous dimension regression, leveraging the latter as a structural regularizer to refine the latent feature space. Comprehensive evaluations on the public PPB driving emotion dataset demonstrate that the proposed DEGM achieves a competitive accuracy of 87.50% and a weighted F1-score of 0.8727. The results validate that our framework provides a lightweight and robust paradigm for context-aware affect sensing, demonstrating strong potential for practical deployment in intelligent transportation systems.
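
FiLM scales and shifts each feature with parameters predicted from a conditioning input, $\mathrm{FiLM}(h) = \gamma(z) \odot h + \beta(z)$. The sketch below uses the predicted VAD triple as $z$ and single linear layers as the $\gamma$ and $\beta$ generators; the generator architecture, the list-based linear algebra and all names are assumptions for illustration, since the abstract does not describe them. Some implementations predict $1 + \gamma$ instead, so that the modulation starts close to the identity.

```python
Vector = list[float]
Matrix = list[list[float]]


def linear(W: Matrix, b: Vector, v: Vector) -> Vector:
    """Affine map W @ v + b on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def film(h: Vector, vad: Vector, Wg: Matrix, bg: Vector, Wb: Matrix, bb: Vector) -> Vector:
    """Modulate facial features h with scale/shift predicted from a VAD vector."""
    gamma = linear(Wg, bg, vad)  # per-feature scale, one row of Wg per feature
    beta = linear(Wb, bb, vad)   # per-feature shift, one row of Wb per feature
    return [g * x + s for g, x, s in zip(gamma, h, beta)]
```

With `h = [1.0, 2.0]`, `vad = [0.5, 0.0, 0.0]`, a scale generator that copies valence and arousal plus a bias of 1, and a shift of `[0.5, -0.5]`, the output is `[2.0, 1.5]`: only the first feature is amplified, because only valence is non-zero.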

26 pages, 2634 KB

Open Access Article

Minimal Angular Facial Representation for Real-Time Emotion Recognition

by Gerardo Garcia-Gil

Appl. Sci. 2026, 16(7), 3572; https://doi.org/10.3390/app16073572 - 6 Apr 2026

Viewed by 733

Abstract

Real-time facial emotion recognition remains challenging due to the high dimensionality and computational cost of dense facial representations, which limit their applicability in resource-constrained and real-time scenarios. This study proposes a compact, anatomically informed angular facial representation for efficient, interpretable emotion recognition under real-time constraints. Facial landmarks are first extracted using a standard landmark detection framework, from which a reduced facial mesh of 27 anatomically selected points is defined. Internal geometric angles computed from this mesh are analyzed using temporal variability and redundancy criteria, resulting in a minimal set of eight angular descriptors that capture the most expressive facial dynamics while preserving geometric invariance and computational efficiency. The proposed representation is evaluated using multiple supervised machine learning classifiers under two complementary validation strategies: stratified frame-level cross-validation and strict Leave-One-Subject-Out evaluation. Under mixed-subject stratified validation, the best-performing model (MLP) achieved macro-averaged F1-scores exceeding 0.95 and near-unity ROC–AUC values. However, subject-independent evaluation revealed reduced generalization performance (average accuracy ≈55%), highlighting the influence of inter-subject morphological variability embedded in absolute angular descriptors. These findings indicate that a minimal angular geometric encoding provides strong intra-subject discriminative capability while transparently characterizing its cross-subject generalization limits, offering a practical and interpretable alternative for data- and resource-constrained real-time scenarios.
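
A minimal sketch of one angular descriptor: the internal angle at a landmark formed with two neighboring landmarks. The abstract does not list which 27 points or which eight angles are used, so the `internal_angle` helper and its inputs are hypothetical. Because the angle depends only on directions, it is unchanged when the face is translated, rotated or uniformly scaled, which plausibly underlies the geometric invariance mentioned above; it still varies with each person's facial proportions, consistent with the weaker Leave-One-Subject-Out result.

```python
import math

Point = tuple[float, float]


def internal_angle(a: Point, vertex: Point, c: Point) -> float:
    """Angle in degrees at `vertex` between the rays vertex->a and vertex->c."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (c[0] - vertex[0], c[1] - vertex[1])
    if v1 == (0, 0) or v2 == (0, 0):
        raise ValueError("landmarks must be distinct from the vertex")
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # atan2(|cross|, dot) avoids the rounding problems of acos(dot / norms)
    # near 0 and 180 degrees, and always returns a value in [0, 180].
    return math.degrees(math.atan2(abs(cross), dot))
```

For example, the angle at `(0, 0)` between `(2, 1)` and `(1, 3)` is 45 degrees, and doubling all coordinates and shifting them by `(5, 5)` leaves it at 45 degrees.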

(This article belongs to the Topic Applied Computer Vision and Pattern Recognition: 2nd Edition)

34 pages, 3911 KB

Open Access Article

PAD-Guided Multimodal Hybrid Contrastive Emotion Recognition upon STEM-E²VA Dataset

by Shufei Duan, Wenjie Zhang, Liangqi Li, Ting Zhu, Fangyu Zhao, Fujiang Li and Huizhi Liang

Multimodal Technol. Interact. 2026, 10(4), 38; https://doi.org/10.3390/mti10040038 - 2 Apr 2026

Viewed by 760

Abstract

Speech emotion recognition still faces several challenges: single-modal information has limited representational capacity, discrete emotion annotations struggle to capture continuous emotional transitions, and multimodal fusion methods continue to suffer from structural differences between modalities and from cross-sample alignment issues. To address these, this study works from both the data and the model perspectives. On the data side, a Chinese multimodal database, STEM-E²VA, was constructed by synchronously collecting four modalities: articulatory kinematics, acoustics, glottal signals, and video. It covers seven discrete emotion categories and employs continuous PAD annotation. By integrating discrete and continuous dimensional annotations, it better distinguishes strong from weak emotions under the same discrete emotion label. In addition, to address biases in the PAD annotations, we used the SCL-90 psychological questionnaire to analyze annotators’ cognitive and emotional perceptions, thereby ensuring data reliability. On the model side, this paper proposes a multimodal supervised contrastive fusion network incorporating PAD perception. It employs a PAD-enhanced hybrid contrastive loss function to optimize intra-modal and inter-modal feature alignment. Using a cross-attention mechanism combined with a GRU–Transformer network for temporal feature extraction, it achieves deep fusion of multimodal information, reducing inter-modal discrepancies and cross-class confusion. Experiments demonstrate that the proposed method achieves 85.47% accuracy in discrete emotion recognition on STEM-E²VA, with a substantial reduction in RMSE for PAD dimension prediction. It also generalizes well to IEMOCAP, providing a novel framework for integrating discrete and continuous emotion representations.

29 pages, 7368 KB

Open Access Article

Method for Emotion Recognition of EEG Signals Based on Recursive Graph and Spatiotemporal Attention Mechanism

by Dong Huang, Lin Xu and Yuwen Li

Brain Sci. 2026, 16(4), 377; https://doi.org/10.3390/brainsci16040377 - 30 Mar 2026

Viewed by 701

Abstract

Emotion recognition plays a crucial role in human–computer interaction and mental health applications. Traditional Electroencephalogram (EEG)-based emotion recognition methods are limited in classification accuracy due to their neglect of the spatiotemporal characteristics of the signals and individual differences. This study proposes a novel EEG emotion recognition framework that integrates spatiotemporal features to enhance performance through the following innovations: (1) the use of a Recurrence Plot (RP) to transform one-dimensional EEG signals into two-dimensional images, enhancing the representation of nonlinear dynamic features; (2) the design of a Spatiotemporal Channel Attention Module (TCSA), which combines temporal convolution, channel, and spatial attention mechanisms to optimize the capture of complex patterns; and (3) the integration of the lightweight and efficient network Efficientnet to construct the TCSA-Efficientnet classification model. On the Database for Emotion Analysis using Physiological Signals (DEAP) dataset, the proposed method achieves accuracy rates of 99.11% and 99.33% for valence and arousal classification tasks, respectively. On the Database for Emotion Recognition Using EEG and Physiological Signals (DREAMER) dataset, the method achieves accuracy rates of 98.08% and 97.49%, outperforming other EEG-based emotion classification models on both datasets. This demonstrates its advantages in accuracy, robustness, and generalization.
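
The recurrence-plot step in (1) can be sketched as below. The standard binary definition is $R_{ij} = \Theta(\varepsilon - \lVert x_i - x_j \rVert)$; this sketch applies it directly to scalar samples with a fixed threshold `eps` and no delay embedding. Both simplifications are assumptions, since the abstract does not state the embedding, the threshold, or whether an unthresholded distance matrix is used as the image.

```python
def recurrence_plot(x: list[float], eps: float) -> list[list[int]]:
    """Binary recurrence matrix: R[i][j] = 1 if |x[i] - x[j]| <= eps, else 0.

    The result is symmetric with ones on the diagonal; rendered as an image,
    repeated (recurrent) states show up as off-diagonal blocks and lines.
    """
    if eps < 0:
        raise ValueError("eps must be non-negative")
    return [[1 if abs(xi - xj) <= eps else 0 for xj in x] for xi in x]
```

For the three samples `[0.0, 0.1, 1.0]` with `eps = 0.2`, the first two samples recur with each other and the third only with itself, giving `[[1, 1, 0], [1, 1, 0], [0, 0, 1]]`.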

(This article belongs to the Section Computational Neuroscience, Neuroinformatics, and Neurocomputing)

18 pages, 1085 KB

Open Access Article

Self-Learning Multimodal Emotion Recognition Based on Multi-Scale Dilated Attention

by Xiuli Du and Luyao Zhu

Brain Sci. 2026, 16(4), 350; https://doi.org/10.3390/brainsci16040350 - 25 Mar 2026

Viewed by 684

Abstract

Background/Objectives: Emotions can be recognized through external behavioral cues and internal physiological signals. Owing to the inherently complex psychological and physiological nature of emotions, models relying on a single modality often suffer from limited robustness. This study aims to improve emotion recognition performance by effectively integrating electroencephalogram (EEG) signals and facial expressions through a multimodal framework. Methods: We propose a multimodal emotion recognition model that employs a Multi-Scale Dilated Attention Convolution (MSDAC) network tailored for facial expression recognition, integrates an EEG emotion recognition method based on three-dimensional features, and adopts a self-learning decision-level fusion strategy. MSDAC incorporates Multi-Scale Dilated Convolutions and a Dual-Branch Attention (D-BA) module to capture discontinuous facial action units. For EEG processing, raw signals are converted into a multidimensional time–frequency–spatial representation to preserve temporal, spectral, and spatial information. To overcome the limitations of traditional concatenation or fixed-weight fusion approaches, a self-learning weight fusion mechanism is introduced at the decision level to adaptively adjust modality contributions. Results: The facial analysis branch achieved average accuracies of 74.1% on FER2013, 99.69% on CK+, and 98.05% (valence)/96.15% (arousal) on DEAP. On the DEAP dataset, the complete multimodal model reached 98.66% accuracy for valence and 97.49% for arousal classification. Conclusions: The proposed framework enhances emotion recognition by improving facial feature extraction and enabling adaptive multimodal fusion, demonstrating the effectiveness of combining EEG and facial information for robust emotion analysis.
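
Self-learning decision-level fusion can be illustrated with a small sketch. The parameterisation below is an assumption: one learnable logit per modality, normalised by a softmax into weights that sum to one, then used to average the class-probability vectors of the face and EEG branches. The abstract does not specify how the weights are produced or trained, and the gradient update of `theta` is omitted here.

```python
import math


def softmax(z: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def fuse(probs_per_modality: list[list[float]], theta: list[float]) -> list[float]:
    """Weighted average of per-modality class probabilities.

    theta holds one (hypothetical) learnable logit per modality; softmax keeps
    the weights positive and summing to one, so the output is still a
    probability distribution over classes.
    """
    w = softmax(theta)
    n_classes = len(probs_per_modality[0])
    return [sum(wk * p[c] for wk, p in zip(w, probs_per_modality))
            for c in range(n_classes)]
```

With equal logits the result is a plain average, e.g. `fuse([[0.8, 0.2], [0.4, 0.6]], [0.0, 0.0])` is about `[0.6, 0.4]`; as training pushes one logit up, the fused output moves toward that modality's prediction, unlike a fixed-weight scheme.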

(This article belongs to the Section Cognitive, Social and Affective Neuroscience)

45 pages, 2643 KB

Open Access Article

From Complexity Theory to Computational Wisdom: Enhancing EEG–Neurotransmitter Models Through Sophimatics for Brain Data Analysis

by Gerardo Iovane and Giovanni Iovane

Algorithms 2026, 19(3), 237; https://doi.org/10.3390/a19030237 - 22 Mar 2026

Cited by 1 | Viewed by 665

Abstract

The analysis of brain data through electroencephalography (EEG) has become essential in neuroscience, affective computing, and brain–computer interfaces. Recent work associates EEG features with artificial neurotransmitter models, simulating emotions and rational–emotional decision-making using complexity theory. However, current methods face limitations: (1) linear temporal representations lacking memory and anticipation, (2) limited contextual adaptation, (3) difficulty with paradoxical affective states, and (4) absence of ethical reasoning in decision-making. We present a framework based on Sophimatics, using complex time ($t = t_{\text{real}} + i \cdot t_{\text{imag}} \in \mathbb{C}$), where $t_{\text{real}}$ represents chronology and $t_{\text{imag}}$ encodes experiential dimensions including memory depth and anticipatory imagination. The Super Time Cognitive Neural Network (STCNN) architecture enables the parallel processing of objective time sequences and subjective cognitive experiences. Our Sophimatics-assisted EEG analysis achieves: (1) two-dimensional temporal coherence integrating past experiences and future projections, (2) context-sensitive adaptation via ontological knowledge graphs, (3) interpretable symbolic reasoning compatible with clinical psychology, (4) mechanisms for resolving affective paradoxes, and (5) ethical constraints ensuring value-based decision-making. Across three case studies (emotion recognition, meditation-induced transitions, and brain–computer interface decision support), integrated Sophimatics models outperform traditional machine learning (15–22% accuracy improvement) and complexity theory models (8–14% improvement), while offering greater cognitive richness and immunity to incomplete data. Results establish a post-generative AI framework with computational wisdom: relationally interactive, ethically informed, and temporally consistent with human cognitive and affective life. The framework outlines paths toward next-generation neuromorphic systems achieving genuine understanding beyond pattern recognition.

(This article belongs to the Special Issue Machine Learning Techniques for Brain Data Analysis Using EEG, EMG or Image Data)

32 pages, 7928 KB

Open Access Article

eXCube2: Explainable Brain-Inspired Spiking Neural Network Framework for Emotion Recognition from Audio, Visual and Multimodal Audio–Visual Data

by N. K. Kasabov, A. Yang, Z. Wang, I. Abouhassan, A. Kassabova and T. Lappas

Biomimetics 2026, 11(3), 208; https://doi.org/10.3390/biomimetics11030208 - 14 Mar 2026

Viewed by 940

Abstract

This paper introduces a biomimetic framework and novel brain-inspired AI (BIAI) models based on spiking neural networks (SNNs) for emotional state recognition from audio (speech), visual (face), and integrated multimodal audio–visual data. The developed framework, named eXCube2, uses NeuCube, a three-dimensional SNN architecture that is spatially structured according to a human brain template. The BIAI models developed in eXCube2 are trainable on spatio- and spectro-temporal data using brain-inspired learning rules. Such models are explainable in terms of revealing patterns in data and are adaptable to new data. The eXCube2 models are implemented as software systems and tested on speech and video data of subjects expressing emotional states. The use of a brain template for the SNN structure enables brain-inspired tonotopic and stereo mapping of audio inputs, topographic mapping of visual data, and the combined use of both modalities. This novel approach brings AI-based emotional state recognition closer to human perception and provides better explainability and adaptability than existing AI systems. It also yields higher or competitive accuracy, even though this was not the main goal here. This is demonstrated through experiments on benchmark datasets, achieving classification accuracy above 80% on single-modality data and 88.9% when multimodal audio–visual data are used; a “don’t know” output is also introduced. The paper further discusses possible applications of the proposed eXCube2 framework to other audio, visual, and audio–visual data for solving challenging problems, such as recognizing emotional states of people from different origins; brain state diagnosis (e.g., Parkinson’s disease, Alzheimer’s disease, ADHD, dementia); measuring response to treatment over time; evaluating satisfaction responses from online clients; cognitive robotics; human–robot interaction; chatbots; and interactive computer games.
The SNN-based implementation of BIAI also enables the use of neuromorphic chips and platforms, leading to reduced power consumption, smaller device size, higher performance accuracy, and improved adaptability and explainability. This research represents a step toward building brain-inspired AI systems.

26 pages, 3226 KB

Open Access Article

Assessing Street-Level Emotional Perception in Urban Regeneration Contexts Using Domain-Adapted CLIP

by Liyang Chu and Keting Zhou

Buildings 2026, 16(5), 980; https://doi.org/10.3390/buildings16050980 - 2 Mar 2026

Viewed by 602

Abstract

As urban regeneration goals shift from physical improvement to pedestrian-level experience and emotional perception, existing assessment methods struggle to describe the emotional responses associated with renewed street environments. This paper proposes a framework for street-level emotional perception inference and analysis within the context of urban regeneration, enabling automatic semantic recognition of emotional perception from Street View Images (SVIs) using a Vision-Language Model (VLM). The paper constructs a six-dimensional emotional perception framework encompassing Comfort, Vitality, Safety, Oppressiveness, Nostalgia, and Alienation, and uses a lightweight domain-adapted Contrastive Language-Image Pre-training (CLIP) model to infer emotional perceptions from SVIs. Building on this, a dual-axis evaluation framework is introduced to structure and interpret basic spatial experience and regeneration-related perception. Using the Yuyuan Road and Wuding Road areas in Shanghai as a case study, the paper combines emotional perception results with street-level spatial analysis, proposing a scalable and interpretable analytical method for diagnosing urban regeneration outcomes and supporting emotion-informed spatial interventions.
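
To make the CLIP-based inference step concrete, here is a minimal sketch of zero-shot scoring for one perception dimension. It assumes that image and text embeddings have already been produced by a CLIP-style encoder, and it scores a dimension by contrasting a hypothetical positive/negative prompt pair (for example "a comfortable street" versus "an uncomfortable street"); the paper's actual prompts, domain adaptation and scoring rule are not given in the abstract. The logit scale of 100 mirrors the cap CLIP places on its learned temperature, but is an assumption here as well.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_score(img: list[float], pos_txt: list[float], neg_txt: list[float],
               scale: float = 100.0) -> float:
    """Probability of the positive prompt under a two-way softmax over
    scaled cosine similarities (hypothetical per-dimension score)."""
    a = scale * cosine(img, pos_txt)
    b = scale * cosine(img, neg_txt)
    m = max(a, b)  # subtract the max for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)
```

An image embedding aligned with the positive prompt scores close to 1, one equidistant from both prompts scores exactly 0.5; repeating this for each of the six dimensions would give a six-value perception profile per street view image.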

(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

14 pages, 1026 KB

Open Access Article

STHMA: Decoupling Spatio-Temporal Dynamics in EEG via Hybrid State Space Modeling

by Shuo Yang, Lintong Zhang, Youyi Cheng, Yingying Zheng, Shuai Zheng, Jiahui Guo and Lirong Zheng

Brain Sci. 2026, 16(3), 267; https://doi.org/10.3390/brainsci16030267 - 27 Feb 2026

Viewed by 668

Abstract

Background/Objectives: Decoding affective states from Electroencephalography (EEG) signals is fundamental to non-invasive Brain–Computer Interfaces. Despite recent advances, accurate recognition is impeded by the inherently non-stationary nature of physiological signals and the entanglement of spatio-temporal dynamics within high-dimensional recordings. While Transformers excel at global modeling, they often neglect the continuous dynamical properties of neural signals and suffer from quadratic complexity. Methods: In this paper, we propose the Spatio-Temporal Hybrid Mamba-Attention (STHMA), a framework designed to explicitly disentangle and model EEG dynamics via linear-complexity State Space Models. First, to incorporate domain knowledge, we introduce a Dual-Domain Physics-Aware Embedding module. This module fuses learnable temporal convolutions with explicit frequency-domain spectral features, ensuring fidelity to neurophysiological principles. Second, we propose a novel Decoupled Spatial–Temporal Scanning strategy. By dynamically reconfiguring the serialization of the data tensor, our model strictly separates the learning of instantaneous functional connectivity from the tracking of emotional state evolution, thereby preventing the structural collapse common in 1D sequence models. Results: Extensive experiments on the FACED and SEED-V datasets demonstrate that the STHMA achieves state-of-the-art performance, significantly exceeding the random chance baselines (11.11% for 9-class FACED and 20.00% for 5-class SEED-V). Conclusions: The results validate that combining Physics-Aware Embeddings with decoupled state-space modeling offers a scalable and effective paradigm for EEG emotion recognition.
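
The chance baselines quoted above follow from uniform guessing over K classes; the short check below assumes the reported figures are exactly 1/K, which matches both numbers.

```latex
p_{\text{chance}} = \frac{1}{K}, \qquad
K = 9\ (\text{FACED}):\ \frac{1}{9} \approx 0.1111 = 11.11\%, \qquad
K = 5\ (\text{SEED-V}):\ \frac{1}{5} = 0.2000 = 20.00\%
```

For class-imbalanced test sets, the majority-class rate can exceed 1/K and is then the stricter baseline.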

(This article belongs to the Section Cognitive, Social and Affective Neuroscience)

24 pages, 1972 KB

Open Access Article

Exploring the Topics and Sentiments of AI-Related Public Opinions: An Advanced Machine Learning Text Analysis

by Wullianallur Raghupathi, Jie Ren and Tanush Kulkarni

Information 2026, 17(2), 134; https://doi.org/10.3390/info17020134 - 1 Feb 2026

Viewed by 3546

Abstract

This study investigates the evolution of public sentiment and discourse surrounding artificial intelligence through a comprehensive multi-method analysis of 28,819 Reddit comments spanning March 2015 to May 2024. Addressing three research questions, namely (1) what dominant topics characterize AI discourse, (2) how sentiment has changed over time, particularly following ChatGPT's release, and (3) what linguistic patterns distinguish positive from negative discourse, we employ 28 distinct analytical techniques to provide validated insights into public AI perception. Methodologically, the study integrates VADER sentiment analysis, Linguistic Inquiry and Word Count (LIWC) analysis with regression validation, dual topic modeling using Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) for cross-validation, four-dimensional tone analysis, named entity recognition, emotion detection, and advanced NLP techniques including sarcasm detection, stance classification, and toxicity analysis. A key methodological contribution is the validation of LIWC categories through linear regression (R² = 0.049, p < 0.001) and logistic regression (61% accuracy), moving beyond the descriptive statistics typical of prior linguistic analyses. Results reveal a pronounced decline in sentiment, from +0.320 in 2015 to +0.053 in 2024. Contrary to expectations, sentiment decreased following ChatGPT's November 2022 release, with the share of negative comments rising from 31.9% to 35.1%, suggesting that direct exposure to powerful AI capabilities intensifies rather than alleviates public concerns. LIWC regression analysis identified negative emotion words (β = −0.083) and positive emotion words (β = +0.063) as the strongest sentiment predictors, confirming that affective rather than technical engagement drives public AI attitudes.
Topic modeling revealed nine coherent themes, with facial recognition, algorithmic bias, AI ethics, and social media misinformation emerging as dominant concerns across both LDA and NMF analyses. Network analysis identified regulation as a central hub (degree centrality = 0.929) connecting all major AI concerns, indicating strong public appetite for governance frameworks. These findings contribute to the theoretical understanding of technology risk perception, provide practical guidance for AI developers and policymakers, and demonstrate validated computational methods for tracking public opinion toward emerging technologies.
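The sentiment trajectory and the before/after-ChatGPT comparison in this abstract come down to aggregating per-comment scores. The following is a minimal sketch, assuming each comment already carries a VADER compound score in [−1, 1] and a posting date. It labels scores with the ±0.05 cut-offs recommended in the VADER documentation, which the abstract does not confirm the study used. The helper names, the 30 November 2022 cut-off (ChatGPT's public launch) and the toy comments are illustrative, not taken from the paper.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Cut-offs recommended in the VADER documentation for the compound score;
# whether the study used exactly these thresholds is an assumption.
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def label(compound: float) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def yearly_mean(comments):
    """comments: iterable of (date, compound) pairs -> {year: mean compound}."""
    by_year = defaultdict(list)
    for day, compound in comments:
        by_year[day.year].append(compound)
    return {year: mean(scores) for year, scores in sorted(by_year.items())}


def negative_share(compounds) -> float:
    """Fraction of scores that fall in the negative band."""
    compounds = list(compounds)
    return sum(label(c) == "negative" for c in compounds) / len(compounds)


def before_after(comments, cutoff: date):
    """Negative share strictly before the cutoff date vs. on/after it."""
    before = [c for day, c in comments if day < cutoff]
    after = [c for day, c in comments if day >= cutoff]
    return negative_share(before), negative_share(after)


# Invented toy comments, not data from the study.
toy = [
    (date(2021, 5, 1), 0.6),
    (date(2022, 1, 3), -0.4),
    (date(2023, 2, 1), -0.7),
    (date(2023, 6, 9), -0.3),
    (date(2023, 8, 2), 0.02),
]
print(yearly_mean(toy))
print(before_after(toy, date(2022, 11, 30)))  # (0.5, 0.6666666666666666)
```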

(This article belongs to the Collection Natural Language Processing and Applications: Challenges and Perspectives)
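The degree centrality of 0.929 reported for the regulation hub is a normalised count: a node's number of neighbours divided by n − 1, the most it could have in a graph of n nodes, so 1.0 means the node touches every other concept. The sketch below assumes, hypothetically, that two concepts are linked whenever they co-occur in the same comment; the abstract does not say how the network's edges were defined, and the toy concept sets are invented.

```python
from itertools import combinations


def cooccurrence_graph(comments):
    """Hypothetical construction: link two concepts whenever they appear in the same comment.

    comments: iterable of sets of concept labels already extracted per comment.
    Returns (nodes, undirected edges as sorted tuples, without duplicates).
    """
    nodes, edges = set(), set()
    for concepts in comments:
        nodes.update(concepts)
        for u, v in combinations(sorted(concepts), 2):
            edges.add((u, v))
    return nodes, edges


def degree_centrality(nodes, edges):
    """Normalised degree centrality: neighbours / (n - 1), the convention networkx uses."""
    degree = {node: 0 for node in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(nodes)
    return {node: d / (n - 1) for node, d in degree.items()}


# Invented toy input; concept names echo the abstract, the co-occurrences do not.
toy_comments = [
    {"regulation", "facial recognition"},
    {"regulation", "algorithmic bias", "ai ethics"},
    {"regulation", "misinformation"},
    {"ai ethics", "algorithmic bias"},
]
centrality = degree_centrality(*cooccurrence_graph(toy_comments))
print(max(centrality, key=centrality.get), centrality["regulation"])  # regulation 1.0
```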

23 pages, 2450 KB

Open Access Article

DAFT: Domain-Augmented Fine-Tuning for Large Language Models in Emotion Recognition of Health Misinformation

by Youlin Zhao, Xingmi Zhu, Wanqing Tang, Linxing Zhou, Li Feng and Mingwei Tang

Appl. Sci. 2025, 15(23), 12690; https://doi.org/10.3390/app152312690 - 29 Nov 2025

Viewed by 935

Abstract

This study proposes a domain-augmented fine-tuning strategy for improving emotion recognition in health misinformation using pre-trained large language models (LLMs). The proposed method aims to address key limitations of existing approaches, including insufficient precision, weak domain adaptability, and low recognition accuracy for complex emotional expressions in health-related misinformation. Specifically, the Domain-Augmented Fine-Tuning (DAFT) method extends a health emotion lexicon to annotate emotion-oriented corpora, designs task-specific prompt templates to enhance semantic understanding, and fine-tunes GPT-based LLMs through parameter-efficient prompt tuning. Empirical experiments conducted on a health misinformation dataset demonstrate that DAFT substantially improves model performance in terms of prediction error, emotional vector structural similarity, probability distribution consistency, and classification accuracy. The fine-tuned GPT-4o model achieves the best overall performance, attaining an emotion recognition accuracy of 84.77%, with its F1-score increasing by 20.78% relative to the baseline model. Nonetheless, the corpus constructed in this study is based on a six-dimensional emotion framework, which may not fully capture nuanced emotions in complex linguistic contexts. Moreover, the dataset is limited to textual information, and future research should incorporate multimodal data such as images and videos. Overall, the DAFT method effectively enhances the domain adaptability of LLMs and provides a lightweight yet efficient approach to emotion recognition in health misinformation scenarios.
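Lexicon-based annotation, the first DAFT step, can be pictured as counting lexicon hits per emotion and normalising the counts into a distribution. A minimal sketch, assuming six Ekman-style categories (the abstract states only that the framework is six-dimensional) and a hypothetical miniature lexicon; extending the lexicon is modelled simply as merging in domain-specific entries. None of the words, labels or function names come from the paper.

```python
import re

# Assumed six dimensions (Ekman-style basic emotions); the abstract says only
# that the framework is six-dimensional.
EMOTIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise")

# Hypothetical miniature lexicon mapping a word to one emotion dimension.
BASE_LEXICON = {
    "outraged": "anger",
    "toxic": "disgust",
    "deadly": "fear",
    "cure": "happiness",
    "grief": "sadness",
    "shocking": "surprise",
}


def extend_lexicon(base, domain_entries):
    """Return a copy of the base lexicon with domain entries added (domain entries win)."""
    merged = dict(base)
    merged.update(domain_entries)
    return merged


def annotate(text, lexicon):
    """Return a normalised six-dimensional emotion vector, or None if no lexicon word occurs."""
    counts = dict.fromkeys(EMOTIONS, 0)
    for token in re.findall(r"[a-z']+", text.lower()):
        emotion = lexicon.get(token)
        if emotion is not None:
            counts[emotion] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    return [counts[e] / total for e in EMOTIONS]


# Hypothetical health-specific additions.
lexicon = extend_lexicon(BASE_LEXICON, {"miracle": "surprise", "hoax": "anger"})
print(annotate("Shocking miracle cure! Doctors hide this deadly truth", lexicon))
# → [0.0, 0.0, 0.25, 0.25, 0.0, 0.5]
```

Texts with no lexicon hit return None rather than a uniform vector, so they can be routed to manual annotation instead of being silently labelled.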

(This article belongs to the Section Computing and Artificial Intelligence)
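The DAFT abstract lists four evaluation criteria (prediction error, structural similarity of emotion vectors, probability-distribution consistency and classification accuracy) without giving formulas. One plausible reading, sketched here as an assumption rather than the paper's definitions, is mean squared error, cosine similarity, Jensen–Shannon divergence and argmax accuracy over six-dimensional emotion distributions; the example vectors are invented.

```python
import math


def mse(pred, ref):
    """Mean squared error between two emotion vectors (prediction error)."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)


def cosine_similarity(pred, ref):
    """Cosine of the angle between two emotion vectors (structural similarity)."""
    dot = sum(p * r for p, r in zip(pred, ref))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(r * r for r in ref))
    return dot / norm if norm else 0.0


def _kl(p, q):
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (distribution consistency): 0 = identical, ln 2 = disjoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def accuracy(preds, refs):
    """Share of items whose highest-probability emotion matches the reference label."""
    def argmax(v):
        return max(range(len(v)), key=v.__getitem__)
    return sum(argmax(p) == argmax(r) for p, r in zip(preds, refs)) / len(refs)


# Invented six-dimensional predicted and reference distributions.
pred = [0.1, 0.0, 0.2, 0.1, 0.1, 0.5]
ref = [0.0, 0.0, 0.25, 0.25, 0.0, 0.5]
print(mse(pred, ref), cosine_similarity(pred, ref), js_divergence(pred, ref))
```

Jensen–Shannon divergence is used here instead of plain KL divergence because it stays finite when the prediction assigns zero probability to an emotion the reference contains.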

22 pages, 1773 KB

Open Access Article

ACE-Net: A Fine-Grained Deepfake Detection Model with Multimodal Emotional Consistency

by Shaoqian Yu, Xingyu Chen, Yuzhe Sheng, Han Zhang, Xinlong Li and Sijia Yu

Electronics 2025, 14(22), 4420; https://doi.org/10.3390/electronics14224420 - 13 Nov 2025

Viewed by 1422

Abstract

The alarming realism of deepfakes presents a significant challenge to digital authenticity, yet their inherent difficulty in synchronizing emotional cues between facial expressions and speech offers a critical opportunity for detection. However, most existing approaches rely on general-purpose backbones for unimodal feature extraction, resulting in an inadequate representation of fine-grained dynamic emotional expressions. Although a limited number of studies have explored cross-modal emotional consistency for deepfake detection, they typically employ shallow fusion techniques that limit latent expressiveness. To address these limitations, we propose ACE-Net, a novel framework that identifies forgeries via multimodal emotional inconsistency. For the speech modality, we design a bidirectional cross-attention mechanism that fuses acoustic features from a lightweight CNN-based model with textual features, yielding a representation highly sensitive to fine-grained emotional dynamics. For the visual modality, we propose a MobileNetV3-based perception head that adaptively selects keyframes, yielding a representation focused on the most emotionally salient moments. For multimodal emotional consistency discrimination, we develop a multi-dimensional fusion strategy that deeply integrates high-level emotional features from different modalities within a unified latent space. For unimodal emotion recognition, both the audio and visual branches outperform baseline models on the CREMA-D dataset. Building on this, the complete ACE-Net model achieves a state-of-the-art AUC of 0.921 on the challenging DFDC benchmark.
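The bidirectional cross-attention used to fuse acoustic and textual features can be illustrated with single-head scaled dot-product attention run in both directions: audio frames query the transcript tokens, and tokens query the audio frames. This NumPy sketch is an assumption about the general mechanism only; the single head, residual connections, mean pooling and random projection matrices are placeholders, not ACE-Net's published configuration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: each query row attends over the context rows."""
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)  # (T_query, T_context)
    return weights @ v  # (T_query, d)


def bidirectional_fusion(audio, text, params):
    """audio: (T_a, d) acoustic frames; text: (T_t, d) token embeddings.

    Audio attends to text and text attends to audio; each stream keeps a residual
    connection and is mean-pooled over time, then the two are concatenated.
    Residuals, pooling and the single head are assumptions, not ACE-Net details.
    """
    audio_ctx = audio + cross_attention(audio, text, *params["audio_to_text"])
    text_ctx = text + cross_attention(text, audio, *params["text_to_audio"])
    return np.concatenate([audio_ctx.mean(axis=0), text_ctx.mean(axis=0)])


rng = np.random.default_rng(0)
d = 16
# Random stand-ins for learned projection matrices (w_q, w_k, w_v) per direction.
params = {
    name: tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))
    for name in ("audio_to_text", "text_to_audio")
}
audio = rng.normal(size=(50, d))  # e.g. 50 frames from a CNN audio encoder
text = rng.normal(size=(12, d))   # e.g. 12 transcript token embeddings
fused = bidirectional_fusion(audio, text, params)
print(fused.shape)  # (32,)
```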

(This article belongs to the Special Issue Computer Vision and Pattern Recognition Based on Machine Learning)
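ACE-Net's headline result is an AUC of 0.921 on DFDC. ROC AUC equals the probability that a randomly chosen fake sample receives a higher detector score than a randomly chosen real one, with ties counted as half. The pairwise sketch below computes it directly from that definition; treating fakes as the positive class (label 1) is an assumed convention, and the scores are invented.

```python
def roc_auc(labels, scores):
    """ROC AUC as the Mann-Whitney statistic: P(score of a random fake > score of a random real).

    labels: 1 = fake (positive class, an assumed convention), 0 = real.
    Ties count as half a win. O(P * N) pairs, fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented detector scores for four clips.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Read this way, an AUC of 0.921 means a fake outranks a real sample in roughly 92% of fake/real pairs. The pairwise loop is for clarity; a rank-based computation is preferable at full benchmark scale.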

Search Results (127)
