Search Results (194)

Search Parameters:
Keywords = multi-emotion classification

18 pages, 1085 KB  
Article
Self-Learning Multimodal Emotion Recognition Based on Multi-Scale Dilated Attention
by Xiuli Du and Luyao Zhu
Brain Sci. 2026, 16(4), 350; https://doi.org/10.3390/brainsci16040350 - 25 Mar 2026
Viewed by 308
Abstract
Background/Objectives: Emotions can be recognized through external behavioral cues and internal physiological signals. Owing to the inherently complex psychological and physiological nature of emotions, models relying on a single modality often suffer from limited robustness. This study aims to improve emotion recognition performance by effectively integrating electroencephalogram (EEG) signals and facial expressions through a multimodal framework. Methods: We propose a multimodal emotion recognition model that employs a Multi-Scale Dilated Attention Convolution (MSDAC) network tailored for facial expression recognition, integrates an EEG emotion recognition method based on three-dimensional features, and adopts a self-learning decision-level fusion strategy. MSDAC incorporates Multi-Scale Dilated Convolutions and a Dual-Branch Attention (D-BA) module to capture discontinuous facial action units. For EEG processing, raw signals are converted into a multidimensional time–frequency–spatial representation to preserve temporal, spectral, and spatial information. To overcome the limitations of traditional stitching or fixed-weight fusion approaches, a self-learning weight fusion mechanism is introduced at the decision level to adaptively adjust modality contributions. Results: The facial analysis branch achieved average accuracies of 74.1% on FER2013, 99.69% on CK+, and 98.05% (valence)/96.15% (arousal) on DEAP. On the DEAP dataset, the complete multimodal model reached 98.66% accuracy for valence and 97.49% for arousal classification. Conclusions: The proposed framework enhances emotion recognition by improving facial feature extraction and enabling adaptive multimodal fusion, demonstrating the effectiveness of combining EEG and facial information for robust emotion analysis. Full article
(This article belongs to the Section Cognitive, Social and Affective Neuroscience)
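
As a rough illustration of the self-learning decision-level fusion described in this abstract, the sketch below blends the class probabilities of a facial branch and an EEG branch with softmax-normalized learnable weights that are trained jointly with the rest of the network; the module and tensor names are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfLearningDecisionFusion(nn.Module):
    """Blend per-branch class probabilities with learnable, softmax-normalized weights."""

    def __init__(self, num_branches: int = 2):
        super().__init__()
        # One raw weight per modality branch; the softmax keeps the blend convex.
        self.raw_weights = nn.Parameter(torch.zeros(num_branches))

    def forward(self, branch_logits: list) -> torch.Tensor:
        weights = torch.softmax(self.raw_weights, dim=0)          # (num_branches,)
        probs = [F.softmax(logits, dim=-1) for logits in branch_logits]
        fused = sum(w * p for w, p in zip(weights, probs))        # (batch, num_classes)
        return fused

# Toy usage: fuse hypothetical facial-expression and EEG branch outputs (binary valence).
face_logits = torch.randn(8, 2)   # from the facial (MSDAC-style) branch
eeg_logits = torch.randn(8, 2)    # from the EEG branch
fusion = SelfLearningDecisionFusion()
print(fusion([face_logits, eeg_logits]).shape)   # torch.Size([8, 2])
```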

26 pages, 3519 KB  
Article
Subject-Independent Depression Recognition from EEG Using an Improved Bidirectional LSTM with Dynamic Vector Routing
by Ziqi Ji, Kunye Liu, Weikai Ma, Xiaolin Ning and Yang Gao
Bioengineering 2026, 13(3), 358; https://doi.org/10.3390/bioengineering13030358 - 19 Mar 2026
Viewed by 557
Abstract
Electroencephalography (EEG) has become an increasingly important tool in depression research due to its ability to capture objective neurophysiological abnormalities associated with depressive disorders, offering high temporal resolution, non-invasiveness, and cost-effectiveness. However, existing methods often fail to fully exploit the multi-domain information in EEG signals, resulting in limited model generalization capabilities. This paper proposes an improved bidirectional long short-term memory (BiLSTM) model that segments continuous EEG into non-overlapping 2-s epochs and learns end-to-end from multi-channel temporal sequences. After band-pass filtering and resampling, each epoch is represented as a channel–time matrix X ∈ ℝ^{C×T} (with C = 128) and processed by a BiLSTM encoder followed by a dynamic-routing encapsulated-vector classifier. On the MODMA dataset under subject-independent five-fold cross-validation, the proposed method outperforms a set of reproduced representative baselines (SVM, EEGNet, InceptionNet, Self-attention-CNN and CNN–LSTM) and achieves 84.8% accuracy with an AUC of 0.899. We further discuss recent directions (e.g., attention/Transformer-based and emotion-aware expert models) and clarify the scope of our empirical comparisons. Furthermore, experiments comparing different frequency bands and band combinations indicate that joint multi-frequency input can enhance classification performance. This study provides an effective multi-domain fusion approach for the automatic diagnosis of depression based on EEG. Full article
(This article belongs to the Section Biosignal Processing)
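
The preprocessing and encoder described above can be approximated as follows: continuous EEG is cut into non-overlapping 2-s epochs of shape (C, T) and fed to a BiLSTM over the time axis. This is a minimal sketch with a plain linear head standing in for the paper's dynamic-routing vector classifier; the sampling rate and sizes are assumptions.

```python
import numpy as np
import torch
import torch.nn as nn

def epoch_eeg(eeg: np.ndarray, fs: int, epoch_sec: float = 2.0) -> np.ndarray:
    """Cut a continuous (channels, samples) recording into non-overlapping epochs.

    Returns an array of shape (n_epochs, channels, samples_per_epoch)."""
    step = int(fs * epoch_sec)
    n_epochs = eeg.shape[1] // step
    return np.stack([eeg[:, i * step:(i + 1) * step] for i in range(n_epochs)])

class BiLSTMEncoder(nn.Module):
    """BiLSTM over the time axis of a channel-time matrix X in R^{C x T}."""

    def __init__(self, n_channels: int = 128, hidden: int = 64, n_classes: int = 2):
        super().__init__()
        self.bilstm = nn.LSTM(input_size=n_channels, hidden_size=hidden,
                              batch_first=True, bidirectional=True)
        # Placeholder linear head; the paper uses a dynamic-routing vector classifier here.
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) -> the LSTM expects (batch, time, features)
        out, _ = self.bilstm(x.transpose(1, 2))
        return self.head(out.mean(dim=1))       # average over time steps

# Toy usage on synthetic data: 128 channels, 250 Hz (assumed), 10 s of signal.
epochs = epoch_eeg(np.random.randn(128, 250 * 10), fs=250)   # (5, 128, 500)
logits = BiLSTMEncoder()(torch.tensor(epochs, dtype=torch.float32))
print(epochs.shape, logits.shape)                             # (5, 128, 500) torch.Size([5, 2])
```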

33 pages, 4366 KB  
Article
Structured and Factorized Multi-Modal Representation Learning for Physiological Affective State and Music Preference Inference
by Wenli Qu and Mu-Jiang-Shan Wang
Symmetry 2026, 18(3), 488; https://doi.org/10.3390/sym18030488 - 12 Mar 2026
Viewed by 317
Abstract
Emotions and affective responses are core intervention targets in music therapy. Through acoustic elements, music can evoke emotional responses at physiological and neurological levels, influencing cognition and behavior while providing an important dimension for evaluating therapeutic efficacy. However, emotions are inherently abstract and difficult to represent directly. Artificial intelligence models therefore provide a promising tool for modeling and quantifying such abstract affective states from physiological signals. In this paper, we propose a structured and explicitly factorized multi-modal representation learning framework for joint affective state and preference inference. Instead of entangling heterogeneous dynamics within monolithic encoders, the framework decomposes representation learning into cross-channel interaction modeling and intra-channel temporal–spectral organization modeling. The framework integrates electroencephalography (EEG), peripheral physiological signals (GSR, BVP, EMG, respiration, and temperature), and eye-movement data (EOG) within a unified temporal modeling paradigm. At its core, a Dynamic Token Feature Extractor (DTFE) transforms raw time series into compact token representations and explicitly factorizes representation learning into (i) explicit channel-wise cross-series interaction modeling and (ii) temporal–spectral refinement via learnable frequency-domain gating. These complementary structural modules are implemented through Cross-Series Intersection (CSI) and Intra-Series Intersection (ISI), which perform low-rank channel dependency learning and adaptive spectral modulation, respectively. A hierarchical cross-modal fusion strategy integrates modality-level tokens in a representation-consistent and interaction-aware manner, enabling coordinated modeling of neural, autonomic, and attentional responses. The entire framework is optimized under a unified multi-task objective for valence, arousal, and liking prediction. Experiments on the DEAP dataset demonstrate consistent improvements over state-of-the-art methods. The model achieves 98.32% and 98.45% accuracy for valence and arousal prediction, 97.96% for quadrant classification in single-task evaluation, and 92.8%, 91.8%, and 93.6% accuracy for valence, arousal, and liking in joint multi-task settings. Overall, this work establishes a structure-aware and factorized multi-modal representation learning framework for robust affective decoding and intelligent music therapy systems. Full article
(This article belongs to the Section Computer)
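
A minimal sketch of the temporal–spectral refinement idea (the ISI-style learnable frequency-domain gating): each series is mapped to the frequency domain, every bin is rescaled by a learned gate, and the signal is mapped back. The shapes and the sigmoid gating form are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SpectralGating(nn.Module):
    """Intra-series refinement via learnable frequency-domain gating: transform each
    token sequence to the frequency domain, rescale every frequency bin with a learned
    gate, and transform back."""

    def __init__(self, seq_len: int):
        super().__init__()
        n_bins = seq_len // 2 + 1                     # rFFT output length
        self.gate = nn.Parameter(torch.ones(n_bins))  # one multiplicative gate per bin

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        spec = torch.fft.rfft(x, dim=-1)
        spec = spec * torch.sigmoid(self.gate)        # soft-select frequency content
        return torch.fft.irfft(spec, n=x.shape[-1], dim=-1)

# Toy usage: gate a batch of 40-channel physiological windows of 256 samples (assumed sizes).
x = torch.randn(4, 40, 256)
print(SpectralGating(seq_len=256)(x).shape)           # torch.Size([4, 40, 256])
```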

17 pages, 11401 KB  
Article
Exploring the Impact of Emotional States on Fatigue Evolution in Metro Drivers: A Physiological Signal-Based Approach
by Lianjie Chen, Yuanchun Huang, Fangsheng Wang, Lin Zhu and Zhigang Liu
Appl. Sci. 2026, 16(6), 2653; https://doi.org/10.3390/app16062653 - 10 Mar 2026
Viewed by 224
Abstract
To investigate the regulatory effects of emotional states on the evolution of fatigue in metro drivers, this study conducts an experimental investigation based on an urban rail transit driving simulation platform. A total of 21 participants complete a 90 min simulated driving task, during which electroencephalogram (EEG) and electrocardiogram (ECG) signals are synchronously collected from drivers for fatigue assessment and emotion recognition, respectively. An emotion recognition model based on a multi-scale convolutional neural network (MSCNN) combined with an attention mechanism is constructed. The proposed model uses ECG signals to classify three emotional states—neutral, positive, and negative—where the neutral state is defined as an emotionally undefined baseline that is neither positive nor negative. The model achieves a classification accuracy of 86.96% on the DREAMER dataset. By temporally aligning the emotion recognition results with EEG frequency-domain fatigue indicators, the results show that fatigue exhibits the highest growth and largest fluctuation in amplitude under negative emotions, demonstrating a pronounced fatigue-accelerating effect. Under positive emotions, fatigue decreases considerably and has smaller fluctuations, indicating a certain buffering and restorative effect. In contrast, the neutral emotional state exhibits intermediate and transitional fatigue characteristics. This study innovatively integrates ECG-based emotion recognition with EEG-based fatigue assessment to reveal the mechanisms based on which emotions influence fatigue in metro driving tasks from a physiological perspective. This work provides a basis for emotion-aware fatigue monitoring and safety intervention strategies. Full article
(This article belongs to the Section Transportation and Future Mobility)
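
The MSCNN-plus-attention idea can be sketched as parallel 1D convolutions over the ECG with different kernel sizes, whose concatenated feature maps are reweighted by a squeeze-and-excitation style channel attention. Kernel sizes, widths, and the attention form below are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class MultiScaleECGBlock(nn.Module):
    """Parallel 1D convolutions at several temporal scales, followed by a
    squeeze-and-excitation style channel attention over the concatenated maps."""

    def __init__(self, in_ch: int = 1, out_ch: int = 16, kernels=(3, 7, 15)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in kernels
        )
        total = out_ch * len(kernels)
        self.attn = nn.Sequential(                    # channel attention weights
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(total, total // 4), nn.ReLU(),
            nn.Linear(total // 4, total), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        weights = self.attn(feats).unsqueeze(-1)      # (batch, channels, 1)
        return feats * weights

# Toy usage: a 10-s single-lead ECG segment at 256 Hz (assumed rate).
ecg = torch.randn(2, 1, 2560)
print(MultiScaleECGBlock()(ecg).shape)                # torch.Size([2, 48, 2560])
```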

22 pages, 340 KB  
Article
From Patient Emotion Recognition to Provider Understanding: A Multimodal Data Mining Framework for Emotion-Aware Clinical Counseling Systems
by Saahithi Mallarapu, Xinyan Liu, Pegah Zargarian, Seyyedeh Fatemeh Mottaghian, Ramyashree Suresha, Vasudha Jain and Akram Bayat
Computers 2026, 15(3), 161; https://doi.org/10.3390/computers15030161 - 3 Mar 2026
Viewed by 432
Abstract
Computational analysis of therapeutic communication presents challenges in multi-label classification, severe class imbalance, and heterogeneous multimodal data integration. We introduce a bidirectional analytical framework addressing patient emotion recognition and provider behavior analysis. For patient-side analysis, we employ ClinicalBERT on human-annotated CounselChat (1482 interactions, 25 categories, imbalance 60:1), achieving a macro-F1 of 0.74 through class weighting and threshold optimization, representing a six-fold improvement over naive baselines and 6–13 point improvement over modern imbalance methods. For provider-side analysis, we process 330 YouTube therapy sessions through automated pipelines (speaker diarization, automatic speech recognition, temporal segmentation), yielding 14,086 annotated segments. Our architecture combines DeBERTa-v3-base with WavLM-base-plus through cross-modal attention mechanisms adapted from multimodal Transformer frameworks. On controlled human-annotated HOPE data (178 sessions, 12,500 utterances), the model achieves a macro-F1 of 0.91 with Cohen’s kappa of 0.87, comparable to inter-rater reliability reported in psychotherapy process research. On YouTube data, a macro-F1 of 0.71 demonstrates feasibility while highlighting annotation quality impacts. Cross-dataset transfer and systematic attention analyses validate domain-specific effectiveness and interpretability. Full article
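
The two imbalance-handling ingredients named above (class weighting and threshold optimization) have straightforward generic implementations. The sketch below computes per-label pos_weight values for a weighted BCE loss and tunes per-label decision thresholds for F1 on held-out data; the data and threshold grid are toy assumptions, independent of the authors' pipeline.

```python
import numpy as np
import torch
from sklearn.metrics import f1_score

def positive_class_weights(y_train: np.ndarray) -> torch.Tensor:
    """pos_weight for BCEWithLogitsLoss: (negatives / positives) per label,
    so rare emotion categories contribute more to the loss."""
    pos = y_train.sum(axis=0).clip(min=1)
    neg = y_train.shape[0] - pos
    return torch.tensor(neg / pos, dtype=torch.float32)

def tune_thresholds(val_probs: np.ndarray, val_labels: np.ndarray) -> np.ndarray:
    """Pick, per label, the decision threshold that maximizes F1 on validation data."""
    grid = np.linspace(0.05, 0.95, 19)
    best = np.full(val_probs.shape[1], 0.5)
    for c in range(val_probs.shape[1]):
        scores = [f1_score(val_labels[:, c], val_probs[:, c] >= t, zero_division=0)
                  for t in grid]
        best[c] = grid[int(np.argmax(scores))]
    return best

# Toy usage with random multi-label data (25 emotion categories, as in CounselChat).
rng = np.random.default_rng(0)
y_train = (rng.random((1000, 25)) < 0.05).astype(int)
loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=positive_class_weights(y_train))
val_probs, val_labels = rng.random((200, 25)), (rng.random((200, 25)) < 0.05).astype(int)
print(tune_thresholds(val_probs, val_labels)[:5])
```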

27 pages, 4205 KB  
Article
Facial Expression Annotation and Analytics for Dysarthria Severity Classification
by Shufei Duan, Yuxin Guo, Longhao Fu, Fujiang Li, Xinran Dong, Huizhi Liang and Wei Zhang
Sensors 2026, 26(4), 1239; https://doi.org/10.3390/s26041239 - 13 Feb 2026
Viewed by 376
Abstract
Dysarthria in patients post-stroke is often accompanied by central facial paralysis, which impairs facial motor control and emotional expression. Current assessments rely on acoustic modalities, overlooking facial pathological cues and their correlation with emotional expression, which hinders comprehensive disease assessment. To address this issue, we propose a multimodal severity classification framework that integrates facial and acoustic features. Firstly, a multi-level annotation algorithm based on a pre-trained model and motion amplitude is designed to overcome data scarcity. Secondly, facial topology is modeled using Delaunay triangulation, with spatial relationships captured via graph convolutional networks (GCNs), while abnormal muscle coordination is quantified using facial action units (AUs). Finally, we propose a multimodal feature-fusion scheme in which facial visual features compensate for the acoustic modality and support disease classification. Our experimental results on the THE-POSSD dataset demonstrate an accuracy of 92.0% and an F1 score of 91.6%, significantly outperforming single-modality baselines. This study reveals the changes in facial movements and sensitive areas of patients under different emotional states, verifies the compensatory ability of visual patterns for auditory patterns, and demonstrates the potential of this multimodal framework for objective assessment and future clinical applications in speech disorders. Full article
(This article belongs to the Section Sensing and Imaging)
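
A small sketch of the facial-topology step described above: Delaunay triangulation over landmark coordinates yields the edge set, which becomes a (symmetrically normalized) adjacency matrix for a GCN. The landmark count and normalization choice are assumptions.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_adjacency(landmarks: np.ndarray) -> np.ndarray:
    """Build a symmetric adjacency matrix over facial landmarks by connecting every
    pair of points that shares an edge in the Delaunay triangulation."""
    n = landmarks.shape[0]
    adj = np.eye(n)                       # self-loops, as commonly used for GCN inputs
    for tri in Delaunay(landmarks).simplices:
        for i in range(3):
            a, b = tri[i], tri[(i + 1) % 3]
            adj[a, b] = adj[b, a] = 1.0
    return adj

# Toy usage: 68 synthetic 2-D landmark coordinates.
points = np.random.rand(68, 2)
A = delaunay_adjacency(points)
# Symmetric normalization D^{-1/2} A D^{-1/2}, the standard GCN propagation matrix.
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_norm = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
print(A.shape, A_norm.shape)              # (68, 68) (68, 68)
```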

21 pages, 4143 KB  
Article
Distinguishing Early Depression from Negative Emotion via Multi-Domain EEG Feature Fusion and Multi-Head Additive Attention Network
by Ruoyu Du, Benbao Wang, Haipeng Gao, Tingting Xu, Shanjing Ju, Xin Xu and Jiangnan Xu
Entropy 2026, 28(2), 218; https://doi.org/10.3390/e28020218 - 13 Feb 2026
Viewed by 395
Abstract
The early diagnosis of depression is often impeded by the subjectivity inherent in traditional clinical assessments. To advance objective screening, this study proposes a lightweight neural network framework designed to discriminate between pathological depressive states and non-pathological transient negative emotions using EEG signals. Diverging from conventional methods that rely on single-domain features, we construct a comprehensive multi-domain feature space via Wavelet Packet Decomposition. Specifically, the framework integrates frequency (α/β power spectral density ratio), spatial (normalized α-asymmetry), and non-linear (Sample Entropy) attributes to capture the heterogeneous neurophysiological dynamics of depression. To effectively synthesize these diverse features, a multi-head additive attention mechanism is introduced. This mechanism empowers the model to adaptively recalibrate feature weights, thereby prioritizing the most discriminative patterns associated with the disorder. Experimental validation on the DEAP (negative emotion) and HUSM (major depressive disorder) datasets demonstrates that the proposed method achieves a classification accuracy of 92.2% and an F1-score of 93%. Comparative results indicate that our model significantly outperforms baseline SVM and standard deep learning approaches. Furthermore, the architecture exhibits high computational efficiency and rapid convergence, highlighting its potential as a deployable engine for real-time mental health monitoring in clinical scenarios. Full article
(This article belongs to the Section Entropy and Biology)
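
Two of the hand-crafted features named above have simple spectral approximations, sketched below with Welch PSD estimates rather than the paper's Wavelet Packet Decomposition: the α/β power ratio and a normalized α-asymmetry between a left/right electrode pair. Band edges and the asymmetry definition follow common conventions and are assumptions here.

```python
import numpy as np
from scipy.signal import welch

def band_power(x: np.ndarray, fs: float, band: tuple) -> float:
    """Approximate power of one channel within a frequency band, via Welch's PSD."""
    freqs, psd = welch(x, fs=fs, nperseg=min(len(x), 2 * int(fs)))
    mask = (freqs >= band[0]) & (freqs < band[1])
    return float(psd[mask].sum() * (freqs[1] - freqs[0]))

def alpha_beta_ratio(x: np.ndarray, fs: float) -> float:
    """Frequency-domain feature: alpha (8-13 Hz) to beta (13-30 Hz) power ratio."""
    return band_power(x, fs, (8, 13)) / band_power(x, fs, (13, 30))

def normalized_alpha_asymmetry(left: np.ndarray, right: np.ndarray, fs: float) -> float:
    """Spatial feature: (right - left) alpha power over their sum, e.g. for F4 vs F3."""
    a_l, a_r = band_power(left, fs, (8, 13)), band_power(right, fs, (8, 13))
    return (a_r - a_l) / (a_r + a_l)

# Toy usage on synthetic 4-s signals sampled at 128 Hz (DEAP's downsampled rate).
fs = 128
f3, f4 = np.random.randn(4 * fs), np.random.randn(4 * fs)
print(alpha_beta_ratio(f3, fs), normalized_alpha_asymmetry(f3, f4, fs))
```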

14 pages, 725 KB  
Article
PLTA-FinBERT: Pseudo-Label Generation-Based Test-Time Adaptation for Financial Sentiment Analysis
by Hai Yang, Hainan Chen, Chang Jiang, Juntao He and Pengyang Li
Big Data Cogn. Comput. 2026, 10(2), 59; https://doi.org/10.3390/bdcc10020059 - 11 Feb 2026
Viewed by 689
Abstract
Financial sentiment analysis leverages natural language processing techniques to quantitatively assess sentiment polarity and emotional tendencies in financial texts. Its practical application in investment decision-making and risk management faces two major challenges: the scarcity of high-quality labeled data due to expert annotation costs, and semantic drift caused by the continuous evolution of market language. To address these issues, this study proposes PLTA-FinBERT, a pseudo-label generation-based test-time adaptation framework that enables dynamic self-learning without requiring additional labeled data. The framework consists of two modules: a multi-perturbation pseudo-label generation mechanism that enhances label reliability through consistency voting and confidence-based filtering, and a test-time dynamic adaptation strategy that iteratively updates model parameters based on high-confidence pseudo-labels, allowing the model to continuously adapt to new linguistic patterns. PLTA-FinBERT achieves 0.8288 accuracy on the sentiment classification dataset of financial sentiment analysis, representing an absolute improvement of 2.37 percentage points over the benchmark. On the FiQA sentiment intensity prediction task, it obtains an R2 of 0.58, surpassing the previous state-of-the-art by 3 percentage points. Full article
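
A hedged sketch of the test-time adaptation loop described above, assuming a Hugging Face-style sequence-classification model and tokenizer: several stochastic views of each unlabeled batch (here simply Monte Carlo dropout) vote on a label, low-agreement or low-confidence samples are filtered out, and the model is updated on the surviving pseudo-labels. The perturbation scheme, thresholds, and function names are illustrative, not the paper's exact mechanism.

```python
import torch
import torch.nn.functional as F

def test_time_adapt(model, texts, tokenizer, optimizer,
                    n_views: int = 4, conf_threshold: float = 0.9) -> int:
    """One adaptation step on an unlabeled batch of financial sentences."""
    model.train()                                   # keep dropout active for perturbed views
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(**batch).logits, dim=-1)
                             for _ in range(n_views)])          # (views, batch, classes)
    mean_probs = probs.mean(dim=0)
    votes = probs.argmax(dim=-1)                                # (views, batch)
    agreed = (votes == votes[0]).all(dim=0)                     # unanimous consistency vote
    confident = mean_probs.max(dim=-1).values >= conf_threshold
    keep = agreed & confident
    if keep.any():
        pseudo = mean_probs.argmax(dim=-1)[keep]                # high-confidence pseudo-labels
        logits = model(**{k: v[keep] for k, v in batch.items()}).logits
        loss = F.cross_entropy(logits, pseudo)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return int(keep.sum())                                      # number of samples adapted on
```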

20 pages, 1239 KB  
Article
Task-Adaptive and Multi-Level Contextual Understanding for Emotion Recognition in Conversations
by Xiaomeng Yao, Wei Cao, Yuyang Xue, Haijun Zhang and Xiaochao Fan
Appl. Sci. 2026, 16(4), 1706; https://doi.org/10.3390/app16041706 - 9 Feb 2026
Viewed by 257
Abstract
Emotion recognition in conversations (ERC) is a significant task in natural language processing, aimed at identifying the emotion of each utterance within a conversation. Current research predominantly relies on pre-trained language models, often incorporating sophisticated network architectures to capture complex contextual semantics in conversations. However, existing approaches have not successfully combined effective task-specific adaptation with adequate modeling of conversational context complexity. To address this, we propose a model named TAMC-ERC (Task-Adaptive and Multi-level Contextual Understanding for Emotion Recognition in Conversations). The model adopts a progressive recognition framework that sequentially builds on foundational utterance representations, integrates conversation-level contexts, and leads to a task-adaptive classification decision. First, the Task-Adaptive Representation Learning module produces highly discriminative utterance representations. It achieves this by integrating emotion space information into prompts and employing contrastive learning. Subsequently, the Multi-Level Contextual Understanding module performs in-depth modeling of the conversational context. It synergistically integrates both macroscopic narratives and microscopic interactions to construct a comprehensive emotional context. Finally, the classifier is directly parameterized by the emotion concept vectors from the task-adaptive stage. This creates a coherent task adaptation process, maintaining task-specific awareness from representation learning through to the final decision. Experiments on three benchmark datasets demonstrate that TAMC-ERC achieves highly competitive performance: it attains weighted average F1 scores of 71.04% on IEMOCAP, 66.95% on MELD, and 40.99% on EmoryNLP. These results set a new state of the art and demonstrate that the model outperforms most existing baselines. This work validates that integrating task adaptation with multi-level contextual modeling is key to addressing conversational complexity and improving recognition accuracy. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
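
One way to realize "the classifier is directly parameterized by the emotion concept vectors" is to score utterance representations by cosine similarity against one learned vector per emotion, as sketched below; the emotion set, dimensionality, and temperature are assumptions, not the TAMC-ERC implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptVectorClassifier(nn.Module):
    """Score an utterance representation against one learned 'concept' vector per
    emotion, so the same vectors used during representation learning also
    parameterize the final decision."""

    def __init__(self, dim: int, emotions=("joy", "sadness", "anger", "neutral")):
        super().__init__()
        self.concepts = nn.Parameter(torch.randn(len(emotions), dim) * 0.02)
        self.scale = nn.Parameter(torch.tensor(10.0))   # temperature for cosine logits
        self.emotions = emotions

    def forward(self, utterance_repr: torch.Tensor) -> torch.Tensor:
        sims = F.normalize(utterance_repr, dim=-1) @ F.normalize(self.concepts, dim=-1).T
        return self.scale * sims                         # (batch, n_emotions)

# Toy usage with a hypothetical 768-dim encoder output.
logits = ConceptVectorClassifier(dim=768)(torch.randn(4, 768))
print(logits.argmax(dim=-1))
```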

20 pages, 2117 KB  
Article
An Interpretable Residual Spatio-Temporal Graph Attention Network for Multiclass Emotion Recognition from EEG
by Manal Hilali, Abdellah Ezzati, Said Ben Alla and Ahmed El Badaoui
Signals 2026, 7(1), 16; https://doi.org/10.3390/signals7010016 - 5 Feb 2026
Viewed by 888
Abstract
Automatic emotion recognition based on EEG has been a key research frontier in recent years, involving the direct extraction of emotional states from brain dynamics. However, existing deep learning approaches often treat EEG either as a sequence or as a static spatial map, thereby failing to jointly capture the temporal evolution and spatial dependencies underlying emotional responses. To address this limitation, we propose an Interpretable Residual Spatio-Temporal Graph Attention Network (IRSTGANet) that integrates temporal convolutional encoding with residual graph-attention blocks. The temporal module enhances short-term EEG dynamics, while the graph-attention layers learn adaptive node connectivity relationships and preserve contextual information through residual links. Evaluated on the DEAP and SEED datasets, the proposed model achieved exceptional performance on valence and arousal, as well as four-class and nine-class classification on the DEAP dataset and on the three-class SEED dataset, exceeding state-of-the-art methods. These results demonstrate that combining temporal enhancement with residual graph attention yields both improved recognition performance and interpretable insights into emotion-related neural connectivity. Full article
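
A compact sketch of a residual graph-attention layer over electrode nodes, in the spirit of the blocks described above (single head, dense adjacency mask, residual link); it is not the IRSTGANet implementation, and all sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualGraphAttention(nn.Module):
    """Single-head graph attention over EEG electrodes with a residual connection.
    `adj` marks which electrode pairs may attend to each other."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.attn_src = nn.Linear(dim, 1)
        self.attn_dst = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (batch, nodes, dim); adj: (nodes, nodes) with 1 where an edge exists
        h = self.proj(x)
        scores = self.attn_src(h) + self.attn_dst(h).transpose(1, 2)   # (batch, N, N)
        scores = F.leaky_relu(scores, 0.2).masked_fill(adj == 0, float("-inf"))
        attn = torch.softmax(scores, dim=-1)
        return x + attn @ h                                             # residual link

# Toy usage: 32 electrodes with 16-dim node features, fully connected with self-loops.
x, adj = torch.randn(2, 32, 16), torch.ones(32, 32)
print(ResidualGraphAttention(16)(x, adj).shape)                         # torch.Size([2, 32, 16])
```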

23 pages, 2302 KB  
Article
Learnable Feature Disentanglement with Temporal-Complemented Motion Enhancement for Micro-Expression Recognition
by Yu Qian, Shucheng Huang and Kai Qu
Entropy 2026, 28(2), 180; https://doi.org/10.3390/e28020180 - 4 Feb 2026
Viewed by 447
Abstract
Micro-expressions (MEs) are involuntary facial movements that reveal genuine emotions, holding significant value in fields like deception detection and psychological diagnosis. However, micro-expression recognition (MER) is fundamentally challenged by the entanglement of subtle emotional motions with identity-specific features. Traditional methods, such as those based on Robust Principal Component Analysis (RPCA), attempt to separate identity and motion components through fixed preprocessing and coarse decomposition. However, these methods can inadvertently remove subtle emotional cues and are disconnected from subsequent module training, limiting the discriminative power of features. Inspired by the Bruce–Young model of facial cognition, which suggests that facial identity and expression are processed via independent neural routes, we recognize the need for a more dynamic, learnable disentanglement paradigm for MER. We propose LFD-TCMEN, a novel network that introduces an end-to-end learnable feature disentanglement framework. The network is synergistically optimized by a multi-task objective unifying orthogonality, reconstruction, consistency, cycle, identity, and classification losses. Specifically, the Disentangle Representation Learning (DRL) module adaptively isolates pure motion patterns from subject-specific appearance, overcoming the limitations of static preprocessing, while the Temporal-Complemented Motion Enhancement (TCME) module integrates purified motion representations—highlighting subtle facial muscle activations—with optical flow dynamics to comprehensively model the spatiotemporal evolution of MEs. Extensive experiments on CAS(ME)3 and DFME benchmarks demonstrate that our method achieves state-of-the-art cross-subject performance, validating the efficacy of the proposed learnable disentanglement and synergistic optimization. Full article
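
The multi-task objective above includes an orthogonality term; a common generic form, sketched below, penalizes the squared Frobenius norm of the cross-correlation between identity and motion embeddings. The normalization and scaling are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def orthogonality_loss(identity_feat: torch.Tensor, motion_feat: torch.Tensor) -> torch.Tensor:
    """Penalize correlation between identity and motion embeddings so the two branches
    encode disjoint information: squared Frobenius norm of Z_id^T Z_motion computed on
    L2-normalized per-sample features."""
    z_id = F.normalize(identity_feat, dim=-1)
    z_mo = F.normalize(motion_feat, dim=-1)
    cross = z_id.transpose(-2, -1) @ z_mo            # (dim_id, dim_motion) correlation
    return (cross ** 2).sum() / identity_feat.shape[0]

# Toy usage: a batch of 16 samples with 128-dim identity and motion features.
loss = orthogonality_loss(torch.randn(16, 128), torch.randn(16, 128))
print(float(loss))
```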

24 pages, 1972 KB  
Article
Exploring the Topics and Sentiments of AI-Related Public Opinions: An Advanced Machine Learning Text Analysis
by Wullianallur Raghupathi, Jie Ren and Tanush Kulkarni
Information 2026, 17(2), 134; https://doi.org/10.3390/info17020134 - 1 Feb 2026
Viewed by 2818
Abstract
This study investigates the evolution of public sentiment and discourse surrounding artificial intelligence through a comprehensive multi-method analysis of 28,819 Reddit comments spanning March 2015 to May 2024. Addressing three research questions—(1) what dominant topics characterize AI discourse, (2) how has sentiment changed over time, particularly following ChatGPT 5.2’s release, and (3) what linguistic patterns distinguish positive from negative discourse—we employ 28 distinct analytical techniques to provide validated insights into public AI perception. Methodologically, the study integrates VADER sentiment analysis, Linguistic Inquiry and Word Count (LIWC) analysis with regression validation, dual topic modeling using Latent Dirichlet Allocation and Non-negative Matrix Factorization for cross-validation, four-dimensional tone analysis, named entity recognition, emotion detection, and advanced NLP techniques including sarcasm detection, stance classification, and toxicity analysis. A key methodological contribution is the validation of LIWC categories through linear regression (R2 = 0.049, p < 0.001) and logistic regression (61% accuracy), moving beyond the descriptive statistics typical of prior linguistic analyses. Results reveal a pronounced decline in positive sentiment from +0.320 in 2015 to +0.053 in 2024. Contrary to expectations, sentiment decreased following ChatGPT’s November 2022 release, with negative comments increasing from 31.9% to 35.1%—suggesting that direct exposure to powerful AI capabilities intensifies rather than alleviates public concerns. LIWC regression analysis identified negative emotion words (β = −0.083) and positive emotion words (β = +0.063) as the strongest sentiment predictors, confirming that affective rather than technical engagement drives public AI attitudes. Topic modeling revealed nine coherent themes, with facial recognition, algorithmic bias, AI ethics, and social media misinformation emerging as dominant concerns across both LDA and NMF analyses. Network analysis identified regulation as a central hub (degree centrality = 0.929) connecting all major AI concerns, indicating strong public appetite for governance frameworks. These findings contribute to theoretical understandings of technology risk perception, provide practical guidance for AI developers and policymakers, and demonstrate validated computational methods for tracking public opinion toward emerging technologies. Full article
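
Two of the listed analysis steps, VADER sentiment scoring and LDA topic modeling, are shown below on a few toy comments using NLTK and scikit-learn; this is a generic sketch of those techniques, not the study's full 28-method pipeline, and the example comments are fabricated.

```python
# Requires nltk.download("vader_lexicon") once before use.
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

comments = [
    "AI art tools are amazing and save me hours of work",
    "Facial recognition feels like constant surveillance",
    "We need real regulation before these models decide everything",
]

# Compound sentiment score in [-1, 1] per comment.
sia = SentimentIntensityAnalyzer()
compound_scores = [sia.polarity_scores(c)["compound"] for c in comments]

# Two-topic LDA over a bag-of-words matrix; report the top words per topic.
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(comments)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(doc_term)
terms = vectorizer.get_feature_names_out()
top_words = [[terms[i] for i in topic.argsort()[-3:]] for topic in lda.components_]

print(compound_scores)
print(top_words)
```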

21 pages, 2592 KB  
Article
Parsing Emotion in Classical Music: A Behavioral Study on the Cognitive Mapping of Key, Tempo, Complexity and Energy in Piano Performance
by Alice Mado Proverbio, Chang Qin and Miloš Milovanović
Appl. Sci. 2026, 16(3), 1371; https://doi.org/10.3390/app16031371 - 29 Jan 2026
Viewed by 741
Abstract
Music conveys emotion through a complex interplay of structural and acoustic cues, yet how these features map onto specific affective interpretations remains a key question in music cognition. This study explored how listeners, unaware of contextual information, categorized 110 emotionally diverse excerpts—varying in key, tempo, note density, acoustic energy, and expressive gestures—from works by Bach, Beethoven, and Chopin. Twenty classically trained participants labeled each excerpt using six predefined emotional categories. Emotion judgments were analyzed within a supervised multi-class classification framework, allowing systematic quantification of recognition accuracy, misclassification patterns, and category reliability. Behavioral responses were consistently above chance, indicating shared decoding strategies. Quantitative analyses of live performance recordings revealed systematic links between expressive features and emotional tone: high-arousal emotions showed increased acoustic intensity, faster gestures, and dominant right-hand activity, while low-arousal states involved softer dynamics and more left-hand involvement. Major-key excerpts were commonly associated with positive emotions—“Peacefulness” with slow tempos and low intensity, “Joy” with fast, energetic playing. Minor-key excerpts were linked to negative/ambivalent emotions, aligning with prior research on the emotional complexity of minor modality. Within the minor mode, a gradient of arousal emerged, from “Melancholy” to “Power,” the latter marked by heightened motor activity and sonic force. Results support an embodied view of musical emotion, where expressive meaning emerges through dynamic motor-acoustic patterns that transcend stylistic and cultural boundaries. Full article
(This article belongs to the Special Issue Multimodal Emotion Recognition and Affective Computing)
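
The supervised multi-class analysis of listener judgments can be summarized with a confusion matrix and per-category recall, as in the toy sketch below; the responses are fabricated for illustration, and two of the six category names are placeholders (only Peacefulness, Joy, Melancholy, and Power appear in the abstract).

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

categories = ["Peacefulness", "Joy", "Melancholy", "Power", "CategoryE", "CategoryF"]
intended = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5])   # intended emotion per excerpt
judged = np.array([0, 5, 1, 1, 2, 3, 3, 3, 4, 2, 5, 0])     # listener responses (toy)

cm = confusion_matrix(intended, judged, labels=list(range(len(categories))))
per_category_recall = cm.diagonal() / cm.sum(axis=1)
print("overall accuracy:", accuracy_score(intended, judged))
print("per-category recall:", dict(zip(categories, per_category_recall.round(2))))
```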

28 pages, 1521 KB  
Article
Image–Text Sentiment Analysis Based on Dual-Path Interaction Network with Multi-Level Consistency Learning
by Zhi Ji, Chunlei Wu, Qinfu Xu and Yixiang Wu
Electronics 2026, 15(3), 581; https://doi.org/10.3390/electronics15030581 - 29 Jan 2026
Viewed by 442
Abstract
With the continuous evolution of social media, users are increasingly inclined to express their personal emotions on digital platforms by integrating information presented in multiple modalities. Within this context, research on image–text sentiment analysis has garnered significant attention. Prior research efforts have made notable progress by leveraging shared emotional concepts across visual and textual modalities. However, existing cross-modal sentiment analysis methods face two key challenges: Previous approaches often focus excessively on fusion, resulting in learned features that may not achieve emotional alignment; traditional fusion strategies are not optimized for sentiment tasks, leading to insufficient robustness in final sentiment discrimination. To address the aforementioned issues, this paper proposes a Dual-path Interaction Network with Multi-level Consistency Learning (DINMCL). It employs a multi-level feature representation module to decouple the global and local features of both text and image. These decoupled features are then fed into the Global Congruity Learning (GCL) and Local Crossing-Congruity Learning (LCL) modules, respectively. GCL models global semantic associations using Crossing Prompter, while LCL captures local consistency in fine-grained emotional cues across modalities through cross-modal attention mechanisms and adaptive prompt injection. Finally, a CLIP-based adaptive fusion layer integrates the multi-modal representations in a sentiment-oriented manner. Experiments on the MVSA_Single, MVSA_Multiple, and TumEmo datasets with baseline models such as CTMWA and CLMLF demonstrate that DINMCL significantly outperforms mainstream models in sentiment classification accuracy and F1-score and exhibits strong robustness when handling samples containing highly noisy symbols. Full article
(This article belongs to the Special Issue AI-Driven Image Processing: Theory, Methods, and Applications)
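
A minimal sketch of an adaptive fusion layer over precomputed image and text embeddings, in the spirit of the CLIP-based fusion described above: a gate conditioned on both modalities decides each dimension's mix. The gating form, embedding size, and class count are assumptions, not the DINMCL design.

```python
import torch
import torch.nn as nn

class AdaptiveGatedFusion(nn.Module):
    """Sentiment-oriented fusion of (precomputed) image and text embeddings."""

    def __init__(self, dim: int, n_classes: int = 3):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([img_feat, txt_feat], dim=-1))   # per-dimension mixing weights
        fused = g * img_feat + (1.0 - g) * txt_feat
        return self.classifier(fused)

# Toy usage with 512-dim embeddings (CLIP ViT-B/32's output size) and 3 sentiment classes.
logits = AdaptiveGatedFusion(dim=512)(torch.randn(4, 512), torch.randn(4, 512))
print(logits.shape)    # torch.Size([4, 3])
```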

24 pages, 9586 KB  
Article
EEG–fNIRS Cross-Subject Emotion Recognition Based on Attention Graph Isomorphism Network and Contrastive Learning
by Bingzhen Yu, Xueying Zhang and Guijun Chen
Brain Sci. 2026, 16(2), 145; https://doi.org/10.3390/brainsci16020145 - 28 Jan 2026
Viewed by 630
Abstract
Background/Objectives: Electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) can objectively capture the spatiotemporal dynamics of brain activity during affective cognition, and their combination is promising for improving emotion recognition. However, multi-modal cross-subject emotion recognition remains challenging due to heterogeneous signal characteristics that hinder effective fusion and substantial inter-subject variability that degrades generalization to unseen subjects. Methods: To address these issues, this paper proposes DC-AGIN, a dual-contrastive learning attention graph isomorphism network for EEG–fNIRS emotion recognition. DC-AGIN employs an attention-weighted AGIN encoder to adaptively emphasize informative brain-region topology while suppressing redundant connectivity noise. For cross-modal fusion, a cross-modal contrastive learning module projects EEG and fNIRS representations into a shared latent semantic space, promoting semantic alignment and complementarity across modalities. Results: To further enhance cross-subject generalization, a supervised contrastive learning mechanism is introduced to explicitly mitigate subject-specific identity information and encourage subject-invariant affective representations. Experiments on a self-collected dataset are conducted under both subject-dependent five-fold cross-validation and subject-independent leave-one-subject-out (LOSO) protocols. The proposed method achieves 96.98% accuracy in four-class classification in the subject-dependent setting and 62.56% under LOSO. Compared with existing models, DC-AGIN achieves SOTA performance. Conclusions: These results demonstrate that the work on attention aggregation, cross-modal and cross-subject contrastive learning enables more robust EEG-fNIRS emotion recognition, thus supporting the effectiveness of DC-AGIN in generalizable emotion representation learning. Full article
(This article belongs to the Section Cognitive, Social and Affective Neuroscience)
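
The supervised contrastive mechanism described above can be sketched with the standard supervised contrastive (SupCon) loss, which pulls together embeddings that share an emotion label and pushes apart the rest; the temperature and batch construction are assumptions, and the paper's subject-invariance variant may differ.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features: torch.Tensor, labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Supervised contrastive loss over a batch of embeddings: same-label samples are
    treated as positives, everything else as negatives."""
    z = F.normalize(features, dim=-1)
    sim = z @ z.T / temperature                          # (batch, batch) similarities
    mask_self = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask_self, float("-inf"))      # never contrast against self
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~mask_self
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    mean_pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    loss = -mean_pos_log_prob
    return loss[pos_mask.any(dim=1)].mean()              # skip samples with no positives

# Toy usage: 8 embeddings for a four-class (e.g. valence/arousal quadrant) problem.
feats = torch.randn(8, 64)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(float(supervised_contrastive_loss(feats, labels)))
```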
