
Multimodal Emotion Recognition and Affective Computing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Applied Neuroscience and Neural Engineering".

Deadline for manuscript submissions: 20 October 2026 | Viewed by 4388

Special Issue Editors


Guest Editor
ATIC Research Group, ITIS Software, Universidad de Málaga, 29071 Málaga, Spain
Interests: digital signal processing; musical acoustics; EEG–NIRS processing; new educational methods

Special Issue Information

Dear Colleagues,

This Special Issue aims to bring together contributions on machine learning and deep learning methods for recognizing human emotions and cognitive states from a variety of modalities, including brain signals (EEG, fNIRS), physiological signals (ECG, EDA, respiration), speech, facial expressions, body motion, and multimodal data fusion. Applications in human–computer interaction, healthcare, education, entertainment, and VR/AR environments are also welcome.

Given the increasing interest in emotion-aware and human-centered AI systems, we believe this theme will appeal to a wide community of researchers and practitioners.

Dr. Athanasios Koutras
Dr. Ana Maria Barbancho
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • emotion recognition
  • affective computing
  • multimodal fusion

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (5 papers)


Research

29 pages, 3177 KB  
Article
Dual-Distillation Vision-Language Model for Multimodal Emotion Recognition in Conversation with Quantized Edge Deployment
by DeogHwa Kim, Yu il Lee, Da Hyun Yoon, Byeong Jun Kim and Deok-Hwan Kim
Appl. Sci. 2026, 16(6), 3103; https://doi.org/10.3390/app16063103 - 23 Mar 2026
Viewed by 714
Abstract
Multimodal Emotion Recognition in Conversation (ERC) has attracted attention as a key technology in human–computer interaction, mental healthcare, and intelligent services. However, deploying ERC in real-world settings remains challenging due to reliability gaps across modalities, instability in visual representations, and the high computational cost of large pretrained models. In particular, on resource-constrained edge devices, it is difficult to reduce model size and inference latency while preserving accuracy. To address these challenges, we jointly propose a knowledge-distillation-based multimodal ERC model, called DDVLM, with an edge-optimized Weight-Only Quantization (WOQ) pipeline for efficient edge deployment. DDVLM assigns the textual modality as the teacher and the visual modality as the student, transferring emotion-distribution knowledge to improve non-verbal representations and stabilize multimodal learning. In addition, Exponential Moving Average (EMA)-based self-distillation enhances the consistency and generalization capability of text features. Meanwhile, the proposed WOQ pipeline quantizes linear-layer weights to INT8 while preserving precision-sensitive operations in mixed precision, thereby minimizing accuracy loss and reducing model size, memory usage, and inference latency. Experiments on the MELD dataset demonstrated that the proposed approach achieves state-of-the-art performance while also enabling real-time inference on edge devices such as NVIDIA Jetson. Overall, this work presents a practical ERC framework that jointly considers accuracy and deployability. Full article
(This article belongs to the Special Issue Multimodal Emotion Recognition and Affective Computing)
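The three mechanisms named in this abstract (cross-modal distillation, EMA self-distillation, weight-only INT8 quantization) are standard enough to sketch. Below is a minimal, hypothetical PyTorch illustration; the function names, temperature, and decay rate are assumptions for exposition, not the authors' code:

```python
import torch
import torch.nn.functional as F

def cross_modal_distillation_loss(student_logits: torch.Tensor,
                                  teacher_logits: torch.Tensor,
                                  tau: float = 2.0) -> torch.Tensor:
    """KL(teacher || student) over softened emotion distributions;
    the text-branch teacher is detached so gradients flow only to
    the visual-branch student."""
    p_teacher = F.softmax(teacher_logits.detach() / tau, dim=-1)
    log_p_student = F.log_softmax(student_logits / tau, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * tau ** 2

@torch.no_grad()
def ema_update(teacher_model, student_model, decay: float = 0.999) -> None:
    """EMA self-distillation: the teacher's weights track an exponential
    moving average of the student's weights."""
    for p_t, p_s in zip(teacher_model.parameters(), student_model.parameters()):
        p_t.mul_(decay).add_(p_s, alpha=1.0 - decay)

def quantize_weights_int8(w: torch.Tensor):
    """Per-output-channel symmetric INT8 quantization of a linear layer's
    weight matrix (weight-only: activations stay in floating point)."""
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    w_int8 = torch.round(w / scale).to(torch.int8)
    return w_int8, scale   # dequantize on the fly: w_int8.float() * scale
```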

22 pages, 7116 KB  
Article
IPA 2.0: Validation of an Interpretable Emotion-Attention Index for Neuro-Adaptive Learning with AI
by Javier Arranz-Romero, Rosabel Roig-Vila and Miguel Cazorla
Appl. Sci. 2026, 16(5), 2515; https://doi.org/10.3390/app16052515 - 5 Mar 2026
Viewed by 697
Abstract
Adaptive learning systems increasingly rely on multimodal affective computing, yet many pipelines remain difficult to audit and pedagogically justify. We introduce NAILF (Neuro-Adaptive Artificial Intelligent Learning Flow) and formalise IPA 2.0 as an interpretable continuous index integrating affective valence/intensity with attentional activation into a traceable intermediate signal for neuro-adaptive decision-making. Validation follows a two-level strategy. Study A performs a structured simulation over the full emotion–attention space (108 configurations), demonstrating numerical stability and coherent monotonic behaviour under controlled parameterisation. Study B evaluates external validity on the DIPSEER in-the-wild classroom dataset using subject-wise temporal calibration (lag/windowing/smoothing), hold-out evaluation, and explicit anti-leakage auditing. Across evaluable subjects (n = 172), Fisher-z aggregation shows a small but significant association between IPA 2.0 and an external engagement criterion (r_global = 0.166, 95% CI [0.017, 0.308]). A heterogeneous strong-signal subset (n = 25, r_eval ≥ 0.50) supports personalised calibration as a core design principle. We discuss practical implications: IPA 2.0 is not a sole predictor, but an auditable signal that can gate, rank, and explain adaptive interventions under real-world noise and label–signal asynchrony. Full article
(This article belongs to the Special Issue Multimodal Emotion Recognition and Affective Computing)
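The Fisher-z aggregation reported here is a standard procedure and easy to reproduce. A minimal sketch, assuming per-subject correlations are pooled via the mean in z-space with a normal-approximation confidence interval (one common variant; the paper's exact CI construction may differ):

```python
import numpy as np
from scipy.stats import norm

def fisher_z_aggregate(r_values, alpha: float = 0.05):
    """Average per-subject correlations in Fisher-z space and
    back-transform; returns (pooled r, (CI low, CI high))."""
    r = np.clip(np.asarray(r_values, dtype=float), -0.999999, 0.999999)
    z = np.arctanh(r)                       # Fisher z-transform
    z_mean = z.mean()
    se = z.std(ddof=1) / np.sqrt(len(z))    # SE of the mean in z-space
    half = norm.ppf(1.0 - alpha / 2.0) * se
    return float(np.tanh(z_mean)), (float(np.tanh(z_mean - half)),
                                    float(np.tanh(z_mean + half)))
```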

16 pages, 668 KB  
Article
Evaluation of a Company’s Media Reputation Based on the Articles Published on News Portals
by Algimantas Venčkauskas, Vacius Jusas and Dominykas Barisas
Appl. Sci. 2026, 16(4), 1987; https://doi.org/10.3390/app16041987 - 17 Feb 2026
Viewed by 633
Abstract
A company’s reputation is an important, intangible asset, which is heavily influenced by media reputation. We developed a method to measure a company’s reputation based on sentiments detected in online articles. The sentiment of each sentence was evaluated and categorized into one of three polarities: positive, negative, or neutral. Then, we developed another method to assess a company’s media reputation using all available online articles about the company. The company’s media reputation is presented as a tuple consisting of its media reputation score on a scale from 0 to 100, the number of articles related to the company, and the margin of error. Experiments were conducted using articles written in Lithuanian and published on major news portals. We used two different tools to assess the sentiments of the articles: Stanford CoreNLP v.4.5.10, combined with the Google API, and the pre-trained transformer model XLM-RoBERTa. The Google API was used for translation into English, as Stanford CoreNLP does not support the Lithuanian language. The results obtained were compared with those of existing methods, based on the coefficients of media endorsement and media favorableness, showing that the results of the proposed method are less moderate than the coefficient of media favorableness and less extreme than the coefficient of media endorsement. Full article
(This article belongs to the Special Issue Multimodal Emotion Recognition and Affective Computing)
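As a rough, hypothetical illustration of how sentence-level polarities might be rolled up into the tuple described above (a score on a 0–100 scale, the article count, and a margin of error); the paper's actual weighting may differ:

```python
import math

def media_reputation(article_polarities):
    """article_polarities: one list per article of per-sentence labels,
    with +1 = positive, 0 = neutral, -1 = negative."""
    per_article = [sum(s) / len(s) for s in article_polarities if s]  # each in [-1, 1]
    n = len(per_article)
    mean = sum(per_article) / n
    score = 50.0 * (mean + 1.0)                    # rescale [-1, 1] -> [0, 100]
    var = (sum((x - mean) ** 2 for x in per_article) / (n - 1)) if n > 1 else 0.0
    margin = 1.96 * 50.0 * math.sqrt(var / n)      # 95% half-width on the 0-100 scale
    return score, n, margin
```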

19 pages, 2266 KB  
Article
Affective EEG Decoding Generalizes Across Colormap and Exposure Time
by Andrea De Cesarei, Andrea Belluzzi, Vera Ferrari and Maurizio Codispoti
Appl. Sci. 2026, 16(4), 1779; https://doi.org/10.3390/app16041779 - 11 Feb 2026
Cited by 1 | Viewed by 517
Abstract
Viewing emotional pictures modulates electrocortical activity during the first second, with functional properties that reflect the type of processing that is being carried out. Recently, the investigation of electrocortical activity has been aided by machine learning techniques, such as multivariate pattern analysis (MVPA). Building on previous studies that used MVPA to discriminate between emotional and neutral stimuli, here we investigate electroencephalographic (EEG) changes while a sample of n = 15 participants viewed emotional and neutral scenes that could be presented in color or in grayscale, and for either a short (24 ms) or a long (6 s) exposure time. A linear classifier was used to classify EEG patterns as resulting from the viewing of emotional (pleasant, unpleasant) vs. neutral scenes, and to assess the extent to which scalp activation patterns are specific to the perceptual conditions under which a scene is viewed (i.e., color or grayscale, short vs. long exposure time) or generalize across viewing conditions. We observed that emotional content could be significantly decoded through MVPA, with earlier classification onset for pleasant-neutral vs. unpleasant-neutral classification. Moreover, this classification generalized across perceptual conditions, indicating that the symbolic meaning of natural scenes drives the emotional modulation of scalp activity. These results further indicate that, within the first second after the onset of natural scenes, emotional states can be decoded from the EEG signal, and that such learning can be applied to flexibly classify emotional states under perceptually different conditions. Full article
(This article belongs to the Special Issue Multimodal Emotion Recognition and Affective Computing)
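The cross-condition generalization test at the core of this study has a simple structure: fit a linear classifier on trials from one viewing condition and score it on another. A minimal scikit-learn sketch with illustrative data shapes (not the authors' pipeline):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def cross_condition_decoding(X_train, y_train, X_test, y_test) -> float:
    """X_*: (n_trials, n_channels * n_timepoints) EEG feature arrays;
    y_*: 1 = emotional scene, 0 = neutral scene.
    Train on one condition (e.g., color / 6 s) and test on another
    (e.g., grayscale / 24 ms) to measure generalization."""
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)  # above-chance accuracy => generalization
```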

21 pages, 2592 KB  
Article
Parsing Emotion in Classical Music: A Behavioral Study on the Cognitive Mapping of Key, Tempo, Complexity and Energy in Piano Performance
by Alice Mado Proverbio, Chang Qin and Miloš Milovanović
Appl. Sci. 2026, 16(3), 1371; https://doi.org/10.3390/app16031371 - 29 Jan 2026
Viewed by 1054
Abstract
Music conveys emotion through a complex interplay of structural and acoustic cues, yet how these features map onto specific affective interpretations remains a key question in music cognition. This study explored how listeners, unaware of contextual information, categorized 110 emotionally diverse excerpts—varying in key, tempo, note density, acoustic energy, and expressive gestures—from works by Bach, Beethoven, and Chopin. Twenty classically trained participants labeled each excerpt using six predefined emotional categories. Emotion judgments were analyzed within a supervised multi-class classification framework, allowing systematic quantification of recognition accuracy, misclassification patterns, and category reliability. Behavioral responses were consistently above chance, indicating shared decoding strategies. Quantitative analyses of live performance recordings revealed systematic links between expressive features and emotional tone: high-arousal emotions showed increased acoustic intensity, faster gestures, and dominant right-hand activity, while low-arousal states involved softer dynamics and more left-hand involvement. Major-key excerpts were commonly associated with positive emotions—“Peacefulness” with slow tempos and low intensity, “Joy” with fast, energetic playing. Minor-key excerpts were linked to negative/ambivalent emotions, aligning with prior research on the emotional complexity of minor modality. Within the minor mode, a gradient of arousal emerged, from “Melancholy” to “Power,” the latter marked by heightened motor activity and sonic force. Results support an embodied view of musical emotion, where expressive meaning emerges through dynamic motor-acoustic patterns that transcend stylistic and cultural boundaries. Full article
(This article belongs to the Special Issue Multimodal Emotion Recognition and Affective Computing)
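The "recognition accuracy, misclassification patterns, and category reliability" quantified in this abstract are what a confusion matrix over the six emotion labels yields. A minimal sketch; the label set passed in is a placeholder, since only four of the six category names are quoted above:

```python
from sklearn.metrics import confusion_matrix

def per_category_accuracy(y_true, y_pred, labels):
    """Confusion matrix over the predefined emotion labels plus
    per-category recall (diagonal / row sum); off-diagonal cells
    expose systematic misclassification patterns."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    recall = cm.diagonal() / cm.sum(axis=1)  # assumes every label occurs in y_true
    return dict(zip(labels, recall)), cm
```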
