Article

Facial and Speech-Based Emotion Recognition Using Sequential Pattern Mining

Younghun Song and Kyungyong Chung

1 Department of Computer Science, Kyonggi University, Suwon-si, Gyeonggi-do 16227, Republic of Korea
2 Division of AI Computer Science and Engineering, Kyonggi University, Suwon-si, Gyeonggi-do 16227, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2025, 14(20), 4015; https://doi.org/10.3390/electronics14204015
Submission received: 28 August 2025 / Revised: 2 October 2025 / Accepted: 6 October 2025 / Published: 13 October 2025
(This article belongs to the Special Issue Application of Data Mining in Social Media)

Abstract

We propose a multimodal emotion recognition framework that integrates facial expressions and speech, where the textual modality is derived from transcribed speech, with a particular focus on modeling the continuous changes and transitions of emotional states during conversation. Existing studies have primarily relied on a single modality (text or facial expressions) and typically perform static emotion classification at isolated time points, which limits their ability to capture abrupt emotional shifts or the structural patterns of emotional flow within a dialogue. To address these limitations, this paper uses the MELD dataset to construct emotion sequences ordered by utterance and introduces an analytical approach based on Sequential Pattern Mining (SPM). Facial expressions are detected with DeepFace, while speech is transcribed with Whisper and passed through a BERT-based classifier to infer emotions. The proposed method fuses the multimodal results through a weighted voting scheme to generate an emotion label for each utterance. The resulting sequences are then used to construct an emotion transition matrix, apply change-point detection, perform SPM, and train an LSTM-based classification model that predicts the overall emotional flow of the dialogue. This approach moves beyond single-point judgments by capturing the contextual flow and dynamics of emotions, and experimental validation shows superior performance compared with existing methods.
Keywords: multimodal emotion recognition; sequential pattern mining; emotion transition; change-point detection; LSTM
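
The abstract outlines a fusion-then-sequence pipeline: per-utterance emotion scores from two modalities are combined by weighted voting, the fused labels form a sequence, and that sequence feeds a transition matrix and a sequential-pattern step. The following is a minimal, self-contained Python sketch of that stage only. The probability dictionaries, the fusion weights w_face and w_text, the label set, and the bigram miner (a stand-in for a full SPM algorithm such as PrefixSpan) are all illustrative assumptions, not values or components taken from the paper; in the actual system the inputs would come from DeepFace and a Whisper-to-BERT pipeline.

# Sketch of the fusion-and-sequence stage described in the abstract.
# Modality outputs below are hard-coded stand-ins for DeepFace (face)
# and Whisper->BERT (speech transcript) predictions; weights and label
# set are illustrative assumptions, not the paper's values.
from collections import Counter, defaultdict
from itertools import combinations

EMOTIONS = ["neutral", "joy", "sadness", "anger", "surprise", "fear", "disgust"]

def fuse(face_probs, text_probs, w_face=0.4, w_text=0.6):
    """Weighted voting over per-modality class probabilities -> one label."""
    scores = {e: w_face * face_probs.get(e, 0.0) + w_text * text_probs.get(e, 0.0)
              for e in EMOTIONS}
    return max(scores, key=scores.get)

def transition_matrix(labels):
    """Count transitions between consecutive utterance labels,
    then row-normalise the counts into transition probabilities."""
    counts = defaultdict(Counter)
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}

def frequent_bigrams(sequences, min_support=2):
    """Toy sequential-pattern step: ordered label pairs that occur in at
    least `min_support` dialogues (a stand-in for a full SPM miner)."""
    support = Counter()
    for seq in sequences:
        seen = set()
        for i, j in combinations(range(len(seq)), 2):  # i < j preserves order
            seen.add((seq[i], seq[j]))
        support.update(seen)
    return {pattern: s for pattern, s in support.items() if s >= min_support}

if __name__ == "__main__":
    # Hypothetical (face_probs, text_probs) pairs for one three-utterance dialogue.
    dialogue = [
        ({"neutral": 0.7, "joy": 0.3}, {"neutral": 0.6, "joy": 0.4}),
        ({"joy": 0.8, "surprise": 0.2}, {"joy": 0.5, "neutral": 0.5}),
        ({"anger": 0.6, "neutral": 0.4}, {"anger": 0.7, "disgust": 0.3}),
    ]
    labels = [fuse(f, t) for f, t in dialogue]
    print("fused sequence:", labels)                     # ['neutral', 'joy', 'anger']
    print("transitions:", transition_matrix(labels))
    print("patterns:", frequent_bigrams([labels, ["neutral", "joy", "anger"]]))

Row-normalising the transition counts yields the per-emotion transition probabilities that, in the paper's full pipeline, the change-point detection and LSTM stages would consume; counting each pattern at most once per dialogue mirrors the standard support definition used in sequential pattern mining.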

Share and Cite

MDPI and ACS Style

Song, Y.; Chung, K. Facial and Speech-Based Emotion Recognition Using Sequential Pattern Mining. Electronics 2025, 14, 4015. https://doi.org/10.3390/electronics14204015

AMA Style

Song Y, Chung K. Facial and Speech-Based Emotion Recognition Using Sequential Pattern Mining. Electronics. 2025; 14(20):4015. https://doi.org/10.3390/electronics14204015

Chicago/Turabian Style

Song, Younghun, and Kyungyong Chung. 2025. "Facial and Speech-Based Emotion Recognition Using Sequential Pattern Mining" Electronics 14, no. 20: 4015. https://doi.org/10.3390/electronics14204015

APA Style

Song, Y., & Chung, K. (2025). Facial and Speech-Based Emotion Recognition Using Sequential Pattern Mining. Electronics, 14(20), 4015. https://doi.org/10.3390/electronics14204015

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
