Open Access | Feature Paper | Article
Facial and Speech-Based Emotion Recognition Using Sequential Pattern Mining
by Younghun Song 1 and Kyungyong Chung 2,*
1 Department of Computer Science, Kyonggi University, Suwon-si, Gyeonggi-do 16227, Republic of Korea
2 Division of AI Computer Science and Engineering, Kyonggi University, Suwon-si, Gyeonggi-do 16227, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2025, 14(20), 4015; https://doi.org/10.3390/electronics14204015
Submission received: 28 August 2025 / Revised: 2 October 2025 / Accepted: 6 October 2025 / Published: 13 October 2025
Abstract
We propose a multimodal emotion recognition framework that integrates facial expressions and transcribed speech, with a particular focus on modeling the continuous changes and transitions of emotional states during conversation. Existing studies have primarily relied on a single modality (text or facial expressions) and often perform static emotion classification at isolated time points, which limits their ability to capture abrupt emotional shifts or the structural patterns of emotional flow within dialogues. To address these limitations, this paper uses the MELD dataset to construct emotion sequences ordered by utterance and introduces an analytical approach based on Sequential Pattern Mining (SPM). Facial expressions are detected with DeepFace, while speech is transcribed with Whisper and passed through a BERT-based classifier to infer emotions. The proposed method fuses the multimodal results through a weighted voting scheme to generate an emotion label sequence for each dialogue. These sequences are then used to construct an emotion transition matrix, apply change-point detection, perform SPM, and train an LSTM-based classification model that predicts the overall emotional flow of the dialogue. This approach goes beyond single-point judgments by capturing the contextual flow and dynamics of emotions, and experimental validation demonstrates superior performance compared with existing methods.
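As a concrete illustration of the fusion and sequence-construction steps described in the abstract, the following minimal Python sketch assumes that per-utterance emotion probability vectors from the facial model (e.g., DeepFace) and the text model (e.g., the BERT-based classifier) are already available. The modality weights, function names, and the random stand-in data are illustrative assumptions, not the authors' implementation; the seven-class label set matches MELD.

import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]  # MELD label set

def fuse_utterance(p_face, p_text, w_face=0.4, w_text=0.6):
    # Weighted vote over the two modality distributions for one utterance.
    # w_face and w_text are illustrative, not the paper's tuned values.
    fused = w_face * np.asarray(p_face) + w_text * np.asarray(p_text)
    return EMOTIONS[int(np.argmax(fused))]

def transition_matrix(labels):
    # Row-normalized emotion transition matrix from a per-utterance label sequence.
    idx = {e: i for i, e in enumerate(EMOTIONS)}
    counts = np.zeros((len(EMOTIONS), len(EMOTIONS)))
    for a, b in zip(labels, labels[1:]):
        counts[idx[a], idx[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Example: fuse each utterance, then summarize the dialogue's emotional flow.
# Random Dirichlet vectors stand in for real model outputs.
rng = np.random.default_rng(0)
dialogue = [(rng.dirichlet(np.ones(7)), rng.dirichlet(np.ones(7))) for _ in range(6)]
labels = [fuse_utterance(pf, pt) for pf, pt in dialogue]
print(labels)
print(transition_matrix(labels).round(2))

The downstream steps named in the abstract (change-point detection, SPM, and the LSTM classifier) would then consume the label sequence and transition matrix produced here.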