Proceeding Paper

Emotion Recognition Using Electrocardiogram Trajectory Variation in Attention Networks †

by Sung-Nien Yu 1,2,*, Chia-Wei Cheng 1 and Yu Ping Chang 1

1 Department of Electrical Engineering, National Chung Cheng University, Chiayi County 621301, Taiwan
2 Advanced Institute of Manufacturing with High-Tech Innovations (AIM-HI), National Chung Cheng University, Chiayi County 621301, Taiwan
* Author to whom correspondence should be addressed.
Presented at the 8th International Conference on Knowledge Innovation and Invention 2025 (ICKII 2025), Fukuoka, Japan, 22–24 August 2025.
Eng. Proc. 2025, 120(1), 17; https://doi.org/10.3390/engproc2025120017
Published: 2 February 2026
(This article belongs to the Proceedings of 8th International Conference on Knowledge Innovation and Invention)

Abstract

Emotions are classified into the valence dimension (positive and negative) and the arousal dimension (low and high). Using electrocardiogram (ECG) phase space diagrams and a deep learning approach, emotional states were identified in this study. The DREAMER database was utilized for training and testing the classification model developed. We examined different ECG phase space parameters and compared different deep learning models, including the Visual Geometry Group and Residual networks, and a simple convolutional neural network (CNN) with attention modules. Among the models, a simple four-layer CNN integrated with a convolutional block attention module showed the best performance. Experimental results indicate that the model achieved an accuracy of 87.89% for the valence dimension and 91.79% for the arousal dimension. Compared with existing models, the developed model demonstrates superior performance in emotion recognition. Emotional changes produce noticeable variations in the trajectory patterns of ECG phase space diagrams, which enhance the model’s ability to recognize emotions, even when using relatively simple networks.

1. Introduction

Emotion is a complex psychological state that arises in response to internal or external stimuli, leading to distinct physiological and psychological reactions. Emotions are classified in various ways and are combined to form more complex emotional states. In 1987, Shaver et al. identified 135 emotions and categorized them into six basic types: love, joy, surprise, anger, sadness, and fear [1]. However, there is no universally accepted definition of basic emotions. Scholars studying emotions establish their own classification criteria based on their research objectives. For instance, Ekman and Friesen used facial expressions to define six primary emotions: happiness, sadness, anger, surprise, disgust, and fear [2].
Beyond categorizing emotions based on basic types, Osgood et al. [3] introduced a dimensional approach, dividing emotions into the arousal and valence dimensions. Arousal represents the intensity of emotions, ranging from extreme excitement to deep calmness, while valence reflects the positivity or negativity of emotional experiences. Similarly, in 1980, Russell and Pratt proposed a two-dimensional affective model, defining emotions along the axes of arousal and pleasure [4]. To date, the bipolar model remains the most widely used framework for understanding emotions.
Various methods have been proposed to identify emotions based on cues such as facial expressions, voice tone, or physiological signals. Physiological signals are widely used because they reflect the internal state of the body and are difficult to fake; they are therefore more likely to reveal the true emotional state. Emotions are thought to be closely linked to the brain’s emotional regulation centers. Moreover, they are expressed through the antagonistic activities of the sympathetic and parasympathetic branches of the autonomic nervous system, which regulate physiological responses such as heart rate, blood pressure, respiration, and perspiration (which affects skin conductance).
An electrocardiogram (ECG) is widely used in emotion recognition due to its ease of access and its capability of revealing emotional states through heart rate and heart rate variability (HRV). Numerous researchers have explored both traditional machine learning and advanced deep learning methods to classify emotions using ECG signals. Katsigiannis and Ramzan introduced the DREAMER database and conducted emotion recognition experiments by extracting HRV features from ECG signals [5]. They classified emotions in the valence and arousal dimensions using a support vector machine (SVM) classifier, achieving maximum accuracies of 62.37% for both dimensions. Similarly, Hasnul et al. employed the TEAP and AuBT feature extraction toolboxes for emotion recognition with an SVM classifier, obtaining best accuracies of 65.80% for valence and 57.10% for arousal on the DREAMER dataset [6]. Sarkar and Etemad proposed a self-supervised multi-task convolutional neural network (CNN) for emotion recognition using ECG signals [7]. Using the DREAMER dataset, they achieved classification accuracies of 85.00% for the valence dimension and 85.90% for the arousal dimension. Fan et al. [8] introduced a deep CNN-based approach for ECG-based emotion recognition, integrating a convolutional block attention module (CBAM) into the CNN architecture. Their model, applied to the DREAMER dataset, achieved classification accuracies of 87.40% for valence and 87.70% for arousal.
ECG-based emotion recognition generally follows one of two approaches: feeding raw ECG signals directly into deep learning models or using pre-extracted ECG features as input to classifiers. With advances in AI, deep learning models, particularly CNNs, have demonstrated superior performance compared with traditional machine learning methods such as SVM classifiers in emotion recognition tasks. Furthermore, incorporating attention mechanisms such as CBAM enhances CNNs’ ability to differentiate emotional states. Chan et al. utilized ECG phase space diagrams for individual identification [9]: they reconstructed ECG beats as trajectories in phase space and extracted features from these diagrams for identity recognition.
Based on the previous results, we applied the phase space diagram to emotion recognition. Given that respiratory patterns and HRV vary with emotional states, we used longer ECG recordings containing multiple beats. This approach allowed for the detection of significant trajectory variations in phase space diagrams, capturing the dynamic changes in ECG beats associated with emotional states. In this study, we first converted extended ECG signals into phase space representations to amplify beat-to-beat variability across different emotional states. We then leveraged the feature extraction capabilities of CNNs, further enhancing their performance with the attention module CBAM to achieve advanced emotion recognition based on ECG signals.

2. Materials and Methods

2.1. Database

The DREAMER database was used for emotion recognition [5]. The dataset includes data from 25 participants who were exposed to 18 movie clips, each lasting between 65 and 393 s, to elicit emotional responses. Both electroencephalography (EEG) and ECG signals were recorded during the experiments. Due to incomplete data from two participants, data from the remaining 23 participants (14 males and 9 females, aged 22 to 33 years) were included in this study. Following each movie clip, the participants rated their emotional responses on a five-point scale for valence, arousal, and dominance. In this study, we used Lead II ECG signals, sampled at 256 Hz, along with the valence and arousal ratings from the DREAMER dataset to train the deep learning models.

2.2. Signal Preprocessing

ECG signals are affected by various noise sources during data collection, including baseline drift, muscle artifacts, and environmental interference. To mitigate these effects, we followed the recommendations of NeuroKit2 [10] and applied a fifth-order Butterworth high-pass filter at 0.5 Hz to remove low-frequency baseline drift. Additionally, a second-order Butterworth notch filter at 50 Hz was used to eliminate power line noise. After filtering, Min-Max normalization was applied to scale the ECG signals to a range of 0 to 1, ensuring consistency across the dataset. Since the DREAMER dataset contains signals of varying lengths, we segmented the ECG signals into 30 s intervals. To balance the distribution of training samples across different emotional categories, varying levels of overlap were introduced between segments.
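The preprocessing chain above can be sketched with SciPy as follows. This is a minimal illustration, not the authors' code: the notch quality factor `Q=30` and the 50% segment overlap are illustrative assumptions (the paper states only that overlap was varied per class to balance the sample counts).

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 256  # DREAMER ECG sampling rate (Hz)

def preprocess_ecg(ecg, fs=FS):
    """Filter and min-max normalize a raw ECG trace, as described above."""
    # 5th-order Butterworth high-pass at 0.5 Hz removes baseline drift.
    b_hp, a_hp = butter(5, 0.5, btype="highpass", fs=fs)
    ecg = filtfilt(b_hp, a_hp, ecg)
    # 2nd-order (biquad) notch at 50 Hz suppresses power-line noise.
    # Q=30 is an assumed quality factor, not reported in the paper.
    b_n, a_n = iirnotch(w0=50.0, Q=30.0, fs=fs)
    ecg = filtfilt(b_n, a_n, ecg)
    # Min-Max normalization to [0, 1].
    return (ecg - ecg.min()) / (ecg.max() - ecg.min())

def segment(ecg, fs=FS, seconds=30, overlap=0.5):
    """Cut a record into fixed-length windows with fractional overlap."""
    win = seconds * fs
    step = int(win * (1.0 - overlap))
    return [ecg[i:i + win] for i in range(0, len(ecg) - win + 1, step)]
```

A 90 s record at 256 Hz with 50% overlap yields five 30 s segments; in practice the overlap would be tuned per emotional class as Table 1 implies.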

2.3. Phase Space Reconstruction (PSR)

PSR employs time-delay embedding to transform a one-dimensional signal into a two-dimensional phase space diagram. The mathematical formulation for converting a one-dimensional ECG signal into a two-dimensional phase space representation is given as follows.
X(t) = [x(t), x(t + τ)]
where x(t) represents the value of the original ECG signal at time t, and τ denotes the time delay.
Chan et al. reconstructed phase space diagrams using single ECG beats for individual identification, determining a time delay of 20 ms to expand the P, QRS, and T waves in phase space [9]. As a result, the QRS wave was mapped to the outer ring, while the T-wave and P-wave formed the inner and smaller rings of the diagram, as illustrated in Figure 1a,b.
In this study, our objective was to distinguish emotional states, which are closely associated with heart rate variability (HRV), rather than to identify individual ECG patterns. Therefore, instead of using single ECG beats, we generated phase space diagrams from 30 s ECG segments. This approach ensured that each segment contained multiple heartbeats, allowing HRV variations to be effectively captured through changes in the phase space trajectory across ECG beats, as shown in Figure 1c,d.
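The delay embedding above is a one-liner in NumPy; a minimal sketch (each row of the output is one point [x(t), x(t + τ)] of the trajectory):

```python
import numpy as np

def phase_space(ecg, delay_samples=7):
    """Two-dimensional time-delay embedding X(t) = [x(t), x(t + tau)].

    delay_samples=7 corresponds to tau ~= 27.3 ms at 256 Hz, the delay
    found to work best in Section 3.1.
    """
    x = np.asarray(ecg)
    # Pair each sample with the sample delay_samples steps later.
    return np.column_stack([x[:-delay_samples], x[delay_samples:]])
```

Plotting the first column against the second (e.g., with `matplotlib`) reproduces the trajectory diagrams of Figure 1.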

2.4. Deep Learning Model

2.4.1. Visual Geometry Group Network (VGG)

The VGG architecture [11] is a deep CNN designed for image recognition tasks. In this study, we adopted the VGG19 model, which consists of 16 convolutional layers, 3 fully connected layers, and 5 max pooling layers. To enhance computational efficiency and accelerate both model training and decision-making, we simplified the fully connected layers from three to one.

2.4.2. Residual Network (ResNet)

The ResNet architecture [12] addresses the challenges of vanishing and exploding gradients in deep neural networks by incorporating residual blocks. These blocks enable the model to bypass gradient-vanishing layers and focus on learning residual mappings, improving the training process. In this study, we utilized the feature extraction blocks of ResNet while simplifying the fully connected layers by reducing the number of neurons, thereby facilitating efficient training.

2.4.3. A Simple CNN

A simple CNN model consisting of three convolutional layers, three max pooling layers, and one fully connected layer was proposed for comparison. Since most of the computational complexity lies in the three convolutional layers and the fully connected layer, we denote this four-layer model as CNN4. We then integrated an attention module into CNN4 and evaluated its performance against deeper models, such as VGG and ResNet, for emotion recognition.

2.4.4. CBAM

CBAM is a lightweight and effective attention mechanism for CNNs that enhances feature representation by sequentially applying channel and spatial attention, as depicted in Figure 2 [13]. Channel attention focuses on identifying what features are important by analyzing global average and max-pooled descriptors through a shared multilayer perceptron, while spatial attention emphasizes where the important information is by using pooled spatial descriptors followed by a convolution. By refining features along both dimensions, CBAM improves model performance with minimal computational overhead and is easily integrated into existing CNN architectures.
In this study, we integrated CBAM into the simple CNN4 model to improve its feature extraction capabilities. The resulting CNN4 + CBAM architecture, illustrated in Figure 2, captures ECG phase space features through the shallow CNN, while CBAM enhances performance by assigning weights to different channels (channel attention) and emphasizing relevant regions within the channels (spatial attention).
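The two attention stages can be sketched in NumPy to make the data flow concrete. This is a simplified illustration of CBAM [13], not the trained model: the MLP weights `w1`/`w2` stand in for learned parameters, and the spatial stage here combines the pooled maps directly instead of passing them through the paper's learned 7×7 convolution.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: feature map (C, H, W); w1 (C/r, C) and w2 (C, C/r): shared MLP."""
    avg = x.mean(axis=(1, 2))        # global average pooling -> (C,)
    mx = x.max(axis=(1, 2))          # global max pooling -> (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)  # shared 2-layer MLP (ReLU)
    scale = sigmoid(mlp(avg) + mlp(mx))           # per-channel weights in (0, 1)
    return x * scale[:, None, None]  # reweight "what" is important

def spatial_attention(x):
    """Simplified: sum of channel-pooled maps replaces the learned 7x7 conv."""
    avg = x.mean(axis=0)             # pool across channels -> (H, W)
    mx = x.max(axis=0)
    return x * sigmoid(avg + mx)[None, :, :]  # reweight "where" is important
```

Both stages only rescale the input by factors in (0, 1), which is why CBAM adds negligible computational overhead to the host CNN.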

2.4.5. Network Parameter

VGG and ResNet models were initialized with pre-trained weights from ImageNet, whereas the CNN and CNN + CBAM models were trained from scratch. Training was conducted for 300 epochs with early stopping. The initial learning rate was set to 0.001 and adjusted using a rate decay policy, halving the rate if validation loss did not improve for a predefined number of epochs. Cross-entropy was used as the loss function, and the Adam optimizer was employed for training.
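The rate-decay policy can be sketched as a small plateau check; the patience of 10 epochs and the 1e-6 floor are illustrative assumptions (the paper states only that the rate was halved after a predefined number of non-improving epochs).

```python
def halve_on_plateau(lr, val_losses, patience=10, floor=1e-6):
    """Halve the learning rate when validation loss has not improved
    for `patience` epochs. `patience` and `floor` are assumed values."""
    if len(val_losses) > patience and min(val_losses[-patience:]) >= min(val_losses[:-patience]):
        lr = max(lr * 0.5, floor)
    return lr
```

Frameworks such as PyTorch ship an equivalent scheduler (`ReduceLROnPlateau` with `factor=0.5`), which would be the idiomatic choice in practice.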

2.5. Experimental Design

After preprocessing, each data record was divided into 30 s segments. To account for variations in record length, the overlap ratio was adjusted to balance the number of segments across the two classes, as summarized in Table 1. The ECG segments were then transformed into phase space diagrams, converted to grayscale with a black background and white trajectories, and resized to 256 × 256 pixels for network input. For binary emotion classification, valence levels 1 and 2 were labeled as negative, while levels 3 to 5 were labeled as positive. Similarly, arousal levels 1 and 2 were categorized as low arousal, and levels 3 to 5 as high arousal. Consistent with related studies, we adopted an 80–10–10 split for training, validation, and testing during the cross-validation process.
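The labeling and split rules above reduce to a few lines. A minimal sketch (a sequential split for brevity; the actual cross-validation would shuffle or rotate folds):

```python
def binarize(rating, positive_from=3):
    """Map a 1-5 rating to a binary label: 1-2 -> 0 (negative/low arousal),
    3-5 -> 1 (positive/high arousal), per the scheme described above."""
    return int(rating >= positive_from)

def split_80_10_10(items):
    """80-10-10 train/validation/test split used during cross-validation."""
    n = len(items)
    n_tr, n_va = int(0.8 * n), int(0.1 * n)
    return items[:n_tr], items[n_tr:n_tr + n_va], items[n_tr + n_va:]
```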
The confusion matrix is a widely used tool for evaluating a model’s predictions against actual outcomes. In binary classification, it categorizes results into four groups: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). From these counts, the key performance metrics (accuracy, precision, recall, F1 score, and specificity) were calculated to assess classification performance.
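The five metrics follow directly from the four confusion-matrix counts:

```python
def metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # a.k.a. sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    return accuracy, precision, recall, f1, specificity
```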

3. Results and Discussion

3.1. Influence of Time Delay and Line Width in Phase Space

To assess the impact of time delay and line width on ECG phase space diagrams for emotion recognition, we evaluated different phase space parameters. Specifically, we tested three time delays: 7.8 ms (2 sample points), 27.3 ms (7 sample points), and 46.9 ms (12 sample points), as illustrated in Figure 3a–c. Additionally, we examined trajectory line widths of 0.1, 1, and 5 pt, as shown in Figure 3d–f.
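The sample-point and millisecond values correspond directly via the 256 Hz sampling rate; a quick conversion confirms the pairs listed above:

```python
FS = 256  # DREAMER ECG sampling rate (Hz)

def delay_ms(samples, fs=FS):
    """Convert a time delay in sample points to milliseconds."""
    return 1000.0 * samples / fs
```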
The recognition rates for the three time delays are summarized in Table 2. The highest accuracy was achieved with a 27.3 ms delay (7 sample points), aligning with the findings of Chan et al. [9], who recommended a time delay between 20 and 28 ms to effectively unfold ECG beats in phase space for improved individual identification. To further assess the impact of trajectory line width on recognition performance in phase space, we fixed the time delay at 27.3 ms and repeated the experiments. As shown in Table 2, using thinner lines (0.1 pt) resulted in higher recognition accuracy. A possible explanation is that thinner lines preserve the fine cyclic patterns of the trajectories with minimal overlap, allowing for clearer visualization of phase space dynamics. In contrast, thicker lines may obscure these patterns by increasing trajectory overlap, thereby reducing recognition effectiveness.

3.2. Comparison of Different Neural Network Architectures

Based on these results, we selected the optimal time delay of 27.3 ms (7 sample points) and a line width of 0.1 pt to reconstruct phase space diagrams for model comparison in emotion recognition. The results of binary classification for valence, summarized in Table 3, show that the CNN4 + CBAM model achieved the highest performance, with an accuracy of 87.89%, a recall of 84.58%, and a specificity of 91.12%. Notably, the simple CNN4 alone yielded only suboptimal results, and its performance improved significantly when integrated with CBAM. Despite its much shallower structure, the CNN4 + CBAM combination outperformed both the VGG and ResNet models. Similarly, the results for arousal classification (Table 4) indicate that the CNN4 + CBAM model again outperformed all others, achieving an accuracy of 91.79%, a recall of 88.00%, and a specificity of 95.47%. As with valence classification, the VGG models and the simple CNN4 performed better than the ResNets, and the CNN4 + CBAM combination again surpassed its competitors.

3.3. Comparison with Other Studies

We compared the CNN4 + CBAM architecture with four representative methods discussed in the Introduction: those proposed by Katsigiannis and Ramzan [5], Hasnul et al. [6], Sarkar and Etemad [7], and Fan et al. [8]. Since all of these studies used the DREAMER dataset for emotion recognition in both the valence and arousal dimensions, their performance can be compared fairly. The comparative results are summarized in Table 5. The CNN4 + CBAM model outperformed all other approaches across all evaluation metrics. In the two-class classification tasks, it achieved the highest accuracy of 87.89% for valence, with an F1 score of 87.35%, and the best accuracy of 91.79% for arousal, with an F1 score of 91.35%, on the DREAMER dataset.

4. Conclusions

We developed a deep learning approach leveraging ECG phase space diagrams for emotion recognition. The results demonstrate that simpler architectures, integrated with the CBAM attention module, show excellent performance, reducing the need for deep or complex networks like VGG and ResNet. These findings suggest that using ECG phase space diagrams highlights emotional features. The natural periodicity of ECG signals, when transformed into phase space, creates trajectory patterns that reflect respiratory effects and heart rate variability. Emotional changes produce noticeable variations in these patterns, which enhances the model’s ability to recognize emotions, even when using relatively simpler and shallower network architectures.

Author Contributions

Conceptualization, methodology, supervision, writing—review and editing and funding acquisition, S.-N.Y.; software, formal analysis and writing—original draft preparation, C.-W.C.; validation and writing—review and editing, Y.P.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science and Technology Council, Taiwan, grant numbers: 108-2221-E-194-034-MY3 and 113-2221-E-194-005-MY3.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data used in this study are publicly available on Kaggle: https://www.kaggle.com/datasets/phhasian0710/dreamer (accessed on 18 July 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Shaver, P.; Schwartz, J.; Kirson, D.; O’Connor, C. Emotion knowledge: Further exploration of a prototype approach. J. Pers. Soc. Psychol. 1987, 52, 1061–1086. [Google Scholar] [CrossRef] [PubMed]
  2. Ekman, P.; Friesen, W.V. Constants across cultures in the face and emotion. J. Pers. Soc. Psychol. 1971, 17, 124–129. [Google Scholar] [CrossRef] [PubMed]
  3. Osgood, C.E.; Suci, J.G.; Tannenbaum, P.H. The Measurement of Meaning; The University of Illinois Press: Urbana, IL, USA, 1957. [Google Scholar]
  4. Russell, J.A.; Pratt, G. A description of the affective quality attributed to environments. J. Pers. Soc. Psychol. 1980, 38, 311–322. [Google Scholar] [CrossRef]
  5. Katsigiannis, S.; Ramzan, N. DREAMER: A database for emotion recognition through EEG and ECG signals from wireless low-cost off-the-shelf devices. IEEE J. Biomed. Health Inform. 2018, 22, 98–107. [Google Scholar] [CrossRef] [PubMed]
  6. Hasnul, M.A.; Ab Aziz, N.A.; Abd Aziz, A. Evaluation of TEAP and AuBT as ECG’s Feature Extraction Toolbox for Emotion Recognition System. In Proceedings of the 2021 IEEE 9th Conference on Systems, Process and Control (ICSPC 2021), Malacca, Malaysia, 10–11 December 2021; pp. 52–57. [Google Scholar]
  7. Sarkar, P.; Etemad, A. Self-supervised ECG representation learning for emotion recognition. IEEE Trans. Affect. Comput. 2022, 13, 1541–1554. [Google Scholar] [CrossRef]
  8. Fan, T.; Qiu, S.; Wang, Z.; Zhao, H.; Jiang, J.; Wang, Y.; Xu, J.; Sun, T.; Jiang, N. A new deep convolutional neural network incorporating attentional mechanisms for ECG emotion recognition. Comput. Biol. Med. 2023, 159, 106938. [Google Scholar] [CrossRef] [PubMed]
  9. Chan, H.-L.; Chang, H.-W.; Hsu, W.-Y.; Huang, P.-J.; Fang, S.-C. Convolutional neural network for individual identification using phase space reconstruction of electrocardiogram. Sensors 2023, 23, 3164. [Google Scholar] [CrossRef] [PubMed]
  10. Makowski, D.; Pham, T.; Lau, Z.J.; Brammer, J.C.; Lespinasse, F.; Pham, H.; Schölzel, C.; Chen, S.H.A. NeuroKit2: A Python toolbox for neurophysiological signal processing. Behav. Res. Methods 2021, 53, 1689–1696. [Google Scholar] [CrossRef] [PubMed]
  11. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015; pp. 1–14. [Google Scholar]
  12. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  13. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
Figure 1. Reconstruction of ECG in phase space: (a) single ECG beat [9]; (b) trajectories of (a); (c) a 30 s ECG segment; (d) quasi-periodic trajectories of (c) with variation and fluctuation.
Figure 2. The proposed CNN + CBAM architecture.
Figure 3. Phase space diagrams with different time delays and line widths: (a) delay = 7.8 ms; (b) delay = 27.3 ms; (c) delay = 46.9 ms; (d) line width = 0.1 pt; (e) line width = 1 pt; (f) line width = 5 pt.
Table 1. Number of ECG segments in the experiment.

| Dimension | Valence: Negative | Valence: Positive | Arousal: Low | Arousal: High |
|---|---|---|---|---|
| Raw data (records) | 161 (38.9%) | 253 (61.1%) | 114 (27.5%) | 300 (72.5%) |
| Number of segments | 2576 (50.5%) | 2530 (49.5%) | 3078 (50.6%) | 3000 (49.4%) |
Table 2. Recognition rates of phase space diagrams with different time delays and line widths (line-width experiments used the 27.3 ms delay).

| Dimension | Model | Delay 7.8 ms | Delay 27.3 ms | Delay 46.9 ms | Width 0.1 pt | Width 1 pt | Width 5 pt |
|---|---|---|---|---|---|---|---|
| Valence | VGG19 | 72.46% | 79.10% | 77.38% | 79.10% | 74.60% | 74.02% |
| Valence | ResNet50 | 76.95% | 74.80% | 74.53% | 74.80% | 70.70% | 71.09% |
| Arousal | VGG19 | 86.53% | 90.48% | 91.03% | 90.48% | 88.34% | 85.05% |
| Arousal | ResNet50 | 85.38% | 89.00% | 92.61% | 89.00% | 84.23% | 80.95% |
Table 3. Recognition rates in the valence dimension.

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | Specificity (%) |
|---|---|---|---|---|---|
| VGG19 | 79.10 | 78.52 | 79.45 | 78.98 | 78.76 |
| VGG16 | 80.27 | 85.85 | 71.94 | 78.28 | 88.42 |
| ResNet101 | 75.98 | 77.54 | 72.33 | 74.85 | 79.54 |
| ResNet50 | 74.80 | 72.46 | 79.05 | 75.61 | 70.66 |
| CNN4 | 78.91 | 80.59 | 75.49 | 77.96 | 82.24 |
| CNN4 + CBAM | 87.89 | 90.30 | 84.58 | 87.35 | 91.12 |
Table 4. Recognition rates in the arousal dimension.

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1 Score (%) | Specificity (%) |
|---|---|---|---|---|---|
| VGG19 | 90.48 | 92.61 | 87.67 | 90.07 | 93.20 |
| VGG16 | 90.80 | 92.07 | 89.00 | 90.51 | 92.56 |
| ResNet101 | 89.49 | 92.45 | 85.67 | 88.93 | 93.20 |
| ResNet50 | 89.00 | 91.79 | 85.33 | 88.43 | 92.56 |
| CNN4 | 90.80 | 93.57 | 87.33 | 90.34 | 94.17 |
| CNN4 + CBAM | 91.79 | 94.96 | 88.00 | 91.35 | 95.47 |
Table 5. Comparison to other methods in the literature.

| Reference | Model | Valence Accuracy (%) | Valence F1 Score (%) | Arousal Accuracy (%) | Arousal F1 Score (%) |
|---|---|---|---|---|---|
| [5] | SVM | 62.37 | 53.05 | 62.37 | 57.98 |
| [6] | SVM | 65.80 | -- | 57.10 | -- |
| [7] | Self-supervised CNN | 85.00 | 84.50 | 85.90 | 85.90 |
| [8] | DCNN–CBAM | 87.40 | 86.80 | 87.70 | 85.30 |
| This study | CNN4 + CBAM | 87.89 | 87.35 | 91.79 | 91.35 |
