1. Introduction
The safety of metro operations in urban rail transit largely depends on the driver. Studies have shown that human actions and decision-making are dominant contributors to train accidents [
1]. The tasks of metro drivers demand high concentration and involve repetitive operations. Prolonged engagement in monotonous yet highly responsible duties may lead to the accumulation of negative emotions, which in turn may accelerate the buildup of physiological fatigue [
2]. As an important psychological variable influencing drivers’ cognitive functions and behavioral decision-making, emotions play a critical moderating role in the generation and evolution of fatigue. Negative emotions tend to accelerate fatigue accumulation and reduce work vigilance, whereas positive emotions may, to some extent, buffer the adverse effects of fatigue and promote psychological recovery and task persistence [
3]. Therefore, accurately identifying metro drivers’ emotional states during driving and examining how different emotions are associated with temporal changes in fatigue are essential for understanding fatigue development and improving operational safety in metro systems.
In addition to fatigue, operational monotony and routine task exposure have been widely recognized as critical human risk factors in railway and transportation safety. Long-duration repetitive operations under highly standardized procedures may lead to vigilance decrement, reduced situational awareness, and gradual cognitive disengagement [
4]. Previous studies have shown that routine-based task environments are strongly associated with human error in major transportation accidents, particularly under conditions of low stimulation and sustained attention demand. The interaction between monotony, mental workload, and physiological fatigue has been identified as a key contributor to performance degradation in rail operations [
5].
Although routine-induced vigilance decline has been extensively studied, less attention has been given to the dynamic role of emotional states in modulating fatigue evolution under such monotonous operational contexts. Emotional fluctuations may either exacerbate cognitive depletion under repetitive tasks or buffer fatigue accumulation through affective regulation mechanisms. Therefore, studying the mechanism of how emotions affect fatigue under daily subway driving conditions is crucial for a more comprehensive understanding of safety risks related to human factors.
In addition to operational monotony, environmental factors such as lighting conditions have been shown to significantly influence attention, wakefulness, and emotional states [
6]. Exposure to different light spectra, particularly blue-enriched light, has been associated with melatonin suppression, enhanced alertness, and a modulation of affective responses [
7]. Recent interdisciplinary research combining transportation science and neuroscience has emphasized the role of ambient light in shaping driver emotions and cognitive performance. Variations in light quality and color perception within transportation environments may therefore contribute to emotional fluctuations and vigilance regulation [
8]. However, the present study focuses specifically on the interaction between physiological emotion recognition and fatigue evolution under controlled simulated driving conditions, where lighting parameters were kept stable to minimize external variability. The influence of dynamic lighting environments represents an important direction for future research.
In this study, emotional categories are defined based on the valence dimension of the widely accepted valence–arousal emotional model. From a scientific perspective, positive emotions refer to affective states characterized by high valence, typically associated with pleasant, rewarding, or desirable experiences. Negative emotions correspond to low-valence affective states, which are commonly linked to unpleasant, stressful, or adverse experiences. The neutral emotional state represents an intermediate valence level that is neither distinctly positive nor negative. This classification framework is consistent with established affective neuroscience and emotion recognition research [
9].
Numerous studies have shown that emotional fluctuations can affect drivers’ behavior during driving tasks. Under different emotional states, drivers exhibit different levels of sympathetic nervous system activation, which in turn influences reaction time, decision-making, and overall driving performance [
10]. Moreover, emotional states can also induce driving fatigue, thus leading to increased drowsiness, reduced attention, and diminished reaction capability.
The existing studies on emotion recognition can be mainly divided into three categories [
11]. First, methods based on driving behavior characteristics, such as maximum lane deviation, steering angle, and acceleration–deceleration patterns, capture observable variations in driving performance. However, these behavioral indicators are influenced by multiple factors, including task demands, operational conditions, and individual driving styles. As a result, they cannot directly or specifically represent the underlying emotional valence of the driver and may lack sensitivity in detecting instantaneous emotional fluctuations [
12]. Second, methods based on facial features, such as eye movement trajectories, pupil diameter, and blink frequency, perform well under stable lighting and fixed viewing conditions but are highly susceptible to variations in illumination, camera angle, and occlusion, resulting in limited robustness [
13]. Third, methods based on a driver’s physiological indicators, such as electroencephalogram (EEG), electrocardiogram (ECG), and pulse signals, analyze changes in the nervous system and can therefore objectively and accurately reflect the driver’s emotional state in real time. This third class is regarded as the most promising in current emotion recognition research, as physiological signals are not subject to deliberate human control and authentically reflect genuine emotional states [
14]. Physiological signals characterize the physiological regulation processes underlying the internal states of drivers in a more direct manner, without relying on external environments or overt behavioral manifestations, thereby providing more stable and fine-grained emotional representations in dynamic driving contexts. Therefore, adopting physiological signals as the primary information source for emotion recognition effectively meets the requirements of this study for performing the real-time and reliable detection of emotional variations [
15].
This work presents an emotion recognition model based on a multi-scale convolutional neural network (MSCNN) combined with an attention mechanism to classify drivers’ emotional states using ECG signals. In the proposed method, preprocessed electrocardiogram (ECG) features are used as inputs. The model first extracts local dynamic features at different temporal scales using multi-scale one-dimensional convolutional modules. By employing convolution kernels of different sizes, it captures emotion-related short-term fluctuations as well as long-term trends, thereby enhancing the temporal representational capacity of the extracted features. Subsequently, an attention mechanism is introduced to adaptively allocate feature weights, emphasizing the features that contribute most to emotion discrimination while suppressing noise and redundant information. This improves the model’s discriminative performance and feature interpretability. The feature vectors obtained after multi-scale convolution and attention weighting are fed into fully connected layers, and a Softmax classifier assigns the input signal to one of three emotional states: neutral, positive, and negative. During training, the cross-entropy loss function is adopted as the optimization objective, the Adam optimizer with a learning rate of 1 × 10⁻³ is used for parameter updating, and K-fold cross-validation is incorporated to enhance model stability and generalization capability. The training and evaluation phases integrate multiple visualization analyses, including accuracy metrics, confusion matrices, and temporal visualizations of emotion labels, to comprehensively assess the model’s classification performance and dynamic response capability in emotion recognition for metro driving tasks.
After the training process is completed, the trained model is applied to ECG data collected during metro driving tasks to evaluate its performance in classifying drivers’ emotional states. The data for performing experiments are obtained from a metro driving simulation platform. EEG and ECG signals are synchronously collected from all participants during the experiment. The EEG data are subject to feature extraction for assessing fatigue levels, while the ECG data are used for emotion recognition. The emotion labels predicted by the trained model are temporally aligned with fatigue indicators derived from EEG signals over the same time segments, and statistical analyses are conducted to quantify the effects of different emotional states on fatigue levels. By comparatively analyzing the variation trends of fatigue indicators under different emotional categories, dynamic regulatory characteristics of emotions on the formation and evolution of driver fatigue are extracted, thereby obtaining analytical data on the roles of positive, neutral, and negative emotions during fatigue accumulation and regulation phases. These analytical results not only help in quantifying the magnitude and direction of the effects of emotions on driving fatigue, but also provide data support and theoretical foundations for real-time emotion monitoring and fatigue intervention strategies for urban rail transit drivers. The overall workflow of the proposed method is illustrated in
Figure 1.
2. Materials and Methods
This section describes the experimental design, participant information, simulation procedures, physiological signal acquisition and preprocessing, feature extraction methods, and the architecture and training strategy of the proposed emotion recognition model.
2.1. Participants in the Driving Simulation Experiment
To minimize inter-subject variability and ensure experimental control, 21 participants were recruited for this study. The participants consisted of undergraduate and graduate students aged between 22 and 26 years who had completed at least one semester of structured metro driving simulation training. Although they were not professional metro drivers, the experimental design aimed to investigate the physiological mechanisms underlying the influence of emotional states on fatigue development under standardized and controlled driving conditions.
The relatively homogeneous age range was selected to reduce variability in baseline physiological responses and cognitive performance, thereby strengthening internal validity. While age-related differences in fatigue recovery may exist, the present study focuses on relative fatigue dynamics across emotional states rather than absolute fatigue thresholds.
Considering that the majority of metro drivers are male, all participants in this study were male to maintain demographic consistency and reduce gender-related physiological variability. All participants reported good health with no history of major neurological or cardiovascular disorders and completed the Vienna Test System (VTS) assessment, with results meeting or exceeding standard cognitive performance criteria. Participants were instructed to avoid alcohol, caffeine, and medication for 24 h prior to the experiment and to obtain at least 8 h of sleep the night before testing to minimize confounding factors related to circadian rhythm and physiological fatigue.
2.2. Simulation Driving Experimental Tasks and Procedures
An overview of the experiments and procedures is illustrated in
Figure 2. This study employs an urban rail transit driving simulation system for conducting experiments by simulating metro driving to reproduce real-world operating conditions. Shanghai Metro Line 3 is used to achieve a high-fidelity representation of the actual driving environment. EEG signals are measured using the NeuSen W-series wireless EEG acquisition system. This system ensures high-quality multimodal physiological signal collection during simulated driving tasks. In addition, an ECG acquisition device from the CAPTIV series is used to collect the data of the participants. The data recording and analysis are performed using the CAPTIV-L7000 multimodal physiological and behavioral data acquisition and analysis system.
To control for circadian rhythm variability and minimize inter-day physiological fluctuations, all experiments were conducted within a fixed time window (14:00–16:00). This period corresponds to the well-documented post-lunch circadian dip, during which healthy adults typically exhibit reduced alertness and increased susceptibility to fatigue [
16]. Conducting the experiment during this naturally fatigue-prone interval facilitated the observation of fatigue evolution under emotionally modulated conditions while maintaining standardized experimental control. The driving simulations required participants to operate trains on both elevated and underground sections under simulated clear weather conditions. During operations, the drivers are required to control train acceleration and deceleration to ensure smooth operation without exceeding the speed limits of each track section. Upon arrival at a station, the drivers have to confirm the signals, and the train is manually stopped before passenger boarding. Prior to the experiments, the participants are required to complete a basic information questionnaire reporting their demographic information (e.g., gender, age, and major) as well as whether they had consumed any stimulant or sedative foods or medications. The participants also sign an informed consent form and are briefed on the experimental procedures. All participants had undergone at least one semester of simulated driving training prior to the experiment to reduce differences in familiarity with the driving simulator.
At the beginning of the experiment, the participants are first required to complete the Karolinska sleepiness scale (KSS) and the positive and negative affect schedule (PANAS) to report their subjective alertness and emotional experiences. Afterwards, all the participants practiced the task until all performance metrics reached the predefined criteria and then rested for five minutes. Subsequently, the participants performed a 90 min driving task, during which the KSS questionnaire was administered every 15 min. At the end of the task, the participants completed the KSS and PANAS again to report their subjective experiences under current conditions. The post-experiment KSS and PANAS data are used in subsequent analyses to validate and complement the accuracy of the physiological data assessment results. ECG and EEG signals are continuously measured throughout the entire experiment. In addition, during driving simulations, the participants are restricted from consuming food or beverages, using mobile devices, or communicating with the staff performing the experiments. Participants received standard financial compensation for their time and participation, which was fixed and independent of task performance. The compensation was provided solely as an ethical reimbursement for participation and was not contingent upon task performance, emotional state, or experimental outcomes, thereby minimizing potential motivational or affective bias. Meanwhile, baseline emotional assessments (PANAS) were collected prior to task initiation to ensure that initial affective states were not systematically influenced by participation compensation.
2.3. Data Acquisition and Preprocessing
In this study, the physiological signals of metro drivers are synchronously collected in a multimodal manner, including electrocardiogram (ECG) and electroencephalogram (EEG) signals.
EEG signals are recorded using the Neuracle NeuSen W-series wireless EEG system, as shown in
Figure 2, which are then stored in BDF file format. EEG data were read and processed using the MNE-Python library (version 1.9.0), which supports the offline loading of raw EEG recordings for subsequent preprocessing and analysis. During the preprocessing stage, a 1–40 Hz band-pass filter is applied to the EEG signals to remove low-frequency drift and high-frequency noise, followed by z-score normalization to standardize signal amplitudes and reduce inter-subject variability [
17]. In the data segmentation step, EEG signals are divided into fixed-length windows of 400 data points to ensure temporal consistency with the ECG signals. The power features are extracted from three frequency bands, i.e., θ (4–8 Hz), α (8–13 Hz), and β (13–30 Hz). An additional θ/β ratio is computed as a fatigue sensitivity indicator to reflect the changes in cerebral functional activity during different stages of driving.
ECG signals are recorded using the CAPTIV L7000 system, with electrodes placed horizontally on the chest and connected by leads suspended in front of the chest. This portable configuration effectively captures the electrocardiogram signals, as illustrated in
Figure 2. The ECG data are stored in UTF-16-encoded CSV files with a sampling rate of 128 Hz. The program identifies and extracts the columns containing time and ECG information based on header detection and preferentially uses the pre-filtered signals. During preprocessing, a fourth-order Butterworth band-pass filter (0.5–40 Hz) combined with bidirectional zero-phase filtering is applied to remove low-frequency drift and high-frequency noise. Subsequently, z-score normalization is applied to the signals to eliminate inter-individual amplitude differences. The preprocessed ECG signals are segmented into fixed-length windows of 400 data points each, and the mean and standard deviation curves are computed at the window level to characterize the temporal variations in ECG features.
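The ECG preprocessing chain described above can be sketched as follows. The synthetic test signal and the helper names (`preprocess_ecg`, `segment_windows`) are illustrative assumptions, not the study's actual code; only the filter design (fourth-order Butterworth, 0.5–40 Hz, bidirectional zero-phase), the z-score step, and the 400-sample windows follow the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 128   # ECG sampling rate (Hz), as stated in the text
WIN = 400  # fixed window length in samples

def preprocess_ecg(ecg, fs=FS, low=0.5, high=40.0, order=4):
    """Zero-phase Butterworth band-pass filter followed by z-score normalization."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, ecg)  # bidirectional filtering -> no phase distortion
    return (filtered - filtered.mean()) / filtered.std()

def segment_windows(signal, win=WIN):
    """Split a 1-D signal into non-overlapping fixed-length windows and
    compute window-level mean and standard deviation curves."""
    n_win = len(signal) // win
    windows = signal[: n_win * win].reshape(n_win, win)
    return windows, windows.mean(axis=1), windows.std(axis=1)

# Synthetic example: 60 s of a noisy 1.2 Hz heartbeat-like oscillation
t = np.arange(0, 60, 1 / FS)
raw = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.randn(len(t))
clean = preprocess_ecg(raw)
wins, w_mean, w_std = segment_windows(clean)
```

The same filter-then-normalize-then-window pattern applies to the EEG stream, with the band edges changed to 1–40 Hz.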
2.4. Feature Extraction
This section describes the extraction of EEG- and ECG-based features used in the present study. Specifically, EEG frequency-domain indicators were derived to quantify fatigue levels during the simulated driving task, whereas ECG-related features were extracted to provide physiological inputs for emotion recognition and subsequent analysis of the influence of emotional states on fatigue evolution.
2.4.1. EEG-Related Indicators
After preprocessing the EEG signals, frequency-domain analysis is performed using the fast Fourier transform (FFT). The time-domain EEG signals are transformed into the frequency domain to extract the power features from three main frequency bands, i.e., θ (4–8 Hz), α (8–13 Hz), and β (13–30 Hz) [
18]. In this study, the specific computational procedure of the fast Fourier transform (FFT) is described as follows:
- 1.
Discrete Fourier Transform (DFT)
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1 \tag{1}$$
In (1), $x(n)$ denotes the amplitude of the EEG time-domain signal of length $N$ at the n-th sampling point, $N$ represents the length of the analysis window, $X(k)$ represents the complex amplitude of the k-th frequency component of the signal in the frequency domain, $k$ represents the frequency index, $e^{-j 2\pi k n / N}$ denotes the complex exponential basis function that maps the time-domain signal to the frequency domain, and $j$ represents the imaginary unit.
- 2.
Power spectral density (PSD) based on FFT
$$P(f_k) = \frac{|X(k)|^2}{N f_s} \tag{2}$$
In (2), $P(f_k)$ denotes the power spectral density of the EEG signal at frequency $f_k$, $|X(k)|^2$ represents the squared amplitude of the corresponding frequency component, $N$ represents the window length, $f_s$ denotes the sampling rate, and $f_k$ represents the k-th frequency point.
- 3.
Band power calculation
$$P_\theta = \sum_{f_k \in [4,8)} P(f_k), \quad P_\alpha = \sum_{f_k \in [8,13)} P(f_k), \quad P_\beta = \sum_{f_k \in [13,30]} P(f_k) \tag{3}$$
In (3), $P(f_k)$ denotes the power spectral density at frequency $f_k$, and $P_\theta$, $P_\alpha$, and $P_\beta$ represent the total power of the θ, α, and β frequency bands, respectively. $P_\theta$, summed over 4–8 Hz, reflects increasing drowsiness and cognitive load; $P_\alpha$, summed over 8–13 Hz, is associated with relaxation and internal attention states; and $P_\beta$, summed over 13–30 Hz, represents alertness and executive function.
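Equations (1)–(3) can be implemented compactly with an FFT. The sampling rate and the synthetic test signal below are illustrative assumptions (the EEG sampling rate is not stated in this section).

```python
import numpy as np

def band_powers(x, fs):
    """FFT-based band powers for one EEG window (Eqs. (1)-(3))."""
    N = len(x)
    X = np.fft.rfft(x)                 # Eq. (1): DFT of the windowed signal
    psd = (np.abs(X) ** 2) / (N * fs)  # Eq. (2): FFT-based power spectral density
    freqs = np.fft.rfftfreq(N, d=1 / fs)

    def band(lo, hi):                  # Eq. (3): sum PSD bins within a band
        return psd[(freqs >= lo) & (freqs < hi)].sum()

    return band(4, 8), band(8, 13), band(13, 30)

# Example: a signal dominated by 6 Hz (theta) with weaker 20 Hz (beta) activity
fs = 400  # assumed sampling rate for this illustration only
t = np.arange(0, 1, 1 / fs)
x = np.sin(2 * np.pi * 6 * t) + 0.2 * np.sin(2 * np.pi * 20 * t)
p_theta, p_alpha, p_beta = band_powers(x, fs)
```

For this test signal the θ-band power dominates, followed by β, with α near zero, matching the spectral content by construction.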
- 4.
Extraction of EEG-based fatigue indicators ($\theta/\beta$ ratio)
$$R = \frac{P_\theta}{P_\beta} \tag{4}$$
In (4), the $\theta/\beta$ ratio $R$ is a commonly used indicator in EEG signal analysis for assessing fatigue and attention levels. When drivers are engaged in prolonged monotonous driving or high-workload tasks, θ-band power increases while β-band power decreases, leading to an elevated $\theta/\beta$ ratio that reflects fatigue accumulation and attention decline [19]. Therefore, the $\theta/\beta$ ratio comprehensively reflects the cognitive activation levels of the drivers under different conditions and serves as an important EEG feature for characterizing fatigue development during driving tasks. This key EEG feature is used as a quantitative representation of train operators’ fatigue level in the data analysis of the metro driving simulation experiment. To reduce inter-subject variability and ensure comparability across participants, the $\theta/\beta$ ratio time-series is further normalized using subject-wise z-score standardization:
$$R_z = \frac{R - \mu_R}{\sigma_R} \tag{5}$$
In (5), $\mu_R$ and $\sigma_R$ represent the mean and standard deviation of $R$ for the corresponding participant over the entire driving task. The normalized $R_z$ was used as the final EEG-based fatigue indicator to quantitatively represent train operators’ fatigue level in the metro driving simulation experiment.
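A minimal sketch of the fatigue indicator in (4) and its subject-wise normalization in (5), assuming the ratio takes the form θ-band power over β-band power as described above; the example ratio series is synthetic.

```python
import numpy as np

def theta_beta_ratio(p_theta, p_beta):
    """Eq. (4): fatigue indicator R = P_theta / P_beta for one window."""
    return p_theta / p_beta

def zscore_subjectwise(r_series):
    """Eq. (5): normalize a participant's ratio series by that participant's
    own mean and standard deviation over the whole driving task."""
    r = np.asarray(r_series, dtype=float)
    return (r - r.mean()) / r.std()

# Synthetic example: a ratio series drifting upward, as expected when fatigue accumulates
r = np.array([0.8, 0.9, 1.1, 1.4, 1.8, 2.3])
rz = zscore_subjectwise(r)
```

Because the standardization is computed per participant, the resulting series has zero mean and unit variance for every subject, which makes the fatigue trajectories directly comparable across participants.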
- 5.
Construction of EEG-based fatigue time-series
To characterize fatigue dynamics during the metro driving simulation experiment, the EEG-based fatigue indicator $R_z$ was computed sequentially to form a fatigue time-series for each participant. Time-series variations were quantified through point-to-point comparisons between consecutive values of $R_z$. Specifically, for each time index $t$, fatigue was considered to increase when $R_z(t+1) > R_z(t)$, and to decrease when $R_z(t+1) < R_z(t)$. Based on these comparisons, the proportions of fatigue increase and fatigue decrease were calculated by counting the numbers of increasing and decreasing transitions, respectively, and normalizing them by the total number of transitions within the analyzed segment. This procedure provides a consistent quantitative definition of fatigue variation and supports the subsequent statistical analysis.
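The transition-counting procedure can be sketched as follows; the helper name and the example series are illustrative.

```python
import numpy as np

def fatigue_transition_proportions(rz):
    """Compare consecutive values of the normalized fatigue series and return
    the proportions of increasing and decreasing transitions among all
    transitions (ties contribute to neither count)."""
    rz = np.asarray(rz, dtype=float)
    diffs = np.diff(rz)               # R_z(t+1) - R_z(t) for consecutive windows
    n_total = len(diffs)
    p_inc = np.sum(diffs > 0) / n_total
    p_dec = np.sum(diffs < 0) / n_total
    return p_inc, p_dec

# Example: 3 of 4 transitions increase, 1 decreases
p_inc, p_dec = fatigue_transition_proportions([0.1, 0.3, 0.2, 0.5, 0.7])
```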
2.4.2. ECG-Related Indicators
After preprocessing the ECG signals, statistical features are extracted from each window to capture the changes in the activity of the autonomic nervous system across different stages of driving. In this study, the following features from ECG signals are extracted:
- 1.
The mean and standard deviation of ECG signals, denoted as $\mu_{ECG}$ and $\sigma_{ECG}$, extracted from each window reflect the central tendency and dispersion of ECG fluctuations, respectively [20]:
$$\mu_{ECG} = \frac{1}{N} \sum_{i=1}^{N} x_i, \quad \sigma_{ECG} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_{ECG})^2} \tag{6}$$
In (6), $\mu_{ECG}$ denotes the mean amplitude of the ECG signal within the window and reflects the overall level of cardiac activity, $x_i$ represents the amplitude at the i-th sample, and $N$ indicates the number of samples in a window, i.e., 400.
- 2.
The R–R interval is calculated by detecting the positions of R peaks in the ECG signal and then computing the time intervals between adjacent R waves:
$$RR_i = t_{R,i+1} - t_{R,i} \tag{7}$$
In (7), $RR_i$ denotes the duration of the i-th cardiac cycle, which is obtained by calculating the difference between the occurrence times of two adjacent R waves. Here, $t_{R,i}$ and $t_{R,i+1}$ represent the occurrence times of consecutive R waves. Based on the R–R intervals, the instantaneous heart rate $HR_i$ is calculated as follows:
$$HR_i = \frac{60}{RR_i} \tag{8}$$
In (8), $HR_i$ denotes the i-th instantaneous heart rate.
- 3.
The mean heart rate is also considered as a time-domain statistical indicator and is mathematically expressed as follows:
$$\overline{HR} = \frac{1}{M} \sum_{i=1}^{M} HR_i \tag{9}$$
In (9), $\overline{HR}$ denotes the mean heart rate within the window, $HR_i$ represents the instantaneous heart rate, and $M$ denotes the number of computable heartbeats within the window minus one [21].
After completing the process of feature extraction from EEG and ECG signals, this study uses EEG features to assess fatigue levels. In order to explore the emotional states of drivers during driving tasks and quantify their relationship with fatigue accumulation, this study builds and trains an emotion classification model based on a multi-scale convolutional neural network (MSCNN) combined with an attention mechanism. The ECG features are used as the input for the emotion recognition model.
2.5. Model Architecture and Training
In this study, a multi-scale convolutional neural network (MSCNN) combined with an attention mechanism is built to classify the emotional states of drivers using features extracted from ECG signals. During the training process, in addition to the binary emotion model (positive and negative emotional states), a neutral state is introduced to represent an emotionally undefined baseline condition that is neither positive nor negative. The inclusion of the neutral category facilitates the identification of transitions from a stable state toward positive or negative emotional directions and avoids semantic ambiguity. The proposed model adopts a one-dimensional convolutional architecture that uses the preprocessed ECG unified feature sequence vectors as inputs [
22]. The overall preprocessing workflow is illustrated in
Figure 3.
The core architecture of the proposed model adopts three parallel one-dimensional convolutional branches with kernel sizes of 3, 5, and 9, respectively, to extract rapid rhythmic variations from the ECG physiological data, as well as the emotional dynamics over longer temporal ranges. The multi-scale convolution captures hierarchical information ranging from short-term local fluctuations to cross-temporal variations. Batch normalization and max-pooling layers are employed to further ensure the stability of feature extraction and effective temporal compression.
To further enhance the model’s sensitivity towards emotion-related key features, a channel attention mechanism is applied on multi-scale convolutional outputs to adaptively weight the importance of features from different branches. The attention module employs global average pooling to extract channel-wise statistical features and generate channel weight vectors based on a two-layer fully connected structure with nonlinear activations. These outputs are then multiplied element-wise with the multi-scale features to obtain the attention-enhanced deep feature representations. This mechanism significantly improves the model’s focus on ECG physiological features, while suppressing noise, thereby achieving a higher discriminative capability in recognizing neutral, positive, and negative emotional states.
The attention-enhanced feature sequences are flattened and fed to a two-layer fully connected network for final classification. The first fully connected layer introduces nonlinearity via a Rectified Linear Unit (ReLU) activation function to further integrate the multimodal features. The second dense layer employs a Softmax probability activation function to generate the probability distribution over three emotion categories. The proposed model is trained using the DREAMER (Database for Emotion Recognition through EEG and ECG signals from wireless low-cost off-the-shelf devices) dataset, a publicly available physiological emotion dataset [
23], which features high-quality annotations, validated reliability, well-structured data, and strong sample consistency, thus making it suitable for the convolutional neural network approach adopted in this study. The dataset is split into training and test sets at an 8:2 ratio to ensure consistent distributions across the three emotion categories. Subsequently, TensorDataset and DataLoader objects are constructed to enable mini-batch data iteration. Cross-entropy loss is used as the optimization objective, and model parameters are updated using the Adam optimizer with a learning rate of 1 × 10⁻³. A batch size of 32 is selected, and the number of training epochs is set to 100. The validation performance is continuously monitored during the training process. The model weights that result in the highest classification accuracy are selected to ensure model stability and generalization capability. The overall model architecture is illustrated in
Figure 4.
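A minimal PyTorch sketch of the described architecture. The per-branch channel count (16), the pooling factor (4), the attention bottleneck width, and the hidden layer size (64) are illustrative assumptions, since these hyperparameters are not reported here; only the kernel sizes (3, 5, 9), batch normalization with max pooling, the channel-attention design (global average pooling plus a two-layer gate), and the three-class head follow the text.

```python
import torch
import torch.nn as nn

class MSCNNAttention(nn.Module):
    """Three parallel 1-D conv branches (kernels 3, 5, 9), channel attention,
    and a two-layer fully connected classifier over three emotion classes."""
    def __init__(self, in_len=400, n_classes=3, ch=16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(1, ch, k, padding=k // 2),  # multi-scale temporal features
                nn.BatchNorm1d(ch), nn.ReLU(),
                nn.MaxPool1d(4),                      # temporal compression
            )
            for k in (3, 5, 9)
        ])
        c_total = 3 * ch
        self.att = nn.Sequential(                     # two-layer channel-attention gate
            nn.Linear(c_total, c_total // 4), nn.ReLU(),
            nn.Linear(c_total // 4, c_total), nn.Sigmoid(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(c_total * (in_len // 4), 64), nn.ReLU(),
            nn.Linear(64, n_classes),  # Softmax is applied by the loss / at inference
        )

    def forward(self, x):  # x: (batch, 1, in_len)
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        w = self.att(feats.mean(dim=2))       # global average pooling -> channel weights
        feats = feats * w.unsqueeze(-1)       # element-wise channel re-weighting
        return self.head(feats)               # logits over the three emotion classes

model = MSCNNAttention()
logits = model(torch.randn(8, 1, 400))        # one mini-batch of 400-sample windows
```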
The network architecture was designed according to the temporal characteristics of ECG signals and established practices in physiological signal classification. A one-dimensional convolutional structure was adopted because ECG data are sequential time-series signals. The use of multi-scale convolutional branches with kernel sizes of 3, 5, and 9 was intended to capture short-term rhythmic variations, intermediate temporal patterns, and longer-range fluctuations, respectively. This design enables hierarchical feature extraction across different temporal resolutions [
24]. The attention mechanism was introduced to enhance channel-wise feature weighting, allowing the model to emphasize emotion-relevant representations while suppressing noise [
25]. Such attention-based enhancement has been widely adopted in physiological signal recognition tasks to improve discriminative capability. Hyperparameters were selected based on preliminary experiments and common practices in deep learning for physiological data. The Adam optimizer with a learning rate of 1 × 10⁻³ was employed due to its adaptive gradient properties and stable convergence behavior. A batch size of 32 was used to balance gradient stability and generalization performance under moderate sample size conditions. The number of training epochs was set to 100, while model performance was monitored on the validation set to select the best-performing weights and mitigate overfitting.
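The training setup described above (8:2 split, TensorDataset/DataLoader, cross-entropy loss, Adam at 1 × 10⁻³, batch size 32, best-weight selection by monitored accuracy) can be sketched as below. The random stand-in data and the placeholder linear classifier are illustrative assumptions, and only 5 epochs are run here instead of the paper's 100.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

# Stand-in data: 200 windows of 400-sample ECG features, three emotion labels
X = torch.randn(200, 1, 400)
y = torch.randint(0, 3, (200,))
n_train = int(0.8 * len(X))  # 8:2 train/test split
train_dl = DataLoader(TensorDataset(X[:n_train], y[:n_train]),
                      batch_size=32, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(400, 3))  # placeholder for the MSCNN
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

best_acc, best_state = -1.0, None
for epoch in range(5):  # the paper trains for 100 epochs
    model.train()
    for xb, yb in train_dl:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)  # cross-entropy on logits
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():  # monitor held-out accuracy each epoch
        acc = (model(X[n_train:]).argmax(1) == y[n_train:]).float().mean().item()
    if acc > best_acc:     # keep the best-performing weights
        best_acc = acc
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
```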
In summary, the integration of a multi-scale CNN and attention mechanism enables the proposed model to capture the emotion-related information embedded in multimodal physiological signals across different temporal scales and reinforce key emotional features through adaptive feature weighting, thereby significantly enhancing its ability to discriminate between various emotion categories. The training results demonstrate that the proposed model achieves good generalization performance on the DREAMER dataset and can be effectively applied to emotion classification in simulated driving experiments. Therefore, the proposed model provides a reliable emotion recognition tool for subsequent analyses of the mechanisms by which emotional states influence fatigue evolution.
3. Results and Analysis
In order to evaluate the performance of the trained model in recognizing emotional states (positive, neutral, and negative), multiple evaluation metrics are adopted in addition to accuracy, including precision, recall, F1-score, and the confusion matrix, given that the label distribution of the dataset used in this study is imbalanced. In addition, a baseline CNN model comprising a two-layer one-dimensional convolution–pooling architecture is also trained and used for comparison with the proposed model.
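These metrics can be computed directly from the confusion matrix. The tiny label example below is illustrative, and macro averaging is assumed (the text does not state the averaging scheme).

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def macro_metrics(cm):
    """Accuracy plus per-class precision/recall/F1 averaged over classes."""
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)   # column sums = predicted counts
    recall = tp / np.maximum(cm.sum(axis=1), 1)      # row sums = true counts
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision.mean(), recall.mean(), f1.mean()

# Tiny worked example with labels 0 = neutral, 1 = positive, 2 = negative
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred)
acc, prec, rec, f1 = macro_metrics(cm)
```

Macro averaging weights each class equally, which is the usual choice for imbalanced label distributions such as the one described here.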
3.1. Overall Model Performance
Table 1 presents a comparison of classification performance based on the DREAMER dataset between the baseline CNN model composed of a two-layer one-dimensional convolution–pooling architecture and the proposed multi-scale CNN with an attention mechanism.
The evaluation results show that the multi-scale CNN combined with an attention mechanism outperforms the conventional CNN in terms of its accuracy, precision, recall, and F1-score. This is because the multi-scale convolutional architecture simultaneously captures the short-term fluctuations and long-term emotional variation trends embedded in the ECG signals, thereby enhancing the richness and stability of feature representations. Moreover, the attention mechanism adaptively adjusts the feature weights in response to emotional changes, thus allowing emotion-relevant signals to exert a greater influence while suppressing noise and redundant information, thereby improving the model’s discriminative capability.
Although the classification accuracy does not reach 100%, emotion recognition based on physiological signals inherently involves inter-individual variability and signal noise. An overall accuracy of 86.96% on a three-class emotion task therefore demonstrates strong discriminative capability, particularly when compared with the baseline CNN (70.84%).
3.2. Analysis of Emotion Recognition Classification Results
As shown in Figure 5, the confusion matrix indicates that the proposed model performs stably across the three emotion categories, with the highest accuracy observed for the “neutral” and “negative” emotions: 139 of 161 neutral samples (86.34%) and 144 of 163 negative samples (88.34%) were correctly classified. For the “positive” emotion, the model correctly identified most samples, although some “positive” samples were misclassified as “neutral” or “negative”, indicating that subtle differences between positive and adjacent emotional states remain the main source of confusion. Overall, the concentration of values along the diagonal of the confusion matrix shows that the model effectively distinguishes among the emotion categories, demonstrating good classification reliability.
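The per-class percentages quoted above are simply the diagonal counts divided by the class totals (i.e., per-class recall), as a quick arithmetic check confirms:

```python
# Per-class recall = correctly classified samples / total samples of that class
neutral_recall = 139 / 161
negative_recall = 144 / 163
print(round(neutral_recall * 100, 2))   # → 86.34
print(round(negative_recall * 100, 2))  # → 88.34
```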
Figure 6 presents a line chart of the predicted emotions against the sample index, showing that the proposed model effectively captures emotional fluctuations during the driving process. Recognition is consistent within stable intervals, and emotion transition points display clear stage-wise characteristics, indicating good temporal sensitivity in processing continuous physiological signals. Although a small number of boundary samples are misclassified, this does not affect the identification of the main trends in emotional variation.
3.3. Analysis of the Effects of Emotions on Driving Fatigue in Metro Driving Tasks
To elucidate the regulatory effects of emotions on the evolution of driving fatigue, this study utilizes a trained emotion recognition model to obtain the emotion label sequences of each participant throughout the simulated driving task. Subsequently, the emotion labels are temporally aligned with the fatigue indices derived from the EEG features. The fatigue variation segments corresponding to positive, neutral, and negative emotional states are extracted to construct emotion-specific fatigue time-series curves.
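This alignment step can be sketched as follows, assuming (hypothetically) that emotion labels are emitted at a coarser sampling rate than the EEG-derived fatigue index: each fatigue sample inherits the most recent emotion label (piecewise-constant alignment), and the fatigue values are then grouped by label to form the emotion-specific segments. The timestamps and sampling rates below are illustrative, not the experiment's actual settings.

```python
import numpy as np

def align_and_segment(emo_t, emo_labels, fat_t, fat_vals):
    """For each fatigue sample, look up the most recent emotion label,
    then group the fatigue values by emotion to form per-emotion
    fatigue segments."""
    idx = np.searchsorted(emo_t, fat_t, side="right") - 1
    idx = np.clip(idx, 0, len(emo_labels) - 1)
    aligned = np.asarray(emo_labels)[idx]
    return {lab: np.asarray(fat_vals)[aligned == lab]
            for lab in np.unique(aligned)}

# toy data: an emotion label every 10 s, a fatigue index every 2 s
emo_t = np.array([0, 10, 20])
emo_labels = np.array(["positive", "neutral", "negative"])
fat_t = np.arange(0, 30, 2)
fat_vals = np.linspace(0.2, 0.8, len(fat_t))
segments = align_and_segment(emo_t, emo_labels, fat_t, fat_vals)
```

Each value in `segments` is the fatigue sub-series observed while the corresponding emotion label was active, i.e., the emotion-specific fatigue time-series described above.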
Figure 7 illustrates the temporal variations in fatigue under three emotional states for a representative participant during the experiment, where (a) shows the fatigue time-series curve under positive emotion, (b) shows the fatigue time-series curve under neutral emotion, and (c) shows the fatigue time-series curve under negative emotion.
Table 2 presents the proportions of increase and decrease in fatigue level under three emotional states. It also shows the maximum value, minimum value, and maximum difference in EEG frequency-domain features representing fatigue states.
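The quantities summarized in Table 2 can be computed from an emotion-specific fatigue time series as in the following sketch; the toy series and the step-wise definition of "increase" and "decrease" (the sign of the first difference) are assumptions made for illustration.

```python
import numpy as np

def fatigue_stats(series):
    """Proportion of rising vs. falling steps in a fatigue series,
    plus its extrema and maximum difference (range)."""
    diffs = np.diff(series)
    n = len(diffs)
    return {
        "increase_ratio": float((diffs > 0).sum()) / n,
        "decrease_ratio": float((diffs < 0).sum()) / n,
        "max": float(series.max()),
        "min": float(series.min()),
        "range": float(series.max() - series.min()),
    }

# toy fatigue-index series for one emotional state
series = np.array([0.30, 0.35, 0.33, 0.40, 0.38, 0.45])
stats = fatigue_stats(series)
```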
4. Discussion
This section interprets the main findings of the study and discusses the influence of different emotional states on fatigue evolution from a physiological perspective. The implications of the results, as well as the limitations of the present study, are also addressed.
4.1. Effects of Emotional States on Fatigue
This work, based on physiological signal analysis, verifies the significant moderating role of emotional states in the evolution of fatigue during metro driving. The proposed multi-scale convolutional neural network (MSCNN) combined with an attention mechanism uses ECG signals to stably distinguish positive, neutral, and negative emotional states, while the EEG frequency-domain fatigue indicators enable a quantitative analysis of the emotion–fatigue relationship. The experimental results show that different emotional states exert directionally different effects on the formation and evolution of fatigue.
The statistical results show that fatigue increases significantly under negative emotional states, and both the maximum value and the fluctuation amplitude of the fatigue-related features are significantly greater than under the other emotional states, suggesting that negative emotions accelerate fatigue accumulation. This finding is consistent with previous studies showing that negative emotions increase mental workload and accelerate the consumption of cognitive resources. In contrast, under positive emotional states, the proportion of fatigue decrease is considerably higher and the fluctuation amplitude of fatigue is smaller. However, the difference between the increasing and decreasing proportions of fatigue is smaller than that observed under negative emotions, indicating that positive emotions exert a certain buffering and restorative effect that helps delay fatigue development, although their influence is weaker than that of negative emotions.
The neutral emotional state exhibits intermediate fatigue evolution characteristics, with comparable proportions of fatigue increase and decrease, indicating a transitional condition between the positive and negative emotional influences. This suggests that treating neutral as an independent emotion category helps accurately characterize the continuity and transitional features of emotional regulation of fatigue, while avoiding interference from neutral states in the analysis of positive and negative emotional effects.
Although the directional trend that negative emotions are associated with increased fatigue may appear intuitive, the novelty of the present study lies in its quantitative physiological validation within a controlled metro driving context. Unlike prior research that primarily relies on behavioral or self-report measures, this study integrates ECG-based emotion recognition with EEG-derived fatigue indicators to characterize the dynamic evolution of fatigue under different emotional states. The results reveal not only directional differences but also distinct fluctuation patterns and accumulation rates, providing a more fine-grained physiological understanding of how emotional states influence fatigue in operational settings.
Overall, the results of this study demonstrate that emotional states can significantly influence the fatigue level of train operators during metro driving tasks. The experimental evidence indicates that negative emotions are associated with a faster increase and larger fluctuations in fatigue, whereas positive emotions show a buffering effect by slowing fatigue accumulation and promoting a relatively lower fatigue level. These findings provide physiological support for considering the emotional state as an influential factor in fatigue assessment and highlight the potential value of emotion-aware fatigue monitoring for improving operational safety in urban rail transit.
4.2. Limitations and Future Directions
Despite the promising findings, several limitations should be acknowledged. First, the sample size was relatively small and consisted of young participants aged 22–26 years. Although this homogeneous age range was selected to reduce inter-individual physiological variability and enhance internal validity, age-related differences in fatigue recovery, cognitive resilience, and reaction time may influence absolute fatigue trajectories. Therefore, caution should be exercised when generalizing the results to older or more experienced metro drivers.
Second, participants were trained simulation operators rather than professional subway drivers. While this controlled experimental design allowed for standardized task conditions and reduced confounding operational variability, it may limit ecological validity. Future studies should incorporate professional drivers to further validate the applicability of the findings in real-world contexts.
Third, the experiments were conducted within a fixed afternoon time window (14:00–16:00), corresponding to the circadian post-lunch dip. Although this timing facilitated the observation of fatigue evolution under controlled conditions, it does not fully capture the operational complexity of peak-hour metro traffic. Future research may consider incorporating peak-hour simulations or field-based experiments to further enhance ecological validity.
These limitations do not undermine the methodological contribution of the present study, which focuses on elucidating the physiological mechanisms underlying the influence of emotional states on fatigue evolution. However, they highlight important directions for future research.
5. Conclusions
This study investigated the impact of emotional states on fatigue levels in metro driving tasks based on a physiological signal-driven analysis framework. A multi-scale convolutional neural network combined with an attention mechanism was developed to recognize drivers’ emotional states from ECG signals, achieving an emotion classification accuracy of 86.96%. Meanwhile, an EEG-based fatigue indicator was extracted to quantify fatigue evolution over time. By temporally aligning the ECG-based emotion recognition outputs with the EEG-derived fatigue time-series, the proposed model enables a quantitative investigation of how different emotional states influence fatigue development during long-duration metro driving operations.
From a methodological perspective, the proposed multi-scale convolutional neural network combined with an attention mechanism effectively captures emotion-related autonomic nervous system activity in complex, long-duration driving tasks. Temporally aligning the emotion recognition results with the EEG-based fatigue indicators provides a feasible data-driven framework for revealing the mechanisms by which emotions influence fatigue during driving tasks.
From an application perspective, the findings indicate that negative emotions are a significant amplifier of fatigue, whereas positive emotions possess potential buffering value. This provides a theoretical basis for emotion regulation-based fatigue intervention strategies, such as maintaining drivers’ positive or neutral emotional states through optimized human–machine interface design, work–rest scheduling, or psychological interventions.
This study still has certain limitations. First, the sample size is relatively small, and the participants were mainly young males who had only undergone simulated driving training; future studies should include metro drivers with actual driving experience to enhance generalizability. Second, this study was conducted in a simulated driving environment; given the complexity of the task and unexpected real-world events, formalizing the emotion–fatigue relationship is non-trivial, making validation on actual lines or in quasi-realistic environments necessary. In addition, future research should incorporate more physiological indicators (e.g., electrodermal activity and eye-tracking measures) and combine multimodal fusion models to further deepen the understanding of the mechanisms by which drivers' emotions affect fatigue.