Driver Stress Detection Using Ultra-Short-Term HRV Analysis under Real World Driving Conditions

Considering that driving stress is a major contributor to traffic accidents, detecting drivers' stress levels in time is helpful for ensuring driving safety. This paper investigates the ability of ultra-short-term (30-s, 1-min, 2-min, and 3-min) HRV analysis to detect driver stress under real driving conditions. Specifically, the t-test was used to investigate whether HRV features differed significantly between stress levels. Ultra-short-term HRV features were compared with the corresponding short-term (5-min) features during low-stress and high-stress phases using Spearman rank correlation and Bland–Altman analysis. Furthermore, four machine-learning classifiers, namely a support vector machine (SVM), random forests (RFs), K-nearest neighbor (KNN), and AdaBoost, were evaluated for stress detection. The results show that HRV features extracted from ultra-short-term epochs could accurately distinguish between low and high driver stress levels. Although the capability of HRV features to detect driver stress varied between ultra-short-term epoch lengths, MeanNN, SDNN, NN20, and MeanHR were selected as valid surrogates of the short-term features for driver stress detection across all epoch lengths. Among the ultra-short-term epochs, the best classification performance was achieved by the SVM classifier using 3-min HRV features, with an accuracy of 85.3%. This study contributes to building a robust and effective stress detection system using ultra-short-term HRV features in actual driving environments.


Introduction
Driver stress can affect a driver's decision-making, performance, and perception abilities [1], increasing traffic safety risks [2]. One study estimated that 30 percent of road crashes are caused by driver stress, thus making stress a major contributor to crashes [3]. Prolonged exposure to driving stress can cause drivers to suffer from headaches, reduced sleep quality, and even increased cardiovascular risk [4]. Therefore, detecting driver stress is crucial for improving driver performance to ensure traffic safety.
Previous research has mainly assessed driver stress based on the driver's psychological, physical, and physiological responses. In most studies, psychological assessments have served as the ground truth for stress detection: questionnaires or self-reports [5] are used to measure drivers' stress after driving. However, questionnaires administered at the end of the experiment may not accurately reflect how the driver felt during the drive, which can bias the assessment [6]. Some studies have attempted to detect driver stress by monitoring drivers' physical responses, such as facial expressions [7] and vehicle dynamics data [8]. Researchers have also concentrated on physiological assessment methods based on heart activity [9], electrodermal activity [6], and respiration [1]. In particular, heart rate variability (HRV), which can be collected non-invasively, has become one of the most prevalent measures for assessing driver stress [10]. HRV describes the variation in the intervals between consecutive R-wave peaks and reflects fluctuations of the autonomic nervous system (ANS), which directly regulates cardiac activity. HRV signals are commonly analyzed with time-domain, frequency-domain, and non-linear methods to detect driver stress. During stress phases, HRV features vary significantly with the fluctuations in ANS activity, making them good indicators of driver stress [9].
The recommended minimum duration for short-term HRV analysis is 5 min, and this duration has been used to detect driver stress in many studies [9][10][11]. In particular, short-term (5-min) HRV features have performed well in detecting driver stress under real driving conditions [11]. However, few studies have estimated drivers' stress levels using ultra-short-term (less than 5 min) HRV analysis. A 5-min epoch may be too long to detect driver stress in time, so the driver may not be alerted early enough to avoid traffic incidents and accidents. To address this problem, several researchers have used ultra-short-term HRV analysis, which allows driver stress levels to be monitored continuously and in real time [12,13]. Although those studies used HRV features extracted from epochs shorter than 5 min, they ignored the differences in stress detection ability between the various ultra-short-term and short-term HRV features. Moreover, previous ultra-short-term HRV analyses for mental stress detection used data collected in well-controlled experiments rather than in the field [14] [15]. Because participants may feel more relaxed in the lab, HRV features may change differently in field and laboratory data; an ultra-short-term HRV analysis of field data is therefore more realistic and of greater value for practical applications. Ultra-short-term HRV analyses have also been employed in various clinical fields [16,17]. Some studies have shown that ultra-short-term HRV features are valid surrogates of short-term HRV features for investigating ANS function in patients with obstructive sleep apnea (OSA) [18,19]. Although Munoz et al. [20] evaluated the validity of two HRV features extracted from 10-s epochs, it remains unclear whether other HRV features can measure stress levels, even though using more HRV features could yield a more robust detection model. Therefore, building a real-time and robust stress detection model from ultra-short-term HRV features remains a challenge.
In this study, we focus on driver stress detection and assessment using ultra-short-term HRV analysis under real-world driving conditions. First, a statistical analysis was performed to determine which ultra-short-term HRV features could substitute for the corresponding short-term HRV features in detecting driver stress levels. Then, a machine-learning approach was employed to build a classification model for stress detection. To the best of our knowledge, this is the first study to analyze driver stress detection using ultra-short-term HRV features in actual driving conditions. An ultra-short-term driver stress detection model could monitor driver stress in real time, alerting drivers to make timely adjustments and thereby ensuring traffic safety.

Stress Data
The data used in this study were the ECG signals from the driver database [1]. The data include recordings of 17 drivers, who were required to drive a prespecified route in Boston. As shown in Figure 1, the driving route contains three driving scenarios (rest, city driving, and highway driving), which were designed to induce different levels of stress. Specifically, two 15-min rest periods occurred at the beginning and end of the drive. During the rest periods, the driver sat in the garage with eyes closed, inducing a low-stress level. Following the first rest period, the drivers drove on a busy street with unexpected risks generated by non-motorized traffic, resulting in a high-stress level. The route then led drivers away from the city, over a bridge, and onto a highway. During highway driving without congestion, the drivers experienced a medium-stress level. Finally, drivers turned around and followed the same route in the opposite direction, back to the starting point. Each driving experiment lasted around 60 to 90 min, depending on the traffic conditions. Three digital cameras recorded video throughout the experiment: one placed at the steering wheel, a wide-angle camera mounted on the dashboard, and a third camera used for event recording. Two experts then watched these videos and scored the driver's stress level according to stress indicators (e.g., stops, turning, bumps in the road, head-turning, and gaze changes).
Immediately after the experiment, the drivers completed subjective rating questionnaires comprising two main ratings: a subjective stress rating scale and a stressful event rating scale. Both the expert ratings and the subjective questionnaires validated the assumption that the experimental scenarios induce distinct stress levels: low, moderate, and high stress, corresponding to rest, highway driving, and city street driving, respectively [1].

Pre-Processing
The RR intervals were extracted from the ECG signals using the PhysioNet HRV toolkit, a rigorously validated open-source software package for HRV analysis [21]. Outliers and ectopic beats were then identified in the RR series. An outlier is defined as an RR interval outside the range of 280 to 1500 ms [22], and an ectopic beat as an RR interval that differs from the previous RR interval by more than 20% [23]. Both outliers and ectopic beats were replaced by linearly interpolated intervals [22]. To analyze HRV features extracted from epochs of different lengths, all RR intervals were first divided into non-overlapping 5-min epochs. Second, the shorter epochs (i.e., 30-s, 1-min, 2-min, and 3-min epochs) were extracted from the corresponding 5-min epoch so that they share its center point, as shown in Figure 2. The recordings from the first 5 min of each experiment were discarded due to poor signal quality. Moreover, only "low-stress level" and "high-stress level" recordings were considered in this study, to obtain a clearer distinction between the stress levels.
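The outlier and ectopic-beat rules and the centered epoch layout can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code (they used the PhysioNet HRV toolkit): the helper names `clean_rr` and `split_epochs` are hypothetical, the ectopic check compares each interval with the immediately preceding raw interval (one literal reading of the 20% rule, which also flags the beat after a large jump), and beat times are taken as the cumulative sum of the RR intervals.

```python
import numpy as np


def clean_rr(rr_ms, low=280.0, high=1500.0, ectopic_ratio=0.20):
    """Replace outliers and ectopic beats in an RR series (ms) by linear interpolation."""
    rr = np.asarray(rr_ms, dtype=float)
    bad = (rr < low) | (rr > high)                 # outside the 280-1500 ms range
    rel_change = np.abs(np.diff(rr)) / rr[:-1]     # change relative to the previous interval
    bad[1:] |= rel_change > ectopic_ratio          # ectopic: more than a 20% jump
    good = ~bad
    if good.sum() < 2:
        raise ValueError("too few valid RR intervals to interpolate")
    idx = np.arange(rr.size)
    cleaned = rr.copy()
    cleaned[bad] = np.interp(idx[bad], idx[good], rr[good])
    return cleaned


def split_epochs(rr_ms, epoch_s=300, sub_lengths_s=(30, 60, 120, 180)):
    """Cut RR intervals into non-overlapping 5-min epochs and, for each one,
    sub-epochs of the given lengths that share the 5-min epoch's center point."""
    rr = np.asarray(rr_ms, dtype=float)
    t = np.cumsum(rr) / 1000.0                     # beat times in seconds
    epochs = []
    for k in range(int(t[-1] // epoch_s)):         # only complete 5-min epochs
        start, stop = k * epoch_s, (k + 1) * epoch_s
        center = (start + stop) / 2.0
        entry = {epoch_s: rr[(t > start) & (t <= stop)]}
        for length in sub_lengths_s:
            lo, hi = center - length / 2.0, center + length / 2.0
            entry[length] = rr[(t > lo) & (t <= hi)]
        epochs.append(entry)
    return epochs
```

For example, `split_epochs(clean_rr(rr_ms))[0][30]` would hold the RR intervals of the 30-s window centered in the first 5-min epoch.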

HRV Features
For each epoch, 22 commonly used HRV features were extracted from the RR intervals, as listed in Table 1: 10 time-domain features, seven frequency-domain features, and five non-linear features. The power spectral density of the RR intervals was estimated with the Lomb-Scargle periodogram, which is suitable for short, unevenly sampled RR series and does not require resampling.
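As an illustration of this step, the sketch below computes several of the time-domain features named in this paper, plus Lomb-Scargle band powers using SciPy's `scipy.signal.lombscargle`. Because Table 1 is not reproduced here, the feature definitions (e.g., NN20 as the count of successive differences greater than 20 ms) and the band limits are assumed conventional values, the helper names are hypothetical, and the periodogram is left unnormalized, so the band powers are only meaningful relative to one another.

```python
import numpy as np
from scipy.signal import lombscargle

# Conventional HRV band limits (Hz); assumed here, since Table 1 is not reproduced.
BANDS = {"VLF": (0.0033, 0.04), "LF": (0.04, 0.15), "HF": (0.15, 0.40)}


def time_domain_features(rr_ms):
    """A subset of time-domain HRV features from one epoch of RR intervals (ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    d = np.diff(rr)                                  # successive differences
    nn20 = int(np.sum(np.abs(d) > 20.0))             # successive pairs differing by > 20 ms
    return {
        "MeanNN": rr.mean(),
        "SDNN": rr.std(ddof=1),
        "RMSSD": np.sqrt(np.mean(d ** 2)),
        "SDSD": d.std(ddof=1),
        "NN20": nn20,
        "pNN20": 100.0 * nn20 / d.size,
        "MeanHR": np.mean(60000.0 / rr),             # beats per minute
    }


def lomb_band_powers(rr_ms, f_max=0.5, n_freqs=1000):
    """Band powers of an unevenly sampled RR series via the Lomb-Scargle periodogram.
    Values are in arbitrary (unnormalized) units, so compare them as ratios."""
    rr = np.asarray(rr_ms, dtype=float)
    t = np.cumsum(rr) / 1000.0                       # beat times (s), no resampling
    y = rr - rr.mean()                               # center the series
    freqs = np.linspace(0.001, f_max, n_freqs)       # Hz; f = 0 is excluded
    pgram = lombscargle(t, y, 2 * np.pi * freqs)     # SciPy expects angular frequencies
    df = freqs[1] - freqs[0]
    powers = {name: float(pgram[(freqs >= lo) & (freqs < hi)].sum() * df)
              for name, (lo, hi) in BANDS.items()}
    powers["LF/HF"] = powers["LF"] / powers["HF"]
    return powers
```

Computing the spectrum directly on the beat times, rather than on an interpolated, evenly resampled series, is what lets the Lomb-Scargle approach skip the resampling step.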

Student's t-Test
A t-test was conducted to investigate whether each HRV feature differed significantly between the stress levels. The test was performed separately for each time scale, to explore how HRV features extracted at different time scales relate to the stress levels. An HRV feature was assumed to have the same stress-detection ability at all time scales if, at every time scale, it differed significantly between the stress levels and changed in the same direction. The significance level was set at p < 0.05.
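The decision rule above (significance at every time scale plus a consistent direction of change) can be sketched with `scipy.stats.ttest_ind`. The sketch assumes an unpaired, equal-variance (Student's) test between low-stress and high-stress epochs, since the paper does not state whether the test was paired; the function name `consistent_features` and its input layout are hypothetical.

```python
import numpy as np
from scipy.stats import ttest_ind


def consistent_features(low, high, alpha=0.05):
    """Return features that differ significantly (Student's t-test) between the
    stress levels at every time scale AND change in the same direction everywhere.

    low, high: {time_scale: {feature_name: 1-D array of epoch values}}
    """
    scales = list(low)
    selected = []
    for feat in low[scales[0]]:
        directions = set()
        significant_everywhere = True
        for scale in scales:
            a = np.asarray(low[scale][feat], dtype=float)
            b = np.asarray(high[scale][feat], dtype=float)
            _, p = ttest_ind(a, b)                 # equal-variance two-sample t-test
            if not p < alpha:                      # also rejects NaN p-values
                significant_everywhere = False
                break
            directions.add(np.sign(b.mean() - a.mean()))   # +1 = higher under stress
        if significant_everywhere and len(directions) == 1:
            selected.append(feat)
    return selected
```

A feature that rises with stress at one epoch length but falls at another is rejected even if both differences are significant.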

Spearman Correlation Analysis and Bland-Altman Plots
Spearman correlation analysis, a nonparametric rank statistic, was carried out to determine whether each ultra-short-term HRV feature was correlated with the corresponding short-term HRV feature at the same stress level. Although Spearman's rank correlation coefficient measures the degree of association between two variables, a strong correlation does not necessarily indicate agreement. Because Bland-Altman analysis quantifies the mean difference (bias) and the limits of agreement (LoA) between two methods of measurement [25,26], it was further used to assess the agreement between the ultra-short-term HRV features and the corresponding short-term HRV features at the same stress level. An ultra-short-term HRV feature was assumed to be able to replace the corresponding short-term HRV feature for driver stress detection if it met the following conditions: (1) the ultra-short-term features and the corresponding short-term feature change in the same direction between the low-stress and high-stress levels, and (2) the ultra-short-term features are significantly correlated with the corresponding short-term feature at both the low-stress and high-stress levels (i.e., the correlation coefficient is higher than 0.7 and statistically significant). The significance level of Spearman's correlation coefficient was set at 0.05.
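The two surrogate conditions and the Bland-Altman statistics can be sketched as follows, using `scipy.stats.spearmanr` for the rank correlation. The function names and the dictionary layout are hypothetical, and condition (1) is checked here by comparing the sign of the mean change from low to high stress, which is one simple reading of "the same direction of change".

```python
import numpy as np
from scipy.stats import spearmanr


def bland_altman(ultra, short):
    """Bias and 95% limits of agreement (bias +/- 1.96 SD of the differences)."""
    diff = np.asarray(ultra, dtype=float) - np.asarray(short, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)


def is_valid_surrogate(ultra, short, r_min=0.7, alpha=0.05):
    """ultra/short: {"low": values, "high": values} for one feature, computed from an
    ultra-short-term epoch and from its 5-min parent epoch, respectively."""
    # Condition 1: same direction of change from low to high stress
    change_ultra = np.mean(ultra["high"]) - np.mean(ultra["low"])
    change_short = np.mean(short["high"]) - np.mean(short["low"])
    if np.sign(change_ultra) != np.sign(change_short):
        return False
    # Condition 2: strong (rho > 0.7) and significant Spearman correlation at each level
    for level in ("low", "high"):
        rho, p = spearmanr(ultra[level], short[level])
        if not (rho > r_min and p < alpha):
            return False
    return True
```

The Bland-Altman output is reported alongside the decision rather than inside it: a constant offset between epochs leaves Spearman's rho at 1 but shows up as a non-zero bias.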

Stress Classification
The machine-learning pipeline used in this work is shown in Figure 3. For the ultra-short-term epochs, only the features that could substitute for the corresponding short-term HRV features, according to the statistical analysis, were used to build the machine-learning models. The feature data were split into two parts: 70% for training and parameter tuning, and 30% for testing. To reduce the dependence of the results on a single random split, this process was repeated ten times. At each split, 10-fold cross-validation with a grid search was performed to find the optimal parameters. Accuracy, sensitivity, specificity, and F1-score were calculated to evaluate the performance of the models, and the final test results are the mean values across the ten repetitions.
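The evaluation protocol (ten repeated 70/30 splits, each with a 10-fold cross-validated grid search on the training part) can be sketched with scikit-learn as below, using the SVM as the example model. Several details are assumptions not stated in the paper: stratified splitting, feature standardization, the C/gamma grid, and treating label 1 (high stress) as the positive class for sensitivity. The function names are hypothetical, and the code targets current scikit-learn rather than version 0.19.2 (for instance, `gamma="scale"` does not exist in 0.19).

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and F1, treating label 1 (high stress) as positive."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * sensitivity / (precision + sensitivity) if precision + sensitivity else 0.0
    return {"accuracy": (tp + tn) / (tp + tn + fp + fn), "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def repeated_holdout(X, y, n_repeats=10, test_size=0.3):
    """Repeat a stratified 70/30 split; tune on the 70% with a 10-fold grid search."""
    # Feature scaling and the C/gamma grid are assumptions, not taken from the paper.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.1]}
    runs = []
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=seed)
        search = GridSearchCV(model, grid, cv=10, scoring="accuracy").fit(X_tr, y_tr)
        runs.append(binary_metrics(y_te, search.predict(X_te)))
    return {name: float(np.mean([r[name] for r in runs])) for name in runs[0]}
```

Keeping the grid search inside each split means the 30% test portion never influences parameter selection, so the averaged metrics remain held-out estimates.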
Four binary classifiers were considered: support vector machine (SVM) [27], random forests (RFs) [28], K-nearest neighbor (KNN) [29], and AdaBoost [30]. These classifiers were chosen partly because they differ from each other in algorithmic logic and partly because they have been used for stress detection in previous studies. The SVM used a Gaussian kernel function. The KNN was trained with K varying from 1 to 10 and used the Euclidean distance metric. The RFs used 50, 100, and 150 decision trees. The AdaBoost classifier used the same decision trees as the RFs and a learning rate of 0.1. Apart from these parameters, the default settings of the Python package sklearn version 0.19.2 were used.
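The classifier settings listed above could be expressed as scikit-learn grid searches roughly as follows. This is a sketch, not the authors' configuration: the SVM grid values are assumptions (the paper only specifies a Gaussian kernel), and the AdaBoost grid assumes that "the same decision trees as the RFs" means the same ensemble sizes; note that scikit-learn's AdaBoost uses depth-1 decision trees as its default base learner. `build_searches` is a hypothetical helper.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def build_searches(cv=10):
    """One grid search per classifier, mirroring the settings described above."""
    configs = {
        # Gaussian (RBF) kernel; the C and gamma values are illustrative assumptions
        "SVM": (SVC(kernel="rbf"), {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.1]}),
        # K from 1 to 10 with the Euclidean distance metric
        "KNN": (KNeighborsClassifier(metric="euclidean"), {"n_neighbors": list(range(1, 11))}),
        # 50, 100, or 150 trees
        "RF": (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100, 150]}),
        # Assumed reading: the same ensemble sizes as the RF, learning rate fixed at 0.1
        "AdaBoost": (AdaBoostClassifier(learning_rate=0.1, random_state=0),
                     {"n_estimators": [50, 100, 150]}),
    }
    return {name: GridSearchCV(estimator, grid, cv=cv, scoring="accuracy")
            for name, (estimator, grid) in configs.items()}
```

Each returned `GridSearchCV` object would be fitted on the 70% training portion of a split and then scored on the remaining 30%.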

Ultra-Short-Term HRV Features Analysis
The t-test results and the trends of the HRV features are reported in Tables 2 and 3, respectively. For the 5-min epochs, 18 of the 22 HRV features showed statistically significant differences between the low-stress and high-stress levels. Fourteen of these 18 features increased significantly with increasing stress, while the remaining four decreased significantly. Moreover, the ultra-short-term analysis showed that each HRV feature had the same significant differences and trends between the low-stress and high-stress levels at every time scale, suggesting that epoch length might not affect how HRV features change between stress levels.

The correlation analysis results for the ultra-short-term HRV features are reported in Table 4. For the time-domain features, four features (MeanNN, SDNN, NN20, and MeanHR) showed high correlation coefficients for both 30-s and 1-min epochs (i.e., ultra-short-term vs. 5-min epochs for each feature). For the frequency-domain features, only HF showed a high correlation for 1-min epochs, and none of the frequency-domain features computed from 30-s epochs were highly correlated. For the non-linear features, CVI and SampEn showed high correlation coefficients for 1-min epochs, while none showed high correlation coefficients for 30-s epochs. Moreover, all HRV features extracted from 2-min or 3-min epochs showed high correlation coefficients at both the low-stress and high-stress levels.
For 30-s epochs, MeanNN, SDNN, NN20, and MeanHR could be selected as valid surrogates of the short-term HRV features. For 1-min epochs, MeanNN, SDNN, NN20, MeanHR, HF, CVI, and SampEn were selected as valid surrogates of the short-term features for driver stress detection. For 2-min or 3-min epochs, all features except SDSD, pNN20, RMSSD, and SD1 could be used as surrogates of the short-term HRV features. At all time scales, MeanNN, SDNN, NN20, and MeanHR showed statistically significant differences between the low-stress and high-stress levels and high correlation coefficients (i.e., between each ultra-short-term epoch and the 5-min epoch, for each feature). Therefore, MeanNN, SDNN, NN20, and MeanHR were selected as valid surrogates of the short-term HRV features for detecting drivers' stress levels. The correlation results were further supported by visual inspection of the Bland-Altman plots: as the epoch length increased, both the bias and the width of the 95% LoA (±1.96 SD) decreased. The Bland-Altman plots for MeanNN are shown in Figure 4.

Stress Classification with Ultra-Short-Term HRV Features
For feature selection, the four HRV features that were valid surrogates of the short-term HRV features across all time scales were used as input to build the driver stress detection model. Since the 5-min epoch is recommended for short-term HRV analysis, the performance of the classifiers with ultra-short-term HRV features was compared with their performance with 5-min HRV features. Table 5 shows the performance of the classifiers using the four short-term HRV features (i.e., MeanNN, SDNN, NN20, and MeanHR) as input. The best classification performance was achieved with the SVM classifier, with a mean accuracy of 87.5%. This best-performing classifier, with the same four HRV features, was then evaluated at each time scale; the results are summarized in Table 6. The best accuracy, 85.3%, was achieved with 3-min epochs, while the SVM classifier using 30-s epochs reached 85.0%. Performance dropped by about 3 percentage points with ultra-short-term HRV features compared with short-term HRV features. Although the epoch length of the input HRV features affected classification performance, the ultra-short-term HRV features could still detect drivers' stress levels with good performance.

Entropy 2023, 25, 194

Discussion
In this study, we investigated whether ultra-short-term HRV features could be used to measure and assess drivers' stress. Considering that a 5-min HRV analysis may be too long to detect driver stress in time, an ultra-short-term (less than 5 min) HRV analysis was conducted. The t-test results showed that most HRV features (18 of 22) extracted from the 5-min epochs differed significantly between the low-stress and high-stress levels. The significant differences in ultra-short-term HRV features between stress levels were consistent with those of the 5-min HRV features. Moreover, the ultra-short-term HRV features changed between stress levels in the same direction as the corresponding short-term HRV features, which is consistent with another study's findings [31].
In the correlation analysis, the HRV features extracted from 2-min or 3-min epochs were highly correlated with the short-term HRV features. In particular, all frequency-domain HRV features extracted from 2-min or 3-min epochs were strongly correlated with those extracted from 5-min epochs, which is consistent with the findings of another study [32]. At least 1 min of data is required to calculate the HF band and at least 2 min to estimate the LF band [32]; accordingly, LF showed a very low Spearman coefficient for epochs shorter than 2 min, and HF for epochs shorter than 1 min. Although a 5-min length is required to evaluate the VLF band [33], in this study VLF computed from 2-min and 3-min epochs was strongly correlated with VLF computed from 5-min epochs. Regarding the non-linear HRV features, few studies have investigated their reliability at ultra-short-term lengths. In general, the correlation coefficient between the ultra-short-term and short-term HRV features decreased as the ultra-short-term epoch became shorter. MeanNN, SDNN, NN20, and MeanHR extracted at all time scales were highly correlated with those extracted from 5-min epochs, suggesting that these HRV features are good surrogates of the short-term HRV features for detecting drivers' stress levels.
For driver stress detection, the results show that drivers' stress levels can be detected from very short recordings. Four classifiers were evaluated for stress detection with ultra-short-term HRV features. Although the best classification performance, an accuracy of 87.5%, was achieved by the SVM classifier using short-term HRV features as input, the SVM classifier using ultra-short-term HRV features still performed well: accuracy dropped by only about 3 percentage points, suggesting that the ultra-short-term HRV features are good surrogates of the short-term HRV features for stress detection. Previous studies have also used the driver dataset employed in this study to explore stress detection methods, and our model's accuracy is comparable to or higher than the accuracies they reported. Munla et al. [9] used an SVM classifier with an RBF kernel and 5-min epochs as input to predict driver stress levels and achieved an accuracy of 83%, which is lower than the accuracy achieved in our study. Dalmeida et al. [12] proposed a method to automatically detect stress with HRV features computed from 30-s epochs and obtained an accuracy of 85% (sensitivity = 81% and F1-score = 78%). However, that study did not examine whether its ultra-short-term features were significantly correlated with the corresponding 5-min features, and redundant features may have affected the model's performance. Our study confirmed that not all 30-s HRV features are good substitutes for 5-min HRV features. Moreover, Vargas et al. [34] analyzed the ability of EMG signals at different time scales (1 min, 2 min, 3 min, 4 min, and 5 min) to detect driver stress. Their results showed that ultra-short-term EMG analysis was not able to detect drivers' stress levels, with the accuracy of the ultra-short-term epochs dropping by about 20% compared with short-term epochs. However, none of these studies examined whether ultra-short-term HRV features are valid surrogates of short-term HRV features for driver stress detection.
Our results indicate that MeanNN, SDNN, NN20, and MeanHR can replace the corresponding short-term HRV features for detecting driver stress levels. To explore the ability of single features to detect driver stress, each valid surrogate feature computed from 5-min epochs was also used individually as input to the classifiers. Classification performance dropped by about 10-15 percentage points with any single surrogate feature as input, compared with the combination of these HRV features.
This study has several important limitations. Although the stress detection model was built on a few ultra-short-term HRV features (i.e., MeanNN, SDNN, NN20, and MeanHR), more frequency-domain and non-linear HRV features should be explored for stress detection, since additional inputs might improve the models' performance. Furthermore, cardiac activity can vary considerably between individuals and may be influenced by age, weight, and gender; accounting for this inter-individual variability has been shown to potentially improve classification performance [35]. Future work should investigate more powerful and robust models that combine ultra-short-term HRV analysis with other physiological signals to detect driver stress accurately.

Conclusions
This study investigated the ability of ultra-short-term HRV analysis to detect and assess driver stress levels under real-world driving conditions. Although short-term HRV analysis has been shown to be effective for stress detection, the feasibility of stress detection using ultra-short-term HRV features has received little attention. Our study demonstrates that not all ultra-short-term HRV features are good surrogates of the corresponding short-term HRV features; in particular, the HRV features usable for stress assessment differ between ultra-short-term time scales. Moreover, although ultra-short-term HRV features have a weak negative impact on classification performance compared with short-term HRV features, they still achieve high classification performance.
Driver stress detection based on ultra-short-term HRV analysis remains an interesting and challenging issue that has not been adequately addressed. With advances in vehicle equipment, detecting drivers' stress levels in time is increasingly important. Ultra-short-term HRV analysis could detect driver stress levels accurately and in a timely manner, alerting drivers so that they can avoid traffic incidents and accidents. The findings of this study could contribute to building a robust and effective system that measures and assesses driver stress in real time from ultra-short-term HRV features, alerting drivers to make adjustments and thereby ensuring traffic safety.