Mental Stress Assessment Using Ultra Short Term HRV Analysis Based on Non-Linear Method

Mental stress is on the rise as one of the major health problems in modern society. It is important to detect and manage mental stress to prevent various diseases caused by stress and to maintain a healthy life. The purpose of this paper is to present new heart rate variability (HRV) features based on empirical mode decomposition and to detect acute mental stress through short-term HRV (5 min) and ultra-short-term HRV (under 5 min) analysis. HRV signals were acquired from 74 young police officers using acute stressors, including the Trier Social Stress Test and horror movie viewing, and a total of 26 features, including the proposed IMF energy features and general HRV features, were extracted. A support vector machine (SVM) classification model is used to classify the stress and non-stress states through leave-one-subject-out cross-validation. The classification accuracies of short-term HRV and ultra-short-term HRV analysis are 86.5% and 90.5%, respectively. In the results of ultra-short-term HRV analysis using various time lengths, we suggest the optimal duration to detect mental stress, which can be applied to wearable devices or healthcare systems.


Introduction
As the number of people with health problems due to stress increases every year, mental stress is emerging as one of the major human health problems in modern society. Mental stress not only harms mental and physical health, but also causes various diseases, such as diabetes, cardiovascular and respiratory diseases, depression, and cancers [1][2][3][4]. Mental stress is largely classified into chronic stress and acute stress. When one faces an acute stressor, the human body triggers a "fight-or-flight" response, a survival mechanism triggered by external stimuli to maintain homeostasis. This response activates the sympathetic nervous system within the body's autonomic nervous system (ANS) and causes changes in the body through the endocrine system [5]. If an acute stressor continuously affects the body, it can be transformed into chronic stress, in which the ANS becomes unbalanced. Therefore, it is necessary to quantitatively measure and manage acute stress in daily life to remain healthy.
New methods for measuring stress are required in various fields. First, questionnaire methods [6][7][8] are often used in the medical field, but such methods can be considered a subjective indicator of each individual. Another stress measurement method utilizes biomarkers, such as salivary alpha-amylase and cortisol [9,10]. Although the biomarker From this point of view, we analyzed intrinsic mode function (IMF) energy features extracted from EMDs that can provide frequency domain information using less than 5 min of data. Furthermore, we evaluated the stress classification accuracy of short-term and ultra-short-term HRV data using time domain features, including general HRV features, entropy features, and proposed EMD-based features, and suggested the optimal time length for ultra-short-term HRV data collection in acutely stressful situations. In addition, the classification method used a linear SVM classifier, and the appropriate evaluation was performed with the leave-one-subject-out cross-validation (LOSOCV) method, which is most similar to real-life application.

Methods
The stress classification method proposed in this study is shown in Figure 1. Subject selection and the experimental protocol for HRV signal measurement are described in Sections 2.1 and 2.2, respectively; preprocessing to remove ectopic heartbeats is described in Section 2.3; the description of the proposed HRV features and general features is outlined in Section 2.4; and ranked feature processing to increase accuracy and a classification method using leave-one-subject-out cross-validation are outlined in Section 2.5.
From this point of view, we analyzed intrinsic mode function (IMF) energy features extracted from EMDs that can provide frequency domain information using less than 5 min of data. Furthermore, we evaluated the stress classification accuracy of short-term and ultra-short-term HRV data using time domain features, including general HRV features, entropy features, and proposed EMD-based features, and suggested the optimal time length for ultra-short-term HRV data collection in acutely stressful situations. In addition, the classification method used a linear SVM classifier, and the appropriate evaluation was performed with the leave-one-subject-out cross-validation (LOSOCV) method, which is most similar to real-life application.

Methods
The stress classification method proposed in this study is shown in Figure 1. Subject selection and the experimental protocol for HRV signal measurement are described in Sections 2.1 and 2.2, respectively; preprocessing to remove ectopic heartbeats is described in Section 2.3; the description of the proposed HRV features and general features is outlined in Section 2.4; and ranked feature processing to increase accuracy and a classification method using leave-one-subject-out cross-validation are outlined in Section 2.5.

Dataset
The stress database [39,40] provided by PhysioNet assumes that the subject is stressful in the given stress situations, and this approach may be limited in classifying stress. In addition, the PhysioNet stress database is difficult to generalize because it consists of a small number of subjects. In order to supplement the above limitations, the following experiment was conducted.

Subjects
The dataset consists of physiological signals and questionnaires from 80 participants (78 males and 2 females). All participants were third-year police officers without heart

Dataset
The stress database [39,40] provided by PhysioNet assumes that the subject is stressful in the given stress situations, and this approach may be limited in classifying stress. In addition, the PhysioNet stress database is difficult to generalize because it consists of a small number of subjects. In order to supplement the above limitations, the following experiment was conducted.

Subjects
The dataset consists of physiological signals and questionnaires from 80 participants (78 males and 2 females). All participants were third-year police officers without heart disease and were in good physical condition. The data for six subjects (all male) were excluded due to sensor problems. This study was approved by the Institutional Review Board of Hanyang University Hospital and was conducted according to the guidelines of the Declaration of Helsinki, and all subjects provided informed consent before the experiment (HYUIRB-202009-032-3)

Experimental Protocol
All participants performed the stress-inducing experiment in a laboratory environment according to the experimental protocol shown in Figure 2. The sequence of the experiment protocol was sensor attachment, resting state (pre) for 5 min, exposure to each of the 3 stressors for 5 min, resting state (post) for 5 min, and self-reported subjective stress intensity. We used various methods to induce acute mental stress, including TSST (public speaking and arithmetic task) and horror movie viewing. All participants performed the stress-inducing experiment in a laboratory enviro ment according to the experimental protocol shown in Figure 2. The sequence of the periment protocol was sensor attachment, resting state (pre) for 5 min, exposure to ea of the 3 stressors for 5 min, resting state (post) for 5 min, and self-reported subjective str intensity. We used various methods to induce acute mental stress, including TSST (pub speaking and arithmetic task) and horror movie viewing. Before the experiment, participants wore a Polar H10 HR monitor with a Polar P Chest Strap (Polar Electro Oy, Kempele, Finland) to measure respective heart rate (H and heart rate variability (HRV), which represent autonomic responses. The RR inter data were transferred to Samsung Galaxy Tab A (Samsung Electronics, Co., Ltd., Seo South Korea) and saved in Polar's app. Based on findings that alpha-amylase increa when humans are exposed to a stressful situation [41,42], salivary alpha-amylase was a measured using a COCORO Meter (Nipro Co, Osaka, Japan).
As shown in Figure 2, the stress-inducing experiment protocol consists of a non-str section, which is comprised of rest (pre and post), and a stress section, comprised of TS (public speaking, arithmetic task) and horror movie viewing. In the resting section, participants were seated on a comfortable chair, and the HRV signal was measured fo min of relaxation. During the Stress 1 section, public speaking including job-related qu tions and a simulated job interview that was conducted for 5 min. In the Stress 2 secti mental arithmetic tasks (MATs) were performed. The MAT was to continue subtracti 17 from 2023. If an answer was wrong, subtraction started again from 2023, and the M was repeated for 5 min. During the horror movie viewing within the Stress 3 section, p ticipants watched horror movie clips for 5 min. After the experiment, a questionnaire w conducted on the ranking of subjective stress intensity. The ranking of subjective str intensity was set to 5 for the most stressful situations and 1 for the least stressful situatio 59 of 74 subjects answered that mental arithmetic was the most stressful, while the maining eight and seven subjects said horror movie clips and public speaking were most stressful, respectively.

Noise Removal and Interpolation of HRV Signal
Prior to HRV analysis, preprocessing is essential to remove the outliers in RR int vals caused by noise, such as movement. Outliers in RR interval data were removed a defined as data outside 3 standard deviations (SD) from the mean [43]. Considering Before the experiment, participants wore a Polar H10 HR monitor with a Polar Pro Chest Strap (Polar Electro Oy, Kempele, Finland) to measure respective heart rate (HR) and heart rate variability (HRV), which represent autonomic responses. The RR interval data were transferred to Samsung Galaxy Tab A (Samsung Electronics, Co., Ltd., Seoul, South Korea) and saved in Polar's app. Based on findings that alpha-amylase increases when humans are exposed to a stressful situation [41,42], salivary alpha-amylase was also measured using a COCORO Meter (Nipro Co, Osaka, Japan).
As shown in Figure 2, the stress-inducing experiment protocol consists of a non-stress section, which is comprised of rest (pre and post), and a stress section, comprised of TSST (public speaking, arithmetic task) and horror movie viewing. In the resting section, the participants were seated on a comfortable chair, and the HRV signal was measured for 5 min of relaxation. During the Stress 1 section, public speaking including job-related questions and a simulated job interview that was conducted for 5 min. In the Stress 2 section, mental arithmetic tasks (MATs) were performed. The MAT was to continue subtracting 17 from 2023. If an answer was wrong, subtraction started again from 2023, and the MAT was repeated for 5 min. During the horror movie viewing within the Stress 3 section, participants watched horror movie clips for 5 min. After the experiment, a questionnaire was conducted on the ranking of subjective stress intensity. The ranking of subjective stress intensity was set to 5 for the most stressful situations and 1 for the least stressful situations. 59 of 74 subjects answered that mental arithmetic was the most stressful, while the remaining eight and seven subjects said horror movie clips and public speaking were the most stressful, respectively.

Noise Removal and Interpolation of HRV Signal
Prior to HRV analysis, preprocessing is essential to remove the outliers in RR intervals caused by noise, such as movement. Outliers in RR interval data were removed and defined as data outside 3 standard deviations (SD) from the mean [43]. Considering the non-linear characteristics of HRV signals, cubic spline interpolation was performed [44]. The HRV signal used to obtain EMD-based features was interpolated with cubic spline and was resampled at 8 Hz.

Comparison of Short-Term HRV and Ultra-Short-Term HRV
For comparison with ultra-short-term HRV, we analyzed short-term HRV using data collected over 5 min, which is the data length commonly used in previous studies. We selected the resting state and stress state for each participant using the rankings of subjective stress intensity. The resting state was selected as the lowest subjective stress intensity ranking between Resting 1, measured before the stress experiment, and Resting 2, measured after the stress experiment. The stress state was selected as the highest stress ranking among Stress 1, Stress 2, and Stress 3.
According to Thomas Wyss et al. [45], the HR, an ANS indicator, is initially high during acute mental stress situations and decreases as the stress situation continues. These results indicate that the response of the sympathetic nervous system to acute mental stress not only responds rapidly, but also adapts to a stress stimulus due to ANS homeostasis. As shown in Figure 3, ultra-short-term HRV analysis was performed by dividing time segments into first, middle, and last segments over 1, 2, and 3 min, respectively. Each resting state was paired with stress state, and the segment with the lowest average HR was used. For comparison with ultra-short-term HRV, we analyzed short-term HRV using data collected over 5 min, which is the data length commonly used in previous studies. We selected the resting state and stress state for each participant using the rankings of subjective stress intensity. The resting state was selected as the lowest subjective stress intensity ranking between Resting 1, measured before the stress experiment, and Resting 2, measured after the stress experiment. The stress state was selected as the highest stress ranking among Stress 1, Stress 2, and Stress 3.
According to Thomas Wyss et al. [45], the HR, an ANS indicator, is initially high during acute mental stress situations and decreases as the stress situation continues. These results indicate that the response of the sympathetic nervous system to acute mental stress not only responds rapidly, but also adapts to a stress stimulus due to ANS homeostasis. As shown in Figure 3, ultra-short-term HRV analysis was performed by dividing time segments into first, middle, and last segments over 1, 2, and 3 min, respectively. Each resting state was paired with stress state, and the segment with the lowest average HR was used.

Time Domain Features
Time domain HRV features frequently used to evaluate acute mental stress were extracted. The mean RR interval (RR), standard deviation of RR interval (SDNN), square root of the mean squared difference between successive RR intervals (RMSSD), and the proportion of successive differences between RR intervals greater than x msec (pNN30, pNN50) were extracted. Expressions for these features are as follows.

Time Domain Features
Time domain HRV features frequently used to evaluate acute mental stress were extracted. The mean RR interval (RR), standard deviation of RR interval (SDNN), square root of the mean squared difference between successive RR intervals (RMSSD), and the proportion of successive differences between RR intervals greater than x msec (pNN30, pNN50) were extracted. Expressions for these features are as follows.
Additionally, we extracted G-pNNx (Grouped-pNNx), a new feature that can be applied to a group of young and healthy people. Based on a prior study [46], using a parameter between pNN10-40 instead of pNN50 is more suitable for stress assessment. We developed G-pNNx as a pNNx-based stress feature optimized for young people using data collected from young police officers at rest. The procedure for calculating the x-value of G-pNNx is as follows.

1.
On the basis of subjectively ranked self-reported stress intensity in the experimental protocol, select the lowest stress rank between Resting 1 and Resting 2.

2.
If the distribution of data satisfies normality, obtain G-pNNx using the mean value of the distribution; otherwise, obtain it using the median value.
Since the data do not satisfy normality (Kolmogorov-Smirnov, p < 0.05) in our study, the x value of G-pNNx was obtained using the median value (18.5).

EMD-Based Features EMD
EMD is an adaptive time-series analysis method suitable for processing non-stationary and non-linear series [47]. Since the HRV signal has non-stationary and non-linear characteristics [48], EMD is suitable for our research. EMD can decompose any signal with an IMF. As the x-value in an IMF increases, the low-frequency component of the original signal is included. For example, IMF1 represents the highest local frequency component of the signal. In order to extract IMFs from the original signal using the EMD method, the following two basic conditions are essential [49].
1. The numbers of extrema and zero crossings must be the same or different at most by one within the entire dataset.
2. The mean value of the envelope defined by local maxima and minima must be zero at any point.
If the two conditions are satisfied, the IMF can be continuously decomposed by the EMD method for x(t) and mathematically expressed as follows.
where x(t) is the HRV signal, we used decomposed IMF of 1~3, and the residual r was not used.

Entropy Features
The entropy method is suitable for HRV signals as a non-linear method similar to EMD, and the entropy methods used in our study are permutation entropy and sample entropy.
Permutation entropy (PE) is useful to represent the complexity of dynamic time-series signals and has the advantages of simple calculation and robustness to noise [50]. PE is calculated by the following equation [51].
The factorial calculated from sequence length m (dimension) is the number of possible permutation patterns, and A i is the probability of the i-th permutation pattern. The setting values in the permutation entropy method are time delay tau and dimension m. Since the sampling frequency of the resampled HRV signal is 8 Hz, the maximum value of tau was set to 4 by the Nyquist-Shannon sampling theorem [52]. Since the minimum length of data used for ultra-short-term HRV analysis was N (data length) = 480 (sampling frequency = 8 Hz, 60 s), m was set to 4, suitable for values under the condition of N > 5m! [53].
Sample entropy compensates for the disadvantage of approximate entropy [54] and measures the irregularity and complexity of HRV signal and IMF components. The setting values used in sample entropy consist of embedding dimension m and tolerance r. In a previous study [55], the embedding dimension value was set to 2, and the tolerance value was set using the standard deviation of the HRV data (r = 0.2 × SD).

Energy Features
Previous stress assessment studies based on HRV signals frequently use frequency domain features, including the VLF component (~0.04 Hz), LF component (0.04~0.15 Hz), HF component (0.15~0.4 Hz), and LF/HF ratio. A major drawback of frequency domain features is the inaccuracy when using ultra-short-term HRV signals. According to [56], LF band (0.04~0.15 Hz) power requires HRV data to span at least 250 s from a theoretical point of view, so ultra-short-term HRV analysis using HRV data lasting about 4 min or less has poor reliability regarding several HRV features based on LF band power.
Based on Parseval's theorem [57] that the total energy of a signal can be calculated by summing power across time or spectral power across frequency, as shown in Equation (6), this paper presents a methodology that can replace frequency domain features with the energy of IMF components. The methodology compares the HF band power and energy of the high-frequency component IMF1 and the LF band power and energy of the relatively low-frequency components (IMF2 and IMF3). In addition, IMF energy features corresponding to the LF/HF ratio, normalized-LF, and normalized-HF were extracted.

SD-IMF and RMSSD-IMF Features
Three IMF components obtained through the EMD method were extracted using SDNN Equations (1) and RMSSD (2). Instead of RRi in Equation (1), three SD-IMF features were used for the IMF components, and three RMSSD-IMF features were extracted using Equation (2).

Feature Ranking Method
The number of extracted HRV features was 26, including 6 general time domain features, 3 entropy/energy features of HRV signal not decomposed through EMD (permutation entropy, sample entropy, and energy), 6 EMD-based entropy features, 5 EMD-based entropy features, and 6 EMD-based time domain features.
The Relief-F algorithm was applied to remove features that did not contribute to stress classification performance and to improve computational efficiency. The Relief-F algorithm is a method to evaluate the contribution of each feature based on k-nearest neighbors by increasing the inter-class difference and decreasing the intra-class difference [58]. The algorithm finds a sample with the same class label and a sample with a different class label closest to a randomly selected sample among the k samples in the closest order and calculates a weight using the difference between the selected sample and the closest sample. In this study, the feature with the lowest weight value between the resting state and the stress state was removed sequentially to achieve good performance.

Support Vector Machine (SVM) Classifier
In order to classify the resting and stress states, the SVM classifier widely used in stress research was used. The SVM classifier is a method of finding the optimal hyperplane to classify a class, and a vector contributing to creating the hyperplane is called a support vector [59]. Mathematically, an SVM is represented as follows [60]: where P(z,z k ) is the kernel function, z k is the D-dimension k-input vector (feature), t k is the target class vector, a k is the LaGrangian multiplier, and b represents the bias term. We used SVM with a linear classifier and classified stress and non-stress states using short-term HRV and ultra-short-term HRV data.

Leave-One-Subject-Out Cross Validation (LOSOCV)
The LOSOCV method is a variation of the k-fold cross-validation approach that validates as many folds as there are subjects included in the data set [61]. LOSOCV evaluates the accuracy on new subjects that have not been seen by the model, suggesting whether the developed classifier model can achieve classification performance when applied in real situations. In this study, the performance of the training model was evaluated using LOSOCV.

Performance Evaluation
We calculated evaluation indicators, such as accuracy, precision, recall, and F1-score, to explain the results of LOSOCV. Each evaluation indicator is expressed as follows [62]:

Relationships between Frequency Domain Features and IMF Energy Features
In order to perceive the trends in 5-min HRV signal and IMF components in the frequency domain, we represented each component using fast Fourier transforms (FFTs), as shown in Figure 4. The HF band (0.15-0.4 Hz) of the HRV and IMF1 spectral areas (blue) are similar, and the LF band (0.04-0.15 Hz) of the HRV spectrum and the sum of the IMF2 (red) and IMF3 spectral areas (pink) are similar. In addition, the mean frequency for each IMF using the entire data including the resting and stress state is shown in Figure 5. The mean frequency of IMF1 is included in the HF band, and IMF2 and IMF3 are included in the LF band.    The Pearson correlation method was used to describe the proposed IMF tures as surrogates of frequency domain features. Table 1 presents the corre cients of the relationships between IMF energy (IMF1, IMF2+IMF3, and ra quency domain features (HF, LF, and HF/LF ratio). Using the scale of Hopk results in Table 1  Using all data, including both the resting state and the stress state, IMF1 tures had a nearly perfect correlation value (r = 0.93) with HF and a relative lation value with LF (r = 0.77). Contrary to IMF1-energy features, the IMF2+I features had a nearly perfect correlation value (r = 0.92) with LF and a relativ very large correlation value (r = 0.79) in HF. The IMF-energy ratio was high with the LF/HF ratio (r = 0.86).  The Pearson correlation method was used to describe the proposed IMF energy features as surrogates of frequency domain features. Table 1 presents the correlation coefficients of the relationships between IMF energy (IMF1, IMF2+IMF3, and ratio) and frequency domain features (HF, LF, and HF/LF ratio). Using the scale of Hopkins [63], the results in Table 1 were qualitatively analyzed with the Pearson correlation r, described as trivial  Using all data, including both the resting state and the stress state, IMF1-energy features had a nearly perfect correlation value (r = 0.93) with HF and a relatively low correlation value with LF (r = 0.77). Contrary to IMF1-energy features, the IMF2+IMF3energy features had a nearly perfect correlation value (r = 0.92) with LF and a relatively low and very large correlation value (r = 0.79) in HF. The IMF-energy ratio was highly correlated with the LF/HF ratio (r = 0.86).

Comparison of Feature Ranks between Resting and Stress States
When classifying the resting state and stress state with the Relief-F algorithm using short-term HRV data, the results are listed in descending order of weight, as shown in Table 2. Linear SVM classifier training was repeated by adding features in ascending order from the ranked features one at a time, and the optimal number of features with the best classification performance was selected. From these results, the proposed energy-related features, such as ranks 1, 3, and 6, have high weights.

Short-Term HRV Classification and Performance Evaluation
The results of LOSOCV performance evaluation on short-term HRV data after feature selection are shown in Figure 6. The best classification performance was obtained when the classifier model was developed using the top 17 highest ranked features. The results comparing the stress state to the resting state were 86.5% accurate, with a recall of 85.1%, precision of 87.5%, and F1-score of 86.3%.

Short-Term HRV Classification and Performance Evaluation
The results of LOSOCV performance evaluation on short-term HRV d selection are shown in Figure 6. The best classification performance was the classifier model was developed using the top 17 highest ranked featu comparing the stress state to the resting state were 86.5% accurate, with a precision of 87.5%, and F1-score of 86.3%.

Ultra-Short-Term Classification and Performance Evaluation
The classification results based on ultra-short-term HRV data by dividing the time segments into each time length, designated as the first, middle, and last time lengths (1, 2, and 3 min), are shown in Figure 3. The linear-SVM classifier model was used in the same way as the classification method based on the short-term HRV data, and LOSOCV performance evaluation was performed using the ranked features listed in Table 2. The results of ultra-short-term HRV classification were compared for time segment and time length (Table 3). When compared in the time segments between the first, middle, and last time lengths, the classification accuracy of the first length is higher than that of the middle and last lengths, as shown in Figure 7. The accuracy of the first segment of 2-min and 3-min lengths is higher than the classification accuracy using short-term HRV data. Segments of 1-min length achieved the highest accuracy in the middle part, and this result indicated a different tendency than those of segments of 2-min and 3-min length. All segments of 1-min length led to less accurate predictions than the classification performance using short-term HRV data. Table 3. Classification performance using ultra-short-term HRV data according to time segments and time lengths.

Front
Middle Last

Ultra-Short-Term Classification and Performance Evaluation
The classification results based on ultra-short-term HRV data by dividing the tim segments into each time length, designated as the first, middle, and last time lengths (1 and 3 min), are shown in Figure 3. The linear-SVM classifier model was used in the sam way as the classification method based on the short-term HRV data, and LOSOCV perfo mance evaluation was performed using the ranked features listed in Table 2. The resu of ultra-short-term HRV classification were compared for time segment and time leng (Table 3). When compared in the time segments between the first, middle, and last tim lengths, the classification accuracy of the first length is higher than that of the middle a last lengths, as shown in Figure 7. The accuracy of the first segment of 2-min and 3-m lengths is higher than the classification accuracy using short-term HRV data. Segments 1-min length achieved the highest accuracy in the middle part, and this result indicated different tendency than those of segments of 2-min and 3-min length. All segments of min length led to less accurate predictions than the classification performance using sho term HRV data.

Salivary Cortisol of Different States
According to the experiment protocol in Figure 2, sAA measurements were performed a total of four times during the experiment (sAA1: After public speaking sAA, sAA2: After mental arithmetic sAA, sAA3: After horror movie sAA, sAA4: After resting sAA). Since not all sAA data satisfied the normality test, the resting state (sAA4) and stress states (sAA1~3) analyses were performed using the non-parametric Mann-Whitney U test. Compared with sAA4, sAA1 (p = 0.056), sAA2 (p = 0.797), and sAA3 (p = 0.082) were not significant in all stress states.

Discussion
This study suggests that the proposed IMF energy features obtained using EMD are good surrogates of the frequency domain features. Although there is previous research that showed similarity among HRV features, including time domain, frequency domain, and non-linear features [64], this is the first study to suggest that frequency domain features such as HF, LF, and ratio can be replaced with other HRV features.
Ultra-short-term HRV analysis, which evaluates mental stress using data less than 5 min long, is a topic of increasing interest [56]. Castaldo et al. [32] presented time segments and HRV features in ultra-short-term HRV data that can be replaced with short-term HRV analysis as a result of correlation analysis between ultra-short-term and short-term HRV features. It was suggested that features including HR, RMSSD, pNN50, and sample entropy can be substituted for ultra-short-term HRV analysis with a 2-min length compared to short-term HRV analysis (r > 0.7). Using the same method from the research above, 4 time lengths (5 min, 3 min, 2 min, and 1 min) were compared, as in Table 4. In the resting state, all features were significantly correlated in each time scale (r > 0.7), and all features except the 1 min time scale were significantly correlated in the stress state (r > 0.7). Accordingly, our proposed IMF energy features show that ultra-short-term HRV analysis can replace short-term HRV analysis with a time length of at least 2 min. The proposed stress classification method was compared with some of the latest studies that selected TSST including public speaking and cognitive tasks as mental stressors and conducted acute stress classification with short-term HRV analysis. Table 5 compares the number of subjects, type of measured physiological signals, classifier model, validation method, and accuracy of our and previous studies. Multi-physiological signals, including ECG, PPG, and GSR, were measured using public speaking as a stressor, and the result was achieved with 79% accuracy using the AdaBoost classifier and four-fold cross-validation to classify between stress and non-stress states [17]. The results of this study have relatively low accuracy compared to other studies. The stress classification accuracy of another study [18] was 96.3% using leave-one-out cross-validation with the SVM-RBF classifier considering PPG, GSR, and EEG. However, the result considering a single signal (PPG) was 80.0%, which is lower than our accuracy of 86.5%. Another study [12] selected TSST and Stroop color test as stressors and used a random forest classifier and three-fold crossvalidation. The results for overlapping and not overlapping HRV signals over 5 min are accuracies of 96.0% and 85.9%, respectively. The disadvantage of the aforementioned study is that the number of subjects is small, and leave-one-subject-out cross-validation, similar to that applied in real-world, was not used. The study most similar to the present study [11] selected TSST as the stressor, and the classification accuracy results of five-fold cross-validation and leave-one-subject-out cross-validation with an ANN classifier were 91% and 84.4%, respectively. When comparing based on LOSOCV, the results of our study achieved a higher accuracy of 86.5%.
The dataset of our study, collected from 74 people, is the largest study that selected TSST as a stressor. Compared with precedent studies using a single physiological signal and LOSOCV, the result of the proposed stress detection method presents the highest performance. For various stress-inducing tasks (public speaking, arithmetic, horror movie viewing), the stress state and resting state were selected as individual stress intensities through a questionnaire to rank subjective stress intensity. The reason for dividing the stress and resting states in this way is that the stress intensity for each task differs by person, which is also one of the advantages compared to previous studies. As HRV analysis of data spanning less than 5 min becomes important in several healthcare applications, ultra-short-term HRV analysis has been actively studied in previous studies of HRV stress assessment using various methods included in mental arithmetic tasks [28], examination stress [11,32], and the Stroop color-word task [19,65]. However, in all previous studies, the HRV data used for ultra-short-term HRV analysis was divided according to the required duration or the central position of the 5 min HRV data. The proposed study is the first considering stress adaptation due to ANS homeostasis during stress-induced tasks. By comparing the results of ultra-short-term HRV stress classification consisting of three time segments (first, middle, and last) and three durations (1 min, 2 min, and 3 min) with general short-term HRV stress classification, stress adaptation was confirmed by deriving higher accuracy in the first segment than in the last segment. In addition, the accuracy of ultra-short-term HRV stress classification (Table 3) shows higher accuracy in the first segments of 2-min and 3-min lengths compared to the accuracy of short-term HRV stress classification (86.5%). The optimal time length to be used for stress assessment using ultra-short-term HRV data is 3 min, but a minimum length of 2 min is suggested.

Conclusions
In our study, we presented IMF energy features based on EMD as a surrogate of general frequency domain features in HRV analysis and suggested the optimal duration for ultra-short-term HRV analysis as a result of stress classification accuracy comparison between short-term and ultra-short-term HRV analysis. Performance evaluation was conducted using LOSOCV, most similar to real-world situations. The proposed study has high accuracy compared to previous studies using similar classification methods based on a single physiological signal.
This study has three contributions. First, we provide frequency domain information with data of less than 5 min. Through EMD-based features, it is possible to replace information about frequency domain that are difficult to use in ultra-short-term. Second, we identified stress adaptability over time. In this way, the ultra-short-term analysis would be more appropriate when evaluating stress than the commonly used short-term analysis. Finally, we propose an optimal time length for ultra-short-term HRV data collection in acutely stressful situations. It is possible to provide a comfortable stress analysis service for several healthcare applications through the proposed length.