Detection of Drivers’ Anxiety Invoked by Driving Situations Using Multimodal Biosignals

It has become increasingly important to monitor drivers’ negative emotions during driving to prevent accidents. Despite drivers’ anxiety being critical for safe driving, there is a lack of systematic approaches to detect anxiety in driving situations. This study employed multimodal biosignals, including electroencephalography (EEG), photoplethysmography (PPG), electrodermal activity (EDA) and pupil size to estimate anxiety under various driving situations. Thirty-one drivers, with at least one year of driving experience, watched a set of thirty black box videos including anxiety-invoking events, and another set of thirty videos without them, while their biosignals were measured. Then, they self-reported anxiety-invoked time points in each video, from which features of each biosignal were extracted. The logistic regression (LR) method classified single biosignals to detect anxiety. Furthermore, in the order of PPG, EDA, pupil, and EEG (easiest to hardest accessibility), LR classified accumulated multimodal signals. Classification using EEG alone showed the highest accuracy of 77.01%, while other biosignals led to a classification with accuracy no higher than the chance level. This study exhibited the feasibility of utilizing biosignals to detect anxiety invoked by driving situations, demonstrating benefits of EEG over other biosignals.


Introduction
The emotional state during driving is related to driving safety and comfort [1,2]. Negative emotions, especially, can have a serious impact on driving performance, resulting in an increase in the risk of accidents. For example, anger is directly linked to vehicle accidents, and anxiety interferes with concentration on driving [3]. Some studies showed that negative emotions can be regulated by feedback from in-vehicle agents [4,5], which suggests that it is essential to identify the emotional state of a driver to give appropriate feedback.
Changes in physiological signals such as electroencephalography (EEG), photoplethysmography (PPG), electrodermal activity (EDA), and eye-related features have been shown to be more suitable than subjective questionnaires for stress detection [6,7]. Similarly, many studies have attempted to recognize a driver's emotional state using biosignals, without self-expression of emotions by the driver [8-13]. Some studies measured the physiological outcomes of the autonomic nervous system, such as heart rate and skin conductance, and used them to infer the level of stress in driving situations [8-10], while others also inspected traffic situations (e.g., a crash), since drivers' internal emotional state can be changed significantly by external events [11,12,14,15]. For instance, one study revealed that an attention reaction level represented by skin conductance response increased with the accident risk level (i.e., external driving environment), regardless of individual trait anxiety levels (i.e., internal state) [14].
Drivers are thus affected by environmental dynamics, which creates a need to detect a driver's emotions invoked by external driving situations.
Although driving anxiety is one of the emotions most influential to driving safety [16], few studies have measured the physiological [14] and neural responses [15] of anxiety compared to other negative emotions [8-12]. In addition, the previous studies determined the onset of anxiety as being spread over an entire video clip [12], or as identical across subjects [14,15]. However, due to the variability of driving experiences and personal traits, individual drivers may start to feel anxiety at different time points.
Therefore, in the present study, we aimed to detect driving anxiety using biosignals measured at individualized anxiety onset. In addition, we investigated how combining multiple biosignals could improve such detection. For this purpose, we extracted features from four different biosignals: Electroencephalography (EEG), photoplethysmography (PPG), electrodermal activity (EDA) and pupil size (PS). As a detection algorithm, we built and trained a classifier based on the data of individual subjects and used it to classify biosignals into either a normal or anxiety state.
We confirmed that classification of EEG outperformed that of other signals in terms of average accuracy and weights in the classification model. Classifiers tended to utilize frontal theta, alpha and gamma powers of EEG to detect anxiety-invoked situations. Furthermore, adding other biosignals such as EDA or pupil size to EEG further enhanced the detection performance in some participants. Our findings contribute to the ability to extract feasible biosignals and reveal cognitive processes related to driving anxiety.

Participants and Stimuli
Thirty-one university students with normal vision, who had held a driver's license for at least one year, were recruited (15 females, 16 males, mean age 23.26 ± 1.93 years, mean license possession period 19.62 ± 11.84 months). The participants in the present study were different from those in our previous study that used the same stimuli [15]. This study was carried out in accordance with the recommendations of the Institutional Review Board of the Ulsan National Institute of Science and Technology (UNISTIRB-18-45-C) with written informed consent from all participants. After the experiments, eight participants were excluded from data analysis because, in more than 80% of their trials, the data of one or more biosignals were of poor quality.
Three anxiety-invoking external events during driving were used in this study: a sudden jaywalker, a sudden entry of a vehicle (including a bicycle), and a speeding vehicle passing by. These events were chosen using the risk criteria in the Hazard Perception Test provided by the UK Driver and Vehicle Standards Agency [17]. We collected thirty 30 s driver-perspective video clips from YouTube, each of which contained one of the three anxiety-invoking events above (video of anxiety: VA). We also collected another set of thirty 30 s driver-perspective video clips from YouTube that did not include any anxiety-invoking events but presented driving at normal speed (video of normal condition: VN). The anxiety-invoking events started on average at 12.73 s (S.D. 5.77 s) and lasted for 2.87 s (S.D. 1.20 s) (Table A1). The start time was defined as the moment when an anxiety-related object appeared in the video, and the duration as the time from the start until the object disappeared.

Experimental Task
The experiment consisted of two sessions (Figure 1). In the first session, participants were asked to watch sixty videos. At the end of each video, they were asked whether or not they had felt anxiety during the video and answered by pressing a keypad (1: Yes/2: No). Presentation of videos was split over three successive runs with a short break between runs; each run contained twenty trials of video presentation followed by a response. The numbers of VA and VN in each run were balanced and the videos were presented in a random order. In the second session, participants were told to press the space bar at the points when they had felt anxiety while they watched the same sixty videos again. They were allowed to press multiple times, yet only the first press was used in the subsequent analysis.

Figure 1. Experimental task and multimodal biosignal recording. In Session 1, participants watched video clips with or without anxiety events and answered whether they felt anxiety. In Session 2, participants indicated the point in the video where they felt anxiety.

Multimodal Biosignal Recordings
Four biosignals were collected simultaneously in the first session: EEG, PPG, EDA, and PS. EEG signals were measured (band-pass filtering 1-50 Hz, sampling rate: 500 Hz) with a 31-channel wet-electrode recording system (actiCHamp, Brain Products GmbH, Gilching, Germany) at the following electrode locations, determined in accordance with the International 10/20 system: FP1, FPz, FP2, F7, F3, Fz, F4, F8, FC9, FC5, FC1, FC2, FC6, FC10, T7, C3, Cz, C4, T8, CP5, CP1, CP2, CP6, P7, P3, Pz, P4, P8, O1, Oz, and O2. Two additional electrodes were attached to the left mastoid (TP9) as a ground, and to the right one (TP10) as a reference. PPG and EDA were collected from a wristband-type wearable device (E4, Empatica Inc, Milano, Italy) with 64 Hz and 4 Hz sampling rates, respectively. PS was acquired by a wearable eye tracker (Tobii Pro Glasses 2, TOBII, Danderyd, Sweden). The signals from the three devices were synchronized by marking the beginning of the first video as follows: before watching the first driving video, participants pressed the event-marker button on the wristband in time with the 0.5 s rhythm of a countdown from 10 to 1, as instructed on the monitor screen. By doing so, participants could press the button accurately at the moment when the last number, '1', was shown, even if they missed some of the earlier time points, which were not used for synchronization. When the last number, '1', was shown on the monitor screen, a beep sound was presented together, which was recorded by a camera embedded in the eye tracker. The first video started 0.5 s after the display of '1' (Figure 1). EEG signals were recorded along with triggers marking the beginning of every trial.

Behavior Analysis
The behavioral data acquired from the experiment included the self-reports of anxiety for all videos and the reported time points for each VA. The ratio of self-reports of anxiety was calculated as the number of videos with a 'Yes' response over the number of VA or VN (i.e., 30). To verify that VA clearly invoked anxiety, we compared this ratio between VA and VN using a paired t-test. We also estimated the expected number of reported time points for each video by fitting a Poisson distribution. The self-reported time points of anxiety from VA were used to determine the onset of each individual's anxiety (anxiety onset). There was no clear onset time for VN due to the absence of an event. Thus, for VN, the control onset was defined as the average start time of the events in VA (i.e., 12.73 ± 5.77 s). These two onsets were used to extract the features of anxiety from biosignals (Section 2.5).
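The comparison of self-report ratios can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' analysis script: the 'Yes' counts are made-up values for five hypothetical participants, and `paired_t` is an illustrative helper name.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Return (t, df) for a paired t-test on two equal-length samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical 'Yes' counts out of 30 videos per condition (not study data)
yes_va = [24, 21, 26, 19, 23]
yes_vn = [5, 4, 7, 3, 6]

# Ratio of self-reports of anxiety = number of 'Yes' responses / 30
ratio_va = [c / 30 for c in yes_va]
ratio_vn = [c / 30 for c in yes_vn]

t, df = paired_t(ratio_va, ratio_vn)
print(f"t({df}) = {t:.2f}")
```

The p-value would then be read from a t distribution with `df` degrees of freedom, which the standard library does not provide directly.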

EEG
To remove eye movement artifacts from EEG signals, artifact subspace reconstruction (ASR) was applied to the recorded EEG data [18]. Then, EEG data were transformed to the spectral domain using the short-time Fourier transform (STFT) with a 1 s window and 50% overlap. The power spectral density (PSD) in four frequency bands was estimated using Welch's method: theta (4-8 Hz), alpha (8-12 Hz), beta (13-30 Hz) and gamma (30-40 Hz). Only frontal channels (F7, F3, Fz, F4, F8, FC5, FC1, FC2, FC6) were used in this study, as the frontal cortex is involved in the emotional processing of anxiety [15,19] (Figure 2). The data were extracted from t2 s after the two onset types (i.e., anxiety onset and control onset) and baseline-corrected with t1 s before the onsets, where t1 = (1, 2, 3) s and t2 = (3, 4, 5) s, giving 9 combinations of baseline and post-onset periods. Additionally, stress-related EEG features [20], such as frontal alpha asymmetry (FAA), brain load index (BLI) and beta/alpha ratio (B/A), were extracted from the same 9 periods. Thus, a total of 423 features (FAA, BLI, B/A for 9 channels, and 4 frequencies for 9 channels for each period) were extracted from EEG data. To prevent over-fitting due to the sizable number of features compared to input data (i.e., the number of trials), we reduced the number of features to 20 using least absolute shrinkage and selection operator (LASSO) regression analysis provided by the function 'lasso' from MATLAB (2019a, MathWorks, Natick, MA, USA, 2019). We also extracted the same features of EEG data with a 2 s window and 0.5 s non-overlapping to check if it yielded more reliable estimates.

PPG
The remaining PPG features were extracted from the peak-to-peak interval (PPI) according to a previous feature extraction method [21] (No. 5~12 in Table 1). As shown in Figure 3a, a PPI is defined as the time interval, t(n + 1) − t(n), between the n-th peak, P(n), and the subsequent peak, P(n + 1), where 't' indicates time. The length and irregularity of PPI are defined in Equations (1) and (2), respectively. We also calculated the number of PPIs within a time window, denoted as 'nPPI', as well as the number of fast PPIs, defined as PPIs shorter than the average PPI, denoted as 'fast PPIpost count'. In addition, the ratio of low frequency (LF: 0.04~0.15 Hz) to high frequency (HF: 0.15~0.4 Hz) power was obtained within a time period of interest.
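The interval-based counts can be illustrated with a short sketch. It assumes that PPG peak times have already been detected (peak detection is not shown), treats a "fast" PPI as one shorter than the window's average PPI, and leaves out the length and irregularity features because their defining equations are not reproduced here. The function name and the peak times are hypothetical.

```python
from statistics import mean

def ppi_features(peak_times):
    """Peak-to-peak intervals and simple counts from sorted peak times (s)."""
    # PPI(n) = t(n + 1) - t(n) for consecutive peaks
    ppis = [t1 - t0 for t0, t1 in zip(peak_times, peak_times[1:])]
    avg = mean(ppis)
    return {
        "ppi": ppis,
        "nPPI": len(ppis),
        "mean_ppi": avg,
        # Assumption: 'fast' means shorter than the average PPI in the window
        "fast_ppi_count": sum(1 for p in ppis if p < avg),
    }

# Hypothetical peak times in seconds within one analysis window
peaks = [0.0, 0.8, 1.7, 2.4, 3.4]
feats = ppi_features(peaks)
print(feats["nPPI"], feats["fast_ppi_count"])
```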

EDA
EDA increases after a certain latency, normally 1 s, following the onset of arousal events [22]. Thus, the EDA signal was baseline-corrected using the period from the onset to 1 s after it. Then, we epoched the EDA signals from 1 s to 6 s after both the anxiety onset and the control onset. Five arithmetic features were computed within this 5 s time window: the mean, standard deviation, maximum, and minimum of the EDA signal, as well as the EDA amplitude, defined as the difference between the maximum and the minimum.
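A minimal sketch of this EDA feature extraction is given below. It assumes the 4 Hz sampling rate reported for the wristband and a subtractive baseline (mean of the first second after onset); the text does not state whether the correction was subtractive, so that choice, the function name and the synthetic trace are assumptions.

```python
from statistics import mean, stdev

FS_EDA = 4  # Hz, sampling rate of the EDA channel reported in the text

def eda_features(eda, onset_s, fs=FS_EDA):
    """Five arithmetic EDA features from the 1-6 s window after an onset."""
    i0 = int(round(onset_s * fs))
    baseline = eda[i0 : i0 + 1 * fs]          # onset to onset + 1 s
    epoch = eda[i0 + 1 * fs : i0 + 6 * fs]    # onset + 1 s to onset + 6 s
    base = mean(baseline)
    x = [v - base for v in epoch]             # assumed subtractive correction
    return {
        "mean": mean(x),
        "std": stdev(x),
        "max": max(x),
        "min": min(x),
        "amplitude": max(x) - min(x),
    }

# Synthetic trace: flat for 2 s, then a slow rise (not study data)
eda = [1.0] * 8 + [1.0 + 0.1 * k for k in range(20)] + [3.0] * 4
feats = eda_features(eda, onset_s=1.0)
print(feats)
```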

Pupil Size
To reduce blinking noise in PS, we removed the pupil data whose velocity was more than 1.5% higher than the average velocity (Figure 4). This threshold was set heuristically. According to previous studies [23,24], the largest change in PS occurs within 2 to 5 s after an emotional change, relative to the size 1 s before the change. Thus, PS data were baseline-corrected using the signal 1 s before the onset. The five arithmetic features selected in this study were the mean, standard deviation, maximum, minimum and pupil range, calculated as the maximum minus the minimum, within a time window (i.e., 3 s).
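The velocity-based blink rejection might look like the following sketch. It reads the threshold literally as 1.015 times the mean absolute sample-to-sample velocity, which is one possible interpretation of the description; the function name, the sampling rate of 50 Hz and the toy trace are assumptions.

```python
from statistics import mean

def remove_fast_samples(ps, fs, ratio=1.015):
    """Drop samples whose absolute velocity exceeds ratio * mean velocity."""
    # Velocity between consecutive samples, assigned to the later sample
    vel = [abs(b - a) * fs for a, b in zip(ps, ps[1:])]
    threshold = ratio * mean(vel)
    kept = [ps[0]]  # the first sample has no velocity estimate; keep it
    kept += [x for x, v in zip(ps[1:], vel) if v <= threshold]
    return kept

# Toy pupil trace (mm) with a blink-like dip at the fourth sample
trace = [3.0, 3.0, 3.0, 0.5, 3.0, 3.0]
clean = remove_fast_samples(trace, fs=50)
print(clean)
```

Note that the sample immediately after the dip is also dropped, because the velocity of the recovery is as large as that of the dip itself.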

Decoding Analysis
We built 15 feature sets with all possible combinations of the 4 signals in order to find which signal or combination of signals provided the best features for detecting anxiety. We extracted 20 features from EEG, 12 features from PPG, 5 from EDA and 5 from pupil size. To evaluate decoding accuracy, leave-one-trial-out (LOTO) validation was used for each participant (Figure 5). To predict whether given trial data contained a video with an anxiety event or not, we trained a classifier using the rest of the trials. Before training the classifier, we normalized each feature using standard scaling. Logistic regression (LR) was used as the classifier. Additionally, we used 10-fold cross validation (CV) for evaluating decoding accuracy as a more conservative validation method, and an artificial neural network (ANN) was used as another classifier to check if it could improve accuracy. Thus, there were eight decoding methods for analysis (2 validation methods × 2 classifiers × 2 EEG feature sets, extracted using a 1 s or a 2 s window).

In addition, we developed a cumulative feature count (CFC) in order to evaluate which biosignal was more involved in building the classifier across participants. To do so, we calculated the average of the absolute values of the LR weights assigned to each of the 42 features in each participant (Figure 6). Then, we sorted the features based on their average absolute weight values in descending order (Figure 6a). Finally, we collected this vector of sorted features from every participant and counted the number of times a feature appeared at each rank (Figure 6b). A feature with the largest proportion in the high ranks could be interpreted as the best feature and/or the best signal. The CFCs from other possible classifiers were also calculated in the same way, except for the number of features in the feature set. Since the CFC was used to rank weights rather than to select features, the number of features of all classifiers was not changed, regardless of CFC application.
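The cumulative feature count can be sketched as below. This is an illustration of the ranking-and-counting steps only: the feature names, the weights and the two participants are made up, and `cumulative_feature_count` is a hypothetical helper rather than the authors' code.

```python
from collections import Counter
from statistics import mean

def cumulative_feature_count(weights_per_participant):
    """Count how often each feature appears at each rank across participants.

    weights_per_participant: list of dicts mapping a feature name to the
    list of LR weights it received across that participant's models.
    Returns a list of Counters, one per rank (index 0 = largest weight).
    """
    n_ranks = len(weights_per_participant[0])
    rank_counts = [Counter() for _ in range(n_ranks)]
    for weights in weights_per_participant:
        # Average absolute weight of each feature within the participant
        avg_abs = {f: mean(abs(w) for w in ws) for f, ws in weights.items()}
        # Sort features by average absolute weight, descending
        ranked = sorted(avg_abs, key=avg_abs.get, reverse=True)
        for rank, feature in enumerate(ranked):
            rank_counts[rank][feature] += 1
    return rank_counts

# Made-up weights for two participants and three features
participants = [
    {"EEG_alpha_F3": [0.9, -1.1], "PPG_nPPI": [0.2, -0.4], "EDA_mean": [0.05, 0.15]},
    {"EEG_alpha_F3": [0.5, 0.7], "PPG_nPPI": [-0.8, -1.0], "EDA_mean": [0.1, 0.1]},
]
counts = cumulative_feature_count(participants)
for rank, c in enumerate(counts, start=1):
    print(rank, dict(c))
```

Grouping the feature names by their signal prefix would give the per-signal view of the same counts.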

Behavior Results
The ratios of self-reports of anxiety for VA and VN were 0.7505 and 0.1704, respectively, indicating that VA invoked anxiety significantly more than VN did (t(30) = 20.78, p < 0.0001). In addition, the average number of keyboard presses for anxiety timing for each VA was 0.99 ± 0.22. The expected number of anxiety expressions for each stimulus, fitted by a Poisson distribution, is summarized in Table 2. For example, one would expect to observe 1.103 keyboard presses for stimulus no. 1, as estimated from the button-press data of the 31 participants. These results confirmed that VA could sufficiently arouse anxiety in our experiment.
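As a small illustration of the Poisson fit, the maximum-likelihood estimate of the Poisson rate is simply the sample mean of the press counts, which is also the expected number of presses. The counts below are invented for eight hypothetical participants and one video; the variable and function names are assumptions.

```python
import math
from statistics import mean

def poisson_pmf(k, lam):
    """Probability of observing k events under a Poisson(lam) model."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Invented press counts for one video across eight participants
presses = [1, 1, 2, 0, 1, 1, 1, 2]

# Maximum-likelihood estimate of the rate = sample mean
lam = mean(presses)
print(f"expected presses: {lam:.3f}")
print(f"P(no press): {poisson_pmf(0, lam):.3f}")
```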

Decoding Results
Data from twenty-three participants were used for the decoding analysis. The numbers of anxiety and control trials used in the analysis were 24.91 ± 7.36 and 24.09 ± 6.65, respectively, out of a maximum of 30 trials each. A paired t-test showed no difference in the number of trials between anxiety and control (t(22) = 1.67, p = 0.11), supporting a chance level of decoding of 50%.
The LR classifier with LOTO validation using a feature set with 1 s EEG data showed higher accuracies than the other seven methods (Tables A2 and A3). Paired t-tests between decoding methods revealed that the other three classification methods using a feature set with 1 s EEG data showed lower average and maximum accuracy (ps < 0.05). We also found that the decoding accuracy of feature sets including EEG features with a 2 s window, using the LR classifier with either the LOTO or the 10-fold CV method, was not above chance level (ps > 0.5) (Table A4). The ANN classifier trained with the 10-fold CV method did not perform above chance level either (ps > 0.9). When using the ANN classifier with the LOTO method, however, feature set 2 (PPG only) showed results slightly above the chance level (average 0.53, t(22) = 1.90, p = 0.035) across subjects, while other feature sets did not (ps > 0.05). In sum, our analysis indicated that the LR classifier with the LOTO validation method produced the most accurate estimation.
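The LOTO procedure with an LR classifier can be sketched in plain Python as follows. This is a simplified stand-in, not the authors' implementation: the logistic regression is fitted by batch gradient descent with an arbitrary learning rate and number of epochs, the scaler is fitted on the training trials only (an assumption, since the text does not say where the scaling statistics came from), and the two-feature toy data are invented.

```python
import math
from statistics import mean, pstdev

def standardize(train, test_row):
    """Z-score features using training-set statistics only."""
    cols = list(zip(*train))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    scale = lambda row: [(v - m) / s for v, m, s in zip(row, mu, sd)]
    return [scale(r) for r in train], scale(test_row)

def fit_lr(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob - label
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def loto_accuracy(X, y):
    """Leave-one-trial-out: train on all trials but one, test on that one."""
    hits = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        Xs, xs = standardize(X_train, X[i])
        w, b = fit_lr(Xs, y_train)
        z = sum(wj * xj for wj, xj in zip(w, xs)) + b
        hits += int((z > 0) == bool(y[i]))
    return hits / len(X)

# Toy trials: first feature is informative, second is noise (invented data)
X = [[0.1, 5.0], [0.3, 4.0], [0.2, 6.0], [0.0, 5.5],
     [2.1, 5.2], [1.9, 4.4], [2.3, 5.8], [2.0, 4.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = control, 1 = anxiety
acc = loto_accuracy(X, y)
print(f"LOTO accuracy: {acc:.2f}")
```

Replacing the loop over single trials with ten disjoint folds would give the 10-fold CV variant.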
Decoding results showed that among the 15 possible combinations of multimodal biosignals, decoding EEG alone showed the highest accuracy (Table 3, the third column). In addition, we obtained accuracy above the chance level in most participants (i.e., 22 or 23) whenever the feature sets included EEG features (Table 3, the fifth column). When decoding all the features from every biosignal, the cumulative feature count analysis revealed that the EEG features dominated the top ranks, followed by PPG (Figure 7). The cumulative feature count results from other combinations of biosignals also indicated that the EEG features were mostly used for decoding (Figures A1 and A2). Although using the EEG features exhibited the highest performance on average, a subset of participants showed higher decoding accuracy when using other feature sets compared to using feature set 1, which contained EEG features only (Table 3, the rightmost column). Nine participants exhibited higher accuracy when using feature set 7, consisting of PS plus EEG, compared to using EEG only. However, only two of them presented above-chance-level accuracy using feature set 4, which contained PS only, indicating that PS could augment EEG to enhance classification accuracy but did not yield high accuracy alone. The same held for other feature sets such as feature set 6 (EEG + EDA) and 13 (EEG + EDA + PS), where adding other signals to EEG helped increase accuracy, but using those signals alone did not produce high accuracy.

Having observed that adding other signals to EEG improved decoding, we counted how many participants benefited from combining other biosignals with EEG in terms of decoding accuracy. In other words, we evaluated in how many participants any of the feature sets including EEG plus other signals (i.e., sets 5, 6, 7, 11, 12, 13 and 15) outperformed feature set 1 (i.e., EEG only). We found that 16 out of 23 participants exhibited higher accuracy when using multimodal features than when using EEG only. Figure 8 describes the best feature set for each participant and how much it improved decoding accuracy compared to the uni-modal feature set of EEG. The 7 participants (i.e., 2, 4, 5, 6, 10, 24, 27) who had the highest accuracy with the uni-modal EEG features, or the same accuracy with the uni-modal EEG features and the multimodal ones, were excluded from the visualization. In particular, feature set 7 (EEG + PS) and set 6 (EEG + EDA) most often improved accuracy over EEG alone.
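The 15 feature sets can be enumerated with a few lines of Python. The ordering below (single signals first, then pairs, triples and all four, in the order EEG, PPG, EDA, PS) is an assumption, although it reproduces the set numbers quoted in the text (e.g., set 2 = PPG, set 4 = PS, set 6 = EEG + EDA, set 7 = EEG + PS, set 13 = EEG + EDA + PS).

```python
from itertools import combinations

SIGNALS = ("EEG", "PPG", "EDA", "PS")
N_FEATURES = {"EEG": 20, "PPG": 12, "EDA": 5, "PS": 5}  # counts from the text

# Number the non-empty combinations: singles, pairs, triples, all four
feature_sets = {}
idx = 1
for k in range(1, len(SIGNALS) + 1):
    for combo in combinations(SIGNALS, k):
        feature_sets[idx] = combo
        idx += 1

# Multimodal sets that contain EEG plus at least one other signal
with_eeg_plus = [i for i, c in feature_sets.items() if "EEG" in c and len(c) > 1]

for i, combo in feature_sets.items():
    n = sum(N_FEATURES[s] for s in combo)
    print(f"set {i:2d}: {' + '.join(combo)} ({n} features)")
```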

Selected Features from EEG
Using LASSO, we selected 20-dimensional feature vectors from the 324-dimensional EEG feature vectors. The features most commonly selected across participants were alpha power at the F3 and Fz channels, followed by theta and gamma power at the Fz channel (Figure 9a). Likewise, across all training sets used to build the models of all participants (i.e., 17,775 sets), the most commonly selected features were alpha power at F3 (4,488 sets) and Fz (4,405 sets) and theta power at Fz (4,387 sets) (Figure 9b). Notably, gamma features were selected more often over fronto-central channels (e.g., FC1, FC2, FC6) than over frontal channels, whereas theta and alpha features were selected mainly over frontal channels.
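The selection-and-counting procedure behind Figure 9b can be illustrated as follows. The sketch is a toy under stated assumptions, not the pipeline used in the study: it fits LASSO by plain cyclic coordinate descent, keeps the k coefficients with the largest magnitude (the text does not state how exactly twenty features were obtained), and tallies how often each feature index is kept across training sets. The penalty value, the synthetic data and all function names are hypothetical.

```python
import random
from collections import Counter


def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold` (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + penalty*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * w[j]
            new_w = soft_threshold(rho, penalty * n) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_w
    return w


def select_top_k(weights, k):
    """Indices of the (at most) k non-zero coefficients with the largest magnitude."""
    ranked = sorted(range(len(weights)), key=lambda j: abs(weights[j]), reverse=True)
    return [j for j in ranked[:k] if weights[j] != 0.0]


def count_selections(training_sets, penalty, k):
    """Tally how often each feature index is selected across training sets."""
    counts = Counter()
    for X, y in training_sets:
        counts.update(select_top_k(lasso_coordinate_descent(X, y, penalty), k))
    return counts


def make_toy_set(rng, n_samples=40, n_features=8):
    """Synthetic data in which only features 0 and 1 carry signal."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.1) for row in X]
    return X, y


rng = random.Random(0)
toy_sets = [make_toy_set(rng) for _ in range(5)]
counts = count_selections(toy_sets, penalty=0.1, k=2)
print(counts.most_common())
```

In the study, the analogous tally would run over the 17,775 training sets with k = 20, with each feature index mapped back to a channel and frequency band.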

Discussion
This study investigated whether multimodal biosignals from wearable sensors could be used to detect anxiety invoked by driving situations, and which signal or combination of signals would yield the highest detection accuracy. We simultaneously measured four biosignals (EEG, PPG, EDA, and pupil size) and built a classifier to discriminate anxiety-invoked driving situations from normal ones using these biosignals. The results revealed that classification with EEG outperformed classification with the other signals in terms of average accuracy and cumulative feature counts. Specifically, the classifiers tended to rely on frontal theta, alpha and gamma power of EEG to detect anxiety-invoked situations. Adding other biosignals such as EDA or pupil size to EEG further enhanced detection performance in some participants.
The EEG features selected for anxiety detection might indicate the neural processes involved in dealing with anxiety events. Frontal-midline theta oscillations may directly represent the emotional process of anxiety. It is widely known that the anterior cingulate cortex (ACC) is involved in processing negative affect and generates theta oscillations at the frontal midline [25,26]. Another possible explanation is that frontal-midline theta oscillations are engaged in attention-demanding tasks [25–28]. For example, in a driving simulator, sudden increases in traffic on road lanes or at crossroads increased frontal-midline theta power, as the external situation required attention for action based on the new information [27]. The anxiety events used in our study likewise delivered new information requiring follow-up action (e.g., hitting the brake) in driving environments, thus inducing theta oscillations at the frontal midline. In addition, frontal gamma oscillations often appear along with frontal theta oscillations when a task requires attention [29]. However, it is difficult to find a proper explanation for the alpha oscillations at frontal channels.
Despite the dominant contribution of EEG features to brain-computer interface (BCI) performance, 16 of the 23 participants performed better when other biosignals (i.e., EDA, pupil size or both) were added to EEG. This suggests that simpler biosignals other than EEG may still be useful in future anxiety detection systems. Yet, it should also be highlighted that the best combination of biosignals varied across individuals, suggesting that a system to detect anxiety may need personal customization, particularly in a vehicle. We attempted to extract a common feature set from all the participants and to examine decoding performance with it, but the resulting performance was only close to chance level. This might be because the features varied across individuals, as expected. In addition, further work is required to explore why not all individuals showed improved accuracy with multimodal signals compared with EEG only. Nonetheless, our study highlights that EEG seems to be essential in the development of such a system.
Overall, the average accuracy achieved in this study is lower than that reported in other studies that detected drivers' states: 77% vs. 82%, 82.03%, 89.70%, 100%, and 77.95% [8–12]. However, those studies estimated driver states other than anxiety, such as stress or specific emotions (happy and angry), and discriminated these emotional states from a normal state. In contrast, our study estimated changes in anxiety induced by sudden events in driving situations.
The present study contributes to the extraction of feasible biosignals for anxiety detection while driving. Furthermore, the analysis of neural data indicated that attention for action and the processing of negative affect were involved in driving with anxiety events. Our findings can be applied to systems for monitoring drivers' emotional states in smart cars. This research suggests the following directions for future work. First, the target group could be broadened to include novices, who may feel anxiety more frequently, or elderly drivers, whose states change more slowly than those of other drivers. In addition, future work should focus on enhancing the decoding accuracy of anxiety detection by applying feature selection methods suggested in other emotion detection studies, such as hybrid techniques (e.g., clustering and principal component analysis (PCA)) [30,31].

Funding: This research was funded by the Electronics and Telecommunications Research Institute (ETRI), grant number 18ZS1300 (the development of smart context-awareness foundation technique for major industry acceleration) and by the Korean Government (MSIT), grant number 2017-0-00432 (development of non-invasive integrated BCI SW platform to control home appliances and external devices by user's thought via AR/VR interface).

Conflicts of Interest:
The authors declare no conflict of interest.