Temporal EEG Imaging for Drowsy Driving Prediction

Drowsy driving is a major cause of vehicle accidents, and its prevention has received increasing public attention. Precisely identifying the drowsy state of drivers is difficult because drowsiness is an ambiguous event that does not occur at a single point in time. In this paper, we use an electroencephalography (EEG) image-based method to estimate the drowsiness state of drivers. The driver's EEG measurement is transformed into an RGB image that contains the spatial information of the EEG. Moreover, to account for the temporal behavior of the data, we generate these images from the EEG data over a sequence of time points. The generated EEG images are passed into a convolutional neural network (CNN) to perform the prediction task. In the experiment, the proposed method is compared with an EEG image generated from a single time point, and the results indicate that combining EEG images from multiple time points improves the performance of drowsiness prediction.


Introduction
The prevention of drowsy driving has become a major challenge in driving safety. Many drivers have driven while drowsy, especially on long trips. Continuous, monotonous driving reduces the vigilance of drivers and increases the risk of traffic accidents. To address this problem, brain-computer interfaces (BCIs) that monitor the driver's cognitive state are urgently needed. Electroencephalography (EEG) is one of the most direct and effective physiological measures for the estimation of brain dynamics. Recent EEG studies have demonstrated that changes in alertness during driving are related to changes in global brain dynamics [1,2]. It has also been shown that EEG is a robust measurement for the estimation of a driver's cognitive state [3][4][5][6]. In addition, EEG can be measured conveniently in real time and is therefore widely used in real applications [3,7,8,9].
Although EEG has many advantages for the analysis of brain dynamics, the use of EEG-based BCIs in real applications remains challenging. The raw EEG signals acquired from the electrodes are often obscured by physiological artifacts such as eye movement and muscle movement, which are undesirable in a BCI system [10]. Therefore, removing these unwanted artifacts to capture brain activity has become a crucial issue in EEG-based BCI applications. Many studies have shown that independent component analysis (ICA) can effectively separate the artifacts from raw EEG data [11][12][13][14]. ICA decomposes the mixed signal into statistically independent components, and an artifact-free signal is obtained by excluding the components that are associated with artifacts. Although ICA is a powerful tool for extracting brain activity from raw EEG signals, it cannot support real-time applications because the separated artifact components need to be removed manually. This drawback limits the utility of ICA for real-world BCI applications. A fully automatic BCI is required for drowsy driving prediction since traffic accidents occur within a very short time. Therefore, this study does not apply any artifact removal process to the raw EEG data during the experiment, ensuring that the proposed method involves no manual processing in the drowsy driving prediction task.
For EEG signals without artifact removal, correctly extracting informative features becomes a major challenge in BCI applications. A popular approach for feature extraction is to transform the EEG signals into the frequency domain [15,16]. The fast Fourier transform (FFT) is applied to compute the power spectra of the multi-channel time-series EEG signals; then, the average power spectrum value for each frequency is collected to obtain a feature vector for classification [3,17]. The main disadvantage of such an approach is that it only considers the frequency information. EEG is measured over the scalp in a three-dimensional space, and the spatial arrangement of the electrodes cannot be well described by a 1D feature vector. Instead of the 1D feature vector, there is an increasing trend to use 2D feature maps for the analysis of EEG, which have achieved good performance in their application areas [18,19].
As the most popular machine learning technique in recent years, deep learning has achieved significant success in a variety of research fields, such as speech [20], images [21][22][23] and video [24]. The ability of deep learning techniques to learn unknown features from incoming data has gained considerable attention in EEG studies [25][26][27]. There is an increasing trend to use convolutional neural networks (CNNs) to analyze EEGs due to their state-of-the-art performance in the computer vision field. A popular approach is transforming the EEG measurement into a 2D feature map and then passing it into a CNN model for classification [28][29][30][31].
For drowsy driving prediction, it is difficult to identify the drowsy state using a single time point of EEG data because drowsiness is an ambiguous event. The driving performance may not immediately decrease with increasing drowsiness levels, which means that drivers may maintain normal driving performance even though their vigilance level has started to decrease. To overcome these difficulties, this study proposes a new EEG image method that combines multiple frames of EEG images to examine the temporal activity of EEG data. Such an approach not only focuses on the current EEG data but also considers the brain activity of the previous time period. The evaluation results show that the proposed method can improve the performance of EEG image-based BCI systems in drowsiness prediction.

Virtual Reality (VR)-Based Driving Environment
In our previous studies, to observe the subject's drowsy state during the driving task, a virtual reality (VR)-based realistic driving environment was developed to simulate a long-term driving situation [2,13,32,33,34,35]. As shown in Figure 1, the surrounding scenes were projected from six projectors to constitute a surround view. A night-time driving scene at a fixed velocity of 100 km/h on a four-lane highway was set up in the VR experiment. Before the experiment started, participants were directed to enter a real car mounted on a motion platform and then steer the vehicle according to the instructions. All participants were required to take a 5-min pre-test session to ensure that they clearly understood the instructions and did not suffer from simulator sickness. The highway scene was connected to a physiological measuring system, where the EEG and the participants' performance were continuously and simultaneously measured and recorded.

Driving Fatigue Paradigm
The event-related lane-keeping driving task was adopted in this study for the evaluation of the brain dynamics occurring during the driving task, as illustrated in Figure 2. The participants were instructed to perform a 90-min driving task without breaks or rest in the VR driving environment. The driving experiment began in the early afternoon (13:00-14:00) after lunch because people often feel sleepy during this time [36]. During the sustained attention driving task, the VR paradigm randomly simulated a lane-departure event that caused the car to drift away from the center of the cruising lane. The participants were required to quickly steer the car back whenever the car started to deviate from the original cruising lane. There was no feedback to wake the participants even if they did not respond to the lane-departure event. The car continued to move along the curb until the participants steered it back to the center of the cruising lane. Figure 2 describes a complete trial in the driving paradigm that includes the one-second baseline recording, deviation onset, response onset, and response offset. The time interval between the random lane-departure events was set to 5-10 s.

Participants
Thirty-eight right-handed, healthy young adults aged 20-30 years participated in the driving experiment. All subjects were required to have a driving license and sufficient sleep in the two preceding weeks. According to self-reporting, no subject had a history of psychological disorders. Before the experiment, the subjects were asked to answer a questionnaire about their sleep patterns to ensure that they had a normal cognitive state during the driving task, and they needed to complete a consent form explaining the experimental protocol that was approved by the Institutional Review Board of the Taipei Veterans General Hospital, Taiwan. The EEG signals of the subjects were captured from a Quik-Cap (Compumedical NeuroScan) with 32 Ag/AgCl electrodes, including 30 EEG electrodes and two reference electrodes. The EEG electrodes were placed in accordance with a modified international 10-20 system. The impedance of all electrodes was kept under 5 kΩ during the experiments.

Appl. Sci. 2019, 9, 5078

Drowsiness Measurement
The driving performance was defined based on the response time (RT), which represented the time between the deviation onset and the response onset. When a lane-departure event occurred, a participant in a drowsy state was expected to take a long time to steer the car back to the center of the cruising lane, so the RT of the trial could be very long. By contrast, an alert participant could respond to the lane-departure event in a short time. Previous studies have shown that baseline EEG activity is strongly correlated with changes in RT [34]. In this study, the 1-s baseline signal (red region shown in Figure 2) was used to perform the drowsiness prediction based on the trial's RT.

Approach
The general flowchart of our method is presented in Figure 3. Before data analysis, the acquired EEG records were processed using a 1-Hz high-pass and 50-Hz low-pass infinite impulse response filter to remove the noise and then down-sampled to 250-Hz to reduce the dimensions of the data. The power spectral activities of EEG signals were computed using FFT. To transform the EEG signal to a 2D image, we needed to address the following issues: (1) transforming the power spectrum of EEG signals to image values and (2) interpolating the points of the image data to a color image. The detailed approach is explained in the following sections.
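The filtering and down-sampling step can be sketched as follows. This is only an illustration under stated assumptions: the raw sampling rate (`FS_RAW = 500`), the Butterworth design and its order, and the zero-phase offline filtering are not given in the text and are chosen here for convenience; the 1-Hz and 50-Hz cut-offs and the 250-Hz target rate come from the description above.

```python
import numpy as np
from scipy import signal

FS_RAW = 500   # assumed raw sampling rate in Hz (not stated in the text)
FS_OUT = 250   # target rate given in the text

def preprocess(eeg, fs_raw=FS_RAW, fs_out=FS_OUT):
    """Band-limit a (channels, samples) array to 1-50 Hz, then down-sample."""
    # 4th-order Butterworth band-pass as one possible IIR design (assumption)
    sos = signal.butter(4, [1.0, 50.0], btype="bandpass", fs=fs_raw, output="sos")
    # forward-backward filtering gives zero phase but is offline only
    filtered = signal.sosfiltfilt(sos, eeg, axis=-1)
    # resample_poly applies its own anti-aliasing filter before decimating
    return signal.resample_poly(filtered, up=fs_out, down=fs_raw, axis=-1)

rng = np.random.default_rng(0)
raw = rng.standard_normal((30, 10 * FS_RAW))   # 30 channels, 10 s of synthetic data
clean = preprocess(raw)
print(clean.shape)   # (30, 2500)
```

A real-time system would need a causal filter (for example `signal.sosfilt`) instead of the forward-backward pass used in this sketch.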

Feature Extraction
To extract the physiological features, the 30-channel time-series EEG signal was transformed into the frequency domain via a 256-point FFT. Based on the findings in previous studies [14,37,38], the theta (4-8 Hz), alpha (8-13 Hz), and beta (13-20 Hz) frequency bands were suitable for estimating the driver's vigilance level. Our past studies also observed that increasing power in the theta and alpha bands was positively correlated with RT, and that the beta band was highly correlated with kinesthetic stimuli, which can affect the prediction performance [13,17]. The mean power of these frequency bands of interest was combined to form a feature vector. As depicted in Figure 4, this feature vector was treated as a pixel value of the RGB image. Each channel of the color image corresponds to a frequency band of interest.
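A minimal sketch of this band-power extraction is given below. The 256-point FFT, the three bands, and the 30 channels follow the text; averaging the power in dB, the half-open band edges, and the theta/alpha/beta column order (which would map to R/G/B) are assumptions made for illustration, and `band_features` is a hypothetical helper name.

```python
import numpy as np

FS = 250          # sampling rate after down-sampling (Hz)
N_FFT = 256       # FFT length given in the text
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 20)}  # Hz

def band_features(baseline):
    """Map a (channels, samples) baseline segment to (channels, 3) mean band power in dB."""
    spectrum = np.fft.rfft(baseline, n=N_FFT, axis=-1)      # zero-pads 250 -> 256 samples
    power_db = 10.0 * np.log10(np.abs(spectrum) ** 2 + 1e-12)
    freqs = np.fft.rfftfreq(N_FFT, d=1.0 / FS)
    cols = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)                 # half-open band edges (assumption)
        cols.append(power_db[:, mask].mean(axis=1))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
baseline = rng.standard_normal((30, FS))    # synthetic 1-s baseline, 30 channels
features = band_features(baseline)
print(features.shape)   # (30, 3)
```

Each row of `features` is the three-element vector for one electrode, which becomes one scattered color sample in the later interpolation step.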

Interpolation of the EEG Measurement to Image Pixels
As described in the previous section, we obtained 30 data points corresponding to the locations of the EEG electrodes. First, we converted the magnitude of the power spectrum of the EEG signals into an image pixel value. Equation (1) shows the sigmoid function utilized to normalize the value of the EEG power spectrum to [0,1]:

$$ P_t = \frac{1}{1 + e^{-t}} \qquad (1) $$

where $P_t$ is the normalized image pixel value, and $t$ is the magnitude of the frequency response in dB. Next, we needed to interpolate the scattered image data points to a color image. Figure 5 illustrates the interpolation scheme of the EEG image. The finite element method, a numerical technique that is usually applied for the approximate solution of engineering problems that are difficult to solve analytically, was adopted to perform the interpolation task. The Clough-Tocher scheme was used to interpolate a 32 × 32 mesh from the 30 image data points [39]. In this study, the EEG electrodes were placed in accordance with a modified international 10-20 system, which means that the location corresponding to each image point is known. Three topographical maps corresponding to the three frequency bands of interest were acquired by the Clough-Tocher scheme. The three spatial maps were then merged to create a 32 × 32 color image. Figure 6 demonstrates several samples of the EEG image.
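The normalization and interpolation can be sketched with SciPy's `CloughTocher2DInterpolator`. This is an illustration rather than the authors' implementation: the electrode coordinates below are random points in the unit disc standing in for the projected 10-20 positions, and filling pixels outside the electrodes' convex hull with zero and clipping the result are assumptions.

```python
import numpy as np
from scipy.interpolate import CloughTocher2DInterpolator

def sigmoid(t):
    """Logistic function, mapping dB power to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

def eeg_to_image(features_db, xy, size=32):
    """Interpolate (channels, 3) band features at 2D positions xy onto a size x size RGB image."""
    pixels = sigmoid(features_db)
    # fill_value=0.0 paints grid points outside the electrodes' convex hull black (assumption)
    interp = CloughTocher2DInterpolator(xy, pixels, fill_value=0.0)
    axis = np.linspace(-1.0, 1.0, size)
    gx, gy = np.meshgrid(axis, axis)
    image = interp(gx, gy)
    # the cubic interpolant may overshoot slightly, so clip back to a valid pixel range
    return np.clip(image, 0.0, 1.0)

rng = np.random.default_rng(0)
# hypothetical electrode layout: 30 points scattered inside the unit disc
radius = np.sqrt(rng.uniform(0.0, 1.0, 30))
angle = rng.uniform(0.0, 2.0 * np.pi, 30)
xy = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
features_db = rng.normal(0.0, 3.0, (30, 3))   # synthetic theta/alpha/beta power in dB
image = eeg_to_image(features_db, xy)
print(image.shape)   # (32, 32, 3)
```

Because the interpolator accepts vector-valued samples, the three band maps are produced in one call and are already stacked as the channels of the color image.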

Temporal EEG Image
One of the challenges in drowsy driving prediction is that some drowsy trials may have patterns similar to those of alert trials. The driving performance might not degrade immediately, even if the alertness level of the drivers begins falling, which means that drivers can respond well to lane-departure events before they fall asleep (although the drowsy pattern of the EEG has already appeared). In that case, the EEG images generated from the drowsy trial and from its preceding alert trials can be very similar. When the drivers wake up by themselves, their vigilance level recovers dramatically and the RT returns to the alert state; the EEG pattern then becomes completely different from that of the drowsy trials. Based on these findings, the drowsy state should be estimated not only using the current trial but also by examining the previous trials. This study proposes a temporal EEG image that is generated by a linear combination of a sequence of EEG images, as shown in Equation (2):

$$ I'_N = \sum_{t=1}^{N} c_t I_t \qquad (2) $$

where $I'_N$ is the generated temporal EEG image, $I_t$ is the array of EEG image data for trial $t$, and $c_t$ is a scalar coefficient with $c_i < c_j$ when $i < j$, so that more recent trials receive larger weights. $N$ is set to five in this study. Figure 7 illustrates the schematic diagram of the temporal EEG image. Instead of a single-frame EEG image $I_N$, this approach estimates the drowsiness state using $I'_N$, which includes the information of the brain activity from multiple time points.
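The linear combination can be sketched in a few lines of NumPy. The text only requires increasing coefficients and N = 5; the concrete weights used here (1 to 5, normalized to sum to one so that pixel values stay within [0, 1]) are a hypothetical choice, as is the helper name `temporal_image`.

```python
import numpy as np

def temporal_image(frames, coeffs):
    """Weighted sum of N frames shaped (N, H, W, 3); frames[-1] is the current trial."""
    coeffs = np.asarray(coeffs, dtype=float)
    if np.any(np.diff(coeffs) <= 0):
        raise ValueError("coefficients must increase strictly so recent trials weigh more")
    # contract the coefficient vector with the leading (time) axis of the frames
    return np.tensordot(coeffs, frames, axes=1)

N = 5
# hypothetical weights 1..5 normalized to sum to 1, which keeps pixels inside [0, 1]
coeffs = np.arange(1, N + 1) / np.arange(1, N + 1).sum()
rng = np.random.default_rng(0)
frames = rng.uniform(0.0, 1.0, (N, 32, 32, 3))   # synthetic single-frame EEG images
combined = temporal_image(frames, coeffs)
print(combined.shape)   # (32, 32, 3)
```

The output has the same shape as a single-frame image, so the same network input layer can consume either representation.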

Figure 7. The schematic diagram of the temporal EEG image. $I_N$ is the single-frame EEG image generated from the current trial. The temporal EEG image $I'_N$ is acquired by linearly combining $I_N$ and its previous trials.

Classification Using the CNN Model
This study applies a CNN including six convolution layers, three max-pooling layers, and a fully-connected layer to the classification of the input EEG image, as shown in Figure 8. A popular open-source deep learning framework named Caffe is employed for implementing the CNN model [40]. The parameter setting of the overall CNN architecture is presented in Table 1. A set of filters is used to convolve the input EEG images for feature extraction. The convolved images are then subsampled by the max-pooling layer to derive compact features. The convolution and pooling processes are repeated several times through the CNN layers. The lower-level features of the input data are extracted via the early layers, and those features are combined in the later layers to hierarchically learn the higher-level features. Finally, the acquired high-level features are concatenated and passed into the fully-connected layer for classification. The final prediction result is determined according to the output of the fully-connected layer. We only use the alert and drowsy classes in this study, so the output size of the fully-connected layer is 2 × 1.

Experiment
The EEG dataset used in this study includes 10,395 alert trials and 3080 drowsy trials collected from 38 subjects. Previous research suggests that drivers who are fully aware of the driving situation take approximately 0.7 s on average to respond to the lane-departure event [41]. According to previous studies [42][43][44], drivers perform poorly when they do not respond to the lane-departure event within three times the mean alert RT. Therefore, this study adopts three times the average response time as the classification boundary of the drowsiness prediction task. EEG trials are considered alert trials if their RT is less than 2.1 s. EEG trials with an RT larger than 2.5 s are labelled as drowsy trials, and EEG trials with an RT between 2.1 s and 2.5 s are not used in this experiment. The evaluation of our approach is performed using leave-one-subject-out cross-validation. We select the data from one subject for testing and the data from the remaining subjects for training. This process is repeated for each of the 38 subjects. To evaluate the predictive performance of the proposed method, the temporal EEG image method is compared with the single-frame EEG image method, which directly uses the EEG image of the current trial for drowsiness prediction. Table 2 shows the comparison between the temporal EEG image method and the single-frame EEG image method. The average accuracy is the mean of the per-class accuracies. The temporal EEG image method outperforms the single-frame EEG image method. Both methods give a similar accuracy for the alert class, but our approach achieves a significant improvement in the accuracy of the drowsy class. For the purpose of drowsiness prediction, the prediction rate of the drowsy class is more important than that of the alert class.
Furthermore, the results also demonstrate that our approach has better prediction performance than the single-frame EEG image method in most subjects, which suggests that the improvement generalizes across users. Table 3 shows the evaluation results of EEGnet and the hierarchical convolutional neural network (HCNN), which are CNN-based approaches for EEG analysis that achieve good performance in their applications [45]. The results demonstrate that the proposed method yields prediction performance superior to that of EEGnet and HCNN.
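The labelling rule, the leave-one-subject-out split, and the average per-class accuracy described in this section can be sketched as follows. The thresholds come from the text; the function names, the label encoding (0 for alert, 1 for drowsy, -1 for discarded), and the toy inputs are hypothetical, and the accuracy helper assumes that both classes appear in the test set.

```python
import numpy as np

ALERT_MAX_RT = 2.1    # s, three times the 0.7 s mean alert RT
DROWSY_MIN_RT = 2.5   # s

def label_trials(rt):
    """Return 0 (alert), 1 (drowsy) or -1 (discarded) for each response time."""
    rt = np.asarray(rt, dtype=float)
    labels = np.full(rt.shape, -1, dtype=int)
    labels[rt < ALERT_MAX_RT] = 0
    labels[rt > DROWSY_MIN_RT] = 1
    return labels

def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) once per subject."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test = subject_ids == subject
        yield subject, np.flatnonzero(~test), np.flatnonzero(test)

def average_class_accuracy(y_true, y_pred):
    """Mean of the alert and drowsy accuracies, so the minority class counts equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in (0, 1)]
    return float(np.mean(per_class))

labels = label_trials([0.6, 1.9, 2.3, 2.8, 4.0])
print(labels.tolist())                                     # [0, 0, -1, 1, 1]
print(average_class_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))  # 0.5
```

The second call shows why the per-class average is informative on imbalanced data: a classifier that always predicts alert is right on 75% of these toy trials overall, yet its average class accuracy is only 0.5.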

Discussion and Conclusions
It is challenging to classify EEG data without an artifact removal process because drivers' brain activity can change over time due to many factors, such as their mental state and body movement, which result in temporal fluctuations of the EEG signals. However, the signals still contain important information associated with the drowsiness level of drivers, and thus, the temporal analysis of the EEG signals becomes a crucial issue. This study proposes a temporal EEG image algorithm that combines a sequence of EEG images to form a new EEG image that contains brain dynamics from multiple time points. Our experimental results show that the proposed method achieves good performance in drowsiness prediction. Support vector machines (SVMs) are also employed for comparison with our approach because they are popular classifiers for EEG analysis. In our experiment, using the EEG image as the input of the SVM is computationally expensive and yields a poor prediction result. Thus, the power spectrum of the EEG is selected as the input of the SVM. As in the experiment described in the previous sections, no artifact removal process is applied to the input data. The experimental results indicate that the SVM is biased towards the alert class. That is, it always predicts an alert regardless of the input, which results in a perfect detection of alert trials but no detection of drowsy trials. We found that the SVM only provides meaningful prediction results with a balanced training dataset that has a similar number of alert trials and drowsy trials. In real-world applications, BCI systems usually have to perform drowsiness prediction on imbalanced datasets, which means that the SVM cannot provide reasonable reliability in real-world BCI applications.
To find a suitable CNN model for the drowsiness prediction task, this study introduces two CNN architectures for further evaluation: (1) AlexNet, a very popular CNN architecture that is larger than the CNN architecture used in this study [18], and (2) a 3D CNN, which performs 3D convolutions and is therefore capable of learning features from both the spatial and temporal dimensions. For AlexNet, the EEG measurement is transformed into a 227 × 227 image to fit the input size of AlexNet. For the 3D CNN, unlike our approach, which uses a linear combination of a sequence of EEG images, the sequence of EEG images is fed directly into the model as 3D input data. Our results show that neither AlexNet nor the 3D CNN achieves better performance than the CNN architecture proposed in the present study. That is, a high-complexity CNN is not required for the drowsiness prediction task.
The detection of drowsy trials remains a challenge because different subjects have different drowsy EEG patterns. A further investigation of brain dynamics is the key to improving the prediction performance of the BCI system.