Developing a Deep Neural Network for Driver Fatigue Detection Using EEG Signals Based on Compressed Sensing

Abstract: In recent years, driver fatigue has become one of the main causes of road accidents. As a result, fatigue detection systems have been developed to warn drivers, and, among the available methods, EEG signal analysis is recognized as the most reliable for detecting driver fatigue. This study presents an automated system for the two-stage classification of driver fatigue, based on EEG signals, that combines compressed sensing (CS) theory and deep neural networks (DNNs). First, CS theory is used to compress the recorded EEG data in order to reduce the computational load. The compressed EEG data is then fed into the proposed deep convolutional neural network for automatic feature extraction/selection and classification. The proposed network architecture includes seven convolutional layers together with three long short-term memory (LSTM) layers. For compression rates of 40, 50, 60, 70, 80, and 90, the simulation results for a single-channel recording show accuracies of 95, 94.8, 94.6, 94.4, 94.4, and 92%, respectively. Furthermore, comparisons with previous methods show that the proposed method improves the accuracy of the two-stage classification of driver fatigue and can be used to detect driver fatigue effectively.


Introduction
With the advancement of industrial technology in recent years, car production has increased dramatically, resulting in an increase in traffic accidents. According to the World Health Organization (WHO), 1.25 million people die in road accidents every year [1]. Among the factors contributing to car accidents, driver fatigue can be considered the main cause of road fatalities. According to the National Highway Traffic Safety Administration (NHTSA), 100,000 driver-fatigue-related accidents cause 1550 deaths, 71,000 injuries, and USD 12.5 billion in monetary losses annually in the United States [2]. Therefore, it is important to develop a method that can detect levels of mental fatigue accurately and automatically in order to prevent catastrophic driving events [3].
Fatigue may be caused by a lack of sleep, prolonged driving, driving overnight, driving on a monotonous route, and so on. Fatigue, as a general term, involves drowsiness: drowsiness is defined as the need for sleep, while fatigue requires rest (not necessarily sleep) [3][4][5]. Yawning, impatience, daydreaming, heavy eyes, etc., are the initial signs of fatigue [5][6][7]. Driver fatigue slows down (prolongs) the reaction time, reduces the driver's alertness, and affects the driver's decisions. Fatigue on the road can be suppressed by listening to music, drinking coffee or energy drinks, and so on [7][8][9]. However, it is necessary to design an automatic system that detects the driver's mental fatigue with high reliability and can warn the driver before potential accidents. In recent years, many studies have been conducted on the automatic detection of driver fatigue on the basis of EEG signals. One group of researchers performed their experiments on 28 subjects to classify two states of driver fatigue, using the EEGLAB toolbox to eliminate environmental and motion noise; the accuracy of their classification was reported to be approximately 98%. Luo et al. [26] used a combination of an adaptive scaling factor and entropy features to automatically detect driver fatigue. They performed their experiments on 40 subjects using two EEG channels (Fp1 and Fp2) for a two-stage classification of driver fatigue, with an EEG toolbox used to remove EOG noise; the classification accuracy of their algorithm is reported to be about 98%. Gao et al. [27] used deep neural networks (DNNs) to detect driver fatigue on the basis of EEG signals, performing their experiments on 10 subjects with a deep network architecture consisting of 11 convolutional layers; the reported classification accuracy is about 95%. Karuppusamy et al. [28] used a combination of EEG signals, facial expressions, and gyroscopic data, together with DNNs, to diagnose driver fatigue; the final accuracy obtained by their proposed model is approximately 93%. Jiao et al. [29] used EEG and EOG signals to automatically detect driver fatigue, applying the continuous wavelet transform (CWT) for frequency-band analysis and then selecting discriminative features in their proposed model; they also used generative adversarial networks (GANs) to balance the class samples, and the final accuracy reported with a long short-term memory (LSTM) classifier is approximately 98%. Liu et al. [30] used deep transfer learning networks to automatically detect driver fatigue on the basis of EEG signals, with the random forest (RF) algorithm used to identify active channels; the highest reported accuracy is approximately 73%.
A comprehensive review of previous studies on the automatic detection of driver fatigue indicates that, while several studies have been conducted in this regard, some issues remain to be considered: (1) the majority of these studies extract features manually, and few have used feature learning methods. Manual methods necessitate complex processes, as well as specialized knowledge; furthermore, manual feature extraction does not guarantee that the chosen features are optimal. (2) In most feature-learning-based driver-fatigue-detection methods, a large amount of raw time-series data from multiple electrodes at high sampling rates is used directly as the input of the feature learning algorithms, imposing a significant burden on the acquisition hardware, data storage, and transmission bandwidth. As a result, it is critical to fundamentally alter the data processing mode of existing real-time monitoring systems by employing a novel data acquisition method and compression theory.
The current study aims to overcome the abovementioned challenges, especially the problem of a large amount of raw time signals. To the best of the authors' knowledge, this research study presents, for the first time, a novel method for the automatic detection of driver fatigue using a combination of compressed sensing (CS) and DNNs. Recently, CS has attracted the attention of many researchers in this field of research, and has shown great success in various fields, such as magnetic resonance imaging (MRI) [31], radar imaging [32], and seismic imaging [33]. DNNs are also a group of machine learning methods that can learn features hierarchically, from lower levels to higher levels, by building a deep architecture. Deep learning has achieved state-of-the-art performance in several application domains, such as signal processing [34][35][36], network security [37], the Internet of Things (IoT) [38], and so on.
In the proposed method, a deep convolutional long short-term memory (DCLSTM) neural network is designed to learn the optimal features from compressed data. In this study, compressed EEG signal data is used to detect driver fatigue for the first time. Indeed, the most significant contribution of this study can be attributed to the reduction in the amount of data collected in order to obtain the optimal features, while retaining useful information in the compressed data. In addition, by using compressed data as input, the computational burden of the feature learning process is significantly reduced. The simulation results of the proposed method for a single-channel recording show an accuracy of 92% for a compression rate of 90, which is a significant accuracy. For more clarity, the following outlines the contributions made by this article:
a. It provides a new and fully automated method for selecting and extracting discriminative features from the EEG signal to identify two stages of driver fatigue, with a high performance based on DNNs, with a proposed deep architecture, that does not require prior knowledge of, or expertise on, each case/subject;
b. For the first time, the paper presents the application of CS theory in combination with DL to reduce EEG signal samples without the loss of essential signal information, related to automatic driver-fatigue detection, for use in real-time systems;
c. The article illustrates the use of a minimum number of EEG signal channels to detect driver fatigue automatically, with the precondition of high classification accuracy and low detection errors;
d. The selection of the parameters of the proposed method, and the effect of the key parameters on the deep network architecture, were thoroughly investigated in order to automatically detect driver fatigue. Furthermore, comparisons with traditional methods show the superiority of our proposed method. Because of the use of CS in the proposed algorithm, this method is suitable for extensive data processing and real-time processing, and it also provides a new idea for smart driver-fatigue detection;
e. In this study, the environmental noise while driving was considered for the first time among the previous research related to driver-fatigue detection. The results show that the proposed network is robust to noise up to 1 dB.
The rest of the paper is organized as follows: In Section 2, the mathematical background of CS theory and DNNs is presented; in Section 3, the acquisition of the EEG data, the proposed method, the parameter selection, and the network architecture are examined. Section 4 discusses the simulation results and makes comparisons to previous research, and, finally, Section 5 consists of the conclusion.

Background
In this section, a brief mathematical background on CS theory and DNNs for the automatic detection of driver fatigue is presented.

Compressed Sensing Theory
This section provides a brief description of CS theory [39]. Theoretically, CS adopts a quasi-optimal measurement scheme that gathers all the raw signal information and realizes an efficient, dimensionally reduced representation of the signal. Using CS theory, a sparse signal can be reconstructed from far fewer samples than the Shannon-Nyquist sampling theorem [40] requires. It has been shown that many nonsparse signals (such as EEG signals) can be converted to sparse signals through various signal transformations, such as the discrete cosine transform (DCT) and the discrete wavelet transform (DWT). In view of the above, CS theory can also be applied to nonsparse signals, provided that the sparsity of the transformed signal is guaranteed. Considering X ∈ R^N and Φ ∈ R^(M×N) as the input signal and the measurement matrix, respectively, the compressed output, Y ∈ R^M, is obtained as:

Y = ΦX

A measurement matrix that satisfies the restricted isometry property (RIP) allows M ≪ N, which means an output observation vector of much lower dimensionality than that of the input signal. The RIP implies that, for any strictly sparse vector, X, the measurement matrix must satisfy:

(1 − δ)‖X‖₂² ≤ ‖ΦX‖₂² ≤ (1 + δ)‖X‖₂²

where δ is the RIP constant, with a value between 0 and 1. The compression ratio (CR) is an indicator of the extent to which the signal is compressed, and is defined as:

CR = (1 − M/N) × 100

Thus, having M ≪ N results in a high CR, and various levels of compressed acquisition are achieved by adjusting the size of the measurement matrix (provided that the RIP condition is met). Finally, an exact reconstruction of X can be performed by various algorithms, such as L1-norm minimization [41] and orthogonal matching pursuit (OMP) [42].
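As an illustration, the measurement step Y = ΦX and the resulting dimensionality reduction can be sketched in a few lines (a minimal sketch using a random Gaussian measurement matrix; the test signal, seed, and segment length are illustrative assumptions, not the paper's data):

```python
import numpy as np

def compress(x, cr, seed=0):
    """Compress a length-N signal with a random Gaussian measurement
    matrix Phi of size M x N, where M = round(N * (1 - CR/100))."""
    n = len(x)
    m = round(n * (1 - cr / 100))
    rng = np.random.default_rng(seed)
    # i.i.d. Gaussian entries satisfy the RIP with high probability
    phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
    return phi @ x, phi

# Example: a 5000-sample segment (5 s at 1000 Hz) compressed at CR = 80
x = np.sin(2 * np.pi * 10 * np.arange(5000) / 1000.0)
y, phi = compress(x, cr=80)
print(y.shape)   # (1000,)
```

Note that the network in this paper consumes Y directly; a reconstruction algorithm such as OMP would only be needed to recover X itself.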

Deep Convolutional Neural Networks
CNNs are considered powerful deep learning techniques for learning sophisticated phenomena. They are very efficient at learning features and are used in a variety of applications, such as computer vision. CNNs consist of three main layers: convolutional layers, pooling layers, and fully connected (FC) layers [43,44]. In the convolutional layers, the network uses various kernels to convolve the input signal, as well as the intermediate feature maps. These layers reduce the number of parameters and provide invariance and stability with respect to displacement. The pooling layer is typically used after the convolutional layer to reduce the size of the feature maps; the most common approach is to implement a pooling layer with the max-pooling or average-pooling function. The fully connected layer comes after the last pooling layer, and it converts the two-dimensional feature map into a one-dimensional feature vector. To prevent overfitting, a dropout layer is used, whereby, according to a probability, each neuron is dropped from the network at each stage of training [45]. The batch normalization (BN) layer is typically used to normalize the data within the network and to accelerate the network training process [46]. Various types of activation functions are used in the layers, such as the Leaky Rectified Linear Unit (Leaky ReLU) and Softmax [47,48]. In the prediction stage, a loss function is used to measure the error; an optimization algorithm is then applied to reduce this error criterion, and the optimization results are used to update the network parameters. The loss function is a means of evaluating and representing model efficiency in machine learning approaches [49,50].
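For concreteness, the size bookkeeping behind convolutional and pooling layers follows a single formula, out = ⌊(n + 2p − k)/s⌋ + 1. The sketch below (with hypothetical kernel and stride values, not those of the proposed architecture) shows how each layer shrinks a 1-D feature map:

```python
from math import floor

def out_len(n, k, s=1, p=0):
    """Output length of a 1-D convolution or pooling layer:
    floor((n + 2p - k) / s) + 1, for input length n, kernel k,
    stride s, and padding p."""
    return floor((n + 2 * p - k) / s) + 1

# Hypothetical example: a 1000-sample input through one convolution
# (kernel 7, stride 1) and one max-pooling stage (kernel 2, stride 2)
n = out_len(1000, k=7)        # 994 after convolution
n = out_len(n, k=2, s=2)      # 497 after pooling
print(n)
```

Stacking several such conv/pool pairs is what progressively reduces the feature-map length before the recurrent layers.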

Long Short-Term Memory (LSTM)
Recurrent neural networks (RNNs) are powerful networks that are widely used for learning from sequential data, such as text, audio, and video. RNNs can reduce the computational load of an algorithm by reducing the input data dimension and by facilitating the training process. However, as the gap between previous input information and the point where it is needed grows, these networks struggle to learn the relevant features and perform poorly [51]. As a result, LSTM networks have been introduced to address the shortcomings of traditional RNNs. Since prior information can affect model accuracy, the use of LSTM has become a popular option for researchers. Unlike traditional RNNs, where the content is rewritten at every step, an LSTM network is able to decide whether to retain the current memory through its memory gates. Intuitively, if an LSTM detects an important feature in the input sequence in the initial steps, it can easily carry this information over a long period of time, thereby capturing and maintaining long-term dependencies. Moreover, because of their memory-based design, these networks avoid the vanishing gradients and the instability that plague traditional RNNs [52,53].
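The gating mechanism can be made concrete with a toy scalar LSTM cell (the hand-set weights below are hypothetical, chosen only to demonstrate memory retention; real cells use learned weight matrices over vectors):

```python
from math import exp, tanh

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def lstm_step(x, h, c, w):
    """One step of a scalar LSTM cell: the forget gate f decides how
    much of the old memory c to keep, the input gate i how much of the
    candidate g to write, and the output gate o what to expose as h."""
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])
    g = tanh(w["wg"] * x + w["ug"] * h + w["bg"])
    c_new = f * c + i * g          # gated memory update
    h_new = o * tanh(c_new)        # exposed hidden state
    return h_new, c_new

# With a strongly positive forget-gate bias, the cell preserves its
# memory over many steps even for zero input (a long-term dependency)
w = dict(wf=0, uf=0, bf=10, wi=0, ui=0, bi=-10,
         wo=0, uo=0, bo=10, wg=0, ug=0, bg=0)
h, c = 0.0, 1.0
for _ in range(100):
    h, c = lstm_step(0.0, h, c, w)
print(round(c, 3))   # the stored memory is still close to 1
```

This additive, gated update of c is exactly what sidesteps the multiplicative shrinking of gradients in a plain RNN.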

Proposed Method
In this section, the proposed fatigue detection algorithm is provided, which is based on the CS theory and the DCLSTM network for the automatic classification of two stages of driver fatigue (a block diagram of the proposed method is shown in Figure 1). First, the acquisition of the EEG signals for the diagnosis of two-stage driver fatigue is described. Then, the preprocessing techniques performed on the recorded EEG signal are described, followed by the related details of the signal compression. Finally, the proposed DCLSTM network architecture is presented.



Acquisition of EEG Signals
Eleven graduate students (six men and five women), between 22 and 30 years of age, took part in a driving simulation experiment. It was ensured that all participants had a driver's license and that, up until the experiment, none of them had experienced driving in a driving simulator. All of the participants in the experiment were also right-handed. This experiment was carried out under ethics code license number IR.TBZ-REC.1399.6, in the signal processing laboratory of the Biomedical Engineering Department of the Faculty of Electrical and Computer Engineering, at the University of Tabriz. Before the experiment, all participants were asked to confirm and sign the voluntary attendance consent form and the test requirements (no history of psychiatric disorders, no history of epilepsy, pretest hair washing, enough sleep overnight, and no pretest caffeine). The experiment was conducted using a G-Tec 32-channel EEG recorder, an MSI laptop (Core i7 CPU, 16 GB of RAM), a Logitech G29 driving simulator, the City Car Driving simulator software, and a Samsung 40-inch LCD. Figure 2 shows the recording of the EEG signal of a subject while driving in the simulator. The EEG signal recording was carried out following the international standard 10-20 electrode placement system, at a sampling frequency of 1000 Hz, with the A1 and A2 channels as reference electrodes. Before the experiment, all participants practiced driving with the simulator to become acquainted with it and with the purpose of the experiment. In order to induce mental fatigue in the drivers, the driving route in the simulator was a uniform highway without traffic. After the first 20 min of driving, the last 3 min of the EEG recording were labeled as the "normal" stage.
The ongoing driving process lasted until the participant's questionnaire results (the multidimensional fatigue inventory scale [54]) showed that the subject was at the "fatigue" stage (varying from 40 to 100 min, depending on the participant), and the last 3 min of the EEG recordings were marked as the "fatigue" stage. The drivers were required to report their levels of fatigue using the Chalder Fatigue and Lee Fatigue questionnaires to confirm their fatigue [55,56]. The questionnaires included the following questions: "Do you feel tired?"; "Do you have a blurred vision problem?"; "Do you feel like you are constantly increasing your speed?"; "Do you feel out of focus?", etc. Each question in the questionnaire had four scores, from −1 to 2: a score of −1 means "better than usual", a score of 0 means "normal", a score of 1 means "tired", and a score of 2 means "very tired". A high fatigue score indicates a high level of driver fatigue, which has been used in many recent studies to confirm fatigue [23,24,27,30]. The driving task started at 9 a.m., and only one EEG signal per day was recorded in order to ensure the same recording conditions for all of the participants.


Preprocessing
In order to remove unwanted artifacts from the recorded EEG signals, a notch filter was first applied to remove the 50-Hz power supply frequency. Second, a first-order Butterworth filter, with a passband of 0.5 to 60 Hz, was applied to the data to capture the information useful for detecting driver fatigue [57]. Third, to improve the detection efficiency, the data for each participant were normalized by scaling between 0 and 1 using min-max normalization. Fourth, since one of the objectives of this study is to use minimal EEG signal channels, it is necessary to identify the active electrodes to reduce the computational complexity. In accordance with [24,25,58,59], 12 of the 30 electrodes used for signal recording were identified, in the form of six active regions, on the basis of the electrode weights. Accordingly, only data from the 12 selected channels were used for the compression and data processing, and the rest of the channels were excluded. The selected electrodes, in the form of six regions, A, B, C, D, E, and F, are shown in 2D and 3D in Figure 3.


Signal Compression Based on CS Theory
This section provides a detailed description of how the signal is compressed on the basis of CS theory. After preprocessing, the 12 selected electrodes enter the compression process. In the following, the segmentation and compression of the signal, according to the block diagram of Figure 4, is described. As mentioned previously, 3 min of the recorded EEG channel is assigned to either the "normal" or the "fatigue" stage. In this case, we have two classes of data (normal and fatigue), with 180,000 samples for each channel. Then, every 3-min recording is divided into 150 five-second segments, with an overlap of 1200 samples. Since we have 11 subjects, the dimension of the input matrix, X, for each class (normal and fatigue), corresponding to each region (A to F), will be equal to N × S, where N = 11 × 150, and S is the raw signal dimension, equal to n × w (n is the number of electrodes, and w is the length of each 5-s segment, equal to 5000). As stated in Section 2, in accordance with CS theory, to guarantee the RIP condition, the input matrix, X, is multiplied by a random Gaussian matrix, Φ, with dimension M × N, in order to produce the compressed signal, Y. Considering the number of rows of the raw signal, X, as N = 11 × 150 = 1650, the number of rows of the compressed signal, Y, is reduced to M = N × (1 − CR/100). In the next step, Y enters the proposed DCLSTM network for the selection/extraction of the discriminative features to perform the automatic classification.
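The arithmetic of this compression step can be checked directly; a small sketch of M = N × (1 − CR/100) with the values stated above:

```python
def compressed_rows(cr, n_subjects=11, segs_per_subject=150):
    """Rows of the compressed matrix Y: M = N * (1 - CR/100),
    where N = 11 subjects x 150 five-second segments = 1650."""
    n = n_subjects * segs_per_subject
    return round(n * (1 - cr / 100))

for cr in (40, 50, 60, 70, 80, 90):
    print(cr, compressed_rows(cr))
# CR = 90 keeps only 10% of the rows: M = 165
```

At CR = 40 the network therefore sees 990 compressed rows per class and region instead of the original 1650.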

The Proposed Deep Neural Network Architecture
In this section, the architecture of the proposed DCLSTM network will be introduced, which is depicted in Figure 5. To recognize the "fatigue" and "normal" stages, the proposed network consists of seven convolutional layers, three LSTM layers, and one Softmax layer (without using the fully connected layer), as follows: (a) A dropout layer; (b) A convolution layer with a nonlinear Leaky ReLU function and a max-pooling layer, followed by a dropout layer and a batch normalization; (c) The architecture of the previous step is repeated six times without a dropout layer; (d) The previous architecture's output is connected to an LSTM layer with a nonlinear Leaky ReLU function, with a dropout layer and a batch normalization; (e) The architecture of the previous step is repeated two times; (f) The Softmax layer is used to access the outputs and calculate the scores. Table 1 displays the specifics of the proposed DCLSTM architecture with CS (DCLSTM-CS), such as the sizes of the filters, the layer types, and the number of filters. As shown in Table 1, the dimensions of the strides at different CRs are different.
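Steps (a)-(f) can be written down as an ordered layer plan (a structural sketch only; the filter counts, kernel sizes, and strides of Table 1 are omitted here):

```python
def build_plan():
    """Layer plan mirroring steps (a)-(f) of the proposed DCLSTM:
    an input dropout, seven conv blocks (the first with dropout),
    three LSTM blocks, and a final Softmax output layer."""
    plan = ["dropout"]                                                 # (a)
    plan += ["conv", "leaky_relu", "maxpool", "dropout", "batchnorm"]  # (b)
    for _ in range(6):                                                 # (c) no dropout
        plan += ["conv", "leaky_relu", "maxpool", "batchnorm"]
    for _ in range(3):                                                 # (d)-(e)
        plan += ["lstm", "leaky_relu", "dropout", "batchnorm"]
    plan.append("softmax")                                             # (f)
    return plan

plan = build_plan()
print(plan.count("conv"), plan.count("lstm"))  # 7 convolutional, 3 LSTM layers
```

Note the absence of any fully connected layer before the Softmax output, matching the description above.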


Training and Evaluation
All of the hyperparameters for the proposed method were carefully adjusted to achieve the best convergence; a trial-and-error method was used to select these parameters. The numbers of samples and the parameters for the training, validation, and test sets for all the active regions are shown in Figure 6. According to Figure 6, 70% of the gathered data is randomly selected for training, 10% for validation, and the remaining 20% is selected as the test set. Five-fold cross-validation was also performed for all of the active regions for a more detailed analysis; Figure 7 shows a graphical schematic of the 5-fold evaluation. In the proposed network, the weights are initialized to small random values, and they are then updated using the optimal hyperparameters on the basis of the RMSProp optimizer and the cross-entropy cost function shown in Table 2. In designing the proposed network architecture, we tried different types of optimizers and different numbers and sizes of filters, and we obtained the optimal values for the parameters of the proposed architecture, which are shown in Table 2.
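The 70/10/20 split and the 5-fold partitioning described above can be sketched as index bookkeeping (the segment count of 1650 per class follows the segmentation in Section 3; the shuffling seed is an assumption):

```python
import random

def split_indices(n, train=0.7, val=0.1, seed=0):
    """Random 70/10/20 train/validation/test split of n sample indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_tr = round(n * train)
    n_va = round(n * val)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

def k_folds(indices, k=5):
    """Partition a list of indices into k disjoint folds."""
    return [indices[i::k] for i in range(k)]

tr, va, te = split_indices(1650)      # e.g., 1650 segments per class
print(len(tr), len(va), len(te))      # 1155 165 330
print([len(f) for f in k_folds(tr)])  # five folds of 231
```

Splitting at the segment level like this keeps the three sets disjoint while preserving the stated 70/10/20 proportions.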

Results and Discussion
The simulation results, comparisons with previous studies, and an intuitive evaluation of the proposed method are presented in this section.

Simulation Results
The simulation results of the proposed DCLSTM-CS algorithm for the automatic detection of driver fatigue are discussed in this section. The specifications of the computer used to simulate the proposed method are as follows: 32 GB of RAM, a Core i9 CPU, and an RTX 2070 graphics card. The different parameters of the proposed DCLSTM network were fine-tuned using the trial-and-error method. As shown in Figure 8, we used different numbers of convolution layers (three to eleven) in the proposed network design, with seven convolutional layers considered the optimum (in terms of accuracy and speed). As can be seen from Figure 8, when the number of convolution layers is increased above seven, the classification accuracy remains almost constant, while the training time increases accordingly. As mentioned before, in our proposed algorithm, the compressed signal is used as the DCLSTM network input, for which Table 3 shows the accuracy obtained on the validation data. As shown in Table 3, the performance of the proposed algorithm is given for the noncompression (NC) mode, where the raw signal is fed directly into the DCLSTM network, as well as for the compression mode, where the CR is set to 40, 50, 60, 70, 80, and 90. In the NC mode, the final accuracy for all regions is over 93%; moreover, the accuracy of Region E, which includes a single channel, is approximately 98%. In addition, even at CR = 90, the accuracy of the proposed network for all regions, except Region F, is still above 90%. Figure 9 shows the loss of the proposed network for the six active regions, where CR = 40. As can be seen from Figure 9, increasing the number of iterations leads to a decrease in the losses for all regions, reaching a steady-state value at approximately 300 iterations.
Figure 10 shows the classification accuracy of the proposed network at CR = 40; the classification accuracies for Regions A, B, C, D, E, and F reach 96.3, 95.15, 95.9, 94.9, 95, and 91.3%, respectively, at about 310 iterations. The confusion matrix for the proposed DCLSTM-CS is provided in Figure 11 for all regions at CR = 40. As can be seen, only 21 samples were misclassified as the "fatigue" stage in Region E, indicating that the proposed network performs well. For further analysis, Figure 12 shows the performance measures (sensitivity, precision, specificity, F-score, kappa, accuracy, and training time) of the proposed network for the single-channel region (E) at the different CRs, as well as in the NC mode. As can be seen from Figure 12, the network training time decreases as the CR increases; nevertheless, the network performance at CR = 90 is still higher than 90%, which indicates a promising performance for the proposed network in single-channel driver-fatigue-detection scenarios. Furthermore, Figure 13 shows the t-SNE plots for the raw signal and the LSTM2 layer for the single-channel region (E) at CR = 80, as well as in the NC mode. As can be seen from the last layer at the different CRs, almost all of the evaluation-set samples are separated at each CR, indicating the desirable performance of the proposed method for the two-stage classification of driver fatigue. Moreover, on the basis of the questionnaire, the model's predictions are consistent with the actual fatigue of the participants. Therefore, the proposed method can detect driver fatigue effectively. To further examine the efficacy of the proposed method, Figure 14 shows the classification accuracy obtained for each fold in all the selected regions at the different CRs. As is shown in Figure 14, the accuracy obtained for each fold at the different compression rates is above approximately 90%, indicating that overfitting did not occur across the folds.
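All of the measures reported in Figure 12 (except training time) follow from the binary confusion matrix. The sketch below computes them for the two-stage (fatigue vs. normal) case; the counts in the usage example are hypothetical, chosen only to illustrate the calculation, not taken from Figure 11.

```python
def binary_metrics(tp, fn, fp, tn):
    """Performance measures for a two-class confusion matrix:
    sensitivity, specificity, precision, F-score, accuracy,
    and Cohen's kappa (agreement beyond chance)."""
    total = tp + fn + fp + tn
    sens = tp / (tp + fn)          # sensitivity (recall)
    spec = tn / (tn + fp)          # specificity
    prec = tp / (tp + fp)          # precision
    f1 = 2 * prec * sens / (prec + sens)
    acc = (tp + tn) / total
    # expected chance agreement for kappa
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total**2
    kappa = (acc - p_e) / (1 - p_e)
    return dict(sensitivity=sens, specificity=spec, precision=prec,
                f_score=f1, accuracy=acc, kappa=kappa)

# hypothetical counts for illustration only
m = binary_metrics(tp=480, fn=21, fp=15, tn=484)
```

With these counts, accuracy is (480 + 484) / 1000 = 96.4%, in the same range as the reported single-channel results.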

Comparison with the State-of-the-Art Methods
The proposed method was evaluated in two ways: first, it was compared with common and popular methods used to diagnose driver fatigue, and second, it was compared with the most recent studies. In order to demonstrate the performance of the proposed algorithms, based on DCLSTM and DCLSTM-CS, the two-stage classification of driver fatigue based on Region E (single channel) was simulated using four existing popular networks. The networks compared were MLP [60], DBM [61], SVM [62], and fully convolutional CNN (FCNN) [63], which are based on feature learning from raw data, as well as on manual feature extraction (for the manual features, the mean, minimum, crest factor, skewness, variance, maximum, and kurtosis were used), and which have recently been widely used in driver-fatigue-detection studies. For the FCNN, the proposed network architecture of Figure 5 is used, without the LSTM layers. The Gaussian radial basis function (RBF) was used as the kernel function of the SVM, and the grid search method was used to optimize the kernel parameters. In addition, the number of hidden layers and the learning rate for the MLP and DBM networks are 3 and 0.0001, respectively. The accuracies obtained, on the basis of feature learning from the raw data and of the manual features, are presented in Table 4. As is shown in this table, applying feature learning in deep networks (FCNN, DBM, DCLSTM, and DCLSTM-CS) leads to a significant improvement in accuracy, compared with the manual feature extraction approach. As seen in Table 4, the proposed DCLSTM and DCLSTM-CS present the highest accuracies, compared to the other deep and traditional approaches (i.e., SVM and MLP). The accuracies of the proposed DCLSTM and DCLSTM-CS (for CR = 40), together with those of the FCNN, DBM, SVM, and MLP based on feature learning, are shown in Figure 15.
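The seven manual features used for the baseline classifiers can be computed directly from each EEG segment. The sketch below assumes the common definition of crest factor (peak magnitude over RMS), which the text does not spell out, and scipy's default (Fisher) kurtosis.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def manual_features(x):
    """The seven hand-crafted features used by the baseline
    classifiers: mean, minimum, crest factor, skewness,
    variance, maximum, and kurtosis of one EEG segment."""
    rms = np.sqrt(np.mean(x**2))
    return np.array([
        np.mean(x),
        np.min(x),
        np.max(np.abs(x)) / rms,  # crest factor: peak over RMS (assumed definition)
        skew(x),
        np.var(x),
        np.max(x),
        kurtosis(x),              # Fisher kurtosis (excess), scipy default
    ])

features = manual_features(np.random.default_rng(0).standard_normal(1000))
```

Each segment thus yields a 7-dimensional feature vector, which is what the MLP, DBM, and SVM baselines consume in the manual-feature scenario.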
The accuracies of the proposed DCLSTM and DCLSTM-CS, and of the FCNN, DBM, SVM, and MLP, reach 98, 95, 88, 85, 72, and 70%, respectively, after 400 iterations. As is seen, the accuracies of the proposed networks are significantly higher than those of the existing methods, while also converging faster. As is shown in Figure 15, although the proposed DCLSTM network has a higher accuracy than its compressed mode (DCLSTM-CS), the accuracy achieved by DCLSTM-CS (about 95%) is still remarkably higher than that of the existing methods, and it is more computationally efficient than DCLSTM. In addition, for a thorough comparison of the proposed methods with the existing state-of-the-art methods, the performance of all the methods was examined in a noisy environment. For this purpose, white Gaussian noise with SNRs ranging from −4 to 20 dB was added to the EEG signals as measurement noise, and the classification accuracies for all the methods are shown in Figure 16. As can be seen, the proposed DCLSTM and DCLSTM-CS algorithms are quite robust to the measurement noise over a wide range of SNRs, such that the classification accuracy remains above 90%. A number of studies have been conducted in recent years in the field of the automatic detection of driver fatigue. The best results presented in these studies are shown in Table 5 and are compared with those of the proposed algorithms. As is shown in Table 5, the proposed methods have the best performance when compared to previous studies.
Table 4. Accuracies of the proposed DCLSTM and DCLSTM-CS methods, as well as of the FCNN, DBM, SVM, and MLP methods for both feature learning and manual feature extraction scenarios.
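The noisy-environment test can be reproduced by injecting white Gaussian noise scaled to a target SNR. A minimal sketch, assuming the standard power-ratio definition of SNR in dB (the paper's exact noise model is not detailed in this section):

```python
import numpy as np

def add_awgn(x, snr_db, rng=None):
    """Add white Gaussian measurement noise to a signal at a
    prescribed SNR in dB, where SNR = 10*log10(P_signal/P_noise)."""
    rng = np.random.default_rng() if rng is None else rng
    sig_power = np.mean(x**2)
    noise_power = sig_power / 10**(snr_db / 10)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise

# sweep the same SNR range as the robustness test
clean = np.sin(np.linspace(0, 100, 10000))
noisy_versions = {snr: add_awgn(clean, snr) for snr in range(-4, 21, 4)}
```

Classifying each noisy version and plotting accuracy against SNR reproduces the kind of robustness curve shown for the compared methods.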

Figure 16. Robustness of the proposed algorithms against existing methods at different SNRs.
Table 5. Accuracies of the proposed networks compared with existing state-of-the-art methods.