Noise Robustness Analysis of Performance for EEG-Based Driver Fatigue Detection Using Different Entropy Feature Sets

Driver fatigue is an important factor in traffic accidents, and the development of a detection system for driver fatigue is of great significance. To estimate and prevent driver fatigue, various classifiers based on electroencephalogram (EEG) signals have been developed; however, as EEG signals have inherent non-stationary characteristics, their detection performance is often degraded by background noise. To investigate the effects of noise on detection performance, simulated Gaussian noise, spike noise, and electromyogram (EMG) noise were added to a raw EEG signal. Four types of entropy, namely sample entropy (SE), fuzzy entropy (FE), approximate entropy (AE), and spectral entropy (PE), were used as feature sets. Three base classifiers (K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Decision Tree (DT)) and two ensemble methods (Bootstrap Aggregating (Bagging) and Boosting) were employed and compared. Results showed that: (1) the simulated Gaussian noise and EMG noise had an impact on accuracy, while simulated spike noise did not, which is of great significance for the future application of driver fatigue detection; (2) the influence of noise on performance differed among classifiers, for example, DT was the most robust and SVM the least robust; (3) the influence of noise on performance also differed among feature sets, with FE and the combined feature set being the most robust; and (4) while the Bagging method could not significantly improve performance against noise addition, the Boosting method may significantly improve performance against superimposed Gaussian and EMG noise. The entropy feature extraction method could not only identify driver fatigue, but also effectively resist noise, which is of great significance in future applications of an EEG-based driver fatigue detection system.


Introduction
As EEG signals reflect the instantaneous state of the brain, EEG is an excellent tool for evaluating the state and function of the brain, and is often used to assist in the diagnosis of stroke, epilepsy, and seizure. Various computational methods based on EEG signals have been developed for the analysis and detection of driver fatigue.
Correa et al. [1] developed an automatic method to detect the drowsiness stage in EEG signals using 19 features and a Neural Network classifier, and obtained an accuracy of 83.6% for drowsiness detection. Mu et al. [2] employed fuzzy entropy for feature extraction and an SVM classifier to achieve an average accuracy of 85%. In a further study, they combined four feature sets (SE, AE, PE, and FE) with an SVM and reported an average accuracy of 98.75% [3]. Fu et al. [4] proposed a fatigue detection model based on the Hidden Markov Model (HMM), and achieved a highest accuracy of 92.5% based on EEG signals and other physiological signals. Li et al. [5] collected [...]
Entropy 2017, 19, 385
[...] in the count, and SE requires a relatively large r to find similar subsequences and to avoid the log(0) problem. They are also very sensitive to input parameters m, r, and N [25]. More recently, FE has been proposed to alleviate these problems. FE is based on a continuous function to compute the dissimilarity between two zero-mean subsequences and, consequently, is more stable with respect to noise and parameter initialization. This metric is still scarcely used in EEG studies, but is expected to replace AE and SE because of its excellent stability, mainly when applied to noisy or short records.
Given the non-stationary characteristics of EEG signals, we have observed that the optimal detection performance varies with the classifier and the feature set, which is a major obstacle in EEG signal classification. Thus, a classifier optimized for a particular set of training data may not work well for driver fatigue detection with new data.
Investigating how well feature sets and classifiers perform in the presence of noise is important, as a real EEG signal is seldom noise free. However, how the addition of simulated noise changes the driver fatigue detection performance of various classifiers or feature sets has yet to be sufficiently studied. Furthermore, noise robustness analysis of EEG-based driver fatigue detection across various feature sets and classifiers has not been addressed. In general, a systematic study of the effects of simulated noise on driver fatigue detection systems, and of the ability of such measures to evaluate detection systems under simulated Gaussian noise, is missing. To the best of our knowledge, our study is one of the first to apply the noise robustness analysis method to EEG signals for driver fatigue detection.
In this study, our aim was to evaluate the robustness of various classifiers and feature sets for driver fatigue detection systems under simulated Gaussian noise. Four types of entropy were deployed as feature sets in this work: FE, SE, AE, and PE. The classification procedure was implemented by three base classifiers: KNN, SVM, and DT, which are regarded as state-of-the-art classification methods in many studies. The ensemble classifiers were developed by two ensemble methods: Bagging and Boosting. The challenge was to analyze the impacts of noise on detection performance with four feature sets and five classification methods.
First, with simulated Gaussian noise, we compared the detection performance, i.e., the average accuracy, of the DT, SVM, and KNN methods. Second, we evaluated the noise robustness of these methods on noisy EEG signals generated by adding random Gaussian noise to the original EEG signal. Third, in addition to the base classifiers, we examined the effects of the Bagging and Boosting ensemble methods. Moreover, we repeated these analyses with simulated spike noise and simulated EMG noise. This paper is organized as follows: in Section 2, the experiment and the EEG signal processing methods, such as acquisition, preprocessing, segmentation, feature extraction, and classification, are described. In addition, noise generation is explained in this section. Section 3 shows the experimental results and discussion. Finally, we conclude this paper in Section 4.

Materials and Methods
Figure 1 shows the workflow of this paper, including EEG acquisition, preprocessing, segmentation, feature extraction, noise generation, classification, and performance analysis.

Subjects
Twenty-two university students (14 male, 19-24 years) participated in this experiment. All subjects were asked to abstain from any type of stimulant, such as alcohol, medicine, or tea, before and during the experiment. Before the experiment, subjects practiced the driving task for several minutes to become acquainted with the experimental procedures and purposes. This work was approved by all subjects, and the experiments were authorized by the Academic Ethics Committee of the Jiangxi University of Technology. All subjects provided their written informed consent as per the human research protocol in this study.

Experimental Paradigm
The driving fatigue simulation experiment was performed by each subject on a static driving simulator (The ZY-31D car driving simulator, produced by Peking ZhongYu CO., LTD, Beijing, China), as shown in Figure 2. On the screen, a customized version of the Peking ZIGUANGJIYE software ZG-601 (Car driving simulation teaching system, V9.2) was shown.
This equipment simulated a real car and provided all the driving capabilities of a vehicle. Using computer software technology, different driving environments could be constructed, such as sunny, foggy, or snowy weather and mountain, highway, and countryside areas. The driving environment selected for this experiment was a highway with low traffic density, which could more easily induce monotonous driving. Some research has suggested that in this driving environment the brain enters a state of fatigue more easily and the EEG signal is more stable, which benefited the subsequent data recording. All subjects in this experiment had an approximate real driving experience.

Data Acquisition and Preprocessing
In summary, the total duration of the experiment was 40-130 min. The first step was to become familiar with the simulation software, followed by continuous monotonous driving until driver fatigue was determined and the experiment terminated.
When the driving had lasted 10 min, the last 5 min of the EEG signals were recorded as the normal state. When the continuous driving had lasted 30-120 min (until the self-reported fatigue questionnaire results showed that the subject was in a state of driving fatigue), the last 5 min of the EEG signals were labeled as the fatigue state. The self-reported fatigue questionnaire followed Borg's fatigue scale and Lee's subjective fatigue scale [26,27]. EOG was also used to analyze eye blink patterns as an objective part of the validation of the fatigue state. This method of using a questionnaire to identify the fatigue condition has been used not only in our study, but also in many other studies [2,3]. The drivers were required to complete all tasks and ensure safe driving. Prior to the experiment, the drivers familiarized themselves with the driving simulator and the completion of the driving tasks.
All channel data were referenced to two electrically linked mastoids at A1 and A2, digitized at 1000 Hz from a 32-channel electrode cap (including 30 effective channels and two reference channels) based on the International 10-20 system (Figure 3), and stored in a computer for offline analysis. Eye movements and blinks were monitored by recording the horizontal and vertical EOG.
After the acquisition of EEG signals, the main steps of data preprocessing were carried out using the Scan 4.3 software of Neuroscan (Compumedics, Australia). The raw signals were first filtered with a 50 Hz notch filter and a 0.15-45 Hz band-pass filter. Next, the 5-min EEG signals from 30 channels were sectioned into 1-s epochs, resulting in 300 epochs per subject and state. With 22 subjects, a dataset of 6600 epochs (792,000 units for 30 channels and 4 feature sets) was randomly formed for the normal state and another 6600 epochs (792,000 units for 30 channels and 4 feature sets) for the fatigue state.
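The epoch counts above can be checked with a short sketch. The function name `segment_epochs` and the placeholder recording are illustrative assumptions, not part of the original processing chain (which used the Scan 4.3 software); the sketch only shows non-overlapping 1-s segmentation at 1000 Hz and the resulting arithmetic.

```python
def segment_epochs(samples, fs=1000, epoch_seconds=1):
    """Split a 1-D sequence into non-overlapping epochs of epoch_seconds each."""
    step = fs * epoch_seconds
    n_epochs = len(samples) // step  # any incomplete trailing epoch is dropped
    return [samples[k * step:(k + 1) * step] for k in range(n_epochs)]


# Five minutes of one channel at 1000 Hz (placeholder zeros, not real EEG).
recording = [0.0] * (5 * 60 * 1000)
epochs = segment_epochs(recording)

# Counts stated in the text: 22 subjects, 30 channels, 4 feature sets.
n_subjects, n_channels, n_feature_sets = 22, 30, 4
epochs_per_state = len(epochs) * n_subjects
units_per_state = epochs_per_state * n_channels * n_feature_sets
```

Under these assumptions, each 5-min recording yields 300 epochs, and each state yields 6600 epochs and 792,000 feature values, matching the figures quoted in the text.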

Feature Extraction
The EEG signal is assumed to be a non-stationary time series, whereas most feature extraction methods are only applicable to stationary signals. To deal with this problem, the EEG time series was divided into many short windows, and its statistics were assumed to be approximately stationary within each window. The following feature extraction methods were applied to each 1 s windowed signal. EEG signals were segmented without overlap, and the feature sets were then extracted from all channels in each 1 s window.
The ability to distinguish between the normal state and the fatigue state depended mainly on the quality of the input vectors of the classifier. To capture EEG characteristics, four feature sets, FE, SE, AE, and PE, were calculated [21][22][23][24][25]. In this section, the methods for calculating these feature sets on EEG recordings are described in detail.

Spectral Entropy (PE)
PE was evaluated using the normalized Shannon entropy [28], which quantifies the spectral complexity of the time series. The power level of the i-th frequency component is denoted by $Y_i$, and the normalization of the power $y_i$ is performed as:

$$y_i = \frac{Y_i}{\sum_i Y_i}$$

The spectral entropy of the time series is computed using the following formula:

$$PE = -\sum_i y_i \log y_i$$

Approximate Entropy (AE)
AE, as proposed by Pincus [23], is a statistically quantified nonlinear dynamic parameter that measures the complexity of a time series. The procedure for the AE-based algorithm is described as follows. Considering a time series $t(i)$ of length $N$, a set of m-dimensional vectors is obtained as per the sequence order of $t(i)$:

$$T_i^m = [t(i), t(i+1), \ldots, t(i+m-1)], \quad 1 \le i \le N-m+1$$

The distance $d[T_i^m, T_j^m]$ between two vectors $T_i^m$ and $T_j^m$ is defined as the maximum difference between the corresponding elements of the two vectors:

$$d[T_i^m, T_j^m] = \max_{0 \le k \le m-1} \left| t(i+k) - t(j+k) \right|$$

Define $S_i$ as the number of vectors $T_j^m$ that are similar to $T_i^m$, subject to the criterion of similarity $d[T_i^m, T_j^m] \le s$, and let:

$$S_i^m(s) = \frac{S_i}{N-m+1}$$

Define the function $\gamma^m(s)$ as:

$$\gamma^m(s) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \ln S_i^m(s)$$

Set m = m + 1, and repeat the above steps to obtain $S_i^{m+1}(s)$ and $\gamma^{m+1}(s)$. The approximate entropy can then be expressed as:

$$AE = \gamma^m(s) - \gamma^{m+1}(s)$$

Sample Entropy (SE)
The SE algorithm is similar to that of AE [25,29], and is a measure of time series complexity proposed by Richman and Moorman [24]. The vectors $T_i^m$ and the distance $d[T_i^m, T_j^m]$ are defined in the same way as in the AE-based algorithm; the other steps in the SE-based algorithm are described as follows. Define $A_i$ as the number of vectors $T_j^m$ ($j \ne i$) that are similar to $T_i^m$, subject to the criterion of similarity $d[T_i^m, T_j^m] \le s$, and let:

$$A_i^m(s) = \frac{A_i}{N-m-1}, \quad 1 \le i \le N-m$$

Define the function $\gamma^m(s)$ as:

$$\gamma^m(s) = \frac{1}{N-m} \sum_{i=1}^{N-m} A_i^m(s)$$

Set m = m + 1, and repeat the above steps to obtain $A_i^{m+1}(s)$ and $\gamma^{m+1}(s)$. The sample entropy can then be expressed as:

$$SE = \ln \gamma^m(s) - \ln \gamma^{m+1}(s)$$
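A minimal sketch of sample entropy, following the standard Richman and Moorman formulation (self-matches excluded, the same N - m templates used for both dimensions), is given below. The function name, the population standard deviation, and the choice to return infinity when no matches are found are assumptions made here for illustration.

```python
import math


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy with tolerance s = r_factor * SD (population SD assumed)."""
    n = len(x)
    mean = sum(x) / n
    s = r_factor * math.sqrt(sum((v - mean) ** 2 for v in x) / n)

    def match_count(dim):
        # The same n - m templates are used for dim = m and dim = m + 1,
        # so the normalizing constants cancel in the final ratio.
        vectors = [x[i:i + dim] for i in range(n - m)]
        count = 0
        for i, vi in enumerate(vectors):
            for j, vj in enumerate(vectors):
                if i != j and max(abs(a - b) for a, b in zip(vi, vj)) <= s:
                    count += 1
        return count

    matches_m = match_count(m)
    matches_m1 = match_count(m + 1)
    if matches_m == 0 or matches_m1 == 0:
        return float("inf")  # the log(0) case: no similar templates were found
    return math.log(matches_m / matches_m1)
```

Because the normalizers are identical for both dimensions, the returned value equals the difference of the two log terms in the definition. The infinite return value makes the log(0) problem explicit rather than hiding it.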

Fuzzy Entropy (FE)
To deal with some of the issues with sample entropy, Xiang et al. [22] proposed the use of a fuzzy membership function in computing the vector similarity to replace the binary function in the sample entropy algorithm, so that the entropy value is continuous and smooth. The procedure for the FE-based algorithm is described in detail as follows. Set an L-point sample sequence: $\{v(i) : 1 \le i \le L\}$. The phase-space reconstruction is performed on $v(i)$ as per the sequence order. The reconstructed vector can be written as:

$$T_i^m = \{v(i), v(i+1), \ldots, v(i+m-1)\} - v_0(i)$$

where $i = 1, 2, \ldots, L-m+1$, and $v_0(i)$ is the average value given by the following equation:

$$v_0(i) = \frac{1}{m} \sum_{j=0}^{m-1} v(i+j)$$

$d_{ij}^m$, the distance between two vectors $T_i^m$ and $T_j^m$, is defined as the maximum difference between the corresponding elements of the two vectors:

$$d_{ij}^m = \max_{0 \le k \le m-1} \left| \left(v(i+k) - v_0(i)\right) - \left(v(j+k) - v_0(j)\right) \right|$$

Based on the fuzzy membership function $\sigma(d_{ij}^m, n, s)$, the similarity degree $D_{ij}^m$ between two vectors $T_i^m$ and $T_j^m$ is defined as:

$$D_{ij}^m = \sigma(d_{ij}^m, n, s) = \exp\left(-\frac{(d_{ij}^m)^n}{s}\right)$$

where the fuzzy membership function $\sigma(d_{ij}^m, n, s)$ is an exponential function, while n and s are the gradient and width of the exponential function, respectively.
Define the function $\gamma^m(n, s)$:

$$\gamma^m(n, s) = \frac{1}{L-m} \sum_{i=1}^{L-m} \left( \frac{1}{L-m-1} \sum_{j=1, j \ne i}^{L-m} D_{ij}^m \right)$$

Repeat the above steps in the same manner for dimension m + 1 to define the function $\gamma^{m+1}(n, s)$. The fuzzy entropy can then be expressed as:

$$FE(m, n, s, L) = \ln \gamma^m(n, s) - \ln \gamma^{m+1}(n, s)$$

Of the four types of entropy described above, AE, SE, and FE have variable parameters, m and r. Here r is the similarity tolerance, which appears as s in the definitions above. In the present study, m = 2 and r = 0.2*SD, where SD denotes the standard deviation of the time series, as per the literature [3,22,25].
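The fuzzy entropy procedure can be sketched as below. The function name is hypothetical, the gradient n = 2 is an assumed value (the text does not state it), and the width is taken as s = 0.2 * SD with the population standard deviation, which is also an assumption.

```python
import math


def fuzzy_entropy(v, m=2, n=2, r_factor=0.2):
    """Fuzzy entropy with exponential membership exp(-(d**n) / s).

    n = 2 and the population SD are assumptions made for this sketch.
    """
    length = len(v)
    mean = sum(v) / length
    s = r_factor * math.sqrt(sum((x - mean) ** 2 for x in v) / length)
    if s == 0:
        raise ValueError("constant signal: the membership width s is zero")

    def gamma(dim):
        count = length - m  # same number of vectors for dim = m and dim = m + 1
        vectors = []
        for i in range(count):
            window = v[i:i + dim]
            baseline = sum(window) / dim  # v0(i): local mean removed per vector
            vectors.append([w - baseline for w in window])
        total = 0.0
        for i in range(count):
            for j in range(count):
                if i != j:
                    d = max(abs(a - b) for a, b in zip(vectors[i], vectors[j]))
                    total += math.exp(-(d ** n) / s)
        return total / (count * (count - 1))

    return math.log(gamma(m)) - math.log(gamma(m + 1))
```

Unlike the binary match in sample entropy, every pair of vectors contributes a graded similarity between 0 and 1, so the averages stay strictly positive and the logarithms remain defined for non-constant signals.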
For optimizing detection quality, the feature sets were normalized for each subject and each channel by scaling between 0 and 1.
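The scaling step can be illustrated with a minimal min-max sketch. The function name and the handling of a constant input are assumptions; in the setting described above, the input would be the feature values of one subject and one channel.

```python
def min_max_scale(values):
    """Scale a list of feature values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: map everything to 0 (assumption)
    return [(v - lo) / (hi - lo) for v in values]
```

For example, the values 2, 4, and 6 map to 0, 0.5, and 1.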

Classification
Owing to the lack of a substantial sample size, algorithms based on ensemble learning methods were also needed to evaluate the detection performance for driver fatigue. Bagging is an acronym of "bootstrap aggregating" [30,31]; it builds several subsets and aggregates their individual predictions to form a final prediction. In the Bagging method, the number of base classifiers must be set. To investigate the impact of the number of base classifiers on the classification result, we set it to 50, 100, and 200, respectively. Like Bagging, Boosting also uses subsets to train classifiers, but not randomly [32][33][34]. In Boosting, difficult samples have higher probabilities of being selected for training, and easier samples have less chance of being used. In the Boosting method, the number of Boosting stages has to be set. To investigate the impact of the number of Boosting stages on the classification result, we set it to 50, 100, and 200, respectively.
The Bagging and Boosting methods both try to construct multiple classifiers by using different subsets. Bagging trains each classifier over a randomly selected subset, while the Boosting method trains each new classifier on a subset that favors the samples found difficult so far [35].
Some classification models can be fitted for a range of values of a parameter almost as efficiently as for a single value of that parameter. This feature can be leveraged to perform a more efficient cross-validation for the selection of parameters. A high variance can lead to over-fitting in model selection, and hence poor performance, even when the number of hyper-parameters is relatively small [36]. It seems likely that over-fitting during model selection can be overcome using various approaches. To avoid bias in performance evaluation, parameter selection should be conducted independently in each trial to prevent selection bias and to reflect optimal performance. Performance evaluation based on these principles requires repeated training with different sets of hyper-parameter values on different samples of the available data, which makes it well-suited to parallel implementation. Deviations from full nested cross-validation can introduce a bias whose magnitude can easily swamp the difference in performance between the classifier systems.
To avoid the problem of over-fitting and to obtain classifiers that generalize to other independent datasets, the datasets were separated into training sets and test sets in the following pattern. In the training phase, a 10-fold cross validation was applied to the features so that 10% of the feature vectors were dedicated as a test set and the other 90% of the feature vectors were considered as the training set. In the next iteration, another 10% of the feature vectors were considered as a test set and the rest as the training set, until all the feature vectors had participated once in the test phase. The final result was obtained by averaging the outcomes of the 10 corresponding tests (for different subjects and different feature sets). Using this evaluation scheme, the dependency between the training and test features was removed, thus avoiding the over-fitting problem [37][38][39][40][41][42]. In particular, though gradient boosting (GB) is a more capable and practical boosting algorithm, like most other classifiers, GB also had the problem of over-fitting when dealing with very noisy data. To overcome this problem, validation sets were used to adjust the hypothesis of the Boosting algorithm to improve generalization, thereby alleviating overfitting and improving performance; this approach has long been used to address overfitting with neural networks and decision trees [43,44]. Its basic concept is to apply the classifier to a set of instances distinct from the training set. Thus, the sequence of base classifiers produced by GB from the training set is also applied to the validation set to alleviate the overfitting problem.
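The 10-fold scheme described above can be sketched with index bookkeeping alone. The function names and the `evaluate` callback are hypothetical; the callback stands in for training a classifier on the training indices and returning its accuracy on the test indices.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists so that every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(n_samples, evaluate, k=10, seed=0):
    """Average the score returned by evaluate(train, test) over the k folds."""
    scores = [evaluate(train, test)
              for train, test in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)
```

With 100 samples, each iteration holds out 10 samples and trains on the remaining 90, and the ten test folds together cover every sample once.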
Obtaining optimal parameter values is very important for classifier performance. Three widely used classifiers (KNN, SVM, and DT) were employed in this work. To select the optimal parameters of each model, this paper adopted cross validation based on grid search, a model hyperparameter optimization technique, thus avoiding arbitrary parameter choices. The parameters searched in this study were as follows: the penalty parameter, kernel, and kernel coefficient for SVM; the number of neighbors for KNN; the number of features, the maximum depth of the tree, and the minimum number of samples for DT; the number of base estimators and the number of features for the Bagging method; and the learning rate, the number of boosting stages, and the maximum depth for the Boosting method.

Simulated Noise
Noise in EEG signals includes white noise, spike noise, muscular noise, ocular noise, and cardiac noise. White noise accounts for possible sources in real environments, such as thermal noise or electro-magnetic noise, and can be generated by a Gaussian random process. Spikes can originate from sensor movement, and their probability of appearance is relatively low in a real case. Muscular artifacts are drawn from electromyogram (EMG) signals. Ocular artifacts come from electrooculogram (EOG) signals. Cardiac artifacts are generated by the heartbeat. In this paper, for simplicity, ocular and cardiac noise were not simulated; white, spike, and muscular noise were considered.
To analyze the influence of noise on detection performance, we built a simulated Gaussian noise $P_i^{noise}$ and superimposed it on the EEG signal:

$$\tilde{P}_i(t) = P_i(t) + P_i^{noise}(t)$$

where $P_i$ is the original EEG signal of channel i; $P_i^{noise}$ is the simulated Gaussian white noise; and $\tilde{P}_i$ is the noisy EEG signal with simulated Gaussian white noise. We assumed that $P_i^{noise}$ and $P_i$ were uncorrelated, and $P_i^{noise} \sim D \cdot N(0, 1)$. Here, D is defined as the level of noise given as a percentage of the average level of the noise-free data $P_i(t)$.
Therefore, to evaluate the noise robustness of the classifiers systematically, we used the scale factor D to control the noise power. To contaminate the EEG data with Gaussian noise, we generated Gaussian noise with the same dimension as the segmented EEG signal, i.e., the noise dimension was 1024 per second per channel.
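A minimal sketch of the noise superposition is shown below. The text does not specify how the "average level" of the noise-free data is measured, so the sketch assumes the mean absolute amplitude; the function name and the `seed` parameter are also illustrative.

```python
import random


def add_gaussian_noise(signal, d=1.0, seed=None):
    """Return signal + D * level * N(0, 1), sample by sample.

    'level' is taken as the mean absolute amplitude of the clean signal;
    this reading of "average level" is an assumption of this sketch.
    """
    rng = random.Random(seed)
    level = sum(abs(v) for v in signal) / len(signal)
    noise = [d * level * rng.gauss(0.0, 1.0) for _ in signal]
    return [p + e for p, e in zip(signal, noise)]
```

Setting `d` to 0 returns the clean signal unchanged, and larger values of `d` raise the noise power relative to the signal. A different definition of the average level (for example, the standard deviation) would only change the line that computes `level`.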

Spike Noise
Spikes were synthetically generated as described in Reference [45]. These interferences can be of a technological (sensor movement, electrical interference) or physiological (mainly eye blinks) origin. The probability of appearance was kept relatively low (0.01), as would be expected in a real case. The duration was set at 1 sample and the amplitude was set at 1.
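The spike parameters above (probability 0.01, duration 1 sample, amplitude 1) can be sketched as follows. The function name, the `seed` parameter, and the choice of strictly positive spikes are assumptions; Reference [45] may define the sign or shape differently.

```python
import random


def add_spike_noise(signal, probability=0.01, amplitude=1.0, seed=None):
    """Add a one-sample spike of the given amplitude with the given probability."""
    rng = random.Random(seed)
    # Each sample independently receives a positive spike (sign is an assumption).
    return [v + amplitude if rng.random() < probability else v for v in signal]
```

On a long record, about 1% of the samples are expected to carry a spike.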

Muscular Noise
Muscular noise was drawn from an actual long electromyogram (EMG) signal downloaded from PhysioNet [46], which corresponded to a patient with myopathy. Data were acquired at 50 kHz and then downsampled to 1 kHz. For each run, an EMG epoch of length N was extracted from the entire record, starting at a random sample. This noise accounted for muscular activity during EEG recording.
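The downsampling and random epoch extraction can be sketched as below. Both function names are hypothetical, and the downsampling simply keeps every 50th sample without an anti-aliasing filter, which is a simplification assumed here; the text does not describe the resampling method used.

```python
import random


def downsample(samples, factor=50):
    """Keep every factor-th sample (50 kHz to 1 kHz); no anti-aliasing filter."""
    return samples[::factor]


def random_epoch(record, length, seed=None):
    """Extract a contiguous epoch of the given length starting at a random sample."""
    rng = random.Random(seed)
    start = rng.randint(0, len(record) - length)  # inclusive bounds keep the epoch in range
    return record[start:start + length]
```

For a 10-s record sampled at 50 kHz, `downsample` yields 10,000 samples, from which `random_epoch(record, 1000)` draws a 1-s epoch.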

Performance Metrics
To estimate the potential application performance of a detector, it is very important to properly examine the detection quality. The total average accuracy for a given feature set and classifier was the average of the accuracies of all single channels obtained with the same feature set and the same classifier. The classification capabilities of the different classifiers were comprehensively investigated with several indexes, including Accuracy, Precision, Recall, F1-score, and the Matthews Correlation Coefficient (MCC) [47]. With the normal state taken as the positive class, Accuracy is the percentage of correct predictions among all samples; Precision is the percentage of samples predicted as normal that are truly normal; and Recall is the percentage of normal samples that are predicted as normal. Furthermore, the F1-score was used to appraise both Precision and Recall. The MCC was used as a measure of the quality of binary classifications, as it considers true and false positives and negatives, and is generally regarded as a balanced measure that can be used even if the classes are of extremely different sizes. Therefore, high Precision, Recall, F1-score, and MCC values relate to higher performance. The following equation set is used in the literature for examining performance quality:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

$$Recall = \frac{TP}{TP + FN}$$

The recall is intuitively the ability of the classifier to find all the positive samples.

$$Precision = \frac{TP}{TP + FP}$$

The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

The F1 score can be interpreted as a weighted average of the precision and recall, where an F1 score reaches its best value at 1 and its worst at 0.
The Matthews correlation coefficient (MCC) is used in machine learning as a measure of the quality of two-class classifications. The MCC is a correlation coefficient value between −1 and +1.

$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where TP (true positive) represents the number of normal signals identified as normal; TN (true negative), the number of fatigue signals identified as fatigue; FP (false positive), the number of fatigue signals identified as normal; and FN (false negative), the number of normal signals identified as fatigue.
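The five indexes can be computed directly from the four confusion-matrix counts, as in the sketch below. The function name and the convention of returning 0 when a denominator vanishes are assumptions of this example.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1, and MCC from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}
```

For instance, with TP = 90, TN = 80, FP = 20, and FN = 10, the accuracy is 170/200 = 0.85 and the recall is 90/100 = 0.9.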
To investigate differences in average accuracy among the various classifiers and feature sets, the paired sample t-test was used to evaluate the significance of each comparison. The results were averaged over ten independently drawn combinations in each experiment.
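The paired t statistic used in such comparisons can be sketched as follows. The function name is hypothetical, and the sketch returns only the statistic and the degrees of freedom; obtaining the p-value requires the Student t distribution, which is not part of the Python standard library and is therefore left out here.

```python
import math


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance of differences
    return mean / math.sqrt(var / n), n - 1
```

The inputs would be, for example, the accuracies of two classifiers over the same ten random channel combinations. The function assumes the paired differences are not all identical, since the variance would then be zero.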

Gaussian Noise
In general, when all EEG channels are used for detecting driver fatigue, good results may be achieved; however, we wanted to understand what the impact on detection performance would be if noise was superimposed on some channels. To investigate this question, the feature sets (FE, SE, AE, and PE) of the EEG signals across all 30 channels were first extracted for training and recognition, and the number of noisy channels was then gradually increased.
To explore the effect of the number of noisy channels added to the detection system, we evaluated the system performance with respect to the number of polluted channels. For each number m (from 1 to 30), a random combination of m out of 30 channels was drawn 10 times, and the classification accuracy was calculated each time using a 10-fold cross validation. The scale factor (D) of the superimposed noise was set at 1.0. Furthermore, for each condition (m from 1 to 30), the paired t-test was used as a post-hoc test to evaluate and compare the performance of the classifiers.
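The channel selection protocol can be sketched as follows. The function name and the fixed seed are illustrative assumptions; the sketch only draws which channels receive noise, and the noise addition and classification steps are left out.

```python
import random


def noisy_channel_draws(n_channels=30, n_repeats=10, seed=0):
    """For each m in 1..n_channels, draw n_repeats random sets of m noisy channels."""
    rng = random.Random(seed)
    return {
        m: [sorted(rng.sample(range(n_channels), m)) for _ in range(n_repeats)]
        for m in range(1, n_channels + 1)
    }
```

Each entry `draws[m]` holds ten channel subsets of size m; for m = 30 every draw necessarily contains all channels, so the ten repetitions differ only in the noise realization.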

Effect of Noise: DT Classifier vs. KNN Classifier vs. SVM Classifier
Based on the literature [36][37][38][39], of the four feature sets, FE outperformed the other feature sets. DT was the best among several classifiers, while SVM was the weakest. Here, we compare the detection performance for the three classifiers and four feature sets with an increasing number of noisy channels. The comparison among the three classifiers in terms of average accuracy for each feature set is shown in Figure 4.
First, we evaluated the classification accuracy of these methods using the original experimental datasets uncontaminated by noise sources. We observed little difference in the average accuracy among the three classifiers. We then investigated the impact of increasing the number of noisy channels on the detection performance of each method. The number of noisy channels varied from 1 to 30.
When using the FE feature set, there were no differences in average accuracy among KNN, DT, and SVM for the original EEG signals (paired t-test, p > 0.05). However, as noise was added to more channels, the average accuracy of the three classifiers decreased, with DT decreasing more slowly. For the DT classifier, the average accuracy decreased from 0.958 with a noise-free signal to 0.771 with 30 noisy channels. For the SVM and KNN, the average accuracies were 0.972 and 0.966 with noise-free signals and dropped to 0.682 and 0.590 with 30 noisy channels, respectively. The performance of DT was better than those of SVM and KNN in the presence of 30 noisy channels. In addition, the mean difference of the classification accuracy between DT and SVM (KNN) was statistically significant in the presence of 30 noisy channels (paired t-test, p < 0.01).
When using the feature set AE, SVM achieved a higher average accuracy than both DT and KNN for the original EEG signals; however, this difference was not statistically significant (paired t-test, p > 0.05). After noise addition, the SVM obtained significantly lower accuracy than DT. As noise was added to more channels, the average accuracy of the three classifiers decreased, with DT decreasing more slowly. For the DT classifier, the average accuracy decreased from 0.929 with a noise-free signal to 0.690 with 30 noisy channels. For the SVM and KNN, the average accuracies were 0.952 and 0.926 with a noise-free signal and dropped to 0.647 and 0.550 with 30 noisy channels, respectively. The performance of DT was better than those of SVM and KNN in the presence of 30 noisy channels (paired t-test, p < 0.01). The results for SE were similar to those for AE.
When using the feature set PE, the detection performance of SVM was significantly better than that of the other two classifiers for the original EEG signals (paired t-test, p < 0.01). However, as noise was added to more channels, the average accuracy of the three classifiers decreased until the final average accuracies were almost the same. For the DT classifier, the average accuracy decreased from 0.782 with a noise-free signal to 0.636 with 30 noisy channels (paired t-test, p < 0.01). For the SVM and KNN, the average accuracies were 0.825 and 0.763 with a noise-free signal and dropped to 0.645 and 0.567 with 30 noisy channels, respectively (paired t-test, p < 0.01).
From the above results, for all four feature sets (FE, SE, AE and PE), the difference between the various classifiers for noisy EEG signals was clear and remained consistent, with the performance of the DT classifier being greater than those of the SVM and KNN classifiers (except for PE). The average accuracy for the DT classifier decreased slowly, while the average accuracy for the other two classifiers decreased quickly with an increasing number of noisy channels. As mentioned before, the difference between the DT classifier and the other two classifiers continued to grow as the number of noisy channels increased, with little difference in classification accuracy between the SVM and KNN methods.
The FE feature set consistently outperformed the other feature sets in classification accuracy, regardless of classifier. The most significant differences between SVM and DT on the noisy EEG data were found with FE. The average accuracy for the PE feature set with the original noise-free EEG was not high, and was significantly lower than those for the other three feature sets (paired t-test, p < 0.01). In two-class classification problems, the theoretical chance level is 50%; however, in an EEG-based driver fatigue detection system, a classification accuracy of at least 60% is considered the threshold for acceptable recognition. Thus, there is little difference among the three classifiers with the PE feature set.

Effect of Noise: Using Bagging Ensemble Learning Method
As mentioned before, many studies have found that the use of ensemble learning can provide a certain degree of robustness to noise; nevertheless, we wanted to investigate whether ensemble learning would work for driving fatigue detection. Next, we analyzed the effect of noise using the Bagging ensemble learning method. A comparison between average accuracies obtained from noisy EEG data using the Bagging method is illustrated in Figure 5.
As shown in Figure 5, bKNN50, bKNN100 and bKNN200 represent the Bagging ensemble with 50, 100, and 200 KNN base classifiers, respectively. Figure 5 shows how the average accuracy for different classifiers and feature sets changed with an increase in noisy channels. The average accuracy for the Bagging method decreased in the same way as that for KNN without the Bagging method when the number of noisy channels increased. There was no difference in average accuracy between KNN without the Bagging method and KNN with the Bagging method (paired t-test, p > 0.05), and both average accuracies decreased as the number of noisy channels increased.
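To make the bKNN notation concrete, the following is a minimal sketch of Bagging with KNN base classifiers: each base classifier is fit on a bootstrap sample of the training set, and the ensemble predicts by majority vote. The function names, the value k = 3, and the two-dimensional toy data are assumptions made for illustration only; the settings actually used in the experiments are not restated in this section.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def bagging_knn_predict(train_x, train_y, x, n_estimators=50, k=3, seed=0):
    """Majority vote over n_estimators KNN models, each on a bootstrap sample."""
    rng = random.Random(seed)
    n = len(train_x)
    votes = Counter()
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n items with replacement
        boot_x = [train_x[i] for i in idx]
        boot_y = [train_y[i] for i in idx]
        votes[knn_predict(boot_x, boot_y, x, k)] += 1
    return votes.most_common(1)[0][0]


# Toy two-class data (hypothetical feature vectors, not entropy features).
train_x = [(0.1 * j, 0.0) for j in range(10)] + [(5 + 0.1 * j, 5.0) for j in range(10)]
train_y = [0] * 10 + [1] * 10

print(bagging_knn_predict(train_x, train_y, (0.2, 0.1), n_estimators=50))
print(bagging_knn_predict(train_x, train_y, (5.1, 4.9), n_estimators=50))
```

Because KNN is a comparatively stable learner, bootstrap resampling changes its predictions only slightly, which is consistent with the small differences between KNN and bKNN reported here.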

When using the feature set FE, there was no difference in average accuracy between KNN without the Bagging method and KNN with the Bagging method (paired t-test, p > 0.05). However, as noise was added to more channels, the average accuracy of the KNN classifiers both without and with the Bagging method decreased. For the KNN classifier, the average accuracy decreased from 0.966 with a noise-free signal to 0.590 with 30 noisy channels. For the Bagging method with 50, 100, and 200 base classifiers, the average accuracies were 0.954, 0.954, and 0.954 with a noise-free signal and dropped to 0.534, 0.561 and 0.550 with 30 noisy channels, respectively. The results for the other three feature sets were similar to those for FE.
The above results show that the Bagging method cannot effectively improve the recognition performance of the KNN classifier, either without or with noisy channels. There was also no obvious effect when the number of base classifiers was increased.

Effect of Noise: Using Boosting Ensemble Learning Method
Next, we analyzed the effects of noise using another ensemble learning method, Boosting. As shown in Figure 6, GB50, GB100 and GB200 represent the Boosting ensemble with 50, 100, and 200 Boosting stages. Figure 6 shows how the average accuracy for different classifiers and feature sets changed with an increasing number of channels with additive noise. In the case of the Boosting method, for the four feature sets (FE, SE, AE, and PE), the difference between the various classifiers for noise-free EEG signals and noisy EEG signals was clear (paired t-test, p < 0.01). The average accuracy for the Boosting method decreased more slowly than that of KNN without the Boosting method when the number of noisy channels increased.
When using the feature set FE, there were differences in average accuracy between KNN and the Boosting method (paired t-test, p < 0.01). As noise was added to more channels, the average accuracy of both classifiers decreased, but that of the Boosting method decreased more slowly. For the KNN classifier, the average accuracy decreased from 0.966 with noise-free signals to 0.590 with 30 noisy channels. For the Boosting method with 50, 100, and 200 base classifiers, the average accuracies were 0.950, 0.947 and 0.947 with a noise-free signal and dropped to 0.793, 0.806 and 0.792 with 30 noisy channels, respectively. The results for the other three feature sets were similar to those for FE.
The above results show that the Boosting method was unable to improve the recognition effect without noisy channels; however, it did significantly improve the recognition effect with noisy channels, and there was no obvious effect when the number of base classifiers was increased.
The above results are summarized in Table 1. A0 is defined as the average accuracy with noise-free signals, while A30 is defined as the average accuracy with 30 noisy channels. In this paper, A30/A0 is used as an important indicator of robustness. Table 1 summarizes the average accuracy of the three classifiers and the two ensemble methods in the four feature sets obtained from noisy EEG data. It was noted that the Boosting method had significantly different average accuracies from the other methods across all feature sets when the EEG data were polluted. Furthermore, FE achieved a better performance than SE and AE.
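The A30/A0 indicator can be recomputed from the accuracies quoted in the text. The short sketch below does this for the FE feature set under Gaussian noise; the dictionary and function names are illustrative, the input values are the ones reported in the preceding paragraphs, and the ratios are derived here rather than copied from Table 1.

```python
# (A0, A30) pairs quoted in the text for the FE feature set with Gaussian noise.
fe_gaussian = {
    "DT": (0.958, 0.771),
    "SVM": (0.972, 0.682),
    "KNN": (0.966, 0.590),
    "bKNN50": (0.954, 0.534),
    "bKNN100": (0.954, 0.561),
    "bKNN200": (0.954, 0.550),
    "GB50": (0.950, 0.793),
    "GB100": (0.947, 0.806),
    "GB200": (0.947, 0.792),
}


def robustness_ratio(a0, a30):
    """A30/A0: fraction of the noise-free accuracy retained with 30 noisy channels."""
    return a30 / a0


ratios = {name: round(robustness_ratio(a0, a30), 3)
          for name, (a0, a30) in fe_gaussian.items()}

# List the methods from most to least robust according to A30/A0.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:8s} A30/A0 = {ratio:.3f}")
```

A ratio close to 1 means that accuracy is nearly unchanged when all 30 channels are noisy, so the ordering of the ratios gives a compact view of the robustness comparison made in the text.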
Combined entropy has been employed to achieve a better performance [3], but questions remain as to the impact on detection performance when noise is superimposed on some channels. Combined feature sets (FE + SE + AE + PE) of the EEG signals were extracted for training and recognition, before gradually adding noise.
As shown in Figure 7, there were no differences in average accuracy among the KNN, DT, and SVM for the original EEG signals (paired t-test, p > 0.05); however, as noise was added to more channels, the average accuracy of the three classifiers decreased, with DT decreasing more slowly than the others. For the DT classifier, the average accuracy decreased from 0.933 with a noise-free signal to 0.815 with 30 noisy channels. For the SVM and KNN classifiers, the average accuracies were 0.929 and 0.941 with noise-free signals and dropped to 0.688 and 0.762 with 30 noisy channels, respectively. The performance of the DT was better than those of the SVM and KNN in the presence of 30 noisy channels (paired t-test, p < 0.01). For the Bagging method with 50, 100, and 200 base classifiers, the average accuracies were 0.943, 0.943, and 0.943 with noise-free signals and dropped to 0.778, 0.790, and 0.772 with 30 noisy channels, respectively. The above results show that the Bagging method cannot effectively improve the recognition effect of the KNN classifier, either without or with noisy channels (paired t-test, p > 0.05). There was also no obvious effect when the number of base classifiers was increased. For the Boosting method with 50, 100, and 200 base classifiers, the average accuracies were 0.944, 0.942, and 0.953 with noise-free signals and dropped to 0.900, 0.888 and 0.878 with 30 noisy channels, respectively. The above results show that the Boosting method was unable to improve the recognition effect without noisy channels (paired t-test, p > 0.05); however, it could significantly improve the recognition effect with noisy channels (paired t-test, p < 0.01). Furthermore, there were no obvious effects when the number of base classifiers was increased.
The above results are summarized in Table 2, and in conjunction with Table 1, it can be seen that combined entropy can enhance robustness.

Other Performance Indexes
Figure 8 shows a comparison of the different classifiers for the FE feature sets. In this section, Precision, Recall, F1-score and MCC were used as the model performance indicators. A comparison of the results of the different prediction methods and FE feature sets indicated that the GB model and DT classifier were statistically different from all of the other techniques, and achieved a better model performance. This finding further confirmed the advantages of the GB model and DT classifier in modeling complex relationships between EEG signals and the fatigue state.
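For reference, the four indicators named above can be computed from a binary confusion matrix as in the sketch below. It uses the standard textbook definitions, which are assumed (not confirmed in this section) to match those used in the experiments; the function name and the example counts are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Precision, Recall, F1-score and MCC from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Hypothetical counts: "fatigue" is the positive class.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name:9s} {value:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the two classes are not equally represented.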
The above results are summarized in Table 3. A0 is defined as the average accuracy with noise-free signals, while A30 is defined as the average accuracy with 30 noisy channels. In this paper, A30/A0 was used as an important indicator of robustness. Table 3 summarizes the other four performance indexes of the three classifiers and two ensemble methods in the FE feature sets obtained from noisy EEG data. It was noted that the Boosting method differed significantly from the other methods across all four indexes when the EEG data were polluted (paired t-test, p < 0.01).
In this section, we used polluted EEG signals that were generated by adding white Gaussian noise with a different scale factor D to the original EEG signal, as mentioned in Section 2.6.1. This was accomplished by computing the average accuracy under increasing levels of noise (0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0). The changes in A30/A0 with the level of noise were investigated in the 30-channel system.
As shown in Figure 9, we found that the classification accuracy of DT was higher than those of the SVM and KNN for all noise levels (paired t-test, p < 0.01). The difference in classification accuracy between the DT and SVM (KNN) increased as the scale factor increased. Similarly, Figure 10 shows the noise robustness results of the Bagging and Boosting methods. It was found that the classification accuracy of DT was higher than that of the SVM (KNN) for all noise levels. In addition, when the noise power increased, the accuracy difference between the DT and SVM increased. For example, in the noiseless case, the average accuracy difference between the SVM and DT was 1.9%; however, in the cases of D = 0.5 and 1.0, the differences were 5.8% and 8.5%, respectively. These results indicate that the DT method was more robust than the SVM for the polluted EEG signal in the Gaussian white noise case (paired t-test, p < 0.01). Furthermore, there were no significant differences among the four feature sets (paired t-test, p > 0.05).
The above results show that the level of noise did not change the effect of noise on the detection performance. Additionally, these results indicate that the Boosting method significantly enhanced the capabilities and robustness of the system, while the Bagging method was unable to do so.

Spike Noise
The experiment was repeated using spike noise. With a probability of 0.01 and a duration of one sample, the spikes did not seem to significantly affect the match counts and, therefore, the entropy metrics.
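A minimal sketch of such spike noise is given below: each sample is independently replaced by a spiked value with probability 0.01, and each spike lasts one sample. The spike amplitude is not stated in this section, so the default of five times the largest absolute sample value is purely an assumption, as are the function name and the sawtooth test signal.

```python
import random


def add_spike_noise(signal, prob=0.01, amplitude=None, rng=None):
    """Return (noisy_copy, spike_positions) with one-sample spikes added.

    Assumption: spike amplitude defaults to 5x the largest absolute sample.
    """
    rng = rng or random.Random()
    if amplitude is None:
        amplitude = 5 * max(abs(x) for x in signal)
    noisy = list(signal)
    positions = []
    for i in range(len(noisy)):
        if rng.random() < prob:              # each sample spikes independently
            noisy[i] += rng.choice((-1.0, 1.0)) * amplitude
            positions.append(i)
    return noisy, positions


# Toy sawtooth signal standing in for one EEG channel.
channel = [((i % 50) - 25) / 25 for i in range(10000)]
noisy_channel, spikes = add_spike_noise(channel, prob=0.01, rng=random.Random(0))
print(f"{len(spikes)} of {len(channel)} samples were spiked")
```

Since roughly 1 sample in 100 is altered, only the few template vectors that contain a spike are affected when matches are counted, which is in line with the small effect on the entropy features described above.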
Based on the results of Section 3.1, among the four feature sets, FE performed best, and PE was the worst. Among the three classifiers, DT was the best, and SVM was the weakest. Here, we compare the detection performance for the three classifiers and four feature sets with the addition of spike noise.
Figure 11 shows the variation of the average accuracy as a function of the number of noisy channels. Unlike the results described in Section 3.1, with an increase in the number of noisy channels for the five feature sets (FE, SE, AE, PE and Combined), the average accuracy was almost unchanged for the different classifiers (paired t-test, p > 0.05). Given the low frequency of the spike noise and the entropy feature extraction method, spike noise had little effect on classification performance. For the four kinds of entropy and the nine classification models, the average accuracy varied only within a small range.
A0 is defined as the average accuracy with noise-free signals, while A30 is defined as the average accuracy with 30 noisy channels. In this paper, A30/A0 was used as an important indicator of robustness. Table 4 summarizes the average accuracy of the three classifiers in the four feature sets obtained from noisy EEG data with the addition of spike noise. It was noted that spike noise made no difference to the average accuracy across all classifiers and feature sets (paired t-test, p > 0.05).

EMG Noise
The experiment was repeated using simulated EMG noise. Based on the results described in Section 3.1, among the four feature sets, FE performed best, and PE was the worst. Among the three classifiers, DT was the best, and SVM was the weakest. Here we compare the detection performance for the three classifiers and four feature sets with EMG noise.
Figure 12 shows the variation of the average accuracy as a function of the number of noisy channels. For the four feature sets (FE, SE, AE, and PE), the difference between the various classifiers for the noise-free EEG signal and the noisy EEG signal was clear and remained consistent, with a greater performance of the DT classifier than those of the SVM and KNN classifiers. However, unlike the results seen in Section 3.1, DT decreased significantly.
When using the feature set FE, there were no differences in average accuracy between the KNN, DT, and SVM classifiers for the original EEG signals (paired t-test, p > 0.05). However, as noise was superimposed on more channels, the average accuracy of the three classifiers decreased. For the DT classifier, the average accuracy decreased from 0.939 with noise-free signals to 0.816 with 30 noisy channels. For the SVM and KNN, the average accuracies were 0.976 and 0.966 with a noise-free signal and dropped to 0.810 and 0.673 with 30 noisy channels, respectively. The performance of DT and SVM was better than that of KNN in the presence of 30 noisy channels (paired t-test, p < 0.01).
When using the feature set AE, there were no differences in average accuracy among the KNN, DT, and SVM for the original EEG signals (paired t-test, p > 0.05). However, as noise was added to more channels, the average accuracy of the three classifiers decreased. For the DT classifier, the average accuracy decreased from 0.899 with a noise-free signal to 0.742 with 30 noisy channels. For the SVM and KNN, the average accuracies were 0.952 and 0.925 with noise-free signals and dropped to 0.709 and 0.610 with 30 noisy channels, respectively. Therefore, the performance of DT was better than those of the SVM and KNN classifiers in the presence of 30 noisy channels (paired t-test, p < 0.01), and the results for SE were similar to those for AE.
When using the feature set PE, there were no differences in the average accuracy among the KNN, DT and SVM for the original EEG signals (paired t-test, p > 0.05). However, as noise was added to more channels, the average accuracy of the three classifiers decreased until the final average accuracies were almost the same. For the SVM classifier, the average accuracy decreased from 0.766 with a noise-free signal to 0.652 with 30 noisy channels. For the DT and KNN, the average accuracies were 0.746 and 0.721 with a noise-free signal and dropped to 0.642 and 0.546 with 30 noisy channels, respectively.
The above results show that: (1) when EMG signals were superimposed, the effect of noise was greater than that of the Gaussian noise; and (2) similarly, while the FE feature set had higher robustness, AE and SE were similar. The average accuracy for the PE feature set with the original noise-free EEG was not high, and was significantly lower than those of the FE, SE, and AE feature sets (paired t-test, p < 0.01), so there was little difference between the three classifiers. Finally, DT had the best robustness, and SVM and KNN were similar.

Effect of Noise: Using Bagging Ensemble Learning Method
As in Section 3.1.2, the average accuracy for the Bagging method decreased in the same way as that for KNN without the Bagging method when the number of noisy channels increased (paired t-test, p > 0.05). As shown in Figure 13, when using the feature set FE, there were no differences in average accuracy between KNN without the Bagging method and KNN with the Bagging method (paired t-test, p > 0.05). For the KNN classifier, the average accuracy decreased from 0.966 with a noise-free signal to 0.673 with 30 noisy channels. For the Bagging method with 50, 100, and 200 base classifiers, the average accuracies were 0.965, 0.966 and 0.966 with the noise-free signals and dropped to 0.688, 0.693 and 0.692 with 30 noisy channels, respectively. The results for the other three feature sets were similar to those for FE.
The above results show that the Bagging method could not significantly improve the recognition effect of the KNN classifier, either without or with noisy channels, and there were no obvious effects when the number of base classifiers was increased.


Effect of Noise: Using Boosting Ensemble Learning Method
As shown in Figure 9, in the case of the Boosting method, for the four feature sets (FE, SE, AE, and PE), the difference between the various classifiers for the noise-free EEG signal and the noisy EEG signal was clear and remained consistent. The average accuracy for the Boosting method decreased more slowly than that for KNN without the Boosting method when the number of noisy channels increased.
As shown in Figure 14, when using the feature set FE, there was a difference in average accuracy between the KNN and the Boosting method (paired t-test, p < 0.01). As noise was added to more channels, the average accuracy of both classifiers decreased, but that of the Boosting method decreased more slowly. For the KNN classifier, the average accuracy decreased from 0.966 with a noise-free signal to 0.673 with 30 noisy channels (paired t-test, p < 0.01). For the Boosting method with 50, 100, and 200 base classifiers, the average accuracies were 0.944, 0.947 and 0.945 with noise-free signals and dropped to 0.864, 0.877 and 0.873 with 30 noisy channels, respectively. The results for the other three feature sets were similar to those for FE.
The above results show that the Boosting method could not effectively improve the recognition effect of the KNN classifier without noisy channels; however, it could significantly improve the recognition effect of the KNN classifier with noisy channels. Additionally, there was no obvious effect when the number of base classifiers was increased.

As shown in Figure 15, there were no differences in average accuracy among KNN, DT, and SVM for the unpolluted EEG signals (paired t-test, p > 0.05). However, as EMG noise was added to more channels, the average accuracy of all three classifiers decreased, with DT decreasing more slowly. For the DT classifier, the average accuracy decreased from 0.932 with a noise-free signal to 0.892 with 30 noisy channels. For the SVM and KNN, the average accuracies were 0.929 and 0.941 with noise-free signals and dropped to 0.766 and 0.838 with 30 noisy channels, respectively. DT performed better than the SVM and KNN classifiers in the presence of 30 noisy channels (paired t-test, p < 0.01).
For the Bagging method with 50, 100, and 200 base classifiers, the average accuracies were 0.943, 0.943, and 0.943 with noise-free signals and dropped to 0.833, 0.849, and 0.841 with 30 noisy channels, respectively (paired t-test, p > 0.05). These results showed that the Bagging method could not effectively improve the recognition effect of the KNN classifier either with or without noisy channels. There was also no obvious effect when the number of base classifiers was increased.
For the Boosting method with 50, 100, and 200 base classifiers, the average accuracies were 0.941, 0.952, and 0.953 with noise-free signals and dropped to 0.923, 0.929, and 0.933 with 30 noisy channels, respectively (paired t-test, p < 0.01). These results showed that the Boosting method was unable to improve the recognition effect without noisy channels; however, it could significantly improve the recognition effect with noisy channels. There was no obvious effect when increasing the number of base classifiers.
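The accuracies quoted above for the combined feature set under EMG noise can be condensed into the A30/A0 ratio that the paper uses elsewhere (A0 is the noise-free accuracy, A30 the accuracy with 30 noisy channels). The following is a minimal sketch of that arithmetic; the function name `retained_accuracy` and the dictionary layout are illustrative assumptions, and only the 200-member ensembles are included for brevity.

```python
# Accuracies quoted in the text for the combined feature set under EMG noise.
# Each entry is (A0, A30): noise-free accuracy and accuracy with 30 noisy channels.
reported = {
    "DT": (0.932, 0.892),
    "SVM": (0.929, 0.766),
    "KNN": (0.941, 0.838),
    "bKNN200": (0.943, 0.841),  # Bagging with 200 KNN base classifiers
    "GB200": (0.953, 0.933),    # Boosting with 200 stages
}


def retained_accuracy(a0, a30):
    """Return A30/A0, the fraction of noise-free accuracy kept with 30 noisy channels."""
    return a30 / a0


ratios = {name: retained_accuracy(a0, a30) for name, (a0, a30) in reported.items()}

# Print the classifiers from most to least robust according to A30/A0.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:8s} A30/A0 = {ratio:.3f}")
```

A ratio close to 1 means that little accuracy is lost when all 30 channels are noisy. With these figures the Boosting ensemble and DT retain the most accuracy and SVM the least, which matches the ordering described in the text.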

Conclusions
In this study, an approach based on simulated Gaussian noise was proposed to investigate the effect of different classifiers and four feature sets in detecting driver fatigue in an EEG-based system. For this purpose, we generated noise-corrupted EEG signals using simulated Gaussian noise, spike noise, and simulated EMG noise. Next, we assessed the detection performance of various classifiers with a varied number of noisy channels. Using the experimental driver-fatigue EEG and the generated noisy signals, we compared the classification results of the DT, SVM, and KNN methods. From our results, it was evident that DT showed greater noise robustness than the SVM and KNN methods. Furthermore, the results showed that the classification accuracy of FE and the combined feature set was better than that of the other feature sets. It was also found that the Bagging method could not effectively improve performance with noise, while the Boosting method may have effectively improved performance with noise.
Practically, the proposed method may face more problems when EEG is acquired outside the lab. One of the most important is noise, as many artifacts may affect driving fatigue recognition. Some research has focused on artifact removal methods applied before the feature extraction process, but these methods may themselves cause problems when eliminating the artifacts and may also weaken the features, as with the average method. In addition, they may add computational complexity and processing time, which is unfavorable in practical applications. This study revealed that an extraction method with an appropriate combination of entropy features (such as FE or the combined feature set) and classifier (such as DT or Boosting) could not only improve the recognition rate but also weaken the impact of noise on the recognition rate.
However, this study has some limitations: (1) The number of subjects was relatively small. Although the existing literature suggests that 22 subjects is not too small a sample size, the number still needs to be increased; (2) Only three commonly used classifiers and four feature sets were compared in this study; (3) For simplicity, the noise and the original signal were subject to linear superposition. However, the models of external noise are diverse, and the models of its interaction with the original EEG signal are also diverse. Finally, the different impacts of different channels were not considered.
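Limitation (3) refers to linear superposition, meaning that each noisy sample is simply the original sample plus a noise sample. The sketch below illustrates this for Gaussian noise on a chosen subset of channels. It is not the authors' implementation: the function name `add_gaussian_noise`, the `scale` parameter, the seed, and the synthetic all-zero signal are assumptions made for illustration.

```python
import random


def add_gaussian_noise(eeg, noisy_channels, scale=1.0, seed=0):
    """Linearly superimpose zero-mean Gaussian noise on selected channels.

    eeg            : list of channels, each a list of samples
    noisy_channels : indices of the channels to corrupt
    scale          : hypothetical noise standard deviation (not taken from the paper)
    """
    rng = random.Random(seed)
    targets = set(noisy_channels)
    noisy = []
    for idx, channel in enumerate(eeg):
        if idx in targets:
            # Linear superposition: noisy sample = original sample + noise sample.
            noisy.append([x + rng.gauss(0.0, scale) for x in channel])
        else:
            # Untouched channels are copied so the input is never modified.
            noisy.append(list(channel))
    return noisy


# Synthetic stand-in for a 30-channel recording (8 samples per channel).
eeg = [[0.0] * 8 for _ in range(30)]
corrupted = add_gaussian_noise(eeg, noisy_channels=range(10), scale=0.5)
```

Varying the length of `noisy_channels` from 0 to 30 mirrors the "number of noisy channels" axis used in the figures. A non-additive interaction between noise and EEG, as mentioned in the limitation, would require replacing the addition inside the loop.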

Figure 2. Snapshot of the experimental setup.


T
, defined as the maximum difference values between the corresponding elements of two vectors:
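The distance described in this fragment, the largest difference between corresponding elements of two vectors, is the Chebyshev distance. A minimal sketch follows, assuming absolute differences are intended, as is usual in sample and approximate entropy; the function name `chebyshev_distance` and the example vectors are illustrative and do not come from the paper.

```python
def chebyshev_distance(u, v):
    """Maximum absolute difference between corresponding elements of u and v."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return max(abs(a - b) for a, b in zip(u, v))


# Element-wise absolute differences are 1.0, 3.0 and 0.5, so the distance is 3.0.
d = chebyshev_distance([1.0, 4.0, 2.0], [2.0, 1.0, 2.5])
print(d)
```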

Figure 4. Influence of superimposed noise on the average accuracy for three classifiers when using (a) FE feature set; (b) SE feature set; (c) AE feature set and (d) PE feature set. The left vertical coordinate is for average accuracy, while the horizontal coordinate is for the number of noisy channels.

Figure 5. Influence of added noise on the average accuracy for three classifiers with the Bagging method when using (a) FE feature set; (b) SE feature set; (c) AE feature set and (d) PE feature set. The left vertical coordinate is for average accuracy, while the horizontal coordinate is for the number of noisy channels. bKNN50, bKNN100 and bKNN200 represent the Bagging ensemble method with 50, 100, and 200 K-Nearest Neighbors (KNN) base classifiers.

3.1.3. Effect of Noise: Using Boosting Ensemble Learning Method
Next, we analyzed the effects of noise using another ensemble learning method, Boosting. As shown in Figure 6, GB50, GB100 and GB200 represent the Boosting ensemble with 50, 100, and 200 Boosting stages. Figure 6 shows how the average accuracy for different classifiers and feature sets changed with increasing channels of additive noise. In the case of the Boosting method, for the four feature sets (FE, SE, AE, and PE), the difference between the various classifiers for noise-free EEG signals and noisy EEG signals was clear (paired t-test, p < 0.01). The average accuracy for the Boosting method decreased more slowly than that of KNN without the Boosting method when the number of noisy channels increased. When using the feature set FE, there were differences in average accuracy between KNN and the Boosting method (paired t-test, p < 0.01). As noise was added to more channels, the average accuracy of both classifiers decreased, but that of the Boosting method decreased more slowly. For the KNN classifier, the average accuracy decreased from 0.966 with noise-free signals to 0.590 with 30 noisy channels. For the Boosting method with 50, 100, and 200 base classifiers, the average accuracies were 0.950, 0.947 and 0.947 with noise-free signals and dropped to 0.793, 0.806 and 0.792 with 30 noisy channels, respectively. The results for the other three feature sets were similar to those for FE. The above results show that the Boosting method was unable to improve the recognition effect without noisy channels; however, it did significantly improve the recognition effect with noisy channels, and there was no obvious effect when the number of base classifiers was increased.


Figure 6. Influence of added noise on the average accuracy for three classifiers using the Boosting method when using (a) FE feature set; (b) SE feature set; (c) AE feature set and (d) PE feature set. The left vertical coordinate is for average accuracy, while the horizontal coordinate is for the number of noisy channels. GB50, GB100, and GB200 represent the Boosting ensemble with 50, 100, and 200 Boosting stages.

Figure 7. Comparison of different classifiers for impact of noise on detection performance with combined feature sets. The left vertical coordinate is for average accuracy, while the horizontal coordinate is for the number of noisy channels.

Figure 8 shows a comparison of the different classifiers for the FE feature sets. In this section, Precision, Recall, F1-score and MCC were used as the model performance indicators. A comparison

Figure 8. Comparison of different classifiers on the impact of noise on the detection performance of fuzzy entropy (FE) feature sets. The left vertical coordinate is for average Precision, Recall, F1-score and Matthews Correlation Coefficient (MCC), while the horizontal coordinate is for the number of noisy channels. Panels (a-e) represent the classifiers KNN, DT, SVM, bKNN200 and GB200, respectively.

Figure 9. Relationship between A30/A0 and levels of noise for the three classifiers for (a) KNN; (b) DT; (c) SVM. The left vertical coordinate is the value of A30/A0, while the horizontal coordinate is the scale level of noise.

Figure 10. Relationship between A30/A0 and levels of noise based on three classifiers for (a) Bagging method and (b) Boosting method. The left vertical coordinate is A30/A0, while the horizontal coordinate is the level of noise. In each subfigure, from top to bottom, from left to right, the results are based on FE, SE, AE, and PE, respectively.

Figure 11. Influence of superimposed noise on the average accuracy for the three classifiers and two ensemble methods based on (a) FE feature set; (b) SE feature set; (c) AE feature set; (d) PE feature set and (e) combined feature set. The left vertical coordinate is for average accuracy, while the horizontal coordinate is for the number of noisy channels. bKNN50, bKNN100 and bKNN200 represent the Bagging ensemble with 50, 100, and 200 KNN base classifiers. GB50, GB100 and GB200 represent the Boosting ensemble with 50, 100, and 200 Boosting stages.

Figure 12. Influence of superimposed noise on the average accuracy for the three classifiers when using (a) FE feature set; (b) SE feature set; (c) AE feature set and (d) PE feature set. The left vertical coordinate is for average accuracy, while the horizontal coordinate is for the number of noisy channels.

Figure 13. Influence of added noise on the average accuracy for three classifiers with the Bagging method when using (a) FE feature set; (b) SE feature set; (c) AE feature set and (d) PE feature set. The left vertical coordinate is for average accuracy, while the horizontal coordinate is for the number of noisy channels. bKNN50, bKNN100 and bKNN200 represent the Bagging ensemble with 50, 100, and 200 KNN base classifiers.

Figure 15. Comparison of different classifiers for the impact of noise on detection performance with combined feature sets. The left vertical coordinate is for average accuracy, while the horizontal coordinate is for the number of noisy channels.

Table 1. Results of the analysis of the average accuracy with simulated noise electroencephalogram (EEG) signals in a 30-channel system. A0 is defined as the average accuracy with noise-free signals while A30 is defined as the average accuracy with 30 noisy channel signals.

Table 2. Results of the analysis of average accuracy with simulated noise EEG signals for the combined feature set. A0 is defined as the average accuracy with noise-free signals while A30 is defined as the average accuracy with 30 noisy channel signals.

Table 3. Results of the analysis of average Precision, Recall, F1-score and MCC with simulated noise EEG signals in a 30-channel system. A0 is defined as the average accuracy with noise-free signals while A30 is defined as the average Precision, Recall, F1-score and MCC with 30 noisy channel signals.

Table 4. Analysis of average accuracy with simulated spike noise EEG signals. A0 is defined as the average accuracy with noise-free signals while A30 is defined as the average accuracy with 30 noisy channel signals.