Driver Fatigue Detection via Differential Evolution Extreme Learning Machine Technique

Abstract: Fatigue driving (FD) is one of the main causes of traffic accidents. Traditionally, machine learning technologies such as the back propagation neural network (BPNN) and the support vector machine (SVM) are widely used for fatigue driving detection. However, the BPNN exhibits slow convergence and has many adjustable parameters, while the SVM is difficult to train on large-scale samples. In this paper, we develop an extreme learning machine (ELM)-based FD detection method to avoid these disadvantages. Further, since the randomness of the weights and biases between the input layer and the hidden layer of the ELM influences its generalization performance, we apply a differential evolution ELM (DE-ELM) method to the analysis of the driver's respiration and heartbeat signals, which can effectively judge the driver's fatigue state. Moreover, a Doppler radar and a smart bracelet are used to obtain the driver's respiration and heartbeat signals, and the sample database required for the experiment is established through extensive signal collection. Experimental results show that the DE-ELM performs better on driver fatigue level detection than the traditional ELM and SVM.


Introduction
With the rapid development of automobile technology, car ownership has increased rapidly over the past decades. However, the frequent occurrence of road traffic accidents has brought social problems that seriously threaten the safety of human life and property. According to data from the World Health Organization [1], more than 1.2 million people die in traffic accidents each year, and millions are injured or maimed. Due to the increase in the number of traffic accidents, the severity of this problem has drawn considerable attention from society and governments [2]. Therefore, how to prevent traffic accidents has become one of the most important issues worldwide. According to relevant research, traffic accidents caused by fatigue driving (FD) account for 20-30% of all traffic accidents, which indicates that FD is a major cause of traffic accidents [3]. Under fatigue, drivers tend to be distracted, less active, and slower in brain response, which increases the likelihood of traffic accidents [4]. FD detection (FDD) techniques have broad development prospects in the prevention of traffic accidents and have gradually attracted intensive attention from researchers, the automotive industry, and government organizations.

Based on the experimental platform, a set of data collected from the platform has been tested. The corresponding test results are described as follows: (i) The waveform shown in Figure 2 is the signal collected by the radar module without a tester present. As seen from the waveform, there is no signal input other than a small amount of noise. (ii) The waveform shown in Figure 3 is the signal collected by the radar module with the tester breathing normally. The waveform changes periodically, and in each cycle the signal should be a superposition of the respiratory signal, the heartbeat signal, and noise. (iii) The waveform shown in Figure 4 is the signal collected by the radar module while the tester holds his breath. The amplitude of the signal change is very small, which indicates that the heartbeat signal collected by the radar module is very weak and mixed with noise. Therefore, during signal collection, a smart bracelet is added to detect the driver's heartbeat signal in real time, which is recorded every 2 min.

Through the above tests, it is noted that the Doppler radar module can be used to detect human respiratory signals. As for the heartbeat signal, several experiments have shown that it is very weak and is basically covered by noise, such that the Doppler radar module cannot detect it. Instead, the smart bracelet is used for heart rate collection.
The experiment recruited 7 drivers as test subjects, 6 men and 1 woman, aged between 22 and 30 years. Subjects were required to be in good health, with normal hearing and vision and no red-green color blindness. Before the experiment, each tester was required to get sufficient sleep. After the whole experimental platform was debugged, the testers conducted simulated driving, with each test period lasting 3 h. Throughout the three-hour test, the tester experiences different fatigue states. The Doppler radar module and the smart bracelet collect the respiratory and heartbeat signals, and video is recorded synchronously to capture facial features. Each test therefore yields a data set containing respiration and heartbeat signals and the corresponding video signals. The entire data set is then reviewed by experts and classified into different fatigue levels.

Sample Library
In Section 2, the data are classified through the facial expert evaluation method. This method was first introduced for driver fatigue assessment and has become the most practical method for evaluating the fatigue state of drivers [41]. The specific operation procedure is as follows. Firstly, the video signals and the synchronously collected radar and smart bracelet signals are segmented every 2 min and stored in random order. Secondly, three facial experts score each segment based on multiple indicators such as rubbing the eyes, scratching the face, yawning, closing the eyes, and adjusting posture. The evaluation result is a continuous value between 0 and 3. The specific classification criteria are described in Table 1. If more than two facial experts agree on a certain fatigue level, then that level is assigned to the driver in the video signal. If the evaluation levels of the three experts differ, the segment is re-evaluated, and the three experts finally discuss and determine the fatigue level. After the fatigue level is determined, the signals are labeled for subsequent neural network learning. The video evaluation results correspond to the synchronous radar signals and smart bracelet signals, and serve as the criteria and basis for fatigue driving evaluation.

After all signals are classified according to the expert evaluation mechanism, filtering and the discrete Fourier transform (DFT) are applied to each group of data. A zero-phase infinite impulse response (IIR) filter is used for filtering, which can completely eliminate the signal phase distortion and improve the real-time performance of detection at the cost of some additional computation [42]. After filtering, the DFT of the signal is computed to obtain the spectrum, from which both the frequency and the amplitude of the signal are extracted. Figures 3 and 5 show the respiratory signal and its amplitude-frequency characteristics, respectively. Finally, the following characteristic values are determined as the training sample data: $R_C$, $R_A$, $H_R$, where $R_C$ is the respiratory cycle, $R_A$ is the respiratory amplitude, and $H_R$ indicates the heart rate.

The sample library S is built from pairs of X and T, where X is the input data and T is the output label corresponding to X. The label is given by an impact function h(x) of the fatigue state s, where s takes the value 1, 2, 3 or 4, which correspondingly represents the sober state, first-level fatigue state, second-level fatigue state and third-level fatigue state, i is the sample index, and N is the total sample size. A total of 720 sets of respiration and heartbeat data were collected in this experiment. After obtaining the complete data set S, we also need to divide it into a training set and a test set by using a subject-wise method. The division should follow three principles: (i) samples are randomly assigned, (ii) training set sample size : test set sample size = 7:3, and (iii) each fatigue level has the same number of samples.
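The filtering and DFT step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the first-order forward-backward filter, the smoothing factor `alpha`, the sampling rate `fs`, and the synthetic test segment are all assumptions chosen for illustration. Running a causal IIR filter forward and then backward over the segment cancels the phase shift, and the dominant DFT peak then gives an estimate of the respiratory cycle $R_C$ and amplitude $R_A$.

```python
import numpy as np

def zero_phase_lowpass(x, alpha=0.5):
    """Forward-backward first-order IIR low-pass filter (zero net phase shift).

    alpha is a hypothetical smoothing factor, not a value from the paper.
    """
    def one_pass(v):
        y = np.empty(v.size, dtype=float)
        acc = v[0]  # start from the first sample to limit the edge transient
        for n, s in enumerate(v):
            acc = alpha * s + (1.0 - alpha) * acc
            y[n] = acc
        return y

    forward = one_pass(np.asarray(x, dtype=float))
    # Filtering the reversed output and reversing again cancels the phase lag.
    return one_pass(forward[::-1])[::-1]

def respiratory_features(signal, fs):
    """Return (respiratory cycle in seconds, respiratory amplitude)."""
    x = np.asarray(signal, dtype=float)
    x = zero_phase_lowpass(x - x.mean())
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    k = 1 + int(np.argmax(np.abs(spectrum[1:])))   # skip the DC bin
    amplitude = 2.0 * np.abs(spectrum[k]) / x.size  # single-sided amplitude
    return 1.0 / freqs[k], amplitude

# Synthetic 2 min segment: 0.25 Hz "breathing" plus a weaker 2 Hz disturbance.
fs = 10.0
t = np.arange(1200) / fs
segment = np.sin(2 * np.pi * 0.25 * t) + 0.2 * np.sin(2 * np.pi * 2.0 * t)
cycle, amp = respiratory_features(segment, fs)
print(f"respiratory cycle = {cycle:.2f} s, amplitude = {amp:.2f}")
```

The filter slightly attenuates the amplitude at the breathing frequency, so the extracted amplitude is a relative feature rather than a calibrated one.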
In the following section, we will introduce the basic principles of ELM and DE-ELM in detail and further give the DE-ELM-based FDD approach.

Extreme Learning Machine
ELM is a single-hidden-layer feedforward network (SLFN) proposed by Guangbin Huang [43]. It is described by the following quantities: the dimension of the input feature vector n, the total number of samples N, the number of hidden layer neurons L, the dimension of the outputs m, and the data set $S = \{X, T\}$, where X is the input data and T is the output label. The hidden layer input weight matrix is W, the hidden layer bias is b, and the hidden layer output weight matrix is β. The activation function selected in this paper is the sigmoid function, which is defined as:

$$g(x) = \frac{1}{1 + e^{-x}}$$

The activation function adds nonlinear characteristics to the network, which makes learning faster and more efficient [44]. The output $O^{(i)}$ for the i-th sample can be expressed as:

$$O^{(i)} = \sum_{j=1}^{L} \beta_j \, g\left(W_j \cdot X^{(i)} + b_j\right), \quad i = 1, 2, \ldots, N$$

The goal of the neural network learning is to minimize the output error. That is, there exist β, $W_j$ and $b_j$ such that:

$$\sum_{j=1}^{L} \beta_j \, g\left(W_j \cdot X^{(i)} + b_j\right) = T^{(i)}, \quad i = 1, 2, \ldots, N \qquad (17)$$

For the entire training set, Equation (17) can be expressed in matrix form as:

$$H\beta = T$$

where H is the hidden layer output matrix, whose entry in row i and column j is $g(W_j \cdot X^{(i)} + b_j)$. For fixed input weights and hidden layer biases, training an SLFN is simply equivalent to finding a least-squares solution $\hat{\beta}$ of the linear system $H\beta = T$:

$$\left\| H\hat{\beta} - T \right\| = \min_{\beta} \left\| H\beta - T \right\|$$

According to the minimum norm criterion, the solution is obtained as:

$$\hat{\beta} = H^{\dagger} T \qquad (22)$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the hidden layer output matrix H. In summary, given the training data and a randomly initialized input weight matrix, the output weight matrix can be obtained through Equation (22). The design of the ELM neural network model for fatigue driving detection is shown in Figure 6.

The ELM possesses the advantages of high learning efficiency and strong generalization ability and is thus widely used in classification, regression, clustering, feature learning and other problems [45]. However, since the input weights and the hidden layer biases of the ELM are randomly assigned, they may not be the optimal choices for the input data. In practical applications, more hidden layer neurons may be needed for the network to achieve better generalization performance, which increases the complexity of the network. To compensate for these shortcomings, we introduce the differential evolution algorithm in the following section to optimize the weights and biases of the ELM, such that the optimal network structure can be obtained.
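The ELM training procedure described above (random input weights and biases, a sigmoid hidden layer, and output weights obtained from the Moore-Penrose generalized inverse) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' code: the uniform range [-1, 1] for the random weights, the one-hot label encoding, the row-per-sample matrix layout, and the synthetic three-feature data are all hypothetical choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, L, rng):
    """Train an ELM with L hidden neurons on inputs X (N x n), labels T (N x m)."""
    n = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n, L))  # random input weights (assumed range)
    b = rng.uniform(-1.0, 1.0, size=L)       # random hidden biases (assumed range)
    H = sigmoid(X @ W + b)                   # hidden layer output matrix, N x L
    beta = np.linalg.pinv(H) @ T             # least-squares output weights, L x m
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Synthetic stand-in for (R_C, R_A, H_R) features with four fatigue classes.
rng = np.random.default_rng(0)
centers = np.array([[-2.0, -2.0, 0.0], [2.0, -2.0, 0.0],
                    [-2.0, 2.0, 0.0], [2.0, 2.0, 0.0]])
labels = np.repeat(np.arange(4), 15)          # class indices 0..3 (states 1..4)
X = centers[labels] + 0.3 * rng.normal(size=(60, 3))
T = np.eye(4)[labels]                         # one-hot labels (assumed encoding)

W, b, beta = elm_train(X, T, L=20, rng=rng)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
train_acc = float((pred == labels).mean())
print(f"training accuracy on the toy data: {train_acc:.2f}")
```

Only `beta` is fitted; `W` and `b` stay at their random values, which is the source of the variability that the next section sets out to reduce.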

Differential Evolution ELM (DE-ELM)
Differential evolution (DE), proposed by Storn and Price in 1995, is a simple yet powerful evolutionary algorithm (EA) [39]. The basic idea of the algorithm is as follows: starting from a randomly generated initial population, a new individual is generated by adding the vector difference of any two individuals in the population to a third individual, and the new individual is then compared with the corresponding individual in the current population. If the fitness of the new individual is better than that of the current individual, the new individual replaces the old one in the next generation; otherwise, the old individual is retained. Through continuous evolution, the algorithm keeps the good individuals, eliminates the bad ones, and guides the search toward the optimal solution. Compared with most available evolutionary algorithms, DE has the advantages of simple structure, fast convergence, few adjustable parameters, and strong robustness.
Next, we show the detailed mathematical description of DE algorithm in the following.
Step 1: Initialization. We randomly generate NP individuals to form the initial population, where D is the dimension of each individual. The i-th individual $\theta_i(g)$ in the g-th iteration can be written as:

$$\theta_i(g) = \left[\theta_{i,j}(g)\right], \quad i = 1, 2, \ldots, NP; \; j = 1, 2, \ldots, D$$

The value of the j-th dimension of the i-th individual, $\theta_{i,j}(g)$, can be obtained by the following equation:

$$\theta_{i,j}(g) = \theta_{\min} + \mathrm{rand}(0, 1) \cdot (\theta_{\max} - \theta_{\min}) \qquad (24)$$

where $\theta_{\max}$ and $\theta_{\min}$ represent the upper and lower bounds of each parameter, $\theta_{\min} \le \theta_{i,j}(g) \le \theta_{\max}$, and rand(0, 1) represents a random number uniformly distributed in the interval (0, 1).
Step 2: Individual Evaluation. In this step, the entire population is evaluated, that is, the fitness function value of each individual in the population is calculated.
Step 3: Mutation Operation. DE achieves the mutation of individuals through a differential strategy, which is also an important difference from genetic algorithms. The differential strategy used in this paper is to randomly select two different individuals in the population, scale their vector difference, and add it to a third individual, that is:

$$v_i(g + 1) = \theta_{r_1}(g) + F \cdot \left(\theta_{r_2}(g) - \theta_{r_3}(g)\right) \qquad (25)$$

where $r_1$, $r_2$, $r_3$ are randomly chosen integers in the range [1, NP] with $r_1 \neq r_2 \neq r_3 \neq i$, $\theta_{r_2}(g) - \theta_{r_3}(g)$ is the differential variation, $v_i(g + 1)$ is the new mutation individual, and the constant factor F is a scaling parameter, which is used to control the amplification of the differential variation.
In the mutation process, in order to ensure the validity of the solution, it must be determined whether the parameters of each individual are between the maximum and minimum values. If this condition is not met, they will be regenerated by using Equation (24).
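Step 4: Crossover Operation. Crossover is introduced into the differential evolution algorithm to increase the diversity of the population. The crossover operation is described as follows:

$$u_{i,j}(g + 1) = \begin{cases} v_{i,j}(g + 1), & \text{if } \mathrm{rand}(0, 1) \le CR \text{ or } j = j_{rand} \\ \theta_{i,j}(g), & \text{otherwise} \end{cases}$$

where CR is the crossover probability and $j_{rand}$ is a random integer generated in the set {1, 2, ..., D}.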
Step 5: Selection Operation. The purpose of this step is to generate the individuals of the population in generation g + 1. Between the target individual $\theta_i(g)$ and the trial individual $u_i(g + 1)$ obtained in the previous step, the one with the better fitness is selected as the individual $\theta_i(g + 1)$ of the generation g + 1 population:

$$\theta_i(g + 1) = \begin{cases} u_i(g + 1), & \text{if } f(u_i(g + 1)) < f(\theta_i(g)) \\ \theta_i(g), & \text{otherwise} \end{cases}$$

where f is the fitness function. The individual with the smaller fitness function value is kept in the generation g + 1 population, replacing the previous individual. Meanwhile, g = g + 1.
Step 6: Stop Test. Check whether the termination condition is satisfied or the maximum number of generations has been reached. If so, the evolution is terminated, and the best parameters obtained so far are output as the solution. Otherwise, the program returns to Step 2.
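Steps 1 to 6 can be condensed into a short NumPy sketch. It is a minimal illustration rather than the authors' implementation: the function name `differential_evolution`, the sphere test function, the bounds, and the fixed seed are assumptions, and the stop test is reduced to a fixed number of generations `g_max`.

```python
import numpy as np

def differential_evolution(f, D, lo, hi, NP=20, F=0.7, CR=0.8, g_max=30, seed=0):
    """Minimize f over [lo, hi]^D with the six steps described in the text."""
    rng = np.random.default_rng(seed)
    # Step 1: random initial population within the bounds.
    pop = lo + rng.random((NP, D)) * (hi - lo)
    # Step 2: evaluate the fitness of every individual.
    fit = np.array([f(ind) for ind in pop])
    for _ in range(g_max):                       # Step 6: fixed generation budget
        new_pop, new_fit = pop.copy(), fit.copy()
        for i in range(NP):
            # Step 3: mutation with three distinct individuals, none equal to i.
            others = [k for k in range(NP) if k != i]
            r1, r2, r3 = rng.choice(others, size=3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])
            # Regenerate any component that left the feasible range.
            out = (v < lo) | (v > hi)
            v[out] = lo + rng.random(int(out.sum())) * (hi - lo)
            # Step 4: crossover; component j_rand always comes from the mutant.
            mask = rng.random(D) <= CR
            mask[rng.integers(D)] = True
            u = np.where(mask, v, pop[i])
            # Step 5: keep the trial individual only if its fitness is smaller.
            fu = f(u)
            if fu < fit[i]:
                new_pop[i], new_fit[i] = u, fu
        pop, fit = new_pop, new_fit
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])

def sphere(x):
    """Hypothetical test objective with its minimum at the origin."""
    return float(np.sum(x ** 2))

best_x, best_f = differential_evolution(sphere, D=3, lo=-5.0, hi=5.0)
print("best individual:", best_x, "fitness:", best_f)
```

Because selection never accepts a worse individual, the best fitness in the population cannot increase from one generation to the next.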
In order to reduce the number of hidden layer neurons and improve the generalization performance of the neural network, the global optimization capability of the DE algorithm is applied to the selection of the input weights and the hidden layer biases of the ELM. Figure 7 shows the algorithm flow of the DE-ELM. The optimization problem to be solved is min f(x), where f(x) is the fitness function. The cost function E is taken to be the root mean squared error (RMSE) [46], and the RMSE on the whole training dataset is used as the fitness function. In the following section, we will carry out experiments to compare the classification performance of the ELM, DE-ELM and SVM on the fatigue driving dataset.

Remark 1: Before training, we need to determine the parameters of the DE algorithm. The corresponding parameter selection criteria are as follows. (i) The population size NP is the number of individuals in the population. When the population is large, it exhibits more diversity, which gives a larger search space and a greater possibility of finding the optimal solution, but the convergence rate is reduced. On the other hand, when the population is small, convergence is fast, but sometimes the global optimal solution cannot be obtained. (ii) The scaling parameter F controls the amplification of the differential variation and balances the local search and the global search of the algorithm. When F is large, the differential variation has a big impact on the mutation individual, as seen from Equation (25), resulting in large disturbances, which helps to maintain population diversity and global search capability. However, the search efficiency and the accuracy of the global optimal solution obtained will be lower. A smaller value of F may lead to a loss of population diversity, making the algorithm prone to falling into a local optimum and converging prematurely. (iii) The crossover probability CR determines whether members of a population perform crossover operations, which has an important impact on population diversity.
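To make the link between DE and the ELM concrete, the sketch below shows one way a DE individual could be decoded into ELM input weights and biases and scored by the training RMSE. It is an assumption-laden illustration: the flat layout of the individual `theta` (weights first, then biases), the function name `elm_rmse_fitness`, the RMSE normalization over all N x m output entries, and the four-sample toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def elm_rmse_fitness(theta, X, T, L):
    """Fitness of one DE individual: training RMSE of the ELM it encodes.

    theta holds n*L input weights followed by L hidden biases (assumed layout).
    """
    n = X.shape[1]
    W = theta[: n * L].reshape(n, L)
    b = theta[n * L:]
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden layer output
    beta = np.linalg.pinv(H) @ T             # output weights via pseudoinverse
    residual = H @ beta - T
    return float(np.sqrt(np.mean(residual ** 2)))

# Toy problem: four samples, two features, two one-hot classes.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
L = 10
rng = np.random.default_rng(1)
theta = rng.uniform(-1.0, 1.0, size=X.shape[1] * L + L)
rmse = elm_rmse_fitness(theta, X, T, L)
print(f"fitness (training RMSE) of a random individual: {rmse:.2e}")
```

A DE loop would call this function once per individual per generation, so the dimension D of each individual is n·L + L under this encoding.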

Results
In the experiments, after many trials and comparisons, the following parameters for the DE algorithm were determined: NP = 20, F = 0.7, CR = 0.8, and $g_{\max}$ = 30. The experimental results are listed in Table 2. As seen from Table 2, as the number of hidden layer nodes increases, the training accuracy and the testing accuracy of both the ELM and the DE-ELM improve. The DE-ELM with 150 hidden layer nodes performs better than the ELM with 200 nodes in terms of both the training accuracy and the testing accuracy. In other words, the DE-ELM with fewer hidden layer nodes can achieve better classification results than the ELM with more hidden layer nodes, so the DE-ELM not only reduces network complexity but also achieves stronger generalization ability. Moreover, when the number of hidden layer nodes increases to 150 and 200, the training and test accuracies of the SVM cannot compete with those of the ELM and DE-ELM. To further verify the effects of the three approaches on the test set, the classification results for each fatigue state are shown in Figures 8-14, where the category labels 1-4 represent the driver's sober state, first-level fatigue state, second-level fatigue state and third-level fatigue state, respectively. The classification results of the ELM and the DE-ELM for 100, 150, and 200 hidden layer nodes are shown in Figures 8-13, while the classification results of the SVM are shown in Figure 14. For the test set samples, the DE-ELM prediction outputs match the actual outputs much better than those of the ELM and the SVM. To evaluate the classification accuracy for each fatigue state, Table 3 shows the classification accuracy of the three approaches for each state on the test set when the number of hidden layer nodes is 200. It is worth noting that the recognition rates of the DE-ELM for all fatigue states exceed 90%, which is the best classification performance among the three approaches.
In detail, although the ELM obtains similar performance to the DE-ELM for the first-level and second-level fatigue states, its classification accuracy for the sober state and the third-level fatigue state is not as good as that of the DE-ELM. In addition, the classification accuracy of the DE-ELM is much better than that of the SVM for the third-level fatigue state, while similar accuracy is obtained for the sober state. This demonstrates that the DE-ELM method developed in this paper achieves the best classification performance on the fatigue driving dataset compared with its ELM and SVM counterparts.

Please note that the criterion for determining the level of fatigue driving in this article is based on the evaluation method of facial video experts. This method only subjectively recognizes and judges the facial expression and movement characteristics of the tester. It may not accurately and objectively determine the driver's fatigue state, which also leads to a lower recognition rate of the fatigue level in this experiment. In general, the more input feature values and training samples a classification model has, the higher the classification accuracy that can be obtained. Due to the limited conditions of this experiment, the small number of input feature values and insufficient training samples have limited the recognition rate of the fatigue level in this paper. Further studies on determining the fatigue level and on the selection of input feature values will be carried out to obtain a higher recognition rate for fatigue driving.
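For reference, a per-state recognition rate of the kind reported in Table 3 is simply the fraction of test samples of each fatigue state that the classifier labels correctly. The short sketch below shows that arithmetic; the label lists are made-up placeholders for illustration and are not the experimental results.

```python
from collections import Counter

def per_class_accuracy(y_true, y_pred, classes=(1, 2, 3, 4)):
    """Fraction of correctly classified samples for each fatigue state."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / total[c] for c in classes if total[c] > 0}

# Placeholder labels: 1 = sober, 2-4 = first- to third-level fatigue.
y_true = [1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [1, 1, 2, 3, 3, 3, 4, 1]
rates = per_class_accuracy(y_true, y_pred)
print(rates)
```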

Conclusions
In this paper, an ELM-based FDD algorithm has been developed to determine the driver's fatigue status. Since the input weights and hidden layer biases of the ELM are randomly assigned, which degrades its generalization performance, the weights and biases are optimized via the DE algorithm, such that the sensitivity of the neurons is increased and the classification accuracy is improved. The driver's respiratory and heartbeat signals are collected by a Doppler radar and a smart bracelet, which has little impact on the driver's normal operation. Experimental studies have demonstrated that the DE-ELM significantly improves the accuracy of driver fatigue state detection compared to the traditional ELM and SVM approaches.

Conflicts of Interest:
The authors declare no conflict of interest.