Driving Drowsiness Detection with EEG Using a Modified Hierarchical Extreme Learning Machine Algorithm with Particle Swarm Optimization: A Pilot Study

Abstract: Driving fatigue accounts for a large number of traffic accidents in modern life. It is therefore of great importance to reduce this risk factor by detecting the driver's drowsiness condition. This study aimed to detect drivers' drowsiness using an advanced electroencephalography (EEG)-based classification technique. We first collected EEG data from six healthy adults under two different awareness conditions (wakefulness and drowsiness) in a virtual driving experiment. Five different machine learning techniques, including the K-nearest neighbor (KNN), support vector machine (SVM), extreme learning machine (ELM), hierarchical extreme learning machine (H-ELM), and the proposed modified hierarchical extreme learning machine algorithm with particle swarm optimization (PSO-H-ELM), were applied to classify the subject's drowsiness based on the power spectral density (PSD) features extracted from the EEG data. The mean accuracies of the five classifiers were 79.31%, 79.31%, 74.08%, 81.67%, and 83.12%, respectively, demonstrating the superior performance of our new PSO-H-ELM algorithm in detecting drivers' drowsiness compared to the other techniques.


Introduction
The rapid development of industrial technology has turned cars into a popular means of transportation. Simultaneously, the fierce competition in modern society pushes people to work hard, leading to excessive fatigue in daily life. Working under fatigue not only decreases overall efficiency, but also damages bodily health and can even cause accidents [1]. In recent years, fatigue driving has emerged as a major cause of traffic accidents. Drivers attempting to travel long distances often operate their vehicle in a fatigued or drowsy condition. In addition, driving time is not the only cause of fatigue; other factors, for instance, monotonous environments, sleep deprivation, chronic sleepiness, and drug and alcohol use, are also related to fatigue [2,3]. Driving fatigue can lead to devastating accidents, especially in cases where drivers fall asleep while the car is still moving. Drowsiness detection is therefore an important topic in the field of driving safety, with the aim of decreasing the frequency and severity of traffic accidents.
Over the last few decades, increasing efforts have been made to investigate the effective detection of driving fatigue, resulting in a number of methods being proposed for this particular application [4]. At present, the primary fatigue detection methods comprise three categories. The first category generally focuses on behavioral features, detecting the variability of lateral lane position and vehicle heading difference to monitor the driver's mental condition [5]. This approach is conceptually straightforward, but a standard for these features has not yet been well established, rendering it difficult to implement. Image-based facial expression recognition, such as the degree of resting eye closure or nodding frequency, represents the second type of approach for detecting driving fatigue [6,7]. Although these measures are easy to use, the detection of fatigue-linked features is subject to multiple factors, including image angle and image illumination, which lower their overall identification accuracy and limit their effectiveness in practical applications. The third type of approach mainly relies on physiological features, including electrocardiogram (ECG), heart rate [8], electromyogram (EMG), and electroencephalogram (EEG)-based features, to detect driving fatigue [9]. Unlike the behavioral and facial features, physiological features serve as objective markers of inherent changes in the human body in response to surrounding conditions. Among these signals, EEG is considered to be the most straightforward, effective, and suitable one for detecting driving fatigue [10].
The most accessible equipment for collecting EEG signals is a wearable EEG acquisition device, which allows a portable setup to be used to measure EEG signals and determine whether the driver is in a fatigued condition or not. Classification algorithms, for instance, K-nearest neighbor (KNN) and support vector machine (SVM), are commonly used in this field. The classification efficiency of these classifiers, however, is not ideal due to their computational complexity. As a result, in recent years some research groups have turned to algorithms that feature high computational efficiency, such as the extreme learning machine (ELM) [11]. However, the shallow architecture of the ELM may render it ineffective when attempting to learn features from natural signals, regardless of the number of hidden nodes. Considering this issue, an enhanced ELM-based hierarchical learning algorithm framework has been proposed by Huang et al. [12], known as the hierarchical extreme learning machine (H-ELM). When compared with other multilayer perceptron (MLP) training approaches, such as the traditional ELM, the H-ELM has a faster training speed as well as higher accuracy. The high-level representation achieved by the H-ELM primarily benefits from a layer-wise encoding architecture which extracts multilayer sparse features from the input data. A detailed description of this method will be provided in Section 2.6. However, the L2 penalty and the scaling factor of H-ELM classifiers are usually chosen empirically, ignoring the importance of optimizing the parameters and performance of the classifier. There is therefore a clear need to improve the performance of the H-ELM by selecting optimal parameters.
Taken together, in this study we proposed a new algorithm based on the H-ELM classifier and the particle swarm optimization (PSO) algorithm to improve classification accuracy. The PSO algorithm has previously been proven to effectively select the best parameters for classifiers [13]. By combining the PSO with the unique characteristics of EEG signals, the parameters of the H-ELM kernel function can be optimized, thereby improving the performance of driving fatigue detection.

Participants
Six male volunteers (right-handed, average age 24 years) with valid drivers' licenses were recruited to participate in a simulated driving experiment. The research ethics board of Hangzhou Dianzi University approved the experiment, and it was performed according to the Declaration of Helsinki. All subjects were physically and psychologically healthy, with regular sleep patterns.

Experiment
In this study, we used driving simulation equipment with related software to simulate a real driving scenario. The platform was an advanced simulation system (Shanghai Infrared Automobile Simulator Driving Equipment Co., Ltd, Shanghai, China), which can simulate surrounding scenes as well as the vehicle's driving response. The driving simulation platform consisted of a steering wheel, brake and accelerator pedals, a high-definition monitor, a high-performance computer with driving simulation software, and EEG signal recording equipment (Brain Products GmbH, Germany). The platform was established to dynamically record EEG signals during the experiment and to monitor vehicle driving as well as operating status.
In order to obtain EEG data for the fatigue and awake states, each subject was required to sleep for four hours (sleep deprivation) or eight hours (adequate sleep), respectively, on the day immediately before the collection of EEG data. As such, EEG data were collected twice from each participant separately. The experiments were scheduled to start uniformly at 9:00 a.m. on each experimental day, and the EEG data were recorded continuously for 20 min while driving in the simulation environment. Specifically, in order to induce drowsiness in the subjects in the fatigue group, we chose a long highway that was straight with smooth curves. Conversely, the road for the awake group was relatively complex to keep the subjects awake. During the experiment, we evaluated the subject's drowsiness using a micro-camera based on the driving fatigue criteria (more than two seconds' eye closure, head nodding, and large deviation off the road) described in the literature [14,15].

EEG Data Acquisition
In our experiment, subjects were required to sit on a chair and drive on the simulation platform while EEG signals were recorded from 32 EEG electrodes positioned on each subject's head surface with a sampling rate of 1000 Hz. EEG data for the fatigue group were recorded for 20 min after the subjects showed fatigue symptoms. For subjects who did not show fatigue symptoms after 60 min of driving, the experiments were repeated a few days later. The temperature of the laboratory was maintained at 22 °C, which was comfortable for the subjects. Figure 1 shows the driving simulation platform of our experiment.

In this study we defined a 10-s segment of EEG data as a sample, creating 240 samples for each subject and 1440 samples in each group.Among these, 240 samples from one subject were set aside as a testing set, while the remaining 1200 samples from the other five subjects were used as a training set.
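The 10-s segmentation and the leave-one-subject-out arrangement described above can be sketched as follows. This is a minimal illustration: the function names and the placeholder recording are ours, while the window length, sampling rate, and sample counts come from the text.

```python
import numpy as np

def segment_eeg(recording, fs=1000, win_sec=10):
    """Split a (channels, samples) recording into non-overlapping 10-s windows."""
    win = fs * win_sec
    n_win = recording.shape[1] // win
    return np.stack([recording[:, i * win:(i + 1) * win] for i in range(n_win)])

def loso_split(samples_per_subject, test_subject):
    """Leave-one-subject-out indices: one subject's samples test, the rest train."""
    train, test = [], []
    for subj, n in enumerate(samples_per_subject):
        start = sum(samples_per_subject[:subj])
        idx = list(range(start, start + n))
        (test if subj == test_subject else train).extend(idx)
    return train, test

# 20 min at 1000 Hz per state -> 120 windows per state, 240 per subject
rec = np.zeros((32, 20 * 60 * 1000))          # placeholder recording
print(segment_eeg(rec).shape)                 # (120, 32, 10000)
train, test = loso_split([240] * 6, test_subject=0)
print(len(train), len(test))                  # 1200 240
```

With six subjects of 240 samples each, holding out one subject yields exactly the 1200-sample training set and 240-sample testing set described above.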

Feature Extraction
Features were then extracted from the preprocessed and segmented EEG samples. In this study we adopted the power spectral density (PSD) of the EEG samples as features, since it represents the distribution of the EEG frequencies. The PSD values were computed by short-time Fourier transforms (STFT) for each EEG channel and each frequency band. A total of 160 PSD datapoints (32 electrodes × 5 frequency bands) were thereby extracted for each EEG sample. Specifically, we adopted a Hanning window-based discrete STFT algorithm to extract the PSD features.
Briefly, for the EEG signal $x[n]$ recorded by an electrode, the discrete STFT is defined as

$$X(m, \omega_k) = \sum_{n} x[n]\, w[n - m]\, e^{-j \omega_k n},$$

where $\omega_k = \frac{2\pi k}{N}$ is the angular frequency, $k = 0, 1, \cdots, N - 1$, $m$ represents the position of the window function in the time domain ($m$ is discrete and $\omega$ is continuous), $w[n]$ is a window function, and $N$ represents the number of sampling points. The Hanning window is defined as

$$w[n] = 0.5\left(1 - \cos\frac{2\pi n}{N - 1}\right), \quad 0 \le n \le N - 1,$$

and the PSD is defined as

$$P(m, \omega_k) = \left| X(m, \omega_k) \right|^2.$$

The energy spectrum of each EEG frequency band can then be defined as

$$E = \sum_{\omega_k \in \Omega} P(m, \omega_k),$$

where $\Omega$ represents all the frequencies in one frequency band. Figure 2 shows an example comparison of the raw EEG signals and their PSD between the awake and fatigue stages. Ten-second EEG signals recorded from a participant in the awake group and the fatigue group, respectively, and their PSD are compared. For the original signal, the x-axis represents the datapoint and the y-axis represents the amplitude. For the PSD signal, the x-axis represents the frequency and the y-axis represents the signal's power content versus frequency. It can be seen that it is easier to distinguish the fatigue state from the awake state in the frequency domain, and the main distinction is concentrated between 0 and 40 Hz.
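The feature extraction above can be sketched as follows, assuming a single Hanning-windowed FFT per 10-s sample as a stand-in for the full STFT, and assuming conventional band edges (the text does not list them); only the electrode count and the resulting 160-dimensional feature vector come from the paper.

```python
import numpy as np

# Assumed band edges in Hz; the paper only states "5 frequency bands".
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def band_psd_features(sample, fs=1000):
    """PSD band powers for one (channels, samples) EEG window using a
    Hanning-windowed FFT (single-segment stand-in for the STFT)."""
    n = sample.shape[1]
    win = np.hanning(n)
    spec = np.fft.rfft(sample * win, axis=1)
    psd = (np.abs(spec) ** 2) / (fs * (win ** 2).sum())  # periodogram scaling
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    feats = []
    for ch in range(sample.shape[0]):
        for lo, hi in BANDS.values():
            mask = (freqs >= lo) & (freqs < hi)
            feats.append(psd[ch, mask].sum())   # band power = summed PSD bins
    return np.array(feats)

x = np.random.randn(32, 10000)                  # one 10-s, 32-channel sample
print(band_psd_features(x).shape)               # (160,)
```

The 32 channels × 5 bands yield the 160 PSD datapoints per sample described above.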

Classical ELM and PSO-H-ELM
The classical extreme learning machine (ELM) was proposed for classification and regression tasks. It is a feedforward neural network with a single layer of hidden nodes. The weights connecting the nodes are initially assigned randomly. Specifically, the weights between the input neurons and the hidden nodes remain fixed after this random assignment, while the weights between the hidden nodes and the output neurons are determined analytically by a linear model. This design greatly accelerates the learning speed while maintaining favorable generalization performance.
Assume there are N samples in a training set $(x_i, t_i)$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T$ represents each sample and its corresponding network target vector is $t_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T$. The classical single hidden layer feedforward neural network (SLFN) with $\tilde{N}$ hidden nodes is mathematically defined as:

$$\sum_{i=1}^{\tilde{N}} \beta_i\, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, \ldots, N,$$

where $g(x)$ is the nonlinear activation function, for instance, a sigmoid or Gaussian function, $w_i = [w_{i1}, w_{i2}, \ldots, w_{in}]^T$ represents the weight vector connecting the ith hidden node with the input neurons, $\beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}]^T$ is the weight vector between the ith hidden node and the output nodes, and $b_i$ is the bias of the ith hidden node. The standard SLFN can approximate the target outputs $t_i$ with zero mean error. For convenience, this equation can be written compactly as:

$$H\beta = T,$$

where $H = h_\theta(x)$ is a feature mapping function which turns the n-dimensional data into the m-dimensional feature space, and $\theta = \{w, b\}$ represents the parameters of the mapping function.
The original ELM algorithm approximates the target output by randomly assigning the weights connecting the input and hidden layers as well as the biases of the hidden nodes. Specifically, the weights can be filled with random values whenever the activation function is infinitely differentiable.
According to the above introduction, the initial hidden nodes of the original ELM algorithm are assigned randomly. When extended to a multilayer perceptron (MLP), the ELM algorithm is combined with an autoencoder. The autoencoder extracts hidden features through the feature map, with which the ELM can approximate the output by minimizing the error. The framework of the H-ELM is multilayer, as presented in Figure 3. Every circle in Figure 3 represents a neuron of a neural network, and each column of circles represents a hidden layer. The last column in Figure 3 is the prediction of the H-ELM framework.

Normally, deep learning algorithms apply a greedy layer-wise architecture to train on the data, while the H-ELM algorithm divides training into two separate phases. The first phase is unsupervised hierarchical feature representation. Through unsupervised learning within the new ELM-based autoencoder, the input is transformed into a random feature space, exploiting the hidden features of the training data. The autoencoder is designed using a sparse constraint with an added approximation capability, and is therefore known as the ELM sparse autoencoder. Each hidden layer of the H-ELM works like an individual feature-extraction unit; increasing the number of layers consequently increases the number of extracted features. By adopting an N-layer unsupervised learning method, the H-ELM algorithm obtains sparser features. The output of each hidden layer is given by

$$H_i = g(H_{i-1}\, \beta),$$

where $H_i$ represents the output of the ith layer, $H_{i-1}$ represents the output of the (i − 1)th layer, $g(\cdot)$ represents the activation function, and $\beta$ is the output weights. The output weights $\beta$ are calculated as

$$\beta = \left(\frac{I}{C} + H^T H\right)^{-1} H^T T,$$

where $H$ is the hidden-layer output matrix and $T$ is the output data. The L2 penalty C is preset to balance the width of the separating margin against the training error [20]. Specifically, the value of the L1 penalty is optimized in order to extract sparser hidden features, which differs from reference [21], where L2-norm singular values are optimized.
The next phase of the H-ELM is supervised feature classification. Specifically, we applied traditional ELM regression for the classification. The autoencoder approximates the inputs to the outputs through the least mean square method, while the H-ELM algorithm offers a better solution with improved accuracy and speed. In this phase, the scaling factor S is used to adjust the maximum value of the training output, and the value of S is specified by the user.
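The two user-set parameters can be illustrated as below, assuming the standard ridge-regularized ELM solution for the output weights and a simple rescaling for the factor S; the paper does not spell out either computation, so both function bodies are our reading of the standard formulation.

```python
import numpy as np

def elm_output_weights(H, T, C=1e8):
    """Ridge-regularized ELM output weights:
    beta = (I/C + H^T H)^{-1} H^T T, with the L2 penalty C trading off
    the separating margin against the training error."""
    n = H.shape[1]
    return np.linalg.solve(np.eye(n) / C + H.T @ H, H.T @ T)

def scale_hidden(H, S=0.8):
    """Scaling factor S bounds the activation range fed to the final ELM."""
    return S * H / np.abs(H).max()

rng = np.random.default_rng(0)
H = rng.standard_normal((1200, 100))            # hidden-layer outputs
T = np.eye(2)[rng.integers(0, 2, 1200)]         # one-hot targets
beta = elm_output_weights(H, T)
print(beta.shape)                               # (100, 2)
print(np.abs(scale_hidden(H)).max())            # 0.8
```

Both C and S enter as plain scalars here, which is why a parameter search such as PSO can tune them directly.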
Figure 4 depicts the testing accuracies of the H-ELM in the C and S subspaces, and it can be seen that the performance of the H-ELM is clearly related to the parameters C and S. It is therefore of great value to select the most appropriate values for these two parameters using optimization algorithms to achieve the best performance of the H-ELM. In this study we adopted particle swarm optimization (PSO) to optimize the H-ELM algorithm.
The PSO is one of the 'swarm intelligence' techniques, which were inspired by the observation of biotic communities in nature. The PSO algorithm itself is based on the Boid model, which imitates the flying behavior of bird flocks [22,23]. In the PSO model, each particle is described by a position vector and a speed vector, which respectively represent a candidate solution and its movement within the search space. The particles remember the best position at which they reached the highest accuracy during their past states and use this information to guide their search of the solution space.
The kernel of the PSO algorithm consists of constantly updating the speed and position of each particle:

$$v_{ij}(t+1) = \omega\, v_{ij}(t) + c_1 r_1 \left[ pbest_{ij}(t) - x_{ij}(t) \right] + c_2 r_2 \left[ gbest_{j}(t) - x_{ij}(t) \right],$$
$$x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1),$$

where $\omega$ is the inertia weight, $c_1$ and $c_2$ are the acceleration constants, $r_1$ and $r_2$ are uniformly distributed random numbers in [0, 1], and $v_{ij}(t)$ and $x_{ij}(t)$ represent the speed and position of the jth dimension of the ith particle in the tth generation. $pbest_{ij}(t)$ describes the best position the ith particle has reached in the jth dimension up to the tth generation, while $gbest_{j}(t)$ represents the best position found by the whole swarm.

In this part, we adopted the classic PSO algorithm to find the best L2 penalty for the last layer of the ELM as well as the best scaling factor for the original ELM. Root mean squared error (RMSE) is a common criterion for measuring classification or regression performance. Mathematically, the RMSE is defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left\| H_i \beta_i - t_i \right\|^2},$$

where $t_i$ is the target of the training data, $H_i \beta_i$ represents the output, and N represents the number of subjects.


Classification
In order to evaluate the performance of the proposed PSO-H-ELM, in this study we tested the classification performance in driving fatigue detection using five different classifiers, including the K-nearest neighbor (KNN), support vector machine (SVM), extreme learning machine (ELM), hierarchical extreme learning machine (H-ELM), and the proposed PSO-H-ELM. The KNN, ELM, and, especially, SVM are all widely used in the field. As mentioned above, 240 samples from one subject were defined as the testing data, whereas the remaining 1200 samples of the five other subjects were training data. This arrangement was adopted to avoid any possible confounds from the random selection of training and test data. All processing was implemented in the MATLAB 2017a environment on a PC with a 3.4 GHz processor and 8.0 GB RAM.

Results
In this study, the initial values of C and S were set to $2 \times 10^{-30}$ and 0.8, respectively. The inertia weight $\omega$ was 0.6 and the acceleration constants $c_1$ and $c_2$ were set to 1.6 and 1.5, respectively. The number of particles was set to 1000, and the maximum number of population iterations was set to 20.
Average accuracies for the five classifiers were calculated and are presented in Figure 5. We further performed paired t-tests to determine whether the PSO-H-ELM significantly outperforms the other classifiers.
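The paired t-test comparison can be reproduced with SciPy as sketched below; the per-fold accuracies are illustrative placeholders, not the study's actual leave-one-subject-out results.

```python
from scipy import stats

# Hypothetical per-fold (leave-one-subject-out) accuracies for two classifiers;
# a paired test is appropriate because both are scored on the same six folds.
acc = {"H-ELM":     [0.80, 0.83, 0.79, 0.84, 0.81, 0.83],
       "PSO-H-ELM": [0.82, 0.85, 0.81, 0.86, 0.83, 0.84]}

t, p = stats.ttest_rel(acc["PSO-H-ELM"], acc["H-ELM"])
print(f"paired t = {t:.2f}, p = {p:.4f}")
```

Pairing by fold removes between-subject variability from the comparison, which is why the paired form of the t-test is preferred over an unpaired one here.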


Discussion
As shown in Figure 6, the PSO-H-ELM algorithm achieved the highest average accuracy, indicating that the proposed PSO-H-ELM is superior to the others in the detection of driving fatigue. By comparing the performance of the H-ELM with the proposed PSO-H-ELM in different cases, we found that the selection of training data has no appreciable influence on the performance of the proposed algorithm. Despite the higher mean accuracy of the proposed PSO-H-ELM, no significant differences were found between the proposed PSO-H-ELM and the KNN and SVM. Significant differences were found, however, when the PSO-H-ELM was compared to the ELM and H-ELM algorithms, indicating that the optimization of the L2 penalty and scaling factor does improve the performance of the H-ELM algorithm. The results of the statistical analysis are shown in Table 1.
Across the traditional classification methods tested in this study, the H-ELM outperforms the SVM. With the addition of the PSO algorithm, the PSO-H-ELM further optimizes the performance of the classifier in fatigue detection, which provides inspiration for other studies.

Limitations
One of the limitations of this pilot study was the small sample size (n = 6). Our human experiment was suspended due to the COVID-19 outbreak. The promising results achieved with such a small sample size, however, will enable important adjustments for a large-scale future study when the situation allows.


Figure 1. The experimental setup with the driving platform and electroencephalogram (EEG) recording system.


Figure 2. The comparison of raw EEG signals and their power spectral density (PSD) between the awake and fatigue stages.


Figure 3. The overall framework of the hierarchical extreme learning machine (H-ELM) learning algorithm.


Figure 4. Testing accuracy in the (C, S) subspace for the H-ELM.


Figure 5. The classification accuracies achieved by K-nearest neighbor (KNN), support vector machine (SVM), extreme learning machine (ELM), H-ELM, and the proposed modified hierarchical extreme learning machine algorithm with particle swarm optimization (PSO-H-ELM) with respect to different training subjects.


Table 1. Comparison of accuracies of classifiers.
