Development of Filtered Bispectrum for EEG Signal Feature Extraction in Automatic Emotion Recognition Using Artificial Neural Networks

Abstract: The development of automatic emotion detection systems has recently gained significant attention due to the growing possibility of their implementation in several applications, including affective computing and various fields within biomedical engineering. Use of the electroencephalograph (EEG) signal is preferred over facial expression, as people cannot control the EEG signal generated by their brain; the EEG therefore provides a more reliable psychological signal. However, because EEG signals differ between individuals and are vulnerable to noise, their use can be rather complicated. In this paper, we propose a methodology to conduct EEG-based emotion recognition by using a filtered bispectrum as the feature extraction subsystem and an artificial neural network (ANN) as the classifier. The bispectrum is theoretically superior to the power spectrum because it can identify phase coupling between the nonlinear process components of the EEG signal. In the feature extraction process, to extract the information contained in the bispectrum matrices, a 3D pyramid filter is used for sampling and quantifying the bispectrum value. Experimental results show that the mean percentage of the bispectrum value from 5 × 5 non-overlapping 3D pyramid filters produces the highest recognition rate. We found that reducing the number of EEG channels down to only eight in the frontal area of the brain does not significantly affect the recognition rate, and the number of data samples used in the training process is then increased to improve the recognition rate of the system. We have also utilized a probabilistic neural network (PNN) as another classifier and compared its recognition rate with that of the back-propagation neural network (BPNN), and the results show that the PNN produces a comparable recognition rate and lower computational costs. Our research shows that the bispectrum values of an EEG signal extracted using 3D filtering are suitable features for an EEG-based emotion recognition system.


Introduction
It is widely believed that psychological factors can affect a patient's recovery process; positive emotions, for example, affect the progression of recovery, and a patient's emotional response to his or her illness could also affect the type and amount of medication prescribed by doctors [1]. Psychological treatment is one of the most important factors in a patient's recovery and, when administered properly, might even accelerate the healing process. Although psychologists are still debating whether positive emotions could cure patients suffering from life-threatening illnesses, experts believe that experiencing positive emotions (e.g., feeling happy and stress free) would improve patients' social life, which in turn makes patients more productive in their daily activities.
The difficulty in utilizing the psychological approach in the patient recovery process stems from the patient's ability to hide emotions or the inability to express an emotional condition despite his or her desires, such as in cases where patients experience facial nerve paralysis or adhere to certain cultural dynamics. When these situations occur, even social approaches through various communication techniques or observations of body language still pose a challenge, as the nurse or the patient's family members might not be able to accompany the patient at all times. A solution can therefore be found in an automatic emotion recognition tool or system.
There are several ways to determine human emotions automatically. One popular way is to use a camera and image processing techniques to recognize facial expressions; however, this method is ineffective when patients suffer from facial paralysis or when they hide their genuine emotions. Therefore, another approach should be considered, such as automatic emotion recognition based on physiological signals: the electrocardiograph (ECG) for monitoring the patient's heart rate, functional magnetic resonance imaging (fMRI) for monitoring blood-oxygen-level-dependent (BOLD) signals, eye-gaze trackers (EGT) for monitoring pupil width and movement, or the EEG for monitoring brain signals. The EEG-based emotion recognition system is preferred due to the advantage of its temporal resolution [2] and the ability of the patient to move around while the system is processing the signals.
The goal of this research is to develop an automatic system for recognizing human emotions through EEG signals. Power spectrum analysis has previously been implemented in the developed automatic emotion recognition system [3], and in this work, bispectrum analysis, which is theoretically better than power spectrum analysis, was used as the feature extraction subsystem, while ANNs served as the classifiers. For the purpose of benchmarking the experiment setup and comparing the recognition capability of the developed system, we have used the EEG signals from healthy subjects as provided by the Database for Emotion Analysis Using Physiological Signals (DEAP) [4].
The rest of the paper is organized as follows: Section 2 presents the literature review on methodologies previously proposed by other researchers; Section 3 discusses the methodology used in this research, the materials of the study, the bispectrum processing technique, including the proposed 3D filtering method for extracting the bispectrum value, and the classifiers used (i.e., the BPNN and the PNN); Section 4 describes in detail the experimental method and discusses the results; finally, Section 5 concludes the paper.

Literature Review
Researchers have proposed many different methods for recognizing emotions through EEG signals [2,5]. The fundamental challenge of using EEG signals to detect human emotion lies in understanding how a particular emotional state is represented in the brain and applying the correct computational model to accurately identify that emotion, both automatically and in real time. Human emotion is naturally felt after a few seconds, not in a split second; therefore, it is more appropriate to analyze the EEG signals over a set time frame than to analyze the signals' amplitude, as in the event-related potential (ERP) technique. However, the ERP technique has been used in other emotion recognition research [6-9] because the emotional stimuli were instantaneous pictures. On the other hand, to analyze EEG signals within a certain time duration for successful and productive use in an emotion recognition system, a signal processing technique is needed: one widely adopted method is to calculate the power spectrum of the brain signals [4,10-13], which has yielded recognition rates of approximately 60-70%, leaving much room for improvement.
Bispectrum analysis has been used as a signal processing technique for analyzing the EEG signal [14] and has also been used in medical research, such as in cases related to anesthesia [15,16], meditation [17], vigilance tests [18] and recently in emotion recognition [19,20], but the recognition rate for the latter was somewhat lower than desired. Hosseini [19] used seven features from bispectrum analysis taken from each of five EEG channels to recognize whether the subject was stressed; Hosseini used a support vector machine (SVM) as the classifier and a genetic algorithm as the feature selection algorithm. Kumar [20] used the bispectrum as a feature extraction method to identify four types of emotional states, namely the low and high arousal and the low and high valence from the data provided in the DEAP database [4], using only two channels (Fp1 and Fp2) as the input signals and an SVM as the classifier. Five types of features were calculated after the bispectrum analysis: normalized bispectral entropy (NBE), normalized bispectral squared entropy (NBSE), mean-magnitude of bispectrum (MMOB), first order spectral moment (FOSM) and second order spectral moment (SOSM). However, the use of a higher complexity feature calculation leads to an increased computation cost and may result in decreased accuracy. In this work, we propose a simpler method for feature extraction after the bispectrum values are calculated: the bispectrum values are divided, by a filtering method, into several spatial regions, the mean of each region is calculated, and the percentage of the mean is used as the feature.
Regarding the relationship between EEG and emotion, the prefrontal cortex area of the brain exhibits a strong correlation with human emotion. Nevertheless, through the feature selection process, some researchers have found that other EEG channels were better suited for emotion recognition [20,21]. In the frequency domain, particular emotions are believed to affect specific frequency bands, such as alpha [22,23], gamma [24,25] and theta [26], and some researchers have incorporated all frequency bands (delta, theta, alpha, beta and gamma) [19,27] into their EEG research. In this work, we used all frequencies of brain signals and omitted none.
For EEG signal classification using DEAP as the dataset, both linear and non-linear methods have been used widely: support vector machine (SVM) [28-32], linear discriminant analysis (LDA) [33], quadratic discriminant analysis (QDA) [30] and k-nearest neighbor (KNN) [34] for the linear classifier; and back-propagation neural networks (BPNN) [3], probabilistic neural networks (PNN) [35], Bayesian neural networks (BNN) [36], deep learning networks (DLN) [37] and Elman neural networks (ENN) [19] for the non-linear classifier. In this work, we use the BPNN as the benchmark classifier. Although the BPNN has a major drawback in terms of training time, it is expected to perform well because its training algorithm minimizes the error by gradient descent. We also compared the classification results of the BPNN with those of the PNN.
DEAP has been used by many researchers because a complete description of the data acquisition is available, the number of data samples is large enough (i.e., 1280 samples), and the pre-processed data in the form of MATLAB and Python matrices are also provided; these allow researchers to focus on the development of the feature extraction and classification methods. However, the results achieved by different researchers are difficult to compare because they use various cross-validation methods, and the reported recognition rates vary between 40% and 90%.

Bispectrum Analysis of EEG Signals for an Emotion Recognition System
There is a significant amount of research focusing on human emotion recognition. However, only a few publicly open databases of emotion are available for download, and in this research, we chose the DEAP database [4] for two reasons. First, the DEAP database uses carefully chosen emotional stimuli based on statistical methods from the respondents. To ensure objectivity, the respondents who assessed the emotional stimuli were different from the respondents whose EEG signals were recorded. Second, this database provides numerous observations. Using a publicly open database enables us to reproduce and compare our research results with other findings.
To recognize human emotion automatically, the proposed method applies higher order statistics to the EEG signal to produce 3D bispectrum matrices. Then, the bispectrum matrices are filtered using a 3D pyramid filter to reduce the size of the matrices and to form a feature vector. The resulting feature vectors are used as the input for the BPNN classifier.

EEG Database and Its Acquisition
Currently, the existing EEG databases available for research purposes mostly contain the EEG signals of motor imagination [38,39], sleep stages [40] and epilepsy [41], and only a few EEG databases related to the emotional states are available, such as Mahnob-HCI [42], the SJTU Emotion EEG Dataset (SEED) database [43] and the DEAP database [4]. This work utilized the DEAP database [4], which provided the most EEG data from participants; emotional stimuli came from music video clips, and the emotional states of the participants were defined in a 2D emotion model (i.e., the arousal and the valence levels [4]), producing four quadrants of emotion: the high arousal, high valence (HAHV); the low arousal, high valence (LAHV); the low arousal, low valence (LALV); and the high arousal, low valence (HALV) emotion classes.
In this database, the emotional stimuli for each quadrant of the 2D emotion model were apportioned equally with 10 music videos, and the elicited EEG signals from the brain activity were recorded using 32 channels of a Biosemi ActiveTwo EEG system with a sampling rate of 512 Hz. The DEAP database also provided a list of the videos and their YouTube URLs, the basic statistics of participants' ratings, the physiological signals, the facial expression videos and the preprocessed data. EEG signals were recorded from 32 people (balanced between women and men) between the ages of 19 and 37 years. During the recording process, each participant was asked to rate, on a scale of 1-9, his or her emotional response when watching the video in terms of arousal, valence, like/dislike, dominance and familiarity through the Self-Assessment Manikin (SAM) questionnaire [4].
As EEG signals are vulnerable to noise, several noise filtering methods, such as power line noise removal and eye and muscle artefact removal [44,45], are usually applied. In DEAP, several pre-processing steps that consist of a band-pass filtering at 50 Hz and 60 Hz, a high-pass filtering at 2 Hz, a down-sampling process to 128 Hz and the removal of artefacts corresponding to electrooculography (EOG) have been conducted [4]. The DEAP database then provides 32 matrices of 40 × 40 × 8064, representing video/trial × physiology channels × duration of the signals, respectively; however, for our research purposes, these original 32 matrices were divided into 1280 person-video data matrices (32 people × 40 videos). In this work, we only considered the EEG signals; thus, only 32 out of 40 physiology signals are used, and we eliminated the first three-second baseline from the 63-second duration of the signals, so the resulting size of each matrix was 32 × 7680 (EEG channels × duration of the signals). The 1280 matrices were further divided into 320 matrices (32 people × 10 videos) for each class of emotion.
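The slicing described above can be sketched in a few lines of Python. This is only an illustration on a synthetic array: the function name `extract_eeg_trials` is hypothetical, and the sketch assumes that the EEG channels occupy the first 32 of the 40 physiology channels and that the three-second baseline sits at the start of each trial (3 s × 128 Hz = 384 samples, leaving 8064 − 384 = 7680 samples).

```python
import numpy as np

N_TRIALS, N_SIGNALS, N_SAMPLES = 40, 40, 8064  # per-participant shape stated in the text
FS = 128                                       # sampling rate after down-sampling (Hz)
N_EEG = 32                                     # assumption: EEG channels come first
BASELINE = 3 * FS                              # three-second baseline = 384 samples


def extract_eeg_trials(data):
    """Return an array of shape (40, 32, 7680) from one participant's matrix."""
    assert data.shape == (N_TRIALS, N_SIGNALS, N_SAMPLES)
    # Keep only the EEG channels and drop the leading baseline samples.
    return data[:, :N_EEG, BASELINE:]


# Synthetic stand-in for one participant's pre-processed matrix (not real DEAP data).
rng = np.random.default_rng(0)
participant = rng.standard_normal((N_TRIALS, N_SIGNALS, N_SAMPLES), dtype=np.float32)
eeg = extract_eeg_trials(participant)
print(eeg.shape)
```

Repeating this for all 32 participants gives the 1280 person-video matrices of size 32 × 7680.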

EEG Channel Selection Related to Human Emotion
The brain is the operations center of the human nervous system and can be divided into three major parts: the brainstem, the cerebellum (hindbrain) and the cerebrum (forebrain). The largest part of the brain is the cerebrum, and the outer part of the cerebrum is the cerebral cortex, which generates most of the voltage measured on the surface of the head. The cerebral cortex is divided into four lobes, which are the temporal, the frontal, the parietal and the occipital lobes. Each lobe is associated with various functions, and the frontal lobe is where we can observe brain signals associated with emotions.
An EEG captures the electrical activity of the brain by measuring the voltage fluctuations due to the flow of the electric current between neurons in the brain. The EEG electrodes are named according to their corresponding area of the cerebral cortex: the frontal (F), the central (C), the parietal (P), the occipital (O) and the temporal (T). Some electrodes are placed in pairs on the left side and right side of the head, with odd numbers given to the left electrode positions and even numbers given to the right electrode positions. The electrodes in the middle positions are given zero (z) subscripts.
In this study, we document the effect of channel selection on the recognition rate of the emotion classification system. Aside from the initial 32 channels, we also assess a channel selection based on brain emotion theory, which involves eight channels in the frontal, frontal parietal and anterior-frontal regions. In addition to studying the 32 channels and the reduced eight channels, we also compared our results with the 14 channels corresponding to the channels available in the BCI Emotiv device.

The Developed Methodology for Emotion Classification System
Generally, the automatic emotion recognition process can be carried out using classification algorithms through a mathematically based machine learning method or through a biologically based ANN method. With both learning mechanisms, the researchers' main focus is finding the best techniques for the feature extraction subsystem in order to minimize classification error and improve the recognition rate.
The developed methodology for our automatic emotion recognition system is illustrated in Figure 1. At the first stage, human emotions are elicited through a set of emotion stimuli, usually pictures, sounds or videos. Then, the EEG input signals are captured by an EEG device with several electrodes using the appropriate sampling frequency. In this research, the EEG input signals were provided by the DEAP database, and these signals were stimulated by a set of music videos (clips). The EEG input signals from this database were already preprocessed [3] through certain steps to reduce interference noise from the EEG recording, including power line frequency noise filtering and blink artefact removal. To retrieve any meaningful information hidden in the EEG signal, the original time domain signals are processed using signal processing techniques, and the features are then calculated and extracted. Because higher order spectrum analysis is theoretically better than the power spectrum model, we processed our signals using bispectrum analysis, and to reduce the bispectrum matrix size, we also applied a 3D filtering method. The features are extracted by calculating several relevant pieces of information from the filtered bispectrum matrix, such as the entropy, the moment and the mean. The best features are then investigated further to determine the highest recognition rate.
As the feature vectors produced by bispectrum analysis are usually larger than those of the power spectrum analysis, the higher computation cost of using bispectrum analysis could not be avoided. To reduce the size of the feature vectors, and in turn reduce the computation cost, a principal component analysis (PCA) technique that searches for the best principal components (i.e., the coefficient matrix according to the number of eigenvalues desired) is utilized. We also sought to determine whether a reduction in the number of channels used could contribute to reducing the computation cost without decreasing the recognition rate.
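The PCA step can be illustrated with a minimal sketch. It assumes the feature vectors are stacked row-wise in a matrix and uses a singular value decomposition of the centered data; the function name `pca_reduce`, the matrix sizes and the number of retained components are illustrative choices, not values taken from the experiments.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project row-wise feature vectors onto the leading principal components.

    Returns the projected scores, the coefficient matrix (one column per
    component) and the corresponding eigenvalues of the covariance matrix.
    """
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    coeff = vt[:n_components].T
    eigenvalues = s[:n_components] ** 2 / (features.shape[0] - 1)
    return centered @ coeff, coeff, eigenvalues


# Synthetic example: 100 feature vectors of length 20 reduced to 5 components.
rng = np.random.default_rng(0)
features = rng.standard_normal((100, 20))
scores, coeff, eigenvalues = pca_reduce(features, n_components=5)
print(scores.shape, coeff.shape)
```

The coefficient matrix computed on the training data would then be reused, together with the training mean, to project unseen feature vectors.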
In the classification stage, the BPNN classifier was chosen as the benchmark classifier among other ANN classifiers because its gradient descent training method drives the classification error toward a minimum. We then compared the performance of the BPNN with another ANN classifier, namely the PNN, which uses Bayesian probability in the classification process.

Bispectrum
Power spectrum analysis, which has been widely used in biomedical signal processing, calculates the distribution of power as a function of frequency and pays no attention to the signal phase information. In power spectral analysis, the signal is assumed to arise from a linear process, thus ignoring any possible interaction between components (phase coupling). However, the brain signal is part of the central nervous system with many nonlinear sources, so it is highly likely to have phase coupling between signals [14]. Bispectrum analysis is superior to power spectrum analysis because its mathematical formula includes a correlation calculation between the frequency components [46]; thus, the phase coupling components of the EEG signals can be revealed. Some characteristics of bispectrum analysis are the ability to extract deviations from Gaussianity, to suppress additive colored Gaussian noise of an unknown power spectrum and to detect nonlinearity properties [47]. Because of its superiority, bispectrum analysis is used in this research as the signal processing technique in the feature extraction step. We expect that by using bispectrum analysis, the recognition rate of the EEG-based automatic emotion recognition system will improve.
For example, suppose there is a signal S3 that is a combination of two other signals, S1 and S2, where the frequencies are f1 = 20 Hz, f2 = 10 Hz and f3 = f1 + f2 = 30 Hz, and the phases are φ1 = π/6, φ2 = 5π/8 and φ3 = φ1 + φ2, respectively. The signals S1, S2 and S3 are then defined as follows: S1 = 3 cos(2πf1t + φ1), S2 = 5 cos(2πf2t + φ2) and S3 = 8 cos(2πf3t + φ3). The resulting signal X(t) = S1 + S2 + S3 is then received by a sensor. With a sampling frequency of 100 Hz, signal X(t) is processed to reveal its power spectrum and its bispectrum values, and the result can be seen in Figure 2. The power spectrum (Figure 2a) produced using the FFT shows that the dominant frequencies are 10, 20 and 30 Hz; however, it does not reveal the fact that the 30-Hz frequency is a combination of the 10-Hz and 20-Hz frequencies. On the other hand, in the bispectrum (Figure 2b), the pair of normalized frequencies 0.1 and 0.2 (corresponding to the original frequencies 10 and 20 Hz), at coordinates (0.1,0.2) and (0.2,0.1), shows a high bispectrum value, which means that they are strongly correlated because they produce the 30-Hz signal. Supposing that the signal S3 is noise, by using a bispectrum analysis, the noise signal will not be taken into consideration. Thus, we can conclude that through bispectrum analysis, the main frequency components are revealed while the other frequencies are eliminated. Because bispectrum analysis provides more information than the power spectrum analysis, it is expected that the use of bispectrum analysis in the EEG signals' feature extraction process will increase the recognition rate.
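The phase coupling effect in this example can be checked numerically. The sketch below is not the procedure used to produce Figure 2: it assumes a direct (FFT-based) bispectrum estimate, X(f1) X(f2) X*(f1 + f2), averaged over many segments, and it redraws the phases in every segment so that the averaging can separate a coupled component (φ3 = φ1 + φ2) from an uncoupled one. The function name `bispectrum_at` and the number of segments are illustrative.

```python
import numpy as np

FS = 100         # sampling frequency (Hz), as in the example
N = 100          # samples per segment, giving a 1 Hz frequency resolution
F1, F2 = 20, 10  # component frequencies (Hz); the third component is at F1 + F2


def bispectrum_at(f1, f2, coupled, n_segments=200, seed=0):
    """Magnitude of the segment-averaged X(f1) X(f2) conj(X(f1 + f2))."""
    rng = np.random.default_rng(seed)
    t = np.arange(N) / FS
    acc = 0j
    for _ in range(n_segments):
        p1, p2 = rng.uniform(0, 2 * np.pi, size=2)
        # Coupled: the third phase is the sum of the other two; otherwise random.
        p3 = p1 + p2 if coupled else rng.uniform(0, 2 * np.pi)
        x = (3 * np.cos(2 * np.pi * F1 * t + p1)
             + 5 * np.cos(2 * np.pi * F2 * t + p2)
             + 8 * np.cos(2 * np.pi * (F1 + F2) * t + p3))
        X = np.fft.fft(x) / N
        acc += X[f1] * X[f2] * np.conj(X[f1 + f2])  # bin index equals Hz here
    return abs(acc) / n_segments


coupled = bispectrum_at(F1, F2, coupled=True)
uncoupled = bispectrum_at(F1, F2, coupled=False)
print(coupled, uncoupled)
```

In the coupled case every segment contributes the same product of half-amplitudes (1.5 × 2.5 × 4 = 15), whereas in the uncoupled case the random phase differences largely cancel in the average, even though the power spectra of the two cases are identical.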
The autocorrelation of a signal is the correlation between the signal and itself at a different time; for example, at time t and at time t + m. The autocorrelation function of x(t) can be expressed as the expectation of a stationary process, defined as:
$$R_{xx}(m) = E\{x(t)\,x(t+m)\} \qquad (1)$$
The higher order moments are a natural generalization of autocorrelation, and the cumulants are nonlinear combinations of moments. The first order cumulant ($C_{1x}$) of a stationary process is the mean, $C_{1x} = E\{x(t)\}$, where $E\{\cdot\}$ denotes the expectation. The higher order cumulants are invariant to a shift of the mean; therefore, it is practical to describe the cumulants under a zero mean assumption, meaning that if the mean of a process is not zero, then as the first step, the mean should be subtracted from each value. The second order polyspectrum, which is the power spectrum, is defined as the Fourier transform of the second order cumulant, while the third order polyspectrum, which is the bispectrum, is defined as the Fourier transform of the third order cumulant.
For the third order cumulant, the correlation of the signal is calculated between times t, t + τ1 and t + τ2, where τ1 and τ2 are the lags. The third order cumulant of a zero mean stationary process is defined as [48]:
$$C_{3x}(\tau_1, \tau_2) = E\{x(t)\,x(t+\tau_1)\,x(t+\tau_2)\} \qquad (2)$$
Thus, the bispectrum, $B(f_1, f_2)$, defined as the Fourier transform of the third order cumulant, becomes:
$$B(f_1, f_2) = \sum_{\tau_1=-\infty}^{\infty} \sum_{\tau_2=-\infty}^{\infty} C_{3x}(\tau_1, \tau_2)\, e^{-j 2\pi (f_1 \tau_1 + f_2 \tau_2)} \qquad (3)$$
The bispectrum has a specific symmetrical property that is derived from the symmetrical property of the third order cumulant [46], which results in similarity of the six regions of the bispectrum, shown as follows:
$$B(f_1, f_2) = B(f_2, f_1) = B(-f_1-f_2, f_2) = B(f_2, -f_1-f_2) = B(f_1, -f_1-f_2) = B(-f_1-f_2, f_1) \qquad (4)$$
Because the bispectrum matrix has redundant regions as described in (4), it is sufficient to extract features from only one quadrant of the bispectrum matrix, and because the FFT used in the calculation of the bispectrum may produce complex values, the absolute value of the bispectrum is used. The pseudocode of the bispectrum calculation in the feature extraction process for EEG signals, derived from (1)-(4), is summarized in Algorithm 1 as follows:
Algorithm 1: Bispectrum calculation in feature extraction algorithm.
2. For channel = 1 to C
3. Calculate autocorrelation signal X to the defined lag as (2)
4. Construct symmetrical matrix C3x for the first quadrant
7. Take only 1 quadrant of bispectrum matrix
8. Take the absolute value of the bispectrum matrix
9. End For
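For a single channel, the calculation can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the implementation used in the experiments: it uses a biased estimate of the third order cumulant, no lag window, a hypothetical maximum lag of 16 and a 128-point 2D FFT (so that one quadrant is 64 × 64); the function names are illustrative.

```python
import numpy as np


def third_order_cumulant(x, max_lag):
    """Biased estimate of C3x(t1, t2) = E{x(t) x(t+t1) x(t+t2)} for |t1|, |t2| <= max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # zero mean assumption
    n = x.size
    lags = range(-max_lag, max_lag + 1)
    c3 = np.zeros((2 * max_lag + 1, 2 * max_lag + 1))
    for i, t1 in enumerate(lags):
        for j, t2 in enumerate(lags):
            lo = max(0, -t1, -t2)         # keep t, t+t1 and t+t2 inside [0, n)
            hi = min(n, n - t1, n - t2)
            if hi > lo:
                t = np.arange(lo, hi)
                c3[i, j] = np.sum(x[t] * x[t + t1] * x[t + t2]) / n
    return c3


def bispectrum_quadrant(x, max_lag=16, nfft=128):
    """Magnitude of one quadrant of the bispectrum (2D FFT of the cumulant)."""
    c3 = third_order_cumulant(x, max_lag)
    # Place lag (0, 0) at index [0, 0]; negative lags wrap to the end of the array.
    idx = np.arange(-max_lag, max_lag + 1) % nfft
    padded = np.zeros((nfft, nfft))
    padded[np.ix_(idx, idx)] = c3
    bis = np.fft.fft2(padded)
    return np.abs(bis[: nfft // 2, : nfft // 2])


# Synthetic stand-in for one EEG channel.
rng = np.random.default_rng(0)
channel = rng.standard_normal(1024)
quadrant = bispectrum_quadrant(channel)
print(quadrant.shape)
```

Running `bispectrum_quadrant` once per channel corresponds to the loop over channels in Algorithm 1.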

The 3D Pyramid Filter
The output of the previous step is a one-quadrant bispectrum matrix, which is a 64 × 64 matrix for each of the 32 EEG channels, equaling a total of 131,072 elements; this number of elements is too large to be used directly for calculating the extracted features. To reduce the size of these feature vectors, we propose a filtering mechanism that applies 3D pyramid-shaped filters to the bispectrum values so that the bispectrum value at the center of the pyramid becomes the most significant value. From the filtered area, one or more statistical properties are derived and calculated as the extracted features, which will be described in the next subsection.
To find the best filtering mechanism, two filter models are proposed, as shown in Figure 3: the non-overlapping filters with various sizes at the base and the overlapping filters with equal sizes at the base. Figure 2b shows that the bispectrum usually gathers near the center; thus, in this area, the filters are dense and the bases are small. At the higher frequencies, the bispectrum usually has a very low value, so in this area, the filters are sparse, and the bases are large. Therefore, in the non-overlapping filters, we use 5 × 5 filters (Figure 3a), with the size of the filter varying (32, 16, 8, 4 and 4) along the x- and y-axes.
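The non-overlapping layout can be sketched as follows. The sketch makes several assumptions that are not specified above: each pyramid is built as the outer product of two triangular windows whose continuous peak height is one, the feature of each region is the mean of the weighted values, and the percentage is taken relative to the sum of all region means. The function names are illustrative.

```python
import numpy as np

SIZES = (32, 16, 8, 4, 4)  # base lengths along each axis; they sum to 64


def pyramid(length, width):
    """Pyramid-shaped weights: outer product of two triangular windows."""
    def tri(n):
        centre = (n - 1) / 2
        # Sampled triangle; for even n the samples straddle the unit peak.
        return 1 - np.abs(np.arange(n) - centre) / (n / 2)
    return np.outer(tri(length), tri(width))


def filtered_mean_percent(bis):
    """Mean of each pyramid-weighted region as a percentage of the sum of means."""
    edges = np.concatenate(([0], np.cumsum(SIZES)))
    means = np.empty((len(SIZES), len(SIZES)))
    for m in range(len(SIZES)):
        for n in range(len(SIZES)):
            r0, r1 = edges[m], edges[m + 1]
            c0, c1 = edges[n], edges[n + 1]
            region = bis[r0:r1, c0:c1] * pyramid(r1 - r0, c1 - c0)
            means[m, n] = region.mean()
    return 100 * means / means.sum()


# Synthetic stand-in for a 64 x 64 one-quadrant bispectrum magnitude matrix.
rng = np.random.default_rng(1)
bis = np.abs(rng.standard_normal((64, 64)))
features = filtered_mean_percent(bis)
print(features.shape)
```

With this layout, each 64 × 64 matrix is reduced to a 5 × 5 grid of values per channel.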
By increasing the number of filters and overlapping them, the quantization process is expected to provide a better approximation; we therefore increased the number of filters to 7 × 7 and constructed the filters with overlapping areas at the base (Figure 3b). Each base has an equal size of 16 × 16 elements and overlaps the adjacent filters by 50%. However, the added complexity and the larger feature vector produced by the 7 × 7 filters should be considered as a trade-off. The height of the filters in both pyramid models is equal: in this case, it is one. To filter the bispectrum matrix, each selected area was multiplied by the corresponding filter. The filtering process results in several filtered matrices with their respective bispectrum values, and from these filtered bispectrum matrices, the features are calculated and extracted. The pseudocode of the filtering mechanism for the bispectrum matrices is summarized in Algorithm 2 as follows:
Algorithm 2: Bispectrum 3D filtering for feature extraction algorithm.
2. N = number of row filter
3. For channel = 1 to C
4.   for m = 1 to M
5.     for n = 1 to N
6.       Calculate length of rectangle base of the filter
7.       Calculate width of rectangle base of the filter
8.       Construct 2D filter for length side of rectangle with triangle type
9.       Construct 2D filter for width side of rectangle with triangle type
10.      Construct 3D pyramid filter based on 2D filter
11.      Take bispectrum value with the length and width of the rectangle
12.      Multiply bispectrum value with the 3D filter
13.      Perform calculation for feature
14.      Take only the non-redundant triangular half of the feature matrix
15.      Transform feature matrix to feature vector
16.    end
17.  end
18. end
The implementation of the filtering process using the 3D pyramid filters is illustrated in Figure 4. This example shows the effect of 3D pyramid filtering on the bispectrum matrix from the EEG signal of Person 1-Video 1-Channel 1. The original bispectrum matrix is shown in Figure 4a; one quarter of it is then extracted, resulting in the 64 × 64 matrix shown in Figure 4b. The filtering process begins by multiplying this matrix by the constructed 3D pyramid filters; for this example, we use the 5 × 5 non-overlapping filters (Figure 3a). The multiplication is conducted for each area of the matrix according to the size of the pyramid base; for example, the bispectrum matrix region (0:32, 0:32) is multiplied by the filter whose base size is 32 × 32 and whose height is one. Therefore, for the 5 × 5 non-overlapping filters, the filtering process is performed 25 times through this multiplication step, and the result is shown in Figure 4c. The features are then calculated from the filtered bispectrum matrices; in this example, the feature is the mean (average) of each filtered bispectrum matrix, and the result is shown in Figure 4d.
As the bispectrum matrix shown in Figure 4b is symmetrical, the feature-extracted matrix (such as in Figure 4d) is also symmetrical; therefore, it is sufficient to take just half of the whole matrix. For example, for the 5 × 5 filters, the number of non-redundant elements of the matrix is equal to the sum of the arithmetic sequence $\sum_{n=1}^{5} n = 15$, resulting in 15 dimensions of feature vectors, as shown in Figure 4e. Thus, for the full 32 EEG channels used, the dimension of the feature vectors will be 15 × 32 = 480, as shown in Figure 4f. Similarly, for the 7 × 7 filters, there will be 28 dimensions of feature vector per channel, resulting in 28 × 32 = 896 dimensions of feature vectors across all channels.
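The 5 × 5 non-overlapping filtering and the reduction to 15 features per channel can be sketched as below. The helper names are hypothetical, and two details are assumptions not stated in the text: the pyramid is built as the element-wise minimum of two triangular profiles, and each profile is normalized so that its peak equals the filter height of one.

```python
import numpy as np

BASE_SIZES = (32, 16, 8, 4, 4)  # base sizes along each axis, summing to 64


def triangle(n, height=1.0):
    """Triangular profile of n samples whose peak equals `height`."""
    ramp = 1.0 - np.abs(2.0 * (np.arange(n) + 0.5) / n - 1.0)
    return height * ramp / ramp.max()


def pyramid_filter(length, width, height=1.0):
    """Pyramid over a length x width base (assumed: minimum of two triangles)."""
    return np.minimum.outer(triangle(length, height), triangle(width, height))


def mean_features(bispec, sizes=BASE_SIZES):
    """Mean of each pyramid-filtered region of a one-quadrant bispectrum."""
    edges = np.concatenate(([0], np.cumsum(sizes)))
    k = len(sizes)
    feats = np.empty((k, k))
    for m in range(k):
        for n in range(k):
            region = bispec[edges[m]:edges[m + 1], edges[n]:edges[n + 1]]
            feats[m, n] = np.mean(region * pyramid_filter(sizes[m], sizes[n]))
    return feats


def feature_vector(feats):
    """Keep the non-redundant upper triangle, including the diagonal."""
    rows, cols = np.triu_indices(feats.shape[0])
    return feats[rows, cols]
```

With five filters per axis the upper triangle holds 5 + 4 + 3 + 2 + 1 = 15 values, so concatenating the vectors of 32 channels gives the 480 dimensions quoted in the text.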

Feature Types Based on the Bispectrum
Bispectrum analysis has also been used for an EEG-based emotion recognition system by Kumar [20], with several entropies, values and moments taken as the features, which we adapted as one of the feature extraction modes in our experiments (Mode #1 feature extraction); however, this mode produces high-dimensional feature vectors and thus requires a high computation cost. In this work, we propose a simpler feature, the mean of the bispectrum value, as Mode #3 feature extraction, producing only one feature value for each filter area and thereby reducing the dimension of the feature vectors.
In previous work, we found that the energy percentage performed well as a feature of an EEG signal [49]; therefore, here, we considered the percentage value as a feature. The percentage value of the entropies and moments of Mode #1 becomes the Mode #2 feature extraction, and the percentage of the mean becomes the Mode #4 feature extraction. Suppose F_m is a feature and M is the number of features per channel; then, the percentage value of the feature, FP_m, is defined as:
$$FP_m = \frac{F_m}{\sum_{k=1}^{M} F_k} \times 100\%$$
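A small sketch of the percentage feature follows, assuming it is each feature divided by the sum of the M features of the same channel, expressed in percent. The function name is hypothetical.

```python
def percentage_features(features):
    """Convert the M features of one channel into percentages of their sum."""
    total = sum(features)
    if total == 0:
        raise ValueError("features sum to zero; percentages are undefined")
    return [100.0 * f / total for f in features]
```

Under this definition the result does not change when all features of a channel are scaled by the same factor: `percentage_features([1.0, 3.0])` and `percentage_features([2.0, 6.0])` both give `[25.0, 75.0]`.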

Back-Propagation Neural Networks
BPNN is one of the ANN classifiers built on the multi-layer perceptron (MLP) architecture. BPNN provides a mechanism to update the weights and biases of each neuron's connection by propagating and minimizing the error in each iteration (epoch). The BPNN used in this work has a three-layer architecture: one input layer, one hidden layer and one output layer. The number of neurons in the input layer was equal to the dimension of the feature vector, while the number of hidden neurons was half the number of input neurons. Because the classifications for arousal and valence were conducted separately, each emotion was divided into a high and a low class, resulting in only one neuron in the output layer. The Nguyen-Widrow method [22] was used to initialize the weight and bias of each neuron.
In the feed-forward part of the training phase, for each epoch, the input vectors were presented one at a time to the input layer, and their values were passed to the hidden layer. In the hidden layer, the value received by each neuron was multiplied by its weight and added to the bias. The activation function used was a sigmoid function, f(x) = 1/(1 + e^{-x}). In the output layer, the value received by the neuron was likewise multiplied by its weight and added to the bias. The activation function at the output layer was also a sigmoid function, resulting in the output value of the ANN.
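The feed-forward pass described above can be sketched in a few lines, assuming the standard logistic sigmoid f(x) = 1/(1 + e^{-x}) and a single output neuron. The function names and the list-of-lists weight layout are illustrative choices, not the authors' code.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then sigmoid."""
    return [
        sigmoid(sum(w * v for w, v in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input -> hidden -> single output neuron, giving a value in (0, 1)."""
    hidden = layer(x, w_hidden, b_hidden)
    return layer(hidden, w_out, b_out)[0]
```

With one sigmoid output, a threshold such as 0.5 would separate the high and low classes; the threshold is an assumption here, as the text does not state how the output is mapped to a class.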
In the back-propagation part of the algorithm, the calculated error between the values in the output layer (the predicted emotion) and the target (the actual emotion) was propagated back to adjust the weights and the biases of the neurons in the output and hidden layers [23]. The learning process continued until either the preferred minimum error, measured here as the root mean sum squared error (RMSSE), was achieved or the maximum number of epochs was reached.
At the testing stage, the testing data were fed forward from the input layer through the hidden layer to the output layer using the weights and biases obtained from the learning stage. The testing stage was similar to the feed-forward part of the learning stage, but the error between the predicted emotion and the stimulated emotion was calculated to produce the recognition rate of the automatic emotion recognition system.

Probabilistic Neural Networks
PNN is an ANN classifier that is based on Bayes' theorem. With a Bayes classifier, the datum X belongs to the class Cj when P(Cj|X) has the largest probability value.
To calculate the probability P(Cj|X), it is necessary to estimate the conditional probability P(X|Cj) and the a priori probability P(Cj) of each class Cj. The calculation of P(X) is not required because P(X) is the same for every class. The a priori probability P(Cj) can be calculated from the entire training data. The conditional probability P(X|Cj) is estimated using the Parzen window p.d.f. estimation. By assuming a Gaussian distribution, the Parzen window p.d.f. estimates p_j(x) as:
$$p_j(x) = \frac{1}{(2\pi)^{d/2}\,\sigma^{d}\,N} \sum_{i=1}^{N} \exp\left(-\frac{\lVert x - x_{j,i} \rVert^{2}}{2\sigma^{2}}\right)$$
Here, x is the sample whose probability is being estimated, x_{j,i} is the i-th training sample of class Cj, j is the class number, N is the number of training data in class Cj and d is the dimension of the feature vector. The value of σ (sigma) in the Gaussian distribution is the standard deviation, or the smoothing parameter of the Gaussian curve; however, the actual standard deviation was unknown and had to be determined.
The PNN consists of four layers: an input layer, a pattern layer, a summation layer and a decision layer. In the input layer, which consists of one neuron, the input vector was received and then forwarded to the pattern layer. Each neuron in the pattern layer represents the training data that belong to each class. We used 50% of the data samples as the training data. Thus, the number of neurons in the pattern layer was equal to 50% of the number of data samples. In the pattern layer, the input vector was compared to the training data in each class. In the summation layer, the Parzen window p.d.f. formula was used to determine the class of the data. The largest probability value determined the class to which the datum belongs.
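The layer structure above can be illustrated with a compact sketch, assuming the usual Gaussian Parzen window estimate with a single smoothing parameter σ shared by all classes and priors taken from the class proportions in the training set. The names `parzen_density` and `pnn_classify` are hypothetical.

```python
import math


def parzen_density(x, samples, sigma):
    """Gaussian Parzen window estimate of p(x | class) from the class samples."""
    d = len(x)
    norm = (2.0 * math.pi) ** (d / 2.0) * sigma ** d * len(samples)
    total = sum(
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, s)) / (2.0 * sigma ** 2))
        for s in samples  # pattern layer: one kernel per training sample
    )
    return total / norm   # summation layer: average the kernels of the class


def pnn_classify(x, training, sigma):
    """Return the label maximizing prior * density (decision layer).

    `training` maps each class label to its list of training vectors.
    """
    n_total = sum(len(s) for s in training.values())
    scores = {
        label: (len(s) / n_total) * parzen_density(x, s, sigma)
        for label, s in training.items()
    }
    return max(scores, key=scores.get)
```

For example, with `training = {"low": [[0, 0], [0, 1]], "high": [[5, 5], [5, 6]]}` and `sigma = 1.0`, the point `[0.2, 0.3]` is assigned to `"low"`. For high-dimensional feature vectors the factor `sigma ** d` can overflow or underflow, so a practical version would typically compare log-scores instead.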

Experiment and Results
The development of an automatic emotion recognition system using a new methodology for 3D filtered bispectrum feature extraction and the ANN classifier has been presented in Section 3; to verify the performance of the proposed methodology, five experiments were carried out (Exp. #1-Exp. #5). In Exp. #1, to find the best feature extraction type based on bispectrum analysis, four feature extraction types were compared. In Exp. #2, two types of 3D pyramid filters were compared, namely the overlapping and the non-overlapping filters. To reduce the computation cost, in Exp. #3, the number of EEG channels used in the system was reduced. In Exp. #4, to increase the recognition rate, the number of samples in the training data was increased to 90%. In the final experiment, Exp. #5, two types of ANN classifiers, the BPNN and the PNN, were compared in order to find the best classifier.
The DEAP database [4] provides EEG signals from 32 people and 40 video stimuli, forming 1280 person-video samples. Originally, the DEAP database used four emotion classes, which are the HAHV (Class 1), the LAHV (Class 2), the LALV (Class 3) and the HALV (Class 4) emotions, where H = high, L = low, A = arousal and V = valence, with a balanced sample size (320 samples) resulting from 32 people and 10 video stimuli per class. As indicated in the data acquisition section, this research principally takes two binary classes of emotion in the 2D arousal and valence planes: high arousal (Class 1 + Class 4)/low arousal (Class 2 + Class 3) and high valence (Class 1 + Class 2)/low valence (Class 3 + Class 4), with 640 samples for each class.
In the following experiments (apart from Exp. #4), 50% of the samples were used as the training data, while the remaining data were used as the testing data, producing 320 samples per class for each of the training and testing parts. All thirty-two people were represented in the training data, but only half of the video stimuli were included. For instance, Videos 1-5 (from Class 1) and Videos 31-35 (from Class 4) were chosen as high arousal training data, while Videos 6-10 (from Class 1) and Videos 36-40 (from Class 4) were chosen as testing data. Similarly, Videos 11-15 (from Class 2) and Videos 21-25 (from Class 3) were chosen as low arousal training data, and Videos 16-20 (from Class 2) and Videos 26-30 (from Class 3) were chosen as testing data.
For Exp. #1-Exp. #4, BPNNs were used as the benchmark classifier, with the parameters set to α (learning rate) = 0.2 and β (momentum) = 0.3. The number of input neurons was equal to the dimension of the feature vectors, the number of hidden neurons was equal to half the number of input neurons, and there was one output neuron. In Exp. #5, the PNN was compared with the results of the benchmark classifier.

Exp. #1: Comparison of Bispectrum-Based Feature Extraction Modes
The experiments in Exp. #1 were carried out to compare and select the best feature extraction mode from the bispectrum value. There were four types of features to consider: the entropy and moment (NBE, NBSE, MMOB, FOSM and SOSM as in [20]) with an additional standard deviation (STD) value as Mode #1, the percentage value of Mode #1 as Mode #2, the mean as Mode #3 and the percentage of the mean as Mode #4. A 5 × 5 3D pyramid filter was used in this experiment, producing 480 dimensions of feature vectors for Mode #3 and Mode #4. The dimension of the feature vectors for Mode #1 and Mode #2 is 2880 because there are six features from each filter area (480 × 6 = 2880). To reduce this large dimension, each feature vector was dimensionally reduced using PCA with 99% of the eigenvalues. The comparison of feature extraction Mode #1-Mode #4 is presented in Table 1. Table 1 shows that the mean percentage feature (Mode #4) provides the highest recognition rate (74.22% for arousal and 77.58% for valence). From the recognition rates in Table 1, it can be seen that for both arousal and valence emotion, Mode #1 and Mode #4 gave better recognition rates than Mode #2 and Mode #3, with Mode #4 providing a slightly better recognition rate than Mode #1. To investigate whether any of the feature types is significantly different, one-way ANOVA was used to compare the recognition rates of the four feature types (Mode #1-Mode #4), and the resultant p-values for arousal and valence, both separately and combined, were all $< 10^{-4}$. This result called for Tukey's HSD test to determine which of the feature types are significantly different.
Table 2 presents the Tukey HSD test results for the pair-wise comparisons. For the arousal emotion, Mode #1 is statistically different from Mode #4, with p-value = 0.0114 (p < 0.05), while for the valence emotion, Mode #1 did not differ significantly from Mode #4, with p-value = 0.9716 (p > 0.05). Moreover, Mode #1 also has the shortcoming of a more complicated feature calculation compared to Mode #4. From this result, we can conclude that for the arousal emotion, the mean percentage of the bispectrum (Mode #4) is a better feature type than entropy and moment (Mode #1), while for the valence emotion, Mode #1 and Mode #4 are similar. However, both Mode #1 and Mode #4 are significantly different from Mode #2 and Mode #3 (p < 0.05). This experiment also confirmed our hypothesis that the mean percentage of the bispectrum (Mode #4) provides more accurate information than the absolute mean value (Mode #3). This finding implies that for the same emotion, the relative (percentage) bispectrum values from the EEG signals may be similar among people, although different people may have different amplitudes in the bispectrum values.
The newly-developed methodology proposed in this paper is the use of bispectrum analysis and 3D pyramid filtering for the feature extraction subsystem. The idea of using a 3D pyramid filter as the filtering method is to emphasize the values in the center of the filter and devote less attention to the values at the edges. As Table 1 shows, the mean percentage (Mode #4) with 5 × 5 non-overlapping filters provides a good recognition rate. To improve the recognition rate, the number of filters used for feature extraction was increased to 7 × 7 filters with overlapping areas at the base of the pyramid model. The hypothesis is that by using overlapping filters, the values at the edge of one filter, which have less importance in that filter, will be given higher importance in the adjacent filters; therefore, it is expected that every element of the bispectrum matrix has equal importance. In this experiment (Exp. #2), we assessed whether increasing the number of filters increases the recognition rate. As a first step, we compared only the 5 × 5 filters with the 7 × 7 filters; if the higher number of filters results in a higher recognition rate, then the number of filters could be increased further.
In this experiment, we used only the mean percentage feature (Mode #4), following our findings in the previous experiment. The feature vectors were dimensionally reduced using PCA with 99% of the eigenvalues. The BPNN was used as the benchmark classifier, and the recognition rates are presented in Table 3. Table 3 shows that the 5 × 5 non-overlapping filters gave a higher recognition rate than the 7 × 7 overlapping filters, which deviates from our hypothesis. The superiority of the 5 × 5 non-overlapping filters is obvious for the arousal emotion, with a difference of up to 10%. One-way ANOVA was used to compare the recognition rates obtained with the different filter types in the feature extraction process (5 × 5 non-overlapping filters and 7 × 7 overlapping filters). The resultant p-value of the combined arousal and valence emotion recognition rate was p = 0.126 (p > 0.05), which implies that the two filter types were not significantly different. However, the p-value for the arousal emotion recognition rate was p = 0.001 (p < 0.05), and that for the valence emotion recognition rate was p = 0.685 (p > 0.05). These p-values imply that for the arousal emotion, the 5 × 5 non-overlapping filters were better than the 7 × 7 overlapping filters, whereas for the valence emotion, the filters did not differ significantly.
The finding of Exp. #2 deviated from our hypothesis that increasing the number of filters and using an overlapping strategy in the adjacent filters would increase the recognition rate. The analysis is as follows. The more filters used in the feature extraction process, the higher the dimension of the feature vectors produced. However, a higher dimension of feature vectors produces higher cumulative errors, resulting in lower recognition rates. Another reason might be that in the 5 × 5 non-overlapping filters, more attention (through a larger number of smaller filters) is given to the area where the bispectrum matrix elements show high amplitudes, whereas in the 7 × 7 overlapping filters, all elements of the bispectrum matrix have the same significance. This finding implies that the higher amplitude areas of the bispectrum matrix are more important than the lower ones. Further optimization of the filter type should be investigated as future work, for example by using a larger number of filters of different sizes, with smaller filters in the area where the bispectrum amplitudes are high.

Exp. #3: Channel Selection for Emotion Classification
Research has shown that emotions, and their related brain signals, are associated with the frontal lobe of the cerebral cortex; therefore, the frontal EEG electrodes have often been used in emotion classification research. In this experiment, the effect of channel selection on the recognition rate of the emotion classification system is assessed. The selection of eight and 14 channels is in accordance with our previous work [3]. The reduced eight channels were selected because they are positioned on the frontal and near-frontal areas of the cerebral cortex (frontal parietal and anterior-frontal), while the reduced 14 channels were chosen in accordance with the channels available in the Emotiv BCI device. The channel descriptions and the results of this experiment are given in Table 4. BPNN was used as the classifier, and the experiment was repeated five times. Table 4 shows the recognition rates of the reduced eight and 14 channels compared to the recognition rate of the full 32 channels. One-way ANOVA was used to compare the recognition rates obtained with the complete and the reduced numbers of EEG channels. The p-value of the combined arousal and valence recognition rate was p = 0.3 (p > 0.05), while for the arousal emotion recognition rate, p = 0.485 (p > 0.05), and for the valence emotion recognition rate, p = 0.548 (p > 0.05); this finding shows that the results for the reduced eight and 14 channels did not differ significantly from those for the complete 32 channels. This experiment showed that the 14 channels in the BCI equipment are sufficient to conduct emotion classification, with only slight differences compared to the 32 EEG channels. However, this experiment ignored the possibility that the signal quality of the BCI equipment might be lower than that of medical-grade EEG. This experiment also shows that when the research calls for fewer electrodes to reduce the complexity and computation cost, the use of eight electrodes from the frontal, frontal parietal and anterior-frontal regions would be sufficient.

Exp. #4: Comparison of the Number of Training Samples
To increase the recognition rate, the proportion of samples used as training data was increased in this experiment. In the previous experiments (Exp. #1-Exp. #3), the training data comprised 50% of the whole dataset, and the remaining samples were used for testing. In this experiment, the amount of training data was increased to 90% of all data samples, and classification was conducted using a 10-fold cross-validation method. In each fold, 1152 of the 1280 samples available in the dataset were used as training data. The division of the 10-fold cross-validation was designed so that if any of the video stimuli produced a different emotion from the desired emotion class, this would be reflected in the recognition result. Therefore, for Fold 1 (F1), the samples from all people for Videos 1, 11, 21 and 31 were used as testing data, while the rest were used for training. Similarly, for Fold 2 (F2), the samples for Videos 2, 12, 22 and 32 were used for testing, and the rest were used for training, and so on, until Fold 10 (F10). This experiment was conducted using a BPNN classifier, and the reduced eight channels were used as the input.
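The fold layout described above can be reproduced with a short sketch. The function name and the (person, video) tuple representation are illustrative; the assignment rule, in which fold k holds out videos k, k + 10, k + 20 and k + 30, follows the text.

```python
def video_folds(n_people=32, n_videos=40, n_folds=10):
    """Build (train, test) lists of (person, video) pairs for each fold.

    Fold k holds out videos k, k + n_folds, k + 2 * n_folds, ... for every
    person, so each fold tests on one video from each original emotion class.
    """
    people = range(1, n_people + 1)
    videos = range(1, n_videos + 1)
    folds = []
    for k in range(1, n_folds + 1):
        test_videos = set(range(k, n_videos + 1, n_folds))
        test = [(p, v) for p in people for v in sorted(test_videos)]
        train = [(p, v) for p in people for v in videos if v not in test_videos]
        folds.append((train, test))
    return folds
```

Each fold then contains 36 × 32 = 1152 training samples and 4 × 32 = 128 testing samples, matching the 90%/10% split of the 1280 samples.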
From the data shown in Table 5, it can be seen that the recognition rates from the 10-fold cross-validation are 92.92% for arousal and 93.51% for valence. One-way ANOVA resulted in p-value = 0.669 (p > 0.05); thus, there is no significant difference between the folds. This experiment implies that by increasing the training data to 90%, the recognition rate increased by an average of 18.96% (18.47% and 19.42% for arousal and valence, respectively), and the comparison of recognition rates between using 50% training data and 90% training data is depicted in Figure 5. From this result, it can be concluded that the amount of training data greatly influences the results obtained, and the higher the ratio of the training data, the higher the recognition rate.
In a leave-subject-out (LSO) training-testing method, each fold in this experiment represented the video used as the stimulus, instead of the person, as the subject. The recognition rate was on average 53.28% for the arousal emotion and 57.19% for the valence emotion. This finding was comparable to other recent research results using the same database, such as the leave-subject-out scheme over persons (53.42% for arousal and 52.05% for valence) [37] and a random-subset 10-fold leave-p-out cross-validation (all between 40% and 50% accuracy) [50].

Exp. #5: Comparison of BPNN and PNN Classifiers
To find the best classifier for the automatic emotion recognition system, two types of ANN classifiers were compared, the BPNN and the PNN. The drawback of the BPNN was the computing time needed for the training phase of the classifier; thus, the PNN, which is generally faster than the BPNN, was used in this experiment.
The BPNN parameter settings were the same as in the previous experiments (Exp. #1-Exp. #4), whereas for the PNN, it is difficult to find a good choice of smoothing parameter (σ), because in the original PNN classifier, there is no mathematical calculation to determine the best smoothing parameter. In this experiment, the PNN smoothing parameter (σ) was chosen randomly, and the trial was repeated 60 times to find the highest recognition rate. We used only eight channels from the EEG signals, as the previous experiment showed that using eight channels would be sufficient. The PNN recognition results for various randomly-generated smoothing parameters (σ) can be seen in Figure 6.
[...] ones; thus, the far-left-hand side of the ROC graph is of greater interest, and from this point of view, the BPNN performed better than the PNN.
Although the maximum value achieved was not significantly different, the computing time of the PNN for a single run was about 200 times faster than that of the BPNN. However, the PNN requires an optimization method for finding the best sigma value, whereas the BPNN does not require prior optimization; thus, the iterative training of the BPNN needs to be conducted only once. Nevertheless, for this experiment, the total time for running 60 trials of the PNN (60 × 2.09 = 125.4 s) is still about 3.5 times shorter than that of a single run of the BPNN (448 s). Using one-way ANOVA, the time difference between BPNN and PNN shows a significant difference with p < 10

Discussion and Conclusions
In our developed automatic emotion recognition system, we propose a new methodology for EEG signal feature extraction using bispectrum analysis, a 3D pyramid filtering method and an ANN-based classifier. The proposed method could recognize two binary types of emotion (high/low arousal and high/low valence) from the EEG input signals provided by the DEAP benchmark database. In the feature extraction mode experiment (Exp. #1), the mean percentage of the bispectrum provided the best recognition rate with lower complexity (74.22% for arousal and 77.58% for valence). To reduce and extract the features from the bispectrum values, we propose a new method of filtering by means of 3D pyramid filters; by comparing the 5 × 5 non-overlapping and the 7 × 7 overlapping filters (Exp. #2), we found that the 5 × 5 non-overlapping filters with various pyramid base sizes provide the highest recognition rate.
The reduction of channels (Exp. #3) did not significantly affect the recognition rate. Therefore, we conclude that the channels provided in the BCI equipment are sufficient to conduct emotion classification, and when it is more suitable to use fewer electrodes to reduce complexity and computation cost, the choice of eight electrodes in the frontal, frontal parietal and anterior-frontal regions is also sufficient. By increasing the number of training data samples to 90% of the entire dataset (Exp. #4), the recognition rate was increased by 18.96%; this result shows that the amount of training data used greatly influences the results obtained. We also compared the benchmark BPNN classifier with a PNN classifier for the eight EEG channels' input (Exp. #5), and we achieved a slightly better result: 76.09% for arousal and 75.31% for valence. Although the result is not significantly higher, the computing time of the PNN to achieve the maximum recognition rate is about 3.5 times shorter than that of the BPNN.
The proposed bispectrum-based feature extraction gave results comparable with those of some recent research using different feature extraction methods, such as Discrete Wavelet Transform-Relative Wavelet Energy (DWT-RWE) [3], Short Time Fourier Transform (STFT) [50], Power Spectral Density (PSD) [37] and Empirical Mode Decomposition (EMD) with Sample Entropy (SampEn) [53]. Future studies on EEG-based emotion recognition systems should focus on improving the feature calculation from the bispectrum values. This research utilized two non-linear classifiers, namely the BPNN and the PNN. The results obtained with these classifiers were also comparable to those of other recent research using other linear and non-linear classifiers, such as SVM [50,53] and DLN [37]. However, newer classifiers, such as group sparse canonical correlation analysis (GCCA) [54] and sparse deep belief networks (SDBN) [55], have not yet been used, leaving room for future work.


Figure 1. Block diagram of the EEG-based emotion recognition system.

Figure 2. Comparison between (a) the power spectrum, which shows dominant frequencies, and (b) the bispectrum, which shows dominant frequencies and their correlation (if any).
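The difference between the two spectra can be made concrete with a minimal sketch of a direct (FFT-based) bispectrum estimate, B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)], averaged over segments. The function name `bispectrum_direct`, the segment length, the Hann window and the use of non-overlapping segments are assumptions for illustration; the estimator settings used in the experiments are not restated here.

```python
import numpy as np


def bispectrum_direct(x, nfft=64, seg_len=64):
    """Direct bispectrum estimate averaged over non-overlapping segments.

    Illustrative sketch: windowing and segmenting choices are assumptions.
    """
    x = np.asarray(x, dtype=float)
    n_seg = len(x) // seg_len
    # Lookup table for the sum frequency f1 + f2, wrapped modulo nfft.
    idx = np.arange(nfft)
    sum_idx = (idx[:, None] + idx[None, :]) % nfft
    B = np.zeros((nfft, nfft), dtype=complex)
    for k in range(n_seg):
        seg = x[k * seg_len:(k + 1) * seg_len]
        seg = seg - seg.mean()  # remove the DC offset of the segment
        X = np.fft.fft(seg * np.hanning(seg_len), nfft)
        # Triple product X(f1) X(f2) X*(f1 + f2) for all frequency pairs.
        B += X[:, None] * X[None, :] * np.conj(X[sum_idx])
    return B / max(n_seg, 1)
```

For a signal containing components at f1, f2 and f1 + f2 whose phases are coupled, the magnitude of this estimate peaks at the pair (f1, f2), which is information the power spectrum alone does not expose.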

Algorithm 2: Bispectrum 3D filtering for feature extraction algorithm.
1. Define: M = number of column filters
2. N = number of row filters
3. For channel = 1 to C
4.
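Because the listing of Algorithm 2 is incomplete here, the following is only a hedged sketch of how non-overlapping pyramid filtering with a mean percentage feature could look for one channel. It assumes that the bispectrum magnitude matrix is tiled into N × M blocks, that each block is weighted by a pyramid-shaped kernel and averaged, and that each block mean is then expressed as a percentage of the sum of all block means. The function names, the kernel shape and the percentage normalization are assumptions, not the published algorithm.

```python
import numpy as np


def pyramid_kernel(h, w):
    """Pyramid-shaped weights peaking at the block centre (assumed shape)."""
    ry = 1.0 - np.abs(np.linspace(-1.0, 1.0, h + 2)[1:-1])
    rx = 1.0 - np.abs(np.linspace(-1.0, 1.0, w + 2)[1:-1])
    return np.minimum(ry[:, None], rx[None, :])


def pyramid_features(bisp_mag, n_rows=5, n_cols=5):
    """Mean percentage features from non-overlapping pyramid filters.

    n_rows corresponds to N and n_cols to M in Algorithm 2 (assumed mapping).
    """
    bisp_mag = np.asarray(bisp_mag, dtype=float)
    bh = bisp_mag.shape[0] // n_rows  # block height
    bw = bisp_mag.shape[1] // n_cols  # block width
    kernel = pyramid_kernel(bh, bw)
    means = np.empty((n_rows, n_cols))
    for r in range(n_rows):
        for c in range(n_cols):
            block = bisp_mag[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            means[r, c] = np.mean(block * kernel)  # weighted block mean
    total = means.sum()
    # Express each block mean as a percentage of the total.
    pct = 100.0 * means / total if total > 0 else np.zeros_like(means)
    return pct.ravel()
```

Applied to each channel in turn (the loop of step 3), the per-channel vectors would be concatenated into the full feature vector. The sketch does not restrict the features to the non-redundant region of the bispectrum.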

Figure 4. Bispectrum filtering step: (a) the original bispectrum contour plot, (b) one-quarter of the bispectrum matrix, (c) the result of the filtering process, (d) the mean as the feature, (e) the feature vector constructed from the non-redundant region of the quantized matrix and (f) the full feature vectors from the 32 channels.

Figure 5. Recognition rate for different numbers of training data samples.

Figure 7. ROC plot of BPNN and PNN as the classifier.

Table 1. Recognition rate comparison of various feature types. MMOB, mean-magnitude of bispectrum; FOSM, first order spectral moment; SOSM, second order spectral moment; STD, standard deviation.
1 Shown is the maximum recognition rate from 5 repeated experiments.

Table 2. Statistical significance results of the Tukey-Kramer HSD test for the arousal and valence recognition rates.

Table 3. Recognition rate results from different filter types.

Table 4. Channel selection and description.

Table 5. Result of the 10-fold cross-validation recognition rate for mean percentage feature type from 8 EEG channels with 5 × 5 non-overlapped filters.

For Fold 1 (F1), sample data from the people with Videos 1, 11, 21 and 31 were used as testing data, while the rest were used for training. Similarly, for Fold 2 (F2), sample data from the people with Videos 2, 12, 22 and 32 were used for testing, and the rest were used for training, and so on, until Fold 10 (F10). This experiment was conducted using a BPNN classifier, and the reduced eight channels were used as the input.
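The fold assignment used for the 10-fold cross-validation (Videos 1, 11, 21 and 31 in Fold 1, Videos 2, 12, 22 and 32 in Fold 2, and so on) amounts to grouping videos by their index modulo 10. The sketch below illustrates this under the assumption that videos are numbered 1 to 40, as in the DEAP dataset; the helper names `fold_of_video` and `split_by_fold` are hypothetical.

```python
def fold_of_video(video_id, n_folds=10):
    """Return the 1-based fold index for a 1-based video number."""
    if video_id < 1:
        raise ValueError("video_id must be >= 1")
    return (video_id - 1) % n_folds + 1


def split_by_fold(video_ids, test_fold, n_folds=10):
    """Split video numbers into (train, test) lists for one fold."""
    test = [v for v in video_ids if fold_of_video(v, n_folds) == test_fold]
    train = [v for v in video_ids if fold_of_video(v, n_folds) != test_fold]
    return train, test


# Assumed numbering: 40 videos, indexed 1..40.
videos = list(range(1, 41))
train_f1, test_f1 = split_by_fold(videos, test_fold=1)
```

Every video appears in the test set of exactly one fold, so each sample is used for testing once across the ten folds.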