A Novel Permutation Entropy-Based EEG Channel Selection for Improving Epileptic Seizure Prediction

The key research aspects of detecting and predicting epileptic seizures using electroencephalography (EEG) signals are feature extraction and classification. This paper aims to develop a highly effective and accurate algorithm for seizure prediction. Efficient channel selection is one possible solution, as it can decrease the computational load significantly. In this research, we present a patient-specific optimization method for EEG channel selection based on permutation entropy (PE) values, employing K nearest neighbors (KNN) combined with a genetic algorithm (GA) for epileptic seizure prediction. The classifier is the well-known support vector machine (SVM), and the CHB-MIT Scalp EEG Database is used in this research. The classification results from 22 patients using the channels selected for each patient show a high prediction rate (average 92.42%) compared to the SVM testing results with all channels (71.13%). On average, the accuracy, sensitivity, and specificity with selected channels are improved by 10.58%, 23.57%, and 5.56%, respectively. In addition, four patient cases achieve over 90% accuracy, sensitivity, and specificity with just a few selected channels. The corresponding standard deviations are also smaller than those obtained using all channels, demonstrating that tailored channels are a robust way to optimize seizure prediction.


Introduction
Epilepsy is a serious brain disorder, second only to strokes in its effect. More than 50 million people worldwide are affected by epilepsy, and the symptoms of one-third of those are not controlled by anticonvulsant medication. Therefore, one of the critical objectives in seizure management in epileptic patients is its early detection and prediction to provide well-timed preventive interventions [1]. If epileptic seizures can be predicted in advance, the patients' unfortunate consequences can be alleviated. Unfortunately, despite decades of international efforts devoted to predicting seizures, seizure prediction remains an unsolved problem [2].
Two key components in research into seizure detection and prediction using epileptic electroencephalography (EEG) signals are feature extraction and classification [3,4]. Most of the existing research is patient-independent and trains models for all types of patients [5][6][7][8][9][10], while some EEG-based seizure detection algorithms are patient-dependent and adapt to individual patients. To reduce the computational load of real-time seizure prediction using EEG data, identifying the most relevant channels for seizure prediction is both important and effective. It can enable seizure-predicting wearable or implantable devices that need less complicated feature extraction when machine learning algorithms are developed for real-time analysis. In addition, a smaller number of EEG channels may be more convenient for patients.
However, channel selection is often not considered necessary in epileptic feature extraction. As to patient-specific feature extraction, although the benefits of patient-specific seizure prediction research have not yet been identified, we believe that discovering well-chosen channels tailored to an individual can lead to the uncovering of behavioral patterns in seizure activity through relations between neurophysiological characteristics and EEG channels [11], given the complex aspects of seizure onsets.
Even though much epileptic EEG feature-extraction research has been published, not many papers related to EEG channel selection have been reported over the last decades. Furthermore, the research about machine learning performance comparisons between results with selected channels and all channels is seldom found. Chang et al. [12] proposed that channel selection reduced the channel number from 22 to fewer than 6 channels, and it also saved 93.73% of the computation time. The best result showed a success rate of 70% in three-channel cases of the EEG database. Ibrahim et al. [13] also showed the seizure prediction probability by the selected channel, and the selected feature was higher than 70%, while the false-alarm probability was less than 30%. The channels were classified by a statistical frame. Chakrabarti et al. [14] applied an artificial neural network (ANN) and a principal component analysis (PCA) for the selection of epileptic EEG channels. The results revealed that the accuracy decreased simultaneously as the number of channels decreased. The highest accuracy of 86.7% was achieved with 18 channels out of 23 channels.
Nevertheless, none of those studies showed the machine learning validating performance comparisons between results with selected channels and results with all channels. Moctezuma and Molinas [15] decomposed the EEG data from each channel into different frequency bands using the empirical mode decomposition (EMD) or the discrete wavelet transform (DWT) for the channel selection. The results showed accuracies of up to 100% with only one EEG channel in the epileptic seizure classification, while all the test results of channels were less than 100%; however, this research only classified the seizure and non-seizure signals, not the pre-ictal signals. The classification performance to detect seizure EEG signals usually achieves high accuracy. Prasanna et al. [16] examined recent research to classify between seizure and non-seizure EEG signals. According to their review, the accuracy range that recent studies achieved was from 90% to almost 100%. This research, however, focuses on seizure prediction instead of seizure detection.
In this research, we confine the features to the channels and present a patient-dependent optimization method for EEG channel selection based on the permutation entropy (PE) values, employing K nearest neighbors (KNN) combined with a genetic algorithm (GA) for epileptic seizure prediction. In the last few decades, some seizure prediction studies have applied the GA to search for features derived from EEG signals [17][18][19][20][21][22]. For example, Firpi et al. [23] employed a GA to create artificial features from EEG signals. In their experiment, three patients' datasets were used, and the validation was performed by the KNN, achieving an average of 83.33% seizure prediction. KNN is one of the most widespread machine learning techniques. As medical facilities require minimal computational time, the KNN has been used as a seizure prediction algorithm in many recent studies [24][25][26][27]. For instance, Wang et al. [27] proposed a KNN analysis on EEG data from 10 patients with epilepsy, achieving 73% sensitivity and 67% specificity on average using a 150-min prediction horizon.
The classifier in this research is the SVM, as the SVM classification complexity does not depend on the feature dimension, and it provides a global solution [28][29][30], which might be appropriate for epileptic EEG classification. Shiao et al. [31] showed that an SVM-based seizure prediction system could achieve robust prediction for pre-ictal period and normal period iEEG signals from dogs with epilepsy. The sensitivity was 90-100%, and the false-positive rate was about 0-0.3 times per day. However, the SVM does not always seem suitable for epileptic EEG signal classification. Direito et al. [32] used massive data from 216 patients from the European Epilepsy Database, including 185 patients with scalp EEG recordings and 31 with intracranial data. They tested their method over a total of 16,729.80 h of inter-ictal data, including 1206 seizures, using the SVM. The method achieved an overall sensitivity of 38.47% and a false-positive rate per hour of 0.20 (statistical significance only in 11% of the patients). This underscores the importance of proper feature extraction. This research is the first study to compare the effectiveness of seizure prediction after EEG channel selection with that before channel selection. It also aims to reveal that patient-specific channel selection can contribute to more efficient seizure prediction. The remainder of this paper is arranged as follows. Section 2 presents the details of the proposed techniques for EEG channel selection and classification. Section 3 explains the datasets used in this paper, the experimental setup, and the results. Section 4 discusses the findings of this research. Finally, the conclusions of this study are drawn in Section 5.

Methodology
The goal is to construct a less complicated seizure prediction system with less computational load but high accuracy for real-time seizure prediction. The PE values differentiated by KNN combined with a GA (KNN-GA) are employed in this research to select channels for efficient analysis and seizure prediction. The overall process is divided into three steps: PE calculation and data sampling, channel selection by KNN-GA, and test modelling by the machine learning method, SVM. Firstly, the raw EEG signals without noise-filtration, segmented into time windows, are directly used to acquire the PE values, which are the parameters obtained by feature extraction. Secondly, the selected PE values of each channel are used for selecting the most pre-ictal related channels through KNN-GA, which is executed repeatedly (maximum number of executions is 30 in this study). Finally, the effect of the selected channels is validated and compared using the SVM classification with all 23 channels. The primary process of the method is illustrated below (Figure 1).


Permutation Entropy
For the proper channels to be selected efficiently from EEG signals in the dataset, the collected original data samples are used as the input to obtain the PE values, which measure the detailed variations in the EEG signals by expressing the signal in multi-scale time-frequency domains. The PE provides a quantitative measure of the complexity of a dynamic system by capturing the order relations in time-series signals and the probability distribution of their ordinal patterns [33].
The first step is to convert a one-dimensional time series into a matrix of overlapping column vectors. Then, M-dimensional vectors are mapped into unique permutations that capture the ordinal rankings of the data. These permutations are the values that are associated with each partitioned vector based on the ordinal position of the values within the vector. Then, the relative frequency of each permutation is calculated by counting the number of times the permutation is found in the signals divided by the total number of sequences [34]. Finally, the relative frequency of each permutation is used to compute the PE of order M of the signals, which is given by Equation (1) [34]. The smaller the value of PE_M, the more regular and more deterministic the time series is. Conversely, the closer PE_M is to 1, the noisier and more random the time series is.
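The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the commonly used normalized form, in which the Shannon entropy of the ordinal-pattern frequencies is divided by ln(M!) so that the result lies between 0 and 1. The function name `permutation_entropy` and the `order` and `delay` parameters are hypothetical.

```python
import math
from collections import Counter


def permutation_entropy(signal, order=3, delay=1):
    """Normalized permutation entropy of a 1-D sequence (assumed form, range 0..1)."""
    n_vectors = len(signal) - (order - 1) * delay
    if order < 2 or n_vectors < 1:
        raise ValueError("signal too short for the chosen order and delay")
    patterns = Counter()
    for start in range(n_vectors):
        # Overlapping embedding vector of length `order`.
        window = [signal[start + k * delay] for k in range(order)]
        # Ordinal pattern: indices of the window sorted by value (ties keep position order).
        patterns[tuple(sorted(range(order), key=window.__getitem__))] += 1
    # Relative frequency of each observed permutation.
    probs = [count / n_vectors for count in patterns.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    # Assumption: normalize by ln(order!) so that white noise approaches 1.
    return entropy / math.log(math.factorial(order))
```

Under these assumptions, a strictly increasing ramp produces a single ordinal pattern and a value of 0, while uniformly random samples give a value close to 1, which matches the interpretation of small and large PE values given in the text.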

Channel Selection by KNN Based on Genetic Algorithm
Noise and redundant data points in signals can make the information used to train the method irrelevant. For effective and efficient EEG signal analysis, identifying the channels that contribute most to the prediction outcomes is crucial. A genetic algorithm (GA), developed by John Holland et al. in the 1970s [35], is also applied in this research. A GA is a search heuristic that imitates the process of natural selection in Charles Darwin's theory, using mechanisms such as inheritance, mutation, selection, and crossover.
For feature selection, 'mutation' in GA means switching features on and off. 'Crossover' means interchanging the used features. In this paper, the selection is based on the accuracy of the KNN classification performance. KNN is a supervised learning algorithm, and it is one of the most important non-parametric algorithms in the pattern recognition field [36]. The training samples themselves generate the classification rules without any additional data. The KNN classification algorithm predicts the test sample's category according to the K training samples that are the nearest neighbors to the test sample, and assigns the category with the highest probability [36].
The overall process of KNN-GA for channel selection works as follows in this study (Figure 2):
1. KNN-GA begins with a set of individual subjects, which form the total population (all individuals).
2. A subject is described by a set of parameters (channels in this research) known as Genes. Genes are combined into a string to form a Chromosome (any possible solution). The population size is 20, and the minimum number of Genes is one.
3. Each Chromosome in the population is then evaluated by the fitness function (KNN in this paper) to test how well it predicts pre-ictal periods. It gives a fitness score (maximum: infinity) to each subject.
4. The selection operator then chooses some of the Chromosomes for reproduction based on a probability distribution. We set 0.9 for the initial probability. For example, if f(x) is a fitness function, then the probability that Chromosome C_x is chosen to reproduce is $P(C_x) = f(C_x) / \sum_{i=1}^{N_{pop}} f(C_i)$, where N_pop is the number of Chromosomes in the population.
5. Next, we mix Chromosomes for crossover (type: uniform, crossover probability: 1.0). Each Gene is selected randomly from one of the corresponding Genes of the parent Chromosomes.
6. The final step is to apply random mutations. For each Gene that we are to copy to the new population, we allow a small probability of error (0.01 in this paper).
7. Repeat from step 2 until the population converges (does not produce offspring which are significantly different from the previous generation). It can then be said that the genetic algorithm has provided a set of solutions to our problem (maximum number of generations: 30).
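The steps above can be sketched as a small, dependency-free Python program. It is an illustration under stated assumptions rather than the authors' implementation: the fitness is taken to be leave-one-out KNN accuracy, selection is fitness-proportionate (roulette wheel), the "initial probability" of 0.9 is interpreted as the chance that a channel is switched on in the initial population, and the loop simply runs for the maximum number of generations instead of testing convergence. All function and parameter names (`knn_accuracy`, `repair`, `select_channels`, `p_init`, `p_mut`) are hypothetical.

```python
import random


def knn_accuracy(samples, labels, mask, k=3):
    """Leave-one-out KNN accuracy using only the channels switched on in `mask`."""
    cols = [i for i, on in enumerate(mask) if on]
    correct = 0
    for i, x in enumerate(samples):
        dists = sorted(
            (sum((x[c] - y[c]) ** 2 for c in cols), labels[j])
            for j, y in enumerate(samples) if j != i
        )
        votes = [label for _, label in dists[:k]]
        correct += max(set(votes), key=votes.count) == labels[i]
    return correct / len(samples)


def repair(mask, rng):
    """Enforce the minimum of one selected channel (Gene) per Chromosome."""
    if not any(mask):
        mask[rng.randrange(len(mask))] = True
    return mask


def select_channels(samples, labels, pop_size=20, generations=30,
                    p_init=0.9, p_mut=0.01, seed=0):
    """Return (selected channel indices, best fitness) after a fixed number of generations."""
    rng = random.Random(seed)
    n_channels = len(samples[0])
    # Assumption: p_init is the probability that a Gene starts switched on.
    pop = [repair([rng.random() < p_init for _ in range(n_channels)], rng)
           for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        fits = [knn_accuracy(samples, labels, m) for m in pop]
        for mask, fit in zip(pop, fits):
            if fit > best_fit:
                best, best_fit = mask[:], fit
        # Roulette-wheel selection: probability proportional to fitness.
        weights = [fit + 1e-9 for fit in fits]
        new_pop = []
        while len(new_pop) < pop_size:
            a, b = rng.choices(pop, weights=weights, k=2)
            # Uniform crossover with probability 1.0: each Gene comes from either parent.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: flip each Gene with a small probability.
            child = [(not g) if rng.random() < p_mut else g for g in child]
            new_pop.append(repair(child, rng))
        pop = new_pop
    return [i for i, on in enumerate(best) if on], best_fit
```

Here each sample is a list of per-channel PE values and each label marks the sample as normal or pre-ictal. A convergence test, as described in step 7, could replace the fixed generation count.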

Selected Channels Validation by an SVM Model
Following channel selection, an SVM is used to classify the patterns into pre-ictal and normal periods. Three types of optimization method are used for the SVM in this research: Lagrange multiplier (LM), evolutionary, and Particle Swarm Optimization (PSO).
The PE values of the channels selected by KNN-GA were trained and tested with each of the three types of SVM, and the best result was adopted. The PE values of all channels were also processed in the same way. The detailed steps are shown below (Figure 3).
The 24 patients' EEG signals with a 256 Hz sampling rate were recorded using 23 channels, numbered from FP1-F7 (1).
Epileptic EEG signals are typically classified into four periods: normal, pre-ictal, ictal, and post-ictal periods (as shown in Figure 5). In some experimental results, a high accuracy rate might not be impressive when normal period data are abundant and the pre-ictal period signals occupy only a tiny fraction of the testing dataset. Thus, this research restricts the ratio of normal to pre-ictal training/testing data to at most 10:1. Selecting segments of EEG signal recording for the analysis is one of the significant problems of seizure prediction research. The seizure prediction horizon (SPH) is the period between the seizure alarm and the beginning of the seizure. Therefore, the SPH must be specified before the analysis is assessed. The size of the SPH has been reported to be between a few minutes and several hours before a seizure onset. The standard size is still a debatable question. This research set an SPH of 10 min (2.8 s duration) for both training and testing.
Each patient dataset contains data points of 17-154 h. Data samples of a normal period (2.8 s duration) are randomly selected in each hour of the 17-154 h duration. In summary, the samples are collected from:
• Pre-ictal period: 10 min before a seizure onset.
• Normal period: between pre-ictal and post-ictal periods (30 min after a seizure onset).
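One simple way to keep the ratio of normal to pre-ictal samples at or below 10:1 is to subsample the normal segments at random. The sketch below is only an assumption about how such a cap could be enforced; the text does not state the mechanism, and the function name `cap_class_ratio` and its parameters are hypothetical.

```python
import random


def cap_class_ratio(normal, preictal, max_ratio=10, seed=0):
    """Randomly subsample normal segments so that len(normal) <= max_ratio * len(preictal)."""
    limit = max_ratio * len(preictal)
    if len(normal) <= limit:
        # Already within the allowed ratio: keep every normal segment.
        return list(normal)
    # Draw `limit` normal segments without replacement.
    return random.Random(seed).sample(list(normal), limit)
```

For example, with 20 pre-ictal segments and 500 candidate normal segments, 200 normal segments would be retained.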

Validation of the Channel Selection Technique
The KNN-GA algorithm selected three to eight channels among 23 channels based on the PE values from each patient's EEG signals. The most frequently selected channels are P7-O1 (10 times), P8-O2 (9 times), C3-P3 (8 times) and CZ-PZ (8 times) from 22 patient datasets (Figure 6).
The efficiency of a seizure prediction algorithm is determined by the prediction rate, accuracy, sensitivity, and specificity. The prediction rate refers to how many predictions are correctly made out of the total number of ictal occurrences in the testing set. Sensitivity is the percentage of pre-ictal samples that are correctly predicted, and specificity is the percentage of normal period samples that are correctly predicted (Table 2). Table 3 presents the performance of the selected channels and all channels based on the SVM classification testing for the 22 patients in the CHB-MIT Scalp EEG Database.
Table 2. Accuracy, sensitivity, and specificity.
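As a brief illustration, the three measures can be computed from a confusion matrix in which the pre-ictal class is treated as positive. The sketch assumes the usual definitions (accuracy = (TP + TN) / total, sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the function name `classification_metrics` and the label encoding are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity with `positive` as the pre-ictal label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of true pre-ictal samples that were predicted as pre-ictal.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Share of true normal samples that were predicted as normal.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

With three pre-ictal and five normal test samples, of which two pre-ictal and four normal are classified correctly, this gives an accuracy of 0.75, a sensitivity of 2/3, and a specificity of 0.8.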
The average prediction rate of the selected channels from 22 patients is 92.42%, while that of all channels from 22 patients is only 71.13%, an improvement of 29.93%. The average accuracy of the selected channels is 74.60%, and that of all channels is 67.46%. The sensitivity and specificity achieved with the selected channels are also higher (average 69.51% and 73.14%, respectively) than with all channels (average 56.25% and 69.29%, respectively). On average, the accuracy, sensitivity, and specificity with selected channels are improved by 10.58%, 23.57%, and 5.56%, respectively. The analysis of variance (ANOVA) tests also confirm that the accuracy and sensitivity of the SVM testing using the selected channels are significantly higher than those using all channels (at p < 0.01 and p < 0.05, respectively) (Table 4). The standard deviations of the accuracy, sensitivity, and specificity from the selected channels testing for the 22 patients are smaller (15.36, 25.03, and 20.81, respectively) than from all channel testing (Table 4). In addition, the execution time of the SVM model is almost instantaneous (10-500 milliseconds) in many patients' cases. Nevertheless, the average percentage of computational runtime saved by channel selection is 42%.
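The quoted improvements are consistent with a relative change measured against the all-channel baseline. The short check below reproduces them from the averages reported above; the function name `relative_improvement` is hypothetical.

```python
def relative_improvement(selected, all_channels):
    """Percentage improvement of the selected-channel result over the all-channel baseline."""
    return (selected - all_channels) / all_channels * 100.0


# (selected channels, all channels) averages in percent, as reported in the text.
reported = {
    "prediction rate": (92.42, 71.13),
    "accuracy": (74.60, 67.46),
    "sensitivity": (69.51, 56.25),
    "specificity": (73.14, 69.29),
}
improvements = {name: round(relative_improvement(sel, base), 2)
                for name, (sel, base) in reported.items()}
print(improvements)
# {'prediction rate': 29.93, 'accuracy': 10.58, 'sensitivity': 23.57, 'specificity': 5.56}
```

In other words, the 29.93% figure is the 21.29 percentage-point gain divided by the 71.13% baseline, not a difference in percentage points.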
Two-dimensional area graphs are also added to view the numerical results visually (Figure 7). In Figure 7a,b, the blue shapes with red outline (pre-ictal period) of "Real status" are closer to the blue shapes of "Prediction using the selected channels" than the black shapes of "Prediction using all channels". Thus, the figures demonstrate that using the selected channels can better predict the pre-ictal period than using all channels.

Discussion
Seizures can occur anywhere in the brain, but in children, they frequently occur in the temporal and frontal lobes, affecting the functions these regions control [38]. Three to eight channels among 23 channels were selected for each subject by KNN-GA based on PE values of epileptic EEG signals. The most frequently selected channel was P7-O1 (10 times), which is located on the scalp over the parietal and occipital lobes of the brain. However, the total number of channels connected to the frontal and temporal lobe regions is much higher than that of the parietal and occipital region channels. Consequently, the number of selected channels in the frontal and temporal lobe regions is higher.

The patient-specific channel selection technique improves the prediction rate by 29.93% and the accuracy, sensitivity, and specificity by 10.58%, 23.57%, and 5.56%, respectively. The average accuracy, sensitivity, and specificity of the SVM testing are 74.60%, 69.51%, and 73.14%, respectively, and with all channels, they are 67.46%, 56.25%, and 69.29% in this research into epileptic seizure prediction. In particular, the true pre-ictal prediction rate (sensitivity) of the classification with the selected channels is considerably higher than that with all channels. The corresponding standard deviations are also smaller than those using all channels, demonstrating that tailored channels are more robust in optimizing seizure prediction rates. With the selected channels, the highest accuracy, sensitivity, and specificity rates are 97.28% (patient ID 1), 99.17% (patient ID 7), and 100% (patient ID 1), respectively. On the other hand, patient ID 17 and ID 24 cases achieved poor accuracy (under 50%) despite having high sensitivity.
There are a couple of limitations of the proposed approach. (1) Based on the results from different subjects (such as Patients 17 and 24), it is observed that the patterns of PE values during the nighttime are similar to the patterns of PE values during the pre-ictal period. This phenomenon may affect the prediction accuracy. In reality, it is difficult to verify whether a patient is sleeping or just at rest during the nighttime. (2) The starting point of the pre-ictal period is likely not the same for all patients. In this research, the SPH is set to 10 min for all subjects during the model training, while the SPH could be any time period (e.g., several hours).
This research aims to reduce the complexity of the feature extraction and classification steps in predicting seizures while retaining high accuracy and significantly reducing the computation time. The average execution time using the selected channels was only 47.09% of that using all channels. For Patient IDs 1, 8, 19, and 20, more than 90% validation accuracy, sensitivity, and specificity are obtained with just a few selected channels. The results demonstrate that the proposed EEG channel selection method with a suitable classification algorithm (SVM in this paper) can increase real-time seizure prediction accuracy.

Conclusions
In this paper, we recognize that the patterns of epileptic seizure occurrences are patient specific. The key issue is to discern which regions of the brain are most relevant to the seizure onsets for a specific patient. The most frequently selected channel was P7-O1 (10 times). However, many EEG channels were connected to the temporal and frontal lobes, where seizures frequently occur in children.
After finding the suitable channels for each patient through the KNN-GA algorithm, the SVM training and testing based on PE values of epileptic EEG signals exhibit more accurate seizure prediction outcomes and less computational load than with all 23 channels. Consequently, fewer patient-dependent EEG channels can contribute to essential aspects of seizure prediction analysis, such as fewer EEG electrodes required on the scalp and more accurate mobile real-time seizure predictions.
Author Contributions: J.S.R. presented the project idea and completed the modelling, experiments and the writing of this manuscript, while T.L. and Y.L., being supervisors, contributed to the design of the study, the completion of the project and the editing of this manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Data Availability Statement: The data and materials used in this study are available at the University of Southern Queensland under the research data management policy.
Conflicts of Interest: The authors declare no conflict of interest.