An Ensemble Feature Selection Approach to Identify Relevant Features from EEG Signals

Identifying relevant data to support the automatic analysis of electroencephalograms (EEG) has become a challenge. Although there are many proposals to support the diagnosis of neurological pathologies, the current challenge is to improve the reliability of the tools that classify or detect abnormalities. In this study, we used an ensemble feature selection approach to integrate the advantages of several feature selection algorithms and improve the identification of features with high power of differentiation in the classification of normal and abnormal EEG signals. Discrimination was evaluated using several classifiers, i.e., decision tree, logistic regression, random forest, and Support Vector Machine (SVM); furthermore, performance was assessed with accuracy, specificity, and sensitivity metrics. The evaluation results showed that Ensemble Feature Selection (EFS) is a helpful tool for selecting relevant features from EEGs: the stability calculated for the proposed EFS method was almost perfect in most of the cases evaluated, and the assessed classifiers improved in performance when trained with the features selected by the EFS approach. In addition, the classifier of epileptiform events built with the features selected by the EFS method achieved an accuracy, sensitivity, and specificity of 97.64%, 96.78%, and 97.95%, respectively, and the stability of the EFS method evidenced a reliable subset of relevant features. Finally, the accuracy, sensitivity, and specificity of the EEG detector are equal to or greater than the values reported in the literature.


Introduction
Research on developing systems for capturing and analyzing biomedical signals has increased over time [1]. In addition, the need to find new mechanisms to support the clinical diagnosis of specific pathologies has accelerated this process. For instance, electroencephalographic (EEG) signal processing is used to monitor neuronal activity in the brain and obtain data that carry valuable information for detecting neurological pathologies. Nowadays, the diagnosis of diseases such as epilepsy through digital analysis of EEG signals has become one of the promising research areas supporting automatic EEG reading [2]. The EEG signals are decomposed and processed through feature extraction mechanisms to obtain a description that allows classifying them as normal or abnormal [3]. Likewise, other studies based on the analysis of EEG signals have been performed to analyze brain activity [4,5] and support clinical diagnosis. For example, some proposals have used neural networks, decision trees, rules based on domain knowledge, and clustering mechanisms to classify new signals [6,7].
Even though numerous mechanisms characterize EEG signals by detecting or classifying events associated with epilepsy, this area's most significant research challenge is improving the classification's performance in terms of precision, accuracy, and recall, providing reliable tools that support neurologists in the diagnosis. Considering the above, one of the main strategies to improve the classification models in machine learning or data mining is to train the models with relevant features, that is, those features that do not represent noise for the learning model and, on the contrary, have a high power of differentiation between classes.
On the other hand, feature selection (FS) helps build robust classification models [8] by identifying relevant features. This process is a mandatory task, especially when the datasets have (i) high dimensionality [7] or (ii) more features than instances, which means the dataset has more columns than rows. This scenario coincides with the classification of abnormalities in EEGs considering the large number of feature extractors reported in the literature and the low availability of datasets with instances or single rows that describe epileptiform events.
Besides, the literature review shows the use of different feature selection techniques to support the automatic analysis of different types of physiological signals. For example, proposals range from general methods to select features on clinical databases [9] to implementations designed to help the diagnosis of diseases such as Alzheimer's [10], multiple sclerosis [11], sleep disorders [12], and epilepsy [13]. Additionally, the list of reviewed papers presents solutions designed for the detection of emotions by analyzing the electrical activity of the brain [14] or the recognition of activities using analysis of physiological signals [15] and external devices [16]. Furthermore, some literature has reported feature selection methods to identify features with the more remarkable power of differentiation in classifying or detecting epileptic patterns. However, most of the reviewed results focused on identifying specific patterns using a set of features without considering each feature's relevance or impact in subsequent analyses [6]. Thus, the proposals end up training machine learning models with features that could represent noise or redundancy for the learning process.
Recently, several studies have focused on improving the performance of feature selection algorithms. For example, in [17], the authors proposed identifying correlations between features and classes to enhance effectiveness while maintaining a low computational cost in the feature selection process. Additionally, Refs. [18,19] incorporated techniques such as bootstrap to select features from samples of the original dataset and integrate the generated subsets of features. However, these proposals depend on balanced datasets and continuous data (data measured on a continuous scale), which could bias subsequent analyses. Hence, some authors have proposed ensemble feature selection algorithms to improve the identification of relevant features through the consensus of FS algorithms with different approaches [20].
Considering the above, we believe that an ensemble feature selection (EFS) approach can improve the selection of relevant features and enhance the classification of epileptiform events in EEG signals. Furthermore, this approach is based on the premise of multiple classifiers: "several classifiers classify better than one", which would be applied to the feature selection, where we intended to demonstrate that "several feature selectors select better than one".
The main objective of this paper is to show how to improve the classification of EEG signals by enhancing the feature selection process with the ensemble feature selection method.
The rest of the document is organized as follows: Section 2 shows the feature extractors used to calculate the dataset of normal and abnormal segments of EEGs. Section 3 presents the evaluation performed to validate the relevant features selected by the EFS approach. Section 4 offers a discussion of results and contributions. Finally, Section 5 describes the main conclusions of this research.

Dataset
The EEG repository built in [21] contains 200 records from 200 patients that, in their raw form, cannot be processed by machine learning algorithms. Each EEG record was acquired under the 10-20 electrode positioning system at a fixed sampling rate. Each EEG was decomposed channel by channel, and 672 segments diagnosed as abnormal were extracted and described using a set of feature extractors; each segment had 200 samples. The same process was carried out for a set of 672 segments considered normal. Thus, the dataset was built with 142 features extracted from 1344 EEG segments. Since all the descriptors were applied to all the segments, the dataset did not contain columns with null data.
The descriptors used to extract the features from the EEG signals are described below.

• Basic Descriptors
Statistical features summarize the values that describe a segment of an EEG signal in a single number. The measures of this type applied in the construction of the dataset are min, max, mean, median, low median, high median, variance, and standard deviation.

• Entropy
Entropy is a family of statistical measures that quantify the complexity of a system. In this study, we evaluated three different ways of measuring entropy:

Shannon Entropy

Approximate Entropy

Renyi Entropy
The Renyi entropy was estimated as RenyiEntropy(x, m) = SampEn(x, m, r) + log(2r), where SampEn(x, m, r) is the sample entropy of the series x with embedding dimension m and tolerance r.

• Kurtosis and Skewness
Skewness and kurtosis are higher-order statistical attributes of a time series. Skewness represents the degree of distortion from the symmetrical bell curve of the normal distribution; in other words, it measures the lack of symmetry in the data distribution. Kurtosis measures the peakedness of the probability density function (PDF) of a time series and is used to characterize the outliers present in the distribution.
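To make the entropy descriptor concrete, the sketch below estimates Shannon entropy from a histogram of a segment's amplitude values; the bin count (16) and the histogram-based estimator are assumptions, since the paper does not specify how its entropy features were computed.

```python
import numpy as np

def shannon_entropy(x, bins=16):
    """Estimate the Shannon entropy (in bits) of a signal segment
    from a histogram of its amplitude values."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts / counts.sum()        # empirical bin probabilities
    p = p[p > 0]                     # drop empty bins (0 * log 0 := 0)
    return float(-np.sum(p * np.log2(p)))

# A constant segment carries no information; a noisy one carries more.
flat = np.ones(200)
noise = np.random.default_rng(0).normal(size=200)
```

A constant segment concentrates all mass in one bin and yields an entropy of 0, while noise spreads mass across bins and yields a higher value.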

• Energy
The signal is viewed as a function of time, and energy represents its size. The energy can be measured in different ways, but the area under the curve is the most common measure to describe the size of a signal. It measures the signal strength, and this concept can be applied to any signal or vector.
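A minimal discrete version of the energy descriptor, using the common sum-of-squares definition (the area under the curve of the squared signal); treat this as an illustrative sketch rather than the paper's exact implementation.

```python
import numpy as np

def signal_energy(x):
    """Discrete signal energy: the sum of squared amplitudes,
    i.e., the area under the curve of the squared signal."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2))
```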
• Fractal Dimension (Higuchi and Petrosian)
The fractal dimension corresponds to a noninteger dimension of a geometric object. Based on this principle, fractal dimension analysis is applied to biomedical signals by treating the waveform as a geometric figure [22]. This type of analysis provides a quick mechanism to estimate the fractal dimension by converting the series into a binary sequence. For example, the Petrosian fractal dimension is calculated as PFD = log10(n) / (log10(n) + log10(n / (n + 0.4 N∆))), where n is the length of the series and N∆ is the number of sign changes in the binary sequence [22].

• Hurst Exponent
This exponent is a measure of the predictability of the signal. It is a scalar between 0 and 1 which measures long-range correlations of a time series [23].

• Zero-Crossing Rate
The zero-crossing rate is a statistical feature that describes the number of times a signal crosses the horizontal axis.
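A minimal sketch of the zero-crossing rate as the fraction of consecutive sample pairs with differing signs; counting a zero-valued sample as a crossing is one of several possible conventions.

```python
import numpy as np

def zero_crossing_rate(x):
    """Fraction of consecutive sample pairs whose signs differ."""
    signs = np.sign(np.asarray(x, dtype=float))
    return float(np.mean(signs[:-1] != signs[1:]))
```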

• Hjorth Parameters
The Hjorth parameters describe statistical properties of a signal in the time domain [12]. They are commonly used to analyze electroencephalography signals.

Activity
Activity, also known as the variance or mean power, is the squared standard deviation of the amplitude.
Mobility
Mobility measures the standard deviation of the slope with respect to the standard deviation of the amplitude.

Complexity
Complexity indicates how the shape of the signal compares with that of a pure sine wave; this parameter is associated with the wave shape.
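The three Hjorth parameters can be computed directly from the variances of the signal and its first and second differences; the sketch below uses the standard definitions (activity = var(x), mobility = sqrt(var(dx)/var(x)), complexity = mobility(dx)/mobility(x)).

```python
import numpy as np

def hjorth_parameters(x):
    """Hjorth activity, mobility, and complexity of a 1-D signal.
    activity   = variance of the signal
    mobility   = sqrt(var(dx) / var(x)), dx = first difference
    complexity = mobility of dx divided by mobility of x
    """
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    ddx = np.diff(dx)
    var_x, var_dx, var_ddx = np.var(x), np.var(dx), np.var(ddx)
    activity = var_x
    mobility = np.sqrt(var_dx / var_x)
    complexity = np.sqrt(var_ddx / var_dx) / mobility
    return activity, mobility, complexity

# Example on a 200-sample sine segment (a pure sine has complexity near 1).
sig = np.sin(np.linspace(0, 8 * np.pi, 200))
activity, mobility, complexity = hjorth_parameters(sig)
```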
• Discrete Wavelet Transform
The discrete wavelet transform allows the analysis of a signal in a specific segment. The procedure expresses a continuous signal as an expansion whose coefficients are the inner products between the signal and a mother wavelet function. As a result, the discretization of the wavelet transform changes from a continuous mapping to a finite set of values; this is done by replacing the integral in the definition with an approximation by summations. Hence, the discretization represents the signal in terms of elementary functions accompanied by coefficients.
The mother wavelet functions are accompanied by a set of scaling functions. The wavelet functions represent the fine details of the signal, while the scaling functions calculate an approximation. Thus, a function or signal can be described as a summation of wavelet functions and scaling functions.

A signal can be decomposed into various levels from the time domain to the frequency domain in wavelet analysis. The decomposition is done from the detail coefficients as well as the approximation coefficients. Figure 1 describes the different encoding paths for n levels of decomposition. The upper level of the tree represents the temporal representation. As the decomposition levels increase, the trade-off in time-frequency resolution shifts accordingly. Finally, the last level of the tree describes the representation of the signal frequency.
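As a sketch of this decomposition, the following implements a multi-level discrete wavelet transform with the Haar wavelet; the mother wavelet actually used for the dataset is not stated in this section, so Haar is an assumption chosen for simplicity.

```python
import numpy as np

def haar_dwt_level(x):
    """One level of the Haar DWT: scaled sums (approximation) and
    scaled differences (detail) of consecutive sample pairs."""
    x = np.asarray(x, dtype=float)
    x = x[: len(x) // 2 * 2]                 # truncate to even length
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def haar_wavedec(x, levels=5):
    """Multi-level decomposition: recursively split the approximation.
    Returns [detail_1, ..., detail_n, approx_n]."""
    coeffs = []
    a = np.asarray(x, dtype=float)
    for _ in range(levels):
        a, d = haar_dwt_level(a)
        coeffs.append(d)
    coeffs.append(a)
    return coeffs

# Five levels over a 32-sample signal yield 5 detail sets plus 1 approximation.
x = np.arange(32.0)
coeffs = haar_wavedec(x, levels=5)
```

Because the Haar transform is orthonormal, the total energy of the coefficients equals the energy of the original signal.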
• Fast Fourier Transform
The fast Fourier transform (FFT) efficiently computes the discrete Fourier transform of a signal by recursively decomposing it into smaller transforms over different frequencies; the decomposed signals are combined to calculate the resulting transform. FFT is used to convert a signal from the time domain to a representation in the frequency domain or vice versa.
Features extracted from the fast Fourier transform calculation are as follows:

Spectral Centroid
The spectral centroid is a statistical measure used to describe the spectrum's shape in digital signal processing. This centroid defines the spectrum as a probability distribution and represents where the center of mass of the spectrum is located.

Spectral Flatness
Spectral flatness is defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum.
Crest Factor
The crest factor defines how extreme the peaks are in a signal.
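The three spectral descriptors can be sketched from the magnitude spectrum of a real FFT. Computing the crest factor on the spectrum (rather than on the raw waveform) follows the paper's grouping of it among FFT-derived features and is an assumption, as is the small epsilon that guards the geometric mean.

```python
import numpy as np

def spectral_features(x, fs=200.0):
    """Spectral centroid, flatness, and crest factor of a segment,
    computed from the magnitude spectrum of a real FFT."""
    mag = np.abs(np.fft.rfft(np.asarray(x, dtype=float)))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    centroid = np.sum(freqs * mag) / np.sum(mag)   # "center of mass" of spectrum
    power = mag ** 2
    # Geometric mean / arithmetic mean of the power spectrum (epsilon avoids log 0).
    flatness = np.exp(np.mean(np.log(power + 1e-12))) / np.mean(power)
    crest = mag.max() / np.sqrt(np.mean(power))    # peak vs RMS of the spectrum
    return centroid, flatness, crest

# A pure 25 Hz tone sampled at 200 Hz: centroid near 25 Hz, low flatness.
fs = 200.0
x = np.sin(2 * np.pi * 25 * np.arange(200) / fs)
centroid, flatness, crest = spectral_features(x, fs)
```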

• Matched Filter
Matched filters are basic signal analysis tools used to extract known waveforms from a signal that has been contaminated with noise. For example, in the context of the detection of epileptic spikes, given a signal x(t) that describes the brain activity (EEG), the matched filter h(t) seeks a well-known pattern of epilepsy s(t); then, if the signal contains an epileptiform pattern, the signal is described by the brain activity n(t) with the abnormality s(t) generating x(t) = s(t) + n(t). Otherwise, the signal only contains the normal brain activity x(t) = n(t).
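A minimal matched-filter detector cross-correlates the segment with a time-reversed template and thresholds the normalized response. The template s below is a hypothetical spike shape; the actual epileptiform templates used by the paper are not specified here.

```python
import numpy as np

def matched_filter_detect(x, s, threshold):
    """Slide the known template s over x via cross-correlation and flag a
    detection where the normalized response exceeds the threshold."""
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float)
    h = s[::-1]                              # matched filter = time-reversed template
    response = np.convolve(x, h, mode='valid')
    response /= np.linalg.norm(s) ** 2       # equals 1.0 where x exactly contains s
    return response, bool(np.any(response >= threshold))

# Hypothetical spike template embedded in an otherwise silent segment.
s = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
x = np.concatenate([np.zeros(10), s, np.zeros(10)])
response, detected = matched_filter_detect(x, s, threshold=0.9)
```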
Considering the above, 21 descriptors were applied to the normal and abnormal EEG segments and to the five sets of wavelet coefficients generated from each original segment, producing 126 features (21 × 6). The 21 descriptors are min, max, mean, median, high median, low median, variance, standard deviation, Shannon entropy, approximate entropy, Renyi entropy, kurtosis, skewness, energy, Higuchi fractal dimension, Petrosian fractal dimension, Hurst exponent, zero-crossing rate, Hjorth activity, Hjorth mobility, and Hjorth complexity. Besides, the fast Fourier transform (FFT) was calculated, and 15 descriptors were applied to its result: min, max, mean, median, high median, low median, variance, standard deviation, Shannon entropy, kurtosis, skewness, energy, spectral centroid, spectral flatness, and crest factor. The matched filter was also applied to the original segments, and a Boolean feature was generated with its results. In total, we obtained 142 features: 126 calculated from the original segments and their five wavelet coefficient sets, 15 extracted from the FFT, and 1 from the matched filter.
Considering the number of segments that could be analyzed in a single EEG record (1 EEG with 21 channels and 30 min of duration could generate more than 37,800 segments of 200 samples), it is necessary to reduce the number of features not only to reduce the complexity of describing the segments but also to avoid the introduction of noise and redundant information into the classification process and increase the stability of the classifiers.

The Ensemble Feature Selection Approach
A dataset can contain three types of features: relevant, redundant, and noisy. The category of a feature selection (FS) method: filter, wrapper, or embedded, is defined by the mechanism that evaluates feature relevance, e.g., statistical tests in filters or model performance under cross-validation in wrappers. Filter-based techniques produce a ranking of feature relevance, wrapper methods a subset of relevant features, and embedded methods a subset of features together with a learning model. The rankings generated by filter methods are used to select the k highest-ranked features.
Considering ensemble learning, the consensus of several experts improves decision making in a given context [24]. Thus, we used the results of our previous research, where we built a framework for ensemble feature selection [25]. The framework pools n FS algorithms by aggregating their results into a unique subset of relevant features. This scheme is described in Figure 2 and defined in [26] as a heterogeneous centralized ensemble: the single methods are the FS methods used to select subsets of relevant features, the outcomes of the single methods are the subsets generated, pooling is the process that aggregates all subsets of relevant features, and the relevant features are the result of the pooling process. The EFS method described in [25] uses an importance index (II) to aggregate the subsets generated by the n FS algorithms. First, the subsets of features generated by each FS method are combined into a set SUM containing all selected features. Then, for each feature in SUM, the importance index is computed according to Equation (15): II_i = FF_i / n, where FF_i is the number of times feature i appears among the n subsets. Finally, the EFS selects the features with an importance index greater than a threshold defined by the user.
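The aggregation step can be sketched in a few lines; the five feature subsets below are hypothetical stand-ins for the outputs of the single FS algorithms, and the 0.7 threshold mirrors the setting used later in the evaluation.

```python
def ensemble_feature_selection(subsets, threshold=0.7):
    """Aggregate the subsets produced by n single FS methods.
    A feature's importance index is II_i = FF_i / n, where FF_i counts
    in how many subsets feature i appears; keep features with II >= threshold."""
    n = len(subsets)
    pooled = set().union(*subsets)   # the set SUM of all selected features
    importance = {f: sum(f in s for s in subsets) / n for f in pooled}
    selected = {f for f, ii in importance.items() if ii >= threshold}
    return selected, importance

# Hypothetical subsets from five FS algorithms:
subsets = [{'energy', 'kurtosis', 'hurst'},
           {'energy', 'kurtosis'},
           {'energy', 'skewness'},
           {'energy', 'kurtosis', 'zcr'},
           {'energy', 'hurst'}]
selected, importance = ensemble_feature_selection(subsets, threshold=0.7)
```

Here only 'energy' appears in enough subsets (II = 1.0) to clear the 0.7 threshold.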
The main objective of the ensemble feature selection approach is to reach a consensus among several FS methods to generate a subset of relevant features capable of representing the advantages of all used methods and face the biases of the single methods by compensating their disadvantages with the benefits of the others. Thereby, the result of EFS is a subset of relevant features that could improve the performance of subsequent analyses, such as classification processes.
Although the EFS implemented in the framework could be configured with different FS algorithms, in this study, we used five FS algorithms, three based on rankings of features (ANOVA, chi-squared, and mutual information), one wrapper (importance of features calculated by decision trees), and one embedded (recursive feature elimination-RFE). Each single FS algorithm generated a subset of relevant features, which the EFS aggregates.

N-Fold: Cross-Validation
Cross-validation is an analysis tool for evaluating the results of a model. It divides the dataset into smaller sets to train and evaluate a classifier: a single split separates the sample into training and test data. In this study, single cross-validation was used to split off the test data. N-fold cross-validation, in contrast, breaks the original dataset into n samples and, for each sample, tests on it while training on the remaining subsamples. Averaging the accuracies calculated over all samples yields a statistically sound overall accuracy. Figure 3 describes a general scheme of N-fold cross-validation: the mechanism divides the sample data into n partitions and performs the traditional cross-validation process n times, iterating over the partitions so that each serves once as the test dataset while the remaining n − 1 partitions form the training dataset.
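The partitioning described above can be sketched as follows; this mirrors what scikit-learn's shuffled KFold does, but is written out explicitly for illustration.

```python
import numpy as np

def n_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and yield (train_idx, test_idx) pairs,
    using each of the n_folds partitions once as the test set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, n_folds)   # handles non-divisible sizes too
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train, test

# 10-fold split over 100 samples: 10 folds of 10 test samples each.
splits = list(n_fold_indices(100, 10))
```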

Classification Algorithms
In machine learning, classification is a process for categorizing data into classes. The objective is to predict the class of given data points or instances. For this study, we implemented the following algorithms using the scikit-learn framework:


• Decision Tree: This is a supervised machine learning algorithm in which the data are divided across several levels to obtain an outcome (class). For the evaluation, the algorithm's parameters were tuned to find the best classification performance; the best results were achieved when entropy and random were assigned to the criterion and splitter parameters, respectively.
• Logistic Regression: This is a machine learning algorithm for binary classification. The method measures the relationship between the variable to predict and the features by estimating probabilities. One of the parameters established in the configuration was class_weight, which accounts for whether the dataset is balanced. In addition, the liblinear solver was used, which minimizes the multivariate function by solving a univariate optimization problem in a loop.
• Random Forest: This is a machine learning algorithm based on an ensemble of decision trees. The configuration that achieved the best performance included 35 estimators, entropy as the function to measure the quality of a split (criterion), and the bootstrap option.
• Support Vector Machine: This is a popular supervised learning algorithm whose goal is to find the best decision boundary segregating the n-dimensional feature space into classes. We used a polynomial kernel of degree 3 and no iteration limit to build the SVM classifier.
All settings were made according to the scikit-learn configurations.
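Under the stated settings, the four classifiers could be instantiated in scikit-learn roughly as below. The class_weight='balanced' value is an assumption for the balancing option, and the synthetic data and split are only for illustration, not the paper's dataset.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

classifiers = {
    'DT': DecisionTreeClassifier(criterion='entropy', splitter='random'),
    'LR': LogisticRegression(class_weight='balanced', solver='liblinear'),
    'RF': RandomForestClassifier(n_estimators=35, criterion='entropy',
                                 bootstrap=True),
    'SVM': SVC(kernel='poly', degree=3, max_iter=-1),  # -1 = no iteration limit
}

# Hypothetical synthetic data standing in for the EEG feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
scores = {name: clf.fit(X[:150], y[:150]).score(X[150:], y[150:])
          for name, clf in classifiers.items()}
```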

Jaccard Index
The Jaccard index is a statistic used to measure the similarity between sample sets. It is defined by Equation (16): J(A, B) = |A ∩ B| / |A ∪ B|, where A and B are two subsets of relevant features calculated by an FS algorithm using different data samples.
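Equation (16) reduces to a few lines over Python sets; the two example subsets are hypothetical.

```python
def jaccard_index(a, b):
    """Similarity of two feature subsets: |A ∩ B| / |A ∪ B| (Equation (16))."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0   # convention: two empty subsets are identical
    return len(a & b) / len(a | b)

# Two subsets sharing 2 of 4 distinct features have a similarity of 0.5.
similarity = jaccard_index({'energy', 'kurtosis', 'hurst'},
                           {'energy', 'kurtosis', 'zcr'})
```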

The Detector of Epileptic Activity
Figure 4 describes the proposed architecture of a detector of epileptic events. In this scheme, the detector decomposes an EEG signal into channels and segments: each channel is broken into segments of 200 samples, and each segment is classified as normal or abnormal using a classifier.

Results
The evaluation was divided into three stages. The first one focused on assessing the feature relevance; thus, we compared the performance of several classification algorithms using all features with the performance achieved when we used subsets of relevant features selected by the EFS approach. The second stage evaluated the classification algorithm and the subset of relevant features that reached the best performance in the previous step by applying N-fold cross-validation. Finally, the stability of the subset of relevant features was calculated.

Evaluating Ensemble Feature Selection (EFS)
To evaluate the utility of the feature subsets selected by the EFS algorithm, a set of four classifiers (decision tree, logistic regression, random forest, and SVM) was configured to determine which achieves the best performance in classifying normal or abnormal brain activity. The evaluation used 70% of the data for training the models and 30% for testing them; this process was repeated 10 times, with the data split randomly each time.
Tables 1-4 describe the accuracy, and its standard deviation, of the classification calculated using all features and using the subsets selected by the EFS technique for different sizes (K) of the subsets generated by the single methods. "Features selected" represents the number of features chosen after aggregating the subsets generated by each FS algorithm in the EFS method, while K represents the number of features determined by each single FS algorithm. When the models were trained with all features, the values differ slightly between rows because each EFS test was repeated and the data were split randomly in each test, so each test used different samples of the data.
Tables 1-4 show that the subsets of selected features reach a classification performance similar to that achieved using all features; the accuracy of SVM even improved when the classification used only the subsets selected by the EFS technique. Likewise, the tables show that the support vector machine was the algorithm with the best performance in classifying abnormal and normal segments of brain activity. Table 5 shows the features selected by each single FS algorithm that allowed training the model with the best performance in this preliminary test.
Table 5. Subsets of features selected by single algorithms.
The select K best FS method was computed using the ANOVA, chi-squared, and mutual information metrics.

Selecting Relevant Features from the EEG Dataset
This phase used EFS to analyze the relevance of features on a dataset with descriptions of normal and abnormal segments extracted from EEGs. To validate the subset of relevant features calculated using the EFS method, a classification process was built to evaluate the accuracy reached with the subset of features selected.
The aggregation method of the EFS was set to return the features, aggregated from the subsets of relevant features generated by each FS algorithm, with an importance index greater than or equal to 0.7. This setting was established experimentally following a trial-and-error approach: we tested different hyperparameters for the FS algorithms and different thresholds for the aggregation, and the selected threshold (0.7) was the importance index that yielded the subset of relevant features building the classifier with the best performance.
Table 6 shows the classification results of the decision tree (DT), logistic regression (LR), random forest (RF), and support vector machine algorithms using all features and using the features calculated by the select K best algorithm, the recursive feature elimination algorithm, the feature importance algorithm, and the EFS method. In this experiment, select K best used the chi-squared metric, which produced a better subset of relevant features than the ANOVA and mutual information metrics; the comparison was based on the accuracy achieved with each subset of relevant features generated by each metric. We used 70% of the data for training the models and 30% for testing them. The results in Table 6 evidence that the EFS method identified the best subset of relevant features for classifying normal and abnormal brain activity.

N-Fold Cross-Validation
Considering that the best classification results were achieved with the SVM classifier, we used it in this stage. The results of the N-fold cross-validation calculated for different values of n can be seen in Table 7, where n corresponds to the number of samples generated in the N-fold validation. Figure 5 describes the confusion matrix calculated for this evaluation with n = 10. The results show that the SVM classifier achieved a true positive rate of 96.43% and a true negative rate of 97.96%; the sensitivity was 96.78%, and the specificity was 97.95%.

The Detector of Epileptic Activity
The SVM model built in the previous step was included as part of a detector of epileptic events to support the automatic reading of EEGs. Then, following the approach described in Section 2.6, we built a detector capable of decomposing an EEG signal into channels and segments; the segments were analyzed by the SVM model and classified as normal or abnormal.
The detector was developed to evaluate the relevance of the EFS approach in the classification of EEG signals. One of the main reasons that motivated this research was to help diagnose epilepsy by supporting the automatic detection of epileptic events in EEG signals. To achieve this, we proposed improving the classification process by including only the relevant features that describe an EEG signal in the learning process.
To validate the detector, a set of 100 EEG records taken from 100 pediatric patients was read by the detector. These records are part of the EEG repository built in this research. For the test, each EEG record with epileptic activity describes the beginnings and ends of the epileptic abnormalities; these descriptions were used to validate the detections made by the detector. Table 8 describes the results of reading the 533,909 segments extracted from the 100 EEGs, of which 6806 segments are epileptiform events. According to the confusion matrix, the detector's accuracy, sensitivity, specificity, NPV, and PPV were 92.53%, 95.57%, 92.48%, 92.49%, and 95.80%, respectively. The rate of false negatives was 4.20%, and the rate of false positives was 7.51%.

Stability EFS
To determine the reliability of the implemented EFS method, the subset of features generated to support the classification of epileptiform events was evaluated. First, the EFS method was run 10 times to generate 10 subsets of relevant features from 10 different random samples of the dataset. Then, the 10 generated subsets were compared using the Jaccard index to determine the differences between them. Based on these comparisons, it is concluded that, at least for datasets with complete and correctly balanced data, such as the one used in this test, the implemented EFS method achieved 100% stability.
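The stability computation can be sketched as the mean pairwise Jaccard similarity across the repeated runs; the subsets below are hypothetical placeholders for the EFS outputs.

```python
from itertools import combinations

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| over two feature subsets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 1.0

def stability(subsets):
    """Mean pairwise Jaccard similarity over the subsets produced by
    repeated EFS runs; 1.0 means identical subsets in every run."""
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Ten identical runs -> perfect stability, as reported for the EFS method.
runs = [{'energy', 'kurtosis', 'hurst'}] * 10
```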

Discussion
We evaluated an ensemble feature selection approach to support the feature selection for the classification of EEG signals with epileptiform events. The evaluation considered three aspects: (i) evaluating the impact of the relevant features selected by the EFS method in the classification of segments of EEG signals, (ii) evaluating a classifier of normal or abnormal segments of EEG signals using a set of relevant features selected by the EFS method, and (iii) evaluating the stability of the EFS method for selecting features with different samples of the dataset.
In the review of the state of the art, several studies were found that proposed approaches for building an ensemble of feature selection algorithms [27][28][29]. However, most of these works were not applied to EEG datasets, and their results are not conclusive. Moreover, the works that proposed a form of ensemble feature selection used a staged approach, in which the first stage selects an initial subset of relevant features and a second stage re-evaluates that subset with another feature selection algorithm. Thus, the first stage can bias the second.
Likewise, some authors propose building the ensemble of feature selection algorithms from filters [30][31][32][33]. Although filter-based algorithms are simple and easy to implement, they have notable weaknesses. If the goal of an ensemble learning scheme is to combine the decisions of different models to produce robust choices, building an ensemble exclusively from filters may be a poor decision. In addition, most of the ensemble feature selection studies reviewed do not include stability as a metric to evaluate the quality of the feature selection process.
Considering the results in Tables 1-4, the best classification results were achieved when the classifiers, i.e., decision tree, logistic regression, random forest, and SVM, used the subset of relevant features generated by the EFS method. Moreover, the SVM performed best in the evaluation of the impact of the relevant features on the learning process. Thus, the model built to classify normal and abnormal EEG signals was based on SVM and the relevant features selected by the EFS method. This classifier achieved an accuracy of 97.46%, a true negative rate of 96.43%, a true positive rate of 97.96%, a sensitivity of 96.78%, and a specificity of 97.95%. These values show a performance equal to or greater than that reported in the studies reviewed in the literature.
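The training scheme described above can be sketched with scikit-learn. This is an illustrative stand-in, not the paper's pipeline: the data are synthetic, and the `selected` indices are hypothetical placeholders for the subset the EFS method would return. With `shuffle=False`, `make_classification` places the informative columns first, so the selected indices are informative by construction:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the EEG-segment dataset.
X, y = make_classification(n_samples=1000, n_features=30, n_informative=8,
                           shuffle=False, random_state=0)
selected = list(range(8))  # hypothetical subset standing in for the EFS output

# Train an SVM only on the selected columns, as the EFS approach prescribes.
X_tr, X_te, y_tr, y_te = train_test_split(X[:, selected], y,
                                          test_size=0.3, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.3f}")
```

Restricting the model to the selected columns is what reduces both the training cost and the noise contributed by irrelevant features.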
In the same way, a detector of epileptic activity was built to demonstrate the use of the classifier in the context of automatic EEG reading and to analyze its performance in a scenario without a balanced dataset. An EEG record contains a large number of segments, most of which are normal; only a small number are abnormal, which makes the detection of abnormal EEGs an unbalanced binary classification task. Although the classifier used by the detector was trained on a perfectly balanced dataset, the results showed an accuracy of 92.53%, a sensitivity of 95.79%, and a specificity of 92.48% on the unbalanced data. Considering that early detection of epilepsy is critical to its treatment, the priority for the detector is to increase the probability that a segment detected as normal is in fact normal; this decreases the rate of false negatives and, consequently, reduces the likelihood of putting the patient's health at risk. Although the tests evidenced a low rate of false negatives, the detector was not designed to replace the work of an expert; its potential lies in helping experts identify abnormalities quickly and optimize their time.
In addition, the detector made it possible to validate the performance of the classifier, which was trained on a balanced dataset. In the evaluation, the detector scanned the EEGs segment by segment and classified each segment as normal or abnormal, which produced a test dataset with many more normal than abnormal segments.
On the other hand, the stability of the ensemble feature selection method was evaluated by generating samples from the dataset. The results showed a stability of 1, meaning that the EFS method selected the same set of relevant features for every sample generated.

Conclusions
In this study, we used an ensemble feature selection approach that integrates the advantages of several feature selection algorithms to improve the identification of the characteristics with high power of differentiation in the classification of normal and abnormal EEG signals.
The discrimination was evaluated using several classifiers, i.e., decision tree, logistic regression, random forest, and SVM. This evaluation demonstrated that machine learning models can improve their performance by discarding features that are irrelevant or represent noise.
The classifier built using the features selected by the EFS method achieved an accuracy (97.64%), sensitivity (96.78%), and specificity (97.95%) equal to or greater than the values found in the literature, while using only a subset of selected features instead of all of them. Additionally, the perfect stability achieved when selecting features on different samples of the original dataset demonstrated the reliability of the feature selection process.
Although the detector of epileptic segments lost almost five percentage points in accuracy (92.53%) and two percentage points in sensitivity (95.97%) when tested in a highly unbalanced scenario, the achieved specificity (92.48%) still meets the requirements of the medical context, where the main priority is to avoid false negatives that put the patient's health at risk.
The EFS method used to select the subset of relevant features reduced the computational complexity of classifying epileptic segments and demonstrated that it is not necessary to compute a large number of features to describe epileptiform events and classify them well.
Finally, the main contribution of this work was to validate the selection of relevant features by the ensemble feature selection method on a dataset of EEG signals. The evaluation results allowed us to confirm that the use of EFS could help improve the reliability of classifiers and detectors of epileptiform events in EEG signals.

Institutional Review Board Statement: The study was conducted according to the guidelines approved by the Ethics Committee of the University of Cauca.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: The dataset used to evaluate EFS and train the classification model is available at https://github.com/Maritzag/EEGSignals/tree/master/EvaluationEFS, accessed on 15 May 2021.