BGOA-TVG: Binary Grasshopper Optimization Algorithm with Time-Varying Gaussian Transfer Functions for Feature Selection

Feature selection aims to select crucial features to improve classification accuracy in machine learning and data mining. In this paper, a new binary grasshopper optimization algorithm using time-varying Gaussian transfer functions (BGOA-TVG) is proposed for feature selection. Compared with the traditional S-shaped and V-shaped transfer functions, the proposed time-varying Gaussian transfer functions offer faster convergence and a stronger global search capability when converting a continuous search space to a binary one. The BGOA-TVG is tested and compared against S-shaped and V-shaped binary grasshopper optimization algorithms and five state-of-the-art swarm intelligence algorithms for feature selection. The experimental results show that the BGOA-TVG performs better on the UCI, DEAP, and epilepsy datasets for feature selection.


Introduction
Researchers from all over the world have been paying more and more attention to how to handle massive datasets in recent years, thanks to data exploration. These databases can contain duplicate and pointless features. Thus, in order to solve this issue and increase the effectiveness of both supervised and unsupervised learning algorithms, feature selection is crucial [1][2][3]. It is obviously a challenging task to find the best subset of features in high-dimensional feature datasets because there are 2^N − 1 possible feature subsets in a dataset with N features. Traditional mathematical techniques cannot produce the desired outcome in an acceptable amount of time; hence, meta-heuristic algorithms are frequently used for feature selection issues.
To find the best solution to complex optimization problems, metaheuristics are employed. Several agents can facilitate the search process using a system of solutions that develops over many iterations according to a set of rules or mathematical equations. These iterations continue until the result satisfies a set of predetermined requirements. This final solution (a near-optimal solution) is accepted as the result, and the system is considered to have reached a state of convergence [30].
In contrast to exact methods that find optimal solutions but require a long computational time, heuristic methods find near-optimal solutions quite quickly [31]. However, most of these methods are problem-specific. As the word "meta" in metaheuristic methods indicates, metaheuristics are one level higher than heuristics. Metaheuristics have been very successful because they have the potential to provide solutions at an acceptable computational cost. By mixing good heuristics with classical metaheuristics, very good solutions can be obtained for many real-world problems.
Binary particle swarm optimization (BPSO) and its variants have been widely used in the FS problem. In 2020, the authors of ref. [32] proposed self-adaptive PSO with a local search strategy to find less-correlated feature subsets. The authors of ref. [33] proposed an improved version of the SSA algorithm, called the ISSA, to solve the FS problem. Furthermore, a binary chaotic horse herd optimization algorithm for feature selection (BCHOAFS) was proposed by Esin Ayşe Zaimoğlu [34]. The authors of ref. [35] used four families of transfer functions in the binary AOA (BAOA) to test 10 low-dimensional and 10 high-dimensional datasets. In feature selection problems, several wrapper-based algorithms using binary meta-heuristic algorithms have been proposed. The S-shaped and V-shaped transfer functions are used most often, as in the modified binary SSA (MBSSA) [36].
In recent years, to increase the efficiency of transfer functions, time-varying S-shaped and V-shaped transfer functions have been proposed and applied to the binary dragonfly algorithm (BDA) [37] and the BMPA [38]. The results showed that the time-varying S-shaped transfer function performs better than the V-shaped one. Besides single-objective algorithms, multi-objective algorithms also play an important role in the feature selection problem. A multi-objective whale optimization algorithm (WOA) [39] was proposed for data classification, in which filter and wrapper fitness functions were simultaneously optimized. Another paper studies a multi-label feature selection algorithm using improved multi-objective particle swarm optimization (PSO) [40], with the purpose of searching for a Pareto set of non-dominated solutions (feature subsets). Liu proposes a novel feature selection method utilizing a filtered and supported sequential forward search technique called multi-objective ant colony optimization (MOACO) in the context of support vector machines (SVMs) [41] to solve the feature selection problem. A new method known as the improved multi-objective salp swarm algorithm (IMOSSA) has been tested on a feature selection task [42]. Other kinds of transfer functions are also used in feature selection problems. An automata-based improved BEO [43] (AIEOU) used a U-shaped transfer function to select the best subset of features. This algorithm applied both learning-based automata and adaptive β-hill climbing (AβHC) to find the best parameters to form a better equilibrium pool. Many binary metaheuristic algorithms have been introduced to select the best subset of features. However, due to the importance of feature selection in many fields, it is necessary to design algorithms that can obtain a higher accuracy with a smaller subset of features.
An important step in the feature selection problem is mapping a continuous space to a binary one, and the transfer function plays a significant role in this process. Moreover, using transfer functions is one of the easiest ways to convert an algorithm from continuous to binary without modifying its structure. The shapes of transfer functions are classified into two families: S-shaped and V-shaped [44]. The S-shaped transfer function increases monotonically within the [0, 1] interval and is asymmetric about the origin, so each input maps to a single fixed value throughout its domain. The V-shaped transfer function is symmetric, so inputs of equal magnitude and opposite sign map to the same value in the [0, 1] interval, which enhances the diversity of the population within the range.
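A minimal illustration of the two families, using the classic sigmoid and |tanh| members (these specific functions are common examples from the transfer-function literature, not necessarily the exact ones compared in this paper):

```python
import math

def s_shaped(x):
    """Classic sigmoid S-shaped transfer function: monotone, range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def v_shaped(x):
    """Classic V-shaped transfer function: symmetric about 0, range [0, 1)."""
    return abs(math.tanh(x))

# The S-shaped function is asymmetric about x = 0 ...
print(round(s_shaped(0.0), 3))   # 0.5
# ... while the V-shaped function maps +x and -x to the same probability.
print(round(v_shaped(-2.0), 3), round(v_shaped(2.0), 3))  # 0.964 0.964
```

The symmetry of the V-shaped member is exactly what the text describes: two inputs of equal magnitude receive the same flip probability, which maintains population diversity.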
A common drawback of widely used transfer functions in binary algorithms is that they do not evolve during the search for the optimal solution; that is, they change the probability of the parameter values in a nonadaptive manner. Thus, they are static functions with poor exploration or exploitation that cannot change as the search progresses.
In this paper, a powerful transfer function is proposed to convert continuous search spaces to binary ones. The transfer function is dynamic during this process, which enhances the search ability of the BGOA in the exploration phase. The transfer function then gradually changes as the proposed algorithm switches from exploration to exploitation, finally reaching a good result. In the end, the K-nearest neighbors (KNN) algorithm is applied for classification.
The main contributions of this paper can be summarized as follows:
1. A time-varying Gaussian transfer function is introduced.
2. A new binary grasshopper optimization algorithm based on time-varying Gaussian transfer functions (BGOA-TVG) is proposed.
3. The BGOA-TVG achieves a balance between its exploration and exploitation capabilities and improves the convergence speed of the algorithm.
4. The BGOA-TVG can effectively deal with high-dimensional feature selection problems.
5. The excellent performance of the BGOA-TVG is verified against binary metaheuristic optimization algorithms proposed in recent years.
The rest of the paper is organized as follows: Section 2 presents a brief introduction to the feature selection problem. In Section 3, the basic grasshopper optimization algorithm is discussed. The enhanced transfer function is presented in Section 4. Section 5 shows the results of the tests. In Section 6, the proposed method is demonstrated within the EEG analysis field. Finally, Section 7 concludes the paper and suggests some directions for future studies.

Feature Selection Problem
The feature selection problem is an NP-hard optimization problem [45] in which the search space grows exponentially as the number of features in a dataset increases. It is a useful way to find a relevant subset of fewer features from an initial dataset to reduce dimensionality and training times [46]. However, traditional mathematical methods cannot solve high-dimensional feature selection problems in a reasonable time, and according to tests, metaheuristic algorithms are better at finding subsets of features [47][48][49][50][51][52][53][54]. There are three selection strategies in feature selection: wrapper-based, filter-based, and hybrid filter-wrapper-based methods [55]. In the wrapper-based strategy, the precision of the learning algorithm determines the optimum subset. In the filter-based method, the chosen subset is unrelated to the learning process. The hybrid approach combines these methods. The wrapper-based method outperforms the others in terms of accuracy, but it requires more CPU resources and a longer testing time. The authors of [56][57][58] provided a new method to extract an optimal feature subset to enhance the accuracy of the calculation. Others have proposed correlation feature selection [59] and in-depth analyses of searching methods like the best-first, greedy step-wise, genetic, linear forward selection, and rank searches [60].
A feature selection module is applied prior to the classification method to optimize efficiency and precision by eliminating irrelevant features and to reduce the time complexity to find the classification to which a document belongs [61].
The most important step in feature selection is to extract subsets and decide whether to select each element of the set according to the accuracy rate. In a binary algorithm, individuals traverse the set and indicate whether an element is selected using 0s and 1s. In Equation (1), N is the number of features in the dataset and SF is the subset selected by the algorithm; in a binary algorithm, it is determined by the values of an individual.
Feature selection is a multi-objective optimization problem in which we aim to minimize the size of the selected feature subset and maximize the classification accuracy. This is described by the following fitness function:

Fitness = α · Err + β · (SF/N)

where Err is the resulting classification error, SF is the number of selected features in the subset, and N is the total number of features in the dataset. SF/N is the ratio of the selected subset size to the total number of features. α and β are parameters in the interval [0, 1] with α = 1 − β.
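The weighted-sum fitness described above can be sketched directly (α = 0.99 is a common choice in the wrapper-FS literature; the paper itself only constrains α and β to [0, 1] with α = 1 − β):

```python
def fitness(err, n_selected, n_total, alpha=0.99):
    """Wrapper-FS fitness: weighted sum of the classification error and the
    feature-subset ratio SF/N, with alpha + beta = 1 (alpha value assumed)."""
    beta = 1.0 - alpha
    return alpha * err + beta * (n_selected / n_total)

# A subset with 5% classification error that keeps 12 of 60 features:
print(round(fitness(0.05, 12, 60), 4))  # 0.0515
```

Because α is close to 1, accuracy dominates, and the SF/N term only breaks ties in favor of smaller subsets.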

Grasshopper Optimization Algorithm (GOA)
The grasshopper optimization algorithm is a population-based swarm intelligence algorithm introduced by Mirjalili et al. in 2017 [62], which models the behaviour of grasshopper swarms in nature. There are two essential phases in this algorithm: the exploration and exploitation of the search space. Through social interactions during the food search process, the swarm of grasshoppers switches between the phases. The swarm moves slowly and covers a small distance in the larval stage. In contrast, the swarm moves quickly and covers a large distance in adulthood.
There are three evolutionary operators in the position-updating process of individuals in swarms [62]: the social interaction operator S_i, the gravity force operator G_i, and the wind advection operator A_i in Equation (2). The movement of individuals in the swarm is described as follows:

X_i = S_i + G_i + A_i (2)

where X_i defines the position of the i-th grasshopper.
The social interaction operator is calculated as

S_i = Σ_{j=1, j≠i}^{N} s(d_ij) · d̂_ij (3)

where N is the number of grasshoppers in the swarm, d_ij represents the distance between the i-th and the j-th grasshopper, s is a function that defines the strength of the social forces and is calculated as shown in Equation (4), and d̂_ij = (x_j − x_i)/d_ij is the unit vector from the i-th grasshopper to the j-th.
s(r) = f · e^(−r/l) − e^(−r) (4)

where f and l are two constants that indicate the intensity of attraction and the attraction length scale, respectively, and r is a real-valued distance. G_i in Equation (2) is calculated as shown in Equation (5) below:

G_i = −g · ê_g (5)

where g is the gravitational constant, and ê_g is a unit vector towards the center of the earth. The minus sign preceding it represents the effect of an individual's flight overcoming gravity.
A_i in Equation (2) is calculated as shown in Equation (6) below:

A_i = u · ê_w (6)

where u is a constant drift, and ê_w is a unit vector in the direction of the wind.
Equation (2) can be expanded to Equation (7) as follows:

X_i = Σ_{j=1, j≠i}^{N} s(|x_j − x_i|) · (x_j − x_i)/d_ij − g · ê_g + u · ê_w (7)

However, the mathematical model in Equation (7) cannot be used directly to solve optimization problems, mainly because the grasshoppers quickly reach their comfort zone and the swarm does not converge to a specified point, according to a test in ref. [62]. The author of the GOA suggested a modified version of Equation (7), shown in Equation (8), to solve optimization problems [62], in which the gravity operator is not considered (the gravity factor is set to 0) and the wind direction is always defined as moving towards a target. Accordingly, Equation (2) becomes Equation (8) as follows:

X_i^d = c · ( Σ_{j=1, j≠i}^{N} c · ((ub_d − lb_d)/2) · s(|x_j^d − x_i^d|) · (x_j − x_i)/d_ij ) + T_d (8)

where ub_d is the upper bound in the d-th dimension, and lb_d is the lower bound in the d-th dimension. T_d is the value of the d-th dimension in the target (the best solution found so far).
The coefficient c reduces the comfort zone in proportion to the number of iterations and is calculated in Equation (9) as follows:

c = C_max − l · (C_max − C_min)/L (9)

where C_max is the maximum value, C_min is the minimum value, l indicates the current iteration, and L is the maximum number of iterations. In ref. [62], C_max = 1 and C_min = 0.00001 are used. Equation (8) shows that the next position of a grasshopper is defined based on its current position, the positions of all other grasshoppers, and the position of the target. Algorithm 1 shows the pseudocode of the GOA algorithm.
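The linear shrinkage of the comfort-zone coefficient in Equation (9) is a one-liner; the sketch below uses the paper's reported C_max = 1 and C_min = 0.00001:

```python
def comfort_coefficient(l, L, c_max=1.0, c_min=1e-5):
    """Equation (9): linearly shrink the comfort-zone coefficient c
    from c_max at iteration 0 down to c_min at iteration L."""
    return c_max - l * (c_max - c_min) / L

L = 100
print(comfort_coefficient(0, L))            # 1.0
print(round(comfort_coefficient(L, L), 7))  # 1e-05
```

Because c multiplies the whole social term in Equation (8), this schedule makes steps large early (exploration) and vanishingly small late (exploitation).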

Our Proposed BGOA-TVG Method
A binary search space is commonly considered as a hypercube [63], the higher-dimensional analogue of a cube. In this search space, the search agents of a binary optimization algorithm can only move to nearer and farther corners of the hypercube by flipping various numbers of bits. Therefore, to design the binary version of the GOA, the concepts of the velocity- and position-updating process must be modified.
In the continuous version of the GOA, the swarm of grasshoppers moves around the search space utilizing direction vectors, and positions take values in the continuous real domain. In the binary space, since positions consist of only two values ("0" and "1"), the position cannot be updated directly using the continuous rule of Equation (8). The way to change the position and velocity is outlined below.
In binary spaces, position updating means switching between "0" and "1" values.This switching should be based on the probability of updating the binary solution's elements from 0 to 1 and vice versa.The main problem here is how to change the concept of velocity in the real world to a binary space.
In order to achieve this, a transfer function is needed to map velocity values to probability values for updating the positions. In other words, a transfer function defines the probability of changing a position element from 0 to 1 and vice versa. In general, transfer functions force search agents to move in a binary space. According to ref. [64], the following concepts should be taken into consideration when selecting a transfer function to map velocity values to probability values: (1) The range of a transfer function should be bounded in the interval [0, 1], as this represents the probability that a particle will change its position.
(2) A transfer function should have a high probability of changing position for large absolute values of velocity.Particles with large absolute values for their velocities are probably far from the best solution, so they should switch their positions in the next iteration.
(3) A transfer function should also have a small probability of changing position for small absolute values of velocity.
(4) The return value of a transfer function should increase as the velocity rises.Particles that are moving away from the best solution should have a higher probability of changing their position vectors in order to return to their previous positions.
(5) The return value of a transfer function should decrease as the velocity is reduced.
These concepts guarantee that a transfer function is able to map the process of searching from a continuous search space to a binary search space while preserving similar concepts of the search for a particular evolutionary algorithm. The BGOA mirrors PSO in this respect: the changing part of Equation (8), defined as ΔX in Equation (10), is analogous to the velocity vector (step) in PSO [65]:

ΔX = c · ( Σ_{j=1, j≠i}^{N} c · ((ub_d − lb_d)/2) · s(|x_j^d − x_i^d|) · (x_j − x_i)/d_ij ) (10)

The transfer function defines the probability of updating the binary solution's elements from 0 to 1 and vice versa. In the BGOA, the probability of changing the position of an element is based on the step vector values.
α = α_min + (l/L) · (α_max − α_min) (11)

where α is in the range [α_min, α_max], and the linear increase in Equation (11) switches the algorithm smoothly from the exploration to the exploitation phase.
Figure 1 shows the time-varying transfer function. It enhances the exploration capability in the first phase, shown by the blue curve in Figure 1. In this phase, the diversity is extremely high, so the swarm can search the whole space. The red curve shows the phase between exploration and exploitation, which has a lower level of diversity than the first phase and searches more around the good solutions. The last phase, shown by the purple curve, changes slowly over the final iterations.
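As an illustration of how a time-varying parameter reshapes a transfer function over the run, the sketch below uses a V-shaped |tanh| curve whose slope parameter grows linearly with the iteration count. The specific function and the α schedule (`a_min`, `a_max`) are illustrative assumptions, not the paper's exact Equation (11):

```python
import math

def alpha_t(l, L, a_min=0.1, a_max=4.0):
    """Hypothetical linear time-varying control parameter: grows from
    a_min to a_max as iteration l goes from 0 to L."""
    return a_min + (l / L) * (a_max - a_min)

def tv_transfer(dx, l, L):
    """Illustrative time-varying V-shaped transfer: the curve steepens in
    late iterations, so the same step size maps to a larger flip probability."""
    return abs(math.tanh(alpha_t(l, L) * dx))

L = 100
early = tv_transfer(0.2, 0, L)   # flat curve at the start of the run
late = tv_transfer(0.2, L, L)    # steep curve at the end of the run
print(early < late)  # True
```

The point of the sketch is only the mechanism: one family of curves, indexed by the iteration, so the mapping from step to flip probability changes as the search progresses.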
To avoid local optima, the GOA uses Equation (8) to update the best solution. In the BGOA-TVG, a new time-varying V-shaped transfer function combined with a Gaussian mutation is proposed, as shown in Figure 2. The binary solutions are generated based on the TVG as shown in Equation (14), where β is in the range [0.05, 10], and σ is in the range [0.01, 10], to switch efficiently from the exploration to the exploitation phase over time.

In Figure 2, the blue curve is the initial status of the combined function, which has both exploration and exploitation, and the purple curve is the final status, which has maximal exploitation. Because the parameters β and σ in the function change with each iteration, the intermediate transfer function is also constantly changing. The yellow curves depict diverse scenarios characterized by varying parameters.
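A hedged sketch of how a Gaussian-shaped, time-varying transfer could drive the bit flips. The exact Equations (14)-(17) are not reproduced in this text, so the transfer form below (a V-like curve built from a Gaussian bump) and the roles of β and σ are plausible assumptions consistent with the stated parameter ranges:

```python
import math
import random

def gaussian_tvg(dx, beta, sigma):
    """Hypothetical Gaussian time-varying transfer: flip probability is 0 at
    dx = 0 and approaches 1 for large |dx| (V-like shape). beta in [0.05, 10]
    and sigma in [0.01, 10] would vary over the iterations."""
    return 1.0 - math.exp(-beta * dx * dx / (2.0 * sigma * sigma))

def update_bit(bit, dx, beta, sigma, rng=random):
    """Flip the binary position element with the probability given by the transfer."""
    return 1 - bit if rng.random() < gaussian_tvg(dx, beta, sigma) else bit

# A zero step never flips the bit; a very large step (p -> 1) always does.
print(update_bit(0, 0.0, beta=2.0, sigma=0.5))    # 0
print(update_bit(1, 100.0, beta=2.0, sigma=0.5))  # 0
```

Varying β and σ per iteration reshapes this curve exactly as Figure 2 describes: wide and permissive early (exploration), narrow and conservative late (exploitation).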
Algorithm 1 shows the pseudocode of the BGOA-TVG algorithm. Ref. [65] showed that by normalizing the distances between grasshoppers to [1, 4], individuals can experience both attraction and repulsion forces, which balances exploration and exploitation in the algorithm. Hence, we restrict the distances between individuals to the closed interval [1, 4].
Figure 3 shows the flowchart of the proposed algorithm.
Algorithm 1: BGOA-TVG
Initialize C_max, C_min, and Max_Iterations
Initialize a population of solutions X_i (i = 1, 2, ..., n)
Evaluate each solution in the population
Set T as the best solution
While (t < Max_Iterations)
    Update c using Equation (9)
    For each search agent
        Normalize the distances between grasshoppers to [1, 4]
        Update the step vector ΔX of the current solution using Equation (10)
        For i = 1 : dim
            Use Equation (8) to obtain the current position
            Use Equations (10)-(13) to obtain the binary position
            Use Equations (14)-(17) to obtain the final position
            Calculate α, β, sigma based on Equations (13), (15), and (16)
        End For
    End For
    Update T if a better solution is found
    t = t + 1
End While
Return T
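The loop above can be sketched as a compact, runnable skeleton. This is only schematic: the Gaussian transfer form and the β/σ schedules are assumptions (the paper gives only their ranges), the distance normalization and Equations (10)-(17) are simplified into a single target-directed step, and the objective `obj` is a toy:

```python
import math
import random

def bgoa_tvg(fit, dim, n_agents=10, max_iter=30, c_max=1.0, c_min=1e-5):
    """Schematic BGOA-TVG main loop. `fit` maps a 0/1 list to a value
    to minimize. Not the paper's exact update rules."""
    pop = [[random.randint(0, 1) for _ in range(dim)] for _ in range(n_agents)]
    best = min(pop, key=fit)[:]
    for t in range(max_iter):
        c = c_max - t * (c_max - c_min) / max_iter           # Eq. (9)
        beta = 10.0 - (10.0 - 0.05) * t / max_iter           # assumed schedule in [0.05, 10]
        sigma = 0.01 + (10.0 - 0.01) * t / max_iter          # assumed schedule in [0.01, 10]
        for agent in pop:
            for d in range(dim):
                # step toward the best solution, scaled by the shrinking comfort zone
                dx = c * (best[d] - agent[d]) + random.gauss(0, 0.5)
                p = 1.0 - math.exp(-beta * dx * dx / (2.0 * sigma * sigma))
                if random.random() < p:                      # Gaussian TVG flip
                    agent[d] = 1 - agent[d]
        cand = min(pop, key=fit)
        if fit(cand) < fit(best):
            best = cand[:]
    return best

# Toy objective: require bit 0 to be set, lightly penalize extra bits.
random.seed(1)
obj = lambda bits: (1 - bits[0]) + 0.01 * sum(bits)
sol = bgoa_tvg(obj, dim=8)
print(obj(sol) < 1.0)  # True: a subset with the key bit set was found
```

Early iterations (large flip probability) behave like a random bit search; late iterations freeze the population near the incumbent, mirroring the exploration-to-exploitation hand-off in Algorithm 1.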


Computational Complexity
The proposed transfer functions do not change the computational complexity of the algorithm during each iteration. Moreover, the core of the grasshopper optimization algorithm is finding the current optimal value in a loop. The factors that affect the overall complexity are the population size, the problem dimensionality, and the number of iterations. Therefore, the worst-case complexity of the BGOA-TVG is O(Max_Iteration × N × D), where N is the population size and D is the number of dimensions.

Experimental Simulation Platform
For this experiment, we used Windows 10 on a computer with the following specifications: an Intel(R) Core(TM) i3-6100 CPU with a main frequency of 3.30 GHz and 16.0 GB of memory. All algorithm code was run in MATLAB R2022a.

UCI Datasets
For this section, we selected 10 University of California, Irvine (UCI) datasets with different characteristics to verify the performance of the BGOA-TVG from different perspectives. The name, number of features, and number of instances of each dataset are shown in Table 1.

Parameter Settings
In order to verify the feasibility and effectiveness of the BGOA-TVG, we adopted several binary meta-heuristic algorithms for comparison: the BDA [23], BHHO [20], BPSO [11], BGWO [14], the BGBO [66], the BWOA [17], the BGA [67], and the BBOA [68], alongside the BGOA-TVG. To make a fair comparison, the population size of all nine algorithms is set to 40, and the number of iterations is set to 100. Table 2 shows the main parameter settings of the nine algorithms.

Evaluation Criteria
The experimental results are evaluated in terms of the following criteria:
(1) Average fitness function. Each dataset is tested independently 30 times, and the average fitness shows the stability of the proposed algorithm, as calculated in Equation (18):

AvgFitness = (1/N) Σ_{i=1}^{N} Fitness_i (18)

where N is the number of runs of the optimization algorithm, and Fitness_i is the optimal fitness value resulting from the i-th run.

(2) Average classification accuracy. The result is formulated in Equation (19) as follows:

AvgAccuracy = (1/N) Σ_{i=1}^{N} Acc_i (19)

where N is the total number of runs of the proposed algorithm, and Acc_i is the accuracy of the best solution from the i-th run.
(3) Average feature selection size. This criterion is calculated as in Equation (20), and the results are shown in Table 3:

AvgSize = (1/N) Σ_{i=1}^{N} size_i (20)

where size_i is the number of features selected in the i-th run.
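The three criteria in Equations (18)-(20) are simple means over the 30 independent runs; a compact sketch (the run values below are made-up illustrations):

```python
def averages(fitnesses, accuracies, subset_sizes):
    """Equations (18)-(20): mean fitness, mean classification accuracy, and
    mean selected-subset size over N independent runs."""
    n = len(fitnesses)
    return (sum(fitnesses) / n,
            sum(accuracies) / n,
            sum(subset_sizes) / n)

# Three illustrative runs of one algorithm on one dataset:
runs_fit = [0.052, 0.049, 0.055]
runs_acc = [0.95, 0.96, 0.94]
runs_size = [12, 10, 14]
print(tuple(round(v, 4) for v in averages(runs_fit, runs_acc, runs_size)))
# (0.052, 0.95, 12.0)
```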
The results of the average classification accuracy are shown in Table 3. Among the compared algorithms, the BGOA-TVG achieved the best results: it reached the highest accuracy on nine datasets and was the sole best method on five of them, placing it first in the overall ranking.
The performance of the proposed algorithm is demonstrated more clearly by comparing it with the other algorithms, as depicted in Figures 4-13.

Different Transfer Functions
In order to demonstrate the impact of different types of transfer functions on the final results, we selected four classic S-shaped transfer functions and four classic V-shaped transfer functions, shown in Table 4, and compared them with our proposed time-varying transfer function. The grasshopper optimization algorithm was used in all cases, and the results are shown in Figures 14-25.


Electroencephalogram (EEG) Dataset Analysis
In this section, we present the proposed approach for channel selection for EEG-based signal acquisition. An EEG is very valuable for the diagnosis, identification, and treatment monitoring of epilepsy. Epilepsy often manifests itself as uncontrollable convulsions and involuntary behaviors. By observing and analyzing the signals recorded via EEG, physicians can determine whether a patient has epilepsy through mapping analyses without harming the patient. In cases such as brain injury or stroke, EEGs can also provide doctors with critical information to help them plan patients' treatment. This is why EEGs have important clinical applications.

EEG Datasets

DEAP Dataset
The EEG signals used in this work were obtained from the EEG Motor Movement/ Imagery dataset.The data were collected from 109 healthy volunteers using the BCI2000 System, which makes use of 64 channels (sensors) and provides a separated EDF (European data format) file for each of them.The subjects performed different motor/imagery tasks.These tasks are mainly used in BCI (brain-computer interface) applications and neurological rehabilitation and consist of imagining or simulating a given action, like opening and closing one's eyes, for example.Each subject performed four tasks according to the position of a target that appeared on the screen placed in front of them (if the target appears on the right or left side, the subject opens and closes the corresponding fist; if the target appears on the top or bottom, the subject opens and closes both fists or both feet, respectively).In short, the four experimental tasks were as follows: 1.
To open and close their left or right fist; 2.
To imagine opening and closing their left or right fist; 3.
To open and close both their fists or both their feet; 4.
To imagine opening and closing both their fists or both their feet.
Each of these tasks was performed three times, generating 12 recordings for each subject over two-minute runs, with the 64 channels sampled at 160 samples per second. The features of the twelve recordings are extracted by means of an AR model with three output configurations for each EEG channel: 5, 10, and 20 features. The average of each configuration is then computed in order to obtain just one feature per EEG channel (sensor). In short, for each sensor, we extracted three different numbers of AR-based features, with the output of each sensor being the average of their values. Henceforth, we adopt the following notation for each dataset configuration: AR5 for the five autoregression coefficients extracted, and AR10 and AR20 for the ten and twenty autoregression coefficients, respectively. All the datasets we used were processed and can be found at https://openneuro.org/ (accessed on 31 January 2024).
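A rough sketch of the per-channel AR feature described above. The paper does not name its AR estimator, so Levinson-Durbin is used here as one standard choice; the toy "channel" signal, its noise level, and the helper names are illustrative:

```python
import math
import random

def ar_coefficients(x, order):
    """Levinson-Durbin estimate of AR(order) coefficients from a 1-D signal
    (a generic sketch; not necessarily the estimator used in the paper)."""
    n = len(x)
    # biased autocorrelation r[0..order]
    r = [sum(x[t] * x[t + k] for t in range(n - k)) / n for k in range(order + 1)]
    a, err = [], r[0]
    for i in range(len(a), order):
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        a = [a[j] - k * a[i - 1 - j] for j in range(i)] + [k]
        err *= (1 - k * k)
    return a

def channel_feature(signal, order):
    """One feature per channel: the mean of its AR coefficients (AR5/AR10/AR20)."""
    coeffs = ar_coefficients(signal, order)
    return sum(coeffs) / order

# Toy one-second "channel" at 160 Hz: a sinusoid plus noise.
random.seed(42)
sig = [math.sin(0.3 * t) + 0.2 * random.gauss(0, 1) for t in range(160)]
print(len(ar_coefficients(sig, 5)))  # 5 coefficients, averaged to one feature
```

For the AR5/AR10/AR20 configurations, the same routine is run per channel with `order` set to 5, 10, or 20, and each channel contributes the single averaged value as its feature.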

Dataset of EEG Recordings of Pediatric Patients with Epilepsy
Self-limiting epilepsy with central temporal spikes is a common focal epilepsy in childhood, mainly characterized by paroxysmal seizures in the mouth, pharynx, and on one side of the face. It is often accompanied by tongue stiffness, as well as speech and swallowing difficulties. Today, electroencephalography and other methods are the primary diagnostic tools for self-limiting epilepsy with central temporal spikes, and the prognosis is generally favorable. An epileptic electroencephalogram (EEG) refers to a special type of brain wave phenomenon induced by sleep, which is close to sustained spike slow-wave emissions and occurs more frequently during the SELECTS seizure period. In order to further investigate the impact of epileptic electrical persistence during sleep on pediatric patients' symptoms, we selected this publicly available EEG dataset for a feature selection and accuracy analysis. In this dataset, 88 subjects recorded EEGs with their eyes closed during the resting state. A total of 36 of the participants were diagnosed with Alzheimer's disease (AD group), 23 were diagnosed with frontotemporal dementia (FTD group), and 29 were healthy subjects (CN group). Their cognitive and neuropsychological state was measured using the international Mini-Mental State Examination (MMSE). Lower MMSE scores indicate more severe cognitive decline, with scores ranging from 0 to 30. The duration of the disease was measured in months, with an average of 25 and an interquartile range (IQR) of 24 to 28.5 months. No comorbidities related to dementia were reported in the AD group. The AD group had an average MMSE of 17.75 (SD = 4.5), the FTD group an average of 22.17 (SD = 8.22), and the CN group an average of 30. The average age was 66.4 years in the AD group, 63.6 years in the FTD group, and 67.9 years in the CN group. The data we used were analyzed and are now available at https://openneuro.org/ (accessed on 31 January 2024).

Compared Methods
The BGOA-TVG is an improved version of the grasshopper optimization algorithm (GOA) for multitask pattern recognition problems and is characterized as a swarm intelligence (SI) algorithm. SI has been proven to be a technique that can solve NP-hard computational problems such as feature selection. Although a considerable number of new swarm-inspired algorithms have emerged in recent years, particle swarm optimization (PSO) is still the most widely used SI algorithm for solving feature selection problems [52]. In addition, the individual representation in SI for feature selection is typically a bit string, whereby the dimensionality of an individual is equal to the total number of features in the dataset; binary encoding is more commonly used for feature selection than real encoding. Therefore, for feature selection, we compared the BGOA-TVG with BPSO [11,12], the BGA [24], BHHO [20], the BWOA [17], BACO [21], BGWO [14], and a method using all the features obtained from the time, frequency, and time-frequency domains. Their characteristics are shown in Table 5. All of the compared methods adopt 40 individuals for 100 iterations. "Optimized" means that the item's value is not constant but is chosen by the corresponding optimization algorithm.

Classification Indices
In the experiment, five classification indices are used for the validation of the compared methods: the true positive rate (recall, TPR), the positive predictive value (precision, PPV), the true negative rate (specificity, TNR), the negative predictive value (NPV), and the classification accuracy (ACC). They are respectively defined as follows:

TPR = TP / (TP + FN) (21)
PPV = TP / (TP + FP) (22)
TNR = TN / (TN + FP) (23)
NPV = TN / (TN + FN) (24)
ACC = (TP + TN) / (TP + TN + FP + FN) (25)
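All five indices follow directly from the binary confusion matrix (TP, TN, FP, FN); a small sketch with illustrative counts:

```python
def classification_indices(tp, tn, fp, fn):
    """TPR (recall), PPV (precision), TNR (specificity), NPV, and ACC
    computed from a binary confusion matrix, as in Equations (21)-(25)."""
    return {
        "TPR": tp / (tp + fn),
        "PPV": tp / (tp + fp),
        "TNR": tn / (tn + fp),
        "NPV": tn / (tn + fn),
        "ACC": (tp + tn) / (tp + tn + fp + fn),
    }

idx = classification_indices(tp=40, tn=45, fp=5, fn=10)
print(idx["TPR"], idx["ACC"])  # 0.8 0.85
```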

Selection of Classifier
The algorithms chosen are applied directly to every subject dataset with a fixed classification method, i.e., SVM, KNN, Bayes, DT, RF, or Adaboost. Their results for the DEAP dataset are recorded in Figures 26-33, respectively. It can be seen that the classification accuracies of the classifiers fluctuate prominently even for the same subject dataset and the same pattern recognition scheme. In terms of the highest classification accuracy, the best classifier is chosen manually for every subject dataset, as shown in Table 6 for the DEAP dataset. It should be noted that the emotion recognition results for the same dataset depend on the classification method to some extent.
classification accuracies of the five classifiers fluctuate prominently even for the same subject dataset and the same pattern recognition scheme.In terms of the highest classification accuracy, the best classifier is chosen artificially for every subject dataset, as shown in Table 6, respectively, corresponding to the DEAP dataset.It should be noted that the emotion recognition results for the same dataset depend on the classification method to some extent.The training dataset comprises 50% of the original dataset, while 30% is allocated for the validation sets and the remaining 20% is reserved for the test sets.When considering the selection of classifiers for optimization, the BGOA-TVG undoubtedly realizes emotion recognition in a more efficient way.Moreover, according to the data in Figures 26-33, the RF, DT, and Bayes methods are chosen as the best classifiers for emotion recognition with a high probability, while the KNN method is the worst one.This indicates that the RF, DT, and Bayes methods are more suitable for EEG-based emotion recognition than the KNN method.
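The 50/30/20 train/validation/test split described above can be sketched as an index-based shuffle-and-slice (a minimal illustration; the function name and seed are ours, not the paper's):

```python
import numpy as np

def split_indices(n_samples, seed=0):
    """Shuffle sample indices and split them 50/30/20 into
    train / validation / test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.5 * n_samples)
    n_val = int(0.3 * n_samples)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

# Example: 100 samples -> 50 train, 30 validation, 20 test indices.
train_idx, val_idx, test_idx = split_indices(100)
```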

Selection of Features and Parameters
To further assess the BGOA-TVG, we simultaneously recorded the five classification indices for Allfeat, BPSO, the BGA, BHHO, the BWOA, BACO, and BGWO on all subject datasets. The parameters were the same as those in the previous experiment. The results are shown in Figures 34-38. Firstly, the TPR (true positive rate) describes the ratio of correctly identified positive cases to all positive cases. It can be seen in Figure 34 that the classification index curve of the BGOA-TVG is better than those of the other algorithms most of the time, while BPSO and BHHO have lower TPR values. Secondly, the PPV (positive predictive value) is shown in Figure 35. BGWO achieves the best performance for subject 3 but exhibits inferior results compared to the BGOA-TVG for the other subjects; moreover, the large volatility of its curve indicates poor robustness. In contrast, the BGOA-TVG is very robust. Thirdly, the TNR describes the proportion of correctly identified negative cases to all negative cases. The BGOA-TVG shows the best robustness compared to the other algorithms, as can be seen in Figure 36. Fourthly, the NPV is the negative predictive value. In Figure 37, BGWO shows a classification rate similar to that of BHHO for subjects 1-9, but for subject 10, BGWO performs worse than the others, while BPSO, BACO, and the BGOA-TVG obtain a ratio of 1.
Furthermore, the average classification accuracy values for the DEAP datasets obtained by these algorithms are recorded in Figure 38. It can be seen that the BGOA-TVG can choose an appropriate method based on the specific characteristics of an emotion dataset. The mean classification accuracy of the BGOA-TVG on DEAP is higher than those of the other algorithms, which verifies that the BGOA-TVG can efficiently recognize emotion patterns.

Analysis of Results
The horizontal lines in the figures represent the different datasets that we used, with subjects no. 1-10 corresponding to the ten datasets, and the points represent the outcomes obtained using the different algorithms. The lines indicate the compatibility and stability of the different algorithms. As a result, the proposed binary GOA is more stable than the other algorithms, and its application to different DEAP datasets is universal.
Figure 39 depicts the mean number of selected features for all optimization techniques with respect to the learning algorithm. As we did not consider the feature extraction procedure, i.e., the autoregression coefficient computation, the feature numbers chosen over all dataset configurations are quite similar for BPSO and the BGA. In Table 7, it can be observed that the BGOA-TVG is the fastest technique in all situations, since it only updates one agent per iteration. Although this may be a drawback in terms of convergence, it is still the fastest approach with the lowest number of features.
Moreover, as shown in Figure 38, the proposed binary GOA is the most accurate method compared to the other algorithms and also selects fewer features. In general, the proposed algorithm yields a lower error rate than the other algorithms in EEG signal analysis.
Finally, we still need to deal with the trade-off between the number of features and the computational efficiency. Using all of the sensors does not lead to very different results, which supports the idea of this work: one can find a subset of sensors that obtains reasonable results. The best results are shown in bold.

Analysis of Results for Epilepsy EEG
The chosen algorithms are directly applied to every subject dataset with a fixed classification method, the SVM. The epilepsy dataset results are documented in Figures 40-51, respectively. The classification accuracy fluctuates significantly across the various subject datasets. In terms of the highest classification accuracy, the best classifier is chosen manually for every subject dataset. The results shown in Table 8 correspond to the epilepsy EEG datasets, with the best results shown in bold. The training set comprises 50% of the original dataset, while 30% and 20% are allocated to the validation and test sets, respectively. Without a doubt, the BGOA-TVG performs the recognition task in a more efficient manner. The data presented in Figures 40-51 suggest that the BGOA-TVG is superior to the other algorithms for EEG-based classification.

Conclusions and Future Work
This paper proposed a new binary version of the grasshopper optimization algorithm, using time-varying Gaussian mixed transfer functions (BGOA-TVG). Compared with the original GOA, which has a slow convergence speed and can easily fall into a local optimum, the BGOA-TVG has a good global search capability and high accuracy in the local search space. Furthermore, the improved version of the GOA balances exploration and exploitation in the search space and effectively avoids premature convergence. To verify the effectiveness and feasibility of the BGOA-TVG, the algorithm was tested on 10 UCI datasets and the well-known DEAP dataset. The application of the K-nearest neighbor method further proves that the improved BGOA-TVG has a faster convergence speed in the global search space and better accuracy in the local search space than the other binary algorithms. For the BGOA-TVG, there are two possible avenues for future research. First, one could use different strategies to improve the original BGOA to increase the speed of searching the global space or to obtain more accurate results in the local search space. Second, one could apply the BGOA-TVG to more complex optimization problems, such as image segmentation or network optimization configuration problems. Specific feature extraction applications are recommended for most mental illnesses, as they can quickly process a patient's symptoms. Furthermore, augmenting the database volume will help us conduct more rapid and accurate data analyses.
New time-varying transfer functions are proposed to enhance the ability of the BGOA in the search space. Algorithm 1 shows the pseudocode of the BGOA-TVG. The first transfer function (time-varying sine) is proposed to convert positions in the continuous space into the binary search space. The position is in the range [lb_d, ub_d]. The binary position is mapped as follows:

Time-varying(position, α) = sin(position/α) (11)
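A minimal sketch of how a transfer function of this form converts a continuous position into a bit. Eq. (11) gives the transfer value; the binarization rule below (treat the magnitude of the transfer value as a flip probability) is the standard convention for S-/V-shaped transfer functions and is our assumption, not a rule stated at this point in the paper:

```python
import numpy as np

def time_varying_sin(position, alpha):
    """Eq. (11): time-varying sine transfer value for continuous positions."""
    return np.sin(np.asarray(position, dtype=float) / alpha)

def to_binary(position, alpha, rng):
    """Map continuous positions to bits: a bit is set to 1 when a uniform
    random draw falls below |sin(position/alpha)| (assumed rule)."""
    t = np.abs(time_varying_sin(position, alpha))
    return (rng.random(t.shape) < t).astype(int)

# Example: binarize a 3-dimensional continuous position vector.
rng = np.random.default_rng(1)
bits = to_binary([0.2, -1.5, 3.0], alpha=2.0, rng=rng)
```

Note that a position of 0 always maps to bit 0 under this rule, since the transfer value is 0 there; varying α over the iterations changes how sharply positions are pushed toward 0 or 1.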

Figure 2.
Figure 2. Time-varying V-shaped transfer function mixed with Gaussian function.

Figure 3 .
Figure 3. Flowchart of the proposed algorithm.

Figure 34.
Figure 34. TPR obtained for ten subjects of DEAP dataset.

Figure 35.
Figure 35. PPV obtained for ten subjects of DEAP dataset.

Figure 37.
Figure 37. NPV obtained for ten subjects of DEAP dataset.

Figure 38.
Figure 38. Mean ACC obtained for ten subjects of DEAP dataset.

Figure 39.
Figure 39. Mean feature number obtained for ten subjects from the DEAP dataset over forty iterations.

Table 5 .
Characteristics of compared methods.

Table 6 .
Best classifiers for ten subjects of DEAP obtained by different algorithms.

Table 7 .
Mean values of classification results of algorithms for DEAP data.