Extended Evolutionary Algorithms with Stagnation-Based Extinction Protocol

Extinction has been frequently studied by evolutionary biologists and has been shown to play a significant role in evolution. The genetic algorithm (GA), one of the popular evolutionary algorithms, is based on key concepts in natural evolution such as selection, crossover, and mutation. Although GA has been widely studied and implemented in many fields, little work has been done to enhance the performance of GA through extinction. In this research, we propose the stagnation-driven extinction protocol for genetic algorithm (SDEP-GA), a novel algorithm inspired by the extinction phenomenon in nature, to enhance the performance of classical GA. Experimental results on various benchmark test functions and their comparative analysis indicate the effectiveness of SDEP-GA in terms of avoiding stagnation in the evolution process.


Introduction
The biological metaphor of evolution has been applied to computation since as early as 1950 [1][2][3], and the genetic algorithm (GA) is one of the evolutionary algorithms inspired by this process of natural evolution. The theory of evolution was introduced by Darwin in 1858 [4]; together with Weismann's theory of natural selection [5] and Mendel's concept of genetics [6], it formed the neo-Darwinian paradigm [7] on which the genetic algorithm is based. Although natural selection, reproduction (crossover), and mutation are clearly fundamental components of evolution, it is often overlooked that extinction plays an important role in this process too. Several evolutionary biologists have argued that extinction plays a significant role in evolution [4,8], but extinction is still largely neglected in the field of evolutionary computation.
Efforts to improve the performance of GA fall into two groups. The first tries to improve GA by modeling it more closely on natural evolution, introducing new biological operators or modifying existing ones to mimic processes observable in nature [9][10][11]. The second mainly designs operators that are tailored to specific problems and have no correspondence to nature [12][13][14]. Based on the increased performance and accuracy of GAs that were modeled to closely mimic the natural evolution process, we came to a deduction: the closer we model our GA towards natural evolution, the better it will perform. With this in mind, this research aims to introduce a novel genetic algorithm inspired by the extinction phenomenon in nature: the stagnation-driven extinction protocol for genetic algorithm (SDEP-GA).
The idea of SDEP-GA is analogous to dropout regularization [15]. When training a neural network, dropout randomly omits a subset of hidden units at each training iteration. This random removal of hidden units amounts to combining exponentially many different neural networks that share the same weights. Thus, dropout regularization essentially averages multiple models to improve the performance of the neural network. Srivastava et al. [16] have shown that dropout reduces overfitting and improves generalization. Removing chromosomes in SDEP-GA is similar to omitting hidden units in dropout, but in SDEP-GA, removing chromosomes with the extinction probability, either randomly or in a targeted way, allows a greater possibility of exploring new solutions.
The remainder of this paper is organized as follows. Section 2 presents a description of GA, focusing especially on Simple GA. Section 3 provides the design and architecture of our proposed algorithm, the stagnation-driven extinction protocol for GA (SDEP-GA). Section 4 covers the experimental design together with the experimental results and a discussion of the performance of SDEP-GA compared against Simple GA. Finally, Section 5 gives a summary, a review of our contribution, and possible directions for future work.

Genetic Algorithm
The genetic algorithm (GA) was invented by Holland in the 1960s and formally introduced by him in 1975 [17]. GA is based on ideas from Darwinian evolution and thus adopts some biological terminology. To assist in the understanding of SDEP-GA, we first outline the Simple GA.

Simple GA
Simple GA is the simplest form of GA. It consists of three types of operators that are also commonly used in other GAs. The three basic operators are listed below and are explained in the following subsections.
• Selection
• Crossover
• Mutation
Figure 1 shows the flowchart of the Simple GA process, which can be interpreted as follows:
1. Generate a random population of n chromosomes.
2. Evaluate the fitness of each chromosome.
3. If the termination criterion is not met, continue with the following steps. Otherwise, return the best chromosome found.
4. Repeat the following until n chromosomes are created:
(a) Select a pair of parent chromosomes.
(b) Perform crossover on the selected pair of parent chromosomes. A pair of offspring is created in this process.
(c) Apply the mutation operator to the pair of offspring.
5. Replace the current population with the newly created n offspring.
6. Go to step 2.
Each complete iteration (steps 1-5 for the first iteration, or steps 2-5 thereafter) is called a generation, and the entire set of generations is called a run. At the end of each run, the fittest chromosome, i.e., the best approximated candidate solution, is returned to the user.
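The steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in this paper: chromosomes are lists of bits, fitness is non-negative and maximized, termination is a fixed generation budget, and selection uses fitness-weighted sampling with replacement (`random.choices`) as a stand-in for the selection operator. The function name `simple_ga` and all parameter defaults are hypothetical.

```python
import random

def simple_ga(fitness, n_bits, pop_size=20, generations=50, pc=0.7, pm=0.01, seed=0):
    """Illustrative Simple GA on bit lists; fitness must be non-negative (assumption)."""
    rng = random.Random(seed)
    # Step 1: generate a random population of pop_size chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):  # Step 3: a generation budget as termination criterion
        # Step 2: evaluate the fitness of each chromosome.
        weights = [fitness(c) + 1e-9 for c in pop]  # small offset avoids all-zero weights
        offspring = []
        while len(offspring) < pop_size:  # Step 4
            # (a) select a pair of parents, with replacement
            p1, p2 = rng.choices(pop, weights=weights, k=2)
            c1, c2 = p1[:], p2[:]
            # (b) single-point crossover with probability pc
            if rng.random() < pc:
                cut = rng.randint(1, n_bits - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            # (c) flip each bit of each child with probability pm
            for child in (c1, c2):
                for i in range(n_bits):
                    if rng.random() < pm:
                        child[i] ^= 1
                offspring.append(child)
        pop = offspring[:pop_size]  # Step 5: replace the population
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
    return best  # best chromosome found over the whole run
```

For example, `simple_ga(sum, n_bits=16)` runs the loop on the OneMax toy problem, where the fitness of a chromosome is simply its number of 1 bits.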

Fitness Evaluation
The first requirement for using GA to solve a problem is that we must be able to evaluate the fitness of a chromosome, meaning we must have a clear method of measuring the accuracy of each candidate solution. The fitness of a chromosome is judged by its accuracy in solving the given problem. The chromosome that is most accurate in solving the problem is the fittest chromosome, and the least accurate one is the least fit chromosome.

Selection
The selection operator selects a pair of parent chromosomes from the current population, with the probability of selection being an increasing function of fitness. Selection is done "with replacement", meaning that the same chromosome can be selected more than once to become a parent. There are various selection methods, but in this research we use only the stochastic universal sampling method, which is explained in Section 4.

Crossover
With probability Pc (the "crossover probability"), we cross over the pair of parents at a randomly chosen point (chosen with uniform probability) to form two offspring. If no crossover takes place (probability of 1-Pc), we form two offspring that are exact copies of their respective parents. The crossover operator randomly chooses one or more loci in a pair of chromosomes and exchanges the subsequences after the locus to create two offspring. There are two types of crossover operator: single-point crossover and multi-point crossover. Crossover points are selected randomly, so it might happen that during a single-point crossover the locus is located before the first bit or after the last bit of a chromosome. In such cases, the offspring pair will be an exact replica of their parents.
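A minimal sketch of single-point crossover as described above, assuming chromosomes are equal-length Python lists of bits. The function name `single_point_crossover` and its signature are hypothetical. The cut point is drawn from 0 to the chromosome length inclusive, so the two boundary cases mentioned in the text (locus before the first bit or after the last bit) can occur and yield replicas of the parents.

```python
import random

def single_point_crossover(parent1, parent2, pc=0.7, rng=random):
    """Return two offspring from two equal-length bit lists."""
    # With probability 1 - pc no crossover happens: offspring are copies.
    if rng.random() >= pc:
        return parent1[:], parent2[:]
    # Cut point chosen uniformly; 0 or len(parent1) reproduces the parents.
    cut = rng.randint(0, len(parent1))
    # Exchange the subsequences after the cut point.
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2
```

Whatever the cut point, each bit position of the two offspring holds exactly the two bits the parents had at that position, which is the invariant checked when experimenting with this sketch.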

Mutation
Similar to the crossover operator, mutation is applied with a probability Pm (the mutation probability). There are multiple ways to mutate a chromosome, such as single-bit mutation or n-bit mutation. For a binary vector, the mutation operator randomly selects one or more bits in the chromosome and flips them. Whether or not mutation takes place in the offspring created by the crossover operation, the offspring are placed in the population pool to replace the current population after n offspring have been created.
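As an illustration, bit-flip mutation on a binary chromosome could be sketched as follows. The sketch assumes that Pm is applied independently to every bit, which is one of several possible readings of the mutation operator; the name `bit_flip_mutation` is hypothetical.

```python
import random

def bit_flip_mutation(chromosome, pm, rng=random):
    """Return a mutated copy: each bit is flipped independently with probability pm."""
    return [bit ^ 1 if rng.random() < pm else bit for bit in chromosome]
```

With `pm = 0` the chromosome is returned unchanged, and with `pm = 1` every bit is flipped; practical values lie close to the lower end of that range.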

SDEP-GA
This section introduces SDEP-GA, a GA with an extinction protocol driven by stagnation. The idea behind this extinction protocol is to mimic the extinction phenomenon in nature and thereby bring GA closer to how natural evolution takes place. Before going into the mechanism of the extinction protocol, it is useful to define what stagnation is and when it occurs.
We define stagnation as the condition in which there is no improvement in the best fitness after K generations. Stagnation usually takes place when the algorithm is stuck in a local optimum while searching for a global optimum. To overcome this problem, we suggest making the chromosomes in the population extinct with a probability of Pe (the extinction probability). This will, by chance, remove from the population some of the chromosomes that lead towards stagnation before the selection process takes place. Because the population size decreases, this step also increases the chance that less fit chromosomes that survive the extinction step are selected to reproduce. The flowchart of how SDEP-GA works is shown in Figure 2.
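The stagnation condition can be checked from the history of best fitness values. The sketch below assumes a minimization problem, a list `best_history` holding the best fitness of each generation so far, and K >= 1; the helper name `is_stagnant` is hypothetical and the exact bookkeeping used in the experiments may differ.

```python
def is_stagnant(best_history, k):
    """True if the last k generations brought no improvement (minimization, k >= 1)."""
    if len(best_history) <= k:
        return False  # not enough generations observed yet
    best_before = min(best_history[:-k])  # best value seen before the last k generations
    best_recent = min(best_history[-k:])  # best value within the last k generations
    return best_recent >= best_before
```

For instance, with K = 3 the history `[5, 4, 4, 4, 4]` is stagnant, whereas `[5, 4, 4, 3]` is not, because the last generation improved on everything seen before.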
SDEP-GA is modeled on Simple GA, with an added operator named SDEP that takes place before the selection operator. Taking into consideration that extinction in nature can happen either to only a certain species (e.g., bird flu) or to any species (e.g., starvation or flood), we propose two types of extinction protocol:

Random Extinction
In SDEP with random extinction, if the algorithm is stagnant for K generations, each of the chromosomes in the population is removed (made extinct) with a probability Pe (the extinction probability). This is done by assigning a random value from 0 to 1 to each of the chromosomes in the population pool; if that value is less than Pe, the chromosome is removed from the population pool, so that each chromosome is removed with probability Pe. The flowchart of the random extinction protocol is shown in Figure 3 below.
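A minimal sketch of the random extinction step follows. It removes a chromosome when its uniform draw falls below Pe, so that the removal probability equals Pe; this comparison direction is an assumption of the sketch, as is the function name `random_extinction`. The sketch does not guard against the population becoming empty, which a practical implementation would need to handle.

```python
import random

def random_extinction(population, pe, rng=random):
    """Return the survivors; each chromosome is removed independently with probability pe."""
    survivors = []
    for chromosome in population:
        draw = rng.random()  # uniform value in [0, 1) assigned to this chromosome
        if draw < pe:
            continue         # extinct
        survivors.append(chromosome)
    return survivors
```

The step would be called only in generations where the stagnation condition holds, before parent selection.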

Targeted Extinction
In SDEP with targeted extinction, if the algorithm is stagnant for K generations, the chromosomes in the population are removed (made extinct) with a probability Pe (the extinction probability) only if their fitness value is below a certain threshold value T. This is done to preserve the elite chromosomes in the population. The threshold value T is updated every generation so that only a small percentage of chromosomes in the population pool are exempted from the extinction process. This is done by ranking the chromosomes according to their fitness value and then using the fitness value of the chromosome at rank i as the threshold value. For example, if we have a population of 100 chromosomes and want to preserve the top 10 percent of the chromosomes based on their fitness value, the threshold value T takes the fitness value of the chromosome ranked 10th in the entire population. The flowchart of the targeted extinction protocol is shown in Figure 4 below.
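The targeted variant can be sketched in the same style. The sketch assumes that a higher fitness value is better (for a minimization problem the objective would have to be negated or otherwise transformed first), that `keep_fraction` is the share of the population to exempt, and that chromosomes tied with the threshold are also exempt. The names `targeted_extinction` and `keep_fraction` are hypothetical.

```python
import random

def targeted_extinction(population, fitnesses, pe, keep_fraction=0.1, rng=random):
    """Return (survivors, threshold); only chromosomes below the threshold can go extinct."""
    # Rank fitness values from best to worst and read T off the rank to preserve.
    n_keep = max(1, round(len(population) * keep_fraction))
    threshold = sorted(fitnesses, reverse=True)[n_keep - 1]
    survivors = []
    for chromosome, fit in zip(population, fitnesses):
        exposed = fit < threshold  # elite chromosomes (fit >= T) are exempt
        if exposed and rng.random() < pe:
            continue               # extinct
        survivors.append(chromosome)
    return survivors, threshold
```

With 100 chromosomes and `keep_fraction=0.1`, the threshold is the fitness of the chromosome ranked 10th, matching the example in the text.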

Experimental Design and Result
To test the effectiveness of SDEP-GA in solving multivariable optimization problems, we use classical test functions that are widely used by researchers to benchmark the performance of GA.

Classical Test Functions
The performance of SDEP-GA is compared against Simple GA using the following classical test functions: Rastrigin's function, Schwefel's function, and Griewank's function [20]. They are all multivariable minimization problems for which the global minimum is known and can thus be compared against the solution found by the algorithm.
Rastrigin has many local optima and is highly multimodal. The mathematical expression of the function is as below, together with the lower and upper bound values and the global minimum (shown in Figure 5a when n = 2).
Schwefel is a deceptive function where search algorithms are potentially prone to converge towards the wrong direction. The mathematical expression of the function is as below, together with the lower and upper bound values and the global minimum (shown in Figure 5b when n = 2).
Griewank is similar to Rastrigin with many widespread local minima. The mathematical expression of the function is as below, together with the lower and upper bound values and the global minimum (shown in Figure 5c when n = 2).
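For reference, the three functions can be written in Python in their commonly cited textbook forms. These forms, and the global minima noted in the comments, are the standard ones from the optimization literature and are given here as an assumption; the exact variants and bounds used in the experiments are not reproduced in this text.

```python
import math

def rastrigin(x):
    """Commonly cited form; global minimum 0 at x = (0, ..., 0)."""
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)

def schwefel(x):
    """Commonly cited form; global minimum near 0 at x_i = 420.9687."""
    return 418.9829 * len(x) - sum(xi * math.sin(math.sqrt(abs(xi))) for xi in x)

def griewank(x):
    """Commonly cited form; global minimum 0 at x = (0, ..., 0)."""
    quadratic = sum(xi * xi for xi in x) / 4000
    product = math.prod(math.cos(xi / math.sqrt(i)) for i, xi in enumerate(x, start=1))
    return quadratic - product + 1
```

Each function accepts a list of any dimension n, so the same definitions serve the n = 2 plots and the 20-variable experiments.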

Representation
As mentioned in Section 2, the first step of the experiment is to represent the candidate solution in the form of chromosomes. Rastrigin, Schwefel, and Griewank have 20 variables each. Each variable of Rastrigin will be represented by 10 bits and each variable of Schwefel and Griewank will be represented by 20 bits. Gray coding is used for all these experiments.
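The decoding step implied by this representation can be sketched as follows: each group of Gray-coded bits is converted to an integer and scaled linearly to the variable's range. The helper names `gray_to_int` and `decode_chromosome`, the most-significant-bit-first ordering, and the linear scaling are assumptions of this sketch.

```python
def gray_to_int(bits):
    """Decode a Gray-coded bit list (most significant bit first) to an integer."""
    value = 0
    for g in bits:
        # Each binary bit is the Gray bit XOR the previously decoded binary bit.
        value = (value << 1) | (g ^ (value & 1))
    return value

def decode_chromosome(bits, bits_per_var, lo, hi):
    """Split a chromosome into variables and scale each one linearly to [lo, hi]."""
    scale = (hi - lo) / (2 ** bits_per_var - 1)
    return [lo + gray_to_int(bits[i:i + bits_per_var]) * scale
            for i in range(0, len(bits), bits_per_var)]
```

Under this encoding, a 20-variable chromosome with 10 bits per variable is a vector of 200 bits, and one with 20 bits per variable is a vector of 400 bits.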

Selection
We use the stochastic universal sampling method [21] in the parent selection process, where chromosomes are probabilistically selected for reproduction according to their fitness ranking in the current population. The probability of a chromosome being selected for crossover (P_s) is as follows:

$$P_s(x_i) = \frac{f(x_i)}{\sum_{j=1}^{n} f(x_j)}$$

where f(x_i) is the fitness of chromosome x_i and P_s(x_i) is the probability of that individual being selected.
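Stochastic universal sampling can be sketched as a single spin of a wheel with equally spaced pointers. The sketch below assumes non-negative fitness values where larger is better and a selection probability proportional to fitness; the function name `stochastic_universal_sampling` is hypothetical.

```python
import random

def stochastic_universal_sampling(population, fitnesses, n_select, rng=random):
    """Select n_select individuals using equally spaced pointers over cumulative fitness."""
    total = sum(fitnesses)
    step = total / n_select        # distance between consecutive pointers
    start = rng.uniform(0, step)   # one random offset shared by all pointers
    selected = []
    index = 0
    cumulative = fitnesses[0]
    for k in range(n_select):
        pointer = start + k * step
        # Advance to the individual whose fitness segment contains the pointer.
        while cumulative < pointer and index < len(fitnesses) - 1:
            index += 1
            cumulative += fitnesses[index]
        selected.append(population[index])
    return selected
```

Because all pointers share one random offset, an individual whose fitness share corresponds to m pointer steps is selected roughly m times, which keeps the spread of the sample lower than repeated independent roulette-wheel draws.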

Crossover
A single-point crossover is used with a crossover rate of P_c = 0.7. This ensures that crossover is more likely than not to take place, so that the offspring created are not mere replicas of the parent pair. The crossover point is selected at random, and the parts of the vectors after the selected locus are swapped.

Mutation
Mutation is done on each element of the chromosome with the mutation probability, P_m, as follows:

$$P_m = \frac{0.7}{\text{Length}}$$

Length refers to the length of the chromosome structure; for example, the length is 20 if the chromosome is a 20-bit binary vector. This value is selected as it implies that the probability of any one element of a chromosome being mutated is approximately 0.5 [22]. This means that with a probability of 50 percent, at least one bit of the chromosome will be mutated.
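The "approximately 0.5" figure can be checked numerically. The sketch below assumes a per-bit rate of the form P_m = c / Length with c = 0.7, a constant used as a default in some GA toolboxes and treated here as an assumption; the function name is hypothetical. With independent per-bit mutation, the probability that at least one bit flips is 1 - (1 - P_m)^Length.

```python
def prob_at_least_one_flip(length, c=0.7):
    """Probability that at least one of `length` bits mutates when pm = c / length."""
    pm = c / length
    return 1 - (1 - pm) ** length

# Print the probability for a few chromosome lengths.
for length in (20, 200, 400):
    print(length, round(prob_at_least_one_flip(length), 3))
```

For long chromosomes the value approaches 1 - exp(-0.7), which is close to one half, so the result is nearly independent of the chromosome length.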

SDEP
For the experiments reported in Tables 1 and 2, we set the stagnation counter to 10 generations (K = 10), and the top 20 percent of the population, ranked by fitness value, is preserved in the targeted extinction process.

Evaluation Methodology
The performance of SDEP-GA is compared against Simple GA on all three classical test functions. We adopt some parameters used in [23] to evaluate the performance of these two algorithms: mean generation number (Mgn) and post-extinction number (Pen). The definitions of these two parameters are as follows:
• Mgn: the average number of generations needed to obtain the best result in one complete run
• Pen: the average number of generations needed to achieve a new best found solution in an entire run
To illustrate how these parameters work, we give an example for each. Suppose we run the experiment for 5 runs, and the runs obtain their best results at the 10th, 11th, 20th, 15th, and 21st generation, respectively. Mgn is the average of these five values: Mgn = (10 + 11 + 20 + 15 + 21)/5 = 15.4. This value shows that, on average, the best result is found at generation 15.4.
Now suppose we are working on an optimization problem in which we are trying to find the global minimum of a function, and these are the best results obtained in each generation over 7 generations: 80, 92, 90, 78, 81, 83, and 74. To calculate Pen, we first count how many times the best result is updated during one run, which in this case lasts 7 generations. The best result is updated at the 1st, 4th, and 7th generation, for a total of 3 times. The value of Pen is then Pen = 7/3 = 2.33. This value indicates that, on average, the algorithm obtains a better result every 2.33 generations.
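Both worked examples can be reproduced with a short script. The function names `mgn` and `pen` are hypothetical, and the sketch assumes a minimization problem in which the first generation always counts as an update of the best result.

```python
def mgn(best_generations):
    """Mean generation number: average generation at which each run found its best result."""
    return sum(best_generations) / len(best_generations)

def pen(best_per_generation):
    """Generations per improvement: run length divided by the number of best-result updates."""
    updates = 0
    best_so_far = float("inf")
    for value in best_per_generation:
        if value < best_so_far:  # strictly better result found (minimization)
            best_so_far = value
            updates += 1
    return len(best_per_generation) / updates

print(mgn([10, 11, 20, 15, 21]))
print(round(pen([80, 92, 90, 78, 81, 83, 74]), 2))
```

The second call counts updates at the 1st, 4th, and 7th generation, as in the worked example.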
In addition, we compare the performance of the two algorithms in terms of accuracy. The average result of 30 runs is used, where each run has a total of 400,000 evaluations: a population size of 400 chromosomes and a maximum of 1000 generations.

Results and Discussion
The summarized experimental results of Simple GA (SGA) and SDEP-GA using random and targeted extinction are shown in Table 1.
Table 1. Summary of SDEP-GA with random and targeted extinction against SGA; the 95% confidence interval is presented after the ± sign, and the standard deviation is presented inside parentheses. Minimum results are in bold font.

With reference to the third column of Table 1, we can see that SDEP with targeted extinction has a lower fitness value than SGA for Rastrigin, Schwefel, and Griewank. As these three test functions are all minimization problems, the lower fitness value indicates that the algorithm scores better in terms of accuracy. In terms of Pen, we can see that SDEP-GA has a lower or equal value when compared against SGA. This indicates that the algorithm is able to find a better result within fewer generations. In other words, SDEP with targeted extinction evolves in the right direction at a faster rate than SGA. The lower Mgn value of SDEP-GA shows that it is able to find a close approximation of the ideal solution faster than SGA. Figure 6 shows the fitness values of the three algorithms over the generations for Rastrigin, which supports our argument. Note that in Figure 6 we reversed the fitness to make the graph easier to read. It can be seen that SDEP with targeted extinction reaches a near-optimum earlier than SGA and achieves near-optimum points multiple times. These results suggest that SDEP with targeted extinction is better than SGA in terms of accuracy as well as the computation time needed to reach the solution.
The overall running time of SDEP-GA is slightly longer than that of SGA because of the extinction overhead. Table 2 shows the running time in milliseconds for the Rastrigin, Schwefel, and Griewank functions, respectively. However, for complicated problems like Griewank, there is not much difference in running time among the algorithms, because the additional overhead for random and targeted extinction is relatively small.
Tables 3-5 show the fitness results of SDEP-GA with random extinction for various parameter values on the Rastrigin, Schwefel, and Griewank functions, respectively. Note that the optimal value is in bold.
Table 5. Fitness results of SDEP with random extinction on Griewank (K is the stagnant counter); the 95% confidence interval is presented after the ± sign, and the standard deviation is presented inside parentheses.
Comparing the performance of SDEP using random extinction in Tables 3-5 against SGA in Table 1, we can see that, for various values of the two parameters (K and P_e), neither algorithm clearly outperforms the other.
However, from Tables 6-8 and the summary in Table 1, it can be seen that SDEP using targeted extinction generally outperforms SGA. The poor performance of random extinction compared with targeted extinction might be due to the removal of certain crucial chromosomes during the random extinction process. This sets back the overall evolution rate of the algorithm because the progress made in previous generations is lost.
Table 6. Fitness results of SDEP with targeted extinction on Rastrigin (K is the stagnant counter and fitness threshold T = 0.5); the 95% confidence interval is presented after the ± sign, and the standard deviation is presented inside parentheses. Minimum results are in bold font.
Figure 7 shows a box-and-whisker plot with Friedman test results. To alleviate multiple comparison errors, p-values are adjusted using the Bonferroni method [25]. Note that a p-value < 0.05 means that the experimental result in Table 1 is statistically significant. We also perform pairwise Wilcoxon signed-rank tests [26] to identify which pairs are significantly different. Table 9 shows the pairwise Wilcoxon signed-rank tests of the Rastrigin results. It can be seen that SDEP with targeted extinction outperforms SGA, while SGA outperforms SDEP with random extinction.
Table 9. Pairwise Wilcoxon signed-rank test of Rastrigin fitness performance among SGA, SDEP with random extinction, and SDEP with targeted extinction. The results in bold are statistically significant at a 95% confidence level. p-values are adjusted using the Bonferroni method.

                                 SGA           SDEP with random extinction
SDEP with targeted extinction    0.03          0.0000000783
SDEP with random extinction      0.00000207

Conclusions
In this paper, we presented the results of an investigation aimed at exploring a new GA operator borrowed from nature, i.e., the stagnation-driven extinction protocol with random extinction and targeted extinction. We defined SDEP-GA based on those two operators and tested their performance against SGA on three classical test functions used for GA benchmarking. The results suggest that SDEP-GA using targeted extinction is comparable to, and sometimes better than, SGA in terms of accuracy and computation time. However, the same cannot be said for SDEP-GA using random extinction. These results encourage us to further investigate forms of GA that are closer to natural evolution, especially with respect to the extinction process.
The pros and cons of our proposed algorithms are as follows:
• Random extinction randomly removes chromosomes, which promotes exploration by the population; however, it is prone to degrading performance because near-optimal solutions may be removed.
• Targeted extinction selectively removes chromosomes, which enables exploration while preserving exploitation; however, it takes longer because of the internal sorting and population management overhead.
From a theoretical point of view, we plan to further investigate the optimal parameter values for SDEP, i.e., the number of generations of stagnation (K) that triggers the extinction protocol and the proportion of chromosomes to preserve in targeted extinction. We also plan to improve SDEP-GA by mimicking more closely how extinction takes place in nature and how it affects the evolution of species. SDEP can also be added to other forms of GA, such as elitist GA.
From a practical point of view, we plan to measure the effectiveness and robustness of the algorithm when dealing with real-world problems.