Influence of Binomial Crossover on Approximation Error of Evolutionary Algorithms

Although differential evolution (DE) algorithms perform well on a large variety of complicated optimization problems, only a few theoretical studies have focused on the working principles of DE algorithms. As a first attempt to reveal the function of binomial crossover, this paper aims to answer whether it can reduce the approximation error of evolutionary algorithms. By investigating the expected approximation error and the probability of not finding the optimum, we conduct a case study comparing two evolutionary algorithms, with and without binomial crossover, on two classical benchmark problems: OneMax and Deceptive. It is proven that using binomial crossover leads to the dominance of transition matrices. As a result, the algorithm with binomial crossover asymptotically outperforms the one without crossover on both OneMax and Deceptive; for finite iteration budgets it also outperforms on OneMax, but not on Deceptive. Furthermore, an adaptive parameter strategy is proposed which can strengthen the superiority of binomial crossover on Deceptive.


Introduction
Evolutionary algorithms (EAs) are a large family of randomized search heuristics inspired by biological evolution. Many EAs, such as genetic algorithms (GAs) and evolution strategies (ES), use a crossover operator besides mutation and selection. Many empirical studies demonstrate that crossover, which combines genes of two parents to generate new offspring, is helpful to the convergence of EAs. Theoretical results on runtime analysis validate the promising function of crossover in EAs [1,2,3,4,5,6,7,8,9,10,11,12], whereas there are also cases where crossover is not helpful [13,14].
Differential evolution (DE) algorithms implement crossover operations in a different way. By exchanging components of target vectors with donor vectors, continuous DE algorithms achieve competitive performance on a large variety of complicated problems [15,16,17,18], and their competitiveness is to a great extent attributed to the employed crossover operations [19]. However, aside from theoretical studies on continuous DE [20] and a runtime analysis revealing the working principles of binary differential evolution (BDE) [21], no theoretical results have been reported on how crossover influences the performance of discrete-coded DE algorithms.
Nevertheless, there is a gap between runtime analysis and practice. Since EAs are randomized search heuristics, their optimization time to reach an optimum is uncertain and could even be infinite in continuous optimization [22]. For this reason, optimization time is seldom used for evaluating the performance of EAs in computer simulation. Instead, EAs are stopped after a finite number of generations, and their performance is evaluated by solution quality such as the mean and median of the fitness value or approximation error [23]. In theory, solution quality can be measured for a given iteration budget by the expected fitness value [24] or the approximation error [25,26], which contributes to the analysis framework known as fixed budget analysis. A fixed budget analysis of immune-inspired hypermutations leads to theoretical results that are very different from those of runtime analysis but consistent with empirical results, which further demonstrates that the perspective of fixed budget computations provides valuable information and additional insights into the performance of randomized search heuristics [27].
In this paper, the solution quality of an EA after running a finite number of generations is measured by two metrics: the expected value of the approximation error and the error tail probability. The former measures solution quality, that is, the fitness gap between a solution and the optimum. The latter is the probability distribution of the error over error levels, which measures the probability of not finding the optimum. An EA is said to outperform another if both its error and its tail probability are smaller. Furthermore, an EA is said to asymptotically outperform another if both its error and its tail probability are smaller after a sufficiently large number of generations.
The research question of this paper is whether the binomial crossover operator can help reduce the approximation error. As a pioneering work on this topic, we investigate a (1+1)EA_C that performs binomial crossover on an individual and an offspring generated by mutation, and compare a (1+1)EA without crossover with its variant (1+1)EA_C on two classical problems, OneMax and Deceptive. By splitting the objective space into error levels, the analysis is performed based on Markov chain models [28,29]. The comparison of the two EAs' performance is drawn from the comparison of their transition probabilities, which are estimated by investigating the bits preferred by the evolutionary operations. Under some conditions, the (1+1)EA_C with binomial crossover outperforms the (1+1)EA on OneMax, but not on Deceptive; however, by adding an adaptive parameter mechanism arising from the theoretical results, the (1+1)EA_C with binomial crossover outperforms the (1+1)EA on Deceptive too.
The rest of this paper is organized as follows. Section 2 reviews related theoretical work. Preliminary contents for our theoretical analysis are presented in Section 3. Then, the influence of the binomial crossover on transition probabilities is investigated in Section 4. Section 5 conducts an analysis of the asymptotic performance of EAs. To reveal how binomial crossover works on the performance of EAs for consecutive iterations, the OneMax problem and the Deceptive problem are investigated in Sections 6 and 7, respectively. Finally, Section 8 presents the conclusions and discussions.

Theoretical Analysis of Crossover
To understand how crossover influences the performance of EAs, Jansen et al. [1] proved that an EA using crossover can reduce the expected optimization time from super-polynomial to a polynomial of small degree on the function Jump. Kötzing et al. [2] investigated crossover-based EAs on the functions OneMax and Jump and showed the potential speedup by crossover when combined with a fitness-invariant bit shuffling operator in terms of optimization time. For a simple GA without shuffling, they found that the crossover probability has a drastic impact on the performance on Jump. Corus and Oliveto [3] rigorously obtained an upper bound on the runtime of standard steady state GAs to hillclimb the OneMax function and proved that the steady-state EAs are 25% faster than their mutation-only counterparts. Their analysis also suggests that larger populations may be faster than populations of size 2. Dang et al. [4] revealed that the interplay between crossover and mutation may result in a sudden burst of diversity on the Jump test function and reduce the expected optimization time compared to mutation-only algorithms like the (1+1) EA. For royal road functions and OneMax, Sudholt [5] analyzed uniform crossover and k-point crossover and proved that crossover makes every (µ + λ) EA at least twice as fast as the fastest EA using only standard bit mutation. Pinto and Doerr [6] provided a simple proof of a crossover-based genetic algorithm (GA) outperforming any mutation-based black-box heuristic on the classic benchmark OneMax. Oliveto et al. [7] obtained a tight lower bound on the expected runtime of the (2 + 1) GA on OneMax. Lengler and Meier [8] studied the positive effect of using larger population sizes and crossover on Dynamic BinVal.
Besides artificial benchmark functions, several theoretical studies also show that crossover is helpful on non-artificial problems. Lehre and Yao [9] proved that the use of crossover in the (µ+1) Steady State Genetic Algorithm may reduce the runtime from exponential to polynomial for some instance classes of the problem of computing unique input-output (UIO) sequences. Doerr et al. [10,11] analyzed EAs on the all-pairs shortest path problem. Their results confirmed that an EA with a crossover operator is significantly faster, in terms of the expected optimization time, than one without. Sutton [12] investigated the effect of crossover on the closest string problem and proved that a multi-start (µ+1) GA requires less randomized fixed-parameter tractable (FPT) time than the same algorithm with crossover disabled.
However, there is some evidence that crossover is not always helpful. Richter et al. [13] constructed Ignoble Trail functions and proved that mutation-based EAs optimize them more efficiently than GAs with crossover; the latter need exponential optimization time. Antipov and Naumov [14] compared crossover-based algorithms on RealJump functions with a slightly shifted optimum. The runtime of all considered algorithms increases on RealJump, and the hybrid GA fails to find the shifted optimum with high probability.

Theoretical Analysis of Differential Evolution Algorithms
Although numerical investigations of DE have been widely conducted, only a few studies are dedicated to theoretical analysis, most of which focus on continuous DE [20]. By estimating the probability density function of generated individuals, Zhou et al. [30] demonstrated that the selection mechanism of DE, which chooses mutually different parents for the generation of donor vectors, sometimes does not work positively on the performance of DE. Zaharie and Micota [31,32,33] investigated the influence of the crossover rate on both the distribution of the number of mutated components and the probability for a component to be taken from the mutant vector, as well as the influence of mutation and crossover on the diversity of the intermediate population. Wang and Huang [34] reduced DE to a one-dimensional stochastic model and investigated how the probability distribution of the population is connected to the mutation, selection and crossover operations of DE. Opara and Arabas [35] compared several variants of differential mutation using characteristics of their expected mutants' distributions, which demonstrated that the classic mutation operators yield similar search directions and differ primarily in the mutation range. Furthermore, they formalized the contour fitting notion and derived an analytical model that links the differential mutation operator with the adaptation of the range and direction of search [36].
By investigating the expected runtime of the binary differential evolution (BDE) proposed by Gong and Tuson [37], Doerr and Zhang [21] performed a first fundamental analysis of the working principles of discrete-coded DE. It was shown that BDE optimizes the important decision variables quickly, but finds it hard to locate the optimal values of decision variables with small influence on the objective function. Since BDE generates trial vectors by implementing a binary variant of binomial crossover accompanied by the mutation operation, it has characteristics significantly different from classic EAs or estimation-of-distribution algorithms.

Fixed Budget Analysis and Approximation Error
To bridge the wide gap between theory and application, Jansen and Zarges [24] proposed the fixed budget analysis (FBA) framework for randomized search heuristics (RSH), by which the fitness of random local search (RLS) and the (1+1)EA was investigated for given iteration budgets. Under the framework of FBA, they analyzed the anytime performance of EAs and artificial immune systems on a proposed dynamic benchmark problem [38]. Nallaperuma et al. [39] considered the well-known traveling salesperson problem (TSP) and derived lower bounds on the expected fitness gain for a specified number of generations. Considering that runtime analysis suggests that hypermutations tend to be inferior on typical example functions, Jansen and Zarges [27] conducted an FBA to explain why artificial immune systems are popular in spite of these proven drawbacks. Although the single point mutation in RLS outperforms the inversely fitness-proportional mutation (IFPM) and the somatic contiguous hypermutation (CHM) on OneMax in terms of expected optimization time, it was shown that IFPM and CHM can be better when FBA is performed with different starting points and varied iteration budgets. The results show that the traditional perspective of expected optimization time may be unable to explain observed good performance that is due to limiting the length of runs. Therefore, the perspective of fixed budget computations provides valuable information and additional insights.

Problems
Consider a maximization problem max_{x ∈ {0,1}^n} f(x). Denote its optimal solution by x* and the optimal objective value by f*. The quality of a solution x is evaluated by its approximation error e(x) = f* − f(x). The error e(x) takes finitely many values, called error levels, denoted by e_0 = 0 < e_1 < · · · < e_L. Two instances, the uni-modal OneMax problem and the multi-modal Deceptive problem, are considered in this paper.
Both OneMax and Deceptive can be represented in the form of problem (1), where |x| := Σ_{i=1}^{n} x_i. The error levels of (1) take only n + 1 values. For the OneMax problem, both exploration and exploitation are helpful to the convergence of EAs to the optimum: exploration accelerates the convergence process, and exploitation refines the precision of approximate solutions. For the Deceptive problem, however, local exploitation leads to convergence to the local optimum, which in turn increases the difficulty of jumping to the global optimum. That is, exploitation hinders convergence to the global optimum of the Deceptive problem, and thus the performance of EAs is dominantly influenced by their exploration ability.
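As a concrete reference for the two benchmarks, the error functions can be sketched as follows. The Deceptive mapping used here (e(x) = 0 if |x| = n, and e(x) = |x| + 1 otherwise) is the reconstruction adopted in this paper and should be read as an assumption:

```python
def onemax_error(x):
    """OneMax: f(x) = |x|, so the error e(x) is the number of '0'-bits."""
    return len(x) - sum(x)

def deceptive_error(x):
    """Deceptive (as reconstructed in the text): the global optimum is the
    all-'1' string, while the error of every other string grows with |x|,
    so hill-climbing is pulled toward the deceptive all-'0' string."""
    return 0 if sum(x) == len(x) else sum(x) + 1
```

Both error functions take exactly n + 1 values, matching the n + 1 error levels of problem (1).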

Evolutionary Algorithms
For the sake of an analysis of binomial crossover that excludes the influence of a population, the (1+1)EA (presented as Algorithm 1), which uses no crossover, is taken as the baseline algorithm in our study. Its candidate solutions are generated by bitwise mutation with probability p_m. Adding binomial crossover to the (1+1)EA yields the (1+1)EA_C, illustrated as Algorithm 2. The (1+1)EA_C first performs bitwise mutation with probability q_m, and then applies binomial crossover with rate C_R to generate a candidate solution for selection.
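A minimal Python sketch of the two algorithms; the forced position j_rand in the binomial crossover follows the usual DE convention and is an assumption here. Setting cr = 1 recovers the plain (1+1)EA of Algorithm 1:

```python
import random

def mutate(x, pm):
    """Bitwise mutation: flip each bit independently with probability pm."""
    return [1 - b if random.random() < pm else b for b in x]

def binomial_crossover(x, v, cr):
    """Binomial crossover: take each component from the mutant v with
    probability cr; one random position j_rand is always taken from v."""
    j_rand = random.randrange(len(x))
    return [v[i] if (i == j_rand or random.random() < cr) else x[i]
            for i in range(len(x))]

def one_plus_one_ea_c(f, x, qm, cr, generations):
    """(1+1)EA_C: bitwise mutation, binomial crossover, elitist selection."""
    for _ in range(generations):
        y = binomial_crossover(x, mutate(x, qm), cr)
        if f(y) >= f(x):        # elitist selection keeps the better solution
            x = y
    return x
```

For example, `one_plus_one_ea_c(sum, [0] * n, qm, cr, T)` maximizes OneMax from the all-'0' string.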
The EAs investigated in this paper can be modeled as homogeneous Markov chains [28,29]. Given the error vector ẽ = (e_0, e_1, . . . , e_L)′ and the initial distribution q̃^[0], the transition matrix R̃ = (r_{i,j}) of the (1+1)EA and the (1+1)EA_C for the optimization problem (1) can be written in the form r_{i,j} = Pr{e(x_{t+1}) = e_i | e(x_t) = e_j}. Recalling that the solutions are updated by elitist selection, we know R̃ is an upper triangular matrix that can be partitioned as

R̃ = ( 1   r̃′
      0   R ),

where R is the transition submatrix depicting the transitions between non-optimal states.

Transition Matrices
By elitist selection, a candidate y replaces a solution x if and only if f(y) ≥ f(x), which is achieved if "l preferred bits" of x are changed. If there are multiple solutions that are better than x, there can be multiple choices for both the number of mutated bits l and the locations of the "l preferred bits".
Example 1. For the OneMax problem, e(x) equals the number of '0'-bits in x. Denoting e(x) = j and e(y) = i, we know y replaces x if and only if j ≥ i. Then, to generate a candidate y replacing x, the "l preferred bits" can be confirmed as follows.
• If i = j, the "l preferred bits" consist of l/2 '1'-bits and l/2 '0'-bits, where l is an even number not greater than min{2j, 2(n − j)}.
• If i < j, the "l preferred bits" are combinations of j − i + k '0'-bits and k '1'-bits, where l = j − i + 2k. Here, k is not greater than i, because j − i + k cannot exceed j, the number of '0'-bits in x. Meanwhile, k does not exceed n − j, the number of '1'-bits in x.
If an EA flips each bit with an identical probability, the probability of flipping l given bits depends only on l and is independent of their locations. Denoting the probability of flipping a given set of l bits by P(l), we can confirm the connection between the transition probability r_{i,j} and P(l).

Transition Probabilities for OneMax
As presented in Example 1, the transition from level j to level i (i < j) results from flips of j − i + k '0'-bits and k '1'-bits. Then,

r_{i,j} = Σ_{k=0}^{M} C(j, j−i+k) C(n−j, k) P(j − i + 2k),

where M = min{n − j, i} and C(·, ·) denotes the binomial coefficient.

Transition Probabilities for Deceptive
According to the definition of the Deceptive problem, we get the following map from |x| to e(x): e(x) = 0 if |x| = n, and e(x) = |x| + 1 otherwise.
Transition from level j to level i (0 ≤ i < j ≤ n) is attributed to one of the following cases.
• If i ≥ 1, the number of '1'-bits decreases from j − 1 to i − 1. This transition results from flips of j − i + k '1'-bits and k '0'-bits, where 0 ≤ k ≤ min{n − j + 1, i − 1}.
• If i = 0, all of the n − j + 1 '0'-bits are flipped, and all of the '1'-bits keep unchanged.

Performance Metrics
We propose two metrics to evaluate the performance of EAs, which are the expected approximation error (EAE) and the tail probability (TP) of EAs for t consecutive iterations. The approximation error was considered in previous work [25] but the tail probability is a new performance metric.
EAE is the fitness gap between a solution and the optimum; it measures solution quality after running t generations.
Definition 1. The expected approximation error (EAE) of an EA after t generations is defined as e^[t] = E[e(x_t)].
Definition 2. Given i > 0, the tail probability (TP) of the approximation error, i.e., the probability that e(x_t) is greater than or equal to e_i, is defined as p^[t](e_i) = Pr{e(x_t) ≥ e_i}. TP describes the distribution of a found solution over the non-optimal levels i > 0; in particular, p^[t](e_1) is the probability of not finding the optimum.
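The two metrics can be computed directly from the level distribution q̃^[t] of the Markov chain model. The toy values in the usage note below are illustrative only, not taken from the paper:

```python
def eae(q, e):
    """Expected approximation error: the error levels e = (e_0, ..., e_L)
    weighted by the level distribution q = (q_0, ..., q_L)."""
    return sum(qi * ei for qi, ei in zip(q, e))

def tail_probability(q, i):
    """Tail probability at level i > 0: Pr{e(x_t) >= e_i}, i.e. the mass of
    the level distribution on levels i, i+1, ..., L."""
    return sum(q[i:])
```

For instance, with q = (0.4, 0.3, 0.2, 0.1) and e = (0, 1, 2, 3), the EAE equals 1.0 and the probability of not finding the optimum is tail_probability(q, 1) = 0.6.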
Given two EAs A and B, if both the EAE and the TP of Algorithm A are smaller than those of Algorithm B for any iteration budget, we say Algorithm A outperforms Algorithm B on problem (1).
Definition 3. Let A and B be two EAs applied to problem (1). Algorithm A is said to outperform Algorithm B if e_A^[t] ≤ e_B^[t] and p_A^[t](e_i) ≤ p_B^[t](e_i) for all t > 0 and all 1 ≤ i ≤ L. Algorithm A is said to asymptotically outperform Algorithm B if these inequalities hold for all sufficiently large t.

The asymptotic outperformance is weaker than the outperformance.

Comparison of Transition Probabilities of the Two EAs
In this section, we compare the transition probabilities of the (1+1)EA and the (1+1)EA_C. According to the connection between r_{i,j} and P(l), the comparison of transition probabilities can be conducted by considering the probabilities of flipping the "l preferred bits".

Probabilities of Flipping Preferred Bits
Denote the probabilities of the (1+1)EA and the (1+1)EA_C of flipping "l preferred bits" by P_1(l, p_m) and P_2(l, C_R, q_m), respectively. By (2), we know

P_1(l, p_m) = p_m^l (1 − p_m)^{n−l}.   (13)

Since the mutation and the binomial crossover in Algorithm 2 are mutually independent, we can get the probability by considering the mutation first. When the (1+1)EA_C flips the "l preferred bits", there are l + k (0 ≤ k ≤ n − l) bits of y set to v_i by (4), the probability of which is C(n−l, k) q_m^{l+k} (1 − q_m)^{n−l−k}. If only the "l preferred bits" are to be flipped, the crossover must accept the l preferred mutations and reject the other k, so that

P_2(l, C_R, q_m) = Σ_{k=0}^{n−l} C(n−l, k) q_m^{l+k} (1 − q_m)^{n−l−k} [ (l/n) C_R^{l−1} (1 − C_R)^k + ((n−l−k)/n) C_R^l (1 − C_R)^k ],   (14)

where the two terms distinguish whether the index forced by the binomial crossover falls on a preferred bit or on an unmutated bit. Note that the (1+1)EA_C degrades to the (1+1)EA when C_R = 1, and the (1+1)EA becomes random search when p_m = 1. Thus, we assume that p_m, C_R and q_m are located in (0, 1). For a fair comparison of transition probabilities, we consider the identical parameter setting

p_m = C_R q_m = p.   (15)

Then, we know q_m = p/C_R, and equation (14) implies

P_2(l, C_R, p/C_R) = (l/n) (p/C_R) p^{l−1} (1 − p)^{n−l} + ((n−l)/n) (1 − p/C_R) p^l (1 − p)^{n−l−1}.   (16)

Subtracting (13) from (16), we have

P_2(l, C_R, p/C_R) − P_1(l, p) = (1/C_R − 1) ((l − np)/n) p^l (1 − p)^{n−l−1}.   (17)

From the fact that 0 < C_R < 1, we conclude that P_2(l, C_R, p/C_R) is greater than P_1(l, p) if and only if l > np. That is, the introduction of binomial crossover into the (1+1)EA enhances the exploration ability of the (1+1)EA_C. We get the following theorem for the case p ≤ 1/n.

Theorem 1. On condition that p_m = C_R q_m = p ≤ 1/n, it holds that P_2(l, C_R, p/C_R) ≥ P_1(l, p) for all 1 ≤ l ≤ n.
Proof. The result follows directly from equation (17): when p ≤ 1/n, we have np ≤ 1 ≤ l, so the difference in (17) is non-negative.
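The claim that P_2 exceeds P_1 exactly when l > np can be checked exactly for small n by brute force, enumerating all mutation masks together with the forced crossover position j_rand (the usual DE j_rand convention is assumed here):

```python
from itertools import product

def p1(l, n, p):
    """(1+1)EA: probability that bitwise mutation flips exactly a given set of l bits."""
    return p ** l * (1 - p) ** (n - l)

def p2_exact(l, n, cr, qm):
    """(1+1)EA_C: exact probability that mutation followed by binomial
    crossover flips exactly the first l bits, by enumerating every mutation
    mask and the uniformly chosen forced position j_rand."""
    total = 0.0
    for mask in product([0, 1], repeat=n):
        pm = 1.0
        for b in mask:
            pm *= qm if b else 1 - qm
        for r in range(n):                      # j_rand = r with probability 1/n
            pr = 1.0
            for i in range(n):
                if i < l:                       # preferred bit: mutated and taken
                    if not mask[i]:
                        pr = 0.0
                        break
                    if i != r:
                        pr *= cr
                elif mask[i]:                   # extra mutation: must be rejected
                    pr *= 0.0 if i == r else 1 - cr
            total += pm * pr / n
    return total
```

With n = 6, C_R = 1/2 and p = C_R·q_m = 1/4, the threshold is np = 1.5: the enumeration gives P_2 < P_1 for l = 1 and P_2 > P_1 for l ≥ 2, as predicted.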

Comparison of Transition Probabilities
Given transition matrices from two EAs, one transition matrix dominating another is defined as follows.
Definition 4. Let A and B be two EAs with an identical initialization mechanism, and let Ã = (a_{i,j}) and B̃ = (b_{i,j}) be the transition matrices of A and B, respectively. It is said that Ã dominates B̃, denoted by Ã ⪰ B̃, if it holds that a_{i,j} ≥ b_{i,j} for all 0 ≤ i < j ≤ L.
Denote the transition probabilities of the (1+1)EA and the (1+1)EA_C by p_{i,j} and s_{i,j}, respectively. For the OneMax problem and the Deceptive problem, we get the relation of transition dominance on the premise that p_m = C_R q_m = p ≤ 1/n.
Theorem 2. For the (1+1)EA and the (1+1)EA_C, denote their transition matrices by P̃ and S̃, respectively. On condition that p_m = C_R q_m = p ≤ 1/n, it holds for problem (1) that S̃ ⪰ P̃.
Proof. Denote the collection of all solutions at level k by S(k), k = 0, 1, . . . , n. We prove the result by considering the transition probability r_{i,j} = Pr{x_{t+1} ∈ S(i) | x_t ∈ S(j)}, 0 ≤ i < j ≤ n. Since the function values of solutions are only related to the number of '1'-bits, the probability to generate a solution y ∈ S(i) by performing mutation on x ∈ S(j) depends on the Hamming distance l = H(x, y).
Given x ∈ S(j), S(i) can be partitioned as S(i) = ∪_{l=1}^{L} S_l(i), where S_l(i) = {y ∈ S(i) | H(x, y) = l}, and L is a positive integer smaller than or equal to n.
Accordingly, the probability to transfer from level j to level i is confirmed as r_{i,j} = Σ_{l=1}^{L} |S_l(i)| P(l), where |S_l(i)| is the size of S_l(i) and P(l) is the probability to flip a given set of "l preferred bits". Since p ≤ 1/n, Theorem 1 implies that P_2(l, C_R, p/C_R) ≥ P_1(l, p) for all 1 ≤ l ≤ n. Combining this with the corresponding expressions of p_{i,j} and s_{i,j}, we know s_{i,j} ≥ p_{i,j} for all 0 ≤ i < j ≤ n. Then, we get the result by Definition 4.

Example 2. [Comparison of transition probabilities for the OneMax problem]
Let p_m = C_R q_m = p ≤ 1/n. By (8), we have

p_{i,j} = Σ_{k=0}^{M} C(j, j−i+k) C(n−j, k) P_1(j−i+2k, p),
s_{i,j} = Σ_{k=0}^{M} C(j, j−i+k) C(n−j, k) P_2(j−i+2k, C_R, p/C_R),

where M = min{n − j, i}. Since p ≤ 1/n, Theorem 1 implies that P_2(l, C_R, p/C_R) ≥ P_1(l, p), and by the two expressions above we have s_{i,j} ≥ p_{i,j}, 0 ≤ i < j ≤ n. Example 3.
[Comparison of transition probabilities for the Deceptive problem] Let p_m = C_R q_m = p ≤ 1/n. Equation (10) implies that

p_{i,j} = Σ_{k=0}^{M} C(j−1, j−i+k) C(n−j+1, k) P_1(j−i+2k, p),
s_{i,j} = Σ_{k=0}^{M} C(j−1, j−i+k) C(n−j+1, k) P_2(j−i+2k, C_R, p/C_R),

where M = min{n − j + 1, i − 1}. Similar to the analysis of Example 2, we know that s_{i,j} ≥ p_{i,j} when p ≤ 1/n. Nevertheless, if p > 1/n, Theorem 1 no longer applies. Since the differences between p_{i,j} and s_{i,j} then depend on the characteristics of problem (1), Theorem 2 does not hold either.

Definition 5. The average convergence rate (ACR) of an EA for t generations is ACR(t) = 1 − (e^[t] / e^[0])^{1/t}.
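A one-line helper makes the connection to the asymptotic result below concrete, assuming Definition 5 takes the standard form ACR(t) = 1 − (e^[t]/e^[0])^{1/t}:

```python
def average_convergence_rate(e0, et, t):
    """ACR over t generations for initial error e0 and current error et,
    assuming the standard form 1 - (et / e0) ** (1 / t)."""
    return 1 - (et / e0) ** (1 / t)
```

For a geometric error decay e^[t] = e^[0]·ρ^t, the ACR equals 1 − ρ for every t, which matches the limit 1 − ρ(R) given by Lemma 1.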
The following lemma presents the asymptotic characteristics of the ACR, by which we get the result on the asymptotic performance of EAs.

Lemma 1. [29, Theorem 1]
Let R be the transition submatrix associated with a convergent EA. Under random initialization (i.e., the EA may start at any initial state with a positive probability), it holds that lim_{t→∞} ACR(t) = 1 − ρ(R), where ρ(R) is the spectral radius of R.
Lemma 2. Let A and B be two convergent EAs with an identical initialization mechanism whose transition matrices satisfy Ã ⪰ B̃. Then A asymptotically outperforms B.
Proof. By Lemma 1, we know that for every ǫ > 0 there exists T > 0 such that (ρ(R) − ǫ)^t e^[0] ≤ e^[t] ≤ (ρ(R) + ǫ)^t e^[0] for all t > T. From the fact that the transition submatrix R of an elitist RSH is upper triangular, we conclude that ρ(R) = max{r_1,1, . . . , r_L,L}.
Noting that the tail probability p^[t](e_i) can be taken as the expected approximation error of an optimization problem with error vector ẽ = (0, . . . , 0, 1, . . . , 1)′, whose first i components are 0, the same argument yields p_A^[t](e_i) ≤ p_B^[t](e_i) for all t > T and 1 ≤ i ≤ L. The second conclusion is proven.
Definition 3 and Lemma 2 imply that dominance of transition matrices leads to asymptotic outperformance. Then, we get the following theorem comparing the asymptotic performance of the (1+1)EA and the (1+1)EA_C.
Theorem 3. On condition that p_m = C_R q_m = p ≤ 1/n, the (1+1)EA_C asymptotically outperforms the (1+1)EA on problem (1).
Proof. The proof is completed by applying Theorem 2 and Lemma 2.
On condition that p_m = C_R q_m = p ≤ 1/n, Theorem 3 indicates that after a sufficiently large number of iterations, the (1+1)EA_C performs better on problem (1) than the (1+1)EA.
A further question is whether the (1+1)EA_C outperforms the (1+1)EA for t < +∞. We answer this question in the next sections.

Comparison of the Two EAs on OneMax
In this section, we show that the outperformance introduced by binomial crossover can be obtained for the unimodal OneMax problem, based on the following lemma [26].
For the EAs investigated in this study, conditions (32)–(34) are satisfied thanks to the monotonicity of transition probabilities.
Lemma 4. When p ≤ 1/n (n ≥ 3), P_1(l, p) and P_2(l, C_R, p/C_R) are monotonically decreasing in l.
Proof. When p ≤ 1/n, equations (13) and (14) imply that the ratios P_1(l+1, p)/P_1(l, p) and P_2(l+1, C_R, p/C_R)/P_2(l, C_R, p/C_R) are not greater than 1 when n ≥ 3. Thus, P_1(l, p) and P_2(l, C_R, p/C_R) are monotonically decreasing in l.

Lemma 5.
For the OneMax problem, p_{i,j} and s_{i,j} are monotonically decreasing in j.
Proof. We validate the monotonicity of p_{i,j} for the (1+1)EA; that of s_{i,j} can be confirmed in a similar way. Let 0 ≤ i < j < n. By (23) we know p_{i,j+1} can be written with M = min{n − j − 1, i}. From (35), (37), (38), (39) and (40) we conclude that p_{i,j+1} < p_{i,j}, 0 ≤ i < j < n. Similarly, we can validate that s_{i,j+1} < s_{i,j}, 0 ≤ i < j < n.
In conclusion, p_{i,j} and s_{i,j} are monotonically decreasing in j.
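Lemma 5 and the level-transition expression it relies on can be cross-checked numerically. The sketch below assumes the combinatorial formula for OneMax (a sum over k of C(j, j−i+k)·C(n−j, k)·P_1(j−i+2k, p), as reconstructed in this paper) and validates it against brute-force enumeration over all candidate strings for a small n:

```python
from itertools import product
from math import comb

def p1(l, n, p):
    """Probability that bitwise mutation flips exactly a fixed set of l bits."""
    return p ** l * (1 - p) ** (n - l)

def p_ij(i, j, n, p):
    """OneMax transition probability from error level j to level i < j,
    using the combinatorial formula assumed in the text."""
    return sum(comb(j, j - i + k) * comb(n - j, k) * p1(j - i + 2 * k, n, p)
               for k in range(min(n - j, i) + 1))

def p_ij_bruteforce(i, j, n, p):
    """Same probability by enumerating every candidate y for a fixed x with
    j '0'-bits; every y at a lower error level is accepted by elitism."""
    x = [0] * j + [1] * (n - j)          # a representative solution of level j
    return sum(p1(sum(a != b for a, b in zip(x, y)), n, p)
               for y in product([0, 1], repeat=n) if y.count(0) == i)
```

For n = 6 and p = 1/6 ≤ 1/n, the two computations agree and the probabilities decrease in j, as Lemma 5 claims.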

Theorem 4.
On condition that p_m = C_R q_m = p ≤ 1/n, it holds for the OneMax problem that the (1+1)EA_C outperforms the (1+1)EA.
Proof. Given the initial distribution q̃^[0] and the transition matrix R̃, the level distribution at iteration t is confirmed by q̃^[t] = R̃^t q̃^[0] (41). By premultiplying (41) with ẽ′ and õ_i′, respectively, we get the EAE and the TP at iteration t. Meanwhile, by Theorem 2 we have S̃ ⪰ P̃, and Lemma 5 implies the required monotonicity of the transition probabilities. Then, we get the conclusion by Definition 3.
The above theorem demonstrates that the dominance of transition matrices introduced by the binomial crossover operator leads to the outperformance of the (1+1)EA_C on the unimodal problem OneMax.

Comparison of the two EAs on Deceptive and Adaptive Parameter Strategy
In this section, we show that the outperformance of the (1+1)EA_C over the (1+1)EA may not always hold on Deceptive, and then propose an adaptive parameter-setting strategy arising from the theoretical analysis.

A Counterexample for Inconsistency between Transition Dominance and Algorithm Outperformance
For the Deceptive problem, we present a counterexample to show that even if the transition matrix of an EA dominates that of another EA, we cannot conclude that the former EA outperforms the latter. Consider two artificial EAs, EA_R and EA_S, whose respective transition matrices R̃ and S̃ are upper triangular and satisfy S̃ ⪯ R̃, i.e., R̃ dominates S̃. Through computer simulation, we get the curve of the EAE difference of the two EAs in Figure 1(a) and the curve of the TP difference of the two EAs in Figure 1(b). From Figure 1(b), it is clear that EA_R does not always outperform EA_S, because the difference of TPs is negative at the early stage of the iteration process but positive later.

Numerical Comparison of the Two EAs on Deceptive
Now we turn to the (1+1)EA and the (1+1)EA_C on Deceptive. We demonstrate that the (1+1)EA_C may not outperform the (1+1)EA over all generations, although the transition matrix of the (1+1)EA_C dominates that of the (1+1)EA. For the (1+1)EA_C, let q_m = 1/2 and C_R = 2/n. The numerical simulation results of EAEs and TPs for 5000 independent runs are depicted in Figure 2. It is shown that when n ≥ 9, both the EAE and the TP of the (1+1)EA can be smaller than those of the (1+1)EA_C. This indicates that dominance of the transition matrix does not always guarantee outperformance of the corresponding algorithm. With p_m = C_R q_m = p ≤ 1/n, although binomial crossover leads to transition dominance of the (1+1)EA_C over the (1+1)EA, the enhancement of exploitation plays a governing role in the iteration process. Thus, the imbalance of exploration and exploitation leads to poor performance of the (1+1)EA_C at some stages of the iteration process.

Comparisons on the Probabilities to Transfer from Non-optimal Statuses to the Optimal Status
As shown in the previous two counterexamples, the outperformance of the (1+1)EA_C cannot be drawn from the dominance of transition matrices. To enhance the performance of an EA, adaptive parameter settings should be incorporated to improve its exploration ability on Deceptive.
Since local exploitation could result in convergence to the locally optimal solution, the global convergence of EAs on Deceptive is principally attributed to the direct transition from level j to level 0, which is quantified by the transition probability r_{0,j}. Thus, we investigate the impact of binomial crossover on the transition probability r_{0,j}, and accordingly arrive at strategies for adaptive regulation of the mutation rate and the crossover rate.
In the following, we first compare p_{0,j} and s_{0,j} by investigating their monotonicity, and then obtain reasonable adaptive strategies that improve the performance of EAs on the Deceptive problem.
Substituting (13) and (14) into (25) and (26), respectively, we have p_{0,j} = P_1(n−j+1, p_m) = p_m^{n−j+1} (1 − p_m)^{j−1} and s_{0,j} = P_2(n−j+1, C_R, q_m). We first investigate the maximum value of p_{0,j} to get the ideal performance of the (1+1)EA on the Deceptive problem.
Theorem 5. p_{0,j} gets its maximum value p^max_{0,j} = P_1(n−j+1, (n−j+1)/n) when p_m = (n−j+1)/n.
Proof. By (47), p_{0,j} = p_m^{n−j+1} (1 − p_m)^{j−1}, which is maximized at p_m = (n−j+1)/n. Thus, when p_m = (n−j+1)/n, p_{0,j} gets its maximum value p^max_{0,j} = P_1(n−j+1, (n−j+1)/n).
The influence of binomial crossover on s_{0,j} is investigated on condition that p_m = q_m. By regulating the crossover rate C_R, we can compare p_{0,j} with the maximum value s^max_{0,j} of s_{0,j}.
Theorem 6. On condition that p_m = q_m, the following results hold.
Proof. Note that the (1+1)EA_C degrades to the (1+1)EA when C_R = 1. Then, if the maximum value s^max_{0,j} of s_{0,j} is obtained by setting C_R = 1, we have s^max_{0,j} = p_{0,j}; otherwise, it holds that s^max_{0,j} > p_{0,j}.
1. For the case j = 1, equation (48) implies that s_{0,1} is monotonically increasing in C_R. It gets its maximum value at C*_R = 1. Then, by (47) we have s^max_{0,1} = p_{0,1}.
2. For j = 2, by (48) we have the following.
• If 0 < q_m ≤ (n−1)/n, s_{0,2} is monotonically increasing in C_R and gets its maximum value at C*_R = 1. For this case, we know s^max_{0,2} = p_{0,2}.
• If (n−1)/n < q_m < 1, s_{0,2} gets its maximum value s^max_{0,2} at some C*_R < 1. Then, we have s^max_{0,2} > p_{0,2}.
3. For the case 3 ≤ j ≤ n − 1, we decompose s_{0,j} into two terms I_1 and I_2. Then:
• If 0 < q_m ≤ (n−j)/(n−1), both I_1 and I_2 are monotonically increasing in C_R. For this case, s_{0,j} gets its maximum value at C*_R = 1, and we have s^max_{0,j} = p_{0,j}.
• If (n−j+1)/(n−1) ≤ q_m ≤ 1, I_1 gets its maximum value at C_R = (n−j)/((n−1)q_m), and I_2 gets its maximum value at C_R = (n−j+1)/((n−1)q_m). Then, s_{0,j} gets its maximum value s^max_{0,j} at some C*_R between these two values, and we know s^max_{0,j} > p_{0,j}.
• If (n−j)/(n−1) < q_m < (n−j+1)/(n−1), I_1 gets its maximum value at C_R = (n−j)/((n−1)q_m), and I_2 is monotonically increasing in C_R. Then, s_{0,j} gets its maximum value s^max_{0,j} at some C*_R ∈ [(n−j)/((n−1)q_m), 1], and s^max_{0,j} > p_{0,j}.
4. For j = n, equation (48) implies that we can confirm the sign of ∂s_{0,n}/∂C_R by considering an auxiliary function g(q_m, C_R).
• If 0 < q_m ≤ (n−1)/n, g(q_m, C_R) is monotonically decreasing in C_R; its minimum value is g(q_m, 1) = (nq_m − 1)(q_m − 1), and its maximum value is g(q_m, 0).
(a) If 0 < q_m ≤ 1/n, we have g(q_m, C_R) ≥ g(q_m, 1) > 0. Thus, ∂s_{0,n}/∂C_R ≥ 0, and s_{0,n} is monotonically increasing in C_R. For this case, s_{0,n} gets its maximum value at C*_R = 1, and we have s^max_{0,n} = p_{0,n}.
(b) If 1/n < q_m ≤ 1/2, s_{0,n} gets its maximum value s^max_{0,n} at some C*_R < 1. Thus, s^max_{0,n} > p_{0,n}.
(c) If 1/2 < q_m ≤ (n−1)/n, g(q_m, 0) < 0, and then s_{0,n} is monotonically decreasing in C_R, so its maximum value is obtained by setting C*_R = 0. Then, we know s^max_{0,n} > p_{0,n}.
• If (n−1)/n < q_m ≤ 1, g(q_m, C_R) is monotonically increasing in C_R, and its maximum value is g(q_m, 1) = (nq_m − 1)(q_m − 1) < 0. Then, s_{0,n} is monotonically decreasing in C_R, and its maximum value is obtained by setting C*_R = 0, so s^max_{0,n} > p_{0,n}.
In summary, s^max_{0,n} > p_{0,n} when q_m > 1/n; otherwise, s^max_{0,n} = p_{0,n}.
Theorems 5 and 6 present the "best" settings maximizing the transition probabilities from non-optimal statuses to the optimal level. Unfortunately, such settings are not available when the globally optimal solution is unknown.

Parameter Adaptive Strategy to Enhance Exploration of EAs
We propose a parameter adaptive strategy based on the Hamming distance. Since the level index j is equal to the Hamming distance between x and x*, an improvement of the level index j is indeed equal to the reduction of the Hamming distance obtained by replacing x with y. Then, when local exploitation leads to a transition from level j to a non-optimal level i, a practical adaptive strategy for the parameters can be obtained according to the Hamming distance between x and y.
For the Deceptive problem, consider two solutions x and y such that f(y) > f(x), and denote their levels by j and i, respectively. When the (1+1)EA is located at the solution x, equation (49) implies that the "best" setting of the mutation rate is p*_m = (n−j+1)/n. When the promising solution y is generated, the level transfers from j to i, and the "best" setting changes to p*_m = (n−i+1)/n. Let H(x, y) denote the Hamming distance between x and y. Noting that H(x, y) ≥ j − i, we know the difference of the "best" parameter settings is bounded from above by H(x, y)/n. Accordingly, the mutation rate of the (1+1)EA can be updated by (54). For the (1+1)EA_C, the parameter q_m is adapted using a strategy consistent with that of p_m, in order to focus on the influence of C_R; that is, it is updated by (55). Since s_{0,j} demonstrates different monotonicity for different levels, one cannot get an identical strategy for the adaptive setting of C_R. As a compromise, we consider the case 3 ≤ j ≤ n − 1, which is obtained by random initialization with overwhelming probability. According to the proof of Theorem 6, C_R should be set as large as possible for the case q_m ∈ (0, (n−j)/(n−1)]; when q_m ∈ ((n−j)/(n−1), 1], C*_R is located in intervals whose boundary values are (n−j)/((n−1)q_m) and (n−j+1)/((n−1)q_m), given by (52) and (53), respectively. Then, when q_m is updated by (55), the update of C_R can be confirmed such that its change is bounded by H(x, y)/(n − 1).
Accordingly, the adaptive setting of C_R is given by (56), where q′_m is updated by (55). To demonstrate the promising function of the adaptive update strategy, we incorporate it into the (1+1)EA and the (1+1)EA_C to obtain adaptive variants of these algorithms, and test their performance on the 12-20 dimensional Deceptive problems. Parameters of the EAs are initialized by (15), and adapted according to (54), (55) and (56), respectively.
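To illustrate the kind of algorithm these variants build on, the following sketch implements one generation of a (1+1)EA with binomial crossover. The `deceptive` fitness is one common formulation and an assumption here, since the paper's exact definition lies outside this excerpt; the crossover follows the standard binomial scheme with one guaranteed donor index.

```python
import random

def deceptive(x):
    """One common form of the Deceptive problem (assumed here): fitness
    decreases with the number of one-bits, except at the all-ones string,
    which is the global optimum."""
    n, ones = len(x), sum(x)
    return n + 1 if ones == n else n - ones

def step_ea_c(x, q_m, c_r, f, rng=random):
    """One generation of the (1+1)EA_C (sketch): bitwise mutation of the
    parent with rate q_m produces a donor v, binomial crossover with rate
    C_R mixes v back into x (one index is always taken from v), and
    elitist selection keeps the better of x and the trial vector y."""
    n = len(x)
    v = [b ^ (rng.random() < q_m) for b in x]        # mutated donor vector
    k = rng.randrange(n)                             # guaranteed crossover index
    y = [v[i] if (i == k or rng.random() < c_r) else x[i] for i in range(n)]
    return y if f(y) >= f(x) else x
```

An adaptive variant would additionally update q_m and C_R after each accepted transition, using the Hamming distance between x and y as derived above.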
Since the adaptive strategy reduces the stability of performance to a large extent, numerical simulation of the tail probability is implemented with 10,000 independent runs. To investigate the sensitivity of the adaptive strategy to the initial value of q_m, the mutation rate q_m in the (1+1)EA_C is initialized with the values 1/√n, 3/(2√n) and 2/√n, and the three variants are denoted by (1+1)EA_C^1, (1+1)EA_C^2 and (1+1)EA_C^3, respectively. The convergence curves of the averaged tail probabilities (TPs) are illustrated in Figure 3. Compared to the EAs whose parameters are fixed during the evolution process, the performance of the adaptive EAs is significantly improved. Meanwhile, the outperformance of the (1+1)EA_C introduced by binomial crossover is also greatly enhanced by the adaptive strategies.
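The simulation over independent runs can be organized as below. The `run_once` callback interface is hypothetical, standing in for one complete run of an EA that returns its final approximation error; the harness is a sketch of the Monte-Carlo averaging, not the authors' exact experimental code.

```python
import random

def tail_probability(run_once, budget, n_runs=10_000, seed=1):
    """Monte-Carlo estimate of the tail probability P(error > 0) after
    `budget` generations, averaged over `n_runs` independent runs.
    `run_once(budget, rng)` returns the final approximation error of a
    single run (hypothetical interface)."""
    rng = random.Random(seed)
    misses = sum(run_once(budget, rng) > 0 for _ in range(n_runs))
    return misses / n_runs
```

Recording the estimate at a grid of budgets yields convergence curves of the kind shown in Figure 3.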

Conclusions
Under the framework of fixed-budget analysis, we conduct a pioneering analysis of the influence of binomial crossover on the approximation error of EAs. The performance of an EA after a finite number of generations is measured by two metrics: the expected value of the approximation error and the error tail probability. Using these two metrics, we present a case study comparing the approximation errors of the (1+1)EA and the (1+1)EA_C with binomial crossover.
Starting from a comparison of the probabilities of flipping "l preferred bits", it is proven that under proper conditions, incorporation of binomial crossover leads to the dominance of transition probabilities, that is, the probability of transferring to any promising state is improved. Accordingly, the asymptotic performance of the (1+1)EA_C is superior to that of the (1+1)EA.
It is found that the dominance of transition probabilities guarantees that the (1+1)EA_C outperforms the (1+1)EA on OneMax in terms of both the expected approximation error and the tail probability. However, this dominance does not lead to outperformance on Deceptive. This means that using binomial crossover may reduce the approximation error on some problems but not on others.
For Deceptive, an adaptive parameter-setting strategy is proposed based on the monotonicity analysis of transition probabilities. Numerical simulations demonstrate that it can significantly improve the exploration ability of the (1+1)EA_C and the (1+1)EA, and that the superiority of binomial crossover is further strengthened by the adaptive strategy. Thus, a problem-specific adaptive strategy is helpful for improving the performance of EAs.
Our future work will focus on the adaptive setting of the crossover rate in population-based EAs on more complex problems, as well as the development of adaptive EAs improved by the introduction of binomial crossover.