Hybrid Cuckoo Search for the Capacitated Vehicle Routing Problem

Finding the best solution for the Vehicle Routing Problem (VRP) remains in demand. Cuckoo Search (CS) is a popular metaheuristic based on the reproductive strategy of the cuckoo species and has been successfully applied to various optimization problems, including the Capacitated Vehicle Routing Problem (CVRP). Although CS and hybrid CS variants have been proposed for the CVRP, their performance is still far from the state of the art. Therefore, this study proposes a hybrid CS with the Simulated Annealing (SA) algorithm for the CVRP, consisting of three improvements: an investigation of 12 neighborhood structures, an investigation of three selection strategies, and hybridization with SA. The experiment was conducted using 16 instances of the Augerat benchmark dataset. The results show that 6 of the 12 neighborhood structures performed best and that the disruptive selection strategy is the best strategy. The experimental results also show that the proposed method can find optimal and near-optimal solutions compared with state-of-the-art algorithms.


Introduction
Finding the best solution for the vehicle routing problem (VRP) is challenging. It is categorized as a combinatorial optimization problem related to operations research, algorithm theory, and computational complexity theory, which use mathematical and scientific methods with robust experiments to obtain reliable solutions. The VRP aims to optimize delivery routing by finding the least-cost (distance, time) route [1]. The variant in which the optimal solution must be determined without violating vehicle capacity limitations is known as the capacitated VRP (CVRP) [2], which is an NP-hard problem [3]. Several metaheuristics have been developed to solve the CVRP; they can be classified into two main categories: single-solution-based and population-based. Single-solution-based methods such as tabu search [4] and simulated annealing (SA) [5] can exploit the search space in a short time but suffer from weak exploration, as does the Lévy flight artificial bee colony algorithm. Population-based methods such as the genetic algorithm (GA) [6], particle swarm optimization (PSO) [7-10], the water flow-like algorithm [11,12] and the membrane algorithm [13] are more concerned with exploration than exploitation but have limitations in terms of premature convergence and low convergence speed. An enhancement of perturbation-based variable neighborhood search with an adaptive selection mechanism has also been proposed to address the CVRP [14]. Hybridization between a population-based and a single-solution-based metaheuristic has been proposed for the CVRP to overcome the limitations of these two categories [15].

The main contributions of this study are as follows:
1. Adopting a suitable multiple-neighborhood-structure strategy to improve CS exploration and to help the algorithm escape from local optima, by observing the behavior of a random sequence of 12 neighborhood structures and then selecting the most dominant ones to form the best sequence.
2. Adopting a suitable selection strategy for exploring the search space to find new solutions and enhance CS exploitation, by investigating three selection strategies, namely tournament, rank and disruptive selection, to find the best-performing one.
3. Combining the advantages of CS with those of SA to improve their ability to cooperatively explore the search space, quickly find good solutions within a small region of the search space and prevent deterioration.
4. Lastly, evaluating the proposed hybrid CS with SA against other state-of-the-art methods based on average, standard deviation and computational time. The algorithm is furthermore evaluated using the Friedman, Holm and Hochberg tests to identify whether there is a significant difference between the proposed algorithm and the best methods in the literature.
The remainder of this paper is organized as follows: we discuss the basic CS and our proposed hybrid CS and explain how it can be applied to the CVRP. We present the results of the experiments that were conducted to select the most suitable components for the proposed CS and then compare the performance of the proposed hybrid CS, which we call Hybrid Cuckoo Search with Simulated Annealing (HCS-SA), with that of state-of-the-art methods by way of computational experiments. Finally, we conclude and suggest areas for further research.

CVRP Description
In the CVRP, a fleet of K vehicles with homogeneous capacity Q is available for serving customers. The single depot and the customers are defined as a set of nodes N = {0, . . . , n}, where node 0 represents the depot and the other nodes represent the customers. Each vehicle trip must start from and end at the depot. Each customer i with known demand di is visited by exactly one vehicle trip. The total demand assigned to each vehicle trip must not exceed the vehicle capacity Q. dij > 0 corresponds to the travel distance from node i to node j. The objective of the CVRP is to determine a set of vehicle trips serving all customers while minimizing the total travel distance [2].
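The objective and the capacity constraint above can be sketched in a few lines of C++. This is an illustrative sketch, not the authors' code: the names `Instance`, `totalDistance` and `isFeasible` are our own, and Euclidean distances between node coordinates are assumed.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical CVRP instance: nodes 0..n with node 0 as the depot,
// Euclidean distances and a homogeneous vehicle capacity Q.
struct Instance {
    std::vector<double> x, y;  // node coordinates, index 0 = depot
    std::vector<int> demand;   // demand[0] = 0 for the depot
    int capacity;              // homogeneous vehicle capacity Q
};

double dist(const Instance& in, int i, int j) {
    return std::hypot(in.x[i] - in.x[j], in.y[i] - in.y[j]);
}

// A solution is a set of routes; each route lists customers only,
// the depot (node 0) at the start and end being implicit.
double totalDistance(const Instance& in, const std::vector<std::vector<int>>& routes) {
    double total = 0.0;
    for (const auto& r : routes) {
        int prev = 0;                // leave the depot
        for (int c : r) { total += dist(in, prev, c); prev = c; }
        total += dist(in, prev, 0);  // return to the depot
    }
    return total;
}

bool isFeasible(const Instance& in, const std::vector<std::vector<int>>& routes) {
    for (const auto& r : routes) {
        int load = 0;
        for (int c : r) load += in.demand[c];
        if (load > in.capacity) return false;  // capacity Q must not be exceeded
    }
    return true;
}
```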

Standard Cuckoo Search
Brood parasitism is an interesting feature of the cuckoo bird, in which a female cuckoo lays her eggs in the nest of another, observed bird and lets the host bird hatch and brood the young cuckoo chicks. The cuckoo adapts its eggs to resemble those of the host bird in terms of pattern, shape and color, increasing the probability of a new cuckoo being born and reducing the probability of the host bird abandoning the eggs [16]. The behavior of the cuckoo in its search for a suitable nest is represented by a combination of CS with Lévy flight [56]. Lévy flight represents a random walk model that is characterized by step lengths that obey a power-law distribution. Studies have shown that hunters search for prey by following a Lévy flight pattern, which commonly consists of small random steps followed in the long term by large jumps [57].
The CS was developed by Yang and Deb [16] and was initially designed to solve multimodal functions by following these rules: each cuckoo lays one egg at a time and selects a nest randomly; the best nest with the highest-quality egg is carried over to the new generation; and the number of host nests is fixed, while an egg laid by a cuckoo can be discovered by the host bird with a probability pa ∈ [0, 1].
A cuckoo i generates a new solution xi(t+1) via Lévy flight, according to Equation (1):

xi(t+1) = xi(t) + α ⊕ Lévy(λ), (1)

where α > 0 is the step size. The step lengths follow the Lévy distribution shown in Equation (2):

Lévy ∼ u = t^(−λ), 1 < λ ≤ 3, (2)

which has an infinite variance with an infinite mean [16], and s is a step size drawn from this Lévy distribution. The Lévy flight is described in Reference [58] as a special type of random walk whose step length is not constant but rather is chosen from a probability distribution with a power-law tail, which means that very large step lengths are possible. The potential of using Lévy flight to enhance metaheuristics is reported in various studies [59-61].
In CS, cuckoos are abstracted as entities that each has an associated solution (i.e., an egg) of the optimization problem that they try to place in a solution container (i.e., the host bird's nest). Table 1 provides an analogy between the behavior of the cuckoo in nature and the artificial behavior of the CS. A detailed description of the basic CS for solving the CVRP is presented in Figure 1 [62].


Cuckoo Search for CVRP
To adapt CS to the CVRP, the following three main elements must be defined: the egg, the nest and the search space, extending the work of Reference [19]. These key elements can have important implications for combinatorial problems.

The Egg
Assuming that a cuckoo lays a single egg in one nest, we can say that one egg in a nest is a solution represented by one individual in the population. An egg can also be one new candidate solution laid by a cuckoo for a place/location reserved by an individual in the population, while the nest is the container of that new cuckoo egg. In the CVRP, one egg is the equivalent of multiple Hamiltonian cycles of served routes that start and end at the central depot, as shown in Figure 2.
The egg representation adopted in this study can be described as follows: assume a CVRP with n clients and v vehicles available for delivery; then the number 0 denotes the depot and 1, 2, . . . , n denote the clients. Based on the v vehicles at the depot, each egg has at most v distribution paths (routes), and every path (route) starts and ends at the depot. Possible routes are shown in Figure 3.
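One common way to hold this representation in code is a flat sequence that starts at the depot (0) and uses 0 as a route separator. The following sketch, with the illustrative helper name `splitRoutes` (not from the paper), splits such a sequence into its routes:

```cpp
#include <cassert>
#include <vector>

// Split an "egg" encoded as a flat tour with depot visits (0) acting as
// route separators into its individual routes. Illustrative helper only.
std::vector<std::vector<int>> splitRoutes(const std::vector<int>& egg) {
    std::vector<std::vector<int>> routes;
    std::vector<int> current;
    for (int node : egg) {
        if (node == 0) {  // a depot visit closes the current route
            if (!current.empty()) { routes.push_back(current); current.clear(); }
        } else {
            current.push_back(node);
        }
    }
    if (!current.empty()) routes.push_back(current);  // trailing open route
    return routes;
}
```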


The Nest
In CS, the number of nests is fixed and this number represents the size of the population. A nest is a container of an individual of the population and its abandonment involves its egg being replaced in the population by a new one. By projecting these features onto the CVRP, we can say that a nest appears as an individual in the population with its Hamiltonian cycle route. A nest can have multiple eggs for future extensions but in this study, each nest contains only one egg.


The Search Space
In the CVRP, the visited customers on each route have fixed coordinates; however, the visiting order of the customers can be changed either within the same route or between different routes. The search space in the CVRP is a set of points (customers). Each point is part of a route that appears as a potential solution. These solutions are positioned in the space according to the order of their customers. Since the coordinates of the customers are fixed, the moves are based on the order of the visited customers. There are several neighborhood structures, methods, operators or perturbations that generate a new solution from an existing solution by changing the order of the visited customers.
In the adaptation of CS to the CVRP, possible perturbations that can be used to change the order of visited customers include the REINSERTION, SHIFT-1-0, K-SHIFT and EXCHANGE neighborhood structures. REINSERTION and SHIFT-1-0 are used for small steps, while large jumps are made by SWAP-2-2 and K-SHIFT. Moving from one solution to another in the search space (combinatorial space) is done by small steps in a small area around the current solution. Therefore, carrying out several steps leads to additional solutions that are farther away. However, if we want to change the search area and jump to another, distant area, the solution is perturbed by the K-SHIFT and EXCHANGE neighborhood structures.
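Two of the perturbations named above can be sketched directly on the route representation (a route as a vector of customer indices, depot implicit). The function names `shift10` and `exchange` are illustrative, not the authors' code:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// SHIFT-1-0: transfer the customer at position `from` in route a to
// position `to` in route b (an inter-route move).
void shift10(std::vector<int>& a, std::size_t from, std::vector<int>& b, std::size_t to) {
    int customer = a[from];
    a.erase(a.begin() + from);
    b.insert(b.begin() + to, customer);
}

// EXCHANGE: swap two customers inside the same route (an intra-route move).
void exchange(std::vector<int>& route, std::size_t i, std::size_t j) {
    std::swap(route[i], route[j]);
}
```

In a full implementation each move would be followed by the capacity check described earlier before the new solution is accepted.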

Proposed Discrete Hybrid Cuckoo Search
In this section, we present the proposed hybrid discrete CS with SA, which emulates cuckoo behavior in nature. The main purpose of improving CS is to balance exploration and exploitation in order to find better-quality solutions in less time. In nature, there is what could be likened to an arms race between host birds and cuckoos, where host birds evolve defenses and cuckoos respond with trickery to increase their eggs' chance of survival [63]. For this reason, cuckoo birds always try to adapt their eggs to appear similar in pattern, shape and color to those of the host bird species. This arms race serves as the inspiration for improving CS in this research. The proposed improvements are summarized in four phases. In the first phase, the arms race starts when the host bird lays eggs that are enhanced in terms of their pattern, shape and color; this step represents the generation of the initial population in CS, for which a sequential insertion heuristic is used. In the second phase, the enhancement of the host bird's eggs puts more pressure on the cuckoo bird to respond with trickery to increase its own eggs' chance of survival; for this purpose, two components (neighborhood structures and acceptance criteria) are investigated to identify a suitable neighborhood structure strategy. In the third phase, the cuckoo bird performs some kind of surveillance to decide which host nest is the best choice in which to lay its eggs; for this purpose, three selection strategies are investigated in order to find the most suitable one. In the last phase, once the cuckoo egg has hatched, the cuckoo chick starts to throw out the host eggs to get rid of competition for food; this is modeled by SA, whereby a kind of local search is performed around the current solution.
The general pseudo-code for the proposed CS is presented in Figure 4. Note that the act of laying eggs involves the cuckoo bird egg (new solution) and the act of selecting eggs involves the host bird egg (old solution). Moreover, a nest always holds one egg, which means that the selection and laying of the egg in that nest actually relate to the same egg, which changes from a host egg to a cuckoo egg.
As shown in Figure 4, CS starts with an initial solution (empty host nest) and then an initial population is generated using a sequential insertion heuristic (lines 5-7). The number of generated solutions in the population is equal to the number of laid host nest eggs. Then the host nest eggs are evaluated by the fitness function (line 8). After the initialization is finished and the host birds' nests are built with eggs in them, the cuckoo bird arrives to lay its egg in one of the host birds' nests, which is described in the pseudo-code as the iterative stage (lines 13-27). The cuckoo bird starts by selecting a host bird nest in which to lay its egg (this act of laying the egg can be described as replacing the current (host) egg with a new, hopefully improved (cuckoo) one). The selection is made by using one of the selection strategies, which ranks the host bird eggs in the population according to the selection probability (in the basic CS, the solutions are ranked based only on the fitness value). The host bird egg that has the highest rank under that selection strategy is selected for local exploration. Later, the selected egg is improved by the neighborhood structures.
In the natural world, to increase its chance of surviving and hatching to become a bird, a cuckoo bird egg has to imitate the host bird egg in pattern, color and shape. The cuckoo bird must adapt to nature and develop this imitation; this behavior is represented in line 17 of the pseudo-code, where a local search mechanism is employed to explore the neighbor solutions of the selected host bird egg, leading to further improvement in the pattern, color and shape of the selected egg. In this study, SA is used for this purpose. The improved cuckoo bird egg is then compared to its previous state (the host bird egg); if its pattern, color and shape are better than those of the previous egg, it is accepted, otherwise it is discarded (lines 19-22). When the host bird returns to the nest, it starts to look for alien (cuckoo) eggs, the discovery of which depends on the parameter pa. The discovered cuckoo eggs are destroyed or the nest is abandoned; in this case, a Lévy flight is used to replace the worse cuckoo eggs by applying neighborhood structures followed by the acceptance criterion (lines 23-25). Finally, all the host nest eggs are ranked based on their fitness value (line 27) and the iterative improvement stage starts again.
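One generation of the iterative stage described above can be condensed as follows. This is a structural sketch under our reading of Figure 4, not the authors' pseudo-code: the names `Egg` and `oneGeneration` are illustrative, and the problem-specific parts (selection strategy, SA improvement, Lévy perturbation) are injected as callables.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// An "egg" couples a CVRP solution with its fitness (total distance).
struct Egg { std::vector<std::vector<int>> routes; double fitness; };

template <class Select, class Improve, class Perturb>
void oneGeneration(std::vector<Egg>& nests, double pa, std::mt19937& rng,
                   Select select, Improve improve, Perturb perturb) {
    // 1. The cuckoo selects a host nest (selection strategy) and lays an
    //    improved egg; it is kept only if it is better (lower fitness).
    std::size_t i = select(nests);
    Egg candidate = improve(nests[i]);  // e.g. SA-based local search
    if (candidate.fitness < nests[i].fitness) nests[i] = candidate;

    // 2. The host bird discovers alien eggs with probability pa; discovered
    //    eggs are replaced via a Levy-flight-driven perturbation.
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (auto& nest : nests)
        if (u(rng) < pa) {
            Egg moved = perturb(nest);
            if (moved.fitness < nest.fitness) nest = moved;
        }

    // 3. Rank all eggs by fitness for the next generation.
    std::sort(nests.begin(), nests.end(),
              [](const Egg& a, const Egg& b) { return a.fitness < b.fitness; });
}
```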

Initialization of Host Nest Using Sequential Cheapest Insertion Heuristic
In the context of the CVRP, the initial host bird eggs are generated using a sequential cheapest insertion heuristic, whereby the customer with the minimum traveling cost is sequentially inserted into its respective route until all the vehicles are full. The main aim is to build quality initial solutions using relatively simple and fast schemes.
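A minimal sketch of such a construction is given below, assuming a precomputed distance matrix and per-customer demands no larger than Q. The name `cheapestInsertion` and the greedy rule (append the unrouted customer that is cheapest to reach from the route's last customer until capacity is hit) reflect our reading of the heuristic, not the authors' exact scheme.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Sequential cheapest-insertion construction (sketch). Node 0 is the depot;
// d is a symmetric distance matrix; each demand is assumed to be <= Q.
std::vector<std::vector<int>> cheapestInsertion(const std::vector<std::vector<double>>& d,
                                                const std::vector<int>& demand, int Q) {
    int n = static_cast<int>(d.size());
    std::vector<bool> routed(n, false);
    routed[0] = true;
    std::vector<std::vector<int>> routes;
    int remaining = n - 1;
    while (remaining > 0) {
        std::vector<int> route;
        int load = 0, last = 0;  // each route starts at the depot
        while (true) {
            int best = -1;
            double bestCost = std::numeric_limits<double>::max();
            for (int c = 1; c < n; ++c)
                if (!routed[c] && load + demand[c] <= Q && d[last][c] < bestCost) {
                    best = c; bestCost = d[last][c];
                }
            if (best < 0) break;  // vehicle full or all customers routed
            route.push_back(best);
            routed[best] = true; load += demand[best]; last = best; --remaining;
        }
        routes.push_back(route);  // route implicitly returns to the depot
    }
    return routes;
}
```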


Selection of Host Nest Egg to Lay a New Cuckoo Egg
The basic CS uses a random selection strategy. This does not fully reflect the behavior of the cuckoo bird in nature, which selects the nests whose eggs appear most similar in pattern, color and shape in order to increase the survival of its egg. Therefore, we incorporate a more advanced selection strategy to emulate this behavior better. In the proposed approach, the selection probability for each egg is calculated from the fitness of its solution using one of three selection strategies. After these probabilities are obtained, the solution selection is made by roulette wheel selection, as shown in Figure 5.
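The final roulette-wheel draw over the precomputed probabilities can be sketched as follows; the name `rouletteWheel` and the externally supplied `spin` value (a uniform random number in [0, 1)) are illustrative choices, not the authors' code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Roulette-wheel selection: walk the cumulative distribution of the
// selection probabilities and return the slot the spin lands in.
std::size_t rouletteWheel(const std::vector<double>& prob, double spin) {
    double cumulative = 0.0;
    for (std::size_t i = 0; i < prob.size(); ++i) {
        cumulative += prob[i];
        if (spin < cumulative) return i;
    }
    return prob.size() - 1;  // guard against floating-point round-off
}
```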

Tournament Selection
Tournament selection is a selection process in which a number of eggs (Ntour) from the population are chosen at random and then compared based on their fitness in order to choose the best egg. The parameter Ntour is called the tournament size. Normally, tournaments are held between only two eggs (binary tournament), but generalization to an arbitrary group size is possible. Tournament selection gives eggs with high fitness a greater chance to survive [64]. In this study, we select two eggs from the population and compare their fitness values, then assign one score, coded as (a), to the better egg. This process is repeated for all the eggs in the population, as shown in Figure 6, where fi is the fitness value of egg i = 0, . . . , n and n is the population size [65]. After calculating the value of (a) for all the eggs, the selection probability for each egg is calculated by using Equation (3).
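The scoring step can be sketched as repeated binary tournaments. Note the normalization at the end is an assumption (our reading of Equation (3), which is not reproduced here): each egg's score (a) is divided by the total number of rounds. The name `tournamentProbabilities` is illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Binary-tournament scoring: repeatedly pick two eggs at random and award
// one point (a) to the fitter one (lower cost wins in a minimization
// problem); scores are then normalized into selection probabilities.
std::vector<double> tournamentProbabilities(const std::vector<double>& fitness,
                                            int rounds, std::mt19937& rng) {
    std::vector<double> a(fitness.size(), 0.0);
    std::uniform_int_distribution<std::size_t> pick(0, fitness.size() - 1);
    for (int r = 0; r < rounds; ++r) {
        std::size_t i = pick(rng), j = pick(rng);
        a[fitness[i] <= fitness[j] ? i : j] += 1.0;
    }
    std::vector<double> p(a.size());
    for (std::size_t k = 0; k < a.size(); ++k) p[k] = a[k] / rounds;
    return p;
}
```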

Rank Selection
In rank selection, eggs are sorted in descending order based on the fitness value. An index k is given to each egg from the best to the worst, that is, k = 1 for the best fitness and k = n for the worst fitness, where n is the population size and N is the maximum number of iterations. Finally, the selection probability is calculated by using Equation (4) [66].


Disruptive Selection
Disruptive selection gives both higher- and lower-quality eggs a chance of being selected by changing the definition of the fitness function as in Equation (5):

f'i = | fi − ft |, (5)

where fi is the fitness function and ft is the average value of the fitness function fi of the individuals in the population.
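The disruptive fitness transformation (the absolute deviation of each egg's fitness from the population average, per our reading of Equation (5)) is a one-liner per egg; `disruptiveFitness` is an illustrative name:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Disruptive fitness: f'_i = |f_i - f_t|, where f_t is the population
// average. Both very good and very bad eggs receive selection pressure.
std::vector<double> disruptiveFitness(const std::vector<double>& f) {
    double mean = std::accumulate(f.begin(), f.end(), 0.0) / f.size();
    std::vector<double> fp(f.size());
    for (std::size_t i = 0; i < f.size(); ++i) fp[i] = std::fabs(f[i] - mean);
    return fp;
}
```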

Best Sequence of Neighborhood Structure and Lévy Flight
A multi-neighborhood structure is used to improve the imitation by the cuckoo bird's eggs of the host bird's eggs in the nests in terms of pattern, color and shape, and thus give them a good chance of survival. The neighborhood structure consists of seven inter-route structures (where a change occurs between two routes) and five intra-route structures (where a change occurs within the same route), that is, 12 neighborhood structures in total, as shown in Table 2. Five of the seven inter-route structures are based on the λ-interchange scheme [67], which involves exchanging up to λ customers between two routes. In this study, λ = 2 is considered due to the high computational cost associated with larger λ.

Table 2. Neighborhood structures.

SHIFT-1-0 (inter-route): One customer is transferred from one route to another route.
SWAP-1-1 (inter-route): Swap one customer from one route with one customer from another route.
SHIFT-2-0 (inter-route): Move two adjacent customers from one route to another route.
SWAP-2-1 (inter-route): Swap two adjacent customers from one route with one customer from another route.
SWAP-2-2 (inter-route): Swap two adjacent customers from one route with two adjacent customers from another route.
CROSS (inter-route): The arc between two adjacent customers i and j in route one and the arc between two adjacent customers i' and j' in route two are both removed. Next, an arc is inserted connecting i and j' and another is inserted linking i' and j.
K-SHIFT (inter-route): A subset of consecutive customers is transferred from route one to the end of route two.
REINSERTION (intra-route): One customer is removed and later inserted into another position of the route.
OR-OPT2 (intra-route): Two adjacent customers are removed and later inserted into another position of the route.
OR-OPT3 (intra-route): Three adjacent customers are removed and later inserted into another position of the route.
TWO-OPT (intra-route): Two nonadjacent arcs are deleted and two other arcs are added in such a way that a new route is generated.
EXCHANGE (intra-route): Two customers in the route are swapped.
To avoid generating infeasible solutions, we include the following two restrictions: checking the vehicle capacity before adding a new customer, and avoiding emptying a vehicle of customers, whereby at least one customer is retained to be served by a particular vehicle. The above neighborhood structures need to be linked to the step length generated by the Lévy flight. A Lévy flight is generated by a probability density function that has a power-law tail; the Cauchy distribution is often used for this purpose [68]. The method we use to generate a random number from a Lévy distribution is shown in Figure 7. In this study, the adaptation of CS for the discrete CVRP is similar to the neighborhood structure adaptation for the TSP in Reference [69]. The 12 neighborhood structures are each associated with a step length generated by Lévy flight that can be categorized into small (more frequent), medium (most frequent) and large (less frequent) steps. Thus, the eggs' pattern, color and shape are changed according to the neighborhood structures in each category: small (SHIFT-1-0, SWAP-1-1, SHIFT-2-0, REINSERTION and OR-OPT2), medium (OR-OPT3, TWO-OPT, EXCHANGE, SWAP-2-1 and SWAP-2-2) and large (CROSS and K-SHIFT). To facilitate control over the size of the steps, an interval between 0 and 1 is used. Therefore, according to the value given by the Lévy flight in this interval, we can choose the appropriate step length as follows. If the value of Lévy is in:
1. (0, i), one step of a small neighborhood structure is performed.
2. ((k − 1) × i, k × i), one step of a medium neighborhood structure is performed.
3. (k × i, 1), a big neighborhood step is performed.
The value of i in this process is i = 1/(n + 1), where n is the maximum number of steps and k ∈ {0, ..., n}. So, if we assume that n = 12, then i ≈ 0.07 and the interval is divided into 12 parts, as listed in Table 3.
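One plausible reading of this interval scheme can be sketched as follows. The Cauchy draw and the squashing of its value into [0, 1) are illustrative assumptions, not the paper's exact generator, and the clamping of the index is one way to resolve the boundary of the last interval:

```python
import math
import random

NEIGHBORHOODS = [  # ordered small -> medium -> large, as in the text
    "SHIFT-1-0", "SWAP-1-1", "SHIFT-2-0", "REINSERTION", "OR-OPT2",   # small
    "OR-OPT3", "TWO-OPT", "EXCHANGE", "SWAP-2-1", "SWAP-2-2",         # medium
    "CROSS", "K-SHIFT",                                               # large
]

def levy_value(rng):
    """Draw a heavy-tailed value and squash it into [0, 1).
    A half-Cauchy draw is used here, as the text cites the Cauchy
    distribution; the squashing x/(x+1) is an illustrative choice."""
    x = abs(math.tan(math.pi * (rng.random() - 0.5)))   # half-Cauchy sample
    return min(x / (x + 1.0), 0.999999)

def pick_neighborhood(value, n=12):
    """Map a value in [0, 1) to one of the n neighborhood structures."""
    i = 1.0 / (n + 1)                     # interval width, i = 1/(n + 1)
    k = min(int(value / i), n - 1)        # clamp so the tail maps to 'large'
    return NEIGHBORHOODS[k]

rng = random.Random(7)
print(pick_neighborhood(0.03))   # a small step: SHIFT-1-0
print(pick_neighborhood(0.5))    # a medium step
print(pick_neighborhood(0.95))   # a large step: K-SHIFT
print(pick_neighborhood(levy_value(rng)))
```

Because the heavy tail of the Cauchy distribution is compressed toward 1, small and medium intervals are hit often and large ones rarely, matching the frequencies stated above.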
The association of these steps with the neighborhood structures was initially set without any prior knowledge. However, it is better to sequence them based on experimental knowledge, in order to identify the most successful neighborhoods, that is, those with the highest impact on improving the solutions. Therefore, an experimental investigation of these neighborhood structures was carried out to determine their effectiveness in improving the solution.

Table 3. Lévy flight associations with neighborhood structures.
In this study, each selected egg is further improved using SA as a local search, employing the same neighborhood structure mechanism described in Section 3.2.3. However, only one neighborhood structure is fired at each iteration, based on the Lévy flight. To avoid getting trapped in local minima, the fundamental idea is to allow moves to solutions whose objective function values are worse than that of the current solution; such a move is often called an uphill move. At each iteration a solution s' is generated at random from the neighborhood of the current solution s. If f(s') is better than f(s) (i.e., has a lower objective function value), then s' is accepted as the new current solution. Otherwise, if the move from s to s' yields a worse solution, s' is accepted with a probability calculated using Equation (6), which follows the Boltzmann distribution: P(accept s') = exp(−∆f/t), where ∆f = f(s') − f(s) is the difference in quality between the previous and the newly generated solution and t is the current temperature. The temperature is reduced iteratively by the geometric schedule g(t) = βt, where β is the cooling rate (β < 1). The search process is repeated until the termination criterion is met.
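The acceptance rule and cooling schedule described above can be sketched as follows. The toy cost function, neighbor move and parameter defaults are illustrative assumptions; the paper's actual SA operates on CVRP routes with the neighborhood structures of Section 3.2.3:

```python
import math
import random

def sa_accept(f_curr, f_new, t, rng):
    """Boltzmann acceptance (Equation (6)): always accept an improving
    move; accept a worse move with probability exp(-delta_f / t)."""
    delta = f_new - f_curr
    if delta < 0:
        return True
    return rng.random() < math.exp(-delta / t)

def sa_local_search(s, cost, neighbor, t0=100.0, t_end=0.01, beta=0.95, rng=None):
    """SA loop with geometric cooling g(t) = beta * t, beta < 1.
    Returns the best solution observed during the search."""
    rng = rng or random.Random(0)
    curr, best = s, s
    t = t0
    while t > t_end:
        cand = neighbor(curr, rng)            # fire one neighborhood move
        if sa_accept(cost(curr), cost(cand), t, rng):
            curr = cand
            if cost(curr) < cost(best):
                best = curr
        t *= beta                             # geometric cooling schedule
    return best

# Toy usage: minimize (x - 3)^2 over the integers via +/-1 moves.
cost = lambda x: (x - 3) ** 2
neighbor = lambda x, rng: x + rng.choice((-1, 1))
print(sa_local_search(20, cost, neighbor))    # best solution found
```

Early on, high temperatures make uphill moves likely, giving the search room to escape local minima; as t shrinks, the loop behaves almost greedily.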

Experiment Design
The proposed algorithm involves three investigation processes. The first identifies a suitable neighborhood strategy. The second identifies the selection strategy, for which three selection strategies were investigated; finally, the performance of the proposed hybrid CS with SA was investigated. Experiments were conducted to evaluate the performance of the proposed CS on 16 instances of the Augerat benchmark [9], which can be downloaded from www.branchandcut.org/VRP/data/. In these instances, the total number of customers varies from 30 to 135 and the total number of vehicles varies from 3 to 10. In some instances the customers appear in clusters, whilst in others they are randomly scattered or semi-clustered. Experiments were performed on a 3.2 GHz Intel Core i3 CPU and the heuristics were coded in C++ in a Microsoft Visual Studio 2013 environment.
The following details of these instances are presented in Table 4:
• Number of vehicles and customers (columns 2 and 3)
• The capacity of each vehicle (column 4)
• Tightness of each problem instance (column 5)
• Best-known solution (BKS) of each instance (column 6)
Symmetry 2020, 12, 2088

The best (Min.), average (Avg.) and worst (Max.) solutions and the standard deviation (Std.) were computed over 31 independent runs on each instance, along with the average computational time in seconds required to reach the final best solutions. The best solutions found by the proposed algorithm that were equal to the BKS are shown in bold in the tables in this section. The basic discrete CS parameters were optimized using the Taguchi method as part of the initial stage of this study, published in Reference [2]: the number of iterations was 200, the fraction of worse nests to be discarded was 0.1 and there were 50 nests. Simulated annealing has three parameters: initial temperature, final temperature and cooling schedule. Table 5 shows the parameter values used in this study. The final temperature and cooling schedule were set based on a suggestion in Reference [70], whilst the value of the initial temperature was set based on a preliminary experiment in which three different values of the initial temperature were measured. Four instances (A-n33-k5, B-n35-k5, E-n51-k5 and P-n101-k4) of different sizes were chosen for this experiment. The results of the preliminary experiment are summarized in Table 6. The SA performed better when the initial temperature was set to 100.
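The per-instance statistics reported in the tables can be reproduced from the raw run results as follows; the objective values below are hypothetical placeholders, not data from the paper:

```python
import statistics

def run_stats(costs):
    """Min., Avg., Max. and Std. over independent runs (31 in the paper)."""
    return {
        "Min": min(costs),
        "Avg": statistics.mean(costs),
        "Max": max(costs),
        "Std": statistics.stdev(costs),   # sample standard deviation
    }

costs = [661, 662, 661, 665, 663]         # hypothetical objective values
s = run_stats(costs)
print(s["Min"], round(s["Avg"], 1), s["Max"])   # 661 662.4 665
```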

Table 5. Parameter values.
Parameter: Initial temperature — Value: 100 (set in the preliminary experiment)

Three experiments were performed in this study:
• Experiment 1: First, we investigated the performance of the 12 neighborhood structures to identify the best set of neighborhoods among them and then formed a sequence based on the knowledge obtained. Second, we investigated which of the two acceptance criteria (Best, First) was more suitable for improving the solution: all neighborhoods are evaluated and the best neighborhood is chosen. We were then able to create a neighborhood-enhanced CS (NE-CS), whose performance we compared with that of the basic CS.
• Experiment 2: We investigated three selection strategies and combined the best-performing one (which we named Discrete Cuckoo Search, or Dis-CS) with NE-CS, then compared its performance with the output of the first experiment.
• Experiment 3: We hybridized CS with SA (which we named HCS-SA) and compared its performance with the output of the second experiment.

Results of Comparison of Neighborhood Structures
First, we investigated the performance of the neighborhood structures on a randomly generated solution. The results obtained by the CS over 11 independent runs for each instance were averaged. The results of the experiment are summarized in Table 7, which shows the average neighborhood structure frequency along with the average actual improvement of each neighborhood structure. The frequency was calculated by taking into account all executions until the termination criterion was met. Based on this frequency, we ranked each neighborhood structure according to its success rate in improving the solution. It can be seen from the table that REINSERTION achieved the highest percentage of success, whereas K-SHIFT had the lowest. The lowest percentage indicates that the algorithm became stuck in local optima and the neighborhood structure was therefore unable to improve the solution any further. Table 7 shows that REINSERTION, TWO-OPT, EXCHANGE, OR-OPT2, SHIFT-1-0 and OR-OPT3 are the most successful structures in the algorithm, with success rates of 99.9%, 99.6%, 99%, 97.4%, 94.8% and 94.8%, respectively. Most of these neighborhoods are intra-route (REINSERTION, TWO-OPT, EXCHANGE, OR-OPT2 and OR-OPT3); only one inter-route neighborhood structure (SHIFT-1-0) achieved a comparable result. The three neighborhoods least successful in improving the solutions were K-SHIFT, SHIFT-2-0 and SWAP-2-2, with 44%, 58.6% and 62.3%, respectively. Based on this finding, a neighborhood structure sequence called the 'best sequence' was selected. The three best neighborhood structures were chosen from each category (intra- and inter-route), the aim being not to concentrate on particular categories of neighborhood structure so as to better explore the solution search space. The best sequence is REINSERTION, SHIFT-1-0, TWO-OPT, SWAP-1-1, EXCHANGE and SWAP-1-2.
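The ranking by success rate can be computed directly from the per-neighborhood counters; the attempt and improvement counts below are hypothetical, standing in for the values behind Table 7:

```python
def success_rates(attempts, improvements):
    """Percentage of calls to each neighborhood that improved the solution."""
    return {nb: 100.0 * improvements[nb] / attempts[nb] for nb in attempts}

# Hypothetical counters in the spirit of Table 7.
attempts = {"REINSERTION": 1000, "TWO-OPT": 1000, "K-SHIFT": 1000}
improvements = {"REINSERTION": 999, "TWO-OPT": 996, "K-SHIFT": 440}

rates = success_rates(attempts, improvements)
ranking = sorted(rates, key=rates.get, reverse=True)
print(ranking)   # ['REINSERTION', 'TWO-OPT', 'K-SHIFT']
```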

Results of Comparison of Accepting Criteria
We investigated the efficacy of two acceptance criteria by applying them to four instances of different sizes: A-n33-k5 and B-n35-k5, which are small; E-n51-k5, which is medium-sized; and P-n101-k4, which is large. The two acceptance criteria were Best, where the algorithm picks the best improving neighbor after examining all the neighbors of the current solution, and First, where the algorithm picks the first improving neighbor of the current solution. The results of the experiment after 11 runs are shown in Figures 8-11. Figures 8 and 9 show the small A-n33-k5 and B-n35-k5 instances, respectively, whilst Figure 10 shows the medium E-n51-k5 instance and Figure 11 the large P-n101-k4 instance.
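The two acceptance criteria can be contrasted in a few lines; the toy cost function and moves are illustrative, standing in for a CVRP solution and its neighborhood structures:

```python
def first_improvement(s, cost, moves):
    """First: return the first neighbor that improves on s, or s if none does."""
    for move in moves:
        cand = move(s)
        if cost(cand) < cost(s):
            return cand
    return s

def best_improvement(s, cost, moves):
    """Best: examine every neighbor of s and return the best improving one."""
    best = s
    for move in moves:
        cand = move(s)
        if cost(cand) < cost(best):
            best = cand
    return best

# Toy search space: minimize x^2 with moves x-1, x+1, and a jump to 0.
cost = lambda x: x * x
moves = [lambda x: x - 1, lambda x: x + 1, lambda x: 0]
print(first_improvement(5, cost, moves))   # 4: stops at the first improvement
print(best_improvement(5, cost, moves))    # 0: scans all neighbors first
```

First is cheaper per iteration but can settle for weak moves; Best pays for a full neighborhood scan in exchange for the steepest descent, which is the trade-off measured in Figures 8-11.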

Clearly, the Best acceptance criterion achieved the best average objective value on all four instances, and it reached an optimal solution on more instances than the First acceptance criterion. Moreover, the Best acceptance criterion obtained a near-optimal solution (the lowest line) for two instances, A-n33-k5 and B-n35-k5, as shown in Figures 8 and 9, respectively. Therefore, the Best acceptance criterion was chosen for the further investigation of the performance of the proposed algorithm.


Figure 10. Acceptance criteria applied on instance E-n51-k5.
Figure 11. Acceptance criteria applied on instance P-n101-k4.

Result of Comparison of NE-CS with Basic CS
The results of a comparison of the basic CS with the CS enhanced by the best sequence of neighborhood structures and the Best acceptance criterion, which we named neighborhood-enhanced CS (NE-CS), are summarized in Table 8. It can be seen that NE-CS obtained a near-optimal solution for most of the instances based on the Min. value. This result shows the sensitivity of the algorithm to the neighborhood structures and the positive impact they have on the search process. Furthermore, the results show that the proposed NE-CS is much more stable than the basic CS on all instances based on the Std. value. Therefore, we can conclude that NE-CS explores the search space more efficiently than the basic CS. In terms of computational time, NE-CS also outperformed the basic CS on all instances. We also performed a Wilcoxon test to determine whether there is a significant difference between the two methods, at a confidence level of 95%. The results are presented in Table 9; the p-values indicate significant differences in 13 out of 16 instances.

Table 9. p-value results for NE-CS vs. basic CS.

Result of Comparison of Selection Strategies
Next, we compared the performance of NE-CS when modified by three different selection strategies, namely tournament, rank and disruptive, which we named Tour-CS, Rank-CS and Dis-CS, respectively. The results of this comparison are summarized in Table 10. Note that the computational time is not reported because it is similar for all three selection strategies.
It is clear from the table that Dis-CS outperformed the other algorithms. It obtained the BKS for six out of 16 instances (A-n33-k5, B-n46-k7, B-n35-k5, E-n30-k3, F-n72-k4 and M-n101-k10). By contrast, Tour-CS obtained the BKS for two out of 16 instances (B-n35-k5 and E-n30-k3) and Rank-CS for three out of 16 instances (A-n33-k5, B-n35-k5 and B-n45-k5). We believe the better performance of the disruptive strategy is due to its ability to balance the selection of worse and better solutions in the population with a small bias towards the better ones; it tries to improve most of the solutions concurrently, whether they are worse or have high fitness.
The other two selection strategies (tournament and rank) are biased much more towards high-fitness solutions. Nevertheless, all three selection strategies outperformed the random selection strategy used by NE-CS. Moreover, Dis-CS outperformed NE-CS on five instances.
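A common formulation of disruptive selection, which weights each solution by its distance from the population's mean fitness so that both very good and very bad solutions are favored, can be sketched as follows; the population and fitness values are hypothetical, and the paper's exact weighting may differ:

```python
import random

def disruptive_weights(fitness):
    """Disruptive selection weight: distance from the population mean,
    so both very good and very bad solutions get selected more often."""
    avg = sum(fitness) / len(fitness)
    return [abs(f - avg) for f in fitness]

def disruptive_select(population, fitness, rng):
    """Pick one individual with probability proportional to its weight."""
    w = disruptive_weights(fitness)
    if sum(w) == 0:                        # degenerate case: all equal fitness
        return rng.choice(population)
    return rng.choices(population, weights=w, k=1)[0]

fitness = [10, 50, 50, 90]                 # mean fitness is 50
print(disruptive_weights(fitness))         # [40.0, 0.0, 0.0, 40.0]
```

Under this weighting, average individuals are rarely selected, which keeps both ends of the population under improvement pressure, consistent with the balance described above.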
To illustrate the superior performance of Dis-CS, Figure 12 provides boxplots for a range of instance sizes (A-n33-k5, B-n35-k5, E-n51-k5 and P-n101-k4). It is clear from the figure that the disruptive selection strategy performed better than the original random selection strategy for all four instances. Moreover, in the case of B-n35-k5 and P-n101-k4 even the worst solution obtained by the disruptive selection strategy was better than the best obtained by the random selection strategy. The disruptive selection strategy also outperformed the tournament and rank selection strategies. Lastly, the algorithm performed very well on the large instance, P-n101-k4, when using the disruptive selection strategy.
We also carried out a statistical analysis to rank the algorithms, conducting a Friedman test followed by Holm and Hochberg post-hoc tests to obtain the adjusted p-value for each comparison with Dis-CS as the control algorithm. Table 12 summarizes the ranking obtained by the Friedman test (lowest is best) and Table 13 provides the results of the Holm and Hochberg tests. Table 12 shows that Dis-CS ranked first, followed by Tour-CS and Rank-CS, respectively. The p-value computed by the Friedman test was 0.000017, which is below the significance level of α = 0.05 and thus indicates significant differences between the observed results. Table 13 shows that the Holm and Hochberg procedures revealed significant differences when Dis-CS was the control algorithm: Dis-CS was better than both Rank-CS and Tour-CS at a confidence level of α = 0.05 (2/3 algorithms).

Result of Comparison of HCS-SA with Dis-CS
Finally, we compared Dis-CS with HCS-SA to ascertain whether hybridization improves performance beyond the best-performing selection strategy. Table 14 presents the results of this comparison. It can be seen that HCS-SA obtained a better result in 12 out of 16 instances. Moreover, it performed better than Dis-CS on all instances in terms of the Avg., Std. and Max. values when the search ended. In the case of the medium- and large-sized instances (A-n60-k9, B-n45-k5, B-n68-k9, E-n51-k5 and P-n76-k4), HCS-SA found the BKS.

Table 17. Percentage improvement in time of HCS-SA and state-of-the-art methods.

DPSO-SA | PSO-SR-2 | CPSO-SA | PSO-ACO | IQEA
28.7 | - | - | - | 28.7

Table 18 shows that PSO-ACO ranked first, followed by HCS-SA and IQEA, respectively. The p-value obtained by the Friedman test was 0.000004, which is below the significance level of α = 0.05, showing a significant difference between the observed results. Furthermore, Table 19 shows that the control method (PSO-ACO) outperformed the proposed method HCS-SA at the critical level of 0.05 (adjusted p-value < 0.05). The results of the Holm and Hochberg statistical tests suggest that HCS-SA is not better than PSO-ACO. However, the results in Table 13 show that HCS-SA outperformed PSO-ACO on one instance and produced a comparable performance across all the other instances.

Discussion
As mentioned earlier, this paper has proposed a hybrid HCS-SA for the CVRP, developed in several stages, as shown in Table 20; a larger negative value means the proposed algorithm performed better. The first stage improved the basic CS using neighborhood structures (NE-CS): it improved the mean by up to 3%, reduced the standard deviation by up to 20 and reduced the computation time by up to 17 s. Furthermore, the Friedman test showed that NE-CS improved significantly (p-value < 0.05) over the basic CS on all instances. The second stage is the improvement using a selection strategy, known as Dis-CS. Dis-CS improved the mean by up to 2%, 4% and 4% compared with NE-CS, Tour-CS and Rank-CS, respectively. The Friedman test also shows that Dis-CS improved significantly on all instances compared with NE-CS, Tour-CS and Rank-CS except E-n30-k3 and B-n45-k5; however, it increased most of the standard deviations and the computational time. The third improvement is the hybrid with SA, which improved the mean by up to 6% and reduced the standard deviation by up to 8.9, but increased the computational time by between 21.65 s and 1999 s. The comparison between HCS-SA and the state-of-the-art algorithms showed that it either matched or outperformed them on most instances, by 0.3 up to 72, except for the E-n78-k10 and E-n76-k7 instances, on which it was still better than the IWD and IQEA algorithms. In terms of computational time, HCS-SA performed better than some algorithms, such as DPSO-SA and IQEA, but loses against the other algorithms on all problem instances.

Conclusions
This study introduced a hybrid CS with SA for the CVRP that consists of three improvement strategies. The first improves the exploration of the search space by CS, avoiding traps in local optima by using 12 neighborhood structures; however, only six of these structures are needed to obtain the best solution, improving both the cost and the computational time. The second strategy further increases the exploration ability of CS through three selection strategies, of which the disruptive strategy produced the best solutions. The results showed that it significantly improved the solution, although it also increased the computational time. The third strategy therefore aims to improve the exploitation ability of the disruptive CS by hybridizing it with SA, targeting a larger improvement in solution quality with a smaller increase in complexity. The results show that HCS-SA outperformed the state-of-the-art algorithms, including other CS improvements. Although the complexity of HCS-SA is higher than that of most of the state-of-the-art algorithms, it is far lower than that of the state-of-the-art improved CS for the CVRP. We found that the exploitation strategy had a greater impact on the improvement of CS than the two exploration strategies. It can thus be concluded that maximizing exploration first, followed by exploitation, is the best strategy for improving CS for the CVRP. With such results, we believe the proposed algorithm can be applied to other variants of routing problems as well, such as the VRP with time windows [76][77][78], the VRP for hazardous materials transportation [79], the dynamic VRP [80], the weighted VRP [81], the split delivery VRP [82] and the VRP with outsourcing and profit balancing [83].