A Memetic Algorithm for the Green Vehicle Routing Problem

: The green vehicle routing problem is a variation of the classic vehicle routing problem in which the transportation ﬂeet is composed of electric vehicles with limited autonomy in need of recharge during their duties. As an NP-hard problem, this problem is very di ﬃ cult to solve. In this paper, we ﬁrst propose a memetic algorithm (MA)—a population-based algorithm—to tackle this problem. To be more speciﬁc, we incorporate an adaptive local search procedure based on a reward and punishment mechanism inspired by reinforcement learning to e ﬀ ectively manage the multiple neighborhood moves and guide the search, an e ﬀ ective backbone-based crossover operator to generate the feasible child solutions to obtain a better trade-o ﬀ between intensiﬁcation and diversiﬁcation of the search, and a longest common subsequence-based population updating strategy to e ﬀ ectively manage the population. The purpose of this research is to propose a highly e ﬀ ective heuristic for solving the green vehicle routing problem and bring new ideas for this type of problem. Experimental results show that our algorithm is highly e ﬀ ective in comparison with the current state-of-the-art algorithms. In particular, our algorithm is able to ﬁnd the best solutions for 84 out of the 92 instances. Key component of the approach is analyzed to evaluate its impact on the proposed algorithm and to identify the appropriate search mechanism for this type of problem.


Introduction
Currently, environmental concerns have driven governments to develop regulations and laws that require organizations to adopt green logistics methods in their operations [1]. As a result, major automotive manufacturers design and produce increasingly more and better alternative fuel vehicles (AFVs). Hence, the number of AFVs on the roads are increasing. AFVs represent a promising opportunity for reducing costs and pollution caused by transportation and mobility operations [2]. This is because many AFV models in the market not only help reduce fuel consumption, but also have advantages in reducing fuel costs and reducing greenhouse gas emissions. However, many AFVs have limited travel distances and must often rely on an underdeveloped refueling infrastructure. Therefore, it may be necessary to include the exact locations of the refueling stations in route planning. Furthermore, refueling delays may also cause considerable negative impacts. For example, for electric vehicles, due to their relatively short driving range and long recharging time, coupled with limited charging infrastructure, it may lead to range anxiety, so consumers may be worried about insufficient energy to reach the desired destination [3]. As a result, new routing models and methods for AFVs have emerged to help provide reliable and adequate levels of service.
As one of the most famous combinatorial optimization problems, the classic vehicle routing problem (VRP) is concerned with designing least cost delivery routes for a fleet of vehicles to serve a set of customers. The green vehicle routing problem (GVRP) is a variation of the classic VRP in which the fleet is composed of alternative fuel powered vehicles with limited autonomy in need of recharge during the execution of their duties [4]. For the GVRP, there is an unlimited homogeneous fleet of alternative fuel vehicles with limited time that starts from a depot node 0 and returns to it with the objective of minimizing the total travel distances incurred by all the vehicles. The vehicles must fulfill all the requests by visiting each customer node once with the constraints on the maximum driving time and maximum fuel capacity available [5]. Figure 1 depicts a feasible solution to a green VRP.
Sustainability 2019, 11, x FOR PEER REVIEW 2 of 22 As one of the most famous combinatorial optimization problems, the classic vehicle routing problem (VRP) is concerned with designing least cost delivery routes for a fleet of vehicles to serve a set of customers. The green vehicle routing problem (GVRP) is a variation of the classic VRP in which the fleet is composed of alternative fuel powered vehicles with limited autonomy in need of recharge during the execution of their duties [4]. For the GVRP, there is an unlimited homogeneous fleet of alternative fuel vehicles with limited time that starts from a depot node 0 and returns to it with the objective of minimizing the total travel distances incurred by all the vehicles. The vehicles must fulfill all the requests by visiting each customer node once with the constraints on the maximum driving time and maximum fuel capacity available [5]. Figure 1 depicts a feasible solution to a green VRP. The goal of this research is to propose a highly effective heuristic for solving the green vehicle routing problem and to bring new ideas for this problem or this type of problem. The main contributions of this paper can be summarized as follows. First, to the best of our knowledge, this work is the first to employ a population-based algorithm (i.e., memetic algorithm (MA)) to solve the green vehicle routing problem. Second, we propose a reward and punishment mechanism inspired by reinforcement learning to effectively manage the multiple neighborhood moves and guide the search. Compared with the traditional local search, the adaptive local search can manage the different neighborhood moves to adapt to different instances. Third, a backbone-based crossover operator and a longest common subsequence based (LCS-based) population updating strategy is devised to obtain a better trade-off between intensification and diversification of the search. Compared to the general crossover operator (such as one-point crossover operator), the backbone-based crossover operator can better integrate with the problem structure by inheriting the promising customer and station sequences from the routes. Fourth, our experimental results demonstrate that the performance of our MA is highly effective compared to state-of-the-art approaches in the literature.
The rest of the paper is organized as follows. In Section 2, we present the review of the literature. In Section 3, we give the problem description of GVRP. In Section 4, we provide details of our memetic algorithm. In Section 5, we present the results of extensive computational experiments carried out to assess the performance of the proposed algorithms in comparison to current bestperforming approaches. In Section 6, we analyze the important component (i.e., adaptive mechanism for neighborhood moves) of the MA algorithm and study the impact on its performance. In Section 7, we discuss the proposed algorithm in comparison with the other methods in the literature. Finally, we provide concluding remarks as well as potential research directions in Section 8. The goal of this research is to propose a highly effective heuristic for solving the green vehicle routing problem and to bring new ideas for this problem or this type of problem. The main contributions of this paper can be summarized as follows. First, to the best of our knowledge, this work is the first to employ a population-based algorithm (i.e., memetic algorithm (MA)) to solve the green vehicle routing problem. Second, we propose a reward and punishment mechanism inspired by reinforcement learning to effectively manage the multiple neighborhood moves and guide the search. Compared with the traditional local search, the adaptive local search can manage the different neighborhood moves to adapt to different instances. Third, a backbone-based crossover operator and a longest common subsequence based (LCS-based) population updating strategy is devised to obtain a better trade-off between intensification and diversification of the search. Compared to the general crossover operator (such as one-point crossover operator), the backbone-based crossover operator can better integrate with the problem structure by inheriting the promising customer and station sequences from the routes. Fourth, our experimental results demonstrate that the performance of our MA is highly effective compared to state-of-the-art approaches in the literature.

Review of the Literature
The rest of the paper is organized as follows. In Section 2, we present the review of the literature. In Section 3, we give the problem description of GVRP. In Section 4, we provide details of our memetic algorithm. In Section 5, we present the results of extensive computational experiments carried out to assess the performance of the proposed algorithms in comparison to current best-performing approaches.
In Section 6, we analyze the important component (i.e., adaptive mechanism for neighborhood moves) of the MA algorithm and study the impact on its performance. In Section 7, we discuss the proposed algorithm in comparison with the other methods in the literature. Finally, we provide concluding remarks as well as potential research directions in Section 8.

Review of the Literature
The green vehicle routing problem was first introduced in [6], where a mixed integer linear programming formulation and two heuristics are proposed. One of the proposed heuristics is a modified Clarke and Wright savings algorithm (MCWS) that repairs infeasible routes by inserting fuel vehicle stations (AFSs) using a savings criterion and removes redundant AFSs after merging the routes. Schneider et al. (2014) [7] introduced the electric vehicle routing problem with time windows and recharging stations (E-VRPTW), which is an extension of the VRP with a fleet of electric vehicles. The E-VRPTW problem considers limited capacities of vehicles, time windows of customers, and the possibility of recharging at any of the available stations using an appropriate recharging scheme. For this problem, they proposed an MILP formulation and a hybrid metaheuristic combining variable neighborhood search and tabu search (VNS/TS). The proposed VNS/TS explores infeasible solutions with respect to capacity, time windows, and battery usage constraints. A dynamic penalizing scheme is used to guide the search toward feasible solutions. Furthermore, Schneider et al. (2014) [7] made improvement over the MILP formulation proposed by Erdogan and Miller-Hooks (2012) [6], and they evaluated their VNS/TS and MILP approaches on the 52 instances proposed by Erdogan and Miller-Hooks (2012) [6]. Computational experiments showed that CPLEX 12.2 was unable to solve to optimality instances with 20 customers using their MILP model, and VNS/TS outperformed the constructive heuristics proposed by Erdogan and Miller-Hooks (2012) [6]. Schneider et al. (2015) [8] introduced the vehicle routing problem with intermediate stops (VRPIS) that generalizes the GVRP and the MDVRPI. To solve this problem, they proposed an adaptive variable neighborhood search (AVNS). Their AVNS uses a modified savings algorithm to generate an initial solution that is later improved with local search. The algorithm uses an adaptive shaking with 24 neighborhood structures, five route selection methods, three vertex sequence selection methods, and an adaptive mechanism to choose the route and vertex selection methods. The solution generated at the shaking step is subsequently improved by several greedy local searches. Furthermore, the AVNS has a dynamic penalization scheme to guide the search toward feasible solutions and a simulated annealing acceptance criterion. Since the green VRP is a special case of the VRPIS, Schneider et al. (2015) [8] evaluated their approach on the instances of Erdogan and Miller-Hooks (2012) [6]. This method outperformed all previous methods both in terms of solution quality and computational time.
More recently, Montoya et al. (2016) [5] presented a new mathematical formulation for the GVRP and a branch-and-cut algorithm. The authors also proposed a simulated annealing heuristic and report results on small instances with up to 20 customers and 310 refueling stations. Andelmin and Bartolini (2019) [1] proposed a multi-start local search (MSLS) algorithm for the GVRP. Their MSLS algorithm consists of three phases. The first two phases iteratively construct new solutions, improve them by local search, and store all vehicle routes from these solutions in a route pool. The third phase optimally combines vehicle routes in the route pool by solving a set partitioning problem and improves the final solution by local search. They reported computational results on benchmark instances with up to 470 customers, showing that the algorithm is highly competitive with the previous best performing heuristics.
Swarm intelligence (SI) is the collective behavior of decentralized and self-organized systems, natural or artificial. The concept is employed in work on artificial intelligence. This method is widely employed to solve hard solving complex combinatorial decision problems [9][10][11][12]. As a type of evolutionary algorithm [3], memetic algorithm is a general-purpose metaheuristic approach that typically combines a local search optimization procedure with a population-based framework. Memetic algorithm has been successfully applied to tackle many classical combinatorial optimization problems, including the machine scheduling problem [13], graph coloring problem [14] and quadratic assignment problem [15]. Hence, in this paper, we first employ the memetic algorithm to tackle the famous green vehicle routing problem.

Problem Description
For the green vehicle routing problem, the objective is to find a set of routes of minimum total distance such that each customer is visited exactly once; the level of the tank when the vehicle arrives at any vertex is nonnegative; each route satisfies the maximum-duration limit; and each route starts and ends at the depot. Figure 1 depicts a feasible solution to a green VRP.
More precisely, we are given a set N = {1, . . . , n} of n customers and a set of F = {n + 1, . . . , n + s} of s refueling stations. There is an unlimited homogeneous fleet of alternative fuel vehicles with limited time that starts from a depot node 0 and return to it with the objective of minimizing the total travel distances incurred by all the vehicles. The vehicles must fulfill all the requests by visiting each customer node once with the constraints on the maximum driving time and maximum fuel capacity available. More precisely, GVRP can be defined on a complete weighted undirected graph G = (V, E) with the following features. V = N ∪ F ∪ 0 denotes the set of nodes, where N denotes the set of customer nodes, F is the set of refueling station nodes, 0 denotes the starting and ending node, also called depot, and E = {(u, v): u, v∈V, u v} is the edge set.
The distance between two nodes (u, v ∈ N) is denoted by c(u, v).

•
The service time at each customer node u ∈ N is denoted by st u .

•
The refueling time at each refueling station node u ∈ F is denoted by f t u .

•
The travel time to traverse the arc (u, v) ∈ E is denoted by tt u,v .

•
The maximum fuel capacity without refueling for each vehicle is MC (i.e., the maximum fuel capacity constraint).

•
The maximum driving time of each route including the service time, traversal time and refueling time is MT (i.e., the maximum driving time constraint). • A route traveling from i to j consumes c(i, j) units of distance and t(i, j) units of time. The time t(i, j) is assumed to be proportional to the distance c(i, j) from i to j and computed as t(i, j) = c(i, j)/v where v is the vehicle speed.
Let R be one route in a solution S and R = {u 0 = 0, u 1 , u 2 , . . . , u m−1 , u m = 0}, where u k is the kth node visited in R(0 ≤ k ≤ m). It is worth noting that the GVRP expresses the maximum driving range of a vehicle as its maximum fuel capacity MC. However, since vehicle fuel consumption is assumed to be linearly dependent on the distance traveled, the maximum driving range can be equivalently expressed as a distance value. Letting K be the constant rate of fuel consumption per distance unit, the maximum distance Q that a vehicle can travel without refueling is thus defined as Q = MC/K.
If the visited node is a customer node (i.e., u k ∈ N), the corresponding fuel surplus of the vehicle after visiting it is f s(u k ) = f s(u k−1 ) − c(u k−1 , u k )/K. For each node in the route, the corresponding fuel surplus must be non-negative. Note that all vehicles departing from the depot and the refuel stations are assumed to be fully refueled having the maximum fuel amount MC. We denote by DT(R) = m−1 k=0 tt u k ,u k+1 the total traversal time, ST(R) = m−1 k=1 st u k the total service time if u k is the customer node, and FT(R) = m−1 k=1 f t u k if u k is the refuel station node. The corresponding total duration of each route including the traversal time, service time and refuel time cannot exceed the given maximum driving time, i.e., DT(R) + ST(R) + FT(R) ≤ MD(the maximum driving time constraint). The objective is to find a feasible solution with the minimum total travel distance as follows: where DT denotes the traversal time and v denotes the vehicle speed.

Main Framework
A memetic algorithm is a general-purpose metaheuristic approach that typically combines a local search optimization procedure with a population based framework, which has been successfully applied to tackle many classical combinatorial optimization problems, including the job shop scheduling problem [16], p-center problem [17] and nurse rostering problem [18]. The purpose of combining local search and population-based strategies is to take advantage of both the crossover operator as a diversification mechanism for discovering promising unexplored regions of the search space and the local optimization as an intensification procedure to obtain high quality solutions within a search region. We outline our proposed memetic algorithm for GVRP in Algorithm 1. At the beginning of the algorithm, we iteratively employ a hybrid heuristic method to generate the initial population (line 1). Following this, we employ an adaptive local search to optimize the solutions in the population (lines 2-4). Later, we iteratively combine two parent solutions randomly selected from the population to generate offspring solutions using a backbone-based crossover operator until the stopping criterion, i.e., maximum computing time, is satisfied (lines 5-7). After each use of the crossover operator, we improve the generated offspring solution using an adaptive local search to guide the search to promising regions (line 8). During this process, S * records the best solution found so far (lines 9-11). We then apply the longest-common-sequence-based (LCS-based) population updating strategy to possibly replace the worst individual in the population with the improved offspring solution (lines 12-15).
: if S c is better than S * then 10: S * ← S c 11: end if /* The longest-common-subsequence based population updating strategy (Section 4.5) */ 12: Determine the worst individual S w where the goodness value GS(S w ,

Initial Solutions
In this section, we employ a constructive heuristic based on a greedy generation method called k-Pseudo Greedy proposed by Felipe et al. (2014) [2] to generate the initial population. The k-Pseudo Greedy focuses more on feasibility and diversification, producing multiple different feasible solutions, rather than on quality or cost that are to be improved later. Specifically, the k-Pseudo Greedy method creates feasible solutions by starting from an empty solution and iteratively extending it (i.e., adding customers and fuel stations in route) until a complete solution is constructed. Specifically, the method first finds up to k unvisited customers that are closest to the current node and are reachable according to capacity, autonomy and time availability, and then select one of them, denoted by j, at random. More precisely, if it is possible to reach the depot directly form j, then j is added to the initial route. If it is possible to reach the depot from j by visiting a recharge node r, the j and r are both added to the initial route. Otherwise, the current route is ended in depot. The above procedure iteratively continues until all the customers are added in the initial solution.
If k = 1, it reduces to a nearest neighbor greedy algorithm, visiting the closest customer and recharging at the closest recharge station when needed, until the capacity or time limit is reached. If k > 1, the next node to be visited is chosen randomly from the set of k closest candidates, allowing the generation of different feasible solutions in different runs. This algorithm is focused more on feasibility and diversification, producing multiple different feasible solutions, rather than on quality or cost, that are to be improved later. However, if k is small, the produced solutions are expected to have an acceptable quality.
Generally speaking, the initial solution procedure will have an insignificant impact on the performance of the overall algorithm, if the algorithm itself is powerful enough. Therefore, we adopt in this study this simple but effective initial solution construction process above to generate the initial population.

Adaptive Local Search
One of the key components of our memetic algorithm is the adaptive local search procedure that plays the critical role of intensifying the search. With the exception of tabu search, traditional local search utilizes a set of moves to search the solution regions without maintaining a memory of the process, while the local search based on our reinforcement learning mechanism is able to effectively exploit memory to manage the neighborhood moves and guide the search to promising regions.

Neighborhood Moves
Our algorithm employs both intra-route moves (performed in the same route) and inter-route moves (performed between two different routes) as follows: • Intra-node-insertion (hereafter denoted as M 1 ): This operator selects a customer node removed from a given route and tries to relocate it in the same route. Specifically, it first removes node i and relocates it between customers k and l to contains the customer sequence (k, i, l), as shown in Figure 2. • Intra-nodes-swap (M 2 ): two customer nodes in the same route exchange their positions. Specifically, it first removes nodes i and k and relocates them in the current route to contains the customer sequences (AFS 1 , k, j) and (AFS 2 , i, l), as shown in Figure 3. • Intra-arc-insertion (M 3 ): this operator selects an arc of two customers from a given route and tries to relocate it somewhere else of the same route. Specifically, it first removes one arc between two successive customers i and j. It then tries to reconnect the route so that it contains the customer sequence (AFS 2 , k, j, 0), as shown in Figure 4. • Inter-node-insertion (M 5 ): different from the intra-node-insertion operator, this operator selects a customer node removed from a given route and tries to relocate it in a different route. To be specific, it first removes node i in route 1 and relocates it between customers k and l to contains the customer sequence (k, i, l) in route 2, as shown in Figure 6. . Example of the intra-nodes-swap operator relocating customer node i of the current route between refuel station AFS2 and customer l, and relocating customer node k of the current route between refuel station AFS2 and customer l.
• Intra-arc-insertion ( 3 M ): this operator selects an arc of two customers from a given route and tries to relocate it somewhere else of the same route. Specifically, it first removes one arc between two successive customers i and j. It then tries to reconnect the route so that it contains the customer sequence 2 ( , , ,0) AFS k j , as shown in Figure 4. . Example of the intra-nodes-swap operator relocating customer node i of the current route between refuel station AFS2 and customer l, and relocating customer node k of the current route between refuel station AFS2 and customer l.
• Intra-arc-insertion ( 3 M ): this operator selects an arc of two customers from a given route and tries to relocate it somewhere else of the same route. Specifically, it first removes one arc between two successive customers i and j. It then tries to reconnect the route so that it contains the customer sequence 2 ( , , ,0) AFS k j , as shown in Figure 4.  , , ) AFS i l , as shown in Figure 3.
(a) (b) Figure 3. Example of the intra-nodes-swap operator relocating customer node i of the current route between refuel station AFS2 and customer l, and relocating customer node k of the current route between refuel station AFS2 and customer l.
• Intra-arc-insertion ( 3 M ): this operator selects an arc of two customers from a given route and tries to relocate it somewhere else of the same route. Specifically, it first removes one arc between two successive customers i and j. It then tries to reconnect the route so that it contains the customer sequence 2 ( , , ,0) AFS k j , as shown in Figure 4.   • Inter-node-insertion ( 5 M ): different from the intra-node-insertion operator, this operator selects a customer node removed from a given route and tries to relocate it in a different route. To be specific, it first removes node i in route 1 and relocates it between customers k and l to contains the customer sequence ( , , ) k i l in route 2, as shown in Figure 6. • Inter-arc-insertion ( 7 M ): different from the intra-arc-insertion operator, this operator relocates a customer arc into another route. See an example in Figure 8. (a) (b) Figure 5. Example of the intra-arcs-swap operator relocating customer arc (i, j) of the current route between refuel stations AFS2 and AFS3 and relocating customer arc (k, l) of the current route between refuel stations AFS1 and AFS2.
• Inter-node-insertion ( 5 M ): different from the intra-node-insertion operator, this operator selects a customer node removed from a given route and tries to relocate it in a different route. To be specific, it first removes node i in route 1 and relocates it between customers k and l to contains the customer sequence ( , , ) k i l in route 2, as shown in Figure 6.
(a) (b) Figure 6. Example of the inter-node-insertion operator relocating customer i of route 1 into route 2 between customer k and l. • Inter-arc-insertion ( 7 M ): different from the intra-arc-insertion operator, this operator relocates a customer arc into another route. See an example in Figure 8. (a) (b) Figure 5. Example of the intra-arcs-swap operator relocating customer arc (i, j) of the current route between refuel stations AFS2 and AFS3 and relocating customer arc (k, l) of the current route between refuel stations AFS1 and AFS2.
• Inter-node-insertion ( 5 M ): different from the intra-node-insertion operator, this operator selects a customer node removed from a given route and tries to relocate it in a different route. To be specific, it first removes node i in route 1 and relocates it between customers k and l to contains the customer sequence ( , , ) k i l in route 2, as shown in Figure 6.   • Inter-arcs-swap ( 8 M ): different from the intra-arcs-swap operator, this operator exchanges the positions of two customer arcs between two different routes. See an example in Figure 9. In our local search, we only consider neighborhood moves that can satisfy both two constraints, i.e., the maximum driving time, and maximum fuel capacity constraints.

Reward and Penalty Strategy Based on Adaptive Mechanism
Reinforcement learning is an area of machine learning concerned with how an agent should take actions in an environment to maximize cumulative reward. The intuition underlying reinforcement learning is that actions that lead to large rewards should be made more likely to recur.
We employ a reward and penalty strategy to dynamically manage the neighborhood moves and guide the search based on the expectation that different neighborhood moves may be preferable for different problem instances or search landscapes. Consequently, we keep track of a score for each neighborhood move, which measures how well the move has performed for the current instance or landscape, adopting the perspective that alternating among different moves based on the proposed adaptive mechanism may yield more robust performance.
To select moves, we assign scores to different moves and use the roulette wheel selection At the beginning of the search, each neighborhood move has the same score 0 sc and hence the same probability of being chosen. After each iteration j , the score of the neighborhood used is updated as follows: In our local search, we only consider neighborhood moves that can satisfy both two constraints, i.e., the maximum driving time, and maximum fuel capacity constraints.

Reward and Penalty Strategy Based on Adaptive Mechanism
Reinforcement learning is an area of machine learning concerned with how an agent should take actions in an environment to maximize cumulative reward. The intuition underlying reinforcement learning is that actions that lead to large rewards should be made more likely to recur.
We employ a reward and penalty strategy to dynamically manage the neighborhood moves and guide the search based on the expectation that different neighborhood moves may be preferable for different problem instances or search landscapes. Consequently, we keep track of a score for each neighborhood move, which measures how well the move has performed for the current instance or landscape, adopting the perspective that alternating among different moves based on the proposed adaptive mechanism may yield more robust performance.
To select moves, we assign scores to different moves and use the roulette wheel selection principle. If we have n moves with scores sc i (i ∈ 1, 2, . . . , n), move k is selected with probability λ k , where At the beginning of the search, each neighborhood move has the same score sc 0 and hence the same probability of being chosen. After each iteration j, the score of the neighborhood used is updated as follows: sc i,j+1 = sc i,j + α * β l , i, j = 1, 2, . . . , n; l = 1, 2.
where the reaction factor α controls how quickly the score adjustment function reacts to changes according to the performance of the moves, and parameter β denotes the different incremental scores according to the following several situations. If one move can produce a new best solution, we reward this neighborhood move by choosing β 1 in Equation (3). If one move can generate a better solution than the current solution, the neighborhood move would still be rewarded β 2 . However, if the generated solution is worse than the current solution, then we punish the move by multiplying the score by γ as follows: sc i,j+1 = γ * sc i,j , i, j = 1, 2, . . . , n.
The adaptive local search phase proceeds with iterative exploitation of the eight neighborhood moves as shown in Algorithm 2. In each iteration, one neighborhood move is picked with probability λ i (lines [3][4]. Then, if the neighborhood solution S' obtained by this neighborhood move cannot improve the current solution S, the next neighborhood move is chosen from those remaining; otherwise, the current solution S is replaced by the best neighborhood solution S generated by current neighborhood move (lines 5-10). Subsequently, the score of the neighborhood move N i is updated by Equations (3) and (4) (line 11). During this process, S b records the best solution found in the local search, S * preserves the best found solution so far, and no_improve_iter denotes the number of iterations without improving the best found solution S b (lines [12][13][14][15][16]. When none of the moves can improve the current best solution, we apply a simple perturbation strategy to achieve a better trade-off between diversification and intensification of the search (lines [17][18][19]. To apply the perturbation operator, we randomly delete part of the customer nodes (n/ζ) from the current solution and re-insert the deleted customer nodes into the partial solution based on a greedy strategy.

Backbone-Based Crossover Operator
The crossover operator is another key component of our memetic algorithm. In practice, it is important to devise a dedicated recombination operator that has strong "semantics" with respect to the studied problem. In the last few years, several kinds of crossover operator have been used in the literature. Li et al. [19] introduced a complicated crossover operator based on the tree representation and Cherkesly et al. [4] proposed an adapted-order-based crossover operator. Both operators can only be used for the pickup and delivery constraints. In this study we propose a dedicated crossover operator for GVRP, which has the following features: first, our crossover operator always generates feasible solutions with respect to all the constraints, i.e., the maximum driving time and the maximum fuel capacity constraints. Thus, it is unnecessary to employ the repair strategy to ensure feasibility of the generated solutions. Second, based on a greedy chosen mechanism, our crossover operator can obtain high-quality offspring solutions.
The proposed backbone-based crossover operator operates in the following two sequential steps:

•
Step 1. We select the chosen customer and station sequence of a route from the corresponding parent solutions as a route of the child solution. To illustrate this, Figure 10 depicts an example with two parents S a and S b , where Route 1 and Route 2 denote the two routes of the parent solutions, respectively. The best chosen route selected from the candidate routes of two parent solutions is the one yielding the minimum ratio value of ∆: where δ( f ) represents the incremental objective value after selecting the candidate route as a route in the child solution S c , and θ denotes the number of customers in the current route. As shown in Figure 10, we first select Route 1 from S b in the first iteration, and Route 2 from S b in the second iteration. At the last iteration we select Route 1 from S a .

•
Step 2. We delete the customers in the chosen route from the remaining routes in two parent solutions. As shown in Figure 10, we delete the customers from both two parent solutions, which appear in Route 1 after we select Route 1 from S a in the first iteration. The reason is that it is necessary to remove the same customer from both parent solutions in order to avoid redundancy after each iteration. • Step 2. We delete the customers in the chosen route from the remaining routes in two parent solutions. As shown in Figure 10, we delete the customers from both two parent solutions, which appear in Route 1 after we select Route 1 from a S in the first iteration. The reason is that it is necessary to remove the same customer from both parent solutions in order to avoid redundancy For the crossover operator, both steps are iteratively alternated until all the customers are included in the offspring solution. It is intuitive that our proposed crossover operator is able to generate the feasible offspring solution since the offspring solution inherit all the feasible routes (or the subset of the routes) from the parent solutions.
In our proposed crossover operator, the first step aims to select the most promising customer and station sequences from the routes according to a dedicated equation. The second step iteratively deleted the chosen customers and stations each time. Compared to the general and typical crossover operators, the proposed backbone-based crossover based on these two steps with the iterated greedy strategy is able to integrate well with the problem structure. Hence in this study, we employ the backbone-based crossover operator as the component of our algorithm.

LCS-Based Population Updating Strategy
To maintain healthy diversity of the population, we use the LCS-based population updating strategy introduced by Cheng et al. [16] to solve the job shop scheduling problem, to decide whether the improved solution should be inserted into the population or discarded. This population updating strategy simultaneously considers the solution quality and the distances among the individuals in the population to guarantee diversity of the population. The underlying idea is that the similarity of two solutions based on the longest common subsequence could clearly match the neighborhood moves based on the insert and swap operations. For this purpose, we first make two definitions as follows:

Definition 1. Distance between a solution and its population
Given a solution S i and the population PP = {S 1 , S 2 , . . . , S np }, the distance between the solution S i and its population PP can be defined as follows: where 2 * (n + s) and lcs(s i , s j ) denote the number of all the customer and refuel station nodes and the length of the longest common subsequence between S i and S j , respectively.

Definition 2. Goodness score of a solution in the population
The goodness score GS(S i ) of a solution S i is defined by its objective function value, as well as its distance to the population, as follows: where f max and f min denote the maximum and minimum objective values of the individuals in the population PP, and Dist max and Dist min are the maximum and minimum distances between a solution to the population, respectively. The number 1 is used to avoid the possibility of a 0 denominator and δ is a constant parameter. In each generation, the offspring individual is inserted into the population if the goodness score of the offspring is better than the worst solution in the population according to the goodness score. Otherwise, the offspring individual is discarded. It is clear that the greater the goodness score GS(S i ) is, the better is the solution S i . It is noted that this goodness score function simultaneously considers the factors of solution quality and population diversity. On the one hand, we should maintain a population of elite solutions. On the other hand, we have to emphasize the importance of the diversity of the solutions to avoid premature convergence of the population.

Computational Studies
In this section we report extensive computational studies conducted to assess the performance of our proposed memetic algorithm (MA) with the state-of-the-art reference algorithms in solving public benchmark instances of GVRP.

Benchmark Instances and Experimental Protocols
For experimental evaluations, we employ two data sets for GVRP. The first set of benchmark instances (called EMH) was proposed in [6], which consists of 52 instances with 20-500 customers and 3-28 refueling stations. Most of these instances (40 out of 52) consist of 20 customers and 2-10 refueling stations and have been randomly constructed to represent different types of customer and refueling station configurations. The second set of instances (denoted by AB) was created by [20] by extracting a subset of customers from the larger EMH instances. The number of customers for the AB instances ranges between 50 and 100. These instances are divided into two subsets called AB1 and AB2. The AB1 instances have the same characteristics as the EMH instances, whereas the AB2 instances have the same customers and refueling stations as those in AB1, but the vehicles are assumed to travel at a higher speed of v = 60 miles/h and have a maximum driving autonomy of 280 miles. Notice that the AB2 instances allow longer vehicle routes with respect to the EMH and AB1 instances due to the higher vehicle speed.
In this study we coded our MA algorithm in C++ and ran it on a PC with an AMD Athlon 3.0 GHz CPU and 2Gb RAM operating under the Windows 7 operating system. To evaluate the performance of MA, we compare it with the following state-of-the-art heuristics from the literature:

•
The adaptive variable neighborhood search (AVNS) [7]. Algorithm AVNS was evaluated on a desktop computer with an Intel Core i5 2.67 GHz processor with 4GB RAM, running Windows 7 Professional.  Table 1 presents the settings of the MA parameters used in the reported experiments. We tuned the parameters with the iterated F-race (IFR) proposed by Birattari et al. [21] and an automated configure method that is part of the IRACE package from [22]. We performed the tuning process on ten instances 111c_24s, 111c_26s, 111c_28s, 200c_21s, 250c_21s, 300c_21s, 350c_21s, 400c_21s, 450c_21s, and 500c_21s with 109-471 vertices. For each parameter, IFR requires a limited set of values as input to choose from the "candidate values" which are empirically determined and presented in Table 1. We set the total time budget for IRACE to 100 executions of MA, with a time limit of 60 minutes for the GVRP instances 111c_24s, 111c_26s, 111c_28s, 200c_21s, 250c_21s and 300c_21s, and 200 min for the GVRP instances 350c_21s, 400c_21s, 450c_21s and 500c_21s. We denote the parameters setting suggested by IFR as Final Value in Table 1. The parameter (the number of the candidate customers) in the initial population phase 1, 3, 5 3

Parameter Tuning
α The reaction factor that controls how quickly the score adjustment function reacts to changes according to the performance of the moves The reward parameter if a move produces a new best solution 1, 5, 10 5

Computational Results on the EMH Benchmark Instances
In this section we evaluate the performance of MA in solving GVRP in comparison with the best-performing algorithms (i.e., AVNS, MSH and MSLS). Specifically, we applied MA to solve each instance for ten times. As depicted in Tables 2 and 3, the columns n, s and Γ report for each instance the number of customers, the number of stations, and the number of arcs in G, respectively. The column under the heading CPLEX shows the results obtained with an improved version of the model implemented by Schneider et al. [3] and solved with CPLEX with a 3-h time limit. The cases in which a feasible solution could not be found within the time limit are tagged with a-sign results.
The following columns f best , Veh, f avg and Time report the best objective value, i.e., the minimum travelled distance, the number of vehicles corresponding to the best solution, the average objective value over the ten runs, and the computing time, respectively. In addition, the row #Best indicates the numbers of instances for which the associated algorithm obtains best objective values in terms of f best compared with the reference results reported in the literature, and the row denotes the average value over all the instances in the set. Table 2 presents the result on the 20-customer EMH instances. Considering this part of instances is small-scale with only 20 customers, we limit the computational time of the algorithm to 0.01 s for each instance. Note that for this instance set with the smallest scale, CPLEX solver cannot find solutions for four instances within the specified time limit (i.e., 3 h). Our proposed MA algorithm can obtain the best results for 38 out of 40 instances. The best and average objective values are better than the state-of-the-art MSLS algorithm (1635.42 vs. 1636.84, and 1636.94 vs. 1635.52). Table 3 shows the experimental results on the large EMH instances with 109-471 customers. From Table 3, we observe that the proposed MA algorithm outperforms the reference algorithms AVNS, MSH and MSLS in terms of the indicators f best and f avg . In particular, MA achieves better results than the current best-performing algorithm MSLS in terms of obtaining the best objective value for 9 out of the 12 instances, while only slightly worse for 3 instances. Moreover, MA is able to obtain better result in terms of f best and f avg than MSLS (10,301.37 vs. 10,303.43, and 10,316.86 vs. 10,319.86) within a relatively close computing time (49.65 m vs. 49.70 m). The experimental results based on the large EMH instances above show that our proposed MA algorithm is highly competitive in comparison with the best-performing algorithms in terms of solution quality and computational efficiency. Note that, we indicate the best objective values found by the corresponding algorithms in bold.

Computational Results on the AB Benchmark Instances
In view of its good performance, we further compare the MA algorithm with the state-of-the-art heuristic algorithm MSLS for the second instance set (i.e., AB instances). As shown in Table 4, the first two columns present the instance identifiers and the number of the customers, respectively. The following two columns (s and Γ) report the number of the number of refuel stations, and arcs in graph G. The best-known results found by all the previous algorithms proposed in the literature are presented in column BKS. The number of the vehicles (Veh*) corresponding to the best solution (BKS) found by the reference algorithms, and the lower bound for each AB instance are presented in the following two columns. The average gaps above the BKS values in percentages reported under columns %Best and %Avg are computed with respect to the best and average upper bound, respectively, obtained over 10 runs. As shown in Table 4, MA outperforms the state-of-the-art algorithm MSLS in terms of all the indicators (0.01 vs. 0.12, 0.30 vs. 0.32, 11.85 vs. 11.87, 123.99 s vs. 125.98 s). In particular, the MSLS algorithm can find the best-known solution for 24 instances out of 40 instances, while MA algorithm can obtain the best-known solutions for 37 instances, larger number of instances, out of 40 instances. To summarize, the results of the above extensive computational studies demonstrate the efficacy of MA for solving GVRP in terms of both solution quality and computational efficiency in comparison with the state-of-the-art algorithm.

Analysis on the Impact of the Adaptive Mechanism
To highlight the importance of the adaptive mechanism for managing the multiple neighborhood moves employed in local search phase, we carried out the following computational studies. Specifically, we use MA and its variant (called WMA) to denote the memetic algorithm based on the adaptive neighborhood moves management mechanism and the memetic algorithm without the adaptive mechanism, respectively. In other words, under WMA, the neighborhood moves are chosen in a token ring fashion (i.e., N1, . . . N8) with the same fixed probability without the reward and punishment strategy to select the neighborhood moves, while other ingredients are kept unchanged.
We selected 13 representative and large EMH instances for this test, and independently solved each instance for ten times by using MA and WMA. We report the computational results in Table 5, which shows the best objective value and average objective value, f best and f avg , respectively, for each instance, and the average computing time per successful run, i.e., Time, where we indicate the best objective values between the two algorithms in bold.  Table 5 shows that when the adaptive mechanism is used, MA outperforms WMA on five instances. In particular, in terms of the best and average objective values, MA obtains better values than WMA for all the tested instances, as illustrated by WMA's 10,304.86 and 10,322.12 versus MA's 10,301.67 and 10,316.81. As for the average running time, the two algorithms are very close to each other, i.e., WMA's 49.95 m versus MA's 46.64 m. The above results indicate that our adaptively selected neighborhood moves play a crucial role in boosting the performance of MA in solving GVRP.

Discussion
As mentioned above, we propose a novel memetic algorithm for solving the green vehicle routing problem. This population-based algorithm incorporates an adaptive local search procedure based on a reward and punishment mechanism inspired by reinforcement learning to effectively manage the multiple neighborhood moves and guide the search, an effective backbone-based crossover operator to generate the feasible child solutions, and a longest common subsequence-based population updating strategy to effectively manage the population. Compared with the traditional local search, the adaptive local search can manage the different neighborhood moves to adapt to different instances. Compared to the general crossover operator (such as one-point crossover operator), the backbone-based crossover can better integrate with the problem structure by inheriting the promising customer and station sequences from the routes. Based on the extensive experimental results reported earlier, one can observe that the proposed memetic algorithm is a highly effective heuristic in comparison with the best-performing methods in the literature for solving the green vehicle routing problem.

Conclusions and Future Research
Our proposed memetic algorithm (MA) for the green vehicle routing problem demonstrates the effectiveness of its key features, including an adaptive local search procedure, a backbone-based crossover operator to generate the feasible child solution, and a longest-common-subsequence-based population updating strategy.
Experimental evaluations on two sets of public benchmark instances show that our MA performs very favorably compared to the current state-of-the-art reference algorithms in the literature. In particular, MA is able to obtain highly competitive results in terms of both computational efficiency and solution quality for two sets of the EMH and AB instances. In addition, our computational studies demonstrate the effectiveness of the key strategy (i.e., adaptive mechanism to manage multiple neighborhood moves) incorporated into our proposed MA.
The main limitation of this research is the algorithmic generality. The algorithm presented in this paper does not guarantee the effectiveness of solving other types of GVRP problems, such as GVRP problems with the popular pickup and delivery constraints. Therefore, in order to solve other variant problems better, we need to consider adding more general strategies.
These outcomes motivate future research to extend our work in the following directions. First, it would be interesting to employ a powerful tabu search method (such as granular tabu search [23]) to improve the search capability of the adaptive local search phase. Second, the design of our approach implies that the development of related procedures that combine its strategies with those of other population-based frameworks like path-relinking [24] should be very promising. Finally, the success of these ideas for tackling the GVRP problem suggests that it would be worthwhile to test their performance in dealing with other variants of the vehicle routing problem [25] or scheduling problems [26].

Conflicts of Interest:
The authors declare no conflicts of interest.