A Novel Hyper-Heuristic for the Biobjective Regional Low-Carbon Location-Routing Problem with Multiple Constraints

With the aim of reducing cost, carbon emissions, and service periods and improving clients’ satisfaction with the logistics network, this paper investigates the optimization of a variant of the location-routing problem (LRP), namely the regional low-carbon LRP (RLCLRP), considering simultaneous pickup and delivery, hard time windows, and a heterogeneous fleet. In order to solve this problem, we construct a biobjective model for the RLCLRP with minimum total cost consisting of depot, vehicle rental, fuel consumption, carbon emission costs, and vehicle waiting time. This paper further proposes a novel hyper-heuristic (HH) method to tackle the biobjective model. The presented method applies a quantum-based approach as a high-level selection strategy and the great deluge, late acceptance, and environmental selection as the acceptance criteria. We examine the superior efficiency of the proposed approach and model by conducting numerical experiments using different instances. Additionally, several managerial insights are provided for logistics enterprises to plan and design a distribution network by extensively analyzing the effects of various domain parameters such as depot cost and location, client distribution, and fleet composition on key performance indicators including fuel consumption, carbon emissions, logistics costs, and travel distance and time.


Introduction
City logistics, particularly in the context of freight transportation, exert pressure on the economy, society, environment, and citizens [1].Among several tools of logistic network design, the traveling salesman problem (TSP) [2], the vehicle-routing problem (VRP) [3], and the location-routing problem (LRP) [4] are the most important and widely studied combinatorial optimization problems, especially the LRP.However, these basic tools are incapable of addressing sustainable development for the economy, society, and the environment simultaneously.Recently, premutation and coordination of the joint bond of economic, social, and environmental benefits have emerged as one of the most addressed problems.This inspires us to model the LRP by considering the environmental effect, which aims to reduce cost, shorten the service period, and reduce greenhouse gases (GHG) emissions, especially CO 2 .
Zhang et al. [5] first introduced the term low-carbon LRP (LCLRP) in 2017, in which the goal is to reduce carbon emission (CE) composed of depots' fixed CE and vehicles' traveling CE, by optimizing the sets of depots to open and tracings of the routes.Koc et al. [1] developed a regional LCLRP (RLCLRP) as a single objective to assess the effects of depots, clients, and fleets on logistics cost, fuel consumption and CE (FCCE).The authors applied a penalty function to the transfer fuel consumption cost.Our previous study [6] used the same method and considered a series of real-world requirements for the RLCLRP, such as hard time windows (HTW), simultaneous pickup and delivery (SPD), and a heterogeneous fleet (HF).This paper is based on the works of Koc et al. [1] and Leng et al. [6].Without a doubt, there are important differences among these three works.This paper defines a biobjective model for the RLCLRP whereas the others addressed a single objective.The solution approach in this paper is a quantum-inspired multiobjective hyper-heuristic (MOHH), while the approaches in our previous work and Koc et al. [1] were, respectively, single-objective hyper-heuristics and adaptive large neighborhood search metaheuristic.Besides, the real-world conditions in this paper are different those in the work of Koc et al. [1].The main contributions of this paper are as follows:

•
Problem domain.We first consider biobjective optimization for the RLCLRP with simultaneous pickup and delivery, hard time windows, and a heterogeneous fleet.

•
Problem model.A novel and simple mixed-integer programming model is defined.

•
Solution method.This work develops a quantum-inspired selection strategy (QS) as a high-level heuristic in the framework of MOHH.

•
Managerial insights.This paper provides several management suggestions by analyzing the effects of depots, clients, and fleet composition on key performance indicators, such as travel distance, travel time, CE, and operational cost.
The remainder of this work is organized as follows.Section 2 reviews the literature of the domain problem and the proposed heuristics.Section 3 defines the problem description and biobjective model.Section 4 details the proposed methods.Experiments and evaluation are given in Section 5, and, finally, conclusions are presented in Section 6.

Fuel Consumption and Carbon Emission Review
The LRP simultaneously handles two types of NP-hard decisions that are often addressed in logistics: the location-allocation problem (LAP) and the VRP [6,7].Among the various versions of the LRP in the literature, the LRP considering environmental effect (LCLRP) has gained many researchers' attention because environmental effects have a strong relationship to economic and social sustainability and even to public health.Similarly, the LCLRP is a more complicated NP-hard problem comprising of the LAP and the VRP with environmental effects, referred to as the pollution-routing problem (PRP) [8], green vehicle-routing problem (GVRP) [9], or eco-routing problem (ERP) [10].
In applications of the LCLRP, the most significant estimation is that of the amount of FCCE [11].As Demir et al. [12] stated, estimation and reduction of FCCE require accurate, real-time models to heighten the logistics efficiency.Also, the nature and types of various FCCE estimation models determine the application range, because transportation activity and traffic conditions are uncertain factors.As the public requires higher air quality, three types of models [13] have been developed:

•
In factors models, only few key factors are considered, such as travel distance and vehicle load.
Examples are the fuel consumption rate (FCR) [5,[14][15][16][17] and model by Tang et al. [18], In macroscopic models, average aggregate effecting factors are used to estimate the FCCE rate.Examples are the methodology for calculating transportation emissions and energy consumption (MEET) [19] and computer programs to calculate emissions from road transportation (COPERT) [20].
In terms of accuracy in estimating the FCCE rate, the microscopic models perform best, followed by macroscopic models, and the worst is the factor models.However, microscopic models need many more parameters than the other models and are the most complex.Moreover, factor models are simplified versions of microscopic ones, and the last two are interconvertible in traveling time.For state-of-the-art FCCE models, the reader is referred to the surveys and works of Lin et al. [11] and Demir et al. [12,13].

LCLRP Application Review
In recent years, environmental and public health increasingly call for sustainable development of environment, economy and society.As one of the first and foremost sources, low-carbon supply chain network design has been the focus of studies [25].As one of the most significant tools in supply chain, LRP should be bound to consider CE, which have been studied by many researchers.
Mohammadi et al. [26] considered multiple real-world conditions, such as stochastic demand and SPD, and developed multiobjective invasive weed optimization for the proposed model.Their results show higher efficiency than other multiobjective algorithms.Govindan et al. [27] took the two-echelon LCLRP with time windows into account for perishable food and proposed a multiobjective particle swarm optimization (MOPSO) for tackling the multiobjective model.Validi et al. [28] investigated a model of an integrated LCLRP for the demand side of a product distribution supply chain and a design of experiment-guided MOPSO optimization.In the same year, the authors [29] applied a technique for order of preference by similarity to create an ideal solution to rank the realistically feasible transportation routes resulting from the trade-offs between costs and CO 2 .Tang et al. [18] combined the LCLRP with an inventory that takes the client's environmental behaviors into account.They developed a MOPSO to examine the efficiency of the biobjective models.Qazvini et al. [30] studied the LCLRP with the time windows of clients and split-delivery by using the cost of fuel consumption as a constraint, and their results suggested efficiency and effectiveness of the proposed single-objective model.Toro et al. [31] presented a green LRP model, and the results shown that the proposed model can generate a set of tradeoff solutions and interesting conclusions about the operational costs and the environmental impact were obtained.Nakhjirkan and Rafiei [32] developed a genetic-based solution to tackle a biobjective model for the LCLRP with stochastic demand.Wang et al. [33] constructed a biobjective modelcomposed of logistics cost and CE.LINGO software was used to tackle this model.Wang and Li [34] considered a model for the LCLRP with a HF, SPD, and soft time windows, and used a penalty way to calculate the penalty cost consisting of FCCE and vehicle and clients waiting time.Rabbani et al. [35] developed a multiobjective model for the LCLRP with a HF.Among three objects, cost, distance and CE are viewed as contradictions.A nondominated sorting genetic algorithm-II (NSGA-II) was developed to tackle their three objects.Wang et al. [36] used the FCR model to calculate the FCCE and proposed a genetic algorithm to tackle a single-objective model for the LCLRP.Chen et al. [25] combined NSGA-II with tabu search (TS) to address the biobjective model of the LCLRP considering full truckloads.Qian et al. [37] applied the FCR model to the LCLRP and designed a TS combined with reinforcement learning as high-level tactics.Faraji and Afshar-Nadjafi [38] considered multiple conditions in their LCLRP, such as time windows, HF, multiple periods and products, and associated GA with simulated annealing to solve their biobjective model.
Looking at the above papers' FCCE models, only the one proposed by Toro et al. [31] used a microscopic model; three papers adopted the FCR factor model as their main FCCE model: Wang and Li [34], Chen et al. [25], and Qian et at.[37], which are similar to the works by Leng et al. [15], Zhao et al. [16] and Wang et al. [17].However, others used self-customized factor models.There were eight papers published in 2018 (including in literature [6,[15][16][17]), and six papers in 2017 (including Zhang et al. [5]), followed by three papers in 2016 (including Koc et al. [1]) since being proposed by Mohammadi et al. [26] in 2013.The above data suggest that the LCLRP will be a hot research topic.Also, multiobjective models account for 61.9% of the above papers.Most of these multiobjective models used evolutionary algorithms, such as MOPSO and NSGA-II.Lastly, the biobjective models in most of the papers consisted of logistics cost and CE, but a few papers adopted different approaches; for example, FCCE can be viewed as traveling cost by penalizing [1,6,16,34,36], constraint [30], or the single-objective without any cost [5,15,17].
Although this paper is based on the problem used by Koc et al. [1] and Leng et al. [6], to our knowledge, a biobjective model is first defined in our paper in terms of cost and vehicle waiting time.Moreover, simultaneous pickup and delivery, hard time windows, and a heterogeneous fleet are simultaneously addressed in our biobjective LRP model.Additionally, arc-specific speed and speed zones are also applied for the first time in the biobjective LCLRP.

Hyper-Heuristic Review
The hyper-heuristic (HH) approach has received increasing and renewed research interest toward developing a search method that is more applicable [39].Early, the raw ideal of HH was derived from Denzinger et al. [40], then basic rationale was developed by Cowling et al. [41] as "heuristics to choose heuristics", which has two meanings: (i) stated as "no free lunch" [42], existing theory and numerical experiments have shown that it is impossible to develop a single algorithm that is always efficient for a diverse set of optimization problems; and (ii) it helps researchers to reduce the inherent time and effort required to set up a new domain, as it is a difficult task for testers without deep prior domain-specific knowledge.Burke et al. [43] proposed an extensive definition, stating that "heuristics choose heuristics" and "heuristics generate heuristics."This paper investigates the former based on population search and presents a brief review hereafter.
There are two levels with different goals in the HH framework: high-level heuristics (HLHs) and low-level heuristic (LLHs).The former focuses on the selection strategy through supervising the performance of heuristics in the LLH for intelligently selecting the appropriate heuristics, and acceptance criterion through deciding to either accept or reject the resultant solutions [44,45].The difference between HLH and LLH is the search space: HLH manipulates the space consisting of a fixed pool of LLHs instead of directly transforming the space of solutions [6,46].Among various selection strategies, three modules can be distinguished: online, offline, and no-learning.The first takes place while the algorithm is performing and can exploit task-dependent local properties to decide on the right LLH to apply [47].Examples of online learning include the choice function (CF) [41], fitness rate rank based multi-armed bandit (FRR-MAB) [48], reinforcement learning [49], and metaheuristic-based selection strategies such as the genetic algorithm (GA) [50], and the quantum evolutionary algorithm (QEA) [15,16,51].Offline learning always applies forms of rules or programs to gather information by training a set of instances; learning classifier systems, case-based reasoning, and genetic programming are classic examples.The no-learning strategy does not use any feedback information from the search space.Examples are simple random (SR), random descent (RD), and random permutation (RP).Acceptance criteria fall into two categories: deterministic and nondeterministic acceptance.The former accepts the resultant solution by fitness or special rules, such as all moves (AM), only improved (OI), and improved and equal (IE).The nondeterministic criteria accept the resultant solution in forms of threshold or probability; examples of nondeterministic criteria in HH are great deluge acceptance (GDA) [52], late acceptance (LA) [53], and Monte Carlo (MC).The above strategies are the classic ones in single-objective optimization, especially in combinatorial optimization problems, such as the educational time-tabling problem [53], the VRP [39], and the LRP [6,[15][16][17].It is worth noting that only the nondomain information or knowledge can pass the knowledge barrier between HLH and LLH; this includes the fitness improve rate (FIR), a count of used operators, maximum iteration without improvement, elapsed time, and information of acceptance and selection.
In recent years, several researchers have investigated the application of MOHHs in many research fields and produced many strategies in that framework.Among the literatures about MOHH, two categories can be discerned: multiobjective evolutionary algorithm-based HH (MOEAs-based HH) and MOHH-II.The former investigates the selection of the LLH, which consists of local search, crossovers, or mutation operators implemented in the frameworks of MOEAs; the latter utilizes the metaheuristics as LLHs, such as strengthen Pareto evolutionary algorithm2 (SPEA2), NSGA-II, and multiobjective genetic algorithm (MOGA).Given this, the acceptance mechanisms of the former always depend on the metaheuristics' environmental selection, such as the crowding distance and nondominated sorting in NSGA-II, and the latter always applies the acceptance criteria inspired from single-objective HH, such as AM, GDA, and LA.Several literatures about MOHH are provided as follows: • MOHH-I: MOEAs-based HH Hitomi and Selva [54] proposed a classification that groups credit assignment strategies by the sets of solutions used to assess an operator's impact and by the fitness function used to compare those sets of solutions.MOEAs-based HHs were developed for a series of benchmarks.Ferreira et al. [55] utilized MOHHs to solve software product line testing.They integrated of the HH component, credit assignment, and selection method (i.e., random and upper confidence bound) into MOEAs, that is, NSGA-II, SPEA2, indicator-based evolutionary algorithm (IBEA), and MOEA based on decomposition (MOEA/D).Guizzo et al. [56] designed their MOHH to tackle integration and test ordering in Google Guava by inserting it MOHH into the MOEAs framework to evaluate the operators' performance through CF and MAB.Yao et al. [57] proposed a MOHH framework for walking route planning in a smart city and a reinforcement learning mechanism was established to select the LLH.Xu et al. [58] used a genetic-based MOHH to tackle multiobjective mapping for network-on-chip.In their MOHH, HLH selects suitable operators through a 'reward' mechanism.Castro et al. [59] integrated HH into MOPSO to select a proper combination of leader and archiving methods, and four selection strategies (i.e., CF, MAB, SR and roulette wheel) were designed.Zhang et al. [60] applied extreme value credit to reward operators and probability matching to select suitable operators.
• MOHH-II: MOEAs as LLH Maashi et al. [61] presented a learning selection choice function-based MOHH (MOHH-CF) to choose proper LLHs (i.e., NSGAII, SPEA2 and MOGA) and applied AM as acceptance criterion.Next year, Maashi et al. [62] developed GDA and LA to accept/reject the solutions for MOHH-CF.In 2017, Li et al. [63] also applied this framework to solve a real-word problem.In their framework, three selection strategies (random choice, fixed sequence and CF) and two acceptance criterions (GDA and Best acceptance) were developed.
However, no other researchers (exception of Qian et al. [37]) have yet developed a MOHH for the LRP or even the VRP for the following reasons: (i) Such algorithms are much more time consuming to execute than others.(ii) No high-efficiency operators have been developed because domain knowledge is difficult for testers to compile.(iii) It is not easy to obtain promising solutions because the VRP, LRP, and their variants are NP-hard problems and much more complicated than others.Therefore, we devote this paper to developing MOHHs for tackling the LCLRP and designing efficient operators for testers.In addition, we present a novel selection strategy based on the QEA.Our results demonstrate that our MOHH can efficiently tackle the proposed problem and provide competitive Pareto solutions.To the best of our knowledge, MOHH has not been used thus far to address the proposed biobjective problems.

Mathematical Model
The adopted FCCE model is the CMEM; for the corresponding settings see Refs.[1,6].The description, assumptions and definition of the proposed problem will be given in Section 3.1.Section 3.2 provides the corresponding mathematical model and pivotal constraints for the proposed problem.Finally, Section 3.3 provides some extra feasible restrictions on the LCLRP.

Description and Assumptions of the Proposed Problem
The proposed problem can be defined by a complete and directed graph G = (V, E) with a set V of nodes and a set E of edges.V consists of a set of clients C = {1, 2, . . ., N} and a set of depots D = {1, 2, . . ., M}.Each client i∈C has d i delivery demand and p i pickup demand under a known and deterministic location and a specific service time window (e i , l i ).Each depot j∈D has a fixed capacity CD j and fixed fee FD j .In each depot j∈D, three types of vehicles are available, denoted as the fleet set F = {L1, L2, M}, and each vehicle f ∈F has a capacity CV f and a one-time rent fee FV f .E = {(i,j): i,j∈V, i =j}\{(i,j): i,j∈D } consists of traveling distance and speed.The goal is to identify the set of depots to open and tracings of routes by minimizing the costs and vehicle waiting time.
The following assumptions are made: (1) each client must be satisfied only once; (2) each vehicle must return to the departure depot; (3) the load of each vehicle in each edge must not exceed its capacity; (4) the load of each depot must be less than its capacity; (5) the number of each type of vehicle is unlimited; (6) the maximum load of a route is the principle for selecting the type of vehicle; (7) the vehicle must arrive at client's nodes before the closing time window; (8) if a vehicle arrives before the opening time window, it waits until the opening time windows.

Mathematical Model of the Proposed Problem
Before developing the mathematical model, we present the following additional definitions of decision variables: (1) x ijf = 1 if a vehicle of type f travels from node i∈V to j∈V, and x ijf = 0, otherwise; (2) y j = 1 if the depot j∈D is used, and y j = 0, otherwise; and (3) z ij = 1 if depot j∈D services client i∈C, and z ij = 0, otherwise.
A biobjective mathematical model of the proposed problem is given as follows: where c fc (rmb/L) and c ce (rmb/kg) are, respectively, the price of 1 L of FC and 1 kg of CE; FC ij and CE ij are FC (in liters) and CE (in kilograms), respectively; AT j is the moment the vehicle arriving at node j∈V.The objective function Cost is minimized in terms of the total logistics cost consisting of four parts: depots cost, vehicle cost, FC cost, and CE cost.
The following constraints ensure that each client is served exactly once: Constraints ( 5) and ( 6) guarantee that each client is assigned to only one depot and one vehicle, respectively: Constraints ( 7)-( 9) forbid infeasible routes that do not return to the departure depot: Karaoglan et al. [64] had proved the validity of the above constraints.Constraint (10) enforces that total delivery and pickup demand of clients assigned to each depot must not exceed its capacity: Constraints (11) and ( 12) are the vehicle load at departure and termination depot: where Q ijf is the dynamic load of a vehicle type f ∈F.Constraint ( 13) is the dynamic equilibrium of the load after visiting client j∈C: Constraint (14) ensures that load of the vehicle type f ∈F traveling at each edge must be less than its capacity.Constraint (15) is the additional bound on the load variable.
Constraints ( 16) and ( 17) represent the bounds on the load of each depot: Namely, the total load of vehicles departing/arriving at depots must be equal to the total delivery/pickup demand of clients assigned to it.Constraints (18) and (19) is the bounds on the arrival time at each node, namely, the moment of a vehicle must arrive at a client before closing time windows: where AT i and ST i are, respectively, arriving time and service time; TT ij is travel time between nodes i and j.

Other Vaild Constraints
The above mathematical model and constraints shown by ( 3)- (19) are valid for the LCLRP with SPD, a HF, and a HTW.Other valid restrictions can also be provided but are not necessary.
The first inequality we need to add as a constraint is a special case of classical subtour elimination constraints: where S is the set of clients serviced by a vehicle type f ∈F.Constraint (20) breaks the subtours in the route.The second valid inequality is the actual count of vehicles uses: Constraint ( 21) imposes the number of routes; namely, the utilization number of vehicles must be larger than the value that is equal to total demand of clients dividing the maximum fleet capacity and cannot exceed the number of clients.The third inequalities, stipulate that at least one client must be assigned to the selected depot and the client cannot be serviced by an unselected depot.The following constraints ensure that each vehicle must be assigned to the open depots and the opened depots must serve at least one route: Constraint (26) ensures that the number of opened depots must be larger than the minimum number of k opened depots determined by the constraint Constraint (27) imposes that different depots in a single route are forbidden:

Proposed Hyper-Heuristic
This section describes the proposed MOHH combined with a novel selection strategy: a quantum-inspired selection strategy.In the proposed framework, a pool of metaheuristics for scheduling, that is, SPEA2 [65], NSGA-II [66], nondominated sorting and local search (NSLS) [67], and bi-goal evolution (BiGE) [68], that has corresponding expertise, is developed as LLH.At high-level, a quantum-inspired learning strategy for the multiobjective problem is used to select promising metaheuristics and to maintain the diversity of choice.
The proposed MOHH is a population-based method.Each independent run begins with initialization, including solution and heuristic space initialization.In each iteration, the proposed strategy adaptively selects a metaheuristic to search the solution space for a number of generations.After each iteration, the metaheuristic space is updated according to a quantum-inspired selection strategy and an appropriate metaheuristic is obtained via an observation mechanism.The maximum number of iterations is the basis of the stopping criterion.In the following section, we describe the proposed MOHH from the following aspects: (1) chromosome representation, (2) basic general-duty framework of metaheuristics for multiobjective RLCLRP, (3) problem-domain operators applied in each metaheuristic, (4) quantum-inspired-learning high-level selection strategy, (5) three nondeterministic acceptance criteria, and (6) a framework of MOHH.

Chromosome Representation
One significant decision in designing a metaheuristic is how to represent and relate the solutions to the search space efficiently [26].In this paper, we use a simple, intuitive, and competent representation, which was proposed by Leng et al. [6,15] and Zhao et al. [16].Given the multiobjective problem, we further modified the chromosome representation for the multiobjective RLCLRP.Each solution contains a set of routes, denoted as V = {v 1 , v 2 , . . ., v ϕ }, where v i is a vehicle traveling route consisting of the service sequence of clients and attributive depot inserted in two ends of it.All routings are kept in a subcell array.It is worth noting that each route must satisfy all constraints (3)-( 19) to avoid restoring feasibility for solutions and decoding.To calculate the fitness of each solution quickly, corresponding properties of each route are also stored in its subcell, including the departing and returning loads of the vehicle, the type of vehicle and two objectives concerning cost and vehicle waiting time.The calculation we need to do is to obtain the cost fitness of an individual by simply summing the FCCE, opened depots, and vehicle costs.For the second objective, the vehicle waiting time of each route can be used directly.
To clarify the chromosome representation, Figure 1  To clarify the chromosome representation, Figure 1 provides a simple example of a solution with 15 clients, four depots, and three vehicle types.The left portion shows four vehicle traveling routes, and the right portion shows the properties of each route with the type of vehicle in the last position.All individuals are generated by randomly sorting the sequence of clients and applying a greedy rule to partition clients into sets assigned to each vehicle, and then each route is assigned to the depot that can service those clients in this routing.

Pool of Low-Level Heuristics
In this paper, we focus on MOHH-II by using four efficient MOEAs: SPEA2, NSGA-II, NSLS, and BiGE.A general framework for the RLCLRP is developed as shown in Algorithm 1. Mpop is the mutational population and Cpop is the population after performing local search operators.Line 1 is used to generate the necessary parameters if needed.Lines 2-21 are the iterations for obtaining the Pareto solutions.First, one of the mutation operators is randomly selected to perturb the chosen individual in the population (by tournament, if needed), then the results are merged into Mpop.Then we merge the newly generated solutions with the remaining individuals without mutating into the next phase if the number of mutational individuals is less than N.After that, one of the local search operators is randomly applied to improve each individual in Mpop, and the new population CPop is obtained.Lines 18 and 19 show the basic difference among four MOEAs, as the environmental selection is an extremely significant part of maintaining the trade-off between exploration and

Pool of Low-Level Heuristics
In this paper, we focus on MOHH-II by using four efficient MOEAs: SPEA2, NSGA-II, NSLS, and BiGE.A general framework for the RLCLRP is developed as shown in Algorithm 1. Mpop is the mutational population and Cpop is the population after performing local search operators.Line 1 is used to generate the necessary parameters if needed.Lines 2-21 are the iterations for obtaining the Pareto solutions.First, one of the mutation operators is randomly selected to perturb the chosen individual in the population (by tournament, if needed), then the results are merged into Mpop.Then we merge the newly generated solutions with the remaining individuals without mutating into the next phase if the number of mutational individuals is less than N.After that, one of the local search operators is randomly applied to improve each individual in Mpop, and the new population CPop is obtained.Lines 18 and 19 show the basic difference among four MOEAs, as the environmental selection is an extremely significant part of maintaining the trade-off between exploration and exploitation.A stopping criterion is designed as the maximum number of generations, namely max gen .In the following, it is important to decide how to define the operators.Leng et al. [6] provided 16 domain knowledge operators for the single-objective RLCLRP, and in this paper, we adopt and modify those operators to tackle the multiobjective RLCLRP, as follows.Among the operators in this paper, three types can be distinguished according to their role in search space: mutational operators (Mu), nondominated local search operators (NDLS), and dominating local search operators (DLS).Among the first ones, although insufficient for achieving competitive results, they are usually helpful for providing randomization in the search for a global optimum by performing simple and slight moves, as follows: • Mu 1: Add.Diversifies the selected depots by opening a new one and randomly assigning between one-third and two-thirds of the routes to it [7], or closes one depot and all of its routes are assigned to another depot [6].
• Mu 2: Decomposition.Routes with vehicles of type L2 and M are randomly (0.5) decomposed into two vehicles with a lower load.• Mu 3: Interchange.One-third to two-thirds of routes with at least two clients are randomly selected to swap one client with another.• Mu 4: Shift.Shifts a client into any place of another route.
The second module is developed for quickly obtaining nondominated or solution-dominating parent, as follows: • NDLS 1: 2-Opt.Exchanges two edges between different routes.
• NDLS 3: Swap of a segment.Swaps two or three clients between different routes.
• NDLS 5: Relocation of a segment.Relocates two or three clients into another route.
• NDLS 6: Decomposition 2. Decomposes all routes with at least two clients into two vehicles.
For DLS, five operators are proposed to obtain better solutions, namely results dominating parent, and the operators are developed by modifying constraints in NDLS 1-5.For the above operators, a key aspect is how to construct the implementation rules.To reduce computing time, we developed a rule whereby, if two groups of routes are obtained by randomly classifying them, then only the routes from the different group can be operated mutually; this rule is implemented in NDLS 1-5, DLS 1-5 and Mu 4. It is worth noting that, as long as restrictions ( 3)-( 19) are satisfied, the moves would be implemented, because no repair methods are needed and computing time is reduced.

High-Level Control Stratrgy
Among the operators in this paper, three types can be distinguished according to their role in search space: mutational operators (Mu), nondominated local search operators (NDLS), and dominating local search operators (DLS).Among the first ones, although insufficient for achieving competitive results, they are usually helpful for providing randomization in the search for a global optimum by performing simple and slight moves, as follows: • Mu 1: Add.Diversifies the selected depots by opening a new one and randomly assigning between one-third and two-thirds of the routes to it [7], or closes one depot and all of its routes are assigned to another depot [6].For DLS, five operators are proposed to obtain better solutions, namely results dominating parent, and the operators are developed by modifying constraints in NDLS 1-5.For the above operators, a key aspect is how to construct the implementation rules.To reduce computing time, we developed a rule whereby, if two groups of routes are obtained by randomly classifying them, then only the routes from the different group can be operated mutually; this rule is implemented in NDLS 1-5, DLS 1-5 and Mu 4. It is worth noting that, as long as restrictions (3)-( 19) are satisfied, the moves would be implemented, because no repair methods are needed and computing time is reduced.

High-Level Control Stratrgy
The HLH of the proposed MOHH framework is developed to select an appropriate heuristic to search the solution space and judge the acceptance of the resultant solutions.In the dynamical search environment, the performance of an LLH in a very early stage may be irrelevant to its current performance.Therefore, the HLH should be capable of quickly and accurately reflecting the performance of each heuristic and choosing the most appreciate LLHs to escape the phase optimal.Resolving this issue entails a design a strategy that can accurately monitor and track the performance of each LLH, that is, the chosen probability of each heuristic should be updated based on its real-time performance.An acceptance criterion in the HLH is used to decide whether or not to accept the resultant solutions.The convergence speed and direction, diversity, and strength of hyper-heuristics, to some extent, are determined by an acceptance criterion.Similarly, the design of an acceptance criterion in the MOHH is a different key technology.

Proposed High-Level Selection Strategy
In quantum computing, quantum bits and the superpositions of states make up the basic concept and principle.Instead of using binary and numeric representation, the qubit chromosome is used as a probabilistic representation and a qubit individual is modeled by a string of qubits.
A qubit is the smallest information storing unit; it may be in the 1 or 0 state, or in any superposition of these, as shown in Equation (28).During the update process, a q-gate is used to change the qubit to a more reasonable state, which is similar to the process of the LLH selected probability.In this paper, we use qubit to represent the LLH selection.Given ζ heuristics, a qubit individual with a string of |ζ| qubits can be expressed as follows: where llh i = [α i ,β i ] T is the qubit for the i-th LLH and α i and β i are the complex numbers.The parameters |α| 2 (|β| 2 ) are, respectively, the probability that a qubit will be found in the 0 (1) states (i.e., the probability of heuristic disuse (selected) states).Constraints should be secured between α and β, namely, α 2 + β 2 = 1.Once the beginning of the search, α 2 is equivalent to β 2 , namely, α 2 = β 2 = 0.5.The q-gate is an evolutionary learning strategy used to update the state of each qubit of individual and heuristics space, as given in where µ and θ are, respectively, the rotation direction and angle each qubit towards to 0 or 1 state.
In the following description, we present a strategy for calculating µ and θ by using the D-metric [62,69], which is an extended version of the hypervolume (HV) [62,69] used to measure the covered space between Pareto solutions obtained and a reference point.The description of the D-metric is depicted in Figure 2 [62,69].

( ) ( )
where μ and θ are, respectively, the rotation direction and angle each qubit towards to 0 or 1 state.
In the following description, we present a strategy for calculating μ and θ by using the D-metric [62,69], which is an extended version of the hypervolume (HV) [62,69] used to measure the covered space between Pareto solutions obtained and a reference point.The description of the D-metric is depicted in Figure 2 [62,69].In Figure 2, w 12 is the space covered by front 1 and not covered by front 2, and w 21 represents the size of the space dominated by front 2 and not dominated by front 1.The value γ is the intersection space simultaneously covered by fronts 1 and 2, and the mathematical relations can be provided as where ϑ(F 1 + F 2 ) is the union space covered by fronts 1 and 2 and ϑ(F 1 ) and ϑ(F 2 ) are, respectively, the space covered by fronts 1 and 2. We can further deduce that where D(F 1 , F 2 ) represents w 12 .As stated by Zitzler [69], if w 12 = 0 and w 21 > 0, then front 2 dominates front 1. Maashi et al. [62] further slacked the dominance relation: if w 21 > w 12 , then front 2 dominates front 1.Therefore, the former is a strict version of the latter.To use the dominance relation, we determine the rotation direction by the following equation: If we assume that front 1 is the resultant front and that front 2 is the parent front, after determining the µ, the most significant task is to calculate the rotation angle.In this paper, we use multi-indicators to calculate the reward values, such as the HV, spacing, and nondominated relation.The dominance relation between F c and F p can be expressed as where f c and f p are, respectively, a child solution of F c and a parent of F p ; HV and S are, respectively, the hypervolume and spacing value after standardization by dividing the maximum value in the union set of F c and F p (which effectively standardizes the magnitude size of HV and S).HV can represent the convergence and distributivity of F c .S denotes the uniformity of F c .Therefore, can efficiently describe the characteristics of F c and provide the performance indicators for LLH.
The heuristic qubit associated with LLH performance can then be generated by where pf (lh) is the performance score of the lh th heuristic and Q is a constant ensuring that the θ value range is [-0.05π,0.05π].
In each iteration, a metaheuristic from a pool of heuristics is chosen by roulette selection for the population.In the roulette selection, the probability of each LLH is Algorithm 2 calculates the reward, including two aspects involved in the quantum-inspired selection strategy: calculation of the rotation direction and angle.
Algorithm 3 is the quantum-inspired learning selection, involving two aspects: a q-gate update process and selection of low-level metaheuristics.inspired learning selection, involving two aspects: a q-gate update process and selection of low-level metaheuristics.
Algorithm 3. Quantum-inspired learning selection strategy

Proposed High-Level Acceptance Criterion
First, we introduce two acceptance criteria, GDA and LA, which were used in Mashael et al. [62] to solve multiobjective continuous problems (i.e., Walking Fish Group and the crashworthiness test suite).Two significant parameters are involved: Level and S. Level is the water level initially assigned by D(Ppop, Cpop), and S is the prefixed speed rate.According to the GDA rule, the child population with D(Cpop, Ppop) > Level can replace the parent population.Once the child population is accepted, the Level value will be added to S. As the Level increases, a greater difference between child and parent populations is required to accept the child population than previous iterations.

Proposed High-Level Acceptance Criterion
First, we introduce two acceptance criteria, GDA and LA, which were used in Mashael et al. [62] to solve multiobjective continuous problems (i.e., Walking Fish Group and the crashworthiness test suite).Two significant parameters are involved: Level and S. Level is the water level initially assigned by D(Ppop, Cpop), and S is the prefixed speed rate.According to the GDA rule, the child population with D(Cpop, Ppop) > Level can replace the parent population.Once the child population is accepted, the Level value will be added to S. As the Level increases, a greater difference between child and parent populations is required to accept the child population than previous iterations.
LA is another acceptance criterion using the D-matrix.In the LA mechanism, a list C of length L is used to record D(Cpop, Ppop) and is initialized to D(Ppop, Cpop) at the first iteration.In the acceptance mechanism, if D(Cpop, Ppop) > C l or D(Cpop, Ppop) > D(Ppop, Cpop) is satisfied, then the resultant solutions are accepted.However, if the above constraints are declined, then C l is set to the D-matrix of the pair of the front when Ppop was accepted previously.Compared to GDA, LA is the slacker version.Mashael et al. [62] illustrated that GDA is much more efficient than LA.In this paper, we provide another acceptance criterion based on the metaheuristic environmental selection, namely the nondominated sorting and crowded distance of NSGA-II (AC-NDSCD).The major reasons for introducing this criterion are as follows: (i) NSGA-II has been a mainstream MOEA, and the performance of its environmental selection is comparably better than others; and (ii) the proposed acceptance mechanism is irrelevant to domain knowledge (i.e., LCLRP).Unquestionably, many better environmental selections from other MOEAs can be used as the acceptance criterion, but the main idea is to illustrate that environmental selections from MOEAs are also utilized as acceptance mechanisms.Our experimental results show that our proposed AC-NDSCD outperforms GDA and has performance similar to that of LA.

Framework of the Proposed Hyper-Heuristic
Algorithm 4 is the framework of the quantum-inspired hyper-heuristic, which provides one part outside main loop and four parts insides main loop.The algorithm begins initialization by setting parameters and generating a population made of feasible random chromosomes, stopping when a maximum number of iterations is reached, max iter .

Framework of the Proposed Hyper-Heuristic
Algorithm 4 is the framework of the quantum-inspired hyper-heuristic, which provides one part outside main loop and four parts insides main loop.The algorithm begins initialization by setting parameters and generating a population made of feasible random chromosomes, stopping when a maximum number of iterations is reached, maxiter.
In each iteration of the main loop, a metaheuristic (MOEA) is chosen by the quantum-inspired selection strategy, then the selected MOEA is applied to generate the child population (Cpop).Finally, the acceptance mechanism is used to maintain exploration and exploitation of Pop into the next iteration by GDA, LA, and AC-NDSCD.Archived methods can also be applied in the proposed algorithm; as Zhang et al. [70] stated, an external approach can always obtain better solutions.In this paper, we save 5 × N archived individuals in each iteration.
The algorithm ends when the main loop stops, returning the Pareto solutions.For the sake of fairness, the archive method is removed when compared with others without the archive approach.
In each iteration of the main loop, a metaheuristic (MOEA) is chosen by the quantum-inspired selection strategy, then the selected MOEA is applied to generate the child population (Cpop).Finally, the acceptance mechanism is used to maintain exploration and exploitation of Pop into the next iteration by GDA, LA, and AC-NDSCD.Archived methods can also be applied in the proposed algorithm; as Zhang et al. [70] stated, an external approach can always obtain better solutions.In this paper, we save 5 × N archived individuals in each iteration.
The algorithm ends when the main loop stops, returning the Pareto solutions.For the sake of fairness, the archive method is removed when compared with others without the archive approach.

Computational Experiments and Analyses
Implementation aspects and evaluations of the proposed mathematical model and hyper-heuristic are presented and discussed in the following sections.

Implementation Aspects and Configuration of Parameters
The proposed hyper-heuristic was coded in parallel in MATLAB 2018a and results were achieved using a 4.0 GHz Intel Core i7-6700K CPU with 12 GB of RAM and running Windows 10.
The parameters of the proposed hyper-heuristic were obtained empirically.We conducted an empirical evolution with 10 runs for each parameter set in the hyper-heuristic.In this evaluation, we tested different parameter settings to obtain the solutions and found that the following were the most suitable.It is worth noting that we prefer to use the parameter configuration found in the literature.
The maximum iteration (max iter ) was set to 100, and the maximum generation (max gen ) depended directly on the number of clients (N) as follows: The multiplier α depended on the size of the instance, taking the values 30 for 20 clients, 38 for 30 clients, 45 for 40 clients, and 53 for 50 clients.In other words, the total iterations were 2000 for 20 clients, 3800 for 30 clients, 4500 for 40 clients, and 5300 for 50 clients.The number of individuals in the population and archive population were, respectively, set to 100 and 500.
We also evaluated the range of mutation probability from 0.1 to 0.5.The value was set at 0.35.The reasoning is that the larger the value, the more mutational individuals (Mpop) perform local search, and the more time-consuming it becomes, and the fewer individuals from Ppop there are, the less intensified the search for good solutions, and vice versa.
The configuration parameters used in GDA and LA follow the default values suggested in Maashi et al. [62]: rain speed (S = 0.0003) and list length (L = 5).
We identified the following points that can be threats to the validity of results: (i) The parameter settings may affect the results, as they may change with the characteristics of instances; and (ii) the experiments were conducted using only four datasets (20, 30, 40 and 50 clients) and tested over 10 independent runs, and it is necessary to execute more experiments with a broader range of datasets and larger independent runs to see whether similar behavior is maintained.

Test Instances
Because instances of the proposed biobjective problem are lacking and the problem is first studied in this paper, the test instances are randomly generated and based on the purposes of the experiments.The logistics distributed field is partitioned into three speed zones (1, 2, and 3) as follows.The speed of each zone was set to S zone1 ~U (20,40), S zone2 ~U (40,60) and S zone3 ~U(60, 80) km/h, and the size of each zone was stochastically conducted, and the nested nature was maintained.All clients and depots were located in a 10×10 km 2 grid.Three sets of instances were generated: Verification of Efficiency (EFFVER), Locations of Depots, and Clients (LDC), and Large Fleet (LFLEET).The first set was utilized to verify the efficiency of the proposed algorithm (Section 5.4) and the mathematical model (Section 5.5).The second set was used to detect the joint effect of the locations of depots and clients; the last set was used to analyze the effect of fleet composition.Among the above instances, the cost of fuel consumption (c fc ) was set to 6.708 RMB/L and the cost of CE (c ce ) had a value 32.09 RMB/ton.Time windows of the clients were randomly selected from the instance C101 proposed by Solomon et al. [71] and modified by dividing by 10.The service time of each client depends on the total demand (delivery and pickup) and the maximum for all clients was set to 540 s; the pickup and delivery demands of all clients obeyed a uniform distribution in the range 100-2000 kg.The cost of depots in zone 1, zone 2, and zone 3 was, respectively, 500, 300 and 200 RMB.The capacity of each depot also obeys a uniform distribution from in the range [20,25] tons; The cost of fleet composition was set to 38, 44, and 54 RMB for L1, L2, and M, respectively.The cost was used as one-time task instead of for one day, and other parameters were obtained from Koc et al. [1].

Performance Metrics
As stated by Li et al. [72], the assessment of a MOEA is mainly entails (i) the proximity to the real Pareto front (PF) (the closer, the better) and (ii) the diversity of solutions (the wider the front is, the better).Taking into account these evaluation criteria, we utilized two widely used metrics to evaluate the convergence, uniformity, and diversity of the approximate front (AF): inverted generational distance (IGD) [62,72,73] and HV [62,69,74].
The IGD value represents the quality and uniformity of AF, and the lower the IGD value, the better the quality and uniformity of AF for approximating the entire PF.Meanwhile, the HV value can express the distribution and diversity of AF, and the larger the HV value, the better the distribution and diversity of AF.
It is worth noting that both IGD and HV are calculated after normalization.Also, in our case, the true Pareto solutions are unknown.For this reason, and following the literature [69,74], PF ture was obtained by all approximated Pareto solutions achieved with all methods over 10 runs after removing dominated and repeated solutions, which is, in fact, an approximation of the real front.

Efficiency of the Proposed Hyper-heuristics
To compare the proposed algorithm with existing MOEAs, we conducted experiments on the four LLHs and another four MOEAs that performed well in solving multiobjective continuous optimization problem: grid-based evolutionary algorithm (GrEA) [75], IBEA [76], NSGA-III [77], and region-based selection in evolutionary-II (PESA-II) [78], which are also the LLHs of MOHH/QS2.The results of the 10 approaches are provided in Tables 1 and 2, which show the mean IGD (Table 1) and HV (Table 2) values obtained by performing 10 runs, where the last row is the number of first, second, and third places in each instance, abbreviated as f/s/t.In each table, the last two columns provide the results obtained by MOHH/QS and MOHH/QS2, and the former used BiGE, NSGA-II, NSLS, and SPEA2 as LLHs.From Table 1, we find that MOHH/QS obtained 13 best mean IGD values in 16 instances and three third places, while MOHH/QS2 obtained only one best mean IGD value, three second places, and one third place.SPEA2, NSLS, NSGA-II, and BiGE are the top in eight MOEAs, while the other four MOEAs obtained only a few beneficial results.From Table 2, we see that MOHH/QS obtained nine best mean HV values, four second places, and two third places, and MOHH/QS2 also performed well by obtaining five best mean HV values, four second places, and two third places.NSGA-II, SPEA2, IBEA, and NSGA-III performed better than others in terms of the total number of places in each instance.
The following conclusions can be drawn from the results in Tables 1 and 2: (i) A combination of MOEAs can efficiently improve their performance, as both MOHH/QS and MOHH/QS2 outperformed the others in both mean IGD and HV values; (ii) LLHs have a huge impact on the performance of methods even though they shared the same high-level strategy (QS+GDA), because MOHH/QS outperformed MOHH/QS2; therefore, testers should select some outstanding operators as LLHs to improve their algorithm's performance; and (iii) the proposed MOHH method is meaningful for solving the proposed problems.Given the performance of SPEA2, NSLS, NSGA-II, and BiGE, the following experiments were tackled by using MOHH with the above metaheuristics as LLHs.

Efficiency of Nine Pairs in MOHH
In this section, we compare nine pairs developed in MOHH; in other words, the SR, FRR-MAB (FM), CF, and QS are selected as the high-level selection strategies and GDA, LA, and AC-NDSCD are utilized as the high-level acceptance criteria.The EFFVER set is also used as instances to evaluate the performance of each pair, and the results obtained for the nine pairs are given in Tables 3 and 4. Looking at Table 3, QS/LA achieves five best mean IGD values, four second places, and two third places, and QS/AC-NDSCD performs well in 16 instances by achieving three best mean IGD values, five second places, and two third places.These two proposed approaches are the best two algorithms.For the third version of the proposed selection strategy, QS/GDA seems to be better than FM/GDA and worse than SR/GDA and CF/GDA.For all versions applying LA, QS/LA was the best one among SR/LA, FM/LA, and CF/LA.For the versions applying QS, QS/LA was also the best, and QS/AC-NDSCD was in the second place (which is also the second place in nine pairs), and QS/GDA was the worst among the three versions.However, the above conclusion on QS/LA and QS/AC-NDSCD seems to differ in Table 4, as the first place in nine pairs is QS/AC-NDSCD instead of QS/LA, and QS/LA is the second place in nine pairs.Meanwhile, the third version of QS (QS/GDA) also performed well in nine pairs and was better than all variants of GDA.For versions of QS, AC-NDSCD performed the best, and QS/LA was better than QS/GDA.
Therefore, the following conclusions can be made: (i) the selection strategy is one of the most significant parts in hyper-heuristics, and different selection strategies can, more or less, improve performance, but a poor selection strategy can pull down the performance of algorithms.The proposed selection strategy outperforms the others (SR, FM, and CF) combined with GDA and LA (except for QS/GDA in mean IGD values); and (ii) the acceptance criterion is also one of the most important components; both LA and AC-NDSCD perform well in most instances and were better than GDA, which was different from the conclusion of Maashi et al. [62].Therefore, the environmental selections are also the alternative acceptance criteria in the framework of hyper-heuristics.In the following experiments, we randomly selected one of the acceptance criteria (LA and AC-NDSCD) and utilized QS as the selection strategy.

Efficiency of the Proposed Mathematical Model
This section provides an evaluation of the efficiency of the proposed model by comparing the Pareto front of three models: RLCLRP/FCCE, RLCLRP/TD, and RLCLRP/TT.The first applies FCCE cost as the travel cost, while the second uses travel distance (TD) in kilometers as the travel cost, and the last one utilizes the travel time (TT) in minutes as the travel cost.To compare the Pareto front of each model, Pareto solutions of RLCLRP/TD and RLCLRP/TT were recalculated in the context of RLCLRP/FCCE to obtain comparable Pareto solutions; in other words, the proposed model was used to calculate the total cost (TC) and vehicle waiting time (VWT) for Pareto solutions obtained by RLCLRP/TD and RLCLRP/TT.The test instances were the same as the above set (EFFVER), and the mean IGD and HV values are listed in Table 5.
Looking at Table 5, the Pareto solutions of RLCLRP/FCCE dominate all Pareto solutions obtained by RLCLRP/TD and RLCLRP/TT in L20-3, L30-1, L30-2, L30-4, L40-2, and L50-1, and dominate most Pareto solutions obtained by RLCLRP/TD and RLCLRP/TT in other instances.Meanwhile, the mean IGD values of RLCLRP/FCCE are the smallest in each instance and are, respectively, reduced by 97.69% and 98.91% on average when compared with the mean IGD values obtained by RLCLRP/TD and RLCLRP/TT.The mean HV values of RLCLRP/FCCE are the largest for each instance (except for L20-2) and are, respectively, increased by 4.98% and 22.38% on average when compared with the mean HV values of RLCLRP/TD and RLCLRP/TT.
A question that arises when comparing the results of the three models is "why does the Pareto front of RLCLRP/FCCE not dominate all Pareto solutions of the other two in all instances?"Two of the principal reasons can be expressed as follows.The first is problem domain nature, namely clients' time windows.The rules of RLCLRP/TD and RLCLRP/TT use the minimum TD and TT to guide the vehicles to service clients instead of FCCE, and the result is that traveling time may differ from that of FCCE, and some clients may be not satisfied for RLCLRP/FCCE even if all three share the same routes.For example, a route v = {21,1,2,3,21} can be finished by models of the latter two but may not be met by RLCLRP/FCCE, because RLCLRP/FCCE selects the nondominated routes (travel distance and FCCE) [1], resulting in different travel times.Then, the second is that efficiency of the proposed algorithm is barely satisfactory to optimize true Pareto fronts of RLCLRP/FCCE.However, in our experience with optimization, the first explanation is the main issue.Figure 3 shows the Pareto fronts of the three models for tackling the EFFVER set.Most solutions of RLCLRP/TD and RLCLRP/TT are dominated by the Pareto fronts of RLCLRP/FCCE, although a few solutions coincide with those of RLCLRP/TD.Furthermore, it is undeniable that our proposed model is more suitable and better than RLCLRP/TD and RLCLRP/TT in addressing sustainable economy, society, and environment.

Evaluation of the Effect of the Domain Problem
In this section, we evaluate the effects of domain parameters (i.e., fleet composition and distribution of clients and depots) on TC, VWT, opening depot, assigned vehicle, FCCE cost, FC, CE, TD, and TT.Section 5.7.1 analyzes the joint effect of variations in depots and client locations, and Section 5.7.2 evaluates the effect of fleet composition.

Joint Effect of Depots and Client Distribution
As discussed above, three zones are partitioned, and a different speed is assigned to each zone.Therefore, the distribution of clients and depots may be one of the most significant factors affecting the key performance indicators of logistics networks.To solve this problem, we conducted a series of experiments to evaluate the joint effect of the distribution of clients and depots, as shown as in Figure 4, where CnDm is the representation of clients and depots distribution; C and D, respectively, represented clients and depots, and n and m, respectively, represent locations of clients and depots, with n, m∈{1,2,3}.

Evaluation of the Effect of the Domain Problem
In this section, we evaluate the effects of domain parameters (i.e., fleet composition and distribution of clients and depots) on TC, VWT, opening depot, assigned vehicle, FCCE cost, FC, CE, TD, and TT.Section 5.7.1 analyzes the joint effect of variations in depots and client locations, and Section 5.7.2 evaluates the effect of fleet composition.

Joint Effect of Depots and Client Distribution
As discussed above, three zones are partitioned, and a different speed is assigned to each zone.Therefore, the distribution of clients and depots may be one of the most significant factors affecting the key performance indicators of logistics networks.To solve this problem, we conducted a series of experiments to evaluate the joint effect of the distribution of clients and depots, as shown as in Figure 4, where CnDm is the representation of clients and depots distribution; C and D, respectively, represented clients and depots, and n and m, respectively, represent locations of clients and depots, with n, m∈{1,2,3}.
Figure 4 shows that Pareto solutions of C 1 D 3 can dominate the majority of other Pareto solutions, especially when TC is small.In the fronts of 20 clients, the dominating sequence is In the fronts of 30 clients, the dominating sequence is In the fronts of 50 clients, the dominating order is In Figure 5, the circled numbers indicate number of the clients' locations in the zone, and the number in squares is number of the depots' locations in the zone; for example, the first is the preference of 20 clients for depots in each zone, and the clients in Zone 1 prefer the depots in Zone 3 instead of those in Zone 1.Although the distance between clients and depots in Zone 1 is less than for others, as long as all clients' time windows are satisfied, the depots in Zone 3 are an optimal choice.The second is the preference of depots in each zone according to the zone's clients.For example, depots in Zone 1 are asked to service 20 clients, and clients in Zone 2 are the best choice for them, instead of clients in Zone 1, and clients in Zone 1 should be serviced by depots in Zone 3.Moreover, clients in Zone 3 should be serviced by depots in Zone 3, as a sequence of dominant symbol (A ≺ B, A dominates B).However, as the number of clients increases, the corresponding preference should be adaptively changed to obtain the best Pareto solutions.Therefore, the joint effect of the locations of both client and depot should be considered simultaneously.Figure 4 shows that Pareto solutions of C1D3 can dominate the majority of other Pareto solutions, especially when TC is small.In the fronts of 20 clients, the dominating sequence is C1D3 ≺ C2D3 ≺ C2D2 ≺ C3D3 ≺ C1D2 ≺ C3D2 ≺ C2D1 ≺ C1D1 ≺ C3D1.In the fronts of 30 clients, the dominating sequence is In the fronts of 50 clients, the dominating order is C1D3 ≺ C1D2 ≺ C1D1 ≺ C2D3 ≺ C2D2 ≺ C3D2 ≺ C3D3 ≺ C2D1 ≺ C3D1.In Figure 5, the circled numbers indicate number of the clients' locations in the zone, and the number in squares is number of the depots' locations in the zone; for example, the first is the preference of 20 clients for depots in each zone, and the clients in Zone 1 prefer the depots in Zone 3 instead of those in Zone 1.Although the distance between clients and depots in Zone 1 is less than for others, as long as all clients' time windows are satisfied, the depots in Zone 3 are an optimal choice.The second is the preference of depots in each zone according to the zone's clients.For example, depots in Zone 1 are asked to service 20 clients, and clients in Zone 2 are the best choice for them, instead of clients in Zone 1, and clients in Zone 1 should be serviced by depots in Zone 3.Moreover, clients in Zone 3 should be serviced by depots in Zone 3, as a sequence of dominant symbol (A ≺ B, A dominates B).However, as the number of clients increases, the corresponding preference should be adaptively changed to obtain the best Pareto solutions.Therefore, the joint effect of the locations of both client and depot should be considered simultaneously.Figure 4 shows that Pareto solutions of C1D3 can dominate the majority of other Pareto solutions, especially when TC is small.In the fronts of 20 clients, the dominating sequence is C1D3 In the fronts of 30 clients, the dominating sequence is 5, the circled numbers indicate number of the clients' locations in the zone, and the number in squares is number of the depots' locations in the zone; for example, the first is the preference of 20 clients for depots in each zone, and the clients in Zone 1 prefer the depots in Zone 3 instead of those in Zone 1.Although the distance between clients and depots in Zone 1 is less than for others, as long as all clients' time windows are satisfied, the depots in Zone 3 are an optimal choice.The second is the preference of depots in each zone according to the zone's clients.For example, depots in Zone 1 are asked to service 20 clients, and clients in Zone 2 are the best choice for them, instead of clients in Zone 1, and clients in Zone 1 should be serviced by depots in Zone 3.Moreover, clients in Zone 3 should be serviced by depots in Zone 3, as a sequence of dominant symbol (A ≺ B, A dominates B).However, as the number of clients increases, the corresponding preference should be adaptively changed to obtain the best Pareto solutions.Therefore, the joint effect of the locations of both client and depot should be considered simultaneously.Figure 6 analyzes the ratio of each part in terms of TC: depot, vehicle, and FCCE costs for each instance in LDC; and TD and TT are also calculated to detect whether or not they are affected by the distribution of depots and clients.The ratio of depot cost decreases as TC increases at the beginning for all nine pairs and {20,30,40,50} clients, indicating that Pareto solutions in this part are variations sharing the same set of depots; then the ratio of depot cost increases because new depots are added to the set of depots, but the ratio of fleet and FCCE cost decrease in the second and third columns.Fleet cost decreases as TC increases for all nine pairs and {20,30,40,50} clients, illustrating that fleet composition stays the same under all conditions, and fleet composition depends on the pickup and delivery demand and clients' time windows instead of the distribution of depots and clients, which can be explained by the following: TC increases with FCCE cost at the beginning, and continued decline is caused by adding new depots to the set of depots.Surprisingly, FCCE cost increases with TC at the beginning, while the ratio of depot cost and fleet cost decreases, which indicates that the increment of FCCE cost is the main reason for the increased TC because Pareto solutions in this part share the same depot set and fleet composition.However, if the set of depots is expanded by adding new depots, the FCCE cost keeps increasing, which demonstrates that adding new depots may not help reduce the FCCE cost in the context of a fixed set of depots.For TD, similar characteristics can be found in FCCE cost, indicating that a positive correlation exists between FCCE and TD, which explains some of the overlapping results exist in Figure 3.For TT, there seems to be no relationship with the distribution of clients and depots, as the biobjectives of the proposed problem are total cost and vehicle waiting time.
Therefore, from Figure 6, we see that speed zones have some influence on the performance indicators as indicators of each pair are different from those of others.Government and logistic enterprises may impose various rules on logistics network to optimize corresponding indicators for ameliorate the economic, social, and environmental effect.
increment of FCCE cost is the main reason for the increased TC because Pareto solutions in this part share the same depot set and fleet composition.However, if the set of depots is expanded by adding new depots, the FCCE cost keeps increasing, which demonstrates that adding new depots may not help reduce the FCCE cost in the context of a fixed set of depots.For TD, similar characteristics can be found in FCCE cost, indicating that a positive correlation exists between FCCE and TD, which explains some of the overlapping results exist in Figure 3.For TT, there seems to be no relationship with the distribution of clients and depots, as the biobjectives of the proposed problem are total cost and vehicle waiting time.
Therefore, from Figure 6, we see that speed zones have some influence on the performance indicators as indicators of each pair are different from those of others.Government and logistic enterprises may impose various rules on logistics network to optimize corresponding indicators for ameliorate the economic, social, and environmental effect.

Effect of Fleet Composition
The purpose of this section is to test the effect of fleet composition on the pivotal performance indicators of the logistics network.In the generated LFLEET suite, eight instances are used, and the results are listed in Table 6.In all instances, IGD values are equal to zero, and mean HV values are greater than those of others, indicating that Pareto solutions of HF dominate all Pareto solutions of others.Figure 7 presents the Pareto front of each instance with different fleet composition.In cases of

Effect of Fleet Composition
The purpose of this section is to test the effect of fleet composition on the pivotal performance indicators of the logistics network.In the generated LFLEET suite, eight instances are used, and the results are listed in Table 6.In all instances, IGD values are equal to zero, and mean HV values are greater than those of others, indicating that Pareto solutions of HF dominate all Pareto solutions of others.Figure 7 presents the Pareto front of each instance with different fleet composition.In cases of 20, 30, and 40 clients, the vehicle type L2 seems to be preferably selected.However, the vehicle type M is preferred in the context of larger cases with 30, 40 and 50 clients.Meanwhile, the fleet of type L1 may be abandoned in all instances.Therefore, logistics enterprises should rent or purchase vehicles of types L2 and M.            The results of this section indicate that the proposed algorithm can solve the problem effectively and outperforms other customized MOEAs and that the proposed model is more effective than the models of travel distance and time.Moreover, we also evaluated the effects of domain parameters on several performance indicators of the logistics network and provided some advice from managerial insight.

Conclusions
This paper concerned a variant of the location-routing problem, namely, the regional low-carbon location-routing problem considering a city in which goods need to be picked up and delivered from depots to clients located in nested zones characterized by different speed limits.Several real-world conditions were also taken into consideration: simultaneous pickup and delivery, hard time windows, and a heterogeneous fleet.To solve the proposed problem, a biobjective model was developed by viewing total logistics cost as the primary objective and vehicle waiting time as the secondary objective.The total logistics cost consists of three parts: depots cost, vehicles cost, and travel cost, with the latter defined as FCCE cost.A novel approach was designed to tackle the proposed problem: a quantum-inspired hyper-heuristic.
Extensively experiments were performed to (1) verify the efficiency of the proposed algorithm and model and ( 2) analyze the effects of domain parameters on the key performance indicators.The results show that our algorithm and model are efficient, providing the best results compared to other algorithms and models for benchmark instances.We also analyzed the effects of the domain parameters, such as depot cost and location, client distribution, and fleet composition, on the significant performance indicators, such as total logistic cost, vehicle waiting time, opened depot cost, fleet cost, fuel consumption and carbon emission cost, travel distance, and travel time.Moreover, several constructive suggestions are deduced from simulations to promote the sustainable development of the economy, society, and the environment.
Future works will explore new versions of the RLCLRP that merge the temporal dimension in traffic congestion, and other efficient estimations of fuel consumption and carbon emission will be considered.Additional novel selection strategies and acceptance criteria may also be developed.

30 Figure 1 .
Figure 1.Simple example of a solution.

Figure 1 .
Figure 1.Simple example of a solution.

Algorithm 1 .
General framework of multi-objective algorithm for LCLRP Sustainability 2019, 11, x FOR PEER REVIEW 10 of 30

Algorithm 2 .
Pseudo code of calculating rotation direction and angleSustainability 2019, 11, x FOR PEER REVIEW 13 of 30

Algorithm 4 .
Framework of the Quantum-inspired Hyper-heuristc

Figure 3 .
Figure 3. Pareto front of three models (i.e., location-routing problem considering environmental effects (LCLRP) with fuel consumption and carbon emission (FCCE), distance and traveling time).

Figure 3 .
Figure 3. Pareto front of three models (i.e., location-routing problem considering environmental effects (LCLRP) with fuel consumption and carbon emission (FCCE), distance and traveling time).

Figure 4 .Figure 5 .
Figure 4. Pareto front of different clients and depot distribution.

Figure 4 .
Figure 4. Pareto front of different clients and depot distribution.

Figure 4 .Figure 5 .
Figure 4. Pareto front of different clients and depot distribution.

Figure 5 .
Figure 5. Preference of each client and depot.

Figure 6 .
Figure 6.Ratio of each part in function 1 and total travel distance and time.

Figure 6 .
Figure 6.Ratio of each part in function 1 and total travel distance and time.

20, 30 ,
and 40 clients, the vehicle type L2 seems to be preferably selected.However, the vehicle type M is preferred in the context of larger cases with 30, 40 and 50 clients.Meanwhile, the fleet of type L1 may be abandoned in all instances.Therefore, logistics enterprises should rent or purchase vehicles of types L2 and M.

Figure 8 .
Figure 8.Effect of fleet composition on four indicators.

Figure 8
Figure8shows the effect of fleet composition on the key performance indicators: ratio of FCCE cost, CE in kilograms (or FC in Liters), TD, and TT.A conclusion can be reached: the fleet with HF can help effectively reduce the TC, VWT, FCCE cost, CE (FC), TD, and TT.

Sustainability 2019 ,
11,  x FOR PEER REVIEW 6 of 30 20, 30, and 40 clients, the vehicle type L2 seems to be preferably selected.However, the vehicle type M is preferred in the context of larger cases with 30, 40 and 50 clients.Meanwhile, the fleet of type L1 may be abandoned in all instances.Therefore, logistics enterprises should rent or purchase vehicles of types L2 and M.

Figure 8 .
Figure 8.Effect of fleet composition on four indicators.

Figure 8 .
Figure 8.Effect of fleet composition on four indicators.
Note: bold numbers are the minimum values; italic numbers are the second minimum values; Numbers with underline are the third minimum values.
Note: bold number is the minimum values; incline number is the second minimum values; Number with underline are the third minimum values.
Note: bold numbers are the minimum values; inclined number are the second minimum values; Numbers with underline are the third minimum values.

Table 4 .
Mean HV values of nine pairs of MOHH for EFFVER instances.Note: bold numbers are the minimum values; inclined number are the second minimum values; Numbers with underline are the third minimum values.

Table 5 .
Mean inverted generational distance (IGD) and HV values of three models for EFFVER instances.

Table 6 .
Effect of fleet composition on IGD and HV of the Pareto front.

Table 6 .
Effect of fleet composition on IGD and HV of the Pareto front.

Table 6 .
Effect of fleet composition on IGD and HV of the Pareto front.