An Effective Approach for the Multiobjective Regional Low-Carbon Location-Routing Problem

In this paper, we consider a variant of the location-routing problem (LRP), namely the multiobjective regional low-carbon LRP (MORLCLRP). The MORLCLRP seeks to minimize service duration, client waiting time, and total costs, where total costs comprise depot, vehicle, and fuel-consumption-based travelling costs as well as carbon emission costs, and considers three practical constraints: simultaneous pickup and delivery, a heterogeneous fleet, and hard time windows. We formulated a multiobjective mixed-integer programming formulation for the problem under study. Due to the complexity of the proposed problem, a general framework, named the multiobjective hyper-heuristic approach (MOHH), was applied to obtain Pareto-optimal solutions. To improve the performance of the proposed approach, four selection strategies and three acceptance criteria were developed as the high-level heuristic (HLH), and three multiobjective evolutionary algorithms (MOEAs) were designed as the low-level heuristics (LLHs). The performance of the proposed approach was tested on a set of different instances, and comparative analyses were also conducted against eight domain-tailored MOEAs. The results showed that the proposed algorithm produced a high-quality Pareto set for most instances. Additionally, extensive analyses were carried out to empirically assess the effects of domain-specific parameters (i.e., fleet composition, client and depot distribution, and zone areas) on key performance indicators (i.e., hypervolume, inverted generational distance, and ratio of nondominated individuals). Several management insights are provided by analyzing the Pareto solutions.


Introduction
City logistics poses challenges to governments, businesses, carriers, and citizens, particularly in the context of freight transport [1], in terms of three effects: economic, social, and environmental. It therefore requires new business operating models that address these three effects together. Meanwhile, with increasing public demand for sustainable development and health, city logistics, as a primary source of carbon emissions (CE), should reduce CE during related activities. In the last few decades, many researchers have developed tools to optimize logistics networks, especially for the location-routing problem (LRP), which combines location-allocation and vehicle-routing problems (VRP) [2]. However, the basic LRP model is only concerned with logistics costs and neglects the environmental and social effects of transportation.
Recently, the coordination of economic, social, and environmental benefits has emerged as one of the most frequently addressed problems. The low-carbon LRP (LCLRP) concept introduced by Zhang et al. [3] focuses on depot locations and vehicle routes to minimize total CE from depots and traveling activities. The regional LCLRP (RLCLRP), first proposed by Koc et al. [1], considers clients and depots located in nested zones characterized by different speed limits. Leng et al. [2] further developed a single-objective model (SOM) for the RLCLRP that accounts for a set of real-world constraints. These three papers share the same feature, namely SOMs. In real life, however, multiobjective problems (MOPs) always play a role in determining trade-offs among multiple demands. Therefore, Leng et al. [4] developed a biobjective model for the RLCLRP and a quantum-inspired selection strategy to improve hyper-heuristic performance. Meanwhile, for other variants of the LRP, several studies have been devoted to jointly addressing economic and environmental effects. For example, Pourhejazy et al. [5] considered fuel consumption in a variant of the LRP and formulated a biobjective model for their problem.
As the economy and society develop, the joint effects of logistics on the economy, society, and environment should be considered simultaneously rather than separately. Our proposed model is in tune with these three concerns, treating them as the main performance indicators of a logistics network. Specifically, the first objective treats FCCE as a traveling cost (economic and environmental effect), the second objective is delivery duration (i.e., total traveling time), and the third objective is client satisfaction (social effect), measured as total client waiting time.
In the real world, most logistics enterprises do not consider or analyze these joint effects in city logistics, as they pursue only economic profit. Our study follows the basic idea of Leng et al. [4] and defines a multiobjective model (MOM) for the RLCLRP (MORLCLRP). To date, no research has used MOMs to minimize logistics cost, service duration, and client waiting time simultaneously. Furthermore, few studies have incorporated fuel consumption and carbon emissions (FCCE) into basic LRP models (e.g., Pourhejazy et al. [5]; see Section 2.3). Indeed, most studies have failed to handle FCCE appropriately: FCCE should be treated as a traveling cost, as in [1][2][3], rather than as an objective or constraint (see Section 2.2). The main contributions are as follows:

1.
We introduced a mixed-integer linear programming MOM for the RLCLRP. In a real-world logistics network, cost, service duration, and client satisfaction are considered the most significant performance indicators, and the traveling cost incorporates the FCCE cost.

2.
We developed an efficient MOHH to solve the MORLCLRP. For the HLH, we provided four selection strategies and three acceptance criteria to improve the performance of the MOHH framework, and we developed three MOEAs as the pool of LLHs.

3.

We conducted extensive computational experiments to assess the efficiency of the proposed algorithms and derived managerial implications by assessing problem parameters, such as client and depot locations, speed zone areas, and fleet composition. The model, algorithms, and computational results can serve as a stepping-stone for further MORLCLRP research, e.g., with cold chain logistics [6].
The paper is structured as follows: Section 2 provides a review of the related literature; Section 3 describes the MORLCLRP with simultaneous pickup and delivery, heterogeneous fleet, and hard time windows; Section 4 gives a brief description of the hyper-heuristic framework together with a general MOEA structure for the MORLCLRP; Section 5 describes the computational experiments and simulated results; and Section 6 outlines the study conclusions.

Literature Review
An important objective of the current research was to acquire a set of Pareto solutions for the MORLCLRP within the logistics network. We reviewed the related literature on models estimating FCCE, factors affecting FCCE, and the LCLRP and RLCLRP. For literature on solution methods for the LRP and its variants, the reader is referred to the surveys of Pourhejazy et al. [7] and Drexl and Schneider [8].

Research about FCCE Models
Demir et al. [13] state that transportation activities can have damaging effects on the environment and public health by producing harmful emissions. Thus, accurate estimation models are required to measure and reduce FCCE during planning. To date, a variety of analytical estimation models have been developed. Demir et al. [14] classified them into three increasingly complex model types, as summarized in Table 1. The most detailed group, the micro view (Micro), relies on second-by-second information, especially vehicle speed, and includes the comprehensive modal emission model (CMEM) [1,2,4,13,15,21], comprehensive power-based fuel consumption models [35], and the vehicle-specific power model [36,37].

The above three model groups are categorized by complexity. Demir et al. [14] also discussed load (power)-based and regression-based emission models. The former considers vehicle gross weight and vehicle-specific parameters, such as fuel type, engine, and size. The latter builds prediction models by statistically analyzing the relationship between the FCCE amount and its influencing factors. For example, the Japanese government [38] indicated that travel distance per unit volume of fuel used is strongly correlated with vehicle gross weight, which forms the basis of the FCR model. Among the three model types, FM is a simplified, and Micro a more detailed, version of Macro, and conversion between FM/Micro and Macro is possible. In fact, from an instantaneous estimation perspective, FM is a special case of Macro, because the parameters in FM, namely vehicle-specific and travel speed parameters, are negligible or constant. The model types can therefore be classified into non-instantaneous and instantaneous models. In terms of accuracy for estimating FCCE, Micro models are the best, followed by Macro and then FM. For a fuller treatment of FCCE models, see Lin et al. [12], Demir et al. [14], and Demir et al. [13].

Research Concerning LCLRP
Environmental and public health concerns require the sustainable development of the environment, economy, and society. Low-carbon supply chain network design has therefore become an important area of research [39]. As an essential tool in the supply chain, the LRP should consider the reduction of FCCE. Various researchers have analyzed and modeled the LCLRP and its variants:
• Mohammadi et al. [40] proposed a biobjective LCLRP optimizing logistics costs and traveling distance (i.e., a low-carbon/green objective). Although traveling distance is a significant factor in FCCE, other parameters, especially vehicle-specific parameters and load, should also be considered;
• Govindan et al. [41] presented a more detailed model combining traveling CE with the fixed CE of depots and manufacturers, using several coefficient matrices to capture the effect of vehicle and load. However, these coefficient matrices are generated from a uniform distribution, which does not accurately reflect FCCE. This approach was also applied by Chen et al. [39] and Nakhjirkan and Rafiei [42];
• Validi et al. [43][44][45] used fuel efficiency and distance to calculate FCCE, as did Kuo et al. [17] and Poonthalir and Nadarajan [16]. Faraji and Afshar-Nadjafi [46] proposed a modified method considering the extra fuel consumption caused by carrying each extra load. Tang et al. [47] applied this method to calculate routing CE by assigning parameters to each edge, with the CE of depots/inventory also considered;
• Qazvini et al. [48] proposed a SOM treating the cost of fuel consumption as a constraint. Although this method is effective, it is more appropriate to treat the FCCE cost as a traveling cost in the real world;
• Koc et al. [1] considered a city in which goods are delivered from a depot to clients located in nested zones characterized by different speed limits, and used the CMEM to estimate FCCE, which was treated as a traveling cost. This was followed by Leng et al. [2], who studied an extended version of the RLCLRP using a shared mechanism-based hyperheuristic. Leng [30] subsequently developed an ant-based hyperheuristic to solve the model of Zhang et al. [3]. Leng et al. [28] proposed an extended version of the model of Zhang et al. [3] and a quantum-inspired hyperheuristic to solve it. Zhao et al. [29] also developed an integrated model for the LCLRP by defining the FCCE cost, together with an evolutionary hyperheuristic to solve it. Qian et al. [52] modified the model of Zhang et al. [3] into a biobjective model and used a tabu search-based MOHH to solve it;
• Wang et al. [27] developed a SOM using the FCR to estimate FCCE and treat it as a part of costs.

Research about MOHH
Metaheuristics have been widely tailored for different domains. However, selecting the best algorithm and configuring its parameters and operators for a given problem can be difficult and time-consuming [53,54]. Moreover, even though a single tailored search method may achieve good performance, it cannot solve all cases with high efficiency. The notion of the hyperheuristic emerged from such situations. Walker et al. [55] claimed that hyper-heuristic approaches are gaining renewed interest due to the development of more applicable search methods. The raw hyper-heuristic idea was initially introduced by Denzinger et al. [56], with the basic concept further developed by Cowling et al. [57] as "heuristics to select heuristics". Burke et al. [58] then proposed a comprehensive definition covering (1) heuristic selection and (2) heuristic generation. The difference between heuristic selection and generation lies in the initial solutions: the latter applies LLHs to generate complete solutions starting from empty solutions, whereas the former is initialized with complete solutions.
Metaheuristics are domain-specific, fine-tuned, tailor-made rules-of-thumb that provide a set of mechanisms to search the domain space directly with the goal of obtaining optimal or near-optimal solutions. The purpose of a hyperheuristic, by contrast, is to improve generality and portability by selecting/generating appropriate operators (i.e., controlling the operator space) rather than directly modifying the domain-specific space. In other words, compared with customized methods, hyper-heuristic approaches are generally applicable to other problem domains as they contain no domain knowledge. Although Kumari and Srinivas [59] demonstrated that domain-specific, fine-tuned, tailor-made methods will unquestionably outperform a general framework, hyperheuristics sharing the same or similar operators can also obtain good, and possibly better, solutions [2,[28][29][30]53].
The framework of a selection hyperheuristic involves an HLH and LLHs. The former searches the space formed by a set of LLHs, which directly optimize the space of domain solutions [60]. Two decisions are made in the HLH: the selection strategy and the acceptance criterion. The selection strategy controls and monitors the performance of each LLH to intelligently choose the appropriate operators, producing a single candidate solution. The new solution is then either accepted or rejected. The LLHs hold all domain-specific knowledge, including encoding, decoding, chromosome, and heuristic information, such as operators, crossovers, mutations, local searches, or metaheuristics [60]. It is worth noting that a domain barrier prevents domain knowledge from passing from the LLHs to the HLH; however, the HLH is allowed to access problem-independent information, such as the number of operators, the count/time of applied operators, the fitness-rate improvement, and fitness/objective values.
To put the selection strategies into perspective, three main types can be observed according to the source of feedback information: (i) online learning, which occurs while a hyperheuristic is running and includes the choice function (CF) [2,4,28,57], the multi-armed bandit [2,4,28,[53][54][55], ant-based selection [29,30], and quantum-inspired selection (QS) [4,28,29]; (ii) offline learning, which applies rules or programs to gather information by training on a set of instances, including learning classifier systems, case-based reasoning, and genetic programming; and (iii) no learning, which does not use any feedback information from the search space, including simple random (SR), random descent, and random permutation. There are likewise two main types of acceptance criterion: (i) deterministic acceptance, which applies a 0-1 strategy, including all moves (AM), improving and equal, and only improving; and (ii) nondeterministic acceptance, which calculates an acceptance probability to judge the acceptance of new solutions, including simulated annealing, great deluge acceptance (GDA) [61], and late acceptance (LA) [61]. For a comprehensive understanding of hyperheuristics, please see Chakhlevitch and Cowling [62] and Burke et al. [58,63].
MOHHs have been successfully applied to both continuous and combinatorial optimization problems (the "domain-problem" column in Table 2). To the best of our knowledge, only our previous studies, Qian et al. [52] and Leng et al. [4], have applied MOHHs to tackle the MORLCLRP. Regrettably, no other researchers have yet applied MOHHs to solve MOMs of the LRP or VRP. The reasons may be as follows: (i) compared with other domains, much more time and effort are needed to program effective LLHs for the MORLCLRP; (ii) greater computing time is required to execute MORLCLRP algorithms; and (iii) the MORLCLRP is a fairly complex NP-hard problem, which makes it difficult to achieve a good Pareto set. Therefore, corresponding operators and metaheuristics for the MORLCLRP are urgently needed.

Mathematical Model
The model applied to estimate the FCCE amount is the CMEM, and the corresponding CMEM parameters can be obtained from Refs. [1,2]. The description and assumptions of the problem domain are given in Section 3.1; Section 3.2 presents the corresponding mathematical formulation and the necessary constraints; finally, Section 3.3 provides some valid, but not strictly necessary, restrictions on the MORLCLRP.

Description and Assumption of MORLCLRP
The MORLCLRP can be described on a complete directed graph Ω = (N, E) with vertex set N and edge set E. N consists of a subset D of N_d candidate depots and a subset C = N\D of N_c clients. Each client c∈C has a non-negative pickup demand p_c and delivery demand d_c, is to be served exactly once, and is assigned to a single depot d∈D with capacity w_d and renting cost FD_d. The shipment of client demand from the chosen depot is carried out by an unlimited set V = {v_1, v_2, v_3} of heterogeneous vehicles with capacity Q_v and one-time renting cost FV_v. The traveling fuel consumption amount W_ijv of each edge (i, j)∈E = {(i, j): i, j∈N, i ≠ j}\{(i, j): i, j∈D} is calculated by the CMEM, and it is worth noting that the distance of each edge (i, j)∈E is obtained by taxicab geometry (see Koc et al. [1] and Krause [84]).
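Since edge lengths follow taxicab geometry rather than Euclidean distance, a one-line sketch (function name ours) makes the difference concrete:

```python
def taxicab_distance(p, q):
    """Taxicab (Manhattan, L1) distance between two planar points,
    as used for the edge lengths of the MORLCLRP graph."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

# A depot at (0, 0) and a client at (3, 4): the Euclidean distance is 5,
# but travel on a grid-like street network costs 3 + 4 = 7 units.
d = taxicab_distance((0, 0), (3, 4))  # -> 7
```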
The following hypotheses are made: (1) each client is visited only once, by a single vehicle and depot; (2) each vehicle must return to the depot from which it departs; (3) the load of each vehicle must not exceed its capacity at any time during service; (4) each depot must serve the clients assigned to it; and (5) a vehicle must arrive at each node before the closing of its time window.

Mathematical Model
To formulate the problem, we define the following additional decision variables. Let x_ijv be equal to 1 if a vehicle of type v∈V travels on edge (i, j)∈E and to 0 otherwise; let y_j be equal to 1 if depot j∈D is selected and to 0 otherwise; and let z_ij be equal to 1 if client i∈C is serviced by depot j∈D and to 0 otherwise. A multiobjective formulation of the MORLCLRP is then given by objectives (1)-(3), where f_FC and f_CE are the unit prices of 1 L of fuel and 1 kg of CE, respectively, and AT_j is the time at which service starts at node j∈N. Objective (1) minimizes the total costs, consisting of opened-depot, vehicle, FC, and CE costs, corresponding to the economic and environmental effects; FC and CE are used simultaneously because they represent the costs of consuming fuel and of purchasing carbon emission allowances on the carbon trading market. Objective (2) optimizes the total service time, corresponding to the long-term benefit. Client satisfaction is modeled by the total client waiting time in objective (3), corresponding to the social interest. Constraints (4)-(23) are the necessary restrictions for the MORLCLRP.
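The published objective functions are not reproduced above; a schematic reading of the three objectives, consistent with the notation just defined, is given below. The grouping of terms, and the readings of f_2 and f_3 in particular, are our interpretation of the surrounding text, not the paper's exact formulation:

```latex
\min f_1 = \sum_{d \in D} FD_d\, y_d
         + \sum_{v \in V} \sum_{d \in D} \sum_{j \in C} FV_v\, x_{djv}
         + (f_{FC} + f_{CE}) \sum_{v \in V} \sum_{(i,j) \in E} W_{ijv}\, x_{ijv}
\quad \text{(depot, vehicle, and FCCE-based traveling costs)}

\min f_2 = \text{total service duration, i.e., the sum over all routes of travel, waiting, and service time}

\min f_3 = \sum_{i \in C} \bigl(AT_i - e_i\bigr)
\quad \text{(total client waiting time, measured from window opening to service start)}
```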
The following two are degree restrictions: constraint (4) ensures that each client is visited exactly once, and constraint (5) ensures that the numbers of arcs entering and leaving each node are equal. Constraints (6) and (7) ensure that each client is served by a single depot and vehicle. Restrictions (8)-(10) forbid illegal routes, i.e., each vehicle must return to its original departure depot; these three restrictions were proved by Karaoglan et al. [85]. Constraint (11) ensures that the total demand delivered/picked up by a depot cannot exceed its capacity; constraints (12) and (13) are extra bounds on the load when vehicles depart from and return to the depot, where L_ijv is the dynamic load of a vehicle of type v∈V traveling over an edge (i, j)∈E. The dynamic load of each vehicle must not exceed its capacity, which is ensured by constraint (14), with constraint (15) giving a relaxed form of this bound; constraint (16) implies that the delivery and pickup demand of each client is satisfied; constraints (17) and (18) indicate that the load of a vehicle of type v∈V must equal the total delivery/pickup demand when it departs from/returns to the depot. The next two constraints restrain the vehicles' behavior according to the clients' time windows, where ST_i is the service time of node i∈C, TT_ij is the travel time over edge (i, j)∈E, and (e_i, l_i) represents the time window of client i∈C. In constraint (19), if a vehicle arrives at client i∈C before time e_i, it waits until e_i to start serving the client. Constraint (20) enforces assumption (5). The next inequalities restrict opened depots to serving clients: constraint (21) ensures that an unselected depot cannot service clients, and constraint (22) ensures that an opened depot must serve clients. The above constraints are essential for the proposed problem; we introduce some extra valid, but not necessary, restrictions in Section 3.3.

Other Valid Constraints
We introduce several polynomial-size valid inequalities for our problem, which have been used in several studies of VRPs/LRPs. The first is the subtour elimination constraint (23), derived from Koc et al. [22]; it is only suitable for a directed graph with specific constraints, such as time windows and simultaneous pickup and delivery, rather than for the basic LRP with a heterogeneous fleet. The next inequalities restrict the opened depots and vehicles: constraint (24) guarantees that an unselected depot cannot be assigned vehicles, and constraint (25) imposes that an opened depot must have vehicles to serve clients. The third restriction limits the number of vehicles: constraint (26) provides lower and upper bounds on the number of vehicles originating from the depots, where ⌈•⌉ denotes the smallest integer not less than its argument. The fourth valid inequality concerns the number of opened depots: constraint (27) states that the total capacity of the opened depots must exceed the maximum demand. This restriction is not strictly necessary, since it can be satisfied without actually constraining the solution; a classical example is Perl83-55×15 in Barreto et al. [86].
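To illustrate the kind of bound that a fleet-size restriction such as constraint (26) imposes (the published inequality is not reproduced above), one plausible instantiation of the lower bound, written in the notation of Section 3.1, is:

```latex
\sum_{v \in V} \sum_{d \in D} \sum_{j \in C} x_{djv}
\;\ge\;
\left\lceil \frac{\max\left( \sum_{c \in C} d_c,\; \sum_{c \in C} p_c \right)}{\max_{v \in V} Q_v} \right\rceil
```

That is, even a fleet composed entirely of the largest vehicle type must contain enough vehicles to carry the larger of the total delivery and total pickup demand.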

Proposed Method
This section describes the proposed MOHH from two aspects: (1) domain-specific level and (2) high-level strategy. Section 4.1 introduces the necessary chromosome representation, a general framework for MOEAs as LLHs, and practical operators. Section 4.2 provides the MOHH framework for solving the proposed MORLCLRP. Section 4.3 discusses the four selection strategies. Section 4.4 discusses the three acceptance criteria.

Chromosome Representation
In the proposed algorithm, the chromosome represents a complete solution, i.e., a collection of routes. Each route is stored in the cell array, i.e., F = {f 1 ,f 2 , . . . ,f k }, where f i is a complete vehicle route with opened depots inserted at its two ends. We also provide related information on the effects of vehicle and traveling on the objectives, such as starting/returning load, type of vehicle, traveling cost, time, and client waiting time in the attribute array, similarly to the solution representation proposed by Leng et al. [4].
Furthermore, a population-based search is used in our framework: individuals in the population are constructed by randomly selecting clients to form "super-clients" (i.e., sets of clients) assigned to each vehicle. Each vehicle is randomly assigned to a depot, provided constraints (4)-(23) are met.
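A minimal sketch of this construction (function and variable names ours; the feasibility check against constraints (4)-(23), which the real initializer enforces by rejection or repair, is omitted):

```python
import random

def random_solution(clients, depots, n_vehicles):
    """Build one individual: partition clients into random 'super-clients',
    assign each group to a vehicle and a random depot, and store each route
    as depot -> clients -> depot (the cell-array style F = {f1, ..., fk})."""
    pool = clients[:]
    random.shuffle(pool)
    # cut the shuffled client list into n_vehicles contiguous, nonempty groups
    cuts = sorted(random.sample(range(1, len(pool)), n_vehicles - 1))
    groups = [pool[i:j] for i, j in zip([0] + cuts, cuts + [len(pool)])]
    routes = []
    for g in groups:
        d = random.choice(depots)       # vehicle randomly assigned to a depot
        routes.append([d] + g + [d])    # opened depot inserted at both ends
    return routes

routes = random_solution(list(range(1, 9)), ["D1", "D2"], 3)
```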

Applied Operators
In this section, we present the domain-specific operators, as used by Leng et al. [2] for SOMs and Leng et al. [4] for biobjective models. Here, we study the MORLCLRP by minimizing total cost, service duration, and client waiting time; the applied operators are therefore modified to calculate all three objectives. We follow the classification of Leng et al. [4], i.e., mutational operators (Mu), nondominated local search (NDLS), and dominated local search (DLS). Mu provides randomness to avoid local optima, NDLS produces many nondominated solutions, and DLS accelerates the search for approximate Pareto solutions by finding solutions that dominate their parents. Over the iterations, DLS provides high performance in the short term, Mu plays a role in escaping local optima in the medium term, and NDLS performs best for nondominated solutions in the long term. The corresponding designs can be found in Leng et al. [4].

In each generation of an MOEA (Algorithm 1), a Mu is randomly selected for the chosen individual (if random < p_m) to obtain a child C; a local search operator is then randomly chosen from NDLS and DLS to optimize the child C (or the parent P, if random > p_m), yielding a new solution CC, which is merged into the parent population. Afterwards, if the size of the merged population is larger than N, the elitism selection strategy retains the best N individuals for the next generation. The algorithm ends when the main loop stops, returning the surviving population Pop.

For each main loop iteration of the MOHH, a promising MOEA is chosen by one selection strategy (i.e., SR, MAB, CF, or QS) to transform the solution space of the proposed problem, and a child population Cpop is obtained.
Afterwards, an acceptance criterion (i.e., AM, GDA, or LA) is applied to decide whether Cpop is accepted. The algorithm ends when the main loop stops, returning the approximate Pareto solutions. We also provide an archiving strategy that stores 5 × N nondominated solutions, using the nondominated sorting and crowding distance of NSGA-II [87]. An overview of the pseudocode of the proposed MOHH is given in Algorithm 2.
The main difference between the algorithm in this study and that of our previous work is that the credit assignment (reward value) in this paper is based on the D-matrix (D), generational distance (GD), and population dominance (PD) instead of hypervolume (HV), PD, and Spacing. The main reason is that the difference between the parent and child populations can be captured by the first three, whereas HV and Spacing only characterize the child population.

Heuristic Selection Strategies
In this study, well-known and modified heuristic selection approaches are used as selection MOHH components for choosing the appropriate MOEA at each iteration/stage.

Simple Random
For SR, a random LLH (MOEA) is simply chosen at each iteration. It is usually used as a baseline for comparison against the learning hyper-heuristic methods.

Fitness Rate Rank Based Multi-Armed Bandit (FRR-MAB)
FRR-MAB is an upper-confidence-bound algorithm. In MAB, the key to a successful algorithm is a good trade-off between exploration and exploitation, i.e., between the best LLH (exploitation) and the other LLHs (exploration). The mechanism consists of a credit assignment and an operator selection.
In the FRR-MAB process for the MOM, the population dominance relationship is applied to calculate the reward of each LLH. The reward is then stored in a sliding window organized with a first-in-first-out mechanism, because, in dynamic environments, the performance of an operator at a very early stage may be irrelevant to its current performance [54]. Normalization is then applied to calculate the empirical quality estimate, and a confidence interval is obtained from the usage counts of the operators. Finally, the credit value of each LLH is computed, and the LLH with the maximum credit value is selected for the next iteration.
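The mechanism just described can be sketched as follows. The window size, exploration weight, and class interface are illustrative choices rather than the paper's settings, and the rank-based reward decay of the full FRR-MAB is simplified to a plain sum over the window:

```python
import math
from collections import deque

class FRRMAB:
    def __init__(self, n_llh, window=50, scale=1.0):
        self.n = n_llh
        self.window = deque(maxlen=window)  # FIFO sliding window of (llh, reward)
        self.scale = scale                  # exploration weight
        self.usage = [0] * n_llh

    def select(self):
        # Use each LLH at least once before trusting the statistics.
        for i in range(self.n):
            if self.usage[i] == 0:
                return i
        total_reward = [0.0] * self.n
        for llh, r in self.window:
            total_reward[llh] += r
        s = sum(total_reward) or 1.0
        frr = [r / s for r in total_reward]  # normalized empirical quality
        t = sum(self.usage)
        # UCB credit: exploit high reward rate, explore rarely used LLHs
        credit = [frr[i] + self.scale * math.sqrt(2 * math.log(t) / self.usage[i])
                  for i in range(self.n)]
        return max(range(self.n), key=lambda i: credit[i])

    def update(self, llh, reward):
        self.usage[llh] += 1
        self.window.append((llh, reward))
```

Usage follows a select/apply/update loop: `i = bandit.select()`, run LLH `i`, then `bandit.update(i, reward)` with the dominance-based reward.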

Choice Function
The CF heuristic selection method was first used by Maashi et al. [64] to tackle a series of continuous optimization benchmarks. In their CF, a two-stage ranking scheme calculates the performance value, with the CPU seconds elapsed since the last call to a low-level (meta)heuristic acting as the confidence interval. In the final score for a given MOEA, α is a positive parameter; H is the pool of LLHs; llh is the index of an MOEA; and f_1/f_2 is the performance value (intensification)/elapsed CPU time in seconds since llh was last called (diversification). In this paper, we use a simple method to calculate the reward value of llh at iteration t using the D-matrix (D), generational distance (GD), and population dominance (PD), where D is the difference in covered area size between the parent front and the child front, GD is the generational distance, and PD is the nondominated reward from FRR-MAB. The LLH with the highest CF value is preferentially selected.
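Assuming the standard additive form of the choice function, CF(llh) = α·f1(llh) + f2(llh) (the paper's exact scoring may differ), a compact sketch is as follows; the class interface and the way the reward is fed into f1 are our illustrative choices:

```python
import time

class ChoiceFunction:
    """Score each LLH as alpha * f1 (recent performance, intensification)
    plus f2 (seconds since its last call, diversification); pick the max."""
    def __init__(self, n_llh, alpha=1.0):
        self.alpha = alpha
        self.perf = [0.0] * n_llh             # f1: e.g. a D + GD + PD reward
        self.last_call = [time.time()] * n_llh

    def select(self):
        now = time.time()
        scores = [self.alpha * self.perf[i] + (now - self.last_call[i])
                  for i in range(len(self.perf))]
        best = max(range(len(scores)), key=lambda i: scores[i])
        self.last_call[best] = now            # reset its diversification clock
        return best

    def update(self, llh, reward):
        self.perf[llh] = reward               # record the latest reward as f1
```

An LLH that has not been called for a while accumulates f2 and is eventually retried even if its last reward was poor.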

Quantum-Inspired Selection
The QS heuristic selection method proposed in our previous work [4] uses a qubit chromosome as a probabilistic representation instead of binary and numeric ones. Population dominance was used to judge the angle direction, and a comprehensive function that merges PD, HV, and spacing was used to reward the promising llh. A q-gate was introduced to update the state of each qubit in individual and heuristic space. The β value of each LLH reflected the exploitation of an algorithm, and probability matching (roulette selection) was used to explore the solution spaces to avoid local optima. In this paper, for fair comparison, we use the same reward function in CF to reward the promising llh. For the corresponding design of QS, please see Leng et al. [4].

Move Acceptance Methods
Move acceptance methods play a key role in MOHHs by deciding whether to accept or reject the candidate solutions produced by the chosen LLH. Several simple and elaborate acceptance strategies (ASs) have been used as part of MOHHs. Maashi et al. [64] initially applied AM as an AS and later developed two much more efficient ASs, GDA and LA [61], which use the D-matrix to capture the dominance relationship between parent and child populations. These ASs are usually used in MOHH-II modules, which utilize MOEAs as LLHs. An elitism selection strategy can also serve as an AS, known as best acceptance: Leng et al. [4] and Li et al. [60,83] used the nondominated sorting and crowding distance derived from NSGA-II [87] to select promising individuals for the next iteration. In this paper, we consider three well-known ASs, i.e., AM, GDA, and LA, as they performed well in Li et al. [60], Maashi et al. [61,64], and Leng et al. [4]. It is worth noting that, in this paper, the first two are performed after normalizing the three objective dimensions.
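A scalar-quality sketch of the two nondeterministic-style criteria follows. The paper applies them to the D-matrix between normalized parent and child fronts; the parameter names and the maximization orientation here are our illustrative choices:

```python
def great_deluge_accept(quality, level, rain_speed=3e-4):
    """GDA (maximization form): accept if the candidate's quality is at
    least the current water level; the level rises by rain_speed each call.
    Returns (accepted, new_level)."""
    accept = quality >= level
    return accept, level + rain_speed

def late_acceptance(quality, history, step):
    """LA: accept if the candidate beats the quality recorded L steps ago
    (L = len(history)); an accepted value overwrites that slot."""
    idx = step % len(history)
    accept = quality >= history[idx]
    if accept:
        history[idx] = quality
    return accept
```

GDA tightens acceptance over time as the level rises; LA compares against a delayed snapshot, so short-term worsening moves can still pass.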

Implementation Aspects and Configuration of Parameters
The MOHH was coded, with parallelization, in MATLAB 2018a (v9.4.0.813654) and run on a 4.0 GHz Intel Core i7-6700K with 12 GB of RAM under Windows 10; it is embedded in the CLOR tool implemented on the MATLAB platform (available via email to the authors).
We followed the parameter configurations suggested by Leng et al. [4]. The maximum number of iterations (max_iter) of the outer loop was 100. The maximum number of generations depended on the size of each instance, where N_c (N_d) is the number of clients (depots) and V_max is the maximum number of vehicles with minimum capacity. The multiplier α depended on the size of the instance and objectives, and was 45 for 20 clients, 57 for 30 clients, 68 for 40 clients, and 80 for 50 clients. The sizes of the population and archiving population were set to 100 and 500, respectively. The mutation rate p_m was set to its default (0.35), as per Leng et al. [4]. For the parameters of the acceptance criteria, we applied the default values from Maashi et al. [61] and Leng et al. [4], i.e., rain speed S = 3 × 10^-4 for GDA and list length L = 5 for LA.
For the test suite, we applied the instances used by Leng et al. [4] for the biobjective RLCLRP. We generated other sets to analyze the effects of each speed zone area on key performance indicators, as described in Section 5.2.

Performance Metrics
We utilized three well-known performance metrics, computed over ten runs, to evaluate the performance of the algorithms: inverted generational distance (IGD), hypervolume (HV), and the ratio of nondominated individuals (RNI).
The IGD describes the quality and uniformity of an approximate Pareto front (AF) by measuring the distance between the AF and the real Pareto front (PF); the smaller the IGD value, the better the distribution and convergence. The HV measures the size of the objective space covered between the AF and a reference point; the larger the HV value, the better the diversity and distribution. The RNI is measured by merging the nondominated solutions found by all algorithms and reporting each algorithm's share of the merged nondominated set; the larger the RNI, the better the quality of the AF.
However, as the MORLCLRP is solved for the first time in this paper, the real PF is unknown; we therefore used the union of all AFs to form the PF, which is thus an approximation of the real front. It is worth noting that the first two metrics were calculated after normalization.
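The IGD computation described above can be sketched in a few lines, assuming the objective vectors have already been normalized; this is an illustration, not the exact implementation used in the experiments.

```python
import math

def igd(reference_front, approx_front):
    """Mean Euclidean distance from each PF point to its nearest AF point.

    Lower is better: 0 means the AF covers every reference point exactly.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return sum(min(dist(r, a) for a in approx_front)
               for r in reference_front) / len(reference_front)
```

For example, with the reference front {(0, 1), (1, 0)}, an identical AF yields IGD = 0, while the single point (0.5, 0.5) yields IGD = √0.5 ≈ 0.707.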
The first hypothesis of this paper is that the use of hyperheuristics leads to better results than traditional MOEAs; the second aim is to analyze the effects of domain-specific parameters on the above three quality indicators. Accordingly, our experiments were guided by the following research questions: (1) How do the results produced by the MOHH compare with those of traditional algorithms? and (2) How do the problem parameters affect the quality indicators?

Efficiency of MOHH
Although Kumari and Srinivas [59] stated that the efficiency of general framework methods is inferior to that of problem-specific solvers incorporating domain-specific knowledge and fine-tuned, tailor-made strategies, we compared our proposed MOHH against other MOEAs by analyzing the IGD and HV values obtained on different instances. Tables 3 and 4 provide the mean IGD and HV values obtained by the eight MOEAs and the two MOHHs with different LLHs, as mentioned above. Note that the values in Tables 3 and 4 are multiplied by 100 to save space. As shown in Table 3, among the first eight MOEAs, only SPEA2 and BiGE obtained the best mean IGD, for L20-1 and L50-1, respectively. Although both obtained four third places, SPEA2 outperformed BiGE overall. As also seen in Table 3, QS1, using SPEA2, BiGE, and NSLS as LLHs, achieved ten best IGD means compared with the first eight MOEAs and QS2, which uses NSGA-II, NSGA-III, and PESA-II as LLHs. Moreover, QS2 was inferior to SPEA2 and BiGE but superior to the others. Additionally, the MOHHs using the archiving method outperformed the others, as their mean IGD was the lowest among all algorithms: QS1a obtained nine best mean IGDs, whereas QS2a achieved only three.

Table 4 displays the mean HV for all instances. Similarly, BiGE and SPEA2 performed best among the first eight MOEAs, especially SPEA2, which achieved two first places, three second places, and six third places; NSGA-II also obtained one first place. Regarding the hyper-heuristic framework, QS1 claimed first place overall by achieving eight first places, three second places, and one third place over the 12 instances; QS2 performed worse, achieving only three second places and two third places. The archiving method thus improved the performance of the MOEAs, as stated in Zhang et al. [95]. Figure 1 is a boxplot of the hypervolumes obtained by the 12 algorithms.
The following conclusions were drawn: (1) No single MOEA outperformed all other MOEAs in all instances, e.g., BiGE for L50-1 and L50-3 and SPEA2 for L50-3. (2) BiGE, SPEA2, and NSLS were superior to the others, and NSGA-II, NSGA-III, and PESA-II outperformed GrEA and IBEA in most instances. We used the scoring system [96] to rank the first eight MOEAs and found SPEA2(213) ≺ BiGE(180) ≺ NSLS(146) ≺ PESA-II(93) ≺ NSGA-II(92) ≺ NSGA-III(87) ≺ GrEA(79) ≺ IBEA(46), where A(a) ≺ B(b) indicates that the score a of A is larger than the score b of B and thus A dominates B. Therefore, QS1 used SPEA2, BiGE, and NSLS as LLHs, and QS2 used PESA-II, NSGA-II, and NSGA-III as LLHs. (3) The hyperheuristic QS1 outperformed all algorithms without the hyper-heuristic framework; QS2, however, was inferior to the top three MOEAs but outperformed the other five, suggesting that the configuration of the LLH pool is highly significant when designing MOHHs. (4) The archiving method can significantly improve the performance of MOEAs, as shown in Tables 3 and 4 and Figure 1; therefore, the following experiments (except Section 5.4) used the archiving method to obtain better Pareto solutions. In brief, the hyperheuristic presented in this paper effectively improved the performance of the MOEAs by combining their strengths while discarding their weaknesses.
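The score-based construction of the two LLH pools can be illustrated as follows; the scores are the values reported above, while the exact scoring rule of [96] is not reproduced here.

```python
# Reported scores of the first eight MOEAs (higher score is better).
scores = {"SPEA2": 213, "BiGE": 180, "NSLS": 146, "PESA-II": 93,
          "NSGA-II": 92, "NSGA-III": 87, "GrEA": 79, "IBEA": 46}

# Rank by descending score; the top three form the QS1 LLH pool,
# the next three form the QS2 LLH pool.
ranking = sorted(scores, key=scores.get, reverse=True)
qs1_llhs = ranking[:3]
qs2_llhs = ranking[3:6]
```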

Efficiency of Pairs in MOHH
This section compares the performance of each pair described in Sections 4.2 and 4.3. The four selection strategies and three acceptance criteria formed 12 pairs, which were used to optimize the randomly generated instances. Figures 2 and 3 are boxplots of the IGD and HV values, respectively. For compactness in the plots, we abbreviate GDA as GD and FRR-MAB as FM.
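The 12 pairs can be enumerated by crossing the four selection strategies with the three acceptance criteria; the abbreviations below follow the logograms used in the figures.

```python
from itertools import product

selection = ["SR", "FM", "CF", "QS"]  # simple random, FRR-MAB,
                                      # choice function, quantum-inspired
acceptance = ["AM", "GD", "LA"]       # all moves, great deluge, late acceptance
pairs = [s + a for s, a in product(selection, acceptance)]  # 12 pairs, e.g., "QSLA"
```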

As shown in Figure 2, the performance of GDA was inferior to the others, even when sharing the same SS, whereas LA was superior to AM in most instances when sharing the same SS. However, different SSs sharing the same AC performed differently; for example, QSLA outperformed the others on L202 and L401. A similar pattern for all pairs can be seen in Figure 3. As comparing all pairs directly is difficult, we used a scoring system (see Section 5.3), with the results shown in Table 5. QSLA, SRLA, QSAM, FMLA, and CFAM were the top five pairs. The score of each SS paired with GDA was zero, indicating that GDA was the worst AC; conversely, LA was the best AC, scoring 20 points ahead of AM. The order of performance was QSLA > SRLA > QSAM > FMLA > CFAM > SRAM > FMAM > CFLA. Neglecting the SS, the ACs ranked LA > AM > GDA; neglecting the AC, the SSs ranked QS > CF > SR > FM. In the following experiments, we randomly selected one of the top three pairs (i.e., QSLA, SRLA, or QSAM) in each run to analyze the effects of the domain-specific parameters on the IGD and HV values.

Effect of Fleet Composition
In this section, we analyze the effects of fleet composition on the Pareto front using the IGD, HV, and RNI indicators. The instance sets were the same as in Section 5.3. Table 8 shows the mean IGD, HV, and RNI values for each pair, where "HF" denotes the heterogeneous fleet. Note that the values in Table 8 are multiplied by 100 to save space. As shown in Table 8, the mean IGD, HV, and RNI values of HF for each instance were better than those of L1, L2, and M, and the "real" Pareto front of each instance coincided with the front of HF. As the vehicle parameters differed, the preferred fleet in each instance also differed: for example, type L2 vehicles were preferred in L20, L30, L402, and L503, whereas type M vehicles were preferred in L401, L403, L501, and L502. Therefore, as the size of the instance increases, type M vehicles become preferable. However, the total RNI of L1, L2, and M was 48.8%, lower than 50%, which demonstrates that a mixed heterogeneous fleet can reduce logistics costs, service duration, fuel consumption, and carbon emissions while improving client satisfaction. From the perspectives of the IGD and HV values, the Pareto front of HF dominated the fronts of L1, L2, and M after removing duplicate individuals. Figure 4 is a boxplot of the IGD, HV, and RNI values for each fleet composition.

Effect of Zones Area
Here, we estimated the effects of the speed zone area on the IGD, HV, and RNI values. The instances were randomly generated. Table 9 lists the IGD, HV, and RNI values of each instance, where EQ/Do/TF/FF/ORI represent the equal/two-fold/three-fold/four-fold/original area ratios from Koc et al. [1]. Note that the values in Table 9 are multiplied by 100 to save space. Figure 5 is a boxplot of the IGD, HV, and RNI values for each area ratio. As seen in Table 9, as the ratio of the zone areas increased, performance also increased, especially for FF. The FF ratio obtained the best performance, except for the IGD/RNI values of L501 and the HV values of L401, L501, and L502. The ORI area ratio was zone 1 (9%) : zone 2 (25%) : zone 3 (64%). An ideal ratio of speed zone areas for specific instances was shown to exist: as shown in Table 9, ORI outperformed FF in L501, which indicates that the best ratio may lie between ORI and FF; however, the best ratio is extremely hard to find, as infinitely many ratios exist and the best one may change with the nature of the instances.
Table 9. Mean IGD, HV, and RNI values of each area ratio.

Management Implications
From a management point of view, several implications can be drawn from the results. Section 5.5 analyzed the joint effects of depot and client location; the depot and client preferences in each zone are shown in Figure 6. From the perspective of the depots, the clients in zone 1 were preferred by DR, indicating that the randomly located depots (zones 1, 2, and 3) should preferentially serve the clients in zone 1 rather than those in other zones. Depots located in every zone preferred to serve the clients in zone 1; in particular, the depots in D2 and D3 preferred to serve the clients in C1 instead of those in C2 and C3, respectively. From the perspective of the clients, the first three groups (i.e., CR, C1, and C2) preferred DR for service rather than the depots in D1 and D2, whereas the clients in zone 3 preferred the depots in zone 3 instead of DR. Moreover, the clients in CR, C1, C2, and C3 preferred the depots in D3, D2, D3, and DR, respectively. The worst allocation strategy was serving the clients in zone 3 from the depots in zone 1. The priority order was: C1DR < C1D2 < C1D1 < C1D3 < C2DR < C2D3 < C2D2 < C3D3 < C2D1 < C3DR < CRDR < CRD3 < CRD2 < C3D2 < CRD1 < C3D1. The main reasons for these results are: (1) zone-specific parameters, i.e., speed and geographical factors: each zone had a different vehicle speed affecting the FCCE amount, and the size of each zone was another influential parameter; and (2) domain-specific parameters: the cost of the depots in each zone was the most significant factor determining the selection of depots to serve clients. Therefore, logistics companies should analyze the effect of the client and depot distribution on the Pareto solutions, as the calculation of costs, service duration, fuel consumption, carbon emissions, and client satisfaction is based on the client and depot locations.
Section 5.6 analyzed the effects of fleet composition. From the results, we conclude that a heterogeneous fleet can reduce logistics costs, service duration, fuel consumption, and carbon emissions while improving client satisfaction. Section 5.7 analyzed the effects of the speed zone area; we found that a best ratio range may exist for the speed zones and should be analyzed and determined, which is important for city planning driven by economic and population development. Moreover, it is important for logistics enterprises to decide whether to entrust deliveries to other logistics providers (such as local logistics companies) at a low entrusting cost instead of building or renting facilities (depots and vehicles) at a higher fixed cost.
In the experiments, we analyzed the efficiency of the proposed general framework, i.e., the MOHH. The results showed that the proposed algorithms effectively tackled the proposed problem and outperformed several well-known MOEAs. Moreover, we analyzed the effects of domain-specific parameters, such as the depot and client locations, fleet composition, and speed zone area, on the Pareto solution indicators, i.e., IGD, HV, and RNI. We also provided several management and service suggestions to help reduce total costs (including the FCCE cost) and service duration, as well as increase client satisfaction.

Conclusions
In this paper, we presented a MOHH algorithm to solve the MORLCLRP, considering simultaneous pickup and delivery, hard time windows, and a heterogeneous fleet. In the problem domain, we formulated a multiobjective mathematical model that simultaneously minimizes service duration, client waiting time, and total logistics costs, where the latter is defined with respect to the total costs of renting depots and vehicles as well as FCCE. In the algorithm domain, we proposed a MOHH with simple random, fitness-rate-rank-based multi-armed bandit, choice function, and quantum-inspired selection as the high-level strategies, and all moves, great deluge, and late acceptance as the acceptance criteria. Moreover, we provided a general framework of MOEAs used to perform the comparative analysis.
In regard to the study aims, we: (1) verified the efficiency of the proposed algorithms; (2) comparatively analyzed the performance of the 12 pairs; and (3) analyzed the effects of domain-specific parameters on the performance indicators. The first experiment verified our proposed algorithms against eight well-known MOEAs. The second experiment showed that QS-LA, SR-LA, and QS-AM performed best among the 12 pairs. The third experiment evaluated the impact of the problem parameters on the Pareto solutions, from which several conclusions could be drawn:

•	Although method synthesis may improve an algorithm's performance, it is important to choose the methods carefully and to monitor the performance of each; moreover, the LLHs strongly affect the overall performance, so an analysis should be conducted before constructing the pool of low-level heuristics;
•	The HLHs are also important to the algorithm's performance, and a poorly designed HLH, such as FRR-MAB or GDA, may perform worse than simple random selection;
•	The depot and client locations have significant impacts on the logistics costs, client satisfaction, and service duration. Before determining the set of depots to open and tracing the routes, the joint effects of the depot and client locations should be analyzed. In the context of this paper, the joint effect was: C1DR > C1D2 > C1D1 > C1D3 > C2DR > C2D3 > C2D2 > C3D3 > C2D1 > C3DR > CRDR > CRD3 > CRD2 > C3D2 > CRD1 > C3D1 (priority sequence). Moreover, we also analyzed the preferences of the clients and depots in each speed zone and found DR/D3 (C1→C2→C3→CR), D1/D2 (C1→C2→CR→C3), CR/C2 (DR→D3→D2→D1), C1 (DR→D2→D1→D3), and C3 (D3→DR→D2→D1), where A(B→C) indicates that A prefers both B and C, with B preferred over C;
•	The fleet composition is another factor affecting the logistics network; from the results, the heterogeneous fleet always obtained better Pareto solutions;
•	The zone areas affect the depot and client locations, and how the speed zones are partitioned (i.e., the best ratio of the speed zones) and where the depots and clients are located determine, to some extent, the logistics network. Therefore, governments and logistics companies should optimize the speed zone areas for their economic, environmental, and social effects.
However, as ours is a multiobjective model, the ratio of the fuel consumption and CE costs is difficult to analyze. Moreover, although hyperheuristics aim at generality, the performance of the LLHs significantly affects the overall performance; future work will therefore focus on the development of MOHH-II and may seek to improve the performance of multiobjective hyperheuristics by designing more efficient high-level strategies. We may also consider the uncertainty of the input parameters via fuzzy methods [97], stochastic programming [98,99], and practical applications [6,100] to bring the problem closer to reality.