A Hybrid Estimation of Distribution Algorithm for Multi-Objective Multi-Sourcing Intermodal Transportation Network Design Problem Considering Carbon Emissions

The increasing concern on global warming is prompting transportation sector to take into account more sustainable operation strategies. Among them, intermodal transportation (IT) has already been regarded as one of the most effective measures on carbon reductions. This paper focuses on the model and algorithm for a certain kind of IT, namely multi-objective multi-sourcing intermodal transportation network design problem (MO_MITNDP), in which carbon emission factors are specially considered. The MO_MITNDP is concerned with determining optimal transportation routes and modes for a series of freight provided by multiple sourcing places to find good balance between the total costs and time efficiencies. First, we establish a multi-objective integer programming model to formulate the MO_MITNDP with total cost (TTC) and maximum flow time (MFT) criteria. Specifically, carbon emission costs distinguished by the different transportation mode and route are included in the cost function. Second, to solve the MO_MITNDP, a hybrid estimation of distribution algorithm (HEDA) combined with a heterogeneous marginal distribution and a multi-objective local search is proposed, in which the from the Pareto dominance scenario. Finally, based on randomly generated data and a real-life case study of Jilin Petrochemical Company (JPC), China, simulation experiments and comparisons are carried out to demonstrate the effectiveness and application value of the proposed HEDA.


Introduction
As one of the most important logistics activities, freight transportation which happens throughout the whole production circulation process from material purchase to finished product sale is crucial to the economic growth and market trade.With the rapid growth of global purchasing, transportation has played an increasingly important role in world resource integrations.On the other hand, due to the downsides such as energy consuming and air pollution, transportation is always located at the crossroads of economic and environmental concerns.For example, as pointed by Environmental Protection Agency (USA), transportation is the second largest contributor of carbon emission, causing about 28% of the total emissions in US in the past years [1].Accumulation of greenhouse gas in the atmosphere, especially CO 2 , has been a main source of global warming and attracts increasing concerns by different sectors [2].For example, the European Commission announced that the major objective corresponds to transportation sector is to reduce the carbon emissions at least 60% of 1990 levels by 2050 [3].In the past years, carbon emissions from other transportation modes have experienced a relatively slight decreasing while the road transportation is keeping a rapid growth.Road transportation is still the dominant mode in inland freight transportation system, and meanwhile it is also a carbon intensive mode which contributes two-thirds of total carbon emissions in transportation sector [4].With the growth of transportation volume and economic, Zheng and Wang 2010 has forecasted that carbon emissions from the road freight transportation will keep a continual increase by 2020 globally [5].In China, the current energy structure is dominated by coal (about 64%) and oil (about 18%) [2], which also has been seen as the hardest obstacle to the low-carbon transportation.In view of these, different measures and regulations have been established, such as developing new energy technologies, prompting green transportation equipment, issuing relevant laws, etc.According to Benjaafar et al., 2013 [6], mode selection and route design are of great importance for low-carbon transportation systems, which might be more significant than policies, regulations, laws established by national governments at different levels or international organizations.In this regard, intermodal transportation (IT) has been put on the agenda by many countries and regions, as well as been promoted as a highly desirable measure with respect to improving the resource, environment and ecology sustainability.
Unlike other conventional transportation patterns, in which the transportation modes are independently operated without any coordination, IT aims at integrating more than one mode together such that it can provide more flexible "door to door" service according to the personalized requirements of customers [7].The optimization of IT is essentially to provide a satisfied solution for the intermodal transportation company with reducing transportation cost, transportation time and carbon emissions simultaneously.For carbon emissions reduction, it might be explained that the heavier emission modes are replaced by the lighter ones [8,9].In general, there are three kinds of modes in IT, i.e., waterway, railway and road, and the former two usually contribute less carbon emissions than the latter.However, intermodal transportation systems have an intractable drawback that it is more likely to result in a larger transportation distance than under a single transportation mode [10].Since the costs and carbon emissions are mainly from the transportation process, the increased transportation distance (under intermodal transportation situation) may not actually benefit the optimization of the whole transportation systems.Moreover, with the development of the diversified market, the transportation companies are faced with new challenges to meet different preferences as some customers expect a lower cost while some others focus on a fast delivery.In view of the aforementioned observations, the operating decision of IT is in fact a very complex issue.The complex transportation system and resource of freight, the preference of the customers and environmental sustainability (low carbon emissions) ought to be considered to some extent.
This paper focuses on the multi-objective multi-sourcing intermodal transportation network design problem (MO_MITNDP) with the aim to find a good trade-off between the transportation cost (TC), maximum flow time (MFT), and carbon emissions (CE).Among them, TC, MFT and CE are associated with economic, efficiency, and societal factors respectively.Furthermore, we have taken in account two kinds of technique constraints, i.e., constraints of transportation modes and constraints of each node's capacity which are realistic existing to make our model closer to the real-life intermodal transportation situations.Besides, the constraint for each node's capacity is also a strategy to balance the transportation pressure among the stations, leading to a better sustainability of the transportation system.
To formulate the MO_MITNDP, several sourcing places and a single destination are considered.Indeed, this framework is inspired by the well-known complexity of supply chains in the increasing global purchasing and manufacturing environments [11], which has been widely used in many practical transportation systems, such as petrochemical, steel, pharmaceutical industries, etc.The freight transportation company has acted as a freight integrator to transport freights from the sourcing places to the purchasing destination.On the whole, the overall intermodal transportation network is divided into several stages that consist of stations, ports or logistic centers, which are particularly grounded on the geographical characteristics and real intermodal infrastructure construction, such as business types and service radiant scopes.Meanwhile, it is assumed that the freights are irreversibly transported from the current stage to the following one without dismounting, which is also a typical assumption in the intermodal transportation problem (Steadieseifi et al., 2014) [12].The crossing logistics are considered as the business handling capacity constraints for each node of a certain stage when different freights from the corresponding sourcing places are assembled in the same node in a period of time.These constraints are quite close to the real-life situations and significant for balancing the logistics pressures among the stations, as well as reducing the idle transportation resources.In other words, such constraints can disperse the transportation pressure in the station of high-density traffic to the others belong to the same stage, which will further alleviate the city's traffic congestion and air pollution.The aim of this paper is to determine both the route and mode selection for freight derived from different sourcing places.The main contributions of this paper are listed as follows.
Firstly, the MO_MSITNDP is formulated as a multi-objective integer programming model.As for the network structure, the entire intermodal transportation network is divided into several stages according to the geography and social factors.Meanwhile, the capacity constrain of each intermediate node is considered to make the solution reliable in practice and balance the logistics pressure.
Secondly, a modified estimation of distribution algorithm (EDA), namely hybrid EDA (HEDA) is proposed to address the mathematical model, which is the first time to apply EDA-based meta-heuristic for MO_MITNDP.Based on the features of MO_MITNDP, we specially design the improvement strategies to enhance the effectiveness and robustness of the proposed method in multi-objective handling capacity.
Thirdly, we perform a series of numerous experiments and comparisons based on randomly generated instances to evaluate the effectiveness of HEDA.To further demonstrate the potential application value of HEDA, a case study of Jilin Petrochemical Company (JPC), China, is carried out such that we can obtain the related management insights from the real-world perspectives.
The remainder of the paper is organized as follows.Section 2 reviews the relevant literature.In Section 3, the mathematical model of MO_MITNDP is described in detail.Following a detailed description of the proposed HEDA in Section 4, we conduct numerous experiments and comparisons with other algorithms in Section 5, and a case study analysis is performed as well.Finally, conclusions together with future works are presented in Section 6.

Literature Review
Over the past decades, the IT has attracted many attentions as a cost-effectiveness and environment sustainability alternatives.The existing works can be classified as strategy planning level (Meng and Wang, 2011 [13]; Limbourg and Jourquin, 2009 [14]), tactical planning level (Lam and Gu, 2016 [7]; Verma et al., 2012 [15]), and operational planning level (Francesco et al., 2013 [16]; Bandeira et al., 2009 [17]) according to the decision horizon of the planning problems, and more works can refers to SteadieSeifi et al., 2014 [12].A majority of the works on ITs consider the cost or time as the single objective and discuss cost-saving or time-efficiency ability compared to the unimodal freight transportation (Bouchery and Fransoo, 2014 [3]).However, the published works regarding cost and time as the bi-objective optimization problem or multiple objectives are limited.Yang et al., 2011 [18] presented a multi-objective optimization model considering the total cost, transportation time and time variability to elevate the competitiveness of 36 candidate routes from China to Indian Ocean.Resat and Turkay, 2015 [19] established a bi-objective mixed-integer programming model with minimizing the total cost and time to determine the transportation modes in the designed intermodal network.Domuta et al., 2012 [20] proposed a modified bi-objective Martines' Algorithm to find the optimal route in the intermodal transportation network with the time window and the objective is to minimize the travel time and cost.Lam and Gu, 2016 [7] formulated the port hinterland intermodal transportation network design as a bi-objective optimization problem with minimizing the transit cost and time.
In view of the green logistics, carbon emissions have been an increasing concern in the transportation sector [7].Nevertheless, few current studies consider carbon emissions in the field of intermodal transportation network.In order to estimate the trade-offs between the transportation cost and carbon emission in the intermodal network and show the modification of the mode and route choice with the changing of the trade-offs, Namseok et al., 2009 [21] developed a decision-support method using the multi-objective optimization.Rudi et al., 2016 [22] established a multi-commodity intermodal network flow model considering the in-transit inventory to make route choices.The carbon emission factor is considered in the objective function which is to minimize the total truckload with the transportation cost, time and carbon emission criteria.Demir et al., 2015 [23] used the sample average approximation method to solve the stochastic intermodal network design problem.The method is applied to provide solutions with the different combination of weights corresponding to the direct transportation, carbon emission-related and time-related cost in the objective function.Considering the cost and carbon emission optimization situations, Bouchery and Fransoo, 2014 [3] presented an intermodal network optimization model to provide decisions for both the terminal location and allocation while Lam and Gu, 2016 [7] studied the carbon emissions in the green intermodal network as a constraint with variable levels.
In fact, comparing with single objective, multi-objective optimization can made trade-offs for conflict sub-objectives to satisfy practical fields, which has received wider applications in decision making (Zheng et al., 2015) [2].In addition, the increasing concerns on the environmental sustainability have promoted it a hotspot to reduce the carbon emission in transportation sector (Lee et al., 2017) [24].In that regard, The MO_MITNDP considers the cost and time as the objectives, which could help to provide flexible solutions for the decision makers to satisfy the diversified market.
The generation of optimal routes is a crucial part of the intermodal transportation network and efficient routes can achieve great resources saving and improve the service quality (Kılıç and Gök, 2014) [25].According to Verma and Verter, 2010 [26], the main contribution of rail-truck intermodal transportation is the reliability embodied in on-time delivery.The authors make the first attempt to develop a framework of rail-truck intermodal transportation for hazardous goods, in which the route selection is driven by the lead time from diversified customers.Considering the cooperation of inland terminals simulated by a discrete event model, An et al., 2010 [27] designed a service network for the intermodal barge transportation to select the best route with the goal to saving cost.Similarly, Braekers et al., 2013 [28] presented a decision support model for intermodal barge transportation network which involves a major seaport and several hinterland ports to determine the optimal shipping routes aiming to improving the overall profits.Ishfaq and Sox, 2012 [29] integrated an operation model and a location-allocation model for intermodal hub.In their designed hub logistics network, arcs that represent different transportation modes are alternative to increase cost benefits.In this paper, the intermediate nodes in the network are the first time to be divided into several stages grounded on the geographical characteristic and real intermodal infrastructure construction.
The ITs faced by the freight integrator is generally a NP-hard problem in which the routes selection, modes combination, terminal location or container scheduling are involved (Li et al., 2015) [30] and huge computational efforts are needed.The meta-heuristic algorithms have an outstanding performance in the combinatorial optimization and have been successfully used in the intermodal freight transportation optimization.Kılıç and Gök, 2014 [25] formulated the problem as an initial route programming for urban transportation network, and a route generation algorithm is designed to find solutions with objectives reflecting the interests of stakeholders.To handle the inland container transportation problem considering hard time constraints, Sterzik and Kopfer, 2013 [31] established a comprehensive model which contains vehicle routing and empty repositioning and a Tabu Search heuristic is used to minimizing the total operation time.Nossack and Pesch, 2013 [32] addressed a truck scheduling problem with time window in intermodal container transportation by a two-stage heuristic.The first stage is route construction heuristic which determines a feasible initial solution for the problem; the second stage is route improvement heuristic which improves solution quality by a neighborhood operator.Genetic algorithm is used in Zhang et al., 2015 [33] to effectively save time and cost.As a novel swarm intelligence algorithm based on probability model, EDA has been a research hotspot in the area of intelligent computing (Wang et al., 2012 [34]; Wang et al., 2016 [35]).Contrast with genetic algorithm, there is no mutation and crossover operators for EDA, whereas a competitive mechanism is adopted in EDA to estimate the probability model according to excellent individuals found in the search process.Thereafter, the new population can be generated through sampling the estimated probability model.Although EDA has been successfully used in various combinatorial optimization problems, there is no research works on EDA for ITs.Essentially, using EDA-based methods to solve MO_MSITNDP is equivalent to confirming the matching probability of the transportation modes and intermediate nodes associated with given freight.Finally, the optimal or approximate optimal relationship will be found.In this regard, this paper engages in raising the EDA-based solution method for dealing with MO_MSITNDP.

Formulation of Proposed Mathematical Model
MO_MSITNDP considers multiple sourcing places and a single purchasing destination.Denote N the total number of sourcing places and M the total number of stages, then the scale of MO_MSITNDP can be defined as N × M. As shown in Figure 1, there are totally M + 1 predefined transportation stages (i.e., stage 0, stage 1, . . ., stage M) in the network, where stage 0 is a virtual point without intermediate nodes.The transport request A 1 , . . ., A N deriving from the corresponding sourcing places 1, . . ., N will be responded by transporting the freights from segment 1 to M (i.e., destination B).During the transportation process, the transport requests A i (i = 1, . . .N) are processed independently without dismounting or consolidation.In this case, there exist crossing logistics in the intermediate nodes owing to the share of intermodal terminal, and the capacity constraint of each node must be considered.All the freights must be irreversibly transported from the stage 1 to M in sequence.The changes of transportation modes only occur on the stages rather than transportation process.Next, we will propose the multi-objective integer programming model of MO_MSITNDP.
Sustainability 2017, 9, 1133 5 of 24 optimization problems, there is no research works on EDA for ITs.Essentially, using EDA-based methods to solve MO_MSITNDP is equivalent to confirming the matching probability of the transportation modes and intermediate nodes associated with given freight.Finally, the optimal or approximate optimal relationship will be found.In this regard, this paper engages in raising the EDAbased solution method for dealing with MO_MSITNDP.

Formulation of Proposed Mathematical Model
MO_MSITNDP considers multiple sourcing places and a single purchasing destination.Denote N the total number of sourcing places and M the total number of stages, then the scale of MO_MSITNDP can be defined as N M  .As shown in Figure 1 are processed independently without dismounting or consolidation.In this case, there exist crossing logistics in the intermediate nodes owing to the share of intermodal terminal, and the capacity constraint of each node must be considered.All the freights must be irreversibly transported from the stage 1 to M in sequence.The changes of transportation modes only occur on the stages rather than transportation process.Next, we will propose the multi-objective integer programming model of MO_MSITNDP.
In our model, the objective functions to be minimized consist of the total cost (TTC) and maximum flow time (MFT).Furthermore, the TTC includes three parts, i.e., transportation cost, transfer cost of transportation mode and carbon emissions cost, thus our solutions can reflect the economic benefit, time efficiency and environmental sustainability (society) criteria.The MFT is the maximum completion time of all freights, so a lower MFT indicates a higher efficiency of the whole systems.Accordingly, our ultimate objective is to find a good trade-off among the economy, efficient and sustainability sub-objectives, and these three criteria are usually in conflict with each other.The optimization of IT is essentially to provide a satisfied solution for the intermodal transportation company with reducing the transportation cost, transportation time and carbon emissions simultaneously.In our model, the objective functions to be minimized consist of the total cost (TTC) and maximum flow time (MFT).Furthermore, the TTC includes three parts, i.e., transportation cost, transfer cost of transportation mode and carbon emissions cost, thus our solutions can reflect the economic benefit, time efficiency and environmental sustainability (society) criteria.The MFT is the maximum completion time of all freights, so a lower MFT indicates a higher efficiency of the whole systems.Accordingly, our ultimate objective is to find a good trade-off among the economy, efficient and sustainability sub-objectives, and these three criteria are usually in conflict with each other.The optimization of IT is essentially to provide a satisfied solution for the intermodal transportation company with reducing the transportation cost, transportation time and carbon emissions simultaneously.

Parameters:
M : An infinite number; Q i : Amount of freight from sourcing place i, i ∈ {1, . . . ,N}; R j : Set of available nodes of stage j, j ∈ {1, . .Carbon emission cost of freight i from the lth node in stage j − 1 to the hth node in stage j under the transportation mode w, i ∈ {1, . . . ,N}, w ∈ S j , l ∈ R j−1 , h ∈ R j ; C w i (j − 1, j, l, h) : Transportation cost of freight i from the lth node in stage j − 1 to the hth node in stage j under the transportation mode w, i ∈ {1, . . . ,N}, w ∈ S j , l ∈ R j−1 , h ∈ R j ; T w i (j − 1, j, l, h) : Transportation time of freight i from the lth node in stage j − 1 to the hth node in stage j under the transportation mode w, i ∈ {1, . . . ,N}, j ∈ {1, . . . ,M}, w ∈ S j , Switch cost of freight i from transportation mode w to v for the kth node of stage j, i ∈ {1, . . . ,N}, j ∈ {1, . . . ,M − 1}, w ∈ S j , v ∈ S j+1 , k = 1, . . ., R j ; Switch time of freight i from transportation mode w to v for the kth node of stage j, i ∈ {1, . . . ,N}, j ∈ {1, . . . ,M − 1}, w ∈ S j , v ∈ S j+1 , k = 1, . . ., R j ; and U w (j − 1, j) = 1if transportation mode w exists between stage j − 1 and stage j 0 otherwise .

Decision variables:
x 1 if transportation mode w is selected for freight i from the lth tnode in stage j − 1 to the hth node in stage j 0 otherwise ; 1 if transportation mode is transfered from w to v for freight i in the kth node of stage j 0 otherwise ; st ij the start time at the stage j for freight i; and ct ij the leave time at the stage j for freight i.
Based on the aforementioned definitions, we detail the mathematical model of MO_MSITNDP as follows.
x w i (j, j + 1, g, l) x w i (j, j + 1, l, h) ≤ U w (j, j + 1); ∀i ∈ {1, . . ., N}; j ∈ {0, . . ., M − 1} , w ∈ S j+1 ; st ij ≥ ct i,j−1 ; ∀i ∈ {1, . . ., N}, j ∈ {1, . . ., M}; Equations ( 1) and (2) are the objective functions.Equation ( 1) is to minimize the TTC which includes transportation cost, transfer cost of transportation mode, and carbon emission cost, whereas Equation (2) minimizes the MFT.Constraint (3) ensures that only one transportation mode can be chosen for two adjacent stages.Constraint (4) ensures the mode transferring only can be executed at the intermodal terminal.Constraint (5) represents the capacity restrictions for each node with respect to given stage.Constraint (6) is the transportation continuity constraints for all routes.Constraint (7) is the transportation mode constraints between two adjacent stages associated with given freight and nodes.Equations ( 8) and ( 9) are the definitions of arriving times and departing times with regard to given freight, stages, and selected nodes.Constraint (10) implies that the overall structure of MO_MSITNDP is a directed acyclic graph from the time perspective.Constraints (11)- (15) are the ranges of values for variables.

Modification of Established Model
In view of the aforementioned model, the MO_MSITNDP is essentially a bi-objective optimization problem with capacity constraints, i.e., Constraint (5).Thus, we should introduce relevant constraints handling techniques into HEDA to realize its global search.Unlike single objective optimization problem, the traditional penalty function method may be unusable for multi-objective optimization scenario.In addition, since the solution representation of MO_MSITNDP is a matrix which combines transportation modes with intermediate nodes together, the repair-based constraints handling techniques are more likely to result in a lower efficiency of computation, especially for large scale or strong constraints conditions.Owing to such intractability, we apply the multi-objective-based constraints handling approach [36,37], in which the degree of constraint violation is regarded as an additional objective function.In this regard, the Constraints (5) can be transformed to the third objective function, which denotes the total degree of constraint violation with regard to all intermediate nodes as shown by Equation ( 16): Afterwards, the modification of the established model can be given as follows: where the first objective functions f 1 and f 2 are the same as Equations ( 1) and (2).Except for removing Constraint (5), other constraints of this modified version keep unchanged as the previous model.In this work, HEDA deals with the modified version model in scene of the Pareto dominance.The criteria f 1 and f 2 are the ultimate optimization objective associated with the MO_MSITNDP, whereas the additionally introduced objective function f 3 is utilized to guide the search direction from the feasibility perspectives.

Solution Representation
Since the intermediate nodes and transportation modes need to be determined simultaneously, we specially design a matrix π = π ij (i = 1, . . ., N; j = 1, . . ., 2M − 1) to represent MO_MSITNDP with the scale N × M, in which the element π ij denotes the index of selected node (if jmod2 = 0) or transportation mode (if jmod2 = 0) of the ith sourcing place.In addition, since freight i (i = 1, . . ., N) need to undergo M nodes and M segments in turns after departing from its sourcing places, the sourcing places (i.e., stage 0) and the common destination (i.e., stage M) are not necessary to be considered.
Here, we take a simple example with the size N × M = 4 × 5 to illustrate this solution representation approach.Considering the matrix Sustainability 2017, 9, 1133 8 of 24 Afterwards, the modification of the established model can be given as follows: where the first objective functions 1 f and 2 f are the same as Equations ( 1) and (2 , which indicates that freight 1 selects the transportation mode 1, 2, 1, 1 and 2 from segment 1 to 5 (marked by circles) and selects the intermediate node 3, 4, 3 and 1 in terms of stage 1 to 4, and so on.

Multi-Objective Handling Method
To store the non-dominated solutions (i.e., approximate Pareto set), a Pareto archive ( PA) is introduced.For a multi-objective optimization problem with K criteria, if we say that ) then both of the following conditions should be satisfied simultaneously: [38] (1) , and ( 2) it has at least one will be added to PA if and only if it can dominate at which indicates that freight 1 selects the transportation mode 1, 2, 1, 1 and 2 from segment 1 to 5 (marked by circles) and selects the intermediate node 3, 4, 3 and 1 in terms of stage 1 to 4, and so on.

Multi-Objective Handling Method
To store the non-dominated solutions (i.e., approximate Pareto set), a Pareto archive (PA) is introduced.For a multi-objective optimization problem with K criteria, if we say that π 1 dominatesπ 2 (i.e., π 1 ≺ π 2 ) then both of the following conditions should be satisfied simultaneously: [38] (1) f k (π 1 ) ≤ f k (π 2 ), for k = 1, . . ., K, and ( 2) it has at least one l ∈ {1, . . . ,K} that f l (π 1 ) < f l (π 2 ).Let PS denote the population size of HEDA, Nπ i (gen)(i = 1, 2, . . ., PS) the ith individual in current population at generation gen, AC(gen) the length of PA at generation gen, Pπ i (gen)(i = 1, . . ., AC(gen))the ith non-dominated solution of PA at generation gen.The newly generated individual Nπ i (gen)(i = 1, 2, . . ., PS) will be added to PA if and only if it can dominate at least one solution in current PA, and, meanwhile, the solution that is dominated by newly generated individual will be removed from PA.Here, we propose the update procedure of PA as shown in Figure 2.

The Proposed Heterogeneous Probability Model and Update Mechanism
When using EDA to solve the discrete combinatorial optimization problems, the probability model is generally described as joint probability distribution function with independent random variables.Considering the inherent features of MO_MSITNDP, a heterogeneous marginal distribution law (HMDL) is put forward for serving as HEDA's probability model, which is utilized to represent the distribution of promising solution regions.The probability model ( ) gen pr can be formulated as follows: where is the maximum value of transportation modes,

( )
i sj p gen is the probability of transportation mode s to be selected in the jth segment for the ith freight at generation gen, and r gen is the probability of selecting the rth node in stage j for the ith freight at generation gen.Since there exists transportation mode constraint between stage 1 j  and j (for the relate probability should be set to 0 if

., w m axS 
).Therefore, the probability model can be initialized as follows:

The Proposed Heterogeneous Probability Model and Update Mechanism
When using EDA to solve the discrete combinatorial optimization problems, the probability model is generally described as joint probability distribution function with independent random variables.Considering the inherent features of MO_MSITNDP, a heterogeneous marginal distribution law (HMDL) is put forward for serving as HEDA's probability model, which is utilized to represent the distribution of promising solution regions.The probability model pr(gen) can be formulated as follows:

The Proposed Heterogeneous Probability Model and Update Mechanism
When using EDA to solve the discrete combinatorial optimization problems, the probability model is generally described as joint probability distribution function with independent random variables.Considering the inherent features of MO_MSITNDP, a heterogeneous marginal distribution law (HMDL) is put forward for serving as HEDA's probability model, which is utilized to represent the distribution of promising solution regions.The probability model ( ) gen pr can be formulated as follows: where is the maximum value of transportation modes,

( )
i sj p gen is the probability of transportation mode s to be selected in the jth segment for the ith freight at generation gen, and r gen is the probability of selecting the rth node in stage j for the ith freight at generation gen.Since there exists transportation mode constraint between stage 1 j  and j (for the relate probability should be set to 0 if

., w m axS 
).Therefore, the probability model can be initialized as follows: where maxS = max j∈{1,...,M} S j is the maximum value of transportation modes, p i sj (gen) is the probability of transportation mode s to be selected in the jth segment for the ith freight at generation gen, and r i rj (gen) is the probability of selecting the rth node in stage j for the ith freight at generation gen.Since there exists transportation mode constraint between stage j − 1 and j (for j = 1, . . ., M), the relate probability should be set to 0 if u w (j − 1, j)=0 (for j = 1, . . ., M, w = 1, . . ., maxS).Therefore, the probability model can be initialized as follows: Probability model update is one of the crucial operators for EDA [32], which is used to accumulate the historical information of excellent individuals found during the search process.Thereafter, the new populations are generated by sampling the updated probability model.As mentioned above, PA is dynamically updated by new generated individuals according to the dominance rule (see algorithm in Figure 1), so that PA can be seen as a set of excellent individuals.In view of this, we apply the non-dominated solutions that located in PA to update the probability model.Denote LR the learning rate of HEDA and x, y the temporary variables, then the probability model update method is given in Figure 3.
Sustainability 2017, 9, 1133 10 of 24 Probability model update is one of the crucial operators for EDA [32], which is used to accumulate the historical information of excellent individuals found during the search process.Thereafter, the new populations are generated by sampling the updated probability model.As mentioned above, PAis dynamically updated by new generated individuals according to the dominance rule (see algorithm in Figure 1), so that PA can be seen as a set of excellent individuals.In view of this, we apply the non-dominated solutions that located in PA to update the probability model.Denote LR the learning rate of HEDA and x y , the temporary variables, then the probability model update method is given in Figure 3.
Algorithm 2: Update the probability model.

Randomly select an individual
( [ ]

New Population Generation Method
HEDA generates new population by sampling the updated probability model (i.e., HMDL).In the sampling procedure, the new generated individuals should be consistent with the HMDL to guide HEDA's global search direction, and, meanwhile, to keep the population diversity, relevant random factors may need to be considered.That is, we tend to select a random variable in HMDL with larger probability for producing new individuals, whereas ones corresponding to relative smaller probability also have a chance to be selected.In this regard, we propose a sampling approach based on roulette wheel selection, for which the selection probability of a random variable is proportional to its probability in HMDL.Denote

New Population Generation Method
HEDA generates new population by sampling the updated probability model (i.e., HMDL).In the sampling procedure, the new generated individuals should be consistent with the HMDL to guide HEDA's global search direction, and, meanwhile, to keep the population diversity, relevant random factors may need to be considered.That is, we tend to select a random variable in HMDL with larger probability for producing new individuals, whereas ones corresponding to relative smaller probability also have a chance to be selected.In this regard, we propose a sampling approach based on roulette wheel selection, for which the selection probability of a random variable is proportional to its probability in HMDL.Denote FP i j (gen) =[FP i 1,j (gen), . . ., FP i maxS,j (gen),FP i maxS+1,j (gen)] T (j = 1, . . ., M) the jth marginal distribution function (MDF) of probability matrix P i at generation gen, FR i j (gen) = [FR i 1,j (gen), . . ., FR i |R j |,j (gen), FR i |R j |+1,j| (gen)] T (j = 1, . . ., M − 1) the jth MDF of probability matrix R i at generation gen, and x, y, rnd the temporary variables.Then, the procedure of calculating MDF and generating new population are shown in Figures 4 and 5.

Problem-Dependent Local Search
As mentioned above, HEDA may pay little attention to the local search during its iterative process.Hence, it would be highly advisable to embed problem-dependent local search into HEDA to exploit more promising solutions in relatively narrow regions of solution space.Since the solution representation of MO_MSITNDP is a N × M matrix, traditional neighborhood structures, such as Insert, Swap, Inverse, etc. [39], may be unsuitable.In this work, we propose a new neighborhood structure named partially interchanged neighborhood (PIN), in which two random positions c 1 , c 2 ∈ {1, . . . ,2M − 1}, c 1 < c 2 are selected first and then we interchange the partial elements between c 1 and c 2 associated with two randomly selected r 1 and r 2 , r 1 , r 2 ∈ {1, . . . ,N}, r 1 = r 2 .The mechanism of PIN can be illustrated by a simple example, as shown in Figure 6.

Problem-Dependent Local Search
As mentioned above, HEDA may pay little attention to the local search during its iterative process.Hence, it would be highly advisable to embed problem-dependent local search into HEDA to exploit more promising solutions in relatively narrow regions of solution space.Since the solution representation of MO_MSITNDP is a N M  matrix, traditional neighborhood structures, such as Insert, Swap, Inverse, etc. [39], may be unsuitable.In this work, we propose a new neighborhood structure named partially interchanged neighborhood (PIN), in which two random positions  Thereafter, we perform the PIN-based local search for all non-dominated solutions in PA, and meanwhile the first-move rule is adopted in order to avoid excessive exploitation.That is, once a relatively better solution is found from the neighborhood of current solution, the local search will be terminated.Let

Procedure of HEDA
Let maxGen denote the maximum number of iteration for HEDA.Based on aforementioned subsections, the procedure of HEDA is shown in Figure 8.Thereafter, we perform the PIN-based local search for all non-dominated solutions in PA, and meanwhile the first-move rule is adopted in order to avoid excessive exploitation.That is, once a relatively better solution is found from the neighborhood of current solution, the local search will be terminated.Let β = (π) denote a neighborhood solution associated with performing PIN operator one time on π, cnt the temporary variables.Then, the proposed local search operator is given in Figure 7.

Problem-Dependent Local Search
As mentioned above, HEDA may pay little attention to the local search during its iterative process.Hence, it would be highly advisable to embed problem-dependent local search into HEDA to exploit more promising solutions in relatively narrow regions of solution space.Since the solution representation of MO_MSITNDP is a N M  matrix, traditional neighborhood structures, such as Insert, Swap, Inverse, etc. [39], may be unsuitable.In this work, we propose a new neighborhood structure named partially interchanged neighborhood (PIN), in which two random positions  Thereafter, we perform the PIN-based local search for all non-dominated solutions in PA, and meanwhile the first-move rule is adopted in order to avoid excessive exploitation.That is, once a relatively better solution is found from the neighborhood of current solution, the local search will be terminated.Let ( ) denote a neighborhood solution associated with performing PIN operator one time on  , cnt the temporary variables.Then, the proposed local search operator is given in Figure 7. Algorithm

Procedure of HEDA
Let maxGen denote the maximum number of iteration for HEDA.Based on aforementioned subsections, the procedure of HEDA is shown in Figure 8.

Procedure of HEDA
Let maxGen denote the maximum number of iteration for HEDA.Based on aforementioned subsections, the procedure of HEDA is shown in Figure 8.

Computational Results and Comparisons
To evaluate the performance of HEDA, we first perform computational experiments and comparisons based on randomly generated instances.Then, a real-life case study of Jilin Petrochemical Company (JPC), China, is carried out to demonstrate the application value of the proposed HEDA.

Experimental Setup
Since there is no benchmark data for MO_MSITNDP, we consider fifteen different combinations of N M  that include 5 , transportation time , ]; and (6) the handling capacity is calculated as

Computational Results and Comparisons
To evaluate the performance of HEDA, we first perform computational experiments and comparisons based on randomly generated instances.Then, a real-life case study of Jilin Petrochemical Company (JPC), China, is carried out to demonstrate the application value of the proposed HEDA.
We independently run all algorithms 20 times, and the results are all based on the average level of 10 instances with respect to the related N × M combinations.In the results, MIN, MAX, AVG and SD represent the minimum, maximum, average and standard deviation, respectively, corresponding to the related performance metrics.The unique terminated condition for all algorithms is the maximal evaluation times of objective functions, which is set to 1000 × N × M. All algorithms are coded in C++ in Microsoft Visual Studio 2008 environment, and all the simulations are conducted on a PC with Intel Core-i5 3.30 GHz processor with 4 GB memory.

Performance Metrics
To systematically verify the performance of HEDA, three performance metrics [40] are adopted as follows: (1) Overall non-dominated solutions number (ONSN)

Performance Metrics
To systematically verify the performance of HEDA, three performance metrics [40] are adopted as follows: (1) Overall non-dominated solutions number (ONSN)  is the union of non-dominated solutions with regard to 1,..., ,..., Algorithm l L , and then ( ) ONSN  is the number of non-dominated solutions in l  which are not dominated by solutions in   as shown in Equation (20).The larger the value of ( ) l ONSN  is, the better performance of Algorithm l. (

2) Ratio of non-dominated solutions (RNDS)
Denote the ratio of solutions which are not dominated by solutions in   , shown as following Equation ( 21).The higher the value of ( ) l RNDS  is, the better performance of Algorithm l.
(3) Average distance (DIR) is the shortest normalized distance from solution y in   to l  [40], then DIR can be defined in Equation (22).Obviously, a smaller ( ) l DIR  means a better convergence performance to   , as well as a better distribution of l  .

Comparison with Existing Algorithms
In this subsection, we compare HEDA with three algorithms: (1) HEDA_noLS, a variant of HEDA in which the PIN-based local search operator is removed, and the parameter setting is the same as HEDA; (2) NSGAII [41], a well-known non-dominated sorting multi-objective genetic algorithm, which has been widely used for various multi-objective optimization problems; and (3) PGA [31], a relatively new reported genetic algorithm for a special kind of intermodal transportation problem which is quite similar to MO_MSITNDP.It should be mentioned that all the algorithms are executed based on the modified model in Section 3.2.
For the parameters setting, we set the population size 50 PS  for all algorithms, and, meanwhile, HEDA's learning rate 0.06 LR  , the crossover and mutation probability of NSGAII and PGA are set to 0.8 and 0.2, respectively.It should be noted that we have embedded the PIN-based local search operator in to the NSGAII and PGA to ensure the fairness of comparisons.
We first make an observation on the approximate Pareto fronts with regard to the first instance of 10 5  , 20 5  , 30 7  , 40 9  and 50 13  , espectively.Furthermore, the obtained results of HEDA, HEDA_noLS, NSGAII and PGA can be seen in Figure 9.
It is can be obviously seen in Figure 9 that the solutions in Pareto set obtained by HEDA are better (both in terms of the quality and quantity of solutions) than the ones from the three other algorithms in all cases of variable problem sizes, which implies HEDA share a high superiority in solving MO_MSITNDP.Furthermore, the statistic results of these four algorithms are listed in Tables 1-4.

Performance Metrics
To systematically verify the performance of HEDA, three performance metrics [40] are adopted as follows: (1) Overall non-dominated solutions number (ONSN)  is the union of non-dominated solutions with regard to 1,..., ,..., Algorithm l L , and then ( ) ONSN  is the number of non-dominated solutions in l  which are not dominated by solutions in   as shown in Equation (20).The larger the value of ( ) l ONSN  is, the better performance of Algorithm l. (

2) Ratio of non-dominated solutions (RNDS)
Denote the ratio of solutions which are not dominated by solutions in   , shown as following Equation ( 21).The higher the value of ( ) l RNDS  is, the better performance of Algorithm l.
(3) Average distance (DIR) is the shortest normalized distance from solution y in   to l  [40], then DIR can be defined in Equation (22).Obviously, a smaller ( ) l DIR  means a better convergence performance to   , as well as a better distribution of l  .

Comparison with Existing Algorithms
In this subsection, we compare HEDA with three algorithms: (1) HEDA_noLS, a variant of HEDA in which the PIN-based local search operator is removed, and the parameter setting is the same as HEDA; (2) NSGAII [41], a well-known non-dominated sorting multi-objective genetic algorithm, which has been widely used for various multi-objective optimization problems; and (3) PGA [31], a relatively new reported genetic algorithm for a special kind of intermodal transportation problem which is quite similar to MO_MSITNDP.It should be mentioned that all the algorithms are executed based on the modified model in Section 3.2.
For the parameters setting, we set the population size 50 PS  for all algorithms, and, meanwhile, HEDA's learning rate 0.06 LR  , the crossover and mutation probability of NSGAII and PGA are set to 0.8 and 0.2, respectively.It should be noted that we have embedded the PIN-based local search operator in to the NSGAII and PGA to ensure the fairness of comparisons.
We first make an observation on the approximate Pareto fronts with regard to the first instance of 10 5  , 20 5  , 30 7  , 40 9  and 50 13  , espectively.Furthermore, the obtained results of HEDA, HEDA_noLS, NSGAII and PGA can be seen in Figure 9.
It is can be obviously seen in Figure 9 that the solutions in Pareto set obtained by HEDA are better (both in terms of the quality and quantity of solutions) than the ones from the three other algorithms in all cases of variable problem sizes, which implies HEDA share a high superiority in solving MO_MSITNDP.Furthermore, the statistic results of these four algorithms are listed in Tables 1-4.
is the number of non-dominated solutions in Sustainability 2017, 9, 1133 14 of 24

Performance Metrics
To systematically verify the performance of HEDA, three performance metrics [40] are adopted as follows: (1) Overall non-dominated solutions number (ONSN)  is the union of non-dominated solutions with regard to 1,..., ,..., Algorithm l L , and then ( ) ONSN  is the number of non-dominated solutions in l  which are not dominated by solutions in   as shown in Equation (20).The larger the value of ( ) l ONSN  is, the better performance of Algorithm l. (

2) Ratio of non-dominated solutions (RNDS)
Denote the ratio of solutions which are not dominated by solutions in   , shown as following Equation ( 21).The higher the value of ( ) l RNDS  is, the better performance of Algorithm l.
(3) Average distance (DIR) is the shortest normalized distance from solution y in   to l  [40], then DIR can be defined in Equation (22).Obviously, a smaller ( ) l DIR  means a better convergence performance to   , as well as a better distribution of l  .

Comparison with Existing Algorithms
In this subsection, we compare HEDA with three algorithms: (1) HEDA_noLS, a variant of HEDA in which the PIN-based local search operator is removed, and the parameter setting is the same as HEDA; (2) NSGAII [41], a well-known non-dominated sorting multi-objective genetic algorithm, which has been widely used for various multi-objective optimization problems; and (3) PGA [31], a relatively new reported genetic algorithm for a special kind of intermodal transportation problem which is quite similar to MO_MSITNDP.It should be mentioned that all the algorithms are executed based on the modified model in Section 3.2.
For the parameters setting, we set the population size 50 PS  for all algorithms, and, meanwhile, HEDA's learning rate 0.06 LR  , the crossover and mutation probability of NSGAII and PGA are set to 0.8 and 0.2, respectively.It should be noted that we have embedded the PIN-based local search operator in to the NSGAII and PGA to ensure the fairness of comparisons.
We first make an observation on the approximate Pareto fronts with regard to the first instance of 10 5  , 20 5  , 30 7  , 40 9  and 50 13  , espectively.Furthermore, the obtained results of HEDA, HEDA_noLS, NSGAII and PGA can be seen in Figure 9.
It is can be obviously seen in Figure 9 that the solutions in Pareto set obtained by HEDA are better (both in terms of the quality and quantity of solutions) than the ones from the three other algorithms in all cases of variable problem sizes, which implies HEDA share a high superiority in solving MO_MSITNDP.Furthermore, the statistic results of these four algorithms are listed in Tables 1-4.

Performance Metrics
To systematically verify the performance of HEDA, three performance metrics [40] are adopted as follows: (1) Overall non-dominated solutions number (ONSN)  is the union of non-dominated solutions with regard to 1,..., ,..., Algorithm l L , and then ( ) ONSN  is the number of non-dominated solutions in l  which are not dominated by solutions in   as shown in Equation (20).The larger the value of ( ) l ONSN  is, the better performance of Algorithm l. (

2) Ratio of non-dominated solutions (RNDS)
Denote the ratio of solutions which are not dominated by solutions in   , shown as following Equation ( 21).The higher the value of ( ) l RNDS  is, the better performance of Algorithm l.
(3) Average distance (DIR) is the shortest normalized distance from solution y in   to l  [40], then DIR can be defined in Equation (22).Obviously, a smaller ( ) l DIR  means a better convergence performance to   , as well as a better distribution of l  .

Comparison with Existing Algorithms
In this subsection, we compare HEDA with three algorithms: (1) HEDA_noLS, a variant of HEDA in which the PIN-based local search operator is removed, and the parameter setting is the same as HEDA; (2) NSGAII [41], a well-known non-dominated sorting multi-objective genetic algorithm, which has been widely used for various multi-objective optimization problems; and (3) PGA [31], a relatively new reported genetic algorithm for a special kind of intermodal transportation problem which is quite similar to MO_MSITNDP.It should be mentioned that all the algorithms are executed based on the modified model in Section 3.2.
For the parameters setting, we set the population size 50 PS  for all algorithms, and, meanwhile, HEDA's learning rate 0.06 LR  , the crossover and mutation probability of NSGAII and PGA are set to 0.8 and 0.2, respectively.It should be noted that we have embedded the PIN-based local search operator in to the NSGAII and PGA to ensure the fairness of comparisons.
We first make an observation on the approximate Pareto fronts with regard to the first instance of 10 5  , 20 5  , 30 7  , 40 9  and 50 13  , espectively.Furthermore, the obtained results of HEDA, HEDA_noLS, NSGAII and PGA can be seen in Figure 9.
It is can be obviously seen in Figure 9 that the solutions in Pareto set obtained by HEDA are better (both in terms of the quality and quantity of solutions) than the ones from the three other algorithms in all cases of variable problem sizes, which implies HEDA share a high superiority in solving MO_MSITNDP.Furthermore, the statistic results of these four algorithms are listed in Tables 1-4.

Performance Metrics
To systematically verify the performance of HEDA, three as follows: (1) Overall non-dominated solutions number (ONSN) Denote  is the shortest normalized distance fro then DIR can be defined in Equation (22).Obviously, a convergence performance to   , as well as a better distrib ( )

Comparison with Existing Algorithms
In this subsection, we compare HEDA with three algo HEDA in which the PIN-based local search operator is remo same as HEDA; (2) NSGAII [41], a well-known non-domi algorithm, which has been widely used for various multi-ob PGA [31], a relatively new reported genetic algorithm for a sp problem which is quite similar to MO_MSITNDP.It should be executed based on the modified model in Section 3.2.
For the parameters setting, we set the population si meanwhile, HEDA's learning rate , espectively.HEDA, HEDA_noLS, NSGAII and PGA can be seen in Figure It is can be obviously seen in Figure 9 that the solution better (both in terms of the quality and quantity of solution algorithms in all cases of variable problem sizes, which impl solving MO_MSITNDP.Furthermore, the statistic results of th 1-4.
is, the better performance of Algorithm l.

Performance Metrics
To systematically verify the performance of HEDA, three performance metrics [40] are adopted as follows: (1) Overall non-dominated solutions number (ONSN)  is the union of non-dominated solutions with regard to 1,..., ,..., Algorithm l L , and then ( ) ONSN  is the number of non-dominated solutions in l  which are not dominated by solutions in   as shown in Equation (20).The larger the value of ( ) ONSN  is, the better performance of Algorithm l. (

2) Ratio of non-dominated solutions (RNDS)
Denote the ratio of solutions which are not dominated by solutions in   , shown as following Equation ( 21).The higher the value of ( ) RNDS  is, the better performance of Algorithm l.
(3) Average distance (DIR) is the shortest normalized distance from solution y in   to l  [40], then DIR can be defined in Equation (22).Obviously, a smaller ( ) l DIR  means a better convergence performance to   , as well as a better distribution of l  .

Comparison with Existing Algorithms
In this subsection, we compare HEDA with three algorithms: (1) HEDA_noLS, a variant of HEDA in which the PIN-based local search operator is removed, and the parameter setting is the same as HEDA; (2) NSGAII [41], a well-known non-dominated sorting multi-objective genetic algorithm, which has been widely used for various multi-objective optimization problems; and (3) PGA [31], a relatively new reported genetic algorithm for a special kind of intermodal transportation problem which is quite similar to MO_MSITNDP.It should be mentioned that all the algorithms are executed based on the modified model in Section 3.2.
For the parameters setting, we set the population size 50 PS  for all algorithms, and, meanwhile, HEDA's learning rate 0.06 LR  , the crossover and mutation probability of NSGAII and PGA are set to 0.8 and 0.2, respectively.It should be noted that we have embedded the PIN-based local search operator in to the NSGAII and PGA to ensure the fairness of comparisons.
We first make an observation on the approximate Pareto fronts with regard to the first instance of 10 5  , 20 5  , 30 7  , 40 9  and 50 13  , espectively.Furthermore, the obtained results of HEDA, HEDA_noLS, NSGAII and PGA can be seen in Figure 9.
It is can be obviously seen in Figure 9 that the solutions in Pareto set obtained by HEDA are better (both in terms of the quality and quantity of solutions) than the ones from the three other algorithms in all cases of variable problem sizes, which implies HEDA share a high superiority in solving MO_MSITNDP.Furthermore, the statistic results of these four algorithms are listed in Tables 1-4.

Performance Metrics
To systematically verify the performance of HEDA, three performance metrics [40 as follows: (1) Overall non-dominated solutions number (ONSN)  is the union of non-dominated solutions w 1,..., ,..., Algorithm l L , and then ( ) ONSN  is the number of non-dominated solut which are not dominated by solutions in   as shown in Equation ( 20).The large ( ) ONSN  is, the better performance of Algorithm l. . (

2) Ratio of non-dominated solutions (RNDS)
Denote the ratio of solutions which are not dominated by solutions in   following Equation ( 21).The higher the value of ( ) l RNDS  is, the better performanc l.

Comparison with Existing Algorithms
In this subsection, we compare HEDA with three algorithms: (1) HEDA_noLS HEDA in which the PIN-based local search operator is removed, and the parameter same as HEDA; (2) NSGAII [41], a well-known non-dominated sorting multi-obj algorithm, which has been widely used for various multi-objective optimization prob PGA [31], a relatively new reported genetic algorithm for a special kind of intermodal t problem which is quite similar to MO_MSITNDP.It should be mentioned that all the a executed based on the modified model in Section 3.2.
For the parameters setting, we set the population size 50 PS  for all alg meanwhile, HEDA's learning rate 0.06 LR  , the crossover and mutation probability o PGA are set to 0.8 and 0.2, respectively.It should be noted that we have embedded t local search operator in to the NSGAII and PGA to ensure the fairness of comparisons.
We first make an observation on the approximate Pareto fronts with regard to the of 10 5  , 20 5  , 30 7  , 40 9  and 50 13  , espectively.Furthermore, the obtain HEDA, HEDA_noLS, NSGAII and PGA can be seen in Figure 9.
It is can be obviously seen in Figure 9 that the solutions in Pareto set obtained better (both in terms of the quality and quantity of solutions) than the ones from th algorithms in all cases of variable problem sizes, which implies HEDA share a high solving MO_MSITNDP.Furthermore, the statistic results of these four algorithms are l 1-4.

Performance Metrics
To systematically verify the performance of HEDA, three performance metrics [40] are adopted as follows: (1) Overall non-dominated solutions number (ONSN)  is the union of non-dominated solutions with regard to 1,..., ,..., Algorithm l L , and then ( ) ONSN  is the number of non-dominated solutions in l  which are not dominated by solutions in   as shown in Equation (20).The larger the value of ( ) ONSN  is, the better performance of Algorithm l.
(2) Ratio of non-dominated solutions (RNDS) Denote the ratio of solutions which are not dominated by solutions in   , shown as following Equation ( 21).The higher the value of ( ) RNDS  is, the better performance of Algorithm l.
(3) Average distance (DIR) is the shortest normalized distance from solution y in   to l  [40], then DIR can be defined in Equation (22).Obviously, a smaller ( ) l DIR  means a better convergence performance to   , as well as a better distribution of l  .

Comparison with Existing Algorithms
In this subsection, we compare HEDA with three algorithms: (1) HEDA_noLS, a variant of HEDA in which the PIN-based local search operator is removed, and the parameter setting is the same as HEDA; (2) NSGAII [41], a well-known non-dominated sorting multi-objective genetic algorithm, which has been widely used for various multi-objective optimization problems; and (3) PGA [31], a relatively new reported genetic algorithm for a special kind of intermodal transportation problem which is quite similar to MO_MSITNDP.It should be mentioned that all the algorithms are executed based on the modified model in Section 3.2.
For the parameters setting, we set the population size 50 PS  for all algorithms, and, meanwhile, HEDA's learning rate 0.06 LR  , the crossover and mutation probability of NSGAII and PGA are set to 0.8 and 0.2, respectively.It should be noted that we have embedded the PIN-based local search operator in to the NSGAII and PGA to ensure the fairness of comparisons.
We first make an observation on the approximate Pareto fronts with regard to the first instance of 10 5  , 20 5  , 30 7  , 40 9  and 50 13  , espectively.Furthermore, the obtained results of HEDA, HEDA_noLS, NSGAII and PGA can be seen in Figure 9.
It is can be obviously seen in Figure 9 that the solutions in Pareto set obtained by HEDA are better (both in terms of the quality and quantity of solutions) than the ones from the three other algorithms in all cases of variable problem sizes, which implies HEDA share a high superiority in solving MO_MSITNDP.Furthermore, the statistic results of these four algorithms are listed in Tables 1-4.
is, the better performance of Algorithm l.

Performance Metrics
To systematically verify the performance of HEDA, three performance metrics [40] are adopted as follows: (1) Overall non-dominated solutions number (ONSN)  is the union of non-dominated solutions with regard to 1,..., ,..., Algorithm l L , and then ( ) ONSN  is the number of non-dominated solutions in l  which are not dominated by solutions in   as shown in Equation (20).The larger the value of ( ) ONSN  is, the better performance of Algorithm l.
(2) Ratio of non-dominated solutions (RNDS) Denote the ratio of solutions which are not dominated by solutions in   , shown as following Equation ( 21).The higher the value of ( ) RNDS  is, the better performance of Algorithm l.
(3) Average distance (DIR) is the shortest normalized distance from solution y in   to l  [40], then DIR can be defined in Equation (22).Obviously, a smaller ( ) l DIR  means a better convergence performance to   , as well as a better distribution of l  .

Comparison with Existing Algorithms
In this subsection, we compare HEDA with three algorithms: (1) HEDA_noLS, a variant of HEDA in which the PIN-based local search operator is removed, and the parameter setting is the same as HEDA; (2) NSGAII [41], a well-known non-dominated sorting multi-objective genetic algorithm, which has been widely used for various multi-objective optimization problems; and (3) PGA [31], a relatively new reported genetic algorithm for a special kind of intermodal transportation problem which is quite similar to MO_MSITNDP.It should be mentioned that all the algorithms are executed based on the modified model in Section 3.2.
For the parameters setting, we set the population size 50 PS  for all algorithms, and, meanwhile, HEDA's learning rate 0.06 LR  , the crossover and mutation probability of NSGAII and PGA are set to 0.8 and 0.2, respectively.It should be noted that we have embedded the PIN-based local search operator in to the NSGAII and PGA to ensure the fairness of comparisons.
We first make an observation on the approximate Pareto fronts with regard to the first instance of 10 5  , 20 5  , 30 7  , 40 9  and 50 13  , espectively.Furthermore, the obtained results of HEDA, HEDA_noLS, NSGAII and PGA can be seen in Figure 9.
It is can be obviously seen in Figure 9 that the solutions in Pareto set obtained by HEDA are better (both in terms of the quality and quantity of solutions) than the ones from the three other algorithms in all cases of variable problem sizes, which implies HEDA share a high superiority in solving MO_MSITNDP.Furthermore, the statistic results of these four algorithms are listed in Tables 1-4.

Performance Metrics
To systematically verify the performance of HEDA, three performance metrics as follows: (1) Overall non-dominated solutions number (ONSN)  is the union of non-dominated solutions 1,..., ,..., Algorithm l L , and then ( ) ONSN  is the number of non-dominated so which are not dominated by solutions in   as shown in Equation ( 20).The lar ( ) ONSN  is, the better performance of Algorithm l.

) Ratio of non-dominated solutions (RNDS)
Denote the ratio of solutions which are not dominated by solutions in  following Equation (21).The higher the value of ( ) l RNDS  is, the better performa l. then DIR can be defined in Equation (22).Obviously, a smaller ( ) l DIR  convergence performance to   , as well as a better distribution of l  .

Comparison with Existing Algorithms
In this subsection, we compare HEDA with three algorithms: (1) HEDA_no HEDA in which the PIN-based local search operator is removed, and the parame same as HEDA; (2) NSGAII [41], a well-known non-dominated sorting multi-o algorithm, which has been widely used for various multi-objective optimization p PGA [31], a relatively new reported genetic algorithm for a special kind of intermoda problem which is quite similar to MO_MSITNDP.It should be mentioned that all th executed based on the modified model in Section 3.2.
For the parameters setting, we set the population size 50 PS  for all a meanwhile, HEDA's learning rate 0.06 LR  , the crossover and mutation probability PGA are set to 0.8 and 0.2, respectively.It should be noted that we have embedde local search operator in to the NSGAII and PGA to ensure the fairness of compariso We first make an observation on the approximate Pareto fronts with regard to of 10 5  and 50 13  , espectively.Furthermore, the obt HEDA, HEDA_noLS, NSGAII and PGA can be seen in Figure 9.

Performance Metrics
To systematically verify the performance of HEDA, three performance metrics [40] are adopted as follows: (1) Overall non-dominated solutions number (ONSN)  is the union of non-dominated solutions with regard to 1,..., ,..., Algorithm l L , and then ( ) ONSN  is the number of non-dominated solutions in l  which are not dominated by solutions in   as shown in Equation (20).The larger the value of ( ) ONSN  is, the better performance of Algorithm l.
(2) Ratio of non-dominated solutions (RNDS) Denote the ratio of solutions which are not dominated by solutions in   , shown as following Equation ( 21).The higher the value of ( ) l RNDS  is, the better performance of Algorithm l.
(3) Average distance (DIR) is the shortest normalized distance from solution y in   to l  [40], then DIR can be defined in Equation (22).Obviously, a smaller ( ) l DIR  means a better convergence performance to   , as well as a better distribution of l  .

Comparison with Existing Algorithms
In this subsection, we compare HEDA with three algorithms: (1) HEDA_noLS, a variant of HEDA in which the PIN-based local search operator is removed, and the parameter setting is the same as HEDA; (2) NSGAII [41], a well-known non-dominated sorting multi-objective genetic algorithm, which has been widely used for various multi-objective optimization problems; and (3) PGA [31], a relatively new reported genetic algorithm for a special kind of intermodal transportation problem which is quite similar to MO_MSITNDP.It should be mentioned that all the algorithms are executed based on the modified model in Section 3.2.
For the parameters setting, we set the population size , espectively.Furthermore, the obtained results of [40], then DIR can be defined in Equation (22).Obviously, a smaller Sustainability 2017, 9, 1133 14 of 24

Performance Metrics
To systematically verify the performance of HEDA, three performance metrics [40] are adopted as follows: (1) Overall non-dominated solutions number (ONSN)  is the union of non-dominated solutions with regard to 1,..., ,..., Algorithm l L , and then ( ) ONSN  is the number of non-dominated solutions in l  which are not dominated by solutions in   as shown in Equation (20).The larger the value of ( ) l ONSN  is, the better performance of Algorithm l.
(2) Ratio of non-dominated solutions (RNDS) Denote the ratio of solutions which are not dominated by solutions in   , shown as following Equation ( 21).The higher the value of ( ) l RNDS  is, the better performance of Algorithm l.
(3) Average distance (DIR) is the shortest normalized distance from solution y in   to l  [40], then DIR can be defined in Equation (22).Obviously, a smaller ( ) l DIR  means a better convergence performance to   , as well as a better distribution of l  .

Comparison with Existing Algorithms
In this subsection, we compare HEDA with three algorithms: (1) HEDA_noLS, a variant of HEDA in which the PIN-based local search operator is removed, and the parameter setting is the same as HEDA; (2) NSGAII [41], a well-known non-dominated sorting multi-objective genetic algorithm, which has been widely used for various multi-objective optimization problems; and (3) PGA [31], a relatively new reported genetic algorithm for a special kind of intermodal transportation problem which is quite similar to MO_MSITNDP.It should be mentioned that all the algorithms are executed based on the modified model in Section 3.2.
For the parameters setting, we set the population size 50 PS  for all algorithms, and, meanwhile, HEDA's learning rate 0.06 LR  , the crossover and mutation probability of NSGAII and PGA are set to 0.8 and 0.2, respectively.It should be noted that we have embedded the PIN-based local search operator in to the NSGAII and PGA to ensure the fairness of comparisons.
We first make an observation on the approximate Pareto fronts with regard to the first instance of 10 5  , 20 5  , 30 7  , 40 9  and 50 13  , espectively.Furthermore, the obtained results of HEDA, HEDA_noLS, NSGAII and PGA can be seen in Figure 9.
It is can be obviously seen in Figure 9 that the solutions in Pareto set obtained by HEDA are better (both in terms of the quality and quantity of solutions) than the ones from the three other algorithms in all cases of variable problem sizes, which implies HEDA share a high superiority in solving MO_MSITNDP.Furthermore, the statistic results of these four algorithms are listed in Tables 1-4.  20.The larger the value of ) l is, the better performance of Algorithm l.
of non-dominated solutions (RNDS) te the ratio of solutions which are not dominated by solutions in   , shown as Equation ( 21).The higher the value of ( ) l RNDS  is, the better performance of Algorithm in   to l  [40], can be defined in Equation (22).Obviously, a smaller ( ) l DIR  means a better ce performance to   , as well as a better distribution of l  .
( ) risons Results parison with Existing Algorithms is subsection, we compare HEDA with three algorithms: (1) HEDA_noLS, a variant of which the PIN-based local search operator is removed, and the parameter setting is the HEDA; (2) NSGAII [41], a well-known non-dominated sorting multi-objective genetic , which has been widely used for various multi-objective optimization problems; and (3) a relatively new reported genetic algorithm for a special kind of intermodal transportation hich is quite similar to MO_MSITNDP.It should be mentioned that all the algorithms are based on the modified model in Section 3.2.the parameters setting, we set the population size 50 PS  for all algorithms, and, le, HEDA's learning rate 0.06 LR  , the crossover and mutation probability of NSGAII and set to 0.8 and 0.2, respectively.It should be noted that we have embedded the PIN-based h operator in to the NSGAII and PGA to ensure the fairness of comparisons.irst make an observation on the approximate Pareto fronts with regard to the first instance , as well as a better distribution of

(DIR)
s the shortest normalized distance from solution y in   to l  [40], ined in Equation (22).Obviously, a smaller ( ) l DIR  means a better ce to   , as well as a better distribution of l  .

Performance Metrics
To systematically verify the performance of HEDA, three performance metrics [40] are adopted as follows: (1) Overall non-dominated solutions number (ONSN)  is the union of non-dominated solutions with regard to 1,..., ,..., Algorithm l L , and then ( ) ONSN  is the number of non-dominated solutions in l  which are not dominated by solutions in   as shown in Equation (20).The larger the value of ( ) l ONSN  is, the better performance of Algorithm l.
(2) Ratio of non-dominated solutions (RNDS) Denote the ratio of solutions which are not dominated by solutions in   , shown as following Equation ( 21).The higher the value of ( ) l RNDS  is, the better performance of Algorithm l.
(3) Average distance (DIR) is the shortest normalized distance from solution y in   to l  [40], then DIR can be defined in Equation (22).Obviously, a smaller ( ) l DIR  means a better convergence performance to   , as well as a better distribution of l  .

Comparison with Existing Algorithms
In this subsection, we compare HEDA with three algorithms: (1) HEDA_noLS, a variant of HEDA in which the PIN-based local search operator is removed, and the parameter setting is the same as HEDA; (2) NSGAII [41], a well-known non-dominated sorting multi-objective genetic algorithm, which has been widely used for various multi-objective optimization problems; and (3) PGA [31], a relatively new reported genetic algorithm for a special kind of intermodal transportation problem which is quite similar to MO_MSITNDP.It should be mentioned that all the algorithms are executed based on the modified model in Section 3.2.
For the parameters setting, we set the population size 50 PS  for all algorithms, and, meanwhile, HEDA's learning rate 0.06 LR  , the crossover and mutation probability of NSGAII and PGA are set to 0.8 and 0.2, respectively.It should be noted that we have embedded the PIN-based local search operator in to the NSGAII and PGA to ensure the fairness of comparisons.
We first make an observation on the approximate Pareto fronts with regard to the first instance of 10 5  , 20 5  , 30 7  , 40 9  and 50 13  , espectively.Furthermore, the obtained results of HEDA, HEDA_noLS, NSGAII and PGA can be seen in Figure 9.
It is can be obviously seen in Figure 9 that the solutions in Pareto set obtained by HEDA are better (both in terms of the quality and quantity of solutions) than the ones from the three other algorithms in all cases of variable problem sizes, which implies HEDA share a high superiority in solving MO_MSITNDP.Furthermore, the statistic results of these four algorithms are listed in Tables 1-4.

Comparison with Existing Algorithms
In this subsection, we compare HEDA with three algorithms: (1) HEDA_noLS, a variant of HEDA in which the PIN-based local search operator is removed, and the parameter setting is the same as HEDA; (2) NSGAII [41], a well-known non-dominated sorting multi-objective genetic algorithm, which has been widely used for various multi-objective optimization problems; and (3) PGA [31], a relatively new reported genetic algorithm for a special kind of intermodal transportation problem which is quite similar to MO_MSITNDP.It should be mentioned that all the algorithms are executed based on the modified model in Section 3.2.
For the parameters setting, we set the population size PS = 50 for all algorithms, and, meanwhile, HEDA's learning rate LR = 0.06, the crossover and mutation probability of NSGAII and PGA are set to 0.8 and 0.2, respectively.It should be noted that we have embedded the PIN-based local search operator in to the NSGAII and PGA to ensure the fairness of comparisons.
We first make an observation on the approximate Pareto fronts with regard to the first instance of 10 × 5, 20 × 5, 30 × 7, 40 × 9 and 50 × 13, espectively.Furthermore, the obtained results of HEDA, HEDA_noLS, NSGAII and PGA can be seen in Figure 9.
It is can be obviously seen in Figure 9 that the solutions in Pareto set obtained by HEDA are better (both in terms of the quality and quantity of solutions) than the ones from the three other algorithms in all cases of variable problem sizes, which implies HEDA share a high superiority in solving MO_MSITNDP.Furthermore, the statistic results of these four algorithms are listed in Tables 1-4.
We can see in Tables 1-3 that the performance of HEDA is obviously better than other three compared algorithm associated with ONSN, RNDS and DIR metrics.It is clear that HEDA's performance is more superior than its variant HEDA_noLS, which demonstrate effectiveness of our proposed PIN-based local search.For most instances, HEDA can find more non-dominated solutions than the other algorithms, and its solutions possess a shorter distance to the union of non-dominated solutions.Thus, not only does HEDA ensure an outstanding convergence to the optimal Pareto font, but it also keeps a reasonable distribution and diversity on its approximated Pareto font.The feasible ratio of HEDA reaches to 100%, as well as other algorithms, which demonstrated the reliability of our proposed HEDA.Accordingly, the proposed HEDA can be seen as an effective solution for addressing MO_MSITNDP.
In terms of the runtime, Figure 10 shows that HEDA has lower runtimes than NSGAII and PGA, which indicates that the running speed of HEDA is faster than NSGAII and PGA.The runtimes of HEDA are a little bit higher than HEDA_noLS, implying that the PIN-based local search of HEDA causes relative lower time consumption.In addition, the feasibility rates for all algorithms are 100%.We can see in Tables 1-3 that the performance of HEDA is obviously better than other three compared algorithm associated with ONSN, RNDS and DIR metrics.It is clear that HEDA's performance is more superior than its variant HEDA_noLS, which demonstrate effectiveness of our proposed PIN-based local search.For most instances, HEDA can find more non-dominated solutions than the other algorithms, and its solutions possess a shorter distance to the union of non-dominated solutions.Thus, not only does HEDA ensure an outstanding convergence to the optimal Pareto font, but it also keeps a reasonable distribution and diversity on its approximated Pareto font.The feasible ratio of HEDA reaches to 100%, as well as other algorithms, which demonstrated the reliability of our proposed HEDA.Accordingly, the proposed HEDA can be seen as an effective solution for addressing MO_MSITNDP.
In terms of the runtime, Figure 10 shows that HEDA has lower runtimes than NSGAII and PGA, which indicates that the running speed of HEDA is faster than NSGAII and PGA.The runtimes of HEDA are a little bit higher than HEDA_noLS, implying that the PIN-based local search of HEDA causes relative lower time consumption.In addition, the feasibility rates for all algorithms are 100%.

Comparison with Optimization Solver
In this subsection, the developed HEDA is compared with optimization solver.Owing to the linear characteristic of the mathematical model, we choose CPLEX 12.5, which is a popular solver used in optimization area.The multiple objectives are transformed into a single objective problem by using the epsilon constraint method [42] to satisfy the solving conditions of CPLEX.It should be noted that we do not purse the optimal solutions regarding the branch-and-cut phase addressed by CPLEX to obtain more feasible solutions to construct the approximate Pareto font.That is, once the feasible solution is found, CPLEX will stop and turn to solve the next problem in terms of another upper bound.Upper limit of time for CPLEX is 600 s.The results of ONSN, RNDS, DIR and Time are shown in Table 4.
In Table 4, we can find that, for the small scale instances, HEDA has lower runtimes and the quality of the solutions is also at an acceptable level.For the middle problem size, HEDA has advantages in runtimes and can provide better solutions than CPLEX.Furthermore, CPLEX is unable to solve the large scale instances, such as instance 40 11  and 50 13  , whereas HEDA can obtain solutions easily, which indicates HEDA is more capable with respect to all instances.

Comparison with Optimization Solver
In this subsection, the developed HEDA is compared with optimization solver.Owing to the linear characteristic of the mathematical model, we choose CPLEX 12.5, which is a popular solver used in optimization area.The multiple objectives are transformed into a single objective problem by using the epsilon constraint method [42] to satisfy the solving conditions of CPLEX.It should be noted that we do not purse the optimal solutions regarding the branch-and-cut phase addressed by CPLEX to obtain more feasible solutions to construct the approximate Pareto font.That is, once the feasible solution is found, CPLEX will stop and turn to solve the next problem in terms of another upper bound.Upper limit of time for CPLEX is 600 s.The results of ONSN, RNDS, DIR and Time are shown in Table 4.
In Table 4, we can find that, for the small scale instances, HEDA has lower runtimes and the quality of the solutions is also at an acceptable level.For the middle problem size, HEDA has advantages in runtimes and can provide better solutions than CPLEX.Furthermore, CPLEX is unable to solve the large scale instances, such as instance 40 × 11 and 50 × 13, whereas HEDA can obtain solutions easily, which indicates HEDA is more capable with respect to all instances.According to the data above, we use HEDA to solve this case study and the non-dominated solutions are shown in Table 8 and Figure 12.Especially, to analyze the carbon emissions in our designed intermodal network, the carbon emission cost (CEC) is given.In Table 8, there are five non-dominated solutions provided by the HEDA for the intermodal transportation company to make transportation decision.By careful analysis, it can be found that the solutions have a good trade-off among cost, time and carbon emissions.For example, total cost and carbon emission cost of Solution 4 is 92,141 and 1850, respectively, which are a little higher than Solution 3 (92,188 and 617).However, the transportation time of Solution 3 is much less than Solution 4. It illustrates that time and carbon emissions can be greatly reduced without large sacrifice of cost.
Obviously, we can find that Solution 1 has a shortest maximum flow time of 601 while Solution 4 can provide a lowest total transportation cost with 92,141.Therefore, Solution 1 is proper if a fast delivery is needed, whereas Solution 4 should be adopted with respect to total cost focus.In addition, the rate of road use is very low; it has been replaced by waterway and railway, which have lighter carbon emission, implying that the carbon emission could be reduced in this network.Moreover, the stage division for an intermodal network that is mainly based on the physical characteristics of the available transportation infrastructures is a key preparation for the model construction.Through the gained solutions, it is obvious that some nodes are at a low frequency to be selected while some ones are invariably being passed.For example, the second node and third node are never selected in stage 1 for all solutions, which implies that we can consider removing the two nodes in stage division to decrease the computing complexity.

Management Insights
The approach and findings of this study will provide some practical insights for the logistics industry to make a tactical decision.An increasing number of transportation companies operate as freight integrators to provide one-stop service for customers.To enhance the competitiveness, they are supposed to optimize the transportation planning within the existing infrastructure and the expectations of customers.In this paper, the intermodal company can optimize the total cost and maximum flow time to satisfy diversified markets.In addition, carbon emissions are considered to In Table 8, there are five non-dominated solutions provided by the HEDA for the intermodal transportation company to make transportation decision.By careful analysis, it can be found that the solutions have a good trade-off among cost, time and carbon emissions.For example, total cost and carbon emission cost of Solution 4 is 92,141 and 1850, respectively, which are a little higher than Solution 3 (92,188 and 617).However, the transportation time of Solution 3 is much less than Solution 4. It illustrates that time and carbon emissions can be greatly reduced without large sacrifice of cost.
Obviously, we can find that Solution 1 has a shortest maximum flow time of 601 while Solution 4 can provide a lowest total transportation cost with 92,141.Therefore, Solution 1 is proper if a fast delivery is needed, whereas Solution 4 should be adopted with respect to total cost focus.In addition, the rate of road use is very low; it has been replaced by waterway and railway, which have lighter carbon emission, implying that the carbon emission could be reduced in this network.Moreover, the stage division for an intermodal network that is mainly based on the physical characteristics of the available transportation infrastructures is a key preparation for the model construction.Through the gained solutions, it is obvious that some nodes are at a low frequency to be selected while some ones are invariably being passed.For example, the second node and third node are never selected in stage 1 for all solutions, which implies that we can consider removing the two nodes in stage division to decrease the computing complexity.

Management Insights
The approach and findings of this study will provide some practical insights for the logistics industry to make a tactical decision.An increasing number of transportation companies operate as freight integrators to provide one-stop service for customers.To enhance the competitiveness, they are supposed to optimize the transportation planning within the existing infrastructure and the expectations of customers.In this paper, the intermodal company can optimize the total cost and maximum flow time to satisfy diversified markets.In addition, carbon emissions are considered to achieve the environment sustainability and response to low-carbon policy and regulation.The model and proposed algorithm will enable the freight integrator to make a trade-off among the total cost, time and carbon emissions while providing highly complicated solutions to improve the competitiveness.
Furthermore, the crossing logistics is involved in the model as the handling capacity in each node.Such constraints can make our model close to reality and ensure the feasibility of solutions.Besides, it can balance the transportation flow among the nodes, which is beneficial to mitigate the traffic congestion and air pollution in busy node and reduce the idle resource in other nodes when the total cost or time change less.
As mentioned above, a main objective of intermodal transportation is to reduce the road transportation rate.In general, unless the fastest delivery is preferred, waterway and railway will cover the entire long-distance transportation and reduce the rate in the last-mile trucking transportation in intermodal transportation network.Therefore, intermodal transportation would be further advocated if the hard carbon emission regulations are set and government should focus on improving intermodal infrastructure construction, especially the port construction where waterway is available and high speed railway construction when waterways do not exist.

Conclusions
In this paper, the intermodal transportation problem was modeled as a MO_MITNDP problem concerning both the total cost and the time criteria.What mostly distinguished our model is the stage dividing according to the real intermodal infrastructure construction such as business type, geography characteristic and service radiant scope.We stand on the freight integrator's perspective to enhance the competitiveness and honor loyal customers by offering complicated decisions according to the personalized requirements.Specifically, the handling capacity in each node is programmed as a constraint, which will disperse transportation pressure by reducing the probability of freight assembling.To solve this hard constraint, the multi-objective-based constraints handling approach is adopted.Then, a hybrid estimation of distribution algorithm was designed to solve the problem.Numerical comparison results showed the effectiveness and robustness of the proposed algorithm for MO_MITNDP.Meanwhile, the case study from the petrochemical industry helped the intermodal company provide a highly door-to-door service in the trade-off among transportation cost, emission cost and the maximum flow time.The simulation results confirmed that we can save time and carbon emission without much sacrifice of transportation cost by the optimization of intermodal transportation network.Furthermore, this work is the first to consider the estimation of distribution algorithm when solving an intermodal transportation problem.In future work, it is interesting to propose other evolutionary algorithms to solve such problems and research on the low-carbon transportation network design to achieve the economic effectiveness and the environmental effectiveness.

Figure 1 .
Figure 1.Structure of multi-sourcing intermodal transportation network.Figure 1. Structure of multi-sourcing intermodal transportation network.

Figure 1 .
Figure 1.Structure of multi-sourcing intermodal transportation network.Figure 1. Structure of multi-sourcing intermodal transportation network.

Figure 2 .
Figure 2. The procedure of updating the Pareto archive.

Figure 2 .
Figure 2. The procedure of updating the Pareto archive.

Figure 2 .
Figure 2. The procedure of updating the Pareto archive.

Figure 3 .
Figure 3.The procedure of updating the probability model.

Figure 5 .
Figure 5.The procedure of generating new population.

Figure 4 .
Figure 4.The procedure of calculating MDF.

Figure 5 .
Figure 5.The procedure of generating new population.Figure 5.The procedure of generating new population.

Figure 5 .
Figure 5.The procedure of generating new population.Figure 5.The procedure of generating new population.
are selected first and then we interchange the partial elements between 1 c and 2 c associated with two randomly selected 1 r and 2 r , mechanism of PIN can be illustrated by a simple example, as shown in Figure6.

Figure 7 .
Figure 7.The procedure of PIN-based local search.
and then we interchange the partial elements between 1 c and 2 c associated with two randomly selected 1 r and 2 r , mechanism of PIN can be illustrated by a simple example, as shown in Figure6.

Figure 7 .
Figure 7.The procedure of PIN-based local search.

Figure 7 .
Figure 7.The procedure of PIN-based local search.
all algorithms 20 times, and the results are all based on the average level of 10 instances with respect to the related N M  combinations.In the results, MIN, MAX, AVG and SD represent the minimum, maximum, average and standard deviation, respectively, corresponding to the related performance metrics.The unique terminated condition for all algorithms is the maximal evaluation times of objective functions, which is set to 1000 N M   .All algorithms are coded in C++ in Microsoft Visual Studio 2008 environment, and all the simulations are conducted on a PC with Intel Core-i5 3.30 GHz processor with 4 GB memory.

ONSN
 is the number which are not dominated by solutions in   as shown in ( ) l ONSN  is, the better performance of Algorithm l.

( 2 )
Ratio of non-dominated solutions (RNDS)Denote the ratio of solutions which are not dominate following Equation(21).The higher the value of an PGA are set to 0.8 and 0.2, respectively.It should be noted th local search operator in to the NSGAII and PGA to ensure the We first make an observation on the approximate Pareto of

( 2 )
Ratio of non-dominated solutions (RNDS)Denote the ratio of solutions which are not dominated by solutions in Sustainability 2017,9, 1133 normalized distance from solution y in   then DIR can be defined in Equation(22).Obviously, a smaller ( ) l DIR  m convergence performance to   , as well as a better distribution of l

( 3 )
Average distance (DIR) Denote d yx (π i ) is the shortest normalized distance from solution y in Sustainability 2017, 9, 1133 normalized distance from solution y in   50 PS for all algorithms, and, meanwhile, HEDA's learning rate 0.06 LR  , the crossover and mutation probability of NSGAII and PGA are set to 0.8 and 0.2, respectively.It should be noted that we have embedded the PIN-based local search operator in to the NSGAII and PGA to ensure the fairness of comparisons.We first make an observation on the approximate Pareto fronts with regard to the first instance of


verify the performance of HEDA, three performance metrics[40] are adopted : all non-dominated solutions number (ONSN)     is the union of non-dominated solutions with regard to 1,..., ,..., l L , and then ( )lONSN  is the number of non-dominated solutions in l  not dominated by solutions in   as shown in Equation ( normalized distance from solution y the performance of HEDA, three performance metrics[40] are adopted nated solutions number (ONSN) union of non-dominated solutions with regard to nd then ( )lONSN  is the number of non-dominated solutions in l  ed by solutions in   as shown in Equation(20).The larger the value of er performance of Algorithm l. are not dominated by solutions in   , shown as).The higher the value of ( ) l RNDS  is, the better performance of Algorithm

Figure 10 .
Figure 10.The trends of run time for the algorithms.* The feasibility rates of these four algorithms are 100%.

Figure 10 .
Figure 10.The trends of run time for the algorithms.* The feasibility rates of these four algorithms are 100%.
The handling capacity for the kth node of stage j, j ∈ {1, . . . ,M − 1} , k ∈ 1, . . ., R j , W 10 , W 20 , . . ., W N0 , and W 1M = M; Sustainability 2017, 9, 1133 9 of 24 least one solution in current PA, and, meanwhile, the solution that is dominated by newly generated individual will be removed from PA.Here, we propose the update procedure of PA as shown in Figure2.
08 End;//end if 09 End; //end for iii 10 End //end if 11 Else 12 Continue; //skip to the next loop 13 End; //end else 14 End; //end for ii 15 End; //end procedure Sustainability 2017, 9, 1133 9 of 24 least one solution in current PA, and, meanwhile, the solution that is dominated by newly generated individual will be removed from PA.Here, we propose the update procedure of PA as shown in Figure 2.
08 End;//end if 09 End; //end for iii 10 End //end if 11 Else 12 Continue; //skip to the next loop 13 End; //end else 14 End; //end for ii 15 End; //end procedure The procedure of updating the probability model.
[41]EDA_noLS, a variant of N-based local search operator is removed, and the parameter setting is the SGAII[41], a well-known non-dominated sorting multi-objective genetic een widely used for various multi-objective optimization problems; and (3) ew reported genetic algorithm for a special kind of intermodal transportation similar to MO_MSITNDP.It should be mentioned that all the algorithms are odified model in Section 3.2.rs setting, we set the population size , the crossover and mutation probability of NSGAII and 0.2, respectively.It should be noted that we have embedded the PIN-based .

Table 4 .
Comparisons of HEDA and CPLEX under a single run.

Table 7 .
Parameters of transportation cost and speed for each mode.

Table 8 .
The Pareto set obtained by HEDA.

Table 8 .
The Pareto set obtained by HEDA.