Multi-Objective Collaborative Optimization for Low-Carbon Cold-Chain Routing with Dynamic Demand

Hu, Qiaoying; Liu, Xiangxin; Jiang, Xiaoyun

doi:10.3390/math14050753


by Qiaoying Hu, Xiangxin Liu and Xiaoyun Jiang *

School of Economics and Management, Xiamen University of Technology, Xiamen 361024, China

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(5), 753; https://doi.org/10.3390/math14050753

Submission received: 6 January 2026 / Revised: 8 February 2026 / Accepted: 21 February 2026 / Published: 24 February 2026

(This article belongs to the Section E1: Mathematics and Computer Science)


Abstract

To address the challenges of high energy consumption, substantial carbon emissions, and dynamic customer demand in cold-chain logistics, this paper investigates the balance between sustainable development and operational efficiency for low-carbon distribution. We construct a Multi-Objective Low-Carbon Cold-Chain Vehicle Routing Problem with Dynamic Demand (MO-LC-CCDVRP) model to synergistically optimize the comprehensive costs, including vehicle dispatch, transportation adjustments, carbon emissions, and refrigeration, while maximizing customer satisfaction. To solve this model efficiently, we propose a novel deep reinforcement learning-enhanced Non-Dominated Sorting Genetic Algorithm II (DRL-NSGA-II). By using DRL to adaptively control the genetic operators, this algorithm significantly enhances both the convergence speed and distribution quality of the Pareto front. The solution process occurs in two stages: first, high-quality initial routes are generated from static information; then, upon dynamic information updates, rapid replanning is performed for unserved customers. Numerical experiments using adapted Solomon benchmark instances demonstrate the superiority of the proposed algorithm. Furthermore, a dynamic distribution case study confirms the model’s effectiveness, and a sensitivity analysis elucidates the complex impact of carbon pricing on total cost, customer satisfaction, and carbon emissions.

Keywords:

dynamic demand; cold-chain vehicle routing problem; non-dominated sorting genetic algorithm II; low-carbon; multi-objective optimization

MSC:

90B06

1. Introduction

Driven by consumption upgrading and rising expectations regarding the quality and freshness of perishable goods, China’s cold-chain logistics market has experienced sustained growth. As a result, improving distribution efficiency has become a critical priority for the industry. However, many cold-chain logistics providers still rely largely on drivers’ experience for route planning and lack scientific, systematic delivery strategies, leading to high operating costs and limited efficiency gains. Meanwhile, customers increasingly demand tighter delivery timeliness and higher service quality. In practice, firms often emphasize time windows at distribution centers while overlooking customer-side constraints, which results in suboptimal routing, delivery delays, and ultimately lower customer satisfaction and reputational damage.

Furthermore, amid growing global environmental and energy concerns, logistics distribution has become a major source of carbon emissions. Cold-chain logistics, which require continuous refrigeration, consume substantially more energy and produce higher carbon emissions than conventional freight. As demand continues to surge, the associated environmental pressure also intensifies. Therefore, to meet low-carbon development imperatives, cold-chain enterprises must not only address efficiency and cost challenges but also strike a balance between economic and environmental objectives. Accordingly, there is an urgent need to explore energy-saving and emission-reducing routing and distribution solutions to achieve sustainable development.

2. Literature Review

2.1. Vehicle Routing Problem in Cold-Chain Logistics

The vehicle routing problem (VRP), first introduced by Dantzig and Ramser in 1959 [1], has since spawned numerous variants. Among these, the cold-chain vehicle routing problem (CCVRP) represents a significant branch. For instance, Zhang and Li [2] established a bi-objective model aiming to minimize both total travel time and propagation risk; Qi and Hu [3] optimized vehicle routing for emergency cold-chain logistics, with the objective of minimizing fuel, refrigeration, and cargo damage costs; Sun et al. [4] applied neural networks to assess the impact of traffic conditions on costs in cold-chain distribution. Meanwhile, Guan and Li [5] constructed a multi-objective programming model to optimize distribution cost and customer satisfaction.

2.2. Integration of Low-Carbon Objectives into Routing Optimization

With the wider adoption of low-carbon development concepts, carbon emissions have been increasingly integrated into the VRP research framework. In general logistics, Cai et al. [6] developed a variable-speed low-carbon routing model for autonomous vehicles that considers speed limits; Zhang et al. [7] incorporated an environmentally self-regulating cost function into their model and experimentally examined the impact of emission reduction coefficients; Liu et al. [8] addressed the time-dependent green vehicle routing problem with time windows with the goal of minimizing carbon emissions; and Li [9], while studying carbon emission issues, analyzed customer behavioral characteristics and factors influencing customer satisfaction. Because cold-chain distribution is inherently energy-intensive, some scholars have also begun to incorporate carbon emissions into CCVRP models. For instance, Nan et al. [10] and Zhang et al. [11] treated carbon emissions as a cost component in cold-chain distribution problems, considering traffic congestion. Ma and Zhu [12] comprehensively considered costs related to cargo damage, energy consumption, and carbon emissions, establishing a multi-vehicle cold-chain VRP model. Ding et al. [13] integrated both product freshness and carbon emissions into the cold-chain logistics routing optimization problem. Although research combining low-carbon objectives with CCVRP is gradually increasing, many existing studies incorporate carbon emissions primarily as a cost sub-item, with limited discussion of their distinct impact on routing decisions.

2.3. Dynamic VRP and the Gap in Multi-Objective Cold-Chain Logistics

Given that customer demand in real-world scenarios is dynamic, research has increasingly shifted towards dynamic vehicle routing problems (DVRPs). In terms of model construction, substantial work has focused on single-objective formulations. For example, Jiang et al. [14] investigated cold-chain vehicle routing with dynamic demand considerations, with the objective of minimizing total costs. Baty et al. [15] investigated routing optimization with dynamic demand in e-commerce to improve delivery efficiency. Liu et al. [16] studied a green vehicle routing problem with time-varying time windows to minimize carbon emissions. To address dynamic events like new orders, cancellations, and address changes, Li et al. [17] developed a two-stage planning model for the DVRP to minimize total travel distance. Kim et al. [18] proposed two deployment strategies for the vehicle routing and scheduling of emergency orders in vaccine supply chains to minimize total costs. Regarding multi-objective models, Wan et al. [19] constructed a bi-objective model for emergency relief dispatch, minimizing distance and the number of served nodes to ensure timely response. Peng et al. [20] focused on multimodal cold-chain transportation, proposing a high-dimensional multi-objective routing optimization model to minimize total transportation time, cost, carbon emissions, and food waste.

While there has been growing exploration into the synergistic optimization of dynamic vehicle routing and low-carbon objectives, the integration of these two dimensions remains relatively limited within the specific domain of cold-chain logistics. In the broader field of general logistics, some scholars have attempted to combine real-time dynamic demand with carbon emission optimization. For instance, Liu et al. [21] developed a dynamic heterogeneous vehicle routing model that, through tests across different dynamism levels, highlighted the impact of carbon emission policies on routing planning effectiveness. Wang et al. [22] further addressed the dynamic time-dependent green vehicle routing problem for decoration waste collection. Their model integrated complex factors such as urban traffic congestion and time-varying vehicle speeds to jointly optimize carbon emissions and operational costs, employing a stochastic sampling method to handle demand uncertainty. However, these studies have seen limited extension to the context of cold-chain logistics. In cold-chain distribution, dynamic demand not only necessitates route adjustments but also requires altering parking duration, loading or unloading frequency, and the operational status of refrigeration units, thereby exerting a potentially nonlinear and magnified effect on energy consumption and carbon emissions. Existing research on dynamic cold-chain routing, such as [14], while considering demand fluctuations, often treats carbon emissions merely as a cost component, with limited discussion of their underlying impact mechanisms. Conversely, studies focusing on low-carbon cold-chain logistics, such as [10,11,12,13], often operate within static frameworks, struggling to respond to dynamic demand changes. 
Consequently, existing research, regardless of its focus, has yet to systematically reveal the key drivers and evolution patterns of carbon emissions within a dynamic multi-objective cold-chain vehicle routing planning framework.

2.4. Solution Methodologies for VRP

In terms of algorithms for solving the VRP, exact methods (e.g., cutting-plane [23] and branch-and-bound [24]) are often limited by problem scale. Consequently, metaheuristic algorithms have become mainstream in recent years. For instance, Jiang et al. [25] integrated the Floyd algorithm into a genetic algorithm to address the vehicle routing problem with time windows (VRPTW) while accounting for route infeasibility. Ahmed and Yousefikhoshbakht [26] designed an improved Tabu Search algorithm with variable tabu lists and reinforcement mechanisms to solve the heterogeneous fixed-fleet open VRPTW. İlhan [27] proposed an improved Simulated Annealing algorithm incorporating crossover operators to address the capacitated vehicle routing problem (CVRP). Furthermore, algorithms such as Ant Colony Optimization [28] and Particle Swarm Optimization [29] are also widely applied to VRP variants. This paper, however, focuses on a bi-objective VRP with dynamic demands, where the core challenge is obtaining a set of Pareto-optimal solutions that reflect the trade-offs between objectives. The Non-Dominated Sorting Genetic Algorithm II (NSGA-II) [30] has become a benchmark framework in this field due to its fast non-dominated sorting and crowding distance mechanisms. Nonetheless, its parameters are typically preset and fixed, making it difficult to dynamically adjust the search strategy in complex problems, which can limit both solution efficiency and the quality of the solution set. To overcome the limited adaptability of algorithms like NSGA-II in dynamic environments, researchers have begun to explore their integration with deep reinforcement learning (DRL), which possesses real-time decision-making capabilities.

2.5. RL-Enhanced Meta-Heuristic Optimization

In recent years, DRL has demonstrated outstanding performance in dynamically solving combinatorial optimization problems such as the traveling salesman problem [31] and job shop scheduling [32], providing novel insights for enhancing the performance of traditional algorithms in dynamic and complex scenarios.

The potential of reinforcement learning (RL), particularly its advanced form deep reinforcement learning (DRL), to guide and enhance search processes has generated significant research interest in its integration with meta-heuristics, especially genetic and evolutionary algorithms (GEAs). This integration primarily aims to overcome the limitations inherent in traditional GEAs, such as static parameter settings and fixed operator strategies. Current research efforts can be systematically categorized into several key methodologies: (1) adaptive parameter control, where an RL agent dynamically adjusts crucial GEA parameters (e.g., crossover and mutation rates) based on the real-time state of the evolutionary search to improve efficiency [33]; (2) intelligent operator selection, wherein RL learns to select the most promising evolutionary operators from a candidate pool during different search phases to generate higher-quality offspring [34]; and (3) search strategy guidance, especially in multi-objective optimization, where RL employs reward mechanisms to steer the population towards under-explored regions of the Pareto front, thereby enhancing both diversity and convergence [35]. Furthermore, some studies aim to develop more tightly coupled hybrid frameworks, for instance, by deeply embedding DRL within multi-objective evolutionary algorithms to achieve bidirectional synergy and real-time feedback at the genetic level, thereby co-driving the optimization process [36].

However, despite the demonstrated potential of these advanced hybrid strategies in certain dynamic optimization problems, their application to dynamic multi-objective VRP—particularly within complex real-world domains like cold-chain logistics, which involve intricate spatiotemporal and energy-consumption coupling constraints—remains limited.

2.6. Identified Research Gaps and Contribution

Through a review of the existing literature, the following key research gaps are identified:

(1): Insufficient depth in low-carbon analysis: Most studies treat carbon emissions primarily as a cost component, with limited discussion of the underlying mechanisms through which emission levels influence routing decisions.
(2): Limited integration between dynamic and low-carbon factors in CCVRP: Existing cold-chain routing optimization is often conducted under static assumptions, with limited exploration of how dynamic demand fluctuations may require coordinated adjustment to distribution plans and carbon emissions under low-carbon constraints.
(3): Scarcity of comprehensive multi-objective models: Few studies simultaneously incorporate cold-chain characteristics, dynamic demand, and low-carbon objectives while also adopting a two-stage modeling framework.
(4): A gap in translating hybrid algorithmic paradigms to complex VRP: Despite the development of various RL/DRL and meta-heuristic hybrids (see Section 2.5), there remains a gap in translating these paradigms into effective, adaptive solvers for dynamic multi-objective cold-chain VRP, which can affect solution quality and convergence in such scenarios.

To address these gaps, this paper proposes a mathematical model for MO-LC-CCDVRP. Formulated as a bi-objective optimization problem, this model simultaneously minimizes total cost and maximizes customer satisfaction while integrating practical constraints, including time windows, travel distance, and vehicle capacity. To solve this model efficiently, we design a deep reinforcement learning-enhanced Non-Dominated Sorting Genetic Algorithm II (DRL-NSGA-II) to enhance solution efficiency and quality.

3. Problem Description and Model Construction

3.1. Problem Description

To address the dual challenges of dynamic demand and low-carbon development in cold-chain logistics, this paper investigates the MO-LC-CCDVRP. The classical VRP model is insufficient because it relies on static, idealized assumptions and neglects critical real-world factors, including the time window constraints essential for preserving the quality of perishable goods, the dynamic nature of customer demand, and the need for synergistic optimization of economic and environmental objectives. The core problem addressed in this paper is defined as follows: in a cold-chain delivery process served by a single distribution center, where customer demands arise dynamically, determine the vehicle routes that simultaneously optimize total delivery cost (including carbon emission costs) and customer satisfaction in real time, subject to vehicle capacity and time window constraints.

To develop a feasible mathematical model, the following assumptions are made based on realistic cold-chain operations:

(1): The distribution network involves a single distribution center serving multiple customer nodes.
(2): All vehicles depart from the distribution center and must return after completing their service.
(3): Vehicles are of the same type, with identical load capacity and refrigeration capability.
(4): Vehicles deliver only a single type of cold-chain product and maintain a constant temperature throughout the journey.
(5): Each customer node is served exactly once.
(6): Each customer node has a fixed time window.
(7): Once a dynamic demand order is confirmed, it is considered known and must be served.
(8): Vehicles travel at a constant speed.
(9): Uncontrollable external factors, such as weather, are temporarily not considered.

Table 1 presents the notations and their descriptions required for the constructed mathematical model.

This paper formulates a multi-objective optimization model comprising two objectives: $Z_{1}$, which minimizes the total cost, and $Z_{2}$, which maximizes customer satisfaction. These two objectives will be optimized simultaneously to obtain a set of non-dominated, Pareto-optimal solutions. The total cost objective function $Z_{1}$ (Section 3.1.1) and the customer satisfaction objective function $Z_{2}$ (Section 3.1.2) are analyzed in detail in the following subsections.

3.1.1. Analysis of the Total Cost Objective Function

The total cost comprises four components: the fixed vehicle dispatch cost $C_{1}$, the variable transportation cost $C_{2}$, the carbon emission cost $C_{3}$, and the refrigeration cost $C_{4}$. The first objective is to minimize this total cost. Formally, with respect to the decision variables $x_{i j k}$, the objective function is defined as follows:

\min_{{x_{i j k}}} C_{1} (x_{i j k}) + C_{2} (x_{i j k}) + C_{3} (x_{i j k}) + C_{4} (x_{i j k})

(1)

The calculation formulas for each cost component are as follows:

(1) The fixed vehicle dispatch cost $C_{1}$, which primarily covers vehicle procurement and driver expenses, is calculated using Equation (2):

C_{1} = c_{1} \sum_{j \in N} \sum_{k \in K} x_{0 j k}

(2)

(2) The variable transportation cost $C_{2}$ is proportional to the total travel distance and is calculated using Equation (3):

C_{2} = c_{2} \sum_{i \in N} \sum_{j \in N} \sum_{k \in K} x_{i j k} d_{i j}

(3)

(3) The carbon emission cost $C_{3}$ is calculated based on the carbon emissions produced by the vehicle’s fuel consumption during transit. Following the fuel consumption model proposed in [37], the expression for fuel consumption per unit distance is given by Equation (4). Then, the carbon emission calculation formula in [38] (Equation (5)) is incorporated to construct the final carbon emission cost formula, as expressed in Equation (6):

ρ (q_{i}) = ρ^{0} + \frac{ρ^{*} - ρ^{0}}{Q} \cdot q_{i}

(4)

E = ε \cdot H

(5)

C_{3} = λ_{f} γ \sum_{i \in N} \sum_{j \in N} \sum_{k \in K} ρ (q_{i}) d_{i j} x_{i j k} + λ_{r} γ \sum_{i \in N} \sum_{j \in N} \sum_{k \in K} q_{i} d_{i j} x_{i j k}

(6)

Here, $ρ^{0}$ is the base fuel consumption rate, $ρ^{*}$ is the full-load fuel consumption rate, $q_{i}$ is the load carried by the vehicle, $Q$ is the maximum vehicle load capacity, $E$ is the carbon emission amount, $ε$ is the fuel-to-emission conversion factor, and $H$ is the fuel consumption.

(4) The refrigeration cost $C_{4}$ arises mainly from continuous refrigeration during transportation and the additional energy consumption caused by door openings during unloading. According to [39], it is calculated using Equation (7):

C_{4} = c_{4} [\sum_{i \in N} \sum_{j \in N} \sum_{k \in K} d_{i j} (1 + α) \cdot R (\sqrt{S_{1} - S_{2}}) \cdot Δ T \frac{d_{i j}}{v} + \sum_{i \in N} \sum_{j \in N} \sum_{k \in K} β (0.54 v_{k} + 3.22) \cdot Δ T x_{i j k} \frac{q_{i}}{l}]

(7)
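As an illustration of Equations (2) to (6), the following Python sketch evaluates $C_{1}$, $C_{2}$, and $C_{3}$ for a set of decoded routes; the refrigeration cost $C_{4}$ is left out to keep it short. It is a sketch under stated assumptions, not the authors' implementation: node 0 is the distribution center, distances are Euclidean, and the load $q_{i}$ in Equations (4) and (6) is read as the load on board when the vehicle leaves node $i$ (the full demand of the route at the depot, reduced after each delivery). All values in the hypothetical `PARAMS` dictionary are placeholders, and $λ_{f}$, $λ_{r}$, and $γ$ enter only as the multipliers shown in Equation (6).

```python
import math

# Hypothetical parameter values, for illustration only (not taken from the paper).
PARAMS = {
    "c1": 200.0,        # fixed dispatch cost per vehicle used, Eq. (2)
    "c2": 2.0,          # transport cost per unit distance, Eq. (3)
    "rho0": 0.165,      # empty-vehicle fuel use per unit distance, rho^0
    "rho_star": 0.377,  # full-load fuel use per unit distance, rho^*
    "Q": 100.0,         # vehicle capacity
    "lambda_f": 2.63,   # multiplier on the fuel term of Eq. (6)
    "lambda_r": 0.001,  # multiplier on the load-distance term of Eq. (6)
    "gamma": 0.1,       # common multiplier gamma in Eq. (6)
}


def fuel_rate(load, p):
    """Eq. (4): fuel per unit distance, linear between rho0 (empty) and rho_star (full)."""
    return p["rho0"] + (p["rho_star"] - p["rho0"]) / p["Q"] * load


def route_costs(routes, coords, demand, p):
    """Return (C1, C2, C3) for routes given as lists of customers; node 0 is the depot."""
    c1 = c2 = c3 = 0.0
    for route in routes:
        if not route:
            continue                              # unused vehicle: no dispatch cost
        c1 += p["c1"]                             # Eq. (2): one depot departure per used vehicle
        load = sum(demand[i] for i in route)      # vehicle leaves the depot with its route's demand
        path = [0] + route + [0]
        for i, j in zip(path, path[1:]):
            d = math.dist(coords[i], coords[j])   # Euclidean distance (assumption)
            c2 += p["c2"] * d                     # Eq. (3)
            c3 += p["lambda_f"] * p["gamma"] * fuel_rate(load, p) * d  # Eq. (6), fuel term
            c3 += p["lambda_r"] * p["gamma"] * load * d                # Eq. (6), load term
            if j != 0:
                load -= demand[j]                 # cargo delivered at customer j
    return c1, c2, c3
```

For instance, with the depot at $(0, 0)$ and customers 1 and 2 at $(3, 0)$ and $(3, 4)$, a single route `[1, 2]` travels $3 + 4 + 5 = 12$ distance units, so $C_{2} = 2 \times 12 = 24$ under the placeholder $c_{2}$.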

3.1.2. Analysis of the Customer Satisfaction Objective Function

Customer satisfaction $M_{i}$ is defined as a function of the delivery time $t_{i}$, with values ranging from 0 to 1. This paper uses a piecewise function to characterize it: if the delivery occurs within the customer’s satisfactory time window $[E T_{i}, L T_{i}]$, satisfaction is 1; if the delivery occurs within the acceptable time window $[E T_{i}^{'}, L T_{i}^{'}]$, satisfaction decreases linearly with the deviation of the delivery time from the satisfactory time window; otherwise, satisfaction is 0. The specific calculation formula is given in Equation (8). The overall objective is to maximize the average customer satisfaction across all served customers, denoted as $Z_{2}$, which is defined in Equation (9):

M_{i} = \begin{cases} 0, & t_{i} < E T_{i}^{'} \text{ or } t_{i} > L T_{i}^{'} \\ \frac{t_{i} - E T_{i}^{'}}{E T_{i} - E T_{i}^{'}}, & E T_{i}^{'} \leq t_{i} < E T_{i} \\ 1, & E T_{i} \leq t_{i} \leq L T_{i} \\ \frac{L T_{i}^{'} - t_{i}}{L T_{i}^{'} - L T_{i}}, & L T_{i} < t_{i} \leq L T_{i}^{'} \end{cases}

(8)

\max Z_{2} = \frac{1}{|N|} \sum_{i \in N} M_{i}

(9)
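The piecewise satisfaction described above can be sketched as follows. Windows are passed as a hypothetical tuple `(ET', ET, LT, LT')`: satisfaction is 1 inside $[E T_{i}, L T_{i}]$, rises linearly from 0 at $E T_{i}^{'}$, falls linearly to 0 at $L T_{i}^{'}$, and is 0 outside the acceptable window. The function names are illustrative.

```python
def satisfaction(t, et_acc, et, lt, lt_acc):
    """Piecewise-linear satisfaction M_i for arrival time t.

    [et, lt] is the satisfactory window; [et_acc, lt_acc] is the wider acceptable window.
    """
    if et <= t <= lt:
        return 1.0
    if et_acc <= t < et:
        return (t - et_acc) / (et - et_acc)    # rises from 0 to 1 before the window opens
    if lt < t <= lt_acc:
        return (lt_acc - t) / (lt_acc - lt)    # falls from 1 to 0 after the window closes
    return 0.0


def average_satisfaction(arrivals, windows):
    """Eq. (9): mean of M_i over all customers that have a window entry."""
    scores = [satisfaction(arrivals[i], *windows[i]) for i in windows]
    return sum(scores) / len(scores)
```

With windows `(0, 20, 40, 60)`, an arrival at time 10 or 50 scores 0.5, and an arrival at 30 scores 1.0.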

3.2. Model Construction

Based on the aforementioned problem description and analysis, this paper constructs two models: a basic CCVRP model and a CCVRP model that incorporates dynamic demand.

3.2.1. Basic Cold-Chain Vehicle Routing Problem Model

\min_{{x_{i j k}}} C_{1} (x_{i j k}) + C_{2} (x_{i j k}) + C_{3} (x_{i j k}) + C_{4} (x_{i j k})

(10)

\max \frac{1}{|N|} \sum_{i \in N} M_{i}

(11)

\text{s.t.} \quad \sum_{i \in N} \sum_{j \in N, j \neq i} x_{i j k} q_{i} \leq Q, \forall k \in K

(12)

\sum_{j \in N} \sum_{k \in K} x_{0 j k} \leq |K|

(13)

\sum_{j \in N} x_{0 j k} = \sum_{i \in N} x_{i 0 k}, \forall k \in K

(14)

\sum_{i \in N, i \neq j} \sum_{k \in K} x_{i j k} = 1, \forall j \in N

(15)

\sum_{j \in N, i \neq j} \sum_{k \in K} x_{i j k} = 1, \forall i \in N

(16)

\sum_{i \in N, i \neq j} x_{i j k} (q_{i j k} - q_{i}) \geq 0, \forall j \in N; \forall k \in K

(17)

t_{j} = t_{i} + \frac{q_{i}}{l} + \frac{d_{i j}}{v} \quad \text{if } x_{i j k} = 1, \forall i, j \in N; \forall k \in K; i \neq j

(18)

E T_{i}^{'} \leq t_{i} \leq L T_{i}^{'}, \forall i \in N

(19)

\sum_{i \in S} \sum_{j \in S, j \neq i} x_{i j k} \leq |S| - 1, \forall S \subseteq N, |S| \geq 2; \forall k \in K

(20)

Equations (10) and (11) are the objective functions, representing the minimization of total cost and the maximization of customer satisfaction, respectively. Equation (12) represents the load capacity constraint; Equation (13) represents the fleet size constraint; Equation (14) ensures that vehicles return to the distribution center. Equations (15) and (16) together ensure that each customer is served exactly once by one vehicle, while Equation (17) guarantees that the servicing vehicle meets the customer’s demand. Equation (18) calculates the vehicle’s arrival time at each customer point, which must satisfy the time window constraints specified in Equation (19). Finally, Equation (20) is the subtour elimination constraint, ensuring route continuity.
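Because the algorithm in Section 4 works on decoded routes rather than on the binary variables $x_{i j k}$, several of these constraints can be checked directly on a route list. The sketch below is a hypothetical helper, not part of the paper: it assumes Euclidean distances, node 0 as the depot, a constant speed `speed` ($v$), an unloading rate `unload_rate` ($l$), and, like Equation (18), no waiting time when a vehicle arrives early. Time windows are passed as `(ET', ET, LT, LT')` tuples, and only the acceptable window of Equation (19) is enforced.

```python
import math


def check_route(route, coords, demand, windows, Q, speed, unload_rate, start_time=0.0):
    """Check capacity and acceptable time windows along one decoded route.

    windows[c] = (ET', ET, LT, LT'); node 0 is the depot.
    Returns (feasible, arrival_times).
    """
    if sum(demand[c] for c in route) > Q:            # capacity, cf. Eq. (12)
        return False, {}
    arrivals = {}
    t, prev = start_time, 0
    for c in route:
        # Eq. (18): unloading time q_i / l at the previous customer, then travel time d_ij / v.
        service = demand[prev] / unload_rate if prev != 0 else 0.0
        t += service + math.dist(coords[prev], coords[c]) / speed
        et_acc, _, _, lt_acc = windows[c]
        if not et_acc <= t <= lt_acc:                 # acceptable window, Eq. (19)
            return False, arrivals
        arrivals[c] = t
        prev = c
    return True, arrivals


def check_solution(routes, customers, max_vehicles, **route_args):
    """Fleet size, single visit per customer, then per-route capacity and time checks."""
    used = [r for r in routes if r]
    if len(used) > max_vehicles:                      # fleet size, Eq. (13)
        return False
    visited = [c for r in used for c in r]
    if sorted(visited) != sorted(customers):          # each customer exactly once, Eqs. (15)-(16)
        return False
    return all(check_route(r, **route_args)[0] for r in used)
```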

3.2.2. Cold-Chain Vehicle Routing Problem Model Considering Dynamic Demand

When dynamic demand occurs, it is assumed that a subset of customers has already been served during the static phase, and the set of remaining unserved customers is denoted as $Y$. Vehicles that have already departed and cannot return are treated as virtual depots, with the corresponding set denoted as $K_{V}$; the remaining capacity of vehicle $k$ in this set is denoted as $Q_{k v}$. Meanwhile, let the set of newly emerged dynamic demand customers be $Y^{*}$, and the set of additionally dispatched vehicles be $K_{V}^{*}$. Under this setting, the total cost for dynamic route replanning consists of the following two independent components:

(a): Delivery cost for static-phase unserved customers ( $C_{1}^{'}, C_{2}^{'}, C_{3}^{'}, C_{4}^{'}$ ): This refers to the cost incurred by continuing to serve the remaining customers from the static phase (set $Y$ ), including the fixed vehicle dispatch cost $C_{1}^{'}$ , the variable transportation cost $C_{2}^{'}$ , the carbon emission cost $C_{3}^{'}$ , and the refrigeration cost $C_{4}^{'}$ .
(b): Delivery cost for newly added dynamic-phase customers ( $C_{1}^{*}, C_{2}^{*}, C_{3}^{*}, C_{4}^{*}$ ): This refers to the cost incurred by serving the newly added customers in the dynamic phase (set $Y^{*}$ ), which likewise comprises the aforementioned four cost components.

Based on the above definitions, the model is formulated as follows:

\begin{array}{l} \min_{{x_{i j k}}} (C_{1}^{'} (x_{i j k}) + C_{2}^{'} (x_{i j k}) + C_{3}^{'} (x_{i j k}) + C_{4}^{'} (x_{i j k})) + (C_{1}^{*} (x_{i j k}) + C_{2}^{*} (x_{i j k}) + C_{3}^{*} (x_{i j k}) + C_{4}^{*} (x_{i j k})) \\ = (\begin{array}{l} c_{1} \sum_{j \in Y} \sum_{k \in K_{V}} x_{0 j k} + \\ c_{2} \sum_{i \in Y} \sum_{j \in Y} \sum_{k \in K_{V}} x_{i j k} d_{i j} + \\ λ_{f} γ \sum_{i \in Y} \sum_{j \in Y} \sum_{k \in K_{V}} ρ (q_{i}) d_{i j} x_{i j k} + λ_{r} γ \sum_{i \in Y} \sum_{j \in Y} \sum_{k \in K_{V}} q_{i} d_{i j} x_{i j k} + \\ c_{4} [\sum_{i \in Y} \sum_{j \in Y} \sum_{k \in K_{V}} d_{i j} (1 + α) \cdot R (\sqrt{S_{1} - S_{2}}) \cdot Δ T \frac{d_{i j}}{v} + \sum_{i \in Y} \sum_{j \in Y} \sum_{k \in K_{V}} β (0.54 v_{k} + 3.22) \cdot Δ T x_{i j k} \frac{q_{i}}{l}] \end{array}) + \\ (\begin{array}{l} c_{1} \sum_{j \in Y^{*}} \sum_{k \in K_{V}^{*}} x_{0 j k} + \\ c_{2} \sum_{i \in Y^{*}} \sum_{j \in Y^{*}} \sum_{k \in K_{V}^{*}} x_{i j k} d_{i j} + \\ λ_{f} γ \sum_{i \in Y^{*}} \sum_{j \in Y^{*}} \sum_{k \in K_{V}^{*}} ρ (q_{i}) d_{i j} x_{i j k} + λ_{r} γ \sum_{i \in Y^{*}} \sum_{j \in Y^{*}} \sum_{k \in K_{V}^{*}} q_{i} d_{i j} x_{i j k} + \\ c_{4} [\sum_{i \in Y^{*}} \sum_{j \in Y^{*}} \sum_{k \in K_{V}^{*}} d_{i j} (1 + α) \cdot R (\sqrt{S_{1} - S_{2}}) \cdot Δ T \frac{d_{i j}}{v} + \sum_{i \in Y^{*}} \sum_{j \in Y^{*}} \sum_{k \in K_{V}^{*}} β (0.54 v_{k} + 3.22) \cdot Δ T x_{i j k} \frac{q_{i}}{l}] \end{array}) \end{array}

(21)

\max \frac{1}{|N|} \sum_{i \in Y \cup Y^{*}} M_{i}

(22)

s . t . \sum_{j \in Y \cup Y^{*}} x_{0 j k} = \sum_{i \in Y \cup Y^{*}} x_{i 0 k}, \forall k \in K_{V} \cup K_{V}^{*}

(23)

\sum_{i \in Y \cup Y^{*}, i \neq j} \sum_{k \in K_{V} \cup K_{V}^{*}} x_{i j k} = 1, \forall j \in Y \cup Y^{*}

(24)

\sum_{j \in Y \cup Y^{*}, i \neq j} \sum_{k \in K_{V} \cup K_{V}^{*}} x_{i j k} = 1, \forall i \in Y \cup Y^{*}

(25)

t_{j} = t_{i} + \frac{q_{i}}{l} + \frac{d_{i j}}{v} \quad \text{if } x_{i j k} = 1, \forall i, j \in Y \cup Y^{*}; \forall k \in K_{V} \cup K_{V}^{*}; i \neq j

(26)

x_{0 j k} = 0, \forall j \in Y \cup Y^{*}; \forall k \in K_{V} \cup K_{V}^{*}

(27)

q_{i j k} = Q - Q_{k v}, \forall i, j \in Y \cup Y^{*}; \forall k \in K_{V} \cup K_{V}^{*}; i \neq j

(28)

\sum_{i \in Y \cup Y^{*}, i \neq j} \sum_{j \in Y \cup Y^{*}} x_{i j k} q_{i} \leq Q_{k v}, \forall k \in K_{V} \cup K_{V}^{*}

(29)

\sum_{i \in Y \cup Y^{*}, i \neq j} x_{i j k} (q_{i j k} - q_{i}) \geq 0, \forall j \in Y \cup Y^{*}; \forall k \in K_{V} \cup K_{V}^{*}

(30)

E T_{i}^{'} \leq t_{i} \leq L T_{i}^{'}, \forall i \in Y \cup Y^{*}

(31)

\sum_{i \in S} \sum_{j \in S, j \neq i} x_{i j k} \leq |S| - 1, \forall S \subseteq Y \cup Y^{*}, |S| \geq 2; \forall k \in K_{V} \cup K_{V}^{*}

(32)

Equations (21) and (22) are the objective functions, aiming to minimize the total distribution cost defined above and to maximize customer satisfaction. Equation (23) is the vehicle return constraint. Equations (24) and (25) together ensure that each customer is served exactly once. Equation (26) calculates the vehicle arrival time at each point. For dynamic demand, Equation (27) enforces that the virtual customer point (representing a vehicle that has already departed) must be the next node in the route, with its load upon departure determined by Equation (28). Equations (29) and (30) jointly ensure load feasibility for vehicles in subsequent services. Equation (31) is the time window constraint, and Equation (32) is the subtour elimination constraint.

4. Research Methodology

4.1. Problem-Solving Strategy

This paper proposes a unified solution framework based on DRL-NSGA-II to address the entire process from static planning to dynamic response. This framework first generates a high-quality initial Pareto solution set under static demand conditions. When dynamic demands occur during the distribution process, this solution set serves as the foundation. Through customer information updates and population reconstruction, DRL-NSGA-II performs rapid re-optimization to obtain a high-quality final solution set adapted to the new environment. The overall flowchart of the algorithm is presented in Figure 1, and its effectiveness hinges on the robust adaptive search capability of DRL-NSGA-II (as detailed in Section 4.2).

The specific solving steps are as follows:

Step 1: Generate static initial solutions: Using the complete set of known customer data, DRL-NSGA-II is executed to generate a high-quality initial Pareto solution set.

Step 2: Dynamic response and reset: When dynamic events occur (e.g., order addition/cancellation, demand change, time window adjustment), the algorithm starts from the initial solution set, updates the set of unserved customers, verifies capacity and time-window feasibility, and reconstructs the chromosome encoding to initialize a new population.

Step 3: Dynamic replanning: Based on the reset population, DRL-NSGA-II is invoked again to perform optimization, ultimately yielding a high-quality Pareto solution set adapted to the dynamic environment.
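The bookkeeping in Step 2 can be illustrated with a small sketch that splits the current plan at the moment a dynamic event arrives. It assumes planned arrival times are known for every customer and increase along each route, counts a customer as served once its arrival time is no later than the event time, and reports each vehicle's last served customer as its current position (its virtual depot; 0 means it has not left the distribution center). The capacity and time-window re-verification and the chromosome reconstruction of Step 2 are not shown, and the function names are hypothetical.

```python
def split_at_event(routes, arrivals, event_time):
    """Split each planned route at the moment a dynamic event arrives.

    Returns (served, remaining, positions): customers already reached, the unserved
    remainder (their union over vehicles is Y), and each vehicle's last served node
    (its virtual depot), with 0 meaning the vehicle is still at the distribution center.
    """
    served, remaining, positions = [], [], []
    for route in routes:
        # Arrival times increase along a route, so 'done' is a prefix of the route.
        done = [c for c in route if arrivals[c] <= event_time]
        rest = [c for c in route if arrivals[c] > event_time]
        served.append(done)
        remaining.append(rest)
        positions.append(done[-1] if done else 0)
    return served, remaining, positions


def customers_to_replan(remaining, new_customers, cancelled):
    """Y union Y*, minus cancelled orders: the customer set passed to re-optimization."""
    pool = {c for r in remaining for c in r} | set(new_customers)
    return sorted(pool - set(cancelled))
```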

4.2. Design of the Deep Reinforcement Learning-Enhanced Non-Dominated Sorting Genetic Algorithm II (DRL-NSGA-II)

To address the MO-LC-CCDVRP model developed in this paper, we design an enhanced algorithm named DRL-NSGA-II, which integrates DRL with NSGA-II. The traditional NSGA-II algorithm relies on fixed genetic operators and parameters during evolution, making it difficult to adaptively balance exploration and exploitation in dynamic and complex environments. To overcome this limitation, this study introduces a DRL agent. By monitoring the real-time state of the population, the agent dynamically selects the most suitable combination of crossover and mutation operators for each generation. This enables autonomous guidance and optimization of the search process, enhancing convergence accuracy and distribution quality of the solution set. The overall flowchart of the algorithm is illustrated in Figure 2.

4.2.1. Chromosome Encoding

To address a common limitation of traditional encoding schemes, in which genetic operations frequently generate infeasible solutions, a novel integer encoding scheme based on a template vector and index mapping is proposed. First, a template vector $T$ is constructed by concatenating ($N_{c a r} - 1$) route delimiters (0) and the ordered customer numbers ($1, 2, \dots, N_{n o d e}$). A chromosome $C$ is then defined as an index permutation of $T$: each gene in $C$ stores an index that references an element of $T$, not the route sequence itself. During decoding, elements are retrieved from $T$ sequentially according to the index sequence in $C$, and each route delimiter marks the start of a new vehicle route. Because any permutation of indices decodes to a sequence in which every customer appears exactly once, genetic operations that preserve the permutation property always produce structurally valid routes. An illustrative example of the encoding and decoding processes is provided in Figure 3.
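The template-and-index encoding can be sketched in a few lines of Python. The function names and the swap mutation are illustrative assumptions; the point is that any permutation of the indices of $T$ decodes to routes in which each customer appears exactly once, even though $T$ contains repeated delimiters.

```python
import random


def build_template(n_cars, n_nodes):
    """Template vector T: (n_cars - 1) delimiters (0) followed by customers 1..n_nodes."""
    return [0] * (n_cars - 1) + list(range(1, n_nodes + 1))


def decode(chromosome, template):
    """Read T in the order given by the index permutation C; each 0 starts a new route."""
    routes, current = [], []
    for idx in chromosome:
        gene = template[idx]
        if gene == 0:
            routes.append(current)
            current = []
        else:
            current.append(gene)
    routes.append(current)
    return [r for r in routes if r]   # adjacent delimiters mean an unused vehicle


def swap_mutation(chromosome, rng):
    """Swap two genes; the result is still a permutation of the indices of T."""
    child = chromosome[:]
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child
```

With three vehicles and five customers, $T = [0, 0, 1, 2, 3, 4, 5]$, and the chromosome `[2, 3, 0, 4, 1, 5, 6]` decodes to the routes `[1, 2]`, `[3]`, and `[4, 5]`.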

4.2.2. Initial Population

To achieve a well-distributed initial population and strengthen global exploration, we adopt the Sobol sequence, a low-discrepancy quasi-random method, to generate initial solutions. This approach generates uniformly distributed points in a high-dimensional unit hypercube, which are then linearly mapped into the problem’s solution space, yielding an initial chromosome set with high diversity.
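Sobol-based initialization can be sketched as follows, assuming SciPy's `scipy.stats.qmc.Sobol` sampler is available. The text does not specify how points in the unit hypercube are mapped to chromosomes. Here we assume a random-key mapping, in which the rank order of each point's coordinates defines an index permutation of $T$. The function name `sobol_population` is hypothetical.

```python
import math

import numpy as np
from scipy.stats import qmc


def sobol_population(pop_size: int, chrom_len: int, seed: int = 0) -> np.ndarray:
    """Return a (pop_size, chrom_len) array; each row is an index permutation of range(chrom_len)."""
    sampler = qmc.Sobol(d=chrom_len, scramble=True, seed=seed)
    # Sobol balance properties hold for 2**m points, so draw the next power of two and truncate.
    m = max(0, math.ceil(math.log2(pop_size)))
    points = sampler.random_base2(m=m)[:pop_size]   # low-discrepancy points in [0, 1)^d
    # Random-key mapping (assumption): sorting each point's coordinates yields a permutation.
    return np.argsort(points, axis=1)


if __name__ == "__main__":
    # e.g. N_car = 5 vehicles and N_node = 25 customers give a template of length 4 + 25 = 29
    population = sobol_population(pop_size=80, chrom_len=29)
    print(population.shape)
```

Because every row is a permutation of template indices, each initial individual decodes directly with the scheme of Section 4.2.1.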

4.2.3. Fitness Evaluation

The fitness function is derived directly from the mathematical model presented in Section 2.2. To establish a unified minimization framework, the second objective of maximizing customer satisfaction $Z_{2}$ is transformed into minimizing customer dissatisfaction $Z_{2}^{'}$, i.e., $Z_{2}^{'} = 1 - Z_{2}$. Hence, both objectives in the algorithm are formulated as minimization problems:

$$\min F = \left(Z_{1}, Z_{2}^{'}\right) \qquad (33)$$

4.2.4. Genetic Operators

(1): Selection Operator

In the selection stage, K-ary tournament selection is adopted. Groups of $K$ individuals are repeatedly sampled at random from the current population, and the best individual in each group is placed into the mating pool. This process is repeated until the desired parent population size is reached.
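A minimal sketch of the tournament step follows. The text does not define "best" within a group or give the value of $K$. We assume the NSGA-II crowded comparison (lower front rank wins, with ties broken by larger crowding distance) and a default of $K = 2$. The function name `tournament_select` is hypothetical.

```python
import random
from typing import List, Optional, Sequence


def tournament_select(ranks: Sequence[int], crowding: Sequence[float], n_parents: int,
                      k: int = 2, rng: Optional[random.Random] = None) -> List[int]:
    """K-ary tournament: sample k individuals, keep the best, repeat until n_parents are chosen."""
    rng = rng or random.Random()
    population = range(len(ranks))
    parents: List[int] = []
    while len(parents) < n_parents:
        group = rng.sample(population, k)
        # "Best" = lowest front rank, ties broken by larger crowding distance (assumption).
        best = min(group, key=lambda i: (ranks[i], -crowding[i]))
        parents.append(best)
    return parents
```

Larger values of `k` increase selection pressure toward the first front, while `k = 2` (binary tournament) is the usual NSGA-II default.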

(2): Crossover and Mutation Operations based on the DRL Algorithm

The traditional NSGA-II algorithm relies on fixed crossover and mutation operators, which limit its ability to dynamically balance global exploration and local exploitation during evolution. To overcome this limitation, we integrate a DRL mechanism into the crossover and mutation phases. This mechanism adaptively selects the most suitable combination of genetic operators for each individual in each generation. The core of this component is a policy network that monitors the real-time state of the population and determines the optimal genetic operation strategy accordingly. This approach reduces reliance on manual parameter adjustment and enhances convergence efficiency and the quality of the solution set.

This DRL component establishes a decision framework encompassing the state, action, and reward definitions, as well as policy updates, as detailed below:

State Space: The state $s_{t}$ observed by the agent is a four-dimensional binary feature vector extracted from an individual’s relative performance within the current generation’s population. It includes the following indicators:

(a): Whether the individual belongs to the non-dominated front;
(b): Whether its rank in the first objective function (total distribution cost) falls within the top third;
(c): Whether its rank in the second objective function (customer satisfaction) falls within the top third;
(d): Whether the number of individuals dominating it is lower than the average.
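The four indicators can be computed for a whole population at once, as in the sketch below. It assumes both objectives are in the minimization form $(Z_{1}, Z_{2}^{'})$ of Eq. (33), so that ranking by $Z_{2}^{'}$ in ascending order equals ranking by satisfaction in descending order. It also reads "top third" as a rank below $n/3$. The function name `state_features` is hypothetical.

```python
import numpy as np


def state_features(objs: np.ndarray) -> np.ndarray:
    """objs: (n, 2) array of minimization objectives (Z1, Z2'). Returns (n, 4) binary states."""
    n = objs.shape[0]
    # dom[i, j] is True when individual j Pareto-dominates individual i.
    no_worse = np.all(objs[None, :, :] <= objs[:, None, :], axis=2)
    strictly_better = np.any(objs[None, :, :] < objs[:, None, :], axis=2)
    dom = no_worse & strictly_better
    dom_count = dom.sum(axis=1)                         # how many individuals dominate i
    on_front = dom_count == 0                           # (a) non-dominated
    # Rank 0 = best value of each objective (double argsort gives per-column ranks).
    ranks = np.argsort(np.argsort(objs, axis=0, kind="stable"), axis=0)
    top_cost = ranks[:, 0] < n / 3                      # (b) top third in total cost
    top_satisfaction = ranks[:, 1] < n / 3              # (c) top third in satisfaction
    below_avg = dom_count < dom_count.mean()            # (d) dominated by fewer than average
    return np.column_stack([on_front, top_cost, top_satisfaction, below_avg]).astype(np.int8)
```

For example, with objective vectors `[[1, 1], [2, 2], [3, 0.5]]` the second individual is dominated by the first, so its state is `[0, 0, 0, 0]`.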

Action Space: The agent’s actions correspond to nine predefined genetic operation strategies. Each action is a unique combination of one crossover operator—position-based crossover (PBX), partially mapped crossover (PMX), or order crossover (OX)—and one mutation operator: Swap, Scramble, or Inversion.
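The nine-action space and the three mutation operators can be sketched as follows. Among the crossovers, only order crossover (OX) is implemented, for brevity; PBX and PMX would be written analogously. The mapping from action index to operator pair is an illustrative choice, and all names are hypothetical.

```python
import itertools
import random
from typing import Callable, Dict, List

CROSSOVERS = ("PBX", "PMX", "OX")
MUTATIONS = ("Swap", "Scramble", "Inversion")
# Action index 0..8 -> (crossover, mutation); the ordering is an illustrative choice.
ACTIONS = list(itertools.product(CROSSOVERS, MUTATIONS))


def order_crossover(p1: List[int], p2: List[int], rng: random.Random) -> List[int]:
    """OX: copy a slice of p1, fill the other positions with p2's genes in p2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n + 1), 2))
    child: List[int] = [-1] * n
    child[a:b] = p1[a:b]
    kept = set(p1[a:b])
    fill = [g for g in p2[b:] + p2[:b] if g not in kept]
    for pos, gene in zip(list(range(b, n)) + list(range(a)), fill):
        child[pos] = gene
    return child


def swap(c: List[int], rng: random.Random) -> List[int]:
    c = c[:]
    i, j = rng.sample(range(len(c)), 2)
    c[i], c[j] = c[j], c[i]
    return c


def scramble(c: List[int], rng: random.Random) -> List[int]:
    c = c[:]
    i, j = sorted(rng.sample(range(len(c) + 1), 2))
    segment = c[i:j]
    rng.shuffle(segment)
    c[i:j] = segment
    return c


def inversion(c: List[int], rng: random.Random) -> List[int]:
    c = c[:]
    i, j = sorted(rng.sample(range(len(c) + 1), 2))
    c[i:j] = c[i:j][::-1]
    return c


MUTATION_OPS: Dict[str, Callable[[List[int], random.Random], List[int]]] = {
    "Swap": swap, "Scramble": scramble, "Inversion": inversion,
}
```

All operators map index permutations to index permutations, so their outputs remain valid chromosomes under the template encoding.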

Reward Function: The reward signal returned by the environment reflects the effectiveness of the genetic operation. A positive reward is assigned if the generated offspring dominates its parent, a negative reward if the parent dominates the offspring, and zero otherwise.
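The dominance-based reward can be sketched as below. It assumes minimization objectives $(Z_{1}, Z_{2}^{'})$. The reward magnitudes are not given in the text, so the values of $\pm 1$ are assumptions, and the function names are hypothetical.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for minimization: a is no worse everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def operator_reward(parent_obj: Sequence[float], child_obj: Sequence[float],
                    r_pos: float = 1.0, r_neg: float = -1.0) -> float:
    """r_pos if the offspring dominates its parent, r_neg if the parent dominates it, else 0."""
    if dominates(child_obj, parent_obj):
        return r_pos
    if dominates(parent_obj, child_obj):
        return r_neg
    return 0.0
```

For instance, an offspring with objectives `(1500.0, 0.40)` and a parent with `(1600.0, 0.45)` earn the reward `1.0`, because the offspring is cheaper and has lower dissatisfaction.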

Policy Learning: The policy network is a fully connected neural network with two hidden layers (32 and 16 neurons) and ReLU activations. It takes the state vector $s_{t}$ as input and outputs a nine-dimensional action probability distribution via a sigmoid output layer. The algorithm stores state transition experiences $(s_{t}, a_{t}, r_{t}, s_{t+1})$ in an experience replay buffer (capacity: $1 \times 10^{4}$) and periodically updates the policy network parameters using stochastic gradient descent with momentum (learning rate: $α = 1 \times 10^{-3}$; momentum: $β = 0.9$) on mini-batches of size $B = 32$. The DRL agent is invoked for each selected parent individual in each generation to choose a genetic operator combination. Through continuous learning, the agent learns to favor operators with strong exploratory power when population diversity is insufficient and operators with strong exploitative power when the population is nearing convergence.
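The network shape, replay buffer, and momentum update described above can be sketched in NumPy as follows. The text does not specify the training loss, so gradient computation is omitted. `sgd_momentum_step` simply applies whatever gradients it is given. Two further choices are our assumptions: the sigmoid outputs are normalized to sum to 1 before an action is sampled, and the weights use He initialization. All class and method names are hypothetical.

```python
import random
from collections import deque
from typing import List, Optional, Sequence

import numpy as np


class PolicyNet:
    """4 -> 32 -> 16 -> 9 fully connected network: ReLU hidden layers, sigmoid output layer."""

    def __init__(self, seed: int = 0) -> None:
        rng = np.random.default_rng(seed)
        sizes = [4, 32, 16, 9]
        self.params: List[np.ndarray] = []
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
            # He-normal initialization (assumption; the text does not state one)
            self.params.append(rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)))
            self.params.append(np.zeros(fan_out))
        self.velocity = [np.zeros_like(p) for p in self.params]

    def forward(self, state: Sequence[float]) -> np.ndarray:
        w1, b1, w2, b2, w3, b3 = self.params
        h = np.maximum(0.0, np.asarray(state, dtype=float) @ w1 + b1)
        h = np.maximum(0.0, h @ w2 + b2)
        return 1.0 / (1.0 + np.exp(-(h @ w3 + b3)))  # nine scores in (0, 1)

    def select_action(self, state: Sequence[float], rng: np.random.Generator) -> int:
        scores = self.forward(state)
        probs = scores / scores.sum()  # assumption: normalize sigmoid scores to sum to 1
        return int(rng.choice(len(probs), p=probs))

    def sgd_momentum_step(self, grads: List[np.ndarray], lr: float = 1e-3,
                          momentum: float = 0.9) -> None:
        """Classical momentum: v <- momentum * v - lr * g; theta <- theta + v (in place)."""
        for p, v, g in zip(self.params, self.velocity, grads):
            v *= momentum
            v -= lr * g
            p += v


class ReplayBuffer:
    """Fixed-capacity FIFO store of (s_t, a_t, r_t, s_{t+1}) transitions."""

    def __init__(self, capacity: int = 10_000) -> None:
        self.data: deque = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, s, a, r, s_next) -> None:
        self.data.append((s, a, r, s_next))

    def sample(self, batch_size: int = 32, rng: Optional[random.Random] = None) -> list:
        rng = rng or random.Random()
        return rng.sample(list(self.data), min(batch_size, len(self.data)))
```

The defaults mirror the stated hyperparameters: capacity $10^{4}$, batch size 32, learning rate $10^{-3}$, and momentum 0.9.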

4.2.5. Environmental Selection

This paper employs the standard elitist preservation strategy of NSGA-II. Parent and offspring populations are first merged, and fast non-dominated sorting is applied. Individuals are selected for the next generation sequentially based on their Pareto-front ranking. If a front contains more individuals than the remaining slots in the new population, selection is then guided by the crowding distance within that front, prioritizing individuals located in sparser regions of the objective space to maintain population diversity.
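The elitist environmental selection follows the standard NSGA-II procedure of Deb et al. [30]. It can be sketched with fast non-dominated sorting and crowding distance as below, assuming a NumPy array of minimization objectives. The function names are hypothetical.

```python
from typing import List

import numpy as np


def fast_non_dominated_sort(F: np.ndarray) -> List[List[int]]:
    n = len(F)
    dominated = [[] for _ in range(n)]       # individuals that p dominates
    counts = np.zeros(n, dtype=int)          # how many individuals dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if np.all(F[p] <= F[q]) and np.any(F[p] < F[q]):
                dominated[p].append(q)
            elif np.all(F[q] <= F[p]) and np.any(F[q] < F[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated[p]:
                counts[q] -= 1
                if counts[q] == 0:
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]                       # drop the trailing empty front


def crowding_distance(F: np.ndarray) -> np.ndarray:
    n, m = F.shape
    d = np.zeros(n)
    for k in range(m):
        order = np.argsort(F[:, k])
        d[order[0]] = d[order[-1]] = np.inf  # boundary solutions are always kept
        span = F[order[-1], k] - F[order[0], k]
        if span == 0:
            continue
        for j in range(1, n - 1):
            d[order[j]] += (F[order[j + 1], k] - F[order[j - 1], k]) / span
    return d


def environmental_selection(F: np.ndarray, pop_size: int) -> List[int]:
    """F: objectives of the merged parent + offspring population. Returns survivor indices."""
    chosen: List[int] = []
    for front in fast_non_dominated_sort(F):
        if len(chosen) + len(front) <= pop_size:
            chosen.extend(front)
        else:
            cd = crowding_distance(F[front])
            sparsest_first = np.argsort(-cd, kind="stable")
            chosen.extend(front[j] for j in sparsest_first[: pop_size - len(chosen)])
            break
    return chosen
```

When the last admitted front overflows, only its least crowded members survive, which preserves spread along the cost-satisfaction trade-off.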

4.2.6. Local Search Operation

To enhance the quality of elite solutions, a destroy–repair local search is applied to a subset of individuals from the non-dominated front after each environmental selection. Specifically, in the destroy phase, a customer point is randomly removed from a chosen route. In the repair phase, a greedy strategy is employed: all feasible insertion positions for the removed customer are enumerated across the current set of routes, and the position causing the smallest deterioration in the objective values is selected for reinsertion. This process facilitates the gradual refinement of the solution. A specific example of this operation is illustrated in Figure 4.
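A single destroy–repair move can be sketched as follows. The text evaluates insertions by the "smallest deterioration in the objective values". For illustration, this sketch scores each candidate with a single scalar route cost and checks a caller-supplied feasibility predicate. Both callables, the Euclidean example cost, and all function names are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Routes = List[List[int]]


def destroy_repair(routes: Routes, route_cost: Callable[[List[int]], float],
                   is_feasible: Callable[[List[int]], bool], rng: random.Random) -> Routes:
    """Remove one random customer, then greedily reinsert it at the cheapest feasible position."""
    new = [r[:] for r in routes]
    k = rng.choice([i for i, r in enumerate(new) if r])       # destroy: pick a non-empty route
    customer = new[k].pop(rng.randrange(len(new[k])))         # ...and remove one customer
    best, best_delta = None, math.inf
    for rk, route in enumerate(new):                          # repair: enumerate all positions
        base = route_cost(route)
        for pos in range(len(route) + 1):
            candidate = route[:pos] + [customer] + route[pos:]
            if not is_feasible(candidate):
                continue
            delta = route_cost(candidate) - base
            if delta < best_delta:
                best, best_delta = (rk, pos), delta
    if best is None:                                          # no feasible slot: keep original
        return [r[:] for r in routes]
    rk, pos = best
    new[rk].insert(pos, customer)
    return [r for r in new if r]                              # drop routes emptied by removal


def euclidean_route_cost(coords: Dict[int, Tuple[float, float]]) -> Callable[[List[int]], float]:
    """Hypothetical example cost: Euclidean length of depot (node 0) -> route -> depot."""
    def cost(route: List[int]) -> float:
        path = [0] + route + [0]
        return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))
    return cost
```

Because the removed customer's original position is always among the candidates, a greedy reinsertion of this kind never increases the scalar cost it optimizes.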

4.2.7. Termination Criteria and Output

The algorithm terminates after a fixed budget of 200 generations. Upon termination, the first non-dominated front of the final population is extracted as the Pareto approximate solution set. Duplicate removal in the objective space is performed on this solution set. Subsequently, the corresponding chromosomes are decoded into concrete vehicle routing plans. Finally, a feasibility check is conducted to ensure all constraints are satisfied.
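Front extraction with duplicate removal in the objective space can be sketched as below. Treating objective vectors as duplicates when they agree after rounding to a fixed number of decimals is our assumption. The function name `final_pareto_set` is hypothetical.

```python
from typing import List, Sequence

import numpy as np


def final_pareto_set(objs: Sequence[Sequence[float]], decimals: int = 6) -> List[int]:
    """Indices of the first non-dominated front, keeping one individual per objective vector."""
    F = np.asarray(objs, dtype=float)
    n = len(F)
    front = [i for i in range(n)
             if not any(np.all(F[j] <= F[i]) and np.any(F[j] < F[i]) for j in range(n))]
    seen, keep = set(), []
    for i in front:
        key = tuple(np.round(F[i], decimals))   # rounding tolerance is an assumption
        if key not in seen:
            seen.add(key)
            keep.append(i)
    return keep
```

The surviving indices would then be decoded into routes and passed to the feasibility check.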

5. Experimental Process and Data Analysis

5.1. Experimental Setup

To validate the effectiveness of the proposed DRL-NSGA-II, we employ data adapted from the standard Solomon benchmark dataset [40], which covers three classic distribution types: C (Clustered), R (Random), and RC (Mixed), with instance sizes of 25, 50, and 100 customers.

Comparative algorithms include MOPSO, SPEA2, and the standard NSGA-II. The evaluation metrics used are hypervolume (HV; larger is better) and inverted generational distance (IGD; smaller is better). The key parameters for all algorithms are set as follows: population size = 80, maximum iterations = 200.
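For reference, the two metrics can be sketched for the bi-objective minimization case as below. HV is computed by a sweep over objective 1 against a reference point. IGD is the mean distance from each reference-front point to its nearest obtained point. Objectives are usually normalized before either metric is computed, but the normalization and reference point behind Table 2 are not given here, so the function signatures are illustrative only.

```python
from typing import Sequence, Tuple

import numpy as np


def hypervolume_2d(F: Sequence[Sequence[float]], ref: Tuple[float, float]) -> float:
    """Area dominated by F (minimization) and bounded by the reference point ref."""
    P = np.asarray(F, dtype=float)
    P = P[np.all(P < np.asarray(ref), axis=1)]   # points not better than ref add nothing
    if len(P) == 0:
        return 0.0
    P = P[np.argsort(P[:, 0])]                   # sweep along objective 1
    hv, best_f2 = 0.0, float(ref[1])
    for f1, f2 in P:
        if f2 < best_f2:                          # dominated points add no area
            hv += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return hv


def igd(F: Sequence[Sequence[float]], reference_front: Sequence[Sequence[float]]) -> float:
    """Mean Euclidean distance from each reference point to its nearest obtained point."""
    A = np.asarray(F, dtype=float)
    R = np.asarray(reference_front, dtype=float)
    d = np.linalg.norm(R[:, None, :] - A[None, :, :], axis=2)   # |R| x |A| distances
    return float(d.min(axis=1).mean())
```

For the front `[[1, 3], [2, 2], [3, 1]]` and reference point `(4, 4)`, the dominated area is 3 + 2 + 1 = 6.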

All experiments were implemented in MATLAB 2021a and run on a computer equipped with an Intel® Core™ i5-11320H processor, 16 GB of RAM, and Windows 10 (64-bit). To mitigate randomness, each algorithm was run independently five times on each instance.

5.2. Algorithm Performance Comparative Analysis

Table 2 summarizes the main comparative results. DRL-NSGA-II outperforms the comparative algorithms on both HV and IGD across all tested instances. For example, on the RC101-50 instance, DRL-NSGA-II achieves an HV of 0.8932, compared with 0.0622–0.2140 for the other algorithms. Its IGD value (0.0166) is more than an order of magnitude lower than those of the comparative algorithms (all > 1.0). Its runtime, however, exceeds those of MOPSO and NSGA-II on every instance. We consider the substantial improvement in solution quality to outweigh this additional computational cost, which gives DRL-NSGA-II a favorable effectiveness-efficiency trade-off.

To assess the statistical significance and robustness of these observations, we conducted Wilcoxon signed-rank tests with Holm correction and computed the coefficient of variation (CV) across five independent runs. As shown in Table 3 and Table 4, DRL-NSGA-II is statistically significantly superior to all benchmark algorithms on both HV and IGD (all adjusted p-values < 0.05). Its markedly lower CV values (HV: 1.97%, IGD: 20.96%) also indicate greater stability than the competitors (HV: 24–36%, IGD: 35–49%).
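The Holm step-down adjustment and the CV can be sketched in plain Python. The signed-rank test itself is available, for example, as `scipy.stats.wilcoxon`, and is not reimplemented here. Computing the CV from the sample standard deviation is our assumption, and the function names are hypothetical. The sketch illustrates the procedures only and is not intended to reproduce Table 3.

```python
import statistics
from typing import List, Sequence


def holm_adjust(pvals: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        value = min(1.0, (m - rank) * pvals[i])   # multiply the k-th smallest p by (m - k)
        running_max = max(running_max, value)     # keep adjusted p-values monotone
        adjusted[i] = running_max
    return adjusted


def coefficient_of_variation(values: Sequence[float]) -> float:
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)
```

A comparison is declared significant when its adjusted p-value falls below 0.05. A CV is computed per algorithm and metric over the independent runs.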

These results provide further evidence that the DRL-based adaptive control of genetic operators enhances both convergence and diversity, enabling the algorithm to produce a strong approximation of the Pareto front. This robustness is consistently observed across different problem scales.

5.3. Dynamic Distribution Case Study

This section presents a specific case study to demonstrate the practical application of the model in a low-carbon dynamic distribution scenario. The base data were adapted from the first 25 customer points of the RC101 instance. Detailed customer information and vehicle parameters are provided in Table 5 and Table 6, respectively.

5.3.1. Initial Distribution Plan

First, an initial Pareto solution set consisting of 52 schemes was generated using all static information (as shown in Table 7). Figure 5 depicts the trade-off between total cost and customer satisfaction, indicating that higher satisfaction typically requires a higher cost. For example, the cost-optimal scheme incurs a cost of CNY 1560.3 but achieves only 40.5% satisfaction, whereas the satisfaction-optimal scheme reaches 84.2% satisfaction at a cost of CNY 2600.9.

5.3.2. Distribution Plan After Dynamic Optimization

A dynamic event was simulated at T = 2.5, including information updates for 9 known customers (Table 8), the addition of 10 new customers (Table 9), and congestion on 5 road segments (Table 10). Using this updated dynamic information, the algorithm performs real-time re-optimization of the routes for all unserved customers.

Following dynamic optimization, 18 feasible schemes were obtained (Table 11). A comparison between Figure 6 (dynamic phase) and Figure 5 (initial static phase) indicates that, for an equivalent level of satisfaction, dynamic distribution incurs higher costs. This is primarily due to the additional expenses associated with accommodating new demands and adjusting routes to bypass congested segments.

Taking a high-satisfaction scheme (Scheme 18) as an example, the cost breakdown (Table 12) indicates that, apart from the fixed dispatch cost, the variable cost and refrigeration cost are strongly correlated with travel distance. In contrast, the correlation between carbon emission costs and distance is weaker, as these costs are also influenced by other factors such as cargo weight.
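The dependence of carbon emissions on cargo weight can be illustrated with a load-dependent fuel model. We assume the linear form used by Xiao et al. [37], in which the fuel rate per km rises from $ρ^{0}$ (empty) to $ρ^{*}$ (full load $Q$). This is a sketch under that assumption, not necessarily the paper's exact emission equation. $Q = 5$ t and $λ_{f} = 2.61$ kg/L follow Table 6, the values of $ρ^{0}$ and $ρ^{*}$ are hypothetical, and refrigeration emissions are ignored.

```python
from typing import Iterable, Tuple


def fuel_rate(load_t: float, rho0: float, rho_full: float, capacity_t: float) -> float:
    """Fuel use per km (L/km), linear in the carried load between empty and full."""
    return rho0 + (rho_full - rho0) * load_t / capacity_t


def fuel_emissions(legs: Iterable[Tuple[float, float]], rho0: float, rho_full: float,
                   capacity_t: float, lambda_f: float) -> float:
    """legs: (distance_km, load_t) per arc. Returns kg CO2 from fuel: lambda_f * total litres."""
    litres = sum(dist * fuel_rate(load, rho0, rho_full, capacity_t) for dist, load in legs)
    return lambda_f * litres


if __name__ == "__main__":
    # Q = 5 t and lambda_f = 2.61 kg/L follow Table 6; rho0 and rho_full are hypothetical.
    empty = fuel_emissions([(10.0, 0.0)], rho0=0.2, rho_full=0.4, capacity_t=5.0, lambda_f=2.61)
    full = fuel_emissions([(10.0, 5.0)], rho0=0.2, rho_full=0.4, capacity_t=5.0, lambda_f=2.61)
    print(round(empty, 2), round(full, 2))   # 5.22 10.44
```

Under this model, two 10 km legs differ in emissions by a factor of two depending only on the load carried. This is why emission cost tracks distance less closely than variable or refrigeration cost.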

5.4. Sensitivity Analysis

As a critical policy parameter, the carbon price significantly influences low-carbon distribution decisions. To investigate its impact, we took the benchmark carbon price of CNY 0.1 per kg [41] as a reference and further analyzed the effects of carbon prices $r$ of CNY 0.3, 0.5, 0.7, and 0.9 per kg on total cost, total carbon emissions, and customer satisfaction. The main findings are as follows:

As shown in Figure 7, rising carbon prices affect total cost and carbon emissions nonlinearly. When $r$ increases from 0.3 to 0.5, the total cost falls from CNY 3181.27 to CNY 3039.39, while carbon emissions decrease significantly to 211.42. This “co-decline in cost and emissions” highlights the effective incentive role of a medium carbon price. With all physical parameters held constant, varying the carbon price changes the relative weights in the objective function, thereby guiding the optimization algorithm to a different trade-off point on the Pareto frontier between total cost and customer satisfaction. Specifically, under a low carbon price ($r = 0.3$), the weight of the emission cost is low, and the model tends to prioritize customer satisfaction. This may lead to a decentralized routing strategy with lower vehicle utilization but higher timeliness, resulting in higher fixed dispatch costs and a longer total travel distance. When the carbon price rises to a medium level ($r = 0.5$), the emission cost becomes large enough to drive the algorithm to a new equilibrium on the Pareto frontier. By consolidating customer points and planning more compact delivery routes, the algorithm slightly adjusts service times for some customers (with a controllable impact on satisfaction). At the same time, it significantly increases vehicle load rates, reduces the number of vehicles dispatched, and shortens the total travel distance. The savings in fixed dispatch and variable costs achieved through this route optimization are sufficient to offset the increased carbon cost, resulting in a net reduction in both total cost and emissions. This suggests that, within this price range, operators are incentivized to optimize operations, achieving both cost reduction and emission abatement. When $r \geq 0.7$, the imperative for emission abatement dominates, forcing the model to adopt economically inefficient routing solutions in pursuit of minimal emissions. Other cost components then increase sharply, leading to a significant rise in total cost.
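A quick order-of-magnitude check helps explain why route consolidation can outweigh the carbon term. This rests on two assumptions: the carbon cost is the carbon price multiplied by the emitted mass, and the reported 211.42 is in kg (the unit is not stated above). Under these assumptions, the carbon component at $r = 0.5$ is:

```latex
C_{\text{carbon}} = r \cdot E = 0.5 \times 211.42 = 105.71\ \text{CNY},
\qquad
\frac{105.71}{3039.39} \approx 3.5\%
```

If the case-study parameters in Table 6 apply, this entire carbon term is smaller than the fixed cost of dispatching a single vehicle ($c_{1} = 200$ CNY). A plan that dispatches one fewer vehicle therefore saves more in fixed cost than the whole carbon term costs.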

Figure 8 illustrates the indirect influence of carbon prices on customer satisfaction through route adjustments. Satisfaction peaks at 0.89 within the medium carbon price range ($0.5 \leq r \leq 0.7$), while carbon emissions remain at a relatively low level. This supports the mechanism discussed above: the route consolidation incentivized by a moderate carbon price not only reduces emissions but also produces more efficient and rational routes, thereby maintaining or even enhancing overall service efficiency and reliability. A moderate carbon price can therefore incentivize emission reductions without compromising service quality. However, when the carbon price becomes excessively high ($r = 0.9$), the drive for minimal emissions may lead to service compromises, such as unreasonable waiting times or emission-driven detours, causing satisfaction to drop to 0.85.

The above results indicate that there is an effective carbon price range that simultaneously promotes cost reduction, efficiency improvement, and green transformation. For policymakers, the key is to balance emission reduction incentives with corporate viability, avoiding prices that are either too low to be binding or so high that they undermine operational efficiency. This may be achieved by setting an appropriate price level, implementing a gradual and predictable adjustment mechanism, and complementing it with supporting incentives. For enterprises, it is imperative to proactively integrate carbon constraints into their strategies. By leveraging the opportunity presented by moderate carbon prices, enterprises can optimize operations through technological and managerial innovations, such as vehicle electrification and intelligent routing algorithms, while also pursuing multi-stakeholder collaboration. This systematic approach turns policy requirements into sustainable low-carbon competitiveness and long-term development advantages.

6. Conclusions

This paper addresses the DVRP for low-carbon cold-chain logistics. We establish an MO-LC-CCDVRP model that simultaneously minimizes total cost (incorporating carbon emission costs) and maximizes customer satisfaction while responding to dynamic demands. To solve this model efficiently, we propose DRL-NSGA-II, which integrates a DRL mechanism to adaptively control genetic operators, thereby enhancing the convergence and diversity of the Pareto front. In our experimental analysis, DRL-NSGA-II exhibits consistently superior performance over classical multi-objective algorithms, as demonstrated by the comparative results on adapted benchmark instances.

Beyond its algorithmic contributions, this study provides strategic insights for cold-chain logistics operating in low-carbon, dynamic environments. A key managerial implication is the necessary mindset shift: viewing carbon constraints and dynamic demands not as mere costs, but as strategic levers for innovation and restructuring. Firms are encouraged to adopt intelligent decision-making frameworks, like the one presented, to proactively balance cost, emission, and service trade-offs. Route optimization should be treated as a core capability, and investments in green technologies (e.g., electric vehicles, IoT) can be positioned as strategic initiatives for long-term resilience, not short-term compliance. For policymakers, guiding a systemic industry transition may require moving beyond carbon pricing alone to create a synergistic policy mix. This includes using a predictable carbon price for clear signals, accelerating clean technology adoption via R&D and financial incentives, and establishing standardized MRV (monitoring, reporting, and verification) protocols for emissions. This integrated approach can more effectively translate policy goals into endogenous drivers for sustainable sectoral growth.

In the future, this research can be extended in the following directions: Regarding model development, introducing heterogeneous vehicle fleets would enhance realism; incorporating time-varying speeds and traffic conditions would enable more accurate assessment of energy consumption and emissions; and integrating inventory management with routing decisions could form a more comprehensive cold-chain logistics optimization framework. In terms of algorithmic improvements, further exploration of deeper integration between deep reinforcement learning and evolutionary algorithms remains promising, such as designing multi-agent collaborative frameworks for dynamic environments or developing efficient online optimization algorithms. From the perspective of application and validation, extending the model to specific contexts such as multi-temperature joint distribution or pharmaceutical cold chains, along with conducting empirical case studies using real-world operational data, is crucial for validating the model’s practical value and facilitating its implementation.

Author Contributions

Conceptualization, X.J.; methodology, X.J.; software, Q.H. and X.L.; validation, Q.H. and X.L.; formal analysis, Q.H.; investigation, X.L.; writing—original draft preparation, X.J. and Q.H.; writing—review and editing, X.J.; visualization, Q.H. and X.L.; supervision, X.J.; funding acquisition, X.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Key Project of the Natural Science Foundation of Fujian Province, China (Grant No. 2022J02053) and the 2025 Graduate Science and Technology Innovation Program Project of Xiamen University of Technology (Grant No. YKJCX2025055).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Dantzig, G.; Ramser, J. The Truck Dispatching Problem. Manag. Sci. 1959, 6, 80–91.
[2] Zhang, J.; Li, Y. Vehicle routing problem for cold-chain drug distribution with epidemic spread situation. Expert Syst. Appl. 2025, 262, 125186.
[3] Qi, C.; Hu, L. Optimization of vehicle routing problem for emergency cold chain logistics based on minimum loss. Phys. Commun. 2020, 40, 101085.
[4] Sun, Z.; Ma, S.; Jian, Y.; Lu, Y.; Sun, Z. Cold chain delivery route modeling and optimizing based on the clustered Whale Optimization Algorithm. Appl. Soft Comput. 2025, 182, 113544.
[5] Guan, X.; Li, G. Optimization of cold chain logistics vehicle transportation and distribution model based on improved ant colony algorithm. Procedia Comput. Sci. 2023, 228, 974–982.
[6] Cai, L.; Lv, W.; Xiao, L.; Xu, Z. Total carbon emissions minimization in connected and automated vehicle routing problem with speed variables. Expert Syst. Appl. 2021, 165, 113910.
[7] Zhang, X.; Hao, Y.; Zhang, L.; Yuan, X. Application of improved genetic algorithm to vehicle routing problem considering the environmental self-regulation of the freight companies. Expert Syst. Appl. 2025, 274, 127010.
[8] Liu, Y.; Yu, Y.; Baldacci, R.; Tang, J.; Sun, W. Optimizing carbon emissions in green logistics for time-dependent routing. Transp. Res. Part B Methodol. 2025, 192, 103155.
[9] Li, X. Multi-objective multi-compartment vehicle routing problem of fresh products with the promised latest delivery time. Ann. Oper. Res. 2024. preprint.
[10] Nan, Z.; Yang, X.; Ruiz-Garcia, L.; Qiu, J.; Feng, Y.; Han, J. Multi-objective optimization of cold chain distribution routes considering traffic congestion. Agric. Commun. 2025, 3, 100104.
[11] Zhang, X.; Li, Y.; Hao, Y.; Yuan, X. Optimization of the low-carbon cold chain delivery route for fresh products in time-dependent networks. J. Clean. Prod. 2025, 532, 146969.
[12] Ma, R.; Zhu, Q. Research on VRP model optimization of cold chain logistics under low-carbon constraints. Int. J. Inf. Technol. Web Eng. 2024, 19, 1–14.
[13] Ding, Y.; Zhang, L.; Kuo, Y.; Zhang, L. Cold chain routing for product freshness and low carbon emissions: A target-oriented robust optimization approach. Transp. Res. Part E Logist. Transp. Rev. 2025, 199, 104138.
[14] Jiang, X.; Liu, X.; Pan, F.; Han, Z. Optimizing Cold Chain Distribution Routes Considering Dynamic Demand: A Low-Emission Perspective. Sustainability 2024, 16, 2013.
[15] Baty, L.; Jungel, K.; Klein, P.; Parmentier, A.; Schiffer, M. Combinatorial optimization-enriched machine learning to solve the dynamic vehicle routing problem with time windows. Transp. Sci. 2024, 58, 708–725.
[16] Liu, Y.; Yu, Y.; Zhang, Y.; Baldacci, R.; Tang, J.; Luo, X.; Sun, W. Branch-cut-and-price for the time-dependent green vehicle routing problem with time windows. INFORMS J. Comput. 2023, 35, 14–30.
[17] Li, J.; Duan, Y.; Zhang, W.; Zhu, L. Vehicle routing optimization algorithm based on time windows and dynamic demand. J. Meas. Sci. Instrum. 2024, 15, 369–378.
[18] Kim, Y.; Kim, H.; Kim, B. Vehicle routing and scheduling problem with two deployment strategies to handle urgent orders in a vaccine supply chain. Comput. Ind. Eng. 2025, 207, 111346.
[19] Wan, F.; Guo, H.; Li, J.; Gu, M.; Pan, W.; Ying, Y. A scheduling and planning method for geological disasters. Appl. Soft Comput. 2021, 111, 107712.
[20] Peng, Y.; Zhang, Y.; Yu, D.; Luo, Y. Multiobjective Route Optimization for Multimodal Cold Chain Networks Considering Carbon Emissions and Food Waste. Mathematics 2024, 12, 3559.
[21] Liu, Y.; Tang, Y.; Hua, C. A hybrid metaheuristic algorithm for dynamic heterogeneous vehicle routing problem with stochastic demand considering environmental aspects. Int. J. Electr. Power Energy Syst. 2025, 172, 111135.
[22] Wang, W.; Li, Y.; Yan, H.; Zhao, W.; Zhao, Q.; Luo, K. A two-phase algorithm for the dynamic time-dependent green vehicle routing problem in decoration waste collection. Expert Syst. Appl. 2025, 262, 125570.
[23] Kohar, A.; Jakhar, S.; Agarwal, Y. Strong cutting planes for the capacitated multi-pickup and delivery problem with time windows. Transp. Res. Part B Methodol. 2023, 176, 102806.
[24] Mikkelsen, J.; Gadegaard, S.; Lysgaard, J. A Branch-and-Cut Algorithm for the Mixed Fleet Green Vehicle Routing Problem. Eur. J. Oper. Res. 2025. preprint.
[25] Jiang, X.; Chen, W.; Liu, Y. A Study on the Vehicle Routing Problem Considering Infeasible Routing Based on the Improved Genetic Algorithm. Int. J. Eng. Technol. Innov. 2024, 14, 67–84.
[26] Ahmed, Z.; Yousefikhoshbakht, M. An improved tabu search algorithm for solving heterogeneous fixed fleet open vehicle routing problem with time windows. Alex. Eng. J. 2023, 64, 349–363.
[27] İlhan, İ. An improved simulated annealing algorithm with crossover operator for capacitated vehicle routing problem. Swarm Evol. Comput. 2021, 64, 100911.
[28] Ren, T.; Luo, T.; Jia, B.; Yang, B.; Wang, L.; Xing, L. Improved ant colony optimization for the vehicle routing problem with split pickup and split delivery. Swarm Evol. Comput. 2023, 77, 101228.
[29] Liu, Y.; Chen, W.; Jiang, X. PSO-Augmented NSGA-III Algorithm: A Combined Optimization Approach to Heterogeneous Vehicle Routing and Bin Packing Problems. IEEE Access 2024, 12, 153497–153518.
[30] Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.
[31] Fu, X.; Gu, S.; Chew, C. Optimizing the multi-objective traveling salesman problem with a deep reinforcement learning algorithm using cross fusion attention networks. Neural Netw. 2025, 192, 107904.
[32] Fu, Y.; Zhang, Z.; Huang, M.; Guo, X.; Qi, L. Multi-objective integrated energy-efficient scheduling of distributed flexible job shop and vehicle routing by knowledge-and-learning-based hyper-heuristics. IEEE Trans. Emerg. Top. Comput. Intell. 2025, 9, 2137–2150.
[33] Dong, G.; Qin, J.; Wu, C.; Xu, C.; Yang, X. Reinforcement learning-enhanced genetic algorithm for wind farm layout optimization. Renew. Energy 2026, 259, 125093.
[34] Song, Y.; Wei, L.; Yang, Q.; Wu, J.; Xing, L.; Chen, Y. RL-GA: A reinforcement learning-based genetic algorithm for electromagnetic detection satellite scheduling problem. Swarm Evol. Comput. 2023, 77, 101236.
[35] Zhang, Q.; Karki, B. Integrated multi-channel supply chain planning with human resource and multi-tier discount considerations: An RL-GWO optimization approach. J. Eng. Res. 2025. preprint.
[36] Hou, Y.; Liao, X.; Chen, G.; Chen, Y. Co-Evolutionary NSGA-III with deep reinforcement learning for multi-objective distributed flexible job shop scheduling. Comput. Ind. Eng. 2025, 203, 110990.
[37] Xiao, Y.; Zhao, Q.; Kaku, I.; Xu, Y. Development of a fuel consumption optimization model for the capacitated vehicle routing problem. Comput. Oper. Res. 2012, 39, 1419–1431.
[38] Kirby, H.; Hutton, B.; McQuaid, R.; Raeside, R.; Zhang, X. Modelling the effects of transport policy levers on fuel efficiency and national fuel consumption. Transp. Res. Part D Transp. Environ. 2000, 5, 265–282.
[39] Zhang, Y.; Chen, X. An optimization model for the vehicle routing problem in multi-product frozen food delivery. J. Appl. Res. Technol. 2014, 12, 239–250.
[40] Solomon, M.M. Algorithms for the vehicle routing and scheduling problems with time window constraints. Oper. Res. 1987, 35, 254–265.
[41] Zhang, J.; Li, C. Research on dynamic distribution vehicle route optimization under the influence of carbon emission. Chin. J. Manag. Sci. 2022, 30, 184–194.

Figure 1. Overall framework of the algorithm.

Figure 2. Flowchart of DRL-NSGA-II.

Figure 3. Example of chromosome encoding.

Figure 4. Example of the destroy–repair operation.

Figure 5. Optimization diagram of cost vs. satisfaction for initial distribution schemes.

Figure 6. Optimization diagram of cost vs. satisfaction for schemes after dynamic optimization.

Figure 7. Impact of different carbon prices on total cost and carbon emissions.

Figure 8. Impact of different carbon prices on customer satisfaction and carbon emissions.

Table 1. Symbols and descriptions.

Types	Symbols	Descriptions	Symbols	Descriptions
Set	$N$	Set of customer points to be served, $N = \{1, 2, \dots, n\}$	$K$	Set of available homogeneous vehicles at the distribution center, $K = \{1, 2, \dots, k\}$
	$S$	Any non-empty proper subset of $N$ (for subtour elimination)	$Y$	Set of unserved customer points (in the static phase)
	$Y^{*}$	Set of new unserved customer points (in the dynamic phase)	$K_{V}$	Set of dispatched vehicles (in the static phase)
	$K_{V}^{*}$	Set of available vehicles (in the dynamic phase)
Parameters	$c_{1}$	Unit fixed cost	$c_{2}$	Unit variable transportation cost
	$d_{i j}$	Distance between customer $i$ and customer $j$	$q_{i}$	Demand of customer $i$
	$ρ^{0}$	Base fuel consumption rate	$ρ^{*}$	Full-load fuel consumption rate
	$Q$	Maximum vehicle load capacity	$E$	Carbon emission amount
	$ε$	Fuel-to-emission conversion factor	$H$	Fuel consumption
	$λ_{f}$	Unit carbon emission factor for fuel consumption	$λ_{r}$	Unit carbon emission factor for the refrigeration process
	$γ$	Unit carbon emission price	$c_{4}$	Unit refrigeration cost
	$α$	Deterioration degree of a container	$R$	Heat transfer coefficient
	$S_{1}$	Outer surface area of the vehicle	$S_{2}$	Inner surface area of the vehicle
	$Δ T$	Temperature difference between inside and outside the compartment	$v$	Vehicle travel speed
	$β$	Door opening frequency coefficient of the vehicle	$v_{k}$	Volume of the refrigerated compartment of vehicle $k$
	$l$	Average unloading efficiency	$M_{i}$	Satisfaction of customer $i$
	$E T_{i}$	Earliest satisfactory arrival time for customer $i$	$L T_{i}$	Latest satisfactory arrival time for customer $i$
	$E T_{i}^{'}$	Earliest acceptable arrival time for customer $i$	$L T_{i}^{'}$	Latest acceptable arrival time for customer $i$
Variable	$t_{i}$	Arrival time at customer $i$	$q_{i j k}$	Load carried by vehicle $k$ from customer $i$ to customer $j$
	$Q_{k v}$	Remaining load capacity of vehicle $k$	$x_{i j k}$	Binary variable that equals 1 if vehicle $k$ travels directly from customer $i$ to customer $j$, and 0 otherwise

Table 2. Comparison of algorithm performance.

Instance	Metric	MOPSO	SPEA2	NSGA-II	DRL-NSGA-II
C101-25	HV	0.5430	0.5096	0.2215	0.9250
	IGD	0.5840	0.7499	1.1843	0.0453
	Runtime/s	13.09	148.49	75.83	108.34
R101-25	HV	0.7032	0.6897	0.5198	0.9130
	IGD	0.2792	0.3505	0.5200	0.0211
	Runtime/s	6.48	212.10	66.08	100.01
RC101-25	HV	0.4834	0.5136	0.3278	0.9143
	IGD	0.5779	0.5393	0.9002	0.0164
	Runtime/s	8.14	231.53	79.84	110.35
C101-50	HV	0.3163	0.5545	0.5430	0.9375
	IGD	0.9688	0.5030	0.4885	0.0179
	Runtime/s	22.38	202.53	71.54	179.23
R101-50	HV	0.3805	0.3568	0.3970	0.8868
	IGD	0.7078	0.7850	0.7382	0.0232
	Runtime/s	26.12	172.32	82.39	180.23
RC101-50	HV	0.0622	0.1723	0.2140	0.8932
	IGD	1.5270	1.1862	1.0791	0.0166
	Runtime/s	19.30	158.76	64.71	142.96
C101-100	HV	0.5058	0.5455	0.5145	0.8962
	IGD	0.4186	0.3919	0.4291	0.0155
	Runtime/s	21.96	112.87	50.99	272.08
R101-100	HV	0.1350	0.1747	0.1962	0.8658
	IGD	1.2523	1.1470	1.0705	0.0296
	Runtime/s	18.04	130.54	50.39	247.16
RC101-100	HV	0.1423	0.1642	0.0448	0.8826
	IGD	1.2204	1.1603	1.5848	0.0215
	Runtime/s	27.50	173.61	74.18	304.98

Note: Bold indicates the optimal value among the four algorithms for each metric.

Table 3. Statistical significance comparisons of algorithms on HV and IGD metrics.

Comparison	HV Adjusted p-Value	IGD Adjusted p-Value
DRL-NSGA-II vs. MOPSO	0.024 *	0.024 *
DRL-NSGA-II vs. NSGA-II	0.016 *	0.016 *
DRL-NSGA-II vs. SPEA2	0.008 *	0.008 *

* Indicates p < 0.05 after Holm correction; unmarked indicates p ≥ 0.05.

Table 4. Coefficient of variation (CV) comparisons and stability analysis.

Algorithm	Average HV CV (%)	Average IGD CV (%)	CV Rank
DRL-NSGA-II	1.9660	20.9568	1
MOPSO	30.8269	48.9575	4
NSGA-II	36.0024	42.7784	3
SPEA2	24.1213	34.7366	2

Note: Bold indicates the optimal value among the four algorithms for each metric.

Table 5. Case study experimental data.

No.	X Coordinate	Y Coordinate	Demand	Earliest Time	Preferred Earliest Time	Preferred Latest Time	Latest Time	Service Time
0	40	50	0	0.00	0.00	6.00	6.00	0
1	25	85	0.60	3.03	3.63	4.38	5.58	0.16
2	22	75	0.90	0.65	1.25	2.00	3.20	0.16
3	22	85	0.90	2.13	2.73	3.48	4.68	0.16
4	20	80	1.20	2.93	3.53	4.28	5.48	0.16
5	20	85	0.60	0.43	1.03	1.78	2.98	0.16
6	18	75	0.60	1.78	2.38	3.13	4.33	0.16
7	15	75	0.60	1.38	1.98	2.73	3.93	0.16
8	15	80	0.70	1.68	2.28	3.03	4.23	0.16
9	10	35	0.60	1.68	2.28	3.03	4.23	0.16
10	10	40	0.90	2.38	2.98	3.73	4.93	0.16
11	8	40	1.20	0.88	1.48	2.23	3.43	0.16
12	8	45	0.60	1.00	1.60	2.35	3.55	0.16
13	5	35	1.10	2.95	3.55	4.30	5.50	0.16
14	5	45	0.70	0.28	0.88	1.63	2.83	0.16
15	2	40	0.60	0.85	1.45	2.20	3.40	0.16
16	0	40	0.60	1.20	1.80	2.55	3.75	0.16
17	0	45	0.60	3.13	3.73	4.48	5.68	0.16
18	44	5	0.60	1.58	2.18	2.93	4.13	0.16
19	42	10	1.20	1.20	1.80	2.55	3.75	0.16
20	42	15	0.50	2.45	3.05	3.80	5.00	0.16
21	40	5	0.40	1.08	1.68	2.43	3.63	0.16
22	40	15	1.20	1.70	2.30	3.05	4.25	0.16
23	38	5	0.90	1.03	1.63	2.38	3.58	0.16
24	38	15	0.40	3.10	3.70	4.45	5.65	0.16
25	35	5	0.60	3.25	3.85	4.60	5.80	0.16

Table 6. Vehicle parameter information.

Parameter	Value	Parameter	Value
$\gamma$	0.1 kg/L	$c_{2}$	0.8 CNY/km
$Q$	5 t	$\lambda_{f}$	2.61 kg/L
$v$	50 km/h	$\lambda_{r}$	2.6 kg/L
$c_{1}$	200 CNY/vehicle
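With the speed $v$ = 50 km/h from Table 6, travel times follow directly from the coordinates in Table 5. The sketch below assumes Euclidean distances in kilometres and a vehicle that waits when it arrives before a customer's earliest time; the helper names are hypothetical. It traces the leg from the depot to customer 1.

```python
import math
from typing import Tuple

SPEED_KMH = 50.0  # v in Table 6

def travel_time(a: Tuple[float, float], b: Tuple[float, float],
                speed: float = SPEED_KMH) -> float:
    """Euclidean distance (assumed km) divided by speed, in hours."""
    return math.dist(a, b) / speed

def service_start(arrival: float, earliest: float) -> float:
    """A vehicle that arrives early waits until the earliest time."""
    return max(arrival, earliest)

depot, customer1 = (40, 50), (25, 85)         # coordinates from Table 5
arrive = travel_time(depot, customer1)        # about 0.762 h after leaving at t = 0
start = service_start(arrive, earliest=3.03)  # waits until 3.03
depart = start + 0.16                         # plus the 0.16 h service time
print(round(arrive, 4), start, round(depart, 2))
```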

Table 7. Initial distribution schemes.

Solution	Total Cost	Satisfaction	Solution	Total Cost	Satisfaction	Solution	Total Cost	Satisfaction
1	1560.3009	0.4053	19	1638.9769	0.6194	37	1802.2132	0.7512
2	1567.2630	0.4085	20	1639.8873	0.6292	38	1824.0690	0.7517
3	1569.3898	0.4209	21	1644.5295	0.6368	39	1829.3093	0.7531
4	1572.8830	0.4260	22	1664.0750	0.6439	40	1833.0969	0.7678
5	1577.8633	0.4797	23	1665.2352	0.6462	41	1848.7828	0.7905
6	1577.8866	0.5016	24	1669.8774	0.6539	42	1865.5399	0.7912
7	1580.9701	0.5421	25	1676.9421	0.6549	43	2065.8308	0.7930
8	1592.1445	0.5433	26	1698.0612	0.6588	44	2077.7141	0.7942
9	1596.9320	0.5642	27	1702.8564	0.6617	45	2078.5990	0.8114
10	1601.8037	0.5694	28	1705.7830	0.6716	46	2098.5615	0.8185
11	1610.8753	0.5751	29	1715.7235	0.6727	47	2138.0741	0.8190
12	1613.8567	0.5847	30	1763.9377	0.6796	48	2445.7394	0.8378
13	1615.4583	0.5931	31	1765.0979	0.6819	49	2465.7018	0.8383
14	1627.8000	0.5956	32	1770.1571	0.6896	50	2580.8972	0.8416
15	1627.8312	0.5973	33	1774.2586	0.6902	51	2597.1539	0.8419
16	1628.4165	0.6047	34	1782.7566	0.6903	52	2600.8597	0.8422
17	1635.3183	0.6078	35	1783.2520	0.7060
18	1637.6747	0.6180	36	1798.9379	0.7286
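In Table 7, as the solution number grows, total cost and satisfaction both increase, so none of the listed schemes is better than another in both objectives. The sketch below expresses that dominance test for minimizing cost and maximizing satisfaction, using solutions 1 to 5 of Table 7 and one hypothetical dominated candidate.

```python
from typing import List, Tuple

Solution = Tuple[float, float]  # (total cost, satisfaction)

def dominates(a: Solution, b: Solution) -> bool:
    """a dominates b: cost no higher, satisfaction no lower, strictly better in one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

def pareto_filter(solutions: List[Solution]) -> List[Solution]:
    """Keep the solutions that no other solution dominates, in input order."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]

# Solutions 1 to 5 of Table 7.
front = [(1560.3009, 0.4053), (1567.2630, 0.4085), (1569.3898, 0.4209),
         (1572.8830, 0.4260), (1577.8633, 0.4797)]
candidate = (1575.0000, 0.4100)  # hypothetical: dominated by solution 4
print(pareto_filter(front + [candidate]))
```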

Table 8. Changes in known customer information.

No.	X Coordinate	Y Coordinate	Demand (t)	Earliest Time (h)	Preferred Earliest Time (h)	Preferred Latest Time (h)	Latest Time (h)	Service Time (h)
3	0	0	0.00	0.00	0.00	0.00	0.00	0.00
5	20	85	0.60	0.65	1.25	2.01	3.06	0.16
6	18	75	0.50	1.78	2.38	3.13	4.33	0.16
8	15	80	0.30	1.95	2.55	3.19	4.35	0.16
9	10	35	0.50	1.68	2.28	3.03	4.23	0.16
12	8	45	0.60	1.15	1.75	2.56	3.76	0.16
14	0	0	0.00	0.00	0.00	0.00	0.00	0.00
15	2	40	1.20	0.85	1.45	2.20	3.40	0.16
21	40	5	1.10	1.08	1.68	2.43	3.63	0.16

Table 9. New customer information.

No.	X Coordinate	Y Coordinate	Demand (t)	Earliest Time (h)	Preferred Earliest Time (h)	Preferred Latest Time (h)	Latest Time (h)	Service Time (h)
26	39	17	0.90	3.50	3.73	5.23	6.46	0.16
27	31	14	0.60	3.60	3.83	5.33	6.56	0.16
28	28	83	0.60	3.50	3.73	5.23	6.46	0.16
29	20	79	1.50	3.50	3.73	5.23	6.46	0.16
30	22	81	1.10	3.60	3.83	5.33	6.56	0.16
31	13	28	0.90	3.55	3.78	5.28	6.51	0.16
32	13	32	0.60	3.56	3.79	5.29	6.52	0.16
33	11	32	1.20	3.65	3.88	5.38	6.61	0.16
34	10	36	0.90	3.54	3.77	5.27	6.50	0.16
35	7	38	0.90	3.66	3.89	5.39	6.62	0.16
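Tables 8 and 9 carry the dynamic information: revised records for known customers and newly revealed customers. Customers 3 and 14 have all-zero rows in Table 8 and appear in none of the routes in Table 12, so the sketch below reads a zero-demand update as a cancellation; this interpretation, the record layout, and the set of already-served customers are assumptions. Following the two-stage strategy described in the abstract, only unserved customers are passed on for replanning.

```python
from typing import Dict, Set, Tuple

# Hypothetical record layout mirroring the table columns:
# (x, y, demand, earliest, preferred earliest, preferred latest, latest, service time)
Record = Tuple[float, ...]

def apply_updates(customers: Dict[int, Record], changed: Dict[int, Record],
                  new: Dict[int, Record]) -> Dict[int, Record]:
    """Merge revised records and new customers; zero demand is read as a cancellation."""
    merged = dict(customers)
    for cid, rec in changed.items():
        if rec[2] == 0:
            merged.pop(cid, None)  # assumed: cancelled order
        else:
            merged[cid] = rec      # revised demand or time window
    merged.update(new)             # newly revealed customers
    return merged

def to_replan(merged: Dict[int, Record], served: Set[int]) -> Set[int]:
    """Only customers that have not been served yet are re-routed."""
    return set(merged) - served

customers = {  # excerpt of Table 5
    3: (22, 85, 0.90, 2.13, 2.73, 3.48, 4.68, 0.16),
    14: (5, 45, 0.70, 0.28, 0.88, 1.63, 2.83, 0.16),
    15: (2, 40, 0.60, 0.85, 1.45, 2.20, 3.40, 0.16),
    16: (0, 40, 0.60, 1.20, 1.80, 2.55, 3.75, 0.16),
}
changed = {  # excerpt of Table 8
    3: (0, 0, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00),
    14: (0, 0, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00),
    15: (2, 40, 1.20, 0.85, 1.45, 2.20, 3.40, 0.16),
}
new = {26: (39, 17, 0.90, 3.50, 3.73, 5.23, 6.46, 0.16)}  # excerpt of Table 9

merged = apply_updates(customers, changed, new)
pending = to_replan(merged, served={16})  # hypothetical: customer 16 already served
print(sorted(pending))
```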

Table 10. Congested road sections.

Scenario	Congested Customer Node Pair
1	1, 4
2	2, 9
3	7, 19
4	5, 21
5	23, 24
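Table 10 lists, for each scenario, a pair of customer nodes whose connecting road section is congested. How congestion enters the model is not specified in this section; the sketch below assumes, purely for illustration, that travel speed on a congested section is multiplied by a hypothetical factor (0.5 here) and that sections are undirected.

```python
import math
from typing import Dict, Set, FrozenSet, Tuple

Coord = Tuple[float, float]

def congested_travel_time(i: int, j: int, coords: Dict[int, Coord],
                          congested: Set[FrozenSet[int]],
                          speed: float = 50.0, factor: float = 0.5) -> float:
    """Travel time in hours; `factor` is a hypothetical slowdown on congested sections."""
    edge = frozenset((i, j))  # road sections treated as undirected
    v = speed * factor if edge in congested else speed
    return math.dist(coords[i], coords[j]) / v

congested = {frozenset((1, 4))}                  # scenario 1 in Table 10
coords = {1: (25, 85), 4: (20, 80), 8: (15, 80)}  # from Table 5
print(congested_travel_time(4, 1, coords, congested))  # slowed section
print(congested_travel_time(4, 8, coords, congested))  # free-flowing section
```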

Table 11. Distribution schemes after dynamic optimization.

Solution	Total Cost	Satisfaction	Solution	Total Cost	Satisfaction
1	2540.3157	0.5304	10	2791.8821	0.6481
2	2542.5542	0.5598	11	2795.5801	0.6573
3	2545.8995	0.5793	12	2820.7820	0.6676
4	2575.4495	0.5892	13	2945.3173	0.6756
5	2577.2428	0.6087	14	3140.5397	0.7050
6	2689.5709	0.6167	15	3212.9744	0.7252
7	2698.5190	0.6167	16	3215.1167	0.7264
8	2767.7722	0.6187	17	3354.6721	0.7294
9	2769.5655	0.6382	18	3354.9169	0.7344
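Table 11 is the trade-off set after dynamic optimization, and a dispatcher must still pick one scheme from it. Table 12 reports a high-satisfaction scheme; another common choice, sketched here as an illustration rather than as the authors' selection rule, is the solution nearest the ideal point after min-max normalization of both objectives.

```python
import math
from typing import List, Tuple

def compromise_index(front: List[Tuple[float, float]]) -> int:
    """Index of the solution closest to the ideal point after min-max normalization.

    Column 0 (total cost) is minimized; column 1 (satisfaction) is maximized.
    """
    costs = [c for c, _ in front]
    sats = [s for _, s in front]
    c_min, c_max = min(costs), max(costs)
    s_min, s_max = min(sats), max(sats)

    def gap(sol: Tuple[float, float]) -> float:
        cost_gap = (sol[0] - c_min) / (c_max - c_min)  # 0 at the cheapest solution
        sat_gap = (s_max - sol[1]) / (s_max - s_min)   # 0 at the most satisfying one
        return math.hypot(cost_gap, sat_gap)

    return min(range(len(front)), key=lambda i: gap(front[i]))

table11 = [  # (total cost, satisfaction) for solutions 1 to 18
    (2540.3157, 0.5304), (2542.5542, 0.5598), (2545.8995, 0.5793),
    (2575.4495, 0.5892), (2577.2428, 0.6087), (2689.5709, 0.6167),
    (2698.5190, 0.6167), (2767.7722, 0.6187), (2769.5655, 0.6382),
    (2791.8821, 0.6481), (2795.5801, 0.6573), (2820.7820, 0.6676),
    (2945.3173, 0.6756), (3140.5397, 0.7050), (3212.9744, 0.7252),
    (3215.1167, 0.7264), (3354.6721, 0.7294), (3354.9169, 0.7344),
]
best = compromise_index(table11) + 1  # 1-based numbering of Table 11
print(best)
```

Applied to the 18 solutions of Table 11, this rule selects solution 12 (total cost 2820.7820, satisfaction 0.6676), while the cheapest and most satisfying extremes are solutions 1 and 18.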

Table 12. High-satisfaction distribution scheme.

Route	Distance (km)	Fixed Dispatch Cost	Variable Transportation Cost	Carbon Emission Cost	Refrigeration Cost	Carbon Emissions
0-4-27-28-1-29-0	218.67	100.00	218.67	170.69	208.71	63.93
0-30-19-31-25-26-0	221.23	100.00	221.23	220.56	182.14	62.82
0-13-21-23-18-22-0	137.95	100.00	137.95	97.04	101.27	36.77
0-6-5-2-12-20-32-33-0	201.89	100.00	201.89	128.53	139.13	56.29
0-24-15-7-8-35-34-0	200.69	100.00	200.69	136.11	145.78	56.33
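The first numeric column of Table 12 agrees with the Euclidean length of each route computed from the coordinates in Tables 5 and 9; for example, the first and third routes measure about 218.67 and 137.95. The sketch below recomputes those two routes and checks their loads against the capacity $Q$ = 5 t, using demands after the Table 8 updates. Reading distances as kilometres is an assumption.

```python
import math
from typing import Dict, List, Tuple

CAPACITY_T = 5.0  # Q in Table 6

# (x, y, demand after updates) for the nodes on two Table 12 routes; node 0 is the depot.
nodes: Dict[int, Tuple[float, float, float]] = {
    0: (40, 50, 0.0),
    1: (25, 85, 0.60), 4: (20, 80, 1.20), 27: (31, 14, 0.60),
    28: (28, 83, 0.60), 29: (20, 79, 1.50),
    13: (5, 35, 1.10), 21: (40, 5, 1.10), 23: (38, 5, 0.90),
    18: (44, 5, 0.60), 22: (40, 15, 1.20),
}

def route_distance(route: List[int]) -> float:
    """Sum of Euclidean leg lengths along a depot-to-depot route."""
    return sum(math.dist(nodes[a][:2], nodes[b][:2]) for a, b in zip(route, route[1:]))

def route_load(route: List[int]) -> float:
    """Total demand delivered on the route."""
    return sum(nodes[n][2] for n in route)

for route in ([0, 4, 27, 28, 1, 29, 0], [0, 13, 21, 23, 18, 22, 0]):
    load = route_load(route)
    print(route, round(route_distance(route), 2), round(load, 2), load <= CAPACITY_T)
```

Both routes stay within capacity, at 4.5 t and 4.9 t respectively.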

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
