Article

A Learning-Enhanced Metaheuristic Algorithm for Multi-Zone Orienteering Problem with Time Windows

1
School of Transportation and Logistics, Southwest Jiaotong University, Chengdu 610031, China
2
School of Economics and Management, Southwest Jiaotong University, Chengdu 610031, China
*
Author to whom correspondence should be addressed.
Mathematics 2025, 13(15), 2357; https://doi.org/10.3390/math13152357
Submission received: 16 June 2025 / Revised: 16 July 2025 / Accepted: 22 July 2025 / Published: 23 July 2025
(This article belongs to the Section E: Applied Mathematics)

Abstract

Inspired by real-world logistics scenarios, this paper introduces a new variant of the Orienteering Problem, the Multi-zone Orienteering Problem with Time Windows (MzOPTW). In the MzOPTW, customers are situated in distinct zones, each with multiple entrances and exits. Each customer has a specific time window, and visiting a customer generates a profit. The problem is to simultaneously determine which zones and customers to visit, select the zonal entrances and exits, and generate the routes for visiting each zone and its customers, all while maximizing total profit within a limited time frame. To tackle the MzOPTW, this paper develops an integer programming model. Significant computational challenges arise from the strong interdependencies among zone selection, customer selection within zones, entrance and exit selection for each zone, the sequence of visits to zones and customers, and arrival and stay times. To address these challenges, this paper proposes a learning-enhanced metaheuristic algorithm called the Hybrid Ant Colony Optimization (HACO) algorithm, which incorporates Pointer Network learning. The HACO algorithm combines the global search capabilities of a population-based algorithm with the parallel decision-making abilities of the Pointer Network learning model. Additionally, a method to optimize zonal stay time limits is proposed to further enhance the solution. Experimental results demonstrate that the HACO algorithm outperforms comparative algorithms, achieving better solutions in 73% of the instances within the same time frame. Furthermore, the proposed optimization method for zonal stay time limits yields improvements in 78% of instances.

1. Introduction

The Orienteering Problem (OP) originated from orienteering competitions, which are typically held in densely forested areas. Participants start from a designated origin node and, using a compass and map, visit as many nodes as possible before reaching a specified destination. Each node has a certain reward and can only be visited once. The objective is to maximize the total reward within a limited time frame [1]. Over the past few decades, the OP has attracted significant attention. Researchers such as Vansteenwegen et al. [2], Gunawan et al. [3], and Kant and Mishra [4] have conducted in-depth studies on the problem and its various variants, solution algorithms, and practical applications. Moreover, the OP can also be modeled as the Maximum Collection Problem [5], the Traveling Salesman Problem With Profits [6], and the Tourist Trip Design Problem [7], among others.
In many practical scenarios of the OP, the visited nodes exhibit clustering characteristics, such as groups of tourist attractions with similar attributes in the Tourist Trip Design Problem or multiple customers within the same community in logistics delivery. Based on the clustering characteristics of the visited nodes, Angelelli et al. [8] introduced the Clustered Orienteering Problem (COP), where each customer belongs to one or more clusters, and the reward for a cluster is obtained only by serving all customers within that cluster. The objective is to maximize the total reward within the given time frame. Similarly, Archetti et al. [9] proposed the Set Orienteering Problem (SOP), where each customer is assigned to exactly one cluster, and the reward for a cluster is obtained when at least one customer from the cluster is served. The optimization objective is similar to that of the COP. These studies consider a single vehicle, while He et al. [10] and Nguyen et al. [11] studied the COP and SOP with multiple vehicles, respectively. In the study of the Traveling Salesman Problem, some scholars have also considered the distribution of clusters. For instance, Laporte et al. [12] and Khan et al. [13] studied the Generalized Traveling Salesman Problem (GTSP), where the objective is to find the minimum-cost Hamiltonian circuit to visit clusters of customers. Similarly, Bernardino and Paias [14] and Chaves et al. [15] investigated the Family Traveling Salesman Problem (FTSP), aiming to visit a predefined number of customers in each cluster at the minimum cost.
However, the research on OP with node clustering characteristics has the following limitations when applied to certain practical scenarios: (1) Lack of consideration of the entry and exit nodes of zones. In real-world scenarios, such as last-mile logistics delivery to community customers, communities (or residential areas) typically have multiple entry and exit nodes. Delivery personnel must select an appropriate entry node to access the zone and serve the customers within the community. For larger zones, different entry-node decisions can affect the travel time to customers, thereby influencing the sequence of customer visits, zone access, and total profits. (2) Inflexibility in the number of customers visited. For example, in the COP, it is assumed that all customers within a cluster must be visited, while in the SOP, only one customer from each cluster is required to be visited. This lack of flexibility in optimizing the number of customers visited within a zone affects the total profits. The characteristics of MzOPTW and related problems are compared in Table 1.
To better align with practical application scenarios, this study introduces a new class of OP—the Multi-zone Orienteering Problem with Time Windows (MzOPTW). Compared to existing research, this problem incorporates the decision-making process for the entry and exit nodes of zones, while relaxing the constraints that all customers within a zone must be visited or only one customer needs to be visited.
As the Orienteering Problem (OP) is a well-known NP-Hard problem [1], the proposed MzOPTW is also NP-Hard and even more complex, as it requires simultaneous optimization of intra-zone and inter-zone routing. These two layers interact dynamically: changes in zone entry and exit nodes, arrival times, and stay times can significantly affect the overall route, making exact algorithms computationally intractable. Heuristic approaches offer scalability but face trade-offs: local search algorithms often suffer from limited exploration, while global metaheuristics such as Ant Colony Optimization (ACO) may require long computational times. Recent advances in deep learning, particularly Pointer Networks (Ptr-Nets) [16], enable efficient and parallelizable solutions to combinatorial optimization problems [17]. To leverage the strengths of both paradigms, this study proposes a Hybrid Ant Colony Optimization (HACO) algorithm that integrates Pointer Networks to enhance solution efficiency and quality. Empirical results demonstrate that the HACO algorithm achieves notable improvements in stability, scalability, and overall performance.
This paper makes the following four key contributions:
  • Problem Formulation: We introduce the Multi-zone Orienteering Problem with Time Windows (MzOPTW), a novel variant of OP that explicitly incorporates zone-level entry and exit node decisions, aligning with real-world logistics and tourism scenarios.
  • Model Flexibility: The MzOPTW supports flexible customer selection within each zone, bridging the gap between the restrictive assumptions of COP (visit all) and SOP (visit one), and enabling more realistic service strategies.
  • Hybrid Algorithm Design: We propose a Hybrid Ant Colony Optimization (HACO) algorithm that integrates Pointer Networks, effectively combining global metaheuristic search with learning-based inference to improve search quality and computational efficiency.
  • Empirical Validation: Extensive numerical experiments verify the effectiveness of the HACO algorithm, showing superior performance over baseline methods in terms of solution quality, robustness, and computational time.

2. Problem Description and Formulation

2.1. Problem Description

The Multi-zone Orienteering Problem with Time Windows (MzOPTW) can be described as follows: A service worker needs to visit customers across multiple zones, each designed with multiple entry and exit nodes. The service worker must pass through these entry and exit nodes to reach the customers. Each customer has a specific time window, and if the service worker arrives before the time window starts, they must wait; if they arrive after the time window ends, the customer cannot be visited. Each customer can only be visited once, and the service worker cannot revisit the same zone. Additionally, considering that external road networks are usually well-developed outside the zones (communities), the distances between zones are calculated using the shortest route algorithm, while the Euclidean distance is used between customers within each zone.
The problem requires simultaneous decisions regarding which zones and customers to visit, which entry and exit nodes to select for each zone, the visiting sequence of zones, and the customer visiting sequence within each zone, with the goal of maximizing the total profit for the day. The scenario of this problem is illustrated in Figure 1.

2.2. Notation

The parameters and variables for the MzOPTW are summarized in Table 2 below.

2.3. Problem Formulation

Based on the above assumptions and definitions, the mathematical model for the MzOPTW is formulated as follows:
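$$\max \sum_{i \in N^2} p_i y_i \tag{1}$$
$$\sum_{j \in N^1 \cup N^3} x_{0,j} = \sum_{i \in N^1 \cup N^3} x_{i,n+1} = 1 \tag{2}$$
$$y_0 = y_{n+1} = 1 \tag{3}$$
$$\sum_{i \in N_0 \cup N^1 \cup N^3 \setminus (N_s^1 \cup N_s^3)} \ \sum_{j \in N_s^1 \cup N_s^3} x_{ij} = \sum_{j \in N_s^1 \cup N_s^3} \ \sum_{i \in N_{n+1} \cup N^1 \cup N^3 \setminus (N_s^1 \cup N_s^3)} x_{ji} \le 1, \quad \forall s \in S \tag{4}$$
$$\sum_{i \in N \cup N_0} x_{ij} = y_j = \sum_{i \in N \cup N_{n+1}} x_{ji}, \quad \forall j \in N \tag{5}$$
$$a_i + L_{ij}^F / v^F - a_j \le M (1 - x_{ij}), \quad \forall i \in N_0 \cup N^1 \cup N^3,\ j \in N_{n+1} \cup N^1 \cup N^3 \tag{6}$$
$$a_i + T_i + L_{ij}^E / v^E - a_j \le M (1 - x_{ij}), \quad \forall i, j \in N_s,\ s \in S \tag{7}$$
$$o_i y_i \le a_i \le c_i y_i, \quad \forall i \in N \cup N_0 \cup N_{n+1} \tag{8}$$
$$\sum_{i, j \in N^4} x_{ij} L_{ij}^F / v^F + \sum_{s \in S} \sum_{i, j \in N_s} x_{ij} L_{ij}^E / v^E + \sum_{i \in N^2} y_i T_i \le T_{max} \tag{9}$$
$$\sum_{s \in S} \sum_{i \in N_s^2} \ \sum_{j \in (N_0 \cup N_{n+1} \cup N) \setminus N_s} x_{ij} = 0 \tag{10}$$
$$\sum_{s \in S} \sum_{i \in N_s^1 \cup N_s^3} \ \sum_{j \in N^2 \setminus N_s^2} x_{ij} = 0 \tag{11}$$
$$\sum_{s \in S} \sum_{i \in N_s^1 \cup N_s^3} x_{ii} = 0 \tag{12}$$
$$\sum_{s \in S} \sum_{i \in N_s^1} \sum_{j \in N_s^3} (x_{ij} + x_{ji}) = 0, \quad j \text{ is the virtual point corresponding to } i \tag{13}$$
$$T_i \ge d_i, \quad \forall i \in N_0 \cup N_{n+1} \cup N \tag{14}$$
$$a_i \ge 0, \quad \forall i \in N_0 \cup N_{n+1} \cup N \tag{15}$$
$$y_i \in \{0, 1\}, \quad \forall i \in N_0 \cup N_{n+1} \cup N \tag{16}$$
$$x_{ij} \in \{0, 1\}, \quad \forall i, j \in N_0 \cup N_{n+1} \cup N \tag{17}$$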
The objective function (1) maximizes the total profit. Constraints (2) and (3) ensure that the service worker's route starts and ends at the same node. Constraint (4) ensures that each zone is visited at most once and guarantees connectivity between zones. Constraint (5) ensures connectivity between zones and customers. As the service worker may enter and exit a zone through the same entry/exit node, a virtual node (virtual entry/exit node) is added for each zone entry/exit node to facilitate modeling. Constraint (6) captures the time dependency between two consecutively visited zones, ensuring that the arrival time is non-decreasing. Constraint (7) defines the temporal relationships among the arrival times at zone entry and exit nodes, as well as between customers within the same zone. To simplify the model, we combine each customer's service time and the possible waiting time before the next zone or customer into a single variable $T_i$, rather than modeling waiting time separately. This reduces model complexity while preserving temporal feasibility. Constraints (6) and (7), respectively, describe the temporal relationships between zones and within zones, corresponding to the blue solid and dashed lines in Figure 2. In the formulation, the constant $M$ is used to relax time-related constraints. In practice, different values are assigned depending on the context: for Constraint (6), $M = T_{max} + \max_{i \in N_0 \cup N^1 \cup N^3,\ j \in N_{n+1} \cup N^1 \cup N^3} \{ L_{ij}^F / v^F \}$ represents an upper bound on inter-zone arrival time, while for Constraint (7), $M = T_{max} + \max_{i, j \in N_s} \{ L_{ij}^F / v^F \}$ reflects intra-zone timing bounds. These values are determined based on the maximum travel and service durations, ensuring feasibility while minimizing unnecessary relaxation. Constraint (8) ensures that the service at each customer begins within the specified time window. The arrival time is bounded by the customer's time window only if the customer is visited; if a customer is not visited, the binary variable forces the arrival time to be zero, thus avoiding unconstrained variables and maintaining model strength. Constraint (9) imposes a global time limit on the service worker's entire route. Constraint (10) ensures that after visiting the customers in a zone, the service worker must pass through the zone's entry/exit node to access other zones. Constraint (11) ensures that the service worker must pass through the zone's entry/exit node before visiting the customers within that zone. Constraints (12) and (13) prevent sub-cycles between the entry/exit nodes of zones. These structural constraints significantly restrict the number of feasible arcs during optimization, reducing redundant or unrealistic transitions without requiring explicit arc pruning. This implicit sparsification improves computational efficiency and ensures that the routing structure aligns with realistic operational logic. Constraints (14) to (17) define the ranges for intermediate and decision variables.

3. Solution Technique

The computational complexity of the MzOPTW is high because both intra- and inter-zone routes must be optimized and because the selection of entry/exit nodes, visit sequence, arrival times, and stay times within each zone are strongly coupled. Therefore, we designed a Hybrid Ant Colony Optimization (HACO) algorithm integrated with Pointer Networks, leveraging the global search advantages of population-based optimization algorithms and the parallel processing capabilities of machine learning networks to improve both solution quality and efficiency. The algorithm includes the following modules: an inter-zone route optimization module; an intra-zone route optimization module; a maximum zone stay time optimization module; and an elite ant re-optimization module. The overall flow is given in Algorithm 1.
Algorithm 1. HACO algorithm flow.
Initialize the number of ants $n_k$ and the pheromones; $\mathit{route} = \varnothing$
While $t < t_{max}$ do
  Use roulette wheel selection for $\theta_d$
  Construct ant routes in parallel
  Optimize the elite ant routes
  If $f(\mathit{route}_k) > f(\mathit{route})$ then
    $\mathit{route} = \mathit{route}_k$, $\omega_d = \omega_d + \varepsilon$
  Update the pheromones
Return $\mathit{route}$

3.1. Inter-Zone Route Optimization

The inter-zone route optimization problem involves deciding which zones to visit, the order in which to visit the zones, and the selection of entry/exit nodes for each zone.

3.1.1. Zone Selection

After the ant $k$ visits the customers within a zone $s$, it exits zone $s$ from one of the entry/exit nodes $i$ ($i \in N_s^1 \cup N_s^3$) and selects the next zone to visit. The set of accessible entry/exit nodes for the next zone can be described as
$$\mathit{candidate}_k^1(i) = \{\, j \mid j \in N_\gamma^1 \cup N_\gamma^3,\ \gamma \in \mathit{unvisitZone},\ t_{ki} + L_{ij}^F / v^F + L_{j,n+1}^F / v^F \le T_{max} \,\}$$
where $\gamma$ represents the zones that the current ant $k$ has not yet visited, and $t_{ki}$ represents the time at which the ant arrives at node $i$. Furthermore, when the ant enters the zone through the entry/exit node $i$ and chooses to exit through node $j$ of the same zone, the set of accessible entry/exit nodes $\mathit{candidate}_k^2(i)$ can be described as follows:
$$\mathit{candidate}_k^2(i) = \{\, j \mid j \in N_s^1 \cup N_s^3,\ t_{ki} + \theta_d + L_{j,n+1}^F / v^F \le T_{max},\ L_{ij}^E / v^E \le \theta_d \,\}$$
where $\theta_d$ represents the maximum stay time of the ant in zone $s$.
When the ant selects the next zone to visit, in order to balance exploration and exploitation, the ant located at node $i$ generates a random number $q$ that follows a uniform distribution $U(0, 1)$. If $q \le q_0$, then the ant will greedily select the next node $j$ as follows:
$$j = \begin{cases} \arg\max_{j \in \mathit{candidate}_k^1(i)} \{ \tau_{ij}^{\alpha} (\eta_{ij}^1)^{\beta} \}, & \text{select the next zone entry} \\ \arg\max_{j \in \mathit{candidate}_k^2(i)} \{ \tau_{ij}^{\alpha} (\eta_{ij}^2)^{\beta} \}, & \text{select the exit of the current zone} \end{cases}$$
where $q_0$ is a given parameter, $\tau_{ij}$ represents the pheromone value between node $i$ and node $j$, and $\eta_{ij}^1$ and $\eta_{ij}^2$ are the heuristic information ($\eta_{ij}^1 = 1 / L_{ij}^F$, $\eta_{ij}^2 = 1 / L_{j,n+1}^F$). $\alpha$ and $\beta$ are the pheromone influence factor and the heuristic information influence factor, respectively, indicating the relative importance of pheromone and heuristic information.
If $q > q_0$, the next node is selected using the roulette wheel method, which expands the ant's search space. Node $j$ is chosen with probability
$$\phi_{ij}^k = \begin{cases} \dfrac{\tau_{ij}^{\alpha} (\eta_{ij}^1)^{\beta}}{\sum_{l \in \mathit{candidate}_k^1(i)} \tau_{il}^{\alpha} (\eta_{il}^1)^{\beta}}, & \text{select the next zone entry} \\[2ex] \dfrac{\tau_{ij}^{\alpha} (\eta_{ij}^2)^{\beta}}{\sum_{l \in \mathit{candidate}_k^2(i)} \tau_{il}^{\alpha} (\eta_{il}^2)^{\beta}}, & \text{select the exit of the current zone} \end{cases}$$
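The selection rule above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `select_next_node`, the dictionary-of-dictionaries layout for the pheromone and heuristic values, and the optional `rng` argument are assumptions introduced here for illustration. The candidate list is assumed to have been filtered already by the time feasibility conditions, and all pheromone and heuristic values are assumed to be strictly positive.

```python
import random


def select_next_node(i, candidates, tau, eta, alpha, beta, q0, rng=random):
    """Choose the next node for an ant located at node i.

    tau[i][j] and eta[i][j] hold pheromone and heuristic values
    (hypothetical data layout). Returns None if no candidate is feasible.
    """
    if not candidates:
        return None
    # Attractiveness tau_ij^alpha * eta_ij^beta of every feasible candidate.
    scores = [(tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in candidates]
    if rng.random() <= q0:
        # Exploitation: greedy choice of the most attractive candidate.
        return candidates[scores.index(max(scores))]
    # Exploration: roulette wheel, probability proportional to the score.
    return rng.choices(candidates, weights=scores, k=1)[0]
```

The same function would serve both cases of the rule: only the candidate set and the heuristic table passed in differ between choosing the next zone entry and choosing the exit of the current zone.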

3.1.2. Pheromone Update

After each iteration, local pheromone updates are performed on the route:
$$\tau_{ij} := (1 - \rho)\, \tau_{ij} + \rho\, \tau_0, \quad \forall i, j \in N^4$$
where $\tau_{ij}$ ($i, j \in N^4$) represents the pheromone value on the route between node $i$ and node $j$, $\rho$ represents the pheromone evaporation factor, and $\tau_0$ represents the initial pheromone value on each route. Local updates allow the pheromone values on the route to decay toward $\tau_0$, thereby increasing the diversity of the search and enhancing the ants' global search capability. After that, the global pheromone is updated based on the route taken by the ant with the highest profit, as follows:
$$\tau_{ij} := \begin{cases} (1 - \rho)\, \tau_{ij} + \rho\, \tau_1, & (i, j) \in \mathit{route} \\ (1 - \rho)\, \tau_{ij}, & \text{otherwise} \end{cases}$$
where $\tau_1$ represents the pheromone value released by the ant with the highest profit, and $\mathit{route}$ represents the route taken by that ant. Global updates not only help prevent pheromone overaccumulation, which can overwhelm the heuristic information, but also contribute to improving the convergence speed of the algorithm.
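As a rough illustration of the two update rules, the following Python sketch applies them to a pheromone table stored as a dictionary of dictionaries. The function names, the data layout, and the representation of the best route as a list of node identifiers are assumptions made here, and the local update is applied to every arc stored in the table.

```python
def local_update(tau, rho, tau0):
    """Pull every stored pheromone value toward the initial value tau0."""
    for i in tau:
        for j in tau[i]:
            tau[i][j] = (1 - rho) * tau[i][j] + rho * tau0


def global_update(tau, rho, tau1, best_route):
    """Evaporate all arcs; deposit rho * tau1 only on arcs of the best route.

    best_route is a list of node ids in visiting order (hypothetical format).
    """
    best_arcs = set(zip(best_route, best_route[1:]))
    for i in tau:
        for j in tau[i]:
            deposit = rho * tau1 if (i, j) in best_arcs else 0.0
            tau[i][j] = (1 - rho) * tau[i][j] + deposit
```

With $\rho = 0.15$ and $\tau_0 = 1$, an arc holding a pheromone value of 2 moves to $0.85 \times 2 + 0.15 = 1.85$ after one local update.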

3.2. Intra-Zone Route Optimization

Intra-zone route optimization involves deciding which customer nodes within a zone to visit and the order of visits. To efficiently process the access routes of multiple ants within a zone in parallel and improve the solution efficiency, this study designs a Pointer Network learning model to solve the intra-zone route problem. The Pointer Network, first proposed by Vinyals et al. [16], is a learning model that extracts features from discrete, variable-length input sequences and outputs pointers that probabilistically point to elements of the input sequence.

3.2.1. Markov Decision Process Construction

First, the route-solving process within zone s is modeled as a Markov Decision Process (MDP) with discrete stages and deterministic state transitions:
(1) The next node to visit is chosen at each decision stage $g = 0, 1, 2, \ldots, G$ with $G < |N_s^2|$, where the value of $G$ is not known in advance. The process ends when the node reached is the exit of the zone.
(2) The state is $S_g = (\tau_g, r_g, V_g)$, where $\tau_g$ is the time at stage $g$ and $r_g$ is the node reached at $\tau_g$; when $g = 0$, $\tau_0 = t_{s_{in}}$ and $r_0 = s_{in}$. $V_g$ is the set of nodes that can be visited next at $\tau_g$, where each node $v_i$ ($v_i \in V_g$) must satisfy the following conditions:
  (i) $\tau_g + L_{r_g, v_i}^E / v^E + \varpi_{v_i} \le c_{v_i}$, with $\varpi_{v_i} = \max\{ 0,\ o_{v_i} - (\tau_g + L_{r_g, v_i}^E / v^E) \}$
  (ii) $\tau_g + L_{r_g, v_i}^E / v^E + \varpi_{v_i} + d_{v_i} + L_{v_i, s_{out}}^E / v^E \le \tau_0 + \pi_s$
  (iii) $\{v_i\} \cap \bigcup_{j=0}^{g} \{r_j\} = \varnothing$
Here, condition (i) is the time window constraint for visiting node $v_i$, where $\varpi_{v_i}$ represents the waiting time at node $v_i$; condition (ii) is that the stay time in zone $s$ cannot exceed the maximum zone stay time $\pi_s$ (as described in Section 3.3); and condition (iii) ensures that each node can only be visited once.
(3) The action set is $x(S_g) = a_g$, where $a_g$ represents the action of choosing the next node to visit at $\tau_g$ ($a_g \in V_g$).
(4) After visiting node $a_g$, the state transitions to $S_{g+1} = (\tau_{g+1}, r_{g+1}, V_{g+1})$, where $\tau_{g+1} = \tau_g + L_{r_g, a_g}^E / v^E + \varpi_{a_g} + d_{a_g}$, $r_{g+1} = a_g$, and $V_g$ is updated to $V_{g+1}$ according to the conditions above.
(5) $R(S_g, a_g) = p_{a_g}$ is the reward for selecting action $a_g$ at $\tau_g$.
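The feasibility conditions and the state transition of this decision process can be sketched in Python as follows. This is an illustrative sketch only: the function names, the per-customer dictionary with keys `xy`, `o`, `c`, `d`, and `p` (coordinates, window opening, window closing, service duration, profit), and the use of plain coordinates for the current position and the zone exit are assumptions introduced here. Intra-zone distances are Euclidean, as stated in the problem description.

```python
import math


def dist(a, b):
    """Euclidean distance between two coordinate pairs."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def feasible_nodes(tau_g, pos, visited, tau_0, pi_s, customers, exit_xy, v_e):
    """Return the customers that may be visited next from position pos."""
    feasible = []
    for cid, c in customers.items():
        if cid in visited:  # each node is visited at most once
            continue
        arrive = tau_g + dist(pos, c["xy"]) / v_e
        wait = max(0.0, c["o"] - arrive)  # wait if arriving before opening
        if arrive + wait > c["c"]:  # time window already closed
            continue
        leave = arrive + wait + c["d"]
        # The ant must still reach the exit within the zone stay limit.
        if leave + dist(c["xy"], exit_xy) / v_e > tau_0 + pi_s:
            continue
        feasible.append(cid)
    return feasible


def step(tau_g, pos, visited, cid, customers, v_e):
    """Apply action cid; return (next time, next position, visited, reward)."""
    c = customers[cid]
    arrive = tau_g + dist(pos, c["xy"]) / v_e
    wait = max(0.0, c["o"] - arrive)
    return arrive + wait + c["d"], c["xy"], visited | {cid}, c["p"]
```

Starting from the zone entrance with `visited = set()`, alternating `feasible_nodes` and `step` until the feasible set is empty traces one episode, after which the ant proceeds to the exit.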

3.2.2. Pointer Network

When solving the intra-zone route using the Pointer Network learning model, the goal is to maximize the total reward. The structure of the Pointer Network is shown in Figure 3.
The input features of the Pointer Network learning model for solving the intra-zone route include both static features $f_s$ and dynamic features $f_{d,g}$. The static features consist of the coordinates of the nodes, service duration, time windows, rewards, and the stay time in the zone. The dynamic features are related to the current time $\tau_g$ at stage $g$, and include the difference between the current time $\tau_g$ and the time window of each node, the time $\tau_0$ at stage $g = 0$, and the stay time limit for reaching the zone's endpoint.
In addition, this study introduces the foresight dynamic features for each node [18] (such as the difference between the arrival time at the node and its corresponding time window). At the beginning of each stage $g$, the static and dynamic features of the nodes within the zone are encoded into a vector $h_{e,g}$ using a Transformer [19]. The Transformer-encoded vector from the previous stage, $h_{e,g-1}$, along with the cell state $c_{g-1}$ of the Long Short-Term Memory (LSTM) network and the previous output vector $h_{d,g-1}$, is then fed into the LSTM to guide the decision process. The output of the LSTM at this stage, $h_{d,g}$, and the encoded vector $h_{e,g}$ from the Transformer are passed into the Attention module to generate the pointer vector $u_{a_g}^g$:
$$u_{a_g}^g = C \cdot \tanh\big( W_1 \tanh( W_2 h_{e,g} + W_3 h_{d,g} ) \big), \quad a_g \in V_g$$
$$p(a_g \mid S_g) = \mathrm{softmax}(u_{a_g}^g), \quad a_g \in V_g$$
where $W_1$, $W_2$, and $W_3$ are the learnable parameters of the model, and $C$ is a hyperparameter [20]. Finally, the pointer vector $u_{a_g}^g$ is normalized into an output distribution over the nodes, and a search strategy is applied to select the next node to visit. Since the static features of the nodes remain unchanged from stage to stage while the dynamic features evolve over time, the Transformer re-encodes the dynamic feature vector at each stage to update the node features based on the current state. During the training process of the Pointer Network, the REINFORCE algorithm [21] is used to estimate the gradient. To improve the solution efficiency and ensure the quality of the solution, this study employs two types of search strategies, greedy search and beam search, which are used at different stages of the algorithm.
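To make the attention step concrete, the following dependency-free Python sketch computes clipped pointer scores and a softmax restricted to the feasible nodes. It is a simplified stand-in rather than the trained model: the function names and data layout are hypothetical, $W_1$ is treated as a vector so that each node receives a scalar score, and the encoder and decoder vectors are supplied directly instead of being produced by a Transformer and an LSTM.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def pointer_probs(h_enc, h_dec, feasible, w1, W2, W3, C=10.0):
    """Return {node: probability} over the feasible nodes only.

    h_enc maps each node to its encoder vector; h_dec is the decoder output.
    """
    if not feasible:
        return {}
    dec_part = matvec(W3, h_dec)
    logits = {}
    for node in feasible:
        enc_part = matvec(W2, h_enc[node])
        hidden = [math.tanh(a + b) for a, b in zip(enc_part, dec_part)]
        # The outer tanh scaled by C bounds every score to [-C, C].
        logits[node] = C * math.tanh(sum(w * h for w, h in zip(w1, hidden)))
    top = max(logits.values())  # subtract the maximum for numerical stability
    exps = {n: math.exp(u - top) for n, u in logits.items()}
    total = sum(exps.values())
    return {n: e / total for n, e in exps.items()}
```

Greedy search would take the node with the largest probability at each stage, whereas beam search would keep several partial routes ranked by their accumulated log-probability.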

3.3. Maximum Zone Stay Time

The stay time of an ant in each zone affects the number of customers selected within the zone and the sequence in which customers are visited. This indirectly influences the selection and visit order of the zones, which in turn impacts the total profit. If visiting certain zones yields high profits, setting a longer stay time can help in visiting more high-profit customers. Therefore, this study proposes setting a maximum stay time for each zone to flexibly adjust the selection of customers with different profit values, thereby increasing the total profit.
(1) Enumeration Strategy
This strategy is used at the beginning of each iteration in the HACO algorithm. It traverses a set of initial values for the maximum zone stay times and estimates the benefit value for each zone. This provides a basis for redistributing the maximum zone stay times during the subsequent elite ant re-optimization phase (see Section 3.4). A set of initial maximum zone stay times, denoted as $\Theta = \{\theta_1, \theta_2, \ldots, \theta_D\}$, is given based on factors such as the number of zones and the total time limit. The stay time for each zone in the set is sequentially assigned as the stay time limit for each ant during its visits to the zones. Afterward, both inter-zone and intra-zone route problems are solved. To obtain sufficient zonal benefit information, a random selection strategy is applied when selecting the next zone to visit. The Pointer Network model, based on a greedy search strategy, is used to process all ants' intra-zone routes in batches. After obtaining the zone and customer visit route solutions, the benefit value for each zone is estimated as $\delta_s = \mathit{score}_s / \mathit{sum}_s$, $\forall s \in S$, where $\mathit{score}_s$ and $\mathit{sum}_s$ represent the total profit and total visit count, respectively. The larger $\delta_s$ is, the greater the potential benefit of zone $s$.
(2) Roulette Wheel Strategy
This strategy is used to determine the maximum stay time for each zone for all ants during iterations of the HACO algorithm. For each element in the set $\Theta = \{\theta_1, \theta_2, \ldots, \theta_D\}$, an initial weight coefficient $\omega_d$ ($d = 1, 2, \ldots, D$) is set. The zone's stay time limit is then selected from $\Theta$ using the roulette wheel method, and this stay time limit $\theta_d$ is assigned to each ant for its visit to the corresponding zone during the current iteration. At the end of each iteration, if the best solution found in the current iteration is better than the historical best solution, the weight coefficient $\omega_d$ is increased by $\varepsilon$. Otherwise, the weight remains unchanged.
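The roulette wheel strategy amounts to weighted sampling with a reward for choices that lead to an improvement. A minimal Python sketch follows; the function names are hypothetical, and the weights passed in are placeholders because the initial weight value is not given in the text.

```python
import random


def pick_stay_limit(theta, weights, rng=random):
    """Sample an index d and stay limit theta[d] with probability ~ weights[d]."""
    d = rng.choices(range(len(theta)), weights=weights, k=1)[0]
    return d, theta[d]


def reward_choice(weights, d, improved, eps=0.2):
    """Increase the weight of option d only if the iteration improved the best solution."""
    if improved:
        weights[d] += eps
```

In use, one iteration would call `pick_stay_limit` once, run the ants with the returned limit, and then call `reward_choice` with the outcome, so that limits that have produced improvements are sampled more often later.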

3.4. Elite Ant Re-Optimization

In each iteration, after all ants generate their route solutions, only a set number of elite ants $n_{elite}$ are selected for re-optimization. This strategy consists of two stages. Stage 1: reassign the maximum zone stay time limits for the elite ants and optimize the intra-zone routes. Stage 2: for the elite ant with the highest profit, use the zone time fine-tuning algorithm to further optimize the intra-zone route.
Stage 1: If a zone has a high benefit value, allocating a longer stay time will help in visiting more high-profit customers, thus increasing the total profit. In this stage, the stay time limits for the zones traversed by the elite ants are reallocated based on the benefit value of each zone.
First, the total allocable time for the elite ant is calculated, denoted as $\pi_k$, which is the sum of the actual stay time in the zones traversed by the ant $k$ and the time difference when arriving at the destination ahead of schedule. This is calculated as follows:
$$\pi_k = \sum_{s \in S_k} \left( t_{k s_{out}} - t_{k s_{in}} \right) + \left( T_{max} - t_{k,n+1} \right)$$
where $S_k$ is the set of zones traversed by the elite ant $k$, $t_{k s_{in}}$ and $t_{k s_{out}}$ are the times at which the ant $k$ arrives at and leaves zone $s$, and $t_{k,n+1}$ is the time at which the ant reaches the destination. After obtaining $\pi_k$, the maximum stay time for each zone $s$ traversed by the ant is reallocated as follows:
$$\pi_k^s = \frac{\delta_s}{\sum_{r \in S_k} \delta_r}\, \pi_k, \quad \forall s \in S_k$$
Under the constraint of $\pi_k^s$, the Pointer Network model based on the beam search strategy is used to generate the route of ant $k$. Then, the route is further optimized using three local operators: Insert [22], Swap, and Replace [23].
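The Stage 1 reallocation is a proportional split of the allocable time by zone benefit values. The Python sketch below illustrates the arithmetic only; the function names and the dictionary inputs are assumptions introduced here, and the benefit values are assumed to sum to a positive number.

```python
def total_allocable_time(zone_times, t_max, t_dest):
    """Sum of actual zone stays plus the slack left at the destination.

    zone_times maps each visited zone to its (arrival, departure) times.
    """
    stays = sum(t_out - t_in for t_in, t_out in zone_times.values())
    return stays + (t_max - t_dest)


def reallocate(delta, zones, pi_k):
    """Split pi_k across the visited zones in proportion to their benefit values."""
    total = sum(delta[s] for s in zones)
    return {s: delta[s] / total * pi_k for s in zones}
```

For example, an ant that stays 50 and 80 time units in two zones and reaches the destination 40 units early has 170 units to redistribute; with benefit values 1 and 3 the new limits are 42.5 and 127.5.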
Stage 2: To improve the solution quality, a zone stay time fine-tuning algorithm is designed in this study. It further optimizes the maximum zone stay times for the elite ant with the highest profit. The framework is outlined in Algorithm 2 (see Appendix A for details):
Algorithm 2. Flow of the zone stay time fine-tuning algorithm.
Optimize both the intra-zone and inter-zone routes
Input: $\mathit{route}_k$, $S_k$, $\Delta t$, $\kappa$
Output: $\mathit{route}_k'$
$\mathit{route}_k' \leftarrow \mathit{route}_k$, $\varphi \leftarrow 0$, $s_1 \leftarrow S_k[0]$, $s_2 \leftarrow S_k[1]$
While $s_1 \ne S_k[\mathit{len}(S_k) - 1]$ do
  $\mathit{Imp} \leftarrow \mathit{False}$, $\mathit{toLeft} \leftarrow \mathit{True}$, $\mathit{Iters} \leftarrow 1$, calculate $\Pi_k^{s_1}$ and $\Pi_k^{s_2}$
  While $\mathit{True}$ do
    If $\mathit{toLeft} = \mathit{True}$ then
      $\pi_k^{s_1} \leftarrow \Pi_k^{s_1} + \Delta t$, $\pi_k^{s_2} \leftarrow \Pi_k^{s_2} - \Delta t$,
      $\mathit{checkContinue}(\pi_k^{s_2}, \mathit{Iters})$
    Else
      $\pi_k^{s_1} \leftarrow \Pi_k^{s_1} - \Delta t$, $\pi_k^{s_2} \leftarrow \Pi_k^{s_2} + \Delta t$, $\mathit{checkContinue}(\pi_k^{s_1})$
    End If
    Use the Pointer Network model based on the beam search strategy and local search algorithms to optimize zones $s_1$ and $s_2$.
    If the current solution is improved then
      $\mathit{route}_k'[s_1] \leftarrow \mathit{solution}_{s_1}$, $\mathit{route}_k'[s_2] \leftarrow \mathit{solution}_{s_2}$
      $\mathit{Imp} \leftarrow \mathit{True}$
    Else
      $\mathit{checkContinue}(\pi_k^{s_2}, \mathit{toLeft})$
    End If
    $\mathit{Iters} \leftarrow \mathit{Iters} + 1$
  End While
  If $\mathit{Imp} = \mathit{True}$ then
    $\varphi \leftarrow 0$, $s_1 \leftarrow S_k[0]$, $s_2 \leftarrow S_k[1]$
  Else
    $\varphi \leftarrow \varphi + 1$, $s_1 \leftarrow S_k[\varphi]$, $s_2 \leftarrow S_k[\varphi + 1]$
  End If
End While
The algorithm optimizes the stay time of adjacent zones sequentially, combining the Pointer Network model and local search algorithms, continuously adjusting the intra-zone routes to find a better overall route solution.

4. Numerical Experiments

In this study, the proposed algorithm is implemented using Python 3.9. To verify its effectiveness, a series of numerical experiments are conducted, including tests to evaluate the overall performance of the algorithm as well as the effectiveness of the related strategies.

4.1. Instance Description

Due to the lack of standard test instances for this problem, this study follows the VRPTW instance generation method [24] and the road network generation approach from another study [25], and randomly generates test instances of three different sizes, with two, four, and six zones, respectively. Each size includes 20 test instances. The instance generation method and parameter settings are as follows:
First, a road network is generated on a 6 × 6 two-dimensional plane, consisting of 36 square grids with a side length of 10. The start and end nodes are located at the center $(30, 30)$. A certain number of grids are randomly selected, and a specified number of points $k_1$ are randomly generated within each selected grid in the two-dimensional region $[1, 9]^2$. The convex hull of these points is calculated, and the points in the convex hull are connected in sequence to form the zone's boundary. Then, three random points are selected from the convex hull and connected to the nearest main road, and $k_2$ points are chosen as the entry/exit nodes of the zone ($k_2 \sim U(3, 5)$). Points not included in the convex hull are considered customers within the zone. Figure 4 illustrates an example with four zones, where the service worker's start and end points are at the center. The black lines represent the main road routes, the green dashed lines represent the zone boundaries, the red stars indicate the zone's entry/exit nodes, and the blue dots represent the customers within the zone. The values for $v^E$ and $v^F$ are set to 1.6 and 4, respectively. The working time is set to $[0, 540]$, and the customers' profits and service durations are uniformly distributed random integers within the ranges $[50, 200]$ and $[15, 30]$, respectively. The customers' service time windows are defined by the time window center and width, which are uniformly distributed as random integers within the specified ranges $U(o_0 + t_{0i},\ c_0 - t_{i0} - d_i)$ and …, respectively.
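The geometric part of this generation procedure (sampling points in a grid cell, taking the convex hull as the zone boundary, and treating the remaining points as customers) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' generator: the function names are hypothetical, `origin` is taken to be the lower-left corner of the selected grid cell, Andrew's monotone chain algorithm is used for the hull, and the road connection and entry/exit selection steps are omitted.

```python
import random


def cross(o, a, b):
    """Cross product of vectors OA and OB; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def make_zone(k1, origin, rng):
    """Sample k1 points in the cell's [1, 9]^2 region; split into boundary and customers."""
    pts = [(origin[0] + rng.uniform(1, 9), origin[1] + rng.uniform(1, 9))
           for _ in range(k1)]
    hull = convex_hull(pts)
    hull_set = set(hull)
    customers = [p for p in pts if p not in hull_set]
    return hull, customers
```

Calling `make_zone(10, (20, 30), random.Random(1))` would produce one zone inside the grid cell whose lower-left corner is $(20, 30)$, with the hull vertices as the boundary and the interior points as customers.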

4.2. Parameters and Pointer Network Training Process

In the HACO algorithm, the number of ants n k is set to 100, and the number of elite ants n e l i t e is set to 5. The set of maximum zone stay time limits is denoted as Θ = 50 , 75 , 100 , 150 , 200 , 400 , with an initial value q 0 of 0.6. If the historical best solution has not improved after three iterations, the value of q 0 is reduced to 0.2 to increase the search space for the ants. The parameters are set as follows: τ 0 = 1, τ 1 = 1.2, α = 2.2, β = 1.2, ρ = 0.15, κ = 30, Δ t = 20, n b = 128, C = 10, ε = 0.2, where the initial values of q 0 , τ 1 , α , β , and ρ are determined through IRACE [26], which performs iterative racing procedures to identify high-performing parameter combinations based on experimental performance. The remaining parameters are set based on instance characteristics and preliminary tuning experiments.
In the training process of the Pointer Network, a global model is trained for each of the three different sizes of test instances. For each zone's customers, the geographical locations remain fixed, but the time windows, service durations, and profits are generated randomly. Therefore, the variable feature data are sampled during the training process. In each training iteration for each size, a zone and its entry/exit positions are randomly selected, and a batch of samples is generated. The time range for entering and exiting the zone is between 0 and 540. Training for a given size is conducted for 300,000 iterations, with a batch size of 32. The Transformer encoder in the Pointer Network consists of two neural modules, each with an 8-head multi-head attention layer. The vector dimensions for both the encoder and decoder are set to 128, and the feed-forward layer dimension is set to 256. Each model is trained offline within a few hours on a standard GPU, and the resulting models are reused across all corresponding test instances. This eliminates the need for retraining and ensures that the training cost is incurred only once per problem size.
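The training setup above can be collected in a small configuration object, sketched below. The class and field names are hypothetical. The per-head dimension assumes the usual multi-head attention convention of splitting the model dimension evenly across heads, which the text does not state explicitly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PtrNetTrainingConfig:
    """Hyperparameters reported in the text; the field names are illustrative."""
    encoder_modules: int = 2
    attention_heads: int = 8
    model_dim: int = 128          # encoder and decoder vector dimension
    feed_forward_dim: int = 256
    batch_size: int = 32
    iterations: int = 300_000
    time_horizon: tuple = (0, 540)  # range of zone entry/exit times

    @property
    def head_dim(self) -> int:
        # Assumption: the model dimension is split evenly across attention heads.
        if self.model_dim % self.attention_heads != 0:
            raise ValueError("model_dim must be divisible by attention_heads")
        return self.model_dim // self.attention_heads

    @property
    def total_samples(self) -> int:
        # Number of sampled training instances seen over the whole run.
        return self.batch_size * self.iterations

config = PtrNetTrainingConfig()
print(config.head_dim, config.total_samples)
```

Under these settings each model sees 32 × 300,000 = 9,600,000 sampled instances per problem size.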

4.3. Performance of the HACO

In this study, the Gurobi solver is used to obtain exact solutions for the three different sizes of test instances, with a time limit of 7200 s. If the solution time exceeds the time limit, the current best integer solution is output. Additionally, since no existing algorithm is directly comparable, and considering that the Variable Neighborhood Search (VNS) algorithm proposed in another study [23] can solve problems such as the TDOPTW by fixing the service time, this study replaces the Pointer Network in the HACO algorithm with VNS to solve the lower-level route problem, which yields the comparison algorithm ACO-VNS. Moreover, as iterated local search (ILS) [22] has been shown to effectively balance solution quality and computation time, it is also chosen as an alternative to the Pointer Network: replacing the Pointer Network in the HACO algorithm with ILS to solve the lower-level route problem yields the comparison algorithm ACO-ILS. Because the MzOPTW requires solving the OPTW many times, which leads to long computation times, VNS uses only the operators from its Initialization phase in the zone benefit estimation and general ant route generation phases, and ILS uses only the Insert operator. In the elite ant re-optimization phase, both VNS and ILS use their full algorithm frameworks to solve the OPTW.
The running time limit for each algorithm on the three different sizes of test instances is set to 30 s, 45 s, and 60 s, respectively, and each algorithm is run 10 times. These time limits are kept consistent across all algorithms to ensure a fair comparison. The overall experimental results are shown in Figure 5 (see Appendix B for details), where the vertical axis represents the average objective function value (optimal value) of each algorithm on the different-size test instances.
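The protocol above (a fixed time limit per instance size, 10 runs per algorithm, best and average objective values reported) can be sketched as a small harness. The solver interface `solver(instance, time_limit, seed)` and the toy solver are hypothetical placeholders, and the mapping of the 30 s, 45 s, and 60 s limits to instances with 2, 4, and 6 zones is inferred from the instance sizes reported in the tables.

```python
from statistics import mean

# Time limits in seconds, keyed by the number of zones in the instance.
TIME_LIMITS = {2: 30, 4: 45, 6: 60}

def run_trials(solver, instance, num_zones, runs=10):
    """Run `solver` several times under the size-specific time limit and
    return the best and the average objective value."""
    limit = TIME_LIMITS[num_zones]
    values = [solver(instance, time_limit=limit, seed=run) for run in range(runs)]
    return max(values), mean(values)

def toy_solver(instance, time_limit, seed):
    """Placeholder standing in for HACO, ACO-ILS, or ACO-VNS (hypothetical)."""
    return 100 + seed

best, avg = run_trials(toy_solver, instance=None, num_zones=4)
print(best, avg)
```

Using the same `TIME_LIMITS` table for every algorithm is what keeps the comparison fair in the sense described above.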
Due to the need to solve multiple OPTWs for different zones and optimize the stay time in each zone, the computational complexity of the MzOPTW is relatively high. Within the 7200 s time limit, Gurobi was unable to find the optimal solution for all test instances. However, in the 20 test instances of each size, the HACO algorithm outperformed Gurobi in 18, 19, and 17 instances, with the maximum improvements in solution quality reaching 20.17%, 22.12%, and 20.82%, and the average improvements being 5.46%, 7.95%, and 8.80%, respectively. This shows that the HACO algorithm outperforms Gurobi in overall performance, and its performance becomes more stable as the size of the problem increases.
Compared to ACO-ILS, the HACO algorithm performed similarly in small-scale test instances and outperformed ACO-ILS in 15 test instances, with a maximum improvement of 2.93% and an average improvement of 0.73%. In the other two scales, the HACO algorithm outperformed ACO-ILS in 15 and 16 test instances, with maximum improvements of 4.91% and 4.15% and average improvements of 1.34% and 1.31%, respectively. When compared to ACO-VNS, the HACO algorithm outperformed ACO-VNS in 11, 16, and 14 test instances for the three scales, with the maximum improvements reaching 2.93%, 5.33%, and 3.49%, and the average improvements being 0.52%, 1.01%, and 0.93%, respectively. It should be noted that while there are isolated cases where the HACO algorithm performs slightly worse than ACO-ILS and ACO-VNS, the algorithm consistently shows superior performance in terms of average objective values across multiple runs, demonstrating its overall robustness and effectiveness.
Furthermore, across the 60 test instances of all three sizes, the HACO algorithm obtained the best solution among the compared methods in 44 instances, accounting for 73% of the total test instances. In terms of the average objective value over the 10 runs, the HACO algorithm outperformed ACO-ILS and ACO-VNS in 88% and 97% of the test instances, respectively. Notably, these results were obtained within strict computational time limits (30 s for small, 45 s for medium, and 60 s for large instances), demonstrating that the integration of the Pointer Network learning model enhances not only solution quality but also computational efficiency. Compared to the baseline heuristics, the HACO algorithm consistently reached better solutions within comparable time limits, highlighting its practical potential in time-constrained decision environments.

4.4. Performance of the Elite Ant Re-Optimization Strategy

To verify the effectiveness of the elite ant re-optimization strategy, this study removes the elite ant re-optimization from the HACO algorithm (denoted as HACO1) and removes only the second stage (the zone time fine-tuning algorithm) from the HACO algorithm (denoted as HACO2). These two variants are then compared with the original HACO algorithm. The running time limit for each test instance remains unchanged for all three sizes, each test instance is run 10 times, and the best value is selected. The experimental results are shown in Table 3, where Avg, Max, and Min represent the average, maximum, and minimum values, respectively, over the 20 test instances of each size.
In Table 3, HACO outperforms HACO1 in every test instance. For the three different sizes, the maximum improvements are 32.14%, 24.55%, and 20.47%; the minimum improvements are 11.41%, 11.16%, and 6.85%; and the average improvements are 25.19%, 19.13%, and 12.46%, respectively. This demonstrates that the elite ant re-optimization strategy significantly enhances the solution quality.
Furthermore, HACO, which incorporates the zone time fine-tuning algorithm, outperforms HACO2 in the majority of the test instances. For the three scales, the maximum improvements are 4.81%, 7.8%, and 5.96%, with average improvements of 1.18%, 2.37%, and 2.39%, respectively. The proportion of test instances with improvements is 60%, 85%, and 90% (see Appendix C). This indicates that the zone time fine-tuning algorithm further optimizes the solution.

5. Conclusions

This study proposes a new variant of the OP, the MzOPTW. In this problem, customers are located in different zones, and service personnel can only access the customers within a zone by entering through specific positions of that zone. The optimization objective is to develop the optimal service route within a limited time, serving the selected customers within their designated time windows, in order to maximize the total profit. The contributions are as follows: (1) A mathematical model for the problem is constructed, and a population-based optimization algorithm, HACO, which integrates the Pointer Network learning model, is designed. (2) An algorithm is developed to effectively optimize the zone stay times. (3) Compared to the exact solver Gurobi and heuristic algorithms such as ACO-ILS and ACO-VNS, the proposed HACO algorithm framework achieves better and more stable results. Additionally, the elite ant re-optimization strategy significantly improves solution quality.
The proposed model and algorithm can be applied to scenarios such as the Tourist Trip Design Problem, where zones represent scenic areas and customers correspond to points of interest. Future work may explore the Team MzOPTW with multiple service personnel and develop more efficient algorithms. In addition, future work could extend the proposed approach to dynamic and online settings, where routing instances evolve over time and require real-time decision-making. Incorporating principles from online combinatorial optimization, such as instance evaluation, prioritization, and adaptive resource allocation, may enhance the responsiveness of the framework in practical applications [27].

Author Contributions

Conceptualization, H.L.; Methodology, H.L., Y.C. and Y.J.; Validation, Y.L.; Formal analysis, Y.L.; Writing—original draft, H.L.; Writing—review & editing, Y.C. and Y.J.; Supervision, Y.J.; Project administration, Y.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant no. 72371206).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Algorithm for Optimizing Intra-Zone and Inter-Zone Routes Based on Fine-Tuning Zone Stay Times

First, initialize the ant's route $route_k$ and select the first two zones $s_1$ and $s_2$ that the ant will visit (line 1); line 2 gives the termination condition. In line 3, initialize the relevant parameters, where $toLeft = \text{True}$ indicates that time is being allocated to zone $s_1$, and compute the stay times in zones $s_1$ and $s_2$ ($\Pi_k^{s_1}$ and $\Pi_k^{s_2}$). Lines 4–19 optimize the stay time allocation between $s_1$ and $s_2$. In line 6, allocate time $\Delta t$ to zone $s_1$ (where $\Delta t$ is the fine-tuning time step) and check whether optimization should continue ($checkContinue(\pi_k^{s_2}, Iters)$): when $\pi_k^{s_2}$ is smaller than the threshold $\kappa$, if $Iters = 1$, the remaining time is allocated to zone $s_2$ and optimization continues; otherwise, the loop exits. Line 9 allocates time $\Delta t$ to zone $s_2$, and a similar check decides whether optimization should continue ($checkContinue(\pi_k^{s_1})$): when $\pi_k^{s_1}$ is smaller than the threshold, the loop exits. Line 11 uses beam search and local search algorithms to find the optimal route within zones $s_1$ and $s_2$. If a better route is found, the optimal route is updated (line 13); otherwise, the values of $Iters$ and $toLeft$ determine whether to continue optimizing (line 16): if no improvement is found while $Iters = 1$ and $toLeft = \text{True}$, set $toLeft = \text{False}$ and continue optimizing; otherwise, exit the loop. If the adjustment of stay times between zones $s_1$ and $s_2$ results in a better route, set the updated time allocation and restart optimization from the initial zone (line 21); otherwise, optimize the next pair of consecutive zones (line 23). To balance the quality and speed of the zone time fine-tuning algorithm, this study sets a threshold $\kappa$ that limits the minimum stay time the ant can spend in each zone.
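The core idea of the procedure, shifting stay time between two consecutive zones in steps of $\Delta t$ while neither zone drops below $\kappa$, can be sketched as a simple hill climb. This is a deliberately simplified illustration, not the algorithm of the appendix: it omits the $Iters$ logic, the reallocation of remaining time, and the restart over zone pairs, and the `evaluate` callback is a hypothetical stand-in for the beam search and local search that score a time allocation.

```python
def fine_tune_pair(stay1, stay2, evaluate, delta_t=20, kappa=30):
    """Shift stay time between two consecutive zones in steps of delta_t,
    first toward zone 1 and then toward zone 2, keeping any improvement.
    Neither zone is allowed a stay time below kappa."""
    best = (stay1, stay2)
    best_profit = evaluate(stay1, stay2)
    for direction in (+1, -1):  # +1: give time to zone 1; -1: give time to zone 2
        current1, current2 = best
        while True:
            trial1 = current1 + direction * delta_t
            trial2 = current2 - direction * delta_t
            if min(trial1, trial2) < kappa:
                break  # the minimum stay time would be violated
            profit = evaluate(trial1, trial2)
            if profit <= best_profit:
                break  # no improvement in this direction
            best, best_profit = (trial1, trial2), profit
            current1, current2 = trial1, trial2
    return best, best_profit

# Toy profit function (hypothetical): the best split gives 140 time units to zone 1.
toy_profit = lambda t1, t2: -(t1 - 140) ** 2
allocation, profit = fine_tune_pair(100, 100, toy_profit)
print(allocation, profit)
```

With $\Delta t = 20$ and $\kappa = 30$, the toy example moves from (100, 100) to (140, 60) and stops there, since one more step toward zone 1 lowers the profit.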

Appendix B. Detailed Experimental Results of Algorithm Performance Comparison

Table A1 presents the detailed experimental results for the algorithm performance comparison in Section 4.3. The test instance name R2_1_50_8 consists of four numbers representing the following: the number of zones, the test instance number, the number of customers, and the number of entry/exit points in the zone. Obj. represents the objective function value obtained by the algorithm, where all three heuristic algorithms are run 10 times. Best is the best objective function value from the 10 runs, and AVG is the average objective function value across the 10 runs. Gap (%) represents the relative deviation between the HACO algorithm and the comparison algorithm.
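The reported Gap values are consistent with a relative deviation normalized by the HACO objective, $\text{Gap} = 100 \times (Obj_{HACO} - Obj_{other}) / Obj_{HACO}$. This formula is inferred from the tabulated numbers rather than stated in the text, so the short check below should be read as a sketch under that assumption; the function name is illustrative.

```python
def gap_percent(haco_obj, other_obj):
    """Relative deviation of a comparison algorithm from HACO, in percent.
    Positive values mean HACO found the larger (better) objective."""
    return 100.0 * (haco_obj - other_obj) / haco_obj

# Gurobi versus the best HACO run on three instances from Table A1.
print(round(gap_percent(2702, 2437), 2))  # R2_1_50_8
print(round(gap_percent(3287, 2900), 2))  # R2_2_59_8
print(round(gap_percent(2929, 2973), 2))  # R2_8_47_9, where Gurobi is better
```

A negative gap therefore marks an instance on which the comparison algorithm beat HACO.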
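Table A1. Results of HACO versus Gurobi, ACO-ILS, and ACO-VNS.

| Instance | Gurobi Obj. | Gurobi Gap (%) | BEST ACO-ILS Obj. | BEST ACO-ILS Gap (%) | BEST ACO-VNS Obj. | BEST ACO-VNS Gap (%) | BEST HACO Obj. | AVG ACO-ILS Obj. | AVG ACO-ILS Gap (%) | AVG ACO-VNS Obj. | AVG ACO-VNS Gap (%) | AVG HACO Obj. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R2_1_50_8 | 2437 | 9.81 | 2674 | 1.04 | 2699 | 0.11 | 2702 | 2652.8 | 0.08 | 2630.9 | 0.91 | 2655 |
| R2_2_59_8 | 2900 | 11.77 | 3247 | 1.22 | 3247 | 1.22 | 3287 | 3216.3 | 1.01 | 3211.7 | 1.15 | 3249.1 |
| R2_3_49_8 | 3370 | 2.43 | 3454 | 0.00 | 3462 | -0.23 | 3454 | 3431.6 | -0.17 | 3431 | -0.15 | 3425.8 |
| R2_4_50_8 | 2953 | 6.02 | 3132 | 0.32 | 3091 | 1.62 | 3142 | 3089.1 | 0.50 | 3082.2 | 0.72 | 3104.5 |
| R2_5_66_10 | 3239 | 2.35 | 3310 | 0.21 | 3310 | 0.21 | 3317 | 3260.1 | 0.85 | 3246.8 | 1.26 | 3288.2 |
| R2_6_54_10 | 2654 | 10.97 | 2948 | 1.11 | 2948 | 1.11 | 2981 | 2941.8 | 0.19 | 2919.6 | 0.94 | 2947.4 |
| R2_7_51_7 | 2627 | 3.03 | 2709 | 0.00 | 2709 | 0.00 | 2709 | 2709 | 0.00 | 2709 | 0.00 | 2709 |
| R2_8_47_9 | 2973 | -1.50 | 2892 | 1.26 | 2896 | 1.13 | 2929 | 2885.5 | 0.95 | 2844.4 | 2.36 | 2913.1 |
| R2_9_50_10 | 2928 | 2.33 | 2998 | 0.00 | 2998 | 0.00 | 2998 | 2997.1 | 0.03 | 2908.8 | 2.98 | 2998 |
| R2_10_58_10 | 2818 | 2.76 | 2873 | 0.86 | 2857 | 1.41 | 2898 | 2867.7 | 0.23 | 2783.4 | 3.16 | 2874.2 |
| R2_11_37_7 | 2481 | 3.35 | 2567 | 0.00 | 2567 | 0.00 | 2567 | 2556.8 | 0.12 | 2556.8 | 0.12 | 2559.8 |
| R2_12_76_6 | 3188 | 2.98 | 3255 | 0.94 | 3281 | 0.15 | 3286 | 3217.5 | 0.97 | 3103.3 | 4.49 | 3249.1 |
| R2_13_53_6 | 2828 | 1.43 | 2856 | 0.45 | 2869 | 0.00 | 2869 | 2852.6 | 0.15 | 2856.3 | 0.02 | 2856.9 |
| R2_14_62_7 | 2871 | 6.57 | 2983 | 2.93 | 2983 | 2.93 | 3073 | 2960.4 | 1.06 | 2827.4 | 5.50 | 2992 |
| R2_15_51_7 | 3310 | -2.35 | 3232 | 0.06 | 3234 | 0.00 | 3234 | 3123.7 | 0.16 | 3050 | 2.52 | 3128.7 |
| R2_16_39_7 | 2423 | 3.77 | 2518 | 0.00 | 2518 | 0.00 | 2518 | 2416.9 | 1.95 | 2395.8 | 2.80 | 2464.9 |
| R2_17_56_8 | 3048 | 2.34 | 3095 | 0.83 | 3109 | 0.38 | 3121 | 3070.9 | -0.07 | 3063.6 | 0.16 | 3068.6 |
| R2_18_57_10 | 3080 | 5.52 | 3246 | 0.43 | 3290 | -0.92 | 3260 | 3240.3 | 0.60 | 3214.1 | 1.41 | 3260 |
| R2_19_64_8 | 2735 | 20.17 | 3384 | 1.23 | 3384 | 1.23 | 3426 | 3357.8 | 1.52 | 3284.8 | 3.66 | 3409.7 |
| R2_20_46_9 | 2397 | 15.45 | 2784 | 1.80 | 2835 | 0.00 | 2835 | 2784 | 1.80 | 2820.7 | 0.50 | 2835 |
| avg | 2863 | 5.46 | 3007.9 | 0.73 | 3014.4 | 0.52 | 3030.3 | 2981.6 | 0.60 | 2947 | 1.73 | 2999.5 |
| R4_1_126_15 | 3445 | 7.89 | 3654 | 2.30 | 3654 | 2.30 | 3740 | 3621.4 | 1.59 | 3604.5 | 2.05 | 3679.9 |
| R4_2_108_17 | 3316 | 5.93 | 3511 | 0.40 | 3499 | 0.74 | 3525 | 3447.9 | 0.65 | 3364.4 | 3.05 | 3470.4 |
| R4_3_100_12 | 3168 | 7.75 | 3357 | 2.24 | 3413 | 0.61 | 3434 | 3329.4 | 1.85 | 3233.7 | 4.67 | 3392 |
| R4_4_86_15 | 3200 | 9.71 | 3571 | -0.76 | 3571 | -0.76 | 3544 | 3379.2 | 3.60 | 3378.8 | 3.61 | 3505.5 |
| R4_5_135_15 | 3029 | 13.21 | 3490 | 0.00 | 3477 | 0.37 | 3490 | 3420 | 0.01 | 3325.9 | 2.77 | 3420.5 |
| R4_6_90_15 | 3333 | 0.80 | 3345 | 0.45 | 3340 | 0.60 | 3360 | 3212 | 1.79 | 3177.1 | 2.85 | 3270.4 |
| R4_7_77_18 | 2292 | 22.12 | 2890 | 1.80 | 2965 | -0.75 | 2943 | 2854.5 | 1.67 | 2872.2 | 1.06 | 2902.9 |
| R4_8_105_16 | 2771 | 19.07 | 3256 | 4.91 | 3346 | 2.28 | 3424 | 3218.3 | 3.89 | 3241.3 | 3.20 | 3348.6 |
| R4_9_113_13 | 3531 | 8.64 | 3721 | 3.73 | 3659 | 5.33 | 3865 | 3631.2 | 1.23 | 3574.5 | 2.77 | 3676.3 |
| R4_10_115_14 | 3519 | 1.10 | 3544 | 0.39 | 3546 | 0.34 | 3558 | 3463 | 0.31 | 3342.2 | 3.78 | 3473.6 |
| R4_11_100_15 | 2906 | 15.47 | 3301 | 3.98 | 3269 | 4.92 | 3438 | 3225.4 | 3.52 | 3131.1 | 6.34 | 3343 |
| R4_12_105_16 | 3382 | 5.32 | 3572 | 0.00 | 3565 | 0.20 | 3572 | 3530.2 | -0.62 | 3465.1 | 1.23 | 3508.3 |
| R4_13_97_15 | 3094 | 7.28 | 3278 | 1.77 | 3316 | 0.63 | 3337 | 3193.3 | -0.06 | 3169.3 | 0.70 | 3191.5 |
| R4_14_98_15 | 3413 | 2.01 | 3429 | 1.55 | 3429 | 1.55 | 3483 | 3347.3 | 1.08 | 3327.4 | 1.67 | 3383.9 |
| R4_15_108_15 | 3409 | -0.26 | 3400 | 0.00 | 3400 | 0.00 | 3400 | 3315.8 | 1.42 | 3182.4 | 5.39 | 3363.6 |
| R4_16_119_14 | 3244 | 10.41 | 3635 | -0.39 | 3613 | 0.22 | 3621 | 3527.2 | 1.60 | 3387.2 | 5.50 | 3584.5 |
| R4_17_110_17 | 3409 | 3.32 | 3493 | 0.94 | 3558 | -0.91 | 3526 | 3471 | 0.30 | 3431.2 | 1.44 | 3481.4 |
| R4_18_126_18 | 3524 | 3.77 | 3655 | 0.19 | 3655 | 0.19 | 3662 | 3615 | 0.20 | 3441.3 | 4.99 | 3622.2 |
| R4_19_103_14 | 3192 | 11.63 | 3599 | 0.36 | 3567 | 1.25 | 3612 | 3495.1 | 1.99 | 3463.9 | 2.87 | 3566.1 |
| R4_20_88_19 | 3227 | 3.87 | 3259 | 2.92 | 3318 | 1.16 | 3357 | 3176.1 | 2.75 | 3170.8 | 2.91 | 3265.8 |
| avg | 3220.2 | 7.95 | 3448 | 1.34 | 3458 | 1.01 | 3494.6 | 3373.7 | 1.44 | 3314.2 | 3.14 | 3422.5 |
| R6_1_152_26 | 3480 | 3.57 | 3609 | 0.00 | 3609 | 0.00 | 3609 | 3558.6 | 1.15 | 3491.1 | 3.02 | 3599.9 |
| R6_2_175_24 | 3221 | 13.09 | 3649 | 1.54 | 3635 | 1.92 | 3706 | 3566.1 | 2.03 | 3451.6 | 5.18 | 3640.1 |
| R6_3_154_27 | 3464 | 8.99 | 3694 | 2.94 | 3768 | 1.00 | 3806 | 3640.5 | 0.63 | 3641.3 | 0.61 | 3663.5 |
| R6_4_151_26 | 3085 | 19.97 | 3828 | 0.70 | 3851 | 0.10 | 3855 | 3756.7 | 0.59 | 3667.2 | 2.96 | 3779 |
| R6_5_169_29 | 2964 | 16.41 | 3466 | 2.26 | 3517 | 0.82 | 3546 | 3383 | 1.27 | 3386 | 1.18 | 3426.4 |
| R6_6_201_21 | 3918 | -1.03 | 3824 | 1.39 | 3806 | 1.86 | 3878 | 3707.3 | 1.49 | 3581.1 | 4.84 | 3763.3 |
| R6_7_167_25 | 2921 | 20.82 | 3696 | -0.19 | 3689 | 0.00 | 3689 | 3547.2 | 1.71 | 3484.5 | 3.44 | 3608.8 |
| R6_8_111_24 | 3092 | 0.45 | 3058 | 1.55 | 3040 | 2.12 | 3106 | 3009.4 | 1.37 | 2987.8 | 2.08 | 3051.2 |
| R6_9_158_23 | 3408 | 3.35 | 3488 | 1.08 | 3553 | -0.77 | 3526 | 3423.2 | 2.05 | 3468.9 | 0.75 | 3495 |
| R6_10_116_24 | 3178 | -2.58 | 3069 | 0.94 | 2990 | 3.49 | 3098 | 2998.3 | -0.01 | 2948.6 | 1.64 | 2997.9 |
| R6_11_142_25 | 2812 | 18.37 | 3368 | 2.24 | 3407 | 1.10 | 3445 | 3327.1 | 0.86 | 3269.3 | 2.58 | 3356 |
| R6_12_172_21 | 3154 | 11.78 | 3508 | 1.87 | 3545 | 0.84 | 3575 | 3442.3 | 1.12 | 3412.6 | 1.98 | 3481.4 |
| R6_13_155_22 | 3503 | 3.66 | 3485 | 4.15 | 3598 | 1.05 | 3636 | 3435.6 | 2.75 | 3466.5 | 1.87 | 3532.7 |
| R6_14_198_25 | 2955 | 19.26 | 3667 | -0.19 | 3586 | 2.02 | 3660 | 3542.4 | 0.24 | 3460 | 2.56 | 3550.8 |
| R6_15_182_21 | 3228 | 9.76 | 3490 | 2.43 | 3555 | 0.62 | 3577 | 3440 | 1.94 | 3380.9 | 3.62 | 3507.9 |
| R6_16_168_26 | 3572 | 2.46 | 3600 | 1.69 | 3596 | 1.80 | 3662 | 3516.9 | -0.12 | 3463.1 | 1.41 | 3512.6 |
| R6_17_174_23 | 3695 | -1.82 | 3589 | 1.10 | 3598 | 0.85 | 3629 | 3519.7 | 1.67 | 3509.1 | 1.97 | 3579.5 |
| R6_18_202_25 | 3403 | 4.11 | 3549 | 0.00 | 3549 | 0.00 | 3549 | 3444.5 | 1.60 | 3409.2 | 2.61 | 3500.4 |
| R6_19_164_23 | 2836 | 17.94 | 3453 | 0.09 | 3467 | -0.32 | 3456 | 3394.5 | 0.52 | 3361.1 | 1.50 | 3412.3 |
| R6_20_145_24 | 3397 | 7.51 | 3651 | 0.60 | 3673 | 0.00 | 3673 | 3541.5 | 2.82 | 3461.2 | 5.03 | 3644.4 |
| avg | 3264.3 | 8.80 | 3537.1 | 1.31 | 3551.6 | 0.93 | 3584.1 | 3459.7 | 1.28 | 3415.1 | 2.54 | 3505.2 |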

Appendix C. Verification of the Effectiveness of the Elite Ant Re-Optimization Strategy

In the verification of the effectiveness of the elite ant re-optimization strategy, the results obtained from the comparison between HACO, HACO1, and HACO2 for each test instance are shown in Table A2.
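Table A2. Detailed results of comparing HACO with HACO1 and HACO2.

| Instance | HACO Obj. | HACO1 Obj. | HACO1 Gap (%) | HACO2 Obj. | HACO2 Gap (%) | Instance | HACO Obj. | HACO1 Obj. | HACO1 Gap (%) | HACO2 Obj. | HACO2 Gap (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R2_1_50_8 | 2702 | 2038 | 24.57 | 2702 | 0.00 | R2_11_37_7 | 2567 | 1783 | 30.54 | 2549 | 0.70 |
| R2_2_59_8 | 3287 | 2424 | 26.25 | 3247 | 1.22 | R2_12_76_6 | 3286 | 2279 | 30.65 | 3216 | 2.13 |
| R2_3_49_8 | 3454 | 2617 | 24.23 | 3454 | 0.00 | R2_13_53_6 | 2869 | 2090 | 27.15 | 2869 | 0.00 |
| R2_4_50_8 | 3142 | 2278 | 27.50 | 3086 | 1.78 | R2_14_62_7 | 3073 | 2122 | 30.95 | 2983 | 2.93 |
| R2_5_66_10 | 3317 | 2251 | 32.14 | 3274 | 1.30 | R2_15_51_7 | 3234 | 2642 | 18.31 | 3131 | 3.18 |
| R2_6_54_10 | 2981 | 2258 | 24.25 | 2917 | 2.15 | R2_16_39_7 | 2518 | 1902 | 24.46 | 2397 | 4.81 |
| R2_7_51_7 | 2709 | 1973 | 27.17 | 2709 | 0.00 | R2_17_56_8 | 3121 | 2765 | 11.41 | 3073 | 1.54 |
| R2_8_47_9 | 2929 | 2069 | 29.36 | 2920 | 0.31 | R2_18_57_10 | 3260 | 2608 | 20.00 | 3260 | 0.00 |
| R2_9_50_10 | 2998 | 2179 | 27.32 | 2998 | 0.00 | R2_19_64_8 | 3426 | 2551 | 25.54 | 3426 | 0.00 |
| R2_10_58_10 | 2898 | 2520 | 13.04 | 2851 | 1.62 | R2_20_46_9 | 2835 | 2013 | 28.99 | 2835 | 0.00 |
| R4_1_126_15 | 3740 | 3013 | 19.44 | 3707 | 0.88 | R4_11_100_15 | 3438 | 2668 | 22.40 | 3269 | 4.92 |
| R4_2_108_17 | 3525 | 2880 | 18.30 | 3438 | 2.47 | R4_12_105_16 | 3572 | 3010 | 15.73 | 3545 | 0.76 |
| R4_3_100_12 | 3434 | 2635 | 23.27 | 3350 | 2.45 | R4_13_97_15 | 3337 | 2934 | 12.08 | 3157 | 5.39 |
| R4_4_86_15 | 3544 | 2712 | 23.48 | 3544 | 0.00 | R4_14_98_15 | 3483 | 2924 | 16.05 | 3370 | 3.24 |
| R4_5_135_15 | 3490 | 2825 | 19.05 | 3467 | 0.66 | R4_15_108_15 | 3400 | 2772 | 18.47 | 3373 | 0.79 |
| R4_6_90_15 | 3360 | 2654 | 21.01 | 3360 | 0.00 | R4_16_119_14 | 3621 | 2992 | 17.37 | 3608 | 0.36 |
| R4_7_77_18 | 2943 | 2338 | 20.56 | 2905 | 1.29 | R4_17_110_17 | 3526 | 2703 | 23.34 | 3449 | 2.18 |
| R4_8_105_16 | 3424 | 2758 | 19.45 | 3250 | 5.08 | R4_18_126_18 | 3662 | 3125 | 14.66 | 3662 | 0.00 |
| R4_9_113_13 | 3865 | 2952 | 23.62 | 3660 | 5.30 | R4_19_103_14 | 3612 | 2943 | 18.52 | 3528 | 2.33 |
| R4_10_115_14 | 3558 | 3161 | 11.16 | 3503 | 1.55 | R4_20_88_19 | 3357 | 2533 | 24.55 | 3095 | 7.80 |
| R6_1_152_26 | 3609 | 3172 | 12.11 | 3609 | 0.00 | R6_11_142_25 | 3445 | 2947 | 14.46 | 3323 | 3.54 |
| R6_2_175_24 | 3706 | 3246 | 12.41 | 3626 | 2.16 | R6_12_172_21 | 3575 | 3072 | 14.07 | 3492 | 2.32 |
| R6_3_154_27 | 3806 | 3027 | 20.47 | 3688 | 3.10 | R6_13_155_22 | 3636 | 3144 | 13.53 | 3479 | 4.32 |
| R6_4_151_26 | 3855 | 3399 | 11.83 | 3798 | 1.48 | R6_14_198_25 | 3660 | 3221 | 11.99 | 3554 | 2.90 |
| R6_5_169_29 | 3546 | 3003 | 15.31 | 3446 | 2.82 | R6_15_182_21 | 3577 | 3199 | 10.57 | 3456 | 3.38 |
| R6_6_201_21 | 3878 | 3225 | 16.84 | 3647 | 5.96 | R6_16_168_26 | 3662 | 3290 | 10.16 | 3565 | 2.65 |
| R6_7_167_25 | 3689 | 3203 | 13.17 | 3597 | 2.49 | R6_17_174_23 | 3629 | 3207 | 11.63 | 3570 | 1.63 |
| R6_8_111_24 | 3106 | 2846 | 8.37 | 3023 | 2.67 | R6_18_202_25 | 3549 | 3306 | 6.85 | 3524 | 0.70 |
| R6_9_158_23 | 3526 | 3037 | 13.87 | 3496 | 0.85 | R6_19_164_23 | 3456 | 3049 | 11.78 | 3456 | 0.00 |
| R6_10_116_24 | 3098 | 2850 | 8.01 | 2990 | 3.49 | R6_20_145_24 | 3673 | 3238 | 11.84 | 3624 | 1.33 |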

References

  1. Golden, B.L.; Levy, L.; Vohra, R. The orienteering problem. Nav. Res. Logist. (NRL) 1987, 34, 307–318.
  2. Vansteenwegen, P.; Souffriau, W.; Van Oudheusden, D. The orienteering problem: A survey. Eur. J. Oper. Res. 2011, 209, 1–10.
  3. Gunawan, A.; Lau, H.C.; Vansteenwegen, P. Orienteering problem: A survey of recent variants, solution approaches and applications. Eur. J. Oper. Res. 2016, 255, 315–332.
  4. Kant, R.; Mishra, A. The Orienteering Problem: A Review of Variants and Solution Approaches. In Proceedings of the 26th World Multi-Conference on Systemics, Cybernetics and Informatics, Virtual Conference, 12–15 July 2022; pp. 41–46.
  5. Butt, S.E.; Cavalier, T.M. A heuristic for the multiple tour maximum collection problem. Comput. Oper. Res. 1994, 21, 101–111.
  6. Feillet, D.; Dejax, P.; Gendreau, M. Traveling salesman problems with profits. Transp. Sci. 2005, 39, 188–205.
  7. Ruiz-Meza, J.; Montoya-Torres, J.R. A systematic literature review for the tourist trip design problem: Extensions, solution techniques and future research lines. Oper. Res. Perspect. 2022, 9, 100228.
  8. Angelelli, E.; Archetti, C.; Vindigni, M. The clustered orienteering problem. Eur. J. Oper. Res. 2014, 238, 404–414.
  9. Archetti, C.; Carrabs, F.; Cerulli, R. The set orienteering problem. Eur. J. Oper. Res. 2018, 267, 264–272.
  10. He, M.; Wu, Q.; Benlic, U.; Lu, Y.; Chen, Y. An effective multi-level memetic search with neighborhood reduction for the clustered team orienteering problem. Eur. J. Oper. Res. 2024, 318, 778–801.
  11. Nguyen, T.D.; Martinelli, R.; Pham, Q.A.; Hà, M.H. The set team orienteering problem. Eur. J. Oper. Res. 2025, 321, 75–87.
  12. Laporte, G.; Asef-Vaziri, A.; Sriskandarajah, C. Some applications of the generalized travelling salesman problem. J. Oper. Res. Soc. 1996, 47, 1461–1467.
  13. Khan, I.; Maiti, M.K.; Basuli, K. A random-permutation based ga for generalized traveling salesman problem in imprecise environments. Evol. Intell. 2021, 16, 229–245.
  14. Bernardino, R.; Paias, A. Solving the family traveling salesman problem. Eur. J. Oper. Res. 2018, 267, 453–466.
  15. Chaves, A.A.; Vianna, B.L.; da Silva, T.T.; Schenekemberg, C.M. A parallel branch-and-cut and an adaptive metaheuristic to solve the Family Traveling Salesman Problem. Expert. Syst. Appl. 2023, 238, 121735.
  16. Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer networks. Adv. Neural Inf. Process. Syst. 2015, 28, 2692–2700.
  17. Garmendia, A.I.; Ceberio, J.; Mendiburu, A. Applicability of neural combinatorial optimization: A critical view. ACM Trans. Evol. Learn. Optim. 2024, 4, 1–26.
  18. Gama, R.; Fernandes, H.L. A reinforcement learning approach to the orienteering problem with time windows. Comput. Oper. Res. 2021, 133, 105357.
  19. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010.
  20. Bello, I.; Pham, H.; Le, Q.V.; Norouzi, M.; Bengio, S. Neural combinatorial optimization with reinforcement learning. arXiv 2016, arXiv:1611.09940.
  21. Williams, R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 1992, 8, 229–256.
  22. Vansteenwegen, P.; Souffriau, W.; Berghe, G.V.; Van Oudheusden, D. Iterated local search for the team orienteering problem with time windows. Comput. Oper. Res. 2009, 36, 3281–3290.
  23. Khodadadian, M.; Divsalar, A.; Verbeeck, C.; Gunawan, A.; Vansteenwegen, P. Time dependent orienteering problem with time windows and service time dependent profits. Comput. Oper. Res. 2022, 143, 105794.
  24. Solomon, M.M. Algorithms for the Vehicle Routing and Scheduling Problems with Time Window Constraints. Oper. Res. 1987, 35, 254–265.
  25. Xia, Y.; Chen, C.; Liu, Y.; Shi, J.; Liu, Z. Two-layer path planning for multi-area coverage by a cooperative ground vehicle and drone system. Expert. Syst. Appl. 2023, 217, 119604.
  26. López-Ibáñez, M.; Dubois-Lacoste, J.; Cáceres, L.P.; Birattari, M.; Stützle, T. The irace package: Iterated racing for automatic algorithm configuration. Oper. Res. Perspect. 2016, 3, 43–58.
  27. Duque, R.; Arbelaez, A.; Díaz, J.F. Online over time processing of combinatorial problems. Constraints 2018, 23, 310–334.
Figure 1. Scenario of the MzOPTW.
Figure 2. Path structure of zones, entry/exit nodes, and customers.
Figure 3. Ptr-Net structure.
Figure 4. An example of four zones.
Figure 5. Comparison of overall solution results for different algorithms (optimal value).
Table 1. Characteristics of MzOPTW and its related problems.

| Problem | Multiple Zones | Time Windows | Entry and Exit Nodes of Zones | Customer Flexibility | Algorithm |
|---|---|---|---|---|---|
| COP [8] | | | | All customers per zone | Heuristic |
| SOP [9] | | | | Only one customer per zone | Matheuristic |
| GTSP [13] | | | | Only one customer per zone | Heuristic |
| FTSP [14,15] | | | | Predefined customers per zone | Branch-and-cut |
| MzOPTW | | | | Any customers per zone | Learning-enhanced metaheuristic algorithm |

√ Multiple Zones: The problem involves multiple zones; Time Windows: Customers have specific time windows; Entry and Exit Nodes of Zones: Each zone has multiple entry and exit nodes.
Table 2. Parameters and variables.

| Symbol | Description |
|---|---|
| $S$ | The set of zones, $S = \{1, 2, \ldots, s\}$ |
| $N_0$, $N_{n+1}$ | Set of start and end nodes, $N_0 = \{0\}$, $N_{n+1} = \{n+1\}$ |
| $N$ | The set of entry and exit nodes for all zones, virtual entry/exit nodes, and the set of customers, $N = \{1, 2, \ldots, n\}$ |
| $N_s$ | The set of entry and exit nodes, virtual entry/exit nodes, and customers of zone $s$ |
| $N_s^1$, $N_s^3$ | The set of entry and exit nodes of zone $s$, and the set of virtual entry/exit nodes of zone $s$ |
| $N_s^2$ | The set of customers in zone $s$ |
| $N^1$, $N^3$ | The set of entry and exit nodes for all zones, and the set of virtual entry/exit nodes for all zones |
| $N^2$ | The set of all customers |
| $N^4$ | The set of entry and exit nodes, virtual entry/exit nodes, and start and end nodes for all zones |
| $L_{ij}^F$ | The shortest route distance between node $i$ and node $j$ |
| $L_{ij}^E$ | The Euclidean distance between node $i$ and node $j$ |
| $v^F$ | The driving speed outside the zone |
| $v^E$ | The driving speed within the zone |
| $d_i$ | The service time of customer $i$ |
| $p_i$ | The profit of customer $i$ |
| $T_{max}$ | Total time limit |
| $[o_i, c_i]$ | The service time window of customer $i$ |
| $T_i$ | Auxiliary variable: the sum of the service time of customer $i$ and the waiting time of the subsequent customer |
| $a_i$ | Auxiliary variable: the start time of the service for customer $i$ |
| $y_i$ | Auxiliary variable: 1 if the service worker visits node $i$, otherwise 0 |
| $x_{ij}$ | Decision variable: 1 if the service worker traverses arc $(i, j)$, otherwise 0 |
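Table 3. Results of HACO vs. HACO1 and HACO2.

| Number of Zones $\lvert S \rvert$ | Statistic | Obj. HACO | Obj. HACO1 | Obj. HACO2 | Gap (%) HACO1 | Gap (%) HACO2 |
|---|---|---|---|---|---|---|
| 2 | Max | 3454 | 2765 | 3454 | 32.14 | 4.81 |
| 2 | Min | 2518 | 1783 | 2397 | 11.41 | 0.00 |
| 2 | Avg | 3030.3 | 2268.1 | 2994.9 | 25.19 | 1.18 |
| 4 | Max | 3865 | 3161 | 3707 | 24.55 | 7.80 |
| 4 | Min | 2943 | 2338 | 2905 | 11.16 | 0.00 |
| 4 | Avg | 3494.6 | 2826.6 | 3412 | 19.13 | 2.37 |
| 6 | Max | 3878 | 3399 | 3798 | 20.47 | 5.96 |
| 6 | Min | 3098 | 2846 | 2990 | 6.85 | 0.00 |
| 6 | Avg | 3584.1 | 3134.1 | 3498.2 | 12.46 | 2.39 |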
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
