In this study, the proposed algorithm is implemented in Python 3.9. To verify its effectiveness, a series of numerical experiments is conducted, covering both the overall performance of the algorithm and the effectiveness of the related strategies.
4.1. Instance Description
Because no standard test instances exist for this problem, this study follows the VRPTW instance generation method [24] and the road network generation approach of another study [25], and randomly generates test instances of three different sizes, with two, four, and six zones, respectively. Each size includes 20 test instances. The instance generation method and parameter settings are as follows:
First, a road network is generated on a 6 × 6 two-dimensional plane, consisting of 36 square grids with a side length of 10. The start and end nodes are located at the center. A certain number of grids are randomly selected, and a specified number of points are randomly generated within each selected grid. The convex hull of these points is calculated, and the points on the convex hull are connected in sequence to form the zone’s boundary. Then, three random points are selected from the convex hull and connected to the nearest main road, and points are chosen as the entry/exit nodes of the zone. Points not included in the convex hull are considered customers within the zone.
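The zone construction described above can be sketched as follows. This is a minimal illustration, not the authors' generator: the function names, the number of selected grids (4), and the number of points per grid (15) are hypothetical, the hull is computed with Andrew's monotone chain as one possible choice, and the step that connects entry/exit nodes to the main road is omitted.

```python
import random

GRID_SIDE = 10   # side length of one square grid (from the text)
GRID_COUNT = 6   # the plane is 6 x 6 grids (from the text)


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def generate_zone(col, row, n_points, rng):
    """Sample points in one grid; hull vertices form the boundary,
    all remaining points become customers."""
    x0, y0 = col * GRID_SIDE, row * GRID_SIDE
    points = [(x0 + rng.uniform(0, GRID_SIDE), y0 + rng.uniform(0, GRID_SIDE))
              for _ in range(n_points)]
    boundary = convex_hull(points)
    on_hull = set(boundary)
    customers = [p for p in points if p not in on_hull]
    return boundary, customers


# Hypothetical settings: 4 zones, 15 points per zone, fixed seed.
rng = random.Random(0)
all_cells = [(c, r) for c in range(GRID_COUNT) for r in range(GRID_COUNT)]
cells = rng.sample(all_cells, 4)
zones = [generate_zone(c, r, n_points=15, rng=rng) for c, r in cells]
```

Each entry of `zones` holds the boundary vertices in order and the interior points that act as customers.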
Figure 4 illustrates an example with four zones, where the service worker’s start and end points are at the center. The black lines represent the main road routes, the green dashed lines represent the zone boundaries, the red stars indicate the zone’s entry/exit nodes, and the blue dots represent the customers within the zone. The values for
and
are set to 1.6 and 4, respectively. The working time is set to
, and the customers’ profits and service durations are uniformly distributed as random integers within the specified ranges
and
, respectively. The customers’ service time windows are defined by the time window center and width, which are uniformly distributed as random integers within the specified ranges
and, respectively.
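The sampling of customer attributes described above can be sketched as follows. The function name and all numeric ranges in the usage lines are hypothetical placeholders, not the values used in the study, and building the window symmetrically around its center is an assumption about how center and width are combined.

```python
import random


def sample_customer(rng, profit_range, duration_range, center_range, width_range):
    """Draw one customer's profit, service duration, and time window.

    Every range is an inclusive (low, high) pair of integers.
    """
    profit = rng.randint(*profit_range)
    duration = rng.randint(*duration_range)
    center = rng.randint(*center_range)
    width = rng.randint(*width_range)
    # Assumption: the window is symmetric around its center.
    opening = center - width / 2
    closing = center + width / 2
    return {"profit": profit, "duration": duration, "window": (opening, closing)}


# Hypothetical ranges, chosen only to make the sketch runnable.
rng = random.Random(42)
customer = sample_customer(rng, profit_range=(1, 10), duration_range=(5, 20),
                           center_range=(100, 400), width_range=(30, 90))
```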
4.2. Parameters and Pointer Network Training Process
In the HACO algorithm, the number of ants is set to 100, and the number of elite ants is set to 5. The set of maximum zone stay time limits is denoted as
, with an initial value
of 0.6. If the historical best solution has not improved after three iterations, the value of
is reduced to 0.2 to increase the search space for the ants. The parameters are set as follows:
= 1,
= 1.2,
= 2.2,
= 1.2,
= 0.15,
= 30,
= 20,
= 128, C = 10,
= 0.2, where the initial values of
,
,
,
, and
are determined through IRACE [26], which performs iterative racing procedures to identify high-performing parameter combinations based on experimental performance. The remaining parameters are set based on instance characteristics and preliminary tuning experiments.
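The stagnation rule mentioned above (a parameter that starts at 0.6 and drops to 0.2 once the historical best solution has not improved for three iterations) can be sketched as follows. The class and attribute names are hypothetical. The sketch assumes a maximization objective and assumes that the value is not restored after it has been lowered, neither of which is stated in the text.

```python
class StagnationSchedule:
    """Lower a parameter once the best objective value stops improving."""

    def __init__(self, initial=0.6, reduced=0.2, patience=3):
        self.value = initial
        self.reduced = reduced
        self.patience = patience
        self.best = float("-inf")
        self.stalled = 0  # consecutive iterations without improvement

    def update(self, iteration_best):
        """Record one iteration's best objective and return the current value."""
        if iteration_best > self.best:
            self.best = iteration_best
            self.stalled = 0
        else:
            self.stalled += 1
        if self.stalled >= self.patience:
            # Assumption: once lowered, the value stays lowered.
            self.value = self.reduced
        return self.value


schedule = StagnationSchedule()
history = [schedule.update(v) for v in [10, 12, 12, 12, 12]]
```

In this toy run the objective improves twice and then stalls for three iterations, so the last call returns the reduced value.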
In the training process of the Pointer Network, one global model is trained for each of the three sizes of test instances. For the customers of each zone, the geographical locations remain fixed, but the time windows, service durations, and profits are generated randomly, so these variable features are resampled during training. In each training iteration for a given size, a zone and its entry/exit positions are randomly selected, and samples are generated. The times for entering and exiting the zone lie between 0 and 540. Training for each size runs for 300,000 iterations with a batch size of 32. The Transformer encoder in the Pointer Network consists of two neural modules, each with an 8-head multi-head attention layer. The vector dimensions of both the encoder and decoder are set to 128, and the feed-forward layer dimension is set to 256. Each model is trained offline within a few hours on a standard GPU, and the resulting models are reused across all corresponding test instances, so no retraining is needed and the training cost is incurred only once per problem size.
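For reference, the hyperparameters listed above can be gathered in one place, as in the sketch below. The class and field names are hypothetical, and reading "two neural modules" as two stacked encoder layers is an assumption. The per-head dimension follows the usual multi-head attention convention of splitting the model dimension evenly across heads, which the text does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointerNetConfig:
    encoder_layers: int = 2          # "two neural modules" (assumed to be layers)
    attention_heads: int = 8
    model_dim: int = 128             # encoder and decoder vector dimension
    feed_forward_dim: int = 256
    batch_size: int = 32
    training_iterations: int = 300_000
    max_zone_time: int = 540         # upper bound of zone entry/exit times

    @property
    def head_dim(self) -> int:
        """Dimension per attention head under an even split."""
        if self.model_dim % self.attention_heads != 0:
            raise ValueError("model_dim must be divisible by attention_heads")
        return self.model_dim // self.attention_heads

    @property
    def samples_seen(self) -> int:
        """Total number of training samples drawn for one problem size."""
        return self.batch_size * self.training_iterations


config = PointerNetConfig()
```

Under these settings each head works on 16 dimensions, and one model sees 9.6 million sampled instances during training.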
4.3. Performance of the HACO
In this study, the Gurobi solver is used to obtain exact solutions for the three different sizes of test instances, with a time limit of 7200 s. If the solution time exceeds the time limit, the current best integer solution is output. Because no directly comparable algorithm exists, two comparison algorithms are derived from the HACO framework. First, since the Variable Neighborhood Search (VNS) algorithm proposed in another study [23] can solve problems such as the TDOPTW by fixing the service time, this study replaces the Pointer Network in the HACO algorithm with VNS to solve the lower-level route problem, which yields the comparison algorithm ACO-VNS. Second, since iterated local search (ILS) [22] has been shown to effectively balance solution quality and computation time, this study replaces the Pointer Network in the HACO algorithm with ILS to solve the lower-level route problem, which yields the comparison algorithm ACO-ILS. Because the MzOPTW requires solving the OPTW many times, which leads to long computation times, VNS uses only the operators from its Initialization phase in the zone benefit estimation and general ant route generation phases, and ILS uses only the Insert operator. In the elite ant re-optimization phase, both VNS and ILS use their full algorithm frameworks to solve the OPTW.
The running time limits for each algorithm on the three different sizes of test instances are set to 30 s, 45 s, and 60 s, respectively, and each algorithm is run 10 times. These time limits are kept consistent across all algorithms to ensure a fair comparison. The overall experimental results are shown in Figure 5 (see Appendix B for details), where the vertical axis represents the average objective function value (optimal value) of each algorithm on the test instances of each size.
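The evaluation protocol above (the same time limit for every algorithm at a given size, 10 runs per instance) can be sketched as a small harness. The function names and the solver call signature are hypothetical, and `toy_solver` is only a stand-in so that the sketch runs; it does not implement HACO, ACO-ILS, or ACO-VNS.

```python
import random
import statistics

TIME_LIMITS_S = {"small": 30, "medium": 45, "large": 60}  # from the text
RUNS = 10                                                 # from the text


def evaluate(solver, instance, size, runs=RUNS):
    """Run one solver repeatedly on one instance under the size's time limit."""
    limit = TIME_LIMITS_S[size]
    values = [solver(instance, time_limit=limit, seed=run) for run in range(runs)]
    return {"mean": statistics.mean(values), "best": max(values), "runs": len(values)}


def toy_solver(instance, time_limit, seed):
    """Placeholder solver: ignores the time limit and returns a noisy score."""
    return sum(instance) + random.Random(seed).randint(0, 5)


summary = evaluate(toy_solver, instance=[3, 4, 5], size="small")
```

Passing the same `TIME_LIMITS_S` entry to every solver is what keeps the comparison fair in the sense described above.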
Due to the need to solve multiple OPTWs for different zones and to optimize the stay time in each zone, the computational complexity of the MzOPTW is relatively high. Within the 7200 s time limit, Gurobi was unable to find the optimal solution for all test instances. However, of the 20 test instances of each size, the HACO algorithm outperformed Gurobi in 18, 19, and 17 instances, with maximum improvements in solution quality of 20.17%, 22.12%, and 20.82% and average improvements of 5.46%, 7.95%, and 8.8%, respectively. This shows that the HACO algorithm outperforms Gurobi in overall performance, and its performance becomes more stable as the size of the problem increases.
Compared to ACO-ILS, the HACO algorithm performed similarly on the small-scale test instances, outperforming ACO-ILS in 15 test instances, with a maximum improvement of 2.93% and an average improvement of 0.73%. On the other two scales, the HACO algorithm outperformed ACO-ILS in 15 and 16 test instances, with maximum improvements of 4.91% and 4.15% and average improvements of 1.34% and 1.31%, respectively. Compared to ACO-VNS, the HACO algorithm performed better in 11, 16, and 14 test instances for the three scales, with maximum improvements of 2.93%, 5.33%, and 3.49% and average improvements of 0.52%, 1.01%, and 0.93%, respectively. Although there are isolated cases where the HACO algorithm performs slightly worse than ACO-ILS and ACO-VNS, it consistently shows superior performance in terms of average objective values across multiple runs, demonstrating its overall robustness and effectiveness.
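The win counts and improvement percentages reported above can be computed per instance as in the sketch below. The function name is hypothetical, and measuring the improvement relative to the baseline value, (HACO - baseline) / baseline, is an assumption, since the text does not state the denominator. The numbers in the usage lines are toy values, not results from the study.

```python
def improvement_stats(haco_values, baseline_values):
    """Summarise per-instance relative improvements of HACO over a baseline."""
    gains = [100.0 * (h - b) / b for h, b in zip(haco_values, baseline_values)]
    return {
        "wins": sum(g > 0 for g in gains),  # instances where HACO is strictly better
        "max": max(gains),                  # largest improvement in percent
        "avg": sum(gains) / len(gains),     # average improvement in percent
    }


# Toy objective values for three instances (maximization).
stats = improvement_stats([110, 100, 95], [100, 100, 100])
```

Here HACO wins on one instance, ties on one, and loses on one, so the average improvement can be positive even though not every instance improves.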
Furthermore, across the 60 test instances of all three sizes, the HACO algorithm achieved the optimal solution in 44 instances, accounting for 73% of the total. On average, the HACO algorithm outperformed ACO-ILS and ACO-VNS in 88% and 97% of the test instances, respectively. Notably, these results were obtained within strict computational time limits (30 s for small, 45 s for medium, and 60 s for large instances), demonstrating that the integration of the Pointer Network learning model enhances not only solution quality but also computational efficiency. Compared to the baseline heuristics, the HACO algorithm consistently reached better solutions within the same time limits, highlighting its practical potential in time-constrained decision environments.
4.4. Performance of the Elite Ant Re-Optimization Strategy
To verify the effectiveness of the elite ant re-optimization strategy, two variants are derived from the HACO algorithm: HACO1, which removes the elite ant re-optimization entirely, and HACO2, which removes only its second stage (the zone time fine-tuning algorithm). These two variants are then compared with the original HACO algorithm. The running time limits remain unchanged for all three sizes, each test instance is run 10 times, and the best value is selected. The experimental results are shown in Table 3, where Avg, Max, and Min represent the average, maximum, and minimum values, respectively, over the 20 test instances of each size.
In Table 3, HACO outperforms HACO1 in every test instance. For the three different sizes, the maximum improvements are 32.14%, 24.55%, and 20.47%; the minimum improvements are 14.41%, 11.16%, and 6.85%; and the average improvements are 25.19%, 19.13%, and 12.46%, respectively. This demonstrates that the elite ant re-optimization strategy significantly enhances the solution quality.
Furthermore, HACO, which incorporates the zone time fine-tuning algorithm, outperforms HACO2 in the majority of the test instances. For the three scales, the maximum improvements are 4.81%, 7.8%, and 5.96%, with average improvements of 1.18%, 2.37%, and 2.39%, respectively. The proportion of test instances with improvements is 60%, 85%, and 90% (see Appendix C). This indicates that the zone time fine-tuning algorithm further optimizes the solution.