1. Introduction
As an important branch of optimization, combinatorial optimization plays a significant role in management and economics, computer science, artificial intelligence, biology, engineering, etc. [
1]. The traveling salesman problem (TSP) is a central subject in combinatorial optimization, in which the goal is to find a closed route through all the cities, visiting each once and only once. This problem is equivalent to finding a Hamiltonian circuit with the minimum distance. The TSP and its variants, such as asymmetric TSPs (ATSPs) [
2], clustered TSPs (CTSPs) [
3], dynamic TSPs (DTSPs) [
4], multiple TSPs (MTSPs) [
5], and wandering salesman problems (WSPs) [
6], have wide applications in laser engraving [
7], integrated circuit design [
8], transportation [
9], energy saving [
10], logistics problems [
11], communication engineering [
12], and medical waste transportation, which is closely related to the COVID-19 pandemic [
13]. The TSP was first formulated mathematically in 1930 to solve a school bus routing problem, and was later popularized by researchers at the RAND Corporation. Early instances involved only dozens of cities, but with the growth of applications, the scale of the problems may now exceed millions of cities [
14].
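To make the objective concrete, the following brute-force sketch (in Python; the four-city coordinates are purely illustrative) enumerates all Hamiltonian cycles of a tiny instance. The $(n-1)!$ growth of this enumeration is precisely why exact solution becomes impractical as the number of cities grows.

```python
import itertools
import math

def tour_length(order, coords):
    """Total length of the closed tour visiting cities in the given order."""
    return sum(math.dist(coords[order[i]], coords[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def brute_force_tsp(coords):
    """Exact TSP by enumerating (n-1)! tours; city 0 is fixed as the start."""
    cities = list(range(1, len(coords)))
    best = min(itertools.permutations(cities),
               key=lambda p: tour_length((0,) + p, coords))
    return (0,) + best

# Four corners of a unit square: the optimal cycle walks the perimeter.
coords = [(0, 0), (1, 0), (0, 1), (1, 1)]
tour = brute_force_tsp(coords)
print(tour_length(tour, coords))  # 4.0
```

Even at this scale the enumeration examines 3! = 6 tours; at 20 cities it would already exceed 10^17.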
Although the description of the TSP is simple, it has been proven to be NP-hard, which means that the time required to obtain an exact solution grows exponentially as the size of the problem increases. Many algorithms have been developed for TSPs, and they can be split into four categories: exact methods, approximation algorithms, intelligence algorithms, and heuristic algorithms. Exact solvers, such as brute-force search, linear programming [
15], dynamic programming [
16], branch and bound [
17], branch and cut [
18], and cutting plane [
19], are powerful tools for small-scale TSPs. However, the computational cost of an exact algorithm is enormous: solving an instance with 85,900 nodes would take over 136 CPU-years with Concorde, which is a mature exact solver for TSPs [
20]. Since there is no efficient exact algorithm for any NP-hard problem, numerous approximation algorithms have been presented that find approximate solutions in polynomial time with provable solution quality [
21]. Although such algorithms can achieve strong approximation guarantees, such as
$1+\epsilon$ for Euclidean TSPs [
22] and
$7/5+\epsilon$ for TSPs [
23], the running times of these approaches, even though asymptotically polynomial, can be rather large; see [
24].
Intelligence algorithms are inspired by the natural world and have a high capability of approximating the global optimum of optimization problems. Evolutionary algorithm (EA) [
25], ant colony optimization algorithm (ACO) [
26], ant colony system (ACS) [
27], shuffled frog leaping algorithm (SFLA) [
28], simulated annealing algorithm (SA) [
29], particle swarm optimization (PSO) [
30], and other well-known algorithms [
31,
32] all belong to intelligence algorithms. Novel intelligence algorithms can be employed to solve problems with
$2\times {10}^{5}$ nodes with high quality within an hour on an ordinary computer, but larger scales remain hard to tackle [
33]. There are two main drawbacks of intelligence algorithms: one is that they frequently converge to a local optimum; the other is that the parameters deeply affect the solution quality but usually can only be determined empirically [
34]. The main heuristic algorithms for TSPs can be grouped into the Lin–Kernighan family and the stem-and-cycle family; they can provide high-quality solutions for problems with nearly 2 million cities [
35]. For higher-quality solutions and lower running times, some researchers have combined intelligence algorithms with heuristic algorithms; see [
36,
37,
38] and the references therein.
The genetic algorithm (GA) was proposed by Holland in 1975; its basic idea stems from “survival of the fittest” in evolutionism. Most GAs contain three main components: a selection operator, a crossover operator, and a mutation operator. Due to their high effectiveness and versatility, GAs have been widely employed to solve TSPs and other challenging optimization problems [
39]. However, several issues remain when applying GAs to TSPs, including premature convergence, population initialization, problem encoding, etc. [
40].
On the other hand, crossover operators have a significant influence on the performance of GA and are a key factor in global search and convergence speed. As a matter of fact, various crossover operators have been proposed for the TSP, including partially mapped crossover (PMX) [
41], ordered crossover (OX) [
42], cycle crossover (CX) [
43], sequential constructive crossover operator (SCX) [
44], completely mapped crossover operators (CMX) [
45], and others based on heuristic algorithms such as bidirectional heuristic crossover operator (BHX) [
46]. Additionally, merging GAs with local search or heuristic algorithms combines the advantages of both, namely high convergence speed and the capacity for global optimization; therefore, it has been a hot topic of study [
36,
47,
48].
When the size of a TSP is larger than
${10}^{5}$, seeking a high-quality solution is extremely difficult; even the powerful implementation of the Lin–Kernighan heuristic (LKH) maintained by Helsgaun [
49] will take over an hour on a
${10}^{5}$-node instance; see the experiments in [
33] and
Table 1. In addition, even a small improvement in quality can take a long time; how to obtain an acceptable approximate solution in a reasonable time is therefore the more useful question in real-world applications [
50]. Thus, a new series of two-layered algorithms has been proposed, whose fundamental concepts can be divided into two categories. The first is to use various clustering techniques to divide the cities into small groups, solve the sub-TSPs within those groups, and then merge the groups into a Hamiltonian cycle [
51,
52,
53]. The other is to first determine the start and end points of each small group after clustering, then solve the TSPs with fixed start and end points, which are also called WSPs, and finally combine all the groups [
54]. These algorithms are much faster than algorithms without clustering and can solve a 180 K-node TSP within a few hours [
7].
Naturally, two-layered algorithms can be extended to three-layered or multiple-layered ones; very recent works can be seen in [
55,
56]. Admittedly, in order to fully utilize all the CPU cores of computers, parallelizability is becoming extremely essential for algorithms designed to solve large and complicated problems. Some parallel algorithms for TSPs can be seen in [
57,
58].
In this paper, in order to develop a fast, easily implemented, and highly parallelizable algorithm for TSPs, an adaptive layered clustering framework with improved genetic algorithms (ALC_IGA) is proposed. This algorithm is not only an improvement of GA, but also an extension of the two-layered and three-layered methods in [
7,
51,
55] for TSPs. The key contributions of this study are as follows:
An improved genetic algorithm (IGA) integrating hybrid selection, a selective BHX crossover operator, and a simplified 2-opt local search is proposed; a numerical comparison of IGA, GA, and ACS on TSPs shows the high performance of IGA.
Plentiful numerical results also prove the effectiveness of the novel IGA for solving WSPs.
An adaptive layered clustering framework is proposed to break a large-scale problem down into a series of small-scale problems. The computational complexity of ALC_IGA is between $O(n\log n)$ and $O\left({n}^{2}\right)$; moreover, its parallelizability is discussed.
We present a numerical experiment on parameter tuning of the proposed ALC_IGA; the results reveal that larger parameter values yield higher solution quality but require longer running time.
Dozens of two-dimensional Euclidean instances have been tested with ALC_IGA and several two-layered algorithms, and the results show that ALC_IGA has advantages in accuracy, stability, and convergence speed over the two-layered algorithms proposed in [
7,
51].
Numerous large-scale instances ranging in size from $4\times {10}^{4}$ to $2\times {10}^{5}$ have been tested, and the results show that the parallel ALC_IGA is more than four times faster than the other three compared algorithms and obtains the best solution in most cases. The results on very large-scale TSPs, with sizes ranging from $2\times {10}^{5}$ to $2\times {10}^{6}$, also demonstrate the excellent effectiveness of ALC_IGA.
The remainder of the paper is organized as follows: a brief literature review of some related concepts is presented in
Section 2; the main procedures of IGA are shown in
Section 3; the details of ALC_IGA are discussed in
Section 4; the results of the experimental analyses and algorithm comparisons are shown in
Section 5; a summary of this paper and future work is given in
Section 6.
4. The Framework of ALC_IGA for Large-Scale TSPs
In recent years, some two-layered algorithms have been proposed, and they significantly reduce the time expenditure for large-scale TSPs [
7,
51]. Liang et al. [
55] recently proposed a three-layered algorithm with
k-means and showed by numerical experiments that it outperforms some two-layered algorithms. Notwithstanding, both two-layered and three-layered algorithms may still produce medium-scale or large-scale groups, which naturally require a significant amount of time to solve. Thus, upgrading the two-layered and three-layered algorithms to an adaptive layered algorithm stands to reason.
We propose a brand-new adaptive layered clustering framework that builds on the IGA developed in the previous section. The framework is divided into two parts: the first applies clustering and IGA to initialize the solution, and the second optimizes the initial solution. With this new algorithm, a large-scale TSP is transformed into a collection of TSPs and WSPs smaller than a specified size. The processing flow is illustrated in
Figure 2, and the details of solution initialization and optimization are represented subsequently in
Section 4.1 and
Section 4.2.
4.1. Solution Initialization
In the solution initialization phase, we combine adaptive layered clustering with IGA. For each cluster larger than the specified size, k-means is applied to divide it into several small clusters, and the number of layers, visit order, entry cities, and exit cities of the sub-clusters are then determined. When the size of a cluster is no larger than the specified size, IGA is used to determine the Hamiltonian path from the entry node to the exit node within the cluster; otherwise, k-means is used again to split the cluster. These processes are repeated until the paths of all clusters are determined. Finally, all sub-paths are combined to obtain the initial feasible path.
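As an illustration, the following Python sketch mimics the recursion just described under simplifying assumptions: plain Lloyd's k-means stands in for the clustering step, a nearest-neighbour heuristic stands in for IGA (both on sub-clusters and on cluster centroids), and the entry/exit-city bookkeeping is omitted, so sub-paths are simply concatenated in visiting order.

```python
import math
import random

def lloyd_kmeans(points, k, iters=15, seed=0):
    """Plain Lloyd's k-means; returns the list of non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return [cl for cl in clusters if cl]

def nn_order(points, start=0):
    """Nearest-neighbour path over the given points (stand-in for IGA)."""
    left, path = points[:], [points[start]]
    left.pop(start)
    while left:
        nxt = min(range(len(left)), key=lambda j: math.dist(path[-1], left[j]))
        path.append(left.pop(nxt))
    return path

def centroid(cluster):
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def initialize_route(points, M, k=3):
    """Adaptive layered clustering: recurse until clusters have at most M cities."""
    if len(points) <= M:
        return nn_order(points)
    clusters = lloyd_kmeans(points, min(k, len(points)))
    if len(clusters) == 1:               # degenerate split: fall back to halving
        mid = len(points) // 2
        clusters = [points[:mid], points[mid:]]
    cents = [centroid(cl) for cl in clusters]
    # Visit order of sub-clusters from a tour over their centroids.
    order = [cents.index(c) for c in nn_order(cents)]
    route = []
    for i in order:
        route.extend(initialize_route(clusters[i], M, k))
    return route

# Demo on illustrative random points:
rng = random.Random(7)
pts = [(rng.random(), rng.random()) for _ in range(60)]
route = initialize_route(pts, M=10)
print(len(route), len(set(route)))  # 60 60
```

The recursion bottoms out exactly when a cluster has at most M cities, mirroring the size test of the solution initialization framework.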
The main steps of the solution initialization phase are illustrated in
Figure 2. Its pseudocode is given in Algorithm 6.
Algorithm 6 Solution initialization framework.
Input: A TSP G of size N, with nodes designated by ${c}_{1},{c}_{2},\cdots ,{c}_{N}$, and a positive integer M; the size of a group ${G}_{i}$ is denoted by $S\left({G}_{i}\right)$.
Output: The initial solution $R\left(G\right)$ for the TSP G.
1: if $S\left(G\right)\le M$ then
2:  Apply IGA to solve G, and output the solution $R\left(G\right)$.
3: else
4:  Divide the problem G into ${k}_{1}$ clusters by k-means.
5:  Denote the groups by $\{{G}_{1},{G}_{2},\cdots ,{G}_{{k}_{1}}\}$, the coordinate vectors of their centers by $\{V\left({G}_{1}\right),V\left({G}_{2}\right),\cdots ,V\left({G}_{{k}_{1}}\right)\}$, and their sizes by $\{S\left({G}_{1}\right),S\left({G}_{2}\right),\cdots ,S\left({G}_{{k}_{1}}\right)\}$.
6:  if $max\{S\left({G}_{1}\right),S\left({G}_{2}\right),\cdots ,S\left({G}_{{k}_{1}}\right)\}\le M$ then
7:   Set ${D}_{ij}={min}_{k,h}d({c}_{ik},{c}_{jh})$, where ${c}_{ik}\in {G}_{i}$, ${c}_{jh}\in {G}_{j}$, $k\in \{1,2,\cdots ,S\left({G}_{i}\right)\}$, $h\in \{1,2,\cdots ,S\left({G}_{j}\right)\}$.
8:   Use IGA to solve the TSP given by the distance matrix D and record the visit order $\{O\left({G}_{1}\right),O\left({G}_{2}\right),\cdots ,O\left({G}_{{k}_{1}}\right)\}$.
9:  else
10:   Obtain the visit order $\{O\left({G}_{1}\right),O\left({G}_{2}\right),\cdots ,O\left({G}_{{k}_{1}}\right)\}$ by applying IGA to the centers $\{V\left({G}_{1}\right),V\left({G}_{2}\right),\cdots ,V\left({G}_{{k}_{1}}\right)\}$.
11:  end if
12:  for ${G}_{i}$ in $\{{G}_{1},{G}_{2},\cdots ,{G}_{{k}_{1}}\}$ do
13:   Find the previously visited group ${G}_{h}$ and the next visited group ${G}_{j}$.
14:   Set ${G}_{i}^{entry}$ as the city in ${G}_{i}$ nearest to ${G}_{h}$, and set ${G}_{i}^{exit}$ as the city nearest to ${G}_{j}$.
15:   if ${G}_{i}^{entry}={G}_{i}^{exit}$ then
16:    Set ${G}_{i}^{exit}$ as the second nearest city to ${G}_{j}$.
17:   end if
18:  end for
19:  while some groups are unsolved do
20:   for ${G}_{i}$ in $\{{G}_{1},{G}_{2},\cdots \}$ do
21:    if ${G}_{i}$ is unsolved then
22:     if $S\left({G}_{i}\right)\le M$ then
23:      Apply IGA to obtain the shortest route from ${G}_{i}^{entry}$ to ${G}_{i}^{exit}$.
24:     else
25:      Divide ${G}_{i}$ into ${k}_{ih}$ clusters by k-means.
26:      Denote the groups by $\{{G}_{i1},{G}_{i2},\cdots ,{G}_{i{k}_{ih}}\}$, the coordinate vectors of their centers by $\{V\left({G}_{i1}\right),V\left({G}_{i2}\right),\cdots ,V\left({G}_{i{k}_{ih}}\right)\}$, and their sizes by $\{S\left({G}_{i1}\right),S\left({G}_{i2}\right),\cdots ,S\left({G}_{i{k}_{ih}}\right)\}$.
27:      Find the visit order $\{O\left({G}_{i1}\right),O\left({G}_{i2}\right),\cdots ,O\left({G}_{i{k}_{ih}}\right)\}$ by the same method as in lines 6–11.
28:      Set the entry and exit cities of each group by the same method as in lines 12–18.
29:     end if
30:    end if
31:   end for
32:  end while
33:  Organize the visit orders, entry cities, exit cities, and the internal route of each cluster, and output the initial feasible path $R\left(G\right)$.
34: end if
Remark 2. We point out that there are many different clustering techniques, which makes it hard to select the ideal clustering strategy for TSPs. The results in [64] indicate that simple grid-based methods outperform the k-means methods. However, because those numerical experiments focused on only one particular instance, we still adopt the standard k-means here. An example of a 100-city TSP with
M set to 20 is shown in
Figure 3. In the first layer, the cities have been divided into two groups
${G}_{1},{G}_{2}$ by
k-means, and the visit order found by IGA is
$O\left({G}_{1}\right)=1$ and
$O\left({G}_{2}\right)=2$. On the one hand, since the size of
${G}_{2}$ equals
M, the visit route
$P\left({G}_{2}\right)$ of the 20 cities in
${G}_{2}$ could be solved by IGA quickly. On the other hand, since there are 80 cities in
${G}_{1}$, which is larger than
M,
${G}_{1}$ needs to be divided into small groups again. These procedures are repeated until no group is larger than
M, resulting in six groups and four layers being determined during the solution initialization phase. To combine the six routes, first, from the bottom layer, connect
$P\left({G}_{1311}\right)$ with
$P\left({G}_{1312}\right)$ sequentially, and obtain
$P\left({G}_{131}\right)=\{P\left({G}_{1311}\right),P\left({G}_{1312}\right)\}$. Then, in the third layer, connect
$P\left({G}_{131}\right)$ with
$P\left({G}_{132}\right)$, then
$P\left({G}_{13}\right)=\{P\left({G}_{131}\right),P\left({G}_{132}\right)\}$. Following these steps, the path for the 100-city TSP is eventually
$\{P\left({G}_{1311}\right),P\left({G}_{1312}\right),P\left({G}_{132}\right),P\left({G}_{11}\right),P\left({G}_{12}\right),P\left({G}_{2}\right)\}$.
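The bottom-up combination just described is a depth-first concatenation of leaf routes. Assuming each layer is stored as a nested list (inner lists are sub-clusters ordered by visit order; leaves are solved routes of city labels), a minimal sketch:

```python
def combine(node):
    """Depth-first concatenation of leaf routes into one path."""
    # Leaf: a solved route, i.e. a flat list of city indices.
    if all(isinstance(x, int) for x in node):
        return list(node)
    # Inner node: children are already sorted by visit order.
    path = []
    for child in node:
        path.extend(combine(child))
    return path

# Mirroring the example's layering: G1 contains G13, G11, G12 (in visit order),
# G13 contains G131 and G132, and G131 contains G1311 and G1312.
# The city labels below are illustrative, not from the paper.
tree = [[[[[1, 2], [3]], [4, 5]], [6], [7, 8]], [9, 10]]
print(combine(tree))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```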
4.2. Two Phases 2Opt for Solution Optimization
Because of the clustering algorithm used, the solution obtained in the solution initialization stage is somewhat rough and can be further optimized. In [
66], Liao and Liu first applied the 2-opt and 3-opt operators to optimize the initial route with the clustering algorithm involved, and the numerical studies show a marked improvement when
k-opt is used. Nevertheless, when the number of cities in the problem is exceptionally large,
k-opt struggles to work.
To improve the quality of the initial solution in an affordable time, a two-phase simplified 2-opt algorithm (TS_2opt) is given in Algorithm 7. TS_2opt aims to optimize the routes and orders of all the groups belonging to a common cluster at a higher layer. Once the solution is initialized, TS_2opt is used to optimize the route of each group in the penultimate layer, and this is repeated layer by layer until the top layer is optimized. As depicted in
Figure 3, the green lines show the workflow of solution optimization. Firstly, from the bottom layer, the routes
$P\left({G}_{1311}\right)$ and
$P\left({G}_{1312}\right)$ are combined by TS_2opt into the locally optimal route
$P{\left({G}_{131}\right)}^{opt}$. Then, the two routes in the third layer are also optimized to
$P{\left({G}_{13}\right)}^{opt}$ by using TS_2opt. These steps are followed until the final solution
$P{\left(G\right)}^{opt}$ is obtained.
Algorithm 7 Two-phase simplified 2-opt algorithm.
Input: A batch of groups $\{{G}_{i\cdots j1},{G}_{i\cdots j2},\cdots ,{G}_{i\cdots jh}\}$, whose visit order is assumed to be $1,2,\cdots ,h$, and their travel routes $\{P\left({G}_{i\cdots j1}\right),P\left({G}_{i\cdots j2}\right),\cdots ,P\left({G}_{i\cdots jh}\right)\}$. Initialize parameters: the first-phase max iteration ${L}_{1}$; the second-phase max iteration ${L}_{2}$; the length R selected for optimization.
Output: An optimized route $P\left({G}_{i\cdots j}\right)$ for ${G}_{i\cdots j}$.
1: Compute the distance ${d}_{bks}$ of the tour $\{P\left({G}_{i\cdots j1}\right),P\left({G}_{i\cdots j2}\right),\cdots ,P\left({G}_{i\cdots jh}\right)\}$.
2: for $ite{r}_{1}=1$ to ${L}_{1}$ do
3:  Randomly generate two different integers ${p}_{1}$, ${p}_{2}$ in $[2,h-1]$.
4:  Denote the route between ${G}_{i\cdots j{p}_{1}}$ and ${G}_{i\cdots j{p}_{2}}$ by ${P}_{{p}_{1}}^{{p}_{2}}$, the route between ${G}_{i\cdots j1}$ and ${G}_{i\cdots j({p}_{1}-1)}$ by ${P}_{1}^{{p}_{1}-1}$, and the route between ${G}_{i\cdots j({p}_{2}+1)}$ and ${G}_{i\cdots jh}$ by ${P}_{{p}_{2}+1}^{h}$.
5:  Invert ${P}_{{p}_{1}}^{{p}_{2}}$ and denote the new route by $Inv\left({P}_{{p}_{1}}^{{p}_{2}}\right)$.
6:  Generate two routes ${P}_{1}$ and ${P}_{2}$, where ${P}_{1}$ is formed by the last R elements of ${P}_{1}^{{p}_{1}-1}$ and the first R elements of $Inv\left({P}_{{p}_{1}}^{{p}_{2}}\right)$, and ${P}_{2}$ is formed by the last R elements of $Inv\left({P}_{{p}_{1}}^{{p}_{2}}\right)$ and the first R elements of ${P}_{{p}_{2}+1}^{h}$. Denote the new order of the groups by $\{O\left({G}_{i\cdots j1}\right),O\left({G}_{i\cdots j2}\right),\cdots ,O\left({G}_{i\cdots jh}\right)\}$ and their sizes by $\{S\left({G}_{i\cdots j1}\right),S\left({G}_{i\cdots j2}\right),\cdots ,S\left({G}_{i\cdots jh}\right)\}$.
7:  Apply Algorithm 5 with max iteration number ${L}_{2}$ to optimize ${P}_{1}$ and ${P}_{2}$; denote the new routes by ${P}_{1}^{opt}$ and ${P}_{2}^{opt}$.
8:  Replace ${P}_{1}$ and ${P}_{2}$ in $\{{P}_{1}^{{p}_{1}-1},Inv\left({P}_{{p}_{1}}^{{p}_{2}}\right),{P}_{{p}_{2}+1}^{h}\}$ with ${P}_{1}^{opt}$ and ${P}_{2}^{opt}$, respectively, and denote the new route by ${P}_{opt}$.
9:  Compute the distance ${d}_{opt}$ of ${P}_{opt}$.
10:  if ${d}_{bks}>{d}_{opt}$ then
11:   Assign ${d}_{opt}$ to ${d}_{bks}$.
12:   Divide ${P}_{opt}$ into h segments $\{{P}_{{m}_{1}},{P}_{{m}_{2}},\cdots ,{P}_{{m}_{h}}\}$, where $S\left({P}_{{m}_{k}}\right)$ equals $S\left({G}_{i\cdots jr}\right)$ with $r={m}_{k}$.
13:   Replace $\{P\left({G}_{i\cdots j1}\right),P\left({G}_{i\cdots j2}\right),\cdots ,P\left({G}_{i\cdots jh}\right)\}$ by $\{{P}_{{m}_{1}},{P}_{{m}_{2}},\cdots ,{P}_{{m}_{h}}\}$.
14:  end if
15: end for
16: Output $P\left({G}_{i\cdots j}\right)={P}_{opt}$.
Suppose there are three groups
$\{{G}_{11},{G}_{12},{G}_{13}\}$ belonging to the same higher group
${G}_{1}$, and the visit orders of them are
$\{2,3,1\}$, respectively.
Figure 4 illustrates the main processing of TS_2opt in detail. Each cluster is represented by a different color, and the start and end locations are marked by larger shapes. In Step 1 of
Figure 4, the three routes are arranged in order; assuming
${G}_{11}$ is chosen, the path of
${G}_{11}$ is inverted. In Step 2, the segments at the junctions of the clusters are determined according to
R, where
R equals 5 for simplicity. The next step is to optimize the two segments provided by Step 2. In Step 4, three new routes are generated according to Step 3 and the input routes. Once all four steps have been completed, the process returns to Step 1 until the termination condition is met.
We note that the purpose of TS_2opt is not to reach the global optimum, but rather to optimize the visit orders and junctions between groups that belong to the same group at the higher layer. Despite sacrificing some precision, the computation speed of TS_2opt is very high, which is critical for large-scale TSPs.
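For reference, the classical 2-opt move that TS_2opt builds on can be sketched as follows; this single-phase version scans all segment reversals and accepts only improving ones, whereas Algorithm 7 restricts the optimization to segments of length R around the junctions.

```python
import math

def tour_length(tour, coords):
    """Length of the closed tour over the given city indices."""
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(tour, coords, max_passes=50):
    """Repeatedly reverse segments while an improving 2-opt move exists."""
    tour = tour[:]
    for _ in range(max_passes):
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                if tour_length(cand, coords) < tour_length(tour, coords) - 1e-12:
                    tour, improved = cand, True
        if not improved:
            break
    return tour

# A deliberately crossed tour over the unit square is uncrossed to the perimeter.
coords = [(0, 0), (1, 1), (1, 0), (0, 1)]
print(tour_length(two_opt([0, 1, 2, 3], coords), coords))  # 4.0
```

Restricting the reversals to junction neighbourhoods, as TS_2opt does, trades this exhaustive scan for speed on large instances.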
4.3. Parallelizability and Computational Complexity Analysis
We now demonstrate the high parallelizability of the proposed ALC_IGA. In the solution initialization phase, the operations for clusters are independent within each layer; the operations on subgroups that do not belong to the same cluster are also independent across layers. As an illustration, there are three tasks in the third layer shown in
Figure 3: finding the visit routes for
${G}_{11}$ and
${G}_{12}$, and applying
k-means to divide
${G}_{13}$ into small groups. As they are independent of one another, if there are three or more CPU cores, they can be computed on different cores simultaneously. Furthermore, if
k-means is faster than the other two tasks, then the computations of
${G}_{131}$ and
${G}_{132}$ in the next layer can also be allocated to the free cores even if
$P\left({G}_{11}\right)$ and
$P\left({G}_{12}\right)$ are still being calculated.
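The independence argument above translates directly into task-based parallelism. The sketch below uses Python's concurrent.futures as a stand-in for the MATLAB parallel computing toolbox used in the experiments, with a hypothetical nearest-neighbour dummy solver per cluster; threads illustrate task dispatch only, and a process pool would be the natural choice for CPU-bound solvers.

```python
from concurrent.futures import ThreadPoolExecutor
import math

def solve_cluster(points):
    """Stand-in for IGA on one cluster: nearest-neighbour path."""
    left, path = points[1:], [points[0]]
    while left:
        nxt = min(range(len(left)), key=lambda j: math.dist(path[-1], left[j]))
        path.append(left.pop(nxt))
    return path

# Illustrative clusters; in ALC_IGA these would come from k-means.
clusters = [
    [(0, 0), (0, 1), (0, 2)],   # e.g. G11
    [(5, 0), (5, 1)],           # e.g. G12
    [(9, 0), (9, 1), (9, 2)],   # e.g. a cluster produced by a further split
]

# Independent clusters can be dispatched to free cores simultaneously.
with ThreadPoolExecutor(max_workers=3) as pool:
    routes = list(pool.map(solve_cluster, clusters))

print([len(r) for r in routes])  # [3, 2, 3]
```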
In the second phase of ALC_IGA, solution optimization can also be parallelized, but the parallel effectiveness is not as high as in the first phase. Firstly, the only complex calculation in solution optimization is the optimization of the junctions, but there are only two junctions in each iteration, so parallel computing is unnecessary. Secondly, the optimization of the solution starts from the bottom and ends at the top layer, and the higher-layer optimizations must wait for the lower-layer optimizations to finish. In the example shown in
Figure 3, there is only one task in the fourth layer, which is connecting
${G}_{1311}$ and
${G}_{1312}$. Because the route of
${G}_{131}$ is not determined before the computation of the fourth layer is finished, the free cores cannot be used to combine
${G}_{131}$ and
${G}_{132}$ in the third layer.
Notwithstanding, parallel techniques can be used within each layer to speed up computation when the scale of the problem is very large. The computational complexity of the major stages of the proposed ALC_IGA is analyzed in the remainder of this subsection.
We recall that the time complexity of k-means is known to be $O\left(NKID\right)$, where N is the number of points, K is the number of clusters, I is the specified maximum number of iterations, and D is the number of dimensions. For the sake of simplicity, we assume that there are n nodes in the TSP, that m and k are two positive integers, that ${T}_{1}$ is the maximum run time of the IGA for solving an m-node TSP, and that I and D are fixed. We then examine the time complexity in two cases.
In the best-case scenario, we assume that $n={m}^{k}$ and that each use of clustering divides a cluster into m subclusters of equal size. Firstly, in the second layer, IGA is used once to obtain the visit order of the clusters, and in the third layer it is used m times. We deduce that the total number of IGA calls is $1+m+{m}^{2}+\cdots +{m}^{k-1}$, which by $n={m}^{k}$ equals $\frac{n-1}{m-1}$; hence the upper bound on the total IGA time is $\frac{n-1}{m-1}{T}_{1}$, which is $O\left(n\right)$. Secondly, in the top layer, the time complexity of k-means is $O\left(nm\right)$; in the second layer, it is $O\left(\frac{n}{m}\cdot m\right)\cdot m$, which is again $O\left(nm\right)$. We can thus infer that the k-means time complexity of each layer is always $O\left(nm\right)$. Since there are $k={log}_{m}n$ layers, the total time complexity of ALC_IGA is $O\left(mn\,{log}_{m}n\right)$, and since m is a given constant, the time complexity of ALC_IGA is $O(n\log n)$.
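The geometric-series count of IGA calls in the best case can be checked numerically: assuming $n = m^k$ with equal splits, the number of calls across all layers is $1+m+\cdots+m^{k-1} = (n-1)/(m-1)$.

```python
m, k = 4, 6                              # branching factor and number of layers
n = m ** k                               # best-case problem size
calls = sum(m ** i for i in range(k))    # one IGA call per internal cluster
assert calls == (n - 1) // (m - 1)       # closed form of the geometric series
print(n, calls)  # 4096 1365
```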
In the worst case, each clustering step produces $m-1$ groups containing a single city and one group containing all the other cities. In this condition, the numbers of k-means and IGA calls are both far greater than in the best case. Suppose $n=k(m-1)+m$; then there will be k clustering steps and $k+1$ IGA calls. The total IGA time is no more than $(k+1){T}_{1}$, which is $O\left(n\right)$. Similar to the best-case analysis, the computational complexity of clustering in the worst condition is $O\left(mn\right)+O\left(m\left(n-(m-1)\right)\right)+\cdots +O\left(m\left(n-(k-1)(m-1)\right)\right)$; by some calculation, we obtain that the time complexity of the k-means calls is $O\left({n}^{2}\right)$. Accordingly, the computational complexity of ALC_IGA in the worst condition is $O\left({n}^{2}\right)$.
In summary, the computational complexity of ALC_IGA ranges from
$O(n\log n)$ to
$O\left({n}^{2}\right)$. In the majority of cases, however, it is closer to
$O(n\log n)$. This is supported by the numerical experiments presented in
Section 5.
Remark 3. Compared with the algorithms in [7,51,55], ALC_IGA exhibits several innovations and advantages, as follows: As a tool for solving sub-TSPs, IGA improves on existing techniques in several aspects and shows significant improvements over GA [51] and ACS [7] on small-scale TSP problems; see the experiments in Section 5. ALC_IGA requires attention to only one parameter: the maximum number of clusters for k-means. This simplicity is more convenient than that of two- or three-layered algorithms and is crucial for solving large-scale TSPs.
Based on the characteristics of layered-clustering computation, we have proposed a fast fine-tuning algorithm; this step was not introduced in [7,51,55]. By applying adaptive layered clustering, we are able to analyze the time complexity of ALC_IGA, which remains challenging for two- or three-layered algorithms.
5. Numerical Results and Discussions
Four sets of numerical experiments are presented in this paper to illustrate the effectiveness of ALC_IGA. First,
Section 5.4 shows that IGA is substantially superior to GA and ACS in terms of accuracy and convergence speed. The impact of the primary parameter settings on ALC_IGA is examined in the second part. The third part demonstrates the superiority of ALC_IGA over two two-layered algorithms from the literature on medium-scale benchmark datasets. The last part shows the excellent performance and parallelizability of the proposed ALC_IGA in comparison with some representative algorithms.
5.1. Experimental Setting
In this study, all experiments were run on a Dell PowerEdge R620 with two Intel Xeon E5-2680 v2 10-core processors and 64.0 GB of 1066 MHz DDR3 memory under the Windows 10 OS. The speed of all cores was locked at 2.80 GHz, with turbo boost and hyper-threading disabled, to ensure the fairness and stability of the numerical experiments. All programs were edited and run in MATLAB R2020a; the parallel technique used is the parallel computing toolbox in MATLAB, and only the experiments in
Section 5.7 were run in parallel. By default, each instance was computed 20 times under the same settings. In detail, if the algorithm was single-threaded, the instance was executed on 20 cores simultaneously; if the algorithm was multi-threaded, the runs were performed one by one. The sources of GA, ACS [
27], IGA, the two-level genetic algorithm (TLGA) [
51], TLACS [
7], and ALC_IGA are published on GitHub (
https://github.com/nefphys/tsp, published on 4 January 2023), and the instances involved are also on this repository.
5.2. Benchmark Instances
For the various experimental tasks, the instances are classified into three categories: small-scale TSPs $(n\le 500)$, medium-scale TSPs $(500<n\le 4\times {10}^{4})$, and large-scale TSPs $(n>4\times {10}^{4})$. Small-scale TSPs were used to study the effectiveness of IGA; medium-scale TSPs were employed to tune parameters and compare ALC_IGA with TLACS and TLGA in a single thread; and large-scale TSPs were adopted to compare ALC_IGA with some relevant algorithms in parallel and verify its efficiency.
5.3. Evaluation Criteria
The following are the evaluation criteria for the algorithmic analyses on instances:
The minimum objective value among all runs: ${R}_{best}$.
The average objective value among all runs: ${R}_{avg}$.
The standard deviation of results among all runs: ${R}_{std}$.
The best known solution of the instance: $BKS$.
The deviation percentage of
${R}_{best}$, defined by $P{D}_{best}=\frac{{R}_{best}-BKS}{BKS}\times 100\%$.
The deviation percentage of
${R}_{avg}$, defined by $P{D}_{avg}=\frac{{R}_{avg}-BKS}{BKS}\times 100\%$.
The running time ${T}_{Rb}$ in seconds when ${R}_{best}$ was found.
The average of the running time in seconds among all runs: ${T}_{avg}$.
The counts of the best ${R}_{best}$, ${R}_{avg}$, ${R}_{std}$, and ${T}_{avg}$, denoted by ${C}_{Rb}$, ${C}_{Ra}$, ${C}_{std}$, and ${C}_{Ta}$.
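For concreteness, the two deviation percentages can be computed as in the following hypothetical helper (the run values and BKS below are illustrative, not taken from the paper's tables):

```python
def deviation_percent(value, bks):
    """PD = (value - BKS) / BKS * 100, for either R_best or R_avg."""
    return (value - bks) / bks * 100.0

runs = [430.2, 428.0, 431.5, 428.9]  # illustrative objective values over 20 runs
bks = 426.0                          # illustrative best known solution
r_best, r_avg = min(runs), sum(runs) / len(runs)
print(round(deviation_percent(r_best, bks), 3))  # 0.469
```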
5.4. Performance Comparison of IGA, GA, and ACS
In addition to clustering, the most time-consuming part of ALC_IGA is solving the sub-TSPs; that is why IGA is proposed. To illustrate that IGA is efficient on TSPs, a comparison of IGA, GA, and ACS is imperative, and 42 small-scale benchmark instances were used in this numerical comparison. The parameter settings of IGA were as follows: the population was set to 0.4 times the number of nodes; the maximum number of iterations for S_2opt was set to 20 times the number of nodes; the parameters of the selection operator,
${r}_{1}$ and
${r}_{2}$, were set to 0.15 and 0.5, respectively; and the probability of mutation was set to 0.05. The population size of GA was set to 0.8 times the size of the instance, and the mutation number was always set at three individuals. The parameter settings of ACS are the same as in the literature [
7]. Finally, the termination condition for the three compared algorithms is that there has been no improvement in the population for
X iterations. In this experiment,
X was set to 100, 100, and
${10}^{4}$ for IGA, ACS, and GA, respectively. The results of the comparison without parallelization are displayed in
Table 2, and various evaluation criteria were considered, including
${R}_{best}$,
$P{D}_{best}$,
${R}_{avg}$,
$P{D}_{avg}$,
${R}_{std}$,
${T}_{Rb}$,
${T}_{avg}$,
${C}_{Rb}/{C}_{Ra}/{C}_{std}/{C}_{Ta}$, and the average value for
$P{D}_{avg}$,
${R}_{std}$, and
${T}_{avg}$.
From
Table 2, the
${C}_{Rb}/{C}_{Ra}/{C}_{std}/{C}_{Ta}$ of IGA, GA, and ACS are
$42/42/39/41$,
$2/0/0/0$, and
$1/0/3/1$, respectively. It is clear that the innovative IGA consistently produces superior results over GA and ACS. Additionally, the average computation time of IGA is the lowest in 97% of instances, and its stability is also far higher than that of the other two algorithms. More specifically, the average
$P{D}_{best}$ of IGA is 0.27%, while those of GA and ACS are 2.79% and 5.19%, respectively, about 10 and 19 times that of IGA. In almost all cases, the
$P{D}_{avg}$ of IGA is less than 2%, but those of GA and ACS are often greater than 5%, especially ACS, which even exceeds 10% in some instances. In terms of stability, the average of the evaluation criterion
${R}_{std}$ of IGA is 125.45, only 22.56% of GA and 63.52% of ACS. The average computation time of IGA is 90.43 s, which is less than onesixth as long as GA or half as long as ACS. The above discussion indicates that all the accuracy and the convergence speeds of IGA are substantially superior to the traditional GA and ACS, which proves that the proposed IGA can reduce the computation time and improve the solution of ALC_IGA.
In Figure 5, the convergence speeds of IGA, GA, and ACS are compared on four instances with sizes ranging from 51 to 226. It can be observed that the convergence speed of IGA in the initial stage is much faster than that of GA and ACS, owing to the combination of the heuristic crossover SBHX and the local search S_2opt in IGA.
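For readers unfamiliar with the underlying move, a minimal first-improvement 2-opt pass is sketched below; the paper's S_2opt adds a selection strategy on top of moves of this kind, so this is a generic illustration, not the authors' exact operator:

```python
# Generic 2-opt local search: reverse a tour segment whenever exchanging two
# edges shortens the tour. The paper's S_2opt is a refined variant of this idea.
import math

def tour_length(tour, pts):
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt_pass(tour, pts):
    """One sweep of first-improvement 2-opt moves; returns True if improved."""
    n = len(tour)
    improved = False
    for i in range(n - 1):
        for j in range(i + 2, n - (i == 0)):  # avoid pairing an edge with itself
            a, b = tour[i], tour[i + 1]
            c, d = tour[j], tour[(j + 1) % n]
            # gain of replacing edges (a,b),(c,d) with (a,c),(b,d)
            if math.dist(pts[a], pts[b]) + math.dist(pts[c], pts[d]) > \
               math.dist(pts[a], pts[c]) + math.dist(pts[b], pts[d]) + 1e-12:
                tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                improved = True
    return improved

pts = [(0, 0), (0, 1), (1, 0), (1, 1)]
tour = [0, 1, 2, 3]            # self-crossing tour of length 2 + 2*sqrt(2)
while two_opt_pass(tour, pts):
    pass
print(tour_length(tour, pts))  # the uncrossed square has length 4.0
```

Repeating such passes until no move improves the tour is what makes plain 2-opt slow on large instances, which motivates capping its iterations inside IGA.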
As stated in Section 3, the suggested IGA can be utilized to solve the WSP with just a minor adjustment to the distance between the start and end cities. In this subsection, to validate the effectiveness of IGA for the WSP, the 42 instances in Table 2 were reinvestigated. The start and end cities of these instances were determined using the first and last elements of the best known solutions provided by TSPLIB and the TSP test data, and the distances between start and end cities were set to $-{10}^{5}$. The benchmark algorithm is the famous TSP solver LKH proposed by Helsgaun [49]. The results, which include ${R}_{best}$, $P{D}_{best}$, ${R}_{avg}$, $P{D}_{avg}$, ${R}_{worst}$, ${R}_{std}$, ${T}_{Rb}$, and ${T}_{avg}$, are shown in Table 3.
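The adjustment described above reduces a WSP (an open path between fixed start and end cities) to a TSP: giving the start-end edge a large negative weight forces it into any optimal cycle, and cutting that edge leaves the desired path. A small sketch of this reduction, using a brute-force solver as a hypothetical stand-in for IGA:

```python
# WSP-to-TSP reduction: make the start-end edge irresistible (weight -1e5, as
# in the experiment), solve as a TSP, and the forced edge closes the path.
import itertools, math

def wsp_matrix(pts, start, end, bonus=-1e5):
    n = len(pts)
    d = [[math.dist(pts[i], pts[j]) for j in range(n)] for i in range(n)]
    d[start][end] = d[end][start] = bonus   # force this edge into the optimum
    return d

def brute_force_tsp(d):
    """Exact solver for tiny instances, standing in for IGA here."""
    n = len(d)
    best = min(itertools.permutations(range(1, n)),
               key=lambda p: sum(d[a][b] for a, b in zip((0,) + p, p + (0,))))
    return (0,) + best

pts = [(0, 0), (3, 0), (3, 1), (0, 1), (1, 2)]
tour = brute_force_tsp(wsp_matrix(pts, start=0, end=4))
i, j = tour.index(0), tour.index(4)
print((i - j) % len(pts) in (1, len(pts) - 1))  # True: the forced edge is in the cycle
```

Cutting the forced edge out of the returned cycle yields the open start-to-end route that the WSP asks for.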
It is clear from Table 3 that the IGA can produce solutions of the WSP with a high level of accuracy. All $P{D}_{best}$ values are lower than 1%, 25 out of 42 are less than 0.1%, and 18 out of 42 are as good as LKH. The outcomes on WSPs are even superior to those of IGA on TSPs in some aspects. In detail, the averages of $P{D}_{best}$, ${R}_{std}$, and ${T}_{avg}$ are 0.2%, 134.28, and 81.83, respectively, compared with 0.27%, 125.45, and 90.43 on TSPs, indicating that the IGA is able to find better solutions on WSPs in a shorter time than on TSPs. On d493 in particular, the average execution time ${T}_{avg}$ of IGA on the WSP is only 473.19 s, whereas it is 650.09 s on the TSP.
According to the aforementioned analyses, the proposed IGA significantly outperforms GA and ACS in terms of convergence speed, solution quality, and stability. Additionally, IGA also performed very well on the WSP, which arises frequently within ALC_IGA.
5.5. Parameter Tuning for ALC_IGA
As the solution initialization phase of ALC_IGA in Section 4.1 shows, the main parameter of the first phase is $M$, which limits the time required to solve each TSP or WSP to less than ${T}_{1}$. The results from the previous subsection show that, under ordinary conditions, the IGA can handle TSPs with fewer than 100 nodes in 6 s and TSPs with fewer than 150 nodes in 20 s. Consequently, a reasonable $M$ should not greatly exceed 150. To choose a favorable $M$ for ALC_IGA that balances computation time and solution quality, a numerical comparison with $M$ set to 50, 100, and 150 on 45 instances is considered in this subsection. These instances were medium-scale, with sizes ranging from $1.3\times {10}^{3}$ to $2.5\times {10}^{4}$. Because the distribution of the nodes greatly affects the clustering effect, a variety of instances from TSPLIB, the TSP test data, and the TNM data were studied in this experiment in order to fairly assess the influence of $M$ on the results of ALC_IGA. In the following subsections of this paper, the termination condition of IGA is that there has been no improvement in the population for 30 iterations, and the other parameters are the same as in the previous subsection. Denoting the ALC_IGA with $M=50,100,150$ as ALC_IGA50, ALC_IGA100, and ALC_IGA150, respectively, the five major evaluation criteria ${R}_{best}$, $P{D}_{best}$, ${R}_{avg}$, $P{D}_{avg}$, and ${T}_{avg}$, together with ${C}_{Rb}/{C}_{Ra}/{C}_{Ta}$, of the results, which ran without parallelization, are presented in Table 4.
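The role of $M$ can be pictured as a size cap on the sub-problems produced by the first phase: clusters larger than $M$ keep being split. A rough sketch of this idea, assuming a plain recursive bisection with k-means (the exact layered procedure of Section 4.1 may differ in its choice of k and bookkeeping):

```python
# Adaptive splitting sketch: bisect any cluster larger than M with k-means
# until every sub-problem has at most M nodes.
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns the non-empty groups."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            groups[i].append(p)
        centers = [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
                   if g else centers[gi] for gi, g in enumerate(groups)]
    return [g for g in groups if g]

def adaptive_split(points, M):
    """Recursively bisect clusters until each has at most M nodes."""
    if len(points) <= M:
        return [points]
    parts = kmeans(points, 2)
    if len(parts) < 2:                      # degenerate split: fall back to halving
        mid = len(points) // 2
        parts = [points[:mid], points[mid:]]
    clusters = []
    for part in parts:
        clusters.extend(adaptive_split(part, M))
    return clusters

rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(200)]
clusters = adaptive_split(pts, 25)
print(max(len(c) for c in clusters) <= 25)  # True: every sub-problem fits the cap
```

A larger $M$ yields fewer, bigger sub-problems, which explains the time/quality trade-off examined in Table 4.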
From Table 4, the ${C}_{Rb}/{C}_{Ra}/{C}_{Ta}$ of ALC_IGA50, ALC_IGA100, and ALC_IGA150 are $3/3/45$, $5/4/0$, and $37/38/0$, respectively. As can be seen, ALC_IGA50 is the fastest, whereas ALC_IGA150 usually produces the best results. When the size of the instance is less than $2\times {10}^{3}$, ALC_IGA50 has the minimum $P{D}_{best}$ and $P{D}_{avg}$ on fl1400, and ALC_IGA100 has the lowest $P{D}_{best}$ on dca1389 and dkd1973. However, the $P{D}_{best}$ and $P{D}_{avg}$ of ALC_IGA150 on these three instances are all less than 10%, which is still a respectable result. When the instance size is larger than $2\times {10}^{3}$, ALC_IGA50 and ALC_IGA100 only perform better than ALC_IGA150 on the TNM instances. Specifically, ALC_IGA50 works well on Tnm2002 and Tnm4000, ALC_IGA100 excels on Tnm6001, Tnm8002, and Tnm10000, but ALC_IGA150 provides the best result on the large instance Tnm20002. The results of ALC_IGA150 are therefore superior to those of ALC_IGA50 and ALC_IGA100 on TSPLIB and the TSP test data, and it is still a suitable approach for the TNM data. The averages of $P{D}_{best}$ and $P{D}_{avg}$ for the three algorithms, shown at the bottom of Table 4, also support this.
Furthermore, considering the algorithms' running times, the mean ${T}_{avg}$ of ALC_IGA50 is 91.02 s, roughly three-fifths of that of ALC_IGA100 and two-fifths of that of ALC_IGA150. This indicates that ALC_IGA50 is the fastest algorithm, and the ratio of running times hardly changes with the size of the instance. However, even the slowest variant, ALC_IGA150, could handle a ${10}^{4}$-node instance with only about a 10% deviation percentage in the same running time that the IGA needs for an instance of roughly 400 nodes. The fastest variant, ALC_IGA50, which is more than 60 times faster than the IGA, can deal with $2.5\times {10}^{4}$ nodes in the same amount of time. Thus, the high efficiency of ALC_IGA has been verified.
Figure 6 displays the deviation percentage of each run among all instances. It is noteworthy that for all three algorithms, most of the deviation percentages are under 20%. In particular, the deviation percentages of the ALC_IGA100 and ALC_IGA150 are less than 10% in the majority of instances. Furthermore, the figure also reveals that the ALC_IGA100 and ALC_IGA150 have many overlapping regions, indicating that the performance of the two algorithms is roughly equivalent.
Additionally, the relationship between the running time of ALC_IGA and the value of $M$ was examined. The average execution times of the three algorithms over the instances are plotted in Figure 7 in different colors. To discuss the computational complexity of the algorithms, a power-law curve fit was calculated for each group. Because the computation time of ALC_IGA150 is larger than that of the other two, its slope in the figure is naturally the steepest. The approximated time complexities of ALC_IGA50, ALC_IGA100, and ALC_IGA150 are $O\left({n}^{0.9992}\right)$, $O\left({n}^{0.9958}\right)$, and $O\left({n}^{1.02}\right)$, respectively, all extremely close to the linear computational complexity $O\left(n\right)$. With 95% confidence bounds, the upper bound of the fitted exponent for ALC_IGA50 is 1.0326, and those of the other two are 1.0963 and 1.151, respectively. The statistical outcomes of the curve fitting are shown in Table 5. It can be seen that all three fitted models have high confidence, especially that of ALC_IGA50, whose ${R}^{2}$ is over 0.99. These results confirm the computational complexity analysis of the proposed ALC_IGA in Section 4.3.
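An empirical complexity exponent of this kind can be estimated by an ordinary least-squares line fit on log-log data. The sketch below illustrates the technique only; the timing data are made up for illustration, not the paper's measurements:

```python
# Estimate the exponent a in time ~ c * n**a by least squares on log-log data.
import math

def fit_exponent(sizes, times):
    """Return the slope of log(time) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

sizes = [2000, 5000, 10000, 20000]
times = [0.004 * n for n in sizes]          # synthetic, perfectly linear data
print(round(fit_exponent(sizes, times), 3))  # exponent 1.0 for linear scaling
```

Confidence bounds on the exponent, as reported in Table 5, come from the standard errors of this regression.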
To sum up, the quality of the solution obtained by ALC_IGA depends strongly on the data distribution and the value of $M$. On the other hand, the numerical experiments show that the larger $M$ is, the longer the computation time ALC_IGA requires. In most cases, setting $M$ to 100 is a sensible compromise between computation time and solution quality.
5.6. ALC_IGA Compared with Two-Layered Algorithms
The effectiveness of ALC_IGA on medium-scale problems was confirmed in Section 5.5, although it remained unclear whether it is superior to other layered algorithms. To illustrate the performance of ALC_IGA, it was compared with two typical algorithms, TLGA [51] and TLACS [7]. The TLGA and TLACS were recoded in Matlab and, to be fair, their running times and solution quality were improved beyond those reported in the literature. The main parameters were set as follows: the $M$ of ALC_IGA was set to 100; the numbers of cluster centers of TLACS and TLGA were automatically adjusted according to the size of the instance; and the termination conditions of ALC_IGA, TLACS, and TLGA were that there had been no improvement of the solution for 30, 30, and 100 iterations, respectively. All of the algorithms were implemented in a single thread. A total of 45 medium-scale instances with sizes ranging from $1\times {10}^{3}$ to $4\times {10}^{5}$ were investigated in this experiment.
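The shared stopping rule of the compared algorithms, "no improvement for $k$ consecutive iterations", can be sketched as a small driver loop (a generic illustration; the solvers' internals are abstracted behind a hypothetical `step` callback):

```python
# "No improvement for k iterations" stopping rule, as used by the compared
# algorithms (k = 30 for ALC_IGA and TLACS, 100 for TLGA).
def run_until_stagnant(step, initial_best, k):
    """Call step() repeatedly; stop after k consecutive non-improving iterations.

    step: performs one generation and returns the best cost found so far
    """
    best = initial_best
    stagnant = 0
    iterations = 0
    while stagnant < k:
        cost = step()
        iterations += 1
        if cost < best:
            best, stagnant = cost, 0
        else:
            stagnant += 1
    return best, iterations

# Toy solver: improves for five iterations, then stalls; with k = 3 the loop
# stops three iterations after the last improvement.
costs = iter([9, 8, 7, 6, 5, 5, 5, 5, 5, 5])
best, iters = run_until_stagnant(lambda: next(costs), float("inf"), 3)
```

A larger $k$ trades longer runs for a lower risk of stopping just before a late improvement, which is why the slower-converging TLGA uses $k=100$.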
As shown in Table 6, the evaluation criteria ${C}_{Rb}/{C}_{Ra}/{C}_{Ta}$ of ALC_IGA are $41/40/30$, those of TLACS are $4/5/15$, and those of TLGA are $0/0/0$. First of all, it should be pointed out that TLGA has no advantage over the other two algorithms on any instance in terms of solution quality and convergence speed. TLACS obtained the best $P{D}_{best}$ on four and the best $P{D}_{avg}$ on five of the 45 instances. In detail, TLACS outperforms ALC_IGA on fl1400 and fl1577, but ALC_IGA defeats TLACS on fl3795. The other three instances where TLACS performs better are all hard-to-solve instances [69]. That is because the fewer clusters generated, the better the solution produced, which is consistent with the results in Section 5.5. The averages of $P{D}_{best}$ and $P{D}_{avg}$ for ALC_IGA are 8.51 and 9.74, whereas for TLACS and TLGA, they are 12.89 and 14.10, and 88.84 and 102.43, respectively. The analyses above verify that the accuracy of ALC_IGA is superior to that of TLACS and TLGA in all scenarios except the TNM instances.
From Table 6, the average ${T}_{avg}$ values of ALC_IGA, TLACS, and TLGA are 209.98, 489.48, and 1020.86 s, respectively. It can be seen that the proposed ALC_IGA is much faster than the other two algorithms. In detail, when the size of the instance is less than $4.5\times {10}^{3}$, TLACS is faster than ALC_IGA in most cases. When the size of the instance is between $4.5\times {10}^{3}$ and ${10}^{4}$, the running times of ALC_IGA and TLACS are very close. When the size of the instance is larger than ${10}^{4}$, the proposed ALC_IGA has a huge advantage, especially when the problem size is greater than $3\times {10}^{4}$, where the computation time of ALC_IGA is less than one-third of that of TLACS and less than one-fifth of that of TLGA.
Figure 8 converts the large amount of data in Table 6 into an intuitive image. The solid lines represent the $P{D}_{avg}$ and ${T}_{avg}$ of ALC_IGA; they are closer to the horizontal axis, which means that ALC_IGA achieves high performance in both accuracy and convergence speed. The power-law curve fits of the running times for ALC_IGA, TLACS, and TLGA were $O\left({n}^{0.945}\right)$, $O\left({n}^{1.611}\right)$, and $O\left({n}^{1.221}\right)$, respectively. This reveals that the gap in computation time between ALC_IGA and the other two algorithms will widen as the size of the problem increases.
5.7. Results on LargeScale TSP Instances
In this section, to investigate the performance of ALC_IGA on large-scale instances, the new ALC_IGA is compared with TLACS [7], an accelerating genetic algorithm evolution via ant-based mutation and crossover (ERACO) [32], and a 3LMFEAMP [55]. The ALC_IGA and TLACS were implemented in Matlab R2022a and parallelized with the Parallel Computing Toolbox in Matlab. The ERACO was run on an AMD Ryzen 2700 CPU with 16 threads in parallel. The parallel 3LMFEAMP was coded in Python and implemented on a server with a 24-core Intel Xeon CPU and 96 GB RAM. The sizes of the 15 involved instances range from $4\times {10}^{4}$ to $2\times {10}^{5}$.
The results and the five evaluation criteria ${R}_{best}$, $P{D}_{best}$, ${R}_{avg}$, $P{D}_{avg}$, and ${T}_{avg}$ are shown in Table 7. Comparing ALC_IGA with TLACS, the advantage of ALC_IGA in running time is apparent again. The running time of ALC_IGA is roughly one-sixth that of TLACS when the problem size is around $5\times {10}^{4}$, and just one-ninth when the size approaches $2\times {10}^{5}$. The performance of ALC_IGA is better than that of TLACS in most conditions, but TLACS works quite well on the TNM instances.
Four instances were compared with 3LMFEAMP; the results in Table 7 reveal that its performance is very close to that of TLACS, with a difference of about 2% in terms of $P{D}_{best}$ and $P{D}_{avg}$, whereas 3LMFEAMP is far worse than ALC_IGA in terms of convergence speed and solution quality. On the six involved instances, the $P{D}_{best}$ and $P{D}_{avg}$ of the novel intelligence algorithm ERACO exceeded those of ALC_IGA by a factor of 2.5. Additionally, the proposed ALC_IGA runs significantly faster than ERACO.
Figure 9 shows the average computation times and deviation percentages of the four algorithms. It is clear that ALC_IGA performs well in most situations and is significantly faster than the others. According to the results illustrated in Section 5.5, the only drawback of ALC_IGA is on the TNM instances, which can be improved by setting $M$ larger.
Finally, the results of ALC_IGA with $M$ set to 50, 100, and 150 on five huge instances are also given. The ara238025, lra498378, and lrb744710 are three instances containing hundreds of thousands of nodes; they are the very-large-scale integration instances of the TSP test data. The Santa instance, which has 1,437,195 cities, is a benchmark for large-scale TSPs and has been investigated thoroughly by several well-known solvers in [64]. Gaia was published by William Cook in 2019 and includes two million coordinates of stars.
Five evaluation criteria and their averages are presented in Table 8. It shows again that the larger $M$ is, the better the solution obtained and the longer the computation time needed. For ALC_IGA50, ALC_IGA100, and ALC_IGA150, the averages of $P{D}_{best}$ are 13.944, 11.122, and 10.308, respectively, which are extremely close to the corresponding averages of $P{D}_{avg}$. This illustrates the strong stability of ALC_IGA, which the average of ${R}_{std}$ also confirms. With $M$ set to 50 or 100, the $1.4\times {10}^{6}$-node instance could be handled within 1 h on our implementation, and even the large three-dimensional Gaia instance could be solved within 1.5 h.
Figure 10 depicts the best solutions obtained by the ALC_IGA with $M=100$.
6. Conclusions and Discussion
Inspired by two-layered [7,51] and three-layered [55] algorithms for TSPs, the highly parallelizable ALC_IGA is proposed in this paper to solve large-scale TSPs with millions of nodes. In the first phase, ALC_IGA ensures that all sub-TSPs and sub-WSPs are smaller than the specified size by repeatedly applying k-means, thereby reducing the computation time. In the second phase, the TS_2opt is developed to rapidly improve the initial solution. The IGA is also proposed for small-scale TSPs and WSPs, with the following significant modifications: the polygynandry-inspired SBHX is designed for a high convergence speed, and the S_2opt is created to balance convergence speed against falling into local optima. According to the analysis, the computational complexity of ALC_IGA is between $O(n\log n)$ and $O\left({n}^{2}\right)$.
The numerical results on 42 instances show that the proposed IGA is better than both GA and ACS in terms of convergence speed and accuracy, and that it performs better on WSPs than on TSPs. According to the numerical results on numerous instances from diverse sources, in most conditions ALC_IGA outperforms TLGA, TLACS, 3LMFEAMP, and the novel ERACO in terms of precision, stability, and computation speed. The worst case for ALC_IGA is on the hard-to-solve TSP instances, where the errors are still less than 20% and can be reduced by adjusting the parameters.
Mariescu-Istodor and Fränti [64] compared three types of algorithms for solving the large-scale Santa problem within 1 h on an enterprise server without parallelization. They achieved a high-quality solution (111,636 km) using their LKH and grid clustering implementation (https://cs.uef.fi/sipu/soft/tspDiv.zip, accessed on 20 March 2023), which outperforms the best result (121,831 km) obtained by ALC_IGA with parallelization. Moreover, it is worth noting that LKH without clustering achieved a 108,996 km solution, which is over 12% better than our result. As a result, we give the following suggestions for future research:
Combine the adaptive layered clustering framework with LKH and the new techniques in [64] and other references.
Investigate the impact of different clustering algorithms on the quality of solutions.
Explore better parameter-tuning algorithms to enhance solution quality.
Extend ALC_IGA to tackle large-scale ATSPs, CTSPs, DTSPs, and other related problems.