Optimization of Truck-Drone Parcel Delivery Using Metaheuristics

: This research addresses a variant of the traveling salesman problem in drone-based delivery systems known as the TSP-D. The TSP-D is a combinatorial optimization problem in which a truck and a drone collaborate to deliver parcels to customers, with the objective of minimizing the total delivery time. Determining the optimal solution is NP-hard; thus, the size of the problems that can be solved optimally is limited. Therefore, metaheuristics are used to solve the problem. Metaheuristics are adaptive and intelligent algorithms that have proved their success in many similar problems. In this study, a solution to the TSP-D problem using the greedy, randomized adaptive search procedure with two local search alternatives and a self-adaptive neighborhood selection scheme is presented. The proposed approach was tested on 200 instances with different properties from the publicly available “Instances of TSP with Drone” benchmark. Results were evaluated against state-of-the-art algorithms. Non-parametric statistical tests concluded that the proposed approach has comparable performance to the rival algorithms ( p = 0.074) in terms of tour duration. The proposed approach has better or similar performance in instances where the drone and truck have the same speed ( α = 1).


Introduction
Over the last few years, drones have been adopted in many sectors [1,2], especially the commercial sector [3,4]. Drones are now deployed to support parcel delivery, an area that has traditionally been handled by trucks. Moreover, the COVID-19 pandemic has taken a significant toll on people all over the world [5]. Drones have helped people to comply with social distancing rules, as they provide contact-free delivery services. In addition, drones help with transportation logistics in many ways, such as avoiding traffic congestion, allowing for faster deliveries, and providing lower transportation costs [6]. However, there is an upper limit for the parcel weight, as drones cannot carry heavy parcels [7]. Additionally, drones are battery-powered, thereby limiting their delivery ranges. In contrast, truck delivery can work over a longer range, as it is fuel-based, and trucks can carry heavy and large parcels. Nonetheless, traditional truck delivery is slow and has high transportation costs [6]. Combining both truck and drone delivery could minimize the total operational costs and time, thereby enhancing the quality of service. This variant of the traveling salesman problem (TSP) [8,9] is known as the traveling salesman problem with drone (TSP-D) [6,10]. If a customer is served by a truck, it is called a truck delivery, whereas if a customer is served by a drone, it is called a drone delivery. The drone delivery is represented as a tuple < i, j, k >, in which i is the launch node (i.e., the node where the truck launches the drone, as in nodes 1 and 6 in Figure 1b), j is the drone node (i.e., the node that the drone delivers the parcel to, as in nodes 2 and 7 in Figure 1b), and k is the rendezvous node (i.e., where the truck and drone meet again, as in nodes 3 and 9 in Figure 1b); k can either be a customer location or the depot. A sortie refers to a sequence of nodes between a launch node and a rendezvous node. For example, the sequences < 1, 2, 3 > and < 6,7,8,9 > represent two distinct sorties in Figure 1b. The drone has a constant level of endurance, defined as the maximum time that the drone can fly without needing to be charged. < i, j, k > is a feasible solution for a drone delivery when i = j, j = k, k = i, t ij + t jk ≤ e , where t ij is the time taken by the drone to move from i to j, t jk is the time taken by the drone to move from j to k, and e is the drone's endurance. The drone path follows the road on the network for safety. The objective of the TSP-D is to find the tour with the shortest delivery time such that all customer locations are served by either the truck or the drone. The pickup, delivery, and recharging times of the drone are neglected.
There are additional constraints that need to be considered: • Both vehicles must start from and return to the depot; • Customers can only be served once, either by the drone or the truck; • The drone does not have a rendezvous node before the delivery drone node; • The drone cannot serve more than one customer between nodes i and j.
A very similar problem to the TSP-D is the flying sidekick traveling salesman problem (FSTSP) [12]. The FSPSP is also a collaboration between a single truck and a single drone to deliver parcels to customers. The difference between the TSP-D and FSTSP is that in the FSTSP, two metrics are used for the distance-one for the truck (Manhattan distance) and one for the drone (Euclidean distance)-as the drone can fly directly from the launch point to the destination, thereby ignoring road restrictions. In the TSP-D, however, the drone must follow the road network.
The TSP-D is considered an NP-hard problem [13]. Exact methods have been used to solve the TSP-D [13][14][15][16]; however, due to computational limitations, these methods are only feasible on instances of limited sizes. Heuristics and metaheuristics are generally used to minimize the tour costs. Several taxonomies have been proposed for use in metaheuristics [17,18], among which solution and population-based approaches are the most common classifications.
The route-first, partition-second approach has been used in all reviewed studies. In this approach, the TSP route is constructed first, as if there were no drone. Following that, the drone nodes are selected and removed from the truck route. The Concorde TSP solver [19][20][21] is an exact algorithm which is able to find the optimal solution for a traditional TSP problem that includes instances with a large number of nodes. Many studies have used Concorde for the routing step [10,[22][23][24].

The Original Solution to the TSP-D
Agatz et al. [10] have proposed several heuristics that are able to provide fast and acceptable solutions for instances of reasonable sizes. First, they constructed a truck-only tour using either the Concorde TSP solver or Kruskal's minimum spanning tree algorithm. Next, two approaches were proposed to classify some nodes as drone nodes. The first approach was a fast greedy heuristic, and the second approach was an exact partitioning algorithm based on dynamic programming. In both approaches, the order in which the nodes are visited remains unchanged. Both the greedy heuristic and the exact partitioning algorithm depend on the initial TSP tour. Therefore, the authors decided to start with a variety of initial tours, in order to determine whether they could obtain a better solution. They applied several heuristics based on the local search (LS) method, which modified the initial tour. Then they iteratively sought improvements. Several neighborhood functions were considered, which yielded several versions. In one of the versions, all neighborhoods were examined and the best neighbor generated in each iteration was selected.
The authors chose to run their experiments using the most challenging instances, where there were no limitations on the drone range and all locations could be visited by both drone and truck [23]. The greedy approach solved all instances with 100 nodes in a few seconds, but with limited savings in time relative to the original TSP tour. In contrast, the run time of the exact partitioning heuristics method increased tremendously with the number of nodes.

Solution-Based Metaheuristic Approaches to TSP-D
Solution-based metaheuristics have been used in previous studies to solve the TSP-D [6,11,25] and related FSTSP problems [12,[22][23][24]. An initial tour for the simulated annealing algorithm [26] was implemented by Ponza [22] using a nearest-neighbor heuristic. To make a move (i.e., selecting a drone node for a sortie), several neighbors were randomly generated, and the roulette-wheels selection method [27] was used to choose a neighbor. De Freitas and Penna [23] first found the optimal solution for the TSP using the Concorde solver. The initial solution for the FSTSP was then defined, using a heuristic utilizing a greedy approach. This procedure removes a truck customer and assigns it to a drone, effectively partitioning the truck route into sub-routes. The initial solution was optimized using the variable neighborhood search [28]. Seven different neighborhood structures were used and randomly selected, resulting in a change in the original sequence of node visits. Similarly, de Freitas and Penna [23,29] used a randomized variable neighborhood descent heuristic with five different neighborhoods. Ha et al. [11] attempted to solve the TSP-D using two heuristics, TSP-LS and GRASP. The aim of this paper was to minimize the total operational cost of the two vehicles. In each iteration, a new giant tour (TSP tour) was generated using three different heuristics: k-nearest neighbors, k-cheapest insertion, and random insertion. A split algorithm built the minimum cost (min-cost) TSP-D solution from the giant tour solution by substituting a truck customer with a drone. After building the min-cost solution, the LS improved the solution using four move operators. The tests conducted in this study showed that the k-nearest neighbors generated a better min-cost TSP, when compared with k-cheapest insertion and random insertion. The quality of solutions obtained with GRASP outperformed those obtained with TSP-LS. However, TSP-LS was faster than GRASP. The proposed GRASP method has been modified by Marinelli et al. [6], such that the drone can launch from and join the truck not only at a customer node, but also along the route.

Population-Based Metaheuristic Approaches to TSP-D
Özogu et al. [30] proposed a solution to the FSTSP: a hybrid approach combining a genetic algorithm (GA) [31] and Clarke and Wright's savings algorithm (CW) [32]. In this approach, the GA is used to assign the drone and truck to the customers. Following that, the sequence of the customers is determined using the CW. The aforementioned process continues until a specified number of iterations is reached. The authors proposed multiple specific mathematical assumptions for the problem. The most consequential assumption is that the drone must launch and return to the truck at the same node, resulting in an increase in the waiting time of the truck. Ha et al. [33] used another hybrid GA, called the HGA, to solve the TSP-D with both minimum time (min-time) and minimum cost (min-cost) objectives. To improve the HGA's quality, the authors educated their offspring using LS methods. The authors designed a hill-climbing and first-improvement LS to address the min-time and min-cost objectives. In addition, the authors developed a restoration method to "educate" the TSP-D solution. Ferrandez et al. [34] used K-means clustering to help to find the truck stops, which were also the drone launch points. A GA was used to solve the truck route as a TSP. Houseknecht [35] implemented an extension of the classical ant system [36]. Using ant pairs is essential for the TSP-D, as a truck route and a drone route are needed every time the tour is constructed. When constructing its own path, every ant in the pair lays down a pheromone which is different from those the others. Every ant type is attracted to its own pheromone.

Discussion of Previous Approaches
Solution-based metaheuristic methods can be less destructive than population-based methods. For instance, genetic operators can drastically change the characteristics of the current solution. The use of generic genetic operators may lead to infeasible solutions. Restorative steps can repair the obtained solution, in some cases. Solution-based methods are more inclined to undergo intensification. Notably, several studies have utilized a considerable number of neighborhood structures (e.g., seven [23] or five [10,28]). The neighborhood structure for the next move has been chosen randomly or pseudo-randomly.
Population-based methods also work much more slowly than solution-based methods, as population-based methods process an entire population of solutions in each iteration. For instance, HGA was found to be 1.5 times slower than GRASP [33]. Notably, populationbased metaheuristic approaches have been applied to this problem less frequently than single solution-based methods. The aforementioned approaches can achieve good results and provide the opportunity to further reduce the total route cost, compared with the exact cost in small to medium instances [10]. We considered GRASP [37,38], as it is a simple LS heuristic that utilizes a so-called restricted candidate list, which is a convenient way to store the set of candidate drone nodes. The neighborhood structure, (or equivalently, the next move to be applied) can be viewed as a search parameter. The optimal parameter choice largely depends on the problem type or the problem instance [39]. Self-adaptation has been shown to be beneficial for the selection of a genetic mutation operator for permutation encoding [40]. The search is guided by self-adaptively selected move operators: e.g., if there is a larger probability that moves with higher success rates, the instance in consideration will be identified and exploited, thereby improving the search outcome. This study aims to apply self-adaptive selection when searching the neighborhood in GRASP.

Proposed Greedy Randomized Adaptive Search Procedure for TSP-D
The route-first, partition-second approach was followed. The representative solution and the required data structures are described in Section 2.1. The details of the algorithm are presented in Section 2.2. Finally, the objective function is presented in Section 2.3.

Representative Solution and Supportive Data Structures
The candidate solution, tspd, is represented using an array that contains both truck and drone deliveries. The first and last indices refer to the depot. The elements in the array are ordered by the visit sequence. Each element in the array consists of a node id and node label (t: truck node, d: drone node, and c: combined node, where the location is visited by both the truck and drone). An example of a solution is shown in Figure 2, where Figure 2a illustrates the representative solution of the actual graph presented in Figure 2b. As part of the algorithm's initialization, the node IDs are initialized with the solution of the TSP tour. The node labels are initialized to t. The locations of the customer and the depot are stored in a location matrix, L. The column index represents the node id. The first row contains the x-coordinates, and the second row contains the y-coordinates. The values of the x and y-coordinates are read from the input data set.
In addition, two cost (adjacency) matrices are used to specify the cost over time (seconds): a drone cost matrix (C D ) and a truck cost matrix (C T ). Two cost matrices are used, as the drone and truck differ in terms of speed (s D and s T , respectively). The drone and truck speeds are read as inputs from the dataset. First, the Euclidean distance d ij between nodes i and j for each i; j in L is computed using the x and y-coordinates. Following that, C D ij and C T ij are respectively computed using the following equations:

Algorithm Details
The proposed GRASP approach is presented as Algorithm 1. minTime and bestTour are initialized to ensure the minimum time and the best attained TSP-D solution found so far, respectively. In line 3, the optimal TSP tour is obtained using the Concorde TSP solver. The following steps are repeated maxTrials times. The output of optimalTspTour() is passed to randomizedInitTspd(), in order to construct an initial TSP-D solution. This is explained in detail in Section 2.2.1. Following that, localSearch() is used to improve the acquired TSP-D solution, which is detailed in Section 2.2.2. CompCost() then computes the cost (objective function) of the current TSP-D solution as currentTime (Section 2.3). If there are positive savings in cost, with respect to minTime, the current TSP-D solution is saved as the best solution found thus far, bestTour.

Building the Randomized Initial TSP-D Tour
The main function of the initial TSP-D procedure (Algorithm 2) is to perform the initial partitioning of the available truck route into a truck route and a drone route, using the restricted candidate list (RCL).
First, the time saving obtained for all nodes (except the depot) when a node j is transformed from a truck node to a drone node is computed. The procedure identifies the minimum amount of saved time for all nodes c min , the maximum amount of saved time among all nodes c max , and an array savingTime which contains the time saved by each node. The saved time is computed using Equation (1) and illustrated in Figure 3, where i is the node ID of the launch node, j is the node ID of the drone node, and k is the node ID of the rendezvous node. Calculating the time savings obtained for all nodes takes O(n) time.
The threshold value, tau, is computed using Equation (2) [41], where δ is a randomly generated value, δ ∈ (0, 1). The RCL is accordingly built by adding the customers that may be serviced by the drone; that is, all nodes i with savingTime[i] ≥ tau. The buildRCL procedure requires O(n) time. Next, nodes are iteratively and randomly selected and removed from the RCL to be transformed into drone nodes, using the BuildTspd algorithm explained in Algorithm 3.
buildTspd was inspired by the greedy partitioning heuristic presented by Agatz et al. [10]. In that method, one of the three moves (MakeFly, which converts a truck node to a drone node, PushLe f t, and PushRight) is randomly selected in each iteration. The selected move is applied to an unprocessed node. However, in this implementation, as presented in Algorithm 3, a candidate node, candidateElement, is chosen randomly and removed from the RCL, in order to be converted into a drone node. candidateElement can be a drone node if it has a predecessor and a successor, meaning that it is not a boundary node. In addition, it must not lie in a sortie. Once this move is successful, the feasibility of the PushLe f t and PushRight moves for the sortie under consideration is examined. If both moves can be feasibly implemented while improving the cost, then one of them is greedily selected, such that the cost saving is maximized. Otherwise, the feasible move is applied if it improves the cost.
PushLe f t inserts the truck node to the left of the launch node into the sortie, as shown in Figure 4. The move is only feasible if the node preceding the launch node is a truck node other than the depot and is not part of a sortie. The time taken by the sortie before applying PushLe f t, be f oreLe f t, is calculated using Equation (3). Equation (4) computes the time taken by the sortie after applying PushLe f t, a f terLe f t. The time saved is computed by taking the difference between be f oreLe f t and a f terLe f t. If the procedure results in time savings greater than zero, the move is accepted. The PushRight operator is symmetric with respect to PushLe f t. Converting the candidateElement into a drone node in lines 4-10 of Algorithm 3 requires O(1) time. The complexity of both moves, including the work required to compute the savings obtained by the moves, is O(1) time. However, the time required by the CompCost step at line 1 of the algorithm is O(n) (Section 2.3), resulting in a time of O(n) for buildTspd. The buildTspd algorithm is called a number of times equal to the length of the RCL as in line 6 from Algorithm 2. Accordingly, the time complexity required by the randomizedInitTspd algorithm is O(n 2 ).  12 Greedily select and apply the move that maximizes cost savings to tspd; 13 else 14 if (PushLe f t is feasible) then 15 Apply PushLe f t to tspd;  be

Optimizing the Obtained Solution with Local Search
The LS procedure helps to improve the initial TSP-D solution by applying two different moves: Swap and TwoOpt [10]. The LS procedure is explained, followed by a description of the two moves.
A general structure for the LS procedure is shown in Algorithm 4. After the initialization step, an iterative process starts. In each iteration, either the Swap or TwoOpt move is selected in a self-adaptive manner, based on the move probability, p m , as follows (lines 7-12). A random number, rand ∈ [0, 1], is generated. If the value of rand is less than p m , the Swap move is selected; otherwise, the TwoOpt move is selected. The selected move is applied to the current solution, tspd. If the resulting neighbor has a lower cost, the move is accepted. Otherwise, the acceptance criterion is tested. The success ratio of the chosen move is updated. The procedure is repeated until the termination condition or convergence is reached. The move selection probability, p m , is updated consistently after a certain number of iterations, windowSize, based on the success ratio of the Swap and TwoOpt moves: swapSR and twoOptSR, respectively (lines 4-6 in Algorithm 4). The success ratio is computed as the number of times that the move is successfully applied, divided by the total number of times that the move was applied during the past windowSize. If swapSR is better than twoOptSR, p m is updated such that Swap has a higher probability of being chosen and vice versa, as shown in Equation (5), where β ∈ (0, 1) is a learning rate parameter.
Moreover, in order to control the computational time, the convergence is examined and the LS procedure is stopped if it fails to improve the solution after a specific number of iterations (convergence), as shown in lines 1 and 14-18.
Two variants of GRASP were considered, according to the LS procedure used: hillclimbing local search (HCLS) [42] and LS with simulated annealing (SA) [26]. These variants differ in terms of the initialization step and the acceptance criteria (lines 2 and 13 in Algorithm 4, respectively). In the HCLS variant, the loop control parameter is a simple loop counter i, which is initialized to zero. The termination condition is satisfied when i reaches the maximum number of allowed iterations, maxIter. In this variant, a move is accepted only if the resulting neighbor has a lower cost than that of the current solution.
In the second variant, SA, the loop is controlled by a temperature parameter, T i . T i is initialized to an initial temperature, T 0 , and updated using a geometric cooling schedule, as shown in Equation (6), where γ ∈ (0, 1) [26]. The acceptance criterion in SA differs from that in the HCLS variant for inferior moves. A sub-optimal move can be accepted, based on a certain probability P(∆E, T) that it follows the Boltzmann distribution, as shown in Equation (7), where ∆E represents the difference between the objective function (cost) of the current solution and that of the resulting neighbor. A random number, randP, is generated. If the value of randP is less than the probability P(∆E, T), the move is accepted; otherwise, the move is not accepted. The termination condition in the SA variant is satisfied when the temperature T i exceeds the final temperature T f .
The Swap Move: In the Swap move, two distinct nodes are randomly selected and swapped ( Figure 5). A swapped node may be a truck node, a drone node, or a combined node, but it cannot be the depot [22]. The TwoOpt Move: In the TwoOpt move, two edges are removed from the tour and replaced with another set of edges, in order to obtain a valid tour. The removed edges must be between truck or combined nodes [22]. The first step in applying TwoOpt is to select the two origin nodes of the selected edges randomly. In Figure 6a, nodes 4 and 2 are selected. Two edges are thus removed: (4, 6) and (2, 0). Then, two new edges are created, as shown in Figure 6b: (4, 2) and (6, 0). As a result, the sortie path is reversed, as shown in Figure 6b. As twoOpt randomly selects two edges, there are several invalid cases that must be taken into consideration, as follows: 1.
If the selected origin node or its destination node happen to be inside a sortie, as shown in Figure 7;

2.
If both selected origin nodes are launch nodes and their destinations are rendezvous nodes, as shown in Figure 8; 3. If the selected origin or its destination node happens to be a common node between different sorties, as shown in Figure 9. The complexity of Swap and TwoOpt moves, including the work required to compute the savings obtained by the moves, is O(n).

Objective Function
We aimed to minimize the total delivery time required to serve all customers starting from and returning to the depot, as shown in the objective function (8), where customer i is where the sub-route starts and customer k is where the sub-route ends. Customer j is served by the drone [23].
The CompCost procedure, presented in Algorithm 5, computes the cost of the full tspd tour. Firstly, the currentTime is initialized to 0. Following that, the nodes in the tour are examined in sequence. If the node is a launch node, the sortie time is computed. A launch location i and a rendezvous location k are identified. The sortie time is calculated by taking the maximum time between the droneTime and truckTime. The droneTime is calculated by adding the time taken by the drone to move from the launch location to the drone location, to the time taken by the drone to move from the drone location to the rendezvous location. The truckTime is computed by calculating the time taken by the truck to move from the launch location to the rendezvous location. In line 16, the value of the counter i is updated to that of the rendezvous location k, as the entire sortie has been processed. Otherwise, if the location is not in a sortie, it is visited by the truck and the cost required by the truck is computed by calculating the time taken by the truck to move from its current location to the following location.

Experimental Setup
The benchmark dataset is described in Section 3.1. The algorithm was implemented in Python. The algorithm parameters are explained in Section 3.2, and the model evaluation is presented in Section 3.3. We used an Ubuntu 16.04 LTS desktop powered by a 3.60 GHz × 8 Intel Xeon(R) CPU E5-1620 with 31.3 GiB of RAM and a 427.9 GB SSD.

Benchmark Dataset
We used the publicly-available dataset "Instances for the TSP with Drone" [43]. This has been used in several studies [10,23]. The dataset categorizes instances based upon the distribution type, instance size, and vehicle speed configuration. The categories are as follows. We used two different distribution types, based on how each instance was generated in the benchmark dataset. In a uniform distribution, the coordinates are generated independently and uniformly; in a single-center distribution, the coordinates are generated using an angle and a distance drawn from a normal distribution. The dataset contains a wide variety of instance sizes, ranging from 5 to 500 customer locations. The experiments in this study were applied using instances of sizes 10, 20, 50, 75, and 100. As for the vehicle speed configuration (α), we utilized two different relative speed configurations for the vehicles. With (α = 1), the truck and drone have the same speed; with (α = 2), the drone is two times faster than the truck.
For each (type, size, α) combination, the dataset contains 10 different instances. In each instance, the vehicle speed configuration (α), the total number of nodes (including the depot), and for each node, the customer ID and location specified using the x and ycoordinates are defined. Table 1 summarizes the initial values of the parameters used in the proposed GRASP approach. Several parameters were adaptively set using the instance size, n. For example, the maximum number of iterations in the GRASP algorithm (maxTrials) and in the HCLS variant (maxIter); the window size used to update the move probability parameter in LS, windowSize; and the convergence used to signal LS convergence. When n was less than 50, maxTrials was set to 50 × 10. In addition, p m was initialized to 0.5, in order to allow for equal chances of Swap and TwoOpt moves occurring at the beginning of the search. The learning rate, β, was initialized to 0.02, a common default value for standard artificial neural networks [44].

Model Evaluation
The outcomes of the proposed GRASP approach were compared with the results of two studies. Agatz et al. [10] solved the TSP-D problem by proposing a route-first, cluster-second heuristic based on the local search (LS) method. On the other hand, de Freitas et al. [23] first used a mixed-integer linear-programming solver, and then applied a variable neighborhood search metaheuristic to construct sorties and improve the solution. The effectiveness of the model was measured using the complete tour duration performance measure (in seconds, secs). The efficiency of the model was measured using the runtime performance measure (in secs). As GRASP is a stochastic algorithm, the run for each instance was repeated ten times, and the average value over each of the obtained performance measures was reported.
To obtain a solid statistical basis for the model comparisons, non-parametric statistical tests were used. The Wilcoxon signed-rank test was used to test the median differences among paired data points (pairwise comparisons) [45]. The Friedman test was used to compare more than two related data samples [46]. The significance level was set to 5%. The statistical analysis was conducted using the SPSS version 1.0.0.1012 software.

Results
This section reports the results for the two variants proposed: the GRASP HCLS variant and the GRASP SA variant (Section 4.1). The better-performing GRASP variant was then compared with two the rival algorithms [10,23]

Results of the Proposed GRASP Algorithm for TSP-D
The GRASP HCLS variant experiment used 200 test instances, (10,20,50,75,100) in size, of uniform and single-center type, and with a relative truck/drone speed of α = 1 or α = 2. All parameters were initialized to the settings stated in Table 1. Table 2 summarizes the results obtained using the proposed GRASP approach with HCLS. The values reported in the table represent average values of the best-known solutions obtained over the ten runs for each of the ten different instances for each of the (type, size, α) combinations. The table is divided according to the distributions; that is, uniform followed by single-center. In each distribution, the results are sectioned by the relative drone/truck speed (α). In the table, the column Size represents the instance size, the column Cost represents the average trip duration of the obtained TSP-D solution (in secs), the column Runtime represents the average running time (in secs), the column Swap represents the average number of times in which a Swap move was applied, the column TwoOpt represents the average number of times that a TwoOpt move was applied, the column Swap Success represents the average number of times that a Swap move was successful (i.e., it improved the TSP-D solution and decreased its cost), and finally, the column TwoOpt Success represents the average number of times that a TwoOpt move was successful. Moreover, instances of size 100 were run on different machines; hence, the run time is not reported (NA).  Table 2 shows that the Swap move was used more successfully in smaller-sized instances than in larger-sized instances, while the TwoOpt move was more successful in larger-sized instances. Nonetheless, the Swap move was more successful than the TwoOpt move overall. The standard deviation of the solution cost in instances with a uniform distribution tended to be lower than that observed in instances with a singlecenter distribution. In addition, the standard deviation of the solution cost in instances in which α = 1 tended to be lower than instances in which α = 2. Concerning the computational time, when α = 1, the algorithm ran for longer. The run time did not differ between uniform and single-center instances.
Notably, the moves were often not accepted in the GRASP HCLS variant, which suggests that it became stuck in a local optimum. To address this problem, the GRASP SA variant for LS, in which sub-optimal moves can be accepted, was examined. In this experiment, 160 test instances of sizes (10,20,50,75), with both uniform and single-center types and vehicle speed configurations of α = 1 and α = 2 were utilized. All parameters were initialized to the settings stated in Table 1. Table 3 summarizes the results obtained using the proposed GRASP approach with the SA variant. The values reported in the table represent the average values of the best known solutions obtained over the ten runs for each of the ten different instances in each of the (type, size, α) combinations. The column definitions are the same as those used in Table 2.  Table 3 shows that, as the instance size increased, the success ratios of both moves also increased. Nonetheless, the Swap move was still more successful than the TwoOpt move in all cases, except for the case (type = single-center, size = 75, α = 2). The standard deviation of the solution cost in uniform instances tended to be slightly lower than that for single-centered instances. We also observed that the standard deviation of the solution cost in instances in which α = 1 tended to be lower, compared to that for instances having α = 2.
As can be seen in Figure 10, overall, the GRASP SA variant performed worse than or approximately equal to the GRASP HCLS variant. The overall average solution cost for the models obtained using the GRASP HCLS variant was 517.79 secs, compared to 556.90 secs for the models generated using the GRASP SA variant. The standard deviation of the solution cost obtained with the GRASP SA models was higher than that of the GRASP HCLS model. A possible explanation for the large gap between runs of instances of the same size in the former is that the LS with SA accepts sub-optimal moves.
The Wilcoxon signed-rank test was used to compare the models obtained with the two GRASP variants. For a valid comparison, 160 instances were used for each model. A statistically significant difference was detected between the means of the two models (p < 0.001). Based on this, the HCLS variant was considered to perform better than the SA variant.

Evaluation against Rival Algorithms
Based on the conclusion presented in Section 4.1, the GRASP HCLS variant outperformed the SA variant. The better performing GRASP variant was compared with the rival algorithms, LS [10] and HGVNS [23], for the TSP-D problem, where the endurance of the drone was set to infinity and the service time for the drone launch and return was set to zero. The parameter settings for both rival algorithms can be found in the study of de Freitas and Penna [23].
The experiment used 200 instances, as described in Section 3.1. Table 4 reports the average values (in secs) for the best-known solutions for the ten instances for each (type, size, α) combination. The column definitions are the same as those used in Table 2, and are sectioned according to the algorithm addressed. Figure 11 compares the average solution cost results. Similarly to the findings presented by de Freitas and Penna [23], the algorithms generally performed better for instances with a uniform distribution, compared to instances with a single-center distribution, for all values of α. The proposed GRASP approach performed better in instances where α = 1; however, in instances where α = 2, that size was found to play a role. Accordingly, the proposed GRASP approach had a better or approximately equal performance in instances with small size (i.e., 10 and 20). Conversely, in instances with large size (i.e., 50 and 75), our approach had a worse or approximately equal performance. The average solution costs over all instances (in secs) were as follows: The HGVNS scored 565.52, closely followed by the LS (567.10) and GRASP (594.11). The Friedman test applied under the null hypothesis of equal performance of all algorithms obtained a p-value of (p = 0.074). Since the p-value was larger than the significance level (0.05), the null hypothesis is accepted, indicating that no statistically significant differences were detected among all the means of solution costs obtained from the three algorithms on multiple datasets.
Considering the efficiency of the three algorithms under consideration, all three algorithms start by constructing the initial TSP tour using the Concorde TSP solver. Both of the rival algorithms, LS and HGVNS, adapt the best-improvement search strategy, which exhaustively explores the neighborhoods and returns the solution having the minimum cost. The proposed GRASP algorithm, on the other hand, employs the first-improvement strategy, accepting the first neighbor having a cost that is lower than that of the current solution. Excluding the time required for obtaining the initial TSP tour, a single iteration of the LS procedure requires O(n 3 log n) [10]. A single iteration of the HGVNS algorithm runs in Θ(n 2 |N 2 |) [23], where |N | is the number of neighborhoods considered. On the contrary, and considering the value of maxIter in HCLS to be equal to n, a single iteration of GRASP requires O(n 2 ) time. Accordingly, the proposed algorithm has lower computational requirements than the two rival algorithms, LS and HGVNS.

Discussion
The GRASP HCLS variant focuses on improving the obtained TSP-D solution using relatively optimal moves only, whereas in the SA variant, sub-optimal moves with a given probability can be accepted. Overall, the GRASP SA variant was shown to have a worse or approximately equal performance to the HCLS variant. Setting the initial temperature to a high value may have played a role. Furthermore, setting γ to 0.95 and using the geometric schedule to update the temperature may have resulted in the temperature decreasing faster than needed. It is possible that changing the values of the SA parameters could result in better performance.
Comparing the performance of the proposed GRASP HCLS variant with the two rival algorithms, LS and HGVNS, the proposed approach performed better in instances where α = 1 and instances of smaller sizes where α = 2. Overall, the Friedman test showed that the proposed GRASP was comparable with the two state-of-the-art rival algorithms specified earlier in terms of solution cost. It is possible that the number of iterations in the LS procedure was not sufficient to effectively improve the solution. Furthermore, the random δ value heavily affects the size of the RCL. Thus, a small δ value may result in a small and inadequate RCL. However, the proposed algorithm had lesser computational complexity than the other two.

Conclusions
In this study, we addressed the TSP-D problem and its variants. Various algorithms, including population-based metaheuristic algorithms and solution-based metaheuristic algorithms, have been proposed for TSP-D and its variants, such as the FSTSP. We used GRASP as a single-solution metaheuristic algorithm. The proposed algorithm starts by constructing the entire truck route as a classical TSP. The obtained route is then partitioned into truck nodes and drone nodes, yielding an initial TSP-D solution. Following that, the proposed GRASP algorithm is applied to minimize the total delivery time for the initial TSP-D solution. Two versions of the LS procedure in GRASP were used: HCLS and LS with SA. Both versions use a self-adaptive neighborhood. The proposed approach was tested on the "Instances of TSP with Drone" benchmark dataset. Generally, the HCLS variant outperformed the SA variant of GRASP. Subsequently, the performance of the proposed GRASP HCLS variant was compared with two state-of-the-art algorithms: LS [10] and HGVNS [23]. Overall, the proposed approach was found to have a performance comparable to those of the two rival algorithms. Nonetheless, under some (type, size, α) combinations, our approach outperformed the rival algorithms.
The main contribution of this study is the self-adaptive selection of the search neighborhood, which was implemented for two neighborhoods (associated with Swap and TwoOpt moves), but which could be easily extended to a larger number of neighborhoods. The self-adaptive selection of the search neighborhood for the TSP-D problem may also inspire other methods, which in turn may significantly outperform existing ones. We recommend that the proposed GRASP approach be further improved by increasing the number of iterations in the LS procedure and tuning the δ value. Moreover, it would be interesting to investigate different parameters for the SA variant and to add more neighborhood alternatives into the search.

Data Availability Statement:
The data used in this study are openly available in Instances for the TSP with Drone at doi:10.5281/zenodo.1204676, reference number [43].

Acknowledgments:
The authors would like to thank the anonymous reviewers for their constructive comments. Special thanks to Heba Kurdi, Department of Computer Science, King Saud University, for her valuable advice that helped improve the quality of this work.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: