A Sim-Learnheuristic for the Team Orienteering Problem: Applications to Unmanned Aerial Vehicles

: In this paper, we introduce a novel sim-learnheuristic method designed to address the team orienteering problem (TOP) with a particular focus on its application in the context of unmanned aerial vehicles (UAVs). Unlike most prior research, which primarily focuses on the deterministic and stochastic versions of the TOP, our approach considers a hybrid scenario, which combines deterministic, stochastic, and dynamic characteristics. The TOP involves visiting a set of customers using a team of vehicles to maximize the total collected reward. However, this hybrid version becomes notably complex due to the presence of uncertain travel times with dynamically changing factors. Some travel times are stochastic, while others are subject to dynamic factors such as weather conditions and traffic congestion. Our novel approach combines a savings-based heuristic algorithm, Monte Carlo simulations, and a multiple regression model. This integration incorporates the stochastic and dynamic nature of travel times, considering various dynamic conditions, and generates high-quality solutions in short computational times for the presented problem.


Introduction
The team orienteering problem (TOP) is a classic optimization problem in which a set of routes is constructed for a team of vehicles to traverse, aimed at collecting the highest possible reward from customers within the routes before a defined time limit [1].In sectors such as search and rescue, surveillance, and logistics, addressing the TOP is crucial.One example of this could be a search and rescue mission, where a team of unmanned aerial vehicles (UAVs) must navigate through a disaster-stricken area to locate survivors and deliver supplies.While UAVs have emerged as essential assets in such missions due to their capacity to perform hazardous duties and their skill in sensing, monitoring, and navigating, the difficulties they face are difficult to overstate [2].These challenges are multifaceted and include unpredictable environmental factors like weather or traffic conditions.Such unpredictability makes reliable behavior prediction impossible, resulting in suboptimal solutions and a reduced mission efficiency.Additionally, the natural limitations of UAVs, such as the limited battery life and cargo capacity, demand precise optimization for mission accomplishment.Despite these challenges, UAVs remain a critical technology for solving complex problems in various domains, emphasizing the importance of the TOP and its applicability to UAVs [3].Furthermore, the integration of Internet of Things technologies stands as one of the most promising approaches for enabling UAVs to sense their surroundings and collect data.This integration facilitates UAVs to gather and exchange real-time environmental data, thereby optimizing their routes accordingly [4].
In recent years, the majority of studies focused on either deterministic or stochastic variants of the TOP.In the case of the deterministic TOP, many of the studies in the literature rely significantly on exact methods to solve this problem [5].However, when presented with challenges given by large-scale instances, these exact methods show limited effectiveness.On the other hand, in the case of the stochastic TOP, various studies have explored different strategies, such as stochastic programming [6], robust optimization [7], and, recently, the utilization of simheuristic approaches [8].Furthermore, several versions of the TOP have been developed to address specific real-world scenarios, including heterogeneous vehicle capacities, customer time constraints, and unpredictable travel times and profits.For example, Kirac et al. [9] addressed the TOP with time windows and mandatory visits, Lin and Vincent [10] tackled the TOP with mandatory visits, Gunawan et al. [11] addressed the TOP with variable profits, and Panadero et al. [12] tackled the TOP with stochasticity.
Our paper focuses on a hybrid variant of the TOP, which combines deterministic, stochastic, and dynamic characteristics, addressing a gap in the scientific literature.To solve this complex problem, a new sim-learnheuristic approach is introduced, with special applications in UAVs that can be used to aid in disaster rescue.This approach combines a biased-randomized savings-based heuristic, Monte Carlo simulations, and a multiple regression model.The main objective of this novel methodology is to combine the strengths of all three components, allowing for an agile exploration of the search space to produce high-quality solutions for the presented problem. Figure 1 depicts this new variant of the TOP, where the travel times display a wide range of features.These features include certain travel times (deterministic), uncertain travel times (stochastic), and travel times that can change over time due to dynamic factors (dynamic), such as varying traffic congestion and weather conditions.This blend of characteristics results in a more precise representation of the complexities evident in real-world situations.The main contributions of this research are as follows: (i) we present a hybrid variant of the TOP, which combines deterministic, stochastic, and dynamically changing travel times due to factors such as traffic congestion and weather conditions; and (ii) we propose a novel methodology called the sim-learnheuristic algorithm to solve this rich variant of the TOP, where the integration of three distinct components is crucial for its capability to solve increasingly complex combinatorial optimization problems.Figure 2 illustrates the integration of the three components.Initially, the metaheuristic component addresses the deterministic aspects of the problem.However, real-world scenarios often involve uncertainties.To account for this, the simulation component incorporates the stochastic nature of the environment.By combining this with the metaheuristic component, the simheuristic component effectively tackles stochastic problems.Moreover, real-world conditions are dynamic; thus, machine learning algorithms are employed to predict and adapt to these changes.Integrating these predictions within the simheuristic framework enables it to address and solve complex dynamic problems effectively.The remaining sections of the paper are structured as follows: Section 2 presents brief reviews of related articles.Section 3 presents the definition of the hybrid TOP, and Section 4 describes the proposed methodology of the sim-learnheuristic algorithm.Section 5 carries out a series of computational experiments to illustrate the performance of the proposed algorithm, and Section 6 presents the discussion of the obtained results.Finally, Section 7 highlights the main conclusions and future research lines of this work.

Related Work
In the field of optimization, simheuristics and learnheuristics are increasingly being used to solve complex combinatorial optimization problems under uncertainty.Simheuristic methods combine the advantages of both simulation and heuristics to tackle such problems.On the other hand, learnheuristics employ machine learning algorithms to improve the quality of solutions over time.Gonzalez-Neira et al. [13] proposed a hybrid simheuristic method to solve a complex multicriteria stochastic permutation flow shop problem.This method combines a greedy randomized adaptive search procedure, Monte Carlo simulations, a Pareto archived evolution strategy, and an analytic hierarchy process to handle stochastic processing times and sequence-dependent setup times.It considers quantitative criteria like earliness/tardiness and qualitative criteria like product and customer importance.The experimental results showed the significant impact of processing time distributions and coefficients of variation on decision criteria.Similarly, Caldeira and Gnanavelbabu [14] proposed a simheuristic approach to solve the stochastic flexible job shop scheduling problem.They integrated Monte Carlo simulations into a Jaya algorithm framework to minimize the expected makespan, considering uncertainties in production processes represented by random variables with known probability distributions.The computational results demonstrated its effectiveness across different variability levels using reliability-based methods, showcasing its capability in handling stochasticity in flexible job shop scheduling.Yazdani et al. [15] proposed a novel simheuristic approach to address the waste collection routing problem in the context of construction and demolition waste management.The proposed approach utilized a hybrid genetic algorithm to optimize vehicle route planning from construction projects to recycling facilities.In their research.they conducted a comparative analysis with existing approaches and demonstrated the high performance of the proposed simheuristic algorithm.Real case studies from Sydney, Australia, were used for evaluation, contributing significantly to optimizing future waste collection problems and providing recommendations for decision-makers and practitioners.Crawford et al. [16] proposed a Q-learnheuristic framework integrating Q-Learning into metaheuristics to address the exploration-exploitation balance dilemma.This framework, applied to various metaheuristics including the Whale Optimization Algorithm and the Sine-Cosine Algorithm, enhanced the selection of operators like binarization schemes for binary combinatorial problems.This study extended the framework's applicability, demonstrating statistical improvements in both the exploration-exploitation balance and the solution quality when solving the set covering problem.Gomez et al. [17] introduced a novel approach to tackle the capacitated dispersion problem, which involves selecting elements within a network while considering their bounded service capacities.The introduction of a random Bernoulli component influences node capacities, potentially resulting in nodes with zero capacity based on environmental variables.The proposed learnheuristic approach hybridizes a heuristic algorithm with reinforcement learning, offering a promising method to handle this intricate problem variant effectively.Bullah and van Zyl [18] proposed a novel learnheuristic approach to solve a constrained multi-objective portfolio optimization problem.The authors developed a hybrid approach that combines machine learning and a metaheuristic algorithm to obtain high-quality solutions that satisfy multiple objectives, such as maximizing returns while minimizing risk and adhering to constraints such as transaction costs and diversification requirements.The learnheuristic approach involves training a neural network to predict the quality of solutions obtained by a metaheuristic algorithm and using this prediction to guide the search toward promising regions of the solution space.
Focusing on the TOP, several articles have been published in recent years.Notably, Tricoire et al. [19] employed an adaptive algorithm based on the Markov decision process to decide whether to continue the route or take a shortcut given a specific deadline.Mufalli et al. [20] discussed sensor assignment and routing for UAVs to maximize intelligence gathering, considering constraints like a limited battery and sensor weight.They proposed a detailed plan using mathematical models and local search strategies, with column generation for efficiency in larger missions.Likewise, Saeedvand et al. [21] presented a hybrid technique for solving the TOP in rescue operations using humanoid robots, aiming to optimize energy, task completion, and decision-making through a learning algorithm and an evolutionary multi-objective approach.Schmitt-Ulms et al. [22] presented a machine learning approach to solving a stochastic TOP with time windows.The authors used a reinforcement learning algorithm, specifically the Q-learning algorithm, to learn how to make routing decisions for a deliveryman facing stochastic travel times and customer demands.Also, Lee and Ahn [23] presented a data-efficient deep reinforcement learning (DRL) approach to solve a multi-start TOP for UAV mission re-planning.The authors propose a method that learns to adapt to changing conditions by iteratively updating a policy network through experience gained from previous UAV mission planning problems.Xu et al. [24] discussed a novel application of electric vehicles in TOP systems, focusing on charging energy-critical sensors using mobile chargers.Due to charger limitations, visiting all sensors was restricted, and costs were assigned considering charger energy consumption and travel.Different vehicle types and their capabilities were also considered, with sensors possibly visited by multiple vehicles, affecting profit margins.An approximation algorithm was proposed initially and later refined for handling vehicles of the same type.In contrast, Sundar et al. [25] introduced a concurrent multi-threaded branch-and-price algorithm with acceleration techniques for the TOP involving fixed-wing drones.They addressed the kinematic constraints limiting drones' maneuverability, particularly quick turns and minimum turn radii.While their approach achieved optimal solutions for most cases, its reliance on exact methods hindered real-time solutions, crucial for dynamic systems.Wang et al. [26] focused on the dynamic and stochastic orienteering problem in autonomous transportation systems.They introduced self-adaptive heuristic algorithms to optimize task execution for intelligent agents, maximizing rewards and ensuring timely returns to the starting location within a limited time frame.The orienteering problem enables agents to selectively visit spots and optimize routing sequences for reward maximization.The study proposed a hybrid simulated annealing-tabu search algorithm for initial routing plans and three multi-stage optimization strategies for real-time adjustments.Simulation experiments demonstrated the effectiveness of the algorithm, improving the solution quality and ensuring timely arrivals for agents.Elzein and Di Caro [27] proposed a novel multi-stage metaheuristic algorithm to efficiently tackle large orienteering problem instances.The metaheuristic partitions a potentially large set of candidate sites into smaller clusters, where a solver is used to find near-optimal solutions within a bounded time.These solutions are then merged and optimized to provide a final high-quality solution.The metaheuristic significantly improves the computation time without a substantial loss in solution quality, making it suitable for online applications in robotics, logistics, and transportation scenarios with dynamic environments and a large number of candidate points.The results demonstrated the effectiveness and efficiency of the proposed approach over large benchmark instances.Le et al. [28] addressed the dynamic orienteering problem based on a VNS algorithm.The algorithm efficiently adapted existing solutions in a dynamic environment.Experimental comparisons were made with two other improvement heuristics using benchmark instances and road networks.This study evaluated the algorithm's performance, considering the impact of dynamic node values and budget changes.Fang et al. [29] investigated the use of UAVs for monitoring landslide-prone areas by solving a TOP with mandatory visits that maximize data collection with multiple UAVs.The authors incorporated a novel algorithm such as a large neighborhood search with a neural network heuristic for an enhanced efficiency and solution quality in both small-and large-scale scenarios.Lastly, Juan et al. [30] provided a hybrid methodology that combined simulation and reinforcement learning to effectively manage a vehicle's battery under dynamic settings.This approach demonstrated considerable advantages over non-informed decisions, showing its potential for routing plans that account for multiple dynamic elements such as weather, congestion, and battery conditions.
Most of the reviewed papers are summarized in Table 1, which discuss deterministic, stochastic, and dynamic versions of the TOP.This paper considers the combination of stochastic and dynamic scenarios of the TOP, and introduces a novel sim-learnheuristic methodology designed to address these combined scenarios.

Problem Definition
The TOP is an NP-hard optimization problem [1], with a formal definition on a directed graph G = (V, A).The graph has V = {0, 1, 2, . . ., n + 1} nodes, where the origin depot is node 0, the destination depot is node n + 1, and N = {1, . . ., n} are the intermediate nodes.The edges in A = {(i, j)|i, j ∈ V, i ̸ = j} represent the connections between the nodes.We consider a set of homogeneous UAVs denoted as D, and each UAV d ∈ D begins its journey from the origin depot, serves a subset of intermediate nodes, and finally reaches the destination depot.The objective is to maximize the total reward collected by all UAVs.Following the mixed-integer linear programming model for the TOP proposed by [12], the hybrid version of the problem, where the travel times are either deterministic, stochastic, or dynamic, can be expressed formally as follows: where the objective is to maximize the aggregated deterministic reward of visited nodes over the set of edges.For each visited node in the route, it yields a reward u i ≥ 0. Note that these rewards are zero for both the origin and destination depots, while they hold strictly positive values for the customers.More specifically, for each edge (i, j) ∈ A and each UAV d ∈ D, we consider the binary variable x d ij , which is equal to 1 if UAV d traverses through edge (i, j), and takes the value 0 otherwise.
Below are the provided constraints with their explanations.In the course of the tour, each point is visited at most once to ensure that no point is revisited (2).
The variable y d i is introduced to indicate the position of node i in the tour made by vehicle d, and constraint (3) makes sure that there are no sub-tours.
Constraint (4) states that the total travel time of each vehicle should not be more than its threshold (T max ), since each UAV d ∈ D starts its route and can only serve some intermediate nodes due to the limited travel time.The total travel time is the sum of the travel times of deterministic, dynamic, and stochastic edges, which are included in A det , A dyn , and A stoch , respectively.These are disjoint sets that include all the edges of set A. For each edge (i, j) ∈ A det , we assume that the travel time of each edge is deterministic and predefined (t ij = t ji > 0).Similarly, for each edge (i, j) ∈ A stoch , we assume the travel time T ij is stochastic and can be modeled using a probability distribution function.Lastly, for each edge (i, j) ∈ A dyn , we assume the travel time tij is dynamic, which depends on various factors such as weather and traffic conditions.Since the stochastic and dynamic travel times bring in a level of uncertainty, this constraint introduces a probabilistic approach to account for such unpredictability in travel times.The parameter γ, which ranges between 0 and 1, represents a threshold that defines the acceptable level of probability for keeping the travel time within the specified travel limit.
Constraint (5) ensures that for every arrival at a node, there is a corresponding departure from that node.
Constraint (6) indicates that the tour always commences at the starting node 0 and ends at the ending node n + 1.
Constraints ( 7) and ( 8) refer to the nature of y d j and x d ij variables.Finally, the vehicle departs from each node it visited, except for the end node.
Although the deterministic version of the TOP has been widely studied in the literature, the high degree of uncertainty in real-life applications of the TOP makes it a good candidate for our purpose.In this work, both stochastic and dynamic travel times are introduced in the TOP.Hence, we will consider that the travel time of some edges can be modeled as random variables following a theoretical probability distribution.Likewise, we will consider that other edges have uncertain travel times that have to be predicted using a multiple regression model.As far as we know, no other authors have addressed this realistic version of the TOP in the past.

Our Sim-Learnheuristic Approach
To effectively solve the hybrid TOP described in Section 3, we propose a novel simlearnheuristic method that combines a biased-randomized heuristic multi-start (BR-MS) framework, Monte Carlo simulations (MCSs), and a multiple regression model.This methodology aims to combine the advantages of all elements, enabling efficient exploration of the search space for optimal solutions [31].
This sim-learnheuristic methodology extends the simheuristic approach proposed in [8] to solve the TOP with stochastic processing times.Figure 3 outlines the main components of the sim-learnheuristic methodology.The sim-learnheuristic method can be summarized in the following six main steps, which can be identified by labels S1 through S6.First, the optimization problem is initially converted to its deterministic version, where variables are replaced by their expected or most likely values, effectively eliminating uncertainty.This deterministic problem is then solved using a BR-MS framework, leading to the generation of multiple high-quality solutions.These deterministic solutions are subsequently subjected to a comprehensive evaluation, considering the various sources of uncertainty inherent in the problem.This evaluation involves a short number of MCS runs, during which random variables and dynamic elements are assigned different values based on probability distributions or machine learning models, respectively.Additionally, constraints are assessed under uncertain conditions.Descriptive statistics are computed for each solution, yielding detailed insights into their performance.The examination of these solutions informs the generation of new solutions within the BR-MS framework until a predefined stopping criterion, such as a maximum execution time, is met.Following the previous stage, a subset of top-performing solutions, referred to as 'elite' solutions, are further examined under uncertainty.This second examination involves a longer number of MCS runs than the initial evaluations.Based on this intensive examination, the solutions are ranked, and the best solution is recommended.The selection of the best solution may take into consideration measures of interest beyond just the expected value, such as variance or reliability, to ensure a more robust decision-making process.Algorithm 1 outlines the main steps of our sim-learnheuristic approach.The simlearnheuristic algorithm receives an instance as input, comprising the nodes, the maximum number of vehicles, and the maximum travel time.First, a feasible initial solution (initSol) is generated by applying a savings-based heuristic (line 1).The savings-based heuristic was first proposed by Panadero et al. [8].Then, a short number of MCS runs (n short ) are performed to determine the expected reward of the initial solution (line 2).At this point, the initial solution becomes the best-known solution so far (bestSol) and the pool of best stochastic solutions (eliteSols) is initialized with this solution (lines 3-4).Next, the algorithm performs an iterative procedure to generate new solutions until a maximum execution time is reached (lines 6-16).At each iteration, the algorithm generates a new solution (newSol) using a biased-randomized version of the savings-based heuristic (line 7).The biasedrandomized heuristic introduces a slight modification in the greedy behavior, providing a certain degree of randomness while maintaining the logic behind the savings-based heuristic.The biased-randomized version considers each element in the edges list with a probability that follows a geometric distribution with a single parameter β ∈ (0, 1), which controls the relative level of greediness in the randomized behavior of the algorithm.By employing a biased-randomized version of the savings-based heuristic, multiple alternative solutions can be generated without losing the logic of the original heuristic.Next, the algorithm compares the newly generated solution to the best-known solution.If the new solution has a higher reward value (line 8), a short number of MCS runs are performed on the new solution to calculate its expected reward (line 9).In addition, if the newly generated solution obtains a higher expected reward (line 10), the best-known solution is updated and the new solution is added to the pool of elite solutions (lines [11][12].The computational time employed to generate new solutions is updated before checking if the maximum execution time (t max ) is reached (line 15).Once the stopping criteria are met, the algorithm conducts a refinement procedure using a larger number of MCS runs (n long ) over the pool of elite solutions (lines [17][18][19][20][21][22].This allows for more accurate estimations of the solutions in the elite pool.Finally, the solution with the highest expected reward is selected from the pool of elite solutions and returned by the algorithm (line 23).end if 22: end for 23: return bestSol Algorithm 2 depicts the main components of the savings-based heuristic.The heuristic starts by constructing a dummy solution (sol) with as many routes as nodes (line 1).Specifically, each route goes from the origin depot to a node before concluding its route at the destination depot.Notice that the constructed routes that exceed the maximum travel time are deleted from the solution.Then, the savings s i,j associated with each edge connecting two different nodes i and j are generated as follows:

Algorithm 1 Sim-learnheuristic algorithm
where t i,j is the travel time between nodes i and j, t i,n+1 is the travel time between node i and the destination depot, t 0,j is the travel time between the origin depot and node j, u i and u j are the rewards at nodes i and j, and α is a parameter used to balance travel time and collected reward.Notably, each pair of nodes has two different savings, one for each direction or edge.The savings associated with each edge are sorted in descending order, from the highest to the lowest saving value (line 2).This sorted list of savings is iterated, and a merging process is started (lines 3-15).At each iteration, a savings edge is extracted from the list of savings, and the two routes associated with the savings edge are obtained (lines 4-6).These two routes are merged, and the resulting route is checked to see if the merge does not exceed the maximum travel time constraint (line 7).If the merge is feasible, the two routes are removed from the solution, and the merged route is inserted instead (lines 8-13).Once the merging process finishes, the routes are sorted based on their collected reward in decreasing order from higher to lower rewards (lines 16).Then, the number of routes equal to the number of vehicles in the fleet (v max ) constitutes the solution proposed by the heuristic (line 17).Finally, the resulting solution with the highest rewarded routes is returned by the procedure (line 18).ijEdge ←selectNext(savingsList) iRoute ←getOriginRoute(ijEdge) jRoute ←getEndRoute(ijEdge) newRoute ←mergeRoutes(iRoute, jRoute) isMergePossible ←checkMergeConditions(newRoute) The simulation process is depicted in Algorithm 3. The simulation procedure takes two input parameters: a solution (sol) and the number of simulation replications (n runs ) to estimate the performance of the solution under stochastic and dynamic conditions.In the procedure, a variable keeps track of the accumulated expected reward (sumReward), which is initialized to zero at the start of the simulation process (line 1).The simulation is run it max times, and at each run, the expected reward of the solution (solReward) is added to the accumulated expected reward (lines 2-28).For each route in the solution, the route's expected reward (routeReward) is computed along with the route's expected cost (routeCost) (lines 4-26).If the expected route cost exceeds the maximum travel time, a penalty is applied to the route, resulting in a loss of all the collected rewards from the visited nodes (lines [22][23][24].To compute the route's expected reward, we iterate through the route's composing edges, and for each edge, we obtain its expected cost (lines 7-21).The expected cost of an edge is computed based on its deterministic, stochastic, or dynamic characteristics.The expected cost of a deterministic edge is predefined and specific to each particular instance (line 12).Moreover, the expected cost of a stochastic edge is sampled using a non-negative probability distribution, such as the Weibull or Log-Normal distributions, to simulate the uncertainty in the route plan (line 14).Finally, the expected cost of a dynamic edge is predicted using a multiple regression model considering the current dynamic conditions at this time (lines [16][17].Once the simulation replications are complete, the expected reward is computed as the average accumulated expected reward across all the simulation replications and is assigned as the expected reward of the solution (lines 29-30).edgeCost ←getCost(edge) sumReward ← sumReward + solReward 28: end for 29: expReward ← sumReward/it max 30: setExpReward(sol, expReward)

Computational Experiments
The proposed sim-learnheuristic algorithm was implemented using Julia 1.8.2 and tested on a personal computer equipped with an Intel Core i7 processor operating at 2.8 GHz and 16 GB of RAM.The algorithm's computational time was set to a time limit of 60 s for all instances.The β parameter for the geometric distribution was randomly assigned from the interval (0.1, 0.3) after a quick tuning process over a random sample of instances which established a good performance.For the exploratory and refinement stages of the simulation phase, we conducted 100 and 1000 simulation runs, respectively.To validate the proposed methodology, we randomly selected 23 instances from a well-known benchmark presented next and conducted 10 separate executions for each instance, utilizing varying initial seeds for the algorithm.Among the results from these runs, we reported the best solutions based on the highest collected reward.
Given the unavailability of publicly accessible instances for the dynamic and stochastic TOP, we decided to enhance the benchmark instances introduced by Chao et al. [1] for the deterministic version of the TOP.The benchmark collection consists of 320 examples, categorized into seven distinct sets.Each example within a set is identified using the naming convention px.y.z,where x represents the set number, y denotes the number of drones, ranging from two to four depending on the specific instance, and z indicates the maximum driving range.These benchmark instances have been widely used in previous research to assess the performance of algorithms in solving the deterministic TOP [32][33][34].However, for our study, we extended these instances to encompass four different scenarios: deterministic, stochastic, dynamic, and hybrid scenarios.
In the deterministic scenario, the travel time for each route is predefined.The travel time is given by the Euclidean distance defined between each pair of nodes of the benchmark instances.
In the stochastic scenario, we introduce uncertainty into the travel times between nodes.Each edge (i, j) ∈ A in the directed graph G = (V, A) is now defined by a travel time, T ij = T ji > 0, which is not deterministic but follows a best-fit probability distribution function with a mean value E[t ij ] > 0. In our computational experiments, we used the Log-Normal probability distribution to model the random travel times.The Log-Normal distribution is preferred over the Normal distribution when modeling non-negative random variables.It has two parameters, namely the location parameter µ and the scale parameter σ.These parameters can be determined based on the properties of the Log-Normal distribution considering the stochastic travel times between nodes i and j are assumed to be as follows: , the travel costs of the deterministic instances), and Var[T ij ] =c•t ij for all i, j ∈ {0, 1, 2, . . ., n + 1}.The parameter c serves as a design parameter enabling us to control the level of uncertainty.As c approaches zero, the results of the stochastic scenario are expected to converge with those obtained in the deterministic scenario.For our analysis, we have employed the value c = 1, which introduces a significant level of uncertainty as it uses the deterministic travel time as variance.A more comprehensive discussion on the implications of varying the parameter c can be found in Panadero et al. [12].Specifically, the authors consider three different levels of uncertainty: low (c = 0.05), medium (c = 0.25), and large (c = 0.75).In addition, the edges are categorized into two parts for the stochastic scenario following these conditions.If the node index is even, we classify the corresponding edge as stochastic.Otherwise, we classify the edge as deterministic.
In the dynamic scenario, the travel times between nodes are predicted using a multiple regression model that takes into account the current dynamic conditions.This means that instead of relying on fixed or predefined travel times, the travel times between nodes are now estimated based on the real-time or dynamic information available at the time of planning the routes.This multiple regression model takes into account external factors such as weather and congestion conditions to calculate the dynamic cost.The weather adversity level w and congestion adversity level c, ranging from 0 (low) to 1 (high), serve as input variables for the regression model.These two parameters are generated from a uniform distribution, which indicates that extreme weather and traffic conditions are as likely to occur as more standard ones.This generation matches the dynamic nature of the system, as every condition is likely to occur at any point in time.Thus, the dynamic travel times are calculated with Equation (10) as follows: where t ij represents the deterministic time of the edge connecting nodes i and j, b e is a coefficient for edge standard time, b 1 represents the coefficient for the weather, and b 2 represents the coefficient of the congestion.In addition, the term ϵ represents the independent error term, assumed to be 0 to maintain the deterministic travel time value when the conditions are ideal (w and c parameters are 0).For our analysis, we have employed the values b e = 1, b 1 = 0.005, and b 2 = 0.0075, which was designed so the dynamic travel time varies between the deterministic travel time and 1.125 times the deterministic travel time.In addition, the edges are categorized into two parts for the dynamic scenario following these conditions.If the node index is even, we classify the corresponding edge as dynamic.Otherwise, we classify the edge as deterministic.
In the hybrid scenario, we combine three distinct edge types: deterministic, stochastic, and dynamic edges.In determining the type of edge, we implement these specific conditions.If the node index is even, we classify the corresponding edge as stochastic.Otherwise, if the node index is divisible by three, we classify the edge as dynamic.If neither of these conditions is met, we assign the edge as a deterministic edge.This systematic approach ensures the accurate classification of edges within our model.
Due to the uncertain nature of travel times, there is a possibility of route failure when a UAV is unable to reach the destination depot within the designated time limit.Therefore, the reward collected by a particular UAV, r d , is defined as follows: Hence, when a UAV embarks on a route, it faces uncertainty in the time it takes to travel between nodes due to the probabilistic nature of the travel times.If the UAV fails to complete the planned route within the limited time, it forfeits all the rewards it accumulated during its journey.It is important to note that any partial rewards obtained are considered valid solely if the UAV manages to reach the destination node before the end of its travel time.

Results and Discussion
Table 2 shows the experimental results for each instance.The first column identifies the instances, while the remaining columns show the obtained results under the four considered scenarios.The first section of the table reports the best-known solutions for the deterministic variant of the problem obtained from the literature (BKS), our best-found solutions for the deterministic version of the problem (OBD), and the percentage gaps when both solutions are compared.The second section of the table reports the obtained results for the stochastic scenario.First, the best-found solutions for the deterministic scenario when evaluated in a stochastic environment are reported (OBD-S).To compute this solution, an intensive simulation process has been applied to the OBD.The main idea behind this process is to assess the best-found deterministic solutions under stochastic uncertainty.Next, the best-found stochastic solutions obtained using our sim-leanheuristic approach are reported (OBS).This approach considers the stochastic elements during the search for a high-quality solution.The next section of the table shows the obtained results for the dynamic scenario.First, the best-found solutions for the deterministic version when evaluated under a dynamic environment are shown (OBD-Dy).Similarly, the best-found solutions for the dynamic scenario are reported (OBDy).Lastly, the results for the hybrid scenario are reported.First, the best-found solutions for the deterministic version when evaluated in a hybrid scenario are shown (OBD-H).Next, the best-found solutions for the hybrid scenario are reported (OBH).
It is noteworthy that for each instance, the best-found solutions obtained by our sim-learnheuristic approach are consistently closer to the BKS when compared with the best-found deterministic solutions simulated under the corresponding types of uncertainty.In other words, our best-found solutions for the deterministic version of the problem are sub-optimal for the different stochastic, dynamic, and hybrid scenarios.For instance, considering the dynamic scenario, when simulating the OBD, solutions often exhibit significant fluctuations.This is exemplified by the p5.3.zinstance, as exemplified by p5.3.z,where OBD-Dy yields an expected reward of 2.34 and OBDy yields an expected reward of 1525.00.This difference can be attributed to the way our sim-learnheuristic approach explores the search space for solutions.Specifically, our novel methodology considers this uncertainty throughout the search for high-quality solutions.In other words, incorporating uncertainty elements in the search process leads to superior results compared to solely solving a deterministic version of the problem and then applying the solution in a real-life scenario with dynamic uncertainty.This fundamental difference serves as the key factor behind the substantial disparities observed between the best-found solutions obtained using our sim-learnheuristic method and the results obtained simulating the best-found deterministic solutions under the different uncertainty scenarios.Moreover, if we compare the computed averages for the different scenarios, we observe that we can classify the four different scenarios based on their uncertainty levels.Firstly, the deterministic scenario can be considered a reference scenario with perfect information (i.e., without uncertainty) for the expected reward under different uncertainty conditions.Conversely, in the stochastic scenario, we observe a lower uncertainty level, resulting in an average expected reward of 221.87 and 243.02 for the OBD-S and OBS, respectively.This can be attributed to half of the edges' travel times being stochastic in nature, while the other half being deterministic.Similarly, for the hybrid scenario, we can observe a medium level of uncertainty which results in an average expected reward of 209.73 and 233.10 for the OBD-H and OBH, respectively.Lastly, considering the dynamic scenario, it demonstrates the highest level of uncertainty, driven by the dynamic conditions that result in highly variable travel times.The resulting average expected reward is 97.76 and 243.02 for the OBD-Dy and OBDy, respectively.The dynamic scenario comprises half the dynamic and half the deterministic travel time, while the hybrid scenario consists of roughly one-third of the dynamic and one-third of the stochastic travel times.In other words, the combined uncertainty of the travel times in the dynamic scenario surpasses that of the hybrid and stochastic scenarios.
Figure 4 shows an overview of Table 2 which details the performance of our simlearnheuristic approach for all considered scenarios.The horizontal and vertical axes represent the four uncertainty scenarios and the percentage gap obtained with respect to the BKS reported in the literature, respectively.Median values are represented by triangles and lines, while outliers are presented by circles.The average percentage gap for the OBD-S is 33.48, while the average percentage gap for the OBS is 30.16.This means that, on average, the OBS solutions are closer to the BKS solutions in the stochastic scenario.In the hybrid scenario, if we compare the percentage gaps of the OBD-H with the OBH, we can see that the OBH outperforms the OBD-H and is closer to the BKS.However, the average gap for OBD-Dy is 55.12.This very high gap indicates that the deterministic approach does not adapt well to the changing dynamics of the system and results in considerably worse outcomes.

Conclusions and Future Work
This paper presents a hybrid version of the TOP encompassing deterministic, stochastic, and dynamic travel times.To address this problem, a novel sim-learnheuristic methodology is proposed, effectively combining a savings-based heuristic algorithm, Monte Carlo simulations, and a multiple regression model.By integrating stochastic and dynamic components in the sim-learnheuristic methodology, this approach captures the impact of randomness and variability, obtaining high-quality solutions in scenarios with uncertainty.This combination allows for the modeling of scenarios with both stochastic and dynamic components simultaneously.The computational experiments reveal that relying on the best solutions found for the deterministic version of the problem can lead to high variance and unreliability in realistic scenarios with uncertainties.These uncertainties can result in increasing travel times due to factors like traffic congestion and weather conditions.In contrast, the sim-learnheuristic approach is able to find high-quality solutions in uncertainty scenarios as it incorporates the stochastic and dynamic components in the search for solutions that maximize the collected rewards.This approach ensures the generated solutions are both reliable and robust, even in uncertain conditions with stochastic and dynamic travel times.
Moving forward, our future work encompasses two key aspects.Firstly, we aim to solve the multi-depot variant of the TOP.Secondly, we intend to enhance the realism of our approach by employing discrete event simulations and more intricate machine learning models.This will enable us to account for diverse interactions and complex environmental conditions, such as synchronization between various vehicles or UAV battery levels.

TFigure 1 .
Figure 1.Hybrid variant of the TOP considering different kinds of travel times.

Figure 4 .
Figure 4. Gaps with respect to BKS for the different scenarios.

Table 1 .
Summary of reviewed papers.

Table 2 .
Experimental results for the different scenarios.