Combining Parallel Computing and Biased Randomization for Solving the Team Orienteering Problem in Real-Time

Abstract: In smart cities, unmanned aerial vehicles and self-driving vehicles are attracting increasing attention. These vehicles can use ultra-reliable telecommunication systems, Internet-based technologies, and navigation satellite services to locate their customers and the other vehicles in their team, and to plan their routes accordingly. Furthermore, the team of vehicles should serve its customers efficiently and by a specified due date. In exceptional cases, such as after a traffic accident or under extreme weather conditions, the vehicles might need to be coordinated in real-time. This paper models the planning of vehicle routes as a team orienteering problem. In addition, an ‘agile’ optimization algorithm is presented to plan these routes for drones and other autonomous vehicles. This algorithm combines an extremely fast biased-randomized heuristic with a parallel computing approach.


Introduction
Sustainable cities and communities are identified as one of the 17 sustainable development goals proposed by the UN [1]. Achieving this goal requires a significant renewal of the way urban space is perceived. In this context, smart cities are envisaged as the main driver of such a transformation. Advanced materials, sensors, electronics, and networks embedded in our physical and social systems constitute the core concept of a smart city, allowing for more sustainable and resilient societies [2].
Smart cities are ever-changing environments [3] that aim to provide a high quality of life to their citizens, supported by advances in information and communication technology and the integration of the Internet of Things (IoT). As a result, new mobility modes are being considered, such as ride-sharing [4] or the incorporation of electric vehicles [5].
Unmanned aerial vehicles (UAVs), commonly known as drones, have attracted significant attention in the last decade, given their potential for new applications [6]. For example, in the case of smart cities, they can work as aerial base stations, either collecting data from mobile and ground sensors or serving as sensor-mounted aerial platforms [7]. In that way, an on-demand data service can be realized, covering a larger number of sensors and users.
A major disadvantage of these devices is their limited energy capacity, which adds complexity to implementing these types of services, especially when serving remote sites [8]. Several operational research lines arise to cope with this limitation: (i) the optimal drone placement (ODP) problem, which guarantees the coverage of static or dynamic targets while minimizing the required energy [9,10]; (ii) the problem of selecting the optimum charging station once the drones have finished their tasks [11]; and (iii) the route-planning problem and all its variants [12,13]. Considering that a fleet of drones can work in a coordinated manner to achieve a common goal, such as the aerial surveillance of an extended area, the problem of coordinating their individual paths while accounting for their available energy can be envisioned as a team orienteering problem (TOP). Following the example of aerial surveillance, the objective of the TOP would be to maximize the area covered by the fleet of drones and, therefore, the efficiency of the fleet. The drone scheduling problem (DSP) is an extension of the TOP that considers multiple depots (stations) [14]. Figure 1 depicts the conceptual framework of several operational research problems regarding drones.
Smart cities allow for real-time data; nevertheless, this potential can come to fruition only if combined with real-time decision-making. This is the case not only for drones, where optimal routing should be established accounting for dynamic conditions (e.g., dynamic targets), but also for other types of autonomous vehicles working collaboratively, which might become increasingly frequent in transportation and mobility activities. Traditional optimization approaches cannot handle real-time conditions effectively [15]. Therefore, new optimization approaches are required to deal with large-scale systems in 'real-time' (e.g., less than one second).
The development of efficient solving approaches becomes especially challenging given the large-scale problems that real applications involve and the fact that the TOP, like most routing problems, is NP-hard [16].
In order to tackle the challenges imposed by smart cities, this paper proposes the use of agile optimization algorithms [17] to solve TOPs with dynamic conditions in 'real-time', i.e., in less than one second of wall-clock time. In this context, optimization algorithms are redesigned to become (i) extremely fast, to support real-time decision-making; (ii) extensible, to accommodate parallelization techniques; (iii) flexible, to handle different problems; (iv) parameter-less, to avoid parameter fine-tuning; and (v) dynamic, to be rerun as new data become available (re-optimization).
Agile optimization includes the hybridization of biased-randomized algorithms [18] and parallel computing [19]. Biased-randomized algorithms result from introducing skewed probability distributions into deterministic heuristics [20]. These deterministic heuristics handle even large-scale problems efficiently. The skewed distributions introduce a randomized, common-sense effect into the heuristics and turn them into probabilistic algorithms. Many instances of a probabilistic algorithm can be run in parallel, even thousands simultaneously, to generate different solutions using an affordable computing device. In the same wall-clock time that the deterministic heuristic needs to find one solution, the probabilistic algorithms thus generate many alternative solutions, some of which could outperform the one provided by the deterministic heuristic. Biased-randomization techniques have been successfully tested on a variety of optimization problems [21–23].
Parallel algorithms have been used by a number of researchers to solve routing problems. For example, Roberge and Tarbouchi [24] developed a framework for downloading data from sensors using unmanned aerial vehicles. They used a genetic algorithm as the single source algorithm to optimize the routes of the vehicles; this algorithm was run in parallel on a graphics processing unit and aimed to avoid collisions. Yelmewad and Talawar [25] used a parallel strategy based on graphics processing units to reduce the execution time needed to solve the vehicle routing problem. Other researchers used a multi-core approach for parallel computing. Abbasi et al. [26] aimed to reduce the cost of intelligent systems in transportation. For that purpose, they studied multi-core processors as well as graphics processing units and concluded that the parallelization resources are efficiently utilized. Multi-threading in a multi-core system was used in the multi-start approach of [27]. Other researchers, such as Huang et al. [28], studied the protocols used for data transmission between different nodes in a network. These protocols aim to reduce delays and energy consumption in different IoT devices.
In previous publications, the stochastic version of the TOP was investigated [29,30], using a simheuristic approach to handle the stochasticity of travel times. Panadero et al. [29] integrated Monte Carlo simulation with a biased-randomized heuristic, whereas Panadero et al. [30] combined a savings-based heuristic with the variable neighborhood search metaheuristic and integrated them into a simheuristic approach. Furthermore, the dynamic change of customers' rewards was investigated by Reyes-Rubiano et al. [31], who extended a biased-randomized heuristic into a learnheuristic.
This paper presents a parallelized biased-randomized algorithm to solve the TOP. Its main contributions are (i) proposing a fast heuristic able to generate real-time solutions of reasonably good quality for the TOP; (ii) extending the heuristic into a biased-randomized algorithm to generate many alternative promising solutions, some of which might outperform the original one found by the heuristic; and (iii) integrating the biased-randomized algorithm into a parallelization framework to generate solutions in real-time.
The remainder of the paper is structured as follows. Section 2 provides some related work on the TOP, which serves as a scenario for testing our concepts, and Section 3 introduces a mathematical formulation of the considered problem. Section 4 presents a biased-randomized algorithm that can be easily parallelized. The actual parallelization concepts are analyzed in Section 5. Section 6 describes the set of computational experiments that have been carried out to test both exact and approximation-based solution methods. Section 7 analyzes in detail the obtained results. Finally, Section 8 summarizes the main contributions of the work and outlines possible future work.

Related Work
The Orienteering Problem (OP) is among the most widely studied combinatorial optimization problems; the Traveling Salesman Problem (TSP) with profits, or Selective TSP, is also commonly referred to as the OP [32]. The term OP was first introduced by Golden et al. [33], who defined it as a TSP involving knapsack constraints. The goal of the OP is therefore to simultaneously (i) minimize the travel cost, which usually appears as a constraint, and (ii) maximize the collected profit associated with each node. Moreover, it is worth mentioning that the OP is NP-hard [34]. We refer the reader to the survey by Feillet et al. [35] for further information about OP solution approaches and, more recently, to [36,37] for the numerous extensions and recent variants of the OP.
The Team Orienteering Problem (TOP) is a well-known extension of the OP [38], and it is also referred to as the Vehicle Routing Problem (VRP) with profits [37]. In particular, the TOP considers a group of agents for collecting the profits and provides a solution with several tours [39]; in that sense, the TSP setting turns into a VRP setting. In addition, most studies on the TOP originate from previous studies devoted to the OP and the VRP, which have been adapted or extended to the context of the TOP.
Regarding TOP solution techniques, few approaches based on exact methods have been proposed over the last decades, such as Branch-and-Price [40,41], Branch-and-Cut [42], and Column Generation [43]. However, (meta)heuristic methods are preferable to tackle large instances and reduce computational times. These methods can find (near-)optimal solutions for the TOP, and include Tabu Search [38,44], Variable Neighborhood Search [45], Ant Colony [46], Memetic Algorithm [47], Simulated Annealing [48], Particle Swarm Optimization [49], Genetic Algorithm [50], Pareto Mimic Algorithm [51], and Hybrid Harmony Search [52]. In recent years, simheuristics have been introduced to solve large combinatorial optimization problems [53–55]. Regarding the TOP, all these state-of-the-art techniques have been tested to show that their solutions are competitive in terms of objective function value and computational time. Furthermore, extensive comparisons of algorithms on the instances of Chao's benchmark [39] can be found in the literature, such as the one in Dang et al. [49]. Commonly, new algorithms focus on reaching the best-known solution (BKS) for the instances in this benchmark. In this paper, however, we constrain the computational time to only one second. We then compare our solutions with the BKS and show, through a complete analysis, that our solutions reach the BKS in most instances.

A Formal Description of the Problem
The TOP can be mathematically formalized as follows. Let us assume an undirected graph G = (N, A), where: (i) N = {0, 1, 2, ..., n + 1} is the set of nodes, comprising the n customers, N* = {1, 2, ..., n}, a source node 0, and a sink node n + 1; and (ii) A is the set of edges (i, j), with i ≠ j, connecting nodes i ∈ N \ {n + 1} and j ∈ N \ {0}. At the source, node 0, there is a team composed of a limited number of vehicles, m ≥ 1. Furthermore, there is a maximum time, t_0 > 0, for completing each open route (from the source node to the sink node, with at least one customer node in between). Each customer node i ∈ N* has a constant reward u_i > 0, while u_0 = u_(n+1) = 0. Rewards can only be collected on the first visit to a node. Each edge (i, j) ∈ A has an associated travel time, t_ij > 0. It is assumed that t_ij = t_ji for all (i, j) ∈ A, i.e., the matrix of travel times, T = [t_ij], is symmetric, and that it satisfies the triangle inequality.
A solution to the TOP is a set of feasible routes departing from the source node, visiting a subset of customers in a specified order, and arriving at the sink node. In other words, each route starts at the source node 0, collects the reward from one or more customers, and ends at the sink node, without exceeding the maximum travel time allowed, t_0. Let us consider the binary decision variable x_ij, which takes the value 1 if the edge (i, j) ∈ A is used by a vehicle to collect the reward at node j, and 0 otherwise. In addition, let us define the continuous variable w_i, which represents the total travel time that the vehicle has spent after visiting customer i. The objective function (1) is the maximization of the total collected reward, i.e., the sum of u_j · x_ij over all edges (i, j) ∈ A. This objective function is subject to constraints (2) to (11). Constraints (2) and (3) impose that each customer node has at most one edge departing from it or entering it, respectively. In addition, constraint (4) imposes that, for each customer node, the number of incoming edges is equal to the number of outgoing edges (due to the previous constraints, this value will be either 0 or 1). Constraint (5) ensures that the number of routes starting at the source node is the same as the number of routes arriving at the sink node. Constraint (6) ensures that the number of routes is less than or equal to the number of available vehicles, m. Two constraints, (7) and (8), are introduced to guarantee both the connectivity of the solution and the maximum travel time requirement. Constraint (9) avoids degenerate routes. Finally, constraints (10) and (11) define the range of the associated variables.
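The verbal description above can be turned into a small feasibility checker. The following is only a minimal sketch under stated assumptions: nodes are indexed 0 (source), 1..n (customers), and n + 1 (sink); `t` is a square travel-time matrix; `u` is a list of rewards; and the function names `route_time` and `evaluate_solution` are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def route_time(route: Sequence[int], t: Sequence[Sequence[float]]) -> float:
    """Total travel time along the consecutive nodes of a route."""
    return sum(t[a][b] for a, b in zip(route, route[1:]))


def evaluate_solution(routes: List[List[int]], t, u, t_max: float, m: int) -> float:
    """Return the total collected reward, or raise ValueError if infeasible."""
    sink = len(t) - 1  # assumed indexing: 0 = source, 1..n = customers, n+1 = sink
    if len(routes) > m:
        raise ValueError("more routes than available vehicles")
    visited = set()
    for route in routes:
        # Each route is open: source -> at least one customer -> sink.
        if len(route) < 3 or route[0] != 0 or route[-1] != sink:
            raise ValueError("route must go from source to sink via a customer")
        for c in route[1:-1]:
            # A reward can be collected only once, so no customer may repeat.
            if not 0 < c < sink or c in visited:
                raise ValueError("invalid or repeated customer")
            visited.add(c)
        if route_time(route, t) > t_max:
            raise ValueError("maximum travel time exceeded")
    return sum(u[c] for c in visited)
```

For instance, with four nodes placed on a line at positions 0, 1, 2, and 3 and rewards `[0, 5, 7, 0]` (a made-up toy instance), the single route `[0, 1, 2, 3]` is feasible for a time limit of 3 and collects a reward of 12.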

From a Heuristic to a Biased-Randomized Algorithm
In this paper, a heuristic is extended into a biased-randomized algorithm to solve the TOP. The proposed algorithm adapts the concept of 'savings' introduced by Clarke and Wright [56] to the characteristics of the TOP: (i) the origin and destination depots are different, (ii) some of the customers might not be served, and (iii) the reward, as well as the savings in time or distance, is considered when constructing routes. Then, the constructive heuristic is extended with skewed probability distributions, which introduce a non-uniform randomization effect into the procedure of constructing solutions. Figure 2 presents the proposed constructive heuristic. An initial dummy solution is built as follows: for each customer i ∈ N, a vehicle departs from the origin depot (node 0), visits customer i, and then heads towards the destination depot (node n + 1). If the route associated with customer i in the dummy solution does not satisfy the driving-range constraint, customer i is discarded from the solution because it cannot be served under the current settings of the problem.
Next, the 'enriched savings' associated with each edge connecting two different customers are computed; that is, the benefit obtained by visiting both customers in the same route instead of using two different routes. The enriched savings of an edge consider both the travel time required to traverse the edge and the aggregated reward generated by visiting both customers. Therefore, the enriched savings associated with an edge (i, j), s'_ij, are defined as s'_ij = α · s_ij + (1 − α) · (u_i + u_j) to account for the trade-off between the time-based savings, s_ij = t_i(n+1) + t_0j − t_ij, and the aggregated reward, u_i + u_j. Here, α ∈ (0, 1) is a parameter that depends on the heterogeneity of the customers' rewards, and its value is determined experimentally. In general, α is set close to zero in problems with highly heterogeneous rewards and close to one in problems with homogeneous rewards.
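As an illustration of the formula above, the following sketch computes the enriched savings for every ordered pair of customers and sorts them from higher to lower values. It assumes the same indexing as the formal description (0 = source, n + 1 = sink); the function name `enriched_savings` and the tuple layout are hypothetical choices made for this example. The formula is evaluated once per direction of each edge.

```python
def enriched_savings(t, u, alpha):
    """Return a list of (enriched saving, i, j), sorted from higher to lower.

    t: square travel-time matrix, u: rewards, alpha: weight in (0, 1).
    """
    sink = len(t) - 1
    arcs = []
    for i in range(1, sink):
        for j in range(1, sink):
            if i == j:
                continue
            # Time-based saving of serving j right after i in the same route.
            s_ij = t[i][sink] + t[0][j] - t[i][j]
            # Trade-off between time-based saving and aggregated reward.
            enriched = alpha * s_ij + (1 - alpha) * (u[i] + u[j])
            arcs.append((enriched, i, j))
    arcs.sort(key=lambda arc: arc[0], reverse=True)
    return arcs
```

On a toy instance with four nodes on a line at positions 0, 1, 2, and 3, rewards `[0, 5, 7, 0]`, and `alpha = 0.5`, the pair (1, 2) obtains a value of 7.5 and the pair (2, 1) a value of 6.5, since leaving the route after customer 2 is closer to the sink.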
For each edge, there are two associated savings, one for each direction in which the edge can be traversed: although the matrix of travel times is symmetric, the source and sink nodes are different, so s_ij and s_ji generally differ. Each edge therefore generates two arcs, and each arc connects two potential routes to be merged. The list of arcs is then sorted from higher to lower savings, and routes are merged based on this sorted list. In each iteration of the merging process, the arc at the top of the sorted list is selected, and the two routes connected by this arc are merged into a new route if the driving-range constraint is not violated. Finally, the list of routes is sorted in decreasing order of total reward. The routes with the highest rewards are selected, and the number of selected routes equals the size of the vehicle fleet.
The described heuristic is extended into a probabilistic algorithm as follows. The greedy behavior of the heuristic is altered by combining it with biased-randomization techniques, which introduce non-uniform random behavior [57]. In our work, we use a geometric probability distribution with a parameter β (0 < β < 1), which controls the greediness of the randomized behavior: the closer β is to one, the more likely it is that the arc at the top of the list is selected. In our experiments, we set β to 0.3, a value obtained from parameter-tuning experiments. Thus, the heuristic becomes a randomized algorithm that still retains the greedy logic of the original heuristic.
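The constructive procedure and its biased-randomized extension can be sketched as follows. This is a simplified illustration and not the authors' implementation: the function name `build_solution`, the bookkeeping through the `owner` dictionary, and the default `alpha = 0.5` are assumptions made for the example. With `beta=None` the arc at the top of the list is always taken (greedy heuristic); otherwise the position in the sorted list is drawn from a geometric distribution with parameter `beta`.

```python
import math
import random


def route_time(route, t):
    """Total travel time along the consecutive nodes of a route."""
    return sum(t[a][b] for a, b in zip(route, route[1:]))


def build_solution(t, u, t_max, m, alpha=0.5, beta=None, rng=None):
    """Savings-based constructive heuristic (greedy if beta is None)."""
    rng = rng or random.Random()
    sink = len(t) - 1
    # Dummy solution: one route 0 -> i -> sink per customer that can be served.
    routes = {i: [0, i, sink] for i in range(1, sink)
              if t[0][i] + t[i][sink] <= t_max}
    owner = {i: i for i in routes}  # customer -> key of the route containing it
    # Arc (i, j): i ends one route and j starts another one.
    arcs = [(alpha * (t[i][sink] + t[0][j] - t[i][j])
             + (1 - alpha) * (u[i] + u[j]), i, j)
            for i in routes for j in routes if i != j]
    arcs.sort(key=lambda arc: arc[0], reverse=True)
    while arcs:
        if beta is None:
            k = 0  # greedy: always the top of the list
        else:
            # Geometric position: small indices are the most likely ones.
            x = 1.0 - rng.random()  # uniform in (0, 1]
            k = int(math.log(x) / math.log(1.0 - beta)) % len(arcs)
        _, i, j = arcs.pop(k)
        key_i, key_j = owner[i], owner[j]
        ri, rj = routes[key_i], routes[key_j]
        # Merge only if i is the last customer of ri and j the first of rj.
        if key_i == key_j or ri[-2] != i or rj[1] != j:
            continue
        merged = ri[:-1] + rj[1:]
        if route_time(merged, t) <= t_max:
            routes[key_i] = merged
            del routes[key_j]
            for c in rj[1:-1]:
                owner[c] = key_i
    # Keep the m routes with the highest total reward.
    ranked = sorted(routes.values(),
                    key=lambda r: sum(u[c] for c in r), reverse=True)
    return ranked[:m]
```

Calling `build_solution(t, u, t_max, m, beta=0.3, rng=random.Random(seed))` with different seeds yields different solutions, which is what makes the procedure suitable for running many independent instances.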

Extending to a Parallel Biased-Randomized Algorithm
The intrinsic characteristics of biased-randomized algorithms make them good candidates for parallelization. They can be used in a multi-start framework, which is a sequential and iterative approach in which a different solution is generated in each iteration [58]. In our paper, each iteration of the multi-start method is composed of two phases: in the first one, a new solution is generated using the biased-randomized heuristic, and in the second one, the algorithm compares the newly generated solution with the best solution obtained so far and updates the latter whenever appropriate.
Instead of using this sequential approach, multiple instances of the biased-randomized algorithm can be run in parallel (e.g., each using a different computer thread or core) by assigning a different seed of the pseudo-random number generator to each instance (Figure 3). In parallel programming, these types of methods, where the same code is executed using different input parameters without requiring communication or dependency between the processes, are known as embarrassingly parallel algorithms [59]. These algorithms are usually easy to adapt for parallel execution, and they are good candidates to be executed using massively parallel processing architectures [60].
In the context of unmanned aerial vehicles or self-driving vehicles, decisions should be made in short times, e.g., in seconds or even milliseconds. Using a distributed computing architecture, such as a cloud platform, is unsuitable in this context due to the relatively high latencies of the network. Thus, on-chip high-performance computing architectures must be employed, such as multi-core processors or graphics processing units (GPUs). GPUs consist of hundreds of smaller low-energy cores that are divided into groups called streaming multiprocessors [61]. This architecture enables parallel computation with relatively high energy efficiency compared to traditional processors. Using modern multi-core processors is another appropriate approach for running embarrassingly parallel algorithms. Multi-core processors have become very popular in recent years, in part because modern computers contain several processing units or cores in a single chip [62]. Moreover, multi-core processors can execute several threads per core simultaneously (hyper-threading), which makes them a valuable and inexpensive option for executing parallel programs.
The combination of biased-randomized algorithms and parallel architectures is well suited to obtaining high-quality solutions in complex scenarios in just a few seconds or even milliseconds.
In our approach (Algorithm 1), we have used a multi-core shared-memory programming paradigm. The algorithm launches asynchronous threads, each of which receives as input parameters an independent instance of the BR-heuristic, a seed for the pseudo-random number generator, and a list (solutionList). The list is a variable shared by all the threads, and each thread stores in it the best solution found in one second by its own instance of the BR-heuristic. After creating and starting all the threads, the algorithm waits until all of them have finished. At this point, the parallel execution ends, and the shared variable solutionList contains the best solutions found by all the threads. Then, the program executes its last step: sorting the list of solutions by reward and selecting the best one (bestSolution).
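The structure just described (independent seeded workers, a shared list, a join, and a final sort) can be sketched as follows. This is only an illustration under stated assumptions: `construct` and `score` are hypothetical callables standing in for the BR-heuristic and the reward of a solution, and `run_parallel` is a made-up name. Note also that threads in standard CPython do not execute CPU-bound code truly in parallel, so this sketch mirrors the control flow rather than the speed-up that native threads (the paper reports a Java implementation) or a process pool would provide.

```python
import random
import threading
import time


def run_parallel(construct, score, n_threads, time_limit_s, base_seed=0):
    """Run n_threads seeded multi-start workers; return (best, all_bests)."""
    solution_list = []       # shared by all threads
    lock = threading.Lock()  # protects the shared list

    def worker(seed):
        rng = random.Random(seed)  # each thread has its own generator
        deadline = time.perf_counter() + time_limit_s
        best = construct(rng)
        while time.perf_counter() < deadline:
            candidate = construct(rng)
            if score(candidate) > score(best):
                best = candidate
        with lock:
            solution_list.append(best)  # store this thread's best solution

    threads = [threading.Thread(target=worker, args=(base_seed + k,))
               for k in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()  # wait until every thread has finished
    solution_list.sort(key=score, reverse=True)
    return solution_list[0], solution_list
```

In the setting of the paper, `construct` would build one biased-randomized solution of the TOP and `score` would return its total reward; the time limit would be one second.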

Numerical Experiments
This section describes the numerical experiments we have carried out to test the concept of agile optimization when applied to the TOP. First, we have employed exact methods to solve the optimization problem. This initial experiment allows us to understand the possible limitations of these methods in terms of computational times and the size of the instances they can solve in practical applications. Then, we have employed our parallelized biased-randomized algorithm to generate solutions in real-time. Finally, these solutions have been compared with the optimal or near-optimal ones available in the scientific literature whenever possible.
Both solving approaches have been tested using the benchmark instances presented in [39], which are available in the repository http://www.mech.kuleuven.be/en/cib/op/instances (accessed on 17 December 2021). This benchmark has been widely used in previous works to test the performance of algorithms aimed at solving the deterministic TOP. The benchmark comprises a total of 354 instances, which are divided into seven different sets (Table 1). The number of nodes, the node locations, and the rewards are identical for all instances within a set. However, the maximum allowed driving range (t_0) and the number of vehicles vary between the instances. The sum of all customers' rewards is shown in column "Reward" in Table 1. The number of vehicles varies between 2 and 4, and the driving range varies between the values tabulated in Table 1, increasing from one instance to the next by the tabulated step. Each instance in a set is named px.y.z, where x denotes the set number, y is the number of vehicles, and z identifies the maximum driving range.
The proposed mathematical model has been implemented using IBM ILOG CPLEX Optimization Studio v12.6.2, and the computations were conducted on a PC with an Intel i5-6500 quad-core processor running at 3.6 GHz and with 20 GB of RAM. The CPLEX default options have been used in the experimentation, with the following settings: (i) a time limit of 3600 s, (ii) a relative gap of 0.01%, and (iii) an integrality tolerance of 0.001%. The computational results are reported in Table 2. Columns 1 and 5 identify the instance for sets p1 and p2, respectively. Columns 2 and 6 show the solution value obtained with CPLEX. The computational time (in seconds) required to obtain these values is given in columns 3 and 7. Furthermore, the CPLEX final status indicates whether the solution is guaranteed to be optimal or not. In the latter case, the best-known solution (BKS) from the literature is provided.

Table 1. Columns: Set (x), Nodes, Reward, Vehicles (y), Driving Range (z), Number of Instances.
It is observed that in set p2 (instances with 21 nodes), optimal solutions can usually be achieved after a few seconds or minutes. However, in set p1 (instances with 33 nodes), only a few optimal solutions can be achieved, whereas others cannot reach the optimal solution even when employing up to 60 min of computation. It is noticed that some solutions differ significantly from the BKS, such as instances p1.2.r and p1.3.p. In conclusion, exact methods are not a feasible option to provide real-time solutions for large-sized instances with hundreds of nodes.

Solving the TOP with Our Agile Optimization Algorithm
The proposed heuristic has been implemented using Java SE 8.0 and tested on a workstation with an Intel Xeon E5-2650 v4 multi-core processor with 16 cores and 32 GB of RAM. Each instance is run for 1 s using a different number of threads (from 1 to 128) to evaluate how the quality of the solution changes as the number of threads increases. Each thread performs a different run of the algorithm, and each run uses a different seed for the pseudo-random number generator. In this way, each thread explores a different path in the solution space while always keeping the logic behind the constructive heuristic. Increasing the number of threads increases the total computing time (as more computing resources are running in parallel) but not the wall-clock time, which is still 1 s. To assess the effectiveness of the proposed approach, the obtained solutions have been compared against the BKS from the literature. In particular, the performance of the algorithm is measured against the current BKS reported by [51]. Tables 3 and 4 show the instances and the solutions found. For the BKS, the tables report the obtained reward and the computational time (in seconds) needed to reach it. Our best solution (OBS) is reported for each number of threads. For the OBS, only the computational time using 128 threads is recorded in Tables 3 and 4, because 128 threads found the best OBS among the different numbers of threads. The average percentage gap between the OBS for 128 threads and the BKS is also calculated. Notice that increasing the number of threads can lead to better solutions. For example, the solution found using 128 threads is better than the one found using 16 threads for instance p1.3.m (Table 3). These results are discussed in the next section.
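The paper does not spell out the gap formula, so the following sketch assumes the usual definition for a maximization problem, namely the relative shortfall of the OBS with respect to the BKS, expressed as a percentage. The function names and the reward values used in the usage note are hypothetical and are not taken from Tables 3 and 4.

```python
def gap_percent(bks, obs):
    """Percentage gap of our best solution (obs) with respect to the BKS."""
    return 100.0 * (bks - obs) / bks


def average_gap(pairs):
    """Average percentage gap over a list of (bks, obs) reward pairs."""
    gaps = [gap_percent(bks, obs) for bks, obs in pairs]
    return sum(gaps) / len(gaps)
```

For example, with made-up rewards, `gap_percent(200, 190)` gives 5.0, and averaging it with an instance in which the BKS is matched, `average_gap([(200, 190), (100, 100)])`, gives 2.5.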

Analysis of Results
The BKS is obtained in at least 75% of the instances in sets p1 and p2 (Figure 4), even when using a reduced number of threads (between 1 and 8 in many cases). In set p3, the BKS is reached in at least 50% of the instances, and in sets p5, p6, and p7 in 25% of them. Figure 4 shows the gaps for the experiments with 128 threads. Set p4 deserves special attention due to its topology and size (number of nodes, node locations, and rewards). Indeed, it was the most challenging set in the benchmark proposed by Chao et al. [39]. No matter the number of vehicles and the maximum driving range, the resulting percentage gap is higher than in the rest of the sets, with a mean value close to 8%. It can also be observed that the larger the instance, the larger the mean gap tends to be; instances in sets p4 and p7 have a larger percentage gap (8% and 5%, respectively). Nevertheless, the overall percentage gap across the 354 tested instances is just 3%, which is a remarkable result considering that it is obtained in less than one second (see Tables 3 and 4). In 54% of the instances, the BKS is achieved by using a reduced number of threads.
Regarding the computational times to reach our best solutions (OBS), Figure 5 depicts a boxplot comparing the average computational time (in seconds) invested by our approach in each set with the computational times used in [51] to reach the BKS. Our times are very competitive compared to the average time required to obtain the BKS, approximately 36 s. Notice that by investing an average time of 0.53 s, we obtain an average gap of less than 2.60%, which shows that our approach is highly competitive for use in real-time scenarios. Notice also that if we executed our approach serially, we would need, in the worst case, 128 s to reach the same results (128 runs of 1 s each). Therefore, parallel techniques are necessary when dealing with real-time, highly dynamic scenarios. Finally, Figure 6 illustrates how our agile optimization approach can reduce the gap with respect to the BKS; increasing the number of parallel threads reduces the gap between the solutions found and the BKS without increasing the wall-clock time, which is always limited to 1 s.

Conclusions
This paper discusses the need for 'agile' optimization in the context of data-driven smart cities, where unmanned aerial vehicles and self-driving vehicles might require solutions to complex problems in real-time (less than a second). The paper uses the team orienteering problem to illustrate these concepts. The team orienteering problem is an NP-hard optimization problem, which challenges the capabilities of exact methods to provide optimal solutions in short computing times. For that reason, the paper proposes a greedy heuristic, which is then extended into a biased-randomized algorithm. This algorithm is then encapsulated into a parallelization framework, which allows multiple threads to run in parallel and the best solution to be selected. We tested our algorithm on a well-known set of benchmarks for the team orienteering problem. The experiments showed that our algorithm can generate high-quality solutions in less than one second of wall-clock time.
Our work could be extended by considering a dynamic environment, in which travel times and rewards change over time. Thus, the constructed routes might have to be modified during their execution to adapt to these changes. The constructed routes and decisions would be re-evaluated whenever a change is detected by sensors, in-route vehicles, video cameras, etc.