Hybrid Multiagent Collaboration for Time-Critical Tasks: A Mathematical Model and Heuristic Approach

: Principal–assistant agent teams are often employed to solve tasks in multiagent collaboration systems. Assistant agents attached to the principal agents are more ﬂexible for task execution and can assist them to complete tasks with complex constraints. However, how to employ principal–assistant agent teams to execute time-critical tasks considering the dependency between agents and the constraints among tasks is still a challenge so far. In this paper, we investigate the principal–assistant collaboration problem with deadlines, which is to allocate tasks to suitable principal–assistant teams and construct routes satisfying the temporal constraints. Two cases are considered in this paper, including single principal–assistant teams and multiple principal–assistant teams. The former is formally formulated in an arc-based integer linear programming model. We develop a hybrid combination algorithm for adapting larger scales, the idea of which is to ﬁnd an optimal combination of partial routes generated by heuristic methods. The latter is deﬁned in a path-based integer linear programming model, and a branch-and-price-based (BP-based) algorithm is proposed that introduces the number of assistant-accessible tasks surrounding a task to guide the route construction. Experimental results validate that the hybrid combination algorithm and the BP-based algorithm are superior to the benchmarks in terms of the number of served tasks and the running time.


Introduction
The hybrid principal-assistant collaboration architecture is often employed to improve the efficiency of task execution in a multiagent system [1,2], where the principal agents in a team are assisted by the assistant agents to complete the arrival tasks. Each assistant agent attaches to one principal agent, and is often more flexible for task execution (e.g., intelligent agents (robots) may be capable of executing tasks with special requirements that can hardly be achieved by humans). The introduction of assistant agents improves the flexibility of collaboration of agents in multiagent systems [1,3]. Actually, the principal-assistant collaboration problem can be considered as a new variety of the traveling salesman problem, concerning the cooperation of the principal agent routing and the assistant agent routing for tasks [4,5]. The principal-assistant systems are also employed in time-sensitive task scenarios [6], where the violation of temporal constraints will result in great losses in these missions.
Unfortunately, recent works [7][8][9] seldom address this issue. In this paper, we investigate the principal-assistant collaboration problem with deadlines, where each task is associated with an independent deadline and the principal-assistant teams need to find valid routes with temporal constraints. The problem is N P-hard because each deadline-TSP can be reduced to an instance of the principal-assistant collaboration problem with deadlines, where the endurance of assistant agents is zero. If and only if the optimal solution of the deadline-TSP can be found in polynomial time, the optimal solution of the principal-assistant collaboration problem with deadlines can be found in polynomial time. However, Farbstein and Levin [10] proved that the deadline-TSP is N P-hard in the graph metrics or tree metrics, and thus the principal-assistant collaboration problem with deadlines is N P-hard. The challenges of the principal-assistant collaboration problem with deadlines involve routing and cooperation [4,7]. First, the principal agent and the assistant agent need to cooperate to serve more vertices within the temporal constraints. Second, the combination of several principal-assistant routes is involved in complex allocations of tasks.
To investigate the problem thoroughly, we consider two scenarios involving different principal-assistant agent teams: the 1-principal-1-assistant scenario and the m-principalu-assistant scenario. In the 1-principal-1-assistant scenario, the principal-assistant team is composed of a principal agent and an assistant agent. We pay attention to the inner coordination between the principal agent and the assistant agent. In the m-principal-uassistant scenario, there are m > 1 principal-assistant teams, each of which is composed of one principal agent and u > 1 assistant agents. We focus on studying the source allocation and the cooperation among different team routes. Existing algorithms [4,[7][8][9] cannot be directly employed in these scenarios because their objective is to serve all tasks and minimize the time cost of the entire route, and the arrival time cannot be restricted to each vertex in the routes, so that the arrival time to each vertex may violate certain deadlines. The unmodified application of those algorithms will result in delays, which are not allowed in real-time systems. For the 1-principal-1-assistant scenario, a hybrid combination algorithm is proposed, which finds the optimal combination of partial routes generated by three methods, including a heuristic subroutine, an iterated local search subroutine, and a simulated annealing subroutine. The partial routes prefer to insert the vertex with a smaller deadline and higher profit available per unit time cost in the subsets, which are divided by the time cost of each vertex to the initial vertex. For the m-principal-uassistant scenario, a BP-based algorithm is proposed. The outline of this algorithm is to use the branch-and-price algorithm [11,12] to search for the optimal combination on the current principal-assistant route set and use the proposed density-based construction to enlarge the principal-assistant route set. The density-based construction evaluates the density of a vertex based on the number of residual assistant agent-accessible vertices surrounding it and the time cost between them and then inserts the vertex with the higher density and smaller deadline as the preference.
In the performance validation, the number of served vertices and the running time of algorithms are regarded as the evaluation criteria. In the 1-principal-1-assistant scenario, the hybrid combination algorithm is compared with an exact integer programming method, two adaptations based on the algorithms in [7,13], and some combinations of inner procedures. The hybrid combination algorithm proposed in this paper is only inferior to the exact method, and its running time is far less than that of the exact method when the graph size is less than 15. When the graph size ranges from 20 to 100, the hybrid combination algorithm serves the most vertices and costs a reasonable running time among the compared algorithms. In the m-principal-u-assistant scenario, the BP-based algorithm embedded with the density-based construction method is compared with two greedy algorithms that employ the line route construction method [14] or a local iterated search method [15] as the route construction and two BP-based algorithms embedded with the line route construction and local iterated search method. The BP-based algorithm embedded with the density-based construction method outperforms the other benchmark algorithms in terms of the number of served vertices and the running time.
The rest of this paper is organized as follows. In Section 2, the related work on the subject is reviewed. In Section 3, the 1-principal-1-assistant scenario is modeled and the hybrid combination algorithm is described. In Section 4, the m-principal-u-assistant scenario is modeled and the BP-based algorithm embedded with the density-based construction method is described. In Section 5, the experimental results of evaluating these algorithms are presented. In Section 6, the conclusions and future work are described.

The Fundamental Principal-Assistant Systems
The fundamental hybrid principal-assistant systems proposed by Murray and Chu [7] introduce the Flying Sidekick TSP (FSTSP) model which is closely related to the model proposed in this paper, where the principal agent travels on the graph and repeatedly releases and retrieves the assistant agent. The basic idea of the algorithm in [7] for the FSTSP is to construct an initial principal agent path first by some classic TSP algorithms, and then select a vertex in the route and insert it into another place, or command the assistant agent to serve it so that the reduction on the total time cost is at the maximum. The algorithm then proposed by Agatz et al. [4] constructs a principal agent-only path by the TSP algorithm and then uses a dynamic programming algorithm to split the principal agent path into a principal agent route and an assistant agent route. Moreover, Ha et al. [8] propose a Greedy Randomized Adaptive Search Procedure (GRASP) for the principalassistant to find a min-cost team route. Overall, these studies ignore the crucial deadlines factor in the principal-assistant systems, and thus, the task executing time cannot be ensured.

The Principal-Assistant Systems with Time Windows
There are only a small number of studies on hybrid principal-assistant systems with time windows. Sawadsitang et al. [9] improve the PDSTSP by introducing uncertainty regarding the takeoff and breakdown of assistant agents and time windows of tasks (GADOP). However, the time windows are coarse-grained, concretely. They provide a decomposition method to solve the problem. Differing from the GADOP problem, each task in this paper has an independent deadline. If the method for GADOP is applied directly, independent deadlines need to be clustered into a constant number of demands, which may give rise to a service delay duration execution, which is impermissible in our hard real-time system. To sum up, the collaborative relationship between the principal agent and the assistant agent is not well characterized, causing inefficiencies in the system; in their scenarios, the assistant agent starts from the initial vertex and returns to it, so the principal agent cannot make use of the assistant agent to improve the performance of the task execution.

The Deadline-TSPs
The deadline-TSP is a different kind of problem from the principal-assistant collaboration problem with deadlines. In deadline-TSPs, a single principal agent attempts to serve the vertices before their deadlines and returns to the initial vertex. Bansal et al. [13] propose an O(log n)-approximation algorithm for the deadline-TSP. The time complexity of this approximation algorithm is O(D 2 max βn 10 ), where D max is the maximum deadline, β is the total price of the graph, and n is the number of vertices. Farbstein et al. [10] solve the deadline-TSP with contiguous deadlines. They propose an O 1 + , α 1+ -approximation algorithm. The algorithm discretizes contiguous deadlines and uses an O(α)-approximation algorithm to solve the resulting KDTSP, which is a deadline-TSP with k deadlines. The algorithm ensures that the principal agent exceeds the deadlines by a factor of 1 + with 0 < < 1 and serves at least α/(1 + ) of the number of tasks in the optimum. Its time complexity is O(n 2k ∆ k ), where ∆ is the maximum distance in the metric space. Although these approximation algorithms run in polynomial time, they cost too much time in practice. In summary, the existing algorithms cannot be applied in the principal-assistant collaboration problem with deadlines. In this paper, a hybrid combination algorithm and a BP-based algorithm are proposed to address two scenarios, including the 1-principal-1-assistant scenario and the m-principal-u-assistant scenario.

1-Principal-1-Assistant Scenario
In this section, the principal-assistant collaboration problem with deadlines, where one principal agent is equipped with one assistant agent, is defined formally, and a heuristic algorithm is proposed.

Problem Formalization
Let V = {v i | 0 ≤ i ≤ n} be the set of vertices, where v 0 is the location of the initial vertex, and the others denote the locations of tasks, each of which contains only one task. Each vertex can be visited several times, but the task at the vertex can be served at most once. The task at vertex v i ∈ V\{v 0 } has an independent deadline d i , which means that the task at v i needs to be served before d i . Additionally, the initial vertex at v 0 is associated with a deadline d 0 , which means that the principal-assistant team must return to the initial vertex before d 0 .
In such a principal-assistant collaboration system, the principal agent equipped with an assistant agent starts from the initial vertex and returns to it before d 0 , releasing and retrieving the assistant agent to serve other tasks repeatedly. The principal agent works in a connected undirected graph G 1 = (V, M 1 ), where M 1 denotes the set of the edges between the vertices in V. Let m 1 (v i , v j ) represent the minimum time cost between v i ∈ V and v j ∈ V in G 1 . Without loss of generality, it is assumed that d i ≥ m 1 (v 0 , v i ), and the assistant agent is often assumed to move freely from a vertex to another and spend less time than the principal agent [4]. Meanwhile, it works in a fully connected undirected graph G 2 = (V, M 2 ), where M 2 denotes the set of edges between the vertices in V. Let The other assumptions about the assistant agent are as follows: (1) the assistant agent has limited endurance η; (2) the assistant agent needs to be released and retrieved at a different vertex, except when the assistant agent is released and retrieved at the initial vertex v 0 ; and the assistant agent cannot reach other places except the location of the tasks being served; (3) the assistant agent needs to reach the vertex where it is retrieved before the principal agent arrives.
The principal-assistant route is denoted by a route r = {principal r , assistant r }, where principal r is the principal agent route and assistant r is the assistant agent route. Let N(r) denote the number of served vertices in the route r. The principal agent route is denoted by principal r = {p i }, where the principal agent departs from p 0 = v 0 , moves from p i to p i+1 , and serves p i . The number of vertices in principal r is denoted by |principal r | = g, and the number of the served vertices in principal r is g − 1. The assistant agent route is denoted by assistant r = {(p i , p k , p j )}, where a tuple (p i , p j , p k ) indicates that the assistant agent is released at p i ∈ principal r (called the released vertex) and retrieved at p j ∈ principal r (called the retrieved vertex), serving p k / ∈ principal r . The problem can be then modeled as follows, and Table 1 provides the summaries of the parameters and variable notations.
The objective function (1) seeks to maximize the number of served tasks. Constraints (2)-(4) establish the relationship between s i , e i , and Constraints on the principal agent routes. The principal agent travels in the graph, holding constraints (5)- (14). Constraints (5)-(7) ensure that the principal agent departs from the initial vertex v 0 at time 0 and finally returns to it before d 0 . Constraint (8) indicates that the principal agent serves at most one vertex at any time, and constraint (9) indicates that the principal agent serves a task at a vertex except initial vertex v 0 at most one time. Constraint (10) states that the moving distance of the principal agent from v i to v j is not less than the minimum distance between them. Constraints (11)-(14) link z i,j and x i,t , x j,t . Constraint (11) ensures z i,j cannot be 1 if the principal agent visits v j after its deadline. Constraints (12) and (13) ensure that the principal agent serves any tasks at most once. Constraint (14) ensures that if z i,j = 1, the moving distance between any two vertices is not less than the minimum distance between them.
Constraints on the assistant agent routes. Constraints (15)- (27) are the constraints on the assistant agent routes. Constraint (15) ensures the assistant agent serves tasks that are not served by the principal agent. Constraint (16) guarantees that the assistant agent will be retrieved by the principal agent and Constraint (17) guarantees that the assistant agent moves in the range. Constraints (18)-(20) ensure the assistant agent serves any task at a vertex at most once. Constraints (21) and (22) ensure that the assistant agent serves the task vertex v k before its deadline and arrives at retrieved vertex v j before the principal agent arrives. Constraints (23)- (27) link b k,t and y i,k,j . Constraint (23) requires the assistant agent executes at most one task at any time. Constraint (24) ensures b k,t = 0, 0 ≤ t ≤ d 0 for unserved vertex v k . Constraints (25)- (27) ensure that if the assistant agent serves v k , departing from the principal agent at v i and returning to the principal agent at v j , b k,t = 1, s i ≤ t < e j ; otherwise, b k,t = 0. Constraint (28) indicates that v i can only be served by the principal agent and the assistant agent once.

Hybrid Combination Algorithm
The principal-assistant collaboration problem with deadlines in the 1-principal-1assistant scenario is N P-hard, and the proof is described as follows. Theorem 1. The principal-assistant collaboration problem with deadlines in the 1-principal-1assistant scenario is N P-hard.
Proof of Theorem 1. Let SP denote the principal-assistant collaboration problem with deadlines in the 1-principal-1-assistant scenario. If a principal-assistant route is r for SP, the number of served vertices in route r is in polynomial time; hence, SP is clearly in N P.
Let CP denote the deadline-TSP [10]. Then, a reduction from the CP to an instance of the SP is provided. Each CP can be reduced to a corresponding instance of the SP with η = 0. If and only if the optimum of the CP can be found in polynomial time can the optimum of the corresponding SP with η = 0 be found in polynomial time. However, the CP is proved to be N P-hard [10]. Therefore, the principal-assistant collaboration problem with deadlines in the 1-principal-1-assistant scenario is N P-hard.
According to the computational complexity analyzed in Theorem 1, this is an N Phard problem, so we cannot obtain an optimal solution in polynomial time [16]. To obtain a suitable result in a reasonable time, we propose a heuristic algorithm. The basic idea of the algorithm is to divide the vertices into different subsets and find the optimal combination of the partial routes based on the subsets. The division will classify the vertices into different subsets based on their minimum time cost to the initial vertex v 0 . The valid partial routes are constructed by the process CADROUTES, which includes three procedures, namely a heuristic method CONSTRUCT, an iterated local search ITERLOCALSERACH and a simulated annealing procedure SIMANNEAL.
The division and combination are described in Section 3.3 and the construction of the partial route is described in Section 3.4 and Section 3.5. The time complexity of the hybrid combination algorithm is analyzed in Section 3.6.

Division and Combination
The process employs a fact that, in a principal agent route principal r , there exists at least one vertex called the turn vertex p turn = arg max p i ∈principal r m 1 (p i , v 0 ), and the principal agent route principal r can be split into two sequences at the turn vertex, including the {p 0 , p 1 , ..., p turn } and the {p turn , ..., p g−2 , p g−1 }. In Figure 1, p 5 is the turn vertex and the principal agent route is split into {p 0 , p 1 , p 2 , p 3 , p 4 , p 5 } and {p 5 , p 6 , p 7 , p 8 , p 9 }. The partial routes contain monotonic parts and wavy parts [17]. The vertices of the monotonic parts are sorted in some order and the vertices of the wavy parts are disordered. Most of the vertices of the former are in monotonic parts, sorted in the non-decreasing order of the time cost to v 0 , e.g., {p 2 , p 3 , p 4 , p 5 }. However, there exist some vertices in wavy parts, where a vertex may be closer to v 0 than its previous vertex, e.g., {p 1 , p 2 }. In general, the former sequence of the principal agent route {p 0 , p 1 , p 2 , p 3 , p 4 , p 5 } is the process of leaving from the initial vertex. Similarly, the latter sequence of the principal agent route {p 5 , p 6 , p 7 , p 8 , p 9 } also has monotonic parts and wavy parts, but it is a return process, where the principal-assistant team returns to the initial vertex. The divisions and combinations are designed based on the feature, which is detailed in Algorithm 1.
In the division step, the turn vertex and the arrival time to it are selected first. In line 2, each vertex except the initial vertex can be regarded as the turn vertex v turn .
, v i ∈ U} denotes the set of candidate vertices for the principal agent. In line 6, each time t between t min and t max will be regarded as the arrival time to v turn , the time budget of the leave process is t and the time budget of the return process is d 0 − t. Given the different subsets and time budgets, the leave processes and return processes are generated by CADROUTES. The LR and RR represent the set of candidate leave processes and the set of candidate return processes, respectively. In line 11, the optimal principal-assistant route is found. In line 13, the optimal combination of a leaving process from LR and a return process from RR is selected as the output.

Heuristic Construction
The procedure CADROUTES is the core of the algorithm. It provides three partial routes, i.e., HP, ILS, and SA, which are generated by three methods, including the heuristic CONSTRUCT, an iterated local search ITERLOCALSERACH, and a simulated annealing procedure, respectively.
Heuristics are effective methods to address routing problems [18], so that the procedure CONSTRUCT is designed to find a valid solution. Before describing the method, the features of the routes for the principal-assistant team are provided first. The route r = {principal r , assistant r } can be merged into one structure, as in Figure 2, and the route can be rewritten as r = {r i }, where r i is a partial route. The partial routes can be classified into three types of components, including short line segments, simple triangles, and complex triangles. As Figure 2 shows, in a short line segment, the principal agent moves from one vertex to the other directly, serving one vertex, such as the principal agent route {v 0 , v 6 } and {v 3 , v 9 } without the assistant agent route. In a simple triangle such as the principal agent route {v 6 , v 3 } with the assistant agent route {v 6 , v 5 , v 3 }, the principal agent departs from the released vertex, releasing the assistant agent, and moves to retrieved vertex directly, retrieving the assistant agent. The principal-assistant serves two vertices. In a complex triangle, the principal agent moves from the released vertex to the retrieved vertex through other vertices, such as the principal agent route {v 8 , v 1 , v 2 , v 4 } with the assistant agent route {v 8 , v 7 , v 4 }. It is intuitive to select the partial route that serves the most tasks locally, but a smaller time cost may improve the opportunity to serve more tasks in the future. To balance these two factors, the profit per unit time cost φ(r) is defined in Definition 1 to help select the partial routes. At every iteration, the heuristic selects a valid partial route with the highest profit available per unit cost.
where cost(principal r ) is the traveling time of the principal agent in principal r , and N(r) is the number of tasks served by route r.
It can be derived that at each iteration, there are O(n) short line segments, O(n 2 ) simple triangles, and O(n n ) complex triangles to be selected. The solution space of complex triangles is too large to search quickly. However, a complex triangle {principal r , assistant r } can be approximated by a corresponding long line segment {principal r , {}}. It can be derived that where > 0 is a real number. A larger cost(principal r ) indicates a smaller gap between {principal r , assistant r } and {principal r , {}}. Because a long line segment can be split into several short line segments, to accelerate the search process, the heuristic only selects partial routes from short line segments and simple triangles.
The procedure CONSTRUCT is outlined in Algorithm 2. Based on the above observation, it searches for the partial routes r with the highest φ(r ) from the short lines and simple triangles and inserts them into the original routes iteratively, until there are no valid partial routes. In line 6, all the short line segments and simple triangles are found and inserted into the set R. From line 8 to line 11, the algorithm selects the partial route r with the maximum φ(r ). If more than one partial route has the same φ(·), the partial route with the minimum d i , which is the deadline of the end vertex of the partial route, is selected.

Theorem 2. The Construct Algorithm proposed in Algorithm 2 is an O( l min
2·l max )-approximation algorithm, where l min is the minimum distance between any two vertices in G 1 and l max is the maximum distance between any two vertices in G 1 .

Proof of Theorem 2.
To prove the approximation ratio, the upper bound of the optimum and the lower bound of the Construct Algorithm need to be provided firstly. Assume that the optimum r serves q tasks and the route r pair created by Construct Algorithm serves q tasks. In the optimal case, r is totally constituted by simple triangles, each of which serves two tasks and costs l min . Hence, for the optimum, On the other hand, there exists a lower bound of its profit per unit cost. Assume r = (r 0 , r 1 , · · · ) and each component r i serves q i vertices and costs cost(r i ) time steps. It can be derived that q cost(r ) = ∑ r i ∈r q i ∑ r i ∈r cost(r i ) ≥ min q cost(r ) is greater than or equal to the minimum local profit per unit cost. In the worst case, the component departs from the start vertex and directly moves to serve the farthest vertex and returns, costing l max . Thus, The approximate ratio can be calculated by Thus, the Construct Algorithm proposed in Algorithm 2 is an O( l min 2·l max )-approximation algorithm.

Iterated Local Search and Simulated Annealing
It is possible for the procedure CONSTRUCT to be trapped in local optimum; thus, the iterated local search [15] and the simulated annealing [19,20] are incorporated to improve the opportunity of escaping from local optimum. The two methods contain some basic operations on the principal-assistant routes. There are six basic operations that refer to the operations mentioned by Guanwan et al. [21], as shown in Table 2. The operation Swap and Replace ensures that the new principal-assistant route serves at least as many tasks as the old one does. The operation Insert inserts an unserved task into the principal agent route and deletes the resulting infeasible partial routes of the assistant agent. The operation Subjoin inserts the partial assistant agent route with the minimum time cost. If more than one assistant agent route has the same time cost, it selects the assistant agent route with the minimum d i , which is the deadline of the served vertex v i . The operation Removet deletes one vertex in the principal agent route and the resulting invalid assistant agent routes. The operation Removeu is a safe operation and decreases the quality of the current solution but may lead to a better solution in the following process.

Swap
Exchange two vertices in the principal agent route. Replace Replace a served vertex in the principal agent route with an unserved one. Insert Insert one unserved vertex into the principal agent route.

Subjoin
Plan partial assistant agent routes on the current principal agent route, and adopt the partial routes with the shortest time.

Removet
Remove a vertex from the principal agent route.

Removeu
Remove a triad from the assistant agent route.
The procedure ITERLOCALSERACH proposed in Algorithm 3 is a metaheuristic algorithm, which starts from the initial solution HP created by the CONSTRUCT and adopts four operations, i.e., Swap, Replace, Insert and Subjoin, to orderly search the neighbor solutions. The operations will be used iteratively. At Step 1, Swap is used to operate two vertices in the principal agent route, except the start vertex and end vertex. If the Swap operation reduces the time cost, it is adopted to update the route. At Step 2, Replace is used to operate any vertex except the start vertex and end vertex. If it is valid, the operation will be accepted, and the replaced position will not be operated again. Replace is similar to Swap, as it also updates the route and terminates when there is no valid operation. At Step 3, Insert is used to fully utilize the budget. One valid vertex is inserted into the first available position, and the principal agent route is updated. Then, Insert continues to be used in the new route. At Step 4, Subjoin is used to plan new partial routes for the assistant agent. The valid simple or complex triangles that serve other tasks are sorted in the increasing order of their time cost. The triangle with the smaller time cost is subjoined preferentially. The ITERLOCALSERACH will execute until the principal-assistant route cannot be improved or the number of iterations reaches the upper limit cntlimit.

End Function
The route ILS generated by the ITERLOCALSERACH is not necessarily better than the HP but provides another possible partial principal-assistant route that can be combined with others to generate a better entire partial route.
Simulated annealing can converge to the global optimum in some probabilistic sense [19] and has been applied in many optimization problems [18]. Simulated annealing is different from the iterated local search. It has an environment temperature which will decrease as the procedure runs; when the temperature is smaller than a minimum temperature, the procedure terminates. At the temperature decreasing moment, it generates a new partial route; the new one will be accepted and will replace the current principal-assistant route if the new one is better; otherwise, the new one will be accepted as the current one with a certain probability, which is related to the current temperature. Simulated annealing includes the following three components: neighbor generation, acceptance probability, and cooling schedule. The neighbor generation is used to generate a neighbor principalassistant route based on the current partial route. In every generation, a valid operation is selected randomly from all six basic operations, and a new route is created by applying the operation on the current solution. After the generation of the new route, the method needs to decide to accept one route between the original one and the new one. Assume that the gap λ is the number tasks served by the new route minus the number of tasks served by the original route. If λ is greater than 0, the new route will be accepted and is regarded as the current solution. Otherwise, the new route will be accepted with an acceptance probability. The acceptance probability adopts the Metropolis principle [22] and is set as follows: where Temp denotes the current temperature. The procedure will terminate once the temperature is less than the minimum temperature Temp min . The Temp is decreased by multiplying a positive factor α < 1 at each iteration. In some previous papers, α retains a certain value for different instances, but it is not suitable in its own scenario. For example, to create a route r 1 with a larger budget and another route r 2 with a smaller budget, if simulated annealing uses the same α, they will have similar time costs. However, a smaller budget indicates less valid combinations of routes, and many iterations in the construction of r 2 are in vain. To accelerate the route construction with different budgets, let where 0 < const < 1 is a constant real value, and bdg is the rest of the budget. Temp decreases faster if the budget is small.

Analysis of Time Complexity
In the hybrid combination algorithm, the division process selects the turn vertices and their arrival time; the combination assembles a constant number of partial routes. The CADROUTES subroutine generates new partial principal-assistant routes.

Assumption and Model
The m-principal-u-assistant scenario, where each of m principal agents is equipped with u assistant agents, is defined in this section. Most of the assumptions held in this problem are the same as the descriptions in Section 3, except that there are m principalassistant teams to complete the tasks. Each principal-assistant team has one principal agent and u assistant agents.
Due to the introduction of multiple assistant agents, the arc-based model becomes overly complicated. For conciseness, the problem is modeled as a path-based integer linear programming based on [14]. The notations of the parameters and variables are displayed in Table 3. The problem can be modeled as follows. Table 3. Parameters and variables for the m-principal-u-assistant scenario.

Notation
Description The set of all available routes. c r Parameter that indicates the number of tasks served in the route r. a i,r Parameter that equals one when one task v i is served in route r, and zero otherwise. w r Binary decision variable equal to one if route r is used, and zero otherwise.
The objective (37) seeks an optimal combination of routes that serves the most tasks. Constraint (38) requires that each task is served at most once. Constraint (39) requires the number of principal-assistant teams to be less than the maximum capacity m. Constraint (40) is the definition of the decision variables w r .

Branch-and-Price Algorithm
The master problem (37) is hard to solve, because the set Ω is an exceptionally large set. The branch-and-price algorithm [11,12] is a practical method to decompose such a problem, which embeds column generation into the branch-and-bound algorithm.

Linear Relaxation and Decomposition
In the branch-and-price algorithm, the master problem is converted from a 0-1 integer linear problem into a general linear problem, as follows: After the linear relaxation, the difficulty of solving the objective (37) has been decreased slightly, but the set Ω is still too large to enumerate completely. However, a fact should be exploited, i.e., that the ratio of the number of actually used routes to the number of all valid routes is exceedingly small, so the final solution can be constructed from a smaller route set Ω ⊆ Ω. The column generation formulates a restricted master problem by replacing the set Ω with the set Ω as follows: The initial set Ω can be enlarged by adding new routes. According to the duality theory, a better route r satisfies the following condition: where δ i is the dual variable of the current optimal solution for each constraint in Constraint (44), and γ is the dual variable for Constraint (45). Furthermore, if no route with c r > 0 exists, the global optimum is found. The following process is to find and price an undiscovered route r with c r = max r∈Ω−Ω c r , which is called the pricing subproblem and is detailed in the next section.

Pricing Subproblem
The pricing subproblem seeks better valid routes. The previous studies proposed many exact or heuristic algorithms to solve the routing problems [12,14,23].
Based on the definition of c r , it can be rewritten as follows: Associated with Equation (47), the following can be deduced: Assume that V c denotes the set of vertices involved in Ω and V u = V − V c . From the analysis of the restricted master problem, it is deduced that δ i = 0, ∀v i ∈ V u and δ i > 0,∀v i ∈ V c . The route that serves the most v i ∈ V u vertices is the optimal choice in terms of the mathematical formulation. However, temporal constraints restrict the number of vertices in V u . Additionally, the original allocation of the vertices may not be optimal; the new routes can include some involved vertices so that to replace the old one creates a better entire result. The routing process concerns vertices in the set V c and V u .

Principal-Assistant Route Construction
The outline of the routing algorithm is selecting different subsets of the vertices and constructing partial routes on them, which includes the randomized selection procedure described in Section 4.3.1, the principal agent route construction described in Section 4.3.2, and the assistant agent route construction described in Section 4.3.3. The entire algorithm is described in Section 4.3.4.

Randomized Selection
The randomized selection of vertices is described in Algorithm 4. The vertex v i is selected with a probability prob i , assuming that Ω v i ∈ Ω denotes the set of routes that contain v i ∈ V. For vertex v i ∈ V c , a higher ratio |Ω v i |/|Ω | indicates that we know more about vertex v i than the other vertices in V c . To obtain more information about the whole problem, v i ∈ V c with a lower |Ω v i |/|Ω | should be selected with a higher probability. The vertex v i ∈ V c is selected with the following probability: For the vertices in V u , each of them is selected with a probability as follows: which indicates that before being explored, they all have equal values. Additionally, the expectation of the number of vertices to be selected is equal to 1, which ensures that all the uninvolved vertices will be included after several selections.

Principal Agent Route Construction
The construction of the principal agent routes with temporal constraints is a classic problem, called the orienteering problem with time windows (OPTW) [21,24], where a traveler needs to serve vertices during their time windows and then return to the initial vertex. Two efficient algorithms can be used in the principal agent route construction, including the line construction [21] and an iterated local search algorithm [15].
The outline of the line construction is to insert an unserved vertex between two vertices in a valid route iteratively and ensure that the resulting route is also valid until no vertex can be inserted. If there is more than one option, the operation adding the minimum time cost will be selected. The iterated local search algorithm (ILS) is a two-step process. First, it creates a valid route by line construction, and then the perturbation and local search techniques are used to improve the route within a certain iteration. Two algorithms will be embedded into our integrated algorithm individually, and their performances will be evaluated.
The above two methods outperform the other routing algorithms [23] in the principal agent route construction, but in the construction of the principal-assistant routes, they ignore the influence of the assistant agent routes and result in poor performance in some situations. An example is displayed in Figure 3. In Figure 3, the line construction creates principal r = {v 0 , v 3 , v 0 }, and the assistant agent routes assistant 1 r , assistant 2 r are empty. However, there is an obvious better principal-assistant route, principal r = {v 0 , v 2 , v 0 }, It demonstrates a phenomenon that if a principal agent travels to a vertex surrounded by some assistant agent-accessible tasks, it is possible to serve tasks then directly move to a task costing less time. Inspired by the influence of the assistant agents, the density of the vertex is defined in Definition 2. The density i of v i is influenced by the number of assistant agent-accessible tasks and their distances to v i in graph G 2 . The distances are squared to improve their weight; the vertex with the most assistant agent-accessible tasks located in an adjacent area is preferred. Definition 2. The density of the task v i is as follows: The density-based construction algorithm is designed based on the line construction and the concept of the density of a task, which is shown in Algorithm 5. Before using line construction, all the provided tasks are sorted in ascending order based on the density of each task, and a factor 0.0 < ζ < 1.0 decides how many tasks have high priority to be inserted in the principal agent route. Assume that V is the sorted set of tasks; first, |V | · ζ tasks in V have high priority. The tasks with high priority will be stored in order in list H, and the rest of the tasks with low priority will be stored in order in L. The tasks in H will be inserted into principal agent route principal r by the line construction lineInsert(principal, S), until there is no vertex that can be inserted.

Algorithm 5: Density-Based Algorithm
Input: The set of vertices V ⊆ V, the set of deadlines D, a parameter 0.0 < ζ < 1.0, the start vertex v s , the end vertex v e Output: A principal agent route principal r 1 principal r ← {v s , v e }; 2 Sort vertices in V in the descending order based on the density V i , vertices with the same density are sorted in the ascending order based on the deadline d i , Then, the tasks in L will be considered. In the experiments, a suitable ζ can improve the performance, and 0.3 is the best value in our experiments. For example, in Figure 3, density 1 = 2 1 2 ·2 + 0 2 2 ·2 = 1.0, density 2 = 3 1 2 ·2 + 0 2 2 ·2 = 1.5, density 3 = 1 1 2 ·2 + 0 2 2 ·2 = 0.5, and if ζ = 0.3, then H = {v 2 }, L = {v 1 , v 3 }, and the principal-assistant route is which is the optimum.

Assistant Agent Route Construction
The assistant agent route construction algorithm is displayed in Algorithm 6. In the beginning, all the valid partial assistant agent routes are collected from Line 2 to Line 5. In Line 6, the partial assistant agent route (v i , v k , v j ) in set Cand is sorted in descending order based on the time cost cost(v i , v j ), and the routes with the same time cost are sorted in ascending order based on the deadline d k . The partial assistant agent routes are inserted into the suitable places of the initial assistant agent routes in order.

Integrated BP-Based Algorithm
The integrated BP-based algorithm is detailed in Algorithm 7. The main parts of it are described in the previous sections. The framework of the column generation is described in Section 4.2.1, and the principal agent route construction and assistant agent route construction are described in Section 4.3.2 and Section 4.3.3, respectively.

Algorithm 6: Assistant Agent Ranking Algorithm
Input: A principal agent route principal r , V u Output: u assistant agent routes Additionally, the actual moving distance of the assistant agent in the sortie is an important element. As prescribed, the duration of the assistant agent is η. In previous studies, the assistant agent is assumed to take full use of its power and serve all accessible tasks. However, it is not a suitable policy in practice. An example is displayed in Figure 4. In Figure 4, the line construction creates principal agent route principal 1 = {v 0 , v 3 , v 0 }, and if the full duration is used, the assistant agent routes will be assistantagent 1 1 = {(v 3 , v 2 , v 0 )} and assistantagent 2 1 = ∅. The other principal-assistant team cannot serve any other vertices. However, if the actual duration is reduced to 2, two principal-assistant routes can be created, including a principal agent route principal 1 = {v 0 , v 3 , v 0 } and two empty assistant agent routes assistantagent 1 1 = ∅, assistantagent 2 1 = ∅, and another principal agent route principal 2 = {v 0 , v 2 , v 0 } and two assistant agent routes assistantagent 1 2 The latter solution is better than the former one. To search for better principal-assistant routes, in Line 2 to Line 9, different actual distances are used to construct principal-assistant routes, and in Line 16, a random actual duration is generated for each principal-assistant route construction.  All the principal-assistant routes will be examined in the ideal environment, but an overly long execution time is impractical. Hence, an iteration upper bound cntlimit is provided to control the execution time.

Experimental Setup and Evaluation Criteria
The computations are performed on a PC with a Windows 10 OS, 16 GB RAM, and i7-7700 4.2 GHz CPU in Python 3.6. CPLEX (version 12.6) is used to solve all the integer linear programming.
The experiments are conducted on random graphs generated based on the Erdos-Rényi model [25], where all the distance and time costs are discretized into integer numbers. The random graphs ensure that the distance between two adjacent vertices varies from 1 to 10 randomly. In G 1 , each vertex is connected to the other with the probability 6/|V|. The d 0 is twice the diameter of the graph and d i , v i = v 0 is a random variable greater than m 1 (v i , v 0 ) and less than d 0 . In G 2 , let m 2 (v i , v j ) be a random variable less than or equal to m 1 (v i , v j ) and greater than 1. The endurance of the assistant agent η is twice the median of the distances between the vertices in G 2 .
The evaluation criteria include the number of served vertices and the running time of algorithms. For each graph size n, 100 instances are generated; the average values of the results will be regarded as the performance of algorithms.

Compared Algorithms
The hybrid combination algorithm is denoted by HC, and in the simulated annealing, const = 0.96, Temp = 10,000, and Temp min = 100 according to the preliminary finding.
The HC is compared to seven algorithms. The ILP denotes the method that solves the integer linear programming method by CPLEX directly. C1 denotes an adaptation of the algorithm proposed by Murray and Chu [7]. The outline of C1 is to construct an initial principal agent route serving all tasks by the saving heuristic proposed by Clarke [26] and then take out a vertex from the principal agent route, which saves the most time, and insert it into other principal agent route or let the assistant agent serve it iteratively until there is no valid vertex to select. C2 is the approximation algorithm for the deadline-TSP without assistant agent proposed by Bansal et al. [13]. The C2+Subjoin uses the operation Subjoin to append the assistant agent route to the principal agent route generated by C2. The CST employs the procedure CONSTURCT on the whole problem directly. CST+CB combines the procedure CONSTURCT and the process of division and combination. The CSC combines the procedure CONSTURCT and the operation Subjoin.

Computational Performance
The number of served vertices and the running time of algorithms are displayed in Figure 5. In Figure 5a,b, until the graph size n = 15, the ILP is the optimal method, but its running time exceeds 790 seconds. Because the running time of ILP will grow exponentially as the graph size n grows larger, the ILP is not a suitable method when n > 15. When the graph size n ranges from 16 to 20, except for C1 and C2, the other five algorithms are close to each other. However, the running times of C2 and C2+Subjoin exceed 500 s when n = 19. As abovementioned, the time complexities of C2 and C2+Subjoin are greater than O(n 10 ), and their actual running times are also extremely large as the graph size n > 19. Hence, ILP and C2+Subjoin can be applied in small graphs but are not suitable in the larger graphs due to their running time. Additionally, HC, CST, CST+CB and CSC need to be employed in larger graphs to evaluate their performances. As shown in Figure 5a, the HC serves the most vertices, the CSC is only inferior to the HC, and the CST and the CST+CB serve far fewer vertices than they do. In Figure 5c, the HC costs the most time in each graph size, and when n = 100, its running time approaches 900 s. The CSC, CST and CST+CB cost less time, even if n = 100, their running time is less than 10 seconds. As a whole, when the graph size ranges from 30 to 100, the HC is the optimal choice, which although costs the most time, serves most vertices.

Compared Algorithms
The proposed algorithm, called IP-D, combines the BP-based algorithm and the density-based construction. The IP-D is compared with four algorithms. The GL denotes a greedy adaption of line construction [27], which iteratively constructs the principalassistant routes by line construction and the operation Subjoin based on the residual vertices. The GI denotes a greedy adaption of the line construction [15], which iteratively constructs the principal-assistant routes by the iterated local search construction and the operation Subjoin based on residual vertices. The IP-L combines the BP-based algorithm and the line construction, and the IP-I combines the BP-based algorithm and the iterated local search construction.

Computational Performance
There are three concrete scenarios, including the 3-principal-2-assistant scenario, the 3-principal-3-assistant scenario, and the 4-principal-2-assistant scenario. The results are shown in Figure 6. In the 3-principal-2-assistant scenario, as shown in Figure 6a,c, GL, IP-L, IP-I, and IP-D share similar numbers of served vertices, but GI serves far less vertices than they do. The GL, GI, IP-L, and IP-D cost time less than 40 seconds, even if n = 20, but the IP-I costs more time, and exceeds 800 seconds when n = 20. The running time of IP-I is too large to be employed in larger graphs. Figure 6b,d shows the algorithm performances in the larger graphs, the size of which ranges from 30 to 100. Except when n = 30, the IP-D serves the most vertices. The gap between the number of vertices served by the IP-D and the others grows larger as the sizes of graphs grow larger. With respect to the running time, the IP-D costs the most time among all the algorithms, but its maximum running time is less than 600 s. In the 3-principal-3-assistant scenario shown in Figure 6e-h, and the 4-principal-2-assistant scenario shown in Figure 6i-l, the IP-D has a similar performance. As a whole, the number of vertices served by the IP-D is the largest among all the algorithms, and the running time of the IP-D is the largest among the others but is reasonable in practice.
There is a strange phenomenon where IP-L and IP-D expend more time when n = 30 than n = 100. The main time cost comes from the integer programming, which is N P-hard. More columns in the integer programming indicate a longer CPU time. Based on the experimental data, when the number of columns that denote the principal-assistant routes in the paper reaches 400, the IP-L and IP-D will expend more time to solve their integer programming problems. Because the d 0 is twice the diameter of the graph, when n = 30, one principal-assistant route cannot serve many tasks surrounding the initial vertex, so the new route r, which serves more unserved tasks far from the initial vertex, can be added to Ω . However, when n = 100, the d 0 is very large and the principal-assistant routes seek to serve the vertices surrounding the initial vertex; many generated routes are the same or cannot coexist, and the actual size of the route set Ω when n = 100 is less than the size when n = 30. In general, the number of vertices served by the IP-D is largest among all the algorithms, although the running time of the IP-D is also largest among them; however, this is reasonable in practice.

Conclusions and Future Research
The principal-assistant collaboration problem with deadlines is investigated in this paper. Two scenarios are systemically discussed here, including the 1-principal-1-assistant scenario and the m-principal-u-assistant scenario. The hybrid combination algorithm is proposed to solve the 1-principal-1-assistant scenario, which combines different partial routes generated by three methods. Through comprehensive experiments, we validate the performance of the hybrid combination algorithm by comparing to several benchmark approaches. It can find a better route than the other algorithms compared except for the exact integer linear programming, but costs far less time than the integer linear programming. Therefore, this method can be feasible in 1-principal-1-assistant collaboration applications. A BP-based algorithm embedded with a density-based construction is proposed for the m-principal-u-assistant scenario. It uses the branch-and-price algorithm to find the optimal combination of the current route set and uses the density-based construction to enlarge the route set. Through comprehensive experiments, we compare the performance of the proposed algorithm with several benchmark approaches. In most cases, the proposed algorithm can serve the most vertices, although it takes more running time but is still reasonable. Therefore, the algorithm is practical in m-principal-u-assistant collaboration scenarios.
In the future, we will mainly explore the adaptation of hybrid principal-assistant collaboration systems from two directions: (1) There are often multiple initial locations of agents in real applications, and multiple tasks may be in the same location; hence, one direction is to consider multiple initial vertices and more than one task in a vertex in the principal-assistant collaboration problem. (2) There are various collaborative relationships between principal agents and assistant agents in different environments; hence, the other direction is to improve the locomotion of the principal-assistant teams by considering appropriate collaboration pattern selection.

Conflicts of Interest:
The authors declare no conflict of interest.