SC-M*: A Multi-Agent Path Planning Algorithm with Soft-Collision Constraint on Allocation of Common Resources

Featured Application: SC-M* generalizes the M* algorithm to address real-world multi-agent path planning problems in the soft-collision context, which considers the allocation of common resources requested by agents. Application examples include but are not limited to city-scale passenger routing in mass transit systems, network traffic engineering and planning for large-scale autonomous vehicles.

Abstract: Multi-agent path planning (MAPP) is increasingly being used to address resource allocation problems in highly dynamic, distributed environments that involve autonomous agents. Example domains include surveillance automation, traffic control and others. Most MAPP approaches assume hard collisions, e.g., agents cannot share resources, or co-exist at the same node or edge. This assumption unnecessarily restricts the solution space and does not apply to many real-world scenarios. To mitigate this limitation, this paper introduces a more general class of MAPP problems: MAPP in a soft-collision context. In soft-collision MAPP problems, agents can share resources or co-exist in the same location at the expense of reducing the quality of the solution. Hard constraints can still be modeled by imposing a very high cost for sharing. This paper motivates and defines the soft-collision MAPP problem, and generalizes the widely-used M* MAPP algorithm to support the concept of soft collisions. Soft-collision M* (SC-M*) extends M* by changing the definition of a collision, so paths with collisions that have a quality penalty below a given threshold are acceptable. For each candidate path, SC-M* keeps track of the reduction in satisfaction level of each agent using a collision score, and it places agents whose collision scores exceed the threshold into a soft-collision set in order to reduce the score. Our evaluation shows that SC-M* is more flexible and more scalable than M*. It can also handle complex environments that include agents requesting different types of resources. Furthermore, we show the benefits of SC-M* compared with several baseline algorithms in terms of path cost, success rate and run time.


Introduction
Multi-agent path planning (MAPP) involves finding the set of least-cost paths for a set of agents co-existing in a given graph such that each of the agents is free from collision, where a collision is defined as at least two agents moving to the same location at the same time. MAPP attracts increasing attention due to its practical applications in multi-robot systems for surveillance automation, video gaming, traffic control, and many other domains [1-4]. This problem is, however, difficult to solve because the configuration space grows exponentially with the number of agents in the system, which makes the search computationally very expensive. Finding optimal solutions for MAPP in its general form is NP-hard [5].
Approaches to solving MAPP problems fall into three main categories: coupled, decoupled and intermediate [6]. Coupled approaches search the joint configuration space of the multi-agent system, which is the tensor product of the free configuration spaces of all the individual agents. A popular coupled planner is the A* algorithm [7], which directly searches the whole joint configuration space and therefore becomes computationally infeasible when the number of agents is large. Enhanced variants of A*, such as operator decomposition (OD), enhanced partial expansion A* (EPEA*), and iterative deepening A* (IDA*), can, to some extent, mitigate the exponential growth in the number of neighbors by improving the admissible heuristics [8-11]. Coupled approaches are optimal and complete, but usually at high computational cost. Decoupled approaches plan for each agent separately and then adjust the paths to avoid collisions. Algorithms in this category are generally faster because they perform the graph search and the collision-avoidance adjustment in low-dimensional spaces. However, optimality and completeness are not guaranteed [3,12].
Intermediate approaches lie between coupled and decoupled ones because they dynamically couple agents and grow the search space during the planning. In this way, the search space is initially small and grows when necessary. A few intermediate MAPP algorithms can guarantee optimality and completeness. State-of-the-art examples include Conflict-Based Search (CBS) [6,13]. CBS is a two-level algorithm. At the high level, conflicts are added into a conflict tree (CT). At the low level, solutions consistent with the constraints given by the CT are found and assigned to the agents. CBS behaves poorly when a set of agents is strongly coupled. Meta-agent CBS (MA-CBS) was then proposed to handle such scenarios by merging strongly coupled agents into a meta-agent.
The M* algorithm is a state-of-the-art coupled approach. It starts with decoupled planning and applies a strategy called sub-dimensional expansion to dynamically increase the dimensionality of the search space in regions in which agent collisions occur. In this way, an efficient graph search with a strict collision-free constraint can be achieved, while minimizing the explored portion of the joint configuration space. M* identifies which subsets of agents can be safely decoupled and hence plans for multiple agents in a lower-dimensional space. Compared to CBS and its variant MA-CBS, M* and its variants, e.g., recursive M* (rM*), have much more fine-grained control over some technical details, such as the management of conflict sets for better scalability. The fine-grained nature of M* allows it to be integrated into MA-CBS to take advantage of both [14]. Recent work extended both the M* and CBS algorithms to handle imperfect path execution due to unmodeled environments and delays [15,16].
Most fundamental MAPP approaches assume hard collisions, which means that solutions in which agents share resources (nodes or edges) are rejected. In many real-world scenarios, some degree of resource sharing between agents is acceptable, so the hard-collision constraint needlessly over-constrains the solution space. This paper relaxes the hard-collision constraint by allowing agents to share some resources, including space and various services on edges/nodes. Such sharing reduces the quality of the path, i.e., the satisfaction level of the agent using it, but as long as the quality reduction for each path is below a settable threshold, the solution is acceptable. We call this concept soft collisions. Hard collisions are still supported by having a very strict threshold, i.e., a very high penalty for sharing. The reduction in satisfaction level experienced by an agent caused by soft collisions on resources in its path is quantified using a collision score. In this paper, we develop a generalized version of the M* algorithm, called soft-collision M* (SC-M*), for solving the MAPP problem in the soft-collision context. Note that we are not simply replacing hard with soft collisions, but instead introducing soft collisions as a generalization that allows modeling different types of collisions.
SC-M* extends M* by taking the perspective of soft collision on common resources. Specifically, SC-M* tracks the collision score of each agent and places agents whose collision scores exceed certain thresholds into a soft-collision set for sub-dimensional expansion, a technique that limits the search space while maintaining the optimality of the algorithm with respect to the objective. In this way, SC-M* achieves improved scalability to handle a larger number of agents while limiting the probability of collisions on resources to a bound.
In this paper, we show that SC-M* offers greater flexibility and scalability for efficiently solving the MAPP problem in the soft-collision context where common resources are considered, and that it can handle complex environments (e.g., with multiple types of agents requesting multiple types of resources). We theoretically prove that SC-M* is complete and suboptimal under the soft-collision constraints on resources. Experimental results demonstrate the advantages/trade-offs of SC-M* in terms of path cost, success rate and run time against baseline SC-based MAPP planners, such as SC-A* and SC-CBS.
The rest of the paper is organized as follows. Section 2 discusses the motivation of soft collisions. Section 3 gives a technical briefing of the M* algorithm. Section 4 presents our proposed SC-M* approach. Section 5 evaluates SC-M* in a grid public transit network. Finally, Section 6 concludes our work.

Motivation
In some planning problems, solutions in which agents share resources, i.e., they collide using the traditional MAPP problem definition, may be acceptable, at the cost of having a reduced level of agent satisfaction. Problems of this type have two properties in common: (1) agents' satisfaction conditions are reduced when meeting at the same place; and (2) the extent of reduction in satisfaction depends on how long the dissatisfying situation lasts in terms of distance or time.
One motivating example of this type of problem involves mass transit systems, in which passengers have various preferences, even necessities, in terms of common resources, such as seat availability (necessary for seniors) and on-vehicle Wi-Fi supply (preferred by video viewers and game players during the trip). Passengers may interfere with one another on common resources in crowded situations. Individually optimal paths can cause serious interference, leading to low-quality experiences. Interference between passengers is soft because it is possible that they do not call for the same resource when they are on the same public vehicle. In addition, even when they call for the same resource and interfere, they are able to tolerate each other over a short time and distance. Intuitively, how likely a collision (intolerable interference) actually happens depends on: (1) whether the resource supply is less than the demand; and (2) how long the lack-of-supply condition lasts in terms of the time and distance that the passengers stay together. Passengers can be viewed as agents moving through the transportation network. When planners plan for all the agents, eliminating every hard collision is neither necessary nor feasible. Thus, people are more interested in another problem: How can the resource received by all agents be maximized such that the probability of collision of each agent is less than a bound? This is an important topic of passenger-centered research [17-19].
In addition to public transit scenarios, other examples include network traffic engineering, where multiple data streams can route through a router. Long streams have a higher chance of being blocked when unexpected traffic spikes exceed the link capacity [20]. How to maximize the throughput with a bounded chance of blocking is of great interest to researchers in the field of communications and computer networks.
Another example is planning for large-scale self-driving cars, where multiple cars can share the same lane, and the number of cars on a road will influence the chance of crashes among autonomous vehicles [21,22]. Scholars and engineers dealing with the fundamentals of autonomous vehicles in unstructured and dynamic environments aim to increase the road traffic while bounding the crash risk.
Military transportation also has the soft-collision property: transport aircraft or vehicles are subject to a higher risk of being detected and attacked by enemy troops when many of them move together over a long distance because their paths overlap. Formally, as the transportation volume on a road increases, the degree of concealment decreases [23]. The dispatcher must bound the security risk when attempting to maximize the military transportation efficiency.
To support these application classes, we introduce the soft-collision property (related to common resources) to MAPP. SC-M*, introduced in this paper, is the first attempt to generalize M* to handle real MAPP problems in a soft-collision context, considering various common resources requested by agents. Specifically, SC-M* changes M*'s definition of a collision so it can represent soft collisions on resources and their impact on an agent's dissatisfaction level. We show the advantages of SC-M* over other SC-based MAPP solvers.

Technical Briefing of M*
Before introducing the SC-M* method, this section reviews the traditional MAPP problem and the M* algorithm [6].

MAPP Problem Definition
In this problem, we have $m$ agents indexed by the set $I = \{1, \dots, m\}$. Let the free configuration space of agent $j$ be represented by the directed graph $G^j = \{V^j, E^j\}$. The graph $G^j$ is the same for every agent $j$. The joint configuration space, which describes the set of all possible states of the multi-agent system, is defined as the tensor product of the graphs of all individual agents:

$$G = G^1 \times G^2 \times \cdots \times G^m.$$

$G$ consists of a joint vertex set $V$ and a joint edge set $E$. As an example, in a 2-D joint configuration space given by the agents $j$ and $k$, the two 2-D joint vertexes $v_p = (v_p^j, v_p^k)$ and $v_q = (v_q^j, v_q^k)$ are connected by the joint edge $(e_{pq}^j, e_{pq}^k)$. Note that $v_p^j \in V^j$ and $e_{pq}^j \in E^j$. Let $\pi^j(v_p^j, v_q^j)$ denote a sequence of vertexes, called a path, in $G^j$ from $v_p^j$ to $v_q^j$. The cost of a path $\pi(v_p, v_q)$ in $G$ is defined as

$$g(\pi(v_p, v_q)) = \sum_{j \in I} g(\pi^j(v_p^j, v_q^j)),$$

where $g(\pi)$ is the sum of all edge costs involved in the joint path $\pi$.
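To make the joint configuration space concrete, the following minimal Python sketch enumerates the successors of a joint vertex and sums edge costs along a joint path. The three-vertex graph `GRAPH`, the self-loops that model waiting, and the function names `joint_successors` and `joint_path_cost` are illustrative assumptions, not part of the original formulation.

```python
from itertools import product

# Hypothetical 3-vertex directed graph shared by all agents:
# vertex -> {successor: edge cost}. Self-loops model waiting (an assumption).
GRAPH = {
    "a": {"a": 1.0, "b": 1.0},
    "b": {"b": 1.0, "c": 2.0},
    "c": {"c": 1.0},
}


def joint_successors(joint_vertex):
    """Successors of a joint vertex: the product of each agent's successors."""
    per_agent = [sorted(GRAPH[v]) for v in joint_vertex]
    return [tuple(combo) for combo in product(*per_agent)]


def joint_path_cost(joint_path):
    """Sum the edge costs of every agent along a joint path."""
    total = 0.0
    for v_p, v_q in zip(joint_path, joint_path[1:]):
        total += sum(GRAPH[p][q] for p, q in zip(v_p, v_q))
    return total


print(len(joint_successors(("a", "b"))))  # 4 joint successors for two agents
print(joint_path_cost([("a", "b"), ("b", "c"), ("c", "c")]))  # 6.0
```

Even in this toy setting, the number of joint successors is the product of the per-agent branching factors, which is the exponential growth that motivates sub-dimensional expansion.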
The goal of MAPP is to find a collision-free path, which is optimal with respect to minimal cost, from the source configuration $v_s = (v_s^1, \dots, v_s^m)$ to the destination configuration $v_d = (v_d^1, \dots, v_d^m)$. To determine the collision between agents, a collision function $\psi(v_p)$ is defined to return the set of conflicting agents at $v_p$.
Most fundamental MAPP approaches use hard collisions, where no intersection is allowed between any two agents in terms of the occupation of any resource, such as a workspace. This implies that the capacity of each resource can support only one agent at a time (i.e., a collision happens immediately once agents intersect at any resource). Suppose we have a set of resources $A = \{A_1, \dots, A_L\}$ requested by each agent in the multi-agent system, where $A_i$ is defined as the set of resources of type $i$ on all edges and vertexes in $G$. $A_i$ is a continuous set because only continuous resources are considered in the paper. A traditional hard-collision constrained MAPP problem is formulated as follows:

$$\min_{\pi}\; g(\pi(v_s, v_d)) \quad \text{s.t.} \quad A_k(v_p^i) \cap A_k(v_p^j) = \emptyset, \quad \forall v_p \in \pi,\ \forall i \neq j \in I,\ \forall A_k \in A, \tag{2}$$

where $A_k(v_p^j)$ denotes the subset of resource $A_k$ occupied by the agent $j$ at the joint vertex $v_p$. One state-of-the-art solver for this problem is M*, which uses the sub-dimensional expansion strategy to dynamically increase the dimensionality of the search space in regions featuring agent collisions. M* enables a relatively cheap graph search under the strict hard-collision constraint.

Graphic-Centric Description of M*
This section uses the graphic-centric description introduced by Wagner [6] to illustrate M*. M* is a complete and optimal MAPP algorithm. The main idea of M* is to iteratively construct/update a so-called search graph $G^{sch}$ (i.e., to iteratively remove the collision configuration vertexes and expand necessary neighbors) and apply the A* algorithm on the new $G^{sch}$ until the optimal collision-free path to $v_d$ exists in $G^{sch}$ and is found by the A* search. Specifically, $G^{sch}$ is a sub-graph of $G$ and consists of three other sub-graphs: the expanded graph $G^{exp}$, the neighbor graph $G^{nbh}$, and the policy graph $G^{\phi}$. The expanded graph $G^{exp}$ is the sub-graph of $G$ that has been explored by M*. $G^{nbh}$ contains the limited neighbors of all the joint vertexes in $G^{exp}$. The definition of limited neighbors is given below. $G^{\phi}$ consists of the paths induced by the individually optimal policy $\phi$ that connects each joint vertex in $G^{nbh} \cup G^{exp}$ to $v_d$ without the collision-free constraint. Specifically, $\phi^j$ is the individually optimal policy for the agent $j$ that leads any $v^j$ in $G^{nbh} \cup G^{exp}$ to $v_d^j$ without considering collisions. Examples of policy $\phi$ include the standard Dijkstra's algorithm [24] and A* [5]. Using the above graphic concepts, we can define the collision set $C_p$ as

$$C_p = \psi(v_p) \cup \bigcup_{v_q \in V_p} \psi(v_q), \tag{3}$$

where $V_p = \{v_q \mid \exists \pi(v_p, v_q) \subseteq G^{exp}\}$ is the set of joint vertexes that can be reached from $v_p$ by a path in $G^{exp}$. Let $\phi^j(v^j)$ be the immediate successor vertex of $v^j$ in the policy path; then the set of limited neighbors $V_p^{nbh}$ for the joint vertex $v_p$ in $G^{nbh}$ is defined as

$$V_p^{nbh} = \left\{ v_q \;\middle|\; \forall j \in I:\ \begin{cases} e_{pq}^j \in E^j, & j \in C_p, \\ v_q^j = \phi^j(v_p^j), & j \notin C_p \end{cases} \right\}, \tag{4}$$

where $e_{pq}^j$ denotes the edge from $v_p^j$ to $v_q^j$. The definition of the limited neighbors implies the sub-dimensional expansion strategy: we only expand the search space in the dimensions where a collision occurs ($j \in C_p$); for collision-free dimensions ($j \notin C_p$), M* does not expand, which limits the unexpanded search space to the graph that consists only of the individually optimal paths induced by the policy $\phi$.
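The limited-neighbor construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all agents share one hypothetical graph `SUCCESSORS`, the individually optimal policy is given as a lookup table `POLICY`, and the function name `limited_neighbors` is ours.

```python
from itertools import product


def limited_neighbors(joint_vertex, collision_set, successors, policy):
    """Limited neighbors of a joint vertex under sub-dimensional expansion.

    joint_vertex  -- tuple with one vertex per agent
    collision_set -- indices of agents in the collision set of joint_vertex
    successors    -- {vertex: [successor, ...]}, graph shared by all agents
    policy        -- policy[j][v] is agent j's next vertex on its optimal path
    """
    options = []
    for j, v in enumerate(joint_vertex):
        if j in collision_set:
            options.append(sorted(successors[v]))  # expand this dimension fully
        else:
            options.append([policy[j][v]])  # follow the individual policy only
    return [tuple(combo) for combo in product(*options)]


# Hypothetical data: self-loops model waiting, both agents head for "c".
SUCCESSORS = {"a": ["a", "b"], "b": ["b", "c"], "c": ["c"]}
POLICY = [{"a": "b", "b": "c", "c": "c"}] * 2

print(limited_neighbors(("a", "b"), set(), SUCCESSORS, POLICY))
print(limited_neighbors(("a", "b"), {0}, SUCCESSORS, POLICY))
```

With an empty collision set, the joint vertex has a single limited neighbor (the policy successor); each agent added to the collision set multiplies the neighbor count by that agent's branching factor only.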

Algorithm Description of M*
The high-level description of M* is as follows [6]: Initially, M* computes the individually optimal policy $\phi$ for each agent from source $v_s$ to destination $v_d$. The initial search graph $G^{sch}$ only consists of an individually optimal path: the initial $G^{exp}$ contains $v_s$ only; the initial $G^{nbh}$ contains $\phi(v_s)$ only, which is the successor of $v_s$ along the individually optimal policy; and the initial $G^{\phi}$ contains the optimal policy path from the vertexes in $G^{nbh}$ and $G^{exp}$ all the way to $v_d$. $C_p = \emptyset$ for all $v_p$ in the initial $G^{sch}$. Given the initial $G^{sch}$, the A* algorithm is applied using the following admissible heuristic:

$$h(v_p) = g(\pi_{\phi}(v_p, v_d)) \le g(\pi^{*}(v_p, v_d)),$$

where $\pi_{\phi}$ is the individually optimal path induced by policy $\phi$, and $\pi^{*}$ is the ground-truth optimal multi-agent path we want to find. The initial open list (i.e., priority queue) contains $v_s$ only, with zero cost. The open list is sorted according to $v_p.\mathrm{cost} + h(v_p)$, where $v_p.\mathrm{cost}$ is the current cost of $v_p$ from the source.
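As an illustration of the heuristic above, the sketch below computes each agent's individually optimal cost-to-go with Dijkstra's algorithm on the reversed graph and sums the values over agents. The graph, the function names `cost_to_go` and `heuristic`, and the assumption that every vertex appears as a key of the graph dictionary are ours.

```python
import heapq


def cost_to_go(graph, goal):
    """Cheapest cost from every vertex to goal (Dijkstra on the reversed graph).

    graph -- {vertex: {successor: edge cost}}; every vertex must be a key.
    """
    reverse = {v: [] for v in graph}
    for u, edges in graph.items():
        for v, w in edges.items():
            reverse[v].append((u, w))
    dist = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale queue entry
        for u, w in reverse[v]:
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist


def heuristic(joint_vertex, tables):
    """Sum of individually optimal costs-to-go; tables[j] belongs to agent j."""
    return sum(tables[j][v] for j, v in enumerate(joint_vertex))


# Hypothetical graph; both agents share the goal "c" for brevity.
GRAPH = {"a": {"b": 1.0, "c": 5.0}, "b": {"c": 2.0}, "c": {}}
table = cost_to_go(GRAPH, "c")
print(heuristic(("a", "b"), [table, table]))  # 3.0 + 2.0 = 5.0
```

Because each term ignores the other agents, the sum can never exceed the cost of a joint path that must also avoid collisions, which is why the heuristic is admissible.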
In each iteration, M* expands the first-ranked vertex $v_p$ from the open list to $G^{exp}$ and, if no collision occurs at $v_p$, investigates each joint vertex $v_q$ in the limited neighbors of $v_p$ (i.e., $v_q \in V_p^{nbh}$); otherwise, it jumps to the next iteration. If there exists a collision at $v_q$ (i.e., $\psi(v_q) \neq \emptyset$), M* will update the collision set $C_q$ with $C_q \cup \psi(v_q)$, and this update will back-propagate from $v_q$ to: (1) its immediate predecessor $v_p$; and (2) all the way back to any ancestors that have at least one path inside of $G^{exp}$ leading to $v_q$ (see Equation (3) for details). After this pre-processing, the algorithm:

• investigates and updates the cost of the vertex $v_q$ and records its corresponding predecessor; and
• adds $v_q$ and all its predecessors/ancestors, of which the collision sets are changed, to the open list.
This process is repeated until $v_d$ is expanded or the open list is empty. The critical point is that the search graph $G^{sch}$ changes only when a collision set $C_p$ changes. What makes the difference is the back-propagating update of the collision set: by including $\psi(v_p)$ in $C_p$, M* can tell which agents collide immediately at the current $v_p$; by including all $\psi(v_q)$ for $v_q \in V_p$ in $C_p$ (i.e., the collision information of all the expanded downstream successors of $v_p$), M* can preview which agents will collide in the future, making it possible to plan ahead to avoid those collisions. Therefore, using the limited neighbor set in Equation (4) makes sense: it advises M* to expand only the dimensions where there exists an immediate collision at $v_p$ or where there will be collisions in the future, starting from $v_p$, in the current expanded graph $G^{exp}$. Figure 1 shows an example of how M* solves the optimal collision-free path planning for two agents. In Figure 1, we can visualize the evolution of the search graph $G^{sch}$ of Agent 2.
$G^{sch}$ consists of an expanded graph $G^{exp}$ (circle), a neighbor graph $G^{nbh}$ (diamond), and a policy graph $G^{\phi}$ (square). Edge cost and direction-changing cost are considered during planning. Yellow zones are preferred areas with lower edge cost. In M*, individually optimal paths are induced by $\phi$ for each individual agent (Figure 1a). We can observe that there will be a collision at vertex $s_{10}$, which is ignored by $\phi$. For Agent 2, M* searches in the subspace, and the most promising vertex is expanded at each iteration (Figure 1b,c). Then, a collision occurs at vertex $s_{10}$ and triggers the removal of the rest of $G^{sch}$ (Figure 1d), which is equivalent to jumping to the next iteration. Following the sub-dimensional expansion strategy, M* extends the search space to include the limited neighbors, and a new $G^{sch}$ is obtained (Figure 1e). By searching in the new $G^{sch}$, M* finds the optimal collision-free path for Agent 2 (Figure 1f,g). On the other hand, the planning for Agent 1 is conducted simultaneously, and, finally, the collision-free optimal paths for both agents are found by M* (Figure 1h).
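The back-propagation of collision sets described in this section can be sketched as follows. The data layout (dictionaries `collision_sets` and `predecessors`, and a plain set standing in for the priority queue) and the function name `backpropagate` are simplifying assumptions for illustration, not the data structures of the reference implementation in [6].

```python
def backpropagate(vertex, new_collisions, collision_sets, predecessors, open_list):
    """Merge new_collisions into vertex's collision set and push the change
    back to every ancestor that was expanded on a path leading to vertex.

    collision_sets -- {vertex: set of agent indices}
    predecessors   -- {vertex: iterable of expanded predecessor vertexes}
    open_list      -- set used here in place of a priority queue (assumption)
    """
    stack = [(vertex, set(new_collisions))]
    while stack:
        v, incoming = stack.pop()
        current = collision_sets.setdefault(v, set())
        if incoming <= current:
            continue  # nothing new: the search graph around v is unchanged
        current |= incoming
        open_list.add(v)  # v must be re-expanded with more dimensions
        for pred in predecessors.get(v, ()):
            stack.append((pred, set(current)))


# Hypothetical chain s -> p -> q in the expanded graph.
preds = {"q": ["p"], "p": ["s"], "s": []}
sets, queue = {}, set()
backpropagate("q", {0, 1}, sets, preds, queue)
print(sets)
```

Since collision sets only grow and are bounded by the number of agents, the propagation stops after finitely many updates, even if the expanded graph contains cycles.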

Soft-Collision M* (SC-M*)
M* assumes a hard-collision constraint, which does not apply to many real-world applications. Our contribution in this paper is to generalize M* to the soft-collision context, where common resources are considered, and to introduce soft collisions as a generalized concept that allows us to model different types of collisions. In addition, we show the advantages and trade-offs of the proposed algorithm in this new scenario. The proposed SC-M* extends M* by changing the definition of a collision, so paths with hard collisions but with a level of dissatisfaction on resources below a given threshold are acceptable. In this section, we formulate the concept of soft collision on common resources, describe the generalized M* (i.e., soft-collision M*) for planning in the soft-collision context, and extend our approach to a more complex environment with multiple types of agents requesting multiple types of resources.

Soft-Collision Constraint on Common Resources
Inspired by real-world scenarios, we introduce the resource-related soft-collision property to the model of an agent. We define that all the agents have the following properties: (1) a collision among agents is soft, quantified using some collision scores; and (2) different agents have different collision scores, according to their individual experiences through the paths. We suppose that each agent cares about a set of resources $A = \{A_1, \dots, A_L\}$. To obtain these properties, we introduce to each agent an additional attribute called resource experience (for each resource) and use the resource experience to calculate the collision score.
In doing so, this section first uses the resource experience (as defined in Section 4.1.1, Definition 1) to quantify how dissatisfied the agent is with the resource allocated to it. Then, we combine this information for all the resources into a collision score (as defined in Section 4.1.2, Definition 2) that indicates the probability of the agent announcing a collision given its resource experience. A collision threshold is used to limit the collision score; it specifies the degree of unpleasantness we are willing to accept in a solution. An agent whose collision score exceeds the threshold is placed into a soft-collision set via the soft-collision function for sub-dimensional expansion (as defined in Section 4.1.3, Definition 3).

Definition 1 (Resource Experience)
We define resource experience to quantify the dissatisfying experience per resource about which an agent cares. Let

• $\pi = \pi(v_s, v_b)$ be a path from the source $v_s$ to some $v_b$;
• $v_q = \pi(v_p)$ be the immediate successor of $v_p$ along the path $\pi$;
• $A_k(e_{pq}^j)$ be the capacity (amount) of the subset of the resource $A_k$ on the edge $e_{pq}^j$, given by the graph model; and
• $A_k^j(e_{pq}^j)$ be the amount of the subset of the resource $A_k$ actually allocated to the agent $j$ on the edge $e_{pq}^j$, called the allocated resource value.
The resource experience is then defined, in Equation (6), as the dissatisfying experience of agent $j$ on resource $A_k$ accumulated along the path $\pi^j$. In Equation (6), $1(\cdot)$ is the indicator function, whose value is one if the logical condition is true and zero otherwise. The definition counts only the dissatisfaction incurred by other agents when they physically move together. In contrast, the traditional hard-collision setting will always label a collision to the agent $j$ and all other involved agents whenever $A_k^j(e_{pq}^j)$ is (even slightly) smaller than $A_k(e_{pq}^j)$. The resource experience is implemented as an attribute of the vertex class and can be calculated incrementally using Algorithm 1.
Algorithm 1. Incremental update of the resource experience.
Input: $v_k$: base vertex; $v_l$: immediate successor of the base vertex; $A$: list of resources
Output: $v_l$ with updated experience
1: for $A_p$ in $A$ do
2:   for $j$ in $I$ do
3:     add the contribution of the edge $e_{kl}^j$ given by Equation (6) to the experience of agent $j$ on $A_p$ carried over from $v_k$, and store the result in $v_l$
4:   end for
5: end for
6: return $v_l$ //the successor with updated experience

Combined with the allocated resource value, which serves as a proxy of the interference level, the definition of resource experience in Equation (6) actually defines a property of an agent: only the situation in which the resource allocated to an agent is dissatisfying because of the co-existence of other agents (i.e., the capacity $A_k(e_{pq}^j)$ itself must be large enough to satisfy the agent) will contribute to the dissatisfying experience of that agent. Furthermore, each dissatisfying condition is weighted by the edge cost $g(e_{pq}^j)$. In this way, we can quantify the resource experience in terms of how long such a dissatisfying condition lasts in travel time or distance, which is quantified by $g(e_{pq}^j)$. As discussed below, the resource experience of an agent will determine its collision score, which is defined from a probabilistic point of view.
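A minimal Python sketch of the incremental update is given below. Since the exact indicator condition of Equation (6) is not reproduced here, the sketch assumes one plausible reading: an edge contributes its cost when the allocated amount is below a hypothetical per-resource satisfying level while the edge capacity alone would reach that level. The parameter `satisfying`, the dictionary layout, and the function name `update_experience` are all assumptions made for illustration.

```python
def update_experience(exp_k, edge_cost, capacity, allocated, satisfying):
    """Return the resource experience at the successor vertex.

    exp_k      -- {(agent, resource): experience accumulated at the base vertex}
    edge_cost  -- {agent: cost g of the edge taken by that agent}
    capacity   -- {resource: capacity of the resource on the edge}
    allocated  -- {(agent, resource): amount actually allocated on the edge}
    satisfying -- {resource: hypothetical level at which an agent is satisfied}
    """
    exp_l = dict(exp_k)
    for (agent, resource), value in exp_k.items():
        need = satisfying[resource]
        # Assumed condition: the agent gets less than it needs although the
        # capacity would suffice, i.e., the shortage is caused by other agents.
        if allocated[(agent, resource)] < need <= capacity[resource]:
            exp_l[(agent, resource)] = value + edge_cost[agent]
    return exp_l


# Hypothetical two-agent example with a single "seat" resource.
before = {(0, "seat"): 0.0, (1, "seat"): 1.5}
after = update_experience(
    before,
    edge_cost={0: 2.0, 1: 2.0},
    capacity={"seat": 1.0},
    allocated={(0, "seat"): 0.5, (1, "seat"): 1.0},
    satisfying={"seat": 1.0},
)
print(after)
```

Only agent 0, which receives half a seat, accumulates the edge cost; agent 1 is fully served and keeps its previous experience.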

Definition 2 (Collision Score)
We use the resource experience results from Definition 1 to calculate the collision scores. The collision score is defined from the viewpoint of collision probability, which must be kept below some threshold. Let

• $\mathrm{Col}^j$ be the event that agent $j$ announces a collision (i.e., when agent $j$ calls for one of the resources, the allocated resource is less than satisfying);
• $D^j = \{D_1^j, \dots, D_L^j\}$ be the set of dissatisfying experiences of agent $j$ along path $\pi^j$, where $D_k^j$ is its experience on the resource $A_k$; and
• $f = \{f_1, \dots, f_L\}$ be a set of customized cumulative distribution functions (CDFs) defined on $[0, +\infty)$, where $f_k$ maps the resource experience $D$ to a probability of collision on the resource $A_k$.
The collision score of the agent $j$ is defined as the probability that a collision occurs to the agent $j$ on at least one of the resources given its resource experience $D^j$:

$$P(\mathrm{Col}^j \mid D^j) = 1 - \prod_{k=1}^{L} \bigl(1 - f_k(D_k^j)\bigr). \tag{8}$$

Note that $P(\mathrm{Col}^j \mid D^j)$ is the complement of the success probability, i.e., the joint probability of being tolerable at all resources. Figure 2 shows two example designs of $f$. The first, $f_1(D) = \mathrm{sigmoid}(D - \delta)$, with a discontinuity point $f_1(0) = 0$, is a sigmoid-based CDF featuring a surge in the collision score (the derivative is bump-shaped) at experience values around $\delta$. This function is suitable for important resources to which the agent is sensitive. The second, $f_2(D) = \min(1, D/(4\delta))$, is a linear CDF with a shallow slope (the derivative is flat). This function can apply to trivial resources to which the agent is not very sensitive but which still accumulate to contribute to the collision score. We use the offset parameter $\delta$ to adjust the tolerance level of the dissatisfying experience. With larger $\delta$, the agent will tolerate a longer unpleasant experience before announcing a collision.
Although the definition of the collision score can be customized according to different practices, the probabilistic definition of collision score introduced here is a general one: different types of resources may have different value ranges, and Equation (8) standardizes the resource ranges, mapping them to a value within [0, 1] and enabling an efficient integration of different types of resources into the framework.
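The collision score can be illustrated with the two example CDFs described above. The sketch below is a minimal reading of Definition 2: it treats the per-resource tolerances as independent, so the score is one minus the product of the per-resource tolerance probabilities. The function names and the choice $\delta = 2$ are illustrative assumptions.

```python
import math


def f_sigmoid(d, delta):
    """Sigmoid-based CDF, sigmoid(D - delta), with the special case f(0) = 0."""
    if d == 0:
        return 0.0
    return 1.0 / (1.0 + math.exp(-(d - delta)))


def f_linear(d, delta):
    """Linear CDF with a shallow slope, min(1, D / (4 * delta))."""
    return min(1.0, d / (4.0 * delta))


def collision_score(experiences, cdfs):
    """Probability of a collision on at least one resource.

    experiences -- resource experience values, one per resource
    cdfs        -- matching functions mapping an experience to a probability
    """
    tolerable = 1.0
    for d, f in zip(experiences, cdfs):
        tolerable *= 1.0 - f(d)  # stays tolerable on this resource
    return 1.0 - tolerable


DELTA = 2.0  # hypothetical tolerance offset
cdfs = [lambda d: f_sigmoid(d, DELTA), lambda d: f_linear(d, DELTA)]
print(collision_score([0.0, 0.0], cdfs))  # 0.0: no dissatisfying experience
print(collision_score([2.0, 4.0], cdfs))  # 1 - 0.5 * 0.5 = 0.75
```

The second call shows how two moderately unpleasant experiences, each with collision probability 0.5, combine into a noticeably higher overall score.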

Definition 3 (Soft-Collision Function)
Now, according to the collision scores from Definition 2, we want to pick out the above-threshold agents and place them into the soft-collision set via the soft-collision function for the purpose of applying the sub-dimensional expansion.
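Given a path $\pi = \pi(v_s, v_b)$ and the corresponding resource experience $D^j$ for the agent $j$, the soft-collision function of the agent $j$ is

$$\psi^j(v_b) = \begin{cases} \{j\}, & \text{if } P(\mathrm{Col}^j \mid D^j) \text{ exceeds } T, \\ \emptyset, & \text{otherwise,} \end{cases} \tag{9}$$

where $T$ is the threshold of collision. The global soft-collision function is then defined as

$$\psi(v_b) = \bigcup_{j \in I} \psi^j(v_b). \tag{10}$$

Based on Definition 3, we can formally construct the soft-collision constraint on common resources and obtain the soft-collision constrained MAPP problem:

$$\min_{\pi}\; g(\pi(v_s, v_d)) \quad \text{s.t.} \quad \psi(v_p) = \emptyset, \quad \forall v_p \in \pi. \tag{11}$$

This problem setting is general and can be utilized to express the hard collision setting in Equation (2) by setting $T = 0$ or changing the condition inside the indicator function of Equation (6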

SC-M* Description
SC-M* is a general solver for the MAPP problem in Equation (11), obtained by adapting M* to the soft-collision constraints on common resources. The pseudocode for SC-M* is presented in Algorithm 2. In this algorithm, Lines 1-7 initialize each vertex $v$ in the vertex set $V$ with infinite cost from the source $v_s$ (the cost of $v_s$ itself is zero), set the dissatisfying experience to zero and make the collision set $C_k$ empty. The initial open list contains $v_s$ only (Line 8). In each iteration, SC-M* expands the first-ranked vertex $v_k$ in the open list, ordered by the total cost $v_k.\mathrm{cost} + \mathrm{heuristic}[v_k]$ (Lines 10 and 11). The algorithm terminates and returns the result if the expanded $v_k$ is the destination $v_d$ (Lines 12-14), or jumps to the next iteration if an immediate collision occurs at $v_k$, i.e., $\psi(v_k) \neq \emptyset$ (Lines 15-17). Line 18 constructs the limited neighbors $V_k^{nbh}$ of $v_k$ using Equation (4). For each vertex $v_l$ in $V_k^{nbh}$ (Line 19), it adds $v_l$ to the descendant set $V_k$ of $v_k$ (Line 20), updates the dissatisfying experience of $v_l$ using Algorithm 1 (Line 21), and merges the immediate collision at $v_l$ into its soft-collision set $C_l$ (Line 22). Based on the new collision set of $v_l$, SC-M* back-propagates to update all the affected ancestor vertexes of $v_l$ (see Equation (3)) and adds them back to the open list for re-expansion (Line 23). After this collision set update, if $v_l$ is free from collision and has an improved cost, the algorithm accepts the new cost by saving the trace-back information and adding $v_l$ to the open list for expansion (Lines 24-28). This process repeats until the open list is empty (Line 9), in which case no solution exists, or until a solution is found (Lines 12-14).

Algorithm 2. SC-M*.
1-7: for all $v$ in $V$: $v.\mathrm{cost} \leftarrow +\infty$ ($v_s.\mathrm{cost} \leftarrow 0$), set the dissatisfying experience of $v$ to zero, $C_v \leftarrow \emptyset$
8: open $\leftarrow \{v_s\}$
9: while open is not empty do
10-11:   $v_k \leftarrow$ first-ranked vertex in open, ordered by $v_k.\mathrm{cost} + \mathrm{heuristic}[v_k]$
12-14:   if $v_k = v_d$ then return the traced-back path
15-17:   if $\psi(v_k) \neq \emptyset$ then continue
18:   construct $V_k^{nbh}$ using Eq. (4)
19:   for $v_l$ in $V_k^{nbh}$ do
20:     add $v_l$ to $V_k$
21:     update the experience of $v_l$ using Algorithm 1
22:     $C_l \leftarrow C_l \cup \psi(v_l)$
23:     1) update all the affected soft-collision sets using Eq. (3); 2) add all affected vertexes back to the open list (see reference [6] for details)
24-28:     if $\psi(v_l) = \emptyset$ and the cost of $v_l$ is improved then accept the new cost, save the trace-back information, and add $v_l$ to open
29:   end for
30: end while
31: return no path exists

SC-M* can make a transition from a decoupled individual A* ($T = 1$) to a standard hard-collision constrained M* ($T = 0$), which makes the algorithm more flexible while keeping the soft-collision scores bounded.
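The thresholding of Definition 3 and the two extreme settings of $T$ mentioned above can be illustrated with a short Python sketch. It assumes that an agent enters the soft-collision set when its collision score strictly exceeds $T$; the function name `soft_collision_set` and the example scores are hypothetical.

```python
def soft_collision_set(scores, threshold):
    """Agents whose collision score exceeds the threshold T.

    scores    -- {agent: collision score in [0, 1]}
    threshold -- collision threshold T in [0, 1]
    """
    return {agent for agent, score in scores.items() if score > threshold}


# Hypothetical collision scores of three agents at some joint vertex.
scores = {1: 0.0, 2: 0.3, 3: 1.0}

print(soft_collision_set(scores, 0.0))  # {2, 3}: any dissatisfaction collides
print(soft_collision_set(scores, 0.5))  # {3}: only the agent above 0.5
print(soft_collision_set(scores, 1.0))  # set(): agents are never coupled
```

Under this reading, $T = 0$ flags every agent with any dissatisfying experience, which matches the hard-collision behavior, whereas $T = 1$ never flags an agent, so every agent simply follows its individually optimal path.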

Completeness and Cost-Suboptimality
A MAPP algorithm is complete if it guarantees that it will either return a path, or determine that no path exists in finite time. An algorithm is optimal if it guarantees returning an optimal path if such a solution exists. SC-M* is complete and suboptimal conditioned on the soft-collision constraint (i.e., $P(\mathrm{Col}^j \mid D^j) < T$, for a given collision threshold $T$).

Completeness
Theorem 1. SC-M* is a complete algorithm.
Proof of Theorem 1. SC-M* inherits the sub-dimensional expansion from M* (i.e., it changes $G^{sch}$ only when one of the soft-collision sets $C_p$ changes). The algorithm applies A* in the updated search graph. Due to the merging operation applied to the collision set $C_p$, as shown in Equation (3), $C_p$ for each vertex will change only a finite number of times (at most $m$ times, where $m$ is the number of agents). Hence, the search graph is updated only a finite number of times. Because A* is complete, applying A* to a given $G^{sch}$ takes finite time to return a result. Therefore, SC-M* is complete.

Cost-Suboptimality
Different from M*, which is optimal, SC-M* is suboptimal because Equations (9) and (10) only include the immediate conflicting agents in the soft-collision set; the agents that softly interfere with the conflicting agents in the upstream path are excluded. Those excluded agents also contribute to the announced collision (i.e., making the collision score above the threshold). Because of this, SC-M* cannot guarantee the inclusion property, which is the basis to ensure the optimality in M* [6]: the optimal path for some subset of agents costs no more than the optimal joint path for the entire set of agents. Without the inclusion property, SC-M* may not guarantee cost optimality.
Figure 3 provides a counterexample to the inclusion property for SC-M* in the soft-collision MAPP context defined in this paper. Let $\pi_{\Omega}(v_k, v_f)$ be the joint path constructed by combining the optimal path for a subset $\Omega \subseteq I$ of agents with the individually optimal paths for the agents in $I \setminus \Omega$. The inclusion property is defined as follows: if the configuration graph contains an optimal path $\pi^{*}(v_k, v_f)$, then $g(\pi_{\Omega}(v_k, v_f)) \leq g(\pi^{*}(v_k, v_f))$. In the soft-collision context, this inclusion property does not always hold.
In Figure 3, we have a three-agent MAPP problem ($I = \{r1, r2, r3\}$) in the soft-collision context. Agents r1, r2, and r3 attempt to move from the vertexes a, f, and h to the vertexes e, g, and i, respectively. The individually optimal paths (shortest distance) are a → b → c → d → e with distance 4 for r1, f → c → d → g with distance 3 for r2, and h → b → c → i with distance 3 for r3. The total cost of the joint individually optimal path is 10. However, assuming that the agents can only tolerate a dissatisfying experience with distance 1, r1 will announce a collision at vertex d because of the interference on the edges b → c and c → d from agents r3 and r2, respectively.
If we choose $\Omega = \{r1, r2\} \subseteq I$, as can be seen in Figure 3, the only solution is for r1 to take a detour through the vertex x to avoid the collision on the edge c → d, resulting in a cost of 5 for r1, so the total $g(\pi_{\Omega}(v_k, v_f))$ is 11 (3 for r2, 5 for r1, and 3 for r3). On the other hand, by searching through all three dimensions, a better solution is for r3 to detour through the vertex y, so that r1 is free from collision because the interference on the edge b → c disappears. The total cost of this joint path is 10.5, and we have $g(\pi_{\Omega}(v_k, v_f)) > g(\pi^{*}(v_k, v_f))$. The reason for this phenomenon is that, in the hard-collision context, only the immediate conflicting agent r2 contributes to the collision of r1 at vertex d. However, in the soft-collision context, both r2 and r3 contribute to the collision of r1 at vertex d, and thus the inclusion property does not apply. Without this inclusion property, which is the basis of the optimality of M*, the optimality of SC-M* cannot be guaranteed.
However, we note that suboptimal methods have long been used successfully to solve many interesting MAPP problems [15,25,26]. In the next section, we show that SC-M* compares favorably with alternative SC-based MAPP solvers (SC-A* and SC-CBS) when scalability, run time, and path cost are considered together, which demonstrates that the proposed method, adapted to MAPP in the soft-collision context, is a powerful tool in practice.
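The arithmetic of the Figure 3 counterexample can be checked with the short Python sketch below. The per-agent costs of the subset plan are taken from the text; for the full three-dimensional search only the total of 10.5 is given, so the cost of 3.5 for r3 is an inference that assumes r1 and r2 keep their individually optimal paths. The names `total_cost` and `inclusion_holds` are hypothetical.

```python
def total_cost(plan):
    """Sum of the individual path costs of a joint plan."""
    return sum(plan.values())


def inclusion_holds(subset_plan, full_plan):
    """Inclusion property: the subset-optimal joint path costs no more
    than the optimal joint path over all agents."""
    return total_cost(subset_plan) <= total_cost(full_plan)


# Omega = {r1, r2}: r1 detours through x (cost 5), r3 keeps its own path.
subset_plan = {"r1": 5.0, "r2": 3.0, "r3": 3.0}

# Search over all three agents: r3 detours through y.
# Assumption: r3's cost of 3.5 is inferred from the reported total of 10.5.
full_plan = {"r1": 4.0, "r2": 3.0, "r3": 3.5}

print(total_cost(subset_plan), total_cost(full_plan),
      inclusion_holds(subset_plan, full_plan))
```

The check returns `False` for the inclusion property, which matches the observation that the subset-based joint path (11) is more expensive than the joint path found over all agents (10.5).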

Experiments and Results
We evaluated SC-M* in simulation on a grid public mass transit network, using a machine with an Intel Core i7-6700 CPU at 3.4 GHz and 16 GB RAM. As shown in Figure 4, the grid transit environment has 20 × 20 stops. There are 20 bidirectional horizontal lines. Likewise, 20 bidirectional vertical lines are deployed in the environment. At each stop, agents can switch lines. The yellow areas are covered by some resource, which is on-vehicle free Wi-Fi in our experiments. Agents traversing those areas can enjoy high-quality on-vehicle Wi-Fi connections. A fully covered edge has a Wi-Fi resource value of 100, and the Wi-Fi value of an edge is proportional to the length of coverage. Each agent wants to move from its source (square) to its destination (circle) with the lowest cost (i.e., a linear combination of distance cost and Wi-Fi cost) and with a bounded collision score. The second resource is the space on the edge, which is fixed at 5. The satisfying values are $\varepsilon_1 = 20$ and $\varepsilon_2 = 1$ for the Wi-Fi and space resources, respectively.
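As a rough illustration of this environment, the sketch below builds the 20 × 20 grid and computes the Wi-Fi value of an edge from its covered fraction. It is only a sketch under assumptions: the function names `build_grid` and `wifi_value` are hypothetical, each bidirectional segment between adjacent stops is stored once, and the weights of the linear combination of distance and Wi-Fi cost are not given in the text, so they are left out.

```python
from itertools import product

GRID_SIZE = 20           # 20 x 20 stops, as in Figure 4
FULL_WIFI_VALUE = 100.0  # Wi-Fi value of a fully covered edge


def build_grid(n=GRID_SIZE):
    """Return the stops and the bidirectional segments of an n x n grid."""
    stops = list(product(range(n), range(n)))
    edges = []
    for x, y in stops:
        if x + 1 < n:
            edges.append(((x, y), (x + 1, y)))  # segment of a horizontal line
        if y + 1 < n:
            edges.append(((x, y), (x, y + 1)))  # segment of a vertical line
    return stops, edges


def wifi_value(covered_fraction):
    """Wi-Fi value of an edge, proportional to the covered length."""
    if not 0.0 <= covered_fraction <= 1.0:
        raise ValueError("covered_fraction must lie in [0, 1]")
    return FULL_WIFI_VALUE * covered_fraction


stops, edges = build_grid()
print(len(stops), len(edges), wifi_value(0.25))
```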
We randomly generated a source-destination pair for each agent. Each trial was given a 1000-s run-time limit to find a solution. For each configuration (including the number of agents, the collision threshold $T$, and the offset parameter $\delta$), we ran 20 random trials to calculate the average metrics (i.e., the success rate and run time). The success rate is the number of trials ending with a solution divided by the number of trials. The run time is the average over trials ending with a solution or a no-solution declaration. If all trials under a certain configuration exceeded 1000 s, we used ">1000" to represent the run time of the corresponding configuration. We used the standard A* as the coupled planner and policy generator in the SC-M* framework and compared our results to the baselines.
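The two metrics described above can be computed as in the following sketch. The trial record layout (a status string plus a run time in seconds) and the name `summarize` are assumptions made for illustration.

```python
def summarize(trials):
    """Compute (success_rate, run_time) for one configuration.

    trials: list of (status, seconds) pairs, where status is one of
            "solved", "no_solution", or "timeout" (hypothetical labels).
    """
    solved = sum(1 for status, _ in trials if status == "solved")
    success_rate = solved / len(trials)

    # Run time averages only trials that ended with a solution
    # or with a no-solution declaration.
    finished = [sec for status, sec in trials
                if status in ("solved", "no_solution")]
    run_time = sum(finished) / len(finished) if finished else ">1000"
    return success_rate, run_time


print(summarize([("solved", 10.0), ("solved", 30.0),
                 ("timeout", 1000.0), ("no_solution", 20.0)]))
print(summarize([("timeout", 1000.0)] * 20))
```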

Planning for the One-Resource-One-Agent-Type
The first experiment considered Wi-Fi as the only resource requested by agents (i.e., $A = \{A_1: \text{"WiFi"}\}$). Only one agent type exists, and all agents use the sigmoid-based function $f_1$ as the collision CDF.
We first studied the influence of the collision threshold $T$ and the offset parameter $\delta$ on performance. Figure 5a shows the success rate of the one-resource-one-type SC-M* with different thresholds $T = 0$ (equivalent to the basic M*), 0.2, 0.4, and 0.45, with the offset parameter fixed at $\delta = 6.0$. Table 1 (left) shows the run time in seconds for this experiment. The results clearly show that larger thresholds improve performance, with a higher success rate and lower run time for large system sizes ($m > 50$). This improvement results from the property of SC-M* that larger thresholds relax the constraints, so agents are less likely to collide on resources. Figure 5b shows the success rate of SC-M* with different offset parameters $\delta = 0$, 3.0, 6.0, and 9.0, with fixed $T = 0.35$. Table 1 (right) shows the run time for this experiment. The results illustrate that SC-M* is sensitive to $\delta$ and can efficiently handle up to 100 agents with $\delta = 9.0$. These results are reasonable because the sigmoid-based CDF used in the experiments features a surge in the collision probability at experience values around the offset; the offset parameter therefore acts as a cutoff on the resource experience, with a collision always announced once the resource experience is larger than the offset. The standard M* ($T = 0$) can only scale to fewer than 30 agents. Taking advantage of this property, one can tune the parameters to trade off scalability against the tightness of the constraints.
We also evaluated SC-M* in a more complex environment: two agent types requesting two resources. This experiment considered both Wi-Fi and space capacity (i.e., $A = \{A_1: \text{"WiFi"}, A_2: \text{"Space"}\}$). Type I agents use $f_1$ in Figure 2 as the collision CDF for the Wi-Fi resource and the linear CDF $f_2$ for the space resource, implying that they treat Wi-Fi and space as important and trivial, respectively. On the other hand, Type II agents use $f_1$ for space and $f_2$ for Wi-Fi. Each agent has a 50% chance of being Type I. Both CDFs are adjusted using the same $\delta$ at each trial, as illustrated in Figure 2.
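The cutoff behavior attributed to the sigmoid-based CDF $f_1$ can be illustrated with the sketch below. The exact functional form of $f_1$ is defined in Figure 2 and is not reproduced here, so the logistic form, the `steepness` parameter, and the names `sigmoid_cdf` and `announces_collision` are all assumptions chosen only to show the qualitative effect of the offset $\delta$ and the threshold $T$.

```python
import math


def sigmoid_cdf(experience, delta, steepness=1.0):
    """Hypothetical sigmoid-based collision CDF centered at the offset delta."""
    return 1.0 / (1.0 + math.exp(-steepness * (experience - delta)))


def announces_collision(experience, delta, threshold):
    """A collision is announced when the collision score is not below T."""
    return sigmoid_cdf(experience, delta) >= threshold


# With this assumed form the score is exactly 0.5 at the offset, so for any
# T below 0.5 an experience larger than delta always announces a collision.
for d in (0.0, 3.0, 6.0, 7.0):
    print(d, round(sigmoid_cdf(d, delta=6.0), 4),
          announces_collision(d, delta=6.0, threshold=0.45))
```

Under this assumed form, raising $\delta$ shifts the surge to larger experience values, which is one way to read the observed sensitivity of the one-resource experiment to the offset.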
Figure 6a shows the success rate of the two-resource-two-type SC-M* with different thresholds $T = 0$ (equivalent to the basic M*), 0.2, 0.35, and 0.45, and with a fixed offset parameter $\delta = 9.0$. Table 2 (left) shows the run time for the experiments. As can be seen from the results, in general, SC-M* can handle the two-resource-two-type systems and plan for more than 80 agents. Because more resources contribute more factors that increase the collision score, a relatively large offset ($\delta = 9.0$) is needed to achieve performance comparable to the one-resource-one-type SC-M*. Figure 6b and Table 2 (right) present the impact of the offset parameter $\delta$ on performance. Different from the first experiment, SC-M* with the above configurations is less sensitive to $\delta$ than in Figure 5. The reason is that 50% of the agents are insensitive to one of the resources because of the linear CDF $f_2$; thus, increasing $\delta$ does not contribute to a significant reduction in collisions. This property implies that we can control the importance levels of resources efficiently through the design of collision CDFs. This experiment demonstrates that, with proper parameter settings, SC-M* can feasibly handle a complex environment with multiple resources and multiple agent types.

Comparison of SC-M* to Baselines
We next compared the SC-M* to other SC-based MAPP algorithms, including SC-A* (optimal) and SC-CBS (suboptimal), in the one-resource-one-type environment.

Path Cost
First, we compared the path cost of the three algorithms. We designed 60 planning tasks for environments with 4-6 agents (20 tasks for each), in which agents encounter at least one collision along the individually optimal paths under the $T = 0.05$, $\delta = 1$ setting. We start with small agent numbers because SC-A* cannot handle a large number of agents.
Figure 7 shows the average difference of the three SC-based solvers relative to the individually optimal cost (i.e., the sum of the optimal cost of each agent when the agent is the only one in the system). In other words, the Y-axis represents the cost of collisions. We observe that SC-A* and SC-CBS have the lowest and highest additional cost, respectively. SC-M* solutions cost more than SC-A* but noticeably less than SC-CBS. The reason for these results is that SC-A* is an optimal solver for this type of MAPP problem because it always explores cheaper paths in the entire multi-agent joint space before considering the paths that cost more [7]. SC-M* is suboptimal because of the process discussed in Section 4.3.2. Compared to SC-M*, SC-CBS incurs more path cost because of the way it collects a collision: CBS collects collisions into a conflict tree and arranges each collision in the form [agent j, vertex v, step s], indicating that agent j collides at vertex v at step s. In each iteration, CBS conducts decoupled planning to prevent agent j from reaching vertex v at step s. This might lose some information in the soft-collision context because there might exist another path that leads j to vertex v at step s without announcing a collision, by avoiding one of the upstream vertexes involved in soft interference. In contrast, SC-M* can explore those paths excluded by SC-CBS because it searches the entire space of the immediate colliding agents.
Figure 8 shows a two-agent MAPP problem in the soft-collision context. Agents r1 and r2 attempt to move from vertexes a and f to vertexes e and g, respectively. The individually optimal paths (shortest distance) are a → b → c → d → e with distance 4 for r1 and f → b → c → d → g with distance 4 for r2. The total cost of the joint individually optimal path is 8. r1 and r2 softly collide on the edges b → c and c → d, where r2 can tolerate a dissatisfying experience with distance 2. However, r1 can only tolerate a dissatisfying experience with distance 1 and announces a collision at the vertex d.
When using SC-CBS, we record the collision that occurred to r1 as [r1, d, 3], indicating that agent r1 will collide at vertex d at the third step. Then, SC-CBS will avoid any paths leading r1 to d at Step 3 (including a → b → x → d → e and a → b → c → d → e) and will end up with a longer detour through vertex y. The SC-CBS solution has a cost of 5 for r1 and 9 in total.
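The effect of the constraint [r1, d, 3] can be reproduced with the minimal sketch below. The tuple layout follows the [agent, vertex, step] form described in the text; the function name `violates` and the convention that the source vertex is step 0 are assumptions.

```python
def violates(path, constraint):
    """True if the path places the agent at the forbidden vertex
    at the forbidden step (step 0 is the source vertex)."""
    _agent, vertex, step = constraint
    return step < len(path) and path[step] == vertex


constraint = ("r1", "d", 3)

via_c = ["a", "b", "c", "d", "e"]  # individually optimal path of r1
via_x = ["a", "b", "x", "d", "e"]  # detour that avoids the edge c -> d

# Both paths reach d at step 3, so the constraint excludes both of them,
# including the detour through x that SC-M* later accepts.
print(violates(via_c, constraint), violates(via_x, constraint))
```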
When using SC-M*, the collision at d triggers the sub-dimensional expansion of the search graph in dimension 1, which includes both x and y. Thus, it can find a cheaper collision-free path through x and end up with the path a → b → x → d → e, with a dissatisfying experience of distance 1 and a cost of 4.5 for r1 (8.5 in total). However, SC-M* does not expand dimension 2 because no collision has been announced by r2.
When using SC-A*, the joint search space of both dimension 1 and dimension 2 is expanded and searched. Instead of vertexes x and y, SC-A* will first investigate vertex z in dimension 2 according to some heuristics. This process leads to another cheaper path f → b → z → d → g with distance 4 for r2 (8 in total, which is the same as the individually optimal cost) and avoids all interference by moving through this path. As a result, SC-A* returns an optimal solution that satisfies the soft-collision constraint at the expense of search space.
The example in Figure 8 illustrates the optimality of SC-A* and the advantage of SC-M* in path cost over SC-CBS. To be specific, SC-M* provides a better solution than SC-CBS by searching thoroughly through the expanded dimensions, whereas the way SC-CBS identifies collisions is inappropriate in the soft-collision context. To the best of our knowledge, no other methodology capable of dealing with the soft-collision path planning defined in Equation (11) has been developed. It is expected that, in the future, more high-performance algorithms will be developed for solving the problem.
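The additional cost plotted in Figure 7 is the joint cost minus the individually optimal cost. Applied to the Figure 8 example, with the totals quoted above, the computation looks as follows; the dictionary layout and the name `additional_cost` are illustrative assumptions.

```python
INDIVIDUALLY_OPTIMAL_TOTAL = 8.0  # 4 for r1 plus 4 for r2

# Joint path costs reported for the Figure 8 example.
joint_cost = {"SC-A*": 8.0, "SC-M*": 8.5, "SC-CBS": 9.0}


def additional_cost(total, baseline=INDIVIDUALLY_OPTIMAL_TOTAL):
    """Cost of collisions: how much the joint path exceeds the baseline."""
    return total - baseline


extra = {solver: additional_cost(cost) for solver, cost in joint_cost.items()}
print(extra)
```

The ordering SC-A* < SC-M* < SC-CBS in this single example agrees with the ordering of the averages reported for Figure 7.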

Run Time
Table 4 shows the average run time of the three SC-based MAPP solvers. We observe that both SC-M* and SC-CBS are significantly faster than SC-A*. This is reasonable because SC-A* always searches the global high-dimensional joint space, which is expensive. SC-CBS is faster than SC-M* because it always searches in one individual dimension at a time, whereas SC-M* occasionally needs to deal with a high-dimensional space when collisions occur.
We compared the scalability of the three SC-based MAPP solvers in terms of planning for a large system size ($m > 50$). Figure 9 presents the success rate, average additional cost (i.e., how much more cost than the individually optimal path), and run-time ratio over SC-CBS under different thresholds $T$, where the run-time ratio of SC-CBS is compared to itself and thus is constant. SC-A* has the slackest constraint ($T = 0.35$, $\delta = 9.0$) but the poorest performance because of the prohibitively large search space. SC-CBS has the best success rate because of the property of decoupled searching. However, this comes at the expense of path cost. SC-M* performs decently in terms of both the success rate (significantly superior to SC-A*) and cost (noticeably lower than SC-CBS) as the number of agents increases. The run time of SC-M* is generally longer than that of SC-CBS. In another experiment, we observe that the run-time ratio of SC-M* over SC-CBS starts to decrease after a peak. This is because we force all algorithms to terminate after 1000 s, and both curves converge to the value one when their success rates decline to zero. We conducted another scalability experiment with different offsets $\delta$ (given $T = 0.25$) and observed the same results in terms of scalability. Figure 10 shows the experimental results. Considering scalability and path cost together, SC-M* demonstrates its overall advantages over the alternative SC-based solvers.
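The convergence of the run-time ratio to one follows directly from the 1000-s cap, as the small sketch below shows. It assumes that a timed-out trial is counted at the cap when the ratio is formed; the name `run_time_ratio` and the sample times are hypothetical.

```python
TIME_LIMIT_S = 1000.0  # every solver is terminated after this many seconds


def run_time_ratio(solver_seconds, baseline_seconds, limit=TIME_LIMIT_S):
    """Ratio of a solver's capped run time to the capped baseline run time."""
    return min(solver_seconds, limit) / min(baseline_seconds, limit)


# Moderate problem size: the baseline is clearly faster.
print(run_time_ratio(300.0, 100.0))

# Very large problem size: both solvers hit the cap, so the ratio is one
# regardless of how long each would have needed without the limit.
print(run_time_ratio(5000.0, 2000.0))
```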

Figure 1. Illustration of traditional M* for two agents, showing the evolution of the expanded graph $G^{exp}$ (circle), neighbor graph $G^{nbh}$ (diamond), and policy graph $G^{\phi}$ (square) for Agent 2 as the M* algorithm proceeds. (a) Individually optimal paths; (b) the first expanded vertex; (c) the third expanded vertex; (d) collision occurs at vertex $s_{10}$; (e) sub-dimensional expansion; (f) search in the expanded space; (g) the destination of Agent 2 is found; (h) collision-free optimal paths for both agents found by M*.
the satisfying value regarding the resource A k , which is a positive real value; g(e j pq ) is the edge cost regarding travel time/distance given by the graph model; and A A k (e j pq ) if and only if no other agents are physically moving along with agent j on the edge e j pq . The allocated resource value A j k (e j pq ) quantifies the level of interference

Figure 2. Example designs of cumulative distribution functions (CDFs), mapping the resource experience $D$ of an agent to a collision probability on a certain resource. $f_1$: sigmoid-based CDF for important (sensitive) resources. $f_2$: linear CDF for trivial (insensitive) resources. $\delta$: offset parameter adjusting the tolerance level.

Figure 3. Counterexample to the inclusion property of soft-collision M* (SC-M*) in the soft-collision context. Agents r1, r2, and r3 have the planning O-D demands (a, e), (f, g), and (h, i), respectively. Vertexes in the system are labeled as a, b, c, etc.

Figure 4. Grid system with 20 × 20 stops and 40 bidirectional lines. Square and circle of the same color correspond to the source and destination of an agent, respectively.

Figure 7. Average cost difference of soft-collision-based multi-agent path planning (SC-based MAPP) solvers from the individually optimal cost in the one-resource-one-type context.

Figure 9. Success rate, cost, and run-time ratio of the three SC-based MAPP solvers under different $T$.

Figure 10. Success rate, cost, and run-time ratio of the three SC-based MAPP solvers under different $\delta$.

Table 1. Run time of one-resource-one-type SC-M* under different parameters.

Table 2. Run time of two-resource-two-type SC-M* under different parameters.

Table 4. Average run time of SC-based MAPP solvers in the one-resource-one-type context.