Decentralized Navigation with Optimality for Multiple Holonomic Agents in Simply Connected Workspaces

Multi-agent systems are increasingly adopted in the research community and industry, as they can complete tasks faster and more efficiently than single-agent systems. Therefore, in this paper, we present an optimal approach to the multi-agent navigation problem in simply connected workspaces. The task requires each agent, starting from an initial position, to reach its destination along an optimal collision-free trajectory. To achieve this, we design a decentralized control protocol, defined by a navigation function, in which each agent is equipped with a navigation controller that resolves imminent safety conflicts with the other agents, as well as with the workspace boundary, without requiring knowledge of the goal positions of the other agents. Our approach is rendered sub-optimal, since each agent follows a predetermined optimal policy calculated by a novel off-policy iterative method. We use this method because the computational complexity of learning-based methods needed to calculate the global optimal solution becomes unrealistic as the number of agents increases. To evaluate our approach, we examine how much the resulting sub-optimal trajectory deviates from the optimal one and how much time the multi-agent system needs to accomplish its task as we increase the number of agents. Finally, we compare our results with those of a discrete centralized policy method, the Multi-Agent Poli-RRT* algorithm, to demonstrate the validity of our method against other research algorithms.


Introduction
Motion Planning (MP) is one of the fundamental problems in robotics, and a great number of control algorithms have been proposed to solve it. The main purpose is to find methods that discover safe and, to the greatest extent possible, optimal trajectories, either for a single mobile robot/agent system (Single-Agent System, SAS) or for a multiple mobile robots/agents system (Multi-Agent System, MAS), which navigate the robot/agents from any initial point of the workspace to their destination. In the research community and industry, controlling MASs is a state-of-the-art task, as these systems can complete tasks faster and more efficiently than a SAS. Despite their effectiveness, developing control policies that yield suitable trajectories with minimum cost (in terms of each agent's path length and input energy) is extremely challenging.
There are two control policies used to navigate a MAS: the centralized policy (CP), in which the agents are treated as a whole system and each agent is guided by one global controller, and the decentralized policy (DP), where each agent has its own control policy and exchanges information with the others to reach its destination safely without collisions. Designing DP controllers is more challenging than designing centralized ones, as the CP simplifies the problem and the agents have all the information about the environment and the goal positions of all agents. In DP, on the other hand, the constraint of inter-agent communication plays a vital role in the control design (for more information about the differences between CP and DP, the reader is referred to [1]). There is an abundance of studies that utilize these policies to efficiently navigate the agents, helping them to accomplish their tasks; they are mostly published in the fields of exploration [2][3][4][5][6][7] and formation [8][9][10][11][12][13].
Furthermore, numerous researchers have proposed methods for navigating mobile robots to a final position; see [14][15][16][17][18]. Some of these studies use a non-degenerate navigation function [19], creating a negated gradient potential field that is attractive towards the goal configuration and repulsive with respect to obstacles and the workspace boundary. The first studies to apply this function to help mobile agents navigate to their destination without conflicts are [20,21], which describe a CP and a DP, respectively. We have to mention that this type of controller does not provide strict global navigation, because a vector field on any manifold that has a unique attractive point must have at least as many saddles as obstacles [19]. Related multi-agent navigation function methods can be found in [22][23][24][25][26][27].
As mentioned earlier, the ultimate goal in MAS navigation is to design methods that can find the optimal path for the agents to follow. For a SAS, there are discrete (DM) and continuous methods (CM) that accomplish this task. Among DM, the RRT* algorithm exists in various versions [28,29]; this algorithm modifies the robot's path in order to decrease the path length. The CM rely on Reinforcement Learning (RL)-based methods that use iterative algorithms to converge to an optimal navigation policy [30][31][32]. In MAS, it is common to use methods based on Deep Reinforcement Learning (DRL), a powerful tool that combines neural networks and RL algorithms and allows each agent to learn from its interactions with the environment [33][34][35][36][37][38]. Despite the effectiveness of RL-based methods, their main disadvantage in MAS is the computational complexity and the abundance of data required to converge to the global policy.
Therefore, in this paper, we introduce a sub-optimal approach for navigating a MAS in planar simply connected workspaces (i.e., workspaces without any internal obstacles). We refer to the approach as sub-optimal because we begin by finding the optimal navigation for each agent separately, with a novel off-policy iterative method [30] used to avoid the computational complexity of global MAS learning, and subsequently design a DP controller for multi-agent navigation, using the negated gradient of a navigation function from [19]. We apply a DP method to our MAS problem, as every agent should operate independently of the others: each agent possesses predetermined information about its optimal navigation and follows this policy to its destination when it is isolated from the others. Additionally, we adopt the approach of [21,26], referred to above, to design the controller of each agent so as to prevent collisions; this approach also demands fewer numerical operations than the CP of [20], thus relaxing the requirements of the solution. Hence, the main contributions of this work are summarized as follows:

• We navigate each agent with a sub-optimal policy to its destination. To the best of our knowledge, this is the first work based on artificial potential fields that introduces optimality within a multi-agent navigation framework.
• No collision with other nearby agents or the workspace boundary occurs.
• Only knowledge of the current positions of the nearby agents is required, not their destinations.
• The complexity is rendered linear with respect to the number of agents and, if combined with the recent work [39], may be rendered constant.
Additionally, to demonstrate the efficiency of the proposed controller, in our simulation study we examine how much the sub-optimal trajectory of each agent deviates from the optimal one of the SAS, how much time the overall system needs to accomplish its task, and how this time changes as we increase the number of agents starting simultaneously from their initial positions. We also consider the results obtained with the Multi-Agent Poli-RRT* method, proposed by [40], in order to validate our method through comparison.
The rest of the paper is organized as follows: Section 2 formulates the problem. Section 3 describes the proposed decentralized navigation policy. Section 4 provides the proof of correctness of our approach. Section 5 presents the simulation results. Finally, Section 6 concludes the work and proposes future research directions.

Problem Formulation
Let us assume that we have a sphere manifold (ball set) B_r0(q) = {x ∈ R² : ||x − q|| < r_0}. Inside this set, we define a simply connected workspace W ⊂ B_r0(q) with boundary ∂W. There are m mobile agents (or robots) in W, each occupying a disk A_i = {q ∈ W : ||q − q_i|| ≤ r_i}, i = 1, ..., m, where q_i and r_i denote the center and radius of the disk, respectively. The configuration space is spanned by q = [q_1^T, ..., q_m^T]^T. The initial position vector is denoted by q(0) = [q_1(0)^T, ..., q_m(0)^T]^T and the desired destination vector by q_d = [q_d1^T, ..., q_dm^T]^T. The motion of each agent follows the single integrator model q̇_i = u_i, (1), where u_i denotes the velocity command applied to each agent. In particular, u_i will be a sub-optimal decentralized control policy that navigates the agent to its destination and prevents any conflicts/collisions with the other agents and the workspace boundary ∂W. The controller possesses knowledge of the current positions of other nearby agents, but not of their destinations; thus, every agent treats the others as moving obstacles in W. Moreover, we assume that the workspace is spacious and that, at every point of it, each agent is capable of avoiding any conflict, since in this paper we focus mainly on the functionality and appropriate design of the sub-optimal control policy. We therefore exclude narrowly connected workspaces, where the agents need global coordination to reach their destinations and alleviate blocking issues, as in [41]; we will deal with this issue in future work. Finally, we define the following infinite-horizon cost function for each agent: V^(i)_u(q_i) = ∫_0^∞ [Q(q_i(t)) + R(u_i(t))] dt, (2), where q_i is the starting point of the agent in W.
Additionally, we define the state-related cost term Q and the input-related cost term R, respectively, where w ∈ (0, 1) is a weight parameter and ||·|| denotes the Euclidean norm. The metric (2), along with (3) and (4), forms a classical cost function from Optimal Regulation theory [42]. The state-related term (3) expresses the minimization of the settling time of the system (1), while the input-related term (4) penalizes the Euclidean norm of the control input, which allows us to quantify the energy expenditure of the system (1). Furthermore, the value of the weight plays a vital role in the final result of the metric (2); it depends on which specification (reducing settling time or input energy) we wish to pass on to our system. In summary, the aforementioned metric examines two factors: the time that the whole MAS needs to reach the goal configuration and the energy of the control input signal spent by each agent in order to successfully navigate the workspace. Thus, the objective of this work is to design appropriate velocity control policies u_i such that collisions among the agents and with the workspace boundary are safely avoided and the overall cost is rendered as small as possible for any q_i ∈ W, i = 1, ..., m.
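As a sketch, the rollout of the dynamics (1) and the accumulation of the cost (2) can be written as follows. The quadratic stage costs used below for Q and R (w||q − q_d||² and (1 − w)||u||²) are an illustrative assumption standing in for (3) and (4), and the straight-to-goal policy is hypothetical:

```python
import numpy as np

def simulate_cost(q0, qd, policy, w=0.5, dt=1e-3, t_max=20.0):
    """Roll out the single-integrator dynamics q_dot = u under a given
    velocity policy and accumulate the infinite-horizon cost (2),
    truncated at t_max. The quadratic stage costs Q = w*||q - qd||^2
    and R = (1 - w)*||u||^2 are an assumption for illustration only."""
    q = np.array(q0, dtype=float)
    qd = np.array(qd, dtype=float)
    cost = 0.0
    for _ in range(int(t_max / dt)):
        u = policy(q)
        cost += (w * np.sum((q - qd) ** 2) + (1 - w) * np.sum(u ** 2)) * dt
        q = q + dt * u          # explicit Euler step of q_dot = u
        if np.linalg.norm(q - qd) < 1e-3:
            break               # close enough to the destination
    return cost

# Hypothetical linear policy that steers straight to the goal.
goal = np.array([1.0, 0.0])
J = simulate_cost([0.0, 0.0], goal, lambda q: -(q - goal))
```

For this exponentially decaying closed loop the integral evaluates to roughly ||q(0) − q_d||²/2, so the numeric rollout gives a quick sanity check of any learned policy's cost.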

Decentralized Navigation
In this section, we briefly discuss the sub-optimal DP. First, we state the definition of the navigation function, its properties, and the sub-optimal decentralized control law with which each agent is equipped. The control law for each agent, as mentioned before, navigates it to its destination optimally with respect to (2) and prevents it from colliding with the workspace boundary. We then briefly present the off-policy RL-based method of [30], which we use to find the independent optimal navigation policy of each agent, and define the function that provides a warning regarding conflicts between the agents and indicates how to resolve such issues.

Navigation Function
According to [19], a Navigation Function (NF) is a real-valued map defined through cost functions, whose negated gradient field is attractive towards the goal configuration and repulsive with respect to obstacles. In the aforementioned paper, it has been shown that strict global navigation (i.e., with a globally attracting equilibrium state) cannot be achieved, because a smooth vector field on any sphere world that has a unique attractor must have at least as many saddles as obstacles. Our assumption of spherical robots (and thus spherical obstacles) does not constrain the generality of this work, since it has been proven [19] that navigation properties are invariant under diffeomorphisms. A NF can be defined as a map satisfying the following properties:
1. It is analytic on F;
2. It has only one minimum at q_d ∈ int(F);
3. Its Hessian at all critical points (zero gradient vector field) is full rank;
4. It is uniformly maximal on the boundary of F.
Since we design controllers with the DP, where each agent needs to know the current positions of the other close-distance agents in order to prevent collisions, and each of them operates solely based on its own navigation law, we consider the following class of navigation functions from [26] for each agent: φ_i = γ_di / (γ_di^k + G_i)^(1/k), which is a polar function, where the column vector φ = [φ_1, ..., φ_m]^T has its minimum at q_d ∈ int(F), k is a positive constant, γ_i^(−1)(0) denotes the desirable set, and G_i^(−1)(0) the set to avoid. We choose γ_di = V^(i)_{u*}. This function is equal to the optimal cost function, which is calculated with a novel iterative method such as the one in [30], described in the next section. Thus, the gradient of V^(i)_{u*} yields the optimal navigation of every agent towards its desired destination. Notice that the cost function V^(i)_{u*} is differentiable with respect to its variables and its Hessian is positive-definite. Hence, the sub-optimal control law for each agent is given by (7), with u_si = (w/(1−w))||q_i − q_di|| as discussed in [30], which leads the agent smoothly to its destination while avoiding collisions (u_i is co-linear with ∇φ_i), and ε > 0 is a small constant.
Remark 1. The variable ε takes values between 10^−4 and 10^−8 in order to avoid the singularity at the desired configuration. We choose the value according to the system's dynamics.
Remark 2. The estimation of the cost function (2) affects the behavior of the decentralized navigation controller (7). This can be seen directly from the gradient of the NF, given in (8), where γ_di is equal to the estimated function V_{u*}. As we observe in (8), the value and the gradient of γ_di for any q_i ∈ W affect the optimal reactive navigation of each agent and the way in which the agents interact to prevent conflicts among them.
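One plausible reading of the control law (7) can be sketched as follows. The normalization of the gradient and the placement of ε in the denominator are our assumptions, chosen to be consistent with Remarks 1 and 2 (ε regularizes the vanishing gradient at q_di), and are not the paper's exact closed form:

```python
import numpy as np

def suboptimal_velocity(q, qd, grad_phi, w=0.5, eps=1e-6):
    """Sketch of a decentralized law in the spirit of (7): move along
    the negated NF gradient with speed u_si = w/(1-w) * ||q - qd||.
    grad_phi is the gradient of the agent's navigation function phi_i;
    eps avoids the singularity of the normalization at the goal."""
    g = grad_phi(q)
    u_si = (w / (1.0 - w)) * np.linalg.norm(q - qd)
    return -u_si * g / (np.linalg.norm(g) + eps)

# Hypothetical quadratic-bowl potential centered at the destination.
qd = np.array([0.0, 0.0])
grad = lambda q: q - qd
u_far = suboptimal_velocity(np.array([2.0, 0.0]), qd, grad)
u_goal = suboptimal_velocity(qd, qd, grad)
```

Far from the goal the command points toward q_d with speed proportional to the remaining distance; exactly at the goal both u_si and the gradient vanish, so the command is zero.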

Individual Optimal Policy
As we explained in the previous section, the control law of each agent involves an optimal navigation policy that leads it to its destination with minimal cost (2). Specifically, the function γ_di is the estimate of the optimal cost function extracted by an off-policy iterative method presented in [30]. In the present work, such an optimal policy is considered to be known, and in this section we briefly describe how to extract it for a simply connected workspace. More specifically, we adopt the method described in [30] for calculating optimal policies, in which the authors propose an off-policy iterative RL-based method that identifies control laws by minimizing a certain infinite-horizon cost function. The authors chose the off-policy approach [43] because it has a reduced computational cost compared to the on-policy one. Their method starts with an initial admissible control policy that leads the system to its goal from any starting point within W; in practice, they define the initial policy with an Artificial Harmonic Potential Field (AHPF)-based method [44]. Moreover, they apply Zeroing Barrier Function theory [45] to converge to the optimal control policy while keeping the agent at a safe distance from the workspace boundary ∂W. The approximation of the optimal cost function V^(i)_û is conducted by a Gaussian Radial Basis Function (RBF) Neural Network [46], whose gradient reveals the optimal navigation velocity of the agent.
In Figure 1, we show a simple example of the aforementioned algorithm. In this example, a point robot is placed in a simply connected workspace and we want to find its optimal navigation policy for any starting point in W. Figure 1a shows the initial admissible policy designed by the AHPF-based method and Figure 1b the optimal policy to which the algorithm converges. We observe in these two figures that, with the optimal policy, the robot traverses the shortest distance from each of the three starting points. Finally, Figure 1c presents the minimum value of the cost function (2). As we can see, the algorithm achieves a sufficient approximation of the optimal solution within the workspace W. Nevertheless, when multiple agents navigate within the same workspace, collisions may occur; these are handled by the repulsive terms, presented below, relating to nearby agents at risk of imminent collision.
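A minimal sketch of such an approximator follows, assuming Gaussian RBFs with a shared width. The centers and weights here are placeholders; in [30] the weights are produced by the off-policy iteration:

```python
import numpy as np

def rbf_value_and_gradient(q, centers, weights, sigma=0.5):
    """Gaussian-RBF approximation of the optimal cost,
    V_hat(q) = sum_j w_j * exp(-||q - c_j||^2 / (2 sigma^2)),
    together with its analytic gradient, whose negation gives the
    learned descent direction."""
    diffs = q[None, :] - centers                      # shape (J, 2)
    phi = np.exp(-np.sum(diffs ** 2, axis=1) / (2 * sigma ** 2))
    V = weights @ phi
    grad = (weights * phi) @ (-diffs / sigma ** 2)    # dV/dq, shape (2,)
    return V, grad

# Placeholder network with two centers.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
weights = np.array([1.0, 2.0])
q = np.array([0.3, 0.2])
V, g = rbf_value_and_gradient(q, centers, weights)
```

Because the gradient is available in closed form, no numerical differentiation of the learned cost is needed at runtime.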

Resolving Conflicts via the Terms G_i
In this section, we use mathematical tools and terminology from [20,26] to define the function G_i, which is responsible for resolving the conflicts that lead to collisions among the agents. More specifically, the function G_i is different for each agent and indicates whether the agents are close enough to collide with each other or with the workspace boundary. Hence, we define this function as the product G_i = G_bi · G_ai, where G_bi is the function of the i-th agent for avoiding conflicts with the boundary and G_ai the one for avoiding conflicts with the other agents. When either of these two functions approaches zero, the agent is about to face a conflict with some nearby agent or with the workspace boundary ∂W.

Calculating the Function G_bi
The function G_bi is selected as in [20,26], where b_wi is a distance metric between the agent and the workspace boundary. In [20,26], b_wi is defined through the distance from the center of W to the agent's center, because a spherical world is assumed. Since we deal with generic simply connected workspaces, we cannot use this metric; therefore, we define its equivalent in terms of the boundary distance d(q_i) = min_{z∈∂W} ||q_i − z||.
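A sampled version of this boundary distance can be sketched as follows; the discretization of ∂W and the circular example workspace are our own illustration (a segment-wise projection would be the exact alternative for a polygonal boundary):

```python
import numpy as np

def distance_to_boundary(q, boundary_pts):
    """Approximate d(q_i) = min_{z in dW} ||q_i - z|| by taking the
    minimum distance over a dense sampling of the boundary curve of a
    generic simply connected workspace."""
    return np.min(np.linalg.norm(boundary_pts - q, axis=1))

# Example: the unit circle sampled as the boundary of B_1(0).
theta = np.linspace(0.0, 2.0 * np.pi, 2000, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
d = distance_to_boundary(np.array([0.5, 0.0]), circle)   # ~0.5
```

With 2000 samples the discretization error is far below the agent radii used later, so the sampled minimum is an adequate stand-in for the exact metric.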

Calculating the Function G_ai
Before defining the function G_ai, we need some mathematical tools and definitions, and we assume that each agent tracks the other nearby agents as moving obstacles. Let A denote the agent under examination and O_i, i = 1, ..., m − 1, the set of moving obstacles of A. The robot proximity function, a measure of the distance between the agent and an obstacle, is defined accordingly. We use the term relation to describe an imminent collision between the agent and one or more moving obstacles in W. A binary relation is a relation between the agent and a single obstacle, and any relation can be expressed as a set of binary relations. Finally, the relation level is the number of obstacles involved in the relation.
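The combinatorics of relations can be made concrete: a relation of level l corresponds to an l-element subset of the m − 1 moving obstacles, which a short enumeration sketch illustrates:

```python
from itertools import combinations
from math import comb

def enumerate_relations(m):
    """Enumerate all relations of agent A with its m-1 moving
    obstacles: level-l relations are the l-element subsets of the
    obstacle set, so s_l = C(m-1, l) and S = sum_l s_l = 2^(m-1) - 1."""
    obstacles = range(1, m)               # O_1, ..., O_{m-1}
    levels = {}
    for l in range(1, m):
        levels[l] = list(combinations(obstacles, l))
        assert len(levels[l]) == comb(m - 1, l)
    return levels

# The example of Figure 2: m = 3 agents, hence S = 3 relations
# spread over a maximum level of two.
rels = enumerate_relations(3)
S = sum(len(v) for v in rels.values())
```

This reproduces the count quoted for Figure 2 (S = 3, maximum level two) and shows why the number of relations grows exponentially with the group size.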
Using the aforementioned terminology, we define the set of relations in level l as R^(l) = {R^(l)_i : i = 1, ..., s_l}, where s_l = (m−1 choose l) and R^(l)_i is the i-th set of binary relations in level l. In the same manner, we define the complementary set (R^(l)_i)^C = {R^(l)_j : j = 1, ..., s_l, j ≠ i}. To keep the notation simple, we use R_i and R̃_i, with i = 1, ..., S and S = ∑_{l=1}^{m−1} s_l, to refer to a relation and its complementary relation, respectively. For example, Figure 2 shows the relations of an agent in a workspace with two other mobile obstacles; the total number of relations is S = 3 and the maximum relation level is two. For every relation shown in Figure 2, the Relation Proximity Function (RPF) is given by b_{R_i} = ∑_{j∈R_i} β_j, where β_j is the proximity function (12), with j denoting each binary relation of the set R_i. The key property of the RPF is that a zero value indicates that the agent is in conflict with one or multiple obstacles, according to the level of the relation. A Relation Verification Function (RVF) g_{R_i} is then defined, where λ, h > 0 and B_{R̃_i} = ∏_{k∈R̃_i} b_k. The RVF is zero if a relation holds while no other relation of the same level holds, and it has the following properties: (a) lim_{x→0} lim_{y→0} g_R(x, y) = λ and (b) lim_{y→0} lim_{x→0} g_R(x, y) = 0. It should be noted that, since we assume large workspaces, the function g_{R_i} can take large values for some relation R_i, affecting the numerical stability of the algorithm. Consequently, we pass it through a sigmoidal function, with saturation level τ_m = 5, which keeps the values adequately small; Figure 3 shows the graph of the sigmoidal function along with its gradient. Having defined the RVF, we can then calculate the term G_ai, with its gradient given by (17). Equation (17) reveals another benefit of the sigmoid function: when a relation yields a large RVF value, the derivative of the sigmoidal function eliminates the gradient of that relation's RVF. As a result, when an agent is sufficiently far from the others, it is navigated toward its destination by the optimal policy.
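The saturation idea can be illustrated with a representative sigmoid. The exponential form below is an assumption (the paper's exact formula is not reproduced here); it merely exhibits the two properties that matter: near-linearity at zero and a vanishing derivative at large RVF values:

```python
import numpy as np

TAU_M = 5.0   # saturation level used in the paper

def sat_sigmoid(x, tau_m=TAU_M):
    """Representative saturating sigmoid: approximately linear near 0
    and flattening at tau_m, so repulsion from an active (near-zero)
    relation is preserved while distant relations are suppressed."""
    return tau_m * (1.0 - np.exp(-x / tau_m))

def sat_sigmoid_grad(x, tau_m=TAU_M):
    """Derivative of the sigmoid: ~1 near zero, ~0 for large inputs,
    which is what eliminates the gradient of a large-valued RVF."""
    return np.exp(-x / tau_m)

small_slope = sat_sigmoid_grad(0.0)      # 1.0
large_slope = sat_sigmoid_grad(100.0)    # ~0
```

Any bounded, monotone, smooth function with these two limits would serve the same numerical-stability purpose.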

Proof of Correctness
Let ε > 0 and define the ball B_ε(q_di) = {q ∈ W : ||q − q_di|| < ε}. According to [20], we discriminate the following topologies for the function φ_i:
1. The destination point q_di;
2. The free-space boundary;
3. The set near collisions;
4. The set away from collisions.
The following theorem examines the properties of the proposed decentralized optimal navigation protocol via an overall multi-agent Navigation Function.

Theorem 1. Let I_1, I_2 ⊆ R be intervals, and let φ : F → I_1 and σ : I_1 → I_2 be analytic. Define the composition φ̂ : F → I_2 as φ̂ = σ ∘ φ. If σ is monotonically increasing on I_1, then the sets of critical points of φ and φ̂ coincide and the (Morse) index of each critical point is identical.
Proof. The proof proceeds similarly to [19]. Since we have a connected workspace and a well-defined function γ_di, we first present some results for the navigation function φ_i [20,26], which establish that the goal configurations are attainable without collisions and that there is always a direction of movement that decreases the overall potential function. More specifically,

• Since the workspace is connected, the destination point q_di is a non-degenerate local minimum of φ_i.
• All critical points of φ_i are in the interior of the free space.
• For every ε > 0, there exists a positive integer N(ε) such that if k > N(ε), then there are no critical points of φ_i in F_1(ε).
• There exists an ε_0 > 0 such that φ_i has no local minimum in F_0(ε), as long as ε < ε_0.

Nevertheless, the aforementioned properties do not, on their own, guarantee global convergence of the system state to the goal configurations. To achieve such convergence, we define a time-invariant Lyapunov function for the overall system with respect to the positions of all agents; the candidate Lyapunov function is the sum of the φ_i. Thus, differentiating the candidate Lyapunov function φ = ∑_{i=1}^m φ_i along the trajectories of the multi-agent system (i.e., substituting the navigation protocol (7)), we obtain a negative semi-definite derivative. Hence, the value of the Lyapunov function decreases (i.e., no collisions occur) and, moreover, the system state converges to the largest invariant set of the system, which is composed of the desired configuration and the remaining equilibria that, fortunately, for sufficiently large parameters h, k, attain a measure-zero domain of attraction. Therefore, the system trajectories converge safely to the desired configuration when initialized almost everywhere within the workspace.
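The Lyapunov step above can be sketched as follows; this is our reading of the chain-rule expansion under the single-integrator dynamics (1), not the paper's exact derivation:

```latex
\dot{\varphi}
  \;=\; \sum_{i=1}^{m} \dot{\varphi}_i
  \;=\; \sum_{i=1}^{m} \sum_{j=1}^{m}
        \left(\frac{\partial \varphi_i}{\partial q_j}\right)^{\!\top} \dot{q}_j
  \;=\; \sum_{i=1}^{m} \sum_{j=1}^{m}
        \left(\frac{\partial \varphi_i}{\partial q_j}\right)^{\!\top} u_j
  \;\le\; 0 .
```

Each diagonal term is non-positive because u_j descends the agent's own potential; the claim that the cross-coupling terms do not destroy semi-definiteness (for suitable h, k) is established in the cited analysis, after which LaSalle's invariance principle yields convergence to the largest invariant set.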

Results
In this section, we present the simulation results of the aforementioned method. We use 8 mobile holonomic agents of radius r = 0.1 in the workspace shown in Figure 4, with the initial and desired positions given in Table 1. We also run the same simulations with the Multi-Agent Poli-RRT* method to compare against our method's results; a short description of this comparison method is provided in the next subsection. For the simulations, we use the programming and numeric computing platform MATLAB, version R2022b, on a computer with an Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz 2.90 GHz and 8 GB of RAM (Intel, Santa Clara, CA, USA).

Multi-Agent Poli-RRT* Algorithm
The Multi-Agent Poli-RRT* method was first proposed by [40]. In that work, the authors extended the Poli-RRT* algorithm [47] to a non-holonomic multi-agent system in which the agents move in a workspace with internal obstacles, by adopting a priority-based approach. That is, the agents are ranked according to a priority criterion and the algorithm finds trajectories in sequence, starting from the highest-priority agent and moving to the lowest one. Furthermore, in each iteration, when their turn comes, the agents with lower priority learn about the trajectories of the previous, higher-priority agents from an updated list. This list contains the edges of the agents' trajectories and the time intervals during which they occupy them. With this structure, the agents can prevent possible collisions and maintain a safe distance from the preceding ones. The aforementioned algorithm can be readily adapted to our problem, and we adapt the agents' controller to be similar to (7).
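The priority scheme can be sketched at a high level as follows. Here `plan_single` and `collides_with` are hypothetical stand-ins for the RRT*-based single-agent planner and the timed collision check of [40]; the sketch only illustrates the sequencing and the growing list of committed trajectories:

```python
def plan_with_priorities(agents, plan_single, collides_with):
    """Plan agents in ascending priority order; each planner call must
    avoid the timed trajectories already committed by higher-priority
    agents (the 'updated list' of the Multi-Agent Poli-RRT* scheme)."""
    committed = []                       # list of (agent_id, timed_path)
    for agent in sorted(agents, key=lambda a: a["priority"]):
        path = plan_single(
            start=agent["start"],
            goal=agent["goal"],
            # reject any edge/time pair conflicting with committed paths
            avoid=lambda edge, t: any(
                collides_with(edge, t, p) for _, p in committed
            ),
        )
        committed.append((agent["id"], path))
    return committed

# Toy usage with stub planner and collision check.
agents = [
    {"id": "B", "priority": 2, "start": (0, 0), "goal": (1, 1)},
    {"id": "A", "priority": 1, "start": (1, 0), "goal": (0, 1)},
]
result = plan_with_priorities(
    agents,
    plan_single=lambda start, goal, avoid: [start, goal],
    collides_with=lambda edge, t, path: False,
)
```

The highest-priority agent plans against an empty list and thus obtains its unconstrained (near-optimal) path, which is consistent with the per-agent results reported in the comparison below.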

Simulations
Before defining the sub-optimal control law of each agent, we use the off-policy method to determine the optimal cost function of each agent. Because we want to attach equal importance to the minimization of the settling time and of the input energy of the system (1), we set w = 0.5. In Figure 5, we illustrate the optimal vector fields of each agent obtained with the method of [30]. Additionally, in Tables 2 and 3 we present the time duration and cost value of each optimal policy, respectively.
As mentioned earlier, to demonstrate the efficiency of the proposed sub-optimal controller, in this simulation study we examine how much the trajectory of each agent deviates from the optimal one of a SAS and how the time the global system takes to accomplish its task changes as we increase the number of agents starting simultaneously from their initial positions. The goal of the aforementioned MAS is for all agents to reach their destinations, starting from the above initial positions and following optimal collision-free trajectories. To this end, we ran three simulations: in the first, groups of two agents started towards their destinations; in the second, groups of four started together; and in the last, all the agents started together. We assume that an agent interacts only with the members of its group and does not affect the navigation of agents outside the group. Notably, we implemented the comparison method with the same properties. For each simulation, we chose the group of agents that would interact and the variables of the sub-optimal control law (k, λ, h, τ_m).
For the simulations, we categorized the agents into the following groups:
• Two members: (A_1, A_2), (A_3, A_4), (A_5, A_6) and (A_7, A_8).
We note that in Simulation 1, the variables λ = h = NaN, because the complementary relations of each agent are not defined when only two agents interact. Also, the selected variables are the best ones available, providing the best sub-optimal collision-free trajectory for each agent. Finally, the accompanying videos of each simulation are given in slow motion in order to make the collision avoidance property observable. In Table 2, we list the time duration each agent needs to accomplish its task, and in Table 3 we list the cost value of each agent for each simulation; the global system time duration and cost value are stated in the last line. The results of both navigation methods are shown in these tables. Finally, Figure 6 shows the optimal and sub-optimal greedy trajectories of each agent for the optimal SAS and for each simulation, and Figure 7 presents the trajectories of the two navigation methods during Simulation 3.

Discussion
In Figure 6, we provide the agents' trajectories during the above simulations. Specifically, we see the optimal (solid blue line) and sub-optimal (dashed lines) trajectories of each agent separately. In all simulations, at the start and the end of its trajectory, every agent stays close to its optimal navigation. By contrast, in the center of the workspace, where the group of agents meets, each agent deviates from its optimal trajectory or remains idle for a short time period in order to avoid conflicts with the other agents. As a result, the sub-optimal cost value increases as the size of the group grows. This is a consequence of tuning the parameters of the sub-optimal controller according to Theorem 1, which is done to establish convergence from almost everywhere in the workspace. Had we not selected proper parameter values, the agents would have remained idle in the center of the workspace without accomplishing their tasks, owing to the presence of a stable unwanted equilibrium.
In comparison with the Multi-Agent Poli-RRT* method, Tables 2 and 3 showcase the global efficiency of our method. We also observe the effect of the priority scheme of the comparison method: in Tables 2 and 3, the highest-priority agents achieve a smaller time duration and cost value than the corresponding agents under our method. We observe the same in Figure 7, which presents the trajectory results of the two methods during Simulation 3: the trajectories of the highest-priority agents (dashed purple line) stay closer to their optimal trajectories than the sub-optimal ones (dashed red line). However, as the updated list grows in each iteration, the lowest-priority agents start to deviate strongly from their optimal trajectories. For the above reasons, our approach is globally advantageous in comparison with the above priority-based algorithm.
Moreover, in the aforementioned results, as expected, we observe a trade-off between total time duration and cost value. That is, as we increase the number of agents that interact in the workspace, the total time the agents require to accomplish their tasks decreases, while the total cost value of the MAS increases, since the agents deviate from their optimal trajectories. Figure 8 shows the decrease in the total time duration (blue lines) and the percentage rise of the total cost value relative to the global optimal cost value (red lines) for the sub-optimal navigation method (solid lines) and the Multi-Agent Poli-RRT* method (dashed lines), with the horizontal axis in logarithmic scale. This trade-off result provides additional confirmation of the global efficiency of our method.

Conclusions
In this paper, we proposed a sub-optimal approach for the navigation problem of a MAS in a simply connected workspace, designing a sub-optimal decentralized control law in order to avoid the computational complexity of RL-based methods. We observed a trade-off between the time duration needed to fulfill the task and the overall cost value as the number of members in a group increases. Moreover, we compared our method with the Multi-Agent Poli-RRT* algorithm to demonstrate its validity against other research algorithms. Despite the promising results of our study, several issues remain. First of all, with this method the agents may be blocked by others if the workspace has narrow passages. Moreover, we cannot achieve global asymptotic navigation, as discussed in the previous sections [19]. In future research, we will extend this methodology so that it remains functional in narrow workspaces, coordinating the agents properly to ensure they reach their destinations, and we will utilize algorithms based on game theory to asymptotically approach the global optimal navigation of the system.

Figure 2 .
Figure 2. The relations of an agent in a workspace involving three mobile agents. A Relation Proximity Function (RPF) is a measure of the distance between the agent and the obstacles involved in a relation; each relation has its own RPF, equal to b_{R_i} = ∑_{j∈R_i} β_j.

Figure 4 .
Figure 4. Workspace and the initial (symbol I) and desired (symbol G) position of each agent.

Figure 8 .
Figure 8. Total time duration of the sub-optimal navigation method (solid blue line) and the Multi-Agent Poli-RRT* method (dashed blue line), and the percentage increase of the total cost value of the sub-optimal navigation method (solid red line) and the Multi-Agent Poli-RRT* method (dashed red line), according to the number of agents in the group.

Table 1 .
The initial and desired position of each agent.

Table 2 .
Time duration of each agent and whole system, with sub-optimal control method and Multi-Agent Poli-RRT* method.

Table 3 .
Cost value of each agent and whole system, with sub-optimal control method and Multi-Agent Poli-RRT* method.